I am trying to read a large excel file , around 700000 records, via Apache POI.
It takes lots of time and resources to read this kind of large excel file.
Is it any effective approach to handle this kind of situation?
The original code will take around 60 seconds to read:
package tw.com.abc.dcb.service; import com.monitorjbl.xlsx.StreamingReader; import lombok.extern.slf4j.Slf4j; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.ss.usermodel.Sheet; import org.apache.poi.ss.usermodel.Workbook; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.springframework.stereotype.Component; import tw.com.abc.dcb.vo.DcbRecord; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.Iterator; import java.util.List; @Component @Slf4j public class ExcelReaderService { /** * Read Excel file * * @param xlsFile excel file path * @return excel data * @throws IOException fail to read excel file */ public List<DcbRecord> readExcel(String xlsFile) throws IOException { List<DcbRecord> result = new ArrayList<>(); int rowCount = 1; try (Workbook workbook = WorkbookFactory.create(new File(xlsFile))) { Sheet sheet = workbook.getSheetAt(0); Iterator<Row> rowIterator = sheet.iterator(); while (rowIterator.hasNext()) { if (rowCount == 1) { // skip header rowIterator.next(); rowCount++; } else { Row row = rowIterator.next(); DcbRecord record = DcbRecord.builder() .company(row.getCell(0).getStringCellValue().trim()) .paymentDescription(row.getCell(1).getStringCellValue().trim()) .amount((int) row.getCell(2).getNumericCellValue()) .build(); result.add(record); rowCount++; } } } catch (IOException e) { throw new IOException("fail to read excel file : " + e.getMessage(), e); } return result; } }
How-To
The original approach to read excel file is not very memory efficient.
We can make good use of Excel Streaming Reader to import this disadvantage.
The new approach's read time will improve to 20 seconds.
Sample code:
package tw.com.abc.dcb.service; import com.monitorjbl.xlsx.StreamingReader; import lombok.extern.slf4j.Slf4j; import org.apache.poi.ss.usermodel.Row; import org.apache.poi.ss.usermodel.Sheet; import org.apache.poi.ss.usermodel.Workbook; import org.apache.poi.ss.usermodel.WorkbookFactory; import org.springframework.stereotype.Component; import tw.com.abc.dcb.vo.DcbRecord; import java.io.File; import java.io.FileInputStream; import java.io.IOException; import java.io.InputStream; import java.util.ArrayList; import java.util.Iterator; import java.util.List; @Component @Slf4j public class ExcelReaderService { /** * Read large excel file * * @param xlsxFile large excel file path * @return excel data * @throws IOException fail to read excel file */ public List<DcbRecord> readLargeExcel(String xlsxFile) throws IOException { List<DcbRecord> result = new ArrayList<>(); int rowCount = 1; try (InputStream is = new FileInputStream(new File(xlsxFile)); Workbook workbook = StreamingReader.builder().rowCacheSize(100).bufferSize(4096).open(is)) { for (Sheet sheet : workbook) { for (Row row : sheet) { if (rowCount == 1) { // skip header rowCount++; } else { DcbRecord record = DcbRecord.builder() .company(row.getCell(0).getStringCellValue().trim()) .paymentDescription(row.getCell(1).getStringCellValue().trim()) .amount((int) row.getCell(2).getNumericCellValue()) .build(); result.add(record); } } } } catch (IOException e) { throw new IOException("fail to read excel file : " + e.getMessage(), e); } return result; } }
Reference
[1] https://github.com/monitorjbl/excel-streaming-reader