2019/11/12

[Apache POI] How to read large excel file

Problem
I am trying to read a large excel file , around 700000 records, via Apache POI.
It takes  lots of time and resources to read this kind of large excel file.
Is it any effective approach to handle this kind of situation?

The original code will take around 60 seconds to read:
package tw.com.abc.dcb.service;

import com.monitorjbl.xlsx.StreamingReader;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.springframework.stereotype.Component;
import tw.com.abc.dcb.vo.DcbRecord;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

@Component
@Slf4j
public class ExcelReaderService {

    /**
     * Read Excel file
     *
     * @param xlsFile excel file path
     * @return excel data
     * @throws IOException fail to read excel file
     */
    public List<DcbRecord> readExcel(String xlsFile) throws IOException {
        List<DcbRecord> result = new ArrayList<>();
        int rowCount = 1;

        try (Workbook workbook = WorkbookFactory.create(new File(xlsFile))) {
            Sheet sheet = workbook.getSheetAt(0);

            Iterator<Row> rowIterator = sheet.iterator();
            while (rowIterator.hasNext()) {
                if (rowCount == 1) {
                    // skip header
                    rowIterator.next();
                    rowCount++;
                } else {
                    Row row = rowIterator.next();

                    DcbRecord record = DcbRecord.builder()
                            .company(row.getCell(0).getStringCellValue().trim())
                            .paymentDescription(row.getCell(1).getStringCellValue().trim())
                            .amount((int) row.getCell(2).getNumericCellValue())
                            .build();

                    result.add(record);
                    rowCount++;
                }
            }
        } catch (IOException e) {
            throw new IOException("fail to read excel file : " + e.getMessage(), e);
        }
        return result;
    }

}



How-To
The original approach to read excel file is not very memory efficient.
We can make good use of Excel Streaming Reader to import this disadvantage.
The new approach's read time will improve to 20 seconds.

Sample code:
package tw.com.abc.dcb.service;

import com.monitorjbl.xlsx.StreamingReader;
import lombok.extern.slf4j.Slf4j;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;
import org.springframework.stereotype.Component;
import tw.com.abc.dcb.vo.DcbRecord;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

@Component
@Slf4j
public class ExcelReaderService {

    /**
     * Read large excel file
     *
     * @param xlsxFile large excel file path
     * @return excel data
     * @throws IOException fail to read excel file
     */
    public List<DcbRecord> readLargeExcel(String xlsxFile) throws IOException {
        List<DcbRecord> result = new ArrayList<>();
        int rowCount = 1;
        try (InputStream is = new FileInputStream(new File(xlsxFile));
             Workbook workbook = StreamingReader.builder().rowCacheSize(100).bufferSize(4096).open(is)) {
            for (Sheet sheet : workbook) {
                for (Row row : sheet) {
                    if (rowCount == 1) {
                        // skip header
                        rowCount++;
                    } else {
                        DcbRecord record = DcbRecord.builder()
                                .company(row.getCell(0).getStringCellValue().trim())
                                .paymentDescription(row.getCell(1).getStringCellValue().trim())
                                .amount((int) row.getCell(2).getNumericCellValue())
                                .build();

                        result.add(record);
                    }
                }
            }
        } catch (IOException e) {
            throw new IOException("fail to read excel file : " + e.getMessage(), e);
        }
        return result;
    }

}




Reference

[1] https://github.com/monitorjbl/excel-streaming-reader

No comments:

Post a Comment