设为首页收藏本站

EPS数据狗论坛

 找回密码
 立即注册

QQ登录

只需一步,快速开始

查看: 1586|回复: 0

[其他] 流式生成Excel文件

[复制链接]

17

主题

117

金钱

199

积分

入门用户

发表于 2019-9-25 14:56:39 | 显示全部楼层 |阅读模式

当我们要导出数据库数据到Excel文件时,如果数据量特别大,那么可能需要耗费较多内存造成OOM。即使没有OOM,也有可能因为生成Excel文件的时间太久导致请求超时。这时候就需要POI的SXSSF(org.apache.poi.xssf.streaming)功能了。

Excel两种格式
Excel 97(-2007) file format
Excel 2007 OOXML (.xlsx) file format

HSSF is the POI Project's pure Java implementation of the Excel '97(-2007) file format. XSSF is the POI Project's pure Java implementation of the Excel 2007 OOXML (.xlsx) file format.
HSSF and XSSF provides ways to read spreadsheets create, modify, read and write XLS spreadsheets. They provide:

low level structures for those with special needs
an eventmodel api for efficient read-only access
a full usermodel api for creating, reading and modifying XLS files

Since 3.8-beta3, POI provides a low-memory footprint SXSSF API built on top of XSSF.
SXSSF is an API-compatible streaming extension of XSSF to be used when very large spreadsheets have to be produced, and heap space is limited. SXSSF achieves its low memory footprint by limiting access to the rows that are within a sliding window, while XSSF gives access to all rows in the document. Older rows that are no longer in the window become inaccessible, as they are written to the disk.
In auto-flush mode the size of the access window can be specified, to hold a certain number of rows in memory. When that value is reached, the creation of an additional row causes the row with the lowest index to to be removed from the access window and written to disk. Or, the window size can be set to grow dynamically; it can be trimmed periodically by an explicit call to flushRows(int keepRows) as needed.
Due to the streaming nature of the implementation, there are the following limitations when compared to XSSF:

Only a limited number of rows are accessible at a point in time.
Sheet.clone() is not supported.
Formula evaluation is not supported


SXSSF
SXSSF是如何减小内存消耗的呢?它通过将数据写到临时文件来减少内存使用,降低发生OOM错误的概率。
  1. // turn off auto-flushing and accumulate all rows in memory
  2. SXSSFWorkbook wb = new SXSSFWorkbook(-1);
复制代码

你也可以在构造方法里,指定-1来关闭自动写入数据到文件,将所有数据内容保持在内存里。

虽然这里处理了内存OOM的问题,但是还是必须将全部数据写到一个临时文件之后才能响应请求,请求超时的问题没有解决。


流式生成
Excel 2007 OOXML (.xlsx) 文件格式其实本质上是一个zip文件,我们可以把.xlsx文件后缀名改为.zip,然后解压:
  1. $ mv output.xlsx output.zip
  2. $ unzip output.zip
  3. $ tree output/
  4. output/
  5. ├── [Content_Types].xml
  6. ├── _rels
  7. ├── docProps
  8. │   ├── app.xml
  9. │   └── core.xml
  10. └── xl
  11.     ├── _rels
  12.     │   └── workbook.xml.rels
  13.     ├── sharedStrings.xml
  14.     ├── styles.xml
  15.     ├── workbook.xml
  16.     └── worksheets
  17.         └── sheet1.xml

  18. 5 directories, 8 files
复制代码

我们可以看到这个Excel文件解压后包含了上面那些文件,其中styles是我们定义的样式格式(包括字体、文字大小、颜色、居中等属性),worksheets目录下是我们的数据内容。
通过具体分析数据格式,我们可以自己控制xlsx文件的写入过程,将数据直接写到响应流上而非临时文件就可以完美解决请求超时的问题。

示例代码:
  1. XSSFWorkbook wb = new XSSFWorkbook()
  2. XSSFCellStyle headerStyle = genHeaderStyle(wb)
  3. sheets.each { sheet ->
  4.     def xssfSheet = wb.createSheet(sheet.name)
  5.     sheet.setXSSFSheet(xssfSheet)
  6.     sheet.setHeaderStyle(headerStyle)
  7. }
  8. File template = genTemplateFile(wb)
  9. ZipOutputStream zos = new ZipOutputStream(responseStream);
  10. ZipFile templateZip = new ZipFile(template);
  11. Enumeration<ZipEntry> templateEntries = templateZip.entries();
  12. try {
  13.   while (templateEntries.hasMoreElements()) {
  14.     // copy all template content to the ZipOutputStream zos
  15.     // except the sheet itself
  16.   }
  17.   zos.putNextEntry(new ZipEntry(sheetName)); // now the sheet
  18.   OutputStreamWriter sheetOut = new OutputStreamWriter(zos, "UTF-8");
  19.   try {
  20.     sheetOut.write("<?xml version="1.0" encoding="UTF-8"?>");
  21.     sheetOut.write("<worksheet><sheetData>");
  22.     // write the content – rows and cells
  23.     sheetOut.write("</sheetData></worksheet>");
  24.   } finally { sheetOut.close(); }
  25. } finally { zos.close(); }
复制代码

其中,template包含了一些索引信息,比如建了哪些样式、几个sheet等,这些信息是放到ZIP文件的最前面的,最后才是sheet内容数据。
您需要登录后才可以回帖 登录 | 立即注册

本版积分规则

关闭

站长推荐上一条 /1 下一条

客服中心
关闭
在线时间:
周一~周五
8:30-17:30
QQ群:
653541906
联系电话:
010-85786021-8017
在线咨询
客服中心

意见反馈|网站地图|手机版|小黑屋|EPS数据狗论坛 ( 京ICP备09019565号-3 )   

Powered by BFIT! X3.4

© 2008-2028 BFIT Inc.

快速回复 返回顶部 返回列表