Templates, low footprint mode and improved Spark integration in the HadoopOffice library for reading/writing Excel files on Big Data platforms

Although it may look like only a minor release, version 1.0.4 of the HadoopOffice library adds a lot of new features for reading/writing Excel files:

  • Templates, so you can define complex documents with diagrams or other features in MS Excel and fill them with data or formulas from your Big Data platform in Hadoop, Spark & Co
  • Low footprint mode: this mode leverages the Apache POI event and streaming APIs. It significantly reduces CPU and memory consumption at the expense of certain features (e.g. evaluation of formulas, which is only supported in standard mode). This mode supports reading old MS Excel (.xls) and new MS Excel (.xlsx) documents and writing new MS Excel (.xlsx) documents
  • New features in the Spark 2 datasource:
    • Inference of the DataFrame schema, consisting of simple Spark SQL DataTypes (Boolean, Date, Byte, Short, Integer, Long, Decimal, String), based on the data in the Excel file
    • Improved writing of a DataFrame based on a schema with simple Spark SQL DataTypes
    • Interpreting the first row of an Excel file as column names for the DataFrame for reading (“header”)
    • Writing column names of a DataFrame as the first row of an Excel file (“header”)
    • Support for Spark 2.0.1, 2.1, 2.2
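
To give a feeling for what header handling and schema inference mean in practice, here is a minimal, self-contained Python sketch. It is not the library's implementation: the function names (`infer_type`, `infer_schema`), the default column names (`c0`, `c1`, …) and the assumed date format (`YYYY-MM-DD`) are all illustrative assumptions. It only shows the general idea of picking the narrowest simple type that fits every cell of a column and falling back to String otherwise.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Integer types from narrowest to widest, with their signed value ranges.
_INT_RANGES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]

# Order used to widen a numeric column when its cells disagree.
_NUMERIC_ORDER = ["byte", "short", "integer", "long", "decimal"]


def infer_type(cell: str) -> str:
    """Return the narrowest simple type name that can hold one cell value."""
    text = cell.strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        value = int(text)
        for name, low, high in _INT_RANGES:
            if low <= value <= high:
                return name
        return "decimal"  # whole number too large for a long
    except ValueError:
        pass
    try:
        if Decimal(text).is_finite():
            return "decimal"
    except InvalidOperation:
        pass
    try:
        datetime.strptime(text, "%Y-%m-%d")  # assumed date format
        return "date"
    except ValueError:
        return "string"


def merge_types(a: str, b: str) -> str:
    """Combine two cell types into one column type."""
    if a == b:
        return a
    if a in _NUMERIC_ORDER and b in _NUMERIC_ORDER:
        return max(a, b, key=_NUMERIC_ORDER.index)  # widen numerics
    return "string"  # incompatible types fall back to string


def infer_schema(rows, header=True):
    """Infer (column name, type) pairs; assumes all rows have equal length."""
    if header:
        names, data = rows[0], rows[1:]
    else:
        # Hypothetical default column names when there is no header row.
        names, data = [f"c{i}" for i in range(len(rows[0]))], rows
    types = [None] * len(names)
    for row in data:
        for i, cell in enumerate(row):
            t = infer_type(cell)
            types[i] = t if types[i] is None else merge_types(types[i], t)
    return [(n, t or "string") for n, t in zip(names, types)]


rows = [
    ["id", "price", "active", "shipped"],
    ["1", "9.99", "true", "2017-10-01"],
    ["300", "12", "false", "2017-10-02"],
]
print(infer_schema(rows))
```

With `header=True` the first row supplies the column names and is excluded from type inference; with `header=False` every row is treated as data.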

 

Of course, the existing features are still available, such as metadata reading/writing, encryption/decryption, linked workbooks, support for Hadoop MapReduce, support for Spark 2 datasources and support for Spark 1.

 

What is next?

  • Support for Apache Flink for reading/writing Excel files
  • Support for Apache Hive (Hive SerDe) for reading/writing Excel files
  • Support for digitally signing/verifying signature(s) of Excel files
  • Support for reading MS Access files
  • … many more