HadoopOffice – A Vision for the coming Years

HadoopOffice has been available for more than a year (first commit: 16.10.2016). It currently supports Excel formats based on the Apache POI parsers/writers. In the meantime, a lot of functionality has been added, such as:

  • Support for .xlsx and .xls formats – reading and writing
  • Encryption/Decryption Support
  • Support for Hadoop mapred.* and mapreduce.* APIs
  • Support for Spark 1.x (via mapreduce.*) and Spark 2.x (via data source APIs)
  • Low footprint mode to use less CPU and memory resources to parse and write Excel documents
  • Template support – add complex diagrams and other functionality to your Excel documents without coding

In 2018 and the coming years we want to go beyond this functionality:

  • Add further security functionality: Signing and verification of signatures of new Excel files (in XML format via XML signature) / Store credentials for encryption, decryption, signing in keystores
  • Apache Hive Support
  • Apache Flink Support
  • Add support for reading/writing Access databases based on the Jackcess library, including encryption/decryption support
  • Add support for dBase formats
  • Develop a new spreadsheet format suitable for the Big Data world: There is currently a significant gap in the Big Data world. There are formats optimized for data exchange, such as Apache Avro, and for large-scale analytics queries, such as Apache ORC or Apache Parquet. These formats have proven very suitable in the Big Data world. However, they only store data, not formulas. This means that even simple calculations have to be done in dedicated ETL/batch processes that vary between clusters and software instances. This makes it very hard to exchange data, to determine how data was calculated, to compare calculations or to flexibly recalculate data, which is one of the key advantages of spreadsheet formats such as Excel. However, Excel is not designed for Big Data processing. Hence, the goal is to find a spreadsheet format that is suitable for Big Data processing and as flexible as Excel/LibreOffice Calc. Finally, a streaming spreadsheet format should be supported.


HadoopOffice aims to support legacy office formats (Excel, Access etc.) in a secure manner on Big Data platforms, while also paving the way for a new spreadsheet format suitable for the Big Data world.


Enabling WebRTC in modern Java Enterprise Web Applications

I recently started a small project to create a sample enterprise Big Data web application using Spring.

You can find the source code here and a demonstration here.

One feature of this application is WebRTC. I have been working with WebRTC since its introduction around 2011/2012. It has since become a W3C standard and has been implemented in nearly all popular browsers, such as Mozilla Firefox, Google Chrome or Opera. Basically, it offers you secure video/voice chat, screen sharing and peer-to-peer data exchange for your browser. If you want to have a simple online demonstration of WebRTC in general then you can try it out here.

All major browsers support WebRTC on mobile devices as well as on desktop computers. Gateway software exists to connect a WebRTC client to SIP and thus to the “standard” phone network. STUN and TURN servers help you to establish connections across firewalls.

You do not need any additional plugins in your browser to enable all of this. You can compare the functionality with Skype, except that it is available in web applications without plugins. Hence, it also works on smartphones and tablets, where you usually cannot install plugins for your browser.

WebRTC in Enterprise Applications

Communication between people is certainly an important aspect of enterprise web applications. Hence, the WebRTC standard is interesting and relevant for them. Although WebRTC is at its core a peer-to-peer solution, the developer of an enterprise solution needs to provide a “signaling channel”. This channel ensures that the people participating in a WebRTC exchange, such as a video/voice chat, find each other and that their browsers exchange information on how they can connect to each other, either directly or via a gateway.

Basically, this signaling channel needs to transmit JSON objects:

  1. Between all users in a conversation so they can contact each other directly
  2. Between two users so they can have a secure connection to each other.

It should be noted that point 2) is also needed in a group chat, because a peer-to-peer connection is always established between two users. This means in a group chat consisting of three users, “user 1” has a peer-to-peer connection to “user 2” and another one to “user 3”. Additionally there is one between “user 2” and “user 3”. This is illustrated in the following figure.


The signaling channel does not transmit any video/voice or other data. It is only used to establish and maintain the direct connection between two peers.
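To make the pairwise structure of a group chat concrete, here is a minimal sketch in plain Java that enumerates the peer-to-peer connections needed when every user connects to every other user. The class and method names are purely illustrative and are not taken from the example application.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper (hypothetical name): pairwise connections in a group chat. */
class MeshConnections {

    /** Number of peer-to-peer connections among n users: n * (n - 1) / 2. */
    static int pairCount(int users) {
        if (users < 0) {
            throw new IllegalArgumentException("users must be >= 0");
        }
        return users * (users - 1) / 2;
    }

    /** Lists every unordered pair of users that needs its own connection. */
    static List<String> pairs(List<String> users) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < users.size(); i++) {
            for (int j = i + 1; j < users.size(); j++) {
                result.add(users.get(i) + " <-> " + users.get(j));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> users = List.of("user 1", "user 2", "user 3");
        // Prints the three pairs from the group chat example.
        pairs(users).forEach(System.out::println);
        System.out.println(pairCount(users.size()) + " connections in total");
    }
}
```

Because the number of connections grows quadratically with the number of participants, each additional user adds one signaling exchange per existing participant.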

Implementing a WebRTC signaling channel in a Web Enterprise Application

Implementing a signaling channel for an enterprise application needs to take into account secure, scalable and reliable message delivery via message-oriented middleware that does not impose any additional plugins on the web browser. Basically, you can implement such a channel as follows:

  1. The web application sends signaling messages to the backend using the WebSocket protocol or fallbacks for older browsers (SockJS).
  2. The Streaming Text Oriented Messaging Protocol (STOMP) is used to send signaling messages to a topic and to the private queues of the users within a message-oriented middleware connected to the web application backend. This ensures that messages are delivered properly.
  3. The backend is connected to a message-oriented middleware, such as RabbitMQ or JBoss HornetQ, or to any JMS-capable middleware via the Kaazing WebSocket Gateway. This can be configured in a flexible manner in the example application, because we use the Spring Messaging interface.
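The following sketch shows what a signaling message wrapped in a STOMP SEND frame could look like on the wire, using only the Java standard library. It is not the code of the example application, which relies on Spring Messaging. The destination prefixes `/topic/room.` and `/queue/user.` as well as the JSON payload are assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative helper (hypothetical name): builds STOMP SEND frames for signaling. */
class SignalingFrames {

    // Assumed destination naming; the real application may use different names.
    static final String ROOM_TOPIC_PREFIX = "/topic/room.";
    static final String USER_QUEUE_PREFIX = "/queue/user.";

    /** Destination shared by all users in a conversation. */
    static String roomDestination(String roomId) {
        return ROOM_TOPIC_PREFIX + roomId;
    }

    /** Private destination for messages between two users. */
    static String userDestination(String userId) {
        return USER_QUEUE_PREFIX + userId;
    }

    /**
     * Builds a STOMP SEND frame: command line, headers, blank line, body, NUL terminator.
     * Header values are not escaped here, so ids must not contain ':' or line breaks.
     */
    static String sendFrame(String destination, String jsonBody) {
        int length = jsonBody.getBytes(StandardCharsets.UTF_8).length;
        return "SEND\n"
                + "destination:" + destination + "\n"
                + "content-type:application/json\n"
                + "content-length:" + length + "\n"
                + "\n"
                + jsonBody
                + "\0";
    }

    public static void main(String[] args) {
        // Hypothetical payload: announce presence to everyone in the room.
        String join = sendFrame(roomDestination("42"), "{\"type\":\"join\",\"from\":\"alice\"}");
        // Hypothetical payload: send a session offer to a single user.
        String offer = sendFrame(userDestination("bob"), "{\"type\":\"offer\",\"from\":\"alice\"}");
        System.out.println(join.replace("\0", "^@"));
        System.out.println(offer.replace("\0", "^@"));
    }
}
```

The split into one shared topic and one private queue per user mirrors the two kinds of JSON exchange the signaling channel has to support: messages to all users in a conversation and messages between exactly two users.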

These technologies have been integrated into the example enterprise web application.

WebRTC: Next Generation Communication

WebRTC has other exciting use cases, such as E-Learning, E-Health, Sales Support, Customer-Relationship Management (CRM) and CoBrowsing, and it could become the default protocol for the Internet of Things to link people and things. Its adoption keeps growing: a lot of startups have emerged recently, and big companies are starting to support WebRTC in their communication software.