HadoopCryptoLedger library: a vision for the coming years

The first commit of the HadoopCryptoLedger library was made on 26 March 2016. Since then a lot of new functionality has been added, such as support for major Big Data platforms including Hive, Flink and Spark. Furthermore, besides Bitcoin, support has been implemented for analysing Altcoins based on Bitcoin (e.g. Namecoin, Litecoin or Bitcoin Cash) as well as Ethereum (including its Altcoins).

Since the library integrates seamlessly with Big Data platforms you can join blockchain data with any other data you may have, such as currency exchange rates from various platforms.
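
To make the idea of such a join concrete, here is a minimal sketch in plain Python. It does not use the HadoopCryptoLedger API; the transaction records, the exchange rates and the function name `daily_volume_usd` are all made up for illustration. On a real Big Data platform the same join would typically be expressed in the platform's own query language or API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical, hand-made sample data (not real blockchain or market data).
# Each transaction: (transaction id, day, amount in BTC)
transactions = [
    ("tx1", date(2018, 1, 1), 0.5),
    ("tx2", date(2018, 1, 1), 1.5),
    ("tx3", date(2018, 1, 2), 2.0),
    ("tx4", date(2018, 1, 3), 1.0),  # no exchange rate available for this day
]

# Exchange rates: day -> USD per BTC (made-up values)
usd_per_btc = {
    date(2018, 1, 1): 100.0,
    date(2018, 1, 2): 200.0,
}


def daily_volume_usd(transactions, rates):
    """Inner-join transactions with rates on the day and sum the USD value per day."""
    volume = defaultdict(float)
    for _tx_id, day, amount_btc in transactions:
        rate = rates.get(day)
        if rate is None:
            continue  # inner join: skip transactions on days without a rate
        volume[day] += amount_btc * rate
    return dict(volume)


result = daily_volume_usd(transactions, usd_per_btc)
for day in sorted(result):
    print(day.isoformat(), result[day])
```

The sketch uses an inner join, so transactions on days without a known rate are dropped; whether that is acceptable depends on the analysis.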

Blockchain analytics is getting more and more attention from industry, policy makers and researchers. This is not surprising, because one of the key ideas behind blockchains is that they should be transparent for everyone, even for the ordinary citizen.

Given that background, I foresee two major directions for 2018 and the following years:

  • Streaming: Streaming has become a hot topic in Big Data. Virtually all Big Data platforms, such as Apache Flink or Apache Spark, are moving towards streaming as the default way to process both streaming and non-streaming data. The idea here is to stream blockchain data directly from blockchain networks, such as Bitcoin and Ethereum, into your Big Data platform for direct analysis. This would also enable further interesting analytics, such as how many bad blocks/transactions are spammed into the network, when forks happened, how many forks/subnetworks have been established, what percentage of nodes piggyback on the network (cf. merged mining for Bitcoin), and many other insights based on blockchain network metadata.

  • Business & Conceptual Aspects of Blockchain Analytics: Surprisingly, one finds very little research and investigation on the business and conceptual aspects of blockchain (cf. here), especially of analytics. Most publications describe only the technical concepts of implementing blockchain technology (see here). The idea here is to establish a basic framework: defining interesting metrics and how to calculate them efficiently, and finding interesting patterns using machine learning algorithms or deriving them by joining other datasets (e.g. currency exchange rates). Another aspect is the security and validity of analysis results. Of course, this theoretical/conceptual work needs to be validated with practical investigations using the HadoopCryptoLedger library.
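
As an illustration of one of the metrics mentioned above, the question of when forks happen, the following sketch detects forks in a sequence of block headers. It is a thought-through toy example under simple assumptions, not part of the HadoopCryptoLedger library: headers are reduced to hypothetical `(block_hash, parent_hash)` pairs with made-up hashes, and the function name `find_forks` is invented here. A fork shows up as a parent block with more than one child.

```python
from collections import defaultdict


def find_forks(headers):
    """Return parent hashes that have more than one child block.

    `headers` is an iterable of (block_hash, parent_hash) pairs, for example
    as they arrive from a stream. Two blocks sharing a parent indicate a fork.
    """
    children = defaultdict(set)
    for block_hash, parent_hash in headers:
        children[parent_hash].add(block_hash)
    return {
        parent: sorted(kids)
        for parent, kids in children.items()
        if len(kids) > 1
    }


# Hypothetical headers with shortened, made-up hashes.
sample_headers = [
    ("b1", "genesis"),
    ("b2", "b1"),
    ("b3a", "b2"),  # two competing blocks on top of b2
    ("b3b", "b2"),
    ("b4", "b3a"),
]

forks = find_forks(sample_headers)
print(forks)
```

In a streaming setting the `children` map would be kept as state and updated per incoming header, probably restricted to a window of recent blocks so that it does not grow without bound.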

Some other topics that support these two directions are:

  • Contract Analytics: Virtually all blockchain technologies allow more or less powerful definitions of contracts. The goal here is to find out 1) how one can express contracts formally and find flaws in their definition and 2) how to find evidence in the blockchain data that these flaws have actually been exploited/abused. Furthermore, this will also enable linking contract data with other datasets.

  • Cloud Deployment: We want to create a cloud deployment in Docker container format that is open to everyone, so that everyone can deploy the analytics chain, including the download of the blockchain data, within their preferred cloud solution. Of course, we would also use this to run more advanced integration tests of our analytics solution and to showcase some of the aforementioned business analytics concepts.

  • Quality Assurance: 2018 will also be characterized by improved quality assurance, with increasing unit test coverage as a key element. This also includes getting rid of legacy code, such as support for already outdated platform versions.

  • More Currencies: Although we already support a wide range of currencies by offering support for Bitcoin & Altcoins (Namecoin, Litecoin, Bitcoin Cash and many more) as well as Ethereum and Altcoins (Ethereum Classic etc.), there are further interesting blockchain concepts, based on payment networks / practical Byzantine fault tolerance, proof-of-burn and directed acyclic graphs, that are worth analysing.

  • New research: QuantumChains (not to be confused with Quantum Money). QuantumChains is a rather new concept that explores quantum computing for representing blockchains. The advantage would be not only that they may get rid of some current issues with blockchains (proof of work, instant payment, large storage needs), but also that they make blockchains easier and faster to analyze for anyone, not only the biggest players with all the computing power. The "how" may not be answered in 2018, but we hope to have some interesting conceptual Gedankenexperimente (thought experiments) on how this could really work.

This is a pretty ambitious agenda for 2018, and it should be understood that parts of it will be explored further in the coming years.

Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. As with its Bitcoin & Altcoin support, you can use the library with many different frameworks in the Hadoop ecosystem, such as Hive, Flink and Spark.

Furthermore, you can use it with various other frameworks due to its implementation as a FileInputFormat for Big Data platforms.

Analysing blockchains, such as Ethereum or Bitcoin, using Big Data platforms brings you several advantages. You can analyse a blockchain's liquidity, how many people are using it or whether it is a valid blockchain at all. Furthermore, you can cross-integrate any other data source, such as currency rates, news or events from Internet of Things platforms (e.g. payments using hardware wallets). Governments can integrate it with various other sources to investigate criminal activity.

All in all, it enables transparency for everyone.