Ethereum & Analytics: Explore the blockchain using Hadoop, Hive, Flink and Spark

HadoopCryptoLedger release 1.1.0 added support for another well-known cryptocurrency: Ethereum and its Altcoins. Of course similar to its Bitcoin & Altcoin support you can use the library with many different frameworks in the Hadoop ecosystem:

Furthermore, you can use it with various other frameworks due to its implementation as a FileInputFormat for Big Data platforms.

Analysing blockchains, such as Ethereum or Bitcoin, using Big Data platforms brings you several advantages. You can analyse its liquidity, how many people are using it or if it is a valid blockchain at all. Furthermore, you can cross-integrate any other data source, such as currency rates, news or events from Internet of Things platforms (e.g. payments using hardware wallets). Governments can integrate it with various other sources to investigate criminal activity.

All and all it enables transparency for everyone.

Advertisements

Big Data Analytics on Bitcoin‘s first Altcoin: NameCoin

This blog post is about analyzing the Namecoin Blockchain using different Big Data technologies based on the HadoopCryptoLedger library. Currently, this library enables you to analyze the Bitcoin blockchain and Altcoins based on Bitcoin (incl. segregated witness), such as Namecoin, Litecoin, Zcash etc., on Big Data platforms, such as Hadoop, Hive, Flink and Spark. A support for Ethereum is planned for the near future.

However, back to Namecoin and why it is interesting:

  • It was one of the first Altcoin based on Bitcoin
  • It supports a decentralized domain name and identity system based on blockchain technology – no central actor can censor it
  • It is one of the first systems that supports merged mining/AuxPOW. Merged mining enables normal Bitcoin mining pools to mine Altcoins (such as Namecoin) without any additional effort while mining normal Bitcoins. Hence, they can make more revenue and at the same time they support Altcoins, which would be without Bitcoin mining pool support much weaker or not existing. It can be expected that many Altcoins based on Bitcoin will switch to it eventually.

HadoopCryptoLedger supported already from the beginning Altcoins based on Bitcoin. Usually it is expected that Big Data applications based on the HadoopCryptoLedger library implement the analytics they are interested in. However, sometimes we add specific functionality to make it much easier, for instance we provide the Hive UDFs to make certain analysis easier in Hive or we provide certain utility functions for MapReduce, Flink and Spark to make some things which require more detailed know how more easily available to beginners in block chain technology.

We have done this also for Namecoin by providing functionality to:

  • determine the type of name operation (new, firstupdate, update)
  • get more information about available domains, subdomains, ip addresses, identities

With this analysis is rather easy, because Namecoin requires you to update the information at least every 35,999 blocks (roughly between 200 to 250 days) or information are considered as expired. You just have to take simply the most recent information for a given domain or identity – it contains a full update of all information and there is no need to merge it with information from previous transactions. However, sometimes additional information are provided in so-called delegate entries and in this case you need to combine information.

Finally, we provided additional support to read blockchains based on Bitcoin and the merged mining/AuxPOW specification. You can enable it with a single configuration option.
Of course, we provide examples, e.g. for Spark and Hive. Nevertheless all major Big Data platforms (including Flink) are supported.
Find here what we plan next – amongst others:

  • Support analytics on Ethereum
  • Run analytics based on HadoopCryptoLedger in the cloud (e.g. Amazon EMR and/or Microsoft Azure) and provide real time aggregations as a simple HTML page
  • … much more

Let us know via Github issues what you are interested in!

Leverage the Power of Apache Flink to analyze the Bitcoin Blockchain

The hadoopcryptoledger library has been enhanced with a datasource for Apache Flink. This means you can use the Big Data processing framework Apache Flink to analyze the Bitcoin Blockchain.

It also includes an example that counts the total number of transactions in the Bitcoin blockchain. Of course given the power of Apache Flink you can think about more complex analysis applications, such as:

  • Graph analysis on the Bitcoin transaction graph, e.g. to identify clusters or connected components to find out close interactions between Bitcoin addresses
  • Trace money flows through the Bitcoin network
  • Predict power of mining pools, difficulty of block processing, impact of changes on the Bitcoin protocol or rules
  • Join it with other data to make predictions on prices, criminal activity and economics

In the future, we want to work on the following things :

  • Support for other cryptoledgers, e.g. Ethereum
  • Provide examples for analyzing other currencies based on the Bitcoin Blockchain, such as Litecoin and Namecoin
  • A Flume data source to stream Bitcoin Blockchain data directly into your cluster
  • Support selected blockchains provided via the Hyperledger Framework