Spark+Scala+Graphx: Analyzing the Bitcoin Transaction Graph

The hadoopcryptoledger library provides now an example how you can generate a Bitcoin Transaction Graph using the Big Data graph analysis technologies Spark+Scala+Graphx. Basically it demonstrates how to read the Bitcoin Blockchain from HDFS, transform it into a graph with Bitcoin addresses as vertices and transactions between them as edges. The example returns the 5 top bitcoin addresses having the most input transactions. This could indicate that they belong to Mixing services that try to obfuscate transactions between two addresses. The graph exemplified in the following figure showing four vertices with transactions between them:

transactiongraph

Of course this is just one example. You can think about numerous of other analysis related to this graph using algorithms such as strongly connected components or PageRank. Particularly if you connect it with other data that you collect related to the blockchain. You can also use this graph to do visual analytics on it.

In the coming weeks, further extensions are planned to be published:

  • Some common analytics pattern to analyze the Bitcoin economy

  • Some technical patterns, such as Bitcoin block validation

  • A flume source for receiving new Bitcoin blocks including Economic and technical consensus (storing and accessing it in the Hadoop ecosystem, e.g. in Hbase)

  • Adding support for more crypto ledgers, such as Ethereum