You can now analyze the Bitcoin Blockchain using Hive and the hadoopcryptoledger library with the new HiveSerde plugin.
Basically you can link any data that you loaded in Hive with Bitcoin Blockchain data. For example, you can link Blockchain data with important events in history to determine what causes Bitcoin exchange rates to increase or decrease.
The site provides several examples on how to use SQL in Hive to do calculation upon Blockchain data, such as
- Number of blocks in blockhain
- Number of transactions in the blockchain
- Total sum of all outputs of all transactions in the output
Of course, you can calculate nearly anything you can imagine using the Bitcoin Blockchain data as input. Furthermore, you can link the data with other data.
Although accessing Bitcoin blockchain data is rather fast for analytics, you can optimize your analytics by extracting often used data from the blockchain and storing them in a format optimized for analytics, such as the columnar format ORC in Hive.
The following simple example shows how you can do this. I assume that the Bitcoin Blockchain data is represented as the table “BitcoinBlockchain” and you want to copy the hashsum of each Bitcoin block, the block size and the version number in the table “BlockAnalytics” optimized for analytics:
CREATE TABLE BlockAnalytics STORED AS ORC AS SELECT hashmerkleroot, blocksize, version FROM BitcoinBlockchain;
Of course you can access the tables in Hive with analytical and visual analytic tools, such as Tableau, Matlab, SAS, R, SAP Lumira, DS3.js etc.
In the coming weeks, further extensions are planned to be published:
Some common analytics pattern to analyze the Bitcoin economy (e.g. similar to the ones shown on https://blockchain.info/)
Some technical patterns, such as Bitcoin block validation
A flume source for receiving new Bitcoin blocks including Economic and technical consensus (storing and accessing it in the Hadoop ecosystem, e.g. in Hbase)
Adding support for more crypto ledgers, such as Ethereum