Big Data Lab in the Cloud with Hadoop+Spark+R+Python

This is an update of the second big data lab for the cloud. Similar to previous versions, this document described how you can create a Big Data Lab in the cloud on Amazon EMR.

Besides some major upgrades to the newest Amazon Hadoop AMI (3.6.0) Spark (1.3.0) and R, it includes now also the possibility to use Python in the browser. There, you have the same functionality as in R. This means you can use Hadoop M/R, Spark and SparkSQL in Python from the browser. Similar to R, Python has gained attention by data scientists.

You can find the newest version here.

In future blog posts, I will write how you can use some of the open datasets on the Internet in the Big Data lab.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s