Creating a Big Data lab in the Cloud using Amazon EMR

This first blog post is about creating your own Big Data lab in the Cloud using Amazon EMR. Follow my instructions here.

These instructions allow you within 15 minutes the following:

  • You can use the analytics language R in a browser to access the full functionality of Hadoop/Spark, Hive/Shark (data warehouse), Rhipe (MapReduce for R), RMR (Map Reduce for R)
  • Leverage the unlimited data and computing power of the Amazon Elastic Map Reduce cloud
  • Create reports about your analytics results that you can distribute in any format
  • Data Scientists simply use their browser to work with the data
  • They can come up with new models based on your data in the organization to enhance your business processes and applications
    • Improved personalized advertisement
    • Improved sales targeting
    • Predictive Maintenance for your assets
    • User preference learning
    • Gamification
    • Resilience: Detect disasters in your software systems before they happen