Back in December, Pragmatic Works BI Consultant Alan Faulkner gave a presentation titled Intro to Big Data Analytics. In this session he introduced the Hadoop-based services in Windows Azure and used Power BI features to build a self-service BI Big Data solution. Here we offer an outline and demo to accompany his presentation.
Provisioning an HDInsight Cluster
HDInsight offers several ways to provision an HDInsight cluster. In this demo we will provision an HDInsight cluster using the Windows Azure Management portal.
We achieve this by:
- Creating an Azure storage account.
  - HDInsight uses Azure Blob storage for storing data. This storage layer is called WASB (Windows Azure Storage – Blob) and is Microsoft's implementation of HDFS on top of Blob storage.
  - When you specify an Azure storage account while provisioning a cluster, a specific Blob container from that account is used as the default file system, just like in HDFS.
  - The HDInsight cluster is by default provisioned in the same data center as the storage account you specify.
  - To simplify this demo, only the default Blob container and the default storage account are used.
- Provisioning the HDInsight cluster. HDInsight makes Apache Hadoop available as a service in the cloud. Hadoop offers a distributed platform to store and manage large volumes of unstructured data, commonly called Big Data.
  - The creation of the HDInsight cluster goes through five statuses, including:
    - Cluster Storage Provisioned
    - Windows Azure VM Configuration
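To make the default file system idea a little more concrete, here is a minimal Python sketch of how a file in a Blob container is typically addressed from the cluster. It assumes the usual wasb://container@account.blob.core.windows.net/path form; the helper name wasb_uri and the container and account names are made up for illustration and are not part of the demo.

```python
def wasb_uri(container: str, account: str, path: str = "") -> str:
    """Build a wasb:// URI for a blob in the given container and storage account.

    The container and account values are whatever you chose when creating
    the storage account; nothing here talks to Azure.
    """
    if not container or not account:
        raise ValueError("container and account are required")
    # Drop leading slashes so the URI has exactly one slash after the host.
    clean_path = path.lstrip("/")
    return f"wasb://{container}@{account}.blob.core.windows.net/{clean_path}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(wasb_uri("democontainer", "demostorage", "/example/data/sample.csv"))
```

Because the demo uses only the default container and the default storage account, paths on the cluster can usually be written without the full URI, but the long form is handy when a job needs to be explicit about where its data lives.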
Once the HDInsight cluster is provisioned, the next step is to run a Hive job that queries a sample Hive table. The table, hivesampletable, comes with HDInsight clusters and contains data on mobile device manufacturers, platforms, and models. We will query this table to retrieve data for mobile devices from a specific manufacturer.
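As a rough illustration of that query, the Python sketch below only builds the HiveQL text; it does not submit anything to a cluster. It assumes the sample table is the one HDInsight ships as hivesampletable and that it has deviceplatform, devicemake, and devicemodel columns. The helper name build_device_query and the LIMIT of 100 are our own choices, not something taken from the presentation.

```python
def build_device_query(manufacturer: str, table: str = "hivesampletable") -> str:
    """Return a HiveQL statement selecting devices made by one manufacturer."""
    # Only allow a plain identifier as the table name (assumed naming rule).
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    # Escape backslashes and single quotes so the value stays inside the literal.
    safe = manufacturer.replace("\\", "\\\\").replace("'", "\\'")
    return (
        "SELECT deviceplatform, devicemake, devicemodel "
        f"FROM {table} "
        f"WHERE devicemake = '{safe}' "
        "LIMIT 100;"
    )


if __name__ == "__main__":
    # "HTC" is just an example manufacturer value.
    print(build_device_query("HTC"))
```

The resulting statement can be pasted into whichever Hive query tool you use against the cluster.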
Big Data Analytics
- We use HDInsight to create Hive tables out of raw data coming from sensors, then analyze this data using Excel and Power View.
- The solution involves:
  - Sensor data from heating, ventilation, and air conditioning (HVAC) systems is loaded into Blob storage as comma-separated values (CSV) files.
  - Hive queries are used to expose the data in the CSV files as Hive tables. Additional tables are created by enriching this data.
  - Excel connects to HDInsight using the Hive ODBC Driver and visualizes the data using Power View.
- We also demonstrate how to use the data imported into Excel to perform more dynamic analysis using Power Pivot.
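To give a feel for what "enriching" the sensor data can mean, here is a small local Python sketch. In the demo this step is done with Hive queries on the cluster; the code below is only a stand-in for the logic. The column names (building_id, target_temp, actual_temp), the sample rows, and the 3-degree tolerance are all hypothetical and chosen purely for illustration.

```python
import csv
import io

# Hypothetical sensor readings; the column names are assumptions.
SAMPLE_CSV = """building_id,target_temp,actual_temp
1,70,75
2,68,62
3,72,73
"""


def load_readings(text):
    """Parse CSV text into a list of dicts, one per sensor reading."""
    return list(csv.DictReader(io.StringIO(text)))


def enrich(rows, tolerance=3):
    """Add a temp_diff column and a HOT/COLD/NORMAL label to each reading."""
    enriched = []
    for row in rows:
        diff = int(row["actual_temp"]) - int(row["target_temp"])
        if diff > tolerance:
            label = "HOT"
        elif diff < -tolerance:
            label = "COLD"
        else:
            label = "NORMAL"
        enriched.append({**row, "temp_diff": diff, "temp_range": label})
    return enriched


if __name__ == "__main__":
    for reading in enrich(load_readings(SAMPLE_CSV)):
        print(reading)
```

A derived table like this is the kind of thing that is convenient to pull into Excel, since the labels make it easy to slice and chart the readings in Power View.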
Alan is an IT professional with more than 20 years of progressive technical experience in management, consulting, data integration architecture, data warehousing, and business intelligence development projects. His work focuses on collaborating closely with stakeholders to develop architecture frameworks that align with organizational strategy, processes, and IT assets. He specializes in Business Intelligence and Data Warehousing database architecture.
Click here to view Alan's blog.
Interested in seeing more demos? Let us know in the comments section below what you would like to see!