Azure Every Day has been working on a mini-series all about Azure Databricks. Today’s installment starts with the basics: what is Databricks, what does it do, and who uses it and why?
What is Azure Databricks?
- A Platform as a Service (PaaS) that provides a unified data analysis system to organizations.
- Cloud-based big data solution used for processing and transforming massive quantities of data.
- It has built-in machine learning capabilities and a shared workspace that supports collaboration between data engineers, data analysts and data scientists.
What does Databricks do?
- With a Databricks cluster, you can streamline workflows and collaborate in an interactive workspace.
- Creates a unified data analysis system that provides a single source of truth for reporting in organizations.
- Basic Databricks architecture:
  - It begins with ingesting the data from apps or devices.
  - Then, it stores structured and unstructured data, in a blob or Data Lake for instance.
  - The data is prepped, and models are trained, using the Databricks engine, which processes, transforms, and cleanses the data to create that single version of the truth.
  - Lastly, it models and serves that data to end users through BI reports and dashboards.
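The four stages above can be sketched in plain Python. This is only an illustration of the data flow, not Databricks code: it does not use Spark or any Databricks API, and the sample events, the `lake` dictionary, and the function names `ingest`, `store`, `prepare`, and `serve` are all hypothetical stand-ins.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw events, standing in for data ingested from apps or devices.
RAW_EVENTS = [
    {"device_id": "sensor-1", "temperature": "21.5"},
    {"device_id": "sensor-1", "temperature": "22.5"},
    {"device_id": "sensor-2", "temperature": None},      # missing reading
    {"device_id": "sensor-2", "temperature": "19.0"},
    {"device_id": " Sensor-2 ", "temperature": "21.0"},  # inconsistent id formatting
]


def ingest(source):
    """Stage 1: pull records from an app or device feed (here, an in-memory list)."""
    return list(source)


def store(records, lake):
    """Stage 2: land the raw records unchanged (a dict stands in for a blob or Data Lake)."""
    lake["raw"] = records
    return lake


def prepare(lake):
    """Stage 3: cleanse and transform the raw records into a curated layer."""
    curated = []
    for rec in lake["raw"]:
        if rec["temperature"] is None:
            continue  # drop records with no reading
        curated.append({
            "device_id": rec["device_id"].strip().lower(),  # normalize ids
            "temperature": float(rec["temperature"]),       # cast text to a number
        })
    lake["curated"] = curated
    return lake


def serve(lake):
    """Stage 4: aggregate the curated layer for a report (average per device)."""
    by_device = defaultdict(list)
    for rec in lake["curated"]:
        by_device[rec["device_id"]].append(rec["temperature"])
    return {device: mean(values) for device, values in by_device.items()}


lake = {}
store(ingest(RAW_EVENTS), lake)
prepare(lake)
report = serve(lake)
print(report)
```

Keeping the raw records untouched and writing the cleansed records to a separate layer mirrors the idea of one curated source that every report reads from. In a real workspace, each stage would typically be a distributed job over cloud storage rather than an in-memory function.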
Who uses Databricks and why?
- Databricks is for organizations that want to use their data to gain insights. Sounds like all organizations, right?
- If you’ve worked with data, you know that most organizations spend most of their time managing and processing the data, and less time getting insights from it.
- Here’s a better definition: Databricks is for organizations that want to maximize insights rather than focusing on the infrastructure, storage, and prepping of the data.
Below are examples of how some top companies have benefited from Databricks.
- Showtime has access to massive amounts of subscriber data. They wanted to leverage this data to drive content strategy, but their legacy systems required tons of time to manage and process the data.
- Their machine learning lifecycle was a manual process prone to human error. With a Databricks solution, their ETL process completed in 4 hours instead of 24, 6x faster!
- Now decisions are made much faster, and better collaboration lets them tap into insights and reach their goal of a better, more personalized experience for subscribers.
- Nationwide realized their legacy batch analysis process was slow and inaccurate, and that it suffered from infrastructure challenges such as cross-team silos and individual workstations analyzing limited datasets.
- A Databricks solution provided 9x faster data pipelines and a 50% time reduction in the ML lifecycle. Nationwide has also seen remarkable improvements in data processing speeds and can now quickly train accurate models for their use cases.
- It’s crucial for Shell to keep machines running and have a supply of spare parts for oil rigs and other machinery without overstocking.
- Their existing processes and legacy systems made maintaining that inventory a challenge.
- A Databricks solution reduced the time required to run the inventory analysis and prediction model from 48 hours to 45 minutes, a reported 32x performance gain (the run times alone work out to about 64x)!
- Conde Nast is a digital conglomerate of over 20 brands including Vanity Fair, Bon Appetit, and The New Yorker.
- Their websites get over 100 million visits and over 800 million page views a month. They wanted to use machine learning to provide personalized content recommendations and targeted ads.
- A Databricks solution allowed them to scale up to collect over 1 trillion data points per month, and to innovate and deploy more models into production.
- Conde Nast saw a 60% reduction in ETL time and a 50% reduction in IT operational costs.
This Databricks 101 has shown you what Azure Databricks is and what it can do. Keep up with Azure Every Day for upcoming posts that dig deeper into Databricks.
If you’re interested in learning more about Databricks and how to leverage it in your organization, we can help. Contact us and let our team of MVPs and industry experts create a custom solution for your business.