The Business Case for Implementing Data Lake

December 20, 2018 / Anoop Prasad

Two broad trends are driving the case for implementing Data Lakes in organizations: popularity of Cloud based storage which is driving down cost of storing petabytes of data; and growth of machine-generated unstructured data from mobiles, devices, IoT which can be combined with structured data to get powerful business insights.

Most organizations do not have a comprehensive data strategy, using data in isolation and missing out on the big picture. Data Lake fills this gap by providing a data strategy which benefits all groups within the organization to get new insights into operations and opportunities.

So what exactly is a Data Lake? It is a single source of truth comprising vast amounts of structured and unstructured data from a variety of sources, including raw copies of source data and transformed data which can be used for reporting, visualization and analytics.

Data Lakes ingests data very quickly and prepares it for usage—a key activity being to organize data from different sources, otherwise it quickly becomes a Data Swamp.

Setting up Data Lake Has Become Easy

The rising popularity of Data Lake is because Cloud provides a cost-effective way of storing and cataloguing data. AWS Cloud provides the scale and a supporting ecosystem of open-source tools as managed service to build Data Lake.

S3: Customers can achieve huge savings by separating storage and compute. For example, you can store large data sets in the highly scalable S3 at minimal cost, and run queries using AWS Athena paying only for queries you run.
Athena: Athena queries both structured and unstructured data in S3 and being serverless, you can start analysing data almost immediately after defining the schema and visualize output using Amazon QuickSight.
AWS Glue: If you need to extract and transform data, you can use Amazon Glue—a fully managed serverless service which automatically discovers properties in data via Catalog engine and generate ETL code to transform source code into target schemas.
AWS EMR: Open sourced tools such as Hadoop, Hive, Spark have become easily available as managed services on AWS EMR clusters to run big data workloads.
AWS Redshift: You can extend your data warehouse to the Data Lake and get powerful insights which you cannot by querying independent silos. Using Spectrum, a feature in Redshift you can directly query open data formats stored in S3 without moving data.

In contrast storing petabytes of data using a database or warehouse is expensive as you need to factor in cost of licence, storage and equivalent compute cost. Also there is the constraint of time and effort as data that goes into must be cleansed and prepared before storing, which is not feasible with tons and tons of unstructured data—without guarantee that all data will be used. Considering that unstructured data spans anything from social media to machine data such as logs and sensor data, processing all data requires careful consideration.

Data Lake Architecture in AWS Cloud

Use Cases for Data Lake

Reports and Dashboard: Reports and dashboards on operations, customer preferences, etc become more powerful with insights from integrated data across channels and multiple engagement points.
Advanced Analytics such as recommendation Engine: Optimize search and provide more meaningful recommendation to users by analyzing customer queries and tracking preferences across all touch points.
Ad-hoc queries for management reports: Supplement decision making and hypothesis with quick queries into comprehensive pool of data.
Large report generation and batch processing: Create extensive, detailed reports continuously to achieve operational excellence, thwart threats, take advantage of opportunities in real-time.
Pattern finding using ML algorithms: Spot trends and make predictions with large, heterogenous data sets using machine learning.

Umbrella has helped many organizations reap the benefits of a comprehensive data strategy by implementing Data Lake with robust extraction, loading and transformation methodologies. If you are interested in knowing more about a data strategy or want to implement a Data Lake, reach out to us.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

The Business Case for Implementing Data Lake

Setting up Data Lake Has Become Easy

Data Lake Architecture in AWS Cloud

Use Cases for Data Lake

6 things to consider when you Migrate your Data Warehouse to Amazon Redshift

5 Challenges with Big Data and How AWS Can Help Handle Them

How to Best Secure Data in AWS S3

Let's Explore Your Cloud Requirements

Subscribe

Cloud Services

Company

Contact

IN +91 11 430 78540

US +1 703-752-6293

sales@umbrellainfocare.com

The Business Case for Implementing Data Lake

Setting up Data Lake Has Become Easy

Data Lake Architecture in AWS Cloud

Use Cases for Data Lake

Continuous Compliance in Fast Paced Environment

The Imperative for DevSecOps

Related Posts

Let's Explore Your Cloud Requirements

Subscribe

Cloud Services

Company

Contact

Navigate a successful journey to AWS with Umbrella Infocare