Two broad trends are driving the case for implementing Data Lakes in organizations: the popularity of cloud-based storage, which is driving down the cost of storing petabytes of data, and the growth of machine-generated unstructured data from mobile phones, devices and IoT, which can be combined with structured data to yield powerful business insights.
Most organizations do not have a comprehensive data strategy; they use data in isolation and miss the big picture. A Data Lake fills this gap by providing a data strategy that helps every group within the organization gain new insights into operations and opportunities.
So what exactly is a Data Lake? It is a single source of truth comprising vast amounts of structured and unstructured data from a variety of sources, including raw copies of source data and transformed data which can be used for reporting, visualization and analytics.
A Data Lake ingests data very quickly and prepares it for use. A key activity is organizing the data arriving from different sources; without that, the lake quickly becomes a Data Swamp.
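One common way to keep data from different sources organized is a consistent naming layout that separates raw copies from transformed data. The following is only a minimal sketch: the zone names (`raw`, `transformed`), the `object_key` helper and the date-partitioned path layout are illustrative assumptions, not part of any AWS API or of a specific implementation.

```python
from datetime import date

# Hypothetical zone names: raw copies of source data vs. transformed data.
ZONES = ("raw", "transformed")


def object_key(zone: str, source: str, dataset: str, day: date, filename: str) -> str:
    """Build a storage key that groups objects by zone, source, dataset and date."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return (
        f"{zone}/{source}/{dataset}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Example: a raw file from a hypothetical "crm" source system.
key = object_key("raw", "crm", "orders", date(2024, 3, 5), "part-0001.json")
print(key)
```

With a layout like this, every object's location states which source it came from and whether it is a raw copy or a transformed version, which makes the data easier to catalogue later.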
Data Lakes are rising in popularity because the cloud provides a cost-effective way of storing and cataloguing data. AWS Cloud provides the scale and a supporting ecosystem of open-source tools, offered as managed services, for building a Data Lake.
In contrast, storing petabytes of data in a database or warehouse is expensive, as you need to factor in the cost of licences, storage and the corresponding compute. There is also a constraint of time and effort: data that goes into a database or warehouse must be cleansed and prepared before it is stored, which is not feasible for very large volumes of unstructured data, especially with no guarantee that all of it will be used. Because unstructured data spans anything from social media to machine data such as logs and sensor data, deciding how much of it to process requires careful consideration.
Data Lake Architecture in AWS Cloud
Use Cases for Data Lake
Umbrella has helped many organizations reap the benefits of a comprehensive data strategy by implementing Data Lakes with robust extraction, loading and transformation methodologies. If you would like to know more about a data strategy or want to implement a Data Lake, reach out at firstname.lastname@example.org or call 0956 070 0360.