Data lakes: how they work and where they are used

Industrial Automation Experts
Aug 18, 2021

A data lake is a large repository of raw data that can hold structured, semi-structured, and unstructured data alike. It combines data from a variety of unrelated sources.

Data lakes are mostly used by large companies. This is because a data lake makes it possible to identify patterns in information scattered across unrelated divisions of the company, patterns that are almost impossible to find in any other way.

- Advantages

Despite their seemingly chaotic structure, data lakes provide companies with great opportunities for development. The high level of data centralization and granularity makes it possible to literally “find a needle in a haystack”. Compared to conventional data warehouses, data lakes can be more cost-effective because data is stored as it arrives, without preliminary processing; structure is applied only when the data is read.
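To make the "no preliminary processing" point concrete, here is a minimal sketch of applying structure only at read time. The records, field names, and the `read_with_schema` helper are all hypothetical and chosen purely for illustration; a real data lake would read from files or object storage rather than from a list in memory.

```python
import json

# Hypothetical raw records, kept exactly as they arrived (no upfront schema).
RAW_RECORDS = [
    '{"source": "crm", "customer": "C-17", "note": "asked about delivery"}',
    '{"source": "sensor", "device": "pump-3", "temp_c": 71.5}',
    '{"source": "sensor", "device": "pump-3", "temp_c": 74.0}',
    'free-text log line that is not JSON',
]


def read_with_schema(raw_lines, required_fields):
    """Apply a schema at read time: yield only records that have every required field."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Unparseable data stays in the lake; this particular reader just skips it.
            continue
        if isinstance(record, dict) and all(f in record for f in required_fields):
            yield {f: record[f] for f in required_fields}


# One "view" of the raw data: temperature readings per device.
sensor_rows = list(read_with_schema(RAW_RECORDS, ["device", "temp_c"]))
print(sensor_rows)
```

A different analyst could call the same reader with other fields (for example `["customer", "note"]`) and get a different view of the same raw data, without anything being reloaded or restructured.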

Using a data lake allows a company to base decisions on in-depth statistics and facts from various sources, and to scale the solution easily as the company grows.

- Disadvantages

The main disadvantage of data lakes is that they demand a high level of expertise; only highly qualified analysts can truly benefit from them. They also require additional Business Intelligence tools to help turn insights into a coherent business strategy.

In addition, data lakes often rely on third-party systems, which makes the company dependent on the provider. A system crash or data leak can lead to massive financial losses.

- How to implement a data lake?

The whole process of creating a data lake can be divided into four stages:

1) Development of a platform for collecting raw data. At this stage, it is important to learn how to retrieve and store information.

2) Platform development and first experiments. This is where data analysis begins, and analytical model prototypes appear.

3) Connecting the data lake to existing data stores. The amount of data flowing into the data lake increases, and navigating it becomes simpler.

4) The data lake becomes the center of the company’s architecture. At this final stage, application scenarios are worked out; the company begins to launch new services with a convenient interface and moves to the Data-as-a-Service model.
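The first stage above, collecting and storing raw data, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `raw/<source>/<date>/` folder layout, the function names `land_raw` and `list_raw`, and the sample payloads are hypothetical, and a local temporary directory stands in for whatever storage a real project would use.

```python
import tempfile
from datetime import date
from pathlib import Path


def land_raw(lake_root, source, payload, day, name):
    """Store a payload unchanged under <root>/raw/<source>/<YYYY-MM-DD>/<name>."""
    target_dir = Path(lake_root) / "raw" / source / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / name
    target.write_bytes(payload)  # bytes are written as-is, with no parsing or cleaning
    return target


def list_raw(lake_root, source):
    """Return the relative paths of everything landed for one source, sorted."""
    base = Path(lake_root) / "raw" / source
    return sorted(
        p.relative_to(lake_root).as_posix() for p in base.rglob("*") if p.is_file()
    )


# Hypothetical sources: an ERP export and a status message from a production line.
lake_root = tempfile.mkdtemp()
land_raw(lake_root, "erp", b"order_id;amount\n1;9.50\n", date(2021, 8, 18), "orders.csv")
land_raw(lake_root, "plc", b'{"line": 2, "state": "RUN"}', date(2021, 8, 18), "status.json")

erp_files = list_raw(lake_root, "erp")
print(erp_files)
```

Grouping files by source and arrival date is one simple way to keep raw data retrievable later; the later stages (experiments, connections to other data stores, services) would build on top of a layout like this.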

- What tasks can a data lake solve?

One example of the effective use of data lakes is Amazon, whose data sources number in the thousands. For example, Amazon’s financial transactions were stored in 25 different databases, each arranged and organized in its own way. This created confusion and inconvenience. The data lake made it possible to combine all this data and establish a unified data protection system. Now experts can easily query the data they need and get predictions from machine learning algorithms. In addition, the data lake helps build detailed reports to improve company productivity, allocate resources efficiently and avoid shortages of goods, conduct analytics to launch products faster than competitors, and much more.
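The idea of querying differently organized sources through one view can be illustrated with a small sketch. It does not describe Amazon’s actual systems: the two sources, their field names, and all function names are hypothetical, and the data is kept in memory for brevity.

```python
# Hypothetical records from two systems that organize the same kind of data differently.
BILLING = [
    {"txn": "b-1", "amount_cents": 1250, "region": "EU"},
    {"txn": "b-2", "amount_cents": 400, "region": "US"},
]
MARKETPLACE = [
    {"id": "m-9", "total": "7.50", "country_group": "EU"},
]


def from_billing(row):
    """Map a billing record to the common shape (amount in currency units)."""
    return {"id": row["txn"], "amount": row["amount_cents"] / 100, "region": row["region"]}


def from_marketplace(row):
    """Map a marketplace record to the common shape."""
    return {"id": row["id"], "amount": float(row["total"]), "region": row["country_group"]}


def unified_transactions():
    """Yield every transaction from both sources in one common shape."""
    for row in BILLING:
        yield from_billing(row)
    for row in MARKETPLACE:
        yield from_marketplace(row)


def total_by_region(rows):
    """Sum transaction amounts per region across all sources."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


totals = total_by_region(unified_transactions())
print(totals)
```

The raw records stay untouched in their original form; only the small mapping functions know how each source is organized, so adding another source means adding one more mapping rather than restructuring the existing data.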
