Databricks Inc.’s Delta Lake today became the latest open-source software project to fall under the banner of the Linux Foundation.
Delta Lake has rapidly gained momentum since it was open-sourced by Databricks in April, and is already being used by “thousands of organizations” including important backers such as Alibaba Group Holding Ltd., Booz Allen Hamilton Corp. and Intel Corp., its founders say. The project was conceived as a way of improving the reliability of so-called “data lakes,” which are systems or repositories of data stored in its natural format, usually in object “blobs” or files.
Data lakes are popular with large enterprises because they provide a reliable way of making data accessible to anyone within an organization. They can store any kind of data, both structured and unstructured, in its native format, and they also support analysis that helps provide real-time insights on business matters.
But data lakes aren’t without their problems, the most common of which is that a lot of the information they store is unreliable or inaccurate. This has several causes, including failed writes, schema mismatches and data inconsistencies that arise when batch and streaming data are mixed together.
Unreliable data can be a burden because it prevents companies from getting accurate insights in a timely fashion. It can also slow down initiatives such as machine learning model training, which requires consistent data to ensure accuracy.
Delta Lake was designed to improve the efficiency of data lakes and ensure information is kept accurate and reliable. It does so by managing transactions across batch and streaming data and multiple simultaneous writes. It also does away with the need to build the complicated data pipelines that are used to move information across different computing systems.
In fact, it’s fair to say that Delta Lake is actually more similar to a “data warehouse” such as Apache Hive than a data lake. The main difference between the two is that the information in a data warehouse is transformed to conform to the warehouse’s own pre-defined schema, which means it cannot be stored in its native format. As a result, the data is more reliable, though enterprises lose a lot of flexibility when it comes to analyzing it.
Databricks co-founder and Chief Executive Officer Ali Ghodsi said the company was handing over stewardship of the project to the Linux Foundation in order to encourage more innovation from the open-source community.
“To address organizations’ data challenges we want to ensure this project is open source in the truest form,” Ghodsi said. “We’re confident that Delta Lake will quickly become the standard for data storage in data lakes.”
Constellation Research Inc. analyst Holger Mueller told SiliconANGLE that Databricks was moving Delta Lake to the Linux Foundation to try to encourage more innovation from the open-source community.
“The data lake is the foundation of the enterprise, and so it’s good to see standardization and open-source beneficial dynamics at work,” Mueller said. “But providing technology assets to open-source bodies is still not a guarantee of success, only time will tell.”
The Linux Foundation said Delta Lake will operate under an open governance model that’s meant to foster more participation in the project.
Ghodsi spoke about Databricks with theCUBE, SiliconANGLE’s livestreaming studio, earlier this year.