
Databricks Data + AI Summit 2024 Keynote Recap: The 5 Biggest Announcements

Tim Osborn

Tim is a content creator at Monte Carlo who writes about data quality, technology, and snacks—occasionally in that order.

Databricks’ Wednesday morning keynote opened with the usual bombastic techno beats and disorienting light show we’ve come to expect from the Data + AI Summit.

But in spite of its ear-rupturing volume, all that bombast was merely the appetizer for the tech announcements yet to unfold.

And unfold they did. 

Check out our recap to find out what was shared, what you can expect next, and why in the world Ali Ghodsi can’t stop talking about the Data Intelligence Platform. 

Ali Ghodsi’s opening remarks

After a fever-dream of disparate words and images raced across the screen, Databricks CEO Ali Ghodsi took to the stage with the obligatory whoops and hollers you would expect for one of the most prominent voices in AI. 

But once Ghodsi took the stage, it was right down to business. 

Ali opened the session by running through a list of impressive statistics about the event (16,000 in-person attendees), before setting the scene with the corporate mission of Databricks: a statement that would serve as backdrop for the announcements to come. 

According to Ali, the mission of Databricks is and has always been, “to democratize data and AI.”

Whether or not that’s always been true, I don’t know, but after wading through the ensuing list of new features, I can confirm that it’s certainly true today.

As Ali sees it, there are three primary challenges to the data world realizing a democratized future: 

  1. Everyone wants AI. 
  2. Security and privacy are under pressure. 
  3. The data estate is fragmented. 

Fortunately, Ali came prepared with the solution. He calls it The Data Intelligence Platform.

According to Databricks, the Data Intelligence Platform is a platform solution that allows an entire organization to use data and AI—and each of the morning’s big announcements served that vision in one way or another. 

So, with that in mind, let’s dive into the top 5 announcements coming out of Databricks Data + AI Summit this year.

Databricks acquires Tabular to help customers “own their own data”

First and foremost was the news that wasn’t news: Databricks’ now-ancient agreement to purchase Tabular, a data management company founded by Apache Iceberg creators Ryan Blue, Daniel Weeks, and Jason Reid.

Ali shared a bit of added context on his company’s vision for the merger, namely eliminating platform fragmentation between Databricks’ own Delta Lake table format and the newest cool kid in school, Apache Iceberg.

Building on the original announcement, Ali went on to share Databricks’ intention to work more closely with both the Delta Lake and Apache Iceberg communities in an effort to bring the formats closer together and enable more interoperability for data teams.

In Ali’s words, this unification will empower Databricks’ customers to truly own their data and eliminate once and for all the control vendors hold over their users’ data. 

Unity Catalog is now open source

What was clearly the biggest announcement of the morning (at least in the eyes of the audience) speaks to the unique audience Data + AI Summit attracts—and what they value in a data platform solution.

After releasing Unity Catalog back in 2021 to meet customer demand for greater discoverability in the lakehouse, Databricks announced this morning that it would be open-sourcing its native catalog feature to create a unified solution for both data and AI workflows.

Complete with open APIs and an Apache 2.0-licensed open source server, Unity Catalog OSS will offer a universal interface to support any data format and compute engine, including the ability to read tables with Delta Lake, Apache Iceberg™, and Apache Hudi™ clients via Delta Lake UniForm.

The announcement also brings unified governance to Databricks users and gives them more control by enabling both interoperability and choice across a variety of compute engines, tools, and platforms.
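The core idea behind a catalog like this is a single three-level namespace (catalog.schema.table) that any engine can resolve against. As a purely conceptual Python sketch of that addressing model (hypothetical names throughout; this is not the actual Unity Catalog OSS API):

```python
# Conceptual sketch of a catalog's three-level namespace
# (catalog.schema.table). Hypothetical toy code, NOT the
# actual Unity Catalog OSS API.

class MiniCatalog:
    def __init__(self):
        # maps a fully qualified "catalog.schema.table" name
        # to the table's storage format
        self._tables = {}

    def register(self, catalog, schema, table, fmt):
        """Register a table under its fully qualified name."""
        self._tables[f"{catalog}.{schema}.{table}"] = fmt

    def resolve(self, qualified_name):
        """Look up a table by its catalog.schema.table name."""
        return self._tables[qualified_name]


cat = MiniCatalog()
cat.register("main", "sales", "orders", "delta")
print(cat.resolve("main.sales.orders"))  # prints "delta"
```

The point of the real thing is that the same resolution step works regardless of which client (Delta Lake, Iceberg, or Hudi) is doing the reading.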

Serverless compute is going GA

Another moment that warmed the hearts of Summit attendees was the announcement of general availability for Databricks’ much-anticipated serverless computing. 

While serverless has been in public preview for some time, the broader Databricks populace has been waiting with bated breath for the sweet news of general availability. And that news finally came this morning to a round of applause from the audience in San Francisco.

However, while that news was certainly music to the data engineering ears in the room, it also came with some heavy foreshadowing. It appears that serverless isn’t just an option for Databricks customers; it’s the future of the platform.

Based on Ali’s remarks, it’s likely that future Databricks products will only be available for serverless customers—which means that users who want to take full advantage of their platform will need to make the switch sooner rather than later.

Stated benefits of the shift included the ability to:

  • Track and predict spend
  • Only pay for what you need
  • Eliminate versioning 

According to available docs, both autoscaling and the vectorized query engine Photon will be automatically enabled for the serverless compute resources that run a given job.

Ali reiterated that every Databricks customer should make the switch as soon as possible, and that serverless computing would be the future of the Databricks platform.  

Jensen Huang IRL

If a picture is worth a thousand words, the sight of 16,000 phones flying into the air to snap a picture of Nvidia Founder Jensen Huang’s now-iconic leather jacket must say something pretty profound.

“I love myself! … Just kidding.” Jensen Huang

During this standout session, there was some talk about native support for NVIDIA-accelerated computing with Photon, something about Databricks’ DBRX open-source model becoming available on NVIDIA NIM, and a bunch of other stuff about GPUs and velocity or something—but honestly, who cares.

The biggest news here was simply that Jensen Huang exists in the world and he’s here at Databricks. 

Mosaic AI is open for business

If Jensen Huang’s IRL presence was the most exciting AI reveal of the morning, the most robust was undoubtedly Mosaic AI.  

After Ali’s big announcement last year that Databricks had acquired LLM darling MosaicML, news of the platform’s integration had remained relatively scant. But all that changed this morning when VP of engineering Patrick Wendell took the stage to announce the complete integration of Mosaic AI’s capabilities—including what features were available today, and what users could look forward to in the coming weeks.

To enumerate how the platform could be leveraged to build production-quality AI systems and applications, Patrick discussed several key capabilities in preview now, including:

Mosaic AI Model Training: a zero-code training solution to fine-tune foundation models with an organization’s private data.

Mosaic AI Agent Framework: A resource to help data teams author, deploy, and test RAG applications using foundation models and their enterprise data (using Mosaic AI Vector Search).

Mosaic AI Agent Evaluation: An AI-assisted evaluation tool to measure AI outputs by defining quality, collecting feedback, and identifying and resolving issues in the model. 

Mosaic AI Gateway: a central point to manage and control the availability of LLMs to help customers switch between LLMs without complicated changes to the application code.

And finally, Mosaic AI Tool Catalog: a resource to help organizations govern, share, and register AI tools (AI applications that carry out actions like executing code, searching the web, and calling APIs) using Databricks Unity Catalog.  
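In this context, an “AI tool” is just a governed, named action an agent can invoke. A minimal, hypothetical register-then-invoke sketch in Python (toy code, not the actual Tool Catalog API) captures the pattern:

```python
# Hypothetical sketch of registering and invoking named "tools"
# (callable actions an AI agent can use). Toy code, NOT the
# actual Mosaic AI Tool Catalog API.

TOOLS = {}

def register_tool(name):
    """Decorator that registers a function as a named tool."""
    def wrap(fn):
        TOOLS[name] = fn
        return fn
    return wrap

@register_tool("add")
def add(a, b):
    # Stand-in for a real action like executing code or calling an API
    return a + b

def invoke(name, *args):
    """Look up a registered tool by name and call it."""
    return TOOLS[name](*args)

print(invoke("add", 2, 3))  # prints 5
```

The catalog layer in the real product adds governance and sharing on top of this basic name-to-action mapping.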

While most tools referenced during the presentation were in public preview (or generally available), Mosaic AI Tool Catalog was the only feature still in private preview.
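Of the capabilities above, the gateway pattern is the easiest to sketch: application code calls a stable route name, and the model behind that route can be swapped without touching the calling code. A purely illustrative Python toy (hypothetical names throughout; not the actual Mosaic AI Gateway API):

```python
# Toy illustration of the gateway idea: applications call one stable
# route name, and the model behind it is swappable via configuration.
# This is NOT the real Mosaic AI Gateway API.

class ToyGateway:
    def __init__(self):
        self._routes = {}  # route name -> callable standing in for an LLM

    def set_route(self, name, model_fn):
        """Point a stable route name at a (swappable) model backend."""
        self._routes[name] = model_fn

    def query(self, name, prompt):
        """Application code only ever knows the route name."""
        return self._routes[name](prompt)


gateway = ToyGateway()
gateway.set_route("chat", lambda p: f"[model-a] {p}")
print(gateway.query("chat", "hello"))  # served by model-a

# Swap the backing model; the calling code above is unchanged.
gateway.set_route("chat", lambda p: f"[model-b] {p}")
print(gateway.query("chat", "hello"))  # now served by model-b
```

That indirection is what lets a central gateway manage availability and switching across LLM providers without complicated application-code changes.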

This announcement was followed by a brief session about Shutterstock’s new AI image generator powered by Databricks and a live demo that would eventually become a highlight of the morning—complete with technical snafus, corporate anxiety, and a triumphant final output that showcased the admittedly impressive framework Databricks has developed for its customers.

The message? AI is the future—and the data platform is too.

While AI was clearly the star of the show for a second year, the most galvanizing announcements—and the undercurrent of the conference writ large—have undoubtedly been about the need for more interoperability, scalability, and reliability in the data estate.

And the biggest developments weren’t the AI tools and new models—they were all the ways data teams could leverage their own enterprise data to deliver AI value faster.

However, even with all that new technological progress, the Summit’s speakers made one thing abundantly clear: realizing that value depends first and foremost on the quality of the enterprise data being leveraged.

Without reliable data at the foundation, better tools can’t deliver enterprise-ready AI: they can only deliver broken AI faster. 

Data quality IS the past, present, and future of AI.

Solve for reliable data first and the rest of those new tools look a whole lot more valuable. Contact our team to find out how data observability can help you deliver scalable, interoperable, and comprehensive data reliability to your data estate.

Our promise: we will show you the product.