The hype around generative AI is real, and data and ML teams are feeling the heat.
Across industries, executives are pushing their data leaders to build AI-powered products that will save time, drive revenue, or give them a competitive advantage.
And tech giants like OpenAI, Google, Amazon, and Microsoft have been flooding the market with features driven by large language models (LLMs) and image-generating diffusion models. They promise to help companies analyze data at scale, summarize and synthesize information, generate content, and otherwise transform their businesses.
But where do most companies actually start when it comes to incorporating generative AI? What generative AI use cases are realistic, achievable, and actually worth the ROI?
We dug deep into the early adopters’ strategies to learn how companies are putting this technology to use today — and what it takes for a data team to implement gen AI at scale.
Build more efficient workflows for knowledge workers
Across industries, companies are driving early generative AI use cases by automating and simplifying time-intensive processes for knowledge workers.
Given LLMs’ capacity to understand and extract insights from unstructured data, businesses are finding value in summarizing, analyzing, searching, and surfacing insights from large amounts of internal information. Let’s explore how a few key sectors are putting gen AI to use.
In the legal industry, AI-powered systems are assisting firms by:
- Automating regulatory monitoring to ensure clients are up-to-date with compliance
- Drafting and reviewing standard documents like wills and contracts
- Assisting with due diligence by reviewing large volumes of documents to identify potential risks and issues
- Analyzing contracts to flag possible issues or suggest revisions
- Assisting in legal research by identifying, analyzing, and summarizing pertinent information from case law, statutes, journals, regulations, and other relevant publications
Tech solutions: Legal teams are adopting specialized solutions that have either custom models or fine-tuned LLMs for the legal system, including CoCounsel (powered by GPT-4), Harvey, and Thomson Reuters’s suite of software.
Real-life use case: London law firm Macfarlanes uses Harvey to support research, analyze and summarize documents, and create first drafts of emails and memos, including client work — with human lawyers reviewing its work.
Way back in early 2023, Wall Street institutions like Goldman Sachs and Citigroup famously banned ChatGPT due to data privacy concerns. Despite those “anti-AI” headlines, the financial industry has used machine learning for years to power fraud detection and instantaneous credit decisions. And financial products and firms are ripe with potential use cases for generative AI.
For now, though, Databricks estimates that 80% of generative AI use cases in financial services are focused on streamlining processes to save time and resources. That includes:
- Conversational finance chatbots that can use internal documents as a knowledge base
- Automating basic accounting functions like invoice capture and processing
- Analyzing, summarizing, and extracting insights from documents like annual reports, insurance contracts, and earnings call transcripts
Additionally, industry leaders believe that AI’s capacity to detect and stop financial crime and fraud is an enormously compelling application.
Tech solutions: Customized solutions are beginning to emerge, including BloombergGPT, a 50-billion parameter LLM specifically developed for financial services.
Real-life use case: In September 2023, Morgan Stanley launched an AI-powered assistant to support financial advisors by providing easy access to its internal database of research reports and documents. Employees can use the tool to ask questions about markets, internal processes, and recommendations.
Sales and marketing teams are adopting generative AI in droves with use cases like:
- Writing first drafts of emails, landing pages, blog posts, and other content
- Personalizing content for individual outreach based on CRM data
- Analyzing sales interactions to coach representatives
- Automating lead scoring based on demographics, firmographics, and digital behaviors
- Summarizing interactions from calls and video meetings
Tech solutions: Sales platforms like Gong use proprietary models to produce call summaries and recommend next steps to help move prospects along their buying journey, while Salesforce’s Einstein Copilot auto-generates email replies and account updates based on a customer’s specific context.
Real-life use case: Account engagement platform 6sense uses an AI-enabled conversational email solution in their prospect communications — which contributes 10% of the new pipeline generation from marketing-engaged accounts.
Automate engineering and data processes
By automating repetitive or mundane aspects of coding and data engineering, generative AI is streamlining workflows and driving productivity for software and data engineers alike.
For example, teams can use gen AI to:
- Automatically generate chunks of code and review code for errors
- Automatically debug and rectify minor errors, or predict where bugs are likely to occur
- Generate large amounts of synthetic data that mirror real-world information so engineers can test models without worrying about privacy concerns
- Automatically generate detailed documentation around code and projects
- More readily update legacy software from languages like COBOL (prevalent in the financial sector and a significant cost) to modern ones
LLMs are also being incorporated directly into developer solutions. For example, within the Monte Carlo platform, we leverage the OpenAI API to support two features — Fix with AI and Generate with AI — that help teams better operationalize data observability. Fix with AI uses LLMs to identify bugs in data quality checks, and Generate with AI uses LLMs to generate suggestions for new data quality checks.
Even at OpenAI itself, LLMs are used to support DevOps and internal functions. As Yaniv Markovski, Head of AI Specialist, told us, their team uses GPT models to aggregate and translate operational signals, like server logs or social media events, to understand what their customers experience when they use their products. This is considerably more streamlined than the traditional approach of a Site Reliability Engineering team manually investigating and triaging incidents.
Real-life use case: One global media company’s data engineering team is using LLMs to classify pull requests into different levels of required triage in their dbt workflows. Depending on the classification of the change, the model triggers a different build command. This streamlines development considerably, since the team’s alternative was to hardcode complex parsing logic to determine which command was appropriate to test the changes.
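To make that pattern concrete, here is a minimal Python sketch of routing a pull request to a build command based on a model's classification. It is not the media company's actual implementation: the triage labels, the prompt wording, and the `classify` callable (a stand-in for whichever LLM client a team uses) are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical triage levels mapped to the dbt command each one triggers.
# Note: the state:modified selector also needs a --state path to compare against.
BUILD_COMMANDS = {
    "docs_only": ["dbt", "parse"],
    "single_model": ["dbt", "build", "--select", "state:modified"],
    "wide_impact": ["dbt", "build", "--select", "state:modified+"],
}
FALLBACK_LABEL = "wide_impact"  # run the broadest build when the answer is unclear


def build_prompt(diff_summary: str) -> str:
    """Ask the model for exactly one of the known labels."""
    labels = ", ".join(BUILD_COMMANDS)
    return (
        "Classify this dbt pull request into exactly one of: "
        f"{labels}. Reply with the label only.\n\n{diff_summary}"
    )


def choose_command(diff_summary: str, classify: Callable[[str], str]) -> List[str]:
    """Return the build command for a PR, given any prompt -> text function."""
    label = classify(build_prompt(diff_summary)).strip().lower()
    if label not in BUILD_COMMANDS:
        label = FALLBACK_LABEL  # never trust free-form model output blindly
    return BUILD_COMMANDS[label]
```

Falling back to the broadest build when the model returns an unexpected label trades some CI time for safety, which is usually the right default for a step that gates deployments.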
Democratize data with the rest of your company
Within the world of data, the ripest opportunity for companies to leverage gen AI may be to increase access to data for non-technical consumers. LLMs provide a path for team members across the organization to enter natural language prompts that can generate SQL queries to retrieve specific data points or answer complex questions.
This is the precise use case that Adam Conway, SVP of Products at Databricks, recently highlighted as the clearest first step for companies.
“I’ve seen examples of industries with huge amounts of documentation that want to enable their internal teams to retrieve answers out of tens of thousands of pages of records,” said Adam. “That’s the right approach, because the risk is low — it allows you to get your hands dirty, provides a lot of value, and doesn’t create a lot of risk. At Databricks, we have an internal chatbot that helps employees figure things out and look at their data. And we see a lot of value there.”
Tech solutions: Platforms like Databricks are working on embedded functionalities — they recently announced their LakehouseIQ, which promises to make it possible for teams to query their data in plain language.
While these technologies are still emerging, data teams can fine-tune models based on internal documents or knowledge bases to build customized capabilities for their organizations — or use gen AI to help employees shortcut their way to self-serve queries, as our real-life example describes.
Real-life use case: Livestream shopping platform Whatnot highly encourages every employee to know SQL so they can query their own data, create their own dashboards, and write their own dbt models — even across non-technical departments like marketing, finance, and operations. Generative AI plays a role in employee training.
As Engineering Director Emmanuel Fuentes told us recently, “It’s helping people bootstrap. If they come in with no background in SQL, it’s helping them ramp up fairly quickly, which is really great to see. If someone doesn’t know how to do a window function, for example, they can describe what they’re trying to do, get a chunk of SQL out, and then swap in our data tables. It’s like having a tutor for someone who just doesn’t know how to do any advanced analytics.”
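For readers who have not met window functions, this is the kind of SQL such a prompt might return. The sketch below is purely illustrative: the `daily_sales` table and its columns are made up, and it runs against an in-memory SQLite database (which supports window functions in recent versions) rather than a real warehouse.

```python
import sqlite3

# A running total per seller: SUM() applied over a window instead of a GROUP BY.
QUERY = """
SELECT seller_id,
       order_day,
       revenue,
       SUM(revenue) OVER (
           PARTITION BY seller_id
           ORDER BY order_day
       ) AS running_revenue
FROM daily_sales
ORDER BY seller_id, order_day
"""


def running_revenue(rows):
    """Load (seller_id, order_day, revenue) rows and return them with a running total."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_sales (seller_id TEXT, order_day TEXT, revenue REAL)"
    )
    conn.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Swapping in real tables means changing only the table and column names; the `OVER (PARTITION BY ... ORDER BY ...)` clause is the part a newcomer typically needs help writing.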
Scale customer support
Customer support teams deserve their own shoutout as an especially ideal audience for LLM-enabled workflows. By incorporating semantic search into basic chatbots and workflows, data teams can enable CS teams to access information, create responses, and resolve requests much more quickly.
Tech solutions: Some CX solutions are already including gen AI capabilities in their platforms. For example, Oracle’s Fusion Cloud CX uses an LLM that references internal data to help agents generate instant responses to service requests based on the history of the client’s interactions, and suggests new knowledge base content in response to emerging service issues.
Real-life use case: Vimeo engineers used generative AI to build a help desk chat prototype. The tool indexes the company’s Zendesk-hosted help articles in a vector store (more on vector databases below) and connects that store to the LLM provider. When a customer has an unsuccessful conversation with the existing front-end chatbot, the transcript is sent to the LLM for further help. The LLM rephrases the problem into a single question, queries the vector store for articles with related content, and receives the relevant documents. Then, the LLM generates a final, summarized answer for the customer.
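The flow described here (rephrase, retrieve, then answer) can be sketched in a few lines of Python. This is not Vimeo's code. The `embed` function is a toy word-count stand-in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory substitute for a vector database, and `llm` is any function that takes a prompt and returns text.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory index of help articles (title -> body)."""

    def __init__(self, articles: Dict[str, str]) -> None:
        self._rows = [(t, b, embed(t + " " + b)) for t, b in articles.items()]

    def search(self, query: str, k: int = 2) -> List[str]:
        q = embed(query)
        ranked = sorted(self._rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [f"{title}: {body}" for title, body, _ in ranked[:k]]


def answer_from_transcript(
    transcript: str, store: ToyVectorStore, llm: Callable[[str], str], k: int = 2
) -> str:
    # Step 1: condense the failed chat into a single question.
    question = llm("Rephrase the customer's problem as a single question:\n" + transcript)
    # Step 2: retrieve the most similar help articles.
    context = "\n".join(store.search(question, k))
    # Step 3: generate a final answer grounded in those articles.
    return llm(f"Answer using only these articles:\n{context}\n\nQuestion: {question}")
```

In a production system the word-count vectors would be replaced by model-generated embeddings, which capture meaning rather than exact word overlap, but the three-step shape stays the same.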
Support translation and language services
Finally, generative AI makes it possible to automate near-instantaneous translations and language support across organizations, which spend nearly $60 billion annually on language services — but only translate a fraction of the content they produce. LLMs like GPT-4 have the potential to help teams provide multilingual customer service interactions, conduct global sentiment analysis, and localize content at scale.
Tech solutions: Currently, most models may lack the training data to be proficient in less commonly spoken languages — or to pick up on colloquialisms or industry-specific terms — so teams may need to fine-tune models to produce solid results. That said, Google is working on a universal speech model trained on over 400 languages, with the goal of building a universal translator.
Real-life use case: In a unique twist on a traditional translation model, health tech company Vital launched an AI-powered doctor-to-patient translator to instantly turn highly technical medical terminology into plain language.
Three key considerations when getting started with gen AI
As your team forays into the ever-shifting landscape of gen AI, there are a few key considerations to keep in mind.
Supplement your tech stack
Having the right tech stack in place to support gen AI will help your team scale and create value much more quickly. In addition to the usual components of a modern data stack, you’ll want to consider adding:
Vector databases are currently one of the most effective ways for teams to build scalable applications with OpenAI’s LLMs. These databases store vector embeddings, which carry semantic information that helps AI understand relationships and patterns within your data.
For teams with more custom needs, fine-tuning models (training a pretrained model on a dataset specific to your needs) will likely be the next step beyond vector embeddings. Tools like TensorFlow and Hugging Face are good options to fine-tune your models.
Unstructured or streaming data processing
Generative AI tends to deliver the most value by extracting insights from large volumes of unstructured data. If you haven’t yet incorporated unstructured data processing into your stack, you’ll likely need to implement a tool like Spark — or Kafka, if you’re venturing into streaming data.
Secure the right team and resources
Creating an AI pilot project takes time and resources. While you may have a gung-ho CEO who will spare no expense to bring gen AI into your product or business, it’s still important to have a realistic idea of how long it will take — and how much it will cost.
Set up your team
You’re likely going to redirect existing employees to prototype or prove out a concept rather than hire experienced gen AI developers right off the bat (partly because this is a brand new field, so experienced gen AI developers don’t really exist yet). These tiger teams are usually populated with data engineers with some ML background.
In other words, some of your valuable players will need to be redirected away from immediate, revenue-generating work to take on your AI pilot project. Consider the inherent opportunity cost and incorporate it into your overall planning — and pair your team with a business sponsor who can advocate for this shift in resourcing while keeping your team close to the business value.
Consider your hardware costs
If you plan to fine-tune your model and you’re new to MLOps, estimate and keep an eye on the compute costs you’ll incur with all that custom training. Those GPU hours can add up.
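A back-of-the-envelope estimate is often enough to start the budget conversation. The Python sketch below simply multiplies GPUs, hours, runs, and an hourly rate; every number in the example is a placeholder, not a real cloud price or a typical training time.

```python
def estimate_training_cost(
    gpu_count: int, hours_per_run: float, runs: int, hourly_rate_usd: float
) -> float:
    """Rough fine-tuning cost: total GPU hours times the per-GPU hourly rate."""
    if min(gpu_count, hours_per_run, runs, hourly_rate_usd) < 0:
        raise ValueError("inputs must be non-negative")
    gpu_hours = gpu_count * hours_per_run * runs
    return gpu_hours * hourly_rate_usd


# Placeholder inputs: 8 GPUs, 6 hours per run, 5 experimental runs, $2.50 per GPU hour.
print(estimate_training_cost(8, 6.0, 5, 2.50))  # 600.0
```

The number of runs is the input teams most often underestimate, since early fine-tuning usually involves several failed or exploratory attempts.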
Prioritize data quality
Regardless of your tech stack, your model of choice, or your use case, one truth remains: you need to ensure the quality of your data inputs and your data outputs. Otherwise, you risk exposing bad data to more internal teams, either directly through natural language prompts or indirectly through gen AI-powered products.
This is where data observability becomes the last key component of your stack, delivering a clear view into the health of your data at each stage in the pipeline. Data observability monitors data for issues with volume, freshness, accuracy, and other important factors, and provides field-level lineage to help enable faster resolution when issues arise. In other words, observability helps ensure that the data powering your AI-enabled products is reliable and accurate, laying the foundation for trustworthy outputs.
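To give a feel for what freshness and volume monitoring mean in practice, here is a deliberately simple Python sketch. It is not how any particular observability product works: the fixed freshness window and the z-score threshold are assumptions chosen for illustration, and real tools typically learn these thresholds from historical patterns.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Sequence


def is_stale(last_loaded_at: datetime, now: datetime, max_age: timedelta) -> bool:
    """Freshness check: has the table gone longer than max_age without an update?"""
    return now - last_loaded_at > max_age


def volume_is_anomalous(
    history: Sequence[float], today: float, z_threshold: float = 3.0
) -> bool:
    """Volume check: is today's row count far from the recent average?"""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # No historical variation, so any change at all is unexpected.
        return today != mu
    return abs(today - mu) / sigma > z_threshold
```

Checks like these would run before data reaches a model or a gen AI-powered product, so that a stalled pipeline or a half-loaded table is caught upstream instead of surfacing as a wrong answer.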
Bottom line: Generative AI has the potential to transform every business, but it’s not without risks and potential pitfalls. Data observability helps ensure gen AI creates immense value — not embarrassing data disasters — for your organization.
Special thanks to Naren Venkatraman, Yaniv Markovski, and Emmanuel Fuentes for taking the time to chat with us for this article.
Learn how data observability can help teams deliver reliable AI products at scale. Reach out to Molly and the rest of the team to learn more.