Kubrick Head of Next-Generation Technology Lawrence Freeman and Delivery Lead Lewis Allsop reflect on the latest innovations, surprises, and the most important takeaways from Databricks’ flagship event.
Databricks’ flagship conference is the Data+AI Summit, aptly named for its position at the intersection of these technology spheres. What aspects of data and AI were highest on the agenda?
Lawrence: To start with – and this was really useful context – they recapped the brief history of the event. It started as the Spark Summit, then became the Spark+AI Summit, and is now the Data+AI Summit, as a reflection of the current sentiment surrounding these technologies. Spark is the underlying capability behind Databricks, but the focus on what their technology can mean for the wider data and AI agenda has risen to the surface.
Amidst the great variety of talks and sessions, there were two clear focuses on everyone’s minds. Unsurprisingly, generative AI and Large Language Models dominated the agenda, but there was also a lot of important discussion about management, governance, and security. We were thrilled to attend the Partner Summit and get a sneak peek at some exciting innovations within those two areas, which we will get into shortly.
Lewis: The change in focus from last year’s summit is remarkable. Then, the conference was all about data engineering and their Delta Lake, which is an open-source storage layer. Now, the data agenda is all about pushing the boundaries of data democratization: how can you access lots of data on lots of different systems through their database platform? If you think about some of the large-scale organizations which work with Kubrick, they are complex, intricate ecosystems with teams working with all different systems. To be able to bring that data into one place is a game changer – and rightfully a big theme across the event.
Lawrence: To add to that, and to reference Databricks CEO Ali Ghodsi, the most important strategy for Databricks now is their Unity Catalog. It was launched last year, but the version they unveiled at Data+AI Summit 2023 fully enables the democratization of data that large-scale organizations are seeking by allowing you “to discover, query, and govern all your data no matter where it lives”, to quote Ghodsi. You don't have to move your data into the Lakehouse; the data can remain where it is.
But Unity Catalog will allow you to centrally connect to all that data, still leveraging the power and compute of the underlying systems, if it's sitting in Teradata or Snowflake, for example. Unity Catalog isn't trying to take over; it's just directing users to access the data, and therefore unlocks all of the incredible new features that Databricks have launched. One of these new tools that completely wowed me is LakehouseIQ. That, when coupled with Unity Catalog, further increases the democratization of data by allowing interaction with, and again I quote, “the hottest new programming language: English”. You couldn't get much more democratized than allowing people to access insights using their own mother tongue.
As a partner of Databricks, what most excites you about these announcements and what does it mean for Kubrick as well as the wider data and AI community?
Lawrence: How Databricks have developed complete end-to-end capability is truly exciting. Everything from data aggregation to governance, and analytics to AI, is now held within Databricks. Now, with the realization that you don't have to move all your data into one place to be able to reap these benefits, there is even more scope for innovation by breaking down costly silos.
The biggest change and driver of new value for Kubrick and the wider technology community will be learning how to leverage English as the ‘hottest new programming language’. The rise of Prompt Engineering, which we are certainly exploring as a potential skill or practice at Kubrick, will rely on a variety of skills previously overlooked in the data and tech world, like linguistics. While Kubrick already has opportunities open to non-STEM graduates, this kind of technological advancement really helps bring data into the hands of a broader range of people. This, again, supports the idea of true democratization which will be critical for increasing diversity and overcoming biases.
The hype around generative AI is so palpable, and the summit’s agenda was teeming with discussions and demos of its capabilities. What were the key takeaways for technology leaders to navigate the overwhelming possibilities and find their first steps on the journey?
Lawrence: The volume of unstructured data has easily grown, at least over the last decade, to make up around 90% of all the available data that we have. It's the text, the multimedia videos, and image data that we generate constantly with the advent of social media. And, until now, this data has been pretty much untapped. Our current state of capability is largely focused only on easy-to-use, structured data, which is such a small fraction of the whole picture. The rise of generative AI, with computer vision and deep learning at its roots, will suddenly open the gates to the enormous amount of data that we have but have not turned into tangible information.
If the end goal is to drive optimization and decision-making which is both faster and more holistic, then there are endless possibilities for what we’re going to be able to do. So, where do we start? This question is exactly what our clients are trying to answer when jumping into the world of generative AI, and we’re excited to be helping them to ideate and find strong use cases. The capabilities we have inside Kubrick, from our Training experts to our Squad and Sprint teams, are proving to be an interesting breeding ground for these revolutionary ideas. It feels that there are so few things that can't be achieved when we tap into that 90% pool of unstructured data.
Lewis: One of the many things that I was feeling when I left the Data+AI Summit was overwhelmed. There was so much new information to process, and our clients are telling us that they feel the same. It’s not just about finding a clear starting point but finding one which has proven value. Training these models is a very expensive endeavor, and it is set to get even more expensive before the cost plateaus, which is something that might get lost amidst the hype. While the potential for LLMs and generative AI to make cost-saving efficiencies is enticing, the initial investment might outweigh immediate benefits without finding an effective first use case.
One of the best sessions from the summit, a keynote from former Google CEO Eric Schmidt, gave some helpful, practical advice for approaching the challenge. By spending time in departments across your business, you can generate a long list of potential use cases for how generative AI can make different teams more productive, which should help illuminate a top priority to focus on for the maximum return for your own team. By starting as your own consumer and improving your own productivity, it will be easy to demonstrate the impact to the rest of the business.
Lawrence: There was a strong focus on making these Large Language Models more accessible in themselves. Just a couple of months ago, OpenAI CEO Sam Altman said we’re already at the end of the era of giant AI models. As we start creating smaller, more bespoke models, we will make them more accessible from both a cost and technical perspective, as well as increase their reliability when trained on datasets specific to their creators/users on an individual organization level.
Daniela Rus, Director of the Computer Science and Artificial Intelligence Laboratory at MIT, gave a fantastic keynote about changing our mindset to embrace generative AI instead of fearing it. There is a lot of media hype about how it could replace jobs, but its functionality could really act as an assistant to enhance our roles. We can have real-time recommendations and insights to inform our work, without spending time gathering, analyzing, and presenting data.
This ability to instantly access and generate insights from your organization’s data is powerful, but the recommendations created by generative AI are only as good as the data that feeds it. This links back conveniently to the awesomeness of Databricks’ Unity Catalog, which centralizes access to all of your data in one place. The more data you can access, the richer and more reliable the insights generative AI can create to support data-informed decision-making. It’s still vital that it is a human making those decisions, but we will be more confident in our decision-making as a result.
Lewis: And that raises the question of why data quality is so important. Some of the biggest LLMs have been trained on potentially inaccurate sources, like Wikipedia, which is a risk. But, if you build a specialized, smaller model on your own data which has robust data quality and governance processes in place, you can be assured of the results.
Undoubtedly generative AI is going to change the way we work. How did the summit explore the impact on teams and roles, and how can we prepare for these changes on a large scale?
Lewis: This is an aspect of the rise of AI that I’m just starting to consider; there has been so much discussion about the technical developments, but not enough about how we will actually implement it. We’ve all worked on projects where we’ve tried to make a process more efficient using technology, but the end users are panicked or intimidated by the product and don’t adopt it. Now, with fears for job stability, the element of change management is going to be even more difficult. We’re going to need really strong product managers with great communication and interpersonal skills to manage these changes, which again invites more people from other backgrounds into the tech sphere.
Lawrence: There is no denying that there will be a change in the way we work, and some jobs might be affected more than others. For example, a mid-level data science role will likely change significantly as generative AI augments code production. However, there are many roles that will be created as a result of AI, particularly within cyber security. We’re anticipating an increase in cyber crime with AI technology, and we’ll need more security specialists to detect patterns and harness AI to fight AI.
Finally, how did it feel to be at the Data+AI Summit as a key partner of Databricks? What were the biggest surprises or most impactful messages you took away?
Lewis: I came away from that event thinking that AI is essentially going to change my job – and the world – and I’m quickly going to fall behind if I don’t start learning more. It could be daunting, but I’m feeling inspired.
Lawrence: For me, the event was a great reminder of why Databricks is so pivotal in the world of data and AI. They are the creators of Spark and the Data Lakehouse, technology which supports so many other vendors and innovators, as well as our own clients. Databricks is being used by 50% of Fortune 500 companies – that is a pretty impressive stat. They are an awesome Technology Partner of ours and we can't wait to get stuck into the new releases and how they support Gen AI – meaning both generative AI and Generation AI!