What the DBRX release should teach you about open source strategy
Why you're probably missing this key aspect of the DBRX Model
Hi, I’m Bill. I was a PM at Databricks and Anyscale. Over the past 9 months I’ve gone independent: I’m building a startup as an indiehacker, advising some high growth startups, and running some side hustles.
I still like to write about trends in Data + AI, and that’s what you’ll find below.
Drop a comment or reply to this email if you’ve got questions.
Be sure to follow me on LinkedIn and Twitter.
People don’t understand open source.
They kinda get it. Yeah, it’s free, licensing, blah blah blah.
“Bro, open source is just free marketing.”
No it’s not.
The true power of open source is how it changes the competitive landscape (at least with the right product).
Databricks has put on an open source strategy clinic with the release of DBRX.
I thought it would be a good opportunity to use it as an example for others to understand what’s going on behind the scenes.
Let’s peek behind the curtain.
The Power of Open Source
When a company open sources a project, they're not just giving away their work for free; they're inviting the entire community to collaborate, contribute, and build upon it.
When people say free adoption, that’s what they mean.
When done well, this creates a virtuous cycle of innovation and improvement that can accelerate the development of new technologies and capture mindshare.
This creates a network effect that can quickly establish a project as the de facto standard in its space.
How does that relate to the release of DBRX?
Open source is the most interesting strategic tool I see being deployed by Databricks in its spat with Snowflake.
When Databricks open sourced its Delta Lake project (years ago), it wasn't just a technical decision; it was a strategic move to establish Delta Lake as the standard for data storage and management. By making Delta Lake open source, Databricks forced Snowflake to make a decision: either integrate with Delta Lake and acknowledge its importance, or risk being seen as out of touch with the broader ecosystem.
This is the power of open source in action.
Open Source Software (OSS) creates a new playing field where the rules are different, and where the traditional advantages of proprietary software may not apply. In the open source world, it's not just about who has the best technology; it's about who can build the strongest community and ecosystem around their project.
The Double-Edged Sword of Open Source
While open source can be a powerful tool for driving innovation and adoption, it's not without its challenges and risks.
One of the biggest challenges is maintaining compatibility between the open source version of a project and any proprietary implementations that a company may offer.
This is where the concept of an "open source interface" comes into play.
When a company open sources a project, they're essentially defining a public API that other developers can build against. They’re also exposing the internals.
But if the company also offers a proprietary implementation of that project, they need to make sure that it remains compatible with the open source interface over time.
For this reason, good interface designers are worth their weight in bitcoin.
This can be a tricky balancing act, as the company needs to continue evolving its proprietary offering while still maintaining backward compatibility with the open source interface. If the proprietary version diverges too much from the open source version, it can create confusion and frustration for developers who are trying to build against the open source API.
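To make the "open source interface" idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `TableFormat`, `OpenSourceTable`, `ProprietaryTable`, and `contract_holds` are names made up for illustration, not anything from Delta Lake or any real product. The point is only that the proprietary build can change its internals as long as it keeps passing the same contract check as the open source version.

```python
from __future__ import annotations

from typing import Protocol


class TableFormat(Protocol):
    """Hypothetical public interface defined by the open source project."""

    def write(self, rows: list[dict]) -> None: ...

    def read(self) -> list[dict]: ...


class OpenSourceTable:
    """Reference implementation that would live in the open source repo."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def write(self, rows: list[dict]) -> None:
        self._rows.extend(rows)

    def read(self) -> list[dict]:
        return list(self._rows)


class ProprietaryTable:
    """Vendor build: different internals (a read cache), same public contract."""

    def __init__(self) -> None:
        self._batches: list[list[dict]] = []
        self._cache: list[dict] | None = None

    def write(self, rows: list[dict]) -> None:
        self._batches.append(list(rows))
        self._cache = None  # any write invalidates the cached read

    def read(self) -> list[dict]:
        if self._cache is None:
            self._cache = [row for batch in self._batches for row in batch]
        return list(self._cache)


def contract_holds(table: TableFormat) -> bool:
    """Compatibility check every implementation is expected to pass."""
    table.write([{"id": 1}])
    table.write([{"id": 2}])
    return table.read() == [{"id": 1}, {"id": 2}]


# Run the same check against both implementations.
results = {
    cls.__name__: contract_holds(cls())
    for cls in (OpenSourceTable, ProprietaryTable)
}
print(results)
```

The moment the vendor build needs a behavior that `contract_holds` would reject, you have the divergence problem described above.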
This challenge is evergreen in open source. Here’s an example that took me 20 seconds to find in Supabase’s docs.
And the interface issue is called out.
Perfectly makes my point.
We've also seen this challenge play out in the case of Delta Lake, which Databricks open sourced in 2019. Since then, Databricks has continued to evolve its proprietary implementation of Delta Lake, adding new features and optimizations that are not necessarily available in the open source version.
Snowflake, of course, has been critical of Databricks' approach to Delta Lake.
Snowflake’s argument is that basically it’s not truly open. A strange argument for an entirely closed source product but I digress.
Snowflake is not immune to these challenges either. Or at least it won’t be for long.
Took me 20 seconds to find an example.
The company has been working on supporting the open source Iceberg table format, which is seen as a competitor to Delta Lake. If you can actually compete on something called a “table format”.
Snowflake, I assume, is going to want to optimize their abilities with Iceberg. They will (likely) need to navigate the same compatibility challenges that Databricks has faced with Delta Lake.
The lesson here is that open source can be a double-edged sword.
While it can be a powerful tool for driving innovation and adoption, it also comes with its own set of challenges and risks. Companies that want to use open source as a competitive tool need to be prepared to navigate these challenges and make tough decisions about how to balance their proprietary offerings with their open source commitments.
Why Open Sourcing a Model Might be the Holy 🏆
While open sourcing libraries and APIs is difficult, there's one area where open source can be a particularly powerful tool: models, and in particular LLMs.
When a company open sources a model, they're essentially giving away a pre-trained, ready-to-use piece of software that other developers can incorporate into their own applications. This can be a huge time-saver for developers, who don't have to spend months or years training their own models from scratch.
But the benefits of open sourcing models go beyond just convenience.
When a model is open sourced, it becomes a static artifact that can evolve independently of the source material. This means that developers can take the model and fine-tune it for their own specific use cases, without having to worry about compatibility issues or breaking changes in the underlying library.
There is no interface to maintain!
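Here is a toy sketch of what "static artifact" means in practice. It assumes a made-up two-weight linear model in place of an LLM (real fine-tuning is far more involved), and `BASE_WEIGHTS`, `predict`, and `fine_tune` are hypothetical names. What it shows: each customer copies the released weights and trains their own version, and the release itself never changes.

```python
# Hypothetical "released" weights: a frozen snapshot, like a published checkpoint.
BASE_WEIGHTS = (0.5, -0.2)


def predict(weights, x):
    """Linear model: dot product of weights and input features."""
    return sum(w * xi for w, xi in zip(weights, x))


def fine_tune(base, examples, lr=0.1, epochs=200):
    """Return new weights fitted to `examples`; the base snapshot is not mutated."""
    weights = list(base)  # private copy of the released weights
    for _ in range(epochs):
        for x, y in examples:
            error = predict(weights, x) - y
            # Plain gradient step on squared error for this one example.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
    return tuple(weights)


# Two customers adapt the same release to different data.
customer_a = fine_tune(BASE_WEIGHTS, [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)])
customer_b = fine_tune(BASE_WEIGHTS, [((1.0, 0.0), -1.0), ((0.0, 1.0), 3.0)])

print("base:      ", BASE_WEIGHTS)
print("customer A:", customer_a)
print("customer B:", customer_b)
```

Neither customer's changes touch the other's model or the original release, so the publisher has nothing to keep in sync.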
Shoehorning an analogy
Bear with me and read the whole thing.
Open sourcing a model is like giving away a Honda Civic.
A Honda Civic works. You can drive it. People are happy with them.
But the true power is that I can customize the shit out of my Honda Civic too. That’s how I like my cars.
Everyone can take their Honda Civic and do whatever they want to with it.
Databricks (or broadly, the company open sourcing a model) doesn’t have to maintain the thing at all. Caveat emptor.
Bringing the analogy home
So, how is this like a model?
Well, basically Databricks knows how to build LLMs, and ones that work well at that. Not only that, they know the entire process to build that LLM.
That means that Databricks can take different components of the LLM (imprecisely speaking) and enable customers to do new and exciting things with them.
On top of that, those customers will own those models - making them feel like they’re in control (as opposed to using a 3rd party API like OpenAI).
Back to our regularly scheduled programming
By releasing DBRX as an open source model, Databricks was able to give developers a powerful tool for building AI applications.
The key difference is the maintenance burden. Compared with something like Delta Lake, the burden of maintaining DBRX is basically non-existent.
This is the allure of open source models: they provide a simple, self-contained way for developers to incorporate advanced functionality into their applications, without having to worry about the underlying complexity of maintaining a library or even an API. They can even customize that model for every customer who wants one, without having anything more to maintain.
Of course, open sourcing models (or building LLMs) is not without its own challenges. There are questions around licensing, intellectual property, and the potential for misuse or abuse. But overall, you get much of the power of open sourcing with fewer drawbacks.
You’ve just got to be good enough to pull it off.
Conclusion
As we've seen, open source is a powerful tool that can shape the future of the tech industry. In the battle between Databricks and Snowflake, open source has become a key weapon.
For companies that want to use open source as a competitive tool, the key is to be strategic and thoughtful about how they approach it.
This means carefully considering which projects to open source, how to balance proprietary and open source offerings, and how to build and maintain a strong community around their open source efforts.
Foundation models, in particular, provide an interesting development in open source. I haven’t thought deeply about the implications for a company like HuggingFace here, but it’s going to be interesting to see how all this develops.
But one thing is clear: open source will continue to play a crucial role in shaping the future of the tech industry, and companies that want to stay ahead of the curve will need a deliberate plan for it.