Databricks and Snowflake have emerged as leading providers of big data analytics platforms in the cloud. While there are some similarities between the two rivals, there are also important differences in their offerings that potential customers should know about.
If you’re moving your data operations to the cloud, you’re likely considering Databricks and Snowflake, which are two of the largest and fastest growing companies in the space. Both companies appear to be benefiting from the relative independence of not being named AWS, Google Cloud, or Microsoft Azure, as the fear of lock-in drives enterprises to embrace multi-cloud strategies and multi-cloud platforms.
There are clearly similarities between the two companies, but they have different strengths and weaknesses that could influence your decision to go with one or the other, or neither, as the case may be. This is a very dynamic industry, and new big data startups are being spawned all the time to take down the titans. It wasn’t long ago that Databricks and Snowflake were the energetic startups picking fights with establishment vendors.
With that said, here’s a rundown of the key similarities and differences between the two companies and their offerings.
Snowflake offers a data warehouse delivered via the software-as-a-service (SaaS) model. It supports structured and semi-structured data (unstructured support is still immature), and runs on AWS, Google Cloud, and Microsoft Azure.
Databricks offers a data lakehouse delivered via the platform-as-a-service (PaaS) model. It supports structured, semi-structured, and unstructured data, and runs on AWS, Google Cloud, and Microsoft Azure.
Databricks started as an implementation of Apache Spark in the cloud (although today it is much more than that), and it continues to excel at the kind of large-scale data processing that Spark is renowned for. Today, the company promotes its data lakehouse architecture, which combines the scalability advantages of data lake storage (via object storage) with the data quality advantages of a traditional warehouse, or analytics database. It touts its “unified analytics platform,” which combines data engineering, AI, and machine learning.
Snowflake started as an analytics database with storage decoupled from compute, which makes it easier to scale. The company often promotes the performance and speed of its proprietary analytics database, which was designed for traditional analytics and BI workloads. In recent years, it has started adding machine learning and AI capabilities through its Snowpark offerings, expanding into unstructured data types.
The Snowflake Data Cloud is best known for delivering high-speed, SQL-based data warehousing capabilities for traditional analytics and BI workloads. With Unistore, it is blending transactional and analytical data and workloads. Workloads scale elastically based on demand. Third-party integrations bring support for various ETL and data visualization tools. Unified data governance atop centralized storage is considered a strength. Its Snowpark developer framework brings support for Python, Java, and Scala development, as well as new capabilities for developing machine learning and AI applications on unstructured data, such as text and imagery. Its acquisition of Streamlit also provides access to tools for rapidly building Python apps.
The Databricks Lakehouse Platform offers a wide array of capabilities for data engineering, data science, and data analysis. Customers can build and run large batch jobs, real-time streaming workloads, and machine learning applications on Databricks. Development can be done via notebooks or IDEs, with SQL, Python, and Scala, as well as open source ML frameworks like PyTorch and TensorFlow. Its MLflow offering helps to manage machine learning workflows. Its Delta Lake offering supports secure data sharing, while it provides built-in data governance with its Unity Catalog. This year it rolled out Dolly, a large language model (LLM).
Databricks’ ability to scale to handle massive data workloads is considered a strength. It provides some automated query optimization capabilities through vectorization and cost-based optimization, but users will typically need some technical expertise to really dial in the performance of SQL analytics workloads. It is more open to configuration changes, such as selecting certain node types. As a PaaS, Databricks is more open and invites users to plug in a variety of open-source tools.
As a SaaS offering, Snowflake is designed to be easy to get going quickly. Snowflake has done a lot of engineering under the covers to optimize performance out of the box, and its market success reflects that. There are not as many options for fine-tuning the configuration, as Snowflake intentionally shields customers from that complexity. There is no option to configure node types, for example. Snowflake generally is less open and offers fewer options to users, which reduces complexity and makes it easier to use by a wide group of people.
Snowflake manages data for customers. It supports encryption at rest and in transit, role-based access control (RBAC), and auditing. It also supports features such as virtual private cloud (VPC) peering for enhanced network security, as well as data masking.
In Databricks’ cloud, customers manage their own data. Databricks supports encryption at rest and in transit, as well as RBAC. It also supports Azure Virtual Network injection (VNet injection) and network security groups (NSGs) for network isolation on the Microsoft cloud.
Databricks offers pay-as-you-go pricing as well as committed-use pricing, which brings a discount. Users are charged based on the specific compute services they use (such as “All-Purpose Compute”), the number of virtual machine instances they use, how often they use them, the cloud the platform is running on, and the support plan (standard, premium, enterprise). Since data is managed by customers, Databricks doesn’t charge for storage.
Snowflake also offers pay-as-you-go pricing, but because it manages customers’ data, it charges for compute time as well as data storage. For Snowflake On Demand, it charges based on how much the customer uses the service, with per-second pricing. Customers can get discounts by pre-purchasing Snowflake capacity. Pricing also varies by cloud, region, and support tier (standard, enterprise, business critical, and Virtual Private Snowflake [VPS]).
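To make the structural difference between the two pricing models concrete, here is a minimal Python sketch. It is an illustration only: the function names, the `Usage` type, and every rate in it are hypothetical placeholders, not actual Databricks or Snowflake prices, and it ignores real-world factors such as cloud, region, and support tier. It simply models one vendor billing for compute plus storage with per-second metering, and the other billing for compute only, with an optional committed-use discount.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Hypothetical monthly usage figures (illustrative only)."""
    compute_seconds: float  # total billed compute time, in seconds
    storage_tb: float       # average data stored, in terabytes


def compute_plus_storage_cost(usage: Usage,
                              rate_per_compute_hour: float,
                              rate_per_tb_month: float) -> float:
    """Snowflake-style model: per-second compute billing plus a storage charge."""
    compute = usage.compute_seconds / 3600 * rate_per_compute_hour
    storage = usage.storage_tb * rate_per_tb_month
    return compute + storage


def compute_only_cost(usage: Usage,
                      rate_per_compute_hour: float,
                      commit_discount: float = 0.0) -> float:
    """Databricks-style model: compute only, with an optional committed-use discount.

    Storage is not billed by the vendor here because the customer manages
    the data; whatever the customer pays to store it is outside this model.
    """
    compute = usage.compute_seconds / 3600 * rate_per_compute_hour
    return compute * (1 - commit_discount)


if __name__ == "__main__":
    # Placeholder numbers: 2 hours of compute, 10 TB stored.
    usage = Usage(compute_seconds=7200, storage_tb=10)
    print(compute_plus_storage_cost(usage, rate_per_compute_hour=3.0,
                                    rate_per_tb_month=20.0))
    print(compute_only_cost(usage, rate_per_compute_hour=3.0,
                            commit_discount=0.10))
```

Because the rates are made up, the outputs say nothing about which platform is cheaper. The point is only that a like-for-like comparison has to account for storage separately on the platform that does not bill for it.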
Snowflake launched its Data Exchange in 2019, and changed the name to the Data Marketplace a year later. It currently provides more than 2,200 data products, many of which are free. The Snowflake Marketplace also offers more than 1,700 applications, which it calls Native Apps.
Databricks launched its Marketplace in 2022 as a way to share data using its Delta Sharing protocol. It currently provides more than 500 data products, including 287 free data sets.
With its roots in Apache Spark, Databricks uses open source software extensively in its platform, and contributes much of its work to the open source community. However, it was criticized for holding back some of its technology, such as the Delta table format, from the open source community, a move it has since reversed.
Snowflake is not a big supporter of open source, and in fact its leaders have voiced many criticisms of open source software, including the failures of Apache Hadoop. The inner workings of its proprietary database are a mystery. However, it has come out in support of open source Apache Iceberg, a competitor to the Delta table format.
Databricks was founded in 2013 by the group of computer scientists at UC Berkeley’s AMPLab who were behind Apache Spark. That includes Matei Zaharia, who is often credited with creating Spark, as well as his two advisors, Ali Ghodsi and Ion Stoica. Co-founders Reynold Xin, Patrick Wendell, Andy Konwinski, and Arsalan Tavakoli-Shiraji are also computer scientists with ties to Berkeley.
Snowflake was founded in 2012 by three data warehousing experts: Benoît Dageville and Thierry Cruanes, who both worked as data architects at Oracle, and Marcin Żukowski, the co-founder of Vectorwise, an MPP analytics database that is now owned by Actian.
Revenue, Customer Count, and Valuation
Databricks has about 10,300 customers, according to 6sense, a company that provides insights on technologies, or “technographics.” The company, which is privately held, is reportedly valued at $43 billion, a figure cited by Bloomberg in a recent story about the company being in talks for a new funding round. That’s up from $38 billion, a figure cited two years ago during the company’s most recent funding round. In June, Databricks passed the $1 billion revenue mark for the trailing 12 months for the first time.
Snowflake has a market capitalization of $52.5 billion, which is down from about $123 billion in November 2021, when its stock reached an all-time high of about $392 per share. Snowflake recorded $2.07 billion in revenue for fiscal year 2023. Snowflake reported that it had more than 8,100 customers at the end of its first quarter for fiscal year 2024, which ended April 30, 2023.