Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.
Certainly! Let’s delve into the differences between Databricks SQL and Microsoft SQL Server, as well as the rationale behind Spark SQL in Databricks.
Databricks SQL vs. Microsoft SQL Server:
Databricks SQL is an integral part of the Databricks Lakehouse Platform. Here’s how it compares to Microsoft SQL Server:
Primary Database Model:
Databricks SQL: It combines elements of data lakes and data warehouses, providing a unified view of structured and unstructured data. It is based on Apache Spark.
Microsoft SQL Server: Microsoft's flagship relational DBMS.
Secondary Database Models:
Databricks SQL: Supports additional models like document store and graph DBMS.
Microsoft SQL Server: Primarily a relational DBMS.
APIs and Access Methods:
Databricks SQL: Supports JDBC, ODBC, and a RESTful HTTP API.
Microsoft SQL Server: Offers a wide range of access methods including ADO.NET, JDBC, ODBC, and OLE DB.
Server-Side Scripts:
Databricks SQL: Supports user-defined functions written in SQL and Python.
Microsoft SQL Server: Allows server-side scripts in Transact-SQL (T-SQL), .NET languages, R, Python, and (since SQL Server 2019) Java.
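As a sketch of the Databricks SQL access methods listed above, the snippet below shows the shape of a Databricks JDBC connection URL and a query over the open-source `databricks-sql-connector` Python package. The hostname, HTTP path, and token are placeholders invented for illustration, not real values.

```python
def jdbc_url(host: str, http_path: str) -> str:
    """Build a Databricks JDBC connection URL (driver options omitted)."""
    return f"jdbc:databricks://{host}:443;httpPath={http_path}"


def run_query(host: str, http_path: str, token: str, query: str):
    """Run a query against a SQL warehouse via the Python connector
    (pip install databricks-sql-connector)."""
    # Imported lazily so jdbc_url() above works without the package installed.
    from databricks import sql

    with sql.connect(server_hostname=host,
                     http_path=http_path,
                     access_token=token) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()


# Placeholder workspace values -- substitute your own:
print(jdbc_url("dbc-example.cloud.databricks.com", "/sql/1.0/warehouses/abc123"))
```

The same warehouse endpoint (hostname plus HTTP path) serves JDBC, ODBC, and the Python connector, so the credentials only need to be provisioned once.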
Consistency and Durability:
Both adhere to ACID properties (Atomicity, Consistency, Isolation, Durability); Databricks provides them through Delta Lake transactions.
In-Memory Capabilities:
Microsoft SQL Server offers In-Memory OLTP with memory-optimized tables. Databricks SQL is not an in-memory database, although Spark caches working data in memory during query processing.
For more detailed information, you can explore the Databricks documentation and the Microsoft SQL Server documentation.
Why Spark SQL in Databricks?:
Spark SQL is an essential component of Databricks for several reasons:
Unified Data Processing: Spark SQL seamlessly integrates structured data processing (SQL queries) with unstructured data processing (Spark operations).
Performance: It builds on Apache Spark's distributed execution engine, parallelizing queries across large datasets.
Data Lakehouse Paradigm: Databricks aims to bridge the gap between data lakes and data warehouses. Spark SQL plays a crucial role in this by providing a unified interface for both types of data.
Ecosystem Compatibility: Spark SQL allows users to work with data stored in various formats (Parquet, JSON, Avro, etc.) within the same platform.
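A minimal sketch of that unified model, assuming a local PySpark installation (the table name and sample rows are invented for illustration): one job moves freely between the DataFrame API and plain SQL over the same data.

```python
def unified_demo():
    """Mix DataFrame operations and SQL in a single Spark job.
    Requires pyspark (pip install pyspark); sketch only."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("unified-demo").getOrCreate()

    # Structured processing: build a DataFrame programmatically.
    df = spark.createDataFrame(
        [(1, "north", 100.0), (2, "south", 80.0), (3, "north", 50.0)],
        ["id", "region", "amount"],
    )

    # Expose the DataFrame to SQL, then query it with ordinary SQL text.
    df.createOrReplaceTempView("sales")
    totals = spark.sql(
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )
    return {row["region"]: row["total"] for row in totals.collect()}


if __name__ == "__main__":
    print(unified_demo())
```

The same `spark.read` interface would also load Parquet, JSON, or Avro files into DataFrames, after which they are queryable through SQL in exactly the same way.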
If you’d like to explore further, you can refer to the Databricks documentation.
Remember, both Databricks SQL and Microsoft SQL Server serve different purposes, and your choice depends on your specific use case and requirements. 🚀🔍