Warehousing & Analytics
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Community. Share insights, tips, and best practices for leveraging data for informed decision-making.

Is there a sample Java program using the Databricks Connect library to query a table in the Free Edition?

ShawnRR
Visitor

Hello,

I was wondering if there is sample code showing how a Java program might leverage Databricks Connect to query a table in the Free Edition of Databricks?

I would like to use Connect because I am trying to avoid JDBC and its overhead; I thought I might do better by creating DataFrames and then leveraging Connect to write them to Databricks as Parquet-based files. I note that Databricks Connect claims to support Java in some places, but the documentation focuses on... Python, R, and Scala.

https://docs.databricks.com/aws/en/dev-tools/databricks-connect/

I saw there used to be a standalone... which is what I believe I wanted, but it looks like it is to be deprecated.
I am new to Databricks and its concepts, but I am familiar with Iceberg (where I would simply use the Iceberg Java APIs, leveraging a file appender and then the Catalog API to register my Parquet files with the manifest). What is the equivalent here for writing Parquet out directly, in parallel, and then registering it? (Presumably leveraging their Spark compute to do it.)

1 ACCEPTED SOLUTION


anuj_lathi
Databricks Employee

Hi — welcome to Databricks! Unfortunately, Databricks Connect v2 (DBR 13.3+) does not support Java — it only supports Python, Scala, and R. The legacy v1 did support Java, but it has been deprecated and has reached end of support.

That said, here are your options as a Java developer:

Option 1: Use Scala with Databricks Connect (JVM interop)

Since Scala runs on the JVM, you can call the Databricks Connect Scala APIs from Java. This gives you full DataFrame read/write support:

// Scala — callable from Java via JVM interop
import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import scala.jdk.CollectionConverters._

val spark = DatabricksSession.builder().getOrCreate()

// Read a table and show a few rows
val trips = spark.read.table("samples.nyctaxi.trips")
trips.limit(5).show()

// Create and write your own DataFrame
val schema = StructType(Seq(
  StructField("id", IntegerType, false),
  StructField("name", StringType, false)
))
val data = Seq(Row(1, "Alice"), Row(2, "Bob"))
// Note: Spark Connect does not expose sparkContext, so pass the rows directly
val df = spark.createDataFrame(data.asJava, schema)
df.write.saveAsTable("my_catalog.my_schema.my_table")

 

Add the Maven dependency:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-connect</artifactId>
  <version>15.4.0</version> <!-- match your DBR version -->
</dependency>

 

See: Databricks Connect Scala Examples

Option 2: Databricks SDK for Java + SQL (Pure Java, no Spark dependency)

If you want to stay in pure Java, the Databricks SDK for Java lets you:

  • Upload parquet files to Unity Catalog Volumes via the Files API
  • Execute SQL via the Statement Execution API to register/query tables

This is closer to the Iceberg pattern you described (write files, then register):

import com.databricks.sdk.WorkspaceClient;
import java.io.FileInputStream;
import java.io.InputStream;

WorkspaceClient w = new WorkspaceClient();

// Upload a parquet file to a Volume
try (InputStream in = new FileInputStream("data.parquet")) {
  w.files().upload("/Volumes/my_catalog/my_schema/my_volume/data.parquet", in);
}

// Then run SQL to create a table from the file
// (via the Statement Execution API or JDBC for the SQL part)

 

Maven dependency:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-sdk-java</artifactId>
  <version>0.2.0</version> <!-- use latest from Maven Central -->
</dependency>
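To sketch the "register" step: once the Parquet file sits in a Volume, a `CREATE TABLE ... AS SELECT` over the file path completes the Iceberg-style pattern. A minimal, hedged example — the table and Volume names are placeholders, and the SDK submission shown in the comment (warehouse ID required) should be double-checked against the `com.databricks.sdk.service.sql` classes in your SDK version:

```java
public class RegisterParquet {
    // Build a CTAS statement that reads an uploaded Parquet file from a
    // Volume path using Spark SQL's parquet.`<path>` syntax.
    static String ctasFromVolume(String table, String volumePath) {
        return "CREATE TABLE IF NOT EXISTS " + table
            + " AS SELECT * FROM parquet.`" + volumePath + "`";
    }

    public static void main(String[] args) {
        String sql = ctasFromVolume(
            "my_catalog.my_schema.my_table",
            "/Volumes/my_catalog/my_schema/my_volume/data.parquet");
        System.out.println(sql);
        // Submit via the SDK's Statement Execution API, e.g.:
        //   new WorkspaceClient().statementExecution().executeStatement(
        //       new ExecuteStatementRequest()
        //           .setWarehouseId("<warehouse-id>")
        //           .setStatement(sql));
    }
}
```

The CTAS approach copies the data into a managed table; if you want the table to reference the files in place, look at external tables instead.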

 

Option 3: JDBC with Bulk Ingestion

I know you want to avoid JDBC, but it's worth noting that the Databricks JDBC driver supports Arrow-based bulk ingestion, which significantly reduces overhead compared to traditional row-by-row JDBC inserts. It may be faster than you expect.
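If you do try JDBC, batched parameterized inserts are the pattern that lets the driver ship rows in bulk rather than one round trip each. A hedged sketch — the URL shape (`AuthMech=3` for personal-access-token auth) is my assumption about the Databricks JDBC driver and should be verified against its docs, and the host/path values are placeholders:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class JdbcBulkInsert {
    // Assumed Databricks JDBC URL shape; check the driver docs for your version.
    static String buildUrl(String host, String httpPath) {
        return "jdbc:databricks://" + host + ":443/default;transportMode=http;"
            + "ssl=1;AuthMech=3;httpPath=" + httpPath;
    }

    // Batched inserts: one executeBatch() round trip instead of one per row.
    static void insertBatch(Connection conn, String table, List<Object[]> rows)
            throws Exception {
        String sql = "INSERT INTO " + table + " (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("dbc-xxxx.cloud.databricks.com",
            "/sql/1.0/warehouses/abc123"));
        // Connect with DriverManager.getConnection(url, "token", "<your-PAT>")
        // and then call insertBatch(conn, "my_catalog.my_schema.my_table", rows).
    }
}
```

The `insertBatch` helper is standard `java.sql` batching; whether the driver translates it into Arrow blocks is a driver-version detail worth confirming.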

A Note on Free Edition

Databricks Connect requires a cluster or serverless compute with Spark Connect enabled. The Free Edition (Community Edition) has limited compute options, so Databricks Connect may not work there. The SDK + SQL approach (Option 2) or JDBC (Option 3) are more likely to work on the free tier.


Hope that helps point you in the right direction!

Anuj Lathi
Solutions Engineer @ Databricks
