Is there a Databricks Spark connector for Java?
05-13-2025 11:30 PM
Is there a Databricks Spark connector for Java, like the one for Snowflake (see the Snowflake Spark connector: https://docs.snowflake.com/en/user-guide/spark-connector-use)?
Essentially, the use case is to transfer data from S3 to a Databricks table. In the current implementation, I am using Spark to read data from S3 and JDBC to write data to Databricks. But I want to use Spark instead to write data to Databricks.
05-14-2025 06:42 AM
- Reading Data from S3: Use the spark.read function with the appropriate format for your data (e.g., csv, parquet) and specify the S3 path. In Scala:
val data = spark.read.format("parquet").load("s3://bucket-name/folder-name")
Ensure you configure your AWS credentials for accessing S3.
- Writing Data to Databricks Table: Use the Delta format or another supported format to write data directly to a Databricks table. In Scala:
data.write.format("delta").save("/mnt/databricks-table-path")
If the table is pre-defined, you can use the saveAsTable method instead:
data.write.format("delta").mode("overwrite").saveAsTable("database.table_name")
The same calls are available from Java: DataFrameReader and DataFrameWriter, like the streaming DataStreamReader and DataStreamWriter, mirror their Scala equivalents.
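Since the question is about Java, here is a minimal sketch of the same read and write steps using the Java Dataset API. It assumes the Spark SQL library is on the classpath and that the code runs somewhere a Databricks-backed SparkSession is available (for example, as a JAR job on a cluster). The class name is hypothetical, and the bucket path and table name are the placeholders from the Scala snippets, not real locations.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for illustration.
public class S3ToDeltaTable {
    public static void main(String[] args) {
        // Assumption: a session is already provided by the environment
        // (e.g. a Databricks cluster); getOrCreate() then returns it.
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Read the source files from S3 (placeholder path).
        Dataset<Row> data = spark.read()
                .format("parquet")
                .load("s3://bucket-name/folder-name");

        // Write to a Delta table (placeholder name), replacing existing rows.
        data.write()
                .format("delta")
                .mode("overwrite")
                .saveAsTable("database.table_name");
    }
}
```

Note that in Java, read and write are methods, so they need parentheses (spark.read(), data.write()), unlike the Scala form.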
05-15-2025 03:10 AM
Thanks @Louis_Frolio
Just to clarify my use case: I want to run the code locally, but the data insertion should happen in a remote Databricks workspace. I tried JDBC, but its write performance seems low, even after setting the batch size and the number of partitions.
Is there any alternative for my use case? Also, I am using Java for the current implementation.
05-15-2025 09:28 AM
We have native connectivity with VSCode. Check it out here: https://docs.databricks.com/aws/en/dev-tools/vscode-ext/
You may also want to dig into Databricks Connect. Check it out here: https://docs.databricks.com/aws/en/release-notes/dbconnect/
05-15-2025 08:49 PM
You don't need a separate Spark connector; Databricks natively supports writing to Delta tables using standard Spark APIs. Instead of JDBC, you can use df.write().format("delta") to efficiently write data from S3 to Databricks tables.