Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

I-am-Biplab
by New Contributor II
  • 1809 Views
  • 4 replies
  • 4 kudos

Is there a Databricks Spark connector for Java?

Is there a Databricks Spark connector for Java, just like we have for Snowflake (reference of Snowflake Spark connector - https://docs.snowflake.com/en/user-guide/spark-connector-use)? Essentially, the use case is to transfer data from S3 to a Databric...

Latest Reply
sandeepmankikar
Databricks Partner
  • 4 kudos

You don't need a separate Spark connector; Databricks natively supports writing to Delta tables using standard Spark APIs. Instead of using JDBC, you can use df.write().format("delta") to write data from S3 to Databricks tables efficiently.

3 More Replies
turagittech
by Contributor
  • 1741 Views
  • 5 replies
  • 1 kudos

Reading different file structures for json files in blob stores

Hi All, We are planning to store some mixed JSON files in blob store and read them into Databricks. I am questioning whether we should have a container for each structure or if the various tools in Databricks can successfully read the different types. I ha...

Latest Reply
sandeepmankikar
Databricks Partner
  • 1 kudos

Organize files by schema into subfolders (e.g., /schema_type_a/, /schema_type_b/) in the same container. Avoid putting all JSON types in one folder.
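As a plain-Python sketch of that layout (the folder names follow the suggestion above; the file contents and field names are made up), each subfolder holds exactly one structure, so whatever reads a given folder, json.loads locally or spark.read.json / Auto Loader in Databricks, never sees mixed schemas:

```python
import json
import tempfile
from pathlib import Path

# One subfolder per JSON schema inside the same container (illustrative names).
root = Path(tempfile.mkdtemp())
(root / "schema_type_a").mkdir()
(root / "schema_type_b").mkdir()

# Two documents with deliberately different structures.
(root / "schema_type_a" / "1.json").write_text(json.dumps({"id": 1, "name": "x"}))
(root / "schema_type_b" / "1.json").write_text(json.dumps({"event": "click", "ts": 1700000000}))

def read_folder(folder: Path) -> list:
    """Read every JSON file under one schema folder."""
    return [json.loads(p.read_text()) for p in sorted(folder.glob("*.json"))]

# Each folder can now be read with a single, consistent schema.
records_a = read_folder(root / "schema_type_a")
records_b = read_folder(root / "schema_type_b")
print(records_a, records_b)
```

The same idea carries over to Spark: point one read (or one Auto Loader stream) at each subfolder, instead of asking a single read to infer a schema over mixed documents.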

4 More Replies
LearnDB123
by New Contributor
  • 3274 Views
  • 2 replies
  • 0 kudos

Saving a file to /tmp is not working after migration to Unity Catalog

Hi, We upgraded our runtime cluster to Unity Catalog recently, and since then some of the code that was working fine earlier has been failing. We used to save files to "/tmp/" and then move them from temp into our blob storage; however, since the migration t...

Latest Reply
Rahul6
New Contributor II
  • 0 kudos

Hi @filipniziol, could we use Volumes for this temp processing rather than using S3?

1 More Replies
utkarshamone
by New Contributor III
  • 1344 Views
  • 1 reply
  • 0 kudos

Internal errors when running SQL queries

We are running Databricks on GCP with a classic SQL warehouse on the current version (v 2025.15). We have a pipeline that runs DBT on top of the SQL warehouse. Since the 9th of May, our queries have been failing intermittently with internal errors f...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @utkarshamone, the error messages you've shared, such as:
- [INTERNAL_ERROR] Query could not be scheduled: HTTP Response code: 503
- ExecutorLostFailure ... exited with code 134, sigabrt
- Internal error
indicate that your Databricks SQL warehouse o...
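While the underlying capacity issue is being addressed, intermittent 503-style failures can often be absorbed client-side by retrying with exponential backoff. A generic sketch in plain Python (run_with_retries, flaky_query, and the RuntimeError stand-in are illustrative, not a Databricks or DBT API):

```python
import time

def run_with_retries(query_fn, max_attempts=4, base_delay=0.01):
    """Retry a flaky call with exponential backoff (base, 2x, 4x, ...)."""
    for attempt in range(max_attempts):
        try:
            return query_fn()
        except RuntimeError:  # stand-in for a transient 503 / internal error
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(base_delay * (2 ** attempt))

# Demo: a fake query that fails twice with a 503-style error, then succeeds.
calls = {"n": 0}
def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP Response code: 503")
    return "ok"

result = run_with_retries(flaky_query)
print(result)
```

In real code you would catch only the specific transient exception your client raises, and keep the base delay on the order of seconds rather than milliseconds.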

I-am-Biplab
by New Contributor II
  • 2150 Views
  • 3 replies
  • 1 kudos

Is there a Databricks Spark connector for Java?

Is there a Databricks Spark connector for Java, just like we have for Snowflake (reference of Snowflake Spark connector - https://docs.snowflake.com/en/user-guide/spark-connector-use)? Essentially, the use case is to transfer data from S3 to a Databric...

Latest Reply
Shua42
Databricks Employee
  • 1 kudos

Hey @I-am-Biplab, if running locally, it is going to be difficult to tune the performance up that much, but there are a few things you can try: 1. Up the partitions and batch size as much as your machine will allow. Also, running repartition() coul...

2 More Replies
jeremy98
by Honored Contributor
  • 11203 Views
  • 11 replies
  • 4 kudos

Resolved! ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'

Hello community, I installed the Databricks extension in my VS Code IDE. How do I fix this error? I created the environment to run my notebooks locally and selected the available remote cluster to execute my notebook; what else? I have this error: ImportError...

Latest Reply
jeremy98
Honored Contributor
  • 4 kudos

@unj1m Yes, as Alberto said, you don't need to install pyspark; it is included in your cluster configuration.

10 More Replies
Prajit0710
by New Contributor II
  • 674 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication issue in HiveMetastore

Problem Statement: When I execute the code below as part of the notebook, both manually and in a workflow, it works as expected: df.write.mode("overwrite").format('delta').option('path', ext_path).saveAsTable("tbl_schema.Table_name") but when I integr...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Prajit0710 This is an interesting issue where your Delta table write operation works as expected when run directly, but when executed within a function, the table doesn't get recognized by the HiveMetastore. The key difference is likely related to ...

tebodelpino1234
by New Contributor
  • 3552 Views
  • 1 reply
  • 0 kudos

Can I view __ALLOW_EXPECTATIONS_COL in Unity Catalog?

I am developing a DLT pipeline that manages expectations, and it works correctly, but I need to see the columns __DROP_EXPECTATIONS_COL, __MEETS_DROP_EXPECTATIONS, and __ALLOW_EXPECTATIONS_COL in Unity Catalog. I can see them in the Delta table that the DLT generat...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Materialization tables created by DLT include these columns to process expectations but they might not propagate to Unity Catalog representations such as views or schema-level metadata unless explicitly set up for such lineage or column-level exposur...

KS12
by New Contributor
  • 4072 Views
  • 1 reply
  • 0 kudos

Unable to get s3 data - o536.ls.

Error while executing display(dbutils.fs.ls(f"s3a://bucket-name/")). bucket-name has read/list permissions. shaded.databricks.org.apache.hadoop.fs.s3a.AWSClientIOException: getFileStatus on s3a://bucket-name/ com.amazonaws.SdkClientException: Unable to ex...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

To start with, enable SSL debugging by passing the JVM option -Djavax.net.debug=ssl in the cluster configuration. This helps identify whether the handshake is failing due to missing certificates or invalid paths. Also check the cluster initialization sc...
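For reference, that JVM option can be added under the cluster's Spark config (driver and executor shown here; which side you need depends on where the S3 calls are failing):

```
spark.driver.extraJavaOptions -Djavax.net.debug=ssl
spark.executor.extraJavaOptions -Djavax.net.debug=ssl
```

Remember to remove it afterwards; SSL debug logging is very verbose.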

minhhung0507
by Valued Contributor
  • 4440 Views
  • 1 reply
  • 0 kudos

Error Listing Delta Log on GCS in Databricks

I am encountering an issue while working with a Delta table in Databricks. The error message is as follows: java.io.IOException: Error listing gs://cimb-prod-lakehouse/bronze-layer/dbd/customer_info_update_request_processing/_delta_log/ This issue occ...

Latest Reply
kamal_ch
Databricks Employee
  • 0 kudos

Ensure that the Databricks workspace has the necessary permissions to access the GCS bucket. Check if the service account used for Databricks has "Storage Object Viewer" or a similar role granted. Verify that the path "gs://cimb-prod-lakehouse/bronze...
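As one possible way to grant that role at the bucket level (a sketch; the service account email is a placeholder for whichever account your Databricks workspace/cluster actually uses, and roles/storage.objectViewer covers both get and list on objects):

```shell
# Placeholder service account; substitute the one attached to your Databricks deployment.
gcloud storage buckets add-iam-policy-binding gs://cimb-prod-lakehouse \
  --member="serviceAccount:databricks-sa@your-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"
```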

DaPo
by New Contributor III
  • 4018 Views
  • 2 replies
  • 0 kudos

DLT Fails with Exception: CANNOT_READ_STREAMING_STATE_FILE

I have several DLT pipelines writing to a schema in Unity Catalog. The storage location of the Unity Catalog is managed by the Databricks deployment (on AWS). The schema and the DLT pipelines are managed via Databricks Asset Bundles. I did not cha...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

Hi @DaPo, have you made any code changes to your streaming query? There are limitations on which changes to a streaming query are allowed between restarts from the same checkpoint location; refer to the documentation. The checkpoint location appears to ...

1 More Replies
oscarramosp
by New Contributor II
  • 1834 Views
  • 3 replies
  • 1 kudos

DLT Pipeline upsert question

Hello, I'm working on a DLT pipeline to build what would be a data warehouse/data mart. I'm facing issues trying to "update" my fact table when the dimensions that are outside the pipeline fail to be up to date at my processing time, so on the next r...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

The error encountered, "Cannot have multiple queries named catalog.schema.destination_fact for catalog.schema.destination_fact. Additional queries on that table must be named," arises because Delta Live Tables (DLT) disallows multiple unnamed queries...

2 More Replies
Zeruno
by New Contributor II
  • 4156 Views
  • 1 reply
  • 0 kudos

UDFs with modular code - INVALID_ARGUMENT

I am migrating a massive codebase to PySpark on Azure Databricks, using DLT pipelines. It is very important that the code be modular; that is, for the time being I am looking to make use of UDFs that use modules and classes. I am receiving the following...

Latest Reply
briceg
Databricks Employee
  • 0 kudos

Hi @Zeruno. What you can do is package up your code and pip install it in your pipeline. I had the same situation: I developed some code which ran fine in a notebook, but when used in a DLT pipeline, the deps were not found. Packaging them up an...
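A rough sketch of that workflow, with a hypothetical package name and wheel path: give the shared modules a minimal build config, build a wheel (e.g., with python -m build), and install it at the top of the pipeline notebook.

```
# pyproject.toml (minimal; "my_shared_code" is a made-up package name)
[project]
name = "my_shared_code"
version = "0.1.0"

[build-system]
requires = ["setuptools"]
build-backend = "setuptools.build_meta"

# Then, in the first cell of the DLT pipeline notebook (path is a placeholder):
# %pip install /Workspace/path/to/my_shared_code-0.1.0-py3-none-any.whl
```

Once the code is installed as a package, UDFs can import it the same way on every worker, instead of depending on notebook-scoped modules.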

jlynlangford
by New Contributor
  • 1144 Views
  • 1 reply
  • 0 kudos

collect() in SparkR and sparklyr

Hello, I'm seeing a vast difference in performance between SparkR::collect() and sparklyr::collect(). I have a somewhat complicated query that uses WITH ... AS (CTE) syntax to get the data set I need; there are several views defined and joins required. The final data...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@jlynlangford This is a tricky situation, and multiple resolutions can be tried to address the performance gap. Schema Complexity: If the DataFrame contains nested structs, arrays, or map types, collect() can become significantly slower due to complex...

thomas_berry
by Databricks Partner
  • 1854 Views
  • 3 replies
  • 2 kudos

Resolved! federated queries on PostgreSQL - TimestampNTZ option

Hello, I am trying to migrate some Spark reads away from JDBC to federated queries based on Unity Catalog. Here is an example of the Spark read command that I want to migrate: spark.read.format("jdbc").option("driver", "org.postgresql.Driver").opt...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Thanks @thomas_berry I hope so 

2 More Replies