Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ep208
by New Contributor
  • 2825 Views
  • 1 reply
  • 0 kudos

How to resolve Location Overlap

Hi, I am trying to ingest abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table, but when writing the table schema in the YAML files, I incorrectly wrote this table into another Unity Catalog table: --- kind: SinkDeltaTable metadata:  name:...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hey @ep208, from the error message you’re seeing (LOCATION_OVERLAP), it seems that Unity Catalog is still tracking a table or volume that points to the same path you’re now trying to reuse: abfss://datalake@datalakename.dfs.core.windows.net/Delta/Proj...

KG_777
by New Contributor II
  • 3000 Views
  • 1 reply
  • 2 kudos

Resolved! Capturing deletes for SCD2 using apply changes or apply as delete decorator

We're looking to implement SCD2 for tables in our lakehouse, and we need to keep track of records that are deleted in the source. Does anyone have a similar use case, and can they outline some of the challenges they faced and workarounds they imp...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @KG_777, tracking deleted records in an SCD Type 2 implementation for a lakehouse architecture is indeed a challenging but common requirement. Here's an overview of approaches, challenges, and workarounds based on industry experience: Common Approach...

thiagoawstest
by Contributor
  • 14082 Views
  • 3 replies
  • 2 kudos

Save file to /tmp

Hello, I have Python code that collects data as JSON and sends it to an S3 bucket, and everything works fine. But when there is a lot of data, it causes a memory overflow. So I want to save it locally, for example in /tmp or dbfs:/tmp, and after sending it to ...
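
One way to keep memory bounded in a situation like this is to write each record to a local temporary file as it is collected and upload the finished file afterwards. The sketch below is only an illustration under assumptions: `spool_records_to_tmp` is a hypothetical helper, the records are assumed to be JSON-serializable dicts, and the S3 upload step is left out.

```python
import json
import os
import tempfile


def spool_records_to_tmp(records, tmp_dir=None):
    """Write an iterable of dicts to a temp file, one JSON document per line.

    Records are consumed one at a time, so the full dataset never has to
    sit in memory. Returns the path of the file that was written.
    """
    # mkstemp creates the file securely; tmp_dir=None uses the system default.
    fd, path = tempfile.mkstemp(suffix=".jsonl", dir=tmp_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record))
            handle.write("\n")
    return path
```

The returned path could then be handed to a file-based uploader (for example the boto3 S3 client's `upload_file`), and the temporary file removed once the upload succeeds.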

Latest Reply
JimBiard
New Contributor III
  • 2 kudos

I am experiencing the same problem. I create a file in /tmp and can verify that it exists. But when an attempt is made to open the file using pyspark, the file is not found. I noticed that the path I used to create the file is /tmp/foobar.parquet and...

2 More Replies
shubham_007
by Contributor III
  • 5570 Views
  • 7 replies
  • 0 kudos

Assistance needed on DQX framework as we are referring GitHub resource but not enough details

Hi Community Experts, I hope this message finds you well. Our team is currently working on enhancing data quality within our Databricks environment, and we are using the Databricks DQX framework for this purpose. We are seeking detailed guidance an...

Latest Reply
Brahmareddy
Esteemed Contributor II
  • 0 kudos

Hi shubham, how are you doing today? It’s great to see your team focusing on data quality using the DQX framework. It’s a solid tool for keeping your data clean and reliable. To get started, I’d suggest beginning with simple checks like NOT NULL, IN R...

6 More Replies
CJOkpala
by New Contributor II
  • 1347 Views
  • 2 replies
  • 0 kudos

Error message while running queries

While running queries, in both SQL and notebooks, we get the error message below: INTERNAL_ERROR: Unexpected error when trying to access the statement result. Missing credentials to access the DBFS root storage container in Azure. The access connector ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @CJOkpala, this error suggests an issue with the credentials needed to access your Azure storage container from Databricks. Let's troubleshoot this methodically since there seems to be a disconnect between your configured access connector and the a...

1 More Replies
gehbiszumeis
by New Contributor II
  • 1603 Views
  • 2 replies
  • 1 kudos

Copy a library into the folder of script ran in workflow job

I have a Python script that is run in a Databricks workflow job task using the git integration. Originally, the repo contained a git submodule with a library (not supported by Databricks). Therefore I need to copy the library repo (which I ...

Latest Reply
gehbiszumeis
New Contributor II
  • 1 kudos

Thank you @lingareddy_Alva for your reply. Is it possible to have the path identification and copying done in a bash init script? I'd like to keep my run file clean, as it is also supposed to run in other environments.

1 More Replies
ibrahim21124
by Databricks Partner
  • 16003 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Job Timeout after 20 minutes

Hello, I have created a job with no timeout-seconds provided, but I am getting Error: Timed out within 20 minutes. I am running the commands below using a Bash@3 task in an ADO pipeline YAML file. The code is given below: task: Bash@3  timeoutIn...

Latest Reply
KavyaKusuma
New Contributor II
  • 1 kudos

I am also facing the same issue. Can you please let me know where to change the default timeout?

3 More Replies
Gilg
by Contributor II
  • 7562 Views
  • 2 replies
  • 0 kudos

Adding column as StructType

Hi Team, just wondering how I can add a column to an existing table. I tried the script below, but it gives me an error: ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '<'(line 1, pos 121) ALTER TABLE table_clone ADD COLUMNS col_name1 STRUC...

Latest Reply
sandeepmankikar
Databricks Partner
  • 0 kudos

To add a STRUCT column to an existing table, use the correct syntax without $ symbols, such as ALTER TABLE table_clone ADD COLUMNS (col_name1 STRUCT<type: STRING, values: ARRAY<STRING>>)
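
The parenthesized form in the reply above can be sketched as a small string builder. This is a hedged illustration only: `add_struct_column_sql` is a hypothetical helper, the table, column, and field names are taken from the reply, and the function does no identifier quoting or validation.

```python
def add_struct_column_sql(table, column, fields):
    """Build an ALTER TABLE ... ADD COLUMNS statement for one STRUCT column.

    `fields` maps each struct field name to its SQL type. The column
    definition is wrapped in parentheses, as in the reply above.
    """
    field_list = ", ".join(f"{name}: {dtype}" for name, dtype in fields.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({column} STRUCT<{field_list}>)"


statement = add_struct_column_sql(
    "table_clone",
    "col_name1",
    {"type": "STRING", "values": "ARRAY<STRING>"},
)
```

In a notebook, assuming an active `spark` session, the resulting string could be run with `spark.sql(statement)`.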

1 More Replies
I-am-Biplab
by New Contributor II
  • 2456 Views
  • 4 replies
  • 4 kudos

Is there a Databricks spark connector for java?

Is there a Databricks Spark connector for Java, just like we have for Snowflake (reference for the Snowflake Spark connector: https://docs.snowflake.com/en/user-guide/spark-connector-use)? Essentially, the use case is to transfer data from S3 to a Databric...

Latest Reply
sandeepmankikar
Databricks Partner
  • 4 kudos

You don't need a separate Spark connector; Databricks natively supports writing to Delta tables using standard Spark APIs. Instead of using JDBC, you can use df.write().format("delta") to efficiently write data from S3 to Databricks tables.
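
For illustration, the same writer chain in PySpark might look like the sketch below (the reply uses the Java form, `df.write().format("delta")`). `write_to_delta_table` is a hypothetical wrapper and the table name in the comment is a placeholder; it assumes `df` is a Spark DataFrame in an environment where Delta is available, as on a Databricks cluster.

```python
def write_to_delta_table(df, table_name, mode="append"):
    """Write a Spark DataFrame to a Delta table with the standard writer API.

    `df` is expected to be a pyspark.sql.DataFrame; `mode` is passed
    straight through to DataFrameWriter.mode (e.g. "append", "overwrite").
    """
    df.write.format("delta").mode(mode).saveAsTable(table_name)


# Example call (placeholder table name, assumes `df` was read from S3):
# write_to_delta_table(df, "my_catalog.my_schema.my_table")
```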

3 More Replies
turagittech
by Contributor
  • 2317 Views
  • 5 replies
  • 1 kudos

Reading different file structures for json files in blob stores

Hi All, we are planning to store some mixed JSON files in a blob store and read them into Databricks. I am wondering whether we should have a container for each structure or whether the various tools in Databricks can successfully read the different types. I ha...

Latest Reply
sandeepmankikar
Databricks Partner
  • 1 kudos

Organize files by schema into subfolders (e.g., /schema_type_a/, /schema_type_b/) in the same container. Avoid putting all JSON types in one folder.
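
A rough sketch of the routing idea follows, assuming each file holds a single top-level JSON object and that a schema can be recognized by a set of required keys. The key sets, the `unclassified` fallback folder, and the `target_path` helper are all hypothetical; only the two folder names come from the reply above.

```python
import json
from pathlib import PurePosixPath

# Hypothetical mapping: required top-level keys -> subfolder name.
SCHEMA_FOLDERS = {
    frozenset({"order_id", "amount"}): "schema_type_a",
    frozenset({"device_id", "reading"}): "schema_type_b",
}


def target_path(container_root, file_name, raw_json):
    """Return the subfolder path a JSON file should land in.

    The file is matched against each known key set; anything that matches
    none of them goes to an 'unclassified' folder for manual review.
    """
    keys = frozenset(json.loads(raw_json).keys())
    for required, folder in SCHEMA_FOLDERS.items():
        if required <= keys:  # all required keys are present
            return str(PurePosixPath(container_root) / folder / file_name)
    return str(PurePosixPath(container_root) / "unclassified" / file_name)
```

With one folder per schema, each reader can point at a single path and apply a single schema, instead of inferring a merged schema across unrelated file types.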

4 More Replies
LearnDB123
by New Contributor
  • 3678 Views
  • 2 replies
  • 0 kudos

Saving a file to /tmp is not working after migration to Unity Catalog

Hi, we recently upgraded our runtime cluster to Unity Catalog, and since then some code that was working fine earlier has been failing. We used to save files to "/tmp/" and then move them from there into our blob storage; however, since the migration t...

Latest Reply
Rahul6
New Contributor II
  • 0 kudos

Hi @filipniziol, could we use volumes for this temporary processing rather than S3?

1 More Replies
utkarshamone
by New Contributor III
  • 1819 Views
  • 1 reply
  • 0 kudos

Internal errors when running SQLs

We are running Databricks on GCP with a classic SQL warehouse. It's on the current version (v 2025.15). We have a pipeline that runs DBT on top of the SQL warehouse. Since the 9th of May, our queries have been failing intermittently with internal errors f...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @utkarshamone, the error messages you've shared ([INTERNAL_ERROR] Query could not be scheduled: HTTP Response code: 503; ExecutorLostFailure ... exited with code 134, sigabrt; Internal error) indicate that your Databricks SQL warehouse o...

I-am-Biplab
by New Contributor II
  • 2824 Views
  • 3 replies
  • 1 kudos

Is there a Databricks spark connector for java?

Is there a Databricks Spark connector for Java, just like we have for Snowflake (reference for the Snowflake Spark connector: https://docs.snowflake.com/en/user-guide/spark-connector-use)? Essentially, the use case is to transfer data from S3 to a Databric...

Latest Reply
Shua42
Databricks Employee
  • 1 kudos

Hey @I-am-Biplab, if you are running locally, it is going to be difficult to tune the performance up that much, but there are a few things you can try: 1. Increase the partitions and batch size as much as your machine will allow. Also, running repartition() coul...

2 More Replies
jeremy98
by Honored Contributor
  • 12621 Views
  • 11 replies
  • 4 kudos

Resolved! ImportError: cannot import name 'AnalyzeArgument' from 'pyspark.sql.udtf'

Hello community, I installed the Databricks extension in my VS Code IDE. How do I fix this error? I created the environment to run my notebooks locally and selected the available remote cluster to execute my notebook. What else is needed? I have this error: ImportError...

Latest Reply
jeremy98
Honored Contributor
  • 4 kudos

@unj1m yes, as Alberto said, you don't need to install pyspark; it is included in your cluster configuration.

10 More Replies
Prajit0710
by New Contributor II
  • 911 Views
  • 1 reply
  • 0 kudos

Resolved! Authentication issue in HiveMetastore

Problem Statement: When I execute the code below as part of the notebook, both manually and in a workflow, it works as expected: df.write.mode("overwrite").format('delta').option('path', ext_path).saveAsTable("tbl_schema.Table_name") but when I integr...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Prajit0710, this is an interesting issue: your Delta table write operation works as expected when run directly, but when executed within a function, the table doesn't get recognized by the Hive metastore. The key difference is likely related to ...
