Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

soumiknow
by Contributor II
  • 1803 Views
  • 0 replies
  • 0 kudos

data not inserting in 'overwrite' mode - Value has type STRUCT which cannot be inserted into column

We have the following code which we used to load data into a BigQuery table after reading the parquet files from Azure Data Lake Storage: df.write.format("bigquery").option("parentProject", gcp_project_id).option("table", f"{bq_table_name}").option("te...

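A rough PySpark sketch of one way around the STRUCT error described above, assuming the post's df, gcp_project_id and bq_table_name already exist; the truncated option is assumed to be temporaryGcsBucket, and top-level struct columns are serialized to JSON strings before the overwrite (a hypothetical workaround, not the poster's final fix):

from pyspark.sql import functions as F

# Flatten top-level STRUCT columns to JSON strings so the BigQuery sink
# no longer receives STRUCT values.
struct_cols = [fld.name for fld in df.schema.fields if fld.dataType.typeName() == "struct"]
flat_df = df
for c in struct_cols:
    flat_df = flat_df.withColumn(c, F.to_json(F.col(c)))

(flat_df.write.format("bigquery")
    .option("parentProject", gcp_project_id)
    .option("table", bq_table_name)
    .option("temporaryGcsBucket", "tmp-bucket")   # assumed option; truncated in the post
    .mode("overwrite")
    .save())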
ChingizK
by New Contributor III
  • 2871 Views
  • 2 replies
  • 1 kudos

Hyperopt Error: There are no evaluation tasks, cannot return argmin of task losses.

The trials succeed when the cell in the notebook is executed manually. However, the same process fails when executed as a Workflow: the error simply says that there's an issue with the objective function. However, how can that be the case if I'm able t...

Attachments: 01.png, 02.png
Data Engineering
hyperopt
Workflows
Latest Reply
LibertyEnergy
New Contributor II
  • 1 kudos

I have this exact same issue! Can anyone offer guidance?

1 More Replies
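For comparison, a minimal self-contained Hyperopt sketch; this error usually appears when no trial returns a loss with STATUS_OK, and when the notebook runs as a Workflow everything the objective needs must be defined inside the job run rather than carried over from an interactive session. The toy objective and search space below are placeholders, not the poster's code.

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

def objective(params):
    loss = (params["x"] - 3) ** 2               # toy loss; stand-in for real training/eval
    return {"loss": loss, "status": STATUS_OK}  # STATUS_OK is required for argmin to work

space = {"x": hp.uniform("x", -10, 10)}
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20, trials=trials)
print(best)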
ramravi
by Contributor II
  • 24469 Views
  • 3 replies
  • 0 kudos

Spark is case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and you try to select eit...

Spark is case sensitive? Spark is not case sensitive by default. If you have the same column name in different cases (Name, name) and you try to select either "Name" or "name", you will get a column ambiguity error. There is a way to handle this issue b...

Latest Reply
zerospeed
New Contributor II
  • 0 kudos

Hi, I had similar issues with parquet files when trying to query Athena. The fix was that I had to inspect the parquet file, since it contained columns such as "Name" and "name", which the AWS crawler / Athena would interpret as a duplicate column since it would se...

2 More Replies
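A small illustration of the workaround the excerpt alludes to: turning on Spark's case-sensitive column resolution so "Name" and "name" stop colliding. Here spark is the notebook's SparkSession and the sample data is made up.

spark.conf.set("spark.sql.caseSensitive", "true")

df = spark.createDataFrame([(1, 2)], ["Name", "name"])
df.select("Name").show()   # unambiguous only while caseSensitive is true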
Nagarathna
by New Contributor II
  • 727 Views
  • 3 replies
  • 0 kudos

How to write trillions of rows to a Unity Catalog table.

Hi team, I have a dataframe with 1269408570800 rows. I need to write this data to a Unity Catalog table. How can I upload such a huge quantity of data? I'm using Databricks runtime 15.4 LTS with 4 workers; each worker type is i3.4xlarge and driver of type...

Data Engineering
data upload
Unity Catalog
Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @Nagarathna @Lucas_TBrabo, I'd like to share my opinion and some tips that might help: 1. You should avoid filtering by spark_partition_id because it can create skewed partitions; use repartition() instead, and Spark can optimize t...

2 More Replies
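A rough sketch of the repartition-then-write pattern the reply points toward, assuming the source DataFrame df already exists; the partition count and the catalog.schema.table name are placeholders to tune, not the poster's values.

(df.repartition(2000)                          # tune to cluster size and data volume
   .write
   .mode("append")
   .saveAsTable("main.analytics.huge_table"))  # Unity Catalog three-level name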
chsoni12
by New Contributor II
  • 782 Views
  • 2 replies
  • 1 kudos

Impact of VACUUM Operations on Shallow Clones in Databricks

I performed a POC where I had to check whether we can create a new Delta table that contains only a particular version of a normal Delta table's data, without copying the data, and whether, if we make changes or perform any operation (insert/delete/truncate/records)...

Latest Reply
chsoni12
New Contributor II
  • 1 kudos

Thanks, it really helps me a lot. But there is also an issue with shallow clone: we can only clone the full table data or a particular Delta version's data (using a timestamp/version) from the normal table, but we cannot clone the table data b...

1 More Replies
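For reference, a hedged example of the kind of version-pinned shallow clone the POC describes, run through spark.sql; the table names and version number are illustrative only.

spark.sql("""
  CREATE OR REPLACE TABLE demo.clone_v5
  SHALLOW CLONE demo.source_table VERSION AS OF 5
""")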
ep208
by New Contributor
  • 681 Views
  • 1 replies
  • 0 kudos

How to resolve Location Overlap

Hi, I am trying to ingest abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table, but when writing the table schema in the YAMLs, I incorrectly wrote this table into another Unity Catalog table: ---kind: SinkDeltaTable metadata: name:...

Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @ep208, from the error message you're seeing (LOCATION_OVERLAP), it seems that Unity Catalog is still tracking a table or volume that points to the same path you're now trying to reuse: abfss://datalake@datalakename.dfs.core.windows.net/Delta/Proj...

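A hedged sketch of the cleanup the reply describes: check which object still claims the path, then drop the stale registration so the location can be reused. The catalog and schema names below are assumptions, not the poster's real ones.

# The path under dispute (from the post):
# abfss://datalake@datalakename.dfs.core.windows.net/Delta/Project1/sales_table

# Confirm that a suspect table really points at the conflicting location
spark.sql("DESCRIBE DETAIL wrong_catalog.wrong_schema.sales_table") \
     .select("location").show(truncate=False)

# If it does, drop the stale registration (for external tables the files stay in place)
spark.sql("DROP TABLE IF EXISTS wrong_catalog.wrong_schema.sales_table")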
KG_777
by New Contributor
  • 962 Views
  • 1 replies
  • 1 kudos

Resolved! Capturing deletes for SCD2 using apply changes or apply as delete decorator

We're looking to implement SCD2 for tables in our lakehouse, and we need to keep track of records that are being deleted in the source. Does anyone have a similar use case, and can they outline some of the challenges they faced and workarounds they imp...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 1 kudos

Hi @KG_777, tracking deleted records in an SCD Type 2 implementation for a lakehouse architecture is indeed a challenging but common requirement. Here's an overview of approaches, challenges, and workarounds based on industry experience: Common Approach...

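As one concrete illustration of the decorator-based approach the thread discusses, a hedged Delta Live Tables sketch using apply_changes with apply_as_deletes; the source table, key, sequence column, and the operation = 'DELETE' convention are assumptions about the CDC feed, and this only runs inside a DLT pipeline.

import dlt
from pyspark.sql.functions import col, expr

dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_cdc_feed",                    # CDC source defined elsewhere in the pipeline
    keys=["customer_id"],
    sequence_by=col("event_ts"),
    apply_as_deletes=expr("operation = 'DELETE'"),  # rows flagged DELETE close out the current version
    stored_as_scd_type=2,
)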
thiagoawstest
by Contributor
  • 10679 Views
  • 3 replies
  • 1 kudos

Save file to /tmp

Hello, I have Python code that collects data in JSON and sends it to an S3 bucket; everything works fine. But when there is a lot of data, it causes memory overflow. So I want to save locally, for example in /tmp or dbfs:/tmp, and after sending it to ...

Latest Reply
JimBiard
New Contributor II
  • 1 kudos

I am experiencing the same problem. I create a file in /tmp and can verify that it exists. But when an attempt is made to open the file using pyspark, the file is not found. I noticed that the path I used to create the file is /tmp/foobar.parquet and...

2 More Replies
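A small sketch of the path distinction behind this reply: open() writes to the driver's local /tmp, which Spark executors cannot see, while writing through the /dbfs mount puts the file on DBFS where spark.read can find it. File names are placeholders, and the /dbfs FUSE mount is assumed to be available on the cluster.

import json

payload = {"a": 1}

with open("/dbfs/tmp/foobar.json", "w") as f:     # lands on DBFS via the FUSE mount
    json.dump(payload, f)

df = spark.read.json("dbfs:/tmp/foobar.json")     # same file, addressed with a DBFS URI
df.show()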
shubham_007
by Contributor III
  • 3038 Views
  • 7 replies
  • 0 kudos

Assistance needed on the DQX framework; we are referring to the GitHub resource but there are not enough details

Hi Community Experts, I hope this message finds you well. Our team is currently working on enhancing data quality within our Databricks environment, and we are utilizing the Databricks DQX framework for this purpose. We are seeking detailed guidance an...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi Shubham, how are you doing today? It's great to see your team focusing on data quality using the DQX framework; it's a solid tool for keeping your data clean and reliable. To get started, I'd suggest beginning with simple checks like NOT NULL, IN R...

6 More Replies
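This is not the DQX API, but a plain PySpark illustration of the two rule types the reply mentions (NOT NULL and IN RANGE), flagging rows that fail either check; the DataFrame and column names are made up.

from pyspark.sql import functions as F

checked = (df
    .withColumn("check_id_not_null", F.col("id").isNotNull())
    .withColumn("check_amount_in_range", F.col("amount").between(0, 10000)))

bad_rows = checked.filter(~F.col("check_id_not_null") | ~F.col("check_amount_in_range"))
bad_rows.show()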
CJOkpala
by New Contributor II
  • 592 Views
  • 2 replies
  • 0 kudos

Error message while running queries

While running queries, both in SQL and in notebooks, we get the error message below: INTERNAL_ERROR: Unexpected error when trying to access the statement result. Missing credentials to access the DBFS root storage container in Azure. The access connector ...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @CJOkpala, this error suggests an issue with the credentials needed to access your Azure storage container from Databricks. Let's troubleshoot this methodically, since there seems to be a disconnect between your configured access connector and the a...

1 More Replies
gehbiszumeis
by New Contributor II
  • 469 Views
  • 2 replies
  • 1 kudos

Copy a library into the folder of a script run in a workflow job

I have a Python script which gets run in a Databricks workflow job task using the Git integration. Originally, the repo contained a Git submodule with a library (not supported by Databricks). Therefore I need to copy the library repo (which I ...

Latest Reply
gehbiszumeis
New Contributor II
  • 1 kudos

Thank you @lingareddy_Alva for your reply. Is it possible to have the path identification and copying done in a bash init script? I'd like to keep my run file clean, as it is supposed to also run on other environments.

1 More Replies
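For the copying step itself, a hedged Python-side sketch (the poster ultimately asks about a bash init script instead): locate the running script's folder and copy the library repo next to it. The source path and library name are placeholders.

import os
import shutil

script_dir = os.path.dirname(os.path.abspath(__file__))    # folder of the job's script
lib_src = "/Workspace/Repos/some_user/my_library"           # assumed checkout location
shutil.copytree(lib_src, os.path.join(script_dir, "my_library"), dirs_exist_ok=True)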
ibrahim21124
by New Contributor III
  • 9391 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks Job Timeout after 20 minutes

Hello, I have created a job with no timeout-seconds provided, but I am getting "Error: Timed out within 20 minutes". I am running the below commands using a Bash@3 task in an ADO pipeline YAML file. The code for the same is given below: task: Bash@3 timeoutIn...

Latest Reply
KavyaKusuma
New Contributor II
  • 1 kudos

I am also facing the same issue. Can you please let me know where to change the default timeout?

3 More Replies
Gilg
by Contributor II
  • 5329 Views
  • 2 replies
  • 0 kudos

Adding column as StructType

Hi Team, just wondering, how can I add a column to an existing table? I tried the below script but it gives me an error: ParseException: [PARSE_SYNTAX_ERROR] Syntax error at or near '<' (line 1, pos 121). ALTER TABLE table_clone ADD COLUMNS col_name1 STRUC...

Latest Reply
sandeepmankikar
Contributor
  • 0 kudos

To add a STRUCT column to an existing table, use the correct syntax without $ symbols, such as ALTER TABLE table_clone ADD COLUMNS (col_name1 STRUCT<type: STRING, values: ARRAY<STRING>>)

1 More Replies
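The corrected statement from the reply, wrapped in spark.sql so it can be run from a notebook cell; the table and field names follow the thread.

spark.sql("""
  ALTER TABLE table_clone
  ADD COLUMNS (col_name1 STRUCT<type: STRING, values: ARRAY<STRING>>)
""")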
I-am-Biplab
by New Contributor II
  • 922 Views
  • 4 replies
  • 4 kudos

Is there a Databricks Spark connector for Java?

Is there a Databricks Spark connector for Java, just like we have for Snowflake (reference for the Snowflake Spark connector: https://docs.snowflake.com/en/user-guide/spark-connector-use)? Essentially, the use case is to transfer data from S3 to a Databric...

Latest Reply
sandeepmankikar
Contributor
  • 4 kudos

You don't need a separate Spark connector; Databricks natively supports writing to Delta tables using standard Spark APIs. Instead of using JDBC, you can use df.write().format("delta") to efficiently write data from S3 to Databricks tables.

3 More Replies
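A short PySpark sketch of the pattern the reply describes (the equivalent Dataset API calls exist in Java/Scala): read from S3 and write straight to a Delta table, with no separate connector needed. The bucket path and target table name are placeholders.

df = spark.read.parquet("s3://my-bucket/input/")   # assumed source path
df.write.format("delta").mode("append").saveAsTable("main.staging.s3_ingest")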
