Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

grazie
by Contributor
  • 2417 Views
  • 3 replies
  • 1 kudos

Azure Databricks, migrating delta table data with CDF on.

We are on Azure Databricks over ADLS Gen2 and have a set of tables and workflows that process data from and between those tables, using change data feeds. (We are not yet using Unity Catalog, and also not Hive metastore, just accessing delta tables f...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @grazie, Moving data between Azure storage accounts while preserving timestamps and ensuring efficient processes can indeed be a challenge. Let’s explore some options to achieve this without resorting to manual, error-prone steps: Azure Databri...

2 More Replies
hafeez
by New Contributor III
  • 1942 Views
  • 2 replies
  • 1 kudos

Resolved! Hive metastore table access control End of Support

Hello, We are using Databricks with Hive metastore and not Unity Catalog. We would like to know if there is any End of Support for Table Access Control with Hive, as this link states that it is legacy: https://docs.databricks.com/en/data-governance/tab...

Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @hafeez, Hive metastore table access control is a legacy data governance model within Databricks. While it is still available, Databricks strongly recommends using the Unity Catalog instead. The Unity Catalog offers a more straightforward and acco...

1 More Replies
Remit
by New Contributor III
  • 2512 Views
  • 2 replies
  • 0 kudos

Resolved! Merge error in streaming case

I have a streaming case where I stream from two sources: source1 and source2. I write two separate streams to pick the data up from the landing area (step 1). Then I write 2 extra streams to apply some transformations in order to give them the same schem...

Data Engineering
MERGE
streaming
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Remit , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution. 

1 More Replies
geertvanhove
by New Contributor III
  • 4198 Views
  • 7 replies
  • 0 kudos

transform a dataframe column as concatenated string

Hello, I have a single-column dataframe and I want to transform the content into a string. E.g. a df containing the rows abc, def, xyz should become "abc, def, xyz". Thanks

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @geertvanhove, I gave you the code with a screenshot.

6 More Replies
Sangram
by New Contributor III
  • 1635 Views
  • 1 reply
  • 0 kudos

Unable to mount ADLS gen2 to databricks file system

I am unable to mount an ADLS Gen2 storage path into the Databricks file system. It is throwing the error unsupported azure scheme: abfss. May I know the reason? Below are the steps that I followed: 1. Create a service principal. 2. Store the service principal's s...

[attachment: Sangram_0-1700274947304.png]
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sangram , Certainly! Let’s troubleshoot the issue with mounting Azure Data Lake Storage Gen2 (ADLS Gen2) into Databricks. Azure Key Vault Permissions: Ensure that the Azure Databricks application has the necessary permissions on the Azure Key Vau...

Erik
by Valued Contributor II
  • 1586 Views
  • 1 reply
  • 0 kudos

Why not enable "decommissioning" in spark?

You can enable "decommissioning" in spark, which causes it to remove work from a worker when it gets a notification from the cloud that the instance goes away (e.g. SPOT instances). This is disabled by default, but it seems like such a no-brainer to...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Erik ,  Enabling decommissioning in Spark is valuable, especially when dealing with cloud environments and transient instances like SPOT. Let’s delve into the reasons behind its default state and potential downsides: Why Not Enabled by Defaul...

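For reference, decommissioning is controlled by a handful of Spark configs (available since Spark 3.1). A sketch of what one might put in the cluster's Spark config to try it, with RDD and shuffle block migration enabled so in-flight data survives the node going away:

```
spark.decommission.enabled true
spark.storage.decommission.enabled true
spark.storage.decommission.rddBlocks.enabled true
spark.storage.decommission.shuffleBlocks.enabled true
```

Block migration needs surviving executors with enough capacity to receive the migrated data, which is one practical reason this is not on by default.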
Erik
by Valued Contributor II
  • 2269 Views
  • 3 replies
  • 0 kudos

Run driver on spot instance

The traditional advice seems to be to run the driver "on demand", and optionally the workers on spot. And this is indeed what happens if one chooses to run with spot instances in Databricks. But I am interested in what happens if we run with a dr...

Latest Reply
Erik
Valued Contributor II
  • 0 kudos

Thanks for your answer @Kaniz_Fatma! Good overview, and I understand that "driver on-demand and the rest on spot" is good general advice. But I am still considering using spot instances for both, and I am left with two concrete questions: 1: Can w...

2 More Replies
hold_my_samosa
by New Contributor II
  • 7276 Views
  • 3 replies
  • 0 kudos

Delta Partition File on Azure ADLS Gen2 Migration

Hello, I am working on a migration project and I am facing an issue while migrating delta tables from Azure ADLS Gen1 to Gen2. So, as per the Microsoft migration prerequisites: File or directory names with only spaces or tabs, ending with a ., containing ...

Data Engineering
azure
datalake
delta
databricks
Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @hold_my_samosa , Could you please explain what exactly is the issue now? What works and what doesn't?  

2 More Replies
The_Demigorgan
by New Contributor
  • 1146 Views
  • 1 reply
  • 0 kudos

Autoloader issue

I'm trying to ingest data from Parquet files using Autoloader. I have my own custom schema and don't want to infer the schema from the Parquet files. During readStream everything is fine, but during writeStream it is somehow inferring the schema from...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @The_Demigorgan, Certainly! When using Autoloader in Databricks for ingesting data from Parquet files, you can enforce your custom schema and avoid schema inference.    Let’s address this issue:   Schema Enforcement: Autoloader allows you to expli...

eric-cordeiro
by New Contributor II
  • 1502 Views
  • 1 reply
  • 0 kudos

Insufficient Permission when writing to AWS Redshift

I'm trying to write a table in AWS Redshift using the following code:try:    (df_source.write        .format("redshift")        .option("dbtable", f"{redshift_schema}.{table_name}")        .option("tempdir", tempdir)        .option("url", url)       ...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @eric-cordeiro, 1. Ensure that the user has the USAGE privilege on the schema where the table resides. You can grant this privilege using the following SQL command: GRANT USAGE ON SCHEMA <schema_name> TO <schema_user>; 2. Since you mentioned havi...

Hoping
by New Contributor
  • 2198 Views
  • 1 reply
  • 0 kudos

Size of each partitioned file (partitioned by default)

When I try a DESCRIBE DETAIL I get the number of files the delta table is partitioned into. How can I check the size of each of these files that make up my entire table? Will I be able to query each partitioned file to understand how they have b...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Hoping, Certainly! Let’s explore how you can check the size of each partitioned file in a Delta table and understand how they are split:   Partitioning in Delta Tables: Delta tables can be partitioned by a specific column. The most commonly used ...

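Note that `DESCRIBE DETAIL` only reports aggregates (`numFiles`, `sizeInBytes`). To see individual file sizes, one option is to list the data files under the table directory. A minimal local sketch, assuming direct filesystem access to the table path (e.g. via a `/dbfs/...` path on Databricks):

```python
import os

def delta_file_sizes(table_path):
    """Map each parquet data file under a Delta table directory to its size
    in bytes, skipping the _delta_log transaction log folder."""
    sizes = {}
    for root, _dirs, files in os.walk(table_path):
        if "_delta_log" in root.split(os.sep):
            continue  # the transaction log is not table data
        for name in files:
            if name.endswith(".parquet"):
                full = os.path.join(root, name)
                sizes[os.path.relpath(full, table_path)] = os.path.getsize(full)
    return sizes
```

One caveat: a plain directory listing also includes files no longer referenced by the current table version (until they are vacuumed), so the sum can exceed `sizeInBytes` from `DESCRIBE DETAIL`.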
Kayla
by Valued Contributor
  • 1310 Views
  • 1 reply
  • 0 kudos

External Table From BigQuery

I'm working on implementing Unity Catalog, and part of that is determining how to handle our BigQuery tables. We need to utilize them to connect to another application, or else we'd stay within regular delta tables on Databricks.The page https://docs...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Kayla, Certainly! Let’s discuss how Unity Catalog can help you manage your data and analytics assets, including BigQuery tables:   What is Unity Catalog? Unity Catalog is Databricks’ unified data, analytics, and AI governance solution on the lake...

amruth
by New Contributor
  • 2186 Views
  • 4 replies
  • 0 kudos

How do I retrieve timestamp data from history in Databricks SQL without using a DELTA table? The data is coming from SAP

I am not using delta tables; my data is from SAP. How do I retrieve the timestamp (history) dynamically from a SAP table using Databricks SQL?

Latest Reply
Dribka
New Contributor III
  • 0 kudos

@amruth If you're working with data from SAP in Databricks and want to retrieve timestamps dynamically from a SAP table, you can utilize Databricks SQL to achieve this. You'll need to identify the specific SAP table that contains the timestamp or his...

3 More Replies
IonFreeman_Pace
by New Contributor III
  • 3551 Views
  • 4 replies
  • 1 kudos

Resolved! First notebook in ML course fails with wrong runtime

Help! I'm trying to run the first notebook in the Scalable MachIne LEarning (SMILE) course: https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark-english/blob/published/ML%2000a%20-%20Spark%20Review.py It fails on the first...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

It means your cluster type has to be an ML runtime. When you create a cluster in Databricks, you can choose between different runtimes. These have different versions (Spark version), but also different types. For your case you need to select the ML menu o...

3 More Replies
pgruetter
by Contributor
  • 1339 Views
  • 2 replies
  • 0 kudos

Streaming problems after Vacuum

Hi all, To read from a large Delta table I'm using readStream, but with a trigger(availableNow=True) as I only want to run it daily. This worked well for an initial load and then incremental loads after that. At some point though, I received an error fro...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @pgruetter , Certainly! Let’s delve into the behavior of readStream in the context of Delta tables and address your questions.   Delta Table Streaming with readStream: When you use readStream to read from a Delta table, it operates in an increment...

1 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group