Data Engineering

Forum Posts

by abaschkim • New Contributor II

05-30-2022 7:31:29 AM

859 Views
4 replies
0 kudos

Delta Lake table: large volume due to versioning

I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...

Latest Reply

Anonymous
Not applicable

07-29-2022 9:38:25 AM

0 kudos

Hey there @Kim Abasch Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you....

by KumarShiv • New Contributor III

07-27-2022 12:41:49 AM

2731 Views
5 replies
11 kudos

Resolved! Databricks Issue:- assertion failed: Invalid shuffle partition specs:

I have a complex script that consumes more than 100 GB of data and runs some aggregations on it; at the end I simply try to write/display data from the DataFrame. Then I get the error (assertion failed: Invalid shuffle partition specs: ). Pls hel...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

07-27-2022 6:10:11 AM

11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated but "display" is not, so you can debug by displaying each dataframe at the end of each cell.

by jwilliam • Contributor

07-27-2022 4:56:36 AM

1560 Views
3 replies
7 kudos

Resolved! Has Unity Catalog been available in Azure Gov Cloud?

We are using Databricks with Premium Tier in Azure Gov Cloud. We checked the Data section but don't see any option to Create Metastore.

Latest Reply

Kaniz
Community Manager

07-28-2022 10:59:04 PM

7 kudos

Hi @John William, we haven't heard from you on the last response from @Hubert Dudek and @Werner Stinckens, and I was checking back to see if their suggestions helped you. Also, please don't forget to click on the "Select As Best" button whenever ...

by Constantine • Contributor III

06-02-2022 3:18:34 PM

1165 Views
4 replies
3 kudos

Error when writing dataframe to s3 location using PySpark

I get an error when writing a dataframe to an S3 location: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your ... I have gone through all the columns and none of them have any special characters. Any idea how to fix this?
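One way to double-check the column names is to test them against the exact character set quoted in the error message, since a trailing space or a tab is easy to miss by eye. The following is a minimal sketch in plain Python; the column names and the helper functions are hypothetical, not taken from the original post.

```python
import re

# Characters listed in the error message: space , ; { } ( ) newline tab =
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def find_invalid_columns(columns):
    """Return the column names that contain any of the rejected characters."""
    return [name for name in columns if INVALID_CHARS.search(name)]


def sanitize_column(name, replacement="_"):
    """Replace every rejected character in a column name with `replacement`."""
    return INVALID_CHARS.sub(replacement, name)


# Hypothetical column names; the second one ends with a space.
columns = ["user_id", "order total ", "price(usd)"]
print(find_invalid_columns(columns))
print([sanitize_column(c) for c in columns])
```

With a PySpark dataframe, the same helpers could presumably be applied to df.columns, and the cleaned names passed to df.toDF(*new_names) before writing.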

Latest Reply

Kaniz
Community Manager

06-09-2022 1:18:51 AM

3 kudos

Hi @John Constantine, We haven’t heard from you on the last response from @Emilie Myth , and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Other...

by Joe_C • New Contributor

06-02-2022 10:34:03 AM

638 Views
3 replies
0 kudos

From what I'm seeing Databricks doesn't have DECLARE function, how can I ... ?

How can I re-write this statement in a way that is compatible with Databricks? DECLARE @DATE_BEGIN_TEST AS DATE = DATEADD(DAY, - 60, GETDATE()); DECLARE @DATE_END_TEST AS DATE = GETDATE();
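One possible approach, when the query is issued from a Python notebook cell, is to compute the two dates in Python and substitute them into the SQL text. This is only a sketch: the table name some_table, the column event_date, and the helper date_window are made up for illustration.

```python
from datetime import date, timedelta


def date_window(days_back=60, today=None):
    """Return (begin, end) dates, mirroring DATEADD(DAY, -60, GETDATE())."""
    end = today or date.today()
    begin = end - timedelta(days=days_back)
    return begin, end


date_begin_test, date_end_test = date_window(60)

# Hypothetical table and column names, used only to show the substitution.
query = (
    "SELECT * FROM some_table "
    f"WHERE event_date BETWEEN DATE'{date_begin_test}' AND DATE'{date_end_test}'"
)
print(query)
```

If the values are only needed inside a single query, the built-in Spark SQL expressions date_sub(current_date(), 60) and current_date() may be enough without any variables.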

Latest Reply

Kaniz
Community Manager

06-09-2022 12:38:25 AM

0 kudos

Hi @Joseph Collins, we haven't heard from you on the last response from me, and I was checking back to see if you have a resolution yet. If you have any solution, please do share it with the community as it can be helpful to others. Otherwis...

by Reza • New Contributor III

06-02-2022 8:25:32 AM

1117 Views
3 replies
1 kudos

Can we order the widgets in Databricks?

I am trying to order the way that widgets are shown in Databricks, but I cannot. For example, I have two text widgets (start date and end date). Databricks shows "end_date" before "start_date" on top, as the default order is alphabetical. Obviously, ...
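If the display order really is alphabetical by widget name, as described above, one workaround is to prefix each name with a zero-padded index so that the alphabetical order coincides with the intended one. The sketch below is plain Python and only generates the names; the helper ordered_widget_names is hypothetical, and whether the prefix is acceptable in the widget name is a judgment call.

```python
def ordered_widget_names(names):
    """Prefix each name with a zero-padded index so that alphabetical
    order matches the order of `names`."""
    width = len(str(len(names)))
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(names, start=1)]


wanted_order = ["start_date", "end_date"]
widget_names = ordered_widget_names(wanted_order)
print(widget_names)

# Sorting alphabetically now preserves the intended order.
assert sorted(widget_names) == widget_names
```

In a notebook, the generated names would then be passed as the name argument of dbutils.widgets.text, with the unprefixed text as the label.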

Latest Reply

Kaniz
Community Manager

06-09-2022 12:49:23 AM

1 kudos

Hi @Reza Rajabi , We haven’t heard from you on the last response from @Prabakar Ammeappin , and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please do share that with the community as it can be helpfu...

by Blake • New Contributor III

06-01-2022 11:23:57 AM

1603 Views
4 replies
1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that delta live pipeline is GA we are looking to convert our existing processes to leverage it. One thing that remains unclear is how to populate new delta live tables with historical data? Currently we are looking to use CDC by leveraging create...

Latest Reply

Blake
New Contributor III

06-15-2022 6:53:04 AM

1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing delta tables that contain historical data into a delta live pipeline. We ended up changing the source data to pull from the existing bronze t...

by Trung • Contributor

06-01-2022 2:59:08 AM

1299 Views
4 replies
7 kudos

can not start cluster for DB community version

Please show me why I cannot start the cluster in the DB community version. It shows the error below:

Latest Reply

Prabakar
Esteemed Contributor III

06-01-2022 5:23:57 AM

7 kudos

Hi @trung nguyen, have you tried reducing the cluster size? With the community version, we have limitations. Also, please share the cluster config. Normally such errors are generated when we reach/exceed the limit set on the cloud.

by dtabass • New Contributor III

05-29-2022 4:51:15 PM

1951 Views
3 replies
0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me: %sql SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10; I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply

Anonymous
Not applicable

07-28-2022 10:32:22 AM

0 kudos

Hey there @Michael Carey, hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

by arda_123 • New Contributor III

07-08-2022 7:43:10 AM

4382 Views
3 replies
4 kudos

Resolved! How to programmatically "clear state & cell outputs" in a Databricks notebook?

I am building a dashboard and at the end of it I am using a widget to reset it. At that time I want all the outputs to be removed. Is there a way to do this in Python?

Latest Reply

Hubert-Dudek
Esteemed Contributor III

07-08-2022 10:08:57 AM

4 kudos

You can build your dashboard in the SQL experience instead. There is a nice API to manage them: https://docs.databricks.com/sql/api/queries-dashboards.html Regarding "clear state and cell outputs": when you run a notebook as a job, state is not saved in the no...

by StephanieRivera • Valued Contributor II

07-27-2022 9:20:42 AM

916 Views
1 reply
1 kudos

Resolved! How to find my workspace id?

My Solutions Architect is asking for my workspaceID. I do not know where to look for it. All I see is the user settings menu.

Latest Reply

PeteStern
New Contributor III

07-28-2022 2:50:04 AM

1 kudos

The workspaceID is usually in the URL of the workspace. For example: https://myworkspace.com/?o=12345 The workspace id in this case is 12345. You can also just share the URL with your solutions architect.
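For anyone who wants to pull the ID out of the URL in a script, here is a small sketch using only the Python standard library. It assumes the ID is carried in the o query parameter, as in the example URL above; the function name workspace_id_from_url is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def workspace_id_from_url(url):
    """Return the value of the `o` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("o")
    return values[0] if values else None


print(workspace_id_from_url("https://myworkspace.com/?o=12345"))
```

The function returns None when the URL has no o parameter, which is worth handling because the answer only says the ID is "usually" in the URL.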

by auser85 • New Contributor III

05-26-2022 10:46:20 AM

1284 Views
2 replies
1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a delta table with a column like keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply

Anonymous
Not applicable

07-27-2022 10:29:19 AM

1 kudos

Hey there @Andrew Fogarty Does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Thanks!

by User16869510359 • Esteemed Contributor

06-23-2021 11:39:07 PM

2455 Views
2 replies
2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premise.

Latest Reply

mj2022
New Contributor III

07-27-2022 10:10:10 AM

2 kudos

I am following https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its custom from_avro method to use with the Kafka schema registry, but I am not able to find spark-avro j...

by Herkimer • New Contributor II

07-27-2022 9:46:52 AM

563 Views
0 replies
0 kudos

intermittent connection error

I am running dbsqlcli on Windows 10. I have put together the attached cmd file to pull the identity column data from a series of our tables into individual CSVs so I can upload them to a PostgreSQL DB to do a comparison of each table to those in the ...
