Data Engineering

Forum Posts

abaschkim
by New Contributor II
  • 859 Views
  • 4 replies
  • 0 kudos

Delta Lake table: large volume due to versioning

I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...
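Delta Lake keeps the data files of old table versions until they are vacuumed, so a streaming table with many small commits can grow quickly. The snippet below is a minimal sketch that only builds the maintenance SQL as strings for a path-based table. `delta_maintenance_sql` is a hypothetical helper, each statement would be passed to `spark.sql()` on the cluster, and the retention values are assumptions to tune. The table properties `delta.logRetentionDuration` and `delta.deletedFileRetentionDuration` and the `VACUUM ... RETAIN n HOURS` syntax are standard Delta Lake features.

```python
def delta_maintenance_sql(table_path: str, retain_hours: int = 168,
                          log_retention: str = "interval 7 days") -> list:
    """Build SQL that limits how much old-version data a path-based Delta table keeps.

    Hypothetical helper: run each returned statement with spark.sql(stmt).
    """
    # Conservative choice for this sketch: never go below Delta's 7-day (168 h)
    # default, because a shorter window can delete files that running readers
    # or streaming queries still need.
    if retain_hours < 168:
        raise ValueError("retain_hours below 168 is unsafe for concurrent readers")
    table = f"delta.`{table_path}`"
    return [
        # Shorter log retention means less time-travel history kept in _delta_log.
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{log_retention}', "
        f"'delta.deletedFileRetentionDuration' = 'interval {retain_hours} hours')",
        # Physically remove data files no longer referenced within the window.
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]


for stmt in delta_maintenance_sql("/data/delta/events"):
    print(stmt)
```

Expired log entries are cleaned up when Delta writes checkpoints, so the log setting takes effect gradually. `VACUUM` is the step that actually frees the data files.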

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Kim Abasch, hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you....

KumarShiv
by New Contributor III
  • 2731 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks issue: assertion failed: Invalid shuffle partition specs:

I have a complex script that consumes more than 100 GB of data and runs some aggregations on it; at the end I simply try to write/display the data from the DataFrame. Then I get the error (assertion failed: Invalid shuffle partition specs: ). Please hel...

DB_Issue
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated but "display" is not, so you can debug by displaying each DataFrame at the end of each cell.

jwilliam
by Contributor
  • 1560 Views
  • 3 replies
  • 7 kudos

Resolved! Is Unity Catalog available in Azure Gov Cloud?

We are using Databricks with the Premium tier in Azure Gov Cloud. We checked the Data section but don't see any option to Create Metastore.

Latest Reply
Kaniz
Community Manager
  • 7 kudos

Hi @John William, we haven't heard from you since the last responses from @Hubert Dudek and @Werner Stinckens, and I was checking back to see if their suggestions helped you. Also, please don't forget to click on the "Select As Best" button whenever ...

Constantine
by Contributor III
  • 1165 Views
  • 4 replies
  • 3 kudos

Error when writing a DataFrame to an S3 location using PySpark

I get an error when writing a DataFrame to an S3 location:

Found invalid character(s) among " ,;{}()\n\t=" in the column names of your

I have gone through all the columns and none of them have any special characters. Any idea how to fix this?
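The following is a minimal sketch for tracking this down. It assumes the offending names contain characters that are easy to miss when reading the column list, such as a trailing space, tab, or newline copied from a CSV header. The character set is taken from the error message. `find_invalid_columns` and `sanitize` are hypothetical helpers that work on any list of names, such as `df.columns`.

```python
# Characters rejected in column names, taken from the error message.
INVALID_CHARS = " ,;{}()\n\t="


def find_invalid_columns(columns):
    """Return (name, sorted offending characters) for each name that contains one."""
    bad = []
    for name in columns:
        hits = sorted({ch for ch in name if ch in INVALID_CHARS})
        if hits:
            bad.append((name, hits))
    return bad


def sanitize(name, replacement="_"):
    """Replace every invalid character in `name` with `replacement`."""
    return name.translate({ord(ch): replacement for ch in INVALID_CHARS})


for name, hits in find_invalid_columns(["id", "total amount", "note\t"]):
    # repr() makes whitespace such as '\t' visible.
    print(repr(name), hits)
```

On the DataFrame itself, `print([repr(c) for c in df.columns])` shows hidden whitespace, and `df.toDF(*[sanitize(c) for c in df.columns])` renames every column in one pass before writing.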

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @John Constantine, we haven't heard from you since the last response from @Emilie Myth, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Other...

Joe_C
by New Contributor
  • 638 Views
  • 3 replies
  • 0 kudos

From what I'm seeing, Databricks doesn't have a DECLARE statement. How can I ... ?

How can I rewrite these statements so they are compatible with Databricks?

DECLARE @DATE_BEGIN_TEST AS DATE = DATEADD(DAY, - 60, GETDATE());
DECLARE @DATE_END_TEST AS DATE = GETDATE();
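Spark SQL has no T-SQL-style `DECLARE`. One common alternative in a Python notebook is to compute the dates in Python and pass them into the query text. The sketch below shows this approach. `date_window` and `build_query` are hypothetical helpers, and the table name `my_table` and the column `event_date` are likewise hypothetical. In pure SQL, the built-in functions `date_sub(current_date(), 60)` and `current_date()` give the same two values.

```python
from datetime import date, timedelta


def date_window(days_back=60, today=None):
    """Return (begin, end) like DATEADD(DAY, -60, GETDATE()) and GETDATE(), as dates."""
    end = today or date.today()
    begin = end - timedelta(days=days_back)
    return begin, end


def build_query(table, begin, end):
    """Hypothetical query; DATE 'yyyy-mm-dd' is a Spark SQL typed date literal."""
    return (f"SELECT * FROM {table} "
            f"WHERE event_date BETWEEN DATE '{begin.isoformat()}' "
            f"AND DATE '{end.isoformat()}'")


begin, end = date_window()
query = build_query("my_table", begin, end)
# In a Databricks notebook: df = spark.sql(query)
print(query)
```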

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Joseph Collins, we haven't heard from you since my last response, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others. Otherwis...

Reza
by New Contributor III
  • 1117 Views
  • 3 replies
  • 1 kudos

Can we order the widgets in Databricks?

I am trying to change the order in which widgets are shown in Databricks, but I cannot. For example, I have two text widgets (start date and end date). Databricks shows "end_date" above "start_date", as the default order is alphabetical. Obviously, ...
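A common workaround is to prefix each name with its intended position. This assumes widgets are sorted alphabetically by name, as the post describes. The sketch below generates such names. `ordered_widget_names` is a hypothetical helper. Zero-padding keeps `10_...` from sorting before `2_...` once there are ten or more widgets.

```python
def ordered_widget_names(names):
    """Prefix each name with its zero-padded 1-based position."""
    width = len(str(len(names)))
    return [f"{i:0{width}d}_{name}" for i, name in enumerate(names, start=1)]


names = ordered_widget_names(["start_date", "end_date"])
print(names)  # ['1_start_date', '2_end_date']
# In a notebook, each would then be created with dbutils.widgets.text(name, "")
# and read back with dbutils.widgets.get(name).
```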

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Reza Rajabi, we haven't heard from you since the last response from @Prabakar Ammeappin, and I was checking back to see if his suggestions helped you. Otherwise, if you have any solution, please share it with the community as it can be helpfu...

Blake
by New Contributor III
  • 1603 Views
  • 4 replies
  • 1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that Delta Live pipelines are GA, we are looking to convert our existing processes to leverage them. One thing that remains unclear is how to populate new Delta Live tables with historical data. Currently we are looking to use CDC by leveraging create...

Latest Reply
Blake
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing Delta tables that contain historical data into a Delta Live pipeline. We ended up changing the source data to pull from the existing bronze t...

Trung
by Contributor
  • 1299 Views
  • 4 replies
  • 7 kudos

Cannot start a cluster on DB Community Edition

Please help me understand why I cannot start a cluster on DB Community Edition. It shows the error below:

[Image: screenshot of the error]
Latest Reply
Prabakar
Esteemed Contributor III
  • 7 kudos

Hi @trung nguyen, have you tried reducing the cluster size? The Community Edition has limitations. Also, please share the cluster config. Such errors are normally generated when we reach or exceed the limits set on the cloud.

dtabass
by New Contributor III
  • 1951 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me:

%sql
SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10;

I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey, hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

arda_123
by New Contributor III
  • 4382 Views
  • 3 replies
  • 4 kudos

Resolved! How to programmatically "clear state & cell outputs" in a Databricks notebook?

I am building a dashboard, and at the end of it I am using a widget to reset it. At that point I want all the outputs to be removed. Is there a way to do this in Python?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can build your dashboard in the SQL experience instead. There is a nice API to manage them: https://docs.databricks.com/sql/api/queries-dashboards.html

Regarding "clear state and cell outputs": when you run a notebook as a job, state is not saved in the no...

StephanieRivera
by Valued Contributor II
  • 916 Views
  • 1 replies
  • 1 kudos

Resolved! How to find my workspace id?

My Solutions Architect is asking for my workspace ID. I do not know where to look for it. All I see is the user settings menu.

Latest Reply
PeteStern
New Contributor III
  • 1 kudos

The workspace ID is usually in the URL of the workspace. For example:

https://myworkspace.com/?o=12345

The workspace ID in this case is 12345. You can also just share the URL with your solutions architect.
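If you need the ID in a script rather than reading it by eye, you can take it from the `o` query parameter of the URL. The following is a minimal sketch using only the standard library. `workspace_id_from_url` is a hypothetical helper, and it assumes the URL has the `?o=<id>` form described in the reply.

```python
from urllib.parse import parse_qs, urlparse


def workspace_id_from_url(url):
    """Return the `o` query parameter (the workspace ID), or None if absent."""
    values = parse_qs(urlparse(url).query).get("o")
    return values[0] if values else None


print(workspace_id_from_url("https://myworkspace.com/?o=12345"))  # 12345
```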

auser85
by New Contributor III
  • 1284 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a Delta table with a column like

keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),

my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty, does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Thanks!

User16869510359
by Esteemed Contributor
  • 2455 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premise.

Latest Reply
mj2022
New Contributor III
  • 2 kudos

I followed https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its own custom from_avro method to use with the Kafka schema registry, but I am not able to find the spark-avro j...

Herkimer
by New Contributor II
  • 563 Views
  • 0 replies
  • 0 kudos

intermittent connection error

I am running dbsqlcli on Windows 10. I have put together the attached cmd file to pull the identity column data from a series of our tables into individual CSVs so I can upload them to a PostgreSQL DB to do a comparison of each table to those in the ...
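One generic way to ride out intermittent connection drops in a batch export like this is to retry each table's export with exponential backoff. The sketch below is a plain-Python retry wrapper, and `run_with_retries` is hypothetical. Wiring it to dbsqlcli is an assumption about how you would drive the CLI. For example, you could wrap `subprocess.run([...], check=True)` with the same arguments your cmd file already uses and retry on `subprocess.CalledProcessError`.

```python
import time


def run_with_retries(action, attempts=3, base_delay=1.0,
                     retry_on=(ConnectionError,), sleep=time.sleep):
    """Call action(); after a retryable error wait base_delay * 2**n seconds and retry."""
    for attempt in range(attempts):
        try:
            return action()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without actually waiting. Retrying per table means one dropped connection does not force a rerun of the whole export.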
