Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ramya
by New Contributor III
  • 18733 Views
  • 4 replies
  • 3 kudos

Resolved! Databricks REST API

Hi, I am having an issue accessing the Databricks API 2.0/workspace/mkdirs through Python. I am using the below Azure method to generate the access token. I am not sure why I am getting a 404; any suggestions? token_credential = DefaultAzureCredential() sc...

Latest Reply
Ramya
New Contributor III
  • 3 kudos

Yes, that is correct! It worked. Thanks.

3 More Replies
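For the workspace/mkdirs thread above, here is a minimal sketch (not the poster's exact code) of acquiring an Azure AD token and calling the endpoint; the workspace URL and folder path are placeholders, and it assumes the azure-identity and requests packages are installed:

```python
import requests
from azure.identity import DefaultAzureCredential

# 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d is the well-known Azure AD resource ID
# for Azure Databricks; tokens must be requested against it.
credential = DefaultAzureCredential()
token = credential.get_token("2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default").token

workspace_url = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder

resp = requests.post(
    f"{workspace_url}/api/2.0/workspace/mkdirs",
    headers={"Authorization": f"Bearer {token}"},
    json={"path": "/Shared/my_new_folder"},  # placeholder path
)
resp.raise_for_status()  # a 404 often indicates a wrong workspace URL or endpoint path
```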
Dineshkumar_Raj
by New Contributor
  • 2922 Views
  • 2 replies
  • 1 kudos

Why does the job running time not match the command execution time in Databricks?

I have an Azure Databricks job that is triggered from ADF via an API call. I want to see why the job has been taking n minutes to complete its tasks. In the job execution results, the job execution time says 15 minutes, but the individual cells/commands d...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @DineshKumar! Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Cheers!

1 More Replies
abaschkim
by New Contributor II
  • 2668 Views
  • 4 replies
  • 0 kudos

Delta Lake table: large volume due to versioning

I have set up a Spark standalone cluster and use Spark Structured Streaming to write data from Kafka to multiple Delta Lake tables - simply stored in the file system. So there are multiple writes per second. After running the pipeline for a while, I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Kim Abasch! Hope all is well! Just wanted to check in on whether you were able to resolve your issue; if so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you....

3 More Replies
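The thread above does not show a final resolution, but a common way to keep Delta table volume down when versions accumulate is to tighten retention and vacuum periodically. A rough sketch, assuming it runs in a Databricks notebook where spark is defined; the table path and retention values are illustrative, not recommendations:

```python
# Illustrative only: shorten log/file retention, then remove unreferenced files.
table_path = "/mnt/datalake/events_delta"  # placeholder path

spark.sql(f"""
  ALTER TABLE delta.`{table_path}` SET TBLPROPERTIES (
    'delta.logRetentionDuration' = 'interval 7 days',
    'delta.deletedFileRetentionDuration' = 'interval 7 days'
  )
""")

# Deletes data files no longer referenced by any version within the retention window.
spark.sql(f"VACUUM delta.`{table_path}` RETAIN 168 HOURS")
```

Note that vacuuming removes the ability to time travel to versions older than the retention window.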
KumarShiv
by New Contributor III
  • 5538 Views
  • 5 replies
  • 11 kudos

Resolved! Databricks issue: assertion failed: Invalid shuffle partition specs

I have a complex script which consumes more than 100 GB of data, performs some aggregations on it, and at the end simply tries to write/display data from a DataFrame. Then I get the error (assertion failed: Invalid shuffle partition specs:). Please hel...

[attachment: DB_Issue]
Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 11 kudos

Please use display(df_FinalAction). Spark is lazily evaluated but display() is not, so you can debug by displaying each DataFrame at the end of each cell.

4 More Replies
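To illustrate the accepted suggestion above, the debugging pattern is simply to force evaluation of each intermediate DataFrame with display() so the failing step can be isolated; the DataFrame names and transformations below are made up:

```python
# Hypothetical pipeline: display() evaluates eagerly in Databricks, so putting one
# at the end of each cell pinpoints which transformation triggers the assertion.
df_raw = spark.read.table("source_table")        # placeholder source
df_agg = df_raw.groupBy("key").count()           # placeholder aggregation
display(df_agg)

df_final = df_agg.filter("count > 100")          # placeholder final step
display(df_final)
```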
Constantine
by Contributor III
  • 2416 Views
  • 2 replies
  • 3 kudos

Error when writing a DataFrame to an S3 location using PySpark

I get an error when writing a DataFrame to an S3 location: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your... I have gone through all the columns and none of them have any special characters. Any idea how to fix this?

Latest Reply
Emilie
New Contributor II
  • 3 kudos

I got this error when I was running a query given to me, and the author didn't have aliases on aggregates. Something like sum(dollars_spent) needed an alias: sum(dollars_spent) as sum_dollars_spent.

1 More Replies
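A small sketch of the fix described in the reply above: an unaliased aggregate produces a column literally named "sum(dollars_spent)", and the parentheses trip the invalid-character check on write. Table, column, and path names here are made up:

```python
# Aliasing the aggregate keeps '(' and ')' out of the output column name.
df = spark.sql("""
  SELECT
    customer_id,
    sum(dollars_spent) AS sum_dollars_spent  -- alias avoids ( ) in the column name
  FROM purchases
  GROUP BY customer_id
""")

df.write.format("delta").mode("overwrite").save("s3://my-bucket/aggregates/")  # placeholder path
```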
Reza
by New Contributor III
  • 2264 Views
  • 2 replies
  • 1 kudos

Can we order the widgets in Databricks?

I am trying to order the way that widgets are shown in Databricks, but I cannot. For example, I have two text widgets (start date and end date). Databricks shows "end_date" before "start_date" on top, as the default order is alphabetical. Obviously, ...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @Reza Rajabi, this is a known issue and we have a feature request to fix it. I do not have an ETA on when this feature will be available. For now, to avoid the widgets being in alphabetical order, you need to use a prefix like 1, 2, 3... or A, B...

1 More Replies
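A quick illustration of the prefix workaround from the reply above; the widget names and default values are just examples:

```python
# Widgets render alphabetically, so numeric prefixes control the display order.
dbutils.widgets.text("1_start_date", "2023-01-01", "Start date")
dbutils.widgets.text("2_end_date", "2023-12-31", "End date")

start_date = dbutils.widgets.get("1_start_date")
end_date = dbutils.widgets.get("2_end_date")
```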
blakedwb
by New Contributor III
  • 5982 Views
  • 2 replies
  • 1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that Delta Live Tables pipelines are GA, we are looking to convert our existing processes to leverage them. One thing that remains unclear is how to populate new Delta Live tables with historical data. Currently we are looking to use CDC by leveraging create...

Latest Reply
blakedwb
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing Delta tables that contain historical data into a Delta Live pipeline. We ended up changing the source data to pull from the existing bronze t...

1 More Replies
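A rough sketch of the approach described in the reply above, reading an existing bronze Delta table into a Delta Live Tables pipeline so historical rows flow through. The table names are placeholders, and this assumes the code runs inside a DLT pipeline where the dlt module is available:

```python
import dlt
from pyspark.sql import functions as F


@dlt.table(name="silver_events", comment="Backfilled from an existing bronze table")
def silver_events():
    # Read the pre-existing bronze Delta table directly so its historical data
    # becomes the starting point of the pipeline's silver table.
    return (
        spark.read.table("bronze.events")  # placeholder bronze table
        .withColumn("ingested_at", F.current_timestamp())
    )
```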
Trung
by Contributor
  • 2697 Views
  • 2 replies
  • 3 kudos

Cannot start cluster in Databricks Community Edition

Please show me why I cannot start the cluster in the Databricks Community Edition. It shows the error below:

[attachment: image]
Latest Reply
Prabakar
Databricks Employee
  • 3 kudos

Hi @trung nguyen, have you tried reducing the cluster size? With the Community Edition, we have limitations. Also, please share the cluster config. Normally such errors are generated when we reach or exceed the limit set on the cloud.

1 More Replies
dtabass
by New Contributor III
  • 3313 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me: %sql SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10; I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey! Hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

2 More Replies
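The thread above does not quote the final answer, but one hedged workaround on runtimes where array_size is not available is the long-standing size() function, which returns the element count for arrays. The query below reuses the column and table names from the post:

```python
# size() exists on older Spark/Databricks runtimes where array_size() is undefined.
df = spark.sql("""
  SELECT user_id, size(education) AS edu_cnt
  FROM users
  ORDER BY edu_cnt DESC
  LIMIT 10
""")
display(df)
```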
arda_123
by New Contributor III
  • 7267 Views
  • 3 replies
  • 4 kudos

Resolved! How to programmatically "clear state & cell outputs" in a Databricks notebook?

I am building a dashboard, and at the end I am using a widget to reset it. At that time I want all the outputs to be removed. Is there a way to do this in Python?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can build your dashboard in the SQL experience instead. There is a nice API to manage them: https://docs.databricks.com/sql/api/queries-dashboards.html Regarding "clear state and cell outputs": when you run a notebook as a job, state is not saved in the no...

2 More Replies
StephanieAlba
by Databricks Employee
  • 2597 Views
  • 1 reply
  • 1 kudos

Resolved! How to find my workspace id?

 My Solutions Architect is asking for my workspaceID. I do not know where to look for it. All I see is the user settings menu.

Latest Reply
PeteStern
Databricks Employee
  • 1 kudos

The workspace ID is usually in the URL of the workspace. For example, in https://myworkspace.com/?o=12345 the workspace ID is 12345. You can also just share the URL with your Solutions Architect.

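As a small illustration of the answer above, the o= query parameter can be pulled out of a workspace URL programmatically; the URL below is the example from the reply:

```python
from urllib.parse import urlparse, parse_qs

url = "https://myworkspace.com/?o=12345"  # example URL from the reply above
# The workspace (organization) ID is the value of the "o" query parameter.
workspace_id = parse_qs(urlparse(url).query).get("o", [None])[0]
print(workspace_id)  # 12345
```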
auser85
by New Contributor III
  • 2743 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a Delta table with a column like keyExample BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty! Does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Thanks!

1 More Replies
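The thread above does not include the accepted fix, so the following is only one possible (and heavy) approach, stated as an assumption rather than a confirmed answer: recreate the table so its identity counter starts over, then reinsert the non-identity columns. Table and column names are illustrative:

```python
# Assumption: a freshly created table starts its identity values at START WITH again.
# This rewrites the table and assigns brand-new key values to every row.
spark.sql("""
  CREATE OR REPLACE TABLE my_table_reset (
    keyExample BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
    payload STRING
  ) USING DELTA
""")

# Insert only the non-identity columns; new identity values are generated from 1.
spark.sql("""
  INSERT INTO my_table_reset (payload)
  SELECT payload FROM my_table
""")
```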
brickster_2018
by Databricks Employee
  • 4219 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a Maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premises.

Latest Reply
mj2022
New Contributor III
  • 2 kudos

I am following https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its custom from_avro method for use with the Kafka schema registry, but I am not able to find the spark-avro j...

1 More Replies
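Related to the jar discussion above, the OSS spark-avro artifact is published on Maven Central and can be attached to a cluster as a Maven library. A hedged sketch using the Libraries API; the host, token, cluster ID, and coordinate version are placeholders and should match your own workspace and runtime:

```python
import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
token = "dapiXXXXXXXXXXXXXXXX"                                # placeholder personal access token

resp = requests.post(
    f"{host}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "cluster_id": "0123-456789-abcde123",  # placeholder cluster ID
        "libraries": [
            # Pick the Scala and Spark versions matching the cluster runtime.
            {"maven": {"coordinates": "org.apache.spark:spark-avro_2.12:3.3.2"}}
        ],
    },
)
resp.raise_for_status()
```

Keep in mind that the Databricks-specific from_avro with Schema Registry support is part of the runtime itself, not of the OSS artifact above.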
