Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Constantine
by Contributor III
  • 2978 Views
  • 2 replies
  • 3 kudos

Error when writing dataframe to s3 location using PySpark

I get an error when writing a dataframe to an S3 location: Found invalid character(s) among " ,;{}()\n\t=" in the column names of your... I have gone through all the columns and none of them have any special characters. Any idea how to fix this?
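A common workaround when this error appears is to rename the columns, replacing every character the writer rejects. This is a minimal sketch assuming the character set quoted in the error message; the helper name `sanitize_column` is hypothetical:

```python
import re

# Characters rejected in column names, per the error message quoted above.
INVALID_CHARS = " ,;{}()\n\t="

def sanitize_column(name: str, replacement: str = "_") -> str:
    """Replace any rejected character in a column name with `replacement`."""
    pattern = "[" + re.escape(INVALID_CHARS) + "]"
    return re.sub(pattern, replacement, name)

# In PySpark you could then apply it before writing, e.g.:
# df = df.toDF(*[sanitize_column(c) for c in df.columns])
```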

Latest Reply
Emilie
New Contributor II
  • 3 kudos

I got this error when I was running a query given to me, and the author didn't have aliases on aggregates. Something like sum(dollars_spent) needed an alias: sum(dollars_spent) AS sum_dollars_spent
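This reply's fix can be illustrated with any SQL engine: without an alias, the engine auto-generates a column name from the expression text, which contains parentheses, and those are among the rejected characters. The sketch below uses Python's built-in sqlite3 purely to show the auto-generated name; the assumption is that Spark behaves analogously, naming the column sum(dollars_spent):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (dollars_spent REAL)")
conn.execute("INSERT INTO purchases VALUES (1.5), (2.5)")

# Without an alias, the engine invents a column name containing parentheses.
cur = conn.execute("SELECT sum(dollars_spent) FROM purchases")
unaliased_name = cur.description[0][0]   # 'sum(dollars_spent)'

# With an alias, the column name is a plain identifier.
cur = conn.execute("SELECT sum(dollars_spent) AS sum_dollars_spent FROM purchases")
aliased_name = cur.description[0][0]     # 'sum_dollars_spent'
```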

1 More Replies
Reza
by New Contributor III
  • 2775 Views
  • 2 replies
  • 1 kudos

Can we order the widgets in Databricks?

I am trying to order the way that widgets are shown in Databricks, but I cannot. For example, I have two text widgets (start date and end date). Databricks shows "end_date" before "start_date" on top, as the default order is alphabetical. Obviously, ...

Latest Reply
Prabakar
Databricks Employee
  • 1 kudos

Hi @Reza Rajabi, this is a known issue and we have a feature request to fix it. I do not have an ETA on when this feature will be available. So for now, to avoid the widgets being shown in alphabetical order, you need to use a prefix like 1, 2, 3... or A, B...
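The workaround can be seen with plain string sorting: widgets are displayed in lexical order of their names, so a numeric prefix restores the intended sequence. A minimal sketch using the widget names from the question (only the sorting is shown; in a notebook the widgets themselves would be created with dbutils.widgets.text):

```python
# As created, alphabetical order puts end_date before start_date.
names = ["start_date", "end_date"]
assert sorted(names) == ["end_date", "start_date"]

# Workaround: encode the desired order in a prefix.
# e.g. dbutils.widgets.text("1_start_date", "") and dbutils.widgets.text("2_end_date", "")
prefixed = ["1_start_date", "2_end_date"]
assert sorted(prefixed) == ["1_start_date", "2_end_date"]
```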

1 More Replies
blakedwb
by New Contributor III
  • 6744 Views
  • 2 replies
  • 1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that Delta Live Tables pipelines are GA, we are looking to convert our existing processes to leverage them. One thing that remains unclear is how to populate new Delta Live tables with historical data. Currently we are looking to use CDC by leveraging create...

Latest Reply
blakedwb
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing Delta tables that contain historical data into a Delta Live pipeline. We ended up changing the source data to pull from the existing bronze t...

1 More Replies
Trung
by Contributor
  • 3174 Views
  • 2 replies
  • 3 kudos

Cannot start cluster in Databricks Community Edition

Please show me why I cannot start the cluster in the Databricks Community Edition. It shows the error below:

(image attachment)
Latest Reply
Prabakar
Databricks Employee
  • 3 kudos

Hi @trung nguyen, have you tried reducing the cluster size? With the Community Edition we have limitations. Further, please share the cluster config. Normally such errors are generated when we reach or exceed the limit set on the cloud.

1 More Replies
dtabass
by New Contributor III
  • 4043 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me:

%sql
SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10;

I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey! Hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

2 More Replies
arda_123
by New Contributor III
  • 8558 Views
  • 3 replies
  • 4 kudos

Resolved! How to programmatically "clear state & cell outputs" in a Databricks notebook?

I am building a dashboard, and at the end I am using a widget to reset it. At that time I want all the outputs to be removed. Is there a way to do this in Python?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can build your dashboard in the SQL experience instead. There is a nice API to manage them: https://docs.databricks.com/sql/api/queries-dashboards.html Regarding "clear state and cell outputs": when you run a notebook as a job, state is not saved in the no...

2 More Replies
StephanieAlba
by Databricks Employee
  • 3739 Views
  • 1 reply
  • 1 kudos

Resolved! How to find my workspace id?

 My Solutions Architect is asking for my workspaceID. I do not know where to look for it. All I see is the user settings menu.

Latest Reply
PeteStern
Databricks Employee
  • 1 kudos

The workspace ID is usually in the URL of the workspace. For example: https://myworkspace.com/?o=12345. The workspace ID in this case is 12345. You can also just share the URL with your solutions architect.
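If you want to pull the ID out of such a URL mechanically, the o= query parameter can be parsed with the standard library. A small sketch (the helper name is hypothetical, and the example URL is the one from the reply above):

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

def workspace_id_from_url(url: str) -> Optional[str]:
    """Extract the o= query parameter that carries the workspace ID, if present."""
    params = parse_qs(urlparse(url).query)
    values = params.get("o")
    return values[0] if values else None
```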

auser85
by New Contributor III
  • 3528 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a Delta table, like

keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),

my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty! Does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Thanks!

1 More Replies
brickster_2018
by Databricks Employee
  • 5032 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premise.

Latest Reply
mj2022
New Contributor III
  • 2 kudos

I followed https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its custom from_avro method to use with the Kafka schema registry, but I am not able to find the spark-avro j...

1 More Replies
Herkimer
by New Contributor II
  • 1584 Views
  • 0 replies
  • 0 kudos

intermittent connection error

I am running dbsqlcli on Windows 10. I have put together the attached cmd file to pull the identity column data from a series of our tables into individual CSVs so I can upload them to a PostgreSQL DB to do a comparison of each table to those in the ...

sage5616
by Valued Contributor
  • 8969 Views
  • 2 replies
  • 3 kudos

Resolved! Running local python code with arguments in Databricks via dbx utility.

I am trying to execute a local PySpark script on a Databricks cluster via dbx utility to test how passing arguments to python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Co...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can pass parameters using dbx launch --parameters. If you want to define them in the deployment template, please try to follow the Databricks API 2.1 schema exactly: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate (for examp...

1 More Replies
ACK
by New Contributor II
  • 4647 Views
  • 2 replies
  • 2 kudos

Resolved! How do I pass kwargs to wheel method?

Hi, I have a method named main that takes **kwargs as a parameter:

def main(**kwargs):
    parameterOne = kwargs["param-one"]
    parameterTwo = kwargs["param-two"]
    parameterThree = kwargs["param-optional-one"] if "param-optional-one" in kwargs else...
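One common way to bridge command-line flags into such a **kwargs method is to parse them with argparse and forward vars() of the namespace. This is a sketch under assumptions: the parameter names come from the question, the run() wrapper is hypothetical, and the exact wiring of a wheel entry point to the job is not shown:

```python
from argparse import ArgumentParser

def main(**kwargs):
    parameter_one = kwargs["param-one"]
    parameter_two = kwargs["param-two"]
    # Optional parameter with a fallback, mirroring the question.
    parameter_three = kwargs.get("param-optional-one", "default")
    return parameter_one, parameter_two, parameter_three

def run(argv=None):
    parser = ArgumentParser()
    # Explicit dest keeps the hyphenated key names main() expects.
    parser.add_argument("--param-one", dest="param-one", required=True)
    parser.add_argument("--param-two", dest="param-two", required=True)
    parser.add_argument("--param-optional-one", dest="param-optional-one")
    args = parser.parse_args(argv)
    # Drop optional keys that were not supplied, then forward as **kwargs.
    kwargs = {k: v for k, v in vars(args).items() if v is not None}
    return main(**kwargs)
```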

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

They are command-line parameters, so it looks like --param-one=test. You can test it with ArgumentParser:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--param-one", dest="parameterOne")

args = parser.parse_args()

1 More Replies
Will_Sullivan
by New Contributor
  • 1930 Views
  • 0 replies
  • 0 kudos

How to solve Error in Databricks Academy course DE 4.2 & 4.3, run classroom-setup-4.2 error, "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Anyone know how to solve this error? Course: Data Engineering with Databricks. Notebook: DE 4.2 - Providing Options for External Sources. Attempts to fix: detached and reattached my cluster and started it again. %run ../Includes/Classroom-Setup-4.2 resul...

bl12
by New Contributor II
  • 3883 Views
  • 2 replies
  • 2 kudos

Resolved! Any ways to power a Databricks SQL dashboard widget with a dynamic query?

Hi, I'm using Databricks SQL and I need to power the same widget in a dashboard with a dynamic query. Are there any recommended solutions for this? For more context, I'm building a feature that allows people to see the size of something. That size is...

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

I believe Redash isn't built that way within Databricks; it's still very limited in its capabilities. I have two solutions for you. I haven't tried either, but see if one works for you: 1) use Preset with DB SQL, or 2) a hack (read below). I'm assuming you have one wi...

1 More Replies
