Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

blakedwb
by New Contributor III
  • 4758 Views
  • 4 replies
  • 1 kudos

Resolved! How to Incorporate Historical Data in Delta Live Pipeline?

Now that Delta Live Tables is GA, we are looking to convert our existing processes to leverage it. One thing that remains unclear is how to populate new Delta Live Tables with historical data. Currently we are looking to use CDC by leveraging create...

Latest Reply
blakedwb
New Contributor III
  • 1 kudos

@Kaniz Fatma Hello, sorry for the delayed response. The guide does not answer how to incorporate existing Delta tables that contain historical data into a Delta Live Tables pipeline. We ended up changing the source data to pull from the existing bronze t...

Trung
by Contributor
  • 1828 Views
  • 4 replies
  • 7 kudos

Cannot start a cluster in Databricks Community Edition

Please help me understand why I cannot start a cluster in Databricks Community Edition. It shows the error in the attached image.
Latest Reply
Prabakar
Esteemed Contributor III
  • 7 kudos

Hi @trung nguyen, have you tried reducing the cluster size? Community Edition has resource limitations. Also, please share the cluster config. Such errors are normally generated when we reach or exceed the limit set on the cloud.

dtabass
by New Contributor III
  • 2386 Views
  • 3 replies
  • 0 kudos

How does one access/use SparkSQL functions like array_size?

The following doesn't work for me: %sql SELECT user_id, array_size(education) AS edu_cnt FROM users ORDER BY edu_cnt DESC LIMIT 10; I get an error saying: Error in SQL statement: AnalysisException: Undefined function: array_size. This function is nei...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Michael Carey, hope everything is going great! We are glad to hear that you were able to find a solution to your question. Would you be happy to mark an answer as best so that other members can find the solution more quickly? Cheers!

arda_123
by New Contributor III
  • 5228 Views
  • 3 replies
  • 4 kudos

Resolved! How to programmatically "clear state & cell outputs" in a Databricks notebook?

I am building a dashboard, and at the end of it I am using a widget to reset it. At that time I want all the outputs to be removed. Is there a way to do this in Python?

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 4 kudos

You can build your dashboard in the SQL experience instead. There is a nice API to manage dashboards: https://docs.databricks.com/sql/api/queries-dashboards.html. Regarding "clear state and cell outputs": when you run a notebook as a job, state is not saved in the no...

StephanieRivera
by Valued Contributor II
  • 1378 Views
  • 1 replies
  • 1 kudos

Resolved! How to find my workspace id?

My Solutions Architect is asking for my workspace ID. I do not know where to look for it. All I see is the user settings menu.

Latest Reply
PeteStern
New Contributor III
  • 1 kudos

The workspace ID is usually in the URL of the workspace. For example, in https://myworkspace.com/?o=12345 the workspace ID is 12345. You can also just share the URL with your Solutions Architect.
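If you need to pull that value out of a URL in a script, Python's standard library can read the `o` query parameter directly. This is a minimal sketch, assuming the workspace URL carries the ID as `?o=<id>` as in the reply's example; `workspace_id_from_url` is a hypothetical helper name, and it returns `None` when the parameter is absent.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def workspace_id_from_url(url: str) -> Optional[str]:
    """Return the workspace ID from a ?o=<id> query parameter, or None if absent.

    Hypothetical helper: assumes the ID is carried in the `o` parameter,
    as in the example URL from the reply.
    """
    values = parse_qs(urlparse(url).query).get("o")
    return values[0] if values else None


print(workspace_id_from_url("https://myworkspace.com/?o=12345"))  # 12345
```

Other query parameters in the same URL are ignored, so a URL such as `https://myworkspace.com/?o=12345&foo=bar` yields the same result.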

auser85
by New Contributor III
  • 1707 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a Delta table with a column like keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty, does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you need more help. Thanks!

brickster_2018
by Esteemed Contributor
  • 3052 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premises.

Latest Reply
mj2022
New Contributor III
  • 2 kudos

I am following https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its own custom from_avro method to use with the Kafka schema registry, but I am not able to find spark-avro j...

Herkimer
by New Contributor II
  • 747 Views
  • 0 replies
  • 0 kudos

intermittent connection error

I am running dbsqlcli on Windows 10. I have put together the attached cmd file to pull the identity column data from a series of our tables into individual CSVs so I can upload them to a PostgreSQL DB to do a comparison of each table to those in the ...
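The per-table comparison of identity values described above can also be done locally on the exported CSVs, without loading them into PostgreSQL first. This is a minimal sketch, assuming each export has a header row and the identity values live in one column; the column name `id` and the helpers `read_ids` and `compare_identity_exports` are hypothetical, and values are compared as text.

```python
import csv
import io
from typing import Iterable, Set, Tuple


def read_ids(lines: Iterable[str], column: str = "id") -> Set[str]:
    """Collect the values of one column from CSV lines that start with a header row."""
    return {row[column].strip() for row in csv.DictReader(lines)}


def compare_identity_exports(
    source_lines: Iterable[str], target_lines: Iterable[str], column: str = "id"
) -> Tuple[Set[str], Set[str]]:
    """Return (IDs only in the source export, IDs only in the target export)."""
    source_ids = read_ids(source_lines, column)
    target_ids = read_ids(target_lines, column)
    return source_ids - target_ids, target_ids - source_ids


# In-memory stand-ins for two exported CSV files (toy values for illustration).
databricks_csv = io.StringIO("id\n1\n2\n3\n")
postgres_csv = io.StringIO("id\n1\n3\n4\n")
only_in_databricks, only_in_postgres = compare_identity_exports(databricks_csv, postgres_csv)
print(sorted(only_in_databricks), sorted(only_in_postgres))  # ['2'] ['4']
```

With real exports, pass file objects opened with `newline=""` instead of the `StringIO` stand-ins.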

sage5616
by Valued Contributor
  • 5301 Views
  • 2 replies
  • 3 kudos

Resolved! Running local python code with arguments in Databricks via dbx utility.

I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Co...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can pass parameters using dbx launch --parameters. If you want to define them in the deployment template, please try to follow the Databricks Jobs API 2.1 schema exactly: https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate (for examp...
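To see how parameters defined in a Jobs API 2.1 payload reach the script, it helps to put both sides next to each other: a `spark_python_task` takes a `python_file` and a list of string `parameters`, and the script receives them as ordinary command-line arguments. This is a minimal sketch, not dbx's own deployment-file syntax; the job name, task key, cluster ID, file path, and the `--env` argument are hypothetical placeholders.

```python
import argparse
import json
from typing import List, Optional


def build_job_spec(python_file: str, parameters: List[str]) -> dict:
    """Sketch of a Jobs API 2.1 create payload with a single spark_python_task.

    The job name, task key, cluster ID, and file path are hypothetical placeholders.
    """
    return {
        "name": "local-script-test",
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": "<cluster-id>",
                "spark_python_task": {
                    "python_file": python_file,
                    # Each list element is passed to the script as one command-line argument.
                    "parameters": parameters,
                },
            }
        ],
    }


def parse_script_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """How the script itself could read its arguments; --env is a hypothetical example."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", default="dev")
    # argv=None falls back to sys.argv[1:], which is what the script sees on the cluster.
    return parser.parse_args(argv)


spec = build_job_spec("dbfs:/path/to/script.py", ["--env", "test"])
print(json.dumps(spec, indent=2))
print(parse_script_args(spec["tasks"][0]["spark_python_task"]["parameters"]).env)  # test
```

Calling `parse_script_args` with an explicit list, as in the last line, is a quick local check that the script would read the same values it receives on the cluster.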

ACK
by New Contributor II
  • 2442 Views
  • 2 replies
  • 2 kudos

Resolved! How do I pass kwargs to wheel method?

Hi, I have a method named main that takes **kwargs as a parameter: def main(**kwargs): parameterOne = kwargs["param-one"] parameterTwo = kwargs["param-two"] parameterThree = kwargs["param-optional-one"] if "param-optional-one" in kwargs else...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

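These are command-line parameters, so they look like --param-one=test. You can test this with ArgumentParser:

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# Accepts --param-one=test (or --param-one test) and stores the value as args.parameterOne
parser.add_argument("--param-one", dest="parameterOne")

args = parser.parse_args()
print(args.parameterOne)
```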
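If `main` should keep its `**kwargs` signature, one option is a small entry point that turns `--key=value` arguments into a dictionary and unpacks it into `main`. This is a minimal sketch, assuming every parameter arrives in the `--key=value` form described in the reply; `args_to_kwargs` and `entry_point` are hypothetical names, the parameter names come from the question, and the `None` fallback for the optional parameter is an assumption because the original line is cut off.

```python
import sys
from typing import Dict, List, Optional, Tuple


def args_to_kwargs(argv: List[str]) -> Dict[str, str]:
    """Turn ["--param-one=a", ...] into {"param-one": "a", ...}.

    Assumes every argument uses the --key=value form; anything else raises ValueError.
    """
    kwargs = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = arg[2:].split("=", 1)
        kwargs[key] = value
    return kwargs


def main(**kwargs) -> Tuple[str, str, Optional[str]]:
    parameter_one = kwargs["param-one"]
    parameter_two = kwargs["param-two"]
    # Assumption: the optional parameter defaults to None.
    parameter_three = kwargs.get("param-optional-one")
    return parameter_one, parameter_two, parameter_three


def entry_point(argv: Optional[List[str]] = None):
    """Hypothetical wheel entry point: reads sys.argv[1:] when argv is not given."""
    return main(**args_to_kwargs(sys.argv[1:] if argv is None else argv))


print(entry_point(["--param-one=test", "--param-two=42"]))  # ('test', '42', None)
```

Hyphenated keys such as `param-one` are fine here because `main` only receives them through `**kwargs`, never as named parameters.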

Will_Sullivan
by New Contributor
  • 1127 Views
  • 0 replies
  • 0 kudos

How to solve the error in Databricks Academy course DE 4.2 & 4.3 when running Classroom-Setup-4.2: "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Does anyone know how to solve this error? Course: Data Engineering with Databricks. Notebook: DE 4.2 - Providing Options for External Sources. Attempts to fix: detached and reattached my cluster and started it again. %run ../Includes/Classroom-Setup-4.2 resul...

bl12
by New Contributor II
  • 2002 Views
  • 2 replies
  • 2 kudos

Resolved! Any ways to power a Databricks SQL dashboard widget with a dynamic query?

Hi, I'm using Databricks SQL and I need to power the same widget in a dashboard with a dynamic query. Are there any recommended solutions for this? For more context, I'm building a feature that allows people to see the size of something. That size is...

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

I believe Redash isn't built that way within Databricks; it's still very limited in its capabilities. I have two solutions for you. I haven't tried either, but see if they work for you: Use Preset with DB SQL. A hack (read below): I'm assuming you have one wi...

Krish-685291
by New Contributor III
  • 1112 Views
  • 2 replies
  • 0 kudos

Which is the recommended way to write the data back to the delta lake?

Hi, I wanted to understand whether my approach to dealing with Delta Lake is correct. 1. The first time, I create a Delta table using the following command: df_json.write.mode('overwrite').format('delta').save(delta_silver + json_file_path) 2. I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Krishna Puthran, hope everything is going great! Does @Kaniz Fatma's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you further. We'd love to hear from you. Cheers!

devashishraverk
by New Contributor II
  • 1632 Views
  • 2 replies
  • 2 kudos

Not able to create SQL Endpoint in Databricks SQL (Databricks 14-day free trial)

Hi, I am not able to create a SQL Endpoint and I get the error below. I have selected 2X-Small as the cluster size on the Azure platform: Clusters are failing to launch. Cluster launch will be retried. Details for the latest failure: Error: Error code: PublicIPCountLi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Devashish Raverkar, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear f...

Direo
by Contributor
  • 6065 Views
  • 3 replies
  • 2 kudos

Default indentation for Python has changed after migration to the new workspace

In our old workspace the default indentation was 2 spaces. In our new one it has changed to 4 spaces. In principle you can manually change it back to 2 spaces, as we used to have, but it does not work. Does anyone know how to solve this issue?

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

You do have the option Settings --> User Settings (Admin Settings? Not sure, I don't have admin access) --> Notebook Settings --> Default indentation for Python cells (in spaces). This will change the indentation for newer cells, but existing one...
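For existing cells that keep the old indentation, one workaround is to re-indent the cell source before pasting it back. This is a minimal sketch, assuming the cell uses spaces only; `reindent` is a hypothetical helper, not a Databricks feature, and it is deliberately naive: it also rewrites leading spaces inside multi-line strings and does not handle tabs or continuation lines aligned to an opening bracket.

```python
def reindent(source: str, old: int = 2, new: int = 4) -> str:
    """Naively rescale leading-space indentation from `old` to `new` spaces per level.

    Leftover spaces that do not make up a full level are kept as-is.
    """
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip(" ")
        width = len(line) - len(stripped)
        levels, remainder = divmod(width, old)
        out.append(" " * (levels * new + remainder) + stripped)
    return "".join(out)


cell = "def f(x):\n  if x:\n    return 1\n  return 0\n"
print(reindent(cell), end="")
```

Swapping the arguments (`old=4, new=2`) goes the other way, back to the 2-space convention the old workspace used.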
