Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

auser85
by New Contributor III
  • 3157 Views
  • 2 replies
  • 1 kudos

How to reset the IDENTITY column count?

After accumulating many updates to a delta table, like keyExample bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), my identity column values are in the hundreds of millions. Is there any way that I can reset this value through vacuumi...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Andrew Fogarty, does @Werner Stinckens's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Otherwise, please let us know if you need more help. Thanks!

1 More Replies
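Since the accepted answer isn't quoted in full here: to my knowledge Delta Lake has no command to rewind an identity counter in place (VACUUM only removes unreferenced data files), so the usual route is to rebuild the table. A minimal sketch for a Databricks notebook; the events/events_reset table names and the payload column are hypothetical:

# recreate the table so the identity counter starts over
# (table and column names other than keyExample are hypothetical)
spark.sql("""
    CREATE TABLE events_reset (
        keyExample BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
        payload STRING
    ) USING DELTA
""")
# reinsert the data without the old identity values; new ones start at 1
spark.sql("INSERT INTO events_reset (payload) SELECT payload FROM events")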
brickster_2018
by Databricks Employee
  • 4595 Views
  • 2 replies
  • 2 kudos

Resolved! Does Databricks have a maven repository to download the jars?

Using OSS jars always causes classpath issues when running the job on Databricks. The same job works fine on EMR/on-premises.

Latest Reply
mj2022
New Contributor III
  • 2 kudos

I followed https://docs.databricks.com/dev-tools/databricks-connect.html#intellij-scala-or-java to obtain the spark-avro jar, since Databricks has its custom from_avro method for use with the Kafka schema registry, but I am not able to find the spark-avro j...

1 More Replies
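For context on the from_avro difference mentioned above: OSS Spark ships its own from_avro in pyspark.sql.avro.functions, which takes a JSON schema string rather than the Schema Registry address the Databricks Runtime variant supports. A minimal sketch; the broker, topic, and record schema are placeholders, and on OSS the org.apache.spark:spark-avro package must be on the classpath:

from pyspark.sql.avro.functions import from_avro

# inline JSON Avro schema; the OSS API has no Schema Registry overload
avro_schema = '{"type":"record","name":"rec","fields":[{"name":"id","type":"long"}]}'

# `spark` is the notebook-provided SparkSession
df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
      .option("subscribe", "events")                     # placeholder topic
      .load())

decoded = df.select(from_avro(df.value, avro_schema).alias("rec"))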
Herkimer
by New Contributor II
  • 1397 Views
  • 0 replies
  • 0 kudos

intermittent connection error

I am running dbsqlcli on Windows 10. I have put together the attached cmd file to pull the identity column data from a series of our tables into individual CSVs so I can upload them to a PostgreSQL DB to do a comparison of each table to those in the ...

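No replies on this one, but since the failures are intermittent, wrapping each export in a retry loop is a reasonable workaround. A minimal Python sketch, assuming dbsqlcli's -e flag prints query results as CSV to stdout (check your version's --help); the table names and the id column are hypothetical:

import subprocess
import time

def export_table(table: str, out_csv: str, attempts: int = 3) -> None:
    """Run one dbsqlcli export, retrying on intermittent connection errors."""
    query = f"SELECT id FROM {table}"  # hypothetical identity column
    for attempt in range(attempts):
        result = subprocess.run(["dbsqlcli", "-e", query],
                                capture_output=True, text=True)
        if result.returncode == 0:
            with open(out_csv, "w", newline="") as f:
                f.write(result.stdout)
            return
        time.sleep(2 ** attempt)  # back off before the next try
    raise RuntimeError(f"{table} failed after {attempts} attempts: {result.stderr}")

for t in ["schema.table_a", "schema.table_b"]:  # hypothetical tables
    export_table(t, f"{t.replace('.', '_')}.csv")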
sage5616
by Valued Contributor
  • 8271 Views
  • 2 replies
  • 3 kudos

Resolved! Running local python code with arguments in Databricks via dbx utility.

I am trying to execute a local PySpark script on a Databricks cluster via the dbx utility to test how passing arguments to Python works in Databricks when developing locally. However, the test arguments I am passing are not being read for some reason. Co...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 3 kudos

You can pass parameters using dbx launch --parameters. If you want to define them in the deployment template, please try to follow exactly the Databricks API 2.1 schema https://docs.databricks.com/dev-tools/api/latest/jobs.html#operation/JobsCreate (for examp...

1 More Replies
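To make the accepted approach concrete: whatever you pass via dbx launch --parameters reaches your script as ordinary command-line arguments, so the entry point just parses argv. A minimal sketch; the --environment parameter name is hypothetical:

from argparse import ArgumentParser

def entrypoint() -> None:
    parser = ArgumentParser()
    parser.add_argument("--environment", default="dev")  # hypothetical parameter
    # parse_known_args tolerates extra arguments injected by the platform
    args, unknown = parser.parse_known_args()
    print(f"environment={args.environment}, unparsed={unknown}")

if __name__ == "__main__":
    entrypoint()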
ACK
by New Contributor II
  • 4067 Views
  • 2 replies
  • 2 kudos

Resolved! How do I pass kwargs to wheel method?

Hi, I have a method named main that takes **kwargs as a parameter:

def main(**kwargs):
    parameterOne = kwargs["param-one"]
    parameterTwo = kwargs["param-two"]
    parameterThree = kwargs["param-optional-one"] if "param-optional-one" in kwargs else...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

These are command-line parameters, so they look like --param-one=test. You can test it with ArgumentParser:

from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument("--param-one", dest="parameterOne")

args = parser.parse_args()

1 More Replies
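Bridging the two posts above: once ArgumentParser has collected the arguments, vars() turns the namespace into a dict that can be splatted into main(**kwargs). A minimal sketch reusing the names from the question; dropping None values reproduces the "if in kwargs" membership check:

from argparse import ArgumentParser

def main(**kwargs):
    parameter_one = kwargs["param-one"]
    parameter_two = kwargs["param-two"]
    # optional argument with a fallback, as in the original post
    parameter_three = kwargs["param-optional-one"] if "param-optional-one" in kwargs else "fallback"
    print(parameter_one, parameter_two, parameter_three)

parser = ArgumentParser()
# dest spelled with hyphens keeps the kwargs keys exactly as in the question
parser.add_argument("--param-one", dest="param-one", required=True)
parser.add_argument("--param-two", dest="param-two", required=True)
parser.add_argument("--param-optional-one", dest="param-optional-one")

args = parser.parse_args()
# omit unset optionals so the membership check in main() behaves as intended
kwargs = {k: v for k, v in vars(args).items() if v is not None}
main(**kwargs)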
Will_Sullivan
by New Contributor
  • 1741 Views
  • 0 replies
  • 0 kudos

How to solve Error in Databricks Academy course DE 4.2 & 4.3, run classroom-setup-4.2 error, "[SQLITE_ERROR] SQL error or missing database (no such table: users)"

Anyone know how to solve this error? Course: Data Engineering with Databricks. Notebook: DE 4.2 - Providing Options for External Sources. Attempts to fix: detached and reattached my cluster and started it again. %run ../Includes/Classroom-Setup-4.2 resul...

bl12
by New Contributor II
  • 3558 Views
  • 2 replies
  • 2 kudos

Resolved! Any ways to power a Databricks SQL dashboard widget with a dynamic query?

Hi, I'm using Databricks SQL and I need to power the same widget in a dashboard with a dynamic query. Are there any recommended solutions for this? For more context, I'm building a feature that allows people to see the size of something. That size is...

Latest Reply
AmanSehgal
Honored Contributor III
  • 2 kudos

I believe Redash isn't built that way within Databricks; it's still very limited in its capabilities. I have two solutions for you. I haven't tried either, but see if one works for you: use Preset with DB SQL, or a hack (read below). I'm assuming you have one wi...

1 More Replies
Krish-685291
by New Contributor III
  • 2325 Views
  • 1 replies
  • 0 kudos

Which is the recommended way to write the data back to the delta lake?

Hi, I wanted to understand whether my approach to dealing with Delta Lake is correct or not. 1. The first time, I create a Delta table using the following command: df_json.write.mode('overwrite').format('delta').save(delta_silver + json_file_path) 2. I ...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Krishna Puthran, hope everything is going great! Does @Kaniz Fatma's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you further. We'd love to hear from you. Cheers!

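Since the referenced answer isn't quoted here: the common pattern is an overwrite for the initial load and a MERGE for subsequent updates, rather than repeated overwrites. A minimal sketch with the delta-spark package; target_path, df_updates, and the id join key are hypothetical, and df_json is the dataframe from the post:

from delta.tables import DeltaTable

target_path = "/mnt/silver/events"  # hypothetical location

# 1. initial load, as in the question
df_json.write.mode("overwrite").format("delta").save(target_path)

# 2. incremental updates: upsert instead of overwriting the whole table
target = DeltaTable.forPath(spark, target_path)
(target.alias("t")
       .merge(df_updates.alias("s"), "t.id = s.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .execute())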
devashishraverk
by New Contributor II
  • 2561 Views
  • 2 replies
  • 2 kudos

Not able to create SQL Endpoint in Databricks SQL (Databricks 14-day free trial)

Hi, I am not able to create a SQL Endpoint; I get the error below. I have selected cluster size 2X-Small on the Azure platform: Clusters are failing to launch. Cluster launch will be retried. Details for the latest failure: Error: Error code: PublicIPCountLi...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey there @Devashish Raverkar, hope all is well! Just wanted to check in on whether you were able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear f...

1 More Replies
Direo
by Contributor II
  • 11237 Views
  • 3 replies
  • 2 kudos

Default indentation for Python has changed after migration to the new workspace

In our old workspace the default indentation was 2 spaces. In our new one it has changed to 4 spaces. Of course you can manually change it back to the 2 spaces we used to have, but that does not work. Does anyone know how to solve this issue?

Latest Reply
ranged_coop
Valued Contributor II
  • 2 kudos

You do have that option under Settings --> User Settings (Admin Settings? Not sure - I don't have admin access) --> Notebook Settings --> Default indentation for Python cells (in spaces). This will change the indentation for newer cells, but existing one...

2 More Replies
Michael_Galli
by Contributor III
  • 4175 Views
  • 4 replies
  • 2 kudos

Resolved! Unittest in PySpark - how to read XML with Maven com.databricks.spark.xml ?

When writing unit tests with unittest/pytest in PySpark, reading mockup datasources with built-in datatypes like CSV and JSON (spark.read.format("json")) works just fine. But when reading XMLs with spark.read.format("com.databricks.spark.xml") in the ...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 2 kudos

Please install spark-xml from Maven. As it is from Maven, you need to install it for the cluster you are using in the cluster settings (alternatively via the API or CLI): https://mvnrepository.com/artifact/com.databricks/spark-xml

3 More Replies
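To flesh out the accepted answer for local unit tests, where there is no cluster UI to attach libraries to: the test SparkSession can pull spark-xml from Maven itself via spark.jars.packages. A minimal pytest sketch; the artifact version, rowTag, and fixture path are assumptions to adapt:

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # downloads com.databricks:spark-xml from Maven on first start
    return (SparkSession.builder
            .master("local[2]")
            .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.15.0")
            .getOrCreate())

def test_read_xml(spark):
    df = (spark.read.format("com.databricks.spark.xml")
          .option("rowTag", "record")          # hypothetical row tag
          .load("tests/data/sample.xml"))      # hypothetical fixture file
    assert df.count() > 0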
mj2022
by New Contributor III
  • 3305 Views
  • 2 replies
  • 2 kudos

Spark Streaming with SASL_SSL Kafka throwing java.nio.file.NoSuchFileException: dbfs:/mnt/**/kafka.client.truststore.imported.jks

I am testing Spark Streaming with a SASL_SSL-enabled Kafka broker in a notebook. As per this guide https://docs.databricks.com/spark/latest/structured-streaming/kafka.html, I have copied the JKS files to an S3 bucket and mounted it in DBFS. In the notebook, wh...

Latest Reply
mj2022
New Contributor III
  • 2 kudos

Thanks. Yes, the '/dbfs/mnt/xxxx/kafka.client.truststore.imported.jks' path worked. Another workaround we got working is to copy the file from S3 to the local filesystem using an init script and use that file path.

1 More Replies
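Summarizing the fix for anyone skimming: the Kafka client is plain JVM code, so it needs the POSIX view of the mount (/dbfs/...), not the dbfs:/ URI. A minimal sketch; the broker and topic are placeholders, and the SASL mechanism and JAAS config are omitted:

df = (spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker:9093")   # placeholder broker
      .option("subscribe", "events")                      # placeholder topic
      .option("kafka.security.protocol", "SASL_SSL")
      # POSIX path to the mounted truststore, not dbfs:/mnt/...
      .option("kafka.ssl.truststore.location",
              "/dbfs/mnt/xxxx/kafka.client.truststore.imported.jks")
      .load())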
whatthespark
by New Contributor II
  • 6167 Views
  • 4 replies
  • 1 kudos

Inconsistent duplicated row with Spark (Databricks on MS Azure)

I'm having a weird behavior with Apache Spark, which I run in a Python notebook on Azure Databricks. I have a dataframe with some data, with 2 columns of interest: name and ftime. I found that I sometimes have duplicated values, sometimes not, depending ...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

I would like to see how you create the df dataframe. In PySpark you can get weird results if you do not clear state, or when you reuse dataframe names.

3 More Replies
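One way to act on that advice: pin the dataframe before comparing runs, and count duplicates explicitly so recomputation can't change the answer between actions. A minimal sketch, assuming df is the dataframe from the post with the name and ftime columns:

from pyspark.sql import functions as F

df_stable = df.cache()   # pin the rows so repeated actions see the same data
df_stable.count()        # materialize the cache

dupes = (df_stable.groupBy("name", "ftime")
         .count()
         .filter(F.col("count") > 1))
dupes.show()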
jwilliam
by Contributor
  • 3768 Views
  • 3 replies
  • 2 kudos

Resolved! Cannot use Web Terminal when creating cluster with Custom container.

I followed this guide to create a cluster with a custom container: https://docs.databricks.com/clusters/custom-containers.html. However, when the cluster was created, I couldn't access the web terminal. It resulted in a 502 Bad Gateway.

Latest Reply
Ravi
Databricks Employee
  • 2 kudos

This is a limitation at the moment. Enabling Docker Container Services disables the web terminal: https://docs.databricks.com/clusters/web-terminal.html#limitations

2 More Replies
pantelis_mare
by Contributor III
  • 7370 Views
  • 6 replies
  • 2 kudos

Resolved! Delta log statistics - timestamp type not working

Hello team! As per the documentation, I understand that table statistics can be fetched through the delta log (e.g. min, max, count) in order to not read the underlying data of a delta table. This is the case for numerical types, and timestamp is sup...

[attachment: max value image.png]
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Are you sure the timestamp column is a valid Spark timestamp type?

5 More Replies
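Following up on that question: a string column that merely looks like a timestamp won't get timestamp min/max statistics, so checking and casting the type is the first step. A minimal sketch; the table path and event_ts column name are placeholders:

from pyspark.sql import functions as F

df = spark.read.format("delta").load("/mnt/delta/events")  # placeholder path
df.printSchema()  # the column should report 'timestamp', not 'string'

# cast once and rewrite if it turns out to be a string column
df = df.withColumn("event_ts", F.to_timestamp("event_ts"))
df.select(F.min("event_ts"), F.max("event_ts")).show()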
