Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sreekanth1
by New Contributor II
  • 1110 Views
  • 2 replies
  • 0 kudos

How to pass job task parameters to another task in Scala

Hi Team, I have a requirement in a workflow job. The job has two tasks: a Python task and a Scala task (each running on its own cluster). I have defined dbutils.jobs.taskValues in Python, but the value cannot be read in Scala because o...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Sreekanth Nallapa, please refer to this link; it might help you with this.

  • 0 kudos
1 More Replies
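For reference, a minimal sketch of the Python side of task values (the task key "python_task" and the key name are hypothetical; note the API is dbutils.jobs.taskValues, not dbutils.job.taskValue):

# In the upstream Python task (task key assumed to be "python_task"):
dbutils.jobs.taskValues.set(key="my_value", value="hello-from-python")

# In a downstream Python task, read the value by upstream task key;
# debugValue is only used when the notebook runs outside a job.
v = dbutils.jobs.taskValues.get(taskKey="python_task",
                                key="my_value",
                                default="",
                                debugValue="debug")
print(v)

Whether this utility is reachable from a Scala task depends on the runtime, which is the crux of the question above; the linked answer in the reply covers that.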
ridrasura
by New Contributor III
  • 2022 Views
  • 1 reply
  • 5 kudos

Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...

Latest Reply
ridrasura
New Contributor III
  • 5 kudos

Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file. https://docs.databricks.com/optimizations/auto-optimize.html

  • 5 kudos
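For anyone landing here, a minimal sketch of the table properties mentioned in the reply (the table name is a placeholder):

spark.sql("""
    ALTER TABLE my_delta_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")  # coalesces small batch inserts into fewer, larger files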
BkP
by Contributor
  • 5895 Views
  • 15 replies
  • 9 kudos

Suggestion Needed for an Orchestrator/Scheduler to schedule and execute Jobs in an automated way

Hello Friends, we have an application which extracts data from various tables in Azure Databricks into Postgres tables (Postgres installed on top of Azure VMs). After extraction we apply transformations on those datasets in Postgres tabl...

Latest Reply
VaibB
Contributor
  • 9 kudos

You can leverage Airflow, which provides a connector for the Databricks Jobs API, or you can use Databricks Workflows to orchestrate your jobs, where you can define several tasks and set dependencies accordingly.

  • 9 kudos
14 More Replies
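A minimal Airflow sketch of the first option in the reply, assuming the apache-airflow-providers-databricks package is installed and a Databricks job already exists (the connection ID and job ID are placeholders):

from datetime import datetime
from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG("databricks_etl",
         start_date=datetime(2023, 1, 1),
         schedule_interval="@daily",
         catchup=False) as dag:
    # Triggers an existing Databricks workflow job via the Jobs API.
    run_job = DatabricksRunNowOperator(
        task_id="run_databricks_job",
        databricks_conn_id="databricks_default",  # Airflow connection to the workspace
        job_id=12345,                             # placeholder job ID
    )

Alternatively, Databricks Workflows alone can express the same task dependencies without an external scheduler.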
nk76
by New Contributor III
  • 6344 Views
  • 11 replies
  • 5 kudos

Resolved! Custom library import fails randomly with error: not found: value it

Hello, I have an issue with the import of a custom library in Azure Databricks. Roughly 95% of the time it works fine, but sometimes it fails. I searched the internet and this community with no luck so far. It is a Scala library in a Scala notebook,...

Latest Reply
Naskar
New Contributor II
  • 5 kudos

I also encountered the same error. While importing a file, I get the error: "Import failed with error: Could not deserialize: Exceeded 16777216 bytes (current = 16778609)".

  • 5 kudos
10 More Replies
Chris_Konsur
by New Contributor III
  • 2364 Views
  • 3 replies
  • 1 kudos

Resolved! Configure Autoloader with the file notification mode for production

I configured ADLS Gen2 standard storage and successfully configured Auto Loader with the file notification mode. In this document, https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html, "ADLS Gen2 provides different event notificati...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

Hi @Chris Konsur. You do not need to do anything with the FlushWithClose event REST API; that is just the event type that we listen to. As for the backfill setting, it is for handling late data or late events that are being triggered. This setting largely de...

  • 1 kudos
2 More Replies
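For context, a minimal Auto Loader sketch with file notification mode enabled (the path, file format, checkpoint location, and table name are placeholders):

df = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")            # format of the incoming files
        .option("cloudFiles.useNotifications", "true")  # file notification mode
        .load("abfss://container@account.dfs.core.windows.net/input/"))

(df.writeStream
   .option("checkpointLocation", "/tmp/checkpoints/autoloader_demo")  # placeholder
   .toTable("bronze_events"))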
899572
by New Contributor II
  • 1571 Views
  • 4 replies
  • 1 kudos

"approxQuantile" not working as part of a delta live table workflow pipeline.

I am trying to compute outliers using approxQuantile on a DataFrame. It works fine in a Databricks notebook, but the call doesn't work correctly when it is part of a Delta Live Tables pipeline. (This is Python.) Here is the line that isn't working a...

Latest Reply
899572
New Contributor II
  • 1 kudos

Good recommendation. I was able to do something similar that appears to work:
# Step 2. Compute the outlier sessions based on duration_minutes
lc = session_agg_df.selectExpr("percentile(duration_minutes, 0.25) lower_quartile")
session_agg_d...

  • 1 kudos
3 More Replies
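Reconstructing that workaround as a hedged sketch (the DataFrame and column names follow the snippet in the reply): compute quartiles with the SQL percentile function, which evaluates inside the pipeline unlike the approxQuantile action, then flag rows outside 1.5 * IQR.

from pyspark.sql import functions as F

quartiles = session_agg_df.selectExpr(
    "percentile(duration_minutes, 0.25) AS lower_quartile",
    "percentile(duration_minutes, 0.75) AS upper_quartile")

# Attach the quartiles to every row, then mark values outside the fences.
flagged = (session_agg_df.crossJoin(quartiles)
    .withColumn("iqr", F.col("upper_quartile") - F.col("lower_quartile"))
    .withColumn("is_outlier",
        (F.col("duration_minutes") > F.col("upper_quartile") + 1.5 * F.col("iqr")) |
        (F.col("duration_minutes") < F.col("lower_quartile") - 1.5 * F.col("iqr"))))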
mattjones
by New Contributor II
  • 482 Views
  • 0 replies
  • 0 kudos

DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark (www.meetup.com)

For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...

karthik_p
by Esteemed Contributor
  • 2965 Views
  • 5 replies
  • 7 kudos

How to properly convert DBUs consumed into a dollar amount in Databricks on AWS/GCP/Azure

Hi Team, I have gone through a lot of articles, but it looks like there is some gap on pricing. Can anyone please let me know the accurate way to convert DBU pricing into dollars? As per my understanding, Total DBU Cost = DBU/hour * total job runtime in hours (Shows a...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 7 kudos

DBUs are consumed per VM, and every VM type has a different DBU rate.

  • 7 kudos
4 More Replies
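As a worked example of the arithmetic (all numbers below are hypothetical; your actual per-DBU rate depends on cloud, pricing tier, and workload type, and cloud VM costs are billed separately):

dbu_per_hour = 2.75    # DBU rate of the chosen VM type, per VM per hour (placeholder)
num_vms = 4            # driver + workers
runtime_hours = 3.0    # how long the job ran
usd_per_dbu = 0.15     # $/DBU for your cloud/tier/workload (placeholder)

dbu_cost = dbu_per_hour * num_vms * runtime_hours * usd_per_dbu
print(f"DBU cost: ${dbu_cost:.2f}")   # 2.75 * 4 * 3.0 * 0.15 = $4.95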
Christine
by Contributor II
  • 8577 Views
  • 1 reply
  • 2 kudos

ADD COLUMN IF NOT EXISTS does not recognize "IF NOT EXISTS". How do I add a column to an existing Delta table with SQL if the column does not already exist?

How do I add a column to an existing Delta table with SQL if the column does not already exist? I am using the following code: %sql ALTER TABLE table_name ADD COLUMN IF NOT EXISTS column_name type; but it prints the error: [PARSE_SYNTAX_ERROR] Synta...

Latest Reply
UmaMahesh1
Honored Contributor III
  • 2 kudos

Hi @Christine Pedersen, according to the documentation, IF NOT EXISTS / IF EXISTS can only be used in conjunction with DROP or PARTITION clauses. If you want the same kind of check, you can do it using a try/except block in PySpark or, as per your l...

  • 2 kudos
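A minimal PySpark sketch of the check-first approach suggested above (table and column names are placeholders; note Delta's clause is ADD COLUMNS, without an IF NOT EXISTS option):

table_name = "my_schema.my_table"   # placeholder
column_name = "new_col"             # placeholder

# Check the current schema first, since ADD COLUMNS has no IF NOT EXISTS clause.
if column_name not in spark.table(table_name).columns:
    spark.sql(f"ALTER TABLE {table_name} ADD COLUMNS ({column_name} STRING)")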
Ovi
by New Contributor III
  • 2340 Views
  • 5 replies
  • 10 kudos

Construct Dataframe or RDD from S3 bucket with Delta tables

Hi all! I have an S3 bucket with Delta parquet files/folders, each with a different schema. I need to create an RDD or DataFrame from all those Delta tables that contains the path, name, and schema of each. How could I do that? Thank you! P...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 10 kudos

You can mount the S3 bucket or read directly from it:
access_key = dbutils.secrets.get(scope = "aws", key = "aws-access-key")
secret_key = dbutils.secrets.get(scope = "aws", key = "aws-secret-key")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", ac...

  • 10 kudos
4 More Replies
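Building on the reply, a hedged sketch that lists the top-level folders under a bucket prefix and collects each Delta table's path, name, and schema (the base path is a placeholder, and one-folder-per-table is an assumption):

base_path = "s3a://my-bucket/delta/"   # placeholder

rows = []
for entry in dbutils.fs.ls(base_path):
    if entry.name.endswith("/"):       # directories are listed with a trailing slash
        schema = spark.read.format("delta").load(entry.path).schema.simpleString()
        rows.append((entry.path, entry.name.rstrip("/"), schema))

# One row per Delta table: path, name, schema.
inventory_df = spark.createDataFrame(rows, ["path", "name", "schema"])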
aaronpetry
by New Contributor III
  • 2907 Views
  • 2 replies
  • 3 kudos

%run not printing notebook output when using 'Run All' command

I have been using the %run command to run auxiliary notebooks from an "orchestration" notebook. I like using %run over dbutils.notebook.run because of the variable inheritance, the ease of troubleshooting, and the printing of output from the auxiliary n...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Aaron Petry, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first; otherwise the Bricksters will get back to you soon. Thanks!

  • 3 kudos
1 More Replies
Nayan7276
by Valued Contributor II
  • 2377 Views
  • 5 replies
  • 29 kudos

Resolved! Databricks Community points

I have 461 points in the Databricks Community, but the rewards store is only reflecting 23 points. Can anyone look into this issue?

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 29 kudos

Hi, the rewards account needs to be created with the same email ID, and points may take a week to reflect in your rewards account.

  • 29 kudos
4 More Replies
isaac_gritz
by Valued Contributor II
  • 1966 Views
  • 4 replies
  • 8 kudos

Databricks Runtime Support

How long are Databricks runtimes supported for? How often are they updated? You can learn more about the Databricks runtime support lifecycle here (AWS | Azure | GCP). Long Term Support (LTS) runtimes are released every 6 months and supported for 2 yea...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 8 kudos

Thanks for the update.

  • 8 kudos
3 More Replies
Saikrishna2
by New Contributor III
  • 4989 Views
  • 7 replies
  • 11 kudos

Databricks SQL is allowing only 10 queries?

• Power BI is a publisher that uses AD group authentication to publish result sets. Since the publisher's credentials are maintained, the same user can access the Databricks database.
• A number of users are retrieving the data from Power BI or i...

Latest Reply
VaibB
Contributor
  • 11 kudos

I believe 10 is the limit as of now. See if you can increase the concurrency limit from the source.

  • 11 kudos
6 More Replies
User16835756816
by Valued Contributor
  • 3353 Views
  • 4 replies
  • 11 kudos

How can I extract data from different sources and transform it into a fresh, reliable data pipeline?

Tip: These steps are built out for AWS accounts and workspaces that are using Delta Lake. If you would like to learn more, watch this video and reach out to your Databricks sales representative for more information. Step 1: Create your own notebook or ...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 11 kudos

Thanks @Nithya Thangaraj

  • 11 kudos
3 More Replies
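As a minimal illustration of the extract-transform-load flow described above (paths, table, and column names are placeholders):

# Extract: read raw CSV files from cloud storage.
raw_df = (spark.read.format("csv")
            .option("header", "true")
            .load("s3://my-bucket/raw/orders/"))   # placeholder path

# Transform: light cleanup as an example.
clean_df = raw_df.dropDuplicates().na.drop(subset=["order_id"])

# Load: persist as a Delta table for downstream consumers.
clean_df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")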
