Data Engineering

Forum Posts

by Jeff_Luecht • New Contributor II

11-07-2021 10:32:49 AM

4112 Views
1 replies
2 kudos

Restarting existing community edition clusters

I am new to Databricks Community Edition. I was following the quickstart guide and running through basic cluster management - create, start, etc. For whatever reason, I cannot restart an existing cluster. There is nothing in the cluster event logs or...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

11-07-2021 10:48:28 AM

2 kudos

The free Community Edition is quite limited, so that may be the reason.

by Daniel • New Contributor III

11-03-2021 2:44:57 PM

13868 Views
11 replies
6 kudos

Resolved! Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working

Hello guys, can someone help me? Autocomplete of parentheses, quotation marks, brackets and square brackets stopped working in Python notebooks. How can I fix this? Daniel

Latest Reply

Daniel
New Contributor III

11-05-2021 6:09:01 AM

6 kudos

@Piper Wilson, @Werner Stinckens, thank you so much for your help. I followed @Jose Gonzalez's suggestion and now it works.

by Constantine • Contributor III

11-04-2021 11:36:30 AM

3670 Views
2 replies
4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like: SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply

jose_gonzalez
Databricks Employee

11-04-2021 5:16:45 PM

4 kudos

Hi @John Constantine, I think you can also use arrays_overlap() for your OR statements (docs here).
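For readers who want to generate this kind of query from Python, here is a minimal sketch. It uses a plain IN list instead of arrays_overlap(), and it assumes the intent was "condition_1 is any of the names AND the other conditions hold". Note that without parentheses SQL evaluates AND before OR, so the query as posted may not mean that. The helper names sql_literal and build_query are made up for illustration.

```python
def sql_literal(value):
    """Render a Python value as a single-quoted SQL string literal."""
    # Doubling single quotes is the standard SQL escape; for untrusted input,
    # prefer parameterized queries over string building.
    return "'" + str(value).replace("'", "''") + "'"


def build_query(table, any_of, all_of):
    """Build a SELECT with one IN list (replacing chained ORs) plus AND filters.

    any_of: (column, [values])  -> column IN (...)
    all_of: {column: value}     -> column = value AND ...
    """
    column, values = any_of
    in_list = ", ".join(sql_literal(v) for v in values)
    clauses = [f"{column} IN ({in_list})"]
    clauses += [f"{col} = {sql_literal(val)}" for col, val in all_of.items()]
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


query = build_query(
    "table_name",
    ("condition_1", ["fname", "lname", "mname"]),
    {"condition_2": "apple", "condition_3": "orange"},
)
print(query)
```

In a notebook the resulting string could then be passed to spark.sql(query), assuming an active SparkSession named spark.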

by Braxx • Contributor II

11-04-2021 2:12:27 AM

3523 Views
4 replies
5 kudos

Resolved! Conditionally create a dataframe

I would like to implement a simple logic: if Df1 is empty return Df2 else newDf = Df1.union(Df2). It may happen that Df1 is empty and the output is simply: []. In that case I do not need the union. I have it like this but am getting an error when creating datafra...

Latest Reply

cconnell
Contributor II

11-04-2021 6:12:20 AM

5 kudos

Also try df.head(1).isEmpty
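Note that .isEmpty on the result of head(1) is Scala syntax. In PySpark, head(1) returns a plain Python list, so the same check could be written as below. This is a sketch of the logic from the question, and the function name union_unless_empty is made up for illustration; it only relies on the head() and union() methods of the DataFrames passed in.

```python
def union_unless_empty(df1, df2):
    """Return df2 when df1 has no rows, otherwise df1.union(df2)."""
    # In PySpark, head(1) returns a list holding at most one Row,
    # so an empty list means the DataFrame has no rows.
    if len(df1.head(1)) == 0:
        return df2
    return df1.union(df2)
```

Recent PySpark versions also provide DataFrame.isEmpty(), which could replace the len(...) check if your runtime has it.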

by Vaibhav1000 • New Contributor II

11-03-2021 11:09:58 PM

5325 Views
2 replies
1 kudos

Resolved! How does Databricks optimized autoscaling behave when scaling out fails (e.g. insufficient resources on the AWS side)?

Latest Reply

-werners-
Esteemed Contributor III

11-04-2021 4:37:47 AM

1 kudos

@Vaibhav Gour, it kind of depends on the case: if there are no workers available when your job starts, you get an error, because the cluster is unable to start and so the code cannot be executed. But this is not an autoscale issue. If you need to scale up, but for ...

by Braxx • Contributor II

10-17-2021 11:52:14 AM

10772 Views
6 replies
4 kudos

Resolved! Object of type bool_ is not JSON serializable

I am converting a data frame to a nested dict/JSON. One of the columns, called "Problematic__c", is of boolean type. For some reason json does not accept this data type, returning the error: "Object of type bool_ is not JSON serializable". I need this as...

Latest Reply

Braxx
Contributor II

10-22-2021 1:16:15 AM

4 kudos

Thanks Dan, that makes sense!
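For anyone landing here with the same error: the type name bool_ suggests a NumPy boolean (an assumption, since the full traceback is not shown), which is not a subclass of Python's bool, so the json module rejects it. One possible workaround, not necessarily the one suggested in the thread, is a fallback converter passed to json.dumps. The name to_builtin is made up for illustration.

```python
import json


def to_builtin(obj):
    """Fallback for json.dumps: convert NumPy-style scalars to built-in types."""
    # NumPy scalars (bool_, int64, float64, ...) expose .item(), which
    # returns the equivalent built-in Python value.
    if hasattr(obj, "item"):
        return obj.item()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
```

Usage would look like json.dumps(record, default=to_builtin), where record is the nested dict built from the data frame. json only calls the fallback for values it cannot serialize itself.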

by Manoj • Contributor II

10-28-2021 3:17:27 PM

11454 Views
4 replies
8 kudos

Resolved! Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Latest Reply

BilalAslamDbrx
Databricks Employee

11-03-2021 6:48:48 AM

8 kudos

@Manoj Kumar Rayalla, DBSQL currently limits execution to 10 concurrent queries per cluster, so there could be some queuing with 30 concurrent queries. You may want to turn on multi-cluster load balancing to horizontally scale with 1 more cluster for...

by Nasreddin • New Contributor

11-02-2021 1:20:19 PM

6905 Views
0 replies
0 kudos

ColumnTransformer not fitted after sklearn Pipeline loaded from Mlflow

I am building a machine learning model using an sklearn Pipeline which includes a ColumnTransformer as a preprocessor before the actual model. Below is the code showing how the pipeline is created. transformers = [] num_pipe = Pipeline(steps=[ ('imputer', Si...

by Nick_Hughes • New Contributor III

11-02-2021 1:34:12 AM

2966 Views
3 replies
3 kudos

Is there an alerting API please?

Is there an alerting API so that alerts can be source controlled and automated, please? https://docs.databricks.com/sql/user/alerts/index.html

Latest Reply

Dan_Z
Databricks Employee

11-02-2021 10:30:42 AM

3 kudos

Hello @Nick Hughes, as of today we do not expose or document the API for these features. I think it would be a useful feature, so I created an internal feature request for it (DB-I-4289). If you (or any future readers) want more information on this f...

by William_Scardua • Valued Contributor

10-26-2021 11:42:15 AM

4548 Views
6 replies
2 kudos

How to avoid reprocessing old files without Delta?

Hi guys, look at this case: Company ACME (a hypothetical company). This company does not use Delta, but uses open source Spark to process raw data to .parquet. We have a 'sales' process which consists of receiving every hour a new dataset (.csv) within th...

Latest Reply

William_Scardua
Valued Contributor

11-01-2021 11:44:24 AM

2 kudos

Hi @Jose Gonzalez, I agree the best option is to use Auto Loader, but in some cases you don't have the Databricks platform and don't use Delta. In these cases you need to build your own way to process only the new raw files.
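One simple way to build that yourself, sketched here under assumptions, is to keep a small manifest of files that were already processed and skip them on the next run. The manifest location, the JSON format, and all function names below are made up for illustration. It uses the local filesystem, so on S3 or HDFS the listing and manifest storage would need the matching client.

```python
import json
from pathlib import Path


def load_processed(manifest_path):
    """Return the set of file paths recorded in the manifest (empty if none yet)."""
    path = Path(manifest_path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def select_new_files(landing_dir, manifest_path, pattern="*.csv"):
    """List files in landing_dir matching pattern that are not in the manifest."""
    processed = load_processed(manifest_path)
    candidates = (str(p) for p in Path(landing_dir).glob(pattern))
    return sorted(f for f in candidates if f not in processed)


def mark_processed(manifest_path, files):
    """Add files to the manifest after they were processed successfully."""
    processed = load_processed(manifest_path) | set(files)
    Path(manifest_path).write_text(json.dumps(sorted(processed)))
```

A run would call select_new_files(), pass the returned list to something like spark.read.csv(new_files), write the Parquet output, and only then call mark_processed(), so a failed run gets retried.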

by kaslan • New Contributor II

10-28-2021 11:46:45 PM

10176 Views
5 replies
0 kudos

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...

Latest Reply

-werners-
Esteemed Contributor III

10-29-2021 1:29:39 AM

0 kudos

According to the docs you linked, the glob filter on input-path only works on directories, not on the files themselves. So if you want to filter on certain files in the directories concerned, you can include an additional filter through the pathGlobFilter o...
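As a rough sketch of what that could look like, the options for the stream can be collected in a dict. The glob pattern *_orders.json and the helper name autoloader_options are hypothetical, since the thread does not show the actual file names. Schema settings are left out here, and Auto Loader usually needs them too.

```python
def autoloader_options(file_glob, file_format="json"):
    """Collect reader options for an Auto Loader stream with a file-name filter."""
    return {
        "cloudFiles.format": file_format,  # format of the incoming files
        "pathGlobFilter": file_glob,       # only pick up files whose name matches
    }


opts = autoloader_options("*_orders.json")  # hypothetical pattern

# On Databricks (assumes an active SparkSession named `spark`):
#   stream = (spark.readStream.format("cloudFiles")
#             .options(**opts)
#             .load("s3://<bucket>/<prefix>/"))
```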

by HamzaJosh • New Contributor II

10-27-2021 3:27:38 PM

17192 Views
6 replies
3 kudos

I want to use databricks workers to run a function in parallel on the worker nodes

I have a function making API calls. I want to run this function in parallel so I can use the workers in Databricks clusters to run it in parallel. I have tried with ThreadPoolExecutor() as executor: results = executor.map(getspeeddata, alist) to run m...

Latest Reply

HamzaJosh
New Contributor II

11-01-2021 6:49:53 AM

3 kudos

You guys are not getting the point. I am making API calls in a function and want to store the results in a dataframe. I want multiple processes to run this task in parallel. How do I create a UDF and use it in a dataframe when the task is calling an ...
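A note for readers with the same problem: ThreadPoolExecutor only creates threads on the driver node, so the workers stay idle. One way to push the calls out to the workers, sketched here under assumptions, is to write the work as a function over an iterator and hand it to RDD.mapPartitions. call_api below is a hypothetical stand-in for getspeeddata, and the partition count is arbitrary.

```python
def call_api(item):
    """Hypothetical stand-in for getspeeddata(); replace with the real API call."""
    return {"input": item, "speed": len(str(item))}


def fetch_partition(items):
    """Process one partition: takes an iterator of inputs, yields one result each."""
    for item in items:
        yield call_api(item)


# Local check without a cluster: a plain iterator plays the role of a partition.
results = list(fetch_partition(iter(["a", "bb"])))

# On a cluster (assumes an active SparkSession named `spark`), the same
# function could be distributed across the workers, for example:
#   rdd = spark.sparkContext.parallelize(alist, numSlices=8)
#   rows = rdd.mapPartitions(fetch_partition).collect()
#   df = spark.createDataFrame(rows)
```

Working per partition rather than per row also gives a natural place to create one HTTP session per partition instead of one per call.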

by sarosh • New Contributor

09-27-2021 1:36:38 PM

19582 Views
2 replies
1 kudos

ModuleNotFoundError / SerializationError when executing over databricks-connect

I am running into the following error when I run a model fitting process over databricks-connect. It looks like worker nodes are unable to access modules from the project's parent directory. Note that the program runs successfully up to this point; n...

Latest Reply

Manjunath
Databricks Employee

11-02-2021 12:01:43 AM

1 kudos

@Sarosh Ahmad, could you try adding the zip of the module with addPyFile, like below: spark.sparkContext.addPyFile("src.zip")
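In case it helps, here is a minimal sketch of how the src.zip could be produced so that the package directory sits at the top level of the archive, which is what lets import src.<module> resolve on the workers. The helper name zip_package is made up for illustration, and the package is assumed to be a directory named src.

```python
import shutil
from pathlib import Path


def zip_package(package_dir, out_base):
    """Zip package_dir so the archive contains '<package name>/...' at its root.

    out_base is the archive path without the '.zip' suffix.
    Returns the path of the archive that was written.
    """
    package_dir = Path(package_dir).resolve()
    return shutil.make_archive(
        str(out_base),
        "zip",
        root_dir=str(package_dir.parent),  # paths inside the zip are relative to this
        base_dir=package_dir.name,         # only this directory goes into the zip
    )


# Example (assumes a ./src package and an active SparkSession named `spark`):
#   archive = zip_package("src", "src")      # writes ./src.zip
#   spark.sparkContext.addPyFile(archive)
```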

by Tankala_Harika • New Contributor II

11-01-2021 11:50:29 AM

828 Views
0 replies
0 kudos

Hi Juliet Wu, I have completed my Databricks Apache Spark associate developer exam on 7/10/2021. After completing my exam I got my badge t...

Hi Juliet Wu, I have completed my Databricks Apache Spark associate developer exam on 7/10/2021. After completing my exam I got my badge at my Webassessor mail 1 day after the exam, on 8/10/2021, but I didn't receive my...
