Data Engineering
Forum Posts

Hubert-Dudek
by Esteemed Contributor III
  • 6418 Views
  • 5 replies
  • 17 kudos

Resolved! Optimize and Vacuum - which is the best order of operations?

Optimize -> Vacuum, or Vacuum -> Optimize?

Latest Reply
-werners-
Esteemed Contributor III
  • 17 kudos

I optimize first, as Delta Lake knows which files are relevant for the optimize; that way the optimized data is available sooner. Then a vacuum. It seemed logical to me, but I might be wrong. I never actually thought about it.

4 More Replies
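The order discussed above (OPTIMIZE first, then VACUUM) can be sketched as a small maintenance helper. This is a hedged illustration, not an official routine: the table name is a placeholder and 168 hours is Delta's default retention threshold.

```python
def maintenance_statements(table, retain_hours=168):
    # OPTIMIZE first, so compaction works against the current file set and the
    # optimized data is available sooner; then VACUUM to drop files no longer
    # referenced by the Delta log. 168 hours is Delta's default retention.
    return [
        f"OPTIMIZE {table}",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

# On Databricks one would run each statement in turn, e.g.:
# for stmt in maintenance_statements("sales"):
#     spark.sql(stmt)
```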
Constantine
by Contributor III
  • 1067 Views
  • 2 replies
  • 4 kudos

Resolved! Generating Spark SQL query using Python

I have a Spark SQL notebook on DB where I have a SQL query like SELECT * FROM table_name WHERE condition_1 = 'fname' OR condition_1 = 'lname' OR condition_1 = 'mname' AND condition_2 = 'apple' AND condition_3 = 'orange'. There are a lot ...

Latest Reply
jose_gonzalez
Moderator
  • 4 kudos

Hi @John Constantine, I think you can also use arrays_overlap() for your OR statements; docs here.

1 More Replies
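One way to generate the query above from Python is to collapse the OR'd values into an IN list. This is a sketch using the column names from the excerpt as placeholders; note it also sidesteps a precedence pitfall in the original query, since AND binds tighter than OR in SQL.

```python
def build_query(table, names, fruit1, fruit2):
    # AND binds tighter than OR, so the query in the question is parsed as
    # c1='fname' OR c1='lname' OR (c1='mname' AND c2=... AND c3=...),
    # which is probably not what was intended. IN (...) avoids that.
    in_list = ", ".join(f"'{n}'" for n in names)
    return (
        f"SELECT * FROM {table} "
        f"WHERE condition_1 IN ({in_list}) "
        f"AND condition_2 = '{fruit1}' AND condition_3 = '{fruit2}'"
    )

# spark.sql(build_query("table_name", ["fname", "lname", "mname"], "apple", "orange"))
```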
Braxx
by Contributor II
  • 1146 Views
  • 5 replies
  • 5 kudos

Resolved! Conditionally create a dataframe

I would like to implement a simple logic: if Df1 is empty, return Df2; else newDf = Df1.union(Df2). It may happen that Df1 is empty and the output is simply: []. In that case I do not need the union. I have it like this but am getting an error when creating the datafra...

Latest Reply
cconnell
Contributor II
  • 5 kudos

Also try df.head(1).isEmpty

4 More Replies
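The emptiness check suggested in the replies can be written in PySpark as below; this is a sketch, assuming Df1 and Df2 have compatible schemas for union.

```python
def df1_or_union(df1, df2):
    # In PySpark, df.head(1) returns a (possibly empty) list of Rows, so this
    # tests emptiness without a full count() over the DataFrame.
    if len(df1.head(1)) == 0:
        return df2
    return df1.union(df2)
```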
Braxx
by Contributor II
  • 4266 Views
  • 7 replies
  • 5 kudos

Resolved! Object of type bool_ is not JSON serializable

I am doing a conversion of a data frame to a nested dict/JSON. One of the columns, called "Problematic__c", is boolean type. For some reason json does not accept this data type, retrieving the error: "Object of type bool_ is not JSON serializable". I need this as...

Latest Reply
Braxx
Contributor II
  • 5 kudos

Thanks Dan, that makes sense!

6 More Replies
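The "Object of type bool_ is not JSON serializable" error typically comes from a numpy scalar (numpy.bool_) leaking into the dict. One hedged workaround is a custom encoder that coerces numpy-style scalars to plain Python values via their .item() method; the column name below is from the question.

```python
import json

class CoerceScalars(json.JSONEncoder):
    # numpy scalar types such as numpy.bool_ are not JSON serializable, but
    # they all expose .item(), which returns the equivalent plain Python value.
    def default(self, obj):
        item = getattr(obj, "item", None)
        if callable(item):
            return item()
        return super().default(obj)

# json.dumps({"Problematic__c": some_numpy_bool}, cls=CoerceScalars)
```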
Manoj
by Contributor II
  • 6611 Views
  • 4 replies
  • 8 kudos

Resolved! Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Is there a way to submit multiple queries to a Databricks SQL endpoint using the REST API?

Latest Reply
BilalAslamDbrx
Honored Contributor II
  • 8 kudos

@Manoj Kumar Rayalla, DBSQL currently limits execution to 10 concurrent queries per cluster, so there could be some queuing with 30 concurrent queries. You may want to turn on multi-cluster load balancing to horizontally scale with 1 more cluster for...

3 More Replies
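Given the 10-concurrent-queries limit mentioned in the reply, client-side throttling can reduce queueing. A minimal sketch follows; run_query stands in for whatever callable actually submits a statement (REST call, connector, etc.), since the endpoint details are not given here.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

MAX_CONCURRENT = 10  # per the reply: DBSQL runs up to 10 queries concurrently per cluster
_gate = threading.Semaphore(MAX_CONCURRENT)

def submit_throttled(run_query, sql):
    # Keep at most MAX_CONCURRENT statements in flight at once.
    with _gate:
        return run_query(sql)

def run_all(run_query, statements):
    # Fan the statements out over a thread pool sized to the endpoint's limit.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as ex:
        return list(ex.map(lambda s: submit_throttled(run_query, s), statements))
```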
Nick_Hughes
by New Contributor III
  • 1050 Views
  • 3 replies
  • 3 kudos

Is there an alerting API please?

Is there an alerting API so that alerts can be source controlled and automated, please? https://docs.databricks.com/sql/user/alerts/index.html

Latest Reply
Dan_Z
Honored Contributor
  • 3 kudos

Hello @Nick Hughes, as of today we do not expose or document the API for these features. I think it would be a useful feature, so I created an internal feature request for it (DB-I-4289). If you (or any future readers) want more information on this f...

2 More Replies
William_Scardua
by Valued Contributor
  • 1521 Views
  • 7 replies
  • 2 kudos

How to avoid reprocessing old files without Delta?

Hi guys, consider this case: Company ACME (a hypothetical company). This company does not use Delta, but uses open source Spark to process raw data into .parquet. We have a 'sales' process which consists of receiving, every hour, a new dataset (.csv) within th...

Latest Reply
William_Scardua
Valued Contributor
  • 2 kudos

Hi @Jose Gonzalez, I agree the best option is to use Auto Loader, but in some cases you don't have the Databricks platform and don't use Delta; in those cases you need to build a way to process the new raw files.

6 More Replies
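Without Delta or Auto Loader, one common way to "build a way to process the new raw files" is a processed-files ledger. The sketch below keeps the ledger as a JSON list on disk; this is an illustrative approach under the assumption that a single writer owns the ledger file, not the thread's actual solution.

```python
import json
from pathlib import Path

def unprocessed(files, ledger_path):
    # The ledger is a JSON list of file names already processed.
    ledger = Path(ledger_path)
    seen = set(json.loads(ledger.read_text())) if ledger.exists() else set()
    return [f for f in files if f not in seen]

def mark_processed(files, ledger_path):
    # Merge the newly processed names into the ledger and rewrite it.
    ledger = Path(ledger_path)
    seen = set(json.loads(ledger.read_text())) if ledger.exists() else set()
    ledger.write_text(json.dumps(sorted(seen | set(files))))
```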
kaslan
by New Contributor II
  • 4283 Views
  • 6 replies
  • 0 kudos

How to filter files in Databricks Autoloader stream

I want to set up an S3 stream using Databricks Auto Loader. I have managed to set up the stream, but my S3 bucket contains different types of JSON files. I want to filter them out, preferably in the stream itself rather than using a filter operation. A...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

According to the docs you linked, the glob filter on the input path only works on directories, not on the files themselves. So if you want to filter on certain files in the directories concerned, you can include an additional filter through the pathGlobFilter o...

5 More Replies
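The pathGlobFilter option applies a glob pattern to file names within the matched directories. Its matching semantics can be sketched locally with Python's fnmatch; the pattern below is an assumed example, not from the thread.

```python
from fnmatch import fnmatch

def matching_files(names, pattern):
    # A glob like "*_city.json" keeps only the matching files in a listing,
    # analogous to what pathGlobFilter does inside each matched directory.
    return [n for n in names if fnmatch(n, pattern)]
```

In an Auto Loader stream this would correspond to something like `.option("pathGlobFilter", "*_city.json")` on the reader.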
HamzaJosh
by New Contributor II
  • 8665 Views
  • 7 replies
  • 3 kudos

I want to use databricks workers to run a function in parallel on the worker nodes

I have a function making API calls. I want to run this function in parallel so I can use the workers in Databricks clusters. I have tried with ThreadPoolExecutor() as executor: results = executor.map(getspeeddata, alist) to run m...

Latest Reply
HamzaJosh
New Contributor II
  • 3 kudos

You guys are not getting the point: I am making API calls in a function and want to store the results in a dataframe. I want multiple processes to run this task in parallel. How do I create a UDF and use it in a dataframe when the task is calling an ...

6 More Replies
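For I/O-bound API calls, a driver-side thread pool is often enough and avoids the UDF question entirely. A hedged sketch, reusing the names getspeeddata and alist from the question; whether threads suffice depends on call volume, which the thread does not state.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_rows(fn, items, max_workers=8):
    # Threads suit I/O-bound API calls; each call returns one row (a dict),
    # and ex.map preserves the input order in the results.
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(fn, items))

# rows = fetch_rows(getspeeddata, alist)   # names from the question
# df = spark.createDataFrame(rows)         # build the DataFrame after the calls finish
```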
Tankala_Harika
by New Contributor II
  • 253 Views
  • 0 replies
  • 0 kudos

Hi Juliet Wu, I have completed my Databricks Apache Spark associate developer exam on 7/10/2021. After completion of my exam I got my badge t...

Hi Juliet Wu, I have completed my Databricks Apache Spark associate developer exam on 7/10/2021. After completion of my exam I got my badge in my Webassessor mail 1 day after the exam, which is on 8/10/2021, but I didn't receive my...

Mihai1
by New Contributor III
  • 1267 Views
  • 1 reply
  • 2 kudos

Resolved! How to source control a Dashboard?

Is it possible to source control the dashboard along with a notebook code? When source controlling a python notebook it gets converted to *.py. It looks like the resulting *.py file loses the information about the dashboard cells. Thus, if this *.py ...

Latest Reply
Dan_Z
Honored Contributor
  • 2 kudos

No, you will need to save as another source, like DBC Archive, to replicate the Notebook features.

cconnell
by Contributor II
  • 3318 Views
  • 11 replies
  • 7 kudos

Resolved! What is the proper way to import the new pyspark.pandas library?

I am moving an existing, working pandas program into Databricks. I want to use the new pyspark.pandas library and change my code as little as possible. It appears that I should do the following: 1) Add from pyspark import pandas as ps at the top. 2) Ch...

Latest Reply
Anonymous
Not applicable
  • 7 kudos

Make sure to use the 10.0 Runtime which includes Spark 3.2

10 More Replies
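On the 10.0 Runtime (Spark 3.2) the pandas API on Spark lives under the pyspark package, and the plain module import is the documented form. The fallback below is an assumption added so a script still loads where pyspark is absent; it is not part of the thread's answer.

```python
# Documented form on Spark 3.2+ / DBR 10.0+: import pyspark.pandas as ps
try:
    import pyspark.pandas as ps
    HAVE_PANDAS_ON_SPARK = True
except ImportError:
    # Outside a Spark environment the module is unavailable; a caller could
    # fall back to plain pandas here if the workload allows it.
    ps = None
    HAVE_PANDAS_ON_SPARK = False
```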
IgnacioCastinei
by New Contributor III
  • 5312 Views
  • 7 replies
  • 2 kudos

CLI Command <databricks fs cp> Not Uploading Files to DBFS

Hi all. So far I have been successfully using the CLI to upload files from my local machine to DBFS/FileStore/tables. Specifically, I have been using my terminal and the following command: databricks fs cp -r <MyLocalDataset> dbfs:/FileStor...

Latest Reply
jose_gonzalez
Moderator
  • 2 kudos

Hi @Ignacio Castineiras, if Arjun.kr's reply fully answered your question, would you be happy to mark their answer as best so that others can quickly find the solution? Please let us know if you are still having this issue.

6 More Replies
Adrien
by New Contributor
  • 759 Views
  • 2 replies
  • 1 kudos

Creating a table like in SQL with Spark

Hi! I'm working on a project at my company on Databricks using Scala and Spark. I'm new to Spark and Databricks, so I would like to know how to create a table at a specific location (on my company's Delta Lake). In SQL + some Delta features, I ...

Latest Reply
jose_gonzalez
Moderator
  • 1 kudos

Hi @Adrien MERAT, I would like to share the following documentation that provides examples on how to create Delta tables: Create Delta table (link), Delta data types (link).

1 More Replies
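A location-pinned Delta table is created with CREATE TABLE ... USING DELTA LOCATION. Since the thread uses Scala/Spark and the exact schema isn't given, here is a hedged Python sketch that only builds the DDL string; the table name, path, and columns are placeholders.

```python
def create_delta_table_ddl(name, location, columns):
    # columns: list of (column_name, sql_type) pairs, e.g. [("id", "BIGINT")].
    # The caller is responsible for ensuring names and the path are valid.
    cols = ",\n  ".join(f"{c} {t}" for c, t in columns)
    return (
        f"CREATE TABLE IF NOT EXISTS {name} (\n  {cols}\n) "
        f"USING DELTA LOCATION '{location}'"
    )

# spark.sql(create_delta_table_ddl(
#     "sales", "/mnt/delta/sales", [("id", "BIGINT"), ("amount", "DOUBLE")]))
```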