Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

manugarri
by New Contributor II
  • 15966 Views
  • 10 replies
  • 1 kudos

Fuzzy text matching in Spark

I have a list of client-provided data, a list of company names. I have to match those names against an internal database of company names. The client list can fit in memory (it's about 10k elements) but the internal dataset is on HDFS and we use Spark ...

Latest Reply
Sonal
New Contributor II
  • 1 kudos

You can use Zingg, a Spark-based open source tool for this: https://github.com/zinggAI/zingg

9 More Replies
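
For a quick baseline without an external library, the broadcast-join pattern with Spark's built-in levenshtein function also works at this scale. A minimal PySpark sketch; the table and column names are assumptions:

    from pyspark.sql import functions as F

    # Hypothetical inputs: ~10k client names (small enough to broadcast)
    # and a large internal table with a company_name column.
    client_df = spark.createDataFrame([("Acme Corp",), ("Globex Inc",)], ["client_name"])
    internal_df = spark.table("internal.company_names")

    # Cross-join against the broadcast client list, score each pair by
    # edit distance, and keep only close matches.
    matches = (
        internal_df
        .crossJoin(F.broadcast(client_df))
        .withColumn("dist", F.levenshtein("company_name", "client_name"))
        .filter(F.col("dist") <= 3)  # tolerance threshold; tune for your data
    )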
Sam
by New Contributor III
  • 1603 Views
  • 0 replies
  • 0 kudos

Can Admins enable Table Download on Sample but not on Full Dataset?

Is it possible to allow table download on a sampled dataset but not the full dataset? In the configuration settings it seems like you have to allow both? Notwithstanding the fact that people could loop through the sample download, it seems like a prud...

saniafatimi
by New Contributor II
  • 3181 Views
  • 1 reply
  • 1 kudos

Need guidance on migrating Power BI reports to Databricks

Hi all, I want to import an existing database/tables (say AdventureWorks) into Databricks. After importing the tables, I want to develop reports on top of them. I need guidance on this. Can someone give me resources that could help me in doing things end to en...

Latest Reply
Chris_Shehu
Valued Contributor III
  • 1 kudos

@saniafatimi There are several different ways to do this and it's really going to depend on what your current need is. You could, for example, load the data into Databricks Delta Lake and use the Databricks Power BI connector to query the data fr...

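
To make that first step concrete, a hedged sketch of landing one AdventureWorks table in Delta Lake so the Power BI connector can query it; the host, credentials, and table names are placeholders:

    # Pull one source table over JDBC.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<host>:1433;databaseName=AdventureWorks")
          .option("dbtable", "SalesLT.Customer")
          .option("user", "<user>")
          .option("password", "<password>")
          .load())

    # Persist it as a Delta table for Power BI to query via the connector.
    df.write.format("delta").mode("overwrite").saveAsTable("adventureworks.customer")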
User16830818524
by New Contributor II
  • 2245 Views
  • 3 replies
  • 0 kudos

Resolved! Libraries in Databricks Runtimes

Is it possible to easily determine which libraries and which versions are included in a specific DBR version?

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hello. My name is Piper and I'm one of the community moderators. One of the team members sent this information to me. This should be the correct path to check the libraries installed with DBRs: https://docs.databricks.com/release-notes/runtime/8.3ml.html?_...

2 More Replies
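
The release notes are the authoritative list per DBR version. To see what is actually present on a running cluster, one option is to enumerate the Python environment from a notebook (a sketch; %pip list in a cell gives a similar view):

    # Print every installed Python package and its version on the cluster.
    import importlib.metadata

    for dist in sorted(importlib.metadata.distributions(),
                       key=lambda d: d.metadata["Name"].lower()):
        print(dist.metadata["Name"], dist.version)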
Rodrigo_Brandet
by New Contributor
  • 4504 Views
  • 3 replies
  • 4 kudos

Resolved! Upload CSV files on Databricks by code (not UI)

Hello everyone. I have a process on Databricks where I need to upload a CSV file manually every day. I would like to know if there is a way to import this data (as pandas in Python, for example) without needing to upload this file manually every day util...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

Auto Loader is indeed a valid option, or use some kind of ETL tool which fetches the file and puts it somewhere on your cloud provider, such as Azure Data Factory or AWS Glue.

2 More Replies
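
As a starting point for the Auto Loader suggestion, a minimal sketch that incrementally picks up new CSV files from a landing folder; the paths and table name are assumptions:

    # Auto Loader tracks which files it has already ingested.
    df = (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .option("cloudFiles.schemaLocation", "/mnt/landing/_schema")
          .option("header", "true")
          .load("/mnt/landing/daily_csv/"))

    (df.writeStream
       .option("checkpointLocation", "/mnt/landing/_checkpoint")
       .trigger(once=True)  # behaves like a scheduled batch run
       .toTable("staging.daily_csv"))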
Zen
by New Contributor III
  • 5013 Views
  • 2 replies
  • 3 kudos

Resolved! ssh onto Cluster as root

Hello, I'm following the instructions here: https://docs.databricks.com/clusters/configure.html?_ga=2.17611385.1712747127.1631209439-1615211488.1629573963#ssh-access-to-clusters to ssh onto the driver node, and it's working perfectly when I ssh on as `...

Latest Reply
cconnell
Contributor II
  • 3 kudos

I am 99% sure that logging into a Databricks node as root will not be allowed.

1 More Replies
Anonymous
by Not applicable
  • 1921 Views
  • 2 replies
  • 0 kudos

Resolved! What are the advantages of using Delta if I am using MLflow? How is Delta useful for DS/ML use cases?

I am already using MLflow. What benefit would Delta provide me, since I am not really working on data engineering workloads?

Latest Reply
Sebastian
Contributor
  • 0 kudos

The most important aspect is that your experiment can track the version of the data table, so during audits you will be able to trace back why a specific prediction was made.

1 More Replies
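
To make the audit-trail point concrete, a sketch that pins the Delta table version used for training and records it on the MLflow run; the table name and version are hypothetical:

    import mlflow

    table, version = "features.training_set", 12  # hypothetical table and version

    # Delta time travel: read the table exactly as it was at that version.
    train_df = spark.read.option("versionAsOf", version).table(table)

    with mlflow.start_run():
        mlflow.log_param("training_table", table)
        mlflow.log_param("delta_version", version)
        # ... train and log the model as usual ...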
brickster_2018
by Databricks Employee
  • 3128 Views
  • 2 replies
  • 3 kudos

Resolved! What is the best file format for a temporary table?

As part of my ETL process, I create intermediate/staging temporary tables. These tables are read at a later point in the ETL and finally cleaned up. Should I use Delta? Using Delta creates the overhead of running optimize jobs, which would de...

Latest Reply
Sebastian
Contributor
  • 3 kudos

Agreed. Intermediate Delta tables help since they bring reliability to the pipeline.

1 More Replies
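
A sketch of that staging pattern, with assumed schema and table names: write the intermediate result as Delta for reliable downstream reads, then drop it when the run finishes:

    # stage_df stands in for an intermediate result computed earlier in the ETL.
    (stage_df.write.format("delta")
        .mode("overwrite")
        .saveAsTable("etl_staging.orders_enriched"))

    # Later steps read the staged table...
    enriched = spark.table("etl_staging.orders_enriched")

    # ...and the pipeline cleans it up at the end.
    spark.sql("DROP TABLE IF EXISTS etl_staging.orders_enriched")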
Nyarish
by Contributor
  • 1052 Views
  • 0 replies
  • 0 kudos

How to connect Neo4j Aura to Databricks: connection error

I get this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. I have tried to read through the documentation and tried the solution suggested, but I can't seem to crack this problem. Kindly help. ...

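
No answer was posted here, but one common cause of this SecurityException with Aura is using a plain bolt:// or neo4j:// URL instead of the encrypted neo4j+s:// scheme. A hedged sketch with the Neo4j Spark Connector installed on the cluster; the URL, credentials, and label are placeholders:

    # Read nodes from Aura; note the encrypted neo4j+s:// scheme in the URL.
    df = (spark.read.format("org.neo4j.spark.DataSource")
          .option("url", "neo4j+s://<dbid>.databases.neo4j.io")
          .option("authentication.basic.username", "neo4j")
          .option("authentication.basic.password", "<password>")
          .option("labels", "Person")  # hypothetical node label
          .load())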
Zircoz
by New Contributor II
  • 14254 Views
  • 2 replies
  • 6 kudos

Resolved! Can we access variables created in Python from Scala code or notebooks?

If I have a dict created in Python in a Scala notebook (using the magic word, of course): %python d1 = {1: "a", 2: "b", 3: "c"}. Can I access this d1 in Scala? I tried the following and it returns d1 not found: %scala println(d1)

Latest Reply
cpm1
New Contributor II
  • 6 kudos

Martin is correct. We can only access external files and objects. In most of our cases, we just use temporary views to pass data between R and Python: https://docs.databricks.com/notebooks/notebooks-use.html#mix-languages

1 More Replies
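
A minimal sketch of that temp-view approach for the original Python-to-Scala question; the view and column names are arbitrary:

    # %python cell: put the dict into a DataFrame and register a temp view.
    d1 = {1: "a", 2: "b", 3: "c"}
    spark.createDataFrame(list(d1.items()), ["key", "value"]) \
         .createOrReplaceTempView("d1_view")

    # %scala cell: the view is visible through the shared SparkSession, e.g.
    #   val d1 = spark.table("d1_view").collect()
    #     .map(r => r.getLong(0) -> r.getString(1)).toMap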
Anonymous
by Not applicable
  • 3382 Views
  • 1 reply
  • 2 kudos

Are there any costs or quotas associated with the Databricks managed Hive metastore?

When using the default Hive metastore that is managed within the Databricks control plane, are there any associated costs? I.e., if I switched to an external metastore, would I expect to see any reduction in my Databricks cost (ignoring total costs)? Do ...

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

There are no costs associated with using the Databricks-managed Hive metastore directly. Databricks pricing is based on compute consumption, not on data storage or access. The only real cost would be the compute used to access the data. I would not expe...

Techmate
by New Contributor
  • 1554 Views
  • 1 reply
  • 0 kudos

Populating an array of date tuples in Scala

Hi friends, I am trying to pass a list of date ranges that needs to be in the below format: val predicates = Array("2021-05-16" -> "2021-05-17", "2021-05-18" -> "2021-05-19", "2021-05-20" -> "2021-05-21"). I am then using map to create a range of conditions that...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

So basically this can be done by generating two lists which are then zipped. One list contains the first dates of the tuples, which in your case are two days apart. The other list contains the second dates of the tuples, also two days apart. Now we need a function ...

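
The same zip-based construction, sketched in Python for illustration; the range bounds come from the example in the question:

    from datetime import date, timedelta

    # Start dates two days apart, each paired with the following day.
    start, n_ranges = date(2021, 5, 16), 3
    starts = [start + timedelta(days=2 * i) for i in range(n_ranges)]
    ends = [s + timedelta(days=1) for s in starts]

    predicates = [(s.isoformat(), e.isoformat()) for s, e in zip(starts, ends)]
    # [('2021-05-16', '2021-05-17'), ('2021-05-18', '2021-05-19'),
    #  ('2021-05-20', '2021-05-21')]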
dlevy
by New Contributor II
  • 1517 Views
  • 1 reply
  • 1 kudos
Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I think this was added in Databricks Runtime 8.2: https://docs.databricks.com/release-notes/runtime/8.2.html

alphaRomeo
by New Contributor
  • 4737 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with MySQL data source?

I have an existing data pipeline which looks like this: a small MySQL data source (around 250 GB) whose data passes through Debezium/Kafka and a custom data redactor to Glue ETL jobs, and finally lands on Redshift. But the scale of the data is too sm...

Latest Reply
Dan_Z
Databricks Employee
  • 0 kudos

There is a lot in this question, so generally speaking I suggest you reach out to the sales team at Databricks. You can talk to a solutions architect who can get into more detail. Here are my general thoughts, having seen a lot of customer architectures: Generally,...

1 More Replies
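
On the mechanics, Databricks can read a MySQL source directly over JDBC. A hedged sketch; host, database, table, and credentials are placeholders:

    # Read one MySQL table into a DataFrame over JDBC.
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://<host>:3306/<database>")
          .option("dbtable", "orders")  # hypothetical table
          .option("user", "<user>")
          .option("password", "<password>")
          .option("driver", "com.mysql.cj.jdbc.Driver")
          .load())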
