Data Engineering

Forum Posts

by govind • New Contributor

07-28-2021 7:49:40 AM

2833 Views
2 replies
0 kudos

Write 160M rows with 300 columns into Delta Table using Databricks?

Hi, I am using Databricks to load data from one Delta table into another Delta table. I'm using the Simba Spark JDBC connector to pull data from a Delta table in my source instance and write it into a Delta table in my Databricks instance. The source has...
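Without partitioning options, a Spark JDBC read runs as one query over one connection, which is slow for 160M rows. A common Spark technique is a partitioned read, where Spark splits the query into range queries over a numeric, date, or timestamp column. The sketch below only assembles the standard Spark JDBC reader options; the URL, table name, partition column, and bounds are hypothetical placeholders, and it assumes the source has a reasonably evenly distributed column to split on. Spark uses lowerBound and upperBound only to compute the partition stride, not to filter rows.

```python
from typing import Dict


def build_jdbc_read_options(
    url: str,
    table: str,
    partition_column: str,
    lower_bound: int,
    upper_bound: int,
    num_partitions: int,
    fetch_size: int = 10_000,
) -> Dict[str, str]:
    """Assemble Spark JDBC reader options for a partitioned (parallel) read.

    Spark issues num_partitions range queries on partition_column; the bounds
    only set the stride, so rows outside them still land in the edge partitions.
    fetchsize is the number of rows fetched per round trip to the source.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
        "fetchsize": str(fetch_size),
    }


def copy_to_delta(spark, options: Dict[str, str], target_table: str) -> None:
    """Read the source table over JDBC and append it to a Delta table.

    `spark` is an existing SparkSession, such as the one a notebook provides.
    """
    df = spark.read.format("jdbc").options(**options).load()
    df.write.format("delta").mode("append").saveAsTable(target_table)
```

For example, build_jdbc_read_options(jdbc_url, "source_db.events", "event_id", 0, 160_000_000, 64), with all values hypothetical, would split the read into 64 queries. The partition count is something to tune against how many concurrent queries the source can serve and how many cores the cluster has.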

Latest Reply

Anonymous
Not applicable

05-02-2022 8:41:34 AM

0 kudos

Hi @govind@dqlabs.ai, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

by Zii • New Contributor II

05-02-2022 8:34:29 AM

4345 Views
0 replies
1 kudos

Delta Live Tables quality check for distinct values

Hi all, I have been having trouble working out how to do a uniqueness check as a data quality expectation. Below is an example:

@dlt.expect("origin_not_dup", "origin is distinct from origin")
def harmonized_data():
    df = dlt.read("raw_data")
    for col in...
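A DLT expectation is a boolean SQL expression evaluated against each row on its own, so it cannot see whether a value repeats in other rows. In addition, origin IS DISTINCT FROM origin compares each value with itself and is therefore always false, so every row would be counted as a violation. Uniqueness needs an aggregate over the whole dataset. The plain-Python sketch below shows that aggregate, the same logic as df.groupBy("origin").count().filter("count > 1") in PySpark; in a pipeline, one possible approach (not confirmed in this thread) is a separate check table built from that aggregation with an expectation such as count = 1. The sample rows are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Hashable, Iterable


def duplicate_values(rows: Iterable[Dict[str, Any]], column: str) -> Dict[Hashable, int]:
    """Return each value of `column` that occurs more than once, with its count.

    Uniqueness is a property of the whole dataset, so this counts occurrences
    across all rows instead of testing each row in isolation.
    """
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical sample data standing in for dlt.read("raw_data").
rows = [
    {"origin": "JFK", "dest": "LAX"},
    {"origin": "SFO", "dest": "ORD"},
    {"origin": "JFK", "dest": "SEA"},
]
print(duplicate_values(rows, "origin"))  # {'JFK': 2}
```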

by Vikram • New Contributor II

03-28-2022 6:00:57 PM

4919 Views
4 replies
4 kudos

Resolved! CVE-2022-0778

How can we update the OpenSSL version on the cluster to address this vulnerability? https://ubuntu.com/security/CVE-2022-0778 I tried this global init script to auto-update the OpenSSL version, but it does not seem to work because apt-utils is missing. apt...

Latest Reply

Atanu
Databricks Employee

04-28-2022 5:53:41 PM

4 kudos

I can see the following in our internal communication.
CVSSv3 score: 4.0 (Medium) AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:N/A:L
Reference: https://www.openssl.org/news/secadv/20220315.txt
Severity: High
The BN_mod_sqrt() function, which computes a modular square root, ...
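The reply quotes two different ratings: the CVSS v3 base score of 4.0 (Medium), which is derived from the vector string, and "Severity: High" from the OpenSSL advisory, which uses OpenSSL's own severity scale. As a worked check, the sketch below recomputes the base score from the vector using the CVSS v3.1 base-metric weights, formulas, and Roundup rule from the FIRST specification. It covers base metrics only, and the function names are illustrative.

```python
import math
from typing import Dict, Tuple

# Base-metric weights from the CVSS v3.1 specification.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}  # used when Scope is Changed
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def parse_vector(vector: str) -> Dict[str, str]:
    """Turn 'AV:N/AC:H/...' (optionally prefixed with 'CVSS:3.x/') into a dict."""
    parts = [p for p in vector.split("/") if not p.startswith("CVSS:")]
    return dict(p.split(":", 1) for p in parts)


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def severity(score: float) -> str:
    """Qualitative CVSS v3 rating for a base score."""
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def base_score(vector: str) -> Tuple[float, str]:
    m = parse_vector(vector)
    changed = m["S"] == "C"
    # Impact sub-score from confidentiality, integrity and availability.
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        score = 0.0
    elif changed:
        score = roundup(min(1.08 * (impact + exploitability), 10))
    else:
        score = roundup(min(impact + exploitability, 10))
    return score, severity(score)


print(base_score("AV:N/AC:H/PR:N/UI:N/S:C/C:N/I:N/A:L"))  # (4.0, 'Medium')
```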

by pavanb • New Contributor II

04-05-2022 4:50:37 AM

13394 Views
3 replies
3 kudos

Resolved! Memory issues - Databricks

Hi all, all of a sudden we are getting memory-related exceptions in our Databricks dev environment, such as out of memory, result too large, etc. Also, the error message does not help identify the issue. Can someone please guide on what would be...

Latest Reply

pavanb
New Contributor II

04-06-2022 2:43:20 AM

3 kudos

Thanks for the response, @Hubert Dudek. If I run the same code in the test environment, it completes successfully, but in dev it fails with an out-of-memory error. Also, the configuration of the test and dev environments is exactly the same.
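One way to verify that dev and test really are configured identically is to dump the effective Spark configuration in each workspace, for example with spark.sparkContext.getConf().getAll(), which returns (key, value) pairs, and diff the two dumps. The helper below is a hypothetical sketch of that diff, and the sample values are made up. Keys that naturally differ between workspaces (host names, IDs) would still need to be ignored, and matching Spark settings do not rule out differences in data volume, instance types, or installed libraries.

```python
from typing import Dict, Iterable, Optional, Tuple

Pair = Tuple[str, str]


def diff_confs(
    left: Iterable[Pair], right: Iterable[Pair]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {key: (left_value, right_value)} for every setting that differs.

    None means the key is not set on that side.
    """
    a, b = dict(left), dict(right)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(a.keys() | b.keys())
        if a.get(key) != b.get(key)
    }


# Hypothetical dumps; in practice each list would come from
# spark.sparkContext.getConf().getAll() run in the respective workspace.
test_env = [("spark.driver.maxResultSize", "4g"), ("spark.executor.memory", "8g")]
dev_env = [("spark.executor.memory", "8g"), ("spark.sql.shuffle.partitions", "800")]
for key, (test_value, dev_value) in diff_confs(test_env, dev_env).items():
    print(f"{key}: test={test_value} dev={dev_value}")
```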

by Vee • New Contributor

04-07-2022 11:37:05 AM

7652 Views
1 replies
1 kudos

Cluster configuration and optimal values for fs.s3a.connection.maximum, fs.s3a.threads.max

Could you please suggest the best cluster configuration for the use case stated below, along with tips to resolve the errors shown below? Use case: there could be 4 or 5 Spark jobs that run concurrently. Each job reads 40 input files and spits out 120 output files ...
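Both settings are Hadoop S3A options; on a cluster they are typically passed through the Spark config with the spark.hadoop. prefix, which Spark copies into the Hadoop configuration. The sketch below only generates those two entries and enforces one sanity rule, which is an assumption based on the Hadoop S3A performance guidance and worth checking against the docs for your Hadoop version: keep the connection pool larger than the thread pool so that threads do not block waiting for a connection. The numbers in the example are hypothetical, not tuned recommendations for this workload.

```python
from typing import Dict


def s3a_spark_conf(connection_maximum: int, threads_max: int) -> Dict[str, str]:
    """Return Spark config entries for the two S3A pool sizes.

    The "spark.hadoop." prefix makes Spark forward each key into the Hadoop
    configuration that the S3A filesystem reads.
    """
    if connection_maximum < 1 or threads_max < 1:
        raise ValueError("pool sizes must be positive")
    if connection_maximum <= threads_max:
        # Assumed rule of thumb: more connections than threads, so threads
        # are not left waiting for a free HTTP connection.
        raise ValueError("connection_maximum should be larger than threads_max")
    return {
        "spark.hadoop.fs.s3a.connection.maximum": str(connection_maximum),
        "spark.hadoop.fs.s3a.threads.max": str(threads_max),
    }


# Print in the "key value" form used by a cluster's Spark config field.
for key, value in s3a_spark_conf(connection_maximum=200, threads_max=64).items():
    print(key, value)
```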

Latest Reply

jose_gonzalez
Databricks Employee

04-29-2022 3:09:48 PM

1 kudos

Hi @Vetrivel Senthil, just wondering: is this question a duplicate of this one? https://community.databricks.com/s/feed/0D53f00001qvQJcCAM

by Vee • New Contributor

04-11-2022 11:38:26 AM

4252 Views
1 replies
0 kudos

Tips for resolving the following errors related to AWS S3 read/write

Job aborted due to stage failure: Task 0 in stage 3084.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3084.0 (TID...., ip..., executor 0): org.apache.spark.SparkExecution: Task failed while writing rows
Job aborted due to stage failure:...

by Rk2 • New Contributor II

04-19-2022 12:09:46 AM

2527 Views
2 replies
4 kudos

Resolved! Scheduling a job with multiple notebooks using a common parameter

I have a practical use case: three notebooks (PySpark) that all share one common parameter. I need to schedule all three notebooks in sequence. Is there any way to run them by setting the parameter value once, since it is the same in all of them? Please suggest the ...

Latest Reply

Hubert-Dudek
Databricks MVP

04-19-2022 2:17:15 AM

4 kudos

@Ramesh Kotha, in the notebook, get the parameter like this: my_parameter = dbutils.widgets.get("my_parameter") and set it in each task's parameters.
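On the job side, one way to keep the shared value in a single place is to define three notebook tasks chained with depends_on, each receiving the same base_parameters. The sketch below builds such a job specification as a Python dict shaped like a Jobs API 2.1 create request; the job name, notebook paths, parameter value, and cluster ID are hypothetical, and the same settings can be entered in the Jobs UI instead. Each notebook then reads the value with dbutils.widgets.get("my_parameter"); defining the widget first, for example with dbutils.widgets.text("my_parameter", ""), also lets the notebook run interactively.

```python
import json
from typing import Any, Dict, List


def chained_notebook_job(
    name: str, notebook_paths: List[str], params: Dict[str, str], cluster_id: str
) -> Dict[str, Any]:
    """Build a job spec that runs the notebooks one after another.

    Every task receives the same base_parameters, so the shared value is
    defined once and each notebook reads it via dbutils.widgets.get(...).
    """
    tasks = []
    for i, path in enumerate(notebook_paths):
        task: Dict[str, Any] = {
            "task_key": f"step_{i + 1}",
            "existing_cluster_id": cluster_id,
            "notebook_task": {"notebook_path": path, "base_parameters": dict(params)},
        }
        if i > 0:
            # Start only after the previous task has finished successfully.
            task["depends_on"] = [{"task_key": f"step_{i}"}]
        tasks.append(task)
    return {"name": name, "tasks": tasks}


if __name__ == "__main__":
    spec = chained_notebook_job(
        "three-notebook-pipeline",  # hypothetical job name
        ["/Repos/demo/nb_1", "/Repos/demo/nb_2", "/Repos/demo/nb_3"],  # hypothetical paths
        {"my_parameter": "2022-04-19"},  # hypothetical shared value
        "1234-567890-abcde123",  # hypothetical cluster ID
    )
    print(json.dumps(spec, indent=2))
```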

by zyx • New Contributor II

04-22-2022 12:30:34 AM

2002 Views
2 replies
3 kudos

Does the Databricks BI tool support PDF format?

From a reporting point of view, is PDF format supported or not?

Latest Reply

Hubert-Dudek
Databricks MVP

04-22-2022 2:28:13 AM

3 kudos

@Bhanu aravapalli, can you explain the use case in more detail?

by WojtekJ • New Contributor

04-25-2022 4:39:13 AM

6461 Views
1 replies
1 kudos

Is it possible to use Iceberg instead of DeltaLake?

Hi. Do you know if it is possible to use the Iceberg table format instead of Delta Lake? Ideally, I would like the tables in Databricks to be stored as Iceberg and to use them as usual in notebooks. I read that there is also an option to link external metasto...

by SailajaB • Databricks Partner

02-24-2022 5:16:11 AM

6064 Views
3 replies
7 kudos

Resolved! How can we use a config file to change PySpark DataFrame names without hardcoding?

Hi, can we use a config file to change PySpark DataFrame attribute names (both root-level and nested, of both struct and array type)? In the input we are getting attributes in lowercase, and we need to convert them to camel case (please note we don't have any separat...
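If the lowercase names carry no separators (for example orderid), camel-case targets cannot be derived automatically, which is where a config file helps: a mapping such as {"orderid": "orderId"} can be applied recursively to the DataFrame schema. The sketch below assumes the mapping is a flat JSON object (file name and field names are hypothetical) and works on the JSON form of a schema returned by df.schema.jsonValue(), renaming fields inside nested structs, arrays, and map values.

```python
import json
from typing import Any, Dict


def load_mapping(path: str) -> Dict[str, str]:
    """Read a flat {"oldname": "newName"} mapping from a JSON config file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def rename_fields(data_type: Any, mapping: Dict[str, str]) -> Any:
    """Return a copy of a JSON schema with struct field names renamed.

    `data_type` follows the format of StructType.jsonValue(): structs are
    {"type": "struct", "fields": [...]}, arrays are {"type": "array",
    "elementType": ...}, maps are {"type": "map", ..., "valueType": ...},
    and primitive types are plain strings such as "string".
    Names that are not in `mapping` are kept unchanged.
    """
    if isinstance(data_type, dict):
        kind = data_type.get("type")
        if kind == "struct":
            return {
                **data_type,
                "fields": [
                    {
                        **field,
                        "name": mapping.get(field["name"], field["name"]),
                        "type": rename_fields(field["type"], mapping),
                    }
                    for field in data_type["fields"]
                ],
            }
        if kind == "array":
            return {**data_type, "elementType": rename_fields(data_type["elementType"], mapping)}
        if kind == "map":
            return {**data_type, "valueType": rename_fields(data_type["valueType"], mapping)}
    return data_type
```

The renamed JSON could then be converted back with StructType.fromJson(...) and applied with spark.createDataFrame(df.rdd, new_schema), which works because rows are matched to fields by position; round-tripping through df.rdd does add serialization overhead on large data. Because the mapping is flat, a name that appears at several nesting levels is renamed the same way everywhere.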

Latest Reply

Anonymous
Not applicable

04-29-2022 7:56:34 AM

7 kudos

Hi @Sailaja B, this is awesome! Thanks for coming in and posting the solution. We really appreciate it. Cheers!

by Tahseen0354 • Valued Contributor

04-19-2022 4:25:16 AM

2102 Views
1 replies
1 kudos

Configure the CLI for Databricks on GCP

Hi, I have a service account in my GCP project, and the service account is added as a user in my Databricks GCP account. Is it possible to configure the CLI for Databricks on GCP using that service account? Something similar to: databricks configure --tok...

by LukaszJ • Contributor III

03-14-2022 4:15:00 AM

6554 Views
4 replies
4 kudos

Resolved! Terraform: get the metastore ID without creating a new metastore

Hello, I want to create a database (schema) and tables in my Databricks workspace using Terraform. I found this resource: databricks_schema. It requires databricks_catalog, which requires metastore_id. However, I have databricks_workspace and I did not cre...

Latest Reply

Atanu
Databricks Employee

04-17-2022 10:41:09 AM

4 kudos

I think the databricks_schema resource (https://registry.terraform.io/providers/databrickslabs/databricks/latest/docs/resources/schema) is for Unity Catalog (UC): https://docs.databricks.com/data-governance/unity-catalog/index.html

by Juniper_AIML • New Contributor

02-15-2022 8:54:12 AM

5540 Views
3 replies
0 kudos

How to access the virtual environment directory where Databricks notebooks are running?

How can we get access to a separate virtual environment and its storage location on Databricks, so that we can move our created libraries into it instead of waiting for them to install each time the cluster is brought up? What we basically want is a ...

Latest Reply

Anonymous
Not applicable

04-28-2022 10:28:48 AM

0 kudos

Hey there @Aman Gaurav, thank you for posting your question. Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

by alejandrofm • Valued Contributor

02-28-2022 5:43:25 AM

6072 Views
4 replies
4 kudos

Resolved! Are there any recommended spark config settings for Delta/Databricks?

Hi! I'm starting to test configs on Databricks, for example, to avoid corrupting data if two processes try to write at the same time: .config('spark.databricks.delta.multiClusterWrites.enabled', 'false') Or if I need more partitions than the default: .confi...

Latest Reply

Anonymous
Not applicable

04-28-2022 9:27:40 AM

4 kudos

Hey there @Alejandro Martinez, hope everything is going well. Just wanted to see if you were able to find an answer to your question. If yes, would you be happy to let us know and mark it as best so that other members can find the solution more quickl...

by DejanSunderic • New Contributor III

08-04-2016 10:49:04 AM

16841 Views
11 replies
3 kudos

Is the command stuck?

I created some ETL using DataFrames in Python. It used to run in 180 sec, but now it is taking ~1200 sec. I have been changing it, so it could be something that I introduced, or something in the environment. Part of the process is appending results into...

Latest Reply

Carneiro
New Contributor II

04-28-2022 9:09:29 AM

3 kudos

I am having a very similar problem. Since yesterday, for no known reason, some commands that used to run daily are now stuck in a "Running command" state. Commands like: dataframe.show(n=1) dataframe.toPandas() dataframe.describe() dataframe.wr...

Databricks Community
