Good job, congratulations on all of your achievements.
https://www.databricks.com/notebooks/recitibikenycdraft/data-preparation.html

Could someone help with Step 3: Prepare Calendar Info in that notebook?

# derive complete list of dates between first and last dates
dates = ( spark.range(0, days_between).withCol...
Hi @THIAM HUAT TAN​, in your notebook you are creating an integer column days_between with the code

days_between = (last_date - first_date).days + 10

Logically speaking, what the notebook is trying to do is fetch all the dates between the two dates to build a foreca...
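For completeness, here is a hedged reconstruction of that Step 3 pattern; the names (first_date, last_date, days_between) come from the snippet quoted above, but the exact notebook code may differ:

```python
from pyspark.sql import functions as F

# pad the range so the calendar extends past the last observed date
days_between = (last_date - first_date).days + 10

# derive the complete list of dates between first and last dates:
# spark.range() yields one row per day offset, date_add() shifts first_date
# by that offset (assumes first_date is a datetime.date, so it formats as ISO)
dates = (
    spark.range(0, days_between)
    .withColumn("date", F.expr(f"date_add('{first_date}', cast(id AS int))"))
    .select("date")
)
```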
I just noticed I can only run a limited number of SQL queries. Is that the default or the norm?
Hi there, you can only run 10 concurrent SQL queries per cluster.
I tried the contact details at the bottom, but they seem to be generic Databricks contact and support links. The issue I faced was this: I think this word made its way to the stop list by mistake.
Hey @Vladimir Ryabtsev​ and @Hubert Dudek​, thank you for highlighting this. It seems they were added to the block list in combination with other words. We will have this fixed as soon as possible. It's always great to have help from our community members....
If I have two stages, bronze and silver: when I create Delta Live Tables we need to give the target schema to store the results, but I need to store tables in two databases, bronze AND silver. For this, do I need to create two different Delta Live Tab...
Hi @Rishabh Pandey​, yes, you have to create 2 DLT pipelines, one per target schema.
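A minimal sketch of that setup (table and path names are placeholders, not from the thread): the target schema is a pipeline-level setting, so the bronze and silver tables live in two separate pipelines.

```python
import dlt

# --- Pipeline 1, configured with pipeline setting: target = "bronze" ---
@dlt.table(name="events_raw")
def events_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")
    )

# --- Pipeline 2, configured with target = "silver", defined separately ---
# @dlt.table(name="events_clean")
# def events_clean():
#     # reads the table the bronze pipeline published to the metastore
#     return spark.readStream.table("bronze.events_raw")
```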
Below is the integrity check error we are getting when trying to set the delta.deletedFileRetentionDuration table property to 10 days. Observation: the table data is sitting in S3. The size of all the files in S3 is in TB. There are millions of files for t...
Please back up your table, then run the file repair:

FSCK REPAIR TABLE table_name

You can also do a dry run first:

FSCK REPAIR TABLE table_name DRY RUN

If the data is partitioned, it can be helpful to refresh the metastore:

MSCK REPAIR TABLE mytable
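For the retention setting itself, here is a hedged sketch of the syntax (the table name is a placeholder); the property takes an interval string:

```python
# Hedged sketch: set the deleted-file retention to 10 days on a Delta table
spark.sql("""
  ALTER TABLE my_table SET TBLPROPERTIES (
    'delta.deletedFileRetentionDuration' = 'interval 10 days'
  )
""")
```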
Hi Team,​ I have a requirement in a workflow job. The job has two tasks: one is a Python task and the other is a Scala task (each running on its own cluster).​ I have set a value with dbutils.jobs.taskValues in Python, but it cannot be read in Scala because o...
Hi @Sreekanth Nallapa​, please refer to this link; it might help you with this.
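As a hedged sketch of the Python side (the task and key names below are invented for illustration): task values are set and read with the jobs utility, which, as far as I can tell, is available only in Python notebooks, which would explain why the Scala task cannot read them directly.

```python
# In the upstream Python task: publish a value for downstream tasks
dbutils.jobs.taskValues.set(key="row_count", value=42)

# In a downstream *Python* task: read it back by the upstream task's name
row_count = dbutils.jobs.taskValues.get(
    taskKey="python-task",   # assumed task name
    key="row_count",
    default=0,
)
```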
Hi, I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch insert queries. What is the optimal batch size recommended by Databricks for performing batch insert queries? Currently it seems that values are inserted row by r...
Just an observation: by using the auto optimize table-level property, I was able to see batch inserts writing records into a single file.

https://docs.databricks.com/optimizations/auto-optimize.html
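A hedged sketch of enabling that property (the table name is a placeholder; the two delta.autoOptimize.* keys are the ones documented on the linked page):

```python
# Enable optimized writes and auto compaction so small batch inserts
# are coalesced into fewer, larger files
spark.sql("""
  ALTER TABLE my_table SET TBLPROPERTIES (
    'delta.autoOptimize.optimizeWrite' = 'true',
    'delta.autoOptimize.autoCompact'   = 'true'
  )
""")
```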
Hello Friends, we have an application which extracts data from various tables in Azure Databricks, and we extract it to Postgres tables (Postgres installed on top of Azure VMs). After extraction we apply transformations on those datasets in Postgres tabl...
You can leverage Airflow, which provides a connector for the Databricks Jobs API, or you can use Databricks Workflows to orchestrate your jobs, defining several tasks and setting dependencies accordingly.
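A minimal Airflow sketch of the first option (the job ID, connection ID, and schedule are placeholders; assumes the apache-airflow-providers-databricks package is installed):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

with DAG(
    dag_id="extract_then_transform",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # trigger an existing Databricks job by ID via the Jobs API
    extract = DatabricksRunNowOperator(
        task_id="run_databricks_extract",
        databricks_conn_id="databricks_default",  # assumed connection name
        job_id=12345,                             # placeholder job ID
    )
```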
Hello, I have an issue with the import of a custom library in Azure Databricks. Roughly 95% of the time it works fine, but sometimes it fails. I searched the internet and this community with no luck so far. It is a Scala library in a Scala notebook,...
I also encountered the same error. While importing a file I get the error: "Import failed with error: Could not deserialize: Exceeded 16777216 bytes (current = 16778609)".
I configured ADLS Gen2 standard storage and successfully configured Auto Loader with the file notification mode. In this document, https://docs.databricks.com/ingestion/auto-loader/file-notification-mode.html, it says: "ADLS Gen2 provides different event notificati...
Hi, @Chris Konsur​. You do not need to do anything with the FlushWithClose event; that is just the event type that we listen to via the REST API. As for the backfill setting, this is for handling late data or late events that are being triggered. This setting largely de...
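A hedged Auto Loader sketch in file notification mode (the path and format are placeholders; cloudFiles.useNotifications and cloudFiles.backfillInterval are the documented options):

```python
# Read with file notifications instead of directory listing; the backfill
# interval schedules periodic listings to pick up any missed events
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("cloudFiles.backfillInterval", "1 day")
    .load("abfss://container@account.dfs.core.windows.net/input")
)
```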
I am trying to compute outliers using approxQuantile on a DataFrame. It works fine in a Databricks notebook, but the call doesn't work correctly when it is part of a Delta Live Tables pipeline. (This is Python.) Here is the line that isn't working a...
Good recommendation. I was able to do something similar that appears to work.

# Step 2. Compute the is_outlier flag for sessions based on duration_minutes
lc = session_agg_df.selectExpr("percentile(duration_minutes, 0.25) lower_quartile")
session_agg_d...
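Filling out that workaround as a hedged sketch (the column and DataFrame names follow the snippet above; the 1.5x IQR fence is assumed rather than quoted from the thread): computing quartiles with SQL percentile() keeps everything inside the DataFrame plan, avoiding the driver-side approxQuantile call that misbehaves in a DLT pipeline.

```python
from pyspark.sql import functions as F

# quartiles as a 1-row DataFrame instead of driver-side floats
quartiles = session_agg_df.select(
    F.expr("percentile(duration_minutes, 0.25)").alias("q1"),
    F.expr("percentile(duration_minutes, 0.75)").alias("q3"),
)

# broadcast the single quartile row onto every session and flag IQR outliers
iqr = F.col("q3") - F.col("q1")
flagged = session_agg_df.crossJoin(quartiles).withColumn(
    "is_outlier",
    (F.col("duration_minutes") < F.col("q1") - 1.5 * iqr)
    | (F.col("duration_minutes") > F.col("q3") + 1.5 * iqr),
)
```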
DEC 13 MEETUP: Arbitrary Stateful Stream Processing in PySpark

For folks in the Bay Area: Dr. Karthik Ramasamy, Databricks' Head of Streaming, will be joined by engineering experts on the streaming and PySpark teams at Databricks for this in-person me...
Hi Team, I have gone through a lot of articles, but it looks like there is some gap on pricing. Can anyone please let me know the accurate way to convert DBU pricing into dollars? As per my understanding: total DBU cost = DBU/hour * total hours the job ran (shows a...
DBU consumption is per VM, and every VM type has a different DBU rate.
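A worked example under stated assumptions (every number below is an illustrative placeholder, not an actual Databricks rate):

```python
# Illustrative only: the DBU rate and $/DBU price are made-up placeholders
dbu_per_hour_per_vm = 2.0   # DBU rate of the chosen VM type
vm_count = 4                # driver + workers
hours = 3.5                 # total job runtime
price_per_dbu = 0.15        # contractual $/DBU for the workload tier

total_dbus = dbu_per_hour_per_vm * vm_count * hours   # 28.0 DBUs
dbu_cost = total_dbus * price_per_dbu                 # $4.20
print(f"{total_dbus} DBUs -> ${dbu_cost:.2f} (cloud VM cost is billed separately)")
```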
How do I add a column to an existing Delta table with SQL if the column does not already exist? I am using the following code:

%sql
ALTER TABLE table_name ADD COLUMN IF NOT EXISTS column_name type;

but it prints the error: [PARSE_SYNTAX_ERROR] Synta...
Hi @Christine Pedersen​, I guess IF NOT EXISTS or IF EXISTS can only be used in conjunction with DROP or PARTITIONS according to the documentation. If you want the same kind of check, you can use a try/except block in PySpark or, as per your l...
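A minimal sketch of the schema-check workaround (the table and column names are placeholders; ADD COLUMNS is the Delta SQL form that does parse):

```python
# Only issue the ALTER when the column is actually missing
table_name = "my_db.my_table"
column_name = "new_col"

existing = [f.name for f in spark.table(table_name).schema.fields]
if column_name not in existing:
    spark.sql(f"ALTER TABLE {table_name} ADD COLUMNS ({column_name} STRING)")
```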