Recommendations for performance tuning best practices on Databricks
We recommend also checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...
Hi, we have a Databricks deployment in our AWS account in a dedicated VPC, which we peered to our EKS VPC. In the EKS main security group we added a rule that opens all TCP ports from the Databricks VPC, and now it's working. Once I try t...
Hi, I have an Azure HBase cluster and Databricks. I want to run jobs on Databricks that write data to HBase. To connect to HBase I need to get hbase-site.xml and have it on the classpath or in the environment of a job. Question: how can I run the Databricks job with an...
We are trying to configure a job cluster for a workflow. It looks as though we no longer have the 'Custom' option in the Access mode drop-down. We need Custom because we apply additional Spark configuration key/value settings. The UI throws an...
I have a total of 5,000 files (nested JSON, ~3.5 GB). I have written code that converts the JSON to a table in minutes (for JSON up to 1 GB), but when I try to process the 3.5 GB gzipped JSON it mostly fails because of garbage collection. ...
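A common mitigation for this kind of failure (a sketch, assuming a hypothetical path /mnt/raw/events/ and a stable top-level structure — adjust both to the real data) is to declare the schema up front so Spark does not have to infer it by scanning the full 3.5 GB input:

```sql
-- Hypothetical columns and path; adjust to the actual JSON structure.
-- Declaring the columns explicitly skips schema inference over the full file.
CREATE TABLE IF NOT EXISTS events_raw (
  id STRING,
  ts STRING,
  payload STRING
)
USING json
OPTIONS (path '/mnt/raw/events/');
```

Note also that gzip is not a splittable codec, so a single 3.5 GB .gz file is processed by one task on one executor; splitting the input into many smaller files usually helps more than any tuning setting.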
Hey @Rahul Kumar, hope everything is going great. Just checking in: does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if ...
As many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), where each layer is stored on a separate storage account. We only create proper Hive tables for the gold-layer tables, so our Power BI users connecting to the da...
I can answer the first question: you can define data storage by setting the `path` parameter for tables. The "storage path" in the pipeline settings will then only hold checkpoints (and some other pipeline state), and the data will be stored in the correct acc...
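In a SQL pipeline the `LOCATION` clause plays the same role as the Python `path` parameter. A minimal sketch, assuming a hypothetical gold-layer storage account and table names:

```sql
-- Data files land in the gold storage account; only checkpoints and
-- pipeline metadata go to the pipeline's "storage path".
CREATE OR REFRESH LIVE TABLE gold_sales
LOCATION 'abfss://gold@mystorageaccount.dfs.core.windows.net/sales'
AS SELECT * FROM LIVE.silver_sales;
```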
Hi all, I am unable to execute the "Classroom-Setup-06.1" and "Classroom-Setup-06.2" setups in the Data Engineering course. On checking, I found that the "DA = DBAcademyHelper()" statement is not executing in the include section of the code. I am using the community ...
I need to delete rows from a temp view in Databricks, but it looks like I can only run MERGE, SELECT, and INSERT against it. Maybe I missed something, but I did not find any documentation on this.
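A temp view is a named query rather than writable storage, so DELETE cannot target it directly. Two common workarounds, sketched here with hypothetical table and column names:

```sql
-- Option 1: "delete" from the view by redefining it with a filter.
CREATE OR REPLACE TEMP VIEW my_view AS
SELECT * FROM my_delta_table WHERE status <> 'obsolete';

-- Option 2: delete from the underlying Delta table instead;
-- the view reflects the change on its next read.
DELETE FROM my_delta_table WHERE status = 'obsolete';
```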
I have this Delta lake in ADLS as a sink for Spark Structured Streaming. We usually append new data from our data source to our Delta lake, but there are some cases where we find errors in the data and need to reprocess everything. So what ...
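For the full-reprocess case, one common pattern (a sketch with hypothetical table names; note the streaming checkpoint must also be reset if the stream should re-read the source afterwards) is to overwrite the Delta table in place rather than drop it, which preserves table history for time travel:

```sql
-- Atomically replace the table contents with the reprocessed data.
INSERT OVERWRITE target_events
SELECT * FROM source_events_reprocessed;
```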
Hi @Manish P, you have three options for converting a Parquet table to a Delta table. Convert the files to Delta Lake format and then create a Delta table:

CONVERT TO DELTA parquet.`/data-pipeline/`
CREATE TABLE events USING DELTA LOCATION '/data-pipelin...
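If the Parquet data is partitioned, CONVERT TO DELTA needs the partition columns declared, since it cannot infer them from the directory layout alone. A sketch assuming a hypothetical date partition column:

```sql
-- Partition columns must be spelled out when converting partitioned Parquet.
CONVERT TO DELTA parquet.`/data-pipeline/` PARTITIONED BY (event_date DATE);
```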
Hi, I have a Delta Live Tables pipeline, using Auto Loader, to ingest from JSON files. I need to do some transformations - in this case, converting timestamps. However, one of the timestamp columns does not exist in every file. This is causing the DLT p...
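One way to handle a column that is absent from some files (a sketch, assuming a hypothetical column `updated_at` and input path) is to give Auto Loader a schema hint, so the column always exists in the inferred schema and is simply NULL for files that lack it:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE events_bronze
AS SELECT
  *,
  -- The hint guarantees updated_at exists, so the cast never fails;
  -- rows from files without the column get NULL here.
  to_timestamp(updated_at) AS updated_ts
FROM cloud_files(
  '/mnt/raw/events/',
  'json',
  map('cloudFiles.schemaHints', 'updated_at STRING')
);
```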
I want to run aggregations over large windows (90 days) with a small slide duration (5 minutes). The straightforward solution leads to giant state, around hundreds of gigabytes, which does not look acceptable. Are there any best practices for doing this? Now I conside...
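The straightforward version described above would look roughly like this (hypothetical streaming table and column names); with a 90-day window and a 5-minute slide, every event belongs to roughly 26,000 overlapping windows, which is exactly what inflates the state:

```sql
-- Naive sliding window: each event falls into ~26,000 overlapping
-- windows (90 days / 5 minutes), so streaming state grows enormously.
SELECT window(event_time, '90 days', '5 minutes') AS w,
       count(*) AS events
FROM events
GROUP BY window(event_time, '90 days', '5 minutes');
```

A common alternative is a two-level aggregation: pre-aggregate the stream into non-overlapping 5-minute tumbling windows, then roll the 90-day totals up from those small aggregates in a downstream batch query.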
Hi @Sergey Volkov, thanks for your question. Here are some fantastic articles on EWMA and event-time aggregation in Apache Spark™'s Structured Streaming. Please have a look and let us know if that helps. https://towardsdatascience.com/time-series-from-s...
Hi, is there any way to disable the "Launch Workspace" option in the Azure portal for ADB? We have user access at the resource-group level, so we need to restrict users who are part of the Owner or Contributor role from launching the ADB workspace as admin. Thank you.
Deny assignments don't block a subscription Contributor from launching the workspace and becoming admin. Actually, I haven't found any way to block that after trying many different methods.