Data Engineering

Forum Posts

by Dhara • New Contributor III

06-13-2022 10:40:39 PM

3453 Views
3 replies
1 kudos

Access multiple .mdb files using Python

Hi, I want to access multiple .mdb Access files that are stored in Azure Data Lake Storage (ADLS) or on the Databricks File System using Python. Can anyone help me with how to do it? It would be great if you could share some code snippets for the sa...
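
One possible starting point, offered as a minimal sketch rather than a tested Databricks recipe: collect the .mdb files under a directory and export tables from each with the `mdb-tables` and `mdb-export` commands from the open-source mdbtools package. It assumes mdbtools is installed on the machine running the code and that the files are reachable through a local file path; the helper names are hypothetical and not from the post.

```python
import subprocess
from pathlib import Path


def find_mdb_files(root):
    """Return every .mdb file under root (searched recursively), sorted by path."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".mdb"
    )


def list_tables(mdb_path):
    """List table names via `mdb-tables -1` (one name per line). Requires mdbtools."""
    result = subprocess.run(
        ["mdb-tables", "-1", str(mdb_path)],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def export_table_csv(mdb_path, table):
    """Return one table as CSV text via `mdb-export`. Requires mdbtools."""
    result = subprocess.run(
        ["mdb-export", str(mdb_path), table],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Looping over `find_mdb_files(...)` and calling `list_tables` and `export_table_csv` for each file yields CSV text that can then be parsed with the `csv` module or loaded into a DataFrame.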

Latest Reply

Anonymous
Not applicable

08-23-2022 9:04:53 AM

1 kudos

Hey there @Dhara Mandal, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. Would you be happy to mark an answer as best, or do you need more help? We'd love to hear from you. Thanks!

by sai_731566 • New Contributor II

08-16-2022 6:50:34 AM

3115 Views
1 reply
0 kudos

How to pass parameters/arguments to a shell script from Scala in Databricks

I am running a shell script in Databricks using the %sh magic command. I have a requirement to pass parameters/arguments to the script. Is there any way to do this with Scala as the base language?

by isaac_gritz • Databricks Employee

08-23-2022 1:08:55 AM

2405 Views
0 replies
1 kudos

Data Mesh with Databricks

Where to Learn More about Databricks for Data Mesh

We recommend checking out our Data & AI Summit Talk on how the Databricks Lakehouse platform is the best platform for distributed architectures like Data Mesh. We would also recommend checking out thi...

by isaac_gritz • Databricks Employee

08-23-2022 1:06:10 AM

3132 Views
0 replies
4 kudos

CI/CD Best Practices

Best Practices for CI/CD on Databricks

For CI/CD and software engineering best practices with Databricks notebooks, we recommend checking out this best practices guide (AWS, Azure, GCP). For CI/CD and local development using an IDE, we recommend dbx, a ...

by weldermartins • Honored Contributor

08-19-2022 4:35:10 AM

13601 Views
9 replies
13 kudos

Resolved! Delta table upsert - databricks community

Hello guys, I'm trying to use upsert via Delta Lake following the documentation, but the command doesn't update or insert new lines. Scenario: my source table is in the bronze layer, and the updates or inserts are in the silver layer. from delta.tables impo...

Latest Reply

weldermartins
Honored Contributor

08-22-2022 11:55:40 AM

13 kudos

I managed to find the solution. In the insert and update I was setting the target. Thanks @Werner Stinckens!

delta_df = DeltaTable.forPath(spark, 'dbfs:/mnt/silver/vendas/')
delta_df.alias('target').m...
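
One reading of this fix is that the update and insert values should be taken from the source rows rather than from the target. The sketch below illustrates that idea only; it is not the poster's code, which is truncated above. The key column, the column list, and the `updates_df` DataFrame are hypothetical, and the `upsert` function assumes the `delta-spark` package and an active Spark session.

```python
def source_values(columns, source_alias="source"):
    """Map each target column name to the matching source column expression."""
    return {col: f"{source_alias}.{col}" for col in columns}


def upsert(spark, target_path, updates_df, key, columns):
    """Merge updates_df into the Delta table at target_path, matching rows on key."""
    # Imported here so source_values() can be used without Spark installed.
    from delta.tables import DeltaTable

    values = source_values(columns)
    (
        DeltaTable.forPath(spark, target_path)
        .alias("target")
        .merge(updates_df.alias("source"), f"target.{key} = source.{key}")
        .whenMatchedUpdate(set=values)        # values come from the source, not the target
        .whenNotMatchedInsert(values=values)  # same mapping for rows missing in the target
        .execute()
    )
```

Setting a column to `target.<col>` in the update clause would rewrite each row with its existing value, which would look like the merge doing nothing.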

by isaac_gritz • Databricks Employee

08-23-2022 12:14:12 AM

2282 Views
0 replies
2 kudos

Connecting Applications and BI Tools to Databricks SQL

Access Data in Databricks Using an Application or your Favorite BI Tool

You can leverage Partner Connect for easy, low-configuration connections to some of the most popular BI tools through our optimized connectors. Alternatively, you can follow these...

by isaac_gritz • Databricks Employee

08-23-2022 12:05:12 AM

1502 Views
0 replies
3 kudos

Optimize Azure VM / AWS EC2 / GKE Cloud Infrastructure Costs

Tips on Reducing Cloud Compute Infrastructure Costs for Azure VM, AWS EC2, and GCP GKE on Databricks

Databricks takes advantage of the latest Azure VM / AWS EC2 / GKE VM/instance types to ensure you get the best price performance for your workloads on...

by isaac_gritz • Databricks Employee

08-22-2022 11:54:29 PM

14352 Views
4 replies
3 kudos

Performance Tuning Best Practices

Recommendations for performance tuning best practices on Databricks

We recommend also checking out this article from my colleague @Franco Patano on best practices for performance tuning on Databricks. Performance tuning your workloads is an important...

Latest Reply

isaac_gritz
Databricks Employee

08-22-2022 11:55:26 PM

3 kudos

Let us know in the comments if you have any other performance tuning tips & tricks.

by 438037 • New Contributor

08-22-2022 10:12:11 PM

1766 Views
0 replies
0 kudos

Databricks VPC - EKS VPC security groups

Hi, we have a Databricks deployment in our AWS account in a dedicated VPC, for which we created a VPC peering connection to our EKS VPC. In the EKS main security group we added a rule that opens all TCP ports from the Databricks VPC, and now it's working. Once I try t...

by Vadim1 • New Contributor III

06-23-2022 1:12:03 AM

2435 Views
2 replies
2 kudos

How to pass HBase-site.xml to a Databricks job?

Hi, I have an Azure HBase cluster and Databricks. I want to run jobs on Databricks that write data to HBase. To connect to HBase I need to get HBase-site.xml and have it in the classpath or environment of a job. Question: How can I run the Databricks job with an...

Latest Reply

jose_gonzalez
Databricks Employee

08-22-2022 5:57:46 PM

2 kudos

Hi @Vadim Z, just a friendly follow-up. Did the response from Hubert help you resolve your issue? Let us know if you are still looking for help.

by Reddraider • Databricks Partner

08-22-2022 5:01:49 PM

2432 Views
0 replies
0 kudos

What happened to the Custom option in the Cluster Configuration Access Mode menu option?

We are trying to configure a job cluster for a workflow. It looks as though we no longer have the 'Custom' option in the Access mode drop-down. We need Custom because we apply additional Spark configuration key/value settings. The UI throws an...

by sanchit_popli • New Contributor II

08-22-2022 2:43:15 PM

2011 Views
0 replies
0 kudos

How can I process 3.5 GB GZ (~90 GB) nested JSON files and convert them to tabular format with less processing time and optimized cost in Azure Databricks?

I have a total of 5000 files (nested JSON, ~3.5 GB). I have written code that converts the JSON to a table in minutes (for JSON sizes up to 1 GB), but when I try to process a 3.5 GB GZ JSON file it mostly fails because of garbage collection. ...
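
As a minimal sketch of the flattening step only (not the poster's code, and not a tuned Spark job): the snippet below streams a gzipped file one record at a time instead of loading it whole, and flattens each nested object into dotted column names. It assumes newline-delimited JSON with one object per line, which the post does not state; `flatten` and `iter_rows` are hypothetical helper names.

```python
import gzip
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as JSON strings."""
    row = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            row[name] = json.dumps(value)
        else:
            row[name] = value
    return row


def iter_rows(gz_path):
    """Yield one flattened row per non-empty line, decompressing lazily."""
    with gzip.open(gz_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield flatten(json.loads(line))
```

Because `iter_rows` is a generator, only one record is held in memory at a time, so rows can be written out in batches rather than accumulated.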

by Delta • New Contributor II

06-13-2022 9:46:10 PM

18096 Views
1 reply
2 kudos

Is a Delta table with an auto-increment column as a unique identifier supported? If yes, how do I create it? I am not using the Databricks version of Delta.

Latest Reply

Anonymous
Not applicable

08-22-2022 9:03:46 AM

2 kudos

Hey @Rahul Kumar, hope everything is going great. Just checking in. Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if ...

by Erik • Valued Contributor III

08-13-2022 5:06:01 AM

5069 Views
1 reply
3 kudos

Resolved! How to combine medallion architecture and delta live-tables nicely?

Like many of you, we have implemented a "medallion architecture" (raw/bronze/silver/gold layers), with each layer stored on a separate storage account. We only create proper Hive tables for the gold layer tables, so our Power BI users connecting to the da...

Latest Reply

merca
Valued Contributor II

08-22-2022 9:01:40 AM

3 kudos

I can answer the first question: You can define where data is stored by setting the `path` parameter for tables. The "storage path" in the pipeline settings will then only hold checkpoints (and some other pipeline stuff), and data will be stored in the correct acc...

by Kasi • New Contributor II

08-22-2022 5:15:05 AM

975 Views
0 replies
0 kudos

Unable to execute 6.1 and 6.2 examples

Hi all, I am unable to execute the "Classroom-Setup-06.1" and "Classroom-Setup-06.2" setups in the Data Engineering course. On checking, I found that the "DA = DBAcademyHelper()" statement is not executing in the include section of the code. I am using the community ...
