Data Engineering

Forum Posts

by yvuignie • Contributor

11-02-2022 6:22:52 AM

8199 Views
12 replies
3 kudos

Resolved! Unity Catalog - How do you modify groups properly?

Hello, What is the best practice to modify/delete/recreate groups properly? In order to rename a group, the only way was to delete/recreate. But after deletion in the account console, the permissions granted to the deleted groups in the tables were i...

Latest Reply

RobinK
Contributor

02-15-2024 5:04:12 AM

3 kudos

Hello, I have exactly the same issue - I am also using Terraform. I deleted a group and the catalog permissions are in a bad state. I am not able to revoke access to this group using either the Databricks UI or the REST API. I also tried to recreate the group wit...

11 More Replies

by aladda • Databricks Employee

05-28-2021 11:50:52 AM

11895 Views
3 replies
3 kudos

Resolved! What's the best practice on running ANALYZE on Delta Tables for query performance optimization?

Latest Reply

jlickt
New Contributor II

05-09-2024 10:10:51 AM

3 kudos

Super write-up; very useful in understanding how the Delta and non-Delta approaches have evolved.

2 More Replies

by Mr__D • New Contributor II

03-23-2023 11:10:20 AM

26643 Views
7 replies
1 kudos

Resolved! Writing modular code in Databricks

Hi All, Could you please suggest the best way to write PySpark code in Databricks? I don't want to write my code in a Databricks notebook but create Python files (modular project) in VSCode and call only the primary function in the notebook (the res...

Latest Reply

Gamlet
New Contributor II

01-17-2024 5:33:35 AM

1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...

6 More Replies

by andrew0117 • Contributor

05-19-2023 9:54:28 AM

1240 Views
1 reply
0 kudos

What is the best practice for handling concurrency issues in batch processing?

Normally, our ELT framework takes in batches one by one and loads the data into target tables. But if more than one batch comes in at the same time, the framework will break due to the concurrency issue that multiple sources are trying to write the ...

Latest Reply

jose_gonzalez
Databricks Employee

06-14-2023 2:25:50 PM

0 kudos

You can partition your table to reduce the chances of getting this exception.

by Anonymous • Not applicable

06-21-2021 2:37:37 PM

10284 Views
3 replies
0 kudos

Resolved! What is the best practice for managing different environments (Staging vs Production) on Databricks?

Should we create separate workspaces for Dev/Test/Prod? Or should we have one workspace and create separate folders for Dev/Test/Prod?

Latest Reply

Srikanth_Gupta_
Databricks Employee

06-22-2021 7:51:55 AM

0 kudos

In my experience, it is always good to have different workspaces for different environments. It is easy to maintain and works better with a CI/CD pipeline as well, because a lot of organizations provide deployment access to developers in the Dev environment but not ...

2 More Replies

by Trung • Contributor

02-06-2023 8:29:19 PM

1900 Views
4 replies
0 kudos

Best practice for managing resources when their creators leave the project

Dear DB community! As far as I know, when a resource creator leaves the project, all resources they created are deleted as well. So the question is: can I assign the owner role to a group, and will that help protect the resource from deletion or not...

Latest Reply

Anonymous
Not applicable

04-08-2023 12:18:56 AM

0 kudos

Hi @trung nguyen Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback...

3 More Replies

by chanansh • Contributor

02-03-2023 5:02:06 AM

1814 Views
2 replies
0 kudos

Delta table acceleration for group by on key columns using ZORDER does not work

What is the best practice for accelerating queries that look like the following?
    win = Window.partitionBy('key1', 'key2').orderBy('timestamp')
    df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))
I have tried to use OP...

Latest Reply

Anonymous
Not applicable

04-05-2023 11:55:40 PM

0 kudos

Hi @Hanan Shteingart Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

1 More Replies
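
For readers unfamiliar with the window expression in the question above, here is a minimal plain-Python sketch of what it computes: rows are grouped by (key1, key2), ordered by timestamp inside each group, and each row's col1 is compared with the previous row's col1. It models only the semantics, not how Spark or Delta executes the query. The function name lag_diff and the sample rows are hypothetical, and the key columns are kept in the output purely for readability (the original select returns only timestamp and col1_diff).

```python
from collections import defaultdict


def lag_diff(rows):
    """Return col1 minus the previous col1 within each (key1, key2) group.

    Mirrors F.col('col1') - F.lag('col1').over(win): the first row of
    every group has no predecessor, so its difference is None (NULL).
    """
    # Partition step: collect rows per (key1, key2) pair.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["key1"], row["key2"])].append(row)

    result = []
    for (key1, key2), members in groups.items():
        # Order step: sort each partition by timestamp.
        previous = None
        for row in sorted(members, key=lambda r: r["timestamp"]):
            diff = None if previous is None else row["col1"] - previous["col1"]
            result.append(
                {"key1": key1, "key2": key2,
                 "timestamp": row["timestamp"], "col1_diff": diff}
            )
            previous = row
    return result


# Hypothetical sample data, deliberately out of timestamp order.
SAMPLE_ROWS = [
    {"key1": "a", "key2": "x", "timestamp": 2, "col1": 15},
    {"key1": "a", "key2": "x", "timestamp": 1, "col1": 10},
    {"key1": "b", "key2": "y", "timestamp": 1, "col1": 7},
    {"key1": "a", "key2": "x", "timestamp": 3, "col1": 12},
]

for out_row in lag_diff(SAMPLE_ROWS):
    print(out_row)
```

The sketch makes the two requirements of the query visible: every row of a key pair must end up together, and each group must be sorted by timestamp before the difference can be taken.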

by flobib123 • New Contributor III

03-10-2023 10:54:49 PM

7125 Views
5 replies
3 kudos

Resolved! Syntax highlight support for Python multiline SQL strings

I would like to know if a tool is available in Databricks to manage SQL syntax highlighting in text. Like this VSCode plugin: https://marketplace.visualstudio.com/items?itemName=ptweir.python-string-sql Thank you.

Latest Reply

flobib123
New Contributor III

03-21-2023 11:01:19 PM

3 kudos

Hello @Vidula Khanna, No, no one answered my question correctly, but I'm using PySpark now, so I'm not on this topic anymore. But thank you for taking the time to answer me.

4 More Replies

by AJDJ • New Contributor III

10-11-2022 1:25:29 PM

2996 Views
2 replies
6 kudos

Cost as per the Databricks demo

Hi there, I came across this Databricks demo at the link below. https://youtu.be/BqB7YQ1-KKc Kindly fast-forward to 16:30 or 16:45 in the video and watch the few minutes related to cost. My understanding is the data is in the lake and datab...

Latest Reply

Anonymous
Not applicable

11-19-2022 6:39:47 AM

6 kudos

Hi @AJ DJ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

1 More Replies

by Gim • Contributor

11-02-2022 3:30:05 PM

67836 Views
3 replies
9 kudos

Best practice for logging in Databricks notebooks?

What is the best practice for logging in Databricks notebooks? I have a bunch of notebooks that run in parallel through a workflow. I would like to keep track of everything that happens such as errors coming from a stream. I would like these logs to ...

Latest Reply

karthik_p
Esteemed Contributor

11-03-2022 7:42:35 AM

9 kudos

@Gimwell Young As @Debayan Mukherjee mentioned, if you configure verbose logging at the workspace level, logs will be moved to the storage bucket that you provided during configuration. From there you can pull logs into any of your licensed log mo...

2 More Replies
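
Separately from the workspace-level verbose logging mentioned in the reply above, notebook code can record its own events with Python's standard logging module. The following is a minimal sketch under stated assumptions, not a Databricks-specific API: the helper get_notebook_logger, the notebook name ingest_orders, and the temporary log directory are all hypothetical, and giving each notebook its own file is just one way to keep parallel runs from interleaving.

```python
import logging
import os
import tempfile


def get_notebook_logger(notebook_name, log_dir):
    """Return a logger that writes to <log_dir>/<notebook_name>.log.

    The handler is attached only once, so re-running a notebook cell
    that calls this helper does not produce duplicate log lines.
    """
    logger = logging.getLogger(f"notebooks.{notebook_name}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:
        os.makedirs(log_dir, exist_ok=True)
        handler = logging.FileHandler(
            os.path.join(log_dir, f"{notebook_name}.log")
        )
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    # Keep records out of the root logger to avoid double output.
    logger.propagate = False
    return logger


# Hypothetical usage: a temporary directory stands in for a real log location.
LOG_DIR = tempfile.mkdtemp()
log = get_notebook_logger("ingest_orders", LOG_DIR)
log.info("stream started")
try:
    raise ValueError("bad record")
except ValueError:
    # log.exception records the message at ERROR level plus the traceback.
    log.exception("error coming from the stream")
```

Where the log files should finally live (for example a storage location that a monitoring tool can read) depends on the workspace setup and is left open here.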

by alejandrofm • Valued Contributor

10-04-2022 11:27:19 AM

2673 Views
4 replies
2 kudos

Resolved! Orphan (?) files on Databricks S3 bucket

Hi, I'm seeing a lot of directories (some empty, some not) on routes like:
xxxxxx.jobs/FileStore/job-actionstats/
xxxxxx.jobs/FileStore/job-result/
xxxxxx.jobs/command-results/
Can I create a lifecycle to delete old objects (files/directories)? How many days? w...

Latest Reply

alejandrofm
Valued Contributor

10-13-2022 2:51:01 PM

2 kudos

Hi! I didn't know that. Purging right now. Is there a way to schedule that so logs are retained for less time? Maybe I want to keep the last 7 days of everything? Thanks!

3 More Replies
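
As an illustration of the lifecycle idea raised in the post above, here is a hedged sketch of an S3 lifecycle rule that expires objects under a prefix after 7 days (the retention the poster mentions). The prefix, rule ID, and bucket name are hypothetical placeholders, since the real paths are masked in the post, and the excerpt does not establish which of those directories are safe to expire. The code only builds and prints the configuration; the boto3 call that would apply it is shown as a comment and is not executed.

```python
import json


def expire_after_days(prefix, days, rule_id):
    """Build one S3 lifecycle rule that deletes objects under `prefix`
    once they are `days` days old."""
    return {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Expiration": {"Days": days},
    }


# Hypothetical prefix and rule ID; substitute the real (masked) path.
LIFECYCLE = {
    "Rules": [
        expire_after_days(
            prefix="example-prefix/command-results/",
            days=7,
            rule_id="expire-command-results",
        )
    ]
}

print(json.dumps(LIFECYCLE, indent=2))

# Applying the configuration would look roughly like this (requires boto3
# and AWS credentials; the bucket name is a placeholder):
#
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=LIFECYCLE,
#   )
```

Scoping the rule with a prefix filter keeps the expiration away from everything else in the bucket, which matters when the same bucket also holds data that must not be deleted.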

by j02424 • New Contributor

08-30-2022 1:09:27 AM

3484 Views
1 reply
4 kudos

Best practice to delete /dbfs/tmp ?

What is the best practice regarding the tmp folder? We have a very large amount of data in that folder and are not sure whether to delete it, back it up, etc.

Latest Reply

Debayan
Databricks Employee

08-30-2022 2:04:10 PM

4 kudos

/dbfs/tmp can contain a lot of files, including temporary system files used for intermediate calculations and subdirectories that may contain packages from user-defined installations. It is always better to back up the files.

by palzor • New Contributor III

08-14-2022 2:24:43 PM

953 Views
0 replies
2 kudos

What is the best practice when loading a Delta table: do I infer the schema or provide the schema?

I am loading Avro files into Delta tables. I am doing this for multiple tables; some files are big (2-3 GB), but most are small, just a few MB. I am using Auto Loader to load the data into the Delta tables. My question is: What is the ...

by dumpstech • New Contributor

07-28-2022 10:09:23 PM

637 Views
0 replies
0 kudos

Dumpstech is the best platform, they provide best practice exam questions pdf, easy way to pass your exam in first attempt

by shawncao • New Contributor II

04-26-2022 1:48:18 PM

2163 Views
3 replies
2 kudos

Best practice for using the Databricks API

Hello, I'm building a Databricks connector to allow users to issue commands/SQL from a web app. In general, I think the REST API is okay to work with, though it's pretty tedious to write wrapper code for each API call. [Q1] Is there an official (or semi-off...

Latest Reply

shawncao
New Contributor II

07-25-2022 3:17:57 PM

2 kudos

I don't know if I fully understand DBX; it sounds like a job client to manage jobs and deployment, and I don't see Node.js support for this project yet. My question was about how to "stream" query results back from Databricks in a Node.js application, curr...

2 More Replies