cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

yvuignie
by Contributor
  • 2888 Views
  • 11 replies
  • 3 kudos

Resolved! Unity catalog - How do you modify groups properly ?

Hello,What is the best practice to modify/delete/recreate groups properly ?In order to rename a group, the only mean was to delete/recreate. But after deletion in the account console, the permissions granted to the deleted groups in the tables were i...

  • 2888 Views
  • 11 replies
  • 3 kudos
Latest Reply
RobinK
New Contributor III
  • 3 kudos

Hello,I have exactly the same issue - I am also using terraform.I deleted a group and the catalog permissions are in bad state.  I am not able to revoke access to this group using the Databricks UI nor REST API. I also tried to recreate the group wit...

  • 3 kudos
10 More Replies
Mr__D
by New Contributor II
  • 2214 Views
  • 7 replies
  • 1 kudos

Resolved! Writing modular code in Databricks

Hi All, Could you please suggest to me the best way to write PySpark code in Databricks,I don't want to write my code in Databricks notebook but create python files(modular project) in Vscode and call only the primary function in the notebook(the res...

  • 2214 Views
  • 7 replies
  • 1 kudos
Latest Reply
Gamlet
New Contributor II
  • 1 kudos

Certainly! To write PySpark code in Databricks while maintaining a modular project in VSCode, you can organize your PySpark code into Python files in VSCode, with a primary function encapsulating the main logic. Then, upload these files to Databricks...

  • 1 kudos
6 More Replies
andrew0117
by Contributor
  • 489 Views
  • 1 replies
  • 0 kudos

what is best practice to handle the concurrency issue in batch processing?

Normally, our ELT framework takes in batches one by one and loads the data into target tables. But if more than one batches come in at the same time, the framework will break due to the concurrency issue that multiple sources are trying to write the ...

  • 489 Views
  • 1 replies
  • 0 kudos
Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

you can partition you table to avoid the changes of getting this exception.

  • 0 kudos
Anonymous
by Not applicable
  • 5983 Views
  • 3 replies
  • 0 kudos

Resolved! What is the best practice for managing different environments (Staging vs Production) on Databricks?

Should we create separate workspaces for Dev/Test/Prod ? Or should we have 1 workspace and create separate folders for Dev/Test/Prod?

  • 5983 Views
  • 3 replies
  • 0 kudos
Latest Reply
Srikanth_Gupta_
Valued Contributor
  • 0 kudos

as per my previous experience, its always good to have different workspaces for different envs, its easy to maintain and helps better with CICD pipeline as well, because lot of organizations provide deployment access to Developers in Dev env but not ...

  • 0 kudos
2 More Replies
Trung
by Contributor
  • 739 Views
  • 4 replies
  • 0 kudos

best practice for managing resources to avoid creators leaving out of the project

Dear DB community!As I know, when the resource creator is out of the project all resources they create will be deleted as well. So the question is that: can I assign the owner role to a group, that will help protect the resource from deletion or not...

  • 739 Views
  • 4 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @trung nguyen​ Thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs.Please help us select the best solution by clicking on "Select As Best" if it does.Your feedback...

  • 0 kudos
3 More Replies
chanansh
by Contributor
  • 890 Views
  • 2 replies
  • 0 kudos

Delta table acceleration for group by on key columns using ZORDER does not work

What is the best practice for accelerating queries which looks like the following?win = Window.partitionBy('key1','key2').orderBy('timestamp') df.select('timestamp', (F.col('col1') - F.lag('col1').over(win)).alias('col1_diff'))I have tried to use OP...

  • 890 Views
  • 2 replies
  • 0 kudos
Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Hanan Shteingart​ Thank you for posting your question in our community! We are happy to assist you.To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answ...

  • 0 kudos
1 More Replies
flobib123
by New Contributor III
  • 2992 Views
  • 6 replies
  • 3 kudos

Resolved! Syntax highlight support for python multiline SQL strings

I would like to know if a tool is available in Databricks to manage the SQL syntax highlight in text.Like this VSCode plugin:https://marketplace.visualstudio.com/items?itemName=ptweir.python-string-sqlThank you.

  • 2992 Views
  • 6 replies
  • 3 kudos
Latest Reply
flobib123
New Contributor III
  • 3 kudos

Hello @Vidula Khanna​ ,No, no one answered my question correctly, but I'm using PySpark now, so I'm not on this topic anymore.But thank you for taking the time to answer me.

  • 3 kudos
5 More Replies
AJDJ
by New Contributor III
  • 1327 Views
  • 5 replies
  • 10 kudos

Cost as per the Databricks demo

Hi there,I came across this Databricks demo from the below link. https://youtu.be/BqB7YQ1-KKcKindly Fastforward to time 16:30 or 16:45 of the video and watch few mins of the video related to cost. My understanding is the data is in the lake and datab...

  • 1327 Views
  • 5 replies
  • 10 kudos
Latest Reply
Anonymous
Not applicable
  • 10 kudos

Hi @AJ DJ​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 10 kudos
4 More Replies
Gim
by Contributor
  • 39889 Views
  • 5 replies
  • 9 kudos

Best practice for logging in Databricks notebooks?

What is the best practice for logging in Databricks notebooks? I have a bunch of notebooks that run in parallel through a workflow. I would like to keep track of everything that happens such as errors coming from a stream. I would like these logs to ...

  • 39889 Views
  • 5 replies
  • 9 kudos
Latest Reply
Kaniz
Community Manager
  • 9 kudos

Hi @Gimwell Young​ , We haven’t heard from you on the last response from @karthik p​​, @Debayan Mukherjee​ and @Hubert Dudek​  and I was checking back to see if their suggestions helped you. Or else, If you have any solution, please do share that wit...

  • 9 kudos
4 More Replies
alejandrofm
by Valued Contributor
  • 919 Views
  • 4 replies
  • 2 kudos

Resolved! Orphan (?) files on Databricks S3 bucket

Hi, I'm seeing a lot of empty (and not) directories on routes like:xxxxxx.jobs/FileStore/job-actionstats/xxxxxx.jobs/FileStore/job-result/xxxxxx.jobs/command-results/Can I create a lifecycle to delete old objects (files/directories)? how many days? w...

  • 919 Views
  • 4 replies
  • 2 kudos
Latest Reply
alejandrofm
Valued Contributor
  • 2 kudos

Hi! I didn't know that, Purging right now, is there a way to schedule that so logs are retained for less time? Maybe I want to maintain the last 7 days for everything?Thanks!

  • 2 kudos
3 More Replies
j02424
by New Contributor
  • 1513 Views
  • 2 replies
  • 4 kudos

Best practice to delete /dbfs/tmp ?

What is best practice regarding the tmp folder? We have a very large amount of data in that folder and not sure whether to delete, back up etc?

  • 1513 Views
  • 2 replies
  • 4 kudos
Latest Reply
Kaniz
Community Manager
  • 4 kudos

Hi @James Owen​, We haven’t heard from you on the last response from @Debayan Mukherjee​, and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please do share that with the community as it can be helpful to...

  • 4 kudos
1 More Replies
palzor
by New Contributor III
  • 387 Views
  • 0 replies
  • 2 kudos

What is the best practice while loading delta table , do I infer the schema or provide the schema?

I am loading avro files into the detla tables. I am doing this for multiple tables and some files are big like (2-3GB) and most of them are small like in few MBs.I am using autoloader to load the data into the delta tables.My question is:What is the ...

  • 387 Views
  • 0 replies
  • 2 kudos
shawncao
by New Contributor II
  • 856 Views
  • 3 replies
  • 2 kudos

best practice of using data bricks API

Hello, I'm building a Databricks connector to allow users to issue command/SQL from a web app.In general, I think the REST API is okay to work with, though it's pretty tedious to write wrap code for each API call.[Q1]Is there an official (or semi-off...

  • 856 Views
  • 3 replies
  • 2 kudos
Latest Reply
shawncao
New Contributor II
  • 2 kudos

I don't know if I fully understand DBX, sounds like a job client to manage jobs and deployment and I don't see NodeJS support for this project yet. My question was about how to "stream" query results back from Databricks in a NodeJs application, curr...

  • 2 kudos
2 More Replies
enri_casca
by New Contributor III
  • 4444 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new in databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company.I pull the data from snowflakedf = spark.read \  .format("snowflake") \  .options(**options) \  .option('qu...

  • 4444 Views
  • 13 replies
  • 2 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

can you check this SO topic?

  • 2 kudos
12 More Replies
Labels