Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by ShivangiB (New Contributor II)
  • 347 Views
  • 2 replies
  • 0 kudos

Resolved! Factors deciding whether to choose Z-ORDER, partitioning, or liquid clustering

What are the factors on which we should base the choice of optimization approach?

Latest Reply
canadiandataguy
New Contributor III
  • 0 kudos

I have built a decision tree on how to think about it https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true
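For illustration, a minimal sketch of what each of the three options looks like as Delta DDL/maintenance commands (table and column names are hypothetical):

```python
# Illustrative only: the three optimization approaches compared in this thread.

# 1. Hive-style partitioning: fixed at table creation; best for low-cardinality keys.
spark.sql("""
    CREATE TABLE events_partitioned (id BIGINT, ts TIMESTAMP, country STRING)
    USING DELTA
    PARTITIONED BY (country)
""")

# 2. Z-ORDER: applied during OPTIMIZE on an existing table to co-locate data by column.
spark.sql("OPTIMIZE events_partitioned ZORDER BY (id)")

# 3. Liquid clustering: declared with CLUSTER BY; clustering keys can be changed later.
spark.sql("""
    CREATE TABLE events_clustered (id BIGINT, ts TIMESTAMP, country STRING)
    USING DELTA
    CLUSTER BY (country, id)
""")
```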

1 More Replies
by Avinash_Narala (Valued Contributor II)
  • 5802 Views
  • 2 replies
  • 1 kudos

Resolved! Liquid clustering vs partitioning

Hi, is liquid clustering a replacement for partitioning? Should we still use partitioning when we use liquid clustering? Can we use liquid clustering for all cases and ignore partitioning?

Latest Reply
canadiandataguy
New Contributor III
  • 1 kudos

I have built a decision tree on how to think about it https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true
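As a companion sketch to the decision-tree link (table names are assumed; liquid clustering and partitioning are mutually exclusive on the same table, so it acts as a replacement rather than a complement):

```python
# Enable liquid clustering on an existing *unpartitioned* Delta table.
spark.sql("ALTER TABLE sales CLUSTER BY (region, sale_date)")

# A partitioned table cannot be converted in place; recreate it instead.
spark.sql("""
    CREATE OR REPLACE TABLE sales_lc
    CLUSTER BY (region, sale_date)
    AS SELECT * FROM sales_partitioned
""")
```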

1 More Replies
by shubhamM (New Contributor II)
  • 627 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks File Trigger Limit

For the Databricks file trigger, the following limitation is mentioned: A storage location configured for a file arrival trigger can contain only up to 10,000 files. Locations with more files cannot be monitored for new file arrivals. If the configured storage loc...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Your approach to managing the number of BLOBs in your Azure BLOB storage by moving older files to an archive directory is reasonable and can help ensure you do not exceed the 10,000 file limit in the monitored directories. This method will help keep ...
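A minimal sketch of that archiving approach (the paths and the seven-day retention window are assumptions, not from the thread):

```python
import time

SRC = "abfss://landing@myaccount.dfs.core.windows.net/incoming/"     # monitored location
ARCHIVE = "abfss://landing@myaccount.dfs.core.windows.net/archive/"  # not monitored
CUTOFF_MS = (time.time() - 7 * 24 * 3600) * 1000  # files older than 7 days

# dbutils.fs.ls returns FileInfo objects with path, name, and modificationTime (ms).
for f in dbutils.fs.ls(SRC):
    if not f.isDir() and f.modificationTime < CUTOFF_MS:
        dbutils.fs.mv(f.path, ARCHIVE + f.name)
```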

3 More Replies
by felix_counter (New Contributor III)
  • 14756 Views
  • 6 replies
  • 3 kudos

How to authenticate the Databricks provider in Terraform using a system-managed identity?

Hello, I want to authenticate the Databricks provider using a system-managed identity in Azure. The identity resides in a different subscription than the Databricks workspace. According to the "authentication" section of the Databricks provider docume...

Labels: Data Engineering, authentication, databricks provider, managed identity, Terraform
Latest Reply
LuisArs
New Contributor II
  • 3 kudos

Hello, is there a solution for this issue? I'm facing a similar issue on Azure DevOps with a managed identity too. │ Error: cannot read spark version: cannot read data spark version: failed during request visitor: inner token: token request: {"error":"inv...

5 More Replies
by AbkSpl (New Contributor III)
  • 12229 Views
  • 8 replies
  • 6 kudos

Resolved! Making a connection to tables in a Dynamics app through the Dataverse TDS endpoint

I wish to do some analysis in Databricks on tables that are stored in Dataverse. I know that Power BI uses its Dataverse connector to fetch the data through Dataverse's TDS endpoint. The tables that we import in Power BI using this connector are nearly ...
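For context, the TDS endpoint speaks the SQL Server wire protocol (port 5558), so in principle Spark's JDBC reader can query it directly. An untested sketch, with the org URL, table name, and token handling all placeholders:

```python
# Untested sketch: query Dataverse's TDS endpoint via Spark JDBC with AAD auth.
aad_token = dbutils.secrets.get("my-scope", "dataverse-aad-token")  # hypothetical secret

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://yourorg.crm.dynamics.com:5558;database=yourorg")
    .option("dbtable", "dbo.account")
    .option("accessToken", aad_token)  # mssql-jdbc connection property for AAD tokens
    .load()
)
display(df.limit(10))
```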

Latest Reply
NavinW
New Contributor II
  • 6 kudos

Did you manage to connect to Dataverse from Databricks? I am trying to do the same but no luck.

7 More Replies
by jeremy98 (Contributor III)
  • 945 Views
  • 6 replies
  • 3 kudos

Best practice for creating configuration YAML files for each workspace environment?

Hi Community, my team and I are working on refactoring our DAB repository, and we're considering creating a configuration folder based on our environments (Dev, Staging, and Production workspaces). What would be a common and best practice for structuring...

Latest Reply
koji_kawamura
Databricks Employee
  • 3 kudos

Hi @jeremy98 and all, I agree with @saurabh18cs. Having configuration files for each deployment target is a very convenient and manageable solution. Since I couldn't find a plain example showing the project structure, I created one here: https://git...

5 More Replies
by jeremy98 (Contributor III)
  • 437 Views
  • 2 replies
  • 1 kudos

Resolved! How to get schedule information about a job in Databricks?

Hi community, I was reading the Databricks API documentation and I want to find out, for a given job, whether its schedule has the status PAUSED or UNPAUSED. I saw that there is this API call: https://docs.databricks.com/api/workspace/jobs...

Latest Reply
KaranamS
Contributor III
  • 1 kudos

Hi @jeremy98, it looks like the access token is incorrect or not valid. Can you please verify the following?
1. Validate your access token; if you get a 403 Forbidden error, your access token is invalid:
curl -X GET "https://<workspace_host>/api/2.2/jo...
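A minimal Python sketch of the same check (it assumes the job actually has a schedule block; the workspace host, token, and job ID are placeholders):

```python
import requests

resp = requests.get(
    "https://<workspace_host>/api/2.2/jobs/get",
    headers={"Authorization": f"Bearer {access_token}"},
    params={"job_id": 123456},
)
resp.raise_for_status()  # a 403 here means the token is invalid
schedule = resp.json().get("settings", {}).get("schedule")
print(schedule["pause_status"] if schedule else "job has no schedule")  # PAUSED / UNPAUSED
```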

1 More Replies
by antr (New Contributor II)
  • 470 Views
  • 3 replies
  • 0 kudos

DLT full refresh and resets

When doing a full refresh in DLT, the tables seem to be in a reset/empty state until they're populated again. This can break downstream dependencies if they try to use the data during pipeline execution. How should such a case be handled properly?

Latest Reply
Advika
Databricks Employee
  • 0 kudos

Hello @antr! In DLT, a full refresh on a streaming table resets processing state and checkpoint data, potentially disrupting downstream processes that rely on it. To avoid this, use incremental updates (the default) or append mode instead of full refresh...
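A minimal sketch of the incremental (default) pattern, assuming a DLT pipeline and a source table named raw_events; regular updates only process new rows, so the table is never reset to empty:

```python
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Updated incrementally; avoids full-refresh resets")
def events_clean():
    return (
        spark.readStream.table("raw_events")  # streaming read = incremental processing
        .where(F.col("event_type").isNotNull())
    )
```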

2 More Replies
by Shivap (New Contributor III)
  • 222 Views
  • 3 replies
  • 0 kudos

Need to extract data from Delta tables and move it to on-prem; what's the best approach?

I want to extract data from Databricks Delta tables and move it to on-prem. What's the best way to accomplish it?

Latest Reply
Stefan-Koch
Valued Contributor II
  • 0 kudos

An easy way to do this is to use Airbyte. You can run Airbyte locally, connect to Databricks, and copy the data to your on-prem location. https://docs.airbyte.com/integrations/destinations/databricks
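If a tool-free route is acceptable, a different approach than Airbyte (plainly a swap-in, with placeholder names) is to export the Delta table to Parquet on cloud storage and pull the files on-prem:

```python
# Export the Delta table as plain Parquet files that any on-prem reader can consume.
(
    spark.read.table("catalog.schema.my_delta_table")
    .write.mode("overwrite")
    .parquet("abfss://export@myaccount.dfs.core.windows.net/my_delta_table/")
)
# On-prem: fetch the files (e.g. with azcopy) and load them with any Parquet reader.
```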

2 More Replies
by Rasputin312 (New Contributor II)
  • 665 Views
  • 1 reply
  • 1 kudos

Resolved! Widgets Not Displaying

I am trying to run this attention visualization in my Databricks notebook. This is my code and this is the error I get:
```
from IPython.display import display, Javascript
import ipywidgets as widgets
from ipywidgets import interact
from transformers im...
```

Latest Reply
koji_kawamura
Databricks Employee
  • 1 kudos

Hi @Rasputin312! I was able to render the visualization with the bertviz library. The default model_view html_action is "view", which does not work in a Databricks notebook. Instead, using the returned HTML, we can visualize the model: display(model_view(a...
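A short sketch of that workaround (it assumes bertviz is installed and that `attention` and `tokens` come from the model as in the original post):

```python
from bertviz import model_view

html = model_view(attention, tokens, html_action="return")  # return HTML instead of rendering
displayHTML(html.data)  # Databricks-native HTML rendering
```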

by Kayla (Valued Contributor II)
  • 392 Views
  • 2 replies
  • 2 kudos

Resolved! Scheduled Workflow options and DST Change

So, I have a workflow that runs at 2:35 am daily. Is there really no way to configure it so that it isn't completely skipped during the spring time change?

Latest Reply
ashraf1395
Honored Contributor
  • 2 kudos

Hi @Kayla, I suggest the best solution would be to use UTC; even Databricks recommends that. Alternatively, shift the job by 30 minutes to 1 hour.
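Expressed as a Jobs API schedule block (values are illustrative), the UTC suggestion looks like this; UTC observes no DST, so a 2:35 run is never skipped or doubled:

```python
schedule = {
    "quartz_cron_expression": "0 35 2 * * ?",  # 02:35 every day
    "timezone_id": "UTC",                      # no DST transitions
    "pause_status": "UNPAUSED",
}
```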

1 More Replies
by ramy (New Contributor II)
  • 2404 Views
  • 5 replies
  • 2 kudos

Getting a job ID dynamically to create another job that refers to it as a job task

I am trying to create a new job in Databricks Asset Bundles which refers to another job task and passes parameters to it. However, the previous job is not created yet (or will be created using Databricks Asset Bundles in higher envs when deploying t...

Latest Reply
priya12
New Contributor II
  • 2 kudos

The lookup works. Here is how it can be used for a job existing outside the asset bundle:
variables:
  my_jobid:
    description: Enter the Databricks Job name you want to refer.
    lookup:
      job: 'My Job1'
In the resources section, refer...

4 More Replies
by asisaarav (New Contributor)
  • 206 Views
  • 1 reply
  • 0 kudos

Error: The Spark driver has stopped unexpectedly and is restarting

Hi community, I'm getting an error in the code: "Error: The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically restarted." Can you help here in understanding what methods we can use to get it fixed? I tried look...

Latest Reply
saurabh18cs
Valued Contributor III
  • 0 kudos

The error message indicates an issue with the Spark driver in your Databricks environment. This can be caused by various factors. Check cluster configuration: ensure that your Databricks cluster has sufficient resources (CPU, memory) to handle...

by aravind-ey (New Contributor)
  • 370 Views
  • 3 replies
  • 0 kudos

Vocareum lab access

Hi, I am doing a data engineering course in Databricks (Partner Labs) and would like to have access to the Vocareum workspace to practice using the demo sessions. Can you please help me get access to this workspace? Regards, Aravind

Latest Reply
twnlBO
New Contributor II
  • 0 kudos

Can you please provide links, a screenshot, or more info? This answer is not specific enough. I'm taking the Data Analysis learning path; there are different demos I'd like to practice, and there are no SP Lab environment links as mentioned in the videos.

2 More Replies
by Adrianj (New Contributor III)
  • 12317 Views
  • 16 replies
  • 11 kudos

Databricks Bundles - How to select which job resources to deploy per target?

Hello, my team and I are experimenting with bundles. We follow the pattern of having one main Databricks.yml file, with each job definition specified in a separate YAML for modularization. We wonder if it is possible to select from the main Databricks...

Latest Reply
sergiopolimante
New Contributor II
  • 11 kudos

"This include array can appear only as a top-level mapping." - you can't use include inside targets. You can use sync - exclude to exclude the yml files, but if they are in the include the workflows are going to be created anyway, even if the yml fil...

15 More Replies
