Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

chitrar
by New Contributor III
  • 3140 Views
  • 9 replies
  • 4 kudos

workflow/lakeflow - why does it not capture all the metadata of the jobs/tasks?

Hi, I see that with Unity Catalog we have the workflow and now the lakeflow schema. I guess the intention is to capture audit logs of changes and monitor runs, but I wonder why we don't also have all the metadata info on the jobs/tasks for a given job ...

Latest Reply
chitrar
New Contributor III
  • 4 kudos

@Sujitha so, can we expect these enhancements in the "near" future?

8 More Replies
turagittech
by Contributor
  • 2716 Views
  • 3 replies
  • 1 kudos

External Table refresh

Hi, I have a blob storage area in Azure where JSON files are being created. I can create an external table on the storage blob container, but when new files are added I don't see the extra rows when querying the data. Is there a better approach to accessing th...

Latest Reply
Nivethan_Venkat
Databricks MVP
  • 1 kudos

Hi @turagittech, the above error indicates that your table seems to be in DELTA format. Please check the table creation statement to see whether the table format is JSON or DELTA. PS: By default, if you are not specifying any format while creating the table on to...

2 More Replies
Walter_N
by New Contributor II
  • 1505 Views
  • 2 replies
  • 0 kudos

Resolved! DLT pipeline task with full refresh once in a while

Hi all, I'm using a Databricks workflow with some DLT pipeline tasks. These tasks require a full refresh from time to time due to schema changes in the source. I've been doing the full refresh manually or setting the full refresh option in the job settings, t...

Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

Hi, did you check the possibility of using an if/else task? You could define some criteria and pass the result from a notebook that checks whether it's time for a full refresh or just a regular refresh.
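The suggestion above can be sketched as a small decision helper. The weekly rule and the task-value key are placeholders, not anything from the thread:

```python
from datetime import date

def full_refresh_due(today: date, refresh_weekday: int = 6) -> bool:
    """Return True when a full refresh should run.
    Placeholder criterion: every Sunday (weekday 6). Swap in your own
    check, e.g. detecting a schema change in the source."""
    return today.weekday() == refresh_weekday

# In a notebook task, the flag could be published so a downstream
# If/else task can branch on it (the key name is hypothetical):
# dbutils.jobs.taskValues.set(key="full_refresh", value=full_refresh_due(date.today()))
```

The If/else task would then compare the task value against `true` and route to either the full-refresh or the incremental pipeline task.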

1 More Replies
scorpusfx1
by Databricks Partner
  • 1789 Views
  • 4 replies
  • 0 kudos

Delta Live Table SCD2 performance issue

Hi Community, I am working on ingestion pipelines that take data from Parquet files (200 MB per day) and integrate them into my Lakehouse. This data is used to create an SCD Type 2 using apply_changes, with the row ID as the key and the file date as t...

Data Engineering
apply_change
dlt
SCD2
Latest Reply
Stefan-Koch
Databricks Partner
  • 0 kudos

Hi @scorpusfx1, what kind of source data do you have? Are these Parquet files daily full snapshots of the source tables? If so, you should use apply_changes_from_snapshot, which is built exactly for this use case. https://docs.databricks.com/aws/en/dlt/py...
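For reference, a minimal sketch of what that call might look like inside a DLT pipeline (the table name, source view, and key column are placeholders based on the post; this fragment only runs inside a pipeline, not as a standalone script):

```python
import dlt

# Target streaming table that will hold the SCD Type 2 history.
dlt.create_streaming_table("customers_scd2")

# Each daily Parquet drop is treated as a full snapshot of the source.
dlt.apply_changes_from_snapshot(
    target="customers_scd2",
    source="daily_snapshot",   # a DLT view/table over the Parquet files
    keys=["row_id"],
    stored_as_scd_type=2,
)
```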

3 More Replies
analytics_eng
by New Contributor III
  • 7687 Views
  • 4 replies
  • 1 kudos

Connection reset by peer logging when importing custom package

Hi! I'm trying to import a custom package I published to Azure Artifacts, but I keep seeing the INFO logging below, which I don't want to display. The package was installed correctly on the cluster, and it imports successfully, but the log still appe...

Latest Reply
siklosib
New Contributor II
  • 1 kudos

What solved this problem for me was to remove the root logger configuration from the logging config and create another one within the loggers section. See below. { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'simple...
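A self-contained sketch of that idea (the logger and package names are placeholders): configure only a named logger under `loggers` and leave the root logger alone, so noisy third-party INFO records are not re-emitted by your handlers.

```python
import logging
import logging.config

LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "loggers": {
        # Only our own package is configured; there is no "root" entry,
        # so third-party loggers keep their default behaviour.
        "my_package": {
            "handlers": ["console"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING_CONFIG)
app_log = logging.getLogger("my_package")
```

With `propagate` set to False, records from `my_package` also stop bubbling up to whatever handlers the root logger may have.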

3 More Replies
nhuthao
by New Contributor II
  • 1531 Views
  • 5 replies
  • 1 kudos

SQL is not enabled

Hi All, I have registered on Databricks successfully. However, SQL is not enabled. Please help me figure out how to activate SQL. Thank you very much,

nhuthao_0-1741243500074.png
Latest Reply
Stefan-Koch
Databricks Partner
  • 1 kudos

@nhuthao How did you solve it? What was the problem?

4 More Replies
ShivangiB
by New Contributor III
  • 3033 Views
  • 2 replies
  • 0 kudos

Resolved! Factors for choosing between Z-order, partitioning, and liquid clustering

What are the factors on which we should choose the optimization approach?

Latest Reply
canadiandataguy
New Contributor III
  • 0 kudos

I have built a decision tree on how to think about it https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true

1 More Replies
Avinash_Narala
by Databricks Partner
  • 13349 Views
  • 2 replies
  • 1 kudos

Resolved! Liquid clustering vs partitioning

Hi, is liquid clustering a replacement for partitioning? Should we still use partitioning when we use liquid clustering? Can we use liquid clustering in all cases and ignore partitioning?

Latest Reply
canadiandataguy
New Contributor III
  • 1 kudos

I have built a decision tree on how to think about it https://www.canadiandataguy.com/p/optimizing-delta-lake-tables-liquid?triedRedirect=true

1 More Replies
shubhamM
by New Contributor II
  • 2691 Views
  • 4 replies
  • 2 kudos

Resolved! Databricks File Trigger Limit

For Databricks file arrival triggers, the following limitation is mentioned: a storage location configured for a file arrival trigger can contain only up to 10,000 files. Locations with more files cannot be monitored for new file arrivals. If the configured storage loc...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Your approach to managing the number of BLOBs in your Azure BLOB storage by moving older files to an archive directory is reasonable and can help ensure you do not exceed the 10,000 file limit in the monitored directories. This method will help keep ...
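That housekeeping step could be sketched like this against a local filesystem (the paths and the 30-day threshold are placeholders; on Azure Blob storage you would do the equivalent with `dbutils.fs` or the Azure SDK):

```python
import shutil
import time
from pathlib import Path

def archive_old_files(landing: Path, archive: Path, max_age_days: float = 30) -> int:
    """Move files older than max_age_days from the monitored landing
    directory into an archive directory, keeping the landing area
    under the 10,000-file limit for file arrival triggers.
    Returns the number of files moved."""
    archive.mkdir(parents=True, exist_ok=True)
    cutoff = time.time() - max_age_days * 86400
    moved = 0
    for f in landing.iterdir():
        if f.is_file() and f.stat().st_mtime < cutoff:
            shutil.move(str(f), str(archive / f.name))
            moved += 1
    return moved
```

Run as a scheduled job, this keeps only recent files in the trigger-monitored prefix while preserving the archive for reprocessing.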

3 More Replies
AbkSpl
by New Contributor III
  • 15803 Views
  • 8 replies
  • 6 kudos

Resolved! Making a connection to the tables in Dynamics app through the dataverse TDS endpoint

I wish to do some analysis in Databricks on tables that are stored in Dataverse. I know that Power BI uses its Dataverse connector to fetch the data via Dataverse's TDS endpoint. The tables that we import in Power BI using this connector are nearly ...

Latest Reply
NavinW
New Contributor II
  • 6 kudos

Did you manage to connect to Dataverse from Databricks? I am trying to do the same but with no luck.

7 More Replies
jeremy98
by Honored Contributor
  • 10370 Views
  • 6 replies
  • 3 kudos

Best practice for creating configuration YAML files for each workspace environment?

Hi Community, my team and I are working on refactoring our DAB repository, and we're considering creating a configuration folder based on our environments: Dev, Staging, and Production workspaces. What would be a common and best practice for structuring...

Latest Reply
koji_kawamura
Databricks Employee
  • 3 kudos

Hi @jeremy98 and all, I agree with @saurabh18cs. Having configuration files for each deployment target is a very convenient and manageable solution. Since I couldn't find a plain example showing the project structure, I created one here: https://git...

5 More Replies
jeremy98
by Honored Contributor
  • 2079 Views
  • 2 replies
  • 1 kudos

Resolved! How to get schedule information about a job in Databricks?

Hi community, I was reading the Databricks API documentation and I want to find out whether a job's schedule has the status PAUSED or UNPAUSED. I noticed that there is this API call: https://docs.databricks.com/api/workspace/jobs...

Latest Reply
KaranamS
Contributor III
  • 1 kudos

Hi @jeremy98, it looks like the access token is incorrect or not valid. Can you please verify the following? 1. Validate your access token; if you get a 403 Forbidden error, your access token is invalid: curl -X GET "https://<workspace_host>/api/2.2/jo...
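Once the token works, the schedule state is returned in the `settings.schedule.pause_status` field of the Get Job response. A small sketch (the host, job ID, and token are placeholders, and the parsing helper is hypothetical):

```python
import json
import urllib.request
from typing import Optional

def get_pause_status(job_json: dict) -> Optional[str]:
    """Return 'PAUSED'/'UNPAUSED' from a Jobs API get-job response,
    or None when the job has no cron schedule configured."""
    schedule = job_json.get("settings", {}).get("schedule")
    return schedule.get("pause_status") if schedule else None

def fetch_job(host: str, job_id: int, token: str) -> dict:
    # host e.g. "https://<workspace_host>" (placeholder)
    req = urllib.request.Request(
        f"{host}/api/2.2/jobs/get?job_id={job_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `get_pause_status(fetch_job(host, job_id, token))` would then yield the schedule state for a scheduled job, or None for an unscheduled one.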

1 More Replies
antr
by Databricks Partner
  • 3036 Views
  • 3 replies
  • 0 kudos

DLT full refresh and resets

When doing a full refresh in DLT, the tables seem to be in a reset/empty state until they're populated again. This can break downstream dependencies if they try to use the data during pipeline execution. How should such a case be handled properly?

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @antr! In DLT, a full refresh on a streaming table resets state processing and checkpoint data, potentially disrupting downstream processes that rely on it. To avoid this, use incremental updates (default) or append mode instead of full refresh...

2 More Replies
Shivap
by New Contributor III
  • 1025 Views
  • 3 replies
  • 0 kudos

Need to extract data from Delta tables and move it to on-prem; what's the best approach?

I want to extract data from Databricks Delta tables and move it to on-prem. What's the best way to accomplish this?

Latest Reply
Stefan-Koch
Databricks Partner
  • 0 kudos

An easy way to do this is to use Airbyte. You can run Airbyte locally, connect to Databricks, and copy the data to your on-prem location. https://docs.airbyte.com/integrations/destinations/databricks

2 More Replies
Rasputin312
by Databricks Partner
  • 2048 Views
  • 1 reply
  • 1 kudos

Resolved! Widgets Not Displaying

I am trying to run this attention visualization in my Databricks notebook. This is my code and this is the error I get:
```
from IPython.display import display, Javascript
import ipywidgets as widgets
from ipywidgets import interact
from transformers im...
```

Latest Reply
koji_kawamura
Databricks Employee
  • 1 kudos

Hi @Rasputin312! I was able to render the visualization with the bertviz library. The default model_view html_action is "view", which does not work in Databricks notebooks. Instead, using the returned HTML, we can visualize the model: display(model_view(a...
