Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

aliacovella
by Contributor
  • 2997 Views
  • 3 replies
  • 1 kudos

Resolved! Custom Checkpointing

The following is my scenario: I need to query, on a daily basis, an external table that maintains a row version. I would like to be able to query for all records where the row version is greater than the max previously processed row version. The sour...

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, I totally agree with VZLA. Within my internal team we have a similar issue, and we used a table to track the latest version of each table, since we don't have a streaming process on our side. DLT pipelines could be a choice, but it also depends if you ...
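The control-table approach jeremy98 describes reduces to a high-water-mark filter: remember the largest row version already processed, and only take rows above it. A minimal pure-Python sketch of the pattern (the `row_version` field name and the in-memory dict rows are illustrative assumptions; in practice the mark would live in a tracking table):

```python
# Sketch of version-based incremental loading; "row_version" is an assumed
# column name, and dicts stand in for table rows.
def load_increment(source_rows, last_version):
    """Return rows newer than last_version plus the new high-water mark."""
    new_rows = [r for r in source_rows if r["row_version"] > last_version]
    high_water = max((r["row_version"] for r in new_rows), default=last_version)
    return new_rows, high_water

rows = [{"id": 1, "row_version": 5}, {"id": 2, "row_version": 9}]
batch, mark = load_increment(rows, last_version=5)  # batch keeps only id 2; mark becomes 9
```

After processing, `mark` would be written back to the tracking table so the next daily run starts from there.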

2 More Replies
ashraf1395
by Honored Contributor
  • 2850 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Workflow design

I have 7-8 different DLT pipelines which have to be run at the same time according to their batch type, i.e. hourly or daily. Right now they are triggered effectively according to their batch type. I want to move to the next stage, where I want to clu...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @VZLA, I got the idea. There will be a small change in the way we will use it: since we don't schedule the workflow in Databricks, we trigger it using the API. So I will pass a job parameter along with the trigger, according to the timestamp, wheth...
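ashraf1395's plan — triggering via the Jobs API and passing the batch type as a job parameter derived from the timestamp — can be sketched as building the "run now" request body. A hedged sketch (the midnight-means-daily rule and the parameter name `batch_type` are assumptions for illustration, not the poster's actual setup):

```python
from datetime import datetime

def run_now_payload(job_id: int, now: datetime) -> dict:
    """Build a Jobs API run-now body carrying the batch type as a job parameter."""
    batch_type = "daily" if now.hour == 0 else "hourly"  # assumed scheduling rule
    return {"job_id": job_id, "job_parameters": {"batch_type": batch_type}}

daily = run_now_payload(123, datetime(2024, 1, 1, 0, 5))    # batch_type "daily"
hourly = run_now_payload(123, datetime(2024, 1, 1, 14, 0))  # batch_type "hourly"
```

The caller would then POST this body to the workspace's Jobs run-now endpoint with its usual authentication.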

2 More Replies
maddan80
by New Contributor II
  • 1304 Views
  • 3 replies
  • 0 kudos

History load from Source and

Hi, as part of our requirement we wanted to load huge historical data from the source system to Databricks in Bronze and then process it to Gold. We wanted to use batch read and write so that the historical load is done, and then for the delta o...

Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

I imported 16 TB of data using ADF. In this scenario I'd create a process that extracts data from the source using ADF and then executes the rest of the logic to populate tables in Gold. For the new data I'd create a separate process using Autoloade...

2 More Replies
javiomotero
by New Contributor III
  • 3965 Views
  • 4 replies
  • 4 kudos

How to consume Fabric Datawarehouse inside a Databricks notebook

Hello, I'm having a hard time figuring out (and finding the right documentation on) how to connect my Databricks notebook to consume tables from a Fabric data warehouse. I've checked this, but it seems to work only with OneLake, and this, but I'm not ...

Data Engineering
datawarehouse
fabric
Latest Reply
javiomotero
New Contributor III
  • 4 kudos

Hello, I would like a few more options regarding reading views. Using abfss is fine for reading tables, but I don't know how to load views, which are visible in the SQL endpoint. Is there any alternative for connecting to Fabric and be abl...

3 More Replies
Avinash_Narala
by Databricks Partner
  • 2085 Views
  • 3 replies
  • 4 kudos

Redshift to Databricks Migration

Hi, I want a detailed plan of the steps to migrate my data from Redshift to Databricks: where to start, what to assess, and what to migrate. It would really help me if you provided a detailed explanation of the migration. Thanks in advance.

Latest Reply
MariuszK
Valued Contributor III
  • 4 kudos

I migrated Oracle to Databricks and have experience with Redshift. The cost and effort will depend on your technical stack: What do you use for ETL? What do you use for data ingestion? Reporting tools? In general, the simplest steps are: data and mo...

2 More Replies
ahen
by New Contributor
  • 4853 Views
  • 1 reply
  • 0 kudos

Deployed DABs job via GitLab CI/CD; it is creating duplicate jobs.

We had an error in the DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

@ahen When you used the --force-lock option during the Databricks Asset Bundle (DAB) deployment, it likely bypassed certain checks that would normally prevent duplicate resource creation. This option is used to force a deployment even when a lock is ...

shubham_007
by Contributor III
  • 3242 Views
  • 6 replies
  • 0 kudos

Resolved! Need urgent help and guidance (with reference links) on the topics below:

Dear experts, I need urgent help and guidance (with reference links) on the topics below: steps for package installation with serverless in Databricks; what are Delta Lake connectors with serverless; how to run Delta Lake queries outside...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Were you able to review the documentation provided here: https://docs.databricks.com/en/compute/serverless/dependencies.html#install-notebook-dependencies?

5 More Replies
mrkure
by New Contributor II
  • 1237 Views
  • 2 replies
  • 0 kudos

Databricks connect, set spark config

Hi, I am using Databricks Connect to compute with a Databricks cluster. I need to set some Spark configurations, namely spark.files.ignoreCorruptFiles. As I have experienced, setting a Spark configuration in Databricks Connect for the current session has...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Have you tried setting it up in your code as:

from pyspark.sql import SparkSession

# Create a Spark session
spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.files.ignoreCorruptFiles", "true") \
    .getOrCreate()

# Yo...

1 More Replies
Buranapat
by New Contributor II
  • 2830 Views
  • 4 replies
  • 4 kudos

Error when accessing 'num_inserted_rows' in Spark SQL (DBR 15.4 LTS)

Hello Databricks Community, I've encountered an issue while trying to capture the number of rows inserted after executing a SQL insert statement in Databricks (DBR 15.4 LTS). My code attempts to access the number of inserted rows as follows: row...

Latest Reply
GeorgeP1
Databricks Partner
  • 4 kudos

Hi, we are experiencing the same issue. We also turned on liquid clustering on the table, and we had additional checks on the inserted-data information, which was really helpful. @GavinReeves3, did you manage to solve the issue? @MuthuLakshmi, any idea? Thank ...

3 More Replies
zg
by New Contributor III
  • 2365 Views
  • 4 replies
  • 3 kudos

Resolved! Unable to Create Alert Using API

Hi all, I'm trying to create an alert using the Databricks REST API, but I keep encountering the following error: Error creating alert: 400 {"message": "Alert name cannot be empty or whitespace"}: {"alert": {"seconds_to_retrigger": 0, "display_name": "A...

Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @zg, you are sending the payload for the new endpoint (/api/2.0/sql/alerts) to the old endpoint (/api/2.0/preview/sql/alerts). These are the docs for the old endpoint: https://docs.databricks.com/api/workspace/alertslegacy/create. As you can see ...
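The fix filipniziol points out is purely an endpoint change: the same create-alert call goes to /api/2.0/sql/alerts instead of the legacy /api/2.0/preview/sql/alerts. A small helper to make the choice explicit (authentication and payload schema are out of scope here; the workspace host is a placeholder):

```python
def alerts_endpoint(host: str, legacy: bool = False) -> str:
    """Return the Databricks SQL alerts create endpoint for a workspace host."""
    path = "/api/2.0/preview/sql/alerts" if legacy else "/api/2.0/sql/alerts"
    return host.rstrip("/") + path

new_url = alerts_endpoint("https://adb-123.azuredatabricks.net")
old_url = alerts_endpoint("https://adb-123.azuredatabricks.net", legacy=True)
```

POSTing the new-style payload from the question to `new_url` should avoid the "Alert name cannot be empty" 400, since that error comes from the legacy endpoint expecting a different body shape.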

3 More Replies
Mattias
by New Contributor II
  • 2640 Views
  • 3 replies
  • 0 kudos

How to increase timeout in Databricks Workflows DBT task

Hi, I have a Databricks Workflows dbt task that targets a PRO SQL warehouse. However, the task fails with a "too many retries" error (see below) if the PRO SQL warehouse is not up and running when the task starts. How can I increase the timeout or allo...

Latest Reply
Mattias
New Contributor II
  • 0 kudos

One option seems to be to reference a custom "profiles.yml" in the job configuration and specify a custom DBT Databricks connector timeout there (https://docs.getdbt.com/docs/core/connect-data-platform/databricks-setup#additional-parameters).However,...
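The custom profiles.yml route might look like the following; a hedged sketch assuming the connection parameters described in the dbt-databricks setup docs linked above, with placeholder profile, catalog, and connection values throughout:

```yaml
my_project:            # placeholder profile name
  target: prod
  outputs:
    prod:
      type: databricks
      catalog: main                # placeholder
      schema: analytics            # placeholder
      host: <workspace-hostname>
      http_path: <warehouse-http-path>
      token: "{{ env_var('DATABRICKS_TOKEN') }}"
      connect_retries: 10          # keep retrying while the warehouse spins up
      connect_timeout: 60          # seconds to wait between attempts
```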

2 More Replies
Mkk1
by New Contributor
  • 1703 Views
  • 1 reply
  • 0 kudos

Joining tables across DLT pipelines

How can I join a silver table (S1) from a DLT pipeline (D1) to another silver table (S2) from a different DLT pipeline (D2)? #DLT #DeltaLiveTables

Latest Reply
JothyGanesan
New Contributor III
  • 0 kudos

@Mkk1, did you manage to get this completed? We are in a similar situation; how did you achieve this?

MAHANK
by New Contributor II
  • 3699 Views
  • 3 replies
  • 0 kudos

How to compare two Databricks notebooks which are in different folders? Note: we don't have Git set up

We would like to compare two notebooks which are in different folders; we have yet to set up a Git repo for these folders. What other options do we have to compare two notebooks? Thanks, Nanda

Latest Reply
arekmust
New Contributor III
  • 0 kudos

Then using Repos and Git (GitHub/Azure DevOps) is the way to go!
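For a quick Git-free comparison in the meantime, one option (my own suggestion, not from the thread) is to export both notebooks as source files, for example via the workspace export feature, and diff them locally with Python's difflib. The notebook contents below are illustrative stand-ins:

```python
import difflib

# Illustrative stand-ins for the two exported notebook source files.
nb_a = "print('hello')\nx = 1\n"
nb_b = "print('hello')\nx = 2\n"

diff = list(difflib.unified_diff(
    nb_a.splitlines(), nb_b.splitlines(),
    fromfile="folder_a/notebook.py", tofile="folder_b/notebook.py",
    lineterm="",
))
print("\n".join(diff))  # shows the changed line as "-x = 1" / "+x = 2"
```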

2 More Replies
MatthewMills
by Databricks Partner
  • 5592 Views
  • 3 replies
  • 7 kudos

Resolved! DLT Apply Changes Tables corrupt

Got a weird DLT error. Test harness using the new(ish) 'Apply Changes from Snapshot' functionality and DLT Serverless (Current channel), Azure Aus East region. It has been working for several months without issue, but within the last week these DLT table...

Data Engineering
Apply Changes From Snapshot
dlt
Latest Reply
Lakshay
Databricks Employee
  • 7 kudos

We have an open ticket on this issue. The issue is caused by the maintenance pipeline renaming the backing table. We expect the fix to be rolled out soon for this issue.

2 More Replies
shubham_007
by Contributor III
  • 1427 Views
  • 1 reply
  • 0 kudos

Urgent!! Need information/details and reference links on the two topics below:

Dear experts, I need urgent help and guidance (with reference links) on the topics below: steps for package installation with serverless in Databricks; what are Delta Lake connectors with serverless; how to run Delta Lake queries outside...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Seems like a duplicate: https://community.databricks.com/t5/data-engineering/urgent-need-information-details-and-reference-link-on-below-two/td-p/107260
