Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Valued Contributor II
  • 6437 Views
  • 2 replies
  • 0 kudos

Lakehouse Monitoring & Expectations

Dears, has anyone successfully used the lakehouse monitoring & expectations features together at scale to measure the data quality of tables - for example, to conduct freshness checks, consistency checks, etc.? I'd appreciate it if you could share the lessons learn...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

Not sure if you are still looking for this. Here is a Medium article - https://piethein.medium.com/data-quality-within-lakehouses-0c9417ce0487 - where you can see the detailed implementation.

1 More Replies
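The freshness checks the poster asks about reduce to a simple age comparison against each table's last commit time. A minimal plain-Python sketch (this is not the Lakehouse Monitoring API; the 24-hour threshold and the timestamps are made-up illustrations):

```python
from datetime import datetime, timedelta, timezone

def is_fresh(last_updated: datetime, max_age: timedelta) -> bool:
    """Return True if the table's last update is within the allowed age."""
    return datetime.now(timezone.utc) - last_updated <= max_age

# Example: flag a table as stale if it has not been updated in 24 hours.
last_commit = datetime.now(timezone.utc) - timedelta(hours=30)
print(is_fresh(last_commit, timedelta(hours=24)))  # → False (stale)
```

In practice the `last_updated` value would come from the table's commit history; the check itself stays this simple.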
RobsonNLPT
by Contributor III
  • 1177 Views
  • 2 replies
  • 0 kudos

Spark Configurations with Serverless Compute

I have a few problems converting my notebooks to run with serverless compute. Right now I can't set my Delta userMetadata at session and scope level using Spark or SQL. Setting userMetadata in a DataFrame write operation works using the option: opti...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @RobsonNLPT, there is an internal feature request for this use case: https://databricks.aha.io/ideas/ideas/DB-I-12401. It is still at the idea stage, with no ETA on its implementation yet.

1 More Replies
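For reference, the per-write option the poster says does work can be sketched like this. A minimal sketch: the metadata string and the table path in the commented usage are hypothetical; the session-level config is the part that serverless currently restricts.

```python
def delta_write_options(user_metadata: str) -> dict:
    """Per-commit options for a Delta write; userMetadata is stored in the
    table history for that commit."""
    return {"userMetadata": user_metadata}

# Hypothetical usage on classic compute (df is an existing DataFrame):
# (df.write.format("delta")
#    .options(**delta_write_options("nightly-load-2025-01-31"))
#    .mode("append")
#    .save("/path/to/table"))
```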
ankitmit
by New Contributor III
  • 1233 Views
  • 2 replies
  • 0 kudos

Unknown Location of files for tables created using DLT

Hi all, I created a catalog and schema using a managed location, but I don't see any catalogs directory within the S3 bucket path mentioned in the image above. Also, I created a schema with a managed location, and I expected all the tables created within t...

Data Engineering
Databricks
dlt
Storage
Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hello, thank you for your question. Could you confirm that your issue is regarding the location of files for tables created using Delta Live Tables (DLT) when utilizing managed storage locations at the catalog and schema levels? Specifically, it seem...

1 More Replies
Dulce42
by New Contributor II
  • 2446 Views
  • 1 replies
  • 0 kudos

Exports history chats from genie space

Hi community! In the last few days I have been searching for how to export the chat history from my Genie space, but I couldn't find anything. Have some of you done this exercise, so you can guide me?

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi, thank you for the question! I haven't done this myself, but for context, are you referring to AI/BI Genie Space? e.g. https://docs.databricks.com/en/genie/index.html or https://learn.microsoft.com/en-us/azure/databricks/genie/ If so, then it doesn't ...

gilt
by New Contributor III
  • 4126 Views
  • 9 replies
  • 2 kudos

Auto Loader ignores data with modifiedBefore

Hello, I am trying to ingest CSV data with Auto Loader from an Azure Data Lake. I want to perform batch ingestion by using a scheduled job and the following trigger:  .trigger(availableNow=True) The CSV files are generated by Azure Synapse Link. If m...

Latest Reply
kostoska
New Contributor II
  • 2 kudos

Databricks should resolve this and introduce two options: a soft modifiedBefore and a hard modifiedBefore (files that are going to be ignored forever). In addition, this is not explained in the documentation, so it is a bit frustrating as it is not intui...

8 More Replies
aliacovella
by Contributor
  • 3227 Views
  • 3 replies
  • 1 kudos

Resolved! Custom Checkpointing

The following is my scenario: I need to query, on a daily basis, an external table that maintains a row version. I would like to be able to query for all records where the row version is greater than the max previously processed row version. The sour...

Latest Reply
jeremy98
Honored Contributor
  • 1 kudos

Hi, I totally agree with VZLA. Within my internal team we have a similar issue, and we used a table to track the latest version of each table, since we don't have a streaming process on our side. DLT pipelines could be a choice, but it also depends on whether you ...

2 More Replies
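The tracking-table pattern described in this thread boils down to high-water-mark logic like the following; a plain-Python sketch where the record shape (`id`, `row_version`) is hypothetical:

```python
def select_new_rows(rows: list[dict], last_version: int) -> tuple[list[dict], int]:
    """Return rows above the stored high-water mark, plus the new mark
    to persist back to the tracking table for the next daily run."""
    new_rows = [r for r in rows if r["row_version"] > last_version]
    new_mark = max((r["row_version"] for r in new_rows), default=last_version)
    return new_rows, new_mark

batch = [{"id": 1, "row_version": 5}, {"id": 2, "row_version": 9}]
fresh, mark = select_new_rows(batch, last_version=5)
print(fresh, mark)  # → [{'id': 2, 'row_version': 9}] 9
```

In the real job the filter would run as a pushed-down predicate against the external table, and `mark` would be written back to the tracking table only after the batch commits.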
ashraf1395
by Honored Contributor
  • 3039 Views
  • 3 replies
  • 0 kudos

Resolved! Databricks Workflow design

I have 7-8 different DLT pipelines which have to run at the same time according to their batch type, i.e. hourly and daily. Right now they are triggered effectively according to their batch type. I want to move to a next stage where I want to clu...

Latest Reply
ashraf1395
Honored Contributor
  • 0 kudos

Hi @VZLA, I got the idea. There will be a small change in the way we will use it. Since we don't schedule the workflow in Databricks, we trigger it using the API. So I will pass a job parameter along with the trigger, according to the timestamp, wheth...

2 More Replies
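The approach in this reply can be sketched as a Jobs API run-now call carrying a job parameter the workflow branches on. A minimal sketch: the parameter name `batch_type`, the job ID, and the workspace host are all hypothetical placeholders.

```python
def run_now_payload(job_id: int, batch_type: str) -> dict:
    """Request body for POST /api/2.1/jobs/run-now; the job's tasks can
    branch on the batch_type job parameter (e.g. hourly vs daily)."""
    return {"job_id": job_id, "job_parameters": {"batch_type": batch_type}}

# Hypothetical trigger (stdlib only; host/token/job_id are placeholders):
# import json, urllib.request
# req = urllib.request.Request(
#     "https://<workspace-host>/api/2.1/jobs/run-now",
#     data=json.dumps(run_now_payload(123, "hourly")).encode(),
#     headers={"Authorization": "Bearer <token>",
#              "Content-Type": "application/json"},
#     method="POST",
# )
# urllib.request.urlopen(req)
```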
maddan80
by New Contributor II
  • 1405 Views
  • 3 replies
  • 0 kudos

History load from Source and

Hi, as part of our requirement we wanted to load a huge amount of historical data from the source system to Databricks in Bronze and then process it to Gold. We wanted to use batch read and write so that the historical load is done, and then for the delta o...

Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

I imported 16 TB of data using ADF. In this scenario I'd create a process that extracts data from a source using ADF and then executes the rest of the logic to populate tables in Gold. For the new data I'd create a separate process using Autoloade...

2 More Replies
javiomotero
by New Contributor III
  • 4481 Views
  • 4 replies
  • 4 kudos

How to consume Fabric Datawarehouse inside a Databricks notebook

Hello, I'm having a hard time figuring out (and finding the right documentation on) how to connect my Databricks notebook to consume tables from a Fabric Data Warehouse. I've checked this, but it seems to work only with OneLake, and this, but I'm not ...

Data Engineering
datawarehouse
fabric
Latest Reply
javiomotero
New Contributor III
  • 4 kudos

Hello, I would like a few more options for reading views. Using abfss is fine for reading tables, but I don't know how to load views, which are visible in the SQL endpoint. Is there any alternative for connecting to Fabric and being abl...

3 More Replies
Avinash_Narala
by Databricks Partner
  • 2274 Views
  • 3 replies
  • 4 kudos

Redshift to Databricks Migration

Hi, I want a detailed, step-by-step plan to migrate my data from Redshift to Databricks: where to start, what to assess, and what to migrate. It would really help me if you could provide a detailed explanation of the migration. Thanks in advance.

Latest Reply
MariuszK
Valued Contributor III
  • 4 kudos

I migrated Oracle to Databricks and have experience with Redshift. The cost and effort will depend on your technical stack:
- What do you use for ETL?
- What do you use for data ingestion?
- Reporting tools?
In general, the simplest steps are: data and mo...

2 More Replies
ahen
by New Contributor
  • 4941 Views
  • 1 replies
  • 0 kudos

Deployed DABs job via Gitlab CICD. It is creating duplicate jobs.

We had an error in the DABs deploy, and subsequent retries resulted in a locked state. As suggested in the logs, we used the --force-lock option and the deploy succeeded. However, it created duplicate jobs for all assets in the bundle instead of updating the...

Latest Reply
Satyadeepak
Databricks Employee
  • 0 kudos

@ahen When you used the --force-lock option during the Databricks Asset Bundle (DAB) deployment, it likely bypassed certain checks that would normally prevent duplicate resource creation. This option is used to force a deployment even when a lock is ...

shubham_007
by Contributor III
  • 3478 Views
  • 6 replies
  • 0 kudos

Resolved! Need urgent help and guidance on information/details with reference links on below topics:

Dear experts, I need urgent help and guidance on information/details, with reference links, on the topics below: steps for package installation with serverless in Databricks; what are Delta Lake connectors with serverless; how to run Delta Lake queries outside...

Latest Reply
brockb
Databricks Employee
  • 0 kudos

Were you able to review the documentation provided here: https://docs.databricks.com/en/compute/serverless/dependencies.html#install-notebook-dependencies?

5 More Replies
mrkure
by New Contributor II
  • 1422 Views
  • 2 replies
  • 0 kudos

Databricks connect, set spark config

Hi, I am using Databricks Connect to compute with a Databricks cluster. I need to set some Spark configurations, namely spark.files.ignoreCorruptFiles. In my experience, setting a Spark configuration in Databricks Connect for the current session has...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Have you tried setting it up in your code as:

from pyspark.sql import SparkSession

# Create a Spark session
spark = (
    SparkSession.builder
    .appName("YourAppName")
    .config("spark.files.ignoreCorruptFiles", "true")
    .getOrCreate()
)

# Yo...

1 More Replies
Buranapat
by New Contributor II
  • 3045 Views
  • 4 replies
  • 4 kudos

Error when accessing 'num_inserted_rows' in Spark SQL (DBR 15.4 LTS)

Hello Databricks Community, I've encountered an issue while trying to capture the number of rows inserted after executing a SQL insert statement in Databricks (DBR 15.4 LTS). My code is attempting to access the number of inserted rows as follows: row...

Latest Reply
GeorgeP1
Databricks Partner
  • 4 kudos

Hi, we are experiencing the same issue. We also turned on liquid clustering on the table, and we had additional checks on the inserted data information, which was really helpful. @GavinReeves3 did you manage to solve the issue? @MuthuLakshmi any idea? Thank ...

3 More Replies
zg
by New Contributor III
  • 2527 Views
  • 4 replies
  • 3 kudos

Resolved! Unable to Create Alert Using API

Hi all, I'm trying to create an alert using the Databricks REST API, but I keep encountering the following error: Error creating alert: 400 {"message": "Alert name cannot be empty or whitespace"}. The payload was: {"alert": {"seconds_to_retrigger": 0, "display_name": "A...

Latest Reply
filipniziol
Esteemed Contributor
  • 3 kudos

Hi @zg, you are sending the payload intended for the new endpoint (/api/2.0/sql/alerts) to the old endpoint (/api/2.0/preview/sql/alerts). These are the docs for the old endpoint: https://docs.databricks.com/api/workspace/alertslegacy/create As you can see ...

3 More Replies
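To make the endpoint mix-up concrete: the fix is only the URL, not the payload. A minimal sketch, where the workspace host is a placeholder and the alert body is whatever new-style payload you already built:

```python
def create_alert_request(host: str, alert_body: dict) -> tuple[str, dict]:
    """URL + payload for the current Alerts API. The legacy endpoint
    (/api/2.0/preview/sql/alerts) expects a different schema ("name"
    rather than "display_name"), which is what triggers the 400 above."""
    return f"https://{host}/api/2.0/sql/alerts", alert_body

url, body = create_alert_request("<workspace-host>",
                                 {"alert": {"display_name": "My alert"}})
print(url)  # → https://<workspace-host>/api/2.0/sql/alerts
```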