Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ajgold
by New Contributor II
  • 1537 Views
  • 6 replies
  • 2 kudos

DLT Expectations Alert for Warning

I want to receive an alert via email or Slack when the @dlt.expect declaration fails its validation check in my DLT pipeline. I only see the option to add an email alert for @dlt.expect_or_fail failures, but not for warnings.

Latest Reply
RiyazAliM
Honored Contributor
  • 2 kudos

Hey @ajgold, I don't think DLT has this feature yet. You may raise a feature request for Databricks to add it in a future release here: https://databricks.aha.io/. Cheers!

5 More Replies
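RiyazAliM is right that there is no built-in alert for @dlt.expect warnings; a common workaround is to poll the pipeline's event log for expectation metrics and send the alert yourself. A minimal sketch, assuming the event log exposes expectation counts under details:flow_progress.data_quality.expectations (verify against your runtime) and using a hypothetical Slack webhook:

```python
import json

# Sketch (assumptions): DLT writes expectation metrics into its event log under
# details:flow_progress.data_quality.expectations. The exact schema may differ
# by release -- check your pipeline's event log before relying on this.

def failed_expectations(details_json: str):
    """Return [(name, failed_records)] for expectations that flagged rows."""
    details = json.loads(details_json)
    expectations = (
        details.get("flow_progress", {})
               .get("data_quality", {})
               .get("expectations", [])
    )
    return [(e["name"], e["failed_records"])
            for e in expectations if e.get("failed_records", 0) > 0]

# In a scheduled notebook you might feed it rows from the event log and post
# to Slack (SLACK_WEBHOOK_URL is hypothetical):
#
#   rows = spark.sql("""
#     SELECT details FROM event_log(TABLE(my_catalog.my_schema.my_mv))
#     WHERE event_type = 'flow_progress'
#   """).collect()
#   for r in rows:
#       for name, n in failed_expectations(r.details):
#           requests.post(SLACK_WEBHOOK_URL, json={"text": f"{name}: {n} bad rows"})
```

The pure parsing helper keeps the alert logic testable outside a pipeline; only the commented part needs a cluster.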
ande
by New Contributor
  • 2552 Views
  • 2 replies
  • 0 kudos

IP address for accessing external SFTP server

I am trying to pull data into my Databricks workspace via an external SFTP server. I am using Azure for my compute. To access the SFTP server, they need to whitelist my IP address. My IP address in Azure Databricks seems to be constantly changing fro...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Azure Databricks, like many cloud services, does not provide static IP addresses for outbound connections. This is because the compute resources are dynamically allocated and can change over time. One potential workaround could be to use a Virtual N...

1 More Replies
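As a small companion to Walter_C's point, here is a sketch for confirming which outbound IP a cluster currently presents before requesting whitelisting. The echo-service URL is an assumption (any "what is my IP" endpoint works), and as the reply says, only a NAT gateway on a VNet-injected workspace gives a stable egress IP:

```python
import re

# Sketch: confirm the cluster's current outbound IP and sanity-check that it
# is a public address (a private address would mean you are seeing an
# internal IP, not what the SFTP provider would see).

IPV4 = re.compile(r"^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$")

def is_public_ipv4(addr: str) -> bool:
    m = IPV4.match(addr)
    if not m:
        return False
    octets = [int(g) for g in m.groups()]
    if any(o > 255 for o in octets):
        return False
    # RFC 1918 private ranges.
    if octets[0] == 10 or (octets[0] == 172 and 16 <= octets[1] <= 31) \
       or (octets[0] == 192 and octets[1] == 168):
        return False
    return True

# On the cluster (echo service is an assumption):
#   import requests
#   ip = requests.get("https://api.ipify.org", timeout=10).text
#   print(ip, is_public_ipv4(ip))
# This IP changes between cluster launches; it is only useful for verifying
# that your NAT gateway setup is actually in effect.
```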
fjrodriguez
by New Contributor III
  • 641 Views
  • 2 replies
  • 0 kudos

Job Preview in ADF

I have one Spark job that is triggered via ADF as a regular "Python" activity. Now I want to move to the Job activity, which is in Preview. Normally, at the linked-service level I have the Spark config and environment needed for the execution of this scri...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @fjrodriguez, my understanding is you've already created a cluster for your job. If that's the case, you can put that Spark configuration and those env variables directly on the cluster your job is using. If for some reason that's not possible, then you c...

1 More Replies
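radothede's suggestion of carrying the Spark conf and env vars on the job cluster itself might look like the sketch below; runtime version, node type, and all conf/env values are placeholders to adapt:

```python
# Sketch of a job-cluster spec carrying the Spark conf and env vars that used
# to live on the ADF linked service. All names and values are placeholders.

job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",     # assumption: pick your LTS runtime
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "spark_conf": {
        "spark.sql.shuffle.partitions": "64",
    },
    "spark_env_vars": {
        "ENVIRONMENT": "prod",               # hypothetical variable your script reads
    },
}

# This dict drops into the "new_cluster" field of a Jobs API job definition
# (or a job_clusters entry in a Databricks asset bundle), so the ADF "Job"
# activity only has to trigger the job by ID.
```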
jdlogos
by New Contributor III
  • 4847 Views
  • 5 replies
  • 2 kudos

apply_changes_from_snapshot with expectations

Hi, Question: are expectations supposed to function in conjunction with create_streaming_table() and apply_changes_from_snapshot? Our team is investigating Delta Live Tables and we have a working prototype using Autoloader to ingest some files from a m...

Latest Reply
jbrmn
New Contributor II
  • 2 kudos

Also facing the same issue - did you find a solution? Thinking I will have to apply expectations at the next stage of the pipeline until this is worked out.

4 More Replies
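jbrmn's workaround (applying expectations at the next stage of the pipeline) can be sketched like this. Table names and rules are illustrative, and the DLT calls are shown as comments because they only run inside a pipeline:

```python
# Sketch of the "next stage" workaround: leave apply_changes_from_snapshot
# unchecked, then enforce expectations on a downstream table. Rule names and
# their SQL conditions are examples.

RULES = {
    "valid_key": "id IS NOT NULL",
    "recent": "updated_at >= '2020-01-01'",
}

# In the pipeline notebook (Lakeflow/DLT Python API):
#
#   import dlt
#
#   dlt.create_streaming_table("customers_raw")
#   dlt.apply_changes_from_snapshot(
#       target="customers_raw", source="customers_snapshots", keys=["id"], ...)
#
#   @dlt.table(name="customers_clean")
#   @dlt.expect_all_or_drop(RULES)   # expectations applied one stage later
#   def customers_clean():
#       return dlt.read_stream("customers_raw")
```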
dnz
by New Contributor
  • 1340 Views
  • 1 reply
  • 0 kudos

Performance Issue with OPTIMIZE Command for Historical Data Migration Using Liquid Clustering

Hello Databricks Community, I’m experiencing performance issues with the OPTIMIZE command when migrating historical data into a table with liquid clustering. Specifically, I am processing one year’s worth of data at a time. For example: the OPTIMIZE co...

Latest Reply
HimanshuSingh
New Contributor II
  • 0 kudos

Did you get any solution? If yes, please post it.

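Since no solution was posted, one hedged mitigation: on a liquid-clustered table, OPTIMIZE is incremental, so running it after each yearly slice keeps every clustering pass small instead of paying for one huge pass at the end. A sketch with a placeholder staging table:

```python
# Sketch: interleave OPTIMIZE with the yearly backfill batches. Table and
# staging names are placeholders; adapt the INSERT to your ingestion pattern.

def backfill_statements(table: str, years: range):
    stmts = []
    for y in years:
        # hypothetical ingestion step for one year of history
        stmts.append(
            f"INSERT INTO {table} SELECT * FROM staging_history WHERE year = {y}")
        stmts.append(f"OPTIMIZE {table}")   # incremental: clusters only new data
    stmts.append(f"OPTIMIZE {table} FULL")  # optional one-time full recluster
    return stmts

# for s in backfill_statements("main.sales.orders", range(2015, 2025)):
#     spark.sql(s)
```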
yuinagam
by New Contributor II
  • 852 Views
  • 2 replies
  • 0 kudos

how can I verify that the result of a dlt will have enough rows before updating the table?

I have a dlt/lakeflow pipeline that creates a table, and I need to make sure that it will only update the resulting materialized view if it will have more than one million records. I've found this, but it seems to only work if I have already updated t...

Latest Reply
yuinagam
New Contributor II
  • 0 kudos

Thank you for the quick reply. Is there a common/recommended/possible way to work around this limitation? I don't mind not using the expectation API if it doesn't support logic that's based on aggregations.

1 More Replies
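Since expectations are row-level and cannot express an aggregate condition, one possible workaround is to count the source first and deliberately fail the update when it is too small; a failed update leaves the existing materialized view contents in place. A sketch, with the DLT decorator shown as comments and the threshold/table names as examples:

```python
# Sketch of an aggregate "expectation": enforce the minimum-row condition
# yourself and raise to abort the update.

MIN_ROWS = 1_000_000

def require_min_rows(n: int, minimum: int = MIN_ROWS) -> int:
    if n < minimum:
        raise ValueError(f"refusing to update: {n} rows < required {minimum}")
    return n

# In the pipeline (Lakeflow/DLT Python API):
#
#   @dlt.table(name="big_mv")
#   def big_mv():
#       df = spark.read.table("source_tbl")   # hypothetical source
#       require_min_rows(df.count())          # raises -> update fails,
#       return df                             # MV keeps its previous data
```

Note the extra count does add a pass over the source, which may matter for large inputs.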
shan-databricks
by Databricks Partner
  • 3646 Views
  • 9 replies
  • 4 kudos

Resolved! Databricks Autoloader BadRecords path Issue

I have one file with 100 rows: two rows are bad data and the remaining 98 rows are good data. But when I use badRecordsPath, it moves the entire file to the bad records path, which includes the good data as well, and it should move ...

Latest Reply
ShaileshBobay
Databricks Employee
  • 4 kudos

Why entire files go to badRecordsPath: when you enable badRecordsPath in Autoloader or in Spark's file readers (with formats like CSV/JSON), here's what happens: Spark expects each data file to be internally well-formed with respect to the declared s...

8 More Replies
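To actually keep the 98 good rows, a row-granular alternative to badRecordsPath is PERMISSIVE mode with a corrupt-record column, then splitting good from bad rows yourself. A sketch; the column name follows Spark's default `_corrupt_record`, and the reader is shown as comments since it needs a cluster:

```python
# Sketch: split at row granularity instead of file granularity. The splitter
# itself is plain Python over row dicts so it is easy to test.

def split_rows(rows, corrupt_col="_corrupt_record"):
    good, bad = [], []
    for r in rows:
        (bad if r.get(corrupt_col) else good).append(r)
    return good, bad

# With Auto Loader / the CSV reader:
#
#   df = (spark.readStream.format("cloudFiles")
#         .option("cloudFiles.format", "csv")
#         .option("mode", "PERMISSIVE")
#         .option("columnNameOfCorruptRecord", "_corrupt_record")
#         .schema(declared_schema)              # must include _corrupt_record
#         .load(src_path))
#   good_df = df.filter("_corrupt_record IS NULL")
#   bad_df  = df.filter("_corrupt_record IS NOT NULL")  # write these aside
```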
yit
by Databricks Partner
  • 5265 Views
  • 8 replies
  • 4 kudos

Resolved! Schema evolution for JSON files with AutoLoader

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns. My goal is to support schema evolution when loading new fi...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

@yit awesome. Glad that you got this solved. I look forward to the next problem. All the best, BS

7 More Replies
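For reference, a sketch of the Auto Loader options typically involved in this fix: inferColumnTypes makes nested JSON objects come through as structs rather than strings, and addNewColumns lets new fields evolve the schema. Paths are placeholders:

```python
# Sketch of the Auto Loader options usually involved in nested-JSON schema
# evolution. Both paths below are placeholders.

autoloader_options = {
    "cloudFiles.format": "json",
    "cloudFiles.inferColumnTypes": "true",
    "cloudFiles.schemaLocation": "/Volumes/cat/sch/vol/_schemas/events",
    "cloudFiles.schemaEvolutionMode": "addNewColumns",
}

# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options)
#       .load("/Volumes/cat/sch/vol/raw/events"))
# The stream restarts once per schema change; the managed table picks up the
# new nested columns on the next run.
```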
ZD
by New Contributor III
  • 2404 Views
  • 5 replies
  • 0 kudos

How to replace ${param} by :param

Hello, we previously used ${param} in our SQL queries: SELECT * FROM json.`${source_path}/file.json`. However, this syntax is now deprecated. The recommended approach is to use :param instead. But when I attempt to replace ${param} with :param, I encounte...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @ZD, please try this syntax in your notebook for SQL:
%sql
DECLARE _my_path = 'some_path';
SELECT _my_path;

4 More Replies
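The screenshot likely shows the usual pitfall: :param markers are not substituted in every position where ${param} used to work (path literals inside json.`...` being a common one, where constructs like IDENTIFIER(:tbl) or query restructuring are needed instead). A mechanical rewriter plus the parameter-binding form of spark.sql, as a sketch:

```python
import re

# Sketch: rewrite the deprecated ${param} markers to named parameter markers.
# This is only the mechanical half; queries where the parameter sits inside a
# path or identifier literal also need restructuring, not just renaming.

def to_named_params(sql: str) -> str:
    return re.sub(r"\$\{(\w+)\}", r":\1", sql)

# spark.sql supports binding the values directly:
#   spark.sql("SELECT * FROM my_table WHERE id = :id", args={"id": 42})
```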
Johannes_E
by New Contributor III
  • 1366 Views
  • 2 replies
  • 1 kudos

Resolved! Job cluster has no permission to create folder in Unity Catalog Volume

Hello everybody, I want to run a job that collects some CSV files from an SFTP server and saves them on my Unity Catalog volume. While my personal cluster, defined as follows, has access to create folders on the volume, my job cluster doesn't. Defi...

Latest Reply
Johannes_E
New Contributor III
  • 1 kudos

Thank you, that helped, although I had to use "SINGLE_USER" instead of "DATA_SECURITY_MODE_DEDICATED". According to the docs (https://docs.databricks.com/api/workspace/clusters/create), "SINGLE_USER" is an alias for "DATA_SECURITY_MODE_DEDICATED".

1 More Replies
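For anyone landing here, a sketch of the relevant cluster-spec fragment: dedicated (single-user) access mode is what gives the job cluster a Unity Catalog identity that can be granted WRITE VOLUME on the target volume. Everything except data_security_mode and single_user_name is a placeholder:

```python
# Sketch of the "new_cluster" fragment of a job definition. Runtime, node
# type, and the user are placeholders.

new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",        # alias of DATA_SECURITY_MODE_DEDICATED
    "single_user_name": "someone@example.com",  # user or service principal running the job
}
```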
Aviral-Bhardwaj
by Esteemed Contributor III
  • 1892 Views
  • 4 replies
  • 2 kudos

Resolved! Not able to read data from volume and data is in JSON format

Not able to read data from a volume; the data is in JSON format.
data = spark.read.json("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json")
display(data)
Py4JJavaError: An error occurred while...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @Aviral-Bhardwaj, please double check:
- if the volume path is correct
- if you have READ VOLUME permission on this volume
- if your cluster has access to Unity Catalog
- if the JSON file exists

3 More Replies
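The first and last items of radothede's checklist can be partly automated, since volumes are FUSE-mounted under /Volumes on UC-enabled clusters; a sketch:

```python
import os

# Sketch: a plain existence check catches path typos before spark.read.json
# even runs, and reports how deep the path is valid. The path in the comment
# is the one from the post.

def explain_missing(path: str) -> str:
    if os.path.exists(path):
        return "path exists; check READ VOLUME grant and cluster UC access"
    parent = os.path.dirname(path)
    while parent and parent != "/" and not os.path.exists(parent):
        parent = os.path.dirname(parent)
    return f"path missing; deepest existing ancestor: {parent or '/'}"

# print(explain_missing("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json"))
```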
Pratikmsbsvm
by Contributor
  • 774 Views
  • 1 reply
  • 0 kudos

Resolved! Low Level Design for Moving Data from Databricks A to Databricks B

Hello Techie, could someone please help me with the low-level design points we should consider while moving data from one Delta Lake instance to another? For example: service principal creation, IP whitelisting, any GitLab/DevOps relate...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Pratikmsbsvm Here's a brief low-level design checklist for Delta Lake to Delta Lake data migration:
1. Security & Authentication
- Create service principals for both environments
- Set up Azure Key Vault for credential management
- Configure IP white...

root92
by New Contributor
  • 1092 Views
  • 1 reply
  • 0 kudos

finishes execution in 6 seconds but the notebook still shows "waiting"

Issue: Although my SQL query completes execution in approximately 2-3 seconds, the notebook interface continues to show "waiting" for an extended period before displaying results. The only way to see the results of my cell execution is by refreshing the w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @root92, this is a known Databricks interface issue, not related to query performance or account type. Most likely causes:
- WebSocket connection timeout between browser and Databricks
- Browser memory issues with long-running notebook sessions
- Network pro...

noorbasha534
by Valued Contributor II
  • 1080 Views
  • 5 replies
  • 0 kudos

in-house built predictive optimization

Hello all, has anyone attempted to look at the internals of predictive optimization and built an in-house solution mimicking its functionality? I understood that there are no plans from Databricks to roll out this feature for external tables, and hence,...

Data Engineering
Delta Lake
Liquid clustering
predictive optimization
spark
Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@LinlinH thanks for the details. Can you please share any GitHub link where the community work is published, so I can verify whether any code can be re-used...

4 More Replies
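For anyone sketching such an in-house job, the core of it is a heuristic over file statistics plus a driver loop over candidate tables. The thresholds below are illustrative and not what the managed feature uses:

```python
# Sketch: decide which tables are worth an OPTIMIZE pass based on file-level
# stats (e.g. from DESCRIBE DETAIL). Thresholds are illustrative defaults.

def needs_optimize(num_files: int, size_in_bytes: int,
                   target_file_mb: int = 128, min_files: int = 10) -> bool:
    if num_files < min_files:
        return False
    avg_mb = size_in_bytes / num_files / (1024 * 1024)
    return avg_mb < target_file_mb / 2   # many small files -> compaction pays off

# Driver loop (hypothetical table_list):
#   for t in table_list:
#       d = spark.sql(f"DESCRIBE DETAIL {t}").first()
#       if needs_optimize(d.numFiles, d.sizeInBytes):
#           spark.sql(f"OPTIMIZE {t}")
```

A real replacement would also need scheduling, VACUUM, and some cost accounting, which is where most of the effort in the managed feature sits.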
William_Scardua
by Valued Contributor
  • 2506 Views
  • 2 replies
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

Latest Reply
alsetr
Databricks Partner
  • 0 kudos

Hi @William_Scardua , were you able to collect the job metrics?

1 More Replies
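One UC-free option is the Jobs API run history. A sketch in which the aggregation is plain Python over the runs payload, and the HTTP call (workspace host and token assumed) is left as comments:

```python
# Sketch: aggregate job-run history pulled from the Jobs API. Field names
# follow the runs payload (run_name, run_duration in ms, state.result_state);
# verify against your API version.

def summarize_runs(runs):
    """runs: list of Jobs API run dicts -> per-job count/duration/failures."""
    out = {}
    for r in runs:
        name = r.get("run_name", "unknown")
        s = out.setdefault(name, {"count": 0, "total_ms": 0, "failures": 0})
        s["count"] += 1
        s["total_ms"] += r.get("run_duration", 0)
        if r.get("state", {}).get("result_state") == "FAILED":
            s["failures"] += 1
    return out

# import requests
# resp = requests.get(f"{HOST}/api/2.1/jobs/runs/list",        # HOST/TOKEN assumed
#                     headers={"Authorization": f"Bearer {TOKEN}"},
#                     params={"completed_only": "true"})
# print(summarize_runs(resp.json().get("runs", [])))
```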