Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

shan-databricks
by Databricks Partner
  • 3613 Views
  • 9 replies
  • 4 kudos

Resolved! Databricks Autoloader BadRecords path Issue

I have one file that has 100 rows, of which two rows are bad data and the remaining 98 rows are good data. But when I use the bad records path, it completely moves the file to the bad records path, which has good data as well, and it should move ...

Latest Reply
ShaileshBobay
Databricks Employee
  • 4 kudos

Why entire files go to badRecordsPath: when you enable badRecordsPath in Auto Loader or in Spark's file readers (with formats like CSV/JSON), here's what happens: Spark expects each data file to be internally well-formed with respect to the declared s...

8 More Replies
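A minimal pure-Python sketch of the row-level alternative usually suggested in threads like this: rather than badRecordsPath, which quarantines the whole source file, PERMISSIVE-style parsing keeps well-formed rows and sets malformed ones aside individually. The `split_good_and_bad` helper and the 3-column schema are illustrative, not Spark's API:

```python
import csv
import io

EXPECTED_COLUMNS = 3  # assumed schema width for this toy example

def split_good_and_bad(raw_csv: str, expected_columns: int = EXPECTED_COLUMNS):
    """Return (good_rows, bad_rows): rows matching the declared width are
    kept, malformed rows are quarantined individually (per-row, not per-file)."""
    good, bad = [], []
    for row in csv.reader(io.StringIO(raw_csv)):
        if len(row) == expected_columns:
            good.append(row)
        else:
            bad.append(row)
    return good, bad

sample = "1,alice,10\n2,bob\n3,carol,30\n4,dave,40,extra\n5,erin,50\n"
good, bad = split_good_and_bad(sample)
print(len(good), len(bad))  # 3 good rows kept, 2 bad rows quarantined
```

In Spark terms this corresponds to PERMISSIVE mode with a `_corrupt_record` column, which keeps the 98 good rows in the result instead of moving the whole file.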
yit
by Databricks Partner
  • 5241 Views
  • 8 replies
  • 4 kudos

Resolved! Schema evolution for JSON files with AutoLoader

I am using Auto Loader to ingest JSON files into a managed table. Auto Loader saves only the first-level fields as new columns, while nested structs are stored as values within those columns. My goal is to support schema evolution when loading new fi...

Latest Reply
BS_THE_ANALYST
Databricks Partner
  • 4 kudos

@yit awesome. Glad that you got this solved. I look forward to the next problem. All the best, BS

7 More Replies
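What schema evolution has to do for nested JSON can be sketched in plain Python. This dict-based `infer_schema`/`merge_schema` pair is a hypothetical stand-in for what Auto Loader handles internally via options like `cloudFiles.schemaEvolutionMode`, not its actual implementation:

```python
def infer_schema(value):
    """Represent a JSON value's shape: dicts recurse, leaves keep the type name."""
    if isinstance(value, dict):
        return {k: infer_schema(v) for k, v in value.items()}
    return type(value).__name__

def merge_schema(current, incoming):
    """Union two inferred schemas, recursing into nested structs so that
    new nested fields are picked up, not just first-level columns."""
    if isinstance(current, dict) and isinstance(incoming, dict):
        merged = dict(current)
        for key, sub in incoming.items():
            merged[key] = merge_schema(merged[key], sub) if key in merged else sub
        return merged
    return current  # on a leaf type conflict, keep the existing type (simplification)

schema = infer_schema({"id": 1, "user": {"name": "a"}})
schema = merge_schema(schema, infer_schema({"id": 2, "user": {"name": "b", "age": 3}}))
print(schema)  # the nested new field user.age is picked up
```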
ZD
by New Contributor III
  • 2388 Views
  • 5 replies
  • 0 kudos

How to replace ${param} by :param

Hello, we previously used ${param} in our SQL queries: SELECT * FROM json.`${source_path}/file.json` However, this syntax is now deprecated. The recommended approach is to use :param instead. But when I attempt to replace ${param} with :param, I encounte...

Latest Reply
radothede
Valued Contributor II
  • 0 kudos

Hi @ZD Please try this syntax in your notebook for SQL:
%sql
declare _my_path = 'some_path';
select _my_path;

4 More Replies
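A toy illustration of the :param marker style. On Databricks the engine binds named markers itself (for example `spark.sql(query, args={...})` or widget/job parameters); this naive `bind_params` helper exists only to show the substitution idea and is not safe for real SQL:

```python
import re

def bind_params(query: str, params: dict) -> str:
    """Replace :name markers with single-quoted values (toy version:
    quotes are doubled for escaping, nothing else is validated)."""
    def sub(match):
        value = str(params[match.group(1)])
        return "'" + value.replace("'", "''") + "'"
    return re.sub(r":(\w+)", sub, query)

query = "SELECT * FROM read_files(:source_path)"
print(bind_params(query, {"source_path": "/Volumes/demo/raw"}))
# SELECT * FROM read_files('/Volumes/demo/raw')
```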
Johannes_E
by New Contributor III
  • 1350 Views
  • 2 replies
  • 1 kudos

Resolved! Job cluster has no permission to create folder in Unity Catalog Volume

Hello everybody, I want to run a job that collects some CSV files from an SFTP server and saves them to my Unity Catalog volume. While my personal cluster, defined as follows, has access to create folders on the volume, my job cluster doesn't. Defi...

Latest Reply
Johannes_E
New Contributor III
  • 1 kudos

Thank you, that helped although I had to use "SINGLE_USER" instead of "DATA_SECURITY_MODE_DEDICATED". According to the docs (https://docs.databricks.com/api/workspace/clusters/create) "SINGLE_USER" is an alias for "DATA_SECURITY_MODE_DEDICATED".

1 More Replies
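A sketch of the job-cluster spec implied by this thread's resolution, assuming the Clusters API field names; the runtime version, node type, and user principal below are placeholders, not values from the thread:

```python
import json

# data_security_mode "SINGLE_USER" (documented as an alias for the dedicated
# access mode) is what lets the job cluster access Unity Catalog volumes.
job_cluster_spec = {
    "spark_version": "15.4.x-scala2.12",          # placeholder runtime
    "node_type_id": "Standard_D4ds_v5",           # placeholder node type
    "num_workers": 1,
    "data_security_mode": "SINGLE_USER",          # required for UC volume access
    "single_user_name": "some.user@example.com",  # hypothetical principal
}
print(json.dumps(job_cluster_spec, indent=2))
```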
Aviral-Bhardwaj
by Esteemed Contributor III
  • 1878 Views
  • 4 replies
  • 2 kudos

Resolved! Not able to read data from volume and data is in JSON format

Not able to read data from a volume; the data is in JSON format. data = spark.read.json("/Volumes/mydatabricksaviral/datatesting/datavolume/mytest.json") display(data) Py4JJavaError: An error occurred while...

Latest Reply
radothede
Valued Contributor II
  • 2 kudos

Hi @Aviral-Bhardwaj, please double check:
- if the volume path is correct
- if you have READ VOLUME permission on this volume
- if your cluster has access to Unity Catalog
- if the JSON file exists

3 More Replies
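Part of that checklist can be automated before pointing `spark.read.json` at a volume path. A hedged sketch, assuming the file is newline-delimited JSON and the `/Volumes/...` path is visible as an ordinary file path (as it is from Databricks Python); the helper name is my own:

```python
import json
import os
import tempfile

def preflight_json(path: str) -> bool:
    """Return True if `path` exists and every non-empty line parses as JSON
    (a JSON-lines check; a single pretty-printed document would need json.load)."""
    if not os.path.isfile(path):
        return False
    with open(path, "r", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                try:
                    json.loads(line)
                except json.JSONDecodeError:
                    return False
    return True

# Demo against a temporary file standing in for /Volumes/.../mytest.json
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as fh:
    fh.write('{"a": 1}\n{"a": 2}\n')
    demo_path = fh.name
print(preflight_json(demo_path))  # True
```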
Pratikmsbsvm
by Contributor
  • 770 Views
  • 1 replies
  • 0 kudos

Resolved! Low Level Design for Moving Data from Databricks A to Databricks B

Hello Techie, may someone please help me with the low-level design points we should consider while moving data from one Delta Lake instance to another? For example: service principal creation, IP whitelisting, any GitLab/DevOps relate...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Pratikmsbsvm Here's a brief low-level design checklist for Delta Lake to Delta Lake data migration:
1. Security & Authentication
- Create service principals for both environments
- Set up Azure Key Vault for credential management
- Configure IP white...

root92
by New Contributor
  • 1091 Views
  • 1 replies
  • 0 kudos

finishes execution in 6 seconds but the notebook still shows "waiting"

Issue: Although my SQL query completes execution in approximately 2-3 seconds, the notebook interface continues to show "waiting" for an extended period before displaying results. The only way to see the results of my cell execution is by refreshing the w...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @root92 This is a known Databricks interface issue, not related to query performance or account type. Most likely causes:
- WebSocket connection timeout between the browser and Databricks
- Browser memory issues with long-running notebook sessions
- Network pro...

noorbasha534
by Valued Contributor II
  • 1064 Views
  • 5 replies
  • 0 kudos

In-house built predictive optimization

Hello all, has anyone attempted to look at the internals of predictive optimization and built an in-house solution mimicking its functionality? I understood that there are no plans from Databricks to roll out this feature for external tables, and hence,...

Data Engineering
Delta Lake
Liquid clustering
predictive optimization
spark
Latest Reply
noorbasha534
Valued Contributor II
  • 0 kudos

@LinlinH thanks for the details. Can you please share any GitHub link where the community work is published, so I can verify whether any code can be re-used...

4 More Replies
William_Scardua
by Valued Contributor
  • 2494 Views
  • 2 replies
  • 0 kudos

Collecting Job Usage Metrics Without Unity Catalog

Hi, I would like to request assistance on how to collect usage metrics and job execution data for my Databricks environment. We are currently not using Unity Catalog, but I would still like to monitor and analyze usage. Could you please provide guidance...

Latest Reply
alsetr
Databricks Partner
  • 0 kudos

Hi @William_Scardua , were you able to collect the job metrics?

1 More Replies
Oliver_Angelil
by Valued Contributor II
  • 3844 Views
  • 2 replies
  • 0 kudos

Append-only table from non-streaming source in Delta Live Tables

I have a DLT pipeline where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However, on the seco...

Latest Reply
nkarwa
New Contributor II
  • 0 kudos

@Oliver_Angelil - I was wondering if you found a solution? I have a similar use case. I want to create an archive table using DLT from a non-streaming source (MV). I would prefer a DLT solution. I was able to get it to work using a traditional merge approach (no...

1 More Replies
simonB2025
by New Contributor III
  • 2476 Views
  • 3 replies
  • 1 kudos

Resolved! Deploying Data Assets Bundle with VSCode Add-in

Deploying a bundle containing a pipeline that references a DLT notebook. In the YAML I am passing the relative path to the notebook from the repository root (where the YAML lives). Deploying says 'success', but when validating, the pipeline cannot find ...

Latest Reply
ilir_nuredini
Honored Contributor
  • 1 kudos

Hello @simonB2025 Could you share a snippet of your folder/project structure and where the notebook resides, so we can suggest to you the exact solution? Thank you! Best, Ilir

2 More Replies
Miloud_G
by New Contributor III
  • 2873 Views
  • 1 replies
  • 1 kudos

Resolved! connecting to unity catalog with power BI

I created a group of Power BI users and granted consumer access permission to this group on Unity Catalog. I started a shared compute cluster (Standard_D4ds_v5). When trying to connect from Power BI, I can connect to tables only if I grant access to wor...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @Miloud_G Instead of granting full workspace access, you can grant the "Databricks SQL access" entitlement, which provides limited workspace access specifically for BI tools. Go to Workspace Settings → Identity and Access → Groups/Users/SPs → Manage → Se...

smoortema
by Contributor
  • 1913 Views
  • 1 replies
  • 1 kudos

Resolved! Complex pipeline with many tasks and dependencies: orchestration in Jobs or in Notebook?

We need to set up a job that consists of several hundred tasks with many dependencies between them. We are considering two different directions: 1. Databricks job with tasks, with dependencies defined as code and deployed with Databricks Asset B...

Latest Reply
radothede
Valued Contributor II
  • 1 kudos

Hi @smoortema To my best knowledge: Option 1) You can create jobs that contain up to 1,000 tasks; however, it is recommended to split tasks into logical subgroups. Jobs with more than 100 tasks require API 2.2 and above. Jobs with a large number of tasks ...

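The subgrouping advice above can be sketched as a simple chunking helper. The names are illustrative, not a Databricks API; the limit of 100 mirrors the per-job task threshold mentioned in the reply:

```python
def chunk_tasks(task_names, max_tasks_per_job=100):
    """Split a flat task list into sub-job groups of at most max_tasks_per_job,
    so a several-hundred-task pipeline becomes a few logical sub-jobs."""
    return [
        task_names[i : i + max_tasks_per_job]
        for i in range(0, len(task_names), max_tasks_per_job)
    ]

tasks = [f"task_{i}" for i in range(350)]
groups = chunk_tasks(tasks)
print([len(g) for g in groups])  # [100, 100, 100, 50]
```

In a real asset bundle, each group would become its own job (or a job-level task triggering a sub-job), with dependencies expressed between groups rather than between all individual tasks.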
Locomo_Dncr
by New Contributor
  • 1836 Views
  • 4 replies
  • 0 kudos

Time Travel vs Bronze historical archive

Hello, I am working on building a pipeline using the Medallion architecture. The source tables in this pipeline are overwritten each time the table is updated. In the bronze ingestion layer, I plan to append this new table to the current bronze table, addi...

Data Engineering
Medallion
Processing Time
Storage Costs
Time Travel
Latest Reply
MariuszK
Valued Contributor III
  • 0 kudos

Hi @Locomo_Dncr Time travel isn't recommended for storing historical data; it's for backup and audit purposes. You can store snapshot data or use SCD2 to keep history. "Databricks does not recommend using Delta Lake table history as a long-term backup solutio...

3 More Replies
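The SCD2 alternative the reply recommends can be sketched with plain Python dicts: one row per version, with valid_from/valid_to/is_current columns. In practice this would be a MERGE on the Delta table; the function name and column names here are illustrative:

```python
from datetime import date

def scd2_upsert(history, key, new_value, as_of):
    """Close the current row for `key` if its value changed, then append
    the new version as the current row (classic SCD Type 2 behaviour)."""
    for row in history:
        if row["key"] == key and row["is_current"]:
            if row["value"] == new_value:
                return history  # no change, keep the existing current row
            row["is_current"] = False
            row["valid_to"] = as_of
    history.append({"key": key, "value": new_value, "valid_from": as_of,
                    "valid_to": None, "is_current": True})
    return history

hist = []
scd2_upsert(hist, "cust_1", {"city": "Oslo"}, date(2024, 1, 1))
scd2_upsert(hist, "cust_1", {"city": "Bergen"}, date(2024, 6, 1))
print(len(hist))  # two versions retained: the old one closed, the new one current
```

Unlike time travel, this keeps history as ordinary queryable rows, so it survives VACUUM and retention settings.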
jar
by Contributor
  • 1755 Views
  • 6 replies
  • 0 kudos

Understanding infra costs of Databricks compute

Hi. I can see in our Azure cost analysis tool that a not-insignificant part of our costs comes from the managed Databricks RG deployed with the workspace, and that it relates particularly to VMs (so compute, I assume?) and storage, the latter of which, tho...

Latest Reply
jar
Contributor
  • 0 kudos

Thank you all for your replies. The issue is not getting an overview of costs - I already have that from the Cost Management Export function in Azure, and by using the system.billing tables in Databricks. The issue is understanding the relation betwe...

5 More Replies