Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

naman0012
by New Contributor
  • 754 Views
  • 5 replies
  • 1 kudos

Ingesting data from views

Hi all, I have been looking to create gold tables from views, and am also considering streaming and change data capture features. I know this is not possible in DLT workflows, so I was wondering if there is any other way to do the same please...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @naman0012, There are a few different approaches depending on where these "views" live and what your exact architecture looks like. Let me walk through the options. CLARIFYING THE SCENARIO The key question is: are these database views on an extern...

4 More Replies
souravroy1990
by New Contributor II
  • 381 Views
  • 3 replies
  • 0 kudos

Tags Field Doesn't propagate in Delta Share

Hi, my current work requires adding tags to Databricks tables & views. I see that there are tags associated with tables and views, which we can set using the SET TAGS command. My requirement is that once we are creating delta shares out of...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @souravroy1990, You are correct that Unity Catalog tags (set via ALTER TABLE SET TAGS or ALTER TABLE ALTER COLUMN SET TAGS) are not propagated to recipients through Delta Sharing. Tags are treated as Unity Catalog governance metadata that lives in...
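For reference, a minimal sketch of the tag syntax discussed above, built as a plain string; the table and tag names are made up for illustration, and as the reply notes, these tags remain Unity Catalog governance metadata and are not shared with Delta Sharing recipients:

```python
def set_tags_sql(table: str, tags: dict) -> str:
    """Build an ALTER TABLE ... SET TAGS statement (Unity Catalog syntax).

    The statement is returned as a string so it can be inspected or run
    via spark.sql(); tag keys/values here are hypothetical examples.
    """
    pairs = ", ".join(f"'{k}' = '{v}'" for k, v in tags.items())
    return f"ALTER TABLE {table} SET TAGS ({pairs})"

# Illustrative usage -- catalog/schema/table names are placeholders
stmt = set_tags_sql("catalog.schema.orders", {"classification": "internal"})
```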

2 More Replies
tdata
by New Contributor
  • 389 Views
  • 2 replies
  • 0 kudos

Databricks write to managed Iceberg table with PyIceberg

Hello, I'm trying to write to a Databricks managed Iceberg table using PyIceberg inside a spark_python_task (serverless compute). I'm facing an error when writing: Error writing to Iceberg table: When reading information for key '' in bucket '': AWS Err...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @tdata, The approach you are using (PyIceberg with the Unity Catalog Iceberg REST catalog) does support writes to managed Iceberg tables, so the pattern is correct. The SSL error you are seeing is almost certainly caused by running this from withi...

1 More Replies
ajay_wavicle
by Databricks Partner
  • 319 Views
  • 2 replies
  • 1 kudos

Best place to manage terraform-provider-databricks and databricks cli

I am trying to export and import files using terraform-provider-databricks and the Databricks CLI. I am figuring out how to manage the files without running locally. What's the best practice to set up such a migration? Can anyone help in establishe...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ajay_wavicle, There are a couple of well-established patterns for managing Databricks resources with the Terraform provider and CLI without running everything locally. Here is a breakdown of the options and recommended approach. WHERE TO RUN TERR...

1 More Replies
ajay_wavicle
by Databricks Partner
  • 371 Views
  • 3 replies
  • 1 kudos

How to extract read paths from notebooks, especially from Auto Loader

I am trying to figure out how to extract source paths from read statements or Auto Loader paths. I need to know my source locations across a lot of notebooks. How can I extract this from Databricks? Can the Databricks SDK do this?

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ajay_wavicle, There are several approaches you can use to extract read/source paths from notebooks, including Auto Loader (cloudFiles) paths. The right choice depends on whether you want runtime lineage data or static code analysis. APPROACH 1: U...
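The static-code-analysis approach mentioned in the reply can be sketched as a simple scan over exported notebook source. The regex and sample notebook below are illustrative only: this catches literal `.load("...")` paths (including Auto Loader reads) but will miss any path built from variables:

```python
import re

# Matches .load("...") / .load('...') with a string literal argument
LOAD_RE = re.compile(r"""\.load\(\s*["']([^"']+)["']\s*\)""")

def extract_read_paths(source: str) -> list:
    """Static scan of exported notebook source for literal read paths.

    Only string literals are detected; dynamically constructed paths
    (f-strings, variables, widgets) require runtime lineage instead.
    """
    return LOAD_RE.findall(source)

# Hypothetical exported notebook source for demonstration
notebook = '''
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .load("s3://raw-bucket/events/"))
'''
paths = extract_read_paths(notebook)
```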

2 More Replies
Raghav1
by New Contributor II
  • 340 Views
  • 2 replies
  • 0 kudos

Updating DLT pipeline's cluster policy using Databricks CLI

Subject: Azure Databricks Pipeline Not Applying Cluster Policy and Failing with VM Quota Error. Summary of issue: I am attempting to create and run a pipeline in Azure Databricks using a custom cluster policy to restrict compute resources due to ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Raghav1, There are a few things happening here, so let me walk through each one and give you a path forward. UNDERSTANDING THE GATEWAY_DEFINITION ERROR The error "Modifying following parameter gateway_definition in pipeline settings is not allowe...

1 More Replies
mohit7
by New Contributor
  • 573 Views
  • 3 replies
  • 0 kudos

How to disable Delta Lake log compaction (.compacted.json files)

Hi, how can we disable the automatic creation of .compacted.json files in Delta Lake _delta_log folders? Recently, some of our log files less than 30 days old were removed due to this compaction, affecting our ability to time travel. We found spark.data...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @mohit7, There are two separate mechanisms at play here, and it helps to distinguish between them because log compaction itself does not remove your original commit JSON files. WHAT LOG COMPACTION DOES Log compaction creates additional .compacted....
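To keep commit JSON files around longer (and preserve time travel), the relevant knob is the `delta.logRetentionDuration` table property, which defaults to 30 days. A small sketch that builds the statement as a string; the table name and 90-day value are illustrative, not recommendations:

```python
def set_log_retention_sql(table: str, days: int) -> str:
    """Build an ALTER TABLE statement extending Delta log retention.

    delta.logRetentionDuration controls how long _delta_log commit files
    are kept before log cleanup removes them (default: interval 30 days).
    Run the returned string with spark.sql() on a cluster.
    """
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.logRetentionDuration' = 'interval {days} days')"
    )

# Hypothetical table name, extended to 90 days of log history
stmt = set_log_retention_sql("main.sales.orders", 90)
```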

2 More Replies
LSIMS
by Databricks Partner
  • 487 Views
  • 3 replies
  • 1 kudos

Resolved! Databricks Performance Extracting Data from a Wide Table in Oracle

Hi everyone. I have a very wide table (600 columns) on an Oracle database that I need to extract to my data lake. The table has approximately 700K rows. I am currently trying to extract and load this information on my data lake, but I am struggling to m...

Latest Reply
LSIMS
Databricks Partner
  • 1 kudos

Hi Steve, I really appreciate your amazingly detailed post! These are the contributions that really shine through. I can confirm that I managed to resolve the situation in the meantime, and it's totally aligned with what you mentioned. I had reduced the numb...
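For anyone landing here with the same problem: the usual lever for wide Oracle tables is a partitioned JDBC read. The sketch below approximates how Spark splits `lowerBound`/`upperBound` into per-partition ranges; the column name, bounds, and option values are illustrative, not tuned recommendations:

```python
def jdbc_partition_ranges(lower: int, upper: int, num_partitions: int):
    """Approximate the per-partition ranges Spark generates for a
    partitioned JDBC read (partitionColumn/lowerBound/upperBound).

    Simplified sketch: contiguous integer strides, with the last
    partition absorbing any remainder.
    """
    stride = (upper - lower) // num_partitions
    ranges, start = [], lower
    for i in range(num_partitions):
        end = upper if i == num_partitions - 1 else start + stride
        ranges.append((start, end))
        start = end
    return ranges

# e.g. ~700K rows keyed 1..700000, split across 8 parallel readers
ranges = jdbc_partition_ranges(1, 700_000, 8)

# Illustrative spark.read.format("jdbc") options (real option names,
# hypothetical values); fetchsize reduces round trips to Oracle
options = {
    "partitionColumn": "ID",
    "lowerBound": "1",
    "upperBound": "700000",
    "numPartitions": "8",
    "fetchsize": "10000",
}
```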

2 More Replies
maddy08
by New Contributor II
  • 469 Views
  • 2 replies
  • 0 kudos

Resolved! File Arrival Trigger - Multiple tables

I have 100+ tables. With CDC, I'm getting files on a GCS bucket every 15 minutes, or at some random time based on source changes. I have enabled file arrival triggers for each table. Is this a good approach, or should I consolidate the tables into one job with one trigger?

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

@maddy08 Managing 100+ CDC tables with file arrival triggers is a common architecture decision, and there are tradeoffs either way. Here is a breakdown to help you decide. UNDERSTANDING THE LIMITS The most important thing to know is that without file...
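For the consolidation option, a single job can watch a parent prefix with one file arrival trigger. A sketch of the Jobs API trigger settings as a plain dict; the field names follow the Databricks Jobs API pattern but should be verified against your workspace, and the bucket path is made up:

```python
def file_arrival_trigger(url: str, min_seconds: int = 60) -> dict:
    """Jobs API trigger settings for a file arrival trigger.

    One trigger on a parent prefix (instead of 100+ per-table triggers)
    fires the job whenever any table's CDC files land underneath it.
    Field names assumed from the Jobs API; confirm before use.
    """
    return {
        "trigger": {
            "file_arrival": {
                "url": url,
                "min_time_between_triggers_seconds": min_seconds,
            },
            "pause_status": "UNPAUSED",
        }
    }

# Hypothetical GCS landing prefix covering all CDC tables
settings = file_arrival_trigger("gs://cdc-bucket/landing/")
```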

1 More Replies
NW1000
by New Contributor III
  • 605 Views
  • 3 replies
  • 0 kudos

Drop table not working consistently

During development, I drop the table table_frq manually in a SQL query. Then I run a Python notebook using serverless compute, using spark.catalog.tableExists(table_frq) as the condition. Last week, after dropping the table, spark.catalog.tableExists(table_frq) showe...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

@NW1000 Thanks for the thorough description. I can understand the concern about inconsistent behavior here. The previous reply from nayan_wylde is on the right track -- this is almost certainly a metadata caching issue, not data corruption. Let me gi...

2 More Replies
rahul_goyal
by New Contributor
  • 282 Views
  • 2 replies
  • 0 kudos

Snowflake connection federation Databricks

Today I tried to connect a Snowflake federated connection using Databricks; however, it seems there is a defect with Databricks. It is trying to create the JDBC URL jdbc://dyjcxbz-gfc45507.snowflakecomputing.com:443/ however the Snowflake JDBC URL is jdbc:snowflake:/...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

@rahul_goyal This is a common point of confusion when setting up Snowflake federation in Databricks, so let me clarify what is happening. SHORT ANSWER The JDBC URL display showing "jdbc://" instead of "jdbc//" is expected behavior and is not a defect...
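For comparison, the canonical Snowflake JDBC URL shape the post is pointing at can be sketched as a simple string builder; the account identifier is the one from the post, and the rest is the standard `jdbc:snowflake://` scheme:

```python
def snowflake_jdbc_url(account: str) -> str:
    """Build a Snowflake JDBC URL in the documented format:
    jdbc:snowflake://<account>.snowflakecomputing.com:443/
    """
    return f"jdbc:snowflake://{account}.snowflakecomputing.com:443/"

# Account identifier taken from the post above
url = snowflake_jdbc_url("dyjcxbz-gfc45507")
```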

1 More Replies
Sunny_singh
by New Contributor
  • 835 Views
  • 3 replies
  • 0 kudos

Best cluster configuration to process 100GB of data?

Hi everyone, I'm new to Data Engineering and often get this interview question: "What's the best cluster configuration to process 100GB of data in Databricks?" How should we answer this from a Databricks perspective in two cases: complex transformations ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Sunny_singh, Welcome to Data Engineering! This is a great question and one that comes up frequently. There is no single "correct" answer because real-world sizing depends on data format, cloud provider, data skew, and more, but here is a solid fr...
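One way to make the interview answer concrete is simple partition arithmetic: target roughly 128 MB per partition, then size cores from the partition count. The sketch below is a back-of-the-envelope heuristic (the constants are common rules of thumb, not a Databricks-endorsed formula):

```python
def sizing_estimate(data_gb: int, partition_mb: int = 128,
                    tasks_per_core: int = 3, cores_per_worker: int = 4) -> dict:
    """Back-of-the-envelope cluster sizing heuristic.

    Target ~128 MB partitions, then allow a few waves of tasks per
    core. Real-world sizing also depends on format, skew, and the
    transformations involved, as the reply above notes.
    """
    partitions = (data_gb * 1024) // partition_mb   # 100 GB -> ~800 partitions
    cores = max(1, partitions // tasks_per_core)    # each core runs a few tasks
    workers = max(1, -(-cores // cores_per_worker)) # ceiling division
    return {"partitions": partitions, "cores": cores, "workers": workers}

est = sizing_estimate(100)
```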

2 More Replies
MarkV
by New Contributor III
  • 539 Views
  • 2 replies
  • 0 kudos

How do I get at the schedules of scheduled queries?

I have a lot of scheduled queries in a non-UC workspace. They were scheduled a while ago and do not show up as jobs. I'd like to migrate these queries and their schedules to a UC-enabled workspace. I'm not able to export them from the UI. I've tried u...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

@MarkV Appreciate you sharing the details. There are a few approaches depending on whether you want to use the REST API, the Python SDK, SQL, or the CLI. Here is a breakdown. BACKGROUND When you create a schedule for a query in the Databricks SQL edi...

1 More Replies
databricks_use2
by New Contributor II
  • 293 Views
  • 2 replies
  • 0 kudos

Moving data from commercial to gov cloud instance

I have to move data from a commercial to a gov cloud instance. All the tables are populated from a DLT pipeline using Auto Loader, with a medallion architecture for bronze, silver, and gold tables. Checkpoints are managed by the pipeline. Silver tables are imple...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

@databricks_use2 -- Happy to help with this one. Moving from commercial to GovCloud with a full SDP medallion architecture requires careful planning. Here is a step-by-step approach that covers both the data migration and the pipeline rebuild. THE SH...

1 More Replies
yashojha
by New Contributor III
  • 434 Views
  • 3 replies
  • 0 kudos

Slow writes to managed volume

Hi all, I am using managed volumes as intermediate storage to write a decrypted file before moving it to data lake storage. Strangely, the write operation is taking a lot of time (22 mins) to write a small file to volumes, and it takes only a few seconds...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @yashojha, Thanks for the detailed writeup. A 22-minute write for a small file to a managed volume is definitely not expected behavior, especially when it works quickly in your lower environment. Let me walk through what is likely happening and ho...

2 More Replies