Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by nidhin, New Contributor II
  • 69 Views
  • 1 replies
  • 0 kudos

SQL Warehouse stuck on "Cluster Start-up Delayed"

Hi everyone, I'm running into an issue with my Starter Warehouse on Databricks and would appreciate any help or pointers. Problem: My SQL Warehouse has been stuck in a Starting state with the following warning: Cluster Start-up Delayed. Please wait whil...

Latest Reply
rdokala
New Contributor III
  • 0 kudos

This typically points to delayed compute provisioning behind the SQL Warehouse, often due to temporary capacity/resource availability or a transient startup issue. A few things I would try: 1. Stop and restart the SQL Warehouse. If it has been stuck for ...
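The stop-and-restart step can also be scripted. This is a minimal sketch, not an official tool: it assumes the SQL Warehouses REST API endpoints POST /api/2.0/sql/warehouses/{id}/stop and /start, plus GET /api/2.0/sql/warehouses/{id} returning JSON with a `state` field, and a personal access token. The host, token, and warehouse ID are placeholders you supply, and the helper names are hypothetical.

```python
import json
import time
import urllib.request


def _api_url(host: str, warehouse_id: str, suffix: str = "") -> str:
    # host is your workspace URL, e.g. "https://<workspace-host>" (placeholder)
    return f"{host.rstrip('/')}/api/2.0/sql/warehouses/{warehouse_id}{suffix}"


def warehouse_action_request(host: str, token: str, warehouse_id: str, action: str) -> urllib.request.Request:
    """Build (without sending) a POST request that starts or stops a SQL warehouse."""
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        _api_url(host, warehouse_id, f"/{action}"),
        data=b"{}",
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )


def get_warehouse_state(host: str, token: str, warehouse_id: str) -> str:
    """Return the warehouse's current state string, e.g. "RUNNING" or "STOPPED"."""
    req = urllib.request.Request(_api_url(host, warehouse_id), headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["state"]


def restart_warehouse(host: str, token: str, warehouse_id: str,
                      poll_seconds: float = 10, max_polls: int = 60) -> None:
    """Stop the warehouse, wait until it reports STOPPED, then start it again."""
    with urllib.request.urlopen(warehouse_action_request(host, token, warehouse_id, "stop"), timeout=30):
        pass
    for _ in range(max_polls):
        if get_warehouse_state(host, token, warehouse_id) == "STOPPED":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError("warehouse did not reach STOPPED in time")
    with urllib.request.urlopen(warehouse_action_request(host, token, warehouse_id, "start"), timeout=30):
        pass
```

Waiting for STOPPED before issuing the start avoids racing the asynchronous stop.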

by A0s01gy, New Contributor
  • 84 Views
  • 1 replies
  • 0 kudos

From STTM to Databricks Pipelines: Can Metadata Become the Source Code of Data Engineering?

I’ve been exploring a metadata-driven approach to data engineering through a project called Data Engineering Copilot. The idea is to treat Source-to-Target Mapping (STTM) documents as structured metadata rather than static documentation. Instead of man...

Latest Reply
rdokala
New Contributor III
  • 0 kudos

This is a good discussion topic, but in my experience teams today use both metadata-driven STTMs and, mostly, traditional Excel-based ones. A few observations: How most teams manage STTM today: Level 1 (most common): STTM in Excel, Word, or Confluence. Engineers m...
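To make the "metadata as source code" idea concrete, here is a minimal sketch. It is not the Data Engineering Copilot's actual format: it assumes a hypothetical three-column STTM (source_column, target_column, and an optional transform template using {src}) and turns the rows into a Spark SQL SELECT statement.

```python
import csv
import io

# Hypothetical STTM export; an empty transform means "copy the column as-is"
STTM_CSV = """source_column,target_column,transform
cust_id,customer_id,
cust_nm,customer_name,TRIM({src})
signup_ts,signup_date,CAST({src} AS DATE)
"""


def sttm_to_select(sttm_csv: str, source_table: str) -> str:
    """Generate a SELECT statement from STTM rows."""
    exprs = []
    for row in csv.DictReader(io.StringIO(sttm_csv)):
        src = row["source_column"].strip()
        tgt = row["target_column"].strip()
        transform = (row.get("transform") or "").strip()
        # Substitute the source column into the transform template, if any
        expr = transform.format(src=src) if transform else src
        exprs.append(f"  {expr} AS {tgt}")
    if not exprs:
        raise ValueError("STTM contains no mappings")
    return "SELECT\n" + ",\n".join(exprs) + f"\nFROM {source_table}"


print(sttm_to_select(STTM_CSV, "raw.customers"))
```

Keeping the mapping as data means a reviewer edits the CSV, and the generated SQL stays in sync with it.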

by emorgoch, New Contributor II
  • 41 Views
  • 1 replies
  • 0 kudos

Managing IPYNB cell timestamps in source control

We're in the process of converting our Databricks notebooks from .py files to .ipynb. We have disabled storing notebook output in source control at the workspace level. However, what we're discovering is that every cell in our notebooks has 3 time...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @emorgoch, Thanks for raising this. This appears to be a regression rather than expected behaviour. Internally, the issue has been identified around .ipynb handling in Git folders, and the intended fix is to stop serialising these execution timest...
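Until the fix lands, one stopgap is to strip the timestamp keys before committing, for example from a pre-commit hook. The post does not show the key names, so the ones used here are hypothetical placeholders; inspect a committed .ipynb to find the real ones. This minimal sketch removes matching keys at any depth of each cell's metadata and writes the notebook back with the one-space indentation Jupyter normally uses.

```python
import json
from typing import Any


def _drop_keys(obj: Any, keys: set[str]) -> None:
    """Recursively delete any dict entry whose key is in `keys`."""
    if isinstance(obj, dict):
        for key in list(obj):
            if key in keys:
                del obj[key]
            else:
                _drop_keys(obj[key], keys)
    elif isinstance(obj, list):
        for item in obj:
            _drop_keys(item, keys)


def strip_cell_metadata(nb_text: str, keys: set[str]) -> str:
    """Return notebook JSON with the given keys removed from every cell's metadata."""
    nb = json.loads(nb_text)
    for cell in nb.get("cells", []):
        _drop_keys(cell.get("metadata", {}), keys)
    return json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
```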

by MVMZ, Visitor
  • 41 Views
  • 1 replies
  • 0 kudos

Table history time travel

I have noticed what seems to be unexpected behavior with the history of Unity Catalog managed tables and would like to understand whether this is expected. As a test, I created a table with two versions: Version 0, Version 1 (created approximately 200 ho...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @MVMZ, What you’re seeing is expected for Unity Catalog managed tables. The key detail is that for Unity Catalog managed tables, Databricks blocks time travel queries when the requested version is older than delta.deletedFileRetentionDuration, whi...
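The rule can be illustrated with a small worked check. This sketch assumes delta.deletedFileRetentionDuration is written in the "interval N unit" form (its default is "interval 1 week", i.e. 168 hours); it only mimics the comparison and is not Databricks' internal logic. With the default, a version created about 200 hours ago falls outside the window (200 h > 168 h). The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta

_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400, "week": 604800}
_INTERVAL_RE = re.compile(r"\s*interval\s+(\d+)\s+(second|minute|hour|day|week)s?\s*", re.IGNORECASE)


def parse_interval(spec: str) -> timedelta:
    """Parse a simple Delta interval string such as 'interval 7 days'."""
    m = _INTERVAL_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"unsupported interval: {spec!r}")
    return timedelta(seconds=int(m.group(1)) * _UNIT_SECONDS[m.group(2).lower()])


def time_travel_allowed(version_created: datetime, now: datetime,
                        retention: str = "interval 1 week") -> bool:
    """True if the version's age does not exceed the deleted-file retention window."""
    return now - version_created <= parse_interval(retention)
```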

by shan-databricks, Databricks Partner
  • 58 Views
  • 2 replies
  • 0 kudos

Issue with MongoDB Void/Null Type in Databricks

Facing an issue with the MongoDB Void/Null type in Databricks, which requires explicit casting or conversion to an array or struct of strings. Looking for guidance on how to handle this data type when reading from MongoDB and writing to a Delta table...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can handle VOID columns directly if you are on Databricks Runtime 18.2 or later for batch writes. More details here. Alternatively, you can explicitly cast or replace VOID/NULL columns with appropriate types when reading from MongoDB: df = df.withColumn("files_col",...

by bi_123, New Contributor III
  • 418 Views
  • 2 replies
  • 1 kudos

Resolved! PII tags in Spark Declarative Pipelines

I need to add PII tags at both the table and column levels for a streaming table created using Spark Declarative Pipelines. I tried applying Unity Catalog tags with the following code inside the SDP Python pipeline: spark.sql(f"""ALTER TABLE {table_nam...

Latest Reply
Databricks2005
New Contributor III
  • 1 kudos

Hi @amirabedhiafi: Is it not possible to pass the StructField to the schema and then pass it to dlt.createStreamingTable(name, schema)? I tried passing the description of the columns to it and that works. I am wondering why tags do not work.

by afisl, New Contributor II
  • 18410 Views
  • 9 replies
  • 5 kudos

Resolved! Apply unitycatalog tags programmatically

Hello, I'm interested in the "Tags" feature for columns/schemas/tables in Unity Catalog (described here: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/tags). I've been able to play with them by hand and would now lik...

Data Engineering
tags
unitycatalog
Latest Reply
Databricks2005
New Contributor III
  • 5 kudos

Hi, just re-opening this thread here. Is there a way to programmatically add tags to our streaming tables with SDP, using dlt.createStreamingTable(table, schema) where the schema is a struct whose fields carry metadata, comments, and tags? Has anyone tried that?

by JTBS, New Contributor
  • 104 Views
  • 1 replies
  • 1 kudos

Resolved! DatabricksConnect from Python/AKS environment calling Databricks Cluster: Spark Query Call Hangs

I have a Python 3.12 pod in AKS using DatabricksConnect 18.1.1, connecting to a Databricks 18.1 cluster. Everything works and normally I see no issues running a series of Spark queries. But once in a while, even without any load on the dedicated cluster we have, quer...

Latest Reply
balajij8
Contributor III
  • 1 kudos

The execution and result streaming generally happen over the gRPC route. You can force the gRPC route to send periodic frames so the connection looks active to the AKS network infrastructure. You can add the following variables into the AKS ...

by JTBS, New Contributor
  • 134 Views
  • 2 replies
  • 2 kudos

Resolved! Databricks Connect: Do I ever have to stop/clean up the Spark session when creating a new one per request?

I have an API that triggers Spark calculations. The API is hosted by a Python 3.12 pod in AKS and connects to a Databricks cluster using Databricks Connect 18.1.1. Initially I was using a getOrCreate call on my API requests and everything worked. But the problem is, as Spark sessi...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi there, short answer: you should call spark.stop() when you're done with each session. What you're doing now (not calling it) works, but it's not ideal; you're relying on the server-side idle timeout to clean up after you, and in the meantime e...
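One way to guarantee that stop() runs even when a request fails is a context manager built on try/finally. This is a minimal sketch: per_request_session is a hypothetical helper, and factory stands for whatever callable you already use to build a new Databricks Connect session per request. Any object with a stop() method works, which is how the fake session in the test exercises it.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol, TypeVar


class SupportsStop(Protocol):
    def stop(self) -> None: ...


S = TypeVar("S", bound=SupportsStop)


@contextmanager
def per_request_session(factory: Callable[[], S]) -> Iterator[S]:
    """Create a session for one request and always stop it afterwards."""
    session = factory()
    try:
        yield session
    finally:
        # Runs on success and on error, so cleanup does not depend on the idle timeout
        session.stop()


# Hypothetical usage inside a request handler:
# with per_request_session(make_session) as spark:
#     rows = spark.sql("SELECT 1").collect()
```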

by IM_01, Valued Contributor
  • 175 Views
  • 2 replies
  • 0 kudos

Can multiple questions be added to the same SQL query in a Genie space?

Hi, can we add multiple sample questions to one SQL query in the SQL query instructions so Genie learns to handle similar variations?

Latest Reply
IM_01
Valued Contributor
  • 0 kudos

Thanks for the response, Ashwin. I will add more SQL queries with different phrasing and test it.

by amirabedhiafi, Contributor
  • 550 Views
  • 4 replies
  • 5 kudos

Resolved! JSON file exists in volume but is not showing in UI

I have some JSON files in a specific volume. When I try to search for them they don't appear, but when I query the volume using Python I am able to get them and read their content. Any help?

Latest Reply
Vikram10
New Contributor
  • 5 kudos

Hi, the global workspace search won't return results for files stored in Unity Catalog Volumes. Its indexing is focused on workspace assets and catalog-managed objects, rather than the underlying files within a Volume. To locate files in a Volume, navi...

by RGSLCA, New Contributor II
  • 279 Views
  • 4 replies
  • 0 kudos

Sizing tables and Delta logs/CDF

Hi, I need to compare the sizes of my Delta tables. What's the correct approach? Is it the table size reported by the ANALYZE command? How do I check the Delta log size? If I enable CDF, how do I know the CDF log size (the overhead it adds)? Kind of l...

Latest Reply
Vikram10
New Contributor
  • 0 kudos

Hi @RGSLCA, DESCRIBE DETAIL is the best starting point if you're comparing Delta table sizes, but it's important to understand what it reports. The sizeInBytes value represents only the latest active snapshot of the table, not the total storage consum...

by CG29, New Contributor
  • 429 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks unable to list ADLS folder and files

Hi Databricks Community, I am able to list the container from my Databricks workspace but unable to list the folders and files further. If I try to access the same files and folder from the Databricks UI, external location path, I am able to see all fil...

Latest Reply
ashukasma
New Contributor II
  • 2 kudos

The following may be the causes: 1. Different authentication methods: the UI's external location uses Unity Catalog credentials, while your dbutils.fs.ls() command uses the compute's Spark configurations. These may be using different credentials with differe...
