Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

amirabedhiafi
by Contributor
  • 310 Views
  • 4 replies
  • 5 kudos

Resolved! JSON file exists in volume but does not show in UI

I have some JSON files in a specific volume. When I try to search for them, they don't appear, but when I query the volume using Python I am able to get them and read their content. Any help?

Latest Reply
Vikram10
New Contributor
  • 5 kudos

Hi, The global workspace search won't return results for files stored in Unity Catalog Volumes. Its indexing is focused on workspace assets and catalog-managed objects, rather than the underlying files within a Volume. To locate files in a Volume, navi...
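Because workspace search does not index Volume contents, one option is to search the Volume from code, which is also why the Python approach in the question works. A minimal sketch, assuming it runs on Databricks compute where Volumes are reachable through their POSIX-style path (`/Volumes/<catalog>/<schema>/<volume>`); the function name `find_files` and the example path are hypothetical:

```python
import os
from pathlib import Path


def find_files(root: str, suffix: str = ".json") -> list[str]:
    """Recursively collect files under `root` whose names end with `suffix`.

    Assumption: `root` is a Volume path such as /Volumes/<catalog>/<schema>/<volume>,
    which behaves like a normal directory tree for os.walk on Databricks compute.
    A missing or unreadable root simply yields an empty list.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match so both .json and .JSON are found.
            if name.lower().endswith(suffix.lower()):
                matches.append(str(Path(dirpath) / name))
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical Volume path; replace with your own catalog/schema/volume.
    for path in find_files("/Volumes/main/default/raw"):
        print(path)
```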

RGSLCA
by New Contributor II
  • 154 Views
  • 4 replies
  • 0 kudos

Sizing Tables and delta logs/CDF

Hi, I need to compare the sizes of my Delta tables. What's the correct approach? Is it the table size reported by the ANALYZE command? But how do I check the Delta log size? And if I enable CDF, how do I know the CDF log size (the overhead it adds)? Kind of l...

Latest Reply
Vikram10
New Contributor
  • 0 kudos

Hi @RGSLCA, DESCRIBE DETAIL is the best starting point if you're comparing Delta table sizes, but it's important to understand what it reports. The sizeInBytes value represents only the latest active snapshot of the table, not the total storage consum...
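To see what sizeInBytes leaves out, one rough approach is to sum file sizes under the table's storage location by category. This sketch assumes an external table whose directory is readable as a local or FUSE-mounted path, and relies on the standard Delta layout: the transaction log lives in `_delta_log/`, and Change Data Feed files, when CDF writes them, go under `_change_data/`. Everything else is counted as data, including files no longer referenced by the current snapshot (for example, files still awaiting VACUUM). The function name is hypothetical.

```python
import os


def delta_storage_breakdown(table_path: str) -> dict[str, int]:
    """Sum file sizes (bytes) under a Delta table directory by category.

    'delta_log'   -> files under _delta_log/ (commits, checkpoints, etc.)
    'change_data' -> files under _change_data/ (CDF output, when present)
    'data'        -> everything else, including unreferenced files not yet vacuumed
    """
    totals = {"data": 0, "delta_log": 0, "change_data": 0}
    for dirpath, _dirs, files in os.walk(table_path):
        # First path component relative to the table root decides the category.
        top = os.path.relpath(dirpath, table_path).split(os.sep)[0]
        if top == "_delta_log":
            category = "delta_log"
        elif top == "_change_data":
            category = "change_data"
        else:
            category = "data"
        for name in files:
            totals[category] += os.path.getsize(os.path.join(dirpath, name))
    return totals
```

Comparing the `data` total with sizeInBytes from DESCRIBE DETAIL gives a rough idea of how much storage older, unreferenced files occupy; the other two totals approximate the log and CDF overhead asked about in the question.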

IM_01
by Contributor III
  • 97 Views
  • 1 replies
  • 0 kudos

Can multiple questions be added to the same SQL query in a Genie space?

Hi, Can we add multiple sample questions to one SQL query in the SQL query instructions so Genie learns to handle similar variations?

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @IM_01, The public guidance points to one natural-language question or title per example SQL query, rather than multiple sample questions attached to a single query. In the Tune Genie Space quality docs, Databricks says that for each example SQL q...

CG29
by New Contributor
  • 188 Views
  • 5 replies
  • 2 kudos

Resolved! Databricks unable to list ADLS folder and files

Hi Databricks Community, I am able to list the container from my Databricks workspace but unable to list the folders and files inside it. If I try to access the same files and folders from the Databricks UI (external location path), I am able to see all fil...

Latest Reply
ashukasma
New Contributor II
  • 2 kudos

The following may be the causes:
1. Different authentication methods
- The UI's external location uses Unity Catalog credentials
- Your dbutils.fs.ls() command uses the compute's Spark configurations
- These may be using different credentials with differe...

Sainath368
by Contributor
  • 79 Views
  • 1 replies
  • 1 kudos

Resolved! DESCRIBE HISTORY Performance Issue for Large Scale Tables (22K Tables)

Hi everyone, I’m working with around 22,000 Unity Catalog external Delta tables, and my requirement is to execute DESCRIBE HISTORY table_name LIMIT 1 for each table and append the latest record into a single consolidated table. I’ve already tried mul...

Latest Reply
ShamenParis
New Contributor II
  • 1 kudos

Hi, The reason your performance degrades so badly (4 minutes for 2k tables, but 50 minutes for 12k) is the Spark driver. When you run spark.sql("DESCRIBE HISTORY...") inside a ThreadPoolExecutor, every single one of those 22,000 queries has to be...
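One workaround sometimes used for bulk metadata collection like this (not necessarily what the reply goes on to recommend) is to skip DESCRIBE HISTORY and read the newest commit file from each external table's `_delta_log/` directory, so no SQL query goes through the driver per table. In the Delta transaction protocol, each commit is a newline-delimited JSON file named with a zero-padded 20-digit version, and a `commitInfo` action typically carries the operation metadata that DESCRIBE HISTORY reports. A minimal sketch, assuming the log is readable as a local or FUSE path; `latest_commit_info` is a hypothetical helper that ignores checkpoints and other non-commit files:

```python
import json
import os
import re
from typing import Optional

# Commit files look like 00000000000000000042.json (20-digit, zero-padded version).
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_commit_info(table_path: str) -> Optional[dict]:
    """Return the commitInfo action of the newest commit under table_path/_delta_log.

    The result gets an extra 'version' key parsed from the file name.
    Returns None when there is no commit file or no commitInfo action.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    versions = [
        int(m.group(1))
        for m in (COMMIT_FILE.match(name) for name in os.listdir(log_dir))
        if m
    ]
    if not versions:
        return None
    latest = max(versions)
    with open(os.path.join(log_dir, f"{latest:020d}.json"), encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            action = json.loads(line)
            if "commitInfo" in action:
                return {"version": latest, **action["commitInfo"]}
    return None
```

Since this is plain file I/O, it can be fanned out with `concurrent.futures.ThreadPoolExecutor` over the list of table locations; whether it beats the SQL route in a given workspace is something to measure.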

yanchr
by New Contributor II
  • 366 Views
  • 3 replies
  • 0 kudos

foreachPartition

Is there any difference between pyspark.RDD.foreachPartition and pyspark.sql.DataFrame.foreachPartition under the hood? The PySpark documentation describes pyspark.sql.DataFrame.foreachPartition as "a shorthand for df.rdd.foreachPartition()". If DataFra...

Data Engineering
rdd
shared
spark
unity_catalog
Latest Reply
ashukasma
New Contributor II
  • 0 kudos

Although the PySpark documentation states that DataFrame.foreachPartition() is a shorthand for df.rdd.foreachPartition(), there is an important difference when running on Databricks shared clusters (especially with Unity Catalog and Spark Connect). D...
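Whichever entry point is used, the argument has the same shape: Spark calls the function once per partition with an iterator over that partition's rows, which is why it is commonly used for per-partition setup such as opening one connection per partition. A minimal sketch of such a function; `make_partition_writer`, `send_batch`, and `post_rows` are hypothetical names, and the function can be exercised locally with any iterator:

```python
from itertools import islice
from typing import Any, Callable, Iterator, List


def make_partition_writer(
    send_batch: Callable[[List[Any]], None], batch_size: int = 500
) -> Callable[[Iterator[Any]], None]:
    """Return a function suitable for passing to foreachPartition.

    The returned function consumes one partition's row iterator and forwards the
    rows to `send_batch` in chunks of at most `batch_size`.
    """

    def write_partition(rows: Iterator[Any]) -> None:
        # Per-partition setup (e.g. opening one client/connection) would go here.
        it = iter(rows)
        while True:
            # Convert PySpark Row objects to dicts when possible; pass others through.
            batch = [r.asDict() if hasattr(r, "asDict") else r for r in islice(it, batch_size)]
            if not batch:
                break
            send_batch(batch)

    return write_partition


# Hypothetical usage, assuming a DataFrame `df` and your own sink function `post_rows`:
#   df.foreachPartition(make_partition_writer(post_rows))
#   df.rdd.foreachPartition(make_partition_writer(post_rows))  # RDD route; may be unavailable on shared clusters
```

Note that `send_batch` runs on the executors, so anything it prints or collects does not come back to the driver.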

Jothia
by New Contributor III
  • 794 Views
  • 5 replies
  • 0 kudos

Databricks Access Issue with UC

Hi all, We are intermittently facing issues reading a storage account, where data is streamed from Dataverse, through an external table in Unity Catalog. It was running fine with Hive. An error occurred while calling o393.sql.: org.apache.spark.SparkEx...

Latest Reply
ashukasma
New Contributor II
  • 0 kudos

This issue appears to be related to Azure Storage access through Unity Catalog rather than the data itself, especially since the same workload was working fine with Hive and the failure is intermittent. A few areas worth checking: 1. Storage Credential...

prasanna_r
by New Contributor
  • 2362 Views
  • 4 replies
  • 0 kudos

Resolved! Download all pages of a multi-page dashboard

Hi, I have created a multi-page dashboard in Databricks. I want to download all the pages of the dashboard as a single PDF file, but when I export the dashboard I only get it in .json format. Is there a way to download all the pages as a PDF file?

Latest Reply
balajij8
Contributor III
  • 0 kudos

Dashboards provide a Download as PDF capability for published dashboards. You can distribute a multi-page dashboard as a PDF with all pages, and you can configure a scheduled email subscription that includes all dashboard pages in the generated PDF. You can follow...

naveen0808
by New Contributor II
  • 145 Views
  • 1 replies
  • 3 kudos

Why We Moved Our Operational Database Into Databricks — And Stopped Managing Two Stacks

Lakebase just went GA. Here's what a production migration actually looks like. For most of the last decade, our data infrastructure lived in two separate worlds. On one side, a transactional database handling operational workloads: the writes, the loo...

Data Engineering
Architecture
Community articles
Database
DIAS2026
lakebase
Latest Reply
Mailendiran
New Contributor III
  • 3 kudos

Great write-up, and very useful. Thanks for sharing the real experience!

Mailendiran
by New Contributor III
  • 202 Views
  • 2 replies
  • 2 kudos

Resolved! Genie Code customization

Hi, I use Genie Code extensively for research, planning, and development when building ETL scripts and code migrations. As far as I know, Databricks manages the backend LLM models for the Genie Code agent. I wanted to try Genie Code with frontier models for my...

  • 202 Views
  • 2 replies
  • 2 kudos
Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @Mailendiran, From what’s publicly documented, Genie Code already uses frontier models behind the scenes, but it isn’t exposed as a bring-your-own-model or manual model-selection experience. Databricks describes Genie Code as an agentic system tha...

yit337
by Contributor
  • 130 Views
  • 1 replies
  • 0 kudos

How to change a field when instantiating a cluster defined as a variable?

I define all clusters as variables in separate files so I can reuse them, and then reference them in jobs. The issue is that I want to change just the custom_tags of the cluster when instantiating it for a job, because my tags are different for each ...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Yes, you can achieve this seamlessly, but not by overriding the custom_tags inside the cluster variable. Instead, you define your specific tags at the Job level, and Databricks automatically merges them with your cluster variable's tags. Because compl...

bi_123
by New Contributor III
  • 216 Views
  • 3 replies
  • 2 kudos

Best practice to log Auto Loader UNKNOWN_FIELD_EXCEPTION

Hi, When schema evolution is detected, Auto Loader throws an UNKNOWN_FIELD_EXCEPTION, and the error message includes schema information along with other related details. However, when I log the full message, it is too long and contains information th...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @bi_123, I would avoid parsing the full rendered UNKNOWN_FIELD_EXCEPTION message. Databricks explicitly notes in the error-handling documentation that the rendered and parameterised messages are not stable across releases, so any logic that depend...
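A minimal sketch of logging stable identifiers instead of the rendered message, assuming the exception surfaced in Python is a `pyspark.errors.PySparkException` (PySpark 3.4 and later), which provides `getErrorClass()`, `getSqlState()`, and `getMessageParameters()`. The helper and logger names are hypothetical, the truncation length is arbitrary, and any exception without those getters falls back to a truncated message:

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("autoloader_stream")  # hypothetical logger name


def summarize_spark_error(exc: BaseException, max_len: int = 300) -> Dict[str, Any]:
    """Build a compact, loggable summary of an exception.

    Uses the structured getters PySparkException exposes when present and
    truncates each message parameter to `max_len` characters. Adds a truncated
    message only when no error class is available.
    """
    summary: Dict[str, Any] = {"type": type(exc).__name__}
    for key, getter_name in (
        ("error_class", "getErrorClass"),
        ("sql_state", "getSqlState"),
        ("parameters", "getMessageParameters"),
    ):
        getter = getattr(exc, getter_name, None)
        if callable(getter):
            value = getter()
            if isinstance(value, dict):
                # Parameters can embed long schema text; keep only a prefix of each.
                value = {k: str(v)[:max_len] for k, v in value.items()}
            summary[key] = value
    if not summary.get("error_class"):
        summary["message"] = str(exc)[:max_len]
    return summary


def log_spark_error(exc: BaseException) -> None:
    logger.error("Stream failed: %s", summarize_spark_error(exc))
```

Branching on `error_class` rather than on message text keeps the logic independent of how the message is rendered in a given release.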

sd1700092
by New Contributor
  • 308 Views
  • 1 replies
  • 0 kudos

ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS silently does not update column stats on DBR 15

Hi Databricks Support, We need help confirming whether this is a known DBR 15.4 LTS bug or an unsupported/configuration-specific behavior. Summary: On a Databricks Runtime 15.4.40 Photon job cluster, `ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STAT...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @sd1700092, From what I can verify, this looks more like a DBR 15.4 job-cluster issue than expected behaviour. The public ANALYZE TABLE documentation is clear that ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS applies to both Databricks Run...

damodhargandha
by New Contributor
  • 124 Views
  • 1 replies
  • 0 kudos

Can we do a shallow clone on top of a shallow clone?

Case 1: Can we do a shallow clone on top of a shallow clone? If so, what would the result be? Case 2: Whenever the table is overwritten with new data, how does it work? Case 3: When the table is dropped and reloaded, how does this scenario affect ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @damodhargandha, A good way to think about a shallow clone is that it copies the table’s metadata but still points to the source table's data files rather than copying them. Databricks explains that behaviour in the "Clone a table" section of the ...
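To make the metadata-versus-data distinction concrete, here is a toy model, not how Delta actually stores clones: a "table" is only a list of data-file references, a shallow clone copies that list, and "storage" is a set of files that physically exist. It illustrates why a later overwrite of the source does not change what the clone points at, and why physically removing the old source files (as VACUUM can after an overwrite) leaves the clone with dangling references. All names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ToyTable:
    """Toy stand-in for a table: only its list of referenced data files."""

    files: List[str] = field(default_factory=list)


def shallow_clone(source: ToyTable) -> ToyTable:
    # Copy the file references (metadata), not the data files themselves.
    return ToyTable(files=list(source.files))


def overwrite(table: ToyTable, new_files: List[str]) -> None:
    # Point the table at new files; old files stay in storage until vacuumed.
    table.files = list(new_files)


def readable(table: ToyTable, storage: Set[str]) -> bool:
    # Readable only if every referenced file still exists in storage.
    return all(name in storage for name in table.files)


if __name__ == "__main__":
    storage = {"part-a.parquet"}
    source = ToyTable(["part-a.parquet"])
    clone = shallow_clone(source)

    storage.add("part-b.parquet")
    overwrite(source, ["part-b.parquet"])  # clone still references part-a
    print(readable(clone, storage))  # True: part-a is still in storage

    storage.discard("part-a.parquet")  # like VACUUM removing the old source file
    print(readable(source, storage), readable(clone, storage))  # True False
```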

Vladif1
by New Contributor II
  • 11693 Views
  • 9 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi, When reading a Delta Lake file (created by Auto Loader) with this code:
df = (
    spark.readStream
    .format('cloudFiles')
    .option("cloudFiles.format", "delta")
    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")
    .load(bronz...

Latest Reply
jimmylink
New Contributor
  • 1 kudos

I've been having similar issues with reading Delta Lake files and I think the solution lies in adjusting the format option. When working with Delta tables, it's essential to use the correct format to avoid compatibility issues. This reminds me of the...
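For context, Auto Loader's `cloudFiles.format` option is documented for raw file formats such as JSON, CSV, Parquet, Avro, ORC, text, XML, and binary files; Delta is not one of them, so a Delta table produced by an earlier Auto Loader stream is usually read with the Delta streaming source instead. A minimal sketch of that alternative, assuming an active SparkSession `spark`; `read_delta_stream`, `bronze_path`, and `silver_path` are hypothetical names:

```python
def read_delta_stream(spark, table_path: str):
    """Return a streaming DataFrame over a Delta table using the Delta source.

    Assumes `spark` is an active SparkSession. The schema comes from the Delta
    transaction log, so no cloudFiles.schemaLocation option is involved.
    """
    return spark.readStream.format("delta").load(table_path)


# Hypothetical usage (paths are placeholders):
#   df = read_delta_stream(spark, bronze_path)
#   (df.writeStream
#      .format("delta")
#      .option("checkpointLocation", f"{silver_path}/_checkpoint")
#      .start(silver_path))
```

For a table registered in Unity Catalog, `spark.readStream.table("<catalog>.<schema>.<table>")` reads the same stream by name instead of by path.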
