Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

naveen0808
by New Contributor II
  • 47 Views
  • 1 reply
  • 1 kudos

Why We Moved Our Operational Database Into Databricks — And Stopped Managing Two Stacks

Lakebase just went GA. Here's what a production migration actually looks like. For most of the last decade, our data infrastructure lived in two separate worlds. On one side: a transactional database handling operational workloads, the writes, the loo...

Latest Reply
Mailendiran
New Contributor III
  • 1 kudos

Great write-up, and it felt useful. Thanks for sharing the real experience!

yanchr
by New Contributor II
  • 21 Views
  • 0 replies
  • 0 kudos

foreachPartition

Is there any difference between pyspark.RDD.foreachPartition and pyspark.sql.DataFrame.foreachPartition under the hood? The PySpark documentation describes pyspark.sql.DataFrame.foreachPartition as "a shorthand for df.rdd.foreachPartition()". If DataFra...
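
For readers comparing the two entry points, here is a minimal sketch of what the quoted documentation sentence implies: the same handler can be passed to either call. The names `handle_partition` and `run_both` are hypothetical, `df` is assumed to be an existing PySpark DataFrame, and the sketch says nothing about which access modes allow the RDD path.

```python
from typing import Any, Iterable


def handle_partition(rows: Iterable[Any]) -> int:
    """Consume the rows of one partition; Spark calls this once per partition."""
    handled = 0
    for _row in rows:
        # A real handler would send each row (or a batch) to an external sink.
        handled += 1
    # Spark ignores the return value; it is returned here only to ease local testing.
    return handled


def run_both(df) -> None:
    """Pass the same handler through both entry points (df is assumed to be a DataFrame)."""
    df.foreachPartition(handle_partition)      # DataFrame API
    df.rdd.foreachPartition(handle_partition)  # RDD API on the underlying RDD
```

Because the handler is a plain function over an iterator, it can be exercised locally with `handle_partition(iter(rows))` before it is sent to a cluster.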

Data Engineering
rdd
shared
spark
unity_catalog
prasanna_r
by New Contributor
  • 2234 Views
  • 3 replies
  • 0 kudos

Resolved! Download all pages of a multi-page dashboard

Hi, I have created a multi-page dashboard in Databricks. I want to download all the pages of the dashboard as a single PDF file, but when I export the dashboard I get it only in .json format. Is there a way to download all the pages as a PDF file?

Latest Reply
balajij8
Contributor III
  • 0 kudos

Dashboards provide a Download as PDF capability for published dashboards. You can distribute a multi-page dashboard as a PDF with all pages, and configure a scheduled email subscription that includes all dashboard pages in the generated PDF. You can follow...

Mailendiran
by New Contributor III
  • 102 Views
  • 2 replies
  • 2 kudos

Resolved! Genie code Customization

Hi, I use Genie Code extensively for research, planning, and development when building ETL scripts and code migrations. As far as I know, Databricks manages the backend LLM models for the Genie Code agent. I wanted to try Genie Code with Frontier models for my...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @Mailendiran, from what’s publicly documented, Genie Code already uses frontier models behind the scenes, but it isn’t exposed as a bring-your-own-model or manual model-selection experience. Databricks describes Genie Code as an agentic system tha...

yit337
by Contributor
  • 93 Views
  • 1 reply
  • 0 kudos

How to change a field when instantiating a cluster defined as a variable?

I define all clusters as variables in separate files so I can reuse them. I then access them in jobs as shown in the attached screenshots. The issue is that I want to change just the custom_tags in the cluster when instantiating it for a job, because my tags are different for each ...

[Screenshots: yit337_0-1781015887733.png, yit337_1-1781015901247.png]
Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Yes, you can achieve this seamlessly, but not by overriding the custom_tags inside the cluster variable. Instead, you define your specific tags at the Job level, and Databricks automatically merges them with your cluster variable's tags. Because compl...

bi_123
by New Contributor III
  • 152 Views
  • 3 replies
  • 1 kudos

Best practice to log Autoloader UNKNOWN_FIELD_EXCEPTION

Hi, when schema evolution is detected, Auto Loader throws an UNKNOWN_FIELD_EXCEPTION, and the error message includes schema information along with other related details. However, when I log the full message, it is too long and contains information th...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @bi_123, I would avoid parsing the full rendered UNKNOWN_FIELD_EXCEPTION message. Databricks explicitly notes in the error-handling documentation that the rendered and parameterised messages are not stable across releases, so any logic that depend...
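
One way to act on this advice is to log a small structured record instead of the rendered message. The sketch below is hedged: it assumes the caught exception may expose PySpark-style accessors (`getCondition` or `getErrorClass`, `getSqlState`, `getMessageParameters`) and looks them up defensively, so it still works when they are absent. The helper names, the logger name `autoloader`, and the 300-character cap are illustrative choices, not part of any Databricks API.

```python
import json
import logging
from typing import Any, Dict

logger = logging.getLogger("autoloader")  # hypothetical logger name


def error_summary(exc: BaseException, max_len: int = 300) -> Dict[str, Any]:
    """Build a compact record from an exception without parsing its message."""

    def call(name: str) -> Any:
        # Look the accessor up defensively: not every exception type has it.
        method = getattr(exc, name, None)
        if not callable(method):
            return None
        try:
            return method()
        except Exception:
            return None

    return {
        "type": type(exc).__name__,
        # Newer PySpark names the accessor getCondition, older versions getErrorClass.
        "condition": call("getCondition") or call("getErrorClass"),
        "sql_state": call("getSqlState"),
        "parameters": call("getMessageParameters"),
        # Keep only a short prefix of the rendered text for human readers.
        "message_prefix": str(exc)[:max_len],
    }


def log_error(exc: BaseException) -> None:
    """Emit the summary as one JSON log line."""
    logger.error(json.dumps(error_summary(exc), default=str))
```

Alerting or branching logic can then key on `condition` rather than on substrings of the message text.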

sd1700092
by New Contributor
  • 100 Views
  • 1 reply
  • 0 kudos

ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS silently does not update column stats on DBR 15

Hi Databricks Support, we need help confirming whether this is a known DBR 15.4 LTS bug or an unsupported/configuration-specific behavior. Summary: On a Databricks Runtime 15.4.40 Photon job cluster, `ANALYZE TABLE <catalog>.<schema>.<table> COMPUTE STAT...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @sd1700092, from what I can verify, this looks more like a DBR 15.4 job-cluster issue than expected behaviour. The public ANALYZE TABLE documentation is clear that ANALYZE TABLE ... COMPUTE STATISTICS FOR ALL COLUMNS applies to both Databricks Run...
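
To reproduce a report like this, a small check that runs the statement and then reads back one column's statistics can help. This is a sketch under assumptions: `spark` is an existing SparkSession, the helper names are hypothetical, and the `info_name` / `info_value` output columns of `DESCRIBE TABLE EXTENDED <table> <column>` should be confirmed on the runtime in question.

```python
from typing import Dict


def analyze_statement(table: str) -> str:
    """SQL text that computes table-level and column-level statistics."""
    return f"ANALYZE TABLE {table} COMPUTE STATISTICS FOR ALL COLUMNS"


def describe_column_statement(table: str, column: str) -> str:
    """SQL text that describes a single column, including any stored statistics."""
    return f"DESCRIBE TABLE EXTENDED {table} {column}"


def column_stats(spark, table: str, column: str) -> Dict[str, str]:
    """Run ANALYZE, then return the column description as a name-to-value dict."""
    spark.sql(analyze_statement(table))
    rows = spark.sql(describe_column_statement(table, column)).collect()
    # Assumption: each returned row carries info_name and info_value fields.
    return {row["info_name"]: row["info_value"] for row in rows}
```

Comparing the returned dictionary before and after the ANALYZE call on the same cluster gives a concrete artifact to attach to a support case.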

Jothia
by New Contributor III
  • 673 Views
  • 3 replies
  • 0 kudos

Databricks Access Issue with UC

Hi all, we are intermittently facing issues when reading a storage account that holds data streamed from Dataverse, accessed in Unity Catalog through an external table. It was running fine with Hive. An error occurred while calling o393.sql.: org.apache.spark.SparkEx...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Jothia, Apologies for the very delayed response here. Appreciate this was raised in November 2025. I wanted to close the loop in case you were still expecting an answer. From what you described, this does not look like a straightforward Unity Cat...

Rohansingh01
by Databricks Partner
  • 175 Views
  • 1 reply
  • 5 kudos

My experience replacing a Postgres → Kafka → DMS → S3 pipeline with Lakeflow Connect

Sharing my hands-on experience with Lakeflow Connect for anyone evaluating it for database ingestion. I recently moved data from PostgreSQL on AWS RDS into Databricks, and it replaced a painful legacy pipeline. Keeping this simple and practical. What ...

Latest Reply
rdokala
New Contributor III
  • 5 kudos

Great article!

damodhargandha
by New Contributor
  • 100 Views
  • 1 reply
  • 0 kudos

Can we do a shallow clone on top of a shallow clone?

Case 1: Can we do a shallow clone on top of a shallow clone? If so, what would be the result? Case 2: Whenever the table is overwritten with new data, how does it work? Case 3: When the table is dropped and reloaded, how does this scenario affect ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @damodhargandha, a good way to think about a shallow clone is that it copies the table’s metadata but still points to the source table's data files rather than copying them. Databricks explains that behaviour in the "Clone a table" section of the ...
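
For reference, the statement under discussion can be sketched as follows. The helper name `shallow_clone_statement` and the table names in the usage note are hypothetical. The sketch only builds the SQL text and makes no claim about whether a clone of a clone is accepted in a given workspace.

```python
def shallow_clone_statement(source: str, target: str, replace: bool = False) -> str:
    """Build a SHALLOW CLONE statement: new table metadata, no data files copied."""
    create = "CREATE OR REPLACE TABLE" if replace else "CREATE TABLE IF NOT EXISTS"
    return f"{create} {target} SHALLOW CLONE {source}"
```

With an existing SparkSession, `spark.sql(shallow_clone_statement("main.sales.orders", "main.dev.orders_clone"))` would issue the statement; running it again with the first clone as `source` is the quickest way to test Case 1 directly.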

Vladif1
by New Contributor II
  • 11645 Views
  • 9 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi, when reading Delta Lake files (created by Auto Loader) with this code:

df = (
    spark.readStream
    .format('cloudFiles')
    .option("cloudFiles.format", "delta")
    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")
    .load(bronz...

Latest Reply
jimmylink
New Contributor
  • 1 kudos

I've been having similar issues with reading Delta Lake files and I think the solution lies in adjusting the format option. When working with Delta tables, it's essential to use the correct format to avoid compatibility issues. This reminds me of the...
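
One reading of "adjusting the format option": Auto Loader's `cloudFiles.format` targets raw file formats such as JSON, CSV, or Parquet, whereas a Delta table can be read as a stream through Structured Streaming's own `delta` source. A minimal sketch, assuming `spark` is an existing SparkSession and `table_path` points at the root of the Delta table (the helper name is hypothetical):

```python
def read_delta_stream(spark, table_path: str):
    """Open a streaming read on a Delta table using the built-in delta source."""
    # No cloudFiles options are involved: the Delta transaction log tracks new data.
    return spark.readStream.format("delta").load(table_path)
```

The returned streaming DataFrame can then be written with `writeStream` and its own checkpoint location, as with any other streaming source.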

Albertino
by New Contributor
  • 124 Views
  • 1 reply
  • 0 kudos

databricks-connect library for python and pandas 3

Hello, databricks-connect pins pandas during installation. Since we're moving towards pandas 3, can you please add support for the newest version as well?

Data Engineering
databricks-connect
Pandas
python
Latest Reply
Brahmareddy
Esteemed Contributor II
  • 0 kudos

Hi @Albertino, how are you doing today? Thanks for calling this out. As per my understanding, at the moment the pandas pin is there intentionally. The Databricks Connect release notes say supported pandas versions are currently limited to 1.0.5<=pandas<...

anmolhhns
by New Contributor III
  • 120 Views
  • 1 reply
  • 1 kudos

Databricks apps

I have multiple Databricks Apps running, but their usage is not fixed or predictable. Some apps are used only occasionally, while others may remain idle for long periods. Since Databricks Apps need to stay up and continue consuming resources even when...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @anmolhhns, I couldn't find any public documentation showing that Databricks Apps supports automatic idle shutdown or usage-based scale-to-zero for the app runtime itself. The current documented lifecycle is that an app can be Running, Stopped, De...

naveen0808
by New Contributor II
  • 161 Views
  • 1 reply
  • 0 kudos

From RAG Demo to Production on Databricks: 7 Things Teams Should Validate First

By Naveen Ayalla. Many teams can build a RAG demo quickly. Upload documents, create embeddings, connect a model, ask a question, and show an answer. But production is differen...

Latest Reply
naveen0808
New Contributor II
  • 0 kudos

Thanks for reading. I’m especially interested in hearing from people who have worked on real RAG or GenAI workflows. Which one has been the biggest challenge for your team?
1. Choosing the right source data
2. Access control and governance
3. Improving r...

Jotaefe1991
by New Contributor
  • 199 Views
  • 3 replies
  • 0 kudos

[Lakeflow Spark Declarative Pipelines] - Compatibility Mode not working

I’m working with an SDP pipeline that creates a streaming table using the dlt.create_streaming_table function. My goal is to expose this table through an external location so that a client can read it from Snowflake. I attempted to configure this dir...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Hi @Jotaefe1991, the overlap error you are hitting is a Unity Catalog storage collision, not a DLT limitation. Here is exactly what is happening and how to fix it: The path you provided for "delta.universalFormat.compatibility.location" (abfss://.../br...
