Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Rohansingh01
by Databricks Partner
  • 57 Views
  • 1 replies
  • 3 kudos

My experience replacing a Postgres → Kafka → DMS → S3 pipeline with Lakeflow Connect

Sharing my hands-on experience with Lakeflow Connect for anyone evaluating it for database ingestion. I recently moved data from PostgreSQL on AWS RDS into Databricks, and it replaced a painful legacy pipeline. Keeping this simple and practical. What ...

Latest Reply
rdokala
New Contributor III
  • 3 kudos

Great article!

damodhargandha
by Visitor
  • 38 Views
  • 1 replies
  • 0 kudos

Can we do a shallow clone on top of a shallow clone?

Case 1: Can we do a shallow clone on top of a shallow clone? If so, what would be the result? Case 2: Whenever the table is overwritten with new data, how does it work? Case 3: When the table is dropped and reloaded, how does this scenario affect ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @damodhargandha, a good way to think about a shallow clone is that it copies the table’s metadata but still points to the source table's data files rather than copying them. Databricks explains that behaviour in the "Clone a table" section of the ...
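
To make the "copies metadata, keeps pointing at the source files" idea concrete, here is a toy model in plain Python. It is only an illustration, not how Delta Lake is implemented: `ToyTable`, `toy_shallow_clone`, and the file paths are hypothetical names invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTable:
    """Toy stand-in for a table: metadata plus a listing of data file paths."""
    name: str
    schema: Dict[str, str]
    data_files: List[str] = field(default_factory=list)


def toy_shallow_clone(source: ToyTable, new_name: str) -> ToyTable:
    # Copy the metadata (schema and file listing) but not the files themselves:
    # the clone's entries are the same paths the source already points to.
    return ToyTable(
        name=new_name,
        schema=dict(source.schema),
        data_files=list(source.data_files),
    )


# Hypothetical table and path, for illustration only.
source = ToyTable("sales", {"id": "bigint"}, ["/data/sales/part-0000.parquet"])
clone = toy_shallow_clone(source, "sales_clone")

# In this toy model, a later write to the clone only extends the clone's own
# listing; the source's listing is left untouched.
clone.data_files.append("/data/sales_clone/part-0001.parquet")
```

The point of the sketch is that the clone is cheap because no data is copied, and that it depends on the source's files continuing to exist.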

Vladif1
by New Contributor II
  • 11580 Views
  • 9 replies
  • 1 kudos

Error when reading delta lake files with Auto Loader

Hi, when reading a Delta Lake file (created by Auto Loader) with this code: df = ( spark.readStream .format('cloudFiles') .option("cloudFiles.format", "delta") .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint") .load(bronz...

Latest Reply
jimmylink
Visitor
  • 1 kudos

I've been having similar issues with reading Delta Lake files and I think the solution lies in adjusting the format option. When working with Delta tables, it's essential to use the correct format to avoid compatibility issues. This reminds me of the...
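
As a hedged illustration of "adjusting the format option": as far as I can tell, `delta` is not among the file formats Auto Loader documents for `cloudFiles.format`, so one option is to read the Delta table with the Delta streaming source directly. The sketch below assumes an active `SparkSession` is passed in; `read_delta_as_stream` and `table_path` are hypothetical names, since the original path is truncated in the post.

```python
def read_delta_as_stream(spark, table_path):
    """Return a streaming DataFrame over an existing Delta table.

    `spark` is an active SparkSession; `table_path` is a hypothetical
    placeholder for the location of the Delta table being read.
    """
    # Use the Delta streaming source directly instead of cloudFiles.
    return spark.readStream.format("delta").load(table_path)
```

Auto Loader would then remain the tool for landing raw files, with downstream reads going through the Delta source.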

8 More Replies
bi_123
by New Contributor III
  • 39 Views
  • 1 replies
  • 0 kudos

Best practice to log Auto Loader UNKNOWN_FIELD_EXCEPTION

Hi, when schema evolution is detected, Auto Loader throws an UNKNOWN_FIELD_EXCEPTION, and the error message includes schema information along with other related details. However, when I log the full message, it is too long and contains information th...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @bi_123, I would avoid parsing the full rendered UNKNOWN_FIELD_EXCEPTION message. Databricks explicitly notes in the error-handling documentation that the rendered and parameterised messages are not stable across releases, so any logic that depend...
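
One possible way to act on the advice to avoid parsing the rendered message is to log only structured fields. The sketch below assumes the caught exception exposes PySpark's error accessors (`getCondition()` in newer releases, `getErrorClass()` in older ones, plus `getSqlState()` and `getMessageParameters()`); whether a given Auto Loader failure populates them is an assumption to verify in your runtime. The helper names and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("autoloader")  # hypothetical logger name


def summarize_spark_error(exc: Exception) -> dict:
    """Build a compact, structured summary without parsing str(exc)."""

    def call(method_name):
        # Call an accessor only if the exception actually provides it.
        method = getattr(exc, method_name, None)
        if not callable(method):
            return None
        try:
            return method()
        except Exception:
            return None

    # Prefer getCondition() when present, fall back to getErrorClass().
    condition = call("getCondition") or call("getErrorClass")
    return {
        "exception_type": type(exc).__name__,
        "condition": condition,
        "sql_state": call("getSqlState"),
        "parameters": call("getMessageParameters") or {},
    }


def log_spark_error(exc: Exception) -> None:
    # Log the short summary instead of the full rendered message.
    logger.error("stream failed: %s", summarize_spark_error(exc))
```

In a streaming job this would typically be called from the `except` block around `awaitTermination()`, keeping the full message out of the log while preserving the fields needed for alerting.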

Albertino
by Visitor
  • 62 Views
  • 1 replies
  • 0 kudos

databricks-connect library for Python and pandas 3

Hello, databricks-connect pins pandas during installation. Since we're moving towards pandas 3, can you please add support for the newest version as well?

Data Engineering
databricks-connect
Pandas
python
Latest Reply
Brahmareddy
Esteemed Contributor II
  • 0 kudos

Hi @Albertino, how are you doing today? Thanks for calling this out. As per my understanding, the pandas pin is there intentionally at the moment. Databricks Connect release notes say supported pandas versions are currently limited to 1.0.5<=pandas<...

anmolhhns
by New Contributor III
  • 48 Views
  • 1 replies
  • 1 kudos

Databricks Apps

I have multiple Databricks Apps running, but their usage is not fixed or predictable. Some apps are used only occasionally, while others may remain idle for long periods. Since Databricks Apps need to stay up and continue consuming resources even when...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @anmolhhns, I couldn't find any public documentation showing that Databricks Apps supports automatic idle shutdown or usage-based scale-to-zero for the app runtime itself. The current documented lifecycle is that an app can be Running, Stopped, De...

naveen0808
by New Contributor
  • 100 Views
  • 1 replies
  • 0 kudos

From RAG Demo to Production on Databricks: 7 Things Teams Should Validate First

By Naveen Ayalla. Many teams can build a RAG demo quickly: upload documents, create embeddings, connect a model, ask a question, and show an answer. But production is differen...

Latest Reply
naveen0808
New Contributor
  • 0 kudos

Thanks for reading. I’m especially interested in hearing from people who have worked on real RAG or GenAI workflows. Which one has been the biggest challenge for your team?
1. Choosing the right source data
2. Access control and governance
3. Improving r...

Jotaefe1991
by New Contributor
  • 150 Views
  • 3 replies
  • 0 kudos

[Lakeflow Spark Declarative Pipelines] - Compatibility Mode not working

I’m working with an SDP pipeline that creates a streaming table using the dlt.create_streaming_table function. My goal is to expose this table through an external location so that a client can read it from Snowflake. I attempted to configure this dir...

Latest Reply
ShamenParis
New Contributor II
  • 0 kudos

Hi @Jotaefe1991, the overlap error you are hitting is a Unity Catalog storage collision, not a DLT limitation. Here is exactly what is happening and how to fix it: The path you provided for "delta.universalFormat.compatibility.location" (abfss://.../br...
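
To illustrate what a storage path "overlap" can mean, here is a small plain-Python check, assuming that two locations overlap when one is equal to, contains, or is nested inside the other. This is only a sketch of the idea, not Unity Catalog's actual validation logic; `paths_overlap` and the example URLs are hypothetical.

```python
from urllib.parse import urlparse


def _split_location(url: str):
    """Split a storage URL into (scheme, authority) and its path segments."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return (parsed.scheme, parsed.netloc), segments


def paths_overlap(a: str, b: str) -> bool:
    """True if one location equals, contains, or is contained in the other."""
    root_a, seg_a = _split_location(a)
    root_b, seg_b = _split_location(b)
    if root_a != root_b:
        return False
    # Compare whole path segments so "orders" does not match "orders_compat".
    shorter = min(len(seg_a), len(seg_b))
    return seg_a[:shorter] == seg_b[:shorter]


# Hypothetical locations, for illustration only.
table_location = "abfss://container@account.dfs.core.windows.net/bronze/orders"
nested_location = table_location + "/_compat"
sibling_location = "abfss://container@account.dfs.core.windows.net/bronze/orders_compat"

nested_collides = paths_overlap(table_location, nested_location)
sibling_collides = paths_overlap(table_location, sibling_location)
```

Under this assumption, a compatibility location placed inside an existing table's directory would collide, while a sibling directory would not.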

2 More Replies
amirabedhiafi
by Contributor
  • 124 Views
  • 3 replies
  • 3 kudos

Resolved! JSON files exist in a volume but do not show in the UI

I have some JSON files in a specific volume. When I search for them, they don't appear, but when I query the volume using Python I am able to get them and read their content. Any help?

Latest Reply
ShamenParis
New Contributor II
  • 3 kudos

Hi @amirabedhiafi, Catalog Explorer search won't return these files. This is likely because raw files in Volumes can change rapidly and aren't tracked in the system tables in the same way structured data is. Instead, I would suggest using a Genie Spac...

2 More Replies
jfrohnhaus
by New Contributor
  • 197 Views
  • 2 replies
  • 2 kudos

Resolved! Recurring Historical Data Modeling Patterns

After reviewing a surprising number of Databricks discussions around SCD2, CDC, historical reporting and temporal joins, I noticed that most historical data modeling challenges seem to fall into a small set of recurring patterns: Historical Backfill, La...

Latest Reply
amirabedhiafi
Contributor
  • 2 kudos

Hello! I would add a few more historical modeling patterns that often appear separately, even though they overlap with SCD2, CDC, or temporal joins. One important case is bi-temporal modeling, where you need to separate business effective time from sys...
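
A minimal sketch of the bi-temporal idea in plain Python, assuming the two axes are business effective time (when a fact is true) and system or recorded time (when the platform knew it). The `PriceVersion` record, the `as_of` helper, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class PriceVersion:
    product: str
    price: float
    valid_from: date     # business effective time, inclusive
    valid_to: date       # business effective time, exclusive
    recorded_from: date  # when the system learned this version, inclusive
    recorded_to: date    # when the system superseded it, exclusive


def as_of(rows: Iterable[PriceVersion], product: str,
          effective_on: date, known_on: date) -> Optional[PriceVersion]:
    """Return the version effective on one date, as known on another date."""
    for row in rows:
        if (row.product == product
                and row.valid_from <= effective_on < row.valid_to
                and row.recorded_from <= known_on < row.recorded_to):
            return row
    return None


FOREVER = date(9999, 12, 31)
history = [
    # Recorded on Jan 5: price 10.0, effective from Jan 1.
    PriceVersion("widget", 10.0, date(2024, 1, 1), FOREVER,
                 date(2024, 1, 5), date(2024, 2, 1)),
    # Correction recorded on Feb 1: the price was 12.0 from Jan 1 all along.
    PriceVersion("widget", 12.0, date(2024, 1, 1), FOREVER,
                 date(2024, 2, 1), FOREVER),
]

# Same business date, two different "knowledge" dates.
before_correction = as_of(history, "widget", date(2024, 1, 15), date(2024, 1, 20))
after_correction = as_of(history, "widget", date(2024, 1, 15), date(2024, 3, 1))
```

Keeping both axes lets a report be reproduced exactly as it looked at the time while still showing the corrected figures today, which a single SCD2 validity range cannot do.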

1 More Replies
Danny_Lee
by Databricks Partner
  • 3906 Views
  • 1 replies
  • 2 kudos

re: Welcoming Bladebridge to Databricks!

Hi @Sujitha and Databricks team, congrats on the acquisition of Bladebridge. We used this tool a couple of years back to migrate an important ETL process from Informatica. I'm glad to see it's part of the Data Intelligence Platform and have already take...

Latest Reply
amirabedhiafi
Contributor
  • 2 kudos

Hi, thank you for sharing this feedback, and great to hear that you have already used BladeBridge successfully in a previous Informatica migration. I agree that a dedicated BladeBridge forum or community section would be useful, especially now that mor...

nito
by New Contributor II
  • 652 Views
  • 1 replies
  • 1 kudos

New remote (dbfs) caching python library

I had trouble getting much speedup at all from the Spark cache or the Databricks disk cache, which I think is essential when developing PySpark code iteratively in notebooks. So I developed a handy caching library for this, which has recently been open sourced, see h...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

Thanks for sharing this. It looks useful, especially for iterative notebook development where the expensive part is not just reading source files but recomputing a complex intermediate DataFrame after several joins or transformations. I can see the va...

aharisaibabu
by New Contributor
  • 241 Views
  • 4 replies
  • 1 kudos

Ingest data from Snowflake to Databricks

Hi Team, I have some confusion regarding the best approach for ingesting data from Snowflake into Databricks using custom SQL queries. While evaluating the available options, I found multiple approaches: Snowflake Spark Connector, JDBC, Query Federation, Lake...

Latest Reply
souryabarnwal
Databricks Partner
  • 1 kudos

Thanks for raising this question. I recently evaluated similar options for Snowflake-to-Databricks ingestion and would like to share my perspective. From my understanding, the choice depends on whether your primary focus is performance, ease of manage...

3 More Replies
Gaganmjain_012
by New Contributor
  • 643 Views
  • 2 replies
  • 0 kudos

AI/BI Genie

I was working with Genie and started using the Research agent. Now I want to manage Genie as shareable infrastructure as code, with all changes handled through GitHub. Does anyone have any suggestions on how to do this in a best optimi...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @Gaganmjain_012, I know this is a late reply, but if you are looking for the cleanest way to make a Genie Space shareable and manageable as infrastructure-as-code, the best pattern today is to keep the Genie Space configuration in GitHub and deplo...

1 More Replies
cdn_yyz_yul
by Contributor II
  • 259 Views
  • 5 replies
  • 1 kudos

Declarative pipeline full table refresh generates an empty MV

Hi everyone,
The situation: I have a Declarative pipeline that consists of several .py files.
file1.py: creates a Materialized View, description.
file2.py: creates a second Materialized View by reading a table "transactions" and reading the MV "descri...

Latest Reply
mazeem-arbisoft
New Contributor II
  • 1 kudos

@cdn_yyz_yul I think the reading part is where your problem lies. When reading datasets produced by the same pipeline, you shouldn't use the three-level name; instead, follow DLT's way. Older pipeline versions, where the "target" field was used for target schema de...

4 More Replies