Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sdurai
by Visitor
  • 10 Views
  • 0 replies
  • 0 kudos

Databricks to Salesforce Core (Not cloud)

Hi, Is there a native connector for connecting Databricks to Salesforce core (not cloud)? If there is no native connector, what are the recommended approaches for connecting to Salesforce core? Thanks, Subashini

mits1
by New Contributor III
  • 60 Views
  • 5 replies
  • 0 kudos

Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Auto Loader. I am reading a single-line JSON file and writing to a Delta table that does not already exist (creating it on the fly), using PySpark (code below). Code: spark.readStream...

Latest Reply
mits1
New Contributor III
  • 0 kudos

Hi @lingareddy_Alva, Thank you for your response. Just to inform you that: 1. I am using Databricks Free Edition to execute code on serverless compute, which doesn't allow me to get the partition numbers. 2. I intentionally did not want to use/specify schema to...

4 More Replies
BF7
by Contributor
  • 1601 Views
  • 3 replies
  • 2 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues: 1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both? 2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema: When cloudFiles.inferColumnTypes is enabled, Auto Loader attempts to identify the appropriate data types for columns instead of defaulting everything to strings, which i...
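
To illustrate the difference the reply describes, here is a toy sketch in plain Python. It is not Auto Loader code and does not reflect how Auto Loader infers types internally; `infer_type`, `infer_schema`, the sample record, and the type labels are all hypothetical, chosen only to contrast "everything is a string" with typed inference.

```python
import json


def infer_type(value):
    """Return a coarse, illustrative type label for a parsed JSON value."""
    # bool is checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"


def infer_schema(json_line, infer_column_types):
    """Build a {column: type label} mapping for one JSON record.

    With infer_column_types=False every column is labeled "string",
    mimicking the default described in the reply above.
    """
    record = json.loads(json_line)
    if not infer_column_types:
        return {name: "string" for name in record}
    return {name: infer_type(value) for name, value in record.items()}


# Hypothetical sample record, used only for illustration.
line = '{"id": 7, "price": 9.5, "active": true, "name": "widget"}'
schema_default = infer_schema(line, infer_column_types=False)
schema_typed = infer_schema(line, infer_column_types=True)
```

In this sketch `schema_default` labels all four columns as strings, while `schema_typed` distinguishes the integer, floating-point, and boolean columns.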

2 More Replies
beaglerot
by Databricks Partner
  • 120 Views
  • 4 replies
  • 5 kudos

Python Data Source API — worth using?

Hi all, I’ve been looking into the Python Data Source API and wanted to get some feedback from others who may be experimenting with it. One of the more common challenges I run into is working with applications that expose APIs but don’t have out-of-the...

Latest Reply
Louis_Frolio
Databricks Employee
  • 5 kudos

Adding on to @edonaire's points, which are accurate. @beaglerot, your contacts project is the right use case for the pattern you have. Small data, infrequent changes, direct read into bronze. That works. The real question you're asking is what happens when t...

3 More Replies
Manjusha
by New Contributor II
  • 82 Views
  • 3 replies
  • 1 kudos

Running python functions (written using polars) on databricks

Hi, We are planning to rewrite our application (which originally ran in R) in Python. We chose Polars because it seems to be faster than pandas. We have functions written in R which we are planning to convert to Python. However in one of ...

Latest Reply
Manjusha
New Contributor II
  • 1 kudos

Thank you @Louis_Frolio and @pradeep_singh for the detailed explanation. I will discuss your inputs with the team and get back to you if we have more questions.

2 More Replies
maikel
by Contributor II
  • 68 Views
  • 2 replies
  • 2 kudos

Running Spark Tests

Hello Community! I am writing to ask what the best way is to run Spark unit tests in Databricks. Currently we have a set of notebooks which are responsible for doing the operations on the data (joins, merging etc.). Of course to do ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Great suggestions @lingareddy_Alva regarding Databricks Connect v2! @maikel, a few things to layer on top of that. First, the fact that you already have your functions in a separate directory outside of notebooks is exactly the right foundation. T...

1 More Replies
Malthe
by Valued Contributor II
  • 206 Views
  • 1 reply
  • 0 kudos

Observable API and Delta Table merge

Using the Observable API on the source dataframe to a Delta Table merge seems to hang indefinitely. Steps to reproduce: (1) Create one or more pyspark.sql.Observation objects. (2) Use DataFrame.observe on the merge source. (3) Run merge. Accessing Observation.get blo...

Latest Reply
AnthonyAnand
Databricks Partner
  • 0 kudos

Hi @Malthe, You have hit a very specific, known behavioral gap in how Apache Spark and Delta Lake interact. To answer your question directly: Yes, the Observable API is effectively incompatible with Delta Table merges when used directly. Why It ...

stemill
by New Contributor II
  • 412 Views
  • 6 replies
  • 0 kudos

update on iceberg table creating duplicate records

We are using Databricks to connect to a Glue catalog which contains Iceberg tables. We are using DBR 17.2 and adding the jars org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0 and org.apache.iceberg:iceberg-aws-bundle:1.10.0. The Spark config is then...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @stemill, The way of connecting to Iceberg tables managed by Glue catalog that you described is not officially supported. Because spark_catalog is not a generic catalog slot – it’s a special, tightly‑wired session catalog with a lot of assumptio...

5 More Replies
Ashwin_DSA
by Databricks Employee
  • 102 Views
  • 1 reply
  • 2 kudos

Is Address Line 4 the place where data goes to die?

I’ve spent the last few years jumping between insurance, healthcare, and retail, and I’ve come to a very painful conclusion that we should never have let humans type their own addresses into a text box.  For a pet project, I’m currently looking at a ...

Latest Reply
pradeep_singh
Contributor
  • 2 kudos

I have never worked on this problem, but based on previous posts from other community users I have learned that fuzzy logic can help find records that are most likely to be the same or similar. Here are some links where this has been discussed i g...
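
A minimal sketch of the fuzzy-matching idea mentioned above, using only Python's standard library (`difflib.SequenceMatcher`). The abbreviation table, the helper names `normalize` and `similarity`, and the sample addresses are assumptions made for illustration; real address matching usually needs a much richer normalization step and a tuned threshold.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue", "apt": "apartment"}


def normalize(address):
    """Lowercase, strip punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^a-z0-9]+", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def similarity(left, right):
    """Return a score in [0, 1]; higher means the normalized strings are closer."""
    return SequenceMatcher(None, normalize(left), normalize(right)).ratio()


# Same address written two ways: identical after normalization.
same_address = similarity("12 Main St., Apt 4", "12 main street apartment 4")
# A typo ("Stret") still scores high.
typo_address = similarity("12 Main Stret", "12 Main Street")
# Unrelated addresses score low.
other_address = similarity("12 Main St", "99 Ocean Avenue")
```

Pairs scoring above a chosen threshold can then be flagged as likely duplicates for review.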

kevinzhang29
by New Contributor II
  • 70 Views
  • 1 reply
  • 1 kudos

Issue with create_auto_cdc_flow Not Updating Business Columns for DELETE Events

We're currently working with Databricks AUTO CDC in a data pipeline and have encountered an issue with create_auto_cdc_flow (AUTO CDC) when using SCD Type 2. We are using the following configuration: stored_as_scd_type = 2, apply_as_deletes = expr("op...

Latest Reply
pradeep_singh
Contributor
  • 1 kudos

Operation type DELETE means the record is supposed to disappear. If you were using SCD Type 1, the record would be removed from the silver table. When using SCD Type 2, AUTO CDC only updates the lifecycle metadata columns to make the record inactive;...
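
The SCD Type 2 behavior described above can be sketched with a toy model in plain Python. This is not the AUTO CDC implementation: `apply_cdc_events`, the event shape, and the sample data are hypothetical, and the `__START_AT` / `__END_AT` column names are used here only as illustrative lifecycle columns.

```python
def apply_cdc_events(events):
    """Toy SCD Type 2 history builder.

    Each event is a dict with "id", "op", "seq", and a business column "name".
    A DELETE only closes the currently open row; it never writes business
    columns, so the closed row keeps the last values seen before the delete.
    """
    history = []
    for event in sorted(events, key=lambda e: e["seq"]):
        # Find the open (still active) row for this key, if any.
        current = next(
            (row for row in history
             if row["id"] == event["id"] and row["__END_AT"] is None),
            None,
        )
        if current is not None:
            current["__END_AT"] = event["seq"]  # close the active version
        if event["op"] != "DELETE":
            history.append({
                "id": event["id"],
                "name": event["name"],
                "__START_AT": event["seq"],
                "__END_AT": None,
            })
    return history


# Hypothetical event stream: insert, update, then delete for the same key.
events = [
    {"id": 1, "op": "INSERT", "seq": 1, "name": "Ann"},
    {"id": 1, "op": "UPDATE", "seq": 2, "name": "Anna"},
    {"id": 1, "op": "DELETE", "seq": 3, "name": None},
]
history = apply_cdc_events(events)
```

In this model the DELETE event's own business values (here `None`) never reach the table: the last row still carries `"Anna"`, and only its `__END_AT` changes.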

GarciaJorge
by New Contributor
  • 149 Views
  • 3 replies
  • 4 kudos

Resolved! DLT with CDC and schema changes in streaming pipelines

Hi everyone, I’m dealing with a scenario combining Delta Live Tables, CDC ingestion, and streaming pipelines, and I’ve hit a challenge that I haven’t seen clearly addressed in the docs. Some context: Source is an upstream system emitting CDC events (ins...

Latest Reply
edonaire
New Contributor
  • 4 kudos

In practice, the impact of adding a normalization layer is usually small compared to the gains in stability and control. At scale, the key is how you implement that layer. If it is designed to operate incrementally and aligned with your partitioning s...

2 More Replies
alexu4798644233
by New Contributor III
  • 2480 Views
  • 2 replies
  • 0 kudos

ETL or Transformations Testing Framework for Databricks

Hi! I'm looking for an ETL or transformations testing framework for Databricks. It needs to support automation of the following steps: 1) create/store test datasets (mock inputs and a golden copy of the output), 2) run the ETL (notebook) being tested, 3) compar...

Latest Reply
rameshcsert
New Contributor
  • 0 kudos

Hi Rjdudley, it is tough for me to understand the README file and run the framework. Could you post a video showing how to install it and use it with a custom data source and customized test cases?

1 More Replies