Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

BF7
by Contributor
  • 1912 Views
  • 8 replies
  • 3 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues: (1) What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both? (2) When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...
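As a minimal sketch of the options the question is about (option names follow the Auto Loader docs; paths and format are placeholders, and the commented stream declaration assumes a live SparkSession):

```python
# Hypothetical sketch: Auto Loader schema-inference options from the question.
# Paths are placeholders, not real locations.
autoloader_options = {
    "cloudFiles.format": "json",
    # inferColumnTypes: infer concrete types (int, double, ...) instead of
    # reading every column as a string.
    "cloudFiles.inferColumnTypes": "true",
    # schemaLocation: directory where the inferred schema (and its evolution)
    # is checkpointed between runs.
    "cloudFiles.schemaLocation": "/tmp/_schema_checkpoint",
}

# With a SparkSession named `spark`, the stream would be declared as:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/path/to/landing"))
```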

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

@mits1 , if you are happy with the answer please click on "Accept as Solution." It will give confidence to others.  Cheers, Lou.  

7 More Replies
Eliza_geo
by New Contributor II
  • 209 Views
  • 1 reply
  • 1 kudos

Databricks SQL Alerts and Jobs Integration: Legacy vs V2

Hi, I’m facing a challenge integrating Databricks SQL alerts with a Databricks Job. After reviewing the documentation, I arrived at the following understanding and would appreciate confirmation from the community or Databricks team: Legacy SQL alerts ca...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @Eliza_geo, given your Unity Catalog + serverless + DABS environment, here's what makes sense right now. For new pipelines: stick to legacy SQL alerts invoked via a SQL task in Jobs. Avoid the SQL Alert task (Beta) in production workflows; Beta fe...

abhishek13
by New Contributor II
  • 171 Views
  • 2 replies
  • 1 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone, I’m facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance. Problem: when I run commands from a Databricks notebook, I see intermittent errors like: Connection reset Retrying request to https://us-eas...

Latest Reply
abhishek13
New Contributor II
  • 1 kudos

Can someone help with this?

1 More Replies
Manjusha
by New Contributor II
  • 358 Views
  • 4 replies
  • 2 kudos

Running Python functions (written using Polars) on Databricks

Hi, we are planning to rewrite our application (which originally ran in R) in Python. We chose Polars because it seems to be faster than pandas. We have functions written in R that we plan to convert to Python. However, in one of ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@Manjusha , if you are happy with my response please click on "Accept as Solution." It will help others trust the guidance.

3 More Replies
sdurai
by New Contributor II
  • 522 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks to Salesforce Core (Not cloud)

Hi, is there any native connector available to connect to Salesforce Core (not cloud) in Databricks? If there is no native connector, what are the recommended approaches to connect to Salesforce Core? Thanks, Subashini

Latest Reply
Ashwin_DSA
Databricks Employee
  • 4 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

3 More Replies
P10d
by New Contributor
  • 255 Views
  • 1 reply
  • 0 kudos

Connect Databricks cluster with Artifactory

Hello, I'm trying to connect Databricks with our own JFrog Artifactory. The objective is to download both pip and JAR dependencies from it instead of connecting to Maven Central/PyPI. I'm struggling with JARs. My approach to solve the problem is: 1. Cre...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I haven't been able to test this myself, but based on some internal research I think the following is true: the most likely issue is your truststore configuration. Setting spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<custom...
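As a sketch of the kind of cluster Spark config the reply describes (the truststore path and password are placeholders, not values from the thread):

```
spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/company-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/company-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
```

One caveat: replacing the default truststore wholesale can break TLS to endpoints signed by standard CAs, so a common approach is to import the Artifactory CA certificate into a copy of the JVM's default cacerts and point the truststore at that copy.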

bi123
by New Contributor
  • 229 Views
  • 3 replies
  • 0 kudos

How to import python modules in a notebook?

I have a job with a notebook task that uses Python modules in a folder other than the notebook's own. When I try to import the module in the notebook, it raises a module-not-found error. I solved the problem using sys.path. But I am curious if there...
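The sys.path workaround the post mentions looks roughly like this (the `modules` folder name is a placeholder for wherever the .py files live):

```python
# Hypothetical sketch of the sys.path workaround from the post; "modules"
# is a placeholder folder holding the .py files to import.
import os
import sys

module_dir = os.path.join(os.getcwd(), "modules")
if module_dir not in sys.path:
    # Prepend so this folder wins over same-named modules elsewhere.
    sys.path.insert(0, module_dir)

# import my_module  # would now resolve from ./modules
```

Note that in Databricks Git folders the repository root is typically already on sys.path, so package-style imports can often work without this manipulation.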

Latest Reply
shazi
New Contributor III
  • 0 kudos

While this is a classic way to solve this, it can sometimes be "brittle" if your folder structure changes or if you share the notebook with others who have different file paths. In modern notebook environments...

2 More Replies
ittzzmalind
by New Contributor II
  • 251 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks Workspace - Unknown IP access

Azure Monitor logs show authentication requests to the Databricks workspace from an unknown IP. When I searched the IP at the URL below, the result showed it is from AZURE CLOUD: <Region> (the same region as the workspace). https://azureipranges.azurewebsites.net/SearchFor -- ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ittzzmalind, because the IP is in the same Azure region but not listed in the Azure Databricks control-plane ranges, it’s very likely not a Databricks-owned control-plane IP. It’s typically either a user or a service coming from another Azure resou...

murtadha_s
by Databricks Partner
  • 145 Views
  • 1 reply
  • 0 kudos

Default ACL for Jobs and Clusters

Hi, I want to set a default ACL that applies to all created jobs and clusters, for example according to a cluster policy, but currently I need to apply my ACL to every created job/cluster separately. Is there a way to do that? BR,

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @murtadha_s Can you please clarify what you are after? The second part of your question sounded more like a statement: "but currently I need to apply my ACL at every created job/cluster separately," and that confused me a bit.  To make sure we poi...

sai_sakhamuri
by Databricks Partner
  • 626 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks optimization for query performance and pipeline runs

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @sai_sakhamuri, you're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked. Spark Catalyst Optimizer (working with the rules engine): Catalyst operates in fou...

databrciks
by New Contributor III
  • 467 Views
  • 3 replies
  • 1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into the Bronze layer, connecting to a SQL Server DB. How can I pass the table names dynamically in DLT? That is, one piece of code that takes many tables and loads them into the Bronze layer.
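A minimal, hedged sketch of the usual pattern for this: loop over a config list and generate one loader function per table name via a factory, so each closure binds its own name. Plain functions stand in here for @dlt.table-decorated readers, and the table names are made up:

```python
# Sketch of parametrizing table creation from a config list. In a real
# DLT/Lakeflow pipeline the inner function would be decorated with
# @dlt.table(name=f"bronze_{table_name}") and return a spark read; plain
# functions stand in here to show the loop/factory pattern.
TABLE_NAMES = ["customers", "orders", "invoices"]  # hypothetical config

def make_bronze_loader(table_name):
    # The factory captures table_name per call, avoiding the late-binding
    # bug where every function generated in a bare loop sees the last name.
    def load():
        return f"bronze_{table_name}"  # stand-in for the actual read
    load.__name__ = f"load_bronze_{table_name}"
    return load

loaders = [make_bronze_loader(t) for t in TABLE_NAMES]
```

The factory indirection matters: defining the inner function directly inside a `for` loop would leave every generated table reading the last name in the list.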

Latest Reply
databrciks
New Contributor III
  • 1 kudos

Hi Ashwin, thanks for the quick response. Yes, I want to pass all the tables through a config parameter/param file and load them into the Bronze layer. I will try this approach. Thanks.

2 More Replies
ittzzmalind
by New Contributor II
  • 191 Views
  • 2 replies
  • 0 kudos

DLT pipeline error: key not found: all_info_dlt_cx_utils_cod, resulting in a NoSuchElementException

Databricks ETL pipeline, specifically an error with the @DP.expectorfail decorator causing the pipeline update to fail. The error message indicated 'key not found: all_info_dlt_cx_utils_cod', resulting in a NoSuchElementException. Note: if we commen...

Latest Reply
ittzzmalind
New Contributor II
  • 0 kudos

@MoJaMa, thanks for the reply. The issue was in the code; the corrected code worked.

1 More Replies
IM_01
by Contributor III
  • 1260 Views
  • 19 replies
  • 3 kudos

Resolved! Lakeflow SDP failed with DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG

Hi, a column was deleted on the source table. When I ran LSDP, it failed with the error DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename, drop, or datatype ch...

Latest Reply
gullsher98743
New Contributor II
  • 3 kudos

This looks like a very practical template, especially for teams trying to structure their Data & AI strategy without overcomplicating things. The step-by-step format and examples should be really helpful for workshops and collaborative sessions. Curi...

18 More Replies
stemill
by New Contributor II
  • 758 Views
  • 7 replies
  • 0 kudos

Update on Iceberg table creating duplicate records

We are using Databricks to connect to a Glue catalog which contains Iceberg tables. We are using DBR 17.2 and adding the JARs org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0 and org.apache.iceberg:iceberg-aws-bundle:1.10.0; the Spark config is then...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @stemill, the way of connecting to Iceberg tables managed by a Glue catalog that you described is not officially supported, because spark_catalog is not a generic catalog slot; it’s a special, tightly wired session catalog with a lot of assumptio...
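A hedged sketch of the alternative this reply points toward: register the Glue Iceberg catalog under its own name instead of overriding spark_catalog. Option names follow the Apache Iceberg Spark docs; the catalog name and bucket are placeholders:

```
spark.sql.catalog.glue org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.glue.catalog-impl org.apache.iceberg.aws.glue.GlueCatalog
spark.sql.catalog.glue.io-impl org.apache.iceberg.aws.s3.S3FileIO
spark.sql.catalog.glue.warehouse s3://my-bucket/warehouse
```

Tables are then addressed as glue.db.table, leaving the built-in session catalog untouched.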

6 More Replies
beaglerot
by Databricks Partner
  • 529 Views
  • 4 replies
  • 6 kudos

Resolved! Python Data Source API — worth using?

Hi all, I’ve been looking into the Python Data Source API and wanted to get some feedback from others who may be experimenting with it. One of the more common challenges I run into is working with applications that expose APIs but don’t have out-of-the...

Latest Reply
Louis_Frolio
Databricks Employee
  • 6 kudos

Adding on to @edonaire's points, which are accurate. @beaglerot, your contacts project is the right use case for the pattern you have: small data, infrequent changes, direct read into Bronze. That works. The real question you're asking is what happens when t...

3 More Replies