Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Databricks MVP
  • 73 Views
  • 1 reply
  • 0 kudos

Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databricks

 Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion, and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...

Latest Reply
Sumit_7
Honored Contributor
  • 0 kudos

@Phani1 Check my POV:
- Follow the Delta purge lifecycle: DELETE → REORG TABLE APPLY (PURGE) → VACUUM
- Metadata + automation: use a control table + Databricks Workflows for scalable, policy-based execution.
- Retention by layer + audit centrally: Bron...
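The lifecycle above can be sketched as a small policy-driven statement builder. This is a rough illustration only: the control-table schema, the `main.bronze.events` table name, the `event_date` column, and the retention values are all hypothetical, and the VACUUM retention shown is purely illustrative (in practice it governs time travel, not the data-retention window).

```python
# Sketch of the purge lifecycle described above:
# DELETE -> REORG TABLE ... APPLY (PURGE) -> VACUUM,
# driven by one row of a hypothetical retention-policy control table.

def purge_statements(policy):
    """Build the ordered SQL statements for one retention-policy row."""
    table = policy["table"]
    days = policy["retention_days"]
    return [
        f"DELETE FROM {table} WHERE event_date < current_date() - INTERVAL {days} DAYS",
        f"REORG TABLE {table} APPLY (PURGE)",
        f"VACUUM {table} RETAIN {days * 24} HOURS",
    ]

# Example control-table row; in practice each statement would be executed
# via spark.sql inside a Databricks Workflows task and logged for audit.
policy = {"table": "main.bronze.events", "retention_days": 30}
for stmt in purge_statements(policy):
    print(stmt)
```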

mits1
by New Contributor III
  • 326 Views
  • 11 replies
  • 3 kudos

Resolved! Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Autoloader. I am reading a single-line JSON file and writing to a Delta table which does not exist already (creating it on the fly), using PySpark (below is the code). Code: spark.readStream...

Latest Reply
karthickrs
New Contributor III
  • 3 kudos

Hi, the extra rows could have been caused by various reasons:
- Extra files in the directory
- Empty or corrupt records
- Non-JSON content being picked up on the first run
You could make sure that your input path contains only valid JSON files or you could mod...
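The first two causes listed above can be checked before Autoloader ever sees the files. A minimal sketch of such a pre-flight check (the directory layout and file names here are made up for the demonstration):

```python
import json
import tempfile
from pathlib import Path

def invalid_json_files(input_dir):
    """Return names of *.json files that are empty or not valid JSON --
    the kinds of inputs that can surface as null rows downstream."""
    bad = []
    for f in sorted(Path(input_dir).glob("*.json")):
        text = f.read_text().strip()
        try:
            if not text:
                raise ValueError("empty file")
            json.loads(text)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            bad.append(f.name)
    return bad

# Quick demonstration with a throwaway directory.
d = Path(tempfile.mkdtemp())
(d / "good.json").write_text('{"id": 1}')
(d / "empty.json").write_text("")
(d / "broken.json").write_text("{not json")
print(invalid_json_files(d))  # -> ['broken.json', 'empty.json']
```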

10 More Replies
BF7
by Contributor
  • 1717 Views
  • 8 replies
  • 3 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues:
1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?
2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

@mits1 , if you are happy with the answer please click on "Accept as Solution." It will give confidence to others.  Cheers, Lou.  

7 More Replies
Eliza_geo
by New Contributor
  • 81 Views
  • 1 reply
  • 1 kudos

Databricks SQL Alerts and Jobs Integration: Legacy vs V2

Hi, I'm facing a challenge integrating Databricks SQL alerts with a Databricks Job. After reviewing the documentation, I arrived at the following understanding and would appreciate confirmation from the community or Databricks team: Legacy SQL alerts ca...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @Eliza_geo, Given your Unity Catalog + serverless + DABS environment, here's what makes sense right now: For new pipelines, stick to legacy SQL alerts invoked via a SQL task in Jobs. Avoid the SQL Alert task (Beta) in production workflows; Beta fe...

abhishek13
by New Contributor
  • 78 Views
  • 2 replies
  • 1 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone, I'm facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance. Problem: When I run commands from a Databricks notebook, I see intermittent errors like: Connection reset Retrying request to https://us-eas...

Latest Reply
abhishek13
New Contributor
  • 1 kudos

Can someone help with this?

1 More Replies
MaartenH
by New Contributor III
  • 3798 Views
  • 12 replies
  • 4 kudos

Lakehouse federation for SQL server: database name with spaces

We're currently using Lakehouse Federation for various sources (Snowflake, SQL Server), usually successfully. However, we've encountered a case where one of the databases on the SQL Server has spaces in its name, e.g. 'My Database Name'. We've tried vari...

Latest Reply
QueryingQuail
New Contributor III
  • 4 kudos

Hello all, we have a good amount of tables from an external ERP system that are being replicated to an existing DWH in an Azure SQL Server database. We have set up a foreign connection for this database and we can connect to the server and database. Sa...

11 More Replies
Manjusha
by New Contributor II
  • 186 Views
  • 4 replies
  • 2 kudos

Running Python functions (written using Polars) on Databricks

Hi, we are planning to rewrite our application (which was originally running in R) in Python. We chose Polars as it seems to be faster than pandas. We have functions written in R which we are planning to convert to Python. However, in one of ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@Manjusha , if you are happy with my response please click on "Accept as Solution." It will help others trust the guidance.

3 More Replies
sdurai
by New Contributor
  • 223 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks to Salesforce Core (Not cloud)

Hi, is there any native connector available to connect to Salesforce core (not cloud) in Databricks? If there is no native connector, what are the recommended approaches to connect to Salesforce core? Thanks, Subashini

Latest Reply
Ashwin_DSA
Databricks Employee
  • 4 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

3 More Replies
P10d
by New Contributor
  • 158 Views
  • 1 reply
  • 0 kudos

Connect Databricks cluster with Artifactory

Hello, I'm trying to connect Databricks with our own JFrog Artifactory. The objective is to download both pip/JAR dependencies from it instead of connecting to Maven Central/PyPI. I'm struggling with JARs. My approach to solving the problem is: 1. Cre...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I haven't got the ability to test this myself, but based on some internal research, I think the following is true: The most likely issue is your truststore configuration. Setting spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<custom...
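Since the reply is truncated, here is one hedged reading of the truststore point: a custom `-Djavax.net.ssl.trustStore` *replaces* the JVM's default cacerts, so a truststore that contains only the Artifactory CA can break other TLS connections. A sketch of the cluster Spark config this implies (the truststore path and password below are placeholders, not real values):

```
# Hypothetical Spark config (cluster > Advanced options > Spark).
# Use a truststore that includes the default CA bundle plus the
# Artifactory CA, since this setting replaces the JVM default.
spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/artifactory-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
spark.executor.extraJavaOptions -Djavax.net.ssl.trustStore=/dbfs/certs/artifactory-truststore.jks -Djavax.net.ssl.trustStorePassword=changeit
```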

mordex
by New Contributor III
  • 77 Views
  • 1 reply
  • 0 kudos

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

 Hey everyone, I'm stuck with a Databricks workflow design and could use some advice. Currently, we are calling 70+ APIs. Right now the workflow looks something l...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @mordex, this is a classic high-frequency orchestration problem on Databricks. The core issue is that Databricks job clusters are designed for batch workloads, not sub-5-minute polling loops. Job clusters have a ~3–5 min cold start. For a 1-min fre...
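One common way to restructure this kind of workload is to bucket the APIs by polling interval so each bucket runs as one long-lived task instead of paying a cold start per call. A minimal sketch; the API names and intervals are invented for illustration:

```python
from collections import defaultdict

# Hypothetical API registry; names and intervals are illustrative.
apis = [
    {"name": "orders", "interval_min": 1},
    {"name": "inventory", "interval_min": 1},
    {"name": "customers", "interval_min": 15},
    {"name": "reports", "interval_min": 60},
]

def group_by_frequency(apis):
    """Bucket APIs by polling interval so each bucket can be served by
    one long-lived job or loop, rather than one cluster start per call."""
    groups = defaultdict(list)
    for api in apis:
        groups[api["interval_min"]].append(api["name"])
    return dict(groups)

print(group_by_frequency(apis))
# -> {1: ['orders', 'inventory'], 15: ['customers'], 60: ['reports']}
```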

bi123
by New Contributor
  • 119 Views
  • 3 replies
  • 0 kudos

How to import python modules in a notebook?

I have a job with a notebook task that utilizes Python modules in a different folder than the notebook itself. When I try to import the module in the notebook, it raises a ModuleNotFoundError. I solved the problem using sys.path, but I am curious if there...

Latest Reply
shazi
New Contributor III
  • 0 kudos

While this is a classic way to solve this, it can sometimes be "brittle" if your folder structure changes or if you share the notebook with others who have different file paths. In modern notebook environments.
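The sys.path workaround the question describes can be sketched as follows. The `utils` folder and `helpers.double` function are hypothetical, and a temporary directory stands in for the real workspace layout (in Databricks Repos, the repo root is often already on sys.path, which is why this manual step can be unnecessary there):

```python
import sys
import tempfile
from pathlib import Path

# Simulate the layout from the post: a "utils" folder holding a module
# outside the notebook's own directory.
module_dir = Path(tempfile.mkdtemp()) / "utils"
module_dir.mkdir()
(module_dir / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

# The workaround: prepend the folder so the interpreter can resolve
# the import. This is the "brittle" part -- the path is hard-coded.
sys.path.insert(0, str(module_dir))
import helpers

print(helpers.double(21))  # -> 42
```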

2 More Replies
ittzzmalind
by New Contributor II
  • 60 Views
  • 1 reply
  • 0 kudos

Databricks Workspace - Unknown IP access

Azure Monitor logs show authentication requests to the Databricks workspace from an unknown IP. -- When searching for the IP at the URL below, the result shows it is from AZURE CLOUD: <Region> (the region is the same as the workspace) https://azureipranges.azurewebsites.net/SearchFor -- ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @ittzzmalind, Because the IP is in the same Azure region but not listed in the Azure Databricks control plane ranges, it's very likely not a Databricks-owned control plane IP. It's typically either a user or service coming from another Azure resou...

murtadha_s
by Databricks Partner
  • 74 Views
  • 1 reply
  • 0 kudos

Default ACL for Jobs and Clusters

Hi, I want to set a default ACL that applies to all created jobs and clusters, according to a cluster policy for example, but currently I need to apply my ACL to every created job/cluster separately. Is there a way to do that? BR,

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @murtadha_s Can you please clarify what you are after? The second part of your question sounded more like a statement: "but currently I need to apply my ACL at every created job/cluster separately," and that confused me a bit.  To make sure we poi...

sai_sakhamuri
by Databricks Partner
  • 218 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks optimization for query performance and pipeline runs

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @sai_sakhamuri, You're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked. Spark Catalyst Optimizer (working with the rules engine): Catalyst operates in fou...
