Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

HamzaJosh
by New Contributor II
  • 17952 Views
  • 7 replies
  • 3 kudos

I want to use databricks workers to run a function in parallel on the worker nodes

I have a function that makes API calls. I want to run it in parallel on the worker nodes of a Databricks cluster. I have tried with ThreadPoolExecutor() as executor: results = executor.map(getspeeddata, alist) to run m...

Latest Reply
HamzaJosh
New Contributor II
  • 3 kudos

You guys are not getting the point. I am making API calls in a function and want to store the results in a dataframe. I want multiple processes to run this task in parallel. How do I create a UDF and use it in a dataframe when the task is calling an ...

6 More Replies
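
For readers following the thread above, here is a minimal sketch of the thread-pool pattern the question mentions. `getspeeddata` and `alist` are the names used in the post; the body of `getspeeddata` is a hypothetical stand-in that makes no network call, and `fetch_all` is a helper name introduced only for illustration. A `ThreadPoolExecutor` runs only in the process that starts it (the driver, in a notebook), so this overlaps the I/O wait but does not by itself spread work across worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def getspeeddata(item):
    # Hypothetical stand-in for the API call in the post: a real version
    # would issue an HTTP request and parse the response.
    return {"id": item, "speed": item * 10}


def fetch_all(items, max_workers=8):
    # executor.map returns results in input order; an exception raised by a
    # call surfaces when that call's result is consumed.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(getspeeddata, items))


alist = [1, 2, 3]
results = fetch_all(alist)
print(results)
```

The resulting list of dicts could then be handed to `spark.createDataFrame(results)`. To push the calls out to the workers instead, one option is to apply the same function per batch with PySpark's `DataFrame.mapInPandas`.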
ittzzmalind
by New Contributor II
  • 40 Views
  • 1 reply
  • 0 kudos

Delta Sharing with Materialized View - recipient data not refreshing when using Open Protocol

Scenario: Delta Sharing with Materialized View
Provider side setup:
-> A Delta Share was created.
-> A materialized view was added to the share.
-> Recipients created:
   1) Open Delta Sharing recipient, accessed using Python (import delta_sharing)
   2)...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @ittzzmalind, This is expected behaviour and is mainly due to how Delta Sharing handles materialized views for open (non-Databricks) recipients versus Databricks-to-Databricks recipients. For Databricks-to-Databricks recipients, the shared materia...

David_Dabbs
by Databricks Partner
  • 142 Views
  • 3 replies
  • 1 kudos

Resolved! Inquiring whether table triggers are the recommended tool for the job

Seeking the DBRX-appropriate patterns for our application. We have a number of workspaces governed by the same Unity Catalog. One workspace we'll call the 'producer'. It manages data via external custom API interfaces. There are a number of internal c...

Latest Reply
David_Dabbs
Databricks Partner
  • 1 kudos

Thank you @lingareddy_Alva. Your considered response lives up to your forum title: Esteemed Contributor. 1. Notification. Appreciate the confirmation that the determinism and control is worth the small bit of explicit configuration given the limited ...

2 More Replies
Phani1
by Databricks MVP
  • 119 Views
  • 2 replies
  • 0 kudos

Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databric

Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @Phani1, this is a meaty topic, so let me give you a structured breakdown of the full purge/retention framework.
Core framework: layer-by-layer policies
Bronze (raw ingestion layer): The goal here is preserving source fidelity while enforcing legal/regu...

1 More Replies
mits1
by New Contributor III
  • 449 Views
  • 11 replies
  • 3 kudos

Resolved! Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Autoloader. I am reading a single-line JSON file and writing to a Delta table that does not already exist (creating it on the fly), using PySpark (below is the code).
Code: spark.readStream...

Latest Reply
karthickrs
New Contributor III
  • 3 kudos

Hi,
The extra rows could have several causes:
  • Extra files in the directory
  • Empty or corrupt records
  • Non-JSON content being picked up on the first run
You could make sure that your input path contains only valid JSON files or you could mod...

10 More Replies
BF7
by Contributor
  • 1755 Views
  • 8 replies
  • 3 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues:
1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?
2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

@mits1, if you are happy with the answer, please click on "Accept as Solution." It will give confidence to others. Cheers, Lou.

7 More Replies
Eliza_geo
by New Contributor
  • 98 Views
  • 1 reply
  • 1 kudos

Databricks SQL Alerts and Jobs Integration: Legacy vs V2

Hi, I’m facing a challenge integrating Databricks SQL alerts with a Databricks Job. After reviewing the documentation, I arrived at the following understanding and would appreciate confirmation from the community or Databricks team:
Legacy SQL alerts ca...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @Eliza_geo,
Given your Unity Catalog + serverless + DABS environment, here's what makes sense right now:
For new pipelines: Stick to legacy SQL alerts invoked via a SQL task in Jobs. Avoid the SQL Alert task (Beta) in production workflows: Beta fe...

abhishek13
by New Contributor
  • 88 Views
  • 2 replies
  • 1 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone, I’m facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance.
Problem
When I run commands from a Databricks notebook, I see intermittent errors like:
Connection reset Retrying request to https://us-eas...

Latest Reply
abhishek13
New Contributor
  • 1 kudos

Can someone help with this?

1 More Replies
MaartenH
by New Contributor III
  • 3830 Views
  • 12 replies
  • 4 kudos

Lakehouse federation for SQL server: database name with spaces

We're currently using Lakehouse Federation for various sources (Snowflake, SQL Server), usually successfully. However, we've encountered a case where one of the databases on the SQL Server has spaces in its name, e.g. 'My Database Name'. We've tried vari...

Latest Reply
QueryingQuail
New Contributor III
  • 4 kudos

Hello all, we have a good number of tables from an external ERP system that are being replicated to an existing DWH in an Azure SQL Server database. We have set up a foreign connection for this database and we can connect to the server and database. Sa...

11 More Replies
Manjusha
by New Contributor II
  • 215 Views
  • 4 replies
  • 2 kudos

Running python functions (written using polars) on databricks

Hi, we are planning to rewrite our application (which originally ran in R) in Python. We chose Polars because it seems to be faster than pandas. We have functions written in R which we are planning to convert to Python. However in one of ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@Manjusha, if you are happy with my response, please click on "Accept as Solution." It will help others trust the guidance.

3 More Replies
sdurai
by New Contributor
  • 312 Views
  • 4 replies
  • 4 kudos

Resolved! Databricks to Salesforce Core (Not cloud)

Hi,
Is there a native connector in Databricks for connecting to Salesforce core (not cloud)? If there is no native connector, what are the recommended approaches for connecting to Salesforce core?
Thanks,
Subashini

Latest Reply
Ashwin_DSA
Databricks Employee
  • 4 kudos

Hi @sdurai, Yes. Databricks has a native Salesforce connector for core Salesforce (Sales Cloud / Service Cloud / Platform objects) via Lakeflow Connect - Salesforce ingestion connector. It lets you create fully managed, incremental pipelines from Sal...

3 More Replies
P10d
by New Contributor
  • 172 Views
  • 1 reply
  • 0 kudos

Connect a Databricks cluster to Artifactory

Hello, I'm trying to connect Databricks to our own JFrog Artifactory. The objective is to download both PIP and JAR dependencies from it instead of connecting to Maven Central/PyPI. I'm struggling with JARs. My approach to solving the problem is:
1. Cre...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, I haven't got the ability to test this myself, but based on some internal research, I think the following is true: the most likely issue is your truststore configuration. Setting spark.driver.extraJavaOptions -Djavax.net.ssl.trustStore=<custom...

mordex
by New Contributor III
  • 94 Views
  • 1 reply
  • 0 kudos

Databricks workflows for APIs with different frequencies (cluster keeps restarting)

Hey everyone, I’m stuck with a Databricks workflow design and could use some advice. Currently, we are calling 70+ APIs. Right now the workflow looks something l...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hi @mordex, this is a classic high-frequency orchestration problem on Databricks. The core issue is that Databricks job clusters are designed for batch workloads, not sub-5-minute polling loops. Job clusters have a ~3–5 min cold start. For a 1-min fre...

bi123
by New Contributor
  • 143 Views
  • 3 replies
  • 0 kudos

How to import python modules in a notebook?

I have a job with a notebook task that uses Python modules in a different folder from the notebook itself. When I try to import the module in the notebook, it raises a module-not-found error. I solved the problem using sys.path. But I am curious if there...

Latest Reply
shazi
New Contributor III
  • 0 kudos

While this is a classic way to solve this, it can sometimes be "brittle" if your folder structure changes or if you share the notebook with others who have different file paths. In modern notebook environments.

2 More Replies
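
A minimal sketch of the `sys.path` approach the post describes, with a guard so the folder is added only once. The folder and module names (`shared_code`, `acme_job_helpers`) and the helper `add_to_sys_path` are hypothetical. The sketch builds the folder in a temporary directory so that it is self-contained; in a job, the path would be the real folder that holds the modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a folder of helper modules outside the notebook's folder.
module_dir = Path(tempfile.mkdtemp()) / "shared_code"
module_dir.mkdir()
(module_dir / "acme_job_helpers.py").write_text(
    "def greet(name):\n    return f'hello {name}'\n"
)


def add_to_sys_path(folder):
    # Append only if missing, so re-running the cell does not grow sys.path.
    folder = str(Path(folder).resolve())
    if folder not in sys.path:
        sys.path.append(folder)
    return folder


add_to_sys_path(module_dir)
importlib.invalidate_caches()  # the module file was created after startup
helpers = importlib.import_module("acme_job_helpers")
message = helpers.greet("databricks")
print(message)
```

Deriving the folder from a single known root, rather than hard-coding an absolute path in each notebook, is one way to reduce the brittleness the reply mentions.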