Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NW1000
by New Contributor III
  • 996 Views
  • 6 replies
  • 0 kudos

Shorten Classic Cluster start up time

We use R notebooks to generate workflows, so we have to use classic clusters. We need roughly 10 additional R packages plus 2 PyPI packages, and the cluster takes at least 10-20 min to start. We found that most of the time was taken by the packag...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hi @NW1000, glad you tried my suggestion, and thanks for sharing the details. 1. Why the init script failed: This message, "Init script failure: Cluster scoped init script ... failed: Script exit status is non-zero", really just means that something ins...

  • 0 kudos
5 More Replies
ChristianRRL
by Honored Contributor
  • 1187 Views
  • 4 replies
  • 3 kudos

Resolved! Passing Parameters *between* Workflow run_job steps

Hi there, I'm trying to reference a task value - let's call it `output_path` (not known until programmatically generated by the code) - that is created in a nested task (Child 1) within a run_job (Parent 1) as an input parameter - let's call it `inpu...

Latest Reply
ChristianRRL
Honored Contributor
  • 3 kudos

Quick update, my question effectively boils down to: Do Databricks workflows have "global" variables that can be set programmatically from anywhere in the workflow (e.g. nested notebook task inside a parent run_job task) during runtime and be referenc...

  • 3 kudos
3 More Replies
AanchalSoni
by Databricks Partner
  • 1152 Views
  • 6 replies
  • 1 kudos

Resolved! Unable to read files using Auto Loader

Hi! I'm trying to create an ETL pipeline that reads data from a UC volume, but Databricks is not allowing me to do so. The following error is generated: AnalysisException: [RequestId=a11e017b-61db-4c30-a03a-d7cce55e5aea ErrorClass=INVALID_PARAMETER...

Latest Reply
balajij8
Contributor III
  • 1 kudos

@AanchalSoni You can update the code to use the volumes that are already available, or create the volumes (volume1 & volume2) and use the code below for Auto Loader on JSON files: df = (spark.readStream.format("cloudFiles").option("cloudFiles.format", "json").op...

  • 1 kudos
5 More Replies
MaartenH
by New Contributor III
  • 5494 Views
  • 13 replies
  • 5 kudos

Lakehouse federation for SQL server: database name with spaces

We're currently using Lakehouse Federation for various sources (Snowflake, SQL Server), usually successfully. However, we've encountered a case where one of the databases on the SQL Server has spaces in its name, e.g. 'My Database Name'. We've tried vari...

Latest Reply
QueryingQuail
New Contributor III
  • 5 kudos

Hello all, We have a good number of tables from an external ERP system that are being replicated to an existing DWH in an Azure SQL Server database. We have set up a foreign connection for this database and we can connect to the server and database. Sa...

  • 5 kudos
12 More Replies
helius_205
by New Contributor II
  • 444 Views
  • 1 replies
  • 0 kudos

Resolved! Does a delta live table automatically perform increments without needing timestamp columns?

The code:
import dlt
from pyspark.sql.functions import col
@dlt.table(
    name="silver_customers",
    comment="Cleaned customers data from bronze")
@dlt.expect("valid_email", "email IS NOT NULL")
@dlt.expect("valid_customer_id", "customer_id IS NOT NULL...

Latest Reply
Sumit_7
Esteemed Contributor
  • 0 kudos

@helius_205 I doubt it. Check the execution mode; it should be triggered. Also, the code uses a normal read instead of readStream. See the docs for a better understanding.

  • 0 kudos
AngelShrestha
by Databricks Partner
  • 711 Views
  • 5 replies
  • 2 kudos

Error updating schema: SCHEMA_FOREIGN_SQLSERVER update_mask requirement.

What I tried: Updating the description via UI (AI Suggested Description / manual edit). I’m running into an issue while trying to update the description for the schema. Context: Type: SCHEMA_FOREIGN_SQLSERVER. Error message: Failed to save description. Please...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, yes, 100%. If you use Lakeflow Connect, it will ingest the data into managed tables, which support descriptions and comments. You should also see some query improvement, as you're actually moving the data rather than queryi...

  • 2 kudos
4 More Replies
HamzaJosh
by New Contributor II
  • 18738 Views
  • 7 replies
  • 3 kudos

I want to use databricks workers to run a function in parallel on the worker nodes

I have a function that makes API calls, and I want to run it in parallel on the workers of a Databricks cluster. I have tried `with ThreadPoolExecutor() as executor: results = executor.map(getspeeddata, alist)` to run m...
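The `ThreadPoolExecutor` attempt quoted in this post can be written out as a small runnable sketch. `get_speed_data` below is a hypothetical stand-in for the poster's `getspeeddata` (the real function and its API are not shown in the post), and `fetch_all` is an illustrative helper name. Note that a thread pool runs inside the driver's Python process, so it can overlap I/O-bound API calls but does not, by itself, distribute work to worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def get_speed_data(item):
    """Hypothetical stand-in for the poster's API-calling function."""
    # A real version would issue an HTTP request here; this stub only
    # derives a value from its input so the example runs without a network.
    return {"id": item, "speed": item * 10}


def fetch_all(items, max_workers=8):
    """Call get_speed_data on every item using a pool of driver-side threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order; list() waits for all of them.
        return list(executor.map(get_speed_data, items))


rows = fetch_all([1, 2, 3])
```

In a notebook, a list of dictionaries like `rows` could then be handed to `spark.createDataFrame` to store the results in a DataFrame.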

Latest Reply
HamzaJosh
New Contributor II
  • 3 kudos

You guys are not getting the point. I am making API calls in a function and want to store the results in a dataframe. I want multiple processes to run this task in parallel. How do I create a UDF and use it in a dataframe when the task is calling an ...

  • 3 kudos
6 More Replies
ittzzmalind
by New Contributor III
  • 752 Views
  • 1 replies
  • 1 kudos

Resolved! Delta Sharing with Materialized View - recipient data not refreshing when using Open Protocol

Scenario: Delta Sharing with Materialized View
Provider Side Setup:
-> A Delta Share was created.
-> A materialized view was added to the share.
-> Recipients created:
   1) Open Delta Sharing recipient, accessed using Python (import delta_sharing)
   2)...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ittzzmalind, This is expected behaviour and is mainly due to how Delta Sharing handles materialized views for open (non-Databricks) recipients versus Databricks-to-Databricks recipients. For Databricks-to-Databricks recipients, the shared materia...

  • 1 kudos
David_Dabbs
by Databricks Partner
  • 962 Views
  • 3 replies
  • 1 kudos

Resolved! Inquiring whether table triggers are the recommended tool for the job

Seeking the DBRX-appropriate patterns for our application. We have a number of workspaces governed by the same Unity Catalog. One workspace, which we'll call the 'producer', manages data via external custom API interfaces. There are a number of internal c...

Latest Reply
David_Dabbs
Databricks Partner
  • 1 kudos

Thank you @lingareddy_Alva. Your considered response lives up to your forum title: Esteemed Contributor. 1. Notification. Appreciate the confirmation that the determinism and control are worth the small bit of explicit configuration given the limited ...

  • 1 kudos
2 More Replies
R_Chaitanya
by New Contributor
  • 596 Views
  • 1 replies
  • 1 kudos
Latest Reply
Sumit_7
Esteemed Contributor
  • 1 kudos

@R_Chaitanya Check the course status first; maybe a lesson is still in progress. Follow: User Menu >> My Activities >> Courses. Then check whether all of them are completed.

  • 1 kudos
mits1
by New Contributor III
  • 4163 Views
  • 11 replies
  • 3 kudos

Resolved! Autoloader inserts null rows in delta table while reading json file

Hi, I am exploring schema inference and schema evolution using Auto Loader. I am reading a single-line JSON file and writing to a Delta table that does not exist yet (creating it on the fly), using PySpark (below is the code). Code: spark.readStream...

Latest Reply
karthickrs
New Contributor III
  • 3 kudos

Hi,
The extra rows could have been caused by various reasons:
  • Extra files in the directory
  • Empty or corrupt records
  • Non-JSON content being picked up on the first run
You could make sure that your input path contains only valid JSON files or you could mod...
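One way to act on the "only valid JSON files" suggestion in this reply is to scan the input directory before starting the stream. The sketch below is only an illustration under assumptions: each file is expected to hold a single JSON document (as in the original question), the directory is readable through ordinary Python file APIs, and `find_suspect_files` is a hypothetical helper name, not part of Auto Loader.

```python
import json
import tempfile
from pathlib import Path


def find_suspect_files(directory):
    """Return names of files that are empty, unreadable as UTF-8, or not valid JSON."""
    suspects = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8").strip()
            if not text:
                raise ValueError("empty file")
            # Assumption: the whole file is one JSON document.
            json.loads(text)
        except ValueError:
            # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            suspects.append(path.name)
    return suspects


# Small demonstration on a temporary directory with one good and two bad files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.json").write_text('{"id": 1}', encoding="utf-8")
    Path(tmp, "empty.json").write_text("", encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not json", encoding="utf-8")
    suspects = find_suspect_files(tmp)
```

Any file names returned this way are candidates for the empty, corrupt, or non-JSON inputs listed above.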

  • 3 kudos
10 More Replies
BF7
by Contributor
  • 3074 Views
  • 8 replies
  • 3 kudos

Resolved! Using cloudFiles.inferColumnTypes with inferSchema and without defining schema checkpoint

Two issues:
1. What is the behavior of cloudFiles.inferColumnTypes with and without cloudFiles.inferSchema? Why would you use both?
2. When can cloudFiles.inferColumnTypes be used without a schema checkpoint? How does that affect the behavior of cloud...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

@mits1, if you are happy with the answer, please click on "Accept as Solution." It will give confidence to others. Cheers, Lou.

  • 3 kudos
7 More Replies
Eliza_geo
by New Contributor II
  • 775 Views
  • 1 replies
  • 1 kudos

Databricks SQL Alerts and Jobs Integration: Legacy vs V2

Hi, I’m facing a challenge integrating Databricks SQL alerts with a Databricks Job. After reviewing the documentation, I arrived at the following understanding and would appreciate confirmation from the community or Databricks team: Legacy SQL alerts ca...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @Eliza_geo, given your Unity Catalog + serverless + DABS environment, here's what makes sense right now. For new pipelines: Stick to legacy SQL alerts invoked via a SQL task in Jobs. Avoid the SQL Alert task (Beta) in production workflows. Beta fe...

  • 1 kudos
abhishek13
by New Contributor II
  • 508 Views
  • 2 replies
  • 1 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone, I’m facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance. Problem: When I run commands from a Databricks notebook, I see intermittent errors like: Connection reset Retrying request to https://us-eas...

Latest Reply
abhishek13
New Contributor II
  • 1 kudos

Can someone help with this?

  • 1 kudos
1 More Replies
Manjusha
by New Contributor II
  • 1120 Views
  • 4 replies
  • 2 kudos

Running python functions (written using polars) on databricks

Hi, we are planning to rewrite our application (which originally ran in R) in Python. We chose Polars because it seems to be faster than pandas. We have functions written in R that we plan to convert to Python. However, in one of ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

@Manjusha, if you are happy with my response, please click on "Accept as Solution." It will help others trust the guidance.

  • 2 kudos
3 More Replies