Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

LeoRickli
by New Contributor III
  • 857 Views
  • 1 reply
  • 0 kudos

Different GCP Service Account for cluster (compute) creation?

I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for isolation purpose...

Latest Reply
jennie258fitz
New Contributor III
  • 0 kudos

@LeoRickli wrote:I have a Databricks workspace that is attached to a GCP Service Account from a project named "random-production-data". I want to create a cluster (compute) on Databricks that uses a different Service Account from another project for ...

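For threads like this one, the relevant knob is the cluster-level service account. As a hedged sketch (all names below are placeholders, not taken from the thread): on Databricks on GCP, a cluster definition sent to the Clusters API can carry a `gcp_attributes.google_service_account` field that is distinct from the workspace's own service account, provided that SA has been granted the required permissions.

```python
# Sketch of a cluster spec for the Clusters API on Databricks-on-GCP.
# All names are placeholders; the key point is gcp_attributes, which lets
# the cluster run as a service account from a different GCP project than
# the workspace.
cluster_spec = {
    "cluster_name": "isolated-compute",
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "n2-standard-4",
    "num_workers": 2,
    "gcp_attributes": {
        "google_service_account": "etl-sa@other-project.iam.gserviceaccount.com",
    },
}
```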
monojmckvie
by New Contributor II
  • 876 Views
  • 1 reply
  • 0 kudos

Databricks Workflow File Based Trigger

Hi All, is there any way to define multiple paths in the file arrival trigger settings for a Databricks job? For a single path it's working fine.

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @monojmckvie, you can specify only one path, as per the documentation: https://docs.databricks.com/en/jobs/file-arrival-triggers.html

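To make the one-path limitation concrete, here is a hedged sketch of the job-level trigger block (field names per the Jobs API; the path is a placeholder). Since only one URL is accepted, watching several directories means either one job per path or pointing the trigger at a common parent directory.

```python
# Job trigger settings as sent to the Jobs API: file_arrival accepts
# exactly one "url". The path below is a placeholder.
trigger_settings = {
    "file_arrival": {
        "url": "/Volumes/main/default/landing/",
        # optional debounce between trigger evaluations
        "min_time_between_triggers_seconds": 60,
    }
}
```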
hyedesign
by New Contributor II
  • 5488 Views
  • 6 replies
  • 0 kudos

Getting SparkConnectGrpcException: (java.io.EOFException) error when using foreachBatch

Hello, I am trying to write a simple upsert statement following the steps in the tutorials. Here is what my code looks like: from pyspark.sql import functions as F def upsert_source_one(self df_source = spark.readStream.format("delta").table(self.so...

Latest Reply
seans
New Contributor III
  • 0 kudos

Here is the full message: Exception has occurred: SparkConnectGrpcException (java.io.IOException) Connection reset by peer grpc._channel._MultiThreadedRendezvous: _MultiThreadedRendezvous of RPC that terminated with: status = StatusCode.INTERNAL deta...

5 More Replies
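For readers hitting the same error, a minimal foreachBatch upsert usually follows the sketch below (table and column names are hypothetical). Keeping the MERGE construction in a pure helper also makes it easy to unit-test without a cluster.

```python
def build_merge_sql(target: str, source_view: str, key: str) -> str:
    """Build a MERGE statement for a foreachBatch upsert (names are examples)."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source_view} AS s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )

def upsert_batch(batch_df, batch_id):
    # Runs once per micro-batch on the driver. Use the session bound to the
    # micro-batch DataFrame (important under Spark Connect).
    batch_df.createOrReplaceTempView("updates")
    batch_df.sparkSession.sql(
        build_merge_sql("main.sales.orders", "updates", "order_id")
    )

# Wired up as: stream.writeStream.foreachBatch(upsert_batch).start()
```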
brianbraunstein
by New Contributor II
  • 2472 Views
  • 2 replies
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...

Latest Reply
adriennn
Valued Contributor
  • 0 kudos

It's working again in 15.4 LTS.

1 More Replies
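For reference, the documented kwargs form the thread is about looks like this (a sketch; df and min_value are example names). DataFrames and literals are bound by spark.sql itself, with no temp views or manual string interpolation.

```python
def filter_with_kwargs(spark, df, min_value: int):
    # PySpark >= 3.4 parameterized SQL: {df} is replaced by the DataFrame,
    # {min_value} by the literal. Per the thread, works on DBR 15.4 LTS.
    return spark.sql(
        "SELECT * FROM {df} WHERE value > {min_value}",
        df=df,
        min_value=min_value,
    )
```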
Hubert-Dudek
by Databricks MVP
  • 2993 Views
  • 2 replies
  • 2 kudos

foreachBatch

With parameterized SQL queries in Structured Streaming's foreachBatch, there's no longer a need to create temp views for the MERGE command.

(Attachment: structured1.png)
Latest Reply
adriennn
Valued Contributor
  • 2 kudos

Note that this functionality broke somewhere between DBR 13.3 and 15, so the best bet is 15.4 LTS. See: Solved: Parameterized spark.sql() not working - Databricks Community - 56510

1 More Replies
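The temp-view-free pattern described in the post can be sketched as follows (target table and key column are hypothetical); as noted in the reply, run it on DBR 15.4 LTS.

```python
def upsert_batch(batch_df, batch_id):
    # Parameterized SQL lets MERGE reference the micro-batch DataFrame
    # directly -- no createOrReplaceTempView needed.
    batch_df.sparkSession.sql(
        """
        MERGE INTO main.sales.orders AS t
        USING {updates} AS s
        ON t.order_id = s.order_id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
        """,
        updates=batch_df,
    )
```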
ekdz__
by Databricks Partner
  • 8235 Views
  • 5 replies
  • 10 kudos

Is there any way to save the notebook in the "Results Only" view?

Hi! I'm looking for a solution to save a notebook in HTML format that has the "Results Only" view (without the executed code). Is there any possibility to do that? Thank you

Latest Reply
Hubert-Dudek
Databricks MVP
  • 10 kudos

Use the "+New dashboard" option in the top menu (picture icon). Add results there (use display() in code to show data), and then you can export the dashboard to HTML.

4 More Replies
jonyvp
by Databricks Partner
  • 5804 Views
  • 7 replies
  • 4 kudos

Resolved! Databricks Asset Bundles complex variable for cluster configuration substitute

Using this page of the DAB docs, I tried to substitute the cluster configuration with a variable. That way, I want to predefine different job cluster configurations. Doing exactly what is used in the docs yields this error: Error: failed to load [...]/speci...

Latest Reply
filipniziol
Esteemed Contributor
  • 4 kudos

Amazing! Great!

6 More Replies
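For anyone landing here with the same error, a working shape (a hedged sketch; variable name, node type, and versions are examples) is a bundle variable declared with type: complex, which is required before a mapping can be used as its default:

```yaml
# databricks.yml (sketch)
variables:
  cluster_config:
    description: Shared job cluster definition
    type: complex            # required for non-scalar defaults
    default:
      spark_version: 15.4.x-scala2.12
      node_type_id: Standard_DS3_v2
      num_workers: 2

# referenced from a job resource:
#   job_clusters:
#     - job_cluster_key: default
#       new_cluster: ${var.cluster_config}
```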
NC
by New Contributor III
  • 1675 Views
  • 1 reply
  • 1 kudos

Logging in Databricks

Hi All, I am trying to create a log using the Python logging package. Is this allowed in Databricks, and is there any sample working code that you can share? Thank you for your guidance. Best regards, NC

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @NC, I recommend the video below, which shows how to use logging in Databricks: Introduction to Logging and Quality Control in Databricks (youtube.com)

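To answer the question directly: yes, the standard library logging module works as-is in Databricks notebooks; output goes to the cell/driver logs. A minimal sketch (the logger name and format are arbitrary):

```python
import logging

# Grab a named logger rather than configuring the root logger, so you
# don't fight Spark's own log4j/root handlers.
logger = logging.getLogger("my_etl")
logger.setLevel(logging.INFO)

if not logger.handlers:  # avoid duplicate handlers on notebook re-runs
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

logger.info("Started ingest")
```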
joshbuttler
by New Contributor II
  • 1227 Views
  • 1 reply
  • 1 kudos

Seeking Advice on Data Lakehouse Architecture with Databricks

I'm currently designing a data lakehouse architecture using Databricks and have a few questions. What are the best practices for efficiently ingesting both batch and streaming data into Delta Lake? Any recommended tools or approaches?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @joshbuttler, I think the best way is to use Auto Loader, which provides a highly efficient way to incrementally process new data while also guaranteeing that each file is processed exactly once. It supports ingestion in a batch mode (Trigger.Available...

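The Auto Loader pattern the reply describes can be sketched like this (paths, file format, and table name are placeholders). Trigger.AvailableNow drains all pending files and then stops, giving batch-style scheduled runs; omitting the trigger leaves the same pipeline running as a continuous stream.

```python
def ingest_autoloader(spark, source_dir: str, target_table: str, checkpoint: str):
    # Incremental, exactly-once file ingestion into Delta with Auto Loader.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation", checkpoint)
        .load(source_dir)
        .writeStream
        .option("checkpointLocation", checkpoint)
        .trigger(availableNow=True)   # drain new files, then stop
        .toTable(target_table)
    )
```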
Faiyaz17
by New Contributor
  • 2221 Views
  • 1 reply
  • 1 kudos

Best Practices as a Beginner

Hello everyone, I am working on a project where I need to conduct some analysis on a dataset with 1 billion rows. I extracted the Parquet file from Azure and saved it onto the DBFS. Every time I want to run SQL queries and do preprocessing/analysis, I...

Latest Reply
jennie258fitz
New Contributor III
  • 1 kudos

@Faiyaz17 wrote:Hello everyone,I am working on a project where I need to conduct some analysis on a dataset with 1 billion rows. I extracted the Parquet file from Azure and saved it onto the DBFS. Every time I want to run SQL queries and do preproces...

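One common answer to this kind of question, offered here only as a hedged sketch (path and table name are examples, not from the thread), is to persist the Parquet data as a Delta table once and run all subsequent SQL against the table, instead of re-reading the raw files on every query:

```python
def parquet_to_delta(spark, parquet_path: str, table_name: str):
    # One-time conversion; afterwards query the table by name with
    # spark.sql() instead of re-scanning raw Parquet files on DBFS.
    (
        spark.read.parquet(parquet_path)
        .write.format("delta")
        .mode("overwrite")
        .saveAsTable(table_name)
    )
```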
jpwp
by New Contributor III
  • 30506 Views
  • 9 replies
  • 9 kudos

Resolved! How to specify entry_point for python_wheel_task?

Can someone provide me an example of a python_wheel_task and what the entry_point field should be? The jobs UI help popup says this about "entry_point": "Function to call when starting the wheel, for example: main. If the entry point does not exist in...

Latest Reply
MRMintechGlobal
New Contributor II
  • 9 kudos

Just want to confirm: my project uses PDM, not Poetry, and as such uses [project.entry-points.packages] rather than [tool.poetry.scripts], and the bundle is failing to run on the cluster, as it can't find the entry point. Is this expected behavior?

8 More Replies
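The accepted answer is not shown here, so as a hedged sketch only: in PEP 621 metadata (which PDM reads), wheel entry points can be declared either as console scripts or in a named group. Both shapes below produce wheel metadata containing an entry point called main (the package name is hypothetical), which is what the python_wheel_task entry_point field refers to:

```toml
# pyproject.toml (PEP 621, works with PDM)

# Console-script form:
[project.scripts]
main = "my_package.cli:main"

# Named-group form (the group name used in some Databricks examples):
[project.entry-points.packages]
main = "my_package.cli:main"

# Poetry equivalent:
# [tool.poetry.scripts]
# main = "my_package.cli:main"
```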
ggsmith
by Contributor
  • 2817 Views
  • 1 reply
  • 2 kudos

Resolved! DLT create_table vs create_streaming_table

What is the difference between the create_table and create_streaming_table functions in dlt? For example, this is how I have created a table that streams data from Kafka, written as JSON files to a volume: @dlt.table( name="raw_orders", table_...

Latest Reply
filipniziol
Esteemed Contributor
  • 2 kudos

Hi @ggsmith, if you check the examples, you will notice that dlt.create_streaming_table is more specialized, and you may consider it to be your target. As per the documentation, check this example: https://www.reddit.com/r/databricks/comments/1b9jg3t/dedupin...

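A minimal illustration of the difference (a sketch with placeholder paths; in a real pipeline dlt and spark are provided by the runtime rather than passed in): @dlt.table defines one dataset from one function, while dlt.create_streaming_table declares a streaming target that one or more flows can append into.

```python
def define_datasets(dlt, spark):
    # dlt and spark are injected here only so the sketch is self-contained;
    # inside a pipeline you would simply `import dlt`.

    # Option 1: a table backed by a single query.
    @dlt.table(name="raw_orders")
    def raw_orders():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/Volumes/main/default/raw_orders/")
        )

    # Option 2: a declared streaming target fed by a separate flow --
    # useful when several sources append into one table.
    dlt.create_streaming_table(name="orders")

    @dlt.append_flow(target="orders")
    def orders_from_files():
        return spark.readStream.table("raw_orders")
```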
Vishalakshi
by New Contributor II
  • 17131 Views
  • 5 replies
  • 0 kudos

Need to automatically rerun the failed jobs in databricks

Hi all, I need to retrigger failed jobs automatically in Databricks. Can you please help me with all the possible ways to make this possible?

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @Vishalakshi, I responded during the weekend, but it seems the responses were lost. You have the run object here. For example, the current criterion is to return only runs where run[state][result_state] == "FAILED", so basically all failed jobs. W...

4 More Replies
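Two common mechanisms, sketched with placeholder values: automatic task-level retries configured in the job definition, and repairing an already-failed run via the Jobs API (the SDK call is commented out because it needs workspace authentication):

```python
# 1. Task-level automatic retries in the job definition (Jobs API fields):
task_settings = {
    "task_key": "etl",
    "max_retries": 3,                     # rerun the task up to 3 times
    "min_retry_interval_millis": 60_000,  # wait 1 minute between attempts
    "retry_on_timeout": True,
}

# 2. Repair an already-failed run with the databricks-sdk (requires auth):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# w.jobs.repair_run(run_id=123, rerun_all_failed_tasks=True)
```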
cadull
by New Contributor II
  • 1957 Views
  • 2 replies
  • 1 kudos

Permission Issue with IDENTIFIER clause

Hi all, we are parameterizing environment-specific catalog names (like `mycatalog_dev` vs. `mycatalog_prd`) in Lakeview dashboard queries like this: SELECT * FROM IDENTIFIER(:catalog_name || '.myschema.mytable'), which works fine in most cases. We have o...

Latest Reply
madams
Contributor III
  • 1 kudos

I've had quite a bit of fun with UC and view permissions. I don't think this is specific to using the IDENTIFIER() function, but I suspect it's related to UC permissions. What you'll need to ensure: the user or group who owns the view on catalog_b h...

1 More Replies
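For reference, the parameterized-catalog pattern from the question, written as PySpark (a sketch; the schema and table names are from the post). If this fails with a permission error only for views, check that the view's owner has SELECT on the underlying tables, since Unity Catalog evaluates view access against the owner's grants rather than the caller's.

```python
def read_env_table(spark, catalog_name: str):
    # Named parameter markers (:catalog_name) are bound via args; IDENTIFIER
    # turns the resulting string into a resolvable table reference.
    return spark.sql(
        "SELECT * FROM IDENTIFIER(:catalog_name || '.myschema.mytable')",
        args={"catalog_name": catalog_name},
    )
```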