Data Engineering

Forum Posts

niruban
by New Contributor II
  • 333 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I am trying to deploy one workflow using "databricks bundle deploy -t prod", it is deleting all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
niruban
New Contributor II

@Rajani: This is what I am doing. I have a GitHub Actions step that kicks off the deploy:

- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug

Before running this step, I am creatin...

1 More Replies
amde99
by New Contributor
  • 327 Views
  • 2 replies
  • 0 kudos

How can I throw an exception when a .json.gz file has multiple roots?

I have a situation where source files in .json.gz sometimes arrive with invalid syntax containing multiple roots separated by empty braces []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...

Latest Reply
Lakshay
Esteemed Contributor

Schema validation should help here.

1 More Replies
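
A minimal sketch of the schema-validation approach suggested above, assuming a Databricks notebook where spark is available; the schema fields and file path are illustrative, not from the original post:

from pyspark.sql.types import StructType, StructField, StringType

# PERMISSIVE mode routes rows that don't match the schema into _corrupt_record
expected_schema = StructType([
    StructField("id", StringType(), True),
    StructField("payload", StringType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (spark.read
      .schema(expected_schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/mnt/source/data.json.gz"))  # placeholder path

df.cache()  # Spark requires this before filtering on only the corrupt-record column
bad = df.filter(df["_corrupt_record"].isNotNull()).count()
if bad > 0:
    raise ValueError(f"{bad} malformed record(s) found in source file")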
Karlo_Kotarac
by New Contributor II
  • 307 Views
  • 3 replies
  • 0 kudos

Run failed with error message ContextNotFound

Hi all! Recently we've been getting lots of these errors when running Databricks notebooks (screenshot below). At that time we observed a DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously when thi...

[attached screenshot: Karlo_Kotarac_0-1713422302017.png]
Latest Reply
Lakshay
Esteemed Contributor

You may also try to run the failing notebook on a job cluster.

2 More Replies
Phani1
by Valued Contributor
  • 189 Views
  • 1 reply
  • 0 kudos

Code Review tools

Could you kindly recommend any Code Review tools that would be suitable for our Databricks tech stack?

Data Engineering
code review
Latest Reply
Kaniz
Community Manager

Hi @Phani1, When it comes to code review tools for your Databricks tech stack, here are some options you might find useful: Built-in Interactive Debugger in Databricks Notebook: The interactive debugger is available exclusively for Python code withi...

dilkushpatel
by New Contributor II
  • 339 Views
  • 4 replies
  • 0 kudos

Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into

I see two articles in the Databricks documentation:
https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python
https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal
The Polybase one is legacy o...

Data Engineering
azure
Copy
help
Polybase
Synapse
Latest Reply
Kaniz
Community Manager

Hi @dilkushpatel, Thank you for sharing your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse. PolyBase (Legacy): PolyBase was previously used for data loading and unloading operations in Azure...

3 More Replies
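
For context on the COPY-based path discussed in this thread, here is a hedged sketch using the Databricks Synapse connector (format com.databricks.spark.sqldw), which uses COPY under the hood on current releases; the JDBC URL, tempDir, and table name are placeholders:

# Hedged sketch: write a DataFrame to Azure Synapse via the Databricks
# Synapse connector; replace the placeholder values with real ones.
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())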
Abhi0607
by New Contributor II
  • 271 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help with the below scenario. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Hi @Abhi0607, can you please clarify whether you are reading or defining these parameter values outside the try-catch block or inside it?

1 More Replies
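
A hedged sketch of one way to rule out the widget lookup as the failing step, assuming the ADF pipeline passes a base parameter named run_date; the parameter name and process_data helper are hypothetical:

# Read the ADF-supplied parameter *outside* the try/except, so a missing
# widget fails loudly instead of being swallowed by the handler.
run_date = dbutils.widgets.get("run_date")  # "run_date" is a hypothetical name

try:
    result = process_data(run_date)  # process_data is a hypothetical function
except Exception as err:
    print(f"Processing failed for run_date={run_date}: {err}")
    raise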
PrebenOlsen
by New Contributor III
  • 253 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! Started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It successfully did this for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one Failed Stage that reads ...

Data Engineering
job failed
Job froze
need help
Latest Reply
-werners-
Esteemed Contributor III

As Spark is lazily evaluated, using small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (e.g., a write). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Replies
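
A small sketch of the lazy-evaluation point above: the read lines only build a plan, and the scan actually happens when the write action runs. Paths and table names are illustrative:

# Nothing is read here: these lines only build a logical plan.
df = spark.read.format("delta").load("/mnt/source/events")
df = df.filter("event_date >= '2024-01-01'")

# The action triggers the whole plan, including the read, on the same cluster.
df.write.mode("append").saveAsTable("catalog.schema.target")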
laurenskuiper97
by New Contributor
  • 248 Views
  • 1 reply
  • 0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Data Engineering
clusters
JDBC
spark
SSH
Latest Reply
-werners-
Esteemed Contributor III

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.

surband
by New Contributor III
  • 823 Views
  • 7 replies
  • 1 kudos

Resolved! Failures Streaming data to Pulsar

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation; any ideas to resolve this are greatly appreciated. DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12), 1 driver, 64 GB...

Latest Reply
shan_chandra
Honored Contributor III

Hi @surband - can you please share the full error stack trace? Also, please use a compatible DBR (Spark) version instead of the ML runtime. Please refer to the below document and validate that you have the necessary connector libraries added to the clust...

6 More Replies
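
For readers hitting the same issue, a hedged sketch of a structured-streaming Pulsar sink, assuming the StreamNative pulsar-spark connector is installed on the cluster and df is a streaming DataFrame; the service URL, topic, and checkpoint path are placeholders:

# Hedged sketch: stream a DataFrame to a Pulsar topic. Requires the
# pulsar-spark connector library on the cluster; values are placeholders.
(df.writeStream
   .format("pulsar")
   .option("service.url", "pulsar://<broker-host>:6650")
   .option("topic", "persistent://public/default/events")
   .option("checkpointLocation", "/mnt/checkpoints/pulsar-sink")
   .start())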
mh_db
by New Contributor II
  • 480 Views
  • 1 reply
  • 0 kudos

Write to csv file in S3 bucket

I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it:

import boto3
import s3fs
df_summary.to_csv(f"s3://dataconversion/data/exclude", index=False)

but I keep getting thi...

Latest Reply
shan_chandra
Honored Contributor III

Hi @mh_db - you can import the botocore library, or if it is not found, do a pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame without converting to a pandas DataFrame, and while writing to a CSV you ...

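
A minimal sketch of the Spark-native alternative mentioned in the reply, writing directly to S3 without converting to pandas; the bucket path is taken from the question and the DataFrame name is illustrative:

# Write the Spark DataFrame straight to S3 as CSV; no pandas/boto3 needed,
# provided the cluster has credential access to the bucket.
(spark_df.write
    .option("header", "true")
    .mode("overwrite")
    .csv("s3://dataconversion/data/exclude"))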
brian_zavareh
by New Contributor III
  • 1720 Views
  • 5 replies
  • 4 kudos

Resolved! Optimizing Delta Live Table Ingestion Performance for Large JSON Datasets

I'm currently facing challenges with optimizing the performance of a Delta Live Table pipeline in Azure Databricks. The task involves ingesting over 10 TB of raw JSON log files from an Azure Data Lake Storage account into a bronze Delta Live Table la...

Data Engineering
autoloader
bigdata
delta-live-tables
json
Latest Reply
standup1
New Contributor III

Hey @brian_zavareh, see this document. I hope this can help: https://learn.microsoft.com/en-us/azure/databricks/compute/cluster-config-best-practices Just keep in mind that there's some extra cost from the Azure VM side; check your Azure Cost Analysis for...

4 More Replies
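
A hedged sketch of the kind of Auto Loader bronze table this thread discusses, assuming a Delta Live Tables pipeline; the table name, source path, and throttling value are illustrative:

import dlt

# Hedged sketch of a bronze DLT table ingesting raw JSON with Auto Loader.
@dlt.table(name="bronze_logs")
def bronze_logs():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.maxFilesPerTrigger", "1000")  # throttle micro-batch size
        .load("abfss://logs@<account>.dfs.core.windows.net/raw/")  # placeholder path
    )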
Kibour
by Contributor
  • 998 Views
  • 3 replies
  • 1 kudos

Resolved! date_format 'LLLL' returns '1'

Hi all, in my notebook, when I run my cell with the following code:

%sql
select date_format(date '1970-01-01', "LLL");

I get '1', while I expect 'Jan' according to the doc https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html I would also expect t...

Latest Reply
Kibour
Contributor

Hi @Kaniz, turns out it was actually a Java 8 bug: IllegalArgumentException: Java 8 has a bug to support stand-alone form (3 or more 'L' or 'q' in the pattern string). Please use 'M' or 'Q' instead, or upgrade your Java version. For more details, plea...

2 More Replies
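
A quick illustration of the workaround named in the reply: use the standard 'MMM' pattern instead of the stand-alone 'LLL'/'LLLL' form that Java 8 mishandles:

# 'MMM' avoids the Java 8 stand-alone-form bug described above.
spark.sql("SELECT date_format(date '1970-01-01', 'MMM') AS month").show()
# +-----+
# |month|
# +-----+
# |  Jan|
# +-----+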
Erik_L
by Contributor II
  • 298 Views
  • 2 replies
  • 1 kudos

Visualizations failing to show

I have a SQL query that generates a table. I created a visualization from that table with the UI. I then have a widget that updates a value used in the query and re-runs the SQL, but then the visualization shows nothing, saying there is "1 row," but if...

[attached screenshot: Screenshot from 2024-04-05 10-23-03.png]
Latest Reply
Kaniz
Community Manager

Hi @Erik_L, It seems like you're encountering an issue with your visualization in Databricks. Let's troubleshoot this! Here are a few common reasons why visualizations might not display as expected: Data Issues: Ensure that your SQL query is cor...

1 More Replies
Henrique_Lino
by New Contributor II
  • 741 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows the issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Esteemed Contributor

@Henrique_Lino, where are you saving your df?

5 More Replies
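
A hedged sketch of the round-trip being asked about, useful for narrowing down where the value disappears; since the original sample is truncated, the schema, format, and paths here are illustrative:

from pyspark.sql.types import StructType, StructField, IntegerType

# Read JSON with an explicit schema, persist, and reload to compare values.
schema = StructType([StructField("value", IntegerType(), True)])
df = spark.read.schema(schema).json("/mnt/raw/sample.json")   # placeholder path
df.show()

df.write.mode("overwrite").parquet("/mnt/out/sample")          # placeholder path
spark.read.parquet("/mnt/out/sample").show()                   # compare with above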
Kavi_007
by New Contributor III
  • 1236 Views
  • 7 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm able to see history beyond 7 days. I tried restarting the cluster but it is still not working. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...

6 More Replies
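
To make the distinction in the reply concrete, a hedged sketch: VACUUM prunes stale data files, while DESCRIBE HISTORY reads the transaction log, whose retention is governed separately by delta.logRetentionDuration. The table name is illustrative:

# VACUUM removes unreferenced data files older than the retention threshold...
spark.sql("VACUUM main.schema.events RETAIN 168 HOURS")  # 7-day default window

# ...but history comes from the transaction log, so entries can remain visible.
spark.sql("DESCRIBE HISTORY main.schema.events").show(truncate=False)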