Data Engineering

Forum Posts

niruban
by New Contributor II
  • 333 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I am trying to deploy one workflow using "databricks bundle deploy -t prod", it is deleting all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
niruban
New Contributor II

@Rajani: This is what I am doing. I have a GitHub Actions step that kicks off the deploy:

- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug

Before running this step, I am creatin...

1 More Replies
amde99
by New Contributor
  • 327 Views
  • 2 replies
  • 0 kudos

How can I throw an exception when a .json.gz file has multiple roots?

I have a situation where source files in .json.gz sometimes arrive with invalid syntax containing multiple roots separated by empty braces []. How can I detect this and throw an exception? Currently the code runs and picks up only record set 1, and ...

Latest Reply
Lakshay
Esteemed Contributor

Schema validation should help here.

1 More Replies
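
A minimal sketch of the schema-validation approach suggested above, assuming a Databricks notebook where spark is available; the schema fields and file path are illustrative, not from the original post:

from pyspark.sql.types import StructType, StructField, StringType

# PERMISSIVE mode routes rows that don't match the schema into _corrupt_record
expected_schema = StructType([
    StructField("id", StringType(), True),
    StructField("payload", StringType(), True),
    StructField("_corrupt_record", StringType(), True),
])

df = (spark.read
      .schema(expected_schema)
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/mnt/source/data.json.gz"))  # placeholder path

df.cache()  # Spark requires this before filtering on only the corrupt-record column
bad = df.filter(df["_corrupt_record"].isNotNull()).count()
if bad > 0:
    raise ValueError(f"{bad} malformed record(s) found in source file")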
Karlo_Kotarac
by New Contributor II
  • 307 Views
  • 3 replies
  • 0 kudos

Run failed with error message ContextNotFound

Hi all! Recently we've been getting lots of these errors when running Databricks notebooks (screenshot below). At that time we observed a DRIVER_NOT_RESPONDING (Driver is up but is not responsive, likely due to GC.) log on the single-user cluster we use. Previously when thi...

[attached screenshot: Karlo_Kotarac_0-1713422302017.png]
Latest Reply
Lakshay
Esteemed Contributor

You may also try to run the failing notebook on a job cluster.

2 More Replies
Phani1
by Valued Contributor
  • 189 Views
  • 1 reply
  • 0 kudos

Code Review tools

Could you kindly recommend any Code Review tools that would be suitable for our Databricks tech stack?

Data Engineering
code review
Latest Reply
Kaniz
Community Manager

Hi @Phani1, When it comes to code review tools for your Databricks tech stack, here are some options you might find useful: Built-in Interactive Debugger in Databricks Notebook: The interactive debugger is available exclusively for Python code withi...

dilkushpatel
by New Contributor II
  • 339 Views
  • 4 replies
  • 0 kudos

Databricks connecting SQL Azure DW - Confused between Polybase and Copy Into

I see two articles in the Databricks documentation:
https://docs.databricks.com/en/archive/azure/synapse-polybase.html#language-python
https://docs.databricks.com/en/connect/external-systems/synapse-analytics.html#service-principal
The Polybase one is legacy o...

Data Engineering
azure
Copy
help
Polybase
Synapse
Latest Reply
Kaniz
Community Manager

Hi @dilkushpatel, Thank you for sharing your confusion regarding PolyBase and the COPY INTO command in Databricks when working with Azure Synapse. PolyBase (Legacy): PolyBase was previously used for data loading and unloading operations in Azure...

3 More Replies
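
For context on the COPY-based path discussed in this thread, here is a hedged sketch using the Databricks Synapse connector (format com.databricks.spark.sqldw), which uses COPY under the hood on current releases; the JDBC URL, tempDir, and table name are placeholders:

# Hedged sketch: write a DataFrame to Azure Synapse via the Databricks
# Synapse connector; replace the placeholder values with real ones.
(df.write
   .format("com.databricks.spark.sqldw")
   .option("url", "jdbc:sqlserver://<server>.database.windows.net:1433;database=<db>")
   .option("tempDir", "abfss://<container>@<account>.dfs.core.windows.net/tmp")
   .option("forwardSparkAzureStorageCredentials", "true")
   .option("dbTable", "dbo.my_table")
   .mode("append")
   .save())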
Abhi0607
by New Contributor II
  • 271 Views
  • 2 replies
  • 0 kudos

Variables passed from ADF to Databricks Notebook Try-Catch are not accessible

Dear Members, I need your help with the below scenario. I am passing a few parameters from an ADF pipeline to a Databricks notebook. If I execute the ADF pipeline to run my Databricks notebook and use these variables as-is in my code (Python), then it works fine. But as s...

Latest Reply
Ajay-Pandey
Esteemed Contributor III

Hi @Abhi0607, can you please clarify whether you are reading or defining these parameter values outside the try-catch block or inside it?

1 More Replies
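
A hedged sketch of one way to rule out the widget lookup as the failing step, assuming the ADF pipeline passes a base parameter named run_date; the parameter name and process_data helper are hypothetical:

# Read the ADF-supplied parameter *outside* the try/except, so a missing
# widget fails loudly instead of being swallowed by the handler.
run_date = dbutils.widgets.get("run_date")  # "run_date" is a hypothetical name

try:
    result = process_data(run_date)  # process_data is a hypothetical function
except Exception as err:
    print(f"Processing failed for run_date={run_date}: {err}")
    raise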
PrebenOlsen
by New Contributor III
  • 253 Views
  • 2 replies
  • 0 kudos

Job stuck while utilizing all workers

Hi! Started a job yesterday. It was iterating over data, two months at a time, and writing to a table. It successfully did this for 4 out of 6 time periods. The 5th time period, however, got stuck 5 hours in. I can find one Failed Stage that reads ...

Data Engineering
job failed
Job froze
need help
Latest Reply
-werners-
Esteemed Contributor III

As Spark is lazily evaluated, using small clusters for reads and large ones for writes is not something that will happen. The data is read when you apply an action (e.g., a write). That being said: I have no knowledge of a bug in Databricks on clusters...

1 More Replies
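
A small sketch of the lazy-evaluation point above: the read lines only build a plan, and the scan actually happens when the write action runs. Paths and table names are illustrative:

# Nothing is read here: these lines only build a logical plan.
df = spark.read.format("delta").load("/mnt/source/events")
df = df.filter("event_date >= '2024-01-01'")

# The action triggers the whole plan, including the read, on the same cluster.
df.write.mode("append").saveAsTable("catalog.schema.target")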
laurenskuiper97
by New Contributor
  • 248 Views
  • 1 reply
  • 0 kudos

JDBC / SSH-tunnel to connect to PostgreSQL not working on multi-node clusters

Hi everybody, I'm trying to set up a connection between Databricks notebooks and an external PostgreSQL database through an SSH tunnel. On a single-node cluster, this works perfectly fine. However, when this is run on a multi-node cluster, this co...

Data Engineering
clusters
JDBC
spark
SSH
Latest Reply
-werners-
Esteemed Contributor III

I doubt it is possible. The driver runs the program and sends tasks to the executors. But since creating the SSH tunnel is not a Spark task, I don't think it will be established on any executor.

surband
by New Contributor III
  • 823 Views
  • 7 replies
  • 1 kudos

Resolved! Failures Streaming data to Pulsar

I am encountering the following exception when attempting to stream data to a Pulsar topic. This is a first-time implementation; any ideas to resolve this are greatly appreciated. DBR: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12), 1 driver, 64 GB...

Latest Reply
shan_chandra
Honored Contributor III

Hi @surband - can you please share the full error stack trace? Also, please use a compatible DBR (Spark) version instead of the ML runtime. Please refer to the below document and validate that you have the necessary connector libraries added to the clust...

6 More Replies
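
For readers hitting the same issue, a hedged sketch of a structured-streaming Pulsar sink, assuming the StreamNative pulsar-spark connector is installed on the cluster and df is a streaming DataFrame; the service URL, topic, and checkpoint path are placeholders:

# Hedged sketch: stream a DataFrame to a Pulsar topic. Requires the
# pulsar-spark connector library on the cluster; values are placeholders.
(df.writeStream
   .format("pulsar")
   .option("service.url", "pulsar://<broker-host>:6650")
   .option("topic", "persistent://public/default/events")
   .option("checkpointLocation", "/mnt/checkpoints/pulsar-sink")
   .start())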
mh_db
by New Contributor II
  • 480 Views
  • 1 reply
  • 0 kudos

Write to csv file in S3 bucket

I have a pandas dataframe in my PySpark notebook. I want to save this dataframe to my S3 bucket. I'm using the following command to save it:

import boto3
import s3fs
df_summary.to_csv(f"s3://dataconversion/data/exclude", index=False)

but I keep getting thi...

Latest Reply
shan_chandra
Honored Contributor III

Hi @mh_db - you can import the botocore library, or if it is not found, do a pip install botocore to resolve this. Alternatively, you can keep the data in a Spark DataFrame without converting to a pandas DataFrame, and while writing to a CSV you ...

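
A minimal sketch of the Spark-native alternative mentioned in the reply, writing directly to S3 without converting to pandas; the bucket path is taken from the question and the DataFrame name is illustrative:

# Write the Spark DataFrame straight to S3 as CSV; no pandas/boto3 needed,
# provided the cluster has credential access to the bucket.
(spark_df.write
    .option("header", "true")
    .mode("overwrite")
    .csv("s3://dataconversion/data/exclude"))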
brian_zavareh
by New Contributor III
  • 1720 Views
  • 5 replies
  • 4 kudos

Resolved! Optimizing Delta Live Table Ingestion Performance for Large JSON Datasets

I'm currently facing challenges with optimizing the performance of a Delta Live Table pipeline in Azure Databricks. The task involves ingesting over 10 TB of raw JSON log files from an Azure Data Lake Storage account into a bronze Delta Live Table la...

Data Engineering
autoloader
bigdata
delta-live-tables
json
Latest Reply
standup1
New Contributor III

Hey @brian_zavareh, see this document. I hope this can help: https://learn.microsoft.com/en-us/azure/databricks/compute/cluster-config-best-practices Just keep in mind that there's some extra cost from the Azure VM side; check your Azure Cost Analysis for...

4 More Replies
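
A hedged sketch of the kind of Auto Loader bronze table this thread discusses, assuming a Delta Live Tables pipeline; the table name, source path, and throttling value are illustrative:

import dlt

# Hedged sketch of a bronze DLT table ingesting raw JSON with Auto Loader.
@dlt.table(name="bronze_logs")
def bronze_logs():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.maxFilesPerTrigger", "1000")  # throttle micro-batch size
        .load("abfss://logs@<account>.dfs.core.windows.net/raw/")  # placeholder path
    )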
Kibour
by Contributor
  • 998 Views
  • 3 replies
  • 1 kudos

Resolved! date_format 'LLLL' returns '1'

Hi all, in my notebook, when I run my cell with the following code:

%sql
select date_format(date '1970-01-01', "LLL");

I get '1', while I expect 'Jan' according to the doc https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html I would also expect t...

Latest Reply
Kibour
Contributor

Hi @Kaniz, turns out it was actually a Java 8 bug: IllegalArgumentException: Java 8 has a bug to support stand-alone form (3 or more 'L' or 'q' in the pattern string). Please use 'M' or 'Q' instead, or upgrade your Java version. For more details, plea...

2 More Replies
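
A quick illustration of the workaround named in the reply: use the standard 'MMM' pattern instead of the stand-alone 'LLL'/'LLLL' form that Java 8 mishandles:

# 'MMM' avoids the Java 8 stand-alone-form bug described above.
spark.sql("SELECT date_format(date '1970-01-01', 'MMM') AS month").show()
# +-----+
# |month|
# +-----+
# |  Jan|
# +-----+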
Erik_L
by Contributor II
  • 298 Views
  • 2 replies
  • 1 kudos

Visualizations failing to show

I have a SQL query that generates a table. I created a visualization from that table with the UI. I then have a widget that updates a value used in the query and re-runs the SQL, but then the visualization shows nothing, saying there is "1 row," but if...

[attached screenshot: Screenshot from 2024-04-05 10-23-03.png]
Latest Reply
Kaniz
Community Manager

Hi @Erik_L, It seems like you're encountering an issue with your visualization in Databricks. Let's troubleshoot this! Here are a few common reasons why visualizations might not display as expected: Data Issues: Ensure that your SQL query is cor...

1 More Replies
Henrique_Lino
by New Contributor II
  • 741 Views
  • 6 replies
  • 0 kudos

value is null after loading a saved df when using specific type in schema

I am facing an issue when using Databricks: when I set a specific type in my schema and read a JSON, its values are fine, but after saving my df and loading it again, the value is gone. I have this sample code that shows the issue: from pyspark.sql.typ...

Latest Reply
Lakshay
Esteemed Contributor

@Henrique_Lino, where are you saving your df?

5 More Replies
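
A hedged sketch of the round-trip being asked about, useful for narrowing down where the value disappears; since the original sample is truncated, the schema, format, and paths here are illustrative:

from pyspark.sql.types import StructType, StructField, IntegerType

# Read JSON with an explicit schema, persist, and reload to compare values.
schema = StructType([StructField("value", IntegerType(), True)])
df = spark.read.schema(schema).json("/mnt/raw/sample.json")   # placeholder path
df.show()

df.write.mode("overwrite").parquet("/mnt/out/sample")          # placeholder path
spark.read.parquet("/mnt/out/sample").show()                   # compare with above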
Kavi_007
by New Contributor III
  • 1236 Views
  • 7 replies
  • 1 kudos

Resolved! Seeing history even after vacuuming the Delta table

Hi, I'm trying to run VACUUM on a Delta table within a Unity Catalog. The default retention is 7 days. Though I vacuum the table, I'm able to see history beyond 7 days. I tried restarting the cluster but it is still not working. What would be the fix?...

Latest Reply
Kavi_007
New Contributor III

No, that's wrong. VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. VACUUM - Azu...

6 More Replies
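
To make the distinction in the reply concrete, a hedged sketch: VACUUM prunes stale data files, while DESCRIBE HISTORY reads the transaction log, whose retention is governed separately by delta.logRetentionDuration. The table name is illustrative:

# VACUUM removes unreferenced data files older than the retention threshold...
spark.sql("VACUUM main.schema.events RETAIN 168 HOURS")  # 7-day default window

# ...but history comes from the transaction log, so entries can remain visible.
spark.sql("DESCRIBE HISTORY main.schema.events").show(truncate=False)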