Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Mohit_m
by Valued Contributor II
  • 4010 Views
  • 2 replies
  • 3 kudos

Resolved! Could not initialize class error

User is running a job triggered from ADF in Databricks. In this job they need to use custom libraries that are in jars. Most of the time the jobs run fine, however sometimes a job fails with: java.lang.NoClassDefFoundError: Could not initialize... Any s...

Latest Reply
Mohit_m
Valued Contributor II
  • 3 kudos

Can you please check if there is more than one jar containing this class? If multiple jars of the same type are available on the cluster, there is no guarantee of the JVM picking the proper classes for processing, which results in the intermittent...

1 More Reply
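
As a way to check for the duplicate-jar condition described in the reply above, here is a minimal sketch, assuming the cluster's jars live under /databricks/jars (the class name is a placeholder):

import glob
import zipfile

# Placeholder class to look for; use the class from the NoClassDefFoundError.
class_file = "com/example/MyClass.class"

# Typical cluster jar location on Databricks; adjust if your jars live elsewhere.
hits = [jar for jar in glob.glob("/databricks/jars/*.jar")
        if class_file in zipfile.ZipFile(jar).namelist()]

print(f"{len(hits)} jar(s) contain {class_file}:")
for jar in hits:
    print(" ", jar)

More than one hit would explain the intermittent failures, since which copy the JVM loads is not deterministic.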
Jorge3
by New Contributor III
  • 2158 Views
  • 3 replies
  • 2 kudos

Resolved! [Databricks Assets Bundles] Workflow trigger on file arrival

Hi everyone! I'm setting up a workflow using Databricks Asset Bundles (DABs), and I want to configure my workflow to be triggered on file arrival. However, all the examples I've found in the documentation use schedule triggers. Does anyone know if it is...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi @Jorge3, yes, you can also use continuous mode. Please find the syntax below:

resources:
  jobs:
    dbx_job:
      name: continuous_job_name
      continuous:
        pause_status: UNPAUSED
      queue:
        enabled: true

2 More Replies
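
For the file-arrival case asked about above, job definitions in DABs also accept a file_arrival trigger; a minimal sketch, assuming the job should watch a Unity Catalog volume path (the URL and names are placeholders):

resources:
  jobs:
    dbx_job:
      name: file_arrival_job_name
      trigger:
        pause_status: UNPAUSED
        file_arrival:
          # Placeholder path; point this at the storage location to monitor.
          url: /Volumes/my_catalog/my_schema/my_volume/landing/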
ismaelhenzel
by Contributor
  • 2145 Views
  • 1 reply
  • 1 kudos

Addressing Pipeline Error Handling in Databricks bundle run with CI/CD when SUCCESS WITH FAILURES

I'm using Databricks Asset Bundles and I have pipelines that contain "if all done" rules. When running in CI/CD, if a task fails, the pipeline returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and it passes, potentially deploying a broken p...

Data Engineering
bundle
CICD
Databricks
Latest Reply
ismaelhenzel
Contributor
  • 1 kudos

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to imp...

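
A minimal sketch of the kind of CI-side check discussed in this thread, using the Databricks Python SDK to fail the build when any task in the run did not succeed (the run id is a placeholder, and DATABRICKS_HOST/DATABRICKS_TOKEN are assumed to be set):

import sys

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # reads host/token from the environment
run = w.jobs.get_run(run_id=123456789)  # placeholder; take this from the bundle run output

# A run can end as SUCCESS_WITH_FAILURES; inspect task-level states instead.
failed = [t.task_key for t in (run.tasks or [])
          if t.state and t.state.result_state != jobs.RunResultState.SUCCESS]
if failed:
    sys.exit(f"Run completed, but these tasks failed: {failed}")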
smedegaard
by New Contributor III
  • 1397 Views
  • 1 reply
  • 1 kudos

[delta live table] exception: getPrimaryKeys not implemented for debezium

I've defined a streaming delta live table in a notebook using Python, running on the "preview" channel with delta cache accelerated (Standard_D4ads_v5) compute. It fails with org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, ru...

ETLdeveloper
by New Contributor II
  • 3051 Views
  • 1 reply
  • 0 kudos

Resolved! I have to run notebooks concurrently using a process pool executor in Python

Hello All, my scenario requires me to create code that reads tables from the source catalog and writes them to the destination catalog using Spark. Doing them one by one is not a good option when there are 300 tables in the catalog. So I am trying the pr...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @ETLdeveloper, you can use multithreading to help you run notebooks in parallel. Attaching code for your reference:

from concurrent.futures import ThreadPoolExecutor

class NotebookData:
    def __init__(self, path, timeout, parameters = Non...

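
The snippet above is truncated; here is a minimal, self-contained sketch of the same pattern, assuming it runs in a Databricks notebook where dbutils is available (paths, the worker count, and parameters are placeholders):

from concurrent.futures import ThreadPoolExecutor

def run_notebook(path, timeout_seconds, parameters):
    # dbutils.notebook.run blocks until the child notebook finishes.
    return dbutils.notebook.run(path, timeout_seconds, parameters)

tables = ["table_a", "table_b", "table_c"]  # in practice, the ~300 tables

with ThreadPoolExecutor(max_workers=8) as executor:
    futures = {t: executor.submit(run_notebook, "/path/to/copy_table_notebook",
                                  3600, {"table_name": t})
               for t in tables}

for table, future in futures.items():
    print(table, future.result())  # .result() re-raises if the notebook run failed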
Anske
by New Contributor III
  • 1253 Views
  • 4 replies
  • 0 kudos

How to stop a dataframe with a federated table source from being re-evaluated when referenced (cache?)

Hi, would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe, and then writes this data...

Latest Reply
Anske
New Contributor III
  • 0 kudos

@daniel_sahal, this is the code snippet:

lsn_incr_batch = spark.sql(f"""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn,
           cast('{current_run_ts}' as timestamp) as appended
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '...

3 More Replies
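
For the caching question itself, a minimal sketch of two common options, assuming the federated query above (table and column names are placeholders):

# Option 1: persist the dataframe and force materialization with an action,
# so later references read from the cache instead of the federated source.
lsn_incr_batch.cache()
lsn_incr_batch.count()  # triggers the one federated read

# Option 2: stage the result to a local Delta table and read it back,
# which survives restarts and guarantees no re-query of the source.
lsn_incr_batch.write.mode("overwrite").saveAsTable("staging.lsn_incr_batch")
lsn_incr_batch = spark.table("staging.lsn_incr_batch")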
amar1995
by New Contributor II
  • 2482 Views
  • 4 replies
  • 0 kudos

Performance Issue with XML Processing in Spark Databricks

I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...

Latest Reply
shan_chandra
Databricks Employee
  • 0 kudos

@amar1995 - Can you try this streaming approach and see if it works for your use case (using autoloader) - https://kb.databricks.com/streaming/stream-xml-auto-loader

3 More Replies
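
A minimal sketch of the Auto Loader approach from the linked KB article, assuming a recent runtime where Auto Loader supports the XML format natively (paths, rowTag, and table names are placeholders):

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "xml")
      .option("rowTag", "record")  # XML element to treat as one row
      .option("cloudFiles.schemaLocation", "/tmp/xml_schema")
      .load("/mnt/landing/xml/"))

(df.writeStream
   .option("checkpointLocation", "/tmp/xml_checkpoint")
   .trigger(availableNow=True)
   .toTable("bronze.xml_records"))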
johnp
by New Contributor III
  • 1561 Views
  • 1 reply
  • 0 kudos

Call databricks notebook from azure flask app

I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I already created Databricks notebooks running these queries. The Flask server will pass some parameters in ...

Latest Reply
feiyun0112
Honored Contributor
  • 0 kudos

You can use the Databricks SDK: https://docs.databricks.com/en/dev-tools/sdk-python.html#create-a-job

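
A minimal sketch of that SDK suggestion from a Flask route, assuming databricks-sdk is installed and DATABRICKS_HOST/DATABRICKS_TOKEN are set (the notebook path, cluster id, and parameter names are placeholders):

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs
from flask import Flask, request

app = Flask(__name__)
w = WorkspaceClient()  # reads host/token from the environment

@app.route("/run-query")
def run_query():
    run = w.jobs.submit(
        run_name="flask-triggered-query",
        tasks=[jobs.SubmitTask(
            task_key="query",
            existing_cluster_id="1234-567890-abcde123",  # placeholder cluster id
            notebook_task=jobs.NotebookTask(
                notebook_path="/Users/me/my_query_notebook",
                base_parameters={"param": request.args.get("param", "")},
            ),
        )],
    ).result()  # blocks until the run finishes
    return {"result_state": str(run.state.result_state)}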
Kanti1989
by New Contributor II
  • 1482 Views
  • 4 replies
  • 0 kudos

Pyspark execution error

I am getting an error message when executing simple PySpark code. Can anyone help me with this?

(screenshot of the error attached)
Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Could you please share the entire error message? Are you running the code locally or on Databricks?

3 More Replies
data-grassroots
by New Contributor III
  • 3440 Views
  • 6 replies
  • 1 kudos

Resolved! Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but whose contents change over time (brand_a.csv, brand_b.csv, brand_c.csv, ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
data-grassroots
New Contributor III
  • 1 kudos

Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.

5 More Replies
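
A minimal sketch of the copy + merge pattern the thread settles on, assuming a staging Delta table and a business key column named id (all table, path, and column names are placeholders):

# Re-ingest the feed even when filenames are unchanged, into a staging table.
spark.sql("""
    COPY INTO staging.brands
    FROM '/mnt/feed/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
    COPY_OPTIONS ('force' = 'true')
""")

# Upsert staged rows into the target so changed contents replace old rows.
spark.sql("""
    MERGE INTO main.brands AS t
    USING staging.brands AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

Between runs the staging table would typically be truncated so only the latest file contents get merged.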
miaomia123
by New Contributor
  • 752 Views
  • 1 reply
  • 0 kudos

LLM using Databricks

Is there any coding example for how to use an LLM?

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

I would like to share the following links: https://www.databricks.com/product/machine-learning/large-language-models and https://docs.databricks.com/en/large-language-models/index.html

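
As a starting point beyond those links, a minimal sketch of querying a Databricks foundation-model serving endpoint over its REST API, assuming such an endpoint is enabled in the workspace (the endpoint name, host, and token are placeholders):

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://adb-xxxx.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]
endpoint = "databricks-meta-llama-3-70b-instruct"  # placeholder endpoint name

resp = requests.post(
    f"{host}/serving-endpoints/{endpoint}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "messages": [{"role": "user", "content": "Summarize Delta Lake in one sentence."}],
        "max_tokens": 100,
    },
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])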
BrianJ
by New Contributor II
  • 2476 Views
  • 5 replies
  • 4 kudos

{{job.trigger.type}} not working and throws error on Edit Parameter from Job page

Following the instructions on job parameter dynamic values, I am able to use {{job.id}}, {{job.name}}, {{job.run_id}}, {{job.repair_count}}, and {{job.start_time.[argument]}}. However, when I set trigger_type as trigger_type: {{job.trigger.type}} and hit SAVE, ...

(screenshots of the error attached)
Latest Reply
BrianJ
New Contributor II
  • 4 kudos

Thanks everyone, I decided to use the SparkContext instead: dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()

4 More Replies
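
A minimal sketch of reading trigger/run metadata from that context JSON, assuming it runs in a Databricks notebook; the key names inside the JSON (e.g. tags.jobId) vary by runtime version and are shown here as assumptions:

import json

ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)
# Job and run details typically sit under "tags" (assumed key names).
tags = ctx.get("tags", {})
print(tags.get("jobId"), tags.get("runId"))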
Phani1
by Valued Contributor II
  • 1593 Views
  • 0 replies
  • 0 kudos

Boomi integrating with Databricks

Hi Team, is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

niruban
by New Contributor II
  • 1635 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy -t prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
niruban
New Contributor II
  • 0 kudos

@Rajani: This is what I am doing. I have GitHub Actions kick off the following step:

- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug

Before running this step, I am creatin...

1 More Reply
