Data Engineering

Forum Posts

ismaelhenzel
by New Contributor II
  • 509 Views
  • 2 replies
  • 1 kudos

Resolved! Handling pipeline errors in CI/CD when databricks bundle run returns SUCCESS_WITH_FAILURES

I'm using Databricks asset bundles, and my pipelines contain "all done" run-if rules. When running in CI/CD, if a task fails, the run returns a message like "the job xxxx SUCCESS_WITH_FAILURES" and the pipeline passes, potentially deploying a broken p...

Data Engineering
bundle
CICD
Databricks
Latest Reply
ismaelhenzel
New Contributor II
  • 1 kudos

Awesome answer, I will try the first approach. I think it is a less intrusive solution than changing the rules of my pipeline in development scenarios. This way, I can maintain a general pipeline for deployment across all environments. We plan to imp...

1 More Replies
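
A minimal sketch of one way to gate CI/CD on the real run outcome, using the databricks-sdk (the accepted answer is not quoted in full above, so the job ID, lookup, and exit handling here are illustrative assumptions, not the answer from this thread):

import sys
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import RunResultState

w = WorkspaceClient()  # host/token read from the environment or .databrickscfg

JOB_ID = 123  # hypothetical: the job started by `databricks bundle run`

# Inspect the most recent completed run and fail the CI step unless it fully succeeded,
# so SUCCESS_WITH_FAILURES no longer lets a broken pipeline through.
latest = next(w.jobs.list_runs(job_id=JOB_ID, completed_only=True, limit=1), None)
if latest is None or latest.state.result_state != RunResultState.SUCCESS:
    state = latest.state.result_state if latest else "no completed runs"
    print(f"Run did not fully succeed (result_state={state}); failing the CI step.")
    sys.exit(1)
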
Jorge3
by New Contributor II
  • 46 Views
  • 2 replies
  • 0 kudos

[Databricks Asset Bundles] Workflow trigger on file arrival

Hi everyone! I'm setting up a workflow using Databricks Asset Bundles (DABs), and I want to configure my workflow to trigger on file arrival. However, all the examples I've found in the documentation use schedule triggers. Does anyone know if it is...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Jorge3 Yes, you can use a file_arrival trigger with DABs. Find the configuration below for your reference -
resources:
  jobs:
    FileBasedJob:
      name: FileBasedJob
      trigger:
        pause_status: UNPAUSED
        file_arrival:
          url: abfss://test@...

1 More Replies
smedegaard
by New Contributor III
  • 53 Views
  • 2 replies
  • 1 kudos

[delta live table] exception: getPrimaryKeys not implemented for debezium

I've defined a streaming Delta Live Table in a notebook using Python, running on the "preview" channel with delta-cache-accelerated (Standard_D4ads_v5) compute. It fails with org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, ru...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @smedegaard,  You’re encountering a StreamingQueryException with the message: “getPrimaryKeys not implemented for debezium SQLSTATE: XXKST.” This error suggests that the getPrimaryKeys operation is not supported for the Debezium connector in your ...

1 More Replies
Phani1
by Valued Contributor
  • 47 Views
  • 1 replies
  • 0 kudos

Boomi integrating with Databricks

Hi Team, Is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Let’s explore the integration of Databricks with Boomi and compare it to Azure Event Hub. Databricks Integration with Boomi: Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...

ETLdeveloper
by New Contributor II
  • 53 Views
  • 1 replies
  • 0 kudos

Resolved! I have to run notebooks concurrently using a process pool executor in Python

Hello All, My scenario requires me to write code that reads tables from the source catalog and writes them to the destination catalog using Spark. Doing this one by one is not a good option when there are 300 tables in the catalog. So I am trying the pr...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @ETLdeveloper You can use multithreading to help you run notebooks in parallel. Attaching code for your reference -
from concurrent.futures import ThreadPoolExecutor

class NotebookData:
    def __init__(self, path, timeout, parameters = Non...

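
The code in the reply above is cut off; below is a minimal sketch of the same ThreadPoolExecutor idea for running one notebook per table with dbutils.notebook.run, intended to run inside a Databricks notebook (the notebook path, table list, and worker count are illustrative assumptions):

from concurrent.futures import ThreadPoolExecutor

tables = ["table_a", "table_b", "table_c"]   # hypothetical: the ~300 source table names
NOTEBOOK_PATH = "/Repos/etl/copy_table"      # hypothetical notebook that copies a single table
TIMEOUT_SECONDS = 3600

def copy_table(table_name):
    # dbutils.notebook.run blocks until the child notebook finishes and returns its exit value
    return dbutils.notebook.run(NOTEBOOK_PATH, TIMEOUT_SECONDS, {"table_name": table_name})

# Threads (not processes) are enough here: each call mostly waits on a child notebook run.
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(copy_table, tables))
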
TitaMn
by Visitor
  • 53 Views
  • 1 replies
  • 0 kudos

AzureDevOps and Databricks Connection using managed identity or service principal

Hi All! I'm in a project where I need to connect Azure DevOps and Databricks using managed identity, to avoid using service accounts, PATs, etc. The thing is I can't move forward with the connection since I cannot take ownership of the files wh...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @TitaMn, Connecting Azure DevOps and Azure Databricks using managed identity is a great approach to enhance security and avoid using service accounts or personal access tokens (PATs). Let’s explore some options: Azure Managed Identity for Dat...

Anske
by New Contributor II
  • 66 Views
  • 4 replies
  • 0 kudos

how to stop a dataframe with a federated table source from being re-evaluated when referenced (cache?)

Hi, Would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe and then writes this data...

Latest Reply
Anske
New Contributor II
  • 0 kudos

@daniel_sahal , this is the code snippet:
lsn_incr_batch = spark.sql(f"""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn,
           cast('{current_run_ts}' as timestamp) as appended
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '...

3 More Replies
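
The resolution is not shown in the excerpts above; a common pattern is to persist the federated query result and force materialization with an eager action so later references reuse the cached data rather than re-querying the source. A minimal sketch, with illustrative table names and filter values:

# Read the federated table once and persist the filtered result on the cluster
lsn_incr_batch = spark.table("externaldb.cdc.lsn_time_mapping") \
    .where("tran_end_time > '2024-01-01 00:00:00'")   # illustrative filter

lsn_incr_batch = lsn_incr_batch.persist()
lsn_incr_batch.count()   # eager action materializes the cache

# Later references now read from the cache instead of re-querying the federated source
lsn_incr_batch.write.mode("append").saveAsTable("main.cdc.lsn_time_mapping_incr")  # hypothetical target
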
CarstenWeber
by New Contributor
  • 77 Views
  • 4 replies
  • 1 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with:
model = PipelineModel.load(path)
I set the Spark config:
spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provi...

Latest Reply
CarstenWeber
New Contributor
  • 1 kudos

@daniel_sahal using the settings above did indeed work. 

3 More Replies
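
For reference, a minimal sketch of the per-storage-account OAuth settings typically used for abfss access with a service principal; the exact settings accepted in this thread are not quoted above, and the storage account, secret scope, and model path below are placeholders:

from pyspark.ml import PipelineModel

storage_account = "mystorageaccount"  # placeholder
client_id = dbutils.secrets.get("my-scope", "sp-client-id")          # placeholder secret scope/keys
client_secret = dbutils.secrets.get("my-scope", "sp-client-secret")
tenant_id = dbutils.secrets.get("my-scope", "tenant-id")

base = f"{storage_account}.dfs.core.windows.net"
spark.conf.set(f"fs.azure.account.auth.type.{base}", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{base}",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{base}", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{base}", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{base}",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

model = PipelineModel.load(f"abfss://models@{base}/my_model")  # placeholder path
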
amar1995
by New Contributor
  • 302 Views
  • 4 replies
  • 0 kudos

Performance Issue with XML Processing in Spark Databricks

I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@amar1995 - Can you try this streaming approach and see if it works for your use case (using autoloader) - https://kb.databricks.com/streaming/stream-xml-auto-loader

3 More Replies
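
A minimal sketch of the Auto Loader approach suggested above, assuming a recent runtime with native XML support; the paths, rowTag, and target table are placeholders:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "xml")                     # native XML support needs a recent runtime
      .option("rowTag", "record")                             # placeholder: element that delimits one row
      .option("cloudFiles.schemaLocation", "/Volumes/main/default/chk/xml_schema")  # placeholder
      .load("abfss://landing@mystorage.dfs.core.windows.net/xml/"))                 # placeholder source

(df.writeStream
   .option("checkpointLocation", "/Volumes/main/default/chk/xml_bronze")  # placeholder
   .trigger(availableNow=True)        # process the backlog incrementally, then stop
   .toTable("main.default.xml_bronze"))                                   # placeholder target
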
AnaMocanu
by Visitor
  • 88 Views
  • 1 replies
  • 0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via lakehouse federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you

johnp
by New Contributor II
  • 64 Views
  • 1 replies
  • 0 kudos

Call databricks notebook from azure flask app

I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I already created Databricks notebooks running these queries. The Flask server will pass some parameters in ...

Latest Reply
feiyun0112
Contributor
  • 0 kudos

You can use the Databricks SDK: https://docs.databricks.com/en/dev-tools/sdk-python.html#create-a-job

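
A minimal sketch of triggering such a notebook job from Flask with the Databricks SDK's run_now; the job ID, route, parameter passing, and environment-based auth are illustrative assumptions:

from flask import Flask, request, jsonify
from databricks.sdk import WorkspaceClient

app = Flask(__name__)
w = WorkspaceClient()  # host/token taken from DATABRICKS_HOST / DATABRICKS_TOKEN env vars

NOTEBOOK_JOB_ID = 123  # hypothetical: a job wrapping the query notebook

@app.route("/run-query", methods=["POST"])
def run_query():
    params = request.get_json() or {}
    # Pass request parameters to the notebook as widget values; result() blocks until the run ends
    run = w.jobs.run_now(job_id=NOTEBOOK_JOB_ID, notebook_params=params).result()
    return jsonify({"run_id": run.run_id, "result_state": str(run.state.result_state)})
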
data-grassroots
by New Contributor II
  • 353 Views
  • 6 replies
  • 1 kudos

Resolved! Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but whose contents change over time (brand_a.csv, brand_b.csv, brand_c.csv ....). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
data-grassroots
New Contributor II
  • 1 kudos

Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.

5 More Replies
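
A minimal sketch of the copy + merge pattern the thread settled on: force COPY INTO a staging table, then MERGE into the target. The table names, path, and key column are illustrative, and the staging table is assumed to exist already:

# Re-ingest files even though their names are unchanged, into a pre-created staging table
spark.sql("""
  COPY INTO staging.brands_raw
  FROM 'abfss://feed@mystorage.dfs.core.windows.net/brands/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true')
  COPY_OPTIONS ('force' = 'true')
""")

# Upsert the refreshed rows into the target table
spark.sql("""
  MERGE INTO main.sales.brands AS t
  USING staging.brands_raw AS s
  ON t.brand_id = s.brand_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")
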
miaomia123
by New Contributor
  • 268 Views
  • 1 replies
  • 0 kudos

LLM using Databricks

Is there any coding example for how to use an LLM?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

I would like to share the following links: https://www.databricks.com/product/machine-learning/large-language-models and https://docs.databricks.com/en/large-language-models/index.html
