I've defined a streaming Delta Live Table in a notebook using Python, running on the "preview" channel on delta-cache-accelerated (Standard_D4ads_v5) compute. It fails with org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = xxx, ru...
Hi @smedegaard,
You’re encountering a StreamingQueryException with the message: “getPrimaryKeys not implemented for debezium SQLSTATE: XXKST.”
This error suggests that the getPrimaryKeys operation is not supported for the Debezium connector in your ...
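Not from this thread, but as a hedged sketch of where explicit keys come into play: in a Python DLT pipeline, apply_changes accepts the primary-key columns directly, so a flow like the following (table, view, and column names are placeholders) does not depend on the connector discovering keys:

# Hedged sketch, not the poster's pipeline: primary keys passed explicitly.
import dlt
from pyspark.sql.functions import col

@dlt.view
def cdc_source():
    # placeholder source; in practice this would be the Debezium/CDC feed
    return spark.readStream.table("raw.cdc_feed")

dlt.create_streaming_table("orders_silver")

dlt.apply_changes(
    target="orders_silver",
    source="cdc_source",
    keys=["order_id"],            # explicit primary key, no getPrimaryKeys lookup
    sequence_by=col("commit_ts"),
)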
Hi Team, is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga
Hi @Phani1, Let’s explore the integration of Databricks with Boomi and compare it to Azure Event Hub.
Databricks Integration with Boomi:
Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...
Hello All, my scenario required me to write code that reads tables from the source catalog and writes them to the destination catalog using Spark. Doing this one table at a time is not a good option when there are 300 tables in the catalog. So I am trying the pr...
Hi @ETLdeveloper, you can use multithreading to run the notebooks in parallel. Attaching code for your reference:

from concurrent.futures import ThreadPoolExecutor

class NotebookData:
    def __init__(self, path, timeout, parameters = Non...
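As a minimal sketch of that approach (the notebook path, table list, and parameter name below are placeholders), a thread pool can fan out dbutils.notebook.run calls so several notebooks execute concurrently:

# Minimal sketch: run one child notebook per table, a few at a time.
from concurrent.futures import ThreadPoolExecutor

def run_notebook(path, timeout=3600, parameters=None):
    # dbutils.notebook.run blocks until the child notebook finishes
    return dbutils.notebook.run(path, timeout, parameters or {})

tables = ["table_a", "table_b", "table_c"]  # placeholder for the 300 tables

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [
        pool.submit(run_notebook, "/Repos/etl/copy_table", 3600, {"table": t})
        for t in tables
    ]
    results = [f.result() for f in futures]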
Hi All! I'm in a project where I need to connect Azure DevOps and Databricks using managed identity to avoid the use of service accounts, PATs, etc. The thing is I can't move forward with the connection since I cannot take ownership of the files wh...
Hi @TitaMn, Connecting Azure DevOps and Azure Databricks using managed identity is a great approach to enhance security and avoid using service accounts or personal access tokens (PATs).
Let’s explore some options:
Azure Managed Identity for Dat...
Hi, would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe and then writes this data...
@daniel_sahal, this is the code snippet:

lsn_incr_batch = spark.sql(f"""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn,
           cast('{current_run_ts}' as timestamp) as appended
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '...
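On the caching question itself: a minimal sketch, assuming the federated table is reachable as externaldb.cdc.lsn_time_mapping. persist() keeps the fetched rows on the cluster, so later transformations reuse them instead of re-querying the remote source (the filter value is a placeholder):

# Minimal sketch: persist the federated query result, then force materialization.
from pyspark import StorageLevel

lsn_batch = spark.sql("""
    select start_lsn, tran_begin_time, tran_end_time, tran_id, tran_begin_lsn
    from externaldb.cdc.lsn_time_mapping
    where tran_end_time > '2024-01-01'
""").persist(StorageLevel.MEMORY_AND_DISK)

lsn_batch.count()  # an action materializes the cache before further reuse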
Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with model = PipelineModel.load(path). I set the Spark config:

spark.conf.set("fs.azure.account.auth.type", "OAuth")
spark.conf.set("fs.azure.account.oauth.provi...
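For reference, a minimal sketch of the full service-principal (OAuth) configuration for abfss:// paths; the storage account name, secret scope, secret keys, tenant id, and model path are placeholders:

# Minimal sketch: per-storage-account OAuth config, then load the model.
from pyspark.ml import PipelineModel

storage_account = "mystorageaccount"  # placeholder
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get("my-scope", "sp-client-id"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
    dbutils.secrets.get("my-scope", "sp-client-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)

model = PipelineModel.load(f"abfss://models@{storage_account}.dfs.core.windows.net/my_model")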
I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...
@amar1995 - Can you try this streaming approach and see if it works for your use case (using autoloader) - https://kb.databricks.com/streaming/stream-xml-auto-loader
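A minimal sketch of that streaming approach, assuming a runtime where Auto Loader accepts cloudFiles.format "xml"; the landing path, rowTag, schema/checkpoint locations, and target table are placeholders:

# Minimal sketch: incrementally ingest XML files with Auto Loader.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "xml")
    .option("rowTag", "record")                          # placeholder row tag
    .option("cloudFiles.schemaLocation", "/tmp/xml_schema")
    .load("abfss://landing@mystorage.dfs.core.windows.net/xml/")
)

(df.writeStream
   .option("checkpointLocation", "/tmp/xml_checkpoint")
   .trigger(availableNow=True)
   .toTable("bronze.xml_events"))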
I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...
@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.
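For readers who just need the idea, a generic sketch (not the linked gist) that keeps flattening struct columns until none remain, so nested "v"/"f" fields surface as top-level columns; array columns would still need explode() handled separately:

# Generic sketch: repeatedly flatten struct columns into prefixed top-level columns.
from pyspark.sql import DataFrame
from pyspark.sql.functions import col
from pyspark.sql.types import StructType

def flatten_structs(df: DataFrame) -> DataFrame:
    while True:
        if not any(isinstance(f.dataType, StructType) for f in df.schema.fields):
            return df
        new_cols = []
        for f in df.schema.fields:
            if isinstance(f.dataType, StructType):
                new_cols += [
                    col(f"{f.name}.{c.name}").alias(f"{f.name}_{c.name}")
                    for c in f.dataType.fields
                ]
            else:
                new_cols.append(col(f.name))
        df = df.select(new_cols)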
I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I already created Databricks notebooks that run these queries. The Flask server will pass some parameters in ...
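One way to wire that up, sketched under the assumption that the notebook is wrapped in a pre-created Databricks job and triggered from Flask via the Jobs 2.1 run-now REST API; the host, token handling, job id, and parameter name are placeholders:

# Minimal sketch: Flask route that triggers a Databricks job with notebook parameters.
import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
DATABRICKS_HOST = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder
TOKEN = "dapiXXXX"  # better sourced from app settings / Key Vault
JOB_ID = 123        # placeholder id of the job wrapping the notebook

@app.route("/run-query")
def run_query():
    payload = {
        "job_id": JOB_ID,
        "notebook_params": {"brand": request.args.get("brand", "")},
    }
    resp = requests.post(
        f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    return jsonify(resp.json())  # returns the run_id, which can be polled for results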
We have a data feed with files whose filenames stay the same but whose contents change over time (brand_a.csv, brand_b.csv, brand_c.csv ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...
Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.
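For anyone landing here later, a minimal sketch of that copy + merge pattern: COPY INTO a staging table with force = true so re-delivered files are re-read, then MERGE into the target on the business key. Table names, the landing path, and the brand_key column are placeholders:

# Minimal sketch: re-ingest changed files into staging, then upsert into the target.
spark.sql("""
    COPY INTO staging.brand_files
    FROM 'abfss://landing@mystorage.dfs.core.windows.net/brands/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
    COPY_OPTIONS ('force' = 'true')
""")

spark.sql("""
    MERGE INTO main.brands AS t
    USING staging.brand_files AS s
    ON t.brand_key = s.brand_key
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")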
I would like to share the following links https://www.databricks.com/product/machine-learning/large-language-models
https://docs.databricks.com/en/large-language-models/index.html
Following the instructions on job parameter dynamic values, I am able to use {{job.id}}, {{job.name}}, {{job.run_id}}, {{job.repair_count}}, and {{job.start_time.[argument]}}. However, when I set trigger_type as trigger_type: {{job.trigger.type}} and hit SAVE, ...
Hi! I want to migrate all my Databricks-related code from one GitHub repo to another. I knew this wouldn't be straightforward. When I copy my code for one DLT, I get the error org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table 'vessel_batt...
Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...
@Rajani: This is what I am doing. I have a GitHub Actions workflow that kicks off and runs:

- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug

Before running this step, I am creatin...