Data Engineering

Forum Posts

Anske
by New Contributor II
  • 54 Views
  • 3 replies
  • 0 kudos

How to stop a dataframe with a federated table source from being re-evaluated when referenced (cache?)

Hi, Would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe and then writes this data...

Latest Reply
Anske
New Contributor II
  • 0 kudos

Thanks for your answer, Lakshay. I have tried caching the df using the cache() function, but it does not seem to do anything (the dataset in this case is tiny, so I'm pretty sure it would fit into memory). So I'm indeed back to writing to file firs...

2 More Replies
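A minimal sketch of the two workarounds discussed in the thread above, assuming a hypothetical federated table remote_db.sales: cache() is lazy, so an action is needed to actually materialize the dataframe; alternatively, persisting the result as a Delta table guarantees later references never re-query the federated source.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical federated table; replace with your own.
df = spark.read.table("remote_db.sales").filter("amount > 0")

# cache() only marks the dataframe for caching; an action such as
# count() is what actually materializes it in memory.
df.cache()
df.count()

# Alternative: write the intermediate result to a Delta table and
# read it back, so downstream references never hit the source again.
df.write.mode("overwrite").saveAsTable("staging.sales_snapshot")
df_local = spark.read.table("staging.sales_snapshot")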
amar1995
by New Contributor
  • 298 Views
  • 4 replies
  • 0 kudos

Performance Issue with XML Processing in Spark Databricks

I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@amar1995 - Can you try this streaming approach (using Auto Loader) and see if it works for your use case? https://kb.databricks.com/streaming/stream-xml-auto-loader

3 More Replies
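A hedged sketch of the Auto Loader route suggested in the reply, assuming a recent runtime with native XML support; the landing path, rowTag, schema location, and target table are placeholders.

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "xml")               # needs native XML support
    .option("rowTag", "record")                       # hypothetical row element
    .option("cloudFiles.schemaLocation", "/tmp/xml_schema")
    .load("/mnt/landing/xml/")                        # hypothetical landing path
)

(df.writeStream
   .option("checkpointLocation", "/tmp/xml_checkpoint")
   .trigger(availableNow=True)
   .toTable("bronze.xml_records"))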
AnaMocanu
by Visitor
  • 57 Views
  • 1 reply
  • 0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.

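The excerpt shows BigQuery's REST row encoding, where every value is wrapped in {"v": ...} and every row in {"f": [...]}. A small sketch of pulling fields out with JSONPath, assuming a hypothetical raw column event_params; the linked gist does the same job more generically.

from pyspark.sql import functions as F

parsed = df.select(
    # First param name and its nested value, per the fragment above.
    F.get_json_object("event_params", "$.v[0].v.f[0].v").alias("param_name"),
    F.get_json_object("event_params", "$.v[0].v.f[1].v.f[1].v").alias("param_value"),
)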
CarstenWeber
by Visitor
  • 65 Views
  • 3 replies
  • 0 kudos

Invalid configuration fs.azure.account.key when trying to load an ML model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path). I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...

Latest Reply
CarstenWeber
  • 0 kudos

@daniel_sahal I already tried the "longer" version of the Spark configs mentioned in the article. Tbh, for regular spark.read.load(path) commands, both versions work just fine. I guess the one I used is a general conf. and the one in the ar...

2 More Replies
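For reference, a hedged sketch of the account-scoped ("longer") config keys the thread refers to; the storage account, secret scope, keys, and tenant id are placeholders. Note that some ML loaders read the cluster-level Hadoop configuration rather than the session conf, so the same keys may need to be set in the cluster's Spark config (with a spark.hadoop. prefix) instead.

account = "mystorageaccount"  # hypothetical storage account
suffix = f"{account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-id"),      # hypothetical scope/keys
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)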
johnp
by New Contributor II
  • 60 Views
  • 1 reply
  • 0 kudos

Call databricks notebook from azure flask app

I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I already created Databricks notebooks running these queries. The Flask server will pass some parameters in ...

Latest Reply
feiyun0112
Contributor
  • 0 kudos

You can use the Databricks SDK: https://docs.databricks.com/en/dev-tools/sdk-python.html#create-a-job

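A minimal sketch of the SDK approach from the reply: wrap the notebook in a job, then trigger it from Flask with run_now. The host, token, job id, and parameter names are placeholders.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host="https://<workspace-url>", token="<pat-token>")

# Trigger the existing notebook job and pass parameters from the
# Flask request; result() blocks until the run completes.
run = w.jobs.run_now(
    job_id=123,                                    # hypothetical job id
    notebook_params={"start_date": "2024-01-01"},  # hypothetical parameters
)
result = run.result()
print(result.state.result_state)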
data-grassroots
by New Contributor II
  • 351 Views
  • 6 replies
  • 1 kudos

Resolved! Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but the contents change over time (brand_a.csv, brand_b.csv, brand_c.csv ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
data-grassroots
New Contributor II
  • 1 kudos

Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.

5 More Replies
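A hedged sketch of the copy + merge pattern the poster settled on, with hypothetical table names, path, and key column: force COPY INTO into a staging table so re-delivered files are re-ingested, then MERGE into the target.

spark.sql("""
  COPY INTO staging.brands
  FROM '/mnt/feed/brands/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('force' = 'true')  -- re-ingest files even if already seen
""")

spark.sql("""
  MERGE INTO prod.brands AS t
  USING staging.brands AS s
    ON t.brand_id = s.brand_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")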
miaomia123
by New Contributor
  • 268 Views
  • 1 reply
  • 0 kudos

LLM using Databricks

Is there any code example for how to use an LLM?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

I would like to share the following links: https://www.databricks.com/product/machine-learning/large-language-models and https://docs.databricks.com/en/large-language-models/index.html

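To complement the links, a hedged example of querying a Databricks-hosted LLM serving endpoint via the MLflow deployments client; the endpoint name is a placeholder for whatever pay-per-token or custom serving endpoint exists in your workspace.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-meta-llama-3-70b-instruct",  # hypothetical endpoint
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize Delta Lake in one sentence."}
        ],
        "max_tokens": 100,
    },
)
print(response["choices"][0]["message"]["content"])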
BrianJ
by New Contributor
  • 846 Views
  • 5 replies
  • 4 kudos

{{job.trigger.type}} not working and throws an error on Edit Parameters from the Job page

Following the instructions on job parameter dynamic values, I am able to use {{job.id}}, {{job.name}}, {{job.run_id}}, {{job.repair_count}}, and {{job.start_time.[argument]}}. However, when I set trigger_type as trigger_type: {{job.trigger.type}} and hit SAVE, ...

Latest Reply
BrianJ
New Contributor
  • 4 kudos

Thanks everyone, I decided to use the Spark context instead: dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()

4 More Replies
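A short sketch of the workaround in the reply above: read run metadata from the notebook context instead of the {{job.trigger.type}} parameter. The exact tag names vary by runtime, so inspect the parsed dictionary to see what is available.

import json

ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)

# Run/job metadata lives under "tags"; print it to find trigger details.
print(ctx.get("tags", {}))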
Phani1
by Valued Contributor
  • 37 Views
  • 0 replies
  • 0 kudos

Boomi integrating with Databricks

Hi Team, Is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

PrebenOlsen
by New Contributor III
  • 54 Views
  • 1 reply
  • 0 kudos

How to migrate Git repos with DLT configurations

Hi! I want to migrate all my Databricks-related code from one GitHub repo to another. I knew this wouldn't be straightforward. When I copy my code for one DLT, I get the error org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table 'vessel_batt...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Would cloning the tables help your cause? You could probably try shallow or deep cloning as per your requirement.

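A minimal sketch of the cloning suggestion, with hypothetical catalog/schema/table names: a deep clone copies data and metadata, while a shallow clone copies only metadata and references the source files.

spark.sql("""
  CREATE TABLE IF NOT EXISTS new_catalog.default.vessel_batteries
  DEEP CLONE old_catalog.default.vessel_batteries
""")

spark.sql("""
  CREATE TABLE IF NOT EXISTS new_catalog.default.vessel_batteries_shallow
  SHALLOW CLONE old_catalog.default.vessel_batteries
""")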
niruban
by New Contributor
  • 183 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
niruban
New Contributor
  • 0 kudos

@Rajani: This is what I am doing. I have a GitHub Actions step that kicks off and runs:
- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug
Before running this step, I am creatin...

1 More Replies
Espenol1
by Visitor
  • 154 Views
  • 4 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

Latest Reply
Espenol1
Visitor
  • 2 kudos

Thanks a lot. Then I guess we will try to use dbmanagedidentity for most of our needs, and create service principals + secret scopes when there are more specific needs, such as limiting access to sensitive data. A bit of a hassle to scale, probabl...

3 More Replies
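A hedged sketch of the service principal + secret scope pattern from the reply, reading from Azure SQL over JDBC; the scope, keys, server, database, and table are placeholders, and AAD service principal authentication is assumed to be enabled on the server.

client_id = dbutils.secrets.get(scope="sql-creds", key="sp-app-id")
client_secret = dbutils.secrets.get(scope="sql-creds", key="sp-secret")

df = (
    spark.read.format("jdbc")
    .option(
        "url",
        "jdbc:sqlserver://myserver.database.windows.net:1433;"
        "database=mydb;authentication=ActiveDirectoryServicePrincipal",
    )
    .option("dbtable", "dbo.customers")   # hypothetical table
    .option("user", client_id)            # service principal app id
    .option("password", client_secret)    # service principal secret
    .load()
)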