Data Engineering

Forum Posts

Anske
by New Contributor II
  • 54 Views
  • 3 replies
  • 0 kudos

How to stop a dataframe with a federated table source from being re-evaluated when referenced (cache?)

Hi, Would anyone happen to know whether it's possible to cache a dataframe in memory that is the result of a query on a federated table? I have a notebook that queries a federated table, does some transformations on the dataframe and then writes this data...

Latest Reply
Anske
New Contributor II
  • 0 kudos

Thanks for your answer, Lakshay. I have tried caching the df using the cache() function, but it does not seem to do anything (the dataset in this case is tiny, so I'm pretty sure it would fit into memory). So I'm indeed back to writing to file firs...

2 More Replies
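A minimal sketch of the two workarounds discussed in the thread above, assuming a hypothetical federated table remote_db.sales: cache() is lazy, so an action is needed to actually materialize the dataframe; alternatively, persisting the result as a Delta table guarantees later references never re-query the federated source.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical federated table; replace with your own.
df = spark.read.table("remote_db.sales").filter("amount > 0")

# cache() only marks the dataframe for caching; an action such as
# count() is what actually materializes it in memory.
df.cache()
df.count()

# Alternative: write the intermediate result to a Delta table and
# read it back, so downstream references never hit the source again.
df.write.mode("overwrite").saveAsTable("staging.sales_snapshot")
df_local = spark.read.table("staging.sales_snapshot")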
amar1995
by New Contributor
  • 298 Views
  • 4 replies
  • 0 kudos

Performance Issue with XML Processing in Spark Databricks

I am reaching out to bring attention to a performance issue we are encountering while processing XML files using Spark-XML, particularly with the configuration spark.read().format("com.databricks.spark.xml"). Currently, we are experiencing significant...

Latest Reply
shan_chandra
Honored Contributor III
  • 0 kudos

@amar1995 - Can you try this streaming approach (using Auto Loader) and see if it works for your use case? https://kb.databricks.com/streaming/stream-xml-auto-loader

3 More Replies
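A hedged sketch of the Auto Loader route suggested in the reply, assuming a recent runtime with native XML support; the landing path, rowTag, schema location, and target table are placeholders.

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "xml")               # needs native XML support
    .option("rowTag", "record")                       # hypothetical row element
    .option("cloudFiles.schemaLocation", "/tmp/xml_schema")
    .load("/mnt/landing/xml/")                        # hypothetical landing path
)

(df.writeStream
   .option("checkpointLocation", "/tmp/xml_checkpoint")
   .trigger(availableNow=True)
   .toTable("bronze.xml_records"))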
AnaMocanu
by Visitor
  • 57 Views
  • 1 reply
  • 0 kudos

Best way to parse Google Analytics data in Databricks notebook

I managed to extract the Google Analytics data via Lakehouse Federation and the BigQuery connection, but the events table values are in a weird JSON format: {"v":[{"v":{"f":[{"v":"ga_session_number"},{"v":{"f":[{"v":null},{"v":"2"},{"v":null},{"v":null...

Latest Reply
daniel_sahal
Esteemed Contributor
  • 0 kudos

@AnaMocanu I was using this function, with a few modifications on my end: https://gist.github.com/shreyasms17/96f74e45d862f8f1dce0532442cc95b2 Maybe this will be helpful for you.

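The excerpt shows BigQuery's REST row encoding, where every value is wrapped in {"v": ...} and every row in {"f": [...]}. A small sketch of pulling fields out with JSONPath, assuming a hypothetical raw column event_params; the linked gist does the same job more generically.

from pyspark.sql import functions as F

parsed = df.select(
    # First param name and its nested value, per the fragment above.
    F.get_json_object("event_params", "$.v[0].v.f[0].v").alias("param_name"),
    F.get_json_object("event_params", "$.v[0].v.f[1].v.f[1].v").alias("param_value"),
)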
CarstenWeber
by Visitor
  • 65 Views
  • 3 replies
  • 0 kudos

Invalid configuration fs.azure.account.key when trying to load an ML model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path). I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...

Latest Reply
CarstenWeber
  • 0 kudos

@daniel_sahal I already tried the "longer" version of the Spark configs mentioned in the article. Tbh, for regular spark.read.load(path) commands, both versions work just fine. I guess the one I used is a general conf. and the one in the ar...

2 More Replies
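For reference, a hedged sketch of the account-scoped ("longer") config keys the thread refers to; the storage account, secret scope, keys, and tenant id are placeholders. Note that some ML loaders read the cluster-level Hadoop configuration rather than the session conf, so the same keys may need to be set in the cluster's Spark config (with a spark.hadoop. prefix) instead.

account = "mystorageaccount"  # hypothetical storage account
suffix = f"{account}.dfs.core.windows.net"

spark.conf.set(f"fs.azure.account.auth.type.{suffix}", "OAuth")
spark.conf.set(
    f"fs.azure.account.oauth.provider.type.{suffix}",
    "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.id.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-id"),      # hypothetical scope/keys
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.secret.{suffix}",
    dbutils.secrets.get("my-scope", "sp-client-secret"),
)
spark.conf.set(
    f"fs.azure.account.oauth2.client.endpoint.{suffix}",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)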
johnp
by New Contributor II
  • 60 Views
  • 1 reply
  • 0 kudos

Call databricks notebook from azure flask app

I have an Azure web app running a Flask web server. From the Flask server, I want to run some queries on the data stored in ADLS Gen2 storage. I already created Databricks notebooks running these queries. The Flask server will pass some parameters in ...

Latest Reply
feiyun0112
Contributor
  • 0 kudos

You can use the Databricks SDK: https://docs.databricks.com/en/dev-tools/sdk-python.html#create-a-job

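A minimal sketch of the SDK approach from the reply: wrap the notebook in a job, then trigger it from Flask with run_now. The host, token, job id, and parameter names are placeholders.

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host="https://<workspace-url>", token="<pat-token>")

# Trigger the existing notebook job and pass parameters from the
# Flask request; result() blocks until the run completes.
run = w.jobs.run_now(
    job_id=123,                                    # hypothetical job id
    notebook_params={"start_date": "2024-01-01"},  # hypothetical parameters
)
result = run.result()
print(result.state.result_state)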
data-grassroots
by New Contributor II
  • 351 Views
  • 6 replies
  • 1 kudos

Resolved! Ingesting Files - Same file name, modified content

We have a data feed with files whose filenames stay the same but the contents change over time (brand_a.csv, brand_b.csv, brand_c.csv ...). COPY INTO seems to ignore the files when they change. If we set the force flag to true and run it, we end up w...

Latest Reply
data-grassroots
New Contributor II
  • 1 kudos

Thanks for the validation, Werners! That's the path we've been heading down (copy + merge). I still have some DLT experiments planned but - at least for this situation - copy + merge works just fine.

5 More Replies
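A hedged sketch of the copy + merge pattern the poster settled on, with hypothetical table names, path, and key column: force COPY INTO into a staging table so re-delivered files are re-ingested, then MERGE into the target.

spark.sql("""
  COPY INTO staging.brands
  FROM '/mnt/feed/brands/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('force' = 'true')  -- re-ingest files even if already seen
""")

spark.sql("""
  MERGE INTO prod.brands AS t
  USING staging.brands AS s
    ON t.brand_id = s.brand_id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")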
miaomia123
by New Contributor
  • 268 Views
  • 1 reply
  • 0 kudos

LLM using Databricks

Is there any code example for how to use an LLM?

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

I would like to share the following links: https://www.databricks.com/product/machine-learning/large-language-models and https://docs.databricks.com/en/large-language-models/index.html

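To complement the links, a hedged example of querying a Databricks-hosted LLM serving endpoint via the MLflow deployments client; the endpoint name is a placeholder for whatever pay-per-token or custom serving endpoint exists in your workspace.

from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

response = client.predict(
    endpoint="databricks-meta-llama-3-70b-instruct",  # hypothetical endpoint
    inputs={
        "messages": [
            {"role": "user", "content": "Summarize Delta Lake in one sentence."}
        ],
        "max_tokens": 100,
    },
)
print(response["choices"][0]["message"]["content"])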
BrianJ
by New Contributor
  • 846 Views
  • 5 replies
  • 4 kudos

{{job.trigger.type}} not working and throws an error on Edit Parameters from the Job page

Following the instructions on job parameter dynamic values, I am able to use {{job.id}}, {{job.name}}, {{job.run_id}}, {{job.repair_count}}, and {{job.start_time.[argument]}}. However, when I set trigger_type as trigger_type: {{job.trigger.type}} and hit SAVE, ...

Latest Reply
BrianJ
New Contributor
  • 4 kudos

Thanks everyone, I decided to use the Spark context instead: dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()

4 More Replies
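A short sketch of the workaround in the reply above: read run metadata from the notebook context instead of the {{job.trigger.type}} parameter. The exact tag names vary by runtime, so inspect the parsed dictionary to see what is available.

import json

ctx = json.loads(
    dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson()
)

# Run/job metadata lives under "tags"; print it to find trigger details.
print(ctx.get("tags", {}))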
Phani1
by Valued Contributor
  • 37 Views
  • 0 replies
  • 0 kudos

Boomi integrating with Databricks

Hi Team, Is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

PrebenOlsen
by New Contributor III
  • 54 Views
  • 1 reply
  • 0 kudos

How to migrate Git repos with DLT configurations

Hi! I want to migrate all my Databricks-related code from one GitHub repo to another. I knew this wouldn't be straightforward. When I copy my code for one DLT, I get the error org.apache.spark.sql.catalyst.ExtendedAnalysisException: Table 'vessel_batt...

Latest Reply
AmanSehgal
Honored Contributor III
  • 0 kudos

Would cloning the tables help your cause? You could probably try shallow or deep cloning as per your requirement.

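A minimal sketch of the cloning suggestion, with hypothetical catalog/schema/table names: a deep clone copies data and metadata, while a shallow clone copies only metadata and references the source files.

spark.sql("""
  CREATE TABLE IF NOT EXISTS new_catalog.default.vessel_batteries
  DEEP CLONE old_catalog.default.vessel_batteries
""")

spark.sql("""
  CREATE TABLE IF NOT EXISTS new_catalog.default.vessel_batteries_shallow
  SHALLOW CLONE old_catalog.default.vessel_batteries
""")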
niruban
by New Contributor
  • 183 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
niruban
New Contributor
  • 0 kudos

@Rajani: This is what I am doing. I have a GitHub Actions step that kicks off and runs:
- name: bundle-deploy
  run: |
    cd ${{ vars.HOME }}/dev-ops/databricks_cicd_deployment
    databricks bundle deploy --debug
Before running this step, I am creatin...

1 More Replies
Espenol1
by Visitor
  • 154 Views
  • 4 replies
  • 2 kudos

Resolved! Using managed identities to access SQL server - how?

Hello! My company wants us to only use managed identities for authentication. We have set up Databricks using Terraform, got Unity Catalog and everything, but we're a very small team and I'm struggling to control permissions outside of Unity Catalog....

Latest Reply
Espenol1
Visitor
  • 2 kudos

Thanks a lot. Then I guess we will try to use dbmanagedidentity for most of our needs, and create service principals + secret scopes when there are more specific needs, such as limiting access to sensitive data. A bit of a hassle to scale, probabl...

3 More Replies
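A hedged sketch of the service principal + secret scope pattern from the reply, reading from Azure SQL over JDBC; the scope, keys, server, database, and table are placeholders, and AAD service principal authentication is assumed to be enabled on the server.

client_id = dbutils.secrets.get(scope="sql-creds", key="sp-app-id")
client_secret = dbutils.secrets.get(scope="sql-creds", key="sp-secret")

df = (
    spark.read.format("jdbc")
    .option(
        "url",
        "jdbc:sqlserver://myserver.database.windows.net:1433;"
        "database=mydb;authentication=ActiveDirectoryServicePrincipal",
    )
    .option("dbtable", "dbo.customers")   # hypothetical table
    .option("user", client_id)            # service principal app id
    .option("password", client_secret)    # service principal secret
    .load()
)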