Data Engineering

Forum Posts

Chengcheng
by New Contributor III
  • 1006 Views
  • 1 replies
  • 4 kudos

Is Feature Store packaged model compatible with Spark UDF?

Hi, I tried to deploy a Feature Store packaged model into a Delta Live Table using mlflow.pyfunc.spark_udf in Azure Databricks. This model was built by Databricks AutoML with a joined Feature Table inside it. And I'm trying to make prediction using the fol...

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hi @Chengcheng Guo​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 4 kudos
vanessafvg
by New Contributor III
  • 1010 Views
  • 1 replies
  • 3 kudos

Extracting data from excel in datalake storage using openpyxl

I am trying to extract some data into Databricks but keep tripping over openpyxl; I'm a newish Databricks user.
from openpyxl import load_workbook
directory_id="hidden"
scope="hidden"
client_id="hidden"
service_credential_key="hidden"
container_name="hidden"
s...

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Vanessa Van Gelder​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 3 kudos
guostong
by New Contributor III
  • 2662 Views
  • 1 replies
  • 1 kudos

How to update the items in an array-of-struct column with SQL

create table test.json_test_01 ( id int, description string, struct_address STRUCT<street_number: STRING, street_name: STRING, city: STRING, province: STRING>, arrary_phone ARRAY<STRUCT<phone_number: STRING, phone_type: STRING>> );   insert into ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Richard Guo​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos
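
For the array-of-struct question above, one possible approach is Spark SQL's `transform` higher-order function, which rebuilds the array while replacing only the matching elements. The sketch below only builds the statement text. The table and column names come from the post; the row id, the `home` phone type, and the new number are hypothetical, and it assumes the table is a Delta table so that `UPDATE` is supported.

```python
# Sketch: rewrite one element of an ARRAY<STRUCT<...>> column with transform().
# Table and column names come from the post; the id, phone_type and number
# used at the bottom are hypothetical values chosen for illustration.

def build_phone_update(row_id: int, phone_type: str, new_number: str) -> str:
    """Return a Spark SQL UPDATE that replaces phone_number for one phone_type."""
    # Keep the sketch simple: refuse values that would need SQL escaping.
    for value in (phone_type, new_number):
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported here")
    return (
        "UPDATE test.json_test_01\n"
        "SET arrary_phone = transform(\n"
        "      arrary_phone,\n"
        f"      p -> CASE WHEN p.phone_type = '{phone_type}'\n"
        f"                THEN named_struct('phone_number', '{new_number}',\n"
        "                                  'phone_type', p.phone_type)\n"
        "                ELSE p END)\n"
        f"WHERE id = {int(row_id)}"
    )

update_sql = build_phone_update(1, "home", "555-0100")
print(update_sql)
# In a Databricks notebook this could then be run with: spark.sql(update_sql)
```

Elements that do not match the `CASE` condition are passed through unchanged, so the array keeps its length and order.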
timothy_uk
by New Contributor III
  • 562 Views
  • 1 replies
  • 1 kudos

Mysterious simultaneous long-running Databricks Workflows

Hi, This happened across 4x seemingly unrelated workflows at the same time of the day - all 4x workflows eventually completed successfully. It appeared that all workflows sat idling despite triggering via the Jobs API. The two symptoms I have observed...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Timothy Lin​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos
pskchai
by New Contributor
  • 841 Views
  • 2 replies
  • 0 kudos

Resolved! Using DLT with a non-streaming large table

We have a source table that receives daily append operations, but the rows created within the last 30 days in this table can be updated or deleted. Thus, the source table is not exactly a streaming source. Our processing workflow involves performing "...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hi @Pongsakorn Chairatanakul, hope everything is going great. Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please...

  • 0 kudos
1 More Replies
Thaw
by New Contributor III
  • 1015 Views
  • 3 replies
  • 4 kudos

Resolved! How to change Instance Family in CloudFormation in a Databricks trial mode?

I deployed Databricks on AWS, and the template uses i3.xlarge. Could I use a smaller instance family for cost optimization? Is i3.xlarge the minimum size for using Databricks in trial mode? Thanks

Latest Reply
Thaw
New Contributor III
  • 4 kudos

Thank you so much for your reply to my question, @Vidula Khanna @Kaniz Fatma. After taking some time to study, I understood the basics, and now I am on my way with Databricks.

  • 4 kudos
2 More Replies
dukebaslangic
by New Contributor II
  • 951 Views
  • 3 replies
  • 3 kudos

Resolved! Databricks performance related documentation/books

Hi, Do you know any good resources about Databricks performance improvements (like improving query performance, monitoring/resolving performance bottlenecks, etc.)? Thanks

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Ömer Özsakarya, we haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to ...

  • 3 kudos
2 More Replies
Ram443
by New Contributor III
  • 14674 Views
  • 9 replies
  • 5 kudos

Resolved! I created a data frame but was not able to see the data

Code to create a data frame:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName("oracle_queries").master("local[4]")\
  .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()
from pyspark.sql.functions ...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 5 kudos

@ramanjaneyulu kancharla, can you please select my answer as the best answer?

  • 5 kudos
8 More Replies
Paul_Seattle
by New Contributor
  • 1157 Views
  • 1 replies
  • 0 kudos

A Quick Question on Running a job from CLI

Could anyone tell me what could be wrong with my command to submit a Spark job with params? (If I don’t have --spark-submit-params, it’s fine.) Please see the attached snapshot.

Latest Reply
User16539034020
Contributor II
  • 0 kudos

Yes, there is no need for spark-submit-params.
databricks jobs run-now --job-id ***
Reference: https://docs.databricks.com/dev-tools/cli/jobs-cli.html

  • 0 kudos
Kotofosonline
by New Contributor III
  • 843 Views
  • 2 replies
  • 2 kudos

Query with distinct sort and alias produces error column not found

I’m trying to use a SQL query on Azure Databricks with DISTINCT, sorting, and aliases:
SELECT DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
The problem is that if I add an alias, then I cannot use the non-aliased name in the ORDER BY cla...

Latest Reply
User16756723392
New Contributor III
  • 2 kudos

SELECT album.ArtistId, DISTINCT album.ArtistId AS my_alias FROM album ORDER BY album.ArtistId
Can you try this?

  • 2 kudos
1 More Replies
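
A common way around this kind of error is to sort by the alias itself, since after `DISTINCT` only the selected output columns are available to `ORDER BY`. The sketch below shows that query shape. It uses Python's built-in `sqlite3` purely as a stand-in so it runs without a cluster, so it does not reproduce the Spark error; the sample rows and the `AlbumId` column are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the album table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE album (AlbumId INTEGER, ArtistId INTEGER)")
conn.executemany(
    "INSERT INTO album VALUES (?, ?)",
    [(1, 3), (2, 1), (3, 3), (4, 2)],
)

# Order by the alias rather than by the original column name.
query = """
    SELECT DISTINCT album.ArtistId AS my_alias
    FROM album
    ORDER BY my_alias
"""
artist_ids = [row[0] for row in conn.execute(query)]
conn.close()

print(artist_ids)  # [1, 2, 3]
```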
RonanStokes_DB
by New Contributor III
  • 1351 Views
  • 1 replies
  • 1 kudos

Can you apply a specific cluster policy when launching a Databricks job via Azure Data Factory?

When using Azure Data Factory to coordinate the launch of Databricks jobs, can you specify which cluster policy to apply to the job, either explicitly or implicitly?

Latest Reply
mvandeborne
New Contributor II
  • 1 kudos

You could, but not from ADF's UI. You need to edit the JSON of the linked service, adding a 'policyId' parameter in the 'typeProperties' object, pointing to the cluster policy ID from Databricks (which you can find in the Databricks URL).

  • 1 kudos
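
The reply above can be sketched as a small helper that adds `policyId` under `typeProperties` in a linked-service definition. Only the `policyId` and `typeProperties` names come from the reply; the linked-service name, workspace URL, and policy ID below are hypothetical placeholders, and the other fields a real linked service needs are omitted.

```python
import copy
import json

def with_cluster_policy(linked_service: dict, policy_id: str) -> dict:
    """Return a copy of the linked-service JSON with policyId set."""
    updated = copy.deepcopy(linked_service)  # leave the input untouched
    type_props = updated.setdefault("properties", {}).setdefault("typeProperties", {})
    type_props["policyId"] = policy_id
    return updated

# Hypothetical, trimmed-down linked service definition.
linked_service = {
    "name": "ExampleDatabricksLinkedService",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://adb-0000000000000000.0.azuredatabricks.net",
        },
    },
}

patched = with_cluster_policy(linked_service, "0000000000000000")
print(json.dumps(patched, indent=2))
```

The printed JSON is what would be pasted back into the linked service's code view in ADF.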
pcriado
by New Contributor III
  • 3285 Views
  • 2 replies
  • 1 kudos

Resolved! Requested array size exceeds VM limit when saving to feature table

Hi, I'm trying to process a small dataset (less than 300 MB) composed of five queries that run with Spark. The end result of those queries is parsed using Python and merged into a data frame. Then I try to write this to a Delta Lake table using featu...

Latest Reply
pcriado
New Contributor III
  • 1 kudos

Hello, we have recently found that it's my user in particular that causes the memory issue. Two other users in my organization can run the same notebook without problems, but my user consistently consumes all available RAM and crashes the cluster... a...

  • 1 kudos
1 More Replies
gustavomcarmo-h
by New Contributor III
  • 1532 Views
  • 5 replies
  • 2 kudos

Resolved! Is there a way to list the DLT maintenance jobs through the API?

After creating the Delta pipeline, I would like to get details from the DLT maintenance job automatically created by Databricks, like the scheduled time when the DLT maintenance tasks will be executed. However, it seems the Jobs API 2.1 doesn't cover ...

Latest Reply
gustavomcarmo-h
New Contributor III
  • 2 kudos

Hi @Debayan Mukherjee, actually the Databricks Jobs API documentation has not been fixed yet. The parameter `job_type` should be included in the list endpoint request documentation. Please do this in order to avoid unnecessary questions here in the ...

  • 2 kudos
4 More Replies
iptkrisna
by New Contributor III
  • 1525 Views
  • 2 replies
  • 1 kudos

Clear Cache From a Notebook, not from a Cluster

Hi, I'm running all my jobs on one big cluster. Is there a way to clear the cache created by a notebook at the end of the job, once it's done, so that it does not cause memory problems from one job to another, ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @krisna math, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if her suggestions helped you. Or else, if you have any solution, please share it with the community, as it can be helpful to...

  • 1 kudos
1 More Replies
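
For the notebook-scoped cache question above, one option is to keep track of the DataFrames a notebook caches and call `unpersist()` on exactly those when the job finishes. The sketch below is one way to do that, not an answer taken from the thread. `tracked_cache` and `StandInFrame` are hypothetical names, and the stand-in class exists only so the example runs without a cluster; real Spark DataFrames provide the same `cache()` and `unpersist()` methods. `spark.catalog.clearCache()` also exists, but it removes all cached tables from the in-memory cache, so on a shared cluster it may affect other jobs.

```python
from contextlib import contextmanager

@contextmanager
def tracked_cache():
    """Yield a cache(df) helper; unpersist everything it cached on exit."""
    cached = []

    def cache(df):
        # DataFrame.cache() marks the DataFrame for caching and returns it.
        cached.append(df.cache())
        return cached[-1]

    try:
        yield cache
    finally:
        # Release only what this notebook cached, newest first.
        for df in reversed(cached):
            df.unpersist()


class StandInFrame:
    """Hypothetical stand-in for a Spark DataFrame (no cluster needed)."""

    def __init__(self):
        self.is_cached = False

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        return self


frame = StandInFrame()
with tracked_cache() as cache:
    cache(frame)
    cached_inside = frame.is_cached  # True while the job body runs

print(cached_inside, frame.is_cached)  # True False
```

Because the cleanup sits in a `finally` block, the cached data is released even if the job body raises.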
sree1567
by New Contributor II
  • 640 Views
  • 1 replies
  • 1 kudos

Azure-EventHub Schema Registry with Spark-Scala

Hi all, Is there a way to consume the schemas from a schema registry defined in Azure Event Hubs using Apache Spark and Scala?

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @sreeranjani thevan Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

  • 1 kudos