Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mehalrathod
by New Contributor II
  • 1525 Views
  • 2 replies
  • 0 kudos

Overwrite to a table taking 12+ hours

One of our Databricks notebooks (using Python, PySpark) has been running long, 12+ hours, specifically on the overwrite command into a table. This notebook, along with the overwrite step, completed within 10 mins in the past. But suddenly the ov...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @mehalrathod This sort of performance regression in Databricks (especially for overwrite) is usually caused by one or more of the following.
Common Causes of Overwrite Slowness
1. Delta Table History or File Explosion - If the target table is a Delta...
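A quick first check for the file-explosion cause named above can be sketched with the standard library (an assumption-laden sketch: it only works where the table's storage is reachable as a filesystem path from the driver, e.g. a /dbfs mount; /tmp/delta/my_table is a hypothetical path):

```python
from pathlib import Path

# hypothetical Delta table location; substitute your table's storage path
table_path = Path("/tmp/delta/my_table")
table_path.mkdir(parents=True, exist_ok=True)

# an overwrite that suddenly takes hours often correlates with the table
# directory having accumulated a very large number of small data files
n_files = sum(1 for _ in table_path.rglob("*.parquet"))
print(f"{n_files} parquet files under {table_path}")
```

If the count is in the tens of thousands, compacting the table (e.g. with OPTIMIZE) before the overwrite is usually the first thing to try.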

Linda22
by New Contributor II
  • 6655 Views
  • 7 replies
  • 5 kudos

Can we execute a single task in isolation from a multi-task Databricks job?

A task may be used to process some data. If we have 10 such tasks in a job and we want to process only a couple of datasets, through only a couple of tasks, is that possible?

Latest Reply
slimbnsalah
New Contributor II
  • 5 kudos

Generally available!

Livingstone
by New Contributor II
  • 4252 Views
  • 5 replies
  • 3 kudos

Install maven package to serverless cluster

My task is to export data from CSV/SQL into Excel format with minimal latency. To achieve this, I used a Serverless cluster. Since PySpark does not support saving in XLSX format, it is necessary to install the Maven package spark-excel_2.12. However, ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

As you stated, you cannot install Maven packages on Databricks serverless clusters due to restricted library management capabilities. However, there are alternative approaches to export data to Excel with minimal latency.
Solutions to Export Excel Fi...

krishnakmr512
by New Contributor
  • 1104 Views
  • 1 reply
  • 1 kudos

Resolved! Missed my certification exam; a reschedule is required

Hi Team, @data_help @helpdesk @Cert-Team @Cert-TeamOPS I missed my scheduled certification exam due to an emergency yesterday. Is there a possibility it can be rescheduled to anytime today or tomorrow? I am not able to reschedule opti...

Data Engineering
@Cert-Team
Latest Reply
Cert-Team
Databricks Employee
  • 1 kudos

@krishnakmr512 Usually the fastest way to get assistance is filing a ticket with our support team. I was able to reschedule your exam to a future date. Please log into your account and reschedule to a date and time that suits you.

db_eswar
by New Contributor
  • 2096 Views
  • 2 replies
  • 1 kudos

What is iowait, and will it impact the performance of my job?

One job is taking more than 7 hrs; when I added the configuration below, it takes <2:30, but after deployment with the same parameters it is again taking 7+ hrs. 1) spark.conf.set("spark.sql.shuffle.partitions", 500) --> spark.conf.set("spark.sql.shuffle.parti...

Latest Reply
SP_6721
Honored Contributor II
  • 1 kudos

Hi @db_eswar High iowait in your Spark jobs is probably caused by storage or disk bottlenecks, not CPU or memory issues. The slowdown you're seeing could be due to a cold cache, slower disks, or increased resource usage. To troubleshoot, you can use t...

AlessandroM
by New Contributor II
  • 1929 Views
  • 1 reply
  • 1 kudos

PySpark Structured Streaming job doesn't unpersist DataFrames

Hi community, I am currently developing a PySpark job (running on Runtime 14.3 LTS) using Structured Streaming. Our streaming job uses foreachBatch, and inside it we are calling persist (and a subsequent unpersist) on two DataFrames. We are noticing fro...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

The issue you’re encountering—where unpersist() does not seem to release memory for persisted DataFrames in your Structured Streaming job—likely relates to nuances of the Spark caching mechanism and how it interacts with the lifecycle of micro-batch ...

RameshChejarla
by New Contributor III
  • 1024 Views
  • 2 replies
  • 1 kudos

Tracking Auto Loader files in Snowflake

Hi everyone, I have implemented Auto Loader and it is working as expected. I need to track the files which are loaded into the stage table. Here is the issue: the file-tracking table needs to be created in Snowflake, and from there I need to track the files. How to connect data...

Latest Reply
RameshChejarla
New Contributor III
  • 1 kudos

Thanks for your reply, will try and let you know

ankit001mittal
by New Contributor III
  • 2832 Views
  • 8 replies
  • 3 kudos

DLT Query History

Hi guys, I can see that DLT pipelines have a query history section where we can see the duration of each table and the number of rows read. Is this information stored somewhere in the system catalog? Can I query this information?

Screenshot 2025-04-22 145638.png
Latest Reply
RiyazAliM
Honored Contributor
  • 3 kudos

Hey @ankit001mittal - if any of the above responses answered your questions, kindly mark it as a solution. Thanks,

Prashanth24
by New Contributor III
  • 1833 Views
  • 2 replies
  • 0 kudos

Databricks Autoloader processing old files

I have implemented Databricks Auto Loader and found that every time I execute the code, it still reads all the old existing files plus the new files. As per the concept of Auto Loader, it should read and process only new files. Below is the code. Please hel...

Latest Reply
RameshChejarla
New Contributor III
  • 0 kudos

Hi Prashanth, Auto Loader for me is reading only new files; can you please go through the below script?
df = (spark.readStream.format("cloudFiles").option("cloudFiles.format", "csv").option("cloudFiles.schemaLocation", "path").option("recursiveFileLooku...

kaushalshelat
by New Contributor II
  • 1145 Views
  • 2 replies
  • 4 kudos

Resolved! I cannot see the output when using pandas_api() on spark dataframe

Hi all, I started learning Spark and Databricks recently, along with Python. While running the below lines of code, it did not throw any error and seemed to run OK, but didn't show me any output either:
test = cust_an_inc1.pandas_api()
test.show()
where cust_an_inc1 is...

Latest Reply
RiyazAliM
Honored Contributor
  • 4 kudos

Hi @kaushalshelat Ideally, `test.show()` should've thrown an error, as test is a pandas dataframe now. `.show()` is a Spark DataFrame method and wouldn't work with pandas. If you want to see a subset of the data, try `.head()` or `.tail(n)` rather than `.show...
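The distinction the reply draws can be checked with plain pandas, whose API pandas-on-Spark mirrors (a small sketch; the column names are made up):

```python
import pandas as pd

# after pandas_api(), the object follows the pandas API, not the Spark one
df = pd.DataFrame({"cust_id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})

print(hasattr(df, "show"))  # False: .show() belongs to Spark DataFrames
print(df.head(2))           # pandas-style preview of the first 2 rows
```

The same `.head()` / `.tail(n)` calls work on the pandas-on-Spark frame returned by pandas_api().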

jeremy98
by Honored Contributor
  • 1644 Views
  • 4 replies
  • 0 kudos

How to fall back the entire job in case of cluster failure?

Hi community, My team and I are using a job that is triggered based on dynamic scheduling, with the schedule defined within some of the job's tasks. However, this job is attached to a cluster that is always running and never terminated. I understand th...

Latest Reply
RiyazAliM
Honored Contributor
  • 0 kudos

Hey @jeremy98 Have you had a chance to experiment with the Databricks Serverless offering? Serverless spin-up times are around ~1 min. It has built-in autoscaling based on the workload, which seems a good fit for your use case. Check out more info f...

suja
by New Contributor
  • 1199 Views
  • 1 reply
  • 0 kudos

Exploring parallelism for multiple tables

I am new to Databricks. The app we need to build reads from Hive tables, goes through bronze, silver, and gold layers, and stores in relational DB tables. There are multiple Hive tables with no dependencies. What is the best way to achieve parallelism? Do w...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @suja Use Databricks Workflows (Jobs) with Task Parallelism. Instead of using threads within a single notebook, leverage Databricks Jobs to define multiple tasks, each responsible for a table. Tasks can:
1. Run in parallel ...
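For completeness, the thread-based alternative the question mentions can be sketched with the standard library if the loads must stay inside one notebook (process_table is a hypothetical placeholder for one table's bronze/silver/gold load; Spark actions submitted from separate driver threads are scheduled concurrently on the cluster):

```python
from concurrent.futures import ThreadPoolExecutor

def process_table(table_name: str) -> str:
    # hypothetical placeholder: read the Hive table, run the bronze/silver/
    # gold transforms, and write the result to the relational target
    return f"{table_name}: processed"

# independent tables with no dependencies, per the question
tables = ["orders", "customers", "payments"]

# one worker per table (capped); each load runs in its own driver thread,
# so the cluster can work on several tables at once
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_table, tables))

print(results)
```

Separate job tasks remain the more robust option, since each table then gets its own retries, monitoring, and failure isolation.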

ABINASH
by New Contributor
  • 975 Views
  • 1 reply
  • 0 kudos

Flattening VARIANT column.

Hi Team, I am facing an issue: I have a JSON file which is around 700 KB and contains only 1 record, but after reading the data and flattening the file, the record count is now 620 million. Now, while I am writing the DataFrame into Delta Lake, it is taking ...

Latest Reply
samshifflett46
New Contributor III
  • 0 kudos

Hey @ABINASH, The JSON file being flattened to 620 million records suggests the area of optimization is restructuring the JSON file. My initial thought is that the JSON file is extremely nested, which is causing a large amount of redundant...
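The multiplication behind that blow-up is easy to see without Spark: flattening independent nested arrays takes their cross product, so one record fans out combinatorially (a stdlib sketch with made-up field names and sizes):

```python
from itertools import product

# a single nested record with three independent arrays
record = {
    "id": 1,
    "orders": [{"order_id": i} for i in range(100)],
    "items": [{"sku": j} for j in range(50)],
    "tags": [f"t{k}" for k in range(20)],
}

# fully flattening pairs every order with every item and every tag:
# 100 * 50 * 20 = 100,000 rows from one input record
flattened = [
    {"id": record["id"], **o, **i, "tag": t}
    for o, i, t in product(record["orders"], record["items"], record["tags"])
]
print(len(flattened))  # 100000
```

Flattening only the arrays that are actually related (or splitting unrelated arrays into separate tables) avoids the cross product entirely.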

sondergaard
by New Contributor II
  • 1754 Views
  • 2 replies
  • 0 kudos

Simba ODBC driver // .Net Core

Hi, I have been looking into the Simba Spark ODBC driver to see if it can simplify our integration with .NET Core. The first results were promising, but when I started to process larger queries I started to notice out-of-memory exceptions in the conta...

Latest Reply
Rjdudley
Honored Contributor
  • 0 kudos

Something we're considering for a similar purpose (.NET Core service pulling data from Databricks) is the ADO.NET connector from CData: Databricks Driver: ADO.NET Provider | Create & integrate .NET apps

ashraf1395
by Honored Contributor
  • 1355 Views
  • 1 reply
  • 0 kudos

Fetching the catalog and schema which are set in the DLT pipeline configuration

I have a DLT pipeline, and the notebook which runs in the DLT pipeline has some requirements. I want to get the catalog and schema which are set by my DLT pipeline. Reason for it: I have to specify my volume file paths etc., and my volume is on the sa...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @ashraf1395 Can you try this to get the catalog and schema set by your DLT pipeline in the notebook:
catalog = spark.conf.get("pipelines.catalog")
schema = spark.conf.get("pipelines.schema")

Labels