Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Error: missing application resource. I am getting this error while running a job with spark-submit. I have given the following parameters while creating the job: --conf spark.yarn.appMasterEnv.PYSAPRK_PYTHON=databricks/path/python3 --py-files dbfs/path/to/.egg job_m...
Hi, we tried to simulate the question on our end; what we did was package a module inside a whl file. Then, to access the wheel file, we created another Python file, test_whl_locally.py. Inside test_whl_locally.py, to access the content of the wheel file...
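A minimal sketch of that setup, assuming placeholder names (my_module and greet are not from the original thread): install the wheel onto the cluster, then import the packaged module from the driver script.

# In a notebook cell, install the wheel (path is a placeholder):
# %pip install /dbfs/path/to/my_package-0.1.0-py3-none-any.whl

# test_whl_locally.py
from my_module import greet   # hypothetical module packaged inside the wheel

if __name__ == "__main__":
    # call a function exposed by the wheel to confirm the import works
    print(greet("Databricks"))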
For a Nexus repo, from a notebook you can use notebook-scoped libraries: use %pip install with the --index-url option. Secret management is available; see the example below. From the UI (cluster libraries) it is not supported.
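A minimal sketch of that %pip pattern, assuming a placeholder index URL and package name (neither comes from the original post); credentials for the private index can be kept in a Databricks secret scope rather than hard-coded.

# Notebook-scoped install from a private Nexus PyPI index (URL and package are placeholders):
%pip install my-internal-package --index-url=https://nexus.example.com/repository/pypi-internal/simple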
I am using Spark Structured Streaming to read protobuf-encoded messages from Event Hub. We use a lot of Delta tables, but there isn't a simple way to integrate this. We are currently using K-SQL to transform into Avro on the fly and then use Dat...
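For reference, a hedged sketch of decoding protobuf inside the streaming query itself with from_protobuf (available in Spark 3.4+ / recent Databricks runtimes), writing straight to Delta. The Event Hubs endpoint, descriptor file, message name, and paths are all placeholders, and the SASL auth options for the Kafka-compatible endpoint are omitted.

from pyspark.sql.protobuf.functions import from_protobuf

# Read the raw binary payloads from the Event Hubs Kafka-compatible endpoint
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("subscribe", "<eventhub-name>")
       .load())

# Decode the protobuf value column using a compiled descriptor file
decoded = raw.select(
    from_protobuf("value", "MyMessage", descFilePath="/dbfs/schemas/my_message.desc").alias("event")
)

# Write the decoded stream to a Delta table
(decoded.writeStream
 .format("delta")
 .option("checkpointLocation", "/delta/_checkpoints/protobuf_events")
 .start("/delta/protobuf_events"))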
Hi team, I'm getting a weird error in one of my jobs when connecting to Snowflake. All my other jobs (I've got plenty) work fine. The current one also works fine when I have only one coding step (aside from installing the needed libraries in my very first step...
What Jose said. If you cannot use Delta, or do not want to: the use of coalesce and repartition/partitioning is the way to define the file size. There is no one ideal file size; it all depends on the use case, available cluster size, data flow downstrea...
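A minimal sketch of that approach; the paths and partition count are placeholders and would need to be tuned to the actual data volume.

df = spark.read.parquet("/mnt/raw/events")          # placeholder source

# Roughly: number_of_output_files ~= number_of_partitions, so pick a count
# that yields the file sizes you want for the downstream readers.
(df.repartition(64)                                  # full shuffle, evens out skew
   .write.mode("overwrite")
   .parquet("/mnt/curated/events"))

# coalesce(n) avoids a full shuffle but can only reduce the partition count,
# and may produce uneven file sizes.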
We have streaming data written into Delta. We will not write all the partitions every day, hence I am thinking of running a compaction Spark job only on the partitions that have been modified yesterday. Is it possible to query the partitionsValues wr...
Hi @Gnanasoundari Soundarajan, based on the details you provided, you are not overwriting all the partitions every day, which means you might be using append mode while writing the data on day 1. On day 2, you want to access those partition values and...
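One way to limit the compaction to yesterday's partitions is to restrict OPTIMIZE with a WHERE clause on the partition column; a hedged sketch follows, where the table name and partition column are placeholders.

from datetime import date, timedelta

# compute yesterday's date string for the partition filter
yesterday = (date.today() - timedelta(days=1)).isoformat()

# compact only the files in yesterday's partition(s)
spark.sql(f"""
  OPTIMIZE events_delta
  WHERE event_date = '{yesterday}'
""")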
I got the below error when running a streaming workload from a source Delta table: Caused by: java.lang.RuntimeException: Cannot reserve additional contiguous bytes in the vectorized reader (requested xxxxxxxxx bytes). As a workaround, you can reduce ...
This is happening because the Delta/Parquet source has one or more of the following: a huge number of columns; huge strings in one or more columns; huge arrays/maps, possibly nested in each other. In order to mitigate this issue, could you please reduce spar...
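The reply above is cut off, so the exact setting it refers to is an assumption on my part; one knob commonly lowered for this kind of vectorized-reader memory error is the Parquet reader batch size.

# Assumption: reduce the number of rows per vectorized batch (Spark default is 4096)
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 512)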
Hi, I am wondering what documentation exists on query pushdown in Snowflake. I noticed that a single function (monotonically_increasing_id()) prevented the entire query from being pushed down to Snowflake during an ETL process. Is pushdown coming from the S...
Hi Sam, the Spark Connector applies predicate and query pushdown by capturing and analyzing the Spark logical plans for SQL operations. When the data source is Snowflake, the operations are translated into a SQL query and then executed in Snowflake to...
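A hedged sketch of the behaviour described above; connection options, table, and column names are placeholders. Operators the connector can translate are pushed into the generated Snowflake SQL, while a function it cannot translate (such as monotonically_increasing_id) is evaluated in Spark, which limits how much of the plan is pushed down.

from pyspark.sql import functions as F

sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<db>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<wh>",
}

df = (spark.read.format("snowflake")
      .options(**sf_options)
      .option("dbtable", "ORDERS")
      .load())

pushed = df.filter(F.col("STATUS") == "OPEN")                              # filter can be pushed down
not_pushed = pushed.withColumn("row_id", F.monotonically_increasing_id())  # computed in Spark instead

not_pushed.explain()   # inspect the physical plan to see how much reached Snowflake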
In my pipeline I'm using Azure Data Factory to trigger Databricks notebooks as a linked service. I want to use spot instances for my job clusters. Is there a way to achieve this? I didn't find a way to do this in the GUI. Thanks for your help! Marco
Hi @Werner Stinckens, just a quick follow-up question. Does it make sense to you that you can select the following options in Azure Data Factory? To my understanding, "cluster version", "Python Version" and the "Worker options" are defined when I crea...
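For reference, a hedged sketch of what the spot configuration looks like at the Databricks Jobs/Clusters API level, since the thread is about whether the ADF GUI exposes it. The job name, node types, notebook path, workspace URL, and token are all placeholders, and this does not show how (or whether) the ADF linked service surfaces the same fields.

import requests

new_cluster = {
    "spark_version": "11.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2,
    "azure_attributes": {
        "availability": "SPOT_WITH_FALLBACK_AZURE",  # fall back to on-demand if spot is unavailable
        "spot_bid_max_price": -1,                    # -1 = bid up to the on-demand price
    },
}

payload = {
    "name": "adf-triggered-notebook",
    "tasks": [{
        "task_key": "main",
        "new_cluster": new_cluster,
        "notebook_task": {"notebook_path": "/Repos/project/notebook"},
    }],
}

# create the job with a spot-backed job cluster via the Jobs API
requests.post(
    "https://<workspace-url>/api/2.1/jobs/create",
    headers={"Authorization": "Bearer <token>"},
    json=payload,
)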
Hi all, I currently have a cluster configured in Databricks with spark-xml (version com.databricks:spark-xml_2.12:0.13.0), which was installed using Maven. The spark-xml library itself works fine with PySpark when I am using it in a notebook within th...
@Sean Owen, I do not believe I have. Do you have any documentation on how to install spark-xml locally? I have tried the following with no luck. Is this what you are referring to? PYSPARK_HOME/bin/pyspark --packages com.databricks:spark-xml_2.12:0.13....
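An alternative sketch for a local setup: pull the same Maven coordinates in through spark.jars.packages on the SparkSession builder (requires internet access so the JAR can be resolved at session start). The XML file path and rowTag below are placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("spark-xml-local")
         .config("spark.jars.packages", "com.databricks:spark-xml_2.12:0.13.0")
         .getOrCreate())

# read an XML file with the spark-xml data source; adjust rowTag to the actual structure
df = (spark.read.format("xml")
      .option("rowTag", "record")
      .load("/path/to/file.xml"))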
How do we deploy a real-time model on Databricks at scale? Right now, Model Serving is limited to 20 requests per second. Also, there is no model monitoring framework/graphs like the ones provided with the AzureML or SageMaker frameworks.
I believe the next update to serving will include 1, not 2 (this is still within a Databricks workspace in a region). I don't think multi-model endpoints are on the roadmap next. How does Airflow integration relate?
Embed Google Slides (PowerPoint) into Databricks Interactive Notebooks. Use the following code to embed your slides:
slide_id = '1CYEVsDqsdfg343fwg42MtXqGd68gffP-Y16CR59c'
slide_number = 'id.p9'
displayHTML(f'''
<iframe
src="https://docs.google.com/...
I'm new to Spark and trying to understand how some of its components work. I understand that once the data is loaded into the memory of separate nodes, they process partitions in parallel, within their own memory (RAM). But I'm wondering whether the in...
@Narek Margaryan, normally the reading is done in parallel because the underlying file system is already distributed (if you use HDFS-based storage or something like it, e.g. a data lake). The number of partitions in the file itself also matters. This l...
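A small sketch for inspecting that read parallelism; the path is a placeholder. Each input split becomes a task, so the partition count tells you how many tasks can read the data in parallel across the cluster.

# how many partitions (and therefore parallel read tasks) the source produces
df = spark.read.parquet("/mnt/datalake/events")
print(df.rdd.getNumPartitions())

# the split size for splittable formats is governed mainly by this setting (default 128 MB)
print(spark.conf.get("spark.sql.files.maxPartitionBytes"))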