Data Engineering

Forum Posts

Quan
by New Contributor III
  • 10195 Views
  • 9 replies
  • 6 kudos

Resolved! How to properly load Unicode (UTF-8) characters from a table over a JDBC connection using the Simba Spark Driver

Hello all, I'm trying to pull table data from Databricks tables that contain foreign-language characters in UTF-8 into an ETL tool using a JDBC connection. I'm using the latest Simba Spark JDBC driver available from the Databricks website. The issue i...

Latest Reply
Anonymous
Not applicable

Can you try setting UseUnicodeSqlCharacterTypes=1 in the driver, and also make sure 'file.encoding' is set to UTF-8 in the JVM, and see if the issue still persists?
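For anyone wanting to verify those two settings outside their ETL tool, a minimal sketch from Python using the jaydebeapi package might look like the following. The host, httpPath, token, jar path, and driver class name are placeholders and assumptions, not values from this thread:

```
# Sketch only: test UseUnicodeSqlCharacterTypes=1 outside the ETL tool.
# Host, httpPath, token, jar path, and driver class are placeholders;
# the driver class name can differ between Simba driver versions.
import jaydebeapi

url = ("jdbc:spark://<workspace-host>:443/default;transportMode=http;"
       "ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>;"
       "UseUnicodeSqlCharacterTypes=1")

conn = jaydebeapi.connect("com.simba.spark.jdbc.Driver", url,
                          jars="/path/to/SparkJDBC42.jar")
cur = conn.cursor()
cur.execute("SELECT * FROM my_unicode_table LIMIT 10")
print(cur.fetchall())
conn.close()

# For a JVM-based ETL tool itself, also launch the JVM with
# -Dfile.encoding=UTF-8, as suggested above.
```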

Abhendu
by New Contributor II
  • 857 Views
  • 3 replies
  • 0 kudos

Resolved! CI/CD in Databricks

Hi Team, I was wondering if there is a document or step-by-step process to promote code through CI/CD across the various environments of a code repository (Git/GitHub/Bitbucket/GitLab) with DBx support? [Without involving the code repository's merging capability of the ...

Latest Reply
Anonymous
Not applicable

Please refer to this related thread on CI/CD in Databricks: https://community.databricks.com/s/question/0D53f00001GHVhMCAX/what-are-some-best-practices-for-cicd

Kaniz
by Community Manager
  • 723 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz
Community Manager

The differences are as follows:
  • Pig operates on the client side of a cluster, whereas Hive operates on the server side.
  • Pig uses the Pig Latin language, whereas Hive uses the HiveQL language.
  • Pig is a procedural data-flow language, whereas Hive is a ...

Kaniz
by Community Manager
  • 702 Views
  • 1 replies
  • 0 kudos
Latest Reply
Kaniz
Community Manager

To export all collections:
```
mongodump -d database_name -o directory_to_store_dumps
```
To restore them:
```
mongorestore -d database_name directory_backup_where_mongodb_tobe_restored
```

JD2
by Contributor
  • 2241 Views
  • 6 replies
  • 4 kudos

Resolved! Auto Loader for Shape File

Hello: As you can see from the link below, Auto Loader supports 7 file formats. I am dealing with GeoSpatial shapefiles and I want to know if Auto Loader can support shapefiles. Any help on this is greatly appreciated. Thanks. https://docs.microsoft.com/...

Latest Reply
-werners-
Esteemed Contributor III

You could try to use the binary file type. The disadvantage of this is that the content of the shapefiles will be put into a single column, which might not be what you want. If you absolutely want to use Auto Loader, maybe some thinking outside the b...
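A minimal sketch of that binary-file route, assuming illustrative mount paths and a Delta target (none of these names come from the thread):

```
# Sketch: ingest shapefiles with Auto Loader as raw binary files.
# All paths here are illustrative placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .option("pathGlobFilter", "*.shp")
      .load("/mnt/landing/shapefiles/"))

# Each row holds path, modificationTime, length, and the raw bytes in the
# `content` column, which downstream code would need to parse.
(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/mnt/checkpoints/shapefiles/")
   .start("/mnt/bronze/shapefiles_raw/"))
```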

Karankaran_alan
by New Contributor
  • 667 Views
  • 2 replies
  • 0 kudos

Cluster not getting created, timing out

Hello - I've been using the Databricks notebook (for PySpark or Scala/Spark development), and recently have had issues where cluster creation takes a long time, often timing out. Any ideas on how to resolve this?

Latest Reply
jose_gonzalez
Moderator

Hi Karankaran.alang, what is the error message you are getting? Did you get this error while creating/starting a Community Edition (CE) cluster? Sometimes these errors are intermittent and go away after a few retries. Thank you.

RajaLakshmanan
by New Contributor
  • 1992 Views
  • 3 replies
  • 1 kudos

Resolved! Spark StreamingQuery not processing all data from source directory

Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...

Latest Reply
User16763506586
Contributor

If it helps, you can try running a left-anti join on the source and sink to identify missing records, and then check whether each missing record matches the schema provided.
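A sketch of that check, with made-up paths and formats and a hypothetical key column record_id:

```
# Sketch: find records present in the source but missing from the sink.
# The paths, file formats, and `record_id` key column are assumptions.
source_df = spark.read.parquet("/staging/input/")
sink_df = spark.read.parquet("/target/output/")

missing = source_df.join(sink_df, on="record_id", how="left_anti")
print(missing.count())
missing.show(truncate=False)
```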

User15787040559
by New Contributor III
  • 1439 Views
  • 2 replies
  • 1 kudos

How can I get Databricks notebooks to stop cutting off the explain plans?

(Since Spark 3.0) Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.

Latest Reply
dazfuller
Contributor III

Notebooks really aren't the best method of viewing large files. Two methods you could employ are:
  • Save the file to DBFS and then use the Databricks CLI to download the file.
  • Use the web terminal.
With the web terminal option you can do something like "cat my_lar...
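A sketch of the first option from PySpark, using an internal handle to the Java QueryExecution object (paths are illustrative, and the _jdf accessor is an internal API that may change between versions):

```
# Sketch: dump the full query plan to DBFS so it can be fetched outside
# the notebook. Paths are illustrative; _jdf is an internal PySpark handle.
df = spark.range(10).selectExpr("id", "id * 2 AS doubled")
plan_text = df._jdf.queryExecution().toString()  # full plan as a string
dbutils.fs.put("dbfs:/tmp/full_plan.txt", plan_text, True)

# Then, from a local shell, e.g.:
#   databricks fs cp dbfs:/tmp/full_plan.txt .
```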

Kaniz
by Community Manager
  • 3039 Views
  • 2 replies
  • 1 kudos
Latest Reply
dazfuller
Contributor III

If you want to read line-by-line in Python:
```
with open('/path/to/file', 'r') as f:
    for line in f:
        print(line)
```
If you want to read the entire file into a list of lines:
```
with open('/path/to/file', 'r') as f:
    data = f.readlines()
```
Or if you w...

YuvSaha
by New Contributor
  • 556 Views
  • 1 replies
  • 0 kudos

Auto Loader for Shape Files?

Hello, As you can see from the link below, Auto Loader supports 7 file formats. I am dealing with GeoSpatial shapefiles and I want to know if Auto Loader can support shapefiles. Any help on this is greatly appreciated. avro: Avro file, binaryFile: Binary f...

Latest Reply
dbkent
New Contributor III

Hi @Yuv Saha​, currently shapefiles are not a supported file type when using Auto Loader. Would you be willing to share more about your use case? I am the Product Manager responsible for Geospatial at Databricks, and I need help from customers like ...

Kaniz
by Community Manager
  • 1647 Views
  • 2 replies
  • 1 kudos
Latest Reply
dazfuller
Contributor III

The basic process is this (assuming you're using a Python virtual environment and have activated it):
```
(.venv) C:\source\project> python -m pip install pyspark
(.venv) C:\source\project> python

>>> from pyspark.sql import SparkSession
>>> spark = SparkSe...
```
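The truncated line presumably continues into the usual builder chain; a minimal completion under that assumption:

```
# Minimal local SparkSession; this builder chain is our completion of the
# truncated reply, and the master/app name values are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("local[*]")
         .appName("local-test")
         .getOrCreate())
spark.range(5).show()
```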

Kaniz
by Community Manager
  • 8004 Views
  • 2 replies
  • 1 kudos
Latest Reply
Ryan_Chynoweth
Honored Contributor III

Assuming that the S3 bucket is mounted in the workspace, you can provide a file path. If you want to write a PySpark DataFrame, you can do something like the following: df.write.format('json').save('/path/to/file_name.json'). You could also use the json py...
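A runnable sketch of the DataFrame route, assuming an illustrative mount point:

```
# Sketch: write a small DataFrame as JSON to a mounted S3 path.
# The mount point /mnt/my-bucket is an assumption, not from the thread.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.format("json").mode("overwrite").save("/mnt/my-bucket/out/file_name.json")
```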

DouglasLinder
by New Contributor III
  • 4109 Views
  • 5 replies
  • 1 kudos

Is it possible to pass configuration to a job on high concurrency cluster?

On a regular cluster, you can use: ```spark.sparkContext._jsc.hadoopConfiguration().set(key, value)``` These values are then available on the executors via the Hadoop configuration. However, on a high-concurrency cluster, attempting to do so results ...

Latest Reply
Ryan_Chynoweth
Honored Contributor III

I am not sure why you are getting that error on a high-concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? sc._jsc.hadoopConfiguration().set(key, value)
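A sketch of that call with a concrete (illustrative) Hadoop key:

```
# Sketch: set a Hadoop configuration value on the shared context.
# The key/value pair here is illustrative, not from the thread.
sc = spark.sparkContext
sc._jsc.hadoopConfiguration().set("fs.s3a.connection.maximum", "100")
# Executors share this Hadoop configuration, so the value is visible there.
```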
