Data Engineering

Forum Posts

by User16618471166 • New Contributor II

04-09-2023 2:34:34 PM

3122 Views
3 replies
1 kudos

When I aggregate over more data, I get the below error message. I've tried multiple ways of diagnosis like going back to a version I know it was w...

When I aggregate over more data, I get the error message below. I've tried several ways of diagnosing it, such as going back to a version I know was working fine (but I still got the same error). Please advise, as this is a critical report where the b...

Latest Reply

Anonymous

04-18-2023 2:31:38 AM

1 kudos

@Jeff Wu: The error message suggests that there is a syntax error in a SQL statement, specifically near the end of the input. Without the full SQL statement or additional information, it's difficult to pinpoint the exact cause of the error. However,...
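Since the full statement isn't shown, this is only one hypothesis: a syntax error "at the end of input" often comes from SQL text assembled in code that ends with an unclosed parenthesis or quote, which may only surface once more columns or filters are concatenated in. A minimal pre-flight check, as a sketch; `find_unbalanced` is a hypothetical helper, not a Databricks API, and it ignores SQL comments and other quoting subtleties:

```python
from typing import List


def find_unbalanced(sql: str) -> List[str]:
    """Return problems found in a SQL string; an empty list means none were detected.

    Rough heuristic: it only tracks quotes and parentheses and ignores SQL comments.
    """
    problems: List[str] = []
    depth = 0        # current parenthesis nesting level
    quote = None     # quote character of the literal/identifier we are inside, if any
    escaped = False  # True right after a backslash inside a '...' or "..." literal
    for pos, ch in enumerate(sql):
        if quote is not None:
            if escaped:
                escaped = False
            elif ch == "\\" and quote != "`":
                escaped = True
            elif ch == quote:
                quote = None
            continue
        if ch in ("'", '"', "`"):
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append(f"unexpected ')' at position {pos}")
                depth = 0
    if quote is not None:
        problems.append(f"unterminated {quote} quote at end of input")
    if depth > 0:
        problems.append(f"{depth} unclosed '(' at end of input")
    return problems


print(find_unbalanced("SELECT region, sum(sales FROM t GROUP BY region"))
# prints ["1 unclosed '(' at end of input"]
```

Running the generated query text through a check like this before `spark.sql(...)` can help show whether the string itself is truncated or unbalanced.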

by FG • New Contributor II

04-09-2023 4:13:09 PM

2479 Views
3 replies
1 kudos

Running unit tests from a different notebook (using Python unittest package) doesn't produce output (can't discover the test files)

I have a test file (test_transforms.py) which has a series of tests running using Python's unittest package. I can successfully run the tests inside of the file with expected output. But when I try to run this test file from a different notebook (run...

Latest Reply

Anonymous

04-11-2023 2:33:25 AM

1 kudos

@Fuad Goloba: When running tests on Databricks, you need to ensure that the test file is uploaded to the Databricks workspace and that the correct path is specified when importing the test module in the notebook that is running the tests. Here's an ...
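Once the test module can be imported, the tests can also be run programmatically rather than through `unittest.main()`, which by default parses `sys.argv` and calls `sys.exit()` and can therefore misbehave in a notebook. A minimal sketch using only the standard library; `TestTransforms` is a hypothetical stand-in for the tests in test_transforms.py:

```python
import io
import unittest


# Hypothetical stand-in for the tests in test_transforms.py. In a real setup you
# would `import test_transforms` (with its folder on sys.path) and use
# loadTestsFromModule(test_transforms) instead of loadTestsFromTestCase.
class TestTransforms(unittest.TestCase):
    def test_upper(self):
        self.assertEqual("abc".upper(), "ABC")

    def test_strip(self):
        self.assertEqual("  x ".strip(), "x")


def run_tests(suite: unittest.TestSuite) -> unittest.TestResult:
    """Run a suite, print the captured report, and return the result object."""
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    print(stream.getvalue())  # show the per-test report in the cell output
    return result


suite = unittest.TestLoader().loadTestsFromTestCase(TestTransforms)
result = run_tests(suite)
```

To pick up test files from a folder instead, `unittest.TestLoader().discover(start_dir, pattern="test_*.py")` returns a suite that can be passed to the same runner; `start_dir` would be wherever the files were uploaded. Alternatively, `unittest.main(argv=[""], exit=False)` avoids the argument parsing and exit.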

by andrew0117 • Contributor

04-09-2023 9:09:10 PM

1585 Views
4 replies
2 kudos

Resolved! master notebook cannot find the udf registered in the child notebook

The master notebook is calling a child notebook using dbutils.notebook.run("PathToChildnotebook"). The child notebook defines a user-defined function (UDF) and registers it using spark.udf.register. However, when the child notebook finishes running a...

Latest Reply

Anonymous

04-18-2023 2:27:24 AM

2 kudos

@andrew li: The reason why the UDF cannot be found is that when the child notebook finishes running, the Spark context that was used to define and register the UDF is destroyed. Therefore, the UDF is no longer available in the Spark context used by ...

by ayesharahmat • New Contributor II

04-16-2023 10:56:59 PM

1351 Views
3 replies
2 kudos

AutoLoader issue - java.lang.AssertionError

I am encountering the error below. I am using micro-batch processing with AutoLoader. Please help me fix this issue: java.lang.AssertionError: assertion failed: Invalid batch: path#36188,modificationTime#36189,length#36190L,content#36191,PROVIDER#36192,LOCATIO...

Latest Reply

Anonymous

04-18-2023 2:11:19 AM

2 kudos

@Ayesha Rahmatali: The error message you provided suggests an assertion failure caused by invalid batch data in your AutoLoader implementation. Specifically, it indicates that the schema of the incoming data does not match the...

by Data_Engineer3 • Contributor II

04-12-2023 3:33:34 AM

3424 Views
4 replies
5 kudos

How can I use the same Spark session from one notebook in another notebook in Databricks?

I want to use the Spark session created in one notebook in another notebook within the same environment. For example, if some (variable) object got initialized in the first notebook, I need to use the same object in t...

Latest Reply

Manoj12421
Valued Contributor II

04-18-2023 12:39:19 AM

5 kudos

You can use %run with the path of the notebook: %run "/folder/notebookname"

by yopbibo • Contributor II

04-17-2023 7:01:19 AM

1374 Views
2 replies
0 kudos

pip install in cluster using web UI and extra index

In an init script or a notebook, we can run:

pip install --index-url=<our private pypi url> --extra-index-url=https://pypi.org/simple <a module>

In the cluster web UI (Libraries -> Install library), we can give only the URL of our private repository, but n...
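One workaround sometimes used for the cluster-wide case, stated here as an assumption to verify on your Databricks Runtime, is an init script that writes a pip configuration file such as /etc/pip.conf, since pip itself reads `index-url` and `extra-index-url` from its config files. Whether cluster library installs from the UI honor that file is not confirmed by this thread. A minimal sketch that renders such a file (in an actual init script you would write the same text from shell); the private index URL is hypothetical:

```python
import configparser
import io
from typing import List


def build_pip_conf(index_url: str, extra_index_urls: List[str]) -> str:
    """Render a pip config file with a primary index and extra fallback indexes."""
    config = configparser.ConfigParser()
    config["global"] = {
        "index-url": index_url,
        # pip accepts several extra indexes as a whitespace-separated list.
        "extra-index-url": " ".join(extra_index_urls),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical private index URL; an init script could write this text to /etc/pip.conf.
conf_text = build_pip_conf("https://pypi.example.com/simple", ["https://pypi.org/simple"])
print(conf_text)
```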

Latest Reply

Anonymous

04-18-2023 12:36:49 AM

0 kudos

Hi @Philippe CRAVE, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answer...

by gfar • New Contributor II

04-12-2023 8:47:16 AM

1832 Views
2 replies
0 kudos

Is it possible to connect QGIS to Databricks using ODBC?

I can connect ArcGIS to Databricks using ODBC, but using the same ODBC DSN for QGIS I get an error: "Unable to initialize ODBC connection to DSN". Has anyone got this working?

Latest Reply

Anonymous

04-15-2023 5:57:21 PM

0 kudos

@Grainne Farrant: It is possible to connect QGIS to Databricks using ODBC, but it requires additional configuration. Here are the general steps to follow: Install the ODBC driver for Databricks on the machine where QGIS is installed. You can downloa...
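Because the same DSN works in ArcGIS, one way to narrow down the QGIS failure is to test the driver settings directly, outside any DSN. A minimal sketch that builds a DSN-less connection string with the keys documented for the Databricks (Simba Spark) ODBC driver; the host, HTTP path, and token are hypothetical placeholders, and the driver name depends on how the driver was installed on your machine:

```python
def databricks_odbc_connection_string(host: str, http_path: str, token: str,
                                      driver: str = "Simba Spark ODBC Driver") -> str:
    """Build a DSN-less ODBC connection string for a Databricks cluster or SQL warehouse.

    Values containing ';' would need extra quoting, which this sketch does not handle.
    """
    settings = {
        "Driver": driver,
        "Host": host,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",  # 2 = HTTP transport
        "AuthMech": "3",         # 3 = user name/password; the "user" is the literal 'token'
        "UID": "token",
        "PWD": token,
    }
    return ";".join(f"{key}={value}" for key, value in settings.items())


# Hypothetical placeholders: copy the real values from the cluster/warehouse connection details.
conn_str = databricks_odbc_connection_string(
    host="<server-hostname>",
    http_path="<http-path>",
    token="<personal-access-token>",
)
print(conn_str)
```

If the separate pyodbc package is installed, the string could be tried with `pyodbc.connect(conn_str, autocommit=True)`; if that succeeds, comparing its values with the DSN that QGIS uses may reveal the difference.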

by Sabytheseeker • New Contributor

04-17-2023 3:40:01 AM

299 Views
1 replies
0 kudos

I just passed the Lakehouse Fundamentals Accreditation and I haven't received my badge yet and the certification seems to be messed up

Latest Reply

Anonymous

04-18-2023 12:21:58 AM

0 kudos

Hi @Sabyasachi Samaddar, we are going through a contract renewal with our vendor, Accredible. Once our new contract goes through, you will get your badge this week. Thank you for understanding.

by kris08 • New Contributor

04-17-2023 6:52:08 AM

681 Views
1 replies
0 kudos

Kafka consumer groups in Databricks

I was trying to find information about configuring consumer groups for a Kafka stream in Databricks. By doing so, I want to parallelize the stream and load it into Databricks tables. Does Databricks handle this internally? If we can configure th...

Latest Reply

Debayan
Esteemed Contributor III

04-17-2023 11:55:14 PM

0 kudos

Hi, we have a few examples of stream processing using Kafka (https://docs.databricks.com/structured-streaming/kafka.html), but there is no dedicated public document on Kafka consumer group creation. You can refer to https://kafka.apache.org/documentation...
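As a hedged illustration for Spark specifically: the Structured Streaming Kafka source creates and manages its own consumers and, by default, reads each Kafka topic partition with one Spark task, so read parallelism comes mainly from the number of topic partitions rather than from a consumer group you configure. A minimal sketch that builds the reader options; the broker and topic names are hypothetical:

```python
from typing import Dict, Optional


def kafka_source_options(bootstrap_servers: str, topic: str,
                         min_partitions: Optional[int] = None,
                         group_id_prefix: Optional[str] = None) -> Dict[str, str]:
    """Reader options for Spark Structured Streaming's Kafka source."""
    options = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
    }
    if min_partitions is not None:
        # By default Spark maps each Kafka topic partition to one Spark partition;
        # minPartitions asks it to split the work into at least this many.
        options["minPartitions"] = str(min_partitions)
    if group_id_prefix is not None:
        # Spark generates a unique consumer group id for each query; this only
        # changes the prefix of that generated id.
        options["groupIdPrefix"] = group_id_prefix
    return options


# Hypothetical broker and topic names.
opts = kafka_source_options("broker-1:9092", "events", min_partitions=8)
# With a live SparkSession (not created here):
# df = spark.readStream.format("kafka").options(**opts).load()
```

An explicit `kafka.group.id` option also exists, but the Spark documentation advises caution with it, because queries sharing a group id can interfere with each other.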

by Data_Analytics1 • Contributor III

04-13-2023 8:38:25 AM

1672 Views
4 replies
2 kudos

Delta table property is not set.

I have set the Delta table property at the cluster level: spark.databricks.delta.retentionDurationCheck.enabled false. When I create a new table, the retentionDurationCheck property is not shown in the table details. But when I set this with ALTER TABLE for a s...
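One likely explanation, offered as an assumption because the replies are truncated: spark.databricks.delta.retentionDurationCheck.enabled is a Spark session configuration that governs the safety check VACUUM applies to short retention periods. It is not a table property, so it would not be expected to appear in the table details. Delta does let a session set default table properties for newly created tables through configs named spark.databricks.delta.properties.defaults.<property>, with the `delta.` prefix removed. A small helper illustrating that naming rule (`default_property_conf` is hypothetical):

```python
def default_property_conf(table_property: str) -> str:
    """Map a Delta table property such as 'delta.appendOnly' to the Spark config that
    sets its default for tables created later in the session."""
    prefix = "delta."
    if not table_property.startswith(prefix):
        raise ValueError(
            f"expected a Delta table property starting with {prefix!r}: {table_property!r}"
        )
    return "spark.databricks.delta.properties.defaults." + table_property[len(prefix):]


conf_key = default_property_conf("delta.appendOnly")
# With a live session: spark.conf.set(conf_key, "true"), then CREATE TABLE ...
print(conf_key)  # prints spark.databricks.delta.properties.defaults.appendOnly
```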

Latest Reply

Anonymous

04-15-2023 10:20:55 PM

2 kudos

Hi @Mahesh Chahare, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us s...

by Sachinbt • New Contributor II

04-16-2023 4:23:53 AM

871 Views
2 replies
2 kudos

DataBricks Certification Exam Got Suspended. Need help in resolving the issue

Hi Team, my Databricks exam got suspended this morning (16th April) and it is still in the suspended state. I have raised a support request using the link https://help.databricks.com/s/contact-us?ReqType=training but I haven’t received the ti...

Latest Reply

Anonymous

04-17-2023 8:36:38 PM

2 kudos

Hi @Sachin Kumara, we are going through a contract renewal with our vendor, Accredible. Once our new contract goes through, you will get your badge this week. Thank you for understanding!

by tytytyc26 • New Contributor II

04-02-2023 8:59:59 PM

1166 Views
3 replies
0 kudos

Resolved! Problem with accessing element using Pandas UDF in Image Processing

Hi everyone, I have been stuck on this for a very long time. I am not very familiar with using Spark for image processing. I was trying to resize images that are loaded into a Spark DataFrame. However, it keeps throwing an error that I am not able to access the element...

Latest Reply

Anonymous

04-17-2023 6:48:04 AM

0 kudos

@Yan Chong Tan: The error occurs because you are trying to access the attribute "width" of a string object in the resize_image function. Specifically, input_dim is a string object, but you are trying to access its width attr...
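The reply is truncated, but the fix it points toward is converting the size into numbers before resizing instead of reading `.width` from a string. A minimal sketch, assuming (hypothetically) that input_dim arrives as text such as "224x224"; the resulting tuple is the kind of size argument PIL's `Image.resize` expects:

```python
from typing import Tuple


def parse_dims(input_dim: str) -> Tuple[int, int]:
    """Parse a hypothetical 'WIDTHxHEIGHT' string such as '224x224' into (width, height)."""
    width_text, height_text = input_dim.strip().lower().split("x")
    width, height = int(width_text), int(height_text)
    if width <= 0 or height <= 0:
        raise ValueError(f"dimensions must be positive, got {input_dim!r}")
    return width, height


size = parse_dims("224x224")
# Inside resize_image you would then call e.g. image.resize(size) on a PIL image,
# instead of reading input_dim.width from the string.
```

If input_dim is actually stored in another format in your DataFrame, only the parsing line needs to change; the point is to hand numeric dimensions to the resize call.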

by andrew0117 • Contributor

04-16-2023 8:39:11 PM

2546 Views
4 replies
0 kudos

Resolved! partition on a csv file

When I use SQL code like "create table myTable (column1 string, column2 string) using csv options('delimiter' = ',', 'header' = 'true') location 'pathToCsv'" to create a table from a single CSV file stored in a folder within an Azure Data Lake contai...

Latest Reply

pvignesh92
Honored Contributor

04-17-2023 4:40:46 AM

0 kudos

Hi @andrew li, when you specify a path with the LOCATION keyword, Spark treats the table as an EXTERNAL table. So when you drop the table, the underlying data, if any, is not deleted. In your case, as this is an external table, your folder s...

by oleole • Contributor

03-19-2023 8:01:15 PM

2023 Views
3 replies
3 kudos

Resolved! How to delay a new job run after job

I have a daily job run that occasionally fails with the error: The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached. After I get the notification that this job failed on schedule, I manually run ...

Latest Reply

oleole
Contributor

04-17-2023 1:16:39 PM

3 kudos

According to this documentation, you can specify the wait time between the "start" of the first run and the retry start time.
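The documentation link in this reply did not survive, so as a hedged sketch: in the Databricks Jobs API 2.1 task settings, the retry policy is expressed with fields such as `max_retries`, `min_retry_interval_millis` (the minimum wait between the start of the failed run and the start of the retry) and `retry_on_timeout`. A helper that builds those fields, here with a 15-minute wait; check the field names against the Jobs API reference for your workspace:

```python
import json
from typing import Any, Dict


def task_retry_settings(max_retries: int, wait_minutes: int) -> Dict[str, Any]:
    """Retry-related fields for one task in a Jobs API 2.1 job specification."""
    return {
        "max_retries": max_retries,
        # Measured from the start of the failed run to the start of the retry.
        "min_retry_interval_millis": wait_minutes * 60 * 1000,
        "retry_on_timeout": False,
    }


settings = task_retry_settings(max_retries=2, wait_minutes=15)
print(json.dumps(settings, indent=2))
```

These keys would be merged into the task object of the job definition; the same settings are also exposed as the retry policy in the job UI.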

by rshark • New Contributor II

03-30-2023 10:25:03 AM

1261 Views
3 replies
0 kudos

Error when calling SparkR from within a Python notebook

I’ve had success with R magic (R cells in a Python notebook) and running an R script from a Python notebook, up to the point of connecting R to a Spark cluster. In either case, I can’t get a `SparkSession` to initialize. 2-cell (Python) notebook exa...

Latest Reply

Dooley
Valued Contributor

04-17-2023 10:52:48 AM

0 kudos

One way to make this work is to call the R notebooks from your Python notebook. Just save each DataFrame as a Delta table to pass data between the languages. For how to call a notebook from another notebook, here is a link
