Data Engineering

Forum Posts

by McKayHarris • New Contributor II

01-03-2017 3:42:14 PM

17862 Views
17 replies
3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply

RodrigoDe_Freit
New Contributor II

12-10-2019 11:56:17 AM

3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

16 More Replies

by dtr • New Contributor

08-04-2020 11:54:17 PM

5342 Views
1 reply
0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to call spark.sql inside the function, but it looks like I cannot use it on worker nodes. def SEL_ID(value, index): # some processing on value here ans = spark.sql("SELECT id FRO...

Latest Reply

MartinhoAzevedo
New Contributor II

02-01-2021 10:56:58 AM

0 kudos

Hi there. I guess I'm a bit late, but do you remember whether and how you fixed this issue? I'm getting the exact same problem. @dtr

by HarisKhan • New Contributor

04-12-2020 5:32:03 AM

9075 Views
2 replies
0 kudos

Escape Backslash(/) while writing spark dataframe into csv

I am using Spark version 2.4.0. I know that backslash is the default escape character in Spark, but I am still facing the issue below. I am reading a CSV file into a Spark dataframe (using PySpark) and writing the dataframe back to CSV. I have so...

Latest Reply

sean_owen
Honored Contributor II

04-17-2020 2:19:09 PM

0 kudos

I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?

1 More Replies

by User16826991422 • Contributor

12-02-2015 10:26:01 AM

12574 Views
12 replies
0 kudos

Resolved! How do I create a single CSV file from multiple partitions in Databricks / Spark?

I am using spark-csv to write data to DBFS, which I plan to move to my laptop via standard S3 copy commands. By default, spark-csv writes its output as multiple partition files. I can force it to a single partition, but would really like to know if there is a ge...

Latest Reply

ChristianHomber
New Contributor II

01-21-2020 3:50:40 AM

0 kudos

Without access to bash, an option within Databricks (e.g. via dbfsutils) would be highly appreciated.
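For readers without shell access, the merge step discussed in this thread can be sketched in plain Python. This is only an illustration, not the accepted answer from the thread: it assumes the output directory is reachable as an ordinary file path, that every part file ends with a newline, and that each part repeats the header row. The function name `merge_csv_parts` and its parameters are hypothetical.

```python
import glob
import os
import shutil


def merge_csv_parts(parts_dir, output_path, has_header=True):
    """Concatenate Spark-style part files into a single CSV file.

    Returns the number of part files that were merged.
    """
    # Spark names its output files part-00000..., part-00001..., and so on;
    # sorting keeps them in partition order.
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*")))
    # newline="" keeps the original line endings untouched on read and write.
    with open(output_path, "w", newline="") as out:
        for i, part in enumerate(part_files):
            with open(part, "r", newline="") as src:
                if has_header and i > 0:
                    src.readline()  # skip the header repeated in later parts
                shutil.copyfileobj(src, out)
    return len(part_files)
```

Unlike forcing a single partition, this keeps the Spark write parallel and does the merge afterwards on one machine, so it only suits outputs that fit comfortably on local disk.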

11 More Replies

by tunguyen90 • New Contributor

11-08-2019 7:41:42 AM

7479 Views
3 replies
1 kudos

How to change line separator for csv file exported from dataframe in databricks

Hello, I'm currently facing a problem with the line separator inside CSV files exported from a data frame in Azure Databricks (Spark version 2.4.3) to Azure Blob storage. All of those CSV files contain LF as the line separator. I need to have CRLF (\r\n...
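One workaround for the LF versus CRLF question is to rewrite the exported file after Spark has finished writing it. The sketch below is a post-processing step in plain Python, not a Spark option; the function name `convert_lf_to_crlf` is hypothetical, and it assumes the file is reachable as a local path and that no quoted field contains an embedded newline (such newlines would be converted as well).

```python
def convert_lf_to_crlf(src_path, dst_path):
    """Copy a text file, ending every line with CRLF instead of LF."""
    # Binary mode, so Python performs no newline translation of its own.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:
            # Strip any existing terminator first so CRLF input is not doubled.
            body = line.rstrip(b"\r\n")
            # A final line without a terminator stays unterminated.
            ending = b"\r\n" if line.endswith(b"\n") else b""
            dst.write(body + ending)
```

Because existing terminators are stripped before CRLF is added, running the function on a file that already uses CRLF leaves its content unchanged.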

Latest Reply

Nikhila
New Contributor II

12-24-2020 5:02:26 AM

1 kudos

Hi, have you found a solution to the above problem? Kindly let me know.

2 More Replies

by tismith1_558848 • New Contributor

05-23-2018 1:27:21 PM

6849 Views
2 replies
0 kudos

Resolved! Change size or aspect ratio of ggplot visualizations

I understand that plots in R notebooks are captured by a png graphics device. Is there a way to set the size or the aspect ratio of the canvas? I understand that I can resize the rendered .png by dragging the handle in the notebook, but that means I...

Latest Reply

kassandra
New Contributor II

12-23-2020 1:49:15 PM

0 kudos

Hi @sdaza, the answer above somehow didn't change the size, or perhaps I was putting it in the wrong place? I entered it in a new cell before the %sql cell with the plot chart.

1 More Replies

by bhosskie • New Contributor

05-13-2016 1:33:41 PM

9624 Views
9 replies
0 kudos

How to merge two data frames column-wise in Apache Spark

I have the following two data frames, which have just one column each and exactly the same number of rows. How do I merge them so that I get a new data frame that has the two columns and all rows from both data frames? For example, df1: +-----+...

Latest Reply

AmolZinjade
New Contributor II

12-16-2020 9:36:04 AM

0 kudos

@bhosskie
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sqlDF1 = spark.sql("select count(*) as Total FROM user_summary")
sqlDF2 = sp...

8 More Replies

by cfregly • Contributor

02-24-2015 3:51:39 PM

27563 Views
9 replies
0 kudos

Resolved! How do I avoid the "No space left on device" error where my disk is running out of space?

Latest Reply

MichaelHuntsber
New Contributor II

07-17-2019 4:05:51 AM

0 kudos

I have 8 GB of internal memory, of which only several MB are free, and I also have an additional 8 GB memory card. Still, there is not enough space, even though the memory card is completely empty.

8 More Replies

by nmud19 • New Contributor II

09-08-2016 4:53:14 AM

59325 Views
8 replies
6 kudos

how to delete a folder in databricks mnt?

I have a folder at location dbfs:/mnt/temp that I need to delete. I tried using %fs rm mnt/temp and dbutils.fs.rm("mnt/temp"). Could you please help me out with what I am doing wrong?

Latest Reply

amitca71
Contributor II

11-30-2020 12:16:10 AM

6 kudos

Use this (the last line should not be indented twice):

    def delete_mounted_dir(dirname):
        files = dbutils.fs.ls(dirname)
        for f in files:
            if f.isDir():
                delete_mounted_dir(f.path)
            dbutils.fs.rm(f.path, recurse=True)

7 More Replies

by PraveenKumarB • New Contributor

04-24-2019 7:08:28 AM

6313 Views
5 replies
0 kudos

java.io.IOException: No FileSystem for scheme: null

I get this error when I try to load the uploaded file in a Python notebook.
# File location and type
file_location = "//FileStore/tables/data/d1.csv"
file_type = "csv"
# CSV options
infer_schema = "true"
first_row_is_header = "false"
delimiter = ","
# The app...
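One thing that may be worth checking in the snippet above is the leading double slash in file_location. The sketch below uses Python's own URL parser purely as an analogy: Hadoop resolves paths with Java's URI handling, not with `urllib`, so treat this as an assumption about the cause and not a confirmed diagnosis. The helper name `describe_path` is hypothetical.

```python
from urllib.parse import urlparse


def describe_path(path):
    """Split a path string into scheme, authority and path components."""
    parts = urlparse(path)
    return {
        "scheme": parts.scheme or None,
        "authority": parts.netloc or None,
        "path": parts.path,
    }


# A leading "//" is read as the start of an authority, so "FileStore" is
# treated like a host name and no scheme is found.
print(describe_path("//FileStore/tables/data/d1.csv"))

# With an explicit scheme and a single slash, the whole path stays intact.
print(describe_path("dbfs:/FileStore/tables/data/d1.csv"))
```

If the same reading applies on the Spark side, a path with a single leading slash or an explicit dbfs:/ prefix would be the first thing to try.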

Latest Reply

DivyanshuBhatia
New Contributor II

11-22-2020 6:29:46 AM

0 kudos

@naughtonelad, if your issue is solved, please let me know, as I am facing the same problem.

4 More Replies

by nthomas • New Contributor

05-26-2016 11:27:38 AM

5782 Views
5 replies
0 kudos

Tips for properly using large broadcast variables?

I'm using a broadcast variable about 100 MB pickled in size, which I'm approximating with:
>>> data = list(range(int(10*1e6)))
>>> import cPickle as pickle
>>> len(pickle.dumps(data))
98888896
Running on a cluster with 3 c3.2xlarge executors, ...
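The size check in the question uses cPickle, which exists only in Python 2. A rough Python 3 equivalent is sketched below; the helper name `pickled_size_bytes` is hypothetical, and the number it reports depends on the pickle protocol, so it will not match the figure quoted in the post.

```python
import pickle


def pickled_size_bytes(obj, protocol=pickle.HIGHEST_PROTOCOL):
    """Return the number of bytes `obj` occupies once pickled."""
    # In Python 3 the plain `pickle` module already uses the C implementation.
    return len(pickle.dumps(obj, protocol=protocol))


# A smaller list than in the post, to keep the check quick.
data = list(range(1_000_000))
print(pickled_size_bytes(data))
```

This is only an estimate of what would be shipped to each executor; the memory needed once the object is unpickled is usually larger.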

Latest Reply

dragoncity
New Contributor II

11-06-2020 8:18:49 PM

0 kudos

The Facebook credit can be utilized by the gamers to purchase the pearls. The other route is to finished various sorts of Dragons in the Dragon Book. Dragon City Gems There are various kinds of Dragons, one is amazing, at that point you have the fund...

4 More Replies

by JulioManuelNava • New Contributor

11-02-2019 12:40:15 AM

5835 Views
2 replies
0 kudos

[pyspark] foreach + print produces no output

The following code produces no output. It seems as if the print(x) is not being executed for each "words" element: words = sc.parallelize ( ["scala", "java", "hadoop", "spark", "akka", "spark vs hadoop", "pyspark", "pysp...

Latest Reply

john_nicholas
New Contributor II

11-03-2020 3:41:49 AM

0 kudos

Epson wf-3640 error code 0x97 is the common printer error code that may occur mostly in all printers but in order to resolve the error code, upon provides the best printer guide to all printer users.

1 More Replies

by dchokkadi1_5588 • New Contributor II

05-10-2016 3:36:19 PM

13115 Views
8 replies
0 kudos

Resolved! graceful dbutils mount/unmount

Is there a way to tell dbutils.fs.mount not to throw an error if the mount point is already mounted? And vice versa, for unmount not to throw an error if it is already unmounted? I am trying to run my notebook as a job and it has an init section that...
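One way to get the behaviour asked for here is to check the current mount table before acting. The sketch below is a minimal illustration, not the resolved answer from the thread: the helper names `mount_if_absent` and `unmount_if_present` are hypothetical, and `fs` is expected to be `dbutils.fs` (or any object exposing the same `mounts`, `mount` and `unmount` calls), passed in as a parameter so the logic can be exercised outside a notebook.

```python
def _is_mounted(fs, mount_point):
    """Return True if `mount_point` appears in the current mount table."""
    return any(m.mountPoint == mount_point for m in fs.mounts())


def mount_if_absent(fs, source, mount_point, extra_configs=None):
    """Mount `source` at `mount_point` unless it is already mounted.

    Returns True if a mount was performed, False if it was skipped.
    """
    if _is_mounted(fs, mount_point):
        return False
    fs.mount(source=source, mount_point=mount_point,
             extra_configs=extra_configs or {})
    return True


def unmount_if_present(fs, mount_point):
    """Unmount `mount_point` only if it is currently mounted.

    Returns True if an unmount was performed, False if it was skipped.
    """
    if not _is_mounted(fs, mount_point):
        return False
    fs.unmount(mount_point)
    return True
```

In a notebook this would be called as `mount_if_absent(dbutils.fs, source, "/mnt/...")`. The check only compares mount points, so it does not notice when an existing mount points at a different source.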

Latest Reply

Mariano_IrvinLo
New Contributor II

10-31-2020 7:59:00 PM

0 kudos

If you use Scala to mount a Gen 2 data lake, you could try something like this:
/* Gather relevant keys */
var ServicePrincipalID = ""
var ServicePrincipalKey = ""
var DirectoryID = ""
/* Create configurations for our connection */
var configs = Map (...

7 More Replies

by Barb • New Contributor III

10-07-2019 9:05:30 AM

5032 Views
6 replies
0 kudos

SQL charindex function?

Hi all, I need to use the SQL charindex function, but I'm getting a Databricks error that this doesn't exist. That can't be true, right? Thanks for any ideas about how to make this work! Barb

Latest Reply

Traveller
New Contributor II

10-13-2020 10:14:32 PM

0 kudos

The best option I found to replace CHARINDEX was LOCATE; examples from the Spark documentation are below:
> SELECT locate('bar', 'foobarbar', 5);
7
> SELECT POSITION('bar' IN 'foobarbar');
4
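The 1-based indexing in those two results can be easy to misread, so here is a small plain-Python emulation of the same lookup. It is only a sketch for checking expectations, not Spark's implementation: the function name mirrors the SQL one for readability, and returning 0 for a start position below 1 is a choice made for this example.

```python
def locate(substr, s, pos=1):
    """Return the 1-based index of `substr` in `s`, searching from `pos`.

    Returns 0 when `substr` is not found.
    """
    if pos < 1:
        return 0  # assumption of this sketch, see the note above
    # str.find is 0-based and returns -1 on a miss, so adding 1 maps
    # a hit to a 1-based index and a miss to 0.
    return s.find(substr, pos - 1) + 1


print(locate("bar", "foobarbar", 5))  # 7, as in the first SQL example
print(locate("bar", "foobarbar"))     # 4, as in the POSITION example
```

Note the argument order: the substring comes first and the string to search second, matching the locate call shown in the reply.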

5 More Replies

by SatheeshSathees • New Contributor

08-19-2020 11:31:33 AM

5909 Views
1 replies
0 kudos

how to dynamically explode array type column in pyspark or scala

Hi, I have a Parquet file with complex column types with nested structs and arrays. I am using the script from the link below to flatten my Parquet file. https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema I am able ...

Latest Reply

shyam_9
Valued Contributor

09-18-2020 12:39:35 PM

0 kudos

Hello, please check out the docs and notebook below, which have similar examples: https://docs.microsoft.com/en-us/azure/synapse-analytics/how-to-analyze-complex-schema https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/transform-comple...

Databricks Community