Data Engineering

Forum Posts

by User16873043212 • Databricks Employee

05-07-2021 3:00:48 AM

1480 Views
0 replies
0 kudos

We can now launch pools on Databricks with different instance types. Hybrid Pools allows customers to create clusters and select different Databricks ...

We can now launch pools on Databricks with different instance types. Hybrid Pools allows customers to create clusters and select different Databricks pools for driver and workers. It provides a way to support driver vs. worker heterogeneity, and ther...

by FernandoBenedet • New Contributor

06-09-2020 6:08:04 PM

7809 Views
2 replies
0 kudos

Loop through Dataframe in Python

Hello. Imagine you have a DataFrame with columns A, B, and C. I want to add a column D based on some calculations on columns B and C of the previous record of the DataFrame. What is the best way of doing this? I am trying to avoid looping through the DataFrame. I am u...

Latest Reply

quincybatten
New Contributor II

05-02-2021 11:25:39 PM

0 kudos

Iterating through pandas DataFrame objects is generally slow. Iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better to look for a...
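As a rough illustration of the vectorized alternative, here is a minimal pandas sketch. The sample data and the sum used for D are assumptions made for the example; the original post does not say which calculation is needed. `shift(1)` moves each value down one row, so every row sees the B and C of the previous record without an explicit loop.

```python
import pandas as pd

# Hypothetical sample data with the columns named in the question.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30], "C": [1, 2, 3]})

# shift(1) aligns each row with the values of the row before it.
prev = df[["B", "C"]].shift(1)

# Placeholder calculation (an assumption): D = previous B + previous C.
# The first row has no predecessor, so its D is NaN.
df["D"] = prev["B"] + prev["C"]
```

Any other element-wise expression on `prev["B"]` and `prev["C"]` can replace the sum.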

by winston12 • New Contributor

09-13-2018 9:56:40 AM

18681 Views
5 replies
0 kudos

Connect to Blob storage "no credentials found for them in the configuration"

I'm working with a Databricks notebook backed by a Spark cluster. I'm having trouble connecting to Azure Blob storage. I used this link and tried the section Access Azure Blob Storage Directly - Set up an account access key. I get no errors here:s...

Latest Reply

Feder
New Contributor II

04-27-2021 7:25:36 AM

0 kudos

I have been facing the same problem over and over. Now trying to follow what's written here (https://docs.databricks.com/data/data-sources/azure/azure-storage.html#access-azure-blob-storage-directly), but always getting "shaded.databricks.org.apache...

by Jasam • New Contributor

07-19-2016 8:17:07 AM

13999 Views
3 replies
0 kudos

How to infer a CSV schema with all columns defaulting to string using spark-csv?

I am using the spark-csv utility, but when it infers the schema I need all columns to be treated as string columns by default. Thanks in advance.

Latest Reply

jhoop2002
New Contributor II

04-19-2021 2:09:25 PM

0 kudos

@peyman what if I don't want to manually specify the schema? For example, I have a vendor that can't build a valid .csv file. I just need to import it somewhere so I can explore the data and find the errors. Just like the original author's question?...
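For exploring a messy file without specifying a schema, one option is simply to leave schema inference off, in which case Spark's CSV reader returns every column as a string. The sketch below assumes Spark 2.0 or later (where the CSV reader is built in) and an existing `spark` session; the helper name is hypothetical.

```python
def read_csv_all_strings(spark, path):
    """Read a CSV file so that every column comes back as a string.

    `spark` is assumed to be an existing SparkSession. The helper name
    is hypothetical and not part of any library.
    """
    return (
        spark.read
        .option("header", "true")        # take column names from the first line
        .option("inferSchema", "false")  # no type inference: all columns are strings
        .csv(path)
    )
```

Because `inferSchema` is already off by default, the option is spelled out here only to make the intent explicit.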

by NEERAJRATHORE19 • New Contributor

07-26-2019 1:07:18 PM

14883 Views
3 replies
1 kudos

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange SinglePartition : Error

I am creating a DataFrame using SQL in which all the underlying tables are actually temp views based on DataFrames. I get the error below every time. Can anyone help me understand the issue here? Thanks in advance. An error occurred while calling o183....

Latest Reply

htinhk
New Contributor II

04-09-2021 8:02:35 PM

1 kudos

I also encountered the same problem... It's weird that I can run the query but not the count.

by XinhHuynh • New Contributor

06-04-2015 2:28:23 PM

11680 Views
3 replies
0 kudos

How do you add user comments to a notebook?

This is shown in a recent blog post (Figure 5): https://databricks.com/blog/2015/06/04/simplify-machine-learning-on-spark-with-databricks.html

Latest Reply

Munna123
New Contributor II

02-14-2019 1:30:57 AM

0 kudos

Using a mouse or touchpad is very annoying, which is why Microsoft introduced Windows shortcut keys. These shortcut keys let you avoid using the mouse and touchpad.

by MatthewHo • New Contributor

08-27-2015 12:24:18 PM

10527 Views
4 replies
0 kudos

"Importing" functions from other notebooks

For the sake of organization, I would like to define a few functions in notebook A, and have notebook B have access to those functions in notebook A. Having everything in one notebook makes it look very cluttered. Is this possible?

by RaymondXie • New Contributor

01-30-2020 8:14:18 PM

11450 Views
1 reply
0 kudos

How to union multiple DataFrames in PySpark within a Databricks notebook

I have 4 DFs: Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year and AvgClose_By_Year, all of which have a common column, 'Year'. I want to join them together to get a final df like: `Year, Open, High, Low, Close`. At the moment I have to use the ugly...

Latest Reply

thiago_matos
New Contributor II

02-04-2021 7:11:38 AM

0 kudos

Import the reduce function this way: `from functools import reduce`
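To show where `reduce` fits, here is a minimal sketch that folds a list of DataFrames into one by joining each onto the running result. The helper name `join_all` is hypothetical, and the choice of an inner join on `Year` is an assumption based on the question, not something the thread confirms.

```python
from functools import reduce


def join_all(frames, key="Year"):
    """Join every DataFrame in `frames` on `key`, left to right.

    Each element is expected to expose PySpark's DataFrame.join(other, on=...),
    which performs an inner join by default.
    """
    return reduce(lambda left, right: left.join(right, on=key), frames)
```

With the four DataFrames from the question, this would be called as `join_all([Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year, AvgClose_By_Year])`.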

by McKayHarris • New Contributor II

01-03-2017 3:42:14 PM

38736 Views
17 replies
3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply

RodrigoDe_Freit
Databricks Partner

12-10-2019 11:56:17 AM

3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

by dtr • New Contributor

08-04-2020 11:54:17 PM

8256 Views
1 reply
0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to call spark.sql inside the function. But it looks like I cannot use it on worker nodes. def SEL_ID(value, index): # some processing on value here ans = spark.sql("SELECT id FRO...

Latest Reply

MartinhoAzevedo
New Contributor II

02-01-2021 10:56:58 AM

0 kudos

Hi there. I guess I'm a bit late, but do you remember whether and how you fixed this issue? I'm getting the exact same problem. @dtr
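The error arises because `spark` exists only on the driver, so a function that calls `spark.sql` cannot be shipped to workers as a UDF or inside `map`. A common workaround is to express the per-row lookup as a join. The sketch below is an assumption about the intent, since the original query is cut off: the helper name and the `value` and `id` column names are hypothetical.

```python
def attach_ids(values_df, lookup_df, key="value"):
    """Add an `id` column to values_df by joining against a lookup table.

    Replaces a per-row spark.sql("SELECT id ...") call, which cannot run
    on workers. Column names here are hypothetical placeholders.
    """
    # Keep only the join key and the id so no other lookup columns leak in.
    ids = lookup_df.select(key, "id")
    # A left join keeps every input row; rows without a match get a null id.
    return values_df.join(ids, on=key, how="left")
```

Any processing on `value` that the original function performed would be done with column expressions before the join.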

by HarisKhan • New Contributor

04-12-2020 5:32:03 AM

13960 Views
2 replies
0 kudos

Escape Backslash(/) while writing spark dataframe into csv

I am using Spark version 2.4.0. I know that backslash is the default escape character in Spark, but I am still facing the issue below. I am reading a CSV file into a Spark DataFrame (using PySpark) and writing the DataFrame back to CSV. I have so...

Latest Reply

sean_owen
Databricks Employee

04-17-2020 2:19:09 PM

0 kudos

I'm confused - you say the escape is backslash, but you show forward slashes in your data. Don't you want the escape to be forward slash?

by rlgarris • Databricks Employee

12-02-2015 10:26:01 AM

25286 Views
12 replies
0 kudos

Resolved! How do I create a single CSV file from multiple partitions in Databricks / Spark?

I am using spark-csv to write data to DBFS, which I plan to move to my laptop via standard S3 copy commands. The default for spark-csv is to write output into partitions. I can force it to a single partition, but would really like to know if there is a ge...

Latest Reply

ChristianHomber
New Contributor II

01-21-2020 3:50:40 AM

0 kudos

Without access to bash, it would be highly appreciated if an option existed within Databricks (e.g. via dbfsutils).
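One way to do this from inside a notebook, without bash, is to write a single partition to a temporary directory and then copy the lone part file to its final name with `dbutils.fs`. This is a sketch under assumptions: the helper name and both paths are hypothetical, `dbutils` is the utility object that Databricks notebooks provide, and `coalesce(1)` is only sensible when the data fits comfortably through one task.

```python
def write_single_csv(df, dbutils, tmp_dir, target_path):
    """Write df as one CSV file at target_path (hypothetical helper)."""
    # coalesce(1) funnels all rows through one task, so Spark emits one part file.
    df.coalesce(1).write.mode("overwrite").option("header", "true").csv(tmp_dir)

    # Spark writes part-<id>.csv plus marker files into tmp_dir; pick the part file.
    part_files = [f.path for f in dbutils.fs.ls(tmp_dir) if f.name.startswith("part-")]
    if len(part_files) != 1:
        raise RuntimeError(f"expected one part file, found {len(part_files)}")

    # Copy it to the final name, then remove the temporary directory recursively.
    dbutils.fs.cp(part_files[0], target_path)
    dbutils.fs.rm(tmp_dir, True)
    return target_path
```

The copy step is what gives the file a predictable name; Spark itself always writes a directory of part files.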

by tunguyen90 • New Contributor

11-08-2019 7:41:42 AM

13255 Views
3 replies
1 kudos

How to change the line separator for a CSV file exported from a DataFrame in Databricks

Hello, I'm currently facing a problem with the line separator inside CSV files that are exported from a DataFrame in Azure Databricks (Spark version 2.4.3) to Azure Blob storage. All of those CSV files contain LF as the line separator. I need to have CRLF (\r\n...

Latest Reply

Nikhila
New Contributor II

12-24-2020 5:02:26 AM

1 kudos

Hi, have you found a solution to the above problem? Kindly let me know.

by tismith1_558848 • New Contributor

05-23-2018 1:27:21 PM

13285 Views
2 replies
0 kudos

Resolved! Change size or aspect ratio of ggplot visualizations

I understand that plots in R notebooks are captured by a png graphics device. Is there a way to set the size or the aspect ratio of the canvas? I understand that I can resize the rendered .png by dragging the handle in the notebook, but that means I...

Latest Reply

kassandra
New Contributor II

12-23-2020 1:49:15 PM

0 kudos

Hi @sdaza , the answer above didn't change the size somehow, or perhaps I was putting it in the wrong place? I entered it in a new cell before the %sql cell with the plot chart.

by bhosskie • New Contributor

05-13-2016 1:33:41 PM

23147 Views
9 replies
0 kudos

How to merge two data frames column-wise in Apache Spark

I have the following two data frames, which have just one column each and the exact same number of rows. How do I merge them so that I get a new data frame which has the two columns and all rows from both data frames? For example, df1: +-----+...

Latest Reply

AmolZinjade
New Contributor II

12-16-2020 9:36:04 AM

0 kudos

@bhosskie from pyspark.sql import SparkSession spark = SparkSession.builder.appName("Spark SQL basic example").enableHiveSupport().getOrCreate() sc = spark.sparkContext sqlDF1 = spark.sql("select count(*) as Total FROM user_summary") sqlDF2 = sp...
