Data Engineering

Forum Posts

User15813097110
by New Contributor III
  • 3367 Views
  • 1 reply
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

We can use the steps below to push cluster logs to Elasticsearch: 1. Download the log4j-elasticsearch-java-api repo and build the jar file: git clone https://github.com/Downfy/log4j-elasticsearch-java-api.git cd log4j-elasticsearch-java-api/ mvn clean...

User16871418122
by Contributor III
  • 2066 Views
  • 1 reply
  • 0 kudos

Resolved! How do I download maven libraries with dependencies?

I want to import a Maven library along with its dependencies. How do I do it?

Latest Reply
User16871418122
Contributor III
  • 0 kudos

I recommend creating an uber JAR, or downloading the jars offline and using them in clusters, when Maven becomes healthy again: 1. Install the mvn CLI tool on your local Mac: brew install mvnvm 2. Download the artifact with all dependencies: mvn dependency:get -Dr...

User15813097110
by New Contributor III
  • 1183 Views
  • 1 reply
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

Since the SparkContext is already up and running, it requires a restart. Technically, it might be possible to kill the JVM process and restart it but we do not recommend that approach. In this case, we recommend restarting the cluster so that the Sp...

User16873043212
by New Contributor III
  • 290 Views
  • 0 replies
  • 0 kudos

We can now launch pools on Databricks with different instance types. Hybrid Pools allows customers to create clusters and select different Databricks pools for the driver and workers. It provides a way to support driver vs. worker heterogeneity, and ther...

FernandoBenedet
by New Contributor
  • 4128 Views
  • 2 replies
  • 0 kudos

Loop through Dataframe in Python

Hello, imagine you have a dataframe with cols A, B, C. I want to add a column D based on some calculations of columns B and C of the previous record of the df. What is the best way of doing this? I am trying to avoid looping through the df. I am u...

Latest Reply
quincybatten
New Contributor II
  • 0 kudos

Iterating through pandas DataFrame objects is generally slow; iteration defeats the whole purpose of using a DataFrame. It is an anti-pattern and is something you should only do when you have exhausted every other option. It is better to look for a...

1 More Replies
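
A window-function sketch related to the thread above. The latest reply recommends avoiding row-by-row iteration; in PySpark that usually means expressing "previous record" logic with a window and lag instead of a loop. Everything below is illustrative: the ordering column id and the formula B + C are assumptions, not details from the thread, and a Databricks/PySpark session is assumed.

```python
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.getOrCreate()

# Toy data standing in for the poster's dataframe with columns A, B, C.
df = spark.createDataFrame(
    [("x", 1, 10.0, 1.0), ("y", 2, 20.0, 2.0), ("z", 3, 30.0, 3.0)],
    ["A", "id", "B", "C"],
)

# A window ordered by an assumed "id" column defines what "previous record" means.
# With no partitionBy this collapses everything into one partition -- fine for a sketch,
# but add a partition column for large data.
w = Window.orderBy("id")

# D is computed from the previous row's B and C without any Python-side loop.
df_with_d = df.withColumn("D", F.lag("B").over(w) + F.lag("C").over(w))

df_with_d.show()
```

The first row ends up with a null D because it has no previous record, which is the usual behaviour wanted from lag-style calculations.
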
winston12
by New Contributor
  • 12006 Views
  • 5 replies
  • 0 kudos

Connect to Blob storage "no credentials found for them in the configuration"

I'm working with a Databricks notebook backed by a Spark cluster. I'm having trouble connecting to Azure Blob storage. I used this link and tried the section Access Azure Blob Storage Directly - Set up an account access key. I get no errors here: s...

Latest Reply
Feder
New Contributor II
  • 0 kudos

I have been facing the same problem over and over. I am now trying to follow what's written here (https://docs.databricks.com/data/data-sources/azure/azure-storage.html#access-azure-blob-storage-directly), but I always get "shaded.databricks.org.apache...

4 More Replies
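
A configuration sketch related to the Blob-storage thread above. The "no credentials found ... in the configuration" error usually means the session reading the wasbs:// path has no matching fs.azure.account.key.&lt;storage-account&gt; setting. Below is a minimal sketch of the account-access-key approach the question mentions, assuming it runs in a Databricks notebook where spark and dbutils already exist; every <placeholder> is a stand-in, not a real value.

```python
# Register the storage account key for this Spark session (placeholders throughout).
spark.conf.set(
    "fs.azure.account.key.<storage-account>.blob.core.windows.net",
    dbutils.secrets.get(scope="<secret-scope>", key="<storage-key-name>"),
)

# The account name in the wasbs:// URL must match the account named in the config key above.
df = spark.read.csv(
    "wasbs://<container>@<storage-account>.blob.core.windows.net/path/to/file.csv",
    header=True,
)
display(df)
```

If the config is set in one session or cluster and the read happens in another, the key is not visible there, which can produce the same error.
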
Jasam
by New Contributor
  • 8033 Views
  • 3 replies
  • 0 kudos

How to infer CSV schema with all columns as string by default using spark-csv?

I am using the spark-csv utility, but when it infers the schema I need all columns to be read as string columns by default. Thanks in advance.

Latest Reply
jhoop2002
New Contributor II
  • 0 kudos

@peyman what if I don't want to manually specify the schema? For example, I have a vendor that can't build a valid .csv file. I just need to import it somewhere so I can explore the data and find the errors. Just like the original author's question?...

2 More Replies
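
For the spark-csv thread above: when schema inference is off and no explicit schema is supplied, Spark's CSV reader treats every column as a string, which is exactly what the original poster asked for. A minimal sketch, assuming a Databricks/PySpark session; the file path is a placeholder.

```python
# With inferSchema off (the default) and no explicit schema, every column is read as string.
df = (
    spark.read
    .format("csv")                     # "com.databricks.spark.csv" on very old spark-csv versions
    .option("header", "true")
    .option("inferSchema", "false")
    .load("/path/to/messy_file.csv")
)

df.printSchema()  # every column reported as string
```

Reading everything as string is a reasonable way to land a malformed vendor file first and then clean and cast the columns afterwards.
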
NEERAJRATHORE19
by New Contributor
  • 9255 Views
  • 3 replies
  • 1 kudos

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange SinglePartition : Error

I am creating a dataframe using SQL in which all the underlying tables are actually temp views based on dataframes. I am getting the below error every time. Can anyone help me understand the issue here? Thanks in advance. An error occurred while calling o183....

Latest Reply
htinhk
New Contributor II
  • 1 kudos

I also encountered the same problem...It's weird that I can do the query but not the count.

2 More Replies
XinhHuynh
by New Contributor
  • 8307 Views
  • 3 replies
  • 0 kudos

How do you add user comments to a notebook?

This is shown in a recent blog post (Figure 5): https://databricks.com/blog/2015/06/04/simplify-machine-learning-on-spark-with-databricks.html

Latest Reply
Munna123
New Contributor II
  • 0 kudos

Using the mouse and touchpad is very annoying, which is why Microsoft introduced Windows shortcut keys. These laptop shortcut keys are used to avoid using the mouse and touchpad.

2 More Replies
MatthewHo
by New Contributor
  • 6216 Views
  • 4 replies
  • 0 kudos

"Importing" functions from other notebooks

For the sake of organization, I would like to define a few functions in notebook A, and have notebook B have access to those functions in notebook A. Having everything in one notebook makes it look very cluttered. Is this possible?

Latest Reply
simone01
New Contributor II
  • 0 kudos

Risk Management Assignment Help (https://managementassignmentshelp.com/risk-management-assignment-help.php), Material Science assignment help (https://myassignmentmart.com/assignment/material-science-assignment-help.html)...

3 More Replies
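
One common answer to the thread above is the %run notebook magic, which executes another notebook inline so its functions and variables land in the calling notebook's namespace. A minimal sketch, assuming both notebooks live in the same workspace folder; the notebook path is a placeholder.

```python
# First cell of notebook B: %run must be the only code in its cell.
%run ./NotebookA
```

After that cell runs, anything defined in NotebookA, say a hypothetical clean_columns(df) helper, can be called directly from later cells of notebook B. Keeping shared helpers in a separate notebook (or, in newer workspaces, in a .py file imported as a module) keeps the main notebook uncluttered, which is what the question is after.
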
RaymondXie
by New Contributor
  • 5707 Views
  • 1 reply
  • 0 kudos

How to union multiple dataframes in PySpark within a Databricks notebook

I have 4 DFs: Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year and AvgClose_By_Year, all of which have a common column 'Year'. I want to join them together to get a final df like: `Year, Open, High, Low, Close`. At the moment I have to use the ugly...

Latest Reply
thiago_matos
New Contributor II
  • 0 kudos

Import the reduce function this way: from functools import reduce

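
The one-line reply above only covers the import; the pattern it points at is folding a list of DataFrames into one with functools.reduce. A sketch with stand-in data shaped like the thread's Avg_OpenBy_Year-style frames; the names and values are illustrative, not taken from the thread.

```python
from functools import reduce
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Stand-ins for Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year, AvgClose_By_Year.
open_df  = spark.createDataFrame([(2019, 10.0), (2020, 12.0)], ["Year", "Open"])
high_df  = spark.createDataFrame([(2019, 15.0), (2020, 18.0)], ["Year", "High"])
low_df   = spark.createDataFrame([(2019, 8.0),  (2020, 9.0)],  ["Year", "Low"])
close_df = spark.createDataFrame([(2019, 11.0), (2020, 17.0)], ["Year", "Close"])

# Fold the list into one DataFrame by repeatedly joining on the shared 'Year' column.
combined = reduce(
    lambda left, right: left.join(right, on="Year"),
    [open_df, high_df, low_df, close_df],
)

combined.show()  # columns: Year, Open, High, Low, Close
```

For a true union, as in the thread title, the same reduce call works with DataFrame.unionByName in place of the join lambda, provided the frames share a schema.
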
McKayHarris
by New Contributor II
  • 16696 Views
  • 17 replies
  • 3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

16 More Replies