Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User15813097110
by New Contributor III
  • 6740 Views
  • 1 reply
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

We can use the steps below to push cluster logs to Elasticsearch:
1. Download the log4j-elasticsearch-java-api repo and build the JAR file:
git clone https://github.com/Downfy/log4j-elasticsearch-java-api.git
cd log4j-elasticsearch-java-api/
mvn clean...

User16871418122
by Contributor III
  • 12261 Views
  • 1 reply
  • 0 kudos

Resolved! How do I download Maven libraries with dependencies?

I want to import a Maven library with its dependencies. How can I do that?

Latest Reply
User16871418122
Contributor III
  • 0 kudos

I recommend creating an uber JAR, or downloading the JARs offline and using them in clusters until Maven becomes healthy again:
1. Install the Maven CLI tool on your local Mac: brew install mvnvm
2. Download the artifact with all dependencies: mvn dependency:get -Dr...

User15813097110
by New Contributor III
  • 2424 Views
  • 1 reply
  • 0 kudos
Latest Reply
User15813097110
New Contributor III
  • 0 kudos

Since the SparkContext is already up and running, it requires a restart. Technically, it might be possible to kill the JVM process and restart it, but we do not recommend that approach. In this case, we recommend restarting the cluster so that the Sp...

User16873043212
by New Contributor III
  • 1178 Views
  • 0 replies
  • 0 kudos

We can now launch pools on Databricks with different instance types. Hybrid Pools allow customers to create clusters and select different Databricks ...

We can now launch pools on Databricks with different instance types. Hybrid Pools allow customers to create clusters and select different Databricks pools for the driver and the workers. This provides a way to support driver vs. worker heterogeneity, and ther...

FernandoBenedet
by New Contributor
  • 6947 Views
  • 2 replies
  • 0 kudos

Loop through Dataframe in Python

Hello, imagine you have a DataFrame with columns A, B, and C. I want to add a column D computed from columns B and C of the previous record. What is the best way to do this? I am trying to avoid looping through the DataFrame. I am u...

Latest Reply
quincybatten
New Contributor II
  • 0 kudos

Iterating through pandas DataFrame objects is generally slow. Iterating row by row defeats the whole purpose of using a DataFrame. It is an anti-pattern, something you should only do when you have exhausted every other option. It is better to look for a...
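The usual vectorized replacement for that loop is to shift the columns so that each row can see the previous record. A minimal pandas sketch follows; the data and the formula for D (previous B plus previous C) are hypothetical stand-ins for the poster's "some calculations". In PySpark, the analogous tool is `pyspark.sql.functions.lag` over a `Window` ordered by whatever column defines "previous".

```python
import pandas as pd

# Toy data with the poster's column names; the values are made up.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40], "C": [1, 2, 3, 4]})

# shift(1) moves every value down one row, so row i sees row i-1's B and C.
prev_b = df["B"].shift(1)
prev_c = df["C"].shift(1)

# Hypothetical calculation: previous B plus previous C.
# The first row has no previous record, so D is NaN there.
df["D"] = prev_b + prev_c

print(df)
```

Row order matters here: if the DataFrame's order is not already meaningful, sort it before shifting.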

winston12
by New Contributor
  • 17178 Views
  • 5 replies
  • 0 kudos

Connect to Blob storage "no credentials found for them in the configuration"

I'm working with a Databricks notebook backed by a Spark cluster, and I'm having trouble connecting to Azure Blob Storage. I used this link and tried the section "Access Azure Blob Storage Directly - Set up an account access key". I get no errors here: s...

Latest Reply
Feder
New Contributor II
  • 0 kudos

I have been facing the same problem over and over. I'm now trying to follow what's written here (https://docs.databricks.com/data/data-sources/azure/azure-storage.html#access-azure-blob-storage-directly), but I always get "shaded.databricks.org.apache...

Jasam
by New Contributor
  • 11994 Views
  • 3 replies
  • 0 kudos

How to infer a CSV schema with all columns as string by default using spark-csv?

I am using the spark-csv utility, but when it infers the schema I need all columns to be string columns by default. Thanks in advance.

Latest Reply
jhoop2002
New Contributor II
  • 0 kudos

@peyman what if I don't want to manually specify the schema? For example, I have a vendor that can't build a valid .csv file. I just need to import it somewhere so I can explore the data and find the errors. Just like the original author's question?...
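In Spark's CSV reader (and in the older spark-csv package), `inferSchema` is false by default, so reading with `spark.read.option("header", "true").csv(path)` and no explicit schema already gives a string column for every field. For exploring a malformed vendor file locally, the same "read everything as text" idea can be sketched with pandas, swapped in here so the example runs without a cluster; the sample file contents are made up.

```python
import io

import pandas as pd

# A made-up "vendor" file with bad values in the second data row.
raw = io.StringIO(
    "id,amount,date\n"
    "1,10.5,2020-01-01\n"
    "2,oops,2020-13-45\n"
)

# dtype=str turns off type inference, so every column is read as text and
# malformed values such as "oops" survive for inspection.
df = pd.read_csv(raw, dtype=str)

print(df.dtypes)
print(df)
```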

NEERAJRATHORE19
by New Contributor
  • 14095 Views
  • 3 replies
  • 1 kudos

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: Exchange SinglePartition : Error

I am creating a DataFrame using SQL in which all the underlying tables are actually temp views based on DataFrames. I get the error below every time. Can anyone help me understand the issue? Thanks in advance. An error occurred while calling o183....

Latest Reply
htinhk
New Contributor II
  • 1 kudos

I also encountered the same problem. It's weird that I can run the query but not the count.

XinhHuynh
by New Contributor
  • 10829 Views
  • 3 replies
  • 0 kudos

How do you add user comments to a notebook?

This is shown in a recent blog post (Figure 5): https://databricks.com/blog/2015/06/04/simplify-machine-learning-on-spark-with-databricks.html

MatthewHo
by New Contributor
  • 9381 Views
  • 4 replies
  • 0 kudos

"Importing" functions from other notebooks

For the sake of organization, I would like to define a few functions in notebook A, and have notebook B have access to those functions in notebook A. Having everything in one notebook makes it look very cluttered. Is this possible?

RaymondXie
by New Contributor
  • 10226 Views
  • 1 reply
  • 0 kudos

How to union multiple dataframe in pyspark within Databricks notebook

I have 4 DFs: Avg_OpenBy_Year, AvgHighBy_Year, AvgLowBy_Year and AvgClose_By_Year, and all of them have a common column, 'Year'. I want to join the four together to get a final DF like: `Year, Open, High, Low, Close`. At the moment I have to use the ugly...

Latest Reply
thiago_matos
New Contributor II
  • 0 kudos

Import the reduce function like this: from functools import reduce
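Despite "union" in the title, the goal is a join on `Year`, and `functools.reduce` folds a list of DataFrames into one result without chaining the joins by hand. This is a sketch using pandas `merge` so it runs without a cluster. With PySpark DataFrames, the lambda would call `left.join(right, on="Year")` instead. The frame names and values below are made up.

```python
from functools import reduce

import pandas as pd

# Hypothetical per-year averages standing in for the poster's four DataFrames.
avg_open = pd.DataFrame({"Year": [2019, 2020], "Open": [1.0, 2.0]})
avg_high = pd.DataFrame({"Year": [2019, 2020], "High": [1.5, 2.5]})
avg_low = pd.DataFrame({"Year": [2019, 2020], "Low": [0.5, 1.5]})
avg_close = pd.DataFrame({"Year": [2019, 2020], "Close": [1.2, 2.2]})

frames = [avg_open, avg_high, avg_low, avg_close]

# reduce combines the list pairwise from the left:
# ((open with high) with low) with close, each step matching rows on "Year".
combined = reduce(lambda left, right: left.merge(right, on="Year"), frames)

print(combined)
```

`merge` defaults to an inner join, so a year missing from any one frame drops out of the result. Pass `how="outer"` if those years should be kept.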

McKayHarris
by New Contributor II
  • 34439 Views
  • 17 replies
  • 3 kudos

ExecutorLostFailure: Remote RPC Client Disassociated

This is an expensive and long-running job that gets about halfway done before failing. The stack trace is included below, but here is the salient part: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4881 in stage...

Latest Reply
RodrigoDe_Freit
New Contributor II
  • 3 kudos

According to https://docs.databricks.com/jobs.html#jar-job-tips: "Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed." That was my prob...

dtr
by New Contributor
  • 7524 Views
  • 1 reply
  • 0 kudos

PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers.

I am trying to write a function in Azure Databricks. I would like to use spark.sql inside the function, but it looks like I cannot use it on worker nodes.
def SEL_ID(value, index):
    # some processing on value here
    ans = spark.sql("SELECT id FRO...

Latest Reply
MartinhoAzevedo
New Contributor II
  • 0 kudos

Hi there. I guess I'm a bit late, but do you remember whether and how you fixed this issue? I'm getting exactly the same problem. @dtr
