Data Engineering

Forum Posts

by Anonymous • Not applicable

06-07-2022 9:25:52 AM

993 Views
0 replies
2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens
Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu)
What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive
What is your favorite thing about your curren...

by enri_casca • New Contributor III

03-01-2022 3:50:05 AM

13769 Views
13 replies
2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company. I pull the data from Snowflake:
df = spark.read \
    .format("snowflake") \
    .options(**options) \
    .option('qu...

Latest Reply

-werners-
Esteemed Contributor III

03-01-2022 3:57:36 AM

2 kudos

can you check this SO topic?
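The error in the thread title usually appears when a column that still holds strings reaches a numeric model. As a minimal sketch in plain Python (the helper name `find_non_float` and the sample values are hypothetical, not taken from the thread), one way to locate the offending values before fitting:

```python
def find_non_float(values):
    """Return the values that float() cannot parse, in their original order."""
    rejected = []
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            # ValueError: text such as "abc" or ""; TypeError: None and similar
            rejected.append(value)
    return rejected


# Hypothetical sample of one feature column pulled from the source table.
sample = ["1.5", "abc", "3", None, ""]
bad_values = find_non_float(sample)
print(bad_values)
```

Running a check like this on a small sample of each feature column shows whether the column needs cleaning or an explicit cast before the fit.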

12 More Replies

by SailajaB • Databricks Partner

02-27-2022 6:15:09 AM

14167 Views
5 replies
12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi, how can we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to a function as a dict to apply key-level transformations. But as our data is very large, we can't use collect:
df.toJSON().colle...

Latest Reply

SailajaB
Databricks Partner

02-28-2022 2:19:28 AM

12 kudos

@Hubert Dudek, thank you for the reply. We are new to ADB and are using the code below; we are looking for an optimized way to do it:
dfJSONString = df.toJSON().collect()
stringList = []
for row in dfJSONString:
    # ==== Unflatten the JSON string ==== #
    js...
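The reply above is truncated before the unflattening logic, so the following is only a sketch of one way it could look, not the poster's code. It assumes flattened keys are joined with a dot; the `sep` parameter, the helper names, and the sample row are all hypothetical. Because `DataFrame.toJSON()` returns an RDD of JSON strings, the same function can be mapped over that RDD instead of calling `collect()`, which keeps the work on the executors.

```python
import json


def unflatten(flat, sep="."):
    """Turn {"a.b": 1} into {"a": {"b": 1}}.

    Assumes no key is both a leaf and a prefix (e.g. "a" and "a.b").
    """
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def unflatten_json(line, sep="."):
    """Parse one JSON string, unflatten it, and serialize it again."""
    return json.dumps(unflatten(json.loads(line), sep))


def unflatten_rows(df, sep="."):
    """Apply unflatten_json to every row of a Spark DataFrame without collect().

    Not exercised here; `df` is assumed to be a pyspark.sql.DataFrame.
    """
    return df.toJSON().map(lambda line: unflatten_json(line, sep))


# Hypothetical flattened row, as one JSON string.
row = '{"id": 1, "address.city": "Ghent", "address.zip": "9000"}'
result = unflatten_json(row)
print(result)
```

The result of `unflatten_rows` is still an RDD of strings, so it can be written out or read back with `spark.read.json` rather than pulled to the driver.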

4 More Replies

by Alix • New Contributor III

02-21-2022 9:00:10 AM

14393 Views
8 replies
3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply

shan_chandra
Databricks Employee

05-10-2022 7:02:51 PM

3 kudos

@Alix Métivier - The error is thrown from the user code (please investigate the jar file attached to the cluster).
at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941)
at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654)
at m80.dbruniv_...

7 More Replies

by cuteabhi32 • New Contributor III

06-06-2022 8:17:54 AM

55176 Views
11 replies
1 kudos

Resolved! Trying to check if a column exist in a dataframe or not if not then i have to give NULL if yes then i need to give the column itself by using UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply

cuteabhi32
New Contributor III

06-07-2022 7:29:16 AM

1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs.
dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name")
dfg = dflist.filter(col('name').isin('ref_date')).count()
if dfg == 1:
    a = a.wi...
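The code above builds a DataFrame of column names and counts the matches. A lighter alternative, sketched here under the assumption that `a` is a PySpark DataFrame and that a missing column should become a string-typed NULL (the helper names and the cast type are hypothetical), is to test membership in `a.columns` directly; no UDF is needed.

```python
def missing_columns(existing, required):
    """Return the required column names that are absent, preserving order."""
    present = set(existing)
    return [name for name in required if name not in present]


def add_missing_as_null(df, required, dtype="string"):
    """Add each missing column to a Spark DataFrame as a typed NULL literal.

    pyspark is imported lazily so the helper above can be used without it.
    """
    from pyspark.sql.functions import lit

    for name in missing_columns(df.columns, required):
        df = df.withColumn(name, lit(None).cast(dtype))
    return df


# Plain-Python check of the membership logic with hypothetical column names.
todo = missing_columns(["id", "name"], ["ref_date", "id"])
print(todo)
```

A call such as `a = add_missing_as_null(a, ["ref_date"])` would then leave an existing `ref_date` untouched. Note that the membership test is case-sensitive, whereas Spark resolves column names case-insensitively by default, so the names may need normalizing first.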

10 More Replies

by Steamboat_Ski_C • New Contributor

06-07-2022 5:55:07 AM

1184 Views
0 replies
0 kudos

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the U...

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the United States. These types of condos are typically located in rural or suburban areas and offer resid...

by Data_Cowboy • New Contributor III

06-06-2022 7:44:26 AM

11824 Views
3 replies
1 kudos

Resolved! Plotting in pyspark.pandas Uncaught ReferenceError Plotly is not defined

Hi, I am trying to plot using pyspark.pandas running this sample code:
speed = [0.1, 17.5, 40, 48, 52, 69, 88]
lifespan = [2, 8, 70, 1.5, 25, 12, 28]
index = ['snail', 'pig', 'elephant', 'rabbit', 'giraffe', 'coyote', 'horse']
psdf = ps.Data...

Latest Reply

Data_Cowboy
New Contributor III

06-07-2022 5:27:34 AM

1 kudos

Thank you @Werner Stinckens. I was able to find the plotly documentation listed below; setting the output_type and calling displayHTML() remedied the error.

2 More Replies

by arda_123 • New Contributor III

06-06-2022 4:53:53 AM

4196 Views
2 replies
1 kudos

SQL Analytics Map Visualization: Map marker size

Hello all, I am trying to use the Map visualization in a SQL Analytics Dashboard in Databricks. Does anyone know how, or if, we can change the size/radius of the markers based on values in another column? This seems like a very trivial parameter but I ...

Latest Reply

arda_123
New Contributor III

06-07-2022 4:13:53 AM

1 kudos

Thanks @Kaniz Fatma

1 More Replies

by laurencewells • New Contributor III

05-31-2022 7:45:38 AM

6538 Views
5 replies
1 kudos

Autoloader and "cleanSource"

Hi All, we are trying to use the Spark 3 structured streaming feature/option ".option('cleanSource','archive')" to archive processed files. This works as expected with the standard Spark implementation, but it does not appear to work using aut...

Latest Reply

-werners-
Esteemed Contributor III

06-01-2022 5:59:48 AM

1 kudos

https://docs.databricks.com/ingestion/auto-loader/options.html#common-auto-loader-options
cleanSource is not a listed option, so it won't do anything. Maybe event retention is something you can use?

4 More Replies

by RiyazAliM • Honored Contributor

06-06-2022 7:21:48 PM

8540 Views
3 replies
3 kudos

Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

I'm reshaping my dataframe as per requirement and I came across this situation where I'm concatenating 2 dataframes and then transposing them. I've done this previously using pandas and the syntax for pandas goes as below:
import pandas as pd
df1 = ...

Latest Reply

RiyazAliM
Honored Contributor

06-06-2022 11:45:41 PM

3 kudos

Hi @Kaniz Fatma, I no longer see the answer you've posted, but I see you were suggesting to use `union`. As per my understanding, union is used to stack dfs with a similar schema / column names on top of one another. In my situation, I have 2 different...

2 More Replies

by PawanShukla • New Contributor III

05-26-2022 2:18:24 AM

1925 Views
1 replies
0 kudos

Workflow Pipeline in Azure Databrick is throwing error for EventHubsSourceProvider could not be instantiated

I am using the sample code from the getting-started tutorial. It simply reads a JSON file and moves the data into another table, but it throws an error related to EventHubsSourceProvider.

by Maverick1 • Valued Contributor II

05-20-2022 3:37:02 AM

9892 Views
3 replies
6 kudos

Is there any way to overwrite a partition in delta table without specifying each and every partition in replace where? For non dated partitions, this is really a mess with delta tables.

Is there any way to overwrite a partition in a delta table without specifying each and every partition in replaceWhere? For non-dated partitions, this is really a mess with delta tables. Most of my DE teams don't want to adopt delta because of these gl...

Latest Reply

Anonymous
Not applicable

06-06-2022 5:57:43 AM

6 kudos

Hi @Saurabh Verma, following up: did you get a chance to check @Hubert Dudek's previous comments?

2 More Replies

by Anonymous • Not applicable

05-20-2022 2:31:45 AM

2663 Views
1 replies
1 kudos

Query silently failed

Hello all, I'm using the older 6.4 runtime and noticed that a query returns no result, whereas the same query on 10.4 provides the expected result. This is bad, because I got no error, simply no result at all. Is there some Spark setting on the clus...

Latest Reply

Anonymous
Not applicable

06-06-2022 5:58:51 AM

1 kudos

Hi @Alessio Palma, following up: did you get a chance to check @Kaniz Fatma's previous comments?

by Jack • New Contributor II

06-02-2022 7:44:33 AM

6412 Views
1 replies
1 kudos

Append an empty dataframe to a list of dataframes using for loop in python

I have the following 3 dataframes: I want to append df_forecast to each of df2_CA and df2_USA using a for-loop. However, when I run my code, df_forecast is not appending: df2_CA and df2_USA appear exactly as shown above. Here’s the code:
df_list=[df2_CA,...

Latest Reply

User16764241763
Databricks Employee

06-05-2022 9:36:22 PM

1 kudos

@Jack Homareau Can you try union functionality with dataframes?
https://sparkbyexamples.com/pyspark/pyspark-union-and-unionall/
and then try to fill NaNs with the desired values?
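The code in the question is truncated, so the cause below is an assumption: if the loop assigns the combined result back to the loop variable, only that local name is rebound and the original DataFrames stay untouched, because a union returns a new object. The sketch uses plain lists as stand-ins for DataFrames; `append_to_each` and the sample values are hypothetical.

```python
def append_to_each(frames, extra, combine):
    """Return a new list in which `extra` has been combined with every frame."""
    return [combine(frame, extra) for frame in frames]


# Lists stand in for DataFrames; `+` stands in for a union that returns a new object.
df2_CA = [1, 2]
df2_USA = [3]
df_forecast = [9]

df_list = [df2_CA, df2_USA]
for df in df_list:
    df = df + df_forecast  # rebinds the loop variable only; df_list is unchanged

unchanged = list(df_list)

# Capture the combined results instead of discarding them.
df2_CA, df2_USA = append_to_each(df_list, df_forecast, lambda a, b: a + b)
print(unchanged, df2_CA, df2_USA)
```

With Spark DataFrames, `combine` could be `lambda a, b: a.unionByName(b)`, assuming the column names of the two DataFrames match.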

by VM • Contributor

04-25-2022 6:23:22 AM

6447 Views
4 replies
2 kudos

Error using Synapse ML: JavaPackage object is not callable

I am using DBR version 10.1. I want to use Synapse ML package. I am able to install and import it by following instructions on the link: https://github.com/microsoft/SynapseML. However when I try to run the code it gives me the error shown in the att...

Latest Reply

User16764241763
Databricks Employee

06-05-2022 9:24:27 PM

2 kudos

Hello @Vikram Mahawal, clusters need to be in the running state to install/uninstall libraries. Could you please start the cluster and try installing it? If you are still stuck, please file a support case with us so we can take a look. Thanks

3 More Replies

Databricks Community
