cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

Anonymous
by Not applicable
  • 993 Views
  • 0 replies
  • 2 kudos

www.vandevelde.eu

June Featured Member of the Month ! Werner Stinckens Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu)What are three words your coworkers would use to describe you?Helpful, accurate, inquisitiveWhat is your favorite thing about your curren...

  • 993 Views
  • 0 replies
  • 2 kudos
enri_casca
by New Contributor III
  • 13769 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new in databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company.I pull the data from snowflakedf = spark.read \  .format("snowflake") \  .options(**options) \  .option('qu...

  • 13769 Views
  • 13 replies
  • 2 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

can you check this SO topic?

  • 2 kudos
12 More Replies
SailajaB
by Databricks Partner
  • 14167 Views
  • 5 replies
  • 12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi,How to convert each row of dataframe to array of rows?Here is our scenario , we need to pass each row of dataframe to one function as dict to apply the key level transformations. But as our data is very huge we can't use collect df.toJson().colle...

  • 14167 Views
  • 5 replies
  • 12 kudos
Latest Reply
SailajaB
Databricks Partner
  • 12 kudos

@Hubert Dudek​ , Thank you for the reply. We are new to ADB. And using the below code, looking for an optimized way to do itdfJSONString = df.toJSON().collect()stringList = []  for row in dfJSONString:    # ==== Unflatten the JSON string ==== #    js...

  • 12 kudos
4 More Replies
Alix
by New Contributor III
  • 14393 Views
  • 8 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello,I've been trying to submit a job to a transient cluster, but it is failing with this error :Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

  • 14393 Views
  • 8 replies
  • 3 kudos
Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Alix Métivier​  - The error is thrown from the user code (please investigate the jar file attached to the cluster). at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941)at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654)at m80.dbruniv_...

  • 3 kudos
7 More Replies
cuteabhi32
by New Contributor III
  • 55176 Views
  • 11 replies
  • 1 kudos

Resolved! Trying to check if a column exist in a dataframe or not if not then i have to give NULL if yes then i need to give the column itself by using UDF

from pyspark import SparkContextfrom pyspark import SparkConffrom pyspark.sql.types import *from pyspark.sql.functions import *from pyspark.sql import *from pyspark.sql.types import StringTypefrom pyspark.sql.functions import udfdf1 = spark.read.form...

  • 55176 Views
  • 11 replies
  • 1 kudos
Latest Reply
cuteabhi32
New Contributor III
  • 1 kudos

Thanks i modified my code as per your suggestion and it worked perfectly Thanks again for all your inputsdflist= spark.createDataFrame(list(a.columns), "string").toDF("Name")dfg=dflist.filter(col('name').isin('ref_date')).count()if dfg==1 :  a = a.wi...

  • 1 kudos
10 More Replies
Steamboat_Ski_C
by New Contributor
  • 1184 Views
  • 0 replies
  • 0 kudos

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the U...

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the United States. These types of condos are typically located in rural or suburban areas and offer resid...

  • 1184 Views
  • 0 replies
  • 0 kudos
Data_Cowboy
by New Contributor III
  • 11824 Views
  • 3 replies
  • 1 kudos

Resolved! Plotting in pyspark.pandas Uncaught ReferenceError Plotly is not defined

Hi, I am trying to plot using pyspark.pandas running this sample code: speed = [0.1, 17.5, 40, 48, 52, 69, 88] lifespan = [2, 8, 70, 1.5, 25, 12, 28] index = ['snail', 'pig', 'elephant', 'rabbit', 'giraffe', 'coyote', 'horse'] psdf = ps.Data...

Error Message
  • 11824 Views
  • 3 replies
  • 1 kudos
Latest Reply
Data_Cowboy
New Contributor III
  • 1 kudos

Thank you @Werner Stinckens​ . I was able to find the plotly documentation listed below and setting the output_type and calling displayHTML() helped remedy the error.

  • 1 kudos
2 More Replies
arda_123
by New Contributor III
  • 4196 Views
  • 2 replies
  • 1 kudos

SQL Analytics Map Visualization: Map marker size

Hello all, I am trying to use the Map visualization in SQL Analytics Dashboard in Databricks. Does any one knows how or if we can change the size/radius of the markers based on values in another column. This seems like a very trivial parameter but I ...

  • 4196 Views
  • 2 replies
  • 1 kudos
Latest Reply
arda_123
New Contributor III
  • 1 kudos

Thanks @Kaniz Fatma​ 

  • 1 kudos
1 More Replies
laurencewells
by New Contributor III
  • 6538 Views
  • 5 replies
  • 1 kudos

Autoloader and "cleanSource"

Hi All, We are trying to use the Spark 3 structured streaming feature/option ".option('cleanSource','archive')" to archive processed files. This is working as expected using the standard spark implementation, however does not appear to work using aut...

  • 6538 Views
  • 5 replies
  • 1 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

https://docs.databricks.com/ingestion/auto-loader/options.html#common-auto-loader-optionscleanSource is not a listed option so it won't do anything.Maybe event retention is something you can use?

  • 1 kudos
4 More Replies
RiyazAliM
by Honored Contributor
  • 8540 Views
  • 3 replies
  • 3 kudos

Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

I'm reshaping my dataframe as per requirement and I came across this situation where I'm concatenating 2 dataframes and then transposing them. I've done this previously using pandas and the syntax for pandas goes as below:import pandas as pd   df1 = ...

  • 8540 Views
  • 3 replies
  • 3 kudos
Latest Reply
RiyazAliM
Honored Contributor
  • 3 kudos

Hi @Kaniz Fatma​ ,I no longer see the answer you've posted, but I see you were suggesting to use `union`. As per my understanding, union are used to stack the dfs one upon another with similar schema / column names.In my situation, I have 2 different...

  • 3 kudos
2 More Replies
Maverick1
by Valued Contributor II
  • 9892 Views
  • 3 replies
  • 6 kudos

Is there any way to overwrite a partition in delta table without specifying each and every partition in replace where? For non dated partitions, this is really a mess with delta tables.

Is there any way to overwrite a partition in delta table without specifying each and every partition in replace where. For non dated partitions, this is really a mess with delta tables.Most of my DE teams don't want to adopt delta because of these gl...

  • 9892 Views
  • 3 replies
  • 6 kudos
Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Saurabh Verma​ following up did you get a chance to check @Hubert Dudek​ previous comments ?

  • 6 kudos
2 More Replies
Anonymous
by Not applicable
  • 2663 Views
  • 1 replies
  • 1 kudos

Query silently failed

Hello all, I'm using the older 6.4 runtime and noticed that a query return no result whereas the same query on 10.4 provided the expected result. This is bad, because I got no error, simply no result at all.Is there is some spark settings on the clus...

  • 2663 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Alessio Palma​ following up did you get chance to check @Kaniz Fatma​ 's previous comments ?

  • 1 kudos
Jack
by New Contributor II
  • 6412 Views
  • 1 replies
  • 1 kudos

Append an empty dataframe to a list of dataframes using for loop in python

I have the following 3 dataframes:I want to append df_forecast to each of df2_CA and df2_USA using a for-loop. However when I run my code, df_forecast is not appending: df2_CA and df2_USA appear exactly as shown above.Here’s the code:df_list=[df2_CA,...

image image
  • 6412 Views
  • 1 replies
  • 1 kudos
Latest Reply
User16764241763
Databricks Employee
  • 1 kudos

@Jack Homareau​  Can you try union functionality with dataframes?https://sparkbyexamples.com/pyspark/pyspark-union-and-unionall/and then try to fill NaNs with the desired values?

  • 1 kudos
VM
by Contributor
  • 6447 Views
  • 4 replies
  • 2 kudos

Error using Synapse ML: JavaPackage object is not callable

I am using DBR version 10.1. I want to use Synapse ML package. I am able to install and import it by following instructions on the link: https://github.com/microsoft/SynapseML. However when I try to run the code it gives me the error shown in the att...

  • 6447 Views
  • 4 replies
  • 2 kudos
Latest Reply
User16764241763
Databricks Employee
  • 2 kudos

Hello @Vikram Mahawal​ Clusters need to be in the running state to install/uninstall the libraries. Could you please start the cluster and try installing it.If you are still stuck, please file a support case with us, so we can take a look.Thanks

  • 2 kudos
3 More Replies
Labels