Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

boskicl
by New Contributor III
  • 27047 Views
  • 5 replies
  • 10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, Background: I am having an issue today with Databricks, using pyspark-sql to write a Delta table. The dataframe is built from an inner join between two tables, and that is the table I am trying to write to a Delta table. The table ...

filtering job_info spill_memory
Latest Reply
Anonymous
Not applicable
  • 10 kudos

@Ljuboslav Boskic there can be multiple reasons why the query is taking more time. During this phase a metadata look-up happens. Can you please check the following: ensure the tables are Z-ordered properly, and that the merge key (on ...

  • 10 kudos
4 More Replies
Whitcomb_Selins
by New Contributor
  • 2822 Views
  • 0 replies
  • 0 kudos

What is a natural resource and why do we need them? Natural resources are vital to our survival and well-being. They provide the food, water, and energ...

What is a natural resource and why do we need them? Natural resources are vital to our survival and well-being. They provide the food, water, and energy that we need to live, and they support the ecosystems that we rely on for our livelihoods. However,...

Anonymous
by Not applicable
  • 645 Views
  • 0 replies
  • 2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens. Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu). What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive. What is your favorite thing about your curren...

enri_casca
by New Contributor III
  • 9559 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company. I pull the data from Snowflake: df = spark.read \  .format("snowflake") \  .options(**options) \  .option('qu...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Can you check this SO topic?

  • 2 kudos
12 More Replies
SailajaB
by Valued Contributor III
  • 10434 Views
  • 5 replies
  • 12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi, how can we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to a function as a dict to apply key-level transformations. But as our data is very large, we can't use collect: df.toJSON().colle...

Latest Reply
SailajaB
Valued Contributor III
  • 12 kudos

@Hubert Dudek, thank you for the reply. We are new to ADB and are using the code below; we are looking for an optimized way to do it:
dfJSONString = df.toJSON().collect()
stringList = []
for row in dfJSONString:
    # ==== Unflatten the JSON string ==== #
    js...

  • 12 kudos
4 More Replies
Alix
by New Contributor III
  • 10611 Views
  • 8 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

@Alix Métivier - The error is thrown from the user code (please investigate the jar file attached to the cluster).
at m80.dbruniv_0_1.dbruniv.tFixedFlowInput_1Process(dbruniv.java:941)
at m80.dbruniv_0_1.dbruniv.run(dbruniv.java:1654)
at m80.dbruniv_...

  • 3 kudos
7 More Replies
cuteabhi32
by New Contributor III
  • 41895 Views
  • 11 replies
  • 1 kudos

Resolved! Check whether a column exists in a dataframe: if not, return NULL; if yes, return the column itself, using a UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply
cuteabhi32
New Contributor III
  • 1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs.
dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name")
dfg = dflist.filter(col('name').isin('ref_date')).count()
if dfg == 1:
  a = a.wi...

  • 1 kudos
10 More Replies
Steamboat_Ski_C
by New Contributor
  • 618 Views
  • 0 replies
  • 0 kudos

What are Canyon Creek Condos and what do they offer residents? Canyon Creek Condos are a type of housing that is becoming increasingly popular in the U...

What are Canyon Creek Condos and what do they offer residents? Canyon Creek Condos are a type of housing that is becoming increasingly popular in the United States. These types of condos are typically located in rural or suburban areas and offer resid...

Data_Cowboy
by New Contributor III
  • 8430 Views
  • 3 replies
  • 1 kudos

Resolved! Plotting in pyspark.pandas Uncaught ReferenceError Plotly is not defined

Hi, I am trying to plot using pyspark.pandas running this sample code: speed = [0.1, 17.5, 40, 48, 52, 69, 88] lifespan = [2, 8, 70, 1.5, 25, 12, 28] index = ['snail', 'pig', 'elephant', 'rabbit', 'giraffe', 'coyote', 'horse'] psdf = ps.Data...

Error Message
Latest Reply
Data_Cowboy
New Contributor III
  • 1 kudos

Thank you, @Werner Stinckens. I found the plotly documentation listed below; setting output_type and calling displayHTML() remedied the error.

  • 1 kudos
2 More Replies
arda_123
by New Contributor III
  • 3132 Views
  • 2 replies
  • 1 kudos

SQL Analytics Map Visualization: Map marker size

Hello all, I am trying to use the Map visualization in a SQL Analytics Dashboard in Databricks. Does anyone know whether, and how, we can change the size/radius of the markers based on values in another column? This seems like a very trivial parameter but I ...

Latest Reply
arda_123
New Contributor III
  • 1 kudos

Thanks @Kaniz Fatma​ 

  • 1 kudos
1 More Replies
laurencewells
by New Contributor III
  • 4199 Views
  • 5 replies
  • 1 kudos

Autoloader and "cleanSource"

Hi All, We are trying to use the Spark 3 structured streaming option ".option('cleanSource','archive')" to archive processed files. This works as expected with the standard Spark implementation, but does not appear to work using aut...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

https://docs.databricks.com/ingestion/auto-loader/options.html#common-auto-loader-options
cleanSource is not a listed option, so it won't do anything. Maybe event retention is something you can use?

  • 1 kudos
4 More Replies
RiyazAli
by Valued Contributor
  • 6871 Views
  • 3 replies
  • 3 kudos

Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

I'm reshaping my dataframe as required and came across a situation where I concatenate 2 dataframes and then transpose them. I've done this previously using pandas, and the pandas syntax goes as below: import pandas as pd   df1 = ...

Latest Reply
RiyazAli
Valued Contributor
  • 3 kudos

Hi @Kaniz Fatma, I no longer see the answer you posted, but I see you were suggesting to use `union`. As per my understanding, union is used to stack dataframes with similar schemas / column names on top of one another. In my situation, I have 2 different...

  • 3 kudos
2 More Replies
PawanShukla
by New Contributor III
  • 1285 Views
  • 1 replies
  • 0 kudos

Workflow Pipeline in Azure Databricks is throwing error: EventHubsSourceProvider could not be instantiated

I am using the sample code from the getting-started tutorial. It simply reads a JSON file and moves the data into another table, but it throws an error related to EventHubsSourceProvider.

Maverick1
by Valued Contributor II
  • 5400 Views
  • 3 replies
  • 6 kudos

Is there any way to overwrite a partition in a Delta table without specifying each and every partition in replaceWhere? For non-dated partitions, this is really a mess with Delta tables.

Is there any way to overwrite a partition in a Delta table without specifying each and every partition in replaceWhere? For non-dated partitions, this is really a mess with Delta tables. Most of my DE teams don't want to adopt Delta because of these gl...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hi @Saurabh Verma, following up: did you get a chance to check @Hubert Dudek's previous comments?

  • 6 kudos
2 More Replies
Anonymous
by Not applicable
  • 1728 Views
  • 1 replies
  • 1 kudos

Query silently failed

Hello all, I'm using the older 6.4 runtime and noticed that a query returns no result, whereas the same query on 10.4 provides the expected result. This is bad because I got no error, simply no result at all. Is there some Spark setting on the clus...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Alessio Palma, following up: did you get a chance to check @Kaniz Fatma's previous comments?

  • 1 kudos
