Data Engineering

Forum Posts


by Adalberto • New Contributor II

04-13-2022 8:30:18 AM

2750 Views
4 replies
2 kudos

Resolved! cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch:

Hi, I'm trying to create a Delta table using SQL, but I'm getting this error: Error in SQL statement: AnalysisException: cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch: differing types in '(CAST(10000 AS BIGINT) div Khe)' (big...

Latest Reply

Noopur_Nigam
Valued Contributor II

05-13-2022 8:52:43 AM

2 kudos

Hi @Adalberto Garcia Espinosa, do you need the Khe column to be double? If not, the query below works: %sql CREATE OR REPLACE TABLE Productos(Khe bigint NOT NULL, Fctor_HL_Estiba bigint GENERATED ALWAYS AS (cast(10000 as bigint) div Khe)) seems to be work...

by Ambi • New Contributor III

04-04-2022 9:34:14 AM

2592 Views
6 replies
8 kudos

Resolved! Access azure storage account from databricks notebook using pyspark or SQL

I have an Azure Blob Storage account with a container, and inside the container there is a CSV file. I couldn't read the file using the access key and storage account name. Any idea how to read the file using PySpark/SQL? Thanks in advance.

Latest Reply

Atanu
Esteemed Contributor

05-12-2022 10:47:23 PM

8 kudos

@Ambiga D, you need to mount the storage. You can follow this guide: https://docs.databricks.com/data/data-sources/azure/azure-storage.html#mount-azure-blob-storage-containers-to-dbfs Thanks.
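Besides mounting, the access key can also be set in the Spark session configuration and the file read directly over `wasbs://`. The following is a minimal sketch of that route, not a confirmed answer from this thread. The account, container, file, and secret-scope names are hypothetical, and the two helper functions are illustrative. The Spark calls are left as comments because they need a live cluster.

```python
def account_key_conf(storage_account: str) -> str:
    """Spark config key that holds the access key for a Blob Storage account."""
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def wasbs_url(storage_account: str, container: str, path: str) -> str:
    """Build the wasbs:// URL for a file inside a container."""
    return (
        f"wasbs://{container}@{storage_account}.blob.core.windows.net/"
        f"{path.lstrip('/')}"
    )


# Hypothetical names, for illustration only.
conf_key = account_key_conf("mystorageacct")
csv_url = wasbs_url("mystorageacct", "mycontainer", "/data/input.csv")

# In a Databricks notebook (where `spark` and `dbutils` exist) these would be used as:
#   spark.conf.set(conf_key, dbutils.secrets.get(scope="my-scope", key="storage-key"))
#   df = spark.read.option("header", "true").csv(csv_url)
```

Keeping the key in a secret scope rather than in the notebook avoids pasting the credential into code.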

by Confused • New Contributor III

04-04-2022 3:57:50 AM

13960 Views
2 replies
1 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use an Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply

Atanu
Esteemed Contributor

05-12-2022 10:51:09 PM

1 kudos

For your first question, https://docs.databricks.com/libraries/index.html#python-environment-management and https://docs.databricks.com/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-commands may help. Again, you can convert t...
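On the pip.conf part of the question: the file is plain INI, with the index set under `[global]`. The sketch below only renders the file contents. The feed URL is hypothetical, and where the file has to live on a cluster is not confirmed in this thread; a global location such as `/etc/pip.conf`, written from an init script, is an assumption.

```python
import configparser
import io


def render_pip_conf(index_url: str) -> str:
    """Return pip.conf contents that make `index_url` the default package index."""
    parser = configparser.ConfigParser()
    parser["global"] = {"index-url": index_url}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical Azure Artifacts feed URL, for illustration only.
FEED_URL = "https://pkgs.dev.azure.com/my-org/_packaging/my-feed/pypi/simple/"
pip_conf_text = render_pip_conf(FEED_URL)
print(pip_conf_text)
```

Authentication for the feed (for example through artifacts-keyring, as the title mentions) is a separate step and is not covered by this sketch.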

by Jeff1 • Contributor II

04-01-2022 5:58:37 AM

9342 Views
7 replies
10 kudos

Resolved! How to write *.csv file from DataBricks FileStore

Struggling with how to export a Spark dataframe as a *.csv file to a local computer. I'm successfully using the spark_write_csv function (sparklyr R library) to write the csv file out to my Databricks dbfs:FileStore location. Because (I'm assuming)...

Latest Reply

Kaniz
Community Manager

04-03-2022 11:59:32 PM

10 kudos

Hi @Jeff (Customer), were you able to follow @Hubert Dudek's suggestion? Did it help you?
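One commonly used route for getting a file from FileStore to a local machine is the browser: files under `dbfs:/FileStore/` are served under `/files/` on the workspace URL. The sketch below maps a path to that URL. The workspace URL and file name are hypothetical, the helper is illustrative, and this is not necessarily the approach suggested in the thread.

```python
def filestore_download_url(workspace_url: str, dbfs_path: str) -> str:
    """Map a dbfs:/FileStore/... path to its browser download URL."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only files under dbfs:/FileStore/ are served under /files/")
    relative_path = dbfs_path[len(prefix):]
    return f"{workspace_url.rstrip('/')}/files/{relative_path}"


# Hypothetical workspace URL and file name, for illustration only.
url = filestore_download_url(
    "https://adb-1234567890123456.7.azuredatabricks.net/",
    "dbfs:/FileStore/exports/result.csv",
)
print(url)
```

Because Spark writes a directory of part files rather than a single CSV, the path passed in would be the part file inside the output directory, or a copy of it saved under a single file name.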

by boskicl • New Contributor III

03-23-2022 11:04:23 AM

13132 Views
5 replies
10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all, Background: I am having an issue today with Databricks using pyspark-sql and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table I am trying to write to a Delta table. The table ...

Latest Reply

Anonymous
Not applicable

05-16-2022 2:48:25 AM

10 kudos

@Ljuboslav Boskic, there can be multiple reasons why the query is taking more time. During this phase, a metadata look-up happens. Can you please check the following: ensure the tables are Z-ordered properly, and that the merge key (on ...

by Whitcomb_Selins • New Contributor

06-07-2022 9:39:54 AM

418 Views
0 replies
0 kudos

What is a natural resource and why do we need them?Natural resources are vital to our survival and well-being. They provide the food, water, and energ...

What is a natural resource and why do we need them?Natural resources are vital to our survival and well-being. They provide the food, water, and energy that we need to live, and they support the ecosystems that we rely on for our livelihoods.However,...

by Anonymous • Not applicable

06-07-2022 9:25:52 AM

328 Views
0 replies
2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens. Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu). What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive. What is your favorite thing about your curren...

by enri_casca • New Contributor III

03-01-2022 3:50:05 AM

4709 Views
13 replies
2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues and the company. I pull the data from Snowflake: df = spark.read \ .format("snowflake") \ .options(**options) \ .option('qu...

Latest Reply

-werners-
Esteemed Contributor III

03-01-2022 3:57:36 AM

2 kudos

can you check this SO topic?

by SailajaB • Valued Contributor III

02-27-2022 6:15:09 AM

6139 Views
5 replies
12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi, how can we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to a function as a dict to apply key-level transformations. But as our data is very large, we can't use collect: df.toJSON().colle...

Latest Reply

SailajaB
Valued Contributor III

02-28-2022 2:19:28 AM

12 kudos

@Hubert Dudek, thank you for the reply. We are new to ADB. We are using the code below and are looking for an optimized way to do it: dfJSONString = df.toJSON().collect() stringList = [] for row in dfJSONString: # ==== Unflatten the JSON string ==== # js...
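One way to avoid pulling every row to the driver with `collect()` is to apply the per-row logic inside each partition. The sketch below is an illustration under assumptions, not the code from the thread: it assumes flattened keys use a `.` separator, and `unflatten` and `transform_partition` are hypothetical helpers. A small local list stands in for one partition so the logic can be run without a cluster.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def unflatten(flat: Dict[str, Any], sep: str = ".") -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}; the "." separator is an assumption."""
    nested: Dict[str, Any] = {}
    for key, value in flat.items():
        parts = key.split(sep)
        node = nested
        for part in parts[:-1]:
            # Walk down, creating intermediate dicts as needed.
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested


def transform_partition(json_rows: Iterable[str]) -> Iterator[str]:
    """Process one partition of JSON strings lazily, one row at a time."""
    for raw in json_rows:
        yield json.dumps(unflatten(json.loads(raw)))


# Local stand-in for one partition of df.toJSON().
sample = ['{"id": 1, "address.city": "Pune", "address.zip": "411001"}']
result = list(transform_partition(sample))

# On a cluster the same function could run on the executors instead of the driver:
#   transformed = df.toJSON().mapPartitions(transform_partition)
```

Because `transform_partition` is a generator, each executor holds only one row at a time in Python, rather than the whole dataset sitting in a driver-side list.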

by Alix • New Contributor III

02-21-2022 9:00:10 AM

7304 Views
9 replies
3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply

Kaniz
Community Manager

05-12-2022 12:28:28 AM

3 kudos

Hi @Alix Métivier, just a friendly follow-up. Do you still need help, or did @Shanmugavel Chandrakasu's response help you find the solution? Please let us know.

by cuteabhi32 • New Contributor III

06-06-2022 8:17:54 AM

25502 Views
11 replies
1 kudos

Resolved! Trying to check if a column exist in a dataframe or not if not then i have to give NULL if yes then i need to give the column itself by using UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply

cuteabhi32
New Contributor III

06-07-2022 7:29:16 AM

1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs. dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name") dfg = dflist.filter(col('name').isin('ref_date')).count() if dfg == 1: a = a.wi...
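The same check can be done without building a second DataFrame, since `a.columns` is already a plain Python list. The sketch below is an alternative illustration, not the accepted answer. The helper name is hypothetical, the `STRING` type for the NULL placeholder is an assumption, and the comparison is case-insensitive to match Spark's default column resolution.

```python
from typing import List, Sequence


def exprs_with_null_defaults(existing: Sequence[str], wanted: Sequence[str]) -> List[str]:
    """Build selectExpr strings: the column itself if present, else a NULL placeholder."""
    present = {name.lower() for name in existing}
    exprs = []
    for name in wanted:
        if name.lower() in present:
            exprs.append(f"`{name}`")
        else:
            # STRING is an assumed type for the placeholder column.
            exprs.append(f"CAST(NULL AS STRING) AS `{name}`")
    return exprs


# Local stand-in for a.columns.
exprs = exprs_with_null_defaults(["id", "amount"], ["id", "ref_date"])
print(exprs)

# With a real DataFrame `a` (as in the post) this would be applied as:
#   a = a.selectExpr(*exprs_with_null_defaults(a.columns, ["id", "ref_date"]))
```

This keeps the existence check on the driver as a list lookup, so no Spark job runs just to count matching column names.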

by Steamboat_Ski_C • New Contributor

06-07-2022 5:55:07 AM

296 Views
0 replies
0 kudos

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the U...

What are Canyon Creek Condos and what do they offer residents?Canyon Creek Condos are a type of housing that is becoming increasingly popular in the United States. These types of condos are typically located in rural or suburban areas and offer resid...

by arda_123 • New Contributor III

06-06-2022 4:53:53 AM

1581 Views
5 replies
2 kudos

Resolved! SQL Analytics Map Visualization: Map marker size

Hello all, I am trying to use the Map visualization in a SQL Analytics dashboard in Databricks. Does anyone know how, or if, we can change the size/radius of the markers based on values in another column? This seems like a very trivial parameter but I ...

Latest Reply

arda_123
New Contributor III

06-07-2022 4:13:53 AM

2 kudos

Thanks @Kaniz Fatma

by laurencewells • New Contributor III

05-31-2022 7:45:38 AM

2352 Views
5 replies
1 kudos

Autoloader and "cleanSource"

Hi All, We are trying to use the Spark 3 structured streaming option ".option('cleanSource','archive')" to archive processed files. This works as expected with the standard Spark implementation, but it does not appear to work using aut...

Latest Reply

-werners-
Esteemed Contributor III

06-01-2022 5:59:48 AM

1 kudos

https://docs.databricks.com/ingestion/auto-loader/options.html#common-auto-loader-options cleanSource is not a listed option, so it won't do anything. Maybe event retention is something you can use?

by RiyazAli • Contributor III

06-06-2022 7:21:48 PM

4067 Views
6 replies
6 kudos

Resolved! Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

I'm reshaping my dataframe as per requirement and I came across a situation where I'm concatenating 2 dataframes and then transposing them. I've done this previously using pandas, and the pandas syntax is as below: import pandas as pd df1 = ...

Latest Reply

RiyazAli
Contributor III

06-06-2022 11:45:41 PM

6 kudos

Hi @Kaniz Fatma, I no longer see the answer you posted, but I see you were suggesting to use `union`. As per my understanding, union is used to stack dataframes one on top of another when they have a similar schema / column names. In my situation, I have 2 different...
