Data Engineering

Forum Posts

Adalberto
by New Contributor II
  • 2750 Views
  • 4 replies
  • 2 kudos

Resolved! cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch:

Hi, I'm trying to create a Delta table using SQL but I'm getting this error: Error in SQL statement: AnalysisException: cannot resolve '(CAST(10000 AS BIGINT) div Khe)' due to data type mismatch: differing types in '(CAST(10000 AS BIGINT) div Khe)' (big...

Latest Reply
Noopur_Nigam
Valued Contributor II
  • 2 kudos

Hi @Adalberto Garcia Espinosa, do you need the Khe column to be double? If not, the query below works: %sql CREATE OR REPLACE TABLE Productos (Khe BIGINT NOT NULL, Fctor_HL_Estiba BIGINT GENERATED ALWAYS AS (CAST(10000 AS BIGINT) div Khe)) seems to be work...

3 More Replies
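For readers hitting the same error: div requires both operands to share an integral type, so keeping Khe as BIGINT and casting the literal (as in the accepted reply) resolves the mismatch. A minimal PySpark sketch of the same fix, assuming a scratch schema where the Productos table can be created:

# Sketch of the accepted reply's fix: both operands of `div` are BIGINT,
# so the generated column type-checks. Names come from the thread.
spark.sql("""
    CREATE OR REPLACE TABLE Productos (
        Khe BIGINT NOT NULL,
        Fctor_HL_Estiba BIGINT GENERATED ALWAYS AS (CAST(10000 AS BIGINT) div Khe)
    ) USING DELTA
""")
spark.sql("INSERT INTO Productos (Khe) VALUES (250)")
spark.sql("SELECT * FROM Productos").show()   # Fctor_HL_Estiba computes to 40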
Ambi
by New Contributor III
  • 2592 Views
  • 6 replies
  • 8 kudos

Resolved! Access azure storage account from databricks notebook using pyspark or SQL

I have an Azure Blob Storage account with a container, and inside the container there is a CSV file. I couldn't read the file using the access key and storage account name. Any idea how to read the file using PySpark/SQL? Thanks in advance.

Latest Reply
Atanu
Esteemed Contributor
  • 8 kudos

@Ambiga D, you need to mount the storage; you can follow https://docs.databricks.com/data/data-sources/azure/azure-storage.html#mount-azure-blob-storage-containers-to-dbfs. Thanks.

5 More Replies
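For readers who prefer not to mount, direct access with the account key set in the Spark conf also works; a sketch where the account, container, secret scope, and path are all placeholders:

# Direct (non-mount) read from Azure Blob Storage using an account key.
# storage_account, container, scope/key names, and the path are placeholders.
storage_account = "mystorageaccount"
container = "mycontainer"
access_key = dbutils.secrets.get(scope="my-scope", key="storage-key")

spark.conf.set(
    f"fs.azure.account.key.{storage_account}.blob.core.windows.net",
    access_key,
)

df = (spark.read
        .option("header", "true")
        .csv(f"wasbs://{container}@{storage_account}.blob.core.windows.net/path/to/file.csv"))
df.show(5)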
Confused
by New Contributor III
  • 13960 Views
  • 2 replies
  • 1 kudos

Resolved! Configuring pip index-url and using artifacts-keyring

Hi, I would like to use the Azure Artifacts feed as my default index-url when doing a pip install on a Databricks cluster. I understand I can achieve this by updating the pip.conf file with my artifact feed as the index-url. Does anyone know where i...

Latest Reply
Atanu
Esteemed Contributor
  • 1 kudos

For your first question, https://docs.databricks.com/libraries/index.html#python-environment-management and https://docs.databricks.com/libraries/notebooks-python-libraries.html#manage-libraries-with-pip-commands may help. Again, you can convert t...

1 More Replies
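One way to make the feed the cluster-wide default (a sketch, not the only approach) is a cluster-scoped init script that writes /etc/pip.conf; the feed URL below is a placeholder, and private feeds additionally need authentication (e.g. a PAT or artifacts-keyring):

# Sketch: write an init script to DBFS that points pip at an Azure Artifacts
# feed. <org> and <feed> are placeholders; do not paste real credentials here.
dbutils.fs.put(
    "dbfs:/databricks/init-scripts/set-pip-index.sh",
    """#!/bin/bash
cat > /etc/pip.conf <<'EOF'
[global]
index-url=https://<org>.pkgs.visualstudio.com/_packaging/<feed>/pypi/simple/
EOF
""",
    overwrite=True,
)
# Then register dbfs:/databricks/init-scripts/set-pip-index.sh as a cluster
# init script (cluster -> Advanced options -> Init Scripts) and restart.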
Jeff1
by Contributor II
  • 9342 Views
  • 7 replies
  • 10 kudos

Resolved! How to write *.csv file from DataBricks FileStore

Struggling with how to export a Spark dataframe as a *.csv file to a local computer. I'm successfully using the spark_write_csv function (sparklyr R library) to write the CSV file out to my Databricks dbfs:FileStore location. Because (I'm assuming)...

Latest Reply
Kaniz
Community Manager
  • 10 kudos

Hi @Jeff, were you able to follow @Hubert Dudek's suggestion? Did it help you?

6 More Replies
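The thread is sparklyr, but the underlying trick is the same in any language: write a single part file under /FileStore, then pull it down through the workspace's /files/ URL. A hedged PySpark sketch, where df and the workspace URL are placeholders:

# Write one CSV part under /FileStore so it is downloadable from a browser.
(df.coalesce(1)                      # single output file instead of many parts
   .write.mode("overwrite")
   .option("header", "true")
   .csv("dbfs:/FileStore/exports/my_data"))

# Spark chooses the part-file name; list the folder to find it:
display(dbutils.fs.ls("dbfs:/FileStore/exports/my_data"))
# Then download via:
#   https://<your-workspace-url>/files/exports/my_data/part-00000-<...>.csv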
boskicl
by New Contributor III
  • 13132 Views
  • 5 replies
  • 10 kudos

Resolved! Table write command stuck "Filtering files for query."

Hello all. Background: I am having an issue today with Databricks using PySpark SQL and writing a Delta table. The dataframe is made by doing an inner join between two tables, and that is the table which I am trying to write to a Delta table. The table ...

Tags: filtering, job_info, spill_memory
Latest Reply
Anonymous
Not applicable
  • 10 kudos

@Ljuboslav Boskic, there can be multiple reasons why the query is taking more time; during this phase, metadata look-up activity happens. Can you please check the below things: ensuring the tables are Z-ordered properly, and that the merge key (on ...

4 More Replies
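To act on the reply's first suggestion, Z-ordering both sides of the join on the merge key helps data skipping prune files during the "Filtering files for query" phase. A sketch with placeholder table and column names:

# Z-order the Delta tables on the join/merge key; names are placeholders.
spark.sql("OPTIMIZE my_db.left_table ZORDER BY (join_key)")
spark.sql("OPTIMIZE my_db.right_table ZORDER BY (join_key)")

# Optional sanity check on the resulting file layout:
spark.sql("DESCRIBE DETAIL my_db.left_table").show(truncate=False)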
Anonymous
by Not applicable
  • 328 Views
  • 0 replies
  • 2 kudos

www.vandevelde.eu

June Featured Member of the Month! Werner Stinckens. Job Title: Data Engineer @ Van de Velde (www.vandevelde.eu). What are three words your coworkers would use to describe you? Helpful, accurate, inquisitive. What is your favorite thing about your curren...

enri_casca
by New Contributor III
  • 4709 Views
  • 13 replies
  • 2 kudos

Resolved! Couldn't convert string to float when fit model

Hi, I am very new to Databricks and I am trying to run quick experiments to understand the best practice for me, my colleagues, and the company. I pull the data from Snowflake:
df = spark.read \
  .format("snowflake") \
  .options(**options) \
  .option('qu...

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

Can you check this SO topic?

12 More Replies
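The usual cause is that the Snowflake connector hands numeric columns back as strings, so the fit fails on the first value it cannot coerce; casting explicitly before training is the common fix. A sketch with placeholder column names:

# Cast string-typed columns from Snowflake to double before fitting a model.
# The column list is a placeholder for the thread's actual features.
from pyspark.sql.functions import col

numeric_cols = ["feature_a", "feature_b", "label"]
for c in numeric_cols:
    df = df.withColumn(c, col(c).cast("double"))

df.printSchema()   # confirm the features are now double, not string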
SailajaB
by Valued Contributor III
  • 6139 Views
  • 5 replies
  • 12 kudos

Resolved! how to convert each row of df to array of rows(list of rows)

Hi, how do we convert each row of a dataframe to an array of rows? Here is our scenario: we need to pass each row of the dataframe to one function as a dict to apply the key-level transformations. But as our data is very large, we can't use collect: df.toJson().colle...

Latest Reply
SailajaB
Valued Contributor III
  • 12 kudos

@Hubert Dudek, thank you for the reply. We are new to ADB. We are using the below code and looking for an optimized way to do it:
dfJSONString = df.toJSON().collect()
stringList = []
for row in dfJSONString:
    # ==== Unflatten the JSON string ==== #
    js...

4 More Replies
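If the key-level function can run on the executors, the collect() can be avoided altogether by mapping each Row to a dict on the cluster; a sketch where transform_record stands in for the thread's transformation:

# Apply a per-row, dict-based transformation on the executors instead of
# collecting to the driver. transform_record is a hypothetical stand-in.
def transform_record(record: dict) -> dict:
    # ... key-level transformations go here ...
    return record

transformed = df.rdd.map(lambda row: transform_record(row.asDict()))

# If the output keys/types are uniform, rebuild a dataframe from the result:
result_df = spark.createDataFrame(transformed)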
Alix
by New Contributor III
  • 7304 Views
  • 9 replies
  • 3 kudos

Resolved! Remote RPC client disassociated error

Hello, I've been trying to submit a job to a transient cluster, but it is failing with this error: Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in ...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @Alix Métivier, just a friendly follow-up: do you still need help, or did @Shanmugavel Chandrakasu's response help you find the solution? Please let us know.

8 More Replies
cuteabhi32
by New Contributor III
  • 25502 Views
  • 11 replies
  • 1 kudos

Resolved! Trying to check if a column exists in a dataframe or not; if not, I have to return NULL, and if yes, I need to return the column itself, using a UDF

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql.types import *
from pyspark.sql.functions import *
from pyspark.sql import *
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
df1 = spark.read.form...

Latest Reply
cuteabhi32
New Contributor III
  • 1 kudos

Thanks, I modified my code as per your suggestion and it worked perfectly. Thanks again for all your inputs:
dflist = spark.createDataFrame(list(a.columns), "string").toDF("Name")
dfg = dflist.filter(col('name').isin('ref_date')).count()
if dfg == 1:
    a = a.wi...

10 More Replies
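For later readers: because df.columns is a plain Python list, the whole check collapses to a membership test plus lit(None), with no helper dataframe or UDF needed. A sketch using the thread's names:

# Add ref_date as NULL only when it is missing; otherwise leave it alone.
# `a` and 'ref_date' are the names used in the thread.
from pyspark.sql.functions import lit

if "ref_date" not in a.columns:
    a = a.withColumn("ref_date", lit(None).cast("string"))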
arda_123
by New Contributor III
  • 1581 Views
  • 5 replies
  • 2 kudos

Resolved! SQL Analytics Map Visualization: Map marker size

Hello all, I am trying to use the Map visualization in the SQL Analytics Dashboard in Databricks. Does anyone know how, or whether, we can change the size/radius of the markers based on values in another column? This seems like a very trivial parameter, but I ...

Latest Reply
arda_123
New Contributor III
  • 2 kudos

Thanks @Kaniz Fatma

4 More Replies
laurencewells
by New Contributor III
  • 2352 Views
  • 5 replies
  • 1 kudos

Autoloader and "cleanSource"

Hi all, we are trying to use the Spark 3 Structured Streaming feature/option ".option('cleanSource','archive')" to archive processed files. This works as expected using the standard Spark implementation; however, it does not appear to work using aut...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

cleanSource is not a listed option at https://docs.databricks.com/ingestion/auto-loader/options.html#common-auto-loader-options, so it won't do anything. Maybe event retention is something you can use?

4 More Replies
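For contrast, the plain file-source stream where cleanSource is honored looks like this (a sketch; the schema and both paths are placeholders), whereas the cloudFiles/Auto Loader source simply ignores those two options:

# Plain Structured Streaming file source: cleanSource/sourceArchiveDir apply.
# input_schema and both paths are placeholders.
stream = (spark.readStream
            .format("csv")
            .schema(input_schema)
            .option("cleanSource", "archive")
            .option("sourceArchiveDir", "dbfs:/archive/landing")
            .load("dbfs:/landing/"))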
RiyazAli
by Contributor III
  • 4067 Views
  • 6 replies
  • 6 kudos

Resolved! Is there a way to CONCAT two dataframes on either of the axis (row/column) and transpose the dataframe in PySpark?

I'm reshaping my dataframe per a requirement, and I came across this situation where I'm concatenating 2 dataframes and then transposing them. I've done this previously using pandas, and the syntax for pandas goes as below:
import pandas as pd
df1 = ...

Latest Reply
RiyazAli
Contributor III
  • 6 kudos

Hi @Kaniz Fatma, I no longer see the answer you posted, but I see you were suggesting to use `union`. As per my understanding, union is used to stack dataframes one upon another with similar schemas / column names. In my situation, I have 2 different...

5 More Replies
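To make the distinction concrete: unionByName stacks rows (pandas axis=0), while a column-wise concat (axis=1) has no direct PySpark equivalent and needs a synthesized join key. A hedged sketch with placeholder dataframes df1 and df2:

# Row-wise concat: union works when the two schemas match.
stacked = df1.unionByName(df2)

# Column-wise concat: synthesize a row id on each side, then join on it.
from pyspark.sql.functions import row_number, monotonically_increasing_id
from pyspark.sql.window import Window

w = Window.orderBy(monotonically_increasing_id())
left = df1.withColumn("_rid", row_number().over(w))
right = df2.withColumn("_rid", row_number().over(w))

side_by_side = left.join(right, on="_rid", how="inner").drop("_rid")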