Data Engineering

Forum Posts

by MartinB • Contributor III

09-11-2021 3:34:17 AM

11788 Views
5 replies
3 kudos

Resolved! Interoperability Spark ↔ Pandas: can't convert Spark dataframe to Pandas dataframe via df.toPandas() when it contains datetime value in distant future

Hi, I have multiple datasets in my data lake that feature valid_from and valid_to columns indicating the validity of rows. If a row is currently valid, this is indicated by valid_to=9999-12-31 00:00:00. Example: Loading this into a Spark dataframe works fine...
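
A possible explanation, offered as a sketch rather than a confirmed diagnosis: pandas has traditionally stored timestamps as signed 64-bit nanosecond counts, a range that ends in the year 2262, so a sentinel such as 9999-12-31 does not fit. The snippet below uses only the standard library to compute that upper bound and to clamp sentinel values to it. `PANDAS_NS_MAX` and `clamp_for_pandas` are hypothetical names chosen for illustration, not Spark or pandas API.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Largest signed 64-bit nanosecond count, truncated to microseconds because
# datetime only has microsecond resolution.
NS_MAX = 2**63 - 1
PANDAS_NS_MAX = EPOCH + timedelta(microseconds=NS_MAX // 1000)


def clamp_for_pandas(ts):
    """Return ts unchanged, or the nanosecond-range maximum if ts lies beyond it."""
    return min(ts, PANDAS_NS_MAX)


sentinel = datetime(9999, 12, 31)
print(PANDAS_NS_MAX)               # upper bound of the nanosecond range
print(clamp_for_pandas(sentinel))  # the sentinel is replaced by that bound
```

In Spark, the same idea could be applied to valid_to before calling toPandas(), or the column could be cast to a string; which option fits depends on how the sentinel is used downstream.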

Latest Reply

ThePhil
New Contributor II

01-31-2025 2:26:53 PM

3 kudos

Be aware that in Databricks 15.2 LTS this behavior is broken. I cannot find the code, but it is most likely related to the following option: https://github.com/apache/spark/commit/c1c710e7da75b989f4d14e84e85f336bc10920e0#diff-f9ddcc6cba651c6ebfd34e29ef049c3...

by databicky • Contributor II

01-02-2023 1:08:45 AM

18404 Views
13 replies
4 kudos

How can we write a pandas dataframe to Azure ADLS as an Excel file? When trying to write, it shows an error like: protocol not known 'abfss'.

Latest Reply

FerArribas
Contributor

01-02-2023 1:25:43 PM

4 kudos

Hi @Hubert Dudek, the pandas API doesn't support the abfss protocol. You have three options: If you need to use pandas, you can write the Excel file to the local file system (dbfs) and then move it to ABFSS (for example with dbutils). Write as CSV directly in abfss...
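
The first option (write locally, then move) could look roughly like the sketch below. It is only an illustration: `stage_then_move` is a hypothetical helper, and a fake writer plus a plain file copy stand in for pandas and dbutils so that it runs anywhere. On Databricks the writer might be `lambda p: pdf.to_excel(p, index=False)` and the mover `lambda s, d: dbutils.fs.cp("file:" + s, d)`, assuming `pdf` is the pandas dataframe and the cluster can reach the storage account.

```python
import os
import shutil
import tempfile


def stage_then_move(write_local, move, dest, filename="export.xlsx"):
    """Write a file to local scratch space, then hand it to a mover function."""
    with tempfile.TemporaryDirectory() as tmp:
        local_path = os.path.join(tmp, filename)
        write_local(local_path)  # produce the file on the local disk
        move(local_path, dest)   # copy it to its final location
    return dest


def fake_writer(path):
    """Stand-in for a pandas to_excel call: writes placeholder bytes."""
    with open(path, "wb") as f:
        f.write(b"placeholder bytes")


# Stand-in destination: a second temp directory instead of an abfss:// URI.
target_dir = tempfile.mkdtemp()
target = os.path.join(target_dir, "export.xlsx")
stage_then_move(fake_writer, shutil.copy, target)
print(os.path.exists(target))
```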

by amitdatabricksc • New Contributor II

10-15-2021 3:13:54 PM

11136 Views
4 replies
2 kudos

How to zip a dataframe

How do I zip a dataframe so that I get a zipped CSV output file? Please share the command. Only one dataframe is involved, not multiple.
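
One way to get a single zipped CSV, sketched with the standard library only: serialize the rows to CSV in memory and store the text as one member of a zip archive. `rows_to_zipped_csv` is a hypothetical helper, and the sample rows are made up. For a dataframe small enough to collect to the driver, the rows could come from the collected data; pandas users can also pass `compression="zip"` to `to_csv`.

```python
import csv
import io
import os
import tempfile
import zipfile


def rows_to_zipped_csv(header, rows, zip_path, member="data.csv"):
    """Write header and rows as one CSV file inside a zip archive."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(member, buf.getvalue())
    return zip_path


# Made-up sample data, written to a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "data.zip")
rows_to_zipped_csv(["id", "name"], [(1, "a"), (2, "b")], out_path)

with zipfile.ZipFile(out_path) as zf:
    print(zf.namelist())
```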

Latest Reply

-werners-
Esteemed Contributor III

10-18-2021 1:20:14 AM

2 kudos

Writing to a local directory does not work. See this topic: https://community.databricks.com/s/feed/0D53f00001M7hNlCAJ

by Rani • New Contributor

11-23-2016 8:27:33 AM

9197 Views
2 replies
0 kudos

Divide a dataframe into multiple smaller dataframes based on values in multiple columns in Scala

I have to divide a dataframe into multiple smaller dataframes based on values in columns such as gender and state. The end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below, I am quite new to th...
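
The underlying idea (group by several columns, then sample within each group) can be sketched in plain Python as below. This is not the Scala solution from the thread; `sample_per_group` and the sample rows are hypothetical. In Spark, stratified sampling is available through `sampleBy` on a single column, so a combined gender/state key column would be one way to apply it without splitting the dataframe first.

```python
import random
from collections import defaultdict


def sample_per_group(rows, keys, n, seed=0):
    """Group dict rows by the given key columns and draw up to n rows per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    return {
        group: rng.sample(members, min(n, len(members)))
        for group, members in groups.items()
    }


# Made-up rows for illustration.
people = [
    {"name": "p1", "gender": "F", "state": "CA"},
    {"name": "p2", "gender": "F", "state": "CA"},
    {"name": "p3", "gender": "M", "state": "CA"},
    {"name": "p4", "gender": "M", "state": "NY"},
]
samples = sample_per_group(people, keys=("gender", "state"), n=1)
print(sorted(samples))
```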

Latest Reply

subham0611
New Contributor II

10-27-2023 2:02:07 AM

0 kudos

@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value. But I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...

by alexkit • New Contributor II

04-11-2023 6:07:18 AM

2630 Views
4 replies
3 kudos

ASP1.2 Error create database in Spark Programming with Databricks training

I'm on the Demo and Lab in the Dataframes section. I've imported the dbc into my company cluster and have run "%run ./Includes/Classroom-Setup" successfully. When I run the 1st SQL command %sql CREATE TABLE IF NOT EXISTS events USING parquet OPTIONS (path "/m...

Latest Reply

KDOCKX
New Contributor II

09-07-2023 12:56:48 AM

3 kudos

I had the same issue and solved it like this: In the Includes folder, there is a reset notebook; run the first command, which unmounts all mounted databases. Go back to the ASP 1.2 notebook and run the %run ./Includes/Classroom-Setup code block. Then run ...

by Ram443 • New Contributor III

12-24-2022 5:02:21 PM

39477 Views
9 replies
5 kudos

Resolved! I created a data frame but was not able to see the data

Code to create a data frame:
from pyspark.sql import SparkSession
spark=SparkSession.builder.appName("oracle_queries").master("local[4]")\
    .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()
from pyspark.sql.functions ...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-26-2022 5:33:15 PM

5 kudos

@ramanjaneyulu kancharla can you please select my answer as best answer

by pcriado • New Contributor III

06-07-2023 2:09:05 PM

7075 Views
2 replies
1 kudos

Resolved! Requested array size exceeds VM limit when saving to feature table

Hi, I'm trying to process a small dataset (less than 300 MB) composed of five queries that run with Spark. The end result of those queries is parsed using Python and merged into a data frame. Then I try to write this to a Delta Lake table using featu...

Latest Reply

pcriado
New Contributor III

06-16-2023 5:30:58 AM

1 kudos

Hello, we have recently found that it's my user in particular that causes the memory issue. Two other users in my organization can run the same notebook without problems, but my user consistently consumes all available RAM and crashes the cluster... a...

by etsyal1e2r3 • Honored Contributor

06-03-2023 5:00:16 PM

10346 Views
1 replies
2 kudos

Resolved! Compiling Flattened Dataframe back to Struct Columns

I have a dataframe with this format of columns: [`first.second.third` , `alpha.bravo.test1` , `alpha.bravo.test2`]. I'd like to get an output dataframe of this: [ `first` | `alpha` ] ---------------...
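
The accepted solution is not shown in this listing, so the following is only one possible approach, not the poster's: build a tree from the dotted column names and render it as Spark SQL `named_struct` expressions. `build_tree`, `render`, and `struct_exprs` are hypothetical helpers, and the sketch assumes no column name is also a prefix of another (for example `a` together with `a.b`).

```python
def build_tree(columns):
    """Turn dotted column names into a nested dict; leaves hold the flat name."""
    tree = {}
    for col in columns:
        parts = col.split(".")
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = col
    return tree


def render(node):
    """Render a tree node as a Spark SQL expression string."""
    if isinstance(node, str):
        return f"`{node}`"  # backticks so the dot is read as part of the name
    fields = ", ".join(f"'{name}', {render(child)}" for name, child in node.items())
    return f"named_struct({fields})"


def struct_exprs(columns):
    """One aliased expression per top-level struct column."""
    return [f"{render(child)} AS `{name}`" for name, child in build_tree(columns).items()]


flat_columns = ["first.second.third", "alpha.bravo.test1", "alpha.bravo.test2"]
exprs = struct_exprs(flat_columns)
for e in exprs:
    print(e)
```

With a Spark dataframe `df` holding those flat columns, the expressions could then be passed to `df.selectExpr(*exprs)`.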

Latest Reply

etsyal1e2r3
Honored Contributor

06-05-2023 9:43:01 PM

2 kudos

I have figured out the solution.

by konda1 • New Contributor

05-31-2023 3:27:13 AM

1131 Views
0 replies
0 kudos

Getting Executor lost due to stage failure error on writing data frame to a delta table or any file like parquet or csv or avro

We are working on multiline nested (multilevel). The file is read and flattened using PySpark, and the data frame shows data using the display() method. When saving the same dataframe, it gives an executor lost failure error. For some files it is givi...

by Neil • New Contributor

05-24-2023 5:08:10 AM

5766 Views
1 replies
0 kudos

Saving the Spark dataframe to a Delta table is taking too long

While working on a video analytics task, I need to save image bytes, previously extracted into a Spark dataframe, to a Delta table. I want to overwrite the same Delta table over the course of the complete task, and the size of the input data differs...

Latest Reply

-werners-
Esteemed Contributor III

05-25-2023 12:52:58 AM

0 kudos

Can you check the Spark UI to see where the time is spent? It can be a join, udf, ...

by kll • New Contributor III

05-15-2023 2:13:01 PM

956 Views
0 replies
0 kudos

Spark DataFrame apply Databricks geospatial indexing functions

I have a Spark DataFrame with `h3` hex ids and I am trying to obtain the polygon geometries.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr
from pyspark.databricks.sql.functions import *
from mosaic import enable_m...

by Vishal09k • New Contributor II

04-29-2023 9:45:39 AM

2776 Views
1 replies
3 kudos

Display Command Not showing the Result, Rather giving the Dataframe Schema

Latest Reply

Rishabh-Pandey
Esteemed Contributor

05-01-2023 3:49:18 AM

3 kudos

Hey, can you try your SQL query with this method: select * from (your sql query)

by arw1070 • New Contributor II

04-12-2023 9:06:52 AM

2485 Views
2 replies
0 kudos

Downstream delta live table is unable to read data frame from upstream table

I have been trying to work on implementing delta live tables to a pre-existing workflow. Currently trying to create two tables: appointments_raw and notes_raw, where notes_raw is "downstream" of appointments_raw. Following this as a reference, I'm at...

Latest Reply

Anonymous
Not applicable

04-16-2023 12:09:00 AM

0 kudos

@Anna Wuest : Could you please send me the code snippet here? Thanks.

by afzi • New Contributor II

08-10-2022 10:40:47 PM

2800 Views
1 replies
1 kudos

Pandas DataFrame error when using to_csv

Hi Everyone, I would like to write a pandas DataFrame to /dbfs/FileStore/ using the to_csv method. Usually it would just write the DataFrame to the path described, but it has been giving me "FileNotFoundError: [Errno 2] No such file or directory: '/dbfs/FileStor...
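
One common cause of this error in plain Python, which may or may not be what happened here, is that the parent directory of the target file does not exist yet. The sketch below creates the parent directory before writing; `write_csv_creating_dirs` is a hypothetical helper, and a temporary directory stands in for /dbfs/FileStore/. With pandas, the same `os.makedirs` call could precede `to_csv`.

```python
import csv
import os
import tempfile


def write_csv_creating_dirs(path, header, rows):
    """Create the parent directory if needed, then write the rows as CSV."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)  # no error if it already exists
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Stand-in base directory; the nested folders do not exist before the call.
base = tempfile.mkdtemp()
csv_path = os.path.join(base, "FileStore", "exports", "out.csv")
write_csv_creating_dirs(csv_path, ["id", "value"], [(1, "x"), (2, "y")])
print(os.path.exists(csv_path))
```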

Latest Reply

Avinash_94
New Contributor III

04-14-2023 12:31:19 AM

1 kudos

f = open("/dbfs/mnt/blob/myNames.txt", "r")

by elgeo • Valued Contributor II

02-13-2023 5:07:31 AM

5013 Views
2 replies
0 kudos

Transform SQL Cursor using PySpark in Databricks

We have a Cursor in DB2 which reads in each loop data from 2 tables. At the end of each loop, after inserting the data to a target table, we update records related to each loop in these 2 tables before moving to the next loop. An indicative example i...

Latest Reply

Anonymous
Not applicable

04-10-2023 3:11:21 AM

0 kudos

Hi @ELENI GEORGOUSI, thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answe...
