Data Engineering

Forum Posts

by lrodcon • New Contributor III

12-29-2022 11:27:17 AM

9933 Views
5 replies
4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external Iceberg database from an S3 location using the following command:

df_source = (
    spark.read.format("iceberg")
    .load(source_s3_path)
    .drop(*source_drop_columns)
    .filter(f"{date_column}<='{date_filter}'")
)

B...

Latest Reply

dynofu
New Contributor II

06-10-2023 11:00:48 AM

4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

4 More Replies

by Neil • New Contributor

05-24-2023 5:08:10 AM

5638 Views
1 reply
0 kudos

While trying to save the spark dataframe to delta table is taking too long

While working on a video analytics task, I need to save image bytes, previously extracted into a Spark DataFrame, to a Delta table. I want to overwrite the same Delta table over the course of the complete task, and the size of the input data differs...

Latest Reply

-werners-
Esteemed Contributor III

05-25-2023 12:52:58 AM

0 kudos

Can you check the Spark UI to see where the time is spent? It can be a join, UDF, ...

by kll • New Contributor III

05-15-2023 2:13:01 PM

861 Views
0 replies
0 kudos

Spark DataFrame apply Databricks geospatial indexing functions

I have a Spark DataFrame with `h3` hex IDs and I am trying to obtain the polygon geometries.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, expr
from pyspark.databricks.sql.functions import *
from mosaic import enable_m...

by nirajtanwar • New Contributor

03-16-2023 4:29:00 AM

1984 Views
2 replies
2 kudos

To collect the elements of a SparkDataFrame and coerces them into an R dataframe.

Hello everyone, I am facing a challenge while collecting a Spark DataFrame into an R data frame. I need to do this because I am using the TraMineR algorithm, which is implemented only in R, and I did the data pre-processing in PySpark. I am trying this: event...

Latest Reply

Anonymous
Not applicable

03-17-2023 11:20:04 PM

2 kudos

Hi @Niraj Tanwar, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thank...

1 More Replies

by Bartek • Contributor

01-27-2023 5:07:14 AM

3425 Views
1 reply
1 kudos

Save Spark DataFrame to shape file (.shp format)

Hello, I know how to create a .shp file from a GeoPandas DataFrame using code similar to this, also mentioned on SO:

gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom')
gpd_df.to_file("username/nh.shp")

However, I have .parquet files that I can load...

Latest Reply

Anonymous
Not applicable

03-08-2023 8:14:39 PM

1 kudos

@Bartosz Maciejewski: Spark does not have native support for writing Shapefiles directly. However, you can use a third-party library such as GeoPandas or PyShp to write your Spark DataFrame to a Shapefile. Here's an example of how to use GeoPandas to...

by raghub1 • New Contributor II

05-10-2022 1:12:02 AM

7220 Views
4 replies
3 kudos

Resolved! Writing PySpark DataFrame onto AWS Glue throwing error

I followed the steps in this blog: https://www.linkedin.com/pulse/aws-glue-data-catalog-metastore-databricks-deepak-rajak/ but calling saveAsTable(table_name) raises an error: IllegalArgumentException: Path must be ...

Latest Reply

Anonymous
Not applicable

06-21-2022 9:10:24 AM

3 kudos

Hey @Raghu Bharadwaj Tallapragada, just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you. Thanks!

3 More Replies

by kmckee • New Contributor II

12-21-2022 11:48:51 AM

1085 Views
0 replies
1 kudos

Trouble Displaying Full Size Images from Spark Dataframe

Hi, I have followed this guide (https://learn.microsoft.com/en-us/azure/databricks/_static/notebooks/image-data-source.html) to successfully load some image data into a spark df and display it as a thumbnail. I would like to display a single image fr...

by Mado • Valued Contributor II

12-20-2022 1:01:42 AM

8284 Views
6 replies
2 kudos

Resolved! How to see if condition is True / False for all rows in a DataFrame?

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition. Example dataset:

# Prepare Data
data = [('A', 1),
        ('A', 2),
        ('B', 3)]

# Create DataFrame
columns = ['col_1', 'col_2']
df = spark.createDataF...

Latest Reply

Ajay-Pandey
Esteemed Contributor III

12-20-2022 3:57:33 AM

2 kudos

Hi, you can use the display() or show() function, which will give you the expected results.

5 More Replies

by SIRIGIRI • Contributor

12-17-2022 6:36:08 AM

817 Views
1 reply
1 kudos

medium.com

Sorting in Spark: How do you sort null values first or last among the records in a Spark DataFrame? Please find the answers here: https://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4
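As a rough illustration of what "nulls first" and "nulls last" mean, here is a minimal plain-Python sketch. It is not taken from the linked article, and the helper name `sort_with_nulls` is made up for this example. In PySpark, the equivalent ordering is usually requested with the Column methods `asc_nulls_first()`, `asc_nulls_last()`, `desc_nulls_first()`, and `desc_nulls_last()` inside `orderBy`.

```python
def sort_with_nulls(values, nulls_first=True, descending=False):
    """Sort values, placing None entries at the start or the end.

    Hypothetical helper that mirrors nulls-first / nulls-last ordering;
    it does not call Spark.
    """
    nulls = [v for v in values if v is None]
    present = sorted((v for v in values if v is not None), reverse=descending)
    return nulls + present if nulls_first else present + nulls


sample = [3, None, 1, 2]
nulls_first_result = sort_with_nulls(sample)
nulls_last_result = sort_with_nulls(sample, nulls_first=False)
desc_nulls_last_result = sort_with_nulls(sample, nulls_first=False, descending=True)
```

With a real DataFrame, something like `df.orderBy(F.col("col_2").asc_nulls_last())` would express the same intent, assuming `F` is `pyspark.sql.functions` and `col_2` is a column in your data.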

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-17-2022 10:01:09 PM

1 kudos

Yeah, this is a really good post, keep it up, man.

by Manjusha • New Contributor II

10-13-2022 5:16:00 AM

2275 Views
1 reply
1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1 LTS. I have an R notebook that reads a CSV into an R data frame, does some transformations, and finally converts the result to a Spark DataFrame using the createDataFrame function. After that, when I call the display function on this spark da...

Latest Reply

Anonymous
Not applicable

11-24-2022 10:36:21 PM

1 kudos

Hi @Manjusha Unnikrishnan, great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Otherwise, Bricksters will get back to you soon. Thanks.

by andreas9898 • New Contributor II

10-13-2022 3:23:03 PM

3195 Views
3 replies
5 kudos

Getting error with spark-sftp, no such file

In a Databricks cluster with Scala 2.1.1, I am trying to read a file into a Spark DataFrame using the following code:

val df = spark.read
  .format("com.springml.spark.sftp")
  .option("host", "*")
  .option("username", "*")
  .option("password", "*")...

Latest Reply

Anonymous
Not applicable

11-23-2022 1:32:06 AM

5 kudos

Hi @Andreas P, hope all is well! Just wanted to check in: were you able to resolve your issue, and would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from you. Thanks!

2 More Replies

by gideont • New Contributor III

11-07-2022 8:11:31 PM

3752 Views
2 replies
2 kudos

Resolved! spark sql update really slow

I try to use Spark as much as possible but am experiencing some regression. I am hoping to get some direction on how to use it correctly. I've created a Databricks table using spark.sql:

spark.sql('select * from example_view ') \
    .write \
    .mode('overwr...

Latest Reply

Pat
Honored Contributor III

11-08-2022 12:00:14 AM

2 kudos

Hi @Vincent Doe, updates are available in Delta tables, but under the hood you are updating Parquet files. This means that each update needs to find the file where the records are stored, rewrite that file as a new version, and make the new file the current v...
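To make the cost described in this reply concrete, here is a toy copy-on-write model in plain Python. It is only a sketch under simplifying assumptions: each "file" is an immutable tuple of records, and the names `update_records`, `files`, `predicate`, and `transform` are invented for the illustration. It does not reflect how Delta Lake is actually implemented.

```python
def update_records(files, predicate, transform):
    """Toy copy-on-write update over immutable 'files' (tuples of dict records).

    Returns (new_files, rewritten_count). Any file that holds at least one
    matching record is rebuilt in full; all other files are reused unchanged.
    """
    new_files = []
    rewritten = 0
    for records in files:
        if any(predicate(r) for r in records):
            # A single matching record forces a rewrite of the whole file.
            new_files.append(
                tuple(transform(r) if predicate(r) else r for r in records)
            )
            rewritten += 1
        else:
            # Files without matches are carried over as-is.
            new_files.append(records)
    return new_files, rewritten


files = [
    ({"id": 1, "v": "a"}, {"id": 2, "v": "b"}),
    ({"id": 3, "v": "c"}, {"id": 4, "v": "d"}),
]
new_files, rewritten = update_records(
    files,
    predicate=lambda r: r["id"] == 3,
    transform=lambda r: {**r, "v": "z"},
)
```

In this model, changing one record still rewrites every record stored alongside it, which is why many small updates scattered across many files can be slow.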

1 More Replies

by HariharaSam • Contributor

09-15-2022 4:23:30 AM

1007 Views
0 replies
0 kudos

Converting Rows of Spark Dataframe to List

How to convert the rows of a Spark DataFrame to a list without using Pandas?

Input Spark DataFrame:

Expected output: [['A','B','C'],['1','2','3'],['4','5','6'],['7','8','9']]
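One possible approach, sketched here without Pandas: collect the rows to the driver and prepend the column names. The helper names `rows_to_lists` and `dataframe_to_lists` are hypothetical, and values are rendered as strings only to match the expected output in the question. `df.columns` and `df.collect()` are standard PySpark DataFrame members, and each collected `Row` can be iterated like a tuple. Collecting is only reasonable when the data fits in driver memory.

```python
def rows_to_lists(columns, rows):
    """Return [header, row1, row2, ...] with every value rendered as a string."""
    return [list(columns)] + [[str(value) for value in row] for row in rows]


def dataframe_to_lists(df):
    """Apply rows_to_lists to a PySpark DataFrame (assumes it fits on the driver)."""
    return rows_to_lists(df.columns, df.collect())


# Plain tuples stand in for collected Row objects in this example.
example = rows_to_lists(["A", "B", "C"], [(1, 2, 3), (4, 5, 6), (7, 8, 9)])
```

In a notebook, `dataframe_to_lists(df)` would be called on the actual DataFrame; the tuple-based call above just shows the shape of the result.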

by Rajesh_M • New Contributor III

03-10-2022 8:38:21 AM

4107 Views
3 replies
6 kudos

Resolved! Unable to change the index, when writing to a Azure SQL Data Warehouse

Hi, I have some data in a Spark DataFrame and I am trying to write it to a table in Azure SQL Data Warehouse. If I use df.write.mode(saveMode="overwrite") I get this error: com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column ...

Latest Reply

Rajesh_M
New Contributor III

03-10-2022 10:28:26 AM

6 kudos

Thanks, @Hubert Dudek. Do you know if there is a way to run a CREATE TABLE statement on Azure Synapse / Azure SQL Data Warehouse from Databricks?

2 More Replies

by Santosh09 • New Contributor II

01-18-2022 3:07:25 AM

6499 Views
4 replies
3 kudos

Resolved! Writing Spark data frame to ADLS is taking Huge time when Data Frame is of Text data.

When a Spark DataFrame holds text data and its schema uses a struct type, Spark takes too much time to write, save, or push the data to ADLS or a SQL DB, or to download it as CSV.

Latest Reply

User16764241763
Honored Contributor

03-14-2022 8:27:45 AM

3 kudos

@shiva Santosh Have you checked the count of the DataFrame that you are trying to save to ADLS? As @Joseph Kambourakis mentioned, explode can turn one row into many, so it is better to check the DataFrame count and see whether Spark runs out of memory (OOM) in the workspace.

3 More Replies