cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

lrodcon
by New Contributor III
  • 6630 Views
  • 4 replies
  • 4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external iceberg database from s3 location using the follwing commanddf_source = (spark.read.format("iceberg")   .load(source_s3_path)   .drop(*source_drop_columns)   .filter(f"{date_column}<='{date_filter}'")   )B...

  • 6630 Views
  • 4 replies
  • 4 kudos
Latest Reply
dynofu
New Contributor II
  • 4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

  • 4 kudos
3 More Replies
Neil
by New Contributor
  • 3710 Views
  • 1 replies
  • 0 kudos

While trying to save the spark dataframe to delta table is taking too long

While working on video analytics task I need to save the image bytes to the delta table earlier extracted into the spark dataframe. While I want to over write a same delta table over the period of complete task and also the size of input data differs...

  • 3710 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

can you check the spark UI, to see where the time is spent?It can be a join, udf, ...

  • 0 kudos
kll
by New Contributor III
  • 588 Views
  • 0 replies
  • 0 kudos

Spark DataFrame apply Databricks geospatial indexing functions

I have a spark DataFrame with `h3` hex ids and I am trying to obtain the polygon geometries. from pyspark.sql import SparkSession from pyspark.sql.functions import col, expr from pyspark.databricks.sql.functions import *   from mosaic import enable_m...

  • 588 Views
  • 0 replies
  • 0 kudos
nirajtanwar
by New Contributor
  • 1312 Views
  • 2 replies
  • 2 kudos

To collect the elements of a SparkDataFrame and coerces them into an R dataframe.

Hello Everyone,I am facing the challenge while collecting a spark dataframe into an R dataframe, this I need to do as I am using TraMineR algorithm whih is implemented in R only and the data pre-processing I have done in pysparkI am trying this:event...

  • 1312 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Niraj Tanwar​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thank...

  • 2 kudos
1 More Replies
Bartek
by Contributor
  • 2440 Views
  • 1 replies
  • 1 kudos

Save Spark DataFrame to shape file (.shp format)

Hello,I know how to create .shp file from Geopandas dataframe using code similar to this, also mentioned on SO:gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom') gpd_df .to_file("username/nh.shp")However I have .parquet files that I can load...

  • 2440 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Bartosz Maciejewski​ :Spark does not have native support for writing Shapefiles directly. However, you can use a third-party library such as GeoPandas or PyShp to write your Spark DataFrame to a Shapefile.Here's an example of how to use GeoPandas to...

  • 1 kudos
raghub1
by New Contributor II
  • 5400 Views
  • 5 replies
  • 3 kudos

Resolved! Writing PySpark DataFrame onto AWS Glue throwing error

I have followed the steps as mentioned in this blog : https://www.linkedin.com/pulse/aws-glue-data-catalog-metastore-databricks-deepak-rajak/ but when trying to saveAsTable(table_name), it is giving an error as IllegalArgumentException: Path must be ...

  • 5400 Views
  • 5 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey @Raghu Bharadwaj Tallapragada​ Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 3 kudos
4 More Replies
kmckee
by New Contributor II
  • 795 Views
  • 0 replies
  • 1 kudos

Trouble Displaying Full Size Images from Spark Dataframe

Hi, I have followed this guide (https://learn.microsoft.com/en-us/azure/databricks/_static/notebooks/image-data-source.html) to successfully load some image data into a spark df and display it as a thumbnail. I would like to display a single image fr...

  • 795 Views
  • 0 replies
  • 1 kudos
Mado
by Valued Contributor II
  • 5319 Views
  • 6 replies
  • 2 kudos

Resolved! How to see if condition is True / False for all rows in a DataFrame?

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.Example dataset:# Prepare Data data = [('A', 1), \ ('A', 2), \ ('B', 3) ]   # Create DataFrame columns= ['col_1', 'col_2'] df = spark.createDataF...

image image
  • 5319 Views
  • 6 replies
  • 2 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi you can use display() or show() function that will provide you expected results.

  • 2 kudos
5 More Replies
SIRIGIRI
by Contributor
  • 566 Views
  • 1 replies
  • 1 kudos

medium.com

Sorting In Spark**How to sort null values First and last of the records in the Spark data frame?Please find the answershttps://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

  • 566 Views
  • 1 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yeah this is really good post,keep it up Man

  • 1 kudos
Manjusha
by New Contributor II
  • 1647 Views
  • 1 replies
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1LTSI have a R notebook that reads a csv into a R dataframe and does some transformations and finally is converted to spark dataframe using the createDataFrame function.after that when I call the display function on this spark da...

  • 1647 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks.

  • 1 kudos
andreas9898
by New Contributor II
  • 2472 Views
  • 3 replies
  • 5 kudos

Getting error with spark-sftp, no such file

In a databricks cluster with Scala 2.1.1 I am trying to read a file into a spark data frame using the following code.val df = spark.read .format("com.springml.spark.sftp") .option("host", "*") .option("username", "*") .option("password", "*")...

  • 2472 Views
  • 3 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Andreas P​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 5 kudos
2 More Replies
gideont
by New Contributor III
  • 2550 Views
  • 3 replies
  • 2 kudos

Resolved! spark sql update really slow

I tried to use Spark as much as possible but experience some regression. Hopefully to get some direction how to use it correctly.I've created a Databricks table using spark.sqlspark.sql('select * from example_view ') \ .write \ .mode('overwr...

image.png
  • 2550 Views
  • 3 replies
  • 2 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 2 kudos

Hi @Vincent Doe​ ​, It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster.This makes that answer appear right after the question, so it's easier to find within a thread.It also helps us mark the quest...

  • 2 kudos
2 More Replies
suman9872
by New Contributor II
  • 1516 Views
  • 1 replies
  • 1 kudos

How to dynamically convert Spark DataFrame to Nested json using Spark Scala

I want to convert the DataFrame to nested json. Sourse Data:-DataFrame have data value like :- As image 2 Expected Output:-I have to convert DataFrame value to Nested Json like : -As image 1Appreciate your help !

  • 1516 Views
  • 1 replies
  • 1 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 1 kudos

Hi @Suman Mishra​, This article explains how to convert a flattened DataFrame to a nested structure by nesting a case class within another case class.You can use this technique to build a JSON file that can then be sent to an external API.

  • 1 kudos
Rajesh_M
by New Contributor III
  • 3068 Views
  • 4 replies
  • 6 kudos

Resolved! Unable to change the index, when writing to a Azure SQL Data Warehouse

Hi,I have some data in a spark data frame and I am trying to write it to a table in Azure SQL Data Warehouse. If I use df.write.mode(saveMode="overwrite") I get this error:com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column ...

  • 3068 Views
  • 4 replies
  • 6 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 6 kudos

Hi @Rajesh Mangipudi​ , Just a friendly follow-up. Do you still need help, or @Hubert Dudek (Customer)​ 's response help you to find the solution? Please let us know.

  • 6 kudos
3 More Replies
Labels