cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Forum Posts

lrodcon
by New Contributor III
  • 5157 Views
  • 4 replies
  • 4 kudos

Read external iceberg table in a spark dataframe within databricks

I am trying to read an external iceberg database from s3 location using the follwing commanddf_source = (spark.read.format("iceberg")   .load(source_s3_path)   .drop(*source_drop_columns)   .filter(f"{date_column}<='{date_filter}'")   )B...

  • 5157 Views
  • 4 replies
  • 4 kudos
Latest Reply
dynofu
New Contributor II
  • 4 kudos

https://issues.apache.org/jira/browse/SPARK-41344

  • 4 kudos
3 More Replies
Neil
by New Contributor
  • 997 Views
  • 1 replies
  • 0 kudos

While trying to save the spark dataframe to delta table is taking too long

While working on video analytics task I need to save the image bytes to the delta table earlier extracted into the spark dataframe. While I want to over write a same delta table over the period of complete task and also the size of input data differs...

  • 997 Views
  • 1 replies
  • 0 kudos
Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

can you check the spark UI, to see where the time is spent?It can be a join, udf, ...

  • 0 kudos
kll
by New Contributor III
  • 428 Views
  • 0 replies
  • 0 kudos

Spark DataFrame apply Databricks geospatial indexing functions

I have a spark DataFrame with `h3` hex ids and I am trying to obtain the polygon geometries. from pyspark.sql import SparkSession from pyspark.sql.functions import col, expr from pyspark.databricks.sql.functions import *   from mosaic import enable_m...

  • 428 Views
  • 0 replies
  • 0 kudos
nirajtanwar
by New Contributor
  • 882 Views
  • 2 replies
  • 2 kudos

To collect the elements of a SparkDataFrame and coerces them into an R dataframe.

Hello Everyone,I am facing the challenge while collecting a spark dataframe into an R dataframe, this I need to do as I am using TraMineR algorithm whih is implemented in R only and the data pre-processing I have done in pysparkI am trying this:event...

  • 882 Views
  • 2 replies
  • 2 kudos
Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @Niraj Tanwar​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thank...

  • 2 kudos
1 More Replies
Bartek
by Contributor
  • 1905 Views
  • 1 replies
  • 1 kudos

Save Spark DataFrame to shape file (.shp format)

Hello,I know how to create .shp file from Geopandas dataframe using code similar to this, also mentioned on SO:gpd_df = geopandas.GeoDataFrame(pandas_df, geometry='geom') gpd_df .to_file("username/nh.shp")However I have .parquet files that I can load...

  • 1905 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

@Bartosz Maciejewski​ :Spark does not have native support for writing Shapefiles directly. However, you can use a third-party library such as GeoPandas or PyShp to write your Spark DataFrame to a Shapefile.Here's an example of how to use GeoPandas to...

  • 1 kudos
raghub1
by New Contributor II
  • 4375 Views
  • 5 replies
  • 3 kudos

Resolved! Writing PySpark DataFrame onto AWS Glue throwing error

I have followed the steps as mentioned in this blog : https://www.linkedin.com/pulse/aws-glue-data-catalog-metastore-databricks-deepak-rajak/ but when trying to saveAsTable(table_name), it is giving an error as IllegalArgumentException: Path must be ...

  • 4375 Views
  • 5 replies
  • 3 kudos
Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hey @Raghu Bharadwaj Tallapragada​ Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you.Thanks!

  • 3 kudos
4 More Replies
kmckee
by New Contributor II
  • 603 Views
  • 0 replies
  • 1 kudos

Trouble Displaying Full Size Images from Spark Dataframe

Hi, I have followed this guide (https://learn.microsoft.com/en-us/azure/databricks/_static/notebooks/image-data-source.html) to successfully load some image data into a spark df and display it as a thumbnail. I would like to display a single image fr...

  • 603 Views
  • 0 replies
  • 1 kudos
Mado
by Valued Contributor II
  • 3779 Views
  • 6 replies
  • 2 kudos

Resolved! How to see if condition is True / False for all rows in a DataFrame?

Assume that I have a Spark DataFrame, and I want to see if records satisfy a condition.Example dataset:# Prepare Data data = [('A', 1), \ ('A', 2), \ ('B', 3) ]   # Create DataFrame columns= ['col_1', 'col_2'] df = spark.createDataF...

image image
  • 3779 Views
  • 6 replies
  • 2 kudos
Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 2 kudos

Hi you can use display() or show() function that will provide you expected results.

  • 2 kudos
5 More Replies
SIRIGIRI
by Contributor
  • 361 Views
  • 1 replies
  • 1 kudos

medium.com

Sorting In Spark**How to sort null values First and last of the records in the Spark data frame?Please find the answershttps://medium.com/@sharikrishna26/sorting-in-spark-a57db245ecd4

  • 361 Views
  • 1 replies
  • 1 kudos
Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 1 kudos

Yeah this is really good post,keep it up Man

  • 1 kudos
Manjusha
by New Contributor II
  • 1338 Views
  • 1 replies
  • 1 kudos

SocketTimeout exception when running a display command on spark dataframe

I am using runtime 9.1LTSI have a R notebook that reads a csv into a R dataframe and does some transformations and finally is converted to spark dataframe using the createDataFrame function.after that when I call the display function on this spark da...

  • 1338 Views
  • 1 replies
  • 1 kudos
Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Manjusha Unnikrishnan​ Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question first. Or else bricksters will get back to you soon. Thanks.

  • 1 kudos
andreas9898
by New Contributor II
  • 1940 Views
  • 3 replies
  • 5 kudos

Getting error with spark-sftp, no such file

In a databricks cluster with Scala 2.1.1 I am trying to read a file into a spark data frame using the following code.val df = spark.read .format("com.springml.spark.sftp") .option("host", "*") .option("username", "*") .option("password", "*")...

  • 1940 Views
  • 3 replies
  • 5 kudos
Latest Reply
Anonymous
Not applicable
  • 5 kudos

Hi @Andreas P​ Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. We'd love to hear from you.Thanks!

  • 5 kudos
2 More Replies
gideont
by New Contributor III
  • 1895 Views
  • 3 replies
  • 2 kudos

Resolved! spark sql update really slow

I tried to use Spark as much as possible but experience some regression. Hopefully to get some direction how to use it correctly.I've created a Databricks table using spark.sqlspark.sql('select * from example_view ') \ .write \ .mode('overwr...

image.png
  • 1895 Views
  • 3 replies
  • 2 kudos
Latest Reply
Kaniz
Community Manager
  • 2 kudos

Hi @Vincent Doe​ â€‹, It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster.This makes that answer appear right after the question, so it's easier to find within a thread.It also helps us mark the quest...

  • 2 kudos
2 More Replies
suman9872
by New Contributor II
  • 1183 Views
  • 1 replies
  • 1 kudos

How to dynamically convert Spark DataFrame to Nested json using Spark Scala

I want to convert the DataFrame to nested json. Sourse Data:-DataFrame have data value like :- As image 2 Expected Output:-I have to convert DataFrame value to Nested Json like : -As image 1Appreciate your help !

  • 1183 Views
  • 1 replies
  • 1 kudos
Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Suman Mishra​, This article explains how to convert a flattened DataFrame to a nested structure by nesting a case class within another case class.You can use this technique to build a JSON file that can then be sent to an external API.

  • 1 kudos
Rajesh_M
by New Contributor III
  • 2335 Views
  • 4 replies
  • 6 kudos

Resolved! Unable to change the index, when writing to a Azure SQL Data Warehouse

Hi,I have some data in a spark data frame and I am trying to write it to a table in Azure SQL Data Warehouse. If I use df.write.mode(saveMode="overwrite") I get this error:com.microsoft.sqlserver.jdbc.SQLServerException: The statement failed. Column ...

  • 2335 Views
  • 4 replies
  • 6 kudos
Latest Reply
Kaniz
Community Manager
  • 6 kudos

Hi @Rajesh Mangipudi​ , Just a friendly follow-up. Do you still need help, or @Hubert Dudek (Customer)​ 's response help you to find the solution? Please let us know.

  • 6 kudos
3 More Replies
Labels