Writing PySpark DataFrame onto AWS Glue throwing error

raghub1
New Contributor II

I followed the steps in this blog post: https://www.linkedin.com/pulse/aws-glue-data-catalog-metastore-databricks-deepak-rajak/

However, when I call saveAsTable(table_name), it fails with this error:

IllegalArgumentException: Path must be absolute: <table_name>-__PLACEHOLDER__.

Can somebody help me with this, please?

1 ACCEPTED SOLUTION

Prabakar
Esteemed Contributor III

Judging by the error message, I believe the problem is that the Glue database has no location, which DBR/Delta requires. You can run ALTER DATABASE `datalake-processed` SET LOCATION '...' or set the location directly in the Glue console on AWS.

https://docs.databricks.com/data/metastores/aws-glue-metastore.html#creating-a-table-in-a-database-w...
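
As a rough illustration of the first option, the sketch below only builds the SQL statement. The helper name build_set_location_sql and the bucket example-bucket are hypothetical, and it assumes your runtime and metastore accept the ALTER DATABASE ... SET LOCATION form; if they do not, setting the location in the Glue console remains the alternative.

```python
def build_set_location_sql(database: str, location: str) -> str:
    """Return an ALTER DATABASE ... SET LOCATION statement for Spark SQL.

    Hypothetical helper for illustration; it is not part of any library.
    """
    # The database location must be an absolute URI, not a bare relative name.
    if not location.startswith(("s3://", "s3a://")):
        raise ValueError(f"expected an absolute S3 URI, got {location!r}")
    if "'" in location:
        raise ValueError("location must not contain single quotes")
    # Backticks are needed because names such as datalake-processed contain a hyphen.
    quoted_db = "`" + database.replace("`", "``") + "`"
    return f"ALTER DATABASE {quoted_db} SET LOCATION '{location}'"


# example-bucket is an assumed bucket name; substitute your own.
statement = build_set_location_sql(
    "datalake-processed", "s3://example-bucket/datalake-processed/"
)
print(statement)

# In a Databricks notebook, where `spark` is predefined, this could then be run as:
# spark.sql(statement)
```

Changing the location only affects where new tables in the database are placed by default, so it may be worth retrying saveAsTable afterwards rather than expecting existing data to move.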

5 REPLIES

raghub1
New Contributor II

Thanks Prabakar. I tried specifying an S3 path explicitly, but it still doesn't work. This is the code I used:

df_final.write.partitionBy("partition_cols").mode("append").option("path", "s3:// location").saveAsTable("table_name")

Can you help me out, please?

Kaniz
Community Manager

Hi @Raghu Bharadwaj Tallapragada, can you paste the full error stack trace here?

prabhatika
New Contributor II

@Kaniz Fatma

I am facing the same issue when using the `saveAsTable` function of DataFrameWriter. Here is the code snippet:

import org.apache.spark.sql.functions.{col, dayofmonth, month, to_date, year}
import org.apache.spark.sql.types.DataTypes

// Placeholders from the original post: substitute your own DataFrame, table name, path and keys.
val df = some-dataframe-here
val glueTableName = "database-name-here.table-name-here"
val s3Path = "s3a://some/path/here/"
val partitionKeys = Array("some-partition-key-here")

// Derive year/month/day partition columns from the createdAt column.
val dataframeWithYearMonthDay = df
  .withColumn("year", year(to_date(col("createdAt"))).cast(DataTypes.FloatType))
  .withColumn("month", month(to_date(col("createdAt"))).cast(DataTypes.FloatType))
  .withColumn("day", dayofmonth(to_date(col("createdAt"))).cast(DataTypes.FloatType))

// Append to the Glue-backed table as Parquet, with the data stored under s3Path.
dataframeWithYearMonthDay.write
  .partitionBy(List("year", "month", "day") ++ partitionKeys: _*)
  .mode("append")
  .format("parquet")
  .option("path", s3Path)
  .saveAsTable(glueTableName)

The stack trace is attached. Note that the given S3 location is completely empty and I am trying to create a new table there.

Also, I see this issue with only one table; writes to other tables work fine.

Please let me know if you need any other information from me.
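
Since only one table fails, one thing that may be worth checking is whether that table's Glue database has a location at all, as the accepted answer suggests. The sketch below is a minimal example under that assumption: glue_database_location is a hypothetical helper, the sample responses are made up, and they are only shaped like what boto3's Glue get_database call returns.

```python
from typing import Optional


def glue_database_location(response: dict) -> Optional[str]:
    """Return LocationUri from a Glue GetDatabase response, or None if unset or empty."""
    location = response.get("Database", {}).get("LocationUri")
    return location or None


# Made-up responses for illustration, shaped like glue.get_database(Name=...) output.
with_location = {"Database": {"Name": "db_ok", "LocationUri": "s3://example-bucket/db_ok/"}}
without_location = {"Database": {"Name": "db_broken"}}

print(glue_database_location(with_location))
print(glue_database_location(without_location))

# With AWS credentials configured, a live check might look like this:
# import boto3
# response = boto3.client("glue").get_database(Name="database-name-here")
# print(glue_database_location(response))
```

If the failing table's database comes back with no location while the working ones have one, that would be consistent with the diagnosis in the accepted answer.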

Anonymous
Not applicable

Hey @Raghu Bharadwaj Tallapragada,

Just checking in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.

Thanks!
