What is the difference between registerTempTable() and saveAsTable()?

cfregly
Contributor
 

cfregly
Contributor

registerTempTable()

registerTempTable() creates an in-memory table that is scoped to the cluster in which it was created. The data is stored using Hive's highly optimized in-memory columnar format.

This is important for dashboards, because a dashboard running in a different cluster (i.e., the single Dashboard Cluster) will not have access to temp tables registered in another cluster.

Re-registering a temp table of the same name (using overwrite=true) but with new data causes an atomic memory pointer switch, so the new data is seamlessly updated and immediately accessible for querying (e.g., from a Dashboard).
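As a rough illustration of the re-registration behaviour described above, here is a minimal sketch. It assumes Spark 2.0 or later, where registerTempTable() is deprecated in favour of createOrReplaceTempView(); the object name, the sample rows, and the table name `events_tmp` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object TempTableSketch {
  // Registers two different DataFrames under the same temp name and
  // returns the row count seen by a query after each registration.
  def countAfterEachRegistration(spark: SparkSession): (Long, Long) = {
    import spark.implicits._

    // Hypothetical sample data.
    val firstBatch = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    firstBatch.createOrReplaceTempView("events_tmp")
    val firstCount = spark.sql("SELECT * FROM events_tmp").count()

    // Registering again under the same name replaces the old definition,
    // so later queries against `events_tmp` see the new data.
    val secondBatch = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")
    secondBatch.createOrReplaceTempView("events_tmp")
    val secondCount = spark.sql("SELECT * FROM events_tmp").count()

    (firstCount, secondCount)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a Databricks notebook already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("temp-table-sketch")
      .getOrCreate()
    println(countAfterEachRegistration(spark))
    spark.stop()
  }
}
```

The temp view exists only inside the session that registered it, which fits the point above that other clusters cannot query it.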

saveAsTable()

saveAsTable() creates a permanent, physical table stored in S3 using the Parquet format. This table is accessible to all clusters including the dashboard cluster. The table metadata including the location of the file(s) is stored within the Hive metastore.

Re-creating a permanent table of the same name (using overwrite=true) but with new data causes the old data to be deleted and the new data to be saved in the same underlying file on S3. This may lead to moments when the data is not available, due to S3's eventual consistency model. There are ongoing improvements to reduce this downtime, however.
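For comparison, a permanent table could be written and overwritten roughly as follows. This is a sketch under assumptions, not the exact call from the post: it uses the DataFrameWriter API from later Spark releases (SparkSession needs Spark 2.0+), and the object name, table name `events_saved`, and sample rows are hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object SaveAsTableSketch {
  // Writes a small DataFrame as a Parquet-backed table, replacing any
  // existing table of the same name, then reads it back by name.
  def overwriteAndCount(spark: SparkSession): Long = {
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("a", 1), ("b", 2)).toDF("key", "value")

    df.write
      .mode(SaveMode.Overwrite) // drop the old data, then write the new data
      .format("parquet")
      .saveAsTable("events_saved")

    // The table is resolved through the catalog, not through a temp registration.
    spark.table("events_saved").count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a Databricks notebook already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("save-as-table-sketch")
      .getOrCreate()
    println(overwriteAndCount(spark))
    spark.stop()
  }
}
```

Because the overwrite deletes before it writes, a reader querying `events_saved` mid-write may briefly see missing data, which is the window the paragraph above describes.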

I'm an extreme beginner with Spark, so I'm probably missing something big. Using saveAsTable(), how can I specify where to store the Parquet file(s) in S3? saveAsTable() accepts only a table name and saves the data in DBFS at this location: /user/hive/warehouse/. I have already mounted S3 with dbutils.fs.mount at /mnt/lake. Thanks

Anonymous
Not applicable

@Claudio Beretta - you are likely looking for the saveAsParquet() operation. You can find out more about that and other operations in the API documentation for SchemaRDDs.

One important note: SchemaRDD will be changed to DataFrame in an upcoming release.

Thanks @Pat McDonough, I tried to use saveAsParquet(s"s3n://...") earlier, but it failed with "java.lang.RuntimeException: Unsupported datatype TimestampType".

As for saveAsTable(), I liked that it persists the data and registers the table at the same time. If only it could save to S3, as the answer states, it would be perfect for what I was trying to do.
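For readers on a later Spark release, one possible way to get both behaviours at once is the `path` option of DataFrameWriter: when a path is supplied, saveAsTable() writes the files to that location and registers a table that points at it. The sketch below assumes Spark 2.0+; the object name, table name `events_lake`, sample rows, and the `events_lake` subdirectory under the /mnt/lake mount are all hypothetical.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

object ExternalTableSketch {
  // Saves a DataFrame as Parquet under `path` and registers it as a table
  // in one step, then reads it back through the table name.
  def saveToPath(spark: SparkSession, path: String): Long = {
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("a", 1), ("b", 2), ("c", 3)).toDF("key", "value")

    df.write
      .mode(SaveMode.Overwrite)
      .format("parquet")
      .option("path", path) // files land here instead of the default warehouse directory
      .saveAsTable("events_lake")

    spark.table("events_lake").count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a Databricks notebook already provides `spark`.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("external-table-sketch")
      .getOrCreate()
    // Assumed default: a directory under the mount mentioned in the thread.
    val target = args.headOption.getOrElse("/mnt/lake/events_lake")
    println(saveToPath(spark, target))
    spark.stop()
  }
}
```

Whether this is available depends on the Spark version in use; the SchemaRDD API discussed in this thread predates DataFrameWriter.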
