Topics with Label: Spark DataFrames

Forum Posts

by THIAM_HUATTAN • Valued Contributor

06-29-2022 5:42:51 AM

2036 Views
2 replies
0 kudos

Resolved! Save data from Spark DataFrames to TFRecords

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/deep-learning/tfrecords-save-load.html I could not run Cell #2: java.lang.ClassNotFoundException: Py4JJ...

Data Engineering

Latest Reply

jose_gonzalez
Databricks Employee

07-05-2022 10:47:39 AM

0 kudos

Hi @THIAM HUAT TAN, which DBR version are you using? Are you using the ML runtime?

by Raie • New Contributor III

03-18-2022 10:24:34 AM

9798 Views
3 replies
4 kudos

Resolved! How do I specify a column's data type with Spark DataFrames?

What I am doing: spark_df = spark.createDataFrame(dfnew); spark_df.write.saveAsTable("default.test_table", index=False, header=True). This automatically detects the data types and is working right now. BUT, what if the data type cannot be detected or detect...

Data Engineering

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-20-2022 7:47:29 AM

4 kudos

Just create the table beforehand and set the column types (CREATE TABLE ... LOCATION ( path path). In the DataFrame you need to have the corresponding data types, which you can produce using cast syntax; your syntax is just incorrect. Here is an example of the correct syntax: from p...
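The reply's own example is cut off in this listing, so what follows is not that example. It is only a minimal sketch of the casting idea, assuming hypothetical column names (`id`, `price`, `created`) and target types. The helper builds plain SQL cast expressions and passes them to `DataFrame.selectExpr`; both helper names are made up for illustration.

```python
def cast_exprs(column_types):
    """Build one SQL cast expression per column, keeping the column name."""
    return [
        f"CAST(`{name}` AS {sql_type}) AS `{name}`"
        for name, sql_type in column_types.items()
    ]


def with_explicit_types(spark_df, column_types):
    """Return a DataFrame with the listed columns cast to the given types.

    Note: only the columns named in column_types are selected, so any
    column left out of the mapping is dropped from the result.
    """
    # selectExpr takes SQL expression strings, one per output column.
    return spark_df.selectExpr(*cast_exprs(column_types))


# Hypothetical column names and Spark SQL types, for illustration only.
TARGET_TYPES = {"id": "bigint", "price": "double", "created": "date"}

# Usage in a notebook where `spark_df` already exists:
# typed_df = with_explicit_types(spark_df, TARGET_TYPES)
# typed_df.write.saveAsTable("default.test_table")
```

An alternative under the same assumptions is to pass a schema when building the DataFrame, for example `spark.createDataFrame(dfnew, schema="id bigint, price double, created date")`, so the types are fixed before anything is written.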

by Jeff1 • Contributor II

03-14-2022 7:34:01 AM

2755 Views
3 replies
5 kudos

Resolved! Understand Spark DataFrames versus R DataFrames

Community, I’ve been struggling with using the R language in Databricks, and after reading “Mastering Spark with R,” I believe my initial problems stemmed from not understanding the difference between Spark DataFrames and R DataFrames within the databric...

Data Engineering

Latest Reply

Hubert-Dudek
Esteemed Contributor III

03-14-2022 8:18:41 AM

5 kudos

Because Spark DataFrames are processed in a distributed way on the workers, it is better to use only Spark DataFrames. Additionally, collect is executed on the driver and loads the whole dataset into its memory, so it shouldn't be used in production.

Databricks Community
