I am trying to use SQL, but createOrReplaceTempView("myDataView") fails
07-13-2020 04:59 AM
I can create and display a DataFrame without any problem:
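```python
import pandas as pd

# A pandas DataFrame with a single string column of currency amounts
df = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'], columns=['Amount'])
# In a notebook, a bare expression on the last line of a cell displays it
df
```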
Then I add another cell, but it fails:
```python
df.createOrReplaceTempView("myDataView")
```
I get this error:
```
'DataFrame' object has no attribute 'createOrReplaceTempView'
```
I see this example online a lot, but I don't understand why it fails for me. I am using Community Edition, runtime 6.5 (includes Apache Spark 2.4.5, Scala 2.11).
07-15-2020 05:09 AM
I have never worked with pandas on Spark, but a pandas DataFrame is not the same as a Spark DataFrame.
You need to convert it to a Spark DataFrame first, for example with Koalas: https://koalas.readthedocs.io/en/latest/user_guide/pandas_pyspark.html#pyspark
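A quick way to confirm this diagnosis, reusing the DataFrame from the question: the object is a plain pandas DataFrame, and it simply has no `createOrReplaceTempView` method, which is exactly what the AttributeError says. This sketch needs only pandas; the data and the name `df` are taken from the question.

```python
import pandas as pd

# Same DataFrame as in the question
df = pd.DataFrame(['$3,000,000.00', '$3,000.00', '$200.5', '$5.5'], columns=['Amount'])

# This is a pandas object, not a pyspark.sql.DataFrame
print(type(df))  # <class 'pandas.core.frame.DataFrame'>

# pandas DataFrames do not provide Spark's temp-view method
print(hasattr(df, "createOrReplaceTempView"))  # False
```

Only a Spark DataFrame (for example, one returned by `spark.createDataFrame(...)`) can be registered as a temp view and queried with SQL.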
07-15-2020 12:15 PM
You need to convert the pandas DataFrame to a Spark DataFrame. Enabling Apache Arrow makes this conversion faster.
From https://docs.databricks.com/spark/latest/spark-sql/spark-pandas.html:
```python
import numpy as np
import pandas as pd
# Enable Arrow-based columnar data transfers
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
# Generate a pandas DataFrame
pdf = pd.DataFrame(np.random.rand(100, 3))
# Create a Spark DataFrame from a pandas DataFrame using Arrow
df = spark.createDataFrame(pdf)
# Convert the Spark DataFrame back to a pandas DataFrame using Arrow
result_pdf = df.select("*").toPandas()
```
07-01-2021 10:10 AM
This worked for me. Thank you @acorson