12-24-2022 05:02 PM
Code to create a DataFrame:
from datetime import datetime, date

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, year
from pyspark.sql.types import IntegerType, StructType, StructField, StringType

# Build a local session with an explicit Hive warehouse directory
spark = SparkSession.builder.appName("oracle_queries").master("local[4]") \
    .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()

# Create an RDD of (name, location) tuples and convert it to a DataFrame
sc = SparkContext.getOrCreate()
rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
schema = StructType([StructField("name", StringType(), True),
                     StructField("loc", StringType(), True)])
df = spark.createDataFrame(rdd, schema=schema)
Calling df.show() to view the data raises the error below:
df.show()
Error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-2-1a6ce2362cd4> in <module>
----> 1 df.show()
c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
438 """
439 if isinstance(truncate, bool) and truncate:
--> 440 print(self._jdf.showString(n, 20, vertical))
441 else:
442 print(self._jdf.showString(n, int(truncate), vertical))
c:\program files (x86)\python38-32\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
--> 128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)
c:\program files (x86)\python38-32\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o48.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-N3C4AUC.attlocal.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "C:\pyspark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\worker.py", line 668, in main
...
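On Windows, a PythonException raised from worker.py like this usually means the Python worker processes could not launch the right interpreter. A minimal sketch of the common workaround (the interpreter path below is taken from the traceback and must match your own install), set before the session is first created:
import os

# Point both the driver and the workers at the same Python interpreter;
# set these before SparkSession.builder...getOrCreate() is first called
os.environ["PYSPARK_PYTHON"] = r"C:\Program Files (x86)\Python38-32\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Program Files (x86)\Python38-32\python.exe"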
12-24-2022 07:21 PM
There can be three causes:
1. You are initializing the SparkContext twice.
2. Something is wrong with your Spark config.
3. It can be a Python version issue.
I checked, and the same code works perfectly fine for me:
from datetime import datetime, date

from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, year
from pyspark.sql.types import IntegerType, StructType, StructField, StringType

# The original snippet used `spark` without creating it; get or create the session first
spark = SparkSession.builder.getOrCreate()
sc = SparkContext.getOrCreate()
rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
schema = StructType([StructField("name", StringType(), True),
                     StructField("loc", StringType(), True)])
df = spark.createDataFrame(rdd, schema=schema)
# Viewing the data works without error on my setup
df.show()
I am attaching an image as well.
Please select this as the best answer if it helps.
Thanks
Aviral Bhardwaj
12-24-2022 09:04 PM
Thank you, Bhardwaj, for checking the issue.
Below is the configuration I am using; please let me know if you need more details.
Reading a CSV works, but I face the issue when I create a DataFrame manually.
Spark is installed on my local system.
Configuration:
[('spark.app.name', 'oracle_queries'),
('spark.driver.extraJavaOptions',
'-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
('spark.master', 'local[4]'),
('spark.driver.port', '50219'),
('spark.app.startTime', '1671929660604'),
('spark.executor.id', 'driver'),
('spark.app.id', 'local-1671929663025'),
('spark.sql.warehouse.dir', 'file:/C:/softwares/git/pyspark/hive'),
('spark.sql.catalogImplementation', 'hive'),
('spark.rdd.compress', 'True'),
('spark.executor.extraJavaOptions',
'-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
('spark.app.submitTime', '1671929660216'),
('spark.serializer.objectStreamReset', '100'),
('spark.submit.pyFiles', ''),
('spark.submit.deployMode', 'client'),
('spark.driver.host', 'DESKTOP-test.attlocal.net'),
('spark.ui.showConsoleProgress', 'true')]
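For reference, a listing like this can be pulled from a live session with the standard SparkConf API; a minimal sketch:
# Print every (key, value) pair of the session's current configuration
for key, value in spark.sparkContext.getConf().getAll():
    print(key, "=", value)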
CSV code:
source_path = "C:\\softwares\\git\\pyspark\\source\\emp.csv"
# Reading a CSV with a header row and schema inference works fine
emp = spark.read.csv(source_path, header=True, inferSchema=True)
emp.show()
Output:
+-----+------+---------+----+-----------+----+----+------+
|empno| ename| job| mgr| hiredate| sal|comm|deptno|
+-----+------+---------+----+-----------+----+----+------+
| 7369| SMITH| CLERK|7902|17-DEC-1980| 800|null| 20|
| 7499| ALLEN| null|7698|20-FEB-1981|1600| 300| 30|
| 7521| WARD| SALESMAN|7698|22-FEB-1981|1250| 500| 30|
| 7566| JONES| MANAGER|7839| 2-APR-1981|2975|null| 20|
| 7654|MARTIN| SALESMAN|7698|28-SEP-1981|1250|1400| 30|
| 7698| BLAKE| MANAGER|7839| 1-MAY-1981|2850| 0| 30|
| 7782| CLARK| MANAGER|7839| 9-JUN-1981|2450| 0| 10|
| 7788| SCOTT| ANALYST|7566|09-DEC-1982|3000| 0| 20|
| 7839| KING|PRESIDENT| 0|17-NOV-1981|5000| 0| 10|
| 7844|TURNER| SALESMAN|7698| 8-SEP-1981|1500| 0| 30|
| 7876| ADAMS| CLERK|7788|12-JAN-1983|1100| 0| 20|
| 7900| JAMES| CLERK|7698| 3-DEC-1981| 950| 0| 30|
| 7902| FORD| ANALYST|7566| 3-DEC-1981|3000| 0| 20|
| 7934|MILLER| CLERK|7782|23-JAN-1982|1300| 0| 10|
+-----+------+---------+----+-----------+----+----+------+
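That contrast is telling: spark.read.csv runs entirely inside the JVM, while sc.parallelize ships pickled Python objects to separate Python worker processes, so a broken worker setup only surfaces in the second case. As a quick check under that assumption, a DataFrame that never leaves the JVM should also display fine (spark.range is a standard API):
# JVM-only DataFrame; if this shows while the parallelize-based one fails,
# the problem is confined to the Python worker processes
spark.range(5).show()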
12-25-2022 08:42 PM
Remove the two configuration entries that contain extraJavaOptions (spark.driver.extraJavaOptions and spark.executor.extraJavaOptions), then try again.
Thanks
Aviral
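If those entries are being picked up from spark-defaults.conf, one way to try this suggestion is to blank them out when building the session; a minimal sketch, assuming your JVM does not actually need the --add-opens flags:
from pyspark.sql import SparkSession

# Explicitly override the extraJavaOptions entries with empty values
spark = (SparkSession.builder
         .appName("oracle_queries")
         .master("local[4]")
         .config("spark.driver.extraJavaOptions", "")
         .config("spark.executor.extraJavaOptions", "")
         .getOrCreate())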
12-26-2022 05:26 PM
Actually, this was a Python version issue. I downgraded the Spark version and it is working fine now.
12-27-2022 02:58 PM
Which DBR version was causing this issue, and which version is not giving you the error?
12-27-2022 06:02 PM
I am facing issues with the versions below:
DBR Package: spark-3.3.1-bin-hadoop3
python version: 3.8.3
OS: Windows 11
The versions below are working fine:
DBR Package: spark-3.0.2-bin-hadoop2.7
python version: 3.8.3
OS: Windows 11
Let me know if you require more details.
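For anyone comparing setups, the exact version pairing can be printed from the driver itself; a minimal sketch using standard attributes:
import sys
import pyspark

# Print the PySpark and Python versions the driver is running on
print("PySpark:", pyspark.__version__)
print("Python :", sys.version)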
06-16-2023 06:53 PM
How did you downgrade the version?
12-26-2022 05:31 PM
Wow, unexpected answer. I thought this would be a configuration issue, but thanks for the info.
12-26-2022 05:33 PM
@ramanjaneyulu kancharla, can you please select my answer as the best answer?