12-24-2022 05:02 PM
Code to create a DataFrame:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("oracle_queries").master("local[4]")\
    .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()
from pyspark.sql.functions import current_date, year
from pyspark.sql.types import IntegerType, StructType, StructField, StringType
from datetime import datetime, date
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
schema = StructType([StructField("name", StringType(), True),
                     StructField("loc", StringType(), True)])
df = spark.createDataFrame(rdd, schema=schema)
Trying to view the data, but df.show() raises the error below:
df.show()
Error:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
<ipython-input-2-1a6ce2362cd4> in <module>
----> 1 df.show()
c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
438 """
439 if isinstance(truncate, bool) and truncate:
--> 440 print(self._jdf.showString(n, 20, vertical))
441 else:
442 print(self._jdf.showString(n, int(truncate), vertical))
c:\program files (x86)\python38-32\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
126 def deco(*a, **kw):
127 try:
--> 128 return f(*a, **kw)
129 except py4j.protocol.Py4JJavaError as e:
130 converted = convert_exception(e.java_exception)
c:\program files (x86)\python38-32\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o48.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-N3C4AUC.attlocal.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "C:\pyspark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\worker.py", line 668, in main
ore
Labels: Current Date, Data, Dataframe, Return value, Trying
Accepted Solutions
12-24-2022 07:21 PM
There can be three causes:
1. You are initializing the Spark context twice.
2. Something is wrong with your Spark config.
3. It can be a Python version issue.
I checked, and it works perfectly fine for me (a sketch that avoids cause 1 follows at the end of this post):
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
from pyspark import SparkContext

# The original snippet assumed an existing session named spark; create or reuse one here.
spark = SparkSession.builder.getOrCreate()
sc = SparkContext.getOrCreate()
rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
df = spark.createDataFrame(rdd, schema=StructType([StructField("name", StringType(), True),
                                                   StructField("loc", StringType(), True)]))
# Viewing the data works here without the error:
df.show()
I am attaching an image as well.
Please select this as the best answer if it helped you.
Thanks,
Aviral Bhardwaj
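For reference, a minimal sketch that avoids cause 1 entirely, assuming a plain local install: reuse a single SparkSession and build the DataFrame straight from a Python list, so no separate SparkContext is ever requested.

from pyspark.sql import SparkSession

# One session; its underlying context is reused, so nothing is initialized twice.
spark = SparkSession.builder.master("local[4]").getOrCreate()

# createDataFrame also accepts a plain list of tuples plus column names.
df = spark.createDataFrame([('ram', 'chi'), ('anil', 'ind')], ["name", "loc"])
df.show()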
12-24-2022 09:04 PM
Thank you, Bhardwaj, for checking the issue.
Below is the configuration I am using; please let me know if you need more details.
Reading a CSV works fine; I face the issue only when I create a DataFrame manually.
I installed Spark on my local system.
Configuration:
[('spark.app.name', 'oracle_queries'),
('spark.driver.extraJavaOptions',
'-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
('spark.master', 'local[4]'),
('spark.driver.port', '50219'),
('spark.app.startTime', '1671929660604'),
('spark.executor.id', 'driver'),
('spark.app.id', 'local-1671929663025'),
('spark.sql.warehouse.dir', 'file:/C:/softwares/git/pyspark/hive'),
('spark.sql.catalogImplementation', 'hive'),
('spark.rdd.compress', 'True'),
('spark.executor.extraJavaOptions',
'-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
('spark.app.submitTime', '1671929660216'),
('spark.serializer.objectStreamReset', '100'),
('spark.submit.pyFiles', ''),
('spark.submit.deployMode', 'client'),
('spark.driver.host', 'DESKTOP-test.attlocal.net'),
('spark.ui.showConsoleProgress', 'true')]
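For reference, a listing like the one above can be reproduced from the active session; a minimal sketch, assuming the session object is named spark:

# Print every entry of the active Spark configuration, sorted by key.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)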
CSV code:
source_path = "C:\\softwares\\git\\pyspark\\source\\emp.csv"
emp = spark.read.csv(source_path, header=True, inferSchema=True)
emp.show()
Output:
+-----+------+---------+----+-----------+----+----+------+
|empno| ename| job| mgr| hiredate| sal|comm|deptno|
+-----+------+---------+----+-----------+----+----+------+
| 7369| SMITH| CLERK|7902|17-DEC-1980| 800|null| 20|
| 7499| ALLEN| null|7698|20-FEB-1981|1600| 300| 30|
| 7521| WARD| SALESMAN|7698|22-FEB-1981|1250| 500| 30|
| 7566| JONES| MANAGER|7839| 2-APR-1981|2975|null| 20|
| 7654|MARTIN| SALESMAN|7698|28-SEP-1981|1250|1400| 30|
| 7698| BLAKE| MANAGER|7839| 1-MAY-1981|2850| 0| 30|
| 7782| CLARK| MANAGER|7839| 9-JUN-1981|2450| 0| 10|
| 7788| SCOTT| ANALYST|7566|09-DEC-1982|3000| 0| 20|
| 7839| KING|PRESIDENT| 0|17-NOV-1981|5000| 0| 10|
| 7844|TURNER| SALESMAN|7698| 8-SEP-1981|1500| 0| 30|
| 7876| ADAMS| CLERK|7788|12-JAN-1983|1100| 0| 20|
| 7900| JAMES| CLERK|7698| 3-DEC-1981| 950| 0| 30|
| 7902| FORD| ANALYST|7566| 3-DEC-1981|3000| 0| 20|
| 7934|MILLER| CLERK|7782|23-JAN-1982|1300| 0| 10|
+-----+------+---------+----+-----------+----+----+------+
12-25-2022 08:42 PM
Remove the two configuration entries that contain extraJavaOptions (spark.driver.extraJavaOptions and spark.executor.extraJavaOptions), then try again; a sketch follows below.
Thanks,
Aviral
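A hypothetical sketch of that suggestion (an assumption about the intent; whether it helps depends on the local setup):

from pyspark.sql import SparkSession

# Stop the current session and rebuild it with both extraJavaOptions entries cleared.
spark.stop()
spark = (SparkSession.builder
         .appName("oracle_queries")
         .master("local[4]")
         .config("spark.driver.extraJavaOptions", "")
         .config("spark.executor.extraJavaOptions", "")
         .getOrCreate())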
12-26-2022 05:26 PM
Actually, this is a Python version issue. I downgraded the Spark version, and it is working fine now.
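For readers who hit the same error: a commonly suggested workaround on Windows (an assumption here, not what was done in this thread) is to point Spark at the exact interpreter the driver uses, before any session is created:

import os
import sys

# Assumption: a driver/worker Python mismatch is the culprit; pin both to the
# interpreter running this script before creating the Spark session.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable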
12-27-2022 02:58 PM
Which DBR version was causing this issue, and which version is not giving you the error?
12-27-2022 06:02 PM
Facing issues with the versions below:
DBR package: spark-3.3.1-bin-hadoop3
Python version: 3.8.3
OS: Windows 11
The versions below are working fine:
DBR package: spark-3.0.2-bin-hadoop2.7
Python version: 3.8.3
OS: Windows 11
Let me know if you require more details.
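For anyone comparing setups, the versions above can be confirmed from the interpreter that runs the job; a quick check:

import sys
import pyspark

print(sys.version)          # driver Python version, e.g. 3.8.3
print(pyspark.__version__)  # installed PySpark version, e.g. 3.3.1 or 3.0.2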
06-16-2023 06:53 PM
How did you downgrade the version?
12-26-2022 05:31 PM
Wow, unexpected answer. I thought this would be a configuration issue, but thanks for the info.
12-26-2022 05:33 PM
@ramanjaneyulu kancharla, can you please select my answer as the best answer?

