Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I created a data frame but was not able to see the data

Ram443
New Contributor III

Code to create a data frame:

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, year
from pyspark.sql.types import IntegerType, StructType, StructField, StringType
from datetime import datetime, date
from pyspark import SparkContext

# Local session with a custom Hive warehouse directory
spark = SparkSession.builder.appName("oracle_queries").master("local[4]") \
    .config("spark.sql.warehouse.dir", "C:\\softwares\\git\\pyspark\\hive").getOrCreate()

# This returns the context already created by the session above
sc = SparkContext.getOrCreate()

# Parallelize a small list of tuples and apply an explicit schema
rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
df = spark.createDataFrame(rdd, schema=StructType([StructField("name", StringType(), True),
                                                   StructField("loc", StringType(), True)]))

Trying to see the data, but I get the error below:

df.show()

Error:

---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
<ipython-input-2-1a6ce2362cd4> in <module>
----> 1 df.show()

c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
    438         """
    439         if isinstance(truncate, bool) and truncate:
--> 440             print(self._jdf.showString(n, 20, vertical))
    441         else:
    442             print(self._jdf.showString(n, int(truncate), vertical))

c:\program files (x86)\python38-32\lib\site-packages\py4j\java_gateway.py in __call__(self, *args)
   1302
   1303         answer = self.gateway_client.send_command(command)
-> 1304         return_value = get_return_value(
   1305             answer, self.gateway_client, self.target_id, self.name)
   1306

c:\program files (x86)\python38-32\lib\site-packages\pyspark\sql\utils.py in deco(*a, **kw)
    126     def deco(*a, **kw):
    127         try:
--> 128             return f(*a, **kw)
    129         except py4j.protocol.Py4JJavaError as e:
    130             converted = convert_exception(e.java_exception)

c:\program files (x86)\python38-32\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325             if answer[1] == REFERENCE_TYPE:
--> 326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
    328                     format(target_id, ".", name), value)

Py4JJavaError: An error occurred while calling o48.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-N3C4AUC.attlocal.net executor driver): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "C:\pyspark\spark-3.3.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\worker.py", line 668, in main
...

ACCEPTED SOLUTION

Aviral-Bhardwaj
Esteemed Contributor III

There can be three causes:

1. You are initializing the Spark context twice.
2. Something is wrong with your Spark config.
3. It could be a Python version issue.

I checked, and it works perfectly fine for me:

from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date, year
from pyspark.sql.types import IntegerType, StructType, StructField, StringType
from datetime import datetime, date
from pyspark import SparkContext

# The snippet references `spark`, so create the session first
spark = SparkSession.builder.getOrCreate()

sc = SparkContext.getOrCreate()

rdd = sc.parallelize([('ram', 'chi'),
                      ('anil', 'ind')])
df = spark.createDataFrame(rdd, schema=StructType([StructField("name", StringType(), True),
                                                   StructField("loc", StringType(), True)]))

# Trying to see the data
df.show()
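A quick sketch of cause 1, in case it applies (this pattern is a suggestion, not from the original post): create the session once and take its context from spark.sparkContext, so a second context can never be created:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# One session; reuse its context instead of calling SparkContext.getOrCreate() separately
spark = SparkSession.builder.appName("oracle_queries").master("local[4]").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize([('ram', 'chi'), ('anil', 'ind')])
schema = StructType([StructField("name", StringType(), True),
                     StructField("loc", StringType(), True)])
df = spark.createDataFrame(rdd, schema=schema)
df.show()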


Please select this as the best answer if it helps.

Thanks,
Aviral Bhardwaj


9 REPLIES


Ram443
New Contributor III

Thank you, Bhardwaj, for checking the issue.

Below is the configuration I am using; please let me know if you need more details. Reading a CSV works fine, but the issue appears when I create a DataFrame manually. Spark is installed on my local system.

Configuration:

[('spark.app.name', 'oracle_queries'),
 ('spark.driver.extraJavaOptions',
  '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
 ('spark.master', 'local[4]'),
 ('spark.driver.port', '50219'),
 ('spark.app.startTime', '1671929660604'),
 ('spark.executor.id', 'driver'),
 ('spark.app.id', 'local-1671929663025'),
 ('spark.sql.warehouse.dir', 'file:/C:/softwares/git/pyspark/hive'),
 ('spark.sql.catalogImplementation', 'hive'),
 ('spark.rdd.compress', 'True'),
 ('spark.executor.extraJavaOptions',
  '-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED'),
 ('spark.app.submitTime', '1671929660216'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.submit.pyFiles', ''),
 ('spark.submit.deployMode', 'client'),
 ('spark.driver.host', 'DESKTOP-test.attlocal.net'),
 ('spark.ui.showConsoleProgress', 'true')]

CSV code:

# Reading a CSV with a header row and schema inference works fine
source_path = "C:\\softwares\\git\\pyspark\\source\\emp.csv"
emp = spark.read.csv(source_path, header=True, inferSchema=True)
emp.show()

Output:

+-----+------+---------+----+-----------+----+----+------+
|empno| ename|      job| mgr|   hiredate| sal|comm|deptno|
+-----+------+---------+----+-----------+----+----+------+
| 7369| SMITH|    CLERK|7902|17-DEC-1980| 800|null|    20|
| 7499| ALLEN|     null|7698|20-FEB-1981|1600| 300|    30|
| 7521|  WARD| SALESMAN|7698|22-FEB-1981|1250| 500|    30|
| 7566| JONES|  MANAGER|7839| 2-APR-1981|2975|null|    20|
| 7654|MARTIN| SALESMAN|7698|28-SEP-1981|1250|1400|    30|
| 7698| BLAKE|  MANAGER|7839| 1-MAY-1981|2850|   0|    30|
| 7782| CLARK|  MANAGER|7839| 9-JUN-1981|2450|   0|    10|
| 7788| SCOTT|  ANALYST|7566|09-DEC-1982|3000|   0|    20|
| 7839|  KING|PRESIDENT|   0|17-NOV-1981|5000|   0|    10|
| 7844|TURNER| SALESMAN|7698| 8-SEP-1981|1500|   0|    30|
| 7876| ADAMS|    CLERK|7788|12-JAN-1983|1100|   0|    20|
| 7900| JAMES|    CLERK|7698| 3-DEC-1981| 950|   0|    30|
| 7902|  FORD|  ANALYST|7566| 3-DEC-1981|3000|   0|    20|
| 7934|MILLER|    CLERK|7782|23-JAN-1982|1300|   0|    10|
+-----+------+---------+----+-----------+----+----+------+
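Since spark.read.csv works but the parallelized data fails, one thing worth trying (just a sketch, and it may still hit the same worker error) is to hand the local rows directly to createDataFrame without building an RDD first, which takes the explicit SparkContext out of the picture:

from pyspark.sql.types import StructType, StructField, StringType

schema = StructType([StructField("name", StringType(), True),
                     StructField("loc", StringType(), True)])

# No explicit RDD: pass the local list of tuples straight to the session
df = spark.createDataFrame([('ram', 'chi'), ('anil', 'ind')], schema=schema)
df.show()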

Aviral-Bhardwaj
Esteemed Contributor III

Remove the two configuration entries that contain extraJavaOptions (spark.driver.extraJavaOptions and spark.executor.extraJavaOptions), then try again.

Thanks,
Aviral
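A sketch of one way to test that suggestion: stop the current session and rebuild it with the two keys overridden to empty strings (this is just one way to neutralize them; note that the --add-opens flags can be required on newer JDKs, so keep a copy of the originals):

from pyspark.sql import SparkSession

# Stop the current session so the new configuration takes effect
spark.stop()

spark = (SparkSession.builder
         .appName("oracle_queries")
         .master("local[4]")
         .config("spark.driver.extraJavaOptions", "")
         .config("spark.executor.extraJavaOptions", "")
         .getOrCreate())

# Confirm the override took effect
print(spark.sparkContext.getConf().get("spark.driver.extraJavaOptions"))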

Ram443
New Contributor III

Actually, this is a Python version issue. I downgraded the Spark version and it is now working fine.

jose_gonzalez
Databricks Employee

Which DBR version was causing this issue, and which version does not give you this error?

Ram443
New Contributor III

Facing issues with the below versions:

DBR package: spark-3.3.1-bin-hadoop3
Python version: 3.8.3
OS: Windows 11

The below versions are working fine:

DBR package: spark-3.0.2-bin-hadoop2.7
Python version: 3.8.3
OS: Windows 11

Let me know if you require more details.
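For anyone else who hits this worker.py crash on Windows: a common cause is the driver and the Python workers resolving different interpreters. A standard check (a sketch; the interpreter path below is a placeholder, use your own python.exe) is to pin both sides with Spark's PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables before the session starts:

import os

# Placeholder path -- point both at the same interpreter that runs the driver
os.environ["PYSPARK_PYTHON"] = r"C:\Python38\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Python38\python.exe"

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").getOrCreate()
print(spark.version)  # sanity check that the session starts cleanly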

riyapics1115
New Contributor II

How did you downgrade the version?

Aviral-Bhardwaj
Esteemed Contributor III

Wow, unexpected answer. I thought this would be a configuration issue, but thanks for the info.

AviralBhardwaj

Aviral-Bhardwaj
Esteemed Contributor III

@ramanjaneyulu kancharla, can you please select my answer as the best answer?

AviralBhardwaj
