Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

showing only a limited number of lines from the CSV file

Yyyyy
New Contributor III

Expected number of lines: 16400

Only 20 records are shown.

Script

spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location).show()

 

4 REPLIES

romy
New Contributor III

Hi, the show() method prints only the top 20 rows by default: DataFrame.show(n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) (cf. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.show...)

You can either use show() with a bigger n parameter, or use the Databricks display() command to print the dataframe in a tabular format:

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

display(df)

https://www.databricks.com/spark/getting-started-with-apache-spark/dataframes#view-the-dataframe 
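For the first option (show() with a bigger n), here is a minimal sketch. The helper name show_rows and its limit parameter are made up for illustration; it only assumes that df is a Spark DataFrame with the usual count() and show() methods. Note also that show() returns None, so chaining .show() onto the read, as in the original script, leaves df set to None; keep the read and the show as separate statements.

```python
def show_rows(df, limit=None, truncate=False):
    """Print up to `limit` rows of a Spark DataFrame (all rows when limit is None).

    `show_rows` is a hypothetical helper, not part of the Spark API.
    Returns the number of rows requested from show().
    """
    total = df.count()  # action: scans the source to count every row
    n = total if limit is None else min(limit, total)
    df.show(n=n, truncate=truncate)  # show() prints and returns None
    return n
```

In a notebook this would be called as show_rows(df) after loading df without the trailing .show(). Printing all 16400 rows produces a lot of text output, so passing a smaller limit is usually more practical.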

Yyyyy
New Contributor III
Hi, please take a look and help me.
spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# Display the dataframe
display(df)
 
error - Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Endpoint>command-results.s3.amazonaws.com</Endpoint><Bucket>command-results</Bucket><RequestId>AMNJ84M2CZ0G4MFK</RequestId><HostId>rXnbI5MLQZdZmhOfF/SbvNDErLlAqj92hFAxcTi4cwGqo2Qe2E1VIDkMoyAOUpIkBLePYy4+up4=</HostId></Error>
 
When I try to use display(), I get the error above.

Hi @Yyyyy ,

You should edit your question and redact the key you're setting in the Spark session.

Kaniz_Fatma
Community Manager

Hi @Yyyyy,

  1. The error might be related to lazy evaluation in Spark. Before calling display(df), ensure that there are no issues in the code preceding this action. Lazy evaluation means that Spark doesn’t execute transformations until an action (like display()) is called. Check if there are any issues with your DataFrame operations before the display() call.
  2. Could you please also verify that your Databricks cluster is correctly configured? If you’re using Unity Catalog (UC), ensure it is set up correctly. Sometimes, misconfigured UC settings can cause issues with actions like display().
  3. Implement exception handling to catch any errors during execution. You can use a try-except block to handle exceptions gracefully.
  4. If you’re using S3, verify the bucket configuration and permissions.
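As a rough sketch of suggestions 1 and 3 together, the helper below wraps each step in a try-except and runs the read and a full scan separately, so it is clear which one fails. The names run_step and diagnose are hypothetical, and spark and file_location are assumed to be the same objects used in the question.

```python
def run_step(name, step):
    """Run one zero-argument callable; return (result, error) instead of raising."""
    try:
        return step(), None
    except Exception as exc:  # deliberately broad: this is only a diagnostic aid
        print(f"{name} failed: {type(exc).__name__}: {exc}")
        return None, exc


def diagnose(spark, file_location):
    """Hypothetical helper: check the CSV read and a full scan separately."""
    df, err = run_step(
        "load",
        lambda: spark.read.format("csv")
        .option("inferSchema", "true")
        .option("header", "true")
        .option("delimiter", ",")
        .load(file_location),
    )
    if err is not None:
        return None
    # count() is an action, so it forces Spark to scan the whole file.
    total, err = run_step("count", lambda: df.count())
    return total
```

If diagnose(spark, file_location) returns the expected row count but display(df) still fails, that would suggest the data read itself is fine and the problem lies in how the notebook stores command results, which is consistent with the "Failed to upload command result to DBFS" message.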
