
showing only a limited number of lines from the CSV file

Yyyyy
New Contributor III

Expected number of lines: 16400

Only 20 records are shown.

Script

spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# show() returns None, so call it separately instead of assigning its result to df
df.show()

 

3 REPLIES

romy
Databricks Employee

Hi, the show() method prints only the top 20 rows by default: DataFrame.show(n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) (see https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.show...)

You can either use show() with a bigger n parameter, or use the Databricks display() command to print the dataframe in a tabular format:

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

display(df)

https://www.databricks.com/spark/getting-started-with-apache-spark/dataframes#view-the-dataframe 
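
For the first option, here is a minimal sketch of calling show() with a bigger n. The helper name `show_rows` and its `max_rows` cap are hypothetical and not part of PySpark; it only relies on `DataFrame.count()` and `DataFrame.show(n, truncate)`, and it assumes `df` is the DataFrame loaded above. Note that `count()` is an action, so it reads the whole file once before anything is printed.

```python
def show_rows(df, max_rows=1000):
    """Print up to max_rows rows of a DataFrame and return how many were requested.

    Hypothetical helper for illustration; df is expected to be a Spark DataFrame.
    """
    # count() scans the data to find the total number of rows.
    total = df.count()
    # Cap the output so a very large file does not flood the notebook.
    n = min(total, max_rows)
    # show() defaults to n=20; passing n explicitly overrides that default.
    # truncate=False prints full cell values instead of cutting them at 20 characters.
    df.show(n=n, truncate=False)
    return n
```

With the DataFrame from this thread, `show_rows(df, max_rows=20000)` would request all 16400 expected rows, while the default cap would print only the first 1000.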

Yyyyy
New Contributor III
Hi, please take a look and help me.
spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# Display the dataframe
display(df)
 
error - Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Endpoint>command-results.s3.amazonaws.com</Endpoint><Bucket>command-results</Bucket><RequestId>AMNJ84M2CZ0G4MFK</RequestId><HostId>rXnbI5MLQZdZmhOfF/SbvNDErLlAqj92hFAxcTi4cwGqo2Qe2E1VIDkMoyAOUpIkBLePYy4+up4=</HostId></Error>
 
When I try to use display(), I get the error above.

Hi @Yyyyy ,

You should edit your question and redact the key you're setting in the Spark session.
