
showing only a limited number of lines from the CSV file

Yyyyy
New Contributor III

Expected number of lines: 16400

Only 20 records are shown

Script

spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location).show()

 

3 REPLIES

romy
Databricks Employee

Hi, the show() method prints only the top 20 rows by default: DataFrame.show(n: int = 20, truncate: Union[bool, int] = True, vertical: bool = False) (cf. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.show...)

You can either use show() with a bigger n parameter, or use the Databricks display() command to print the dataframe in a tabular format:

df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

display(df)

https://www.databricks.com/spark/getting-started-with-apache-spark/dataframes#view-the-dataframe 
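
As a rough sketch of the first option (show() with a bigger n): note that show() returns None, so assigning its result to df leaves df empty of any data. Keeping the DataFrame and calling show() separately avoids that. The helper name show_rows and its max_rows parameter are hypothetical, not part of PySpark; it only assumes the DataFrame's count() and show(n, truncate) methods.

```python
def show_rows(df, max_rows=None, truncate=False):
    """Print up to max_rows rows of a Spark DataFrame (all rows if max_rows is None).

    Hypothetical helper, not part of PySpark. It relies only on
    DataFrame.count() and DataFrame.show(n, truncate).
    """
    total = df.count()  # runs a job that scans the whole file
    n = total if max_rows is None else min(max_rows, total)
    df.show(n=n, truncate=truncate)  # show() prints and returns None
    return total, n


# Usage in a notebook where `spark` and `file_location` are already defined:
# df = (spark.read.format("csv")
#       .option("inferSchema", "true")
#       .option("header", "true")
#       .option("delimiter", ",")
#       .load(file_location))
# show_rows(df, max_rows=100)
```

Printing all 16400 rows into the notebook output is rarely useful, so passing a modest max_rows and checking the returned total is probably the safer way to confirm that the whole file was read.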

Yyyyy
New Contributor III
Hi, please take a look and help me.
spark.conf.set(
    "REDACTED",
    "REDACTED"
)

# File location
file_location = "REDACTED"

# Read in the data to dataframe df
df = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter", ",").load(file_location)

# Display the dataframe
display(df)
 
error - Failed to upload command result to DBFS. Error message: Status code: 301 Moved Permanently, Error message: <?xml version="1.0" encoding="UTF-8"?> <Error><Code>PermanentRedirect</Code><Message>The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.</Message><Endpoint>command-results.s3.amazonaws.com</Endpoint><Bucket>command-results</Bucket><RequestId>AMNJ84M2CZ0G4MFK</RequestId><HostId>rXnbI5MLQZdZmhOfF/SbvNDErLlAqj92hFAxcTi4cwGqo2Qe2E1VIDkMoyAOUpIkBLePYy4+up4=</HostId></Error>
 
When I try to use display(), I get the error above.

szymon_dybczak
Esteemed Contributor III

Hi @Yyyyy ,

You should edit your question and redact the key you're setting in the Spark session.
