Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Fetching the top 1L (100,000) rows in Databricks SQL

KVNARK
Honored Contributor II

How do I fetch the top 1L rows from a long SQL query? The query returns about 5 GB of data, and I want to check only the first 1L (100,000) rows.


12 REPLIES

Anonymous
Not applicable

By 1L do you mean 100,000? You could do a limit in your SQL query.

youssefmrini
Honored Contributor III

As Josephk stated, you can use LIMIT; it's the easiest solution.

SergeRielau
Valued Contributor
SELECT * FROM t LIMIT 10000;

See the LIMIT clause in the docs.

KVNARK
Honored Contributor II

I'm not asking about a single table. I have a long SQL query that will generate about 5 GB of output, out of which I need to limit the rows to only 1L.

select * from (my long sql query) A order by A.X limit 100000;

This is not working.
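For what it's worth, the query shape itself is valid SQL: a subquery wrapped as a derived table, ordered and limited from the outside. Here is a minimal sketch of that pattern; it uses SQLite rather than Databricks purely so it can run anywhere, and the table `t` and column `x` are made up for illustration.

```python
import sqlite3

# Hypothetical stand-in data; in the thread, the inner SELECT would be
# the "long sql query" producing ~5 GB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, f"row{i}") for i in range(500)])

# Same shape as the query in question: subquery aliased as A,
# ordered by A.x, limited to the first N rows (note the ';' -- the
# trailing ':' in the original post is a syntax error in most engines).
rows = conn.execute(
    "SELECT * FROM (SELECT x, y FROM t WHERE x % 2 = 0) A "
    "ORDER BY A.x LIMIT 100"
).fetchall()
print(len(rows), rows[0])
```

So the pattern returns exactly the first 100 rows by `x`; if the same shape returns nothing on Databricks, the problem is likely elsewhere (the inner query, or the display limit discussed below in the thread).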

SergeRielau
Valued Contributor

Can you clarify "not working", please?

Is there an error of some sort? Which one? Or is it simply "slow" and you expected it to be faster given the LIMIT clause?

KVNARK
Honored Contributor II

I'm not getting any result when I write it as above, but when I write it as below, I get the output.

select * from (my long sql query) A

SergeRielau
Valued Contributor

You get an empty result set? Maybe that should be a support ticket.

Clearly, adding an ORDER BY may make things slower, but it should not change the result set.

Can you try only ORDER BY without LIMIT? Just to see where the problem is.

Also try LIMIT without ORDER BY.
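The isolation test suggested here can be sketched as two query variants run against the same subquery, comparing row counts to see which clause changes the behavior. Again a runnable SQLite stand-in, with a made-up table `t` in place of the long query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(50)])

subquery = "SELECT x FROM t"  # stand-in for the long SQL query

# Variant 1: ORDER BY without LIMIT -- should return every row.
ordered = conn.execute(
    f"SELECT * FROM ({subquery}) A ORDER BY A.x").fetchall()

# Variant 2: LIMIT without ORDER BY -- should return exactly N rows.
limited = conn.execute(
    f"SELECT * FROM ({subquery}) A LIMIT 10").fetchall()

print(len(ordered), len(limited))
```

If one variant returns rows and the other does not, the failing clause is isolated; if both work in isolation, the combination (or the display layer) is the suspect.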

pvignesh92
Honored Contributor

Did you try to see if you get output with a smaller limit count?

Ex: select * from (my long sql query) A order by A.X limit 100;

This can help you understand whether the problem is with the underlying subquery or with the volume you are trying to display.

KVNARK
Honored Contributor II

@Serge Rielau I tried both ways, but it's of no use. It's giving 10,000 rows max, but not more than that.

pvignesh92
Honored Contributor

@KVNARK Hi, I see this question is already answered, but this is something I found. Please check.

By default, Databricks SQL notebooks limit the number of rows displayed to 1,000. However, you can increase this limit to display 1 Lakh (100,000) rows or more by changing the configuration setting spark.databricks.query.displayMaxRows.

To change the configuration setting, follow these steps:

  1. In the Databricks workspace, navigate to the cluster that you're using for your notebook.
  2. Click on the "Advanced Options" tab.
  3. In the "Spark" section, click on the "Edit" button next to "Spark Config".
  4. Add the following configuration setting: spark.databricks.query.displayMaxRows 100000 (or the desired maximum number of rows)
  5. Click "Confirm" to save the configuration change.
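For reference, the Spark Config box takes one space-separated key/value pair per line. A sketch of the entry from step 4, with the value set to match the 1L goal in this thread:

```
spark.databricks.query.displayMaxRows 100000
```

The cluster must be restarted for a Spark Config change to take effect.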

Thanks,

Vignesh

KVNARK
Honored Contributor II

@Vigneshraja Palaniraj - Perfect. Thanks a lot!

Sangram
New Contributor III

Use the below command:

df.show(100000)
