
QRY Results incorrect but Exported data is OK

AndLuffman
New Contributor II

I ran the query "Select * from fact_Orders". This presented a lot of garbage: the column headers were correct, but the contents were extremely random, e.g. blanks in the key column and VAT rates of 12282384234E-45.

When I export the results to CSV, the data presents fine in the file.

When I select just Order_id and Vat_rate, they present correctly. When I run a SELECT * for a single row, it also presents fine.

There seems to be a 'limit' on the number of columns, but it varies, I think depending on the data types. The column count at which it breaks seems to be a power of 2 (I have seen it fail at the 4th and at the 8th column, depending on which columns are chosen).
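To narrow it down, I have been adding columns back one at a time with a small row limit and watching where the grid breaks, roughly like this (a sketch; next_column_1 and next_column_2 just stand in for whichever fact_Orders columns come next):

    -- Add columns one at a time until the results grid starts showing garbage
    SELECT Order_id FROM fact_Orders LIMIT 100;
    SELECT Order_id, Vat_rate FROM fact_Orders LIMIT 100;
    -- next_column_1, next_column_2 are placeholders for the remaining fact_Orders columns
    SELECT Order_id, Vat_rate, next_column_1 FROM fact_Orders LIMIT 100;
    SELECT Order_id, Vat_rate, next_column_1, next_column_2 FROM fact_Orders LIMIT 100;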

So the data is being presented incorrectly in the Databricks interface.

 

Has anybody else seen this and, more importantly, is there a fix?

5 REPLIES

Kaniz_Fatma
Community Manager

Hi @AndLuffman, the issue you're experiencing might be related to limitations of the Databricks interface when dealing with large datasets with many columns. The interface has a limit on the number of rows it can display at once, which can lead to the display issues you're seeing when running a SELECT * query on a large table.
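To check whether the symptom is tied to result size at all, you could compare the table's full row count against a query that stays well under any display limit (a sketch using only the table from your post):

    -- Total rows actually in the table
    SELECT COUNT(*) FROM fact_Orders;
    -- A result small enough that the grid should render without truncation
    SELECT * FROM fact_Orders LIMIT 100;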

Hi @Kaniz_Fatma, it does feel like it's something to do with a limitation somewhere, but two things confuse me: 1) it has worked fine for months with much more data (I limited the table to 500 records after the issue manifested itself; the original had 400,000), and 2) other people in the team with essentially the same setup are not suffering the same issue.
We were wondering if there is something subtle in the config that is causing this.
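If it helps to compare setups with a colleague, one rough way (a sketch; SET is standard Spark SQL, though the output is long and includes cluster-specific values) is to dump the session configuration on both sides and diff it:

    -- List the current session's SQL configuration for comparison with a teammate's session
    SET;
    -- Or inspect a single setting of interest, e.g. ANSI mode
    SET spark.sql.ansi.enabled;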

Hi @AndLuffman

• It's challenging to pinpoint the exact cause of the issue without more specific details.
• Since the setup has worked fine for months with more data, and other team members are unaffected, it is unlikely that this is a general limitation of Databricks.
• Possible factors to consider:
  - Version Differences: check the Databricks Runtime version to ensure it meets requirements (see the version-check sketch after this list).
  - Access Mode: some functionality is supported only in Single User access mode.
  - Privileges: ensure you have the necessary privileges on the schema and its objects.
  - Python UDFs: Python UDFs are not supported on Runtime 13.1 and below; use 13.2 or above.
  - Thread Pools: standard Scala thread pools are not supported; use the thread pools in org.apache.spark.util.ThreadUtils.
  - Data Size: limits depend on pandas and the cluster's compute resources; bamboolib is limited to approximately 10 million rows.
• If these factors do not resolve the issue, look into more specific configurations or seek assistance from Databricks support by filing a support ticket.
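For the version check in particular, a quick way to confirm what the cluster is actually running (a sketch; version() is a built-in Spark SQL function, while current_version() should be treated as an assumption that only holds on recent Databricks Runtimes):

    -- Spark version and build the cluster is running
    SELECT version();
    -- Databricks-specific version details, if the function exists in your runtime
    SELECT current_version();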

Thanks @Kaniz_Fatma, that gives me a few things to ponder and research. I am not knowledgeable about how Databricks works, I just use it. I was hoping that this had happened to someone else and that there would be a simple switch to flick in a setting somewhere.

Hi @AndLuffman, you're welcome! I understand that not everyone is familiar with the inner workings of Databricks. While there might not be a simple switch to resolve the issue, it's always worth exploring community forums or contacting Databricks support for assistance. Others who have faced similar challenges can often provide insights or solutions that might help. Even if you're not well-versed in the technical details, seeking help and asking questions can lead to valuable solutions. Good luck with resolving the situation!
