Community Platform Discussions

ydata-profiling on a Spark cluster: is it working for you?

manojvas
New Contributor II

I am trying to profile my dataset using ydata-profiling.

I constantly run into errors on my Spark cluster, even with simple datasets.

Profiling in Spark cluster erroring out · Issue #1350 · ydataai/ydata-profiling (github.com)

Has anyone gotten ydata-profiling to work on their Databricks cluster?

Accepted Solution

Kaniz_Fatma
Community Manager

Hi @manojvas, Databricks has a built-in data profiling feature that you can use.

When you use the display(<dataframe>) command in Scala or Python or run a SQL query, the results pane shows a new tab, "Data Profiles," that presents an interactive tabular and graphic summary of the DataFrame or table.

You can also use the Databricks utilities command dbutils.data.summarize, which takes the DataFrame as its argument:

 dbutils.data.summarize(df)
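
To make the two options above concrete, here is a minimal sketch rather than a definitive recipe. The helper name profile_dataframe is hypothetical and not part of any Databricks API. It assumes df is a Spark DataFrame and that dbutils is the object Databricks injects into notebooks. The fallback to Spark's own DataFrame.summary() is my addition for sessions where dbutils is not available.

```python
def profile_dataframe(df, dbutils=None):
    """Profile a Spark DataFrame, preferring the Databricks built-in.

    Hypothetical helper for illustration: `df` is assumed to be a Spark
    DataFrame and `dbutils` the object available in Databricks notebooks.
    Returns the name of the path that was used.
    """
    if dbutils is not None:
        # Inside a Databricks notebook: renders the interactive data
        # profile in the cell output.
        dbutils.data.summarize(df)
        return "dbutils"
    # Assumed fallback outside Databricks: plain Spark summary statistics
    # (count, mean, stddev, min, quartiles, max) printed as a table.
    df.summary().show()
    return "spark"
```

In a Databricks notebook you would call profile_dataframe(df, dbutils); running display(df) and opening the profile tab in the results pane gives the same kind of summary without any helper.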

If you are still seeing errors, the cause could be an ordinary notebook error, an issue with Databricks Connect, or a problem with your Spark or Python setup.


2 REPLIES

Thank you @Kaniz_Fatma 
