AkshaySharma
Databricks Employee

Gaining valuable insights from large and complex data requires modern tools and technology. With the right tools, organizations can improve their performance through data-driven decisions and a better understanding of their operations. Databricks has built-in support for charts and visualizations in both Databricks SQL and in notebooks. In this post we discuss another useful library for developing dashboards and applications in pure Python: Bokeh.

Bokeh is a Python library for building interactive visualizations for web browsers. It lets you create anything from simple plots to complex dashboards with streaming data. With Bokeh you can build JavaScript-powered visualizations without writing any JavaScript yourself. It is a flexible visualization library that suits many different use cases.

  • Interactive visualizations: Bokeh offers several ways to respond to browser-based interactions from users. A lot of this interactivity can be defined in Python, with no or only limited JavaScript required.
  • Web-friendly: Bokeh can generate complete HTML pages for Bokeh documents using the file_html() function. This HTML can be embedded in web applications or returned as the response of an API.
  • Python integration: Bokeh is a Python library, so it is easy to adapt to different use cases, and it integrates well with other popular Python libraries such as NumPy, pandas, Matplotlib, Seaborn, scikit-learn, and OpenCV.
  • Versatility: Bokeh natively supports a variety of charts, such as histograms, scatter plots, bar charts, stacked bar charts, line charts, data tables, and even geospatial charts, and it can be used alongside other visualization libraries such as Matplotlib and Seaborn.
  • Customizable: Bokeh lets users customize their visualizations through palettes and formatters, and even allows custom HTML and CSS for custom or conditional formatting.
  • Accessibility: Bokeh is open source, has a large community of developers actively contributing to it, and offers many examples in its gallery to start from.
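
As a small illustration of the web-friendly point above, the sketch below renders a plot to a standalone HTML string with file_html(). The data and titles are made up for the example, and it assumes a reasonably recent Bokeh release in which figure() accepts width and height.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# Toy data standing in for real query results (not from the original post).
plot = figure(title="Orders per year (toy data)", height=300, width=500)
plot.line([2019, 2020, 2021, 2022], [120, 150, 90, 180], line_width=2)

# file_html() returns a complete HTML document as a string.
# CDN means the page loads the BokehJS scripts from the public CDN.
html = file_html(plot, resources=CDN, title="Orders per year")
```

The resulting string can be written to a file, displayed in a notebook, or returned from a web endpoint.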

 

Follow the steps below to create the dashboard:

 

1. To begin, install the necessary dependencies. We will use the Flask framework to create a shareable application, geopandas to create a world map visualization (optional), and databricks-sql-connector to fetch data from Delta tables.

AkshaySharma_0-1689958787852.png

 

2. Next, configure the Databricks SQL connector to fetch data from Delta tables into pandas DataFrames.

AkshaySharma_1-1689958787840.png

Here, http_path can be obtained from the cluster configuration:

AkshaySharma_2-1689958787841.png
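
The connection step could look roughly like the sketch below. It is not the code from the screenshots: the environment variable names and the helper functions connection_params() and fetch_dataframe() are hypothetical, chosen only for illustration, while sql.connect() with server_hostname, http_path, and access_token is the connector's documented entry point.

```python
import os

import pandas as pd


def connection_params() -> dict:
    """Read connection settings from environment variables (hypothetical names)."""
    return {
        "server_hostname": os.environ["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": os.environ["DATABRICKS_HTTP_PATH"],
        "access_token": os.environ["DATABRICKS_TOKEN"],
    }


def fetch_dataframe(query: str) -> pd.DataFrame:
    """Run a query through the Databricks SQL connector and return a pandas DataFrame."""
    # Imported lazily so this module still loads where the connector is not installed.
    from databricks import sql

    with sql.connect(**connection_params()) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            rows = cursor.fetchall()
            # cursor.description follows the DB-API: first item of each entry is the column name.
            columns = [col[0] for col in cursor.description]
    return pd.DataFrame.from_records([tuple(row) for row in rows], columns=columns)
```

Keeping the token in an environment variable (or a Databricks secret) avoids hard-coding credentials in the notebook.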

 

3. Now we can proceed to create charts:

We will use a data set from the retail sector in this post. It contains the orders that the company receives from customers in various countries, with varying order priority (urgent, high, medium, low, others). We will use visualizations to explore some of the conclusions that can be drawn from this data set.

a) Line Chart: We can see how the number of orders in each order priority category varies by year.

AkshaySharma_3-1689958787850.png

AkshaySharma_4-1689958787853.png
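
A minimal sketch of such a line chart is shown below. The DataFrame and its column names (year, priority, order_count) are invented stand-ins for the query result, so treat this as an illustration rather than the notebook's actual code.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Category10
from bokeh.plotting import figure

# Toy data: number of orders per year for two priority categories.
orders = pd.DataFrame({
    "year": [2019, 2020, 2021] * 2,
    "priority": ["urgent"] * 3 + ["low"] * 3,
    "order_count": [120, 150, 135, 300, 280, 310],
})

line_chart = figure(
    title="Orders per year by priority (toy data)",
    x_axis_label="Year",
    y_axis_label="Orders",
    height=350,
    width=600,
)

# Draw one line per priority category, each with its own color and legend entry.
priorities = sorted(orders["priority"].unique())
for color, priority in zip(Category10[10], priorities):
    subset = orders[orders["priority"] == priority]
    line_chart.line(
        "year",
        "order_count",
        source=ColumnDataSource(subset),
        legend_label=priority,
        color=color,
        line_width=2,
    )

line_chart.legend.location = "top_left"
```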

 

b) Bar Chart: We will plot revenue by country across different years.

AkshaySharma_5-1689958787854.png

AkshaySharma_6-1689958787834.png
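
One way to draw revenue per country and year is a grouped bar chart with nested categorical factors, sketched below. The country names, years, and revenue figures are made-up sample values, and the grouping approach is an assumption about how the chart could be built, not a copy of the original code.

```python
from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

# Toy revenue figures in millions, keyed by country, one value per year.
countries = ["FRANCE", "INDIA", "JAPAN"]
years = ["2020", "2021"]
revenue = {"FRANCE": [1.2, 1.6], "INDIA": [2.1, 2.4], "JAPAN": [1.8, 1.5]}

# Nested factors: each bar is identified by a (country, year) pair.
factors = [(country, year) for country in countries for year in years]
values = [revenue[country][i] for country in countries for i in range(len(years))]

source = ColumnDataSource(data={"x": factors, "revenue": values})

bar_chart = figure(
    x_range=FactorRange(*factors),
    title="Revenue by country and year (toy data)",
    y_axis_label="Revenue (millions)",
    height=350,
    width=600,
)
bar_chart.vbar(
    x="x",
    top="revenue",
    width=0.9,
    source=source,
    # Color by the second level of the factor (the year).
    fill_color=factor_cmap("x", palette=["#718dbf", "#e84d60"], factors=years, start=1, end=2),
    line_color="white",
)
bar_chart.y_range.start = 0
```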

 

c) Data Table: We can list revenue by customer ID to complement the charts with exact figures and find out which customers bring in the most revenue.

AkshaySharma_7-1689958787846.png               

AkshaySharma_8-1689958787836.png
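
A basic version of such a table might look like the sketch below. The customer IDs and revenue values are invented, and the column names are assumptions for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource, DataTable, NumberFormatter, TableColumn

# Toy data: total revenue per customer, sorted so the top customers come first.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "revenue": [3_400_000.0, 1_200_000.0, 2_750_000.0],
}).sort_values("revenue", ascending=False)

source = ColumnDataSource(customers)

columns = [
    TableColumn(field="customer_id", title="Customer ID"),
    # NumberFormatter renders the raw number as a currency string in the browser.
    TableColumn(field="revenue", title="Revenue", formatter=NumberFormatter(format="$0,0")),
]

# index_position=None hides the automatic row-index column.
data_table = DataTable(source=source, columns=columns, width=400, height=200, index_position=None)
```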

In this example, let's make the table more visually appealing with HTML and CSS by using HTMLTemplateFormatter as a column formatter in the table. The code below distinguishes customers by revenue category: revenue up to and including $1.5M, above $1.5M up to and including $3.0M, and above $3.0M:

AkshaySharma_9-1689958787837.png

AkshaySharma_10-1689958787848.png

AkshaySharma_11-1689958787838.png
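
The conditional formatting described above can be sketched as follows. HTMLTemplateFormatter takes an Underscore.js-style template in which value is the cell value; the colors, sample data, and column names here are arbitrary choices for the example, not those of the original notebook.

```python
from bokeh.models import ColumnDataSource, DataTable, HTMLTemplateFormatter, TableColumn

# The template runs in the browser. The inline function picks a background
# color from the three revenue categories: <= 1.5M, <= 3.0M, and above 3.0M.
template = """
<div style="padding: 2px 6px; color: white; background: <%=
    (function () {
        if (value <= 1500000) { return "#c0392b"; }
        if (value <= 3000000) { return "#e67e22"; }
        return "#27ae60";
    }()) %>;">
    <%= value %>
</div>
"""

formatter = HTMLTemplateFormatter(template=template)

# Toy data: one customer in each revenue category.
source = ColumnDataSource(data={
    "customer_id": [101, 102, 103],
    "revenue": [3_400_000, 1_200_000, 2_750_000],
})

columns = [
    TableColumn(field="customer_id", title="Customer ID"),
    TableColumn(field="revenue", title="Revenue", formatter=formatter),
]

styled_table = DataTable(source=source, columns=columns, width=400, height=200)
```

Because the comparison uses <= for both thresholds, a revenue of exactly $1.5M falls in the first category and exactly $3.0M in the second, matching the boundaries stated above.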

 

d) Map: We can present the revenue from different countries in a more engaging way by using a world map visualization.

AkshaySharma_12-1689958787851.png

AkshaySharma_13-1689958787843.png
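
The sketch below shows the general shape of a choropleth in Bokeh using GeoJSONDataSource and patches(). To stay self-contained it uses three hand-made square polygons instead of real country boundaries; with geopandas, the GeoJSON string would typically come from a GeoDataFrame's to_json() method. All names and values are illustrative assumptions.

```python
import json

from bokeh.models import GeoJSONDataSource
from bokeh.palettes import Viridis256
from bokeh.plotting import figure
from bokeh.transform import linear_cmap


def square(lon: float, lat: float, size: float = 10.0) -> list:
    """Return GeoJSON polygon coordinates for a square (stand-in for a country shape)."""
    return [[[lon, lat], [lon + size, lat], [lon + size, lat + size], [lon, lat + size], [lon, lat]]]


# Toy regions with revenue in millions; real data would use country geometries.
regions = [("Region A", 1.2, 0, 0), ("Region B", 2.4, 15, 0), ("Region C", 3.1, 30, 0)]
features = [
    {
        "type": "Feature",
        "properties": {"country": name, "revenue": rev},
        "geometry": {"type": "Polygon", "coordinates": square(lon, lat)},
    }
    for name, rev, lon, lat in regions
]
geojson = json.dumps({"type": "FeatureCollection", "features": features})

geo_source = GeoJSONDataSource(geojson=geojson)

world_map = figure(
    title="Revenue by region (toy shapes)",
    height=350,
    width=600,
    tooltips=[("Country", "@country"), ("Revenue (M)", "@revenue")],
)
# BokehJS expands the polygons into "xs"/"ys" columns; fill color scales with revenue.
world_map.patches(
    "xs",
    "ys",
    source=geo_source,
    fill_color=linear_cmap("revenue", Viridis256, low=1.0, high=3.5),
    line_color="white",
)
```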

 

Now we can proceed to create a full dashboard with 2 tabs:

AkshaySharma_14-1689958787844.png

 

In this dashboard we will create a tabbed layout with 2 tabs that reuse the charts created above: Tab 1 contains the line chart, bar chart, and data table, and Tab 2 contains the map.

Tab 1:

AkshaySharma_15-1689958787845.png

Tab 2:

AkshaySharma_16-1689958787868.png

 

Up to this point, we have created all the charts and the dashboard in the notebook interface only. To turn this into a shareable dashboard, we simply embed the application in the Flask framework.

AkshaySharma_17-1689958787852.png

The resulting URL can then be shared with other users.

AkshaySharma_18-1689958787854.png
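
A minimal way to serve a Bokeh layout from Flask is sketched below. The route, the build_dashboard() helper, and the port in the comment are hypothetical choices for this example. The view builds a fresh figure on every request and returns the HTML produced by file_html().

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN
from flask import Flask

app = Flask(__name__)


def build_dashboard():
    """Create the Bokeh object to serve; a single toy plot stands in for the dashboard."""
    chart = figure(title="Orders per year (toy data)", height=300, width=500)
    chart.line([2019, 2020, 2021], [120, 150, 135], line_width=2)
    return chart


@app.route("/")
def dashboard():
    # Build new Bokeh models per request, since a model belongs to one document.
    return file_html(build_dashboard(), resources=CDN, title="Retail dashboard")


# To serve the app, one could call, for example:
# app.run(host="0.0.0.0", port=5000)
```

Flask's built-in server is meant for development; a production deployment would normally sit behind a WSGI server.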

 

AkshaySharma_19-1689958787856.png

In conclusion, Bokeh is a versatile and effective Python framework for creating interactive visualizations for data exploration, analysis, and communication. Thanks to its user-friendly design and numerous customization options, it can be an excellent tool for both new and expert users. Its scalability and web friendliness make it well suited to building charts that can be shared and embedded in websites or applications. By using Databricks' distributed computing capabilities, users can quickly and easily generate complex visualizations at scale. They can also streamline their data analysis workflows, produce compelling graphs that effectively convey their findings, and take advantage of the platform's performance and scalability. Furthermore, Delta table features such as data versioning, data integrity checks, and optimizations help provide consistent and reliable data for visualization. With Databricks' powerful data processing and analytics capabilities, along with Bokeh's visualization features, users can extract key insights and make informed decisions.



The full Databricks notebook can be found here:

https://github.com/AkshaySharma74/BokehDatabricks
