Bamboolib with Databricks: low-code programming is now available on Databricks
Now you can prepare your Databricks code without... coding. A low-code solution is now available on Databricks. To start, install and import bamboolib (it requires Databricks Runtime (DBR) 11 on Azure and AWS, and DBR 11.1 on GCP). You can install it with %pip install bamboolib in a notebook, or through cluster settings -> "Libraries" tab.
As we can see on the screen above, there are a few options for loading data:
- read CSV files,
- read Parquet files,
- read a table from the metastore (I bet this will be the most popular option),
- or use an example dataset with ***** data.
We will use the example Titanic dataset.
Now we can apply transformations and actions using the wizard. The auto-generated code is shown below it:
So, let's assume that we select only two fields:
Put Age into bins (0-10 years old, 10-20, 20-30, etc.):
Group by and view the result together with the code:
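To see what the binning and grouping steps do, here is a minimal sketch that repeats them in plain pandas on a few hand-made rows. The sample values are invented for illustration and are not taken from the Titanic file. Note that right=False makes each bin left-closed, so a 10-year-old lands in the 10-20 bin, not the 0-10 one.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Titanic data (not real passengers)
df = pd.DataFrame({
    "Age": [4.0, 10.0, 15.0, 22.0, 27.0, 35.0],
    "Sex": ["male", "female", "male", "female", "female", "male"],
})

# Bin Age into left-closed intervals [0, 10), [10, 20), ... as in the wizard step
df["Age"] = pd.cut(df["Age"], bins=[0, 10, 20, 30, 40], right=False)

# Count rows per (Age bin, Sex) pair; observed=True keeps only pairs present in the data
result = (
    df.groupby(["Age", "Sex"], observed=True)
    .agg(Sex_count=("Sex", "count"))
    .reset_index()
)
print(result)
```

With observed=True, empty bin/sex combinations are left out of the result instead of appearing with a count of zero.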
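Now we can copy the code and use it in our projects. Remember that you can replace pandas with the pandas API on Spark (pyspark.pandas) so the code runs in a distributed way; not every pandas function is available there, so check the generated code against its API.
These are some of the available transformations: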
- Select or drop columns,
- Filter rows,
- Sort rows,
- Group by and aggregate,
- Join / merge,
- Change data types,
- Change names,
- Find and replace,
- Conditional replace / if else,
- Change DateTime frequency,
- Extract DateTime,
- Move column,
- Bin column,
- Concatenate,
- Pivot,
- Unpivot,
- Window functions,
- Plot creators
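For orientation, here is a hand-written pandas sketch of a few of the listed transformations (filter rows, sort rows, change names, conditional replace, pivot) on a small made-up table. It only illustrates the kind of pandas code these steps correspond to; the column names and data are hypothetical, and bamboolib's own generated code may look different.

```python
import numpy as np
import pandas as pd

# Made-up example data (hypothetical, for illustration only)
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "sex": ["female", "male", "male", "female"],
    "pclass": [1, 3, 2, 3],
    "fare": [80.0, 7.25, 13.0, 8.05],
})

# Filter rows: keep passengers in 2nd or 3rd class
filtered = df[df["pclass"] > 1]

# Sort rows: most expensive fare first
sorted_df = filtered.sort_values("fare", ascending=False)

# Change names: rename a column
renamed = sorted_df.rename(columns={"fare": "ticket_fare"})

# Conditional replace / if else: label tickets by price (assign returns a new frame)
renamed = renamed.assign(
    fare_band=np.where(renamed["ticket_fare"] >= 10, "regular", "cheap")
)

# Pivot: number of passengers per sex and class, with 0 for empty cells
pivoted = df.pivot_table(
    index="sex", columns="pclass", values="fare", aggfunc="count", fill_value=0
)
print(renamed)
print(pivoted)
```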
Thanks to the plot creator, we can visualize our data easily.
In the example below, we used a bar plot.
The auto-generated code for the example above is:
```python
import pandas as pd; import numpy as np
import bamboolib as bam  # provides bam.titanic_csv, the bundled example dataset
df = pd.read_csv(bam.titanic_csv)
# Step: Select columns
df = df[['Age', 'Sex']]
# Step: Bin column
df['Age'] = pd.cut(df['Age'], bins=[0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0], right=False, precision=0)
# Step: Group by and aggregate
df = df.groupby(['Age', 'Sex']).agg(Sex_count=('Sex', 'count')).reset_index()
# Step: Change data type of Age to String/Text
df['Age'] = df['Age'].astype('string')
import plotly.express as px
fig = px.bar(df, x='Age', y='Sex_count', color='Sex')
# Display the figure (the last expression in a notebook cell is rendered)
fig
```