10-19-2022 06:22 AM
Bamboolib with Databricks: low-code programming is now available on #databricks
Now you can prepare your Databricks code without ... coding. A low-code solution is now available on Databricks. Install and import bamboolib to get started (requires DBR 11.0 on Azure and AWS, DBR 11.1 on GCP). You can install it with %pip or via the cluster settings -> “Libraries” tab:
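For example, a minimal notebook setup looks like this (running bam on its own line brings up the bamboolib starting screen):

%pip install bamboolib

# In a new cell:
import bamboolib as bam
bam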
When the bamboolib UI opens, we have a few options:
- read CSV files,
- read Parquet files,
- read a table from the metastore (I bet it will be the most popular option),
- or use an example dataset with ***** data.
We will use the example Titanic dataset.
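In code, loading the bundled Titanic sample looks like this; displaying the pandas DataFrame in the notebook is what opens the interactive wizard:

import pandas as pd
import bamboolib as bam

df = pd.read_csv(bam.titanic_csv)  # example dataset shipped with bamboolib
df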
Now we can apply transformations and actions using the wizard. Below, we can see the auto-generated code:
So, let’s assume that we select only two fields:
Put Age into bins (0-10 years old, 10-20, 20-30, etc.):
Group by and see the result together with the code:
Now we can copy our code and use it in our projects. Remember that we can replace pandas with the pandas API on Spark so that the code runs in a distributed way.
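A minimal sketch of that swap, assuming the data lives at a hypothetical DBFS path (pyspark.pandas mirrors most of the pandas API, but not all of it, so steps such as pd.cut may need adjusting):

import pyspark.pandas as ps

# Same select and group-by steps as above, now executed on the cluster:
psdf = ps.read_csv('/mnt/data/titanic.csv')  # hypothetical path
psdf = psdf[['Age', 'Sex']]
psdf = psdf.groupby(['Sex']).agg(Sex_count=('Sex', 'count')).reset_index()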
These are example transformations available (a sketch of a few of them in code follows the list):
- Select or drop columns,
- Filter rows,
- Sort rows,
- Group by and aggregate,
- Join / merge,
- Change data types,
- Change names,
- Find and replace,
- Conditional replace / if-else,
- Change DateTime frequency,
- Extract DateTime,
- Move column,
- Bin column,
- Concatenate,
- Pivot,
- Unpivot,
- Window functions,
- Plot creators.
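As an illustration, the pandas code bamboolib generates for a few of these steps might look roughly like this (column names are taken from the Titanic example; the exact output depends on the options picked in the wizard):

df = df.loc[df['Age'] >= 18]  # Step: Filter rows
df = df.sort_values(by=['Age'], ascending=[False])  # Step: Sort rows
df = df.rename(columns={'Pclass': 'Passenger_class'})  # Step: Change names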
Thanks to the plot creator, we can visualize our data easily.
In the example below, we used a bar plot.
The auto-generated code from the example above is shown below:
import pandas as pd
import numpy as np
import bamboolib as bam

df = pd.read_csv(bam.titanic_csv)
# Step: Select columns
df = df[['Age', 'Sex']]
# Step: Bin column
df['Age'] = pd.cut(df['Age'], bins=[0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0], right=False, precision=0)
# Step: Group by and aggregate
df = df.groupby(['Age', 'Sex']).agg(Sex_count=('Sex', 'count')).reset_index()
# Step: Change data type of Age to String/Text
df['Age'] = df['Age'].astype('string')

import plotly.express as px
fig = px.bar(df, x='Age', y='Sex_count', color='Sex')
fig
10-19-2022 07:13 AM
@Hubert Dudek Informative article, thanks for creating it!
10-19-2022 12:56 PM
Thanks
06-29-2023 03:52 AM
I have tried to load a Parquet file using the bamboolib menu, and I am getting the error below saying that the path does not exist.
I can load the same file without any problem using Spark or pandas with the following path:
citi_pdf = pd.read_parquet(f'/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet', engine='pyarrow')
Does it work already, or does it still have some bugs?
AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet.

Full stack trace:
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/helper/gui_outlets.py", line 346, in safe_execution
    hide_outlet = execute_function(self, *args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/setup/module_view.py", line 365, in open_parquet
    df = exec_code(code, symbols=self.symbols, result_name=df_name)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/helper/utils.py", line 446, in exec_code
    exec(code, exec_symbols, exec_symbols)
  File "<string>", line 1, in <module>
  File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 48, in wrapper
    res = func(*args, **kwargs)
  File "/databricks/spark/python/pyspark/sql/readwriter.py", line 533, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
  File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
    return_value = get_return_value(
  File "/databricks/spark/python/pyspark/errors/exceptions.py", line 234, in deco
    raise converted from None
pyspark.errors.exceptions.AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet.