Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.


Hubert-Dudek
Esteemed Contributor III

Bamboolib with databricks, low-code programming is now available on #databricks

Now you can prepare your Databricks code without ... coding. A low-code solution is now available on Databricks. Install and import bamboolib to start (it requires DBR 11.0 on Azure and AWS, and 11.1 on GCP). You can install it with %pip or from the cluster settings -> "Libraries" tab:
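A minimal sketch of the notebook-scoped install route (the cluster-library route achieves the same thing through the UI):

%pip install bamboolib

# In the next cell (after the Python process restarts, if prompted):
import bamboolib as bam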

[Screenshot: bamboolib start screen with data source options]

As we can see on the screen above, we have a few options:

- read CSV files,
- read Parquet files,
- read a table from the metastore (I bet it will be the most popular option),
- or use an example dataset with ***** data.

We will use the example Titanic dataset.
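As a quick sketch, the sample can also be loaded by hand; bam.titanic_csv is the bundled path that also appears in the auto-generated code later:

import bamboolib as bam
import pandas as pd

# Load the bundled Titanic sample; displaying the DataFrame in a cell
# brings up the bamboolib wizard for it.
df = pd.read_csv(bam.titanic_csv)
df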

[Screenshot: the Titanic dataset loaded in bamboolib]

Now we can make transformations and actions using the wizard. Below, we can see the auto-generated code:

[Screenshot: bamboolib wizard with auto-generated code]

So, let's assume that we select only two fields:

[Screenshot: selecting the Age and Sex columns]

Put Age into bins (0-10 years old, 10-20, 20-30, etc.):

[Screenshot: binning the Age column]

Group by and see the result together with the code:

[Screenshot: group-by result with the generated code]

Now we can copy our code and use it in our projects. Remember to replace pandas with the pandas API on Spark so the code runs in a distributed way; a sketch of that conversion follows the generated code below.

These are the example transformations available (a couple of them are sketched in plain pandas after the list):

- Select or drop columns,
- Filter rows,
- Sort rows,
- Group by and aggregate,
- Join / merge,
- Change data types,
- Change names,
- Find and replace,
- Conditional replace / if-else,
- Change DateTime frequency,
- Extract DateTime,
- Move column,
- Bin column,
- Concatenate,
- Pivot,
- Unpivot,
- Window functions,
- Plot creators.
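A minimal sketch of what two of these steps look like in plain pandas (a hypothetical toy frame with Titanic-like columns; not bamboolib's exact generated code):

import pandas as pd
import numpy as np

# Toy frame with Titanic-like columns (hypothetical values).
df = pd.DataFrame({'Age': [5, 25, 62], 'Fare': [7.3, 71.3, 80.0], 'Pclass': [3, 1, 1]})

# Conditional replace / if-else: label adults vs. children.
df['AgeGroup'] = np.where(df['Age'] >= 18, 'adult', 'child')

# Window function: average fare within each passenger class.
df['AvgClassFare'] = df.groupby('Pclass')['Fare'].transform('mean')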

Thanks to the plot creator, we can visualize our data easily. In the example below, we used a bar plot.

[Screenshot: bar plot of the binned ages]

The auto-generated code from the above example is below:

import bamboolib as bam  # needed for the bundled sample path used below
import pandas as pd
import numpy as np

df = pd.read_csv(bam.titanic_csv)

# Step: Select columns
df = df[['Age', 'Sex']]

# Step: Bin column
df['Age'] = pd.cut(df['Age'], bins=[0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0], right=False, precision=0)

# Step: Group by and aggregate
df = df.groupby(['Age', 'Sex']).agg(Sex_count=('Sex', 'count')).reset_index()

# Step: Change data type of Age to String/Text
df['Age'] = df['Age'].astype('string')

import plotly.express as px
fig = px.bar(df, x='Age', y='Sex_count', color='Sex')
fig
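As noted above, the same pipeline can be run in a distributed way by swapping pandas for the pandas API on Spark. A minimal sketch, assuming the Titanic CSV has been copied somewhere Spark can read (the path below is a placeholder, and the binning step is omitted because pd.cut does not have a guaranteed one-line equivalent in every pyspark.pandas version):

import pyspark.pandas as ps
import plotly.express as px

# Read with the pandas API on Spark; replace the placeholder path with
# wherever the CSV actually lives in your workspace.
df = ps.read_csv('dbfs:/mnt/<your-path>/titanic.csv')

# Step: Select columns
df = df[['Age', 'Sex']]

# Step: Group by and aggregate (named aggregation works as in pandas)
df = df.groupby(['Age', 'Sex']).agg(Sex_count=('Sex', 'count')).reset_index()

# Plotly needs a local frame, so collect before plotting:
fig = px.bar(df.to_pandas(), x='Age', y='Sex_count', color='Sex')
fig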

3 REPLIES

karthik_p
Esteemed Contributor

@Hubert-Dudek Informative article, thanks for creating it.

Hubert-Dudek
Esteemed Contributor III

Thanks

Palkers
New Contributor III
I have tried to load a parquet file using the bamboolib menu and am getting the error below that the path does not exist.
I can load the same file with no problem using Spark or pandas with the following path:
citi_pdf = pd.read_parquet('/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet', engine='pyarrow')

Does it work already, or does it still have some bugs?
AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet.



Full stack trace:
-----------------------------
Traceback (most recent call last):
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/helper/gui_outlets.py", line 346, in safe_execution
hide_outlet = execute_function(self, *args, **kwargs)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/setup/module_view.py", line 365, in open_parquet
df = exec_code(code, symbols=self.symbols, result_name=df_name)
File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/bamboolib/helper/utils.py", line 446, in exec_code
exec(code, exec_symbols, exec_symbols)
File "", line 1, in
File "/databricks/spark/python/pyspark/instrumentation_utils.py", line 48, in wrapper
res = func(*args, **kwargs)
File "/databricks/spark/python/pyspark/sql/readwriter.py", line 533, in parquet
return self._df(self._jreader.parquet(_to_seq(self._spark._sc, paths)))
File "/databricks/spark/python/lib/py4j-0.10.9.5-src.zip/py4j/java_gateway.py", line 1321, in __call__
return_value = get_return_value(
File "/databricks/spark/python/pyspark/errors/exceptions.py", line 234, in deco
raise converted from None
pyspark.errors.exceptions.AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet.
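For reference, the [PATH_NOT_FOUND] value is dbfs:/dbfs/mnt/..., i.e. the dbfs: scheme prepended on top of the local FUSE path /dbfs/.... A hedged sketch of the two path forms that typically resolve on Databricks (same file, two access styles; `spark` is the session Databricks notebooks provide):

import pandas as pd

# Local-file APIs (pandas, pyarrow) read through the /dbfs FUSE mount:
pdf = pd.read_parquet('/dbfs/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet')

# Spark readers take the dbfs:/ scheme (or a bare /mnt/... path) without
# the /dbfs prefix, so the bamboolib dialog may accept:
sdf = spark.read.parquet('dbfs:/mnt/orbify-sales-raw/WideWorldImportersDW/Dimension_City_new.parquet')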
