Hi there, I saw from an announcement last year that Spark Declarative Pipelines (previously DLT) was being open-sourced into Apache Spark, and it looks like this has now landed as of Apache Spark 4.1.
I'm trying to test this out in a Docker container, just to see if/how it's possible to use SDP as a fully standalone tool and to help ease vendor lock-in concerns. However, outside of building an SDP pipeline in Databricks, I'm not sure how to go about doing this with the open-source version. For context, here is the Dockerfile I'm currently using to get the latest version of Apache Spark (4.1.1 at the time of writing):
FROM apache/spark:latest
# Switch to root to install packages
USER root
# Install Jupyter, ipykernel, findspark, and pyspark
RUN pip install --no-cache-dir \
    jupyter \
    ipykernel \
    findspark \
    pyspark
# Register the ipykernel
# This ensures "Python 3" is available as a kernel option in the UI
RUN python3 -m ipykernel install --user
ENV SPARK_HOME=/opt/spark
ENV PATH=$SPARK_HOME/bin:$PATH
# Switch back to spark user if desired, or stay root for VS Code access
USER root
WORKDIR /opt/spark/app
CMD ["/bin/bash"]
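For completeness, this is roughly how I'm building and running the container (the image tag and port mapping are just what I happened to pick; the port is only there so I can reach Jupyter):
docker build -t spark-sdp-test .
docker run -it --rm -p 8888:8888 spark-sdp-test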
Inside the container, I'm able to import the pipelines module in PySpark without any issue, but as soon as I actually try to use it I get an error.
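To be concrete, this is roughly the kind of thing I'm running in a plain pyspark / Jupyter session (just a throwaway materialized view to see whether the decorator works at all; my real transformations don't matter here):

from pyspark import pipelines as dp

# Throwaway element just to exercise the API -- this fails immediately
# with the error below as soon as the decorator is applied.
@dp.materialized_view
def my_test_mv():
    return spark.range(5)  # `spark` here is the session the pyspark shell provides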

ERROR MESSAGE:
PySparkRuntimeError: [GRAPH_ELEMENT_DEFINED_OUTSIDE_OF_DECLARATIVE_PIPELINE] APIs that define elements of a declarative pipeline can only be invoked within the context of defining a pipeline.
Any help clarifying what the issue might be here would be much appreciated!