
Is there some form of enablement required to use Delta Live Tables (DLT)?

tom_shaffner
New Contributor III

I'm trying to use Delta Live Tables, but even when I import the example notebooks I get an error saying `ModuleNotFoundError: No module named 'dlt'`. If I try to install it via pip, it attempts to install a deep learning framework of some sort.

I checked the requirements document and don't immediately see a runtime requirement; am I missing something? Is there something else I need to do to use this feature?

1 ACCEPTED SOLUTION

Aashita
Contributor III

Yes, you will get that error when you run the notebook directly.

Follow the steps below:

  • On the Databricks notebook left panel, select 'Jobs'
  • Select 'Delta Live Tables'
  • Select 'Create Pipeline'
  • Fill in the details: give the pipeline a name, and under Notebook Libraries point to the notebook that contains your dlt code (a minimal sketch of such a notebook follows the quickstart link below)
  • Click 'Start' in the top right corner
  • This starts the pipeline, populates the tables, and renders a graphical representation
  • NOTE: Make sure you attach a cluster to your notebook

https://docs.databricks.com/data-engineering/delta-live-tables/delta-live-tables-quickstart.html
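For reference, here is a minimal sketch of what such a notebook might contain, loosely based on the quickstart; the table names, comment text, source path, and column name below are placeholders, not from this thread:

    import dlt
    from pyspark.sql.functions import col

    # Each @dlt.table function defines one table; the pipeline discovers these
    # when it runs the notebook. Point the load path at your own data.
    # (In a Databricks notebook, `spark` is predefined.)
    @dlt.table(comment="Hypothetical example: raw events loaded from storage")
    def events_raw():
        return spark.read.format("json").load("/path/to/your/source/data")

    # Reading from another live table with dlt.read() links the two tables
    # in the pipeline graph that the UI renders.
    @dlt.table(comment="Hypothetical example: raw events with nulls filtered out")
    def events_clean():
        return dlt.read("events_raw").where(col("id").isNotNull())

When the pipeline starts, it runs the notebook, materializes both tables, and draws events_raw -> events_clean in the graph view.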


6 REPLIES

Thanks a lot for sharing this great example.

mangeldfz
New Contributor III

This error is so annoying... Is it going to be fixed or is there any workaround to avoid it?

Well, you are not supposed to run the notebook directly. You just define your Delta Live Tables in the notebook and attach a cluster. Once you have done that, go to Jobs and start the pipeline. The pipeline gathers the table definitions from the notebook, initializes them, sets up the tables, and renders the graph. Delta Live Tables is an orchestration feature.
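If you would rather trigger the pipeline programmatically than from the Jobs UI, a sketch like the following should work against the Delta Live Tables Pipelines REST API; the workspace URL, token, and pipeline ID are placeholders you must supply:

    import requests

    # Placeholders -- substitute your workspace URL, a personal access token,
    # and the ID of the pipeline you created in the UI.
    WORKSPACE = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    PIPELINE_ID = "<pipeline-id>"

    # Start an update (a run) of the existing pipeline.
    resp = requests.post(
        f"{WORKSPACE}/api/2.0/pipelines/{PIPELINE_ID}/updates",
        headers={"Authorization": f"Bearer {TOKEN}"},
    )
    resp.raise_for_status()
    print(resp.json())  # Includes the update_id of the run that was started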

tom_shaffner
New Contributor III

Got it. That helps, thanks.

That could maybe be clearer in the documentation; it wasn't immediately obvious to me that this can't be run in a normal notebook environment. From the documentation it sounded like I could develop that way and then set up the DLT environment only for actual use.

Insight6
New Contributor II

Here's the solution I came up with... Replace `import dlt` at the top of your first cell with the following:

    try:
        import dlt  # When run in a pipeline, this module exists (there is no way to import it here)
    except ImportError:
        class dlt:  # Mock the dlt module so the rest of the notebook can be syntax-checked in the editor
            @staticmethod
            def table(comment=None, **options):  # Mock the @dlt.table decorator factory
                def _(f):
                    return f  # Return the function unchanged so later references still resolve
                return _

Further mocking may be required depending on how many members of the dlt module you use, but you get the gist (see the extended sketch below).

You can "catch" the import error and mock out a dlt class sufficiently that the rest of your code can be syntax-checked. This slightly improves the developer experience until you get a chance to actually run the notebook in a pipeline.

As many have noted, the special `dlt` library isn't available when running your Python code from the Databricks notebook editor, only when running it from a pipeline, which means you lose the ability to easily check your code's syntax before attempting to run it.

You also can't `%pip install` this library, because it isn't a public package; the `dlt` package that is out there has nothing to do with Databricks.
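As an illustration of the "further mocking" point, a slightly fuller mock might stub out the other decorators and readers a notebook uses. Which members you need depends entirely on your own code, so treat this as a sketch; `dlt.view` and `dlt.read` are real members of the pipeline-time module, but the stubs here are just syntax placeholders:

    try:
        import dlt
    except ImportError:
        class dlt:  # Extended mock: stub only the members this hypothetical notebook uses
            @staticmethod
            def table(comment=None, **options):  # Stub for the @dlt.table decorator factory
                def _(f):
                    return f
                return _

            view = table  # @dlt.view has the same decorator-factory shape

            @staticmethod
            def read(name):  # Stub for dlt.read("table_name"); returns nothing useful in the editor
                return None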
