As far as I understand, Delta Live Tables should now support reading data from an external location, but I can't get it to work. I've added an ADLS container to Unity Catalog as an external location. The container has a folder holding an example file with one JSON object per line.
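For reference, the external location is registered roughly like this (the location, container, storage account, and credential names are placeholders for my actual ones):

# Register the ADLS container as a Unity Catalog external location.
# <location_name> and <credential_name> stand in for the real names.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS <location_name>
    URL 'abfss://<container_name>@<storage_name>.dfs.core.windows.net/'
    WITH (STORAGE CREDENTIAL <credential_name>)
""")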
Reading the file on an all-purpose cluster or a job cluster works fine with this code:
df = spark.read.format("json").load("abfss://<container_name>@<storage_name>.dfs.core.windows.net/test_json/")
df.printSchema()
And as far as I can tell, this is the Auto Loader counterpart that should work in a DLT pipeline:
import dlt

@dlt.table
def test_data():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("abfss://<container_name>@<storage_name>.dfs.core.windows.net/test_json/")
    )
But when I run the pipeline, I get the error "Failed to resolve flow: 'test_data'". What am I doing wrong?