Downstream Delta Live Table is unable to read DataFrame from upstream table
04-12-2023 09:06 AM
I have been trying to implement Delta Live Tables in a pre-existing workflow. I am currently trying to create two tables, appointments_raw and notes_raw, where notes_raw is downstream of appointments_raw. Following the linked example as a reference, I load the appointments_raw table inside notes_raw using dlt.read, but the result of dlt.read("appointments_raw") appears to be an empty DataFrame. The appointments_raw table itself does seem to be written correctly, according to the pipeline storage location and the Hive metastore. We are following this example: https://docs.databricks.com/_extras/notebooks/source/dlt-wikipedia-python.html
Specifically, the "top referring pages" code in that notebook references dlt.read("clickstream_prepared"). We are trying to do the same but running into the problem described above. A minimal sketch of the pattern we are trying to reproduce is shown below.
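For context, here is a minimal sketch of the upstream/downstream pattern we are trying to follow. The table names and the dummy source are placeholders, not our actual pipeline (our real code reads from EDW and SOLR via helper functions), and `spark` is the session that DLT provides in the notebook:

import dlt
from pyspark.sql.functions import col

# Minimal sketch only: placeholder table names and placeholder data.

@dlt.table(comment="Placeholder upstream table.")
def appointments_raw_example():
    # Stand-in for the real source read (fetch_data.fetch_appointments in our pipeline).
    return spark.range(100).withColumnRenamed("id", "appointment_id")

@dlt.table(comment="Placeholder downstream table.")
def notes_raw_example():
    # dlt.read() resolves a dataset defined earlier in the same DLT pipeline.
    appointments = dlt.read("appointments_raw_example")
    return appointments.filter(col("appointment_id") % 2 == 0)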
- Labels: Dataframe, Delta Live Tables, Live Table
04-16-2023 12:09 AM
@Anna Wuest: Could you please send me the code snippet here? Thanks.
04-18-2023 06:30 AM
Do you mean this?
import dlt

# fetch_data, SECRET_HANDLER, TIMESTAMP, APPOINTMENTS_DAYS_AHEAD, and COHORT_ID
# are defined elsewhere in the notebook.

@dlt.table(
    comment="Raw table of appointments from EDW",
)
def appointments_raw():
    return fetch_data.fetch_appointments(spark=spark, secret_handler=SECRET_HANDLER)

@dlt.table(
    comment="Raw table of notes from SOLR",
)
def notes_raw():
    # Read the upstream DLT table; this is where we see an empty DataFrame.
    appointments = dlt.read("appointments_raw")
    print(type(appointments))
    print(appointments.head())
    appointments = appointments.pandas_api()
    mrns = fetch_data.select_mrns(
        appointments, today=TIMESTAMP, days_ahead=APPOINTMENTS_DAYS_AHEAD
    )
    notes = fetch_data.fetch_notes(
        mrns, cohort_id=COHORT_ID, secret_handler=SECRET_HANDLER, spark=spark
    )
    return notes

