I'm a little confused about how streaming works with DLT.
My first question is: what is the difference in behavior if you set the pipeline mode to "Continuous" but don't use the "streaming" prefix on your table statements in the notebook? And conversely, what happens if you set the pipeline mode to "Triggered" but use "streaming_read" and other streaming statements in the notebook?
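To make that concrete, here is roughly the kind of notebook I mean; this is just a sketch with placeholder table names and paths, and I'm assuming the standard `dlt` Python API with `spark` provided by the pipeline runtime:

```python
import dlt

# Streaming definition: incremental reads of new files via Auto Loader
@dlt.table
def orders_streaming():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/orders/")  # placeholder path
    )

# Non-streaming definition: recomputes from the source on each update
@dlt.table
def orders_batch():
    return spark.read.json("/mnt/landing/orders/")  # placeholder path
```

In other words, what actually changes about each of these two definitions when the pipeline mode flips between "Continuous" and "Triggered"?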
My second question is how do joins work in streaming tables in DLT? For example, if you have two streaming sources and you create a new streaming table using the .join clause, the DAG will show both tables running concurrently and converging into a single table, also streaming. This makes sense to me so far. But if Source1 drops a file with 10 rows and Source2 drops a file with related rows (common key between files) but 30 seconds later, wouldn't the pipeline immediately try to ingest Source1 and the inner join in the next step finds no common rows so doesn't load anything? So unless both files drop exactly at the same time, you will have a race condition that will always drop rows?
Third question: how is the batch size determined for streaming sources in DLT? If a file with 100 rows gets picked up by Auto Loader, will it try to load all 100 rows before moving on to the next step of the pipeline, or can a batch be smaller than 100? And what about very large files (millions of rows)?
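Related to that, am I right that options like the ones below are what control how much Auto Loader pulls into a single micro-batch? I believe these options exist, but I may be misreading the docs, and the path and values here are just placeholders:

```python
import dlt

@dlt.table
def rate_limited_source():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")
        # my understanding: these cap how much Auto Loader takes into one micro-batch
        .option("cloudFiles.maxFilesPerTrigger", 10)    # max number of files per micro-batch
        .option("cloudFiles.maxBytesPerTrigger", "1g")  # soft cap on bytes per micro-batch
        .load("/mnt/landing/big_files/")                # placeholder path
    )
```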