Hi There,
I'm new to the delta table format so please bear with me if I've missed something obvious! I've migrated data from on prem. Sql to fabric and stored two related tables as delta tables. When I query data from these tables and join them based on a related key the query takes a significant amount of time. Ie 60 seconds for a limit 1000 sql query. Table 1 has c. 6m rows table 2 maybe 1m. The data types are currently string but I can change this to integer should it help. The keys are integers from sql but I've stored them as string format for now.
Is them being a string hindering performance or should I employ an optimisation technique such as Z ordering? (I have tried Z ordering but it has no impact on the files.)
I am using pyspark in a notebook in ms fabric which I understand runs delta 2.3. I believe later versions (those on databricks) also support an auto incrementing identity column which isn't in place here.