10-25-2024 02:32 AM
I am working through the current version of the standard AutoLoader demo, i.e.
a month ago
Hi @RoelofvS,
How are you doing today? As far as I understand, the first thing to check is that cloudFiles.schemaEvolutionMode is set to addNewColumns, which enables automatic schema updates when new columns appear. If schema versions aren't updating in the same location, try pointing to a new schema location to reset schema tracking temporarily, though this shouldn't be needed under normal conditions. Check file access permissions in the inferred_schema directory, as permission issues could prevent schema updates. Running the demo on a fresh cluster or with a different schema location can help identify whether the environment is affecting schema evolution. Lastly, consider testing with a different Databricks runtime version to rule out runtime-specific issues.
Give it a try and let me know.
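For reference, here is a minimal sketch of the reader configuration described above. The option names are the standard Auto Loader ones; the helper names, the paths, and the json file format are assumptions for illustration and are not taken from the demo notebook.

```python
def autoloader_options(schema_location, file_format="json"):
    """Build the Auto Loader options discussed above.

    schema_location and file_format are placeholders (assumptions);
    substitute the values your demo actually uses.
    """
    return {
        "cloudFiles.format": file_format,
        # Directory where Auto Loader keeps its inferred schema versions.
        "cloudFiles.schemaLocation": schema_location,
        # Already the default when no explicit schema is given; set explicitly here.
        "cloudFiles.schemaEvolutionMode": "addNewColumns",
    }


def build_stream(spark, source_path, schema_location):
    """Apply the options to a streaming reader (needs a live SparkSession)."""
    reader = spark.readStream.format("cloudFiles")
    for key, value in autoloader_options(schema_location).items():
        reader = reader.option(key, value)
    return reader.load(source_path)
```

Pointing schema_location at a fresh, empty directory is the "reset schema tracking" test mentioned above.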
Regards,
Brahma
a month ago
Hello Brahma,
Thank you for your response. To answer your suggestions:
1) cloudFiles.schemaEvolutionMode: It is the default behaviour, but I have also added it explicitly:
.option("cloudFiles.schemaEvolutionMode", "addNewColumns")
2) "try pointing to a new schema location to reset schema" - I have repointed it as a test, and a new file 0 gets created. I have also just renamed 0 to zero in the terminal, and a new file 0 gets created. In both cases schema evolution picked up the new column(s).
3) "file access permissions" - they are always rwxrwxrwx for the files, and drwxrwxrwx for the directories.
4) Other: I have also tried with a fresh cluster, and with different locations. I have tested with the latest runtime version via "use_current_cluster=True", and also with the cluster version that it creates itself.
Extra info:
It definitely reads the latest version of the evolution file. I have edited 0 (or 1) with vi, and changed the first line "v1" to "v2". An error gets thrown about not being happy with "v2". A second error, "UnknownFieldException", also gets thrown; that one is expected in the demo. This error does not get raised in my normal testing.
I managed to get evolution to work as expected, but once only. This involved renaming 0 to zero, adding a new column, copying the new 0 to 1, adding new columns, and after that, just adding new columns with no fiddling in between. But I reset the demo and could not get it working again.
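For anyone who wants to compare, the manual inspection of the version files can be scripted. This is only a sketch: the helper name is made up, it assumes the directory holding the numbered files (0, 1, ...) is reachable as a local path, and it treats everything after the first line as opaque text rather than assuming a particular layout.

```python
from pathlib import Path


def list_schema_versions(schema_dir):
    """Return (version, header, body) for each numbered schema file.

    Only files whose names are plain digits (0, 1, ...) are read, so a
    renamed copy such as "zero" is ignored. The header is the first line
    (e.g. "v1"); the body is returned unparsed.
    """
    entries = []
    for path in Path(schema_dir).iterdir():
        if path.is_file() and path.name.isdigit():
            header, _, body = path.read_text().partition("\n")
            entries.append((int(path.name), header.strip(), body.strip()))
    return sorted(entries)
```

Calling it after each run of f16 and f17 shows whether a new numbered file was written and whether its body contains the new column.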
I wonder if anyone else has the demo up and running, and could confirm whether they get the same issue or not.
Basically the frames called are:
f2 to reset the demo, with $reset_all_data=true
f11 to do the initial inference
Then playing with
f16 to add a new column name each time
f17 to load and display the dataframe to check whether the new column got picked up after an "UnknownFieldException" message.
Kind regards - Roelof
a month ago
Hi @RoelofvS,
I have gone through your response and here is my suggestion below.
Make sure to allow for a slight delay or checkpoint refresh after each schema change to ensure Auto Loader registers updates fully. Given that renaming 0 to 1 prompted schema evolution once, try incremental versioning by creating successive files (0, 1, etc.) after each change. Additionally, consider restarting the stream whenever you make schema modifications, as this can help Auto Loader refresh the schema properly. Setting an explicit schema with .schema() for the initial load may also stabilize the evolution process by providing a structured base. For more insights, enable detailed logging to trace schema evolution steps and check for any timing or metadata conflicts.
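The "restart the stream after a schema change" suggestion can be sketched as a small retry loop. Treat this as an illustration only: run_with_restarts and start_stream are hypothetical names, and matching on the text "UnknownFieldException" in the error message is an assumption about how the failure surfaces in Python.

```python
def run_with_restarts(start_stream, max_restarts=3):
    """Run a stream, restarting it when it stops on a schema change.

    start_stream is a hypothetical zero-argument callable that starts the
    stream and returns an object with awaitTermination(), such as a
    StreamingQuery. Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        try:
            query = start_stream()
            query.awaitTermination()
            return restarts
        except Exception as exc:
            # Assumption: the schema-change failure mentions this class name.
            is_schema_change = "UnknownFieldException" in str(exc)
            if not is_schema_change or restarts >= max_restarts:
                raise
            restarts += 1
```

Any other error, or more than max_restarts schema-change failures in a row, is re-raised so that a genuine problem is not hidden.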
Hope this helps!
Good day.
Regards,
Brahma
4 weeks ago
Hello @Brahmareddy,
I have tried the above, without success.
> enable detailed logging to trace schema evolution steps
Please can you guide me with the steps or a URL? We are on AWS.
Kind regards - Roelof
3 weeks ago
Hello @Brahmareddy,
> enable detailed logging to trace schema evolution steps
Please can you still advise on the above?
Kind regards - Roelof