Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Vectorisation job automatisation and errors

brahaman
New Contributor II

Hey there!

So I'm fairly new to AI and RAG, and at the moment I'm trying to automatically vectorise documents (.pdf, .txt, etc.) each time a new file arrives in a volume that I created.
For that, I created a job that is triggered each time a new file arrives; it runs a suite of jobs, including the vectorisation process.
Because I'm new to this, I chose to use the following notebooks (with minor tweaks to point at my volumes) provided by Databricks: 
https://docs.databricks.com/aws/en/notebooks/source/generative-ai/unstructured-data-pipeline.html
Unfortunately, I'm facing a lot of issues with the files that are automatically downloaded from Hugging Face. I think this is because the job has limited ability to modify those files?

So my question is: are there other ways to automate this? Are there other ways to optimise the pipeline?

Thanks in advance 😄

