Databricks recently released Genie Code. It sounds very promising. We know it's a Databricks product, so it should come as no surprise that it works within Databricks. In this experiment, we want to see how much effort it takes to migrate someone else's code and onboard it to Databricks. We will take the latest autoresearch project by Andrej Karpathy.
Having done a similar migration last year with Gemini (https://www.tredence.com/blog/game-arena-deepmind-on-databricks), I know it takes about an evening's work to migrate with LLMs that don't understand Databricks, at least back when skills were not available.
While I am confident that Genie Code will live up to expectations, I want to measure its effectiveness along the following dimensions:
- How many prompts do I need to use to achieve my goal?
- How long does the prompt need to be? Do I need to create a very detailed project plan, or will it just read my mind?
- Will the first run be successful?
- Does it recover from errors automatically?
Let's find out.
First of all, we need to clone the repo, and I am glad to do this step manually: https://github.com/karpathy/autoresearch

The second step is to “design” a prompt. I'd say this part worried me, but I tried to start with something simple and improve it from there. Below is the prompt I used:
“Can you convert autoresearch using databricks FMAPI and run a research topic.
Make sure you track the chain of thoughts using MLflow and log the experiment.”
Having used many agents in the past year, I know for a fact that the agent won't know a few things:
- The syntax for connecting to FMAPI
- It is not aware of the Unity Catalog (UC) version of MLflow
- It definitely doesn't know about tracing, which I didn't mention in the prompt
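For context on the first point: connecting to FMAPI typically means POSTing a chat-style payload to a model serving endpoint. The sketch below is my own illustration of that call shape, not code from this experiment; the host, token, and endpoint name are placeholders.

```python
import json
import urllib.request

def build_payload(prompt: str, max_tokens: int = 512) -> dict:
    """Build a chat-completions style request body for a serving endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def call_fmapi(host: str, endpoint: str, token: str, prompt: str) -> dict:
    """POST the prompt to <host>/serving-endpoints/<endpoint>/invocations.

    Illustrative sketch only: host, endpoint, and token are placeholders.
    """
    url = f"{host}/serving-endpoints/{endpoint}/invocations"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

An agent without Databricks knowledge tends to guess at exactly this kind of URL shape and auth header, which is why I listed it as a likely failure point.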
With the code and the above prompt in place, I hit enter. I expected some clarifying questions and failures, the kind of questions even the most senior data scientist would ask. For example:
- Where is the autoresearch code located?
- Where is the endpoint located?
- Which function do I want to log?
- Where should I log the experiment?
But we will let Genie Code figure it out or allow it to ask follow-up questions. Below is the output:

It took a while, but as we can see, everything ran automatically, and it was able to find the code and modify it.
The results? Yes, it was able to run an experiment and log the traces into MLflow! We can see below that it ran for five iterations and produced a result in the sixth step, which is clean and organized. It also did some testing by itself in the first step to make sure the logging was working.

We can open up the final step and see the research output!

What's more, Genie Code also summarizes the iterations to determine which ones to keep and which to drop, aligning with the README.

Conclusion
We now have the answer. Genie Code is not a simple rebrand of Assistant; it is built with a deep understanding of Databricks. To answer my own questions:
- How many prompts do I need to use to achieve my goal?
One prompt only. I did not even try to improve it.
- How long does the prompt need to be? Do I need to create a very detailed project plan, or will it just read my mind?
I wrote two sentences. Very short.
- Will the first run be successful?
Yes: I can see the output.
- Does it recover from errors automatically?
Yes: Genie Code automatically recovers from errors and determines the next step!
This is a huge step forward for any code migration and onboarding to Databricks. No prior expert knowledge is required, and a single prompt is all you need!