Community-produced videos to help you leverage Databricks in your Data & AI journey. Tune in to explore industry trends and real-world use cases from leading data practitioners.
In this episode, Kilian Lieret, Research Software Engineer, and Carlos Jimenez, Computer Science PhD Candidate at Princeton University, discuss SWE-bench and SWE-agent, two groundbreaking tools for evaluating and enhancing AI in software engineering.
Carlos introduces SWE-bench, explaining:
“SWE-bench is a benchmark to evaluate language models on their ability to write code in a more production-like setting… in the real world, a lot of people's actual time spent on software engineering is less on this kind of isolated, algorithmic problem solving and more on maintaining big software systems.”
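To get a feel for what a SWE-bench task looks like, here is a minimal sketch that loads the benchmark from the Hugging Face Hub and prints one task's GitHub issue text. It assumes the `datasets` library is installed and that the public `princeton-nlp/SWE-bench` dataset keeps the column names shown; treat it as an illustration rather than official usage from the episode.

```python
# Minimal sketch: peek at one SWE-bench task.
# Assumes the `datasets` library is installed and the public
# princeton-nlp/SWE-bench dataset exposes these columns.
from datasets import load_dataset

swebench = load_dataset("princeton-nlp/SWE-bench", split="test")
task = swebench[0]

# Each task pairs a real GitHub issue with the repository it came from;
# a model is judged on whether its proposed patch resolves that issue.
print(task["repo"])               # source repository for this issue
print(task["instance_id"])        # unique identifier for the task
print(task["problem_statement"])  # the issue text the model must resolve
```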
Kilian shares how SWE-agent tackles these challenges in a structured, iterative way:
“Very simply put, it's our way of solving SWE-bench-style problems… You start off the agent with an initial prompt, give it the problem statement, tell it about the tools it has… and it just proposes the next action until it believes that it has solved the issue.”
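In code terms, the loop Kilian describes might look roughly like the sketch below. This is an illustration of the general agent pattern, not SWE-agent's actual implementation; `query_model` and `run_tool` are hypothetical stand-ins for the language-model call and the tool executor.

```python
# Illustrative sketch of the agent loop described above, not SWE-agent's real code.
# `query_model` and `run_tool` are hypothetical stand-ins for an LLM call and a
# sandboxed tool executor (shell commands, file edits, test runs, etc.).

def solve_issue(problem_statement: str, tools_description: str, max_steps: int = 30) -> list[dict]:
    """Repeatedly ask the model for the next action until it reports the issue is solved."""
    history = [
        {"role": "system", "content": f"You are a software engineering agent. Tools:\n{tools_description}"},
        {"role": "user", "content": problem_statement},
    ]
    for _ in range(max_steps):
        action = query_model(history)       # hypothetical: model proposes the next action
        if action["name"] == "submit":      # agent believes it has solved the issue
            break
        observation = run_tool(action)      # hypothetical: execute the action in the repo
        history.append({"role": "assistant", "content": str(action)})
        history.append({"role": "user", "content": observation})
    return history
```

Capping the number of steps keeps the loop from running indefinitely when the model never decides it is done.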