Thanks a lot, that makes the workflow clear.
I have a few points I need help with:
* Since it mentioned traces, I thought it would refer to the last 10 traces in the Traces section rather than the evaluation dataset.
I did have around 50 traces in the Traces section, although the evaluation dataset was empty.
* I also noticed an issue where, after defining a scorer, clicking "Run scorer" in the UI on the last N traces does nothing. The Traces section also doesn't show a pass/fail check in the Assessments column. (A rough sketch of what I'd expect this action to do via the API is at the end of this list.)
* When "Evaluating Traces: ON" is shown for a scorer, it doesn't evaluate new traces as they are generated.
* After defining the evaluation dataset and building it from selected traces, the Datasets section only shows the empty Delta table that was created, not the traces I added (see the second sketch after this list).
* I also wanted to understand the "Examples" tab (formerly the "Improve quality" tab), where we add questions and guidelines and then start a labeling session. Is it only for human reviewers, or does it tie into a scorer that assesses all traces against those general questions and guidelines?
* Since this is in beta, are there any known UI-level bugs in the Experiments tab?
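
For reference, this is roughly what I assumed "Run scorer" on the last N traces does under the hood. It is only a sketch based on my reading of the `mlflow.genai` docs, so the experiment ID, the placeholder scorer, and the exact arguments are my assumptions rather than something I have confirmed:

```python
import mlflow
from mlflow.genai.scorers import scorer

# Assumption: the ID of the experiment whose Traces tab I am looking at.
EXPERIMENT_ID = "<my-experiment-id>"

# Placeholder custom scorer; my real scorer checks guidelines, this one
# simply passes if the trace produced any output at all.
@scorer
def has_output(outputs) -> bool:
    return bool(outputs)

# Pull the last 10 traces from the Traces section (not the eval dataset).
traces = mlflow.search_traces(
    experiment_ids=[EXPERIMENT_ID],
    max_results=10,
    order_by=["timestamp DESC"],
)

# Score the existing traces directly; no predict_fn is needed since the
# outputs are already recorded on the traces.
results = mlflow.genai.evaluate(
    data=traces,
    scorers=[has_output],
)
# I would then expect the pass/fail assessments to appear in the
# Assessments column of the Traces section.
```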
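
Similarly, this is the flow I assumed for building the evaluation dataset from selected traces. Again just a sketch; `mlflow.genai.datasets`, the `uc_table_name` argument, and the table name are assumptions on my part:

```python
import mlflow
from mlflow.genai.datasets import create_dataset

# Assumption: a Unity Catalog table backing the evaluation dataset.
dataset = create_dataset(uc_table_name="main.default.agent_eval_dataset")

# The traces I selected in the UI (the filter here is illustrative).
selected_traces = mlflow.search_traces(
    experiment_ids=["<my-experiment-id>"],
    max_results=50,
)

# Merge the selected traces into the dataset. I expected these records
# to show up in the Datasets section, not just the empty Delta table.
dataset.merge_records(selected_traces)
```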