
Changelog

New features, improvements, and fixes in Agenta.

Playground Improvements

  • We've improved the workflow for adding outputs to a test set in the playground. In the past, you had to select the name of the test set each time. Now, the last used test set is selected by default.
  • We have significantly improved the debugging experience when creating applications from code. Now, if an application fails, you can view the logs to understand the reason behind the failure.
  • We moved the copy message button in the playground to the output text area.
  • We now hide the cost and usage in the playground when they aren't specified.
  • We've improved the error messages in the playground.

Bug Fixes

  • Fixed the order of the arguments when running a custom code evaluator
  • Fixed the timestamp in the Testset view (previously, timestamps were dropping the trailing 0)
  • Fixed the creation of applications from code in the self-hosted version when using Windows
v0.14.0

Prompt and Configuration Registry

We've introduced a feature that allows you to use Agenta as a prompt registry or management system. In the deployment view, we now provide an endpoint to directly fetch the latest version of your prompt. Here is what it looks like:


```python
from agenta import Agenta

agenta = Agenta()
# Fetches the configuration deployed to the "production" environment, with caching
config = agenta.get_config(base_id="xxxxx", environment="production", cache_timeout=200)
```

You can find additional documentation here.
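The `cache_timeout` argument suggests that a fetched configuration is reused for a while before the registry is queried again. The sketch below illustrates that general time-to-live (TTL) caching idea. It is not Agenta's implementation: the `TTLConfigCache` class and the `fetch` callable are hypothetical, and treating the timeout as seconds is an assumption.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLConfigCache:
    """Hypothetical TTL cache for configurations (not Agenta's code).

    `fetch` stands in for whatever call retrieves a configuration
    from the registry; the timeout unit (seconds) is an assumption.
    """

    def __init__(
        self,
        fetch: Callable[[str, str], Dict[str, Any]],
        cache_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._timeout = cache_timeout
        self._clock = clock
        # (base_id, environment) -> (time fetched, configuration)
        self._entries: Dict[Tuple[str, str], Tuple[float, Dict[str, Any]]] = {}

    def get_config(self, base_id: str, environment: str) -> Dict[str, Any]:
        key = (base_id, environment)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._timeout:
            return entry[1]  # still fresh: serve the cached copy
        config = self._fetch(base_id, environment)  # missing or stale: refetch
        self._entries[key] = (now, config)
        return config
```

Within the timeout window, repeated lookups never reach the registry; after it expires, the next lookup picks up whatever version is currently deployed.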

Improvements

  • Previously, publishing a variant from the playground to an environment was a manual process. From now on, variants are published to the production environment by default.
v0.13.8

Miscellaneous Improvements

  • The total cost of an evaluation is now displayed in the evaluation table. This allows you to understand how much evaluations are costing you and track your expenses.

Bug Fixes

  • Fixed sidebar focus in automatic evaluation results view
  • Fixed the incorrect URLs shown when running 'agenta variant serve'

Evaluation Speed Increase and Numerous Quality of Life Improvements

  • We've made evaluations 3x faster by batching calls asynchronously.
  • We've added Groq as a new provider, along with Llama 3, to the playground.
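The speedup above comes from batching calls asynchronously. A minimal sketch of that general pattern with Python's `asyncio`, assuming each model call is an awaitable; `run_in_batches` and `fake_llm_call` are hypothetical names for illustration, not part of Agenta:

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def run_in_batches(
    items: Sequence[T],
    call: Callable[[T], Awaitable[R]],
    batch_size: int = 10,
) -> List[R]:
    """Run `call` over `items`, at most `batch_size` calls in flight at once."""
    results: List[R] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Calls within a batch run concurrently; gather keeps input order.
        results.extend(await asyncio.gather(*(call(item) for item in batch)))
    return results


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a model call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return prompt.upper()
```

For example, `asyncio.run(run_in_batches(["a", "b", "c"], fake_llm_call, batch_size=2))` returns `['A', 'B', 'C']`. Capping the batch size keeps concurrency bounded, which matters when providers enforce rate limits.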

Bug Fixes

  • Resolved a UI rendering bug in the Testset view.
  • Fixed incorrect URLs displayed when running the 'agenta variant serve' command.
  • Corrected timestamps in the configuration.
  • Resolved errors when using the chat template with empty input.
  • Fixed latency format in evaluation view.
  • Added a spinner to the Human Evaluation results table.
  • Resolved an issue where the .gitignore file was being overwritten when running 'agenta init'.
v0.13.0

Observability (beta)

You can now monitor your application usage in production. We've added a new observability feature (currently in beta), which allows you to:

  • Monitor cost, latency, and the number of calls to your applications in real-time.
  • View the logs of your LLM calls, including inputs, outputs, and the configurations used. You can also add any interesting logs to your test set.
  • Trace more complex LLM applications to understand their internal logic and debug them.

As of now, all newly created applications include observability by default. We are working toward a generally available (GA) version in the coming weeks, which will be scalable and better integrated with your applications. We will also add tutorials and documentation for it.

Find examples of LLM apps created from code with observability here.
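To illustrate what tracing records, here is a minimal, hypothetical sketch of a tracer that captures each call's inputs, output, and latency, and nests calls made inside other traced calls. It is not Agenta's observability SDK; `Tracer`, `Span`, `retrieve`, and `answer` are illustrative names only.

```python
import time
from dataclasses import dataclass, field
from functools import wraps
from typing import Any, Callable, Dict, List


@dataclass
class Span:
    """One traced call: name, inputs, output, latency, and nested calls."""
    name: str
    inputs: Dict[str, Any]
    output: Any = None
    latency_s: float = 0.0
    children: List["Span"] = field(default_factory=list)


class Tracer:
    """Hypothetical minimal tracer (not Agenta's API)."""

    def __init__(self) -> None:
        self.roots: List[Span] = []   # top-level calls
        self._stack: List[Span] = []  # spans currently executing

    def span(self, func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            current = Span(name=func.__name__, inputs={"args": args, "kwargs": kwargs})
            # Attach to the enclosing span if one is running, else record a root.
            parent = self._stack[-1].children if self._stack else self.roots
            parent.append(current)
            self._stack.append(current)
            start = time.perf_counter()
            try:
                current.output = func(*args, **kwargs)
                return current.output
            finally:
                current.latency_s = time.perf_counter() - start
                self._stack.pop()
        return wrapper


tracer = Tracer()


@tracer.span
def retrieve(query: str) -> List[str]:
    return ["doc-1"]  # stand-in for a retrieval step


@tracer.span
def answer(query: str) -> str:
    docs = retrieve(query)
    return f"{query} -> {docs[0]}"  # stand-in for an LLM call
```

After calling `answer("hi")`, `tracer.roots` holds one `answer` span whose `children` contain the nested `retrieve` span, which is the kind of call tree a trace view displays.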

v0.12.5

Minor improvements

Toggle variants in comparison view

You can now toggle the visibility of variants in the comparison view, making it easier to compare many variants side by side.

Improvements

  • You can now add a datapoint from the playground to the test set even if there is a column mismatch

Bug fixes

  • Resolved issue with "Start Evaluation" button in Testset view
  • Fixed a CLI bug that prevented variants from being served
v0.12.4

New evaluators

We have added two new evaluators: string matching and Levenshtein distance.
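Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a sketch of the standard dynamic-programming computation. The `passes_threshold` rule is a hypothetical way an evaluator could turn the distance into a pass/fail result, not necessarily how Agenta's evaluator scores outputs.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string; distance is symmetric
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def passes_threshold(output: str, expected: str, max_distance: int) -> bool:
    """Hypothetical rule: pass if output is within max_distance edits."""
    return levenshtein_distance(output, expected) <= max_distance
```

For example, `levenshtein_distance("kitten", "sitting")` is 3 (two substitutions and one insertion).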

Improvements

  • Updated documentation for human evaluation
  • Made improvements to Human evaluation card view
  • Added a dialog in the UI to indicate that a test set is being saved

Bug fixes

  • Fixed issue with viewing the full output value during evaluation
  • Enhanced error boundary logic so that errors no longer block the user interface
  • Improved logic to save and retrieve multiple LLM provider keys
  • Fixed modal dialogs to support dark mode
v0.12.3

Minor improvements

  • Improved the logic of the Webhook evaluator
  • Made the inputs in the Human evaluation view non-editable
  • Added an option to save a test set in the Single model evaluation view
  • Included the evaluator name in the "Configure your evaluator" modal

Bug fixes

  • Fixed column resize in comparison view
  • Resolved a bug affecting the evaluation output in the CSV file
  • Corrected the path to the Evaluators view when navigating from Evaluations