How does Arthur AI's Evals Engine support both traditional ML and Generative AI models?
Arthur AI's Evals Engine is designed to be model-agnostic: it can evaluate both traditional machine learning models (such as classification, regression, and NLP models) and Generative AI systems (such as RAG Co-Pilots, LLMs, and AI Agents). It runs continuous evaluation against the metrics relevant to each model type, so reliability can be tracked regardless of the underlying AI architecture.
What specific types of guardrails does Arthur AI offer to protect against unwanted outputs?
Arthur AI incorporates built-in guardrails that leverage PII detection, sensitive data identification, custom LLM rules, and regex rules. These mechanisms are designed to block problematic responses and off-brand interactions before they reach end-users, ensuring that AI agents operate within defined safety and brand guidelines.
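To make the regex-rule idea concrete, here is a minimal Python sketch of a guardrail that screens a model response before it is returned to the user. It is not Arthur AI's API. The rule names, the patterns, the fallback message, and the functions `find_violations` and `guard` are all assumptions made up for illustration, and real PII detection would need far more than two patterns.

```python
import re

# Hypothetical rule set: each name maps to a compiled pattern.
# These two patterns are illustrative only, not Arthur AI's rules.
RULES = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Assumed fallback text shown when a response is blocked.
FALLBACK = "Sorry, I can't share that information."


def find_violations(text):
    """Return the names of all rules whose pattern appears in the text."""
    return [name for name, pattern in RULES.items() if pattern.search(text)]


def guard(text):
    """Return (safe_text, violations); block the response if any rule matches."""
    violations = find_violations(text)
    if violations:
        return FALLBACK, violations
    return text, []


if __name__ == "__main__":
    print(guard("Your order has shipped."))
    print(guard("You can reach Jane at jane@example.com."))
```

The check runs on the finished response and swaps in the fallback when any rule matches, which mirrors the "block before it reaches the end-user" behavior described above. A custom LLM rule would slot into the same place as another check alongside the regex rules.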
Can Arthur AI be deployed in a self-managed VPC or on-premise environment, and what are the benefits of these options?
Yes. Arthur AI offers self-managed VPC, BYOCloud, and on-premise deployments in addition to multi-tenant and single-tenant SaaS. These options give organizations greater data security, data locality, and control, which is crucial for those with strict compliance requirements or specific infrastructure preferences, particularly Enterprise-tier customers.
How does the Startup Partner Program specifically assist venture-backed startups using AI Agents?
The Startup Partner Program is aimed at venture-backed startups that are building AI Agents and need to ship them to production reliably. Its specific benefits are not detailed. The program is intended to provide support and resources for deploying AI agents effectively, which may include specialized guidance, discounted access, or tailored features.
What is the difference in data retention and monitoring capacity between the Free, Premium, and Enterprise plans?
The Free plan offers 7 days of data retention, monitoring for up to 4 use cases, 5k jobs, 300k spans, 12k inferences, and 3k evals. The Premium plan extends this to 30 days of data retention, up to 100 use cases, 20k jobs, 1.2M spans, 100k inferences, and 75k evals. The Enterprise plan provides unlimited data retention and custom capacities for jobs, spans, inferences, and evals, along with advanced monitoring and support features.
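As a rough way to compare the tiers, the Python sketch below encodes the limits quoted above and returns the smallest plan that covers a given set of needs. The dictionary keys and the `smallest_plan` function are hypothetical names chosen for this example. The answer above does not say which period the job, span, inference, and eval quotas apply to, so the sketch treats them as plain caps.

```python
# Limits as quoted in the answer above. Enterprise is omitted because its
# retention is unlimited and its capacities are custom.
PLAN_LIMITS = {
    "Free": {
        "retention_days": 7,
        "use_cases": 4,
        "jobs": 5_000,
        "spans": 300_000,
        "inferences": 12_000,
        "evals": 3_000,
    },
    "Premium": {
        "retention_days": 30,
        "use_cases": 100,
        "jobs": 20_000,
        "spans": 1_200_000,
        "inferences": 100_000,
        "evals": 75_000,
    },
}


def smallest_plan(needs):
    """Return the first plan whose caps cover every stated need.

    `needs` maps a limit name to the required amount; missing keys count as 0.
    Anything beyond Premium falls through to Enterprise.
    """
    for plan, limits in PLAN_LIMITS.items():
        if all(needs.get(key, 0) <= cap for key, cap in limits.items()):
            return plan
    return "Enterprise"


if __name__ == "__main__":
    print(smallest_plan({"use_cases": 3, "spans": 250_000}))
    print(smallest_plan({"retention_days": 14, "use_cases": 10}))
    print(smallest_plan({"spans": 2_000_000}))
```

Plans are checked in order from smallest to largest, so the first match is the cheapest tier that fits. Needs that exceed any Premium cap fall through to Enterprise.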