Core Concepts

Understand the fundamental building blocks of PromptReports and how they work together to create a powerful prompt engineering platform.

Platform Architecture

PromptReports is built around a hierarchical structure that mirrors professional software development workflows. Understanding this architecture helps you organize your work effectively and leverage all platform capabilities.

Prompt Folders

Top-level containers for organizing related prompts. Similar to projects or repositories.

Prompts

Individual prompt templates with variables, stored within folders.

Versions

Immutable snapshots of prompts that enable change tracking and rollback.

Test Datasets

Collections of test cases for systematic prompt evaluation.
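Taken together, these pieces form a simple containment hierarchy: folders hold prompts and test datasets, and each prompt accumulates immutable versions. The sketch below models that hierarchy with hypothetical Python dataclasses; the class and field names are illustrative only, not part of the PromptReports API.

```python
from dataclasses import dataclass, field

# Illustrative data model only; not the actual PromptReports API.

@dataclass(frozen=True)          # versions are immutable snapshots
class Version:
    number: int
    template: str                # prompt text with {{variable}} placeholders
    stage: str = "development"

@dataclass
class Prompt:
    name: str
    versions: list[Version] = field(default_factory=list)

@dataclass
class TestDataset:
    name: str
    cases: list[dict] = field(default_factory=list)

@dataclass
class PromptFolder:
    name: str
    prompts: list[Prompt] = field(default_factory=list)
    datasets: list[TestDataset] = field(default_factory=list)  # shared by every prompt in the folder
```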

Prompt Folders

Prompt folders are the primary organizational unit in PromptReports. Think of them as projects or repositories that contain all related prompts, datasets, and configurations.

Each folder can contain multiple prompts, and prompts within a folder share access to the folder's test datasets. This makes it easy to evaluate multiple prompt variations against the same test cases.

Prompts & Versions

Prompts in PromptReports are versioned, meaning every change creates a new immutable version. This approach provides several benefits:

| Feature | Description |
| --- | --- |
| Change History | View complete history of all changes with diffs |
| Rollback | Instantly revert to any previous version |
| A/B Testing | Compare performance between versions |
| Promotion Flow | Move versions through dev/staging/production stages |
| Audit Trail | Track who made what changes and when |

Each prompt version can be in one of several states:

| State | Description | Can Edit? |
| --- | --- | --- |
| Draft | Work in progress, not yet saved as a version | Yes |
| Development | Saved version under active development | No (create a new version) |
| Staging | Version being tested for production | No |
| Production | Live version serving real traffic | No |
| Archived | Deprecated version kept for reference | No |
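One way to read this table is as a small state machine: only drafts are editable in place, and saved versions move forward through the stages until they are archived. The snippet below is a hypothetical sketch of those rules, not platform code.

```python
from enum import Enum

class VersionState(Enum):
    DRAFT = "draft"              # editable work in progress
    DEVELOPMENT = "development"  # saved and immutable from here on
    STAGING = "staging"
    PRODUCTION = "production"
    ARCHIVED = "archived"

# Typical promotion path for saved versions.
PROMOTION_ORDER = [VersionState.DEVELOPMENT, VersionState.STAGING, VersionState.PRODUCTION]

def can_edit(state: VersionState) -> bool:
    """Only drafts can be edited in place; any other state means creating a new version."""
    return state is VersionState.DRAFT

def next_stage(state: VersionState) -> VersionState | None:
    """Return the stage a version would be promoted to, or None if it cannot be promoted."""
    if state in PROMOTION_ORDER[:-1]:
        return PROMOTION_ORDER[PROMOTION_ORDER.index(state) + 1]
    return None
```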

Test Datasets

Test datasets are collections of input-output pairs (or just inputs) used to evaluate prompt quality systematically. They're essential for:

  • Regression Testing: Ensure changes don't degrade quality
  • Benchmarking: Compare different prompt versions objectively
  • Quality Metrics: Track performance over time
  • Edge Cases: Document and test known difficult scenarios
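In memory, a test case is simply a set of variable values plus an optional expected output (input-only cases have no reference answer). The shape below is an assumed illustration, not a prescribed format.

```python
# Hypothetical dataset: one dict per test case.
support_dataset = [
    {
        "variables": {"customer_name": "Ada", "issue": "refund request"},
        "expected_output": "Reply confirming the refund and the expected timeline.",
    },
    {
        "variables": {"customer_name": "Lin", "issue": "login failure"},
        "expected_output": None,   # input-only case: no reference answer
    },
]
```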

Datasets can be created in multiple ways:

Manual Entry

Add test cases one by one through the UI.

CSV Import

Upload test cases from spreadsheets or other sources.

From History

Create datasets from past prompt executions.
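For CSV import, a natural layout is one row per test case, one column per prompt variable, and an optional column for the expected output. The exact column conventions PromptReports expects may differ; the snippet below only demonstrates that assumed layout using Python's standard csv module.

```python
import csv
import io

# Assumed layout: one column per variable, plus an optional expected_output column.
csv_text = """customer_name,issue,expected_output
Ada,refund request,Reply confirming the refund and the expected timeline.
Lin,login failure,
"""

cases = []
for row in csv.DictReader(io.StringIO(csv_text)):
    expected = row.pop("expected_output") or None   # empty cell -> input-only case
    cases.append({"variables": dict(row), "expected_output": expected})

print(len(cases), cases[1]["expected_output"])   # 2 None
```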

Evaluations

Evaluations run your prompts against test datasets and measure quality. PromptReports supports several evaluation types:

| Type | Purpose | When to Use |
| --- | --- | --- |
| Batch Evaluation | Run prompt against all dataset rows | Regular quality checks, before deployments |
| A/B Testing | Compare two versions with statistical significance | Deciding between prompt variations |
| Pairwise Comparison | Head-to-head comparison on same inputs | Detailed quality analysis |
| Backtesting | Test new version against historical data | Understanding impact of changes |
| Regression Testing | Compare against baseline before promotion | Preventing quality degradation |
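Conceptually, the simplest of these, a batch evaluation, loops over the dataset, runs the prompt once per row, and aggregates a score. The sketch below uses exact-match scoring and a placeholder model call; real evaluations typically use richer metrics, and none of these function names come from PromptReports.

```python
def call_model(prompt: str) -> str:
    """Placeholder for the LLM call an evaluation would make."""
    return "..."

def batch_evaluate(rendered_prompts: list[str], expected_outputs: list[str | None]) -> float:
    """Return the exact-match pass rate over cases that have an expected output."""
    hits = total = 0
    for prompt, expected in zip(rendered_prompts, expected_outputs):
        if expected is None:        # input-only case: nothing to compare against
            continue
        total += 1
        hits += call_model(prompt) == expected
    return hits / total if total else 0.0
```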

Workflows

PromptReports supports collaborative workflows for teams, built on the versioning, promotion, and evaluation concepts described above.

Key Terminology

Here's a quick reference of important terms used throughout PromptReports:

| Term | Definition |
| --- | --- |
| Prompt | A template containing text and variables that generates AI responses |
| Version | An immutable snapshot of a prompt at a point in time |
| Variable | A placeholder in a prompt (e.g., {{name}}) that gets replaced at runtime |
| Preset | A saved set of variable values for quick testing |
| Context File | Additional content injected into prompts for reference |
| Dataset | A collection of test cases with variable values and optional expected outputs |
| Evaluation | A run of a prompt against a dataset to measure quality |
| Promotion | Moving a version from one stage to another (e.g., dev → production) |
| Regression | Quality degradation compared to a baseline version |
| Backtest | Testing a new version against historical execution data |
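To make the variable and preset terms concrete: a prompt template contains {{name}}-style placeholders, and a preset is a saved set of values to fill them with. Below is a minimal, hypothetical rendering sketch; PromptReports performs this substitution for you at runtime.

```python
import re

TEMPLATE = "Write a {{tone}} reply to {{customer_name}} about their {{issue}}."

# A preset: a saved set of variable values for quick testing.
friendly_preset = {"tone": "friendly", "customer_name": "Ada", "issue": "refund request"}

def render(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), template)

print(render(TEMPLATE, friendly_preset))
# -> Write a friendly reply to Ada about their refund request.
```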