Real-world validation.
Real-time trust.
Healthcare AI, verified.

We provide the verifiable Human-in-the-Loop infrastructure to ensure AI models meet expert-reviewed quality standards before clinical deployment.

model output
verified output

The high cost of unvalidated AI

Models deployed without sufficient expert review create a dangerous gap between technical capabilities and clinical safety requirements.

Unsafe recommendations

Unvalidated AI models can provide clinically inappropriate guidance.

Patient harm

Incorrect AI outputs risk patient safety and treatment outcomes.

Legal exposure

Organizations face liability without proper AI validation frameworks.

Loss of trust

Clinicians and patients lose confidence in AI-driven healthcare.

The new standard in clinical AI validation

We have combined rigorous HITL testing with a global clinician network to build the adjudication layer for the next generation of healthcare AI.

PACR
Clinical Reasoning
ICC
Clinician Consensus
BERT
Semantic Analysis
PIP
Clinical Robustness

Clinician expertise, at scale

7,500+ clinicians contributing to AI evaluation

7,500+
Active physician network

The clinical adjudication layer

Human-in-the-loop evaluation for medical AI

Our Human-in-the-Loop system captures the reasoning of top medical experts, creating the ground-truth data needed to train the automated AI judges of tomorrow.

Real-world scenarios

Our foundation isn’t just code. For years, iCliniq has been a global ecosystem of care, serving over 8 million patients.

We applied the same rigorous accountability from our clinical operations to our internal AI tools. We realized that traditional evaluation methods failed to capture the unique interdependence of accuracy, safety, and empathy required in medicine.

8M+
Consultations
100M+
Global users
7,500+
Physicians
80+
Specialties

Use cases

Clinical Utility

Measure whether your model is clinically useful, not just correct.

Modern LLMs may provide technically correct answers but still lack actionability, clinical nuance, or safe recommendations. Perpendicular AI evaluates clinical utility using both gold-standard benchmarks and physician review, depending on the product.

What we measure

Actionable clinical recommendations
Appropriateness within patient context
Recognition of red flags
Safety awareness and risk avoidance
Alignment with evidence-based practice

Where this matters

Symptom checkers
Clinical decision support
Care coordination assistants
Provider-facing summarization tools

Preferred Response

Ensure your model expresses the right answer in the right way.

Healthcare requires not only correctness but also the preferred style of communication for clinicians, patients, or internal workflows.

What we measure

Response format compliance
Tone (clinical, empathetic, formal, concise)
Adherence to institution-specific preferences
Avoidance of moralizing, hedging, or overconfidence
Toxicity & bias risk

Where this matters

Patient-facing chatbots
Medical documentation drafting
Triage and intake assistants
Provider feedback tools

Chain of Thought Evaluation

Assess reasoning quality without revealing sensitive internal reasoning to users.

Many healthcare tasks depend on structured reasoning: differential diagnosis, treatment justification, rule-out logic, and stepwise problem solving.

What we measure

Clinical reasoning completeness
Logical coherence and steps
Differential diagnosis structure
Ability to reference relevant factors
Identification of contradictions or invalid reasoning

Where this matters

Diagnostic support
Summaries with reasoning trails
Risk stratification
Workflow automation

Citations & Evidence Alignment

Evaluate citation correctness and evidence groundedness.

In healthcare, unsupported claims pose real risk. Perpendicular AI evaluates whether outputs are factually linked to reputable evidence sources and whether citations are correct, relevant, non-hallucinated, and consistent with the clinical answer.

What we measure

Citation validity (real vs hallucinated)
Evidence alignment
Fact grounding
Correct referencing of guidelines, labs, imaging findings
Accuracy of extracted text from papers or EMRs

Where this matters

RAG systems
Medical summarization
Research assistants
Clinical guideline retrieval
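As a minimal illustration of the citation-validity check described above (all identifiers and the index name are hypothetical, not our production pipeline), a cited reference counts as valid only if it resolves against a trusted index, and as hallucinated otherwise:

```python
# Sketch of citation-validity checking: a citation is "valid" if its
# identifier exists in a trusted index (e.g., a local snapshot of PubMed
# IDs), and "hallucinated" otherwise. All IDs below are hypothetical.

def check_citations(cited_ids, trusted_index):
    """Split a model's cited IDs into valid and hallucinated sets."""
    cited = set(cited_ids)
    valid = cited & trusted_index
    hallucinated = cited - trusted_index
    return valid, hallucinated

trusted = {"PMID:31978945", "PMID:32109013", "PMID:25830811"}
valid, fake = check_citations(["PMID:31978945", "PMID:99999999"], trusted)
print(sorted(valid))  # ['PMID:31978945']
print(sorted(fake))   # ['PMID:99999999']
```

In practice relevance and consistency with the clinical answer require expert review; only existence checks reduce to a lookup like this.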

Imaging Response Evaluation

Evaluate multimodal model reasoning with radiology, pathology, dermatology, and other clinical images.

As models expand into image modalities, evaluation must measure diagnostic quality, finding recognition, and risk detection.

What we measure

Imaging classification & interpretation accuracy
Recognition of key findings (e.g., nodules, fractures, lesions)
Correctness of descriptions and implications
Safety risks (missed critical findings)
Comparison against expert radiologists (optional HITL)

Where this matters

Radiology AI assistants
Dermatology triage
Pathology image review
Multimodal clinical decision support

Retrieval-Augmented Generation (RAG) Evaluation

Evaluate the full RAG pipeline: retrieval quality, grounding, and final answer accuracy.

For healthcare, the retrieval component can make or break safety. We provide RAG-specific evaluation metrics to assess how well sources are fetched, grounded, and cited.

What we measure

Retrieval precision & recall
Faithfulness to retrieved evidence
Relevance & context matching
Hallucination detection
Confidence mismatch (when the model is wrong but confident)

Where this matters

Clinical knowledge bases
Guideline retrieval tools
Research summarization
EMR-linked assistants
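The retrieval precision and recall listed above are standard set-based metrics; a minimal sketch (with hypothetical document IDs) shows what each one captures for a single query:

```python
# Illustrative retrieval metrics for one query (document IDs hypothetical).
# Precision: what fraction of retrieved documents are actually relevant.
# Recall: what fraction of the relevant documents were retrieved at all.

def retrieval_metrics(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

p, r = retrieval_metrics(
    retrieved=["guideline_12", "guideline_7", "blog_3", "guideline_9"],
    relevant=["guideline_12", "guideline_9", "guideline_44"],
)
print(p, r)  # precision 0.5, recall ≈ 0.67
```

A retriever can score high on one and low on the other, which is why both are reported per query and aggregated across the evaluation set.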

11-axis evaluation framework

We transform AI evaluation from a single metric to a comprehensive assessment of clinical viability using a proprietary framework.

Real-world scenarios

High-fidelity workflows drawn from millions of consultations.

Hybrid scoring

Integrating automated metrics (BERT, ICC) with HITL judgment.

Expert adjudication

Continuous oversight by our Medical Review Board.

Risk & compliance

Dedicated bias checks aligned with HIPAA/GDPR.
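The hybrid-scoring axis above, which blends automated metrics with clinician judgment, can be sketched roughly as follows. All numbers and weights are illustrative, not our production configuration: a one-way ICC estimates how much the clinicians agree, and a weighted blend combines their mean rating with an automated semantic score.

```python
# Sketch of hybrid scoring (all values hypothetical): clinician agreement is
# estimated with a one-way random-effects ICC over a cases x raters matrix,
# and the final score blends an automated semantic score with the mean rating.

def icc1(ratings):
    """One-way random-effects ICC(1) for a cases x raters matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between-case mean square
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))  # within-case
    return (msb - msw) / (msb + (k - 1) * msw)

def hybrid_score(semantic_score, clinician_ratings, w=0.4):
    """Blend an automated score with the mean clinician rating (0-1 scale)."""
    mean_rating = sum(clinician_ratings) / len(clinician_ratings)
    return w * semantic_score + (1 - w) * mean_rating

ratings = [[0.9, 0.8, 0.9], [0.4, 0.5, 0.4], [0.7, 0.7, 0.6]]
print(round(icc1(ratings), 2))                    # ≈ 0.93: raters agree strongly
print(round(hybrid_score(0.85, ratings[0]), 2))   # ≈ 0.86
```

A low ICC flags cases where clinicians disagree, which are exactly the cases routed to expert adjudication rather than scored automatically.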

Pricing

Get a tailored enterprise quote built to match your model’s needs, evaluation depth, and delivery timelines.

Frequently asked questions

How do you verify that your clinicians are real and qualified?

Every clinician undergoes identity and credential verification, with licenses checked against national medical boards.

How long does an evaluation take?

Most evaluations are completed within 5-7 business days, depending on volume and complexity.

Is my model data kept private and secure?

Yes, all data is encrypted in transit and at rest. We comply with HIPAA and GDPR for data handling.