Real-world validation.
Real-time trust.
Healthcare AI, verified.

We provide the verifiable Human-in-the-Loop infrastructure to ensure AI models meet expert-reviewed quality standards before clinical deployment.

model output
verified output

The high cost of unvalidated AI

Models deployed without sufficient expert review create a dangerous gap between technical capabilities and clinical safety requirements.

Unsafe recommendations

Unvalidated AI models can provide clinically inappropriate guidance.

Patient harm

Incorrect AI outputs risk patient safety and treatment outcomes.

Legal exposure

Organizations face liability without proper AI validation frameworks.

Loss of trust

Clinicians and patients lose confidence in AI-driven healthcare.

The new standard in clinical AI validation

We have combined rigorous HITL testing with a global clinician network to build the adjudication layer for the next generation of healthcare AI.

PACR
Clinical Reasoning
ICC
Clinician Consensus
BERT
Semantic Analysis
PIP
Clinical Robustness

Clinician expertise, at scale

7,500+ clinicians contributing to AI evaluation

7,500+
Active physician network

The clinical adjudication layer

Human-in-the-loop evaluation for medical AI

Our Human-in-the-Loop system captures the reasoning of top medical experts, creating the ground-truth data needed to train the automated AI judges of tomorrow.

Real-world scenarios

Our foundation isn’t just code. For years, iCliniq has been a global ecosystem of care, serving over 8 million patients.

We applied the same rigorous accountability from our clinical operations to our internal AI tools. We realized that traditional evaluation methods failed to capture the unique interdependence of accuracy, safety, and empathy required in medicine.

8M+
Consultations
100M+
Global users
7,500+
Physicians
80+
Specialties

Use cases

Clinical Utility

Measure whether your model is clinically useful, not just correct.

Modern LLMs may provide technically correct answers but still lack actionability, clinical nuance, or safe recommendations. Perpendicular AI evaluates clinical utility using both gold-standard benchmarks and physician review, depending on the product.

What we measure

Actionable clinical recommendations
Appropriateness within patient context
Recognition of red flags
Safety awareness and risk avoidance
Alignment with evidence-based practice

Where this matters

Symptom checkers
Clinical decision support
Care coordination assistants
Provider-facing summarization tools

Preferred Response

Ensure your model expresses the right answer in the right way.

Healthcare requires not only correctness but also the preferred style of communication for clinicians, patients, or internal workflows.

What we measure

Response format compliance
Tone (clinical, empathetic, formal, concise)
Adherence to institution-specific preferences
Avoidance of moralizing, hedging, or overconfidence
Toxicity & bias risk

Where this matters

Patient-facing chatbots
Medical documentation drafting
Triage and intake assistants
Provider feedback tools

Chain of Thought Evaluation

Assess reasoning quality without revealing sensitive internal reasoning to users.

Many healthcare tasks depend on structured reasoning: differential diagnosis, treatment justification, rule-out logic, and stepwise problem solving.

What we measure

Clinical reasoning completeness
Logical coherence and steps
Differential diagnosis structure
Ability to reference relevant factors
Identification of contradictions or invalid reasoning

Where this matters

Diagnostic support
Summaries with reasoning trails
Risk stratification
Workflow automation

Citations & Evidence Alignment

Evaluate citation correctness and evidence groundedness.

In healthcare, unsupported claims pose real risk. Perpendicular AI evaluates whether outputs are factually linked to reputable evidence sources and whether citations are correct, relevant, non-hallucinated, and consistent with the clinical answer.

What we measure

Citation validity (real vs hallucinated)
Evidence alignment
Fact grounding
Correct referencing of guidelines, labs, imaging findings
Accuracy of extracted text from papers or EMRs

Where this matters

RAG systems
Medical summarization
Research assistants
Clinical guideline retrieval
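As a minimal illustration of the citation-validity check described above (all identifiers and the index name are hypothetical, not our production pipeline), a cited reference counts as valid only if it resolves against a trusted index, and as hallucinated otherwise:

```python
# Sketch of citation-validity checking: a citation is "valid" if its
# identifier exists in a trusted index (e.g., a local snapshot of PubMed
# IDs), and "hallucinated" otherwise. All IDs below are hypothetical.

def check_citations(cited_ids, trusted_index):
    """Split a model's cited IDs into valid and hallucinated sets."""
    cited = set(cited_ids)
    valid = cited & trusted_index
    hallucinated = cited - trusted_index
    return valid, hallucinated

trusted = {"PMID:31978945", "PMID:32109013", "PMID:25830811"}
valid, fake = check_citations(["PMID:31978945", "PMID:99999999"], trusted)
print(sorted(valid))  # ['PMID:31978945']
print(sorted(fake))   # ['PMID:99999999']
```

In practice relevance and consistency with the clinical answer require expert review; only existence checks reduce to a lookup like this.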

Imaging Response Evaluation

Evaluate multimodal model reasoning with radiology, pathology, dermatology, and other clinical images.

As models expand into image modalities, evaluation must measure diagnostic quality, finding recognition, and risk detection.

What we measure

Imaging classification & interpretation accuracy
Recognition of key findings (e.g., nodules, fractures, lesions)
Correctness of descriptions and implications
Safety risks (missed critical findings)
Comparison against expert radiologists (optional HITL)

Where this matters

Radiology AI assistants
Dermatology triage
Pathology image review
Multimodal clinical decision support

Retrieval-Augmented Generation (RAG) Evaluation

Evaluate the full RAG pipeline: retrieval quality, grounding, and final answer accuracy.

For healthcare, the retrieval component can make or break safety. We provide RAG-specific evaluation metrics to assess how well sources are fetched, grounded, and cited.

What we measure

Retrieval precision & recall
Faithfulness to retrieved evidence
Relevance & context matching
Hallucination detection
Confidence mismatch (when the model is wrong but confident)

Where this matters

Clinical knowledge bases
Guideline retrieval tools
Research summarization
EMR-linked assistants
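The retrieval precision and recall listed above are standard set-based metrics; a minimal sketch (with hypothetical document IDs) shows what each one captures for a single query:

```python
# Illustrative retrieval metrics for one query (document IDs hypothetical).
# Precision: what fraction of retrieved documents are actually relevant.
# Recall: what fraction of the relevant documents were retrieved at all.

def retrieval_metrics(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

p, r = retrieval_metrics(
    retrieved=["guideline_12", "guideline_7", "blog_3", "guideline_9"],
    relevant=["guideline_12", "guideline_9", "guideline_44"],
)
print(p, r)  # precision 0.5, recall ≈ 0.67
```

A retriever can score high on one and low on the other, which is why both are reported per query and aggregated across the evaluation set.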

11-axis evaluation framework

We transform AI evaluation from a single metric to a comprehensive assessment of clinical viability using a proprietary framework.

Real-world scenarios

High-fidelity workflows drawn from millions of consultations.

Hybrid scoring

Integrating automated metrics (BERT, ICC) with HITL judgment.

Expert adjudication

Continuous oversight by our Medical Review Board.

Risk & compliance

Dedicated bias checks aligned with HIPAA/GDPR.
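The hybrid-scoring axis above, which blends automated metrics with clinician judgment, can be sketched roughly as follows. All numbers and weights are illustrative, not our production configuration: a one-way ICC estimates how much the clinicians agree, and a weighted blend combines their mean rating with an automated semantic score.

```python
# Sketch of hybrid scoring (all values hypothetical): clinician agreement is
# estimated with a one-way random-effects ICC over a cases x raters matrix,
# and the final score blends an automated semantic score with the mean rating.

def icc1(ratings):
    """One-way random-effects ICC(1) for a cases x raters matrix."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    msb = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)  # between-case mean square
    msw = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))  # within-case
    return (msb - msw) / (msb + (k - 1) * msw)

def hybrid_score(semantic_score, clinician_ratings, w=0.4):
    """Blend an automated score with the mean clinician rating (0-1 scale)."""
    mean_rating = sum(clinician_ratings) / len(clinician_ratings)
    return w * semantic_score + (1 - w) * mean_rating

ratings = [[0.9, 0.8, 0.9], [0.4, 0.5, 0.4], [0.7, 0.7, 0.6]]
print(round(icc1(ratings), 2))                    # ≈ 0.93: raters agree strongly
print(round(hybrid_score(0.85, ratings[0]), 2))   # ≈ 0.86
```

A low ICC flags cases where clinicians disagree, which are exactly the cases routed to expert adjudication rather than scored automatically.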

Pricing

Get a tailored enterprise quote built to match your model’s needs, evaluation depth, and delivery timelines.

Frequently asked questions

How do you verify that your clinicians are real and qualified?

Every clinician undergoes identity and credential verification, with licenses checked against national medical boards.

How long does an evaluation take?

Most evaluations are completed within 5-7 business days, depending on volume and complexity.

Is my model data kept private and secure?

Yes, all data is encrypted in transit and at rest. We comply with HIPAA and GDPR for data handling.