The future of AI is plural.

Grounded in peer-reviewed and emerging multi-agent AI research, AskPlural runs a structured research pipeline across vendor-distinct frontier models — with live web retrieval, blind cross-critique, atomic claim verification, and a final synthesis paired with explicit sceptic dissent.

How it works

A research contract, a panel, atomic verification, then synthesis with dissent.

Vendor-distinct frontier models answer in parallel against a controlled evidence ledger, critique each other anonymously, revise against the critique, have every factual claim graded against its cited source, and only then does a synthesiser write the answer — with an independent sceptic’s dissent shown alongside.

The problem

One model, asked once, is a confident guess.

Every frontier language model was trained on overlapping data, tuned with similar techniques, and optimised for similar benchmarks. Ask the same hard question to any one of them and you get a fluent, assertive answer that often sounds more certain than it has any right to be. Hallucinations, stale facts, blind spots, subtle bias — all of it comes out wearing the same confident voice.

The standard fixes — better prompting, more retrieval, bigger models — reduce mistakes but don’t surface the ones that remain. If the model is wrong, you don’t usually find out until you act on the answer.

The research answer

Have the models disagree in the open, then verify.

The last three years of multi-agent debate research — at ICML, ICLR, ACL and EMNLP — has substantially sharpened the picture. The foundational result (Du et al., 2023 [1]) showed the basic mechanism: independent models that cross-examine each other’s reasoning catch errors a single model would defend. One model alone will assert a wrong answer confidently; a panel reading each other’s working will often surface the flaw.

Since then the programme has tightened. Heterogeneity — models from different labs, not copies of the same one [2] — matters more than sheer agent count. Handing each agent a different slice of the retrieved evidence beats letting all of them anchor on the same sources [9]. Hiding peer confidence prevents over-confidence cascades [7]. Auditing disagreement points in the transcript recovers correct minority answers that majority voting loses entirely [8]. And factuality is best assessed atomically — long-form answers are tangles of supported and unsupported sub-claims, and the sub-claims must be checked against the cited evidence one by one [6].

AskPlural implements these findings together — and adds one more: architectural heterogeneity. A recent preprint [5] argues this is what prevents consensus collapse, the failure mode where a panel of models from the same lab confidently converges on the same wrong answer because its members inherited the same biases in training.

What we do

A research contract, a panel, atomic verification, then synthesis with dissent.

When you ask AskPlural a hard question, a small preflight sets the research contract; the evidence ledger is built from the URLs you named plus targeted live retrieval; vendor-distinct frontier models answer in parallel; they critique each other anonymously; the critique surfaces gaps that drive a second round of targeted research and revision; every factual claim in every revised brief is decomposed and graded against its cited sources; a primary synthesiser writes the answer and an independent sceptic writes a dissent; and a final citation audit gates what reaches you. Every stage is visible in the UI — you can audit any of it — but what you read is the synthesis, not a transcript to reconcile yourself.
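
For readers who want the pipeline shape in code, here is a minimal sketch of the stage sequence just described. Every name in it (runPanel, preflight, buildLedger and the rest) is an illustrative assumption for this sketch, not AskPlural’s actual API; only the ordering of the stages and what each hands to the next mirrors the description above.

```ts
// Illustrative sketch only: identifiers are assumptions, not AskPlural's real API.
type ResearchContract = { questionType: string; citationStrictness: "lenient" | "strict"; pinnedUrls: string[] };
type Source = { id: string; url: string; authority: number; recency: number };
type Brief = { lens: string; text: string };
type Critique = { targetLabel: string; attacks: string[] };
type GradedClaim = { claim: string; citedSourceIds: string[]; grade: "high" | "medium" | "low" | "unsupported" };

// All stage implementations are injected; this function only encodes the order
// in which the stages run and what each stage hands to the next.
async function runPanel(question: string, deps: {
  preflight(q: string): Promise<ResearchContract>;
  buildLedger(c: ResearchContract): Promise<Source[]>;
  draftBriefs(ledger: Source[]): Promise<Brief[]>;
  blindCritique(briefs: Brief[]): Promise<Critique[]>;
  repairAndRevise(briefs: Brief[], critiques: Critique[], ledger: Source[]): Promise<Brief[]>;
  verifyClaims(briefs: Brief[], ledger: Source[]): Promise<GradedClaim[]>;
  synthesise(claims: GradedClaim[]): Promise<{ memo: string; dissent: string }>;
  auditCitations(memo: string, ledger: Source[]): Promise<string>;
}) {
  const contract = await deps.preflight(question);                       // 1. set the research contract
  const ledger = await deps.buildLedger(contract);                       // 2. acquire evidence
  const briefs = await deps.draftBriefs(ledger);                         // 3. independent briefs
  const critiques = await deps.blindCritique(briefs);                    // 4. blind cross-critique
  const revised = await deps.repairAndRevise(briefs, critiques, ledger); // 5. evidence repair + revision
  const claims = await deps.verifyClaims(revised, ledger);               // 6. atomic claim verification
  const { memo, dissent } = await deps.synthesise(claims);               // 7. synthesis + sceptic dissent
  const audited = await deps.auditCitations(memo, ledger);               // 8. citation audit
  return { memo: audited, dissent, claims };
}
```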

  1. Set the research contract

    A small preflight model reads your question and configures the rest of the pipeline — what kind of question this is, which source types matter, how strict the citation gate should be, and any URLs you named that should be fetched directly. It does not answer the question; it sets the rules.

  2. Acquire evidence

    If you named specific URLs, those pages are fetched first and added as primary sources. Then live web search runs in three angled framings — neutral, supportive, and challenging — and the results are deduplicated, scored for authority and recency, and assembled into a controlled evidence ledger that every downstream stage cites by source ID. (The sketch after this list shows the shape of one ledger entry.)

  3. Independent briefs

    Vendor-distinct frontier models answer in parallel against the ledger. Each takes a deterministic lens — empiricist, sceptic, theorist, pragmatist, risk auditor — so the panel covers complementary angles by construction rather than coincidence. No model sees any other model’s answer at this stage.

  4. Blind cross-critique

    Each model is then shown the other models’ briefs with their authors hidden and labels shuffled. They write structured attacks: unsupported claims, missing primary sources, citation mismatches, the strongest counterargument. Anonymising the briefs is what keeps the critique honest — the model can’t defer to a “famous” lab.

  5. Evidence repair + revision

    Critique gaps are clustered into targeted retrieval tasks. The ledger grows. Every model then revises its brief with sight of its peers’ first-round work, the critiques aimed at it, and the augmented evidence. This is the debate literature’s core loop running live: independent proposal, peer review, revise.

  6. Atomic claim verification

    Each revised brief is decomposed into atomic factual claims. Two independent verifier models grade entailment for every claim against its cited sources. Deterministic checks run alongside — URL liveness, date sanity, quote match. Each claim ends up with a support grade: high, medium, low, or unsupported. (A sketch of this grading step follows the list.)

  7. Synthesis + sceptic dissent

    Two parallel calls read the verified claims and full transcript. A primary synthesiser writes the user-facing decision memo. A separate sceptic call writes a structured dissent. They are blind to each other — when they disagree materially that is an honest disagreement, and we surface both. Disagreement is information, not a failure mode.

  8. Citation audit

    Before the answer reaches you, a final auditor reads the synthesised memo and checks every factual claim against the ledger. Invented citations, weak entailment, internal contradictions, and stale model names are flagged. Claims that don’t pass are removed, downgraded, or kept with an explicit “unsupported judgement” disclaimer.
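
To make steps 2 and 6 concrete, here is a minimal sketch of how a ledger entry and an atomic claim could fit together under a simple two-verifier grading rule. The type names, the thresholds, and the take-the-weaker-score rule are assumptions made for this sketch, not AskPlural’s actual implementation.

```ts
// Illustrative only: names, thresholds, and the two-verifier rule are assumptions
// made for this sketch, not AskPlural's actual implementation.
type Source = { id: string; url: string; authority: number; recency: number; live: boolean };
type Grade = "high" | "medium" | "low" | "unsupported";

interface AtomicClaim {
  text: string;             // one checkable factual statement
  citedSourceIds: string[]; // source IDs pointing back into the evidence ledger
}

// Combine two independent verifier entailment scores (0..1) with deterministic
// checks (URL liveness, date sanity, quote match) into a single support grade.
function gradeClaim(
  claim: AtomicClaim,
  ledger: Map<string, Source>,
  verifierScores: [number, number],
  deterministicChecksPass: boolean,
): Grade {
  const cited = claim.citedSourceIds
    .map((id) => ledger.get(id))
    .filter((s): s is Source => s !== undefined);

  // A claim with no resolvable citation, a dead link, or a failed quote/date
  // check cannot be graded as supported, whatever the verifiers say.
  if (cited.length === 0 || !deterministicChecksPass || cited.some((s) => !s.live)) {
    return "unsupported";
  }

  const weakest = Math.min(...verifierScores); // both verifiers must agree the source supports the claim
  if (weakest >= 0.8) return "high";
  if (weakest >= 0.5) return "medium";
  if (weakest >= 0.2) return "low";
  return "unsupported";
}
```

Taking the weaker of the two verifier scores is a sketch-level design choice: a single over-eager verifier cannot inflate a grade on its own, which matches the spirit of turning “sounds supported” into “is supported by the cited source”.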

Cross-critique is the debate literature’s core loop. Atomic verification turns “sounds supported” into “is supported by the cited source”. Dissent shown alongside the answer turns disagreement from a failure mode into a signal. Every claim in the final answer lands with a citation back into the evidence ledger.

Why it matters

Disagreement is information.

When the panel converges — from different priors, looking at different sources, reading each other’s strongest objections — that is much stronger evidence than a single model’s confident assertion. When it does not converge, the second round of research is aimed precisely where the disagreement lives, and any unresolved disagreement is preserved in the answer rather than smoothed away.

AskPlural is for the questions where you’d rather know the panel is uncertain than be told a confident wrong thing.

Pricing

Free to try. £50/month unlocks the full pipeline.

Every run spins up a heterogeneous panel of frontier models, live web retrieval, atomic verification, and a synthesis pass paired with sceptic dissent — real compute on every question, which is why heavy use is paid. See the pricing page for current plans and full details.

References

Where the research comes from.

Peer-reviewed here means accepted at ICML / ICLR / ACL / EMNLP / NeurIPS — not “on arXiv.” arXiv preprints that haven’t cleared a conference are labelled emerging.

  1. Improving Factuality and Reasoning in Language Models through Multiagent Debate. Du, Li, Torralba, Tenenbaum & Mordatch. ICML 2024 · peer-reviewed.

    The foundational result: multiple models reading each other’s reasoning catch errors any single model would defend. Establishes that factuality and reasoning improve when independent models cross-examine each other rather than answer alone.

    Read on arXiv
  2. ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs. Chen, Saha & Bansal. ACL 2024 · peer-reviewed.

    Shows consensus quality is higher when agents are drawn from different model families rather than repeated instances of the same model, and that a transcript-level judge outperforms majority voting. Underwrites the heterogeneous-panel design.

    Read on arXiv
  3. Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate. Liang et al. EMNLP 2024 · peer-reviewed.

    Motivates debate as the corrective for the Degeneration-of-Thought problem that emerges when a single model becomes locked into its initial reasoning path.

    Read on arXiv
  4. Demystifying Multi-Agent Debate. Zhu et al. arXiv 2026 · emerging.

    Shows performance improves when the initial debate pool is made more diverse and when agents communicate calibrated confidence during revision. Influences the AskPlural design choice that lens roles are deterministic and confidence is verified, not asserted.

    Read on arXiv
  5. Heterogeneous Debate Engine: Identity-Grounded Cognitive Architecture for Resilient LLM-Based Ethical Tutoring. HDE paper. arXiv 2026 · emerging.

    Argues that architectural heterogeneity — models from different labs — prevents “consensus collapse”, where homogeneous panels share the same training biases and confidently converge on the same wrong answer.

    Read on arXiv
  6. FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation. Min et al. EMNLP 2023 · peer-reviewed.

    Shows that long-form answers which look supported are often a tangle of supported and unsupported atomic facts. Justifies decomposing each brief into atomic claims and verifying entailment claim-by-claim — the basis of AskPlural’s atomic claim verification stage.

    Read on arXiv
  7. Enhancing Multi-Agent Debate System Performance via Confidence Expression. Wu et al. arXiv 2025 · emerging.

    Finds that when debating agents see each other’s confidence scores the panel drifts toward over-confidence and loses signal. Informs the AskPlural design choice that cross-critique turns on reasoning and disconfirmation conditions, not assertiveness.

    Read on arXiv
  8. Auditing Multi-Agent LLM Reasoning Trees Outperforms Majority Vote and LLM-as-Judge. AgentAuditor paper. arXiv 2026 · emerging.

    Shows that adjudicating at divergence points — by comparing localised branch evidence — beats both majority vote and generic LLM-as-judge, recovering correct minority answers where voting loses them entirely. Underpins the AskPlural design choice to surface dissent alongside the synthesis instead of voting it away.

    Read on arXiv
  9. Retrieval-Augmented Generation with Conflicting Evidence (MADAM-RAG). Wang, Prasad, Stengel-Eskin & Bansal. arXiv 2025 · emerging.

    Assigns each agent a different subset of the retrieved evidence, then lets them debate. Reports factuality gains of 11–16 percentage points on benchmarks with ambiguous or conflicting documents. Basis for the per-analyst evidence partitioning: agreement reached by analysts reading different sources is much stronger evidence than agreement when everyone read the same article.

    Read on arXiv

Questions, feedback, partnerships:

hello@askplural.ai

Privacy Policy

Last updated: April 2026

Who we are

AskPlural is operated from the United Kingdom (“we”, “us”). We are the data controller for personal data collected through askplural.ai. Contact us at hello@askplural.ai.

What we collect

  • Account data — your email address, a hashed magic-link token, your credit balance, and a log of credit transactions. Created when you sign in.
  • Query content — the questions you submit and the evidence retrieved to answer them. Held only for the processing window.
  • Payment data — we do not see or store card details. Our payment processor (see below) handles all card data directly.
  • IP addresses — held in memory for short-window rate-limiting and abuse prevention. Not persisted to a database.
  • Analytics — Umami, a privacy-first cookie-free analytics service.

How we use your data

We process your questions solely to generate the requested output. We do not use them to train AI models. We do not sell your data. We use your email address to send magic sign-in links, purchase receipts, and service announcements directly relevant to your account.

Sub-processors

We rely on the following processors to run the service. Each handles your data under its own privacy policy:

  • LemonSqueezy (merchant of record for all purchases; processes payments via Stripe) — payment, billing, tax.
  • Supabase (EU region) — account, credits, and transaction data.
  • Resend — transactional email (magic links, receipts).
  • Vercel — web hosting and request routing.
  • Model providers (OpenAI, Anthropic, Google, xAI, DeepSeek) — the frontier model panel that answers your questions.
  • Tavily — live web search and page extraction used as evidence.
  • Umami — cookie-free analytics.

International transfers

Some sub-processors (model providers, LemonSqueezy, Stripe, Resend, Vercel) operate in the United States. Transfers rely on Standard Contractual Clauses or equivalent safeguards under UK GDPR.

Retention

Account rows and credit transaction ledger entries are retained for as long as your account is active. Chat transcripts are stored in your browser’s localStorage, not on our servers. When you delete your account from the account page, your user row, credit ledger, and pending magic links are removed. Purchase order history is retained by LemonSqueezy per their policy for tax compliance.

Your rights

Under UK GDPR, you have the right to access, correct, export, restrict, or delete your personal data, and to object to processing. You may exercise most of these directly from the account page. For anything else, email hello@askplural.ai. You have the right to lodge a complaint with the Information Commissioner’s Office (ico.org.uk).

Terms of Service

Last updated: April 2026

The service

AskPlural runs multi-agent research across a panel of vendor-distinct frontier language models with live web retrieval, blind cross-critique, atomic claim verification, and a final synthesis paired with sceptic dissent.

Use

Use the service only for lawful purposes. Do not attempt to circumvent rate limits, reverse-engineer prompts, or submit content that infringes on third-party rights.

AI-generated content

Output is produced by AI and provided for informational purposes only. It should not be treated as professional advice. Running several models side-by-side surfaces more uncertainty than a single chatbot does — use that signal. Independently verify critical information before acting on it.

Pricing

Free to try; paid plans unlock heavier usage. Current plans and limits are published on the pricing page, and may change with reasonable notice.

Intellectual property

You retain ownership of your inputs and own the output generated from them. The AskPlural platform, branding, and workflows are owned by us.

Liability

To the maximum extent permitted by law, AskPlural shall not be liable for indirect, incidental, or consequential damages. Our total liability shall not exceed the amount you paid us in the preceding 12 months.

Governing law

These terms are governed by the laws of England and Wales.

Contact

Questions? hello@askplural.ai

Terms of Sale

Last updated: April 2026

Seller & merchant of record

AskPlural is operated from the United Kingdom. Purchases are processed by LemonSqueezy as the merchant of record. LemonSqueezy collects payment, applies any VAT or sales tax that is due in your jurisdiction, and issues a tax-compliant receipt. Card payments are handled by Stripe; we never see your card details.

What you’re buying

Credits are prepaid usage units for the AskPlural service. Each research run costs 10 credits. Credits do not expire, have no cash value, cannot be transferred between accounts, and can only be used on askplural.ai.

Price & currency

Credit packs are priced in GBP on the pricing page. LemonSqueezy may present the checkout in your local currency; the final charge is shown on the LemonSqueezy checkout page before you confirm payment.

Delivery

Credits are added to your AskPlural account automatically once payment is confirmed, typically within a few seconds. If your balance hasn’t updated within ten minutes, email hello@askplural.ai with your LemonSqueezy order number.

Refunds

Unused credits can be refunded in full within 14 days of purchase, no questions asked. Email hello@askplural.ai with your order number; we will refund via LemonSqueezy and remove the unused credits from your balance.

Used credits are non-refundable. When you run an analysis you are receiving the service you paid for — real compute is spent on your behalf across multiple model providers and a retrieval provider, and that cost cannot be reversed. By using credits you consent to immediate performance and acknowledge that you lose the statutory right to cancel under Regulation 37(1)(a) of the Consumer Contracts (Information, Cancellation and Additional Charges) Regulations 2013 for those consumed credits.

If a run fails for technical reasons on our side, credits are automatically refunded to your balance — you do not need to email us.

Chargebacks

Please email us first if something has gone wrong — we’re quick to respond and a refund is usually simpler for everyone than a card dispute. Chargebacks may result in suspension of the associated account and clawback of any unused credits.

Governing law

These terms of sale are governed by the laws of England and Wales. Your statutory rights as a consumer are not affected.