Performance

Track Record

Three layers of evidence. First, every live PDUFA classification is documented against the actual FDA outcome, with wins and losses both on the record. Second, a 102-event retrospective backtest runs the same PoA engine against 25 months of FDA history. Third, a $1M portfolio simulation compounds each classification through time to show what a subscriber mirroring the calls would have experienced.

Resolved Events

—

approved + decided

Correct Calls

—

loading accuracy

Pending / Delayed

—

awaiting resolution

STN Calls Correct

—

sell-the-news layer

Live Track Record — Pre-Decision Calls

Ticker	Drug / Indication	PDUFA Date	PoA	Grade	STN Risk	FDA Outcome	Actual Move	PoA ✓

Retrospective Validation

102-Event Backtest

Every NME approval and CRL the FDA issued between February 2024 and March 2026. Rescored retroactively using the same PoA engine that runs on live events today. Built as a calibration test — not a curve-fit.

Methodology

PoA layer only. The STN, dilution, and convexity layers are not backtested: their inputs (30-day run-up, shelf registration status, cash runway, proximity) require point-in-time data that is unavailable for 2024 events. Those three layers are validated only by the live scorecard above. The scoring functions (classifyIndication, getPathwayRate, parseAdcomMultiplier) are byte-for-byte identical to the production score-events engine. Source: FDA CDER NME 2024 + 2025 lists and Drugs@FDA CRL database.

Total Events

—

Feb 2024 → Mar 2026

Calls Made

—

APPROVE or CRL

Binary Accuracy

—

on calls made

WATCH Rate

—

no directional call

PoA Calibration

For each PoA bucket, the share of drugs that actually approved. In a calibrated model the two match: events scored 80–89% PoA should approve ~85% of the time. The vertical tick marks the bucket midpoint; the bar shows the observed approval rate.

Reading the Calibration

The 80–89% bucket landing close to 100% approved and the 30–49% bucket landing near 0% is what you'd want to see: the model is directionally right at the extremes. The 50–69% bucket (the WATCH tier) over-approves relative to the midpoint because WATCH is intentionally a conservative label, not a neutral one. When PoA lands in that band, the scanner declines to issue a directional call rather than forcing one. The events that end up there tend to approve anyway because the clinical data was there but some other input (AdCom vote, pathway type, resubmission status) pulled the score down. Treat the WATCH bucket as "no position taken", not as a miscalibrated APPROVE call.
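The bucket comparison described above can be sketched in a few lines. This is a minimal illustration, not the production chart code: the function name, the input shape, and every bucket edge other than the 30–49, 50–69, and 80–89 bands named on this page are assumptions.

```python
from collections import defaultdict

# Bucket edges in PoA percent. Only 30-49, 50-69 and 80-89 are named on the
# page; the other edges are assumptions added to cover the 0-100 range.
BUCKETS = [(0, 29), (30, 49), (50, 69), (70, 79), (80, 89), (90, 100)]


def calibration_table(events):
    """events: iterable of (poa_percent, approved) pairs, approved is a bool."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [approved, total]
    for poa, approved in events:
        for lo, hi in BUCKETS:
            if lo <= poa < hi + 1:  # half-open upper edge so 49.5 stays in 30-49
                counts[(lo, hi)][0] += int(approved)
                counts[(lo, hi)][1] += 1
                break

    rows = []
    for lo, hi in BUCKETS:
        approved, total = counts[(lo, hi)]
        if total == 0:
            continue  # skip empty buckets instead of reporting a 0% rate
        midpoint = (lo + hi) / 2
        observed = 100 * approved / total
        rows.append({
            "bucket": f"{lo}-{hi}%",
            "n": total,
            "midpoint": midpoint,       # where a calibrated model should land
            "observed": observed,       # actual approval rate in the bucket
            "gap": observed - midpoint  # positive = model was under-confident
        })
    return rows
```

A positive gap in the 50–69% row corresponds to the WATCH tier over-approving relative to its midpoint.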

All 102 Scored Events

— rows

Date	Ticker	Drug	Indication Area	Type	PoA	Call	Outcome	✓

Backtest IRR — Applied to 102 Events

Same $1M portfolio simulation applied to the full backtested universe above. Each event sized equally, direction set by scanner_call (APPROVE → long, CRL → short, WATCH → skipped — no capital deployed), compounded chronologically through 25 months of FDA decisions.

Position size per trade: 5% default — smaller sizing for 102-event horizon

Starting Capital

$1.00M

Feb 2024 baseline

Final Equity

—

CAGR / IRR

—

Max Drawdown

—

peak-to-trough

Equity curve — backtest IRR · 102-event sequence

Trades Taken

—

Win Rate

—

Avg Winner

—

Avg Loser

—

Backtest Trade Sequence

#	Date	Ticker	Dir	Outcome	Return	Position	P&L	Equity

What the backtest IRR is, and is not

This takes the 102-event backtest universe and asks a different question than the calibration curve above: not "was the PoA well-calibrated" but "what would the compounded P&L have looked like if you actually traded every scanner call on real-market data." Only events with real Polygon.io close-to-close moves are included. Placeholder (PRIVATE), foreign-exchange (BOT.AX, BSLN, LUMI, HLB), and unfunded (CHINA, SUNPHARMA) tickers are excluded rather than modeled. This shrinks the trade count from all ~92 directional calls to the ~73 that have real post-decision price data, and keeps every P&L figure traceable to actual market activity.

It still validates only the PoA layer. The STN, dilution, and convexity filters that would have cut or sized down the worst setups are not applied here, so in practice a trader using the full scanner would have skipped some of these trades and the curve would differ. Because events do not overlap and no leverage is used, IRR collapses to CAGR.
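As a rough sketch of the filtering and direction rules described above, the snippet below drops the excluded tickers, maps each scanner call to a long or short position, and derives the summary statistics shown in the cards. The field names `ticker` and `move_pct`, the helper names, and the choice to count a flat trade as a loser are assumptions made for illustration.

```python
# Tickers the page lists as excluded from the P&L simulation.
EXCLUDED_TICKERS = {"PRIVATE", "BOT.AX", "BSLN", "LUMI", "HLB", "CHINA", "SUNPHARMA"}


def signed_return(scanner_call, move_pct):
    """Return the trade P&L in percent, or None when no position is taken."""
    if scanner_call == "APPROVE":
        return move_pct    # long: profit when the stock rises
    if scanner_call == "CRL":
        return -move_pct   # short: profit when the stock falls
    return None            # WATCH: no capital deployed


def trade_stats(events):
    """events: list of dicts with 'ticker', 'scanner_call', 'move_pct' keys."""
    returns = []
    for event in events:
        # Skip excluded tickers and events without a real close-to-close move.
        if event["ticker"] in EXCLUDED_TICKERS or event.get("move_pct") is None:
            continue
        result = signed_return(event["scanner_call"], event["move_pct"])
        if result is not None:
            returns.append(result)

    winners = [r for r in returns if r > 0]
    losers = [r for r in returns if r <= 0]  # assumption: flat counts as a loss
    return {
        "trades": len(returns),
        "win_rate": len(winners) / len(returns) if returns else None,
        "avg_winner": sum(winners) / len(winners) if winners else None,
        "avg_loser": sum(losers) / len(losers) if losers else None,
    }
```

Note that a CRL call on a stock that falls 40% is recorded as a +40% trade, because the short position inverts the move.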

Capital Simulation

Portfolio IRR

What a subscriber mirroring the live scanner calls would have experienced. $1M starting capital, fixed-percent position sizing, compounded chronologically through every resolved classification. Non-overlapping events, no leverage — so IRR collapses cleanly to CAGR.

Methodology

Starting capital is $1,000,000. Each resolved trade is sized at a fixed percentage of current account equity (toggle below). Trades are executed chronologically by PDUFA decision date. WATCH-list classifications (no position taken) are excluded. trade_type determines direction: LONG uses return_pct as-is, SHORT inverts its sign. Because PDUFA events rarely overlap at the portfolio level, the sequence compounds cleanly and IRR collapses to CAGR: (ending / starting)^(1/years) − 1. No leverage, no options, no overlapping positions, no intra-trade management. Fees and slippage are not modeled.
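The compounding rule above can be sketched as follows. This is an illustrative reading of the methodology, not the site's code: the function names and the tuple input shape are assumptions, while `trade_type` and `return_pct` follow the field names used in the text.

```python
def simulate(trades, position_pct, start_equity=1_000_000.0):
    """Compound trades in order.

    trades: chronologically sorted list of (trade_type, return_pct) tuples,
            with trade_type either "LONG" or "SHORT" and return_pct in percent.
    position_pct: fraction of current equity risked per trade, e.g. 0.05.
    Returns the equity curve and the maximum peak-to-trough drawdown.
    """
    equity = start_equity
    peak = start_equity
    max_drawdown = 0.0
    curve = [equity]
    for trade_type, return_pct in trades:
        # LONG uses the return as-is, SHORT inverts its sign.
        signed = return_pct if trade_type == "LONG" else -return_pct
        position = equity * position_pct  # rebased to current equity each trade
        equity += position * signed / 100.0
        curve.append(equity)
        peak = max(peak, equity)
        max_drawdown = max(max_drawdown, (peak - equity) / peak)
    return curve, max_drawdown


def cagr(start_equity, end_equity, years):
    """(ending / starting)^(1/years) - 1, as stated in the methodology."""
    return (end_equity / start_equity) ** (1.0 / years) - 1.0
```

For example, `cagr(1_000_000, 1_210_000, 2)` gives roughly 0.10, since 10% compounded over two years turns $1.00M into $1.21M.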

Position size per trade: % of current account equity · rebased each trade

Starting Capital

$1.00M

Jan 2024 baseline

Final Equity

—

Annualized (CAGR)

—

across window

Max Drawdown

—

peak-to-trough

Equity curve · $1M → final

Trades Taken

—

Win Rate

—

Avg Winner

—

Avg Loser

—

Trade-by-Trade Sequence

#	Date	Ticker	Dir	Result	Return	Position	P&L	Equity

Small-N Caveat

The live portfolio simulation currently reflects — resolved trades. Statistical significance at this sample size is limited, so the simulation is presented as directional evidence, not as strategy attribution. The 102-event retrospective backtest above provides the larger-sample calibration check. As the resolved count grows, this simulation becomes the primary performance artifact.

How We Track Accuracy

Every PDUFA event scored by Submarine Catalyst is logged before the FDA decision date. After the decision, the actual outcome (Approved, CRL, Delayed, Withdrawn) is recorded against our predicted PoA score and grade.

A prediction is marked correct if: the scanner scored PoA ≥ 65% and the drug was approved, OR the scanner scored PoA < 50% and the drug received a CRL/rejection. Delayed events are not counted as wins or losses.
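The grading rule can be written out as a small function. This is a sketch under stated assumptions: the function names are hypothetical, and treating PoA scores between 50% and 65% and Withdrawn outcomes as ungraded is an assumption, since the rule above only specifies the two correct cases and the exclusion of Delayed events.

```python
def grade_call(poa, outcome):
    """Return True (correct), False (incorrect), or None (not graded).

    poa: predicted probability of approval in percent.
    outcome: "Approved", "CRL", "Delayed", or "Withdrawn".
    """
    if outcome == "Approved":
        if poa >= 65:
            return True    # scored high, drug approved
        if poa < 50:
            return False   # scored low, drug approved anyway
        return None        # assumption: middle band carries no directional call
    if outcome == "CRL":
        if poa < 50:
            return True    # scored low, drug rejected
        if poa >= 65:
            return False   # scored high, drug rejected
        return None        # assumption: middle band carries no directional call
    return None            # Delayed is excluded; Withdrawn excluded by assumption


def accuracy(calls):
    """calls: list of (poa, outcome). Accuracy over graded calls only."""
    graded = [g for g in (grade_call(p, o) for p, o in calls) if g is not None]
    return sum(graded) / len(graded) if graded else None
```

Ungraded events drop out of both the numerator and the denominator, so a Delayed decision never moves the accuracy figure in either direction.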

Sell-The-News (STN) Risk Score

Approval ≠ stock move. The scanner includes a Post-Approval Move Score that predicts whether an FDA approval will actually move the stock. This was added after observing that RCKT was approved (PoA correct) but the stock dropped 20% — a textbook sell-the-news event.

The STN Risk score evaluates:

• PoA priced-in factor — Higher PoA = more priced in = less upside on approval
• Application type — sNDA/supplements are incremental, not transformative
• Revenue potential — Small/rare indication vs large market blockbuster
• AdCom unanimity — Unanimous vote = zero surprise value in approval
• Resubmission status — Post-CRL approval was already expected
• Competitive landscape — Crowded market reduces differentiation
• First-in-class / PRV eligible — Unique value drivers that support price

Risk levels: VERY LOW → LOW → MODERATE → ELEVATED → HIGH

The PoA model tells you IF it gets approved. The STN risk tells you if that approval is worth trading.

The Backtest — Why It Exists

A short live track record is a short live track record, and sophisticated biotech investors know it. The retrospective backtest runs the same PoA engine against every FDA NME decision from February 2024 forward — a fixed, auditable universe of 102 events pulled from the FDA CDER NME lists and the Drugs@FDA CRL database.

The scoring code is the same code that runs in production: the three core functions (classifyIndication, getPathwayRate, parseAdcomMultiplier) are duplicated byte-for-byte from the live score-events edge function, with a synced-block comment preventing silent divergence. The backtest can only validate what production actually uses.
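One way a byte-for-byte guarantee like this could be checked automatically is to hash the synced block in both files and compare digests. The sketch below is hypothetical: the marker comments `SYNCED-BLOCK-START` and `SYNCED-BLOCK-END` and the function names are invented for illustration, and the actual synced-block comment format is not shown on this page.

```python
import hashlib
import re

# Hypothetical marker comments delimiting the duplicated scoring functions.
BLOCK_RE = re.compile(
    r"// SYNCED-BLOCK-START\n(.*?)// SYNCED-BLOCK-END", re.DOTALL
)


def synced_block_digest(source_text):
    """Return the SHA-256 hex digest of the text between the two markers."""
    match = BLOCK_RE.search(source_text)
    if match is None:
        raise ValueError("synced block markers not found")
    return hashlib.sha256(match.group(1).encode("utf-8")).hexdigest()


def blocks_identical(production_source, backtest_source):
    """True only if the synced blocks match byte for byte."""
    return synced_block_digest(production_source) == synced_block_digest(backtest_source)
```

A check like this could run before each backtest so that a single changed character in either copy fails loudly instead of diverging silently.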

This page is updated within 24 hours of every live PDUFA resolution. The backtest is re-run any time production PoA logic changes. No exceptions. No deletions.
