FinArrow — Methodology

Folio · Methodology

FinArrow is a deterministic SaaS due diligence engine. This page names the mechanism (the five-stage pipeline, the seventeen invariants, and the five statistical anomaly checks) in the order the engine applies them. Every threshold cited here exists as a single named, owner-confirmed setting. Every default applied during a run is written to the assumptions log on the report’s final page.

Read it in full if you intend to defend a number computed by FinArrow at an Investment Committee. Read §1 and §2 if your time is short.

Section 01 · The pipeline

The five stages, in the order the engine runs them.

Same input. Same config. Same output. Always. Pipeline order is non-negotiable; the engine raises before it computes if a stage cannot resolve its inputs.

1.0

Ingest

Load the billing export.

FinArrow accepts a CSV or XLSX from any billing system: Stripe, Chargebee, ChartMogul, FEC, raw bank statements. Column names are mapped automatically against an explicit synonym table; when a column resists matching, an optional Ollama-backed fallback proposes a mapping, which is logged per file for audit.

Required fields: customer ID, date, amount, currency, and transaction type. No customer PII is required, and customer identifiers are anonymised before any AI step.
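The synonym-table lookup can be sketched as follows. This is a minimal illustration, not the engine’s code: the table contents, the `map_columns` name, and the return shape are all assumptions made for the example.

```python
# Hypothetical synonym table: canonical field -> accepted header spellings.
SYNONYMS = {
    "customer_id": {"customer_id", "customer", "client_id"},
    "date": {"date", "created", "invoice_date"},
    "amount": {"amount", "total", "net_amount"},
    "currency": {"currency", "ccy"},
    "transaction_type": {"transaction_type", "type", "kind"},
}

def map_columns(headers):
    """Return (mapping, unmatched_headers, missing_required_fields)."""
    mapping, unmatched = {}, []
    for header in headers:
        key = header.strip().lower().replace(" ", "_")
        field = next((f for f, names in SYNONYMS.items() if key in names), None)
        # A header is unmatched if no synonym fits or the field is already taken.
        if field is None or field in mapping.values():
            unmatched.append(header)
        else:
            mapping[header] = field
    missing = sorted(set(SYNONYMS) - set(mapping.values()))
    return mapping, unmatched, missing

mapping, unmatched, missing = map_columns(
    ["Customer ID", "Created", "Total", "CCY", "Type", "Memo"]
)
```

Here `Memo` lands in `unmatched`, which is where a fallback proposal (and its audit log entry) would come in; a non-empty `missing` list would mean a required field has no column at all.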

2.0

Normalise

Coerce types, enforce refund sign, convert FX, then expand annual rows.

Refund sign is enforced negative during normalisation (refunds entered as positive amounts in the source export are the most common DD bug FinArrow catches). Currency conversion runs next. Only then are annual contracts expanded into twelve synthetic monthly rows. The order matters: expanding before conversion would corrupt the base-currency amount on those rows.

Synthetic rows are flagged as synthetic. They flow through every metric normally; the flag is an audit marker, not a filter.
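The ordering described above (refund sign, then FX, then annual expansion) can be sketched like this. The `Row` fields, the `FX_TO_BASE` rates, and the equal split of an annual amount across twelve months are assumptions for illustration; the page does not specify how the engine allocates the amount.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Row:
    customer_id: str
    month: int           # month index from an arbitrary start (illustrative)
    amount: float
    currency: str
    kind: str            # "recurring", "refund", ...
    annual: bool = False
    synthetic: bool = False

FX_TO_BASE = {"EUR": 1.0, "USD": 0.5}   # made-up rates, illustration only

def normalise(rows):
    out = []
    for row in rows:
        # 1. Refunds are forced negative, whatever sign the export used.
        amount = -abs(row.amount) if row.kind == "refund" else row.amount
        # 2. Convert to the base currency before any expansion.
        amount *= FX_TO_BASE[row.currency]
        row = replace(row, amount=amount, currency="EUR")
        # 3. Only now expand an annual contract into twelve monthly rows,
        #    each tagged synthetic (assumed: an equal twelfth per month).
        if row.annual:
            out += [replace(row, month=row.month + i, amount=amount / 12,
                            annual=False, synthetic=True) for i in range(12)]
        else:
            out.append(row)
    return out

rows = normalise([
    Row("c1", 0, 2400.0, "USD", "recurring", annual=True),
    Row("c2", 0, 50.0, "EUR", "refund"),
])
```

The annual USD contract becomes twelve synthetic rows of 100.0 in the base currency, and the positive refund comes out as -50.0. The `synthetic` flag travels with each row but nothing downstream filters on it.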

3.0

Compute MRR

Point-in-time monthly sum. Never a running total.

MRR for month M is the sum of all recurring charges billed in month M. Each month stands alone. This is not a cumulative sum. An earlier version used one and was incorrect: it inflated every month by all prior months’ billings and made churn detection impossible, because the series became monotonically non-decreasing.

ARR is annualised from the last full month (ARR = MRR × 12). A trailing partial month whose MRR is below 3% of the prior month’s is dropped, not smoothed.
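A minimal sketch of the point-in-time sum and the ARR rule, assuming charges arrive as `(month, amount)` pairs already filtered to recurring types. The function names and the `PARTIAL_MONTH_FLOOR` constant name are hypothetical; only the 3% value comes from this page.

```python
from collections import defaultdict

PARTIAL_MONTH_FLOOR = 0.03   # the 3% cutoff quoted above; the name is hypothetical

def monthly_mrr(charges):
    """charges: iterable of (month, amount) for recurring charges only."""
    totals = defaultdict(float)
    for month, amount in charges:
        totals[month] += amount      # each month stands alone, no carry-over
    return dict(sorted(totals.items()))

def arr_from_last_full_month(mrr):
    months = list(mrr)
    # Drop a trailing partial month instead of smoothing it.
    if len(months) >= 2 and mrr[months[-1]] < PARTIAL_MONTH_FLOOR * mrr[months[-2]]:
        months.pop()
    return mrr[months[-1]] * 12

charges = [("2024-01", 100.0), ("2024-01", 50.0),
           ("2024-02", 120.0), ("2024-03", 2.0)]
mrr = monthly_mrr(charges)
arr = arr_from_last_full_month(mrr)
```

In this example MRR falls from 150.0 to 120.0 between January and February, a contraction that a running total (150.0, then 270.0) would hide. March is below 3% of February, so ARR is taken from February: 120.0 × 12 = 1440.0.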

4.0

Customer metrics

Cohorts, NRR, GDR, logo retention, gross churn.

Cohort-based NRR on a 12-month horizon. GDR capped at 100%. Gross-margin LTV only. No silent defaults. Every assumption logged.

Vintage month is the customer’s first positive recurring charge. Cohort horizons are 12 / 24 / 36 months; logo retention rolls on a 12-month window. The reactivation grace period defaults to 1 month and is capped at 2 (anything longer is almost certainly voluntary churn).
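To make the NRR/GDR distinction concrete, here is a small sketch for one vintage cohort. It assumes the cap is applied per customer (each customer’s retained revenue is limited to its starting MRR), which is one common way to keep GDR at or below 100%; the page does not say how the engine implements the cap, and `cohort_retention` is a hypothetical name.

```python
def cohort_retention(start_mrr, end_mrr):
    """start_mrr / end_mrr: {customer_id: MRR} for one vintage cohort
    at its vintage month and twelve months later."""
    base = sum(start_mrr.values())
    # NRR: what the cohort's original customers pay now, expansion included.
    retained = sum(end_mrr.get(c, 0.0) for c in start_mrr)
    # GDR: each customer is capped at its starting MRR, so expansion
    # cannot offset churn and the ratio cannot exceed 100%.
    retained_capped = sum(min(end_mrr.get(c, 0.0), m)
                          for c, m in start_mrr.items())
    return retained / base, retained_capped / base

nrr, gdr = cohort_retention(
    {"a": 100.0, "b": 100.0, "c": 200.0},
    {"a": 300.0, "c": 150.0, "z": 500.0},   # b churned; z is not in this cohort
)
```

Customer `a` tripled, `b` churned, and `c` contracted. NRR comes out at 112.5% while GDR is 62.5%: the expansion lifts NRR but never GDR, and the later-vintage customer `z` affects neither.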

5.0

Render

A nine-page HTML/PDF, with assumptions logged on the last page.

The report renders as HTML, with an optional PDF. Charts are inline (no external image hosts). The final page logs every default applied during the run — the ones you didn’t set, and the setting that produced each one.

Between stages 4 and 5, the engine also runs flag evaluation, the five anomaly checks, the data-quality grading, and (when configured) the Monte Carlo valuation simulation. Of these, the anomaly checks are detailed in §3 and the determinism of the Monte Carlo simulation in §4.

Section 02 · The seventeen ground rules

Seventeen invariants the engine refuses to violate.

These are not aspirations. Each one is an invariant the engine’s test suite asserts on every change, and a per-edit audit refuses to ship a violation. They are codified in full in the engine’s determinism specification.

01

No threshold is invented in code — every numeric cutoff is a named, owner-confirmed setting.

Every cutoff lives as a named field in one configuration object. The owner confirms the value; the engine only reads it.
02

Which charges count as recurring revenue never changes without sign-off.

Which transaction types count as MRR is the load-bearing classification of the entire pipeline. It cannot drift.
03

Every red or amber flag traces back to the exact threshold that triggered it.

A red or amber flag in the report can be traced — in one click — to the threshold that fired it.
04

Every assumption the engine applies on your behalf is logged.

If you didn’t set it, the report tells you what was set on your behalf.
05

Data is never imputed, filled, or smoothed — save two narrow, logged exceptions.

Two narrow exceptions, both off by default for already-normalised sources, both logged when they fire.
06

If no recurring revenue is found, the run stops with an explanation — never a silent zero.

Silence on a zero-MRR run is the failure mode this rule prevents.
07

ARR bridge shows gross movements. Never net Expansion vs Contraction.

Netting hides the most diagnostic part of the bridge: how much of the movement is up and how much is down.
08

Never use first-vs-last-period as the canonical NRR, GDR, or logo retention. Always use the cohort-based 12-month figure.

First-vs-last is window-length-dependent and not comparable to published SaaS benchmarks.
09

Recurring revenue is measured point-in-time, never as a running total.

A cumulative series only ever climbs — churn becomes invisible in it.
10

Gross dollar retention is always capped at 100%.

Gross dollar retention measures only churn and contraction. Expansion above the base is reflected in NRR, never in GDR.
11

The ARR bridge counts recurring revenue only — one-off and setup fees never inflate it.

Including one-off and setup fees would inflate both opening ARR and the “New” component.
12

LTV uses gross margin, not EBITDA. Revenue-based fallback must be labeled.

EBITDA-LTV creates a circular distortion with CAC: EBITDA already deducts S&M spend, the same spend CAC measures, so the LTV:CAC ratio would count it twice. The 3× benchmark is calibrated for contribution-basis LTV.
13

Reactivation grace period must be ≤ 2 months (default 1).

Default 1 covers Stripe dunning and card-update failures. A 3-month gap is almost certainly voluntary churn in B2B SaaS.
14

When P&L history is too short to apply the sales-cycle lag, the report says so.

If the engine cannot apply the requested sales-cycle lag, it falls back to zero offset and surfaces a notice. Never silent.
15

Sources that are already monthly (FEC, ChartMogul, Baremetrics) are never re-expanded.

Sources that are already at monthly granularity must not be expanded a second time. Doing so would double-count revenue.
16

Annual contract expansion must happen after FX, before metrics.

Running the steps in the wrong order would corrupt the converted amount on every synthetic monthly row.
17

Rows synthesised from annual-contract expansion are tagged for audit, not filtered — they flow through every metric normally.

The tag is there for the reviewer; it changes no calculation anywhere except the suspected-annual-contract check.
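Rules 01, 06 and 13 lend themselves to a short sketch: thresholds as named fields on one immutable object, a hard bound on the grace period, and a loud stop when no recurring revenue is found. The class, field, and function names below are hypothetical; the default values are the ones quoted on this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EngineConfig:
    # Field names are hypothetical; defaults are the values quoted on this page.
    reactivation_grace_months: int = 1
    orphan_multiplier: float = 2.0
    spike_z_threshold: float = 2.5
    unique_large_percentile: int = 90
    annual_gap_months: tuple = (10, 14)
    partial_month_floor: float = 0.03

    def __post_init__(self):
        # Rule 13: the grace period may never exceed two months.
        if not 0 <= self.reactivation_grace_months <= 2:
            raise ValueError("reactivation_grace_months must be between 0 and 2")

def require_recurring_revenue(mrr_by_month):
    # Rule 06: stop with an explanation rather than report a silent zero.
    if not mrr_by_month or all(v == 0 for v in mrr_by_month.values()):
        raise ValueError(
            "no recurring revenue found; check the transaction-type mapping"
        )
    return mrr_by_month

config = EngineConfig()
try:
    EngineConfig(reactivation_grace_months=3)
    rejected = False
except ValueError:
    rejected = True
```

Because the dataclass is frozen, metric code can read a threshold but cannot reassign it mid-run, which is the spirit of "the engine only reads it".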

Section 03 · Anomaly checks

Five statistical checks. Each with one threshold.

Five integrity checks fire on every run, each with a single, owner-confirmed threshold. Every flagged transaction is written to an Excel audit workbook alongside the report — so you can open any flag and verify the underlying transaction yourself.

01 · 05

Orphan amount

A single billed amount that exceeds the customer’s median MRR by a configured multiplier. Catches one-off charges miscoded as recurring — the “hidden ARR” pattern.

Threshold: 2.0× median MRR
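As a rough sketch of this check, assuming one customer’s billed amounts are available as a plain list (`orphan_amounts` is a hypothetical name):

```python
from statistics import median

def orphan_amounts(billed_amounts, multiplier=2.0):
    """Return the amounts exceeding multiplier x the customer's median."""
    cutoff = multiplier * median(billed_amounts)
    return [a for a in billed_amounts if a > cutoff]

flagged = orphan_amounts([100.0, 100.0, 100.0, 450.0, 100.0])
```

With a median of 100.0 the cutoff is 200.0, so only the 450.0 charge is returned.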

02 · 05

Non-recurring charge in the recurring stream

Amounts billed once or twice and never repeated, bucketed by a rounding tolerance. Catches setup fees and migration credits that drifted into the recurring bucket.

Threshold: amounts grouped to the nearest 10
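One way to read "grouped to the nearest 10" is sketched below. The function name and the `max_occurrences` parameter are assumptions; the value 2 mirrors "billed once or twice" in the description above.

```python
from collections import Counter

def rare_amounts(amounts, tolerance=10, max_occurrences=2):
    """Bucket amounts to the nearest `tolerance`, then return the buckets
    that occur at most `max_occurrences` times."""
    # Note: Python's round() rounds exact halves to the nearest even integer.
    buckets = Counter(round(a / tolerance) * tolerance for a in amounts)
    return sorted(b for b, n in buckets.items() if n <= max_occurrences)

rare = rare_amounts([99.0, 101.0, 100.0, 98.0, 500.0])
```

The four amounts near 100 collapse into a single bucket that repeats, so only the 500 bucket is reported as a likely one-off.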

03 · 05

MRR spike

Per-customer monthly MRR that exceeds its own historical z-score band. Catches data-entry mistakes and contract renegotiations that should be reviewed.

Threshold: 2.5σ from the customer’s own mean
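A minimal z-score sketch for one customer’s monthly series follows. It uses the population standard deviation, which is an assumption; the page does not say which estimator the engine uses, and `mrr_spikes` is a hypothetical name.

```python
from statistics import mean, pstdev

def mrr_spikes(series, z_threshold=2.5):
    """Return the indices of months whose MRR lies more than z_threshold
    standard deviations from the customer's own mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return []            # a flat series has no spikes
    return [i for i, v in enumerate(series)
            if abs(v - mu) / sigma > z_threshold]

spikes = mrr_spikes([100.0] * 11 + [400.0])
```

One caveat of this particular sketch: with the population standard deviation, no value in a series of n points can sit more than √(n−1) standard deviations from the mean, so a customer with fewer than eight months of history could never cross 2.5σ here.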

04 · 05

Unique large amount

Cross-customer outlier: an amount above the configured percentile of the dataset that occurs only once. Catches one-time professional services billed through the recurring channel.

Threshold: above the 90th percentile, occurring once
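The sketch below combines the two conditions, using a nearest-rank percentile. The percentile method and the `unique_large_amounts` name are assumptions made for the example, not the engine’s definition.

```python
from collections import Counter

def unique_large_amounts(amounts, percentile=90):
    """Return amounts above the dataset percentile that occur exactly once.
    `percentile` is an integer between 1 and 100."""
    if not amounts:
        return []
    ordered = sorted(amounts)
    # Nearest-rank percentile in integer arithmetic:
    # rank = ceil(percentile * n / 100).
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    cutoff = ordered[rank - 1]
    counts = Counter(amounts)
    return sorted(a for a, n in counts.items() if a > cutoff and n == 1)

amounts = [50.0] * 15 + [80.0] * 12 + [1200.0, 1200.0, 5000.0]
flagged = unique_large_amounts(amounts)
```

Both 1200.0 and 5000.0 sit above the 90th-percentile cutoff of 80.0, but 1200.0 appears twice, so only 5000.0 is flagged.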

05 · 05

Suspected annual contract

Recurring billing with a 10–14 month gap between charges — the unmistakable signature of an unlabelled annual contract. Informational, not a fraud signal, and suppressed automatically once the contract is labelled as annual.

Threshold: 10–14 month gap
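A sketch of the gap test, assuming a customer’s charges are reduced to a sorted list of month indices. Requiring every gap to fall in the band is a choice made for this example; the function name and the `labelled_annual` parameter are hypothetical.

```python
def suspected_annual(charge_months, min_gap=10, max_gap=14,
                     labelled_annual=False):
    """charge_months: sorted month indices at which the customer was billed."""
    # Suppressed once the contract is labelled annual; one charge has no gap.
    if labelled_annual or len(charge_months) < 2:
        return False
    gaps = [b - a for a, b in zip(charge_months, charge_months[1:])]
    return all(min_gap <= g <= max_gap for g in gaps)

yearly = suspected_annual([0, 12, 24])
monthly = suspected_annual([0, 1, 2])
labelled = suspected_annual([0, 12, 24], labelled_annual=True)
```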

Section 04 · Determinism

A property the engine is tested against.

Same input + same config ⇒ same output, always.

That sentence is the engine’s founding principle, not a marketing claim: the test suite asserts it on every change. Re-running last quarter’s deal on this quarter’s FinArrow build returns the same nine-page report, bit-for-bit identical, provided the input data and the configuration are unchanged.

Three mechanisms enforce the property in practice. First, every threshold lives as a named, configurable setting — there is no hard-coded number anywhere in the metric logic. Second, every default applied during a run is written to the assumptions log and rendered on the report’s final page, so the reader can always see which setting produced which figure. Third, an automated audit walks the entire codebase on every change, rejecting any stray hard-coded threshold before it can ship.

The Monte Carlo valuation simulation preserves the same property. Its random number generator runs from a fixed seed, so two runs on the same inputs return identical P10 / P50 / P90 enterprise-value percentiles to the ninth decimal.
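The fixed-seed idea can be illustrated with a toy simulation. The valuation model here (ARR times a lognormally distributed multiple) and every parameter value are invented for the example and say nothing about FinArrow’s actual model; only the seeding pattern is the point.

```python
import random
from statistics import quantiles

def simulate_ev(arr, seed=42, n_paths=10_000):
    rng = random.Random(seed)        # private generator, fixed seed
    # Hypothetical model: EV = ARR x a multiple drawn from a lognormal.
    evs = [arr * rng.lognormvariate(1.6, 0.4) for _ in range(n_paths)]
    deciles = quantiles(evs, n=10)   # nine cut points: P10 ... P90
    return deciles[0], deciles[4], deciles[8]

first = simulate_ev(1_440_000.0)
second = simulate_ev(1_440_000.0)
```

`first` and `second` are equal because each call builds its own generator from the same seed rather than sharing global random state. In Python specifically, identical sequences across interpreter versions are only guaranteed for `random()` itself, so a sketch like this is reproducible within one pinned runtime.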

Section 05 · Academic anchor

Developed at HEC Lausanne.

FinArrow’s methodology was developed by Maxime Bezier as part of a Master’s thesis in Finance at HEC Lausanne, 2026. The thesis quantifies the “Lemon Premium” — the gap between P10 and P50 enterprise-value distributions that arises specifically from definition drift in SaaS unit economics — and grounds the deterministic engine’s metric definitions in the published SaaS benchmarking literature (SaaS Capital, KeyBanc, Bessemer, Fader & Hardie, Damodaran).

The completed thesis will be linked here from the HEC Lausanne repository upon publication.

Section 06 · Constraints

What this does not do.

FinArrow does not benchmark a target against an industry comparable set. It does not predict customer behaviour beyond the v0.2 Monte Carlo’s 36-month horizon. It does not forecast competitive dynamics, executive risk, or product roadmap. It does not let the AI narrative override a deterministic flag — the deterministic flag card always renders first and the AI explanation, where it appears, is labelled “AI-Assisted Interpretation” and visually distinct.

Anonymised peer benchmarks are on the v1.0 roadmap. Until they ship, every metric in a FinArrow report is computed from the target’s own billing data, with no implicit comparison to a population the operator did not select.