Institutional Reference Document

Bayesian Spatial Inference, Detection Correction, and Systematic Geological Mispricing

A capital allocation engine that identifies where geological truth probability is structurally underpriced by the exploration market.

v1.12 — February 2026 · Strata Discovery Research · 112 Pages

00 — Abstract

Mineral exploration capital is systematically misallocated. Across mature mining jurisdictions, the vast majority of geological terrain has never been drilled. Drill holes cluster around a small fraction of the landscape, creating a structural conflation between geological prospectivity and exploration history. Areas with high drill density appear more prospective because more deposits have been found there — but deposits were found because drilling occurred, not because geology is fundamentally more favourable.

Strata Discovery formally decomposes the observable signal into two latent processes: geological truth probability (the probability that a cell contains mineralisation given geological evidence) and detection probability (the probability that a deposit would have been detected given exploration history). Traditional methods estimate the product directly, embedding exploration bias into every prediction. We estimate them separately. Cells where geological probability is high and detection probability is near-zero represent the maximum-mispricing frontier — ground where the model assigns high probability of mineralisation but where no exploration has ever tested the hypothesis.

Key Findings

Research Outcomes

>15×

concentration at 5% area coverage — examining 5% of the jurisdiction captures over fifteen times more deposits than random allocation.

>95%

of known deposits fall within the top fraction of scored cells. Near-complete capture at a fraction of the search space.

Frontier

maximum-mispricing cells where geological probability is high but detection probability is near-zero — ground the market has never tested.

Contents

Ten Chapters

01

The Structural Inefficiency

Why the vast majority of geological terrain has never been drilled. The exploration paradox, detection bias, and why conventional prospectivity mapping conflates geology with exploration history.

02

The p × q Decomposition

Architecture of the generative model. Separating geological truth from detection probability using distinct feature sets, distinct priors, and a decision firewall.

03

The Data Foundation

Multi-domain geoscience evidence — structural geology, geophysics, radiometrics, terrain, surficial geology, and drilling records — harmonized on a standardized spatial grid.

04

Feature Construction and Governance

From raw geological evidence to structured predictors. Anti-bias triage, the decision firewall, and automatic feature selection.

05

The Inference Engine

Full-mode Bayesian inference with uncertainty quantification. Regime-specific calibration, hierarchical pooling, and posterior stability.

06

Validation and Proof of Edge

Four null hypothesis tests, temporal holdout, spatial block holdout, ablation analysis, independent backtesting, and rank stability.

07

The Capital Engine

Target delineation, classification, tiering, sweet-spot identification, attribution-driven narratives, and phased exploration recommendations.

08

Governance and Reproducibility

SHA256 artefact sealing, version immutability, 24 production quality gates, decision firewall enforcement, anti-bias triage as standing governance, and the reasoning protocol.

09

The Compounding Loop

Feedback infrastructure design, data and detection correction moats, capital speed advantage, institutional credibility through sealed version history, and improving-returns exploration.

10

Limitations and Boundary Conditions

Resolution constraints, geological model assumptions, spatial nonstationarity, data regime shifts, coverage gaps, conditional independence, and operational parameters for capital allocation.

System Architecture

The system operates as a closed feedback loop — from raw data integration through generative inference to capital deployment, with each exploration campaign improving the next cycle.

01

Data Foundation

Multi-domain geoscience evidence spanning structural geology, potential-field geophysics, gamma-ray spectrometry, terrain, surficial geology, and drilling records — projected onto a standardized spatial grid across each target jurisdiction.

02

Generative Inference

A two-layer Bayesian model that separately estimates where geological processes have concentrated gold (truth) and where exploration has been sufficient to detect it (detection). A hard firewall prevents any detection feature from entering the truth layer.

03

Automatic Selection

From 48 candidate geological variables, the model objectively identifies the 8 that carry genuine signal. Seven-criteria anti-bias triage blocks features that encode exploration history rather than geological merit.

04

Target Classification

Targets classified across three maturity tiers: frontier opportunities in untested ground, camp extensions of known districts, and elevated-interest areas. Maximum-mispricing cells identified where geology is favorable and detection is near-zero.

Part I

The Problem

pp. 8–22

A Market That Conflates Looking with Finding

The geological endowment of mature mining jurisdictions is not in question. The Superior Province — the largest Archean craton on Earth — hosts the Abitibi greenstone belt, the single most productive orogenic gold district in history. The camps of Timmins, Kirkland Lake, Red Lake, Hemlo, and Detour Lake have collectively attracted billions of dollars in exploration and development capital over more than a century. Similar patterns recur across every major mining jurisdiction worldwide.

Yet the vast majority of geological terrain in these jurisdictions has never been systematically tested. Drill holes concentrate in a small fraction of the landscape, leaving the overwhelming majority of geological ground unexamined. This is not a reflection of geological uniformity. Productive geological belts extend far beyond established camps, with greenstone belt segments, structural corridors, and favourable lithological assemblages distributed across vast areas that have never been drilled.

Drilling clusters around infrastructure — roads, power lines, railheads, and existing mine sites. Remote areas have effectively zero drilling despite hosting geologically continuous extensions of productive belts. Each new discovery within a camp attracts follow-up programs, creating a positive feedback loop that amplifies drilling density in already-explored ground. The result is a systematic conflation between exploration effort and geological prospectivity.

The paradox is structural: the most geologically favourable ground may be the least explored ground, but there is no systematic mechanism to identify it. A first-order shear zone in an unexplored portion of a greenstone belt has the same structural architecture as a first-order shear zone hosting a producing mine 100 km away. The difference is detection, not geology.

Conventional prospectivity methods — weights-of-evidence, logistic regression, random forests, gradient-boosted trees — share a common architectural flaw: they train on “was a deposit found here?” This label conflates two fundamentally different quantities. Cross-validation selects the model that best predicts drilling patterns, not geological prospectivity. The strongest predictors in discriminative models are invariably related to exploration access: distance to nearest drill hole, drill density, distance to nearest road. These features achieve high performance precisely because they encode detection history, not geological merit.

Capital that follows these maps is not accessing geological alpha — it is accessing the reflected image of its own prior investment. Billions of dollars in exploration capital have been deployed based on prospectivity maps that embed detection bias. The maps recommend areas that have already been explored, reinforcing the cycle. The marginal value of the next drill hole in a mature camp is lower than the marginal value of the first drill hole in a structurally favourable but untested greenstone segment nearby.

At typical base rates of mineralisation, a model that predicts “no deposit” everywhere achieves near-perfect accuracy. Accuracy is not a useful metric. The question is not whether deposits exist, but where geological evidence concentrates probability — and whether the exploration market has tested those concentrations.

Part II

The Framework

pp. 23–36

Two Questions, Separated by Architecture

The observable outcome — whether a deposit has been found at a given location — is the product of two independent processes. Geological truth probability asks: given structural, geophysical, and terrain evidence, what is the probability that gold mineralisation exists here? Detection probability asks: given drilling history, survey coverage, and overburden, what is the probability that a deposit would have been found if one existed? The observation of “no deposit” is fundamentally ambiguous: it could mean the geology is unfavourable, or it could mean the geology is favourable but no exploration has occurred.

Traditional methods estimate this product directly. When detection probability approaches zero — as it does across the vast majority of any jurisdiction’s terrain — the product approaches zero regardless of geological merit. This is the mechanism of mispricing. Any model that trains on the joint product without decomposition will learn to predict drilling patterns, not geological prospectivity. No amount of hyperparameter tuning, feature selection, or cross-validation can correct this bias within the discriminative framework.

Strata Discovery builds a generative model where geological truth and detection are first-class latent variables with distinct data-generating processes. A hard programmatic firewall — not a convention, but an assertion that halts execution if violated — prevents any detection-layer feature from entering the truth layer. Structural geology, geophysics, and terrain evidence drive the truth estimate. Drilling density, survey coverage flags, and overburden characteristics drive the detection estimate. The predictor sets are non-overlapping by design.
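
The decomposition can be made concrete with a minimal sketch. The Python fragment below, written against PyMC, assumes a Bernoulli observation model over the product of the two layers and simple logistic forms for each; the function name, feature matrices, and priors are illustrative, not the production specification.

```python
import pymc as pm

def build_model(X_truth, X_det, y):
    """Illustrative two-layer generative model: X_truth holds geology-only
    features, X_det holds exploration-history features (disjoint by firewall),
    y is 1 where a deposit is recorded."""
    with pm.Model() as model:
        # Truth layer: probability that mineralisation exists, from geology alone
        b_t = pm.Normal("beta_truth", mu=0.0, sigma=1.0, shape=X_truth.shape[1])
        a_t = pm.Normal("alpha_truth", mu=0.0, sigma=2.0)
        p_truth = pm.math.sigmoid(a_t + pm.math.dot(X_truth, b_t))

        # Detection layer: probability a deposit would have been found, from effort alone
        b_d = pm.Normal("beta_det", mu=0.0, sigma=1.0, shape=X_det.shape[1])
        a_d = pm.Normal("alpha_det", mu=0.0, sigma=2.0)
        p_det = pm.math.sigmoid(a_d + pm.math.dot(X_det, b_d))

        # Observed label is explained only by the product: present AND detected
        pm.Bernoulli("found", p=p_truth * p_det, observed=y)
    return model
```

The point of the sketch is the likelihood line: a cell can score high on truth while the detection term explains away the absence of a recorded deposit.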

Each jurisdiction is discretised into a standardised grid — the canonical spatial frame into which all data sources are projected. Every dataset is transformed into cell-level features within this common frame: vector data rasterised as distance-to-nearest and kernel density, raster data resampled to cell-level statistics, point data assigned to the nearest grid cell. The grid contract ensures spatial consistency across all analyses.
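
As an illustration of how vector evidence becomes cell-level features, the sketch below uses SciPy image operations on a boolean fault-trace grid; the variable names, cell size, and kernel width are assumptions for the example, not the production parameters.

```python
import numpy as np
from scipy import ndimage

def vector_to_features(fault_mask, cell_size_m=1000.0, density_sigma_cells=5.0):
    """fault_mask: boolean grid, True where a fault trace crosses a cell."""
    # Distance-to-nearest fault, in metres, for every cell
    dist_to_fault = ndimage.distance_transform_edt(~fault_mask) * cell_size_m
    # Smoothed fault density: kernel-weighted count of fault cells per neighbourhood
    fault_density = ndimage.gaussian_filter(fault_mask.astype(float),
                                            sigma=density_sigma_cells)
    return dist_to_fault, fault_density
```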

The actionable output is not a conventional prospectivity score. It is the identification of cells where geological truth probability is high and detection probability is near-zero: ground that is geologically indistinguishable from cells hosting known deposits, but that no one has drilled. The score that traditional methods estimate — the joint product — is what the market already prices. The truth probability, evaluated where detection is near-zero, is what the market has not priced.

This reframing has a specific consequence: a probability surface enables expected-value budgeting across an entire jurisdiction. Instead of evaluating dozens of prospects sequentially through heuristic filters, a decision-maker can identify the cells with the highest expected return per exploration dollar — factoring in both geological probability and the information value of first-mover exploration in undetected ground.
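
A hedged sketch of what such an expected-value screen might look like in code: rank cells by mean truth probability where detection is near zero, penalised by posterior spread. The detection threshold and the risk adjustment are placeholders, not the system's calibrated rules.

```python
import numpy as np

def mispricing_rank(p_truth_mean, p_truth_sd, q_detect, q_max=0.05):
    """Return cell indices ordered from most to least attractive."""
    untested = q_detect < q_max                          # detection near zero
    score = np.where(untested, p_truth_mean - p_truth_sd, -np.inf)  # risk-adjusted
    return np.argsort(-score)                            # best cells first
```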

Part III

The Evidence Base

pp. 37–48

Structured Geological Intelligence at Provincial Scale

The system integrates multi-domain geoscience evidence across the full spatial grid of each target jurisdiction. Each data source is not merely an input column — it is a structured representation of a geological process, evaluated through three lenses: what it measures geologically, how it enters the model statistically, and what capital consequence its inclusion or exclusion carries. Data coverage is heterogeneous, and the pattern of missing data encodes information about historical exploration priorities. Missingness is informative, not random.

Structural geology provides the irreplaceable backbone. Fault traces and dike swarms serve as the spatial foundation for the truth layer. Fault networks and shear zones are fluid conduits — channeling metamorphic fluids from depth into structural traps where gold precipitates. Dike swarms indicate magmatic-hydrothermal activity. Lithological contacts create competency contrasts that focus deformation. Fault density receives the strongest posterior coefficient of any feature. Removing structural features from the model costs the majority of frontier predictive power. No other evidence family approaches this level of necessity.

Potential-field geophysics — airborne magnetics and Bouguer gravity at 500 m resolution — senses subsurface architecture where the geological map is incomplete. Magnetic depletion zones indicate hydrothermal alteration that destroys magnetite, creating a detectable signature of mineralising systems even beneath glacial cover. Elevated gravity indicates dense mafic volcanic host rocks — the lithological environment of most orogenic gold deposits. Geophysical gradient features — the local variability captured by standard deviation — are among the strongest individual predictors.

Gamma-ray spectrometry measures surface expression of potassic alteration, uranium mobility, and lithological background. Where radiometric surveys exist, they preferentially cover geologically prospective terrain. Hard gating enforces exactly zero values for areas without radiometric coverage, preventing the model from hallucinating patterns where no data exists. On covered cells, radiometric features produce substantial uplift in average precision.

Quaternary surficial geology maps where geological signals can reach the surface and where thick glacial cover creates exploration blackout zones. It is fundamentally a detectability map — it encodes where bedrock geology is visible and where it is not. After conditioning on drill density, Quaternary geology shows the strongest conditional signal of any evidence family. This feature enters the detection layer exclusively, informing the model where exploration could have succeeded.

Digital terrain captures the topographic expression of bedrock geology across the Shield. Flat terrain indicates Paleozoic platform cover or deeply weathered peneplain; rugged topography indicates exposed Archean bedrock. Terrain features fail univariate triage but are retained because removing them causes catastrophic convergence failure — they serve as a geometric regulariser for the posterior, stabilising the estimation process even though their direct predictive contribution is negligible.

Each evidence domain is subject to a seven-criteria anti-bias triage before admission to the model. Specific gravity measurements, though sparse at 1% coverage, validate the physical basis of gravity anomalies — confirming that elevated gravity genuinely reflects subsurface density contrasts rather than processing artefacts. Geochemical data from lake sediment, till, and lithogeochemistry samples are triaged out of the model but retained as interpretation context for the highest-priority target dossiers.

Part IV

Feature Governance

pp. 49–58

A Firewall Between Geology and Exploration History

Features are not arbitrary column extractions from source datasets. Each feature encodes a specific geological hypothesis. Fault density within 5 km encodes the structural halo around deposit-scale deformation zones. Negative magnetic coefficient indicates hydrothermal alteration that destroys magnetite — converting it to pyrite, pyrrhotite, and non-magnetic oxides. Positive gravity coefficient indicates dense mafic volcanic host rocks. Negative flat-terrain fraction indicates rugged Shield topography with bedrock exposure. Every feature has a geological interpretation that can be independently evaluated.

Every candidate predictor must survive a seven-criteria triage protocol before admission to the truth layer. The protocol tests signal strength via block cross-validation, permutation importance against a null baseline, stability of importance across spatial folds, transfer from accessible terrain to remote terrain, generalisation across geological regimes, independence from drill density after residualisation, and monotonic correlation with exploration effort. The triage is what prevents the model from learning “drill here because we already drill here.”
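
Two of the seven criteria lend themselves to short sketches — the effort-correlation check and the permutation-importance check. The thresholds below are illustrative; the production criteria and cutoffs are not reproduced here.

```python
import numpy as np
from scipy import stats

def effort_correlation_check(feature, drill_density, max_abs_rho=0.3):
    """Block a feature whose values track exploration effort monotonically."""
    rho, _ = stats.spearmanr(feature, drill_density)
    return abs(rho) <= max_abs_rho

def permutation_importance_check(observed_score, null_scores, min_quantile=0.99):
    """Feature must beat the bulk of its own permutation-null distribution."""
    return observed_score >= np.quantile(null_scores, min_quantile)
```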

Only two evidence families — structural geology and geophysics — pass all seven criteria. Radiometric features fail triage but are admitted with hard gating: for cells without radiometric survey coverage, all radiometric values are forced to exactly zero, preventing the model from learning patterns where no data exists. Terrain features fail triage but are retained because removing them causes catastrophic convergence failure. This is a novel finding: feature selection based on predictive importance alone can destroy model estimation.

The decision firewall enforces that the truth predictor set and the detection predictor set have zero overlap. This is not a modelling convention. It is a programmatic assertion that halts execution if violated. The firewall is the single most important governance mechanism in the system. Detection layer coefficients remain stable across all model variants — with correlations between 0.88 and 0.92 across ten ablation tests — confirming that truth-layer changes do not contaminate the detection layer. The layers are genuinely separable.
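
The firewall itself is a small amount of code with large consequences. A minimal sketch, with illustrative feature names, of an assertion that halts the pipeline on any overlap:

```python
# Illustrative feature names only -- the production sets are larger.
TRUTH_FEATURES = {"fault_density_5km", "mag_depletion", "gravity_residual"}
DETECTION_FEATURES = {"drill_density", "radiometric_coverage_flag", "overburden_thickness"}

def enforce_firewall(truth_cols, detection_cols):
    """Halt execution if any predictor appears in both layers."""
    overlap = set(truth_cols) & set(detection_cols)
    if overlap:
        raise AssertionError(
            f"Decision firewall violated: {sorted(overlap)} appear in both layers")
```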

Scaling is fitted on background cells only — cells that exclude known deposits and their proximity halos. This prevents the scaler from embedding drill-density bias into feature distributions. Post-scaling, a value of zero represents the geological background median — a geologically neutral baseline. All features are imputed to zero where missing, ensuring that absence of data is treated as absence of signal, not as a hidden predictor.
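
A sketch of background-only scaling, assuming a robust median/IQR scaler and zero-imputation of missing values; the mask handling is simplified for the example.

```python
import numpy as np

def fit_background_scaler(X, background_mask):
    """Fit centre and spread on background cells only, so deposit halos and
    heavily drilled ground cannot shift the baseline."""
    med = np.nanmedian(X[background_mask], axis=0)
    iqr = (np.nanpercentile(X[background_mask], 75, axis=0)
           - np.nanpercentile(X[background_mask], 25, axis=0))
    iqr = np.where(iqr == 0, 1.0, iqr)            # guard against constant columns

    def transform(X_new):
        Z = (X_new - med) / iqr                    # zero = geological background median
        return np.nan_to_num(Z, nan=0.0)           # absence of data = absence of signal
    return transform
```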

The model’s automatic feature selection objectively identifies the geological signals that carry information. Of 48 candidate truth-layer variables, 8 survive selection with signal-to-noise ratios above 2. Five marginal features carry weak but non-zero signal. The remaining 35 are shrunk toward zero by the regularised prior. The global shrinkage parameter indicates that roughly 16% of features should carry genuine signal — aligning precisely with the observed selection rate of 8 out of 48. Capital allocation decisions are grounded in the 8 validated signals, not in the abundance of available features.

Part V

Inference

pp. 59–70

Uncertainty as a First-Class Output

The problem’s characteristics demand Bayesian treatment. At typical base rates of mineralisation, point estimates are unreliable. A maximum likelihood estimator will produce coefficients that are numerically unstable and whose standard errors are unreliable. The posterior preserves the full uncertainty over each parameter and, by extension, over each cell’s probability. A cell with high probability and tight uncertainty warrants different treatment than a cell with high probability and wide uncertainty.

Gold deposits are spatially clustered — not because of statistical artefact, but because geological processes such as fault networks, volcanic sequences, and plutonic margins are spatially continuous. This violates the independence assumption required by frequentist methods. Hierarchical structure with regime-level partial pooling and block-stratified subsampling accounts for spatial dependence without requiring an explicit spatial covariance function, which would be computationally intractable at this grid resolution.

Each jurisdiction is partitioned into geological regimes. Hierarchical partial pooling borrows strength from data-rich regimes to inform data-sparse regimes, preventing both overfitting and overconfidence. The posterior automatically widens in regimes with fewer training examples, honestly reporting higher uncertainty where the evidence base is thinner.

Automatic feature selection is not manual — it is data-driven selection with principled uncertainty propagation. A regularised prior encodes the expectation that most features are noise while allowing a few to have large effects. The global shrinkage parameter controls how many features are expected to matter. Per-feature local scales determine which specific features carry signal. A slab parameter bounds the maximum coefficient magnitude. Together, these three quantities perform continuous model selection without discrete thresholds or arbitrary cutoffs.
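
This prior structure corresponds to a regularised horseshoe. A minimal PyMC sketch follows, in the style of Piironen and Vehtari; the hyperparameter values and the heuristic for the global scale are placeholders rather than the production configuration.

```python
import numpy as np
import pymc as pm

def horseshoe_coefficients(K, expected_nonzero=8, n_obs=10_000):
    """Regularised horseshoe over K candidate coefficients (sketch)."""
    tau0 = expected_nonzero / (K - expected_nonzero) / np.sqrt(n_obs)  # placeholder scale
    with pm.Model() as model:
        tau = pm.HalfCauchy("tau", beta=tau0)            # global shrinkage: how many features matter
        lam = pm.HalfCauchy("lam", beta=1.0, shape=K)    # local scales: which features matter
        c2 = pm.InverseGamma("c2", alpha=2.0, beta=2.0)  # slab: bounds the largest coefficients
        lam_reg = lam * pm.math.sqrt(c2 / (c2 + tau**2 * lam**2))
        pm.Normal("beta", mu=0.0, sigma=tau * lam_reg, shape=K)
    return model
```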

Data coverage is heterogeneous across any jurisdiction. The pattern of missingness is not random — survey boundaries follow political, logistical, and institutional constraints that correlate with exploration history. Hard coverage gating and binary coverage flags in the detection layer prevent the model from treating absence of data as absence of signal.

Uncertainty directly governs position sizing. Posterior distributions propagate through every downstream computation — scoring, attribution, target delineation, capital allocation. Cells near known deposits with rich data have low uncertainty. Frontier cells with sparse data have high uncertainty. This is correct behaviour — the model honestly reports where it knows more and where it knows less. A discovery fund that deploys capital against cells with wide posterior intervals should size positions smaller. The uncertainty surface is not ancillary to the decision — it is integral to it.

Full-mode inference produces 8,000 posterior draws across four chains. Convergence diagnostics confirm reliable sampling: effective sample sizes exceed 1,600, convergence statistics indicate chain mixing, and zero divergences confirm that the sampler navigates the posterior geometry without pathology. The complete inference process is sealed at completion, fixing the posterior to a specific, auditable realisation.
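
A sketch of the kind of gate such diagnostics might run through, using ArviZ. The ESS floor mirrors the figure quoted above; the R-hat ceiling is an assumed illustrative threshold, not a documented one.

```python
import arviz as az

def check_convergence(idata, min_ess=1_600, max_rhat=1.01):
    """Halt if sampling diagnostics fall outside the gate thresholds."""
    summary = az.summary(idata)
    assert summary["ess_bulk"].min() >= min_ess, "ESS gate failed"
    assert summary["r_hat"].max() <= max_rhat, "R-hat gate failed"   # illustrative ceiling
    n_div = int(idata.sample_stats["diverging"].sum())
    assert n_div == 0, f"{n_div} divergent transitions"
```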

Part VI

Validation

pp. 71–84

The Signal Survives Every Challenge

Concentration metrics alone do not prove geological signal. The model achieves strong concentration at low area coverage, capturing the vast majority of known deposits in the highest-scored cells. But the model could achieve high concentration by learning spatial autocorrelation patterns, predicting exploration effort, or overfitting to training data. The question is not whether the model concentrates deposits — it does — but whether the concentration reflects geology rather than artefact.
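
Concentration itself is straightforward to compute. A small sketch of capture-and-lift at a fixed area budget, matching the form of the >15× at 5% coverage figure (the input arrays are illustrative):

```python
import numpy as np

def lift_at_area(scores, deposit_mask, area_fraction=0.05):
    """Fraction of known deposits inside the top X% of scored cells,
    and the lift relative to uniform random allocation."""
    cutoff = np.quantile(scores, 1.0 - area_fraction)
    in_top = scores >= cutoff
    capture = deposit_mask[in_top].sum() / deposit_mask.sum()
    return capture, capture / area_fraction
```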

Four null hypothesis tests — each with a progressively more stringent null model — rule out alternative explanations. IID permutation destroys all feature-label structure; zero of 1,000 permutations beat the observed score. Spatial block permutation preserves within-block autocorrelation but destroys between-block relationships — again, zero permutations beat the observed score. These two tests rule out chance and spatial clustering as explanations.

The effort-preserving test is the critical differentiator. The province is divided into 20 strata defined by exploration effort and survey coverage. Within each stratum, all cells have received the same level of exploration attention. If the model's signal is merely a proxy for drilling patterns, deposit cells should rank no higher than the 50th percentile within their effort peers. The observed rank is the 67th percentile (p = 4.5 × 10⁻⁴⁰). The most conservative block-by-effort test — combining geographic and effort stratification — yields p = 6.4 × 10⁻²⁴. The signal survives every debiasing available.
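
A sketch of the effort-preserving statistic: compute percentile ranks of deposit cells within strata of equal exploration effort and average them. Under the proxy-for-effort null the expectation is the 50th percentile; the reported result is the 67th. The stratum handling here is simplified.

```python
import numpy as np

def within_stratum_percentile(scores, deposit_mask, effort_stratum):
    """Mean percentile rank of deposit cells among their effort peers."""
    pct = []
    for s in np.unique(effort_stratum):
        m = effort_stratum == s
        if m.sum() < 2 or deposit_mask[m].sum() == 0:
            continue
        ranks = scores[m].argsort().argsort() / (m.sum() - 1)   # 0..1 percentile ranks
        pct.extend(ranks[deposit_mask[m]])
    return float(np.mean(pct) * 100)   # ~50 under the null, ~67 in the reported result
```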

Temporal holdout provides the strongest form of validation. The model is retrained on data available before a cutoff date, with all subsequent discoveries removed. It then scores the jurisdiction using only historical information. The coefficient structure is stable across temporal holdouts, confirming that the model learns the same geological relationships regardless of which deposits are included in training.

Spatial block holdout across geographic blocks yields strong mean lift with low inter-block dispersion. The system does not work only in well-known camps and fail in remote terrain — the edge is geographically uniform. Independent backtesting against separately maintained mineral resources databases confirms that the vast majority of priority deposits score in the highest percentiles.

Ablation analysis isolates each evidence family’s contribution by systematically removing feature groups and measuring the impact. Removing structural geology costs the majority of frontier predictive power — it is the irreplaceable backbone. Removing geophysics is the second-largest loss. Removing radiometrics can improve frontier performance where coverage is incomplete, because uncovered cells introduce noise. Removing terrain causes catastrophic convergence failure despite modest direct predictive loss — confirming its role as a geometric regulariser rather than a predictor.

Rank stability confirms that target lists are robust to sampling variation. Quick-mode and full-mode runs produce near-identical rankings, with correlation exceeding 0.99999 in frontier cells. The top 500 targets overlap at 98.8% between modes. This means that capital allocation decisions made on quick-mode results would not change materially with full-mode inference — the signal is structural, not a consequence of sampling noise.

Part VII

Capital Deployment

pp. 85–96

From Probability Surface to Exploration Programme

The continuous probability surface is converted to discrete exploration targets through a four-step process: binary thresholding at the 98th percentile, connected-component grouping of adjacent flagged cells, size filtering to remove noise clusters, and watershed splitting to separate large agglomerations into geologically coherent units. The result is a ranked inventory of targets covering a small fraction of the total jurisdiction.
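
The first three steps reduce to a few lines of SciPy; the watershed split of large agglomerations is omitted for brevity. The percentile matches the text, while the minimum cluster size is an assumed placeholder.

```python
import numpy as np
from scipy import ndimage

def delineate_targets(prob_grid, percentile=98, min_cells=4):
    """Threshold, group connected cells, and drop noise clusters."""
    flagged = prob_grid >= np.nanpercentile(prob_grid, percentile)
    labels, n = ndimage.label(flagged)                       # connected-component grouping
    sizes = ndimage.sum(flagged, labels, index=np.arange(1, n + 1))
    keep = np.where(sizes >= min_cells)[0] + 1               # size filter removes noise
    return np.where(np.isin(labels, keep), labels, 0)        # labelled target grid
```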

Each target is classified by exploration maturity. Frontier opportunities represent untested ground with high geological probability and near-zero detection — the greenfield reconnaissance tier. Camp extensions sit adjacent to producing districts, leveraging existing infrastructure for infill and extension programmes. The remaining elevated-interest areas carry moderate-to-high geological probability with variable detection scores. Classification determines the appropriate exploration strategy and capital commitment for each target.

The sweet spot is the intersection of high geological truth probability and near-zero detection probability. These are the maximum-mispricing cells: locations where the model assigns high probability of mineralisation based on geological evidence, but where the exploration market has never tested the hypothesis. They are geologically indistinguishable from cells hosting known deposits. The difference is detection, not geology.

Every score decomposes exactly into geological drivers in logit space — the decomposition is not approximate. A geologist reviewing a target can evaluate whether the model’s reasoning is geologically plausible. “This target scores highly because of dense faulting and magnetic depletion” is a testable geological hypothesis, not a statistical abstraction. Attribution transforms the model from a black box into a geological reasoning tool.
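
Because the decomposition is exact in logit space, attribution is additive by construction. A sketch, with illustrative feature names and posterior means standing in for the full posterior:

```python
import numpy as np

def attribute_cell(x_cell, beta_mean, intercept_mean, feature_names):
    """Decompose one cell's score into per-feature logit contributions."""
    contributions = beta_mean * x_cell                      # one term per feature, no residual
    logit = intercept_mean + contributions.sum()
    prob = 1.0 / (1.0 + np.exp(-logit))
    ranked = sorted(zip(feature_names, contributions), key=lambda kv: -abs(kv[1]))
    return prob, ranked   # e.g. [("fault_density_5km", ...), ("mag_depletion", ...)]
```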

Information richness determines delivery format. The highest-priority targets receive full geological dossiers including setting analysis, risk matrices, attribution breakdowns, uncertainty assessments, corroborating evidence review, and phased exploration recommendations. Lower-tier targets receive summary analyses or machine-readable metadata cards. Every dossier is generated deterministically from the sealed posterior and attribution data.

Phased exploration recommendations are calibrated to information state. Frontier opportunities receive reconnaissance programmes — till geochemistry on 500 m grids, ground magnetic surveys on 100 m lines — before any drilling commitment. Phase two introduces soil geochemistry and induced polarisation. Phase three, contingent on positive results, commits to diamond drilling. Camp extensions proceed directly to data compilation and infill drilling, leveraging existing infrastructure. Each phase produces a go/no-go decision point. The system recommends structured information-gathering, not speculative drilling.

GIS deliverables enable operational use: GeoPackage files containing cell scores, target polygons, and known occurrences; GeoTIFF rasters spanning probability surfaces, uncertainty metrics, and attribution layers; and static maps at publication resolution. The complete delivery is SHA256-sealed — a self-contained decision-support toolkit for jurisdiction-scale capital allocation.

Part VIII

Governance

pp. 97–104

Reproducibility as Competitive Infrastructure

Every deliverable file is cryptographically sealed at creation time with SHA256 hashing. Once an artefact is sealed, it cannot be modified without creating a new version — the hash proves no alteration has occurred. A capital allocator can verify that every number in every dossier traces back to auditable data, sealed artefacts, and validated inference. A sceptical counterparty can reproduce every number by running the sealed pipeline against the sealed data.
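
Sealing is conceptually simple. A minimal sketch using Python's standard hashlib, with an assumed manifest filename:

```python
import hashlib
import json
import pathlib

def seal_delivery(delivery_dir, manifest_path="manifest.sha256.json"):
    """Hash every deliverable file and write a manifest; re-hashing later
    proves nothing has been altered since the seal."""
    manifest = {}
    for f in sorted(pathlib.Path(delivery_dir).rglob("*")):
        if f.is_file():
            manifest[str(f)] = hashlib.sha256(f.read_bytes()).hexdigest()
    pathlib.Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest
```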

Twenty-four production quality gates with exact numeric thresholds govern the pipeline. Gates enforce convergence diagnostics, identifiability checks, detection quality bounds, scoring saturation flags, target overlap assertions, and delivery manifest verification. Each gate has a specific pass criterion, a documented rationale, and a defined failure consequence. Gates that fail halt execution — they do not produce warnings that can be ignored. The pipeline cannot produce outputs that violate its own quality standards.

The decision firewall is the single most important governance mechanism. It enforces that truth-layer and detection-layer feature sets have zero overlap — a hard programmatic constraint, not a modelling convention. Detection coefficients remain stable across ablation tests, confirming genuine layer separation. Without the firewall, detection-correlated features would enter the truth layer and the entire decomposition would collapse.

Version immutability means no silent updates. The model that generated a target dossier is the same model that will be referenced in any subsequent review. When new data becomes available, a new version is created — the previous version remains frozen and available for longitudinal comparison. Sealed versions constitute an audit trail that compounds with each iteration, building institutional credibility that a fresh model cannot match.

Anti-bias triage is not a one-time exercise — it is standing governance applied to every new feature family considered for inclusion. Every data source added since the initial build — terrain, radiometrics, specific gravity, Quaternary geology — has been evaluated against the same seven-criteria protocol. Features that pass enter the truth layer. Features that fail either enter the detection layer if conditionally independent of geology, serve as interpretation context if informative but biased, or are blocked entirely.

A five-step reasoning protocol governs every non-trivial design decision: surface multiple approaches before committing, declare trade-offs explicitly, state material assumptions, signal uncertainty boundaries, and enumerate failure modes. This protocol — applied to data processing, feature engineering, prior specification, validation design, and delivery format — prevents silent errors, premature convergence, and unexamined assumptions. Governance in this system is not compliance overhead. It is competitive infrastructure.

Part IX

The Compounding Loop

pp. 105–108

A System That Improves with Use

Each exploration campaign generates data that sharpens the system. New drill observations enter the public database, improving the detection layer’s estimate of where exploration has been sufficient to find deposits. Sharper detection correction improves the truth layer’s estimate of where geological processes have concentrated gold. The improved probability surface identifies new targets. Capital is allocated against the updated targets. The cycle repeats — each iteration compounding the system’s informational advantage.

This compounding transforms exploration from a declining-returns activity — each marginal dollar in a known camp produces less new information — into an improving-returns system. Each dollar of frontier exploration simultaneously tests the geological hypothesis and improves the detection model. Capital deployed through the system generates returns not only through potential discovery but through information creation that benefits all future cycles.

The learning is asymmetric but always positive. Successful campaigns — new discoveries — update the truth layer with new positive labels, sharpening the geological signal. Unsuccessful campaigns — barren drill results — update the detection layer, resolving the ambiguity of “no deposit” from “unexplored” to “tested and found barren.” Both outcomes improve the model. Both outcomes create value for future cycles.

The data moat compounds over time. Years of engineering transform public geological datasets into a coherent, validated, bias-corrected inference system. The raw data is public; the transformation is not trivial and is measured in years, not months. The detection correction moat requires Bayesian spatial modelling expertise, identification of the detection bias problem at a formal level, infrastructure to run extended inference, governance to enforce truth/detection separation across the full pipeline, and a validation framework capable of testing whether the separation is genuine.

The governance moat grows linearly with operational history. Each sealed version adds to the audit trail. Sealed versions represent a credibility infrastructure that cannot be replicated by a new entrant regardless of technical capability. The capital speed moat enables jurisdiction-scale systematic coverage: the production pipeline generates targets, maps, and dossiers in minutes. Conventional approaches require weeks to months per project area.

The system is not a one-time analysis. It is feedback infrastructure designed to compound its informational advantage over successive exploration campaigns. The most durable competitive advantage is the feedback loop itself — the designed mechanism by which the system converts exploration activity into improved predictions, regardless of whether individual campaigns discover deposits.

Part X

Limitations

pp. 109–112

Where the System Can Break

The model operates at 1 km resolution. Narrow vein systems, breccia pipes, and sub-kilometre structural features are below detection limit. The system identifies statistical patterns in regional-scale geological evidence, not causal mechanisms at deposit scale. It narrows the search from provincial scale to camp scale; traditional methods narrow from camp scale to drill-target scale. In backtesting, a deep narrow-vein deposit ranks at the 94.8th percentile rather than the top 2% — a documented consequence of 1 km cell averaging that dilutes the high-grade core with barren wall rock.

Each deployment is calibrated to the dominant deposit type in the target jurisdiction. The features selected by the model encode the geological prerequisites of the target mineral system specifically. Different deposit types — porphyry copper-gold, volcanogenic massive sulphide, intrusion-related gold — have different geological signatures and require separate model calibration. Extrapolation across geological settings requires retraining with appropriate data.

Geological regime intercepts provide piecewise-constant baseline prospectivity across each jurisdiction. This captures broad variation but may miss strong local geological transitions — greenstone-to-gneissic boundaries, subprovince contacts, or abrupt structural changes that create prospectivity gradients within a single regime. In data-sparse regimes, hierarchical pooling widens the posterior appropriately, but if remote areas are geologically distinct in ways not captured by the shared feature set, uncertainty may be understated.

The posterior is fixed at the seal date. New discoveries, revised geological maps, and updated drilling records accumulate after sealing. Without periodic retraining, the model’s accuracy degrades relative to current knowledge. The truth layer changes slowly — fault maps are revised on decadal timescales — but the detection layer changes rapidly as hundreds of new drill holes are added annually. The compounding loop mitigates this through version progression, but each version represents a snapshot, not a real-time system.

Radiometric survey coverage varies by jurisdiction. On covered cells, radiometric features produce strong conditional signal. On uncovered cells, all radiometric values are forced to zero — the model cannot leverage surface alteration signatures in these areas. Expanding survey coverage in frontier areas would improve model performance, representing a quantifiable exploration investment with measurable expected return.

The conditional independence assumption — that geological truth and detection probability are independent given their respective feature sets — may be weakly violated for camp-extension targets where geological expression at surface attracts both mineralisation and exploration attention. Monitoring diagnostics remain within threshold, but the assumption deserves scrutiny for targets where both truth and detection probabilities are moderate. A credible institutional document must demonstrate awareness of its own boundaries. These limitations are documented not as disclaimers but as operational parameters that inform how the system’s outputs should be weighted in capital allocation decisions.

Proof of Edge

Four independent null hypothesis tests. The signal survives every one.

Concentration metrics alone do not prove geological signal. The model could achieve high concentration by learning spatial autocorrelation or predicting exploration patterns. Four progressively stringent tests rule out each alternative explanation.

N0 — IID Permutation

p = 0.001

The model beats pure chance. Necessary but insufficient — does not rule out spatial autocorrelation or effort correlation.

N1 — Spatial Block Permutation

p = 0.001

Scores permuted within 100 km blocks. The signal is not merely a consequence of gold deposits being spatially clustered.

N2 — Effort-Preserving Ranking

p = 4.5 × 10⁻⁴⁰

Among cells with identical exploration effort and survey coverage, deposit cells rank at the 67th percentile. The model finds geology beyond effort.

N3 — Block × Effort Ranking

p = 6.4 × 10⁻²⁴

The most conservative null. Even within geographic and effort peer groups, geology-scored cells outrank their neighbours. The signal survives every debiasing available.

Temporal Holdout

100% capture

Trained on pre-2005 data, all post-2005 discoveries fall within the top 5% of scored cells. Out-of-sample temporal evidence.

Spatial Block Holdout

18.0× mean lift

Across 57 geographic blocks with low inter-block dispersion. The edge is geographically uniform, not concentrated in known camps.

Independent Backtest

18 of 19 deposits

Verified against an independent mineral resources database. 18 of 19 priority deposits score in the top 2%.

Scale

The Domain

48

Geological variables screened

8

Validated signal features

2

Independent inference layers

Validation protocols span four null hypothesis tests, temporal holdout, spatial block holdout, ablation analysis, independent backtesting, and rank stability.

Structural Differentiation

Why this is architecturally different

vs. Heuristic GIS Targeting

Cannot quantify uncertainty. Cannot correct for detection bias. Cannot scale beyond dozens of prospects. Two geologists examining the same data reach different conclusions, and neither can demonstrate calibration.

vs. Black-Box Machine Learning

Gradient-boosted models fit the joint product of geology and detection directly. The strongest predictors are invariably related to exploration access, not geological merit. They confirm what is already known rather than identifying what is unknown.

vs. Consultant-Driven Exploration

Cannot systematically evaluate over a million cells. Assessment criteria vary across individuals. Uncertainty quantification is qualitative. The incremental cost of evaluating an additional prospect is thousands of dollars.

vs. Junior Exploration Capital Allocation

Capital follows stories, not probability surfaces. Staking rushes concentrate capital in narrow geographic areas. Herding behaviour eliminates informational edge. Market timing replaces geological analysis as the primary investment criterion.

Strata Discovery does not correct for exploration bias after the fact. The truth/detection separation is the model architecture — embedded in the likelihood, the prior structure, and the feature assignment. Every score decomposes exactly into geological drivers. Every artefact is cryptographically sealed and independently verifiable. The system is designed to compound its informational advantage with each exploration campaign.

Institutional Reference Document

Request the Full Document

The complete institutional reference is available by request to qualified operators, institutional investors, and technical partners. Includes full validation evidence, attribution methodology, and target classification framework.

All inquiries treated as confidential. Response within 48 hours.

Geological mispricing is the alpha. Detection correction is the edge. Governance is the moat. Compounding is the strategy.