Hyperspectral Mineral Analysis by Topic Modelling (2022)

Business Context

In geometallurgy, knowing how an ore will behave in processing requires lab work that is too slow and expensive to run on every sample. Hyperspectral imaging promises a cheap proxy, but only if the spectral evidence can be mapped reliably to the lab targets. To our knowledge, this 2022 work was the first to show that the document/topic metaphor from text mining can provide that mapping on real mineral samples.

Strategic Value

To our knowledge, this was the first formalisation of hyperspectral-imaging (HSI) mineral-sample characterisation as a probabilistic topic-modelling problem. The contribution over the prior hierarchical scheme (Egaña et al., Minerals 2020) was to make the clustering stage probabilistic and interpretable: each sample is a soft mixture of latent mineral topics inferred by LDA, rather than a hard cluster assignment. On the DB1 drill-core holdout, topic-routed regression with LDA Version 1 cut copper-recovery MAE from 4.568 (naive baseline) to 0.422, roughly a 10x error reduction, and Version 3 was comparable. This is the seed idea that later scaled into the full LDA-HSI research platform; this entry stays modest and period-accurate, covering three recipes, one backbone (LDA), and small private datasets.

KPI	Baseline	Result	Impact
Copper-recovery error (DB1)	Naive per-spectrum MAE 4.568	LDA Version 1 MAE 0.422	~10x error reduction
Method	Hard clustering + regression	Probabilistic LDA topic routing	Soft, interpretable membership

The Founding Idea (2022)

Presented as “Geometallurgical Estimation of Mineral Samples from Hyperspectral Images and Statistical Topic Modelling” at the 18th International Conference on Mineral Processing and Geometallurgy (Procemin Geomet 2022, Gecamin), from postdoctoral research at ALGES / AMTC, Universidad de Chile. The idea: treat a hyperspectral mineral sample as a document, its quantised spectral patterns as vocabulary, and let an LDA topic model infer a small set of latent mineral “topics” — then use that topic mixture to route a per-topic regression onto the lab targets.
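The routing step described above can be illustrated with a minimal sketch. The source does not state the regression family or the exact routing rule, so everything here is an assumption for illustration: each sample is sent to its dominant topic (a hard simplification of the soft mixture), and each topic gets its own ordinary least-squares model. The names `fit_topic_routed` and `predict_topic_routed`, and the toy data, are hypothetical.

```python
import numpy as np

def fit_topic_routed(theta, X, y):
    """Fit one least-squares model per dominant topic (assumed routing rule).

    theta: (n, K) topic mixtures, X: (n, d) features, y: (n,) lab target.
    """
    route = theta.argmax(axis=1)              # dominant topic per sample
    models = {}
    for k in np.unique(route):
        mask = route == k
        # Append a column of ones so each model has an intercept.
        A = np.column_stack([X[mask], np.ones(mask.sum())])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[int(k)] = coef
    fallback = float(y.mean())                # used for topics unseen in training
    return models, fallback

def predict_topic_routed(models, fallback, theta, X):
    """Predict each sample with the model of its dominant topic."""
    route = theta.argmax(axis=1)
    A = np.column_stack([X, np.ones(len(X))])
    return np.array([A[i] @ models[int(k)] if int(k) in models else fallback
                     for i, k in enumerate(route)])

# Toy data: two latent topics with different feature-to-target relations.
theta = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                  [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]])
X = np.array([[1.0], [2.0], [3.0], [1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 8.0, 7.0])   # topic 0: 2x + 1, topic 1: -x + 10

models, fallback = fit_topic_routed(theta, X, y)
pred = predict_topic_routed(models, fallback, theta, X)
```

On this toy set a single global line cannot fit both groups, while the two per-topic lines reproduce the targets. A softer variant would blend the per-topic predictions using the mixture weights instead of taking the argmax.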

Spectra as Documents

Three “wordification” recipes were compared (Table 2 of the paper). Version 1: each wavelength band is a word, and the document counts the summed quantised intensities per band (reduced and interpretable). Version 2: words are quantised intensity levels. Version 3: joint per-spectrum band intensities. Reflectance was quantised to Q levels, the topic count was chosen by coherence score, and the models were fitted with gensim LDA and inspected with pyLDAvis.
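As a rough illustration of how Versions 1 and 2 might turn a sample into a bag of words, the sketch below assumes reflectance scaled to [0, 1] and uniform quantisation into Q levels; the paper's actual quantiser and scaling are not given here, and the function names are hypothetical. Version 3 is left out because the description above does not pin down its vocabulary.

```python
import numpy as np

def quantise(reflectance, q_levels):
    """Map reflectance in [0, 1] to integer levels 0..q_levels-1 (assumed uniform bins)."""
    clipped = np.clip(reflectance, 0.0, 1.0)
    return np.minimum((clipped * q_levels).astype(int), q_levels - 1)

def wordify_v1(spectra, q_levels):
    """Version 1 style: word = band index, count = summed quantised intensity in that band."""
    levels = quantise(spectra, q_levels)      # shape (n_spectra, n_bands)
    counts = levels.sum(axis=0)               # one total per band
    return [(band, int(c)) for band, c in enumerate(counts) if c > 0]

def wordify_v2(spectra, q_levels):
    """Version 2 style: word = quantised level, count = occurrences over all bands and spectra."""
    levels = quantise(spectra, q_levels)
    counts = np.bincount(levels.ravel(), minlength=q_levels)
    return [(level, int(c)) for level, c in enumerate(counts) if c > 0]

# Toy sample: 2 spectra x 3 bands.
sample = np.array([[0.10, 0.50, 0.90],
                   [0.30, 0.50, 1.00]])
doc_v1 = wordify_v1(sample, q_levels=4)   # [(0, 1), (1, 4), (2, 6)]
doc_v2 = wordify_v2(sample, q_levels=4)   # [(0, 1), (1, 1), (2, 2), (3, 2)]
```

Each document is a list of (word id, count) pairs, which is the bag-of-words corpus format gensim's LDA model accepts, so a list of such documents could be passed to it directly.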

The Result

On a 20% holdout of the DB1 drill-core set, topic-routed hierarchical regression with LDA Version 1 cut copper-recovery MAE from 4.568 (naive per-spectrum baseline) to 0.422, an order-of-magnitude reduction, with Version 3 comparable (0.432) and Version 2 weaker (0.714). Molybdenum recovery improved similarly (18.6 → 2.2). On the smaller DB2 set (7 topics), estimation error dropped ~10–15% versus baselines. Version 1, the band-frequency recipe, was the strong, interpretable one and survives as the canonical baseline in the modern platform.
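For readers who want to check the arithmetic, the short sketch below defines MAE and computes the reduction factors implied by the reported figures. It assumes the molybdenum pair (18.6, 2.2) is also an MAE, which the text suggests but does not state; the toy call to `mae` uses made-up numbers purely to show the formula.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy illustration of the metric (made-up values, not from the paper).
toy_error = mae([1.0, 2.0, 3.0], [1.0, 3.0, 5.0])   # (0 + 1 + 2) / 3 = 1.0

# Reported (baseline, LDA Version 1) errors from the text above.
reported = {"copper": (4.568, 0.422), "molybdenum": (18.6, 2.2)}
factors = {name: base / lda for name, (base, lda) in reported.items()}
# copper: about 10.8x, molybdenum: about 8.5x
```

Both factors sit close to one order of magnitude, which is the basis for the "roughly 10x" wording.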

Scope (Period-Accurate)

This entry stays faithful to the 2022 paper: three recipes, one backbone (LDA), a few small private mineral datasets (drill-core DB1/DB2 plus the early HIDSAG geological subsets) — no public benchmark scenes, no neural backbones, no design-space sweep. That breadth came later. The idea seeded here — spectra as documents, topics as structure — scaled into the LDA-HSI platform: 19 recipes, four backbones, six public scenes, and a live web app.
