Efficient Neural Ranking using Forward Indexes


Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

L3S Research Center, Hannover, Germany

Retrieve-then-Rank Approach


Sparse Retrieval: Lexical Indexing

Corpus

Document ID: D001

Life is a characteristic that distinguishes physical entities that have biological processes, such as signaling and self-sustaining processes, from those that do not, either because such functions have ceased (they have died) or because they never had such functions and are classified as inanimate. Various forms of life exist, such as plants, animals, fungi, protists, archaea, and bacteria. Biology is the science that studies life. [...]

Source: https://en.wikipedia.org/wiki/Life

Term frequencies are extracted from each document and stored in an inverted index:

Term            Document: Freq.
life            D001: 3
characteristic  D001: 1
distinguishes   D001: 1
...             ...
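As a minimal sketch, building such an inverted index might look like the following (naive whitespace tokenization only; real systems additionally lowercase, remove stopwords, stem, etc.):

```python
from collections import Counter, defaultdict

def build_inverted_index(corpus):
    """Map each term to the documents containing it, with frequencies.

    corpus: dict of doc_id -> text.
    """
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        # count term occurrences per document
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index
```

Looking up a term then returns its posting list, mapping document IDs to term frequencies.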


Sparse Retrieval: Lexical Matching

The query ("What is the meaning of life?") is matched against the inverted index via term matching (BM25), yielding a set of candidate documents.
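Term matching can be sketched as a simplified BM25 scoring function over a toy index layout (term → {doc_id: freq}); the parameters `k1=0.9` and `b=0.4` are common defaults, not necessarily what a specific system uses:

```python
import math

def bm25(query_terms, doc_id, index, doc_len, avg_doc_len, num_docs,
         k1=0.9, b=0.4):
    """Score one document for a query using the BM25 formula.

    index: term -> {doc_id: term frequency} (a toy inverted index).
    """
    score = 0.0
    for term in query_terms:
        postings = index.get(term, {})
        tf = postings.get(doc_id)
        if tf is None:
            continue  # term does not occur in this document
        df = len(postings)  # document frequency of the term
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_norm
    return score
```

Candidate documents are then the highest-scoring documents under this function.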

Re-Ranking

The candidate documents and the query ("What is the meaning of life?") are passed to a re-ranking model, which assigns each document a relevance score to produce the final ranking:

#1  D371  (score 0.95)
#2  D222  (score 0.85)
#3  D984  (score 0.71)
#4  D017  (score 0.60)
#5  D101  (score 0.45)

Final ranking

Dense Retrieval

An Alternative to the Retrieve-then-Rank Approach


Semantic Matching

Documents and query are embedded in a common vector space.

Retrieval is performed as an (approximate) nearest neighbor search operation.


The vector space is typically 768-dimensional.


Dual-Encoders

The dense (semantic) relevance score of a document $d$ w.r.t. a query $q$ is computed as

$\phi_D (q, d) = \zeta (q) \cdot \eta (d)$,

where $\zeta$ is the query encoder and $\eta$ is the document encoder.


The dot product of two vectors encodes their proximity in the vector space.

A dense index contains $\eta (d)$ pre-computed for all documents in the corpus.
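To illustrate, a dense index can be simulated with a toy deterministic encoder standing in for the trained $\zeta$ and $\eta$; the encoder and documents here are purely illustrative (real dual-encoders are transformer models producing e.g. 768-dimensional vectors):

```python
import numpy as np

def toy_encode(text, dim=8):
    # deterministic stand-in for a trained encoder
    seed = sum(text.encode()) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

# dense index: eta(d) pre-computed for every document in the corpus
docs = {
    "D001": "life is a characteristic of living entities",
    "D002": "biology is the science that studies life",
}
dense_index = {doc_id: toy_encode(text) for doc_id, text in docs.items()}

def dense_score(query, doc_id):
    # phi_D(q, d) = zeta(q) . eta(d); the dot product encodes proximity
    return float(toy_encode(query) @ dense_index[doc_id])
```

At query time only the query is encoded; document vectors come from the index.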

Hybrid Retrieval

For a query $q$, retrieve a set of documents $K^q_S$ using a sparse retriever and a set $K^q_D$ using a dense retriever.

Rank the documents by combining the scores.

Motivation

  1. Sparse retrieval alone is fast, but its effectiveness is not competitive.
  2. Dense retrieval achieves good nDCG, but its recall suffers and it is slow.
  3. Re-ranking sparse candidates using dense retrieval models works well.
  4. Interpolating sparse and dense scores helps further.

$\phi (q, d) = \alpha \cdot \phi_S (q, d) + (1 - \alpha) \cdot \phi_D (q, d)$

                 AP@1k  R@1k   nDCG@10
Sparse retrieval
  BM25           0.331  0.697  0.519
Dense retrieval
  TCT-ColBERT    0.279  0.576  0.612
  ANCE           0.254  0.510  0.633
Re-ranking (BM25 retrieval)
  TCT-ColBERT    0.370  0.697  0.685
  ANCE           0.336  0.697  0.654
  BERT           0.283  0.697  0.520
Interpolation (with BM25)
  TCT-ColBERT    0.406  0.697  0.696
  ANCE           0.387  0.697  0.673
  BERT           0.365  0.697  0.612

Table: TREC-DL 2019 document ranking results.
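Score interpolation itself is a one-liner; a sketch (treating documents missing from one result list as having score 0, which is a simplifying assumption):

```python
def interpolate(sparse_scores, dense_scores, alpha=0.5):
    """Rank documents by alpha * phi_S + (1 - alpha) * phi_D."""
    doc_ids = set(sparse_scores) | set(dense_scores)
    scored = [
        (alpha * sparse_scores.get(d, 0.0)
         + (1 - alpha) * dense_scores.get(d, 0.0), d)
        for d in doc_ids
    ]
    return sorted(scored, reverse=True)  # best first
```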

Fast-Forward Indexes

Efficient Interpolation-based Re-Ranking on CPUs


$\phi (q, d) = \alpha \cdot \phi_S (q, d) + (1 - \alpha) \cdot \phi_D (q, d)$

Our approach: Fast re-ranking with dual-encoders.


A Fast-Forward index is a look-up table for pre-computed document representations:

$\phi_D^{FF} (q, d) = \zeta (q) \cdot \eta^{FF} (d)$
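A minimal sketch of the idea, where the index is just a look-up table from document ID to its pre-computed vector (names here are illustrative, not the API of the released `fast-forward-indexes` package):

```python
import numpy as np

def fast_forward_rerank(q_vec, sparse_results, ff_index, alpha=0.5):
    """Re-rank sparse candidates without running a model at query time.

    q_vec:          zeta(q), computed once per query.
    sparse_results: dict of doc_id -> phi_S(q, d).
    ff_index:       dict of doc_id -> eta^FF(d) (pre-computed vectors).
    """
    ranking = []
    for doc_id, s_sparse in sparse_results.items():
        s_dense = float(q_vec @ ff_index[doc_id])  # look-up + dot product
        ranking.append((alpha * s_sparse + (1 - alpha) * s_dense, doc_id))
    return sorted(ranking, reverse=True)
```

The only per-document work is a dictionary look-up and a dot product, which is why this runs efficiently on CPUs.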


The look-up operation is much more efficient than actual re-ranking:


                              nDCG@20  Query latency [ms]
Re-ranking                    0.632    1191
Interpolation                 0.655    1203
Interpolation (Fast-Forward)  0.655    253

Figure: TCT-ColBERT on the TREC-DL 2019 document ranking task.


In the following, we introduce techniques to improve efficiency even further.

MaxP Indexes

Long documents are often chunked into passages and scored using a $\max$ operation:

$\phi_D^{FF} (q, d) = \max_{p_i \in d} (\zeta (q) \cdot \eta^{FF} (p_i))$
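The $\max$ aggregation reduces to a few lines; a sketch assuming each document's passage vectors are stored as a list:

```python
import numpy as np

def maxp_score(q_vec, passage_vecs):
    """phi_D^FF(q, d): maximum dot product over the document's passages."""
    return max(float(q_vec @ p) for p in passage_vecs)
```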


We propose sequential coalescing to compress maxP indexes.

Sequential Coalescing

Compressing a MaxP Index

Figure: all passage representations $p_0, \dots, p_6$ of a document in the embedding space; runs of consecutive similar passages are merged into clusters $\mathcal{A}$ and replaced by their averaged representations $\hat{p}_1, \dots, \hat{p}_4$.

                  
import numpy as np
from scipy.spatial.distance import cosine as distance  # example metric

def coalesce(P, delta):
    """Merge runs of consecutive passage vectors whose distance to the
    running cluster average is below the threshold delta."""
    P_new = []
    A = []        # current cluster of consecutive passage vectors
    A_avg = None  # centroid (mean) of A
    for p in P:
        if A and distance(p, A_avg) >= delta:
            P_new.append(A_avg)  # emit the finished cluster centroid
            A = []
        A.append(p)
        A_avg = np.mean(A, axis=0)
    P_new.append(A_avg)
    return P_new
                  
                


Effects on Index Size

Distance threshold $\delta$  Size reduction  nDCG
0      0%   0.676
0.01   -    0.673
0.015  27%  0.672
0.02   46%  0.667
0.025  60%  0.661
0.03   70%  0.659
0.035  77%  0.657
0.04   80%  0.653
0.045  82%  0.657
0.05   83%  0.658
0.1    84%  0.659

Figure: TCT-ColBERT on TREC-DL 2019 document ranking with sequential coalescing.

Early Stopping

Many tasks require only a small number of documents (e.g. top-10).


Lexical (sparse) and semantic (dense) scores are usually correlated.

In many cases, it is not necessary to compute all semantic scores.


We propose an approach for early stopping.

Retrieving the Top-3 Documents

We set $\alpha = 0.5$:

$\phi (q, d) = 0.5 \cdot \phi_S (q, d) + 0.5 \cdot \phi_D (q, d)$

Candidates are processed in decreasing order of their sparse score $\phi_S$. We stop the computation once the highest possible score $\phi_{\text{max}} (q, d)$ is too low to make a difference. Since $\phi_D$ is not normalized, we approximate its maximum by the highest dense score observed so far.

After scoring D123, D215 and D300, the highest observed dense score is 0.67, so the best possible score of the next candidate, D224, is

$\phi_{\text{max}} (q, \text{D224}) = 0.5 \cdot 0.73 + 0.5 \cdot 0.67 = 0.70 \gt 0.68$.

This exceeds the current third-best score (0.68, held by D215), so D224 must be scored; its actual score, $0.5 \cdot 0.73 + 0.5 \cdot 0.71 = 0.72$, moves it into the top-3.

For the next candidate, D105, the best possible score

$\phi_{\text{max}} (q, \text{D105}) = 0.5 \cdot 0.49 + 0.5 \cdot 0.71 = 0.60 \lt 0.72$

falls below the current third-best score, and all remaining candidates have even lower sparse scores, so the computation stops without further index look-ups:

$\phi_S (q, d)$ $\phi_D (q, d)$ $\phi (q, d)$ ↓
D123  0.89  0.61  0.75
D300  0.81  0.67  0.74
D224  0.73  0.71  0.72
D215  0.85  0.51  0.68
D105  0.49  -     -
D900  0.42  -     -

Effects on Index Look-ups

Cut-off depth $k$  Avg. index look-ups per query
100  4114
90   4023
80   3889
70   3736
60   3544
50   3302
40   3050
30   2665
20   2179
10   1319

Figure: ANCE on TREC-DL 2019 passage ranking with early stopping.
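The early stopping procedure can be sketched as follows, assuming candidates arrive sorted by decreasing sparse score and using a min-heap for the current top-$k$ (function and variable names are illustrative):

```python
import heapq

def early_stopping_topk(sparse_results, dense_lookup, k=3, alpha=0.5):
    """Interpolate scores, skipping index look-ups that cannot matter.

    sparse_results: list of (doc_id, phi_S), sorted by phi_S descending.
    dense_lookup:   function doc_id -> phi_D (one Fast-Forward look-up).
    """
    top = []          # min-heap of (score, doc_id), at most k entries
    max_dense = None  # highest dense score observed so far
    lookups = 0
    for doc_id, s_sparse in sparse_results:
        if len(top) == k and max_dense is not None:
            # best possible score for this (and, since the input is
            # sorted, every remaining) document
            best_possible = alpha * s_sparse + (1 - alpha) * max_dense
            if best_possible < top[0][0]:
                break  # cannot enter the top-k: stop early
        s_dense = dense_lookup(doc_id)
        lookups += 1
        max_dense = s_dense if max_dense is None else max(max_dense, s_dense)
        score = alpha * s_sparse + (1 - alpha) * s_dense
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
        else:
            heapq.heappushpop(top, (score, doc_id))
    return sorted(top, reverse=True), lookups
```

On the six-document top-3 example above, this performs only four of the six possible index look-ups before stopping.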

Results


TREC-DL Document Ranking Task

                     Latency [ms]  $k_S = 1000$               $k_S = 5000$
                                   AP@1k  R@1k   nDCG@20     AP@1k  R@1k   nDCG@20
Hybrid retrieval
  BM25, TCT-ColBERT  582           0.463  0.809  0.615       0.469  0.852  0.621
  BM25, ANCE         582           0.479  0.809  0.621       0.488  0.846  0.632
Re-ranking (BM25 retrieval)
  TCT-ColBERT        1189 + 2      0.414  0.809  0.587       0.405  0.794  0.585
  ANCE               1189 + 2      0.426  0.809  0.593       0.422  0.761  0.604
  BERT               185 + 2       0.329  0.809  0.512       0.221  0.727  0.375
Interpolation (with BM25)
  TCT-ColBERT        1189 + 14     0.469  0.809  0.621       0.478  0.838  0.626
    + Fast-Forward   253           0.469  0.809  0.621       0.478  0.838  0.626
      + coalesced    109           0.440  0.809  0.594       0.447  0.837  0.607
  ANCE               1189 + 14     0.490  0.809  0.630       0.502  0.828  0.640
    + Fast-Forward   253           0.490  0.809  0.630       0.502  0.828  0.640
      + coalesced    121           0.471  0.809  0.622       0.479  0.823  0.629
  BERT               185 + 14      0.460  0.809  0.602       0.459  0.839  0.601

Table: TREC-DL 2020 document ranking results. Latency is reported for $k_S = 5000$ on GPU and CPU.


TREC-DL Passage Ranking Task

                       Latency [ms]  $k_S = 1000$    $k_S = 5000$
                                     AP@1k  RR@10    AP@1k  RR@10
Hybrid retrieval
  BM25, TCT-ColBERT    307           0.434  0.894    0.454  0.902
  BM25, ANCE           307           0.410  0.856    0.422  0.864
Re-ranking (BM25 retrieval)
  TCT-ColBERT          186 + 2       0.426  0.827    0.439  0.842
  ANCE                 186 + 14      0.389  0.836    0.392  0.857
  BERT                 185 + 14      0.353  0.715    0.275  0.576
Interpolation (with BM25)
  TCT-ColBERT          186 + 14      0.438  0.894    0.460  0.902
    + Fast-Forward     114           0.438  0.894    0.460  0.902
      + early stopping 72            -      0.894    -      0.902
  ANCE                 186 + 14      0.417  0.856    0.435  0.864
    + Fast-Forward     114           0.417  0.856    0.435  0.864
      + early stopping 52            -      0.856    -      0.864
  BERT                 185 + 2       0.378  0.809    0.392  0.532

Table: TREC-DL 2019 passage ranking results. Latency is reported for $k_S = 5000$ on GPU and CPU.

Questions?

leonhardt@l3s.de


pip install fast-forward-indexes


Icons created by juicy_fish (Flaticon).