Efficient Neural Ranking using Forward Indexes


Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

L3S Research Center, Hannover, Germany

Retrieve-then-Rank Approach


Sparse Retrieval: Lexical Indexing

Corpus

Document ID: D001

Life is a characteristic that distinguishes physical entities that have biological processes, such as signaling and self-sustaining processes, from those that do not, either because such functions have ceased (they have died) or because they never had such functions and are classified as inanimate. Various forms of life exist, such as plants, animals, fungi, protists, archaea, and bacteria. Biology is the science that studies life. [...]

Source: https://en.wikipedia.org/wiki/Life

Term frequencies are extracted from each document and stored in an inverted index:

Term            Document: Freq.
life            D001: 3
characteristic  D001: 1
distinguishes   D001: 1
...             ...
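As a minimal sketch, building such an inverted index might look like the following (naive whitespace tokenization only; real systems additionally lowercase, remove stopwords, stem, etc.):

```python
from collections import Counter, defaultdict

def build_inverted_index(corpus):
    """Map each term to the documents containing it, with frequencies.

    corpus: dict of doc_id -> text.
    """
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        # count term occurrences per document
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index
```

Looking up a term then returns its posting list, mapping document IDs to term frequencies.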


Sparse Retrieval: Lexical Matching

The query ("What is the meaning of life?") is matched against the inverted index via term matching (BM25), yielding a set of candidate documents.
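Term matching can be sketched as a simplified BM25 scoring function over a toy index layout (term → {doc_id: freq}); the parameters `k1=0.9` and `b=0.4` are common defaults, not necessarily what a specific system uses:

```python
import math

def bm25(query_terms, doc_id, index, doc_len, avg_doc_len, num_docs,
         k1=0.9, b=0.4):
    """Score one document for a query using the BM25 formula.

    index: term -> {doc_id: term frequency} (a toy inverted index).
    """
    score = 0.0
    for term in query_terms:
        postings = index.get(term, {})
        tf = postings.get(doc_id)
        if tf is None:
            continue  # term does not occur in this document
        df = len(postings)  # document frequency of the term
        idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
        tf_norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        score += idf * tf_norm
    return score
```

Candidate documents are then the highest-scoring documents under this function.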

Re-Ranking

The candidate documents and the query ("What is the meaning of life?") are passed to a re-ranking model, which assigns each document a relevance score to produce the final ranking:

#1  D371  (score 0.95)
#2  D222  (score 0.85)
#3  D984  (score 0.71)
#4  D017  (score 0.60)
#5  D101  (score 0.45)

Final ranking

Dense Retrieval

An Alternative to the Retrieve-then-Rank Approach


Semantic Matching

Documents and query are embedded in a common vector space.

Retrieval is performed as an (approximate) nearest neighbor search operation.


The vector space is typically 768-dimensional.


Dual-Encoders

The dense (semantic) relevance score of a document $d$ w.r.t. a query $q$ is computed as

$\phi_D (q, d) = \zeta (q) \cdot \eta (d)$,

where $\zeta$ is the query encoder and $\eta$ is the document encoder.


The dot product of two vectors encodes their proximity in the vector space.

A dense index contains $\eta (d)$ pre-computed for all documents in the corpus.
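To illustrate, a dense index can be simulated with a toy deterministic encoder standing in for the trained $\zeta$ and $\eta$; the encoder and documents here are purely illustrative (real dual-encoders are transformer models producing e.g. 768-dimensional vectors):

```python
import numpy as np

def toy_encode(text, dim=8):
    # deterministic stand-in for a trained encoder
    seed = sum(text.encode()) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

# dense index: eta(d) pre-computed for every document in the corpus
docs = {
    "D001": "life is a characteristic of living entities",
    "D002": "biology is the science that studies life",
}
dense_index = {doc_id: toy_encode(text) for doc_id, text in docs.items()}

def dense_score(query, doc_id):
    # phi_D(q, d) = zeta(q) . eta(d); the dot product encodes proximity
    return float(toy_encode(query) @ dense_index[doc_id])
```

At query time only the query is encoded; document vectors come from the index.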

Hybrid Retrieval

For a query $q$, retrieve a set of documents $K^q_S$ using a sparse retriever and a set $K^q_D$ using a dense retriever.

Rank the documents by combining the scores.

Motivation

  1. Sparse retrieval alone is fast, but its effectiveness is not competitive.
  2. Dense retrieval achieves good nDCG, but its recall suffers and it is slow.
  3. Re-ranking sparse candidates using dense retrieval models works well.
  4. Interpolating sparse and dense scores helps further.

$\phi (q, d) = \alpha \cdot \phi_S (q, d) + (1 - \alpha) \cdot \phi_D (q, d)$

                 AP@1k  R@1k   nDCG@10
Sparse retrieval
  BM25           0.331  0.697  0.519
Dense retrieval
  TCT-ColBERT    0.279  0.576  0.612
  ANCE           0.254  0.510  0.633
Re-ranking (BM25 retrieval)
  TCT-ColBERT    0.370  0.697  0.685
  ANCE           0.336  0.697  0.654
  BERT           0.283  0.697  0.520
Interpolation (with BM25)
  TCT-ColBERT    0.406  0.697  0.696
  ANCE           0.387  0.697  0.673
  BERT           0.365  0.697  0.612

Table: TREC-DL 2019 document ranking results.
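Score interpolation itself is a one-liner; a sketch (treating documents missing from one result list as having score 0, which is a simplifying assumption):

```python
def interpolate(sparse_scores, dense_scores, alpha=0.5):
    """Rank documents by alpha * phi_S + (1 - alpha) * phi_D."""
    doc_ids = set(sparse_scores) | set(dense_scores)
    scored = [
        (alpha * sparse_scores.get(d, 0.0)
         + (1 - alpha) * dense_scores.get(d, 0.0), d)
        for d in doc_ids
    ]
    return sorted(scored, reverse=True)  # best first
```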

Fast-Forward Indexes

Efficient Interpolation-based Re-Ranking on CPUs


$\phi (q, d) = \alpha \cdot \phi_S (q, d) + (1 - \alpha) \cdot \phi_D (q, d)$

Our approach: Fast re-ranking with dual-encoders.


A Fast-Forward index is a look-up table for pre-computed document representations:

$\phi_D^{FF} (q, d) = \zeta (q) \cdot \eta^{FF} (d)$
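A minimal sketch of the idea, where the index is just a look-up table from document ID to its pre-computed vector (names here are illustrative, not the API of the released `fast-forward-indexes` package):

```python
import numpy as np

def fast_forward_rerank(q_vec, sparse_results, ff_index, alpha=0.5):
    """Re-rank sparse candidates without running a model at query time.

    q_vec:          zeta(q), computed once per query.
    sparse_results: dict of doc_id -> phi_S(q, d).
    ff_index:       dict of doc_id -> eta^FF(d) (pre-computed vectors).
    """
    ranking = []
    for doc_id, s_sparse in sparse_results.items():
        s_dense = float(q_vec @ ff_index[doc_id])  # look-up + dot product
        ranking.append((alpha * s_sparse + (1 - alpha) * s_dense, doc_id))
    return sorted(ranking, reverse=True)
```

The only per-document work is a dictionary look-up and a dot product, which is why this runs efficiently on CPUs.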


The look-up operation is much more efficient than actual re-ranking:


                              nDCG@20  Query latency [ms]
Re-ranking                    0.632    1191
Interpolation                 0.655    1203
Interpolation (Fast-Forward)  0.655    253

Figure: TCT-ColBERT on the TREC-DL 2019 document ranking task.


In the following, we introduce techniques to improve efficiency even further.

MaxP Indexes

Long documents are often chunked into passages and scored using a $\max$ operation:

$\phi_D^{FF} (q, d) = \max_{p_i \in d} (\zeta (q) \cdot \eta^{FF} (p_i))$
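The $\max$ aggregation reduces to a few lines; a sketch assuming each document's passage vectors are stored as a list:

```python
import numpy as np

def maxp_score(q_vec, passage_vecs):
    """phi_D^FF(q, d): maximum dot product over the document's passages."""
    return max(float(q_vec @ p) for p in passage_vecs)
```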


We propose sequential coalescing to compress maxP indexes.

Sequential Coalescing

Compressing a MaxP Index

Figure: all passage representations $p_0, \dots, p_6$ of a document in the embedding space; runs of consecutive similar passages are merged into clusters $\mathcal{A}$ and replaced by their averaged representations $\hat{p}_1, \dots, \hat{p}_4$.

                  
import numpy as np
from scipy.spatial.distance import cosine as distance  # example metric

def coalesce(P, delta):
    """Merge runs of consecutive passage vectors whose distance to the
    running cluster average is below the threshold delta."""
    P_new = []
    A = []        # current cluster of consecutive passage vectors
    A_avg = None  # centroid (mean) of A
    for p in P:
        if A and distance(p, A_avg) >= delta:
            P_new.append(A_avg)  # emit the finished cluster centroid
            A = []
        A.append(p)
        A_avg = np.mean(A, axis=0)
    P_new.append(A_avg)
    return P_new
                  
                


Effects on Index Size

Distance threshold $\delta$  Size reduction  nDCG
0      0%   0.676
0.01   -    0.673
0.015  27%  0.672
0.02   46%  0.667
0.025  60%  0.661
0.03   70%  0.659
0.035  77%  0.657
0.04   80%  0.653
0.045  82%  0.657
0.05   83%  0.658
0.1    84%  0.659

Figure: TCT-ColBERT on TREC-DL 2019 document ranking with sequential coalescing.

Early Stopping

Many tasks require only a small number of documents (e.g. top-10).


Lexical (sparse) and semantic (dense) scores are usually correlated.

In many cases, it is not necessary to compute all semantic scores.


We propose an approach for early stopping.

Retrieving the Top-3 Documents

We set $\alpha = 0.5$:

$\phi (q, d) = 0.5 \cdot \phi_S (q, d) + 0.5 \cdot \phi_D (q, d)$

Candidates are processed in decreasing order of their sparse score $\phi_S$. We stop the computation once the highest possible score $\phi_{\text{max}} (q, d)$ is too low to make a difference. Since $\phi_D$ is not normalized, we approximate its maximum by the highest dense score observed so far.

After scoring D123, D215 and D300, the highest observed dense score is 0.67, so the best possible score of the next candidate, D224, is

$\phi_{\text{max}} (q, \text{D224}) = 0.5 \cdot 0.73 + 0.5 \cdot 0.67 = 0.70 \gt 0.68$.

This exceeds the current third-best score (0.68, held by D215), so D224 must be scored; its actual score, $0.5 \cdot 0.73 + 0.5 \cdot 0.71 = 0.72$, moves it into the top-3.

For the next candidate, D105, the best possible score

$\phi_{\text{max}} (q, \text{D105}) = 0.5 \cdot 0.49 + 0.5 \cdot 0.71 = 0.60 \lt 0.72$

falls below the current third-best score, and all remaining candidates have even lower sparse scores, so the computation stops without further index look-ups:

$\phi_S (q, d)$ $\phi_D (q, d)$ $\phi (q, d)$ ↓
D123  0.89  0.61  0.75
D300  0.81  0.67  0.74
D224  0.73  0.71  0.72
D215  0.85  0.51  0.68
D105  0.49  -     -
D900  0.42  -     -

Effects on Index Look-ups

Cut-off depth $k$  Avg. index look-ups per query
100  4114
90   4023
80   3889
70   3736
60   3544
50   3302
40   3050
30   2665
20   2179
10   1319

Figure: ANCE on TREC-DL 2019 passage ranking with early stopping.
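The early stopping procedure can be sketched as follows, assuming candidates arrive sorted by decreasing sparse score and using a min-heap for the current top-$k$ (function and variable names are illustrative):

```python
import heapq

def early_stopping_topk(sparse_results, dense_lookup, k=3, alpha=0.5):
    """Interpolate scores, skipping index look-ups that cannot matter.

    sparse_results: list of (doc_id, phi_S), sorted by phi_S descending.
    dense_lookup:   function doc_id -> phi_D (one Fast-Forward look-up).
    """
    top = []          # min-heap of (score, doc_id), at most k entries
    max_dense = None  # highest dense score observed so far
    lookups = 0
    for doc_id, s_sparse in sparse_results:
        if len(top) == k and max_dense is not None:
            # best possible score for this (and, since the input is
            # sorted, every remaining) document
            best_possible = alpha * s_sparse + (1 - alpha) * max_dense
            if best_possible < top[0][0]:
                break  # cannot enter the top-k: stop early
        s_dense = dense_lookup(doc_id)
        lookups += 1
        max_dense = s_dense if max_dense is None else max(max_dense, s_dense)
        score = alpha * s_sparse + (1 - alpha) * s_dense
        if len(top) < k:
            heapq.heappush(top, (score, doc_id))
        else:
            heapq.heappushpop(top, (score, doc_id))
    return sorted(top, reverse=True), lookups
```

On the six-document top-3 example above, this performs only four of the six possible index look-ups before stopping.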

Results


TREC-DL Document Ranking Task

                     Latency [ms]  $k_S = 1000$               $k_S = 5000$
                                   AP@1k  R@1k   nDCG@20     AP@1k  R@1k   nDCG@20
Hybrid retrieval
  BM25, TCT-ColBERT  582           0.463  0.809  0.615       0.469  0.852  0.621
  BM25, ANCE         582           0.479  0.809  0.621       0.488  0.846  0.632
Re-ranking (BM25 retrieval)
  TCT-ColBERT        1189 + 2      0.414  0.809  0.587       0.405  0.794  0.585
  ANCE               1189 + 2      0.426  0.809  0.593       0.422  0.761  0.604
  BERT               185 + 2       0.329  0.809  0.512       0.221  0.727  0.375
Interpolation (with BM25)
  TCT-ColBERT        1189 + 14     0.469  0.809  0.621       0.478  0.838  0.626
    + Fast-Forward   253           0.469  0.809  0.621       0.478  0.838  0.626
      + coalesced    109           0.440  0.809  0.594       0.447  0.837  0.607
  ANCE               1189 + 14     0.490  0.809  0.630       0.502  0.828  0.640
    + Fast-Forward   253           0.490  0.809  0.630       0.502  0.828  0.640
      + coalesced    121           0.471  0.809  0.622       0.479  0.823  0.629
  BERT               185 + 14      0.460  0.809  0.602       0.459  0.839  0.601

Table: TREC-DL 2020 document ranking results. Latency is reported for $k_S = 5000$ on GPU and CPU.


TREC-DL Passage Ranking Task

                       Latency [ms]  $k_S = 1000$    $k_S = 5000$
                                     AP@1k  RR@10    AP@1k  RR@10
Hybrid retrieval
  BM25, TCT-ColBERT    307           0.434  0.894    0.454  0.902
  BM25, ANCE           307           0.410  0.856    0.422  0.864
Re-ranking (BM25 retrieval)
  TCT-ColBERT          186 + 2       0.426  0.827    0.439  0.842
  ANCE                 186 + 14      0.389  0.836    0.392  0.857
  BERT                 185 + 14      0.353  0.715    0.275  0.576
Interpolation (with BM25)
  TCT-ColBERT          186 + 14      0.438  0.894    0.460  0.902
    + Fast-Forward     114           0.438  0.894    0.460  0.902
      + early stopping 72            -      0.894    -      0.902
  ANCE                 186 + 14      0.417  0.856    0.435  0.864
    + Fast-Forward     114           0.417  0.856    0.435  0.864
      + early stopping 52            -      0.856    -      0.864
  BERT                 185 + 2       0.378  0.809    0.392  0.532

Table: TREC-DL 2019 passage ranking results. Latency is reported for $k_S = 5000$ on GPU and CPU.

Questions?

leonhardt@l3s.de


pip install fast-forward-indexes


Icons created by juicy_fish (Flaticon).