Topics

Locally optimal choices may lead to suboptimal overall sequences. Greedy decoding is fast, but if we put aside efficiency for a minute, it might seem more reasonable to search for the most likely sequence, i.e. maximize:

and not the sequence of (greedily selected) most likely tokens.

Example

  • Input: hy het my met 'n tert geslaan
  • Target: he hit me with a pie
  • Decoding steps: he he hit he hit a (locally optimal, but potentially suboptimal)

Thus, greedy approach can lead to irreversible suboptimal choices, as it doesn’t consider future implications of current decisions.

|480

In above case, greedy will pick The nice woman since the probs are 0.5 x 0.4 = 0.2, instead of picking The dog has whose probs will be 0.4 x 0.9 = 0.36 (globally better).