Both models trade word-by-word generation for parallel denoising. Only one of them does it without losing intelligence in the trade.

Read the full story at Decrypt →