
Day 6: Sequence-to-Sequence (Seq2Seq) Models

Welcome to Day 6. So far, our recurrent architectures have focused exclusively on Classification. We feed an LSTM 200 words, and the final Output Layer produces a single label: Positive or Negative.

But what if we feed it 200 English words, and we want it to output the French translation?

Classification architectures completely fail here. The Input sequence and Output sequence can be different lengths, and the Output length isn't even known in advance! "Good morning" (2 words) translates to "bonjour" (1 word), while "I am hungry" (3 words) translates to "j'ai faim" (2 words).

To solve this, we engineer a Sequence-to-Sequence (Seq2Seq) architecture.

The Encoder / Decoder Paradigm

Seq2Seq abandons the single-network structure. Instead, it deploys two entirely separate LSTMs!

  1. The Encoder: It takes the English Input sentence. It reads "good night" word by word. It has no Output Layer! Its sole purpose is to compress the meaning of the entire sequence into a single fixed-length vector (its final Hidden State). We call this vector the Context Vector.
  2. The Decoder: It takes the Context Vector generated by the Encoder and unpacks it, generating the French translation word by word!

Implementing Seq2Seq in PyTorch

Look at day6_ex.py. We define two standalone classes in PyTorch!

# day6_ex.py
import torch
import torch.nn as nn

# 1. Look! No Output Layer! Just an Embedding feeding an LSTM!
class Encoder(nn.Module):
    def __init__(self, input_dim, embed_dim, hidden_dim, num_layers):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(input_dim, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)
        # We return ONLY the final hidden and cell states -- the Context Vector!
        outputs, (hidden, cell) = self.lstm(embedded)
        return hidden, cell
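As a quick sanity check, the Encoder can be run on random token IDs. The dimensions here are hypothetical (not taken from day6_ex.py), chosen just to make the shapes visible:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, input_dim, embed_dim, hidden_dim, num_layers):
        super(Encoder, self).__init__()
        self.embedding = nn.Embedding(input_dim, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, x):
        embedded = self.embedding(x)
        outputs, (hidden, cell) = self.lstm(embedded)
        return hidden, cell

# Hypothetical sizes: vocab of 1000 tokens, batch of 4 sentences, 7 tokens each
enc = Encoder(input_dim=1000, embed_dim=64, hidden_dim=128, num_layers=2)
src = torch.randint(0, 1000, (4, 7))
hidden, cell = enc(src)
print(hidden.shape)  # torch.Size([2, 4, 128]) -- (num_layers, batch, hidden_dim)
```

Notice that the sequence length (7) disappears entirely: no matter how long the input sentence is, the Context Vector has the same fixed shape.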

The Decoder Unzipping

The Decoder is wildly different. We pass it the (hidden, cell) state from the Encoder, and we run it in a step-by-step generation loop!

It predicts one word at a time, then takes its own prediction and feeds it back into itself to predict the next word! We use an <SOS> (Start of Sentence) token to trigger it, and it stops when it emits <EOS> (End of Sentence), or when it hits a maximum length.
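Since day6_ex.py is only excerpted here, the following Decoder class is a sketch of what it plausibly looks like, assuming the same sizes as the Encoder. The key difference from the Encoder: the Decoder does need an output layer, because it must map each hidden state to a score for every word in the French vocabulary.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, output_dim, embed_dim, hidden_dim, num_layers):
        super(Decoder, self).__init__()
        self.embedding = nn.Embedding(output_dim, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        # Unlike the Encoder, the Decoder NEEDS an output layer:
        # it maps the hidden state to a score for every French word.
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x, hidden, cell):
        # x is a single token per sequence: shape (batch,)
        embedded = self.embedding(x.unsqueeze(1))         # (batch, 1, embed_dim)
        output, (hidden, cell) = self.lstm(embedded, (hidden, cell))
        prediction = self.fc(output.squeeze(1))           # (batch, output_dim)
        return prediction, hidden, cell

# Hypothetical sizes: French vocab of 1200, batch of 4, 2-layer LSTM of 128 units
dec = Decoder(output_dim=1200, embed_dim=64, hidden_dim=128, num_layers=2)
hidden = torch.zeros(2, 4, 128)     # stand-in for the Encoder's context state
cell = torch.zeros(2, 4, 128)
tok = torch.randint(0, 1200, (4,))  # one <SOS>-like token per sequence
pred, hidden, cell = dec(tok, hidden, cell)
print(pred.shape)  # torch.Size([4, 1200]) -- one score per French word
```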

        # Encode the English source into the Context Vector!
        hidden, cell = self.encoder(src)

        # In a loop, decode the Context Vector into French, one token at a time!
        for t in range(1, tgt_len):
            # The Decoder returns a distribution over French words,
            # AND updated hidden and cell states!
            output, hidden, cell = self.decoder(dec_input, hidden, cell)

Wrapping Up Day 6

Seq2Seq models are the absolute bedrock of Chatbots, Translators, and Text Summarizers. An Encoder compresses an English document into a Context Vector, and a Decoder expands that vector into a Summary!

Tomorrow, on Day 7: Summary Capstone, we return to Keras to build the ultimate triple-architecture Sentiment Classifier utilizing SimpleRNN, LSTM, and GRU side-by-side to definitively prove which memory block reigns supreme!