C1) One-hot encoding is an inefficient way to vectorize a corpus of words because each one-hot vector is sparse: all but one of its indices are zero, and the vector's length equals the vocabulary size. With a large vocabulary, almost all of the space each vector occupies is wasted on zeros.
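To make that sparsity concrete, here is a minimal sketch in plain Python; the 10,000-word vocabulary size and the word index 42 are made-up values for illustration.

    # Hypothetical vocabulary size and word index (not from the assignment data).
    vocab_size = 10_000
    word_index = 42

    # A one-hot vector: one slot per vocabulary word, a single 1, the rest zeros.
    one_hot = [0] * vocab_size
    one_hot[word_index] = 1

    print(sum(one_hot))              # 1    -> only one non-zero entry
    print(vocab_size - sum(one_hot)) # 9999 -> zeros stored for every single word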
C2)
The graph that plots training loss vs. validation loss shows the validation loss fluctuating up and down while remaining significantly higher than the training loss, which declines steadily as the epochs increase. Training loss that keeps decreasing while validation loss rises is a strong indicator of overfitting.
The graph that plots training accuracy vs. validation accuracy shows the training accuracy continuously increasing and staying above the validation accuracy, which fluctuates as the epochs increase. Training accuracy that is much higher than validation accuracy is a strong sign the model is overfit, and the validation accuracy's fluctuation (increasing, decreasing, then increasing again) is another indicator. These curves can be reproduced directly from the training history, as in the sketch below.
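A minimal sketch for producing both plots, assuming a Keras History object returned by model.fit(..., validation_data=...) and a model compiled with metrics=["accuracy"] (otherwise the accuracy keys will differ).

    import matplotlib.pyplot as plt

    def plot_history(history):
        epochs = range(1, len(history.history["loss"]) + 1)

        # Loss: a training curve that keeps falling while the validation
        # curve rises or fluctuates above it suggests overfitting.
        plt.plot(epochs, history.history["loss"], label="training loss")
        plt.plot(epochs, history.history["val_loss"], label="validation loss")
        plt.xlabel("epoch")
        plt.legend()
        plt.show()

        # Accuracy: a widening gap between the two curves is the same signal.
        plt.plot(epochs, history.history["accuracy"], label="training accuracy")
        plt.plot(epochs, history.history["val_accuracy"], label="validation accuracy")
        plt.xlabel("epoch")
        plt.legend()
        plt.show()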
D1)
The training loss starts off higher than the validation loss, then rapidly declines as the epochs increase. The validation loss is more stable and slowly rises, so from the middle of training onward it sits well above the training loss. Validation loss increasing while training loss decreases is a strong indicator of overfitting.
The training accuracy starts off much lower than the validation accuracy. The validation accuracy stays steady, while the training accuracy rises sharply and then climbs slowly toward the end, finishing well above the validation accuracy. Training accuracy ending up much higher than validation accuracy is a strong indicator that the model is overfit.
Stacking two or more LSTM layers onto the model did not make much of a difference: the graphs are very similar, with no significant gap between them. The small difference that did appear technically made the model worse, as the test accuracy declined from roughly 0.8578 to 0.8512. So adding these layers did not positively impact the model. A sketch of the stacked architecture follows below.
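A minimal sketch of the stacked variant, assuming tf.keras, a binary classification head, and made-up sizes (vocab_size, embedding_dim, units). The key detail when stacking is that every LSTM except the last must set return_sequences=True so the next LSTM receives a full sequence rather than a single vector.

    import tensorflow as tf

    # Hypothetical hyperparameters, not the assignment's actual values.
    vocab_size, embedding_dim, units = 10_000, 64, 64

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.LSTM(units, return_sequences=True),  # passes sequences on
        tf.keras.layers.LSTM(units),                         # final LSTM -> one vector
        tf.keras.layers.Dense(1, activation="sigmoid"),      # binary output
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])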