Data Prediction (Modeling)/Overfitting (variance problem)
overfitting
* low bias
    * memorizes the label of every data point in the training set
    * a low-bias model is more accurate on the training set
* high variance
from: Bias-Variance Tradeoff (偏差和變異之權衡), 2012 | 逍遙文工作室
from: Statistics - Bias-variance trade-off (between overfitting and underfitting) [Gerardnico]
Training and testing error curves plotted as a function of training set size (learning curves)
- can tell us whether the model has a bias or a variance problem and give clues about how to address it.
If the model has a variance problem (overfitting)
* the training error curve will remain well below the testing error curve, and the testing error curve may not plateau.
* If the training curve does not plateau, this suggests that collecting more data will improve model performance.
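A minimal sketch of building such learning curves, assuming a synthetic 1-D regression task (noisy sine) and a deliberately over-flexible degree-9 polynomial model; the data, model, and sizes are illustrative stand-ins, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task (an assumption for illustration): y = sin(2*pi*x) + noise.
x_all = rng.uniform(0.0, 1.0, 500)
y_all = np.sin(2 * np.pi * x_all) + rng.normal(0.0, 0.2, x_all.size)
x_test, y_test = x_all[400:], y_all[400:]   # held-out test set

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# A degree-9 polynomial is flexible enough to overfit small training sets.
degree = 9
sizes = [20, 50, 100, 200, 400]
train_errs, test_errs = [], []
for n in sizes:
    coef = np.polyfit(x_all[:n], y_all[:n], degree)
    train_errs.append(mse(y_all[:n], np.polyval(coef, x_all[:n])))
    test_errs.append(mse(y_test, np.polyval(coef, x_test)))

# Plotting (sizes, train_errs) and (sizes, test_errs) shows the variance
# signature: a training curve far below a testing curve that is still falling.
```

At small `n` the gap between the two curves is large; as `n` grows the test error keeps dropping, which is the "more data will help" signal described above.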
* To reduce overfitting and bring the curves closer together, one should
    * increase the strength of regularization,
    * reduce the number of features,
    * and/or switch to an algorithm that can only fit simpler hypothesis functions.
from: Overfitting, bias-variance and learning curves - rmartinshort