General paradigm of machine learning
- Feature selection
- Training vs test sets
- Cross validation
- Hyperparameter optimization
- Bootstrapping
- Data snooping bias
- Accuracy, confusion matrix, recall vs precision, F1-score, log-loss
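The train/test workflow and the evaluation metrics listed above can be sketched with scikit-learn. This is an illustrative example on simulated data, not part of the course exercises; the features and labels are synthetic.

```python
# Sketch of train/test splitting, cross-validation, and classification
# metrics (accuracy, confusion matrix, precision/recall, F1, log-loss).
# All data here is simulated for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             log_loss, precision_score, recall_score)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                                # 4 hypothetical features
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)   # binary label

# Hold out a test set the model never sees during training
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression().fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)

print("5-fold CV accuracy:", cross_val_score(clf, X_tr, y_tr, cv=5).mean())
print("test accuracy:     ", accuracy_score(y_te, pred))
print("confusion matrix:\n", confusion_matrix(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall:   ", recall_score(y_te, pred))
print("F1:       ", f1_score(y_te, pred))
print("log-loss: ", log_loss(y_te, proba))
```

Cross-validating only on the training split (and touching the test set once, at the end) is one guard against the data-snooping bias mentioned above.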
Programming tutorial (available as pre-recorded session)
Setting up the problem with multiple linear regression as the learning model
- Exercise: Predicting 1-day SPY returns using simple technical indicators
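A minimal sketch of how this exercise could be set up: regress next-day returns on a few simple technical indicators using multiple linear regression. The price series below is simulated; in the actual exercise the SPY data would come from a real source, and the three indicators are illustrative choices.

```python
# Predicting 1-day-ahead returns from simple technical indicators with
# multiple linear regression. Prices are simulated, not real SPY data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 1000))))
ret = prices.pct_change()

feats = pd.DataFrame({
    "ret_1d":   ret,                                   # yesterday's return
    "ma_ratio": prices / prices.rolling(10).mean(),    # price vs 10-day MA
    "vol_10d":  ret.rolling(10).std(),                 # 10-day volatility
})
target = ret.shift(-1)                                 # next-day return

data = pd.concat([feats, target.rename("y")], axis=1).dropna()
X, y = data[feats.columns], data["y"]

split = int(len(data) * 0.7)                           # time-ordered split, no shuffling
model = LinearRegression().fit(X.iloc[:split], y.iloc[:split])
print("out-of-sample R^2:", model.score(X.iloc[split:], y.iloc[split:]))
```

Note the split is chronological rather than random: shuffling would leak future information into training, another form of data snooping.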
Learning algorithms
- Exercise: Predicting SPY returns using various learning algorithms
Stepwise linear regression
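Forward stepwise selection can be sketched with scikit-learn's SequentialFeatureSelector, which greedily adds the predictor that most improves cross-validated fit. The data below is synthetic, constructed so that only two of eight candidate predictors carry signal.

```python
# Forward stepwise linear regression via SequentialFeatureSelector.
# Synthetic data: y depends only on predictors 0 and 3.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))                                 # 8 candidate predictors
y = 2 * X[:, 0] - X[:, 3] + rng.normal(scale=0.5, size=300)   # only 2 matter

sfs = SequentialFeatureSelector(LinearRegression(), n_features_to_select=2,
                                direction="forward", cv=5).fit(X, y)
print("selected predictors:", np.flatnonzero(sfs.get_support()))
```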
Classification and regression trees (CART)
- Stopping criteria for tree growing
- Using the whole tree or selecting certain nodes for prediction?
- Reducing overfitting by cross-validation
- Increasing training sample size by bootstrapping/bagging
- Decreasing number of predictors: random subspace
- Random forest
- Learning from past errors: boosting
- Which technique gives the most accurate predictions?
- Improving accuracy with weighted samples, priors, and hyperparameter optimization
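The tree-ensemble ideas above (a single CART, bagging on bootstrap samples, random forests combining bagging with random subspaces, and boosting) can be compared side by side in scikit-learn. The dataset is synthetic, so the accuracies are illustrative only, not a verdict on which technique wins in general.

```python
# Comparing a single tree with bagged, random-forest, and boosted
# ensembles by 5-fold cross-validated accuracy on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

models = {
    "single CART":   DecisionTreeClassifier(max_depth=5, random_state=0),
    "bagging":       BaggingClassifier(random_state=0),           # trees on bootstrap samples
    "random forest": RandomForestClassifier(random_state=0),      # bagging + random subspace
    "boosting":      GradientBoostingClassifier(random_state=0),  # learning from past errors
}
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
for name, s in scores.items():
    print(f"{name:13s} CV accuracy: {s:.3f}")
```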
Support vector machine (SVM)
- Predicting sign of returns
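A sketch of framing return prediction as classification: an SVM predicting the sign of the next-day return from lagged returns. The return series here is pure simulated noise, so the out-of-sample hit rate will hover near 50%, which itself illustrates why this problem is hard.

```python
# SVM classifying the sign of next-day returns from two lagged returns.
# Simulated (noise) returns; hit rate near 0.5 is the expected outcome.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
ret = rng.normal(0, 0.01, 600)

X = np.column_stack([ret[1:-1], ret[:-2]])   # lag-1 and lag-2 returns
y = np.sign(ret[2:])                         # sign of next-day return
y[y == 0] = 1                                # fold exact zeros into +1

split = int(len(y) * 0.7)                    # chronological split
clf = SVC(kernel="rbf").fit(X[:split], y[:split])
print("out-of-sample hit rate:", clf.score(X[split:], y[split:]))
```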
Neural networks (NN)
- Neural network as nonlinear function fitting
- What network architecture to pick?
- Drawbacks of using NNs for financial prediction
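The "neural network as nonlinear function fitting" idea can be shown with a small multilayer perceptron learning y = sin(x) from noisy samples. Choosing the hidden-layer width and depth below is exactly the architecture question raised above; one modest hidden layer is often enough for a smooth one-dimensional function.

```python
# A small MLP fitting a known nonlinear function from noisy samples,
# sketched with scikit-learn's MLPRegressor.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=400)

# hidden_layer_sizes is the architecture choice: here one layer of 32 units
net = MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000,
                   random_state=0).fit(X, y)
print("in-sample R^2:", net.score(X, y))
```

Fitting a clean sine wave is easy; fitting financial returns, where the signal-to-noise ratio is tiny and nonstationary, is where the drawbacks discussed above appear.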
An extended exercise on feature selection
- Building a multifactor stock selection model using fundamental factors
- Techniques: multiple regression, stepwise regression, and CART
- What fundamental factors are most useful for predicting stock portfolio returns?
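A minimal sketch of this exercise's setup: regress stock returns on a handful of fundamental factors and compare what multiple regression and CART each consider important. Factor values and returns are simulated, and the factor names are placeholders, with the returns constructed (for illustration) to load on only two of the four factors.

```python
# Multifactor model sketch: regression betas and CART feature importances
# for simulated fundamental factors. Factor names are placeholders.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
factors = pd.DataFrame(rng.normal(size=(400, 4)),
                       columns=["book_to_price", "earnings_yield",
                                "debt_to_equity", "roe"])
# Simulated returns load on book_to_price and debt_to_equity only
ret = (0.03 * factors["book_to_price"] - 0.02 * factors["debt_to_equity"]
       + rng.normal(scale=0.02, size=400))

lin = LinearRegression().fit(factors, ret)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(factors, ret)

print("regression betas:", dict(zip(factors.columns, lin.coef_.round(3))))
print("CART importances:", dict(zip(factors.columns,
                                    tree.feature_importances_.round(3))))
```

In the real exercise the answer to "which factors are most useful" comes from out-of-sample performance, not in-sample betas or importances alone.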