- These notes are available as Google Slides, but you can download them as PowerPoint or PDF. If you want to modify a file, just download it locally and upload it to your Google Drive or wherever you prefer.
- This is the link to download all the notes.
Online controlled experiment | Bandit testing | Correlation does not imply causation! | How to determine causality
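To make the contrast between a fixed-split controlled experiment and bandit testing concrete, here is a minimal epsilon-greedy sketch in plain Python; the conversion rates and parameters are hypothetical, purely for illustration:

```python
import random

# Epsilon-greedy bandit (illustrative sketch): with probability epsilon explore
# a random variant, otherwise exploit the variant with the best observed
# conversion rate so far.
def epsilon_greedy(conversion_rates, epsilon=0.1, steps=10_000, seed=0):
    rng = random.Random(seed)
    n_arms = len(conversion_rates)
    pulls = [0] * n_arms
    rewards = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            means = [rewards[i] / pulls[i] if pulls[i] else 0.0
                     for i in range(n_arms)]
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        pulls[arm] += 1
        rewards[arm] += rng.random() < conversion_rates[arm]  # Bernoulli reward

    return pulls, rewards

pulls, rewards = epsilon_greedy([0.05, 0.07])  # two hypothetical variants
print(pulls)  # most traffic ends up on the better-converting arm
```

Unlike a classic A/B test, which splits traffic evenly until the experiment ends, the bandit shifts traffic toward the winner while the experiment is still running.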
Hadoop | Spark | MapReduce
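A toy word count in the MapReduce style, written in pure Python rather than on a real Hadoop or Spark cluster, just to illustrate the map, shuffle, and reduce phases:

```python
from collections import defaultdict
from itertools import chain

# Map phase: each input line is turned into (word, 1) pairs.
def map_phase(line):
    return [(word, 1) for word in line.split()]

# Shuffle phase: pairs are grouped by key (on a cluster this is the network step).
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: the values for each key are aggregated.
def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["spark and hadoop", "spark on yarn"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, lines))))
print(counts)  # {'spark': 2, 'and': 1, 'hadoop': 1, 'on': 1, 'yarn': 1}
```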
R² trap for time series analysis
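A short NumPy sketch of the trap: a random walk "forecast" by its own previous value scores an R² near 1, even though the naive forecast has no real predictive skill:

```python
import numpy as np

# Random walk: y_t = y_{t-1} + noise, so the best naive forecast is "repeat
# the last value". High persistence inflates R^2 without any real skill.
rng = np.random.default_rng(0)
walk = np.cumsum(rng.normal(size=5000))
y_true, y_pred = walk[1:], walk[:-1]  # naive "model": predict previous value

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # close to 1.0 despite zero forecasting skill
```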
Tree | Honest tree | Soft decision trees | Random forest | Mixture of experts | Bagging | Boosting | Stacking | Meta learner | Blending | Gini impurity | Feature importance | OOB score | Discrete AdaBoost | Real AdaBoost | Voting classifier
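A minimal scikit-learn sketch tying a few of these terms together (bagging via bootstrap samples, the OOB score, and Gini-based feature importance); the dataset is just a convenient stand-in:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# A random forest bags decision trees: each tree is fit on a bootstrap sample,
# and the rows left out of a tree's sample (out-of-bag rows) give a built-in
# validation estimate for free.
X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)

print(forest.oob_score_)            # accuracy estimated on out-of-bag samples
print(forest.feature_importances_)  # impurity-based (Gini) feature importance
```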
Permutation importance | Partial Dependence Plot = PDP | Individual Conditional Expectation = ICE | Accumulated Local Effects (ALE) | Counterfactual explanation | Explainability vs. interpretability | Feature interaction (H-statistic) | Global surrogate | Local Interpretable Model-agnostic Explanations = LIME | SHAP = SHapley Additive exPlanations | Feature importance vs. sensitivity
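As a taste of these model-agnostic tools, here is a minimal sketch of permutation importance using scikit-learn; the dataset and model are placeholders:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Permutation importance: shuffle one feature at a time on held-out data and
# measure how much the score drops. Unlike impurity-based importance, it is
# computed on data the model has not memorised.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

result = permutation_importance(model, X_test, y_test,
                                n_repeats=10, random_state=0)
top = np.argsort(result.importances_mean)[::-1][:5]
print(top, result.importances_mean[top])  # the five most influential features
```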
Optimiser for deep learning | Convex vs. non-convex | Smooth vs. non-smooth | Noisy vs. non-noisy | Well vs. ill-conditioned | Quadratic vs. non-quadratic | Gradient descent | SGD = Stochastic Gradient Descent | Adam | AdaBelief | AdaGrad | AdaDelta | RMSProp | Batch gradient descent | Mini-batch gradient descent | LARS | LAMB | Weight updates | Online vs. offline learning | Learning rate strategies | Feature scaling | Feature normalisation | No-free-lunch principle | How to choose an algorithm? | Robustness vs. reliability | Necessary and sufficient conditions for a minimum | Optimality conditions | Negative log-likelihood = cross entropy | Cost vs. loss function | Hinge, L1, L2 and Huber loss functions | Metric, scoring and loss function | Custom loss function | Continuation method | Multi-objective optimisation | Multi-point optimisation | Multi-constraint optimisation | Pareto front | Hypervolume indicator | Niching | Genetic algorithm | SOM = Self-Organizing Maps | SA = Simulated annealing | Momentum vs. pure gradient descent | Nesterov momentum | Hessian matrix | Conjugate gradient | Quasi-Newton BFGS | L-BFGS | Gradient clipping | Complex step | Saddle points | Trust region | Maratos effect | Line search method | The Wolfe conditions | Weak minimum | Simplex, Complex and Subplex | SLSQP | Hyperparameter optimisation | Random search vs. grid search
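As one concrete example from this long list, a minimal NumPy implementation of the Adam update rule on an ill-conditioned quadratic; the learning rate and curvature values are arbitrary illustration choices:

```python
import numpy as np

# Adam on f(x) = 0.5 * sum(c_i * x_i^2) with very different curvatures c_i:
# per-parameter step sizes derived from first/second moment estimates tame
# the ill-conditioning that slows plain gradient descent down.
def adam(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)  # first moment (running mean of gradients)
    v = np.zeros_like(x)  # second moment (running mean of squared gradients)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

curvatures = np.array([1.0, 100.0])  # condition number 100
x_min = adam(lambda x: curvatures * x, x0=[5.0, 5.0])
print(x_min)  # both coordinates approach 0 despite very different curvature
```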
CPU | GPU | TPU | Hyper-threading
How to deal with the lack of labels | Crowdsourcing vs. curated crowds | Weak supervision | Active learning | Self-supervised learning | Curriculum learning | Human-in-the-loop | Uncertainty, diversity, and random sampling
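A minimal active-learning loop with uncertainty sampling, using synthetic data and logistic regression as a stand-in model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Uncertainty sampling: repeatedly label the pool point the current model is
# least sure about (predicted probability closest to 0.5), instead of
# labelling points at random.
X, y = make_classification(n_samples=1000, random_state=0)
# Seed with 5 labelled examples per class so the first model can be fit.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
pool = [i for i in range(len(X)) if i not in set(labeled)]

for _ in range(20):  # 20 rounds of querying the "oracle" for a label
    model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    proba = model.predict_proba(X[pool])[:, 1]
    query = pool[int(np.argmin(np.abs(proba - 0.5)))]  # most uncertain point
    labeled.append(query)  # oracle reveals the label
    pool.remove(query)

print(model.score(X, y))  # typically beats spending the same 20 labels at random
```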
Collaborative Filtering | Matrix Factorisation | Bayesian Personalized Ranking | Calibrated Recommendations | Explicit & implicit user data | Factorisation Machines | Locality Sensitive Hashing (LSH) | Weighted Approximate Rank Pairwise loss = WARP
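A minimal matrix-factorisation sketch trained with SGD on a handful of made-up explicit ratings; all sizes and hyperparameters are illustrative:

```python
import numpy as np

# Matrix factorisation for collaborative filtering: learn user and item
# embeddings so their dot product approximates the observed ratings.
rng = np.random.default_rng(0)
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 4.0)]
n_users, n_items, k = 3, 3, 2
P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors

lr, reg = 0.05, 0.01
for epoch in range(200):
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]  # prediction error on this observed rating
        p_u = P[u].copy()      # keep the pre-update user vector
        P[u] += lr * (err * Q[i] - reg * P[u])  # SGD step with L2 regularisation
        Q[i] += lr * (err * p_u - reg * Q[i])

print(P[0] @ Q[2])  # predicted rating for an unobserved user-item pair
```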