14. ML – Algorithms

Linear Learner

  • Fits a linear model
  • Can handle both regression (numeric) predictions and classification predictions
  • Inputs
    • RecordIO-wrapped protobuf
      • Float32 data only!
    • CSV
      • First column assumed to be the label
    • File or Pipe mode both supported
  • Processes
    • Preprocessing
      • Training data must be normalized (so all features are weighted the same)
      • Input data should be shuffled
    • Training
      • Uses stochastic gradient descent
      • Choose an optimization algorithm (Adam, AdaGrad, SGD, etc)
      • Multiple models are optimized in parallel
      • Tune L1, L2 regularization
    • Validation
      • The best-performing model is selected
  • Hyperparameters
    • balance_multiclass_weights
      • Gives each class equal importance in loss functions
    • learning_rate, mini_batch_size
    • l1 : L1 regularization
    • wd : Weight decay (L2 regularization)
    • target_precision
      • Use with binary_classifier_model_selection_criteria set to
        recall_at_target_precision
      • Holds precision at this value while maximizing recall
    • target_recall
      • Use with binary_classifier_model_selection_criteria set to
        precision_at_target_recall
      • Holds recall at this value while maximizing precision
  • Instance Types
    • Single- or multi-machine CPU and GPU training supported
    • Multi-GPU instances don't help; not suitable
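The preprocessing and training steps above can be sketched in plain NumPy (a toy illustration of normalize → shuffle → minibatch SGD with weight decay, not the Linear Learner implementation; the data and hyperparameter values are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 3*x0 - 2*x1 + noise, features on very different scales
X = rng.normal(size=(200, 2)) * [10.0, 0.1]
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Preprocessing: normalize so all features are weighted the same...
mean, std = X.mean(axis=0), X.std(axis=0)
Xn = (X - mean) / std

# ...and shuffle the input data
idx = rng.permutation(len(Xn))
Xn, y = Xn[idx], y[idx]

# Training: minibatch SGD on squared loss, with weight decay (wd = L2 regularization)
w, b = np.zeros(2), 0.0
learning_rate, wd, mini_batch_size = 0.05, 1e-4, 20
for epoch in range(50):
    for i in range(0, len(Xn), mini_batch_size):
        xb, yb = Xn[i:i + mini_batch_size], y[i:i + mini_batch_size]
        err = xb @ w + b - yb                      # prediction error on the batch
        w -= learning_rate * (xb.T @ err / len(xb) + wd * w)
        b -= learning_rate * err.mean()
```

With SageMaker itself you would instead set learning_rate, mini_batch_size, l1, and wd as hyperparameters on the built-in algorithm; this sketch just shows what they control.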

XGBoost

  • eXtreme Gradient Boosting
    • Boosted group of decision trees
    • New trees made to correct the errors of previous trees
    • Uses gradient descent to minimize loss as new trees are added
  • Can be used for both classification and regression (using regression trees)
  • Inputs
    • RecordIO-protobuf
    • CSV
    • libsvm
    • Parquet
  • Hyperparameters
    • subsample
      • Prevents overfitting
    • eta
      • Step size shrinkage; prevents overfitting
    • gamma
      • Minimum loss reduction to create a partition; larger = more conservative
    • alpha
      • L1 regularization term; larger = more conservative
    • lambda
      • L2 regularization term; larger = more conservative
    • eval_metric
      • Optimize on AUC, error, rmse…
      • For example, if you care about false positives more than accuracy, you might use AUC here
    • scale_pos_weight
      • Adjusts balance of positive and negative weights
      • Helpful for unbalanced classes
      • Might set to sum(negative cases) / sum(positive cases)
    • max_depth
      • Max depth of the tree
      • Too high and you may overfit
  • Instance Types
    • Is memory-bound, not compute-bound
    • So, M5 is a good choice
    • XGBoost 1.2
      • single-instance GPU training is available
      • Must set tree_method hyperparameter to gpu_hist
    • XGBoost 1.5+: Distributed GPU training
      • Must set use_dask_gpu_training to true
      • Set distribution to fully_replicated in TrainingInput
      • Only works with csv or parquet input
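The boosting idea above — each new tree fits the errors of the trees so far, shrunk by eta, on a random subsample of rows — can be sketched by hand with depth-1 "stump" trees (a toy illustration with made-up data, not SageMaker's XGBoost):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a noisy step function
X = rng.uniform(0, 10, size=300)
y = (X > 5).astype(float) + rng.normal(scale=0.05, size=300)

def fit_stump(x, residual):
    """Depth-1 regression tree: pick the split threshold that best fits the residuals."""
    best = None
    for t in np.linspace(x.min(), x.max(), 50):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda x: np.where(x <= t, lv, rv)

# Boosting loop: each new tree fits the current residuals (the negative gradient
# of squared loss), scaled by eta; subsample rows each round to reduce overfitting
eta, subsample, n_trees = 0.3, 0.8, 50
pred = np.zeros_like(y)
for _ in range(n_trees):
    rows = rng.random(len(X)) < subsample
    tree = fit_stump(X[rows], (y - pred)[rows])
    pred += eta * tree(X)
```

Real XGBoost adds the gamma/alpha/lambda regularization terms to the tree-building objective and grows deeper trees (max_depth), but the correct-the-residuals structure is the same.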

Sequence to Sequence (Seq2Seq)

  • Input is a sequence of tokens, output is a sequence of tokens
  • Machine Translation
  • Text summarization
  • Speech to text
  • Implemented with RNNs and CNNs with attention


Random Forest