Machine Learning Architectures
From a large number of possible ML concepts and architectures, XGBoost [] has been selected from the tree boosting ML methods, owing to its proven performance in solving similar problems; see [-] and references therein. XGBoost, a scalable, distributed gradient-boosted decision tree algorithm, was first introduced in 2016 by Tianqi Chen and Carlos Guestrin. The structure of XGBoost includes multiple root nodes, internal nodes, leaf nodes, and branches (Figure 3a). The root nodes make the initial decisions, the internal nodes make the subsequent decisions, the branches point to the decision to be made at each node, and the leaf nodes hold the prediction results of a single tree. Finally, the results of all leaf nodes are combined to obtain the prediction of the XGBoost model []. In the search for the best leaf node split, XGBoost uses the exact greedy algorithm, or the corresponding approximate algorithm, to enumerate all the features and ensure accuracy [].
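To illustrate how such a gradient-boosted tree ensemble is typically trained on tabular data, the following is a minimal sketch using the open-source xgboost Python package together with scikit-learn utilities; the synthetic dataset and the hyperparameter values are placeholders for illustration only, not the configuration used in this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor

# Placeholder tabular data: X holds the input features, y the target property.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 8))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Gradient-boosted decision tree ensemble; hyperparameters are illustrative only.
model = XGBRegressor(
    n_estimators=300,    # number of boosted trees
    max_depth=4,         # depth of each tree (root -> internal nodes -> leaves)
    learning_rate=0.05,  # shrinkage applied to each tree's contribution
    subsample=0.8,
    colsample_bytree=0.8,
)
model.fit(X_train, y_train)

# The prediction is the accumulated contribution of the leaf values of all trees.
y_pred = model.predict(X_test)
print(f"MAE: {mean_absolute_error(y_test, y_pred):.3f}")
```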
Many authors have shown that XGBoost is superior to other algorithms [], such as artificial neural networks (ANN) and support vector regression (SVR), in handling tabular datasets. However, in the last decade, deep learning counterparts such as TabNet [] have emerged, extending the use of high-performance deep learning architectures from images and videos to tabular data. TabNet's main features can be summarized as follows: (i) it uses sparse instance-wise feature selection learned from data; (ii) it constructs a sequential multi-step architecture, where each step contributes a portion of the decision based on the selected features; (iii) it improves the learning capacity via nonlinear processing of the selected features; and (iv) it mimics ensembling via higher dimensions and more steps []. The TabNet structure has one encoder and one decoder (Figure 3b). The encoder consists of multiple feature transformers and multiple attentive transformers; the attentive transformer obtains the mask matrix of the current step from the result of the previous step and tries to keep the mask matrix sparse and non-repetitive, thereby performing feature selection. The feature transformer layer processes the features selected in the current step and uses the output of the previous step to determine the importance of the data features. The outputs of all steps are accumulated to form the final decision [].
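For comparison, below is a minimal sketch of fitting a TabNet model on the same kind of tabular data, assuming the open-source pytorch-tabnet package; the synthetic data and hyperparameter values are again placeholders rather than the settings used in this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from pytorch_tabnet.tab_model import TabNetRegressor  # assumes pytorch-tabnet is installed

# Placeholder tabular data; TabNetRegressor expects 2-D float targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)).astype(np.float32)
y = (2.0 * X[:, 0] + np.sin(X[:, 1])).reshape(-1, 1).astype(np.float32)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=0)

# n_d / n_a set the widths of the decision and attentive branches; n_steps is the
# number of sequential decision steps whose outputs are accumulated.
model = TabNetRegressor(n_d=8, n_a=8, n_steps=3, gamma=1.3)
model.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],
    max_epochs=100,
    patience=20,
    batch_size=256,
)

y_pred = model.predict(X_valid)

# Per-step masks from the attentive transformers indicate which features were selected.
explain_matrix, masks = model.explain(X_valid)
```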