Classifying digital signals as jammed or healthy using a Random Forest Decision Tree Algorithm
Classifying an instance of a digital signal as healthy or jammed involves asking a series of questions about the features of the received signal. Decision tree algorithms are well suited to automating this task. At the heart of decision tree algorithms is the idea of minimizing uncertainty: if we had five jammed signals and five healthy signals, a signal picked at random would have a 50% chance of being jammed. To achieve our goal, we have to decide which questions to ask and when to ask them.
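To make the 50% figure concrete, here is a minimal sketch of one way to put a number on uncertainty: the chance of mislabeling a randomly drawn signal when labels are guessed in proportion to how often each class appears. This quantity is commonly called Gini impurity, and it is the same idea as the Gini index discussed later in this article. The function name `gini_impurity` and the label strings are illustrative choices, not taken from the project's source code.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_i^2), where p_i is the fraction of each label."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Five jammed and five healthy signals: maximum uncertainty for two classes.
mixed = ["jammed"] * 5 + ["healthy"] * 5
# Only healthy signals: no uncertainty at all.
pure = ["healthy"] * 10

print(gini_impurity(mixed))  # 0.5
print(gini_impurity(pure))   # 0.0
```

A value of 0.5 is the highest a two-class set can reach, while 0.0 means every signal in the set carries the same label.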
To set up the mechanism to classify signals, we need to define a few terms.
- Decision node: A node at which the dataset is filtered or split.
- Question: A true-or-false question asked about a feature of the signal.
- Leaf node: A node that contains only a list of correctly categorized examples.
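The terms above can be sketched as small Python data structures. This is an illustration under assumptions, not the project's actual implementation: the class names, the `partition` and `classify` helpers, and the single SNR-style feature with a threshold of 15.0 are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Question:
    """True/false question: is row[feature] >= threshold?"""
    feature: int
    threshold: float

    def match(self, row):
        return row[self.feature] >= self.threshold


@dataclass
class Leaf:
    """Terminal node: stores how many examples of each label reached it."""
    counts: Counter


@dataclass
class DecisionNode:
    """Internal node: asks a question and routes rows to one of two branches."""
    question: Question
    true_branch: object
    false_branch: object


def partition(rows, question):
    """Split rows into those that answer the question True and those that answer False."""
    true_rows = [r for r in rows if question.match(r)]
    false_rows = [r for r in rows if not question.match(r)]
    return true_rows, false_rows


def classify(row, node):
    """Walk from the root to a leaf and return that leaf's most common label."""
    while isinstance(node, DecisionNode):
        node = node.true_branch if node.question.match(row) else node.false_branch
    return node.counts.most_common(1)[0][0]


# Hypothetical rows: (signal-to-noise ratio in dB, label).
rows = [(25.0, "healthy"), (22.0, "healthy"), (7.0, "jammed"), (9.5, "jammed")]
question = Question(feature=0, threshold=15.0)
true_rows, false_rows = partition(rows, question)

tree = DecisionNode(
    question,
    true_branch=Leaf(Counter(label for _, label in true_rows)),
    false_branch=Leaf(Counter(label for _, label in false_rows)),
)
print(classify((20.0, None), tree))  # healthy
```

Here a single question is enough to separate the toy rows, so both branches end in leaves that hold one label each.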
Now we have to come up with a mechanism that allows us to split the tree. We’ll call this mechanism “finding the Gini index.” A Gini index of 0 indicates the dataset has no uncertainty. In other words, only healthy or only jammed signals are present in the set. Refer to figure X for details of the Gini calculation.
To set up the process above, we’ll tap into modules offered by the scikit-learn library, a rich Python package that provides the bare bones we’ll need for our random forest algorithm. The version I’ll be using in this study is the Random Forest Classifier (RFC). Its main feature is that we’ll use multiple decision trees in each iteration and aggregate their results to minimize uncertainty.
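As a rough sketch of what this can look like in scikit-learn, the example below trains a `RandomForestClassifier` with the Gini criterion on synthetic data. The two features (signal-to-noise ratio and bit error rate), their distributions, and the hyperparameter values are assumptions made purely for illustration. They are not the features or settings of the dataset used in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data. The features (SNR in dB, bit error rate) are
# hypothetical and do not come from the article's dataset.
rng = np.random.default_rng(seed=0)
n_per_class = 200
healthy = np.column_stack([
    rng.normal(25.0, 3.0, n_per_class),    # SNR of healthy signals
    rng.normal(0.01, 0.005, n_per_class),  # bit error rate of healthy signals
])
jammed = np.column_stack([
    rng.normal(8.0, 3.0, n_per_class),     # SNR of jammed signals
    rng.normal(0.20, 0.05, n_per_class),   # bit error rate of jammed signals
])
X = np.vstack([healthy, jammed])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = healthy, 1 = jammed

# Hold out a quarter of the examples to estimate accuracy on unseen signals.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 100 trees, each choosing its splits by Gini impurity; class predictions
# are aggregated across the trees.
clf = RandomForestClassifier(n_estimators=100, criterion="gini", random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the synthetic classes are deliberately well separated, the accuracy here says little about real signals; the point is only the shape of the workflow: build a feature matrix, split it, fit the forest, and score it on held-out data.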
If you wish to experiment with the dataset of the digital signals and the source code you can do so via: