February 20, 2015
Feature selection is crucial to building any model in data science. Focusing on the most important, relevant features helps a data scientist design a better model and reach results faster.
So what exactly is a feature in data science analysis? Let’s start there. We frequently refer to the set of values that describe a data point as its features. In practice, features go by several other names, such as attributes, values, and dimensions, but all of them refer to the same thing: measurements that describe an instance.
Here’s an example.
To describe a single day of the year, what kinds of measurements could we gather?
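Some values that you might encounter include temperature, air pressure, hours of sunlight, precipitation, amount of rainfall, humidity, and less obvious figures about pennies, jaywalkers, the price of oil, and economic jobs.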
If we wanted to predict the next day’s weather, which of these measurements would be most valuable? Most likely temperature, air pressure, hours of sunlight, precipitation, and humidity would yield the best prediction.
Data about pennies, jaywalkers, oil, and economic jobs would likely tell us nothing useful about the weather. Still, some mathematical relationship may turn up in the data even though nothing explains it. In statistics, this is called a spurious relationship.
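A quick way to watch a spurious relationship appear is to manufacture one. The Python sketch below is purely illustrative (the function names and the 20-sample, 1,000-feature setup are my assumptions, not taken from any real dataset): it correlates a random target with features that are pure noise and reports the strongest Pearson correlation it finds. With so few samples and so many features, some noise feature will usually look meaningfully correlated by chance alone.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def strongest_spurious_correlation(n_samples=20, n_features=1000, seed=0):
    """Largest |r| between a random target and n_features pure-noise features."""
    rng = random.Random(seed)
    # Hypothetical setup: target and features are independent random noise,
    # so any correlation found here is spurious by construction.
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    strongest = 0.0
    for _ in range(n_features):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        strongest = max(strongest, abs(pearson(noise, target)))
    return strongest


print(round(strongest_spurious_correlation(), 2))
```

Raising `n_samples` shrinks these chance correlations; raising `n_features` makes a large one more likely to show up.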
If we dedicate resources in our learning algorithm to uninformative features such as these, we complicate the algorithm. That raises the likelihood that we will overfit the model to our dataset and learn incorrect concepts. My example was simplistic, so now imagine there were 1,000 more features, such as the sock color of 1,000 random people. This data won’t help predict the weather, but if it’s in the dataset, the algorithm will waste considerable effort configuring and tuning parameters to learn a relationship between socks and the next day’s weather. There’s no value in that. With only a few informative features, and therefore fewer parameters to fit, we could use the algorithm/model much more efficiently. So the goal is to discover a small set of powerful features.
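To make the cost of uninformative features concrete, here is a small simulation, a sketch under assumed conditions rather than a benchmark: a toy two-class dataset with one informative feature plus a chosen number of pure-noise features standing in for sock colors. A 1-nearest-neighbor classifier (my choice of model, for illustration only) is scored on held-out data. As noise features pile up, they dominate the distance calculation and accuracy drifts toward chance.

```python
import random


def make_data(n, n_noise, rng):
    """Hypothetical two-class data: one informative feature plus n_noise noise features."""
    rows, labels = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = rng.gauss(3.0 * label, 0.5)  # class 0 centered at 0, class 1 at 3
        noise = [rng.gauss(0, 1) for _ in range(n_noise)]  # e.g. sock colors
        rows.append([informative] + noise)
        labels.append(label)
    return rows, labels


def one_nn_accuracy(train_x, train_y, test_x, test_y):
    """Held-out accuracy of a 1-nearest-neighbor classifier (squared Euclidean distance)."""
    correct = 0
    for x, y in zip(test_x, test_y):
        dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_x]
        nearest = min(range(len(train_x)), key=dists.__getitem__)
        if train_y[nearest] == y:
            correct += 1
    return correct / len(test_x)


def accuracy_with_noise(n_noise, n=60, seed=0):
    """Train and test on fresh toy data with n_noise irrelevant features added."""
    rng = random.Random(seed)
    train_x, train_y = make_data(n, n_noise, rng)
    test_x, test_y = make_data(n, n_noise, rng)
    return one_nn_accuracy(train_x, train_y, test_x, test_y)


for n_noise in (0, 10, 300):
    print(n_noise, accuracy_with_noise(n_noise))
```

With these settings, accuracy with no noise features should be near perfect, while a few hundred noise features pull it well down; the exact numbers depend on the seed.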
Another problem arises when one feature carries the same information as another. In my example, the presence of precipitation and the amount of rainfall that day may be redundant: the amount of rainfall is more detailed than a simple "yes/no" value and largely captures it. A model trained on both will spend extra effort learning information it already has.
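One rough way to spot redundant pairs like this is to check how strongly features correlate. The sketch below uses a made-up week of observations (illustrative numbers, not real data) and flags pairs whose absolute Pearson correlation exceeds an assumed threshold of 0.8. Pearson correlation only catches linear relationships, so treat this as a first screen rather than a definitive redundancy test.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def redundant_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs whose |r| exceeds threshold."""
    names = list(features)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(features[a], features[b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 2)))
    return pairs


# Made-up week of observations (illustrative numbers, not real data).
rainfall_mm = [0.0, 6.0, 4.5, 0.0, 0.0, 8.0, 5.5]
features = {
    "temperature_c": [21.0, 18.0, 23.0, 19.5, 22.0, 20.0, 17.5],
    # Yes/no flag derived from rainfall, so it adds no new information.
    "precipitation": [1.0 if mm > 0 else 0.0 for mm in rainfall_mm],
    "rainfall_mm": rainfall_mm,
}

print(redundant_pairs(features))  # [('precipitation', 'rainfall_mm', 0.95)]
```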
Ideally, we want a set of features that are each relevant to the model being constructed and that carry information not duplicated by the others.
Feature selection is critical when the number of features greatly exceeds the number of samples. This situation is tied to the curse of dimensionality, a rich topic for another time. For a quick overview:
The basic idea is that as the number of dimensions grows, you need exponentially more samples to cover the instance space at the same density. A helpful interactive demo of this concept can be found here: https://prpatil.shinyapps.io/cod_app/
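The grid picture behind this can be worked out in a few lines. Assuming we want 10 evenly spaced sample points along each axis (an arbitrary choice for illustration), the number of samples needed is 10 raised to the number of dimensions. Turned around, a fixed budget of 1,000 samples buys only about 1000^(1/d) points per axis in d dimensions.

```python
def samples_for_grid(dims, points_per_axis=10):
    """Samples needed to keep points_per_axis evenly spaced points along every axis."""
    return points_per_axis ** dims


def resolution_per_axis(n_samples, dims):
    """Points per axis that a fixed sample budget buys in `dims` dimensions."""
    return n_samples ** (1 / dims)


for d in (1, 2, 3, 10):
    print(f"{d:>2} dims: {samples_for_grid(d):,} samples for a 10-per-axis grid; "
          f"1,000 samples give {resolution_per_axis(1_000, d):.1f} per axis")
```

At 10 dimensions, those same 1,000 samples give only about 2 points per axis.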
Now that we understand what a feature is and why too many can be a bad thing, what can we do?
Two common approaches to feature selection are filter methods and wrapper methods. In both, features are evaluated to assess the quality of a model that could be constructed from them. A filter method scores each feature on its own, judging its relevance independently of how it performs in the model of interest. For example, a simple filter method is the t-test, which looks for significant differences between classes one feature at a time. We could go through the features one by one, test each for a significant difference between the class distributions, and keep only the features that pass.
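Here is a minimal sketch of that t-test filter in standard-library Python. It computes Welch’s t statistic (a t-test variant that does not assume equal variances, chosen here as an assumption) for each feature and keeps features whose |t| exceeds a threshold. The toy data, feature names, and cutoff of 2.0 are all hypothetical; |t| > 2 only roughly matches p < 0.05 for large samples, and a real analysis would compute p-values, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means between samples a and b."""
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0:  # both samples constant: no spread to compare against
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def t_test_filter(rows, labels, threshold=2.0):
    """Indices of features whose |t| between class 0 and class 1 exceeds threshold.

    Each feature is scored on its own, with no model involved (a filter method).
    """
    keep = []
    for j in range(len(rows[0])):
        class0 = [row[j] for row, y in zip(rows, labels) if y == 0]
        class1 = [row[j] for row, y in zip(rows, labels) if y == 1]
        if abs(welch_t(class0, class1)) > threshold:
            keep.append(j)
    return keep


# Hypothetical toy data: each row is [humidity_pct, jaywalkers];
# label 1 means it rained the next day.
rows = [[55, 12], [60, 9], [58, 15], [62, 11],
        [85, 10], [90, 14], [88, 12], [80, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

print(t_test_filter(rows, labels))  # [0]: keep humidity, drop jaywalkers
```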
A wrapper method evaluates features by their performance in the model. A set of features is used to construct the model, and the set is scored on how well that model performs; better-performing sets indicate better features. You can think of a feature set as a team that is evaluated on its overall quality against another “team” of features. For the weather example, if we selected temperature, air pressure, and precipitation and constructed a predictive model using only these features, how well would it predict tomorrow’s weather? We could compare that result to a model built from jaywalkers and the price of oil. The feature set containing weather measurements would likely yield a more accurate model.
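One common wrapper strategy is greedy forward selection: start with an empty team, repeatedly add whichever feature most improves the model’s score, and stop when no addition helps. The sketch below is one possible version, not the post’s own method; the nearest-centroid model, leave-one-out scoring, and toy data are all assumptions made for illustration.

```python
def nearest_centroid_loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier using only `features`."""
    correct = 0
    for i, (row, label) in enumerate(zip(rows, labels)):
        # Fit class centroids on every sample except the held-out one.
        centroids = {}
        for cls in sorted(set(labels)):
            members = [r for k, (r, y) in enumerate(zip(rows, labels)) if k != i and y == cls]
            if members:
                centroids[cls] = [sum(m[j] for m in members) / len(members) for j in features]
        x = [row[j] for j in features]
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        if pred == label:
            correct += 1
    return correct / len(rows)


def forward_selection(rows, labels, score=nearest_centroid_loo_accuracy):
    """Greedy forward selection: grow the feature "team" while the score improves."""
    selected, best_score = [], 0.0
    remaining = list(range(len(rows[0])))
    while remaining:
        # Score every team formed by adding one more feature to the current one.
        scores = {j: score(rows, labels, selected + [j]) for j in remaining}
        candidate = max(remaining, key=scores.__getitem__)
        if scores[candidate] <= best_score:
            break  # no single addition helps, so stop
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


# Hypothetical toy data. Columns: humidity (scaled), pressure drop (scaled), oil price.
# Label 1 means rain tomorrow.
rows = [[0, 2, 50], [2, 0, 30], [1, 1, 42],
        [2, 4, 40], [4, 2, 52], [3, 3, 31]]
labels = [0, 0, 0, 1, 1, 1]

print(forward_selection(rows, labels))  # ([0, 1], 1.0)
```

In this toy data, neither scaled weather reading separates the classes on its own (each scores 4/6 under leave-one-out), but together they score 6/6, a team effect that a wrapper measures directly and a one-feature-at-a-time filter does not. The oil-price column never joins the team.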
What makes a good feature set? It depends on what you want your model to accomplish. Ask a different question of your data, and the same feature can have a different importance.
The next time you face a large set of features in your dataset, consider whether all of them are actually beneficial. If not, try a filter or wrapper method.
Let’s continue the conversation @paulyacci