February 19, 2015
Modern computing has no shortage of tools for the data scientist. The open source community alters the landscape every six to twelve months, and competition keeps you on the bleeding edge. In my career as a data scientist, I use everything from scientific Python™ packages to the newest cloud computing architectures—and sometimes all within the same project, as the initial stages of data exploration and mining are often done in a different language than the final product implementation. Here is a brief tour of my experiences with some essential tools for data science:
Python - Python has the richest collection of packages I have come across. When I see a new data set, I am inclined to tackle my problem by dissecting the data with one of the scientific libraries (SciPy, scikit-learn, etc.) and then visualizing the results (Matplotlib). Python is the easiest tool to use and has provided me with the most mileage for my data set investigations.
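The dissect-then-visualize workflow described above can be sketched in a few lines. This is a minimal illustration, not the post's own code: the data set (iris) and model (logistic regression) are stand-in choices, and plotting with Matplotlib would follow the same pattern.

```python
# A minimal sketch of exploring a new data set with scikit-learn.
# The data set and model here are illustrative choices, not from the post.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Load a small, well-known data set to stand in for "a new data set."
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a simple baseline model and check how well it generalizes.
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

From here, a call like `matplotlib.pyplot.scatter` on two of the feature columns, colored by predicted class, is the usual next step in the investigation.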
Java™ - Java is the backbone of cloud computing and a must-know for any data scientist who wants to build portable, production-quality products.
Scala - This language runs on the Java Virtual Machine and is an expressive functional programming language. Scala is the natural language for the latest distributed computing platform, Spark. As a high-level language, Scala allows the author to focus on what should be done with the data set rather than on how to position the data to do it effectively. This is one of my favorite tools.
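The "what, not how" contrast above can be shown even in plain Python, which supports the same functional style; idiomatic Scala code would look similar but run across a cluster under Spark. The numbers are made up for illustration.

```python
# Illustrating declarative ("what") vs. imperative ("how") style in plain
# Python; Scala encourages the declarative form below throughout.
readings = [3.2, -1.0, 7.5, 0.0, 4.4, -2.3]  # made-up sensor values

# Imperative: spell out *how* to walk the list and accumulate.
total = 0.0
for r in readings:
    if r > 0:
        total += r * r

# Declarative: state *what* you want - the sum of squared positive readings.
total_functional = sum(r * r for r in readings if r > 0)

assert total == total_functional
print(total_functional)
```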
Hadoop® - The quintessential cloud environment, Hadoop provides long-term storage of data across a cluster of computers.
Storm - Originally developed at Twitter, this tool enables stream processing of data collected from live feeds. This is an easy tool for Java developers working on streaming analytics projects.
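Storm structures streaming jobs as "spouts" (sources) feeding "bolts" (processing steps). A toy stand-in using Python generators can show the shape of that model; only the spout/bolt names come from Storm, and the sensor data is invented.

```python
# Toy stand-in for Storm's spout/bolt stream-processing model using Python
# generators. The "spout"/"bolt" names mirror Storm; the data is invented.
import itertools

def sensor_spout():
    """Emit an endless stream of (sensor_id, value) tuples, like a spout."""
    for i in itertools.count():
        yield ("sensor-%d" % (i % 3), float(i))

def threshold_bolt(stream, limit):
    """Pass along only readings above the limit, like a filtering bolt."""
    for sensor_id, value in stream:
        if value > limit:
            yield (sensor_id, value)

# Take the first 4 alerts from the (conceptually infinite) stream.
alerts = list(itertools.islice(threshold_bolt(sensor_spout(), 5.0), 4))
print(alerts)
```

In real Storm, the framework distributes the spouts and bolts across machines and handles failures; the dataflow wiring is the part this sketch preserves.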
Spark™ - This replaced the old MapReduce paradigm by hiding the processing details behind the functional language Scala. It has the ability to work with streaming data and with large static data sets. This is the latest and greatest tool in distributed computing.
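To see what Spark is hiding, here is the classic word count written as explicit map, shuffle, and reduce steps in pure Python. This is a single-machine sketch of the paradigm, not Spark code; in PySpark the same job collapses to a few chained calls.

```python
# Pure-Python sketch of the map -> shuffle -> reduce steps that Spark hides
# behind its functional API. Single-machine and illustrative only.
from collections import defaultdict

lines = ["spark hides mapreduce", "spark is functional", "mapreduce is old"]

# Map: emit (word, 1) pairs from every line.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key, as the framework would across a cluster.
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce: sum the counts for each word.
counts = {word: sum(vals) for word, vals in grouped.items()}
print(counts)
```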
HBase - A sturdy Hadoop-based database system that is easy to use.
Kafka - This message-passing system sits between the raw data source and a consuming process such as Spark or Storm. Kafka prevents data loss between the source and the analytics engine in a streaming analytics pipeline.
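The buffering role Kafka plays can be illustrated with Python's standard `queue` module standing in for a topic. This is a deliberately toy analogy: real Kafka persists messages to disk across a cluster, while this queue lives in one process's memory.

```python
# Toy illustration of Kafka's buffering role, with queue.Queue standing in
# for a Kafka topic. Real Kafka is durable and distributed; this is not.
import queue

topic = queue.Queue()  # stands in for a Kafka topic

# Producer: the raw data source publishes messages as they arrive,
# without waiting for the consumer to keep up.
for i in range(5):
    topic.put({"offset": i, "payload": "event-%d" % i})

# Consumer: the analytics engine (e.g., Spark or Storm) drains the buffer
# at its own pace; nothing was lost during the hand-off.
consumed = []
while not topic.empty():
    consumed.append(topic.get())

print(len(consumed))
```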
Mahout™ - A configurable cloud analytics toolkit that allows for advanced computing techniques through an API, allowing the data scientist to think about the data set instead of the code.
Elasticsearch - A modern search-engine platform on top of which you can place the friendly Kibana user interface.
Lucene™ - This tool is the backbone to open source text processing—and the basis for text processing in many other tools, including Mahout, Solr™, and Elasticsearch. This is a must-have for text processing gurus.
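At the heart of Lucene sits an inverted index: a mapping from each term to the documents containing it. A tiny pure-Python version conveys the idea; the documents are invented, and real Lucene adds tokenization, scoring, and on-disk index structures on top.

```python
# A tiny inverted index - the core data structure Lucene builds - showing
# how a text search engine maps terms back to documents. Toy data only.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy brown dog",
    3: "a quick recap",
}

# Index: for each term, record the set of document ids containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), set()))

print(search("quick"))
```

Query time is then a dictionary lookup rather than a scan of every document, which is why this structure underpins Mahout, Solr™, and Elasticsearch alike.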
I hope you will find these tools useful in examining the data sets in the National Data Science Bowl and in your everyday life as a data scientist. Good luck!