Tuesday, May 26, 2015

Python for Data Scientists - scikit-learn


Introduction

In the previous posts we covered the basics of data analysis. Now the gloves are off and here come the big guns: a machine learning library called scikit-learn.

scikit-learn has become one of the most popular open-source machine learning libraries for Python. It provides algorithms for machine learning tasks including classification, regression, dimensionality reduction, clustering and many more. It also provides modules for extracting features, processing data and evaluating models.
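To make the pieces above a bit more concrete, here is a minimal sketch that touches three of them: processing data (scaling), classification, and model evaluation. The choice of the bundled iris dataset, logistic regression, and the helper name train_and_score are assumptions made for illustration, not something this series prescribes, and the module layout is that of recent scikit-learn releases.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def train_and_score(random_state=0):
    """Fit a small classifier on iris and return its test accuracy.

    Hypothetical helper for illustration only.
    """
    # Load a small dataset that ships with scikit-learn.
    X, y = load_iris(return_X_y=True)

    # Hold out a quarter of the samples for evaluation.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state, stratify=y
    )

    # Processing step (scaling) chained with a classification step.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
    model.fit(X_train, y_train)

    # Evaluation step: fraction of correctly predicted test labels.
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"test accuracy: {train_and_score():.2f}")
```

Most estimators in the library follow the same fit / predict pattern, so swapping LogisticRegression for another classifier usually changes only one line.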

Installation

scikit-learn depends on both NumPy and SciPy, which we covered in earlier posts. Make sure to upgrade both to the latest version before installing the package, which is done, of course, with pip, the Python package manager.

pip install scikit-learn
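Once the install finishes, a quick sanity check is to import the package together with its two dependencies and print their versions. This is only a sketch; installed_versions is a hypothetical helper name, and the version strings you see will depend on your environment.

```python
import numpy
import scipy
import sklearn  # note: the import name is sklearn, not scikit-learn


def installed_versions():
    """Return the versions of scikit-learn and its two core dependencies."""
    return {
        "numpy": numpy.__version__,
        "scipy": scipy.__version__,
        "scikit-learn": sklearn.__version__,
    }


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

If any of the three imports fails, the corresponding package is missing or was installed into a different Python environment than the one running the script.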

Conclusion

scikit-learn covers a very broad spectrum of data science fields, each deserving a dedicated discussion. That is exactly what we're going to do over the next couple of sessions: dive deeper into each area of data analysis and see how scikit-learn helps us in each one.

This article concludes the Python for Data Scientists series, and we now have enough knowledge to dive deeper into the murky waters of data science.