The Science of SciPy
- Brandon Sodenkamp
- Dec 26, 2018
- 3 min read
Over the last decade or so, SciPy has become a fundamental library for data science in Python. A small group of three people began the project back in 2001, and one of them continued to maintain it until about 2010. A large community has since grown up around it, and today there are about 120 contributors per release and 36 people with commit rights. One of these core developers is Ralf Gommers, who joined us for a live webinar. He filled us in on where the project is headed and how it is helping to shape Python data science and scientific computing.
Building this project was not easy, and it draws on about 30-40 years of numerical algorithm development history. On the technical side, SciPy is built on five different languages: Python, Cython, C, C++, and Fortran. The rewards have been as great as the effort that went into building it, and nowhere is this more apparent than in the current user base. Since its release, it has had over 2.6 million downloads on anaconda.org alone and has found use in finance, education, consulting, the energy sector, and many other industries.
Looking toward the future, there is still a lot of work to be done. The project already provides many user-friendly and efficient routines for tasks such as numerical integration and optimization, and Ralf described where he would like it to go next. At the top of his wish list is improving the BLAS and LAPACK support. At the moment these components work and allow SciPy to provide linear algebra support, but there are opportunities to improve performance and stability. Beyond these basic improvements, there are also LAPACK features that have yet to be exposed. With so many other projects now depending on SciPy, which is itself built on NumPy, there is a sense of urgency to optimize the BLAS and LAPACK support.
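To give a feel for the kinds of routines mentioned here, the short sketch below calls one integration routine, one optimization routine, and one LAPACK-backed linear algebra routine. It was not part of the webinar; the functions and the small test problems are our own choice for illustration, and it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Numerical integration: integrate sin(x) from 0 to pi (the exact value is 2).
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Optimization: minimize a simple quadratic whose minimum sits at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# Linear algebra: solve A @ x = b. scipy.linalg hands this off to LAPACK.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = linalg.solve(A, b)

print("integral:", area)
print("minimizer:", result.x)
print("solution:", x)
```

Each call is a thin Python layer over compiled code, which is why the quality of the underlying BLAS and LAPACK builds matters so much for the linear algebra parts.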
Moving this project forward will require a focused effort from the community and from new contributors. A new and ambitious plan has been developed to provide support for distributed arrays and GPU arrays. Currently, there is nothing like this built into SciPy, but with NumPy splitting its API from its execution engine, it will become possible. The current plans would enable some parts of SciPy, but not all of it, to use distributed execution. One potential application is using SciPy for machine learning through GPU support, where it could work in tandem with tools like PyTorch or TensorFlow. This advancement would go a long way toward improving what users can do, but it will require a great deal of support to become a reality.
We enjoyed hearing from Ralf and appreciate his efforts to help keep the SciPy project prospering. With a list of new features on the docket, it will certainly be interesting to see in which direction the community helps this project evolve. When asked about a 2.0 release, Ralf could not help but laugh a little and say, “We spent 16 or 17 years getting to 1.0, so we are not yet thinking about 2.0.” With the continued support of community members new and old, and additional financial backers coming on board, SciPy's potential points to a bright open source future.