Thanks for following this link, I am going to walk you through my GSoC2024 experience and the project that I worked on under sktime organization.
Project Title
Scaling PyTorch backend and Darts integration
Overview
sktime is a Unified framework for machine learning with time series, that uses an interactive Python interpreter and supports various machine learning tasks including forecasting, time series classification, regression and clustering.
I spent the whole summer working on the mission of unifying machine learning with time series under the project entitled Scaling backends, foundation models, PyTorch, darts and pytorch-forecasting with sktime Where I mainly worked on integrating the darts Regression models, PyTorch implemented classification models, and deep diving into the structure of deep learning interface of sktime, debugging errors and suggesting improvements to the interface. Despite working on the project mentioned above my contributions went to regural software development lifecycle including the bugs fixing in the repository and improvement of development workflow such as implementing linting standards.
Key outcomes
- My community engagement made me learn a lot of new skills, and the team working spirit among the organization led to the successful completion of the project.
- Communicating with my mentors made me grow on a proffessional level and gave me a taste on how working on impactful project should feel.
- Solving complex bugs made me take enough time to deep dive into the codebase and gave me a quick understanding of the sktime code base.
- The whole program boosted personal confidence to the level that I wish to mentor open-source contributors in the future projects.
- Learned about a new machine Learning library(darts).
- Continued support for open-source through my contributions.
Technical outcomes of the project
Proposed project merged Pull requests
- Integration of the darts regression models adapter: Pull Request
- OSX system dependency installation There was main issue installing libraries like
xgboost
due tolibomp
installation and compiler flags settings.
In Progress Proposed project Pull requests
- Implementation of GRU-based time series classifiers: Pull Request
- Migration of MCNN classifier from
sktime-dl
tosktime
: Pull Request
Work Done
Darts Regression models adapter adapter
With sktime now one can quickly interface the darts library. As mentioned from the darts documentation it provides user-friendly forecasting and anomaly detection on time series through various models, and initially the implementation of the regression models adapter in sktime now gives the capability to interface all available regression models from darts that are: RegressionModel, LinearRegressionModel, and XGBModel, yet remaining models LightGBMModel, RandomForest and CatBoostModel needs an implementation classes as the adapter for these models is already implemented.
Challenges and takeaways
Getting this pull request merged took a lot of efforts as initially when implementing the adapter there were strange errors on the CI, and handling the dependencies inside
sktime
was problematic initially. After a thorough investigation the main issue was how sktime installsOSX
dependencies on CI and the pull request to resolve this issue which addresses the installation oflibomp
and setting the compiler flags properly made it possible to test this adapter on macos runners.There was another issue on how darts handles dataset indexes differently from sktime and resolving this took some efforts as I had to implement custom functions to handle such cases.
This task equiped me with the skill of diving into complex issues ranging from regular implementation to the testing environment.
The examples showcasing the use of darts models with sktime can be found in this notebook
Merged pull requests for maintainance work of the project
- Fix the colalign functionality to ScipyDist class as specified in the docstrings This pull request was aimed to correct the implementation as per the docstring of the
ScipyDist
class and it was a better way to get myself familiar with sktime codebase. - Fix the docs local build failure due to corrupt notebook Initially starting the implementation for darts regression models adapter, I faced an issue with building docs locally and I debugged the issue and found the solution to the problem. This Pull request indirectly contributed to the implementation of the darts regression models adapter as I was able to build docs locally before pushing changes for CI runs.
- Fix ForecastX when forecaster_X_exogeneous=“complement” This Pull request fixed the issue with columns complementation in the
ForecastX
class. - Fix HolidayFeatures crashes if dataframe doesn’t contain specified date Fixed the issue in crashing the forecaster when passed the dates that are out of the window, instead raising user warnings.
- Switch to ruff as linting tool Moving our linter to Ruff as it is fast.
- Resolve the issue with diacritics failing to be decoded on Windows This issue was problematic when writing diactritics in docstring, such as
Félix
and this PR enforced the character encoding to recognize these types of characters when doinggit diff
Work left to be done
- Finish GRU-based time series classifiers implementation
The Pull request is in the last stages to be merged, adding final touches to fix the CI errors being raised on macOS runners.
Having GRU classifiers will benefit by leveraging the power of GRU network architecture for univariate time series classifications.
- Migration of MCNN classifier from
sktime-dl
tosktime
This Pull Request was put on hold due to the dependency conflict between TensorFlow
and pandas
as mentioned in this Issue.
The above mentioned ongoing tasks includes to address the suggested refactor of the deep learning facing classes in sktime, and continous maintainance of impelemented modules to continue the support of open-source. This means that even though my GSoC period is coming to an end I will continue to contribute to the sktime project and interacting with the community.
What’s next
The first thing to do after the completion of the GSoC program, is to get the ongoing Pull Requests merged. As I am planning to stick around sktime I plan to start working on the rework of the Deep Learning submodule, by proposing the design document to have well organized roadmap for the deep learning interface of sktime.
I initially joined sktime with plan to deep dive into the research and methodology, I am going to work on that in the field of deep learning for time series classification.
Sktime is taking over the pytorch-forecasting
library as the main maintainer, my plan it to get involved in the maintainance activities of the library.
I plan to continue contributing to sktime in terms of code and community growth by participating in community activities and interacting with community members on discord.
Appreciation
First and foremost I thank GSoC organizers for the opportunity to enable open-source enthusiast to be involved in development of great freely available tools.
I would to express my sincere gratitude to my mentors Franz Kiraly, Anirban Ray, and Ugo Onyeka who made it possible to get the project done through day to day communication and continous feedback to my progress.
I would love to thank my fellow mentees whom we took this challenge together, they all have been helpful and we exchanged ideas throughout the program.
I thank all the sktime community as whole interacting with everyone on discord improved my way of communicating and asking questions without fear.