Machine learning and data science turned Python from a niche scripting language into one of the most popular developer ecosystems. From NumPy and pandas to scikit-learn, PyTorch and TensorFlow, to name but a few, there are some amazing open source Python frameworks out there. These projects have completely transformed what people can do with data. What used to be expensive, proprietary and arcane software is now one pip install away!
Hacktoberfest is a great excuse to get involved with Python data science and learn what all the excitement is about.
What is Hacktoberfest?
Hacktoberfest is an annual global hackathon event celebrating open source software, hosted by DigitalOcean in partnership with GitHub. The event was created to raise awareness for the open source community and encourage participation in open source projects. Participants are challenged to contribute a specific number of pull requests during the month of October to public open source repositories on GitHub in order to earn a limited-edition T-shirt and swag from the host and sponsors.
The catch is that the big-name frameworks are sophisticated and mature, frequently relying on optimized C/C++ code under the hood. But there is also the "long tail" of niche Python libraries and tools that focus on a specific data science task, and these might be an easier stepping stone for aspiring data scientists.
Two such libraries you can contribute to this Hacktoberfest are transitionMatrix (https://github.com/open-risk/transitionMatrix) and concentrationMetrics. Here is a brief description of what they are about and how you can contribute:
transitionMatrix
transitionMatrix is a library for the statistical analysis and visualization of state transition phenomena. It can be used to analyze any dataset that captures timestamped transitions in a discrete state space and to produce a transition matrix from it. You can use the library to:
- Estimate transition matrices from historical event data using a variety of estimators
- Manipulate transition matrices (generators, comparisons etc.)
- Visualize event data and transition matrices
- Provide standardized data sets for testing
- Model transitions using threshold processes
- Map credit ratings using mapping tables between popularly used rating systems
Use cases include credit rating transitions in finance and system state event logs, among others.
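To make the idea of "estimating a transition matrix" concrete, here is a minimal sketch in plain NumPy. It does not use the transitionMatrix API; the function name `estimate_transition_matrix` and the rating histories are hypothetical, and it assumes states are observed at regular time steps (a simple count-and-normalize estimator), which is only one of several possible approaches.

```python
import numpy as np


def estimate_transition_matrix(sequences, n_states):
    """Count observed one-step transitions and normalize each row.

    Hypothetical helper for illustration, not part of transitionMatrix.
    """
    counts = np.zeros((n_states, n_states))
    for seq in sequences:
        # Pair each state with the state observed one step later
        for origin, destination in zip(seq[:-1], seq[1:]):
            counts[origin, destination] += 1
    row_totals = counts.sum(axis=1, keepdims=True)
    # States that were never left keep an all-zero row
    return np.divide(
        counts, row_totals, out=np.zeros_like(counts), where=row_totals > 0
    )


# Made-up rating histories: 0 = good, 1 = watch, 2 = default
histories = [
    [0, 0, 1, 1, 2],
    [0, 1, 0, 0, 0],
    [1, 1, 2],
]
matrix = estimate_transition_matrix(histories, n_states=3)
print(matrix)
```

Each row of the result holds the observed frequencies of moving from one state to every other state. The last row stays at zero here because no history ever leaves state 2.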
concentrationMetrics
concentrationMetrics is a Python library for the computation of various concentration, diversification and inequality indices. You can use concentrationMetrics to:
- access an exhaustive collection of such indexes and metrics
- perform file input/output in both json and csv formats
- compute indexes with confidence intervals via bootstrapping
- visualize using matplotlib
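As an illustration of what a concentration index with a bootstrapped confidence interval looks like, here is a minimal NumPy sketch. It does not use the concentrationMetrics API; the function names `hhi` and `bootstrap_interval` and the exposure figures are hypothetical. The index shown is the Herfindahl-Hirschman index (the sum of squared shares), and the interval is a simple percentile bootstrap.

```python
import numpy as np


def hhi(values):
    """Herfindahl-Hirschman index: the sum of squared shares."""
    values = np.asarray(values, dtype=float)
    shares = values / values.sum()
    return float(np.sum(shares ** 2))


def bootstrap_interval(values, index_fn, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for an index (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Recompute the index on samples drawn with replacement
    estimates = np.array([
        index_fn(rng.choice(values, size=values.size, replace=True))
        for _ in range(n_resamples)
    ])
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(estimates, [tail, 1.0 - tail])
    return float(lower), float(upper)


# Made-up portfolio exposures
exposures = [40.0, 30.0, 20.0, 10.0]
index_value = hhi(exposures)
lower, upper = bootstrap_interval(exposures, hhi)
print(f"HHI = {index_value:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

For these exposures the shares are 0.4, 0.3, 0.2 and 0.1, so the index is 0.30. With only four observations the bootstrap interval is very rough; real data sets would have far more entries.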
How you can contribute
Afterwards:
- fork the repos from the above links
- look at the code and documentation, and try the examples
- find bugs or other problems and report them as issues
- think and work on possible extensions, better documentation or any other ideas that fit within the scope of each library
- eventually contribute via a pull request
- get a tree planted in your name, or the Hacktoberfest 2022 t-shirt :-)
Good luck, enjoy Hacktoberfest, and hope to see you around the Python metaverse!