Client: FOSSA
Case Study: Vulnerability Data Enrichment
Package data is fragmented and hard to wrangle. The FOSSA Package Observer web application was developed to simplify and centralize detailed information on software packages across various programming languages. The project goal was a unified platform where developers, security teams, and open-source enthusiasts could easily access comprehensive package details, including version history, dependencies, and GitHub metrics.
Built using the Flask framework and hosted on PythonAnywhere, this web app combines API integrations and web scraping to provide an in-depth view of software packages across a variety of ecosystems.
Deliverables
- Flask Application: A custom-built web application hosted on PythonAnywhere.
- Data Aggregation and Caching: Integration with multiple APIs and web scraping to fetch and cache package information.
- GitHub Integration: Extraction and display of repository details.
- Responsive UI: A user-friendly interface built with templates for a seamless user experience.
Audience Profile
The primary audiences for the FOSSA Package Observer are:
- Developers: Individuals seeking insights into the packages they use or plan to use in their projects.
- Security Teams: Professionals responsible for assessing software vulnerabilities and dependencies within their software supply chain.
- Open-Source Contributors: Contributors and maintainers looking to analyze and showcase the impact and usage of their projects.
Orbital Analysis
In designing the FOSSA Package Observer, the team adopted a modular approach that catered to the needs of developers and security analysts. The application supports a diverse range of programming languages and platforms, including JavaScript/TypeScript, Python, Go, Haskell, Ruby, Java, Kotlin, Scala, PHP, Rust, C#, and Perl.
The vulnerability data arrives as JSON files exported directly from FOSSA’s SaaS platform. These files are uploaded to the server daily via GitHub, and the Flask app automatically generates the individual package pages, the server cache, and the sitemap on the fly.
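As a rough illustration of the sitemap step, the sketch below reads a directory of JSON files and emits one sitemap entry per package. It is not the project's actual code: the field names `ecosystem` and `name`, the `/packages/...` URL scheme, and the base URL are all assumptions made for the example.

```python
import json
from pathlib import Path
from urllib.parse import quote
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def load_packages(data_dir):
    """Read every *.json file in data_dir into a list of dicts."""
    packages = []
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            packages.append(json.load(fh))
    return packages


def package_path(pkg):
    """Build the URL path for one package page (hypothetical URL scheme)."""
    ecosystem = quote(pkg["ecosystem"], safe="")
    name = quote(pkg["name"], safe="")  # escapes names such as "@types/node"
    return f"/packages/{ecosystem}/{name}"


def build_sitemap(packages, base_url):
    """Return sitemap XML with one <url> entry per package."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for pkg in packages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + package_path(pkg)
    return ET.tostring(urlset, encoding="unicode")
```

In a Flask app, a function like `build_sitemap` could back a `/sitemap.xml` route, so the sitemap always reflects whatever JSON files were uploaded most recently.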
The aggregation logic behind the platform leverages a mix of APIs and web scraping (using Selenium) to ensure comprehensive data coverage. For platforms where API data is limited, web scraping techniques are employed to extract additional information directly from the repositories. This dual approach enables the application to offer a more complete dataset, including release dates, dependencies, and installation instructions.
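The API-first, scrape-as-fallback idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the app's implementation: `fetch_api` and `fetch_scraped` are hypothetical callables standing in for a registry API client and a Selenium-based scraper, and the field names are invented for the example.

```python
REQUIRED_FIELDS = ("release_date", "dependencies", "install_command")


def fetch_package_info(name, fetch_api, fetch_scraped, required=REQUIRED_FIELDS):
    """Merge API data with scraped data, scraping only when fields are missing.

    fetch_api and fetch_scraped are hypothetical callables that take a
    package name and return a dict (or None).
    """
    info = {}
    try:
        info.update(fetch_api(name) or {})
    except Exception:
        # Broad catch for the sketch: treat any API failure as "no API data"
        # and let the scraper fill in what it can.
        pass

    # A field counts as missing only if it is absent or None, so an empty
    # dependency list from the API is still accepted as valid data.
    missing = [field for field in required if info.get(field) is None]
    if missing:
        scraped = fetch_scraped(name) or {}
        for field in missing:
            if scraped.get(field) is not None:
                info[field] = scraped[field]
    return info
```

Values from the API take precedence here, and the scraper runs only when something is missing, which keeps the slower browser-driven path off the common case.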
The application’s caching mechanism was designed to optimize performance by reducing redundant API calls, ensuring a responsive user experience even when dealing with large datasets. Customizable environment variables in the .env file allow for easy configuration and management of API keys, GitHub secrets, and caching parameters.
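A time-based cache configured from the environment might look like the sketch below. The variable name `CACHE_TTL_SECONDS` and the `TTLCache` class are hypothetical, and the sketch assumes the .env file has already been loaded into the process environment (for example by a loader such as python-dotenv).

```python
import os
import time


def cache_ttl_from_env(default=3600):
    """Read the cache lifetime in seconds from a hypothetical env variable."""
    return int(os.environ.get("CACHE_TTL_SECONDS", default))


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store = {}    # key -> (time stored, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch(key) on miss or expiry."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

A typical call would be `cache = TTLCache(cache_ttl_from_env())` followed by `cache.get_or_fetch("flask", fetch_from_api)`, where `fetch_from_api` is whatever function performs the real lookup; repeated requests within the TTL then skip the API call entirely.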
Future Trajectories
Currently, the FOSSA Package Observer is in its Minimum Viable Product (MVP) phase, serving as a foundation for future enhancements and broader vulnerability data coverage. Planned updates include adding vulnerability data for many more packages, implementing advanced filtering and search capabilities, and surfacing discovered licenses beyond the ones declared by the developer.
By providing a single, comprehensive source of package information, the FOSSA Package Observer aims to become an essential tool for developers and security teams, helping them make informed decisions about the software components they rely on.
Website design for FOSSA Package Observer by Bárbara Mercedes Muñoz Cruzado.