Exploring the Lifecycle and Maintenance Practices of Pre-Trained Models in Open-Source Software
Pre-trained models (PTMs) are becoming a common component in open-source software (OSS) development, yet their roles, maintenance practices, and lifecycle challenges remain underexplored. This report presents a plan for an exploratory study of how PTMs are used, maintained, and tested in OSS projects, focusing on models hosted on platforms such as Hugging Face and PyTorch Hub. We plan to mine software repositories that use PTMs and analyze their code bases, historical data, and reported issues to characterize usage and maintenance practices. The study aims to provide actionable insights for improving the use and sustainability of PTMs in open-source projects, and to lay a foundation for advancing software engineering practices around model dependencies.
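As a rough illustration of the repository-mining step (not the study's actual tooling), the sketch below uses PyDriller to walk a repository's commit history and flag commits that touch code loading models from Hugging Face or PyTorch Hub. The repository URL and the API patterns searched for are placeholders chosen for illustration.

```python
# Sketch: detect commits that introduce or modify PTM-loading code in an OSS repo.
# Assumes PyDriller is installed (pip install pydriller); the repo URL is hypothetical.
from pydriller import Repository

# Illustrative patterns for PTM usage via Hugging Face and PyTorch Hub.
PTM_PATTERNS = ("from_pretrained(", "torch.hub.load(", "pipeline(")

repo_url = "https://github.com/example/ptm-using-project"  # placeholder repository

for commit in Repository(repo_url).traverse_commits():
    for mf in commit.modified_files:
        code = mf.source_code or ""  # new version of the file, if available
        if any(pattern in code for pattern in PTM_PATTERNS):
            print(f"{commit.hash[:8]} {commit.author_date.date()} touches PTM code in {mf.filename}")
```

In a full study, the matched commits and files would feed further analysis of historical data and reported issues.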
Research Focus:
Profiling key characteristics of PTMs (licensing, domains, size, architecture) used in open-source projects (see the metadata sketch after this list).
Analyzing integration and usage patterns of PTMs, including roles in core functionality and loading strategies.
Investigating the lifecycle and maintenance of PTMs: longevity, evolution, and update frequency.
Assessing testing practices for PTM components: coverage analysis and test case evaluation.
Examining issue trackers to uncover common challenges and support needs for PTM usage.
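As a minimal sketch of how the profiling and lifecycle items above could be operationalized, the snippet below queries the Hugging Face Hub for one model's metadata (license tag, task/domain, approximate repository size, last-modified date). The model id is only an example, and the fields available can vary across models and library versions.

```python
# Sketch: profile basic characteristics of a PTM hosted on the Hugging Face Hub.
# Assumes huggingface_hub is installed (pip install huggingface_hub); the model id is an example.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("bert-base-uncased", files_metadata=True)

license_tags = [t for t in (info.tags or []) if t.startswith("license:")]
total_size_mb = sum((f.size or 0) for f in (info.siblings or [])) / 1e6

print("task/domain:", info.pipeline_tag)        # e.g. "fill-mask"
print("license tags:", license_tags)            # e.g. ["license:apache-2.0"]
print("approx. repo size (MB):", round(total_size_mb, 1))
print("last modified:", info.last_modified)     # rough proxy for update recency
```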
Machine learning models are increasingly used in high-stakes decision making, from credit approvals to criminal justice; however, they often produce biased outcomes that can disproportionately harm marginalized groups. Although many bias mitigation techniques have been proposed, practitioners still have little practical guidance on selecting appropriate methods, applying them correctly, understanding their limitations, and anticipating their trade-offs.
This project conducts a large-scale evaluation of ten state-of-the-art bias mitigation methods across diverse real-life datasets, models, and fairness metrics. By systematically analyzing the impact and robustness of these methods under real-world conditions, we aim to support practitioners in making informed and responsible choices when applying fairness interventions.
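To illustrate the kind of evaluation pipeline involved (a simplified sketch, not the actual experimental setup or one of the ten methods under study), the example below trains a baseline classifier, applies one reduction-based mitigation from fairlearn, and compares accuracy against the demographic parity difference. Synthetic data and a random sensitive attribute stand in for the real datasets and demographic groups.

```python
# Sketch: compare fairness and accuracy before/after one example bias mitigation method.
# Uses fairlearn's ExponentiatedGradient reduction on synthetic data for illustration only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from fairlearn.metrics import demographic_parity_difference
from fairlearn.reductions import DemographicParity, ExponentiatedGradient

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
sensitive = rng.integers(0, 2, size=len(y))  # stand-in for a demographic attribute

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sensitive, test_size=0.3, random_state=0
)

# Baseline model with no mitigation.
baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Reduction-based mitigation enforcing a demographic-parity constraint.
mitigator = ExponentiatedGradient(
    LogisticRegression(max_iter=1000), constraints=DemographicParity()
)
mitigator.fit(X_tr, y_tr, sensitive_features=s_tr)

for name, model in [("baseline", baseline), ("mitigated", mitigator)]:
    pred = model.predict(X_te)
    acc = accuracy_score(y_te, pred)
    dpd = demographic_parity_difference(y_te, pred, sensitive_features=s_te)
    print(f"{name}: accuracy={acc:.3f}, demographic parity difference={dpd:.3f}")
```

Comparing the two printed lines gives the fairness/performance trade-off for this single method; the planned study repeats this comparison across methods, datasets, models, and metrics.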
Research Focus:
How effective are bias mitigation techniques at reducing unfair outcomes across different datasets and demographic groups?
What is the impact of bias mitigation techniques on machine learning model performance, and how significant are the trade-offs when improving fairness?
Which bias mitigation techniques are most likely to produce favorable outcomes, where both fairness and performance improve?
How robust are bias mitigation techniques when the data distribution shifts (data drift) and when applied across model variants through fine-tuning?
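To illustrate the kind of robustness check the last question implies (a toy covariate-shift simulation, not the study's actual drift methodology), the sketch below perturbs one feature of a held-out split and compares a fixed model's fairness metric before and after the shift.

```python
# Sketch: re-evaluate a trained model's fairness under a simulated covariate shift.
# The shift (offsetting one feature) is a toy stand-in for real data drift.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from fairlearn.metrics import demographic_parity_difference

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
sensitive = rng.integers(0, 2, size=len(y))

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(
    X, y, sensitive, test_size=0.3, random_state=1
)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Simulated drift: shift the first feature of the test set.
X_drift = X_te.copy()
X_drift[:, 0] += 1.5

for label, X_eval in [("original test set", X_te), ("shifted test set", X_drift)]:
    dpd = demographic_parity_difference(
        y_te, model.predict(X_eval), sensitive_features=s_te
    )
    print(f"{label}: demographic parity difference={dpd:.3f}")
```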