
8 posts tagged with "paper"


One min read

How can OSS maintainers better understand the communities that depend on their work? While thousands of projects build on top of open libraries, maintainers often have limited visibility into how their code is actually being used. This gap matters: testing practices may overlook the very parts of the library most critical to dependents.

We introduce analytics that surface which features are most used by dependent projects and how well those features are tested.

Our analysis finds that not all community-used APIs are fully reflected in maintainers’ test suites, pointing to gaps that could inform more targeted maintenance strategies. We observe that while maintainers provide extensive tests, their unit test suites do not always extend to every API most relied upon by dependents.
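As a rough illustration of the idea (a minimal sketch with hypothetical API names and counts, not our actual tooling), such analytics boil down to cross-referencing how often dependents call each API with whether the library’s own tests exercise it:

```python
# Minimal sketch: surface heavily used APIs that the library's own
# test suite does not cover. All names and counts are hypothetical.
from collections import Counter

# API usage mined from dependent projects (e.g., by parsing their source).
usage_in_dependents = Counter({
    "parse": 1200,   # called by many dependents
    "render": 950,
    "validate": 40,
})

# APIs exercised by the library's own test suite (e.g., from coverage data).
covered_by_tests = {"parse", "validate"}

for api, count in usage_in_dependents.most_common():
    status = "covered" if api in covered_by_tests else "NOT covered"
    print(f"{api}: used {count} times by dependents, {status} by tests")
```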


Interested? You can find a pre-print of our paper here.

One min read

We are happy to announce that our paper "Beyond More Context: How Granularity and Order Drive Code Completion Quality" was accepted at the Context Competition Challenge Workshop, co-located with ASE 2025. This work was authored by Uswat Yusuf during her internship at RealiseLab last summer.

Context results

The competition challenged participants to develop strategies for gathering code context to maximize the performance of code completion models, based on a baseline provided by JetBrains. Our team achieved third place in the competition! We experimented with file chunking and chunk ordering on both Python and Kotlin source files, and found that chunk-level retrieval outperforms file-level retrieval.

  • Ordering also matters: presenting the retrieved chunks in reverse order, so the most relevant chunks sit closest to the completion point, yielded measurable gains, consistent with the recency bias of completion models (see the sketch below).
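For the curious, here is a minimal sketch of chunk-level retrieval with reversed ordering; the token-overlap scoring is only a stand-in for a real retriever, and all names are illustrative:

```python
# Minimal sketch: split files into chunks, retrieve the most relevant
# ones for a completion prefix, and order them most-relevant-last.

def chunk(text: str, lines_per_chunk: int = 10) -> list[str]:
    """Split a file into fixed-size line chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def score(chunk_text: str, prefix: str) -> float:
    """Crude relevance: token overlap with the completion prefix."""
    a, b = set(chunk_text.split()), set(prefix.split())
    return len(a & b) / (len(a | b) or 1)

def build_context(files: list[str], prefix: str, k: int = 4) -> str:
    chunks = [c for f in files for c in chunk(f)]
    top = sorted(chunks, key=lambda c: score(c, prefix), reverse=True)[:k]
    # Reverse so the most relevant chunk sits closest to the completion
    # point, exploiting the model's recency bias.
    return "\n\n".join(reversed(top))
```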

Interested? You can find a pre-print of our paper here.

You can also find the paper that describes all approaches here.

One min read

What are the trade-offs of relying heavily on Free and Open-Source Software (FOSS) components to develop your own software system? How much faster can you ship your code to production, and what security risks might you expose your system to?

  • Inspired by the work of Massacci and Pashchenko, who used technical leverage to assess this trade-off in the Java ecosystem, we perform a large-scale analysis of the opportunities and risks of technical leverage in the JavaScript ecosystem (a minimal sketch of the metric follows this list).
  • Our models indicate that relying heavily on FOSS shortens the release cycles of small libraries, but at the cost of significantly higher (4-7x) vulnerability exposure.
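Technical leverage, as defined by Massacci and Pashchenko, is the ratio of third-party code shipped with a library to the library’s own code. A minimal sketch, with hypothetical LOC figures:

```python
# Minimal sketch of technical leverage: lines of FOSS dependency code
# divided by lines of own code. The LOC figures below are hypothetical.

def technical_leverage(dependency_loc: int, own_loc: int) -> float:
    """Ratio of shipped third-party code to own code."""
    return dependency_loc / own_loc

# A small library leaning heavily on FOSS:
print(technical_leverage(dependency_loc=50_000, own_loc=2_000))  # 25.0
```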

Interested? You can find a pre-print of our paper here.

4 min read

What is the paper about?

Performance regressions in software systems can lead to significant financial losses and degraded user satisfaction, making their early detection and mitigation critical.

One of the major issues encountered in both academia and industry is access to industrial data, and it is especially acute in performance engineering, as such datasets contain information about a company's internal systems. To address this gap, we introduce a unique dataset of performance measurements and alerts from Mozilla, aimed at advancing research in performance engineering, anomaly detection, and machine learning.

Collected from Mozilla Firefox’s testing systems, the dataset contains (a brief exploration sketch follows the list):

  • 5,655 time series
  • 17,989 performance alerts
  • Annotations validated by Mozilla engineers, spanning May 2023 – May 2024
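As a minimal exploration sketch only, assuming the dataset ships as CSV files; the file names and column names below are hypothetical, not the dataset’s actual schema:

```python
# Hypothetical sketch of exploring the dataset with pandas; file and
# column names are illustrative assumptions, not the released schema.
import pandas as pd

series = pd.read_csv("timeseries.csv", parse_dates=["push_timestamp"])
alerts = pd.read_csv("alerts.csv", parse_dates=["created"])

print(series["signature_id"].nunique())   # ~5,655 time series
print(len(alerts))                        # ~17,989 performance alerts

# Join alerts to their time series to study the annotated regressions.
merged = alerts.merge(series, on="signature_id", how="left")
```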

One min read

How do smaller, open-source Large Language Models compare to ChatGPT at refining code? Our study dives into code reviews, a cornerstone of modern software development. While code reviews are indispensable for ensuring quality and transferring knowledge, they can also become bottlenecks in large-scale projects.

  • Inspired by a recent paper by Guo et al., we explore how open-source models like CodeLlama and Llama 2 (7B parameters) measure up against proprietary solutions like ChatGPT for automating code refinement tasks (a sketch of this kind of prompt follows the list).
  • Our findings show that with proper tuning, these open-source models can offer an interesting balance between performance, cost-efficiency, and privacy.
  • This research not only opens doors for privacy-conscious and cost-effective solutions but also sheds light on where current AI models shine—and where they still need a human touch.
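As a minimal sketch of this kind of refinement prompt, using the Hugging Face transformers pipeline; the prompt format and example are illustrative, not the paper’s exact setup:

```python
# Minimal sketch: ask an open-source code model to refine a snippet
# according to a reviewer comment. The prompt wording is illustrative.
# Note: loading a 7B model requires a sizeable download and GPU memory.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="codellama/CodeLlama-7b-Instruct-hf")

prompt = (
    "Refine the following code according to the review comment.\n"
    "Review comment: use a context manager to close the file.\n"
    "Code:\n"
    "f = open('data.txt')\n"
    "data = f.read()\n"
    "Refined code:\n"
)
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```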

Interested? You can find a pre-print of our paper here. Our replication package is available here.

One min read

The efficiency of a Pull Request (PR) process hinges on how quickly maintainers and contributors respond to each other. Knowing how long this might take can improve interactions and manage expectations.

Our new study introduces a machine-learning method to predict these response times by analyzing data from 20 popular open-source GitHub projects. We examined various features of the projects and PRs and identified key factors that influence response times (a toy sketch of such a model follows the list below).

  • PRs submitted earlier in the week, with a moderate number of commits and clear descriptions, tend to get quicker responses.
  • Contributors who are more engaged and have a good track record also tend to respond faster.
  • We also highlight how understanding and predicting response times can enhance the PR review process.
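As a toy sketch only: training a regression model on hypothetical PR features. These are not the paper’s actual features or data:

```python
# Toy sketch: predict PR response time (hours) from PR features with a
# random-forest regressor. All features and values are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

prs = pd.DataFrame({
    "day_of_week":        [0, 1, 4, 2, 5, 0],    # 0 = Monday
    "num_commits":        [2, 3, 15, 1, 8, 4],
    "description_length": [300, 450, 20, 500, 60, 350],
    "author_prev_prs":    [12, 30, 1, 25, 3, 18],
    "response_hours":     [4, 2, 48, 3, 36, 5],  # target
})

X, y = prs.drop(columns="response_hours"), prs["response_hours"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print(model.predict(X_test))
```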

Interested? You can find a pre-print of our paper here.

One min read

Finding performance regressions usually requires the execution of long and costly performance test suites, because performance tests often have to exercise the system end-to-end. Could we reduce testing costs by testing locally (e.g., a module, a service, or a method) and using a model to predict the impact of local changes on the system as a whole?

Our new paper proposes exactly this! The paper entitled "Early Detection of Performance Regressions by Bridging Local Performance Data and Architectural Models" has been accepted at the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025).

We are currently finalizing the camera-ready version of the paper, and we will share the preprint soon. Stay tuned for more updates!
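In the meantime, here is a rough illustration of the general idea (not the paper’s actual method, which is not yet public): propagate a locally measured latency change through a call-graph model of the architecture to estimate the end-to-end impact. All call counts and latencies below are hypothetical:

```python
# Rough, hypothetical illustration: estimate the end-to-end impact of a
# local slowdown using an architectural model of per-request call counts.

# How many times one end-to-end request hits each component, and each
# component's baseline latency in milliseconds (both hypothetical).
calls_per_request = {"auth": 1, "search": 3, "render": 1}
baseline_ms       = {"auth": 10, "search": 40, "render": 25}

def end_to_end_ms(latency_ms: dict[str, float]) -> float:
    return sum(calls_per_request[c] * latency_ms[c] for c in latency_ms)

# A local benchmark shows a change slows "search" from 40 ms to 48 ms.
changed_ms = {**baseline_ms, "search": 48}

before, after = end_to_end_ms(baseline_ms), end_to_end_ms(changed_ms)
print(f"estimated end-to-end regression: {after - before:.0f} ms "
      f"({after / before - 1:.1%})")
```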

One min read

Are you interested in training chatbots for Software Engineering tasks? Our paper "A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets" has been accepted at the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024).

  • In this paper, we propose an approach to augment chatbot training datasets tailored for Software Engineering tasks (a minimal augmentation sketch follows).
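As a minimal sketch of this kind of augmentation, paraphrasing existing user queries with an off-the-shelf seq2seq paraphraser; the model name is illustrative, and this is not the paper’s exact pipeline:

```python
# Minimal sketch: grow a chatbot training set by paraphrasing seed
# queries with a transformer paraphraser (model choice is illustrative).
from transformers import pipeline

paraphraser = pipeline("text2text-generation",
                       model="humarin/chatgpt_paraphraser_on_T5_base")

seed_queries = ["How do I revert the last commit?"]
augmented = []
for query in seed_queries:
    outputs = paraphraser(query, num_return_sequences=3, num_beams=5,
                          max_new_tokens=40)
    augmented += [o["generated_text"] for o in outputs]

print(augmented)  # paraphrases to add alongside the original intent label
```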

Interested? You can find a pre-print of our paper here.