
8 posts tagged with "paper"


One min read

How can OSS maintainers better understand the communities that depend on their work? While thousands of projects build on top of open libraries, maintainers often have limited visibility into how their code is actually being used. This gap matters: testing practices may overlook the very parts of the library most critical to dependents.

We introduce analytics that surface which features are most used by dependent projects and how well those features are tested.

Our analysis finds that not all community-used APIs are fully reflected in maintainers’ test suites, pointing to gaps that could inform more targeted maintenance strategies. We observe that while maintainers provide extensive tests, their unit test suites do not always extend to every API most relied upon by dependents.
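As a rough illustration of the idea (a minimal sketch with hypothetical API names and counts, not our actual tooling), such analytics boil down to cross-referencing how often dependents call each API with whether the library’s own tests exercise it:

```python
# Minimal sketch: surface heavily used APIs that the library's own
# test suite does not cover. All names and counts are hypothetical.
from collections import Counter

# API usage mined from dependent projects (e.g., by parsing their source).
usage_in_dependents = Counter({
    "parse": 1200,   # called by many dependents
    "render": 950,
    "validate": 40,
})

# APIs exercised by the library's own test suite (e.g., from coverage data).
covered_by_tests = {"parse", "validate"}

for api, count in usage_in_dependents.most_common():
    status = "covered" if api in covered_by_tests else "NOT covered"
    print(f"{api}: used {count} times by dependents, {status} by tests")
```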


Interested? You can find a pre-print of our paper here.

One min read

We are happy to announce that our paper "Beyond More Context: How Granularity and Order Drive Code Completion Quality" was accepted at the Context Competition Challenge Workshop, co-located with ASE 2025. This work was authored by Uswat Yusuf during her internship at RealiseLab last summer.

Context results

The competition challenged participants to develop strategies for gathering code context to maximize the performance of code completion models, based on a baseline provided by JetBrains. Our team achieved third place in the competition! We experimented with file chunking and chunk ordering on both Python and Kotlin source files, and found that chunk-level retrieval outperforms file-level retrieval.

  • Ordering also matters: presenting the retrieved chunks in reverse order, so the most relevant chunks sit closest to the completion point, yielded measurable gains, consistent with the recency bias of completion models (see the sketch below).
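For the curious, here is a minimal sketch of chunk-level retrieval with reversed ordering; the token-overlap scoring is only a stand-in for a real retriever, and all names are illustrative:

```python
# Minimal sketch: split files into chunks, retrieve the most relevant
# ones for a completion prefix, and order them most-relevant-last.

def chunk(text: str, lines_per_chunk: int = 10) -> list[str]:
    """Split a file into fixed-size line chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def score(chunk_text: str, prefix: str) -> float:
    """Crude relevance: token overlap with the completion prefix."""
    a, b = set(chunk_text.split()), set(prefix.split())
    return len(a & b) / (len(a | b) or 1)

def build_context(files: list[str], prefix: str, k: int = 4) -> str:
    chunks = [c for f in files for c in chunk(f)]
    top = sorted(chunks, key=lambda c: score(c, prefix), reverse=True)[:k]
    # Reverse so the most relevant chunk sits closest to the completion
    # point, exploiting the model's recency bias.
    return "\n\n".join(reversed(top))
```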

Interested? You can find a pre-print of our paper here.

You can also find the paper that describes all approaches here.

One min read

What are the trade-offs of relying heavily on Free and Open-Source Software (FOSS) components to develop your own software system? How much faster can you ship your code to production, and what security risks might you expose your system to?

  • Inspired by the work of Massacci and Pashchenko, who used technical leverage to assess this trade-off in the Java ecosystem, we perform a large-scale analysis of the opportunities and risks of technical leverage in the JavaScript ecosystem (a minimal sketch of the metric follows this list).
  • Our models indicate that relying heavily on FOSS shortens the release cycles of small libraries, but at the cost of significantly higher (4-7x) vulnerability exposure.
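Technical leverage, as defined by Massacci and Pashchenko, is the ratio of third-party code shipped with a library to the library’s own code. A minimal sketch, with hypothetical LOC figures:

```python
# Minimal sketch of technical leverage: lines of FOSS dependency code
# divided by lines of own code. The LOC figures below are hypothetical.

def technical_leverage(dependency_loc: int, own_loc: int) -> float:
    """Ratio of shipped third-party code to own code."""
    return dependency_loc / own_loc

# A small library leaning heavily on FOSS:
print(technical_leverage(dependency_loc=50_000, own_loc=2_000))  # 25.0
```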

Interested? You can find a pre-print of our paper here.

4 min read

What is the paper about?

Performance regressions in software systems can lead to significant financial losses and degraded user satisfaction, making their early detection and mitigation critical.

One of the major issues encountered in both academia and industry is access to industrial data, and it is especially acute in performance engineering, as such datasets contain information about a company's internal systems. To address this gap, we introduce a unique dataset of performance measurements and alerts from Mozilla, aimed at advancing research in performance engineering, anomaly detection, and machine learning.

Collected from Mozilla Firefox’s testing systems, the dataset contains (a brief exploration sketch follows the list):

  • 5,655 time series
  • 17,989 performance alerts
  • Annotations validated by Mozilla engineers, spanning May 2023 – May 2024
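As a minimal exploration sketch only, assuming the dataset ships as CSV files; the file names and column names below are hypothetical, not the dataset’s actual schema:

```python
# Hypothetical sketch of exploring the dataset with pandas; file and
# column names are illustrative assumptions, not the released schema.
import pandas as pd

series = pd.read_csv("timeseries.csv", parse_dates=["push_timestamp"])
alerts = pd.read_csv("alerts.csv", parse_dates=["created"])

print(series["signature_id"].nunique())   # ~5,655 time series
print(len(alerts))                        # ~17,989 performance alerts

# Join alerts to their time series to study the annotated regressions.
merged = alerts.merge(series, on="signature_id", how="left")
```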

One min read

How do smaller, open-source Large Language Models compare to ChatGPT at refining code? Our study dives into code reviews, a cornerstone of modern software development. While code reviews are indispensable for ensuring quality and transferring knowledge, they can also become bottlenecks in large-scale projects.

  • Inspired by a recent paper by Guo et al., we explore how open-source models like CodeLlama and Llama 2 (7B parameters) measure up against proprietary solutions like ChatGPT for automating code refinement tasks (a sketch of this kind of prompt follows the list).
  • Our findings show that with proper tuning, these open-source models can offer an interesting balance between performance, cost-efficiency, and privacy.
  • This research not only opens doors for privacy-conscious and cost-effective solutions but also sheds light on where current AI models shine—and where they still need a human touch.
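As a minimal sketch of this kind of refinement prompt, using the Hugging Face transformers pipeline; the prompt format and example are illustrative, not the paper’s exact setup:

```python
# Minimal sketch: ask an open-source code model to refine a snippet
# according to a reviewer comment. The prompt wording is illustrative.
# Note: loading a 7B model requires a sizeable download and GPU memory.
from transformers import pipeline

generator = pipeline("text-generation",
                     model="codellama/CodeLlama-7b-Instruct-hf")

prompt = (
    "Refine the following code according to the review comment.\n"
    "Review comment: use a context manager to close the file.\n"
    "Code:\n"
    "f = open('data.txt')\n"
    "data = f.read()\n"
    "Refined code:\n"
)
print(generator(prompt, max_new_tokens=128)[0]["generated_text"])
```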

Interested? You can find a pre-print of our paper here. Our replication package is available here.

One min read

The efficiency of a Pull Request (PR) process hinges on how quickly maintainers and contributors respond to each other. Knowing how long this might take can improve interactions and manage expectations.

Our new study introduces a machine-learning method to predict these response times by analyzing data from 20 popular open-source GitHub projects. We examined various features of the projects and PRs and identified key factors that influence response times (a toy sketch of such a model follows the list below).

  • PRs submitted earlier in the week, with a moderate number of commits and clear descriptions, tend to get quicker responses.
  • Contributors who are more engaged and have a good track record also tend to respond faster.
  • We also highlight how understanding and predicting response times can enhance the PR review process.
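As a toy sketch only: training a regression model on hypothetical PR features. These are not the paper’s actual features or data:

```python
# Toy sketch: predict PR response time (hours) from PR features with a
# random-forest regressor. All features and values are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

prs = pd.DataFrame({
    "day_of_week":        [0, 1, 4, 2, 5, 0],    # 0 = Monday
    "num_commits":        [2, 3, 15, 1, 8, 4],
    "description_length": [300, 450, 20, 500, 60, 350],
    "author_prev_prs":    [12, 30, 1, 25, 3, 18],
    "response_hours":     [4, 2, 48, 3, 36, 5],  # target
})

X, y = prs.drop(columns="response_hours"), prs["response_hours"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
print(model.predict(X_test))
```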

Interested? You can find a pre-print of our paper here.

One min read

Finding performance regressions usually requires the execution of long and costly performance test suites, because performance tests often have to exercise the system end-to-end. Could we reduce testing costs by testing locally (e.g., a module, a service, or a method) and using a model to predict the impact of local changes on the system as a whole?

Our new paper proposes exactly this! The paper entitled "Early Detection of Performance Regressions by Bridging Local Performance Data and Architectural Models" has been accepted at the 47th IEEE/ACM International Conference on Software Engineering (ICSE 2025).

We are currently finalizing the camera-ready version of the paper, and we will share the preprint soon. Stay tuned for more updates!
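In the meantime, here is a rough illustration of the general idea (not the paper’s actual method, which is not yet public): propagate a locally measured latency change through a call-graph model of the architecture to estimate the end-to-end impact. All call counts and latencies below are hypothetical:

```python
# Rough, hypothetical illustration: estimate the end-to-end impact of a
# local slowdown using an architectural model of per-request call counts.

# How many times one end-to-end request hits each component, and each
# component's baseline latency in milliseconds (both hypothetical).
calls_per_request = {"auth": 1, "search": 3, "render": 1}
baseline_ms       = {"auth": 10, "search": 40, "render": 25}

def end_to_end_ms(latency_ms: dict[str, float]) -> float:
    return sum(calls_per_request[c] * latency_ms[c] for c in latency_ms)

# A local benchmark shows a change slows "search" from 40 ms to 48 ms.
changed_ms = {**baseline_ms, "search": 48}

before, after = end_to_end_ms(baseline_ms), end_to_end_ms(changed_ms)
print(f"estimated end-to-end regression: {after - before:.0f} ms "
      f"({after / before - 1:.1%})")
```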

One min read

Are you interested in training chatbots for Software Engineering tasks? Our paper "A Transformer-based Approach for Augmenting Software Engineering Chatbots Datasets" has been accepted at the 18th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM 2024).

  • In this paper, we propose an approach to augment chatbot training datasets tailored for Software Engineering tasks (a minimal augmentation sketch follows).
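As a minimal sketch of this kind of augmentation, paraphrasing existing user queries with an off-the-shelf seq2seq paraphraser; the model name is illustrative, and this is not the paper’s exact pipeline:

```python
# Minimal sketch: grow a chatbot training set by paraphrasing seed
# queries with a transformer paraphraser (model choice is illustrative).
from transformers import pipeline

paraphraser = pipeline("text2text-generation",
                       model="humarin/chatgpt_paraphraser_on_T5_base")

seed_queries = ["How do I revert the last commit?"]
augmented = []
for query in seed_queries:
    outputs = paraphraser(query, num_return_sequences=3, num_beams=5,
                          max_new_tokens=40)
    augmented += [o["generated_text"] for o in outputs]

print(augmented)  # paraphrases to add alongside the original intent label
```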

Interested? You can find a pre-print of our paper here.