ENASE 2026 Abstracts


Area 1 - Challenges and Novel Approaches to Systems and Software Engineering (SSE)

Full Papers
Paper Nr: 45
Title:

Evaluating the Effectiveness of LLM Agents Generating API Implementations from Gherkin Scenarios: A Pilot Study

Authors:

Miłosz Mertka and Michał Śmiałek

Abstract: This pilot study aims to investigate whether Large Language Model (LLM) agents are capable of generating fully functional Application Programming Interface (API) implementations directly from requirements specifications written using the Gherkin syntax with embedded JSON strings. The study evaluates and compares the effectiveness of different LLMs in transforming behavioural scenarios into complete API implementations in the Java programming language. Four modern commercial LLMs (GPT-5, Claude Sonnet 4.5, Gemini 2.5 Pro, and Grok Code Fast 1) were examined in two experiment variants: one in which agents were required to generate both the implementation and the test suite, and one in which a complete test suite was provided upfront by a human programmer. The API implementations produced were assessed using metrics that included Functional Correctness, Maintainability Index, Cyclomatic Complexity, and the QMOOD metrics set. Static code analysis was performed to identify quality and security issues in the generated systems. The results show that while simple scenarios may be successfully implemented by LLMs, more complex business logic poses serious difficulties for the evaluated agents. Additionally, multiple security vulnerabilities and code quality issues were identified across models. This article concludes that further research is necessary to investigate the robustness of LLMs across various problem domains and languages.

Paper Nr: 51
Title:

Can Large Language Models (LLMs) Meaningfully Interpret Operational Technology (OT) Software Bills of Materials (SBOMs)?

Authors:

Padma Iyenghar

Abstract: As Software Bills of Materials (SBOMs) become mandatory artifacts for operational technology (OT) products under regulations such as the EU Cyber Resilience Act (CRA), cybersecurity engineers face the challenge of assessing SBOM correctness beyond syntactic validity. In practice, engineers must determine whether an SBOM plausibly reflects the functional architecture of an OT device and whether essential components are missing or misrepresented. In the presence of heterogeneous SBOM quality and limited contextual information, large language models (LLMs) are explored as rapid sense-making aids for SBOM interpretation. This paper empirically examines whether LLMs can support OT SBOM review as an engineering communication task. A curated dataset of 38 OT SBOMs is introduced, covering diverse industrial device archetypes and manually constructed and annotated by a senior OT cybersecurity engineer with expert-defined device types and component-level functional roles. Five state-of-the-art LLMs are evaluated on a constrained SBOM interpretation task enforcing device-specific architectural semantics. The results show that LLMs achieve high accuracy when assigning roles within the valid architectural scope, with in-scope accuracies of up to 89%. However, all models exhibit substantial out-of-scope and overly generic role assignments across several device classes, limiting reliability in unsupervised use. Overall, the findings indicate that LLMs can provide meaningful semantic support for OT SBOM review when appropriately constrained, while still requiring expert oversight.

Paper Nr: 89
Title:

Towards Automated Migration: High-Quality, Cost-Efficient Source Code Translation Using Large Language Models

Authors:

Max Mager, Aleksandr Perevalov and Andreas Both

Abstract: Code migration between programming languages is a costly and complex task, often requiring significant manual effort to ensure functional correctness and maintainability. In this paper, we evaluate the effectiveness of approaches for automated source code translation across widely used languages (C++, C, Java, JavaScript, Python) as well as lesser-used languages (Haskell, Pascal) using Large Language Models (LLMs). Despite using four models in our experiments, our approach is LLM-agnostic and does not require any costly fine-tuning of the used LLMs or other LLM-specific optimizations. Instead, our approach incorporates multiple LLM-based translation techniques, in particular, an iterative refinement process guided by self-consistency checks. We analyze the quality of the generated code, identifying common translation errors, highlighting areas where models struggle, and evaluating cost efficiency. Our findings contribute to a generalized, high-quality, cost-optimizing approach for automated code migration, capable of increasing translation quality and serving as a foundation for automatic code migration. Hence, we present a promising path toward reducing the economic and technical barriers to code migration across the programming language spectrum.

Paper Nr: 149
Title:

PromptMark: A Prompt-Guided Iterative-Feedback Framework for Source Code Watermarking

Authors:

Istiaq Ahmed Fahad, Mridha Md. Nafis Fuad and Kazi Sakib

Abstract: Watermarking has become a crucial technique for ensuring provenance and accountability in AI-generated source code. As large language models (LLMs) are increasingly integrated into development workflows, reliable attribution remains challenging. In practice, most developers rely on commercial LLM APIs operating under black-box constraints, making existing approaches that require access to the decoding process less feasible for real-world integration. To address this limitation, we propose PromptMark, a black-box, prompt-guided watermarking framework that embeds invisible yet statistically detectable signals into generated code via structured input instructions. The method steers models toward subtle identifier and comment naming patterns while preserving the functional correctness and structural integrity of the generated code. Detection is performed using statistical tests designed to remain reliable across varying code lengths and model outputs. The embedding is further refined through an iterative feedback loop, where prompts are updated based on watermark detection scores. Experiments on the MBPP and HumanEval benchmarks show that PromptMark consistently achieves strong watermark detectability while maintaining high code correctness, outperforming baseline approaches.

Paper Nr: 154
Title:

Process-Oriented Security Compliance for Industrial IoT Systems: Formal Modeling and Integration

Authors:

Linda Maria Kölbel, Markus Hornsteiner and Stefan Schönig

Abstract: This paper introduces a formal modeling framework for process-oriented Industrial Internet of Things (IIoT) security management. It defines a formal syntax and lifecycle-oriented modeling rules to embed security controls directly into Business Process Model and Notation (BPMN) process models. The framework includes a specialized parsing architecture that performs automated error detection through syntax and semantic validation. The verified models are compiled into executable rule sets for policy enforcement in monitoring stacks. Developed using a rigorous Design Science Research (DSR) approach and evaluated through two real-life industrial use cases, the framework demonstrates its ability to transform security modeling from documentation to a verifiable, machine-executable mechanism.

Paper Nr: 170
Title:

Fractured Awareness: Why Platform Privacy Systems Are Accepted More than They Are Understood

Authors:

Omar Haggag, John Grundy and Mohan Baruwal Chhetri

Abstract: Public debate about digital privacy is often framed around a familiar dilemma: while concern about surveillance and data misuse continues to grow, individuals nonetheless continue to participate extensively in platform ecosystems that rely on large-scale data extraction. This phenomenon is commonly referred to as the privacy paradox. However, attributing the paradox solely to individual behaviour neglects the structural conditions within which privacy decisions take place. This paper investigates how transparency mechanisms are implemented across major digital platforms and how these implementations shape the environments in which users must interpret and make decisions about the data-collection, use, and sharing practices that underpin platform privacy. We conduct a multi-layer analysis of three platform ecosystems: Meta (Facebook and Instagram), TikTok, and YouTube, examining the structural complexity of privacy policies and the architecture of consent interfaces used to communicate data practices. Our analysis shows that transparency mechanisms frequently produce a condition we describe as fractured awareness. Privacy disclosures are structurally complex and often distributed across multiple documents, while consent interfaces require substantial interaction effort to locate or modify privacy settings. Together, these characteristics impose high informational and interactional costs on users attempting to understand platform data practices. This study contributes to software engineering research by reframing privacy transparency as a socio-technical property of digital systems rather than solely a legal or behavioural issue. By analysing how regulatory disclosure requirements are translated into software artefacts such as policies and consent interfaces, the paper highlights how platforms can satisfy formal transparency obligations while still limiting meaningful user comprehension. 
These findings suggest that addressing privacy challenges requires treating transparency not only as a regulatory requirement but also as a design and engineering problem within large-scale digital platforms.

Paper Nr: 181
Title:

From Regulatory Text to Executable Configuration: A Neuro-Symbolic Architecture for Automated Enterprise Software Systems

Authors:

Chandan Prakash, Pavan Kumar Chittimalli and Ravindra Naik

Abstract: Instantiating enterprise software systems from complex regulatory requirements remains a resource-intensive bottleneck in modern application development processes. Translating regulatory text into executable software configuration parameters traditionally relies on extensive manual requirements engineering workflows. This manual intervention delays system deployment and escalates operational costs. While unconstrained generative AI models offer rapid automation, they introduce severe reliability and traceability risks that are unacceptable for compliance-critical software. To address these fundamental software engineering challenges, we propose a forward-thinking neuro-symbolic architecture designed for reliable enterprise system automation. Our approach treats regulatory documents as primary requirements artifacts and systematically transforms them into deployable configurations. The architectural design deliberately integrates deterministic symbolic reasoning pipelines with constrained generative AI components. This integration optimizes key engineering trade-offs across correctness, traceability, reliability, scalability, and overall operational cost. We evaluate the proposed architecture against manual, rule-based, and fully generative baselines using real-world requirements from a production core banking deployment. Assessing software engineering impact metrics reveals that our neuro-symbolic system improves configuration correctness and completeness from 65% to 98%. Furthermore, the architecture reduces deployment turnaround time from approximately one month to a few hours. This significantly automates the configuration engineering effort and improves long-term maintainability without compromising strict auditability standards.

Short Papers
Paper Nr: 31
Title:

A Method for Building Developers’ Human Error Awareness in Software Defect Prevention

Authors:

Jackson Seal, Camren Adkins, Owen Wright and Fuqun Huang

Abstract: Root cause analysis is a key strategy for preventing software defects. Human error represents a major category of defect causes; therefore, mitigating human error in software development is fundamental to effective defect prevention. Preventing human error largely involves supporting and regulating developers’ cognitive processes, among which situational awareness plays a critical role. This paper proposes a novel approach for building developers’ situational awareness of human errors across diverse programming contexts. The approach trains programmers to identify conditions in their current programming situations that correspond to general Error-Prone Scenarios known to trigger specific Human Error Modes, i.e., recurring patterns of human errors. To support the application of the approach, we developed a graphical prototype system. We conducted an empirical study to evaluate the proposed approach and to collect feedback on the prototype system. Fifty participants took part in the study and performed human error root cause analysis on 506 defects drawn from their real-world programming projects. The results indicate that the approach is effective in promoting programmers’ situational awareness in error-prone scenarios and that participants perceive the approach as beneficial for long-term defect prevention.

Paper Nr: 34
Title:

Handling Toolchain Evolution with Modular Meta-Languages: An Approach for Structured LLM-Based Artifact Generation

Authors:

Louis Burk, Alexander Fischer, Uwe Wienkop, Ramin Tavakoli Kolagari, Christoph Scharnagl and Alexandra Arzberger

Abstract: Engineering toolchains evolve rapidly, introducing schema and interface drift that breaks automatically generated artifacts even when intent remains stable. Large Language Models (LLMs) can assist artifact authoring from natural-language descriptions, but reliable use requires explicit structural contracts aligned with downstream tools. Prior work addressed this through Meta-Language-defined Structure Instructions (MLDSI), which encode admissible types, attributes, relations, and constraints directly in prompts. While structurally effective, early MLDSI specifications are monolithic and brittle under incremental evolution. This paper introduces a compact M2-level meta model for Modular MLDSI (MMLDSI). The model defines versioned modules, profiles, interfaces, validators, and explicit provides/requires dependencies; formalizes deterministic composition via refine > extend > base; and integrates module-scoped validation through a central Gatekeeper. Cross-domain instantiations in virtual reality scene generation and ISO 21434-aligned automotive security modeling show that the same M2 vocabulary captures heterogeneous domains while localizing change. The contribution is a concise, machine-checkable foundation for modular prompt contracts in evolving LLM-assisted engineering workflows.

Paper Nr: 37
Title:

Grammar-Prompted Synthesis of Verification Properties from Natural Language Requirements for Multiple Model Checkers

Authors:

Vladimir Estivill-Castro and René Hexel

Abstract: We propose to directly synthesise formal verification formulas for multiple model checkers from natural-language requirements. Our approach utilises arrangements of logic-labelled finite-state machines (LLFSMs) to construct executable behaviour models. We can then prepare both the model (as a Kripke structure) and associated verification properties as input for each model checker, generating code in several programming languages as well, ensuring identical execution traces across all generated artefacts. We introduce a grammar-prompted Large Language Model (LLM) approach to obtain the Structured English Grammar (SEG) formula for the requirements, complementing the verification of these executable models without semantic gaps, using precisely the same traces in programming languages as well as model checkers. Our tools translate the full set of patterns of SEG formulas automatically to the specific syntax of five different model checkers, sparing developers from the steep learning curve of mathematical formalisms for model checking, including the differences in syntax these model checkers require, even for the same temporal logic formalism, such as Linear Temporal Logic (LTL) or Computation Tree Logic (CTL). This work significantly reduces barriers to the adoption of formal methods, by enabling developers to work with familiar finite-state machine notation and natural language requirements while attaining formally verified properties and taking advantage of the particular, individual strengths of multiple model checkers.

Paper Nr: 55
Title:

Collaborative Code Modernization with Local LLM Deployment and Evidence-Based Prompt Guidelines

Authors:

Ada Slupczynski, Michal Slupczynski, Leila Mangonaux, Ilija Kovacevic, Stefan Decker and Horst Lichter

Abstract: Modernizing legacy software remains a critical challenge in software engineering, often hindered by high complexity and rigorous data privacy constraints. While Large Language Models (LLMs) offer potential for automated code analysis and transformation, their industrial adoption is impeded by security risks of cloud-based inference and a lack of domain-specific prompting methodologies. This paper addresses how to maximize LLM effectiveness in modernization contexts through two contributions: (1) a reusable architecture for secure, local LLM deployment with collaborative prompt curation, and (2) Prompt Design Guidelines (PDG), an evidence-based collection of strategies for constructing effective modernization prompts. The architecture is instantiated as Modernizer, an open-source Visual Studio Code extension. Evaluation demonstrates that structured prompting reduces syntax errors in LLM-generated code by up to 97%, while the Modernizer instantiation achieves excellent usability (SUS=82.78) among practitioners.

Paper Nr: 87
Title:

Modeling and Reasoning with NFRs Using GenAI: From Informal Descriptions to Semi-Formal SIG Models

Authors:

Ahmad AlShomar, Sam Supakkul, Tom Hill and Lawrence Chung

Abstract: Non-functional requirements (NFRs), such as security and usability, are crucial to system success; yet stakeholders often describe them informally in natural language. However, this informality makes it difficult to detect deficiencies, reason about trade-offs, and model NFRs correctly using Softgoal Interdependency Graphs (SIGs). This paper presents the ReGenAI+ framework for transforming informal NFR descriptions into semi-formal SIG models. ReGenAI+ includes essential SIG concepts and relationships, semantic reasoning rules, and processes for identifying modeling gaps such as missing decompositions, unlinked softgoals, or conflicting contribution links. These gaps are repaired using retrieval-augmented generation (RAG), which grounds the GenAI/LLM output in external NFR knowledge sources such as FISMA, GDPR, and the NFR Framework. The semantic reasoning rules further validate the resulting SIG models. We validated ReGenAI+ using NFR statements extracted from PURE, A Dataset of Public Requirements Documents. The experiment results show that the framework can detect deficiencies and semantic inconsistencies, thereby identifying the most significant modeling problems.

Paper Nr: 107
Title:

From Standards to Practice: A Position on Challenges in Operationalizing Software Quality-in-Use

Authors:

Yvette D. Hastings and Ann Marie Reinhold

Abstract: Software quality-in-use (QIU) provides a stakeholder-centered framework for evaluating whether a software product achieves beneficial outcomes within its specified context of use. Formalized in ISO/IEC 25019, QIU extends beyond traditional usability testing by incorporating broader attributes related to stakeholder beneficialness, acceptability, and freedom from risk. Despite this broader concept, many empirical studies continue to operationalize QIU through usability testing. This narrowing to usability testing limits the comprehensiveness of QIU evaluations. Our position is that recent operationalizations of QIU exhibit weaknesses in construct clarity, methodological rigor, and reporting practices. We present examples of QIU studies that remain usability-centric and identify critical methodological gaps that undermine their ability to capture the multifaceted dimensions of QIU. To address these gaps, we propose three recommendations: (1) clarifying QIU constructs and context of use, (2) improving methodological rigor, and (3) strengthening reporting practices. Finally, we outline our ongoing and future work to improve QIU evaluations to better reflect its stakeholder- and context-centered foundations.

Paper Nr: 109
Title:

Security and Privacy Governance in SSI-Based Agri-Food Supply Chains

Authors:

Jihene Khoualdi, Ilhem Abdelhedi Abdelmoula and Hella Kaffel Ben Ayed

Abstract: Agri-food supply chains (AFSCs) are increasingly digitalized to improve food safety, quality assurance, and regulatory compliance. However, this transformation introduces significant security, privacy, and governance challenges. Blockchain-based traceability solutions ensure data integrity and provenance but offer limited support for identity management, fine-grained authorization, and privacy governance. This paper addresses these limitations by proposing a governance-oriented framework for SSI-enabled AFSCs that integrates decentralized identity, verifiable credentials, and policy-driven authorization. The framework defines actors, credential types, access control rules, revocation mechanisms, and privacy-preserving practices. A layered security enforcement architecture centered on an SSI-aware gateway is introduced. The approach is illustrated through BPMN-based process models with verification and revocation checkpoints. A proof-of-concept implementation in a honey supply chain scenario demonstrates the feasibility of credential-based authorization and revocation-aware access control. Overall, this work advances SSI-based supply chain systems by embedding enforceable governance, security, and privacy mechanisms, supporting the development of trustworthy and scalable digital agri-food ecosystems.

Paper Nr: 111
Title:

LLM-Based Generation of Structured Requirements and UML Models from Requirements Elicitation Notes

Authors:

Diogo Cardante, António Miguel Rosado da Cruz and Estrela Ferreira Cruz

Abstract: Within the software development process, the modeling and design phase is both crucial and critical due to its central role in transforming requirements into a clear and consistent technical solution. Software models act as a bridge between the problem domain (requirements) and the solution domain (implementation). This phase, however, can be complex and labor-intensive, since requirements are typically defined in natural language and may be ambiguous, vague, or even inconsistent, making design decisions more difficult. Recent advances in artificial intelligence, particularly in natural language processing and Large Language Models (LLMs), have driven significant progress in requirements engineering. In this paper, we leverage LLMs and a network of specialized agents linked together to generate software models based on a set of requirements defined in Natural Language (NL). The evaluation of the proposed approach is conducted using a dataset consisting of ten NL requirements documents from different software domains.

Paper Nr: 138
Title:

A Pilot Study on Detecting Software Design Patterns with Large Language Models: An Empirical Evaluation

Authors:

Oishik Chowdhury, Bastin Tony Roy Savarimuthu and Sherlock A. Licorish

Abstract: Design patterns provide reusable solutions to recurring software design problems. Automatically detecting these patterns in source code can help bootstrap new developers’ understanding of unfamiliar software system architectures, and can help experienced developers to quickly identify and rectify potential quality issues. While many prior research works have explored graph-based and machine-learning-based detection techniques, this work evaluates the design pattern recognition capabilities of four Large Language Models and two ensemble approaches, each consisting of three of the four models. We also compare their performance when prompted with a) Source code, b) PlantUML representation of source code, and c) Text-based descriptions of the source code. We investigate the detection of five design patterns: singleton, adapter, bridge, composite and decorator. Our preliminary results indicate that LLMs show promise for automatically detecting design patterns, with NextCoder and Gemma 3 demonstrating comparatively higher accuracy than other models evaluated, and the ensemble approaches enhancing the overall efficiency. We identify several directions for future work.

Paper Nr: 144
Title:

A Security Framework and Hybrid Access Control Model for a DEMO-Based Low-Code Platform

Authors:

Eduardo Silveiro, Vítor Freitas, David Aveiro and João Seco

Abstract: Low-Code platforms allow organizations to build applications rapidly, but this acceleration often creates significant security gaps. Broken access control remains the primary risk in these environments because development simplicity frequently leads to poor permission management. Citizen developers often lack the technical background required to configure complex rules, which results in unintended data exposure. This paper argues that security must be an explicit part of the platform core rather than a secondary configuration. We present a centralized, hybrid access control framework for DISME (Dynamic Information System Modeller and Executor), a DEMO-based Low-Code Platform that makes security a structural property of the system model. By integrating the Casbin authorization engine directly into the execution layer, the framework ensures that security checks are mandatory and independent of application logic. A key contribution of this work is the automated generation of access policies during design time. By parsing external entity references within visual action rules, the platform suggests validated permissions that are formally grounded by an extended EBNF grammar. This approach, supported by dynamic actors and visual governance tools like permission inheritance graphs, ensures that organizations can maintain data integrity and regulatory compliance without sacrificing the agility of low-code development.

Paper Nr: 153
Title:

Adaptive Fine-Tuning for Efficient Long-Context Learning in Large Language Models

Authors:

Sadia Tabassum, Mussammat Maimuna Faria and Md. Nurul Ahad Tawhid

Abstract: In recent times, Large Language Models (LLMs) have been used for high-stakes tasks that demand high long-context reasoning capabilities, including legal synthesis, multi-paper scientific analysis, and code repository generation. However, we argue that the “one-size-fits-all” approach to model architecture is suboptimal because the architectural components that make LMs stable with long code repository sequences can unnecessarily introduce noise or computational overhead when working with shorter sequences that are denser. In this paper, we propose LoRA-Prime, a framework that uses context-adaptive, parameter-efficient fine-tuning with Position-Content Fusion and rank-stabilized LoRA to mitigate representational decay in long sequences and to demonstrate that context modulation improves quality and speed during model training. In our experiments with GPT-2, Qwen2-0.5B, and TinyLLaMA domains, we observe that LoRA is best used when the base model is weaker and requires architectural compensation, long consumer sequences with perplexity explosion problems, and memory-constrained consumer hardware.

Paper Nr: 158
Title:

TrustTranslate: A Multi-Dimensional Framework for Trustworthy Emotion-Aware Machine Translation with Hallucination Detection, Bias Mitigation, and Green AI

Authors:

Nour El Houda Ben Chaabene, Laid Kahloul, Hamza Hammami and Mohamed Khalgui

Abstract: Neural machine translation (NMT) achieves high fluency but loses emotional nuance, hallucinates content, encodes cultural biases, and lacks interpretability. We present TrustTranslate, a unified neuro-symbolic framework extending TransExplain-LD with five modules: a linguistic quality analyzer, an idiom preservation engine, a hallucination detection and correction pipeline, a multi-axis bias auditor, and a carbon-aware Green AI monitor, jointly addressing emotion preservation, hallucination robustness, fairness, explainability, and efficiency. Evaluations yield 36.2 BLEU, 0.868 COMET, 0.78 EPI, 94.3% hallucination F1, 87.4% bias reduction, 72.8% idiom preservation, and GAS 2.14, demonstrating that trustworthy, emotion-aware translation is achievable without sacrificing performance.

Paper Nr: 162
Title:

N-SIM: Enhancing Program Similarity Analysis with Over-Basic-Block Semantic Comparison

Authors:

Zihao Wang, Qinkun Bao, Jinquan Zhang and Dinghao Wu

Abstract: Binary similarity comparison underpins many security applications, including malware detection, plagiarism detection, and vulnerability patch identification. Most existing semantic approaches are block-centric: they compute equivalence on individual basic blocks, which yields inaccurate results under inter-block optimizations and obfuscations, relies on expensive SMT solving, and incurs redundant computation. We propose an N-gram based binary similarity analysis that models the comparison unit as N-gram basic block sequences extracted from execution traces via a sliding window, and we accelerate formula equivalence checking with a three-round sample-testing method and result caching. Our prototype, N-SIM, achieves Precision@1 up to 1.0 on DiffUtils and 0.911 on OpenSSL across compiler optimizations, outperforming baselines such as BinDiff and Asm2Vec, while providing a 50–100× speedup over theorem-prover-based comparison. N-SIM also identifies six real-world SSL/TLS vulnerabilities—including Heartbleed and POODLE—in both standalone and Nginx-deployed OpenSSL binaries.

Paper Nr: 182
Title:

An Industrial Evaluation of Automated CI/CD Pipeline Generation Framework

Authors:

Uldis Karlovs-Karlovskis, Oksana Nikiforova, Oscar Pastor and Ngoc Bao Tram Tran

Abstract: Modern software development relies on CI/CD pipelines that are commonly implemented as handwritten configuration code. In multi-product environments, this leads to repeated engineering effort, configuration drift, and inconsistent quality gates. This paper presents an industrial evaluation of an architecture-centric approach to CI/CD pipeline generation, extending prior prototype and proof-of-concept work to a real-world setting. The evaluation is conducted as a case study within the ZenIS Chatbot ecosystem, covering one shared build pipeline and nine deployment pipelines. It examines the model-to-code transformation chain, the impact on efficiency of recurring pipeline engineering tasks, and runtime correctness based on pipeline execution logs. Empirical data include engineering effort estimates for three maintenance scenarios, execution logs, and a Developer Experience (DevEx) survey (n = 18) comparing software architecture models and GitLab CI/CD pipeline definitions. The results indicate reduced maintenance effort, improved consistency of pipeline structure and quality gates, and improved cross-role communication.

Paper Nr: 184
Title:

Teaching Testing Seriously in Academia

Authors:

Tanja E. J. Vos, Bart Th. Knaack, Beatriz Marín, Niels Doorn and Nikè van Vugt-Hage

Abstract: As systems grow more complex and incorporate AI, testing becomes more critical. Yet testing education in academia remains misaligned with both professional practice and the empirical nature of testing. Current curricula predominantly adopt a rationalist paradigm, emphasizing prescriptive methods and confirmation of expected outcomes. This limits students’ ability to reason critically under uncertainty. In this position paper, we argue that testing should instead be taught as an empirical, inquiry-driven professional skill. We propose an instructional design based on the Four-Component Instructional Design (4C/ID) model to support whole-task learning. We introduce P4TEST, a pedagogical framework that makes explicit the core competencies, epistemic moves, and habits of mind involved in testing, while avoiding prescriptive processes. The paper outlines how P4TEST can guide curriculum design, scaffolding, and assessment in software testing education.

Paper Nr: 66
Title:

Generative Artificial Intelligence in Supporting Requirements Engineering: An Exploratory Study with Industry Professionals

Authors:

Josué Viana Ferreira, Sandro Ronaldo Bezerra Oliveira and Carlos dos Santos Portela

Abstract: Requirements Engineering (RE) is central to software development, ensuring systems meet stakeholders’ needs. Recent advances in Generative Artificial Intelligence (GenAI) have introduced new possibilities to support RE. Despite the growing adoption of Large Language Models (LLMs), there is limited understanding of how these tools effectively support different RE phases, especially regarding Nonfunctional Requirements (NFRs). This exploratory study investigates how software engineers perceive the use of GenAI tools in RE, analyzing benefits, limitations, and opportunities for improvement. The research builds on RE foundations and Software Engineering Body of Knowledge (SWEBOK) guidelines, while engaging with recent studies on LLM applications to requirements elicitation, classification, and analysis. A qualitative exploratory approach was applied through semi-structured interviews with fifteen software engineering professionals experienced in both RE and GenAI tools. Findings show that Functional Requirements (FRs) benefit most from GenAI, especially in specification and elicitation. NFRs such as performance, usability, and security receive some support but remain harder to address. The study highlights GenAI’s potential to reduce manual effort and improve efficiency in RE while identifying gaps for future development. By proposing integration and personalization strategies, it contributes to advancing AI-supported practices in software engineering and Information Systems.

Paper Nr: 94
Title:

Verifying Interoperability in Evolving IoT Systems

Authors:

Hongming Zhang, Judy Bowen, Jessica Turner and Jemma König

Abstract: As IoT devices age, they must be replaced to maintain reliability and performance. During upgrades, ensuring interoperability among new and existing devices is critical for preserving designated system behaviours. Existing formal verification approaches rely on system documentation or source code to build formal models or extract flat models from system logs. We propose a novel log-driven verification framework that automatically discovers executable Hierarchical Colored Petri Nets (HCPNs) from raw IoT system logs. The framework integrates model checking to verify cross-layer interoperability during IoT system evolution and device replacement. We demonstrate the effectiveness of our approach using a street lighting system study.

Paper Nr: 132
Title:

An Automated Evaluation of LLM Generated Code Maintainability Based on Multi-Level Measures and Comparative Analysis

Authors:

Rahma Becha, Asma Sellami, Nadia Bouassida and Ali Idri

Abstract: Chatbots and GenAI tools are increasingly deployed across domains and have gained substantial attention within software organizations owing to their ability to deliver rapid responses and reduce development time. They can generate code and assist developers with programming tasks. However, assessing the maintainability of the generated code remains challenging, especially in the presence of scope creep, which leads the development team to spend a lot of time testing and verifying the results. Therefore, it is paramount to propose an automated evaluation approach that is able to predict the maintainability of LLM-generated code. The proposed evaluation approach is based on quantifying the internal code size using the COSMIC ISO 19761 standard and its extension. A set of 17 functional processes (FP) in chatbot AI systems at the application and API layers were evaluated by generating their code in React and Python using four LLM chatbots (ChatGPT, DeepSeek, Gemini, and Claude). The FP were ranked by size and compared with the rankings obtained from the maintainability index (ISO 5055). The results showed a strong correlation between rankings, with Gemini generating maintainable code for most FP. This study could help practitioners decide which generated code is the most maintainable based on the COSMIC extended approach without hindering their productivity.

Paper Nr: 159
Title:

Learning Loops in the Age of AI

Authors:

Jan Bosch and Helena Holmström Olsson

Abstract: Software-intensive systems companies face mounting pressures to accelerate time-to-market. This drives adoption of DevOps, frequent deployments and AI-enabled analytics. However, while traditional feedback loops channel usage data, logs and interactions into development cycles, this doesn’t necessarily translate into learning. Although companies use DevOps, they still view product development as building products with a fixed scope. To address this, we conceptualize the notion of learning loops and how companies move towards continuous improvement of product performance. In our view, ‘learning loops’ distinguish themselves by translating data from products into actionable improvements executed by humans, by traditional ML, by Agentic AI or by a combination of these. This paper synthesizes longitudinal case study research and interviews, revealing that mechanisms like continuous integration and deployment, A/B testing, federated and reinforcement learning are unified instances of post-deployment learning loops. The contribution of this paper is two-fold. First, we provide empirical examples reflecting how R&D teams and systems learn and improve performance over time. Second, we present a conceptual model in which we detail the concept of learning loops that can be executed by humans, by traditional ML, by Agentic AI or by a combination of these.

Area 2 - Systems and Software Engineering (SSE) for Emerging Domains

Short Papers
Paper Nr: 35
Title:

Design and Deployment of a RAG System for Navigating Heterogeneous Legal and Technical Documents

Authors:

Marco Claps, Giovanni Simonini and Giorgio Zucchi

Abstract: This work presents a Retrieval-Augmented Generation (RAG) system deployed in an industrial environment to support navigation of heterogeneous legal and technical documents in the energy and technical services (ETS) domain. Modern regulatory frameworks in ETS are becoming increasingly complex due to rapidly evolving norms at regional, national, and European levels. This complexity, combined with the diversity of technical and legal documentation, creates bottlenecks in operational decision-making and compliance verification. Searching for information is important but also very time-consuming. RAG systems mitigate these challenges by pairing large language models (LLMs) with retrieval mechanisms that ground generated answers in authoritative and legal documents. In our system, documents are continuously ingested, cleaned, embedded, and indexed; during query time, relevant portions of text are retrieved and used to produce accurate, context-aware, and auditable responses. We evaluated the system on a corpus of more than 70 real ETS documents written in Italian. Compared to keyword-based baselines, our approach improved Recall@k and reduced query resolution time. The user study involved 25 industry professionals, who evaluated the proposed system and workflow. Participants reported high perceived usefulness, correctness, and usability. We also report methodological insights, including query augmentation and cross-encoder re-ranking, that generalize across regulated domains.
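
For readers unfamiliar with the Recall@k metric mentioned in this abstract, the following is a minimal sketch of its usual definition: the fraction of relevant documents that appear among the top k retrieved results. The function name and the document identifiers are illustrative assumptions and are not taken from the paper.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant documents found in the top-k retrieved list."""
    if not relevant:
        raise ValueError("relevant set must not be empty")
    top_k = set(retrieved[:k])
    return len(top_k & set(relevant)) / len(relevant)


# Hypothetical ranking returned by a retriever, best match first.
retrieved = ["d3", "d7", "d1", "d9"]
# Hypothetical ground-truth relevant documents for the query.
relevant = {"d1", "d2", "d3"}

recall_at_2 = recall_at_k(retrieved, relevant, 2)
recall_at_3 = recall_at_k(retrieved, relevant, 3)
```

With this data, only d3 is found in the top two results, and d3 and d1 in the top three, so recall grows as k increases.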

Paper Nr: 40
Title:

Contrastive Learning in Lesion Detection for Mammography Screening Programs Explained with XAI

Authors:

Liviu-Mihai Iacob, Anca Marginean and Adrian Groza

Abstract: This paper aims to increase the understanding of self-supervised contrastive learning, exploring beyond the standard performance metrics. The context of the problem is given by the mammography images acquired during screening programs. Even though the classic training method yielded somewhat better metrics on the validation dataset, it fell behind on generalization, as shown by the external validation dataset. For that reason, an Explainable AI approach was added to the scope, to visualize and quantify the quality of feature extraction. Our findings demonstrate that the contrastive approach yields significantly more refined and clinically relevant feature maps, on top of better performance metrics on the external validation dataset, offering a robust justification for its use in Computer-Aided Detection systems.

Paper Nr: 79
Title:

Beyond Directly‑Follows Graphs: Graphical Modelling in Process Mining

Authors:

Jose Luis Fernández-Pascual, Francisco Javier Pérez-Blanco, Juan Manuel Vara, Cristian Gómez-Macías and David Granada

Abstract: Directly-Follows Graphs (DFGs) have traditionally been used to represent reconstructed processes in process mining. However, recent research has explored the integration of more expressive modelling notations such as Business Process Model and Notation (BPMN). This paper presents a Systematic Literature Review (SLR), following the guidelines of Kitchenham and Biolchini, to analyse existing process mining tools and the graphical notations they support. The results show that current tools mainly rely on procedural notations such as DFGs, BPMN, and Petri Nets, with limited interoperability between representations and little adaptation to different user profiles, highlighting research opportunities for more user-centred and interoperable graphical modelling approaches.

Paper Nr: 81
Title:

GEM-Qt: Generating Embeddings for Meteorological Reanalysis Data Using a Quadtree-Based Approach

Authors:

Gabriela Czibula, Andrei Mihai, Sîrbu Alexandru-Gabriel, Istvan Gergely Czibula, Eugen Mihuleţ and Ioan-Ştefan Gabrian

Abstract: Data embeddings are used to capture essential features of raw high-dimensional data by transforming it into a lower-dimensional space and preserving essential characteristics of the data. The paper focuses on creating embeddings for meteorological reanalysis data, and introduces a new method, GEM-Qt, that uses quadtree-based decomposition of the data to compress and encode relevant meteorological information. Reanalysis data embeddings are helpful for data clustering, classification, and retrieval tasks, as they allow for the grouping of similar items and efficient querying in large meteorological datasets. The quality of the proposed quadtree-based embeddings is validated using clustering-based performance metrics. A comparison to related work highlights that GEM-Qt outperforms other baseline models proposed in the literature for generating embeddings from meteorological images.

Paper Nr: 133
Title:

InputGuard: Lightweight Input Validation Framework for Robust Edge AI Vision Pipelines - Evaluation in Mobile Thermal Imaging

Authors:

Katarzyna Baran

Abstract: Designing reliable edge AI vision systems for mobile devices faces a fundamental software engineering challenge: input quality degradation (e.g., blur, overexposure, thermal artifacts) often renders even highly optimized inference models unreliable. In this position paper, we argue that lightweight, pre-inference input validation should be considered a standard defensive design practice in edge AI software pipelines, rather than an optional add-on. To support this position, we present InputGuard, a concrete instantiation of such a mechanism: a minimal-footprint framework that acts as an early quality gate before the main model. Preliminary evaluation on a real-world dataset of smartphone-captured thermal images shows that InputGuard reduces erroneous downstream decisions by 79.9–88.0%, with validation latency below 35 ms and memory footprint under 4.8 MB on mid-range devices. These results provide initial evidence for our thesis and open a discussion on integrating lightweight quality components into the software engineering lifecycle of resource-constrained AI systems. We conclude by outlining research challenges and future directions for making input validation a first-class citizen in edge AI development.

Paper Nr: 161
Title:

Automated Glacier Change Monitoring in Greenland Using Random Forest and SVM Classification of Landsat Imagery on Google Earth Engine

Authors:

Andrei Văran, Ioan Daniel Pop and Adriana Mihaela Coroiu

Abstract: Glacier melt is an important indicator of climate change, and Greenland is witnessing some of the fastest glacier loss in the world. This article describes a Machine Learning strategy for snow cover categorization and glacier change analysis that employs combined multispectral imagery from Landsat 8 and Landsat 9, filtered to the summer months (June-September), as well as glacier metadata from the GLIMS dataset. Two supervised classifiers, Support Vector Machine (SVM) and Random Forest (RF), are compared with 200 labeled training points and stratified 5-fold cross-validation. Both produce F1-scores of around 0.83, with RF achieving higher precision (0.83 vs. 0.77) and SVM stronger recall (0.92 vs. 0.82). The trained RF is applied over the ten-year period 2014-2023 to measure ablation zone and ice area change, producing yearly estimates, year-to-year transition metrics, and a statistically defined linear trend. The classification findings are visualized using an interactive web application developed on Google Earth Engine.
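
As a quick consistency check on the figures reported in this abstract, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below uses only the rounded values quoted above, so the results are approximate; the function name is an illustrative choice.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract.
rf_f1 = f1_score(precision=0.83, recall=0.82)
svm_f1 = f1_score(precision=0.77, recall=0.92)
```

Both values land within about 0.01 of 0.83, which agrees with the statement that the two classifiers reach similar F1-scores despite their different precision and recall trade-offs.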

Paper Nr: 164
Title:

Generating Synthetic Datasets for Process Mining

Authors:

Jakub Sawczuk and Marcin Szpyrka

Abstract: Testing new ideas related to process mining requires access to datasets on which we can repeatedly test new methods/algorithms. This often requires performing computational experiments on similar datasets (logs) to assess, for example, whether the algorithm is resistant to minor disturbances. Unfortunately, there are few real logs available to the general public, and a search for a set of similar logs is unlikely to succeed. This article presents several methods for modifying logs so that their original character is preserved, but some noise is introduced. By using a set of hyper-parameters, the user can control the scale of individual changes that will be introduced and generate as many similar logs as necessary. The article presents a method of introducing noise that is completely controlled by the user, as well as methods of generating new traces using GANs and diffusion networks.

Area 3 - Systems and Software Quality

Full Papers
Paper Nr: 63
Title:

Benchmarking Cross-Language Code Smell Detection with Pretrained and Large Language Models

Authors:

Vasilica Moldovan

Abstract: Code smell detection is an important challenge in software engineering, as code smells indicate potential design and maintainability issues. As software systems continue to grow in size and complexity, ensuring code quality becomes increasingly important. Moreover, modern software systems are rarely monolingual, which further increases their complexity and highlights the need for cross-language code smell detection. Recent advances in large language models have made both pretrained language models (PLMs) and large language models (LLMs) promising candidates for automated code smell detection. In this study, we analyze and compare the performance of PLMs and LLMs in detecting four common code smells: Long Method, God Class, Long Parameter List, and Long Ternary Conditional Expression, across Python, Java, and C++. We evaluate both in-language and cross-language learning settings. Our results show that PLMs generally achieve better performance than the evaluated LLM configurations. We observe that cross-language transfer works better between similar languages such as Java and C++, while multi-language training benefits C++ more. Performance also varies across code smells, indicating the need for further research to improve generalization.

Paper Nr: 83
Title:

Enhancing Smart Contract Testing: A RAG-Based Approach for Unit Test Case Generation

Authors:

Feyza Sboui and Mariam Lahami

Abstract: In the rapidly evolving realm of blockchain, smart contracts serve as the essential backbone of decentralized applications, executing transactions with permanence and precision. However, any flaws within smart contracts are irreversible, costly, and time-consuming to rectify, making rigorous unit testing essential to ensure their correctness and reliability. Our proposed approach is an innovative platform that combines Large Language Models (LLMs) with Retrieval Augmented Generation (RAG) to automatically generate high-quality unit tests for Solidity smart contracts, addressing the domain-specific knowledge gaps that LLMs alone may face. We first conducted an ablation study to evaluate several state-of-the-art LLMs and prompt strategies using both quantitative and qualitative metrics. This analysis identified Codestral as the most effective model, demonstrating superior performance in generating accurate and relevant unit tests. Building on this selection, we present RAG-based Unit Test Generator for solidity Smart Contract (RAGUnitTest4SC), which leverages retrieval-augmented generation over smart contract datasets to enrich prompts and improve test quality. The generated unit tests are automatically executed and validated within the Hardhat framework to ensure correctness. Experimental results demonstrate the high potential of RAGUnitTest4SC in enhancing testing coverage and highlight the effectiveness of combining RAG with LLMs to generate reliable JavaScript unit tests.

Paper Nr: 131
Title:

An Exploration of Clean Code Categories and Attributes in Python Open-Source Projects

Authors:

Simona Motogna, Arthur-Jozsef Molnar, Diana Cristea and Diana-Florina Șotropa

Abstract: Maintaining high quality source code remains a critical concern for software developers. Static analysis tools such as SonarQube are widely used to assist with maintaining consistent formatting and naming schemes, to avoid common software defects or potential security issues, as well as to enforce good coding practices. In the present paper, we aim to explore the characterization of Python source code through the lens of the clean code taxonomy that was introduced in recent versions of the Sonar platform. This recently introduced taxonomy is expected to reveal potential software defects and inconsistencies at a higher level than that of Sonar rules, helping stakeholders make improved decisions regarding the maintenance of the software codebase. We aim to explore how this taxonomy is reflected in complex software, in particular the prevalence and diffusion of the proposed clean code categories and attributes in Python source code. In case an application infringes on different clean code categories and attributes, stakeholders can use this information to prioritize maintenance activities. Our study covers multiple releases of 57 widely-used open-source Python projects. The selection of Python as the target language was driven by its popularity on the one hand, and the relative lack of empirical investigation targeting the static analysis of Python code on the other. We show that many applications have an observable clean code profile, where a small number of clean code attributes are infringed, resulting in most of the reported code issues; we examine this in detail for several of the included applications. We put our results into context and show that many of the code smells prevalent in Python were also widespread in Java code, and highlight our findings in the context of existing literature.

Paper Nr: 136
Title:

On the Soundness and Consistency of LLM Agents for Executing Test Cases Written in Natural Language

Authors:

Sébastien Salva and Redha Taguelmimt

Abstract: The use of natural language (NL) test cases for validating graphical user interface (GUI) applications is emerging as a promising alternative to manually written executable test scripts, which are costly to develop and difficult to maintain. Recent advances in large language models (LLMs) have opened the possibility of the direct execution of NL test cases by LLM agents. This paper explores this direction, with particular attention to NL test case unsoundness and to the consistency of test case execution. NL test cases are inherently unsound because ambiguous instructions or unpredictable agent behaviour can produce false failures. Furthermore, repeated executions of the same NL test case may lead to inconsistent outcomes, undermining test reliability. To address these challenges, we propose an algorithm for executing NL test cases with specialised agents and guardrail mechanisms that dynamically verify the test step executions. We introduce measures to evaluate the capabilities of LLM agents in test execution and a measure to estimate NL test case execution consistency. We also propose a definition of weak unsoundness capturing contexts where rare incorrect verdicts are tolerable. Our experimental evaluation with eight publicly available LLMs demonstrates the potential of LLM agents for GUI testing. Our experiments show that Meta Llama 3.3 70B demonstrates good capabilities in NL test case execution with respect to the industrial quality levels 3-Sigma (mean accuracies greater than 98%), with high execution consistency.

Paper Nr: 177
Title:

NavA11y: A Dynamic Analysis Approach for WCAG 2.4 Focus-Behavior Evaluation

Authors:

Abu Jafar Saifullah, Tasmia Zerin, Zerina Begum and Kazi Sakib

Abstract: Despite websites becoming integral to daily life, most of them are inaccessible to people with disabilities. While automated techniques exist to detect accessibility violations, prior work primarily relies on static analysis of HTML, CSS, and JavaScript. This analysis fails to determine violations which require live browser interaction. Several violations under the WCAG 2.4 focus-behavior Success Criteria (SCs), such as incorrect tab order, missing or insufficient focus indicators, and element obscuration, arise only during user interaction. To address this gap, we present a dynamic analysis approach targeting all five focus-behavior SCs defined in WCAG 2.4 navigable guidelines. The approach analyzes a webpage at two levels, page level and element level. At the page level, it drives a real browser, simulates keyboard navigation, and presses Tab to record and evaluate the focus order of elements across a webpage. At the element level, style attributes and obscuration ratio are collected for each element. Finally, these are used to check the visibility of a focus indicator and any obscuration by overlapping content. This approach is implemented as a tool called NavA11y. To evaluate this, we constructed a 22-page labeled dataset by extending six focus-related pages from the GDS Accessibility Tool Audit with 16 newly constructed test cases covering all five criteria. NavA11y detected all violations on this dataset without any false positives. To evaluate on real-world cases, we collected 26 production websites. Our approach detected 2,947 violations across those websites, with manual verification on a stratified sample confirming 90% as true positives.
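
The obscuration ratio mentioned in this abstract can be pictured as the share of a focused element's bounding box that is covered by overlapping content. The following is a minimal sketch under that assumption, for a single overlay and axis-aligned rectangles; the `Rect` type, the function name, and the sample coordinates are hypothetical, and the tool's actual computation may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned bounding box in CSS pixels (hypothetical helper type)."""
    left: float
    top: float
    width: float
    height: float


def obscuration_ratio(element: Rect, overlay: Rect) -> float:
    """Share of the element's area covered by one overlay, between 0 and 1."""
    area = element.width * element.height
    if area <= 0:
        return 0.0
    # Width and height of the intersection, clamped at zero when disjoint.
    overlap_w = min(element.left + element.width, overlay.left + overlay.width) - max(element.left, overlay.left)
    overlap_h = min(element.top + element.height, overlay.top + overlay.height) - max(element.top, overlay.top)
    return max(overlap_w, 0.0) * max(overlap_h, 0.0) / area


button = Rect(left=0, top=0, width=100, height=40)
sticky_banner = Rect(left=50, top=0, width=100, height=40)
far_footer = Rect(left=0, top=500, width=100, height=40)

half_hidden = obscuration_ratio(button, sticky_banner)
not_hidden = obscuration_ratio(button, far_footer)
```

Here the banner covers the right half of the button, giving a ratio of 0.5, while the footer does not overlap it at all.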

Paper Nr: 189
Title:

Temporal Modeling of Change History for Black-Box Test Suite Minimization

Authors:

Kamruzzaman Asif, Md. Siam and Kazi Sakib

Abstract: Test Suite Minimization (TSM) reduces the size of test suites while preserving their fault detection capability. In black-box TSM, reduction is performed without relying on production-code instrumentation. While several black-box TSM approaches have explored metrics like test logs or test similarity, these often suffer from scalability and efficiency issues. Recently, change history has been explored as a lightweight and scalable indicator for guiding black-box TSM. However, existing approaches treat historical modifications uniformly, ignoring the temporal dynamics of software evolution where recently modified code tends to be more fault-prone. To address this limitation, we introduce temporal modeling into black-box TSM and propose Temporal Risk-driven Test Suite Minimization (TRTM). TRTM extracts modification history from version-control metadata and applies exponential temporal attenuation to weight changes based on recency, producing time-weighted class-level risk scores that reflect fault-proneness. Next, it determines dependencies between test cases and production classes by constructing static call graphs derived solely from test code, preserving the black-box setting. The risk scores of the classes exercised by each test case are then aggregated using statistical measures such as Average and Geometric Mean to compute a risk score for the test case. Finally, test cases with the highest risk scores are selected to construct the reduced suite. Evaluation on a large dataset containing 14 projects with 631 versions shows that TRTM consistently outperforms the state-of-the-art baseline, achieving a mean Accuracy of 0.72 (vs. 0.66) and Fault Detection Rate (FDR) of 0.75 (vs. 0.69), while also reducing execution time.
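
The pipeline outlined in this abstract (recency-weighted class risk, aggregation per test case, selection of the highest-scoring tests) can be sketched roughly as follows. This is an illustration under stated assumptions rather than the authors' implementation: the decay rate, the function names, and the sample data are hypothetical, and the test-to-class mapping is assumed to be given instead of being derived from a static call graph.

```python
import math
from statistics import geometric_mean, mean

DECAY_PER_DAY = 0.01  # hypothetical attenuation rate, not taken from the paper


def class_risk(change_ages_days, decay=DECAY_PER_DAY):
    """Sum one exponentially attenuated weight per recorded change."""
    return sum(math.exp(-decay * age) for age in change_ages_days)


def rank_tests(test_to_classes, risks, aggregate=mean):
    """Order tests from highest to lowest aggregated class risk."""
    scores = {
        test: aggregate([risks[cls] for cls in classes])
        for test, classes in test_to_classes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative history: days elapsed since each change to a class.
history = {"Cart": [1, 5], "Order": [400], "User": [30, 200, 300]}
risks = {cls: class_risk(ages) for cls, ages in history.items()}

# Test-to-class dependencies, assumed to come from a call graph of the test code.
test_to_classes = {
    "test_checkout": ["Cart", "Order"],
    "test_profile": ["User"],
    "test_archive": ["Order"],
}

# Keep the two highest-risk tests under each aggregation measure.
reduced_by_average = rank_tests(test_to_classes, risks, mean)[:2]
reduced_by_geomean = rank_tests(test_to_classes, risks, geometric_mean)[:2]
```

In this toy data both measures keep the same two tests but rank them differently, because the geometric mean penalizes a test that also touches a long-unchanged class.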

Short Papers
Paper Nr: 42
Title:

A Preliminary Exploratory Assessment of ChatGPT to Generating STRIDE Data Flow Diagrams

Authors:

Hassan Alsayegh and Mohamed El-Attar

Abstract: Threat modeling is a core activity in security-by-design practices, enabling early identification of architectural weaknesses before system implementation. The drawings used during STRIDE analysis are typically Data Flow Diagrams (DFDs), referred to as “STRIDE diagrams” in this paper. STRIDE diagrams provide a visual approach for categorizing security threats; however, constructing accurate STRIDE diagrams requires experience and is often time-consuming. Recent advances in Large Language Models (LLMs), such as ChatGPT, raise important questions about their suitability for supporting structured security modeling tasks. This study presents a preliminary exploratory assessment of ChatGPT’s ability to generate, analyse, and iteratively refine STRIDE diagrams from natural language system descriptions. The evaluation focuses on syntactic correctness, semantic accuracy, trust-boundary placement, and the model’s capacity to incorporate iterative feedback. Using multiple case studies with varying architectural complexity, the study examines how system size and prompt structure influence the quality of the generated diagrams. The findings shed light on the strengths and limitations of LLMs in early-stage threat modeling and assess their potential role as a supportive tool in security engineering workflows.

Paper Nr: 49
Title:

A Survey of Machine Learning Lifecycle Provenance: Models, Approaches, and Tools

Authors:

Lynn Vonderhaar, Tyler Thomas Procko and Omar Ochoa

Abstract: The World Wide Web Consortium (W3C) defines provenance as information about the activities and agents involved in producing some artifact, which may indicate its quality or trustworthiness. Machine learning (ML) models are complex computer artifacts engendered through the efforts of ML specialists, data scientists and researchers, which are iterated upon, integrated with varying data sources, augmented after evaluations, and so on. In short: ML models, e.g., Large Language Models (LLMs), are accompanied by inordinate quantities of provenance that are lost in the typical ML lifecycle. Such provenance, if maintained, can contribute to the betterment of ML, as people involved in ML workflows can more reliably gauge the suitability of ML models for particular purposes. The present paper is a terse literature review of the models, approaches and tools of ML lifecycle provenance. By surveying existing models, approaches, and tools, this work highlights open challenges and outlines future directions for a deeper enmeshing of provenance with the ML lifecycle.

Paper Nr: 64
Title:

Decoding Refactoring from Commit Messages: A Multi-View Mixture-of-Experts Approach

Authors:

Rim Mahouachi and Cherifa Ben Khelil

Abstract: Refactoring is a key software maintenance activity that improves code quality without altering behavior. Automatically identifying refactoring types from commit messages supports code review, project management, and software evolution analysis, but messages are often noisy and inconsistent, making classification challenging. We propose a multi-view Mixture-of-Experts (MoE) framework integrating semantic, lexical, and syntactic representations: Sentence-BERT encodes semantics, TF-IDF captures lexical patterns, and syntactic features provide structural signals. A dynamic router assigns view weights, learning each representation’s contribution to predicting refactoring types. Experiments show our model achieves state-of-the-art performance, particularly for challenging types such as Push Down Method. Router weight analysis reveals distinct view preferences across refactoring types, highlighting the value of complementary feature representations for accurate automated refactoring classification.

Paper Nr: 92
Title:

Incremental Static Analysis for Detecting and Refactoring Data Clumps in TypeScript

Authors:

Padma Iyenghar, Nils Baumgartner, Marlena Schmidt and Elke Pulvermüller

Abstract: Data Clumps are a structural code smell in which groups of parameters or fields repeatedly occur together, indicating missing abstractions and reduced cohesion. While Data Clumps have been studied for established object-oriented languages, comparable detection and refactoring support for TypeScript remains limited despite its widespread use in modern software systems. This paper presents an incremental, rule-based static analysis methodology for detecting and mitigating Data Clumps in typed code bases within interactive development environments. The approach combines structural indexing, pairwise detection with explicit language-specific exclusions, and conservative refactoring workflows that prioritize semantic safety and developer control. The methodology is instantiated for TypeScript and evaluated on five projects ranging from small synthetic benchmarks to a large industrial-scale framework. The empirical study assesses detection latency, scalability, and refactoring overhead. Results show that incremental detection supports low-latency feedback for typical application-scale projects and that semi-automated refactoring can be applied efficiently within clearly defined safety boundaries. The findings demonstrate that the proposed approach is feasible as a practical software engineering activity and provide empirical evidence for integrating structural smell analysis into modern development workflows.

Paper Nr: 99
Title:

Semantic Diversity-Driven Test Case Prioritization for System Tests

Authors:

Leyla Isabalayeva, Gürsel Cesur, Tanay Alpkonur, Hakan Kılınç and Hasan Sözer

Abstract: Regression testing is a critical component of the modern software development lifecycle, ensuring that system enhancements do not degrade existing functionality. However, in large-scale Continuous Integration (CI) environments, the growing size of test suites delays the valuable feedback from the execution of all tests. This paper presents a Test Case Prioritization (TCP) method that aims at increasing the semantic diversity among the executed system tests. We empirically evaluate the proposed approach in an industrial setting using the cost-cognizant Average Percentage of Faults Detected (APFD) metric. The results show that semantic diversity–based TCP substantially outperforms both the original test ordering and a runtime-based prioritization strategy. In particular, our approach achieves up to 39% improvement in fault detection effectiveness, with a large proportion of faults revealed early in the test execution, thereby effectively reducing regression testing feedback loops in CI environments.
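
For context on the evaluation metric named in this abstract, the sketch below follows the commonly cited formulation of cost-cognizant APFD, which weights each fault by its severity and each test by its execution cost. Whether the paper uses exactly this formulation is an assumption; the function name and sample data are illustrative, and every fault is assumed to be revealed by at least one test in the ordering.

```python
def apfd_c(order, costs, detects, severities):
    """Cost-cognizant APFD for one test ordering.

    order: test ids in execution order
    costs: test id -> execution cost
    detects: fault id -> set of test ids that reveal it
    severities: fault id -> severity weight
    """
    position = {test: index for index, test in enumerate(order)}
    total_cost = sum(costs[test] for test in order)
    total_severity = sum(severities.values())
    score = 0.0
    for fault, revealing_tests in detects.items():
        # Index of the first test in the ordering that reveals this fault.
        first = min(position[test] for test in revealing_tests)
        remaining_cost = sum(costs[test] for test in order[first:])
        score += severities[fault] * (remaining_cost - 0.5 * costs[order[first]])
    return score / (total_cost * total_severity)


costs = {"A": 1, "B": 1, "C": 1, "D": 1}
detects = {"f1": {"C"}, "f2": {"A"}}
severities = {"f1": 1, "f2": 1}

original_order = apfd_c(["A", "B", "C", "D"], costs, detects, severities)
prioritized_order = apfd_c(["C", "A", "B", "D"], costs, detects, severities)
```

With unit costs and severities, as in this toy data, the value coincides with the classic APFD: 0.625 for the original ordering and 0.75 once the fault-revealing tests run first.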

Paper Nr: 165
Title:

Reinforcement Learning-Based Software Metric Selection for Defect Prediction

Authors:

Dipender Singh, Abhinav Jamwal, Manish Agrawal and Sandeep Kumar

Abstract: Software defect prediction aims to identify defect-prone modules so that testing resources can be allocated effectively. Modern software projects often contain a large number of software metrics, many of which may be irrelevant or redundant for prediction. Using all available metrics can increase model complexity and reduce prediction effectiveness. To address this issue, this paper proposes a reinforcement-learning-based metric selection framework called Reinforced Metric Selection for Software Defect Prediction (RMS-DP). The proposed approach formulates metric selection as a sequential decision-making problem and employs a Deep Q-Network (DQN) to learn an adaptive metric selection policy. The selected metric subset is then used to train a Logistic Regression classifier for defect prediction. The framework is evaluated on seven projects from the JIRA dataset containing 65 software metrics. Experimental results show that RMS-DP improves MCC and Recall compared with a non-metric-selection baseline while reducing the number of metrics from 65 to approximately 4–10 for most projects. Additional experiments further demonstrate the robustness of the selected metrics across multiple classifiers and highlight the importance of class-imbalance handling in defect prediction.
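For readers unfamiliar with the two reported measures, the following sketch computes MCC and Recall from a binary confusion matrix using their standard definitions. The function names and the example counts are illustrative assumptions and are not taken from the paper's experiments.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when any marginal is empty and MCC is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def recall(tp: int, fn: int) -> float:
    """Fraction of truly defective modules that the classifier flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced project: 10 defective modules, 90 clean ones.
# A classifier that labels everything "clean" has 90% accuracy,
# yet both MCC and Recall are zero.
always_clean = {"tp": 0, "tn": 90, "fp": 0, "fn": 10}
print(mcc(**always_clean), recall(always_clean["tp"], always_clean["fn"]))
```

The example shows why MCC and Recall are more informative than accuracy when defective modules are rare.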

Paper Nr: 190
Title:

A Multimodal NLP Framework for Detecting and Explaining Code Vulnerabilities

Authors:

Jorge Guerreiro and Ibéria Medeiros

Abstract: Source code vulnerabilities, in particular in web applications, are direct threats to users’ privacy and security. This work introduces VULNLAN, a novel NLP-ensemble framework designed to detect and explain vulnerabilities in applications’ source code. A code translator module normalises code into an Intermediate Language (IL), then detection is supported by NLP classification models and a heuristic that processes their results, dictating the final classification. Explanation relies on sequence models that reveal the intermediate execution states the application’s code takes, thereby explaining why vulnerabilities exist (or not) in the code. We instantiate VULNLAN with two neural networks and a sequential model for classification, and two explanation sequential models to discover SQL Injection (SQLi) vulnerabilities in PHP web applications. A preliminary evaluation was conducted on 1,362 instances from the SARD dataset and 10 real web applications. The NLP-ensemble achieved an accuracy of 96% in the former and flagged 518 vulnerabilities in the latter, with 10% being false positives. In both datasets, the sequential explanation models successfully identified taint propagation and sanitisation points, producing interpretable outputs. These results show that applying NLP to an IL representation can both detect and explain vulnerabilities in PHP source code, closing the gap between automated detection and actionable remediation.

Paper Nr: 139
Title:

Automated Unit Test Refactoring to Eliminate Test Smells and Support Test Maintenance

Authors:

Anna Derezińska and Olgierd Sobieraj

Abstract: Many smells specific to unit tests have been classified in recent years. However, it is not practical to address all of them when maintaining unit tests. A subset of more than 20 test smells was carefully analysed. The aim of the research was not only to detect smells in the test code automatically, but also to refactor the affected tests automatically in order to improve their maintainability. Tool support has been introduced that detects ten test smells and automatically refactors the unit tests when requested by a developer. The approach was experimentally evaluated on a set of open source programs. We compared metrics of the test smell detection ability. We also measured different software quality metrics to assess the characteristics of the automatically refactored unit tests and to examine the impact of this kind of refactoring on test maintenance.

Area 4 - Theory and Practice of Systems and Applications Development

Full Papers
Paper Nr: 30
Title:

A Simple Trace Semantics for Asynchronous Sequence Diagrams

Authors:

David Faitelson and Shmuel Tyszberowicz

Abstract: Sequence diagrams are a popular technique for describing interactions between software entities. However, because the OMG’s UML standard is not based on a rigorous mathematical structure, it is impossible to deduce a single interpretation of the notation’s semantics, nor understand precisely how its different fragments interact. Although many semantics have been proposed in the literature, they are too mathematically demanding for most software engineers and are often incomplete, especially in handling lifeline creation and deletion semantics. In this work, we describe a simple semantics based on the theory of regular languages. This theory is a standard part of most undergraduate computer science curricula. Our semantics covers all the major compositional fragments and the creation and deletion of lifelines. Another important contribution is that we uncover significant ambiguities and inconsistencies in the sequence diagram notation, particularly regarding the roles of lifelines, self-messages, the separation of send and receive into distinct events, and the meaning of the loop fragment. We believe that these issues should be resolved one way or the other, regardless of the particular semantics presented in this work.

Paper Nr: 32
Title:

User Requirements in Research Software Engineering: User-Centric Software Engineering for Sensitive Qualitative Research Data

Authors:

Sabina Mollenhauer

Abstract: This paper describes a requirement analysis that uses a qualitative elicitation strategy, combining ethnography, expert interviews, and grounded theory analysis, to develop software-supported processes for research data management in research software engineering. The primary stakeholders are qualitatively working researchers who produce interview transcripts that embed sensitive information in rich, contextual descriptions. The analysis pursues two main objectives: (1) to identify the requirements for transforming these transcripts into FAIR research data, and (2) to meet these requirements by adapting existing disciplinary practices within a collaborative software-engineering workflow, with a focus on user-centered design and iterative refinement. The iterative analysis yielded categories and requirements that captured the complexities of research practices among qualitatively working researchers and data advisors, highlighting the potential of use cases such as de-identification and encryption. Notably, the analysis revealed a unique de-identification practice, composite narratives, which lacks existing software support, and identified a need for institutional support for better encryption practices, underscoring the potential for developing a software-supported, user-centered encryption process.

Paper Nr: 36
Title:

The Fifth Graph Normal Form (5GNF): A Trait-Based Framework for Metadata Normalization in Property Graphs

Authors:

Yahya Sa'd, Vojtěch Merunka and Renzo Angles

Abstract: Graph databases are increasingly used in software systems that depend on rich metadata, yet current modelling practices often duplicate metadata across nodes, leading to redundancy and inconsistent semantics. This paper introduces the Fifth Graph Normal Form (5GNF), a trait-based normalization framework that structures reusable metadata as canonical Trait Nodes connected through explicit HAS TRAIT relationships. We formalize trait dependencies (tFDs), present the TraitExtraction5GNF algorithm, and implement the approach in Neo4j. An experimental evaluation on the Northwind dataset demonstrates substantial reductions in metadata redundancy. The normalization process removes approximately 2,991 redundant metadata instances, reducing the Metadata Reuse Ratio from 26.67 to 1.74, while query workloads show improved efficiency including up to a 3.6x reduction in database accesses for metadata-intensive queries. These results indicate that 5GNF provides a reproducible and semantically precise framework for metadata normalization in property graphs while maintaining competitive query performance and supporting modular metadata modelling aligned with conceptual modelling and software engineering practices.

Paper Nr: 38
Title:

Observable Consistency Checking across Requirements and Models

Authors:

Tianhai Liu, Shmuel Tyszberowicz and Bernhard Beckert

Abstract: Model-Driven Development (MDD) promises productivity gains in software development. However, its adoption remains limited due to persistent challenges in maintaining consistency, both between requirements and the models derived from them, and across the models themselves. Heterogeneous requirements and models from multiple disciplines, such as software, mechanical, or electrical engineering, often overlap in describing a system. They use different terminology and value ranges, which potentially leads to misalignments and contradictions. Existing syntactic methods capture only structural differences, while constraint-based approaches struggle to integrate heterogeneous artefacts. We address this challenge through observables: domain properties constrained by requirements and models, which serve as a unifying abstraction for consistency. We present an ontology-driven framework that leverages retrieval-augmented generation and large language models (LLMs) to automatically extract observables and constraints from heterogeneous artefacts. The extracted observables are harmonised via ontology construction, and the constraints are compiled into solver-ready formulas. This enables observable-centric semantic consistency checks across requirements and models, ensuring a consistent and accurate representation of data. It helps engineers identify inconsistencies earlier and reduces manual effort in consistency analysis. Evaluation on synthetic datasets and an industrial automotive case study demonstrates the feasibility and scalability from small sets to thousands of requirements.

Paper Nr: 52
Title:

Process-Centric Vulnerability Handling Evidence Engineering for Cyber Resilience Act (CRA) Compliance

Authors:

Padma Iyenghar

Abstract: The Cyber Resilience Act (CRA) introduces explicit obligations for manufacturers to establish, operate, and document structured vulnerability-handling processes across a product’s lifecycle. While many organizations perform vulnerability handling activities, systematically implementing these obligations and producing auditable compliance evidence remains an engineering challenge. This paper proposes a process-centric vulnerability evidence engineering framework for CRA compliance. The framework represents CRA vulnerability-management obligations as normative requirements, defines a structured evidence metamodel for vulnerability cases, and realizes both through a BPMN-based vulnerability handling workflow. Evidence production is embedded into process execution, and executable traceability over obligation–task and task–artifact relations is used to assess obligation coverage and evidence completeness. A docs-as-code realization demonstrates the operational feasibility of the framework by showing that CRA process establishment, remediation, disclosure, and notification obligations are systematically implemented and traceable to concrete evidence artifacts, with completeness checks executed automatically as models evolve. The results show how CRA compliance can be engineered as a property of software processes rather than reconstructed through manual documentation.

Paper Nr: 68
Title:

Towards Interpretable Formal Software Requirements: Empirical Assessment of Open-Source LLMs for LTL to NL Translation

Authors:

A. Vimaleswar and Arpit Sharma

Abstract: Formal software requirements specified in Linear Temporal Logic (LTL) provide unambiguous, mathematically verifiable descriptions of software behavior, essential for model checking and runtime verification. However, LTL’s symbolic syntax is inaccessible to most stakeholders, hindering validation, communication, traceability, counterexample debugging, and trust in safety-critical systems. While large language models (LLMs) excel in requirements engineering (RE) tasks, prior research has focused almost exclusively on translating natural language (NL) to temporal logics. Systematic evaluation of the reverse direction, from LTL to human-readable NL descriptions, remains limited. This paper presents the first systematic empirical study of open-source LLMs for LTL to NL translation. In Phase-1, 40 diverse models are screened for basic interpretability, selecting 16 for detailed analysis. In Phase-2, these models are evaluated using 7 prompting strategies (including explanation-augmented variants) on 80 LTL specifications from 4 public datasets. Translation quality is assessed via a multi-metric framework measuring lexical overlap, syntactic structure, and semantic alignment with reference descriptions. Our results highlight the importance of multi-metric evaluation and show that several small and medium sized open-source LLMs can outperform larger models under appropriate prompting, indicating that model scale alone is not sufficient to determine translation quality.

Paper Nr: 75
Title:

Modelling GDPR-Based Privacy Requirements with Software Engineering Diagrams: A Systematic Literature Review

Authors:

Evangelia Vanezi, Georgia M. Kapitsaki and Anna Philippou

Abstract: The application of the General Data Protection Regulation (GDPR) has significantly affected privacy requirements elicitation, modelling, and verification in Software Engineering (SE). One of the affected areas is requirements visualisation through modelling diagrams, which plays a crucial role in ensuring privacy compliance, as functional system requirements should be integrated with GDPR-based privacy requirements. We present a systematic literature review on how SE diagrams have been employed to capture and integrate GDPR-based privacy requirements into software system design. The study aims to identify the existing research landscape, existing gaps, and directions for future work. Following a rigorous search protocol and addressing two research questions, 18 primary studies published between 2017 and 2025 were selected, analysed, and categorised based on (i) the diagram types used, and (ii) the GDPR principles or rights addressed. The findings highlight the need for inter-diagram integration, full lifecycle traceability mechanisms, tool support, and automated compliance checking.

Paper Nr: 82
Title:

Domain-Driven Modeling of Combinatorial Constraint Satisfaction Problems for Quantum Solvers

Authors:

Marc Uphues, Sebastian Thöne and Herbert Kuchen

Abstract: Over the last years, quantum computing has reached a level of technological maturity that enables the exploration of potential applications. One use case is solving combinatorial Constraint Satisfaction Problems. Yet, it seems challenging to utilize quantum computing for such problems in practice, as software developers must translate these problems to specialized models required by quantum solvers. A common candidate is the model of Quadratic Unconstrained Binary Optimization. It serves as an input format for both quantum annealing and gate-based quantum algorithms. With this paper, we present a software-based and domain-driven modeling approach that is compatible with such quantum solvers. It facilitates the inline-declaration of problems within domain model code through an annotation-based syntax. Our novel approach is driven by a framework-like engine that encapsulates and automates the process of problem instantiation, translation, solver invocation, and solution mapping. It enables software developers and domain experts to solely focus on the problem domain, rather than highly specialized models prescribed by quantum solvers. By incorporating principles of Domain-Driven Design, our approach employs a well-known modeling technique.
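To make the target model concrete, here is a small sketch of how a single "choose exactly one option" constraint can be written as a Quadratic Unconstrained Binary Optimization (QUBO) problem and checked by brute force. It uses the textbook penalty (x_1 + ... + x_n - 1)^2, with the constant term dropped. All function names are hypothetical, and the sketch does not reproduce the annotation-based syntax or the engine described in the paper.

```python
from itertools import product
from typing import Dict, List, Tuple

Qubo = Dict[Tuple[int, int], float]


def one_hot_qubo(n: int, penalty: float = 1.0) -> Qubo:
    """QUBO coefficients for penalty * (sum(x) - 1)^2 without the constant.

    Because x_i^2 == x_i for binary variables, the expansion yields
    -penalty on the diagonal and 2 * penalty on each pair i < j.
    """
    q: Qubo = {}
    for i in range(n):
        q[(i, i)] = -penalty
        for j in range(i + 1, n):
            q[(i, j)] = 2.0 * penalty
    return q


def energy(q: Qubo, x: Tuple[int, ...]) -> float:
    """Evaluate the quadratic form for one binary assignment."""
    return sum(w * x[i] * x[j] for (i, j), w in q.items())


def brute_force_minima(q: Qubo, n: int) -> Tuple[float, List[Tuple[int, ...]]]:
    """Enumerate all 2^n assignments; a classical stand-in for a quantum solver."""
    best = float("inf")
    minima: List[Tuple[int, ...]] = []
    for x in product((0, 1), repeat=n):
        e = energy(q, x)
        if e < best:
            best, minima = e, [x]
        elif e == best:
            minima.append(x)
    return best, minima
```

For three variables, the minimum-energy assignments are exactly the three one-hot vectors, so any solver that minimizes this QUBO also satisfies the original constraint.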

Paper Nr: 85
Title:

How Can the RLSPL Framework Strengthen Traceability and Reproducibility in Reinforcement Learning Projects?

Authors:

Syrine Wardi, Rania Mzid and Tewfik Ziadi

Abstract: Reinforcement Learning (RL) experiments are highly sensitive to design and configuration choices, including hyperparameters, training budgets, and evaluation protocols. Even when implementations are partially documented or code is made available, reproducing reported results and extending experiments in a controlled and traceable manner remains challenging. This paper addresses these issues using RLSPL, a Software Product Line–based framework for RL engineering. We first show that a reference RL study can be faithfully reproduced as an RLSPL configuration, making all design decisions explicit and traceable. Building on this baseline, we introduce a unified extension that integrates hyperparameter optimization with configurable evaluation within the same product-line variant. This extension enables systematic exploration of alternative training configurations while preserving comparability with the reproduced baseline. Evaluation artifacts, including performance metrics, logs, and diagnostic visualizations, are generated automatically and remain explicitly linked to their configuration choices. An empirical study on a task placement problem illustrates how RLSPL supports reproducible reproduction, controlled extensibility, and transparent analysis of RL experiments.

Paper Nr: 88
Title:

Can ChatGPT Generate Realistic Synthetic System Requirements Specifications? Results of a Case Study

Authors:

Alex R. Mattukat, Florian M. Braun and Horst Lichter

Abstract: System requirements specifications (SyRSs) are natural language (NL) artifacts from requirements engineering. Access to real SyRSs is highly valuable for research purposes but limited due to proprietary restrictions or confidentiality concerns. Generating synthetic SyRSs (SSyRSs) can address this scarcity. Black-box large language models (LLMs) such as ChatGPT offer attractive generation capabilities for NL artifacts, as they are easily accessible and do not require prior model training or access to real-world data. However, LLMs suffer from hallucinations and overconfidence, posing major challenges in their use. We designed an exploratory study to investigate whether, despite these challenges, we can generate realistic SSyRSs with ChatGPT without having access to real SyRSs. Using a systematic approach that leverages prompt patterns, LLM-based quality assessments, and iterative prompt refinements, we generated 300 SSyRSs across 10 industries. Results of the last iteration were evaluated by an expert study (n=87). 62% of experts considered the SSyRSs realistic. However, in-depth examination revealed contradictory statements and deficiencies. Overall, we were able to generate realistic SSyRSs to a great extent with ChatGPT, but LLM-based quality assessments cannot fully replace thorough expert evaluations. This paper presents the methodology and results of our study and discusses the key insights we obtained.

Paper Nr: 91
Title:

EAGLE-BPM: An Ethical Assessment and Governance Framework for Integrating Large Language Models into Process-Aware Environments

Authors:

Omnia Saidani Neffati

Abstract: Large Language Models (LLMs) are increasingly integrated into process-aware systems to support operational decision-making, such as classification, prioritization, and routing. The use of LLM-supported decisions within automated workflows introduces ethical governance considerations that must be addressed at the level of process execution. In this context, there is a need for mechanisms that can systematically assess the ethical sensitivity of LLM-supported decisions and guide appropriate governance actions as these decisions influence process behaviour. In this paper, we introduce EAGLE-BPM, an ethical assessment and governance framework designed to support the integration of LLM-supported decisions within process-aware environments. EAGLE-BPM explicitly identifies LLM-aware decision points within process models, evaluates ethical sensitivity at the criterion level and links assessment outcomes to well-defined governance actions. The applicability of EAGLE-BPM is illustrated through a healthcare triage scenario, demonstrating how ethical sensitivity may vary across decision criteria and how targeted governance actions can be applied. The results highlight the importance of process-aware and context-sensitive ethical governance and show that effective oversight of LLM-supported decisions can be achieved without disabling automation.

Paper Nr: 97
Title:

Market-Based Process Coordination: Trading Routing Efficiency for Schedule Stability in Volatile Field Operations

Authors:

Leo Poss and Stefan Schönig

Abstract: Centralized orchestration creates unstable schedules in volatile field environments. We introduce a Swap-Enabled Auction in which decentralized agents negotiate task allocation via economic bids, with optional risk-aware pricing to model deadline pressure. A tunable Stability Parameter governs the economic cost of de-commitment. We compare this mechanism against a Centralized Offline Optimum, a dynamic Rolling Horizon solver, and standard heuristics. Results reveal a distinct strategic trade-off: the Market acts as an insurance mechanism, incurring a Resilience Premium (Price of Anarchy 1.72 under disruption) to reduce schedule volatility by more than 90%. Rather than reshuffling the entire fleet, the market effectively confines the impact of disruptions to those agents who voluntarily participate in a swap. This removes the nervousness associated with dynamic re-optimization and clears in milliseconds, demonstrating that process resilience can be explicitly managed as an economic parameter, trading off efficiency for schedule stability.

Paper Nr: 115
Title:

PLCEQ: Behavioural Equivalence Checking for Industrial PLC Software Migration

Authors:

Santonu Sarkar, Avijit Mandal and Raoul Jetley

Abstract: Industrial plants rely on Programmable Logic Controllers (PLCs) and Distributed Control Systems (DCSs) for process automation. As technology evolves, it becomes necessary to migrate legacy control software written in the heritage Taylor™ Control Language (TCL) to the modern IEC 61131-3–compliant language Sequential Function Chart (SFC). Such migration processes may introduce unintended changes in the control logic, potentially leading to errors and safety risks. This paper introduces a tool, PLCEQ, a Petri net–based framework for modeling the behaviour of control software written in TCL and its migrated SFC counterpart. The approach verifies behavioural equivalence between the source and migrated code using a Petri net–based equivalence checker. An extensive evaluation demonstrates that PLCEQ automatically establishes equivalence for 92% of industrial migrations when we tested it on 300 industrial benchmarks, as well as on 80 benchmarks drawn from the open-source OSCAT library. The approach is sound but not complete, ensuring reliable equivalence detection while flagging potential mismatches conservatively.

Paper Nr: 116
Title:

A Guardrail-Driven Multi-Agent Architecture for AI-Assisted Public Administration Workflows

Authors:

Ciprian Paduraru, Bogdan Dumitru and Alin Stefanescu

Abstract: Public administration processes are multi-step, document-intensive, and subject to strict authorization and traceability requirements. This paper presents a multi-agent architecture for AI-assisted administrative workflows, instantiated in three Romanian public services: identity card issuance, social benefits application, and local tax payment. The central contribution is a formally defined step-driven interaction model that separates agent reasoning, user interface rendering, and persistent case-state mutation. Agents emit structured step descriptors that specify admissible inputs, validation constraints, and transition targets. Workflow execution is formalized as a labeled transition system (LTS), in which interaction steps correspond to guarded execution transitions between explicit workflow states. Persistent case-state mutation occurs exclusively through authorized tool invocations under declared workflow and authorization preconditions. The architecture integrates workflow orchestration, retrieval-augmented access to procedural and legal knowledge, local document processing (using OCR), and layered authorization enforcement within a unified execution contract. An open-source prototype demonstrates cross-domain reuse of shared orchestration components across heterogeneous administrative workflows. The evaluation empirically assesses workflow conformance, invariant preservation, authorization scenarios, and efficiency characteristics under local deployment conditions.
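The idea of workflow execution as a labeled transition system with guarded transitions can be sketched in a few lines. The state names, labels, and guard conditions below are hypothetical and chosen only for illustration; they are not the step descriptors or authorization rules of the described prototype.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Guard = Callable[[dict], bool]


@dataclass(frozen=True)
class Transition:
    target: str   # state reached when the transition fires
    guard: Guard  # precondition evaluated on the step payload


class Workflow:
    """Minimal labeled transition system with guarded transitions."""

    def __init__(self, initial: str,
                 transitions: Dict[Tuple[str, str], Transition]) -> None:
        self.state = initial
        self.transitions = transitions

    def step(self, label: str, payload: dict) -> bool:
        """Fire (state, label) if it exists and its guard holds.

        The state changes only here, which mirrors the principle that
        case state is mutated solely through checked transitions.
        """
        transition = self.transitions.get((self.state, label))
        if transition is None or not transition.guard(payload):
            return False
        self.state = transition.target
        return True


def build_demo_workflow() -> Workflow:
    # Hypothetical three-state workflow for an identity document request.
    return Workflow(
        initial="collect_identity",
        transitions={
            ("collect_identity", "submit"): Transition(
                "review", lambda p: bool(p.get("id_number"))),
            ("review", "approve"): Transition(
                "issued", lambda p: p.get("role") == "clerk"),
        },
    )
```

A rejected step leaves the state untouched, so an invalid input or an unauthorized actor cannot advance the case.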

Paper Nr: 121
Title:

Learning Fairness through Bias Mitigation and Reflection

Authors:

Andrada-Mihaela-Nicoleta Moldovan, Andreea Vescan and Crina Grosan

Abstract: This study investigates how AI fairness can be meaningfully integrated into undergraduate AI curricula through a structured, course-based empirical approach. Conducted within a third-year Artificial Intelligence course, the research engaged 43 students in a dedicated lab assignment requiring them to compute and interpret fairness metrics, identify potential sources of bias, and apply mitigation strategies to a sample dataset and model. Guided by the Goal-Question-Metric framework and informed by reflective practice principles, the study addresses two research questions: how students assess and mitigate bias, and what introspective reflections they report after completing the task. Findings indicate that students were able to define context-sensitive fairness objectives, select and compute relevant metrics, and compare results before and after applying mitigation strategies. Questionnaire responses reflected strong confidence in metric selection and mitigation design. However, students reported greater difficulty in identifying suitable metrics for dynamic systems, interpreting potentially conflicting fairness indicators, and determining acceptable fairness–accuracy trade-offs. Reflective analyses further show that students developed a structured, metrics-driven, and system-level perspective on fairness, recognizing it as an iterative engineering process shaped by data constraints, model behavior, and ethical considerations rather than a one-time technical adjustment.

Paper Nr: 130
Title:

Rethinking Correctness and Efficiency in AI-Assisted Code Generation

Authors:

Haluk Altunel, Tugba Gurgen Erdogan and Ayca Kolukısa

Abstract: Large Language Models (LLMs) have significantly advanced the software engineering life cycle by automating code generation, yet evaluations have traditionally relied heavily on functional correctness metrics like Pass@k. Such correctness-only assessments often mask critical deficiencies in computational efficiency, including computational overhead, memory footprint, and long-term maintainability. To address this gap, this study systematically surveys eight rigorous benchmarking studies to review the current evaluation landscape of AI-assisted code generation. By analyzing benchmarking methodologies, datasets, and multi-dimensional performance metrics, we highlight the severe efficiency gap between LLM-generated code and optimal human expert baselines. Ultimately, this paper outlines the current boundaries of generative AI in software development and identifies critical research gaps, advocating for a paradigm shift toward rigorous, hardware-agnostic compound metrics and stress testing to achieve truly scalable and efficient AI-assisted programming.

Paper Nr: 146
Title:

A Mixed-Method Empirical Study of LLM Assistance in Software Engineering Workflows

Authors:

Pamali D. Weerasinghe, Roshan N. Rajapakse, Isuru Dharmadasa and Chamath Keppitiyagama

Abstract: Large Language Models (LLMs) are increasingly integrated into software development workflows, yet their effects are often discussed without distinguishing between task types, developer seniority, and verification demands. This paper presents a mixed-method empirical study of LLM-assisted software engineering with first-year and fourth-year undergraduates. Phase 1 is a preliminary survey (N = 157) that characterizes LLM exposure, reliance, and trust calibration among the two groups. Phase 2 is a task-based quasi-experiment with a purposive sample from both cohorts (n = 20). Here, we compare AI-assisted and non-AI conditions on a structured set of software engineering tasks spanning implementation, constraint-driven algorithm selection, and architectural reasoning. We then analyze performance outcomes alongside behavioral traces captured via screen recording and a qualitative coding process. Survey results indicate widespread LLM adoption and substantial verification effort, alongside cohort differences in perceived LLM capability for constraint-heavy scenarios. The quasi-experiment further shows that AI assistance changes workflow structure. For example, participants frequently adopt AI-first task entry, copy–transfer integration, and AI-mediated debugging, whereas non-AI workflows rely more on documentation, prior templates, and iterative trial–error refinement. Overall, our findings suggest that the benefits of LLM assistance are task-dependent and mediated by expertise and verification practices, rather than by generation speed alone.

Paper Nr: 156
Title:

A Graph-Based Software Framework with Topological Analysis of Retinal Vessel Networks for Automated Diabetic Retinopathy Grading

Authors:

Nader Belhadj, Mohamed Amine Mezghich, Ridha Ghayoula and Lassaad Latrach

Abstract: Automated medical image analysis pipelines increasingly demand modular, extensible software architectures capable of combining deep learning with structured domain knowledge. This paper presents a novel software framework for diabetic retinopathy (DR) grading that integrates three complementary computational components into a unified, reproducible pipeline: (i) convolutional feature extraction via a pretrained EfficientNet-B3 backbone, (ii) topological data analysis (TDA) based on persistent homology of skeletonized retinal vascular networks, and (iii) graph neural network (GNN)-based relational learning over a population-level similarity graph. The framework addresses a core software engineering challenge: enabling contextual, inter-sample reasoning across clinical image collections without requiring additional manual annotations or patient-level metadata. Each stage of the pipeline is designed as an independently testable and replaceable module, following software quality principles of separation of concerns, interface-driven composition, and reproducibility. A topology-aware dual-similarity graph construction mechanism jointly accounts for visual and structural feature distances to model meaningful inter-patient relationships. A GraphSAGE network then propagates disease-related features across this graph, refining node representations through neighborhood aggregation. The framework is evaluated on three publicly available benchmarks: Kaggle Diabetic Retinopathy, Messidor-2, and APTOS 2019. It achieves classification accuracies of 95.5%, 96.1%, and 94.6% respectively, consistently outperforming a strong CNN-only baseline by 1.5 to 2.3 percentage points. Ordinal-aware metrics (Quadratic Weighted Kappa, Macro-F1) further confirm the improvements, particularly for clinically critical intermediate disease stages. Ablation experiments validate the contribution of each pipeline component. These results demonstrate the practical value of integrating topological and relational reasoning into medical image analysis software, providing a reusable blueprint for next-generation clinical AI systems.
Download

Paper Nr: 166
Title:

A Multi-Agent Software Architecture for Enterprise Time Series Forecasting

Authors:

Lynda Ayachi

Abstract: Many large organizations rely on heterogeneous and largely manual forecasting practices across departments such as HR, Finance, Operations and Marketing, which hampers reproducibility, governance and systematic use of exogenous signals. To address this, we propose an agentified forecasting platform architecture in which specialized software agents collaborate over a shared enterprise data ecosystem to automate data preparation, anomaly detection, model selection, forecast generation and governance. The platform is designed as a set of loosely coupled services orchestrated by workflow engines and event buses, leveraging a feature store and a model registry to ensure traceability and controlled evolution of forecasting pipelines. We position this architecture as a reusable software engineering approach for building scalable, explainable and governable forecasting systems in enterprise contexts, independently of specific machine learning algorithms or business domains. We also outline an evaluation plan combining forecasting accuracy metrics, operational indicators (latency, robustness, adoption) and governance-related criteria (auditability, compliance), to be applied in a real-world industrial setting. This paper contributes an architectural blueprint, engineering principles and an evaluation framework for agent-based forecasting platforms, aimed at practitioners and researchers seeking to industrialize time series forecasting at scale.
Download

Paper Nr: 168
Title:

Towards Interpretable Ensemble Learning for Software Effort Estimation

Authors:

Karishma Doshi, Suyash Shukla and Sandeep Kumar

Abstract: To enable successful software project planning, accurate prediction of development effort is essential. Software Effort Estimation (SEE) aims to predict the effort required in terms of labour, time, and cost to develop software systems. Accurate effort estimation is necessary for effective planning, resource allocation, cost management, and risk management. Despite extensive research, SEE remains a challenging and active research problem. Over the years, various techniques have been proposed for SEE, including traditional estimation techniques and machine learning (ML) approaches. ML-based models have demonstrated superior performance by learning complex patterns from historical project data. Recent ML techniques, such as ensemble learning, have attracted significant attention in SEE for their ability to improve prediction accuracy. In our earlier study, we proposed a self-adaptive ensemble-based approach for SEE, which integrated multiple ML models while jointly optimizing their hyperparameters and model weights. However, despite the improved accuracy, the model lacks interpretability, which remains an obstacle to its practical adoption. To address this limitation, this study incorporates Local Interpretable Model-Agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP) for model interpretation. SHAP provides a unified framework for explaining the contributions of individual features, which yields meaningful insights into the factors that affect the predictions. Experimental results demonstrate that the proposed approach achieves strong predictive performance while providing meaningful interpretability. This enables practitioners to better understand and trust the model’s predictions.
Download

Paper Nr: 171
Title:

An Encoder-Flexible Semantic Hypergraph Framework for API Recommendation in Mashup Development

Authors:

Abhinav Jamwal, Dipender Singh, Manish Agrawal and Sandeep Kumar

Abstract: API recommendation is an important task in mashup development, where developers must identify suitable Web APIs from a large and evolving ecosystem. Existing methods mainly rely on invocation history, collaborative signals, or structural relations, but often underuse the semantic information available in mashup descriptions, API descriptions, and tag metadata. Moreover, many pipelines are tied to a single semantic encoder, which limits extensibility and obscures the effect of semantic backbone choice on recommendation quality. To address these limitations, this paper proposes an encoder-flexible semantic hypergraph framework for mashup–API recommendation. The framework encodes mashup descriptions, API descriptions, mashup tags, and API tags into dense semantic vectors, uses them to initialize trainable node representations, and refines them through multi-channel hypergraph propagation and adaptive layer aggregation. A joint objective further combines recommendation ranking with tag-level semantic alignment. Experiments with seven pretrained sentence encoders under the same downstream architecture show that the proposed framework is consistently effective and that encoder choice materially influences recommendation behavior. In particular, BGE and GTE yield the strongest ranking performance, whereas MPNet-based variants provide broader recommendation coverage. These findings highlight encoder-flexible semantic initialization as a meaningful design dimension in mashup API recommendation.
Download

Paper Nr: 178
Title:

Ecosystem-Based Personas: Modeling Sociotechnical Roles in Complex Digital Service Systems

Authors:

Marcelo Judice, Andrea Judice, Glauco Pedrosa, John Gardenghi, Rejane Figueiredo, Giovanna Santana and Flavio Costa

Abstract: Complex digital service systems are increasingly characterized by sociotechnical environments in which multiple actors perform interdependent activities and rely on continuous coordination and information exchange. In such contexts, traditional user representations centered on individual profiles may be insufficient to capture the organizational and relational structures that shape how services operate. Although personas are widely used in user-centered design to represent user needs and behaviors, existing approaches often overlook the sociotechnical roles and interdependencies among actors involved in complex service ecosystems. This paper proposes a model of ecosystem-based personas for modeling sociotechnical roles in complex digital service systems. The proposed approach integrates ecosystem mapping, analysis of organizational data and identification of recurring sociotechnical roles to support the systematic construction of ecosystem-based personas aligned with the relational structure of digital service systems. The applicability of the model is illustrated through an exploratory study conducted in a digital public service related to the management of federal real estate assets in Brazil. The results show how modeling users as roles within a service ecosystem can reveal organizational interdependencies, information flows and distributed responsibilities among actors. From the perspective of digital service design and software engineering, the proposed approach contributes to stakeholder analysis and to the understanding of complex organizational contexts that shape the design and evolution of digital service systems.
Download

Short Papers
Paper Nr: 27
Title:

Leveraging Project Metrics with LLMs: Designing a Retrieval-Augmented Generation Chatbot for Software Project Management

Authors:

Esa Karjalainen, Timo Poranen, Zheying Zhang and Pekka Mäkiaho

Abstract: Software project management faces increasing demands as development practices evolve, motivating the development of new support tools. Recent advances in generative AI, particularly large language models (LLMs), offer opportunities to enhance project management by analysing project data and providing guidance through a natural-language interface. This paper presents a Design Science Research study that designs and implements a retrieval-augmented generation (RAG) chatbot integrated with an existing project monitoring tool for student software projects. The prototype validates the system architecture and evaluates prompt engineering using a commercial LLM. The results show that project metrics can be effectively leveraged in a RAG chatbot to provide useful, context-aware assistance to project managers.
Download

Paper Nr: 39
Title:

Consent Requirements with Large Language Models: An Empirical Study on Clarity, Compliance, and Bias

Authors:

Anastasia Terzi, Christina Zoi and Stamatia Bibi

Abstract: Obtaining user consent for personal data management has recently become a significant challenge for software developers, who often struggle to integrate consent requirements into existing systems. This difficulty is compounded by the need to continuously update and maintain consent requirements alongside main software features to accommodate new templates and regulatory specifications. Recent advances in Large Language Models (LLMs) offer promising potential to automate the engineering of these requirements. In this paper, we examine the capabilities of popular LLMs, namely Llama, Gemma, DeepSeek, and Phi, to generate consent requirements in collaboration with developers providing natural language inputs. The LLMs enrich and transform these inputs into ready-to-use requirement specifications for integration into consent forms. We conducted an empirical, multi-dimensional evaluation of these LLM-generated requirements, encompassing both quality metrics and developer productivity assessments. The findings reveal a trade-off between model complexity, output quality, and developer productivity, which is exacerbated by the requirement context and output format. The paper provides practical implications to help practitioners select the appropriate LLM based on specific consent contexts and concludes with directions for future research.
Download

Paper Nr: 53
Title:

Integrating Machine Learning and a Rule-Based Approach for the Automated Requirements Elicitation from BPMN-Based Process Models

Authors:

José Neves, João Delgado, A. M. Rosado da Cruz and Estrela Ferreira Cruz

Abstract: Organizations are increasingly committed to planning and structuring their production processes with greater precision. Consequently, it has become standard practice for companies to model their business processes prior to initiating operations. Business Process Model and Notation (BPMN) has emerged as a widely adopted standard for business process modeling due to its expressive completeness, formal consistency, and relative ease of understanding. In software engineering, development starts with requirements elicitation, which is one of the most time-consuming and error-prone phases. Extracting information contained in business process models as a basis for identifying software requirements is an effective strategy for ensuring that the resulting specifications remain aligned with organizational needs and operational realities. At the same time, artificial intelligence, particularly advances in machine learning (ML) and natural language processing, has increasingly been applied to the domain of requirements engineering, offering new opportunities for automation and improved quality in requirements elicitation. In this paper, we propose a tool that automatically generates functional software requirements from BPMN-based business process models, using a rule-based approach, and subsequently evaluates the quality of the generated requirements using ML techniques. To demonstrate the use of the tool, two case studies are presented: one with a single BPMN model and another with three interrelated BPMN models. The results demonstrate the viability of this approach as an initial mechanism for requirements elicitation.
Download

Paper Nr: 62
Title:

Automatic Validation of Use Case Descriptions in Terms of Quality Criteria and Bad Smells via NLP Models and LLMs

Authors:

Evin Aslan Oğuz, Jochen M. Küster and Felix Lennart Schildmann

Abstract: Ensuring the quality of use case descriptions is a critical task in software engineering, as poor-quality artifacts can lead to ambiguity, miscommunication, and design flaws. Current quality assessment methods are largely manual, time-consuming, and prone to inconsistency. Existing approaches are limited in scope and rarely address the nuanced quality criteria or recurring issues, commonly known as “bad smells”, in a systematic way. This paper presents an observational study that explores the feasibility of detecting quality criteria violations and bad smells in use case descriptions using a combination of Natural Language Processing (NLP) techniques and Large Language Models (LLMs). The system operates based on a predefined set of quality criteria drawn from established literature and practitioner guidelines. It targets five key fields in use case descriptions: Name, Actors, Postcondition, Standard Procedure, and Extensions. Rather than seeking to generalize, this study serves as a first step toward understanding whether such detection can be performed using language models. Evaluation results show promising levels of accuracy, precision, recall, and F1 scores. A comparative analysis of GPT-4o and o1 highlights trade-offs in output quality, runtime, and cost, with GPT-4o emerging as the more practical choice. While the system provides consistent results to the extent permitted by the LLMs used, the findings suggest that criteria-based quality assessment of use case descriptions, supported by LLMs and NLP models, is both feasible and worthy of further exploration.
Download

Paper Nr: 70
Title:

Visual Strategies in Mobile Navigation: Preliminary Eye Tracking Findings on Usability Barriers with Older Adults

Authors:

John W. Castro, Gianina A. Madrigal and Agustín I. Astudillo

Abstract: The usability of mobile navigation applications is a critical area of study, particularly as older adults increasingly rely on these technologies for daily activities. Despite their widespread use, the specific interaction challenges faced by this demographic remain underexplored. This study presents preliminary findings on usability barriers in a widely used mobile navigation application, focusing on older adults' visual attention patterns. Using a controlled laboratory setup with a Tobii Pro Spark eye tracker, the study analyzed gaze behavior in older adults (aged 60-79) during a location-retrieval task. Preliminary results reveal significant discrepancies between the application’s intended workflow and older adults' actual cognitive strategies. Quantitative metrics indicate a prevalent "map-first" mental model, in which participants prioritized manual map exploration over using textual search bars to avoid input interactions. Furthermore, contrary to the assumption that decorative elements distract users, fixation data suggest that façade imagery served a functional role as a "safety anchor," allowing users to visually verify destinations, thereby mitigating error aversion. These findings underscore the need for software engineering approaches that move beyond standard efficiency heuristics and instead prioritize compensatory interface designs—such as redundant visual verification and map-centric navigation—to accommodate the unique cognitive processing of older adults.
Download

Paper Nr: 71
Title:

Why Do You Contribute to Stack Overflow? Insights for Sustaining Knowledge Ecosystems in the Age of LLMs

Authors:

Sherlock A. Licorish, Elijah Zolduoarrati, Tony Savarimuthu, Rashina Hoda, Ronnie De Souza Santos and Pankajeshwara Sharma

Abstract: Understanding developers’ motivations for participating in community question-and-answer (CQA) platforms is crucial for sustaining knowledge-sharing ecosystems (e.g., Stack Overflow), which is necessary to advance the discipline while also ensuring its longevity. This is particularly necessary in the age of LLMs, where data from such portals are used to train these models. Limited insights exist regarding how contributors’ motivations vary across national cultures. This research investigates Stack Overflow contributor motivations, analysing regional differences and relations to platform activity. A mixed-methods approach was employed, combining qualitative content analysis of 600 “About Me” profiles with quantitative linguistic analysis of 268,215 contributors’ data from the United States, China, and Russia. We found that contributors are primarily motivated by advertising opportunities and altruistic problem-solving desires, with Americans exhibiting the strongest self-promotional behaviours. Also, those with more detailed profiles tend to engage in advertising and social activities, while learning-oriented users maintain minimal self-presentation. Understanding these variations can inform strategies for enhancing cross-cultural participation in software engineering.
Download

Paper Nr: 72
Title:

Blueprint-Based Standardization and Executable Portability Support for Self-Adaptive Serious Games

Authors:

Spyros Loizou and Andreas S. Andreou

Abstract: Serious Games exhibit a strong coupling between game structure, content and execution logic, making reuse, portability and system modification very difficult, if not impossible. The ability to represent game structure elements in a standardised and reusable way that can be transferred across systems or changed without programming effort is limited, mainly because existing approaches often rely on engine-specific implementations. This paper addresses this challenge using a Blueprint-based framework for modelling and executing Serious Games, focusing on hierarchical structure, standardized metadata representation and game engine-independent execution. The proposed approach utilizes the concept of Blueprints to provide structured descriptions of game elements and flow based on a dedicated Scenario-Scene-Task hierarchy and exports these descriptions into executable XML blueprint code. A parser traverses the XML files and dynamically rebuilds the game environment based on BL and generic interaction templates. As a result, a game can be (re-)created, modified, transferred across systems and reused by editing the Blueprints only, without any changes to the underlying game engine. A use-case example is developed to demonstrate and validate the proposed approach through a serious game that aims to support education for children with learning disabilities. The experimental results show that changes to the game are reflected in real time, offering engine execution independence and system-to-system portability.
Download

Paper Nr: 76
Title:

Unifying Non-Uniform Formats of UML Model Interchange Using JSON

Authors:

Agnieszka Malanowska and Filip Pawłowski

Abstract: Despite the enduring popularity of UML as a modeling language, it is still difficult to exchange UML models between different modeling tools. Although the XMI standard is dedicated to UML model interchange, in practice each tool implements its own version of this specification. This results in problems when importing a UML project into another tool, even though the standard is theoretically supported by both the exporting and the importing environments. To overcome this issue, we propose a new notation for representing UML elements, based on the JSON format. The objective of the notation, named UMJ, is to unify various UML serialization formats, either by implementing translations from existing tool-specific formats or by creating plugins that allow direct import and export of a UMJ file to and from modeling environments. Our evaluation on several UML models demonstrates the usefulness of UMJ.
Download

Paper Nr: 78
Title:

Engineering Control in LLM-Based Recommender Systems

Authors:

Zihao Xiao, Jesús Tapia, Jorge Díaz, Samuel Sepúlveda, Oscar Ancán and Carlos Cares

Abstract: The widespread availability of Large Language Models (LLMs) has led to their rapid adoption as core components of AI-based software solutions, including recommender systems. Despite their expressive power and adaptability, LLMs produce non-deterministic outputs. While fine-tuning and continual retraining can reduce output variability, these techniques do not support improving recommendations or producing trace-based explanations. In this paper, we analyze how LLMs are integrated into multi-turn recommender systems. We conduct a systematic mapping study of LLM-based recommender systems, identifying 15 primary studies. Our analysis shows that LLMs often handle multiple responsibilities, increasing system opacity and hindering decision traceability and maintainability. To address these limitations, we propose a reference architecture grounded in separation of concerns and explainable AI principles, using LLMs only for natural-language interpretations and explanations. The architecture is validated through a proof-of-concept prototype in the domain of software architecture decision-making.
Download

Paper Nr: 84
Title:

BPM4QM: A Meta-Model for Extending the BPMN Formalism with the Quality Management Dimension

Authors:

Zohra Alyani, Mohamed Turki, Karima Dhouib and Faiez Gargouri

Abstract: This paper introduces BPM4QM (Business Process Modeling for Quality Management), a meta-model designed to extend Business Process Model and Notation (BPMN) with quality management concepts. In many organizations, quality management systems rely on process-oriented approaches, yet current Business Process Management (BPM) modeling languages provide limited support for explicitly representing quality-related concepts such as risks, performance indicators, quality objectives, and resources within business process models. To address this limitation, we propose BPM4QM, a multidimensional meta-model based on a Core Ontology for Quality Management (COQM). The meta-model integrates several complementary dimensions of business process modeling, including functional, organizational, informational, behavioral, knowledge, and quality dimensions. Building on this foundation, we present BPMN for Quality Management Systems (BPMN4QMS), an extension to BPMN 2.0.2 that facilitates the clear representation and incorporation of quality management principles within BPMN process models. This allows organizations to model quality requirements, risks and performance indicators directly within their business process models. Finally, a modeling scenario is used to illustrate the applicability of BPM4QM and BPMN4QMS in the context of quality-oriented business process management. The results demonstrate how the proposed approach improves the representation and traceability of quality requirements within business process models.
Download

Paper Nr: 90
Title:

A Trace-Based Assurance Framework for Agentic AI Orchestration: Contracts, Testing, and Governance

Authors:

Ciprian Paduraru, Petru-Liviu Bouruc and Alin Stefanescu

Abstract: In Agentic AI, Large Language Models (LLMs) are increasingly used in the orchestration layer to coordinate multiple agents and to interact with external services, retrieval components, and shared memory. In this setting, failures are not limited to incorrect final outputs. They also arise from long-horizon interaction, stochastic decisions, and external side effects (such as API calls, database writes, and message sends). Common failures include non-termination, role drift, propagation of unsupported claims, and attacks via untrusted context or external channels. This paper presents an assurance framework for such Agentic AI systems. Executions are instrumented as Message-Action Traces (MAT) with explicit step and trace contracts. Contracts provide machine-checkable verdicts, localize the first violating step, and support deterministic replay. The framework includes stress testing, formulated as a budgeted counterexample search over bounded perturbations. It also supports structured fault injection at service, retrieval, and memory boundaries to assess containment under realistic operational faults and degraded conditions. Finally, governance is treated as a runtime component, enforcing per-agent capability limits and action mediation (allow, rewrite, block) at the language-to-action boundary. More broadly, the framework is intended as a common abstraction to support testing and evaluation of multi-agent LLM systems.
Download
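As a rough illustration of the contract idea in the abstract of Paper Nr 90, the sketch below checks step contracts over a recorded trace and reports the first violating step. It is a minimal sketch under stated assumptions, not the authors' framework: the `Step` record, the `CAPABILITIES` table, the contract names, and `first_violation` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    """One entry of a hypothetical message-action trace."""
    agent: str
    kind: str      # "message" or "action"
    payload: str


# Hypothetical per-agent capability limits: which step kinds an agent may emit.
CAPABILITIES: Dict[str, set] = {
    "planner": {"message"},
    "executor": {"message", "action"},
}

# Step contracts: each maps a single step to a pass/fail verdict.
STEP_CONTRACTS: Dict[str, Callable[[Step], bool]] = {
    "non_empty_payload": lambda s: bool(s.payload.strip()),
    "capability": lambda s: s.kind in CAPABILITIES.get(s.agent, set()),
}


def first_violation(
    trace: List[Step],
    contracts: Dict[str, Callable[[Step], bool]],
    max_steps: int = 100,
) -> Optional[Tuple[int, str]]:
    """Return (step index, contract name) of the first violation, or None.

    The max_steps bound acts as a simple trace-level contract against
    non-termination; it is reported at the first step beyond the budget.
    """
    for index, step in enumerate(trace):
        if index >= max_steps:
            return index, "step_budget"
        for name, contract in contracts.items():
            if not contract(step):
                return index, name
    return None


trace = [
    Step("planner", "message", "fetch the report, then summarise it"),
    Step("executor", "action", "db.read(report_id=7)"),
    Step("planner", "action", "mail.send(to='all')"),  # planner may not act
]
verdict = first_violation(trace, STEP_CONTRACTS)
```

Because the check is a pure function of the recorded trace, re-running it on the same trace yields the same verdict, which is the property a deterministic replay would rely on.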

Paper Nr: 95
Title:

An Evaluation of Apache Jena and Hive for Data Lake Metadata Enrichment Using Semantic Blueprints

Authors:

Panagiotis Papageorgiou, Artemis Photiou, Michalis Pingos and Andreas S. Andreou

Abstract: The rapid growth of heterogeneous data has driven organizations toward Data Lake architectures. However, insufficient metadata management often leads to poor data discoverability and governance, commonly referred to as the “data swamp” problem. Semantic enrichment addresses this issue by associating data sources with meaningful metadata and contextual relationships. Apache Jena is a widely adopted framework for semantic enrichment, offering expressive semantic modelling through RDF and SPARQL, but its scalability in large Data Lake environments remains a concern. This paper investigates whether Apache Hive can serve as a scalable alternative for metadata enrichment using an established semantic blueprint model. The blueprint-based metadata framework is implemented on both Apache Jena and Apache Hive, and the two systems are experimentally compared across multiple Data Lake sizes. The evaluation focuses on metadata insertion performance, query execution time under varying complexity, and metadata storage efficiency. Experimental results show that Hive provides significantly faster and more stable insertion and query performance at scale, as well as substantially reduced storage size due to ORC compression. However, Jena remains advantageous in scenarios requiring flexible schemas and rich semantic expressiveness. The findings highlight clear trade-offs between scalability and semantic flexibility, providing guidance for selecting appropriate metadata enrichment solutions in modern Data Lake architectures.
Download

Paper Nr: 98
Title:

Teacher–Student Subspace Unlearning with Structural Parameter Corruption

Authors:

Lam Thanh Vo and Thai Hoang Le

Abstract: Machine unlearning is an essential challenge in deep learning, especially in medical applications, where a model trained on sensitive information may later be required to forget part of what it has learned. An unlearning technique should be perfect and difficult to reverse while maintaining acceptable performance on the remaining information. However, most unlearning techniques aim to reduce predictive power rather than eliminate the learned information, which may therefore resurface after unlearning. In this paper, we propose a multi-level teacher-student unlearning technique for panoramic dental X-ray image classification, where ResNet-18 is used as a student model for layer-level unlearning. Our technique affects the output, features, and parameters to achieve perfect unlearning. The experimental results demonstrate that our unlearned ResNet-18 model has Accuracy, Precision, Recall, and F1-Score of approximately zero on the forget set, with an average confidence level of 0.33 and an average entropy value of 1.70. At the same time, our unlearned model still has a high accuracy of 0.9605 and a Macro-F1-Score of 0.9610 on the retain set.
Download

Paper Nr: 102
Title:

Savitzky-Golay Filter Optimization for NDVI Time Series: A Comprehensive Analysis of Romania's Vegetation Monitoring

Authors:

Andrei Văran

Abstract: In this research, we have evaluated how Savitzky–Golay (S-G) filtering improves Normalized Difference Vegetation Index time series across Romania’s diverse landscapes. Using Landsat images from 2025, we have analyzed three representative land-cover types: Carpathian forests, agricultural areas, and mixed vegetation zones. Overall, S-G smoothing substantially improves signal quality, with an average Signal-to-Noise Ratio (SNR) gain of 14.04 dB in forest areas, noise reduction exceeding 82%, and roughness reduced by 99.57%. Hypothesis testing confirms that the smoothed and original NDVI values differ significantly (paired t-test, p < 0.01; Cohen’s d = 0.59–0.84). Importantly, the filter reduces noise while largely preserving phenological patterns, with amplitude preservation between 58% and 93% depending on vegetation type. These results support the use of S-G filtering for vegetation monitoring in temperate continental climates.
Download
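To make the smoothing step in the abstract of Paper Nr 102 concrete, the sketch below applies a Savitzky-Golay filter to a synthetic NDVI-like series and computes SNR and roughness before and after. It is only an illustration under stated assumptions: the data are synthetic, the window (5 points, quadratic/cubic fit) and the roughness definition (sum of squared second differences) are choices made here, and none of it reproduces the paper's configuration or results.

```python
import math

# Classic 5-point Savitzky-Golay smoothing weights for a quadratic/cubic fit.
SG5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol5(series):
    """Smooth with the 5-point S-G kernel; the two points at each edge are left unchanged."""
    out = list(series)
    for i in range(2, len(series) - 2):
        out[i] = sum(c * series[i + k - 2] for k, c in enumerate(SG5))
    return out


def roughness(series):
    """Sum of squared second differences (one common roughness measure)."""
    return sum(
        (series[i - 1] - 2 * series[i] + series[i + 1]) ** 2
        for i in range(1, len(series) - 1)
    )


def snr_db(reference, observed):
    """SNR in dB of `observed` relative to a known noise-free `reference`."""
    signal_power = sum(r * r for r in reference)
    noise_power = sum((o - r) ** 2 for r, o in zip(reference, observed))
    return 10 * math.log10(signal_power / noise_power)


# Synthetic seasonal curve plus alternating +/-0.05 noise (not real NDVI data).
clean = [0.5 + 0.3 * math.sin(2 * math.pi * t / 46) for t in range(46)]
noisy = [c + (0.05 if t % 2 == 0 else -0.05) for t, c in enumerate(clean)]
smoothed = savgol5(noisy)

snr_gain_db = snr_db(clean, smoothed) - snr_db(clean, noisy)
roughness_reduction = 1 - roughness(smoothed) / roughness(noisy)
```

With real imagery no noise-free reference exists, so an SNR gain has to be estimated differently; the reference-based definition above is used only because the synthetic signal is known.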

Paper Nr: 112
Title:

Detection of Dangerous Driving Events from Video Streams with Logical Explanations

Authors:

Kazuko Takahashi, Yurika Yamaguchi, Daiki Suzuki, Duong Dinh Tran, Aran Chindaudom, Takashi Tomita and Toshiaki Aoki

Abstract: We propose a framework that converts video streams into symbolic representations and performs formal reasoning to derive high-level events. We present the event detection of dangerous driving using this framework. We define several types of dangerous driving events using Qualitative Spatial Reasoning, and represent driving scenes captured by dashboard cameras, focusing specifically on scenarios where the ego-vehicle is subjected to dangerous driving behaviors. These events are then detected through logical reasoning. We conducted experiments using both simulated and real-world data and provided formal explanations for the reasoning process. Furthermore, the event definitions were iteratively revised to encompass a broader range of cases.
Download

Paper Nr: 117
Title:

Trustworthy AI Agent Pipelines via Authenticated Data Structures

Authors:

Nasser Alzahrani, James Harland and Maria Spichkova

Abstract: AI agents often blindly trust tool responses. A compromised database, intercepted API call, or prompt injection via a malicious tool can manipulate agent behavior undetected, and existing defenses like TLS, guardrails, and input validation do not provide end-to-end authenticity and integrity guarantees for tool content. We propose a principled solution: Authenticated Data Structures (ADS), the same cryptographic primitive that enables Bitcoin lightweight clients to verify transactions without downloading the blockchain. With ADS, agents store only a 32-byte digest yet can cryptographically verify ADS-backed tool responses. We show that ADS methods can be integrated into LLM agent-tool pipelines, enabling verification with 2.5x overhead relative to non-authenticated operations and sub-millisecond proof verification on authenticated binary search trees. The same mechanism naturally produces tamper-evident audit trails, enabling post-hoc verification of every action an agent performed.
Download
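The verification pattern described in the abstract of Paper Nr 117 (the agent keeps a 32-byte digest and checks each tool response against it) can be sketched with a plain Merkle tree. This is a simplification chosen here for brevity: the paper reports results on authenticated binary search trees, and the function names and sample rows below are hypothetical. SHA-256 is assumed because it yields a 32-byte digest.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(item: bytes) -> bytes:
    return _h(b"\x00" + item)          # domain-separate leaves from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return _h(b"\x01" + left + right)


def build_levels(items):
    """Return all tree levels, leaves first; assumes a non-empty item list."""
    levels = [[leaf_hash(x) for x in items]]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        if len(cur) % 2:               # pad odd levels by repeating the last hash
            cur = cur + [cur[-1]]
            levels[-1] = cur
        levels.append([node_hash(cur[i], cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels


def prove(levels, index):
    """Tool side: collect (sibling hash, sibling_is_left) from leaf to root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(digest: bytes, item: bytes, proof) -> bool:
    """Agent side: recompute the root from the response and its proof."""
    acc = leaf_hash(item)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == digest


rows = [b"order:1|status:paid", b"order:2|status:open", b"order:3|status:paid"]
levels = build_levels(rows)
digest = levels[-1][0]                 # the only state the agent has to store
proof = prove(levels, 2)
```

The agent would accept `rows[2]` only if `verify(digest, rows[2], proof)` holds; a response altered in transit fails the same check. Padding odd levels by duplication keeps the sketch short but is not a hardened construction.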

Paper Nr: 122
Title:

A Declarative Query Language for Analysis and Its Translator

Authors:

Deepika Prakash, Naveen Prakash and Abhinav Jajoo

Abstract: The Multidimensional Model (MDM) is ubiquitously implemented as a relational database, with SQL used for querying and analysis. This creates a mismatch between conceptualizing data as an MDM and querying it with an implementation-level language, which stems from the absence of a query language for the MDM itself. We propose the Analysis Query Language (AQL) for the Conceptual Analysis Model ADAPT. The ADAPT schema is a pure conceptualization of analysis needs and captures analysis semantics in hierarchical structures. It is implemented as a relational schema. We show the features of AQL and its relationship with the Object Query Language of ODMG, and develop a translator for converting AQL queries into SQL queries. Thus, the implementation layer is hidden from users, who only see the conceptual schema and its query language.
Download

Paper Nr: 123
Title:

Evaluating Defect-Prediction Models through Weighted EA-Z and Heterogeneous Feature Representations

Authors:

Camelia-Petrina Nadejde, Camelia Serban and Andreea Vescan

Abstract: Effort-aware software defect prediction (EA-SDP) aims to prioritize software entities for inspection by jointly considering defect likelihood and inspection cost. EA-Z, an effort-aware ranking calibration strategy, was recently proposed to improve ranking stability and defect coverage, and prior studies have confirmed its effectiveness under homogeneous code-metric representations. However, its robustness across heterogeneous feature spaces remains unclear. In this paper, we extend an independent replication of EA-Z by systematically evaluating its performance across datasets characterized by different feature representations, including static code metrics (PROMISE), change metrics (AEEEM), and just-in-time metrics (Kamei–JavaScript). We compare EA-Z with a cost-aware variant, Weighted EA-Z, using multiple machine learning learners and a consistent evaluation protocol. Our results show that EA-Z consistently achieves higher defect coverage, while Weighted EA-Z enables earlier defect detection at the cost of reduced recall. These trends remain stable across feature representations, although their magnitude depends on dataset characteristics. Overall, this study demonstrates the robustness of EA-Z-based strategies under heterogeneous feature spaces and highlights the context-dependent impact of cost-aware weighting in effort-aware defect prediction.

Paper Nr: 124
Title:

Software Defined Networking Based Key Management for Quantum Key Distribution Networks: Architecture and Evaluation

Authors:

Charalampos Chatzinakis, Inès Arif, Jorge López and Johanna Sepulveda

Abstract: Recent advances in quantum computing threaten the long-term security of widely deployed cryptographic algorithms. Quantum Key Distribution (QKD) uniquely provides information-theoretic security by exploiting fundamental principles of quantum mechanics (not relying on computational complexity). A QKD Network (QKDN) extends quantum key exchange beyond point-to-point links by interconnecting multiple QKD systems into a networked infrastructure. QKDNs adopt a layered architecture in which the quantum, key management, control, and management planes cooperate to deliver end-to-end key distribution services. In this paper, we propose a QKDN architecture in which the centralized functionality of the key management plane is implemented as a software defined networking service within the controller; we discuss its design and the particularities required for its implementation. Leveraging this architecture, we evaluate and compare sequential and parallel key relay schemes, and analyze the runtime performance of different path computation algorithms and their corresponding implementations. Our experimental results show that both parallel and sequential schemes have acceptable performance; while the parallel scheme shows a clear advantage in running time, the sequential scheme exhibits exceptional robustness due to its lack of telemetry requirements.
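
The abstract does not spell out the two relay schemes, so the following is only a generic sketch of trusted-node key relay using one-time-pad XOR. It assumes a path of nodes where each adjacent pair already shares a QKD link key; the function names and the three-link path are hypothetical. In the sequential variant a session key travels hop by hop; in the parallel variant each intermediate node independently announces the XOR of its two adjacent link keys, and the destination combines the announcements.

```python
import secrets
from typing import List


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length keys (one-time-pad step)."""
    return bytes(x ^ y for x, y in zip(a, b))


def sequential_relay(session_key: bytes, link_keys: List[bytes]) -> bytes:
    """Hop-by-hop relay: every intermediate node decrypts with the
    incoming link key and re-encrypts with the outgoing link key."""
    ciphertext = xor(session_key, link_keys[0])      # source encrypts
    for prev_key, next_key in zip(link_keys, link_keys[1:]):
        plaintext = xor(ciphertext, prev_key)        # trusted node decrypts
        ciphertext = xor(plaintext, next_key)        # and re-encrypts
    return xor(ciphertext, link_keys[-1])            # destination decrypts


def parallel_relay(link_keys: List[bytes]) -> bytes:
    """Parallel relay: intermediate nodes announce XORs of adjacent link
    keys at the same time; the destination recovers the first link key,
    which the source already knows and which serves as the shared key."""
    announcements = [xor(a, b) for a, b in zip(link_keys, link_keys[1:])]
    recovered = link_keys[-1]
    for announcement in announcements:
        recovered = xor(recovered, announcement)
    return recovered


# Hypothetical path with three QKD links (source, two relays, destination).
link_keys = [secrets.token_bytes(32) for _ in range(3)]
session_key = secrets.token_bytes(32)
```

The sequential loop is inherently ordered, while the announcements in the parallel variant do not depend on each other, which is one plausible reason for a running-time difference between such schemes.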

Paper Nr: 127
Title:

A BPM-Driven Taxonomy for Process-Aware Multimodal Medical Image Analysis Workflows

Authors:

Omnia Saidani Neffati

Abstract: Multimodal medical image analysis (MMIA) integrates heterogeneous diagnostic information from imaging modalities and textual reports to support accurate clinical decision-making. While recent advances in deep learning have improved cross-modality feature fusion, existing MMIA systems remain predominantly model-centric and lack explicit process-level coordination. This results in fragmented orchestration, limited traceability, and inconsistent interoperability within clinical imaging workflows. To address these limitations, we propose in this paper a BPM-driven taxonomy that characterizes recent MMIA approaches according to four dimensions: Modelling & Representation, Orchestration & Execution, Monitoring & Adaptation, and Compliance & Auditability. A systematic analysis of 23 studies published between 2020 and 2025 positions each work along a three-level process-awareness maturity scale. Results show a strong emphasis on orchestration and compliance driven by interoperability standards such as DICOM and FHIR, while formal process modelling and adaptive monitoring remain underdeveloped. The proposed taxonomy provides a conceptual basis for designing process-aware, auditable, and interoperable AI-enabled imaging workflows. Future research should focus on executable process models, continuous monitoring, and human-in-the-loop governance to support trustworthy and scalable clinical imaging AI ecosystems.

Paper Nr: 128
Title:

Client-Driven Greedy STOMP Connection Rebalancing Using Server-Reported Subscription Load for Adaptive Load Balancing

Authors:

Manish Agrawal, Sandeep Kumar, Subham Kumar, Ashok Shukla and Nandgopal Srinivasan

Abstract: We present a production-ready, client-driven STOMP connection rebalancing mechanism deployed on Oracle Cloud Infrastructure (OCI) across Dev and Preprod environments using Oracle Kubernetes Engine (OKE). Traditional load balancers rely on transport-level metrics such as connection count, which do not accurately reflect subscription-driven workload in persistent messaging systems. Our approach enables clients to use server-reported subscription load to make decentralized migration decisions. A deterministic 25% migration threshold ensures stability by preventing oscillation while enabling meaningful load redistribution. The algorithm operates entirely at the client layer and requires no modifications to OCI Load Balancer/NLB or server infrastructure. Deployment is configuration-driven and integrates seamlessly with standard OCI operational tooling, including Helm, OKE, and OCI Monitoring. Experimental results from OCI Dev and Preprod environments show consistent subscription-load variance reduction of 35–45% and successful redistribution of connections from overloaded to underutilized subscriber instances, with zero message loss during controlled migration. These results demonstrate that client-driven subscription-aware rebalancing significantly improves load distribution and operational stability while remaining fully compatible with existing OCI production workflows.
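
A client-side migration decision of this kind might look like the sketch below. The abstract gives the 25% threshold but not how it is applied, so the rule here is an assumption: a client moves only when its current instance carries more than 25% more subscription load than the least-loaded instance, measured relative to the current load. The function name and instance names are hypothetical.

```python
from typing import Dict, Optional

MIGRATION_THRESHOLD = 0.25  # from the abstract; its exact semantics are assumed


def choose_target(loads: Dict[str, int], current: str,
                  threshold: float = MIGRATION_THRESHOLD) -> Optional[str]:
    """Return the instance to migrate to, or None to stay connected.

    `loads` maps instance name to server-reported subscription load.
    """
    current_load = loads[current]
    # Greedy choice: the least-loaded instance, ties broken by name so
    # that all clients reach the same deterministic decision.
    candidate = min(loads, key=lambda name: (loads[name], name))
    if candidate == current or current_load == 0:
        return None
    relative_gap = (current_load - loads[candidate]) / current_load
    # Staying put below the threshold avoids oscillation between
    # instances whose loads are already close.
    return candidate if relative_gap > threshold else None


reported = {"stomp-a": 100, "stomp-b": 60, "stomp-c": 90}
decision = choose_target(reported, "stomp-a")
```

With these hypothetical loads, a client on `stomp-a` sees a 40% gap to `stomp-b` and migrates, whereas with loads of 100 and 80 the 20% gap stays below the threshold and the client remains where it is.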

Paper Nr: 145
Title:

Trace-to-Logic Assurance for Agentic AI: Mining Probabilistic Rules from Message-Action Traces

Authors:

Ciprian Paduraru, Bogdan Macovei and Alin Stefanescu

Abstract: Agentic AI systems increasingly execute tasks through stochastic, tool-using multi-agent orchestration. In such settings, assurance cannot rely solely on isolated model outputs; it must reason over multi-step executions, tool interactions, and governance decisions. This paper proposes a trace-to-logic assurance pipeline that augments contract-based governance with empirically induced behavioral rules. Given Message-Action Traces instrumented with step- and trace-level contract verdicts, the method lifts events into typed relational facts, encodes them as a temporal knowledge graph, and applies rule mining to induce probabilistic Horn clauses that capture recurrent operational dependencies. The induced rules support offline auditing and diagnosis, and provide a non-blocking runtime deviation signal for ranking contract-admissible actions. The approach preserves contract-first execution semantics while enriching governance decisions with empirically derived context signals. An evaluation plan and illustrative artifacts demonstrate how the extracted rules function as interpretable assurance objects for agentic AI governance.

Paper Nr: 151
Title:

Evaluating ChatGPT-5 for Misuse Case Diagram Generation: An Empirical Evaluation

Authors:

Alia Alzarooni, Yasser Khan, Hassan Alsayegh, Mohamed El-Attar and Rima Grati

Abstract: Misuse case diagrams are a widely adopted technique in security requirements engineering, enabling analysts to model adversarial threats and derive countermeasures early in the software development lifecycle. However, manual construction of these diagrams is prone to incompleteness and subjectivity, requiring significant security expertise. Large language models (LLMs) such as ChatGPT present a promising opportunity to automate this process, yet their effectiveness for generating structured security modeling artifacts remains largely unexplored. This paper presents an exploratory study evaluating ChatGPT-5's ability to generate misuse case diagrams directly from textual security requirements, using 12 case studies of varying complexity spanning small, medium, and large requirement sets. The diagrams produced by ChatGPT-5 were evaluated against manually constructed ground-truth diagrams, and our results indicate that ChatGPT-5 performs well overall, demonstrating a strong capability to identify key actors, threats, and adversarial relationships from natural language input.

Paper Nr: 152
Title:

Unsupervised Commit Message Classification for Software Evolution Using In-Context Learning of Large Language Models

Authors:

Dipshikha Das, Nazmus Sakib, Jitesh Sureka and Md. Nurul Ahad Tawhid

Abstract: Classification of software changes, specifically commits, into maintenance activities is crucial for improving decision-making in software evolution, ultimately reducing maintenance costs. Traditionally, researchers have focused on commit classification through keyword-based analysis of commit messages and contextual semantic analysis of commit messages through pre-trained language models. However, these methods often rely heavily on training data, which raises concerns regarding their ability to generalize effectively. In this study, we investigate the potential of using the in-context learning capabilities of large language models (LLMs) for commit classification. In-context learning does not require training data, making it less susceptible to overfitting and more capable of generalizing across different datasets. We focused on classifying software commits from the JavaVFC dataset without predefined categories. To ensure the robustness of our results, we apply majority voting across the outputs of the models, effectively generating unsupervised clusters. This approach eliminates the need for labeled data, making it less prone to overfitting and enhancing its generalization potential. Our experimental results demonstrate the feasibility of using LLMs for commit classification and unsupervised clustering, offering a promising direction for software maintenance tasks.
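
The majority-voting step can be sketched as follows. The abstract does not say whether a strict majority or a plurality is required, so this sketch assumes a strict majority and leaves a commit unassigned otherwise; the label strings and model outputs are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def majority_vote(labels: List[str]) -> Optional[str]:
    """Return the label chosen by more than half of the models, else None."""
    if not labels:
        return None
    top_label, top_count = Counter(labels).most_common(1)[0]
    # Require a strict majority so that ties never produce a label.
    return top_label if top_count * 2 > len(labels) else None


# Hypothetical outputs of three LLMs for two commit messages.
votes_by_commit = {
    "fix null check in parser": ["bug fix", "bug fix", "refactoring"],
    "update build and rename module": ["build", "refactoring", "cleanup"],
}
clusters = {msg: majority_vote(votes) for msg, votes in votes_by_commit.items()}
```

Commits that receive the same voted label end up in the same cluster, while commits without a majority can be set aside for inspection.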

Paper Nr: 160
Title:

Process-Aware Generation of Operational Technology (OT) Security Requirements Using BPMN Context and Large Language Models

Authors:

Padma Iyenghar and Elke Pulvermüller

Abstract: Integrating large language models (LLMs) into structured engineering workflows requires understanding how process context affects output quality, a question largely unaddressed in existing LLM-assisted requirements engineering research. This paper investigates whether contextual information derived from Business Process Model and Notation (BPMN) process models improves the IEC 62443 concept coverage of LLM-generated Programmable Logic Controllers (PLC) security requirements. A systematic experimental evaluation employs open weight and proprietary models across several prompt configurations of varying context and complexity and assesses the generated requirements against expert-curated per-task reference concept sets. Results show that IEC 62443 standards grounding yields the largest improvement, increasing concept coverage by 5 to 15 percentage points over the baseline. Combining BPMN context and standards prompting reduces the proprietary–open-weight coverage gap by roughly half without model fine-tuning. Four OT-specific security concepts are identified as systematic generation gaps in open-weight model outputs. These findings provide actionable guidance for deploying LLM-assisted requirements generation in IEC 62443-governed industrial engineering workflows.
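
Concept coverage against a reference set can be illustrated with a small sketch. The paper's actual matching procedure is not described in the abstract, so case-insensitive substring matching is an assumption here, and the requirement texts and concept names are hypothetical rather than taken from IEC 62443.

```python
from typing import List, Set


def concept_coverage(requirements: List[str], reference_concepts: Set[str]) -> float:
    """Fraction of reference concepts mentioned in the generated requirements."""
    if not reference_concepts:
        return 0.0
    text = " ".join(requirements).lower()
    covered = {c for c in reference_concepts if c.lower() in text}
    return len(covered) / len(reference_concepts)


generated = [
    "The PLC shall enforce role-based access control for all engineering sessions.",
    "Firmware updates shall be accepted only with a valid digital signature.",
]
reference = {"access control", "digital signature", "audit log", "network segmentation"}
coverage = concept_coverage(generated, reference)
```

Under this definition, an improvement of 5 to 15 percentage points corresponds to raising the returned fraction by 0.05 to 0.15 for the same reference set.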

Paper Nr: 172
Title:

Evaluating the Practical Applicability of Defect Taxonomies in Industrial Bug Repositories

Authors:

Lianne V. Hufkens, Robin R. Bouwmeester, Fernando Pastor Ricos, Beatriz Marín and Tanja E. J. Vos

Abstract: Bug classification is widely used in Software Process Improvement (SPI) to analyse defect trends, prioritise quality assurance activities, and identify recurring problem areas. Over the years, many defect taxonomies have been proposed with different classification dimensions. However, there is limited evidence on whether these taxonomies can be consistently operationalised in large industrial bug repositories. The goal of this study is to analyse whether established defect taxonomies can be applied to classify industrial bug reports in practice and to identify the operational limitations that arise when applying them. To achieve this, we perform a multivocal review to identify prominent taxonomies and analyse a random sample of bugs from DigiOffice, an industrial productivity application with approximately 26,000 end users and a large bug repository containing more than 12,000 reports. The results reveal recurring operational issues in these taxonomies, including ambiguity, overlapping category boundaries, insufficient granularity for practical use, and classification instability over time. These observations suggest that widely-cited defect taxonomies cannot always be applied to industrial bug repositories without additional operational guidance or adaptation. Finally, we outline the requirements for symptom-focused and operationally stable classification approaches tailored to industrial contexts.

Paper Nr: 174
Title:

Data Privacy Preserving Approach Using Non-Functional Regulations

Authors:

Zakaria Maamar, Amel Benna, Abderrahmane Maaradji and Mohamed Boughouas

Abstract: The European Union’s General Data Protection Regulation, Canada’s Personal Information Protection and Electronic Documents Act, and other regulations have addressed data privacy from a functional perspective. This has mainly consisted of setting up a compliance process that requires, for example, stating the purpose of using data and securing the consent of the data’s owner. However, these regulations overlook data privacy compliance from a non-functional perspective, where other forms of restrictions could be imposed on data, such as a maximum number of daily consents per data owner and mandatory protocols for transferring data from one territory to another. This paper discusses data privacy from a non-functional perspective, shedding light on the benefits of complementing the existing data privacy compliance process. To this end, the paper suggests elevating data to the level of an asset, running assetization techniques, and adopting the Open Digital Rights Language to enforce asset privacy. Experiments demonstrate the technical feasibility of running a privacy compliance process from functional and non-functional perspectives.

Paper Nr: 175
Title:

X-SHIELD: Explainability-by-Design for Self-Healing Orchestration in Multi-Agent Systems

Authors:

Davis Joseph, Wiam Belouard, Sara El Kardi and Antoun Yaacoub

Abstract: Modern information systems increasingly rely on orchestrated collections of AI components and services, improving capability but also increasing operational complexity and reducing transparency. In such systems, self-healing mechanisms detect degradation and trigger recovery actions, yet these interventions are often opaque to human stakeholders. Logs and telemetry expose low-level signals but rarely provide concise, decision-relevant explanations. We introduce X-SHIELD, an explainability-by-design approach for self-healing multi-agent orchestration. X-SHIELD models each recovery intervention as a structured explanation artifact linking trigger conditions, curated evidence, diagnosis hypotheses with uncertainty, recovery actions, expected effects, and audit metadata. A deterministic Auto-Explainer maps orchestration events to schema-conformant explanations using rule-based templates, and a viewer renders them as an auditable timeline for oversight. We implemented a lightweight prototype with reproducible scenarios and evaluation artifacts. We report artifact-level validation on a compact scenario suite and a small formative user study (N = 10) comparing raw logs with X-SHIELD explanations. Results provide preliminary evidence that structured explanations can improve perceived clarity, actionability, and comprehension relative to raw telemetry while preserving traceability to system events. Although instantiated in adaptive quizzes, the approach is intended as a reusable contract pattern for orchestrated systems that require auditable recovery explanations.

Paper Nr: 176
Title:

Interpretable Major Depressive Disorder Classification from Resting-State fMRI via Causality-Inspired Graph Mamba

Authors:

Fadwa Messaoudi, Rebh Soltani and Hela Ltifi

Abstract: Major Depressive Disorder (MDD) is among the most prevalent psychiatric conditions globally. To advance MDD classification using resting-state functional connectivity data, we propose Causality-Inspired Graph Mamba (CI-GMamba). This novel framework integrates Graph Neural Networks (GNNs) with selective state-space models, guided by causal discovery principles. Our pipeline initiates with mutual information-based feature selection and Principal Component Analysis (PCA) for dimensionality reduction. We then apply the Peter-Clark (PC) algorithm to extract causally relevant components, which serve to regularize a bidirectional Graph Mamba encoder operating on a patient similarity graph. Evaluated across standard benchmarks and the real-world HiroshimaU dataset, where it achieves a superior accuracy of 76.08%, CI-GMamba surpasses existing state-of-the-art techniques. Additionally, coupling Shapley Additive Explanations (SHAP) with anatomical reprojection ensures the model yields biologically meaningful and interpretable insights. Ultimately, this work demonstrates that embedding causal regularization into state-space architectures significantly elevates both diagnostic performance and model transparency.

Paper Nr: 22
Title:

Model-Based Specification of Robot Kinematics

Authors:

Jeshwitha Jesus Raja and Marian Daun

Abstract: The use of robots across various industries is rising, and the growing complexity of modern robotic systems calls for systematic, well-organized planning methods. Model-Based Systems Engineering addresses this need by supporting the formal specification, analysis, and design of complex systems through the use of models. A crucial starting point in this process is the specification of the robotic system, which determines the feasibility and potential applications of the robot. In this paper, we evaluate the applicability of a systems modeling language for specifying robot-specific properties, such as kinematics, by applying it to two real-world case examples and assessing its syntactic, semantic, and pragmatic quality.

Paper Nr: 44
Title:

BAYTFACTORY: Engineering Smart-Home Variants with Software Product Lines

Authors:

Tewfik Ziadi and Zakaria Maamar

Abstract: Smart-home IoT systems, as a practical class of cyber-physical systems, are inherently heterogeneous, combining diverse devices, communication protocols, and software artifacts. This heterogeneity makes customization and evolution difficult, as deriving a new system variant often requires coordinated changes across multiple artifacts with limited traceability and reuse. Software Product Line Engineering (SPLE) provides principles for explicitly modeling variability and deriving products from valid configurations. In this paper, we present BAYTFACTORY, a framework for managing variability in smart-home IoT systems using SPLE principles. BAYTFACTORY supports the definition of application-specific feature models, the annotation of heterogeneous artifacts with variability markers, configuration validation through constraint solving, and the automatic derivation of consistent system variants. We instantiate and assess the approach on a Home Assistant–based smart-home application whose feature model comprises 21 features with cross-tree constraints. From a codebase of 32 files (843 LOC in the maximal configuration), BAYTFACTORY automatically derives 10 operational variants, all successfully deployed on the Home Assistant platform. These results illustrate the feasibility of the approach for supporting systematic variability management and variant derivation in practice.

Paper Nr: 46
Title:

AI-Driven Analysis of User Feedback to Detect Security Issues in mHealth Applications

Authors:

Maroua Loukil, Mariem Haoues and Nedia Bouacida

Abstract: Mobile healthcare applications (mHealth apps for short) are being increasingly adopted to assist patients in monitoring their health. mHealth apps are crucial for real-time user tracking, predicting complications, and sharing information with healthcare professionals. The integration of artificial intelligence in mHealth apps for routine clinical practice and remote healthcare will not be feasible until we overcome the main challenges regarding data privacy and security. User feedback analysis increasingly contributes to improving the quality of mHealth apps. This study aims to identify security issues in mHealth apps by analyzing user reviews using deep learning models. For this purpose, 609 reviews were initially categorized into Negative, Positive, and Neutral, and further annotated according to the quality attributes defined by the ISO/IEC 25010 quality model. In the evaluation, RoBERTa outperformed the other models, such as RNN and XLNet, achieving 94% accuracy, recall, and F1-score and 95% precision. Following this, we investigated security issues to assess potential risks and trust concerns in mHealth apps. Users highlighted several challenges that must be addressed to deliver high-quality and effective apps.

Paper Nr: 48
Title:

STALLM: Benchmarking Prompts and LLMs in Software Maintenance

Authors:

Tewfik Ziadi, Seifeddine Bouallegue and Reda Bendraou

Abstract: Large Language Models (LLMs) are increasingly used in software maintenance and code-quality analysis, yet their outputs remain highly sensitive to prompt design, model choice, and evaluation setup, complicating systematic comparison. This is especially challenging for maintenance tasks that require localized, typed findings rather than non-structured responses. We present STALLM, a benchmark for evaluating LLM-based code-quality analysis under a controlled and reproducible protocol. STALLM combines (i) a multi-language benchmark built from real systems and static-analysis outputs, (ii) a modular prompt-execution pipeline, and (iii) an evaluation engine that normalizes LLM outputs and compares them against reference findings using span-level metrics. We illustrate the benchmark through two studies: prompt sensitivity (three prompt styles, fixed model) and model sensitivity (four LLM slots, fixed prompt). We also demonstrate extensibility using vulnerability findings. Results highlight clear differences in precision, recall, and coverage across prompts and models, underscoring the need for structured benchmarking in maintenance-oriented LLM usage.

Paper Nr: 50
Title:

Adaptation of an Agile Method for Product Backlog Construction Using Expected Results Included in the Requirements Engineering Process of a Maturity Model

Authors:

Jamilli Ynglid Carmo da Cunha and Sandro Ronaldo Bezerra Oliveira

Abstract: According to the Brazilian Association of Software Companies, the Brazilian software market has shown significant growth, with a 7.9% increase in 2022. In order to meet the growing demand for high-quality technological solutions, many companies have adopted agile methodologies. However, adopting these methodologies, particularly in requirements engineering, poses significant challenges as this is a critical area for ensuring the final quality of the software. This paper proposes adapting the agile Product Backlog Building (PBB) method based on the expected results of the Requirements Engineering process of the Brazilian Software Process Improvement (MPS.BR) maturity model. The objective is to analyze how PBB practices and ceremonies should be adapted to meet MPS.BR’s rigorous quality criteria for the Requirements Engineering process. This adaptation has generated a PBB usage script incorporating new practices and ceremonies, as well as demonstrating its practical application in software project development. This allows companies and agile method practitioners to achieve process maturity without compromising agility.

Paper Nr: 60
Title:

An AI-Based Adaptive Workflow Model for Early Detection of Alzheimer’s Disease

Authors:

Nadine Mili, Sarra Abidi and Leila Ben Ayed

Abstract: Early detection of Alzheimer’s disease remains a thorny issue owing to heterogeneous clinical data and progressive symptom evolution. Classical diagnostic workflows are basically rigid and unable to cope with patient-specific patterns. The current paper introduces an AI-based adaptive workflow model designed to promote Alzheimer’s disease detection through dynamically adjusting diagnostic tasks according to real-time data. The approach integrates machine learning for feature extraction and classification, combined with a context-aware decision mechanism that reconfigures the workflow based on prediction feedback. Experiments conducted on public Alzheimer datasets reveal an enhanced diagnostic accuracy and an improved responsiveness compared to static workflows. The results highlight the potential of adaptive AI-driven models to reinforce intelligent and efficient healthcare diagnosis.

Paper Nr: 101
Title:

Data Product-Driven Self-Adaptation in Serious Games

Authors:

Spyros Loizou, Michalis Pingos and Andreas S. Andreou

Abstract: Personalisation and self-adaptation are key requirements in Serious Games, especially in domains where task difficulty and interaction complexity must be adapted to meet user characteristics. Current adaptation techniques often rely on runtime monitoring and parameter adjustments, which can increase game complexity, limit reusability and introduce performance overhead. Furthermore, adaptation knowledge is hardcoded in operational logic, making it difficult to reuse and manage across users and scenarios. This paper proposes a self-adaptive process for Serious Games based on data products, which are produced from a dedicated Data Mesh. Game parameters and user experience data are stored in a Data Lake and then transformed into a Data Mesh to be able to derive useful data products. Each data product represents a complete and executable gameplay configuration that can be directly applied to game scenes or scenarios, turning adaptation decisions from runtime parameter adjustment to configuration selection or generation. This approach is demonstrated using a real game application for children with learning disabilities, which includes phonological and cognitive matching tasks, creating multiple adaptation data products to control task complexity and satisfy specific learning goals. A short experimental evaluation is performed to assess data product creation effort, adaptation behaviour and expert acceptance of the selected configurations. The results indicate that utilizing adaptation data products enables structured complexity management, reduces lengthy runtime procedures and supports adaptation across heterogeneous game scenes.

Paper Nr: 110
Title:

Scalable Best Practices of Agile Autonomous Teams in Industrial Large-Scale Software Projects: Results of a Multiple Case Study

Authors:

Martin Schliefellner, Raoul Vallon and Thomas Grechenig

Abstract: Autonomous teams are widely established in software development. While large-scale agile software development has been extensively studied, less is known about how everyday team-level practices of autonomous teams are enacted in large-scale industrial settings. This paper reports a qualitative multiple case study of three industrial large-scale software projects, each involving more than 150 people and at least 11 teams. The empirical basis comprises 15 semi-structured interviews, participant observation in one project, and an analysis of project artifacts. We identify everyday team-level practices across the three cases and compare them with the 13 practices of Hoda’s Balancing Acts model as an established analytical reference. Nine practices show substantial conceptual overlap and are therefore interpreted as scalable across project sizes. Four practices show no conceptual overlap in the large-scale context. To improve interpretability, the scalable practices are grouped into thematic categories. The results provide empirically grounded guidance on team-level practices that appear transferable to complex large-scale software projects.

Paper Nr: 119
Title:

Comparative Analysis of LLM Performance on AWS Bedrock: A Case Study on Receipt-Item Categorisation

Authors:

Gabby Sanchez, Sneha Oommen, Cassandra T. Britto, Di Wang, Jung-De Chiou and Maria Spichkova

Abstract: This paper presents a systematic, cost-aware evaluation of large language models (LLMs) for receipt-item categorisation within a production-oriented classification framework. We compare four instruction-tuned models available through AWS Bedrock: Claude 3.7 Sonnet, Claude 4 Sonnet, Mixtral 8x7B Instruct, and Mistral 7B Instruct. The aim of the study was (1) to assess performance across accuracy, response stability, and token-level cost, and (2) to investigate what prompting methods, zero-shot or few-shot, are especially appropriate both in terms of accuracy and in terms of incurred costs. Results of our experiments demonstrated that Claude 3.7 Sonnet achieves the most favourable balance between classification accuracy and cost efficiency.

Paper Nr: 120
Title:

An Empirical Comparison of Human and LLM-Assisted Bug Priority Assignment

Authors:

Andreea Galbin Nasui, Andreea Vescan and Cristina Marinescu

Abstract: Effective software testing relies on well-documented bug reports and accurate priority assessment, as they ensure critical issues are addressed promptly, and system quality is maintained. This study investigates the bug prioritization process and the role of Large Language Models (LLMs) in supporting decision-making. A mixed-method research design was adopted, involving two groups of student participants: one using LLM assistance and one working independently. Participants completed a structured take-home assignment in which they analyzed and prioritized 15 software bug reports. Although participants in the no-LLM group initially appeared more comfortable with the prioritization concept, and a higher number of correct answers was observed in the LLM-assisted group, the difference between the groups was not statistically significant. Furthermore, participants using LLM support reported increased confidence after completing the task, whereas confidence levels in the no-LLM group decreased. Sentiment analysis showed differences across questions rather than a consistent effect of LLM use. Overall, the findings suggest that LLM assistance may support performance and confidence in bug prioritization tasks, although differences between groups were limited.

Paper Nr: 125
Title:

Self-Adaptive Parameter Control in Differential Evolution Based on Optimizing Undersampling Strategies for Software Defect Prediction

Authors:

Moch Lutfi and Ary Mazharuddin Shiddiqi

Abstract: Data imbalance is a fundamental challenge in software defect prediction (SDP), as the dominance of non-defect classes biases models and reduces the ability to detect defective modules. Although Learning-to-Rank Undersampling (LTRUS) has been shown to be more effective than random undersampling, its performance still depends on the choice of static ranking weights (A), which are prone to stagnation during optimization. This study proposes an adaptive framework that integrates LTRUS with four variants of differential evolution, namely JADE, SHADE, L-SHADE, and SaWDE, to dynamically optimize the weight (A). Experiments were conducted using a 5-fold cross-validation scheme and three classifiers: K-Nearest Neighbors, Logistic Regression, and Naïve Bayes. Performance was evaluated using AUC, Matthews Correlation Coefficient, Precision, Recall, and F-Measure, along with statistical analysis based on the Shapiro–Wilk normality test and paired t-test. The results showed that JADE performed best on KNN, with AUC 0.725, MCC 0.356, and F-Measure 0.450, and on Logistic Regression, with AUC 0.750, MCC 0.397, and F-Measure 0.474. For Naïve Bayes, differential evolution was most effective, with an AUC of 0.784, MCC of 0.441, and F-Measure of 0.515. The results indicate that performance improvements are statistically significant in several cases, particularly for KNN and Logistic Regression, although not all comparisons show significant differences.

Paper Nr: 140
Title:

An Extensive Empirical Investigation of Ensemble Learning for Software Defect Prediction

Authors:

Megha Jayakumar, Suyash Shukla and Sandeep Kumar

Abstract: Software Defect Prediction (SDP) is crucial for enhancing the reliability of software through early identification of defect-prone modules. This is significant as software faults can incur substantial expenses and delays in software development. While individual machine learning (ML) models have shown promising outcomes, their effectiveness fluctuates across various datasets due to data imbalance, heterogeneity, and project-specific characteristics. Ensemble learning has emerged as an effective method to address this limitation. This study conducts a comprehensive empirical analysis evaluating individual ML classifiers and ensemble learning techniques for SDP. We first assess the effectiveness of seven common ML classifiers: Decision Tree, K-Nearest Neighbors, Random Forest, Gradient Boosting, Extra Trees, Bagging, and AdaBoost. To enhance data quality and rectify class imbalance, the datasets are preprocessed using Ant Colony Optimization (ACO)-based feature selection and the Random Over-Sampling Technique. The hyperparameters for each classifier are optimized by 5-fold Cross-Validation and Grid Search to provide unbiased and robust model tuning. Extensive experiments are conducted on a consolidated dataset of 47,618 samples. The experimental results consistently demonstrate that ensemble models outperform individual ML models. Among the evaluated ensemble algorithms, the SLSQP-based ensemble exhibits considerable proficiency in handling varied and imbalanced defect datasets. These findings provide significant insights for researchers pursuing reliable, scalable approaches to improve software quality assurance.

Paper Nr: 142
Title:

A Faceted Classification of Authenticator-Centric Authentication Techniques

Authors:

Alex R. Mattukat, Vincent Schmandt, Timo Langstrof, Michael Zerbe and Horst Lichter

Abstract: Authentication is a fundamental security means for protecting system resources from unauthorized access. Authenticator-centric authentication techniques (AUTHN TECHNIQUES) address how mechanisms and credentials are used via AUTHENTICATORS. There are many AUTHN TECHNIQUES that differ in many ways and there exist classification approaches that aim to structure them. However, they are limited in the aspects they classify and are not flexible enough to accommodate the diverse nature of AUTHN TECHNIQUES. This paper presents two contributions. First, novel, faceted classification schemes for AUTHN TECHNIQUES and AUTHENTICATORS. The schemes were developed based on 345 papers identified through a targeted LLM-assisted literature review and semantic clustering. Second, a catalog of AUTHENTICATORS and AUTHN TECHNIQUES that was built by applying the classification schemes. In this paper, we present our methodology, the classification schemes we developed, including an example application, an overview of the catalog, and discussions on future work.

Paper Nr: 150
Title:

Mapping LLM Misuse in Computing Education: A Survey-Based Risk Analysis of Faculty and Student Contexts

Authors:

Noura Alzaabi, Mohamed El-Attar, Sarah Kohail and Mahmood Niazi

Abstract: Large Language Models (LLMs) have become deeply embedded in computing higher education, yet the misuse risks they introduce for faculty and students remain insufficiently understood from a cybersecurity and data privacy perspective. This paper presents an empirical study in which a structured survey of 105 participants at a computing college was used to identify and systematically risk-score thirteen LLM misuse cases across faculty and student contexts. Using a Likelihood × Impact scoring model, the resulting taxonomy classifies misuse cases as Critical, High, or Medium severity, with over-reliance and skill atrophy, academic integrity violations, and research integrity risks emerging as the highest-priority concerns. Targeted mitigation strategies addressing AI literacy development, institutional policy reform, and scaffolded LLM engagement are proposed in response. The findings contribute an empirically grounded, role-differentiated risk framework applicable to computing education institutions navigating responsible LLM integration.
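
As an illustration of a Likelihood × Impact scoring model, the sketch below multiplies two ratings and maps the product to the three severity bands named in the abstract. The 1-5 rating scales, the threshold values, and the function names are assumptions made for this example; the abstract does not state the scales or cut-offs actually used.

```python
def risk_score(likelihood: float, impact: float) -> float:
    """Risk score as the product of likelihood and impact ratings."""
    return likelihood * impact


def severity(score: float, critical_at: float = 15.0, high_at: float = 8.0) -> str:
    """Map a risk score to a severity band.

    The thresholds are hypothetical and assume 1-5 rating scales.
    """
    if score >= critical_at:
        return "Critical"
    if score >= high_at:
        return "High"
    return "Medium"


# Hypothetical mean survey ratings for three misuse cases.
ratings = {"case_a": (4, 4), "case_b": (3, 3), "case_c": (2, 3)}
bands = {name: severity(risk_score(l, i)) for name, (l, i) in ratings.items()}
```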

Paper Nr: 163
Title:

From Image to Insight: Evaluating LLM Accuracy in Understanding UML Use Case Diagrams with Claude

Authors:

Mohamed El-Attar, Yasser Khan, Mahmood Niazi, Sajjad Mahmood and Mohammad Alshayeb

Abstract: UML use case diagrams are a prominent artefact of requirements engineering, capturing the functional scope of a software system in terms of actors, use cases, and their stereotyped relationships. The emergence of multimodal large language models with image understanding capabilities raises the question of whether such models can reliably extract structured construct-level information from use case diagram images. This paper reports an empirical evaluation of Claude on the task of counting 14 notational construct types from a corpus of 78 computer-generated UML use case diagrams, assessed against manually verified ground truth annotations. Results reveal a strongly differentiated accuracy profile: Claude achieved near-perfect exact match rates (>=95%) for visually unambiguous constructs such as Actors, System Boundary, Extend, and Extension Points, but performed poorly on association directionality. Diagram structural complexity was the strongest predictor of overall error. The findings indicate that LLM-based use case diagram analysis is construct-selective and complexity-sensitive and provide empirically grounded guidance for practitioners considering LLMs as tools for automated diagram annotation in software maintenance workflows.
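
An exact match rate for a counting task of this kind can be read as the fraction of diagrams for which the model's count of a construct equals the ground-truth count. The sketch below follows that reading; it is an assumed interpretation with hypothetical names and data, not the paper's evaluation code.

```python
from typing import Sequence


def exact_match_rate(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of diagrams whose predicted count equals the true count."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        return 0.0
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)


# Hypothetical actor counts for four diagrams: model output vs. annotation.
rate = exact_match_rate([2, 3, 1, 4], [2, 3, 2, 4])
```

Computing the rate separately per construct type is what makes a construct-selective accuracy profile visible.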

Paper Nr: 183
Title:

Agentic RAG for Cyber Threat Intelligence

Authors:

Emna Fakhfakh, Maha Charfeddine, Bechir Hamdaoui and Habib M. Kammoun

Abstract: As cyberattacks become increasingly sophisticated, Security Operations Centers (SOCs) face constant pressure to analyze massive volumes of threat data. Telegram, while widely used for secure communication and information exchange, can also be exploited to coordinate illegal activities. This paper proposes a lightweight Agentic Retrieval-Augmented Generation (RAG) framework that uses orchestrated Small Language Models (SLMs) to mitigate emerging cyber threats. The framework assigns tasks to specialized tools for ML-based classification and Indicator of Compromise (IoC) extraction using DeepSeek-R1-1.5B as a reasoning supervisor. The framework correlates adversaries, capabilities, and infrastructure into structured JSON reports guided by MITRE ATT&CK tactics, techniques, and procedures. According to experimental findings, this modular architecture offers a scalable solution for contemporary SOC environments by delivering high analytical fidelity and actionable intelligence with little computational overhead.

Paper Nr: 185
Title:

Systematization of the Pressure for Change on Software Methods

Authors:

David Kuhlen and Andreas Speck

Abstract: A strong societal interest in software and the use of software in various areas of life increase the likelihood that software products will need to be changed. Useful software that is also easily modifiable tends to have a high economic value. However, the effects of changes on software units can be complex, which is why techniques have been developed to address this complexity. In the present work, the change pressure to which a software unit is exposed is analyzed and systematized in the form of a reference model. The reference model distinguishes four scenarios, each of which is assigned influences that affect the probability of change. In addition to the significance of a method’s input/output for the user, coupling also has a substantial impact on the probability of change. The assessment and analysis of future requirements are supported by AI. End-to-end pipelines provide support for systematizing future changes in requirements and the pressure for change.

Paper Nr: 186
Title:

YOLO11-Based Drone Swarm Detection: An Advanced Deep Learning Based Approach for Anti-UAV Systems

Authors:

Hafedh Jouini, Hamza Gharsellaoui and Mohamed Khalgui

Abstract: Anti-drone detection is becoming a top security problem due to the growing use of drones, especially in swarm settings with small targets, complicated backdrops, and time restrictions. Despite the widespread use of YOLO variants, a thorough assessment of the most recent YOLO11 models for swarm detection is still lacking. In order to close this gap, this work conducts a thorough benchmarking analysis of YOLO11n/s/m on the NETBIT dataset, assessing swarm-specific performance, accuracy, and efficiency using a suggested Swarm Detection Rate (SDR). According to experimental data, YOLO11s outperforms YOLOv8s by 2.7% while using 15.5% fewer parameters, achieving the optimum balance with 76.5% mAP50 at 322 FPS and only 9.41M parameters. However, SDR decreases from 94.3% (low-density) to 61.5% (high-density) according to density analysis, indicating occlusion as a major problem. In addition to establishing reference baselines for the advancement of swarm detection research, this work offers practical insights for practitioners choosing YOLO variants for real-time deployment.