
Tutorials

The role of the tutorials is to provide a platform for a more intensive scientific exchange amongst researchers interested in a particular topic and as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.



Tutorial on
Integration Design: Challenges, Principles, Methods


Instructor

Avi Harel
Ergolight
Israel
 
Brief Bio
Avi Harel received his B.Sc. (1970) and M.Sc. (1972) degrees in mathematics from the Technion, the Israel Institute of Technology, in Haifa, Israel. Between 1985 and 1989 Avi studied Behavioral and Management Sciences at the faculty of Industrial Engineering of the Technion. Between 1975 and 1992 Avi worked for Rafael, the Armament Development Authority of Israel, where he gained experience with a wide range of applications, platforms, operating systems, programming languages, and development environments. His work experience includes software engineering, system engineering, and ergonomics at Rafael, Nortel, IBM, Attunity, and Ergolight. Since 1983, Avi has developed a methodology for designing user interfaces based on human factors. In 1997 Avi Harel founded ErgoLight Usability Software and initiated the design of the ErgoLight tools for testing user activity in Windows applications.
Abstract

Over 70% of accidents in industry, transportation, and security systems stem from problems in the system's behavior, related to the operators' difficulties in dealing with unusual situations. Traditionally, these accidents are attributed to a higher power (a black swan) or to human factors, instead of to errors in the integration design, thereby blocking learning from them. Proactive design of the system integration enables effective handling of unusual situations. A design based on an interaction model makes it possible to improve the reliability of the integration. The integration design also enables the prevention of usage errors that hamper productivity and the operation of consumer products. In this presentation I will discuss common problems in integration design that allow operator errors, and I will propose a generic model of the system's behavior based on an interaction model of a controller-server dyad, derived from the HSI paradigm. The interaction model applies principles of engineering control of the system (the server) based on the STAMP paradigm, where the control follows generic interaction rules that express the principles of cybernetics. The operating rules can be implemented by digitally matching the server's behavior to standard profiles of the interaction modes and the transitions between modes.


Part I
  • Motivation: the path from human factors to integration engineering
  • The semantics of integration: from procedures to quality goals
  • Review of approaches to system failure
  • Fundamental concepts, and taxonomical challenges
  • Integration challenges: enforcing inter-unit collaboration by design
  • Routine, daily failure modes
  • Human errors
  • Barriers to learning from incidents and near misses: the accountability bias

Part II
  • A model of system integration, in terms of inter-process interaction dyads
  • A model of robust interactions
  • Scenario-oriented coordination design
  • A model of coordination control
  • Designing situation control, and the scenario and event control derived from it
  • Designing the procedure control
  • Control rules
  • The operator as a system component
  • Alarm design
  • Summary and Lessons Learned


Keywords

MBSE, Integration design, Digital twins, HSI.

Aims and Learning Objectives

The participants will learn how to prevent surprises, such as errors, by design.

Target Audience

Experts in designing utility-critical systems

Prerequisite Knowledge of Audience

Awareness of the risks of operating in exceptional situations.

Detailed Outline

Socio-technical systems (STS):
An STS may consist of human elements (operators, users, and stakeholders), subsystems that provide services to the human elements, and processes used to provide the services. The subsystems may include automated devices, engineered systems, and subordinate socio-technical systems. Some of the system units, notably the human elements, may be regarded as OEM black boxes.

The performance envelope:
Typically, the value of an STS is the expected operational utility, defined as the optimal performance constrained by the performance boundaries. In normal operation, the performance values should remain within the performance envelope. Operation beyond the envelope is often costly, resulting in degraded productivity and usability and, sometimes, in accidents (cf AF 296, 1988). System situations corresponding to operation beyond the boundaries are called exceptions.
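
For illustration only, the performance envelope can be sketched in code as a set of bounds on performance variables; the variable names and values below are invented, not taken from the tutorial:

  # Hypothetical performance envelope: bounds on performance variables.
  ENVELOPE = {
      "speed": (0.0, 320.0),
      "temperature": (-10.0, 85.0),
  }

  def classify_situation(measurements):
      """Return 'normal' if every monitored variable lies within its bounds, otherwise 'exception'."""
      for name, (low, high) in ENVELOPE.items():
          value = measurements.get(name)
          if value is None or not (low <= value <= high):
              return "exception"
      return "normal"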

Social factors:
In traditional engineering, we assume that all the people involved in the system development and operation share the same goals in terms of functions, performance, and responsibility. Such assumptions are valid for normal operation, but they break down in the face of incidents, due to organizational biases. In practice, system developers and operators refrain from employing tools for reporting exceptional situations, because such reports might suggest that they are not doing their work properly.

Operational complexity:
Exception handling is orders of magnitude more complex than design for normal operation. Common practices optimized for normal operation are not adequate for designing for exceptions: they are extremely costly, and they do not provide the desired level of protection from hazards.

Utility factors:
The Socio-Technical Systems (STS) approach extends the scope of interaction design beyond normal operation, by referring also to exceptional situations. Rather than defining the goals in terms of performance, we define them in terms of utility, considering also the effect of incidents. To promote learning from incidents, we need to define frameworks for enforcing the tracking, recording, and reporting of incidents.

Behavioral integration:
The term integration refers to the way the subsystems interact with each other. Traditionally, the focus is on the functional requirements. Studies of system failure indicate that the likelihood of incidents depends on the likelihood of exceptional situations. The integration task is more than the traditional assembly, verification, and validation: it is about designing the coordination between the system elements and validating the system behavior proactively, at design time.
In behavioral integration, we are concerned with the interactions preceding the incidents. Behavioral integration is key to usability, which is key to safety, productivity, and operational independence.

From black swans to engineering:
The famous Murphy's Law is a conclusion from observations of system failure: if the operation might fail, eventually it will. A conclusion based on the Black Swan Theory is that operators can prevent some of the failures, but not all of them.
Failure is due to operating in exceptional situations. A proactive version of Murphy's Law is about responsibility: failure should be prevented by design. The design should disable potential ways of approaching the boundaries, and should rebound from unexpectedly reaching them.

Proactive integration:
Traditional integration is reactive, namely, it involves verification and validation of the implementation's compliance with functional requirements and expectations. The result of this approach is the lengthy, redundant rework typical of exploratory, iterative design. A key design challenge is to specify the behavioral integration proactively. This goal may be achieved by employing models of behavioral integration.

Methodology:
According to the first principle of cybernetics, realized in the Systems-Theoretic Accident Model and Processes (STAMP) paradigm, the system should control its own behavior, according to rules that define the performance envelope. Rule-based models facilitate the definition and validation of the interactions.
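
A minimal sketch, in Python, of such rule-based self-control, assuming a simple sense-check-act cycle; the names and the corrective actions are illustrative assumptions, not a prescribed implementation:

  from dataclasses import dataclass
  from typing import Callable, Dict

  @dataclass
  class Rule:
      name: str
      holds: Callable[[Dict[str, float]], bool]   # True while the behavior stays inside the envelope

  def control_step(read_sensors, rules, corrective_actions):
      """One self-control cycle: sense the situation, check the rules, act on any violation."""
      measurements = read_sensors()
      for rule in rules:
          if not rule.holds(measurements):
              corrective_actions[rule.name]()      # e.g., alert the controller or switch to safe mode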

Modeling:
A model of proactive behavior integration consists of two layers: tasks and interactions. The task layer consists of rules defining normal activity. The interaction layer comprises means to avoid and handle deviations from the rules. The interaction layer is hierarchical, breaking down into mini-models of elementary controller-service interaction (CSI), which are abstractions of models of human-machine interaction (HMI).
A generic model of the controller focuses on decision-making. The challenges are to facilitate the controller's awareness of the service situation, prediction of future situations, exploration of the expected effect of optional controls, and troubleshooting.
A generic model of the service focuses on function provision, but also on providing the controller with the information required for decision-making. The service models are structured as a hierarchy of system entities, enabling systematic customization for specific projects, as proposed at the INCOSE HSI2021 IS. An affordable implementation may be based on the customization of generic behavioral twins.
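
A rough structural sketch of the two layers, using hypothetical names, intended only to illustrate the split between task rules and controller-service dyads:

  from dataclasses import dataclass, field
  from typing import List

  @dataclass
  class TaskRule:
      description: str                 # task layer: a rule defining normal activity

  @dataclass
  class Dyad:
      controller: str                  # abstraction of the deciding element (e.g., an operator)
      service: str                     # abstraction of the serving element (e.g., a device)
      exception_handlers: List[str] = field(default_factory=list)  # means to avoid and handle deviations

  @dataclass
  class IntegrationModel:
      task_layer: List[TaskRule] = field(default_factory=list)
      interaction_layer: List[Dyad] = field(default_factory=list)  # hierarchy of CSI mini-models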

Learning from rare events:
Proactive validation must be based on knowledge about the risks of exceptions. Unfortunately, exceptions are rare events, and the risks are unknown at design time. A way to cope with the barrier of rare events is by cross-industry sharing of protection methods. The challenge is to develop a general, universal model of exception handling, which may be used to customize the behavior of the system units in exceptions. For example, we can learn from problems in operating home devices, such as mode errors due to unintentional device settings. We can apply this knowledge to protect safety-critical devices from mode errors due to unintentional changes in the device settings.
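
As a hypothetical illustration of such cross-domain protection, a device might lock its settings outside a dedicated settings mode; the mode names are assumptions:

  class Device:
      """Hypothetical device that refuses settings changes outside a dedicated settings mode."""
      def __init__(self):
          self.mode = "normal"          # assumed modes: "normal", "settings", "maintenance"
          self.settings = {}

      def change_setting(self, key, value):
          if self.mode != "settings":
              raise PermissionError("settings are locked during normal operation")
          self.settings[key] = value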

The human element:
A commonly accepted model of the human element, proposed by Card, Moran & Newell (1983), is in terms of Goals, Operators, Methods, and Selection rules (GOMS). Kahneman proposed that the mental processing involved in decision-making consists of the parallel processing of two mental systems, which he called System-1 and System-2. System-1 is in charge of instant, reflexive reacting, and System-2 is in charge of thoughtful, rational thinking. The theory of System-2 applies to designing normal processes, and the theory of System-1 applies to testing the behavior in exceptional situations.

The HF version of Murphy’s Law:
The model describing System-1 assumes that humans are not perfect in doing what they intend to do. They often slip (cf Torrey Canyon, 1967). Sometimes they act unintentionally, or inadvertently (cf Zeelim A, 1990). The System-2 model assumes that the human element is rational, and entirely dedicated to operating by the book. This model is suited to describe the normal operation of low-risk systems. These assumptions do not apply to operating under stress, such as in multi-tasking (cf AF 447, 2009; WWII B-17). The human-factors version of Murphy’s Law is that the human element is error-prone: if the operators might fail, eventually they will.

Responsibility biasing:
Traditionally, people regard the operators as responsible for preventing failure, and when they fail, the failure is attributed to the operators' errors. Donald Norman protested against this approach:
Over 90% of industrial accidents are blamed on human error. You know, if it was 5%, we might believe it. But when it is virtually always, shouldn't we realize that it is something else?
https://jnd.org/stop_blaming_people_blame_inept_design/

The paradox of human errors:
A primary reason for the popularity of the errors concept is vendor biasing. According to Erik Hollnagel, errors are instances of normal operation with costly results. This definition implies that the operators cannot prevent the errors: had they prevented an error, there would have been no error. Yet, the operators are typically regarded as accountable. The conclusion is that the term error is a bias, intended to divert the focus from the developers' mistakes to the operators.

The HSI challenge:
According to many case studies, errors involve failure to notice exceptional situations, such as those due to a change in the operational mode (cf several TO/GA accidents).
A proactive version of Murphy's Law is about responsibility: instances of errors should be attributed to design mistakes. Failure of situation awareness should be attributed to mistakes in the design of the human-machine coordination, not to the human operators. It is the developer's responsibility to prevent operator errors.

Usability vs. Safety:
Human Factors Engineering is about considering human factors in the system development (design, verification, validation). The focus is on usability. The goal is a seamless operation. Sometimes, however, the seamless operation involves safety issues. Examples are usability problems due to unintentional activation of computer shortcut keys, or unnoticed assignment of default values on reset.

From human factors to HSI factors:
HSI factors are complementary to human factors. HFE sets the usability goals, and HSI engineering is about ways to achieve these goals. Because the barriers to usability involve problems in the coordination between the human and the technical elements, the way to implement usability is by HSI engineering.

The risks of implicit rules:
Many accidents are attributed to unexpected operator behavior. This is often the case when the design relies on implicit rules that the developers believed the operators would follow.

Rule-based design:
According to Leveson's STAMP, the system control should be constrained by rules defining proper operation, namely, enforcing operation within the performance envelope. This is possible only if the rules are defined explicitly, and the system can verify compliance with the rules at run time. Rule-based design implies that the rules are defined explicitly and implemented as safeguards in the system design.
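
A minimal sketch of explicit, machine-checkable rules used as a run-time safeguard; the rules and state fields are invented for illustration:

  # Hypothetical explicit rules: each rule is named and checkable against a system state.
  RULES = [
      ("doors closed while moving", lambda s: not (s["moving"] and s["doors_open"])),
      ("speed within limit",        lambda s: s["speed"] <= s["speed_limit"]),
  ]

  def guard(predicted_state):
      """Return the names of the rules that the predicted state would violate."""
      return [name for name, holds in RULES if not holds(predicted_state)]

In this sketch, a command would be executed only if guard() returns an empty list for the command's predicted outcome.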

Scenario-based design:
The coordination between the controller and the service may be expressed in terms of rules about the normal operation, specifying the matching between the service modes and the controller scenarios. Scenario-based design enables direct mapping from the controller tasks to the service processing.

Scenario-based modeling:
Analysis of many accidents indicates a problem of inconsistent assumptions about the scenario, which was not defined explicitly, and therefore was not implemented in the interaction protocols. In many case studies, the operator assumed a scenario that did not match the operational mode. For example, many annoying problems in using home devices are due to allowing changes to the device settings during normal operation. Also, many accidents, such as Aeroperú 603, are due to applying maintenance-only procedures during normal operation.

Specifications:
The first principle of cybernetics is difficult to follow when the rules are implicit or vague, as is typical of under-disciplined STS development. Therefore, a basic requirement of interaction definition is that the scenarios are defined formally, and implemented in the controller sub-system. In a model-based implementation, the rules defining normal operation enable automated detection of exceptional situations and activity.

Scenario-mode pairing:
The records of many accidents do not include data about the ways the operators or the system may trace the scenario, or how the system matched the scenarios with the operational modes. A common practice to work around this problem is to infer the scenario from the mode. It turns out that this practice is the root cause of many accidents, in which the inferred scenario did not match the real scenario projected from the contextual tasks.
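
For illustration, the pairing can be expressed as a consistency check between the scenario implied by the mode and the scenario projected from the contextual tasks; the mode and scenario names are assumptions:

  # Hypothetical scenario-mode consistency check.
  MODE_TO_SCENARIO = {
      "normal": "operation",
      "maintenance": "maintenance",
      "setup": "configuration",
  }

  def check_pairing(current_mode, projected_scenario):
      """Compare the scenario implied by the mode with the scenario projected from the tasks."""
      implied = MODE_TO_SCENARIO.get(current_mode)
      if implied != projected_scenario:
          return f"mismatch: mode '{current_mode}' implies '{implied}', tasks indicate '{projected_scenario}'"
      return "consistent"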

From HSI to system integration:
In an article submitted to the INCOSE HSI 2021 conference, I proposed a universal model of a digital twin, which may be used to control the system behavior. The proposed digital twins may operate according to the cybernetics principle of self-control and Leveson's STAMP principle of rule-based design. In the tutorial proposal submitted to the INCOSE IS2022, I proposed to extend the model, to apply it to the integration of any STS.

The Controller-Service Integration (CSI) Model:
The complexity of system integration may be resolved by examining the interactions between system units. It seems that any integration may be described as a collection of interactions between pairs of units, in which one unit is functional, providing services, and the other is a user of these services and also controls their use. This model complies with the STAMP paradigm proposed by Leveson. When employing this model, the controller is an abstraction of the human operators, and the service is an abstraction of a system. In other words, HSI is a special case of CSI.

Essentials of controller-service coordination:
To cope with the complexity of exception handling, the services should cooperate with the controller (a sketch in code follows the list below):
• The services should provide the controllers with data essential for proper decision-making: a preview of the upcoming situation, and the potential effects of applying various options.
• The services should warn the controllers about critical activity, such as a change from a normal to an exceptional situation, and about exception escalation.
• The services should provide the controller with information that may facilitate the troubleshooting
• The services should rebound from erroneous control selection, and inform the controller about such instances.
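
As a sketch only, these obligations could be collected in a service-side interface; the method names are hypothetical and not part of the tutorial material:

  class CoordinatingService:
      """Hypothetical service-side interface expressing the coordination obligations listed above."""

      def preview(self, option):
          """Return the expected effect of applying a control option, to support decision-making."""

      def warn(self, event):
          """Notify the controller about critical activity, e.g., entering or escalating an exception."""

      def troubleshooting_info(self):
          """Provide information that may facilitate troubleshooting of the current exception."""

      def rebound(self, selection):
          """Reject or undo an erroneous control selection and inform the controller."""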

Behavioral twins:
A behavioral twin is a digital twin of the service behavior, used to detect exceptional situations, and to provide the data required for the service control.
A model of a behavioral twin may include six layers of design entities (a sketch in code follows the list):
1. The basic layer is that of the system units as above
2. Each unit may have performance variables, used also as risk indicators
3. The situation of each unit, represented by state machines, describes attributes of functionality, availability, operation, hazards, etc. Situations are classified as normal or exceptional.
4. The controller-service activity is defined in terms of situation changes, and the risks associated with these changes. A change from a normal to an exceptional situation is classified as a hazard. Other risk indicators concern process variables, such as processing time or the time of inter-machine state transitions
5. The controller-service behavior is defined in terms of the service response to exceptional activity, such as automated shutdown, transition to safe-mode operation, or alerting.
6. Secondary risks, due to failure to detect or recover from a hazard
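
A rough data-structure sketch of the six layers, with hypothetical field names:

  from dataclasses import dataclass, field
  from typing import Dict, List

  @dataclass
  class Unit:                                                      # layer 1: a system unit
      name: str
      variables: Dict[str, float] = field(default_factory=dict)   # layer 2: performance variables / risk indicators
      situation: str = "normal"                                    # layer 3: current state of the unit's situation machine

  @dataclass
  class BehavioralTwin:
      units: List[Unit] = field(default_factory=list)
      hazards: List[str] = field(default_factory=list)             # layer 4: risky situation changes (normal to exceptional)
      responses: Dict[str, str] = field(default_factory=dict)      # layer 5: service response to exceptional activity
      secondary_risks: List[str] = field(default_factory=list)     # layer 6: risks of failing to detect or recover from a hazard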

Implementation:
Utility-critical systems should incorporate means, including sensors and data analytics, which may be used to guide the developers in the design of the behavioral integration and to facilitate cross-domain learning from incidents.

Cost-effectiveness:
Twin development may be affordable if it is based on pre-defined profiles of operational rules, such as settings, maintenance, and safety backup.

Secretariat Contacts
e-mail: enase.secretariat@insticc.org
