5 Fault Management over Itf-N

3GPP TS 32.111-1, Release 17: Telecommunication management; Fault Management; Part 1: 3G fault management requirements

5.1 Fault Management concept

An operations system on the network management layer (i.e. the NM) provides fault management services and functions required by the operator on top of the element management layer.

The Itf-N may connect the Network Management (NM) system either to Element Managers (EMs) or directly to the Network Elements (NEs). This is done by means of Integration Reference Points (IRPs). In the following, the term "subordinate entities" denotes either EMs or NEs, which are in charge of supporting the Itf-N.

This clause describes the properties of an interface enabling a NM to supervise a 3GPP system including, if necessary, the managing EMs. Providing the NM with the Fault Management capability for the network implies that the subordinate entities have to provide information about:

– events and failures occurring in the subordinate entities;

– events and failures of the connections towards the subordinate entities and also of the connections within the 3GPP system;

– the network configuration (due to the fact that alarms and related state change information are always originated by network resources, see [19]). This is, however, not part of the FM functionality.

Therefore, for the purpose of FM the subordinate entities send notifications to a NM indicating:

– alarm reports (indicating the occurrence or the clearing of failures within the subordinate entities), so that the related alarm information can be updated;

– state change event reports, so that the related (operational) state information can be updated. This is, however, not part of the FM functionality.

The forwarding of these notifications is controlled by the NM operator using adequate filtering mechanisms within the subordinate entities.

The Itf-N also provides means for the NM operator to store ("log") desired information within the subordinate entities and to evaluate it later.

The retrieval capability of alarm-related information concerns two aspects:

– retrieval of "dynamic" information (e.g. alarms, states), which describes the momentary alarm condition in the subordinate entities and allows the NM operator a synchronization of its alarm overview data;

– retrieval of "history" information from the logs (e.g. active/clear alarms and state changes that occurred in the past), which allows the evaluation of events that may have been lost, e.g. after an Itf-N interface failure or a system recovery.

As a consequence of the requirements described above, both the NM and the subordinate entity shall be able to initiate the communication.

5.2 Management of alarm event reports

5.2.1 Mapping of alarm and related state change event reports

The alarm and state change reports received by the NM relate to functional objects in accordance with the information model of Itf-N. This information model tailored for a multi-vendor capability is different from the information model of the EM-NE interface (if an EM is available) or from the internal resource modelling within the NE (in case of direct NM-NE interface). Thus a mapping of alarm and related state change event reports is performed by a mediation function within the subordinate entity.

The mediation function translates the original alarm/state change event reports (which may contain proprietary parameters or parameter values) taking into account the information model of the Itf-N.

The following examples describe potential mediation function behaviour:

– Alarm notifications generated by a functional object in a subordinate entity can be mapped to alarm reports of the corresponding ("equivalent") functional object at the Itf-N. If the functional object generating the original alarm notification has no directly corresponding object at the Itf-N, the mediation function maps the alarm to the next superior functional object in accordance with the containment tree of the Itf-N.

– State change notifications generated by a functional object in a subordinate entity can be mapped to state change reports of the corresponding ("equivalent") functional object at the Itf-N. If the functional object generating the original state change notification has no directly corresponding object at the Itf-N, the mediation function maps the state change to the next superior functional object in accordance with the containment tree of the Itf-N.

Every alarm notification generated by a manufacturer-specific, equipment-related object in the subordinate entity is mapped to an alarm report of a generic logical object, which models the corresponding equipment-related resource.
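The "next superior object" rule described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that objects are named by slash-separated distinguished names and that the set of objects visible at the Itf-N is known to the mediation function; the names `ITF_N_OBJECTS` and `map_to_itf_n` are hypothetical and not taken from any IRP definition.

```python
from typing import Optional

# Hypothetical set of objects that exist in the Itf-N information model.
ITF_N_OBJECTS = {
    "SubNetwork=1",
    "SubNetwork=1/ManagedElement=7",
    "SubNetwork=1/ManagedElement=7/Cell=3",
}


def map_to_itf_n(internal_dn: str) -> Optional[str]:
    """Return the Itf-N object that should carry a report raised on internal_dn.

    If the object itself is not modelled at the Itf-N, walk up the
    containment tree until a modelled (superior) object is found.
    """
    dn = internal_dn
    while dn:
        if dn in ITF_N_OBJECTS:
            return dn
        dn = dn.rpartition("/")[0]  # drop the last name component
    return None  # nothing in the containment path is visible at the Itf-N
```

A report raised on a vendor-specific board below `Cell=3` would thus be reported against `Cell=3`, while a report on an object that is itself modelled at the Itf-N is passed through unchanged.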

5.2.2 Real-time forwarding of event reports

If the Itf-N is in normal operation (the NM connection to the subordinate entities is up), alarm reports are forwarded in real-time to the NM via appropriate filtering located in the subordinate entity. These filters may be controlled either locally or remotely by the managing NM (via Itf-N) and ensure that only the event reports which fulfil pre-defined criteria can reach the superior NM. In a multi-NM environment each NM shall have its own filter within every subordinate entity which may generate notifications.
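As a rough sketch of the per-NM filtering described above, the following models a subordinate entity that keeps one filter per NM and forwards a report only to the NMs whose filter accepts it. The class and function names, the report fields and the severity ranking are illustrative assumptions, not part of the Itf-N definition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Illustrative ranking only; "cleared" and unknown values rank lowest (0).
SEVERITY_RANK = {"warning": 1, "minor": 2, "major": 3, "critical": 4}


@dataclass
class AlarmReport:
    object_dn: str
    perceived_severity: str


Filter = Callable[[AlarmReport], bool]


def at_least(severity: str) -> Filter:
    """Build a filter that passes reports of the given severity or higher."""
    threshold = SEVERITY_RANK[severity]
    return lambda r: SEVERITY_RANK.get(r.perceived_severity, 0) >= threshold


@dataclass
class SubordinateEntity:
    filters: Dict[str, Filter] = field(default_factory=dict)
    outbox: Dict[str, List[AlarmReport]] = field(default_factory=dict)

    def set_filter(self, nm_id: str, flt: Filter) -> None:
        """Install or replace the filter owned by one NM."""
        self.filters[nm_id] = flt
        self.outbox.setdefault(nm_id, [])

    def forward(self, report: AlarmReport) -> List[str]:
        """Queue the report for every NM whose filter accepts it."""
        receivers = []
        for nm_id, flt in self.filters.items():
            if flt(report):
                self.outbox[nm_id].append(report)
                receivers.append(nm_id)
        return receivers
```

For example, one NM might install `at_least("major")` while another subscribes only to a particular sub-network; each then receives only the reports matching its own criteria.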

5.2.3 Alarm clearing

On the Itf-N, alarm reports containing the value "cleared" of the parameter perceivedSeverity are used to clear the alarms. The correlation between the clear alarm and the related active alarms is performed by means of unambiguous identifiers.

This clearing mechanism ensures the correct clearing of alarms, independently of the (manufacturer-specific) implementation of the mapping of alarms/state change events in accordance with the information model of the Itf-N.

The IRP manager may also clear alarms manually.
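A minimal sketch of the clearing mechanism on the NM side follows, assuming each report carries an unambiguous identifier (here a hypothetical `alarm_id` field) that correlates a clear report with the active alarm it clears. Acknowledgement handling is deliberately ignored in this sketch.

```python
from typing import Dict


class AlarmOverview:
    """NM-side view of active alarms, keyed by the correlation identifier."""

    def __init__(self) -> None:
        self.active: Dict[str, dict] = {}

    def on_report(self, report: dict) -> None:
        alarm_id = report["alarm_id"]  # hypothetical correlation identifier
        if report["perceived_severity"] == "cleared":
            # The clear report removes exactly the alarm it correlates with.
            self.active.pop(alarm_id, None)
        else:
            # A new alarm, or a changed report for a known alarm.
            self.active[alarm_id] = report
```

Because correlation relies only on the identifier, the result does not depend on how the subordinate entity mapped the original event to the Itf-N information model.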

5.3 Retrieval of alarm information

5.3.0 Introduction

The retrieval of alarm information comprises two aspects:

– Retrieval of current information:

This mechanism shall ensure data consistency about the current alarm information between the NM and its subordinate entities and is achieved by means of a so-called synchronization ("alignment") procedure, triggered by the NM. The synchronization is required after every start-up of the Itf-N; nevertheless, the NM may trigger it at any time.

– Logging and retrieval of history information:

This mechanism offers to the NM the capability to get the alarm information stored within the subordinate entities for later evaluation.

5.3.1 Retrieval of current alarm information on NM request

The present document defines a flexible, generic synchronization procedure, which fulfils the following requirements:

– The alarm information provided by means of the synchronization procedure shall be the same (at least for the mandatory parameters) as the information already available in the alarm list. The procedure shall be able to assign the received synchronization-alarm information to the corresponding requests, if several synchronization procedures triggered by one NM run at the same time.

– The procedure shall allow the NM to trigger the start at any time and to recognize unambiguously the end and the successful completion of the synchronization.

– The procedure shall allow the NM to discern easily between an "on-line" (spontaneous) alarm report and an alarm report received as consequence of a previously triggered synchronization procedure.

– The procedure shall allow the NM to specify filter criteria in the alignment request (e.g. for a full network or only a part of it).

– The procedure shall support connections to several NMs and route the alignment-related information only to the requesting NM.

– During the synchronization procedure new ("real-time") alarms may be sent at any time to the managing NM.

– If the EM loses confidence in its alarm list and rebuilds it, then the EM shall indicate to the NM that the alarm list has been rebuilt. If the rebuild of the alarm list only concerns alarms for e.g. one NE, then the EM may indicate that only that part of the alarm list has been rebuilt. In the latter case the NM may use the knowledge that only a specific subset of the alarm list has been rebuilt to perform a partial resynchronization using filters.

If applicable, an alarm synchronization procedure may be aborted by the requesting NM.
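Several of the requirements above (correlating answers with requests, an unambiguous end indication, filter criteria, and telling synchronization alarms apart from spontaneous ones) can be sketched as follows. This is an illustration under stated assumptions only: the message layout, the `request_id` tag and the names `Agent`, `start_sync` and `spontaneous` are hypothetical and do not reproduce the Alarm IRP operations.

```python
import itertools
from typing import Callable, Iterator, List, Tuple

AlarmFilter = Callable[[dict], bool]


class Agent:
    """Subordinate entity answering synchronization requests from one NM."""

    def __init__(self, alarm_list: List[dict]) -> None:
        self.alarm_list = list(alarm_list)
        self._ids = itertools.count(1)

    def start_sync(self, alarm_filter: AlarmFilter) -> Tuple[int, Iterator[dict]]:
        """Start a synchronization; return its request id and message stream."""
        request_id = next(self._ids)
        return request_id, self._messages(request_id, alarm_filter)

    def _messages(self, request_id: int, alarm_filter: AlarmFilter) -> Iterator[dict]:
        for alarm in self.alarm_list:
            if alarm_filter(alarm):
                # Tagged with the request id so concurrent runs can be separated.
                yield {"kind": "sync_alarm", "request_id": request_id, "alarm": alarm}
        # Explicit end marker: the NM knows the run completed successfully.
        yield {"kind": "sync_end", "request_id": request_id, "status": "complete"}


def spontaneous(alarm: dict) -> dict:
    """Wrap a real-time alarm; it carries no request id."""
    return {"kind": "alarm", "request_id": None, "alarm": alarm}
```

In this sketch the NM can abort a run simply by no longer consuming the message stream, and a partial resynchronization is just a run with a narrower filter.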

5.3.2 Logging and retrieval of alarm history information on NM request

The alarm history information may be stored in the subordinate entities. The NM is able to create logs for alarm reports and to define the criteria for storage of alarm information according to ITU‑T Recommendation X.735 [11].

Nevertheless these particular requirements are not specific for alarm or state change information.

The alarm history information should be returned in files once the IRPAgent has finished collecting all the alarm history information that the NM requested.

5.4 Co-operative alarm acknowledgement on the Itf-N

The acknowledgement of an alarm is a maintenance function that aids the operator in the day-to-day management of the network. An alarm is acknowledged by the operator to indicate that work to resolve this specific problem has started. In general a human operator performs the acknowledgement; however, a management system (NM or EM) may automatically acknowledge an alarm as well.

The alarm acknowledgement function requires that:

a) All involved OSs have the same information about the alarms to be managed (including the current responsibility for alarm handling).

b) All involved OSs have the capability to send and to receive acknowledgement messages associated to previous alarm reports.

A co-operative alarm acknowledgement means that the acknowledgement performed at EM layer is notified at NM layer and vice versa, so that the acknowledgement-related status of this alarm is the same across the whole management hierarchy. The OSs often give the operator(s) a possibility to add a comment to an alarm. An OS can have the capability to record more than one comment for each alarm. To make the same alarm look the same in all OSs subscribing to the alarm, it should be possible to distribute the recorded comments in the same way as for the acknowledgement information.

The co-operative alarm acknowledgement on Itf-N shall fulfil the following requirements:

– Acknowledgement messages may be sent in both directions between EMs and NM, containing the following information:

– Correlation information to the alarm just acknowledged.

– Acknowledgement history data, including the current alarm state (active | cleared), the time of alarm acknowledgement and, as configurable information, the management system (EM | NM) and the operator in charge of acknowledgement (the parameter operator name or, in case of auto-acknowledgement, a generic system name).

– Acknowledgement notifications sent to NM shall be filtered with the same criteria applied to the alarms.

– Taking into account the acknowledgement functionality, the above described synchronization procedure for retrieval of current alarm information on NM request may be extended. Additionally to the requirements defined in clause 5.3.1, this extended synchronization procedure relates not only to the active, but also to the "cleared and not acknowledged" alarms, which have still to be managed by the EM.
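The acknowledgement requirements above can be sketched with a small alarm list that tracks both the clear state and the acknowledgement state. The sketch assumes that an alarm leaves the list only once it is both cleared and acknowledged; that rule, like the class, field and method names, is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Alarm:
    alarm_id: str
    cleared: bool = False
    ack_time: Optional[datetime] = None
    ack_system: Optional[str] = None  # "EM" or "NM"
    ack_user: Optional[str] = None    # operator name or generic system name


class AckAwareAlarmList:
    def __init__(self) -> None:
        self.alarms: Dict[str, Alarm] = {}

    def raise_alarm(self, alarm_id: str) -> None:
        self.alarms[alarm_id] = Alarm(alarm_id)

    def clear(self, alarm_id: str) -> None:
        self.alarms[alarm_id].cleared = True
        self._purge(alarm_id)

    def acknowledge(self, alarm_id: str, system: str, user: str,
                    when: Optional[datetime] = None) -> dict:
        """Record the acknowledgement and return the message to distribute."""
        alarm = self.alarms[alarm_id]
        alarm.ack_time = when or datetime.now(timezone.utc)
        alarm.ack_system, alarm.ack_user = system, user
        message = {
            "alarm_id": alarm_id,  # correlation to the acknowledged alarm
            "alarm_state": "cleared" if alarm.cleared else "active",
            "ack_time": alarm.ack_time,
            "ack_system": system,
            "ack_user": user,
        }
        self._purge(alarm_id)
        return message

    def _purge(self, alarm_id: str) -> None:
        # Assumption: only alarms that are cleared AND acknowledged are dropped.
        alarm = self.alarms[alarm_id]
        if alarm.cleared and alarm.ack_time is not None:
            del self.alarms[alarm_id]

    def extended_sync(self) -> List[str]:
        """Active alarms plus cleared-but-unacknowledged alarms."""
        return sorted(self.alarms)
```

The same message dictionary could be sent from EM to NM or from NM to EM, which is what keeps the acknowledgement status aligned across the management hierarchy.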

5.5 Overview of IRPs related to Fault Management (FM)

The Itf-N is built up by a number of IRPs. The basic structure of the IRPs is defined in 3GPP TS 32.101 [2] and 3GPP TS 32.102 [3].

For the purpose of FM, the following IRPs are needed:

– Alarm IRP, see 3GPP TS 32.111-2 [13];

– Notification IRP, see [21]; and

– Notification Log (NL) IRP, see [22].

NOTE: The Notification Log (NL) IRP is not part of Release 1999; therefore, the requirements related to the log functionality are not valid for Release 1999.

Annex A (informative):
General principles of alarm generation

This annex, as additional guidelines to subclause 4.1.2, lists and explains some general principles of alarm generation.

The definition of ‘alarm’ can be found in subclause 3.1.

– Alarm should convey the identified management entity information to operator

For the faults of cell, carrier, channel, port, etc., if these faults need operator action, alarms need to be generated. Alarm location information should be accurate enough to identify the units which can be repaired or replaced by the maintenance staff.

– No alarms for those faults that occurred once and then disappeared

Faults that have occurred only once and then disappeared should not be reported as alarms, because these faults need no operator action. For example, alarms are not needed for a single call establishment failure, a single handover failure or a single call drop, since these faults usually occur only once and do not persist. Instead, these events should be captured by performance measurement counters.

– No alarms for those faults that were self-healed

For the faults that cannot be perceived by operator, for example some internal software faults like stack overflow, loss of messages, insufficient memory, etc., no alarms are needed if these faults were fixed by network entity’s self-healing actions such as software restart, since these faults need no operator action. However, as the service usually is negatively impacted by these faults before they are self-healed, these faults should be recorded into the related logs.
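The three principles above can be summarised as a small decision function. This is only a sketch: the `Fault` fields and the returned labels are hypothetical, and the fallback for faults not covered by any principle is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    needs_operator_action: bool
    occurred_once_and_disappeared: bool = False
    self_healed: bool = False


def disposition(fault: Fault) -> str:
    """Decide how a detected fault should be surfaced."""
    if fault.occurred_once_and_disappeared:
        return "performance_counter"  # e.g. a single call drop
    if fault.self_healed:
        return "log"  # service was impacted, but no operator action is needed
    if fault.needs_operator_action:
        return "alarm"
    return "no_alarm"  # assumption: not covered by the principles above
```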

Annex B (informative):
Change history

Change history

| Date | TSG # | TSG Doc. | CR | Rev | Subject/Comment | Old | New |
|---|---|---|---|---|---|---|---|
| Mar 2000 | SA_07 | SP-000013 | | | 32.111 Approved at TSG SA#7 and placed under Change Control | 2.0.0 | 3.0.0 |
| Mar 2000 | | | | | cosmetic | 3.0.0 | 3.0.1 |
| Jun 2000 | SA_08 | SP-000247 | 001 | | Split of TS 32.111 – Part 1: Main part of spec – Requirements | 3.0.1 | 3.1.0 |
| Jun 2000 | SA_08 | SP-000248 | 002 | | Split of TS 32.111 – Part 1: Merged Clause X into Clause 4 | 3.0.1 | 3.1.0 |
| Jun 2000 | SA_08 | SP-000249 | 003 | | Split of TS 32.111 – Part 1: Alignment of FM requirements with IRP, etc | 3.0.1 | 3.1.0 |
| Sep 2000 | SA_09 | SP-000437 | 001 | | Clarification On Mediation Function Algorithms | 3.1.0 | 3.2.0 |
| Sep 2000 | SA_09 | SP-000437 | 002 | | Clarification On Clear Alarm Suppression | 3.1.0 | 3.2.0 |
| Jun 2001 | SA_12 | SP-010282 | 003 | | Added two new features ‘partial resynchronization’ or ‘Itf-N distribution of comments associated to faults’. | 3.2.0 | 4.0.0 |
| Mar 2002 | SA_15 | | | | Automatic upgrade to Rel-5 (no Rel-5 CR) | 4.0.0 | 5.0.0 |
| Sep 2002 | SA_17 | SP-020477 | 004 | | Add requirements for new clearAlarms() operation in Alarm IRP | 5.0.0 | 5.1.0 |
| Dec 2002 | | | | | Updated references & cosmetics | 5.1.0 | 5.1.1 |
| Dec 2003 | SA_22 | SP-030631 | 005 | | Add retrieval of alarm history information requirement | 5.1.1 | 6.0.0 |
| Jun 2005 | | | | | Foreword, Introduction update: added 32.111-5 new TS-family member | 6.0.0 | 6.0.1 |
| Jun 2007 | SA_36 | | | | Automatic upgrade to Rel-7 (no CR) at freeze of Rel-7. Deleted reference to CMIP SS, discontinued from R7 onwards. Cleaned-up references. | 6.0.1 | 7.0.0 |
| Mar 2009 | SA_43 | SP-090207 | 006 | | Include reference to SOAP Solution Set specification | 7.0.0 | 8.0.0 |
| Dec 2009 | SA_46 | | | | Upgrade to Rel-9 | 8.0.0 | 9.0.0 |
| Mar 2011 | | | | | Update to Rel-10 version (MCC) | 9.0.0 | 10.0.0 |
| Sep 2011 | SA_53 | SP-110534 | 007 | | Add concepts for Alarm Correlation and Root Cause Analysis | 10.0.0 | 10.1.0 |
| 2012-09 | | | | | Update to Rel-11 version (MCC) | 10.1.0 | 11.0.0 |
| 2013-06 | SA_60 | SP-130272 | 008 | 1 | Addition of criteria for critical and major alarms (compliance Top OPE) | 11.0.0 | 12.0.0 |
| | | | 010 | 1 | Addition of requirements on repair actions (compliance Top OPE) | | |
| 2014-12 | SA_66 | SP-140801 | 011 | 1 | Alarm quality improvements, new definitions and concepts for alarm handling | 12.0.0 | 12.1.0 |
| 2015-03 | SA_67 | SP-150060 | 016 | 1 | Replacement of obsolete term “N interface” | 12.1.0 | 12.2.0 |
| 2016-01 | | | | | Update to Rel-13 (MCC) | 12.2.0 | 13.0.0 |

Change history

| Date | Meeting | TDoc | CR | Rev | Cat | Subject/Comment | New version |
|---|---|---|---|---|---|---|---|
| 2017-03 | SA#75 | | | | | Promotion to Release 14 without technical change | 14.0.0 |
| 2018-06 | | | | | | Update to Rel-15 version (MCC) | 15.0.0 |
| 2020-07 | | | | | | Update to Rel-16 version (MCC) | 16.0.0 |
| 2022-03 | | | | | | Update to Rel-17 version (MCC) | 17.0.0 |