Taskforce 2000: Embedded Systems - "The solution is about identifying and managing your risks. This is why the aim of any programme is not to achieve Year 2000 compliance but for your business to be Year 2000 ready."

greenspun.com : LUSENET : TimeBomb 2000 (Y2000) : One Thread

http://www.taskforce2000.co.uk/resource/embeddedsystems.htm

initially received far less attention yet it could cause the loss of power or cooling systems to computer rooms or dealing areas, or lead to the loss of access control to the building or controlled areas. It can also lead to faults in fire alarm and detection systems, building management systems, lift control systems, leak detection systems, etc.

Due to the relatively low cost of electronic control boards, embedded systems are found within almost all items of equipment, including building services, fire and other life safety systems, power generation, security and access control systems and other specialised systems.

It is true that only a small percentage of embedded systems will cause significant problems. However, the difficulty is that we do not know where they are; the compliance programme must therefore be sufficiently robust to identify all affected equipment and systems supporting all critical business activities. The compliance programme also needs to have a clear document trail to satisfy any auditor or external regulator.

The majority of embedded code compliance programmes in large organisations form part of the main IT programme. As such, they are often expected to follow the same broad approach and meet the same project milestones. However, the embedded code problem is fundamentally different to the general IT problem and therefore needs its own individual approach and agreed milestones.

It is important to recognise that this is not just a date-related problem when the year changes from 1999 to 2000. The problem extends to cover all past computer and control code writing problems. Problems may therefore be experienced before and after 1 January 2000. Affected dates have been identified from August 1999 to well into the next century, although it is expected that almost all problems will occur within the first two years of the new century.

A difficulty in the compliance process is that there are no accepted international standards or agreements on what constitutes compliance, although in the UK there is the British Standard DISC PD2000-1 four-rule definition of compliance, which has received wide acceptance among the G11 countries.

The Loss Prevention Council has published a standard for fire alarm systems, LPS2000 XXXX.

The main message, which must be understood, is that there is no solution in the normally accepted sense; compliance is about managing risk. However, the risk assessment must be made without any previous experience because this is a project that none of us have ever undertaken in the past.

Within the equipment under consideration there is the same two-digit date and counter problem found in IT systems. The date problem is well known and concerns how a control system will react to seeing the year change from 99 to 00 on 1st January 2000. Century windowing, used in IT systems, is also used in embedded systems, and an investigation is therefore required to establish the referencing date and ensure it is within acceptable limits.

The counter problem is less well known and concerns how programmers use a counter to represent the date or to log control system operations. If, for example, a programmer uses the control board real-time clock to record, say, 10,000 events, then when the counter reaches 10,000 the control system may either stop or reset to 0. Either event may cause a control logic problem that may, in turn, cause the equipment to malfunction in some way. One counter problem found in many systems is the end-of-calendar figure. This is found by stepping through the days, months and years until the date resets to its base figure. In many cases this date is many years in the future, but end-of-calendar figures have also been found well within the normal life of the equipment.
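The two mechanisms described above can be illustrated with a short Python sketch. It is not taken from any real control board: the pivot value of 50, the 10,000-event limit and all function and class names are assumptions chosen purely for illustration.

```python
from datetime import datetime, timedelta


def expand_two_digit_year(yy, pivot=50, base_century=1900):
    """Century windowing: map a two-digit year onto a 100-year window.

    Years below the (assumed) pivot are read as the following century,
    so a pivot of 50 gives a window running from 1950 to 2049.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    century = base_century + 100 if yy < pivot else base_century
    return century + yy


def window_limits(pivot=50, base_century=1900):
    """Return the first and last year the window can represent."""
    return base_century + pivot, base_century + 100 + pivot - 1


class WrappingCounter:
    """Hypothetical event counter that resets to 0 on reaching its limit."""

    def __init__(self, limit=10_000):
        self.limit = limit
        self.count = 0

    def record(self):
        """Record one event; return True if the counter wrapped to 0."""
        self.count += 1
        if self.count >= self.limit:
            self.count = 0
            return True
        return False


def end_of_calendar(epoch, ticks, tick=timedelta(days=1)):
    """Date at which a clock holding `ticks` distinct values returns to its base."""
    return epoch + ticks * tick
```

With the assumed pivot, a windowed controller reads 99 as 1999 and 00 as 2000, but it will misread dates again once they fall outside 1950 to 2049. That is why the referencing date has to be established for each item rather than taken on trust.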

A common approach to addressing the embedded chip problem is to contact the equipment manufacturers to obtain confirmation that their equipment is compliant. This information, if it can be obtained, is used to obtain insurance, defend against legal actions and take legal action if the confirmation proves incorrect.

This approach does not address any technical problems and therefore does not reduce risk; it simply places you in a better position to take legal action against your equipment manufacturers, assuming that they were the suppliers of the equipment. In many cases they will not be, and therefore you do not have a direct contract with them. Moreover, the responses from suppliers are often standard wording from their legal departments, which does not answer the specific questions you may have asked.

You must remember that it is your business that will suffer if the manufacturer is wrong. Your business and your shareholders may therefore want to see a more robust system in place than simply relying on manufacturers' bland assurances.

You may ask how the equipment supplier can get it wrong when he should know his equipment very well. Unfortunately, many manufacturers do not fully understand the problem and base their compliance statements on the fact that their control systems do not use the date. Moreover, many equipment suppliers do not manufacture their own control boards, and some of these third-party boards are multi-functional. That is, they are designed to meet the requirements of a number of equipment models or even manufacturers, and therefore each manufacturer may not necessarily know what other functions are on the control boards he uses. Because the control boards are manufactured by third parties, the manufacturer may not have access to the control code, and the problem may be compounded if, for reasons of reducing cost, the control boards used in the same item of equipment are purchased from a number of suppliers.

If that were not enough, the board in the item of equipment under investigation may not be the same as was originally supplied because it was replaced at some time due perhaps to component failure.

With so many problems in obtaining accurate and reliable compliance information one may perhaps consider testing. However, it should be remembered that many control boards cannot be tested or require highly specialised equipment and knowledge to do so. Moreover, in order to produce a comprehensive testing specification one needs to know how the control board is supposed to operate which means one needs the control code.

Testing specifications must also take into account the control code used in systems for event or fault logging. This type of code may pass a simple date test because the affected code is only used when an event or fault occurs, which may be later that week, month or year. Another testing problem encountered is where there are two methods of testing, one through a dataport and the other using a control panel keypad. Testing has shown that the dataport test may pass whereas the keypad test may fail.
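The point about fault logging can be shown with a small Python sketch. The controller below is entirely hypothetical: its normal cycle never reads the date, while its fault logger subtracts two-digit years. A test that only advances the clock passes; a test that also forces a fault does not.

```python
class SimulatedController:
    """Hypothetical controller whose only date-dependent code is the fault logger."""

    def __init__(self, yy):
        self.yy = yy                # two-digit year held by the real-time clock
        self.last_service_yy = yy   # two-digit year of the last service visit

    def set_year(self, yy):
        self.yy = yy

    def run_cycle(self):
        """Normal operation: does not use the date at all."""
        return "OK"

    def log_fault(self):
        """Latent defect: elapsed years are computed from two-digit values."""
        years_since_service = self.yy - self.last_service_yy
        if years_since_service < 0:
            raise RuntimeError("negative service interval in fault log")
        return "fault logged, %d year(s) since service" % years_since_service


def simple_date_test(ctrl):
    """Advance the year to 00 and check normal operation only."""
    ctrl.set_year(0)
    return ctrl.run_cycle() == "OK"


def date_test_with_fault(ctrl):
    """Advance the year to 00, then also exercise the fault-logging path."""
    ctrl.set_year(0)
    try:
        ctrl.run_cycle()
        ctrl.log_fault()
    except RuntimeError:
        return False
    return True
```

Run against a controller started in year 99, `simple_date_test` returns True and `date_test_with_fault` returns False, which is the situation a test specification has to be written to catch.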

Finally, when considering testing systems by advancing the date, you must also consider whether you will be able to recover in the event of a failure, since some systems cannot be set back to the present date once they have been advanced past 1st January 2000.

You may ask with all of these problems whether there is a solution. The answer is that there is no solution in the normally accepted sense. The solution is about identifying and managing your risks. This is why the aim of any programme is not to achieve Year 2000 compliance but for your business to be Year 2000 ready.

The risk management approach must allow for:

Auditing - it is important that the whole compliance process is capable of being audited since the omission of only one non-compliant item of equipment may mean that a critical system will fail causing loss of business.

Prejudices - overcoming any preconceived ideas or prejudices held by members of the compliance team. Many people feel that this is an overstated problem or have preconceived ideas about whether a particular item of equipment may be affected. It is vital, therefore, that the compliance process is consistent and not dependent upon the person who carries out the initial survey or produces the compliance assessment.

Future monitoring - many asset suppliers have not fully assessed or tested their product and therefore may not be in a position to give all the information required about the compliance status of their product. The system must therefore allow for the monitoring of the compliance status of these assets on a regular basis.

Change management system - once a survey has been carried out in each building it is important that a change management system is set up to ensure that no potentially non-compliant items of equipment or components are installed.

The major activities of a typical approach are shown on the following Year2000 Compliance Logic Diagram.

Initially all assets must be identified. This may be a simple operation using the asset lists or it may require a survey of the installation if a current asset list is not available. This may appear to be a daunting prospect but the survey only needs to identify assets in critical systems which are usually more readily accessible.

In the first instance it is worthwhile listing all assets that have an electrical connection to them. This is to avoid any preconceived ideas from the survey team leading to an affected asset being missed. From the asset list those assets containing electronic components must be identified, as it is these assets that have potential problems.

How these assets are connected together to form critical systems is then identified, and these links are shown using a dependency-type diagram. The systems are identified in this visual way to better enable the identification of missing assets and to show the inter-relationship between systems.

The logic diagram could be later used to show compliance status or risk of being non-compliant. The diagram should be produced in draft form during the survey so that any obviously missing system items can be identified and the asset information gathered. The typical dependency logic diagram shown indicates how these assets and systems may be represented.
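A dependency diagram of this kind can also be held as data, which makes the check for missing assets mechanical. The Python sketch below assumes a simple mapping from each system to the items it relies on; the system and asset names are invented for illustration and do not come from the Taskforce 2000 diagram.

```python
# Hypothetical dependency data: each key relies on the items listed against it.
DEPENDENCIES = {
    "dealing area": ["computer room", "access control"],
    "computer room": ["cooling system", "power supply"],
    "cooling system": ["chiller control board", "power supply"],
    "power supply": ["standby generator controller"],
}


def supporting_assets(dependencies, activity):
    """Return everything the activity depends on, directly or indirectly."""
    seen = set()
    stack = [activity]
    while stack:
        node = stack.pop()
        for dep in dependencies.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def missing_from_survey(dependencies, activity, surveyed):
    """Return items the activity relies on that the survey did not record."""
    return supporting_assets(dependencies, activity) - set(surveyed)
```

For example, if the survey list for the dealing area contains every item except the standby generator controller, `missing_from_survey` returns a set holding just that one item.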

The equipment manufacturers must then be contacted and asked a series of questions to establish whether:

Their equipment is compliant with the British Standard four-rule definition.

Their control code is available.

Their equipment is testable.

Their equipment has been tested.

Their control boards come from a single source or several.

The questioning of the equipment manufacturers is not an easy task. You must first ensure that you are speaking to someone who understands your questions and is able to give authoritative answers.

Some manufacturers are not very helpful and are tired of being asked these questions. You need to be both polite and firm with this type of manufacturer and try to make them understand that their piece of equipment is in one of your critical systems and that you must therefore know the answers to your questions.

You will frequently be told that they are compliant because they do not use the date. You must still find out whether they have the control code and whether their equipment has been tested. Some manufacturers will say they are compliant but do not have access to the control code, have many sources of control boards and are unable to test their equipment. Will this be acceptable to your business if this item of equipment is part of a critical system?

From the compliance information the level of comparative risk may be assessed using a fully compliant product as the basis of this assessment. The typical dependency logic diagram shows the results of such an assessment and the output from using these assessment figures in a risk model.
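The article does not give the risk model itself, so the following Python sketch is only one possible way to turn manufacturers' answers into a comparative figure. It assumes a simple penalty score in which a fully compliant, fully documented and tested product scores 0; the weights, field names and the "worst item decides" rule for a system are all assumptions, not part of the original method.

```python
from dataclasses import dataclass


@dataclass
class ManufacturerResponse:
    """Answers to the questions put to a manufacturer (field names are assumed)."""
    states_compliant: bool
    control_code_available: bool
    testable: bool
    tested: bool
    single_board_source: bool


# Illustrative penalty weights only; a real programme would agree its own.
WEIGHTS = {
    "states_compliant": 3,
    "control_code_available": 2,
    "testable": 2,
    "tested": 2,
    "single_board_source": 1,
}


def comparative_risk(response):
    """Penalty relative to a fully compliant product, which scores 0."""
    return sum(w for name, w in WEIGHTS.items() if not getattr(response, name))


def system_risk(responses):
    """Assumed rule: a system is only as good as its worst-scoring item."""
    return max(comparative_risk(r) for r in responses)
```

Under these assumed weights, a manufacturer who states compliance but has no control code, several board sources and no means of testing scores 7 against a baseline of 0, so any system containing that item inherits a score of at least 7.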

Once this work has been carried out you are in a position to know your probable risk and how it may impact on your critical systems and business activities.

To lower this risk to an acceptable level you must devise a robust implementation plan covering fixing, testing, workarounds, change management, and monitoring.

Where a fix has been identified by the equipment manufacturer you must make arrangements for this fix to be put in place bearing in mind the short time remaining.

Testing of fixes and critical equipment must also be arranged. In order to retain manufacturers' guarantees, the manufacturers should be asked to carry out these tests. The testing should be to a standard specification but allow for any other test the manufacturers feel necessary to properly check compliance of their equipment.

Despite all your efforts, you or your manufacturers may make mistakes. Remember that none of us have been through this before; we therefore cannot draw on any previous experience. To avoid problems to your business you must therefore devise workarounds, either in the form of equipment modifications or in the form of staff procedures.

To prevent the whole process from being compromised by the inclusion of a non-compliant item of equipment or spares, an auditable change management system must be put in place. This must cover new buildings, refurbishment projects, new equipment and maintenance. It is sensible to prevent any new work to the critical systems after, say, November of this year, as there will be no time to ensure compliance before the end of the year.

Equipment manufacturers and your change management system must be monitored on a regular basis to ensure there are no compliance changes and to check the effectiveness of your change management process.

-- Old Git (anon@spamproblems.com), October 25, 1999

Answers

With just 37 Federal days left 'till the End, it is getting down to "How can I shift legal responsibility for that which I never intended to spend the dough to fix?"

Ya gotta love these "World Class" "business leaders." Talk about multiple nested Oxymorons.



-- K. Stevens (kstevens@ It's ALL going away in January.com), October 25, 1999.


Very informative, thought-provoking, sobering read. Thank you, Old Git. Impossible to happy face after this read.

-- Ashton & Leska in Cascadia (allaha@earthlink.net), October 26, 1999.
