Welcome to CISA recommendations for Timing and Sync (part 2)
For part one go here.
We all understand that network security matters in the modern world, and so do the tools we use to monitor it. Your network security can only be as good as the insight you have into it.
This is why the first step is to Know your system.
If you can understand the nature of the devices on your network, you can start to secure and manage your environment. For this you need to identify, verify, and (we all hate it but...) document the timing dependencies for your organisation, which will result in a timing topology. A simple topology could look like this:
With this starting point, you should be able to at a minimum understand what devices or systems have timing requirements as well as the precision needs for the different types of devices or systems.
Below we investigate another real-world example as described within the CISA document on timing guidance:
A recent Apple software update notice advised users to update specified devices before November 3, 2019, to maintain accurate GPS location and correct date and time functionality. These devices were not impacted by April 6, 2019, GPS Week Rollover event as Apple programmed the update to occur on a date after the rollover event. This exemplifies why knowing your system is critical to your and your users’ operations; it also directly correlates to item 1.a.3 below.
So what does it take to know your system?
You need to be able to manage the devices in your network that use sync and timing.
Starting at the easy end, you need to understand how the time of day (sometimes referred to as major time) is distributed around your network.
You will need to understand which devices distribute time across the network.
Most importantly, you need to know that they have a traceable source of UTC, whether that comes from GNSS, stratum 1 references, or even globally available public NTP if precision isn’t important.
If GNSS is part of your timing distribution, you should also include it in your standard firmware update processes, because manufacturers issue updates from time to time, as they did for the GPS week rollover event.
Finally, depending on how you operate the timing and sync topology, you should be scanning for time servers on your network regularly.
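As a rough illustration of what scanning for time servers can involve, here is a minimal Python sketch that sends a standard NTP client request to a host and reads back the header of any reply. The function names (build_request, parse_response, probe) are ours and purely illustrative, it only covers NTP over UDP port 123 (not PTP), and you should only probe networks you are authorised to scan.

```python
import socket
import struct

NTP_PORT = 123
# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_EPOCH_DELTA = 2_208_988_800


def build_request() -> bytes:
    """48-byte NTP client request: LI=0, version=4, mode=3 (client)."""
    return bytes([0x23]) + bytes(47)


def parse_response(packet: bytes) -> dict:
    """Extract the header fields that matter for an inventory."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first, stratum = packet[0], packet[1]
    tx_seconds, tx_fraction = struct.unpack("!II", packet[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,  # 4 means "server"
        "stratum": stratum,   # 1 means directly attached to a reference clock
        "transmit_unix": tx_seconds - NTP_UNIX_EPOCH_DELTA + tx_fraction / 2**32,
    }


def probe(host: str, timeout: float = 1.0):
    """Return the parsed header if host answers NTP, otherwise None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_request(), (host, NTP_PORT))
            packet, _ = sock.recvfrom(512)
        except OSError:  # timeout, unreachable host, name resolution failure
            return None
    try:
        return parse_response(packet)
    except ValueError:
        return None
```

Running probe() across the addresses in your own ranges and comparing the answers with your documented topology is one way to spot a time server nobody told you about.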
NEXT….
Identify applications and systems that use or require time and sync for their operation.
It may be obvious, but the first step is to make sure these applications or systems actually need the time.
Second is the level of precision they need. It is not always sensible to run everything at high precision if it doesn’t need it.
Directly related to the above: is your method of distribution suitable for the required precision? Typically this involves looking into NTP and PTP.
Critically, once you have got this far in identifying everything, you need to make a note of it all.
When making a note, make sure you have all the time-dependent systems and the precision they require.
Ensure all distribution devices are noted along with their potential performance (e.g. holdover, unicast seats available, etc.).
Ensure the inventory above is reviewed or updated from time to time.
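To make the inventory idea concrete, here is a small Python sketch that records each time-dependent system with the precision it needs and the distribution method it uses, then flags mismatches. The accuracy figures for NTP and PTP are rough assumptions for illustration only (they depend heavily on your network and hardware), and the system names are made up.

```python
from dataclasses import dataclass

# Assumed, order-of-magnitude accuracy per method, in seconds.
# Replace these with figures measured on your own network.
ASSUMED_ACCURACY_S = {"NTP": 1e-3, "PTP": 1e-6}


@dataclass
class TimeConsumer:
    name: str
    required_accuracy_s: float  # the precision the system actually needs
    method: str                 # how it receives time: "NTP" or "PTP"


def unsuitable(inventory):
    """Names of systems whose distribution method is too coarse for their needs."""
    return [
        c.name
        for c in inventory
        if ASSUMED_ACCURACY_S[c.method] > c.required_accuracy_s
    ]


inventory = [
    TimeConsumer("log-collector", 1.0, "NTP"),
    TimeConsumer("trading-engine", 1e-4, "NTP"),
    TimeConsumer("radio-unit", 1e-5, "PTP"),
]
print(unsuitable(inventory))
```

Keeping the inventory as data rather than as a diagram makes the periodic review much easier to repeat.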
Does the system architecture documentation detail the time reliance and possible effects?
To date, are you able to identify timing anomalies? This is usually the key question many people ignore. It’s one thing to have sync systems; it’s another thing entirely to know that they are correct!
Do you publish the level of service for your timing anywhere? (Some industries and MSPs will need to do this for SLAs.)
To do the above, you will need a method to identify and monitor the level of service, which in turn requires a clear definition of that level.
When you have monitoring in place, you also need to implement alerts and notifications for deviations in performance (if no one hears about it, was it worth monitoring?).
To go one step further, you should put in place a system to detect anomalies in the timing chain, for example jumps forward and backward.
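Jump detection can start out very simple. The sketch below assumes you already collect a series of clock offsets (local clock minus reference, in seconds) from your monitoring, and flags any step between consecutive samples that exceeds a threshold. The function name and the 1 ms default threshold are our own illustrative choices, not values from the CISA guidance.

```python
def find_jumps(offsets_s, threshold_s=0.001):
    """Return (sample index, direction, step size) for each suspicious step.

    offsets_s is a sequence of local-minus-reference offsets in seconds,
    so a growing offset means the local clock moved forward.
    """
    jumps = []
    for i in range(1, len(offsets_s)):
        delta = offsets_s[i] - offsets_s[i - 1]
        if abs(delta) > threshold_s:
            direction = "forward" if delta > 0 else "backward"
            jumps.append((i, direction, delta))
    return jumps


samples = [0.0001, 0.0002, 0.5002, 0.5001, -0.2]
for index, direction, delta in find_jumps(samples):
    print(f"sample {index}: clock stepped {direction} by {abs(delta):.4f} s")
```

In practice you would feed each detected jump into the same alerting path as your other deviations, so that someone actually hears about it.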
How long can your systems and devices maintain normal operation in the absence of timing or sync?
The first step is to document how long applications, devices or systems can hold over without source input (this is easier at the source than at the application, but a good sync app can identify it for you).
Once you know how long you have until operation becomes abnormal or critical, you need to work out whether it is long enough. Can the users accept that length of time?
Conduct regular redundancy checks. A lot of timing systems are designed with a level of resiliency, but you need to test it to ensure it is optimal.
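For a first estimate of holdover, a back-of-the-envelope calculation is often enough. The sketch below assumes a deliberately simple model in which the local oscillator drifts at a constant rate (in parts per billion) once the source is lost; real oscillators also age and react to temperature, so treat the result as optimistic. The function names are illustrative.

```python
def holdover_seconds(tolerance_s: float, drift_ppb: float) -> float:
    """Seconds until a constant drift uses up the error tolerance."""
    if drift_ppb <= 0:
        raise ValueError("drift_ppb must be positive")
    return tolerance_s / (drift_ppb * 1e-9)


def covers_outage(tolerance_s: float, drift_ppb: float, outage_s: float) -> bool:
    """True if the estimated holdover lasts at least as long as the outage."""
    return holdover_seconds(tolerance_s, drift_ppb) >= outage_s


# A 10 ppb oscillator, a 1 microsecond tolerance, and a one-hour outage.
print(holdover_seconds(1e-6, 10))
print(covers_outage(1e-6, 10, 3600))
```

Comparing that estimate with the outage length your users can accept tells you whether you need a better oscillator, a second source, or both.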
The last item on the list could be the most important: understanding what happens, and how your systems react, when time and sync are degraded.
You need to look back at alerts: your systems should inform you, or have a system in place to inform you, when time-dependent applications are degraded or may not be reliable.
Do the applications have error handling routines to address potentially unreliable time?
Are your staff and engineers trained to understand and respond to faulty time or sync? Do you have the necessary support packages in place with your vendors?
What is your backup source? It sounds like a basic question, but when everything goes wrong there should always be a backup plan!
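The error handling and backup source questions above can be sketched together in a few lines of Python. This assumes your monitoring can already tell you whether each source is reachable and how far it is from your reference; the class names, the 1 ms limit and the source names are hypothetical.

```python
from dataclasses import dataclass


class UnreliableTimeError(Exception):
    """Raised when no configured source can be trusted."""


@dataclass
class SourceStatus:
    name: str
    reachable: bool
    offset_s: float  # measured offset against your reference, in seconds


def pick_source(sources, max_offset_s=0.001):
    """Return the first healthy source; sources are listed in order of preference."""
    for source in sources:
        if source.reachable and abs(source.offset_s) <= max_offset_s:
            return source.name
    raise UnreliableTimeError("no healthy time source; treat timestamps as untrusted")


sources = [
    SourceStatus("primary-gnss", reachable=False, offset_s=0.0),
    SourceStatus("backup-ntp", reachable=True, offset_s=0.0004),
]
print(pick_source(sources))
```

The important design choice is that the application fails loudly when nothing is healthy, rather than quietly carrying on with time it cannot trust.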
If you can't wait for the next installment, you can always get in touch with us at Timebeat. We would be more than happy to create a bespoke solution for your needs.