How to ace your system design for top-notch security

Achieving absolute security for an IT system is neither easy nor straightforward. More than that, it needs to be thought of as a journey rather than a system state: you are constantly travelling to remain on a path considered secure.

The most important lessons I’ve learned over the years are:

  1. Constantly maintain the system by applying security patches in a timely manner.
  2. Continuously monitor for system degradation, such as full disk partitions, and for tell-tale signs of a breach or denial-of-service attack.
  3. Continually evolve the system to be more resilient against attacks and to remove shortcomings in access management.
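
The monitoring point in particular lends itself to automation. As a minimal sketch of a check for full disk partitions (the 90% threshold and the partition list are illustrative assumptions, not recommendations):

```python
import shutil

def check_partitions(paths, threshold=0.90):
    """Return (path, fraction_used) for partitions above the threshold."""
    alerts = []
    for path in paths:
        usage = shutil.disk_usage(path)
        fraction_used = usage.used / usage.total
        if fraction_used >= threshold:
            alerts.append((path, fraction_used))
    return alerts

# Typically run from cron or a monitoring agent, feeding into alerting:
for path, used in check_partitions(["/"]):
    print(f"ALERT: {path} is {used:.0%} full")
```

In practice, a monitoring stack would handle scheduling, alert routing, and de-duplication; the point is that "full disk partition" is a condition you can pre-configure an alert for rather than discover during an outage.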

Best design practices

At its core, the goal is to build a secure service implementation from secure components, combined with a tight and secure configuration. This can be achieved by first working out the different security concerns and then thinking about how the system can be designed to address them.

Compliance and policy

To suit the purpose of your system, you will likely need to comply with regulatory constraints. Certain components therefore need to be in place, such as centralised log aggregation and analysis, and the configuration adapted to meet those constraints, for example access levels and credential policies.

Identity and access management

Access management usually needs to be set up for each application or service, just as it does for the infrastructure. For example, you’ll need to think about software developers vs. system administrators, who require different access levels. This includes authentication (verification of identity) as well as authorisation (verification of access privileges for the desired activities or levels).
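
To illustrate the distinction with a hypothetical sketch (the role names, user table, and `require_role` helper are illustrative, not from any particular framework): authentication establishes who the caller is, and authorisation then decides what they may do.

```python
from functools import wraps

# Hypothetical user store; a real system would verify passwords or tokens.
USERS = {
    "alice": {"roles": {"developer"}},
    "bob": {"roles": {"sysadmin", "developer"}},
}

def authenticate(username, token):
    """Authentication: verify identity (stubbed here)."""
    return USERS.get(username)  # real code would validate the token

def require_role(role):
    """Authorisation: verify the authenticated user holds a role."""
    def decorator(func):
        @wraps(func)
        def wrapper(user, *args, **kwargs):
            if role not in user["roles"]:
                raise PermissionError(f"{role} role required")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator

@require_role("sysadmin")
def restart_service(user, name):
    return f"restarted {name}"
```

Here a developer who authenticates successfully still cannot call `restart_service`; passing authentication does not imply authorisation.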

Additionally, symmetric and private keys need to be protected from general access. This can be achieved using dedicated management components, such as a PKI or Kerberos, or via key stores for API keys and credentials for external components.
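
A minimal sketch of the key-store idea for API keys: the secret is provisioned out of band (an environment variable, or a file in a mounted secrets directory as e.g. Docker secrets provide) and never hard-coded in source or committed configuration. The `/run/secrets` path and the secret name are illustrative assumptions.

```python
import os

def load_secret(name, store_path="/run/secrets"):
    """Fetch a secret from the environment or a mounted secret store,
    rather than hard-coding it in source or configuration files."""
    value = os.environ.get(name)
    if value is not None:
        return value
    path = os.path.join(store_path, name)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    raise KeyError(f"secret {name!r} not provisioned")
```

Failing loudly when a secret is missing is deliberate: a service should refuse to start rather than fall back to a default or empty credential.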

Lastly, the administration of access itself requires some attention, preferably in a unified way. That way, adding or removing users and granting or revoking access levels takes effect across the entire system, avoiding glitches and unnoticed access through forgotten pathways. A unified approach also provides the means for logging and acts as a common hub for privilege management, rather than a disparate agglomerate spread across inhomogeneous subsystems.
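
The "common hub" idea can be sketched as a single registry through which every grant and revocation passes and is logged (class and method names here are hypothetical; a real deployment would typically use a directory service such as LDAP or an IdP):

```python
import logging

logging.basicConfig(level=logging.INFO)

class AccessHub:
    """A single point of administration for users and privileges, so that
    grants and revocations take effect system-wide and are logged."""

    def __init__(self):
        self._grants = {}  # user -> set of privileges
        self._log = logging.getLogger("access-hub")

    def add_user(self, user):
        self._grants.setdefault(user, set())
        self._log.info("added user %s", user)

    def grant(self, user, privilege):
        self._grants[user].add(privilege)
        self._log.info("granted %s to %s", privilege, user)

    def revoke_user(self, user):
        self._grants.pop(user, None)  # one call removes all access
        self._log.info("removed user %s", user)

    def has_access(self, user, privilege):
        return privilege in self._grants.get(user, set())
```

The pay-off is in `revoke_user`: because access flows through one hub, a departing employee loses everything in a single, logged operation, with no forgotten pathway left behind in some subsystem.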


Confidentiality

Confidentiality applies to the actual data content as well as to metadata, such as times of access. The data needs to be safe from unauthorised access, which can be achieved by restricting access to the data, or by protecting it in transit and at rest (e.g. via encryption).
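
One simple way to restrict access at rest, sketched here for a POSIX filesystem (the path is illustrative; encryption, as mentioned, is the stronger complement):

```python
import os
import stat

def write_confidential(path, data):
    """Write data so that only the owning user can read it (POSIX)."""
    # Create with owner-only permissions from the start, rather than
    # chmod-ing after the fact (which leaves a window of exposure).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(data)

write_confidential("/tmp/confidential_report.txt", "internal figures")
```

Note that file permissions say nothing about the data once it leaves the host; protecting data in transit (e.g. TLS) and at rest (encryption) remains a separate concern.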


Reviews and change management

Just building a suitable system is often not enough. Peer reviews play an important role in making sure nothing was omitted in the design and implementation; a more formal external system review could also be considered. After the initial build and "go live" of your system, ongoing re-evaluation of changing conditions remains important, as does actively managing change: making and documenting the change, having it reviewed by others, and possibly re-auditing.


Integrity and monitoring

Checks of system integrity and active monitoring (ideally with pre-configured alerts for commonly expected conditions) are absolutely required. This increases system resilience and avoids cascading effects, where one fault condition causes others. Most of all, early detection of breaches, or even access attempts, prevents or reveals tampering through unauthorised access.
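
A basic form of integrity checking, in the spirit of tools such as Tripwire or AIDE, compares file digests against a known-good baseline. A minimal sketch (which files to cover and how often to run it are left open):

```python
import hashlib
import os

def file_digest(path):
    """SHA-256 digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def take_baseline(paths):
    """Record known-good digests for a set of files."""
    return {p: file_digest(p) for p in paths}

def find_tampering(baseline):
    """Return files whose contents no longer match the baseline."""
    return [p for p, digest in baseline.items()
            if not os.path.exists(p) or file_digest(p) != digest]
```

The baseline itself must of course be stored out of reach of an attacker, otherwise it can be regenerated to match the tampered files.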


Data classification

Sensitive or classified information held within the system needs to be identified as such, which can be accomplished by a mechanism attaching metadata to that effect. With this in place, the system can be enabled to prevent unauthorised disclosure of the information.
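
As a sketch of the idea (the level names and ordering are illustrative assumptions): classification metadata travels with the value, and a release check consults it before any disclosure.

```python
from dataclasses import dataclass

# Illustrative classification levels, ordered least to most sensitive.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}

@dataclass(frozen=True)
class Classified:
    """A value carrying classification metadata alongside its content."""
    content: str
    level: str

def release(item: Classified, channel_level: str) -> str:
    """Disclose content only to channels cleared for its level."""
    if LEVELS[item.level] > LEVELS[channel_level]:
        raise PermissionError(f"cannot release {item.level} data "
                              f"on a {channel_level} channel")
    return item.content
```

Because every disclosure path goes through `release`, accidental leakage of a confidential value onto a public channel becomes an explicit error rather than a silent event.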


Logging

System logging plays an important role in ensuring ongoing secure operation and taking the guesswork out of security maintenance. Logging ranges from the initial, easy form of debugging to increasingly elaborate use cases: detecting operational anomalies, demonstrating compliance, and forensics.

The latter cases are increasingly difficult, as several issues with plain logging need to be addressed. Log entries must be "machine understandable": they are not solely for human consumption, and need to be parsed and reasoned upon by algorithms.
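
A common way to make entries machine understandable is structured logging, for example one JSON object per line. A sketch using the Python standard library (the field names are illustrative):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Emit each log record as one JSON object per line, so that
    entries can be parsed and reasoned upon by downstream tooling."""
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("auth")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("login failed for user alice")
```

Each line is now a self-describing record that an analysis pipeline can filter and correlate, instead of free-form text that needs brittle regular expressions.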

Typically, non-trivial systems consist of several hosts, each of which may produce one or more log files; these need to be aggregated and "meshed" in a temporally and functionally consistent way.
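
The temporal "meshing" of per-host logs, each already sorted by time, is essentially a k-way merge. A toy sketch, with integer timestamps and hypothetical hosts standing in for real ones:

```python
import heapq

# Hypothetical per-host logs, each already sorted by timestamp.
web_log = [(1, "web", "GET /login"), (4, "web", "GET /admin")]
db_log = [(2, "db", "SELECT users"), (3, "db", "UPDATE sessions")]

# Merge the streams into one temporally consistent sequence;
# tuples compare by their first field, the timestamp.
merged = list(heapq.merge(web_log, db_log))
for ts, host, event in merged:
    print(ts, host, event)
```

Real aggregation pipelines also have to cope with clock skew between hosts, which is one reason centralised time synchronisation (e.g. NTP) matters for forensics.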

Log entries must also be immutable, making it infeasible for an attacker to subvert log integrity and cover the tracks of unauthorised use. Sophisticated log analysis systems may be used for such tasks; their capabilities vary, and a choice should be made carefully based on the functional and non-functional requirements of the system, along with the degree of protection required.
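
One well-known building block for tamper-evident (effectively immutable) logs is a hash chain: each entry's digest covers the previous entry's digest, so any later modification breaks every subsequent link. A minimal sketch:

```python
import hashlib

def append_entry(chain, message):
    """Append (message, hash) where the hash covers the previous hash."""
    prev_hash = chain[-1][1] if chain else "0" * 64
    entry_hash = hashlib.sha256((prev_hash + message).encode()).hexdigest()
    chain.append((message, entry_hash))

def verify(chain):
    """Recompute the chain and report whether it is intact."""
    prev_hash = "0" * 64
    for message, entry_hash in chain:
        expected = hashlib.sha256((prev_hash + message).encode()).hexdigest()
        if expected != entry_hash:
            return False
        prev_hash = entry_hash
    return True
```

On its own this only makes tampering detectable, not impossible; shipping entries promptly to a separate, append-only log host is the usual complement.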


Availability

In most cases a system needs to be robust against outages. Outages may be caused by failures, such as software, infrastructure, or connectivity failures, or by attacks, such as denial-of-service attacks. Depending on the availability requirements of the system, different means can be employed, such as redundancy for high availability or commercially available content delivery networks for protection on a geographical scale.
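
Redundancy in its simplest form means a client (or load balancer) can fail over between replicas; one healthy replica is enough to serve. A hypothetical sketch (the hostnames and the `send` callable are placeholders, not a real protocol):

```python
import random

# Hypothetical redundant replicas of the same service.
REPLICAS = ["app-1.example.net", "app-2.example.net", "app-3.example.net"]

def call_with_failover(request, replicas, send):
    """Try replicas in random order; return the first successful reply."""
    last_error = None
    for host in random.sample(replicas, k=len(replicas)):  # spread load
        try:
            return send(host, request)
        except ConnectionError as exc:
            last_error = exc  # this replica is down; try the next one
    raise RuntimeError("all replicas unavailable") from last_error
```

Production setups add health checks, timeouts, and back-off on top of this, but the core idea is the same: no single replica's failure takes the service down.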

Don’t forget about the human factor

Beyond the best practices above, and the pure architecture and configuration, the human factor is often overlooked. Educating staff and users in security awareness is a huge factor in keeping a system safe and should not be underestimated.

Security breaches through staff can occur knowingly or unknowingly. Knowing compromise may be simple conscious carelessness or, in more malicious cases, anything from revenge to espionage. The situation is very different when loss or theft of a device (with security credentials on it) is involved, or when carelessness is the cause of insecurity: a computer that has not been "locked", passwords insecurely stored or written on a sticky note, or weak credentials.

As countermeasures, policies and procedures can be established to lessen both the likelihood and the impact of such breaches. However, education and the creation of awareness among all staff is the prime factor.

An ongoing process

System security is a complex, multi-faceted goal to achieve. Many different things must be considered and come together. There is no magical solution or secret sauce to make a system secure; the single biggest gain comes from educating staff and end users on the ins and outs of system security. Make them aware of the possible problems and give them the means to avoid them.

The bottom line is that security is an ongoing process. You must continuously monitor and maintain the system beyond the initial build and deployment, and rigorously track security weaknesses and advancements in all of its components.
