Wednesday, January 8, 2025

Quality is Priority Zero, Especially for Security

Software defects are not uncommon, but what sets great companies apart is how quickly they respond and put the customer first. While that is a challenge in any domain, security software defects are a unique beast. Our products can be the first line of defense or the last, depending on the solution. The cost of failure in security is catastrophic — and I say that without exaggeration.

“For a security product, quality is more than a technical measure; it’s a form of security in itself.”

This is an important lesson I’ve learned over the past 25 years in the cybersecurity industry. For example, when McAfee.com was acquired by Network Associates, I received a welcome gift from Chris Bolin — Test-Driven Development by Kent Beck. Chris, then head of engineering, emphasized a core principle that has stayed with me: Developers must own the quality of the products they build. From him, I learned that quality isn’t just a responsibility; it’s an integral part of the development process.

Another key leader I worked with, Bryan Barney, was instrumental in establishing PSIRT and CSIRT processes. He often said, “No product defect will ever have the same large-scale impact as a bad update from a security vendor.” He was, of course, referring to us. Yet, despite that warning, we did cause a large-scale disruption to critical infrastructure around the world when a flawed content update was released. At the time, we were deploying content updates every single day — fully automated across all versions, operating systems, and products.

Recently, when a similar incident occurred with a major vendor’s security product, several colleagues, current and former security leaders, shared their own war stories of having to report major incidents to their C-suite and board. These weren’t CISOs, but leaders in engineering, responsible for the very products designed to protect businesses from security threats. We all agreed on one thing: any issue a vendor encounters is a reminder not to get complacent, and another wake-up call to step up how we design, build, test and release software. This is why quality is priority zero, and we know the stakes are high if we don’t get it right.

Priority zero for our customers

This hyper-focus on quality is partly due to the fact that security products operate with elevated privileges, granting them significant access to systems and environments. A failure in quality can introduce vulnerabilities, turning the product from a defense mechanism into an attack surface. Poorly executed security updates can cause the very breaches they are designed to prevent.

Quality also shapes the customer experience, and the quality of that experience is what we ought to strive for. Usability issues, stemming from poor quality, can lead to misconfigurations or overlooked critical alerts, reducing the overall effectiveness of a security solution. (Around 80% to 85% of quality issues are due to misconfigurations, policy inconsistencies and poor implementation of software rather than flaws in the security products themselves.) This is especially true for products designed to detect and respond to incidents. If compromised by poor quality, their ability to protect customers is weakened, with potentially disastrous consequences.

Quality also goes hand-in-hand with operational resilience, which is a primary goal for many customers investing in security solutions. But when a security product fails, it does the opposite — disrupting the very operations it was meant to safeguard. In this way, a widespread failure in a security product can, in some cases, cause even more damage than a targeted ransomware attack, which usually affects specific targets.

The cost of not getting it right

The consequences of a security failure are not only about service disruptions but also about real-world harm, particularly in industries where downtime can put lives at risk. Think about hospitals, immigration services, utilities like electricity and water — any failure in these sectors can have immediate and severe repercussions. For example, a hospital unable to access patient records due to a security product malfunction could delay critical treatments.

Similarly, in banking, government services and large corporations, the financial and reputational damage of a security product failure can be profound. These sectors rely on security solutions to maintain operational integrity, and a single defect can lead to financial losses, reputational damage and long-term erosion of customer trust. In many industries, compliance with strict regulatory standards is also at stake. A failure in quality can result in non-compliance, leading to penalties, scrutiny, or even exclusion from certain markets.

“The potential for service disruptions, economic impacts, real-world harm and regulatory risks makes it essential to prioritize quality. Quality, then, is not just a technical attribute; it is a fundamental component of security.”

The consequences of a quality failure can be more far-reaching than even a malicious attack, highlighting the need for stringent standards and security practices when developing critical security products.

This became even clearer to me during the COVID-19 pandemic when I received a panicked call from an account executive. A drug manufacturing facility’s production line had come to a halt due to a defect in my product. The stakes couldn’t have been higher. A swift response, however, can turn affected customers into loyal ones.

Lessons learned and what the future holds (hint: AI is a key player)

When we can build solutions to defend critical infrastructure against nation-state attacks, we must be equally committed to ensuring the quality and security of our own products and processes. I believe AI will be a key player in helping us meet this challenge.

  • The next time your software is flagged for a vulnerability, don’t seek an exception approval to ship the product — fix the issue first. Zero tolerance. Code reviews aren’t tedious formalities; they are valuable learning opportunities where teams can sharpen their skills and catch critical errors.
  • Routine daily updates should be evaluated based on the impact of each change, no matter how small. The potential impact of code changes must be considered at the design stage itself, so that any issues are contained early on.
  • Failure Mode and Effects Analysis (FMEA) might sound like a heavy practice, but when internalized by the organization, it can deliver great dividends. FMEA forces one to think about failure modes within the system, evaluating the potential effects of those failures and prioritizing actions to mitigate risks.
  • Focus on continuous integration testing, automated regression checks, and having robust monitoring tools in place to catch problems before they reach production.
  • Clear communication across teams is essential to ensure everyone understands the risks involved with even minor changes. Anything less compromises not just quality, but trust.
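
To make the FMEA step above a little more concrete, here is a minimal sketch in Python of the conventional Risk Priority Number (RPN) calculation, where each failure mode is rated from 1 to 10 for severity, occurrence and detection, and the three ratings are multiplied together. The class and function names, the example failure modes and their ratings are all hypothetical and chosen only for illustration; this is not a description of any specific team's process or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """One row of an FMEA worksheet; each rating is on a 1-10 scale."""
    description: str
    severity: int      # 10 = most severe effect
    occurrence: int    # 10 = most likely to happen
    detection: int     # 10 = least likely to be caught before release

    def __post_init__(self):
        # Reject ratings outside the conventional 1-10 range.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def prioritize(modes):
    """Return failure modes ordered from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical failure modes for a daily content update (illustrative ratings).
modes = [
    FailureMode("Malformed content file crashes the agent", 10, 3, 6),
    FailureMode("Update signature check rejects a valid file", 5, 2, 2),
    FailureMode("New rule floods the console with false positives", 6, 5, 4),
]

ranked = prioritize(modes)
for mode in ranked:
    print(f"{mode.rpn:4d}  {mode.description}")
```

Sorting by RPN is only a starting point for discussion. Many teams also review any failure mode with a very high severity rating regardless of its overall score, since a rare but catastrophic failure can otherwise be ranked below a frequent nuisance.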

With the advancements in AI, the five key steps outlined above can now be implemented more efficiently and effectively than ever before.

AI can help automate and streamline these processes, allowing teams to quickly identify and address issues, improve product quality and maintain customer trust. Our teams have rolled up their sleeves and jumped in, leveraging AI to accelerate unit testing, automate compliance steps, review logs to check for anomalies proactively, improve the risk assessment framework to consistently assess risk of builds and automate detection of vulnerabilities.

“By leveraging AI, we can enhance the overall security and resilience of our products, making these critical steps easier to achieve in today’s fast-moving cybersecurity environment.”

Looking ahead, I cannot envision any scenario where the pursuit of unwavering quality is detached from building great security products. Effective and reliable security solutions are the foundation of digital trust, especially in a world where threats evolve and morph at the speed of AI. This means every security vendor and the industry as a whole must commit to rigorous testing, CI/CD principles and transparent communications with our customers, even — or maybe especially — when the news is really dire. For Cisco Security Engineering, these commitments are not aspirational; they are priority zero.

