It’s been almost two years since OpenAI launched ChatGPT, driving increased mainstream awareness of and access to Generative AI tools. In that time, new tools and solutions seem to be launching daily. There is also a growing trend of building bigger models that consume larger quantities of training data, often with mixed results ranging from hallucinations or categorically incorrect facts to the regurgitation of opinions as universal truth, proving the old adage that sometimes “less is more”.
Quality over Quantity
So, if using more data doesn’t translate into better results… what does? It comes down to another tried and true saying – “quality over quantity.”
At McAfee, we maniacally focus on data quality. A well-developed Generative AI model is nothing without high-quality, curated datasets to fuel them. When the quantity of data is prioritized over quality, the results are often disappointing.
How do we produce quality data? Using millions of worldwide sensors, our AI engineers and AI data specialists focus on clues that point to threats. But that’s just the first step. Our teams then curate the data to improve the quality and maximize data diversity, reducing sources of bias, cross-pollinating data sources, and enriching and standardizing samples, just to name a few of the dozens of operations conducted to ensure we’re building datasets of the highest and purest quality.
All of this translates into the most comprehensive and robust AI-based protection for our customers: more than 1.5M threat detections per week across malware, scams, phishing, smishing, and more than half a billion web categorizations to help ensure a safe digital journey while browsing the Internet.
Human/AI Partnership
As the capabilities of AI tools increase, so does the conversation around how technology removes humans from the equation. The reality is that humans are still an integral part of the process and key to any successful Generative AI strategy. AI is only as good as the data it’s trained on, and in McAfee’s case, the guidance provided by cybersecurity experts. Thus, Cybersecurity AI specialists curating data is crucial to the development of all of our AI systems as it mitigates potential sources of error, resulting in accurate and trusted AI solutions, and allowing us to scale and share human expertise to better protect millions of customers worldwide.
Tackling cyber threats is a tall order that comes with intrinsic challenges. For example, modern scams are more subtle and less obvious even to experts, and quite often it is just the implicit intent that sets it apart from genuine (non-scam) content. Being context-aware can help navigate this landscape to more effectively detect and stop threats before they reach customers. What is more, we believe transparency and education are paramount for building a safer digital world. This is why we also invest in building explainable AI that helps users understand why a threat has been flagged and provides clues they can use to identify future threats.
Only the Beginning
The GenAI journey has only just begun. There is still a lot of work to do and a lot to look forward to as this technology continues to evolve. While it’s easy, as developers, to get caught up in the excitement, it’s also important to identify and focus on an ultimate goal and the responsible and safe steps to get there. At McAfee, we pledge to protect our customers, and we believe in the synergistic interaction between AI and Human Threat Intelligence. Together, we can deliver a trusted, world-class AI protection experience.