
April 15, 2026
Your cyber systems can’t keep up with your data.
Nir Zuk, Founder and CEO
How long can your organization operate without full control of its data?
That question is no longer theoretical. Anthropic recently withheld a model capable of autonomously identifying and exploiting software vulnerabilities at scale because it worked too well. At the same time, Anthropic's systems have drawn government-level scrutiny as a potential supply chain risk.
This is the environment in which cybersecurity systems now operate. It is already affecting a growing share of the market.
In my first article, I mentioned the need for “sovereign data.” The initial responses I received to that piece prompted me to clarify what I meant by “sovereignty.” Now, I want to explain what I meant by “data.”
For a long time, “data” in the context of security operations meant something very specific: logs and events. SIEMs, and before them, SIMs and SEMs, have ingested event and log data for well over 25 years. In 2005, Gartner analysts Mark Nicolett and Amrit Williams defined the term SIEM to describe a unified system that combined the log management capabilities of SIM with the real-time reporting of SEM. But while log data provided some insights into cyber risks, it became obvious it wasn’t nearly enough.

Unfortunately, organizations that, for a variety of reasons, couldn't use the cloud have been stuck for almost two decades with products that operate on that very narrow definition of data. It was also very common for those organizations to throttle their ingestion and storage of such data, as the dominant vendors monetized their products in a very punishing way. Not only did, and do, those organizations operate on very few data sources; they also operate on insufficient quantities of data to make informed decisions.
What came next was a logical expansion: more data from more individual security products. With a few exceptions, every security company born after 2005 managed its own data “puddle” of telemetry it gathered from its coverage surface. With no consensus on how to operationalize that data across systems, several attempts were made to expand SIEM systems to capture and normalize the data and incorporate it into workflows. However, these were the same systems that had already severely throttled ingestion and access to data when it consisted solely of logs and events, so adding more data posed even greater challenges.
This, in turn, created an opportunity for cloud-based approaches that addressed some of the storage and monetization issues. However, the trade-off was the introduction of significant new concerns regarding privacy, compliance (including regulatory compliance), and supply chain risk.
So while this second definition of “data” was more expansive, it led to an architecture that effectively prevents an increasing number of organizations from adopting what I believe is the correct approach to “data” in cybersecurity.
What I mean by “data” is all data.
In other words:
- Data from logs and events;
- Data from any system in an organization’s infrastructure;
- Data from external environments such as SaaS applications;
- And, most importantly, end-user data, meaning all of an organization’s files from users, servers, and applications, whether structured or unstructured.
This shift has been building for some time. When ChatGPT was released, it exposed the scale at which systems would need to operate on data. That confirmed what I had already suspected: the existing cloud-based architecture would not scale for organizations operating under strict security and regulatory constraints, especially since most systems were never designed to operate on all of this data in the first place.
And, as it turns out, incorporating end-user data into cybersecurity operations provides extremely valuable information, through its lineage and associated timelines, that machine learning and other techniques can use to analyze attacks. Without it, you are operating without critical context. As we evolve our platform at Cylake, I will discuss this topic more deeply, but suffice it to say that we will include end-user data in our approach.
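To make the lineage-and-timeline idea concrete, here is a minimal illustrative sketch, not Cylake's implementation and with all names and the `spread_rate` feature invented for this example. It groups file events into per-file timelines and reduces each to a simple feature, how many hosts a file reached shortly after creation, of the kind a downstream model could consume:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class FileEvent:
    file_id: str  # stable identifier for the file (e.g., a content hash)
    host: str     # where the event occurred
    action: str   # e.g., "created", "copied", "emailed"
    ts: float     # epoch seconds

def lineage_timelines(events):
    """Group events by file and sort chronologically: each file's lineage timeline."""
    by_file = defaultdict(list)
    for ev in events:
        by_file[ev.file_id].append(ev)
    return {fid: sorted(evs, key=lambda e: e.ts) for fid, evs in by_file.items()}

def spread_rate(timeline, window=3600.0):
    """Hypothetical feature: distinct hosts touched within `window` seconds
    of the file's first recorded event."""
    start = timeline[0].ts
    return len({e.host for e in timeline if e.ts - start <= window})

events = [
    FileEvent("doc-1", "laptop-7", "created", 0.0),
    FileEvent("doc-1", "server-2", "copied", 120.0),
    FileEvent("doc-1", "server-9", "copied", 300.0),
    FileEvent("doc-2", "laptop-3", "created", 50.0),
]
timelines = lineage_timelines(events)
for fid, tl in timelines.items():
    print(fid, spread_rate(tl))  # doc-1 spread to 3 hosts; doc-2 stayed on 1
```

A file that fans out across many hosts within minutes looks very different from one that stays put, and that difference is only visible when end-user data, not just logs, is part of the picture.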

Of course, using end-user data introduces complex issues of scale. Ingesting, analyzing, retaining, and operating on these vast amounts of data requires a strong endpoint technology to collect the data, broad integration with existing telemetry, and then an equally strong, vertically integrated system to work on it. To serve the most regulated institutions, neither system can be cloud-based or cloud-dependent.
This is why I created Cylake. We are the only system that can operate on all the data we collect in a single "lake" in a fully sovereign way, without dependencies on cloud-based services, non-sovereign AI, or throttling of data.
My belief that cybersecurity has become a data challenge led me to build a complete system that can access data from anywhere, use all that data at arbitrary volumes in an AI-native way, and make the best operational decisions to protect our customers, all in a completely sovereign way.
The question is no longer whether this shift will happen, but how organizations will respond to it.
Stay tuned for more updates along the way.
— Nir