
Network packet capture - the Inconvenient Truth

There's a problem with high-speed packet capture that
no-one's talking about ...

Packet capture’s 'Inconvenient Truth'

A range of different Enterprise systems rely on captured packet data to fuel them. Most network monitoring, Intrusion Detection (IDS), Intrusion Prevention (IPS) and Service Quality Measurement tools rely in some way on captured network traffic data. To be of value, it's critical that these systems have visibility of every single packet on the wire; a partial view of traffic, in many circumstances, is worse than no visibility at all.

And that's where the problem lies ... current packet capture engines are just not capturing everything.

Traditional hardware can't keep pace

Most of the security and monitoring systems that depend on captured packet data use software-centric techniques, with standard network interface cards (NICs) capturing packets off the wire. This technique is fine at relatively low speeds (sub 250Kbps). However, as line speeds increase, things start to go wrong: packets are dropped, and the systems that depend on the captured packets cannot trust what they're seeing - or rather not seeing.

So what's causing this problem, and why is it such a big issue?

Interrupt Storms - the I/O problem

Software-based systems rely on NICs to capture traffic off the wire. But standard NICs are not purpose-built for high-speed packet capture - they're designed to send and receive packets at moderate rates sufficient for the services or clients running on a single computer.

Every packet that is captured by a NIC requires a series of CPU interrupts to inform the CPU of the arrival and availability of the packet. The maximum speed at which traffic can be captured reliably is therefore directly related to the processor's clock speed. The faster the processor, the greater the number of interrupts it can handle in a given time period and the more packets can be captured reliably.

Put simply, software-based packet capture systems that rely on NIC hardware are fine so long as the processor can service interrupts faster than the NIC generates them. If the processor isn't fast enough, however - which happens quickly as network loads and rates increase - packets are dropped as the CPU becomes inundated by an interrupt storm.
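
As a rough back-of-the-envelope sketch of this relationship, the snippet below estimates how many clock cycles a processor has available per packet at a given line rate. It assumes one interrupt per packet (as described above), standard Ethernet wire overhead of 20 bytes per frame (8-byte preamble plus 12-byte inter-frame gap), and a hypothetical 3.0GHz core; the function names are illustrative only.

```python
# Per-frame overhead on the wire: 8-byte preamble + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps, frame_bytes):
    """Maximum frames per second a link can deliver at the given frame size."""
    return line_rate_bps / ((frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


def cycles_per_packet(clock_hz, line_rate_bps, frame_bytes):
    """Clock cycles available to handle each packet if every packet interrupts the CPU."""
    return clock_hz / packets_per_second(line_rate_bps, frame_bytes)


if __name__ == "__main__":
    clock_hz = 3.0e9      # assumed clock speed, for illustration
    line_rate_bps = 1e9   # Gigabit Ethernet
    for frame_bytes in (1518, 64):  # maximum and minimum standard frame sizes
        pps = packets_per_second(line_rate_bps, frame_bytes)
        budget = cycles_per_packet(clock_hz, line_rate_bps, frame_bytes)
        print(f"{frame_bytes:>4}-byte frames: {pps:>12,.0f} packets/s, "
              f"{budget:>8,.0f} cycles per packet")
```

Under these assumptions, a saturated Gigabit link carrying minimum-size 64-byte frames delivers roughly 1.49 million packets per second, which leaves about 2,000 clock cycles to service each one, including all interrupt and context-switch overhead.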

The rise of multi-core

Everything was fine while processor clock speeds kept getting faster. But in the early 2000s microprocessor manufacturers (like Intel and AMD) stopped increasing the clock speed of processors and moved to multi-core CPUs. This approach gave processors much more processing power, but left absolute clock speeds stalled at a maximum rate below 4.0GHz. This put a cap on the maximum capture rate these CPUs could support with traditional interrupt-driven packet capture.

Our research suggests software-based systems, even with NAPI or PF_RING technology and the fastest multi-core processors, can only be relied on to capture 100% of packets up to around 500Mb/s before they start to drop packets.

Environmental factors

At the same time, a number of other environmental factors - for example regulatory and compliance requirements such as PCI DSS and Sarbanes-Oxley - have required organisations to rethink what they capture, how they capture it and what they do with it once they have captured it.

Accurately capturing all network data is increasingly becoming a mandatory requirement as well as a prerequisite for effectively monitoring, analysing and securing the network.

The changing nature of traffic

Over the last ten years the profile of the packet traffic being carried has changed considerably, which has had a significant knock-on effect on the interrupt issue:

  • Networks started carrying a greater load as organisations made increasing use of the Internet to empower their customers and their staff. Established networking wisdom dictates that IP networks become more inefficient and more unpredictable as they become more heavily loaded - which in turn makes them more difficult to manage, monitor and secure.
  • Networks started getting faster. The first Gigabit Ethernet networks started to appear in the early 2000s and have become increasingly ubiquitous, rapidly followed by 10Gb and even 40Gb networks. Even under the lightest of loads, these networks easily run in excess of 500Mb/s.
  • Network traffic has changed in size, composition and time sensitivity. IM, VoIP and video traffic, for example, have introduced new latency-sensitive protocols into the network mix. And as packet sizes have been reduced in an attempt to minimise latency, packet numbers have soared. This in turn has aggravated the I/O problem, resulting in NIC-based capture engines experiencing packet loss at even lower network speeds and loads.
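
The effect of shrinking packet sizes can be illustrated with a small calculation. The sketch below holds throughput constant at the 500Mb/s figure quoted earlier and shows how the packet rate (and so the interrupt rate) climbs as the average packet gets smaller. The average packet sizes are hypothetical values chosen for illustration, not measurements, and wire overhead is ignored for simplicity.

```python
def packet_rate(throughput_bps, avg_packet_bytes):
    """Packets per second needed to carry a given throughput at an average packet size."""
    return throughput_bps / (avg_packet_bytes * 8)


THROUGHPUT_BPS = 500e6  # 500Mb/s, the figure quoted in the text

# Hypothetical average packet sizes in bytes, from full-size frames down to
# small latency-sensitive packets (illustrative values only).
AVG_PACKET_SIZES = (1518, 512, 200, 64)


def rate_table(throughput_bps=THROUGHPUT_BPS, sizes=AVG_PACKET_SIZES):
    """Return (average size, packets per second) pairs for each size."""
    return [(size, packet_rate(throughput_bps, size)) for size in sizes]


if __name__ == "__main__":
    rows = rate_table()
    baseline = rows[0][1]  # rate with full-size 1518-byte frames
    for size, pps in rows:
        print(f"{size:>4} bytes avg: {pps:>10,.0f} packets/s "
              f"({pps / baseline:.1f}x the full-size rate)")
```

The same bit rate carried in 200-byte packets rather than 1,518-byte frames requires between seven and eight times as many packets, each of which has to be handled by the capture system.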

So how serious is the issue really?

Our research (using our Traffic Replay capability) suggests that many of the packet-based security, monitoring and measurement systems deployed across the globe today could be ‘missing’ anywhere between 25% and 40% of the traffic on their networks (depending on load and conditions). And often neither organisations nor their vendors even know it is happening.

So, what's the effect of this? Let's look at an example.

The Conficker worm is 57KB in size. Based on the standard Ethernet frame size (1,518 bytes), Conficker is carried in approximately 37 packets, and any one of those could be the signature packet your IDS needs to see to trigger an alert.

On a link operating at 0.5Gb/s with even just a 0.00002% packet loss you could be dropping the very packet that contains the vital signature. So forget "5 nines" guarantees about the reliability of your traffic capture engine - nothing less than 100% capture is really safe.
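
To put those numbers in context, the sketch below works through the arithmetic. It follows the simplification used above (dividing payload size by the full 1,518-byte frame size, ignoring header overhead), treats 57KB as 57,000 bytes, and assumes the link is fully loaded with full-size frames; the function names are illustrative only.

```python
import math


def frames_needed(payload_bytes, frame_bytes=1518):
    """Number of frames needed to carry a payload, ignoring header overhead."""
    return math.ceil(payload_bytes / frame_bytes)


def expected_drops(line_rate_bps, frame_bytes, loss_fraction, seconds):
    """Expected number of dropped packets over a period on a fully loaded link."""
    packets_per_second = line_rate_bps / (frame_bytes * 8)
    return packets_per_second * loss_fraction * seconds


if __name__ == "__main__":
    loss_fraction = 0.00002 / 100  # 0.00002% expressed as a fraction
    seconds_per_day = 24 * 60 * 60

    worm_frames = frames_needed(57_000)
    drops_per_day = expected_drops(0.5e9, 1518, loss_fraction, seconds_per_day)

    print(f"Frames needed to carry a 57KB payload: {worm_frames}")
    print(f"Expected dropped packets per day at 0.5Gb/s: {drops_per_day:,.0f}")
```

Under these assumptions the payload spans about 38 frames (consistent with the approximate figure above), and even a 0.00002% loss rate still amounts to roughly 700 dropped packets every day, any one of which could be the packet that mattered.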

Where does this leave us?

The upshot is that a new 'next generation' approach to packet capture and analysis is required to support organisations that run fast, complex networks and have an absolute need to capture 100% of network traffic. At Endace, we've spent the last 10 years perfecting just such an architecture.