How to Evaluate Full Packet Capture Solutions

Selecting the right network packet capture solution is essential for any team tasked with ensuring that IT systems or networks are secure, operational, and performing at their best. 

Poor packet capture performance or reliability may jeopardize your ability to quickly recover from outages, validate policy settings (e.g. Zero Trust), resolve service degradations, and neutralize critical cyber security threats.

What's Covered in this Guide?

Selecting the right packet capture solution should include evaluating and testing the system’s suitability in your environment, with your real-world network traffic, and confirming that it meets your team’s needs. This article summarizes the important aspects to consider, evaluate, and validate when choosing a packet capture solution.

Performance

Your packet capture evidence is the source of truth for investigating any network incident. This evidence can’t be in question. It must be completely reliable, with no missed traffic or dropped packets, and it needs to be quickly searchable. A packet capture solution that can’t keep up with real-world traffic will leave you with more questions and doubts than answers:

  • Did my network drop the packet or was it my packet capture solution being overwhelmed?
  • When my network is under attack am I capturing all the threat traffic?
  • In times of crisis can I search and analyze packets rapidly to find what I need?

Accuracy and reliability are essential. If you can't trust your system to record every packet faithfully, it's hard to be confident that threats can be investigated fully. Lost packets mean important evidence is missing when you investigate a cyber security incident. Inaccurate recording can also cast doubt on your conclusions when troubleshooting network or application performance issues.

Datasheet Myths and Legends

Selecting a vendor based on datasheet specs alone may leave you high and dry. Not all vendors’ products live up to expectations when recording real-world traffic under real usage conditions. Some datasheets specify their capture rates with large packets, easily compressible traffic, simple L3 payloads and just a few flows or connections. When attempting to record high rates of real-world traffic, these systems might lose packets, leave gaps in recording, become overloaded, or make searching recorded traffic slow and difficult.

The performance of a full packet capture solution is more than just its sheer throughput in terms of recording packets to disk. Assessing performance should also include the following measures:

  • Maximum long-term continuous capture rate under real-world traffic conditions, sustained for days or weeks with a rich mix of applications, millions of connections, and different packet sizes, and assessed over the full extent of the storage media, from empty through to full.
  • The speed, and ease, of finding and retrieving specific packets from within petabytes of recorded packet data, while the system is continuously recording.
  • The solution’s accuracy with respect to timestamps, packet drops, and the effects on performance resulting from different traffic characteristics or usage conditions.
  • Whether the solution tells you when it’s overloaded and dropping packets.

Datasheets should go into depth and cover more than just Gb/s write-to-disk speed. They should also detail:

  • Maximum sustained and maximum burst write-to-disk speeds including burst length.
  • Maximum concurrent flows and maximum flows/second.
  • Maximum concurrent users hitting the system with searches.
  • Maximum scale of deployment for distributed deployments.
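A rough calculation shows why burst length matters alongside the sustained rate. The sketch below is illustrative only: the function name and the example figures are assumptions, not vendor specifications. It uses the simplest possible model, in which any traffic above the sustained write-to-disk rate has to be held in a buffer for the duration of the burst.

```python
def burst_buffer_gb(burst_gbps: float, sustained_gbps: float, burst_seconds: float) -> float:
    """Estimate the buffer (in gigabytes) needed to absorb a burst without drops.

    Simplified model: traffic arriving faster than the sustained write rate
    accumulates in a buffer until the burst ends.
    """
    excess_gbps = max(0.0, burst_gbps - sustained_gbps)
    return excess_gbps * burst_seconds / 8  # gigabits -> gigabytes


# Hypothetical figures: a 40 Gb/s burst lasting 10 s against a 25 Gb/s sustained write rate.
print(burst_buffer_gb(40, 25, 10))  # 18.75
```

If a datasheet quotes a burst rate without a burst length, this figure cannot be estimated, which is one reason to ask for both.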

Test Traffic Profiles

It’s very important to ensure the system is tested using traffic that represents the worst-case conditions observed on your network. This includes test traffic with short and varied length packets, many concurrent flows or connections, many flows starting and ending every second, and a mix of different protocols and applications.

All these traffic aspects will add load to the capture system and potentially impact the accuracy of packet capture and system reliability.
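To check whether test traffic resembles your own network, it can help to summarize a traffic sample along the dimensions listed above. The following is a minimal sketch under stated assumptions: the record layout, the function name, and the 128-byte threshold for a "small" packet are all hypothetical choices, and the records are assumed to be in time order.

```python
from collections import Counter


def summarize_traffic(packets):
    """Summarize a sample of (timestamp_seconds, flow_key, length_bytes) records.

    Records are assumed to be sorted by timestamp, so the first time a flow
    key appears is treated as the start of that flow.
    """
    first_seen = {}
    sizes = []
    for ts, flow, length in packets:
        sizes.append(length)
        if flow not in first_seen:
            first_seen[flow] = ts
    if not sizes:
        raise ValueError("no packets in sample")
    # Count how many flows started within each one-second window.
    new_flows_per_sec = Counter(int(ts) for ts in first_seen.values())
    return {
        "packets": len(sizes),
        "mean_packet_bytes": sum(sizes) / len(sizes),
        "small_packet_share": sum(1 for s in sizes if s <= 128) / len(sizes),
        "distinct_flows": len(first_seen),
        "peak_new_flows_per_sec": max(new_flows_per_sec.values()),
    }


# Tiny made-up sample: three flows, two of which start in the first second.
sample = [(0.10, "A", 64), (0.20, "B", 1500), (0.90, "A", 64), (1.05, "C", 512)]
print(summarize_traffic(sample))
```

Comparing these figures for your production traffic and for a vendor's test traffic gives a quick indication of whether the test is representative.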

Did you know that recording rates drop when the HDD records on sectors near the center of the disk platter? This means it’s important to test over as long a period of time as possible to ensure the disk system is fully utilized, and that capture performance is not degraded by the slower drive speeds encountered as the disks fill.

Reliability

Reliability is key for any capture system. This is a difficult aspect to measure when you are conducting a proof-of-concept trial, but it is a very important consideration. It’s critical that systems can run for many years without fault, reliably capturing packets and providing key evidence quickly when it’s needed.

Some questions to ask packet capture vendors:

  • Do you have large estates of deployed capture systems in mission critical networks?
  • Are you using the most reliable parts on the market — especially HDDs, which are the key component in system reliability?
  • Can the RAID system survive disk failures? Can HDDs be replaced without stopping capture?
  • Does the system have redundant hot swappable power supplies?
  • Can the solution be deployed in a way that allows for continuous capture with single chassis failure?
  • Do you support in-service software upgrades, redundancy, or high availability?
  • Do you publish MTBF and analyze the failure rates of parts?
  • When something does fail, what is the RMA process and dispatch timeliness?

Depth of Storage

Increasing the depth of available packet storage allows recorded packet data to be kept for longer. This provides increased “lookback” time for analysts, enabling them to go back further in time to analyze historical activity and accurately reconstruct network events before historical data is overwritten by new, incoming data.

This is particularly important for use cases such as investigating advanced persistent threats, which involve multiple attack vectors and may unfold over many days or weeks, or examining historical traffic for evidence that a zero-day vulnerability was exploited before patches or detection rules were in place.

Some questions to ask packet capture vendors: 

  • How many days of lookback will the system provide with your average traffic rate?
  • How does storage scale with your solution?
  • Does adding storage degrade search response?
  • What is the cost of adding storage?
  • Is there built-in redundancy as the fabric scales?
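The lookback question can be estimated with simple arithmetic before talking to a vendor. The sketch below assumes decimal units (1 TB = 10^12 bytes), no compression or storage overhead, and a constant average traffic rate. The function name and the example figures are illustrative assumptions only.

```python
def lookback_days(usable_storage_tb: float, avg_rate_gbps: float) -> float:
    """Approximate days of packet history that fit in the given storage.

    Assumes every captured bit is written to disk uncompressed, with no
    indexing or metadata overhead.
    """
    bytes_per_day = avg_rate_gbps * 1e9 / 8 * 86_400
    return usable_storage_tb * 1e12 / bytes_per_day


# Hypothetical example: 1 PB (1000 TB) of usable storage at an average of 10 Gb/s.
print(round(lookback_days(1000, 10), 1))  # 9.3
```

Real lookback will differ depending on compression, indexing overhead, and how far your peak rate departs from the average, so treat the result as a starting point for the vendor conversation.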

Search and Data Retrieval Speeds

If finding packets of interest takes hours (or longer) for any given incident response, the value of having recorded the packet data is greatly diminished. Fast and easy search increases analyst efficiency, minimizes delays and distractions, and allows for streamlined investigation workflows.

Some questions to ask packet capture vendors:  

  • How quickly can I search across weeks or months of recorded packet data? How quickly are results returned, and does the data returned require further processing to be useful?
  • Does the solution support multiple concurrent users running search and data retrieval tasks simultaneously without impacting on performance?

Fast search should also be well integrated with the other security and performance monitoring tools you own, whether from vendors such as Cisco, Palo Alto Networks, Splunk, Fortinet, and IBM or open-source tools such as Zeek, Suricata, and SNORT. SIEM, SOAR, NGFW, and other tools should integrate seamlessly with the packet recording fabric.

Be sure to ask:

  • Can I integrate search and data-mining into third-party tools? What plugins are available and published today?

Centralized, Federated Search

Having the ability to continuously record packet data across the enterprise is one thing. But for that data to be useful, teams must be able to quickly and easily search across it to find what they need when they need it.

The solution needs to support efficient searching and data-mining of packet data across the entire environment, preferably from a single point. Otherwise, the workflow for finding packets of interest pertaining to any specific incident will be long and tedious and increase the time-to-resolution.

For some deployments, it is important to be able to restrict access to specific data or to specific appliances. For example, organizations may need to restrict some data to certain teams (by geographical region, security level, or organizational division).

Some questions to ask packet capture vendors:  

  • Does your packet capture solution support “network-wide” search from a single application? Or do users need to log into each individual packet capture appliance to search?
  • Can users perform a federated search and retrieve packets in a single merged trace file? Via GUI and API?
  • Do search response times increase with the number of packet capture devices in my network?
  • Can you control access to specific appliances with specific data – e.g. restrict access to only team members with the appropriate level of security clearance?
  • Does the centralized search and retrieval capability integrate with other tools such as SIEM or SOAR?
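To make the idea of a single merged trace concrete, the sketch below combines time-ordered search results from several appliances into one time-ordered list. It is not any vendor's API: the record layout, the appliance names, and the placeholder packet bytes are all hypothetical, and a real system would write the result to a trace file rather than return a list.

```python
import heapq


def merge_results(*per_appliance_results):
    """Merge per-appliance result lists into one list ordered by timestamp.

    Each input is a list of (timestamp, appliance_name, packet_bytes) tuples
    that is already sorted by timestamp.
    """
    return list(heapq.merge(*per_appliance_results, key=lambda rec: rec[0]))


# Hypothetical results from two capture points.
site_a = [(1.0, "probe-a", b"pkt1"), (3.0, "probe-a", b"pkt3")]
site_b = [(2.0, "probe-b", b"pkt2")]

merged = merge_results(site_a, site_b)
print([(ts, name) for ts, name, _ in merged])
```

Merging by timestamp only gives a correct sequence if the appliances' clocks are synchronized, which is one reason timestamp accuracy matters in distributed deployments.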

Scaling to Support Increased Active Users

In our experience, as the value of having access to recorded packet data is increasingly recognized across the organization, demand for access to that data increases – whether the need for access comes from more members of SecOps and NetOps teams, or the desire to share infrastructure across multiple teams. As a result, invariably, solutions need to support access from an increasing number of users over time.

The ability to deploy additional instances of InvestigationManager at zero cost enables customers to scale on demand to support additional users without impacting performance.

Some questions to ask packet capture vendors:  

  • Are upgrades required to support additional users of the solution?
  • What is the cost of adding additional users?
  • Does adding additional users affect performance of the solution?

Centralized Management

As appliance estates increase in size, it becomes increasingly important to have centralized management to monitor the health and performance of packet capture appliances in the fabric, and to apply patches, updates and configuration changes easily. This reduces the administration overhead of managing estates — reducing cost and increasing efficiency.

Separating the management plane (administration) and user plane (search and data-mining) makes it very easy to perform these functions across different teams and ensure administration access to the relevant functions is restricted just to the teams or individuals that need it.

Packet capture data is sensitive information that should be restricted to only those cleared to access and analyze packet captures. Centralized management integrated with AAA systems ensures that access is controlled and restricted appropriately.

Some questions to ask packet capture solution providers:  

  • What tools do you provide to enable administrators to centrally manage large estates with multiple appliances?
  • If there is a management component, does it carry a licensing cost?
  • What are the deployment requirements for deploying central management?
  • Are there different RBAC roles that I can use to separate admins from various levels of users?
  • Can I integrate with my AAA systems, logging systems, and other IT systems?

Flexibility of Deployment

Organizations that have migrated all or part of their infrastructure to cloud providers have often sacrificed visibility for expected cost savings and flexibility.

To provide the same benefits as packet capture in physical environments, cloud-based packet capture solutions need to support features comparable to those of their physical counterparts.

Some questions to ask packet capture solution providers:

  • Do you support cloud-native deployments?
  • Can you search across, and retrieve data from, hybrid cloud environments seamlessly?
  • Can you scale in all environments simply, without “forklift” upgrades or complex reconfiguration?

Integrating with Third-Party Tools

When organizations purchase packet capture solutions, they are typically adding packet capture to enhance troubleshooting or investigation workflows. Most large organizations have multiple monitoring and detection tools and SIEM or SOAR solutions already deployed, often from multiple vendors, such as Splunk Enterprise Security or Splunk SOAR, Cisco Firepower or Stealthwatch, IBM QRadar, Palo Alto Networks Panorama, NG Firewalls, or XSOAR, Micro Focus ArcSight, Fortinet FortiSIEM, Darktrace Enterprise Immune System and many others. Open-source tools such as SNORT, Zeek, Suricata, and Wireshark may also be deployed.

Packet capture, when incorporated into an infrastructure, is typically not used for proactive monitoring where an analyst has “eyes-on-glass.” The proactive monitoring space is well served by solutions such as those mentioned above. Packet capture should complement these monitoring tools by seamlessly integrating into existing workflows to provide accurate network history at the packet level that enables analysts to solve issues/incidents faster and more confidently.

Some questions to ask packet capture solution providers:

  • What integration capability do you provide to enable packet data to be accessed from external applications?
  • How easy is it to enable that integration?
  • Does the integration support automated access to recorded packet data?
  • What plugins are available and published today? Are they officially recognized and published by the third-party tool vendors (e.g. is there a Splunk app officially available on Splunkbase)?

Total Cost of Ownership (TCO)

Like all technology purchases, packet capture solutions are not a one-off capital expense, and ongoing costs must be considered in any buying decision. In addition to ensuring that internal administration costs are minimized (as discussed above under “Centralized Management”), there can be a range of ongoing vendor costs that need to be included in any calculation of TCO.

Many vendors add annual licensing costs for various components of their solutions that can substantially increase the TCO when the total lifespan of the solution is considered.

Some questions to ask packet capture solution providers:

  • What ongoing support, maintenance and licensing costs are there beyond the initial purchase cost?
  • What is the expected (and supported) lifetime of your product? Do you have an official End of Life policy?
  • Is your solution upgradeable? What are the upgrade costs? What is the support cost associated with those upgrades?
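The answers to these questions can be combined into a simple lifetime cost comparison. The sketch below is a deliberately simplified model: it ignores discounting, inflation, and internal staffing costs. The function name, parameters, and all figures are hypothetical.

```python
def total_cost_of_ownership(purchase: float, annual_support: float,
                            annual_licensing: float, years: int,
                            upgrades: float = 0.0) -> float:
    """Sum the initial purchase, recurring vendor costs, and upgrade costs.

    Simplified model: recurring costs are assumed constant for each year
    of the solution's lifetime.
    """
    return purchase + years * (annual_support + annual_licensing) + upgrades


# Hypothetical figures for a five-year lifetime.
print(total_cost_of_ownership(200_000, 30_000, 20_000, years=5, upgrades=25_000))  # 475000
```

In this made-up example, recurring costs exceed the initial purchase price over five years, which illustrates why the expected lifetime of the product belongs in the calculation.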

Evaluating Full Packet Capture Solutions

We hope this article provided a useful guide to the important things to consider when selecting and deploying full packet capture solutions.

This article is also available as a PDF guide with checklist. Download your free copy below.

Download PDF Guide

Who is Endace?

Endace specializes in scalable, high-speed, high-performance packet capture. Our solutions are used by some of the world’s biggest organizations on some of the fastest networks on the planet.

If you are looking for a packet capture solution, we’d love to show you why Endace is the best choice. Contact us to book a demo or ask a question.
