Collector Requirements | InsightIDR Documentation

Collector Requirements

The machine with the Collector software installed acts as a server.

Before you install a Collector, keep in mind that this machine's intended use is collecting data for the Insight Platform, and it should not be used for any other purpose.

To set up a Collector, meet the following requirements first. If they are not met before you set up a Collector, it may not operate properly. Read the following sections to understand why each requirement matters and to determine whether deploying a Collector is right for your organization. If you already have Nexpose or InsightVM installed on your system, do not install the Insight Collector software on an existing Nexpose Console or Nexpose Scan Engine, as this will cause issues with your Nexpose systems.

System and Host Requirements

You can install a Collector on a network server or virtual machine that meets the following requirements.

Minimum Hardware

  • 4 CPU cores at 2 GHz or faster
  • 8 GB RAM (recommended)
  • 60 GB or more of available disk space
  • Configured with a Fully Qualified Domain Name (FQDN) such as idrcollector23.myorg.com
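To spot-check a candidate host against these minimums before installing, a small script can compare CPU count, free disk space, and the host's FQDN. The sketch below is an illustration under stated assumptions, not a Rapid7 tool: the function names are hypothetical, `os.cpu_count()` reports logical CPUs (so it can overcount physical cores), the FQDN test is only a heuristic that looks for a dotted name, and RAM and clock speed are not checked because the Python standard library has no portable way to read them.

```python
import os
import shutil
import socket

# Thresholds taken from the Minimum Hardware list.
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 60


def hardware_problems(cpu_cores: int, free_disk_gb: float, fqdn: str) -> list[str]:
    """Return human-readable problems; an empty list means these checks pass."""
    problems = []
    if cpu_cores < MIN_CPU_CORES:
        problems.append(f"only {cpu_cores} CPU cores (need {MIN_CPU_CORES})")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_disk_gb:.0f} GB free (need {MIN_FREE_DISK_GB}+)")
    # Heuristic only: an FQDN such as idrcollector23.myorg.com has a host label plus a domain.
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        problems.append(f"{fqdn!r} does not look like a fully qualified domain name")
    return problems


def local_host_facts(path: str = "/") -> tuple[int, float, str]:
    """Gather CPU count, free disk (GB) at `path`, and FQDN from the machine running this script."""
    cores = os.cpu_count() or 0  # logical CPUs; None is mapped to 0
    free_gb = shutil.disk_usage(path).free / 1024**3
    return cores, free_gb, socket.getfqdn()
```

For example, `print(hardware_problems(*local_host_facts()))` prints an empty list when the host passes; on Windows, pass a drive such as `"C:\\"` to `local_host_facts`.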

Deploying the collector on ARM architecture, such as AWS Graviton, is not currently supported.

Read more about Collector Placement and Sizing.

Disk Space

In some situations, a Collector cannot establish a connection with the cloud and becomes unable to send data to the Insight Platform. Disk space on the Collector lets it “hold on” to that data by writing logs to disk until the connection is reestablished. The more disk space available, the longer your Collector can hold data without a connection.

Because the Platform compresses the data it receives, Rapid7 recommends 1 GB of disk space for every 10 GB of data handled by the Collector. Additionally, plan for at least 24 hours of “spillover” disk space on each Collector for times when data cannot reach the cloud.
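As a worked example of the 1 GB per 10 GB guideline, a Collector that handles 200 GB of data per day needs roughly 20 GB of disk for a 24-hour spillover window. The hypothetical helper below does the same arithmetic; it assumes you know (or can estimate) the Collector's daily data volume and that the 1:10 ratio applies to spilled-over data.

```python
def spillover_disk_gb(daily_ingest_gb: float, hours: float = 24, data_per_disk_gb: float = 10) -> float:
    """Disk (GB) needed to buffer `hours` of data at 1 GB of disk per `data_per_disk_gb` GB of data."""
    if hours < 24:
        raise ValueError("plan for at least 24 hours of spillover")
    buffered_data_gb = daily_ingest_gb * hours / 24  # data arriving during the outage window
    return buffered_data_gb / data_per_disk_gb
```

Doubling the window doubles the requirement: `spillover_disk_gb(200, hours=48)` is 40 GB.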

Supported Operating Systems

64-bit versions of the following platforms are supported:

  • Ubuntu 10.04 – 20.04
  • Debian 7.0 – 8.2
  • CentOS 5.2 – 7.3
  • Oracle Enterprise Linux (OEL) 5.2 – 7.3
  • Fedora 17 – 25
  • SUSE Linux Enterprise Server (SLES) 11 – 12
  • SUSE Linux Enterprise Desktop (SLED) 11 – 12
  • openSUSE Leap 42.1 – 42.2
  • Amazon Linux
  • Red Hat Enterprise Linux (RHEL) 5.2 – 7.3
  • Microsoft Windows Server 2019
  • Microsoft Windows Server 2016
  • Microsoft Windows Server 2012 R2
  • Microsoft Windows Server 2008 R2
  • Windows 7 and newer
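Because only 64-bit platforms are supported and ARM (such as AWS Graviton) is not, it can help to confirm the host's architecture before installing. A minimal sketch using Python's `platform.machine()`, assuming the usual strings it returns (`x86_64` on Linux, `AMD64` on 64-bit Windows, `aarch64` or `arm64` on ARM); the helper name is hypothetical.

```python
import platform

# Machine strings that identify 64-bit x86; compared case-insensitively.
SUPPORTED_MACHINES = {"x86_64", "amd64"}


def is_supported_arch(machine: str) -> bool:
    """True for 64-bit x86; False for ARM (e.g. 'aarch64') or 32-bit x86 (e.g. 'i686')."""
    return machine.lower() in SUPPORTED_MACHINES
```

Running `print(is_supported_arch(platform.machine()))` on the candidate host prints `True` for a 64-bit x86 machine.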

On Windows systems, the Collector must be able to launch the PowerShell process locally to auto-configure event sources.

Supported Browsers

  • Mozilla Firefox (latest stable release)
  • Google Chrome (latest stable release)

Minimum Network Bandwidth

  • 100 Mbps network (required)
  • 1,000 Mbps network (preferred)

Other Recommendations

  • Only one Collector can be installed on each machine on your network. Rapid7 strongly recommends that the machine (physical or virtual) be dedicated to running the Collector.

Warning!

If you already have Nexpose installed in your organization, do not install the Insight Collector software on an existing Nexpose Console or Nexpose Scan Engine, as this will cause issues with your Nexpose systems.

Networking Requirements

As you prepare your network for the Collector, consider the following areas:

SSL Decryption Exclusion

The Collector, as well as agents that use Collectors as a proxy to the Insight Platform, will not work if your organization decrypts SSL traffic via Deep Packet Inspection technologies like transparent proxies.
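One way to tell whether something on the path is decrypting TLS is to look at who issued the certificate the Collector host actually receives: if the issuer is your proxy or inspection appliance rather than a public certificate authority, the traffic is being intercepted. The sketch below uses Python's standard `ssl` module; the helper names are hypothetical, and it assumes the host you test (for example, one of the data endpoints listed under Firewall Rules) is reachable on port 443. If the inspecting proxy's CA is not in the system trust store, the connection fails with `ssl.SSLCertVerificationError` instead, which is itself a sign of interception or another TLS problem.

```python
import socket
import ssl


def issuer_fields(cert: dict) -> dict:
    """Flatten the 'issuer' of a cert dict from SSLSocket.getpeercert() into {field: value}."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def fetch_issuer(host: str, port: int = 443, timeout: float = 10.0) -> dict:
    """Open a verified TLS connection and return the issuer of the certificate presented."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_fields(tls.getpeercert())
```

For example, `print(fetch_issuer("data.insight.rapid7.com"))` run on the Collector host prints the issuer's fields, such as `organizationName`.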

Internal Routing Rules

The Collector polls and receives data from event sources, so you should provide the directory or file location where the Collector can access server logs. You can specify a local folder path or a Windows Universal Naming Convention (UNC) path to a hosted network drive.

Ports

All Collectors must be able to reach out on port 443, and the Insight Agent and Endpoint Monitor must be able to communicate back to the Collector on the following TCP ports:

  • TCP 5508: Communication back to the Collector from the Insight Agent or Endpoint Monitor.
  • TCP 6608: Upgrade agent data path for the Insight Agent.
  • TCP 8037: File upload for the Insight Agent.
  • TCP 20,000 – 30,000: Communication back to the Collector from the Endpoint Monitor.

For Linux collectors, you must use ports higher than 1024.
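To verify the port requirements, you can try opening TCP connections to the Collector from a machine where an Insight Agent or Endpoint Monitor will run. This is a hedged sketch, not a Rapid7 utility: the function names are hypothetical, it only checks the single ports (probing all of 20,000 – 30,000 would be slow), and a failed connection can mean either a firewall block or that nothing is listening on that port yet. It also includes a check for the Linux rule that ports must be above 1024.

```python
import socket

# Single TCP ports from the port list; the 20,000 - 30,000 range is not probed here.
COLLECTOR_PORTS = [5508, 6608, 8037]


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_collector_ports(host: str, ports: list[int] = COLLECTOR_PORTS) -> dict[int, bool]:
    """Map each port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in ports}


def valid_linux_port(port: int) -> bool:
    """Linux Collectors must use ports above 1024 (ports up to 1024 are privileged)."""
    return 1024 < port <= 65535
```

For example, `check_collector_ports("idrcollector23.myorg.com")` maps 5508, 6608, and 8037 to `True` when all three are reachable.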

See Ports Used by InsightIDR for more information.

IP Ranges

Overlapping endpoint monitoring ranges are not allowed: IP addresses or IP ranges defined on Collector A should not be duplicated on Collector B. If such duplicates exist, update them before the migration; otherwise, you will have to update those ranges manually after the migration.
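Duplicated ranges are easier to find with a script than by eye, especially across many Collectors. A minimal sketch using Python's `ipaddress` module, assuming you have exported each Collector's endpoint monitoring ranges in CIDR notation (single addresses such as `192.168.1.10` also work); dash-style ranges would need converting first, and the Collector names are placeholders.

```python
import ipaddress
from itertools import combinations


def find_overlaps(ranges_by_collector: dict[str, list[str]]) -> list[tuple[str, str, str, str]]:
    """Return (collector_a, range_a, collector_b, range_b) for each overlapping pair across Collectors."""
    entries = [
        (name, cidr, ipaddress.ip_network(cidr, strict=False))
        for name, cidrs in ranges_by_collector.items()
        for cidr in cidrs
    ]
    overlaps = []
    for (a, range_a, net_a), (b, range_b, net_b) in combinations(entries, 2):
        # Only compare ranges on different Collectors and of the same IP version.
        if a != b and net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append((a, range_a, b, range_b))
    return overlaps
```

Any tuple returned identifies a range to remove from one of the two Collectors.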

See IP Addresses for more information.

Credentials

Each Collector can only support one set of endpoint monitoring credentials. Make sure you configure credentials for each Collector instance on your network.

Firewall Rules

Disable the local firewall on the collector host, if possible. See Firewall Rules for specific instructions.

If you cannot disable the local firewall, follow the configurations below.

All Collectors must be able to establish outbound connectivity on port 443 to *.endpoint.ingress.rapid7.com and communicate with the domains shown in the Data and Storage (S3) columns of the following table according to your geographic region. For example, for InsightIDR subscribers that elect to store their data in Australia, Collectors must be able to communicate with the following endpoints using port 443:

  • *.endpoint.ingress.rapid7.com

  • au.data.insight.rapid7.com

  • s3-ap-southeast-2.amazonaws.com

Region: data endpoint, storage (S3) endpoint

  • United States – 1: data.insight.rapid7.com, s3.amazonaws.com
  • United States – 2: us2.data.insight.rapid7.com, s3.us-east-2.amazonaws.com
  • United States – 3: us3.data.insight.rapid7.com, s3.us-west-2.amazonaws.com
  • Canada: ca.data.insight.rapid7.com, s3.ca-central-1.amazonaws.com
  • Europe: eu.data.insight.rapid7.com, s3.eu-central-1.amazonaws.com
  • Japan: ap.data.insight.rapid7.com, s3-ap-northeast-1.amazonaws.com
  • Australia: au.data.insight.rapid7.com, s3-ap-southeast-2.amazonaws.com
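Since the required data and storage hosts depend on the region, the outbound allowlist for a given Collector can be generated from the region table. The region keys (`us1`, `jp`, and so on) are hypothetical labels chosen for this sketch; the hostnames are copied from the table.

```python
# (data endpoint, storage endpoint) per region, copied from the region table.
REGION_ENDPOINTS = {
    "us1": ("data.insight.rapid7.com", "s3.amazonaws.com"),
    "us2": ("us2.data.insight.rapid7.com", "s3.us-east-2.amazonaws.com"),
    "us3": ("us3.data.insight.rapid7.com", "s3.us-west-2.amazonaws.com"),
    "ca": ("ca.data.insight.rapid7.com", "s3.ca-central-1.amazonaws.com"),
    "eu": ("eu.data.insight.rapid7.com", "s3.eu-central-1.amazonaws.com"),
    "jp": ("ap.data.insight.rapid7.com", "s3-ap-northeast-1.amazonaws.com"),
    "au": ("au.data.insight.rapid7.com", "s3-ap-southeast-2.amazonaws.com"),
}
ALWAYS_REQUIRED = "*.endpoint.ingress.rapid7.com"  # needed in every region


def outbound_allowlist(region: str) -> list[str]:
    """Host:port entries a Collector in `region` must reach outbound."""
    data, storage = REGION_ENDPOINTS[region]
    return [f"{host}:443" for host in (ALWAYS_REQUIRED, data, storage)]
```

For the Australia example earlier in this section, `outbound_allowlist("au")` returns the three hosts listed there, each on port 443.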

If you intend to deploy token-based Insight Agents through your Collectors, you also need to allow outbound connectivity from each Collector on port 443 to the endpoint that provides the agent’s configuration files. As with the Data and Storage endpoints in the previous table, you can configure your firewall rules to allow your Collectors to connect to a region-specific Deployment endpoint:

Region: deployment endpoint

  • United States – 1: us.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • United States – 2: us2.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • United States – 3: us3.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • Canada: ca.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • Europe: eu.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • Japan: ap.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
  • Australia: au.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files
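Note that the United States – 1 deployment endpoint uses the `us.` prefix even though the matching data endpoint has no prefix. Most host-based firewall rules match only the hostname, not the `/api/v1/get_agent_files` path, so the sketch below builds the full URL for a region and then extracts the hostname to allow. The region keys are hypothetical labels, and the `https://` scheme is an assumption based on the port 443 requirement.

```python
from urllib.parse import urlsplit

# Deployment endpoint hostname prefix per region, from the deployment table.
DEPLOYMENT_PREFIX = {
    "us1": "us", "us2": "us2", "us3": "us3",
    "ca": "ca", "eu": "eu", "jp": "ap", "au": "au",
}


def deployment_url(region: str) -> str:
    """Full agent-files URL for a region (scheme assumed to be https on port 443)."""
    prefix = DEPLOYMENT_PREFIX[region]
    return f"https://{prefix}.deployment.endpoint.ingress.rapid7.com/api/v1/get_agent_files"


def firewall_host(url: str) -> str:
    """Hostname portion of a URL, which is what host-based firewall rules usually match."""
    return urlsplit(url).hostname
```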

Data Collection Requirements

To plan your Collector deployment, have the following information available for each server or virtual machine where you will install the Collector:

  • Display name
  • Network location
  • Server host name and IP address
  • Administrator rights to install a service on the server

Endpoint Data Requirements

Collecting endpoint data also uses resources on the Collector. Endpoint data can be collected either by having the Collector scan a range of endpoints or by installing the Rapid7 Insight Agent on the endpoints; both methods consume Collector resources.

The more endpoints the Collector collects data from, the more resources it needs. If CPU utilization on the Collector already hovers consistently at 40% or higher, consider standing up another Collector at that location or adding more CPUs before adding more endpoint ranges to scan or more agents.

A Collector can support at most 600 endpoints or agents per CPU core. For example, a Collector with 4 CPU cores can handle up to 2,400 endpoints or agents, provided its CPU is not already heavily used by the event sources that have been added.
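The per-core limit is simple arithmetic in both directions: the maximum number of endpoints or agents for a given core count, and the cores needed for a target endpoint count. A minimal sketch with hypothetical helper names; it applies only the 600-per-core ceiling and the 4-core hardware minimum, and does not account for CPU already consumed by event sources.

```python
import math

ENDPOINTS_PER_CORE = 600  # maximum endpoints or agents per CPU core
MIN_CORES = 4             # from the Minimum Hardware list


def max_endpoints(cpu_cores: int) -> int:
    """Upper bound on endpoints/agents for a Collector with `cpu_cores` cores."""
    return cpu_cores * ENDPOINTS_PER_CORE


def cores_needed(endpoints: int) -> int:
    """Cores required for `endpoints`, never below the 4-core minimum."""
    return max(MIN_CORES, math.ceil(endpoints / ENDPOINTS_PER_CORE))
```

For instance, 2,401 endpoints already require a fifth core.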

The number of event sources and the number of endpoints you collect data from determine how much RAM and how many CPUs the Collector needs: the more event sources and endpoints, the more RAM and CPU the Collector requires. The Collector's free disk space is used only for data-collection spillover. Under normal circumstances, the Collector immediately sends all collected data to the cloud for processing.

However, if the Collector loses connectivity to the cloud or is otherwise operating abnormally, it stores collected data in a spillover folder on its hard drive. The more free disk space you give the Collector, the more spillover space it has available. Note that it is often more efficient to deploy multiple Collectors throughout the environment than to break firewall rules or overload a single Collector.

Also, when scanning endpoints, each Collector can be configured with only one set of endpoint scanning credentials. If scanning your endpoints requires different credentials, use a separate Collector for each set of credentials.

Collector Placement and Sizing

When considering where to place your Collectors, keep in mind that your bandwidth and network architecture will influence the number of Collectors that you need in your organization and where you should place them. Generally, you should deploy the Collectors close to the logs that will be pulled or sent and close to the endpoints that they will be scanning.

Rapid7 recommends a maximum of 80 event sources for each Collector, depending on the following:

  • The size of the event sources being added
  • The amount of CPU and memory available to the Collector
  • The amount of VM resources available to the Collector
  • The amount of disk space available to the Collector

Tip: Keep up to 50-60 event sources per Collector and distribute event sources over multiple Collectors

The capacity of a Collector depends on multiple factors. While the recommended maximum is 80 event sources per Collector, keeping 50-60 event sources per Collector helps prevent data collection issues.

Distributing event sources over multiple collectors is always a good practice.
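Applying the tip in practice involves two steps: working out how many Collectors you need, and then spreading the sources so that chatty ones do not all land on the same Collector. The sketch below assumes the 60-source upper end of the tip, uses a greedy largest-first assignment (a common load-balancing heuristic, not a Rapid7 algorithm), and takes hypothetical per-source volume estimates as input. Balancing by volume can leave some Collectors with more sources than others, so check the resulting counts against the 50-60 guideline.

```python
import heapq
import math

MAX_PER_COLLECTOR = 80     # recommended maximum event sources per Collector
TARGET_PER_COLLECTOR = 60  # upper end of the 50-60 tip


def collectors_needed(event_sources: int, per_collector: int = TARGET_PER_COLLECTOR) -> int:
    """Smallest number of Collectors that keeps each at or below `per_collector` sources."""
    if not 0 < per_collector <= MAX_PER_COLLECTOR:
        raise ValueError(f"per_collector must be between 1 and {MAX_PER_COLLECTOR}")
    return max(1, math.ceil(event_sources / per_collector))


def assign_sources(volumes: dict[str, float], collectors: int) -> list[list[str]]:
    """Greedy balancing: place the largest sources first, each on the least-loaded Collector.

    `volumes` maps an event source name to an estimated volume (for example,
    events per second); the units only need to be consistent.
    """
    heap = [(0.0, i) for i in range(collectors)]  # (current load, Collector index)
    buckets: list[list[str]] = [[] for _ in range(collectors)]
    for name, volume in sorted(volumes.items(), key=lambda kv: kv[1], reverse=True):
        load, i = heapq.heappop(heap)
        buckets[i].append(name)
        heapq.heappush(heap, (load + volume, i))
    return buckets
```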

Recommended minimum resources by Collector location size:

  • Small: up to 500 endpoints/agents; 1 – 10 event sources; 4 CPU cores; 8 GB RAM; 60 GB disk space
  • Medium: up to 2,400 endpoints/agents; 10 – 50 event sources; 4 CPU cores; 8 GB RAM; 80 GB disk space
  • Large: up to 600 endpoints/agents per CPU core; 50 – 80* event sources; 4+ CPU cores; 16 GB RAM; 100 GB disk space

*If you have more than 80 event sources, you should split your event sources across multiple collectors.
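The sizing table can be encoded as a lookup that returns the smallest tier covering both your endpoint count and your event source count. This is a sketch under assumptions: the table's boundary values (10 and 50 event sources) are treated as belonging to the smaller tier, and the Large tier's CPU count is derived from the 600-endpoints-per-core rule with a floor of 4 cores.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Sizing:
    tier: str
    cpu_cores: int
    ram_gb: int
    disk_gb: int


def recommend_sizing(endpoints: int, event_sources: int) -> Sizing:
    """Smallest tier from the sizing table that fits both counts."""
    if event_sources > 80:
        raise ValueError("more than 80 event sources: split them across multiple Collectors")
    if endpoints <= 500 and event_sources <= 10:
        return Sizing("Small", 4, 8, 60)
    if endpoints <= 2400 and event_sources <= 50:
        return Sizing("Medium", 4, 8, 80)
    # Large: 600 endpoints/agents per core, at least 4 cores.
    return Sizing("Large", max(4, math.ceil(endpoints / 600)), 16, 100)
```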

High-volume event sources place a higher RAM and CPU load on the Collector, which reduces the total number of event sources it can handle. Before adding a chatty event source such as a firewall to the Collector, check its current resource utilization (under Data Collection > Collectors).

  • If the CPU utilization is consistently more than 40%, consider adding another collector to the location to handle some of the event sources.
  • If the CPU utilization is consistently above 90%, then you need an additional collector to handle the load.
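“Consistently” is not defined precisely here, so any automated check has to pick a definition. The sketch below assumes one hypothetically: a threshold counts as consistently exceeded when at least 80% of the CPU utilization samples in an observation window are above it. The samples would come from the resource utilization shown under Data Collection > Collectors or from your own monitoring.

```python
OK = "ok"
CONSIDER = "consider another Collector"
REQUIRED = "another Collector required"


def cpu_action(samples: list[float], consistent_fraction: float = 0.8) -> str:
    """Map CPU utilization samples (percent) to the action suggested by the 40% / 90% guidance."""
    if not samples:
        raise ValueError("need at least one CPU utilization sample")

    def consistently_above(threshold: float) -> bool:
        above = sum(1 for s in samples if s > threshold)
        return above / len(samples) >= consistent_fraction

    if consistently_above(90):
        return REQUIRED
    if consistently_above(40):
        return CONSIDER
    return OK
```

A single spike does not trigger an action: `cpu_action([20, 30, 95, 25])` returns `"ok"`.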

Important Collector Limitations

All Collectors must be configured with a fully qualified domain name, for example:

  • idrcollector1.myorg.com

For endpoint scanning, a Collector can be configured with only one endpoint scanning credential. Therefore, if you have multiple domains or other requirements for separate credentials that need to be used for scanning different endpoint ranges, you should plan on a separate Collector for each domain or set of credentials.

If you want to collect logs from a Check Point firewall, you must use a Collector running on Windows; a Linux Collector cannot collect Check Point firewall logs.

A Collector installed on Linux is limited in the number of agents it can support because of default file descriptor settings. On most Linux systems, the default limit is 2,000 agents. To increase the number of agents that can connect to a Linux Collector, set the file descriptor limit to twice the number of agents you want the Collector to handle. More information on file descriptor settings can be found here: https://www.tecmint.com/increase-set-open-file-limits-in-linux
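The file descriptor rule translates directly into a check you can run on a Linux Collector host. A minimal sketch using Python's Unix-only `resource` module; the helper names are hypothetical, and `getrlimit` reports the limit of the Python process running the check, which can differ from the limit applied to the Collector service itself (for example, if the service manager sets its own limit).

```python
import resource  # Unix-only standard library module


def required_nofile(agents: int) -> int:
    """File descriptors needed: twice the number of agents the Collector should handle."""
    return 2 * agents


def current_nofile_soft_limit() -> int:
    """Soft RLIMIT_NOFILE of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def supports_agents(agents: int, soft_limit: int) -> bool:
    """True if `soft_limit` descriptors are enough for `agents` agents."""
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return soft_limit >= required_nofile(agents)
```

For example, with a soft limit of 4096, `supports_agents(2000, 4096)` is `True` and `supports_agents(5000, 4096)` is `False`, because 5,000 agents need 10,000 descriptors.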

If you already have Nexpose or InsightVM installed in your organization, do not install the Insight Collector Software on an existing Nexpose Console or Nexpose Scan Engine as this will cause issues with your Nexpose systems.