Metrics and Metadata

Most metrics are collected by the agent without requiring the user to perform any additional setup. For troubleshooting, see What to Check if the Agent Isn't Reporting Metrics.
Some additional metrics for NGINX monitoring will only be reported if the NGINX configuration file is modified accordingly. See Additional NGINX Metrics below, and pay attention to the Source and Variable fields in the metric descriptions that follow.

OS Metrics

  • amplify.agent.status
Type: internal, integer Description: 1 - agent is up, 0 - agent is down.
  • amplify.agent.cpu.system
  • amplify.agent.cpu.user
Type: gauge, percent Description: CPU utilization percentage observed from the agent process.
  • amplify.agent.mem.rss
  • amplify.agent.mem.vms
Type: gauge, bytes Description: Memory utilized by the agent process.
  • system.cpu.idle
  • system.cpu.iowait
  • system.cpu.system
  • system.cpu.user
Type: gauge, percent Description: System CPU utilization.
  • system.cpu.stolen
Type: gauge, percent Description: System CPU stolen. Represents time when the real CPU was not available to the current VM.
  • system.disk.free
  • system.disk.total
  • system.disk.used
Type: gauge, bytes Description: System disk usage statistics.
  • system.disk.in_use
Type: gauge, percent Description: System disk usage statistics, percentage.
  • system.io.iops_r
  • system.io.iops_w
Type: counter, integer Description: Number of reads or writes per sampling window.
  • system.io.kbs_r
  • system.io.kbs_w
Type: counter, kilobytes Description: Number of kilobytes read or written.
  • system.io.wait_r
  • system.io.wait_w
Type: gauge, milliseconds Description: Time spent reading from or writing to disk.
  • system.load.1
  • system.load.5
  • system.load.15
Type: gauge, float Description: Number of processes in the system run queue, averaged over the last 1, 5, and 15 minutes.
  • system.mem.available
  • system.mem.buffered
  • system.mem.cached
  • system.mem.free
  • system.mem.shared
  • system.mem.total
  • system.mem.used
Type: gauge, bytes Description: Statistics about system memory usage.
  • system.mem.pct_used
Type: gauge, percent Description: Statistics about system memory usage, percentage.
  • system.net.bytes_rcvd
  • system.net.bytes_sent
Type: counter, bytes Description: Network I/O statistics. Number of bytes received or sent, per network interface.
  • system.net.drops_in.count
  • system.net.drops_out.count
Type: counter, integer Description: Network I/O statistics. Total number of inbound or outbound packets dropped, per network interface.
  • system.net.packets_in.count
  • system.net.packets_out.count
Type: counter, integer Description: Network I/O statistics. Number of packets received or sent, per network interface.
  • system.net.packets_in.error
  • system.net.packets_out.error
Type: counter, integer Description: Network I/O statistics. Total number of errors while receiving or sending, per network interface.
  • system.net.listen_overflows
Type: counter, integer Description: Number of times the listen queue of a socket overflowed.
  • system.swap.free
  • system.swap.total
  • system.swap.used
Type: gauge, bytes Description: System swap memory statistics.
  • system.swap.pct_free
Type: gauge, percent Description: System swap memory statistics, percentage.

NGINX Metrics

HTTP Connections and Requests

  • nginx.http.conn.accepted
  • nginx.http.conn.dropped
Type: counter, integer Description: NGINX-wide statistics describing HTTP connections. Source: stub_status (or N+ extended status)
  • nginx.http.conn.active
  • nginx.http.conn.current
  • nginx.http.conn.idle
Type: gauge, integer Description: NGINX-wide statistics describing HTTP connections. Source: stub_status (or N+ extended status)
  • nginx.http.request.count
Type: counter, integer Description: Total number of client requests. Source: stub_status (or N+ extended status)
  • nginx.http.request.current
  • nginx.http.request.reading
  • nginx.http.request.writing
Type: gauge, integer Description: Number of currently active requests (reading and writing). Number of requests reading headers or writing responses to clients. Source: stub_status (or N+ extended status)
  • nginx.http.request.malformed
Type: counter, integer Description: Number of malformed requests. Source: access.log
  • nginx.http.request.body_bytes_sent
Type: counter, integer Description: Number of bytes sent to clients, not counting response headers. Source: access.log

HTTP Methods

  • nginx.http.method.get
  • nginx.http.method.head
  • nginx.http.method.post
  • nginx.http.method.put
  • nginx.http.method.delete
  • nginx.http.method.options
Type: counter, integer Description: Statistics about observed request methods. Source: access.log

HTTP Status Codes

  • nginx.http.status.1xx
  • nginx.http.status.2xx
  • nginx.http.status.3xx
  • nginx.http.status.4xx
  • nginx.http.status.5xx
Type: counter, integer Description: Number of requests with specific HTTP status codes. Source: access.log
  • nginx.http.status.discarded
Type: counter, integer Description: Number of requests finalized with status code 499, which is logged when the client closes the connection. Source: access.log

HTTP Protocol Versions

  • nginx.http.v0_9
  • nginx.http.v1_0
  • nginx.http.v1_1
  • nginx.http.v2
Type: counter, integer Description: Number of requests using a specific version of the HTTP protocol. Source: access.log
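The method and protocol version counters above are both read from the access log. As a rough sketch of how a "$request" value could be mapped to those metric names, consider the Python below. The helper names are hypothetical and this is not the agent's own code. HTTP/0.9 request lines carry no version token, so they are left out of the sketch.

```python
from collections import Counter

# Methods that have a dedicated nginx.http.method.* metric in the list above.
KNOWN_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS"}

# Version tokens as they appear at the end of a logged request line.
VERSION_METRICS = {
    "HTTP/1.0": "nginx.http.v1_0",
    "HTTP/1.1": "nginx.http.v1_1",
    "HTTP/2.0": "nginx.http.v2",
}


def classify_request(request_line):
    """Return the metric names matched by one "$request" value (hypothetical helper)."""
    parts = request_line.split()
    if len(parts) != 3:
        # Not "METHOD URI VERSION"; the sketch simply skips such lines.
        return []
    method, _uri, protocol = parts
    names = []
    if method in KNOWN_METHODS:
        names.append("nginx.http.method." + method.lower())
    if protocol in VERSION_METRICS:
        names.append(VERSION_METRICS[protocol])
    return names


def count_requests(request_lines):
    """Tally metric names over many request lines."""
    counts = Counter()
    for line in request_lines:
        counts.update(classify_request(line))
    return counts
```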

NGINX Process Metrics

  • nginx.workers.count
Type: gauge, integer Description: Number of NGINX worker processes observed.
  • nginx.workers.cpu.system
  • nginx.workers.cpu.total
  • nginx.workers.cpu.user
Type: gauge, percent Description: CPU utilization percentage observed for NGINX worker processes.
  • nginx.workers.fds_count
Type: gauge, integer Description: Number of file descriptors utilized by NGINX worker processes.
  • nginx.workers.io.kbs_r
  • nginx.workers.io.kbs_w
Type: counter, integer Description: Number of kilobytes read from or written to disk by NGINX worker processes.
  • nginx.workers.mem.rss
  • nginx.workers.mem.vms
Type: gauge, bytes Description: Memory utilized by NGINX worker processes.
  • nginx.workers.mem.rss_pct
Type: gauge, percent Description: Memory utilization percentage for NGINX worker processes.
  • nginx.workers.rlimit_nofile
Type: gauge, integer Description: Hard limit on the number of file descriptors as seen by NGINX worker processes.

Additional NGINX Metrics

NGINX Amplify Agent can collect a number of additional useful metrics described below. To enable these metrics, please make the following configuration changes. More predefined graphs will be added to the Graphs page if the agent finds additional metrics. With the required log format configuration, you'll be able to build more specific custom graphs.
  • The access.log log format should include an extended set of NGINX variables. Please add a new log format or modify the existing one — and use it with the access_log directives in your NGINX configuration.
    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '"$host" sn="$server_name" '
                        'rt=$request_time '
                        'ua="$upstream_addr" us="$upstream_status" '
                        'ut="$upstream_response_time" ul="$upstream_response_length" '
                        'cs=$upstream_cache_status';
  • Here's how you may use the extended log format with your access log configuration:
    access_log /var/log/nginx/access.log main_ext;
    Note. By default the agent processes all access logs found in your log directory. If you define a new log file with the extended log format that duplicates entries already written to another access log, your metrics might be counted twice. Please refer to the agent configuration section above to learn how to exclude specific log files from processing.
  • The error.log log level should be set to warn.
    error_log /var/log/nginx/error.log warn;
    Note. Don't forget to reload your NGINX configuration with either kill -HUP or service nginx reload.
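To check that your access log really follows the main_ext format, it can help to parse a line yourself. The Python sketch below builds a regular expression that mirrors the main_ext format field by field. The pattern, the function name, and the sample line are illustrative assumptions, not the agent's own parser.

```python
import re

# One named group per variable in the main_ext log format, in the same order.
MAIN_EXT_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'"(?P<http_x_forwarded_for>[^"]*)" "(?P<host>[^"]*)" '
    r'sn="(?P<server_name>[^"]*)" rt=(?P<request_time>\S+) '
    r'ua="(?P<upstream_addr>[^"]*)" us="(?P<upstream_status>[^"]*)" '
    r'ut="(?P<upstream_response_time>[^"]*)" '
    r'ul="(?P<upstream_response_length>[^"]*)" '
    r'cs=(?P<upstream_cache_status>\S+)'
)


def parse_main_ext(line):
    """Return a dict of field name -> raw string, or None if the line does not match."""
    match = MAIN_EXT_RE.match(line)
    return match.groupdict() if match else None


# Made-up sample entry, written the way main_ext would log a proxied request.
SAMPLE_LINE = (
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" '
    '200 612 "-" "curl/8.0" "-" "example.com" sn="example.com" '
    'rt=0.012 ua="10.0.0.5:8080" us="200" ut="0.010" ul="612" cs=MISS'
)

record = parse_main_ext(SAMPLE_LINE)
```

Metrics whose Variable field names something that is not part of main_ext, such as $request_length or $gzip_ratio, would need that variable appended to both the log format and the pattern.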
Here is the list of additional metrics that can be collected from the NGINX log files:
  • nginx.http.request.bytes_sent
Type: counter, integer Description: Number of bytes sent to clients. Source: access.log (requires custom log format) Variable: $bytes_sent
  • nginx.http.request.length
Type: gauge, integer Description: Request length, including request line, header, and body. Source: access.log (requires custom log format) Variable: $request_length
  • nginx.http.request.time
  • nginx.http.request.time.count
  • nginx.http.request.time.max
  • nginx.http.request.time.median
  • nginx.http.request.time.pctl95
Type: gauge, seconds.milliseconds Description: Request processing time — time elapsed between reading the first bytes from the client and writing a log entry after the last bytes were sent. Source: access.log (requires custom log format) Variable: $request_time
  • nginx.http.request.buffered
Type: counter, integer Description: Number of requests that were buffered to disk. Source: error.log (requires 'warn' log level)
  • nginx.http.gzip.ratio
Type: gauge, float Description: Achieved compression ratio, calculated as the ratio between the original and compressed response sizes. Source: access.log (requires custom log format) Variable: $gzip_ratio
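The .count, .max, .median, and .pctl95 variants of nginx.http.request.time are aggregates over the individual $request_time values. The Python sketch below shows one way such aggregates could be computed for a batch of values. The function name and the nearest-rank percentile definition are assumptions, and the agent may aggregate differently. The unsuffixed metric is left out because the text does not say how it is aggregated.

```python
import statistics


def summarize_times(values):
    """Aggregate per-request times, in seconds, for one batch of log entries."""
    if not values:
        return {}
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * n), done in integers
    # to avoid floating-point rounding surprises.
    rank = (95 * n + 99) // 100
    return {
        "count": n,
        "max": ordered[-1],
        "median": statistics.median(ordered),
        "pctl95": ordered[rank - 1],
    }
```

With only a handful of samples the 95th percentile coincides with the maximum, which is why short sampling windows can make the two graphs look identical.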

Upstream Metrics

  • nginx.upstream.connect.time
  • nginx.upstream.connect.time.count
  • nginx.upstream.connect.time.max
  • nginx.upstream.connect.time.median
  • nginx.upstream.connect.time.pctl95
Type: gauge, seconds.milliseconds Description: Time spent on establishing connections with upstream servers. With SSL, it also includes time spent on the handshake. Source: access.log (requires custom log format) Variable: $upstream_connect_time
  • nginx.upstream.header.time
  • nginx.upstream.header.time.count
  • nginx.upstream.header.time.max
  • nginx.upstream.header.time.median
  • nginx.upstream.header.time.pctl95
Type: gauge, seconds.milliseconds Description: Time spent on receiving response headers from upstream servers. Source: access.log (requires custom log format) Variable: $upstream_header_time
  • nginx.upstream.response.buffered
Type: counter, integer Description: Number of upstream responses buffered to disk. Source: error.log (requires 'warn' log level)
  • nginx.upstream.request.count
  • nginx.upstream.next.count
Type: counter, integer Description: Number of requests that were sent to upstream servers. Source: access.log (requires custom log format) Variable: $upstream_*
  • nginx.upstream.request.failed
  • nginx.upstream.response.failed
Type: counter, integer Description: Number of requests and responses that failed while proxying. Source: error.log (requires 'error' log level)
  • nginx.upstream.response.length
Type: gauge, bytes Description: Average length of the responses obtained from the upstream servers. Source: access.log (requires custom log format) Variable: $upstream_response_length
  • nginx.upstream.response.time
  • nginx.upstream.response.time.count
  • nginx.upstream.response.time.max
  • nginx.upstream.response.time.median
  • nginx.upstream.response.time.pctl95
Type: gauge, seconds.milliseconds Description: Time spent on receiving responses from upstream servers. Source: access.log (requires custom log format) Variable: $upstream_response_time
  • nginx.upstream.status.1xx
  • nginx.upstream.status.2xx
  • nginx.upstream.status.3xx
  • nginx.upstream.status.4xx
  • nginx.upstream.status.5xx
Type: counter, integer Description: Number of responses from upstream servers with specific HTTP status codes. Source: access.log (requires custom log format) Variable: $upstream_status
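When NGINX contacts more than one upstream server for a single request, $upstream_status holds several status codes separated by commas and colons, and it is "-" when no upstream was contacted. The Python sketch below groups such a value into the status classes listed above. Counting every attempt, rather than only the last one, is an assumption of this sketch and not a statement about the agent.

```python
import re
from collections import Counter


def upstream_status_classes(upstream_status):
    """Count status classes in one $upstream_status value (hypothetical helper)."""
    counts = Counter()
    for token in re.split(r"[,:]", upstream_status):
        token = token.strip()
        # Keep only three-digit codes in the 1xx-5xx range; "-" is skipped.
        if len(token) == 3 and token.isdigit() and token[0] in "12345":
            counts["nginx.upstream.status.%sxx" % token[0]] += 1
    return counts
```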

Cache Metrics

  • nginx.cache.bypass
  • nginx.cache.expired
  • nginx.cache.hit
  • nginx.cache.miss
  • nginx.cache.revalidated
  • nginx.cache.stale
  • nginx.cache.updating
Type: counter, integer Description: Various statistics about NGINX cache usage. Source: access.log (requires custom log format) Variable: $upstream_cache_status
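Each cache metric above corresponds to one possible value of $upstream_cache_status. A minimal Python sketch of that mapping follows. The function name is hypothetical, and values such as "-" (logged when the cache was not involved) are simply ignored here.

```python
from collections import Counter

# $upstream_cache_status value -> metric name from the list above.
CACHE_METRICS = {
    "BYPASS": "nginx.cache.bypass",
    "EXPIRED": "nginx.cache.expired",
    "HIT": "nginx.cache.hit",
    "MISS": "nginx.cache.miss",
    "REVALIDATED": "nginx.cache.revalidated",
    "STALE": "nginx.cache.stale",
    "UPDATING": "nginx.cache.updating",
}


def count_cache_statuses(values):
    """Tally cache metrics over a sequence of $upstream_cache_status values."""
    counts = Counter()
    for value in values:
        name = CACHE_METRICS.get(value)
        if name is not None:
            counts[name] += 1
    return counts
```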

NGINX Plus Metrics

In NGINX Plus a number of additional metrics describing various aspects of NGINX performance are available. The extended status module in NGINX Plus is responsible for collecting and exposing all of the additional counters and gauges.
The NGINX Plus metrics currently supported by the agent are described below. The NGINX Plus extended status metrics have the "plus" prefix in their names.
Some of the NGINX Plus extended metrics extracted from the connections and the requests datasets are used to generate the following server-wide metrics (instead of using the stub_status metrics):
    nginx.http.conn.accepted = connections.accepted
    nginx.http.conn.active = connections.active
    nginx.http.conn.current = connections.active + connections.idle
    nginx.http.conn.dropped = connections.dropped
    nginx.http.conn.idle = connections.idle
    nginx.http.request.count = requests.total
    nginx.http.request.current = requests.current
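The mapping above can be written out as a small Python function. This is only a sketch: the function name is hypothetical, and the input is assumed to be a dictionary with "connections" and "requests" entries shaped like the datasets named above.

```python
def server_wide_metrics(status):
    """Derive the server-wide metrics from the connections and requests datasets."""
    conn = status["connections"]
    req = status["requests"]
    return {
        "nginx.http.conn.accepted": conn["accepted"],
        "nginx.http.conn.active": conn["active"],
        # "current" is the only derived value: active plus idle connections.
        "nginx.http.conn.current": conn["active"] + conn["idle"],
        "nginx.http.conn.dropped": conn["dropped"],
        "nginx.http.conn.idle": conn["idle"],
        "nginx.http.request.count": req["total"],
        "nginx.http.request.current": req["current"],
    }
```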
Please see the following reference documentation and a solution brief for more information about the NGINX Plus extended status.
The NGINX Plus metrics below are collected per zone. When configuring a graph using these metrics, please make sure to pick the correct server, upstream or cache zone. A more granular peer-specific breakdown of the metrics below is currently not supported in NGINX Amplify.

Server Zone Metrics

  • plus.http.request.count
  • plus.http.response.count
Type: counter, integer Description: Number of client requests received, and responses sent to clients. Source: NGINX Plus extended status
  • plus.http.request.bytes_rcvd
  • plus.http.request.bytes_sent
Type: counter, bytes Description: Number of bytes received from clients, and bytes sent to clients. Source: NGINX Plus extended status
  • plus.http.status.1xx
  • plus.http.status.2xx
  • plus.http.status.3xx
  • plus.http.status.4xx
  • plus.http.status.5xx
Type: counter, integer Description: Number of responses with status codes 1xx, 2xx, 3xx, 4xx, and 5xx. Source: NGINX Plus extended status
  • plus.http.status.discarded
Type: counter, integer Description: Number of requests completed without sending a response. Source: NGINX Plus extended status

Upstream Zone Metrics

  • plus.upstream.peer.count
Type: gauge, integer Description: Current number of live upstream servers in an upstream group. If graphed/monitored without specifying an upstream, it's the current number of all live upstream servers in all upstream groups. Source: NGINX Plus extended status
  • plus.upstream.request.count
  • plus.upstream.response.count
Type: counter, integer Description: Number of client requests forwarded to the upstream servers, and responses obtained. Source: NGINX Plus extended status
  • plus.upstream.conn.active
Type: gauge, integer Description: Current number of active connections to the upstream servers. Source: NGINX Plus extended status
  • plus.upstream.bytes_rcvd
  • plus.upstream.bytes_sent
Type: counter, integer Description: Number of bytes received from the upstream servers, and bytes sent. Source: NGINX Plus extended status
  • plus.upstream.status.1xx
  • plus.upstream.status.2xx
  • plus.upstream.status.3xx
  • plus.upstream.status.4xx
  • plus.upstream.status.5xx
Type: counter, integer Description: Number of responses from the upstream servers with status codes 1xx, 2xx, 3xx, 4xx, and 5xx. Source: NGINX Plus extended status
  • plus.upstream.header.time
  • plus.upstream.header.time.count
  • plus.upstream.header.time.max
  • plus.upstream.header.time.median
  • plus.upstream.header.time.pctl95
Type: gauge, seconds.milliseconds Description: Average time to get the response header from the upstream servers. Source: NGINX Plus extended status
  • plus.upstream.response.time
  • plus.upstream.response.time.count
  • plus.upstream.response.time.max
  • plus.upstream.response.time.median
  • plus.upstream.response.time.pctl95
Type: gauge, seconds.milliseconds Description: Average time to get the full response from the upstream servers. Source: NGINX Plus extended status
  • plus.upstream.fails.count
  • plus.upstream.unavail.count
Type: counter, integer Description: Number of unsuccessful attempts to communicate with upstream servers, and how many times upstream servers became unavailable for client requests. Source: NGINX Plus extended status
  • plus.upstream.health.checks
  • plus.upstream.health.fails
  • plus.upstream.health.unhealthy
Type: counter, integer Description: Number of performed health check requests, failed health checks, and how many times the upstream servers became unhealthy. Source: NGINX Plus extended status
  • plus.upstream.queue.size
Type: gauge, integer Description: Current number of queued requests. Source: NGINX Plus extended status
  • plus.upstream.queue.overflows
Type: counter, integer Description: Number of requests rejected due to queue overflows. Source: NGINX Plus extended status

Cache Zone Metrics

  • plus.cache.bypass
  • plus.cache.bypass.bytes
  • plus.cache.expired
  • plus.cache.expired.bytes
  • plus.cache.hit
  • plus.cache.hit.bytes
  • plus.cache.miss
  • plus.cache.miss.bytes
  • plus.cache.revalidated
  • plus.cache.revalidated.bytes
  • plus.cache.size
  • plus.cache.stale
  • plus.cache.stale.bytes
  • plus.cache.updating
  • plus.cache.updating.bytes
Type: counter, integer; counter, bytes Description: Various statistics about NGINX Plus cache usage. Source: NGINX Plus extended status

Other metrics

PHP-FPM metrics

You can also monitor your PHP-FPM applications with NGINX Amplify.
When the agent finds a PHP-FPM master process, it then tries to auto-detect the path to the PHP-FPM configuration. When the PHP-FPM configuration is found, the agent will look up the pool definitions, and the corresponding pm.status_path directives.
The agent will try to find all pools and status URIs currently configured. It will then query the status of each PHP-FPM pool via FastCGI. There's no need to define an HTTP proxy in your NGINX configuration that points to the PHP-FPM status URIs.
To start monitoring PHP-FPM, follow the steps below:
  1. Make sure that your PHP-FPM status is enabled for at least one pool (if not, uncomment the pm.status_path directive for the pool, and restart PHP-FPM).
  2. Check that NGINX, the Amplify Agent, and the PHP-FPM workers are all run under the same user ID (e.g. www-data).
  3. Check that the listen socket for the PHP-FPM pool you want to monitor (and for which you enabled pm.status_path) is properly configured with listen.owner and listen.group (should be the user ID from step #2 above, e.g. www-data).
  4. Check that the PHP-FPM listen socket for the pool is properly created and has the right permissions.
  5. Check that you can query the PHP-FPM status for the pool from the command line, e.g.
    # SCRIPT_NAME=/status SCRIPT_FILENAME=/status QUERY_STRING= REQUEST_METHOD=GET cgi-fcgi -bind -connect /var/run/php5-fpm.sock
    and that the above command (or a similar one) returns the proper set of PHP-FPM metrics.
    Note. The cgi-fcgi tool has to be installed separately (e.g. from the fcgi package). This tool is not required for the agent to collect and report PHP-FPM metrics, but it can be used to diagnose possible issues.
  6. Update the agent to the most recent version.
  7. Check that the following options are set in /etc/amplify-agent/agent.conf
    [extensions]
    phpfpm = True
  8. Restart the agent.
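If you want to double-check step 7 from a script, the [extensions] section can be read with Python's standard configparser module. This is a minimal sketch. The function name is hypothetical, and it assumes agent.conf is a plain INI file as shown in step 7.

```python
import configparser


def phpfpm_enabled(config_text):
    """Return True if the [extensions] section sets phpfpm to a true value."""
    # Interpolation is disabled so that "%" characters elsewhere in the file
    # are never treated as placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return parser.getboolean("extensions", "phpfpm", fallback=False)
```

To check a real installation, read the contents of /etc/amplify-agent/agent.conf and pass them to the function.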
The agent should be able to detect the PHP-FPM master and workers, obtain the access to status, and collect the necessary metrics.
Here is the list of caveats to look for if the PHP-FPM metrics are not being collected:
  • No status enabled for any of the pools.
  • Wrong permissions for the PHP-FPM listen sockets.
  • Using variables like $pool in the socket configuration.
With all of the above successfully configured, the end result should be an additional tab displayed on the Graphs page, with the pre-defined visualization of the PHP-FPM metrics.
The PHP-FPM metrics on the Graphs page are cumulative, across all automatically detected pools. If you need per-pool graphs, go to "Dashboards" and create custom graphs per pool.
Below is the list of the currently supported PHP-FPM metrics.
  • php.fpm.conn.accepted
Type: counter, integer Description: The number of requests accepted by the pool. Source: PHP-FPM status (accepted conn)
  • php.fpm.queue.current
Type: gauge, integer Description: The number of requests in the queue of pending connections. Source: PHP-FPM status (listen queue)
  • php.fpm.queue.max
Type: gauge, integer Description: The maximum number of requests in the queue of pending connections since FPM has started. Source: PHP-FPM status (max listen queue)
  • php.fpm.queue.len
Type: gauge, integer Description: The size of the socket queue of pending connections. Source: PHP-FPM status (listen queue len)
  • php.fpm.proc.idle
Type: gauge, integer Description: The number of idle processes. Source: PHP-FPM status (idle processes)
  • php.fpm.proc.active
Type: gauge, integer Description: The number of active processes. Source: PHP-FPM status (active processes)
  • php.fpm.proc.total
Type: gauge, integer Description: The number of idle + active processes. Source: PHP-FPM status (total processes)
  • php.fpm.proc.max_active
Type: gauge, integer Description: The maximum number of active processes since FPM has started. Source: PHP-FPM status (max active processes)
  • php.fpm.proc.max_child
Type: gauge, integer Description: The number of times the process limit has been reached. Source: PHP-FPM status (max children reached)
  • php.fpm.slow_req
Type: counter, integer Description: The number of requests that exceeded the request_slowlog_timeout value. Source: PHP-FPM status (slow requests)
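The Source field of each metric above names the line of the PHP-FPM status output it comes from. The Python sketch below maps the plain-text status output to those metric names. The function name and the sample text are illustrative, and this is not the agent's own code.

```python
# PHP-FPM status field -> metric name, taken from the Source fields above.
STATUS_TO_METRIC = {
    "accepted conn": "php.fpm.conn.accepted",
    "listen queue": "php.fpm.queue.current",
    "max listen queue": "php.fpm.queue.max",
    "listen queue len": "php.fpm.queue.len",
    "idle processes": "php.fpm.proc.idle",
    "active processes": "php.fpm.proc.active",
    "total processes": "php.fpm.proc.total",
    "max active processes": "php.fpm.proc.max_active",
    "max children reached": "php.fpm.proc.max_child",
    "slow requests": "php.fpm.slow_req",
}


def parse_fpm_status(text):
    """Turn "key: value" status lines into a dict of metric name -> int."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        name = STATUS_TO_METRIC.get(key.strip())
        value = value.strip()
        # Lines such as "pool: www" or response headers are not in the map.
        if sep and name and value.isdigit():
            metrics[name] = int(value)
    return metrics


# Made-up sample of the plain-text status output for one pool.
SAMPLE_STATUS = """pool:                 www
process manager:      dynamic
accepted conn:        12073
listen queue:         0
max listen queue:     1
listen queue len:     128
idle processes:       2
active processes:     1
total processes:      3
max active processes: 2
max children reached: 0
slow requests:        0
"""

fpm_metrics = parse_fpm_status(SAMPLE_STATUS)
```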