The Overview page provides a quick summary of the state of your NGINX infrastructure. Here you can, for example, check the total number of HTTP 5xx errors over the past 24 hours and compare it to the previous 24 hours.
Five key overlay graphs are displayed for the selected time period. By switching between time periods you can compare trends and see if anything abnormal shows up.
The cumulative metrics displayed on the Overview page are:
- Total requests — sum of nginx.http.request.count
- HTTP 5xx errors — sum of nginx.http.status.5xx
- Request time (P95) — average of nginx.http.request.time.pctl95
- Traffic — sum of system.net.bytes_sent rate
- CPU Usage — average of system.cpu.user
Note. By default the metrics above are calculated for all monitored hosts. You can configure specific tags in the Overview settings popup to display the metrics for a set of hosts (e.g. only the "production environment").
You may see zeros if some metrics are not being collected. E.g., if the request time (P95) is 0.000s, check that you have properly configured NGINX logging for additional metric collection.
Application Health Score
The upper left block displays a total score that reflects your web app performance. It's called Application Health Score (AHS).
The Application Health Score (AHS) is an Apdex-like numerical measure that can be used to estimate the quality of experience for your web application.
AHS is a product of three derived service level indicators (SLIs): the percentage of successful requests, the percentage of "timely" requests, and agent availability. The "timely" requests are those with the total observed average request time (P95) either below the low threshold (100% satisfying) or between the low and high thresholds (partially satisfying).
A simplified formula for AHS is the following:
AHS = (Successful Requests %) × (Timely Requests %) × (Agent Availability %)
Each individual SLI in this formula can be turned on or off. By default only the percentage of successful requests is on.
There are two thresholds, T1 and T2, for the total observed average request time (P95) that you can configure for AHS:
- T1 is the low threshold for satisfying requests
- T2 is the high threshold for partially satisfying requests
If the average request time (P95) for the selected time period is below T1, this is considered 100% satisfying state of requests. If the request time is above T1 and below T2, a "satisfaction ratio" is calculated accordingly. Requests above T2 are considered totally unsatisfying. E.g. with T1=0.2s and T2=1s, a request time greater than 1s would be considered unsatisfying, and the resulting score would be 0%.
The algorithm for calculating the AHS is:
    successful_req_pct = (nginx.http.request.count - nginx.http.status.5xx) / nginx.http.request.count

    if (nginx.http.request.time.pctl95 < T1)
        timely_req_pct = 1
    else if (nginx.http.request.time.pctl95 < T2)
        timely_req_pct = 1 - (nginx.http.request.time.pctl95 - T1) / (T2 - T1)
    else
        timely_req_pct = 0

    m1 = successful_req_pct
    m2 = timely_req_pct
    m3 = agent_up_pct

    app_health_score = m1 * m2 * m3
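The algorithm above can be sketched in Python. This is only an illustration of the calculation; the function name, parameter names, and defaults are ours, and the real computation happens in the Amplify backend:

```python
def app_health_score(request_count, status_5xx, request_time_p95,
                     agent_up_pct=1.0, t1=0.2, t2=1.0):
    """Illustrative Apdex-like Application Health Score (AHS).

    request_count    -- sum of nginx.http.request.count over the period
    status_5xx       -- sum of nginx.http.status.5xx over the period
    request_time_p95 -- observed average nginx.http.request.time.pctl95 (seconds)
    agent_up_pct     -- agent availability as a fraction (0..1)
    t1, t2           -- low/high request-time thresholds (seconds)
    """
    successful_req_pct = (request_count - status_5xx) / request_count

    if request_time_p95 < t1:        # fully satisfying
        timely_req_pct = 1.0
    elif request_time_p95 < t2:      # partially satisfying
        timely_req_pct = 1 - (request_time_p95 - t1) / (t2 - t1)
    else:                            # totally unsatisfying
        timely_req_pct = 0.0

    return successful_req_pct * timely_req_pct * agent_up_pct

# E.g. 1% errors and a P95 of 0.6s with T1=0.2s and T2=1s:
# timely = 1 - (0.6 - 0.2) / 0.8 = 0.5, so AHS = 0.99 * 0.5 * 1.0 = 0.495
```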
When you log in to Amplify, you’re presented with a collection of predefined graphs on the Graphs page. Here you can see an overview of the key metric stats, such as CPU, memory, and disk usage for all of your systems.
If you click on a system on the left, the graphs will change to reflect the metrics for the selected system. The graphs are further split into tabs such as "System", "NGINX" and so on.
Some graphs have an additional selector. E.g., with "Disk Latency" or "Network Traffic" you can select what device or interface you're analyzing.
Above the graphs, you will find the following:
- Hostname or alias for the selected system
- System properties editor where you can set up an alias for the host, and/or assign host tags
- List of tags assigned to the system
- Time range selector, which helps to display different time periods for the graphs
- Time zone selector
You can also copy a predefined graph to a custom dashboard by focusing on the graph and clicking on the arrow in the top right corner.
From the top menu bar, you can always open the inventory of the systems that are being monitored. When the agent is properly installed on a new system and reporting, it's automatically visible in the system index on the left and in the Inventory.
The Inventory allows you to check the status of all systems at a glance. It also provides a quick overview of the key metrics.
In the rightmost column of the Inventory you will also find the settings and metadata viewer icons. Click the "info" icon and a popup appears with various useful information about the OS and the monitored NGINX instances. If you need to remove an object from monitoring, use the "trash" icon.
You can apply sorting, search, and filters to the Inventory to quickly find the system in question. You can search and filter by hostname, IP address, architecture, etc. Regular expressions are supported in the search function.
Note. Bear in mind that you'd also need to stop or uninstall the agent on the systems being removed from monitoring; otherwise the objects will reappear in the UI. Be sure to delete any system-specific alert rules too.
You can create your own dashboards populated with highly customizable graphs of NGINX and system-level metrics.
Some of the use cases for a custom set of graphs are the following:
- Checking NGINX performance for a particular application or microservice, e.g. based on the URI path
- Displaying metrics per virtual server
- Visualizing the performance of a group of NGINX servers — for example, front-end load balancers, or an NGINX edge caching layer
- Analyzing a detailed breakdown of HTTP status codes per application
When building a custom graph, metrics can be summed or averaged across several NGINX servers. By using metric filters it is also possible to create additional “metric dimensions” — for example, reporting the number of POST requests for a specific URI.
To create a custom dashboard, click CREATE DASHBOARD on the Dashboards drop-down menu. You can choose to quickly build several graphs from a preset to populate your custom dashboard with useful visualizations, or you can create your own graphs from scratch.
To start with a graph set wizard, click New Set.
If you'd like to add individual graphs, click New Graph in the upper right corner to start adding graphs to the dashboard.
When adding or editing a graph, the following dialog appears:
To define a graph, perform these steps:
- Enter the graph title.
- Pick one or more metrics. You can combine multiple metrics on the same graph using the "Add another metric" button.
- After the metric is selected, you are able to see the systems for which the metric has been observed. Select one or multiple systems here. You can also use tags to specify the systems.
- When aggregating across multiple systems, select either "Sum" or "Avg" as the aggregation function.
- Last but not least, the “filter” functionality is also available for NGINX metrics collected from the log files. If you click on "Add metric filter", you can then add multiple criteria in order to define specific "metric dimensions". In the example above, we are matching the NGINX upstream response time against the /api/feed/reports URI. You can also build other filters, e.g. displaying metric nginx.http.status.2xx for the responses with the status code 201.
- Click "Save" when you're done, and the graph is added to the dashboard. You can also edit the graph later on if needed, move it around, resize, stack the graphs on top of each other, etc.
Note. When using filters, all the "metric dimensions" aren't stored in the NGINX Amplify backend by default. A particular filter starts to slice the metric according to the specification only after the graph is created. Hence, it can be a while before the "filtered" metric is displayed on the graph — the end result depends on how quickly the log files are being populated with the new entries, but typically you should see the first data points in under 5 minutes.
Because NGINX Amplify is not a SaaS log analyzer, the additional slicing for "metric dimensions" is implemented inside the agent. The agent can parse the NGINX access logs on-the-fly and extract all the necessary metrics without sending the raw log entries elsewhere. Moreover, the agent understands custom log formats automatically, and will start looking for various newly defined "metric dimensions" following a particular log_format specification.
Essentially, the agent performs a combination of real-time log analytics and standard metrics collection (e.g. metrics from the stub_status module). The agent does only the real-time log processing, and always on the same host where it is running.
Metric filters can be really powerful. By using the filters and creating additional "metric dimensions", it is possible to build highly granular and very informative graphs. To enable the agent to slice the metrics you must add the corresponding log variables to the active NGINX log format. Please see the Additional NGINX metrics section below.
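To illustrate, a log format extended for richer metric collection might look like the following. This is only a sketch: the format name `main_ext` is arbitrary, the variables are standard NGINX ones, and the exact set you need depends on which metrics you want to collect:

```nginx
http {
    # Extended log format with request timing, upstream timing,
    # and cache status, so additional metrics can be derived.
    log_format main_ext '$remote_addr - $remote_user [$time_local] '
                        '"$request" $status $body_bytes_sent '
                        '"$http_referer" "$http_user_agent" '
                        'rt=$request_time ut="$upstream_response_time" '
                        'cs=$upstream_cache_status';

    access_log /var/log/nginx/access.log main_ext;
}
```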
Metric filters are available only for the metrics generated from the log files. For other metrics some additional modifiers can be set when editing a graph. E.g. for NGINX Plus it is possible to specify the status API zones to build more detailed visualizations.
While editing the dashboard, you can also use additional features like "Clone" to streamline the workflow.
NGINX Amplify Agent parses NGINX configuration files and transmits them to the backend component for further analysis. This is where Amplify offers configuration recommendations to help improve the performance, reliability, and security of your applications. With well-thought-out and detailed recommendations you’ll know exactly where the problem is, why it is a problem, and how to fix it.
When you switch to the Analyzer page, click a particular system on the left to see the associated report. If no NGINX instance is found on a system, there will be no report for it.
The following information is provided when a report is run against an NGINX config structure:
- Branch, release date, and the latest version in the branch
- Path to the NGINX config file(s)
- Whether the parser failed or not, and the results of nginx -t
- Last-modified info
- 3rd party modules found
- Breakdown of the key configuration elements (servers, locations, upstreams)
- Breakdown of IPv4/IPv6 usage
- Any security advisories that apply to this version of NGINX
- Breakdown of the virtual host configuration (think "apachectl -S")
- OpenSSL version information
- Breakdown of the number of SSL or HTTP/2 servers configured
- Information about the configured SSL certificates
- Warnings about common SSL configuration errors
- Various suggestions about configuration structure
- Typical configuration gotchas highlighted
- Common advice about proxy configurations
- Suggestions about simplifying rewrites for certain use cases
- Key security measures (e.g. stub_status is unprotected)
- Typical errors in configuring locations, especially with regex
To parse SSL certificate metadata the Amplify Agent uses standard openssl(1) functions. SSL certificates are parsed and analyzed only when the corresponding settings are turned on. SSL certificate analysis is off by default.
Static analysis will only include information about specific issues with the NGINX configuration if those are found in your NGINX setup.
In the future, the Analyzer page will also include dynamic analysis, effectively linking the observed NGINX behavior to its configuration, e.g. when it makes sense to increase or decrease certain parameters like proxy_buffers. Stay tuned!
Note. Config analysis is on by default. If you don't want your NGINX configuration to be checked, unset the corresponding setting in either the Global or Local (per-system) settings. See Settings below.
The Alerts page describes the configuration of the alert rules used to notify you of any anomalies in the behavior of your systems.
Alerts are based on setting a rule to monitor a particular metric. Alert rules allow the user to specify the metric, the trigger condition, the threshold, and the email for notifications.
The way alert rules work is the following:
- Incoming metric updates are being continuously monitored against the set of rules.
- If there's a rule for a metric, the new metric update is checked against the threshold.
- If the threshold is met, an alert notification is generated, and the rule will continue to be monitored.
- If subsequent metric updates show that the metric no longer violates the threshold for the configured period, the alert is cleared.
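The steps above can be sketched as follows. The rule fields, function names, and state handling here are illustrative simplifications, not the actual Amplify implementation; in particular, the real backend only clears an alert after the metric stays within the threshold for the configured period, while this sketch clears it on the first non-violating update:

```python
def evaluate(rule, metric_update, state):
    """Check one metric update against a rule and update alert state.

    rule          -- dict with 'metric', 'threshold', 'condition' ('above'/'below')
    metric_update -- dict with 'metric' and 'value'
    state         -- mutable dict tracking whether the alert is currently active
    Returns 'raised', 'cleared', or None.
    """
    # Only rules that exist for this metric are checked.
    if metric_update['metric'] != rule['metric']:
        return None

    value = metric_update['value']
    violated = (value > rule['threshold'] if rule['condition'] == 'above'
                else value < rule['threshold'])

    if violated and not state.get('active'):
        state['active'] = True
        return 'raised'      # an alert notification is generated
    if not violated and state.get('active'):
        state['active'] = False
        return 'cleared'     # the alert resolves
    return None              # no state change; monitoring continues
```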
By default there's no filtering by host: any system will match a particular rule unless a hostname or tag is specified. If a specific alert should only be raised for a particular system, specify the hostname(s) or tags when configuring the alert. Note that metrics currently can't be aggregated across all systems.
There's one special rule, which is about the amplify.agent.status metric. This metric reflects the state of the agent (and hence, the state of the system as seen by Amplify). For amplify.agent.status you can only configure a 2-minute interval and only 0 (zero) as the threshold.
You shouldn't see consecutive notifications about the same alert over and over again. Instead there will be digest information sent out every 60 minutes, describing which alerts were generated and which ones were cleared.
Note. Gauges are averaged over the interval configured in the rule. Counters are summed up. Currently that's not user configurable and these are the only reduce functions available for configuring metric thresholds.
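In other words, within a rule's interval the window of metric updates is reduced to a single value before the threshold check. Roughly, as a sketch (the function name and the `metric_kind` argument are ours, not part of the product):

```python
def reduce_for_threshold(values, metric_kind):
    """Reduce a window of metric updates to one value for a threshold check.

    Gauges (e.g. system.cpu.user) are averaged over the rule's interval;
    counters (e.g. nginx.http.status.5xx) are summed. Per the note above,
    these are currently the only two reduce functions available.
    """
    if metric_kind == 'gauge':
        return sum(values) / len(values)
    return sum(values)  # counter
```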
Note. Emails are sent using AWS SES. Make sure your mail relay accepts their traffic. Also make sure to verify the specified email and check the verification status in the Account menu.
The Account option in the "hamburger" menu at the top right corner of the web interface contains various important settings.
First of all, you can always check the information you provided upon signing up, and edit specific fields.
You can also see the current limits such as "maximum number of agents", "maximum number of custom dashboards", etc.
Inside the Users section you will see the list of the user logins that are associated with this particular account. If you are the admin user, you can also invite your team members to the account.
Users can be assigned one of three roles: Admin, User, or Read-Only. Admin users can use all the functions in the Amplify UI, add/remove users, and modify everything. The User role is almost unrestricted, with the exception of managing other users. Read-Only users can't modify graphs or manage users; this role can be useful for your support team members.
In the Notifications section you will find the information about the emails currently registered with your account, and whether they are verified or not. The alert notifications are only sent to verified emails.
In addition to email alert notifications, you can optionally configure an integration with your Slack workspace. Under the registered emails section, click the "Add to Slack" button to allow Amplify to send you certain notifications on Slack. You will have to log in and provide the necessary details about your team and the channels you'd like to use for Amplify notifications. Both direct messages and channels can be used. If configured successfully, Amplify can send alert information to Slack. A few additional notifications are also available, e.g. when the agent can't find a running NGINX instance, as well as proactive messages about issues with SSL certificates.
The "Agent settings" section is where you enable or disable account-wide behavior for:
- NGINX configuration files analysis
- Periodic NGINX configuration syntax checking with "nginx -t"
- Analyzing SSL certs
Per-system settings are accessible via the "Settings" icon found for a particular NGINX instance on the Analyzer page. Per-system settings override the global settings. If you prefer to monitor your NGINX configurations on all but a few specific systems, you can uncheck the corresponding settings for those systems.