Amazon CloudWatch

Amazon CloudWatch is a monitoring and observability service built for DevOps engineers, developers, site reliability engineers (SREs), IT managers, and product owners. It provides data and actionable insights that help you monitor your applications, respond to changes in system performance, and optimize resource utilization.

CloudWatch collects monitoring and operational data in the form of logs, metrics, and events. This gives you a unified view of operational health and complete visibility into your AWS resources, applications, and services, whether they run on AWS or on-premises.

To keep your applications running smoothly, you can use CloudWatch to detect anomalous behavior in your environments, set alarms, visualize logs and metrics side by side, take automated actions, troubleshoot issues, and discover insights.

Amazon CloudWatch provides ready-to-use Amazon EC2 metrics for:

CPU utilization
Network utilization
Disk performance
Disk Reads/Writes
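
As a minimal sketch, the built-in EC2 metrics can be read with the `get_metric_statistics` call of boto3 (the AWS SDK for Python). The instance ID, the one-hour window, and the helper names `build_cpu_request` and `fetch_cpu_datapoints` are illustrative assumptions. The request is built separately from the network call so it can be inspected without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def build_cpu_request(instance_id, end_time, hours=1, period_seconds=300):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    return {
        "Namespace": "AWS/EC2",            # namespace of the built-in EC2 metrics
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end_time - timedelta(hours=hours),
        "EndTime": end_time,
        "Period": period_seconds,          # one datapoint per 5 minutes
        "Statistics": ["Average"],
    }


def fetch_cpu_datapoints(instance_id):
    """Fetch average CPU datapoints; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    request = build_cpu_request(instance_id, datetime.now(timezone.utc))
    response = cloudwatch.get_metric_statistics(**request)
    # Datapoints are not guaranteed to arrive in time order, so sort them.
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

Calling `fetch_cpu_datapoints("i-0123456789abcdef0")` (a placeholder ID) would return a list of datapoints, each carrying a `Timestamp` and an `Average` value.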

*To monitor the items below, you need to publish a custom metric, for example from a Perl or shell script, because there are no ready-to-use metrics for:*

Memory utilization
Disk swap utilization
Disk space utilization
Page file utilization
Log collection
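
A custom metric does not have to come from a Perl or shell script. The sketch below uses Python and the boto3 `put_metric_data` call to publish memory utilization from a Linux host. The namespace `Custom/System`, the metric name `MemoryUtilization`, and all helper names are assumptions chosen for illustration, and the calculation relies on the `MemAvailable` field, which older kernels do not report.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_percent(meminfo):
    """Share of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def build_memory_datum(instance_id, used_percent):
    """Build one MetricData entry for put_metric_data."""
    return {
        "MetricName": "MemoryUtilization",   # hypothetical metric name
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Percent",
        "Value": used_percent,
    }


def publish_memory_metric(instance_id, namespace="Custom/System"):
    """Read /proc/meminfo and publish it; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    datum = build_memory_datum(instance_id, memory_used_percent(meminfo))
    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[datum]
    )
```

Running `publish_memory_metric` on a schedule, for example from cron, would produce a time series that can be graphed and alarmed on like any built-in metric.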

Note that *there is a multi-platform CloudWatch agent* that can be installed on instances running Windows or Linux. With this single agent you can collect system metrics and log files from both Amazon EC2 instances and on-premises servers. The agent supports Windows Server and Linux and lets you choose which metrics to collect, including sub-resource metrics such as per-CPU core. Using this agent, rather than the older monitoring scripts, is the recommended way to collect metrics and logs.
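
The agent is driven by a JSON configuration file. The sketch below generates such a file from Python for a Linux host, covering the memory, swap, disk space, per-core CPU, and log items listed earlier. The key names reflect the agent's Linux configuration format as best understood here and should be checked against the agent documentation. The log path, log group name, and the helper name `build_agent_config` are placeholders.

```python
import json


def build_agent_config(log_path, log_group):
    """Return a CloudWatch agent configuration as a Python dict."""
    return {
        "metrics": {
            "metrics_collected": {
                "cpu": {
                    "measurement": ["cpu_usage_idle", "cpu_usage_user"],
                    "resources": ["*"],
                    "totalcpu": False,   # report each core, not only the total
                },
                "mem": {"measurement": ["mem_used_percent"]},
                "swap": {"measurement": ["swap_used_percent"]},
                "disk": {"measurement": ["used_percent"], "resources": ["*"]},
            }
        },
        "logs": {
            "logs_collected": {
                "files": {
                    "collect_list": [
                        {
                            "file_path": log_path,
                            "log_group_name": log_group,
                            "log_stream_name": "{instance_id}",
                        }
                    ]
                }
            }
        },
    }


# Serialize the configuration; the path and group name are placeholders.
config_json = json.dumps(
    build_agent_config("/var/log/messages", "example-system-logs"), indent=2
)
```

Writing `config_json` to a file gives a starting point that can be adjusted before it is handed to the agent.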

How it works

CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, and visualizes it using automated dashboards. This gives you a unified view of your AWS resources, applications, and services running on AWS and on-premises. You can correlate your metrics and logs to better understand the health and performance of your resources. You can also run experiments to see how end users interact with your application and to validate design decisions. Finally, you can create alarms based on metric value thresholds that you specify, or alarms that watch for anomalous metric behavior using machine learning algorithms.
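
A threshold-based alarm of the kind described above can be sketched with the boto3 `put_metric_alarm` call. The 80 percent threshold, the two five-minute evaluation periods, the alarm name, and the helper names are illustrative assumptions. The machine-learning (anomaly detection) variant is not shown here.

```python
def build_high_cpu_alarm(instance_id, threshold_percent=80.0, topic_arn=None):
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    request = {
        "AlarmName": f"high-cpu-{instance_id}",   # hypothetical naming scheme
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,               # evaluate 5-minute averages
        "EvaluationPeriods": 2,      # two consecutive periods must breach
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }
    if topic_arn:
        # Optional action, for example an SNS topic to notify.
        request["AlarmActions"] = [topic_arn]
    return request


def create_high_cpu_alarm(instance_id, topic_arn=None):
    """Create the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    request = build_high_cpu_alarm(instance_id, topic_arn=topic_arn)
    boto3.client("cloudwatch").put_metric_alarm(**request)
```

With these settings the alarm would fire once average CPU utilization stays above the threshold for roughly ten minutes.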