Picking a Monitoring System

The importance of monitoring cannot be overstated in a distributed system. When you cannot log in to a single machine and tail logs, being able to aggregate metrics from various systems becomes very important. Your mileage may vary, but here are a few factors I consider. Many SaaS monitoring services offer to take the work of maintaining a monitoring system off your hands. If you have the budget and (sometimes) the patience for them, go ahead, but I'm going to focus on the open source tools you can get started with. The goal here is to understand the fundamentals of a metrics/monitoring system. Understanding this may help you figure out what you need from a SaaS monitoring service too.

There are generally three parts to a monitoring service:

  • Collecting
  • Aggregating
  • Exposing
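To make the three stages concrete, here is a toy in-process sketch. The class name TinyMonitor, its method names, and the choice of statistics are all hypothetical and made up for illustration; a real system splits these stages across agents, a storage backend, and a dashboard or query API.

```python
from collections import defaultdict


class TinyMonitor:
    """Toy illustration of collect -> aggregate -> expose (hypothetical names)."""

    def __init__(self):
        # Raw samples per metric name, kept in memory only.
        self._samples = defaultdict(list)

    def collect(self, name, value):
        # Collecting: accept one raw data point from an application.
        self._samples[name].append(value)

    def aggregate(self):
        # Aggregating: reduce raw samples to a few summary statistics.
        return {
            name: {"count": len(vals), "sum": sum(vals), "max": max(vals)}
            for name, vals in self._samples.items()
        }

    def expose(self):
        # Exposing: render the aggregates as text that a human or a
        # dashboard could read, one "name.stat value" pair per line.
        lines = []
        for name, stats in sorted(self.aggregate().items()):
            for stat, value in stats.items():
                lines.append(f"{name}.{stat} {value}")
        return "\n".join(lines)
```

Calling collect() a few times and then printing expose() shows how raw points turn into summary lines, which is the same flow a full monitoring stack follows at a much larger scale.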

A metrics service exposes a TCP or UDP endpoint that accepts metrics in a particular protocol. The simplest one is perhaps Graphite, which exposes a TCP endpoint that takes newline-terminated messages in the format metric.name <value> <timestamp>, where the timestamp is in Unix epoch seconds.

Believe it or not, what you pick for your monitoring system depends heavily on what you are planning to use it for. If you want to slice and dice the data, you may want something with a query interface, like TSDB or InfluxDB.
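Graphite's plaintext protocol takes one line per data point: the metric path, a numeric value, and a Unix timestamp, separated by spaces and ended with a newline. A minimal sketch of a sender might look like the following. The function names, the metric path in the usage note, and the default host are assumptions for illustration; 2003 is the default port of Graphite's plaintext listener, so adjust it if your setup differs.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # default to the current time
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single data point over TCP.

    host and port are assumptions; point them at your own Graphite
    (carbon) plaintext listener.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric("web01.requests.count", 42) would write a single line to a listener on localhost. Opening a connection per metric keeps the sketch short; a real client would typically reuse the connection or batch several lines together.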
