Monitoring Tools Changes


Over the last 15 years I have been tinkering with some excellent monitoring software tools. These, alongside other types of software, are today known as monolithic software. I first heard the term recently during Docker training. How times have changed! It made me feel somewhat outdated, which I am... not, so I started to research the next generation of tools to figure out why they are so hip and trendy. Here are a few that I have recently read about, some of which I have also tinkered with.

  • AWS CloudWatch (cloud monitor)
  • Google Stackdriver (cloud monitor)
  • Azure Monitor (cloud monitor)
  • Riverbed (apm)
  • AppDynamics (apm)
  • Splunk (log/siem)
  • Elastic (log/siem)
  • LogRhythm (log/siem)
  • New Relic (apm)
  • SolarWinds (network)

These tools are a mix of classic and open source software, and can be deployed in the cloud or on premise. One of the key findings for me was that, alongside the classic way of deployment, they offer cloud-based monitoring, API services and deployment of the tool itself in the cloud. This makes them more agile than the monolithic tools, and you don’t need a big army of people to deploy, configure and maintain them. That said, it does depend on the size of the organization’s requirements, so you still have to perform an environment analysis before you choose your tool. This is sometimes a painful process, but if you do all the upfront analysis work and define a monitoring strategy, you will have a good outcome.

Each tool has its pros and cons. The main difference today is that the DevOps world, which is the practice IT departments now use, needs agile ways of monitoring because of the speed of everything and continuous integration, so time to deploy, configure and use is a key requirement. Provisioning servers and software and then delivering used to take many months; for some use cases in the cloud it can now be done within hours.

The next generation of monitoring tools in the limelight offer dashboards, reporting, time series charts, anomaly detection (machine learning), alerts, application diagnostics, network sniffing, metric correlation and logging of raw logs (unstructured and structured data). They do all of this faster and make use of APIs. Look out for these features and map them to your use cases to ensure they meet your requirements.

The one new standout feature in some of the tools is the machine learning aspect. It’s a new way of monitoring that tries to find the needle in the haystack, because the volumes of data are too large for humans to trawl through. For log analysis, anomaly detection is used, and it’s a great feature. Another way of getting to the data you want is to run search queries against the logs to provide useful insights, and some of these tools provide a great way to get the data ingested.
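
To give a feel for what anomaly detection means in practice, here is a minimal sketch in Python. It is not what any of the tools above actually run; it is a much simpler statistical stand-in that flags a value when it sits far from the average of the previous few readings. The function name, the window size, the threshold and the sample error counts are all my own assumptions for illustration.

```python
from statistics import mean, stdev


def find_anomalies(counts, window=5, threshold=3.0):
    """Return the indexes whose value deviates strongly from the preceding window."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as unusual.
            if counts[i] != mu:
                anomalies.append(i)
            continue
        # Distance from the recent average, measured in standard deviations.
        if abs(counts[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical error counts per minute, taken from an imaginary log source.
errors_per_minute = [4, 5, 3, 4, 6, 5, 4, 48, 5, 4]
print(find_anomalies(errors_per_minute))  # [7]
```

The spike of 48 errors is the needle in this tiny haystack. Real products use far richer models, but the idea of comparing new data against a learned baseline is the same.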

Monitoring start-ups are using cloud (AWS), containers and open source components to build some of these new monitoring tools, and they provide good, fast monitoring services. It’s worth seeing whether these meet your requirements, but keep security in mind: sometimes the on-premise tools might be better suited because of the risk of your infrastructure being compromised.

Monitoring has always been, and always will be, a big challenge. Trying to monitor the full IT stack is a very difficult task. In the past some software projects would fail to even include monitoring in the project; today it should be a standard process. I go by the principle, learned over the years, that it’s never going to be 100% perfect: if you get 80% of it right (the 80/20 rule), then you are winning.

A Typical IT Monitoring Stack

  • End User (synthetic monitoring, web page & protocol use and performance, reports)
  • Applications (performance metrics, custom metrics, SQL queries)
  • On-Premise – Infrastructure (performance and availability, metrics, logging, logs, security, hardware)
  • Cloud – (service, compute, database, capacity, scaling, API, reports)
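
For the End User layer of the stack above, a synthetic check can be sketched in a few lines of Python using only the standard library. This is a rough illustration under my own assumptions, not how any particular product does it: the function name, the two-second response budget and the shape of the returned dictionary are all made up for the example.

```python
import time
import urllib.request


def check_endpoint(url, timeout=5.0, max_seconds=2.0, opener=urllib.request.urlopen):
    """Fetch a URL once and report whether it answered with 200 within the time budget."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except OSError as exc:
        # URLError and HTTPError are both subclasses of OSError.
        return {"url": url, "ok": False, "error": str(exc)}
    elapsed = time.monotonic() - start
    return {
        "url": url,
        "ok": status == 200 and elapsed <= max_seconds,
        "status": status,
        "seconds": elapsed,
    }
```

Calling check_endpoint with the address of a page you care about, on a schedule, gives you a basic availability and response-time signal. The opener parameter exists only so the function can be exercised without a network connection.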

With this stack in mind, consider: do you really need every metric and web application monitored? Do you really need an agent on every server? Do you really need every event or log (unless required for compliance)? These are a few things to think about. Controlling the monitoring environment is key, or it will get out of hand very quickly.
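
One way to picture this kind of control is a simple keep-or-drop rule applied before events are stored. The sketch below is only an illustration in Python: the severity names, the event fields and the idea that "auth" and "audit" sources are the compliance-relevant ones are all assumptions of mine, and your own rules will differ.

```python
SEVERITY_ORDER = {"DEBUG": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "CRITICAL": 4}


def should_keep(event, min_severity="WARNING", compliance_sources=frozenset({"auth", "audit"})):
    """Keep everything from compliance sources, otherwise only events at or above min_severity."""
    if event["source"] in compliance_sources:
        return True
    return SEVERITY_ORDER[event["severity"]] >= SEVERITY_ORDER[min_severity]


# Hypothetical events, as they might arrive from different parts of the stack.
events = [
    {"source": "web", "severity": "DEBUG", "message": "cache hit"},
    {"source": "web", "severity": "ERROR", "message": "upstream timeout"},
    {"source": "audit", "severity": "INFO", "message": "user login"},
    {"source": "db", "severity": "INFO", "message": "checkpoint complete"},
]

kept = [event for event in events if should_keep(event)]
print([event["message"] for event in kept])  # ['upstream timeout', 'user login']
```

Half of the events are dropped here, while the audit record survives even though it is only informational.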

Here are a few tips on getting the right monitoring outcome:

  • Work on your technical and business goals for monitoring
  • Identify the functional and non-functional requirements (ensure it’s future-proof, API-oriented, performant, easy to use, and scalable for the enterprise)
  • Architect and design the solution (HLD/LLD)
  • Plan the build activities
  • Test the solution
  • Develop tools skills
  • Create standards
  • Deploy the solution
  • Grow the solution: don’t use all the features at once, but exploit them over time
  • Configure as much automation as possible
  • Maintain the solution

No one tool or set of tools is going to give you the magic solution you want. Having a mix of tools (best of breed) can become a nightmare to manage and integrate, although with REST API services it’s becoming easier. Having a one-stop-shop platform can lead to vendor lock-in and can sometimes be cumbersome to use, but on the plus side it gives you a complete framework that all the teams (network/apps/db/os/dev) can eventually learn and exploit to their advantage. Sometimes it’s better to use a complete framework, and sometimes it’s better to use best of breed.
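
To show why REST APIs make the best of breed route easier, here is a small Python sketch of the integration work involved: taking alerts from two different tools and mapping them into one common shape. The payload layouts, field names and severity mappings are entirely hypothetical and do not come from any real product; in practice the payloads would be whatever each tool’s API returns.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Common alert shape that every team can read, whichever tool raised it."""
    tool: str
    host: str
    severity: str
    summary: str


def from_network_tool(payload: dict) -> Alert:
    # Hypothetical payload: {"node": ..., "level": 1-5, "text": ...}
    levels = {1: "info", 2: "info", 3: "warning", 4: "error", 5: "critical"}
    return Alert("network", payload["node"], levels[payload["level"]], payload["text"])


def from_log_tool(payload: dict) -> Alert:
    # Hypothetical payload: {"hostname": ..., "severity": "WARN", "message": ...}
    names = {"WARN": "warning", "ERR": "error", "CRIT": "critical"}
    severity = names.get(payload["severity"], payload["severity"].lower())
    return Alert("logs", payload["hostname"], severity, payload["message"])


alerts = [
    from_network_tool({"node": "core-sw-01", "level": 4, "text": "interface down"}),
    from_log_tool({"hostname": "web-02", "severity": "WARN", "message": "disk 85% full"}),
]

for alert in alerts:
    print(f"[{alert.severity}] {alert.tool}/{alert.host}: {alert.summary}")
```

Every extra tool means another small adapter like these to write and maintain, which is the management overhead to weigh against the lock-in risk of a single platform.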

The one thing I would suggest is to look at how easy the tool is to use and maintain. If you’re fighting with the tool to get it to do what you want in a timely manner, it’s not going to be a happy relationship. Use the proof of concept stages to validate the solution; this may cost a little more upfront, but in the longer term it can save you a lot in costs and headaches.
