Advice on syslog setup

I am looking for advice on setting up a remote syslog facility to receive syslog from our MAAS + SLURM system, which consists of a few thousand servers running as SLURM nodes.
The syslog clients need to be lightweight (no Java), since we don’t want to lose performance.

Notification/alarm functionality is desirable, as is visualization.

The primary use case is to correlate errors in the syslogs, by time, with the SLURM jobs running on the servers.
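To make the use case concrete, here is a minimal brute-force sketch of that correlation, assuming you already have parsed log events (host, timestamp, message) and job records (job id, allocated nodes, start and end time, e.g. as reported by SLURM accounting via `sacct`). The `LogEvent`/`SlurmJob` classes and the sample data are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogEvent:
    host: str
    timestamp: datetime
    message: str


@dataclass
class SlurmJob:
    # Hypothetical record; in practice the values would come from SLURM
    # accounting (job id, allocated nodes, start and end time).
    job_id: str
    nodes: frozenset
    start: datetime
    end: datetime


def correlate(events, jobs):
    """Return {job_id: [events logged on the job's nodes while it ran]}."""
    matches = {job.job_id: [] for job in jobs}
    for event in events:
        for job in jobs:
            # An event belongs to a job if it came from one of the job's
            # nodes and falls inside the job's [start, end] window.
            if event.host in job.nodes and job.start <= event.timestamp <= job.end:
                matches[job.job_id].append(event)
    return matches


# Hypothetical sample data for illustration only.
jobs = [
    SlurmJob("1001", frozenset({"node01", "node02"}),
             datetime(2019, 10, 28, 10, 0), datetime(2019, 10, 28, 11, 0)),
]
events = [
    LogEvent("node01", datetime(2019, 10, 28, 10, 30), "example error during job"),
    LogEvent("node03", datetime(2019, 10, 28, 10, 30), "other node, ignored"),
    LogEvent("node02", datetime(2019, 10, 28, 12, 0), "after job end, ignored"),
]
result = correlate(events, jobs)
print([e.message for e in result["1001"]])  # ['example error during job']
```

The nested loop is fine for a handful of jobs; with thousands of nodes you would want the log backend to do this filtering for you, which is largely what the stack question below is about.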

What is your take on a good stack/bundle here?

From a sending standpoint, cs:rsyslog-forwarder-ha is useful: deployed as a subordinate, it lands a configuration on each Juju machine that forwards logs to one or more rsyslog servers. There is also a cs:rsyslog server charm that can be used. My experience has only been with using the forwarder charm to send logs to non-Juju-managed upstream systems such as LogLogic, Splunk, etc.
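Before wiring up the charms, it can help to smoke-test that a collector actually receives what a client sends. A minimal sketch, assuming plain UDP syslog: it uses Python’s standard-library `SysLogHandler` as the sender, and a local UDP socket stands in for the remote rsyslog server (in practice you would point `address` at the real server, typically on UDP port 514). The logger name and message are hypothetical.

```python
import logging
import logging.handlers
import socket

# Stand-in for the remote rsyslog/Graylog collector: a local UDP socket
# bound to an ephemeral port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
host, port = receiver.getsockname()

# Sender side: the stdlib syslog handler (UDP by default) with facility local0.
handler = logging.handlers.SysLogHandler(
    address=(host, port),
    facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
)
logger = logging.getLogger("slurm-node-demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.warning("slurmd: test message")

# The datagram starts with <PRI>, where PRI = facility * 8 + severity
# (local0 = 16, warning = 4, so 132). The handler appends a NUL byte by default.
data, _ = receiver.recvfrom(4096)
print(data)  # b'<132>slurmd: test message\x00'

logger.removeHandler(handler)
handler.close()
receiver.close()
```

If the receiving side does not like the trailing NUL byte, setting `handler.append_nul = False` turns it off.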

For more advanced log management, there are a couple of stacks you could choose from. Both are based around the filebeat charm on each unit and the elasticsearch charm managing the log warehouse. Either Graylog manages the communication between Filebeat and Elasticsearch, or Filebeat sends directly to Elasticsearch and Kibana reads the Filebeat indices. It’s best to evaluate each of these stacks against your own use case.
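Filebeat ships events to Elasticsearch in batches through the `_bulk` API, so it can be useful to see what actually lands in the warehouse. A minimal sketch of the newline-delimited JSON (NDJSON) body such a request carries; the index name `syslog-2019.10.28` and the field names `@timestamp`, `host.name` and `message` are assumptions loosely modeled on Filebeat’s conventions, not what any particular charm configures.

```python
import json
from datetime import datetime, timezone


def to_bulk_ndjson(index, docs):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document is preceded by an action line naming the target index,
    and the body must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical log event, shaped roughly like a Filebeat document.
docs = [
    {
        "@timestamp": datetime(2019, 10, 28, 10, 30, tzinfo=timezone.utc).isoformat(),
        "host": {"name": "node01"},
        "message": "slurmd: example message",
    },
]
body = to_bulk_ndjson("syslog-2019.10.28", docs)
print(body)
```

Such a body is POSTed to `http://<elasticsearch-host>:9200/_bulk` with `Content-Type: application/x-ndjson`; Filebeat (or Graylog) does this for you, so the sketch is only meant to show the data shape.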

I believe the big-data teams may have some good documentation available for gathering and processing information at this scale using these charms.

I’m thinking of setting up Elasticsearch + Graylog for MAAS initially, since MAAS supports sending its syslog to a remote server. Apparently, Elasticsearch has native support for syslog-ng, which I hope MAAS can somehow work with (this is what the docs say: https://maas.io/docs/syslog).
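Whichever collector ends up receiving the syslog that MAAS forwards (a Graylog syslog input, syslog-ng, rsyslog), it has to split each line into fields before they can be indexed and searched. A minimal sketch of that step, assuming the classic RFC 3164 (“BSD syslog”) layout; RFC 5424 messages have a different header and would not match this pattern. The sample line is hypothetical.

```python
import re

# RFC 3164 layout: <PRI>Mmm dd hh:mm:ss HOSTNAME MSG
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<msg>.*)$"
)

# Severity codes 0..7 in RFC order.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def parse_rfc3164(line):
    """Split an RFC 3164 line into fields; return None if it does not match."""
    m = SYSLOG_RE.match(line)
    if m is None:
        return None
    pri = int(m.group("pri"))
    facility, severity = divmod(pri, 8)  # PRI = facility * 8 + severity
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "timestamp": m.group("timestamp"),
        "host": m.group("host"),
        "message": m.group("msg"),
    }


print(parse_rfc3164("<134>Oct 28 10:30:00 node01 maas.rpc: example message"))
```

Here PRI 134 decodes to facility 16 (local0) and severity 6 (info). Note that RFC 3164 timestamps carry no year or time zone, which matters when correlating across layers.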

Once this works, I will try to hook up SLURM to Elasticsearch, along with the rsyslog-forwarder subordinate you suggested.

This should then let me correlate events across the MAAS, server and SLURM layers in the same Elasticsearch instance.
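With everything in one Elasticsearch instance, correlating a job with its logs becomes a filtered search: events from the job’s nodes within its start/end window. A minimal sketch that builds such a Query DSL body, assuming the documents carry `host.name` and `@timestamp` fields (Filebeat-style naming; adjust to whatever your shipper actually writes). The node names and times are hypothetical.

```python
import json


def job_log_query(nodes, start_iso, end_iso):
    """Elasticsearch Query DSL body: events from `nodes` between start and end."""
    return {
        "query": {
            "bool": {
                # Filter context: no scoring, just include/exclude.
                "filter": [
                    {"terms": {"host.name": sorted(nodes)}},
                    {"range": {"@timestamp": {"gte": start_iso, "lte": end_iso}}},
                ]
            }
        },
        # Oldest first, to read the node logs in the order the job saw them.
        "sort": [{"@timestamp": {"order": "asc"}}],
    }


body = job_log_query({"node02", "node01"}, "2019-10-28T10:00:00Z", "2019-10-28T11:00:00Z")
print(json.dumps(body, indent=2))
```

The body would be sent to `<index-pattern>/_search` (for example `filebeat-*/_search`). Graylog and Kibana build equivalent queries from their search bars, so this is mainly useful for scripting the job-to-log lookup.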

What are your thoughts on this setup?

I’m not sure which charms are in the best shape for these systems, though, since there are already a few competing charms in the charm store…

I would love some advice on which charm store accounts to deploy from, especially for the Graylog charm: https://jujucharms.com/graylog/40 (is this the best version?)

I’m also having problems getting Graylog to run when I’m testing this on AWS…

I’ve filed some bug reports: https://bugs.launchpad.net/graylog-charm/+bug/1850211 and https://bugs.launchpad.net/graylog-charm/+bug/1850206

Not sure how to put heat on this…

I’ve commented on your bugs. I believe this comes down to properly sizing your AWS instances in the “machines” section of your bundle definitions, using appropriate instance types (flavors), and then working out an architecture for accessing systems deployed on the FAN network.

Accessing the Graylog API is tricky because the Graylog web UI makes your desktop browser call back to the API directly, rather than having the Graylog web server proxy your API requests. This means that the X-Graylog-API URL header needs to be updated. See the graylog charm documentation on the reverse-proxy relation for how to use an apache2 instance to provide a more seamless Graylog interface.

The promulgated cs:graylog charm is the correct and supported charm to use, and in general your bundle and relations look correct. However, you’ll need to provide more information about your VPC setup, networking, routing, etc. before we can give further feedback on resolving the API errors in lp#1850211.
