
Remstats was written by Thomas Erskine at the CRC in Canada; he is now looking for work.


Troubleshooting

[Almost all of the programs accept a -h or --help flag to print help, in case you've forgotten the options or don't know them. Most also accept a -d or --debug flag to set the level of debugging output.]

Tools

There are two tools which are specifically intended for finding problems:

  • check-config - You should always run this after making any configuration change. It won't find all possible problems with the configuration files, but if it doesn't like your configuration, you can be sure that nothing else will. If it prints nothing, all is well. It will also note when it is creating new RRDs and directories, but this is not an error.
  • check-rrdlast - checks the last-update time on all RRD files. Make sure that you have no RRDs marked TIMEWARP. These are RRDs with a last-update time in the future, and they won't accept new data until the clock passes that time. Anything marked STALE is also a concern if it's long overdue; these are RRDs which aren't getting updated, for some reason.
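If you want to spot-check a few RRDs without check-rrdlast, the TIMEWARP/STALE rule above can be sketched in Python. This is a minimal sketch, not check-rrdlast itself: it assumes rrdtool is on your PATH (its "last" subcommand prints an RRD's last-update time in seconds since the epoch), and the STALE threshold of three 300-second steps is an assumption chosen for illustration; check-rrdlast may well use a different rule.

```python
import subprocess
import sys
import time


def rrd_last_update(path):
    """Ask rrdtool for an RRD's last-update time (seconds since the epoch)."""
    result = subprocess.run(["rrdtool", "last", path],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def classify(last_update, now, step=300, stale_steps=3):
    """Label an RRD by its last-update time.

    TIMEWARP: the last update is in the future.
    STALE:    no update for more than stale_steps * step seconds
              (an assumed threshold, not necessarily check-rrdlast's).
    """
    if last_update > now:
        return "TIMEWARP"
    if now - last_update > stale_steps * step:
        return "STALE"
    return "OK"


if __name__ == "__main__":
    # Usage: python thisfile.py some.rrd other.rrd ...
    now = time.time()
    for path in sys.argv[1:]:
        print(path, classify(rrd_last_update(path), now))
```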

Are things running?

The first thing to check is whether run-remstats2 is being started by cron. Look in tmp for a file called LOG-run-remstats2. If it's there and contains nothing marked ERROR or ABORT, that's a good sign. Also check that the file's modification time is within the last 5 minutes, to make sure that it's being run at the expected times.
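Both of these checks can be scripted. Here is a minimal sketch, assuming the log is a plain text file and that "marked ERROR or ABORT" means the word appears somewhere on the line; the function names are hypothetical, and the 5-minute limit comes from the paragraph above.

```python
import os
import time

MAX_AGE = 5 * 60  # run-remstats2 should have run within the last 5 minutes


def flagged_lines(lines):
    """Return (line number, text) for each line mentioning ERROR or ABORT."""
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, 1)
            if "ERROR" in line or "ABORT" in line]


def is_fresh(mtime, now, max_age=MAX_AGE):
    """True if the file was modified no more than max_age seconds ago."""
    return now - mtime <= max_age


def check_run_log(path, now=None):
    """Return a list of problems with the run-remstats2 log (empty if none)."""
    if not os.path.exists(path):
        return ["%s does not exist; is cron starting run-remstats2?" % path]
    now = time.time() if now is None else now
    problems = []
    if not is_fresh(os.path.getmtime(path), now):
        problems.append("%s was not modified in the last %d seconds"
                        % (path, MAX_AGE))
    with open(path, errors="replace") as fh:
        for n, text in flagged_lines(fh):
            problems.append("line %d: %s" % (n, text))
    return problems
```

For example, check_run_log("<remstats-dir>/tmp/LOG-run-remstats2") should return an empty list when all is well; <remstats-dir> stands for wherever you installed remstats.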

The next thing to check is whether run-remstats2 or run-remstats has been told to run the collector at all.

  • For run-remstats, look at the collectors line in the general config-file.
  • For run-remstats2, look at the run-stages/collectors file and make sure that the appropriate line is uncommented.
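For the run-remstats2 case, here is a rough sketch of checking whether a collector's line is uncommented. It assumes that run-stages/collectors is a plain text file with one entry per line and that a leading "#" marks a commented-out line; the function names are made up for this example, and the collector names in it are only samples.

```python
import os


def enabled_lines(lines):
    """Return the lines that are neither blank nor commented out with '#'."""
    result = []
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            result.append(stripped)
    return result


def collector_enabled(lines, name):
    """True if an uncommented line names the collector (bare name or path)."""
    for line in enabled_lines(lines):
        if any(os.path.basename(token) == name for token in line.split()):
            return True
    return False
```

For example, open <remstats-dir>/run-stages/collectors and pass the file object to collector_enabled(fh, "ping-collector").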

Once the collector is running, you can check what data it found on the last run: look in data/LAST/<collectorname>. Its lines are all the data which that collector found on its last run.
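To see at a glance which collectors haven't written to data/LAST recently, you could compare file modification times. A sketch, assuming one file per collector directly under data/LAST and a hypothetical 300-second limit; the function names are made up. Collectors whose RRDs have longer step times won't update their data every run, so they may show up here without anything being wrong.

```python
import os
import time


def last_mtimes(last_dir):
    """Map each file name under data/LAST to its modification time."""
    return {name: os.path.getmtime(os.path.join(last_dir, name))
            for name in os.listdir(last_dir)
            if os.path.isfile(os.path.join(last_dir, name))}


def not_recent(mtimes, now, max_age=300):
    """Names whose LAST file is older than max_age seconds, oldest first."""
    old = [(mtime, name) for name, mtime in mtimes.items()
           if now - mtime > max_age]
    return [name for mtime, name in sorted(old)]
```

For example: not_recent(last_mtimes("<remstats-dir>/data/LAST"), time.time()).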

Remstats Services

First of all, make sure that the perl specified in the shebang line exists and is the right perl to use. If you're getting a message like "No such file or directory" and you can see that the script exists, check the first line of the script, which will look something like:

	#!/usr/bin/perl -w

Make sure that the specified perl (in the example, "/usr/bin/perl") exists, is executable by the user specified in the inetd configuration, and is the correct perl to use.
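The first two parts of that check can be sketched as follows. It's a hypothetical helper, not part of remstats. Note that os.access tests permissions for whoever runs it, so run it as the user from your inetd configuration. It can't tell you whether it's the correct perl; that part is still up to you.

```python
import os


def shebang_interpreter(first_line):
    """Return the interpreter path from a '#!' line, or None if there isn't one."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].split()
    return parts[0] if parts else None


def check_script(script):
    """Return a description of the problem, or None if the interpreter looks usable."""
    with open(script, errors="replace") as fh:
        interp = shebang_interpreter(fh.readline())
    if interp is None:
        return "%s has no #! line" % script
    if not os.path.isfile(interp):
        return "%s: interpreter %s does not exist" % (script, interp)
    if not os.access(interp, os.X_OK):
        return "%s: interpreter %s is not executable by this user" % (script, interp)
    return None
```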

If you're having difficulty getting data from remstats servers, you should test them directly, without the collectors. For the unix-status-server, you can check that you have access by running the following on the server host:

	% telnet localhost 1957
	UNAME
	GO
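If telnet isn't available, the same test can be scripted. A minimal sketch, assuming the server reads one command per line, as typed in the telnet session, and that 1957 is the port you configured; the probe function is hypothetical. A ConnectionRefusedError means nothing is listening on the port, while an empty reply with the connection closed by the server matches the tcp_wrappers symptom described below.

```python
import socket


def probe(host="localhost", port=1957, commands=("UNAME", "GO"), timeout=10):
    """Send commands as the telnet test does; return (reply_text, peer_closed)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # telnet sends CR LF at the end of each typed line
        sock.sendall("".join(cmd + "\r\n" for cmd in commands).encode("ascii"))
        chunks, closed = [], False
        while True:
            try:
                data = sock.recv(4096)
            except socket.timeout:
                break  # server is still connected but sent nothing more
            if not data:
                closed = True  # the other end closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace"), closed
```

For example, reply, closed = probe() on the server host, then print the reply.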

If you don't get connected, you'll have to look at your inetd configuration and the output it logs. You might have to run inetd with debugging flags to get it to log enough to be useful.

If you get connected and then shortly afterwards disconnected, that is the signature of tcp_wrappers refusing the connection: check /etc/hosts.allow. You ought to have lines for the remstats servers which you are running on that host:

	unix-status-server : collector-host localhost
	log-server : collector-host localhost

Note that inetd will refer to unix-status, because that's the service name from /etc/services. (I had to shorten the name from unix-status-server to unix-status, as I ran into a length limit on service names somewhere.) However, hosts.allow refers to program names, so it uses unix-status-server.
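A rough way to check hosts.allow for a given program and collector host is sketched below. It's a simplification of the hosts_access(5) rules: it only understands exact names and the ALL wildcard, and it ignores EXCEPT clauses, patterns, IPv6 addresses, continuation lines and hosts.deny. The function name is hypothetical.

```python
import re


def allowed(lines, program, client):
    """Rough check: does any hosts.allow rule name this program and client?"""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 2:
            continue
        # daemon and client lists may be separated by commas and/or spaces
        daemons = re.split(r"[\s,]+", fields[0].strip())
        clients = re.split(r"[\s,]+", fields[1].strip())
        if ((program in daemons or "ALL" in daemons)
                and (client in clients or "ALL" in clients)):
            return True
    return False
```

Remember to pass the program name (unix-status-server), not the service name (unix-status).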

Debugging Output

Almost all of the programs will take a -d ddd flag to set the debugging output level. Unless you're willing to read the code, you probably won't want more than level 1, but level 1 can be helpful.

If you're trying to figure out what's happening, it's helpful to run run-remstats2 interactively with -d 1, after disabling it in crontab. This will tell you which processes are being run and how long they're taking.

Follow that by running a collector interactively with the -d 1 flag, and possibly a -H xxx flag to restrict it to host xxx.

It's also good to know what the output of a collector looks like. You can get a general description under collectors, or you can just run a collector interactively and see what it comes up with. Remember that collectors won't even try to get information that they don't need yet, so if you want to see everything that the collector would get in a normal run, you might want to use the -F flag to force collection and/or -u to attempt even hosts which are down.

Useful Files

There are some useful files to look at:

  • data/ALERTS contains all the alert statuses for all variables, whether they have an alert defined or not. It is maintained by the alert-monitor.
  • data/IP_CACHE contains IP addresses which were looked up by the ping-collector during the current run of run-remstats2. This is to save extra DNS lookups for large sites. If you've re-numbered any hosts, you might want to look here to see which IP address remstats was using for a host.
  • data/LOGS is a directory containing log-files, one per day. Currently, they are kept for a bit less than a year. You can grep them for searches that showlog.cgi won't let you do. These show all events which remstats thought might be interesting.
  • data/NT is a directory with information about Windows NT hosts/domains collected by nt-discover.
  • data/TRACEROUTES contains all the aggregate traceroute information collected by do-traceroutes.
  • data/LAST has the data the last run of the various collectors collected, in files named after the collector. Check here to make sure that your data is being collected, if the graphs are getting no new data. Note that if you have RRDs with different step times, it is normal for the data for the longer step times not to show up on every run.
  • Under tmp are several interesting things. You'll find a file called LOG-run-remstats2 (or LOG-run-remstats if you're still running that instead). This file lists the processes that were run during the last run of run-remstats2 (or run-remstats). There is also a file called LOG-run-remstats2.old which has the same information for the run before.

    The file STATUS-run-remstats2 contains info on the progress of a running instance of run-remstats2.

    Under tmp/run-stages are the files containing the stderr output from the processes started by run-remstats2. In an ideal world, you'll only see files in here while run-remstats2 is running, but if it dies or you kill it, you may have files left in here which may give you an idea of what went wrong before it died.

  • The pseudo-host _remstats_ maintains information about the functioning of remstats itself; currently, this is mostly about the performance of the various collectors.
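Searching data/LOGS doesn't need more than grep, but if you want to do it from a script, here is a minimal sketch. It assumes the log files are plain text and sit directly under data/LOGS; the function names and the sample pattern are for illustration only.

```python
import os
import re


def matching_lines(lines, pattern):
    """Return (line number, text) for each line matching the regex."""
    regex = re.compile(pattern)
    return [(n, line.rstrip("\n")) for n, line in enumerate(lines, 1)
            if regex.search(line)]


def search_logs(log_dir, pattern):
    """Search every daily log file under data/LOGS, in file-name order."""
    hits = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path):
            with open(path, errors="replace") as fh:
                hits.extend((name, n, text)
                            for n, text in matching_lines(fh, pattern))
    return hits
```

For example: search_logs("<remstats-dir>/data/LOGS", r"somehost").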

Trouble reporting

Send your report to the mailing-list <remstats-list@lists.sourceforge.net>.

Always include the remstats version. You can find it in the VERSION file (just the version number) or the release file (status, number and date). Unless you've got a pre-release version, the VERSION file has what you need.

Run check-config interactively and include the output.

Describe what you ran to produce the error, including command-line args.

Include any error messages you received from cron, or found in any of the above files, even if you don't understand them.

Complain, in detail, about documentation or error messages which you don't understand. They're intended to be comprehensible.

All remstats error messages should be prefixed by ERROR: or ABORT:. (The only difference is that an abort is an error that remstats can't or shouldn't continue from.) Nothing should trigger any perl errors or warnings; if something does, complain about it.
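When gathering material for a report, a small helper can pick out the lines worth including. A sketch under stated assumptions: remstats messages contain ERROR: or ABORT: as described above, and perl warnings and errors take perl's usual "... at FILE line N." form. The function names are hypothetical, and the perl match is only a heuristic.

```python
import re

# perl diagnostics normally end with " at FILE line N."
PERL_DIAG = re.compile(r" at \S+ line \d+")


def classify_message(line):
    """Return 'ERROR', 'ABORT', 'PERL' or None for one line of output."""
    match = re.search(r"\b(ERROR|ABORT):", line)
    if match:
        return match.group(1)
    if PERL_DIAG.search(line):
        return "PERL"
    return None


def read_version(path):
    """Read the version number from the VERSION file."""
    with open(path) as fh:
        return fh.read().strip()
```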


Last updated Fri May 30 13:51:13 PDT 2003 by <terskine@users.sourceforge.net>.