- have to tell unreachable from a configuration error with
an oid that machine doesn't have.
- only way with rrdcgi pages would be to store alerts/status in individual
files, so that they could be sucked in with <RRDCGI::INCLUDE...>
- [DONE 19991203] 19991202 add has-an-alert status file (containing a red star
if there is an alert, or empty if not), and RRD::INCLUDE it on the index pages.
How about another which has a longer message, to RRD::INCLUDE on the main
host page. Need to modify alert-monitor and graph-writer.
- [DONE 19991203] 19991026 need reporting program. Point it at an rrd
and give it a timespan and it will report on min/avg/max for each
(specified) variable with sub (min/avg/max) per day, week, month and year.
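A minimal sketch of the per-day part of such a report, assuming the samples have already been fetched from the rrd as (timestamp, value) pairs. The function name, the UTC day boundaries, and the use of None for unknown values are assumptions made for illustration, not how the reporting program actually does it.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86400  # days are cut on UTC midnight in this sketch


def summarize_by_day(samples):
    """Return {day_start: (min, avg, max)} for (timestamp, value) samples.

    Samples whose value is None (unknown) are ignored.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        if value is None:
            continue
        day_start = timestamp - (timestamp % SECONDS_PER_DAY)
        buckets[day_start].append(value)
    return {
        day: (min(values), sum(values) / len(values), max(values))
        for day, values in sorted(buckets.items())
    }
```

The same grouping would work for week, month and year by changing how the bucket key is computed.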
- [DONE 19991208] 19991203 topology-change alert, from do-traceroute's data to tell when the
route to somewhere has changed (log it too)
- [DONE 19991208] 19991203 need some flag on rrd invocation (within [host]) to tell what the
bandwidth is, to set max limits to filter garbage. Should get updated when
config is changed. Where?
- can probably do it with a different script with substitutions in the script.
- working timeout in unix-status-server in do_df and in port-collector
which are the only ones I've seen problems with.
- program prints "host varname value", possibly multiple times,
(i.e. standard collector format)
- don't even need this if you're willing to insert said script into
do-remstats with the standard wrapping, but it makes new collectors
possible without editing remstats code.
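The "host varname value" output described above could be read with something like the following sketch. It follows only the three-field layout given in this note. The function name and the choice to skip malformed lines silently are assumptions.

```python
def parse_collector_output(text):
    """Parse lines of the form "host varname value" into tuples.

    The value is kept as a string and may itself contain spaces.
    Blank lines and lines with fewer than three fields are skipped.
    """
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # malformed line, ignored in this sketch
        host, varname, value = parts
        records.append((host, varname, value))
    return records
```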
- need to find a suitable documentation format. Need to be able to
generate plain-text and html output from a single source. (POD)
74) 20000410 [HIGH] implement host "via" with multiple network devices
- [DONE 20000420] now test that it does something useful.
- cisco-access-server-collector, which currently only supports
the ciscolinespeed rrd, should do this.
89) 20000425 a new CGI, say remstatsgraph.cgi which can be invoked as:
<IMG SRC="remstatsgraph.cgi?host=xxx&rrd=yyy&graph=zzz&time=aaa">
or
<IMG SRC="remstatsgraph.cgi?customgraph=xxx&time=yyy">
This isn't very efficient as the graph has to be re-generated each time the
page is fetched, but it allows remstats graphs on otherwise static pages.
80) [DONE 20000502] 20000413 [MEDIUM] make a new ping-* rrd
and modify ping-collector to deal with it for pinging multiple
interfaces on a host.
66) [DONE 20000201] 20000331 [MEDIUM] views. Make new html
trees populated by only what you specify. Sample config (old-style):
[view customer1]
tools ping
text varname some text here
oid varname someoid
and either
template customer
or lines specifying the desired graphs, like:
graph router1.my.domain snmpif-se0 graphname
customgraph graphname
33) [DONE 20000502] 19991206 [LOW] need to store graph config differently so that
graphs of the same name in different rrd's don't conflict. This will also
require renaming the graph image files to avoid over-writing.
90) [DONE 20000502] 20000428 [LOW] stale lockfiles collector.
Similar to the log-collector. You provide a list of lock-files and
how old they're allowed to be, and the lockfile-collector notes
what happened. Generalize to have it simply return file ages for
listed files, and use the usual alert mechanism to decide what's
a problem. Add it as a piece of the unix-status-collector, but
it doesn't need to run an external program.
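The generalized form (return file ages and let the alert mechanism judge them) might look roughly like this. It is a sketch only: the function name and the use of None for a missing file are assumptions, and it says nothing about how the unix-status-collector section is really written.

```python
import os
import time


def file_ages(paths, now=None):
    """Return {path: age_in_seconds} based on each file's mtime.

    A file that cannot be stat'ed (for example, it does not exist)
    gets None, so the caller can tell "no lockfile" from "old lockfile".
    """
    if now is None:
        now = time.time()
    ages = {}
    for path in paths:
        try:
            ages[path] = now - os.stat(path).st_mtime
        except OSError:
            ages[path] = None
    return ages
```

No external program is needed, which matches the note above. Deciding how old is too old stays with the alert thresholds.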
79) [DONE ????????] 20000413 [MEDIUM] consider alerts possible when a condition
returns to normal from an elevated level.
61) [DONE 20000516] 20000328 [LOW] new portinfo-collector. Sort of like the
port-collector, possibly replacing it. Sends a parameter-substituted
script (like port-collector), and picks out information via regex's
(like log-collector).
- done using the infopattern and valuepattern in a script associated
with a port-collector RRD.
91) [DONE 20000518] 20000518 [HIGH] form-based input for ping.cgi and traceroute.cgi
and whois.cgi when invoked without a query-string.
71) [DONE 20000524] 20000407 [MEDIUM] alert-email needs host and
rrd descriptions to add to output.
72) [DONE 20000524] 20000407 [MEDIUM] add host and rrd descriptions
to the list of things that do_subs does
93) [DONE 20000529] 20000525 [MEDIUM] template facility for alert-email. Two kinds.
One for differing templates depending on addressee, so that pager-like
things can be given a shorter message. The other is for rrd-specific
templates, so that the message can actually mean something to people
who don't know anything about remstats.
88) 20000420 [MEDIUM] indextype (flow|rrdrow) to allow all the
graphs from an rrd to show in one row of a table. This way all the
graphs from an interface line up with the same graph on the next interface on
this host. As a crude half-measure on the way to views.
- views weren't that difficult. I don't think that this is necessary?
95) [DONE 20000613] 20000613 [HIGH] showlog.cgi doesn't do
time-spans anymore
96) [DONE 20000613] 20000613 [HIGH] add previous/next
day/week/month buttons to log selection.
105) [DONE 20001018] 20000921 [HIGH] need ping-collector to add hosts with no
ping rrd to the list of uphosts, when running in pre-collector
mode, so they don't get skipped.
104) [DONE 20001018] 20000921 [HIGH] need to be able to note host status by
other than ping status. Use a line like:
status STATUS-port-ssh
to pull the status from the SSH reachability, instead of ping.
67) 20000405 [MEDIUM] host link, for host-specific info
- cancelled in favour of 102
84) 20000417 [LOW] No way to order the hosts on the index pages,
except re-ordering the actual directory entries.
- Now at least the entries are sorted (by char-set) [DONE 20000418]
- I think this is the right thing to do, and I'm going to leave it sorted
by host-name, within group.
101) [DONE 20001206] 20000707 [HIGH] re-write alert-email as a perl script to
compare time and make it more flexible.
- added new script alerter which determines the real destination
address and calls appropriate tiny scripts to deliver individual alerts.
100) [DONE] 20000627 [MEDIUM] traceroute will core if you use the -A flag
for routes which aren't in the routing arbiter database. (Thanks Steve.)
Need to fix this. Isn't this fixed?
35) [DONE 20001206] 19991209 [MEDIUM] need availability report for interfaces, systems,
ports at least, showing % availability over a user-specified time-span,
with sub-intervals within that. Sounds like a lot of code from
rrd-report could be used; maybe make it an option for rrd-report.
- or could dump to a form usable as import to an SQL database and
use some-one else's report generator.
??) [DONE] need a configure which at least looks like an autoconf configure
A simple one would just create fixup.config, using fixup.config.in
as a default.
- done quite a while ago.
117) [DONE 20010102] 20001229 [HIGH] add "I changed the graph description in the rrd but
nothing happened" to the FAQ.
113) [DONE 20010102] 20001212 [HIGH] host templates. So you can
configure similar hosts with
desc xxx
template yyy
Implementation is simple. Reserve UPPERCASE hostnames for templates,
and read them in with the read_config_hosts routine. It needs to be modified
to store the templates in $main::config{HOSTTEMPLATE}... instead of
$main::config{HOST}... so that this won't break everything that relies on
getting a list of hosts by keys %{$main::config{HOSTS}}. Then, when
a host refers to a template, copy the keys from the template hash.
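The template idea can be sketched with plain dictionaries standing in for the config hashes. Everything named here is hypothetical, and letting a host's own settings override the template's is an assumption. The note above only says the keys are copied.

```python
def expand_host_templates(entries):
    """Split config entries into hosts and templates, then apply templates.

    entries maps a name to its settings. UPPERCASE names are treated as
    templates and kept out of the host list. A host with a "template"
    setting receives a copy of that template's keys; the host's own
    settings win on conflict (an assumption of this sketch).
    """
    templates = {name: s for name, s in entries.items() if name.isupper()}
    hosts = {}
    for name, settings in entries.items():
        if name.isupper():
            continue
        merged = {}
        template_name = settings.get("template")
        if template_name is not None:
            # Raises KeyError for an unknown template name.
            merged.update(templates[template_name])
        merged.update(settings)
        hosts[name] = merged
    return hosts, templates
```

Keeping templates in their own mapping means that code which lists hosts never sees them, which is the point of storing them separately.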
119) [DONE 20010131] 20010130 [HIGH] - host status pages showing current values of everything:
rrd/variables and status files. Implement by datapage.cgi pages
automatically generated. Include indices, tools and status_header in
the page.
118) [DONE 20010227] 20010124 [LOW] - cleanup.sh to run out of cron infrequently to
prune old log-files and junk in the tmp directory
68) [DONE 20010227] 20000406 [MEDIUM] write new-unix-status-host. It requires that the
host already be running the unix-status-server. It will parse the output
of the unix-status-collector after requesting everything.
123) [DONE] 20010409 [HIGH] - make snmp-collector give priority to unix-status-collector's
hardware and software collection.
124 [DONE 20010528] 20010525 [HIGH] - make new-config create the _remstats_ pseudo-host, including
making a group for it in groups.
125 20010525 [HIGH] - add to unix-status-server netstat functionality
the ability to show counts of processes in various TCP states.
120 [DONE 20010608] 20010202 [MED] - collect inventory from NT
machines at least, consider unix boxen too. Start by looking at
the raw output of the collectors.
114) [DONE 20010608] 20001214 [HIGH] - make snmp-collector and
whatever else needs it produce/accept oidnames in mixed case, like
they must be specified in the "oid" lines.
108) [DONE before 20010608] 20001211 [MED] need "noavailability"
spec for host definition to drop availability calculation for hosts
which lack something which should usually have availability calculated.
126 [DONE 20010628] 200106022 [HIGH] - different coloured alert
links on quick index
116 [DONE 20010628] 20001229 [HIGH] - describe documentation
conventions. Some people don't read manualese. E.G. for [bracketed]
stuff being optional.
129 [DONE 20010823] 20010823 [HIGH] - add command to datastuff.pl
which pulls the alert status of a specified (host,rrd,variable). This
will permit making different formats of display of statuses, e.g.
by host, by rrd, for a single host, for a group, for all of them...
The only thing it doesn't get us is the addition of statuses to the
host indices. This would require 130.
136 [DONE 20011015] 20010927 [MED] - <VIEW::GRAPH...> of rrds with
characters which are invalid in a filename are inaccessible. For
example, df-/home.
140 [DONE pre-20011122] 20011012 [MED] - allow specification of alert
levels on the host so different hosts can have different levels.
148 [DONE pre-20010107] 20011220 [HIGH] - ping.cgi "Do that one again"
link is broken.
145 [DONE 20020107] 20011220 [HIGH] - remove un-configurable links
(i.e. home, top, remstats and rrdtool) and make all links configurable.
Of course, the default configuration will include them.
151 [DONE 20020109] 20020109 [HIGH] - make run-remstats use make_lockfile
149 [DONE 20020110] 20011221 [HIGH] - update unix-status-server.pod to
include docs on new linux /proc sections.
153 [DONE 20020117] 20020111 [HIGH] - ntop-collector to collect protocol
distributions via ntop.
152 [DONE 20020121] 20020110 [HIGH] - extension to graph.cgi to allow
specification of non-standard graph times and sizes. So that with the
correct wrapper, one could, e.g., make a graph that could be clicked on
either end to scroll in either direction, or the top and bottom to zoom
in and out, or somewhere else to increase or decrease the size.
155 [DONE 20020125] 20020122 [MEDIUM] - make recently-booted image have an
ALT attribute which contains the last boot time, via <RRD::INCLUDE>
154 [DONE 20020131] 20020118 [HIGH] - make CGIDIR and CGIURL fixup
variables and implement it so that the static CGI scripts can live
somewhere else.
144 [DONE 20020201] 20011220 [HIGH] - make page-writer, a more
flexible replacement for graph-writer based on and replacing
view-writer. Ideally, all pages will be created from
configuration-supplied templates.
156 20020123 [MEDIUM] - add to rrdcgi a
<RRD::FETCHDS rrdfile cf var [time [defaultvalue]]>
tag. Should be easy and would be very useful, to avoid having to
either code some pages as datapages or invoke a CGI for each value.
98) [DONE 20020201] 20000619 [MEDIUM] add group index files and
store hosts under group directories. For easier application of
access-controls. (for Florian)
- See 144 - page-writer for a more flexible solution
161 [DONE 20020208] 20020208 [HIGH,BUG] - fix get_rrd extra
158 [DONE 20020213] 20020201 [HIGH] - keys for rrds. Page-writer
can use them to do things for set of RRDs at the least.
167 [DONE 20020430] 20020422 [HIGH,BUG] - new-unix-hosts doesn't deal
with wild rrds
169 [DONE 20020510] 20020508 [HIGH] - add dbi-collector note to
configfile-rrds
14 [DONE before 20020510] ???????? [LOW,NEEDS:2,HOLD] make rrd structural
changes in config file get applied to the rrds.
- some taken care of with snmpif-setspeed, but need a more general solution
- look at new XML output of rrddump
- [DONE before 20020510] implemented in remstats-rrdtune.
163 [DONE before 20020510] 20020408 [HIGH] - update prerequisites.
Include DBI.
166 [DONE 20020510] 20020409 [HIGH] - remove alerter's dependency
on Sys::Hostname
132 [DONE before 20020510] 20010824 [HIGH,BUG] - get rid of the spikes
in uptime from the unix-status-server
- [DONE] fixed in unix-status-server. I think I've finally got
uptime parsed correctly.
160 [DONE 20020517] 20020208 [MEDIUM] - Add to rrd definitions,
information to be written to status files, using similar methodology
to that used for specifying the DSs themselves (steal from updater.pl).
And modify the collectors and rrd definitions to use this instead of
the ad-hoc writing of status files by collectors. The directive would
look something like:
currentvalue statusfilename collectorvarname
or maybe:
currentvalue statusfilename &function(collectorvarname)
- 20020515 - implemented config parsing part in remstats.pl, and re-named
the directive to avoid confusion with the hosts statusfile
directive.
139 [DONE before 20020517] 20011010 [MEDIUM] - cleanup needs to do
log-files too. It isn't doing them, even if it's intended to.
107) [DONE 20020517] 20000922 [MEDIUM] extra status header lines for
hosts, from specified STATUS files created by the various collectors. Add
lines to host definition like:
extrastatus "STATUS DESCRIPTION" STATUS-FILE-NAME
- DONE 20020517 - implemented as headerline directive in
host config-files.
60) 20000328 [MEDIUM,HOLD] replace route-collector with something which
scales. SNMPwalking bgp4PathAttrBest doesn't scale to large Internet
routers with 400 peers, taking over an hour to complete. (see also 61)
- look at a script to follow the output of zebra. That's a lot of
overhead though. Easy if zebra is solid.
- How difficult can it be to make a native BGP listener? I'm not clear on
the protocol, but it doesn't look too bad.
- [HOLD] As I don't need it, and have no access to anything which does.
42) 20000114 [MEDIUM,HOLD] snmp-collector mod to
allow summary data collected from a walk and then filtered as a single
data-point. E.G. specify a rrd "oid" like:
walk count ifOperStatus = 1
would produce a count of the number of interfaces on that device that
were active (i.e. had a live device plugged into them). Or a similar one
would let you count BGP routes, or arp addresses, ...
- Unfortunately, from experience with the snmp-route-collector, this is
going to be slow for anything with a large number of items.
- [HOLD] Until I think of something to use it for.
43) 20000114 [CLOSED] parallelizing the collectors, at least on a
group basis, preferably host or group.
- collectors must accept -G and -H flags to request processing of
the specified group or host, respectively. Run-remstats needs to fork
extra processes according to a config-file line, "parallel group" or
"parallel host".
- 20010831 TEE - implemented -H flags for all collectors except for the
remoteping-collector, which I'm not using anyway right now.
- See 159. Implemented as run-remstats2.
142 20011109 [CLOSED] - sar loader. Lightly munges the output of
"sar -h" (on linux) which looks very similar to remstats collector format.
- [CLOSED] replaced by 172 XXX
170 [DONE 20020517] 20020510 [HIGH] - fix dataimage.cgi to write temp
files under ~/tmp/dataimage, to make it more secure. Remember to
make the new directory and make it group webgroup, setgid webgroup,
probably in check-config.
172 [DONE 20020521] 20020510 [HIGH] - add sar section to
unix-status-collector. It will rely on sa1 being set up to collect
info every 5 minutes, like remstats, and will permit access to any of the
info collected by sadc. This isn't as simple as it ought to be, as
different sars do not implement the same options.
169 [DONE 20020528] 20020422 [MEDIUM] - remove ntop-collector's
dependence on LWP which is more stuff than I want to pull in.
177 [DONE 20020527] 20020521 [HIGH,BUG] - alert-flags are set incorrectly.
E.G. if a host has two alerts, one ERROR and one CRITICAL, the flag
may be set in yellow, indicating an ERROR level.
- [DONE 20020527] actually a bug in alertstuff.pl, now corrected.
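The intended behaviour, with the flag reflecting the most severe alert on the host, can be sketched as below. The level names follow the ok/warn/error/critical patterns mentioned elsewhere in this list. The function name and the upper-case spelling are assumptions, and this is not the actual alertstuff.pl fix.

```python
# Levels in increasing order of severity (assumed spelling).
ALERT_LEVELS = ["OK", "WARN", "ERROR", "CRITICAL"]


def worst_alert_level(levels):
    """Return the most severe level in levels, or "OK" if there are none."""
    worst = "OK"
    for level in levels:
        if ALERT_LEVELS.index(level) > ALERT_LEVELS.index(worst):
            worst = level
    return worst
```

With one ERROR and one CRITICAL alert, this yields CRITICAL regardless of the order in which the alerts are seen.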
176 [DONE 20020527] 20020524 [HIGH,BUG] - alerts don't go away. They
need to be expired properly. Probably I'll end up re-writing the
wretched alert-monitor.
- [DONE 20020527] Fixed, without re-write :-)
175 [DONE 20020528] 20020524 [HIGH,BUG] - The QMAIL-QSTAT
section of unix-status-server can have problems with a very large
queue. This isn't actually its fault; it's a bug in qmail-qstat.
With a very large queue, you will get the error:
/var/qmail/bin/qmail-qstat: /usr/bin/find: Argument list too long
To avoid this, unix-status-server needs to simulate qmail-qstat, or
replace it.
- implemented as QMAILQSTAT2 section
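One way to avoid the "Argument list too long" failure is to count the queue files inside the process instead of passing a long list of names to an external command. The sketch below shows only that idea. The function name is hypothetical, the queue paths in the usage note are assumptions about a typical qmail layout, and it is not a description of what the QMAILQSTAT2 section actually does.

```python
import os


def count_files(top):
    """Count regular directory entries (non-directories) under top.

    The tree is walked in-process, so no argument list is ever built
    and the size of the queue does not matter.
    """
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(top):
        total += len(filenames)
    return total
```

Assuming the queue lives under /var/qmail/queue, something like count_files("/var/qmail/queue/mess") and count_files("/var/qmail/queue/todo") would give the two figures a queue report normally shows.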
174 [DONE 20020528] 20020524 [HIGH,BUG] - macinfo.cgi URLs for hosts are
broken. Need to use absolute URLs.
178 [DONE 20020606] 20020606 [HIGH] - add magic cookie
for host's data directory. Say HOSTDATADIR. This will make it easier
to add extra files, which get <RRD::INCLUDE>ed into pages, without having
to hard-code the complete path yourself.
122 [DONE 20020610] 20010330 [HIGH] - rrd prog-* which tells
if a particular named process is running, using the ps section of
the unix-status-collector.
- see the rrd procname-*, and make it work
173 [DONE 20020611] 20020521 [HIGH] - improve graph.cgi
to allow zooming in/out (time scale), panning (time scale) and re-sizing.
Then add links to this everywhere to allow data browsing.
62) [DONE before 20020611] 20000329 [LOW] make different markers
for different levels of alert on quick-index.
97) [DONE 20020611] 20000616 [LOW] make port-collector or
check-config complain about having a script with ok/warn/error/critical
patterns but no send string. The port-collector will ignore patterns
unless there is a send string.
179 [DONE 20020612] 20020611 [HIGH] - rt-update
updates a temporary RRD file "frequently", say every 5 seconds.
Rt-update will be given (host,rrd), an update interval, and how long to
store the data for (for an RRA). It will be started in the background
and run until the RRA is full and then exit, or possibly until stopped.
It will use the RRD definition to determine the correct collector to
run to collect the data; the step will be overridden by the command-line
specified update interval. It will write an rrdcgi page which will
show the info, optionally with form variables supplying the various parts
which might be overridden (start, end, height, width). It should run the
collector a (configurable) number of times first, to confirm that it can
get updates at the frequency requested.
159 [DONE 20020613] 20020208 [HIGH] - Re-write run-remstats. Add a new
config sub-directory run-stages, containing files named for the stage
of run-remstats. Run-remstats will be controlled by a new config-file
called run. The run file lists, in order, the run-stages that run-remstats
will progress through. At each stage, it will read the file from run-stages
named for that stage.
There might be multiple ping collectors, say one for production hosts,
with async=no and another for less important hosts which are only pinged,
with async=yes. Or, there might be separate snmp collectors for large
routers or switches so that they can be done in parallel with all the
other snmp queries. Or multiple collector instances could use the
-G, -H and -T flags to make the collection more parallel by hosts,
groups or tags.
Each line will have several parts, at least:
- name: The name is used to pass to updater for collectors, and for
file-names. The instancename is used for file-names. This is to deal with
the need to run multiple instances of the same collector.
- asynch flag: whether or not to wait for this one to finish before
going on to the next stage. Any async processes will be waited for at the
end of run-remstats.
- frequency: how often to run this command, so that some things can be
done more frequently.
- command-line: what to run for this process. Doesn't include I/O
redirection as run-remstats2 will manage that.
Need to keep track of last-run time for each instance, to make frequency
work. Probably, make a subdir ~/tmp/run-stages to store them.
[This is a major re-working of run-remstats to make the collection
process very configurable, similar to what page-writer gained for
page generation, but more work.]
- 20020517 - mostly done, needs testing
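The frequency check that the last-run bookkeeping is meant to support could look roughly like this. It is a sketch under stated assumptions: the function name, the instance names in the usage note, and the rule that a never-run instance is always due are all hypothetical, and the real run-remstats2 may decide differently.

```python
def due_instances(stage_lines, last_run, now):
    """Return the names of instances that should run at time now.

    stage_lines: list of (name, frequency_in_seconds) in stage order.
    last_run:    {name: timestamp of last run}; a missing name means
                 the instance has never run and is therefore due.
    """
    due = []
    for name, frequency in stage_lines:
        previous = last_run.get(name)
        if previous is None or now - previous >= frequency:
            due.append(name)
    return due
```

For example, with instances "ping-prod" (every 300 seconds) and "snmp-big" (every 900 seconds) both last run 300 seconds ago, only "ping-prod" would be returned.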
182) [DONE 20020618] 20020617 [HIGH] - VAR::WILD and VAR::FIXED::WILD
tag in page-writer
- Needed to fix 181.
181) [DONE 20020618] 20020617 [HIGH,BUG] - "browse this graph" link broken
for wildcard RRDs. The graph-name needs to be the wild version, e.g. df-*
- DONE 20020618 - via 182
184) [DONE 20020621] 20020620 [HIGH] - access-control and config-file
for CGIs. There are getting to be too many, with more access to things
that outsiders shouldn't have access to.
183) [DONE 20020621] 20020617 [HIGH] - cleanup needs to
do $HTMLDIR/GRAPHS/TMP as well. Or they'll just increase without limit.
186) [DONE 20020627] 20020626 [LOW] - make an "again" button for ping.cgi
189) [DONE 20020711] 20020710 [HIGH,BUG] - run-remstats2 is not
removing temp files. Maybe not reporting non-empty ones.
192) [DONE 20020722] 20020721 [HIGH] - note that cgi access is controlled
by the access config-file
198 [DONE 20020814] 20020813 [HIGH,BUG] - remstats server timestamps
are broken. They all depend on the time on servers and collectors being in
sync. When they're not, the timestamps provided by the servers cause stale
times and timewarps. The best fix may be to have the collector apply its own
timestamps when it receives the data.
196 [DONE 20020814] 20020812 [HIGH] - add troubleshooting section to docs
Tell them to run check-config. Also note using run-remstats2
interactively with the -d 1 flag. And collectors with -d 1. And
looking at collector output.
195 [DONE <20020814] 20020726 [HIGH,BUG] - program-collector can't find
remstats.pl
197 [DONE 20020814] 20020812 [HIGH,BUG] - cleanup stops in glob
[Reported by Ecaroh]
- don't use glob; that's why I wrote list_files
194 [DONE 20020814] 20020726 [HIGH,BUG] - podhtml gets error at 140 for
snmp-showif.pod
- it was an error in a pod file's invocation of =exec
190 [DONE 20020814] 20020712 [HIGH] - make configure look for required
perl modules and optional ones too, and modify the makefile to not stop
for errors on optional stuff.
- Now it will note which programs use missing modules and not try to
check them, since they're bound to fail.
115) [DONE <20020816] 20001229 [HIGH] - need docs on errors.
Specifically, when run-remstats kills a collector for taking too long.
And where to find the output of the killed collector.
- DONE <20020816 - see troubleshooting
199 [DONE 20020819] 20020819 [HIGH] - add :port to snmp rrds for Ecaroh.
205 [DONE 20030313] 200302?? [HIGH] - uptimeflag has HTML in ALT tag.
201 20020820 [HIGH,BUG] - check_collect_time in remstats.pl is broken.
It's causing all collectors to skip about half their updates. This is
double-plus-ungood, and MUST be fixed.
- [DONE 200302??] - server time sync problem. Now collector does crude
time-sync
209 [DONE 20030508] 20030508 [HIGH] - make start for rrds be modulo
step to attempt to increase precision
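Read as "round the start time down to a multiple of the step", the change could be sketched as follows. That reading, and the function name, are assumptions. The note does not say which direction the rounding goes.

```python
def align_to_step(start, step):
    """Round start (seconds since the epoch) down to a multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return start - (start % step)
```

With a 300-second step, a start of 1234 becomes 1200, so every later update interval falls on a step boundary.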
Last updated Fri May 30 13:50:52 PDT 2003 by <terskine@users.sourceforge.net>.