Discussion:
Large scale ICMP & SNMP monitoring
a***@gmail.com
2008-06-05 11:20:06 UTC
I am looking for a tool that can monitor network devices using ICMP
and SNMP at large scale. The scale needed is over 100K nodes.

Using SNMP, I want to poll 3-5 parameters on a 2-minute cycle. For
ICMP, only one parameter will be polled in the same time frame.

As explained, each cycle should complete within this 1-2 minute time
frame, regardless of scale.
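
A quick back-of-the-envelope sketch of the load these numbers imply. It
assumes every SNMP parameter is fetched with its own request (several
values can share one GET, which would lower the figure) and that each
node also gets one ICMP echo per cycle. The function name is made up for
illustration.

```python
def required_rate(nodes, snmp_params, cycle_seconds):
    """Requests per second needed to finish one full cycle on time.

    Assumption: one request per SNMP parameter plus one ICMP echo per node.
    """
    requests_per_cycle = nodes * (snmp_params + 1)
    return requests_per_cycle / cycle_seconds


for params in (3, 5):
    rate = required_rate(nodes=100_000, snmp_params=params, cycle_seconds=120)
    print(f"{params} SNMP parameters: about {rate:,.0f} requests/s")
```

Under these assumptions a single collector would have to sustain roughly
3,300 to 5,000 requests per second, every second, with no idle time.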

I know I can solve the problem with multiple collectors, but that is
not my first choice. If it does come to that, I want as few collectors
as possible, and they should be integrated, controlled from a single
point, and share one management platform.

Does anyone know of such a solution?
d***@gmail.com
2008-06-07 08:29:36 UTC
I also needed a solution like this. I checked all the common platforms,
such as OpenView, Tivoli (Micromuse), and CA Unicenter, and they all
fell short on scaling. Some vendors suggested installing multiple
instances of their product. This made their solutions very costly and
also unmanageable, as they required multiple computers and a lot of
management.

I also looked at some open source tools (WhatsUp, Nagios and others),
but they did not scale to the number of objects I wanted (something
close to the numbers you mentioned).

While searching the web I found a company named "jilroy software",
www.jilroy.com. Judging by its site, scale is their focus. I have not
tried them yet, but they look very interesting.

Try looking there.

d.b
Jerry Wilborn
2008-06-30 19:28:12 UTC
I've just recently written something like this that will do basic ICMP
ping and SNMP v1/2c GET/GETNEXT requests. It's written in PHP5 and
I've done dry runs with 2,000 devices (10 ICMP packets each, 100 bytes
in the data portion). Time to send out all packets was 0.4 seconds and
packets are parsed in near-real time. I'd like to work with folks
that are running other large networks (legitimately) and improve the
utility.
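
For readers unfamiliar with what such a probe looks like on the wire,
here is a minimal sketch of building one ICMP echo request with a
100-byte data portion. It is written in Python for illustration, not
taken from the PHP5 tool; the helper names and the identifier value are
assumptions. Only the packet layout (type 8, code 0, checksum,
identifier, sequence) and the ones'-complement checksum are standard.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum over 16-bit big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes) -> bytes:
    """ICMP echo request: type 8, code 0, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload


# Hypothetical identifier; a poller might use its PID masked to 16 bits.
packet = build_echo_request(ident=0x1234, seq=1, payload=b"\x00" * 100)
print(len(packet))  # 8-byte header plus 100 bytes of data
```

Actually sending the packet requires a raw socket, which normally needs
elevated privileges, so that part is left out here.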

What kind of networks are you guys running?
abe.sterling
2008-07-16 14:00:09 UTC
Hi Jerry,

2,000 devices is a small number; we are looking at hundreds of
thousands of devices. I wonder, however, whether you wait for each
reply (i.e. work synchronously), and what happens if a device does not
answer. Does it take 4 sec to get all replies? Are the devices all
local?
Jerry Wilborn
2008-07-29 14:41:32 UTC
The devices I've been polling are a mixture of devices in the
datacenter(s) and equipment in far-flung places (within the United
States). The program splits off the various jobs into separate
processes. A 'polltime' is established (e.g. 30s).

1. Call a function called main().
2. The main function sets up a msgqueue and splits off two children
(Child A & Child B).
3. Child A listens to the msgqueue. If a 'response' msg from Child B-2
comes in, it looks inside the packet to make sure the packet 'id'
matches our PID; if so, it stores the response in an array. If an
'exit' msg comes in, it stores all the responses in a database and
exits.
4. Child B creates a blocking shared ICMP socket and forks Child B-2,
calls a function to send all the packets, sleeps for $polltime -
$length_of_time_to_send_all_packets, and then sends the 'exit' msg to
Child A.
5. Child B-2 listens to this socket. Any time it sees a reply come in,
it marks the time of the response and puts that into the msgqueue (to
be handled by Child A).

This isn't *exact*, but it's a very close approximation.
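
The steps above can be sketched roughly as follows. This is an
illustration in Python, not the PHP5 program, and it makes several
loudly stated substitutions: threads stand in for the forked children, a
local Unix datagram socket pair (with an echoing `fake_devices` thread)
stands in for the raw ICMP socket and the network, and the results stay
in a dict instead of going to a database. All names are made up.

```python
import queue
import select
import socket
import struct
import threading
import time

RECORD = struct.Struct("!HHd")  # poll id, sequence number, send timestamp


def fake_devices(sock, stop):
    """Stand-in for the network: echo every datagram straight back."""
    while not stop.is_set():
        readable, _, _ = select.select([sock], [], [], 0.05)
        if readable:
            sock.send(sock.recv(2048))


def listener(sock, msgq, stop):
    """'Child B-2': timestamp each reply and queue it for the collector."""
    while not stop.is_set():
        readable, _, _ = select.select([sock], [], [], 0.05)
        if readable:
            msgq.put(("response", sock.recv(2048), time.monotonic()))


def collector(msgq, poll_id, results):
    """'Child A': keep replies carrying our id until 'exit' arrives."""
    while True:
        kind, data, received = msgq.get()
        if kind == "exit":
            return  # the described tool stores results in a database here
        ident, seq, sent = RECORD.unpack(data)
        if ident == poll_id:
            results[seq] = received - sent


def poll(n_targets, polltime=0.3, poll_id=0x1234):
    """'Child B': send everything, sleep out the window, signal exit."""
    ours, theirs = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    msgq, results, stop = queue.Queue(), {}, threading.Event()
    workers = [
        threading.Thread(target=fake_devices, args=(theirs, stop)),
        threading.Thread(target=listener, args=(ours, msgq, stop)),
    ]
    child_a = threading.Thread(target=collector, args=(msgq, poll_id, results))
    for thread in workers + [child_a]:
        thread.start()

    started = time.monotonic()
    for seq in range(n_targets):
        ours.send(RECORD.pack(poll_id, seq, time.monotonic()))
    time_to_send = time.monotonic() - started

    time.sleep(max(0.0, polltime - time_to_send))
    stop.set()
    for thread in workers:
        thread.join()
    msgq.put(("exit", b"", 0.0))
    child_a.join()
    ours.close()
    theirs.close()
    return results


if __name__ == "__main__":
    rtts = poll(n_targets=50)
    print(f"{len(rtts)} of 50 replies received")
```

The point of the layout is that sending never waits on a reply: a host
that does not answer simply never shows up in the results, and the cycle
still ends on time. The sketch relies on `AF_UNIX` datagram sockets, so
it assumes a Unix-like system.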

So... all the echo requests go out, and the last one has slightly less
time to respond than the first one, but everyone gets a "fair" shot (I
can't imagine a 30s RTT being acceptable latency). Also, to clarify: it
takes four tenths of a second to send out 20,000 packets, so the last
packet to go out gets 29.6 seconds to respond before the listener
gives up. Context switching on the machine averages out to about
600/sec. The context switches obviously spike when the program is
polling and then drop to almost nothing when it's not.

I believe the program can realistically handle 15,000+ hosts on one
decently equipped machine with networking sufficient to handle the
packet load.
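
The timing figures above can be checked with a few lines of arithmetic.
This is only a sketch: the function names are invented, and the
15,000-host line assumes 10 packets per host and that send time grows
linearly with packet count, neither of which has been measured.

```python
def send_rate(packets, seconds_to_send):
    """Packets per second achieved while sending."""
    return packets / seconds_to_send


def last_packet_window(polltime, packets, rate):
    """Seconds the last packet sent has to be answered before 'exit'."""
    return polltime - packets / rate


rate = send_rate(packets=2_000 * 10, seconds_to_send=0.4)
print(f"send rate: about {rate:,.0f} packets/s")

# Measured case: 2,000 hosts x 10 packets with a 30 s polltime.
print(f"2,000 hosts: {last_packet_window(30, 20_000, rate):.1f} s")

# Assumed linear scaling to 15,000 hosts x 10 packets.
print(f"15,000 hosts: {last_packet_window(30, 150_000, rate):.1f} s")
```

With the reported numbers the send rate works out to about 50,000
packets per second, and even at 15,000 hosts the last packet would still
have around 27 seconds to be answered, provided the linear-scaling
assumption holds.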

Can you tell me more about your network?