Re: [ripe-ttraffic #15093] TTM to do list
Rene,
> Related to the "install more boxes" and "N^2" items,
> I think we need to revisit the design/configuration of
> the software running on the boxes (see appended list).
Yes, these are all valid/relevant questions. Initial comments, more in
September:
> - ControlDaemon:
>
> currently we construct a filter string for BPF packet capturing
> that consists of all hostnames of all boxes in the configfile.
>
> Question: does that scale? can BPF keep up with the influx
> of packets when we have a large (~100 boxes) filter string?
From looking at the code, it appears that pcap_compile() can handle
arbitrarily large filter strings, it simply malloc's more space. (OK,
that's not arbitrarily large, but it is close enough :-). This is
something we'd have to try though.
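To get a feel for the size of such a filter, here is a minimal sketch of
how the string could be built. It assumes the filter is a plain
"host A or host B or ..." expression; the real ControlDaemon may build it
differently. The function name and the box names are made up for
illustration.

```python
def build_bpf_filter(hostnames):
    """Join hostnames into one BPF expression: 'host a or host b or ...'."""
    if not hostnames:
        raise ValueError("need at least one hostname")
    return " or ".join(f"host {name}" for name in hostnames)


# Hypothetical box names, for illustration only (not the real configfile).
boxes = [f"tt{i:02d}.example.net" for i in range(1, 101)]
filter_string = build_bpf_filter(boxes)

# Report how long the expression gets for ~100 boxes.
print(len(boxes), "hosts ->", len(filter_string), "characters")
```

This only shows how long the string gets. Whether BPF keeps up with the
packet rate using the compiled filter still has to be measured on a box.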
> - Send_data:
>
> currently the scheduling sends at most one packet per second
> to any box. In 1999 we had to reduce the number of packets sent
> per host per day to avoid running in 'pulsar' mode.
>
> Question: when more boxes get deployed, do we need sub-second
> scheduling, or should we again reduce the number of packets
> sent per 'connection'?
There are 2 reasons why we have the current scheduling:
1. There should be a time-gap between 2 subsequent packets, to avoid
queueing delays on the ethernet card etc.
2. The O/S has a routine "usleep()", which sleeps for N microseconds
instead of N seconds. This one is (was?) not accessible from Perl;
Perl only has sleep(N), which pauses for N seconds.
Combining 1 and 2 led to the current send_data program. By using sleep()
from Perl, one gets a 1 second interval and this seemed like a nice gap.
You can argue that it can be less. It should not be 0 though.
Note that the current send_data creates a subprocess. This isn't a
problem with 1 packet/second. It might become a problem with a higher
rate.
To go to sub-second scheduling, one will have to re-write the send_data
process in a language that can access the usleep() function. At that
point, I can imagine that one would also like to get rid of creating the
subprocesses. In short, this means a new main() for SendPacket.
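As an illustration of what such a main loop could look like, here is a
minimal sketch of sub-second pacing in a single process, without
creating a subprocess per packet. It is written in Python only for
brevity; nothing here is the existing send_data or SendPacket code. The
names paced_send, send and gap_s are hypothetical, and send stands in
for whatever actually emits a test packet.

```python
import time


def paced_send(targets, gap_s, send, sleep=time.sleep, clock=time.monotonic):
    """Call send(target) for each target, keeping at least gap_s seconds
    between the scheduled start times of consecutive sends.

    sleep and clock are injectable so the pacing can be tested without
    real waiting.
    """
    if gap_s <= 0:
        raise ValueError("the gap between packets should not be 0")
    next_due = clock()
    for target in targets:
        delay = next_due - clock()
        if delay > 0:
            sleep(delay)          # fractional seconds are allowed here
        send(target)              # in-process call, no subprocess
        next_due += gap_s         # fixed schedule, so delays do not add up


if __name__ == "__main__":
    # Hypothetical targets; print stands in for the real packet sender.
    paced_send(["box-a", "box-b", "box-c"], 0.25, print)
```

Scheduling against a fixed timetable (next_due) rather than sleeping a
fixed amount after each send keeps the average rate stable even when a
single send takes longer than expected.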
> - Router:
>
> currently each testbox schedules 10 traceroute measurements
> to each configured destination in one hour.
>
> Question: when we expand the number of destinations, can we
> continue to do traceroutes at that rate? will we need
> sub-second scheduling?
To first order, we can go to 360 boxes here (3600 one-second slots per
hour, divided by 10 traceroutes per destination), though most of the
comments above apply.
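The arithmetic behind that estimate can be written down as a small
sketch. It assumes that traceroutes are started one at a time at a fixed
rate and that nothing else limits the box; the function names are
hypothetical.

```python
SECONDS_PER_HOUR = 3600


def max_destinations(traceroutes_per_dest_per_hour=10, launches_per_second=1.0):
    """Number of destinations that fit in one hour at a given launch rate."""
    slots_per_hour = SECONDS_PER_HOUR * launches_per_second
    return int(slots_per_hour // traceroutes_per_dest_per_hour)


def required_gap_seconds(destinations, traceroutes_per_dest_per_hour=10):
    """Gap between launches needed to serve this many destinations."""
    return SECONDS_PER_HOUR / (destinations * traceroutes_per_dest_per_hour)


print(max_destinations())           # one launch per second
print(required_gap_seconds(720))    # gap needed for twice as many destinations
```

Beyond the one-per-second limit, either the gap has to drop below one
second or the number of traceroutes per destination per hour has to go
down.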
> - Resource problems
>
> On some occasions (e.g. connectivity problems) machine resource
> problems are unavoidable; no more processes / out of swap space.
>
> Question: do we want to leave it to the OS (when possible with a
> little help from CFE) to recover from that, or do we want to
> (re)design datataking software such that it will never exceed
> preconfigured limits of #processes (and thus amount of swap).
Yes :-)
* Whatever we do, machines can run out of resources and the O/S should
be set up such that it recovers from that.
* We should put limits on the number of unfinished subprocesses. My
feeling is that the new traceroute flag helped a lot here, but I cannot
quantify this. I'm definitely planning to look at the other suggestions
in this thread (#21155) as well.
(This is only part of the answer, in fact, it isn't an answer. I've added
an item to the to-do-list...)
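One way to put a limit on unfinished subprocesses is sketched below.
This is only an illustration of the idea, not the existing data-taking
code: the class name BoundedLauncher, the limit, and DEMO_CMD (a
stand-in for a slow traceroute) are all hypothetical. When the limit is
reached, a new measurement is skipped rather than queued, so the number
of processes (and thus the swap they use) stays bounded.

```python
import subprocess
import sys

# Stand-in for a slow measurement command; not a real traceroute call.
DEMO_CMD = [sys.executable, "-c", "import time; time.sleep(30)"]


class BoundedLauncher:
    """Start subprocesses, but never keep more than max_children unfinished."""

    def __init__(self, max_children):
        self.max_children = max_children
        self.children = []

    def reap(self):
        """Forget children that have already exited."""
        self.children = [p for p in self.children if p.poll() is None]

    def launch(self, argv):
        """Start argv if below the limit; return the process, or None if skipped."""
        self.reap()
        if len(self.children) >= self.max_children:
            return None
        proc = subprocess.Popen(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        self.children.append(proc)
        return proc

    def shutdown(self):
        """Kill and collect all remaining children."""
        for proc in self.children:
            proc.kill()
            proc.wait()
        self.children = []
```

Skipping a measurement loses one data point, which seems preferable to
a box that runs out of processes during a connectivity problem; whether
that trade-off is acceptable is a separate question.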
Henk
------------------------------------------------------------------------------
Henk Uijterwaal Email: henk.uijterwaal@ripe.net
RIPE Network Coordination Centre WWW: http://www.ripe.net/home/henk
Singel 258 Phone: +31.20.535-4414, Fax -4445
1016 AB Amsterdam Home: +31.20.4195305
The Netherlands Mobile: +31.6.55861746
------------------------------------------------------------------------------
A man can take a train and never reach his destination.
(Kerouac, well before RFC2780).