Re: [ripe-ttraffic #15093] TTM to do list


Rene,

> Related to the "install more boxes" and "N^2" items,
> I think we need to revisit the design/configuration of
> the software running on the boxes (see appended list). 

Yes, these are all valid/relevant questions.  Initial comments, more in
September:

> - ControlDaemon:
> 
>     currently we construct a filter string for BPF packet capturing 
>     that consists of the hostnames of all boxes in the configfile.
> 
>     Question: does that scale? can BPF keep up with the influx
>     of packets when we have a large (~100 boxes) filter string?

From looking at the code, it appears that pcap_compile() can handle
arbitrarily large filter strings; it simply malloc's more space.  (OK,
that's not arbitrarily large, but it is close enough :-).  We'd have to
try this to be sure, though.
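
To get a feel for the size of such a filter, here is a minimal sketch of
how the string could be put together.  It assumes the filter is a plain
"host A or host B or ..." expression; the hostnames and the function name
build_host_filter are made up for illustration and are not taken from
ControlDaemon.

```python
def build_host_filter(hosts):
    """Join hostnames into one BPF expression: 'host a or host b or ...'."""
    if not hosts:
        raise ValueError("need at least one host")
    return " or ".join(f"host {h}" for h in hosts)


# Hypothetical names; the real list would come from the config file.
boxes = [f"tt{n:02d}.example.net" for n in range(1, 101)]
expr = build_host_filter(boxes)
print(len(boxes), "hosts,", len(expr), "characters")
```

This only shows how long the expression gets for ~100 boxes.  Whether the
compiled filter keeps up with the packet rate still has to be measured.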


> - Send_data:
> 
>     currently the scheduling sends at most one packet per second
>     to any box. In 1999 we had to reduce the number of packets sent
>     per host per day to avoid running in 'pulsar' mode.
> 
>     Question: when more boxes get deployed, do we need sub-second
>     scheduling, or should we again reduce the number of packets
>     sent per 'connection'?

There are 2 reasons why we have the current scheduling:

1. There should be a time gap between 2 subsequent packets, to avoid
   queueing delays on the ethernet card etc.

2. The O/S has a routine "usleep()", which sleeps for N microseconds
   instead of N seconds.  This one is (was?) not accessible from Perl;
   Perl only has sleep(N), which pauses for N seconds.

Combining 1 and 2 led to the current send_data program.  By using sleep()
from perl, one gets a 1 second interval and this seemed like a nice gap.
You can argue that it can be less.  It should not be 0 though.

Note that the current send_data creates a subprocess.  This isn't a
problem with 1 packet/second.  It might become a problem with a higher
rate.

To go to sub-second scheduling, one will have to rewrite the send_data
process in a language that can access the usleep() function.  At that
point, I can imagine that one would also like to get rid of creating the
subprocesses.  In short, this means a new main() for SendPacket.
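
As a rough sketch of what such a loop could look like (not the actual
send_data or SendPacket code): packets are sent in-process, with a
sub-second but non-zero gap between them.  The names send_paced, send and
gap_s, and the 0.1 s default, are assumptions for illustration only.
Python is used here because its time.sleep() accepts fractional seconds.

```python
import time


def send_paced(targets, send, gap_s=0.1, sleep=time.sleep):
    """Call send(target) for each target, pausing gap_s seconds in between.

    send is called directly in this process; no subprocess is created.
    """
    if gap_s <= 0:
        raise ValueError("the gap between packets must be greater than 0")
    for i, target in enumerate(targets):
        if i:
            sleep(gap_s)  # keep a time gap between 2 subsequent packets
        send(target)


if __name__ == "__main__":
    # Hypothetical targets; print stands in for the real send routine.
    send_paced(["box-a", "box-b", "box-c"], print, gap_s=0.25)
```

The sleep function is passed in as a parameter so that the pacing can be
checked without actually waiting.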



> - Router:
> 
>     currently each testbox schedules 10 traceroute measurements
>     to each configured destination in one hour.
> 
>     Question: when we expand the number of destinations, can we
>     continue to do traceroutes at that rate? will we need
>     sub-second scheduling?

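To first order, we can go to 360 boxes here (at one traceroute per second,
an hour has room for 3600 of them, i.e. 10 each to 360 destinations),
though most of the comments above apply.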
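
The 360 follows from simple arithmetic, assuming one traceroute is
started per second (the same 1-second scheduling as above).  The function
name and the slot_s parameter below are only for illustration:

```python
def max_destinations(per_hour=10, slot_s=1.0):
    """Destinations that fit in one hour at per_hour traceroutes each,
    when one traceroute is started every slot_s seconds."""
    return int(3600 / (per_hour * slot_s))


print(max_destinations())            # 360
print(max_destinations(slot_s=0.5))  # 720 with half-second scheduling
```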


> - Resource problems
> 
>     On some occasions (e.g. connectivity problems) machine resource
>     problems are unavoidable; no more processes / out of swap space.
> 
>     Question: do we want to leave it to the OS (when possible with a
>     little help from CFE) to recover from that, or do we want to
>     (re)design datataking software such that it will never exceed
>     preconfigured limits of #processes (and thus amount of swap).

Yes :-)

* Whatever we do, machines can run out of resources and the O/S should
  be set up such that it recovers from that.

* We should put limits on the number of unfinished subprocesses.  My
  feeling is that the new traceroute flag helped a lot here, but I cannot
  quantify this.  I'm definitely planning to look at the other suggestions
  in this thread (#21155) as well.
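
One possible way to enforce such a limit, as a sketch only: do not start a
new subprocess until the number of unfinished ones has dropped below a
preconfigured maximum.  The function run_bounded and its parameters are
hypothetical and not part of the current data-taking software.

```python
import subprocess
import sys
import time


def run_bounded(commands, max_procs, poll_s=0.05):
    """Run commands with at most max_procs unfinished subprocesses.

    Returns (peak number of simultaneous subprocesses, list of exit codes).
    """
    running, codes, peak = [], [], 0
    for cmd in commands:
        while len(running) >= max_procs:
            still_running = []
            for proc in running:
                if proc.poll() is None:
                    still_running.append(proc)
                else:
                    codes.append(proc.returncode)
            running = still_running
            if len(running) >= max_procs:
                time.sleep(poll_s)  # wait for a slot instead of forking more
        running.append(subprocess.Popen(cmd))
        peak = max(peak, len(running))
    for proc in running:
        codes.append(proc.wait())
    return peak, codes


if __name__ == "__main__":
    # Stand-in jobs; a real caller would pass e.g. traceroute commands.
    jobs = [[sys.executable, "-c", "pass"]] * 5
    print(run_bounded(jobs, max_procs=2))
```

This caps the number of processes (and thus the swap they can use), but it
does not replace having the O/S recover when resources do run out.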


(This is only part of the answer; in fact, it isn't an answer.  I've added
an item to the to-do list...)

Henk

------------------------------------------------------------------------------
Henk Uijterwaal                    Email: henk.uijterwaal@ripe.net
RIPE Network Coordination Centre     WWW: http://www.ripe.net/home/henk
Singel 258                         Phone: +31.20.535-4414,  Fax -4445
1016 AB Amsterdam                   Home: +31.20.4195305
The Netherlands                   Mobile: +31.6.55861746  
------------------------------------------------------------------------------

A man can take a train and never reach his destination.
                                               (Kerouac, well before RFC2780).
