
Re: [gpsd-dev] Clarifications needed for the time-service HOWTO


From: Eric S. Raymond
Subject: Re: [gpsd-dev] Clarifications needed for the time-service HOWTO
Date: Mon, 21 Oct 2013 13:34:07 -0400
User-agent: Mutt/1.5.21 (2010-09-15)

Other experts, feel free to chime in.

Gary E. Miller <address@hidden>:
> > Is it really true that most public NTP servers are Stratum 2, or
> > are there more layers in normal use?
> 
> Maybe most, but you'll see a lot of 1s and 3s.

I've been doing research. Revised text near the end of first section:

     You will hear time service people speak of "Stratum 0" (the reference
     clocks), "Stratum 1" (NTP servers directly connected to reference
     clocks over a path with known and compensated-for latency), and
     "Stratum 2" (publicly accessible servers that get time from Stratum 1
     over a network link). Stratum 3 chimers redistribute time from Stratum
     2, and so forth. Higher strata are defined up to 15, but you will
     probably never see a public chimer higher than Stratum 3.

     Ordinary client computers are normally configured to get time from one
     or more Stratum 2 (or less commonly Stratum 3) servers. With GPSD and
     a suitable GPS, you can easily condition your clock to higher
     accuracy than typical Stratum 2; with a little effort you can do
     better than public Stratum 1 servers.
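A minimal sketch (in C, since gpsd itself is written in C) of the stratum
arithmetic described above, assuming the usual NTP rule: a server's stratum
is one more than the stratum of the source it syncs to, and 16 means
"unsynchronized". The function name is hypothetical, not gpsd or ntpd code.

```c
/* Illustrative stratum arithmetic. Reference clocks are Stratum 0;
 * each hop away from them adds one. Valid synchronized strata stop at
 * 15; NTP reports 16 for "unsynchronized". */
#define STRATUM_MAX 15
#define STRATUM_UNSYNC 16

/* Hypothetical helper: stratum of a server that syncs to a source
 * running at upstream_stratum. */
static int stratum_from_upstream(int upstream_stratum)
{
    if (upstream_stratum < 0 || upstream_stratum >= STRATUM_MAX)
        return STRATUM_UNSYNC;  /* cannot be served as a valid stratum */
    return upstream_stratum + 1;
}
```

Under this rule a GPS-conditioned gpsd host counts as Stratum 1, and an
ordinary client of a public Stratum 2 chimer ends up at Stratum 3.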

If this is misstating the facts in any way - for example, if Stratum 3
and up servers are more common than we are implying here - someone
please speak up.

> > More generally: what can I discover about the quality of the chimers
> > I listen to?
> 
> Just compare several. 

"Just compare several".  How delightfully vague!  What I need to
document for the HOWTO is *how to do this*.  Concrete procedure.

(1) What reporting tool do I run?  

(2) Where among the numbers it will display for each chimer is
the figure of merit I should be paying attention to? 

(3) What do reasonable values of that figure look like?  What
do weird outliers look like?

It would be illuminating if you replied with a transcript of how
the report looks on your system and pointed out which numbers are
the significant ones.  If you can include a contrasting report 
from a system with bad chimers, please do.

> You should have at least 2, more likely 5 in
> at least one of your ntp.conf.  

Yup, I got that.  It's at the beginning of the new section on NTP
performance tuning.  Which we are now writing...

>                       Then the bad (to you) ones will just
> stand out.  Some are just bad, some will not have a good network
> connection to you and will appear bad.

That second sentence is *useful*.  New text:

    A chimer can be a poor performer (what the inventor of NTP whimsically
    calls a "falseticker") for either of two reasons. It may be shipping
    bad time, or the best routes between you and it may have large latency
    variations.  (Large but fixed latencies can be compensated out using a
    fudge.)
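One way to make "just compare several" concrete, sketched in C under stated
assumptions: take the offset each chimer shows against your clock and flag
any chimer whose offset sits far from the group median. This is a
hypothetical median heuristic, not ntpd's actual source-selection
(intersection) algorithm; the struct, function names and threshold are
illustrative only.

```c
#include <stdlib.h>

/* Hypothetical per-chimer record: offset (ms) between our clock and
 * that chimer, as reported by whatever monitoring tool is in use. */
struct chimer {
    const char *name;
    double offset_ms;
};

#define MAX_CHIMERS 64

/* qsort() comparator for doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median offset of the first n chimers (1 <= n <= MAX_CHIMERS). */
static double median_offset(const struct chimer *c, size_t n)
{
    double v[MAX_CHIMERS];
    for (size_t i = 0; i < n; i++)
        v[i] = c[i].offset_ms;
    qsort(v, n, sizeof v[0], cmp_double);
    return (n % 2) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

/* Set flagged[i] = 1 for each chimer more than limit_ms away from the
 * median offset. Returns the number flagged, or 0 if n is out of range. */
static size_t flag_outliers(const struct chimer *c, size_t n,
                            double limit_ms, int *flagged)
{
    if (n == 0 || n > MAX_CHIMERS)
        return 0;
    double med = median_offset(c, n);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        double dev = c[i].offset_ms - med;
        if (dev < 0)
            dev = -dev;
        flagged[i] = dev > limit_ms;
        count += (size_t)flagged[i];
    }
    return count;
}
```

A flagged chimer may be shipping bad time or may sit behind a route with
variable latency; this check alone cannot tell the two cases apart.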

> > How specific can we be about time jitter?  Is this a topic for the
> > HOWTO at all?
> 
> We can describe it, but since it is the error part, it will be 
> specific to chimers, time sources, networks and clients.

What sorts of jitter are produced by different parts of the 
delivery chain?  What do typical magnitudes look like?

On to a different topic...

> >    Those hotplug devices may, however, be able to use plain,
> >    non-kernel PPS. gpsd tries to automatically fall back to this when
> >    absence of root permissions makes KPPS unavailable. This fallback
> >    is complicated by the fact that gpsd needs to communicate with ntpd in
> >    a different way in root and non-root mode.  This complicates the
> >    configuration in ways beyond the scope of this document and is
> >    strongly discouraged in practice.
> > 
> > This paragraph troubles me. I'm not sure, but I think it may be
> > conflating two different issues and two sets of constraints. 
> 
> Yes, two related issues.  KPPS to PPS fallback, and the problems of
> fallback to non-root.  In general we should just discourage non-root 
> and say it is bad, do not do that.

I understand that you want to discourage non-root operation, and I'm
not arguing that we shouldn't.  But...

We are writing a ground-truth document here. In these it's bad
practice to mix policy and mechanism.  We should be clear about "what
happens if you do X" even if (perhaps especially if) we think X is
a bad idea.

There are several reasons for this, but at least one sufficient one
is that it helps the reader build an adaptable mental model rather 
than merely following instructions semi-blindly.

Here's how you do this sort of thing right.  First, supply 
motivation - why privilege-dropping happens:

    In order to present the smallest possible attack surface to
    privilege-escalation attempts, gpsd, when run as root, drops its root
    privileges very soon after startup - just after it has opened any
    serial device paths passed on the command line.

    Thus, KPPS can only be used with devices passed that way, not with
    GPSes that are later presented to gpsd by the hotplug system.  Those
    hotplug devices may, however, be able to use plain, non-kernel
    PPS. gpsd tries to automatically fall back to this when absence of
    root permissions makes KPPS unavailable.
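The open-then-drop ordering described above can be sketched as follows. This
is not gpsd's code: the helper name and the hard-coded UID/GID 65534
(commonly "nobody") are assumptions for illustration; a real daemon would
look up its configured user instead.

```c
#define _DEFAULT_SOURCE  /* exposes setgroups() on glibc and musl */
#include <fcntl.h>
#include <grp.h>
#include <unistd.h>

/* Hypothetical unprivileged IDs, for illustration only. */
#define UNPRIV_UID 65534
#define UNPRIV_GID 65534

/* Open every device named on the command line while still root, then
 * drop root. Descriptors opened before the drop stay usable afterwards;
 * any device that shows up later (hotplug) is opened without root.
 * Returns 0 on success (or if not root), -1 if the drop failed. */
static int open_then_drop(const char *const *paths, int n, int *fds)
{
    for (int i = 0; i < n; i++)
        fds[i] = open(paths[i], O_RDWR | O_NOCTTY);

    if (geteuid() != 0)
        return 0;  /* not root: nothing to drop */

    /* Order matters: groups and gid must change while we are still root. */
    if (setgroups(0, NULL) != 0 || setgid(UNPRIV_GID) != 0 ||
        setuid(UNPRIV_UID) != 0)
        return -1;
    return 0;
}
```

Anything that needs root must therefore happen on descriptors obtained
before the drop; that is the whole reason command-line devices and hotplug
devices behave differently.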

(Here comes the don't-do-that.)

    In general, if you start gpsd as other than root, the following
    things will happen that slightly degrade the accuracy of reported
    time:

    1. Devices passed on the command line will be unable to use KPPS and
    will fall back to the same plain PPS that all hotplug devices must
    use, increasing the associated error from ~1 uSec to ~5 uSec.

    2. gpsd will be unable to renice itself to a higher priority.  This
    action helps protect it against jitter induced by variable system
    load. It's particularly important if your NTP server is a general-use 
    computer that's also handling mail or web service or development.

    3. The way you have to configure ntpd and chrony will change away
    from what we show you here; ntpd will need to be told different
    shared-memory segment numbers, and chrony will need a different
    socket location.

    You may also find gpsd can't open serial devices at all if your
    OS distribution has done "secure" things with the permissions.
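A sketch of the shared-memory arithmetic behind item 3, assuming ntpd's SHM
refclock conventions (driver type 28 at pseudo-address 127.127.28.u, SysV
IPC key 0x4E545030 + u, units 0 and 1 created root-only and higher units
world-accessible) and the root/non-root unit split discussed in this
thread. The function names are hypothetical, not gpsd's.

```c
#include <stdbool.h>
#include <stdio.h>

/* SysV IPC key of SHM unit 0 ("NTP0"); unit u lives at base + u. */
#define NTPD_SHM_BASE 0x4E545030

/* ntpd creates units 0 and 1 accessible to root only, so a gpsd that
 * is not root has to fall back to units 2 and 3. */
static int first_shm_unit(bool running_as_root)
{
    return running_as_root ? 0 : 2;
}

/* IPC key for a given SHM unit. */
static long shm_key(int unit)
{
    return (long)NTPD_SHM_BASE + unit;
}

/* Write the ntp.conf "server" line pointing ntpd at one SHM unit. */
static int ntpconf_server_line(char *buf, size_t len, int unit)
{
    return snprintf(buf, len, "server 127.127.28.%d", unit);
}
```

So a gpsd started as non-root needs ntp.conf to reference 127.127.28.2
(and .3) rather than 127.127.28.0 (and .1).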

(Notice that the don't-do-that is presented in a way that increases 
the reader's options rather than decreasing them.  Now we transition
to "here is best practice".) 

    When in doubt, the preferred method to start your timekeeping is:

    $ su -
    # killall -9 gpsd ntpd
    # ntpd -gN
    # sleep 2
    # gpsd -n /dev/ttyXX
    # sleep 2
    # cgps

    where /dev/ttyXX is whatever 1PPS-capable device you have.  The
    rest of these setup instructions will assume that you are starting
    gpsd as root, with occasional glances at the non-root case.

> > Which set of ntpd segments GPSD can use is constrained by whether
> > it started up as root or not.
> 
> Worse, by whether it is root or not when initialized, which may be at
> hot plug time.

I believe this is incorrect. All shared-memory segments are opened in
ntpshm_init(), which is called before privilege-dropping and well
before gpsd begins accepting hotplug notifications.  Please review the
code to either verify this or point out where and why I'm full of crap.

> > 2) GPSD started as root; device is hotplugged. GPSD
> > will use privileged ntpd segments 0 and 1,
> 
> No.  It will use units 2 and 3.  Which is likely not what is in ntp.conf
> and in practice is not a fail.

Again, I believe this is incorrect.  
 
> > 3) GPSD started as non-root; device path either passed on command line
> > *or* hotplugged.  GPSD will use unprivileged ntpd segments 2 and 3; KPPS
> > will not work but plain PPS will.
> 
> Sort of, the ntp.conf must be changed to use units 2 and 3.

Understood, and covered in the revised language.

> The problem with just keeping the first sentence is the user is not
> left with an idea of the severity of the problems he will encounter.

Which is why the right thing to do is *document those problems
explicitly*. As I have done.

> We have seen that in the past where users try to run as non-root and
> have not understood the instructions to run as non-root are incomplete
> and problematic.  So if you keep the first sentence, then say if you are
> not root (hot plug or initialization) that is bad, unsupported and
> outside the scope, that could work.

I've refuted this in a couple of subtle ways above, here's where I hit
you over the head with a 2x4 to get your attention, ya ornery mule. :-)

What you have just enunciated is a recipe for documentation that
*sucks*.  I won't do it, and I *will* teach you how and why not to
fuck up like this if you're not utterly impervious.

When your content is "Do A and B and C, and if you wander off the narrow
path *dragons will eat you*", you are stiffing your users.  You are,
among other things, not supporting their ability to cope if reality
wanders outside of the scenarios you imagined when you were documenting.

*Good* documentation doesn't merely teach facts and procedures, it
nurses the ability to adapt and improvise intelligently.  It does this
by presenting a causal model that can be applied not merely when
things go right but when they go wrong - and not merely in the
exact circumstances the author had in mind but in conditions the
author didn't anticipate.  It conveys not just operation but
understanding.

Saying that a mode of operation is "unsupported" is justified when
that mode yields results that are random or dangerous.  It is *not*
justified when you are trying to avoid the discomfort of describing
options that you think are bad policy.  The reader's priorities may
be different than yours!  

Now reread my new text and notice how at every step it *creates
options*.  It doesn't say "don't do that!", it says "here are the
consequences if you do". Instead of walling the user in, each warning
gives him additional context with which to understand normal operation -
and with which to troubleshoot if things don't go as expected.
-- 
                Eric S. Raymond <http://www.catb.org/~esr/>
