Document: draft-ietf-6man-rfc4941bis-12

I have reviewed this document as part of the security directorate's 
ongoing effort to review all IETF documents being processed by the 
IESG.  These comments were written primarily for the benefit of the 
security area directors.  Document editors and WG chairs should treat 
these comments just like any other last call comments.

The summary of the review is: Ready with issues

High-level comments:

This is a great document! I apologize for taking so long to review it. My main comments are on the details of the address generation logic and heuristics, and how it ties into the larger threat model. In general, a clear description of the threat model these temporary addresses aim to address might be worth including, perhaps by expanding the Security Considerations.

Comments:

Section 1.

   The default address selection for IPv6 has been specified in
   [RFC6724].  The determination as to whether to use stable versus
   temporary addresses can in some cases only be made by an application.
   For example, some applications may always want to use temporary
   addresses, while others may want to use them only in some
   circumstances or not at all.  An Application Programming Interface
   (API) such as that specified in [RFC5014] can enable individual
   applications to indicate a preference for the use of temporary
   addresses.

I wonder if this should mention TAPS, which has discussed APIs for this sort of selection in the past. See https://tools.ietf.org/html/draft-ietf-taps-interface-10#section-5.2.13.

Section 1.2.

   The correlation can be performed by:

     <snip list>

This probably isn't exhaustive, so perhaps: "Correlation can be performed by a variety of attackers, including, though not limited to:" (or something to that effect).

Section 2.1.

   One of the requirements for correlating seemingly unrelated
   activities is the use (and reuse) of an identifier that is
   recognizable over time within different contexts.  IP addresses
   provide one obvious example, but there are more.  For example,

What about MAC addresses? As I understand it, most systems are moving towards MAC address randomization, though it's still probably worth mentioning. Likewise, similar to cookies, one could also mention TLS (or transport) layer identifiers, such as TLS session tickets. This is touched on somewhat in the Security Considerations.

Section 2.2.

   To make it difficult to make educated guesses as to whether two
   different interface identifiers belong to the same host, the
   algorithm for generating alternate identifiers must include input
   that has an unpredictable component from the perspective of the
   outside entities that are collecting information.

It seems like this "must" should be normative (MUST), and the text should probably reference RFC 4086 [https://tools.ietf.org/html/rfc4086].

Section 3.1.

   3.  New temporary addresses are generated over time to replace
      temporary addresses that expire.

I assume expiration here means that the address is deprecated, right? If so, that might be worth clarifying.

       4. <snip>
       
       The lifetime of temporary addresses must be statistically different
       for different addresses, such that it is hard to predict or infer
       when a new temporary address is generated, or correlate a newly-
       generated address with an existing one.

This "must" is not normative, right? I assume not, since the previous guideline in this item ("the lifetime of an address should be further reduced when privacy-meaningful events ... takes place") does not require all temporary addresses to cease working.
It might be better to drop the "or correlate a newly-generated address with an existing one" bit.

Moreover, what does "statistically different" mean, precisely? It might be more accurate to state this property from the perspective of the adversary. For example, I think this is trying to say that, given two different temporary addresses, an adversary must have only a negligible advantage in determining whether they belong to the same source or to different sources. (That would match better with the Randomized Interface Identifier algorithms given in Section 3.3.)

Section 3.2.

   This document also
   assumes that an API will exist that allows individual applications to
   indicate whether they prefer to use temporary or stable addresses and
   override the system defaults (see e.g.  [RFC5014]).

If a reference to TAPS is made for these APIs, it is probably also worth including here.

Section 3.3.

   The algorithm specified in
   Section 3.3.1 benefits from a Pseudo-Random Number Generator (PRNG)
   available on the system.

What does "benefits" mean here? If we're specifying an algorithm to generate random values, shouldn't a PRNG be *required*?
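To illustrate why I would expect a hard requirement here, the following is a minimal Python sketch of drawing an interface identifier directly from the operating system's cryptographically secure generator. It is not the algorithm from Section 3.3.1; the 64-bit identifier length and the helper names are my own assumptions, and checks for reserved identifiers are omitted.

```python
import secrets

IID_BITS = 64  # assumption: 64-bit interface identifier (e.g. for a /64 prefix)


def random_interface_identifier(bits: int = IID_BITS) -> int:
    """Return a fresh identifier drawn from the OS CSPRNG (hypothetical helper)."""
    return secrets.randbits(bits)


def format_iid(iid: int) -> str:
    """Render a 64-bit identifier as four colon-separated 16-bit hex groups."""
    raw = iid.to_bytes(8, "big")
    return ":".join(raw[i:i + 2].hex() for i in range(0, 8, 2))
```

Without a generator of this kind, there is nothing in the sketch that could supply the unpredictability the section relies on.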

Section 3.3.2.

This section assumes a "hash-based" algorithm, but is specified using a PRF. Later, in the text, it reads:

   F() could be the
   result of applying a cryptographic hash over an encoded
   version of the function parameters.

But a cryptographic hash is not a PRF. If the hash function is meant to be keyed, even that probably isn't sufficient. (Some constructions, like H(k || m) for secret k and input m, are vulnerable to length-extension attacks.)

I think it's probably safest to recommend a particular construction, such as HKDF with secret_key and output length equal to the number of bytes needed for the interface identifier.
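As a rough illustration of what such a recommendation could look like, here is a minimal Python sketch of HKDF (RFC 5869) over HMAC-SHA-256, used to derive an 8-byte interface identifier. The choice of SHA-256, the empty salt, the 8-byte output, and the input fields (prefix, net_iface, counter) are placeholders of mine, not the parameter list of F() in the draft.

```python
import hashlib
import hmac


def hkdf_sha256(key: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA-256: extract, then expand."""
    if length > 255 * 32:
        raise ValueError("requested output is too long for HKDF-SHA-256")
    # Extract: an empty salt defaults to a block of HashLen zero bytes.
    prk = hmac.new(salt or b"\x00" * 32, key, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i)
    okm, block, index = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([index]), hashlib.sha256).digest()
        okm += block
        index += 1
    return okm[:length]


def temporary_iid(secret_key: bytes, prefix: bytes, net_iface: bytes, counter: int) -> bytes:
    """Derive an 8-byte identifier (hypothetical helper; inputs are placeholders)."""
    # Length-prefix each field so that distinct inputs cannot encode identically.
    fields = (prefix, net_iface, counter.to_bytes(4, "big"))
    info = b"".join(len(f).to_bytes(2, "big") + f for f in fields)
    return hkdf_sha256(secret_key, b"", info, 8)
```

Because the key enters only through HMAC, this avoids the H(k || m) pattern, and the output length is an explicit parameter rather than a truncation left to the implementer.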

Moreover, the requirements for secret_key are not really strict enough. There is text about F(), for example:

   F() MUST
   also be difficult to reverse, such that it resists attempts to
   obtain the secret_key

And it is said that secret_key "SHOULD be of at least 128 bits," but what if it's less? What if it only has a single byte of entropy?
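For illustration, a stricter rule could look like the following minimal Python sketch, which generates the key from the OS CSPRNG and refuses anything shorter than 128 bits. The helper names and the hard failure are assumptions of mine, not behaviour described in the draft. Note that a length check cannot detect a long key with little entropy, which is why the sketch generates the key itself rather than accepting arbitrary input.

```python
import secrets

MIN_KEY_BYTES = 16  # 128 bits, the minimum the draft says secret_key SHOULD have


def new_secret_key(n_bytes: int = MIN_KEY_BYTES) -> bytes:
    """Generate a secret key from the OS CSPRNG (hypothetical helper)."""
    if n_bytes < MIN_KEY_BYTES:
        raise ValueError("secret_key must be at least 128 bits")
    return secrets.token_bytes(n_bytes)


def check_secret_key(key: bytes) -> bytes:
    """Reject externally supplied keys that are too short to be safe."""
    if len(key) < MIN_KEY_BYTES:
        raise ValueError("secret_key must be at least 128 bits")
    return key
```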

Section 3.4.

Constants here are used before they are defined. Moving Section 3.8 to somewhere before Section 3.4 might help.

What happens if the constants are chosen such that rule (5) cannot be satisfied?

Section 3.6.

   The frequency at which temporary addresses change depends on how a
   device is being used (e.g., how frequently it initiates new
   communication) and the concerns of the end user.  The most egregious
   privacy concerns appear to involve addresses used for long periods of
   time (weeks to months to years).  The more frequently an address
   changes, the less feasible collecting or coordinating information
   keyed on interface identifiers becomes.  Moreover, the cost of
   collecting information and attempting to correlate it based on
   interface identifiers will only be justified if enough addresses
   contain non-changing identifiers to make it worthwhile.  Thus, having
   large numbers of clients change their address on a daily or weekly
   basis is likely to be sufficient to alleviate most privacy concerns.

I don't disagree with the text, but is there anything we can cite here? Why do we think it's "sufficient," for example?

   Finally, when an interface connects to a new (different) link,
   existing temporary addresses for the corresponding interface MUST be
   eliminated, and new temporary addresses MUST be generated immediately
   for use on the new link.

If the addresses are eliminated, how does one run DAD and ensure that the same (or similar) addresses are not used on the new link?

Section 3.7.

   Devices implementing this specification MUST provide a way for the
   end user to explicitly enable or disable the use of temporary
   addresses.

Why is this a MUST, rather than a SHOULD? Since this is effectively describing an API, I think this ought to be relaxed.

Section 6.

   An implementation might want to keep track of which addresses are
   being used by upper layers so as to be able to remove a deprecated
   temporary address from internal data structures once no upper layer
   protocols are using it (but not before).

It seems an application might also want to consider other information linkable to select addresses in the future. For example, TLS resumption may link clients across two different temporary addresses. (This goes back to my comment on Section 2.1 above.)