Document: draft-ietf-mboned-driad-amt-discovery
Reviewer: Bernard Aboba
Review result: Ready with Nits

This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information.

When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

This draft is ready for publication from a transport point of view, with the exception of a few (relatively minor) issues:

Section 2.5.4.1

"The RECOMMENDED timeout is a random value in the range [initial_timeout, MIN(initial_timeout * 2^retry_count, maximum_timeout)], with a RECOMMENDED initial_timeout of 4 seconds and a RECOMMENDED maximum_timeout of 120 seconds."

[BA] The draft provides a justification for the initial_timeout value of 4 seconds, but not for the maximum_timeout value of 120 seconds, which seems somewhat high. I suspect the value is set this high for robustness against potential routing transients. It would be helpful to state the reasoning.

Section 2.5.4.2

"In some gateway deployments, it is also feasible to monitor the health of traffic flows through the gateway, for example by detecting the rate of packet loss by communicating out of band with receivers, or monitoring the packets of known protocols with sequence numbers. Where feasible, it's encouraged for gateways to use such traffic health information to trigger a restart of the discovery process during event #3 (before sending a new Request message).
However, to avoid synchronized rediscovery by many gateways simultaneously after a transient network event upstream of a relay results in many receivers detecting poor flow health at the same time, it's recommended to add a random delay before restarting the discovery process in this case. The span of the random portion of the delay should be no less than 10 seconds by default, but may be administratively configured to support different performance requirements."

[BA] There is good reason to be concerned about causing synchronized rediscovery as a result of a transient network event if "poor flow health" is diagnosed too readily. It would therefore be useful to have more specific advice on how "poor flow health" is defined, as well as on how to calculate the "random delay". My assumption is that we are talking about *major* and *sustained* loss here (e.g., lasting longer than most routing transients), as well as a *substantial* delay (to avoid instability).

Concerns unrelated to Transport

Security

Section 6.2

"There must be a trust relationship between the end consumer of this resource record and the DNS server. This relationship may be end-to-end DNSSEC validation, a TSIG [RFC2845] or SIG(0) [RFC2931] channel to another secure source, a secure local channel on the host, DNS over TLS [RFC7858] or HTTPS [RFC8484], or some other secure mechanism."

[BA] This paragraph mixes an end-to-end security mechanism (DNSSEC) with mechanisms such as DoT and DoH. The threats addressed by each mechanism are different (e.g., RR modification versus snooping), so it would be helpful to be clear about what the threat model is. Is there a privacy concern relating to unauthorized snooping of AMTRELAY RRs? Or is the concern primarily modification of the RRs?
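Returning to the timing comments on Sections 2.5.4.1 and 2.5.4.2: the Python sketch below shows what the quoted backoff rule and the jittered rediscovery delay compute. The helper names (timeout_upper_bound, retry_timeout, rediscovery_delay) and the optional fixed portion of the rediscovery delay are hypothetical; only the 4 s, 120 s and 10 s defaults come from the quoted text. It is an illustration under those assumptions, not the draft's or any implementation's code.

```python
import random

# Defaults quoted from the draft (Sections 2.5.4.1 and 2.5.4.2).
INITIAL_TIMEOUT = 4.0             # seconds
MAXIMUM_TIMEOUT = 120.0           # seconds
DEFAULT_REDISCOVERY_SPAN = 10.0   # seconds; default span of the random portion


def timeout_upper_bound(retry_count, initial=INITIAL_TIMEOUT, maximum=MAXIMUM_TIMEOUT):
    """Return MIN(initial * 2^retry_count, maximum).

    Doubling stops as soon as the cap is reached, so very large retry
    counts cannot overflow a float.
    """
    upper = initial
    for _ in range(retry_count):
        upper *= 2
        if upper >= maximum:
            return maximum
    return min(upper, maximum)


def retry_timeout(retry_count, initial=INITIAL_TIMEOUT, maximum=MAXIMUM_TIMEOUT, rng=random):
    """Draw the timeout uniformly from [initial, timeout_upper_bound(retry_count)]."""
    return rng.uniform(initial, timeout_upper_bound(retry_count, initial, maximum))


def rediscovery_delay(fixed=0.0, span=DEFAULT_REDISCOVERY_SPAN, rng=random):
    """Hypothetical helper: an assumed fixed portion (default 0 s) plus a
    uniformly random portion whose span defaults to 10 seconds."""
    return fixed + rng.uniform(0.0, span)


if __name__ == "__main__":
    print([timeout_upper_bound(n) for n in range(7)])
    # [4.0, 8.0, 16.0, 32.0, 64.0, 120.0, 120.0]
```

With these defaults the upper bound doubles from 4 s to 64 s over the first five attempts and is capped at 120 s from retry_count = 5 onward (4 * 2^5 = 128 s exceeds the cap). During a long outage the gateway therefore spends most of its time with the 120 s ceiling in effect, which is why the rationale for maximum_timeout matters.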
Overall utility

[BA] It is not clear to me why the AMTRELAY RR is needed, given that Section 2.3.1 makes it clear that querying this record is a last resort:

"It's only appropriate for an AMT gateway to discover an AMT relay by querying an AMTRELAY RR owned by a sender when all of these conditions are met:

   1. The gateway needs to propagate a join of an (S,G) over AMT, because in the gateway's network, no RPF next hop toward the source can propagate a native multicast join of the (S,G); and

   2. The gateway is not already connected to a relay that forwards multicast traffic from the source of the (S,G); and

   3. The gateway is not configured to use a particular IP address for AMT discovery, or a relay discovered with that IP is not able to forward traffic from the source of the (S,G); and

   4. The gateway is not able to find an upstream AMT relay with DNS-SD [RFC6763], using "_amt._udp" as the Service section of the queries, or a relay discovered this way is not able to forward traffic from the source of the (S,G) (as described in Section 2.5.4.1 or Section 2.5.5); and

   5. The gateway is not able to find an upstream AMT relay with the well-known anycast addresses from Section 7 of [RFC7450]."

In particular, DNS-SD RRs can easily be provisioned with DNS service providers, while this is not necessarily true of a new RR type such as AMTRELAY. So are there really situations in which it is not feasible to add DNS-SD RRs, but the AMTRELAY RR is more convenient or easier to deploy?
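The five conditions quoted from Section 2.3.1 amount to a single predicate: an AMTRELAY query is allowed only when every alternative has failed. The minimal Python sketch below encodes that, assuming a hypothetical GatewayState whose boolean flags each stand for one condition; the class, field and function names are illustrative and do not come from the draft.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GatewayState:
    """Hypothetical summary of what a gateway knows about one (S,G).

    Each flag maps to one of the five conditions quoted from Section 2.3.1;
    a flag that is True means the corresponding condition does NOT hold.
    """
    native_join_possible: bool            # condition 1
    connected_relay_serves_source: bool   # condition 2
    configured_relay_serves_source: bool  # condition 3 (False if no address is configured)
    dnssd_relay_serves_source: bool       # condition 4 (False if DNS-SD finds no usable relay)
    anycast_relay_serves_source: bool     # condition 5 (False if no RFC 7450 anycast relay works)


def unmet_conditions(state: GatewayState) -> List[int]:
    """Return the numbers of the Section 2.3.1 conditions that do not hold."""
    blocked_by = {
        1: state.native_join_possible,
        2: state.connected_relay_serves_source,
        3: state.configured_relay_serves_source,
        4: state.dnssd_relay_serves_source,
        5: state.anycast_relay_serves_source,
    }
    return [number for number, blocked in blocked_by.items() if blocked]


def may_query_amtrelay(state: GatewayState) -> bool:
    """An AMTRELAY query is appropriate only when all five conditions hold."""
    return not unmet_conditions(state)


if __name__ == "__main__":
    # A gateway whose DNS-SD lookup already found a usable relay:
    state = GatewayState(False, False, False, True, False)
    print(unmet_conditions(state), may_query_amtrelay(state))  # [4] False
```

Written this way, the question above becomes concrete: the AMTRELAY query is reached only in states where dnssd_relay_serves_source is False, so its added value depends on deployments where DNS-SD records cannot be provisioned but an AMTRELAY RR can.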