The terminology "RTO estimate" used throughout the document is confusing to me. The RTO itself is a concrete value, not an estimate; it is computed from estimates of the RTT and the RTT variation. You could talk about estimating the "optimal" RTO value (for some definition of optimal), but I don't think that's the case here. Similarly, Section 4.2 is titled "Measured RTO Estimate", but the RTO is not a measured quantity (it is always computed). I think this terminology needs to be corrected throughout the document. (Sketch 1 at the end of these comments shows which quantities I understand to be estimated versus computed.)

Section 3 seems important to me, but it doesn't say very clearly what it means by "generally applicable". Does that mean it could run across the Internet? Does it work if there are very short or very long delays, or only ones around the values mentioned in Appendix C? Does it work if the links have very little bandwidth? Is it efficient when there is very high bandwidth (e.g. in the Gbps range)? Since there are many classes of IoT device and many possible use cases, it seems important to be a little more explicit about the envisioned use cases, or at least the specific ones that have been explored to date, versus what hasn't been explicitly considered but might (or might not) also work. The appendix just uses the word "diverse" and mentions a couple of link technologies, but otherwise doesn't provide any enlightenment.

The first sentence in Section 4 doesn't make much sense to me, since the default timeout doesn't imply any knowledge of the RTT. Do you mean to say that a more appropriate RTO can be computed once some RTT samples are available? The wording could be clarified here.

The description at the beginning of Section 4.2 says that ambiguous samples resulting from retransmissions are used in the "weak" estimator, and seems to be saying that Karn's algorithm is not used for filtering samples (sketch 2 below illustrates the ambiguity involved). The rationale seems to be in Section 4.2.2, but the text there is vague. In general, using such samples would seem to result only in a potentially slower-than-necessary timeout, though still faster than the default. That seems inherently safe, and I'd think a stronger argument could be made than the current text offers. That said, the statement in this section that the rate of retries is reduced does not make sense to me: any time the RTO decreases, the rate of retries should increase, all other things being equal (sketch 3 below).

Is there sensitivity to the weights chosen for the EWMA? This has been studied a bit for TCP, but I guess it may be different for CoAP scenarios, since there are typically fewer samples, or something along those lines; sketch 4 below shows the kind of effect I have in mind.

Why is this being targeted at just Informational rather than Experimental or better? It's mentioned as being Informational in both the header and Section 1.1, but I didn't notice an explanation of why the WG thinks it wouldn't be a candidate for widespread use, etc. Is there a concern that needs to be described?
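
Sketch 1: to make the terminology point concrete, here is a rough RFC 6298-style update (I'm assuming the draft does something along these lines; the constants are the TCP defaults and are purely illustrative, and the clock-granularity and minimum-RTO details are omitted). The smoothed RTT and the RTT variation are the estimated quantities; the RTO is then computed from them.

    # RFC 6298-style update, assuming the draft follows something similar.
    # ALPHA, BETA, K are the TCP defaults, used here only for illustration.
    ALPHA, BETA, K = 1/8, 1/4, 4

    def update(srtt, rttvar, rtt_sample):
        # The RTT and its variation are the quantities being estimated...
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - rtt_sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * rtt_sample
        # ...while the RTO is simply computed from those estimates.
        rto = srtt + K * rttvar
        return srtt, rttvar, rto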
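
Sketch 2: the ambiguity that Karn's algorithm is normally used to avoid. This is a hypothetical illustration with made-up timestamps, not anything taken from the draft.

    # Hypothetical timestamps for an exchange that needed one retransmission.
    t_first_tx   = 0.0    # original request sent
    t_retransmit = 2.0    # retransmission sent after the timeout fired
    t_response   = 2.3    # response finally arrives

    # The response cannot be matched to a particular transmission, so the
    # RTT "sample" is ambiguous:
    sample_if_reply_to_first = t_response - t_first_tx    # 2.3 s (too long?)
    sample_if_reply_to_retx  = t_response - t_retransmit  # 0.3 s (too short?)

    # Karn's algorithm (as used for TCP) simply discards such samples; the
    # draft instead appears to feed them into the separate "weak" estimator.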
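
Sketch 3: the quick arithmetic behind the "rate of retries" comment. With a CoAP-style binary exponential backoff (MAX_RETRANSMIT defaults to 4), a smaller initial RTO yields more retransmissions of an unanswered request within any fixed time window, all else being equal. The function and numbers are just for illustration.

    def retries_within(window_s, initial_rto_s, max_retransmit=4):
        # Count retransmissions of an unanswered request within window_s,
        # doubling the timeout after each one (jitter ignored).
        t, rto, retries = 0.0, initial_rto_s, 0
        for _ in range(max_retransmit):
            t += rto
            if t > window_s:
                break
            retries += 1
            rto *= 2
        return retries

    print(retries_within(10, 2.0))   # default-ish 2 s RTO -> 2 retries
    print(retries_within(10, 0.5))   # reduced 0.5 s RTO   -> 4 retries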
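
Sketch 4: the kind of EWMA weight sensitivity I have in mind when only a few samples are available. The initial value and the samples are made up; the point is only that with a handful of samples the choice of weight largely determines how far the estimate has moved from its starting point.

    def ewma(samples, alpha, init):
        est = init
        for s in samples:
            est = (1 - alpha) * est + alpha * s
        return est

    samples = [0.20, 0.25, 0.22]          # three made-up RTT samples (seconds)
    print(ewma(samples, 1/8, 2.0))        # ~1.41 s: still near the 2 s start
    print(ewma(samples, 1/2, 2.0))        # ~0.45 s: converges much faster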