[afnog] A heads up on a nasty IPv6 bug

Andrew Alston Andrew.Alston at liquidtelecom.com
Mon Aug 15 08:44:00 UTC 2016


Just as a note:

An hour ago we converted all dynamic customers to static assignments and modified our systems to handle static provisioning when customers are provisioned (and to reclaim space from customers that churn).

So, yeah, we dumped the dynamic approach pretty quickly because it was causing issues!

Andrew


On 15/08/2016, 11:35 AM, "Jan Zorz" <zorz at isoc.org> wrote:

    Hey,
    
    Yes, this dynamic way of assigning IPv6 PDs causes operators around the
    world a lot of trouble until they decide to swallow the initial pain
    and change to static PD assignments... ;)
    
    Please see my comments inline...
    
    On 14/08/16 16:40, Andrew Alston wrote:
    > If you are automatically allocating usernames for PPPoE authentication
    > it's relatively simple to tie that username provisioning to static
    > assignments.
    > 
    > As an example, if your username ends in a number on your auto
    > provisioning system it's relatively simple to use some basic maths and
    > hex conversion to produce a static subnet that’s tied to it.
    
    Yes, usually some math formula is created to tie the PD to the username,
    then a script populates an additional field in your RADIUS database and
    after that the user always gets the same IPv6 PD "for life".
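As a minimal sketch of such a formula (not the scheme any operator in this thread actually uses): the aggregate `2001:db8::/32`, the /56 delegation size and the `cust000255` username pattern below are all assumptions chosen for illustration. The trailing number of the username is shifted into the bits between the aggregate and the PD boundary.

```python
import ipaddress
import re

# Assumed values for illustration: a documentation aggregate and /56 PDs.
AGGREGATE = ipaddress.IPv6Network("2001:db8::/32")
PD_LENGTH = 56


def static_pd(username):
    """Map the trailing number of a username to a fixed delegated prefix."""
    match = re.search(r"(\d+)$", username)
    if match is None:
        raise ValueError(f"username has no numeric suffix: {username!r}")
    index = int(match.group(1))
    slots = 2 ** (PD_LENGTH - AGGREGATE.prefixlen)
    if index >= slots:
        raise ValueError(f"customer index {index} does not fit in {AGGREGATE}")
    # Shift the index into the bits between the aggregate and the PD boundary.
    base = int(AGGREGATE.network_address)
    offset = index << (128 - PD_LENGTH)
    return ipaddress.IPv6Network((base + offset, PD_LENGTH))


print(static_pd("cust000255"))  # 2001:db8:0:ff00::/56
```

The resulting prefix would then be stored against the user in the RADIUS database, so the same username always yields the same PD.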
    
    If you have multiple aggregation or termination points, then some
    observation is needed beforehand so you can group users on the same
    termination point and give them PDs from the same aggregated prefix,
    but this is trivial.
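One way the grouping could look, as a sketch only: the termination point names (`bras-a`, `bras-b`), their /36 aggregates, the /56 delegation size and the shape of the observed session records are all hypothetical. Each user is pinned to the termination point they were seen on most often and then gets the next free PD from that point's aggregate.

```python
import ipaddress
from collections import Counter, defaultdict

# Hypothetical aggregates, one per termination point (documentation prefixes).
AGGREGATES = {
    "bras-a": ipaddress.IPv6Network("2001:db8:a000::/36"),
    "bras-b": ipaddress.IPv6Network("2001:db8:b000::/36"),
}
PD_LENGTH = 56  # assumed delegation size


def home_termination_point(sessions):
    """Return the termination point each user was seen on most often.

    `sessions` is an iterable of (username, termination_point) pairs,
    for example taken from accounting records.
    """
    seen = defaultdict(Counter)
    for username, point in sessions:
        seen[username][point] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in seen.items()}


def assign_pds(sessions):
    """Give every user the next free PD inside their home aggregate."""
    next_index = Counter()
    assignments = {}
    for user, point in sorted(home_termination_point(sessions).items()):
        aggregate = AGGREGATES[point]
        index = next_index[point]
        if index >= 2 ** (PD_LENGTH - aggregate.prefixlen):
            raise ValueError(f"aggregate for {point} is full")
        next_index[point] += 1
        base = int(aggregate.network_address)
        assignments[user] = ipaddress.IPv6Network(
            (base + (index << (128 - PD_LENGTH)), PD_LENGTH)
        )
    return assignments


observed = [
    ("alice", "bras-a"), ("alice", "bras-a"), ("alice", "bras-b"),
    ("bob", "bras-b"),
    ("carol", "bras-a"),
]
for user, pd in assign_pds(observed).items():
    print(user, pd)
```

Because every PD falls inside its termination point's aggregate, each termination point only needs to announce that one aggregate upstream.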
    
    > With regards to the subnet issue on the RA, I’ll respond to that later,
    > though perhaps Jan would also like to make some comments on this, since
    > his understanding of it is admittedly better than mine until I do more
    > testing and labbing
    
    Ok, let's un-dust the old saying:
    
    "In theory there is no difference between theory and practice. In
    practice there is."
    
    For sake of simplicity, let's say we have only 3 components:
    
    +--------+       wan +---------+ lan        +-----------+
    |  ISP   |-----------|   CPE   |------------|   host    |
    +--------+           +---------+            +-----------+
    
    ISP can be any access equipment you are using, for example BRAS or
    anything else.
    
    CPE has, for simplicity, just a WAN uplink and one LAN segment behind it
    for the home network.
    
    host is any device that we use on our home network - it can be computer,
    laptop, tablet, mobile phone, printer - anything that connects to our
    network and can autoconfigure IPv6.
    
    Theory:
    - CPE connects to ISP and gets the Prefix Delegation (PD) from ISP.
    - ISP installs a route for that PD segment towards the CPE wan interface
    - CPE provisions a /64 out of the PD on its LAN interface and starts
    sending out RA (Router Advertisement) packets with prefix information
    to the LAN
    - host connects to the LAN and sends out an RS (Router Solicitation)
    packet, which is answered by an RA packet containing prefix information.
    - host accepts the packet, generates IPv6 address(es), does the DAD
    process and if all good - sets up the IPv6 addresses and sets the
    default route to source IPv6 address of RA message - that is a
    link-local address of a CPE LAN interface.
    - now IPv6 traffic can start flowing.
    - ISP decides that PD must change, or something is wrong with wan link
    and the PD assignment process restarts (for example pppoe client restarts)
    - in this event CPE gets a different PD from ISP and needs to delete the
    old IPv6 address from the LAN port, as it is no longer in the assigned PD.
    - ISP installs a route for the new PD towards the CPE wan interface
    and removes the route for the old PD towards that CPE
    - CPE adds a new IPv6 address from a new /64 out of the new PD to LAN,
    deletes the old IPv6 address and sends an RA packet to the LAN link with
    the old prefix information and lifetime 0
    - all hosts that receive the RA packet with lifetime 0 must remove the old
    IPv6 address and stop using it.
    - now we have CPE with new IPv6 PD and all hosts on LAN link with just
    IPv6 addresses from new PD and world is beautiful and nice and a safe place.
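The "theory" sequence above can be sketched as a toy model. This is a deliberate simplification, not a SLAAC implementation: the `Host` and `CPE` classes, the interface identifier and the prefixes are invented for illustration, DAD and timers are left out, and an RA with lifetime 0 is treated as immediate removal, exactly as the steps describe.

```python
import ipaddress


class Host:
    """A LAN host that autoconfigures one address per advertised prefix."""

    def __init__(self, interface_id):
        self.interface_id = interface_id
        self.addresses = {}  # advertised /64 -> configured address

    def on_ra(self, prefix, lifetime):
        if lifetime == 0:
            # Simplification: drop the address as soon as lifetime 0 arrives.
            self.addresses.pop(prefix, None)
        else:
            self.addresses[prefix] = prefix[self.interface_id]


class CPE:
    """A CPE that puts the first /64 of its delegation on the LAN."""

    def __init__(self, lan_hosts):
        self.lan_hosts = lan_hosts
        self.lan_prefix = None

    def _advertise(self, prefix, lifetime):
        for host in self.lan_hosts:
            host.on_ra(prefix, lifetime)

    def receive_pd(self, pd):
        """Handle a (new) delegation: withdraw the old /64, announce the new."""
        new_prefix = next(pd.subnets(new_prefix=64))
        if self.lan_prefix is not None and self.lan_prefix != new_prefix:
            self._advertise(self.lan_prefix, lifetime=0)
        self.lan_prefix = new_prefix
        self._advertise(new_prefix, lifetime=86400)


host = Host(interface_id=0x1)
cpe = CPE([host])
cpe.receive_pd(ipaddress.IPv6Network("2001:db8:0:100::/56"))  # first PD
cpe.receive_pd(ipaddress.IPv6Network("2001:db8:0:200::/56"))  # PD changes
print([str(address) for address in host.addresses.values()])
```

In this ideal run the host ends up with a single address from the new PD. If the lifetime 0 call never reached the host, or the host ignored it, the old address would stay in `host.addresses` alongside the new one.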
    
    This was the theory, now let's see some practice and the real world - what
    can (and will) go wrong?
    
    We have 2 failure modes here:
    - host never receives the RA packet with lifetime 0 and ends up with IPv6
    addresses from both the old and the new IPv6 PD
    - host receives the RA packet with lifetime 0, but doesn't care much
    because its implementation is wrong (and this is happening, given the
    wide variety of end user devices that are in use on this earth)
    
    In both cases we end up with a device that has the option of using the
    wrong IPv6 address as the source address for the packets it sends.
    
    The source address selection mechanism is broken in this case and there
    is an ongoing discussion at the IETF on how to fix that, but it will take
    some time before any fix becomes a standard and even more time before
    it's implemented.
    
    Currently hosts select the source IPv6 address for packets in quite a
    variety of ways - some randomly, some "the address that was last
    allocated", some "the first one, as long as it's valid", and every other
    possible way, depending on OS and vendor.
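To make the consequence concrete, here is a small sketch of a host that holds one address from the old PD and one from the new. The three strategy names simply mirror the informal descriptions above; they are not the names or algorithms of any particular OS, and the prefixes are documentation addresses.

```python
import ipaddress
import random

STRATEGIES = ("random", "last_allocated", "first_valid")


def pick_source(addresses, strategy, rng=random):
    """Pick a source address from `addresses` (ordered oldest first)."""
    if strategy == "random":
        return rng.choice(addresses)
    if strategy == "last_allocated":
        return addresses[-1]
    if strategy == "first_valid":
        return addresses[0]
    raise ValueError(f"unknown strategy: {strategy}")


current_pd = ipaddress.IPv6Network("2001:db8:0:200::/56")
host_addresses = [
    ipaddress.IPv6Address("2001:db8:0:100::1"),  # left over from the old PD
    ipaddress.IPv6Address("2001:db8:0:200::1"),  # from the new PD
]

for strategy in STRATEGIES:
    source = pick_source(host_addresses, strategy)
    status = "ok" if source in current_pd else "stale, replies are lost"
    print(f"{strategy:15} {source}  {status}")
```

With "first_valid" the host keeps sending from the old prefix, "last_allocated" happens to work, and "random" fails some of the time, which is the kind of intermittent breakage that ends up at the help desk.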
    
    The problem is that after changing the PD the ISP has removed the route
    back to the CPE for the old PD, so replies to packets sent from the old
    addresses never arrive. Some of the ISPs that went the "less optimal"
    path down the IPv6 road and started with dynamic PD assignments did a
    quick fix: they put the old PD in "quarantine" and keep the old route back
    to the CPE for an additional 24 hours, so all old IPv6 addresses in the
    LAN segment behind the CPE expire and vanish. This is a quick hack that
    gets you a quasi-functional access network, so you can dedicate your time
    to planning a process that makes PD assignments properly static (also
    across BRAS reboots).
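The quarantine hack could be modelled roughly as below. This is a sketch of the idea only: the `RouteTable` class and its method names are invented here and do not correspond to any BRAS feature or configuration syntax; only the 24 hour hold time comes from the description above.

```python
import ipaddress
from datetime import datetime, timedelta

QUARANTINE = timedelta(hours=24)


class RouteTable:
    """Routes towards CPEs, keeping replaced PDs routable for a while."""

    def __init__(self):
        self.active = {}       # cpe -> currently delegated prefix
        self.quarantined = []  # (old prefix, cpe, expiry time)

    def delegate(self, cpe, prefix, now):
        old = self.active.get(cpe)
        if old is not None and old != prefix:
            # Keep the old route instead of deleting it right away.
            self.quarantined.append((old, cpe, now + QUARANTINE))
        self.active[cpe] = prefix

    def next_hop(self, destination, now):
        for cpe, prefix in self.active.items():
            if destination in prefix:
                return cpe
        for prefix, cpe, expires in self.quarantined:
            if now < expires and destination in prefix:
                return cpe
        return None


start = datetime(2016, 8, 15, 9, 0)
table = RouteTable()
table.delegate("cpe1", ipaddress.IPv6Network("2001:db8:0:100::/56"), start)
table.delegate("cpe1", ipaddress.IPv6Network("2001:db8:0:200::/56"), start)

old_address = ipaddress.IPv6Address("2001:db8:0:100::1")
print(table.next_hop(old_address, start + timedelta(hours=1)))   # cpe1
print(table.next_hop(old_address, start + timedelta(hours=25)))  # None
```

A real deployment would also have to keep the quarantined prefix out of the free pool for the same period, otherwise it could be delegated to a second customer while the old route is still installed.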
    
    So, this is a pain that many operators went through and the solution
    is quite obvious: go with static IPv6 PD assignments. Your help desk
    will appreciate you.
    
    Again: "In theory there is no difference between theory and practice. In
    practice there is."
    
    I hope I shed some light on the issue with the above explanation.
    
    See you all in Mauritius for Afrinic-25 where we can discuss IPv6
    deployment challenges at length ;)
    
    Cheers and thnx, Jan Zorz
    
    -- 
    Jan Zorz
    Internet Society
    mailto:<zorz at isoc.org>
    http://www.internetsociety.org/deploy360/
    --
    "Engineering is always positive in results..." N. Tesla
    
    


