Method to mitigate DDoS attacks in ISP networks

© Rajaram Pejaver

Background

This description addresses Internet access provided to customers of an ISP (Internet Service Provider).  The subscribers may be Residential or Commercial.  (Some ISPs call this service HSD, for High Speed Data.)

 

Denial of Service (DoS) attacks occur when miscreants flood a legitimate subscriber’s servers by sending bogus packets from the Internet to their IP address.  The server’s capacity and the downstream bandwidth get saturated.  If the subscriber is a merchant that depends on incoming orders from the Internet, then their business will be hurt.  Miscreants often do this to force a merchant to pay a ransom in return for stopping their attacks.  Besides hurting the merchant, DoS attacks also disrupt the ISP’s network infrastructure by overloading it.

 

Typical victims of such attacks are Internet based gambling operations that accept bets on upcoming sporting events.  They stand to lose a lot of revenue if their Internet access is interrupted just before, say, a major boxing match.  Potential victims also include political parties, religious sites, doctors, and various other large businesses that depend on the Internet for sales.  These customers typically run high value businesses out of small offices (and are the kind of commercial customers targeted by ISPs.)

 

DoS attacks can usually be mitigated when the ISP simply blocks the IP address of the attacking host.  However, in the last few years, Distributed Denial of Service (DDoS) attacks have become common.  DDoS attacks involve large numbers (tens of thousands) of coordinated attacking hosts.  These hosts are known as “Bots” or “Zombies”.  They are usually home PCs belonging to unsuspecting people around the world.  These PCs are infected with viruses that cause them to respond to commands from the “Bot Master”.  The Bot Master can remotely instruct the Bots to initiate simultaneous attacks on the intended victim.

 

It is not feasible for an ISP to block the large number of IP addresses of all these hosts.  More importantly, it is hard to distinguish between the traffic originated by the legitimate users of the merchant’s service and the traffic created by the Bots.

 

Subscribers cannot effectively block DDoS attacks at their sites; only the ISP can implement effective solutions in its network.  Nor can subscribers be expected to host huge servers that can withstand DDoS attacks.

 

Note that this method is aimed at an ISP’s paying subscribers, and not at protecting the hosts that comprise the ISP’s infrastructure network.  There may be simpler ways of protecting the latter.

 

The following description covers attacks on web sites.  However, attacks on other services (like FTP, VPNs, remote logins, etc.) on the subscriber’s servers can be mitigated just as well.

 


Previous solutions

1.       The typical reaction to a DDoS attack by an ISP is to discard all traffic addressed to the intended victim.  This is known as “Black Holing.”  It serves to protect the ISP’s network from the attack traffic that is flooding in from multiple sources.  Attacks are automatically detected when traffic patterns exceed predefined thresholds.

 

Unfortunately, Black Holing has the effect of completing the Denial of Service attack by blocking all traffic to the victim.  The victim will be at the mercy of the attacker.

 

2.       Rate Limiting is another solution, where the ISP can slow down all traffic or limit the number of concurrent connections to the intended victim.  Again, this strategy serves mainly to reduce the impact of the traffic on the ISP’s network.

 

Rate Limiting is also undesirable because it slows down service to legitimate users of the victim’s site.  The ISP’s goal should be to provide normal service to the subscriber that is under attack.

 

3.       There are some techniques that use timing analysis to distinguish between legitimate and attack traffic.  Messages from Bots may follow some detectable patterns.

 

Timing analysis may be able to identify the Bots involved in a particular attack when all Bots display a similar signature.  Unfortunately, the Bot Master frequently changes the software and the behavior of the Bots.  If a large ISP deployed a method to detect a signature, the Bot Master would respond within days, not only changing the signature but also thwarting the detection method itself.  It is difficult to keep abreast of the behavior of the Bots.

 

4.       Lastly, there are ideas that use statistical methods to distinguish between legitimate and attack traffic. 

 

Statistical methods tend to be complicated and unreliable.  They are prone to false positives and false negatives, where some legitimate traffic is blocked while some attack traffic is allowed to trickle through.  Most of these ideas appear in research papers presented at obscure conferences.  Also, they usually lack the scale to mitigate a large DDoS attack.

 

Idea Summary

The method involves a network based Mitigation Service and a tunnel.  The Mitigation Service comprises several powerful servers and has the capacity to handle large amounts of traffic and connections.  All packets addressed to the intended victim are redirected to the Mitigation Service.  Various techniques are used there to distinguish between legitimate and attack traffic.  One key technique assumes that legitimate traffic is originated by sentient human users that can interactively respond to simple questions, while attack traffic comes from automated Bots that cannot respond correctly.  This helps filter legitimate traffic from attack traffic. Legitimate traffic is forwarded through a tunnel to the intended victim, while attack traffic is discarded.  Legitimate IP addresses are white-listed thereby improving performance.

 


Idea Description

The implementation of the idea centers around the Mitigation Service, which comprises a number (possibly one) of cooperating hosts that accept traffic on behalf of the attack victim and filter out the bad traffic.  The remaining good traffic is forwarded to its destination.  The following description refers to the Mitigation Service as a single entity. 

 

Rather than describe the parts, the following describes the idea in the three phases of attack mitigation:

1.       Configuration and provisioning

2.       Mitigation Operation

3.       Tear down

 

Configuration and provisioning

 

The flowchart in Fig 1 summarizes the steps involved in Configuration and Provisioning.  A detailed explanation follows.

 

When a DDoS attack is observed, the ISP and the intended victim agree to engage the Mitigation Service.  The Mitigation Service is rapidly configured to mimic the services provided by the subscriber.  For example, if there was a web server on port 80 on the subscriber’s host, then the Mitigation Service will listen on port 80.  Note that the web server itself is not duplicated; the Mitigation Service merely accepts and processes IP traffic on port 80.  All other traffic will be discarded.  The Mitigation Service impersonates the IP address of the subscriber so that all packets addressed to the subscriber are sent to it.  A welcome page may be configured so that repeat users connecting to this service are reassured that they are at the correct site.  The Mitigation Service maintains a White List containing the source IP addresses of all legitimate senders.  This list is initially empty, but may optionally be initialized with the IP addresses of the known legitimate senders as provided by the subscriber.
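
The provisioning record described above might be sketched as follows.  This is only an illustration: the class name MitigationConfig, its fields, the provision() helper, and the example addresses are all hypothetical and do not come from the original description.

```python
from dataclasses import dataclass, field
from ipaddress import IPv4Address, IPv6Address, ip_address
from typing import Iterable, Set, Union

IPAddress = Union[IPv4Address, IPv6Address]


@dataclass
class MitigationConfig:
    """Hypothetical per-subscriber provisioning record."""
    victim_ip: IPAddress                 # address the service impersonates
    accepted_ports: Set[int]             # e.g. {80} for a plain web server
    welcome_text: str = ""               # optional reassurance page
    white_list: Set[IPAddress] = field(default_factory=set)

    def accepts(self, dst_ip: str, dst_port: int) -> bool:
        """True if a packet is addressed to a mimicked service; all else is discarded."""
        return ip_address(dst_ip) == self.victim_ip and dst_port in self.accepted_ports


def provision(victim_ip: str, ports: Iterable[int],
              known_good: Iterable[str] = ()) -> MitigationConfig:
    """Build a record; the White List starts empty unless the subscriber seeds it."""
    return MitigationConfig(
        victim_ip=ip_address(victim_ip),
        accepted_ports=set(ports),
        white_list={ip_address(a) for a in known_good},
    )
```

For example, provision("203.0.113.10", [80], ["198.51.100.7"]) would accept only port 80 traffic for that address and start with one known legitimate sender on the White List.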

 

In an alternate embodiment, the information gathering and configuration steps described above can be done before an attack is observed.  This will speed up response times to restore subscriber’s service when an attack is actually underway.

 

Next, the ISP redirects all traffic addressed to the intended victim’s IP address to the Mitigation Service by sending a routing update to all routers in the ISP infrastructure network.  The Mitigation Service should ideally be located close to the ISP’s peering points so that the huge volumes of attack traffic will avoid affecting intermediate ISP routers.  Note that attack traffic may come not only from the peering points but from other ISP subscribers as well.
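
One common way to realize such a redirect is a more specific (host) route that wins by longest-prefix match.  The description only says "a routing update", so the /32 host route, the example prefix, and the next-hop names in this sketch are assumptions made for illustration.

```python
from ipaddress import ip_address, ip_network


def next_hop(routes, dst):
    """Return the next hop of the most specific route that covers dst."""
    addr = ip_address(dst)
    matches = [(net, hop) for net, hop in routes.items() if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    # Longest-prefix match: the route with the largest prefix length wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Normal state: the subscriber's prefix points toward the access network.
routes = {ip_network("203.0.113.0/24"): "cmts-edge"}
before = next_hop(routes, "203.0.113.10")

# Mitigation: a host route for the victim only is injected (assumed mechanism).
routes[ip_network("203.0.113.10/32")] = "mitigation-service"
during = next_hop(routes, "203.0.113.10")
neighbour = next_hop(routes, "203.0.113.11")
```

Only the victim's address is diverted; neighbouring subscribers in the same prefix keep their normal path.  Tear down is then just the removal of the host route.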

 

GRE (Generic Routing Encapsulation) is a standard way to encapsulate an IP packet inside another packet.  It allows a sender to direct a packet to a specific receiver at the other end of a tunnel.  The receiver can de-encapsulate the original packet and continue to route it to its destination.  A tunnel is set up between the Mitigation Service and a suitable router connecting to the subscriber’s server.  This tunnel allows good packets to be forwarded by the Mitigation Service to the subscriber. 
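
As a rough sketch of the encapsulation step, the basic GRE header is four bytes: a flags/version field (all zero when no optional fields are used) followed by a protocol type (0x0800 for IPv4).  The function names below are hypothetical, and the outer IP header that carries the GRE packet to the tunnel endpoint is assumed to be added by the tunnel interface rather than shown here.

```python
import struct

GRE_PROTO_IPV4 = 0x0800  # protocol type value for an IPv4 payload


def gre_encapsulate(inner_packet: bytes) -> bytes:
    """Prefix an IPv4 packet with a basic GRE header (no checksum, key, or sequence)."""
    header = struct.pack("!HH", 0x0000, GRE_PROTO_IPV4)
    return header + inner_packet


def gre_decapsulate(gre_payload: bytes) -> bytes:
    """Strip the basic GRE header and return the original packet."""
    if len(gre_payload) < 4:
        raise ValueError("truncated GRE header")
    flags_version, proto = struct.unpack("!HH", gre_payload[:4])
    if flags_version != 0 or proto != GRE_PROTO_IPV4:
        raise ValueError("GRE header not supported by this sketch")
    return gre_payload[4:]
```

The tunnel endpoint recovers exactly the packet the Mitigation Service received, so the subscriber's server sees the sender's original source address.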

 

For ISPs using HFC (Hybrid Fiber-Coaxial) distribution, Fig 2 describes three alternative implementations.  Ideally, the GRE tunnel is terminated at the CMTS (Cable Modem Termination System), as indicated by the line marked ‘1’.  Many models of CMTSs support GRE tunnels.  CMTSs will not receive the routing updates described above and will forward the packet to the subscriber.  Alternatively, a small router dedicated to DDoS mitigation may be installed near the CMTS, as indicated by the line marked ‘2’.  It can be configured to terminate the GRE tunnel and forward all de-encapsulated traffic to the interface leading to the CMTS.  Such tunnel routers will not be included in routing updates for DDoS mitigation.  In alternative ‘3’, the edge (last hop) router adjacent to the CMTS is used to terminate the GRE tunnel.  This router should forward de-encapsulated packets to the CMTS.  Since this router is part of the ISP infrastructure, it would receive routing updates instructing it to forward packets addressed to the DDoS victim to the Mitigation Service.  The routing updates must be overridden for de-encapsulated packets; otherwise these packets will be sent back towards the Mitigation Service.  Clearly, this would be the least desirable alternative.

 

 

Occasionally, the subscriber may have a secure web site.  SSL (Secure Sockets Layer) certificates and private credentials belonging to the subscriber’s host may be needed at the Mitigation Service to correctly emulate the subscriber’s service.  This can be done by copying these items to the Mitigation Service.  Some subscribers may find this step objectionable since sharing these credentials is considered to be “bad practice” from a security point of view.  Fortunately, this situation will be rare.  Most web sites start out with a simple HTTP URL and then redirect to an SSL based HTTPS URL.  Attack mitigation will be done before the SSL connection is required.

 

This completes the provisioning required for the service.

 

Mitigation Operation

 


Fig 3 shows the attack in progress, before and while the Mitigation Service is enabled.

 

The thin lines indicate good traffic that must be preserved, while the thick lines indicate large volumes of attack traffic that must be filtered out and discarded by the Mitigation Service.  The basic idea is that the Mitigation Service acts as an application firewall, or as a reverse proxy.  It stops the attack traffic from reaching the subscriber, while allowing the legitimate traffic.

 

The flowchart in Fig 4 summarizes the steps involved when an attack on HTTP based service is being mitigated.  HTTP is expected to be the most common service mitigated.  A detailed explanation follows.

 

 

As traffic starts arriving at the Mitigation Service, the White List is consulted to identify packets from the known legitimate senders.  If the source IP address in a packet matches an entry in the White List, it is probably a good packet.  These packets are encapsulated within GRE and forwarded to the subscriber inside the GRE tunnel.

 

The Mitigation Service maintains a large connection table that tracks the state of all the connections that are coming in. A new entry is added when a TCP connection request (SYN) is received.  Packets that are not part of a current connection are discarded.
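
The first decisions made for each arriving packet (White List lookup, then connection table lookup) could be sketched as below.  The class name Frontline, the connection key layout, and the string verdicts are assumptions chosen for illustration; a real implementation would work on parsed packets at line rate.

```python
from typing import Dict, Set, Tuple

ConnKey = Tuple[str, int, int]  # (source IP, source port, destination port)


class Frontline:
    """Hypothetical first-stage classifier at the Mitigation Service."""

    def __init__(self, white_list: Set[str]):
        self.white_list = white_list
        self.connections: Dict[ConnKey, str] = {}

    def handle(self, src_ip: str, src_port: int, dst_port: int, is_syn: bool) -> str:
        # Known legitimate senders bypass the challenge entirely.
        if src_ip in self.white_list:
            return "forward"   # encapsulate in GRE and send to the subscriber
        key = (src_ip, src_port, dst_port)
        if is_syn:
            # A TCP connection request creates a new table entry.
            self.connections.setdefault(key, "SYN_RECEIVED")
            return "track"
        if key in self.connections:
            return "track"     # part of a connection already being tracked
        return "discard"       # not part of any current connection
```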

 

Fig 4 attempts to depict a state machine in a flow chart.  Each incoming message in a connection advances the state as it proceeds through the following steps.  Each step may involve multiple sub-steps and states.

1.       TCP connection is accepted and the 3 way handshake is completed (ACK received)

2.       An SSL connection is set up if necessary.

3.       An HTTP GET is received from the sender.

4.       A page containing a question requiring an interactive response is sent back.

5.       A POST response is received in response to the question posed above.
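
The five steps above can be sketched as a small table-driven state machine.  The state and event names are hypothetical, and each real step would involve the sub-steps mentioned earlier; an event that arrives out of order yields no next state, which a caller could treat as grounds for purging the entry.

```python
from enum import Enum, auto


class State(Enum):
    SYN_RECEIVED = auto()     # connection request seen
    ESTABLISHED = auto()      # step 1: handshake completed
    TLS_READY = auto()        # step 2: SSL set up (only if necessary)
    CHALLENGE_SENT = auto()   # steps 3 and 4: GET received, question page sent
    ANSWERED = auto()         # step 5: POST with the answer received


TRANSITIONS = {
    (State.SYN_RECEIVED, "ack"): State.ESTABLISHED,
    (State.ESTABLISHED, "tls_done"): State.TLS_READY,
    (State.ESTABLISHED, "http_get"): State.CHALLENGE_SENT,
    (State.TLS_READY, "http_get"): State.CHALLENGE_SENT,
    (State.CHALLENGE_SENT, "http_post"): State.ANSWERED,
}


def advance(state, event):
    """Return the next state, or None when the event is not valid in this state."""
    return TRANSITIONS.get((state, event))
```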

 

The question requiring an interactive response in step 4 above can be any challenge-response question.  Possible methods include:

1.       CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart), which require the human to type in the word shown in an image.

Note that the technology for computer recognition of CAPTCHAs keeps advancing, and the images generated must be continually improved.

 

2.       A simple question, like

a.        What is 2 + 3?

b.       What letter follows C in the alphabet?

c.        What sex is the opposite of female?

 

The above questions must be embedded in an image (e.g. JPEG, as in CAPTCHA) to further complicate automated answering by a Bot.  Simple Questions can be thought of as the next generation CAPTCHAs.  A human can read and answer the questions quite easily, but a computer program must not only recognize the question, but must also be able to parse it and use common sense to answer it.

 

3.       A simple question based on a video being displayed (an animated GIF or JPEG may be used as well):

a.        Where did Mary throw the ball?

b.       What is the color of the car?

c.        What is 2 + 3?  (where the numbers 2 and 3 appear at different times in the video)

 

Video significantly raises the bar for computer program based responders because parts of the question may appear in different frames of the movie.  Again, a human would be able to easily adapt and respond.  Note that there is a company called nuCAPTCHA that uses video.

 

4.       A simple Java or JavaScript based game, like tic-tac-toe.
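
The "simple question" method (item 2 above) could be sketched as a generator that returns a question together with its expected answer.  The function names and the two question templates are assumptions; rendering the question text into an image, as the description requires, is left out of this sketch.

```python
import random


def make_question(rng):
    """Return (question_text, expected_answer) for a randomly chosen simple question."""
    kind = rng.choice(["sum", "next_letter"])
    if kind == "sum":
        a, b = rng.randint(1, 9), rng.randint(1, 9)
        return f"What is {a} + {b}?", str(a + b)
    letter = rng.choice("ABCDEFGHIJKLMNOPQRSTUVWXY")  # Z has no successor
    return f"What letter follows {letter} in the alphabet?", chr(ord(letter) + 1)


def check_answer(expected, submitted):
    """Compare answers, ignoring surrounding whitespace and letter case."""
    return submitted.strip().upper() == expected.upper()
```

Varying the operands and templates per connection means a Bot cannot simply replay a previously observed answer.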

 

If the question is answered incorrectly, or if the connection times out, then the connection entry is purged.  Legitimate users that inadvertently answered incorrectly can always try again because IP addresses are not Black Listed.

 

Entries are purged from the connection table if their state does not advance within a configurable time period (say 60 seconds.)  Entries are also purged if they are determined to be for attack traffic, as shown in the flow chart in Fig 4.  Care is taken to withstand SYN flood attacks.  Other well known techniques to withstand DoS attacks are also incorporated (like blocking dark addresses (bogons), discarding excessive connections from the same source, recognizing protocol anomalies, etc.)
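
The timeout-based purge could look like the following sketch, where each entry records when its state last advanced.  The Entry fields and the purge_stale name are hypothetical; the 60 second default mirrors the example value given above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    state: str           # current position in the connection state machine
    last_advance: float  # timestamp (seconds) of the last state change


def purge_stale(table, now, timeout=60.0):
    """Delete entries whose state has not advanced within `timeout` seconds.

    Returns the number of entries removed.
    """
    stale = [key for key, entry in table.items() if now - entry.last_advance > timeout]
    for key in stale:
        del table[key]
    return len(stale)
```

Because half-open connections from a SYN flood never advance, a periodic sweep of this kind bounds how long they can occupy table space.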

 

If the question is answered correctly, then it is deemed that a human is at the other end and the connection is legitimate.  The sender’s IP address is added to the White List.  The sender is sent an HTTP redirect to the same URL and the connection is closed.  The redirect will cause the sender’s browser to automatically reattempt the connection.  This time, the White List will cause the sender’s packets to be forwarded to the subscriber’s server.
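
A minimal sketch of this step follows.  The function name is hypothetical, and the choice of status code 302 is an assumption, since the description only says "an HTTP redirect".

```python
def on_correct_answer(src_ip, url, white_list):
    """White List the sender and build a redirect back to the URL it asked for."""
    white_list.add(src_ip)
    return (
        "HTTP/1.1 302 Found\r\n"
        f"Location: {url}\r\n"
        "Content-Length: 0\r\n"
        "Connection: close\r\n"   # the connection is closed after the redirect
        "\r\n"
    ).encode("ascii")
```

The browser's retry arrives as a fresh connection from an address that is now on the White List, so it is tunneled straight to the subscriber's server.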

 

The subscriber’s server will handle the connection as usual.  It does not need to be aware of the Mitigation Service.  Reverse traffic from the subscriber’s server will flow normally through the ISP’s infrastructure to the Internet.  Reverse traffic does not need to flow through the Mitigation Service to implement the method described here.  However, for a commercial implementation with additional desirable features, new destination addresses in the reverse traffic should somehow be added to the White List.  The subscriber should also have a way to add entries to the White List while an attack is underway and Mitigation Service is enabled.

 

IP addresses that have been added to the White List typically stay on the list until the mitigation service is terminated.  However, as shown in Fig 4, even traffic from these hosts is checked for abusive characteristics.  If there are excessive connection attempts or unusually high traffic per unit of time, then the IP address will be removed from the White List.  This protects against situations where a Bot may have somehow responded correctly to the interactive question.  It also protects against situations where a Bot and a legitimate host are both behind a NAT (Network Address Translation) gateway.  This can happen if one of the computers in a large company or a residential subscriber’s home gets infected and is acting as a Bot.  In this case, the Bot and the legitimate host will both have the same source IP address at the Mitigation Service.  If the legitimate host gets White Listed, then the Bot’s attacks would get through until the abuse checks remove the address.  Unfortunately, once the address is removed, the legitimate host will also be denied service due to the existence of the Bot in the originating NAT zone.  Various other well known techniques (like signature detection) can also be used to detect Bot attacks in White Listed traffic.
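
One way to implement the "excessive connection attempts per unit of time" check is a sliding window per White Listed address, sketched below.  The class name, the thresholds, and the use of connection counts alone (rather than traffic volume or signatures) are assumptions for illustration.

```python
from collections import defaultdict, deque


class WhiteListMonitor:
    """Hypothetical abuse check for senders already on the White List."""

    def __init__(self, white_list, max_conns=20, window=10.0):
        self.white_list = white_list
        self.max_conns = max_conns    # connections allowed per window
        self.window = window          # window length in seconds
        self.recent = defaultdict(deque)

    def on_connection(self, src_ip, now):
        """Record a new connection; return False if src_ip is not (or no longer) listed."""
        if src_ip not in self.white_list:
            return False
        times = self.recent[src_ip]
        times.append(now)
        # Drop timestamps that have fallen out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) > self.max_conns:
            self.white_list.discard(src_ip)   # evict the abusive address
            del self.recent[src_ip]
            return False
        return True
```

An evicted address is not Black Listed; it simply has to pass the interactive question again.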

 

The subscriber may also provide web services that are SOAP based or are RESTful.  These services are generally machine-machine communications and do not involve human operators.  They cannot be addressed by the method described here.  However, these services are usually between specific business partners, and the IP addresses or ranges of the legitimate partners can be White Listed.
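
White Listing partner addresses or ranges could be sketched with Python's standard ipaddress module as follows.  The class name PartnerWhiteList and the example ranges are hypothetical.

```python
from ipaddress import ip_address, ip_network


class PartnerWhiteList:
    """Hypothetical White List of business-partner addresses and CIDR ranges."""

    def __init__(self, ranges):
        # A bare address such as "203.0.113.5" becomes a single-host network.
        self.networks = [ip_network(r) for r in ranges]

    def allows(self, src_ip):
        """True if src_ip falls inside any configured partner range."""
        addr = ip_address(src_ip)
        return any(addr in net for net in self.networks)
```

For example, PartnerWhiteList(["198.51.100.0/24", "203.0.113.5"]) would admit machine-to-machine traffic from one partner subnet and one individual partner host without any interactive challenge.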

 

Besides HTTP, protocols like FTP and SSH can be handled in a similar way.  A solution for SMTP (email) is provided by several commercial vendors as SPAM protection.

 

Tear down

 

DDoS attacks usually do not last long.  The attackers quickly get bored and want to move on to new targets.  If an attack is successfully mitigated, the attack traffic will stop within hours.  Restoring normal service simply involves sending out a routing update to the ISP’s infrastructure routers to stop diverting the subscriber’s traffic to the Mitigation Service.  The GRE tunnel can be taken down at some point later.

 

The final billing and usage statistics can include:

·         Mitigation duration, start & end times.

·         Connections permitted through & blocked.

·         Total packets (& bytes) permitted through & blocked.

·         Traffic rates to subscriber: before and during mitigation.

·         Final White List (useful if attack resumes.)

·         Geographic distribution of blocked IP addresses.
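
The counters behind several of these statistics could be accumulated in a record like the one sketched below.  The class and field names are hypothetical, and the White List and geographic breakdown are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class MitigationStats:
    """Hypothetical usage record for one mitigation event."""
    start_time: float
    end_time: float = 0.0
    connections_permitted: int = 0
    connections_blocked: int = 0
    bytes_permitted: int = 0
    bytes_blocked: int = 0

    def record_connection(self, permitted: bool, nbytes: int) -> None:
        """Count one finished connection and the bytes it carried."""
        if permitted:
            self.connections_permitted += 1
            self.bytes_permitted += nbytes
        else:
            self.connections_blocked += 1
            self.bytes_blocked += nbytes

    def duration(self) -> float:
        """Mitigation duration in seconds, valid once end_time is set."""
        return self.end_time - self.start_time
```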

 

Conclusion

 

Some of the unique innovations claimed herein are:

1.       Using human interaction to distinguish between legitimate and attack traffic to mitigate a DDoS attack.

2.       Though CAPTCHA is well known, the methods of asking “simple questions” and using video to determine whether a human is present at the other end of the connection are new and unique.

3.       Using a centralized network based Mitigation Service that can be shared on demand by multiple ISP subscribers.

 

Commercial Value

The mere availability of such a service would make an ISP’s HSD (High Speed Data) service appealing to high value commercial customers, and would attract and retain their business.  Conversely, lack of such guarantees of protection against DDoS may drive potential commercial customers away to other ISPs.

 

An ISP can offer such a DDoS mitigation service to its HSD customers as a value added feature.  The feature could be offered on a monthly subscription basis (like insurance) or on a per event basis (like an on demand service).  From the subscriber’s perspective, a subscription would be attractive because it would offer faster service recovery times, since much of the configuration information would be gathered ahead of the possible attack.

 

The DDoS mitigation methods described are broad enough to be useful to other ISPs.  The method of using “simple questions” is a new alternative to CAPTCHA and may be useful to many other web site developers who wish to filter out automated requestors.  Licensing revenue may be gathered from these sources.

 
