Backends are not removed from VIP table when killed

Description

Environment: DCOS 1.8.7 (Azure, installed using ACS Engine modified ARM templates - Ubuntu 16.04 with 4.4.0-28-generic).

After adding a Docker containerizer based app that uses the default `dcos` overlay network and is assigned a VIP via a label in `portMappings` (see the attached app config in my-group.json; it is the /my-app/backend app), VIP assignment works and traffic is routed correctly to the overlay network IPs of its tasks (using both `SERVICE_LABEL.l4lb.thisdcos.directory:$VIP_PORT` and `IP:$VIP_PORT`). The issue appears when tasks are removed (properly killed), whether by scaling up and then down, scaling down the initial 3 instances, or removing the app completely: the backends are never removed from the VIP, so requests directed to the no-longer-existing backends fail, although `total_failures` continues to be tracked for each backend.

Steps to reproduce

1. Before launching the app, check the VIPs on any cluster node (the results are the same on every node; I tested different nodes and node types):

2. Launching the app (`dcos marathon group add my-group.json`) creates the VIP, and digging the DNS name gives (only relevant info shown):

The VIPs table now shows the backends. After a few requests:

The status of the group (all apps have their healthcheck in place) will be:

3. Scaling up, and all continues well...

4. Scale down and the issue appears.

Though all tasks appear healthy, on querying our backend endpoint we get a few of

(see failed_requests.txt for more)

Expected

The VIPs table should look exactly as it did before scaling up, as shown in step 2.

Current result

VIPs tables like this:

^^ Notice the `total_failures` on the first one. Moreover, failures become more frequent as more backends have been registered and later removed (for instance in a zdd), or when IP allocation gives another app an overlay IP that is still registered as a backend, in which case requests are directed to a different task.

Unfortunately removing a group/app does not reset the backends (of the VIP) either. The above output for the vips endpoint is the same after complete group removal.
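
To make the mismatch described above easy to spot, the following is a minimal sketch of a check that compares the backends listed for each VIP against the tasks that are actually running. It makes no claim about the real format of the vips endpoint output or the Marathon API: the function name `stale_backends`, the `(ip, port)` tuple shape, and the sample addresses are all hypothetical, and the two inputs are assumed to have already been parsed from whatever the cluster returns.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: a backend is an (overlay IP, port) pair.
Backend = Tuple[str, int]


def stale_backends(
    vip_backends: Dict[str, Iterable[Backend]],
    live_tasks: Iterable[Backend],
) -> Dict[str, List[Backend]]:
    """Return, per VIP, the backends that no running task accounts for."""
    live: Set[Backend] = set(live_tasks)
    stale: Dict[str, List[Backend]] = {}
    for vip, backends in vip_backends.items():
        # A backend is stale if it is still listed but no live task owns it.
        gone = sorted(b for b in backends if b not in live)
        if gone:
            stale[vip] = gone
    return stale


if __name__ == "__main__":
    # Made-up addresses, for illustration only.
    vips = {
        "11.0.0.1:8080": [("9.0.1.2", 8080), ("9.0.2.3", 8080), ("9.0.3.4", 8080)],
    }
    tasks = [("9.0.1.2", 8080)]  # only one instance left after scaling down
    print(stale_backends(vips, tasks))
```

An empty result would correspond to the expected behaviour; any non-empty result lists backends that should have been removed when their tasks were killed.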

Assignee

Deepak Goel
