In our configuration (3 master nodes, 4 private agents), the cluster becomes unstable when the leading master is lost.
I know that it is not optimal, but our upstream DNS server already has a mesos zone.
The problem may be related to the leader.mesos record in Mesos-DNS. It seems that if leader.mesos is not defined in dcos-mesos-dns, the request is forwarded to the upstream server, and if the record is present there, the client receives a wrong IP.
Is it possible to disable query forwarding in mesos-dns only for its own domain, ".mesos"?
reported by me
The problem is linked to all lines in the systemd units such as:
ExecStartPre=/bin/ping -c1 ready.spartan
ExecStartPre=/bin/ping -c1 leader.mesos
The resolver library tries to complete names using, in order: 1) the search list from resolv.conf, 2) the domain from resolv.conf, 3) the domain of the local hostname. The last one is our problem: leader.mesos is expanded to leader.mesos.mycompany.com, and this zone exists in the upstream DNS and points to another Mesos cluster.
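To make the expansion concrete, here is a rough Python model of the lookup order. It is only a sketch of the behaviour described in resolv.conf(5), assuming the default ndots of 1 and a search list that ends up containing mycompany.com; the function name candidate_names is made up for illustration and is not part of any resolver API.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order.

    Simplified model of resolv.conf(5) semantics; real resolvers have more
    options (timeouts, rotation, per-answer handling) that are ignored here.
    """
    # A trailing dot marks the name as absolute: nothing is appended.
    if name.endswith("."):
        return [name]

    as_is = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search_domains]

    # With at least `ndots` dots, the name is tried as-is first and the
    # search domains are only used if that first lookup fails.
    if name.count(".") >= ndots:
        return [as_is] + expanded
    return expanded + [as_is]


print(candidate_names("leader.mesos", ["mycompany.com"]))
# ['leader.mesos.', 'leader.mesos.mycompany.com.']
```

Under these assumptions, leader.mesos.mycompany.com is only queried after the plain leader.mesos lookup fails, which would fit the observation that the problem shows up when the leading master is lost.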
Solution: Replace leader.mesos with "leader.mesos." (note the trailing dot). The dot tells the resolver to treat the name as fully qualified and not to complete it.
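A minimal sketch of how that replacement could be applied to the unit text, assuming the affected lines all look like the two ExecStartPre ping lines quoted above. The helper name make_absolute and the default host list are hypothetical; the real units may need a different match.

```python
import re


def make_absolute(unit_text, hostnames=("leader.mesos",)):
    """Append a trailing dot to the given hostnames in ExecStartPre ping lines."""
    out = []
    for line in unit_text.splitlines():
        if line.startswith("ExecStartPre=/bin/ping"):
            for host in hostnames:
                # Match the hostname only as the final argument of the line;
                # a name that already ends with a dot does not match.
                line = re.sub(rf"(?<=\s){re.escape(host)}$", host + ".", line)
        out.append(line)
    return "\n".join(out)


unit = (
    "ExecStartPre=/bin/ping -c1 ready.spartan\n"
    "ExecStartPre=/bin/ping -c1 leader.mesos"
)
print(make_absolute(unit))
```

Running the helper twice gives the same result as running it once, so it should be safe to apply to units that were already fixed.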
> Solution: Replace leader.mesos with "leader.mesos." (note the trailing dot). The dot tells the resolver to treat the name as fully qualified and not to complete it.
I think that this will work. Do you want to submit a PR with it to see how it works out?
Also, what is your setup such that you have other Mesos clusters at mesos.FQDN? Are these also DC/OS clusters?