For over three years now, Linux, not Windows Server, has been the most popular virtual machine (VM) operating system on Microsoft Azure. And of all the Linux distributions used on the cloud, Canonical's Ubuntu has long been the most popular. Alas, this is not a “Yes, for Linux!” story. Quite the contrary. Linux has its fair share of problems too, and in the latest, a recent DNS-related update for Ubuntu 18.04 caused many Azure virtual machines to fail.
The problem started at 06:00 UTC on August 30, 2022, and lasted until August 31. Now that it's ancient history, we can only hope we don't see a repeat, bearing in mind that clouds, like any other technology, will fail from time to time.
The crux of the matter is that a security patch, systemd 237-3ubuntu10.54, left Ubuntu 18.04 instances unable to resolve DNS queries. That, of course, broke their networking, and that was that. Repeat after me: It's always DNS.
The patch fixed CVE-2022-2526, a use-after-free memory vulnerability in how systemd-resolved, systemd's DNS resolver service, handles DNS packets. Left unfixed, it is a serious security hole that could be used to bring systems down or gain root-level privileges. Besides Ubuntu 18.04, the flaw and its fix also apply to Red Hat Enterprise Linux (RHEL) 7 and 8.x and to Debian Linux.
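If you want to know whether a particular Ubuntu 18.04 VM picked up the systemd build named above, you can compare its installed package version against that build. The sketch below is a minimal example, assuming a Debian-style system where `dpkg-query` is available. The helper names are hypothetical, and it only flags the single version this article names; it does not implement full Debian version ordering.

```python
import subprocess

# The systemd build this article ties to the Azure DNS failures on Ubuntu 18.04.
SUSPECT_VERSION = "237-3ubuntu10.54"


def installed_systemd_version() -> str:
    """Return the installed systemd package version via dpkg-query (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Version}", "systemd"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def read_os_version_id(os_release_text: str) -> str:
    """Extract VERSION_ID (e.g. 18.04) from the contents of /etc/os-release."""
    for line in os_release_text.splitlines():
        if line.startswith("VERSION_ID="):
            return line.split("=", 1)[1].strip().strip('"')
    return ""


def is_suspect(version_id: str, systemd_version: str) -> bool:
    """True only for Ubuntu 18.04 running the exact systemd build named in the article."""
    return version_id == "18.04" and systemd_version == SUSPECT_VERSION


def main() -> None:
    try:
        with open("/etc/os-release", encoding="utf-8") as f:
            version_id = read_os_version_id(f.read())
        systemd_version = installed_systemd_version()
    except (OSError, subprocess.CalledProcessError) as exc:
        # Not a Debian-style system, or systemd isn't installed as a package.
        print(f"Could not determine versions: {exc}")
        return
    verdict = "suspect build" if is_suspect(version_id, systemd_version) else "not the named build"
    print(f"Ubuntu {version_id}, systemd {systemd_version}: {verdict}")


if __name__ == "__main__":
    main()
```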
So why didn't this problem show up everywhere, on every kind of cloud? Because Microsoft Azure uses a netplan setup (netplan is Ubuntu's own way of configuring networking) that relies on a “driver” match to set up networking. When a udevadm trigger is executed, the key/value pair that holds this driver information is lost. Then, the next time netplan runs, the server loses its DNS information. In short, the blame does not lie entirely with Ubuntu. Azure deserves its fair share, too.
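The failure mode described above can be illustrated with a toy model. This is a simplified sketch, not netplan's or systemd-networkd's actual code. It assumes the interface's driver is exposed as a udev property named `ID_NET_DRIVER`, that the Azure network config matches on the `hv_netvsc` driver, and that the nameserver address is a placeholder from the documentation range; all class and function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Interface:
    name: str
    # Simplified stand-in for the udev properties attached to the device.
    udev_props: dict = field(default_factory=dict)
    dns_servers: list = field(default_factory=list)


@dataclass
class MatchRule:
    driver: str        # modeled on netplan's `match: driver:` key
    dns_servers: list  # DNS settings applied only when the rule matches


def udevadm_trigger_loses_driver(iface: Interface) -> None:
    """Model the bad case: re-triggering udev drops the driver property."""
    iface.udev_props.pop("ID_NET_DRIVER", None)


def apply_network_config(iface: Interface, rule: MatchRule) -> None:
    """Re-apply config: DNS is configured only if the driver match still succeeds."""
    if iface.udev_props.get("ID_NET_DRIVER") == rule.driver:
        iface.dns_servers = list(rule.dns_servers)
    else:
        iface.dns_servers = []  # no match, so no DNS configuration


rule = MatchRule(driver="hv_netvsc", dns_servers=["192.0.2.53"])  # placeholder address
eth0 = Interface("eth0", {"ID_NET_DRIVER": "hv_netvsc"})

apply_network_config(eth0, rule)
print(eth0.dns_servers)  # ['192.0.2.53']

udevadm_trigger_loses_driver(eth0)
apply_network_config(eth0, rule)
print(eth0.dns_servers)  # []
```

The point of the model is that nothing is wrong with the DNS settings themselves; the config simply stops matching the interface once the driver information disappears.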
How to fix it
That said, the 64-bit question is, “How can I fix this problem?” There are several answers:
You can, of course, simply restart your instances. This gives each restarted VM a new DHCP lease and new DNS resolvers.
Microsoft has also rolled out automatic remediation for Azure Kubernetes Service (AKS) clusters. But, and this is a big problem, some AKS nodes aren't covered by the auto-remediation detection, so they are never remediated.
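For the restart option, if you have more than a handful of VMs, you could script it. The sketch below assumes the Azure CLI (`az`) is installed and already logged in. `az vm restart --resource-group … --name …` is a real Azure CLI command, while the script name and the resource group and VM names you pass in are your own (any shown here are hypothetical).

```python
import subprocess
import sys


def restart_command(resource_group: str, vm_name: str) -> list[str]:
    """Build the Azure CLI command line that restarts a single VM."""
    return ["az", "vm", "restart", "--resource-group", resource_group, "--name", vm_name]


def restart_vms(resource_group: str, vm_names: list[str]) -> None:
    """Restart each VM in turn, stopping at the first failure (check=True)."""
    for name in vm_names:
        print(f"Restarting {name}...", file=sys.stderr)
        subprocess.run(restart_command(resource_group, name), check=True)


if __name__ == "__main__":
    # Usage: python restart_vms.py <resource-group> <vm-name> [<vm-name> ...]
    if len(sys.argv) < 3:
        print("usage: restart_vms.py <resource-group> <vm-name> [<vm-name> ...]", file=sys.stderr)
    else:
        restart_vms(sys.argv[1], sys.argv[2:])
```

Restarting one VM at a time is deliberately conservative: it is slower, but a failure stops the run before it touches the next machine.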
The moral of the story, as ever, is to always check DNS for problems. Yes, I'm serious. And even simple fixes to complex cloud-based systems can lead to very complicated issues. So always pay close attention to production systems when patching them, even for the most minor issues.
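In that spirit, a quick DNS health check after patching can catch this kind of breakage before your users do. This is a minimal sketch using only Python's standard-library `socket.getaddrinfo`, which goes through the system resolver. The host names probed are illustrative assumptions; check the names your own services actually depend on.

```python
import socket


def resolves(hostname: str) -> bool:
    """Return True if the system resolver turns hostname into at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def dns_report(hostnames: list[str]) -> dict[str, bool]:
    """Map each hostname to whether it resolved."""
    return {name: resolves(name) for name in hostnames}


if __name__ == "__main__":
    # Illustrative targets only; substitute the names your workloads rely on.
    report = dns_report(["ubuntu.com", "azure.microsoft.com"])
    for name, ok in report.items():
        print(f"{name}: {'ok' if ok else 'FAILED'}")
    if not all(report.values()):
        print("DNS resolution is failing; check the resolver configuration before going further.")
```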