LinuxQuestions.org

jackwayneright 01-01-2024 12:42 PM

DNS issues - unable to ping domain names
 
Hello! I'm attempting to debug what I think is a DNS issue with a server, and I'm unsure how to proceed.

Main information
The server can `ping 8.8.8.8`, but cannot `ping google.com` (`Name or service not known`). My `/etc/resolv.conf` contains:
Code:

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#    DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 8.8.8.8
nameserver 127.0.0.53
options edns0

These are the contents after the file is regenerated using `resolvconf -u`.

Previously, when I manually added `nameserver 8.8.8.8` to the top of `/etc/resolv.conf`, I was able to `ping google.com`, but other services (see the additional information below) still seemed to be failing in some way. Since then I've attempted other fixes, such as `sudo apt install --reinstall resolvconf network-manager libnss-resolve`, and now even the presence of `nameserver 8.8.8.8` in `/etc/resolv.conf` does not allow `ping google.com` to work. I'm also unsure where `nameserver 8.8.8.8` is being added from during a `resolvconf -u`, as none of `/etc/systemd/resolved.conf`, `/etc/resolvconf/resolv.conf.d/head`, or `/etc/resolvconf/resolv.conf.d/base` contain this entry, and `/etc/network/interfaces.d/` is empty.
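
One rough way to track down where the `8.8.8.8` entry comes from is to search the usual configuration locations for the literal address. The sketch below is only a guess at where to look: the helper name `find_dns_source` is made up for illustration, and the directories in the usage comment are common Debian/Ubuntu locations that may not all exist on this machine.

```shell
#!/bin/sh
# find_dns_source ADDR PATH...
# Hypothetical helper: list every file under the given paths that mentions ADDR.
#   -r recurse, -l print file names only, -s stay quiet about unreadable paths,
#   -w match ADDR as a whole word, -F treat ADDR as a fixed string (dots are literal).
find_dns_source() {
    addr=$1
    shift
    grep -rlswF -- "$addr" "$@" | sort
}

# Usage (paths are assumptions; run as root so protected files are readable):
#   ls -l /etc/resolv.conf    # shows whether the file is a symlink and to what
#   find_dns_source 8.8.8.8 /etc/resolvconf /run/resolvconf /etc/NetworkManager /etc/netplan /etc/systemd
```

Any file it prints is only a candidate; the entry could also be handed over at runtime by a DHCP client or NetworkManager without being stored under these paths.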

My `systemd-resolve --status` appears as:
Code:

Global
        DNS Servers: 8.8.8.8
                      8.8.4.4
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 11 (veth399f989)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 9 (vethc76fcf1)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 7 (vethbb5aff2)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 5 (br-ad7981a8fd08)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (docker0)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (eno2)
      Current Scopes: DNS
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
        DNS Servers: 134.74.128.7
                      134.74.192.2
          DNS Domain: ~.

Link 2 (eno1)
      Current Scopes: none
      LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Additional information
I originally set up this server for my PhD advisor when I was a student, about 5 years ago. Primarily, they use this server to host a WordPress site and a MediaWiki site that I set up at the time. This has continued to work fine for the last 5 years.

The first sign of an issue was that, recently, page updates began to fail for both the WordPress site and the MediaWiki site. For example, the MediaWiki pages can still be viewed, but upon attempting to submit an edit to a page, the user receives a timeout. On the server, Nginx receives the POST and PHP seems to execute the appropriate script for it, but the page is left un-updated. I'm not finding any errors in the Nginx, PHP, or database logs. Given the other DNS issues made obvious by the pinging above, I suspect that the server is sending requests to itself, but due to these DNS issues, these requests are never really sent or received.

The server is behind a university-controlled entry point, then a lab router. I no longer have physical access to the machine, but I can periodically have someone go in to physically access it when needed. As part of my attempts to fix it, at one point I ran a package update followed by a reboot. For one reason or another, the reboot did not complete, and the machine only shut down. Someone had to be sent in physically to turn the machine back on for me. So I would hope to avoid solutions that require reboots, though I understand this may often be required.

I am far from an expert in either Unix-related or networking topics, so I apologize in advance for any obvious mistakes or troubleshooting steps that I haven't checked.

Any suggestions would be greatly appreciated. Thank you for your time.

mrmazda 01-01-2024 10:11 PM

I'm no networking expert either. /etc/resolv.conf is always my first thought when ping fails. :p Typically it's just fine, and the problem is there is no default route set up. Check yours with: ip route. If that's not it, then check your firewall for blockage. Any more than that and I would have to Google it. :)
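
As a rough sketch of that first check, the snippet below pulls the gateway and device out of `ip route` output and fails when no default route exists. The function name `default_route_info` is made up for illustration.

```shell
#!/bin/sh
# default_route_info: read `ip route` output on stdin and print the gateway and
# device of the first default route; exit non-zero if there is none.
default_route_info() {
    awk '$1 == "default" {
             for (i = 2; i < NF; i++) {
                 if ($i == "via") gw  = $(i + 1)
                 if ($i == "dev") dev = $(i + 1)
             }
             print gw, dev
             found = 1
             exit
         }
         END { exit !found }'
}

# Usage:
#   ip route | default_route_info || echo "no default route"
```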

wpeckham 01-01-2024 10:26 PM

#1 Ping is not a dependable test of network continuity if you are routing through devices that may block ECHO packets.

#2 a. Test your default nameserver using lookup utilities such as nslookup, host, or dig.
Code:

nslookup www.google.com
#2 b. Then test lookup using the external nameserver you WANT to have work, using the IP address
Code:

nslookup www.google.com 8.8.8.8
Note the responding server names/addresses as the utility reports them in addition to the target information. It may be important.
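
To make comparing the two runs easier, here is a small sketch that extracts just the responding server from `nslookup` output. The function name `answering_server` is made up for illustration, and it assumes the usual `Server:` line that `nslookup` prints first.

```shell
#!/bin/sh
# answering_server: read nslookup output on stdin and print the address
# from its "Server:" line, i.e. the nameserver that actually answered.
answering_server() {
    awk '$1 == "Server:" { print $2; exit }'
}

# Usage:
#   nslookup www.google.com | answering_server            # default nameserver
#   nslookup www.google.com 8.8.8.8 | answering_server    # explicit nameserver
```

If both commands print the same address, the default lookup is already going to the external nameserver.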

#3 Whoever runs the lab network needs to be consulted. IF that fails you need to consult with whoever manages the U network. See if there is a required nameserver and if they are blocking lookups on port 53 to external nameservers.

Some requirements have changed, and some facilities now require encrypted nameserver lookups or restrict lookup traffic. If that is one of them, they may have hosed your operation without even being aware it existed.
IF they have a sysadmin or network admin that is familiar with the OS on that machine perhaps they will be willing to troubleshoot the network settings and document the fix.

Now we can talk about recovery. Without direct access to the machine, physical or network based, this will be tricky and I am not in a position to help. I suspect the updates failed in part or in full because the update/repo servers got "lost" due to the name resolution issue. If you cannot check the logs to determine what failed and what worked, I am not sure how you will remotely troubleshoot that part of the problem.

IS it an option to pull a backup of the data and rebuild the machine and reload the data? In a worst-case scenario you can still recover if that is an option and you have decent backups.

jackwayneright 01-02-2024 12:10 AM

@mrmazda
There appears to be a default route:
Code:

$ ip route
default via 134.74.113.100 dev eno2 proto static metric 100
134.74.113.0/24 dev eno2 proto kernel scope link src 134.74.113.131 metric 100
169.254.0.0/16 dev eno2 scope link metric 1000
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.113.0/20 dev br-ad7981a8fd08 proto kernel scope link src 192.168.113.1

As far as the firewall, I'm not sure exactly which commands would tell me what I need to know, or what exactly I should be looking for, but both `iptables -L` and `ufw status verbose` suggest things are more or less allowed in all cases.

Thank you for those initial checks though!

==========================================

@wpeckham
`nslookup www.google.com` and `nslookup www.google.com 8.8.8.8` both give the same result of
Code:

Server:                8.8.8.8
Address:        8.8.8.8#53

Non-authoritative answer:
Name:        www.google.com
Address: 142.250.64.100
Name:        www.google.com
Address: 2607:f8b0:4006:807::2004

I'm not exactly sure how to interpret these results. Clearly it's pointing to some address with the nameserver used being 8.8.8.8. And a different site has a different address (still using the nameserver 8.8.8.8). But I don't know if this means that a route to the external address is being made, or if this is just in a name table somewhere more local.
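
For what it's worth, `nslookup` sends its query straight to the nameserver listed in `/etc/resolv.conf`, whereas `ping` resolves names through glibc's name service switch, which follows the `hosts:` line in `/etc/nsswitch.conf`. With `libnss-resolve` installed, that line may include a `resolve` entry that hands lookups to systemd-resolved, and resolved may then prefer the per-link servers shown under `eno2` (the `~.` domain) over 8.8.8.8. That might be why the two tools disagree, though it is an assumption and not something confirmed on this server. A small sketch to see the lookup order and to test the same path `ping` uses; `hosts_lookup_order` is a made-up helper name:

```shell
#!/bin/sh
# hosts_lookup_order FILE: print the entries on the "hosts:" line of an
# nsswitch.conf-style file, one per line, in the order glibc consults them.
hosts_lookup_order() {
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) print $i; exit }' "$1"
}

# Usage:
#   hosts_lookup_order /etc/nsswitch.conf
#   getent ahosts google.com    # resolves via getaddrinfo and NSS, as ping does
```

If `getent ahosts` fails while `nslookup` succeeds, the problem is probably in the NSS or systemd-resolved path and not in `/etc/resolv.conf` itself.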

Unfortunately, I'm still probably the most knowledgeable person about the lab network (though, obviously, I'm not particularly knowledgeable). There are now other students who kind of fill the role of administrator temporarily. It's a small computer science research lab, so the students are computer science students, but none specialize in networking or system administration. They can be there physically while I work with them remotely, though. And if I can't find the solution working with them, I will certainly contact the university department network administrator to see what I can find out.

As far as recovery, I think I said things in a way that might have been confusing. As far as I know, nothing on the server is broken (except the network setup). By page updates, I meant that if a user of the MediaWiki attempted to update a page on the wiki, the submitted update would seem to hang. But as far as I know, all the software on the machine is still functioning properly. When I ran the package updates, I had manually edited `/etc/resolv.conf` to include `nameserver 8.8.8.8`, which at the time made connecting to the package repositories work, and the package updates seemed to run fine. But those updates, or one of my other attempted fixes, now keep the server from reaching external sites by name. I think that after I ran `apt install --reinstall resolvconf network-manager libnss-resolve`, `/etc/resolv.conf` started being regenerated with `nameserver 8.8.8.8`, but at the same time the server could no longer connect to external sites. That said, I'm not at all certain. I tried to revert any changes I made that didn't improve the situation, but the package updates are the one case where I didn't revert changes.

Thank you very much for your time!

wpeckham 01-02-2024 02:34 PM

That looks like name resolution IS working.

What happens when you try
Code:

ping -c 2 www.google.com
???

jackwayneright 01-02-2024 03:44 PM

Earlier today, I had one of the students work with me from in the lab. From the Ubuntu desktop, using the network settings GUI, we found that the DNS for the connection was set to manual with some university-internal addresses that are external to the lab network (e.g. `134.74.128.7, 134.74.192.2`). After changing this to `8.8.8.8 8.8.4.4` and rebooting, the server was able to access external domain names fine. I'm a bit confused as to where the GUI network settings live compared to the settings in `/etc/resolv.conf` and so on, and why the GUI one was taking precedence. I'd be interested to know if someone has the explanation for that part.
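
On where the GUI settings live: on an Ubuntu desktop install, the network GUI normally edits NetworkManager connection profiles (usually stored under `/etc/NetworkManager/system-connections/`), and NetworkManager passes each connection's DNS servers on to systemd-resolved as per-link servers, which is presumably what showed up under `Link 3 (eno2)` in the `systemd-resolve --status` output. The same setting can likely be inspected and changed over SSH with `nmcli`, without a reboot. A sketch under those assumptions; the function names and the example profile name are made up, and the `nmcli` calls in `set_link_dns` need root:

```shell
#!/bin/sh
# profile_for_device DEV: read the output of
#   nmcli -t -f NAME,DEVICE connection show --active
# on stdin and print the name of the profile bound to DEV.
# Assumes the profile name itself contains no escaped colons.
profile_for_device() {
    awk -F: -v dev="$1" '$NF == dev { sub(/:[^:]*$/, ""); print; exit }'
}

# set_link_dns PROFILE "SERVERS": set manual DNS servers on a profile, ignore
# DHCP-provided ones, and re-activate the profile so the change takes effect.
set_link_dns() {
    nmcli connection modify "$1" ipv4.dns "$2" ipv4.ignore-auto-dns yes &&
    nmcli connection up "$1"
}

# Usage (the profile name is only an example):
#   nmcli -t -f NAME,DEVICE connection show --active | profile_for_device eno2
#   set_link_dns "Wired connection 1" "8.8.8.8 8.8.4.4"
```

Re-activating a profile briefly reconfigures the link, so an SSH session over that interface may stall for a moment.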

However, submitting updates to our sites still seems to be failing. But now useful messages are showing up in the Nginx logs. Notably, HTTP POSTs to the sites now result in `499` or `408` codes, except when the POST is sent from the server itself or from a machine on the lab network. We also found that the server itself is not on the lab network, but connects to what is presumably the department network. The lab router also connects to the department network. Additionally, machines on the university wifi *do* encounter the issue with the POSTs. So the fact that machines on the lab network can successfully send POSTs, but machines on the university wifi cannot, suggests to me that there is a conflict between what the department router expects and what the server expects. So I'm now talking with the professor who owns the lab and with the department to get things sorted out.
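
To show the department which clients are affected, the failing POSTs can be tallied per client address from the access log. This is a sketch that assumes Nginx's default "combined" log format (client address in field 1, request method in field 6, status in field 9); the function name `count_timeouts` and the log path in the usage comment are assumptions.

```shell
#!/bin/sh
# count_timeouts LOG: for an access log in nginx's default "combined" format,
# count POST requests that ended in 499 or 408, grouped by client and status.
# Output lines look like: CLIENT STATUS COUNT
count_timeouts() {
    awk '$6 == "\"POST" && ($9 == 499 || $9 == 408) { n[$1 " " $9]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}

# Usage (log path is an assumption):
#   count_timeouts /var/log/nginx/access.log
```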

Thank you again for all your time!


All times are GMT -5.