[SOLVED] ssh can't connect until I ping the server, then it can. How can that be? Weird situation...

vincentvije · 08-29-2022, 05:01 PM

Hello,

After losing several hours with no results, I hope the community can give me some ideas.

I run an SFTP server inside a school. It worked fine until the last holidays.
Clients always connected to it with sshfs without any problem... but for the past few days, they can't.

Here is the description: if I try to connect to the server at 172.16.0.57 with sshfs or ssh from a Linux client (for example 172.16.0.101), sshfs or ssh says that the user can't be found...

Then I open a root shell on the client and ping 172.16.0.47...
When I try again, it works: sshfs or ssh connects...
I would need to do this on every client, and there are 200. And again after every reinstall... Impossible.

I spent hours trying to figure it out, moving things around on the network and checking the server and the client, with no success.
It's as if TCP doesn't work unless an ICMP packet goes out first. The network is purely switched.

Maybe the solution is easy, but I can't figure it out...

Thanks a lot for all your help.
Regards,
Vincent

Turbocapitalist · 08-30-2022, 01:06 AM

Root should not be necessary for ping. This sounds like some kind of router problem not related to either the SSH client or the SSH servers. What, specifically, changed with the networking, do all the servers have the right addresses, and what kind of timeouts are set in the router?

In the meantime, if you're not the one who has to fix the router, a workaround might be to use the Match directive in the client's configuration file while you negotiate the router repairs.

Code:

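# For 172.16.x.x targets, ssh runs this ping first (%h expands to the target host).
# The ping is there for its side effect; the block only applies if it exits 0.
Match host 172.16.*.* exec "ping -c 2 -w 2 -q %h"
        # Placeholder directive: ignored unless PermitLocalCommand yes is also set.
        LocalCommand date +"Today is %%F"

# Options for the school's hosts
Host 172.16.*.*
        AddKeysToAgent yes
        IdentitiesOnly yes
        UpdateHostKeys yes

# Keepalive settings for every host
Host *
        ServerAliveCountMax 4
        ServerAliveInterval 30
        TCPKeepAlive yes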

Or whatever.

vincentvije · 08-30-2022, 02:37 AM

Hello Turbocapitalist,

Thanks a lot !
Yes, of course, it's the same with root or a normal user account.

I manage the network, so I thought it could be the source of the problem, but I found nothing.
The server and the client are on the same switched network, and there is no router in between.
I thought it could be a switch or Ethernet cable problem, but it can't be, because the ping solves the problem.
Still, after pinging from a client, the sshfs connection is sometimes slow.

It's as if the switches can't find the path for sshfs, while ICMP, which doesn't use the transport layer, seems to let them find it. But that shouldn't happen on a purely switched network.

In the meantime, as you suggested, I will try your solution and report back.
I'll be able to confirm your proposal tomorrow (Wednesday) or Thursday.

This is weird and I've lost a lot of time, but if your workaround works, that's already a solution. Thanks!
I'd still like to find the cause of the problem...

I will report the results here.
If needed, I'll post further questions.

Thanks a lot again !
Regards,
Vincent

elgrandeperro · 08-30-2022, 09:20 AM

Are the client and host on the same network? If so, the traffic is layer 2 and should never reach the router.

My first thought (and I have seen this before) is that something is doing proxy ARP. Before pinging, run "arp -a" to see whether the ARP
table has other entries for that IP. Then ping, and see if it changes.
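
The before/after comparison above can also be scripted. This is a minimal sketch, assuming a Linux client where the kernel exposes the ARP cache as text in /proc/net/arp (the same data "arp -a" prints); the function names are hypothetical, not part of any existing tool.

```python
# Hedged sketch: snapshot the ARP cache before and after a ping and list what changed.
# Assumes a Linux client (/proc/net/arp). Function names are illustrative only.

INCOMPLETE_MAC = "00:00:00:00:00:00"  # shown for entries that never resolved


def parse_arp_table(text: str) -> dict:
    """Map IP address -> MAC address from /proc/net/arp-style text."""
    table = {}
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, mac = fields[0], fields[3].lower()
        if mac != INCOMPLETE_MAC:
            table[ip] = mac
    return table


def read_arp_table(path: str = "/proc/net/arp") -> dict:
    """Read the live ARP cache (Linux only)."""
    with open(path) as f:
        return parse_arp_table(f.read())


def diff_arp(before: dict, after: dict) -> dict:
    """Return {ip: (old_mac, new_mac)} for entries added, removed or changed."""
    changes = {}
    for ip in before.keys() | after.keys():
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changes[ip] = (old, new)
    return changes
```

To use it, call read_arp_table() once before running the ping and once after, then print diff_arp(before, after). A new or changed MAC for the server's IP is the thing to look at.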

Some range extenders do proxy ARP to reduce the traffic across them.

I've seen strange things when netmasks were not consistent, but the usual symptom is that a box on the network cannot talk to another box on the same network, because instead of ARPing for it, it sends the traffic to the default route (layer 3 instead of layer 2).

Let's start with whether they are on the same network or not.

vincentvije · 08-30-2022, 04:18 PM

Many thanks, elgrandeperro,

They are on the same switched network inside the school: the server is 172.16.0.47 and the client is 172.16.100.x.

I also think there is a problem with MAC addresses, maybe on some switch.
I checked the client; the ARP table is empty.

But maybe the switches are doing something strange.
I'll be able to check all of this on Thursday.

In the meantime, what do you mean by "instead of arping it default routes it (layer 3 instead of layer 2)"?
You mean it doesn't try to reach it at layer 2?
We can also talk in Spanish (or French or Italian).

Thanks again,
Kind regards,
Vincent

elgrandeperro · 08-30-2022, 08:37 PM

Yes. The netmask tells the interface when to ARP for the destination directly and when to just use the default route. So if the mask is wrong, the host doesn't think the destination is local and sends the packet to the default route, even though it is connected to the same wire. And then some routers consider the packet to be spoofed.
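
As a rough illustration of that netmask decision, the sketch below uses Python's standard ipaddress module to check whether the server address falls inside the client's interface network. It assumes the simple case of one interface plus a default route, and the client address 172.16.100.5 is hypothetical (the thread only says 172.16.100.x).

```python
# Hedged sketch of the on-link decision: with one interface and a default route,
# a host ARPs directly only for destinations inside its own interface network.
# 172.16.100.5 is a hypothetical client address.
import ipaddress


def next_hop(local_iface: str, destination: str) -> str:
    """local_iface is 'address/prefix', for example '172.16.100.5/16'."""
    network = ipaddress.ip_interface(local_iface).network
    if ipaddress.ip_address(destination) in network:
        return "on-link"        # layer 2: ARP for the destination itself
    return "default route"      # layer 3: hand the packet to the gateway


if __name__ == "__main__":
    for iface in ("172.16.100.5/16", "172.16.100.5/24"):
        print(iface, "->", next_hop(iface, "172.16.0.47"))
```

With a /16 on the client the server is on-link; with a /24 the same server would be sent to the default route, which is the mismatch described above.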

So is it 172.16.0.0/16, or 172.16.0.0/24 for the server and 172.16.100.0/24 for the client? Is traffic to the server routed, at layer 3?

And is it a managed switch like a Cisco etc.?

I've seen a setup where people put servers on one network segment, then enable proxy ARP so that each smaller segment gets proxy-ARP answers for the server segment. Kind of like layer 3 access without a router.

It has to be something very strange like that.

You can compare the MAC address of your server with the one in the client's ARP table. With proxy ARP, the client would see a different MAC, such as that of a router or a switch that does limited layer 3.
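
That MAC comparison could be automated along these lines. This is a minimal sketch, assuming the client's ARP table is available as /proc/net/arp-style text and that you already know the server's real NIC MAC (for example from "ip link" on the server); the function names and MAC values are illustrative. A shared MAC is only a hint, since a multi-homed host can legitimately answer for several IPs.

```python
# Hedged sketch: look for signs of proxy ARP in a client's ARP table.
# Assumes /proc/net/arp-style input and a known server MAC. Names are illustrative.
from collections import defaultdict

INCOMPLETE_MAC = "00:00:00:00:00:00"


def parse_arp_table(text: str) -> dict:
    """Map IP address -> MAC address, skipping the header and unresolved entries."""
    table = {}
    for line in text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 6 and fields[3].lower() != INCOMPLETE_MAC:
            table[fields[0]] = fields[3].lower()
    return table


def proxy_arp_suspects(table: dict, server_ip: str, server_mac: str) -> list:
    """Return human-readable findings; an empty list means nothing suspicious."""
    findings = []
    server_mac = server_mac.lower()
    learned = table.get(server_ip)
    if learned is not None and learned != server_mac:
        findings.append(f"{server_ip} resolves to {learned}, not the server's MAC {server_mac}")
    # One MAC answering for several IPs may be a proxy-ARP device (or a multi-homed host).
    ips_by_mac = defaultdict(list)
    for ip, mac in table.items():
        ips_by_mac[mac].append(ip)
    for mac, ips in ips_by_mac.items():
        if len(ips) > 1:
            findings.append(f"{mac} answers for {len(ips)} addresses: {', '.join(sorted(ips))}")
    return findings
```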

vincentvije · 08-31-2022, 03:41 PM

Hello, both use /16, so that's fine. Everything is layer 2.
The switches are managed, but I haven't changed anything in years.
But you're right, something wrong with the MAC tables in the switches could be the problem... I'll check it tomorrow and report back.
Why I would have this problem, though, I don't know.
Regards,
Vincent

vincentvije · 09-01-2022, 10:03 AM

Thanks elgrandeperro, the workaround got us through the day so we could work!

And thanks Turbocapitalist, I checked the school's main network: this summer a new layer 2/3 switch was installed in place of a standard one without anyone telling me... It was configured for layer 2/3 with proxy because it was meant for another part of the network. I replaced it with a standard switch and everything went back to normal, and the network got its speed back too!
The problem is solved: I checked the network this afternoon and found it... The technician set it up in a rush this summer after a power outage...

I'm very sorry for the trouble...
Well, I learned how to use exec in ssh_config, which I wasn't aware of...
And I hadn't thought about a layer 2/3 switch problem, so without your advice, I wouldn't have checked it.

Thanks again, regards !