Old 05-03-2011, 08:20 PM   #16
slimm609
Member
 
Registered: May 2007
Location: Chas, SC
Distribution: slackware, gentoo, fedora, LFS, sidewinder G2, solaris, FreeBSD, RHEL, SUSE, Backtrack
Posts: 430

Rep: Reputation: 67

Quote:
Originally Posted by Skaperen View Post
This is not relevant to security. To begin with, Linux will respond on both interfaces to both IP addresses. So the distinction between them is moot. Any listening service that listens on the unspecified address (0.0.0.0) can be connected to from either interface via either IP address. To limit the service to be connected to from only the local machines, that service needs to bind its listen socket to only the private IP address. Even then, it can be reached from either interface. There are ARP settings to sort-of turn that off.

The best way to prevent outside access to the server is to block it at the firewall, whether or not NAT is used for other things. Just don't provide a path to that service and/or that machine from outside.

It's also moot to my issue because the security concern is addressing situations where access to the machine and ports are there internally, but the backups need to be compartmentalized. It's an entirely different security aspect than the one for preventing outsiders from getting in.
Since when? Almost every service in Linux can bind to a single interface. With sshd, the option in sshd_config is "ListenAddress 10.10.10.2", and it would then ONLY listen on that interface. This option has been around for years and years. Apache, mysqld, OpenLDAP and many other services have the same option.
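
For example (the address here is just a placeholder):

Code:
# /etc/ssh/sshd_config -- listen only on the private/backup address
ListenAddress 10.10.10.2
Port 22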

It's really not any different. Here is some security "guidance" from multiple government agencies and also the DoD. I assure you all of your concerns are covered in these documents, and I can also assure you that everything I have said about the way to do it meets or exceeds ANY of the controls that would address it in all of the documents.

Department of Defense INSTRUCTION 8500.2
http://www.dtic.mil/whs/directives/c...df/850002p.pdf
DIRECTOR OF CENTRAL INTELLIGENCE DIRECTIVE 6/3
http://www.fas.org/irp/offdocs/DCID_6-3_20Manual.htm
NIST 800.53
http://csrc.nist.gov/publications/ni...05-01-2010.pdf

Quote:
Originally Posted by Skaperen View Post
This is still letting the user (or other server) access root via ssh. And that's what I want to avoid. That's a 3 minute window when anything can be done, not just a backup. Logging in as root is what I want to totally avoid.

See, what I need to do is place restrictions on what a user (server to be backed up) can access within the backup machine, while still providing for a means to store metadata. It's also necessary to be sure the user can't store SUID executables and gain full root access that way. They still need to be able to store SUID executables in terms of storing that metadata (e.g. to back up an SUID executable from that server, and be able to restore it exactly that way so it can run as intended on that server).

But I still need the level of security which allows the backup client to be authenticated by the backup server (so someone cannot spoof the client and overwrite its backups), and also allow the backup client to authenticate that it is connected to the real backup server and not a spoof.

One possible solution is to compartmentalize the backup servers ... a separate backup server for each group (for which trust is administratively disallowed between groups).
Security engineering is not about making the most secure server in the world. It's about balancing risk vs. security vs. cost. If the server allows root ONLY with SSH pre-shared keys for 3 minutes a day (and with the newest versions of OpenSSH and OpenSSL you can use ECC keys for SSH), along with the ListenAddress setting and all the other things that have been stated throughout the posts, the risk vs. cost is very minimal. If you have 2000 servers to back up, sure, having one backup server in each group makes a little more sense, but if you only have a handful of servers to back up, then the cost of managing/maintaining extra servers does not outweigh the gained security, which is virtually zero with a proper configuration. It all comes down to either fixing the risk or mitigating the risk. You have servers that you run services on, and you use a firewall to help mitigate the risk of the server being attacked; then you have an outer router with ACLs configured to further mitigate the risk to the server and also help mitigate the risk to the firewall. You will NEVER have a server that is 100% secure.
 
Old 05-04-2011, 04:58 AM   #17
jadrevenge
Member
 
Registered: Mar 2011
Location: Manchester,UK
Distribution: OpenIndiana/Ubuntu
Posts: 37

Rep: Reputation: 2
Quote:
Originally Posted by Skaperen View Post
I don't follow.

...
3 systems: 1 origin, 2 replicas

every day a script runs on the origin that makes a snapshot on the remote system (using the current date as the snapshot name), takes about 5 seconds.

on that remote server there is now a quickly accessible backup (as a mounted filesystem) of the entire filesystem at the point in time of the snapshot.

that is not enough for us ... even on Raid'ed disks ... we want a remote backup in case of fire/hardware failure.

we run a "zfs send -i <filesystem>@<yesterdays-snapshot> <filesystem>@<snapshot>", to get the incremental snapshot data, and pipe it to a file.

this file is then rsync'd to the remote servers (as a normal user) where a separate script runs looking at the folders (running as root) where the "zfs receive" command runs to restore the snapshot.

the data file is stored (temporarily only) in case we have had trouble with file transfer, or missed a day in file receipt (we cannot guarantee any of our connections to not go down when there is no one on site, or only clumsy people on site, or engineering contractors turning off power to plug in an "oven" )
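
In outline, the daily send-and-transfer step is something like this (pool/filesystem names, dates and hosts are made up here; a simplified sketch, not our actual scripts):

Code:
# on the origin: take today's snapshot (takes seconds)
zfs snapshot tank/data@2011-05-04
# write the incremental stream (yesterday -> today) to a file
zfs send -i tank/data@2011-05-03 tank/data@2011-05-04 > /var/tmp/2011-05-04.zfs
# ship it to the replica as a normal user
rsync -av /var/tmp/2011-05-04.zfs backupuser@replica:/incoming/
# later, a root script on the replica picks the file up and applies it:
#   zfs receive -F tank/data < /incoming/2011-05-04.zfs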

to get the data back:

if the origin server is still alive we can get the file directly from the snapshot on that server, instantly because it is mounted.

if the origin server is dead, or we want access to the files at the site where the replica are stored we can get it directly off the mounted snapshots at those sites.

the recovery is instant ... but that is because the snapshots and backups are restored constantly, not at the time that we need to recover a file.

there are a large number of checking scripts and system admin in place to watch for partial rsync'd files, or untransferred data. Scripts and output fortunately not watched by me any more.

when we set up a new site to be replicated a copy of the initial snapshot is taken physically onto removable media (USB Hard Drive) and transferred to our local site, if that physical transfer fails we can always repeat the process with a different device, or if the initial snapshot is small enough rely on transferring it over the VPN, but it hasn't failed yet ...

I hope this goes some way to helping with your understanding of ZFS, or at least with our ZFS backup routine.

Jon
 
Old 05-04-2011, 08:27 AM   #18
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by slimm609 View Post
Since when? Almost every service in Linux can bind to a single interface. With sshd, the option in sshd_config is "ListenAddress 10.10.10.2", and it would then ONLY listen on that interface. This option has been around for years and years. Apache, mysqld, OpenLDAP and many other services have the same option.
Binding to the interface is different than binding to an IP address. I've seen no syntax in Apache to specify what interface (though there might be one). Are you saying Apache does an interface binding on its own based on the IP addresses bound to the interface? What if an IP address is on two of four interfaces? Then what would Apache do? Create 2 sockets (since a socket can only bind to one interface)?

Quote:
Originally Posted by slimm609 View Post
It's really not any different. Here is some security "guidance" from multiple government agencies and also the DoD. I assure you all of your concerns are covered in these documents, and I can also assure you that everything I have said about the way to do it meets or exceeds ANY of the controls that would address it in all of the documents.

Department of Defense INSTRUCTION 8500.2
http://www.dtic.mil/whs/directives/c...df/850002p.pdf
DIRECTOR OF CENTRAL INTELLIGENCE DIRECTIVE 6/3
http://www.fas.org/irp/offdocs/DCID_6-3_20Manual.htm
NIST 800.53
http://csrc.nist.gov/publications/ni...05-01-2010.pdf
I did not see my concerns covered in these at all. The first one just didn't even come close. The second one does cover some server related issues, but makes assumptions about delineation that do not apply here (it assumes 2 levels and that isn't the case). It also suggests the very things I am already trying to accomplish (but have yet to see technical solutions). The third seems to be more about the decision making structure than anything else. None of these address my original concern. They don't even address the (IMHO incorrect) direction you are suggesting regarding interface or address level security.

Quote:
Originally Posted by slimm609 View Post
Security engineering is not about making the most secure server in the world. It's about balancing risk vs. security vs. cost. If the server allows root ONLY with SSH pre-shared keys for 3 minutes a day (and with the newest versions of OpenSSH and OpenSSL you can use ECC keys for SSH), along with the ListenAddress setting and all the other things that have been stated throughout the posts, the risk vs. cost is very minimal. If you have 2000 servers to back up, sure, having one backup server in each group makes a little more sense, but if you only have a handful of servers to back up, then the cost of managing/maintaining extra servers does not outweigh the gained security, which is virtually zero with a proper configuration. It all comes down to either fixing the risk or mitigating the risk. You have servers that you run services on, and you use a firewall to help mitigate the risk of the server being attacked; then you have an outer router with ACLs configured to further mitigate the risk to the server and also help mitigate the risk to the firewall. You will NEVER have a server that is 100% secure.
I am not trying to make the most secure server in the world. If I were, it would be disconnected. Security is about a balance between the needs for function and the needs to control access to that function.

As for the 3 minute thing, that makes an assumption that the attacks are just random in nature. If there was an attack on this backup system, it would not be just some random attack. Random attacks won't be anywhere close to it. Those are already blocked. The attack on this system would come from someone on the inside, who should not have access to certain parts of the backup data. It would be from an administrator of a server in one group trying to access data from another group.

While I admit such an attack is very unlikely, the path for such an attack would exist in the "backup push model" (because root on the server being backed up has the private key that is being allowed to have root access on the backup server which is holding data from all groups). The "backup pull model" by comparison is considered the unsafe approach because it has "all eggs in one basket" (the backup server has a private key that every other server allows root access). We are doing it this way now, and that needs to be changed.

I'm not looking for bureaucratic requirements and models. I already know the kind of compartmentalization I need to have. What I am looking for is a means to deploy it that way. And it will require a "two way trust" structure, discrete control of the trust relationships, as well as no access in either direction to root processes. I suspect some commercial backup products could do this. However, an open source solution is much preferred. That, and continued support for the reverse incremental strategy we now use. If rsync had an integrated SSL support with two way trust (e.g. client and server each authenticating the other against their own specifications of who is to be trusted), that might well do it. Rigging it up with ssh might work, but it might be harder to show that it is correct.
 
Old 05-04-2011, 08:45 AM   #19
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by jadrevenge View Post
3 systems: 1 origin, 2 replicas

every day a script runs on the origin that makes a snapshot on the remote system (using the current date as the snapshot name), takes about 5 seconds.

on that remote server there is now a quickly accessible backup (as a mounted filesystem) of the entire filesystem at the point in time of the snapshot.

that is not enough for us ... even on Raid'ed disks ... we want a remote backup in case of fire/hardware failure.

we run a "zfs send -i <filesystem>@<yesterdays-snapshot> <filesystem>@<snapshot>", to get the incremental snapshot data, and pipe it to a file.
I assume the amount of data would be just the difference between two snapshots. But is it forward (with this difference and the previous full system, you can derive the current full system), or reverse (with this difference and the current full system, you can derive the previous full system)?

Can you be selective about what portions of the filesystem are piped to this file? Say you have a big filesystem on a big RAID device, but only part of the data needs to be backed up (because the other data is relatively constant and is more easily restored from where it came)?

Quote:
Originally Posted by jadrevenge View Post
this file is then rsync'd to the remote servers (as a normal user) where a separate script runs looking at the folders (running as root) where the "zfs receive" command runs to restore the snapshot.

the data file is stored (temporarily only) in case we have had trouble with file transfer, or missed a day in file receipt (we cannot guarantee any of our connections to not go down when there is no one on site, or only clumsy people on site, or engineering contractors turning off power to plug in an "oven" )

to get the data back:

if the origin server is still alive we can get the file directly from the snapshot on that server, instantly because it is mounted.
This appears to be assuming the origin server is large enough to hold the differentials. If not, can you just drop/discard older snapshots and keep, say, the last 3 or 4 of them?

Quote:
Originally Posted by jadrevenge View Post
if the origin server is dead, or we want access to the files at the site where the replica are stored we can get it directly off the mounted snapshots at those sites.

the recovery is instant ... but that is because the snapshots and backups are restored constantly, not at the time that we need to recover a file.

there are a large number of checking scripts and system admin in place to watch for partial rsync'd files, or untransferred data. Scripts and output fortunately not watched by me any more.

when we set up a new site to be replicated a copy of the initial snapshot is taken physically onto removable media (USB Hard Drive) and transferred to our local site, if that physical transfer fails we can always repeat the process with a different device, or if the initial snapshot is small enough rely on transferring it over the VPN, but it hasn't failed yet ...

I hope this goes some way to helping with your understanding of ZFS, or at least with our ZFS backup routine.

Jon
I think I understand most of that, now. But doing this on each of the servers is really not practical here. Using ZFS on the backup servers might be. But I still want a "reverse increment" so I can use a sliding window of discarded differentials (as opposed to the classic backup schemes that required full backups ... done in those days on tape ... periodically).

And I still need to figure out the security issue between the individual servers and the backup servers.
 
Old 05-04-2011, 09:46 AM   #20
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Basically, my goal is to establish application-level, secure, two-way trusted communication that is not a login session, and use that for rsync. This is the kind of thing SSL/TLS is for. I think a solution would be a wedge library that intercepts socket communication at the library level to force-implement SSL support in an application, and to run rsync under that. I don't know if such a library exists. An alternative is to add SSL support to the rsync source code, along with options or config files to specify the authorizations. Possibly stunnel could accomplish this.
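
For example, something along these lines might work, with stunnel wrapping the rsync daemon and each side verifying the other's certificate against its own CA file (host names, ports and paths are only placeholders; this is a sketch, not a tested setup):

Code:
; --- backup server side (stunnel.conf), in front of a local rsync daemon ---
cert = /etc/stunnel/backupserver-cert.pem
key = /etc/stunnel/backupserver-key.pem
; only clients whose certificates chain to this file are accepted
CAfile = /etc/stunnel/allowed-clients.pem
verify = 2

[rsyncd]
accept = 874
connect = 127.0.0.1:873

; --- client side (the server being backed up) ---
client = yes
cert = /etc/stunnel/thishost-cert.pem
key = /etc/stunnel/thishost-key.pem
; the backup server CA/certificate we are willing to trust
CAfile = /etc/stunnel/backupserver-ca.pem
verify = 2

[rsyncd]
accept = 127.0.0.1:873
connect = backupserver.example:874

; the backup itself then talks to the local end of the tunnel, e.g.
;   rsync -a /data/ rsync://127.0.0.1/somemodule/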
 
Old 05-04-2011, 01:20 PM   #21
slimm609
Member
 
Registered: May 2007
Location: Chas, SC
Distribution: slackware, gentoo, fedora, LFS, sidewinder G2, solaris, FreeBSD, RHEL, SUSE, Backtrack
Posts: 430

Rep: Reputation: 67
Quote:
Originally Posted by Skaperen View Post
Binding to the interface is different than binding to an IP address. I've seen no syntax in Apache to specify what interface (though there might be one). Are you saying Apache does an interface binding on its own based on the IP addresses bound to the interface? What if an IP address is on two of four interfaces? Then what would Apache do? Create 2 sockets (since a socket can only bind to one interface)?
Why would you run the same IP on 2 different interfaces??

If I have a server where eth0 and eth1 both have 10.10.10.2, would both interfaces even come up? (I haven't ever tried, because there is no reason to.)

If my server (server1) is 10.10.10.2 on eth0 and eth1, and I have another (server2) as 10.10.10.3 on network 1 (eth0 on server1) and another (server3) as 10.10.10.3 on network 2 (eth1 on server1), how do you ssh to server2 vs. server3? From a web browser on server1, how do you go to a website on server2 vs. server3?

Also, how do you bind to an interface that is layer 2 with a port that is layer 4? Do you just skip layer 3??

Last edited by slimm609; 05-04-2011 at 04:26 PM.
 
Old 05-05-2011, 03:57 AM   #22
jadrevenge
Member
 
Registered: Mar 2011
Location: Manchester,UK
Distribution: OpenIndiana/Ubuntu
Posts: 37

Rep: Reputation: 2
Quote:
Originally Posted by Skaperen View Post
I assume the amount of data would be just the difference between two snapshots. But is it forward (with this difference and the previous full system, you can derive the current full system), or reverse (with this difference and the current full system, you can derive the previous full system)?
when you restore a snapshot you have an image of what the entire file system was like at the time of the snapshot.

if you restore a snapshot you will have all the files at the time the snapshot was taken ... however you cannot just restore an incremental snapshot without the previous snapshot, and you cannot restore the previous one unless somewhere down the line there is an initial snapshot.

The zfs send command ("zfs send -i <filesystem>@<yesterdays-snapshot> <filesystem>@<snapshot>") specifies the previous snapshot you want to increment from.

Quote:
Originally Posted by Skaperen View Post
Can you be selective about what portions of the filesystem are piped to this file? Say you have a big filesystem on a big RAID device, but only part of the data needs to be backed up (because the other data is relatively constant and is more easily restored from where it came)?
Yes ... ZFS has "pools", which can be entire disks, partitions or, in Solaris, "slices" of a partition.

Within these pools you can create ZFS "filesystems"; each filesystem can have its own snapshots, and they can be hierarchical, e.g.:

the pool is called "rpool"
the base file system is "rpool" which is mounted on /rpool
there are subdirectories:
rpool/ROOT which is mounted on /
rpool/export which is mounted on /export
rpool/export/home which is mounted on /export/home
rpool/export/home/jadrevenge which is mounted on /export/home/jadrevenge
rpool/export/home/otheruser which is mounted on /export/home/otheruser
rpool/local which is mounted on /rpool/local
rpool/local/user which is mounted on /usr/local
rpool/local/opt which is mounted on /opt

etc...

If I ask it to recursively snapshot rpool/export, it will snapshot all the filesystems under rpool/export, no matter where those filesystems are mounted.

I can, however, choose to send the snapshot stream for only a child filesystem if I want to.
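
For example (made-up dates):

Code:
# snapshot everything under rpool/export in one go
zfs snapshot -r rpool/export@2011-05-05
# but send the incremental stream for just one child filesystem
zfs send -i rpool/export/home@2011-05-04 rpool/export/home@2011-05-05 > /var/tmp/home.zfs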

Quote:
Originally Posted by Skaperen View Post
This appears to be assuming the origin server is large enough to hold the differentials. If not, can you just drop/discard older snapshots and keep, say, the last 3 or 4 of them?
The snapshots take up no extra space if you only have one snapshot ... and if all you are doing is adding data to the filesystem they will take up almost no space at all.

if you are in the habit of deleting data from the drive constantly, or editing the same large file every day and replacing the entire contents (e.g. /tmp or /swap, or creating large numbers of core files, etc.), then the snapshots will take up more space.

yes, you can discard snapshots although if you're duplicating to another server make sure you keep the one that matches the last one on the remote server so you can keep restoring.

Quote:
Originally Posted by Skaperen View Post
I think I understand most of that, now. But doing this on each of the servers is really not practical here. Using ZFS on the backup servers might be. But I still want a "reverse increment" so I can use a sliding window of discarded differentials (as opposed to the classic backup schemes that required full backups ... done in those days on tape ... periodically).

And I still need to figure out the security issue between the individual servers and the backup servers.
Our system is not perfect, and definitely won't suit everyone. We use Solaris for our servers, so ZFS came naturally.

We had to stop using tape when we couldn't fit 1 day on a DDS-5 tape. That was a very sad day; years of experience and cost and scripting were suddenly thrown away.

We had to bite the bullet with ZFS; we have/had a Solaris support contract. During the early days of ZFS there were issues with USB hard drives, and one almost-fatal "zfs restore" when the ZFS system seized, taking out / ... but the issues and fixes are long behind us.

When we started doing this it was all done by hand ... with a couple of very basic wrapper scripts made by my non-scripting colleague to get the system working. I tried very hard to get his shell scripts working and comment them and ... In the end the restore script had to be written in Perl; string handling is so much easier than in sh scripts (don't get me started on csh <shudder>).

We still use his minorly incomprehensible "zfs snapshot" and "zfs send" scripts; since they only run once a day and can't cause any major damage I just left them alone. His restore scripts caused problems, though, because he would start off 4 "zfs restore" commands at the same time with cron; my script only allows one at a time.

I know this is off topic, I hope some of my information has been correct, and helpful
 
Old 05-05-2011, 08:13 AM   #23
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by slimm609 View Post
Why would you run the same IP on 2 different interfaces??
Two answers. Redundancy. It happens, anyway.

Quote:
Originally Posted by slimm609 View Post
If I have a server where eth0 and eth1 both have 10.10.10.2, would both interfaces even come up? (I haven't ever tried, because there is no reason to.)
Sure. Note that you do not get load balancing. But if every server does this, and every eth0 is on one switch, and every eth1 is on another switch, you now have redundancy.

Even if you don't bind the interfaces, ARP can cross over. If eth0 has 10.10.10.2 and eth1 has 10.10.11.2, and an ARP request arrives on eth0 asking for 10.10.11.2, then the kernel's ARP code will answer for 10.10.11.2 with eth0's MAC address. You can also bind 10.10.12.2 on interface lo and it will still do that for 10.10.12.2 on either eth0 or eth1. But this has limited capability. One example is that outgoing connections with an unspecified source address have to get one from the interface they are best routed out via, and if that IP address doesn't work for the destination, you don't gain from it (hopefully the routing table will have things going to the right destinations that properly match the bound addresses ... but I have seen that fail, too).

from linux-2.6.35.9/Documentation/networking/ip-sysctl.txt line 835 ...

Quote:
arp_filter - BOOLEAN
1 - Allows you to have multiple network interfaces on the same
subnet, and have the ARPs for each interface be answered
based on whether or not the kernel would route a packet from
the ARP'd IP out that interface (therefore you must use source
based routing for this to work). In other words it allows control
of which cards (usually 1) will respond to an arp request.

0 - (default) The kernel can respond to arp requests with addresses
from other interfaces. This may seem wrong but it usually makes
sense, because it increases the chance of successful communication.
IP addresses are owned by the complete host on Linux, not by
particular interfaces. Only for more complex setups like load-
balancing, does this behaviour cause problems.

arp_filter for the interface will be enabled if at least one of
conf/{all,interface}/arp_filter is set to TRUE,
it will be disabled otherwise
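
Turning it on is just a sysctl, for example:

Code:
# enable arp_filter for all interfaces (or per interface via conf/eth0/arp_filter)
sysctl -w net.ipv4.conf.all.arp_filter=1
# or persistently in /etc/sysctl.conf:
#   net.ipv4.conf.all.arp_filter = 1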
Quote:
Originally Posted by slimm609 View Post
If my server (server1) is 10.10.10.2 on eth0 and eth1, and I have another (server2) as 10.10.10.3 on network 1 (eth0 on server1) and another (server3) as 10.10.10.3 on network 2 (eth1 on server1), how do you ssh to server2 vs. server3? From a web browser on server1, how do you go to a website on server2 vs. server3?
You have an ambiguous situation. ARP will resolve it at random (whichever gets answered first). IPs should not span servers unless you want the redundancy to span that way. But it can confuse things if ARP flips during a connection, since mid-connection packets can go to a machine that doesn't have that connection state. So this is something generally best avoided.

Quote:
Originally Posted by slimm609 View Post
Also, how do you bind to an interface that is layer 2 with a port that is layer 4? Do you just skip layer 3??
At the socket programming layer, you can bind the socket to a device, address, and/or port (where the protocol supports ports).
 
Old 05-05-2011, 03:44 PM   #24
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by jadrevenge View Post
when you restore a snapshot you have an image of what the entire file system was like at the time of the snapshot.

if you restore a snapshot you will have all the files at the time the snapshot was taken ... however you cannot just restore an incremental snapshot without the previous snapshot, and you cannot restore the previous one unless somewhere down the line there is an initial snapshot.
So I have to keep all the previous snapshots back to the "initial" (full?) one that was done prior to the snapshot of interest? I can't have a sliding window of keeping the last 60 days worth of snapshots?

Quote:
Originally Posted by jadrevenge View Post
The snapshots take up no extra space if you only have one snapshot ... and if all you are doing is adding data to the filesystem they will take up almost no space at all.
I can see that would be minimal space, since all the data "available now" represents data somewhere along the snapshot history. But if data is deleted, then it has to be saved at least until the last history reference to it (snapshots) is deleted/dropped.

Quote:
Originally Posted by jadrevenge View Post
if you are in the habit of deleting data from the drive constantly, or editing the same large file every day and replacing the entire contents (e.g. /tmp or /swap, or creating large numbers of core files, etc.), then the snapshots will take up more space.
Yes, there is a lot of that (deleting data, or at least modifying data) happening. It's not done by humans. So of course there is more space taken up. I'd want to ONLY take that space up on the backup server(s), and not on the main server itself.

Quote:
Originally Posted by jadrevenge View Post
yes, you can discard snapshots although if you're duplicating to another server make sure you keep the one that matches the last one on the remote server so you can keep restoring.
How does this work for keeping the origin server lean (no extra copies of deleted data) and making just the backups have incrementals (so I can get to an old copy of a file)?

Quote:
Originally Posted by jadrevenge View Post
We had to stop using tape when we couldn't fit 1 day on a DDS-5 tape. That was a very sad day; years of experience and cost and scripting were suddenly thrown away.
You couldn't use 2 tapes?

FYI, back in an earlier century, I was considering writing a software "RAIT" driver for Linux. I'll see if you can guess what that would do.

I'm sort-of still thinking of writing a driver to emulate tapes on disk. Tape "files" (the ones between tape marks) would be partitions on disk. If the disk were accessed directly without the tape emulation, reading a partition sequentially would be like fast forwarding to that tape file and reading just that one file sequentially (e.g. if in tar format, you could untar straight from the partition). But with tape emulation, it would emulate the semantics of a fixed block tape drive (or possibly a variable block tape drive with block size prefixes stored in the disk drive somewhere ... w/o the clear ability to read the data in non-emulating mode). It would have included a user space program to perform certain tape-like operations w/o the need for the tape emulation driver.

But that idea is way on the far back burner, heat off, and more of an academic exercise idea anymore.

Quote:
Originally Posted by jadrevenge View Post
When we started doing this it was all done by hand ... with a couple of very basic wrapper scripts made by my non-scripting colleague to get the system working. I tried very hard to get his shell scripts working and comment them and ... In the end the restore script had to be written in Perl; string handling is so much easier than in sh scripts (don't get me started on csh <shudder>).
I made the transition from csh/tcsh over to bash years ago. I first started writing scripts in bash (not ksh), then a couple years after moved my interactive shell to bash. I still find some old leftover scripts still coded in csh, and can hardly remember what to do to fix them if they are broken. I just recode them in bash if there's an issue.

Quote:
Originally Posted by jadrevenge View Post
We still use his minorly incomprehensible "zfs snapshot" and "zfs send" scripts; since they only run once a day and can't cause any major damage I just left them alone. His restore scripts caused problems, though, because he would start off 4 "zfs restore" commands at the same time with cron; my script only allows one at a time.
His script ran in cron and started 4 commands? Did it wait for them to be done or just background them and exit?

Quote:
Originally Posted by jadrevenge View Post
I know this is off topic, I hope some of my information has been correct, and helpful
It has put ZFS on the radar for me. I think the use of forward increment in the snapshots would make it a difficult transition at this time. Maybe in the future they will add reverse increments to it.
 
Old 05-05-2011, 07:49 PM   #25
slimm609
Member
 
Registered: May 2007
Location: Chas, SC
Distribution: slackware, gentoo, fedora, LFS, sidewinder G2, solaris, FreeBSD, RHEL, SUSE, Backtrack
Posts: 430

Rep: Reputation: 67
Quote:
Originally Posted by Skaperen View Post
Two answers. Redundancy. It happens, anyway.

Sure. Note that you do not get load balancing. But if every server does this, and every eth0 is on one switch, and every eth1 is on another switch, you now have redundancy.

Even if you don't bind the interfaces, ARP can cross over. If eth0 has 10.10.10.2 and eth1 has 10.10.11.2, and an ARP request arrives on eth0 asking for 10.10.11.2, then the kernel's ARP code will answer for 10.10.11.2 with eth0's MAC address. You can also bind 10.10.12.2 on interface lo and it will still do that for 10.10.12.2 on either eth0 or eth1. But this has limited capability. One example is that outgoing connections with an unspecified source address have to get one from the interface they are best routed out via, and if that IP address doesn't work for the destination, you don't gain from it (hopefully the routing table will have things going to the right destinations that properly match the bound addresses ... but I have seen that fail, too).

from linux-2.6.35.9/Documentation/networking/ip-sysctl.txt line 835 ...

You have an ambiguous situation. ARP will resolve it at random (whichever gets answered first). IPs should not span servers unless you want the redundancy to span that way. But it can confuse things if ARP flips during a connection, since mid-connection packets can go to a machine that doesn't have that connection state. So this is something generally best avoided.
That's different addresses in your first example. For redundancy you use EtherChannels or bonding. The address gets assigned to the bond, not the interfaces. The arp_filter in the kernel is for multiple interfaces on the same network with different addresses, not for multiple interfaces with the same address.


So if you bind SSHD to an interface (layer 2, MAC address) and not an IP (layer 3), do you then access the server by MAC address on port 22? No. The OSI model still has to be followed. You bind the port to an IP, which binds to an interface.
 
Old 05-06-2011, 04:30 AM   #26
jadrevenge
Member
 
Registered: Mar 2011
Location: Manchester,UK
Distribution: OpenIndiana/Ubuntu
Posts: 37

Rep: Reputation: 2
Quote:
Originally Posted by Skaperen View Post
So I have to keep all the previous snapshots back to the "initial" (full?) one that was done prior to the snapshot of interest? I can't have a sliding window of keeping the last 60 days worth of snapshots?
To restore an incremental filesystem snapshot file you have to have the snapshot you are incrementing from ... you may only have one snapshot, or you may have every snapshot since the dinosaurs.

A sliding window is fine; you can keep as many or as few as you want ... and you don't need to keep the initial snapshot (that can be destroyed as well).
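
Dropping an old one is just, for example:

Code:
# discard a snapshot you no longer need (made-up name)
zfs destroy tank/data@2011-03-01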

Quote:
Originally Posted by Skaperen View Post
I can see that would be minimal space, since all the data "available now" represents data somewhere along the snapshot history. But if data is deleted, then it has to be saved at least until the last history reference to it (snapshots) is deleted/dropped.

Yes, there is a lot of that (deleting data, or at least modifying data) happening. It's not done by humans. So of course there is more space taken up. I'd want to ONLY take that space up on the backup server(s), and not on the main server itself.
You can set up pools and filesystems that mount in different ways to make sure all your log files are stored in a different hierarchy ...

rpool/export/home -> /export/home
rpool/export/home/myuser -> /export/home/myuser
rpool/logs/myuser -> /export/home/myuser/logs
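
Set up with something like this (made-up names; the mountpoint is just a property of the filesystem):

Code:
zfs create rpool/logs
zfs create -o mountpoint=/export/home/myuser/logs rpool/logs/myuser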

Quote:
Originally Posted by Skaperen View Post
How does this work for keeping the origin server lean (no extra copies of deleted data) and making just the backups have incrementals (so I can get to an old copy of a file)?
We have 1TB on the origin, 2TB on the replicas ... I'd say that our users produce data at an average rate of 500MB a day (sometimes peaking at 4GB; on days when they're not in, 12KB) ...

The snapshots are stored on RAIDed USB at our site, so they can be removed easily from the server when they get full and stored in a fireproof safe ... at which point a new set of disks is created with the most recent snapshot from those USB disks, and we keep a full history of snapshots.

we do pare down the origin server, but that is done manually (Kevin doesn't trust scripts with that operation)

Quote:
Originally Posted by Skaperen View Post
You couldn't use 2 tapes?
We liked the fact that tapes were reliable more than 10 years after the data was produced (yes, we did have to prove this) ... however, the backup happened via cron, was an incremental ufsdump, and failed if there was no operator there to change the tape.

the incremental in question was the week after a level 0 (full dump) was done ... usually it was beginning to fill up at the end of the 10-week cycle ...

when I started we had a DDS-1 tape drive, a server with 2GB of storage and about 12 computers in total ... (1998)

now we have 12 sites, an average of 4 servers at each site (2 for storage) ... (this site has, I don't know, about 15). Each server has a capacity of 1-2TB max (using probably 500GB) ...

DDS-5 can hold a maximum of 72GB; we got around 40GB (our data is pretty much already compressed)

we had a 10-week cycle to full dumps, 4 servers dumping to tape at this site, an average of 7 "slices" backing up from our Solaris boxes ... assuming one tape per day, that's 7 * 4 * 10, plus space for the full backups (6 tapes per server per full backup, kept for 7 years min) ...

so we're talking about 500-odd tapes needing to be stored in fireproof safe(s)/a warehouse ...

the current system does have the possibility of hardware failure (which we have hopefully mitigated with UPS'd RAID devices), and the devices are out constantly so they could be subject to fire damage, losing all backups (hopefully mitigated by having 2 replicas and one origin physically dislocated) ... but it does have live backups (we can get files back in seconds, proved only a couple of days ago), it does prove that all our backups are working (snapshots are always restored, so they are validated immediately, and not just at a time of crisis), and years of data takes up the same space as about 12 tapes (costing significantly less per GB as well).

It's not the solution for everyone, but I wouldn't want to go back to tape.

Quote:
Originally Posted by Skaperen View Post
FYI, back in an earlier century, I was considering writing a software "RAIT" driver for Linux. I'll see if you can guess what that would do.

I'm sort-of still thinking of writing a driver to emulate tapes on disk. Tape "files" (the ones between tape marks) would be partitions on disk. If the disk were accessed directly without the tape emulation, reading a partition sequentially would be like fast forwarding to that tape file and reading just that one file sequentially (e.g. if in tar format, you could untar straight from the partition). But with tape emulation, it would emulate the semantics of a fixed block tape drive (or possibly a variable block tape drive with block size prefixes stored in the disk drive somewhere ... w/o the clear ability to read the data in non-emulating mode). It would have included a user space program to perform certain tape-like operations w/o the need for the tape emulation driver.

But that idea is way on the far back burner, heat off, and more of an academic exercise idea anymore.
It'd definitely be a possibility; the cost per GB of tape just isn't worth it. You'd be much better off storing on flash if we could prove its reliability, but hard disk is the cheapest ... you would need to look into either hardware or software RAID for it though, which would mean that if you were using 2 disks that were slightly different in size you'd have trouble.

Hopefully ZFS will continue to develop in the open, either with the code-drops from Oracle, or via the Illumos project ... I know that there is a work-in-progress fuse driver for Linux, and they have a slightly older version in BSD ... it would pretty much solve this issue for you, doing all the checksumming and snapshots and de-duplication/compression etc.

Quote:
Originally Posted by Skaperen View Post
I made the transition from csh/tcsh over to bash years ago. I first started writing scripts in bash (not ksh), then a couple years after moved my interactive shell to bash. I still find some old leftover scripts still coded in csh, and can hardly remember what to do to fix them if they are broken. I just recode them in bash if there's an issue.


His script ran in cron and started 4 commands? Did it wait for them to be done or just background them and exit?
Basically the cron called each command. He assumed that it would take 5 mins to restore a snapshot and ran all the cron jobs 5 mins apart; we hadn't "scrub"bed the disk recently and the restore took 30 mins ... he also hadn't taken into account that the big rsync'd files might take 6 hours to transfer if the Internet was on a "go slow".

4 ZFS restore commands happening over the USB bus simultaneously ...

Some days it worked ... when it didn't, we had to manually fix the computer because it did the equivalent of a BSOD.

He went on holiday leaving me to deal with it ... the first day it died ... the second day it died ... at the end of the second day we had a new Perl script with a separate configuration file (for portability to our other servers, containing file locations and the names of the servers the files come from). The cron commands were removed and the script runs nohup'd in a constant loop, looking every 30 mins to see if there are new files and then restoring them one at a time, logging each file individually (easy to grep nohup.out for errors) ...

He was mad for about 5 minutes, complaining that we had messed up his system.

After that he copied the scripts and configuration to every server we do restores on.

Quote:
Originally Posted by Skaperen View Post
It has put ZFS on the radar for me. I think the use of forward increment in the snapshots would make it a difficult transition at this time. Maybe in the future they will add reverse increments to it.
Because the increment files don't store whole files, only bits of files, you cannot use an incremental backup on its own; it's like guessing what a person is like from a bit of their hair and a few loose skin fragments ... the only safe way to use it is to have it restored.

if you have it restored, then in the future you will have the historical increments ...

I've probably bored you silly, I should go away
 
Old 05-06-2011, 08:12 AM   #27
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by slimm609 View Post
That's different addresses in your first example. For redundancy you use EtherChannels or bonding. The address gets assigned to the bond, not the interfaces. The arp_filter in the kernel is for multiple interfaces on the same network with different addresses, not for multiple interfaces with the same address.
I don't know what you are referring to by "channels". Bonding isn't load balancing in the case of ethernet because each TCP connection is effectively using only one of the interfaces. It's fine for high bandwidth due to a lot of connections. It doesn't give any one connection any more bandwidth. Bonding also requires the switch be aware of it, and all interfaces be connected to the same switch, which doesn't get you any switching redundancy.

Quote:
Originally Posted by slimm609 View Post
So if you bind SSHD to an interface (layer 2, MAC address) and not an IP (layer 3), do you then access the server by MAC address on port 22? No. The OSI model still has to be followed. You bind the port to an IP, which binds to an interface.
If you are talking about binding SSHD's socket to an interface, it's done via the device index. At that point, sent packets go out that interface only. And only packets arriving on that interface go to that socket (as long as all else matches, too). If you do that with another interface present, and if a given machine is trying to connect to SSHD here, and is coming in over the other interface, the packet will be discarded. So if you are using redundancy of interfaces, you want to avoid binding sockets to specific interfaces (or else have enough sockets to bind to all the interfaces). That or set arp_filter to 1 so the kernel won't answer ARP on the other interface.
 
Old 05-06-2011, 08:55 AM   #28
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by jadrevenge View Post
To restore an incremental filesystem snapshot file you have to have the snapshot you are incrementing from ... you may only have one snapshot, or you may have every snapshot since the dinosaurs.

A sliding window is fine; you can keep as many or as few as you want ... and you don't need to keep the initial snapshot (that can be destroyed as well).
Today is May 6. Suppose you do initial snapshots the first day of each month. Someone comes to you and says they need file "foobar" as it was on March 14. Do you need to have the March 1 initial snapshot? Or would the April 1 be sufficient?

Quote:
Originally Posted by jadrevenge View Post
DDS-5 can hold a maximum of 72GB; we got around 40GB (our data is pretty much already compressed)
That's a lot of tapes to back up a 2TB drive. Even the carousel or stacker drives might be exceeded these days. But 2TB drives are under $100, so backing one up to a couple others would be the way to go, whether juggling drives like they were tapes, or having a big backup server where all the drives are spinning or spinnable.

Quote:
Originally Posted by jadrevenge View Post
we had a 10-week cycle to full dumps, 4 servers dumping to tape at this site, an average of 7 "slices" backing up from our Solaris boxes ... assuming one tape per day, that's 7 * 4 * 10, plus space for the full backups (6 tapes per server per full backup, kept for 7 years min) ...
And that's all forward increment.

Quote:
Originally Posted by jadrevenge View Post
It'd definitely be a possibility; the cost per GB of tape just isn't worth it. You'd be much better off storing on flash if we could prove its reliability, but hard disk is the cheapest ... you would need to look into either hardware or software RAID for it though, which would mean that if you were using 2 disks that were slightly different in size you'd have trouble.
For bulk backup, I think hard drives are still better than flash drives. But for bootable rescue systems and installation media, flash drives are now the way to go (more reliable than CDs and DVDs).

Quote:
Originally Posted by jadrevenge View Post
Hopefully ZFS will continue to develop in the open, either with the code-drops from Oracle, or via the Illumos project ... I know that there is a work-in-progress fuse driver for Linux, and they have a slightly older version in BSD ... it would pretty much solve this issue for you, doing all the checksumming and snapshots and de-duplication/compression etc.
If it were to get the capability to do reverse increments, it could be the win for me.

Quote:
Originally Posted by jadrevenge View Post
Basically the cron called each command. He assumed that it would take 5 mins to restore a snapshot and ran all the cron jobs 5 mins apart; we hadn't "scrub"bed the disk recently and the restore took 30 mins ... he also hadn't taken into account that the big rsync'd files might take 6 hours to transfer if the Internet was on a "go slow".

4 ZFS restore commands happening over the USB bus simultaneously ...
USB??? If you mean external USB drives are in use ... I've had a very high failure rate for those, running at about 66% in a year or two. And that's multiple brands. OTOH, my two year failure rate for internal SATA drives, even the cheap consumer models, is under 2%.

Quote:
Originally Posted by jadrevenge View Post
Because the increment files don't store whole files, only bits of files, you cannot use an incremental backup on its own; it's like guessing what a person is like from a bit of their hair and a few loose skin fragments ... the only safe way to use it is to have it restored.
But, still, a reverse increment is better because you don't need any previous/older increments to restore back to that point. So you can discard every increment earlier than date X, but still restore date X.

Note that rsync's option to back up deleted files is partially a reverse increment. It has file granularity rather than block granularity. It doesn't record the addition of new files in a way that lets you know the file didn't exist in the past. So doing a "restore back to a date" with only what rsync gives you, with the intent to restore a whole tree as of that date, can leave you with files that should not be there because they were really added at some later date.
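
(By that I mean the --backup/--backup-dir options, i.e. something along these lines, with a made-up layout:)

Code:
# deleted/changed files on the destination get moved into a dated increment directory
rsync -a --delete --backup --backup-dir=/backup/increments/2011-05-06 \
      server:/data/ /backup/current/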

Instead of using rsync's backup of deleted files, I've implemented my own reverse increment archiving. This is done by using rsync to replicate the origin data to the first backup machine. My program depends on rsync NOT writing into files, but rather, writing new files and doing the rename() operation when it's done. What my program does is keep a hard link replica tree. Every non-directory object exists as hard linked in both trees. The added space is the duplicate directories. An increment run recurses both of these trees in parallel looking for what is different. If a file is still hard linked in both trees, it's unchanged and we move along. If it exists only in the original tree (where rsync puts things), then it's a new file (it is now hard linked to the replica tree, and the fact it exists is recorded in a file that lists new files under this date). If it exists only in the replica tree, it has been deleted, and the replica copy is moved (not copied) to the archive tree for this date (parent directories created only as needed). If both files exist, but are not hard linked, then it is a modified file. The old one is saved and the new one is hard linked.

Restoring involves scanning in reverse from the original tree (on this backup server) and back through all the dated archive increments. The last indicator, whether it is a saved file, or the notation the file was added meaning to not have the file for a restore, prevails. When you reach the desired restore date, stop before scanning that date's increment, and you have the file (or lack thereof) as of that date.

The only "full" copy is the current copy. All the rest are increments. You can delete older increments (that you permanently no longer care for), even the day before, without impact.

This works on file granularity because rsync does. But it may be possible to do this with block level granularity by comparing files in the case where rsync replaced one of the hard linked pair with a new one, and seeing which blocks are different (or appended). That would involve more work when the archive increment is run.
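
In shell terms, the increment pass is roughly the following (a much-simplified sketch of the logic described above, not the actual program; paths are made up and corner cases such as symlinks and empty directories are ignored):

Code:
#!/bin/bash
# CUR = tree rsync writes into (the current copy)
# REP = hard-link replica of CUR as of the last increment run
# ARC = where this run's reverse increment is collected
CUR=/backup/current
REP=/backup/replica
ARC=/backup/increments/$(date +%Y-%m-%d)
mkdir -p "$ARC"

# pass 1: walk the current tree
( cd "$CUR" && find . ! -type d ) | while read -r f; do
    if [ ! -e "$REP/$f" ]; then
        # new file: record the addition, then link it into the replica
        echo "$f" >> "$ARC/.added"
        mkdir -p "$(dirname "$REP/$f")"
        ln "$CUR/$f" "$REP/$f"
    elif ! [ "$CUR/$f" -ef "$REP/$f" ]; then
        # modified: the replica still holds the old version; archive it
        mkdir -p "$(dirname "$ARC/$f")"
        mv "$REP/$f" "$ARC/$f"
        ln "$CUR/$f" "$REP/$f"
    fi
done

# pass 2: walk the replica tree to catch deletions
( cd "$REP" && find . ! -type d ) | while read -r f; do
    if [ ! -e "$CUR/$f" ]; then
        # deleted upstream: the replica copy becomes the archived old version
        mkdir -p "$(dirname "$ARC/$f")"
        mv "$REP/$f" "$ARC/$f"
    fi
done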
 
Old 05-06-2011, 04:53 PM   #29
slimm609
Member
 
Registered: May 2007
Location: Chas, SC
Distribution: slackware, gentoo, fedora, LFS, sidewinder G2, solaris, FreeBSD, RHEL, SUSE, Backtrack
Posts: 430

Rep: Reputation: 67
EtherChannel is bonding on Cisco switches, which also support LACP.


I am done arguing, as numerous people have already offered advice, but you only seem to disagree with ANYTHING that almost anyone says. If you know how to do what you are trying to do, then do it. Don't sit here and argue with people who are only trying to help. Everything I and others have offered are valid solutions for what you are trying to do, but for reason X or reason Y you say it's no good every single time.

Everything I have suggested has been through a full accreditation numerous times without a single problem. As I said I am done arguing with you.

Last edited by slimm609; 05-08-2011 at 06:54 PM.
 
Old 05-09-2011, 09:20 AM   #30
Skaperen
Senior Member
 
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689

Original Poster
Blog Entries: 31

Rep: Reputation: 176Reputation: 176
Quote:
Originally Posted by slimm609 View Post
I am done arguing, as numerous people have already offered advice, but you only seem to disagree with ANYTHING that almost anyone says. If you know how to do what you are trying to do, then do it. Don't sit here and argue with people who are only trying to help. Everything I and others have offered are valid solutions for what you are trying to do, but for reason X or reason Y you say it's no good every single time.

Everything I have suggested has been through a full accreditation numerous times without a single problem. As I said I am done arguing with you.
I basically posted a question about how to securely get two hosts, one a server to be backed up, the other a backup server, to establish communication with rsync on each end, without either end granting full root rights to the other to accomplish things that need root access. I included a background of what I was doing. But it is not my intention to change those backup mechanisms, or the underlying network infrastructure. I'm only seeking ways to make the communication work securely and reliably between two instances of rsync. The only suggestion I saw that I believe addressed my question was one with a 3 minute window of access. The security background I have tells me that isn't security, and is barely obscurity. That's why I won't be doing that one.

I didn't see any of this as argument.

I guess I will be moving on to do new development.

Last edited by Skaperen; 05-09-2011 at 09:21 AM.
 
  

