LinuxQuestions.org
Old 11-02-2021, 03:11 PM   #1
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,358
Blog Entries: 3

Rep: Reputation: 3767
Brainstorming distributed publishing


I am looking to explore options for distributed publishing. What software is out there?

The material to be published is a mixture of text (lots of it), supplemented with images, audio, and video. In other words, it is basic web page material, but heavily text-oriented and numbering many tens of thousands of documents. I have no qualms about leaving HTTP/HTTPS behind if necessary, or would rsync'd web site mirrors be the best bet? In that case, would it help if the documents were static, generated by a static site generator, rather than dynamic and stored in WordPress? NNTP and IPFS are out for different reasons, IPFS in particular because of its heavy CPU and bandwidth requirements. Gemini looks promising but is not finalized and currently has issues with large files, such as video. How good is the redundancy in Ceph, and can it tolerate nodes appearing and disappearing?
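For reference, the rsync option I have in mind is roughly the following; the hostname and paths are invented and this is only a sketch:

Code:
# On each mirror node, pull the latest static tree from the origin,
# e.g. from cron every few hours.
rsync -avz --delete --partial \
    origin.example.org:/srv/publish/ /srv/publish/

# Each mirror then serves /srv/publish with any plain HTTP daemon;
# round-robin DNS in front of the mirrors gives crude redundancy.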

What else is there? Or, what else could be stitched together?
 
Old 11-03-2021, 08:55 AM   #2
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,627

Rep: Reputation: 2556

I can't tell if you're asking for a variation on a CMS, a Wiki, Wave, or something else.

Without more clarity, the only thing I can say is Git may be a better choice than Rsync, but may not be as good as something else.

 
Old 11-03-2021, 09:25 AM   #3
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,358

Original Poster
Blog Entries: 3

Rep: Reputation: 3767
CMSs and Wikis are centralized, as far as I know. I'm looking for something where several machines can serve the same documents even when one of them is unavailable; overall availability should continue while one node is down.
 
Old 11-03-2021, 09:56 AM   #4
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,627

Rep: Reputation: 2556

CMSs and Wikis are not required to be centralized - they gain fault tolerance and availability through redundancy and replication.

https://en.wikipedia.org/wiki/Replication_(computing)

 
Old 11-03-2021, 10:13 AM   #5
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,358

Original Poster
Blog Entries: 3

Rep: Reputation: 3767
The documents already exist, so a Wiki would be out. From what I've seen, a CMS requires a substantial investment in learning and maintaining the software. Since the documents already exist as individual files, I am wondering what file-oriented approaches are out there.
 
Old 11-03-2021, 10:40 AM   #6
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS, Manjaro
Posts: 5,763

Rep: Reputation: 2764
Quote:
Originally Posted by Turbocapitalist View Post
The documents already exist, so a Wiki would be out. From what I've seen, a CMS requires a substantial investment in learning and maintaining the software. Since the documents already exist as individual files, I am wondering what file-oriented approaches are out there.
Have you considered a torrent node?
The problem would be advertising the files and convincing other nodes to replicate them. Once you accomplish that, the files are available via a simple torrent link to anyone with a torrent client, and can come from any or several of the torrent nodes hosting the files (including client machines that have downloaded them!).
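Roughly, assuming the transmission tools are installed (the tracker URL and paths here are made up, just to show the shape of it):

Code:
# Create a torrent describing a directory of documents
transmission-create -o docs.torrent \
    -t udp://tracker.example.org:6969/announce /srv/publish/docs

# Seed it from any node that already has the files on disk
transmission-cli --download-dir /srv/publish docs.torrent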
 
Old 11-03-2021, 11:29 AM   #7
boughtonp
Senior Member
 
Registered: Feb 2007
Location: UK
Distribution: Debian
Posts: 3,627

Rep: Reputation: 2556

Ok, so this is essentially about reliable file download?

Are all the documents already created or will new ones arrive or existing ones be updated or replaced?
And will end-users always want the latest versions, or just the ones at the moment they download?

Torrents might work if the contents don't change and users just want the current file(s), but if you add/remove/change the contents you get a new hash/torrent, and thus fork/divide the seeders each time there's a change.
On the other hand, a Git-based solution would be better when users want to receive changes without re-downloading everything; however, availability through multiple remotes would probably be unwieldy. Better to run it atop a distributed file system, which is what you would want for non-Git downloads too.
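The Git side of that is simple enough to sketch (hostnames invented; any of the usual Git hosting daemons would do):

Code:
# Initial fetch gets the full history
git clone https://node1.example.org/docs.git
cd docs

# Later updates transfer only new or changed documents
git pull

# A second remote gives crude failover if node1 is unreachable
git remote add node2 https://node2.example.org/docs.git
git pull node2 master   # or whatever the branch is named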

https://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems
and
https://en.wikipedia.org/wiki/Comparison_of_distributed_file_systems#FOSS

I now see the "Ceph" that you mentioned before, but there's no "Gemini" (and a search seems to only bring up a single on-topic result, which is a PDF).

There are at least seven other non-proprietary options with "high availability" marked as yes - I'd start by checking which of those are either already in the kernel (e.g. Coda) or supported/available in your distro's repos.

 
Old 11-03-2021, 11:58 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,358

Original Poster
Blog Entries: 3

Rep: Reputation: 3767
Yes, it's about reliable download and browsing (in the generic sense) of various files. The caches or repositories would have to be updated with new documents frequently, perhaps several times per day, but once in the system the documents do not change.

Torrent would be great, if it were feasible to keep adding documents to the seed.

At this point I am wondering about disseminating the files within a pool of distributed nodes, that is to say the back end for any access system. Above all, I would like to keep it about two or three orders of magnitude simpler than WordPress. What about the Coda File System? Or can HAMMER2 nodes be far apart?

Gemini might be one possible way to access the documents from the outside: https://gemini.circumlunar.space/ but only once the files are already on the nodes.
 
Old 11-03-2021, 04:05 PM   #9
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
Quote:
Originally Posted by Turbocapitalist View Post
CMSs and Wikis are centralized, as far as I know. I'm looking for something where several machines can serve the same documents even when one of them is unavailable; overall availability should continue while one node is down.
So you are thinking of something such as a High Availability (HA) server setup. The tools are there, but I do not know whether they are scaled for smaller usage. Many commercial organizations run their main systems in an HA configuration so that if one fails it automatically fails over to the other and the user is likely never even aware. The data is mirrored between the systems and the service fails over safely: when one machine fails, the other automatically picks up.

Searching for high availability servers should give you several possibilities. It is not even really hard to set up for those who know how. It does require a minimum of two network paths between the servers and data stores.

The simplest would be two machines, each configured identically, where the active machine is constantly monitored by the standby; if the connection is lost, the standby takes over the network address and data services of the (previously) active one. There has to be continuous communication between the two, with any data change on the active machine mirrored to the standby immediately.
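One common way to do the floating-IP part of that is VRRP via keepalived; a minimal sketch, with invented addresses and interface names, might look like:

Code:
# /etc/keepalived/keepalived.conf on the active machine
# (use "state BACKUP" and a lower priority on the standby)
cat > /etc/keepalived/keepalived.conf <<'EOF'
vrrp_instance DOCS_VIP {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.50/24
    }
}
EOF
systemctl restart keepalived

The data mirroring itself would still need something underneath, such as DRBD, rsync, or a shared/replicated file system.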
 
Old 11-03-2021, 07:56 PM   #10
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,369

Rep: Reputation: 2753
By the sounds of it, you could look at LVS with Direct Routing plus round-robin weighting at the front end: http://www.linuxvirtualserver.org/how.html

Then put at least two webservers behind that, and a NAS/SAN holding the actual docs behind them.
Obviously use RAID for maximum uptime.
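As a rough sketch of the director side (addresses invented, assuming ipvsadm is installed and the real servers are prepared for DR, i.e. they hold the VIP on a non-ARPing interface):

Code:
# Virtual service on the VIP, round-robin scheduling
ipvsadm -A -t 192.0.2.80:80 -s rr

# Two real webservers, reached via Direct Routing (-g)
ipvsadm -a -t 192.0.2.80:80 -r 10.0.0.11:80 -g
ipvsadm -a -t 192.0.2.80:80 -r 10.0.0.12:80 -g

Note that DR assumes the director and the real servers sit on the same network segment.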

Last edited by chrism01; 11-03-2021 at 07:59 PM.
 
Old 11-04-2021, 03:05 AM   #11
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 7,358

Original Poster
Blog Entries: 3

Rep: Reputation: 3767
High Availability at a relatively small scale sounds about right, but without the heavy overhead of the larger approaches. The burden placed on the system administrator(s), especially the specialized knowledge required, must be as low as possible. It needs to avoid Parkinson's Law and pursue KISS.

The Linux Virtual Server with Direct Routing looks about right, except the method ought to work for nodes on different ISPs' networks, sometimes with noticeable latency between some of them.

I think I might be able to test something with Ceph soon.
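If it helps to be concrete, the Ceph test would be roughly along these lines (pool and filesystem names are placeholders, and this assumes a cluster has already been bootstrapped, e.g. with cephadm):

Code:
# Replication is a per-pool setting: "size 3" keeps three copies, and
# Ceph re-replicates automatically when an OSD or node drops out.
ceph osd pool create docs_meta 16
ceph osd pool create docs_data 64
ceph fs new docsfs docs_meta docs_data
ceph osd pool set docs_data size 3

# Mount the filesystem on a publishing node
mount -t ceph mon1.example.org:6789:/ /srv/publish \
    -o name=admin,secret=<key>

Nodes coming and going should be handled by the CRUSH map and re-replication, though the monitors still need a majority up to keep quorum.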
 
Old 11-04-2021, 09:13 AM   #12
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS, Manjaro
Posts: 5,763

Rep: Reputation: 2764
So something like a Gemini, httpd, or Archie server as a distributed cluster, so that if one site/server died the internet presence would still exist?
Hmmm. That should be doable, but the best way would be a team creating clone servers at different sites with a mutual update mechanism, meaning you would all have to agree on formats, software, and both presentation and communication standards.

Torrents (one per file) and a torrent directory might be the easier option, but if you decide to go with some distributed cluster solution I want in on this! Sounds like FUN!
 
Old 11-04-2021, 02:34 PM   #13
computersavvy
Senior Member
 
Registered: Aug 2016
Posts: 3,345

Rep: Reputation: 1484
When I worked for IBM, HA was accomplished with a pair of HA servers and a SAN data store that both servers accessed.

Each server monitored the other, and if the 'master' failed to respond properly within a specified time period the secondary took over everything, including telling the 'master' to go offline entirely if needed. They shared a network, so there was a 'management' IP and a 'service' IP. The running server used 'service' IPs that were shared with the backup, but those IPs were down on the backup server and active on the 'master'. The fail-over took those IPs down on the 'master' and brought them up on the backup, so the user never knew the difference.

I can easily envision a NAS file server (or two, running mirrored) with two servers using that data store. The two servers could easily be configured with identical services and, as has been mentioned, could use a round-robin style DNS service to share the load; if one failed, the remaining one could do everything.
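The round-robin DNS part is just multiple A records for the same name; a sketch with invented addresses, using BIND zone-file syntax:

Code:
# Two A records for the same hostname; resolvers rotate between them.
# Remember to bump the SOA serial if there are secondary nameservers.
cat >> /etc/bind/db.example.org <<'EOF'
docs    IN  A   192.0.2.11
docs    IN  A   192.0.2.12
EOF
rndc reload example.org

Plain round robin spreads the load but does not detect a dead server on its own, so clients may still hit the down one until the records are changed or health checks are added.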
 
1 member found this post helpful.
  



Tags
mirroring, samizdat


