Old 12-03-2023, 10:01 AM   #1
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Rep: Reputation: 0
File system optimized for very large file parallel HDD read


I have 7 folders of 4GB files, 3.5TB in total. They need to be read as fast as possible by 7 nodes at the same time; it's a spacemesh project. (The HDD is used only for this and nothing else. The files were copied onto it, so there is no fragmentation, and they sit at the start of the disk where it's fastest.)
My idea is to set up a filesystem or mount option that forces the kernel to read chunks as large as possible before seeking the HDD to the next position.
So far I tested:
ext4 with a cluster size of 64MB and 128MB, and I got similar results, so I am not sure it helped.
The total read time was about 9h 30min.

That means the HDD still seeks a lot even though the data is at the start of the disk; reading the folders one by one would finish in ~3h.
I am not sure how to tweak XFS; any advice is welcome. Anything a bit faster would be fine, as the whole set needs to be read in under 12h total.
 
Old 12-03-2023, 10:33 AM   #2
lvm_
Member
 
Registered: Jul 2020
Posts: 984

Rep: Reputation: 348
blockdev --setra <a lot>, maybe? Or you could try a different I/O scheduler: https://access.redhat.com/documentat...disk-scheduler But the best option is to use multiple disks.
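A minimal sketch of both knobs, assuming the disk is /dev/sdc (substitute your own device; the scheduler names offered depend on your kernel):

Code:
# show current read-ahead (the value is in 512-byte sectors)
blockdev --getra /dev/sdc
# raise it, e.g. 131072 sectors = 64 MiB
blockdev --setra 131072 /dev/sdc
# list the I/O schedulers available for this disk; the active one is in brackets
cat /sys/block/sdc/queue/scheduler
# switch to another one, e.g. mq-deadline (needs root)
echo mq-deadline > /sys/block/sdc/queue/scheduler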

Last edited by lvm_; 12-03-2023 at 10:39 AM.
 
1 member found this post helpful.
Old 12-03-2023, 11:39 AM   #3
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Original Poster
Rep: Reputation: 0
sudo blockdev --setra 65536000 /dev/sdc
Looks like it's working. I set it to read all 6 folders and iostat reports ~200MB/s.
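For anyone following along, one way to watch the throughput while the nodes read (a sketch assuming the sysstat package is installed and the disk is /dev/sdc):

Code:
# extended per-device stats in MB/s, refreshed every 5 seconds
iostat -dmx 5 /dev/sdc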
 
Old 12-03-2023, 11:59 AM   #4
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
I don't think it can be done. If you allow concurrent access to these files, the head starts to dance. You can't really allocate contiguous space for each file on ext4 (and I don't think there's any filesystem that guarantees that). By the way, do these files change?
In theory a whole track should be read at once, but since only logical (not physical) track/head/sector information is available, you can't really do it. (You would need to implement a low-level disk driver for this.)
If you really want to speed it up, just use multiple disks or an SSD.
It depends on your files, but you can probably compress them; in that case you read less and decompress on the fly. Or you can use a filesystem with transparent compression like btrfs.
Using XFS you might want to use direct I/O (no caching at all), but that won't help much on a slow disk.
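For the compression idea, a minimal sketch assuming the partition is /dev/sdc1 and a mount point of /mnt/plots (note: spacemesh plot data is essentially random and may not compress at all, in which case this gains nothing):

Code:
mkfs.btrfs /dev/sdc1                                     # destroys existing data on the partition
mount -o compress=zstd:3,noatime /dev/sdc1 /mnt/plots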
 
Old 12-03-2023, 12:29 PM   #5
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Virginia, USA
Distribution: Debian 12
Posts: 8,346

Rep: Reputation: 552
Quote:
Originally Posted by reb0rn View Post
they sit at the start of the disk where it's fastest
The disk arm moves from wherever it is currently sitting to the next location accessed; it does not go back to the beginning of the disk between movements. Therefore the fastest place on the disk is the middle of the data. You optimize disk arm movement by placing the busiest file in the middle of the disk and the least-accessed files at the beginning and the end.

If you are using an SSD there is no disk arm and file placement is irrelevant. You should use SSDs. Then the speed will be limited by the transfer speed between the SSD and memory. If you use multiple SSDs with multiple transfer channels, you will get a faster speed than with a single channel.

ext4 doesn't stack files one after the other from the beginning of the disk, so your data is not contiguous and the busiest file is not necessarily in the center. You can come somewhat closer to an optimal placement by creating a series of small partitions and sorting the files into them depending on how busy they are, roughly as sketched below.
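Purely as an illustrative sketch of that partition layout, assuming an empty /dev/sdc and seven roughly equal folders (this wipes the disk):

Code:
parted -s /dev/sdc mklabel gpt
# seven ~14% partitions; the busiest folder would go on the middle one
for i in $(seq 0 6); do
    parted -s /dev/sdc mkpart "plots$i" ext4 "$((i*14))%" "$(((i+1)*14))%"
done
mkfs.ext4 /dev/sdc1    # repeat for /dev/sdc2 ... /dev/sdc7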

Last edited by jailbait; 12-03-2023 at 12:56 PM.
 
Old 12-03-2023, 04:17 PM   #6
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Original Poster
Rep: Reputation: 0
For a spinning HDD the fastest reads are at the start of the disk. I do not need seek speed; I want to avoid seeking, as all data is read sequentially, the only issue being that 6 nodes read their groups at the same time.

Yeah, I failed to make ext4 read a huge chunk in one go; cluster size had no effect.

But blockdev --setra 65536000 /dev/sdc
helped a lot, as it forces the kernel to read ahead at least 64MB, so my total read time dropped from 9h to ~4h, which is great, as even reading the folders one by one I cannot get it under 3.5h.
 
Old 12-03-2023, 05:22 PM   #7
wpeckham
LQ Guru
 
Registered: Apr 2010
Location: Continental USA
Distribution: Debian, Ubuntu, RedHat, DSL, Puppy, CentOS, Knoppix, Mint-DE, Sparky, VSIDO, tinycore, Q4OS, Manjaro
Posts: 5,767

Rep: Reputation: 2765
Multiple disks in a RAID-6 array would be the fastest option I am aware of. SSDs might be an order of magnitude faster than spinning rust. Without changing the storage system, add memory: you want the largest I/O buffer you can get. If you have the option, tune for the highest read-ahead volume you can. You want to load the tracks into memory at the earliest opportunity, so that after those first reads you get RAM speed for everything that follows.
 
Old 12-03-2023, 10:59 PM   #8
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
You can't do parallel I/O on a single disk. Requests get queued and re-ordered by the scheduler, as per post #2.

You don't mention how you are testing, or how relevant it might be to the task at hand. I've not looked at spacemesh, but if you're allowing multiple network clients to access (particularly update) a non-network-aware filesystem, you're likely looking at a broken filesystem in short order.
Given its ambit, hopefully spacemesh handles all the I/O itself - that means direct I/O, but I didn't see any doco in a quick search.
I'd be inclined to move each client's folder to a separate filesystem on a separate partition. That certainly doesn't solve all the issues a single disk presents, but it might ameliorate them somewhat.
 
Old 12-03-2023, 11:09 PM   #9
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Original Poster
Rep: Reputation: 0
I know it does not solve everything, but I have a specific use case, and blockdev --setra 65536000 does work magic: reading 3.5TB from 7 parallel tasks takes me some 4.5h... which is almost the full speed this disk can do. It's a purely sequential read, just with 7 nodes reading 7 different parts of the HDD; the point was to keep seeking to a minimum, and as I only need read speed it worked fine.

What happens in reality: 7 tasks ask for 7 jobs, and the kernel now reads ~64MB before it moves on to the next job... As the data is all sequential on disk I lose just a bit to seeking. I might get more with 128MB chunks maybe, but this is fine, as my need is for all 7 folders to be read within a specific window, under 12h.

Last edited by reb0rn; 12-03-2023 at 11:13 PM.
 
Old 12-03-2023, 11:24 PM   #10
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
Quote:
Originally Posted by reb0rn View Post
I know it does not solve everything, but I have a specific use case, and blockdev --setra 65536000 does work magic: reading 3.5TB from 7 parallel tasks takes me some 4.5h... which is almost the full speed this disk can do. It's a purely sequential read, just with 7 nodes reading 7 different parts of the HDD; the point was to keep seeking to a minimum, and as I only need read speed it worked fine.

What happens in reality: 7 tasks ask for 7 jobs, and the kernel now reads ~64MB before it moves on to the next job... As the data is all sequential on disk I lose just a bit to seeking. I might get more with 128MB chunks maybe, but this is fine, as my need is for all 7 folders to be read within a specific window, under 12h.
You are repeating yourself and not answering the questions. The solution depends on the available RAM and other things that we don't know anything about.
 
Old 12-04-2023, 12:24 AM   #11
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Original Poster
Rep: Reputation: 0
How can RAM help at all in reading 3TB of data (other than the RAM used to read ahead ~64MB in my case)?

I said blockdev --setra 65536000 works quite fine for me. I do not need caching, as it is only a sequential read from point A to the end, just 7 tasks at once, so the kernel seeking in small chunks would be the issue.

Look at it like this: I have 7 movies and I need to stream them all at the same time, each movie 500GB... You only need RAM to read ahead, to minimize seeking from movie 1 to movie 2, etc.

Last edited by reb0rn; 12-04-2023 at 12:30 AM.
 
Old 12-04-2023, 01:08 AM   #12
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
Quote:
Originally Posted by reb0rn View Post
How can RAM help at all in reading 3TB of data (other than the RAM used to read ahead ~64MB in my case)?

I said blockdev --setra 65536000 works quite fine for me. I do not need caching, as it is only a sequential read from point A to the end, just 7 tasks at once, so the kernel seeking in small chunks would be the issue.

Look at it like this: I have 7 movies and I need to stream them all at the same time, each movie 500GB... You only need RAM to read ahead, to minimize seeking from movie 1 to movie 2, etc.
It is not the kernel but the disk that is seeking. The kernel and filesystem only handle logical track/sector info, which is mapped to physical counterparts that can be quite different. You cannot avoid seeking. Also, with blockdev you are effectively setting up a kind of caching, which may be useful for the directories you have (to avoid re-reading them), but it is otherwise pointless if you really want to read and transfer all 3.5 TB, and to read all of it at the same time.

Theoretically, RAM size matters: for example, if you want to read all the files at once, the optimal buffer size that can be allocated to the read process(es) is about RAM/8. But you can decide.
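To put a number on that rule of thumb (the OP's RAM size isn't stated, so this is purely illustrative): with 32 GB of RAM, RAM/8 would be about 4 GB in total, or roughly 580 MB of read buffer per task if split evenly across the 7 readers.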

Anyway, it looks like it helped, so that's all.
 
Old 12-04-2023, 01:17 AM   #13
lvm_
Member
 
Registered: Jul 2020
Posts: 984

Rep: Reputation: 348
Eh... Actually the setra argument is in 512-byte sectors, not bytes, so you set up a ~32GB read-ahead, but if it works for you... I think it hits some internal limit on that anyway.
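The arithmetic, for reference (blockdev --setra and --getra both count 512-byte sectors):

Code:
# 65536000 sectors * 512 bytes = 33,554,432,000 bytes ≈ 31.25 GiB of requested read-ahead
# a literal 64 MiB read-ahead would be:
blockdev --setra 131072 /dev/sdc    # 131072 sectors * 512 bytes = 64 MiB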

But I don't understand the reaction of others. Yes, you still have to read the data, but if the large read-ahead buffer is enabled, it will be read in larger chunks and so heads will have to be moved from one file to another less frequently - hence the performance increase, logical and expected.
 
Old 12-04-2023, 01:36 AM   #14
reb0rn
LQ Newbie
 
Registered: Sep 2023
Posts: 12

Original Poster
Rep: Reputation: 0
Yeah, I presumed so, as the block size is hard-coded and the kernel even refuses to mount if I change its logical size... In a way I only wanted to cut seeking down to the minimum for my specific need, as in general I have no idea how the kernel/filesystem decides to control the HDD, when the HDD itself also has its own controller doing its own work.
 
Old 12-04-2023, 02:41 AM   #15
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 22,041

Rep: Reputation: 7348
Quote:
Originally Posted by lvm_ View Post
Eh... Actually the setra argument is in 512-byte sectors, not bytes, so you set up a ~32GB read-ahead, but if it works for you... I think it hits some internal limit on that anyway.

But I don't understand the reaction of others. Yes, you still have to read the data, but if the large read-ahead buffer is enabled, it will be read in larger chunks and so heads will have to be moved from one file to another less frequently - hence the performance increase, logical and expected.
The question is how these files were stored. In general, files are not stored in a huge number of contiguous sectors, therefore using a large read-ahead buffer will not avoid seeking.
 
  

