Old 10-08-2018, 10:15 AM   #1
DJManuel
LQ Newbie
 
Registered: Oct 2018
Posts: 4

Rep: Reputation: Disabled
Sequential read data off HDD with SSD Cache


Hi Everyone,

I have a project to complete that involves almost ONLY sequentially read data (currently up to 4TB [the maximum size of a SATA SSD], but even bigger in the future), stored on disk in just 2 files (file A and file B). For my usage it is important to have a steady stream of at least 150MB/s, otherwise the program handling the data will crash (with a "Drives are too slow" error). With an SSD-based solution it works without any problem (SSDs are fast enough).
The general idea of this project is to replace those expensive SSDs with cheaper HDDs configured as RAID (probably a storage server with a hardware RAID controller). The combined throughput of the HDDs should then be sufficient (for example with RAID 0; this isn't primarily about data integrity / data loss).

With the help of some kernel tuning (increasing the read-ahead buffer size) I was able to reach the minimum throughput even on a single hard drive, so the drives themselves should not be the bottleneck here.
Without the increased read-ahead buffer size I was not able to get the needed throughput from the HDD; with the buffer size increased it works now, but it is still unreliable.
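For reference, this is roughly what I did (a sketch; /dev/sdX and the file path stand in for my actual device and data):

Code:
# set read-ahead to 4MB (8192 x 512-byte sectors)
blockdev --setra 8192 /dev/sdX
# the same setting via sysfs, in KB
echo 4096 > /sys/block/sdX/queue/read_ahead_kb

# check sustained sequential throughput with a cold page cache
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/data/fileA of=/dev/null bs=1M count=8192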
So I have been searching for caching solutions I could implement.
I did some tests with bcache as the cache, as well as RapidDisk (yes, that is RAM and not SSD, but in my case I wanted to test it anyway; I even tried setting up a RAM disk that bcache used as its caching drive).
Neither yielded any big performance improvement for me, mainly because of the "feature" that sequentially read data doesn't get cached.

This is true for almost every caching solution I have looked at so far. bcache even has a cutoff option that makes large sequential reads bypass the cache; I tried to disable it, but that didn't help either.
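(For anyone wanting to check this: these are the bcache knobs I mean, with bcache0 and the cache-set UUID as placeholders for whatever names your setup uses.)

Code:
# 0 = no sequential cutoff, i.e. cache even large sequential reads
echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
# also stop bcache from bypassing the cache when it thinks the
# cache device is congested
echo 0 > /sys/fs/bcache/<cache-set-uuid>/congested_read_threshold_us
echo 0 > /sys/fs/bcache/<cache-set-uuid>/congested_write_threshold_us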
Digging further into this topic, I mostly found that using an SSD cache for sequential read data is not recommended. I totally understand and agree with this for MOST workloads, but not for all of them.
So basically I am looking for a solution that can dynamically cache TB-sized files on an SSD (I was thinking of some sort of DC-grade Optane NVMe SSDs, multiple of them together if needed).
Write cycles on the SSD are not a problem (if an SSD fails, it just gets replaced). For this server I do not care about data loss, and I care even less about data loss on the SSD cache drive, since the workload is purely sequential reads. Bandwidth with NVMe should be more than enough.
Either when a file has been read completely by the software, or when someone presses the "STOP" button in the software, it can easily be set up to execute any kind of command to clear the cache.
The cache system would just need to fill up the SSD cache drive, wait some time, and then replace the already-read data on the SSD cache drive with new data from the hard drive.

So basically just a FIFO buffer implemented with an Optane NVMe SSD as the caching disk, plus some prefetching of data onto that caching disk.
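To illustrate what I mean, here is a minimal sketch of such a prefetcher (all paths and sizes are made up, and it assumes the reading application deletes each chunk file once it has consumed it):

Code:
#!/bin/bash
# Stage fileA from the HDD array onto the NVMe SSD in fixed-size
# chunks, never keeping more than KEEP chunks staged at once.
SRC=/mnt/hdd/fileA            # big source file on the HDD array
CACHE=/mnt/optane/fileA.d     # staging directory on the SSD
CHUNK_MB=1024                 # chunk size: 1 GiB
KEEP=8                        # max chunks staged ahead of the reader

mkdir -p "$CACHE"
size=$(stat -c %s "$SRC")
chunks=$(( (size + CHUNK_MB*1024*1024 - 1) / (CHUNK_MB*1024*1024) ))

for (( i=0; i<chunks; i++ )); do
    # FIFO throttle: the reader deletes chunk files it has consumed,
    # which frees a slot for the next prefetch
    while [ "$(ls "$CACHE" | wc -l)" -ge "$KEEP" ]; do sleep 1; done
    dd if="$SRC" of="$CACHE/chunk.$i" bs=1M count="$CHUNK_MB" \
       skip=$(( i * CHUNK_MB )) status=none
done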

In the future the software will run multiple times on the same server, using the same data and the same caching solution, so the caching solution would need to be able to handle multiple "data streams".
So far I have looked into solutions like bcache, lvm cache, dm-cache and RapidDisk, but all of them have some sort of "sequential read pass-through" system, which makes them unusable here. (As mentioned, I only actually tried bcache and RapidDisk, and the option to "ignore" or disable this limit didn't work on my setup, or at least I didn't see any improvement with my workload.)
I also think I saw no change because of the algorithm that decides which data gets stored in the cache and which doesn't; maybe its prediction of future reads just doesn't work as expected for my access pattern.
Increasing read-ahead helps (up to about 4MB), but setting it bigger than about 64MB becomes counterproductive, probably because of the latency when data is read that isn't cached.
When I stepped through different read-ahead sizes I didn't see any big change in the cache hit/miss ratio, so I think the prediction algorithm doesn't work the way I would need it to.
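(For completeness, these are the counters I was watching, with bcache0 again standing in for the actual device name:)

Code:
cat /sys/block/bcache0/bcache/stats_total/cache_hits
cat /sys/block/bcache0/bcache/stats_total/cache_misses
cat /sys/block/bcache0/bcache/stats_total/cache_hit_ratio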

Is it possible to tune the read-ahead buffer to cache my workload better (for now I would be satisfied with using RAM instead of SSD, since the RAM in the server could be expanded)?
Is there a "magic" combination of solutions / kernel flags that work together and can cache my workload?
Is there any kind of caching solution that can handle this kind of workload and/or is made for sequential reads?

I'm using Linux kernel 4.8.0-53-generic

I appreciate every suggestion,
Thanks, Remo

Last edited by DJManuel; 10-11-2018 at 06:54 AM.
 
Old 10-10-2018, 06:13 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,150

Rep: Reputation: 4125
Quote:
Originally Posted by DJManuel
... contains almost ONLY sequentially read data
Hmmm - would that be like "almost pregnant"? Either it is sequential, or it ain't. Which is it? And are the data actually sequential on disk, i.e. consecutive sector numbers?
How are the two files you mentioned accessed - read one completely and then the next, or are the reads interleaved?
Quote:
With the help of some kernel tuning (increasing the read-ahead buffer size) I was able to reach the minimum throughput even on a single hard drive, so the drives themselves should not be the bottleneck here.
That's a pretty big claim on minimal evidence.
Quote:
Without the increased read-ahead buffer size I was not able to get the needed throughput from the HDD; with the buffer size increased it works now, but it is still unreliable.
So I have been searching for caching solutions I could implement.
Presuming a sequential read load, there is no prospect of increasing the ability of the disk(s) to deliver the data faster by using an out-board cache; hence the tools disabling themselves in that scenario.
If you want faster I/O, use better hardware - as per your SSDs, or preferably a RAID card with its own caching infrastructure.

My suggestion would be to concentrate on ensuring the data (on HDD) is friendly to read-ahead, and on tuning the I/O scheduler if needed, although that will be specific to the environment. Modern SSDs on current kernels will use multi-queue, which might help with the 2 files if concurrent access is being used; real hard disks don't, that I know of.
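Something along these lines, for example (sdX standing in for whichever disk holds the data):

Code:
# show the available schedulers and the active one (in brackets)
cat /sys/block/sdX/queue/scheduler
# deadline generally favours large streaming reads over fairness
echo deadline > /sys/block/sdX/queue/scheduler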
So I'd give up on chasing the cache solution and just allow the page cache to do its job.
 
Old 10-10-2018, 10:29 PM   #3
rknichols
Senior Member
 
Registered: Aug 2009
Distribution: Rocky Linux
Posts: 4,784

Rep: Reputation: 2214
Also, keep in mind that writes to an SSD are fast only until the drive has used up its pool of erased blocks. After that, the controller is forced to perform actual erase cycles, and write speed drops dramatically. That "... and then replace the already-read data on the SSD cache drive with new data from the hard drive" action is not going to be as speedy as you might think.
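If the cached data is disposable anyway, explicitly discarding it after each pass at least hands the blocks back to the controller's erased pool; something like this (mount point and device are just examples):

Code:
# cache lives on a filesystem: tell the SSD which blocks are free
fstrim -v /mnt/cache
# cache is a raw, disposable block device: discard everything
# (destroys all data on the device!)
blkdiscard /dev/nvme0n1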
 
Old 10-11-2018, 07:35 AM   #4
DJManuel
LQ Newbie
 
Registered: Oct 2018
Posts: 4

Original Poster
Rep: Reputation: Disabled
Thanks for the posts so far,

I just noticed I had written "150Mbit/s" instead of "150MB/s" in my first post; it is corrected now.

@syg00
Basically my workload is sequential. With "almost" I meant that some config files get loaded from the HDD at the beginning; those might not be read sequentially, but in the end those small files don't really matter.

The 2 files are read at the same time (one file per thread), so the read arm of the disk probably needs to jump back and forth between the two files. I don't know the exact layout on the disks - are there tools that can display this relatively easily? And how could I rearrange the data to be fully sequential?
I could put those files on 2 different drives to read from (there the data should be sequential), but the problem with that is that the files "get out of sync" - yes, they need to be read at the same time, by 2 threads that aren't synchronized.
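(For the layout question, filefrag apparently prints a file's extent list, which should show how contiguous the data really is; the paths below are placeholders:)

Code:
# contiguous extents at consecutive physical offsets mean the
# file really is laid out sequentially on disk
filefrag -v /mnt/data/fileA
filefrag -v /mnt/data/fileB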

I didn't mention that at the moment I'm on a kind of "embedded" server system - the only I/O I can use is USB 3.0 and eSATA. I have therefore tried several external USB 3 (3.0, 3.1 Gen1, 3.1 Gen2) RAID enclosures, as well as eSATA, but most of them don't provide any buffer in the controller itself. Basically I'm limited to a theoretical 6Gbit/s over eSATA.
With the system I have now I need to do all the caching on the Linux side and can't do it in hardware (RAID controller).
In the future the system will definitely run on a proper server instead of the "embedded" one, giving me more access to internal I/O (PCIe, SAS, etc.).
Yes, I would use better hardware if I could, but it isn't that easy (hardware dependencies of the "embedded" server system); I'll have to wait until the "normal" server comes out.

Some of those enclosures do work (with read-ahead increased), just as with the single drive I mentioned.
I've tried many configurations for these RAID boxes, but only with limited success (up to 8-bay it works mostly fine, using either eSATA or USB 3.1 Gen1). For the future - and this is what I'm concerned about - a server as the base of the system is definitely needed.

SSD is not an option (except for caching) because of the cost - where would I get the money to build a multi-petabyte storage server filled with SSDs, only to store and read those files with enough throughput for the program?
At some point I'm even thinking of using a tape drive to store and read those files - today's tape drives are fast enough, so it would be possible.
Filling the system up with new files (multiple TB of just these "A" and "B" files daily) would be another problem, but at the moment I'm not concerned about that at all.

@rknichols

Yes, I know that replacing a cell takes longer than just writing to it, but with Optane there should be more than enough bandwidth to clear the used cells so that they are ready to be written quickly again (if that can be done?). In any case, after a complete read of the "A" and "B" files, the part of the SSD cache that was used can be cleared completely (even a reformat would be OK if needed). I don't know whether it would make any difference here to use SLC or MLC SSDs, but I'm assuming the Optane will be fast enough for my workload anyway.

Thank you so far; I'm hoping for some more replies.
 
  

