kernel module development: how to open block devices and use them (open, seek, read).
I'll take care of the size of the files I open later on. What I need to do now is read from the filp I opened, but when I try I get garbage (and I don't even know where it's coming from).
For example, I tried aFile->f_op->read(aFile, buffer, length, &aFile->f_pos). That didn't work. I also tried vfs_read(aFile, buffer, length, position), but that didn't work either.
I saw somewhere (look below) some calls to get_fs(), then set_fs(get_ds()) before vfs_read, and then a restore of the old fs saved earlier, but when I tried it the machine hung (I was in KDE so I didn't see any kernel output before it died... next time I'll try from a TTY).
Well, well.... the values returned were negative so they were errors from the beginning.
Now I added the get_fs()/set_fs(get_ds()) calls and I get to load the first chunk from the very first file, but when it's going to load the first chunk from the second file, the computer hangs on vfs_read (no kernel panic, no seg fault and stack trace, the computer just stops working after a couple of seconds... I can't even send alt+sysrq key combinations).
Why could that be? Could you take a look or even try the module to see what I'm missing?
Provide 4 files from your file system and that should be enough to break it, as it's failing when it starts checking the disk right after inserting the module.
Are you talking about this function?
If yes, then I suggest you explicitly check (by printing them) the values of the variables coloured red here, and also those in the else block. I mean check all the array indexes and the parameters you pass to the functions by printing them.
Also check where control is going, i.e. into the if branch or the else branch, and where it should go!
Code:
static void r5r_loadChunkFromDisk(int diskIndex, unsigned long chunkIndex) {
int i, j;
// read from disk directly
if (loadedChunk[diskIndex] == chunkIndex) return; // chunk is already loaded
if (disks[diskIndex] == NULL) {
// have to calculate it from the other disks
for (i = 0; i < diskNum; i++) {
if (i == diskIndex) continue;
r5r_loadChunkFromDisk(i, chunkIndex);
}
// xor data from all disks
for (i = 0; i < chunkSize; i++) {
for (j = 0; j < diskNum; j++) {
if (j == diskIndex) continue;
Device.data[diskIndex * chunkSize + j] ^= Device.data[j * chunkSize + j];
}
}
}
....
Last edited by Aquarius_Girl; 12-18-2010 at 04:37 AM.
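To make the index check concrete, here is a minimal user-space sketch of the XOR reconstruction that the quoted function attempts. It assumes the same buffer layout as Device.data (one chunk per disk, back to back); the function and parameter names are hypothetical and not part of the module. Note that the offset inside each chunk is the byte index i, whereas the quoted code uses the disk index j there, which looks like a slip.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Rebuild the chunk of the missing disk by XOR-ing the chunks of all the
 * other disks. `row` holds ndisks chunks of chunk_size bytes back to back.
 * (Hypothetical helper, for illustration only.)
 */
static void rebuild_missing_chunk(unsigned char *row, size_t ndisks,
                                  size_t chunk_size, size_t missing)
{
    size_t i, j;

    assert(missing < ndisks);

    /* start from zeros so stale bytes do not leak into the result */
    memset(row + missing * chunk_size, 0, chunk_size);

    for (i = 0; i < chunk_size; i++) {      /* i: byte within the chunk */
        for (j = 0; j < ndisks; j++) {      /* j: disk */
            if (j == missing)
                continue;
            row[missing * chunk_size + i] ^= row[j * chunk_size + i];
        }
    }
}
```

With three disks and 4-byte chunks, calling rebuild_missing_chunk(row, 3, 4, 1) overwrites bytes 4..7 of row with the XOR of bytes 0..3 and bytes 8..11.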
The hang is produced in that function, but not at that point but on the vfs_read. I know it's there because of the printks I placed on the run (and I print the values I'm working with at the time).
Code:
printk(KERN_ERR "R5R Will try to read from disk %d from row %lu (offset %lu) a piece of %d bytes\n", diskIndex, chunkIndex, chunkIndex * chunkSize, chunkSize);
loff_t t = (loff_t)chunkIndex * chunkSize; // widen before multiplying
mm_segment_t oldfs = get_fs();
printk(KERN_ERR "R5R Got oldfs\n");
set_fs(KERNEL_DS);
printk(KERN_ERR "R5R set KERNEL_DS\n");
ssize_t bytesRead = vfs_read(disks[diskIndex], Device.data + diskIndex * chunkSize, chunkSize, &t); // here is where the hang happens
printk(KERN_ERR "R5R - result was %zd\n", bytesRead); // %zd is the format for ssize_t
set_fs(oldfs);
printk(KERN_ERR "R5R reset to oldfs\n");
printk(KERN_ERR "R5R - First char brought from memory is %d\n", Device.data[diskIndex * chunkSize]);
I made a little change in the application to do the ds swapping when opening and closing the filps too. Now the kernel doesn't freeze as it used to do before but it becomes unusable either way: can't type a command or anything.
It's stuck at vfs_read when it reads the second file (read of the first disk is successful). I tried with filp->f_op->read() but had the same result.
Is there an operation I have to do after vfs_read? Perhaps there's some lock involved? Is there any method to see the call stack of the kernel to see where it is getting stuck? Thanks for your help.
printk(KERN_ERR "R5R Will try to read from disk %d from row %d (offset %d) a piece of %d bytes\n", diskIndex, chunkIndex, chunkIndex * chunkSize, chunkSize);
That's how I know the parameters I will use are right.
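One caveat about trusting that printk: chunkIndex is an unsigned long, so printing it (and the product) with %d can show misleading numbers, and on a machine where unsigned long is 32 bits wide the product itself can wrap for large images. This does not explain the hang by itself, but it is cheap to rule out. A small user-space sketch of the difference, with hypothetical helper names and uint32_t standing in for a 32-bit unsigned long:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a chunk, computed in 64 bits (hypothetical helper). */
static int64_t chunk_offset(uint32_t chunk_index, uint32_t chunk_size)
{
    assert(chunk_size > 0);
    /* widen one operand first, so the multiplication itself is 64-bit */
    return (int64_t)chunk_index * chunk_size;
}

/* The same product done entirely in 32 bits: it wraps modulo 2^32. */
static uint32_t chunk_offset_32(uint32_t chunk_index, uint32_t chunk_size)
{
    assert(chunk_size > 0);
    return chunk_index * chunk_size;
}
```

For chunk 70000 with 64 KiB chunks, the first helper returns 4587520000 while the second wraps around to 292552704. In the module the equivalent of the first helper is the cast (loff_t)chunkIndex * chunkSize.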
After having kind of mastered a couple of tricks about the sbull driver (mini block device), I have started working on r5r again. At the moment I'm back to the problem of reading from a file to serve the block request, and I have found that the module is not behaving the way it should. Let me talk about the problematic part of the code before I show you the full code. The problem is here:
Code:
printk(KERN_ERR "R5R Will try to read from disk %d from row %lu (offset %lu) a piece of %d bytes\n", diskIndex, chunkIndex, chunkIndex * chunkSize, chunkSize);
loff_t t = (loff_t)chunkIndex * chunkSize; // widen before multiplying
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
// is the file in error?
if (IS_ERR(disks[diskIndex])) {
// disk is in error
printk(KERN_NOTICE "R5R Disk %s is in error when trying to read from it\n", diskFiles[diskIndex]);
} else {
ssize_t bytesRead = vfs_read(disks[diskIndex], Device.data + diskIndex * chunkSize, chunkSize, &t);
printk(KERN_ERR "R5R - result was %zd\n", bytesRead); // %zd is the format for ssize_t
}
set_fs(oldfs);
printk(KERN_ERR "R5R - First byte brought from memory is ord %d\n", Device.data[diskIndex * chunkSize]);
Now what is going on? When I insmod the driver, there is no problem. Data is read from the files correctly (keep in mind the block layer will see if there's a partition table on newly registered device and so on). But then, if you try to read something from the r5r device (I use a dd for it), apparently data is read from the disks as it should, even the requests are finished being served (the request function the driver called in the first place finishes running) but the kernel hangs in an infinite loop afterwards and you don't get to stop it.
I started commenting things out from that block of code and found a couple of behaviors I didn't expect. For example:
If you skip the error checking, vfs_read will return -EFAULT every single time... you have to have the IS_ERR before calling vfs_read. But this is totally crazy to me: if you place the code like this:
you get an -EFAULT every single time. Let me rephrase that. If you put the IS_ERR _without_ an if, it fails.... but if you use it to control the flow, it works (at least for the driver loading round of disk reads). Does that make sense at all?
Then, if you remove error checking and read from disk.... in other words, like this:
It works every single time (driver load and following reads at will).... but then you are not reading anything, are you?
Can anybody tell me what's going on? Why the crazy behavior of IS_ERR/vfs_read? I'll post the code in the following comment so you can take a hard cold look at it.
/*
* R5R (RAID5 Recovery) Module
* Copyright 2010 Edmundo Carmona Antoranz
*
* Released under the terms of GPL v2
*/
#include <linux/module.h> /* needed by all modules */
#include <linux/kernel.h> /* printk and stuff */
#include <linux/fs.h> /* Block Device Stuff */
#include <linux/vmalloc.h> /* vfree */
#include <linux/blkdev.h>
#include <linux/genhd.h>
#include <linux/hdreg.h>
#include <linux/uaccess.h> /* get_fs, set_fs, KERNEL_DS */
#include <linux/err.h> /* IS_ERR */
#define DRIVER_NAME "r5r"
#define DRIVER_AUTHOR "Edmundo Carmona Antoranz <eantoranz@gmail.com>"
#define DRIVER_DESC "A RAID5 Recovery module"
#define R5R_MAXDISKS 32 // top number of disks allowed to handle
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR(DRIVER_AUTHOR);
MODULE_DESCRIPTION(DRIVER_DESC);
// one bit per algorithm, so tests like (algor & (A | B)) match only A or B
#define R5R_ALGORITHM_LEFT_SYNC 1
#define R5R_ALGORITHM_LEFT_ASYNC 2
#define R5R_ALGORITHM_RIGHT_SYNC 4
#define R5R_ALGORITHM_RIGHT_ASYNC 8
#define logical_block_size 512
/*
* Parameters
*/
static int diskNum = 0; // number of disks that make up the array
static int chunkSize = 0; // size of the chunk/stripe
static short algor = 0; // once the algorithm is decoded
static char* algorithm = NULL; // algorithm as received from parameters
static char* diskFiles[R5R_MAXDISKS]; // files that make up the array
static long int diskSize = -1; // raid disk size (-1 for starters)
static int capacity = 0; // in bytes
static struct file* disks[R5R_MAXDISKS]; // real files that make up the disk array
static int loadedChunk[R5R_MAXDISKS]; // index of the chunk currently loaded from each disk (-1 if none)
// variables
static int majorNumber = 0;
module_param(chunkSize, int, 0);
MODULE_PARM_DESC(chunkSize, "Size of the RAID chunk/stripe (in bytes)");
module_param(algorithm, charp, 0);
MODULE_PARM_DESC(algorithm, "Algorithm to use for RAID. Possible values: ls la ra rs");
module_param_array(diskFiles, charp, &diskNum, 0);
MODULE_PARM_DESC(diskFiles, "Files/devices that make up the RAID device");
// ***************** Structures
// device representation
struct r5r_device
{
spinlock_t lock;
u8 *data;
struct gendisk *gd;
} Device;
// request queue
static struct request_queue *Queue;
// ************** Custom Functions
static void r5r_request(struct request_queue *q); // block layer stuff
static struct block_device_operations r5r_ops; // block layer stuff
// driver initialization
static int r5r_init_driver(void) {
// initialization of the driver
printk(KERN_ERR "R5R - Initializing driver\n");
spin_lock_init(&Device.lock);
Device.data = vmalloc(chunkSize * diskNum); // have memory reserved for a whole chunk/row
if (Device.data == NULL)
return -ENOMEM;
/*
* Get a request queue.
*/
Queue = blk_init_queue(r5r_request, &Device.lock);
if (Queue == NULL)
return -ENOMEM;
blk_queue_logical_block_size(Queue, logical_block_size);
/*
* And the gendisk structure.
*/
Device.gd = alloc_disk(16);
if (!Device.gd)
return -ENOMEM;
Device.gd->major = majorNumber;
Device.gd->first_minor = 0;
Device.gd->fops = &r5r_ops;
Device.gd->private_data = &Device;
strcpy(Device.gd->disk_name, "r5r0");
set_capacity(Device.gd, capacity / logical_block_size); // FIXME make up a valid thing
Device.gd->queue = Queue;
add_disk(Device.gd);
printk(KERN_ERR "R5R - Driver Initialization successful\n");
return 0;
}
// driver uninitialization
static void r5r_uninit_driver(void)
{
del_gendisk(Device.gd);
put_disk(Device.gd);
if (Device.data != NULL) vfree(Device.data);
if (Queue != NULL) blk_cleanup_queue(Queue);
}
/*
* Close opened input files
*/
static void closeOpenFiles(void) {
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
int i = 0;
for (i = 0; i < diskNum; i++) {
if (disks[i] != NULL) {
// closing a file
filp_close(disks[i], NULL);
printk(KERN_ERR "R5R - File %s closed successfully\n", diskFiles[i]);
}
}
set_fs(oldfs);
}
static void r5r_loadChunkFromDisk(int diskIndex, unsigned long chunkIndex) {
int i, j;
// read from disk directly
if (loadedChunk[diskIndex] == chunkIndex) return; // chunk is already loaded
if (disks[diskIndex] == NULL) {
// have to calculate it from the other disks
for (i = 0; i < diskNum; i++) {
if (i == diskIndex) continue;
r5r_loadChunkFromDisk(i, chunkIndex);
}
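// xor data from all the other disks (start from zeros so old data does not leak in)
memset(Device.data + diskIndex * chunkSize, 0, chunkSize);
for (i = 0; i < chunkSize; i++) {
for (j = 0; j < diskNum; j++) {
if (j == diskIndex) continue;
// i is the byte within the chunk, j is the disk
Device.data[diskIndex * chunkSize + i] ^= Device.data[j * chunkSize + i];
}
}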
} else {
printk(KERN_ERR "R5R Will try to read from disk %d from row %lu (offset %lu) a piece of %d bytes\n", diskIndex, chunkIndex, chunkIndex * chunkSize, chunkSize);
loff_t t = (loff_t)chunkIndex * chunkSize; // widen before multiplying so large offsets do not overflow
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
// is the file in error?
if (IS_ERR(disks[diskIndex])) {
// disk is in error
printk(KERN_NOTICE "R5R Disk %s is in error when trying to read from it\n", diskFiles[diskIndex]);
} else {
ssize_t bytesRead = vfs_read(disks[diskIndex], (char __user *)(Device.data + diskIndex * chunkSize), chunkSize, &t);
printk(KERN_ERR "R5R - result was %zd\n", bytesRead); // %zd is the format for ssize_t
}
set_fs(oldfs);
printk(KERN_ERR "R5R - First byte brought from memory is ord %d\n", Device.data[diskIndex * chunkSize]);
}
loadedChunk[diskIndex] = chunkIndex;
}
/*
* Load chunk of data from the disk array
* Will return the pointer to the data in the device where this chunk of data was loaded into
*
*/
static u8* r5r_loadChunk(unsigned long index) {
// what disk does it have to be loaded from? First, we calculate the row
unsigned long row = index / (diskNum - 1);
int parityDisk = 0;
if (algor & (R5R_ALGORITHM_LEFT_ASYNC | R5R_ALGORITHM_LEFT_SYNC))
parityDisk = diskNum - row % diskNum - 1;
else
parityDisk = row % diskNum;
// what disk does it have to be loaded from?
unsigned int disk = 0;
if (algor & (R5R_ALGORITHM_LEFT_ASYNC | R5R_ALGORITHM_RIGHT_ASYNC)) {
// async algorithm... all rows start from disk 0
disk = index % (diskNum - 1);
} else {
// sync algorithm.... all rows start from paritydisk + 1
disk = parityDisk + index % (diskNum - 1);
}
if (disk >= parityDisk)
disk += 1;
if (disk >= diskNum)
disk -= diskNum;
// that's the disk....
printk(KERN_ERR "R5R - Will load data from disk %u row %lu\n", disk, row);
r5r_loadChunkFromDisk(disk, row);
// return pointer to data
return Device.data + disk * chunkSize;
}
/*
* read data from disks.
* @chunkIndex is the index of the first chunk to read from
* @localOffset is the offset within the chunk to start reading from
* @bytesToRead is the number of bytes that still have to be read
* @buffer is the place where we are going to put the data
*/
static void r5r_getData(unsigned long chunkIndex, unsigned long localOffset,
unsigned long bytesToRead, u8* buffer) {
// let's load the data from the chunk
// here is where we need the algorithm stuff
u8* diskBuffer;
unsigned long totalBytesRead = 0; // total of bytes read so far
unsigned long bytesRead = 0; // bytes read in every cycle
while (bytesToRead) {
// load the data from disk
diskBuffer = r5r_loadChunk(chunkIndex);
chunkIndex += 1;
// get data to the buffer (up to the end of the chunk)
bytesRead = min_t(unsigned long, bytesToRead, chunkSize - localOffset);
printk (KERN_NOTICE "R5R memcpy will read %lu bytes from position %lu of disk chunk and place them at position %lu of request buffer\n", bytesRead, localOffset, totalBytesRead);
// memcpy(dest, src, n): copy from the loaded chunk into the request buffer
memcpy(buffer + totalBytesRead, diskBuffer + localOffset, bytesRead);
totalBytesRead += bytesRead;
localOffset = 0; // all following pieces will be read from local offset 0
// how much is left to be read?
bytesToRead -= bytesRead;
}
}
// Block Driver stuff
/*
* Handle an I/O request.
*/
static void r5r_transfer(struct r5r_device *dev, struct request * req) {
unsigned long offset = blk_rq_pos(req) * logical_block_size;
unsigned long nbytes = blk_rq_cur_bytes(req);
printk(KERN_ERR "R5R - Request offset: byte %lu size: %lu\n", offset, nbytes);
// in what chunk do I have to find that block?
unsigned long chunkIndex = offset / chunkSize;
printk(KERN_ERR "R5R - Chunk Index: %lu\n", chunkIndex);
// offset within the chunk
unsigned localOffset = offset % chunkSize;
printk(KERN_ERR "R5R - Local Offset (in chunk): %u\n", localOffset);
r5r_getData(chunkIndex, localOffset, nbytes, req->buffer);
}
static void r5r_printRequestInfo(struct request *req) {
// print stuff about the request
}
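static void r5r_request(struct request_queue *q) {
struct request *req;
// The block layer calls this function with the queue lock (Device.lock) held,
// so requests have to be completed with the __blk_end_request_*() variants,
// which expect the lock to be held. The versions without the underscores
// take the lock themselves and would deadlock here.
// Note: r5r_transfer() ends up calling vfs_read(), which can sleep. Sleeping
// is not allowed while the queue lock is held, so the reads probably need to
// be moved out of this function (for example to a kernel thread).
req = blk_fetch_request(q);
while (req != NULL) {
if (req->cmd_type != REQ_TYPE_FS) {
printk (KERN_NOTICE "Skip non-CMD request\n");
__blk_end_request_all(req, -EIO);
req = blk_fetch_request(q); // move on to the next request, otherwise the loop never ends
continue;
}
printk (KERN_NOTICE "Performing a request on sector %llu\n", (unsigned long long)blk_rq_pos(req));
r5r_transfer(&Device, req);
printk (KERN_NOTICE "Request performed\n");
if ( ! __blk_end_request_cur(req, 0) ) {
req = blk_fetch_request(q);
}
}
printk (KERN_NOTICE "No more transfers pending\n");
}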
// *********** Module standard functions
/*
* Initialization of module (called on module load)
*/
int init_module()
{
// initialization result
int initResult = 0, i = 0;
bool emptyDisk = false;
// let's check parameters
// number of disks
if (diskNum < 3 || diskNum > R5R_MAXDISKS)
{
printk(KERN_ERR "R5R - Number of disks has to be between 3 and %d. %d provided\n", R5R_MAXDISKS, diskNum);
printk(KERN_ERR "R5R - Module load failed\n");
return -EBADF; // @TODO What values can I use here?
}
// initialize disks variables (just in case)
for (i = 0; i < diskNum; i++) {
disks[i] = NULL;
}
// chunk size
if (chunkSize <= 0)
{
printk(KERN_ERR "R5R - Chunk/stripe size invalid. %d provided\n", chunkSize);
printk(KERN_ERR "R5R - Module load failed\n");
return -EBADF;
}
// algorithm
if (algorithm == NULL)
{
printk(KERN_ERR "R5R - Algorithm was not provided. Possible values: la ls ra rs\n");
printk(KERN_ERR "R5R - Module load failed\n");
return -EBADF;
}
if (strcmp(algorithm, "la") == 0)
{
algor = R5R_ALGORITHM_LEFT_ASYNC;
} else if (strcmp(algorithm, "ls") == 0)
{
algor = R5R_ALGORITHM_LEFT_SYNC;
} else if (strcmp(algorithm, "ra") == 0)
{
algor = R5R_ALGORITHM_RIGHT_ASYNC;
} else if (strcmp(algorithm, "rs") == 0)
{
algor = R5R_ALGORITHM_RIGHT_SYNC;
} else
{
printk(KERN_ERR "R5R - Invalid algorithm provided (%s). Possible values: la ls ra rs\n", algorithm);
printk(KERN_ERR "R5R - Module load failed\n");
return -EBADF;
}
// let's try to open the files provided
for (i = 0; i < diskNum; i++) {
loadedChunk[i] = -1; // no chunk loaded at the time
if (strcmp(diskFiles[i], "-") == 0) {
// no file here
if (emptyDisk) {
// there was an empty disk already
printk(KERN_ERR "R5R - Only one missing disk can be defined!\n");
closeOpenFiles();
return -EIO;
}
emptyDisk = true; // already have a missing disk
printk(KERN_ERR "R5R - Added empty file to disk array!\n");
disks[i] = NULL;
continue;
}
mm_segment_t oldfs = get_fs();
set_fs(KERNEL_DS);
disks[i] = filp_open(diskFiles[i], O_RDONLY | O_LARGEFILE, 0);
set_fs(oldfs);
if (!IS_ERR(disks[i])) {
// file successfully open
printk(KERN_ERR "R5R - File %s ready!\n", diskFiles[i]);
if (diskSize == -1)
diskSize = i_size_read(disks[i]->f_dentry->d_inode);
else
diskSize = min(diskSize, i_size_read(disks[i]->f_dentry->d_inode));
} else {
// error when opening file
disks[i]=NULL;
printk(KERN_ERR "R5R - File %s failed to open\n", diskFiles[i]);
// closing all open files
closeOpenFiles();
printk(KERN_ERR "R5R - Module load failed\n");
// return
return -EIO;
}
}
if (diskSize <= 0) {
// invalid capacity
printk(KERN_ERR "R5R - Invalid capacity. Perhaps one of the files is empty?\n");
closeOpenFiles(); // do not leave the files open if loading fails
printk(KERN_ERR "R5R - Module load failed\n");
return -EBADF;
}
capacity = diskSize * (diskNum - 1);
// let's try to register module
majorNumber = register_blkdev(0, DRIVER_NAME);
if (majorNumber < 0)
{
// error when registering module
printk(KERN_ERR "R5R - Error when registering module\n");
closeOpenFiles(); // do not leave the files open if loading fails
printk(KERN_ERR "R5R - Module load failed\n");
return -EBUSY;
}
// everything is OK
printk(KERN_ERR "R5R - module loaded\n");
printk(KERN_ERR "R5R - Disks: %d\n", diskNum);
printk(KERN_ERR "R5R - Chunk size: %d\n", chunkSize);
printk(KERN_ERR "R5R - Final Capacity: %d MBs\n", capacity / 1024 / 1024);
switch (algor)
{
case R5R_ALGORITHM_LEFT_ASYNC:
printk(KERN_ERR "R5R - Algorithm: Left Asynchronous\n");
break;
case R5R_ALGORITHM_LEFT_SYNC:
printk(KERN_ERR "R5R - Algorithm: Left Synchronous\n");
break;
case R5R_ALGORITHM_RIGHT_ASYNC:
printk(KERN_ERR "R5R - Algorithm: Right Asynchronous\n");
break;
case R5R_ALGORITHM_RIGHT_SYNC:
printk(KERN_ERR "R5R - Algorithm: Right Synchronous\n");
break;
}
printk(KERN_ERR "R5R - Driver major number: %d\n", majorNumber);
// init driver
initResult = r5r_init_driver();
if (initResult < 0) {
// there was an error on the initialization of the module
printk(KERN_ERR "R5R - Error on driver initialization\n");
r5r_uninit_driver();
printk(KERN_ERR "R5R - Module load failed\n");
return -EBUSY;
}
return 0; /* return zero on successful loading */
}
/*
* Module cleanup
* Called when module is unloaded
*/
void cleanup_module()
{
r5r_uninit_driver(); // remove the disk and the request queue before unregistering
unregister_blkdev(majorNumber, DRIVER_NAME);
closeOpenFiles();
printk(KERN_ERR "R5R module unloaded\n");
}
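The chunk-to-disk mapping done in r5r_loadChunk can be checked in user space before trusting it inside the kernel. The following is a minimal sketch under stated assumptions: the enum, the struct and locate_chunk are hypothetical names, and the layout rules are copied from the module code above (left algorithms put the parity on the last disk for row 0 and move it backwards, right algorithms start at disk 0 and move forwards). The layouts are one bit each, because testing them with '&' against a mask only works for bit flags.

```c
#include <assert.h>

/* One bit per layout, so masks such as (LEFT_SYNC | LEFT_ASYNC) work with '&'. */
enum {
    LAYOUT_LEFT_SYNC   = 1 << 0,
    LAYOUT_LEFT_ASYNC  = 1 << 1,
    LAYOUT_RIGHT_SYNC  = 1 << 2,
    LAYOUT_RIGHT_ASYNC = 1 << 3
};

struct chunk_location {
    unsigned long row;    /* row (stripe) that holds the chunk */
    unsigned int disk;    /* disk that holds the data */
    unsigned int parity;  /* disk that holds the parity of that row */
};

static struct chunk_location locate_chunk(unsigned long index,
                                          unsigned int ndisks, int layout)
{
    struct chunk_location loc;
    unsigned int pos;

    assert(ndisks >= 3);
    pos = (unsigned int)(index % (ndisks - 1)); /* position among the data chunks */
    loc.row = index / (ndisks - 1);

    if (layout & (LAYOUT_LEFT_SYNC | LAYOUT_LEFT_ASYNC))
        loc.parity = ndisks - 1 - (unsigned int)(loc.row % ndisks);
    else
        loc.parity = (unsigned int)(loc.row % ndisks);

    if (layout & (LAYOUT_LEFT_ASYNC | LAYOUT_RIGHT_ASYNC)) {
        /* async: data starts at disk 0 and skips the parity disk */
        loc.disk = pos;
        if (loc.disk >= loc.parity)
            loc.disk += 1;
    } else {
        /* sync: data starts right after the parity disk and wraps around */
        loc.disk = (loc.parity + 1 + pos) % ndisks;
    }
    return loc;
}
```

With 3 disks, chunk 2 lives in row 1 (parity on disk 1): the left sync layout places it on disk 2, the left async layout on disk 0.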
In case you want to test it, create 3 files, for example, file1.dat, file2.dat and file3.dat... how big? I created them with dd 10 MBs each, then insmod the driver like this:
Is "disks" a static array or a dynamic array?
In both cases, again, I would say check its range. For example, if you try to access the 15th element of an array which can hold only 10 elements, you can get a seg fault or the hangs you describe.
If it is a dynamic array, check whether you have allocated sufficient memory.
Secondly, I previously faced a situation in a C program where putting a "\n" in every print statement made the program run absolutely fine, but as soon as I removed a single "\n" I received a seg fault. The "\n" was not at all responsible for the weird behaviour; in fact there was an invalid memory access problem and I laid the blame on "\n".
In this case if the function IS_ERR is returning -1 without the conditional statements, IMO that means the problem is somewhere else which is somehow getting hidden by the "if" and exposed with vfs_read.
long holder = IS_ERR (disks[diskIndex]);
if (holder == 0)
printk ("\nThank God");
else
printk ("\nTime to cry");
and if you comment out all the reads and error checks and simply try to put in some value and then print the value of disks[diskIndex], does your system hang then too?
EDIT: This reply was written before I could see your above two posts.
1. Can you point out the name of the function that contains the buggy code?
2. Have you checked all the parameters of vfs_read? Does any one of them require a typecast?
Last edited by Aquarius_Girl; 04-21-2011 at 03:09 AM.
I checked the values that IS_ERR returns. It's always 0. 0 before switching fs, after switching fs, after vfs_read and after switch back fs. How about that?
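A result of 0 is what IS_ERR should give here. It only answers whether a pointer value falls in the small range at the top of the address space that the kernel reserves for encoded error codes, it returns 0 for any valid pointer and also for NULL, and it has no side effects, so it is hard to see how it could change what vfs_read does. The module also replaces a failed filp_open result with NULL at load time, so by the time of the read no error pointer is left in disks[]. Below is a user-space model of the idea; the model_* names are hypothetical stand-ins for the kernel's ERR_PTR, IS_ERR and PTR_ERR, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095  /* the kernel uses the same limit (MAX_ERRNO) */

/* Encode a negative errno value as a pointer. */
static void *model_err_ptr(long error)
{
    assert(error < 0 && error >= -MODEL_MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True only for pointers in the last MODEL_MAX_ERRNO addresses. */
static int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Decode the errno value back out of an error pointer. */
static long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}
```

So the error check only means something right after filp_open, and a NULL entry has to be tested for separately.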
Now, Anisha.... what other approaches could I try to work around this? What technique could I apply so that I can handle this part in userspace? Also, wasn't there an addition to new kernels that is the equivalent of fuse but for block devices?
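For comparison, this is roughly what the open and read steps look like when they are done from a user-space process instead of from the module. It is only a sketch: open_image and read_chunk are hypothetical names, and it covers just the file access, not how the data would be exported as a block device. pread takes the offset as a parameter, so no separate seek is needed and the file position is left alone.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open one of the disk images read-only; returns a file descriptor or -1. */
static int open_image(const char *path)
{
    return open(path, O_RDONLY);
}

/*
 * Read one whole chunk from an open file descriptor.
 * Returns 0 on success, -1 on error or if the file ends before the chunk does.
 */
static int read_chunk(int fd, unsigned long chunk_index, size_t chunk_size,
                      unsigned char *buf)
{
    off_t offset = (off_t)chunk_index * (off_t)chunk_size;
    size_t done = 0;

    assert(buf != NULL);

    while (done < chunk_size) {
        /* pread may return fewer bytes than requested, so loop until done */
        ssize_t n = pread(fd, buf + done, chunk_size - done,
                          offset + (off_t)done);
        if (n <= 0)
            return -1;  /* n < 0: error, n == 0: end of file */
        done += (size_t)n;
    }
    return 0;
}
```

A process can sleep inside pread without any of the restrictions that apply to a block driver's request function, which is the main attraction of doing this part outside the kernel.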