How to write a Linux driver that only forwards file operations? - file-io

I need to implement a Linux kernel driver that (as a first step) only forwards all file operations to another file. (In later steps these operations should be managed and manipulated, but I don't want to discuss that here.)
My idea is the following, but when reading, the kernel crashes:
static struct {
struct file *file;
char *file_name;
int open;
} file_out_data = {
.file_name = "/any_file",
.open = 0,
};
int memory_open(struct inode *inode, struct file *filp) {
PRINTK("<1>open memory module\n");
/* We don't want to talk to two processes at the same time */
if (
return -EBUSY;
/* Initialize the message */
Message_Ptr = Message;
file_out_data.file = filp_open(file_out_data.file_name, filp->f_flags, filp->f_mode); // error handling for a failed open should go here (filp_open() returns an ERR_PTR value on failure)
/* Success */
return 0;
}
int memory_release(struct inode *inode, struct file *filp) {
PRINTK("<1>release memory module\n");
/* We're now ready for our next caller */
/* Success */
return 0;
}
ssize_t memory_read(struct file *filp, char __user *buf,
size_t count, loff_t *f_pos) {
ssize_t ret;
PRINTK("<1>read memory module \n");
ret = file_out_data.file->f_op->read(file_out_data.file, buf, count, f_pos); // corrected call; the earlier, wrong version is in the question's edit history
return ret;
}
So, can anyone please tell me why?

Don't use set_fs(), as there is no reason to do it.
Use file->f_op->read() instead of vfs_read(). Take a look at the file and file_operations structures.

Why are you incrementing the open counter twice and decrementing it only once? This could cause you to use file_out_data.file after it has been closed.

Do you want to write memory to your file, or read from it?
Because you are reading, not writing...
Possibly I'm wrong.


Can we have dirty data in the L1 cache on a GPU?

I've read about some of the common write policies in GPU microarchitectures. For most GPUs the write policy is the same as in the picture below (the picture is from the GPGPU-Sim manual). Based on this picture I have a question: can we have dirty data in the L1 cache?
The L1 on some GPU architectures is a write-back cache for global accesses. Note that this topic varies by GPU architecture, e.g. in whether global accesses are cached in L1 at all.
Speaking generally, then, yes you can have dirty data. By this I mean that the data in the L1 cache is modified (compared to what is otherwise in global space or the L2 cache) and it has not yet been "flushed" or updated into the L2 cache. (You can also have "stale" data - data in the L1 that has not been modified, but is not consistent with the L2.)
We can create a simple proof point for this (dirty data).
The following code, when executed on a cc7.0 device (and probably some other architectures as well), will not give the expected answer of 1024.
This is due to the fact that the L1, which is a separate entity per SM, is not immediately flushed to the L2. It therefore has "dirty data" by the above definition.
(The code is broken for this reason. Don't use this code. It's just a proof point.)
#include <iostream>
#include <cuda_runtime.h>
constexpr int num_blocks = 1024;
constexpr int num_threads = 32;
struct Lock {
  int *locked;
  Lock() {
    int init = 0;
    cudaMalloc(&locked, sizeof(int));
    cudaMemcpy(locked, &init, sizeof(int), cudaMemcpyHostToDevice);
  }
  ~Lock() {
    if (locked) cudaFree(locked);
    locked = NULL;
  }
  __device__ __forceinline__ void acquire_lock() {
    // spin until this thread swaps the lock from 0 to 1
    while (atomicCAS(locked, 0, 1) != 0);
  }
  __device__ __forceinline__ void unlock() {
    atomicExch(locked, 0);
  }
};
__global__ void counter(Lock lock, int *total) {
  if (threadIdx.x == 1) {
    lock.acquire_lock();
    *total = *total + 1;
    // __threadfence(); uncomment this line to fix
    lock.unlock();
  }
}
int main() {
  int *total_dev;
  cudaMalloc(&total_dev, sizeof(int));
  int total_host = 0;
  cudaMemcpy(total_dev, &total_host, sizeof(int), cudaMemcpyHostToDevice);
  Lock lock;
  counter<<<num_blocks, num_threads>>>(lock, total_dev);
  cudaMemcpy(&total_host, total_dev, sizeof(int), cudaMemcpyDeviceToHost);
  std::cout << total_host << std::endl;
}
In case there is any further doubt about whether this is a proper proof (e.g. to dispel arguments about things being "optimized into a register" etc.), we can study the resulting SASS code. The end of the above kernel has code that looks like this:
/*0130*/ LDG.E.SYS R0, [R4] ; /* 0x0000000004007381 */
// load *total /* 0x000ea400001ee900 */
/*0140*/ IADD3 R7, R0, 0x1, RZ ; /* 0x0000000100077810 */
// add 1 /* 0x004fd00007ffe0ff */
/*0150*/ STG.E.SYS [R4], R7 ; /* 0x0000000704007386 */
// store *total /* 0x000fe8000010e900 */
/*0160*/ ATOMG.E.EXCH.STRONG.GPU PT, RZ, [R2], RZ ; /* 0x000000ff02ff73a8 */
//lock.unlock /* 0x000fe200041f41ff */
/*0170*/ EXIT ;
Since the result register has definitely been stored to the global space, we can infer that if another thread (in another SM) reads an unexpected value in global space for *total it must be due to the fact that the store from another SM has not reached the L2, i.e. has not reached device-wide consistency/coherency. Therefore the data in some other SM is "dirty". We can (presumably) rule out the "stale" case here (the data in the other L1 was written, but I have "old" data in my L1) because the global load indicated above does not happen until the lock is acquired in the SM.
Note that the above code "fails" on cc7.0 devices (and probably some other device architectures). It does not necessarily fail on the GPU you are using. But it is still "broken".

Using f_mount to read and write data to a text file

In my application I need to open, read and write data to a text file using the calls f_open, f_read, and f_write.
It fails to open the .txt file:
res = f_open(&f_header.file, file_path, FA_OPEN_EXISTING | FA_WRITE | FA__WRITTEN | FA_READ | FA_CREATE_NEW );
printf("res value after f open %d \n\r",res);
if (res != FR_OK) {
printf("Failed to open %s, error %d\n\r", file_path, res);
}
This is giving error:
FR_NOT_ENABLED, /* (12) The volume has no work area */
To solve this error, the application program needs to call f_mount() after each media change to force the filesystem object to be cleared.
How do I use the f_mount() call in this application to solve this issue?
I'm not clear about the second parameter.
I added f_mount(&fs0, "0://", 1); before the f_open call to solve this issue, but the f_mount() call does not work either:
res=f_mount(&fs0,"0://", 1);
res = f_open(&f_header.file, file_path, FA_OPEN_EXISTING | FA_WRITE | FA__WRITTEN | FA_READ | FA_CREATE_NEW );
At run time, the code stops at the f_mount() call.
Here is the source code for f_mount which I'm using:
FRESULT f_mount (
    FATFS* fs,          /* Pointer to the file system object (NULL:unmount)*/
    const TCHAR* path,  /* Logical drive number to be mounted/unmounted */
    BYTE opt            /* 0:Do not mount (delayed mount), 1:Mount immediately */
)
{
    FATFS *cfs;
    int vol;
    FRESULT res;
    const TCHAR *rp = path;
    vol = get_ldnumber(&rp);
    if (vol < 0) return FR_INVALID_DRIVE;
    cfs = FatFs[vol];                   /* Pointer to fs object */
    if (cfs) {
#if _FS_LOCK
        clear_lock(cfs);
#endif
#if _FS_REENTRANT                       /* Discard sync object of the current volume */
        if (!ff_del_syncobj(cfs->sobj)) return FR_INT_ERR;
#endif
        cfs->fs_type = 0;               /* Clear old fs object */
    }
    if (fs) {
        fs->fs_type = 0;                /* Clear new fs object */
#if _FS_REENTRANT                       /* Create sync object for the new volume */
        if (!ff_cre_syncobj((BYTE)vol, &fs->sobj)) return FR_INT_ERR;
#endif
    }
    FatFs[vol] = fs;                    /* Register new fs object */
    if (!fs || opt != 1) return FR_OK;  /* Do not mount now, it will be mounted later */
    res = find_volume(&fs, &path, 0);   /* Force mounted the volume */
    LEAVE_FF(fs, res);
}
The build (make) shows no errors or warnings, and I'm sure there is nothing wrong with the code.
Could this be a problem related to memory allocation, or to running out of memory on the eMMC? What are the possible reasons for this behaviour?
According to the FatFs f_mount() documentation:
FRESULT f_mount (
  FATFS* fs,           /* [IN] Filesystem object */
  const TCHAR* path,   /* [IN] Logical drive number */
  BYTE opt             /* [IN] Initialization option */
);
fs: Pointer to the filesystem object to be registered and cleared. A null pointer unregisters the registered filesystem object.
path: Pointer to the null-terminated string that specifies the logical drive. A string without a drive number means the default drive.
opt: Mounting option. 0: Do not mount now (to be mounted on the first access to the volume), 1: Force mount the volume to check if it is ready to work.
In other words, the second parameter is how you want to refer to this particular filesystem when later working with it.
For example, mounting it like so:
f_mount(&fs0, "0://", 1);
you would then be able to open files like this:
f_open(fp, "0://path/to/file", FA_CREATE_ALWAYS);

C++ Builder Function error [bcc32 - Ambiguity error] inside dll file

I am creating a currency converter Win32 program in Embarcadero C++Builder. I wrote a function that transforms a date from the format configured on the user's PC to YYYY-MM-DD format. I need that part because of the API's requirements.
When I have this function inside my project it works fine, but I need to have that function inside a DLL.
This is what my code looks like:
#pragma hdrstop
#pragma argsused
#include <SysUtils.hpp>
extern DELPHI_PACKAGE void __fastcall DecodeDate(const System::TDateTime DateTime, System::Word &Year, System::Word &Month, System::Word &Day);
extern "C" UnicodeString __declspec (dllexport) __stdcall datum(TDateTime dat) {
Word dan, mjesec, godina;
UnicodeString datum, datum_dan, datum_mjesec, datum_godina;
DecodeDate(dat, godina, mjesec, dan);
if (dan<=9 && mjesec<=9) {
if (dan<=9 && mjesec>9) {
if (dan>9 && mjesec<=9) {
if (dan>9 && mjesec>9) {
return datum_godina+"-"+datum_mjesec+"-"+datum_dan;
extern "C" int _libmain(unsigned long reason)
{
    return 1;
}
I've included SysUtils.hpp and declared the DecodeDate() function; without those lines I get a million errors. But with the code looking like this, I get the following error, which I can't get rid of:
[bcc32 Error] File1.cpp(30): E2015 Ambiguity between '_fastcall System::Sysutils::DecodeDate(const System::TDateTime,unsigned short &,unsigned short &,unsigned short &) at c:\program files (x86)\embarcadero\studio\19.0\include\windows\rtl\System.SysUtils.hpp:3466' and '_fastcall DecodeDate(const System::TDateTime,unsigned short &,unsigned short &,unsigned short &) at File1.cpp:25'
Full parser context
File1.cpp(27): parsing: System::UnicodeString __stdcall datum(System::TDateTime)
Can you help me to get rid of that error?
The error message is self-explanatory. You have two functions with the same name in scope, and the compiler doesn't know which one you want to use on line 30 because the parameters you are passing in satisfy both function declarations.
To fix the error, you can change this line:
DecodeDate(dat, godina, mjesec, dan);
To either this:
System::Sysutils::DecodeDate(dat, godina, mjesec, dan);
Or this:
dat.DecodeDate(&godina, &mjesec, &dan);
However, either way, you should get rid of your extern declaration for DecodeDate(), as it doesn't belong in this code at all. You are not implementing DecodeDate() yourself, you are just using the one provided by the RTL. There is already a declaration for DecodeDate() in SysUtils.hpp, which you are #include'ing in your code. That is all the compiler needs.
Just make sure you are linking to the RTL/VCL libraries to resolve the function during the linker stage after compiling. You should have enabled VCL support when you created the DLL project. If you didn't, recreate your project and enable it.
BTW, there is a MUCH easier way to implement your function logic: instead of manually pulling apart the TDateTime and reconstituting its components, just use the SysUtils::FormatDateTime() function or the TDateTime::FormatString() method, e.g.:
UnicodeString __stdcall datum(TDateTime dat)
{
    return FormatDateTime(_D("yyyy'-'mm'-'dd"), dat);
}
Or:
UnicodeString __stdcall datum(TDateTime dat)
{
    return dat.FormatString(_D("yyyy'-'mm'-'dd"));
}
That being said, this code is still wrong, because it is not safe to pass non-POD types, like UnicodeString, over the DLL boundary like you are doing. You need to re-think your DLL function design to use only interop-safe POD types. In this case, change your function to either:
take a wchar_t* as input from the caller, and just fill in the memory block with the desired characters. Let the caller allocate the actual buffer and pass it in to your DLL for populating:
#pragma hdrstop
#pragma argsused
#include <SysUtils.hpp>
extern "C" __declspec(dllexport) int __stdcall datum(double dat, wchar_t *buffer, int buflen)
{
    UnicodeString s = FormatDateTime(_D("yyyy'-'mm'-'dd"), dat);
    if (!buffer) return s.Length() + 1;
    StrLCopy(buffer, s.c_str(), buflen-1);
    return StrLen(buffer);
}
extern "C" int _libmain(unsigned long reason)
{
    return 1;
}
wchar_t buffer[12] = {};
datum(SomeDateValueHere, buffer, 12);
// use buffer as needed...
Or:
int len = datum(SomeDateValueHere, NULL, 0);
wchar_t *buffer = new wchar_t[len];
datum(SomeDateValueHere, buffer, len);
// use buffer as needed...
delete[] buffer;
allocate a wchar_t[] buffer to hold the desired characters, and then return a wchar_t* pointer to that buffer to the caller. Then export a second function that the caller can pass the returned wchar_t* back to you so you can free it correctly.
#pragma hdrstop
#pragma argsused
#include <SysUtils.hpp>
extern "C" __declspec(dllexport) wchar_t* __stdcall datum(double dat)
{
    UnicodeString s = FormatDateTime("yyyy'-'mm'-'dd", dat);
    wchar_t* buffer = new wchar_t[s.Length()+1];
    StrLCopy(buffer, s.c_str(), s.Length());
    return buffer;
}
extern "C" __declspec(dllexport) void __stdcall free_datum(wchar_t *dat)
{
    delete[] dat;
}
extern "C" int _libmain(unsigned long reason)
{
    return 1;
}
wchar_t *buffer = datum(SomeDateValueHere);
// use buffer as needed...
free_datum(buffer);

/proc/[pid]/cmdline file size

I'm trying to get the file size of the cmdline file in /proc/[pid], for example /proc/1/cmdline. The file is not empty; it contains "/sbin/init". But I get file_size = 0.
#include <stdio.h>
int main(int argc, char **argv) {
    long file_size;
    FILE *file_cmd;
    file_cmd = fopen("/proc/1/cmdline", "r");
    if (file_cmd == NULL) {
        perror("fopen");
        return 1;
    } else {
        if (fseek(file_cmd, 0L, SEEK_END) != 0) {
            perror("fseek");
        }
        file_size = ftell(file_cmd);
        printf("fs: %ld\n", file_size);
        fclose(file_cmd);
    }
    return 0;
}
That's normal. /proc files (most of them, there are a few exceptions) are generated by the kernel at the moment you read from them. That means it's impossible to know the size before reading from the file. Think of it as Quantum Mechanics on files. You won't get a state unless you read the information, but there's no guarantee that reading again will give you the same information twice ;-)
In other words, the EOF is only generated when you try to read it. It's not there before that, so there's no way a file size can be determined.
This is really just communication with the kernel disguised as file I/O.

Time CPU Used by Process

I've managed to implement the code on this listing to get a list of all the running processes and their IDs. What I need now is to extract how much CPU time each process uses.
I've tried referring to the keys in the code, but when I try to print 'Ticks of CPU Time' I get a zero value for all of the processes. Plus, even if I did get a value I'm not sure if 'Ticks of CPU Time' is exactly what I'm looking for.
struct vmspace *p_vmspace; /* Address space. */
struct sigacts *p_sigacts; /* Signal actions, state (PROC ONLY). */
int p_flag; /* P_* flags. */
char p_stat; /* S* process status. */
pid_t p_pid; /* Process identifier. */
pid_t p_oppid; /* Save parent pid during ptrace. XXX */
int p_dupfd; /* Sideways return value from fdopen. XXX */
/* Mach related */
caddr_t user_stack; /* where user stack was allocated */
void *exit_thread; /* XXX Which thread is exiting? */
int p_debugger; /* allow to debug */
boolean_t sigwait; /* indication to suspend */
/* scheduling */
u_int p_estcpu; /* Time averaged value of p_cpticks. */
int p_cpticks; /* Ticks of cpu time. */
fixpt_t p_pctcpu; /* %cpu for this process during p_swtime */
void *p_wchan; /* Sleep address. */
char *p_wmesg; /* Reason for sleep. */
u_int p_swtime; /* Time swapped in or out. */
u_int p_slptime; /* Time since last blocked. */
struct itimerval p_realtimer; /* Alarm timer. */
struct timeval p_rtime; /* Real time. */
u_quad_t p_uticks; /* Statclock hits in user mode. */
u_quad_t p_sticks; /* Statclock hits in system mode. */
u_quad_t p_iticks; /* Statclock hits processing intr. */
int p_traceflag; /* Kernel trace points. */
struct vnode *p_tracep; /* Trace to vnode. */
int p_siglist; /* DEPRECATED */
struct vnode *p_textvp; /* Vnode of executable. */
int p_holdcnt; /* If non-zero, don't swap. */
sigset_t p_sigmask; /* DEPRECATED. */
sigset_t p_sigignore; /* Signals being ignored. */
sigset_t p_sigcatch; /* Signals being caught by user. */
u_char p_priority; /* Process priority. */
u_char p_usrpri; /* User-priority based on p_cpu and p_nice. */
char p_nice; /* Process "nice" value. */
char p_comm[MAXCOMLEN+1];
struct pgrp *p_pgrp; /* Pointer to process group. */
struct user *p_addr; /* Kernel virtual addr of u-area (PROC ONLY). */
u_short p_xstat; /* Exit status for wait; also stop signal. */
u_short p_acflag; /* Accounting flags. */
struct rusage *p_ru; /* Exit information. XXX */
In fact, I've also tried to print the time-averaged value of p_cpticks and a few others, and never got interesting values. Here is my code that prints the retrieved information (I got it from :
- (NSDictionary *) getProcessList {
    NSMutableDictionary *ProcList = [[NSMutableDictionary alloc] init];
    kinfo_proc *mylist = NULL;   // GetBSDProcessList() allocates the array itself and expects NULL here
    size_t mycount = 0;
    GetBSDProcessList(&mylist, &mycount);
    printf("There are %d processes.\n", (int)mycount);
    NSLog(@" = = = = = = = = = = = = = = =");
    int k;
    for(k = 0; k < mycount; k++) {
        kinfo_proc *proc = NULL;
        proc = &mylist[k];
        // NSString *processName = [NSString stringWithFormat: @"%s",proc->kp_proc.p_comm];
        //[ ProcList setObject: processName forKey: processName ];
        // [ ProcList setObject: proc->kp_proc.p_pid forKey: processName];
        // printf("ID: %d - NAME: %s\n", proc->kp_proc.p_pid, proc->kp_proc.p_comm);
        printf("ID: %d - NAME: %s CPU TIME: %d \n", proc->kp_proc.p_pid, proc->kp_proc.p_comm, proc->kp_proc.p_pid );
        // Right click on p_comm and select 'jump to definition' to find other values.
    }
    free(mylist);
    return [ProcList autorelease];
}
EDIT: What I'm looking for specifically is the amount of CPU time each process has used.
If, in addition to this, you can also give the %CPU used by each process, that would be fantastic.
The code should be efficient, because the method will be called every second for all running processes. Objective-C is preferable.
Thanks again!
Also, any comments on why people are ignoring this question would be helpful :)
Have a look at the Darwin source for libtop.c and particularly the libtop_pinfo_update_cpu_usage() function. Note that:
You'll need a basic understanding of Mach programming fundamentals to make sense of this code, as it uses task ports, etc.
If you want to simply use libtop, you'll have to download the source and compile it yourself.
Your process will need privileges to get at the task ports for other processes.
If all this sounds rather daunting, well… There is a way that uses less esoteric APIs: Just spawn a top process and parse its standard output. A quick glimpse over the top(1) man page turned up this little gem:
$ top -s 1 -l 3600 -stats pid,cpu,time
That is, sample once per second for 3600 seconds (one hour), and output to stdout in log form only the statistics for pid, cpu usage, and time.
Spawning and managing the child top process and then parsing its output are all straightforward Unix programming exercises.
Have you taken a look at struct rusage? You have listed it and commented it as "Exit information", but I know that it contains the resources actually used by a process. Take a look at this page. I remember using getrusage() to calculate the exact amount of CPU time my scientific calculation used in my current process, so I guess you just have to know how to query that struct for each process in your list.