kernel: how to find all threads from a process's task_struct?

Given a task_struct for a process or a thread, what is the idiom for iterating through all the other threads belonging to the same process?

Linux does not distinguish between a process (task) and a thread. The library calls fork() and pthread_create() use the same system call, clone(). The difference between fork() and pthread_create() is the bitmask passed to clone(): it describes which resources (memory, files, filesystem information, signal handlers, ...) are shared between the old and the new task. See man clone(2) for the details.
Anyway, there is something called a thread group, and a special flag to the clone() call indicates that the new process belongs to the same thread group. This mechanism is used to keep together all tasks that are created with clone() specifying CLONE_THREAD in the bitmask.
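As an illustration of the userspace side, here is a minimal sketch using glibc's clone() wrapper (error handling omitted; the flag set is an assumption of what a pthread-like thread minimally needs, see man clone(2)):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static int thread_fn(void *arg)
{
    /* CLONE_THREAD: same thread group, so getpid() reports the caller's PID */
    printf("child:  pid=%d\n", getpid());
    return 0;
}

int main(void)
{
    const size_t stack_size = 1024 * 1024;
    char *stack = (char *)malloc(stack_size);

    /* roughly the flag combination pthread_create() builds on;
       the stack grows downward on x86, hence stack + stack_size */
    clone(thread_fn, stack + stack_size,
          CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND | CLONE_THREAD,
          NULL);

    printf("parent: pid=%d\n", getpid());
    sleep(1); /* crude: give the new thread time to run before exiting */
    return 0;
}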
For these threads there exists the macro while_each_thread in the linux/sched/signal.h include file. The thread list must not change while it is being walked, so hold rcu_read_lock() (or tasklist_lock) around the loop. It is used like this:
struct task_struct *me = current;
struct task_struct *t = me;

rcu_read_lock();
do {
    whatever(t);
} while_each_thread(me, t);
rcu_read_unlock();

Why Set Different Variables to Different Processes after fork()?

I ran into code that looks like this:
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>

int main(void)
{
    pid_t pid;
    char sharedVariable = 'P';
    char *ptrSharedVariable = &sharedVariable;

    pid = fork();
    if (pid == 0) {
        sharedVariable = 'C';
        printf("Child Process\n");
        printf("Address is %p\n", ptrSharedVariable);
        printf("char value is %c\n", sharedVariable);
        sleep(5);
    } else {
        sleep(5);
        printf("Parent Process\n");
        printf("Address is %p\n", ptrSharedVariable);
        printf("char value is %c\n", sharedVariable);
    }
    return 0;
}
From what I learned on Stack Overflow, I can tell that the char value in the parent and the child process will be different: the child's value is 'C' and the parent's is 'P'. I can also tell that the address printed in both parent and child should be the same, namely the address of sharedVariable (&sharedVariable).
However, here are my questions:
What is the point of assigning different char values in different processes? For one thing, since we can already identify each process by pid == 0 or pid > 0, wouldn't this step be redundant? For another, I don't see the point in differentiating two processes that do the same job; can't they work without letting the programmer tell them apart?
Why do the addresses in parent and child stay the same? My guess is that since they are assumed to carry out similar tasks, it is convenient, because then we can just copy and paste code. I am hesitant and want to make sure.
If I replaced fork() with vfork(), would the parent's char value then be 'C'?
Thanks a million in advance.
This question has been answered several times. For example here.
Even though I may repeat what has already been written in several answers, here are some clarifications on your 3 points:
1. The naming of the char variable (sharedVariable) in the code that you shared is confusing, because the variable is not shared between the parent process and the child process. The address space of the child is a copy of the address space of the parent. So here there are two processes (parent and child) running concurrently, each with its own stack where the above variable is located (one in the parent's stack and the other in the child's stack).
2. The address space of a process is virtual. In each process you will see the same virtual addresses, but they refer to each process's own code and data (i.e. they "point" to different physical memory locations). The kernel shares as many resources as possible until they are modified by one of the processes (the Copy On Write principle), but this optimization is transparent from the user space programmer's point of view.
3. If you use vfork(), the variable is shared, because the address space is shared between the parent and the child; you don't get a copy as with fork(). The resulting child process is like a co-routine (it is lighter than a thread, as even the stack is shared!). That is why the parent process is suspended until the child either exits or executes a new program. The manual warns about the risks of such an operation. Its goal is to launch a program immediately (a fast fork()/exec() sequence); it is not meant for long-lived child processes, as any call to the GLIBC or any other library service may either fail or trigger corruption in the parent process. vfork() is a direct call to the system without any added value from the GLIBC. In the case of fork(), the user space libraries do lots of "housekeeping" to make the libraries usable in both parent and child processes (GLIBC's wrapper of fork() and the pthread_atfork() callbacks). In the case of vfork(), this "housekeeping" is not done, because the child process is supposed to be directly overwritten by another program through an execve() call. This is also the reason why a child spawned through vfork() must call _exit() rather than exit(): exit() would run the atexit() callbacks registered by the parent process, which could lead to unexpected crashes in both the child and the parent.
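To illustrate the intended usage pattern described in point 3, here is a minimal sketch (/bin/ls is just an arbitrary program to execute; error handling omitted):

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = vfork();
    if (pid == 0) {
        /* child: borrows the parent's address space; do nothing but exec */
        execl("/bin/ls", "ls", (char *)NULL);
        _exit(1); /* _exit(), not exit(): do not run the parent's atexit() handlers */
    }
    /* parent: was suspended until the child called execl() or _exit() */
    waitpid(pid, NULL, 0);
    return 0;
}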

Is it safe to have a blocking operation inside a flatMap { ... } mapper function?

I'd like to organize a thread barrier: given a single lock object, any thread can acquire it and continue its chain further, but any other thread will stay dormant on the same lock object until the first thread finishes and releases the lock.
Let's express my intention in code (log() simply prints a string to the log):
val mutex = Semaphore(1) // number of permits is 1

source
    .subscribeOn(Schedulers.newThread()) // any unbounded scheduler (io, newThread)
    .flatMap {
        log("#1")
        mutex.acquireUninterruptibly()
        log("#2")
        innerSource
            .doOnSubscribe { log("#3") }
            .doFinally {
                mutex.release()
                log("#4")
            }
    }
    .subscribe()
It actually works well: I can see multiple threads log "#1", and only one of them propagates further after obtaining the lock object mutex; then it releases the lock, I see the remaining logs, and the next thread comes into play. OK.
But sometimes, when the pressure is quite high and the number of threads is larger, say 4-5, I experience a DEADLOCK:
The thread that has acquired the lock prints "#1" and "#2" but then never prints "#3" (so doOnSubscribe() is not called). It effectively stops and does nothing, never subscribing to innerSource inside flatMap, so all threads are blocked and the app is not responsive at all.
My question: is it safe to have a blocking operation inside flatMap? I dug into the flatMap source code, and I see the place where it internally subscribes:
if (!isDisposed()) {
    o.subscribe(new FlatMapSingleObserver<R>(this, downstream));
}
Is it possible that the subscription of the thread that acquired the lock was somehow disposed?
You can use flatMap's second parameter, maxConcurrency, and set it to 1; flatMap will then let only one inner subscription be active at a time, which does what you want without manual locking.

boost.asio composed operation run in strand

The code:
In thread 1:
boost::asio::async_read(socket, buffer, strand.wrap(read_handler));
In thread 2:
strand.post([&](){ socket.async_write_some(buffer, strand.wrap(write_handler)); });
It is clear that read_handler, async_write_some, and write_handler are protected by the strand and will not run concurrently. However, async_read is a composed operation: it will issue zero or more intermediate async_read_some calls, and those also need to be protected by the strand, or else they might run concurrently with async_write_some in thread 2.
But in the code above the strand only wraps read_handler. How does asio make all the intermediate operations (async_read_some) also go through the strand?
In short, asio_handler_invoke enables one to customize the invocation of handlers in the context of a different handler. In this case, the object returned from strand.wrap() has a custom asio_handler_invoke strategy associated with it that will dispatch handlers into the strand that wrapped the initial handler. Conceptually, it is as follows:
template <typename Handler>
struct strand_handler
{
    void operator()() { handler_(); }
    Handler handler_;
    boost::asio::strand dispatcher_;
};

// Customize invocation of Function within the context of strand_handler.
template <typename Function, typename Handler>
void asio_handler_invoke(Function function, strand_handler<Handler>* context)
{
    context->dispatcher_.dispatch(function);
}

auto wrapped_completion_handler = strand.wrap(completion_handler);
using boost::asio::asio_handler_invoke;
asio_handler_invoke(intermediate_handler, &wrapped_completion_handler);
The custom asio_handler_invoke hook is located via argument-dependent lookup. This detail is documented in the Handler requirement:
Causes the function object f to be executed as if by calling f().
The asio_handler_invoke() function is located using argument-dependent lookup. The function boost::asio::asio_handler_invoke() serves as a default if no user-supplied function is available.
For more details on asio_handler_invoke, consider reading this answer.
Be aware that an operation may be attempted within the initiating function. The documentation is specific that intermediate handlers will be invoked within the same context as the final completion handler. Therefore, given:
assert(strand.running_in_this_thread());
boost::asio::async_read(socket, buffer, strand.wrap(read_handler));
the boost::asio::async_read itself must be invoked within the context of strand to be thread-safe. See this answer for more details on thread-safety and strands.
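For example, one way to satisfy this is to initiate the composed operation from within the strand itself (a sketch; it assumes socket, buffer, and read_handler remain valid for the duration of the operation):

strand.dispatch([&] {
    // Running in the strand: the initiating function of the composed
    // operation, and any intermediate read it attempts immediately,
    // are serialized with the other handlers going through this strand.
    boost::asio::async_read(socket, buffer, strand.wrap(read_handler));
});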

Confusion regarding reentrant functions

My understanding of a "reentrant function" is that it's a function that can be interrupted (e.g. by an ISR or a recursive call) and later resumed such that the overall output of the function isn't affected in any way by the interruption.
Following is an example of a reentrant function from Wikipedia: https://en.wikipedia.org/wiki/Reentrancy_(computing)
int t;

void swap(int *x, int *y)
{
    int s;

    s = t;      // save global variable
    t = *x;
    *x = *y;
    // hardware interrupt might invoke isr() here!
    *y = t;
    t = s;      // restore global variable
}

void isr()
{
    int x = 1, y = 2;
    swap(&x, &y);
}
I was thinking, what if we modify the ISR like this:
void isr()
{
    t = 0;
}
And let's say that the main function calls swap(), but suddenly an interrupt occurs mid-swap; then the output would surely get distorted, as the swap wouldn't complete properly, which in my mind makes this function non-reentrant.
Is my thinking right or wrong? Is there some mistake in my understanding of reentrancy?
The answer to your question:
that the main function calls the swap function, but then suddenly an interrupt occurs, then the output would surely get distorted as the swap wouldn't be proper, which in my mind makes this function non-reentrant.
is no, it does not, because re-entrancy is (by definition) defined with respect to the function itself, and your modified isr() never re-enters swap(). If isr() did call swap(), that inner swap() would be safe. swap() is, however, thread-unsafe.
The correct way of thinking depends on the precise definitions of re-entrancy and thread-safety (see, say, Threadsafe vs re-entrant).
Wikipedia, the source of the code in question, selected the definition of reentrant function to be "if it can be interrupted in the middle of its execution and then safely called again ("re-entered") before its previous invocations complete execution".
I have never heard the term re-entrancy used in the context of interrupt service routines. It is generally the responsibility of the ISR (and/or the operating system) to maintain consistency; application code should not need to know anything about what an interrupt might do.
That a function is re-entrant usually means that it can be called from multiple threads simultaneously, or by itself recursively (either directly or through a more elaborate call chain), and still maintain internal consistency.
For functions to be re-entrant they must generally avoid using static variables and of course avoid calls to other functions that are not themselves re-entrant, as in the sketch below.
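For instance, the Wikipedia swap() above becomes trivially re-entrant once the global t is dropped (a minimal sketch):

void swap(int *x, int *y)
{
    int tmp = *x;   // automatic storage only: each invocation gets its own copy
    *x = *y;
    *y = tmp;
}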

Is mutex+atomic necessary to make this code thread safe, or is mutex enough?

I have some doubts whether mutexes are enough to ensure the thread safety of the following code example, or if atomics are required. The short version of the question is: would making idxActive a regular int make this code thread-unsafe? Or is the code thread-unsafe even with atomics? :(
If it is important: I'm on 32-bit x86, Linux, gcc 4.6. Of course I presume that 32 vs 64 bit makes no difference, but if there is any difference between them I would like to know.
#include <memory>
#include <boost/thread/thread.hpp>
#include <string>
#include <vector>
#include <atomic>
#include <iostream>
#include <boost/thread/mutex.hpp>

using namespace std;
using namespace boost;

static const int N_DATA = 2;

class Logger
{
    vector<string> data[N_DATA];
    atomic<int> idxActive;
    mutex addMutex;
    mutex printMutex;
public:
    Logger()
    {
        idxActive = 0;
        for (auto& elem : data)
            elem.reserve(1024);
    }
private:
    void switchDataUsed()
    {
        mutex::scoped_lock sl(addMutex);
        idxActive.store((idxActive.load() + 1) % N_DATA);
    }
public:
    void addLog(const string& str)
    {
        mutex::scoped_lock sl(addMutex);
        data[idxActive.load()].push_back(str);
    }
    void printCurrent()
    {
        mutex::scoped_lock sl(printMutex);
        switchDataUsed();
        auto idxOld = (idxActive.load() + N_DATA - 1) % N_DATA; // modulo -1
        for (auto& elem : data[idxOld])
            cout << elem << endl;
        data[idxOld].clear();
    }
};

int main()
{
    Logger log;
    log.addLog(string("Hi"));
    log.addLog(string("world"));
    log.printCurrent();
    log.addLog(string("Hi"));
    log.addLog(string("again"));
    log.printCurrent();
    return 0;
}
You do not need to use atomic variables if all accesses to those variables are protected by a mutex; in that case idxActive could be a plain int and everything would still work fine, because the mutex locking and unlocking ensures that the correct values become visible to other threads in the right order.
std::atomic<> allows concurrent access outside the protection of a mutex, ensuring that threads see correct values of the variable even in the face of concurrent modifications. If you stick to the default memory ordering, it also ensures that each thread reads the latest value of the variable. std::atomic<> can be used to write thread-safe algorithms without mutexes, but it is not required if all accesses are protected by the same mutex.
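A minimal sketch of that last point (a hypothetical counter, not taken from the question's code):

#include <atomic>

std::atomic<int> counter{0};

// Safe to call from any number of threads concurrently, with no mutex:
// fetch_add performs the read-modify-write as a single atomic operation.
void increment() { counter.fetch_add(1); }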
Important Update:
I just noticed that you're using two mutexes: one for addLog and one for printCurrent. In this case you do need idxActive to be atomic, because printCurrent reads idxActive while holding only printMutex, so that read is not synchronized with the modifications made under addMutex.
atomic is not directly related to thread safety. It just ensures that the operations on the variable are exactly what it says: atomic. Even if all your operations were atomic, your code would not necessarily be thread-safe.
In your case, the code should be safe. Only one thread at a time can enter printCurrent(). While this function is executing, other threads can call addLog() (but also only one at a time). Depending on whether or not switchDataUsed has already been executed, those entries will make it into the current log or they won't, but none will be entered while it is being iterated over. Only one thread at a time can enter addLog, which shares its mutex with switchDataUsed, so the two cannot be executed at the same time.
This will be the case even if you make idxActive a simple int. Mh, the (pre-C++11) memory model only deals with single-threaded code, so I'm not too sure whether theoretically that could break it. I think that if you make idxActive volatile (basically disallowing any load/store optimization on it at all), it will be OK for all practical purposes. Alternatively, you could remove the mutex from switchDataUsed, but then you need to keep idxActive atomic.
As an improvement, I would let switchDataUsed return the old index instead of recalculating it.
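A minimal sketch of that suggestion, keeping the original members and locking (switchDataUsed now returns the index that was just retired):

int switchDataUsed()
{
    mutex::scoped_lock sl(addMutex);
    int old = idxActive.load();
    idxActive.store((old + 1) % N_DATA);
    return old;
}

void printCurrent()
{
    mutex::scoped_lock sl(printMutex);
    auto idxOld = switchDataUsed(); // no second read of idxActive needed
    for (auto& elem : data[idxOld])
        cout << elem << endl;
    data[idxOld].clear();
}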