Is it possible to "wake up" a Linux kernel process from user space without a system call?

I'm trying to modify a kernel module that manages special hardware.
The user-space process performs 2 ioctl() system calls per millisecond to talk to the module. This doesn't meet my real-time requirements because the 2 syscalls sometimes take too long to execute and overrun my time slot.
I know that with mmap I could share a memory area, and this is great, but how can I synchronize the data exchange with the module without ioctl()?

Related

Blocked Processes

As far as I know, certain conditions must be satisfied for a process to continue running. If they are not satisfied, the processor blocks that process so as not to waste time. Once the conditions are satisfied, the process enters the ready state.
However, I came across a passage like this in the book "Modern Operating Systems" by Andrew Tanenbaum: There are two types of processes: system processes and user processes. If the processor receives a disk interrupt while it is executing a user application, the system decides to stop running the current process and starts running the disk process. In this case, the application process is kept in the blocked state. After the disk has been read or written, the process waiting for it is unblocked.
I understood that a process is blocked only when a requirement or condition is not satisfied. However, I suppose this passage is saying that the disk process has higher precedence, and that is why the application process is blocked. Is precedence a factor in blocking a process?
What you are describing makes no sense. I have to wonder if this is the result of your quotation.
First of all, the processor does not block processes; the operating system does.
Second, I have not worked on an operating system that works anything like the way you describe here.
Usually, if a disk drive triggers an interrupt, the current process handles that interrupt. While in kernel mode the operating system does whatever queuing is necessary for the disk operation. If the process's time slice is up, only then does the process change. If not, after interrupt handling, the process picks up where it left off before the interrupt.
I cannot imagine a "modern" operating system that invokes a disk process to handle disk interrupts.

LR 12.55/TruClient vusers are stuck in Init state not going to running

I created a TruClient Web (IE) protocol script in LR 12.55. When I try to run the script with 50 users, only some of them (between 25 and 37) go into the running state, and the rest stay stuck in Init forever.
I tried changing the Init timeout under Controller -> Options -> Timeout from the default 180 to 999, but that did not resolve the issue. Can anybody comment on how to resolve this?
TruClient runs a real browser for each vuser (virtual user), so system resource consumption is higher than in API-level testing.
It is possible that 50 vusers is too much for your load-generator machine.
I'd suggest checking CPU and memory levels during the run. If either is over 80% utilization, you should split your load between multiple load-generator machines.
If resources are not fully utilized, the failures should be analyzed to determine the root cause.
To further e-Dough's excellent response, you should expect not to execute these virtual users on the same hardware as the controller. You should expect at least three load generators to be involved, two as primary load and one as a control set. This is in addition to the controller.
Your issue does manifest as the classical, "system out of resources" condition. Consider the same best practices for monitoring your load generator health as you would in monitoring your application under test infrastructure. You want to have monitors for your classical finite resource model components ( CPU, DISK, MEMORY and NETWORK) plus additional sub components, such as a breakout of System and Application under CPU, to understand where and how your system is performing. You want to be able to eliminate false negatives on scalability where your load generators are so unhealthy that they are distorting your test results - Virtual users showing the application is slow when in fact the Virtual Users are slow because the machine in use is resource constrained.

Can VMs on Google Compute detect when they've been migrated?

Is it possible to notify an application running on a Google Compute VM when the VM migrates to different hardware?
I'm a developer for an application (HMMER) that makes heavy use of vector instructions (SSE/AVX/AVX-512). The version I'm working on probes its hardware at startup to determine which vector instructions are available and picks the best set.
We've been looking at running our program on Google Compute and other cloud engines, and one concern is that, if a VM migrates from one physical machine to another while running our program, the new machine might support different instructions, causing our program to either crash or execute more slowly than it could.
Is there a way to notify applications running on a Google Compute VM when the VM migrates? The only relevant information I've found is that you can set a VM to perform a shutdown/reboot sequence when it migrates, which would kill any currently-executing programs but would at least let the user know that they needed to restart the program.
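The startup probe described in the question can be sketched as follows. This is only an illustration in Python, not HMMER's actual code: it assumes a Linux x86 guest where /proc/cpuinfo exposes a "flags" line, and the names parse_cpu_flags, pick_vector_isa, probe and PREFERRED are hypothetical.

```python
# Preference order, best first. These are the flag names Linux reports in
# /proc/cpuinfo on x86; the ordering itself is an assumption for this sketch.
PREFERRED = ("avx512f", "avx2", "avx", "sse4_1", "sse2")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_vector_isa(flags, preferred=PREFERRED):
    """Return the first (best) preferred instruction set present, or None."""
    for name in preferred:
        if name in flags:
            return name
    return None


def probe(path="/proc/cpuinfo"):
    """Probe the machine the process is currently running on."""
    with open(path) as f:
        return pick_vector_isa(parse_cpu_flags(f.read()))
```

A program that only calls probe() once at startup keeps using that answer for its whole lifetime, which is exactly why a change of host underneath it would matter.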
We ensure that your VM instances never live migrate between physical machines in a way that would cause your programs to crash the way you describe.
However, for your use case you probably want to specify a minimum CPU platform version. You can use this to ensure that e.g. your instance has the new Skylake AVX instructions available. See the documentation on Specifying the Minimum CPU Platform for further details.
As per the Live Migration docs:
Live migration does not change any attributes or properties of the VM
itself. The live migration process just transfers a running VM from
one host machine to another. All VM properties and attributes remain
unchanged, including things like internal and external IP addresses,
instance metadata, block storage data and volumes, OS and application
state, network settings, network connections, and so on.
Google does provide a few controls for setting instance availability policies, which also let you control aspects of live migration. Here they also mention what you can look for to determine when live migration has taken place.
Live migrate
By default, standard instances are set to live migrate, where Google
Compute Engine automatically migrates your instance away from an
infrastructure maintenance event, and your instance remains running
during the migration. Your instance might experience a short period of
decreased performance, although generally most instances should not
notice any difference. This is ideal for instances that require
constant uptime, and can tolerate a short period of decreased
performance.
When Google Compute Engine migrates your instance, it reports a system
event that is published to the list of zone operations. You can review
this event by performing a gcloud compute operations list --zones ZONE
request or by viewing the list of operations in the Google Cloud
Platform Console, or through an API request. The event will appear
with the following text:
compute.instances.migrateOnHostMaintenance
In addition, you can detect directly on the VM when a maintenance event is about to happen.
Getting Live Migration Notices
The metadata server provides information about an instance's
scheduling options and settings, through the scheduling/
directory and the maintenance-event attribute. You can use these
attributes to learn about a virtual machine instance's scheduling
options, and use this metadata to notify you when a maintenance event
is about to happen through the maintenance-event attribute. By
default, all virtual machine instances are set to live migrate so the
metadata server will receive maintenance event notices before a VM
instance is live migrated. If you opted to have your VM instance
terminated during maintenance, then Compute Engine will automatically
terminate and optionally restart your VM instance if the
automaticRestart attribute is set. To learn more about maintenance
events and instance behavior during the events, read about scheduling
options and settings.
You can learn when a maintenance event will happen by querying the
maintenance-event attribute periodically. The value of this
attribute will change 60 seconds before a maintenance event starts,
giving your application code a way to trigger any tasks you want to
perform prior to a maintenance event, such as backing up data or
updating logs. Compute Engine also offers a sample Python script
to demonstrate how to check for maintenance event notices.
You can use the maintenance-event attribute with the waiting for
updates feature to notify your scripts and applications when a
maintenance event is about to start and end. This lets you automate
any actions that you might want to run before or after the event. The
following Python sample provides an example of how you might implement
these two features together.
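The sample that the quoted documentation refers to is not reproduced here. As a stand-in, here is a minimal sketch of the same idea, assuming the standard metadata endpoint and the wait_for_change query parameter. The function names are hypothetical, error handling and retries are left out, and the attribute values you will see (for example NONE outside a maintenance window) should be checked against the current documentation.

```python
import urllib.request

# Metadata server path for the maintenance-event attribute. It is only
# reachable from inside a Compute Engine VM.
METADATA_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/maintenance-event"
)


def fetch_maintenance_event(wait_for_change=False, timeout=None):
    """Return the current maintenance-event value as a string.

    With wait_for_change=True the request blocks until the value changes.
    """
    url = METADATA_URL + ("?wait_for_change=true" if wait_for_change else "")
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def watch_maintenance_events(on_change, fetch=fetch_maintenance_event,
                             max_events=None):
    """Report the initial value, then call on_change(value) on every change.

    `fetch` is injectable so the loop can be exercised without a metadata
    server; `max_events` (hypothetical) bounds the number of reports.
    """
    last = fetch(wait_for_change=False)
    on_change(last)
    reported = 1
    while max_events is None or reported < max_events:
        value = fetch(wait_for_change=True)
        if value != last:
            last = value
            on_change(value)
            reported += 1
```

On a VM, calling watch_maintenance_events(print) would print the current value and then each subsequent change, which gives the application a hook for re-probing the hardware or checkpointing its work.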
You can also choose to terminate and optionally restart your instance.
Terminate and (optionally) restart
If you do not want your instance to live migrate, you can choose to
terminate and optionally restart your instance. With this option,
Google Compute Engine will signal your instance to shut down, wait for
a short period of time for your instance to shut down cleanly,
terminate the instance, and restart it away from the maintenance
event. This option is ideal for instances that demand constant,
maximum performance, and your overall application is built to handle
instance failures or reboots.
Look at the Setting availability policies section for more details on how to configure this.
If you use an instance with a GPU or a preemptible instance, be aware that live migration is not supported:
Live migration and GPUs
Instances with GPUs attached cannot be live migrated. They must be set
to terminate and optionally restart. Compute Engine offers a 60 minute
notice before a VM instance with a GPU attached is terminated. To
learn more about these maintenance event notices, read Getting live
migration notices.
To learn more about handling host maintenance with GPUs, read
Handling host maintenance on the GPUs documentation.
Live migration for preemptible instances
You cannot configure preemptible instances to live migrate. The
maintenance behavior for preemptible instances is always set to
TERMINATE by default, and you cannot change this option. It is also
not possible to set the automatic restart option for preemptible
instances.
As Ramesh mentioned, you can specify the minimum CPU platform to ensure you are only migrated to an instance which has at least the minimum CPU platform you specified. At a high level it looks like:
In summary, when you specify a minimum CPU platform:
Compute Engine always uses the minimum CPU platform where available.
If the minimum CPU platform is not available or the minimum CPU platform is older than the zone default, and a newer CPU platform is
available for the same price, Compute Engine uses the newer platform.
If the minimum CPU platform is not available in the specified zone and there are no newer platforms available without extra cost, the
server returns a 400 error indicating that the CPU is unavailable.

Alternative for batch job scheduling (in compute pool)

Since I don't have root rights on the machines in a compute pool, and thus cannot adapt the load parameters of atd for batch, I'm looking for an alternative way to do job scheduling. Since the machines are used by multiple users, it should be able to take the load into account. Optionally, I'm looking for a way to do this for all the machines in the pool, i.e. there is one central queue with jobs that need to be run, and a script that distributes them (over ssh) to the machines that are under a certain load. Any ideas?
First: go talk to the system administrators of the compute pool. Enterprise-wide job schedulers have become a rather common component in infrastructures these days. Typically, these schedulers do not take system load into account, though.
If the above doesn't lead to a good solution, you should carefully consider what load your jobs will impose on the machine: your jobs could stress the CPU, consume large amounts of memory, or generate lots of network or disk I/O activity. Consequently, determining whether your job should start may depend on a lot of measurements, some of which you would not be able to take as an ordinary user (this depends a bit on the kind of OS you are running, and how tight security is). In any case, you would only be able to take into account the load at the job's startup. Obviously, if every user did this, you'd be back at square one in no time...
It might be a better idea to see with your system administrator if they have some sort of resource controls in place (e.g. projects in Solaris) through which they can make sure your batches are not tearing down the nodes in the compute pool. Next, write your batch jobs in such a way that they can cope with the OS declining requests for resources.
EDIT: As for the distributed nature: queue up the jobs and have clients on all nodes point to the same queue, each consuming as much as it can within the limits of the resource controls...
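For the user-level fallback, a load-aware consumer can be sketched as below. This is a minimal illustration, not a replacement for a real scheduler: the names load_ok and run_queue, the threshold and the polling interval are all hypothetical, and, as noted above, the load is only checked at the moment each job starts.

```python
import os
import subprocess
import time


def load_ok(max_load, get_load=os.getloadavg):
    """True when the 1-minute load average is below max_load."""
    one_minute, _, _ = get_load()
    return one_minute < max_load


def run_queue(jobs, max_load, run=subprocess.run, get_load=os.getloadavg,
              sleep=time.sleep, poll_seconds=30):
    """Run jobs (argv lists) in order, starting one only when load allows.

    `run`, `get_load` and `sleep` are injectable so the loop can be tested
    without spawning processes or waiting.
    """
    pending = list(jobs)
    results = []
    while pending:
        if load_ok(max_load, get_load):
            results.append(run(pending.pop(0)))
        else:
            sleep(poll_seconds)  # machine is busy; check again later
    return results
```

One such consumer per node, each pulling from the same list of jobs, approximates the shared queue described in the EDIT. A job's argv could itself be an ssh invocation if a single script is meant to drive several machines.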

What is application state?

This is a very general question. I am a bit confused by the term "state". What do people mean by the "state of an application"? Why do they call a web server "stateless" and a database "stateful"?
How is the state of an application (in a VM) transferred when the VM's memory is moved from one machine to another during live migration?
Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?
You've definitely asked a mouthful -- it's unfortunate that the word state is used in so many different contexts, but each one is a valid use of the word.
State of an application
An application's state is roughly the entire contents of its memory. This can be a difficult concept to get behind until you've seen something like Erlang's server loops, which explicitly pass all the state of the application in a variable from one invocation of the function to the next. In more "normal" programming languages, the "state" of the program is all its global variables, static variables, objects allocated on the heap, objects allocated on the stack, registers, open file descriptors and file offsets, open network sockets and associated kernel buffers, and so forth.
You can actually save that state and resume execution of the process elsewhere. The BLCR checkpoint tools for Linux do exactly this. (Though it is an extremely uncommon task to perform.)
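To make the "state passed from one invocation to the next" idea concrete, here is a small sketch in Python rather than Erlang. The names step, serve, checkpoint and restore are made up for this illustration, and pickle stands in for a real checkpointing tool. Unlike a process-level checkpoint, it captures only the data placed in the state variable, not file descriptors or sockets.

```python
import pickle


def step(state, message):
    """Pure transition: build the next state from the old one and a message."""
    return {"count": state["count"] + 1, "total": state["total"] + message}


def serve(state, messages):
    """Server loop: all state is threaded explicitly through each step."""
    for message in messages:
        state = step(state, message)
    return state


def checkpoint(state):
    """Serialize the explicit state so it can be stored or shipped elsewhere."""
    return pickle.dumps(state)


def restore(blob):
    """Rebuild the state from a checkpoint, possibly on another machine."""
    return pickle.loads(blob)
```

Because nothing lives outside the state value, serving messages 1 and 2, checkpointing, restoring, and then serving 3 yields the same result as serving 1, 2 and 3 in one uninterrupted run.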
State of a protocol
The state of a protocol is a different sort of meaning -- the statelessness of HTTP requests means that every web browser communication with webservers essentially starts over, from scratch -- every cookie is re-transmitted in both directions to try to "fake" some amount of a "session" for the user's sake. The servers don't hold any resources open for any given client across requests -- each one starts from scratch.
Networked filesystems might also be stateless (earlier versions of NFS) or stateful (newer versions of NFS). The earlier versions assumed every individual packet of reading, writing, or metadata control would be committed as it arrived, and every time a specific byte was needed from a file, it would be re-requested. This allowed the servers to be very simple -- they would do what the client packets told them to do and no effort was required to bring servers and clients back to consistency if a server rebooted or routers disappeared. However, this was bad for performance -- every client requested static data hundreds or thousands of times each day. So newer versions of NFS allowed some amount of data caching on the clients, and persistent file handles between servers and clients, and the servers had to keep track of the state of the clients that were connected -- and vice versa: the clients also had to know what promises they had made to the servers.
A stateful firewall will keep track of active TCP sessions. It knows which sessions the system administrators want to allow through, so it looks for those initial packets specifically. Once the session is set up, it then tracks the established connections as entities in their own rights. (This was a real advancement upon previous stateless firewalls which considered packets in isolation -- the rulesets on previous firewalls were much more permissive to achieve the same levels of functionality, but allowed through far too many malicious packets that pretended a session was already active.)
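The connection tracking described above can be sketched very roughly as follows. This is a toy model, assuming a single "syn" flag opens a session and a "fin" flag closes it. It ignores the real TCP handshake, timeouts and sequence numbers, and the class name StatefulFilter is hypothetical.

```python
class StatefulFilter:
    """Toy stateful packet filter that remembers established sessions."""

    def __init__(self, allowed_ports):
        self.allowed_ports = set(allowed_ports)
        self.established = set()  # (src, sport, dst, dport) tuples

    def accept(self, src, sport, dst, dport, syn=False, fin=False):
        key = (src, sport, dst, dport)
        reverse = (dst, dport, src, sport)
        # Packets in either direction of a tracked session pass.
        if key in self.established or reverse in self.established:
            if fin:
                self.established.discard(key)
                self.established.discard(reverse)
            return True
        # Only an initial packet to an allowed port may open a session.
        if syn and dport in self.allowed_ports:
            self.established.add(key)
            return True
        # Anything else claims a session that was never set up: drop it.
        return False
```

A stateless filter has no `established` set to consult, so it has to judge each packet from its headers alone, which is why its rules end up more permissive.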
An application's state is simply where the application currently stands: which point in the program is being executed and what is stored in its memory. The web is "stateless," meaning every time you reload a page, no information remains from the previous version of the page. All information must be resent from the server in order to display the page.
Technically, browsers get around the statelessness of the web by utilizing techniques like caching and cookies.
Application state is a data repository available to all classes. Application state is stored in memory on the server and is faster than storing and retrieving information in a database. Unlike session state, which is specific to a single user session, application state applies to all users and sessions. Therefore, application state is a useful place to store small amounts of often-used data that does not change from one user to another.
Resource: http://msdn.microsoft.com/en-us/library/ms178594.aspx
Is transferring the memory, caches and register values of a system enough to transfer the state of the running application?
Does the application have a file open, positioned at byte 225? If so, that file is part of the application's state because the next byte written should go to position 226.
Has the application authenticated itself to a secure server with a time-based key? Then that connection is part of the application's state, because if the application were to be suspended for 24 hours after saving memory, cache, and register values, when it resumes it will no longer have a valid connection to the secure server because it will have timed out.
Things which make an application stateful are easy to overlook.
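The open-file example can be made concrete with a small sketch. The helper names snapshot and resume are hypothetical, and the sketch assumes the file still exists unchanged at the same path when the application resumes. The point is that the offset has to be saved and restored explicitly, because it lives outside the application's own variables.

```python
def snapshot(f):
    """Record the part of an open file's state that memory alone would miss."""
    return {"path": f.name, "offset": f.tell()}


def resume(snap):
    """Reopen the file and put the write position back where it was."""
    f = open(snap["path"], "r+b")
    f.seek(snap["offset"])
    return f
```

Without the seek, a reopened file would start at offset 0 and the next write would overwrite the beginning of the file instead of continuing where the application left off.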