Which is faster, creating processes or Threads? And why? - process

I just wanna understand which is faster , the thread or the process and why is that ?
all the information i get is about the difference of weight

In the overwhelming majority of cases we can assume that the process creating takes much more time than creating a new thread in an existing process. Creating a process requires at least:
Classes loading and verification.*
Linking. *
Classes initialization. *
Static members initialization. *
Follow the link, there you will find a lot of detailed information about the process loading and you will understand that this is a very cumbersome procedure.
А new thread creating as a whole requires only operation system call.

Related

When creating lots of ByteBuddy classes, do I need to acquire locks of any kind?

I am creating several ByteBuddy classes (using DynamicTypeBuilder) and loading them. The creation of these classes and the loading of them happens on a single thread (the main thread; I do not spawn any threads myself nor do I submit anything to an ExecutorService) in a relatively simple sequence.
I have noticed that running this in a unit test several times in a row yields different results. Sometimes the classes are created and loaded fine. Other times I get errors from the generated bytecode when it is subsequently used (often in the general area of where I am using withArgumentArrayElements, if it matters; ArrayIndexOutOfBoundsErrors and the like; again other times this all works fine (with the same inputs)).
This feels like a race condition, but as I said I'm not spawning any threads. Since I am not using threads, only ByteBuddy (or the JDK) could be. I am not sure where that would be. Is there a ByteBuddy synchronization mechanism I should be using when creating and loading classes with DynamicTypeBuilder.make() and getLoaded()? Maybe some kind of class resolution is happening (or not happening!) on a background thread or something at make() time, and I am accidentally somehow preventing it from completing? Maybe if I'm going to use these classes immediately (I am) I need to supply a different TypeResolutionStrategy? I am baffled, as should be clear, and cannot figure out why a single-threaded program with the same inputs should produce generated classes that behave differently from run to run.
My pattern for loading these classes is:
Try to load the (normally non-existent) class using Class#forName(name, true, Thread.currentThread().getContextClassLoader()).
If (when) that fails, create the ByteBuddy-generated class and load it using the usual ByteBuddy recipes.
If that fails, it would be only because some other thread might have created the class already. In this unit test, there is no other thread. In any case, if a failure were to occur here, I repeat step 1 and then throw an exception if the load fails.
Are there any ByteBuddy-specific steps I should be taking in addition or instead of these?
Phew! I think I can chalk this up to a bug in my code (thank goodness). Briefly, what looked like concurrency issues was (most likely) an issue with accidentally shared classnames and HashMap iteration order: when one particular subclass was created-and-then-loaded, the other would simply be loaded (not created) and vice versa. The net effect was effects that looked like those of a race condition.
Byte Buddy is fully thread-safe. But it does attempt to create a class every time you invoke load what is a fairly expensive operation. To avoid this, Byte Buddy offers the TypeCache mechanism that allows you to implement an efficient cache.
Note that libraries like cglib offer automatic caching. Byte Buddy does not do this since the cache uses all inputs as keys and references them statically what can easily create memory leaks. Also, the keys are rather inefficient which is why Byte Buddy chose this approach.

How should I organise a pile of singly used functions?

I am writing a C++ OpenCV-based computer vision program. The basic idea of the program could be described as follows:
Read an image from a camera.
Do some magic to the image.
Display the transformed image.
The implementation of the core logic of the program (step 2) falls into a sequential calling of OpenCV functions for image processing. It is something about 50 function calls. Some temporary image objects are created to store intermediate results, but, apart from that, no additional entities are created. The functions from step 2 are used only once.
I am confused about organising this type of code (which feels more like a script). I used to create several classes for each logical step of the image processing. Say, here I could create 3 classes like ImagePreprocessor, ImageProcessor, and ImagePostprocessor, and split the abovementioned 50 OpenCV calls and temorary images correspondingly between them. But it doesn't feel like a resonable OOP design. The classes would be nothing more than a way to store the function calls.
The main() function would still just create a single object of each class and call thier methods consequently:
image_preprocessor.do_magic(img);
image_processor.do_magic(img);
image_postprocessor.do_magic(img);
Which is, to my impression, essentially the same thing as callling 50 OpenCV functions one by one.
I start to question whether this type of code requiers an OOP design at all. After all, I can simply provide a function do_magic(), or three functions preprocess(), process(), and postprocess(). But, this approach doesn't feel like a good practice as well: it is still just a pile of function calls, separated into a different function.
I wonder, are there some common practices to organise this script-like kind of code? And what would be the way if this code is a part of a large OOP system?
Usually, in Image Processing, you have a pipeline of various Image Processing Modules. Same is applicable on Video Processing, where each Image is processed according to its timestamp order in the video.
Constraints to consider before designing such pipeline:
Order of Execution of these modules is not always same. Thus, the pipeline should be easily configurable.
All modules of the pipeline should be executable in parallel with each other.
Each module of the pipeline may also have a multithreaded operation. (Out of scope of this answer, but is a good idea when a single module becomes the bottleneck for the pipeline).
Each module should easily adhere to the design and have the flexibility of internal implementation changes without affecting other modules.
The benefit of preprocessing of a frame by one module should be available to later modules.
Proposed Design.
Video Pipeline
A video pipeline is a collection of modules. For now, assume module is a class whose process method is called with some data. How each module can be executed will depend on how such modules are stored in VideoPipeline! To further explain, see below two categories:-
Here, let’s say we have modules A, B, and C which always execute in same order. We will discuss the solution with a video of Frame 1, 2 and 3.
a. Linked List: In a single-threaded application, frame 1 is first executed by A, then B and then C. The process is repeated for next frame and so on. So linked list seems like an excellent choice for the single threaded application.
For a multi-threaded application, speed is what matters. So, of course, you would want all your modules running 128-core machine. This is where Pipeline class comes into play. If each Pipeline object runs in a separate thread, the whole application which may have 10 or 20 modules starts running multithreaded. Note that the single-thread/multithread approach can be made configurable
b. Directed Acyclic Graph: Above-linked list implementation can be further improved when you have high processing power and want to reduce the lag between input and response time of pipeline. Such a case is when module C does not depend on B, but on A. In such case, any frame can be parallelly processed by module B and module C using a DAG based implementation. However, I wouldn’t recommend this as the benefits are not so great compared to the increased complexity, as further management of output from module B and C needs to be done by say module D where D depends on B or C or both. The number of scenarios increases.
Thus, for simplicity sake, let’s use LinkedList based design.
Pipeline
Create a linked list of PipelineElement.
Make process method of pipeline call process method of the first element.
PipelineElement
First, the PipelineElement processes the information by calling its ImageProcessor(read below). The PipelineElement will pass a Packet(of all data, read below) to ImageProcessor and receives the updated packet.
If next element is not null, call next PipelineElement process and pass updated packet.
If next element of a PipelineElement is null, stop. This element is special as it has an Observer object. Other PipelineElement will be set to null for Observer field.
FrameReader(VIdeoReader/ImageReader)
For video/image reader, create an abstract class. Whether you process video or image or multiple, processing is done one frame at a time, so create an abstract class(interface) ImageProcessor.
A FrameReader object stores reference to the pipeline.
For each frame, it pushes the information in by calling process method of Pipeline.
ImageProcessor
There is no Pre and Post ImageProcessor. For example, retinex processing is used as Post Processing but some application can use it as PreProcessing. Retinex processing class will implement ImageProcessor. Each element will hold Its ImageProcessor and Next PipeLineElement object.
Observer
A special class which extends PipelineElement and provides a meaningful output using GUI or disk.
Multithreading
1. Make each method run in its thread.
2. Each thread will poll messages from a BlockingQueue( of small size like 2-3 Frames) to act as a buffer between two PipelineElements. Note: The queue helps in averaging the speed of each module. Thus, small jitters(a module taking too long time for a frame) does not affect video output rate and provides smooth playback.
Packet
A packet will store all the information such as input or Configuration class object. This way you can store intermediate calculations as well as observe a real-time effect of changing configuration of an algorithm using a Configuration Manager.
To conclude, each element can now process in parallel. The first element will process nth frame, the second element will process n-1th frame, and soon, but with this, a lot more issues such as pipeline bottlenecks and additional delays due to less core power available to each element will pop up.
This structure lends itself to the pipes and filters architecture (see Pattern-Oriented Software Architecture Volume 1: A System of Patterns by Frank Buschmann):
The Pipes and Filters architectural pattern provides a structure for
systems that process a stream of data. Each processing step is
encapsulated in a filter component. Data is passed through pipes
between adjacent filters. Recombining filters allows you to build
families of related systems.
See also this short description (with images) from the Enterprise Integration Patterns book.

Perfomance in audio processing with objective-c ++

I'm currently developing an audio application and the performance is one of my main concerns.
There are really good articles like Four common mistakes in audio development or Real-time audio programming 101: time waits for nothing.
I understood that the c++ is the way to go for audio processing but I still have a question: Does Objective-C++ slow down the performance?
For example with a code like this
#implementation MyObjectiveC++Class
- (float*) objCMethodWithOnlyC++:(float*) input {
// Process in full c++ code here
}
#end
Will this code be less efficient than the same one in a cpp file?
Bonus question: What will happen if I use GrandCentralDispatch inside this method in order to to parallelize the process?
Calling an obj C method is slower than calling a pure c or c++ method as the obj C runtime is invoked at every call. If it matters in your case dependent on the number of samples processed in each call. If you are only processing one sample at a time, then this might be a problem. If you process a large buffer then I wouldn't worry too much.
The best thing to do is to profile it and then evaluate the results against your requirements for performance.
And for your bonus question the answer is somewhat the same. GCD comes at a cost, and if that cost is larger than what you gain by parallelising it, then it is not worth. so again it depends on the amount of work you plan to do per call.
Regards
Klaus
To simplify, in the end ObjC and C++ code goes through the same compile and optimize chain as C. So the performance characteristics of identical code inside an ObjC or C++ method are identical.
That said, calling ObjC or C++ methods has different performance characteristics. ObjC provides a dynamically modifiable, binary stable ABI with its methods. These additional guarantees (which are great if you are exposing methods via public API from e.g. a framework) come at a slight performance cost.
If you are doing lots of calls to the same method (e.g. per sample or per pixel), that tiny performance penalty may add up. The same applies to ivar access, which carries another load on top of how ivar access would be in C++ (on the modern runtime) to guarantee binary stability.
Similar considerations apply to GCD. GCD parallelizes operations, so you pay a penalty for thread switches (like with regular threads) and for each new block you dispatch. But the actual code in them runs at just the same speed as it would anywhere else. Also, different from starting your own threads, GCD will re-use threads from a pool, so you don't pay the overhead for creating new threads repeatedly.
It's a trade-off. Depending on what your code does and how it does it and how long individual operations take, either approach may be faster. You'll just have to profile and see.
The worst thing you can probably do is do things that don't need to be real-time (like updating your volume meter) in one of the real-time CoreAudio threads, or make calls that block.
PS - In most cases, performance differences will be negligible. So another aspect to focus on would be readability and easy maintenance of your code. E.g. using blocks can make code that has lots of asynchronicity easier to read because you can set up the blocks in the correct order in one method of your source file, making the flow clearer than if you split them across several methods that all just do a tiny thing and then start an asynchronous process.

Need advice regarding VB.Net multithreading options

Good day all,
I'm having a hell of a time figuring out which multithreading approach to utilize in my current work project. Since I've never written a multithreaded app in my life, this is all confusing and very overwhelming. Without further ado, here's my background story:
I've been assigned to take over work on a control application for a piece of test equipment in my companies R&D lab. The program has to be able to send and receive serial communications with three different devices semi-concurrently. The original program was written in VB 6 (no multithreading) and I did plan on just modding it to work with the newer products that need to be tested until it posed a safety hazard when the UI locked up due to excessive serial communications during a test. This resulted in part of the tester hardware blowing up, so I decided to try rewriting the app in VB.Net as I'm more comfortable with it to begin with and because I thought multithreading might help solve this problem.
My plan was to send commands to the other pieces of equipment from the main app thread and spin the receiving ends off into their own threads so that the main thread wouldn't lock up when timing is critical. However, I've yet to come to terms with my options. To add to my problems, I need to display the received communications in separate rich text boxes as they're received while the data from one particular device needs to be parsed by the main program, but only the text that results from the most current test (I need the text box to contain all received data though).
So far, I've investigated delegates, handling the threads myself, and just began looking into BackgroundWorkers. I tried to use delegates earlier today, but couldn't figure out a way to update the text boxes. Would I need to use a call back function to do this since I can't do it in the body of the delegate function itself? The problem I see with handling threads myself is figuring out how to pass data back and forth between the thread and the rest of the program. BackgroundWorkers, as I said, I just started investigating so I'm not sure what to think about them yet.
I should also note that the plan was for the spawned threads to run continuously until somehow triggered to stop. Is this possible with any of the above options? Are there other options I haven't discovered yet?
Sorry for the length and the fact that I seem to ramble disjointed bits of info, but I'm on a tight deadline and stressed out to the point I can't think straight! Any advice/info/links is more than appreciated. I just need help weighing the options so I can pick a direction and move forward. Thanks to everybody who took the time to read this mess!
OK, serial ports, inter-thread comms, display stuff in GUI components like RichTextBox, need to parse incoming data quickly to decode the protocol and fire into a state-machine.
Are all three serial ports going to fire into the same 'processControl' state-machine?
If so, then you should probably do this by assembling event/data objects and queueing them to the state-machine run by one thread,(see BlockingCollection). This is like hugely safer and easier to understand/debug than locking up the state-engine with a mutex.
Define a 'comms' class to hold data and carry it around the system. It should have a 'command' enum so that threads that get one can do the right thing by switching on the enum. An 'Event' member that can be set to whatever is used by the state-engine. A 'bool loadChar(char inChar)' that can have char-by-char data thrown into it and will return 'true' only if a complete, validated protocol-unit has been assembled, checked and parsed into data mambers. A 'string textify()' method that dumps info about the contained data in text form. A general 'status' string to hold text stuff. An 'errorMess' string and Exception member.
You probably get the idea - this comms class can transport anything around the system. It's encapsulated so that a thread can use it's data and methods without reference to any other instance of comms - it does not need any locking. It can be queued to work threads on a Blocking Collection and BeginInvoked to the GUI thread for displaying stuff.
In the serialPort objects, create a comms at startup and load a member with the serialPort instance. and, when the DataReceived event fires, get the data from the args a char at a time and fire into the comms.loadChar(). If the loadChar call returns true, queue the comms instance to the state-machine input BlockingCollection and then immediately create another comms and start loading up the new one with data. Just keep doing that forever - loading up comms instances with chars until they have a validated protocol unit and queueing them to the state-machine. It may be that each serial port has its own protocol - OK, so you may need three comms descendants that override the loadChar to correctly decode their own protocol.
In the state-machine thread, just take() comms objects from the input and do the state-engine thing, using the current state and the Event from the comms object. If the SM action routine decides to display something, BeginInvoke the comms to the GUI thread with the command set to 'displaySomeStuff'. When the GUI thread gets the comms, it can case-switch on the command to decide what to display/whatever.
Anyway, that's how I build all my process-control type apps. Data flows around the system in 'comms' object instances, no comms object is ever operated on by more than one thead at a time. It's all done by message-passing on either BlockingCollection, (or similar), queues or BeginInvoke() if going to the GUI thread.
The only locks are in the queues and so are encapsulated. There are no explicit locks at all. This means there can be no explicit deadlocks at all. I do get headaches, but I don't get lockups.
Oh - don't go near 'Thread.Join()'.

Workflow Foundation 4: Activity Caching Thread Safety?

There are certain places in my code where I invoke an activity using the WorkflowInvoker.Invoke method. I'm having a lot of performance issues because I create an activity every time I need to invoke this.
According this MSDN Blog post, I should cache the activity and run the same activity instance rather than creating a new one.
However, my application is multi-threaded. Would it be safe for many threads to use the same instance of the Activity? According to the MSDN documentation, it says its not thread-safe, but it looks like the standard message for almost all classes.
I suspect that it should be thread-safe, since the data that the activity uses is stored in a separate context (as Variables and Arguments) rather than a normal instance member of the activity class.
I have found no problems with threads sharing the same Activity instance. This makes sense because data is passed into the activity through the context (rather than the properties of the Activity object). Activity caching significantly improves performance.