Difference between instruction-level parallelism and parallel processing

I do not understand the difference between instruction-level parallelism and parallel processing. Please help; it would be very useful if someone could give an example.

ILP
1. Overlaps individual machine operations (add, mul, load, ...) so that they execute in parallel
2. Transparent to the user
3. Goal: speed up execution
Parallel Processing
1. Separate processors work on separate chunks of the program (the processors are programmed to do so)
2. Not transparent to the user
3. Goal: speed up execution and increase the amount of work that can be done
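To make the contrast concrete, here is a minimal Python sketch (the worker function and data are made up for illustration): the loop body contains independent operations that the CPU can overlap on its own (instruction-level parallelism, invisible to the programmer), while the process pool is an explicit request to split the work across separate processors (parallel processing).

```python
from multiprocessing import Pool

def square_sum(chunk):
    # Inside this loop the CPU may overlap independent machine operations
    # (loads, multiplies, adds) on its own -- that is instruction-level
    # parallelism, and it is invisible to this code.
    total = 0
    for x in chunk:
        total += x * x
    return total

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Parallel processing has to be requested explicitly: the data is split
    # into chunks and each chunk is handled by a separate process/core.
    chunks = [data[i::4] for i in range(4)]
    with Pool(processes=4) as pool:
        print(sum(pool.map(square_sum, chunks)))
```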

Related

Degree of parallelism on an Oracle index affects the DB load

I have 4 RAC databases on one server.
Executing a query on a particular index causes a huge load and affects the other RAC databases as well. The parallelism settings of the index are below:
DEGREE : 10
INSTANCES : 1
Will decreasing the degree of this index fix the problem? Please advise!
Without having all the details about your case, my guess would be that you have AMM (Automatic Memory Management) disabled. With AMM disabled, parallel queries put stress on your Shared Pool, which can noticeably stall other queries, and the amount of stress grows steeply with the degree of parallelism (for example, parallel 2 might be fine, while parallel 90 might stall all nodes for 10 seconds each time the parallel 90 query is started).
So, based on the limited knowledge of your problem, my first suggestion would be to check whether you have AMM enabled. If not, either switch it on, or try to reduce the parallelism of your queries to e.g. 4.
PS: Parallelism of statistics gathering is sometimes overlooked as a source of a large number of parallel queries with high parallelism. HTH

Alternating-generations GA to keep evaluators running?

What do I want?
I want to be able to generate the population for generation N+2 from the ranked population of generation N, while generation N+1 is still being evaluated. Once generation N+1 finishes evaluation, it will be used to generate generation N+3 (and at that point the evaluation of generation N+2 will already be running).
Why do I want that?
To better utilize computing resources. My evaluation threads can then run continuously with minimal synchronization and waiting. With a fast evaluation function and small generations, a single generation can finish very quickly, and the evaluation threads would then have to wait for the next generation; instead, they could already be working on something else.
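A minimal sketch of this pipelined scheme, assuming hypothetical breed() and evaluate() functions and plain Python threads (in a real setup the evaluation would run in worker processes or otherwise release the GIL; the threads here only illustrate the scheduling): generation N+2 is bred from the ranked generation N while generation N+1 is still with the evaluator, and the bounded queue is the only synchronization point.

```python
import queue
import random
import threading

def breed(ranked_population):
    # Hypothetical variation step: a real GA would apply selection,
    # crossover and mutation to the ranked individuals here.
    return [x + random.uniform(-0.1, 0.1) for x in ranked_population]

def evaluate(population):
    # Hypothetical fitness evaluation: rank individuals best-first
    # (here: minimise |x|).
    return sorted(population, key=abs)

def run_pipeline(generations=10, pop_size=8):
    pending = queue.Queue(maxsize=2)   # generations waiting for / in evaluation
    ranked = queue.Queue()             # evaluated (ranked) generations

    def evaluator():
        while True:
            pop = pending.get()
            if pop is None:
                break
            ranked.put(evaluate(pop))

    worker = threading.Thread(target=evaluator)
    worker.start()

    # Seed: generation 0 is evaluated synchronously, generation 1 is bred
    # from it and handed to the evaluator thread.
    ranked_n = evaluate([random.uniform(-1.0, 1.0) for _ in range(pop_size)])
    pending.put(breed(ranked_n))

    for _ in range(2, generations + 1):
        pending.put(breed(ranked_n))   # breed generation N+2 from ranked generation N ...
        ranked_n = ranked.get()        # ... while generation N+1 is still being evaluated

    pending.put(None)                  # shut the evaluator down
    last = ranked.get()                # ranked result of the final bred generation
    worker.join()
    return last

if __name__ == "__main__":
    print(run_pipeline())
```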
Alternative thought
An alternative is to run two independent GAs that generate their generations while taking turns accessing the evaluation threads.
This utilizes resources better, but it's obviously not always what we want to do (digging two holes instead of digging a single hole doesn't always mean I can find oil faster).

Optimizing write performance of a 3-node, 8-core/16 GB Cassandra cluster

We have set up a 3-node performance cluster with 16 GB RAM and 8 cores each. Our use case is writing 1 million rows to a single table with 101 columns, which currently takes 57-58 minutes. What should our first steps be towards optimizing write performance on our cluster?
The first thing I would do is look at the application that is performing the writes:
What language is the application written in and what driver is it using? Some drivers offer better inherent performance than others; for example, the Python, Ruby, and Node.js drivers may only make use of one thread, so running multiple instances of your application (one per core) may be something to consider. Your question is tagged 'spark-cassandra-connector', so that possibly indicates you are using that; it uses the DataStax Java driver, which should perform well as a single instance.
Are your writes asynchronous or are you writing data one row at a time? How many writes does it execute concurrently? Too many concurrent writes can put pressure on Cassandra, but too few concurrent writes will limit throughput (see the sketch after this list). If you are using the spark connector, are you using saveToCassandra/saveAsCassandraTable or something else?
Are you using batching? If you are, how many rows are you inserting/updating per batch? Too many rows per batch puts a lot of pressure on Cassandra. Additionally, are all of the inserts/updates within a batch going to the same partition? If they aren't in the same partition, you should consider not batching them, since multi-partition batches put extra load on the coordinator node.
Spark Connector specific: you can tune the write settings, such as batch size, batch level (i.e. by partition or by replica set), write throughput in MB per core, etc. All of these settings are documented in the spark-cassandra-connector reference.
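As a concrete illustration of the asynchronous/concurrent-writes point above, here is a rough sketch using the DataStax Python driver; the contact points, keyspace, table, and concurrency level are made-up values for illustration, not a recommendation.

```python
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

# Hypothetical contact points and keyspace.
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
session = cluster.connect("my_keyspace")

# Prepare once, bind many times -- avoids re-parsing the statement per row.
insert = session.prepare(
    "INSERT INTO my_table (id, col1, col2) VALUES (?, ?, ?)"
)

rows = [(i, "value-%d" % i, i * 2) for i in range(100000)]

# Drive many in-flight requests from a single thread; tune `concurrency`
# until the cluster shows signs of pressure, then back off.
results = execute_concurrent_with_args(
    session, insert, rows, concurrency=100, raise_on_first_error=False
)

failures = [r for ok, r in results if not ok]
print("failed writes: %d" % len(failures))

cluster.shutdown()
```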
The second thing I would do is look at metrics on the Cassandra side, on each individual node:
What do the garbage collection metrics look like? You can enable GC logging by uncommenting lines in conf/cassandra-env.sh (see the article 'Are Your Garbage Collection Logs Speaking to You?'). You may need to tune your GC settings, although if you are using an 8 GB heap the defaults are usually pretty good.
Do your CPU and disk utilization indicate that your systems are under heavy load? Your hardware or configuration could be the constraint (see 'Selecting hardware for enterprise implementations').
Commands like nodetool proxyhistograms and nodetool cfhistograms will help you understand how long your requests take end to end (proxyhistograms), while cfhistograms (the latencies in particular) can give you insight into any disparity between how long it takes to process a request versus how long it takes to perform the mutation itself.

Optimal deployment of components

I have an optimal resource allocation problem:
Let us say that I have a set of steps that execute one after the other (strictly in a pre-defined order). Each step consumes a fixed amount of memory and CPU capacity for a pre-specified duration. I also have an effectively unlimited set of machines on which to deploy and run this code (each step is an independently deployable component). Each machine specifies its maximum CPU and memory capacity.
Given a throughput rate (the rate at which the first task is invoked), I want to be able to derive the ideal deployment strategy. How do I go about this?
This is what I can decipher from the problem statement; let me try to rephrase it:
Given a graph G with a pre-defined order in which the steps must execute (say S1 > S2 > S3 > ... > Sk).
Each step Si consumes a fixed CPU percentage (Ci) and takes a fixed time (Ti) to execute.
Instances of this graph are created at a fixed throughput of t per second (i.e. if t = 100, 100 instances of this graph are created per second).
We need to allocate resources to these instances in such a way that all the resources are optimally and fully utilized (i.e. the delay in allocating resources to any request must be minimized).
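One way to make this concrete, under the rephrasing above, is to treat it as a capacity/bin-packing problem: by Little's law, at throughput t each step Si has about t * Ti invocations in flight, each needing that step's CPU and memory footprint, and those in-flight instances then have to be packed onto machines. The sketch below uses a simple first-fit-decreasing heuristic for the packing; it is only an illustration (all names and numbers are made up) and ignores co-location and ordering constraints.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    cpu: float       # CPU cores one running instance consumes
    mem: float       # GB of memory one running instance consumes
    duration: float  # seconds one invocation takes

@dataclass
class Machine:
    cpu: float
    mem: float
    free_cpu: float = field(init=False)
    free_mem: float = field(init=False)
    placed: list = field(default_factory=list)

    def __post_init__(self):
        self.free_cpu, self.free_mem = self.cpu, self.mem

    def try_place(self, name, cpu, mem):
        if cpu <= self.free_cpu and mem <= self.free_mem:
            self.free_cpu -= cpu
            self.free_mem -= mem
            self.placed.append(name)
            return True
        return False

def plan(steps, throughput, machine_cpu, machine_mem):
    # Little's law: at arrival rate `throughput`, step i has on average
    # throughput * duration_i invocations in flight. Round up to be safe.
    items = []
    for s in steps:
        in_flight = math.ceil(throughput * s.duration)
        items.extend([(s.name, s.cpu, s.mem)] * in_flight)

    # First-fit decreasing by CPU: place each in-flight instance on the first
    # machine with room, opening a new machine when none fits.
    machines = []
    for name, cpu, mem in sorted(items, key=lambda it: -it[1]):
        if not any(m.try_place(name, cpu, mem) for m in machines):
            m = Machine(machine_cpu, machine_mem)
            m.try_place(name, cpu, mem)
            machines.append(m)
    return machines

if __name__ == "__main__":
    steps = [Step("S1", cpu=0.5, mem=1.0, duration=0.2),
             Step("S2", cpu=1.0, mem=2.0, duration=0.1),
             Step("S3", cpu=0.25, mem=0.5, duration=0.4)]
    machines = plan(steps, throughput=100, machine_cpu=8, machine_mem=16)
    print("machines needed:", len(machines))
```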

MATLAB parallel computing setup

I have a quad-core computer and I use the Parallel Computing Toolbox.
I set different values for the number of "workers" in the parallel computing settings, for example 2, 4, 8, ...
However, no matter what I set, the average CPU usage by MATLAB is exactly 25% of the total CPU, and none of the cores runs at 100% (all are around 10%-30%). I am using MATLAB to run an optimization problem, so I really want my quad-core computer to use all of its power for the computation. Please help.
Setting the number of workers (up to 4 on a quad-core) is not enough. You also need to use a construct like parfor to signal to MATLAB which part of the calculation should be distributed among the workers.
I am curious what kind of optimization you're running. Normally, optimization problems are very difficult to parallelize, since the result of each iteration depends on the previous one. However, if you want to, e.g., fit multiple models to the data, or if you have multiple data sets to fit, then you can easily run these in parallel rather than sequentially.
Note that having many cores may not be sufficient in terms of resources - if performing the optimization on one worker uses k GB of RAM, performing it on n workers requires at least n*k GB of RAM.