Can Hadoop reduce runtime of SIFT?

Can we use Hadoop to run SIFT on multiple images?
SIFT takes ~1 s per image to extract keypoints and their descriptors. Given that each run is independent of the others and the runtime of a single run cannot be reduced, can we reduce the overall runtime in any way?
Multithreading reduces runtime by roughly a factor of the number of cores you have, since each image can be processed on a separate core.
Can Hadoop be used in any way to parallelize the runs over multiple images?
If yes, by what factor can it reduce runtime, supposing we have a cluster of 3 nodes?

Could you give some good references for mappers? What kind of mappers would be relevant for this job?
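With Hadoop Streaming, a mapper can be any executable that reads records from stdin and writes key/value pairs to stdout, so a per-image SIFT extractor fits naturally. A minimal sketch, assuming the mapper receives one image path per line and OpenCV with SIFT support is installed (all names here are illustrative):

#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper: one input line = one image path.
import sys

import cv2

sift = cv2.SIFT_create()  # cv2.xfeatures2d.SIFT_create() on older OpenCV builds

for line in sys.stdin:
    path = line.strip()
    if not path:
        continue
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        continue  # skip unreadable images
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Emit "path <TAB> keypoint count"; in practice the descriptors
    # would be serialized (e.g. base64-encoded) rather than just counted.
    print('%s\t%d' % (path, len(keypoints)))

Because the runs are independent, no reducer is needed; the ideal speedup is bounded by the total number of concurrent map slots across the nodes, minus scheduling and I/O overhead.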


parallel execution of dask `DataFrame.set_index()`

I am trying to create an index on a large dask dataframe. No matter which scheduler I use, I am unable to utilize more than the equivalent of one core for the operation. The code is:
import dask.dataframe as ddf

(ddf
 .read_parquet(pq_in)
 .set_index('title', drop=True, npartitions='auto', shuffle='disk', compute=False)
 .to_parquet(pq_out, engine='fastparquet', object_encoding='json', write_index=True, compute=False)
 .compute(scheduler=my_scheduler)
)
I am running this on a single 64-core machine. What can I do to utilize more cores? Or is set_index inherently sequential?
That should use multiple cores, though using disk for shuffling may introduce other bottlenecks, such as your local hard drive; in that case you often aren't bound by CPU, so additional cores won't help.
In your situation I would use the distributed scheduler on a single machine, so that you can use the diagnostic dashboard to get more insight into your computation.
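As a minimal sketch of that suggestion (no actual cluster is required; a Client with no arguments starts a local one):

from dask.distributed import Client

# Starting a Client with no arguments spins up a local cluster of
# worker processes and registers it as the default scheduler.
client = Client()

# The diagnostic dashboard shows per-worker CPU, memory, and task
# progress, which helps identify whether the shuffle is disk-bound.
print(client.dashboard_link)

With the client registered, the same set_index/to_parquet pipeline can simply call .compute() and will run on the local distributed scheduler.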

what are the best ways to study Gem5 CPU models

This is a very generic question. What is the best way to study the basic CPU models in gem5 so that I can build my own CPU models using them? Do I need to understand the base models fully? I mean, do I need to go through the code line by line to understand the functionality of those CPU models in gem5?
If your goal is only to change the timing of different pipeline stages, you can change it in your configuration script, as the CPU models in gem5 expose options. You can change instruction latencies, the number of functional units, the cycles between fetch/decode/execute, and so on.
You could take a look at https://github.com/gem5/gem5/tree/master/configs/common/cores/arm, where the authors of these files set options to change the structure of a CPU core. The core still uses the detailed gem5 out-of-order CPU model; only the parameters (sizes of structures, latencies between structures, ...) are modified.
Using this as an example, you could change what you want without having to fully understand the code for the detailed CPU model.
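For illustration, a minimal sketch of that approach in a configuration script (the class and attribute names follow gem5's out-of-order model, but treat them as assumptions to verify against the gem5 source):

from m5.objects import DerivO3CPU

# Instantiate the detailed out-of-order CPU model and override a few
# structural parameters instead of modifying the C++ implementation.
cpu = DerivO3CPU()
cpu.fetchWidth = 4          # instructions fetched per cycle
cpu.decodeWidth = 4         # instructions decoded per cycle
cpu.numROBEntries = 128     # reorder buffer size
cpu.fetchToDecodeDelay = 2  # pipeline latency between fetch and decode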

Optimizing write performance of a 3-node, 8-core/16 GB Cassandra cluster

We have set up a 3-node performance cluster with 16 GB RAM and 8 cores each. Our use case is writing 1 million rows to a single table with 101 columns, which currently takes 57-58 minutes. What should our first steps be toward optimizing write performance on our cluster?
The first thing I would do is look at the application that is performing the writes:
What language is the application written in, and which driver is it using? Some drivers offer better inherent performance than others; e.g., the Python, Ruby, and Node.js drivers may only make use of one thread, so running multiple instances of your application (one per core) may be worth considering. Your question is tagged 'spark-cassandra-connector', which suggests you are using it; it is built on the DataStax Java driver, which should perform well as a single instance.
Are your writes asynchronous, or are you writing data one row at a time? How many writes execute concurrently? Too many concurrent writes can put pressure on Cassandra, while too few will reduce throughput (see the sketch after this list). If you are using the Spark connector, are you using saveToCassandra/saveAsCassandraTable or something else?
Are you using batching? If so, how many rows are you inserting/updating per batch? Too many rows can put a lot of pressure on Cassandra. Additionally, are all of the inserts/updates within a batch going to the same partition? Multi-partition batches add coordinator overhead, so if they aren't, consider grouping writes by partition instead.
Spark connector specific: you can tune the write settings, such as batch size, batch level (i.e., by partition or by replica set), and write throughput in MB per core. You can see all of these settings here.
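As a hedged illustration of the async/concurrency point using the DataStax Python driver (contact point, keyspace, table, and column names here are hypothetical):

from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(['127.0.0.1'])
session = cluster.connect('perf_ks')  # hypothetical keyspace

# Prepared statements avoid re-parsing the CQL for every row.
insert = session.prepare(
    "INSERT INTO wide_table (id, col1) VALUES (?, ?)")

rows = [(i, 'value-%d' % i) for i in range(1000000)]

# concurrency caps the number of in-flight requests so the cluster
# stays busy without being overwhelmed; tune it while watching metrics.
results = execute_concurrent_with_args(
    session, insert, rows, concurrency=100)

The concurrency value is the knob the answer describes: raise it until throughput stops improving or the nodes start showing pressure.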
The second thing I would do is look at metrics on the Cassandra side on each individual node:
What do the garbage collection metrics look like? You can enable GC logs by uncommenting lines in conf/cassandra-env.sh (as shown here; see also "Are Your Garbage Collection Logs Speaking to You?"). You may need to tune your GC settings, although if you are using an 8 GB heap the defaults are usually pretty good.
Do your CPU and disk utilization indicate that your systems are under heavy load? Your hardware or configuration could be constraining your capacity; see "Selecting hardware for enterprise implementations".
Commands like nodetool proxyhistograms and nodetool cfhistograms <keyspace> <table> will help you understand how long your requests are taking: proxyhistograms shows end-to-end request latency, while cfhistograms (latencies in particular) can give you insight into any disparities between how long it takes to process a request vs. perform the mutation itself.

MATLAB parallel computing setup

I have a quad-core computer and I use the Parallel Computing Toolbox.
I set different numbers for the "worker" count in the parallel computing settings, for example 2, 4, 8, and so on.
However, no matter what I set, the average CPU usage by MATLAB is exactly 25% of the total, and none of the cores runs at 100% (all are around 10-30%). I am using MATLAB to run an optimization problem, so I really want my quad-core computer to use all of its power for the computation. Please help.
Setting the number of workers (up to 4 on a quad-core machine) is not enough. You also need a construct like parfor to tell MATLAB which part of the calculation should be distributed among the workers, e.g. by replacing the outer for loop over independent tasks with parfor.
I am curious what kind of optimization you're running. Normally, optimization problems are very difficult to parallelize, since the result of every iteration depends on the previous one. However, if you want to, e.g., fit multiple models to the data, or if you have multiple data sets to fit, then you can easily run these in parallel rather than sequentially.
Note that having many cores may not be sufficient in terms of other resources: if performing the optimization on one worker uses k GB of RAM, performing it on n workers requires at least n*k GB of RAM.

Is it possible to run several map tasks in one JVM?

I want to share large in-memory static data (a RAM-resident Lucene index) across my map tasks in Hadoop. Is there a way for several map/reduce tasks to share the same JVM?
Jobs can enable task JVMs to be reused by setting the configuration property mapred.job.reuse.jvm.num.tasks. If the value is 1 (the default), JVMs are not reused (i.e., 1 task per JVM). If it is -1, there is no limit to the number of tasks a JVM can run (of the same job). A value greater than 1 can also be set through the API (JobConf#setNumTasksToExecutePerJvm).
In $HADOOP_HOME/conf/mapred-site.xml, add the following property:
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>#</value>
</property>
The # can be set to a number to specify how many times the JVM is to be reused (default is 1), or set to -1 for no limit on the reuse amount.
Shameless plug
I go over using static objects with JVM reuse to accomplish what you describe here:
http://chasebradford.wordpress.com/2011/02/05/distributed-cache-static-objects-and-fast-setup/
Another option, though more complicated, is to use the distributed cache with a read-only memory-mapped file. That way you can share the resource across JVM processes as well.
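To illustrate the underlying mechanism (not Hadoop-specific; a minimal Python sketch with a hypothetical file name): mapping a file read-only lets the OS share one copy of its pages among all processes that map it.

import mmap

# Map a prebuilt index file read-only. Every process that maps the
# same file shares the same physical pages via the OS page cache,
# so the data is held in RAM only once regardless of process count.
with open('lucene_index.bin', 'rb') as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    header = mm[:16]  # slices read pages on demand, without copying the file
    mm.close()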
To the best of my knowledge, there is no easy way for multiple map tasks in Hadoop to share static data structures.
This is actually a known limitation of the current MapReduce model. The reason the current implementation doesn't share static data across map tasks is that Hadoop is designed to be highly reliable: if a task fails, it crashes only its own JVM and does not impact the execution of the other JVMs.
I am currently working on a prototype that distributes the work of a single JVM across multiple cores (essentially, you then need only one JVM to utilize all the cores). This way, you can avoid duplicating in-memory data structures without sacrificing CPU utilization. The next step for me is to develop a version of Hadoop that can run multiple map tasks within one JVM, which is exactly what you are asking for.
There is an interesting post here
https://issues.apache.org/jira/browse/MAPREDUCE-2123