First off, I am running this on a dual-core 2.66 GHz machine. I am not sure if I have the .AsParallel() call in the correct spot; I tried it directly on the range variable too, and that was still slower. I don't understand why...
Here are my results:
Process non-parallel 1000 took 146 milliseconds
Process parallel 1000 took 156 milliseconds
Process non-parallel 5000 took 5187 milliseconds
Process parallel 5000 took 5300 milliseconds
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace DemoConsoleApp
{
    internal class Program
    {
        private static void Main()
        {
            ReportOnTimedProcess(
                () => GetIntegerCombinations(),
                "non-parallel 1000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(runAsParallel: true),
                "parallel 1000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(5000),
                "non-parallel 5000");
            ReportOnTimedProcess(
                () => GetIntegerCombinations(5000, true),
                "parallel 5000");

            Console.Read();
        }

        private static List<Tuple<int, int>> GetIntegerCombinations(
            int iterationCount = 1000, bool runAsParallel = false)
        {
            IEnumerable<int> range = Enumerable.Range(1, iterationCount);

            IEnumerable<Tuple<int, int>> integerCombinations =
                from x in range
                from y in range
                select new Tuple<int, int>(x, y);

            return runAsParallel
                ? integerCombinations.AsParallel().ToList()
                : integerCombinations.ToList();
        }

        private static void ReportOnTimedProcess(
            Action process, string processName)
        {
            var stopwatch = new Stopwatch();

            stopwatch.Start();
            process();
            stopwatch.Stop();

            Console.WriteLine("Process {0} took {1} milliseconds",
                processName, stopwatch.ElapsedMilliseconds);
        }
    }
}
It's slightly slower because PLINQ has a certain overhead (threads, scheduling, etc.), so you have to pick carefully what you parallelize. This particular code isn't really worth parallelizing: you have to parallelize tasks with a significant load, otherwise the overhead will weigh more than the benefits of parallelization.
The majority of your execution time here is likely going to be in actually creating the list, via the ToList() method. This will have to perform several memory allocations, resizing the list and so on. You're also not gaining much of a benefit from parallelizing here because the final operation has to be synchronized (you're building a single list on the output).
Try doing something significantly more complex/expensive in the parallel segment, like prime factorization, and increase the number of iterations into the hundreds of thousands (5000 is a very small number to use when profiling). You should start to see a difference then.
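For example, here is a sketch of what that might look like if dropped into the Program class above (IsPrime and CountPrimes are my own illustrative names, not part of the question's code):

private static bool IsPrime(int candidate)
{
    if (candidate < 2) return false;
    for (int divisor = 2; divisor * divisor <= candidate; divisor++)
        if (candidate % divisor == 0) return false;
    return true;
}

private static void CountPrimes(int max, bool runAsParallel)
{
    IEnumerable<int> range = Enumerable.Range(2, max);

    // Note: AsParallel() is applied to the source, so the expensive
    // IsPrime work itself is partitioned across the cores.
    int count = runAsParallel
        ? range.AsParallel().Count(IsPrime)
        : range.Count(IsPrime);

    Console.WriteLine("Found {0} primes", count);
}

Timed via the same ReportOnTimedProcess helper with max in the hundreds of thousands, the parallel version should pull clearly ahead of the sequential one.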
Also make sure that you're profiling in release mode; all too often I see attempts to profile in debug mode, and the results from that will not be accurate.
I have a large txt file with 100,000 lines.
I need to start an n-count of threads and give every thread a unique line from this file.
What is the best way to do this? I think I need to read the file line by line, and the iterator must be global so I can lock on it. Loading the whole text file into a list would be time-consuming, and I could get an OutOfMemoryException. Any ideas?
You can use the File.ReadLines method to read the file line by line without loading the whole file into memory at once, and the Parallel.ForEach method to process the lines in multiple threads in parallel:
Parallel.ForEach(File.ReadLines("file.txt"), (line, _, lineNumber) =>
{
// your code here
});
After performing my own benchmarks loading 61,277,203 lines into memory and shoving values into a Dictionary / ConcurrentDictionary, the results seem to support #dtb's answer above that the following approach is the fastest:
Parallel.ForEach(File.ReadLines(catalogPath), line =>
{
});
My tests also showed the following:
File.ReadAllLines() and File.ReadAllLines().AsParallel() appear to run at almost exactly the same speed on a file of this size. Looking at my CPU activity, both seem to use only two of my 8 cores.
Reading all the data first using File.ReadAllLines() appears to be much slower than using File.ReadLines() in a Parallel.ForEach() loop.
I also tried a producer / consumer or MapReduce style pattern where one thread was used to read the data and a second thread was used to process it. This also did not seem to outperform the simple pattern above.
I have included an example of this pattern for reference, since it is not included on this page:
var inputLines = new BlockingCollection<string>();
ConcurrentDictionary<int, int> catalog = new ConcurrentDictionary<int, int>();

// Producer: a single task reads the file and feeds the queue.
var readLines = Task.Factory.StartNew(() =>
{
    foreach (var line in File.ReadLines(catalogPath))
        inputLines.Add(line);

    inputLines.CompleteAdding();
});

// Consumer: parse and store the lines in parallel as they arrive.
var processLines = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(inputLines.GetConsumingEnumerable(), line =>
    {
        string[] lineFields = line.Split('\t');
        int genomicId = int.Parse(lineFields[3]);
        int taxId = int.Parse(lineFields[0]);
        catalog.TryAdd(genomicId, taxId);
    });
});

Task.WaitAll(readLines, processLines);
Here are my benchmarks:
I suspect that under certain processing conditions, the producer / consumer pattern might outperform the simple Parallel.ForEach(File.ReadLines()) pattern. However, it did not in this situation.
Read the file on one thread, adding its lines to a blocking queue. Start N tasks reading from that queue. Set a maximum size on the queue to prevent out-of-memory errors (a bounded sketch follows the example below).
Something like:
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public class ParallelReadExample
{
    public static IEnumerable<string> LineGenerator(StreamReader sr)
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            yield return line;
        }
    }

    static void Main()
    {
        using (StreamReader sr = new StreamReader("yourfile.txt"))
        {
            Parallel.ForEach(LineGenerator(sr), currentLine =>
            {
                // Do your thing with currentLine here...
            }); // close lambda expression
        }
    }
}
I think it would work. (No C# compiler/IDE here.)
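The bounded-queue variant described above might look something like this (my own untested sketch; the capacity of 10000 and the four workers are arbitrary choices):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public class BoundedReadExample
{
    public static void Main()
    {
        // Bounded capacity makes Add() block when the queue is full,
        // so the reader can't race ahead of the workers and exhaust memory.
        var queue = new BlockingCollection<string>(boundedCapacity: 10000);

        var reader = Task.Run(() =>
        {
            foreach (var line in File.ReadLines("yourfile.txt"))
                queue.Add(line);
            queue.CompleteAdding();
        });

        // N consumer tasks, each pulling unique lines from the queue.
        var workers = Enumerable.Range(0, 4)
            .Select(_ => Task.Run(() =>
            {
                foreach (var line in queue.GetConsumingEnumerable())
                {
                    // Do your thing with line here...
                }
            }))
            .ToArray();

        Task.WaitAll(workers);
        reader.Wait();
    }
}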
If you want to limit the number of threads to n, the easiest way is to use AsParallel() along with WithDegreeOfParallelism(n):
string filename = "C:\\TEST\\TEST.DATA";
int n = 5;
File.ReadLines(filename).AsParallel().WithDegreeOfParallelism(n).ForAll(line =>
{
    // Process line.
});
Note that ForAll is used rather than a plain foreach: enumerating a parallel query with foreach merges the results back onto the calling thread, so the loop body itself would run sequentially.
As #dtb mentioned above, the fastest way to read a file and then process its individual lines is to:
1) Do a File.ReadAllLines() into an array
2) Use a Parallel.For loop to iterate over the array
You can read more performance benchmarks here.
The basic gist of the code you would have to write is:
string[] AllLines = File.ReadAllLines(fileName);

Parallel.For(0, AllLines.Length, x =>
{
    DoStuff(AllLines[x]);
    // whatever you need to do
});
With the bigger array sizes available in newer versions of .NET (arrays over 2 GB require .NET 4.5's <gcAllowVeryLargeObjects> setting), as long as you have plenty of memory, this shouldn't be an issue.
I have a method called com.acmesoftware.shared.AbstractDerrivedBean.getDerivedUniqueId(). When I profile the application with JProfiler, this method, getDerivedUniqueId(), is buried 80 methods deep, as expected. The method is invoked on behalf of every bean in the application. I'm trying to record the CPU call tree starting with this method down to a leaf node (i.e., one of the excluded classes).
I tried the following, but it didn't produce the expected outcome:
Find a method above the method targeted for profiling, e.g., markForDeletion().
Set a trigger to start recording at getDerivedUniqueId().
Set a trigger to STOP recording at markForDeletion().
I was expecting to see only everything below markForDeletion(), but instead I saw everything up to, but not INCLUDING, getDerivedUniqueId(), which is the opposite of my intended goal. Worse yet, even with 5 ms sampling, this trigger increased the running time from 10 minutes to "I terminated it after 3 hours of running". The trigger seems to add a giant amount of overhead on top of the profiling overhead, so even if I figure out how to enable it correctly, the added overhead would seem to render it ineffective.
The reason I need to limit the recording to just this method: when running in 5 ms sampling mode, the application completes in 10 minutes. With full instrumentation, I've waited 3 hours and it still hasn't completed. Hence, I need to turn on full instrumentation ONLY after getDerivedUniqueId() is invoked and pause profiling when getDerivedUniqueId() exits.
-- Updated/Edit:
Thank you Ingo Kegel for your assistance.
I am likely not clear on how to use triggers. In the code below, I set the triggers as shown after the code. My expectation is that when I profile the application (both sampling and full instrumentation) with the triggers configured below, then if the boolean isCollectMetrics is false, I should see 100% (or 99.9%) of CPU time in filtered classes. However, that is not the case: the CPU tree seems not to take the triggers into account.
Secondly, when isCollectMetrics is true, I expect the JProfiler call tree to start with startProfiling() and end at stopProfiling(). Again, this is not the case.
The method contains() is the bottleneck. It eventually calls one of 150 getDerivedUniqueId() implementations. I am trying to pinpoint which getDerivedUniqueId() is causing the performance degradation.
public static final AtomicLong doEqualContentTime = new AtomicLong();
public static final AtomicLong instCount = new AtomicLong();

protected boolean contentsEqual(final InstanceSetValue that) {
    if (isCollectMetrics) {
        // initialization code removed for clarity
        // ..........
        // ..........
        final Set<Instance> c1 = getReferences();
        final Set<Instance> c2 = that.getReferences();
        long st = startProfiling(); /// <------- start here
        for (final Instance inst : c1) {
            instCount.incrementAndGet();
            if (!c2.contains(inst)) {
                long et = stopProfiling(); /// <------- stop here
                doEqualContentTime.addAndGet(et - st);
                return false;
            }
        }
        long et = stopProfiling(); /// <------- stop here
        doEqualContentTime.addAndGet(et - st);
        return true;
    } else {
        // same code path as above but without the profiling; code removed for brevity
        // ......
        // ......
        return true;
    }
}

public long startProfiling() {
    return System.nanoTime();
}

public long stopProfiling() {
    return System.nanoTime();
}

public static void reset() {
    doEqualContentTime.set(0);
    instCount.set(0);
}
The enabled triggers:
startProfiling trigger:
stopProfiling trigger:
I've tried the 'Start Recordings' and 'Record CPU' buttons separately to capture just the call tree.
If the overhead with instrumentation is large, you should refine your filters. With good filters, the instrumentation overhead can be very small.
As for the trigger setup, the correct actions are:
"Start recording" with CPU data selected
"Wait for the event to finish"
"Stop recording" with CPU data selected
I have this code to add an object and an index field in StackExchange.Redis.
All methods in the transaction freeze the thread. Why?
var transaction = Database.CreateTransaction();
// this line freezes the thread. WHY?
await transaction.StringSetAsync(KeyProvider.GetForID(obj.ID), PreSaveObject(obj));
await transaction.HashSetAsync(emailKey, new[] { new HashEntry(obj.Email, Convert.ToString(obj.ID)) });
return await transaction.ExecuteAsync();
Commands executed inside a transaction do not return results until after you execute the transaction. This is simply a feature of how transactions work in Redis. At the moment you are awaiting something that hasn't even been sent yet (transactions are buffered locally until executed), but even if it had been sent, results simply aren't available until the transaction completes.
If you want the result, you should store (not await) the task, and await it after the execute:
var fooTask = tran.SomeCommandAsync(...);
if (await tran.ExecuteAsync())
{
    var foo = await fooTask;
}
Note that this is cheaper than it looks: when the transaction executes, the nested tasks get their results at the same time - and await handles that scenario efficiently.
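Applied to the code in the question, the pattern might look like this (a sketch; KeyProvider, PreSaveObject, emailKey and obj all come from the question):

var transaction = Database.CreateTransaction();

// Store the tasks instead of awaiting them...
var setTask = transaction.StringSetAsync(KeyProvider.GetForID(obj.ID), PreSaveObject(obj));
var hashTask = transaction.HashSetAsync(emailKey, new[] { new HashEntry(obj.Email, Convert.ToString(obj.ID)) });

// ...execute the transaction...
bool committed = await transaction.ExecuteAsync();

// ...and only then await the buffered commands (they complete together).
if (committed)
{
    await setTask;
    await hashTask;
}
return committed;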
Marc's answer works, but in my case it caused a decent amount of code bloat (and it's easy to forget to do it this way), so I came up with an abstraction that sort of enforces the pattern.
Here's how you use it:
await db.TransactAsync(commands => commands
    .Enqueue(tran => tran.SomeCommandAsync(...))
    .Enqueue(tran => tran.SomeCommandAsync(...))
    .Enqueue(tran => tran.SomeCommandAsync(...)));
Here's the implementation:
public static class RedisExtensions
{
    public static async Task TransactAsync(this IDatabase db, Action<RedisCommandQueue> addCommands)
    {
        var tran = db.CreateTransaction();
        var q = new RedisCommandQueue(tran);
        addCommands(q);
        if (await tran.ExecuteAsync())
            await q.CompleteAsync();
    }
}

public class RedisCommandQueue
{
    private readonly ITransaction _tran;
    private readonly IList<Task> _tasks = new List<Task>();

    internal RedisCommandQueue(ITransaction tran) => _tran = tran;

    public RedisCommandQueue Enqueue(Func<ITransaction, Task> cmd)
    {
        _tasks.Add(cmd(_tran));
        return this;
    }

    internal Task CompleteAsync() => Task.WhenAll(_tasks);
}
One caveat: This doesn't provide an easy way to get at the result of any of the commands. In my case (and the OP's) that's ok - I'm always using transactions for a series of writes. I found this really helped trim down my code, and by only exposing tran inside Enqueue (which requires you to return a Task), I'm less likely to "forget" that I shouldn't be awaiting those commands at the time I call them.
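If you do need a result back from one of the commands, one possible extension (my own untested sketch, not part of the abstraction above) is a generic Enqueue on RedisCommandQueue that hands the Task<T> back to the caller, to be awaited only after TransactAsync has executed the transaction:

public Task<T> Enqueue<T>(Func<ITransaction, Task<T>> cmd)
{
    // The caller stores this task and awaits it after the transaction executes.
    var task = cmd(_tran);
    _tasks.Add(task);
    return task;
}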
My team and I were bitten by this issue several times, so I created a simple Roslyn analyzer to spot such problems.
https://github.com/olsh/stack-exchange-redis-analyzer
I've recently started working on a new application which will utilize task parallelism. I have just begun writing a tasking framework, but have recently seen a number of posts on SO regarding the new System.Threading.Tasks namespace which may be useful to me (and I would rather use an existing framework than roll my own).
However looking over MSDN I haven't seen how / if, I can implement the functionality which I'm looking for:
Dependency on other tasks completing.
Able to wait on an unknown number of tasks performing the same action (maybe wrapped in the same task object which is invoked multiple times)
Set maximum concurrent instances of a task; since they use a shared resource, there is no point running more than one at once
Hint at priority, or have the scheduler place tasks with lower maximum concurrent instances at a higher priority (so as to keep said resource in use as much as possible)
Edit: the ability to vary the priority of tasks which are performing the same action (a pretty poor example, but PredictWeather(Tomorrow) would have a higher priority than PredictWeather(NextWeek))
Can someone point me towards an example / tell me how I can achieve this? Cheers.
C# Use Case (typed in SO, so please forgive any syntax errors / typos):
*Note: Do() / DoAfter() shouldn't block the calling thread.*
class Application
{
    Task LeafTask = new Task(LeafWork) { PriorityHint = High, MaxConcurrent = 1 };
    var Tree = new TaskTree(LeafTask);
    Task TraverseTask = new Task(Tree.Traverse);
    Task WorkTask = new Task(MoreWork);
    Task RunTask = new Task(Run);
    Object SharedLeafWorkObject = new Object();

    void Entry()
    {
        RunTask.Do();
        RunTask.Join(); // Use this thread for task processing until all invocations of RunTask are complete
    }

    void Run()
    {
        TraverseTask.Do();
        // Wait for TraverseTask to make sure all leaf tasks are invoked before waiting on them
        WorkTask.DoAfter(new[] { TraverseTask, LeafTask });
        if (running)
        {
            RunTask.DoAfter(WorkTask); // Keep at least one RunTask alive to prevent Join from 'unblocking'
        }
        else
        {
            TraverseTask.Join();
            WorkTask.Join();
        }
    }

    void LeafWork(Object leaf)
    {
        lock (SharedLeafWorkObject) // Fake a shared resource
        {
            Thread.Sleep(200); // 'work'
        }
    }

    void MoreWork()
    {
        Thread.Sleep(2000); // this one takes a while
    }
}

class TaskTreeNode<TItem>
{
    Task LeafTask; // = Application::LeafTask
    TItem Item;

    void Traverse()
    {
        if (isLeaf)
        {
            // LeafTask set in C-Tor or elsewhere
            LeafTask.Do(this.Item);
            // Edit:
            //LeafTask.Do(this.Item, this.Depth); // Deeper items get higher priority
            return;
        }
        foreach (var child in this.children)
        {
            child.Traverse();
        }
    }
}
There are numerous examples here:
http://code.msdn.microsoft.com/ParExtSamples
There's a great white paper which covers a lot of the details you mention above here:
"Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4"
http://www.microsoft.com/downloads/details.aspx?FamilyID=86b3d32b-ad26-4bb8-a3ae-c1637026c3ee&displaylang=en
Off the top of my head I think you can do all the things you list in your question.
Dependencies etc: Task.WaitAll(Task[] tasks)
Scheduler: The library supports numerous options for limiting the number of threads in use, and it supports providing your own scheduler. I would avoid altering the priority of threads if at all possible; that is likely to have a negative impact on the scheduler, unless you provide your own.
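As a rough sketch of how a few of those requirements might map onto System.Threading.Tasks (LeafWork and MoreWork echo the question's use case; the SemaphoreSlim capping concurrent instances is my own addition, since a limited-concurrency TaskScheduler would be the other common option):

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class TplSketch
{
    // Max concurrent instances: at most one LeafWork runs at a time,
    // since it guards a shared resource.
    static readonly SemaphoreSlim LeafGate = new SemaphoreSlim(1, 1);

    static void LeafWork(object leaf)
    {
        LeafGate.Wait();
        try { Thread.Sleep(200); /* 'work' */ }
        finally { LeafGate.Release(); }
    }

    static void MoreWork()
    {
        Thread.Sleep(2000); // this one takes a while
    }

    static void Main()
    {
        // An unknown number of tasks performing the same action.
        var leafTasks = Enumerable.Range(0, 10)
            .Select(i => Task.Factory.StartNew(LeafWork, i))
            .ToArray();

        // Dependency: MoreWork starts only after every leaf task completes.
        var workTask = Task.Factory.ContinueWhenAll(leafTasks, _ => MoreWork());

        workTask.Wait();
    }
}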
I'm trying to create a custom performance counter in C# based on a per-minute rate.
So far, I've seen only RateOfCountsPerSecond32 and RateOfCountsPerSecond64 available.
Does anybody know what the options are for creating a custom counter based on a per-minute rate?
This won't be directly supported. You'll have to compute the rate per minute yourself, and then use a NumberOfItems32 or NumberOfItems64 counter to display the rate. Using a helpful name like "Count / minute" will make it clear what the value is. You'll just update the counter every minute; a background (worker) thread would be a good place to do that.
Alternately, you can just depend upon the monitoring software. Use a NumberOfItems32/64 counter, but have the monitoring software do the per-minute computation. The PerfMon tool built into Windows doesn't do this, but there's no reason it couldn't.
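A minimal sketch of the first approach might look like this (assuming the category and a NumberOfItems32 counter named "Count / minute" have already been created; all names here are illustrative):

using System;
using System.Diagnostics;
using System.Threading;

class PerMinutePublisher
{
    static long _count; // incremented by the application as events occur

    public static void Hit()
    {
        Interlocked.Increment(ref _count);
    }

    static void Main()
    {
        var counter = new PerformanceCounter("MyCategory", "Count / minute", readOnly: false);

        // Background timer: once a minute, publish the count and reset it.
        var timer = new Timer(_ =>
        {
            counter.RawValue = Interlocked.Exchange(ref _count, 0);
        }, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));

        Console.ReadLine(); // the application does its work here
        GC.KeepAlive(timer);
    }
}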
By default, PerfMon pulls data every second. To get a steady picture in the Windows Performance Monitor chart, I wrote a custom counter that measures the rate of counts per minute.
After it has been running for one minute, I start receiving data from my counter.
Note that accuracy isn't important for me.
The code snippet looks like this:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

class PerMinExample
{
    private static PerformanceCounter _pcPerSec;
    private static PerformanceCounter _pcPerMin;
    private static Timer _timer = new Timer(CallBack, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
    private static Queue<CounterSample> _queue = new Queue<CounterSample>();

    static PerMinExample()
    {
        // RateOfCountsPerSecond32
        _pcPerSec = new PerformanceCounter("Category", "ORDERS PER SECOND", false);
        // NumberOfItems32
        _pcPerMin = new PerformanceCounter("Category", "ORDERS PER MINUTE", false);
        _pcPerSec.RawValue = 0;
        _pcPerMin.RawValue = 0;
    }

    public void CountSomething()
    {
        _pcPerSec.Increment();
    }

    private static void CallBack(Object o)
    {
        CounterSample sample = _pcPerSec.NextSample();
        _queue.Enqueue(sample);

        // Keep a sliding window of the last 60 one-second samples.
        if (_queue.Count <= 60)
            return;

        CounterSample prev = _queue.Dequeue();

        // Counts in the window divided by the window length in minutes.
        Single numerator = (Single)sample.RawValue - (Single)prev.RawValue;
        Single denominator =
            (Single)(sample.TimeStamp - prev.TimeStamp)
            / (Single)(sample.SystemFrequency) / 60;

        Single counterValue = numerator / denominator;
        _pcPerMin.RawValue = (Int32)Math.Ceiling(counterValue);

        Console.WriteLine("ORDERS PER SEC: {0}", _pcPerSec.NextValue());
        Console.WriteLine("ORDERS PER MINUTE: {0}", _pcPerMin.NextValue());
    }
}
}