Chronicle Queue performance when using ByteBuffer - chronicle-queue

I'm using Chronicle Queue as a data store that is written once but read many times. I'm trying to get the best read performance (time to read x records).
My test data set is about 3 million records, where each record consists of a bunch of longs and doubles. I started with the "highest-level" API, which was obviously slow, then tried "self-describing" data as mentioned in the Chronicle documentation, and finally "raw data", which gave the best performance.
The code is below (the corresponding write() code is omitted for brevity):
public List<DomainObject> read()
{
final ExcerptTailer tailer = _cq.createTailer();
List<DomainObject> result = new ArrayList<>();
for (; ; ) {
try (final DocumentContext ctx = tailer.readingDocument()) {
Wire wire = ctx.wire();
if(wire != null) {
wire.readBytes(in -> {
final long var1= in.readLong();
final int var2= in.readInt();
final double var3= in.readDouble();
final int var4= in.readInt();
final double var5= in.readDouble();
final int var6= in.readInt();
final double var7= in.readDouble();
result.add(DomainObject.create(var1, var2, var3, var4, var5, var6, var7));
});
}else{
return result;
}
}
}
}
However, to improve my application's performance, I started using ByteBuffer instead of a "DomainObject" and modified my read method as below:
public List<ByteBuffer> read()
{
final ExcerptTailer tailer = _cq.createTailer();
List<ByteBuffer> result = new ArrayList<>();
for (; ; ) {
try (final DocumentContext ctx = tailer.readingDocument()) {
Wire wire = ctx.wire();
if(wire != null) {
ByteBuffer bb = ByteBuffer.allocate(56);
wire.readBytes(in -> {
in.read(bb); });
result.add(bb);
}else{
return result;
}
}
}
}
The above listing took an average of 550 ms versus 270 ms for the first listing.
I also tried using Bytes.elasticByteBuffer as mentioned in this post, but it was way slower.
I'm guessing the second listing is slower because it has to loop through the entire byte array.
So my question is: is there a more performant way to read bytes from Chronicle Queue into a ByteBuffer? My data will always be 56 bytes, with 8 bytes for each data item.

I suggest you use Chronicle-Bytes instead of raw ByteBuffer. Chronicle's Bytes class is a wrapper on top of ByteBuffer but much easier to use.
The problem with your code is you create a bunch of objects instead of stream-processing. I suggest you read with something like:
public void read(Consumer<Bytes> consumer) {
final ExcerptTailer tailer = _cq.createTailer();
for (; ; ) {
try (final DocumentContext ctx = tailer.readingDocument()) {
if (ctx.isPresent()) {
consumer.accept(ctx.wire().bytes());
} else {
break;
}
}
}
}
And your writing method could look like:
public void write(BytesMarshallable o) {
try (DocumentContext dc = _cq.acquireAppender().writingDocument()) {
o.writeMarshallable(dc.wire().bytes());
}
}
And then your consumer could be like:
private BytesMarshallable reusable = new BusinessObject(); //your class here
public void accept(Bytes b) {
reusable.readMarshallable(b);
// your business logic here
doSomething(reusable);
}
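The allocation point above can be illustrated without Chronicle at all. The following is a minimal sketch using only java.nio, assuming the 56-byte record (seven 8-byte fields) described in the question; the names PackedRecords and RecordConsumer are hypothetical and not part of any Chronicle API. All records live in one buffer, and the consumer sees each record through a single reusable view instead of receiving its own ByteBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration: one backing buffer, one reusable view, no per-record objects.
final class PackedRecords {
    static final int RECORD_SIZE = 56; // 7 fields x 8 bytes, as stated in the question

    interface RecordConsumer {
        void accept(ByteBuffer record); // the view is only valid during the call
    }

    private final ByteBuffer data;
    private int count;

    PackedRecords(int capacity) {
        data = ByteBuffer.allocate(capacity * RECORD_SIZE);
    }

    void append(long id, double a, double b, double c, double d, double e, double f) {
        data.putLong(id).putDouble(a).putDouble(b).putDouble(c)
            .putDouble(d).putDouble(e).putDouble(f);
        count++;
    }

    void forEach(RecordConsumer consumer) {
        // duplicate() shares the bytes but has its own position and limit
        ByteBuffer view = data.duplicate();
        for (int i = 0; i < count; i++) {
            view.limit((i + 1) * RECORD_SIZE);
            view.position(i * RECORD_SIZE);
            consumer.accept(view);
        }
    }

    // Example consumer: sums the first double of every record (offset 8 within the record).
    static double sumOfFirstDouble(PackedRecords records) {
        double[] sum = {0.0};
        records.forEach(r -> sum[0] += r.getDouble(r.position() + 8));
        return sum[0];
    }
}
```

A consumer that needs to keep a record beyond the callback has to copy it out, which is the same trade-off as with the reusable BytesMarshallable above.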

How to set variable inside of a Coroutine after yielding a webrequest

Okay, I will try to explain this to the best of my ability. I have searched all day for a solution to this issue but can't seem to find one. I have a list of scriptable objects that I use as custom properties to create GameObjects from. One of those properties is a Texture2D that I turn into a sprite. Therefore, I am using UnityWebRequest in a coroutine and have to yield on the response. After I get the response I try to set the variable. However, even with lambdas, it seems that if I yield return the request before reading the result, the variable is not set. So every time I check the variable after the coroutine, it comes back null. If someone could enlighten me as to what I am missing here, that would be great!
Here is the Scriptable Object Class I am using.
[CreateAssetMenu(fileName = "new movie",menuName = "movie")]
public class MovieTemplate : ScriptableObject
{
public string Title;
public string Description;
public string ImgURL;
public string mainURL;
public string secondaryURL;
public Sprite thumbnail;
}
Here is the call to the Coroutine
foreach (var item in nodes)
{
templates.Add(GetMovieData(item));
}
foreach (MovieTemplate movie in templates)
{
StartCoroutine(GetMovieImage(movie.ImgURL, result =>
{
movie.thumbnail = result;
}));
}
Here is the Coroutine itself
IEnumerator GetMovieImage(string url, System.Action<Sprite> result)
{
using (UnityWebRequest web = UnityWebRequestTexture.GetTexture(url))
{
yield return web.SendWebRequest();
var img = DownloadHandlerTexture.GetContent(web);
result(Sprite.Create(img, new Rect(0, 0, img.width, img.height), Vector2.zero));
}
}
From what you describe, it still seems that the texture is somehow disposed as soon as the routine finishes. My guess would be that it happens due to the using block.
I would store the original texture reference
[CreateAssetMenu(fileName = "new movie",menuName = "movie")]
public class MovieTemplate : ScriptableObject
{
public string Title;
public string Description;
public string ImgURL;
public string mainURL;
public string secondaryURL;
public Sprite thumbnail;
public Texture texture;
public void SetSprite(Texture newTexture)
{
if(texture) Destroy(texture);
texture = newTexture;
var tex = (Texture2D) texture;
thumbnail = Sprite.Create(tex, new Rect(0, 0, tex.width, tex.height), Vector2.zero);
}
}
This way you keep track of the texture itself as well: it is not collected by the GC, and you can destroy it when it is no longer needed. Usually a Texture2D is removed by the GC as soon as there is no reference to it anymore, but a Texture2D created by UnityWebRequest might behave differently.
Then, in the web request, return the texture and don't use using
IEnumerator GetMovieImage(string url, System.Action<Texture> result)
{
UnityWebRequest web = UnityWebRequestTexture.GetTexture(url);
yield return web.SendWebRequest();
if (string.IsNullOrEmpty(web.error))
{
result?.Invoke(DownloadHandlerTexture.GetContent(web));
}
else
{
Debug.LogErrorFormat(this, "Download error: {0} - {1}", web.responseCode, web.error);
}
}
and finally use it like
for (int i = 0; i < templates.Count; i++)
{
int index = i; // copy i: the lambda would otherwise capture the shared loop variable
StartCoroutine(
GetMovieImage(
templates[index].ImgURL,
result =>
{
templates[index].SetSprite(result);
})
);
}
The problem is with this section of your code :
foreach (MovieTemplate movie in templates)
{
StartCoroutine(GetMovieImage(movie.ImgURL, result =>
{
movie.thumbnail = result;//wrong movie obj
}));
}
Here you lose the reference to the movie object (it is overwritten by foreach) before the callback's result arrives.
Change it to something like this:
for (int i = 0; i < templates.Count; i++)
{
int index = i; // copy i, because i itself is overwritten on each iteration
StartCoroutine(GetMovieImage(templates[index].ImgURL, result =>
{
templates[index].thumbnail = result;
}));
}

My Akka.Net Demo is incredibly slow

I am trying to get a proof of concept running with akka.net. I am sure that I am doing something terribly wrong, but I can't figure out what it is.
I want my actors to form a graph of nodes. Later, this will be a complex graph of business objects, but for now I want to try a simple linear structure like this:
I want to ask a node for a neighbour that is 9 steps away. I am trying to implement this in a recursive manner. I ask node #9 for a neighbour that is 9 steps away, then I ask node #8 for a neighbour that is 8 steps away and so on. Finally, this should return node #0 as an answer.
Well, my code works, but it takes more than 4 seconds to execute. Why is that?
This is my full code listing:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using Akka;
using Akka.Actor;
namespace AkkaTest
{
class Program
{
public static Stopwatch stopwatch = new Stopwatch();
static void Main(string[] args)
{
var system = ActorSystem.Create("MySystem");
IActorRef[] current = new IActorRef[0];
Console.WriteLine("Initializing actors...");
for (int i = 0; i < 10; i++)
{
var current1 = current;
var props = Props.Create<Obj>(() => new Obj(current1, Guid.NewGuid()));
var actorRef = system.ActorOf(props, i.ToString());
current = new[] { actorRef };
}
Console.WriteLine("actors initialized.");
FindNeighboursRequest r = new FindNeighboursRequest(9);
stopwatch.Start();
var response = current[0].Ask(r);
FindNeighboursResponse result = (FindNeighboursResponse)response.Result;
stopwatch.Stop();
foreach (var d in result.FoundNeighbours)
{
Console.WriteLine(d);
}
Console.WriteLine("Search took " + stopwatch.ElapsedMilliseconds + "ms.");
Console.ReadLine();
}
}
public class FindNeighboursRequest
{
public FindNeighboursRequest(int distance)
{
this.Distance = distance;
}
public int Distance { get; private set; }
}
public class FindNeighboursResponse
{
private IActorRef[] foundNeighbours;
public FindNeighboursResponse(IEnumerable<IActorRef> descendants)
{
this.foundNeighbours = descendants.ToArray();
}
public IActorRef[] FoundNeighbours
{
get { return this.foundNeighbours; }
}
}
public class Obj : ReceiveActor
{
private Guid objGuid;
readonly List<IActorRef> neighbours = new List<IActorRef>();
public Obj(IEnumerable<IActorRef> otherObjs, Guid objGuid)
{
this.neighbours.AddRange(otherObjs);
this.objGuid = objGuid;
Receive<FindNeighboursRequest>(r => handleFindNeighbourRequest(r));
}
public Obj()
{
}
private async void handleFindNeighbourRequest (FindNeighboursRequest r)
{
if (r.Distance == 0)
{
FindNeighboursResponse response = new FindNeighboursResponse(new IActorRef[] { Self });
Sender.Tell(response, Self);
return;
}
List<FindNeighboursResponse> responses = new List<FindNeighboursResponse>();
foreach (var actorRef in neighbours)
{
FindNeighboursRequest req = new FindNeighboursRequest(r.Distance - 1);
var response2 = actorRef.Ask(req);
responses.Add((FindNeighboursResponse)response2.Result);
}
FindNeighboursResponse response3 = new FindNeighboursResponse(responses.SelectMany(rx => rx.FoundNeighbours));
Sender.Tell(response3, Self);
}
}
}
The reason for such slow behavior is the way you use Ask (and that you use it at all, but I'll cover this later). In your example, you're asking each neighbor in a loop and then immediately reading response2.Result, which actively blocks the current actor (and the thread it resides on). So you're essentially building a synchronous flow with blocking.
The easiest thing to fix that, is to collect all tasks returned from Ask and use Task.WhenAll to collect them all, without waiting for each one in a loop. Taking this example:
public class Obj : ReceiveActor
{
private readonly IActorRef[] _neighbours;
private readonly Guid _id;
public Obj(IActorRef[] neighbours, Guid id)
{
_neighbours = neighbours;
_id = id;
Receive<FindNeighboursRequest>(async r =>
{
if (r.Distance == 0) Sender.Tell(new FindNeighboursResponse(new[] {Self}));
else
{
var request = new FindNeighboursRequest(r.Distance - 1);
var replies = _neighbours.Select(neighbour => neighbour.Ask<FindNeighboursResponse>(request));
var ready = await Task.WhenAll(replies);
var responses = ready.SelectMany(x => x.FoundNeighbours);
Sender.Tell(new FindNeighboursResponse(responses.ToArray()));
}
});
}
}
This one is much faster.
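The same idea can be sketched in Java, with CompletableFuture standing in for Task and CompletableFuture.allOf standing in for Task.WhenAll. This is only an analogue under those assumptions, not Akka code; FanOut and askAll are hypothetical names, and the ask function represents whatever returns a future reply. Every request is started first, and the results are combined without blocking on each one in turn.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical analogue of "collect the Ask tasks, then Task.WhenAll".
final class FanOut {
    static <K, V> CompletableFuture<List<V>> askAll(
            List<K> targets, Function<K, CompletableFuture<V>> ask) {
        List<CompletableFuture<V>> pending = new ArrayList<>();
        for (K target : targets) {
            pending.add(ask.apply(target)); // start the request; do not wait here
        }
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<V> results = new ArrayList<>();
                    for (CompletableFuture<V> reply : pending) {
                        results.add(reply.join()); // already completed, so this does not block
                    }
                    return results; // same order as the targets
                });
    }
}
```

The blocking variant from the question corresponds to calling join() inside the first loop, which serializes the requests.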
NOTE: In general you shouldn't use Ask inside of an actor:
Each Ask allocates a listener inside the current actor, so in general using Ask is A LOT heavier than passing messages with Tell.
When sending messages through a chain of actors, Ask additionally costs transporting the message twice (once for the request and once for the reply) through each actor. A popular pattern is that, when you send a request along A⇒B⇒C⇒D and respond from D back to A, you can reply directly D⇒A, without passing the message back through the whole chain. Usually a combination of Forward/Tell works better.
In general, don't use the async version of Receive if it's not necessary: at the moment, it's slower for an actor than the sync version.

Store and retrieve string arrays in HBase

I've read this answer (How to store complex objects into hadoop Hbase?) regarding the storing of string arrays with HBase.
There it is said to use the ArrayWritable Class to serialize the array. With WritableUtils.toByteArray(Writable ... writable) I'll get a byte[] which I can store in HBase.
When I now try to retrieve the rows again, I get a byte[] which I have somehow to transform back again into an ArrayWritable.
But I can't find a way to do this. Maybe you know an answer, or am I doing something fundamentally wrong in how I serialize my String[]?
You may apply the following method to get back the ArrayWritable (taken from my earlier answer, see here) .
public static <T extends Writable> T asWritable(byte[] bytes, Class<T> clazz)
throws IOException {
T result = null;
DataInputStream dataIn = null;
try {
result = clazz.newInstance();
ByteArrayInputStream in = new ByteArrayInputStream(bytes);
dataIn = new DataInputStream(in);
result.readFields(dataIn);
}
catch (InstantiationException e) {
// should not happen
assert false;
}
catch (IllegalAccessException e) {
// should not happen
assert false;
}
finally {
IOUtils.closeQuietly(dataIn);
}
return result;
}
This method just deserializes the byte array to the correct object type, based on the provided class type token.
E.g:
Let's assume you have a custom ArrayWritable:
public class TextArrayWritable extends ArrayWritable {
public TextArrayWritable() {
super(Text.class);
}
}
Now you issue a single HBase get:
...
Get get = new Get(row);
Result result = htable.get(get);
byte[] value = result.getValue(family, qualifier);
TextArrayWritable tawReturned = asWritable(value, TextArrayWritable.class);
Text[] texts = (Text[]) tawReturned.toArray();
for (Text t : texts) {
System.out.print(t + " ");
}
...
Note:
You may have already found the readCompressedStringArray() and writeCompressedStringArray() methods in WritableUtils, which seem suitable if you have your own String array-backed Writable class. Before using them, I'd warn you that they can cause a serious performance hit due to the overhead of gzip compression and decompression.
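For comparison, the round trip that asWritable performs (wrap the byte[] in a DataInputStream and read the fields back) can be sketched with the JDK alone. This is an illustration of the mechanism, not Hadoop's wire format: StringArrayCodec is a hypothetical name, the length-prefixed layout is an assumption made for this example, and bytes written this way cannot be read by ArrayWritable or vice versa.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical JDK-only codec: an int count followed by each string in modified UTF-8.
final class StringArrayCodec {
    static byte[] toBytes(String[] values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (String value : values) {
                out.writeUTF(value); // limited to 65535 encoded bytes per string
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static String[] fromBytes(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[] values = new String[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readUTF();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting byte[] can be stored as a cell value like any other; the uncompressed layout avoids the gzip overhead mentioned in the note.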

How can I use Lucene's PriorityQueue when I don't know the max size at create time?

I built a custom collector for Lucene.Net, but I can't figure out how to order (or page) the results. Every time Collect gets called, I can add the result to an internal PriorityQueue, which I understand is the correct way to do this.
I extended the PriorityQueue, but it requires a size parameter on creation. You have to call Initialize in the constructor and pass in the max size.
However, in a collector, the searcher just calls Collect when it gets a new result, so I don't know how many results I have when I create the PriorityQueue. Based on this, I can't figure out how to make the PriorityQueue work.
I realize I'm probably missing something simple here...
PriorityQueue is not SortedList or SortedDictionary.
It is a kind of sorting implementation that returns the top M results (your PriorityQueue's size) out of N elements. You can add as many items as you want with InsertWithOverflow, but it will hold only the top M elements.
Suppose your search resulted in 1000000 hits. Would you return all of the results to the user?
A better way would be to return the top 10 elements to the user (using PriorityQueue(10)), and if the user requests the next 10 results, you can run a new search with PriorityQueue(20) and return the next 10 elements, and so on.
This is the trick most search engines, like Google, use.
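The top-M behaviour and the paging trick can be sketched in Java with java.util.PriorityQueue. This is a minimal illustration of the idea, not Lucene's implementation; TopM, insertWithOverflow, bestFirst and page are hypothetical names chosen to mirror the description above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical bounded queue: keeps only the best maxSize items seen so far.
final class TopM<T> {
    private final int maxSize;
    private final Comparator<T> order;
    private final PriorityQueue<T> heap; // min-heap: head is the weakest item kept

    TopM(int maxSize, Comparator<T> order) {
        if (maxSize < 1) throw new IllegalArgumentException("maxSize must be >= 1");
        this.maxSize = maxSize;
        this.order = order;
        this.heap = new PriorityQueue<>(maxSize, order);
    }

    void insertWithOverflow(T item) {
        if (heap.size() < maxSize) {
            heap.add(item);
        } else if (order.compare(item, heap.peek()) > 0) {
            heap.poll();    // drop the weakest kept item
            heap.add(item); // and keep the better one
        }
    }

    List<T> bestFirst() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order.reversed());
        return out;
    }

    // Page n (1-based) = collect the top n * pageSize, then skip the earlier pages.
    static <T> List<T> page(Iterable<T> hits, Comparator<T> order, int pageNumber, int pageSize) {
        TopM<T> top = new TopM<>(pageNumber * pageSize, order);
        for (T hit : hits) {
            top.insertWithOverflow(hit);
        }
        List<T> best = top.bestFirst();
        int from = Math.min((pageNumber - 1) * pageSize, best.size());
        return new ArrayList<>(best.subList(from, best.size()));
    }
}
```

Memory stays proportional to the number of results requested so far, not to the total number of hits.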
"Everytime Commit gets called, I can add the result to an internal PriorityQueue."
I cannot understand the relationship between Commit and search. Therefore I will append a sample usage of PriorityQueue:
public class CustomQueue : Lucene.Net.Util.PriorityQueue<Document>
{
public CustomQueue(int maxSize): base()
{
Initialize(maxSize);
}
public override bool LessThan(Document a, Document b)
{
// "field1" is a placeholder; compare on whichever stored field defines your ordering
return string.CompareOrdinal(a.Get("field1"), b.Get("field1")) < 0;
}
}
public class MyCollector : Lucene.Net.Search.Collector
{
CustomQueue _queue = null;
IndexReader _currentReader;
public MyCollector(int maxSize)
{
_queue = new CustomQueue(maxSize);
}
public override bool AcceptsDocsOutOfOrder()
{
return true;
}
public override void Collect(int doc)
{
_queue.InsertWithOverflow(_currentReader.Document(doc));
}
public override void SetNextReader(IndexReader reader, int docBase)
{
_currentReader = reader;
}
public override void SetScorer(Scorer scorer)
{
}
}
searcher.Search(query, new MyCollector(10)); // First page.
searcher.Search(query, new MyCollector(20)); // 2nd page.
searcher.Search(query, new MyCollector(30)); // 3rd page.
EDIT for @nokturnal
public class MyPriorityQueue<TObj, TComp> : Lucene.Net.Util.PriorityQueue<TObj>
where TComp : IComparable<TComp>
{
Func<TObj, TComp> _KeySelector;
public MyPriorityQueue(int size, Func<TObj, TComp> keySelector) : base()
{
_KeySelector = keySelector;
Initialize(size);
}
public override bool LessThan(TObj a, TObj b)
{
return _KeySelector(a).CompareTo(_KeySelector(b)) < 0;
}
public IEnumerable<TObj> Items
{
get
{
int size = Size();
for (int i = 0; i < size; i++)
yield return Pop();
}
}
}
var pq = new MyPriorityQueue<Document, string>(3, doc => doc.GetField("SomeField").StringValue);
foreach (var item in pq.Items)
{
}
Lucene's PriorityQueue is size-limited because it uses a fixed-size implementation that is very fast.
Think about a reasonable maximum number of results to get back at a time and use that number; the "waste" when the results are few is small compared to the benefit.
On the other hand, if you have such a huge number of results that you cannot hold them, then how are you going to be serving/displaying them? Keep in mind that this is for "top" hits so as you iterate through the results you will be hitting less and less relevant ones anyway.

Pattern for limiting number of simultaneous asynchronous calls

I need to retrieve multiple objects from an external system. The external system supports multiple simultaneous requests (i.e. threads), but it is possible to flood the external system - therefore I want to be able to retrieve multiple objects asynchronously, but I want to be able to throttle the number of simultaneous async requests. i.e. I need to retrieve 100 items, but don't want to be retrieving more than 25 of them at once. When each request of the 25 completes, I want to trigger another retrieval, and once they are all complete I want to return all of the results in the order they were requested (i.e. there is no point returning the results until the entire call is returned). Are there any recommended patterns for this sort of thing?
Would something like this be appropriate (pseudocode, obviously)?
private List<externalSystemObjects> returnedObjects = new List<externalSystemObjects>;
public List<externalSystemObjects> GetObjects(List<string> ids)
{
int callCount = 0;
int maxCallCount = 25;
WaitHandle[] handles;
foreach(id in itemIds to get)
{
if(callCount < maxCallCount)
{
WaitHandle handle = executeCall(id, callback);
addWaitHandleToWaitArray(handle)
}
else
{
int returnedCallId = WaitHandle.WaitAny(handles);
removeReturnedCallFromWaitHandles(handles);
}
}
WaitHandle.WaitAll(handles);
return returnedObjects;
}
public void callback(object result)
{
returnedObjects.Add(result);
}
Consider the list of items to process as a queue from which 25 processing threads dequeue tasks, process a task, add the result then repeat until the queue is empty:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
class Program
{
class State
{
public EventWaitHandle Done;
public int runningThreads;
public List<string> itemsToProcess;
public List<string> itemsResponses;
}
static void Main(string[] args)
{
State state = new State();
state.itemsResponses = new List<string>(1000);
state.itemsToProcess = new List<string>(1000);
for (int i = 0; i < 1000; ++i)
{
state.itemsToProcess.Add(String.Format("Request {0}", i));
}
state.runningThreads = 25;
state.Done = new AutoResetEvent(false);
for (int i = 0; i < 25; ++i)
{
Thread t =new Thread(new ParameterizedThreadStart(Processing));
t.Start(state);
}
state.Done.WaitOne();
foreach (string s in state.itemsResponses)
{
Console.WriteLine("{0}", s);
}
}
private static void Processing(object param)
{
Debug.Assert(param is State);
State state = param as State;
try
{
do
{
string item = null;
lock (state.itemsToProcess)
{
if (state.itemsToProcess.Count > 0)
{
item = state.itemsToProcess[0];
state.itemsToProcess.RemoveAt(0);
}
}
if (null == item)
{
break;
}
// Simulate some processing
Thread.Sleep(10);
string response = String.Format("Response for {0} on thread: {1}", item, Thread.CurrentThread.ManagedThreadId);
lock (state.itemsResponses)
{
state.itemsResponses.Add(response);
}
} while (true);
}
catch (Exception)
{
// ...
}
finally
{
int threadsLeft = Interlocked.Decrement(ref state.runningThreads);
if (0 == threadsLeft)
{
state.Done.Set();
}
}
}
}
You can do the same using asynchronous callbacks; there is no need to use threads.
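The same throttling can be sketched in Java with a fixed-size thread pool, where the pool size plays the role of the 25-request limit and the futures are read back in submission order so the results match the order of the ids. ThrottledFetch and fetchAll are hypothetical names, and the fetch function stands in for the call to the external system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: at most maxConcurrent fetches run at once; results keep request order.
final class ThrottledFetch {
    static <K, V> List<V> fetchAll(List<K> ids, int maxConcurrent, Function<K, V> fetch) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<V>> futures = new ArrayList<>();
            for (K id : ids) {
                Callable<V> task = () -> fetch.apply(id);
                futures.add(pool.submit(task)); // queued until one of the workers is free
            }
            List<V> results = new ArrayList<>();
            for (Future<V> future : futures) {
                results.add(future.get()); // waits per item, in the order requested
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

As soon as one fetch completes, its worker picks up the next queued id, which matches the "trigger another retrieval when one completes" requirement in the question.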
Having some queue-like structure to hold the pending requests is a pretty common pattern. In web apps where there may be several layers of processing, you see a "funnel" style approach, with the early parts of the processing chain having larger queues. There may also be some kind of prioritisation applied to queues, with higher-priority requests being shuffled to the top of the queue.
One important thing to consider in your solution is that if request arrival rate is higher than your processing rate (this might be due to a Denial of Service attack, or just that some part of the processing is unusually slow today) then your queues will increase without bound. You need to have some policy such as to refuse new requests immediately when the queue depth exceeds some value.
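One way to express such a policy in Java is a ThreadPoolExecutor with a bounded queue and the AbortPolicy rejection handler, so that a request is refused immediately once the queue is full. This is a minimal sketch; BoundedIntake, newPool and trySubmit are hypothetical names, and the worker and queue sizes are parameters you would tune.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical intake: a fixed number of workers plus a queue that cannot grow without bound.
final class BoundedIntake {
    static ThreadPoolExecutor newPool(int workers, int maxQueued) {
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(maxQueued),      // queue depth limit
                new ThreadPoolExecutor.AbortPolicy());    // reject instead of queueing more
    }

    // Returns false when the request was refused because the queue is full.
    static boolean trySubmit(ThreadPoolExecutor pool, Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException tooBusy) {
            return false;
        }
    }
}
```

The caller can turn a false result into an immediate "busy, try again later" response rather than letting latency grow.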