Redis Hash/Set Storing multiple types - redis

I am new to redis so I apologize if this question seems naive. I want to create a hash of the following type:
item = {{"bititem":00001010000100...001010},
{"property":1}}
Where bititem is a bit array created by setbit and property is a simple integer value. Is there any way to do this in redis or do I have to create different objects?

From your example, it is not clear to me why you need the extra depth-level around bititem.
Also, it is not clear to me what you want to do with it afterwards. So I give you three scenario's:
1. Serialized:
You can always serialize your data if it involves multiple levels. Most efficient is MsgPack, second best is JSON. You can deserialize the data in Lua-Redis when needed.
2. Hashed:
If you don't need multiple levels, simply do:
HSET item:01 bititem 00001010000100...001010
HSET item:01 property 1
Only do this though, if you really need to extract the different datamembers often. Separate members have quite some overhead. In general, I prefer to serialize the whole object (with a SET or a HSET).
3. Bitwise enabled:
If you want to make use of Redis' bitwise operations, you need to use simple strings (GET/SET). For example:
SET item:01:bititem "00001010000100...001010"
SET item:01:property 1
or even better:
SET item:01:bititem "00001010000100...001010"
SET item:01:properties [all-other-properties-serialized-as-msgpack]
Hope this helps, TW

Related

Create a List of elements from a DataTable LINQ Column

I would like to know how I can convert elements of a column of a DataTable to a list of type string, grouping the elements to avoid repetition.
For example my DataTable would look like this
DataTable
and I want to make a list containing the elements of only "User" without repeating itself using LINQ.
The code I was trying to use is
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString).ToList(Function(g) g.ToList())
But it doesn't work for me since I am new to LINQ and still have problems forming the structures.
I'd use this:
InvoiceList = InvoiceDT.AsEnumerable().Select(Function(r) r("User").ToString()).Distinct().ToList()
If you wanted a GroupBy solution it's
InvoiceList = InvoiceDT.AsEnumerable().GroupBy(Function(r) r("User").ToString()).Select(Function(g) g.Key).ToList()
Where your code went wrong was in trying to pass a delegate to ToList; it doesn't take one (and you wouldn't ToList the g either, as it's a list of data rows with all varying properties).
To reshape our IGrouping (something like a list of objects that all share the same Key, which is a property of the list that the IGrouping represents) produced by the groupby into a sequence of string Keys we Select the Key, and then ToList that
There is a lot of back and forthing between developers over things like ToList vs ToArray - some people universally use ToList because, for collections of an unknown number of elements, both list and array will grow and resize repeatedly in the same way but using ToArray requires one additional resizing step at the end to trim off any unused slots. Mostly that's trivial in terms of an overall performance consideration and should be weighed against the benefit of releasing the memory with the trim. Getting into finer details is way beyond the scope of this answer but you can read some huge blog posts about it.
I personally think it's more important to generate sensible code by calling the method that results in the relevant type depending on what you plan to do with it; I ToList if I need List functionality (add/insert/remove).. I prefer ToArray if an array suits the follow-on purposes (read/write/random access, no insert or delete), and if I'll only ever enumerate it I don't To... anything at all - I just ForEach the result of the query, which can give a bigger performance boost than anything else because it means I may not have to enumerate the entire set (if I stop early) or allocate memory all at once for doing so (if I'm writing to a socket or file)
On the use of ToString; it's worth avoiding if you think you'll fall into a pattern where you do it on every column just to get a string. If the column is already a string it's an acceptable way to get the object that DataRow.Item gives you, into a string. If the column is another type it's better to cast it:
DirectCast(r("Age"), Integer)
r.Field(Of Integer)("Age")
Thing is, it's verbose, and ugly, and intellisense doesn't help you out with writing Age or knowing it's an Int. LINQ in VB is bad enough for verbosity without pouring gas on that fire. If you're working with datatables of a known structure, it's a lot nicer if you make strongly typed ones:
Add a new file of type DataSet to your project
Open it so the design surface appears. In the properties grid call it something reasonable, such as AccountsDataSet
Right click, Add Table, call it Invoices
Right click the emppty table, Add Column, call it User
Then use it like:
Dim dt as new AccountsDataSet.InvoicesDataTable
Populate it like:
dt.AddInvoicesRow("John Smith", ... other properties here)
Query it like:
dt.Select(Function(r) r.User).Distinct()
Much nicer than accessing column names by string, and having them be objects that need casting..
Consider the dataset generator as a way to quickly, visually, create poco classes with named, typed properties
Try this
dim list as List(of string) = InvoiceDT.Rows.
Cast(of DataRow)().
Select(Function(r) r("User").ToString()).
Distinct().
ToList()
Here you cast Row collection as IEnumerable(of DataRow), rest is trivial

SHOW KEYS in Aerospike?

I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting (deleting) into (from) a meta-record.
Thank you #pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve original keys from the server, one must -- during put() calls -- set policy to save the key value server-side (otherwise, it seems only a digest/hash is stored?).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (modified Aerospike's own documentation), you perform a scan and set the policy to not return the bin data. From this, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')
scan_opts = { 'concurrent': True, 'nobins': True, 'priority': aerospike.SCAN_PRIORITY_MEDIUM }
for x in (scan.results(policy=scan_opts)): keys.append(x[0][2])
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' Record to store a list of all the other keys will be more performant, in my case -- in this way, I can simply make one get() call to the Aerospike server to retrieve the list.
You can choose not bring the data back by setting includeBinData in ScanPolicy to false.

VB.NET "For each" versus ".GetUpperBound(0)"

I would like to know what is preferred...
Dim sLines() As String = s.Split(NewLine)
For each:
For Each sLines_item As String In sLines
.GetUpperBound:
For i As Integer = 0 To sLines.GetUpperBound(0)
I have no idea why the "For Each" was introduced for such cases. Until now I have only used .GetUpperBound, and I don't see any PRO for the "For Each".
Thank you
ps: When I use ."GetUpperBound(0)", I do know that I am iterating over the vector.
The "For Each" in contrast sounds like "I don't care in which order the vector is given to me". But that is just personal gusto, I guess.
Short answer: Do not use GetUpperBound(). The only advantage of GetUpperBound() is that it works for multi-dimensional arrays, where Length doesn't work. However, even that usage is outdated since there is Array.GetLength() available that takes the dimension parameter. For all other uses, For i = 0 to Array.Length - 1 is better and probably the fastest option.
It's largely a personal preference.
If you need to alter the elements of the array, you should use For i ... because changing sLines_item will not affect the corresponding array element.
If you need to delete elements of the array, you can iterate For i = ubound(sLines) to 0 step -1 (or the equivalent).
Short answer
You should always use For Each on IEnumerable types unless you have no other choice.
Long answer
Contrary to the popular understanding, For Each is not a syntactic sugar on top of For Next. It will not necessarily iterate over every element of its source. It is a syntactic sugar on top of IEnumerable.GetEnumerator(). For Each will first get an enumerator to its source then loop until it cannot enumerate further. Basically, it will be replaced by the following code. Keep in mind that this is an oversimplification.
' Ask the source for a way to enumerate its content in a forward only manner.
Dim enumerator As IEnumerator = sLines.GetEnumerator()
' Loop until there is no more element in front of us.
While enumerator.Next() Then
' Invoke back the content of the for each block by passing
' the currently enumerated element.
forEachContent.Invoke(enumerator.Current)
End While
The major difference between this and a classical For Next loop is that it does not depend on any length. This fixes two limitations in modern .NET languages. The first one has to do with the Count method. IEnumerable provides a Count method, but the implementation might not be able to keep track of the actual amount of elements it stores. Because of this, calling IEnumerable.Count might cause the source to be iterated over to actually count the amount of element it contains. Moreover, doing this as the end value for traditional For Next loop will cause this process to be done for every element in the loop. This is very slow. Here is an illustration of this process:
For i As Integer = 0 To source.Count() ' This here will cause Count to be
' evaluated for every element in source.
DoSomething(source(i))
Next
The use of For Each fixes this by never requesting the length of the source.
The second limitation it fixes is the lack of a concept for arrays with infinite amount of elements. An example of such cases would be an array containing every digit of PI where each digit is only calculated when you request them. This is where LINQ makes its entrance and really shines because it enables you to write the following code:
Dim piWith10DigitPrecision = From d In InfinitePiSource
Take 10
Dim piWith250DigitPrecision = From d In InfinitePiSource
Take 250
Dim infite2PiSource = From d In InfinitePiSource
Select d * 2
Now, in an infinite source, you cannot depend on a length to iterate over all of its elements. It has an infinite length thus making a traditional For Next loop an infinite loop. This does not change anything for the first two examples I have given with pi because we explicitly provides the amount of elements we want, but it does for the third one. When would you stop iterating? For Each, when combined with Yield (used by the Take operator), makes sure that you never iterate until you actually requests a specific value.
You might have already figured it out by now but these two things means that For Each effectively have no concept of bounds because it simply does not require them. The only use for GetLowerBound and GetUpperBound are for non-zero-indexed arrays. For instance, you might have an array that indexes values from 1 instead of zero. Even then, you only need GetLowerBound and Length. Obviously, this is only if the position of the element in the source actually matters. If it does not, you can still use For Each to iterate over all elements as it is bound agnostic.
Also, as already mentioned, GetLength should be used for zero-indexed multi-dimensional arrays, again, only if the position of the element matters and not just the element itself.

Fastest key/value pair container in Objective-C

I am creating a syntax highlighting engine. My need is very specific. Keywords will be associated to their respective attribute array via a pointer. The data structure will look something like:
dict = {
"printf": keyword_attr_ptr
, "sprintf": keyword_attr_ptr
, "#import": special_attr_ptr
, "string": lib_attr_ptr
}
The look-up needs to be very fast as I will be iterating over this list every keypress.
I'm asking this question because I can not find any good documentation regarding how NSDictionary caches (if it does) and looks up values by its keys (does it use a map? a hashmap?). Can I rely on NSDictionary to be optimized to search for keys by strings?
When I was doing something similar a long while ago I used the MFC CMap function with very good results. NSDictionary appears to be the equivalent to CMap but the key type isn't specified and the NSDictionary clearly states that a key can be any type of object. I just want to make sure I can rely on it to return the results extremely fast before I put a lot of energy into this problem.
UPDATE 1
After a day of research, I ask the question on SO and I find the answer immediately after... go figure.
This is the documentation related to Dictionaries:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Collections/Articles/Dictionaries.html
It uses a hash table to manage its storage. I guess the short answer is that its almost equivalent to CMap.

What is the advised way to make an empty array in VB.NET?

What is the best way to take an array in VB.NET which can either be Nothing or initialised and give it a length of zero?
The three options I can think of are:
ReDim oBytes(-1)
oBytes = New Byte(-1) {}
oBytes = New Byte() {}
The first example is what most of the developers in my company (we used to do VB 6) have always used. I personaly prefer the third example as it is the easiest to understand what is happening.
So what are the positives and negative to each approach (option 2 and 3 are very similar I know)?
EDIT
So does anyone know of a reason to avoid ReDim other that because it is a holdover from the VB days?
Not that I won't accept that as the answer if that is all anyone has!
I recommend: oBytes = New Byte() {}
You should try to avoid "classic VB-isms" like Redim, and other holdovers from the classic VB days. I would recommend the third option.
Edit
To provide some more information about why to avoid it, see this MSDN page. While the page doesn't specifically advise against it, you can see that Redim suffers from shortcomings (and potential for confusion) that the other syntax does not.
Redim can only be used on existing arrays. Even so, it is semantically equivalent to declaring a new array. Redim releases the old array and creates a new one (so it isn't as if Redim has the ability to "tack on" or "chop off" elements). Additionally, it is destructive unless the Preserve keyword is used, even though there is no visual indication that an assignment is taking place.
Because Redim cannot create an array (but can only work on existing arrays), it can only be used within a procedure; at the class level you're forced to use the New Byte() {} method, leaving you with two visually distinct patterns for assigning new arrays, even though they're semantically identical.