SHOW KEYS in Aerospike?

I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting (deleting) into (from) a meta-record.

Thank you @pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve the original keys from the server, you must set the write policy during put() calls to send the key, so that the key value is stored server-side (otherwise only a digest/hash of the key is stored).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (adapted from Aerospike's own documentation), you perform a scan with a policy that does not return the bin data. From the results, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')
scan_opts = {
    'concurrent': True,
    'nobins': True,  # skip the bin data, we only want the keys
    'priority': aerospike.SCAN_PRIORITY_MEDIUM
}
for record in scan.results(policy=scan_opts):
    keys.append(record[0][2])  # record[0] is the key tuple (namespace, set, primary key, digest)
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' Record to store a list of all the other keys will be more performant, in my case -- in this way, I can simply make one get() call to the Aerospike server to retrieve the list.
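For reference, here is a minimal sketch of that meta-record idea, assuming a single well-known record whose list bin holds every user key. All names are illustrative, and note that the extra list_append is not atomic with the put:
import aerospike

client = aerospike.client({'hosts': [('127.0.0.1', 3000)]}).connect()
META_KEY = ('namespace', 'set', 'all-keys')  # hypothetical meta-record holding the key list

def put_with_index(user_key, bins):
    record_key = ('namespace', 'set', user_key)
    client.put(record_key, bins, policy={'key': aerospike.POLICY_KEY_SEND})
    # remember the key in one central list (may create duplicates on repeated puts)
    client.list_append(META_KEY, 'keys', user_key)

def all_keys():
    _, _, bins = client.get(META_KEY)  # single round trip to the server
    return bins.get('keys', [])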

You can choose not to bring the data back by setting includeBinData in ScanPolicy to false.

Confused about keys in gun DB

var stallone = {stallone:{first:'Sylvester',last:'Stallone',gender:'male'}};
var gibson = {gibson:{first:'Mel',last:'Gibson',gender:'male'}};
var movies = gun.get('movies')
movies.put(stallone).key('movies/action').key('movies/actors').key('movies/action/rambo');
movies.put(gibson).key('movies/action').key('movies/actors').key('movies/action/roadwarrior').key('movies/comedy');
movies.get('movies/action').val();
returns {_: Object, stallone: Object, gibson: Object} Nice.
movies.get('movies/comedy').val();
returns {_: Object, stallone: Object, gibson: Object} Erm..What is Sly doing here? Not Nice!!
gun.get('movies/comedy').val();
returns {_: Object, stallone: Object, gibson: Object} same thing!!
This behaviour leads to a couple of questions:
1) why bother creating movies ?
I'm working with var movies = gun.get('movies'), so why do I have to create the key with 'movies'
in it again? 'movies' should be prefixed automatically.
2) Even if multiple keys worked, it's not very intuitive. It would be nice if we could just do
movies.put(gibson).keys(['actors','comedy','action']).
Note: I would be happy if it could be done through a loop, but that doesn't work either:
var gibsonKeys = ['actors','action','comedy','dieHard']
gibsonKeys.forEach(function(key){
movies.put(gibson).key('movies/'+key);
// could be gun.put(gibson).key('movies/'+ key) as well
});
As a sidenote...I know that the keys are just strings and not real paths to the data ;)
Answered by Mark Nadal
A couple things to note:
movies.put(data).key('foo/bar')
is putting data on movies and keying movies with 'foo/bar'. It is an update operation, not an insert operation. So what is returned from the put is the same context (movies), not some sub-document (you could access that sub document with movies.put(data).path('stallone') for example).
If you are wanting to insert a record, kinda like having a table, try using .set - check out this article: https://github.com/amark/gun/wiki/graphs which goes over some examples of various data types.
Actually, for .set this is probably better: https://medium.com/@sbeleidy/a-weekend-with-gun-a61fdcb8cc5d#.49nuy86gs
Keys are different than tags, it also looks like you are probably wanting something like this: https://github.com/PsychoLlama/labelmaker.
Keys are as in key/value: you can have multiple keys on something, but they all point to the same thing. The above module gives you tags, which allow you to take multiple different things and tag them all with the same tag. Under the hood this works by creating a set (see above; think of it as an unordered list) which the tag is keyed to, and then you can iterate over all the different items in that list. Does that make sense?
Allowing key to accept multiple keys is probably still a good idea, though.
However, the behavior above is correct for key; it seems like what you want is a tag-like system, which you can add to GUN with the above module.

Case insensitive index in Neo4j using Py2neo

I want to make a case-insensitive index in Neo4j using Py2neo.
Read through the docs and googled a lot but didn't find anything. There seems to be this option in Java but not in Py2neo.
Please help!
You can pass configuration options into the GraphDatabaseService.get_or_create_index function as indicated here:
http://book.py2neo.org/en/latest/graphs_nodes_relationships/#py2neo.neo4j.GraphDatabaseService.get_or_create_index
These arguments are passed directly into the REST call as described here:
http://docs.neo4j.org/chunked/milestone/rest-api-indexes.html#rest-api-create-node-index-with-configuration
Hope this helps.
When using legacy indexes you can supply a configuration upon initial creation of the index. You have to set to_lower_case=true in combination with type=fulltext.
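As a rough sketch of that, using the legacy py2neo 1.x API linked above (the exact signature and the index name "people" are assumptions here, so check the docs for your version):
from py2neo import neo4j

graph_db = neo4j.GraphDatabaseService("http://localhost:7474/db/data/")

# Create (or fetch) a legacy node index configured as a case-insensitive fulltext index
people = graph_db.get_or_create_index(
    neo4j.Node, "people",
    config={"type": "fulltext", "to_lower_case": "true"}
)

# Since terms are lower-cased in the index, look things up with lower-case values as well
Since the index stores lower-cased terms, remember to lower-case your lookup values too.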
Schema indexes on the other hand do not yet support case insensitivity. As a workaround, introduce a copy of the respective property, e.g. name -> nameLower, which gets populated by the lowercase variant of that string. You could do something like this on existing datasets:
CREATE INDEX ON :Person(nameLower);
// --- use separate transaction
MATCH (p:Person) SET p.nameLower = lower(p.name); // maybe apply LIMITs for large numbers of nodes
Your query string of course needs to use lower case:
MATCH (p:Person {nameLower:'john'}) RETURN p

Redis Hash/Set Storing multiple types

I am new to redis so I apologize if this question seems naive. I want to create a hash of the following type:
item = {{"bititem":00001010000100...001010},
{"property":1}}
Where bititem is a bit array created by setbit and property is a simple integer value. Is there any way to do this in redis or do I have to create different objects?
From your example, it is not clear to me why you need the extra depth-level around bititem.
Also, it is not clear to me what you want to do with it afterwards, so I'll give you three scenarios:
1. Serialized:
You can always serialize your data if it involves multiple levels. The most efficient format is MsgPack; second best is JSON. You can deserialize the data in a Lua script inside Redis when needed.
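A minimal sketch of that approach from the client side, assuming redis-py and the msgpack package (the key and field names are illustrative):
import msgpack
import redis

r = redis.Redis()

# store the whole item as one serialized blob
item = {'bititem': b'\x0a\x10\x0a', 'property': 1}  # bit array shortened for the example
r.set('item:01', msgpack.packb(item))

# read it back and deserialize
stored = msgpack.unpackb(r.get('item:01'))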
2. Hashed:
If you don't need multiple levels, simply do:
HSET item:01 bititem 00001010000100...001010
HSET item:01 property 1
Only do this, though, if you really need to access the separate data members often. Separate members have quite some overhead. In general, I prefer to serialize the whole object (with a SET or an HSET).
3. Bitwise enabled:
If you want to make use of Redis' bitwise operations, you need to use simple strings (GET/SET). For example:
SET item:01:bititem "00001010000100...001010"
SET item:01:property 1
or even better:
SET item:01:bititem "00001010000100...001010"
SET item:01:properties [all-other-properties-serialized-as-msgpack]
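For illustration, a short redis-py sketch of that bitwise-enabled layout (key names follow the example above; the second item used in BITOP is assumed to exist):
import redis

r = redis.Redis()

# store the bit array via individual bits and the property as a plain string
r.setbit('item:01:bititem', 4, 1)
r.setbit('item:01:bititem', 10, 1)
r.set('item:01:property', 1)

print(r.getbit('item:01:bititem', 4))   # -> 1
print(r.bitcount('item:01:bititem'))    # -> 2

# BITOP combines whole bit arrays, e.g. AND of two items into a new key
r.bitop('AND', 'item:common', 'item:01:bititem', 'item:02:bititem')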
Hope this helps, TW

Compare 2 datasets with dbunit?

Currently I need to create tests for my application. I used "dbunit" to achieve that and now need to compare 2 datasets:
1) The records from the database I get with QueryDataSet
2) The expected results are written in the appropriate FlatXML in a file which I read in as a dataset as well
Basically 2 datasets can be compared this way.
Now the problem is columns with a Timestamp. They will never match the expected dataset. I really would like to ignore them in the comparison, but it doesn't work the way I want it to.
It does work when I compare each table on its own, adding a column filter and ignoreColumns. However, this approach is very cumbersome: many tables are used in the comparison, and it forces one to add so much code that it eventually gets bloated.
The same applies to fields that have null values.
A possible solution would also be to compare only the very first column of all tables, referencing it not by its column name but by its column index. But I can't find anything like that.
Maybe I am missing something, or maybe it just doesn't work any other way than comparing each table on its own?
For the sake of completeness, some additional information must be posted. Actually, my previously posted solution will not work at all, as the process of reading data from the database got me trapped.
The process using QueryDataSet did read the data from the database and save it as a dataset, but the data couldn't be accessed from this dataset anymore (although I could see the data in debug mode)!
Instead the whole operation failed with an UnsupportedOperationException at org.dbunit.database.ForwardOnlyResultSetTable.getRowCount(ForwardOnlyResultSetTable.java:73)
Example code to produce failure:
QueryDataSet qds = new QueryDataSet(connection);
qds.addTable("specificTable");
qds.getTable("specificTable").getRowCount();
Even if you try it this way it fails:
IDataSet tmpDataset = connection.createDataSet(tablenames);
tmpDataset.getTable("specificTable").getRowCount();
In order to make extraction work you need to add this line (the second one):
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
Great, that this was nowhere documented...
But that is not all: now you'd certainly think that one could add this line after doing a "QueryDataSet" as well... but no! This still doesn't work! It will still throw the same Exception! It doesn't make any sense to me and I wasted so much time with it...
It should be noted that extracting data from a dataset which was read in from an xml file does work without any problem. This annoyance just happens when trying to get a dataset directly from the database.
If you have done the above, you can then continue as below, which compares only the columns present in the expected xml file:
// put in here some code to read in the dataset from the xml file...
// and name it "expectedDataset"
// then get the tablenames from it...
String[] tablenames = expectedDataset.getTableNames();
// read dataset from database table using the same tables as from the xml
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
for (int i = 0; i < tablenames.length; i++)
{
    ITable expectedTable = expectedDataset.getTable(tablenames[i]);
    ITable actualTable = actualDataset.getTable(tablenames[i]);
    ITable filteredActualTable = DefaultColumnFilter.includedColumnsTable(actualTable, expectedTable.getTableMetaData().getColumns());
    Assertion.assertEquals(expectedTable, filteredActualTable);
}
You can also use this format:
// Assert actual database table match expected table
String[] columnsToIgnore = {"CONTACT_TITLE","POSTAL_CODE"};
Assertion.assertEqualsIgnoreCols(expectedTable, actualTable, columnsToIgnore);

TSearch2 - dots explosion

The following conversion
SELECT to_tsvector('english', 'Google.com');
returns this:
'google.com':1
Why didn't the TSearch2 engine return something like this?
'google':2, 'com':1
Or how can I make the engine return the exploded string as I wrote above?
I just need "Google.com" to be findable by "google".
Unfortunately, there is no quick and easy solution.
Denis is correct in that the parser is recognizing it as a hostname, which is why it doesn't break it up.
There are 3 other things you can do, off the top of my head.
You can disable the host parsing in the database. See the Postgres documentation for details. E.g. something like:
ALTER TEXT SEARCH CONFIGURATION your_parser_config
    DROP MAPPING FOR url, url_path;
You can write your own custom dictionary.
You can pre-parse your data before it's inserted into the database in some manner (maybe splitting all domains before going into the database).
I had a similar issue to you last year and opted for solution (2), above.
My solution was to write a custom dictionary that splits words up on non-word characters. A custom dictionary is a lot easier & quicker to write than a new parser. You still have to write C tho :)
The dictionary I wrote would return something like 'www.facebook.com':4, 'com':3, 'facebook':2, 'www':1 for the 'www.facebook.com' domain (we had a unique-ish scenario, hence the 4 results instead of 3).
The trouble with a custom dictionary is that you will no longer get stemming (ie: www.books.com will come out as www, books and com). I believe there is some work (which may have been completed) to allow chaining of dictionaries which would solve this problem.
First off in case you're not aware, tsearch2 is deprecated in favor of the built-in functionality:
http://www.postgresql.org/docs/9/static/textsearch.html
As for your actual question, google.com gets recognized as a host by the parser:
http://www.postgresql.org/docs/9.0/static/textsearch-parsers.html
If you don't want this to occur, you'll need to pre-process your text accordingly (or use a custom parser).
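As a rough illustration of the pre-processing idea mentioned in both answers (this is not TSearch2's own behaviour): split host-like tokens on dots before handing the text to to_tsvector. The table and column names below are made up, and psycopg2 is just one way to run the statement:
import re
import psycopg2

def explode_hosts(text):
    # 'Google.com' -> 'Google com', so plain 'google' will match after stemming
    return re.sub(r'(?<=\w)\.(?=\w)', ' ', text)

conn = psycopg2.connect("dbname=mydb")  # connection details are illustrative
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO docs (body, body_tsv) VALUES (%s, to_tsvector('english', %s))",
        ('Google.com', explode_hosts('Google.com'))
    )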