How to add a filter to match a value in a map bin in Aerospike

I have a requirement where I have to find a record in Aerospike based on attributeId. The data in Aerospike is in the below format:
{
name=ABC,
id=xyz,
ts=1445879080423,
inference={2601=0.6}
}
Now I will be getting the value "2601" programmatically, and I should find this record based on that value. But the problem is that the value is a key in a map, and the map may hold more than one entry, like:
inference={2601=0.6, 2830=0.9, 2931=0.8}
So how can I find this record using attributeId in Java? Any suggestions much appreciated.

A little-known feature of Aerospike is that, in addition to a bin's scalar value, you can define a secondary index on:
List values
Map Keys
Map Values
Using an index defined on the map keys of the "inference" bin, you will be able to query (filter) based on the key's name.
I hope this helps.
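To make this concrete, here is a minimal sketch using the Aerospike Java client. The namespace "test", set "demo", and index name are placeholder assumptions, and the exact createIndex/setFilter signatures vary slightly between client versions. Note that the index type must match how the map keys are stored; this sketch assumes they are strings like "2601".

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Record;
import com.aerospike.client.query.Filter;
import com.aerospike.client.query.IndexCollectionType;
import com.aerospike.client.query.IndexType;
import com.aerospike.client.query.RecordSet;
import com.aerospike.client.query.Statement;
import com.aerospike.client.task.IndexTask;

public class MapKeyQuery {
    public static void main(String[] args) {
        AerospikeClient client = new AerospikeClient("127.0.0.1", 3000);
        try {
            // One-time setup: index the KEYS of the map stored in the "inference" bin.
            IndexTask task = client.createIndex(null, "test", "demo",
                    "inference_keys_idx", "inference",
                    IndexType.STRING, IndexCollectionType.MAPKEYS);
            task.waitTillComplete();

            // Query: find records whose inference map contains the key "2601".
            Statement stmt = new Statement();
            stmt.setNamespace("test");
            stmt.setSetName("demo");
            stmt.setFilter(Filter.contains("inference", IndexCollectionType.MAPKEYS, "2601"));

            RecordSet rs = client.query(null, stmt);
            try {
                while (rs.next()) {
                    Record record = rs.getRecord();
                    System.out.println(record);
                }
            } finally {
                rs.close();
            }
        } finally {
            client.close();
        }
    }
}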

Related

Natural way of indexing elements in Flink

Is there a built-in way to index and access indices of individual elements of DataStream/DataSet collection?
Like in typical Java collections, where you know that e.g. the 3rd element of an ArrayList can be obtained by ArrayList.get(2), and vice versa ArrayList.indexOf(elem) gives us the index of (the first occurrence of) the specified element. (I'm not asking about extracting elements out of the stream.)
More specifically, when joining DataStreams/DataSets, is there a "natural"/easy way to join elements that came (were created) first, second, etc.?
I know there is a zipWithIndex transformation that assigns sequential indices to elements. I suspect the indices always start with 0? But I also suspect that they aren't necessarily assigned in the order the elements were created in (i.e. by their Event Time). (It also exists only for DataSets.)
This is what I currently tried:
DataSet<Tuple2<Long, Double>> tempsJoIndexed = DataSetUtils.zipWithIndex(tempsJo);
DataSet<Tuple2<Long, Double>> predsLinJoIndexed = DataSetUtils.zipWithIndex(predsLinJo);
DataSet<Tuple3<Double, Double, Double>> joinedTempsJo = tempsJoIndexed
.join(predsLinJoIndexed).where(0).equalTo(0)...
And it seems to create wrong pairs.
I see some possible approaches, but they're either non-Flink or not very nice:
I could of course assign an index to each element upon the stream's creation and have e.g. a stream of Tuples.
Work with event-time timestamps. (I suspect there isn't a way to key by timestamps, and even if there was, it wouldn't be useful for joining multiple streams like this unless the timestamps are actually assigned as indices.)
We could try "collecting" the stream first, but then we wouldn't be using Flink anymore.
The first approach seems like the most viable one, but it also seems redundant, given that a stream should by definition be a sequential collection and, as such, its elements should have a sense of order (e.g. "I'm the 36th element because 35 elements came before me").
I think you're going to have to assign index values to elements, so that you can partition the data sets by this index, and thus ensure that two records which need to be joined are being processed by the same sub-task. Once you've done that, a simple groupBy(index) and reduce() would work.
But assigning increasing ids without gaps isn't trivial if you want to read your source data with parallelism > 1. In that case I'd create a RichMapFunction that uses the runtime context's sub-task id and number of sub-tasks to calculate non-overlapping and monotonic indexes, as sketched below.
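A minimal sketch of that idea (the element type Double and the class name are just for illustration): each parallel instance starts its counter at its own sub-task index and advances it by the total parallelism, so the ids assigned by different sub-tasks never overlap and increase monotonically within each sub-task.

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;

// Emits ids subtaskIndex, subtaskIndex + parallelism, subtaskIndex + 2 * parallelism, ...
public class IndexAssigner extends RichMapFunction<Double, Tuple2<Long, Double>> {
    private long nextIndex;
    private long stride;

    @Override
    public void open(Configuration parameters) {
        nextIndex = getRuntimeContext().getIndexOfThisSubtask();
        stride = getRuntimeContext().getNumberOfParallelSubtasks();
    }

    @Override
    public Tuple2<Long, Double> map(Double value) {
        Tuple2<Long, Double> indexed = new Tuple2<>(nextIndex, value);
        nextIndex += stride;
        return indexed;
    }
}

Keep in mind that this numbers elements in the order each sub-task sees them, not in global creation order; both inputs would need the same parallelism and partitioning for the ids to line up.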

Find out the amount of space each field takes in Google BigQuery

I want to optimize the space of my BigQuery and Google Storage tables. Is there a way to easily find out the cumulative space that each field in a table takes? This is not straightforward in my case, since I have a complicated hierarchy with many repeated records.
You can do this in the Web UI by simply typing (and not running) the below query, changing <column_name> to the field of your interest,
SELECT <column_name>
FROM YourTable
and looking at the validation message, which includes the respective size.
Important - you do not need to run it; just check the validation message for bytesProcessed, and this will be the size of the respective column.
Validation is free and invokes a so-called dry run.
If you need to do such "column profiling" for many tables, or for a table with many columns, you can code this in your preferred language: use the Tables.get API to get the table schema, then loop through all fields, build the respective SELECT statement for each, dry-run it, and read totalBytesProcessed, which, as you already know, is the size of the respective column.
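For the programmatic route, here is a hedged sketch using the google-cloud-bigquery Java client (the project, dataset, table, and column names are placeholders); a dry run is free and reports totalBytesProcessed without executing the query:

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobStatistics.QueryStatistics;
import com.google.cloud.bigquery.QueryJobConfiguration;

public class ColumnSizeDryRun {
    public static void main(String[] args) {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

        // In practice, loop over the fields returned by Tables.get and build
        // one such statement per column.
        String sql = "SELECT my_column FROM `my_project.my_dataset.my_table`";

        QueryJobConfiguration config = QueryJobConfiguration.newBuilder(sql)
                .setDryRun(true)         // validate only; nothing runs and nothing is billed
                .setUseQueryCache(false)
                .build();

        Job job = bigquery.create(JobInfo.of(config));
        QueryStatistics stats = job.getStatistics();
        System.out.println("my_column occupies about " + stats.getTotalBytesProcessed() + " bytes");
    }
}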
I don't think this is exposed in any of the metadata.
However, you may be able to easily get good approximations based on your needs. The number of rows is provided, so for some of the data types, you can directly calculate the size:
https://cloud.google.com/bigquery/pricing
For types such as string, you could get the average length by querying e.g. the first 1000 rows, and use this for your storage calculations.

Why can't Neo4j indexing find nodes that I know exist?

I have created and indexed my graph database through localhost:7474 in Neo4j (visually).
The nodes have three properties: name, priority, link.
I created an index on the name property of the nodes through the "add or remove indexes" tab of localhost:7474 (as shown in the picture),
but when I try to retrieve nodes based on their names in the data browser, console, or my Java application, the nodes cannot be found.
In the console or data browser, when I write this query for red (there is a node with the name "red"), for example:
start n=node:name(name="red")
return n;
I get 0 rows returned.
and when I type this query:
start n=node:node(name="red")
return n;
or this one:
start n=node:Node(name="red")
return n;
I get "Index node does not exist" or "Index Node does not exist" in the console or data browser.
My database files are in the same path where Neo4j's default.graphdb exists (I mean in "C:\Users\fereshteh\Documents\Neo4j"), and I first created the index, and then the graph database.
I don't know what I am doing wrong. Please help me; I will be so thankful.
Neo4j version: 1.9.4
I believe your assumption about how to set up the indexing is incorrect. You can read here for more information, but basically there are three things needed to create or read from an index: the index name, the entry key, and the entry value.
What you have specified above in the Web Console is the index name, but in your Cypher query you are specifying the entry key. You either want to use the node auto-index, or create the node in Cypher and index it there, but the latter isn't an option in 1.9.4.
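To illustrate the three pieces in the embedded Java API of the 1.9.x line (the index name "names" and the store path are assumptions): the index name is what you pass to forNodes, while the entry key/value pair is supplied when adding and looking up.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;

public class LegacyIndexDemo {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabase("C:/Users/fereshteh/Documents/Neo4j/default.graphdb");
        Transaction tx = db.beginTx();
        try {
            Index<Node> names = db.index().forNodes("names"); // index NAME

            Node red = db.createNode();
            red.setProperty("name", "red");
            names.add(red, "name", "red");                    // entry KEY and VALUE

            Node found = names.get("name", "red").getSingle();
            System.out.println(found.getProperty("name"));
            tx.success();
        } finally {
            tx.finish(); // 1.9.x API; replaced by close() in 2.x
        }
    }
}

With an entry indexed this way, a Cypher lookup of the form start n=node:names(name="red") return n; should match, because names is the index name and name is the entry key.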

How to return all newest nodes from neo4j?

Is it possible to query neo4j for the newest nodes? In this case, the indexed property "timestamp" records time in milliseconds on every node.
All of the Cypher examples I have found concern graph-type queries: "start at node n and follow relationships." What is the general best approach for returning result sets sorted on one field? Is this even possible in a graph database such as Neo4j?
In the embedded Java API it is possible to add sorting using Lucene constructs.
http://docs.neo4j.org/chunked/milestone/indexing-lucene-extras.html#indexing-lucene-query-objects
http://blog.richeton.com/2009/05/12/lucene-sort-tips/
In server mode you can pass an ?order parameter to the Lucene lookup query.
http://docs.neo4j.org/chunked/milestone/rest-api-indexes.html#rest-api-find-node-by-query
Depending on how you indexed your data (not numerically, as there are issues with the Lucene query syntax parser and numeric searches :( ), in Cypher you can do:
start n=node:myindex('time: [1 to 1000]') return n order by n.time asc
There are also more graphy ways of doing that, e.g. by linking the events with a NEXT relationship and returning the head and the next n elements from this list:
http://docs.neo4j.org/chunked/milestone/cypher-cookbook-newsfeed.html
or to create a tree structure for time:
http://docs.neo4j.org/chunked/milestone/cypher-cookbook-path-tree.html
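For the embedded-API route mentioned above, a minimal sketch (the index name "events" and the "timestamp" property are assumptions): index the timestamp numerically with ValueContext, so that numeric range queries and sorts work, then sort the hits descending with QueryContext.

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.index.Index;
import org.neo4j.graphdb.index.IndexHits;
import org.neo4j.index.lucene.QueryContext;
import org.neo4j.index.lucene.ValueContext;

public class NewestNodes {
    public static void main(String[] args) {
        GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("data/graph.db");
        Transaction tx = db.beginTx();
        try {
            Index<Node> events = db.index().forNodes("events");

            // Store the timestamp NUMERICALLY in the index, not as a string.
            Node event = db.createNode();
            long ts = System.currentTimeMillis();
            event.setProperty("timestamp", ts);
            events.add(event, "timestamp", ValueContext.numeric(ts));

            // Newest first: match everything, sorted numerically, descending.
            IndexHits<Node> hits = events.query(
                    new QueryContext("timestamp:*").sortNumeric("timestamp", true));
            for (Node n : hits) {
                System.out.println(n.getProperty("timestamp"));
            }
            tx.success();
        } finally {
            tx.finish();
        }
    }
}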
Yes, it is possible, and there are some different ways to do so.
You could either use a timestamp property and a classic index, and sort your result set by that property; or you could create an in-graph, time-based index, as described, for example, in Peter's blog post:
http://blog.neo4j.org/2012/02/modeling-multilevel-index-in-neoj4.html

Sorting a Group of Files Using a Hashtable - Visual Basic

How do I sort a group of files in a directory by value using a Hashtable?
I'll have more than 500 files in the below format:
prod_orders_XXX_<TimeStamp>.dat
XXX = symbol of the product; the length may vary.
<TimeStamp> = date and time
Multiple files for the same XXX are possible with different time stamps.
Here are some examples:
prod_orders_abc_20122001083000.dat
prod_orders_abc_20122001083111.dat
prod_orders_xyz_20122001093157.dat
prod_orders_xyz_20122001083000.dat
prod_orders_abc_20122001163139.dat
prod_orders_abc_20122001093137.dat
I have posted a similar question before, but this time I need to do this specifically using a Hashtable. Can someone help?
You have four problems here.
You shouldn't use an untyped hashtable at all. A generic Dictionary<K,V> is a much better option.
You did not share how you will determine the key for each file name. Items in a hashtable must have both a key and a value. Presumably the file names are the value, but we have no information on the key.
You did not specify what criteria will be used to determine the sort order. Sort by timestamp? File name? Product symbol? With what precedence?
Hashtables and Dictionaries are unsorted by definition. There is no way to sort them. Period. End of story. You can iterate over their contents in a sorted way, but you cannot force them to store sorted items, and attempting to do so would defeat the nice performance benefits of these collections.
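To make that last point concrete, here is a minimal sketch of "iterate in sorted order without sorting the table itself", shown in Java for brevity (the .NET equivalent would sort a list of KeyValuePair entries the same way); the file-name pattern and the choice of the timestamp as the sort value are assumptions:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SortFilesByTimestamp {
    public static void main(String[] args) {
        List<String> files = Arrays.asList(
                "prod_orders_abc_20122001083000.dat",
                "prod_orders_xyz_20122001093157.dat",
                "prod_orders_abc_20122001163139.dat");

        // Key = file name, value = timestamp portion (assumed to be the sort criterion).
        Pattern pattern = Pattern.compile("prod_orders_(.+)_(\\d+)\\.dat");
        Map<String, String> timestampsByFile = new HashMap<>();
        for (String file : files) {
            Matcher m = pattern.matcher(file);
            if (m.matches()) {
                timestampsByFile.put(file, m.group(2));
            }
        }

        // The map itself stays unsorted; we sort a snapshot of its entries by value.
        List<Map.Entry<String, String>> entries = new ArrayList<>(timestampsByFile.entrySet());
        entries.sort(Map.Entry.comparingByValue());
        for (Map.Entry<String, String> e : entries) {
            System.out.println(e.getKey());
        }
    }
}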