how to query by vertex id in Datastax DSE 5.0 Graph in a concise way? - datastax

It seems that the unique id for vertices is community_id in DSE Graph.
I have found that this works (id is long) :
v = g.V().has("VertexLabel","community_id",id).next()
none of those work:
v = g.V("community_id",id).next()
v = g.V("community_id","VertexLabel:"+id).next()
v = g.V(id).next()
v = g.V().hasId(id).next()
v = g.V().hasId("VertexLabel:"+id).next()
v = g.V("VertexLabel:"+id).next()
Edit
After some investigation I found that for a vertex v, v.id() returns a LinkedHashMap:
Vertex v = gT.next();
Object id = v.id();
System.out.println(id);
System.out.println(id.getClass());
System.out.println(g.V().hasId(id).next());
System.out.println(g.V(id).next());
The above prints:
{~label=User, community_id=1488246528, member_id=512}
class java.util.LinkedHashMap
v[{~label=User, community_id=1488246528, member_id=512}]
v[{~label=User, community_id=1488246528, member_id=512}]
There should be a more concise way ...
any help is appreciated :)

Actually I found it:
ids can be written in this String form: "vertexLabel:community_id:member_id"
So for the example above id="User:1488246528:512":
v = g.V().hasId("User:1488246528:512").next()
v = g.V("User:1488246528:512").next()
returns the specific Vertex
Till now I don't know of a good way how to print concisely the id (as a string) of a Vertex so it can be used in V() or in hasId() .. what I currently do is:
LinkedHashMap id = ((LinkedHashMap)v.id());
String idStr = v.label()+":"+id.get("community_id")+":"+id.get("member_id");

Michail, you can also supply your own ids to help simplify this item. There are trade offs in doing so, but there are also advantages. Please see here for more details - http://docs.datastax.com/en/latest-dse/datastax_enterprise/graph/using/createCustVertexId.html?hl=custom%2Cid

Related

difference between two lists that include duplicates

I have a problem with two lists which contain duplicates
a = [1,1,2,3,4,4]
b = [1,2,3,4]
I would like to be able to extract the differences between the two lists ie.
c = [1,4]
but if I do c = a-b I get c =[]
It should be trivial but I can't find out :(
I tried also to parse the biggest list and remove items from it when I find them in the smallest list but I can't update lists on the fly, it does not work either
has anyone got an idea ?
thanks
You see an empty c as a result, because removing e.g. 1 removes all elements that are equal 1.
groovy:000> [1,1,1,1,1,2] - 1
===> [2]
What you need instead is to remove each occurrence of specific value separately. For that, you can use Groovy's Collection.removeElement(n) that removes a single element that matches the value. You can do it in a regular for-loop manner, or you can use another Groovy's collection method, e.g. inject to reduce a copy of a by removing each occurrence separately.
def c = b.inject([*a]) { acc, val -> acc.removeElement(val); acc }
assert c == [1,4]
Keep in mind, that inject method receives a copy of the a list (expression [*a] creates a new list from the a list elements.) Otherwise, acc.removeElement() would modify an existing a list. The inject method is an equivalent of a popular reduce or fold operation. Each iteration from this example could be visualized as:
--inject starts--
acc = [1,1,2,3,4,4]; val = 1; acc.removeElement(1) -> return [1,2,3,4,4]
acc = [1,2,3,4,4]; val = 2; acc.removeElement(2) -> return [1,3,4,4]
acc = [1,3,4,4]; val = 3; acc.removeElement(3) -> return [1,4,4]
acc = [1,4,4]; val = 4; acc.removeElement(4) -> return [1,4]
-- inject ends -->
PS: Kudos to almighty tim_yates who recommended improvements to that answer. Thanks, Tim!
the most readable that comes to my mind is:
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
b.each {c.removeElement(it)}
if you use this frequently you could add a method to the List metaClass:
List.metaClass.removeElements = { values -> values.each { delegate.removeElement(it) } }
a = [1,1,2,3,4,4]
b = [1,2,3,4]
c = a.clone()
c.removeElements(b)

( F#) Indexing a String without type annotating the string

Hey one of the requirements of my task is that I do not use type annotation.
Currently my code looks like this
let (currentSeq: string) =
specie
|> Map.tryFind geneId
|> Option.get
let seq1 = currentSeq.[0..pos - 1]
let seq2 = currentSeq.[pos..String.length currentSeq - 1]`
I have been racking my brain for a while now, but I can not figure out how to index a 'chunk' of the string currentSeq, without type annotating it.
IIUYC, you need to split a string into two without using type annotations. Here's a way:
let splitAt s pos =
let length = s |> String.length
[ s.[0..pos - 1]; s.[pos..length - 1] ]
you don't need to type annotate it:
let currentSeq = "ojdjdsajdsa"
let seq1 = currentSeq.[0..3 - 1]
your last two seqs are too idented

piglatin positional notation, how to specify an interval

Lets say I have a very wide data source:
big_thing = LOAD 'some_path' using MySpecialLoader;
Now I want to generate some smaller thing composed of a subset of big_thing's columns.
smaller_thing = FOREACH big_thing GENERATE
$21,$22,$23 ...... $257;
Is there a way to achieve this without having to write out all the columns?
I'm assuming yes but my searches aren't coming up with much, I think I'm just using the wrong terminology.
EDIT:
So it looks like my question is being very misunderstood. Since I'm a Python person I'll give a python analogy.
Say I have an array l1 which is made up of arrays. So it looks like a grid right? Now I want the array l2 to be a subset of 'l1such thatl2' contains a bunch of columns from l1. I would do something like this:
l2 = [[l[a],l[b],l[c],l[d]] for l in l1]
# a,b,c,d are just some constants.
In pig this is equivalent to something like:
smaller_thing = FOREACH big_thing GENERATE
$1,$22,$3,$21;
But I have a heck of a lot of columns. And the columns I'm interested in are all sequential and there are a lot of those. Then in python I would do this:
l2 = [l[x:y] for l in l2]
#again, x and y are constants, eg x=20, y=180000000. See, lots of stuff I dont want to type out
My question is what is the pig equivalent to this?
smaller_thing = FOREACH big_thing GENERATE ?????
And what about stuff like this:
Python:
l2 = [l[x:y]+l[a:b]+[l[b],l[c],l[d]] for l in l2]
Pig:
smaller_thing = FOREACH big_thing GENERATE ?????
Yes, you can simply load the dataset without columns.
But if you load the data with column names will help you to identify the column details in future scripts.
UDF can help you to perform your query,
For example,
REGISTER UDF\path;
a = load 'data' as (a1);
b = foreach a generate UDF.Func(a1,2,4);
UDF:
public class col_gen extends EvalFunc<String>
{
#Override
public String exec(Tuple tuple) throws IOException {
String data = tuple.get(0).toString();
int x = (int)tuple.get(1);
int y = (int)tuple.get(2);
String[] data3 = data.split(",");
String data2 = data3[x]+",";
x = x+1;
while(x <= y)
{
data2 += data3[x]+",";
x++;
}
data2 = data2.substring(0, data2.length()-1);
return data2;
}
}
The answer can be found in this post: http://blog.cloudera.com/blog/2012/08/process-a-million-songs-with-apache-pig/
Distanced = FOREACH Different GENERATE artistLat..songPreview, etc;
The .. says use everything from artistLat to songPreview.
The same thing can be done with positional notation. eg $1..$6

Matlab's arrayfun for uniform output of class objects

I need to build an array of objects of class ID using arrayfun:
% ID.m
classdef ID < handle
properties
id
end
methods
function obj = ID(id)
obj.id = id;
end
end
end
But get an error:
>> ids = 1:5;
>> s = arrayfun(#(id) ID(id), ids)
??? Error using ==> arrayfun
ID output type is not currently implemented.
I can build it alternatively in a loop:
s = [];
for k = 1 : length(ids)
s = cat(1, s, ID(ids(k)));
end
but what is wrong with this usage of arrayfun?
Edit (clarification of the question): The question is not how to workaround the problem (there are several solutions), but why the simple syntax s = arrayfun(#(id) ID(id), ids); doesn't work. Thanks.
Perhaps the easiest is to use cellfun, or force arrayfun to return a cell array by setting the 'UniformOutput' option. Then you can convert this cell array to an array of obects (same as using cat above).
s = arrayfun(#(x) ID(x), ids, 'UniformOutput', false);
s = [s{:}];
You are asking arrayfun to do something it isn't built to do.
The output from arrayfun must be:
scalar values (numeric, logical, character, or structure) or cell
arrays.
Objects don't count as any of the scalar types, which is why the "workarounds" all involve using a cell array as the output. One thing to try is using cell2mat to convert the output to your desired form; it can be done in one line. (I haven't tested it though.)
s = cell2mat(arrayfun(#(id) ID(id), ids,'UniformOutput',false));
This is how I would create an array of objects:
s = ID.empty(0,5);
for i=5:-1:1
s(i) = ID(i);
end
It is always a good idea to provide a "default constructor" with no arguments, or at least use default values:
classdef ID < handle
properties
id
end
methods
function obj = ID(id)
if nargin<1, id = 0; end
obj.id = id;
end
end
end

overriding a method in R, using NextMethod

how dows this work in R...
I am using a package (zoo 1.6-4) that defines a S3 class for time series sets.
I am writing a derived class where I want to override a few methods and can't get past this one:[.zoo!
in my derived class rows are indexed by timestamp, like in zoo, but differently from zoo, I allow only POSIXct values in the index. my users will be selecting columns all of the time, while slicing series only occasionally so I want to offer obj[name] instead of obj[, name].
my objects have class c("delftfews", "zoo").
but...
how do I override a method?
I tried this:
"[.delftfews" <- function(x, i, j, drop=TRUE, ...) {
if (missing(i)) return(NextMethod())
if (all(class(i) == "character") && missing(j)) {
return(NextMethod('[', x=x, i=1:NROW(x), j=i, drop=drop, ...))
}
NextMethod()
}
but I get this error: Error in rval[i, j, drop = drop., ...] : incorrect number of dimensions.
I have solved by editing the source from zoo: I removed those ..., but I don't get why that works. anybody can explain what is going on here?
The problem is that with the above definition of [.delftfews this code:
library(zoo)
z <- structure(zoo(cbind(a = 1:3, b = 4:6)), class = c("delftfews", "zoo"))
z["a"]
# generates this call: `[.zoo`(x = 1:6, i = 1:3, j = "a", drop = TRUE, z, "a")
Your code does work as is if you write the call like this:
z[j = "a"]
# generates this call: `[.zoo`(x = z, j = "a")
I think what you want is to change the relevant line in [.delftfews to this:
return(NextMethod(.Generic, object = x, i = 1:NROW(x), drop = drop))
# z["a"] now generates this call: `[.zoo`(x = z, i = 1:3, j = "a", drop = TRUE)
A point of clarification: allowing only POSIXct index values does not allow indexing columns by name only. I'm not sure how you arrived at that conclusion.
You're overriding zoo correctly, but I think you misunderstand NextMethod. The error is caused by if (missing(i)) return(NextMethod()), which calls [.zoo if i is missing, but [.zoo requires i because zoo's internal data structure is a matrix. Something like this should work:
if (missing(i)) i <- 1:NROW(x)
though I'm not sure if you have to explicitly pass this new i to NextMethod...
You may be interested in the xts package, if you haven't already taken a look at it.