GroupCount by Multiple Values - tinkerpop

I have data structured as so:
{
number: Integer
letter: String
}
I'd like to do a group count by both properties like so:
g.V().values('number', 'letter').groupCount();
and see the data displayed as so:
[[1,A]:16, [1,B]:64, [2,A]:78, [2,B]:987]
Is there any way to do this in tinkerpop?

A simple
g.V().groupCount().by(values('number', 'letter').fold())
should do the trick.
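As a rough illustration of what grouping by a folded pair of values produces, here is a plain Kotlin analogue over an in-memory list. It is not Gremlin and does not use TinkerPop; the Item class and the countByNumberAndLetter function are made up for this sketch.

```kotlin
// Hypothetical stand-in for a vertex with the two properties from the question.
data class Item(val number: Int, val letter: String)

// Count items per (number, letter) combination, using the pair of values as the key.
fun countByNumberAndLetter(items: List<Item>): Map<List<Any>, Int> =
    items.groupingBy { listOf<Any>(it.number, it.letter) }.eachCount()

fun main() {
    val items = listOf(Item(1, "A"), Item(1, "A"), Item(1, "B"), Item(2, "A"))
    println(countByNumberAndLetter(items)) // {[1, A]=2, [1, B]=1, [2, A]=1}
}
```

The key of each entry is the list of both values, which mirrors the [1,A]:16 shape asked for in the question.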

If you also need to group by properties of related vertices, project() will do the job (values() only reads simple properties of the current element; it does not traverse).
Let's say letter is a property of an adjacent vertex, reached over an outgoing edge in this case (for an incoming edge, use in() instead), and number is a property of your current/starting vertex:
g.V().project('number', 'letter').
by(values('number')).
by(out('<outgoing-edge-label>').values('letter')).
groupCount()
This is powerful in many cases, because each by() modulator can take an arbitrary traversal and/or a property.

I'd prefer to avoid lambdas if at all possible, but this is how it can be done with a lambda (in case it turns out to be impossible without one).
// Counts per letter, then per number within that letter.
Map<String, Map<Integer, Integer>> map = new HashMap<>();
g.V().sideEffect(it -> {
    String letter = (String) it.get().property("letter").value();
    Integer number = (Integer) it.get().property("number").value();
    if (map.get(letter) == null)
        map.put(letter, new HashMap<>());
    if (map.get(letter).get(number) == null)
        map.get(letter).put(number, 1);
    else
        map.get(letter).put(number, map.get(letter).get(number) + 1);
}).iterate();
It's approximately the same speed as
g.V().values('number', 'letter').groupCount();

Related

What is the most efficient way to join one list to another in kotlin?

I start with a list of integers from 1 to 1000 in listOfRandoms.
I would like to left join on random from the createDatabase list.
I am currently using a find{} statement within a loop to do this but feel like this is too heavy. Is there not a better (quicker) way to achieve same result?
Pseudo code
data class DatabaseRow(
val refKey: Int,
val random: Int
)
fun main() {
val createDatabase = (1..1000).map { i -> DatabaseRow(i, Random()) }
val listOfRandoms = (1..1000).map { j ->
val lookup = createDatabase.find { it.refKey == j }
lookup.random
}
}
As mentioned in comments, the question seems to be mixing up database and programming ideas, which isn't helping.
And it's not entirely clear which parts of the code are needed, and which can be replaced. I'm assuming that you already have the createDatabase list, but that listOfRandoms is open to improvement.
The ‘pseudo’ code compiles fine except that:
You don't give an import for Random(), but none of the likely ones return an Int. I'm going to assume that should be kotlin.random.Random.nextInt().
And because lookup is nullable, you can't simply call lookup.random; a quick fix is lookup!!.random, but it would be safer to handle the null case properly with e.g. lookup?.random ?: -1. (That's irrelevant, though, given the assumption above.)
I think the general solution is to create a Map. This can be done very easily from createDatabase, by calling associate():
val map = createDatabase.associate{ it.refKey to it.random }
That should take time roughly proportional to the size of the list. Looking up values in the map is then very efficient (approx. constant time):
map[someKey]
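Putting those two steps together, a minimal sketch of the whole lookup might look like this. It assumes kotlin.random.Random.nextInt() for the random values, as discussed above; the joinRandoms function name is invented for the example.

```kotlin
import kotlin.random.Random

data class DatabaseRow(val refKey: Int, val random: Int)

// Build the lookup table once, then resolve every key with a map lookup
// instead of scanning the list with find {} for each key.
fun joinRandoms(rows: List<DatabaseRow>, keys: Iterable<Int>): List<Int?> {
    val randomByKey = rows.associate { it.refKey to it.random }
    return keys.map { randomByKey[it] } // null where a key has no matching row
}

fun main() {
    val createDatabase = (1..1000).map { i -> DatabaseRow(i, Random.nextInt()) }
    val listOfRandoms = joinRandoms(createDatabase, 1..1000)
    println(listOfRandoms.take(3))
}
```

Returning Int? keeps the left-join behaviour visible: keys without a row come back as null rather than being dropped.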
In this case, that takes rather more memory than needed, because both keys and values are integers and will be boxed (stored as separate objects on the heap). Also, most maps use a hash table, which takes some memory.
Since the key is (according to comments) “an ascending list starting from a random number, like 18123..19123”, in this particular case the values can instead be stored in an IntArray without any boxing. As you say, array indexes start from 0, so using the key directly would need a huge array and use only the last few cells; but if you know the start key, you can simply subtract it from the key each time to get the array index.
Creating such an array would be a bit more complex, for example:
val minKey = createDatabase.minOf{ it.refKey }
val maxKey = createDatabase.maxOf{ it.refKey }
val array = IntArray(maxKey - minKey + 1)
for (row in createDatabase)
array[row.refKey - minKey] = row.random
You'd then access values with:
array[someKey - minKey]
…which is also constant-time.
Some caveats with this approach:
If createDatabase is empty, then minOf() will throw a NoSuchElementException.
If it has ‘holes’, omitting some keys inside that range, then the array will hold its default value of 0 for those keys. (You can change that by using the alternative IntArray constructor, which also takes a lambda giving the initial value.)
Trying to look up a value outside that range will give an ArrayIndexOutOfBoundsException.
Whether it's worth the extra complexity to save a bit of memory will depend on things like the size of the ‘database’, and how long it's in memory for; I wouldn't add that complexity unless you have good reason to think memory usage will be an issue.
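If you do go the array route, the caveats above can be handled in one place. The following is only a sketch: the OffsetLookup class and its MISSING sentinel are hypothetical names, and the sentinel means a stored value equal to Int.MIN_VALUE would be reported as absent.

```kotlin
data class DatabaseRow(val refKey: Int, val random: Int)

// Hypothetical wrapper around an IntArray indexed by (key - minKey).
class OffsetLookup(rows: List<DatabaseRow>) {
    // minOfOrNull avoids the NoSuchElementException on an empty list.
    private val minKey = rows.minOfOrNull { it.refKey } ?: 0
    private val values: IntArray

    init {
        val maxKey = rows.maxOfOrNull { it.refKey }
        // Fill with a sentinel so 'holes' can be told apart from a stored 0.
        values = if (maxKey == null) IntArray(0) else IntArray(maxKey - minKey + 1) { MISSING }
        for (row in rows) values[row.refKey - minKey] = row.random
    }

    // Returns null for holes and for keys outside the range, instead of throwing.
    operator fun get(key: Int): Int? =
        values.getOrNull(key - minKey)?.takeIf { it != MISSING }

    companion object {
        const val MISSING = Int.MIN_VALUE // arbitrary sentinel chosen for this sketch
    }
}

fun main() {
    val lookup = OffsetLookup(listOf(DatabaseRow(18123, 7), DatabaseRow(18125, 9)))
    println(lookup[18123]) // 7
    println(lookup[18124]) // null
}
```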

get every other index in for loop

I have an interesting issue. I have a string that's in HTML and I need to parse a table so that I can get the data I need out of that table and present it in a way that looks good on a mobile device. I use regex for this and it works just fine, but now I'm porting my code to Kotlin and the solution I have is not porting over well. Here is what the solution currently looks like:
var pointsParsing = Regex.Matches(htmlBody, "<td.*?>(.*?)</td>", RegexOptions.IgnoreCase | RegexOptions.Compiled);
var pointsSb = new StringBuilder();
for (var i = 0; i < pointsParsing.Count; i+= 2)
{
var pointsTitle = pointsParsing[i].Groups[1].Value.Replace("&amp;", "&");
var pointsValue = pointsParsing[i+1].Groups[1].Value;
pointsSb.Append($"{pointsTitle} {pointsValue} {pointsVerbiage}\n");
}
return pointsSb.ToString();
As you can see, on each pass through the loop I take two results from the regex search, so I tell the for loop to increment by two to avoid reading the same match twice.
However, I don't seem to have this ability in Kotlin. I know how to get the index in a for loop, but I have no idea how to tell it to step by 2 so that I don't accidentally pick up something I already parsed on the previous pass.
How would I tell the for loop to work the way I need it to in Kotlin?
You might be looking for chunked which lets you split an iterable into chunks of e.g. 2 elements:
ptsListResults.chunked(2).forEach { data -> // data is a list of (up to) two elements
val pointsTitle = data[0].groups[1]!!.value
val pointsValue = data[1].groups[1]!!.value
// etc
}
That's more explicit about breaking your list up into meaningful chunks and operating within the structure of those chunks, rather than manipulating indices.
There's also windowed which is a bit more complex and gives you more options, one of which is disallowing partial windows (i.e. chunks at the end that don't have the required number of elements). Probably doesn't apply here but just so's you know!
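For completeness, here is a small sketch of the windowed variant with partial windows disallowed, so a trailing cell without a partner is silently dropped. The pairCells function and the sample HTML are made up for the example; the regex is the one from the question.

```kotlin
// Pair up table cells two at a time; a trailing unpaired cell is dropped
// because partialWindows is false.
fun pairCells(html: String): List<Pair<String, String>> {
    val cell = Regex("<td.*?>(.*?)</td>", RegexOption.IGNORE_CASE)
    return cell.findAll(html)
        .map { it.groupValues[1] }
        .windowed(size = 2, step = 2, partialWindows = false)
        .map { (title, value) -> title to value }
        .toList()
}

fun main() {
    val html = "<tr><td>Gold</td><td>10</td></tr>" +
        "<tr><td>Silver</td><td>4</td></tr><tr><td>Odd</td></tr>"
    println(pairCells(html)) // [(Gold, 10), (Silver, 4)]
}
```

With chunked(2) the last chunk would instead contain the single leftover cell, and data[1] would throw.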
I found a solution that looks to work and thought I'd share.
Thanks to this SO answer, I see how you can skip over the indexes.
val pointsListSearch = "<td.*?>(.*?)</td>".toRegex()
val pointsListSearchResults = pointsListSearch.findAll(htmlBody)
val pointsSb = StringBuilder()
val ptsListResults = pointsListSearchResults.toList()
for (i in ptsListResults.indices step 2)
{
val pointsTitle = ptsListResults[i].groups[1]!!.value
val pointsValue = ptsListResults[i+1].groups[1]!!.value
pointsSb.append("${pointsTitle}: ${pointsValue}")
}
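One thing to watch with the index-stepping loop is an odd number of matches, where i + 1 would run past the end. A possible guard, sketched on a plain list of already-extracted cell strings (the formatPoints name is invented here), is to stop one short of the last index:

```kotlin
// Walk the cells two at a time by index. Ending the range before size - 1
// guarantees that cells[i + 1] exists, even when the count is odd.
fun formatPoints(cells: List<String>): String = buildString {
    for (i in 0 until cells.size - 1 step 2) {
        append("${cells[i]}: ${cells[i + 1]}\n")
    }
}

fun main() {
    print(formatPoints(listOf("Gold", "10", "Silver", "4", "Odd")))
    // Gold: 10
    // Silver: 4
}
```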

How to group a list by a value?

I have a list and I want to make an object of all the values that contain a certain id.
val list = listOf(A(id = 1), A(1), A(2), A(3), A(2))
From that list I want to make a new list of type Container
data class Container(
val id: Long, // The id passed into A.
val elementList: List<A> // The elements that contain the ID.
)
How can I do this in an efficient way without going O(n^2)?
You can use groupBy + map.
The implementation of groupBy is O(n) and the implementation of map is O(n) so the total runtime is O(2n) which is O(n).
list.groupBy { it.id }.map { (id, elementList) -> Container(id, elementList) }
Since this is so short and readable, I'd avoid further optimization unless it is strictly needed. If you do need more, you can also reduce the space cost, for example by avoiding the allocation of multiple intermediate lists.
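A self-contained version of the groupBy + map approach, assuming A is a simple data class holding the id (its definition isn't shown in the question, and toContainers is a name made up for the example):

```kotlin
// Assumed shape of A; the question does not show its definition.
data class A(val id: Long)

data class Container(
    val id: Long,             // The id passed into A.
    val elementList: List<A>  // The elements that contain the ID.
)

// One pass to bucket elements by id, one pass to wrap each bucket.
fun toContainers(list: List<A>): List<Container> =
    list.groupBy { it.id }.map { (id, elementList) -> Container(id, elementList) }

fun main() {
    val list = listOf(A(id = 1), A(1), A(2), A(3), A(2))
    toContainers(list).forEach(::println)
}
```

groupBy keeps the keys in the order they are first seen, so the containers come out ordered by first occurrence of each id.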

How to convert two lists into a list of pairs based on some predicate

I have a data class that describes a chef by their name and their skill level and two lists of chefs with various skill levels.
data class Chef(val name: String, val level: Int)
val listOfChefsOne = listOf(
Chef("Remy", 9),
Chef("Linguini", 7))
val listOfChefsTwo = listOf(
Chef("Mark", 6),
Chef("Maria", 8))
I'm to write a function that takes these two lists and creates a list of pairs
so that the skill levels of the two chefs in each pair add up to 15. The challenge is to do this using only built-in list functions and not for/while loops.
println(pairChefs(listOfChefsOne, listOfChefsTwo))
######################################
[(Chef(name=Remy, level=9), Chef(name=Mark, level=6)),
(Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]
As I mentioned previously I'm not to use any for or while loops in my implementation for the function. I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
I think the clue is in the question here!
I've tried using the forEach function to create a list containing all possible pairs between two lists, but from there I've gotten lost as to how I can filter out only the correct pairs.
There's a filter function that looks perfect for this…
To keep things clear, I'll split out a function for generating all possible pairs.  (This is my own, but bears a reassuring resemblance to part of this answer!  In any case, you said you'd already solved this bit.)
fun <A, B> Iterable<A>.product(other: Iterable<B>)
= flatMap{ a -> other.map{ b -> a to b }}
The result can then be:
val result = listOfChefsOne.product(listOfChefsTwo)
.filter{ (chef1, chef2) -> chef1.level + chef2.level == 15 }
Note that although this is probably the simplest and most readable way, it's not the most efficient for large lists.  (It takes time and memory proportional to the product of the sizes of the two lists.)  You could improve large-scale performance by using sequences (lazy streams), which would take the same time but constant memory. But for this particular case, it might be even better to group one of the lists by level; then, for each element of the other list, you could directly look up a Chef with 15 - its level.  (That would take time proportional to the sum of the sizes of the two lists, and space proportional to the size of the grouped list.)
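The group-by-level idea could be sketched like this. It is one possible reading of that suggestion rather than a tested, tuned implementation; the target parameter is added for illustration, and the second list is the one that gets grouped so that the output follows the order of the first.

```kotlin
data class Chef(val name: String, val level: Int)

// Index the second list by level once, then look up the complementary
// level for each chef in the first list. No for/while loops involved.
fun pairChefs(first: List<Chef>, second: List<Chef>, target: Int = 15): List<Pair<Chef, Chef>> {
    val secondByLevel = second.groupBy { it.level }
    return first.flatMap { chef1 ->
        secondByLevel[target - chef1.level].orEmpty().map { chef2 -> chef1 to chef2 }
    }
}

fun main() {
    val listOfChefsOne = listOf(Chef("Remy", 9), Chef("Linguini", 7))
    val listOfChefsTwo = listOf(Chef("Mark", 6), Chef("Maria", 8))
    println(pairChefs(listOfChefsOne, listOfChefsTwo))
}
```

Note that if many chefs share complementary levels, the number of output pairs itself can still grow with the product of the list sizes.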
Here is a pretty simple naive solution:
val result = listOfChefsOne.flatMap { chef1 ->
listOfChefsTwo.mapNotNull { chef2 ->
if (chef1.level + chef2.level == 15) {
chef1 to chef2
} else {
null
}
}
}
println(result) // prints [(Chef(name=Remy, level=9), Chef(name=Mark, level=6)), (Chef(name=Linguini, level=7), Chef(name=Maria, level=8))]

Lucene SpellChecker Prefer Permutations or special scoring

I'm using Lucene.NET 3.0.3
How can I modify the scoring of the SpellChecker (or queries in general) using a given function?
Specifically, I want the SpellChecker to score any results that are permutations of the searched word higher than the rest of the suggestions, but I don't know where this should be done.
I would also accept an answer explaining how to do this with a normal query. I have the function, but I don't know if it would be better to make it a query or a filter or something else.
I think the best way to go about this would be to use a customized Comparator in the SpellChecker object.
Check out the source code of the default comparator here:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-spellchecker/3.6.0/org/apache/lucene/search/spell/SuggestWordScoreComparator.java?av=f
Pretty simple stuff, should be easy to extend if you already have the algorithm you want to use to compare the two Strings.
Then you can set it up to use your comparator with SpellChecker.SetComparator.
I think I mentioned the possibility of using a Filter for this in a previous question of yours, but looking at it a bit more, I don't think that's really the right way to go.
EDIT---
Indeed, no Comparator is available in 3.0.3, so I believe you'll need to access the scoring through a StringDistance object. The Comparator would be nicer, since the scoring has already been applied and is passed into it for you to do with as you please. Extending a StringDistance may be a bit less clean, since you will have to apply your rules as a part of the score.
You'll probably want to extend LevensteinDistance (source code), which is the default StringDistance implementation, but of course, feel free to try JaroWinklerDistance as well. Not really that familiar with the algorithm.
Primarily, you'll want to override getDistance and apply your scoring rules there, after getting a baseline distance from the standard (parent) implementation's getDistance call.
I would probably implement something like this (assuming you have a helper method boolean isPermutation(String, String)):
class CustomDistance extends LevensteinDistance {
    @Override
    public float getDistance(String target, String other) {
        // Baseline score from the standard (parent) implementation.
        float distance = super.getDistance(target, other);
        if (isPermutation(target, other)) {
            // Move the score halfway towards 1 for permutations.
            distance = distance + (1 - distance) / 2;
        }
        return distance;
    }
}
This calculates a score half again closer to 1 for a result that is a permutation (that is, if the default algorithm gave distance = .6, this would return distance = .8, etc.). Distances returned must be between 0 and 1. My example is just one idea of a possible scoring for it, but you will likely need to tune your algorithm somewhat. I'd be cautious about simply returning 1.0 for all permutations, since that would be certain to prefer 'isews' over 'weis' when looking with 'weiss', and it would also lose the ability to sort the closeness of different permutations ('isews' and 'wiess' would be equal matches to 'weiss' in that case).
Once you have your custom StringDistance, it can be passed to SpellChecker either through the constructor or with SpellChecker.setStringDistance.
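To make the arithmetic of that rule concrete, here is a small standalone Kotlin sketch. It does not touch Lucene at all; isPermutation and boost are illustrative helpers written for this example.

```kotlin
// Two words are permutations of each other when their sorted,
// lower-cased characters are identical.
fun isPermutation(a: String, b: String): Boolean =
    a.lowercase().toCharArray().sorted() == b.lowercase().toCharArray().sorted()

// Move a score in [0, 1] halfway towards 1.0, as in the rule described above.
fun boost(distance: Float): Float = distance + (1 - distance) / 2

fun main() {
    println(isPermutation("weiss", "isews")) // true
    println(isPermutation("weiss", "weis"))  // false
    println(boost(0.6f))                     // roughly 0.8
}
```

Because the boost is relative to the baseline score, two different permutations still rank by how close they were to begin with.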
From femtoRgon's advice, here's what I ended up doing:
public class PermutationDistance: SpellChecker.Net.Search.Spell.StringDistance
{
public PermutationDistance()
{
}
public float GetDistance(string target, string other)
{
LevenshteinDistance l = new LevenshteinDistance();
float distance = l.GetDistance(target, other);
distance = distance + ((1 - distance) * PermutationScore(target, other));
return distance;
}
public bool IsPermutation(string a, string b)
{
char[] ac = a.ToLower().ToCharArray();
char[] bc = b.ToLower().ToCharArray();
Array.Sort(ac);
Array.Sort(bc);
a = new string(ac);
b = new string(bc);
return a == b;
}
public float PermutationScore(string a, string b)
{
char[] ac = a.ToLower().ToCharArray();
char[] bc = b.ToLower().ToCharArray();
Array.Sort(ac);
Array.Sort(bc);
a = new string(ac);
b = new string(bc);
LevenshteinDistance l = new LevenshteinDistance();
return l.GetDistance(a, b);
}
}
Then:
_spellChecker.setStringDistance(new PermutationDistance());
List<string> suggestions = _spellChecker.SuggestSimilar(word, 10).ToList();
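To see how this combined formula behaves, here is a standalone Kotlin sketch with its own simple Levenshtein similarity. The normalisation (1 minus edit distance divided by the longer length) is an assumption about how the library's distance is scaled, and similarity, sortedChars and permutationDistance are names invented for the sketch.

```kotlin
// Similarity in [0, 1]: 1 minus the edit distance over the longer length.
// This normalisation is an assumption, not taken from the Lucene source.
fun similarity(a: String, b: String): Float {
    if (a.isEmpty() && b.isEmpty()) return 1f
    var prev = IntArray(b.length + 1) { it }
    for (i in 1..a.length) {
        val cur = IntArray(b.length + 1)
        cur[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            cur[j] = minOf(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        }
        prev = cur
    }
    return 1f - prev[b.length].toFloat() / maxOf(a.length, b.length)
}

// Lower-case the word and sort its characters, as PermutationScore does.
fun sortedChars(s: String): String =
    s.lowercase().toCharArray().sorted().joinToString("")

// Same shape as GetDistance above: baseline plus a share of the remaining gap.
fun permutationDistance(target: String, other: String): Float {
    val distance = similarity(target, other)
    val permutationScore = similarity(sortedChars(target), sortedChars(other))
    return distance + (1 - distance) * permutationScore
}

fun main() {
    println(permutationDistance("weiss", "wiess"))
    println(permutationDistance("weiss", "weis"))
}
```

Under these assumptions an exact permutation gets a permutation score of 1 and therefore a final score of about 1.0, which is the situation the earlier answer suggested being cautious about; near-permutations land somewhere between their baseline score and 1.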