Hi guys, I have the following question:
Assume tables foo(a int, b int) and bar(a int, b int), and assume you are given a stream 'TableReader' that reads rows from a table and has the following method:
tr.next() returns the next row (of type 'row') from the stream, or null if there are no more rows.
Assume columns can be accessed using row[columnName]. For example, to read rows from foo, you do the following:
foo_stream = TableReader('foo');
row = foo_stream.next();
row['a'] will return the value of column a and row['b'] will return the value of column b.
Write pseudocode to compute the results of the following SQL query, which should return a list of rows:
select foo.a, foo.b, bar.a, bar.b
from foo, bar
where foo.a = bar.a
and foo.b <=100;
Can anyone help me with this?
The solution I tried is:
foo_stream = TableReader('foo');
while ((row_foo = foo_stream.next()) != null)
{
    // re-open bar for every foo row, since a stream can only be read once
    bar_stream = TableReader('bar');
    while ((row_bar = bar_stream.next()) != null)
    {
        if (row_foo['a'] == row_bar['a'] AND row_foo['b'] <= 100)
        {
            print row_foo['a'], row_foo['b'], row_bar['a'], row_bar['b'];
        }
    }
}
But the above solution has complexity O(n^2); any better solution is appreciated.
Probably something like this:
foo_stream = TableReader('foo');
foo_hash = new HashMap();   // maps a value of foo.a to the list of foo rows that have it
while ((foo_row = foo_stream.next()) != null)
{
    if (foo_row['b'] <= 100)
    {
        if (!foo_hash.containsKey(foo_row['a'])) { foo_hash.put(foo_row['a'], new List()); }
        foo_hash.get(foo_row['a']).add(foo_row);
    }
}
bar_stream = TableReader('bar');
while ((bar_row = bar_stream.next()) != null)
{
    if (foo_hash.containsKey(bar_row['a']))
    {
        for each foo_row in foo_hash.get(bar_row['a'])
        {
            print foo_row['a'], foo_row['b'], bar_row['a'], bar_row['b'];
        }
    }
}
This is called the HashMap method (more commonly known as a hash join). It is O(n) in the total number of rows read, plus the number of result rows, though it uses a relatively large amount of memory because every qualifying foo row is held in the map.
There is also the MergeSort method (a sort-merge join), which is O(N log N) unless the streams are already sorted on [a]. It requires somewhat less memory than the HashMap method.
Finally, there is the method that you used, which is called the Nested Loops method and, as you said, is O(n^2), but has the advantage of needing very little memory.
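As a runnable illustration of the HashMap method described above, here is a minimal Python sketch. The TableReader class and its in-memory TABLES data are assumptions made up for this example; only the next() contract (a row, or None when the stream is exhausted) comes from the question.

```python
from collections import defaultdict


class TableReader:
    """Stand-in for the stream in the question: next() returns a row or None."""

    # Hypothetical in-memory tables, used only to make the sketch runnable.
    TABLES = {
        "foo": [{"a": 1, "b": 50}, {"a": 1, "b": 200}, {"a": 2, "b": 100}, {"a": 3, "b": 7}],
        "bar": [{"a": 1, "b": 9}, {"a": 2, "b": 8}, {"a": 2, "b": 6}, {"a": 4, "b": 5}],
    }

    def __init__(self, table_name):
        self._rows = iter(self.TABLES[table_name])

    def next(self):
        return next(self._rows, None)


def hash_join():
    # Build phase: index the qualifying foo rows by column a.
    foo_by_a = defaultdict(list)
    foo_stream = TableReader("foo")
    while (foo_row := foo_stream.next()) is not None:
        if foo_row["b"] <= 100:
            foo_by_a[foo_row["a"]].append(foo_row)

    # Probe phase: look up each bar row's a value in the index.
    results = []
    bar_stream = TableReader("bar")
    while (bar_row := bar_stream.next()) is not None:
        for foo_row in foo_by_a.get(bar_row["a"], []):
            results.append((foo_row["a"], foo_row["b"], bar_row["a"], bar_row["b"]))
    return results


print(hash_join())
```

Each stream is read exactly once, which is where the linear running time comes from; the price is the dictionary holding the filtered foo rows.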
I apologize for the title, I don't exactly know how to word it. But essentially, this is a graph-type query but I know RavenDB's graph functionality will be going away so this probably needs to be solved with Javascript.
Here is the scenario:
I have a bunch of documents of different types, call them A, B, C, D. Each of these document types has some common properties. The one that I'm interested in right now is "Owner". The owner field is an ID which points to one of two other document types; it can be a Group or a User.
The Group document has a 'Members' field containing IDs, each of which points to either a User or another Group.
It's worth noting that the documents in play have custom IDs that begin with their entity type. For example, Users and Groups begin with user: and group: respectively. Example IDs look like this: user:john@castleblack.com or group:the-nights-watch. This comes into play later.
What I want to be able to do is the following type of query:
"Given that I have either a group id or a user id, return all documents of type a, b, or c where the group/user id is equal to or is a descendant of the document's owner."
In other words, I need to be able to return all documents that are owned by a particular user or group either explicitly or implicitly through a hierarchy.
I've considered solving this a couple different ways with no luck. Here are the two approaches I've tried:
Using a function within a query
With Dejan's help in an email thread, I was able to devise a function that would walk its way down the ownership graph. What this attempted to do was build a flat array of IDs which represented explicit and implicit owners (i.e. root + descendants):
declare function hierarchy(doc, owners){
owners = owners || [];
while(doc != null) {
let ownerId = id(doc)
if(ownerId.startsWith('user:')) {
owners.push(ownerId);
} else if(ownerId.startsWith('group:')) {
owners.push(ownerId);
doc.Members.forEach(m => {
let owner = load(m, 'Users') || load(m, 'Groups');
owners = hierarchy(owner, owners);
});
}
}
return owners;
}
I had two issues with this. First, I don't actually know how to use this in a query, lol. I tried to use it as part of the where clause, but apparently that's not allowed:
from @all_docs as d
where hierarchy(d) = 'group:my-group-d'
// error: method hierarchy not allowed
Second, if I tried anything in the select statement, I got an error that I had exceeded the number of allowed statements.
As a custom index
I tried the same idea through a custom index. Essentially, I tried to create an index that would produce an array of IDs using roughly the same function as above, so that I could just query where my ID was in that array:
map('@all_docs', function(doc) {
function hierarchy(n, graph) {
while(n != null) {
let ownerId = id(n);
if(ownerId.startsWith('user:')) {
graph.push(ownerId);
return graph;
} else if(ownerId.startsWith('group:')){
graph.push(ownerId);
n.Members.forEach(g => {
let owner = load(g, 'Groups') || load(g, 'Users');
hierarchy(owner, graph);
});
return graph;
}
}
}
function distinct(value, index, self){ return self.indexOf(value) === index; }
let ownerGraph = []
if(doc.Owner) {
let owner = load(doc.Owner, 'Groups') || load(doc.Owner, 'Users');
ownerGraph = hierarchy(owner, ownerGraph).filter(distinct);
}
return { Owners: ownerGraph };
})
// error: recursion is not allowed by the javascript host
The problem with this is that I'm getting an error that recursion is not allowed.
So I'm stumped now. Am I going about this wrong? I feel like this could be a subquery of sorts or a filter by function, but I'm not sure how to do that either. Am I going to have to do this in two separate queries (i.e. two round-trips), one to get the IDs and the other to get the docs?
Update 1
I've revised my attempt at the index to the following and I'm not getting the recursion error anymore, but, assuming my queries are correct, it's not returning anything:
// Entity/ByOwnerGraph
map('@all_docs', function(doc) {
function walkGraph(ownerId) {
let owners = []
let idsToProcess = [ownerId]
while(idsToProcess.length > 0) {
let current = idsToProcess.shift();
if(current.startsWith('user:')){
owners.push(current);
} else if(current.startsWith('group:')) {
owners.push(current);
let group = load(current, 'Groups')
if(!group) { continue; }
idsToProcess.concat(group.Members)
}
}
return owners;
}
let owners = [];
if(doc.Owner) {
owners.concat(walkGraph(doc.Owner))
}
return { Owners: owners };
})
// query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners = "group:my-group-id"
// alternate query (no results)
from index Entity/ByOwnerGraph as x
where x.Owners ALL IN ("group:my-group-id")
I still can't use this approach in a query either as I get the same error that there are too many statements.
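For reference, the worklist traversal that the revised index is aiming for can be sketched outside RavenDB. The sketch below is plain Python, not RavenDB index code; the GROUPS dictionary, the load_group helper, and the extra IDs in it are hypothetical stand-ins for load(). One thing it highlights that is worth checking in the JavaScript above: Array.prototype.concat returns a new array and leaves the original unchanged, so the results of idsToProcess.concat(...) and owners.concat(...) are discarded unless they are assigned back (or push(...) is used instead).

```python
from collections import deque

# Hypothetical data standing in for the Group documents; keys are document IDs.
GROUPS = {
    "group:the-nights-watch": {"Members": ["user:john@castleblack.com", "group:rangers"]},
    "group:rangers": {"Members": ["user:benjen@castleblack.com", "group:the-nights-watch"]},
}


def load_group(doc_id):
    """Stand-in for load(id, 'Groups'): returns the group or None."""
    return GROUPS.get(doc_id)


def walk_graph(owner_id):
    """Return owner_id plus every user/group reachable through Members."""
    owners = []
    seen = set()
    ids_to_process = deque([owner_id])
    while ids_to_process:
        current = ids_to_process.popleft()
        if current in seen:
            continue  # guards against cycles between groups
        seen.add(current)
        owners.append(current)
        if current.startswith("group:"):
            group = load_group(current)
            if group is not None:
                # extend() mutates the queue in place, unlike JavaScript's concat()
                ids_to_process.extend(group["Members"])
    return owners


print(walk_graph("group:the-nights-watch"))
```

The seen set is an addition of this sketch: without it, two groups that list each other as members would keep the loop running forever.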
So let's all assume that column B is filled with multiple, short statements. These statements may be used more than once, not at all, or just once throughout the column. I want to be able to read what's in each cell of column B and assign a category to it in column F using the Google Sheets script editor. I'll include some pseudo-code of how I would do something like this normally.
for (var i = 0; i < statements.length; i++) {
if (statements[i] == 'Description One') {
category[i] = 'Category One';
}
else if (statements[i] == 'Description Two') {
category[i] = 'Category Two';
}
// and so on for all known categories....
}
How do I go about accessing a cell for a read and accessing a different cell for a write?
Thanks in advance for the help!
OK, so after a little more thought on the subject, I've arrived at a solution. It's super simple, albeit tedious:
function assignCategory(description) {
if (description == 'Description One') {
return 'Category One';
}
// and so on for all known categories
}
Hopefully someone will see this and be helped. If you guys think of a more efficient and easier-to-maintain way of doing this, by all means do chime in.
Assuming a sheet such as this one, which has a header and six different columns (where B is the description, and F the category); you could use a dictionary to translate your values as follows:
// (description -> category) dictionary
var translations = {
"cooking": "Cooking",
"sports": "Sport",
"leisure": "Leisure",
"music": "Music",
"others": "Other"
}
function assignCategories() {
var dataRange = SpreadsheetApp.getActiveSheet().getDataRange();
for (var i=2; i<=dataRange.getNumRows(); i++) {
var description = dataRange.getCell(i, 2).getValue();
var category = translations[description];
dataRange.getCell(i, 6).setValue(category);
}
}
In case you need additional rules (e.g. descriptions that contain cricket must be classified as sport), you could get the desired results by implementing your own custom function and using string functions (such as indexOf) or regular expressions.
Using indexOf
// (description -> category) dictionary
var translations = {
"cooking": "Cooking",
"sports": "Sport",
"leisure": "Leisure",
"music": "Music",
"others": "Other"
}
function assignCategories() {
var dataRange = SpreadsheetApp.getActiveSheet().getDataRange();
for (var i=2; i<=dataRange.getNumRows(); i++) {
var description = dataRange.getCell(i, 2).getValue()
var category = assignCategory(description);
if (category) dataRange.getCell(i, 6).setValue(category);
}
}
function assignCategory(description) {
description = description.toLowerCase();
var keys = Object.keys(translations);
for (var i=0; i<keys.length; i++) {
var currentKey = keys[i];
if (description.indexOf(currentKey) > -1)
return translations[currentKey];
}
}
This version is a bit more sophisticated. It lowercases the 'description' of each row so that it compares better against your dictionary, and it uses indexOf to check whether a 'translation key' appears anywhere in the description, rather than checking for an exact match.
You should be aware, however, that this method will be considerably slower, and that the script may time out (see GAS Quotas). If that becomes a problem, you could implement a way to 'resume' the script so that you can re-run it and continue where it left off.
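The keyword lookup above can also be sketched on plain in-memory rows, which is roughly what you would have after reading the whole range once instead of cell by cell (in Apps Script, Range.getValues() and Range.setValues() read and write a whole range in one call). The following is a Python sketch of that idea, not Apps Script; the row layout (column B at index 1, column F at index 5) is an assumption taken from the description above.

```python
# (keyword -> category) dictionary, mirroring the translations object above
TRANSLATIONS = {
    "cooking": "Cooking",
    "sports": "Sport",
    "leisure": "Leisure",
    "music": "Music",
    "others": "Other",
}


def assign_category(description):
    """Return the category of the first keyword found in description, else None."""
    lowered = description.lower()
    for keyword, category in TRANSLATIONS.items():
        if keyword in lowered:
            return category
    return None


def assign_categories(rows):
    """rows is a list of row lists; column B is index 1 and column F is index 5."""
    for row in rows:
        category = assign_category(row[1])
        if category is not None:
            row[5] = category  # rows without a match keep their current value
    return rows
```

Working on the whole block of values and writing it back once keeps the number of spreadsheet calls constant, which is usually what matters for the time limit.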
I'm trying to convert my nested for loop to asSequence in Kotlin. My goal is to update the value of each object in my array with the value of the object that has the same key in another array.
nested for loop:
val myFields = getMyFields()
val otherFields = getOtherFields()
for (myField in myFields) { // loop through my fields
for (otherField in otherFields) { // find the field with the same key
if (myField.key == otherField.key) { // if the keys match, update the value
val updatedMyField = myField.copy(value = otherField.value)
myFields[myFields.indexOf(myField)] = updatedMyField // update my field value
break
}
}
}
What I've tried:
val updatedMyFields = getMyFields().asSequence()
.map { myField ->
getOtherFields().asSequence()
.map { otherField ->
if (myField.key == otherField.key) {
return@map otherField.value
} else {
return@map ""
}
}
.filter { it?.isNotEmpty() == true }
.first()?.map { myField.copy(value = it.toString()) }
}
.toList()
but this does not compile as it will return List<List<MyField>>.
I'm just looking for something much cleaner for this.
As comments suggest, this would probably be much more efficient with a Map.
(More precisely, a map solution would take time proportional to the sum of the list lengths, while the nested for loop takes time proportional to their product — which gets bigger much faster.)
Here's one way of doing that:
val otherFields = getOtherFields().associate{ it.key to it.value }
val myFields = getMyFields().map {
val otherValue = otherFields[it.key]
if (otherValue != null) it.copy(value = otherValue) else it
}
The first line creates a Map from the ‘other fields’ keys to their values. The rest then uses it to create a new list from ‘my fields’, substituting the values from the ‘other fields’ where present.
I've had to make assumptions about the types &c, since the code in the question is incomplete, but this should do the same. Obviously, you can change how it merges the values by amending the it.copy().
There are likely to be even simpler and more efficient ways, depending on the surrounding code. If you expanded it into a Minimal, Complete, and Verifiable Example — in particular, one that illustrates how you already use a Map, as per your comment — we might be able to suggest something better.
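To make the complexity argument concrete, here is the same lookup-table idea as a small Python sketch. The Field class, its key and value attributes, and the merge_values name are assumptions standing in for the question's types, which are not shown in full.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Field:
    key: str
    value: str


def merge_values(my_fields, other_fields):
    # One pass over other_fields builds the lookup table ...
    other_by_key = {f.key: f.value for f in other_fields}
    # ... and one pass over my_fields applies it, so the cost is the sum of the lengths.
    return [
        replace(f, value=other_by_key[f.key]) if f.key in other_by_key else f
        for f in my_fields
    ]
```

One behavioural difference to keep in mind: if the other list contains the same key twice, a lookup table built this way keeps the last value, whereas the nested loop with break uses the first match.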
Why do you want to use asSequence()? You can go for something like this:
val myFields = getMyFields()
val otherFields = getOtherFields()
myFields.forEach{firstField ->
otherFields.forEach{secondField ->
if (firstField.key == secondField.key) {
myFields[myFields.indexOf(firstField)] = firstField.copy(value = secondField.value)
}
}
}
This will do the same job as your nested for loop, and it's easier to read, understand and maintain than your nested asSequence().
In the real project, TEST_TABLE would contain many TEST_TABLE_NESTED tables, each with its own testVariable and a bunch of testScript tables. The test function from testScript would be called from C++ code, and the TEST_TABLE_NESTED tables would be added automatically from C++ code too.
TEST_TABLE =
{
TEST_TABLE_NESTED =
{
testVariable = 5,
testScript =
{
test = function()
print(testVariable, "hello") --How to access 'testVariable'?
end
}
}
}
EDIT :
This is the actual scenario of using this script:
GameObjectScriptTables =
{
GameObject_1 = --Container of scripts corresponding to some gameObject
{
gameObjectOwner = actual_object_passed_from_c++, --This is an actual object passed from c++
GameObjectScript_1 = --This is a script with an update(dt) method which will be called somewhere in the C++ code
{
update = function(dt)
--here I want to use some data from gameObjectOwner like position or velocity
end
}
},
GameObject_2 =
{
gameObjectOwner = actual_object_passed_from_c++,
GameObjectScript_1 =
{
update = function(dt)
--here I want to use some data from gameObjectOwner like position or velocity
end
},
GameObjectScript_2 =
{
update = function(dt)
--here I want to use some data from gameObjectOwner like position or velocity
end
}
}
--And so on
}
The idea is that there is some testVariable object (passed from C++) whose data is used all over TEST_TABLE_NESTED. To me, the above example looks natural for this task, but it prints nil instead of 5. So how can I access testVariable from testScript without writing out a full path like TEST_TABLE.TEST_TABLE_NESTED.testVariable?
You're asking for something like a "parent" pointer, which tells table B about table A, but that doesn't exist. Internally, the only association they have is that one of A's values happens to be B, but any number of tables could contain B as a value. Which is B's parent?
If you want B to know about A, you'll need to tell it. You can add an extra parameter to update which receives the game owner object, or update can be a closure which contains the game owner as a bound variable, so on and so forth.
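The two options just mentioned (an extra parameter, or a closure that captures the owner) work the same way in most languages with first-class functions. Here is a minimal sketch of both in Python rather than Lua; the owner dictionary with position and velocity keys and both function names are made up for the illustration.

```python
def make_script(owner):
    """Option 1: return an update function that closes over its owner."""
    def update(dt):
        # owner is a bound variable of the closure; no "parent" lookup is needed
        owner["position"] += owner["velocity"] * dt
        return owner["position"]
    return update


def update_with_owner(owner, dt):
    """Option 2: the caller passes the owner in as an extra parameter."""
    owner["position"] += owner["velocity"] * dt
    return owner["position"]


game_object = {"position": 0.0, "velocity": 2.0}
update = make_script(game_object)
print(update(0.5))
print(update_with_owner(game_object, 0.5))
```

Either way the script only holds or receives a reference to the owner, not a copy of it.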
I made it work by providing a gameObjectOwner instance for each GameObjectScript_N. However, I don't know whether this is an expensive solution or not.
I need some hibernate/SQL help, please. I'm trying to generate a report against an accounting database. A commission order can have multiple account entries against it.
class CommissionOrderDAO {
int id
String purchaseOrder
double bookedAmount
Date customerInvoicedDate
String state
static hasMany = [accountEntries: AccountEntryDAO]
SortedSet accountEntries
static mapping = {
version false
cache usage: 'read-only'
table 'commission_order'
id column:'id', type:'integer'
purchaseOrder column: 'externalId'
bookedAmount column: 'bookedAmount'
customerInvoicedDate column: 'customerInvoicedDate'
state column : 'state'
accountEntries sort : 'id', order : 'desc'
}
...
}
class AccountEntryDAO implements Comparable<AccountEntryDAO> {
int id
Date eventDate
CommissionOrderDAO commissionOrder
String entryType
String description
double remainingPotentialCommission
static belongsTo = [commissionOrder : CommissionOrderDAO]
static mapping = {
version false
cache usage: 'read-only'
table 'account_entry'
id column:'id', type:'integer'
eventDate column: 'eventDate'
commissionOrder column: 'commissionOrder'
entryType column: 'entryType'
description column: 'description'
remainingPotentialCommission formula : SQLFormulaUtils.AccountEntrySQL.REMAININGPOTENTIALCOMMISSION_FORMULA
}
....
}
The criteria for the report is that the commissionOrder.state==open and the commissionOrder.customerInvoicedDate is not null. And the account entries in the report should be between the startDate and the endDate and with remainingPotentialCommission > 0.
I'm looking to display information on the CommissionOrder mainly (and to display account entries on that commission order between the dates), but when I use the following projection:
def results = accountEntryCriteria.list {
projections {
like ("entryType", "comm%")
ge("eventDate", beginDate)
le("eventDate", endDate)
gt("remainingPotentialCommission", 0.0099d)
and {
commissionOrder {
eq("state", "open")
isNotNull("customerInvoicedDate")
}
}
}
order("id", "asc")
}
I get the correct accountEntries with the proper commissionOrders, but I'm going about it backwards: I have loads of accountEntries which can reference the same commissionOrder. But when I look at the commissionOrders that I've retrieved, each one has ALL its accountEntries, not just the accountEntries between the dates.
I then loop through the results, get the commissionOrder from the accountEntriesList, and remove accountEntries on that commissionOrder after the end date to get the "snapshot" in time that I need.
def getCommissionOrderListByRemainingPotentialCommissionFromResults(results, endDate) {
log.debug("begin getCommissionOrderListByRemainingPotentialCommissionFromResults")
int count = 0;
List<CommissionOrderDAO> commissionOrderList = new ArrayList<CommissionOrderDAO>()
if (results) {
CommissionOrderDAO[] commissionOrderArray = new CommissionOrderDAO[results?.size()];
Set<CommissionOrderDAO> coDuplicateCheck = new TreeSet<CommissionOrderDAO>()
for (ae in results) {
if (!coDuplicateCheck.contains(ae?.commissionOrder?.purchaseOrder) && ae?.remainingPotentialCommission > 0.0099d) {
CommissionOrderDAO co = ae?.commissionOrder
CommissionOrderDAO culledCO = removeAccountEntriesPastDate(co, endDate)
def lastAccountEntry = culledCO?.accountEntries?.last()
if (lastAccountEntry?.remainingPotentialCommission > 0.0099d) {
commissionOrderArray[count++] = culledCO
}
coDuplicateCheck.add(ae?.commissionOrder?.purchaseOrder)
}
}
log.debug("Count after clean is ${count}")
if (count > 0) {
commissionOrderList = Arrays.asList(ArrayUtils.subarray(commissionOrderArray, 0, count))
log.debug("commissionOrderList size = ${commissionOrderList?.size()}")
}
}
log.debug("end getCommissionOrderListByRemainingPotentialCommissionFromResults")
return commissionOrderList
}
Please don't think I'm under the impression that this isn't a Charlie Foxtrot. The query itself doesn't take very long, but the cull process takes over 35 minutes. Right now, it's "manageable" because I only have to run the report once a month.
I need to let the database handle this processing (I think), but I couldn't figure out how to manipulate hibernate to get the results I want. How can I change my criteria?
Try to narrow down the bottleneck of that process. If you have a lot of data, then maybe this check is expensive:
coDuplicateCheck.contains(ae?.commissionOrder?.purchaseOrder)
On a TreeSet, contains has O(log n) complexity. You could use a HashSet, or a Map keyed by purchase order, to get O(1) lookups on average.
The second thought is that ae?.commissionOrder?.purchaseOrder may be loaded from the database lazily each time it is accessed. Try turning on query logging and check that you don't have dozens of queries running inside this processing function.
Finally, and again, I would suggest narrowing down which part is the most expensive and wastes the most time.
This plugin may be helpful.