Maintaining auto ranking as a column in MongoDB - sql

I am using MongoDB as my database.
I have data which contains rank and name as columns. A new row can be inserted with a rank that is different from the existing ranks, or with a rank that already exists.
If the rank already exists, the ranks of the other rows must be adjusted.
Rows ranked at or below the inserted rank (i.e. with a rank number greater than or equal to it) must have their rank incremented by one, while higher-ranked rows stay as they are.
The feature is something like a numbered list in MS Word-type applications, where inserting a row in between adjusts the numbering of the rows below it.
Rank 1 is the highest rank.
For e.g. there are 3 rows
Name Rank
A 1
B 2
C 3
Now I want to insert a row with D as name and 2 as rank. After the insert, the DB should look like below:
Name Rank
A 1
B 3
C 4
D 2
I could probably achieve this with database triggers by updating the other rows.
I have a couple of questions:
(a) Is there a better way than using a database trigger for this kind of scenario? Updating all the rows might be a time-consuming job.
(b) Does MongoDB support database triggers natively?
Best Regards,
Saurav

No, MongoDB does not provide triggers (yet). Also, I don't think a trigger is really a great way to achieve this.
So I would just like to throw out some ideas and see if they make sense.
Approach 1
Maybe instead of disturbing that many documents, you can create a collection with only one document (let's call the collection ranking). In that document, have an array field called ranks. Since it's an array, it already maintains a sequence.
{
_id : "RANK",
"ranks" : ["A","B","C"]
}
Now if you want to add D to this ranking at the 2nd position:
db.ranking.update({_id:"RANK"},{$push : {"ranks":{$each : ["D"],$position:1}}});
It would add D at index 1, which is the 2nd position considering the index starts at 0.
{
_id : "RANK",
"ranks" : ["A","D","B","C"]
}
But there is a catch: what if you want to move C from the 4th position to the 1st? You need to remove it from the end and put it at the beginning. I am fairly sure both operations can't be achieved in a single update (I didn't dig into the options much), so we can run two queries:
db.ranking.update({_id:"RANK"},{$pull : {"ranks": "C"}});
db.ranking.update({_id:"RANK"},{$push : {"ranks":{$each : ["C"],$position:0}}});
Then it would be like
{
_id : "RANK",
"ranks" : ["C","A","D","B"]
}
maintaining the rest of the sequence.
Now you would probably want to store ids instead of A, B, C, etc. One document can be 16 MB, so this ranks array can store more than 1.3 million id entries if each id is a 12-byte MongoDB ObjectId. If that is not enough, there is still the option of follow-up document(s) with further ranking.
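As an illustration, here is a minimal pymongo sketch of this approach (the ranking collection follows the example above; the database name and local connection are assumptions):
from pymongo import MongoClient

client = MongoClient()  # assumes a local MongoDB instance
ranking = client["test"]["ranking"]  # hypothetical database name "test"

# Seed the single ranking document
ranking.update_one({"_id": "RANK"}, {"$set": {"ranks": ["A", "B", "C"]}}, upsert=True)

# Insert D at the 2nd position (index 1)
ranking.update_one(
    {"_id": "RANK"},
    {"$push": {"ranks": {"$each": ["D"], "$position": 1}}},
)

# Move C from its current position to the 1st position (two updates, as above)
ranking.update_one({"_id": "RANK"}, {"$pull": {"ranks": "C"}})
ranking.update_one({"_id": "RANK"}, {"$push": {"ranks": {"$each": ["C"], "$position": 0}}})

# The numeric rank of a name is simply its array index + 1
doc = ranking.find_one({"_id": "RANK"})
rank_of = {name: i + 1 for i, name in enumerate(doc["ranks"])}
print(rank_of)  # {'C': 1, 'A': 2, 'D': 3, 'B': 4}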
Approach 2
You can also, instead of having rank as a number, just have two fields like followedBy and precededBy.
So your user documents would look like:
{
_id : "A",
"followedBy" : "B"
}
{
_id : "B",
"followedBy" : "C",
"precededBy" : "A"
}
{
_id : "C",
"precededBy" : "B"
}
If you want to add D at the second position, you need to update the two neighbouring documents (A's followedBy and B's precededBy) and insert the new one, so only three documents are touched:
{
_id : "A",
"followedBy" : "D" //changed from B to D
}
{
_id : "B",
"followedBy" : "C",
"precededBy" : "D" //changed from A to D
}
{
_id : "C",
"precededBy" : "B"
}
{
_id : "D",
"followedBy" : "B",
"precededBy" : "A"
}
The downside of this approach is that you cannot sort by ranking in a query; you have to load the documents into the application and build a linked-list sort of structure.
This approach just preserves the ranking with a minimum of DB changes.
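For illustration, a rough pymongo sketch of this idea (the users collection name and local connection are assumptions; it also assumes the A, B, C documents shown above already exist):
from pymongo import MongoClient

users = MongoClient()["test"]["users"]  # hypothetical database/collection names

# Insert D between A and B: update the two neighbours and insert the new document
users.update_one({"_id": "A"}, {"$set": {"followedBy": "D"}})
users.update_one({"_id": "B"}, {"$set": {"precededBy": "D"}})
users.insert_one({"_id": "D", "precededBy": "A", "followedBy": "B"})

# To get the full ordering you have to walk the chain in the application
def ordered_names():
    names = []
    current = users.find_one({"precededBy": {"$exists": False}})  # the head has no predecessor
    while current is not None:
        names.append(current["_id"])
        nxt = current.get("followedBy")
        current = users.find_one({"_id": nxt}) if nxt else None
    return names

print(ordered_names())  # ['A', 'D', 'B', 'C']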

Related

Change order whilst keeping ID field unchanged

I am struggling to find a solution for the following. It is hard to find a title for it, by the way :)
I am making a tool where I want to track subscriptions to an event. For managing that I have a table with x (25+) positions to be filled.
This tool is in VB.NET with an underlying MSSQL database.
That position will be assigned a 'userid' and some attributes such as 'getradio1', 'getsradio2', etc.
All of that is easy. So you will get something like this in the 'position' table.
Now here it comes, and the question is twofold: can it be done, and if yes, how?
Every UserID has a kind of priority ranking (in the UserID Database)
Now what I want is to have the positions filled in order of that ranking. Let's assume that the ranking is as follows:
UserID101 = Ranking 14
UserID103 = Ranking 5
UserID106 = Ranking 11
UserID102 = Ranking 39
UserID118 = Ranking 1
UserID114 = Ranking 6
Then I want the table updated so that the positions are 'reassigned' according to rank, as follows (also including the 'getradio' columns).
Ideally, if a new PositionID was assigned, it would automatically do the 'reordering'.
I tried to describe the problem as simply and completely as possible, but if you have more questions, do not hesitate to ask.
Thanks already for your help

Acquiring offset of some row in SQL

TL;DR: Is there a possibility to get OFFSET position of a particular, known row in SQL, considering some ORDER BY is applied?
So consider a schema like this (simplified):
CREATE TABLE "public"."painting" (
"uuid" uuid NOT NULL DEFAULT uuid_generate_v4(),
"name" varchar NOT NULL,
"score" int4 NOT NULL,
"approvedAt" timestamp,
PRIMARY KEY ("uuid")
);
Like
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
abc4,test4,8,12:00
abc5,test5,6,7:00
I want to make a request sorted by score and limited with 3 items, and I should emphasize that multiple entities might have the same score.
Because of the dynamic nature of that table, while traversing through those items sorted by score, some new item might appear somewhere in the list.
If I use the SQL OFFSET clause, this new entity will shift all entities below it down by one row, so the next selection will contain an item that was already returned in the previous 3-item selection.
abc1,test1,10,10:00
abc2,test2,9,11:00
abc6,test6,8,15:00 (new item)
CURRENT OFFSET = 3
abc3,test3,8,8:00 (was in previous select)
abc4,test4,8,12:00
abc5,test5,6,7:00
To avoid that, instead of using OFFSET, I can remember the UUID of the item I fetched last, so it'll be abc3. On the next request, I can use its score to add an extra WHERE SCORE < 8 condition, but this will skip abc4, because it also has a score of 8.
If I use WHERE SCORE <= 8, this will again return abc3, which has already been traversed. I can't use another field in the WHERE clause, because that would affect the results. An additional ORDER BY won't help either.
It seems to me that it is a very common problem in database selection, yet I can't find one comprehensive answer.
So, my question is whether it's possible to do some kind of request like the following:
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET %position of `abc3`% LIMIT 3
Or alternatively
SELECT OFFSET OF (`abc3`) FROM "painting" WHERE SCORE <= :score ORDER BY "score" DESC LIMIT 3
That will return 2 (because it's the second row with such score), then do
SELECT * FROM "painting" WHERE "score" <= :score ORDER BY "score" DESC OFFSET :offset LIMIT 3
where :score is the score of the last received item and :offset is the result of SELECT OFFSET minus 1.
My own assumption is that we have to SELECT WHERE "score" = :score and compute the offset position outside of SQL (or write a very complex SQL query). Though, if we have a lot of items with the same ORDER BY attribute, this helper request might end up being heavier than the data fetch itself.
Yet I feel like there's a much more clever SQL way of doing what I'm trying to do.
Good question. Accurate backend pagination requires the underlying data to use an ordering criterion over a set of columns that represents a UNIQUE key.
In your case, your ordering criterion can be made unique by adding the column uuid to it. With that in mind, you can increase the page size by 1 behind the scenes, to 4. That 4th row won't be displayed but will only be used to retrieve the next page.
For example, you can get:
select *
from painting
order by -score, approvedAt, uuid
limit 4
Now you would display the first three rows:
abc1,test1,10,10:00
abc2,test2,9,11:00
abc3,test3,8,8:00
The client app (most likely the UI) will remember -- not display -- the 4th row (the "key") to retrieve the next page:
abc4,test4,8,12:00
Then, to get the next page the query will add a WHERE clause with the "key" and take the form:
select *
from painting
where (-score, approvedAt, uuid) >= (-8, '12:00', 'abc4')
order by -score, approvedAt, uuid
limit 4
This query won't display the new row being inserted, but the original 4th row.
To get blazing fast data retrieval you could create the index:
create index ix1 on painting ((-score), approvedAt, uuid);
See example at DB Fiddle.
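As an illustration, here is a minimal Python sketch of this keyset flow, assuming a DB-API connection conn to the Postgres database with psycopg2-style %s placeholders; it mirrors the two queries above, and details such as parameter adaptation for the uuid column may vary by driver:
PAGE_SIZE = 3

def fetch_page(conn, key=None):
    # key is the (score, approvedAt, uuid) of the page's first row, or None for the first page
    cur = conn.cursor()
    if key is None:
        cur.execute(
            'select uuid, name, score, "approvedAt" from painting '
            'order by -score, "approvedAt", uuid limit %s',
            (PAGE_SIZE + 1,),
        )
    else:
        score, approved_at, uuid = key
        cur.execute(
            'select uuid, name, score, "approvedAt" from painting '
            'where (-score, "approvedAt", uuid) >= (%s, %s, %s) '
            'order by -score, "approvedAt", uuid limit %s',
            (-score, approved_at, uuid, PAGE_SIZE + 1),
        )
    rows = cur.fetchall()
    page, extra = rows[:PAGE_SIZE], rows[PAGE_SIZE:]
    # The extra row (if any) is not displayed; it becomes the key for the next page
    next_key = (extra[0][2], extra[0][3], extra[0][0]) if extra else None
    return page, next_key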

Solution for allowing user sorting in SQlite

By user sorting I mean that as a user on the site you see a bunch of items, and you are supposed to be able to reorder them (I'm using jQuery UI).
The user only sees 20 items on each page, but the total number of items can be thousands.
I assume I need to add another column in the table for custom ordering.
If the user sees items from 41-60, and he sorts them like:
41 = 2nd
42 = 1st
43 = 5th
etc.
I can't just set the ordering column to 2,1,5.
I would need to go through the entire table and change each record.
Is there any way to avoid this and somehow sort only the current selection?
Add another column to store the custom order, just as you suggested yourself. You can avoid the problem of having to reassign all rows' values by using a REAL-typed column: for new rows, you still use an increasing integer sequence for the column's value. But if a user reorders a row, the decimal data type allows you to use the formula (previous row's value + next row's value) / 2 to update the column of the single row that was moved. There are two special cases to take care of, namely when a user moves a row to the very beginning or end of the list. In those cases, just use min - 1 or max + 1, respectively.
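A small sqlite3 sketch of this midpoint idea (the items table and sort_order column names are assumptions for illustration):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, sort_order REAL)")
conn.executemany("INSERT INTO items (name, sort_order) VALUES (?, ?)",
                 [("A", 1.0), ("B", 2.0), ("C", 3.0), ("D", 4.0)])

def move_between(conn, item_id, prev_order, next_order):
    # prev_order/next_order are the sort_order values of the new neighbours;
    # pass None for prev_order (moved to the top) or next_order (moved to the bottom)
    if prev_order is None:
        new_order = conn.execute("SELECT MIN(sort_order) FROM items").fetchone()[0] - 1
    elif next_order is None:
        new_order = conn.execute("SELECT MAX(sort_order) FROM items").fetchone()[0] + 1
    else:
        new_order = (prev_order + next_order) / 2.0
    conn.execute("UPDATE items SET sort_order = ? WHERE id = ?", (new_order, item_id))

# Move D between A (1.0) and B (2.0): only one row is updated, its sort_order becomes 1.5
move_between(conn, 4, 1.0, 2.0)
print(conn.execute("SELECT name FROM items ORDER BY sort_order").fetchall())
# [('A',), ('D',), ('B',), ('C',)]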
This approach is the simplest I can think of, but it also has some downsides. First, it has a theoretical limitation due to the data type having only double precision. After a finite number of reorderings, the values are too close together for their average to be a different number. But that's really only a theoretical limit you should never reach in practical applications. Also, the column will use 8 bytes of memory per row, which is probably much more than you actually need.
If your application might scale to the point where those 8 bytes matter, or where you might have users that overeagerly reorder rows, you should instead stick to the INTEGER column and use multiples of a constant number as the default values (e.g. 100, 200, 300, ...). You still use the update formula from above, but whenever two values become too close together, you reassign all values. By tweaking the constant multiplier to the average table size / user behaviour, you can control how often this expensive operation has to be done.
There are a couple of ways I can think of to do this. One would be to use a SELECT-FROM-SELECT style statement, as in something like this:
SELECT *
FROM (
SELECT col1, col2, col3...
FROM ...
WHERE ...
LIMIT n,m
) as Table_A
ORDER BY ...
The second option would be to use temp tables such as:
INSERT INTO temp_table_A SELECT ... FROM ... WHERE ... LIMIT n,m;
SELECT * FROM temp_table_A ORDER BY ...
Another option to look at would be a jQuery plugin like DataTables.
One way I can think of is:
Add a new column (if feasible) or create a new table for holding the order of the items.
On any page you will show around 20 items based on the initial ordering.
Using jQuery's Draggable you can send updates to this table.
I think you can do this with an extra column.
First, you could prepopulate this new column with a default sort order and then allow the user to interactively modify it with the drag and drop of jquery-ui.
Let's say this user has 100 items in the table. You set the values in the order column to [1,2,3,...,99,100]. I suggest that you run a script on the original table to set all items to a default sort order.
Now going back to your example where the user is presented with items 41-60: the initial presentation in their browser would rank those at orders [41,42,43,...,59,60]. You might also need to save the lowest order that appears in this subset, in this case 41. Or better yet, save the entire array of rankings and restore the exact same numbers in the new order. This covers the case where they select a set of records that are not already consecutively ordered, perhaps because they belong to someone else.
To demonstrate what I mean: when they reorder them in the page, your javascript reassigns those same numbers back to the subset in the new order. Like this:
item A : 41
item B : 45
item C : 46
item D : 47
item E : 51
item F : 54
item G : 57
then the user changes them to this order, but you reassign the numbers like this:
item D : 41
item F : 45
item E : 46
item A : 47
item C : 51
item B : 54
item G : 57
This should also work if the subset is consecutive.
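In code, the reassignment step could look roughly like this (plain Python; the order values and the new item order are taken from the example above):
# Order values the subset currently occupies (saved when the page was rendered)
saved_orders = [41, 45, 46, 47, 51, 54, 57]

# The user's new arrangement of the same items (from the drag-and-drop widget)
new_item_order = ["D", "F", "E", "A", "C", "B", "G"]

# Reassign the same order values to the items in their new order
updates = list(zip(new_item_order, saved_orders))
# [('D', 41), ('F', 45), ('E', 46), ('A', 47), ('C', 51), ('B', 54), ('G', 57)]

# Each pair then becomes one UPDATE, e.g.:
# UPDATE items SET sort_order = ? WHERE name = ?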

SQL "group by" like - grouping algorithm

I have a table with more than 2 columns (let's say A, B and C). One column holds some numbers (C) and I want to do a "group by" like grouping, summing the numbers in C, but I don't know the algorithm for doing so.
I tried sorting the table by each column (from last to first, leaving out the numbers column C, so in this case: sort(B) and then sort(A)), and then, wherever the nth row holds the same values in A and B as the (n-1)th row, I add the number from the nth row to the (n-1)th row (in the C column) and delete the nth row. Otherwise, if the A or B value in row n differs from the A or B value in row n-1, I just move on to the next row. I repeat this until the last row of the table. But somehow this isn't working all the time, especially when there are a lot more columns (some rows remain ungrouped, maybe because of the sorting method).
I want to know whether this is a good grouping algorithm and I should look for the problem in the sorting method, or whether I need to use another (sorting and/or grouping) algorithm, and which one. Thank you.
Later edit: Apparently the algorithm that I used works well, after a thorough check of the code and fixing some minor mistakes that junior programmers like me often make :)
I think a good way to do this would be to wrap your row in a class, implement the equals (and hashCode) methods, and then use a Map to add the values up:
public class MyRow {
    private Long columnA;
    private String columnB;
    private int columnC;

    public int getColumnC() {
        return columnC;
    }

    @Override
    public boolean equals(final Object other) {
        if (!(other instanceof MyRow)) {
            return false;
        }
        final MyRow otherRow = (MyRow) other;
        return this.columnA.equals(otherRow.columnA) && this.columnB.equals(otherRow.columnB);
    }

    // hashCode must be consistent with equals, since MyRow is used as a HashMap key
    @Override
    public int hashCode() {
        return java.util.Objects.hash(columnA, columnB);
    }
}
Then you can iterate over all the rows, and create a Map for holding the sums of C.
final Map<MyRow, Integer> computedCSums = new HashMap<MyRow, Integer>();
for (final MyRow myRow : myRows) {
    if (computedCSums.get(myRow) == null) {
        computedCSums.put(myRow, myRow.getColumnC());
    } else {
        computedCSums.put(myRow, computedCSums.get(myRow) + myRow.getColumnC());
    }
}
Then, to get the sum of grouped Cs of any row, you just do:
computedCSums.get(mySelectedRow);
I think there are three things that should be considered about group by.
The "less than or equal" comparison is abstract
Comparing two rows A and B according to their columns (C1..Cn) works like this: compare each column from C1 to Cn; if one value is less than the other, return that result; if the two values are equal, go on and compare the next column, repeating until a result is returned.
Which algorithm to choose
1) Build a binary search tree or a hash table to store the tuples; when we get a tuple, search for an equal one. If we find it, merge the tuples that have the same group values; otherwise put the new tuple into our search structure.
2) Read some tuples, sort them, then walk the buffer and merge tuples belonging to the same group.
I prefer 1 over 2.
Memory size
If the input is huge, we must consider the memory limit.
We can use a merge algorithm to deal with this:
If memory exceeds our limit, write the tuples currently in memory out to disk, ordered by their group columns.
When we finish reading the input, merge the sorted runs on disk.
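As a rough Python sketch of option 1 (hash-based grouping), using the (A, B) pair as the group key on made-up sample rows:
from collections import defaultdict

rows = [
    ("x", "p", 10),
    ("x", "p", 5),
    ("y", "q", 7),
    ("x", "r", 1),
]

# Group by the (A, B) pair and sum column C
sums = defaultdict(int)
for a, b, c in rows:
    sums[(a, b)] += c

print(dict(sums))  # {('x', 'p'): 15, ('y', 'q'): 7, ('x', 'r'): 1}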

search within an array with a condition

I have two arrays I'm trying to compare at many levels. Both have the same structure with 3 "columns".
The first column contains the polygon's ID, the second an area type, and the third the percentage of that area type for the polygon.
So, for many rows, it will compare, for example, ID : 1 Type : aaa % : 100
But for some elements, I have many rows for the same ID. For example, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 25% --- ID 2, type ccc, 50%. And in the second array, I'll have ID 2, Type aaa, 25% --- ID 2, type bbb, 10% --- ID 2, type eee, 38% --- ID 2, type fff, 27%.
here's a visual example..
So, my function has to compare these two array and send me an email if there are differences.
(I won't show you the real code because there are 811 lines.) The first "if" condition is:
if array1.id = array2.id Then
if array1.type = array2.type Then
if array1.percent = array2.percent Then
zone_verification = True
Else
zone_verification = False
The problem is that there are more than 50 000 rows in each array. So when I run the function, for each "array1.id", the function searches through 50 000 rows in array2. 50 000 searches for 50 000 rows... it takes pretty long to run!
I'm looking for something to make it run faster. How could I make my search more specific? Example: I have many id "2" entries in array1. If there are many id "2" entries in array2, find them and push all the array2.id = 2 rows into a "sub array" or something like that, and search only in those specific rows. So I'll have just X rows in array1 to compare with X rows in array2, not with 50 000. And when each "id 2" in array1 is done, do the same thing for "id 4"... and for "id 5"...
Hope it's clear. It's almost the first time I've used VB.NET, and I have this big function to get running.
Thanks
EDIT
Here's what I wanna do.
I have two different layers in a geospatial database. Both layers have the same structure. They are a "spatial join" of the land parcels (55 000), and the land use layer. The first layer is the current one, and the second layer is the next one we'll use after 2015.
So I have, for each "land parcel", the percentage of each land use. For a "land parcel" (ID 7580-80-2532), I can have 50% farming use (TYPE FAR-23) and 50% residential use (RES-112). In the first array, I'll have 2 rows with the same ID (7580-80-2532), but each one will have a different type (FAR-23, RES-112) and a different %.
In the second layer, the municipal zoning (land use) has changed. So the same "land parcel" will now be 40% residential use (RES-112), 20% commercial (COM-54) and 40% a new farming use (FAR-33).
So, I want to know if there are any differences. Some land parcels will be exactly the same. Some parcels will keep the same land uses, but not the same percentage of each. But for some land parcels, there will be more or fewer land use types, with different percentages for each.
I want this script to compare these two layers and send me an email when there are differences between these two layers for the same land parcel ID.
The script is already working, but it takes too much time.
The problem is, I think, that the script goes through all of array2 for each row in array1.
What I want is, when there is more than 1 row with the same ID in array1, to take only this ID in both arrays.
Maybe if I order them by ID, I could write a condition, something like "once you find what you're looking for, stop searching as soon as you find a different value"?
It's hard to explain clearly because I've only been using VB since last week... and English isn't my first language! ;)
If you just want to find out if there are any differences between the first and second array, you could do:
Dim diff = New HashSet(of Polygon)(array1)
diff.SymmetricExceptWith(array2)
diff will contain any Polygon which is unique to array1 or array2. If you want to do other types of comparisons, maybe you should explain what you're trying to do exactly.
UPDATE:
You could use grouping and lookups like this:
'Create lookup with first array, for fast access by ID
Dim lookupByID = array1.ToLookup(Function(p) p.id)
'Loop through each group of items with same ID in array2
For Each secondArrayValues in array2.GroupBy(Function(p) p.id)
Dim currentID As Integer = secondArrayValues.Key 'Current ID is the grouping key
'Retrieve values with same ID in array1
'Use a hashset to easily compare for equality
Dim firstArrayValues As New HashSet(of Polygon)(lookupByID(currentID))
'Check for differences between the two sets of data, for this ID
If Not firstArrayValues.SetEquals(secondArrayValues) Then
'Data has changed, do something
Console.WriteLine("Differences for ID " & currentID)
End If
Next
I am answering this question based on the first part that you wrote (that is, without the EDIT section). The correct answer should explain a good algorithm, but I am suggesting that you use DB capabilities because they have optimized many queries for this purpose.
Put all the records into two DB tables - O(n) time. If the records are static you don't need to perform this step every time.
Table 1
id type percent
Table 2
id type percent
Then use a DB query, something like this:
select count(*) from table1 t1, table2 t2 where t1.id!=t2.id and t1.type!=t2.type
(you can use some better queries, what I am trying to say is give the control to DB to perform this operation)
retrieve the result in your code and perform the necessary operation.
EDIT
1) You can sort them in O(n log n) time based on ID + type + percent and then perform a binary search.
2) Store the first array's records in a hash map with an appropriate key - could be ID only or ID+type.
This will take O(n) time, and a lookup, if the key is right, takes constant time.
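As a rough Python sketch of the hash-map idea (the (id, type, percent) row layout is assumed from the question):
def to_map(rows):
    # Key each row by (id, type) so lookups are O(1) instead of scanning the whole array
    return {(r[0], r[1]): r[2] for r in rows}

def differing_ids(rows1, rows2):
    map1, map2 = to_map(rows1), to_map(rows2)
    changed = set()
    for key in set(map1) | set(map2):
        if map1.get(key) != map2.get(key):
            changed.add(key[0])  # the parcel ID
    return changed

layer1 = [(2, "aaa", 25), (2, "bbb", 25), (2, "ccc", 50), (1, "aaa", 100)]
layer2 = [(2, "aaa", 25), (2, "bbb", 10), (2, "eee", 38), (2, "fff", 27), (1, "aaa", 100)]
print(differing_ids(layer1, layer2))  # {2}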
You need to define a structure to store this data. We'll store all the data in a LandParcel class, which will have a HashSet<ParcelData>
public class ParcelData
{
    public ParcelType Type { get; set; } // This can be an enum, string, etc.
    public int Percent { get; set; }
    public ParcelData(ParcelType type, int percent) { Type = type; Percent = percent; }
    // Redefine Equals and GetHashCode conveniently (based on Type and Percent)
}
public class LandParcel
{
    public ID Id { get; set; } // Whatever the type of the ID is...
    public HashSet<ParcelData> Data { get; set; } = new HashSet<ParcelData>();
    public LandParcel(ID id) { Id = id; }
}
Now you have to build your data structure, with something like this:
Dictionary<ID, LandParcel> data1 = new Dictionary<ID, LandParcel>();
foreach (var item in array1)
{
    LandParcel p;
    if (!data1.TryGetValue(item.id, out p))
        data1[item.id] = p = new LandParcel(item.id);
    // Can this data be repeated?
    p.Data.Add(new ParcelData(item.type, item.percent));
}
You do the same with a data2 dictionary for the second array. Now you iterate over all the items in data2 and compare them with the item with the same id in data1.
foreach (var parcel2 in data2.Values)
{
    var parcel1 = data1[parcel2.Id]; // Beware of exceptions here if the ID is missing from data1!
    if (!parcel1.Data.SetEquals(parcel2.Data))
    {
        // You have different parcels
    }
}
(Now that I look at it, we are practically doing a small database query here, kind of smelly code ...)
Sorry for the C# code since I don't really feel so comfortable with VB, but it should be fairly straightforward.