Creating a Unique PostgreSQL Index that Contains Operations

I would like to create a unique PostgreSQL index that contains operations. Does anyone know if this is possible?
CREATE UNIQUE INDEX ux_data ON table (name, color)
The operation that I would like to add is "num & new_num = 0", where num is an existing column and new_num is a value to be inserted.
Is this possible?
Update: Here's more detail regarding what I want to do.
Database:
name color num
one red 5
two green 5
one red 8
What I want is to reject a new entry when its name and color match an existing row and its num shares a set bit with that row's num. For example:
New entry 1: name = one, color = red, num = 1.
We have matching names and colors, and the first num check gives 101 & 001 = 1 (non-zero), so New entry 1 is not unique and should be rejected. However, if the number were changed to 2:
New entry 2: name = one, color = red, num = 2
Again the names and colors match, but for every num among the name/color matches, 101 & 010 = 0 and 1000 & 0010 = 0, so this entry is unique.

This is going to be a project and a half. It doesn't look like there is a very easy way to do this, and yet with PostgreSQL anything is possible. I think you want an exclusion constraint using GiST. Unfortunately, that exclusion constraint is not going to be an easy one to build, since there are no built-in types that support this.
Your basic solution I think is going to have to involve:
A cast to a suitable data type (presumably bit(n))
Sufficient operators to allow GiST indexing of bit(n). This means you have to build the functions and operators for a GiST operator class, create operator families, etc.
Create an exclude constraint using your operations.
This will not be easy and it will be complicated, but with some effort and dedication it is possible. I would expect one to two hundred lines of code to be involved. You'd probably want to write unit tests too.
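For orientation, here is a minimal sketch of the shape such a solution might take, assuming num fits in bit(8); the &&& operator and the gist_bit_ops operator class are invented names, and the operator class itself (the bulk of the work described above) is not shown:

-- btree_gist provides GiST support for the plain = comparisons on name and color.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- A boolean "the two bit patterns overlap" test.
CREATE FUNCTION bit_overlaps(a bit, b bit) RETURNS boolean AS $$
  SELECT (a & b) <> B'00000000'
$$ LANGUAGE sql IMMUTABLE;

-- Wrap it in a commutative operator so it can appear in an exclusion constraint.
CREATE OPERATOR &&& (
  LEFTARG = bit,
  RIGHTARG = bit,
  PROCEDURE = bit_overlaps,
  COMMUTATOR = &&&
);

-- The goal: reject a new row whose num shares a set bit with any existing row
-- having the same name and color. This will only be accepted once a GiST
-- operator class for bit(n) (gist_bit_ops here) actually exists.
ALTER TABLE mytable
  ADD CONSTRAINT num_bits_disjoint
  EXCLUDE USING gist (name WITH =, color WITH =, (num::bit(8)) gist_bit_ops WITH &&&);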


What is the use case that makes EAVT index preferable to EATV?

From what I understand, EATV (which Datomic does not have) would be a great fit for as-of queries. On the other hand, I see no use case for EAVT.
This is analogous to row/primary key access. From the docs: "The EAVT index provides efficient access to everything about a given entity. Conceptually this is very similar to row access style in a SQL database, except that entities can possess arbitrary attributes rather than being limited to a predefined set of columns."
The immutable time/history side of Datomic is a motivating use case for it, but in general, it's still optimized around typical database operations, e.g. looking up an entity's attributes and their values.
Update:
Datomic stores datoms (in segments) in the index tree. So you navigate to a particular E's segment using the tree and then retrieve the datoms about that E in the segment, which are EAVT datoms. From your comment, I believe you're thinking of this as navigating further b-tree-like structures at each step, which is incorrect. Once you've navigated to the E, you are accessing a leaf segment of (sorted) datoms.
You are not looking for a single value at a specific point in time; you are looking for the set of values up to a specific point in time T. History is kept on a per-value basis (not per-attribute).
For example, assert X, retract X then assert X again. These are 3 distinct facts over 3 distinct transactions. You need to compute that X was added, then removed and then possibly added again at some point.
You can do this with SQL:
create table Datoms (
  E  bigint not null,
  A  bigint not null,
  V  varbinary(1536) not null,
  T  bigint not null,
  Op bit not null  -- assert/retract
)
select E, A, V
from Datoms
where E = 1 and T <= 42
group by E, A, V
having 0 < sum(case Op when 1 then +1 else -1 end)
The fifth component Op of the datom tells you whether the value is asserted (1) or retracted (0). By summing over this value (as +1/-1) we arrive at either 1 or 0.
Asserting the same value twice does nothing, and you always retract the old value before you assert a new value. The last part is a prerequisite for the algorithm to work out as nicely as this.
With an EAVT index, this is a very efficient query and it's quite elegant. You can build a basic Datomic-like system in just 150 lines of SQL like this. It is the same pattern repeated for any permutation of EAVT index that you want.
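To make the arithmetic concrete, here is the assert/retract/assert example from above written as rows (the literal E, A, V and T values are invented for illustration):

insert into Datoms (E, A, V, T, Op) values (1, 10, 0x58, 40, 1); -- assert X
insert into Datoms (E, A, V, T, Op) values (1, 10, 0x58, 41, 0); -- retract X
insert into Datoms (E, A, V, T, Op) values (1, 10, 0x58, 42, 1); -- assert X again

-- As of T <= 41 the query returns nothing for this datom (sum = +1 - 1 = 0);
-- as of T <= 42 it reappears (sum = +1 - 1 + 1 = 1 > 0).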

Solution for allowing user sorting in SQLite

By user sorting I mean that as a user on the site you see a bunch of items, and you are supposed to be able to reorder them (I'm using jQuery UI).
The user only sees 20 items on each page, but the total number of items can be thousands.
I assume I need to add another column in the table for custom ordering.
If the user sees items from 41-60, and he sorts them like:
41 = 2nd
42 = 1st
43 = 5th
etc.
I can't just set the ordering column to 2,1,5.
I would need to go through the entire table and change each record.
Is there any way to avoid this and somehow sort only the current selection?
Add another column to store the custom order, just as you suggested yourself. You can avoid having to reassign all rows' values by using a REAL-typed column: for new rows, you still use an increasing integer sequence for the column's value, but when a user reorders a row, the decimal type lets you set the moved row's value to (previous row's value + next row's value) / 2, so only that single row needs an update.
There are two special cases to take care of, namely when a user moves a row to the very beginning or end of the list. In that case, just use min - 1 or max + 1, respectively.
This approach is the simplest I can think of, but it also has some downsides. First, it has a theoretical limitation because the column only has double precision: after enough reorderings, two neighbouring values can become so close together that their average is no longer a distinct number. But that's really only a theoretical limit you should never reach in practical applications. Also, the column will use 8 bytes per row, which is probably more than you actually need.
If your application might scale to the point where those 8 bytes matter or where you might have users that overeagerly reorder rows, you should instead stick to the INTEGER column and use multiples of a constant number as the default values (e.g. 100, 200, 300, ..). You still use the update formula from above, but whenever two values become too close together, you reassign all values. By tweaking the constant multiplier to the average table size / user behaviour, you can control how often this expensive operation has to be done.
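A minimal sketch of the single-row update, assuming a SQLite table items(id INTEGER PRIMARY KEY, sort_key REAL); the names and parameter markers are invented:

-- Drop the moved row between its new neighbours by averaging their keys.
UPDATE items
SET sort_key = (:prev_key + :next_key) / 2.0
WHERE id = :moved_id;

-- Special case: moving a row to the very top of the whole list (bottom is symmetric).
UPDATE items
SET sort_key = (SELECT MIN(sort_key) FROM items) - 1
WHERE id = :moved_id;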
There are a couple of ways I can think of to do this. One would be to use a SELECT-from-SELECT style statement, something like this:
SELECT *
FROM (
    SELECT col1, col2, col3...
    FROM ...
    WHERE ...
    LIMIT n,m
) AS Table_A
ORDER BY ...
The second option would be to use temp tables such as:
INSERT INTO temp_table_A SELECT ... FROM ... WHERE ... LIMIT n,m;
SELECT * FROM temp_table_A ORDER BY ...
Another option to look at would be a jQuery plugin like DataTables.
One way I can think of is:
Add a new column (if feasible) or create a new table for holding the order of the items.
On any page you will show around 20 items based on the initial ordering.
Using jQuery UI's Draggable, you can send updates to this table.
I think you can do this with an extra column.
First, you could prepopulate this new column with a default sort order and then allow the user to interactively modify it with the drag and drop of jquery-ui.
Let's say this user has 100 items in the table. You set the values in the order column to [1,2,3,...,99,100]. I suggest you run a script on the original table to set all items to a default sort order.
Now going back to your example where the user is presented with items 41-60: the initial presentation in their browser would rank those at orders [41,42,43,...,59,60]. You might also need to save the lowest order that appears in this subset, in this case 41. Or better yet, save the entire array of rankings and restore the exact same numbers in the new order. This covers the case where they select a set of records that are not already consecutively ordered, perhaps because they belong to someone else.
To demonstrate what I mean: when they reorder them in the page, your javascript reassigns those same numbers back to the subset in the new order. Like this:
item A : 41
item B : 45
item C : 46
item D : 47
item E : 51
item F : 54
item G : 57
then the user changes them to this order, but you reassign the numbers like this:
item D : 41
item F : 45
item E : 46
item A : 47
item C : 51
item B : 54
item G : 57
This should also work if the subset is consecutive.
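A hedged sketch of the write-back for this scheme, assuming the page posts the subset's ids in their new on-screen order and the server hands the saved ranks back out (table and parameter names invented):

-- The subset occupied ranks [41, 45, 46, 47, 51, 54, 57]; reassign the same
-- ranks to the rows in their new order, one UPDATE per visible row.
UPDATE items SET sort_order = 41 WHERE id = :item_d;
UPDATE items SET sort_order = 45 WHERE id = :item_f;
UPDATE items SET sort_order = 46 WHERE id = :item_e;
-- ...and so on for the remaining rows of the visible subset.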

Using Real numbers for explicit sorting in sql database

I'm facing a recurring problem: I have to let a user reorder a list that is stored in a database.
The first straightforward approach I can think of is a "position" column with the ordering saved as an integer, e.g.
Data, Order
A 1
B 2
C 3
D 4
The problem here is that if I have to insert FOO at position 2, my table becomes
Data, Order
A 1
FOO 2
B 3
C 4
D 5
So to insert a new line, I have to do one INSERT and three UPDATEs on a table of five elements.
So my new idea is to use real numbers instead of integers; my new table becomes
Data, Order
A 1.0
B 2.0
C 3.0
D 4.0
If I want to insert an element FOO after A, this becomes
Data, Order
A 1.0
FOO 1.5
B 2.0
C 3.0
D 4.0
With only one SQL query executed.
This would work fine with theoretical real numbers, but floating-point numbers have limited precision. I'm wondering how feasible this approach is, and whether and how I can optimize it to avoid exceeding double precision after a reasonable number of modifications.
Edit:
This is how I have implemented it for now in Python:
from decimal import Decimal

@classmethod
def get_middle_priority(cls, p, n):
    # Return the shortest decimal that falls strictly between p and n.
    p = Decimal(str(p))
    n = Decimal(str(n))
    m = p + ((n - p) / 2)
    i = 0
    while True:
        m1 = round(m, i)
        if p < m1 < n:
            return m1
        i += 1

@classmethod
def create(cls, data, user):
    prev = data.get('prev')
    if prev is None or len(prev) < 1:
        # No predecessor given: place before the current first item.
        first = cls.list().first()
        if first is None:
            priority = 1.0
        else:
            priority = first.priority - 1.0
    else:
        # Place between the given predecessor and its successor.
        prev = cls.list().filter(Rotator.codice == prev).first()
        next = cls.list().filter(Rotator.priority > prev.priority).first()
        if next is None:
            priority = prev.priority + 1.0
        else:
            priority = cls.get_middle_priority(prev.priority, next.priority)
    r = cls(data.get('codice'), priority)
    DBSession.add(r)
    return r
If you want to control the position and there is no ORDER BY solution, then a rather simple and robust approach is to have each row point to the next or to the previous one (a linked list). Updates/inserts/deletes (other than the first and last) will require 3 operations:
Insert the new Item
Update the Item Prior the New Item
Update the Item After the New Item
After you have that established, you can use a recursive CTE (with a UNION ALL) to create a sorted list that will never have a limit; a sketch follows at the end of this answer.
I have seen rather large implementations of this that were done via Triggers to keep the list in perfect form. I however am not a fan of triggers and would just put the logic for the entire operation in a stored procedure.
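A minimal sketch of that read-out, assuming each row stores the id of its predecessor in prev_id (NULL for the head of the list); all names are invented:

WITH RECURSIVE ordered AS (
    -- Anchor: the head of the list.
    SELECT id, data, 1 AS pos
    FROM items
    WHERE prev_id IS NULL
    UNION ALL
    -- Recursive step: append the row that points at the one just emitted.
    SELECT i.id, i.data, o.pos + 1
    FROM items i
    JOIN ordered o ON i.prev_id = o.id
)
SELECT id, data
FROM ordered
ORDER BY pos;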
You may use a string rather than numbers:
item order
A ffga
B ffgaa
C ffgb
Here, the problem of finite precision is handled by the ability to grow the string. String storage in the database is theoretically unlimited, bounded only by the size of the storage device. But there is no better solution for absolute ordering of items. Relative ordering, like a linked list, might work better (but then you can't use a plain ORDER BY query).
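A hedged illustration of the string approach (table and column names invented): to insert a row between B ('ffgaa') and C ('ffgb'), any string that collates between them will do, for example 'ffgam':

INSERT INTO items (data, sort_key) VALUES ('NEW', 'ffgam');
-- Plain text collation now yields the user-defined order: A, B, NEW, C.
SELECT data FROM items ORDER BY sort_key;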
The linked-list idea is neat, but it's expensive to pull the data out in order. If you have a database which supports it, you can use something like CONNECT BY to pull it out; "linked list in sql" is a question dedicated to that problem.
Now if you don't, I was thinking of how one can achieve an infinitely divisible range, and thought of sections in a book. What about storing the list initially as
1
2
3
and then to insert between 1 and 2 you insert a "subsection under 1" so that your list becomes
1
1.1
2
3
If you want to insert another one between 1.1 and 2 you place a second subsection under 1 and get
1
1.1
1.2
2
3
and lastly if you want to add something between 1.1 and 1.2 you need to introduce a subsubsection and get
1
1.1
1.1.1
1.2
2
3
Maybe using letters instead of numbers would be less confusing.
I'm not sure if there is any standard lexicographic ordering in SQL databases which could sort this type of list correctly, but I think you could roll your own with some "order by case" and substringing. Edit: I found a question pertaining to this: linky
Another downside is that the worst-case field size of this solution grows with every adversarial insert (you could get long rows like 1.1.1.1.1.1 etc.). But in the best case it stays short and almost constant (rows like 1.934856.1).
This solution is also quite close to what you already had in mind, and I'm not sure that it's an improvement. A decimal number using the binary partitioning strategy that you mentioned will probably grow by one decimal place per insert, right? So you would get
1,2 -> 1,1.5,2 -> 1,1.25,1.5,2 -> 1,1.125,1.25,1.5,2
So the best case of the subsectioning strategy seems better, but the worst case a lot worse.
I'm also not aware of any infinite precision decimal types for sql databases. But you could of course save your number as a string, in which case this solution becomes even more similar to your original one.
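A small demo of the lexicographic pitfall (PostgreSQL VALUES syntax; the padding width is an arbitrary choice): as plain text, '1.10' sorts before '1.2', which is wrong for section numbers, and zero-padding each dotted component fixes it.

-- Wrong order as bare text: returns '1.10' first.
SELECT x FROM (VALUES ('1.2'), ('1.10')) AS t(x) ORDER BY x;

-- Correct order once each component is zero-padded to a fixed width.
SELECT x FROM (VALUES ('001.002'), ('001.010')) AS t(x) ORDER BY x;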
Set all rows to a unique number starting at 1 and incrementing by 1 at the start. When you insert a new row, set it to count(*) of the table + 1 (there are a variety of ways of doing this).
When the user updates the Order of a row, always update it by calling a stored procedure with this Id (PK) of the row to update and the new order. In the stored procedure,
update tableName set Order = Order + 1 where Order >= @updatedRowOrder;
update tableName set Order = @updatedRowOrder where Id = @pk;
That guarantees that there will always be space and no duplicates. I haven't worked out what would happen if you put in silly new Order numbers (e.g. <= 0), but probably bad things; that's for the front-end app to prevent.
Cheers -

SQL Select statement to find a unique entry based on many attributes

To put this work in context... I'm trying to filter a database of objects and build descriptions which can be verbalized for a speech UI. To minimise the descriptions, I want to find the shortest way to describe an object, based on the idea of Grice's Maxims.
It's possible in code by iterating through the records and running through all permutations, but I keep thinking there ought to be a way to do this in SQL... so far I haven't found it. (I'm using Postgres.)
So I have a table that looks something like this:
id     colour  position  height
(int)  (text)  (text)    (int)
0      "red"   "left"     9
1      "red"   "middle"   8
2      "blue"  "middle"   8
3      "blue"  "middle"   9
4      "red"   "left"     7
There are two things I wish to find based on the attributes (excluding the ID).
a) are any of the records unique, based on the minimum number of attributes?
=> e.g. record 0 is unique based on colour and height
=> e.g. record 1 is the only red item in the middle
=> e.g. record 4 is unique as it's the only one which has a height of 7
b) how is a particular record unique?
=> e.g. how is record 0 unique? Because it is the only item with colour red and a height of 9
=> e.g. record 4 is unique because it is the only item with a height of 7
It may of course be that no objects are unique based on the attributes which is fine.
+++++++++++++++++++++++++
Answer for (a)
So the only way I can think to do this in SQL is to start off by testing a single attribute to see if there is a single match from all records. If not then add attribute 2 and test again. Then try attributes 1 and 3. Finally try attributes 1,2 and 3.
Something like this:-
single column test:
select *
from griceanmaxims
where height = (select height
                from griceanmaxims
                group by height
                having count(height) = 1)
   or relpos = (select relpos
                from griceanmaxims
                group by relpos
                having count(relpos) = 1)
   or colour = (select colour
                from griceanmaxims
                group by colour
                having count(colour) = 1)
double column tests:
(Select colour,relpos
from griceanmaxims
group by colour,relpos
having (count(colour)=1))
(Select colour,height
from griceanmaxims
group by colour,height
having (count(colour)=1))
etc
++++++++
I'm not sure if there's a better way or how to join up the results from the double column tests.
Also if anyone has any suggestions on how to determine the distinguishing factors for a record (as in question b), that would be great. My guess is that (b) would require (a) to be run for all of the field combinations, but I'm not sure if there's a better way.
Thanks in advance for any help on this one....
I like the idea of attacking the problem using a general-purpose language, e.g. C#:
1) Iterate through and see if any record has one attribute which is unique, e.g. ID = 4, which is unique because height is 7. Take ID 4 out of the 'doing' collection and put it into the 'done' collection with the distinguishing attribute.
Use a unit-testing tool, e.g. MSUnit, to prove the above works.
2) Try to extend to n attributes.
Unit test.
3) See if any can be unique with 2 attributes. Take those IDs out of 'doing' and into 'done' with the pairs of attributes.
Unit test.
4) Extend to m attributes.
Unit test.
5) Refactor, maybe using recursion.
Hope this helps.
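For completeness, the set-based route the question hints at does exist in Postgres itself; a hedged sketch using CUBE (PostgreSQL 9.5+), which runs every combination of the three attributes through one GROUP BY and keeps the combinations matched by exactly one row:

-- Non-grouped columns come back NULL, so each surviving row names an attribute
-- combination (its non-NULL columns) that identifies exactly one record.
select colour, relpos, height, count(*)
from griceanmaxims
group by cube (colour, relpos, height)
having count(*) = 1;

Joining a surviving combination back to the table recovers which record it singles out (question b), and taking the combination with the fewest non-NULL columns gives the shortest unique description (question a).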

Related rows based on text columns

Given that I have a table with a column of TEXT in it (MySQL or SQLite), is it possible to use the value of that column to find similar rows with somewhat related text values?
For example, if I wanted to find rows related to row_3, both 1 & 2 would match:
row_1 = this is about sports
row_2 = this is about study
row_3 = this is about study and sports
I know that I could use FULLTEXT or FTS3 if I had a key word I wanted to MATCH against the column values - but I'm just trying to find text that is somewhat related among the rows.
MySQL supports a fulltext search option called QUERY EXPANSION. The idea is that you search for a keyword, it finds a row, and then it uses the words in that row as keywords, to search for more matching rows.
SELECT ... FROM StudiesTable WHERE MATCH(description_text)
AGAINST ('sports' IN NATURAL LANGUAGE MODE WITH QUERY EXPANSION);
Read about it here: http://dev.mysql.com/doc/refman/5.1/en/fulltext-query-expansion.html
You're using the wrong hammer to pound that screw in. A single string in a database column isn't the way to store that data. You can't easily get at the part you care about, which is the individual words.
There is a lot of research into the problem of comparison of text. If you're serious about this need, you'll want to start reading about the variety of techniques in that problem domain.
The first clue is that you want to access / index the data not by complete text string, but by word or sentence fragment (unless you're interested in words that are spelled similarly being matched together, which is harder).
As an example of one technique, generate a chain out of your sentences by grabbing overlapping sets of three words, and store the chain. Then you can search for entries that have a large number of chain segments in common. A set of chain segments for your statements above would be:
row_1 = this is about sports
row_2 = this is about study
row_3 = this is about study and sports

this is about      (3 matches)
is about sports
is about study     (2 matches)
about study and
study and sports
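A hedged sketch of how the chain comparison could be done in SQL, assuming a side table holding one row per (row_id, chain) pair; all names are invented:

-- One row per overlapping three-word chain extracted from each text row.
CREATE TABLE chains (row_id INTEGER NOT NULL, chain TEXT NOT NULL);

-- Rank the rows most related to row 3 by how many chains they share with it.
SELECT c2.row_id, COUNT(*) AS shared_chains
FROM chains c1
JOIN chains c2 ON c2.chain = c1.chain AND c2.row_id <> c1.row_id
WHERE c1.row_id = 3
GROUP BY c2.row_id
ORDER BY shared_chains DESC;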
Maybe it would be enough to take each relevant word in the base row (more than 4 letters? or compared against a list of common words?) and use it as a keyword for the fulltext search, building a temp table (id, row_matched_id, count) to record the matches for each row, adding 1 to count on each match. At the end you'll have, in the temp table, all the rows that matched and how many times they matched (i.e. how many relevant words were the same). If you want to run it once against the whole database and keep the results, use a persisted table, add a column for the id of the base row, and repeat the search for each new row inserted (or updated) to keep the results table up to date.
Using this results table you can quickly find the rows matching the most words of the base row without redoing the search.
Edit: with this you can also "score" the results. For example, if you count x relevant words in the base row, you can calculate a score in % as (matches / x * 100) and filter out all results with, say, less than 50% matches. In your example, row_1 and row_2 would each give 50% if considering only relevant words with more than 4 letters, or 67% if you consider all the words.
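A hedged sketch of that scoring step, assuming the match counts were collected into a results table roughly as described (names, placeholders and the 50% cutoff are all illustrative):

-- :x is the number of relevant words in the base row; keep only rows that
-- share at least half of them, highest score first.
SELECT row_matched_id,
       match_count * 100.0 / :x AS score_pct
FROM results
WHERE base_row_id = :base_id
  AND match_count * 100.0 / :x >= 50
ORDER BY score_pct DESC;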