SQL Select statement to find a unique entry based on many attributes - sql

To put this work in context... I'm trying to filter a database of objects and build descriptions which can be verbalized for a speech UI. To minimise the descriptions I want to find the shortest way to describe an object, based on the idea of Grices Maxims.
It's possible in code by iterating through the records, and running through all permutations, but I keep thinking there ought to be a way to do this in SQL... so far I haven't found it. (I'm using PostGRES.)
So I have a table that looks something like this:
id colour position height
(int) (text) (text) (int)
0 "red" "left" 9
1 "red" "middle" 8
2 "blue" "middle" 8
3 "blue" "middle" 9
4 "red" "left" 7
There are two things I wish to find based on the attributes (excluding the ID).
a) are any of the records unique, based on the minimum number of attributes?
=> e.g. record 0 is unique based on colour and height
=> e.g. record 1 is the only red item in the middle
=> e.g. record 4 is unique as its the only one which has a height of 7
b) how is a particular record unique?
=> e.g. how is record 0 unique? because it is the only item with a colour red, and height of 9
=> e.g. record 4 is unique because it is the only item with a height of 7
It may of course be that no objects are unique based on the attributes which is fine.
+++++++++++++++++++++++++
Answer for (a)
So the only way I can think to do this in SQL is to start off by testing a single attribute to see if there is a single match from all records. If not then add attribute 2 and test again. Then try attributes 1 and 3. Finally try attributes 1,2 and 3.
Something like this:-
single column test:
select * from griceanmaxims
where height=(Select height from griceanmaxims
group by height
having (count(height)=1))
or
relpos=
(Select relpos
from griceanmaxims
group by relpos
having (count(relpos)=1))
or
colour=
(Select colour
from griceanmaxims
group by colour
having (count(colour)=1))
double column tests:
(Select colour,relpos
from griceanmaxims
group by colour,relpos
having (count(colour)=1))
(Select colour,height
from griceanmaxims
group by colour,height
having (count(colour)=1))
etc
++++++++
I'm not sure if there's a better way or how to join up the results from the double column tests.
Also if anyone has any suggestions on how to determine the distinguishing factors for a record (as in question b), that would be great. My guess is that (b) would require (a) to be run for all of the field combinations, but I'm not sure if there's a better way.
Thanks in advance for any help on this one....

I like the idea of attacking the problem using a General Purpose Language eg C#:
1) Iterate through and see if any have 1 attribute which is unique eg ID = 4, which is unique because height is 7. Take ID 4 out of the 'doing' collection, and put into 'done' collection with appropriate attribute
Use a unit testing tool eg MSUNIT to prove the above works
2) Try and extend to n attibutes
Unit Test
3) See if any can be unique with 2 attributes. Take those IDs out of doing and into done with the pairs of attributes
Unit Test
4) Extend to m attributes
Unit Test
3) Refactor maybe using recursion
Hope this helps.

Related

SQL Specific LIKE ANY String Search

So I'm in Teradata trying to pull any products that have more than 1 color-related name, as seen in the code snippet here:
SELECT
pt.product_number,
COUNT (CASE WHEN ot.option_name like any ('%green%', '%red%', '%blue%') THEN 1 ELSE NULL END) as differentColorCount
FROM product_table pt
JOIN option_table ot on ot.product_num = pt.product_num
HAVING differentColorCount > 1
GROUP BY 1
This is running fine, but the problem that I'm realizing is that a product might have a hundred different "Red" options for instance. (Red-1, Red-2, Red-3, etc). But I only want a count of when two of the different color strings are present for a single product.
So instead of LIKE ANY what I really need is LIKE ANY TWO. If both Red AND Green are present, count 1. If both Blue AND Purple are present, count 1.
I realize I could do a really long list where I do dozens of LIKE ALLs in every possible combination, but that doesn't seem like it will scale well if I need to check for, say 100 different colors instead of 6?
If anyone has any insight on this I would be incredibly grateful. Thanks in advance for any help you can offer! :)
You can utilize a regular expression to extract the color and then apply a distinct count:
Count (DISTINCT RegExp_Substr(option_name, '(green|red|blue)')) AS differentColorCount
This is similar to your like any ('%green%', '%red%', '%blue%'), but returns the actual matching color instead of TRUE/FALSE.
The'(green|red|blue)' search pattern seperates defines three alternative search strings and returns the first match.

Changing multiple row colours in Report Builder

I would like to change the row colours in the attached image so all rows match with the value in NAME column.
So I want the related 2 rows in CUST CODE, DEL TO, CUST NAME and then the related 3 rows in ORDER NO to be the same colour.
Then i would like the colours to alternate for the next NAME value and so on.
Is this possible? I know how to alternate row colour when each result is one row only, but unsure how to do it with this kind of result.
So to clarify, for NAME (AHe) all rows to be 'lightgrey', NAME (AHO) all rows to be 'whitesmoke', NAME (JH) all rows to be 'lightgrey' etc...
I've used something similar to the following in the background color expression:
=IIF(RUNNINGVALUE(Fields!Name.Value, COUNTDISTINCT,"MainDataSet") MOD 2 = 0,
"LightGrey",
"WhiteSmoke")
There may be another way to do this using ROWNUMBER() or another alternative as well.

Solution for allowing user sorting in SQlite

By user sorting I mean that as a user on the site you see a bunch of items, and you are supposed to be able to reorder them (I'm using jQuery UI).
The user only sees 20 items on each page, but the total number of items can be thousands.
I assume I need to add another column in the table for custom ordering.
If the user sees items from 41-60, and and he sorts them like:
41 = 2nd
42 = 1st
43 = fifth
etc.
I can't just set the ordering column to 2,1,5.
I would need to go through the entire table and change each record.
Is there any way to avoid this and somehow sort only the current selection?
Add another column to store the custom order, just as you suggested yourself. You can avoid the problem of having to reassign all rows' values by using a REAL-typed column: For new rows, you still use an increasing integer sequence for the column's value. But if a user reorders a row, the decimal data type will allow you to use the formula ½ (previous row's value + next row's value) to update the column of the single row that was moved. You
have got two special cases to take care of, namely if a user moves a row to the very beginning or end of the list. In that case, just use min - 1 rsp. max + 1.
This approach is the simplest I can think of, but it also has some downsides. First, it has a theoretical limitation due to the datatype having only double-precision. After a finite number of reorderings, the values are too close together for their average to be a different number. But that's really only a theoretical limit you should never reach in practical applications. Also, the column will use 8 bytes of memory per row, which probably is much more than you actually need.
If your application might scale to the point where those 8 bytes matter or where you might have users that overeagerly reorder rows, you should instead stick to the INTEGER column and use multiples of a constant number as the default values (e.g. 100, 200, 300, ..). You still use the update formula from above, but whenever two values become too close together, you reassign all values. By tweaking the constant multiplier to the average table size / user behaviour, you can control how often this expensive operation has to be done.
There are a couple ways I can think of to do this. One would be to use a SELECT FROM SELECT style statement. As in something like this.
SELECT *
FROM (
SELECT col1, col2, col3...
FROM ...
WHERE ...
LIMIT n,m
) as Table_A
ORDER BY ...
The second option would be to use temp tables such as:
INSERT INTO temp_table_A SELECT ... FROM ... WHERE ... LIMIT n,m;
SELECT * FROM temp_table_A ORDER BY ...
Another option to look at would be jQuery plugin like DataTables
one way i can think of is:
Add a new column (if feasible) or create a new table for holding the order of the items.
On any page you will show around 20 items based on the initial ordering.
Using the jquery's Draggable you can send updates to this table
I think you can do this with an extra column.
First, you could prepopulate this new column with a default sort order and then allow the user to interactively modify it with the drag and drop of jquery-ui.
Lets say this user has 100 items in the table. You set the values in the order column to [1,2,3,...,99,100]. I suggest that you run a script on the original table to set all items to a default sort order.
Now going back to your example where the user is presented with items 41-60: the initial presentation in their browser would rank those at orders [41,42,43,...,59,60]. You might also need to save the lowest order that appears in this subset, in this case 41. Or better yet, save the entire array of rankings and restore the exact same numbers in the new order. This covers the case where they select a set of records that are not already consecutively ordered, perhaps because they belong to someone else.
To demonstrate what I mean: when they reorder them in the page, your javascript reassigns those same numbers back to the subset in the new order. Like this:
item A : 41
item B : 45
item C : 46
item D : 47
item E : 51
item F : 54
item G : 57
then the user changes them to this order, but you reassign the numbers like this:
item D : 41
item F : 45
item E : 46
item A : 47
item C : 51
item B : 54
item G : 57
This should also work if the subset is consecutive.

Creating a Unique PostgreSQL Index that Contains Operations

I would like to create a unique PostgreSQL index that contains operations. Does anyone know if this is possible?
CREATE UNIQUE INDEX ux_data ON table (name, color)
The operation that I would like to add is "num & new_num = 0", where num is an existing column and new_num is a value to be inserted.
Is this possible?
Update: Here's more detail regarding what I want to do.
Database:
name color num
one red 5
two green 5
one red 8
What I want to do is to prevent the case where an entry is not unique, like:
New entry 1: name = one, color = red, num = 1.
We have matching names and colors and the first num check results in 101 & 001 = 1, this New entry 1 is not unique, and should be rejected, however, if the number was changed to 2.
New entry 2: name = one, color = red, num = 2
Now we have matching names and colors. For the num in all name/color matches 101 & 010 = 0, and 1000 & 0010 = 0, so we have a unique entry.
This is going to be a project and a half. It doesn't look like there is a very easy way to do this, and yet with PostgreSQL anything is possible. I think you want an exclude constraint using GIST. Unfortunately, that exclude constraint is not going to be an easy one to do since there are no easy types that support this.
Your basic solution I think is going to have to involve:
A cast to a suitable data type (presumably bit(n))
Sufficient operators to allow GiST indexes of bit(n). T his means you have to build functions and operators for all GiST operation classes, create operator families, etc.
Create an exclude constraint using your operations.
This will not be easy and it will be complicated but with some effort and dedication it is possible. I would expect one to two hundred lines of code will be involved in doing it. You'd probably want to write unit tests too.

How to add "weights" to a MySQL table and select random values according to these?

I want to create a table, with each row containing some sort of weight. Then I want to select random values with the probability equal to (weight of that row)/(weight of all rows). For example, having 5 rows with weights 1,2,3,4,5 out of 1000 I'd get approximately 1/15*1000=67 times first row and so on.
The table is to be filled manually. Then I'll take a random value from it. But I want to have an ability to change the probabilities on the filling stage.
I found this nice little algorithm in Quod Libet. You could probably translate it to some procedural SQL.
function WeightedShuffle(list of items with weights):
max_score ← the sum of every item’s weight
choice ← random number in the range [0, max_score)
current ← 0
for each item (i, weight) in items:
current ← current + weight
if current ≥ choice or i is the last item:
return item i
The easiest (and maybe best/safest?) way to do this is to add those rows to the table as many times as you want the weight to be - say I want "Tree" to be found 2x more often then "Dog" - I insert it 2 times into the table and I insert "Dog" once and just select elements at random one by one.
If the rows are complex/big then it would be best to create a separate table (weighted_Elements or something) in which you'll just have foreign keys to the real rows inserted as many times as the weights dictate.
The best possible scenario (if i understand your question properly) is to setup your table as you normally would and then add two columns both INT's.
Column 1: Weight - This column would hold your weight value going from -X to +X, X being the highest value you want to have as a weight (IE: X=100, -100 to 100). This value is populated to give the row an actual weight and increase or decrease the probability of it coming up.
Column 2: *Count** - This column would hold the count of how many times this row has come up, this column is needed only if you want to use fair weighting. Fair weighting prevents one row from always showing up. (IE: if you have one row weighted at 100 and another at 2 the row with 100 will always show up, this column will allow weight 2 to be more 'valueable' as you get more weight 100 results). This column should be incremented by 1 each time a row result is pulled but you can make the logic more advanced later so it adds the weight etc.
Logic: - Its really simple now, your query simply has to request all rows as you normally would then make an extra select that (you can change the logic here to whatever you want) takes the weights and subtracts the count and order by that column.
The end result should be a table where you will get your weights appearing more often until a certain point where the system will evenly distribute itself out (leave out column 2) and you will have a system that will always return the same weighted order unless you offset the base of the query (IE: LIMIT [RANDOM NUMBER], [NUMBER OF ROWS TO RETURN])
I'm not an expert in probability theory, but assuming you have a column called WEIGHT, how about
select FIELD_1, ... FIELD_N, (rand() * WEIGHT) as SCORE
from YOURTABLE
order by SCORE
limit 0, 10
This would give you 10 records, but you can change the limit clause, of course.
The problem is called Reservoir Sampling (https://en.wikipedia.org/wiki/Reservoir_sampling)
The A-Res algorithm is easy to implement in SQL:
SELECT *
FROM table
ORDER BY pow(rand(), 1 / weight) DESC
LIMIT 10;
I came looking for the answer to the same question - I decided to come up with this:
id weight
1 5
2 1
SELECT * FROM table ORDER BY RAND()/weight
it's not exact - but it is using random so i might not expect exact. I ran it 70 times to get number 2 10 times. I would have expect 1/6th but i got 1/7th. I'd say that's pretty close. I'd have to run a script to do it a few thousand times to get a really good idea if it's working.