Calculating relative frequencies in SQL

I am working on a tag recommendation system that takes metadata strings (e.g. text descriptions) of an object and splits them into 1-, 2- and 3-grams.
The data for this system is kept in 3 tables:
(1) The "object" table (i.e. what is being described),
(2) The "token" table, filled with all 1-, 2- and 3-grams found (examples below), and
(3) The "mapping" table, which maintains associations between (1) and (2), as well as a frequency count for these occurrences.
I can therefore build a result set with a LEFT JOIN that looks roughly like this:
SELECT mapping.object_id, mapping.token_id, mapping.freq, token.token_size, token.token
FROM mapping LEFT JOIN
token
ON (mapping.token_id = token.id)
WHERE mapping.object_id = 1;
object_id  token_id  freq  token_size  token
---------  --------  ----  ----------  -------------
1          1         1     2           'a big'
1          2         1     1           'a'
1          3         1     1           'big'
1          4         2     3           'a big slice'
1          5         1     1           'slice'
1          6         3     2           'big slice'
Now I'd like to get the relative probability of each term within the context of a single object ID, so that I can sort by it and see which terms are most probable (e.g. ORDER BY rel_prob DESC LIMIT 25).
For each row, I'm envisioning an extra column holding freq divided by the sum of all freqs for that row's token_size. For 'a big', for instance, that would be 1/(1+3) = 0.25. For 'a', it's 1/(1+1+1) = 0.333, etc.
I can't, for the life of me, figure out how to do this. Any help is greatly appreciated!

If I understood your problem, here's the query you need:
select
    m.object_id, m.token_id, m.freq,
    t.token_size, t.token,
    -- cast avoids integer division; the window sums freq per object and token size
    cast(m.freq as decimal(29, 10)) / sum(m.freq) over (partition by t.token_size, m.object_id) as rel_prob
from mapping as m
    left outer join token as t on m.token_id = t.id
where m.object_id = 1
order by rel_prob desc;
Hope that helps.
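To sanity-check the window-function approach, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs the same idea. Assumptions: SQLite 3.25 or newer (window functions), and CAST(... AS REAL) standing in for decimal(29, 10), because SQLite has no fixed-point decimal type.

```python
import sqlite3

# In-memory copy of the question's sample data for object_id = 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE token (id INTEGER PRIMARY KEY, token TEXT, token_size INTEGER);
    CREATE TABLE mapping (object_id INTEGER, token_id INTEGER, freq INTEGER);
    INSERT INTO token VALUES
        (1, 'a big', 2), (2, 'a', 1), (3, 'big', 1),
        (4, 'a big slice', 3), (5, 'slice', 1), (6, 'big slice', 2);
    INSERT INTO mapping VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 1), (1, 4, 2), (1, 5, 1), (1, 6, 3);
""")

# freq divided by the total freq of all tokens with the same token_size
# for the same object; CAST to REAL avoids integer division in SQLite.
rows = conn.execute("""
    SELECT t.token,
           CAST(m.freq AS REAL)
             / SUM(m.freq) OVER (PARTITION BY m.object_id, t.token_size) AS rel_prob
    FROM mapping AS m
    LEFT JOIN token AS t ON m.token_id = t.id
    WHERE m.object_id = 1
    ORDER BY rel_prob DESC, t.token
""").fetchall()

rel_prob = dict(rows)
for token, p in rows:
    print(f"{token!r}: {p:.3f}")
```

The PARTITION BY clause is what restricts each sum to one (object, token_size) group, so 'a big' is divided by 1 + 3 and each unigram by 1 + 1 + 1.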

Related

Subset large table for use in multiple UNIONs

Suppose I have a table with the following structure:
id  measure_1_actual  measure_1_predicted  measure_2_actual  measure_2_predicted
1   1                 0                    0                 0
2   1                 1                    1                 1
3   .                 .                    0                 0
For each ID, I want to create the following table (shown here for id = 1):
measure  actual  predicted
1        1       0
2        0       0
Here's one way I could solve this problem (I haven't tested this, but you get the general idea, I hope):
SELECT 1 AS measure,
measure_1_actual AS actual,
measure_1_predicted AS predicted
FROM tb
WHERE id = 1
UNION
SELECT 2 AS measure,
measure_2_actual AS actual,
measure_2_predicted AS predicted
FROM tb WHERE id = 1
In reality, I have five of these "measures" and tens of millions of members; subsetting such a large table five times per request does not seem like the most efficient approach. This is a real-time API receiving tens of requests a minute, so I think I need a better way of doing this. My other thought was to create a temp table/view for the member once a request is received, and then UNION off that subsetted table.
Does anyone have a more efficient way of doing this?
You can use a lateral join:
select t.id, v.*
from tb t cross join lateral
     (values (1, t.measure_1_actual, t.measure_1_predicted),
             (2, t.measure_2_actual, t.measure_2_predicted)
     ) v(measure, actual, predicted)
where t.id = 1;
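Lateral joins were introduced in Postgres 9.3. This reads the member's row once and expands it into one output row per measure; add one more row to the VALUES list for each of the five measures. You can read about them in the documentation.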
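SQLite has no LATERAL, so as a portable stand-in (a different technique from the answer's lateral join) the same unpivot can be written as a cross join against a tiny list of measure numbers, with CASE picking the matching columns. This sketch runs through Python's sqlite3; NULL stands in for the question's "." values, and the helper name measures_for is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb (
        id INTEGER PRIMARY KEY,
        measure_1_actual INTEGER, measure_1_predicted INTEGER,
        measure_2_actual INTEGER, measure_2_predicted INTEGER
    );
    -- NULL stands in for the '.' (missing) values in the question
    INSERT INTO tb VALUES (1, 1, 0, 0, 0), (2, 1, 1, 1, 1), (3, NULL, NULL, 0, 0);
""")

# The member's single row is paired with every measure number,
# and CASE selects the actual/predicted columns for that measure.
UNPIVOT_SQL = """
    SELECT tb.id, m.measure,
           CASE m.measure WHEN 1 THEN tb.measure_1_actual
                          WHEN 2 THEN tb.measure_2_actual END AS actual,
           CASE m.measure WHEN 1 THEN tb.measure_1_predicted
                          WHEN 2 THEN tb.measure_2_predicted END AS predicted
    FROM tb
    CROSS JOIN (SELECT 1 AS measure UNION ALL SELECT 2) AS m
    WHERE tb.id = ?
    ORDER BY m.measure
"""

def measures_for(member_id):
    """Return (id, measure, actual, predicted) rows for one member (hypothetical helper)."""
    return conn.execute(UNPIVOT_SQL, (member_id,)).fetchall()

print(measures_for(1))
```

With an index (here the primary key) on id, the table is probed once per request rather than once per measure, which is the property the question is after.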

QGIS SQL Query - "Deleting almost duplicate entries"

I have a table containing a distance matrix between all the points of another table. In the distance matrix, I only kept the rows with a distance of less than 100 m.
I call points placed less than 100 m away from each other duplicate entries. But in the distance matrix, each duplicate entry takes 2 rows.
The distance matrix looks like this:
InputID  TargetID  Distance
1        2         75
1        3         35
2        1         75
3        1         35
I'd like to keep just one of those duplicate entries, which means that in the previous example I'd like to keep only the row for point 1, because points 2 and 3 are less than 100 m away from point 1. But if I only keep point 1 in the distance matrix, I also need to keep only point 1 in my original table.
I use the SQL Query tool of QGIS, but I don't really know how to program. Can anyone help me, please?
Thanks!
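You could use a subquery to retrieve the rows to delete. DELETE ... INNER JOIN is not valid in SQLite or PostgreSQL, so the join is expressed here with EXISTS:
-- For every Distance value that occurs more than once, delete the row
-- with the smallest InputID (i.e. one row of each mirrored pair).
-- Assumes two different pairs never share exactly the same Distance.
delete from my_table
where exists (
    select 1
    from (
        select Distance, min(InputID) as min_id
        from my_table
        group by Distance
        having count(*) > 1
    ) as t2
    where t2.Distance = my_table.Distance
      and t2.min_id = my_table.InputID
);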
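If the goal is exactly the one in the question (keep point 1, drop points 2 and 3 from both tables), a different approach that does not rely on distances being unique may be simpler: keep only the orientation InputID < TargetID of each mirrored pair, then drop every remaining TargetID from the points table. This is a sketch run through Python's sqlite3 (QGIS virtual layers and GeoPackage files are SQLite-based); the table names points and distances and the extra far-away point 4 are hypothetical. It is greedy: in a chain where 1-2 and 2-3 are both under 100 m, it would drop 3 even if 3 is more than 100 m from 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY);
    CREATE TABLE distances (InputID INTEGER, TargetID INTEGER, Distance REAL);
    INSERT INTO points VALUES (1), (2), (3), (4);   -- point 4 has no close neighbour
    INSERT INTO distances VALUES (1, 2, 75), (1, 3, 35), (2, 1, 75), (3, 1, 35);
""")

# Step 1: each close pair is stored twice (A->B and B->A);
# keep only the orientation where InputID < TargetID.
conn.execute("DELETE FROM distances WHERE InputID > TargetID")

# Step 2 (greedy simplification): every remaining TargetID is treated
# as a near-duplicate of its InputID and removed from the points table.
conn.execute("DELETE FROM points WHERE id IN (SELECT TargetID FROM distances)")

remaining_pairs = conn.execute(
    "SELECT InputID, TargetID, Distance FROM distances ORDER BY InputID, TargetID"
).fetchall()
remaining_points = [r[0] for r in conn.execute("SELECT id FROM points ORDER BY id")]
print(remaining_pairs, remaining_points)
```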

Pair Objective in SQL (Access)

I have a question about returning a pair "objective" such that some demand is met. The example can be depicted as an n-by-3 matrix, as below.
The goal is to find the pair of products (within the same ID) that has the least cost. IDs with only a single row are ignored. If a pair appears multiple times, its costs are summed.
ID  Product  Cost
1   a        2
1   b        3
2   c        4
3   d        5
3   b        6
4   a        6
4   b        5
4   d        4
5   c        3
6   a        2
That means (a,b) is a pair to be considered in ID = 1, (d,b) is a pair in ID = 3, (a,b) appears once again in ID = 4, and so does (b,d). However, ID = 4 also contains another pair, (a,d), which appears only once in the whole table. The order within a pair does not matter. So I sum the costs of the two (a,b) pairs and of the two (b,d) pairs, and compare them with (a,d) to see which is cheapest. The costs of both products within each pair are summed as well.
The goal is to return the product pair that has the least cost. For our example, the results would be:
(a,b) = 5 (ID = 1) + 11 (ID = 4) = 16
(b,d) = 11 (ID = 3) + 9 (ID = 4) = 20
(a,d) = 10 (ID = 4)
Solution: (a,d) is the optimal pair. Note that there are cheaper solutions if I only consider single products instead of pairs, but that is not the objective. I am looking for the pair within the column that has the least cost.
I hope my question is clear, and that someone can help me with the query. Many thanks in advance!
Best,
David
If I understand correctly, you want a self-join and aggregation. The self-join generates all pairs of products for each id. The aggregation calculates the sum of costs for them:
select top 1 t1.product as product1, t2.product as product2,
       sum(t1.cost + t2.cost) as total_cost
from t as t1 inner join
     t as t2
     on t1.id = t2.id
where t1.product < t2.product   -- one orientation per pair, no self-pairs
group by t1.product, t2.product
order by sum(t1.cost + t2.cost);
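A quick check of the self-join against the question's data, run through Python's sqlite3 as a stand-in for Access. Assumption: SQLite has no TOP, so the full ranked list is fetched and its first row plays the role of TOP 1; the table name t follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, product TEXT, cost INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, "a", 2), (1, "b", 3), (2, "c", 4), (3, "d", 5), (3, "b", 6),
    (4, "a", 6), (4, "b", 5), (4, "d", 4), (5, "c", 3), (6, "a", 2),
])

# Self-join on id builds every product pair inside each id;
# t1.product < t2.product keeps one orientation and drops self-pairs.
# IDs with a single row produce no pairs and drop out automatically.
pair_costs = conn.execute("""
    SELECT t1.product, t2.product, SUM(t1.cost + t2.cost) AS total_cost
    FROM t AS t1
    INNER JOIN t AS t2 ON t1.id = t2.id
    WHERE t1.product < t2.product
    GROUP BY t1.product, t2.product
    ORDER BY total_cost, t1.product, t2.product
""").fetchall()

cheapest = pair_costs[0]   # equivalent of Access's TOP 1
print(pair_costs)
print(cheapest)
```

The totals reproduce the question's hand calculation: (a,d) = 10, (a,b) = 16, (b,d) = 20.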

How to change / convert values in Output that comes from SQL Server table

I have created a view in my SQL Server database that returns a number of columns.
One of the columns is Priority, and its values are Low, Medium, High and Immediate.
When I execute this view, the results are returned correctly, as shown below. I want to change, or assign, values for these priorities: instead of Low I should get 4, instead of Medium 3, for High 2 and for Immediate 1.
What should I do to achieve this?
Ticket#  Priority
123      Low
1254     Low
5478     Medium
4585     High
etc., etc.
Use CASE:
"Instead of Low I should get 4, instead of Medium I should get 3, for High it should be 2 and for Immediate it should be 1"
SELECT
[Ticket#],
[Priority] = CASE Priority
WHEN 'Low' THEN 4
WHEN 'Medium' THEN 3
WHEN 'High' THEN 2
WHEN 'Immediate' THEN 1
ELSE NULL
END
FROM table_name;
EDIT:
If you use a dictionary table as in George Botros's solution, you need to remember to:
1) Maintain and store the dictionary table
2) Add a UNIQUE index on Priority.Name to avoid duplicates like:
Priority table
--------------------
Id | Name | Value
--------------------
1 | Low | 4
2 | Low | 4
...
3) Defensively, use LEFT JOIN instead of INNER JOIN, so that you get all results even if there is no corresponding value in the dictionary table.
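For a quick local check of the CASE mapping, here is a sketch using Python's sqlite3. Assumptions: the table name tickets is hypothetical (the answer uses table_name), and the alias is written as CASE ... END AS Priority because the [Priority] = CASE ... form is T-SQL-specific syntax that SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE tickets ("Ticket#" INTEGER, Priority TEXT)')  # hypothetical table
conn.executemany("INSERT INTO tickets VALUES (?, ?)",
                 [(123, "Low"), (1254, "Low"), (5478, "Medium"), (4585, "High")])

# Simple CASE: compares Priority against each WHEN value in turn;
# a label with no matching WHEN yields NULL (same as ELSE NULL).
rows = conn.execute("""
    SELECT "Ticket#",
           CASE Priority
               WHEN 'Low' THEN 4
               WHEN 'Medium' THEN 3
               WHEN 'High' THEN 2
               WHEN 'Immediate' THEN 1
           END AS Priority
    FROM tickets
    ORDER BY "Ticket#"
""").fetchall()
print(rows)
```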
An alternative solution is to create a new Priority table (Id, Name, Value).
By joining to this table, you can select the Value column:
SELECT Ticket.*, Priority.Value
FROM Ticket INNER JOIN Priority
ON Priority.Name = Ticket.Priority
Note: although the CASE expression is the most straightforward solution to this problem, this approach may be useful if you need the priority value in many places in your system.
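The dictionary-table approach, combined with the two safeguards from the CASE answer's edit (a UNIQUE constraint on Name and a LEFT JOIN), can be sketched with Python's sqlite3 as follows. The 'Urgent' ticket is a hypothetical unmatched label added only to show what the LEFT JOIN preserves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Ticket ("Ticket#" INTEGER, Priority TEXT);
    -- UNIQUE on Name rejects a second 'Low' row
    CREATE TABLE Priority (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE, Value INTEGER);
    INSERT INTO Priority (Name, Value)
        VALUES ('Low', 4), ('Medium', 3), ('High', 2), ('Immediate', 1);
    INSERT INTO Ticket VALUES (123, 'Low'), (5478, 'Medium'), (9999, 'Urgent');
""")

try:
    conn.execute("INSERT INTO Priority (Name, Value) VALUES ('Low', 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# LEFT JOIN keeps tickets whose label has no entry in the dictionary table
# (their Value comes back as NULL instead of the row disappearing).
rows = conn.execute("""
    SELECT Ticket."Ticket#", Ticket.Priority, Priority.Value
    FROM Ticket
    LEFT JOIN Priority ON Priority.Name = Ticket.Priority
    ORDER BY Ticket."Ticket#"
""").fetchall()
print(duplicate_rejected, rows)
```

With INNER JOIN instead, ticket 9999 would silently vanish from the output.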

Please help me build a query for the table below in SQL

I have a table named conversion with the columns shown below. For each area, I want to multiply the length and width values (l*w) from the dimensions column and display the result in a new table.
Please let me know if anything changes for the same logic in MS Access.
It's probably simple, but I don't know the exact query to solve the problem. Waiting for your solutions.
ID  area  length/width  dimensions  new column (L*W)
1   1     l             3           3*5=15
2   1     w             5
3   2     l             4
4   2     w             8
5   3     l             6
6   3     w             10
7   4     l             12
8   4     w             13
9   4     W             10
Waiting for your reply.
You could query the table twice: once for lengths and once for widths and then join by area and multiply the values:
-- lenwidth stands for the question's length/width column
select length.area, length.dimensions * width.dimensions as product
from
    (select area, dimensions from conversion where lenwidth = 'l') length
inner join
    (select area, dimensions from conversion where lenwidth = 'w') width
on length.area = width.area;
Two remarks:
I suppose that it is a typo that you have two width entries for area 4? Otherwise you would have to decide which value to take in above select statement.
It would not be a good idea to keep the old table and have a new table holding the results. What if you change a value? You would have to remember to change the result accordingly every time. So either ditch the old table or use a view instead of a new table.
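Running the two-subquery join against the question's rows shows how the second width entry for area 4 behaves. This sketch uses Python's sqlite3, where = on text is case-sensitive by default, so the uppercase 'W' row is simply ignored; under a case-insensitive collation (as Access and many SQL Server setups use) it would match too and area 4 would produce two rows. The column name lenwidth follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversion (id INTEGER, area INTEGER, lenwidth TEXT, dimensions INTEGER)")
conn.executemany("INSERT INTO conversion VALUES (?, ?, ?, ?)", [
    (1, 1, "l", 3), (2, 1, "w", 5), (3, 2, "l", 4), (4, 2, "w", 8),
    (5, 3, "l", 6), (6, 3, "w", 10), (7, 4, "l", 12), (8, 4, "w", 13),
    (9, 4, "W", 10),
])

# One subquery per dimension type, joined on area, then multiplied.
areas = conn.execute("""
    SELECT len.area, len.dimensions * wid.dimensions AS product
    FROM (SELECT area, dimensions FROM conversion WHERE lenwidth = 'l') AS len
    INNER JOIN (SELECT area, dimensions FROM conversion WHERE lenwidth = 'w') AS wid
        ON len.area = wid.area
    ORDER BY len.area
""").fetchall()
print(areas)
```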
Try this:
select *,
dimensions*(lead(dimensions) over(order by id)) product
from table1;
Or, if you want it per area (partitioning by area keeps LEAD from reading into the next area):
select *,
    case when length_width = 'l'
              and (lead(length_width) over (partition by area order by id)) = 'w'
         then dimensions * (lead(dimensions) over (partition by area order by id))
         else 0
    end as product
from table1;
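A check of the per-area LEAD query on the question's rows, run through Python's sqlite3 (assumes SQLite 3.25+ for window functions; table1 and length_width follow the answer). Only each area's 'l' row carries the product; every other row gets 0, including the trailing uppercase 'W' row, which has no following row in its partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, area INTEGER, length_width TEXT, dimensions INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", [
    (1, 1, "l", 3), (2, 1, "w", 5), (3, 2, "l", 4), (4, 2, "w", 8),
    (5, 3, "l", 6), (6, 3, "w", 10), (7, 4, "l", 12), (8, 4, "w", 13),
    (9, 4, "W", 10),
])

# LEAD looks at the next row within the same area (ordered by id);
# a length row immediately followed by a width row yields l * w.
rows = conn.execute("""
    SELECT id, area,
           CASE WHEN length_width = 'l'
                 AND LEAD(length_width) OVER (PARTITION BY area ORDER BY id) = 'w'
                THEN dimensions * LEAD(dimensions) OVER (PARTITION BY area ORDER BY id)
                ELSE 0
           END AS product
    FROM table1
    ORDER BY id
""").fetchall()

products = {row_id: product for row_id, _area, product in rows}
print(products)
```

Note that LEAD is a window function and is not available in MS Access, so there the join-based query from the first answer is the option to use.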