Find number of rows with each property value, taking into account only the most recent rows in SQL

I have a database with tables that represent "edits" to "pages". Every edit has an ID, a timestamp and a "status", which takes certain discrete values. Pages have IDs and also have "categories".
I wish to find the number of pages with each status within a given category, taking into account only the state as of the most recent edit.
Edits:
+---------+---------+-----------+--------+
| edit_id | edit_page_id | edit_time | stat |
+---------+---------+-----------+--------+
| 1 | 10 | 20210502 | 90 |
| 2 | 10 | 20210503 | 91 |
| 3 | 20 | 20210504 | 91 |
| 4 | 30 | 20210504 | 90 |
| 5 | 30 | 20210505 | 92 |
| 6 | 40 | 20210505 | 90 |
| 7 | 50 | 20210503 | 90 |
+---------+---------+-----------+--------+
Pages:
+---------+--------+
| page_id | cat_id |
+---------+--------+
| 10 | 100 |
| 20 | 100 |
| 30 | 100 |
| 40 | 200 |
+---------+--------+
I want to get, for category 100:
+--------+-------+
| stat | count |
+--------+-------+
| 90 | 1 |
| 91 | 2 |
| 92 | 1 |
+--------+-------+
Pages 10 and 30 each have two edits, but the later edit "overrides" the earlier one, so only their edits with status 91 and 92 are counted. Pages 20 and 40 account for one 91 and one 90 respectively, and page 50 is in the wrong category, so it doesn't feature.
I have tried the following, but it doesn't seem to work. The idea was to select the max (i.e. latest) edit for each page with the right category. Then join that to the edit table and group by the status and count the rows:
SELECT stat, COUNT(*)
FROM edits as out_e
INNER JOIN (
SELECT edit_id, page_id, max(edit_time) as last_edit
FROM edits
INNER JOIN pages on edit_page_id = page_id
WHERE cat_id = 100
GROUP BY page_id
) in_e ON out_e.edit_id = in_e.edit_id
GROUP BY stat
ORDER BY stat;
"""
For example, in this fiddle: http://sqlfiddle.com/#!9/42f2ed/1
The result is:
+--------+-------+
| stat | count |
+--------+-------+
| 90 | 3 |
| 91 | 1 |
+--------+-------+
What is the correct way to get this information?

SELECT cat_id, stat, COUNT(*) cnt
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) edit_time
FROM edits
GROUP BY edit_page_id ) last_time ON edits.edit_page_id = last_time.edit_page_id
AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat
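Output:
+--------+------+-----+
| cat_id | stat | cnt |
+--------+------+-----+
| 100 | 90 | 1 |
| 100 | 91 | 2 |
| 100 | 92 | 1 |
| 200 | 90 | 1 |
+--------+------+-----+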
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=7592c7853481f6b5a9626c8d111c1d3b (the query is applicable to MariaDB 10.1).
Is it possible to join on the edit_id (which is a unique key for each edit)? – Inductiveload
No, it isn't. The row with cnt = 2 counts two different edit_id values, so which one should be shown?
However, you can get a concatenated list of the values: simply add GROUP_CONCAT(edit_id) to the output list.
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b2391972c3f7c4be4254e47514d0f1da
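To check the "join each edit to its page's MAX(edit_time)" approach locally, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database loaded with the question's sample rows. It assumes the column names edit_page_id and stat used in the queries; the function name latest_status_counts is hypothetical. With the Pages table exactly as printed, page 40 belongs to category 200, so category 100 comes back without a 90 row.

```python
import sqlite3


def latest_status_counts(conn, cat_id):
    """Count pages per status in one category, using only each page's latest edit."""
    sql = """
        SELECT e.stat, COUNT(*) AS cnt
        FROM pages p
        JOIN edits e ON e.edit_page_id = p.page_id
        -- last_time holds the most recent edit_time for every page
        JOIN (SELECT edit_page_id, MAX(edit_time) AS edit_time
              FROM edits
              GROUP BY edit_page_id) last_time
          ON e.edit_page_id = last_time.edit_page_id
         AND e.edit_time = last_time.edit_time
        WHERE p.cat_id = ?
        GROUP BY e.stat
        ORDER BY e.stat
    """
    return conn.execute(sql, (cat_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, edit_page_id INTEGER,"
             " edit_time INTEGER, stat INTEGER)")
conn.execute("CREATE TABLE pages (page_id INTEGER PRIMARY KEY, cat_id INTEGER)")
conn.executemany("INSERT INTO edits VALUES (?, ?, ?, ?)", [
    (1, 10, 20210502, 90), (2, 10, 20210503, 91), (3, 20, 20210504, 91),
    (4, 30, 20210504, 90), (5, 30, 20210505, 92), (6, 40, 20210505, 90),
    (7, 50, 20210503, 90),
])
conn.executemany("INSERT INTO pages VALUES (?, ?)",
                 [(10, 100), (20, 100), (30, 100), (40, 200)])

print(latest_status_counts(conn, 100))  # [(91, 2), (92, 1)]
print(latest_status_counts(conn, 200))  # [(90, 1)]
```

If a page could have two edits with the same edit_time, both would survive the join and that page would be counted twice.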

I think you don't need the second join; see if this query helps.
select
t1.stat, count(*) count_
from
(
SELECT
e.edit_id, p.page_id, e.stat,
rank() over(partition by e.edit_page_id order by e.edit_time desc) edit_rank
FROM
edits e
INNER JOIN pages p on e.edit_page_id = p.page_id
WHERE
p.cat_id = 100
) t1
where
t1.edit_rank = 1
group by
t1.stat
Fiddle URL: https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=0f681dc8d93cc3eebf9a03e0c8d84850
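One caveat with rank(): every tied row gets rank 1, so a page with two edits at the same edit_time would be counted twice. The sketch below is a variant of this answer, not the answer itself. It swaps in ROW_NUMBER() and breaks ties on edit_id, assuming a higher edit_id means a later edit. It runs against Python's sqlite3 and assumes the bundled SQLite is 3.25 or newer, which is required for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, edit_page_id INTEGER,
                        edit_time INTEGER, stat INTEGER);
    CREATE TABLE pages (page_id INTEGER PRIMARY KEY, cat_id INTEGER);
    INSERT INTO edits VALUES
        (1, 10, 20210502, 90), (2, 10, 20210503, 91), (3, 20, 20210504, 91),
        (4, 30, 20210504, 90), (5, 30, 20210505, 92), (6, 40, 20210505, 90),
        (7, 50, 20210503, 90);
    INSERT INTO pages VALUES (10, 100), (20, 100), (30, 100), (40, 200);
""")

# ROW_NUMBER() numbers each page's edits newest first; keeping rn = 1 leaves
# exactly one row per page, even when two edits share the same edit_time.
query = """
    SELECT t1.stat, COUNT(*) AS count_
    FROM (SELECT e.stat,
                 ROW_NUMBER() OVER (PARTITION BY e.edit_page_id
                                    ORDER BY e.edit_time DESC, e.edit_id DESC) AS rn
          FROM edits e
          JOIN pages p ON e.edit_page_id = p.page_id
          WHERE p.cat_id = ?) t1
    WHERE t1.rn = 1
    GROUP BY t1.stat
    ORDER BY t1.stat
"""
rows = conn.execute(query, (100,)).fetchall()
print(rows)  # [(91, 2), (92, 1)]
```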

select e1.stat, count(e1.stat) as count
from edits e1
join (
select edit_page_id, max(edit_time) as edit_time
from edits
where edit_page_id in (
select page_id
from pages
where cat_id = 100
)
group by edit_page_id
) as e2
on e1.edit_page_id = e2.edit_page_id and e1.edit_time = e2.edit_time
group by e1.stat;
Here's the link to the fiddle: http://sqlfiddle.com/#!9/42f2ed/40/0
Edit: updated to use edit_time instead of stat to find the latest record.

Related

How to order ids using a subtotal from another column in PostgreSQL

I have a table returned by a select query. Example:
id | day | count |
-- | ------ | ----- |
1 | 71 | 3 |
1 | 70 | 2 |
1 |Subtotal| 5 |
2 | 70 | 5 |
2 | 71 | 2 |
2 | 69 | 2 |
2 |Subtotal| 9 |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
The day column contains text values (varchar).
Subtotal is the sum of the counts for an id (e.g. id 2 has a subtotal of 5 + 2 + 2 = 9).
I now want to order this table so that the ids with the lowest subtotal come first, and within each id the rows are ordered by day with the subtotal row at the end (as before).
Expected output:
id | day | count |
-- | ------ | ----- |
3 | 69 | 1 |
3 | 70 | 1 |
3 |Subtotal| 2 |
1 | 70 | 2 |
1 | 71 | 3 |
1 |Subtotal| 5 |
2 | 69 | 2 |
2 | 70 | 5 |
2 | 71 | 2 |
2 |Subtotal| 9 |
I can't figure out how to order based on the subtotal only.
I've tried several ORDER BY variants (e.g. ORDER BY day = 'Subtotal' combined with others) and window functions, but none of them help. Cheers!
I'm not sure whether this applies directly to your source query (since you haven't included it); however, the ordering you require on the sample data can be done with:
order by Max(count) over(partition by id), day
Note: ordering by day works with your sample data, but because day is a string it will not honour numeric ordering (for example, '100' would sort before '69'). It should really be ordered by the source of the numeric value; since we don't have your actual query I can't suggest anything more specific, but you should be able to substitute the correct column or expression.
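This ordering works because, with non-negative counts, the Subtotal row always holds the largest count of its id, so MAX(count) OVER (PARTITION BY id) equals that id's subtotal. The sketch below loads the question's sample rows into SQLite via Python's sqlite3 to show the resulting order. Assumptions: the table name subtotals is hypothetical, count is renamed cnt, SQLite is 3.25 or newer, and id is added as a tie-breaker so that two ids with equal subtotals do not interleave.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the asker's query result; "cnt" replaces "count".
conn.execute("CREATE TABLE subtotals (id INTEGER, day TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO subtotals VALUES (?, ?, ?)", [
    (1, '71', 3), (1, '70', 2), (1, 'Subtotal', 5),
    (2, '70', 5), (2, '71', 2), (2, '69', 2), (2, 'Subtotal', 9),
    (3, '69', 1), (3, '70', 1), (3, 'Subtotal', 2),
])

# Sort ids by their subtotal, keep each id's rows together (id), then sort by
# day as text: 'Subtotal' sorts after the digit strings '69', '70', '71'.
rows = conn.execute("""
    SELECT id, day, cnt
    FROM subtotals
    ORDER BY MAX(cnt) OVER (PARTITION BY id), id, day
""").fetchall()
for row in rows:
    print(row)
```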
I just created a table with 3 columns and tried to reproduce your expected result. I suspected there might be a problem ordering by day (with the subtotal always on top), but it appears to work.
create table test
(
id int,
day varchar(15),
count int
);
insert into test
values
(1,'71',3),
(1,'70',2),
(2,'70',5),
(2,'71',2),
(2,'69',2),
(3,'69',1),
(3,'70',1);
select id, day, count
from
(
select id, day, sum(count) as count
from test
group by id, rollup(day)
) as t
order by Max(count) over(partition by id), day;
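In PostgreSQL, ROLLUP produces the subtotal row with day set to NULL, and PostgreSQL sorts NULLs last in ascending order by default, which is why the subtotal ends up at the bottom of each id. SQLite has no ROLLUP and sorts NULLs first, so the sketch below is an emulation rather than the same query: it adds the subtotal rows with UNION ALL and sorts on day IS NULL before day. It uses Python's sqlite3, assumes SQLite 3.25 or newer, and renames the count column to cnt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, day TEXT, cnt INTEGER)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", [
    (1, '71', 3), (1, '70', 2),
    (2, '70', 5), (2, '71', 2), (2, '69', 2),
    (3, '69', 1), (3, '70', 1),
])

# UNION ALL stands in for ROLLUP: the second SELECT adds one subtotal row per
# id with day = NULL. "day IS NULL" (0 or 1) pushes that row to the end.
rows = conn.execute("""
    SELECT id, day, cnt
    FROM (SELECT id, day, SUM(cnt) AS cnt FROM test GROUP BY id, day
          UNION ALL
          SELECT id, NULL, SUM(cnt) FROM test GROUP BY id) AS t
    ORDER BY MAX(cnt) OVER (PARTITION BY id), id, day IS NULL, day
""").fetchall()
for row in rows:
    print(row)
```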

How to trace a record all the way back to its origin using SQL

We have a table called ticketing that tracks all the service tickets. One ticket can lead to another ticket, which can lead to yet another, as indicated by the replaced_by_ticket_id field below:
| ticket_id | is_current | replaced_by_ticket_id |
|-----------|------------|-----------------------|
| 134 | 0 | 240 |
| 240 | 0 | 321 |
| 321 | 1 | Null |
| 34 | 0 | 93 |
| 25 | 0 | 16 |
| 16 | 0 | 25 |
| 93 | 1 | Null |
How do I write a query to get the number of tickets leading to each current ticket (321 and 93)? I could join the table to itself, but there is no way of knowing how many times to join, and different tickets have different numbers of levels.
Here is the expected result of the query
| ticket_id | total_tickets |
|-----------|---------------|
| 321 | 3 |
| 93 | 4 |
What is the best way to do it?
You can use a recursive query; the trick is to keep track of the original "current" ticket, so you can aggregate by that in the outer query.
So:
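-- PostgreSQL and MySQL 8+ require WITH RECURSIVE here; SQL Server uses plain WITH.
with cte as (
-- anchor: each current ticket starts its own chain
select ticket_id, ticket_id as parent_id from ticketing where is_current = 1
union all
-- recursive step: find the ticket that was replaced by the last one found
select c.ticket_id, t.ticket_id
from ticketing t
inner join cte c on c.parent_id = t.replaced_by_ticket_id
)
select ticket_id, count(*) total_tickets
from cte
group by ticket_id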
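Here is a minimal sketch of this recursive query running on SQLite through Python's sqlite3, with the question's rows loaded exactly as printed. With those rows, ticket 16 points back to 25 instead of into the chain of 93, so 93 gets a total of 2 rather than the expected 4. The 25/16 loop is never reached from a current ticket, so the recursion still terminates; a loop reachable from a current ticket would make this UNION ALL version recurse indefinitely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ticketing (ticket_id INTEGER, is_current INTEGER,"
             " replaced_by_ticket_id INTEGER)")
conn.executemany("INSERT INTO ticketing VALUES (?, ?, ?)", [
    (134, 0, 240), (240, 0, 321), (321, 1, None),
    (34, 0, 93), (25, 0, 16), (16, 0, 25), (93, 1, None),
])

# ticket_id keeps the current ticket that started the chain; parent_id walks
# backwards one predecessor per recursion step.
rows = conn.execute("""
    WITH RECURSIVE cte(ticket_id, parent_id) AS (
        SELECT ticket_id, ticket_id FROM ticketing WHERE is_current = 1
        UNION ALL
        SELECT c.ticket_id, t.ticket_id
        FROM ticketing t
        JOIN cte c ON c.parent_id = t.replaced_by_ticket_id
    )
    SELECT ticket_id, COUNT(*) AS total_tickets
    FROM cte
    GROUP BY ticket_id
    ORDER BY ticket_id
""").fetchall()
print(rows)  # [(93, 2), (321, 3)]
```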

sum last values and group by

I have "steps" table like this
id | points | game_id | price | user_id | timestamp | some | additional | fields
it contains game information.
I have a code which can group by game_id
SELECT game_id, MIN(timestamp),
(SELECT points FROM steps as t2 WHERE t2.game_id = t1.game_id ORDER BY t2.id DESC LIMIT 1) as last_point
FROM steps as t1
WHERE user_id = 1
GROUP BY game_id
But I want to group by price and sum the last point of each game. My query is:
SELECT COUNT(DISTINCT game_id) as game_count, COUNT(id) as step_count, SUM(points), price
FROM steps WHERE user_id = 1
GROUP BY price
But this query returns a sum of all points while I need a sum of the last point in each game.
Please point me in the right direction.
Example result:
last_points_sum | game_count | step_count | price
200 | 2 | 3 | 100
400 | 3 | 4 | 200
where the table is:
id | points | game_id | price | user_id | timestamp
1 | 10 | 5 | 100 | 1 | 100000001
2 | 200 | 5 | 100 | 1 | 100000002
3 | 200 | 6 | 200 | 1 | 100000003
4 | 0 | 6 | 200 | 1 | 100000004
5 | 400 | 6 | 200 | 1 | 100000005
Is this what you're looking for?
This assumes that timestamp is unique, at least within each game_id.
SELECT
COUNT(DISTINCT game_id) AS game_count,
COUNT(id) AS step_count,
SUM(COALESCE(ltIsLastPoints, 0.0) * points),
price
FROM
(SELECT
game_id ltGameID,
MAX(timestamp) ltTimestamp,
1.0 ltIsLastPoints
FROM
steps
GROUP BY
game_id
) lt RIGHT JOIN
steps
ON ltGameID = game_id
AND ltTimestamp = timestamp
WHERE
user_id = 1
GROUP BY
price;
Your description says you want to group by points but your example query groups by price. I went with price.
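For comparison, here is a sketch of a different technique, not this answer's RIGHT JOIN: ROW_NUMBER() marks each game's latest step, and only those rows feed the sum. It runs on Python's sqlite3 with the question's five sample rows and assumes SQLite 3.25 or newer. One design difference is that user_id is filtered before ranking, so "last step" here means the user's own last step, whereas the answer's lt subquery looks at all users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE steps (id INTEGER PRIMARY KEY, points INTEGER, game_id INTEGER,
                                    price INTEGER, user_id INTEGER, timestamp INTEGER)""")
conn.executemany("INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 10, 5, 100, 1, 100000001),
    (2, 200, 5, 100, 1, 100000002),
    (3, 200, 6, 200, 1, 100000003),
    (4, 0, 6, 200, 1, 100000004),
    (5, 400, 6, 200, 1, 100000005),
])

# rn = 1 marks the most recent step of each game; every step is still counted
# in step_count, but only rn = 1 rows contribute points to the sum.
rows = conn.execute("""
    SELECT SUM(CASE WHEN rn = 1 THEN points ELSE 0 END) AS last_points_sum,
           COUNT(DISTINCT game_id) AS game_count,
           COUNT(id) AS step_count,
           price
    FROM (SELECT s.*,
                 ROW_NUMBER() OVER (PARTITION BY game_id
                                    ORDER BY timestamp DESC) AS rn
          FROM steps s
          WHERE user_id = ?) AS t
    GROUP BY price
    ORDER BY price
""", (1,)).fetchall()
print(rows)  # [(200, 1, 2, 100), (400, 1, 3, 200)]
```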

Loop over one table, subselect another table and update values of first table with SQL/VBA

I have a source table that has a few different prices for each product (depending on the order quantity). Those prices are listed vertically, so each product could have more than one row to display its prices.
Example:
ID | Quantity | Price
--------------------------
001 | 5 | 100
001 | 15 | 90
001 | 50 | 80
002 | 10 | 20
002 | 20 | 15
002 | 30 | 10
002 | 40 | 5
The other table I have is the result table, in which there is only one row for each product but five pairs of columns, each of which can hold the quantity and price from one row of the source table.
Example:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | | | | | | | | | |
002 | | | | | | | | | |
Result:
ID | Quantity_1 | Price_1 | Quantity_2 | Price_2 | Quantity_3 | Price_3 | Quantity_4 | Price_4 | Quantity_5 | Price_5
---------------------------------------------------------------------------------------------------------------------------
001 | 5 | 100 | 15 | 90 | 50 | 80 | | | |
002 | 10 | 20 | 20 | 15 | 30 | 10 | 40 | 5 | |
Here is my Python/SQL pseudocode for this (I'm fully aware that it cannot work as written, but it was the only way I could show you my interpretation of a solution to this problem):
For Each result_ID In result_table.ID:
Subselect = (SELECT * FROM source_table WHERE source_table.ID = result_ID ORDER BY source_table.Quantity) # the Subselect should only contain rows where the IDs are the same
For n in Range(0, len(Subselect)): # n (index) should start from 0 to last row - 1
price_column_name = 'Price_' & (n + 1)
quantity_column_name = 'Quantity_' & (n + 1)
(UPDATE result_table
SET result_table.price_column_name = Subselect[n].Price, # this should be the price of the n-th row in Subselect
result_table.quantity_column_name = Subselect[n].Quantity # this should be the quantity of the n-th row in Subselect
WHERE result_table.ID = Subselect[n].ID)
I honestly have no idea how to do this with only SQL or VBA (those are the only languages I can use, since this is MS Access).
This is a pain in MS Access. If you can enumerate the values, you can pivot them.
If we assume that price is unique within each id (or quantity, or both), then you can generate such a sequence number column and pivot on it:
select id,
max(iif(seqnum = 1, quantity, null)) as quantity_1,
max(iif(seqnum = 1, price, null)) as price_1,
. . .
from (select st.*,
(select count(*)
from source_table st2
where st2.id = st.id and st2.price >= st.price
) as seqnum
from source_table st
) st
group by id;
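Here is a minimal sketch of the same enumerate-then-pivot idea, run on SQLite through Python's sqlite3 with the question's sample rows. Assumptions: SQLite's CASE replaces Access's IIF, the derived-table alias is renamed to ranked for readability, and the five quantity/price column pairs are generated in Python rather than typed out. It produces the pivoted rows as a SELECT; it does not update result_table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_table (id TEXT, quantity INTEGER, price INTEGER)")
conn.executemany("INSERT INTO source_table VALUES (?, ?, ?)", [
    ('001', 5, 100), ('001', 15, 90), ('001', 50, 80),
    ('002', 10, 20), ('002', 20, 15), ('002', 30, 10), ('002', 40, 5),
])

# One MAX(CASE ...) pair per slot: slot n picks the row whose seqnum is n.
pivot_cols = ",\n".join(
    f"MAX(CASE WHEN seqnum = {n} THEN quantity END) AS quantity_{n},\n"
    f"MAX(CASE WHEN seqnum = {n} THEN price END) AS price_{n}"
    for n in range(1, 6)
)

# seqnum counts rows of the same id with a price >= this row's price, so the
# highest price gets 1. This relies on prices being unique within an id.
sql = f"""
    SELECT id,
{pivot_cols}
    FROM (SELECT st.*,
                 (SELECT COUNT(*) FROM source_table st2
                  WHERE st2.id = st.id AND st2.price >= st.price) AS seqnum
          FROM source_table st) AS ranked
    GROUP BY id
    ORDER BY id
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```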
I should note that another solution would use data frames in Python. If you want to take that route, ask another question and tag it with the appropriate Python tags. This question is clearly a SQL question.

Subtract the value of a row from grouped result

I have a table supplier_account which has five columns: supplier_account_id (PK), supplier_id (FK), voucher_no, debit and credit. I want to get the sum of debit grouped by supplier_id and then subtract the credit value of each row in which voucher_no is not null, so that the remaining debit sum is reduced row by row. I have tried using a WITH clause:
with debitdetails as(
select supplier_id,sum(debit) as amt
from supplier_account group by supplier_id
)
select acs.supplier_id,s.supplier_name,acs.purchase_voucher_no,acs.purchase_voucher_date,dd.amt-acs.credit as amount
from supplier_account acs
left join supplier s on acs.supplier_id=s.supplier_id
left join debitdetails dd on acs.supplier_id=dd.supplier_id
where voucher_no is not null
But here the debit value is the same for all rows. After the subtraction in the first row, I want to carry that result to the second row and subtract the next credit value from it.
I know it is possible by using temporary tables. The problem is I cannot use temporary tables because the procedure is used to generate reports using Jasper Reports.
What you need is a running total. The easiest way to compute it is with a window function:
with debitdetails as(
select id,sum(debit) as amt
from suppliers group by id
)
select s.id, purchase_voucher_no, dd.amt, s.credit,
dd.amt - sum(s.credit) over (partition by s.id order by purchase_voucher_no asc)
from suppliers s
left join debitdetails dd on s.id=dd.id
order by s.id, purchase_voucher_no
SQL Fiddle
Results:
| id | purchase_voucher_no | amt | credit | ?column? |
|----|---------------------|-----|--------|----------|
| 1 | 1 | 43 | 5 | 38 |
| 1 | 2 | 43 | 18 | 20 |
| 1 | 3 | 43 | 8 | 12 |
| 2 | 4 | 60 | 5 | 55 |
| 2 | 5 | 60 | 15 | 40 |
| 2 | 6 | 60 | 30 | 10 |
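Here is a minimal sketch that reproduces the running balance above with Python's sqlite3, assuming SQLite 3.25 or newer. The fiddle's individual debit rows are not shown, so the rows below are hypothetical: each supplier's whole debit sits on its first voucher, chosen only so the totals (43 and 60) and credits match the result table. The running-balance column is aliased balance here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE suppliers (id INTEGER, purchase_voucher_no INTEGER,
                                        debit INTEGER, credit INTEGER)""")
# Hypothetical rows: debits placed so that SUM(debit) is 43 and 60.
conn.executemany("INSERT INTO suppliers VALUES (?, ?, ?, ?)", [
    (1, 1, 43, 5), (1, 2, 0, 18), (1, 3, 0, 8),
    (2, 4, 60, 5), (2, 5, 0, 15), (2, 6, 0, 30),
])

# SUM(credit) OVER (... ORDER BY voucher) is the credit paid so far for each
# supplier, so subtracting it from the debit total gives the running balance.
rows = conn.execute("""
    WITH debitdetails AS (
        SELECT id, SUM(debit) AS amt FROM suppliers GROUP BY id
    )
    SELECT s.id, s.purchase_voucher_no, dd.amt, s.credit,
           dd.amt - SUM(s.credit) OVER (PARTITION BY s.id
                                        ORDER BY s.purchase_voucher_no) AS balance
    FROM suppliers s
    LEFT JOIN debitdetails dd ON s.id = dd.id
    ORDER BY s.id, s.purchase_voucher_no
""").fetchall()
for row in rows:
    print(row)
```

With the default window frame, rows that share a purchase_voucher_no within one supplier are peers and get the same running total; the voucher numbers here are unique, so each row gets its own balance.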