sum last values and group by - sql

I have a "steps" table like this:
id | points | game_id | price | user_id | timestamp | some | additional | fields
It contains game information.
I have a query that groups by game_id:
SELECT game_id, MIN(timestamp),
(SELECT points FROM steps as t2 WHERE t2.game_id = t1.game_id ORDER BY t2.id DESC LIMIT 1) as last_point
FROM steps as t1
WHERE user_id = 1
GROUP BY game_id
But I want to group by price and sum the last point of each game. My query is:
SELECT COUNT(DISTINCT game_id) as game_count, COUNT(id) as step_count, SUM(points), price
FROM steps WHERE user_id = 1
GROUP BY price
But this query returns the sum of all points, while I need the sum of the last point in each game.
Please point me in the right direction.
Example result
last_points_sum | game_count | step_count | price
200 | 2 | 3 | 100
400 | 3 | 4 | 200
where the table is
id | points | game_id | price | user_id | timestamp
1 | 10 | 5 | 100 | 1 | 100000001
2 | 200 | 5 | 100 | 1 | 100000002
3 | 200 | 6 | 200 | 1 | 100000003
4 | 0 | 6 | 200 | 1 | 100000004
5 | 400 | 6 | 200 | 1 | 100000005

Is this what you're looking for?
This assumes that timestamp is unique, at least for each instance of game_id.
SELECT
COUNT(DISTINCT game_id) AS game_count,
COUNT(id) AS step_count,
SUM(COALESCE(ltIsLastPoints, 0.0) * points),
price
FROM
(SELECT
game_id ltGameID,
MAX(timestamp) ltTimestamp,
1.0 ltIsLastPoints
FROM
steps
GROUP BY
game_id
) lt RIGHT JOIN
steps
ON ltGameID = game_id
AND ltTimestamp = timestamp
WHERE
user_id = 1
GROUP BY
price;
Your description says you want to group by points but your example query groups by price. I went with price.
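For anyone who wants to try the idea on the sample rows from the question, here is a minimal sketch using Python's built-in sqlite3 module. It is not the query above verbatim: it uses an inner join plus a CASE expression instead of RIGHT JOIN (older SQLite versions lack RIGHT JOIN), and it also filters the subquery by user, which is my own assumption. Like the answer, it assumes timestamp is unique within each game. The function name last_points_by_price is made up for this example.

```python
import sqlite3

# Sample rows copied from the question:
# (id, points, game_id, price, user_id, timestamp)
STEPS = [
    (1, 10, 5, 100, 1, 100000001),
    (2, 200, 5, 100, 1, 100000002),
    (3, 200, 6, 200, 1, 100000003),
    (4, 0, 6, 200, 1, 100000004),
    (5, 400, 6, 200, 1, 100000005),
]

QUERY = """
SELECT s.price,
       COUNT(DISTINCT s.game_id) AS game_count,
       COUNT(s.id)               AS step_count,
       -- only the row carrying the game's latest timestamp contributes points
       SUM(CASE WHEN s.timestamp = lt.last_ts THEN s.points ELSE 0 END)
           AS last_points_sum
FROM steps AS s
JOIN (SELECT game_id, MAX(timestamp) AS last_ts
      FROM steps
      WHERE user_id = :user_id
      GROUP BY game_id) AS lt
  ON lt.game_id = s.game_id
WHERE s.user_id = :user_id
GROUP BY s.price
ORDER BY s.price
"""


def last_points_by_price(user_id):
    """Return (price, game_count, step_count, last_points_sum) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps (id INTEGER PRIMARY KEY, points INTEGER, "
        "game_id INTEGER, price INTEGER, user_id INTEGER, timestamp INTEGER)"
    )
    conn.executemany("INSERT INTO steps VALUES (?, ?, ?, ?, ?, ?)", STEPS)
    rows = conn.execute(QUERY, {"user_id": user_id}).fetchall()
    conn.close()
    return rows


print(last_points_by_price(1))
```

On the five sample rows this gives one game per price, with 200 as the last point for price 100 and 400 for price 200.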

Related

How to join two tables with sum of one column and with condition

I have two tables:
table 1
+-------------+--------------+-----------------+
| id_product | id_customer |start_date |
+-------------+--------------+-----------------+
| 1 | 1 | 2021-08-28T10:37|
| 1 | 2 | 2021-08-28T11:17|
| 1 | 3 | 2021-08-28T12:27|
| 2 | 1 | 2021-08-28T17:00|
table 2
+-------------+------------------+----------+-------------------------------+
| id_customer | stop_date | duration | 20 other columns like duration|
+-------------+------------------+----------+-------------------------------+
| 1 | 2021-08-27T17:00| 20 | ...
| 1 | 2021-08-26T17:00| 40 | ...
| 2 | 2021-08-29T17:00| 120 | ...
| 1 | 2021-08-30T17:00| 40 | ...
| ..........................................|
start_date in table 1 is the date the customer started the product.
stop_date in table 2 is the date the customer stopped the product.
I want to join these two tables to get one row with:
id_product
id_customer
start_date
the sum of duration over all rows whose stop_date is BEFORE start_date
the same kind of sum for each of the 20 remaining columns.
Example for id_product = 1, id_customer = 1:
+-------------+--------------+-----------------+---------------+-----------------------------------+
| id_product | id_customer |start_date | sum(duration) | sum(all other columns from table 2)
+-------------+--------------+-----------------+---------------+-----------------------------------+
| 1 | 1 | 2021-08-28T10:37| 60
I have really big tables and I am using PySpark with SQL. Do you know an optimised way to do this?
Thank you
EDIT :
There is also an id_product in table2
SELECT
Table_1.id_product,
Table_1.id_customer,
Table_1.start_date,
SUM(Table_2.duration) AS sum_duration
-- ,SUM(Table_2.duration2) AS sum_duration2
-- ,SUM(Table_2.duration3) AS sum_duration3
FROM Table_1
LEFT JOIN Table_2 ON
Table_2.id_customer = Table_1.id_customer
AND Table_2.id_product = Table_1.id_product
AND Table_2.stop_date < Table_1.start_date
GROUP BY Table_1.id_product,Table_1.id_customer, Table_1.start_date
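A small sketch of the conditional LEFT JOIN, run with Python's sqlite3 rather than Spark, just to check the logic on the rows from the question. The id_product values in table 2 are not shown in the question, so setting them all to 1 is an assumption, as are the lower-case table names and the COALESCE that turns "no matching rows" into 0 instead of NULL. The ISO-8601 strings compare correctly as plain text.

```python
import sqlite3

# (id_product, id_customer, start_date), copied from the question
TABLE_1 = [
    (1, 1, "2021-08-28T10:37"),
    (1, 2, "2021-08-28T11:17"),
    (1, 3, "2021-08-28T12:27"),
    (2, 1, "2021-08-28T17:00"),
]

# (id_product, id_customer, stop_date, duration)
# ASSUMPTION: id_product = 1 for every row; the question does not show it.
TABLE_2 = [
    (1, 1, "2021-08-27T17:00", 20),
    (1, 1, "2021-08-26T17:00", 40),
    (1, 2, "2021-08-29T17:00", 120),
    (1, 1, "2021-08-30T17:00", 40),
]

QUERY = """
SELECT t1.id_product,
       t1.id_customer,
       t1.start_date,
       COALESCE(SUM(t2.duration), 0) AS sum_duration
FROM table_1 AS t1
LEFT JOIN table_2 AS t2
  ON  t2.id_customer = t1.id_customer
  AND t2.id_product  = t1.id_product
  AND t2.stop_date   < t1.start_date   -- only stops before this start
GROUP BY t1.id_product, t1.id_customer, t1.start_date
ORDER BY t1.id_product, t1.id_customer
"""


def durations_before_start():
    """Sum table_2.duration over stops that precede each table_1 start."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_1 (id_product INTEGER, id_customer INTEGER, "
        "start_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE table_2 (id_product INTEGER, id_customer INTEGER, "
        "stop_date TEXT, duration INTEGER)"
    )
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?)", TABLE_1)
    conn.executemany("INSERT INTO table_2 VALUES (?, ?, ?, ?)", TABLE_2)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in durations_before_start():
    print(row)
```

For product 1 and customer 1 this reproduces the 60 from the question (20 + 40); the stop on 2021-08-30 is after the start and is left out.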

Find number of rows with each property value, taking into account only the most recent rows in SQL

I have a database with tables that represents "edits" to "pages". Every edit has an ID and a timestamp and a "status", which has certain discrete values. Pages have IDs and also have "categories".
I wish to find the number of pages with each status within a given category, taking into account only the state as of the most recent edit.
Edits:
+---------+---------+-----------+--------+
| edit_id | page_id | edit_time | status |
+---------+---------+-----------+--------+
| 1 | 10 | 20210502 | 90 |
| 2 | 10 | 20210503 | 91 |
| 3 | 20 | 20210504 | 91 |
| 4 | 30 | 20210504 | 90 |
| 5 | 30 | 20210505 | 92 |
| 6 | 40 | 20210505 | 90 |
| 7 | 50 | 20210503 | 90 |
+---------+---------+-----------+--------+
Pages:
+---------+--------+
| page_id | cat_id |
+---------+--------+
| 10 | 100 |
| 20 | 100 |
| 30 | 100 |
| 40 | 200 |
+---------+--------+
I want to get, for category 100:
+--------+-------+
| stat | count |
+--------+-------+
| 90 | 1 |
| 91 | 2 |
| 92 | 1 |
+--------+-------+
Pages 10 and 30 each have two edits, but the later one "overrides" the first one, so only the edits with status 91 and 92 are counted. Pages 20 and 40 account for one of 91 and 90 each, and page 50 is in the wrong category so it doesn't feature.
I have tried the following, but it doesn't seem to work. The idea was to select the max (i.e. latest) edit for each page with the right category. Then join that to the edit table and group by the status and count the rows:
SELECT stat, COUNT(*)
FROM edits as out_e
INNER JOIN (
SELECT edit_id, page_id, max(edit_time) as last_edit
FROM edits
INNER JOIN pages on edit_page_id = page_id
WHERE cat_id = 100
GROUP BY page_id
) in_e ON out_e.edit_id = in_e.edit_id
GROUP BY stat
ORDER BY stat;
For example in this fiddle: http://sqlfiddle.com/#!9/42f2ed/1
The result is:
+--------+-------+
| stat | count |
+--------+-------+
| 90 | 3 |
| 91 | 1 |
+--------+-------+
What is the correct way to get this information?
SELECT cat_id, stat, COUNT(*) cnt
FROM pages
JOIN edits ON pages.page_id = edits.edit_page_id
JOIN ( SELECT edit_page_id, MAX(edit_time) edit_time
FROM edits
GROUP BY edit_page_id ) last_time ON edits.edit_page_id = last_time.edit_page_id
AND edits.edit_time = last_time.edit_time
GROUP BY cat_id, stat
Output:
cat_id | stat | cnt
100 | 90 | 1
100 | 91 | 2
100 | 92 | 1
200 | 90 | 1
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=7592c7853481f6b5a9626c8d111c1d3b (the query is applicable to MariaDB 10.1).
Is it possible to join on the edit_id (which is unique key for each edit)? – Inductiveload
No, this is impossible. cnt=2 counts two different edit_id values, so which one should be used?
But you can get a concatenated list of the values: simply add GROUP_CONCAT(edit_id) to the output list.
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=b2391972c3f7c4be4254e47514d0f1da
I think you don't need the second join. See if this query helps.
select
t1.stat, count(*) count_
from
(
SELECT
e.edit_id, p.page_id, e.stat,
rank() over(partition by e.edit_page_id order by e.edit_time desc) edit_rank
FROM
edits e
INNER JOIN pages p on e.edit_page_id = p.page_id
WHERE
p.cat_id = 100
) t1
where
t1.edit_rank = 1
group by
t1.stat
fiddle url : (https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=0f681dc8d93cc3eebf9a03e0c8d84850)
select e1.stat, count(e1.stat) as count
from edits e1
join (
select edit_page_id, max(edit_time) as edit_time
from edits
where edit_page_id in (
select page_id
from pages
where cat_id = 100
)
group by edit_page_id
) as e2
on e1.edit_page_id = e2.edit_page_id and e1.edit_time = e2.edit_time
group by e1.stat;
Here's the link to fiddle - http://sqlfiddle.com/#!9/42f2ed/40/0
Edit: updated to consider edit_time instead of stat to find latest record
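One thing worth checking with the rank() variant above is what happens when two edits of the same page share the latest edit_time. The sketch below runs it in Python's sqlite3 (window functions need SQLite 3.25 or newer) on the question's rows plus one extra row, edit 8, that I made up to create a tie. It compares rank() with row_number() using edit_id as a tie-breaker; that tie-breaker is my assumption, not something from the answers.

```python
import sqlite3

# (edit_id, edit_page_id, edit_time, stat), copied from the question,
# plus edit 8, a HYPOTHETICAL row that ties with edit 3 on page 20.
EDITS = [
    (1, 10, 20210502, 90),
    (2, 10, 20210503, 91),
    (3, 20, 20210504, 91),
    (4, 30, 20210504, 90),
    (5, 30, 20210505, 92),
    (6, 40, 20210505, 90),
    (7, 50, 20210503, 90),
    (8, 20, 20210504, 92),
]
PAGES = [(10, 100), (20, 100), (30, 100), (40, 200)]

RANK = "RANK() OVER (PARTITION BY e.edit_page_id ORDER BY e.edit_time DESC)"
ROW_NUMBER = (
    "ROW_NUMBER() OVER (PARTITION BY e.edit_page_id "
    "ORDER BY e.edit_time DESC, e.edit_id DESC)"
)


def count_latest_status(ranking, cat_id=100):
    """Count pages per status, keeping only rows ranked first per page."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE edits (edit_id INTEGER PRIMARY KEY, "
        "edit_page_id INTEGER, edit_time INTEGER, stat INTEGER)"
    )
    conn.execute("CREATE TABLE pages (page_id INTEGER, cat_id INTEGER)")
    conn.executemany("INSERT INTO edits VALUES (?, ?, ?, ?)", EDITS)
    conn.executemany("INSERT INTO pages VALUES (?, ?)", PAGES)
    sql = f"""
        SELECT t.stat, COUNT(*)
        FROM (SELECT e.stat, {ranking} AS rnk
              FROM edits e
              JOIN pages p ON e.edit_page_id = p.page_id
              WHERE p.cat_id = ?) AS t
        WHERE t.rnk = 1
        GROUP BY t.stat
        ORDER BY t.stat
    """
    rows = conn.execute(sql, (cat_id,)).fetchall()
    conn.close()
    return rows


print("rank():      ", count_latest_status(RANK))
print("row_number():", count_latest_status(ROW_NUMBER))
```

With the tie in place, rank() counts page 20 twice (once per tied edit), while row_number() with the edit_id tie-breaker counts each page exactly once. The MAX(edit_time) join answers behave like rank() in this respect.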

Select pairs of values based on condition in other column - PostgreSQL

I've been trying to solve an issue for the past couple of days, but couldn't figure out what the solution would be...
I have a table as the following:
+--------+-----------+-------+
| ShopID | ArticleID | Price |
+--------+-----------+-------+
| 1 | 3 | 150 |
| 1 | 2 | 80 |
| 3 | 3 | 100 |
| 4 | 2 | 95 |
+--------+-----------+-------+
And I would like to select pairs of shop IDs where the first shop charges a higher price than the second for the same article.
F.e. this should look like:
+----------+----------+---------+
| ShopID_1 | ShopID_2 |ArticleID|
+----------+----------+---------+
| 4 | 1 | 2 |
| 1 | 3 | 3 |
+----------+----------+---------+
... showing that Article 2 is more expensive in ShopID 4 than in ShopID 1, and so on.
My code so far looks as following:
SELECT ShopID AS ShopID_1, ShopID AS ShopID_2, ArticleID FROM table
WHERE table.ArticleID=table.ArticleID and table.Price > table.Price
But it doesn't give the result I am searching for.
Can anyone help me with this objective? Thank you very much.
The problem here is about calculating Top N items per Group.
Assuming you have the following data, in table sales.
# select * from sales;
shopid | articleid | price
--------+-----------+-------
1 | 2 | 80
3 | 3 | 100
4 | 2 | 95
1 | 3 | 150
5 | 3 | 50
With the following query we can create a partition for each ArticleId
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank from sales;
This will result:
articleid | shopid | price | price_rank
-----------+--------+-------+------------
2 | 4 | 95 | 1
2 | 1 | 80 | 2
3 | 1 | 150 | 1
3 | 3 | 100 | 2
3 | 5 | 50 | 3
Then we simply select the Top 2 items for each ArticleID:
select
ArticleID,
ShopID,
Price
from (
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank
from sales) sales_rank
where Price_Rank <= 2;
which will result:
articleid | shopid | price
-----------+--------+-------
2 | 4 | 95
2 | 1 | 80
3 | 1 | 150
3 | 3 | 100
Finally, we can use crosstab function to get the expected pivot view.
select *
from crosstab(
'select
ArticleID,
ShopID,
ShopID
from (
select
ArticleID,
ShopID,
Price,
row_number() over (partition by ArticleID order by Price desc) as Price_Rank
from sales) sales_rank
where Price_Rank <= 2
order by ArticleID, Price_Rank')
AS sales_top_2("ArticleID" INT, "ShopID_1" INT, "ShopID_2" INT);
And the result:
ArticleID | ShopID_1 | ShopID_2
-----------+----------+----------
2 | 4 | 1
3 | 1 | 3
Note:
You may need to run CREATE EXTENSION tablefunc; if you get the error function crosstab(unknown) does not exist.
This query should work:
SELECT t1.ShopID AS ShopID_1, t2.ShopID AS ShopID_2, t1.ArticleID
FROM <yourtable> t1 JOIN
<yourtable> t2
ON t1.ArticleID = t2.ArticleID AND t1.Price > t2.Price;
That is, you need a self-join and appropriate table aliases.
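Here is a quick way to check the self-join on the four rows from the question, using Python's sqlite3. The table name prices and the function name are placeholders I picked for this sketch, and the ORDER BY is only there to make the output order stable.

```python
import sqlite3

# (ShopID, ArticleID, Price), copied from the question
ROWS = [
    (1, 3, 150),
    (1, 2, 80),
    (3, 3, 100),
    (4, 2, 95),
]

QUERY = """
SELECT t1.ShopID AS ShopID_1, t2.ShopID AS ShopID_2, t1.ArticleID
FROM prices AS t1
JOIN prices AS t2
  ON  t1.ArticleID = t2.ArticleID   -- same article in both shops
  AND t1.Price > t2.Price           -- shop 1 is the more expensive one
ORDER BY t1.ArticleID, t1.ShopID, t2.ShopID
"""


def more_expensive_pairs():
    """Return (ShopID_1, ShopID_2, ArticleID) where shop 1 charges more."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prices (ShopID INTEGER, ArticleID INTEGER, Price INTEGER)"
    )
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


print(more_expensive_pairs())
```

Unlike the Top 2 approach, the self-join returns every ordered pair, so an article sold in three shops would produce three rows rather than one.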

Want to JOIN fourth table in query

I have four tables:
mls_category
points_matrix
mls_entry
bonus_points
My first table (mls_category) is like below:
*--------------------------------*
| cat_no | store_id | cat_value |
*--------------------------------*
| 10 | 101 | 1 |
| 11 | 101 | 4 |
*--------------------------------*
My second table (points_matrix) is like below:
*----------------------------------------------------*
| pm_no | store_id | value_per_point | maxpoint |
*----------------------------------------------------*
| 1 | 101 | 1 | 10 |
| 2 | 101 | 2 | 50 |
| 3 | 101 | 3 | 80 |
*----------------------------------------------------*
My third table (mls_entry) is like below:
*-------------------------------------------*
| user_id | category | distance | status |
*-------------------------------------------*
| 1 | 10 | 20 | approved |
| 1 | 10 | 30 | approved |
| 1 | 11 | 40 | approved |
*-------------------------------------------*
My fourth table (bonus_points) is like below:
*--------------------------------------------*
| user_id | store_id | bonus_points | type |
*--------------------------------------------*
| 1 | 101 | 200 | fixed |
| 2 | 102 | 300 | fixed |
| 1 | 103 | 4 | per |
*--------------------------------------------*
Now, I want to add the bonus points value to the total distance according to the store_id, user_id and type.
I am using the following code to get total distance:
SELECT MIN(b.value_per_point) * d.total_distance FROM points_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category
) d ON b.store_id = d.store_id AND b.maxpoint >= d.total_distance
The above code calculates the value correctly; now I want to JOIN my fourth table.
It gives me 60*3 = 180 as the total value. Now, I want (60+200)*3 = 780 for user 1 and store_id 101, where the bonus type is fixed.
I think your query will look like this:
SELECT Max(b.value_per_point)*( max(d.total_distance)+max(bonus_points)) FROM points_matrix b
JOIN
(
SELECT store_id, sum(t1.totald/c.cat_value) as total_distance FROM mls_category c
JOIN
(
SELECT SUM(distance) totald, user_id, category FROM mls_entry
WHERE user_id= 1 AND status = 'approved' GROUP BY user_id, category
) t1 ON c.cat_no = t1.category group by store_id
) d ON b.store_id = d.store_id inner join bonus_points bp on bp.store_id=d.store_id
DEMO fiddle
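A possible variation, sketched with Python's sqlite3 on the four sample tables from the question. It keeps the asker's MIN(value_per_point) with the maxpoint condition, and restricts the bonus to the same user and to type 'fixed'. Picking the tier from the distance before the bonus is added, and ignoring 'per' bonuses entirely, are assumptions on my side based on the (60+200)*3 example.

```python
import sqlite3

# Rows copied from the question's four tables
MLS_CATEGORY = [(10, 101, 1), (11, 101, 4)]
POINTS_MATRIX = [(1, 101, 1, 10), (2, 101, 2, 50), (3, 101, 3, 80)]
MLS_ENTRY = [
    (1, 10, 20, "approved"),
    (1, 10, 30, "approved"),
    (1, 11, 40, "approved"),
]
BONUS_POINTS = [(1, 101, 200, "fixed"), (2, 102, 300, "fixed"), (1, 103, 4, "per")]

QUERY = """
SELECT d.store_id,
       (d.total_distance + COALESCE(bp.bonus, 0))
       * (SELECT MIN(pm.value_per_point)
          FROM points_matrix AS pm
          WHERE pm.store_id = d.store_id
            AND pm.maxpoint >= d.total_distance) AS total_value
FROM (SELECT c.store_id,
             SUM(t1.totald * 1.0 / c.cat_value) AS total_distance
      FROM mls_category AS c
      JOIN (SELECT category, SUM(distance) AS totald
            FROM mls_entry
            WHERE user_id = :user_id AND status = 'approved'
            GROUP BY category) AS t1
        ON c.cat_no = t1.category
      GROUP BY c.store_id) AS d
LEFT JOIN (SELECT store_id, SUM(bonus_points) AS bonus
           FROM bonus_points
           WHERE user_id = :user_id AND type = 'fixed'  -- 'per' is ignored
           GROUP BY store_id) AS bp
  ON bp.store_id = d.store_id
ORDER BY d.store_id
"""


def total_value_with_bonus(user_id):
    """Return (store_id, total_value) rows for one user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mls_category (cat_no, store_id, cat_value)")
    conn.execute(
        "CREATE TABLE points_matrix (pm_no, store_id, value_per_point, maxpoint)"
    )
    conn.execute("CREATE TABLE mls_entry (user_id, category, distance, status)")
    conn.execute("CREATE TABLE bonus_points (user_id, store_id, bonus_points, type)")
    conn.executemany("INSERT INTO mls_category VALUES (?, ?, ?)", MLS_CATEGORY)
    conn.executemany("INSERT INTO points_matrix VALUES (?, ?, ?, ?)", POINTS_MATRIX)
    conn.executemany("INSERT INTO mls_entry VALUES (?, ?, ?, ?)", MLS_ENTRY)
    conn.executemany("INSERT INTO bonus_points VALUES (?, ?, ?, ?)", BONUS_POINTS)
    rows = conn.execute(QUERY, {"user_id": user_id}).fetchall()
    conn.close()
    return rows


print(total_value_with_bonus(1))
```

For user 1 the total distance is 50/1 + 40/4 = 60, the matching tier has value_per_point 3, and the fixed bonus for store 101 is 200, so the sketch reproduces the (60+200)*3 = 780 from the question.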

Get Last Row of Different id then display data that is greater than zero

This is my table...
+----+--------+
| id | amount |
+----+--------+
| 1 | 100 |
| 1 | 50 |
| 1 | 0 |
| 2 | 500 |
| 2 | 100 |
| 3 | 300 |
| 3 | -2 |
| 4 | 400 |
| 4 | 200 |
+----+--------+
I would like to choose from it each value of id that does not have a nonpositive (i.e. negative or 0) value associated with it, and the smallest amount associated with that id.
If I use this code...
SELECT DISTINCT id, amount
FROM table t
WHERE amount = (SELECT MIN(amount) FROM table WHERE id= t.id)
... then these results show...
+----+--------+
| id | amount |
+----+--------+
| 1 | 0 |
| 2 | 100 |
| 3 | -2 |
| 4 | 200 |
+----+--------+
But what I want the statement to return is...
+----+--------+
| id | amount |
+----+--------+
| 2 | 100 |
| 4 | 200 |
+----+--------+
Just add amount>0 in your query. You missed out that condition in your query. That should do it.
SELECT DISTINCT id, amount FROM table t
WHERE amount = (SELECT MIN(amount) FROM table WHERE id= t.id)
and amount>0;
If you want to display each id where min(amount) > 0, then use this.
SELECT id, min(amount) as amount
FROM table t
group by id
having min(amount) > 0;
Please try the following...
SELECT id,
MIN( amount )
FROM table
WHERE amount > 0
GROUP BY id
ORDER BY id;
This statement starts by selecting all records WHERE amount is larger than 0.
The remaining records are then grouped by id, and the smallest value of amount is chosen for each group.
The resulting pairs of values are then sorted by id and returned to the user.
If you have any questions or comments, then please feel free to post a Comment accordingly.
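On the question's sample rows, filtering with WHERE before grouping and filtering with HAVING after grouping do not return the same thing, so it may help to run both side by side. This is a small sketch with Python's sqlite3; the table name amounts is a stand-in I chose, since table is a reserved word.

```python
import sqlite3

# (id, amount), copied from the question
ROWS = [
    (1, 100), (1, 50), (1, 0),
    (2, 500), (2, 100),
    (3, 300), (3, -2),
    (4, 400), (4, 200),
]

# Drops non-positive rows first, then takes the minimum of what is left.
WHERE_QUERY = """
SELECT id, MIN(amount) FROM amounts
WHERE amount > 0
GROUP BY id ORDER BY id
"""

# Takes the minimum over all rows, then drops ids whose minimum is <= 0.
HAVING_QUERY = """
SELECT id, MIN(amount) FROM amounts
GROUP BY id
HAVING MIN(amount) > 0
ORDER BY id
"""


def run(query):
    """Load the sample rows into an in-memory table and run one query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE amounts (id INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO amounts VALUES (?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print("WHERE: ", run(WHERE_QUERY))
print("HAVING:", run(HAVING_QUERY))
```

The WHERE version still returns ids 1 and 3 (with 50 and 300), because it only discards their non-positive rows. The HAVING version returns just ids 2 and 4, which is the result the question asks for.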