(SQLite) SUM based on cumulative range

I have a table like this:
id | sales | profit | place
---|-------|--------|------
1 | 2 | 1 | US
2 | 3 | - | SL
3 | 1 | 1 | India
4 | 0 | - | Aus
5 | 2 | - | -
6 | 4 | 1 | UK
7 | 1 | - | -
Now, wherever profit = 1, I want the cumulative sales up to that row (in order of the id column) together with the corresponding place,
i.e.
| cumulativeSales | place |
| ________________ |_______ |
| 2 | US | //(2)
| 6 | India | //(2+3+1)
| 12 | UK | //(2+3+1+0+2+4)
What query should I write for this?

If using a modern version of sqlite (3.25 or newer), you can use window functions:
SELECT cumulativeSales, place
FROM (SELECT id, place, profit
, sum(sales) OVER (ORDER BY id) AS cumulativeSales
FROM yourtable)
WHERE profit = 1
ORDER BY id;
gives
cumulativeSales place
--------------- ----------
2 US
6 India
12 UK
The window function form of sum() (indicated by the OVER clause that follows it) used in the inner query sums over a window of the result rows. With only an ORDER BY and no explicit frame specification, the default window runs from the first row through the current row and any other rows with the same ORDER BY value (its peers), but nothing greater. Because id is unique here, that is exactly the cumulative sum. For more detail, see the SQLite window function documentation.
The outer query just limits the results to those rows where profit is 1. If you did that all in one without the subquery, it'd only calculate the cumulative sum of those rows, not all the rows, because window functions are computed after WHERE filtering is done.
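To see the difference described above, here is a minimal sketch that runs both forms against the question's sample data through Python's built-in sqlite3 module. It assumes the bundled SQLite library is 3.25 or newer and that the '-' cells in the question are stored as NULL; the variable names are only illustrative.

```python
import sqlite3

# Sample data from the question; '-' cells are stored as NULL (an assumption).
ROWS = [
    (1, 2, 1, "US"), (2, 3, None, "SL"), (3, 1, 1, "India"), (4, 0, None, "Aus"),
    (5, 2, None, None), (6, 4, 1, "UK"), (7, 1, None, None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, sales INTEGER, profit INTEGER, place TEXT)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?)", ROWS)

# The window function sees every row; the outer WHERE filters afterwards.
with_subquery = conn.execute("""
    SELECT cumulativeSales, place
    FROM (SELECT id, place, profit,
                 sum(sales) OVER (ORDER BY id) AS cumulativeSales
          FROM yourtable)
    WHERE profit = 1
    ORDER BY id
""").fetchall()

# WHERE runs before the window function, so only profit = 1 rows are summed.
without_subquery = conn.execute("""
    SELECT sum(sales) OVER (ORDER BY id) AS cumulativeSales, place
    FROM yourtable
    WHERE profit = 1
    ORDER BY id
""").fetchall()

print(with_subquery)     # [(2, 'US'), (6, 'India'), (12, 'UK')]
print(without_subquery)  # [(2, 'US'), (3, 'India'), (7, 'UK')]
```

The second result skips the sales of ids 2, 4 and 5, which is why the subquery is needed.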
A different approach, which also works on older versions without window function support, uses a correlated subquery to calculate the running total:
SELECT (SELECT sum(sales) FROM yourtable AS t2 WHERE t2.id <= t.id) AS cumulativeSales
, place
FROM yourtable AS t
WHERE profit = 1
ORDER BY id;
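As a quick check, the correlated-subquery form can be run against the question's data with Python's sqlite3 module. This is only a sketch: it assumes the '-' cells are NULL, and the table setup is made up to mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, sales INTEGER, profit INTEGER, place TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?)",
    [
        (1, 2, 1, "US"), (2, 3, None, "SL"), (3, 1, 1, "India"), (4, 0, None, "Aus"),
        (5, 2, None, None), (6, 4, 1, "UK"), (7, 1, None, None),
    ],
)

# For each profit = 1 row, the subquery re-sums all rows with id <= that row's id.
running_totals = conn.execute("""
    SELECT (SELECT sum(sales) FROM yourtable AS t2 WHERE t2.id <= t.id) AS cumulativeSales,
           place
    FROM yourtable AS t
    WHERE profit = 1
    ORDER BY id
""").fetchall()

print(running_totals)  # [(2, 'US'), (6, 'India'), (12, 'UK')]
```

The subquery is re-evaluated for every selected row, so on large tables this form is likely to be slower than the window function.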

Related

SQL sum all rows by id

I am trying to learn SQL queries and have this scenario where I have this table:
Table1
ID | Name | Hour
----------------
1 | Mark | 2
2 | ken | 1.5
3 | jake | 3
1 | Mark | 1.8
2 | ken | 1
Expected result
ID | Name | Hour
----------------
1 | Mark | 3.8
2 | ken | 2.5
3 | jake | 3
I have tried to use the sum() function but I get an error.
My query:
Select ID, Name, Sum(Hour)
From Table1
Where ID = ID
Response:
Use a GROUP BY clause whenever aggregate functions (min(), max(), sum(), count(), etc.) are selected together with plain columns.
Every non-aggregated column in the SELECT list must also appear in the GROUP BY clause.
To use the aggregate function here, group by the non-aggregated columns like this:
Select ID, Name , Sum(Hour) AS Hour From Table1
Group By ID, Name
Order By ID
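A small sketch of that query run against the question's rows, using Python's sqlite3 module as a stand-in for whichever database the asker uses (the question does not say). The REAL column type for Hour is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER, Name TEXT, Hour REAL)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "Mark", 2), (2, "ken", 1.5), (3, "jake", 3), (1, "Mark", 1.8), (2, "ken", 1)],
)

# One output row per (ID, Name) pair, with that pair's hours added up.
totals = conn.execute("""
    SELECT ID, Name, Sum(Hour) AS Hour
    FROM Table1
    GROUP BY ID, Name
    ORDER BY ID
""").fetchall()

for row in totals:
    print(row)
```

The sums are floating-point values, so compare them with a tolerance rather than with exact equality.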

Postgres create view with column values based on another table?

I'm implementing a view to store leaderboard data of the top 10 users that is computed using an expensive COUNT(*). I'm planning on the view to look something like this:
id SERIAL PRIMARY KEY
user_id TEXT
type TEXT
rank INTEGER
count INTEGER
-- adding an index to user_id
-- adding a two-column unique index to user_id and type
I'm having trouble with seeing how this view should be created to properly account for the rank and type. Essentially, I have a big table (~30 million rows) like this:
+----+---------+---------+----------------------------+
| id | user_id | type | created_at |
+----+---------+---------+----------------------------+
| 1 | 1 | Diamond | 2021-05-11 17:35:18.399517 |
| 2 | 1 | Diamond | 2021-05-12 17:35:17.399517 |
| 3 | 1 | Diamond | 2021-05-12 17:35:18.399517 |
| 4 | 2 | Diamond | 2021-05-13 17:35:18.399517 |
| 5 | 1 | Clay | 2021-05-14 17:35:18.399517 |
| 6 | 1 | Clay | 2021-05-15 17:35:18.399517 |
+----+---------+---------+----------------------------+
With the table above, I'm trying to achieve something like this:
+----+---------+---------+------+-------+
| id | user_id | type | rank | count |
+----+---------+---------+------+-------+
| 1 | 1 | Diamond | 1 | 3 |
| 2 | 2 | Diamond | 2 | 1 |
| 3 | 1 | Clay | 1 | 2 |
| 4 | 1 | Weekly | 1 | 5 | -- 3 diamonds + 2 clay obtained between Mon-Sun
| 5 | 2 | Weekly | 2 | 1 |
+----+---------+---------+------+-------+
By Weekly I am counting the time from the last Sunday to the upcoming Sunday.
Is this doable using only SQL, or is some kind of script needed? If doable, how would this be done? It's worth mentioning that there are thousands of different types, so not having to manually specify type would be preferred.
If there's anything unclear, please let me know and I'll do my best to clarify. Thanks!
The "weekly" rows are produced in a different way compared to the "user" rows (I called them two different "categories"). To get the result you want you can combine two queries using UNION ALL.
For example:
select 'u' as category, user_id, type,
rank() over(partition by type order by count(*) desc) as rk,
count(*) as cnt
from scores
group by user_id, type
union all
select 'w', user_id, 'Weekly',
rank() over(order by count(*) desc),
count(*) as cnt
from scores
group by user_id
order by category, type desc, rk
Result:
category user_id type rk cnt
--------- -------- -------- --- ---
u 1 Diamond 1 3
u 2 Diamond 2 1
u 1 Clay 1 2
w 1 Weekly 1 5
w 2 Weekly 2 1
See running example at DB Fiddle.
Note: For the sake of simplicity I left the filtering by timestamp out of the query. If you really needed to include only the rows of the last 7 days (or other period of time), it would be a matter of adding a WHERE clause in both subqueries.
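To illustrate the timestamp filter mentioned in the note, here is a hedged sketch that runs the same UNION ALL idea with a date range on the weekly part. It uses Python's sqlite3 module as a stand-in for Postgres and needs SQLite 3.25 or newer. The half-open range from 2021-05-10 to 2021-05-17 is an assumption, as is filtering only the weekly branch. The counts are computed in inner subqueries first to keep the window functions simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Diamond", "2021-05-11 17:35:18.399517"),
        (2, 1, "Diamond", "2021-05-12 17:35:17.399517"),
        (3, 1, "Diamond", "2021-05-12 17:35:18.399517"),
        (4, 2, "Diamond", "2021-05-13 17:35:18.399517"),
        (5, 1, "Clay", "2021-05-14 17:35:18.399517"),
        (6, 1, "Clay", "2021-05-15 17:35:18.399517"),
    ],
)

QUERY = """
WITH per_type AS (
    SELECT 'u' AS category, user_id, type, cnt,
           rank() OVER (PARTITION BY type ORDER BY cnt DESC) AS rk
    FROM (SELECT user_id, type, count(*) AS cnt FROM scores GROUP BY user_id, type)
),
weekly AS (
    SELECT 'w' AS category, user_id, 'Weekly' AS type, cnt,
           rank() OVER (ORDER BY cnt DESC) AS rk
    FROM (SELECT user_id, count(*) AS cnt
          FROM scores
          -- hypothetical week boundaries, passed in as parameters
          WHERE created_at >= :week_start AND created_at < :week_end
          GROUP BY user_id)
)
SELECT category, user_id, type, rk, cnt FROM per_type
UNION ALL
SELECT category, user_id, type, rk, cnt FROM weekly
ORDER BY category, type DESC, rk
"""

leaderboard = conn.execute(
    QUERY, {"week_start": "2021-05-10", "week_end": "2021-05-17"}
).fetchall()

for row in leaderboard:
    print(row)
```

With the sample rows, all six timestamps fall inside the chosen week, so the weekly counts match the result table above.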
I think this is what you were talking about, right?
WITH scores_plus_weekly AS ((
SELECT id, user_id, 'Weekly' AS type, created_at
FROM scores
WHERE created_at BETWEEN '2021-05-10' AND '2021-05-17'
)
UNION (
SELECT * FROM scores
))
SELECT
row_number() OVER (ORDER BY CASE "type" WHEN 'Diamond' THEN 0 WHEN 'Clay' THEN 1 ELSE 2 END, count(*) DESC) as "id",
user_id,
"type",
row_number() OVER (PARTITION BY "type" ORDER BY count(*) DESC) as "rank",
count(*)
FROM scores_plus_weekly
GROUP BY user_id, "type"
ORDER BY "id";
I'm sure this is not the only way, but I thought the result wasn't too complex. This query first combines the original table with a copy of this week's scores relabeled as type 'Weekly'. For the sake of consistency I picked a date range that matches your entire example set. It then groups by user_id and type to get the counts for each combination. The two row_number() calls give you the overall id and the rank within each type. A big part of this query consists of sorting by type, so if you're joining another table that contains the order or priority of the types, the CASE can probably be simplified.
Lastly, the entire query can be wrapped in a view by putting CREATE VIEW score_ranks AS in front of it.

Return the mean of a grouped value along with the mean of the top n% of that value in the same query?

I need to write one query that returns both the average value of fields in a group as well as the average of the top 33% of the values of those fields in a group.
UserId | Sequence | Value | Value2
-------|----------|-------|-------
1 | 1 | 5 | 0
1 | 2 | 10 | 15
1 | 3 | 15 | 20
1 | 4 | NULL | 25
1 | 5 | NULL | 30
1 | 6 | NULL | 60
The return needs to also contain the denominators used to calculate the means, I want to group by user and return something like this:
UserId | ValueMean | ValueDenom | ValueTopNMean | ValueTopNDenom | Value2Mean | Value2Denom | Value2TopNMean | Value2TopNDenom
-------|-----------|------------|---------------|----------------|------------|-------------|----------------|----------------
1 | 10 | 3 | 15 | 1 | 25 | 6 | 45 | 2
I've tried various window functions (NTILE, PERCENT_RANK, etc.), but what is tricky is I have multiple fields of values that will need to undergo this same operation, and the denominators for each Value field will vary (n% will stay the same, however). Please let me know if I've been unclear or you need more information.
The overall average and top value, as well as the count of non-null values, can easily be computed with aggregate functions.
As for the average and count of top N values: you can use ntile() in a subquery to identify the relevant rows first, then use that information in conditional expressions within aggregate functions in the outer query.
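select
userid,
avg(value) avg_value,
count(value) cnt_value,
max(value) top_value,
avg(case when ntile_value = 1 then value end) avg_topn_value,
-- count() skips NULLs, so only non-null values in the top tile are counted
count(case when ntile_value = 1 then value end) cnt_topn_value
from (
select t.*,
-- tile per user, highest values first; NULLs get their own partition so they do not take up tile slots
ntile(3) over(partition by userid, case when value is null then 1 else 0 end order by value desc) ntile_value
from mytable t
) t
group by userid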
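The ntile() idea can be checked against the question's sample row with a sketch like the one below, which handles both Value and Value2. It uses Python's sqlite3 module as a stand-in, since the question does not name a database, and needs SQLite 3.25 or newer. Putting NULLs in their own partition and ordering descending are assumptions made so that tile 1 holds the top third of the non-null values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (userid INTEGER, sequence INTEGER, value INTEGER, value2 INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 5, 0), (1, 2, 10, 15), (1, 3, 15, 20),
     (1, 4, None, 25), (1, 5, None, 30), (1, 6, None, 60)],
)

QUERY = """
SELECT userid,
       avg(value)                                       AS value_mean,
       count(value)                                     AS value_denom,
       avg(CASE WHEN tile_value = 1 THEN value END)     AS value_topn_mean,
       count(CASE WHEN tile_value = 1 THEN value END)   AS value_topn_denom,
       avg(value2)                                      AS value2_mean,
       count(value2)                                    AS value2_denom,
       avg(CASE WHEN tile_value2 = 1 THEN value2 END)   AS value2_topn_mean,
       count(CASE WHEN tile_value2 = 1 THEN value2 END) AS value2_topn_denom
FROM (
    SELECT t.*,
           -- NULLs are split into their own partition so they never occupy tile 1
           ntile(3) OVER (PARTITION BY userid, value IS NULL ORDER BY value DESC)   AS tile_value,
           ntile(3) OVER (PARTITION BY userid, value2 IS NULL ORDER BY value2 DESC) AS tile_value2
    FROM mytable AS t
) AS tiled
GROUP BY userid
"""

result = conn.execute(QUERY).fetchall()
print(result)  # [(1, 10.0, 3, 15.0, 1, 25.0, 6, 45.0, 2)]
```

Note that ntile(3) gives the first tile the extra row when the count is not divisible by three, so "top 33%" is only approximate for such groups.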

sqlite self join query using max()

Given the following table:
| id | user_id | score | date |
|----|---------|-------|------------|
| 1 | 1 | 1 | 2017-08-31 |
| 2 | 1 | 1 | 2017-09-01 |
| 3 | 2 | 2 | 2017-09-01 |
| 4 | 1 | 2 | 2017-09-02 |
| 5 | 2 | 2 | 2017-09-02 |
| 6 | 3 | 1 | 2017-09-02 |
Need to find the user_ids that have the max score for any given date (there can be more than one), so I'm trying:
SELECT s1.user_id
FROM (
SELECT max(score) as max, user_id, date
FROM scores
) AS s2
INNER JOIN scores as s1
ON s1.date = '2017-08-31'
AND s1.score = s2.max
The query returns the correct result for the last 2 dates but returns 0 records for the first date ('2017-08-31'), where it should return the user with the score of 1.
Why won't that first date return correctly and/or is there a more elegant way of writing this query?
Here is the version of the query that comes closest to working, though it does not work when there is only one test score. I do not understand how I am getting away with not using the GROUP BY clause in the aggregate.
SELECT s1.user_id
FROM (
SELECT max(score) as max, user_id, date
FROM scores
) AS s2
INNER JOIN scores as s1
ON s1.date = :date
AND s1.score = s2.max
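A correct query option is:
SELECT user_id
FROM scores
WHERE date = '2017-08-31'
AND score = (SELECT MAX(score) FROM scores WHERE date = '2017-08-31')
Note that the subquery in your version has no date filter and no GROUP BY, so MAX(score) is taken over the whole table. That maximum is 2, and no row dated 2017-08-31 has a score of 2, which is why the first date returns nothing. The user_id and date selected next to MAX(score) are also not grouped by anything; SQLite tolerates such bare columns in an aggregate query, which is why the query runs without a GROUP BY, but many other databases reject it.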
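Here is a minimal sketch of a per-date lookup that filters on the date in both the outer query and the subquery, run against the question's table with Python's sqlite3 module. The helper name users_with_max_score is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2017-08-31"), (2, 1, 1, "2017-09-01"), (3, 2, 2, "2017-09-01"),
        (4, 1, 2, "2017-09-02"), (5, 2, 2, "2017-09-02"), (6, 3, 1, "2017-09-02"),
    ],
)

def users_with_max_score(conn, day):
    """Return the user_ids whose score equals the highest score on the given day."""
    rows = conn.execute(
        """
        SELECT user_id
        FROM scores
        WHERE date = :day
          AND score = (SELECT max(score) FROM scores WHERE date = :day)
        ORDER BY user_id
        """,
        {"day": day},
    ).fetchall()
    return [user_id for (user_id,) in rows]

first_day = users_with_max_score(conn, "2017-08-31")
second_day = users_with_max_score(conn, "2017-09-01")
third_day = users_with_max_score(conn, "2017-09-02")
print(first_day, second_day, third_day)  # [1] [2] [1, 2]
```

Both conditions use the same date, so the maximum is computed over that day's rows only and matched only against that day's rows.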

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need but I want it to take it just a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line, not 5. For this line, I would like to select all the fields, except for the price field, from the row where cust_part = "GROUPINV". For the total field I would like it to be 'sum(total) as new_total', and for the price I would like it to be new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
SELECT l.cust_line, i.cust_part, i.qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all, and then, in the outermost query, filter out the rows you are not interested in.
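The sum-then-filter pattern can be tried on the question's sample rows with the sketch below. It uses Python's sqlite3 module as a stand-in for the asker's database and needs SQLite 3.25 or newer. The single invoice_lines table and the 'PART' placeholder for the elided cust_part values are hypothetical simplifications of the real join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_lines (cust_line INTEGER, line_no INTEGER, cust_part TEXT, "
    "qty_invoiced INTEGER, price INTEGER, total INTEGER)"
)
conn.executemany(
    "INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 4, "PART", 1, 55, 55),
        (2, 1, "GROUPINV", 1, 0, 0),
        (2, 6, "PART", 3, 0, 0),
        (2, 2, "PART", 1, 0, 0),
        (2, 3, "PART", 1, 0, 0),
        (2, 7, "PART", 2, 10, 20),
        (3, 7, "PART", 1, 67, 67),
    ],
)

# Inner query: every row keeps its own columns plus the group total and group size.
# Outer query: keep only the GROUPINV row, or the single row of a one-line group.
grouped = conn.execute("""
    SELECT cust_line, qty_invoiced,
           1.0 * order_total / qty_invoiced AS price  -- 1.0 * avoids SQLite integer division
    FROM (
        SELECT cust_line, cust_part, qty_invoiced,
               sum(total) OVER (PARTITION BY cust_line) AS order_total,
               count(*)   OVER (PARTITION BY cust_line) AS group_count
        FROM invoice_lines
    )
    WHERE cust_part = 'GROUPINV' OR group_count = 1
    ORDER BY cust_line
""").fetchall()

print(grouped)  # [(1, 1, 55.0), (2, 1, 20.0), (3, 1, 67.0)]
```

The five rows with cust_line 2 collapse to the GROUPINV row, which now carries the group total of 20.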