Running total of "matches" using a window function in SQL - sql

I want to create a window function that will count how many times the value of the field in the current row appears in the part of the ordered partition coming before the current row. To make this more concrete, suppose we have a table like so:
| id| fruit | date |
+---+--------+------+
| 1 | apple | 1 |
| 1 | cherry | 2 |
| 1 | apple | 3 |
| 1 | cherry | 4 |
| 2 | orange | 1 |
| 2 | grape | 2 |
| 2 | grape | 3 |
And we want to create a table like so (omitting the date column for clarity):
| id| fruit | prior |
+---+--------+-------+
| 1 | apple | 0 |
| 1 | cherry | 0 |
| 1 | apple | 1 |
| 1 | cherry | 1 |
| 2 | orange | 0 |
| 2 | grape | 0 |
| 2 | grape | 1 |
Note that for id = 1, moving along the ordered partition, the first entry 'apple' doesn't match anything (since the implied set is empty), the next fruit, 'cherry' also doesn't match. Then we get to 'apple' again, which is a match and so on. I'm imagining the SQL looks something like this:
SELECT
id, fruit,
<some kind of INTERSECT?> OVER (PARTITION BY id ORDER by date) AS prior
FROM fruit_table;
But I cannot find anything that looks right. FWIW, I'm using PostgreSQL 8.4.

You could solve that without a window function rather elegantly with a self-left join and a count():
SELECT t.id, t.fruit, t.day, count(t0.*) AS prior
FROM tbl t
LEFT JOIN tbl t0 ON (t0.id, t0.fruit) = (t.id, t.fruit) AND t0.day < t.day
GROUP BY t.id, t.day, t.fruit
ORDER BY t.id, t.day
I renamed the date column day because date is a reserved word in standard SQL and the name of a data type in PostgreSQL, which makes it a confusing column name.
I corrected a mistake in your sample data. The way you had it, it did not check out, which might confuse people.
If your point is to do it with a window function, this one should work:
SELECT id, fruit, day
,count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM tbl
ORDER BY id, day
This works, because, I quote the manual:
If frame_end is omitted it defaults to CURRENT ROW.
The window therefore counts the rows with the same (id, fruit) up to and including the current row. Subtracting 1 removes the current row itself, which leaves the number of prior matches.
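To check that the two queries above agree, here is a small sketch that runs both against the sample data. It is only an illustration: it uses Python's sqlite3 module instead of PostgreSQL (so count(t0.*) and the row-value comparison are spelled out as count(t0.id) and plain AND conditions), and it assumes the bundled SQLite is 3.25 or newer, which is needed for window functions. The names ROWS, SELF_JOIN, WINDOW and prior_counts are made up for this example.

```python
import sqlite3

# Sample data from the question: (id, fruit, day)
ROWS = [
    (1, "apple", 1), (1, "cherry", 2), (1, "apple", 3), (1, "cherry", 4),
    (2, "orange", 1), (2, "grape", 2), (2, "grape", 3),
]

# Self left join: count earlier rows with the same id and fruit.
SELF_JOIN = """
SELECT t.id, t.fruit, t.day, count(t0.id) AS prior
FROM tbl t
LEFT JOIN tbl t0
       ON t0.id = t.id AND t0.fruit = t.fruit AND t0.day < t.day
GROUP BY t.id, t.day, t.fruit
ORDER BY t.id, t.day
"""

# Window function: running count per (id, fruit), minus the current row.
WINDOW = """
SELECT id, fruit, day,
       count(*) OVER (PARTITION BY id, fruit ORDER BY day) - 1 AS prior
FROM tbl
ORDER BY id, day
"""


def prior_counts(query):
    """Load the sample rows into an in-memory table and run one query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, fruit TEXT, day INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", ROWS)
    rows = con.execute(query).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in prior_counts(WINDOW):
        print(row)
```

Both queries should return the same rows here. Note that the window version relies on day being unique within each (id, fruit); with ties, the default frame includes all peers of the current row.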

Related

Can I generate a map that shows a particular row was in a particular group in SQLite?

Say I have the following data:
+--------+-------+
| Group | Data |
+--------+-------+
| 1 | row 1 |
| 1 | row 2 |
| 1 | row 3 |
| 20 | row 1 |
| 20 | row 3 |
| 10 | row 1 |
| 10 | row A |
| 10 | row 2 |
| 10 | row 3 |
+--------+-------+
Is it possible to draw a map that shows which groups have which rows? Group numbers may not be contiguous, so they can be placed into a separate table, with the row index of that table used as the character position in the string. Something like this:
+-------+
| Group |
+-------+
| 1 |
| 20 |
| 10 |
+-------+
+-------+----------------+
| Data  | Found in group |
+-------+----------------+
| row 1 | 111            |
| row A |   1            |
| row 2 | 1 1            |
| row 3 | 111            |
+-------+----------------+
Where the first character represents Group 1, the 2nd is Group 20 and the 3rd is Group 10.
Ordering of the Group rows isn't critical so long as I can reference which row goes with which character.
I only ask this because I saw this crazy example in the documentation generating a fractal, but I can't quite get my head around it.
Is this doable?
To find the missing values, the first step is to prepare a dataset that contains every possible combination of Data and Grp. You can build that with a CROSS JOIN.
Once you have that dataset, compare it with the actual data using a LEFT JOIN.
Assuming the groups are ordered by the Grp column, you can do it with the query below.
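SELECT
a.Data
,group_concat(CASE WHEN base.Grp IS NULL THEN '.' ELSE '1' END, '') AS Found_In_Group -- one character per group
,group_concat(b.Grp) AS Group_Order -- which group each character stands for
FROM
(SELECT Data FROM yourtable GROUP BY Data) a -- every distinct data value
CROSS JOIN
(SELECT Grp FROM yourtable GROUP BY Grp ORDER BY Grp) b -- every distinct group
LEFT JOIN yourtable base -- NULL where the combination is missing
ON b.Grp = base.Grp
AND a.Data = base.Data
GROUP BY a.Data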
Note: I used . instead of a blank to represent a missing group, for better visibility.
| Data  | Found_In_Group | Group_Order |
+-------+----------------+-------------+
| row 1 | 111            | 1,10,20     |
| row 2 | 11.            | 1,10,20     |
| row 3 | 111            | 1,10,20     |
| row A | .1.            | 1,10,20     |
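For anyone who wants to try the cross-join idea locally, here is a rough sketch using Python's sqlite3 module. It is a variation, not the query above: since the order in which group_concat concatenates is not guaranteed, the sketch returns one row per (Data, Grp) cell in a fixed order and assembles the string in Python. It also assumes yourtable has no duplicate (Grp, Data) pairs. The names ROWS, QUERY and presence_map are invented for this example.

```python
import sqlite3
from itertools import groupby

# Sample data from the question: (Grp, Data)
ROWS = [
    (1, "row 1"), (1, "row 2"), (1, "row 3"),
    (20, "row 1"), (20, "row 3"),
    (10, "row 1"), (10, "row A"), (10, "row 2"), (10, "row 3"),
]

# Every (Data, Grp) combination, flagged 1 if it exists in the table.
QUERY = """
SELECT a.Data, b.Grp, base.Grp IS NOT NULL AS present
FROM (SELECT DISTINCT Data FROM yourtable) a
CROSS JOIN (SELECT DISTINCT Grp FROM yourtable) b
LEFT JOIN yourtable base
       ON base.Grp = b.Grp AND base.Data = a.Data
ORDER BY a.Data, b.Grp
"""


def presence_map():
    """Return {Data: string with one character per group, ordered by Grp}."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourtable (Grp INTEGER, Data TEXT)")
    con.executemany("INSERT INTO yourtable VALUES (?, ?)", ROWS)
    cells = con.execute(QUERY).fetchall()
    con.close()
    # Cells arrive sorted by Data, then Grp, so groupby sees each Data once.
    return {
        data: "".join("1" if present else "." for _, _, present in group)
        for data, group in groupby(cells, key=lambda cell: cell[0])
    }


if __name__ == "__main__":
    for data, found in presence_map().items():
        print(data, found)
```

Because the cells are sorted by Grp, the characters stand for groups 1, 10 and 20 in that order, as in the Group_Order column above.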
SELECT Data, group_concat("Group") AS "Found in group"
FROM yourtable
GROUP BY Data
will give you a CSV list of groups.

Postgres create view with column values based on another table?

I'm implementing a view to store leaderboard data of the top 10 users that is computed using an expensive COUNT(*). I'm planning on the view to look something like this:
id SERIAL PRIMARY KEY
user_id TEXT
type TEXT
rank INTEGER
count INTEGER
-- adding an index to user_id
-- adding a two-column unique index to user_id and type
I'm having trouble with seeing how this view should be created to properly account for the rank and type. Essentially, I have a big table (~30 million rows) like this:
+----+---------+---------+----------------------------+
| id | user_id | type | created_at |
+----+---------+---------+----------------------------+
| 1 | 1 | Diamond | 2021-05-11 17:35:18.399517 |
| 2 | 1 | Diamond | 2021-05-12 17:35:17.399517 |
| 3 | 1 | Diamond | 2021-05-12 17:35:18.399517 |
| 4 | 2 | Diamond | 2021-05-13 17:35:18.399517 |
| 5 | 1 | Clay | 2021-05-14 17:35:18.399517 |
| 6 | 1 | Clay | 2021-05-15 17:35:18.399517 |
+----+---------+---------+----------------------------+
With the table above, I'm trying to achieve something like this:
+----+---------+---------+------+-------+
| id | user_id | type | rank | count |
+----+---------+---------+------+-------+
| 1 | 1 | Diamond | 1 | 3 |
| 2 | 2 | Diamond | 2 | 1 |
| 3 | 1 | Clay | 1 | 2 |
| 4 | 1 | Weekly | 1 | 5 | -- 3 diamonds + 2 clay obtained between Mon-Sun
| 5 | 2 | Weekly | 2 | 1 |
+----+---------+---------+------+-------+
By Weekly I am counting the time from the last Sunday to the upcoming Sunday.
Is this doable using only SQL, or is some kind of script needed? If doable, how would this be done? It's worth mentioning that there are thousands of different types, so not having to manually specify type would be preferred.
If there's anything unclear, please let me know and I'll do my best to clarify. Thanks!
The "weekly" rows are produced in a different way compared to the "user" rows (I called them two different "categories"). To get the result you want you can combine two queries using UNION ALL.
For example:
select 'u' as category, user_id, type,
rank() over(partition by type order by count(*) desc) as rk,
count(*) as cnt
from scores
group by user_id, type
union all
select 'w', user_id, 'Weekly',
rank() over(order by count(*) desc),
count(*) as cnt
from scores
group by user_id
order by category, type desc, rk
Result:
category user_id type rk cnt
--------- -------- -------- --- ---
u 1 Diamond 1 3
u 2 Diamond 2 1
u 1 Clay 1 2
w 1 Weekly 1 5
w 2 Weekly 2 1
See running example at DB Fiddle.
Note: For the sake of simplicity I left the filtering by timestamp out of the query. If you really needed to include only the rows of the last 7 days (or other period of time), it would be a matter of adding a WHERE clause in both subqueries.
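As a rough illustration of the WHERE clause mentioned in the note, the sketch below runs a variant of the UNION ALL query in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). Several things here are assumptions for the example, not part of the answer: the counts are aggregated in CTEs first and ranked afterwards, the date filter is applied only to the 'Weekly' branch so the per-type rows stay all-time, the week is taken as Monday to Monday (adjust week_bounds if your week starts on Sunday), and created_at is stored as ISO text. The names week_bounds and leaderboard are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

# Sample rows from the question: (id, user_id, type, created_at)
SCORES = [
    (1, 1, "Diamond", "2021-05-11 17:35:18.399517"),
    (2, 1, "Diamond", "2021-05-12 17:35:17.399517"),
    (3, 1, "Diamond", "2021-05-12 17:35:18.399517"),
    (4, 2, "Diamond", "2021-05-13 17:35:18.399517"),
    (5, 1, "Clay", "2021-05-14 17:35:18.399517"),
    (6, 1, "Clay", "2021-05-15 17:35:18.399517"),
]

QUERY = """
WITH per_type AS (
    SELECT user_id, type, count(*) AS cnt
    FROM scores
    GROUP BY user_id, type
), weekly AS (
    SELECT user_id, 'Weekly' AS type, count(*) AS cnt
    FROM scores
    WHERE created_at >= :week_start AND created_at < :week_end
    GROUP BY user_id
)
SELECT user_id, type,
       rank() OVER (PARTITION BY type ORDER BY cnt DESC) AS rk, cnt
FROM per_type
UNION ALL
SELECT user_id, type,
       rank() OVER (ORDER BY cnt DESC) AS rk, cnt
FROM weekly
"""


def week_bounds(day):
    """Half-open [Monday, next Monday) range containing day, as ISO strings."""
    start = day - timedelta(days=day.weekday())
    return start.isoformat(), (start + timedelta(days=7)).isoformat()


def leaderboard(day):
    """Per-type ranks plus 'Weekly' ranks for the week containing day."""
    week_start, week_end = week_bounds(day)
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE scores "
        "(id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT, created_at TEXT)"
    )
    con.executemany("INSERT INTO scores VALUES (?, ?, ?, ?)", SCORES)
    params = {"week_start": week_start, "week_end": week_end}
    rows = con.execute(QUERY, params).fetchall()
    con.close()
    return sorted(rows)  # rows are (user_id, type, rk, cnt)


if __name__ == "__main__":
    for row in leaderboard(date(2021, 5, 12)):
        print(row)
```

In PostgreSQL the week boundaries could be computed in SQL instead of being passed in as parameters; the half-open range (>= start, < end) avoids counting a row at exactly midnight in two weeks.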
I think this is what you were talking about, right?
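WITH scores_plus_weekly AS (
-- this week's rows, relabelled with the synthetic type 'Weekly'
SELECT id, user_id, 'Weekly' AS type, created_at
FROM scores
WHERE created_at BETWEEN '2021-05-10' AND '2021-05-17'
UNION ALL
-- all original rows, keeping their real type
SELECT id, user_id, type, created_at
FROM scores
)
SELECT
-- overall row id: Diamond first, then Clay, then everything else
row_number() OVER (ORDER BY CASE "type" WHEN 'Diamond' THEN 0 WHEN 'Clay' THEN 1 ELSE 2 END, count(*) DESC) AS "id",
user_id,
"type",
-- rank within each type, highest count first
row_number() OVER (PARTITION BY "type" ORDER BY count(*) DESC) AS "rank",
count(*)
FROM scores_plus_weekly
GROUP BY user_id, "type"
ORDER BY "id";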
I'm sure this is not the only way, but I thought the result wasn't too complex. This query first combines the original table with a copy of all scores from this week, relabelled as type 'Weekly'. For the sake of consistency I picked a date range that matches your entire example set. It then groups by user_id and type to get the counts for each combination. The two row_number() calls give you the overall row id and the rank per type. A big part of this query consists of sorting by type, so if you join another table that contains the order or priority of the types, the CASE can probably be simplified.
Lastly, the entire query can be wrapped in a view by writing CREATE VIEW score_ranks AS followed by the query.

Oracle SQL: Counting how often an attribute occurs for a given entry and choosing the attribute with the maximum number of occurs

I have a table that has a number column and an attribute column like this:
1.
+-----+-----+
| num | att |
+-----+-----+
| 1 | a |
| 1 | b |
| 1 | a |
| 2 | a |
| 2 | b |
| 2 | b |
+-----+-----+
I want to make the number unique, and the attribute to be whichever attribute occurred most often for that number, like this (this is the end product I'm interested in):
2.
+-----+-----+
| num | att |
+-----+-----+
| 1 | a |
| 2 | b |
+-----+-----+
I have been working on this for a while and managed to write myself a query that looks up how many times an attribute occurs for a given number like this:
3.
+-----+-----+-------+
| num | att | count |
+-----+-----+-------+
| 1 | a | 2 |
| 1 | b | 1 |
| 2 | a | 1 |
| 2 | b | 2 |
+-----+-----+-------+
But I can't think of a way to select only those rows from the above table where the count is the highest (for each number, of course).
So basically what I am asking is: given table 3, how do I select only the rows with the highest count for each number? (An answer that gets from table 1 to table 2 directly also works, of course :) )
You can use aggregation and window functions:
select num, att
from (
select num, att, row_number() over(partition by num order by count(*) desc, att) rn
from mytable
group by num, att
) t
where rn = 1
For each num, this brings the most frequent att; if there are ties, the smaller att is retained.
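Here is a quick sketch of the same row_number() idea, run against the sample data from table 1. It is not Oracle: it uses SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer), and it computes the counts in a subquery before ranking them. The third number with a tie is added purely to show the tie-breaking rule, and the names QUERY and most_frequent are made up for this example.

```python
import sqlite3

# Table 1 from the question, plus a tie for num = 3 (added for illustration).
ROWS = [
    (1, "a"), (1, "b"), (1, "a"),
    (2, "a"), (2, "b"), (2, "b"),
    (3, "y"), (3, "x"),
]

QUERY = """
SELECT num, att
FROM (
    SELECT num, att,
           row_number() OVER (PARTITION BY num ORDER BY cnt DESC, att) AS rn
    FROM (
        SELECT num, att, count(*) AS cnt
        FROM mytable
        GROUP BY num, att
    )
)
WHERE rn = 1
ORDER BY num
"""


def most_frequent(rows):
    """Return one (num, att) pair per num: the most frequent att."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (num INTEGER, att TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    print(most_frequent(ROWS))
```

If you would rather keep all tied attributes instead of picking one, replacing row_number() with rank() and dropping att from the ORDER BY would return every att that shares the highest count.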
Oracle has an aggregate function that does this, stats_mode():
select num, stats_mode(att)
from t
group by num;
In statistics, the most common value is called the mode -- hence the name of the function.
Here is a db<>fiddle.
You can use group by and count as below
select id, col, count(col) as count
from
df_b_sql
group by id, col

An SQL query that combines aggregate and non-aggregate values in one row

The following query gives me the information that I need but I want it to take it just a step further. In the table at the bottom (only showing a subset of the fields), I want to group by cust_line in an unusual way (at least to me it's unusual).
Let's look at the items with a cust_line of 2 as an example. I would like these to be represented by one line, not 5. For this line, I would like to select all the fields, except price and total, from the row where cust_part = "GROUPINV". For the total field I would like 'sum(total) as new_total', and for the price I would like new_total / qty_invoiced, where qty_invoiced is the value on the line where cust_part = "GROUPINV".
Is what I am asking for completely ridiculous? Is it even possible? I'm not advanced at SQL so it may also be easy and I just don't know how to approach it. I thought of using 'partition by' but I couldn't imagine how I would get it to work as I figured it would still return 5 rows where I only want 1.
I've also looked at these questions with similar titles but not really what I am looking for:
SQL query that returns aggregate AND non aggregate results
Combined aggregated and non-aggregate query in SQL
SELECT L.CUST_LINE, I.LINE_NO, I.ORDER_NO, I.STAGE, I.ORDER_LINE_POS, I.CUST_PART,
I.LINE_ITEM_NO, I.QTY_INVOICED, I.CUST_DESC, I.DESCRIPTION, I.SALE_UNIT_PRICE, I.PRICE_TOTAL,
I.INVOICE_NO, I.CUSTOMER_PO_NO, I.ORDER_NO, I.CUSTOMER_NO, I.CATALOG_DESC, I.ORDER_LINE_NOTES
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
ORDER BY L.CUST_LINE;
| cust_line | line_no | cust_part | qty_invoiced | cust_desc | price | total |
| 1 | 4 | ... | 1 | ... | 55 | 55 |
| 2 | 1 | GROUPINV | 1 | some part | 0 | 0 |
| 2 | 6 | ... | 3 | ... | 0 | 0 |
| 2 | 2 | ... | 1 | ... | 0 | 0 |
| 2 | 3 | ... | 1 | ... | 0 | 0 |
| 2 | 7 | ... | 2 | ... | 10 | 20 |
| 3 | 7 | ... | 1 | ... | 67 | 67 |
You can use an analytic function to calculate a total over multiple rows of a result set, then filter out the rows you don't want.
Leaving out all the extra columns for sake of brevity:
SELECT cust_line, qty_invoiced, order_total/qty_invoiced AS price
FROM (
-- cust_part must be selected here so the outer WHERE can filter on it
SELECT l.cust_line, i.cust_part, i.qty_invoiced,
SUM(total) OVER (PARTITION BY l.cust_line) AS order_total,
COUNT(l.cust_line) OVER (PARTITION BY l.cust_line) AS group_count
FROM
(SELECT CUST_LINE, ORDER_NO, LINE_NO
FROM CUSTOMER_ORDER_LINE
GROUP BY CUST_LINE, ORDER_NO, LINE_NO
) L
INNER JOIN CUSTOMER_ORDER_IVC_REP I
ON I.ORDER_NO = L.ORDER_NO
WHERE RESULT_KEY = 999999
AND I.LINE_NO = L.LINE_NO
)
-- keep the GROUPINV row of each group, or the only row of a one-line group
WHERE ( cust_part = 'GROUPINV' OR group_count = 1 )
ORDER BY cust_line
I am guessing on what you want in the PARTITION BY clause; this is essentially a GROUP BY that applies only to the SUM function. Not sure if you might also want order_no in the partition.
The trick is to select all the rows in the inner query, applying SUM across them all, and then filter out the rows you are not interested in in the outermost query.
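To see the "sum over the partition, then filter" trick in isolation, here is a small sketch built on the sample rows from the question. It is a simplification under several assumptions: everything lives in a single hypothetical table invoice_lines (no join), it runs in SQLite through Python's sqlite3 module rather than Oracle (SQLite 3.25 or newer is assumed), and the part numbers that are elided as "..." in the question are stored as NULL rather than guessed.

```python
import sqlite3

# (cust_line, line_no, cust_part, qty_invoiced, price, total)
# cust_part is elided in the question except for GROUPINV; None stands in.
ROWS = [
    (1, 4, None, 1, 55, 55),
    (2, 1, "GROUPINV", 1, 0, 0),
    (2, 6, None, 3, 0, 0),
    (2, 2, None, 1, 0, 0),
    (2, 3, None, 1, 0, 0),
    (2, 7, None, 2, 10, 20),
    (3, 7, None, 1, 67, 67),
]

QUERY = """
SELECT cust_line, qty_invoiced, order_total,
       order_total * 1.0 / qty_invoiced AS price
FROM (
    SELECT cust_line, cust_part, qty_invoiced,
           SUM(total) OVER (PARTITION BY cust_line) AS order_total,
           COUNT(*)   OVER (PARTITION BY cust_line) AS group_count
    FROM invoice_lines
)
WHERE cust_part = 'GROUPINV' OR group_count = 1
ORDER BY cust_line
"""


def collapsed_lines():
    """One row per cust_line: group total and price per invoiced unit."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE invoice_lines (cust_line INTEGER, line_no INTEGER, "
        "cust_part TEXT, qty_invoiced INTEGER, price INTEGER, total INTEGER)"
    )
    con.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in collapsed_lines():
        print(row)
```

The window functions see all five rows of cust_line 2 because they are evaluated in the inner query; only afterwards does the outer WHERE drop the rows that are not the GROUPINV line.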

SQL Change Rank based on any value in group of values

I'm not looking for the answer as much as what to search for as I think this is possible. I have a query where the result can be as such:
| ID | CODE | RANK |
I want to base the rank on the code, so that I get these results:
| 1 | A | 1 |
| 1 | B | 1 |
| 2 | A | 1 |
| 2 | C | 1 |
| 3 | B | 2 |
| 3 | C | 2 |
| 4 | C | 3 |
Basically, based on the group of IDs, if any of the CODEs = a certain value I want to adjust the rank so then I can order by rank first and then other columns. Never sure how to phrase things in SQL.
I tried
CASE WHEN CODE = 'A' THEN 1 WHEN CODE = 'B' THEN 2 ELSE 3 END rank
ORDER BY rank DESC
But I want to keep the IDs together; I don't want them broken apart. If I can't solve it another way, I was thinking of giving all rows of an ID the same rank, based on the best code in that group.
Any thoughts on a SQL function to look at?
You could use the MIN() OVER() analytic function to get the minimum rank value per group, and just order by that:
WITH cte AS (
SELECT id, code,
MIN(CASE WHEN code='A' THEN 1 WHEN code='B' THEN 2 ELSE 3 END)
OVER (PARTITION BY id) rank
FROM mytable
)
SELECT * FROM cte
ORDER BY rank, id, code
An SQLfiddle to test with.
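If the fiddle is not available, the MIN() OVER() approach can also be tried with a short sketch like the one below. It runs the query in SQLite through Python's sqlite3 module (assuming SQLite 3.25 or newer); the sample rows are the (ID, CODE) pairs from the question, the alias is spelled group_rank here instead of rank to avoid clashing with the window function name, and ranked_rows is a made-up helper name.

```python
import sqlite3

# (id, code) pairs taken from the desired result in the question
ROWS = [
    (3, "C"), (1, "B"), (4, "C"), (2, "A"),
    (1, "A"), (3, "B"), (2, "C"),
]

QUERY = """
WITH cte AS (
    SELECT id, code,
           MIN(CASE WHEN code = 'A' THEN 1
                    WHEN code = 'B' THEN 2
                    ELSE 3 END)
               OVER (PARTITION BY id) AS group_rank
    FROM mytable
)
SELECT id, code, group_rank
FROM cte
ORDER BY group_rank, id, code
"""


def ranked_rows():
    """Every row of an id gets the best (lowest) rank found in that id."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytable (id INTEGER, code TEXT)")
    con.executemany("INSERT INTO mytable VALUES (?, ?)", ROWS)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in ranked_rows():
        print(row)
```

Because every row of an id carries the same group_rank, ordering by group_rank first and id second keeps each id's rows together.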