Get dates with values greater than average borrow rate - sql

I have a table called BOOK (memberId, ISBN, dateBorrowed)
For example:
isbn          | memberId | borrowed
--------------+----------+-----------
9998-01-101-9 |          |
9998-01-101-9 |          |
9998-01-101-9 |          |
9998-01-101-9 | 1000     | 2018-10-02
9998-01-101-9 | 1010     | 2018-09-04
9998-01-101-9 | 1021     | 2018-09-14
9998-01-101-9 |          |
9998-01-101-9 | 1001     | 2018-10-02
I have to SELECT all dates on which the number of books borrowed that day is larger than the average number borrowed per day. How can I do this?
I have selected each date and how many books were borrowed on it with:
SELECT borrowed, COUNT(*) AS dates
FROM BOOK
WHERE borrowed IS NOT NULL
GROUP BY borrowed;
Another query I wrote computes the average:
SELECT SUM(dates)/COUNT(borrowed) AS average
FROM (
    SELECT borrowed, COUNT(*) AS dates
    FROM BOOK
    WHERE borrowed IS NOT NULL GROUP BY borrowed
) AS average;
Now, how do I combine these two queries into one clear query?

Window functions can help a lot here: https://www.postgresql.org/docs/current/static/tutorial-window.html
demo: db<>fiddle
My test data:
isbn borrowed
9998-01-101-1 2018-08-01
9998-01-101-2 2018-08-01
9998-01-101-3 2018-08-01
9998-01-101-4 2018-08-01
9998-01-101-5 2018-08-01
9998-01-101-1 2018-08-02
9998-01-101-2 2018-08-02
9998-01-101-3 2018-08-02
9998-01-101-4 2018-08-03
9998-01-101-5 2018-08-03
9998-01-101-1 2018-08-04
9998-01-101-2 2018-08-04
9998-01-101-3 2018-08-04
9998-01-101-4 2018-08-04
9998-01-101-5 2018-08-05
9998-01-101-1 2018-08-05
The query:
SELECT
    *
FROM (
    SELECT
        *,
        borrowed_all_time::decimal / COUNT(*) OVER () as avg_borrows_per_day -- D
    FROM (
        SELECT DISTINCT -- C
            borrowed,
            COUNT(*) OVER (PARTITION BY borrowed) as borrowed_on_day, -- A
            COUNT(*) OVER () as borrowed_all_time -- B
        FROM book
    )s
)s
WHERE borrowed_on_day > avg_borrows_per_day -- E
A: This window function counts the rows per borrowed date.
B: This window function counts all rows, which equals the number of borrows of all time.
The result so far looks like this:
borrowed borrowed_on_day borrowed_all_time
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-01 5 16
2018-08-02 3 16
2018-08-02 3 16
2018-08-02 3 16
2018-08-03 2 16
2018-08-03 2 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-04 4 16
2018-08-05 2 16
2018-08-05 2 16
C: Because we need no duplicates, we eliminate them with DISTINCT.
D: Counting all rows after eliminating the duplicates gives the number of distinct days. Dividing the all-time borrow count by this number gives the average borrows per day. The decimal cast is necessary: it turns the integer division (16 / 5 == 3) into a decimal division (16 / 5 == 3.2).
E: Now we can filter for borrows on the current day > average borrows per day.
The result:
borrowed
2018-08-01
2018-08-04
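For comparison, here is a minimal sketch that combines the asker's two queries with a common table expression instead of window functions. It is wrapped in Python's sqlite3 module only so it can be run as-is. The ISBN and member values are made up for illustration; only the daily counts (5, 3, 2, 4, 2) come from the test data above.

```python
import sqlite3

def dates_above_average(rows):
    """Return borrow dates whose daily count exceeds the average daily count."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE book (isbn TEXT, memberId INTEGER, borrowed TEXT)")
    con.executemany("INSERT INTO book VALUES (?, ?, ?)", rows)
    query = """
        WITH per_day AS (              -- the asker's first query
            SELECT borrowed, COUNT(*) AS cnt
            FROM book
            WHERE borrowed IS NOT NULL
            GROUP BY borrowed
        )
        SELECT borrowed
        FROM per_day
        WHERE cnt > (SELECT AVG(cnt) FROM per_day)  -- AVG returns a non-integer value
        ORDER BY borrowed
    """
    result = [r[0] for r in con.execute(query)]
    con.close()
    return result

# Hypothetical rows that reproduce the daily counts of the test data.
counts = {"2018-08-01": 5, "2018-08-02": 3, "2018-08-03": 2,
          "2018-08-04": 4, "2018-08-05": 2}
rows = [(f"isbn-{i}", 1000 + i, day) for day, n in counts.items() for i in range(n)]
rows.append(("isbn-x", None, None))  # a copy that was never borrowed
print(dates_above_average(rows))
```

Using AVG over the per-day counts sidesteps the integer-division issue, because the average is not computed with integer operands.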

This looks a bit like homework, so window functions might be off limits.
SELECT *
FROM (
    SELECT BOOK.*,
        CAST(COUNT(1) OVER (PARTITION BY borrowed) AS FLOAT) cntThatDay,
        CAST(SUM(1) OVER () AS FLOAT)
            / CAST((SELECT COUNT(DISTINCT borrowed) FROM BOOK) AS FLOAT) AS totalAverage
    FROM BOOK
    WHERE borrowed IS NOT NULL
) TMP
WHERE cntThatDay > totalAverage;

Related

Why does my cumulative column not work as expected?

Here is my query. I have a column called cum_balance which is supposed to calculate the cumulative balance, but after row number 10 there is an anomaly and it doesn't work as expected. All I notice is that from row number 10 onwards the hour column has the same value. What's the right syntax for this?
select
    hour,
    symbol,
    amount_usd,
    category,
    sum(amount_usd) over (
        order by
            hour asc RANGE BETWEEN UNBOUNDED PRECEDING
            AND CURRENT ROW
    ) as cum_balance
from
    combined_transfers_usd_netflow
order by
    hour
I have tried removing RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, adding a PARTITION BY hour, and adding a GROUP BY hour. None of them gave the expected result; they either returned the wrong values or raised errors.
Row Number  Hour                 SYMBOL  AMOUNT_USD    CATEGORY  CUM_BALANCE
------------------------------------------------------------------------------
1           2021-12-02 23:00:00  WETH    227.2795      in        227.2795
2           2021-12-03 00:00:00  WETH    -226.4801153  out       0.7993847087
3           2022-01-05 21:00:00  WETH    5123.716203   in        5124.515587
4           2022-01-18 14:00:00  WETH    -4466.2366    out       658.2789873
5           2022-01-19 00:00:00  WETH    2442.618599   in        3100.897586
6           2022-01-21 14:00:00  USDC    99928.68644   in        103029.584
7           2022-03-01 16:00:00  UNI     8545.36098    in        111574.945
8           2022-03-04 22:00:00  USDC    -2999.343     out       108575.602
9           2022-03-09 22:00:00  USDC    -5042.947675  out       103532.6543
10          2022-03-16 21:00:00  USDC    -4110.6579    out       98594.35101
11          2022-03-16 21:00:00  UNI     -3.209306045  out       98594.35101
12          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
13          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
14          2022-03-16 21:00:00  UNI     -16.04653022  out       98594.35101
15          2022-03-16 21:00:00  UNI     -6.418612089  out       98594.35101
The "problem" with your data is that all the ORDER BY values from row 10 onwards are the same.
So if we shrink the data down a little and use two groups to repeat the experiment:
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,sum(val) over ( partition by grp order by date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as cum_val_1
,sum(val) over ( partition by grp order by date ) as cum_val_2
from data as d
order by 1,2;
we get:
GRP  DATE        VAL  CUM_VAL_1  CUM_VAL_2
-------------------------------------------
1    2021-01-01  10   10         10
1    2021-01-02  11   21         21
1    2021-01-03  12   33         33
2    2021-01-01  20   20         20
2    2021-01-02  21   63         63
2    2021-01-02  22   63         63
2    2021-01-04  23   86         86
We see in group 1 that the values accumulate as we expect. In group 2 we put in duplicate dates and see that those rows get the same value, but the rows after them "work as expected" again.
This tells us how the function works across unstable data (values that sort the same): they are all stepped in one leap.
Thus if you want each row to be different, the ORDER BY needs better distinctness. This could be forced by adding a random value as a tie-breaker, or a non-random-looking ROW_NUMBER, though that order would really be arbitrary too, just not explicitly so. And if you use random values you might get duplicates, so you really should use ROW_NUMBER or SEQx to get unique values.
The second formula also shows that the two results are equal, so the problem is the ORDER BY, not the framing of which rows are used.
with data(grp, date, val) as (
select * from values
(1,'2021-01-01'::date, 10),
(1,'2021-01-02'::date, 11),
(1,'2021-01-03'::date, 12),
(2,'2021-01-01'::date, 20),
(2,'2021-01-02'::date, 21),
(2,'2021-01-02'::date, 22),
(2,'2021-01-04'::date, 23)
)
select d.*
,seq8() as s
,sum(val) over ( partition by grp order by date ) as cum_val_1
,sum(val) over ( partition by grp order by date, s ) as cum_val_2
,sum(val) over ( partition by grp order by date, seq8() ) as cum_val_3
from data as d
order by 1,2;
gives:
GRP  DATE        VAL  S  CUM_VAL_1  CUM_VAL_2  CUM_VAL_2_2
-----------------------------------------------------------
1    2021-01-01  10   0  10         10
1    2021-01-02  11   1  21         21
1    2021-01-03  12   2  33         33
2    2021-01-01  20   3  20         20
2    2021-01-02  21   4  63         41
2    2021-01-02  22   5  63         63
2    2021-01-04  23   6  86         86
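The same tie behaviour can be reproduced outside Snowflake. Below is a minimal sketch using Python's sqlite3 module (assuming SQLite 3.25 or newer for window functions). The `id` column is an assumption added here to act as the unique tie-breaker, playing the role that SEQ8 or ROW_NUMBER plays above; the rows are group 2 of the sample data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "id" is a hypothetical unique column used only to break ties in ORDER BY.
con.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, grp INTEGER, date TEXT, val INTEGER)")
con.executemany(
    "INSERT INTO data (grp, date, val) VALUES (?, ?, ?)",
    [(2, "2021-01-01", 20), (2, "2021-01-02", 21),
     (2, "2021-01-02", 22), (2, "2021-01-04", 23)],
)
running = con.execute("""
    SELECT val,
           SUM(val) OVER (PARTITION BY grp ORDER BY date)     AS tied,
           SUM(val) OVER (PARTITION BY grp ORDER BY date, id) AS untied
    FROM data
    ORDER BY date, id
""").fetchall()
for row in running:
    print(row)
```

With only `date` in the ORDER BY, the two rows dated 2021-01-02 are peers and share one running total; adding the unique column makes every row its own step.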

SQL how to count but only count one instance if two columns match?

Wondering how to select from a table:
FIELDID personID purchaseID dateofPurchase
--------------------------------------------------
2 13 147 2014-03-21 00:00:00
3 15 165 2015-03-23 00:00:00
4 13 456 2018-03-24 00:00:00
5 1 133 2018-03-21 00:00:00
6 23 123 2013-03-22 00:00:00
7 25 456 2013-03-21 00:00:00
8 25 456 2013-03-23 00:00:00
9 22 456 2013-03-28 00:00:00
10 25 589 2013-03-21 00:00:00
11 82 147 1991-10-22 00:00:00
12 82 453 2003-03-22 00:00:00
I'd like to get a result table of two columns: weekday and the number of purchases of each weekday, but only count the distinct days of purchases if done by the same person on the same day - for example since personID 25 purchased two things on 2013-03-21, that should only count as one 'thursday' instead of 2.
Basically, if the personID and the dateofPurchase are the same for more than one row, only count it once is what I want.
Here is what I have currently: It does everything correctly except it will count the above scenario under the thursday twice, when I would only want to add one:
SELECT v.wkday as day, COUNT(*) as 'absences'
FROM dbo.AttendanceRecord pr CROSS APPLY
(VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
THEN 'Weekend'
ELSE DATENAME(WEEKDAY, date)
END)
) v(wkday)
GROUP BY v.wkday;
to clarify:
If an item is purchased for at least one purchaseID on a specific day, that person is counted as having purchased on that day and does not need to be counted again for each new purchaseID on that day.
I think you want to count distinct persons, so that would be:
COUNT(DISTINCT personid) as absences
Note that single quotes are not appropriate around column aliases. If you need to escape them, use square brackets.
EDIT:
If you want to count distinct person-days, then you can use:
COUNT(DISTINCT CONCAT(personid, ':', dateofpurchase)) as absences
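A different way to count person-days, sketched here with Python's sqlite3 module, is to de-duplicate (person, day) pairs in a subquery before grouping by weekday. This is an alternative to the CONCAT approach, not the answerer's code. The table name `purchases` is an assumption, the rows are a subset of the question's sample, and SQLite's `strftime('%w', ...)` (0 = Sunday, 4 = Thursday) stands in for SQL Server's DATENAME.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# "purchases" is a hypothetical table name; columns follow the question.
con.execute("CREATE TABLE purchases (personID INTEGER, purchaseID INTEGER, dateofPurchase TEXT)")
con.executemany("INSERT INTO purchases VALUES (?, ?, ?)", [
    (23, 123, "2013-03-22 00:00:00"),
    (25, 456, "2013-03-21 00:00:00"),
    (25, 456, "2013-03-23 00:00:00"),
    (22, 456, "2013-03-28 00:00:00"),
    (25, 589, "2013-03-21 00:00:00"),  # same person, same day as the second row
])
per_weekday = dict(con.execute("""
    SELECT strftime('%w', day) AS wkday, COUNT(*) AS person_days
    FROM (SELECT DISTINCT personID, date(dateofPurchase) AS day FROM purchases)
    GROUP BY wkday
"""))
print(per_weekday)
```

Person 25 bought twice on Thursday 2013-03-21, yet the inner DISTINCT leaves a single row for that pair, so Thursday is counted once for that person.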

Creating a new calculated column in SQL

Is there a way to get the result below? June 24 appears 2 times, so there are 2 UDs for that day, while every other day has a single UD.
I am showing the expected output here:
Primary key UD Date
-------------------------------------------
1 123 2015-06-24 00:00:00.000
6 456 2015-06-24 00:00:00.000
2 123 2015-06-25 00:00:00.000
3 658 2015-06-26 00:00:00.000
4 598 2015-06-27 00:00:00.000
5 156 2015-06-28 00:00:00.000
No of times Number of days
-----------------------------
4 1
2 2
The logic is that there are 4 users who used the application on 1 day, and there are 2 users who used the application on 2 days.
You can use two levels of aggregation:
select cnt, count(*)
from (select date, count(*) as cnt
      from t
      group by date
     ) d
group by cnt
order by cnt desc;
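To see what the two levels of aggregation return, here is a small sketch that runs the same query shape on the question's six rows through Python's sqlite3 module. The column names `id` and `ud` and the date-only strings are simplifications assumed for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER, ud INTEGER, date TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (1, 123, "2015-06-24"), (6, 456, "2015-06-24"), (2, 123, "2015-06-25"),
    (3, 658, "2015-06-26"), (4, 598, "2015-06-27"), (5, 156, "2015-06-28"),
])
histogram = con.execute("""
    SELECT cnt, COUNT(*) AS num_dates
    FROM (SELECT date, COUNT(*) AS cnt FROM t GROUP BY date) AS d  -- rows per date
    GROUP BY cnt                                                   -- dates per row count
    ORDER BY cnt DESC
""").fetchall()
print(histogram)
```

The inner query yields one count per date, and the outer query counts how many dates share each count: one date has 2 rows and four dates have 1 row.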

Cumulative Sum per item in DB2

I have DB2 table like below -
Date1 Item_code Amt
2018-06-01 1 2
2018-06-02 1 3
2018-06-03 2 4
2018-06-03 2 5
2018-06-04 3 6
2018-06-05 3 7
2018-06-06 4 8
I need the cumulative sum item_code wise per day. The result should look like -
Date1 Item_code Amt
2018-06-01 1 2
2018-06-02 1 5
2018-06-03 2 9
2018-06-04 3 6
2018-06-05 3 13
2018-06-06 4 8
I have tried a lot by myself and search also on SO but nothing is fulfilling my need. There are a lot of examples if I just need the cumulative sum day wise irrespective of item code.
Any help is greatly appreciated. Thanks in advance.
I think you want aggregation with a cumulative sum:
select item_code, date1,
       sum(sum(amt)) over (partition by item_code order by date1) as running_amt
from t
group by item_code, date1;
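As a check against the expected output in the question, the sketch below runs the same idea through Python's sqlite3 module (assuming SQLite 3.25 or newer). It splits the nested `sum(sum(amt))` into an explicit subquery, which is only a stylistic choice made for this example; DB2 syntax is not exercised here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (date1 TEXT, item_code INTEGER, amt INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2018-06-01", 1, 2), ("2018-06-02", 1, 3), ("2018-06-03", 2, 4),
    ("2018-06-03", 2, 5), ("2018-06-04", 3, 6), ("2018-06-05", 3, 7),
    ("2018-06-06", 4, 8),
])
running = con.execute("""
    SELECT item_code, date1,
           SUM(day_amt) OVER (PARTITION BY item_code ORDER BY date1) AS running_amt
    FROM (SELECT item_code, date1, SUM(amt) AS day_amt   -- one row per item and day
          FROM t
          GROUP BY item_code, date1) AS d
    ORDER BY item_code, date1
""").fetchall()
for row in running:
    print(row)
```

Aggregating per item and day first collapses the two 2018-06-03 rows into 9, and the window sum then accumulates within each item_code.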

Aggregate value when condition and partition by

I have the below table and I need to aggregate
Id Month Days Hours Audit
1 201803 20 30 Yes
1 201803 20 15 Yes
1 201802 19 4 No
2 201803 20 5 Yes
Expected output:
Id Month Days Hours Audit Total
1 201803 20 2 Yes 100
1 201803 20 3 Yes 100
1 201802 10 4 No
2 201803 20 5 Yes 100
Summary:
Partition by ID & Month
Aggregate Days & Hours
My SQL: (my work)
SELECT (CASE
WHEN AUDIT IN ('YES')
THEN HOURS * DAYS
END) OVER (PARTITION BY ID ,c.month) AS TOTAL
FROM TABLEA
Use sum as the window function.
SELECT t.*,SUM(CASE WHEN AUDIT = 'YES' THEN HOURS * DAYS END)
OVER(PARTITION BY ID,month) AS TOTAL
FROM TABLEA t
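A runnable sketch of the conditional window sum, using Python's sqlite3 module (SQLite 3.25 or newer assumed) and the input rows from the question. Note one assumption: the comparison here uses 'Yes' as stored in the sample data, because text comparison in SQLite is case-sensitive by default; whether 'YES' matches in your database depends on its collation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablea (id INTEGER, month INTEGER, days INTEGER, hours INTEGER, audit TEXT)")
con.executemany("INSERT INTO tablea VALUES (?, ?, ?, ?, ?)", [
    (1, 201803, 20, 30, "Yes"), (1, 201803, 20, 15, "Yes"),
    (1, 201802, 19, 4, "No"), (2, 201803, 20, 5, "Yes"),
])
totals = con.execute("""
    SELECT id, month, hours,
           SUM(CASE WHEN audit = 'Yes' THEN hours * days END)
               OVER (PARTITION BY id, month) AS total
    FROM tablea
    ORDER BY id, month, hours
""").fetchall()
for row in totals:
    print(row)
```

Rows whose audit flag does not match contribute NULL to the sum, so a partition with no audited rows gets a NULL total rather than 0.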