PostgreSQL: Average for each day in interval - sql

I have a table that is structured like this:
item_id first_observed last_observed price
1 2016-10-21 2016-10-27 121
1 2016-10-28 2016-10-31 145
2 2016-10-22 2016-10-28 135
2 2016-10-29 2016-10-30 169
What I want is to get the average price for every day. I obviously cannot just group by first_observed or last_observed. Does Postgres offer a smart way of doing this?
The expected output would be like this:
date avg(price)
2016-10-21 121
2016-10-22 128
2016-10-23 128
2016-10-24 128
2016-10-25 128
2016-10-26 128
2016-10-27 128
2016-10-28 140
2016-10-29 157
2016-10-30 157
2016-10-31 157
It could also be output like this (both are fine):
start end avg(price)
2016-10-21 2016-10-21 121
2016-10-22 2016-10-27 128
2016-10-28 2016-10-28 140
2016-10-29 2016-10-31 157

demo:db<>fiddle
generate_series allows you to expand date ranges:
First step:
SELECT
generate_series(first_observed, last_observed, interval '1 day')::date as observed,
AVG(price)::int as avg_price
FROM items
GROUP BY observed
ORDER BY observed
This expands each date range into one row per day, then groups by day so that AVG aggregates the prices observed on that day.
Second step:
SELECT
MIN(observed) as start,
MAX(observed) as end,
avg_price
FROM (
-- <first step as subquery>
)s
GROUP BY avg_price
ORDER BY start
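Grouping by avg_price gives the MIN/MAX date for each average. Note that this merges all days sharing the same average, so the ranges are only correct if each average value occurs in a single contiguous run of days.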

WITH ObserveDates (ObserveDate) AS (
SELECT * FROM generate_series((SELECT MIN(first_observed) FROM T), (SELECT MAX(last_observed) FROM T), '1 days')
)
SELECT ObserveDate, AVG(Price)
FROM ObserveDates
JOIN T ON ObserveDate BETWEEN first_observed AND last_observed
GROUP BY ObserveDate
ORDER BY ObserveDate
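To make the expand-then-group idea concrete outside the database, here is a minimal Python sketch of the same logic, run on the four sample rows from the question. The names `ROWS` and `daily_average` are made up for this illustration and are not part of either answer. One thing it shows: 2016-10-31 is covered only by item 1, so for these rows the per-day average on that date comes out as 145.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the question: (item_id, first_observed, last_observed, price)
ROWS = [
    (1, date(2016, 10, 21), date(2016, 10, 27), 121),
    (1, date(2016, 10, 28), date(2016, 10, 31), 145),
    (2, date(2016, 10, 22), date(2016, 10, 28), 135),
    (2, date(2016, 10, 29), date(2016, 10, 30), 169),
]


def daily_average(rows):
    """Expand each [first, last] range into single days, then average per day."""
    prices_by_day = defaultdict(list)
    for _item_id, first, last, price in rows:
        day = first
        while day <= last:  # inclusive on both ends, like BETWEEN
            prices_by_day[day].append(price)
            day += timedelta(days=1)
    return {day: sum(p) / len(p) for day, p in sorted(prices_by_day.items())}


averages = daily_average(ROWS)
for day, avg in averages.items():
    print(day.isoformat(), avg)
```

The inner loop plays the role of generate_series, and the dictionary of lists plays the role of GROUP BY with AVG.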

Related

Sales amounts of the top n selling vendors by month with other fields in bigquery

I have a table in BigQuery like this (260,000 rows):
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
What I want is to group this data by month and find the sum of sales for only the top 20 vendors in each month. Expected output:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2020-01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
I tried this (source here), but it did not produce the desired output.
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t
Consider the approach below:
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20
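For readers without access to BigQuery, the same top-n-per-month pattern can be tried locally. The sketch below is an approximation under stated assumptions: it uses Python's built-in sqlite3 module, SQLite has no QUALIFY clause, so the row_number() filter is moved into an outer WHERE, and it needs an SQLite build with window functions (3.25 or later). The table name `sales`, the column `sold_at`, the function `top_vendors`, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (vendor, sale timestamp, item_price, discount_price)
SALES = [
    ("x", "2020-01-03 13:41:12", 74.0, 4.0),
    ("x", "2020-01-15 09:00:00", 26.0, 0.0),
    ("y", "2020-01-20 10:22:10", 50.0, 5.0),
    ("z", "2020-01-21 11:00:00", 10.0, 0.0),
    ("y", "2020-02-01 08:00:00", 40.0, 0.0),
    ("z", "2020-02-02 08:00:00", 90.0, 12.0),
]

QUERY = """
SELECT month, vendor, total_sales, total_discount, item_count
FROM (
    SELECT month, vendor, total_sales, total_discount, item_count,
           ROW_NUMBER() OVER (PARTITION BY month ORDER BY total_sales DESC) AS rn
    FROM (
        SELECT strftime('%Y-%m', sold_at) AS month,
               vendor,
               SUM(item_price) AS total_sales,
               SUM(discount_price) AS total_discount,
               COUNT(*) AS item_count
        FROM sales
        GROUP BY month, vendor
    )
)
WHERE rn <= ?
ORDER BY month, total_sales DESC
"""


def top_vendors(rows, n):
    """Return the n best-selling vendors of each month with their totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (vendor TEXT, sold_at TEXT, "
        "item_price REAL, discount_price REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


for row in top_vendors(SALES, 2):
    print(row)
```

The innermost query aggregates per month and vendor, the middle one ranks vendors within each month, and the outer filter keeps the first n ranks, which is what QUALIFY expresses in a single step.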

SQL how to convert row with date range to many rows with date range with gaps based on a data column

I want to convert data rows to date ranges in sql based on a column.
Below is the sample data:
Current Data
FROMDATE TODATE Data
1/01/2010 31/10/2010 100
1/01/2011 31/12/2011 50
1/01/2012 31/12/2012 50
1/01/2013 31/12/2013 50
1/01/2014 31/12/2014 50
1/01/2015 12/10/2015 50
13/10/2015 31/12/2015 50
1/01/2016 21/02/2016 50
22/02/2016 31/12/2016 67
1/01/2017 2/10/2017 67
3/10/2017 31/12/2017 75
1/01/2018 31/03/2018 75
1/04/2018 30/06/2018 75
1/07/2018 31/10/2018 40
1/11/2018 31/12/2018 75
1/01/2019 31/03/2019 75
1/04/2019 31/12/2019 75
1/01/2020 1/03/2020 75
Required result is:
FROMDATE TODATE Data
1/01/2010 31/10/2010 100
1/01/2011 21/02/2016 50
22/02/2016 2/10/2017 67
3/10/2017 30/06/2018 75
1/07/2018 31/10/2018 40
1/11/2018 1/03/2020 75
Required List
I would like to give credit to @Gordon Linoff, one of whose answers helped me with gaps-and-islands problems; I am just sharing it with you here.
I posted this as an answer because of the title, which is likely to come up in search results for this type of problem.
I did this on an Oracle database, and it should work in any standard SQL database. dbfiddle for reference
SELECT t.key_id
,MIN(fromdate)
,MAX(todate)
FROM (SELECT t.*
,row_number() over(ORDER BY fromdate) AS startseq
,row_number() over(PARTITION BY t.key_id ORDER BY fromdate) AS endseq
FROM some_table t) t
GROUP BY t.key_id
,(startseq - endseq);
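The row-number difference trick can be checked against the question's data. The sketch below is a hedged illustration, not the answerer's dbfiddle: it runs the same query shape through Python's sqlite3 module (assuming an SQLite build with window functions, 3.25 or later), on the last eight sample rows with dates rewritten in ISO format so they sort correctly as text. It assumes that key_id in the answer corresponds to the Data column of the sample; the table name `ranges`, the aliases, and the function `merge_islands` are invented for the example.

```python
import sqlite3

# Last eight sample rows from the question: (fromdate, todate, data)
RANGES = [
    ("2017-10-03", "2017-12-31", 75),
    ("2018-01-01", "2018-03-31", 75),
    ("2018-04-01", "2018-06-30", 75),
    ("2018-07-01", "2018-10-31", 40),
    ("2018-11-01", "2018-12-31", 75),
    ("2019-01-01", "2019-03-31", 75),
    ("2019-04-01", "2019-12-31", 75),
    ("2020-01-01", "2020-03-01", 75),
]

QUERY = """
SELECT MIN(fromdate) AS island_start,
       MAX(todate)   AS island_end,
       data
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (ORDER BY fromdate) AS startseq,
           ROW_NUMBER() OVER (PARTITION BY data ORDER BY fromdate) AS endseq
    FROM ranges t
)
GROUP BY data, startseq - endseq  -- constant within one run of equal values
ORDER BY island_start
"""


def merge_islands(rows):
    """Collapse consecutive rows with the same data value into one range."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ranges (fromdate TEXT, todate TEXT, data INTEGER)")
    conn.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in merge_islands(RANGES):
    print(row)
```

Note that this groups rows only by their order and value; it does not check that one row's todate is directly followed by the next row's fromdate, so date gaps inside a run of equal values are silently bridged.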

SQL how to count but only count one instance if two columns match?

Wondering how to select from a table:
FIELDID personID purchaseID dateofPurchase
--------------------------------------------------
2 13 147 2014-03-21 00:00:00
3 15 165 2015-03-23 00:00:00
4 13 456 2018-03-24 00:00:00
5 1 133 2018-03-21 00:00:00
6 23 123 2013-03-22 00:00:00
7 25 456 2013-03-21 00:00:00
8 25 456 2013-03-23 00:00:00
9 22 456 2013-03-28 00:00:00
10 25 589 2013-03-21 00:00:00
11 82 147 1991-10-22 00:00:00
12 82 453 2003-03-22 00:00:00
I'd like to get a result table of two columns: the weekday and the number of purchases on each weekday, but counting only distinct days of purchase when they are made by the same person on the same day. For example, since personID 25 purchased two things on 2013-03-21, that should count as only one 'Thursday' instead of two.
Basically, if the personID and the dateofPurchase are the same for more than one row, I want to count it only once.
Here is what I have currently. It does everything correctly, except that it counts the above scenario under Thursday twice, when I only want to add one:
SELECT v.wkday as day, COUNT(*) as 'absences'
FROM dbo.AttendanceRecord pr CROSS APPLY
(VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
THEN 'Weekend'
ELSE DATENAME(WEEKDAY, date)
END)
) v(wkday)
GROUP BY v.wkday;
to clarify:
If an item is purchased for at least one purchaseID on a specific day, they will be counted as purchased for that day, and do not need to be counted again for each new purchaseID on that day.
I think you want to count distinct persons, so that would be:
COUNT(DISTINCT personid) as absences
Note that single quotes are not appropriate around column aliases. If you need to escape them, use square brackets.
EDIT:
If you want to count distinct person-days, then you can use:
COUNT(DISTINCT CONCAT(personid, ':', dateofpurchase)) as absences
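The distinct person-day count can be tried on a few of the question's sample rows. This is a rough sketch with some substitutions: it uses Python's sqlite3 module rather than SQL Server, and instead of CONCAT inside COUNT(DISTINCT ...) it deduplicates (personID, day) pairs in a derived table with SELECT DISTINCT, which expresses the same idea. The table name `purchases`, the aliases, and the function `person_days_per_weekday` are assumptions made for the example.

```python
import sqlite3

# Subset of the question's sample rows: (personID, purchaseID, dateofPurchase)
PURCHASES = [
    (23, 123, "2013-03-22 00:00:00"),
    (25, 456, "2013-03-21 00:00:00"),
    (25, 456, "2013-03-23 00:00:00"),
    (22, 456, "2013-03-28 00:00:00"),
    (25, 589, "2013-03-21 00:00:00"),
]

QUERY = """
SELECT strftime('%w', purchase_day) AS weekday_number,  -- 0 = Sunday ... 6 = Saturday
       COUNT(*) AS person_days
FROM (
    -- one row per person per calendar day, however many purchases were made
    SELECT DISTINCT personID, date(dateofPurchase) AS purchase_day
    FROM purchases
)
GROUP BY weekday_number
ORDER BY weekday_number
"""


def person_days_per_weekday(rows):
    """Count distinct (person, day) pairs for each weekday number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE purchases (personID INTEGER, purchaseID INTEGER, "
        "dateofPurchase TEXT)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for weekday_number, person_days in person_days_per_weekday(PURCHASES):
    print(weekday_number, person_days)
```

Here the two purchases by personID 25 on 2013-03-21 collapse into a single Thursday entry, so Thursday (weekday number 4) is counted twice in total, once for personID 25 and once for personID 22, rather than three times.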

Difference between rows in sql based on rows number (SQL Server)

I have a problem. I have a table with the following columns and sample data:
RN Date Time
---------------------
1 2015-02-02 12
2 2015-02-02 25
3 2015-02-02 27
1 2015-02-08 42
2 2015-02-08 45
1 2015-03-01 60
2 2015-03-01 62
3 2015-03-01 63
4 2015-03-01 63
I need to get the difference between the start time and the end time of every day.
For example:
27-12
45-42
63-60
Any suggestions? :)
select
Date, max(Time) as mx, min(Time) as mn, max(Time) - min(Time) as diff
from table_name
group by Date

ORACLE SQL Select timestamps column from table. Producing more results than intended

Hello, I currently have the working script below. I am using Oracle 10.
SELECT z.no as "ID_One",
MAX(r.value) as "Max",
round(MAX(r.value)/80000,2) as "ROUND"
FROM Table1 r, Table2 z
WHERE r.timestamp > ((SYSDATE - TO_DATE('01/01/1970 00:00:00', 'MM-DD-YYYY HH24:MI:SS')) * 24 * 60 * 60) - 80000
AND r.va=21
AND r.nor IN ('7','98','3','3')
AND r.nor = z.re
GROUP BY r.nor, r.varr, z.no;
It produces a table like this
ID_ONE MAX ROUND
105 500 232
106 232 32
333 23 .21
444 34 .321
I want to select a column called timestamp from table r. However, when I add r.timestamp to my query, it produces 500 rows of data instead of 4. It looks like it is producing the highest value for each timestamp instead. How would I produce a table that looks like this? FYI, the timestamp column is in Unix time; I can do the conversion myself. I just need to know how to get these rows out.
ID_ONE MAX ROUND TIMESTAMP
105 500 232 DEC 21,2021 10:00
106 232 32 DEC 21,2021 23:12
333 23 .21 DEC 31,2021 2:12
444 34 .321 DEC 31,2021 23:12
When I add the timestamp column, I do not get the table above. Instead I get something like the output below, with the other two IDs further down in the 500 rows of data. I only want the 4 rows that hold the highest value (MAX) within this time window. ID_ONE is my ID for a stock of inventory in a warehouse.
ID_ONE MAX ROUND TIMESTAMP
106 338 .06 1406694567
106 355 .06 1406696037
106 246 .04 1406696337
106 363 .06 1406700687
106 330 .06 1406700987
106 512 .09 1406701347
106 459 .08 1406704047
106 427 .07 1406711038
106 596 .1 1406713111
106 401 .07 1406715872
106 682 .11 1406726192
106 2776 .46 1406726492
105 414 .07 1406728863
105 380 .06 1406734055
105 378 .06 1406734655
105 722 .12 1406735555
105 144 .02 1406665697
105 5
I have edited my answer; kindly try the query below:
SELECT z.no as "ID_One",
max(r.value) as "Max",
round(MAX(r.value)/80000,2) as "ROUND",
r.timestamp
FROM Table1 r, Table2 z
WHERE r.timestamp > ((SYSDATE - TO_DATE('01/01/1970 00:00:00', 'MM-DD-YYYY HH24:MI:SS')) * 24 * 60 * 60) - 80000
AND r.va=21
AND r.nor IN ('7','98','3','3')
-- keep only the rows that hold the maximum value for their va/nor combination
AND r.value=(select max(r1.value) from Table1 r1 where r1.va=r.va and r1.nor=r.nor)
AND r.nor = z.re
-- r.timestamp is selected without an aggregate, so it must be listed in GROUP BY
GROUP BY r.nor, r.varr, z.no, r.timestamp;
This looks like an ideal use case for analytic functions:
SELECT
v1.*,
round(v1.value/80000,2) as rounded_max_value
FROM (
SELECT
z.no as id_one,
r.value,
row_number() over (partition by r.nor, r.varr, z.no order by r.value desc) as rn,
r.timestamp
FROM Table1 r, Table2 z
WHERE r.timestamp >
((SYSDATE - TO_DATE('01/01/1970 00:00:00', 'MM-DD-YYYY HH24:MI:SS')) * 24 * 60 * 60) - 80000
AND r.va=21
AND r.nor IN ('7','98','3','3')
AND r.nor = z.re
) v1
where v1.rn = 1
This query uses row_number() over (partition by .. order by ..) to number the rows within each group, and then filters on rn = 1 in the outer query to keep only the row having the maximum value.
Some additional recommendations:
if your r.nor column is numeric, then don't use string literals; use IN (7,98,3,3) instead (BTW: why do you have 3 twice in your IN list?)
don't use " for column aliases unless absolutely necessary (since it makes them case-sensitive) ; they are a PITA
don't put your JOIN conditions into the WHERE clause; it makes your query harder to read. Use ANSI style joins instead.