Single SQL query to display aggregate data while grouping by 3 fields - sql

I have a table that contains basic info:
CREATE TABLE testing.testtable
(
recordId serial NOT NULL,
nameId integer,
teamId integer,
countryId integer,
goals integer,
outs integer,
assists integer,
win integer,
sys_time timestamp with time zone NOT NULL DEFAULT now(),
CONSTRAINT testtable_pkey PRIMARY KEY (recordid)
)
I want a single SQL query, with one record per person-team-country combination, that displays the following data. Note that it should group by nameId, teamId, and countryId:
Name, Team, and Country
Goal/out ratio (G/O)
Goal + Assist / out ratio (GA/O)
Win percentage (Win%)
The difference between the current goal/out ratio and what it was one month ago (rDif)
The difference between the current goal+assist/out ratio and what it was one month ago (fDif)
The difference between the current win % and what it was one month ago (winDif)
Example Table with all records:
Id nameId teamId countryId goals outs assists win sys_time
1 1 3 5 2 4 11 1 2013-01-01
2 1 3 5 9 4 19 1 2013-01-01
3 1 3 4 10 2 1 0 2013-01-01
4 1 3 4 11 50 14 1 2013-01-01
5 2 2 2 10 5 4 1 2013-01-01
6 2 3 5 4 7 15 0 2013-01-01
7 1 3 5 4 8 22 0 2014-07-01
8 1 3 4 11 3 5 1 2014-07-01
9 3 1 4 44 1 4 1 2014-07-01
Example desired output record (1-3-5):
nameId teamId countryId G/O GA/O Win% rDif fDif winDif
1 3 5 0.938 4.19 66 0.44 0.94 -0.34
The ratios are easy enough to retrieve. For the differences, I've done the following:
select tt.nameid,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid
order by change desc
This works if I want the differences for only the nameIds. But I want it to pull one record for each combination of name-team-country. I can't seem to get that working.

You can group by multiple fields:
select tt.nameid, tt.teamID, tt.countryID,
avg(tt.goals) - avg(case when tt.sys_time < date_trunc('day', NOW() - interval '1 month') then tt.goals end) as change
from testing.testtable tt
group by tt.nameid, tt.teamID, tt.countryID
order by change desc
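As a rough, runnable sketch of the multi-column grouping above, the following loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module (a stand-in for PostgreSQL, chosen only so the example runs anywhere). Several things here are assumptions rather than anything from the question: the goal/out ratio is computed as a ratio of sums, the "one month ago" boundary is passed in as a fixed `cutoff` date instead of NOW(), and the names `ratios_by_group`, `g_o` and `r_dif` are made up for illustration.

```python
import sqlite3

# Sample rows from the question:
# (nameId, teamId, countryId, goals, outs, assists, win, sys_time)
ROWS = [
    (1, 3, 5, 2, 4, 11, 1, "2013-01-01"),
    (1, 3, 5, 9, 4, 19, 1, "2013-01-01"),
    (1, 3, 4, 10, 2, 1, 0, "2013-01-01"),
    (1, 3, 4, 11, 50, 14, 1, "2013-01-01"),
    (2, 2, 2, 10, 5, 4, 1, "2013-01-01"),
    (2, 3, 5, 4, 7, 15, 0, "2013-01-01"),
    (1, 3, 5, 4, 8, 22, 0, "2014-07-01"),
    (1, 3, 4, 11, 3, 5, 1, "2014-07-01"),
    (3, 1, 4, 44, 1, 4, 1, "2014-07-01"),
]

# One output row per (nameId, teamId, countryId).
# g_o   = goals/outs over all rows of the group (assumed: ratio of sums)
# r_dif = g_o minus the same ratio restricted to rows older than :cutoff
QUERY = """
SELECT nameId, teamId, countryId,
       1.0 * SUM(goals) / SUM(outs) AS g_o,
       1.0 * SUM(goals) / SUM(outs)
         - 1.0 * SUM(CASE WHEN sys_time < :cutoff THEN goals END)
               / SUM(CASE WHEN sys_time < :cutoff THEN outs END) AS r_dif
FROM testtable
GROUP BY nameId, teamId, countryId
ORDER BY nameId, teamId, countryId
"""


def ratios_by_group(cutoff):
    """Return {(nameId, teamId, countryId): (g_o, r_dif)} for a hypothetical cutoff date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testtable (nameId INT, teamId INT, countryId INT, "
        "goals INT, outs INT, assists INT, win INT, sys_time TEXT)"
    )
    conn.executemany("INSERT INTO testtable VALUES (?,?,?,?,?,?,?,?)", ROWS)
    result = {
        (n, t, c): (g_o, r_dif)
        for n, t, c, g_o, r_dif in conn.execute(QUERY, {"cutoff": cutoff})
    }
    conn.close()
    return result


by_group = ratios_by_group("2014-06-01")
for key, (g_o, r_dif) in by_group.items():
    print(key, g_o, r_dif)
```

Groups with no rows before the cutoff get a NULL difference, because the conditional sums have nothing to aggregate. The GA/O and win-percentage columns would follow the same pattern.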

Just off the top of my head, I think it would work for you to use:
group by tt.nameid, tt.teamId, tt.countryId

Related

Find sum of hours for each date worked

I have a table of timesheet entries set up like this:
id  job_id  employee_id  hours_worked  date_worked
1   1       111          8             2022-10-01
2   1       222          8             2022-10-01
3   1       222          8             2022-10-02
4   2       222          8             2022-10-03
5   2       111          8             2022-10-04
6   2       222          5             2022-10-05
7   3       111          8             2022-10-04
8   4       333          8             2022-10-07
9   4       111          3             2022-10-09
I'm trying to find the sum of hours for the first, second, third, etc. dates on which work was done on each job.
Ideally I'd like something like this:
job_id  Day1_hours  Day2_hours  Day3_hours
1       16          8           0
2       8           8           5
3       8           0           0
4       8           3           0
The trouble I'm running into is that there can be multiple employees working on each day of the job, so using a query to select the min(date_worked) greater than a subquery for min(date_worked) is sometimes giving me the same dates. There are sometimes days in between work done on a job, so I can't just add a day to the minimum value and check hours for that date.
How can I find the sum of hours_worked for the first date_worked, then the second, third etc?
PIVOTs are great, but conditional aggregation offers a bit more flexibility.
Example
Select job_id
,[Day1_Hours] = sum( case when DN=1 then hours_worked else 0 end)
,[Day2_Hours] = sum( case when DN=2 then hours_worked else 0 end)
,[Day3_Hours] = sum( case when DN=3 then hours_worked else 0 end)
From ( Select *
,DN = dense_rank() over (partition by job_id order by date_worked)
From YourTable
) A
Group By Job_ID
select job_id
,[1] as day1_hours
,[2] as day2_hours
,[3] as day3_hours
from (
select job_id
,hours_worked
,dense_rank() over(partition by job_id order by date_worked) as days
from t
) t
pivot (sum(hours_worked) for days in([1],[2],[3])) p
job_id  day1_hours  day2_hours  day3_hours
1       16          8           null
2       8           8           5
3       8           null        null
4       8           3           null
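To try the DENSE_RANK plus conditional aggregation idea without a SQL Server instance, here is a small sketch that runs the same logic against the question's rows in an in-memory SQLite database via Python's sqlite3 module. The table name `timesheet` and the variable `day_hours` are assumptions for illustration, and the bracketed aliases are replaced with plain `AS` aliases since this is not T-SQL.

```python
import sqlite3

# Timesheet rows from the question:
# (id, job_id, employee_id, hours_worked, date_worked)
ENTRIES = [
    (1, 1, 111, 8, "2022-10-01"),
    (2, 1, 222, 8, "2022-10-01"),
    (3, 1, 222, 8, "2022-10-02"),
    (4, 2, 222, 8, "2022-10-03"),
    (5, 2, 111, 8, "2022-10-04"),
    (6, 2, 222, 5, "2022-10-05"),
    (7, 3, 111, 8, "2022-10-04"),
    (8, 4, 333, 8, "2022-10-07"),
    (9, 4, 111, 3, "2022-10-09"),
]

# DENSE_RANK numbers the distinct work dates within each job (1 = first date
# worked), so several employees on the same date share one day number.
QUERY = """
SELECT job_id,
       SUM(CASE WHEN dn = 1 THEN hours_worked ELSE 0 END) AS day1_hours,
       SUM(CASE WHEN dn = 2 THEN hours_worked ELSE 0 END) AS day2_hours,
       SUM(CASE WHEN dn = 3 THEN hours_worked ELSE 0 END) AS day3_hours
FROM (
    SELECT job_id, hours_worked,
           DENSE_RANK() OVER (PARTITION BY job_id ORDER BY date_worked) AS dn
    FROM timesheet
) AS ranked
GROUP BY job_id
ORDER BY job_id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE timesheet (id INT, job_id INT, employee_id INT, "
    "hours_worked INT, date_worked TEXT)"
)
conn.executemany("INSERT INTO timesheet VALUES (?,?,?,?,?)", ENTRIES)
day_hours = conn.execute(QUERY).fetchall()
conn.close()

for row in day_hours:
    print(row)
```

Because of the `ELSE 0` branch, days with no work come back as 0 rather than null, which is the shape the question asked for.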

Get earliest value from a column with other aggregated columns in postgresql

I have a very simple stock ledger dataset.
date_and_time        store_id  product_id  batch  opening_qty  closing_qty  inward_qty  outward_qty
01-10-2021 14:20:00  56        a           1      5            1            0           4
01-10-2021 04:20:00  56        a           1      8            5            0           3
02-10-2021 15:30:00  56        a           1      9            2            1           8
03-10-2021 08:40:00  56        a           2      2            6            4           0
04-10-2021 06:50:00  56        a           2      8            4            0           4
Output I want:
select date, store_id,product_id, batch, first(opening_qty),last(closing_qty), sum(inward_qty),sum(outward_qty)
e.g.
date        store_id  product_id  batch  opening_qty  closing_qty  inward_qty  outward_qty
01-10-2021  56        a           1      8            1            0           7
I am writing a query using the FIRST_VALUE window function and have tried several others, but I am not able to get the output I want.
select
date,store_id,product_id,batch,
FIRST_VALUE(opening_total_qty)
OVER(
partition by date,store_id,product_id,batch
ORDER BY created_at
) as opening__qty,
sum(inward_qty) as inward_qty,sum(outward_qty) as outward_qty
from table
group by 1,2,3,4,opening_total_qty
Help please.
As your expected result is one row per group of rows with the same date, you need aggregate functions rather than window functions, which return as many rows as pass the WHERE clause. You can try this:
-- Assumes date_and_time is a timestamp column; your_table is a placeholder name.
SELECT date_trunc('day', date_and_time) AS date, store_id, product_id, batch
, (array_agg(opening_qty ORDER BY date_and_time ASC))[1] AS opening_qty -- earliest opening_qty of the day
, (array_agg(closing_qty ORDER BY date_and_time DESC))[1] AS closing_qty -- latest closing_qty of the day
, sum(inward_qty) AS inward_qty
, sum(outward_qty) AS outward_qty
FROM your_table
GROUP BY 1, 2, 3, 4
See the test result in dbfiddle.
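The first-and-last-per-group idea can also be tried without PostgreSQL. The sketch below swaps in a different technique from the array_agg approach, because SQLite has no array_agg: it numbers each group's rows in ascending and descending time order with ROW_NUMBER(), then picks the row numbered 1 from each direction inside an ordinary GROUP BY. It runs in an in-memory SQLite database via Python's sqlite3 module. The table name `stock_ledger` is an assumption, as is reading the question's dates as DD-MM-YYYY and storing them as ISO strings so that they sort correctly.

```python
import sqlite3

# Ledger rows from the question, with dates rewritten as ISO strings (assumed
# DD-MM-YYYY in the original):
# (date_and_time, store_id, product_id, batch,
#  opening_qty, closing_qty, inward_qty, outward_qty)
LEDGER = [
    ("2021-10-01 14:20:00", 56, "a", 1, 5, 1, 0, 4),
    ("2021-10-01 04:20:00", 56, "a", 1, 8, 5, 0, 3),
    ("2021-10-02 15:30:00", 56, "a", 1, 9, 2, 1, 8),
    ("2021-10-03 08:40:00", 56, "a", 2, 2, 6, 4, 0),
    ("2021-10-04 06:50:00", 56, "a", 2, 8, 4, 0, 4),
]

# rn_first = 1 marks the earliest row of each day/store/product/batch group,
# rn_last = 1 marks the latest one; the outer query aggregates per group.
QUERY = """
SELECT day, store_id, product_id, batch,
       MAX(CASE WHEN rn_first = 1 THEN opening_qty END) AS opening_qty,
       MAX(CASE WHEN rn_last = 1 THEN closing_qty END) AS closing_qty,
       SUM(inward_qty) AS inward_qty,
       SUM(outward_qty) AS outward_qty
FROM (
    SELECT date(date_and_time) AS day, store_id, product_id, batch,
           opening_qty, closing_qty, inward_qty, outward_qty,
           ROW_NUMBER() OVER (
               PARTITION BY date(date_and_time), store_id, product_id, batch
               ORDER BY date_and_time) AS rn_first,
           ROW_NUMBER() OVER (
               PARTITION BY date(date_and_time), store_id, product_id, batch
               ORDER BY date_and_time DESC) AS rn_last
    FROM stock_ledger
) AS numbered
GROUP BY day, store_id, product_id, batch
ORDER BY day
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_ledger (date_and_time TEXT, store_id INT, product_id TEXT, "
    "batch INT, opening_qty INT, closing_qty INT, inward_qty INT, outward_qty INT)"
)
conn.executemany("INSERT INTO stock_ledger VALUES (?,?,?,?,?,?,?,?)", LEDGER)
daily = conn.execute(QUERY).fetchall()
conn.close()

for row in daily:
    print(row)
```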

calculate avg(value) for last 10 records postgresql

I have a tricky task.
Let's assume we have a table "Racings" with the columns TRACK, CAR, and CIRCLE_TIME.
Here is an example of how the data could look:
id  track  car  circle_time
10  1      10   15
9   1      10   14
8   1      10   16
7   1      10   15
6   1      10   13
5   2      10   7
4   2      10   4
3   2      10   5
2   3      10   8
1   3      10   10
What I need is to add one more column, avg3_circle_time, which shows the average of the last 3 circle_time values for each track. Example:
id  track  car  circle_time  avg3_circle_time
10  1      10   15           15
9   1      10   14           15
8   1      10   16           14.6
7   1      10   15           null
6   1      10   13           null
5   2      10   7            5.3
4   2      10   4            null
3   2      10   5            null
2   3      10   8            null
1   3      10   10           null
I know how it could work in Oracle, where you could use something like rowid, but I don't know how to do it in PostgreSQL. I have a draft like .....avg(circle_time) OVER(PARTITION BY track,car.....) as avg3_circle_time..... Please help me solve this task.
You can use window functions to calculate moving averages:
SELECT track, id, car, circle_time, AVG(circle_time) OVER (
PARTITION BY track
ORDER BY id
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
)
FROM t
ORDER BY track, id
Depending on your definition of previous three, the window could be ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING.
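To make the difference between the two frames concrete, here is a small sketch that computes both side by side for track 1 from the question. It uses an in-memory SQLite database through Python's sqlite3 module rather than PostgreSQL; the column aliases `avg_incl` and `avg_prev` and the variable `frames` are names invented for this example.

```python
import sqlite3

# Track 1 rows from the question: (id, track, car, circle_time)
LAPS = [
    (10, 1, 10, 15),
    (9, 1, 10, 14),
    (8, 1, 10, 16),
    (7, 1, 10, 15),
    (6, 1, 10, 13),
]

# avg_incl: current row plus the two rows before it.
# avg_prev: the three rows before the current row, excluding the current row.
QUERY = """
SELECT id,
       AVG(circle_time) OVER (
           PARTITION BY track ORDER BY id
           ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS avg_incl,
       AVG(circle_time) OVER (
           PARTITION BY track ORDER BY id
           ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING) AS avg_prev
FROM racings
ORDER BY track, id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE racings (id INT, track INT, car INT, circle_time INT)")
conn.executemany("INSERT INTO racings VALUES (?,?,?,?)", LAPS)
frames = {row_id: (incl, prev) for row_id, incl, prev in conn.execute(QUERY)}
conn.close()

for row_id, (incl, prev) in frames.items():
    print(row_id, incl, prev)
```

Note that with either frame the first rows of a partition are averaged over fewer than three values (or over none, giving NULL for `avg_prev` on the very first row), so an extra condition is still needed if short windows should be blanked out.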
If you want values only when at least 3 circles are available:
select *
, case when lag(id, 2) over(partition by TRACK, CAR order by id) is not null then
avg(CIRCLE_TIME) over(partition by TRACK, CAR order by id rows between 2 preceding and current row) end a
from Racing
order by id desc;
Output
id track car circle_time a
10 1 10 15 15.0000000000000000
9 1 10 14 15.0000000000000000
8 1 10 16 14.6666666666666667
7 1 10 15 null
6 1 10 13 null
5 2 10 7 5.3333333333333333
4 2 10 4 null
3 2 10 5 null
2 3 10 8 null
1 3 10 10 null
Use LEAD(), then check whether either of the next 2 rows is NULL. If neither is, sum the three values to calculate the average.
-- PostgreSQL
SELECT *
, CASE WHEN next_circle_time IS NULL OR next_next_circle_time IS NULL
THEN NULL
ELSE ((t.circle_time + COALESCE(next_circle_time, 0) + COALESCE(next_next_circle_time, 0)) / 3 :: DECIMAL) :: DECIMAL(10, 1)
END avg_circle_time
FROM (SELECT *
, LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) next_circle_time
, LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) next_next_circle_time
FROM Racings) t
Another way is to use AVG():
SELECT *
, CASE WHEN LEAD(circle_time, 2) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
OR LEAD(circle_time, 1) OVER (PARTITION BY track ORDER BY id DESC) IS NULL
THEN NULL
ELSE AVG(circle_time) OVER (PARTITION BY track ORDER BY id DESC ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING)
END :: DECIMAL(10, 2) avg_circle_time
FROM Racings
Both queries can be checked at https://dbfiddle.uk/?rdbms=postgres_11&fiddle=f0cd868623725a1b92bf988cfb2deba3
Several of the posted answers end up repeating the window definition. You can avoid this with the window clause:
select *,
case when row_number() over(track_window) > 2
then trunc(avg(CIRCLE_TIME) over(track_window rows 2 preceding), 1)
end a
from Racing
window track_window as (partition by track order by id)
order by id desc
Note how, in this sample, track_window is defined once, then reused for both row_number and avg. In the latter case, the window clause is embellished with a frame as well (rows 2 preceding).
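As a quick check of the named-window approach against the question's full data set, the sketch below runs an equivalent query in an in-memory SQLite database via Python's sqlite3 module. It is a sketch under assumptions: SQLite (3.28 or later for reusing a named window with an added frame) stands in for PostgreSQL, the frame is spelled out with BETWEEN, the trunc() call is left out, and `avg3` is a made-up variable name.

```python
import sqlite3

# All rows from the question: (id, track, car, circle_time)
LAPS = [
    (10, 1, 10, 15), (9, 1, 10, 14), (8, 1, 10, 16), (7, 1, 10, 15),
    (6, 1, 10, 13), (5, 2, 10, 7), (4, 2, 10, 4), (3, 2, 10, 5),
    (2, 3, 10, 8), (1, 3, 10, 10),
]

# track_window is defined once in the WINDOW clause and reused twice:
# as-is for ROW_NUMBER, and extended with a frame for AVG.
QUERY = """
SELECT id,
       CASE WHEN ROW_NUMBER() OVER track_window > 2
            THEN AVG(circle_time) OVER (
                     track_window ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
       END AS avg3_circle_time
FROM racings
WINDOW track_window AS (PARTITION BY track ORDER BY id)
ORDER BY id DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE racings (id INT, track INT, car INT, circle_time INT)")
conn.executemany("INSERT INTO racings VALUES (?,?,?,?)", LAPS)
avg3 = dict(conn.execute(QUERY).fetchall())
conn.close()

for row_id, value in avg3.items():
    print(row_id, value)
```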

Resetting a Count in SQL

I have data that looks like this:
ID num_of_days
1 0
2 0
2 8
2 9
2 10
2 15
3 10
3 20
I want to add another column that increments in value only if the num_of_days column is divisible by 5 or the ID number increases so my end result would look like this:
ID num_of_days row_num
1 0 1
2 0 2
2 8 2
2 9 2
2 10 3
2 15 4
3 10 5
3 20 6
Any suggestions?
Edit #1:
num_of_days represents the number of days since the customer last saw a doctor between 1 visit and the next.
A customer can see a doctor 1 time or they can see a doctor multiple times.
If it's the first time visiting, the num_of_days = 0.
SQL tables represent unordered sets. Based on your question, I'll assume that the combination of id/num_of_days provides the ordering.
You can use a cumulative sum . . . with lag():
select t.*,
sum(case when prev_id = id and num_of_days % 5 <> 0
then 0 else 1
end) over (order by id, num_of_days)
from (select t.*,
lag(id) over (order by id, num_of_days) as prev_id
from t
) t;
Here is a db<>fiddle.
If you have a different ordering column, then just use that in the order by clauses.
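For anyone who wants to step through the lag-plus-cumulative-sum query locally, here is a minimal sketch using the question's data in an in-memory SQLite database via Python's sqlite3 module. The output alias `row_num` comes from the question's desired result; the variable name `numbered` is an assumption, and the ordering by (id, num_of_days) is the same assumption the answer makes.

```python
import sqlite3

# Rows from the question: (id, num_of_days)
VISITS = [(1, 0), (2, 0), (2, 8), (2, 9), (2, 10), (2, 15), (3, 10), (3, 20)]

# A row contributes 0 to the running total only when it has the same id as
# the previous row AND num_of_days is not divisible by 5; otherwise it adds 1.
# On the first row prev_id is NULL, so the condition is not true and it adds 1.
QUERY = """
SELECT id, num_of_days,
       SUM(CASE WHEN prev_id = id AND num_of_days % 5 <> 0
                THEN 0 ELSE 1
           END) OVER (ORDER BY id, num_of_days) AS row_num
FROM (
    SELECT id, num_of_days,
           LAG(id) OVER (ORDER BY id, num_of_days) AS prev_id
    FROM t
) AS with_prev
ORDER BY id, num_of_days
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, num_of_days INT)")
conn.executemany("INSERT INTO t VALUES (?,?)", VISITS)
numbered = conn.execute(QUERY).fetchall()
conn.close()

for row in numbered:
    print(row)
```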

sql best strategy to partition same values based on temporal sequence

I have data that looks like this, where there are multiple values for each ID that correspond to an ascending date variable:
ID LEVEL DATE
1 10 10/1/2000
1 10 11/20/2001
1 10 12/01/2001
1 30 02/15/2002
1 30 02/15/2002
1 20 05/17/2002
1 20 01/04/2003
1 30 07/20/2003
1 30 03/16/2004
1 30 04/15/2004
I want to acquire a count per each ID/LEVEL/DATE block that looks like this:
ID LEVEL COUNT
1 10 3
1 30 2
1 20 2
1 30 3
The problem is that if I use the count windows function and partition by level, it groups 30 together regardless of the temporal sequence. I want the count for level 30 both before and after 20 to be distinct. Does anyone know how to do that?
A standard gaps and islands solution using ROW_NUMBER(), if it's available on your particular DBMS...
WITH
ordered AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date) AS set_ordinal,
ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date) AS grp_ordinal
FROM
yourData
)
SELECT
id,
level,
set_ordinal - grp_ordinal,
MIN(date),
COUNT(*)
FROM
ordered
GROUP BY
id,
level,
set_ordinal - grp_ordinal
ORDER BY
id,
MIN(date)
Visualising the effect of the two row numbers...
ID LEVEL DATE set_ordinal grp_ordinal set-grp GROUP
-- ----- ---------- ----------- ----------- ------- --------
1 10 10/01/2000 1 1 0 1,10,0
1 10 11/20/2001 2 2 0 1,10,0
1 10 12/01/2001 3 3 0 1,10,0
1 30 02/15/2002 4 1 3 1,30,3
1 30 02/15/2002 5 2 3 1,30,3
1 20 05/17/2002 6 1 5 1,20,5
1 20 01/04/2003 7 2 5 1,20,5
1 30 07/20/2003 8 3 5 1,30,5
1 30 03/16/2004 9 4 5 1,30,5
1 30 04/15/2004 10 5 5 1,30,5
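The row-number difference trick can be exercised end to end with the sketch below, which loads the question's rows into an in-memory SQLite database via Python's sqlite3 module. Two things are assumptions added for this example: dates are stored as ISO strings so they sort chronologically, and a hypothetical `seq` column is used as a tie-breaker in both ORDER BY clauses. The sample data has two rows with the same id and date, and without a deterministic tie-breaker the two ROW_NUMBER() calls are not guaranteed to number those rows in the same order.

```python
import sqlite3

# Rows from the question with ISO dates: (seq, id, level, date)
# seq is a hypothetical insertion-order column used only to break date ties.
DATA = [
    (1, 1, 10, "2000-10-01"),
    (2, 1, 10, "2001-11-20"),
    (3, 1, 10, "2001-12-01"),
    (4, 1, 30, "2002-02-15"),
    (5, 1, 30, "2002-02-15"),
    (6, 1, 20, "2002-05-17"),
    (7, 1, 20, "2003-01-04"),
    (8, 1, 30, "2003-07-20"),
    (9, 1, 30, "2004-03-16"),
    (10, 1, 30, "2004-04-15"),
]

# Within one unbroken run of the same level, set_ordinal and grp_ordinal grow
# together, so their difference stays constant and identifies the island.
QUERY = """
WITH ordered AS (
    SELECT id, level, date,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY date, seq) AS set_ordinal,
           ROW_NUMBER() OVER (PARTITION BY id, level ORDER BY date, seq) AS grp_ordinal
    FROM your_data
)
SELECT id, level, COUNT(*) AS cnt
FROM ordered
GROUP BY id, level, set_ordinal - grp_ordinal
ORDER BY id, MIN(date)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_data (seq INT, id INT, level INT, date TEXT)")
conn.executemany("INSERT INTO your_data VALUES (?,?,?,?)", DATA)
islands = conn.execute(QUERY).fetchall()
conn.close()

for row in islands:
    print(row)
```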