PostgreSQL query group by two "parameters"

I've been trying to figure out the following PostgreSQL query with no success for two days now.
Let's say I have the following table:
| date       | value |
|------------|-------|
| 2018-05-11 |  0.20 |
| 2018-05-11 | -0.12 |
| 2018-05-11 |  0.15 |
| 2018-05-10 | -1.20 |
| 2018-05-10 | -0.70 |
| 2018-05-10 | -0.16 |
| 2018-05-10 |  0.07 |
And I need to find out the query to count positive and negative values per day:
| date       | positives | negatives |
|------------|-----------|-----------|
| 2018-05-11 |         2 |         1 |
| 2018-05-10 |         1 |         3 |
I've been able to figure out the query to extract only positives or negatives, but not both at the same time:
SELECT to_char(table.date, 'DD/MM') AS date,
       COUNT(*) AS negative
FROM table
WHERE table.date >= DATE(NOW() - '20 days'::INTERVAL)
  AND value < 0
GROUP BY to_char(date, 'DD/MM'), table.date
ORDER BY table.date DESC;
Can someone please assist? This is driving me mad. Thank you.

Use a FILTER clause with the aggregate function.
SELECT to_char(table.date, 'DD/MM') AS date,
       COUNT(*) FILTER (WHERE value < 0) AS negative,
       COUNT(*) FILTER (WHERE value > 0) AS positive
FROM table
WHERE table.date >= DATE(NOW() - '20 days'::INTERVAL)
GROUP BY table.date
ORDER BY table.date DESC;
(Grouping by table.date itself keeps the ORDER BY valid; to_char(table.date, ...) is functionally dependent on it.)

I would simply do:
select date_trunc('day', t.date) as dte,
       sum( (value < 0)::int ) as negatives,
       sum( (value > 0)::int ) as positives
from t
where t.date >= current_date - interval '20 days'
group by date_trunc('day', t.date)
order by dte desc;
Notes:
I prefer using date_trunc() to casting to a string for removing the time component.
You don't need to use now() and convert to a date. You can just use current_date.
Converting a string to an interval seems awkward, when you can specify an interval using the interval keyword.
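To see the counting logic end to end, here is a sketch using Python's sqlite3 module. SQLite stands in for Postgres here, and the Postgres-specific FILTER clause is replaced by the portable SUM(CASE ...) form, which yields the same counts:

```python
import sqlite3

# In-memory stand-in for the question's table (names mirror the question).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, value REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("2018-05-11", 0.20), ("2018-05-11", -0.12), ("2018-05-11", 0.15),
     ("2018-05-10", -1.20), ("2018-05-10", -0.70), ("2018-05-10", -0.16),
     ("2018-05-10", 0.07)],
)
# SUM(CASE ...) counts rows matching each condition, like COUNT(*) FILTER (...).
rows = conn.execute("""
    SELECT date,
           SUM(CASE WHEN value > 0 THEN 1 ELSE 0 END) AS positives,
           SUM(CASE WHEN value < 0 THEN 1 ELSE 0 END) AS negatives
    FROM t
    GROUP BY date
    ORDER BY date DESC
""").fetchall()
print(rows)  # [('2018-05-11', 2, 1), ('2018-05-10', 1, 3)]
```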

Related

Pgsql- How to filter report days with pgsql?

Let's say I have a table Transaction with the following data:
Transaction
| id | user_id | amount | created_at |
|----|---------|--------|------------|
| 1  | 1       | 100    | 2021-09-11 |
| 2  | 1       | 1000   | 2021-09-12 |
| 3  | 1       | -100   | 2021-09-12 |
| 4  | 2       | 200    | 2021-10-13 |
| 5  | 2       | 3000   | 2021-10-20 |
| 6  | 3       | -200   | 2021-10-21 |
I want to filter this data for the last 4, 15, or 28 days.
Note: if the user selects the 4-day option, the query should cover the last 4 days.
I want this data:
- total commission (sum of all transaction amounts * 5%)
- total top-up: the sum of positive amounts
- total debit: the sum of negative amounts
Please help me out, and sorry for the basic question!
Expected result, if the user filters on the last 4 days and the current date is 2021-09-16:
- TotalCommission: (1000 - 100) * 5%
- TotalTopUp: 1000
- TotalDebit: -100
I suspect you want:
SELECT SUM(amount) * 0.05 AS TotalCommission,
       SUM(amount) FILTER (WHERE amount > 0) AS TotalUp,
       SUM(amount) FILTER (WHERE amount < 0) AS TotalDown
FROM t
WHERE created_at >= CURRENT_DATE - 4 * INTERVAL '1 DAY';
This assumes that there are no future created_at (which seems like a reasonable assumption). You can replace the 4 with whatever value you want.
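The arithmetic can be checked in miniature. A sketch using Python's sqlite3 module (SQLite stands in for Postgres: FILTER is replaced by the portable SUM(CASE ...) form, the current date is hard-coded to the question's 2021-09-16, and an upper bound is added because the sample data contains rows after that date):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (id INTEGER, user_id INTEGER, amount REAL, created_at TEXT)")
conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?)", [
    (1, 1, 100, "2021-09-11"), (2, 1, 1000, "2021-09-12"),
    (3, 1, -100, "2021-09-12"), (4, 2, 200, "2021-10-13"),
    (5, 2, 3000, "2021-10-20"), (6, 3, -200, "2021-10-21"),
])
row = conn.execute("""
    SELECT SUM(amount) * 0.05,                          -- commission
           SUM(CASE WHEN amount > 0 THEN amount END),   -- top-up
           SUM(CASE WHEN amount < 0 THEN amount END)    -- debit
    FROM txn
    WHERE created_at BETWEEN date('2021-09-16', '-4 days') AND '2021-09-16'
""").fetchone()
print(row)  # (45.0, 1000.0, -100.0)
```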
Take a look at the aggregate functions sum, max and min. Last four days should look like this:
SELECT
sum(amount)*.05 AS TotalCommission,
max(amount) AS TotalUp,
min(amount) AS TotalDebut
FROM t
WHERE created_at BETWEEN CURRENT_DATE-4 AND CURRENT_DATE;
Demo: db<>fiddle
Your description indicates specifying the number of days to process, and your expected results indicate you are looking for results by user_id (perhaps not, as user 1 falls into the range). Perhaps the best option would be to wrap the query into a SQL function. Then, as all your data is well into the future, you would need to parameterize the end date as well. So the result becomes:
create or replace
function Commissions( user_id_in integer default null
, days_before_in integer default 0
, end_date_in date default current_date
)
returns table( user_id integer
, totalcommission numeric
, totalup numeric
, totaldown numeric
)
language sql
as $$
select user_id
, sum(amount) * 0.05
, sum(amount) filter (where amount > 0)
, sum(amount) filter (where amount < 0)
from transaction
where (user_id = user_id_in or user_id_in is null)
and created_at <@ daterange( (end_date_in - days_before_in * interval '1 day')::date
, end_date_in
, '[]'::text -- indicates inclusive of both dates
)
group by user_id;
$$;
See demo here. You may just want to play around with the parameters and see the results.

Calculating working minutes for Normal and Night Shift

I am writing a query to fetch the working minutes for employees. The problem I have is the night shift. I know that I need to subtract the "ShiftStartMinutesFromMidnight", but I can't find the right logic.
NOTE: I can't change the database; I can only use the data from it.
Let's say I have these records.
+----+-------------------------+----------+
| ID | EventTime               | ReaderNo |
+----+-------------------------+----------+
| 1  | 2019-12-04 11:28:46.000 | In       |
| 1  | 2019-12-04 12:36:17.000 | Out      |
| 1  | 2019-12-04 12:39:23.000 | In       |
| 1  | 2019-12-04 12:51:21.000 | Out      |
| 1  | 2019-12-05 07:37:49.000 | In       |
| 1  | 2019-12-05 08:01:22.000 | Out      |
| 2  | 2019-12-04 22:11:46.000 | In       |
| 2  | 2019-12-04 23:06:17.000 | Out      |
| 2  | 2019-12-04 23:34:23.000 | In       |
| 2  | 2019-12-05 01:32:21.000 | Out      |
| 2  | 2019-12-05 01:38:49.000 | In       |
| 2  | 2019-12-05 06:32:22.000 | Out      |
+----+-------------------------+----------+
WITH CT AS (SELECT
EIn.PSNID, EIn.PSNNAME
,CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
,EIn.EventTime AS LogIn
,CA_Out.EventTime AS LogOut
,DATEDIFF(minute, EIn.EventTime, CA_Out.EventTime) AS WorkingMinutes
FROM
VIEW_EVENT_EMPLOYEE AS EIn
CROSS APPLY
(
SELECT TOP(1) EOut.EventTime
FROM VIEW_EVENT_EMPLOYEE AS EOut
WHERE
EOut.PSNID = EIn.PSNID
AND EOut.ReaderNo = 'Out'
AND EOut.EventTime >= EIn.EventTime
ORDER BY EOut.EventTime
) AS CA_Out
WHERE
EIn.ReaderNo = 'In'
)
SELECT
PSNID
,PSNNAME
,dt
,LogIn
,LogOut
,WorkingMinutes
FROM CT
WHERE dt BETWEEN '2019-11-29' AND '2019-12-05'
ORDER BY LogIn
;
OUTPUT FROM QUERY
+----+------------+-------------------------+-------------------------+----------------+
| ID | date       | In                      | Out                     | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 1  | 2019-12-04 | 2019-12-04 11:28:46.000 | 2019-12-04 12:36:17.000 | 68             |
| 1  | 2019-12-04 | 2019-12-04 12:39:23.000 | 2019-12-04 12:51:21.000 | 12             |
| 1  | 2019-12-05 | 2019-12-05 07:37:49.000 | 2019-12-05 08:01:22.000 | 24             |
+----+------------+-------------------------+-------------------------+----------------+
I was thinking of something like this: when the Out is between 06:25 and 06:40. But I also need to check whether the employee has an In on the previous day between 21:50 and 22:30. I need that second condition because an employee from the first shift may clock Out at, for example, 06:30.
(1310 is the ShiftStartMinutesFromMidnight.)
Line 3 of the query:
CAST(DATEADD(minute, -0, EIn.EventTime) AS date) AS dt
Updating line 3 with this code:
CASE
WHEN CAST(CA_Out.LogDate AS time) BETWEEN '06:25:00' AND '06:40:00'
AND CAST(EIn.LogDate AS time) BETWEEN '21:50:00' AND '22:30:00' THEN CAST(DATEADD(minute, -1310, EIn.LogDate) AS date)
ELSE CAST(DATEADD(minute, -0, EIn.LogDate) AS date)
END as dt
Expected Output
+----+------------+-------------------------+-------------------------+----------------+
| ID | date       | In                      | Out                     | WorkingMinutes |
+----+------------+-------------------------+-------------------------+----------------+
| 2  | 2019-12-04 | 2019-12-04 22:11:46.000 | 2019-12-04 23:06:17.000 | 55             |
| 2  | 2019-12-04 | 2019-12-04 23:34:23.000 | 2019-12-05 01:32:21.000 | 118            |
| 2  | 2019-12-04 | 2019-12-05 01:38:49.000 | 2019-12-05 06:32:22.000 | 294            |
+----+------------+-------------------------+-------------------------+----------------+
Assuming that total minutes per separate date is enough:
WITH
/* enumerate pairs */
cte1 AS ( SELECT *,
COUNT(CASE WHEN ReaderNo = 'In' THEN 1 END)
OVER (PARTITION BY ID
ORDER BY EventTime) pair
FROM test ),
/* divide by pairs */
cte2 AS ( SELECT ID, MIN(EventTime) starttime, MAX(EventTime) endtime
FROM cte1
GROUP BY ID, pair ),
/* get dates range */
cte3 AS ( SELECT CAST(MIN(EventTime) AS DATE) minDate,
CAST(MAX(EventTime) AS DATE) maxDate
FROM test),
/* generate dates list */
cte4 AS ( SELECT minDate theDate
FROM cte3
UNION ALL
SELECT DATEADD(dd, 1, theDate)
FROM cte3, cte4
WHERE theDate < maxDate ),
/* add overlapped dates to pairs */
cte5 AS ( SELECT ID, starttime, endtime, theDate
FROM cte2, cte4
WHERE theDate BETWEEN CAST(starttime AS DATE) AND CAST(endtime AS DATE) ),
/* adjust borders */
cte6 AS ( SELECT ID,
CASE WHEN starttime < theDate
THEN theDate
ELSE starttime
END starttime,
CASE WHEN CAST(endtime AS DATE) > theDate
THEN DATEADD(dd, 1, theDate)
ELSE endtime
END endtime,
theDate
FROM cte5 )
/* calculate total minutes per date */
SELECT ID,
theDate,
SUM(DATEDIFF(mi, starttime, endtime)) workingminutes
FROM cte6
GROUP BY ID,
theDate
ORDER BY 1,2
fiddle
The solution is deliberately detailed, step by step, so that you can easily follow the logic.
You may freely combine some CTEs into one. You may also use the next-to-last cte5 combined with cte2 if you need the output exactly as shown.
The solution assumes that no records are missing from the source data (each 'In' matches exactly one 'Out' and vice versa, and there are no adjacent or overlapping pairs).
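For readers who want to trace the logic outside the database, the same pipeline can be sketched in Python (a sketch only; the event list is the question's ID 2 sample, and, like T-SQL's DATEDIFF(minute, ...), minutes are counted as minute-boundary crossings, hence the second-truncation):

```python
from collections import defaultdict
from datetime import datetime, timedelta

events = [
    (2, "2019-12-04 22:11:46", "In"),  (2, "2019-12-04 23:06:17", "Out"),
    (2, "2019-12-04 23:34:23", "In"),  (2, "2019-12-05 01:32:21", "Out"),
    (2, "2019-12-05 01:38:49", "In"),  (2, "2019-12-05 06:32:22", "Out"),
]

# Pair each 'In' with the next 'Out' (events assumed well-formed: In, Out, ...).
pairs = []
for emp, ts, kind in events:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    if kind == "In":
        pairs.append([emp, t, None])
    else:
        pairs[-1][2] = t

def trunc(d):
    """Truncate to the minute, so differences count minute-boundary crossings."""
    return d.replace(second=0, microsecond=0)

# Split pairs that cross midnight at the date boundary; sum minutes per (ID, date).
minutes = defaultdict(int)
for emp, start, end in pairs:
    cur = start
    while True:
        midnight = datetime.combine(cur.date(), datetime.min.time()) + timedelta(days=1)
        seg_end = min(end, midnight)
        minutes[(emp, cur.date().isoformat())] += int((trunc(seg_end) - trunc(cur)).total_seconds() // 60)
        if seg_end == end:
            break
        cur = seg_end

print(dict(minutes))  # {(2, '2019-12-04'): 81, (2, '2019-12-05'): 386}
```

Note that, as in the CTE answer, minutes are attributed to the calendar date they fall on (81 = 55 + 26 for 2019-12-04), not to the shift's start date as in the question's expected output.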
I don't know where you got stuck, but here is how I do it.
Night shift is 20:00 - 05:00, which within a single day means 00:00 - 05:00 and 22:00 - 24:00; day shift is 05:00 - 22:00.
To make the overlap checking easier, convert all dates to Unix timestamps; then you don't have to split time intervals as shown above.
Generate a map of every work period for the fetch period (date_from to date_till), making sure to add holiday and pre-holiday exceptions where the periods differ. Something like this (the Unix values are only illustrative):
unix_from_tim, unix_till_tim, shift_type
1580680800, 1580680800, 1 => example 02-02-2020 22:00:00, 03-02-2020 05:00:00, 1
1580680800, 1580680800, 0 => example 03-02-2020 05:00:00, 03-02-2020 22:00:00, 0
1580680800, 1580680800, 1 => example 03-02-2020 22:00:00, 04-02-2020 05:00:00, 1
...
Make sure you don't count overlapping minutes twice at a period's start/end.
The worker's presence is one row with unix_from_tim, unix_till_tim:
1580680800, 1580680800 => something like 02-02-2020 16:30:00, 03-02-2020 07:10:00
When you check for overlap, you can get the overlapping length like this:
MIN(work_period.till, worker_period.till) - MAX(work_period.from, worker_period.from)
Examples in simple numbers:
work_period 3 - 7
worker_period 5 - 12
MIN(7,12) - MAX(3,5) = 7 - 5 = 2 // partial overlap
work_period 3 - 7
worker_period 8 - 12
MIN(7,12) - MAX(3,8) = 7 - 8 = -1 // if negative, no overlap!
work_period 3 - 13
worker_period 8 - 12
MIN(13,12) - MAX(3,8) = 12 - 8 = 4 // worker period fully overlapped!
Then check each worker period against all of the generated work intervals that overlap it.
Maybe someone can write a SELECT that avoids generating the work-shift map, but that is not an easy task once you add holidays, transferred days, reduced-hour days, etc.
Hope this helps.
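The MIN/MAX rule above is easy to wrap in a helper. A sketch in Python (the function name and the zero-clamp are mine; the clamp encodes the "if negative, no overlap" case):

```python
def overlap(a_from, a_till, b_from, b_till):
    """Overlap length of intervals [a_from, a_till] and [b_from, b_till];
    max(0, ...) clamps the no-overlap (negative) case to zero."""
    return max(0, min(a_till, b_till) - max(a_from, b_from))

# The three worked examples from above:
print(overlap(3, 7, 5, 12))   # 2 (partial overlap)
print(overlap(3, 7, 8, 12))   # 0 (raw value -1: no overlap)
print(overlap(3, 13, 8, 12))  # 4 (worker period fully covered)
```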

How can I aggregate values based on an arbitrary monthly cycle date range in SQL?

Given a table as such:
# SELECT * FROM payments ORDER BY payment_date DESC;
id | payment_type_id | payment_date | amount
----+-----------------+--------------+---------
4 | 1 | 2019-11-18 | 300.00
3 | 1 | 2019-11-17 | 1000.00
2 | 1 | 2019-11-16 | 250.00
1 | 1 | 2019-11-15 | 300.00
14 | 1 | 2019-10-18 | 130.00
13 | 1 | 2019-10-18 | 100.00
15 | 1 | 2019-09-18 | 1300.00
16 | 1 | 2019-09-17 | 1300.00
17 | 1 | 2019-09-01 | 400.00
18 | 1 | 2019-08-25 | 400.00
(10 rows)
How can I SUM the amount column based on an arbitrary date range, not simply a date truncation?
Taking the example of a date range beginning on the 15th of a month, and ending on the 14th of the following month, the output I would expect to see is:
payment_type_id | payment_date | amount
-----------------+--------------+---------
1 | 2019-11-15 | 1850.00
1 | 2019-10-15 | 230.00
1 | 2019-09-15 | 2600.00
1 | 2019-08-15 | 800.00
Can this be done in SQL, or is this something that's better handled in code? I would traditionally do this in code, but I'm looking to extend my knowledge of SQL (which, at this stage, isn't much!).
Demo: db<>fiddle
You can use a combination of the CASE clause and the date_trunc() function:
SELECT
payment_type_id,
CASE
WHEN date_part('day', payment_date) < 15 THEN
date_trunc('month', payment_date) + interval '-1month 14 days'
ELSE date_trunc('month', payment_date) + interval '14 days'
END AS payment_date,
SUM(amount) AS amount
FROM
payments
GROUP BY 1,2
date_part('day', ...) gives out the current day of month
The CASE clause is for dividing the dates before the 15th of month and after.
The date_trunc('month', ...) converts all dates in a month to the first of this month
So, if date is before the 15th of the current month, it should be grouped to the 15th of the previous month (this is what +interval '-1month 14 days' calculates: +14, because the date_trunc() truncates to the 1st of month: 1 + 14 = 15). Otherwise it is group to the 15th of the current month.
After calculating these payment_days, you can use them for simple grouping.
I would simply subtract 14 days, truncate the month, and add 14 days back:
select payment_type_id,
date_trunc('month', payment_date - interval '14 day') + interval '14 day' as month_15,
sum(amount)
from payments
group by payment_type_id, month_15
order by payment_type_id, month_15;
No conditional logic is actually needed for this.
Here is a db<>fiddle.
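The subtract/truncate/add trick is plain date arithmetic, so it can be sanity-checked outside SQL. A sketch in Python (cycle_start and its cycle_day parameter are hypothetical names generalizing the 15th):

```python
from datetime import date, timedelta

def cycle_start(d: date, cycle_day: int = 15) -> date:
    """Start of the billing cycle containing d: shift back by
    (cycle_day - 1) days, truncate to the 1st, shift forward again."""
    shifted = d - timedelta(days=cycle_day - 1)
    return shifted.replace(day=1) + timedelta(days=cycle_day - 1)

print(cycle_start(date(2019, 11, 18)))  # 2019-11-15
print(cycle_start(date(2019, 11, 14)))  # 2019-10-15
print(cycle_start(date(2019, 9, 1)))    # 2019-08-15
```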
You can use the generate_series() function and make an inner join comparing month and year, like this:
SELECT specific_date_on_month, SUM(amount)
FROM (SELECT generate_series('2015-01-15'::date, '2015-12-15'::date, '1 month'::interval) AS specific_date_on_month) AS series
INNER JOIN payments
ON (TO_CHAR(payment_date, 'yyyymm') = TO_CHAR(specific_date_on_month, 'yyyymm'))
GROUP BY specific_date_on_month;
The generate_series(<begin>, <end>, <interval>) function generates a series from begin to end at the given interval. Note that joining on 'yyyymm' groups by calendar month, not by the 15th-to-14th cycle asked about.

Poor Performance on Outer Join Timestamp Range Comparisons (Gap-Filling Time Series Data)

I have some time-series data (1.5 million rows currently). I am filling in some time gaps in my query using the generate_series method.
Imagine the following data that has a gap between 10 AM and 1 PM....
+-------+----------+-------+
| time | category | value |
+-------+----------+-------+
| 8 AM | 1 | 100 |
| 9 AM | 1 | 200 |
| 10 AM | 1 | 300 |
| 1 PM | 1 | 100 |
| 2 PM | 1 | 500 |
+-------+----------+-------+
I need my query results to fill in any gaps with the last known value for the series. Such as the following....
+-------+----------+-------+
| time | category | value |
+-------+----------+-------+
| 8 AM | 1 | 100 |
| 9 AM | 1 | 200 |
| 10 AM | 1 | 300 |
| 11 AM | 1 | 300 | (Gap filled with last known value)
| 12 PM | 1 | 300 | (Gap filled with last known value)
| 1 PM | 1 | 100 |
| 2 PM | 1 | 500 |
+-------+----------+-------+
I have a query that does this, but it's really slow (~5 seconds in the simplified example below). I'm hoping someone can show me a better/faster way.
In my case, my data is by the minute. So I fill in the gaps on 1-minute increments. I use the lead/window function to determine what the NEXT timestamp is for each row so I know which generated gap fillers will use that value.
Please see example below....
Generate test data
(create data for every minute for a year, with a 1 hour gap two hours ago)
create table mydata as
with a as
(
select
date_time
from
generate_series(date_trunc('minute', now())::timestamp - '1 year':: interval, date_trunc('minute', now()::timestamp - '2 hours'::interval), interval '1 minute') as date_time
union
select
date_time
from
generate_series(date_trunc('minute', now())::timestamp - '1 hour':: interval, date_trunc('minute', now()::timestamp ), interval '1 minute') as date_time
),
b as
(
select category from generate_series(1,10,1) as category
)
select
a.*,
b.*,
round(random() * 100)::integer as value
from
a
cross join
b
;
create index myindex1 on mydata (category, date_time);
create index myindex2 on mydata (date_time);
Query the data to get all category=5 data for the last 5 days (with gaps filled)
with a as
(
select
mydata.*,
lead(mydata.date_time) over (PARTITION BY category ORDER BY date_time asc) as next_date_time
from
mydata
where
category = 5
and
date_time between now() - '5 days'::interval and now()
),
b as
(
SELECT generated_time::timestamp without time zone FROM generate_series(date_trunc('minute', now()) - '5 days'::interval, date_trunc('minute', now()), interval '1 minute') as generated_time
)
select
b.generated_time as date_time,
a.category,
a.value
from
b
left join
a
on
b.generated_time >= a.date_time and b.generated_time < a.next_date_time
order by
b.generated_time desc
;
This query functions perfectly. Sample results...
+---------------------+----------+-------+
| date_time | category | value |
+---------------------+----------+-------+
| 2018-07-06 12:17:00 | 5 | 13 |
| 2018-07-06 12:16:00 | 5 | 17 | (gap filled)
| 2018-07-06 12:15:00 | 5 | 17 | (gap filled)
| ... | ... | ... | (gap filled)
| 2018-07-06 11:18:00 | 5 | 17 | (gap filled)
| 2018-07-06 11:17:00 | 5 | 17 |
| 2018-07-06 11:16:00 | 5 | 62 |
+---------------------+----------+-------+
However, this part kills performance...
b.generated_time >= a.date_time and b.generated_time < a.next_date_time
If I just do something like..
b.generated_time = a.next_date_time
Then it's very fast, but of course the results are incorrect. It really doesn't like the AND of two inequality (greater-than/less-than) comparisons. I thought maybe it was because next_date_time is generated on the fly and not indexed, but even after materializing that data into an indexed table first, performance was roughly the same.
I added the timescaledb extension tag to this post in case they have some built-in functionality to assist with this.
The 'explain' results
Sort (cost=268537.46..270431.35 rows=757556 width=16)
Sort Key: b.generated_time DESC
CTE a
-> WindowAgg (cost=0.44..11057.66 rows=6818 width=24)
-> Index Scan using myindex1 on mydata (cost=0.44..10938.35 rows=6818 width=16)
Index Cond: ((category = 5) AND (date_time >= (now() - '5 days'::interval)) AND (date_time <= now()))
CTE b
-> Function Scan on generate_series generated_time (cost=0.02..12.52 rows=1000 width=8)
-> Nested Loop Left Join (cost=0.00..170538.18 rows=757556 width=16)
Join Filter: ((b.generated_time >= a.date_time) AND (b.generated_time < a.next_date_time))
-> CTE Scan on b (cost=0.00..20.00 rows=1000 width=8)
-> CTE Scan on a (cost=0.00..136.36 rows=6818 width=24)
I'm using Postgres 10.4. Any suggestions on how to make this faster?
Thanks!!
So, I'm going to partially answer my own question. I did find a way to accomplish what I want that performs MUCH better (sub-second). However, it is not as intuitive/readable, and I would really like to know how to make my first method faster, just for the sake of knowledge: I really want to know what I was doing wrong.
Anyway, the following method seems to work. I calculate the number of minutes between each row and the next, then generate that many additional rows with the same data at 1-minute increments.
I'll give this a few days. If nobody comes up with a fix (or a better way) for the first method, then I'll mark this as the accepted answer.
select
generate_series(date_time, date_time + (((EXTRACT(EPOCH FROM (lead(mydata.date_time) over w - date_time)) / 60)-1) || 'minutes')::interval, interval '1 minute') as date_time,
category,
value
from
mydata
where
category = 5
and
date_time between now() - '5 days'::interval and now()
window w as (PARTITION BY category ORDER BY date_time asc)
order by
mydata.date_time desc
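For comparison, the forward-fill itself (independent of either query) can be sketched in Python with a binary search for the last known value; this mirrors what the range join computes, using the question's hourly example data:

```python
import bisect
from datetime import datetime, timedelta

# Known points for one category, with a gap between 10 AM and 1 PM
# (the question's example; the real query works on 1-minute steps).
points = [
    (datetime(2018, 7, 6, 8, 0), 100),
    (datetime(2018, 7, 6, 9, 0), 200),
    (datetime(2018, 7, 6, 10, 0), 300),
    (datetime(2018, 7, 6, 13, 0), 100),
    (datetime(2018, 7, 6, 14, 0), 500),
]
times = [t for t, _ in points]

def value_at(ts):
    """Forward fill: the last known value at or before ts."""
    i = bisect.bisect_right(times, ts) - 1
    return points[i][1] if i >= 0 else None

grid = [times[0] + timedelta(hours=h) for h in range(7)]  # hourly grid, 8 AM - 2 PM
print([(t.hour, value_at(t)) for t in grid])
# [(8, 100), (9, 200), (10, 300), (11, 300), (12, 300), (13, 100), (14, 500)]
```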

Use Date difference in a where clause with SQL

From a table named 'Subscriptions', I want to list all items that expire between 0 and 3 days from the current day.
$Today = date('Y-m-d');
|--------|----------|------------|
| SUB_Id | SUB_Name | SUB_End |
|--------|----------|------------|
| 1 | Banana | 2017-12-01 |
| 2 | Apple | 2017-11-03 |
| 3 | Pear | 2017-11-03 |
|--------|----------|------------|
I should get the last two rows, since for them SUB_End - $Today is <= 3 days.
What I tried:
select * from Subscriptions
where DATEDIFF(SUB_End , $today) <= 3;
I would do this entirely in SQL:
select s.*
from Subscriptions s
where sub_end >= curdate() and
sub_end <= curdate() + interval 3 day;
You can use the following query; this one is for SQL Server only:
select * from Subscriptions
where DATEDIFF(day, SUB_End , getdate()) <= 3;
You can try this.
Edit: Oracle-based solution.
select s.*
from Subscriptions s
where s.sub_end BETWEEN TRUNC(SYSDATE - 3) AND TRUNC(SYSDATE)
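The requirement in miniature: a sketch in Python with the question's rows, and $Today hard-coded to 2017-11-01 (an assumed value, since the question doesn't state the current date):

```python
from datetime import date, timedelta

subs = [
    (1, "Banana", date(2017, 12, 1)),
    (2, "Apple",  date(2017, 11, 3)),
    (3, "Pear",   date(2017, 11, 3)),
]
today = date(2017, 11, 1)  # assumed stand-in for the "$Today" of the example

# Items expiring between 0 and 3 days from today, inclusive on both ends:
expiring = [name for _, name, sub_end in subs
            if today <= sub_end <= today + timedelta(days=3)]
print(expiring)  # ['Apple', 'Pear']
```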