Select min and max values while grouped by a third column - sql

I have a table with campaign data and need to get a list of 'spend_perc' min and max values while grouping by the client_id AND timing of these campaigns.
sample data being:
camp_id | client_id | start_date | end_date | spend_perc
7257 | 35224 | 2017-01-16 | 2017-02-11 | 100.05
7284 | 35224 | 2017-01-16 | 2017-02-11 | 101.08
7308 | 35224 | 2017-01-16 | 2017-02-11 | 101.3
7309 | 35224 | 2017-01-16 | 2017-02-11 | 5.8
6643 | 35224 | 2017-02-08 | 2017-02-24 | 79.38
6645 | 35224 | 2017-02-08 | 2017-02-24 | 6.84
6648 | 35224 | 2017-02-08 | 2017-02-24 | 100.01
6649 | 78554 | 2017-02-09 | 2017-02-27 | 2.5
6650 | 78554 | 2017-02-09 | 2017-02-27 | 18.5
6651 | 78554 | 2017-02-09 | 2017-02-27 | 98.5
what I'm trying to get is the rows with the min and max 'spend_perc' values for each client_id AND campaign timing (identical start_date/end_date):
camp_id | client_id | start_date | end_date | spend_perc
7308 | 35224 | 2017-01-16 | 2017-02-11 | 101.3
7309 | 35224 | 2017-01-16 | 2017-02-11 | 5.8
6645 | 35224 | 2017-02-08 | 2017-02-24 | 6.84
6648 | 35224 | 2017-02-08 | 2017-02-24 | 100.01
6649 | 78554 | 2017-02-09 | 2017-02-27 | 2.5
6651 | 78554 | 2017-02-09 | 2017-02-27 | 98.5

Something like this?
with a as
(
    select camp_id, client_id, start_date, end_date, spend_perc,
           max(spend_perc) over (partition by client_id, start_date, end_date) as max_sp,
           min(spend_perc) over (partition by client_id, start_date, end_date) as min_sp
    from tn
)
select camp_id, client_id, start_date, end_date,
       case when spend_perc = max_sp then max_sp
            when spend_perc = min_sp then min_sp
       end as spend_perc
from a
order by camp_id, client_id, start_date, end_date, spend_perc

I think you will want to get rid of the camp_id field because that will be meaningless in this case. So you want something like:
SELECT client_id, start_date, end_date,
min(spend_perc) as min_spend_perc, max(spend_perc) as max_spend_perc
FROM mytable
GROUP BY client_id, start_date, end_date;
Group by the criteria you want, and select the min and max as columns for each unique combination of those values (i.e. per row).
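For illustration, here is that grouped query run on SQLite against the question's sample data (the table name tn is borrowed from the question's own attempt):

```python
import sqlite3

# In-memory table mirroring the sample data above; SQLite stands in
# for whatever engine the question is using.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tn (
    camp_id INT, client_id INT, start_date TEXT, end_date TEXT, spend_perc REAL)""")
conn.executemany("INSERT INTO tn VALUES (?,?,?,?,?)", [
    (7257, 35224, '2017-01-16', '2017-02-11', 100.05),
    (7284, 35224, '2017-01-16', '2017-02-11', 101.08),
    (7308, 35224, '2017-01-16', '2017-02-11', 101.3),
    (7309, 35224, '2017-01-16', '2017-02-11', 5.8),
    (6643, 35224, '2017-02-08', '2017-02-24', 79.38),
    (6645, 35224, '2017-02-08', '2017-02-24', 6.84),
    (6648, 35224, '2017-02-08', '2017-02-24', 100.01),
    (6649, 78554, '2017-02-09', '2017-02-27', 2.5),
    (6650, 78554, '2017-02-09', '2017-02-27', 18.5),
    (6651, 78554, '2017-02-09', '2017-02-27', 98.5),
])

# One row per (client_id, start_date, end_date) with the min and max spend_perc.
result = conn.execute("""
    SELECT client_id, start_date, end_date,
           MIN(spend_perc) AS min_spend_perc,
           MAX(spend_perc) AS max_spend_perc
    FROM tn
    GROUP BY client_id, start_date, end_date
    ORDER BY client_id, start_date
""").fetchall()
for r in result:
    print(r)
```

Note that this collapses each group to a single row, so camp_id is gone; if you need the original rows back (as in the desired output), you would join these aggregates back to the table or use window functions.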

Related

Date compare between two datetime columns and a constant

Let's say I have the following table:
+------+-------------------------+-------------------------+
| ID | CreatedDate | LastChangedDate |
+------+-------------------------+-------------------------+
| 3965 | 2019-01-23 03:54:44.903 | 2021-03-12 06:24:45.390 |
+------+-------------------------+-------------------------+
| 3966 | 2019-01-23 03:55:37.160 | 2021-01-09 04:50:20.697 |
+------+-------------------------+-------------------------+
| 3967 | 2019-01-23 03:56:21.197 | 2020-05-11 06:10:14.203 |
+------+-------------------------+-------------------------+
| 3968 | 2019-01-23 03:57:07.943 | 2020-05-11 11:28:26.580 |
+------+-------------------------+-------------------------+
| 3969 | 2019-01-23 03:58:01.020 | NULL |
+------+-------------------------+-------------------------+
| 3970 | 2019-01-23 03:58:42.293 | 2021-05-11 09:57:54.553 |
+------+-------------------------+-------------------------+
| 4143 | 2019-03-19 04:23:08.003 | 2020-12-14 10:08:38.303 |
+------+-------------------------+-------------------------+
| 4144 | 2019-03-19 04:51:14.533 | 2020-12-14 10:05:11.867 |
+------+-------------------------+-------------------------+
| 4145 | 2019-03-19 05:16:28.980 | 2019-07-11 07:23:15.803 |
+------+-------------------------+-------------------------+
| 4146 | 2019-03-19 05:18:49.550 | 2020-01-02 09:12:13.597 |
+------+-------------------------+-------------------------+
| 4808 | 2019-09-17 05:44:54.587 | 2021-01-09 10:35:20.860 |
+------+-------------------------+-------------------------+
| 5243 | 2020-01-02 09:07:10.573 | 2021-02-01 16:06:51.770 |
+------+-------------------------+-------------------------+
| 5666 | 2020-08-12 07:16:20.617 | 2021-01-09 04:52:25.427 |
+------+-------------------------+-------------------------+
| 5877 | 2020-09-05 05:35:56.160 | 2021-01-09 04:51:43.707 |
+------+-------------------------+-------------------------+
Now let's say I want to see whether the CreatedDate or LastChangedDate column is greater than '2021-01-09'.
To do the comparison, I need to check if LastChangedDate is NULL then compare the given date with CreatedDate field, else compare with LastChangedDate field.
What I have tried so far:
DECLARE @test_date as varchar(20) = '2021-01-09'
SELECT ID, CreatedDate, LastChangedDate
FROM #temptable
WHERE CAST(ISNULL(LastChangedDate, CreatedDate) as date) > CAST(@test_date as date)
But it is not giving the proper output.
What I want is:
+------+-------------------------+-------------------------+
| ID | CreatedDate | LastChangedDate |
+------+-------------------------+-------------------------+
| 3967 | 2019-01-23 03:56:21.197 | 2020-05-11 06:10:14.203 |
+------+-------------------------+-------------------------+
| 3968 | 2019-01-23 03:57:07.943 | 2020-05-11 11:28:26.580 |
+------+-------------------------+-------------------------+
| 3969 | 2019-01-23 03:58:01.020 | NULL |
+------+-------------------------+-------------------------+
| 4143 | 2019-03-19 04:23:08.003 | 2020-12-14 10:08:38.303 |
+------+-------------------------+-------------------------+
| 4144 | 2019-03-19 04:51:14.533 | 2020-12-14 10:05:11.867 |
+------+-------------------------+-------------------------+
| 4145 | 2019-03-19 05:16:28.980 | 2019-07-11 07:23:15.803 |
+------+-------------------------+-------------------------+
| 4146 | 2019-03-19 05:18:49.550 | 2020-01-02 09:12:13.597 |
+------+-------------------------+-------------------------+
Also here is a sqlplayground with sample data
In fact you shouldn't need any CASTs:
First, declare your variable as a date instead of a string - always use the correct datatype for the data you are storing.
Second, use logical operations (AND/OR) instead of ISNULL/COALESCE to keep your query sargable, i.e. able to use indexes. Any time you wrap a column in a function in your WHERE clause, you risk preventing the use of indexes and slowing your query down.
DECLARE @test_date as date = '2021-01-09';
SELECT ID, CreatedDate, LastChangedDate
FROM #temptable
WHERE (LastChangedDate IS NOT NULL AND LastChangedDate > @test_date)
OR (LastChangedDate IS NULL AND CreatedDate > @test_date);
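As a quick sanity check, here is that AND/OR form run on SQLite with a few of the rows above (SQLite stands in for SQL Server here; ISO-8601 timestamp strings compare correctly against a date string):

```python
import sqlite3

# A few rows from the question's table; sqlite3 stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE temptable (ID INT, CreatedDate TEXT, LastChangedDate TEXT)")
conn.executemany("INSERT INTO temptable VALUES (?,?,?)", [
    (3965, '2019-01-23 03:54:44.903', '2021-03-12 06:24:45.390'),
    (3967, '2019-01-23 03:56:21.197', '2020-05-11 06:10:14.203'),
    (3969, '2019-01-23 03:58:01.020', None),  # NULL LastChangedDate
    (3970, '2019-01-23 03:58:42.293', '2021-05-11 09:57:54.553'),
])
test_date = '2021-01-09'

# No function wraps a column, so an index on LastChangedDate or
# CreatedDate remains usable.
ids = [r[0] for r in conn.execute("""
    SELECT ID FROM temptable
    WHERE (LastChangedDate IS NOT NULL AND LastChangedDate > ?)
       OR (LastChangedDate IS NULL AND CreatedDate > ?)
    ORDER BY ID""", (test_date, test_date))]
print(ids)  # 3965 and 3970 changed after 2021-01-09; 3969 falls back to CreatedDate
```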
I think you are comparing only the date part. Try comparing both date and time:
SELECT ID, CreatedDate, LastChangedDate
FROM #temptable
WHERE CAST(ISNULL(LastChangedDate, CreatedDate) as datetime) > CAST(@test_date as datetime)

SQL LAG function over dates

I have the following table (example):
+----+-------+-------------+----------------+
| id | value | last_update | ingestion_date |
+----+-------+-------------+----------------+
| 1 | 30 | 2021-02-03 | 2021-02-07 |
+----+-------+-------------+----------------+
| 1 | 29 | 2021-02-03 | 2021-02-06 |
+----+-------+-------------+----------------+
| 1 | 28 | 2021-01-25 | 2021-02-02 |
+----+-------+-------------+----------------+
| 1 | 25 | 2021-01-25 | 2021-02-01 |
+----+-------+-------------+----------------+
| 1 | 23 | 2021-01-20 | 2021-01-31 |
+----+-------+-------------+----------------+
| 1 | 20 | 2021-01-20 | 2021-01-30 |
+----+-------+-------------+----------------+
| 2 | 55 | 2021-02-03 | 2021-02-06 |
+----+-------+-------------+----------------+
| 2 | 50 | 2021-01-25 | 2021-02-02 |
+----+-------+-------------+----------------+
The result I need:
It should contain the most recent value (based on last_update and ingestion_date) in the value column, and the penultimate value in value2.
+----+-------+-------------+----------------+--------+
| id | value | last_update | ingestion_date | value2 |
+----+-------+-------------+----------------+--------+
| 1 | 30 | 2021-02-03 | 2021-02-07 | 28 |
+----+-------+-------------+----------------+--------+
| 2 | 55 | 2021-02-03 | 2021-02-06 | 50 |
+----+-------+-------------+----------------+--------+
The query I have right now is the following:
SELECT id, value, last_update, ingestion_date, value2
FROM
    (SELECT *,
            ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_update DESC, ingestion_date DESC) AS ord,
            LAG(value) OVER (PARTITION BY id ORDER BY last_update, ingestion_date) AS value2
     FROM table)
WHERE ord = 1
The result I am getting:
+----+-------+-------------+----------------+--------+
| ID | value | last_update | ingestion_date | value2 |
+----+-------+-------------+----------------+--------+
| 1 | 30 | 2021-02-03 | 2021-02-07 | 29 |
+----+-------+-------------+----------------+--------+
| 2 | 55 | 2021-02-03 | 2021-02-06 | 50 |
+----+-------+-------------+----------------+--------+
Note: I am using AWS Athena.
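For what it's worth, one way to get the previous distinct last_update's value is to first reduce each (id, last_update) pair to its latest ingestion, and only then apply LAG. A sketch on SQLite (the same SQL should, I believe, run on Athena/Presto, possibly with subquery aliases added):

```python
import sqlite3

# Table and rows from the question; SQLite (>= 3.25) supports the
# window functions used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INT, value INT, last_update TEXT, ingestion_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?,?,?,?)", [
    (1, 30, '2021-02-03', '2021-02-07'),
    (1, 29, '2021-02-03', '2021-02-06'),
    (1, 28, '2021-01-25', '2021-02-02'),
    (1, 25, '2021-01-25', '2021-02-01'),
    (1, 23, '2021-01-20', '2021-01-31'),
    (1, 20, '2021-01-20', '2021-01-30'),
    (2, 55, '2021-02-03', '2021-02-06'),
    (2, 50, '2021-01-25', '2021-02-02'),
])
result = conn.execute("""
    SELECT id, value, last_update, ingestion_date, value2
    FROM (
        -- keep only the latest ingestion per (id, last_update),
        -- then LAG over the distinct last_update values
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_update DESC) AS ord,
               LAG(value) OVER (PARTITION BY id ORDER BY last_update) AS value2
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY id, last_update
                                      ORDER BY ingestion_date DESC) AS rn
            FROM t
        )
        WHERE rn = 1
    )
    WHERE ord = 1
    ORDER BY id
""").fetchall()
print(result)
```

This yields value2 = 28 for id 1 (the latest value of the previous last_update), rather than the 29 the plain LAG produced.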

Group by bursts of occurences in TimescaleDB/PostgreSQL

This is my first question on Stack Overflow; any advice on how to ask a well-structured question is welcome.
So, I have a TimescaleDB database, a time-series database built on top of Postgres. It keeps most of Postgres's functionality, so it won't be an issue if you don't know Timescale.
I have a select statement which returns:
time | num_issues | actor_login
------------------------+------------+------------------
2015-11-10 01:00:00+01 | 2 | nifl
2015-12-10 01:00:00+01 | 1 | anandtrex
2016-01-09 01:00:00+01 | 1 | isaacrg
2016-02-08 01:00:00+01 | 1 | timbarclay
2016-06-07 02:00:00+02 | 1 | kcalmes
2016-07-07 02:00:00+02 | 1 | cassiozen
2016-08-06 02:00:00+02 | 13 | phae
2016-09-05 02:00:00+02 | 2 | phae
2016-10-05 02:00:00+02 | 13 | cassiozen
2016-11-04 01:00:00+01 | 6 | cassiozen
2016-12-04 01:00:00+01 | 4 | cassiozen
2017-01-03 01:00:00+01 | 5 | cassiozen
2017-02-02 01:00:00+01 | 8 | cassandraoid
2017-03-04 01:00:00+01 | 16 | erquhart
2017-04-03 02:00:00+02 | 3 | erquhart
2017-05-03 02:00:00+02 | 9 | erquhart
2017-06-02 02:00:00+02 | 5 | erquhart
2017-07-02 02:00:00+02 | 2 | greatwarlive
2017-08-01 02:00:00+02 | 8 | tech4him1
2017-08-31 02:00:00+02 | 7 | tech4him1
2017-09-30 02:00:00+02 | 17 | erquhart
2017-10-30 01:00:00+01 | 7 | erquhart
2017-11-29 01:00:00+01 | 12 | erquhart
2017-12-29 01:00:00+01 | 8 | tech4him1
2018-01-28 01:00:00+01 | 6 | ragasirtahk
And so on. Basically it returns a username per bucket of time, in this case 30 days.
The SQL query is:
SELECT DISTINCT ON(time_bucket('30 days', created_at))
time_bucket('30 days', created_at) as time,
count(id) as num_issues,
actor_login
FROM
issues_event
WHERE action = 'opened' AND repo_name='netlify/netlify-cms'
group by time, actor_login
order by time, num_issues DESC
My question is: how can I detect or group the rows that have the same actor_login and are consecutive?
For example, I would like to group the cassiozen rows from 2016-10-05 to 2017-01-03, but not with the other cassiozen row in the column.
I have tried auxiliary columns and window functions such as LAG, but without a function or a DO statement I don't think it is possible.
I also tried with functions but couldn't find a way.
Any approach, idea or solution will be fully appreciated.
Edit: I show my desired output.
time | num_issues | actor_login | actor_group_id
------------------------+------------+------------------+----------------
2015-11-10 01:00:00+01 | 2 | nifl | 0
2015-12-10 01:00:00+01 | 1 | anandtrex | 1
2016-01-09 01:00:00+01 | 1 | isaacrg | 2
2016-02-08 01:00:00+01 | 1 | timbarclay | 3
2016-06-07 02:00:00+02 | 1 | kcalmes | 4
2016-07-07 02:00:00+02 | 1 | cassiozen | 5
2016-08-06 02:00:00+02 | 13 | phae | 6
2016-09-05 02:00:00+02 | 2 | phae | 6
2016-10-05 02:00:00+02 | 13 | cassiozen | 7
2016-11-04 01:00:00+01 | 6 | cassiozen | 7
2016-12-04 01:00:00+01 | 4 | cassiozen | 7
2017-01-03 01:00:00+01 | 5 | cassiozen | 7
2017-02-02 01:00:00+01 | 8 | cassandraoid | 12
2017-03-04 01:00:00+01 | 16 | erquhart | 13
2017-04-03 02:00:00+02 | 3 | erquhart | 13
2017-05-03 02:00:00+02 | 9 | erquhart | 13
2017-06-02 02:00:00+02 | 5 | erquhart | 13
2017-07-02 02:00:00+02 | 2 | greatwarlive | 17
2017-08-01 02:00:00+02 | 8 | tech4him1 | 18
2017-08-31 02:00:00+02 | 7 | tech4him1 | 18
2017-09-30 02:00:00+02 | 17 | erquhart | 16
2017-10-30 01:00:00+01 | 7 | erquhart | 16
2017-11-29 01:00:00+01 | 12 | erquhart | 16
2017-12-29 01:00:00+01 | 8 | tech4him1 | 21
2018-01-28 01:00:00+01 | 6 | ragasirtahk | 24
The solution of MatBaille is almost perfect.
I just wanted to group the consecutive actors like this so I could extract a bunch of metrics with other attributes of the table.
You could use a so-called "gaps-and-islands" approach
WITH sorted AS
(
    SELECT *,
           ROW_NUMBER() OVER (ORDER BY time) AS rn,
           ROW_NUMBER() OVER (PARTITION BY actor_login ORDER BY time) AS rn_actor
    FROM your_results
)
SELECT *,
       rn - rn_actor AS actor_group_id
FROM sorted
Then the combination of (actor_login, actor_group_id) will group consecutive rows together.
db<>fiddle demo
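A minimal check of the rn - rn_actor trick, run on SQLite with a slice of the data above:

```python
import sqlite3

# Small slice of the question's result set, with two separate
# runs of 'cassiozen' and one consecutive pair of 'phae'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_results (time TEXT, actor_login TEXT)")
conn.executemany("INSERT INTO your_results VALUES (?,?)", [
    ('2016-07-07', 'cassiozen'),
    ('2016-08-06', 'phae'),
    ('2016-09-05', 'phae'),
    ('2016-10-05', 'cassiozen'),
    ('2016-11-04', 'cassiozen'),
    ('2017-02-02', 'cassandraoid'),
])
result = conn.execute("""
    SELECT time, actor_login,
           ROW_NUMBER() OVER (ORDER BY time)
         - ROW_NUMBER() OVER (PARTITION BY actor_login ORDER BY time) AS actor_group_id
    FROM your_results
    ORDER BY time
""").fetchall()
for r in result:
    print(r)
```

Consecutive rows of the same actor share an actor_group_id, while the two separated cassiozen runs get different ids, so (actor_login, actor_group_id) identifies each burst.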

How to GROUP BY date,id By checking overall sum of transactions in ORACLE

I want to group the table by client_code, date and filial, but I also need to check the overall turnover sum between the given dates. client_code = substr(CODE_ACCOUNT, 10, 8)
Table ACCOUNT_TABLE
------------------------------------------------------------------------------
CODE_FILIAL | OPER_DAY | CODE_ACCOUNT | CREDIT |
------------------------------------------------------------------------------
00825 | 2020-01-02 12:32:22 |20210000700123343001 | 124544112 |
00825 | 2020-02-23 21:45:00 |20210000700123343001 | 523553452.23 |
00825 | 2020-02-23 21:45:00 |20212000700224543001 | 245565345.23 |
00825 | 2020-02-10 09:18:00 |20212000700224543001 | 987565345.23 |
00825 | 2020-03-21 14:45:00 |20212000700253374001 | 100053523.23 |
00825 | 2020-04-03 18:45:00 |20212000700123343001 | 133354523.23 |
00825 | 2020-05-18 23:00:00 |20210000700123343001 | 892334523.23 |
Below what I have tried so far
SELECT substr(CODE_ACCOUNT, 10, 8) AS CLIENT_CODE,CODE_ACCOUNT,oborot,mnth,CODE_FILIAL FROM (
SELECT CODE_ACCOUNT,sum(CREDIT)/100 as oborot,to_char(s.OPER_DAY, 'YYYY-MM') AS mnth,CODE_FILIAL
FROM ACCOUNT_TABLE s
WHERE s.OPER_DAY >= to_date('01.01.2020', 'DD.MM.YYYY')
AND s.OPER_DAY < to_date('01.07.2020', 'DD.MM.YYYY')
AND s.CODE_FILIAL in ('00820','00825')
AND substr(s.CODE_ACCOUNT, 1, 8) IN ('20208000','20210000','20212000')
GROUP BY CODE_ACCOUNT,to_char(s.OPER_DAY, 'YYYY-MM'),s.CODE_FILIAL
) rs ORDER BY rs.oborot DESC
Result i took:
CLIENT_CODE | CODE_ACCOUNT | OBOROT | MNTH | CODE_FILIAL|
-----------------------------------------------------------------------------------
00123343 | 20210000900123343001 | 124544112 | 2020-01 | 00825 |
00123343 | 20210000900123343001 | 523553452.23 | 2020-02 | 00825 |
00123343 | 20212000700123343001 | 133354523.23 | 2020-04 | 00825 |
00123343 | 20210000900123343001 | 892334523.23 | 2020-05 | 00825 |
00224543 | 20212000700224543001 | 1233130690.46 | 2020-02 | 00825 |
00253374 | 20212000700253374001 | 100053523.23 | 2020-03 | 00825 |
In this case I am querying 6 months. As you can see, the last row is not needed, because it is less than 1000000000. I only want client_codes whose overall sum during the 6 months is greater than 1000000000.
Result I want
CLIENT_CODE | CODE_ACCOUNT | OBOROT | MNTH | CODE_FILIAL|
-----------------------------------------------------------------------------------
00123343 | 20210000900123343001 | 124544112 | 2020-01 | 00825 |
00123343 | 20210000900123343001 | 523553452.23 | 2020-02 | 00825 |
00123343 | 20212000700123343001 | 133354523.23 | 2020-04 | 00825 |
00123343 | 20210000900123343001 | 892334523.23 | 2020-05 | 00825 |
00224543 | 20212000700224543001 | 1233130690.46 | 2020-02 | 00825 |
In the result above, client_code = 00253374 is excluded because its total oborot is less than 1000000000. In short, I need to add a condition somewhere that checks the overall sum. Any help is appreciated!
You can use the HAVING clause in your inner query as follows:
GROUP BY CODE_ACCOUNT,to_char(s.OPER_DAY, 'YYYY-MM'),s.CODE_FILIAL
Having sum(CREDIT)/100 >= 1000000000
Update:
Since you need the overall sum per client rather than per month, use an analytic function as follows:
SELECT * FROM
    (SELECT t.*, SUM(oborot) OVER (PARTITION BY client_code, code_filial) AS sm
     FROM (<your_existing_query>) t)
WHERE sm >= 1000000000;
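A small sketch of that analytic-sum filter on SQLite (standing in for Oracle), with toy numbers and the threshold scaled down for readability:

```python
import sqlite3

# 'rs' plays the role of <your_existing_query>'s output; threshold
# scaled from 1000000000 down to 1000 for the toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rs (client_code TEXT, code_filial TEXT, mnth TEXT, oborot REAL)")
conn.executemany("INSERT INTO rs VALUES (?,?,?,?)", [
    ('00123343', '00825', '2020-01', 600.0),  # client total 1100: kept
    ('00123343', '00825', '2020-02', 500.0),
    ('00253374', '00825', '2020-03', 100.0),  # client total 100: dropped
])
result = conn.execute("""
    SELECT client_code, mnth, oborot FROM (
        SELECT t.*, SUM(oborot) OVER (PARTITION BY client_code, code_filial) AS sm
        FROM rs t
    )
    WHERE sm >= 1000
    ORDER BY mnth
""").fetchall()
print(result)
```

Every monthly row of a qualifying client survives, which is exactly what a plain HAVING on the monthly groups cannot do.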

Aggregate data from days into a month

I have data that is presented by the day, and I want to aggregate it into a monthly report. The data looks like this.
INVOICE_DATE GROSS_REVENUE NET_REVENUE
2018-06-28 ,1623.99 ,659.72
2018-06-27 ,112414.65 ,38108.13
2018-06-26 ,2518.74 ,1047.14
2018-06-25 ,475805.92 ,172193.58
2018-06-22 ,1151.79 ,478.96
How do I go about creating a report where it gives me the total gross revenue and net revenue for the month of June, July, August etc where the data is reported by the day?
So far this is what I have
SELECT invoice_date,
SUM(gross_revenue) AS gross_revenue,
SUM(net_revenue) AS net_revenue
FROM wc_revenue
GROUP BY invoice_date
I would simply group by year and month.
SELECT year(invoice_date) AS yr, month(invoice_date) AS mth,
       SUM(gross_revenue) AS gross_revenue,
       SUM(net_revenue) AS net_revenue
FROM wc_revenue
GROUP BY year(invoice_date), month(invoice_date)
Since I don't know if you have access to the year and month functions, another solution would be to cast the date as a varchar and group by the left-most 7 characters (year+month)
SELECT left(cast(invoice_date as varchar(50)),7) AS invoice_date,
SUM(gross_revenue) AS gross_revenue,
SUM(net_revenue) AS net_revenue
FROM wc_revenue GROUP BY left(cast(invoice_date as varchar(50)),7)
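A related portable option, where the dialect has strftime/to_char-style date formatting (SQLite shown here), is grouping on a formatted year-month key; checked against the June figures from the question:

```python
import sqlite3

# The five June rows from the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wc_revenue (invoice_date TEXT, gross_revenue REAL, net_revenue REAL)")
conn.executemany("INSERT INTO wc_revenue VALUES (?,?,?)", [
    ('2018-06-28', 1623.99, 659.72),
    ('2018-06-27', 112414.65, 38108.13),
    ('2018-06-26', 2518.74, 1047.14),
    ('2018-06-25', 475805.92, 172193.58),
    ('2018-06-22', 1151.79, 478.96),
])

# Group all days of a month under one 'YYYY-MM' key.
result = conn.execute("""
    SELECT strftime('%Y-%m', invoice_date) AS invoice_month,
           ROUND(SUM(gross_revenue), 2) AS gross_revenue,
           ROUND(SUM(net_revenue), 2) AS net_revenue
    FROM wc_revenue
    GROUP BY invoice_month
""").fetchall()
print(result)
```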
You could try a ROLLUP. Sample illustration below:
Table data:
mysql> select * from wc_revenue;
+--------------+---------------+-------------+
| invoice_date | gross_revenue | net_revenue |
+--------------+---------------+-------------+
| 2018-06-28 | 1623.99 | 659.72 |
| 2018-06-27 | 112414.65 | 38108.13 |
| 2018-06-26 | 2518.74 | 1047.14 |
| 2018-06-25 | 475805.92 | 172193.58 |
| 2018-06-22 | 1151.79 | 478.96 |
| 2018-07-02 | 150.00 | 100.00 |
| 2018-07-05 | 350.00 | 250.00 |
| 2018-08-07 | 600.00 | 400.00 |
| 2018-08-09 | 900.00 | 600.00 |
+--------------+---------------+-------------+
mysql> SELECT month(invoice_date) as MTH, invoice_date, SUM(gross_revenue) AS gross_revenue, SUM(net_revenue) AS net_revenue
FROM wc_revenue
GROUP BY MTH, invoice_date WITH ROLLUP;
+------+--------------+---------------+-------------+
| MTH | invoice_date | gross_revenue | net_revenue |
+------+--------------+---------------+-------------+
| 6 | 2018-06-22 | 1151.79 | 478.96 |
| 6 | 2018-06-25 | 475805.92 | 172193.58 |
| 6 | 2018-06-26 | 2518.74 | 1047.14 |
| 6 | 2018-06-27 | 112414.65 | 38108.13 |
| 6 | 2018-06-28 | 1623.99 | 659.72 |
| 6 | NULL | 593515.09 | 212487.53 |
| 7 | 2018-07-02 | 150.00 | 100.00 |
| 7 | 2018-07-05 | 350.00 | 250.00 |
| 7 | NULL | 500.00 | 350.00 |
| 8 | 2018-08-07 | 600.00 | 400.00 |
| 8 | 2018-08-09 | 900.00 | 600.00 |
| 8 | NULL | 1500.00 | 1000.00 |
| NULL | NULL | 595515.09 | 213837.53 |
+------+--------------+---------------+-------------+