SQL statement with HAVING, MIN, MAX

I have a table:
ID INTEGER NOT NULL, -- AUTOMATIC RECORD'S ID
CUSTOMER_ID INTEGER NOT NULL,
BILLING_PERIOD DATE NOT NULL,
DOCUMENT_ID INTEGER NOT NULL,
DATE_CREATED DATE NOT NULL -- WHEN THE DOCUMENT WAS CREATED
For each customer and billing period, I want to select the number of documents, the DOCUMENT_ID of the document that was created first, and the DOCUMENT_ID of the document that was created last.
The results should be sorted by customer and billing period.
I only want billing periods in which the customer has more than one document.
For example, given this data:
ID CUSTOMER_ID BILLING_PERIOD DOCUMENT_ID DATE_CREATED
1 5 2020-01-01 123 2020-02-01
2 5 2020-01-01 22 2019-02-01
3 5 2020-01-01 3 2010-02-01
4 99 2020-01-01 458 2021-02-01
5 99 2020-01-01 64 2010-02-01
6 100 2020-01-01 120 2020-02-01
7 99 2019-06-01 452 2019-06-01
8 99 2019-06-01 546 2019-12-01
I want my results to look like this:
CUSTOMER_ID BILLING_PERIOD NR_OF_DOC FIRST_DOC_ID LAST_DOC_ID
5 2020-01-01 3 3 123
99 2019-06-01 2 452 546
99 2020-01-01 2 64 458
So far, I can only count the number of documents per customer and period:
SELECT customer_id, billing_period, COUNT(*) AS nr_of_doc
FROM T1
GROUP BY customer_id, billing_period
HAVING COUNT(*) > 1;
CUSTOMER_ID BILLING_PERIOD NR_OF_DOC
5 2020-01-01 3
99 2019-06-01 2
99 2020-01-01 2
I do not know how to get the document_id of the newest and oldest documents.

You can use row_number() and aggregation:
select
    customer_id,
    billing_period,
    count(*) as nr_of_doc,
    max(case when rn_asc = 1 then document_id end) as first_doc_id,
    max(case when rn_desc = 1 then document_id end) as last_doc_id
from (
    select
        t.*,
        row_number() over(
            partition by customer_id, billing_period order by date_created
        ) rn_asc,
        row_number() over(
            partition by customer_id, billing_period order by date_created desc
        ) rn_desc
    from t1 t
) t
group by customer_id, billing_period
having count(*) > 1
order by customer_id, billing_period
This works properly even if the document IDs are not assigned in creation order.
Demo on DB Fiddle:
customer_id | billing_period | nr_of_doc | first_doc_id | last_doc_id
----------: | :------------- | --------: | -----------: | ----------:
5 | 2020-01-01 | 3 | 3 | 123
99 | 2019-06-01 | 2 | 452 | 546
99 | 2020-01-01 | 2 | 64 | 458
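A minimal sketch for checking the row_number() query locally, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (needed for ROW_NUMBER()). The table is recreated as t1 with the sample rows; storing the dates as ISO-8601 text is an assumption that keeps them sorting chronologically, standing in for the DATE columns.

```python
import sqlite3

# In-memory SQLite database standing in for the asker's table T1.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t1 (
        id             INTEGER NOT NULL,
        customer_id    INTEGER NOT NULL,
        billing_period TEXT    NOT NULL,  -- ISO-8601 date stored as text
        document_id    INTEGER NOT NULL,
        date_created   TEXT    NOT NULL   -- ISO-8601 date stored as text
    )
""")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 5, "2020-01-01", 123, "2020-02-01"),
        (2, 5, "2020-01-01", 22, "2019-02-01"),
        (3, 5, "2020-01-01", 3, "2010-02-01"),
        (4, 99, "2020-01-01", 458, "2021-02-01"),
        (5, 99, "2020-01-01", 64, "2010-02-01"),
        (6, 100, "2020-01-01", 120, "2020-02-01"),
        (7, 99, "2019-06-01", 452, "2019-06-01"),
        (8, 99, "2019-06-01", 546, "2019-12-01"),
    ],
)

# Number the documents in both directions within each (customer, period),
# then pick the document_id that comes first in each direction.
query = """
    SELECT customer_id,
           billing_period,
           COUNT(*) AS nr_of_doc,
           MAX(CASE WHEN rn_asc = 1 THEN document_id END)  AS first_doc_id,
           MAX(CASE WHEN rn_desc = 1 THEN document_id END) AS last_doc_id
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id, billing_period
                                  ORDER BY date_created)      AS rn_asc,
               ROW_NUMBER() OVER (PARTITION BY customer_id, billing_period
                                  ORDER BY date_created DESC) AS rn_desc
        FROM t1 t
    ) x
    GROUP BY customer_id, billing_period
    HAVING COUNT(*) > 1
    ORDER BY customer_id, billing_period
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```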

In your sample data, the document IDs seem to increase with the creation date. If that is always the case, you can just use aggregation:
SELECT customer_id, billing_period, COUNT(*) AS nr_of_doc,
       MIN(document_id) AS first_doc_id, MAX(document_id) AS last_doc_id
FROM T1
GROUP BY customer_id, billing_period
HAVING COUNT(*) > 1
ORDER BY customer_id, billing_period;

Related

How to create a cumulative count distinct with partition by in SQL?

I have a table with user data and want to compute a cumulative distinct count, but there is no window function for that. This is my table:
date | user-id | purchase-id
2020-01-01 | 1 | 244
2020-01-03 | 1 | 244
2020-02-01 | 1 | 524
2020-03-01 | 2 | 443
Now I want a cumulative distinct count of purchase_id per user, like this:
date | user-id | purchase-id | cum_purchase
2020-01-01 | 1 | 244 | 1
2020-01-03 | 1 | 244 | 1
2020-02-01 | 1 | 524 | 2
2020-03-01 | 2 | 443 | 1
I tried:
Select
dt,
user_id,
count(distinct purchase_id) over (partition by user_id ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_ct
from table
I get an error saying that I cannot use COUNT(DISTINCT ...) with ORDER BY. What can I do?
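Something like this: ROW_NUMBER() numbers the rows of each (user_id, purchase_id) pair by date, so rn = 1 marks the first time a user makes a given purchase, and a running SUM over those first occurrences gives the cumulative distinct count:
Select
    dt as [date],
    user_id,
    purchase_id,
    SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END) over (partition by user_id ORDER BY dt ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cum_ct
from (
    SELECT
        dt,
        user_id,
        purchase_id,
        ROW_NUMBER() OVER (PARTITION BY user_id, purchase_id ORDER BY dt) as RN
    FROM sometable
) sub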
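The same flag-then-running-sum idea can be tried out with Python's sqlite3 module, assuming SQLite 3.25+ for window functions. The table name purchases is hypothetical, standing in for sometable, and a final ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the answer's dt / user_id / purchase_id.
conn.execute("CREATE TABLE purchases (dt TEXT, user_id INTEGER, purchase_id INTEGER)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [
        ("2020-01-01", 1, 244),
        ("2020-01-03", 1, 244),
        ("2020-02-01", 1, 524),
        ("2020-03-01", 2, 443),
    ],
)

# rn = 1 flags each user's first row for a given purchase_id; the running
# SUM of those flags is the cumulative count of distinct purchases.
query = """
    SELECT dt, user_id, purchase_id,
           SUM(CASE WHEN rn = 1 THEN 1 ELSE 0 END)
               OVER (PARTITION BY user_id ORDER BY dt
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_purchase
    FROM (
        SELECT dt, user_id, purchase_id,
               ROW_NUMBER() OVER (PARTITION BY user_id, purchase_id ORDER BY dt) AS rn
        FROM purchases
    ) sub
    ORDER BY user_id, dt
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```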

PostgreSQL: how to select columns where they match conditions?

I have a table like this:
inventory_id | customer_id | max
--------------+-------------+---------------------
4497 | 1 | 2005-07-28 00:00:00
1449 | 1 | 2005-08-22 00:00:00
1440 | 1 | 2005-08-02 00:00:00
3232 | 1 | 2005-08-02 00:00:00
3418 | 2 | 2005-08-02 00:00:00
654 | 2 | 2005-08-02 00:00:00
3164 | 2 | 2005-08-21 00:00:00
2053 | 2 | 2005-07-27 00:00:00
For each customer_id, I want to select the row with the most recent date, together with its other columns. This is what I want to achieve:
inventory_id | customer_id | max
--------------+-------------+---------------------
1449 | 1 | 2005-08-22 00:00:00
3164 | 2 | 2005-08-21 00:00:00
I tried using an aggregate, but I need inventory_id and customer_id to appear together in the result.
Is there any way to do this?
Use distinct on:
select distinct on (customer_id) t.*
from t
order by customer_id, max desc;
DISTINCT ON is a Postgres extension that returns one row for each distinct value of the expressions in the parentheses. Which row is returned is determined by the ORDER BY: it is the first one that appears in the sorted set.
Alternatively, with ROW_NUMBER():
SELECT inventory_id, customer_id, max FROM
    (SELECT inventory_id, customer_id, max,
            ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY max DESC) AS ROWNO
     FROM inventory_table) AS A
WHERE ROWNO = 1
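SQLite has no DISTINCT ON, so this sketch, run through Python's sqlite3 module (SQLite 3.25+ assumed), exercises only the ROW_NUMBER() version against the sample rows. The column name max is double-quoted to mirror the question, and the table name inventory_table is taken from the ROW_NUMBER() answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "max" mirrors the question's column name; quoting keeps it a plain identifier.
conn.execute(
    'CREATE TABLE inventory_table (inventory_id INTEGER, customer_id INTEGER, "max" TEXT)'
)
conn.executemany(
    "INSERT INTO inventory_table VALUES (?, ?, ?)",
    [
        (4497, 1, "2005-07-28 00:00:00"),
        (1449, 1, "2005-08-22 00:00:00"),
        (1440, 1, "2005-08-02 00:00:00"),
        (3232, 1, "2005-08-02 00:00:00"),
        (3418, 2, "2005-08-02 00:00:00"),
        (654, 2, "2005-08-02 00:00:00"),
        (3164, 2, "2005-08-21 00:00:00"),
        (2053, 2, "2005-07-27 00:00:00"),
    ],
)

# Rank each customer's rows newest first and keep only the top-ranked row.
query = """
    SELECT inventory_id, customer_id, "max"
    FROM (
        SELECT inventory_id, customer_id, "max",
               ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY "max" DESC) AS rowno
        FROM inventory_table
    ) a
    WHERE rowno = 1
    ORDER BY customer_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```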

Grouping consecutive sequences of rows

I'm trying to group consecutive rows where a boolean value is true on SQL Server. For example, here's what some source data looks like:
AccountID | ID | IsTrue | Date
-------------------------------
1 | 1 | 1 | 1/1/2013
1 | 2 | 1 | 1/2/2013
1 | 3 | 1 | 1/3/2013
1 | 4 | 0 | 1/4/2013
1 | 5 | 1 | 1/5/2013
1 | 6 | 0 | 1/6/2013
1 | 7 | 1 | 1/7/2013
1 | 8 | 1 | 1/8/2013
1 | 9 | 1 | 1/9/2013
And here's what I'd like as the output:
AccountID | Start | End
-------------------------------
1 | 1/1/2013 | 1/3/2013
1 | 1/7/2013 | 1/9/2013
I have a hunch that there's some trick with grouping by partitions that will make this work but I've been unable to figure it out. I've made some progress using LAG but haven't been able to put it all together.
Thanks for the help!
This is an example of a gaps-and-islands problem. For this version, you just need a sequential number for each isTrue value within each account. Subtracting that number of days from each date gives a constant for each run of adjacent rows with the same value (assuming one row per consecutive day, as in your sample data):
select accountId, isTrue, min(date), max(date)
from (select t.*,
row_number() over (partition by accountId, isTrue order by date) as seqnum
from t
) t
group by accountId, isTrue, dateadd(day, -seqnum, date);
This returns all groups. If I assume that you only want runs of "1" values that are more than one day long, then:
select accountId, isTrue, min(date), max(date)
from (select t.*,
row_number() over (partition by accountId, isTrue order by date) as seqnum
from t
where isTrue = 1
) t
group by accountId, isTrue, dateadd(day, -seqnum, date)
having count(*) > 1;
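A sketch of the second query (runs of isTrue = 1 longer than one day) using Python's sqlite3 module, assuming SQLite 3.25+. SQLite has no DATEADD, so julianday(dt) - seqnum is swapped in as the grouping key; like DATEADD(day, -seqnum, date), it stays constant over consecutive days. The table name account_flags and the column name dt are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the question's Date column is renamed dt (ISO-8601 text).
conn.execute(
    "CREATE TABLE account_flags (accountId INTEGER, id INTEGER, isTrue INTEGER, dt TEXT)"
)
conn.executemany(
    "INSERT INTO account_flags VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "2013-01-01"),
        (1, 2, 1, "2013-01-02"),
        (1, 3, 1, "2013-01-03"),
        (1, 4, 0, "2013-01-04"),
        (1, 5, 1, "2013-01-05"),
        (1, 6, 0, "2013-01-06"),
        (1, 7, 1, "2013-01-07"),
        (1, 8, 1, "2013-01-08"),
        (1, 9, 1, "2013-01-09"),
    ],
)

# Day number minus row number is identical for every row of one unbroken run.
query = """
    SELECT accountId, MIN(dt) AS start_date, MAX(dt) AS end_date
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY accountId, isTrue ORDER BY dt) AS seqnum
        FROM account_flags t
        WHERE isTrue = 1
    ) x
    GROUP BY accountId, isTrue, julianday(dt) - seqnum
    HAVING COUNT(*) > 1
    ORDER BY start_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```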
You can try the following (see the demo). I am assuming that id always has consecutive values, so id minus a row number is constant within each run of the same IsTrue value.
with cte as
(
    select
        *,
        count(*) over (partition by accountId, IsTrue, rnk) as total
    from
    (
        select
            *,
            id - row_number() over (partition by accountId, IsTrue order by id, date) as rnk
        from myTable
    ) val
)
select
    accountId,
    min(date) as start,
    max(date) as "end"
from cte
where IsTrue = 1
  and total > 1
group by
    accountId,
    rnk
Output:
| accountid | start | end |
| --------- | ---------- | -----------|
| 1 | 2013-01-01 | 2013-01-03 |
| 1 | 2013-01-07 | 2013-01-09 |

Query to check if column value appears more than once

I have a PostgreSQL query:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id"
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
I want to add a condition like: if contract_id occurs more than once, then change_effective_date <= now()
Dataset:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-17 00:00:00
200 | 10 | 1 | 2020-05-16 00:00:00
300 | 10 | 1 | 2020-05-14 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
500 | 30 | 1 | 2020-05-13 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
Expected result:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
200 | 10 | 1 | 2020-05-16 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
If a contract_id occurs more than once, I want the latest row whose change_effective_date is less than or equal to today.
I tried using:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id",
COUNT("contract"."contract_id") AS cnt
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1 AND
CASE WHEN "cnt" > 1 THEN "contract"."change_effective_date" <= now() END
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
but it throws the error: column "cnt" does not exist
Thanks
I have never seen DISTINCT ON combined with GROUP BY. It is possible (aggregate first, then pick rows from the aggregated result), but that is not what you are doing, and it is not what you want either. What you want is for the ORDER BY clause, which decides the row that DISTINCT ON keeps, to take the current date into account:
SELECT DISTINCT ON (contract_id) *
FROM contract_versions
WHERE client_id = 1
ORDER BY
contract_id,
CASE WHEN change_effective_date <= CURRENT_DATE THEN 1 ELSE 2 END,
change_effective_date DESC;
Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=f709d18b23504dfaf9f586ace231be1d
There are 2 problems in your queries.
In the first query you get:
ERROR: column "contract.change_effective_date" must appear in the GROUP BY clause or be used in an aggregate function
In the second, you cannot reference a column alias ("cnt") in the WHERE clause of the same query: to use it, you must wrap the original query in an outer query as a derived table (inline view).
Here is something that could be a solution:
select * from contract_versions;
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 20 | 1 | 2020-05-16 09:00:00
(4 rows)
Null display is "NULL".
SELECT DISTINCT ON (contract_id) contract_id,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | change_effective_date
-------------+-----------------------
10 | 2020-05-14 14:00:00
20 | 2020-05-16 09:00:00
(2 rows)
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | cnt | change_effective_date
-------------+-----+-----------------------
10 | 3 | 2020-05-14 14:00:00
20 | 1 | 2020-05-16 09:00:00
(2 rows)
SELECT
contract_id,
CASE WHEN cnt > 1 THEN change_effective_date <= now()
END
FROM
(
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC
) v;
contract_id | case
-------------+------
10 | t
20 | NULL
(2 rows)
With row_number() window function:
select t.id, t.contract_id, t.client_id, t.change_effective_date
from (
select *,
row_number() over (
partition by contract_id
order by (change_effective_date > now())::int, change_effective_date desc
) rn
from contract_versions
) t
where t.rn = 1
order by t.id
Your sample data contains only one client_id, so your question does not make clear whether each contract_id value belongs to only one client_id.
If it can belong to several, change this in the above query:
partition by contract_id
to:
partition by contract_id, client_id
See the demo.
Results:
| id | contract_id | client_id | change_effective_date |
| --- | ----------- | --------- | ------------------------ |
| 200 | 10 | 1 | 2020-05-16 00:00:00.000 |
| 400 | 20 | 1 | 2020-05-17 00:00:00.000 |
| 600 | 30 | 1 | 2020-05-14 00:00:00.000 |
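To reproduce the row_number() answer without depending on the clock, this sketch runs it through Python's sqlite3 module (SQLite 3.25+ assumed) with a fixed, hypothetical "current time" bound as :now in place of now(). The ::int cast is PostgreSQL-only, so a CASE expression produces the same 0/1 sort key here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contract_versions (
        id INTEGER, contract_id INTEGER, client_id INTEGER,
        change_effective_date TEXT  -- 'YYYY-MM-DD HH:MM:SS' text sorts chronologically
    )
""")
conn.executemany(
    "INSERT INTO contract_versions VALUES (?, ?, ?, ?)",
    [
        (100, 10, 1, "2020-05-17 00:00:00"),
        (200, 10, 1, "2020-05-16 00:00:00"),
        (300, 10, 1, "2020-05-14 00:00:00"),
        (400, 20, 1, "2020-05-17 00:00:00"),
        (500, 30, 1, "2020-05-13 00:00:00"),
        (600, 30, 1, "2020-05-14 00:00:00"),
    ],
)

# Hypothetical fixed "now" so the result is reproducible (PostgreSQL: now()).
NOW = "2020-05-16 12:00:00"

# Rows not in the future sort first (key 0), newest first within each group;
# a contract with only future rows still gets its latest row as a fallback.
query = """
    SELECT id, contract_id, client_id, change_effective_date
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY contract_id
                   ORDER BY CASE WHEN change_effective_date > :now THEN 1 ELSE 0 END,
                            change_effective_date DESC
               ) AS rn
        FROM contract_versions
    ) t
    WHERE rn = 1
    ORDER BY id
"""
rows = conn.execute(query, {"now": NOW}).fetchall()
for row in rows:
    print(row)
```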

SQL query to show most recent qty, but grouped by customer [duplicate]

This question already has answers here:
Oracle SQL query: Retrieve latest values per group based on time [duplicate]
Fetch the rows which have the Max value for a column for each distinct value of another column
Select First Row of Every Group in sql [duplicate]
Item Number | Customer | Creation Date | Onhand Qty
123 1 03-FEB-19 654
234 3 03-FEB-19 987
789 5 03-FEB-19 874
321 4 03-FEB-19 147
567 7 03-FEB-19 632
123 1 29-JAN-19 547
234 3 29-JAN-19 814
789 5 29-JAN-19 458
321 4 29-JAN-19 330
567 7 29-JAN-19 118
I have the data set above, but with thousands of items and hundreds of customers.
I'd like to return only the latest 'Onhand Qty' value, i.e. the row with max(creation_date), per item and customer:
Item Number | Customer | Creation Date | Onhand Qty
123 1 03-FEB-19 654
234 3 03-FEB-19 987
789 5 03-FEB-19 874
321 4 03-FEB-19 147
567 7 03-FEB-19 632
Effectively, I'm trying to find the most recent onhand qty amount, by customer and item, so I can say that at the most recent check, 'Customer 1 had 654 units of Item 123'.
Is someone able to help me?
This is in an Oracle database (V11).
Many thanks
Use ROW_NUMBER() as follows:
SELECT * FROM (
SELECT t.*, ROW_NUMBER() OVER(PARTITION BY Customer, Item_Number ORDER BY creation_date DESC) rn
FROM mytable t
) WHERE rn = 1
In the subquery, ROW_NUMBER() assigns a sequence number to each record in groups of records having the same Customer/Item. The sequence is ordered by descending creation date (so the highest date comes first). Then, the outer query filters on the first record in each group.
This DB Fiddle demo with your sample data returns:
ITEM_NUMBER | CUSTOMER | CREATION_DATE | ONHAND_QTY | RN
----------: | -------: | :------------ | ---------: | -:
123 | 1 | 29-JAN-19 | 547 | 1
234 | 3 | 29-JAN-19 | 814 | 1
321 | 4 | 29-JAN-19 | 330 | 1
789 | 5 | 29-JAN-19 | 458 | 1
567 | 7 | 29-JAN-19 | 118 | 1
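A quick check of the ROW_NUMBER() approach with Python's sqlite3 module (SQLite 3.25+ assumed), using the table name mytable from the answer above. The dates are stored as ISO-8601 text here, which is an assumption: ORDER BY creation_date DESC only puts the newest row first when the column is a real DATE or otherwise sorts chronologically; text such as '29-JAN-19' and '03-FEB-19' sorts alphabetically instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mytable (
        item_number   INTEGER,
        customer      INTEGER,
        creation_date TEXT,     -- ISO-8601 text, so it sorts chronologically
        onhand_qty    INTEGER
    )
""")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (123, 1, "2019-02-03", 654),
        (234, 3, "2019-02-03", 987),
        (789, 5, "2019-02-03", 874),
        (321, 4, "2019-02-03", 147),
        (567, 7, "2019-02-03", 632),
        (123, 1, "2019-01-29", 547),
        (234, 3, "2019-01-29", 814),
        (789, 5, "2019-01-29", 458),
        (321, 4, "2019-01-29", 330),
        (567, 7, "2019-01-29", 118),
    ],
)

# Newest row first within each (customer, item) group; keep rn = 1 only.
query = """
    SELECT item_number, customer, creation_date, onhand_qty
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY customer, item_number
                                  ORDER BY creation_date DESC) AS rn
        FROM mytable t
    ) x
    WHERE rn = 1
    ORDER BY customer
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```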
Use row_number():
select * from (
    select t.*, row_number() over(partition by Customer, Item_Number order by creation_date desc, onhand_qty desc) rn
    from mytable t
) x where x.rn = 1
You can try using row_number() with partition by Customer, Item_Number order by creation_date desc in the OVER clause:
select * from
(
    select t.*, row_number() over(partition by Customer, Item_Number order by creation_date desc) rn from mytable t
) A where rn = 1