Query to check if column value appears more than once - sql

I have a PostgreSQL query:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id"
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
I want to add something like: if contract_id occurs more than once, then change_effective_date <= now()
Dataset:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-17 00:00:00
200 | 10 | 1 | 2020-05-16 00:00:00
300 | 10 | 1 | 2020-05-14 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
500 | 30 | 1 | 2020-05-13 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
Expected result:
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
200 | 10 | 1 | 2020-05-16 00:00:00
400 | 20 | 1 | 2020-05-17 00:00:00
600 | 30 | 1 | 2020-05-14 00:00:00
If the count of contract_id is more than 1, I want a row with change_effective_date less than or equal to today.
I tried using:
SELECT DISTINCT ON ("contract"."contract_id") "contract"."id",
COUNT("contract"."contract_id") AS cnt
FROM "contract_versions" "contract"
WHERE "contract"."client_id" = 1 AND
CASE WHEN "cnt" > 1 THEN "contract"."change_effective_date" <= now() END
GROUP BY "contract"."contract_id", "contract"."id"
ORDER BY "contract"."contract_id", "contract"."change_effective_date" DESC
but it throws an error: column "cnt" does not exist
Thanks

I have never seen DISTINCT ON combined with GROUP BY. It is possible (aggregate first, then pick rows from that aggregation result), but this is not what you are doing and it is not what you want either. You want the ranking applied by the ORDER BY clause for the DISTINCT ON to take the current date into account.
SELECT DISTINCT ON (contract_id) *
FROM contract_versions
WHERE client_id = 1
ORDER BY
contract_id,
CASE WHEN change_effective_date <= CURRENT_DATE THEN 1 ELSE 2 END,
change_effective_date DESC;
Demo: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=f709d18b23504dfaf9f586ace231be1d
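To make the ranking idea concrete, here is a minimal Python sketch of what the ORDER BY plus DISTINCT ON does, run on the sample rows from the question. The fixed value of today (2020-05-16) is an assumption chosen to match the expected result; it stands in for CURRENT_DATE.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (id, contract_id, client_id, change_effective_date).
rows = [
    (100, 10, 1, date(2020, 5, 17)),
    (200, 10, 1, date(2020, 5, 16)),
    (300, 10, 1, date(2020, 5, 14)),
    (400, 20, 1, date(2020, 5, 17)),
    (500, 30, 1, date(2020, 5, 13)),
    (600, 30, 1, date(2020, 5, 14)),
]
today = date(2020, 5, 16)  # assumption: stand-in for CURRENT_DATE


def sort_key(row):
    _id, contract_id, _client_id, effective = row
    # Mirrors the ORDER BY: contract, then past-or-today before future, then newest first.
    return (contract_id, 1 if effective <= today else 2, -effective.toordinal())


ranked = sorted((r for r in rows if r[2] == 1), key=sort_key)
# DISTINCT ON keeps the first row of each contract_id group.
picked_ids = [next(group)[0] for _, group in groupby(ranked, key=lambda r: r[1])]
print(picked_ids)
```

A contract whose only version lies in the future (contract 20 here) is still returned, because the CASE expression only reorders rows and never filters them out.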

There are 2 problems in your queries:
In the first query you have:
ERROR: column "contract.change_effective_date" must appear in the
GROUP BY clause or be used in an aggregate function
In the second, you cannot reference a column alias ("cnt") in the WHERE clause of the same query. You can reference it only from an outer query that uses the original query as a derived table (inline view).
Here is something that could be a solution:
select * from contract_versions;
id | contract_id | client_id | change_effective_date
-----+-------------+-----------+-----------------------
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 10 | 1 | 2020-05-14 14:00:00
100 | 20 | 1 | 2020-05-16 09:00:00
(4 rows)
Null display is "NULL".
SELECT DISTINCT ON (contract_id) contract_id,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | change_effective_date
-------------+-----------------------
10 | 2020-05-14 14:00:00
20 | 2020-05-16 09:00:00
(2 rows)
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC;
contract_id | cnt | change_effective_date
-------------+-----+-----------------------
10 | 3 | 2020-05-14 14:00:00
20 | 1 | 2020-05-16 09:00:00
(2 rows)
SELECT
contract_id,
CASE WHEN cnt > 1 THEN change_effective_date <= now()
END
FROM
(
SELECT DISTINCT ON (contract_id) contract_id,
COUNT(contract_id) AS cnt,
change_effective_date
FROM contract_versions contract
WHERE client_id = 1
GROUP BY contract_id, id, change_effective_date
ORDER BY contract_id, change_effective_date DESC
) v;
contract_id | case
-------------+------
10 | t
20 | NULL
(2 rows)
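The derived-table idea can also be used to filter rows rather than only display the condition. The sketch below is an illustration, not the answer's exact query: it runs on SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, uses a window count instead of GROUP BY, loads the question's dataset, and assumes today is 2020-05-16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contract_versions "
    "(id INTEGER, contract_id INTEGER, client_id INTEGER, change_effective_date TEXT)"
)
conn.executemany(
    "INSERT INTO contract_versions VALUES (?, ?, ?, ?)",
    [
        (100, 10, 1, "2020-05-17 00:00:00"),
        (200, 10, 1, "2020-05-16 00:00:00"),
        (300, 10, 1, "2020-05-14 00:00:00"),
        (400, 20, 1, "2020-05-17 00:00:00"),
        (500, 30, 1, "2020-05-13 00:00:00"),
        (600, 30, 1, "2020-05-14 00:00:00"),
    ],
)

# The alias cnt is defined in the inner query and only referenced in the outer one.
sql = """
SELECT id, contract_id, cnt
FROM (
    SELECT id, contract_id, change_effective_date,
           COUNT(*) OVER (PARTITION BY contract_id) AS cnt
    FROM contract_versions
    WHERE client_id = 1
) v
WHERE cnt = 1 OR date(change_effective_date) <= :today
ORDER BY id
"""
filtered = conn.execute(sql, {"today": "2020-05-16"}).fetchall()  # assumed "today"
print(filtered)
```

This still leaves several candidate rows per contract, so a final step that keeps the newest row per contract_id is needed to reach the expected result.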

With row_number() window function:
select t.id, t.contract_id, t.client_id, t.change_effective_date
from (
select *,
row_number() over (
partition by contract_id
order by (change_effective_date > now())::int, change_effective_date desc
) rn
from contract_versions
) t
where t.rn = 1
order by t.id
Because your sample data contains only one client_id, it is not clear from your question whether a given contract_id always belongs to a single client_id.
If it does not, then in the above query you must change:
partition by contract_id
to:
partition by contract_id, client_id
See the demo.
Results:
| id | contract_id | client_id | change_effective_date |
| --- | ----------- | --------- | ------------------------ |
| 200 | 10 | 1 | 2020-05-16 00:00:00.000 |
| 400 | 20 | 1 | 2020-05-17 00:00:00.000 |
| 600 | 30 | 1 | 2020-05-14 00:00:00.000 |
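As a rough check of the partitioning advice, the sketch below runs the row_number() query with partition by contract_id, client_id on SQLite (via Python's sqlite3, standing in for PostgreSQL). The row with id 700 for client_id 2 is hypothetical and not part of the question's data, and today is assumed to be 2020-05-16.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contract_versions "
    "(id INTEGER, contract_id INTEGER, client_id INTEGER, change_effective_date TEXT)"
)
conn.executemany(
    "INSERT INTO contract_versions VALUES (?, ?, ?, ?)",
    [
        (100, 10, 1, "2020-05-17 00:00:00"),
        (200, 10, 1, "2020-05-16 00:00:00"),
        (300, 10, 1, "2020-05-14 00:00:00"),
        (400, 20, 1, "2020-05-17 00:00:00"),
        (500, 30, 1, "2020-05-13 00:00:00"),
        (600, 30, 1, "2020-05-14 00:00:00"),
        (700, 10, 2, "2020-05-10 00:00:00"),  # hypothetical row for a second client
    ],
)

# Future rows sort last (the comparison yields 0 or 1), then newest first.
sql = """
SELECT id
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY contract_id, client_id
               ORDER BY (date(change_effective_date) > :today), change_effective_date DESC
           ) AS rn
    FROM contract_versions
) t
WHERE rn = 1
ORDER BY id
"""
winner_ids = [row[0] for row in conn.execute(sql, {"today": "2020-05-16"})]
print(winner_ids)
```

With partition by contract_id alone, rows 200 and 700 would compete for contract 10 and only one of them would be returned.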

Related

Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way, but maybe there's a better way.
I was able to get this to work using lag with partition by and order by:
select
date,
product,
count,
count - lag(count) over (partition by product order by date, product) as difference
from(
select
date,
product,
count(*) as count
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
) t
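A small runnable sketch of the same lag() pattern, using SQLite through Python's sqlite3 as a stand-in engine. The rows in t1 are made up so that the counts stay small; only the table name t1 and the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (yyyy_mm_dd TEXT, product TEXT)")
# Made-up rows: one row per event, so count(*) per date and product is easy to verify.
sample = (
    [("2020-03-31", "p1")] * 1
    + [("2020-07-31", "p1")] * 3
    + [("2020-09-30", "p1")] * 2
    + [("2020-03-31", "p2")] * 2
    + [("2020-07-31", "p2")] * 2
)
conn.executemany("INSERT INTO t1 VALUES (?, ?)", sample)

# Aggregate first, then subtract the previous row's count within each product.
sql = """
SELECT dt, product, cnt,
       cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY dt) AS difference
FROM (
    SELECT yyyy_mm_dd AS dt, product, COUNT(*) AS cnt
    FROM t1
    GROUP BY yyyy_mm_dd, product
) t
ORDER BY product, dt
"""
differences = conn.execute(sql).fetchall()
for row in differences:
    print(row)
```

The first row of each product has no previous row, so lag() returns NULL and the difference is NULL (None in Python).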

postgresql query to generate report with multiple columns

I have a customer transaction table in a PostgreSQL database with the columns below:
transactionId (primary) | customerId (int8) | transactionDate (timestamp)
1 | 2 | 2020-02-14
2 | 3 | 2020-01-08
3 | 1 | 2020-02-06
4 | 2 | 2020-02-13
5 | 2 | 2020-03-24
I need to build a query that produces the report below:
CustomerId | FirstTransaction | TotalTransactions | Transactions/Week | RecentTransactions
1 | 2020-02-06 | 1 | 1 | 2020-02-06
3 | 2020-01-08 | 1 | 1 | 2020-01-08
2 | 2020-02-13 | 3 | 2 | 2020-03-24
For each customer, the report should show when the customer made their first transaction, the total number of transactions, the transaction frequency per week, and the date of the most recent transaction.
The report should cover only the last 3 months of records.
Try the following, here is the demo.
with cte as
(
select
*,
count(*) over (partition by customerId) as totalTransactions,
1 + floor((extract(day from transactionDate) - 1) / 7) as transactionsWeek
from myTable
where transactionDate >= '2020-01-01'
and transactionDate <= '2020-03-31'
)
select
customerId,
min(transactionDate) as firstTransaction,
max(totalTransactions) as totalTransactions,
max(transactionDate) as recentTransactions,
(ceil(avg(totalTransactions)/count(distinct transactionsWeek))::int) as "Transactions/Week"
from cte
group by
customerId
order by
customerId
Output:
| customerid | firsttransaction | totaltransactions | recenttransactions | Transactions/Week |
| ---------- | ------------------------ | ----------------- | ------------------------ | ----------------- |
| 1 | 2020-02-06 | 1 | 2020-02-06 | 1 |
| 2 | 2020-02-13 | 3 | 2020-03-24 | 2 |
| 3 | 2020-01-08 | 1 | 2020-01-08 | 1 |
To restrict the report to the last three months, you can also use the following in the WHERE clause:
transactionDate > CURRENT_DATE - INTERVAL '3 months'
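To see how the Transactions/Week figure comes out, here is a plain-Python sketch of the same arithmetic on the question's rows. It follows the answer's week bucket, 1 + floor((day - 1) / 7), which is a week-of-month number, so transactions from different months that fall in the same bucket count as one week; whether that is the intended definition of frequency is an open assumption.

```python
import math
from collections import defaultdict
from datetime import date

# Rows from the question: (transactionId, customerId, transactionDate).
transactions = [
    (1, 2, date(2020, 2, 14)),
    (2, 3, date(2020, 1, 8)),
    (3, 1, date(2020, 2, 6)),
    (4, 2, date(2020, 2, 13)),
    (5, 2, date(2020, 3, 24)),
]
window_start, window_end = date(2020, 1, 1), date(2020, 3, 31)  # same range as the CTE

by_customer = defaultdict(list)
for _tid, customer_id, tx_date in transactions:
    if window_start <= tx_date <= window_end:
        by_customer[customer_id].append(tx_date)

report = {}
for customer_id, dates in sorted(by_customer.items()):
    # Week-of-month bucket, as in the answer's transactionsWeek column.
    week_buckets = {1 + (d.day - 1) // 7 for d in dates}
    report[customer_id] = {
        "first": min(dates),
        "total": len(dates),
        "recent": max(dates),
        "per_week": math.ceil(len(dates) / len(week_buckets)),
    }

for customer_id, line in report.items():
    print(customer_id, line)
```

For customer 2, the two February transactions fall in bucket 2 and the March one in bucket 4, so 3 transactions over 2 distinct buckets round up to 2 per week.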

Get users who took ride for 3 or more consecutive dates

I have below table, it shows user_id and ride_date.
+---------+------------+
| user_id | ride_date |
+---------+------------+
| 1 | 2019-11-01 |
| 1 | 2019-11-03 |
| 1 | 2019-11-05 |
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-05 |
| 4 | 2019-11-07 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
| 5 | 2019-11-11 |
| 5 | 2019-11-13 |
+---------+------------+
I want the user_id of each user who took rides on 3 or more consecutive days, along with the days on which they took those consecutive rides.
The desired result is as below
+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
| 2 | 2019-11-03 |
| 2 | 2019-11-04 |
| 2 | 2019-11-05 |
| 2 | 2019-11-06 |
| 3 | 2019-11-03 |
| 3 | 2019-11-04 |
| 3 | 2019-11-05 |
| 3 | 2019-11-06 |
| 4 | 2019-11-07 |
| 4 | 2019-11-08 |
| 4 | 2019-11-09 |
+---------+-----------------------+
SQL Fiddle
With LAG() and LEAD() window functions:
with cte as (
select *,
datediff(
day,
lag([ride_date]) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev1,
datediff(
day,
lag([ride_date], 2) over (partition by [user_id] order by [ride_date]),
[ride_date]
) prev2,
datediff(
day,
[ride_date],
lead([ride_date]) over (partition by [user_id] order by [ride_date])
) next1,
datediff(
day,
[ride_date],
lead([ride_date], 2) over (partition by [user_id] order by [ride_date])
) next2
from Table1
)
select [user_id], [ride_date]
from cte
where
(prev1 = 1 and prev2 = 2) or
(prev1 = 1 and next1 = 1) or
(next1 = 1 and next2 = 2)
See the demo.
Results:
> user_id | ride_date
> ------: | :---------
> 2 | 03/11/2019
> 2 | 04/11/2019
> 2 | 05/11/2019
> 2 | 06/11/2019
> 3 | 03/11/2019
> 3 | 04/11/2019
> 3 | 05/11/2019
> 3 | 06/11/2019
> 4 | 07/11/2019
> 4 | 08/11/2019
> 4 | 09/11/2019
Here is one way to address this gaps-and-islands problem:
first, assign a rank to each user ride with row_number(), and recover the previous ride_date (aliased lag_ride_date)
then, compare the date of the previous ride to the current one in a conditional running sum that increases when the dates are consecutive; subtracting this sum from the rank of the user ride gives groups (aliased grp) that represent consecutive rides with a 1-day spacing
do a window count of how many records belong to each group (aliased cnt)
filter on records whose window count is at least 3
Query:
select user_id, ride_date
from (
select
t.*,
count(*) over(partition by user_id, grp) cnt
from (
select
t.*,
rn1
- sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
over(partition by user_id order by ride_date) grp
from (
select
t.*,
row_number() over(partition by user_id order by ride_date) rn1,
lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
from Table1 t
) t
) t
) t
where cnt >= 3
Demo on DB Fiddle
This is a typical gaps-and-islands problem.
We can solve it as follows:
with data
as (
select user_id
,ride_date
,dateadd(day
,-row_number() over(partition by user_id order by ride_date asc)
,ride_date) as grp_field
from Table1
)
,consecutive_days
as(
select user_id
,ride_date
,count(*) over(partition by user_id,grp_field) as cnt
from data
)
select *
from consecutive_days
where cnt>=3
order by user_id,ride_date
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=7bb851d9a12966b54afb4d8b144f3d46
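The date-minus-row-number trick can be tried out with a short sketch. This one uses SQLite through Python's sqlite3 rather than SQL Server, so DATEADD is replaced by SQLite's date() modifiers; the table name rides is an assumption, and only users 1, 2 and 4 from the question are loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (user_id INTEGER, ride_date TEXT)")
conn.executemany(
    "INSERT INTO rides VALUES (?, ?)",
    [
        (1, "2019-11-01"), (1, "2019-11-03"), (1, "2019-11-05"),
        (2, "2019-11-03"), (2, "2019-11-04"), (2, "2019-11-05"), (2, "2019-11-06"),
        (4, "2019-11-05"), (4, "2019-11-07"), (4, "2019-11-08"), (4, "2019-11-09"),
    ],
)

# Consecutive dates minus their row number collapse to the same anchor date (grp_field).
sql = """
WITH numbered AS (
    SELECT user_id, ride_date,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY ride_date) AS rn
    FROM rides
),
data AS (
    SELECT user_id, ride_date,
           date(ride_date, '-' || rn || ' days') AS grp_field
    FROM numbered
),
consecutive_days AS (
    SELECT user_id, ride_date,
           COUNT(*) OVER (PARTITION BY user_id, grp_field) AS cnt
    FROM data
)
SELECT user_id, ride_date
FROM consecutive_days
WHERE cnt >= 3
ORDER BY user_id, ride_date
"""
streak_rows = conn.execute(sql).fetchall()
for row in streak_rows:
    print(row)
```

For user 4, the rides on 2019-11-07, 08 and 09 have row numbers 2, 3 and 4, so all three map to the anchor 2019-11-05 and form one group of size 3, while the ride on 2019-11-05 maps to 2019-11-04 and is left out.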
There is no need to apply gaps-and-islands methodologies to this problem. The problem is much simpler to solve.
You can return the users and first date just by using LEAD():
SELECT t1.*
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1
WHERE ride_date_2 = DATEADD(day, 2, ride_date);
If you want the actual dates, you can unpivot the results:
SELECT DISTINCT t1.user_id, v.ride_date
FROM (SELECT t1.*,
LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
FROM table1 t1
) t1 CROSS APPLY
(VALUES (t1.ride_date),
(DATEADD(day, 1, t1.ride_date)),
(DATEADD(day, 2, t1.ride_date))
) v(ride_date)
WHERE t1.ride_date_2 = DATEADD(day, 2, t1.ride_date)
ORDER BY t1.user_id, v.ride_date;
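The logic of the LEAD(ride_date, 2) approach can be sketched in plain Python as follows. It assumes, as the SQL does, that each user has at most one row per date; only users 1, 2 and 4 from the question are included.

```python
from collections import defaultdict
from datetime import date, timedelta

rides = [
    (1, date(2019, 11, 1)), (1, date(2019, 11, 3)), (1, date(2019, 11, 5)),
    (2, date(2019, 11, 3)), (2, date(2019, 11, 4)), (2, date(2019, 11, 5)), (2, date(2019, 11, 6)),
    (4, date(2019, 11, 5)), (4, date(2019, 11, 7)), (4, date(2019, 11, 8)), (4, date(2019, 11, 9)),
]

by_user = defaultdict(list)
for user_id, ride_date in rides:
    by_user[user_id].append(ride_date)

consecutive = set()
for user_id, dates in by_user.items():
    dates.sort()
    for i in range(len(dates) - 2):
        # Equivalent of LEAD(ride_date, 2) = DATEADD(day, 2, ride_date):
        # with distinct dates, the ride two rows ahead being exactly two days later
        # means the row in between is the day in the middle.
        if dates[i + 2] == dates[i] + timedelta(days=2):
            # "Unpivot" the streak start into its three days; the set plays the role of DISTINCT.
            for offset in range(3):
                consecutive.add((user_id, dates[i] + timedelta(days=offset)))

result = sorted(consecutive)
for row in result:
    print(row)
```

Overlapping streaks, such as user 2's four days in a row, produce repeated dates, which is why the SQL version needs SELECT DISTINCT.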

Redshift count with variable

Imagine I have a table on Redshift with a structure similar to the one below. Product_Bill_ID is the primary key of this table.
| Store_ID | Product_Bill_ID | Payment_Date
| 1 | 1 | 01/10/2016 11:49:33
| 1 | 2 | 01/10/2016 12:38:56
| 1 | 3 | 01/10/2016 12:55:02
| 2 | 4 | 01/10/2016 16:25:05
| 2 | 5 | 02/10/2016 08:02:28
| 3 | 6 | 03/10/2016 02:32:09
If I want to query the number of Product_Bill_ID that a store sold in the first hour after it sold its first Product_Bill_ID, how could I do this?
For this example, the output should be:
| Store_ID | First_Payment_Date | Sold_First_Hour
| 1 | 01/10/2016 11:49:33 | 2
| 2 | 01/10/2016 16:25:05 | 1
| 3 | 03/10/2016 02:32:09 | 1
You need to get the first payment date for each store, which marks the start of the first hour. That is easy enough using window functions:
select s.*,
min(payment_date) over (partition by store_id) as first_payment_date
from sales s
Then, you need to do the date filtering and aggregation:
select store_id, count(*)
from (select s.*,
min(payment_date) over (partition by store_id) as first_payment_date
from sales s
) s
where payment_date <= first_payment_date + interval '1 hour'
group by store_id;
SELECT
store_id,
first_payment_date,
SUM(
CASE WHEN payment_date < DATEADD(hour, 1, first_payment_date) THEN 1 END
) AS sold_first_hour
FROM
(
SELECT
*,
MIN(payment_date) OVER (PARTITION BY store_id) AS first_payment_date
FROM
yourtable
)
parsed_table
GROUP BY
store_id,
first_payment_date
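The window-minimum pattern used in both answers can be checked with a short sketch. It runs on SQLite through Python's sqlite3 instead of Redshift, so DATEADD(hour, 1, ...) becomes datetime(..., '+1 hour'); the table name sales follows the first answer, and the question's timestamps are rewritten in ISO format so that text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (store_id INTEGER, product_bill_id INTEGER PRIMARY KEY, payment_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, 1, "2016-10-01 11:49:33"),
        (1, 2, "2016-10-01 12:38:56"),
        (1, 3, "2016-10-01 12:55:02"),
        (2, 4, "2016-10-01 16:25:05"),
        (2, 5, "2016-10-02 08:02:28"),
        (3, 6, "2016-10-03 02:32:09"),
    ],
)

# Attach each store's first payment to every row, then keep rows within one hour of it.
sql = """
SELECT store_id, first_payment_date, COUNT(*) AS sold_first_hour
FROM (
    SELECT s.*,
           MIN(payment_date) OVER (PARTITION BY store_id) AS first_payment_date
    FROM sales s
) s
WHERE payment_date < datetime(first_payment_date, '+1 hour')
GROUP BY store_id, first_payment_date
ORDER BY store_id
"""
first_hour = conn.execute(sql).fetchall()
for row in first_hour:
    print(row)
```

The two answers differ only at the boundary: one uses <= and the other <, so a sale exactly one hour after the first payment is counted by the former and not by the latter.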

Sql Query with time interval

I want to get the best 3 days for each user between "2014-07-01" and "2014-08-01".
Could someone help me? I've been stuck on this for 3 days.
In the real score table, entries run from 10:00 to 22:00, with 1 entry for each hour.
That is a total of 12 entries per day for each player (sometimes 1 or 2 fewer).
This is the output I'm trying to get:
ID | User_ID | Username | Sum(Score) | Date
--------------------------------------------------
1 | 1 | Xxx | 52 | 2014-07-01
2 | 1 | Xxx | 143 | 2014-07-02
3 | 2 | Yyy | 63 | 2014-07-01
...
Score table:
ID | User_ID | Score | Datetime
-----------------------------------------
1 | 1 | 35 | 2014-07-01 11:00:00
2 | 1 | 17 | 2014-07-01 12:00:00
3 | 2 | 36 | 2014-07-01 11:00:00
4 | 2 | 27 | 2014-07-01 12:00:00
5 | 1 | 66 | 2014-07-02 11:00:00
6 | 1 | 77 | 2014-07-02 12:00:00
7 | 2 | 93 | 2014-07-02 12:00:00
...
User table :
ID | Username
--------------
1 | Xxx
2 | Yyy
3 | Zzz
...
I think you need to aggregate first by date, and then choose the first three using row_number(). To do the aggregation:
select s.user_id, date_trunc('day', s.datetime) as theday, sum(s.score) as score,
row_number() over (partition by s.user_id order by sum(s.score) desc) as seqnum
from scores s
group by s.user_id, date_trunc('day', s.datetime);
To get the rest of the information, use this as a subquery or CTE:
select u.*, s.theday, s.score
from (select s.user_id, date_trunc('day', s.datetime) as theday, sum(s.score) as score,
row_number() over (partition by s.user_id order by sum(s.score) desc) as seqnum
from scores s
group by s.user_id, date_trunc('day', s.datetime)
) s join
users u
on s.user_id = u.id
where s.seqnum <= 3
order by u.id, s.score desc;
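A runnable sketch of the aggregate-then-rank approach, using SQLite through Python's sqlite3 as a stand-in engine. Several details are assumptions made for illustration: the column is named score_time instead of Datetime, date() replaces date_trunc, the date range is treated as inclusive of 2014-08-01, and the ranking is done in an outer query over the per-day totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.execute(
    "CREATE TABLE scores (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER, score_time TEXT)"
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Xxx"), (2, "Yyy"), (3, "Zzz")])
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        (1, 1, 35, "2014-07-01 11:00:00"),
        (2, 1, 17, "2014-07-01 12:00:00"),
        (3, 2, 36, "2014-07-01 11:00:00"),
        (4, 2, 27, "2014-07-01 12:00:00"),
        (5, 1, 66, "2014-07-02 11:00:00"),
        (6, 1, 77, "2014-07-02 12:00:00"),
        (7, 2, 93, "2014-07-02 12:00:00"),
    ],
)

SQL = """
SELECT u.id, u.username, d.theday, d.score
FROM (
    SELECT user_id, theday, score,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY score DESC) AS seqnum
    FROM (
        -- Step 1: one total per user and day inside the requested range.
        SELECT user_id, date(score_time) AS theday, SUM(score) AS score
        FROM scores
        WHERE score_time >= '2014-07-01' AND score_time < '2014-08-02'
        GROUP BY user_id, date(score_time)
    ) per_day
) d
JOIN users u ON u.id = d.user_id
WHERE d.seqnum <= ?
ORDER BY u.id, d.score DESC
"""


def best_days(connection, top_n):
    """Return each user's top_n days by total score (step 2: rank, then filter)."""
    return connection.execute(SQL, (top_n,)).fetchall()


top_three = best_days(conn, 3)
top_one = best_days(conn, 1)
print(top_three)
print(top_one)
```

The sample data covers only two days per user, so top_n = 3 returns every day; top_n = 1 shows the ranking filter actually cutting rows.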
SELECT 'group has no id' as ID,
u.ID as User_ID,
u.Username,
sum(s.Score) "Sum(Score)",
s.Datetime::date as Date
FROM User u,
Score s
WHERE u.id = s.User_ID
AND s.Datetime BETWEEN '2014-07-01' AND '2014-08-01 23:59:59'
GROUP BY u.ID, u.Username, s.Datetime::date
ORDER BY sum(s.Score) DESC
LIMIT 3;