postgresql query to generate report with multiple columns - sql

I have a customer transaction table in a PostgreSQL database with the columns below:
transactionId (primary) | customerId (int8) | transactionDate (timestamp)
------------------------|-------------------|----------------------------
1                       | 2                 | 2020-02-14
2                       | 3                 | 2020-01-08
3                       | 1                 | 2020-02-06
4                       | 2                 | 2020-02-13
5                       | 2                 | 2020-03-24
I need to build a query to create the report below:
CustomerId | FirstTransaction | TotalTransactions | Transactions/Week | RecentTransactions
-----------|------------------|-------------------|-------------------|-------------------
1          | 2020-02-06       | 1                 | 1                 | 2020-02-06
3          | 2020-01-08       | 1                 | 1                 | 2020-01-08
2          | 2020-02-13       | 3                 | 2                 | 2020-03-24
That is: when the customer first transacted, the total number of transactions, the frequency per week, and the recency of the last transaction. The report should contain only the last 3 months of records.

Try the following; here is the demo.
with cte as
(
    select
        *,
        -- total number of transactions per customer in the date range
        count(*) over (partition by customerId) as totalTransactions,
        -- week-of-month bucket (1-5) the transaction falls into
        1 + floor((extract(day from transactionDate) - 1) / 7) as transactionsWeek
    from myTable
    where transactionDate >= '2020-01-01'
      and transactionDate <= '2020-03-31'
)
select
    customerId,
    min(transactionDate) as firstTransaction,
    max(totalTransactions) as totalTransactions,
    max(transactionDate) as recentTransactions,
    ceil(avg(totalTransactions) / count(distinct transactionsWeek))::int as "Transactions/Week"
from cte
group by customerId
order by customerId
Output:
| customerid | firsttransaction | totaltransactions | recenttransactions | Transactions/Week |
| ---------- | ---------------- | ----------------- | ------------------ | ----------------- |
| 1          | 2020-02-06       | 1                 | 2020-02-06         | 1                 |
| 2          | 2020-02-13       | 3                 | 2020-03-24         | 2                 |
| 3          | 2020-01-08       | 1                 | 2020-01-08         | 1                 |
For a rolling window of the last three months you can instead use the following in the WHERE condition:
transactionDate > CURRENT_DATE - INTERVAL '3 months'
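If you would rather bucket by calendar week than by week-of-month (the expression above puts, for example, the second week of February and the second week of March into the same bucket), here is a sketch of that variant using date_trunc, combined with the rolling filter, against the same myTable:

with cte as
(
    select
        *,
        count(*) over (partition by customerId) as totalTransactions,
        -- Monday-based start of the calendar week of each transaction
        date_trunc('week', transactionDate) as transactionWeek
    from myTable
    where transactionDate > current_date - interval '3 months'
)
select
    customerId,
    min(transactionDate) as firstTransaction,
    max(totalTransactions) as totalTransactions,
    max(transactionDate) as recentTransactions,
    ceil(avg(totalTransactions) / count(distinct transactionWeek))::int as "Transactions/Week"
from cte
group by customerId
order by customerId;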

Related

SQL Server Get all Birthday Years

I have a table in SQL Server that is composed of
ID, B_Day
1, 1977-02-20
2, 2001-03-10
...
I want to add rows to this table for each year of a birthday, up to the current birthday year.
i.e.:
ID, B_Day
1,1977-02-20
1,1978-02-20
1,1979-02-20
...
1,2020-02-20
2, 2001-03-10
2, 2002-03-10
...
2, 2019-03-10
I'm struggling to determine the best strategy for accomplishing this. I thought about recursively self-joining, but that creates far too many layers. Any suggestions?
The following should work:
with row_gen as (
    -- master..spt_values is used as a numbers table to generate offsets 0-199
    select top 200 row_number() over (order by name) - 1 as rnk
    from master..spt_values
)
select a.id, a.b_day, dateadd(year, rnk, b_day) as incr_b_day
from dbo.t a
join row_gen b
  on dateadd(year, b.rnk, a.b_day) <= getdate()
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=0d06c95e1914ca45ca192d0d192bd2e0
You can use a recursive approach:
with cte as (
    select t.id, t.b_day, convert(date, getdate()) as mx_dt
    from mytable t
    union all
    select c.id, dateadd(year, 1, c.b_day), c.mx_dt
    from cte c
    where dateadd(year, 1, c.b_day) < c.mx_dt
)
select c.id, c.b_day
from cte c
order by c.id, c.b_day;
The default recursion limit is 100; you can add a query hint to allow more: OPTION (MAXRECURSION 0) removes the limit.
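For example, the hint goes at the very end of the statement, after the final ORDER BY:

select c.id, c.b_day
from cte c
order by c.id, c.b_day
option (maxrecursion 0);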
If your dataset is not too big, one option is to use a recursive query:
with cte as (
    select id, b_day as bday0, b_day, 1 as lvl
    from mytable
    union all
    select id, bday0, dateadd(year, lvl, bday0), lvl + 1
    from cte
    where dateadd(year, lvl, bday0) <= getdate()
)
select id, b_day from cte order by id, b_day
Demo on DB Fiddle:
id | b_day
-: | :---------
1 | 1977-02-20
1 | 1978-02-20
1 | 1979-02-20
1 | 1980-02-20
1 | 1981-02-20
1 | 1982-02-20
1 | 1983-02-20
1 | 1984-02-20
1 | 1985-02-20
1 | 1986-02-20
1 | 1987-02-20
1 | 1988-02-20
1 | 1989-02-20
1 | 1990-02-20
1 | 1991-02-20
1 | 1992-02-20
1 | 1993-02-20
1 | 1994-02-20
1 | 1995-02-20
1 | 1996-02-20
1 | 1997-02-20
1 | 1998-02-20
1 | 1999-02-20
1 | 2000-02-20
1 | 2001-02-20
1 | 2002-02-20
1 | 2003-02-20
1 | 2004-02-20
1 | 2005-02-20
1 | 2006-02-20
1 | 2007-02-20
1 | 2008-02-20
1 | 2009-02-20
1 | 2010-02-20
1 | 2011-02-20
1 | 2012-02-20
1 | 2013-02-20
1 | 2014-02-20
1 | 2015-02-20
1 | 2016-02-20
1 | 2017-02-20
1 | 2018-02-20
1 | 2019-02-20
1 | 2020-02-20
2 | 2001-03-01
2 | 2002-03-01
2 | 2003-03-01
2 | 2004-03-01
2 | 2005-03-01
2 | 2006-03-01
2 | 2007-03-01
2 | 2008-03-01
2 | 2009-03-01
2 | 2010-03-01
2 | 2011-03-01
2 | 2012-03-01
2 | 2013-03-01
2 | 2014-03-01
2 | 2015-03-01
2 | 2016-03-01
2 | 2017-03-01
2 | 2018-03-01
2 | 2019-03-01
2 | 2020-03-01

How to get total number of users in each status at End of Day based on event log table?

I have an event log table which captures the status changes of all users, say status A, status B and status C. They can change status whenever they want. How can I get a snapshot of how many users are in each status at every end of day (from the earliest day in the event log table till the latest day)?
I'd appreciate it if anyone can show me how to do it in PostgreSQL in an elegant way. Thanks!
Edit: the event log table captures a bunch of events (one of them being a status change) for every user; log_id records the order of the event log entries for that particular user.
user_id | log_time | status | event_A | log_id |
----------------------------------------------------------
456 | 2019-01-05 15:00 | C | | 5 |
123 | 2019-01-05 14:00 | C | | 4 |
123 | 2019-01-05 13:00 | | xxx | 3 |
456 | 2019-01-04 22:00 | B | | 4 |
456 | 2019-01-04 10:00 | C | xxx | 3 |
987 | 2019-01-04 05:00 | C | | 3 |
123 | 2019-01-03 23:00 | B | | 2 |
987 | 2019-01-03 15:00 | | xxx | 2 |
456 | 2019-01-02 22:00 | A | xxx | 2 |
123 | 2019-01-01 23:00 | C | | 1 |
456 | 2019-01-01 09:00 | B | xxx | 1 |
987 | 2019-01-01 04:00 | A | | 1 |
So I want to get the total number of users in each status at End of Day:
Date | status A | status B | status C |
---------------------------------------------
2019-01-05 | 0 | 0 | 3 |
2019-01-04 | 0 | 2 | 1 |
2019-01-03 | 2 | 1 | 0 |
2019-01-02 | 2 | 0 | 1 |
2019-01-01 | 1 | 1 | 1 |
This was quite challenging to do :). I tried to break the query into sub-queries for readability. It is probably not a very efficient way to do what you want, but it does the job.
-- collect all days to make sure there are no missing days
with all_days_cte(dt) as (
    select generate_series(
        (select min(date_trunc('day', log_time)) from your_table),
        (select max(date_trunc('day', log_time)) from your_table),
        '1 day'
    )::DATE
),
-- collect all users
all_users_cte as (
    select distinct user_id
    from your_table
),
-- set up the info needed, i.e. only the last status per day and user_id
infos_to_aggregate_cte as (
    select s.user_id, s.dt, s.status
    from (
        select
            user_id,
            date_trunc('day', log_time)::DATE as dt,
            status,
            row_number() over (partition by user_id, date_trunc('day', log_time)
                               order by log_time desc) as rn
        from your_table
        where status is not null
    ) s
    -- only the last status of the day
    where s.rn = 1
),
-- now we still have a problem: we need to find the last status if there was no change on a given day
completed_infos_cte as (
    select
        u.user_id,
        d.dt,
        -- not very efficient, but I found no other way (first_value(...) would be nice,
        -- but there is no simple way to exclude nulls)
        (select status
         from infos_to_aggregate_cte i2
         where i2.user_id = u.user_id
           and i2.dt <= d.dt
           and i2.status is not null
         order by i2.dt desc
         limit 1) as status
    from all_days_cte d
    -- cross product of all dates and users (that is what we need for the aggregation)
    cross join all_users_cte u
    left outer join infos_to_aggregate_cte i on u.user_id = i.user_id
                                            and d.dt = i.dt
)
select
    c.dt,
    sum(case when status = 'A' then 1 else 0 end) as status_a,
    sum(case when status = 'B' then 1 else 0 end) as status_b,
    sum(case when status = 'C' then 1 else 0 end) as status_c
from completed_infos_cte c
group by c.dt
order by c.dt desc
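As an aside, the correlated subquery in completed_infos_cte could be replaced with a window-based gap fill: count(status) over an ordered window increments only on non-null rows, which assigns each status change plus the unchanged days after it to one group, and max(status) within that group carries the last known status forward. A sketch of that variant, reusing the CTEs above:

completed_infos_cte as (
    select user_id, dt,
           -- each group contains at most one non-null status; max() picks it
           max(status) over (partition by user_id, grp) as status
    from (
        select u.user_id, d.dt, i.status,
               -- running count of non-null statuses defines the groups
               count(i.status) over (partition by u.user_id order by d.dt) as grp
        from all_days_cte d
        cross join all_users_cte u
        left join infos_to_aggregate_cte i
               on i.user_id = u.user_id and i.dt = d.dt
    ) s
)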

Looking for duplicate transactions within a 5 minutes over a 24 hour time period

I am looking for duplicate transactions within a 5 minute window during a 24 hour period. I am trying to find users abusing other users' access. Here is what I have so far, but it only searches the past 5 minutes, not the whole 24 hour period. It is Oracle.
SELECT p.id, Count(*) count
FROM tranledg tl,
     patron p
WHERE p.id = tl.patronid
  AND tl.trandate > (sysdate - 5/1440)
  AND tl.plandesignation in ('1')
  AND p.id in (select tl2.patronid from tranledg tl2 where tl2.trandate > (sysdate - 1))
GROUP BY p.id
HAVING COUNT(*) > 1
Example data:
Patron
id | Name
--------------------------
1 | Joe
2 | Henry
3 | Tom
4 | Mary
5 | Sue
6 | Marie
Tranledg
tranid | trandate | location | patronid
--------------------------
1 | 2015-03-01 12:01:00 | 1500 | 1
2 | 2015-03-01 12:01:15 | 1500 | 2
3 | 2015-03-01 12:03:30 | 1500 | 1
4 | 2015-03-01 12:04:00 | 1500 | 3
5 | 2015-03-01 15:01:00 | 1500 | 4
6 | 2015-03-01 15:01:15 | 1500 | 4
7 | 2015-03-01 17:01:15 | 1500 | 2
8 | 2015-03-01 18:01:30 | 1500 | 1
9 | 2015-03-01 19:02:00 | 1500 | 3
10 | 2015-03-01 20:01:00 | 1500 | 4
11 | 2015-03-01 21:01:00 | 1500 | 5
I would expect the following data to return:
ID | COUNT
1 | 2
4 | 2
You can use an analytic clause with a range window like this:
select *
from (select tranid
, patronid
, count(*) over(partition by patronid
order by trandate
range between 0 preceding
and 5/60/24 following) count
from tranledg
where trandate >= sysdate-1)
where count > 1
It will output all transactions that are followed by more transactions for the same patronid within a range of 5 minutes, along with the count of transactions in the range (you did not specify what to do if there is more than one such range or if the ranges overlap).
Output on the test data (without the sysdate condition, as those dates have already passed):
TRANID PATRONID COUNT
------ -------- -----
1 1 2
5 4 2
I did it using Postgres online; the Oracle version is very similar, just be careful with the date arithmetic.
SQL DEMO
You need a self join.
SELECT T1.patronid, count(*)
FROM Tranledg T1
JOIN Tranledg T2
ON T2."trandate" BETWEEN T1."trandate" + '-2 minute' AND T1."trandate" + '2 minute'
AND T1."patronid" = T2."patronid"
AND T1."tranid" <> T2."tranid"
GROUP BY T1.patronid;
OUTPUT
You need to fix the data, so 1 has two records.
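For reference, a sketch of the equivalent Oracle self join, assuming the same Tranledg table and using the question's full ±5 minute window; in Oracle, adding a number to a DATE adds days, so 5 minutes is 5/1440:

SELECT t1.patronid, count(*) AS dup_count
FROM tranledg t1
JOIN tranledg t2
  ON t2.patronid = t1.patronid
 AND t2.tranid  <> t1.tranid
 AND t2.trandate BETWEEN t1.trandate - 5/1440 AND t1.trandate + 5/1440
WHERE t1.trandate > sysdate - 1
GROUP BY t1.patronid;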

Weekly Average Reports: Redshift

My sales data for the first two weeks of June (Monday dates, i.e. 1st Jun and 8th Jun) is below:
date | count
2015-06-01 03:25:53 | 1
2015-06-01 03:28:51 | 1
2015-06-01 03:49:16 | 1
2015-06-01 04:54:14 | 1
2015-06-01 08:46:15 | 1
2015-06-01 13:14:09 | 1
2015-06-01 16:20:13 | 5
2015-06-01 16:22:13 | 1
2015-06-01 16:27:07 | 1
2015-06-01 16:29:57 | 1
2015-06-01 19:16:45 | 1
2015-06-08 10:54:46 | 1
2015-06-08 15:12:10 | 1
2015-06-08 20:35:40 | 1
I need to find the weekly average of sales in a given range.
Complex Query:
(some_manipulation_part), ifact as
(
    select date, sales_count from final_result_set
)
select date_part('h', date) as h,
       date_part('dow', date) as day_of_week,
       count(sales_count)
from final_result_set
group by h, day_of_week;
Output :
h | day_of_week | count
3 | 1 | 3
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 8
19 | 1 | 1
20 | 1 | 1
If I try to apply avg on the above final result, it is not actually fetching the correct answer!
(some_manipulation_part), ifact as
(
    select date, sales_count from final_result_set
)
select date_part('h', date) as h,
       date_part('dow', date) as day_of_week,
       avg(sales_count)
from final_result_set
group by h, day_of_week;
h | day_of_week | avg
3 | 1 | 1
4 | 1 | 1
8 | 1 | 1
10 | 1 | 1
13 | 1 | 1
15 | 1 | 1
16 | 1 | 1
19 | 1 | 1
20 | 1 | 1
So I have two Mondays in the given range, but it is not actually dividing by that. I am not even sure what is happening inside Redshift.
To get "weekly averages" use date_trunc():
SELECT date_trunc('week', my_date_column) as week
, avg(sales_count) AS avg_sales
FROM final_result_set
GROUP BY 1;
I hope you are not actually using date as the name for your date column. It's a reserved word in SQL and a basic type name; don't use it as an identifier.
If you group by the day of week (DOW) you get averages per weekday, and Sunday is 0. (Use ISODOW to get 7 for Sunday.)
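If what you are after is instead the average of weekly totals (sum the sales per calendar week first, then average across weeks), here is a sketch of that two-step aggregation, assuming the timestamp column has been renamed to sale_ts to avoid the reserved word:

SELECT avg(weekly_total) AS avg_weekly_sales
FROM (
    SELECT date_trunc('week', sale_ts) AS week,
           sum(sales_count) AS weekly_total
    FROM final_result_set
    GROUP BY 1
) w;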

How to determine an Increase in Employee Salary from consecutive Contract Rows?

I have a problem with my query.
My table stores data like this:
ContractID | Staff_ID | EffectDate | End Date | Salary | active
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 0
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 1
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 1
I would like to write a query like below:
ContractID | Staff_ID | EffectDate | End Date | Salary | Increase
-------------------------------------------------------------------------
1 | 1 | 2013-01-01 | 2013-12-30 | 100 | 0
2 | 1 | 2014-01-01 | 2014-12-30 | 150 | 50
3 | 1 | 2015-01-01 | 2015-12-30 | 200 | 50
4 | 2 | 2014-05-01 | 2015-04-30 | 500 | 0
5 | 2 | 2015-05-01 | 2016-04-30 | 700 | 200
-------------------------------------------------------------------------
The Increase column is calculated as the current contract's salary minus the previous contract's salary.
I use SQL Server 2008 R2.
Unfortunately 2008 R2 doesn't have access to LAG, but you can simulate the effect of obtaining the previous row (prev) in the scope of a current row (cur) with a ranking and a self join to the previous ranked row, in the same partition by Staff_ID:
With CTE AS
(
    SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary], [active],
           ROW_NUMBER() OVER (Partition BY Staff_ID ORDER BY ContractID) AS Rnk
    FROM Table1
)
SELECT cur.[ContractID], cur.[Staff_ID], cur.[EffectDate], cur.[End Date],
       cur.[Salary], cur.Rnk,
       CASE WHEN (cur.Rnk = 1) THEN 0 -- i.e. baseline salary
            ELSE cur.Salary - prev.Salary END AS Increase
FROM CTE cur
LEFT OUTER JOIN CTE prev
  ON cur.[Staff_ID] = prev.Staff_ID and cur.Rnk - 1 = prev.Rnk;
(If ContractID always incremented perfectly, we wouldn't need the ROW_NUMBER and could join on incrementing ContractIDs, but I didn't want to make this assumption.)
SqlFiddle here
Edit
If you have SQL Server 2012 or later, the LEAD and LAG analytic functions make this kind of query much simpler:
SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],
Salary - LAG(Salary, 1, Salary) OVER (Partition BY Staff_ID ORDER BY ContractID) AS Incr
FROM Table1
Updated SqlFiddle
One trick here is that we are calculating delta increments in salary, so for the first contract of each employee we need LAG to return the current salary as its default, so that Salary - Salary = 0 for the first increase.
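The same effect could be achieved by letting LAG default to NULL and coalescing afterwards; a sketch against the same Table1:

SELECT [ContractID], [Staff_ID], [EffectDate], [End Date], [Salary],
       -- COALESCE falls back to the row's own Salary on the first contract
       Salary - COALESCE(LAG(Salary) OVER (Partition BY Staff_ID ORDER BY ContractID), Salary) AS Incr
FROM Table1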