How to find changed names in a table by date_key - SQL

I have a table with three columns:
Date_key | user_name | user_id
2022-07-12 | milkcotton | 1
2022-09-12 | cereal | 2
2022-06-12 | musicbox1 | 3
2022-12-31 | harrybel1 | 1
2022-12-25 | milkcotton1| 4
2023-01-01 | cereal | 2
I want to find the users who changed their user_name within one semester (1 July 2022 - 31 December 2022). Can I do this?
My expected result is:
previous_name| new_name | user_id
milkcotton | harrybel1 | 1
Thank you!

Note: This is written for PostgreSQL. It should be similar in most SQL engines, although date functions may differ slightly.
Try this:
with BaseTbl as(
select *,
cast(to_char(Date_key, 'YYYYMM') as int) as year_month,
cast(to_char(Date_key, 'MM') as int) as month,
row_number() over(partition by user_id order by date_key desc) as rnk
from Table1
),
LatestTwoChanges as(
select *
from BaseTbl
where user_id in (select user_id from BaseTbl where rnk=2 )
and rnk <=2
)
select
t2.user_name as previous_name,
t1.user_name as new_name,
t1.user_id
from LatestTwoChanges t1
join LatestTwoChanges t2
on t1.user_id=t2.user_id
where t1.rnk=1
and t2.rnk=2
and t1.year_month-t2.year_month <6
and t1.user_name <> t2.user_name
and (t1.month <= 6) = (t2.month <= 6)
-- this checks whether both dates fall in the same semester.
SQL fiddle demo Here
Here, the table t1 contains the latest change and table t2 contains the previous change for a user_id.
The last filter condition
and (t1.month <= 6) = (t2.month <= 6)
makes sure that the two dates fall in the same semester, which means both months must be either between 1 and 6 or between 7 and 12. A check on the sum of the two months is not enough here, because a pair such as February and July (sum 9) straddles both halves of the year.
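If the goal is one fixed period rather than each user's latest two rows, a plain self-join restricted to that period is another option. The sketch below is only an illustration under stated assumptions, not the approach above: it loads the question's sample rows into SQLite through Python's sqlite3 module, stores date_key as ISO text (so string comparison orders dates correctly), and pairs every earlier row with every later row of the same user. A user with three or more different names inside the period would therefore produce more than one pair.

```python
import sqlite3

# Sample rows from the question: (date_key, user_name, user_id)
rows = [
    ("2022-07-12", "milkcotton", 1),
    ("2022-09-12", "cereal", 2),
    ("2022-06-12", "musicbox1", 3),
    ("2022-12-31", "harrybel1", 1),
    ("2022-12-25", "milkcotton1", 4),
    ("2023-01-01", "cereal", 2),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (date_key text, user_name text, user_id integer)")
conn.executemany("insert into Table1 values (?, ?, ?)", rows)

# Pair each row (t1) with an earlier row (t2) of the same user that has a
# different name, keeping only pairs where both rows lie inside the period.
query = """
select t2.user_name as previous_name, t1.user_name as new_name, t1.user_id
from Table1 t1
join Table1 t2
  on t1.user_id = t2.user_id
 and t2.date_key < t1.date_key
 and t1.user_name <> t2.user_name
where t1.date_key between :start and :end
  and t2.date_key between :start and :end
order by t1.user_id
"""
changes = conn.execute(query, {"start": "2022-07-01", "end": "2022-12-31"}).fetchall()
print(changes)
```

With the sample data this yields the single row from the expected result (milkcotton, harrybel1, 1).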

Related

Converting PostgreSQL recursive CTE to SQL Server

I'm having trouble adapting some recursive CTE code from PostgreSQL to SQL Server, from the book "Fighting Churn with Data"
This is the working PostgreSQL code:
with recursive
active_period_params as (
select interval '30 days' as allowed_gap,
'2021-09-30'::date as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= (e.start_date-allowed_gap)::date
)
select account_id, min(start_date) as start_date
from active
group by account_id
This is my attempt at converting to SQL Server. It gets stuck in a loop. I believe the issue has to do with the UNION ALL required by SQL Server.
with
active_period_params as (
select 30 as allowed_gap,
cast('2021-09-30' as date) as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION ALL
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
)
select account_id, min(start_date) as start_date
from active
group by account_id
The subscription table is a list of subscriptions belonging to customers. A customer can have multiple subscriptions with overlapping dates or gaps between dates. null end_date means the subscription is currently active and has no defined end_date. Example data for a single customer (account_id = 15) below:
subscription
---------------------------------------------------
| id | account_id | start_date | end_date |
---------------------------------------------------
| 6 | 15 | 01/06/2021 | null |
| 5 | 15 | 01/01/2021 | null |
| 4 | 15 | 01/06/2020 | 01/02/2021 |
| 3 | 15 | 01/04/2020 | 15/05/2020 |
| 2 | 15 | 01/03/2020 | 15/05/2020 |
| 1 | 15 | 01/06/2019 | 01/01/2020 |
Expected query result (as produced by PostgreSQL code):
------------------------------
| account_id | start_date |
------------------------------
| 15 | 01/03/2020 |
Issue:
The SQL Server code above gets stuck in a loop and doesn't produce a result.
Description of the PostgreSQL code:
1. The anchor block finds subs that are active as of the calc_date (30/09/2021) (id 5 & 6), and returns the min start_date (01/01/2021).
2. The recursion block then looks for any earlier subs that existed within the allowed_gap, which is 30 days prior to the min start_date found in 1). id 4 meets this criterion, so the new min start_date is 01/06/2020.
3. Recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (id 2 & 3), the new min start_date is 01/03/2020.
4. Recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days).
5. The query returns a start date of 01/03/2020 for account_id 15.
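The walk-back described in these steps can be sketched outside the database as a simple loop. This is a minimal illustration in Python, assuming the sample rows for account_id 15 with dates read as day/month/year. The function name active_since is hypothetical and is not part of either query.

```python
from datetime import date, timedelta

# Sample subscriptions for account_id 15: (id, start_date, end_date or None)
subs = [
    (6, date(2021, 6, 1), None),
    (5, date(2021, 1, 1), None),
    (4, date(2020, 6, 1), date(2021, 2, 1)),
    (3, date(2020, 4, 1), date(2020, 5, 15)),
    (2, date(2020, 3, 1), date(2020, 5, 15)),
    (1, date(2019, 6, 1), date(2020, 1, 1)),
]

def active_since(subs, calc_date, allowed_gap_days=30):
    """Return the earliest start_date reachable from the active subscriptions."""
    gap = timedelta(days=allowed_gap_days)
    # Anchor: subscriptions that are active on calc_date.
    active = [s for _, s, e in subs if s <= calc_date and (e is None or e > calc_date)]
    if not active:
        return None
    start = min(active)
    while True:
        # Recursive step: earlier subscriptions that ended no more than
        # allowed_gap_days before the current earliest start_date.
        earlier = [s for _, s, e in subs
                   if s < start and e is not None and e >= start - gap]
        if not earlier:
            return start
        start = min(earlier)

result = active_since(subs, date(2021, 9, 30))
print(result)  # 2020-03-01
```

The loop always terminates because each pass either stops or moves start to a strictly earlier date.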
Any help appreciated!
It seems the issue is related to the way SQL Server deals with recursive CTEs.
This is a type of gaps-and-islands problem, and does not actually require recursion.
There are a number of solutions, here is one. Given your requirement, there may be more efficient methods, but this should get you started.
Using LAG, we flag each row whose start_date is more than the allowed gap after the previous row's end_date; such a row starts a new group
We use a running COUNT of those flags to give each consecutive set of rows an ID
We group by that ID and take the minimum start_date, filtering out groups that have no subscription active on the calc date
We group again to get the minimum per account
DECLARE @allowed_gap int = 30,
@calc_date datetime = cast('2021-09-30' as date);
WITH PrevValues AS (
SELECT *,
IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
THEN 1 END
FROM subscription
),
Groups AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
ORDER BY start_date ROWS UNBOUNDED PRECEDING)
FROM PrevValues
),
ByGroup AS (
SELECT
account_id,
GroupId,
start_date = MIN(start_date)
FROM Groups
GROUP BY account_id, GroupId
HAVING COUNT(CASE WHEN start_date <= @calc_date
and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
account_id,
start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
db<>fiddle
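The grouping logic of this query can also be traced in plain Python, which may help when checking it against the sample rows. This is a sketch under assumptions: it handles a single account, reads the sample dates as day/month/year, and the name island_start is hypothetical. As in the SQL, an open-ended previous row (NULL end_date) never starts a new group.

```python
from datetime import date, timedelta

# (start_date, end_date or None) for account_id 15
subs = [
    (date(2021, 6, 1), None),
    (date(2021, 1, 1), None),
    (date(2020, 6, 1), date(2021, 2, 1)),
    (date(2020, 4, 1), date(2020, 5, 15)),
    (date(2020, 3, 1), date(2020, 5, 15)),
    (date(2019, 6, 1), date(2020, 1, 1)),
]

def island_start(subs, calc_date, allowed_gap_days=30):
    gap = timedelta(days=allowed_gap_days)
    islands = []      # each island is a list of (start, end) rows
    prev_end = None   # plays the role of LAG(end_date)
    for i, (start, end) in enumerate(sorted(subs, key=lambda s: s[0])):
        # Mirrors IsStart: the previous row ended more than `gap` before this start.
        is_start = i > 0 and prev_end is not None and prev_end < start - gap
        if i == 0 or is_start:
            islands.append([])
        islands[-1].append((start, end))
        prev_end = end
    # Keep islands with at least one row active on calc_date (the HAVING clause),
    # then take the earliest start among them (the final GROUP BY).
    qualifying = [
        min(s for s, _ in island)
        for island in islands
        if any(s <= calc_date and (e is None or e > calc_date) for s, e in island)
    ]
    return min(qualifying) if qualifying else None

result = island_start(subs, date(2021, 9, 30))
print(result)  # 2020-03-01
```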

Group by month and name SQL

I need some help with SQL.
I have
Table1 with columns Id, Date1 and Date2
Table2 with columns Table1Id and Table2Id
Table3 with columns Id and Name
Here is my try:
with tmp_tab as (
select
v."Name" as name
, date_part('month', cv."OfferAcceptedDate") as MonthAcceptedName
, date_part('month', cv."OfferSentDate") as MonthSentName
, 1 as cntAcc
, 1 as cntSent
from hr_metrics."CvInfo" as cv
join hr_metrics."CvInfoVacancy" as civ
on civ."CvInfosId" = cv."Id"
join hr_metrics."Vacancy" as v
on civ."VacanciesId" = v."Id"
where cv."OfferSentDate" is not null
and date_part('year', cv."OfferSentDate") = date_part('year', CURRENT_DATE)
group by v."Name" , date_part('month', cv."OfferAcceptedDate"),
date_part('month', cv."OfferSentDate")
)
select distinct
tmp_tab."name" as name,
tmp_tab.MonthSentName as mSent,
tmp_tab.MonthAcceptedName as mAcc,
Sum(tmp_tab.cntSent) as sented,
Sum(tmp_tab.cntacc) as accepted
from tmp_tab as tmp_tab
group by tmp_tab.name, tmp_tab.MonthSentName, tmp_tab.MonthAcceptedName;
I need to compute Count(date2)/Count(date1) grouped by month and name.
I have no idea how to do that, as there is no table of months.
DB - Postgres
sample data from comment:
t1
1 | 01/01/2021 | 31/03/2021
2 | 05/01/2021 | 18/01/2021
3 | 12/01/2021 | 31/01/2021
4 | 13/03/2021 | 22/03/2021
t2
1 | 1
2 | 1
3 | 2
4 | 1
t3
1 | SomeName1
2 | someName2
Desired result:
Name | month | value
SomeName1 | 1 | 1\2
SomeName1 | 3 | 2
SomeName2 | 1 | 1
Update: if count(date2) == 0, then count(date2) = -1
Source answer
Here is the code that works for my question. And yes, I've asked it on ru too.
select name, month, sum((SRC=1)::int) as AcceptedCount, sum((SRC=2)::int) as SentCount,
case when sum((SRC=1)::int) = 0 then -1
else sum((SRC=2)::int)::float / sum((SRC=1)::int) end as Result
from (
select v.name, SRC,
extract('month' from case SRC when 1 then OfferAcceptedDate else OfferSentDate end) as month
from (select (date_part('year', CURRENT_DATE)::char(4) || '-01-01')::timestamptz as from_date) x
cross join (select 1 as SRC union all select 2) s
join CvInfo as cv on (SRC=1 and cv.OfferAcceptedDate >= from_date and cv.OfferAcceptedDate < from_date + interval '1 year')
or (SRC=2 and cv.OfferSentDate >= from_date and cv.OfferSentDate < from_date + interval '1 year')
join CvInfoVacancy as civ on civ.CvInfosId = cv.Id
join Vacancy as v on civ.VacanciesId = v.Id
where case SRC when 1 then OfferAcceptedDate else OfferSentDate end is not null
) x
group by name, month

SQL Select Statement for Time and attendance for a month

Can anyone help with this one, please? Our attendance system generates the following data:
Empid Department Timestamp Read_ID
3221 IT 2017-01-29 11:12:00.000 1
5565 IT 2017-01-29 12:28:06.000 1
5565 IT 2017-01-29 12:28:07.000 1
3221 IT 2017-01-29 13:12:00.000 2
5565 IT 2017-01-29 13:28:06.000 2
3221 IT 2017-01-30 07:42:15.000 1
3221 IT 2017-01-30 16:16:15.000 2
3221 IT 2017-01-31 09:05:00.000 1
3221 IT 2017-01-31 11:05:00.000 2
3221 IT 2017-01-31 13:20:00.000 1
3221 IT 2017-01-31 16:10:00.000 2
where the Read_ID values are:
1 = Entry
2 = Exit
I'm looking for an SQL query to run on MS SQL 2014 that summarizes attendance time for each employee on a monthly basis, for instance:
Empid Department Year Month TotalHours
3221 IT 2017 1 15:24
5565 IT 2017 1 01:00
This query should give you the result you need. It works by selecting each entry and joining it with the next exit of the same employee (entries without a later exit are ignored): this gives us the duration of each shift. The results are then aggregated and the shift durations are summed in each group.
SELECT
t1.empid,
t1.department,
YEAR(t1.timestamp) Year,
MONTH(t1.timestamp) Month,
CONVERT(
varchar(12),
DATEADD(minute, SUM(DATEDIFF(minute, t1.timestamp, t2.timestamp)), 0),
114
) TotalHours
FROM
mytable t1
INNER JOIN mytable t2
ON t1.empid = t2.empid
AND t2.read_id = 2
AND t2.timestamp = (
SELECT MIN(timestamp)
FROM mytable
WHERE
read_id = 2
AND empid = t2.empid
AND timestamp > t1.timestamp
)
WHERE
t1.read_id = 1
GROUP BY t1.empid, t1.department, YEAR(t1.timestamp), MONTH(t1.timestamp)
ORDER BY 1, 2, 3, 4
Returns :
empid | department | Year | Month | TotalHours
----: | :--------- | ---: | ----: | :-----------
3221 | IT | 2017 | 1 | 15:24:00:000
5565 | IT | 2017 | 1 | 02:00:00:000
DB Fiddle demo on SQL Server 2014
There is an edge case, however, where an employee enters twice and then exits (this happens in your data, where employee 5565 enters at 29/01/2017 12:28:06 and at 29/01/2017 12:28:07, and then exits at 29/01/2017 13:28:06). The above query takes both overlapping entries into account and maps them to the same exit, resulting in this hour of work being counted twice.
This gives 02:00 for employee 5565 rather than the 01:00 in your expected results. Here is an alternative query that, if several consecutive entries of the same employee happen, only takes the latest one into account:
SELECT
t1.empid,
t1.department,
YEAR(t1.timestamp) Year,
MONTH(t1.timestamp) Month,
CONVERT(
varchar(12),
DATEADD(minute, SUM(DATEDIFF(minute, t1.timestamp, t2.timestamp)), 0),
114
) TotalHours
FROM
mytable t1
INNER JOIN mytable t2
ON t1.empid = t2.empid
AND t2.read_id = 2
AND t2.timestamp = (
SELECT MIN(timestamp)
FROM mytable
WHERE
read_id = 2
AND empid = t2.empid
AND timestamp > t1.timestamp
)
WHERE
t1.read_id = 1
AND NOT EXISTS (
SELECT 1
FROM mytable
WHERE
read_id = 1
AND empid = t1.empid
AND timestamp > t1.timestamp
AND timestamp < t2.timestamp
)
GROUP BY t1.empid, t1.department, YEAR(t1.timestamp), MONTH(t1.timestamp)
ORDER BY 1, 2, 3, 4
Returns :
empid | department | Year | Month | TotalHours
----: | :--------- | ---: | ----: | :-----------
3221 | IT | 2017 | 1 | 15:24:00:000
5565 | IT | 2017 | 1 | 01:00:00:000
DB fiddle
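The entry/exit pairing used by both queries can be sketched in Python to check the totals by hand. The sketch assumes the sample rows from the question; the names monthly_seconds and as_hms are hypothetical. It sums whole seconds rather than minute boundaries, so employee 5565 comes out as 00:59:59 in the latest-entry-only variant, where DATEDIFF(minute, ...) reports 01:00. The formatter keeps hour counts above 24 as they are, which matters for monthly totals.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows: (empid, timestamp, read_id) with 1 = entry, 2 = exit
reads = [
    (3221, "2017-01-29 11:12:00", 1),
    (5565, "2017-01-29 12:28:06", 1),
    (5565, "2017-01-29 12:28:07", 1),
    (3221, "2017-01-29 13:12:00", 2),
    (5565, "2017-01-29 13:28:06", 2),
    (3221, "2017-01-30 07:42:15", 1),
    (3221, "2017-01-30 16:16:15", 2),
    (3221, "2017-01-31 09:05:00", 1),
    (3221, "2017-01-31 11:05:00", 2),
    (3221, "2017-01-31 13:20:00", 1),
    (3221, "2017-01-31 16:10:00", 2),
]

def monthly_seconds(reads, latest_entry_only):
    """Sum entry-to-next-exit durations per (empid, year, month)."""
    by_emp = defaultdict(list)
    for emp, ts, read_id in reads:
        by_emp[emp].append((datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), read_id))
    totals = defaultdict(int)
    for emp, events in by_emp.items():
        pending = []  # entries still waiting for an exit
        for ts, read_id in sorted(events):
            if read_id == 1:
                pending = [ts] if latest_entry_only else pending + [ts]
            else:
                for entry in pending:
                    totals[(emp, entry.year, entry.month)] += int((ts - entry).total_seconds())
                pending = []
    return dict(totals)

def as_hms(seconds):
    """Format seconds as HH:MM:SS without rolling over at 24 hours."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

all_entries = monthly_seconds(reads, latest_entry_only=False)
latest_only = monthly_seconds(reads, latest_entry_only=True)
for key, secs in sorted(latest_only.items()):
    print(key, as_hms(secs))
```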
Try this. I was not sure what time format would satisfy your system, so I put both:
SELECT * INTO #Tbl3 FROM (VALUES
(3221,'IT','2017-01-29 11:12:00.000',1),
(5565,'IT','2017-01-29 12:28:06.000',1),
(5565,'IT','2017-01-29 12:28:07.000',1),
(3221,'IT','2017-01-29 13:12:00.000',2),
(5565,'IT','2017-01-29 13:28:06.000',2),
(3221,'IT','2017-01-30 07:42:15.000',1),
(3221,'IT','2017-01-30 16:16:15.000',2),
(3221,'IT','2017-01-31 09:05:00.000',1),
(3221,'IT','2017-01-31 11:05:00.000',2),
(3221,'IT','2017-01-31 13:20:00.000',1),
(3221,'IT','2017-01-31 16:10:00.000',2))
x (Empid,Department,Timestamp,Read_ID)
;With cte as (
SELECT t1.Empid, t1.Department
, [Year] = Year(t1.Timestamp)
, [Month] = Month(t1.Timestamp)
, Seconds = SUM(DATEDIFF(second, t1.Timestamp, t2.Timestamp))
FROM #Tbl3 as t1
OUTER APPLY (
SELECT Timestamp = MIN(t.Timestamp)
FROM #Tbl3 as t
WHERE t.Department = t1.Department and t.Empid = t1.Empid
and t.Timestamp > t1.Timestamp and t.Read_ID = 2
) as t2
WHERE t1.Read_ID = 1
GROUP BY t1.Empid, t1.Department, Year(t1.Timestamp), Month(t1.Timestamp))
SELECT *, TotalHours = Seconds / 3600., TotalTime =
RIGHT('0'+CAST(Seconds / 3600 as VARCHAR),2) + ':' +
RIGHT('0'+CAST((Seconds % 3600) / 60 as VARCHAR),2) + ':' +
RIGHT('0'+CAST(Seconds % 60 as VARCHAR),2)
FROM cte;

How to force zero values in Redshift?

If this is my query:
select
(min(timestamp))::date as date,
count(distinct user_id) as user_id_count,
(row_number() over (order by signup_day desc)-1) as days_since
from
data.table
where
timestamp >= current_date - 3
group by
timestamp
order by
timestamp asc;
And these are my results
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0
How can I get the table to also show the dates where the user ID count is 0?
date | user_id_count | days_since
------------+-----------------+-------------
2018-01-21 | 0 | 0
2018-01-22 | 3 | 1
2018-01-23 | 5 | 0
You need to generate the dates. In Postgres, generate_series() is the way to go:
select g.ts as dte,
count(distinct t.user_id) as user_id_count,
-- ordered by the generated date, since signup_day is not available after grouping
row_number() over (order by g.ts desc) - 1 as days_since
from generate_series(current_date::timestamp - interval '3 day', current_date::timestamp, interval '1 day') g(ts) left join
data.table t
on t.timestamp::date = g.ts
group by g.ts
order by g.ts;
You have to create a "calendar" table with dates and left join your aggregated result to it, like this:
with
aggregated_result as (
select ...
)
select
t1.date
,coalesce(t2.user_id_count,0) as user_id_count
,coalesce(t2.days_since,0) as days_since
from calendar t1
left join aggregated_result t2
using (date)
more on creating calendar table: How do I create a dates table in Redshift?
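The calendar-plus-left-join idea can be tried end to end with a small, self-contained sketch. It is an illustration under assumptions rather than Redshift code: it uses SQLite through Python's sqlite3 module, builds the calendar with a recursive CTE instead of a stored calendar table, and the events table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for data.table: one row per user event.
conn.execute("create table events (ts text, user_id integer)")
conn.executemany("insert into events values (?, ?)", [
    ("2018-01-22 09:00:00", 1),
    ("2018-01-22 10:30:00", 2),
    ("2018-01-22 11:45:00", 2),  # same user twice on one day, counted once
    ("2018-01-23 08:15:00", 3),
])

query = """
with recursive calendar(day) as (
    select :first_day
    union all
    select date(day, '+1 day') from calendar where day < :last_day
)
select c.day, count(distinct e.user_id) as user_id_count
from calendar c
left join events e on date(e.ts) = c.day
group by c.day
order by c.day
"""
rows = conn.execute(
    query, {"first_day": "2018-01-21", "last_day": "2018-01-23"}
).fetchall()
for row in rows:
    print(row)
```

Because count() ignores the NULLs produced by the left join, days without events show up with a count of 0 without needing coalesce.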

Display result between the period

I am wanting to display results where the date stored in the table is not between the dates specified in the query.
If last_Tran_date falls outside the range from_date to to_date,
then there are no transactions in that period,
so I would like to display the result.
example
last transaction date
1-JAN-16
2-JAN-16
8-FEB-16
10-MAC-16
PERIOD TO QUERY : (FROM 2-JAN-16 TO 8-FEB-16)
IF last transaction date not between the period query,
then display the result.
SELECT L.TDR_CODE||' - '||T.TDR_NAME TDR_CODE,L.CLIENT_NO,L.CLIENT_TYPE
,L.AMLA_RISK,L.ACCT_TYPE,L.CLIENT_NAME,L.DATE_CREATED,L.ANNUAL_INCOME
,L.NET_WORTH,L.ACCT_GROUP,L.PAIDUP_CAPITAL,L.SHAREHOLDER_FUND,L.OCCUPATION
,L.LAST_TRAN_DATE,K.CHQ_BANK,K.CHQ_NO,K.CHQ_AMT,decode(K.category,'3'
, decode(nvl(K.cancel_flag,'N'),'N',1,-2) ,0) chqamt_cash
FROM BOS_M_CLIENT L
, BOS_M_TRADER T,BOS_M_LEDGER_REC K
WHERE ((K.CHQ_NO IS NOT NULL AND K.CHQ_AMT>50000)
OR (K.CATEGORY='3' AND K.CHQ_AMT>10000))
AND L.PROHIBIT_TRADE<>'C'
AND L.CLIENT_NO = K.CLIENT_NO(+)
AND L.amla_risk='High'
AND L.TDR_CODE=T.TDR_CODE
AND L.tdr_code>=:P_FROM_TDR_CODE
AND L.tdr_code<=:P_TO_TDR_CODE
AND K.TRAN_DATE>=:P_FROM_DATE
AND K.TRAN_DATE<=:P_TO_DATE
AND L.LAST_TRAN_DATE NOT BETWEEN :P_FROM_DATE AND :P_TO_DATE
If there are "gaps" in your data then SQL will NOT display the missing data UNLESS you do something extra. e.g.
trans_date
2016-01-01
-- there is a "gap" here, there are "missing dates"
2016-01-12
What you now need is a set of rows for each date from 1-Jan to 12-Jan. There are many ways to get those rows. Below I have used "CONNECT BY LEVEL", which is an Oracle-specific technique, to demonstrate how we can find "missing dates":
CREATE TABLE YOURTABLE
(TRANS_DATE date);
INSERT INTO YOURTABLE (TRANS_DATE) VALUES (to_date('2016-01-01','yyyy-mm-dd'));
INSERT INTO YOURTABLE (TRANS_DATE) VALUES (to_date('2016-01-12','yyyy-mm-dd'));
2 rows affected
SELECT
c.cal_date, t.*
FROM (
SELECT to_date('2016-01-01','yyyy-mm-dd') + ROWNUM - 1 as cal_date
FROM (
SELECT ROWNUM FROM (
SELECT 1 FROM DUAL
CONNECT BY LEVEL <= (to_date('2016-01-12','yyyy-mm-dd') - (to_date('2016-01-01','yyyy-mm-dd')-1))
)
)
) c
LEFT JOIN yourtable t ON c.cal_date = t.trans_date
WHERE t.trans_date IS NOT NULL
ORDER BY c.cal_date
;
CAL_DATE | TRANS_DATE
:-------- | :---------
01-JAN-16 | 01-JAN-16
12-JAN-16 | 12-JAN-16
SELECT
c.cal_date, t.*
FROM (
SELECT to_date('2016-01-01','yyyy-mm-dd') + ROWNUM - 1 as cal_date
FROM (
SELECT ROWNUM FROM (
SELECT 1 FROM DUAL
CONNECT BY LEVEL <= (to_date('2016-01-12','yyyy-mm-dd') - (to_date('2016-01-01','yyyy-mm-dd')-1))
)
)
) c
LEFT JOIN yourtable t ON c.cal_date = t.trans_date
WHERE t.trans_date IS NULL
ORDER BY c.cal_date
;
CAL_DATE | TRANS_DATE
:-------- | :---------
02-JAN-16 | null
03-JAN-16 | null
04-JAN-16 | null
05-JAN-16 | null
06-JAN-16 | null
07-JAN-16 | null
08-JAN-16 | null
09-JAN-16 | null
10-JAN-16 | null
11-JAN-16 | null
dbfiddle here
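The same "generate every date, then keep the ones with no match" logic can be checked quickly outside Oracle. This is a minimal Python sketch assuming the two sample dates above; the function name missing_dates is hypothetical.

```python
from datetime import date, timedelta

def missing_dates(present, first, last):
    """Return every date from first to last (inclusive) that is not in present."""
    have = set(present)
    n_days = (last - first).days + 1
    calendar = (first + timedelta(days=i) for i in range(n_days))
    return [d for d in calendar if d not in have]

trans_dates = [date(2016, 1, 1), date(2016, 1, 12)]
gaps = missing_dates(trans_dates, date(2016, 1, 1), date(2016, 1, 12))
for d in gaps:
    print(d.isoformat())
```

For the sample data this lists 2016-01-02 through 2016-01-11, the same ten dates as the second query.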