Writing a Single Query with Multiple CTE Subqueries in SQL/R

I have some data I would like to pull from a database, and I'm using RStudio to run the query. What I intend to write is:
A first CTE that pulls all the necessary information.
A second CTE that adds two row-number columns, each partitioned by a different group, plus two more columns for the Lead and Lag values.
A third CTE that adds two more columns, each built from nested case_when statements, to give me the NewOpen and NewClosed dates.
What I have so far:
q5<- sqlQuery(ch,paste("
;with CTE AS
(
select
oz.id as AccountID
,ac.PROD_TYPE_CDE as ProductTypeCode
,CASE WHEN ac.OPEN_DTE='0001-01-01' then null else ac.OPEN_DTE END as OpenDate
,CASE WHEN ac.CLOS_DTE = '0001-01-01' then null else ac.CLOS_DTE END as ClosedDate
,df.proc_dte as FullDate
FROM
dbs.tb_dbs_acct_fact df
inner join
dbs.tb_acct_details ac on df.dw_serv_id = ac.dw_serv_id
left outer join
dbs.tb_oz_id oz on df.proc_dte = oz.proc_dte
),
cte1 as
(
select *
,row_nbr = row_number() over( partition by AccountID order by AccountID, FullDate asc )
,row_nbr2 = row_number() over( partition by AccountID,ProductTypeCode order by AccountID, FullDate asc )
,lag(ProductTypeCode) over(partition by AccountID order by FullDate asc ) as Lagging
,LEAD(ProductTypeCode) over(partition by AccountID order order by FullDate asc ) as Leading
FROM CTE
),
cte2 as (select *
,case when cte1.row_nbr = 1 & cte1.Lagging=cte1.ProductTypeCode then cte1.OpenDate else
case when cte1.Lagging<>cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewOpen
,case when cte1.ClosedDate IS NOT NULL then cte1.ClosedDate else
case when cte1.Leading <> cte1.ProductTypeCode then cte1.FullDate else NULL END END as NewClosed
FROM cte1
);"))
This code, however, won't run.

As mentioned, WITH only defines CTEs to be used in a final query. Your query contains only CTE definitions and never uses any of them in a final statement. Additionally, you can combine the first two CTEs, since window functions can run at any level. Possibly, the last CTE can serve as your final SELECT statement. Note also that conditions in SQL are combined with AND, not &.
sql <- "WITH CTE AS
(SELECT
oz.id AS AccountID
, ac.PROD_TYPE_CDE as ProductTypeCode
, CASE
WHEN ac.OPEN_DTE='0001-01-01'
THEN NULL
ELSE ac.OPEN_DTE
END AS OpenDate
, CASE
WHEN ac.CLOS_DTE = '0001-01-01'
THEN NULL
ELSE ac.CLOS_DTE
END AS ClosedDate
, df.proc_dte AS FullDate
, ROW_NUMBER() OVER (PARTITION BY oz.id
ORDER BY oz.id, df.proc_dte) AS row_nbr
, ROW_NUMBER() OVER (PARTITION BY oz.id, ac.PROD_TYPE_CDE
ORDER BY oz.id, df.proc_dte) AS row_nbr2
, LAG(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Lagging
, LEAD(ac.PROD_TYPE_CDE) OVER (PARTITION BY oz.id
ORDER BY df.proc_dte) AS Leading
FROM
dbs.tb_dbs_acct_fact df
INNER JOIN
dbs.tb_acct_details ac ON df.dw_serv_id = ac.dw_serv_id
LEFT OUTER JOIN
dbs.tb_oz_id oz ON df.proc_dte = oz.proc_dte
)
SELECT *
, CASE
WHEN row_nbr = 1 AND Lagging = ProductTypeCode
THEN OpenDate
ELSE
CASE
WHEN Lagging <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewOpen
, CASE
WHEN ClosedDate IS NOT NULL
THEN ClosedDate
ELSE
CASE
WHEN Leading <> ProductTypeCode
THEN FullDate
ELSE NULL
END
END AS NewClosed
FROM CTE;"
q5 <- sqlQuery(ch, sql)
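The point about the missing final statement can be checked in isolation. The following is a minimal sketch that uses Python's built-in sqlite3 module instead of the asker's SQL Server/RODBC connection (an assumption made only so the snippet runs anywhere); the CTE name `c` and its single column `x` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A WITH clause on its own only *defines* CTEs. Without a final statement
# that reads from them, the database rejects the query.
try:
    conn.execute("WITH c AS (SELECT 1 AS x)")
    cte_only_ok = True
except sqlite3.Error:
    cte_only_ok = False

# Adding a final SELECT that uses the CTE makes the statement complete.
rows = conn.execute("WITH c AS (SELECT 1 AS x) SELECT x FROM c").fetchall()

print(cte_only_ok)  # False
print(rows)         # [(1,)]
```

The exact error text differs between database engines, but the rule is the same: every WITH needs a final SELECT (or other statement) after the last closing parenthesis.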

Related

SQL - Adding conditions to SELECT

I have a table which has a timestamp and the inCycle status of a machine. I'm using two CTEs and doing an INNER JOIN on row number so I can easily compare the timestamp of one row to the next. I have the DATEDIFF working and now I need to look at the inCycle status. Basically, if inCycleThis and inCycleNext both = 1, I need to add the difference to an InCycle total.
Similarly (Shown table will make this clear):
incycleThis/next = 0,1 = not in cycle
incycleThis/next = 0,0 = not in cycle
incycleThis/next = 1,1 = in cycle
If I was doing this client side, this would be pretty simple. I need to do this in a stored procedure though due to there being a lot of records. I'd love to use an 'IF' in the SELECT section, but it seems that's not how it works.
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
This is what I have so far:
WITH History_CTE (DT, MID, FRO, IC, RowNum)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
),
History2_CTE (DT2, MID2, FRO2, IC2, RowNum2)
AS
(
SELECT DateAndTime
,MachineID
,FeedRateOverride
,InCycle
,ROW_NUMBER()OVER(ORDER BY MachineID, DateAndTime) AS "row number"
FROM History
WHERE DateAndTime >= '2020-11-15'
AND DateAndTime < '2020-11-16'
)
SELECT DT as 'TimeStamp'
,DT2 as 'TimeStamp Next Row'
,MID
,FRO
,IC as 'InCycle this'
,IC2 as 'InCycle next'
,RowNum
,DATEDIFF(s, History2_CTE.DT2, History_CTE.DT) AS 'Diff_seconds'
FROM History_CTE
INNER JOIN
History2_CTE ON History_CTE.RowNum = History2_CTE.RowNum2 + 1
Consider adding a third CTE that first conditionally calculates the value you need, then aggregate in the final statement. Recall that CTEs can reference previously defined CTEs. Be sure to always qualify columns with table aliases in JOIN queries.
WITH
... first two ctes...
, sub AS (
SELECT h1.DT AS 'TimeStamp'
, h2.DT2 AS 'TimeStamp Next Row'
, h1.MID
, h1.FRO
, h1.IC AS 'InCycle this'
, h2.IC2 AS 'InCycle next'
, h1.RowNum
, DATEDIFF(s, h2.DT2, h1.DT) AS 'Diff_seconds'
, CASE
WHEN (h1.IC = 1 AND h2.IC2 = 1) OR (h1.IC= 1 AND h2.IC2 = 0)
THEN DATEDIFF(s, h2.DT2, h1.DT)
END AS 'IC_Diff_seconds'
FROM History_CTE h1
INNER JOIN History2_CTE h2
ON h1.RowNum = h2.RowNum2 + 1
)
SELECT SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
And if needing to add groupings, incorporate GROUP BY:
SELECT sub.MID
, sub.FRO
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY sub.MID
, sub.FRO
Even aggregate calculations by day:
SELECT CONVERT(date, [TimeStamp]) AS [Day]
, SUM([Diff_seconds]) AS Diff_seconds_Total
, SUM([IC_Diff_seconds]) AS IC_Diff_seconds_Total
FROM sub
GROUP BY CONVERT(date, [TimeStamp])
The result I'm looking for at the end is simply: InCycle = Xtime. Something like:
SUM(Diff_seconds if((InCycleThis = 1 AND InCycleNext = 1) OR (InCycleThis = 1 AND InCycleNext = 0))
As I understand your question, you just need to sum the difference between the timestamp of each "in cycle" row and the timestamp of the next row.
select machineid,
sum(datediff(s, dateandtime, lead_dateandtime)) as total_in_time
from (
select h.*,
lead(dateandtime) over(partition by machineid order by dateandtime) as lead_dateandtime
from history h
) h
where incycle = 1
group by machineid
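Both answers come down to the same idea: pair each row with the next timestamp, then sum the gap only for rows that are in cycle. Below is a small sketch of that idea, run through Python's sqlite3 module. It makes several assumptions for the sake of a runnable example: the `history` table here is a made-up stand-in, timestamps are plain integer seconds (so a subtraction replaces SQL Server's DATEDIFF), and the sample rows are invented. Window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (machineid INTEGER, ts INTEGER, incycle INTEGER)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?, ?)",
    [
        (1, 0, 1), (1, 10, 1), (1, 30, 0), (1, 45, 1), (1, 50, 0),
        (2, 0, 0), (2, 100, 1), (2, 160, 1),
    ],
)

# LEAD pairs each row with the following timestamp of the same machine.
# SUM(CASE ...) then adds up the gap only for rows that are in cycle;
# rows where the CASE yields NULL are ignored by SUM.
totals = conn.execute(
    """
    SELECT machineid,
           SUM(next_ts - ts) AS total_seconds,
           SUM(CASE WHEN incycle = 1 THEN next_ts - ts END) AS in_cycle_seconds
    FROM (
        SELECT h.*,
               LEAD(ts) OVER (PARTITION BY machineid ORDER BY ts) AS next_ts
        FROM history h
    ) AS h
    GROUP BY machineid
    ORDER BY machineid
    """
).fetchall()

print(totals)  # [(1, 50, 35), (2, 160, 60)]
```

The last row of each machine has no following row, so its gap is NULL and contributes nothing to either sum.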

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
Now, the output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
Why this works is a little hard to explain. But if you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.
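To make the difference-of-row-numbers trick concrete, here is a sketch that runs the same query shape against the question's sample data using Python's sqlite3 module. The table name `prices` and the column name `day` are assumptions (the question calls the column Date), and SQLite 3.25 or newer is assumed for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (day INTEGER, price INTEGER)")
sample = [100, 100, 115, 120, 120, 100, 100, 120, 120, 120]
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [(day, price) for day, price in enumerate(sample, start=1)],
)

# Inner query: seqnum counts all rows, seqnum2 counts rows per price.
# Within one unbroken run of a price both counters advance together,
# so their difference stays constant and identifies the run.
inner = conn.execute(
    """
    SELECT day, price,
           ROW_NUMBER() OVER (ORDER BY day)
             - ROW_NUMBER() OVER (PARTITION BY price ORDER BY day) AS grp
    FROM prices
    ORDER BY day
    """
).fetchall()

ranges = conn.execute(
    """
    SELECT price, MIN(day) AS from_day, MAX(day) AS to_day
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (ORDER BY day) AS seqnum,
               ROW_NUMBER() OVER (PARTITION BY price ORDER BY day) AS seqnum2
        FROM prices t
    ) AS t
    GROUP BY price, (seqnum - seqnum2)
    ORDER BY MIN(day)
    """
).fetchall()

print([g for _, _, g in inner])  # [0, 0, 2, 3, 3, 3, 3, 5, 5, 5]
print(ranges)
# [(100, 1, 2), (115, 3, 3), (120, 4, 5), (100, 6, 7), (120, 8, 10)]
```

Note that days 4 to 7 all share the difference 3 even though the price changes from 120 to 100, which is why the GROUP BY needs both the price and the difference.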
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method using ROWS UNBOUNDED PRECEDING
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

MS SQL SERVER LAG

I'm trying to apply a condition to LAG in a SQL query. Does anyone know how to do this?
This is the query:
SELECT CONCAT([FirstName],' ',[LastName]) AS employee,
CAST([ArrivalTime] AS DATE) AS date,
CAST(DATEADD(hour,2,FORMAT([ArrivalTime],'HH:mm')) AS TIME) as time,
CASE [EventType]
WHEN 20001 THEN 'ENTRY'
ELSE 'EXIT'
END AS Action,
OutTime =
CASE [EventType]
WHEN '20001'
THEN DATEDIFF(minute,Lag([ArrivalTime],1) OVER(ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime])
ELSE
NULL
END
FROM [CCFTEvent].[dbo].[ReportEvent]
LEFT JOIN [CCFTCentral].[dbo].[Cardholder] ON [CCFTEvent].[dbo].[ReportEvent].[CardholderID] = [CCFTCentral].[dbo].[Cardholder].[FTItemID]
WHERE EventClass = 41
AND [FirstName] IS NOT NULL
AND [FirstName] LIKE 'Leeann%'
The problem I have is that when the two times being subtracted fall on different dates, the result must also be NULL.
The 910 is incorrect.
I'd add another condition to your CASE expression, i.e.:
...
CASE
WHEN [EventType] = '20001' AND DATEDIFF(DAY, LAG([ArrivalTime]) OVER (ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime]) > 0
THEN NULL
WHEN [EventType] = '20001'
THEN DATEDIFF(minute, LAG([ArrivalTime], 1) OVER (ORDER BY [CardHolderID], [ArrivalTime]), [ArrivalTime])
ELSE NULL
END
It seems to me that the LAG just needs to be partitioned by the date (& some other fields for good measure).
If the previous date is in another partition,
then the LAG will return NULL,
then the datediff will return NULL.
SELECT
CONCAT(holder.FirstName+' ', holder.LastName) AS employee,
CAST(repev.ArrivalTime AS DATE) AS [date],
CAST(SWITCHOFFSET(repev.ArrivalTime,'+02:00') AS TIME) as [time],
IIF(repev.EventType = 20001, 'ENTRY', 'EXIT') AS Action,
(CASE WHEN repev.EventType = 20001
THEN DATEDIFF(minute, LAG(repev.ArrivalTime)
OVER (PARTITION BY repev.EventClass, repev.CardholderID, CAST(repev.ArrivalTime AS DATE)
ORDER BY repev.ArrivalTime), repev.ArrivalTime)
END) AS OutTime
FROM [CCFTEvent].[dbo].[ReportEvent] AS repev
LEFT JOIN [CCFTCentral].[dbo].[Cardholder] AS holder ON holder.FTItemID = repev.CardholderID
WHERE repev.EventClass = 41
AND holder.FirstName LIKE 'Leeann%'
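The effect of partitioning the LAG by date can be seen on a tiny data set. This is a sketch only, run through Python's sqlite3 module rather than SQL Server: the `events` table, its columns, and the three sample rows are hypothetical, and the minute difference is computed from Unix seconds via strftime('%s', ...) because SQLite has no DATEDIFF.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (card_id INTEGER, arrival TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2021-03-01 08:00:00"),
        (1, "2021-03-01 12:30:00"),
        (1, "2021-03-02 08:05:00"),
    ],
)

# prev_any_day looks back across day boundaries; prev_same_day restarts
# for every calendar day, so the first event of a day gets NULL.
rows = conn.execute(
    """
    WITH lagged AS (
        SELECT arrival,
               LAG(arrival) OVER (PARTITION BY card_id
                                  ORDER BY arrival) AS prev_any_day,
               LAG(arrival) OVER (PARTITION BY card_id, date(arrival)
                                  ORDER BY arrival) AS prev_same_day
        FROM events
    )
    SELECT arrival,
           (CAST(strftime('%s', arrival) AS INTEGER)
              - CAST(strftime('%s', prev_any_day) AS INTEGER)) / 60,
           (CAST(strftime('%s', arrival) AS INTEGER)
              - CAST(strftime('%s', prev_same_day) AS INTEGER)) / 60
    FROM lagged
    ORDER BY arrival
    """
).fetchall()

for row in rows:
    print(row)
# ('2021-03-01 08:00:00', None, None)
# ('2021-03-01 12:30:00', 270, 270)
# ('2021-03-02 08:05:00', 1175, None)
```

The third row shows the difference: without the date in the partition the overnight gap of 1175 minutes is reported, while the date-partitioned version yields NULL because the previous event belongs to another day.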

How do I remove certain duplicates in a complex SQL query

I am writing a query and need it to remove all duplicates of a.GenUserID while keeping the most recent login date (b.LogDateTime), but that date must be older than 6 months. If there are later dates, they have to be removed.
I hope this makes sense.
SELECT DISTINCT
a.GenUserID,
c.DeletionDate,
b.LogDateTime,
(CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end)
FROM RioReport.dbo.GenUser a
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
AND c.DeletionDate IS NOT NULL
AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL))
ORDER BY a.GenUserID, b.LogDateTime desc
You could add the row_number() information to your query, and wrap that query into an outer query that just takes the records with number 1 from that result:
select *
from (
select a.GenUserID,
c.DeletionDate,
b.LogDateTime,
case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as disabled,
row_number() over (partition by a.GenUserID
order by b.LogDateTime desc) as rn
from RioReport.dbo.GenUser a
inner join dbo.GenUserArchive c
on a.GenUserID = c.GenUserID
left join dbo.GenUserAccessHistory b
on a.GenUserID = b.ExtraInfo
where (a.Disabled=0 or c.Disabled=0)
and c.DeletionDate is not null
and (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
) t
where rn = 1
order by GenUserID
Note that you can turn the first left join into an inner join without any change to the result set, since you have a non-null check on one of its fields. inner join is then preferred, and might give a performance improvement.
If GenUserAccessHistory.LogDateTime is always non-null, then you can avoid the test or b.LogDateTime is null by moving the DateAdd(MM, -6, GetDate()) > b.LogDateTime condition to the appropriate join on clause.
The generated row number will be given in order of descending LogDateTime values, and restart from 1 for every different user.
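The numbering behaviour described above can be tried out on a reduced example. The sketch below uses Python's sqlite3 module instead of SQL Server, a single hypothetical `logins` table in place of the three joined tables, and a fixed `cutoff` string standing in for DateAdd(MM, -6, GetDate()) so that the result is reproducible; all of these are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (user_id INTEGER, log_time TEXT)")
conn.executemany(
    "INSERT INTO logins VALUES (?, ?)",
    [
        (1, "2020-01-05"),
        (1, "2020-03-10"),  # too recent, filtered out before numbering
        (2, "2019-12-01"),
        (3, None),          # user who never logged in
    ],
)

cutoff = "2020-02-01"  # hypothetical stand-in for "six months ago"

# Rows newer than the cutoff are removed first; the remaining rows are
# numbered per user from the most recent login downwards, and only the
# row numbered 1 is kept.
latest = conn.execute(
    """
    SELECT user_id, log_time
    FROM (
        SELECT user_id, log_time,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY log_time DESC) AS rn
        FROM logins
        WHERE log_time < ? OR log_time IS NULL
    ) AS t
    WHERE rn = 1
    ORDER BY user_id
    """,
    (cutoff,),
).fetchall()

print(latest)  # [(1, '2020-01-05'), (2, '2019-12-01'), (3, None)]
```

Because the cutoff filter sits inside the derived table, the numbering only sees rows that already qualify, so user 1 keeps the older login rather than being dropped.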
Alternative without window functions
row_number() and the other ranking functions have been available since SQL Server 2005. In comments you write that you cannot use it. If that is the case, here is an alternative that relies only on a common table expression (also available since SQL Server 2005):
;with cte as (
select a.GenUserID,
c.DeletionDate,
b.LogDateTime,
case c.Disabled when 0 then 'NO' else 'YES - ARCHIVED' end as disabled
from RioReport.dbo.GenUser a
inner join dbo.GenUserArchive c
on a.GenUserID = c.GenUserID
left join dbo.GenUserAccessHistory b
on a.GenUserID = b.ExtraInfo
where (a.Disabled=0 or c.Disabled=0)
and c.DeletionDate is not null
and (DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime is null)
)
select *
from cte main
where LogDateTime is null
or not exists (select 1
from cte sub
where sub.GenUserID = main.GenUserID
and sub.LogDateTime > main.LogDateTime)
order by GenUserID
Try with the below query.
;WITH CTE_Group
AS(
SELECT
ROW_NUMBER() OVER (PARTITION BY a.GenUserID ORDER BY b.LogDateTime DESC) as RNO,
a.GenUserID,
c.DeletionDate,
b.LogDateTime,
(CASE c.Disabled WHEN 0 THEN 'NO' else 'YES - ARCHIVED' end) IsArchived
FROM RioReport.dbo.GenUser a
LEFT JOIN dbo.GenUserArchive c on a.GenUserID = c.GenUserID
LEFT JOIN dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo
WHERE(a.Disabled=0 or c.Disabled=0)
AND c.DeletionDate IS NOT NULL
AND ((DateAdd(MM, -6, GetDate()) > b.LogDateTime or b.LogDateTime IS NULL)))
SELECT GenUserID,
DeletionDate,
LogDateTime,
IsArchived
FROM CTE_Group
WHERE RNO=1
Use a CTE and a window function:
;with ctr as (
select a.GenUserID, c.DeletionDate, b.LogDateTime,
row_number() over (partition by a.GenUserID order by b.LogDateTime desc) as rnk
from RioReport.dbo.GenUser a
left join dbo.GenUserArchive c on a.GenUserID = c.GenUserID
left join dbo.GenUserAccessHistory b on a.GenUserID = b.ExtraInfo )
select a.GenUserID, a.DeletionDate, a.LogDateTime,
CASE WHEN DATEDIFF(mm, a.LogDateTime, getdate()) < 6 THEN 'NO' else 'YES - ARCHIVED' end as IsArchived
from ctr a where a.rnk = 1

CASE Statement inside a subquery

I was able to create the following query after help from the post below
select * from duppri t
where exists (
select 1
from duppri
where symbolUP = t.symbolUP
AND date = t.date
and price <> t.price)
ORDER BY date
SQL to check when pairs don't match
I have now realized that I need to add a case statement to indicate when all the above criteria fits, but the type value is equal between duppri and t.duppri. This occurs because of case sensitivity. This query is an attempt to clean up a portfolio accounting system that unfortunately allowed numerous duplicates because it didn't have strong referential integrity or constraints.
I would like the case statement to produce the column 'isMatch'
Date |Type|Symbol |SymbolUP |Concatt |Price |IsMatch
6/30/1995 |gaus|313586U72|313586U72|gaus313586U72|109.25|Different
6/30/1995 |gbus|313586U72|313586U72|gbus313586U72|108.94|Different
6/30/1995 |agus|SRR |SRR |agusSRR |10.25 |Different
6/30/1995 |lcus|SRR |SRR |lcusSRR |0.45 |Different
11/27/1996|lcus|LLY |LLY |lcusLLY |76.37 |Matched
11/27/1996|lcus|lly |LLY |lcusLLY |76 |Matched
11/28/1996|lcus|LLY |LLY |lcusLLY |76.37 |Matched
11/28/1996|lcus|lly |LLY |lcusLLY |76 |Matched
I tried the following CASE statement but it is creating errors
SELECT * from duppri t
where exists (
select 1,
CASE IsMatch WHEN [type] = [t.TYPE] THEN 'Matched' ELSE 'Different' END
from duppri
where symbolUP = t.symbolUP
AND date = t.date
and price <> t.price)
ORDER BY date
You could just use window functions, if I understand correctly:
select d.*,
(case when mint = maxt
then 'Matched' else 'Different'
end) as IsMatch
from (select d.*,
min(type) over (partition by symbolup, date) as mint,
max(type) over (partition by symbolup, date) as maxt,
min(price) over (partition by symbolup, date) as minp,
max(price) over (partition by symbolup, date) as maxp
from duppri d
) d
where minp <> maxp
order by date;
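As a rough check of the window-function approach, the sketch below runs the same query shape through Python's sqlite3 module on a few rows modelled on the question's table. The column name `day` (instead of date), the ISO date strings, and the extra single-row symbol IBM are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE duppri (day TEXT, type TEXT, symbolup TEXT, price REAL)")
conn.executemany(
    "INSERT INTO duppri VALUES (?, ?, ?, ?)",
    [
        ("1995-06-30", "gaus", "313586U72", 109.25),
        ("1995-06-30", "gbus", "313586U72", 108.94),
        ("1996-11-27", "lcus", "LLY", 76.37),
        ("1996-11-27", "lcus", "LLY", 76.0),
        ("1996-11-29", "lcus", "IBM", 50.0),  # no conflicting price
    ],
)

# MIN/MAX over the (symbolup, day) group are attached to every row.
# minp <> maxp keeps only groups with conflicting prices; mint = maxt
# tells whether all rows in the group share the same type.
rows = conn.execute(
    """
    SELECT day, type, symbolup, price,
           CASE WHEN mint = maxt THEN 'Matched' ELSE 'Different' END AS is_match
    FROM (
        SELECT d.*,
               MIN(type)  OVER (PARTITION BY symbolup, day) AS mint,
               MAX(type)  OVER (PARTITION BY symbolup, day) AS maxt,
               MIN(price) OVER (PARTITION BY symbolup, day) AS minp,
               MAX(price) OVER (PARTITION BY symbolup, day) AS maxp
        FROM duppri d
    ) AS d
    WHERE minp <> maxp
    ORDER BY day, price
    """
).fetchall()

flags = [(r[1], r[4]) for r in rows]
print(flags)
# [('gbus', 'Different'), ('gaus', 'Different'), ('lcus', 'Matched'), ('lcus', 'Matched')]
```

The IBM row drops out because its group has only one price, which mirrors the EXISTS filter in the original query.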
The subquery used with the EXISTS predicate can't return anything other than true/false, but you can accomplish what you want with a correlated subquery in the SELECT list, like this:
select
*,
(select
CASE when count(distinct type) = 1 THEN 'Matched' ELSE 'Different' END
from duppri
where symbolUP = t.symbolUP and date = t.date
) IsMatch
from duppri t
where exists (
select 1
from duppri
where symbolUP = t.symbolUP
and date = t.date
and price <> t.price);