As you can see in the image, this is the result of my query. How can I put the check-ins and check-outs on a single row, together with their date and time?
SELECT
TR.Part, TR.Descripcion,
TIO.Date,
M1 = CASE WHEN TIO.TypeOperation = 1 THEN 'Salio Almacen' END,
M2 = CASE WHEN TIO.TypeOperation = 0 THEN 'Entro Almacen' END,
TIO.TypeOperation
FROM
Mant.Tool_InOut TIO
JOIN
Mant.Tool_Register TR ON TIO.idTools = TR.Id
WHERE
CAST(TIO.Date AS date) BETWEEN CAST(@f1 AS date) AND CAST(@f2 AS date)
Start here:
SELECT
TR.Part, TR.Descripcion,
TIO.Date AS Salio_Almacen_Date
, (SELECT TOP 1 ead.Date
FROM Mant.Tool_InOut ead
WHERE ead.idTools = TIO.idTools
AND ead.TypeOperation = 0
AND ead.Date > TIO.Date
ORDER BY ead.Date
) As Entro_Almacen_Date
FROM
Mant.Tool_InOut TIO
JOIN
Mant.Tool_Register TR ON TIO.idTools = TR.Id
WHERE TIO.TypeOperation = 1
AND TIO.Date >= CAST(@f1 AS date) AND TIO.Date < DATEADD(day, 1, CAST(@f2 AS date))
Note this won't work for rows 10 and 11 in the sample, which are out of order from the others, or row 12, which has no matching TypeOperation 1 record. Row 7 will keep the NULL in the second place, but this seems correct.
The purpose here is to illustrate the challenge. Before we can give a complete solution, you need to understand the data better around why there are two rows in the first place, so you can tell us how you want to handle these edge cases.
Finally, note the change to the date checks in the WHERE clause. Avoiding the CAST() on the date field allows you to make better use of any index on the field, which can have a drastic impact on query performance.
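To make the pairing idea easier to experiment with, here is a minimal sketch that runs the same kind of correlated subquery against an in-memory SQLite database from Python. The table name `tool_inout` and the sample rows are made up for illustration, `MIN()` stands in for T-SQL's `TOP 1 ... ORDER BY`, and it assumes the convention used in the question's query (1 = left the warehouse, 0 = came back).

```python
import sqlite3

# Hypothetical miniature of Mant.Tool_InOut.
# Assumption: TypeOperation 1 = left the warehouse, 0 = came back.
ROWS = [
    (1, 1, "2024-01-01 08:00:00"),  # tool 1 goes out
    (1, 0, "2024-01-01 17:00:00"),  # tool 1 comes back
    (1, 1, "2024-01-02 08:00:00"),  # tool 1 goes out again, not returned yet
    (2, 1, "2024-01-01 09:00:00"),  # tool 2 goes out
    (2, 0, "2024-01-03 10:00:00"),  # tool 2 comes back
]

PAIRING_SQL = """
SELECT o.idTools,
       o.Date AS Salio_Almacen_Date,
       (SELECT MIN(i.Date)              -- earliest return after this exit
          FROM tool_inout i
         WHERE i.idTools = o.idTools
           AND i.TypeOperation = 0
           AND i.Date > o.Date) AS Entro_Almacen_Date
  FROM tool_inout o
 WHERE o.TypeOperation = 1
 ORDER BY o.idTools, o.Date
"""


def pair_movements(rows):
    """Return one row per exit, with the next return date or None."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tool_inout (idTools INTEGER, TypeOperation INTEGER, Date TEXT)"
    )
    con.executemany("INSERT INTO tool_inout VALUES (?, ?, ?)", rows)
    result = con.execute(PAIRING_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in pair_movements(ROWS):
        print(row)
```

An exit with no later return keeps a NULL (None in Python) in the second date column. Two exits in a row with no return in between would both pick up the same return date, which is one of the edge cases you would need to decide how to handle.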
I imagine you want to know, for any given date, whether a part has returned to the warehouse (almacen) or not.
If that is the case, then I guess it would be something like this:
SELECT
TR.Part
, TR.Descripcion
, M1 = 'Salio'
, T1 = MIN(TS.[Date])
, M2 = 'Entro'
, T2 = CASE WHEN MAX(TE.[Date]) > MAX(TS.[Date]) THEN MAX(TE.[Date]) ELSE NULL END
FROM Mant.Tool_Register TR
JOIN Mant.Tool_InOut TS ON TS.idTools = TR.Id
AND TS.TypeOperation = 1
JOIN Mant.Tool_InOut TE ON TE.idTools = TR.Id
AND TE.TypeOperation = 0
WHERE CAST(TS.Date AS date) BETWEEN CAST(@F1 AS date) AND CAST(@F2 AS date)
AND CAST(TE.Date AS date) BETWEEN CAST(@F1 AS date) AND CAST(@F2 AS date)
GROUP BY TR.Part, TR.Descripcion
Following that logic, it doesn't work for me at all.
Here is the raw table data. As you can see in the column, a value of zero tells me when the product left my warehouse, and a value of one tells me when it returned.
How can I place them next to each other?
Everything in this query works except for the second LEFT JOIN, the one that uses BEGIN_DATE and END_DATE. Because I have to group by those additional columns so they can be used in the join's ON clause, I am getting false numbers. Is there any way to do this without having to group by them? I hope this makes sense. Basically, because I have to include BEGIN_DATE and END_DATE in the GROUP BY, the counts are no longer correct.
SELECT
to_char(T1.CALL_TIMESTAMP,'YYYY-IW') AS OMONTH
,COUNT(T1.HOUSE) AS NODECALLS
,T3.NODE_CODE
,T5.NODECUSTCOUNT
,T1.CALL_CATEGORY_LVL_3
,sum((CASE WHEN T1.TC_WIP_TRANSACTION_ID IS NOT NULL THEN 1 ELSE 0 END )) AS TC
,sum((CASE WHEN T1.TC_WIP_TRANSACTION_ID IS NOT NULL THEN 1 ELSE 0 END ))/nullif(COUNT(T1.HOUSE), 0) AS SVRATEPERCALL
,COUNT(T1.HOUSE)/ nullif(T5.NODECUSTCOUNT, 0) AS CALLRATE
FROM CVKOMNZP.NZKOMUSER.NFOV_INBD_REMEDY_CALL_DETAILS T1
LEFT JOIN
(
SELECT T2.NODE_CODE,T2.BEGIN_DATE,T2.END_DATE,T2.HOUSE,T2.CORP
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST T2
) T3
ON T1.CORP = T3.CORP AND T1.HOUSE = T3.HOUSE AND (T1.CALL_TIMESTAMP BETWEEN T3.BEGIN_DATE AND T3.END_DATE)
LEFT JOIN
(
SELECT count(ADM_HOUSEHOLD_ID) AS NODECUSTCOUNT,NODE_CODE,BEGIN_DATE, END_DATE
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST
WHERE HOUSE_STATUS_CODE = 2
AND END_DATE = '2999-12-31 00:00:00'
AND T1.CALL_TIMESTAMP BETWEEN BEGIN_DATE AND END_DATE
GROUP BY NODE_CODE,BEGIN_DATE,END_DATE
) T5
ON T5.NODE_CODE = T3.NODE_CODE AND T1.CALL_TIMESTAMP BETWEEN T5.BEGIN_DATE AND T5.END_DATE
WHERE T1.EXCLUSION_FLAG = 'N'
AND T1.CALL_TIMESTAMP >= To_Date ('07-29-2017', 'MM-DD-YYYY' ) AND T1.CALL_TIMESTAMP <= To_Date ('07-31-2017', 'MM-DD-YYYY' )
GROUP BY
to_char(T1.CALL_TIMESTAMP,'YYYY-IW')
,T3.NODE_CODE
,T5.NODECUSTCOUNT
,T1.CALL_CATEGORY_LVL_3
If I am understanding this right, you want to get a COUNT without grouping by BEGIN_DATE and END_DATE. However, because your subquery (the 2nd LEFT JOIN) needs to include BEGIN_DATE and END_DATE, you do not know how to group without them.
If this is the case, you'll need a subquery for your count and JOIN it back to the same table.
FYI: Your T1.CALL_TIMESTAMP does not make sense in this subquery since you don't have a table called T1. I renamed it to "a". Feel free to change it to what you want.
See if this makes sense:
LEFT JOIN
(
SELECT a.BEGIN_DATE,
a.END_DATE,
node.NODECUSTCOUNT,
a.node_code
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST a
/**Subquery to get a COUNT of all the Node based on NODE_CODE.
You link this back to your query above using the NODE CODE**/
JOIN ( SELECT count(ADM_HOUSEHOLD_ID) AS NODECUSTCOUNT,
NODE_CODE
FROM CVKOMNZP.NZKOMUSER.D_HOUSEHOLD_CH_HIST
GROUP BY NODE_CODE ) node on node.node_code = a.node_code
WHERE a.HOUSE_STATUS_CODE = 2
AND a.END_DATE = '2999-12-31 00:00:00'
AND a.CALL_TIMESTAMP BETWEEN BEGIN_DATE AND END_DATE
) ..JOIN THIS BACK TO YOUR MAIN TABLE
I have a select statement that runs over hundreds of thousands of rows, and its execution is very slow: it takes longer than 15 minutes. Is there any way I can improve the execution time of this select statement?
select a.levelP,
a.code,
a.descP,
(select nvl(SUM(amount),0) from ca_glopen where code = a.code and acc_mth = '2016' ) ocf,
(select nvl(SUM(amount),0) from ca_glmaintrx where code = a.code and to_char(doc_date,'yyyy') = '2016' and to_char(doc_date,'yyyymm') < '201601') bcf,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun > 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) debit,
(select nvl(SUM(amount),0) from ca_glmaintrx where jum_amaun < 0 and code = a.code and to_char(doc_date,'yyyymm') = '201601' ) credit
from ca_chartAcc a
where a.code is not null
order by to_number(a.code), to_number(levelP)
Please help me find a way to speed up my query. Thank you.
Your primary problem is that most of your subqueries use functions on your search criteria, including some awkward ones on your dates. It's much better to flip that around and explicitly qualify the expected range, by supplying actual dates (a one month range is usually a small percentage of total rows, so this is very likely to hit an index).
SELECT Chart.levelP, Chart.code, Chart.descP,
COALESCE(GL_Sum.ocf, 0) AS ocf,
COALESCE(Transactions.bcf, 0) AS bcf,
COALESCE(Transactions.debit, 0) AS debit,
COALESCE(Transactions.credit, 0) AS credit
FROM ca_ChartAcc Chart
LEFT JOIN (SELECT code, SUM(amount) AS ocf
FROM ca_GLOpen
WHERE acc_mth = '2016'
GROUP BY code) GL_Sum
ON GL_Sum.code = Chart.code
LEFT JOIN (SELECT code,
SUM(amount) AS bcf,
SUM(CASE WHEN amount > 0 THEN amount END) AS debit,
SUM(CASE WHEN amount < 0 THEN amount END) AS credit
FROM ca_GLMainTrx
WHERE doc_date >= TO_DATE('2016-01-01', 'YYYY-MM-DD')
AND doc_date < TO_DATE('2016-02-01', 'YYYY-MM-DD')
GROUP BY code) Transactions
ON Transactions.code = Chart.code
WHERE Chart.code IS NOT NULL
ORDER BY TO_NUMBER(Chart.code), TO_NUMBER(Chart.levelP)
If you only need a few codes, it may yield better results to push those values into the subqueries as well (although note that the optimizer is likely to perform this for you).
It may be possible to remove the calls to TO_NUMBER(...) from the ORDER BY clause; however, this depends on the format of the values, since how they were encoded may change the ordering of results.
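The point about filtering on an explicit date range instead of a function of the column can be tried out with a small sketch. It uses SQLite through Python rather than Oracle, so `strftime` stands in for `to_char`; the sample rows and the index name `ix_doc_date` are made up for illustration.

```python
import sqlite3

# The same January 2016 filter written two ways.
# Assumption: doc_date is stored as ISO text (YYYY-MM-DD) in this sketch.
BY_FUNCTION = (
    "SELECT COALESCE(SUM(amount), 0) FROM ca_glmaintrx "
    "WHERE strftime('%Y%m', doc_date) = '201601'"
)
BY_RANGE = (
    "SELECT COALESCE(SUM(amount), 0) FROM ca_glmaintrx "
    "WHERE doc_date >= '2016-01-01' AND doc_date < '2016-02-01'"
)


def build():
    """Create a tiny in-memory table with an index on doc_date."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE ca_glmaintrx (code TEXT, doc_date TEXT, amount REAL)")
    con.execute("CREATE INDEX ix_doc_date ON ca_glmaintrx (doc_date)")
    con.executemany(
        "INSERT INTO ca_glmaintrx VALUES (?, ?, ?)",
        [
            ("100", "2015-12-31", 5.0),   # before the month
            ("100", "2016-01-01", 10.0),  # first day of the month
            ("100", "2016-01-31", -4.0),  # last day of the month
            ("100", "2016-02-01", 7.0),   # after the month
        ],
    )
    return con


def totals():
    """Both predicates select the same rows, so the sums agree."""
    con = build()
    return (
        con.execute(BY_FUNCTION).fetchone()[0],
        con.execute(BY_RANGE).fetchone()[0],
    )


def plan(sql):
    """Return the human-readable part of the engine's query plan."""
    con = build()
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    print(totals())
    print(plan(BY_FUNCTION))
    print(plan(BY_RANGE))
```

Both forms return the same total; the difference is only in how the engine can reach the rows. Printing the two plans lets you compare them, though the wording varies between SQLite versions, and Oracle has its own EXPLAIN PLAN output.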
I'm having a slight issue merging the following statements:
declare @From DATE
SET @From = '01/01/2014'
declare @To DATE
SET @To = '31/01/2014'
--ISSUED SB
SELECT
COUNT(pm.DateAppIssued) AS Issued,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppIssued, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Passed
SELECT
COUNT(pm.DatePassed) AS Passed,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DatePassed, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Received
SELECT
COUNT(pm.DateAppRcvd) AS Received,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateAppRcvd, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
--Offered
SELECT
COUNT(pm.DateOffered) AS Offered,
pm.Lender,
pm.AmountRequested,
p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
AND (CONVERT(DATE,DateOffered, 103)
Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103))
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Ideally I would like the results of these queries to show as follows:
Issued, Passed , Offered, Received,
All in one table
Any Help on this would be greatly appreciated
Thanks
Rusty
I'm fairly certain in this case the query can be written without the use of any CASE statements, actually:
DECLARE @From DATE = '20140101'
DECLARE @To DATE = '20140201'
SELECT Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId,
COUNT(Issue.issued) as issued,
COUNT(Pass.passed) as passed,
COUNT(Receive.received) as received,
COUNT(Offer.offered) as offered
FROM BPS.dbo.tbl_Profile_Mortgage as Mortgage
JOIN BPS.dbo.tbl_Profile as Profile
ON Mortgage.fk_profileId = Profile.id
AND Profile.caseTypeId = 2
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
LEFT JOIN (VALUES (2, @From, @To)) Pass(passed, rangeFrom, rangeTo)
ON Mortgage.DatePassed >= Pass.rangeFrom
AND Mortgage.DatePassed < Pass.rangeTo
LEFT JOIN (VALUES (3, @From, @To)) Receive(received, rangeFrom, rangeTo)
ON Mortgage.DateAppRcvd >= Receive.rangeFrom
AND Mortgage.DateAppRcvd < Receive.rangeTo
LEFT JOIN (VALUES (4, @From, @To)) Offer(offered, rangeFrom, rangeTo)
ON Mortgage.DateOffered >= Offer.rangeFrom
AND Mortgage.DateOffered < Offer.rangeTo
WHERE Mortgage.lender > ''
AND (Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
GROUP BY Mortgage.lender, Mortgage.amountRequested, Profile.caseTypeId
(not tested, as I lack a provided data set).
... Okay, some explanations are in order, because some of this is slightly non-intuitive.
First off, read this blog entry for tips about dealing with date/time/timestamp ranges (interestingly, this also applies to all other non-integral types). This is why I modified the @To date: so the range could be safely queried without needing to convert types (and thus ignore indices). I've also made sure to choose a safe format; depending on how you're calling this query, this is a non-issue (i.e., parameterized queries taking an actual Date type are essentially format-less).
......
COUNT(Issue.issued) as issued,
......
LEFT JOIN (VALUES (1, @From, @To)) Issue(issued, rangeFrom, rangeTo)
ON Mortgage.DateAppIssued >= Issue.rangeFrom
AND Mortgage.DateAppIssued < Issue.rangeTo
.......
What's the difference between COUNT(*) and COUNT(<expression>)? If <expression> evaluates to null, it's ignored. Hence the LEFT JOINs; if the entry for the mortgage isn't in the given date range for the column, the dummy table doesn't attach, and there's no column to count. Unfortunately, I'm not sure how the interplay between the dummy table, LEFT JOIN, and COUNT() here will appear to the optimizer - the joins should be able to use indices, but I don't know if it's smart enough to be able to use that for the COUNT() here too....
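The COUNT behaviour described above can be checked with a small sketch. It uses SQLite through Python, so the one-row derived table is written as a plain SELECT instead of T-SQL's VALUES constructor; the `mortgage` table and its rows are made up for illustration.

```python
import sqlite3

COUNT_SQL = """
SELECT COUNT(*)            AS all_rows,   -- counts every mortgage row
       COUNT(Issue.issued) AS issued      -- skips rows where the join found nothing
  FROM mortgage m
  LEFT JOIN (SELECT 1 AS issued,
                    '2014-01-01' AS rangeFrom,
                    '2014-02-01' AS rangeTo) AS Issue
    ON m.DateAppIssued >= Issue.rangeFrom
   AND m.DateAppIssued <  Issue.rangeTo
"""


def count_demo():
    """Return (COUNT(*), COUNT(Issue.issued)) for four sample rows."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mortgage (id INTEGER, DateAppIssued TEXT)")
    con.executemany(
        "INSERT INTO mortgage VALUES (?, ?)",
        [
            (1, "2014-01-05"),  # inside the range
            (2, None),          # no issue date at all
            (3, "2014-02-01"),  # equals the exclusive upper bound, so outside
            (4, "2014-01-31"),  # inside the range
        ],
    )
    result = con.execute(COUNT_SQL).fetchone()
    con.close()
    return result


if __name__ == "__main__":
    print(count_demo())
```

Every row survives the LEFT JOIN, so COUNT(*) sees all four, while COUNT(Issue.issued) only sees the two rows where the dummy table attached.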
(Issue.issued IS NOT NULL
OR Pass.passed IS NOT NULL
OR Receive.received IS NOT NULL
OR Offer.offered IS NOT NULL)
This is essentially telling it to ignore rows that don't have at least one of the columns. They wouldn't be "counted" in any case (well, they'd likely have 0) - there's no data for the function to consider - but they would show up in the results, which probably isn't what you want. I'm not sure if the optimizer is smart enough to use this to restrict which rows it operates over - that is, turn the JOIN conditions into a way to restrict the various date columns, as if they were in the WHERE clause too. If the query runs slow, try adding the date restrictions to the WHERE clause and see if it helps.
You could either use a union, as Dan Bracuk states, or use a case statement.
declare @From DATE = '01/01/2014'
declare @To DATE = '31/01/2014'
select
sum(case when (CONVERT(DATE,DateAppIssued, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Issued
, sum(case when (CONVERT(DATE,DatePassed, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Passed
, sum(case when (CONVERT(DATE,DateAppRcvd, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Received
, sum(case when (CONVERT(DATE,DateOffered, 103) Between CONVERT(DATE,@From,103) and CONVERT(DATE,@To,103)) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE CaseTypeID = 2
And Lender > ''
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
Edit:
What I've done is looked at your queries.
All four queries have identical Where Clause, with the exception of the date comparison. Therefore I've created a new query, which selects all your data which might be used in one of the four counts.
The last clause, the date comparison, is moved into a case statement that returns 1 if the row falls within the selected date range and 0 otherwise. This basically indicates whether the row would have been returned by your previous queries.
Therefore a sum of this column would return the equivalent of a count(*), with this date-comparison in the where-clause.
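As a quick check of that equivalence, here is a minimal sketch in Python with SQLite. The `mortgage` table and its rows are invented for illustration, and only two of the four date columns are included to keep it short.

```python
import sqlite3

ROWS = [
    ("A", "2014-01-10", "2014-01-15"),
    ("A", "2014-01-20", "2014-02-03"),
    ("A", "2013-12-30", None),
    ("B", None, "2014-01-02"),
    ("B", "2014-01-05", None),
]


def build():
    """Create the sample table in memory."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE mortgage (Lender TEXT, DateAppIssued TEXT, DatePassed TEXT)"
    )
    con.executemany("INSERT INTO mortgage VALUES (?, ?, ?)", ROWS)
    return con


def conditional_counts(lo, hi):
    """One pass over the table: each CASE yields 1 for a row inside the range."""
    con = build()
    return con.execute(
        "SELECT SUM(CASE WHEN DateAppIssued BETWEEN :lo AND :hi THEN 1 ELSE 0 END), "
        "       SUM(CASE WHEN DatePassed    BETWEEN :lo AND :hi THEN 1 ELSE 0 END) "
        "FROM mortgage",
        {"lo": lo, "hi": hi},
    ).fetchone()


def filtered_counts(lo, hi):
    """The original style: one COUNT(*) query per date column."""
    con = build()
    issued = con.execute(
        "SELECT COUNT(*) FROM mortgage WHERE DateAppIssued BETWEEN ? AND ?", (lo, hi)
    ).fetchone()[0]
    passed = con.execute(
        "SELECT COUNT(*) FROM mortgage WHERE DatePassed BETWEEN ? AND ?", (lo, hi)
    ).fetchone()[0]
    return issued, passed


if __name__ == "__main__":
    print(conditional_counts("2014-01-01", "2014-01-31"))
    print(filtered_counts("2014-01-01", "2014-01-31"))
```

Rows with a NULL date fall into the ELSE branch and contribute 0, which matches how the WHERE-filtered queries simply skip them.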
Edit 2 (After comments by Clockwork-muse):
Some notes on performance, (tested on MS-SQL 2012):
Changing BETWEEN to ">=" and "<" inside a case-statement does not affect the cost of the query.
Depending on the size of the table, the query might be optimized quite a lot, by adding the dates in the where clause.
In my sample data (~20.000.000 rows, spanning from 2001 to today), I got a 48% increase in speed by adding:
or (DateAppIssued BETWEEN @From and @to )
or (DatePassed BETWEEN @From and @to )
or (DateAppRcvd BETWEEN @From and @to )
or (DateOffered BETWEEN @From and @to )
(There was no difference between using BETWEEN and ">=" and "<".)
It is also worth noting that I got a 6% increase when changing @From = '01/01/2014' to @From = '2014-01-01' and thus omitting the convert().
Eg. an optimized query could be:
declare @From DATE = '2014-01-01'
declare @To DATE = '2014-02-01' -- exclusive upper bound, so all of January is covered
select
sum(case when (DateAppIssued >= @From and DateAppIssued < @To) then 1 else 0 end) as Issued
, sum(case when (DatePassed >= @From and DatePassed < @To) then 1 else 0 end) as Passed
, sum(case when (DateAppRcvd >= @From and DateAppRcvd < @To) then 1 else 0 end) as Received
, sum(case when (DateOffered >= @From and DateOffered < @To) then 1 else 0 end) as Offered
, pm.Lender
, pm.AmountRequested
, p.CaseTypeID
FROM BPS.dbo.tbl_Profile_Mortgage AS pm
INNER JOIN BPS.dbo.tbl_Profile AS p
ON pm.FK_ProfileId = p.Id
WHERE 1=1
and CaseTypeID = 2
and Lender > ''
and (
(DateAppIssued >= @From and DateAppIssued < @To)
or (DatePassed >= @From and DatePassed < @To)
or (DateAppRcvd >= @From and DateAppRcvd < @To)
or (DateOffered >= @From and DateOffered < @To)
)
GROUP BY pm.Lender,p.CaseTypeID,pm.AmountRequested;
I do, however, really like Clockwork-muse's answer, as I prefer joins to case statements where possible :)
The all-in-one queries here in other answers are certainly elegant, but if you are in a rush to get something working as a one-off, or if you agree the following approach is easy to read and maintain when you have to revisit it some time down the road (or someone else less skilled has to work out what's going on) - here's a skeleton of a Common Table Expression alternative which I believe is quite clear to read :
WITH Unioned_Four AS
( SELECT .. -- first select : Issued
UNION ALL
SELECT .. -- second : Passed
UNION ALL
SELECT .. -- Received
UNION ALL
SELECT .. -- Offered
)
SELECT
-- group fields
-- SUMs of the count fields
FROM Unioned_Four
GROUP BY .. -- etc
Obviously the fields have to match in the 4 parts of the UNION, requiring dummy fields returning zero in each one.
So you could have kept the simple approach that you started with, but wrapped it up as a derived table using the CTE syntax to allow you to have the four counts all on one row per GROUPing. Also if you have to add extra filtering to specific queries of the four, then it's easier to meddle with the individual SELECTs - the flipside being (of course) that further requirements for all four would need to be duplicated!
I have this query:
SELECT `s`.`time` , SUM( s.love ) AS total_love, SUM( s.sad ) AS total_sad, SUM( s.angry ) AS total_angry, SUM( s.happy ) AS total_happy
FROM (`employee_workshift` AS e)
JOIN `workshift` AS w ON `e`.`workshift_uuid` = `w`.`uuid`
JOIN `shift_summary` AS s ON `w`.`uuid` = `s`.`workshift_uuid`
WHERE `s`.`location_uuid` = '81956feb-3fd7-0e84-e9fe-b640434dfad0'
AND `e`.`employee_uuid` = '3866a979-bc5e-56cb-cede-863afc47b8b5'
AND `s`.`workshift_uuid` = '8c9dbd85-18a3-6ca9-e3f3-06eb602b6f38'
AND `s`.`time` >= CAST( '18:00:00' AS TIME )
AND `s`.`time` <= CAST( '00:00:00' AS TIME )
AND `s`.`date` LIKE '%2014-03%'
My problem is it returns "NULL" but when I changed my 'end_time' to "23:59:59", it returned the right data. I've got an idea to pull the hour of both 'start_time' and 'end_time' and then insert it in a loop to get everything between them.
$time_start = 15;
$time_end = 03;
So it should produce: 15,16,17,18,19,20,21,22,23,00,01,02,03
Then I'll compare them all. But this would take a lot more lines and effort than simply using "BETWEEN". Or should I just use "in_array"? Have you encountered this? I hope someone can help. Thanks.
19:00 is certainly later than 00:00, so your approach cannot work: no time of day is both at or after 18:00:00 and at or before 00:00:00.
Try using full timestamp (including date) to get all data you need.
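A minimal sketch of that suggestion, run against SQLite from Python: it first reproduces the NULL from the question, then compares on date and time together. The sample rows are invented, and gluing `date` and `time` together with `||` is only for illustration; a real timestamp column would be the cleaner option.

```python
import sqlite3


def loves_between(start_ts, end_ts):
    """Return (time-only result, full-timestamp result) for SUM(love)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE shift_summary (date TEXT, time TEXT, love INTEGER)")
    con.executemany(
        "INSERT INTO shift_summary VALUES (?, ?, ?)",
        [
            ("2014-03-10", "17:00:00", 1),   # before the window
            ("2014-03-10", "18:00:00", 2),   # window start
            ("2014-03-10", "23:30:00", 4),   # late evening
            ("2014-03-11", "00:00:00", 8),   # midnight belongs to the next day
            ("2014-03-11", "01:00:00", 16),  # after the window
        ],
    )
    # Time-only comparison: nothing is both >= 18:00:00 and <= 00:00:00,
    # so SUM() runs over zero rows and yields NULL.
    time_only = con.execute(
        "SELECT SUM(love) FROM shift_summary "
        "WHERE time >= '18:00:00' AND time <= '00:00:00'"
    ).fetchone()[0]
    # Full-timestamp comparison: the end of the window is on the following date.
    full_ts = con.execute(
        "SELECT SUM(love) FROM shift_summary "
        "WHERE (date || ' ' || time) >= ? AND (date || ' ' || time) <= ?",
        (start_ts, end_ts),
    ).fetchone()[0]
    con.close()
    return time_only, full_ts


if __name__ == "__main__":
    print(loves_between("2014-03-10 18:00:00", "2014-03-11 00:00:00"))
```

The first value comes back as None (NULL), matching what the question describes, while the second sums the three rows that actually fall inside the overnight window.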
Try this query. I don't know your data structure, so check the INNER JOIN between the s and s1 tables. The join must match one row to one row, differing only in date: the date of the s1 rows must be one day earlier than that of the s rows.
SELECT s.time , SUM( s.love ) AS total_love, SUM( s.sad ) AS total_sad, SUM( s.angry ) AS total_angry, SUM( s.happy ) AS total_happy
FROM (employee_workshift AS e)
JOIN workshift AS w ON e.workshift_uuid = w.uuid
JOIN shift_summary AS s ON w.uuid = s.workshift_uuid
JOIN shift_summary AS s1 ON (w.uuid = s1.workshift_uuid AND CAST(s.date AS DATE) = CAST(s1.date AS DATE) + INTERVAL 1 DAY)
WHERE s.location_uuid = '81956feb-3fd7-0e84-e9fe-b640434dfad0'
AND e.employee_uuid = '3866a979-bc5e-56cb-cede-863afc47b8b5'
AND s.workshift_uuid = '8c9dbd85-18a3-6ca9-e3f3-06eb602b6f38'
AND s1.time >= CAST( '18:00:00' AS TIME )
AND s.time <= CAST( '00:00:00' AS TIME )
AND s.date LIKE '%2014-03%'
I need to analyze some weblogs and determine if a user has visited once, taken a year break, and visited again. I want to add a flag to every row (Y/N) with a VisitId that meets the above criteria.
How would I go about creating this sql?
Here are the fields I have, that I think need to be used (by analyzing the timestamp of the first page of each visit):
VisitID - each visit has a unique Id (ie. 12356, 12345, 16459)
UserID - each user has one Id (ie. steve = 1, ted = 2, mark = 12345, etc...)
TimeStamp - looks like this: 2010-01-01 00:32:30.000
select VisitID, UserID, TimeStamp from page_view_t where pageNum = 1;
thanks - any help would be greatly appreciated.
You could rank every user's rows, then join the ranked row set to itself to compare adjacent rows:
;
WITH ranked AS (
SELECT
*,
rnk = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TimeStamp)
FROM page_view_t
),
flagged AS (
SELECT
*,
IsReturnVisit = CASE
WHEN EXISTS (
SELECT *
FROM ranked
WHERE UserID = r.UserID
AND rnk = r.rnk - 1
AND TimeStamp <= DATEADD(YEAR, -1, r.TimeStamp)
)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
)
SELECT
VisitID,
UserID,
TimeStamp,
IsReturnVisit
FROM flagged
Note: the above flags only return visits.
UPDATE
To flag the first visits same as return visits, the flagged CTE could be modified as follows:
…
SELECT
*,
IsFirstOrReturnVisit = CASE
WHEN p.UserID IS NULL OR r.TimeStamp >= DATEADD(YEAR, 1, p.TimeStamp)
THEN 'Y'
ELSE 'N'
END
FROM ranked r
LEFT JOIN ranked p ON r.UserID = p.UserID AND r.rnk = p.rnk + 1
…
References that might be useful:
WITH common_table_expression (Transact-SQL)
Ranking Functions (Transact-SQL)
ROW_NUMBER (Transact-SQL)
The other guy was faster but since I took time to do it and it's a completely different approach I might as well post It :D.
SELECT pv2.VisitID,
pv2.UserID,
pv2.TimeStamp,
CASE WHEN pv1.VisitID IS NOT NULL
AND pv3.VisitID IS NULL
THEN 'YES' ELSE 'NO' END AS IsReturnVisit
FROM page_view_t pv2
LEFT JOIN page_view_t pv1 ON pv1.UserID = pv2.UserID
AND pv1.VisitID <> pv2.VisitID
AND (pv1.TimeStamp <= DATEADD(YEAR, -1, pv2.TimeStamp)
OR pv2.TimeStamp <= DATEADD(YEAR, -1, pv1.TimeStamp))
AND pv1.pageNum = 1
LEFT JOIN page_view_t pv3 ON pv1.UserID = pv3.UserID
AND (pv3.TimeStamp BETWEEN pv1.TimeStamp AND pv2.TimeStamp
OR pv3.TimeStamp BETWEEN pv2.TimeStamp AND pv1.TimeStamp)
AND pv3.pageNum = 1
WHERE pv2.pageNum = 1
Assuming the page_view_t table stores the UserID and TimeStamp of each visit, the following query will return users who have taken a break of at least a year (365 days) between two consecutive visits.
select t1.UserID
from page_view_t t1
where (
select datediff(day, max(t2.[TimeStamp]), t1.[TimeStamp])
from page_view_t t2
where t2.UserID = t1.UserID and t2.[TimeStamp] < t1.[TimeStamp]
group by t2.UserID
) >= 365
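The same "gap to the previous visit" idea can be turned into a per-row Y/N flag. The sketch below uses SQLite through Python, so `julianday()` stands in for T-SQL's `datediff`; the sample visits are invented, and it assumes one row per visit (the `pageNum = 1` filter from the question is left out).

```python
import sqlite3

GAP_SQL = """
SELECT v.VisitID,
       v.UserID,
       CASE WHEN julianday(v.TimeStamp) - julianday(
                     (SELECT MAX(p.TimeStamp)          -- the previous visit
                        FROM page_view_t p
                       WHERE p.UserID = v.UserID
                         AND p.TimeStamp < v.TimeStamp)) >= 365
            THEN 'Y' ELSE 'N' END AS IsReturnVisit
  FROM page_view_t v
 ORDER BY v.UserID, v.TimeStamp
"""


def flag_return_visits():
    """Flag each visit that comes at least 365 days after the user's previous one."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE page_view_t (VisitID INTEGER, UserID INTEGER, TimeStamp TEXT)"
    )
    con.executemany(
        "INSERT INTO page_view_t VALUES (?, ?, ?)",
        [
            (101, 1, "2010-01-01 00:32:30"),  # first visit, nothing before it
            (102, 1, "2010-06-01 10:00:00"),  # about five months later
            (103, 1, "2011-07-01 09:00:00"),  # more than a year after visit 102
            (201, 2, "2010-03-01 12:00:00"),  # a user with a single visit
        ],
    )
    result = con.execute(GAP_SQL).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in flag_return_visits():
        print(row)
```

A first visit has no previous timestamp, so the subtraction yields NULL and the CASE falls through to 'N'. If first visits should be flagged as well, that case needs its own branch.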