I need to find the date when hit = 1 and the date when hit moved from 1 to 0. Here is the table:
When I run this code against the table:
select t1.id, t1.hit, min(datehit_1) as Hit_1, min(dateHit_0) as modelHit_0 from (
select
id,
dateHit_0 = (case when hits = 0 then date else null end),
datehit_1 = (case when hits = 1 then date else null end),
datacontrol_modelhits
from dmt.dqm_std_model_output_history
where id = 10 ) as t1
group by t1.id, t1.hit
Unfortunately, I am getting this output:
But I need to get this output instead. I want to find the last change to hit = 0, which is 23/10/2019.
I think you want:
select t.*
from (select t.*,
lag(hit) over (partition by id order by coalesce(datehit_0, datehit_1)) as prev_hit,
lead(hit) over (partition by id order by coalesce(datehit_0, datehit_1)) as next_hit
from t
) t
where (hit = 1 and next_hit = 0) or
(hit = 0 and prev_hit = 1);
If you want only the most recent two rows:
select top (2) t.*
from (select t.*,
lag(hit) over (partition by id order by coalesce(datehit_0, datehit_1)) as prev_hit,
lead(hit) over (partition by id order by coalesce(datehit_0, datehit_1)) as next_hit
from t
) t
where (hit = 1 and next_hit = 0) or
(hit = 0 and prev_hit = 1)
order by coalesce(datehit_0, datehit_1) desc;
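To check the LAG/LEAD logic, here is a minimal sketch that runs the first query against an in-memory SQLite database from Python. It assumes a table t(id, hit, datehit_0, datehit_1) in which exactly one date column is filled per row, as the COALESCE implies. The rows are hypothetical because the original table isn't shown. LAG and LEAD need SQLite 3.25 or later, and the outer SELECT lists columns only so the output is easy to compare.

```python
import sqlite3

# Hypothetical rows: (id, hit, datehit_0, datehit_1). Exactly one date column
# is set per row, which is what COALESCE(datehit_0, datehit_1) relies on.
ROWS = [
    (10, 0, "2019-10-01", None),
    (10, 1, None, "2019-10-05"),
    (10, 1, None, "2019-10-10"),
    (10, 0, "2019-10-23", None),
    (10, 0, "2019-10-25", None),
]

# Keep the last hit = 1 row before a change and the first hit = 0 row after it.
QUERY = """
SELECT id, hit, COALESCE(datehit_0, datehit_1) AS hit_date
FROM (SELECT t.*,
             LAG(hit)  OVER (PARTITION BY id ORDER BY COALESCE(datehit_0, datehit_1)) AS prev_hit,
             LEAD(hit) OVER (PARTITION BY id ORDER BY COALESCE(datehit_0, datehit_1)) AS next_hit
      FROM t) AS x
WHERE (hit = 1 AND next_hit = 0) OR (hit = 0 AND prev_hit = 1)
ORDER BY hit_date
"""


def find_transitions(rows):
    """Return (id, hit, date) for the rows on either side of each 1 -> 0 change."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, hit INTEGER, datehit_0 TEXT, datehit_1 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in find_transitions(ROWS):
        print(row)
```

With these rows, the query keeps the last hit = 1 row (2019-10-10) and the first hit = 0 row after it (2019-10-23).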
If I were to have a table such as the one below:
id_ | last_updated_by
1   | robot
1   | human
1   | robot
2   | robot
3   | robot
3   | human
Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:
id_ | last_updated_by | updated_by_human
1   | robot           | 1
2   | robot           | 0
3   | robot           | 1
UPDATE
I'm currently doing the following, though I'm not sure how efficient it is: I select the latest record and then merge it with my calculated column via a subquery.
SELECT MAIN.TRANSACTION_ID,
MAIN.CREATED_DATE,
MAIN.CREATED_BY_USER_ID,
MAIN.OWNER_USER_ID,
STP.TOUCHED_BY_HUMAN
FROM (
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN
LEFT JOIN (
SELECT TRANSACTION_ID,
CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1 END AS CREATED_BY_HUMAN,
CASE
WHEN OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
OWNER_USER_ID LIKE 'N%' OR
OWNER_USER_ID IS NULL
THEN 0
ELSE 1 END AS OWNED_BY_HUMAN,
CASE
WHEN CREATED_BY_HUMAN = 0 AND
OWNED_BY_HUMAN = 0
THEN 0
ELSE 1 END AS TOUCHED_BY_HUMAN
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID
If I'm following your problem, then something like this should work.
SELECT
t.*
,CASE WHEN a.id_ IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table_name t
LEFT JOIN (SELECT DISTINCT id_ FROM table_name WHERE last_updated_by = 'human') a ON t.id_ = a.id_
That takes care of the updated_by_human field. If you also need to reduce the number of rows (keeping only a subset, such as one row per id), you'll need to tell us how to decide which row to keep.
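Here is a minimal sketch of this LEFT JOIN approach, run from Python against an in-memory SQLite database with the question's sample rows. It uses the question's column name id_ and the placeholder table name table_name; ORDER BY t.rowid is SQLite-specific and only keeps the output in insertion order.

```python
import sqlite3

# Sample data from the question: (id_, last_updated_by).
ROWS = [(1, "robot"), (1, "human"), (1, "robot"),
        (2, "robot"), (3, "robot"), (3, "human")]

# LEFT JOIN to the set of ids that have at least one 'human' row;
# a missing match (NULL) means no human ever touched that id_.
QUERY = """
SELECT t.id_, t.last_updated_by,
       CASE WHEN a.id_ IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table_name AS t
LEFT JOIN (SELECT DISTINCT id_ FROM table_name
           WHERE last_updated_by = 'human') AS a
       ON t.id_ = a.id_
ORDER BY t.rowid
"""


def flag_human_updates(rows):
    """Return every row with a 0/1 updated_by_human flag for its id_."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id_ INTEGER, last_updated_by TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in flag_human_updates(ROWS):
        print(row)
```

Every row keeps its own last_updated_by value; only the flag is computed per id_, so ids 1 and 3 get 1 and id 2 gets 0.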
Correlated EXISTS subqueries can be slow on large tables, but if your data isn't big this should work.
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
Here is another way, which returns the id_ values that have at least one human update:
SELECT t1.id_
FROM table_name t1
GROUP BY t1.id_
HAVING MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
Since you didn't specify which column identifies the newest record for a given id, I'll assume there is a column that tracks the insert/modify timestamp (which is standard table design); let's call it last_update_timestamp. (If you don't have one, I strongly recommend adding it: an audit trail without timestamps doesn't make much sense.)
Assuming your table is named updating_trail:
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- determine the id_, the latest last_update_timestamp, and a flag that is 1 if any record for that id_ has last_updated_by = 'human'
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
This gives:
id_ | last_updated_by | last_update_timestamp    | modified_by_human
1   | robot           | 2021-10-19T20:00:00.000Z | 1
2   | robot           | 2021-10-19T17:00:00.000Z | 0
3   | robot           | 2021-10-19T16:00:00.000Z | 1
Check out this sample db fiddle I created for you
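To try this answer's query, here is a minimal Python sketch using an in-memory SQLite database. The timestamps are hypothetical (the question's table has none) and were chosen so the latest row per id_ matches the output above. ISO-8601 strings in one fixed format sort chronologically, which is what lets MAX() pick the latest one here; on a real table this would be a TIMESTAMP column.

```python
import sqlite3

# Question's rows with hypothetical timestamps added (the original table has none).
ROWS = [
    (1, "robot", "2021-10-19T18:00:00.000Z"),
    (1, "human", "2021-10-19T19:00:00.000Z"),
    (1, "robot", "2021-10-19T20:00:00.000Z"),
    (2, "robot", "2021-10-19T17:00:00.000Z"),
    (3, "human", "2021-10-19T15:00:00.000Z"),
    (3, "robot", "2021-10-19T16:00:00.000Z"),
]

QUERY = """
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
    -- Per id_: latest timestamp plus a 1/0 flag for "any human update".
    SELECT id_,
           MAX(last_update_timestamp) AS most_recent_update_ts,
           MAX(CASE WHEN last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
    FROM updating_trail
    GROUP BY id_
) AS last_update_trail
  ON updating_trail.id_ = last_update_trail.id_
 AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts
ORDER BY updating_trail.id_
"""


def latest_rows_with_flag(rows):
    """Return the latest row per id_ plus its modified_by_human flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE updating_trail "
                 "(id_ INTEGER, last_updated_by TEXT, last_update_timestamp TEXT)")
    conn.executemany("INSERT INTO updating_trail VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_rows_with_flag(ROWS):
        print(row)
```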
This is a translation of your query to conditional aggregation with a windowed MAX (shown for CREATED_BY_HUMAN; the owner check follows the same pattern):
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1
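A minimal sketch of the windowed conditional aggregate, assuming hypothetical rows and running in SQLite via Python. SQLite has no QUALIFY, so the ROW_NUMBER filter moves to an outer WHERE, and the date-range filter is left out for brevity. Note also that SQLite's LIKE is case-insensitive for ASCII by default, which may differ from your database.

```python
import sqlite3

# Hypothetical rows: (TRANSACTION_ID, CREATED_BY_USER_ID, End_Dt).
ROWS = [
    (100, "ROBOT", "2021-01-01"),
    (100, "JSMITH", "2021-01-02"),
    (100, "MACHINE", "2021-01-03"),
    (200, "N123", "2021-01-01"),
    (200, None, "2021-01-05"),
]

# The outer WHERE stands in for QUALIFY. Both window functions are computed
# over every row of the transaction before the filter keeps the latest row.
QUERY = """
SELECT TRANSACTION_ID, CREATED_BY_USER_ID, End_Dt, CREATED_BY_HUMAN
FROM (
    SELECT TRANSACTION_ID, CREATED_BY_USER_ID, End_Dt,
           MAX(CASE
                 WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE')
                   OR CREATED_BY_USER_ID LIKE 'N%'
                   OR CREATED_BY_USER_ID IS NULL
                 THEN 0 ELSE 1
               END) OVER (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN,
           ROW_NUMBER() OVER (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) AS rn
    FROM Table_Name
) AS x
WHERE rn = 1
ORDER BY TRANSACTION_ID
"""


def latest_with_flag(rows):
    """Return the latest row per transaction with its CREATED_BY_HUMAN flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table_Name "
                 "(TRANSACTION_ID INTEGER, CREATED_BY_USER_ID TEXT, End_Dt TEXT)")
    conn.executemany("INSERT INTO Table_Name VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_with_flag(ROWS):
        print(row)
```

Transaction 100 is flagged 1 because of the JSMITH row, even though its latest row was created by MACHINE; transaction 200 has only non-human creators and gets 0.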
I have a query that counts the cabin crew and cockpit crew in the staging schema (StagingCabinCrew and StagingCockpitCrew) and compares them with their data schema equivalents (DataCabinCrew and DataCockpitCrew).
Below is the query and its output:
WITH CTE AS
(SELECT cd.*,
c.*,
DataFlight,
l.ScheduledDepartureDate,
l.ScheduledDepartureAirport
FROM
(SELECT *,
ROW_NUMBER() OVER(PARTITION BY LegKey
ORDER BY UpdateID DESC) AS RowNumber
FROM Data.Crew) c
INNER JOIN Data.CrewDetail cd ON c.UpdateID = cd.CrewUpdateID
AND cd.IsPassive = 1
AND RowNumber = 1
INNER JOIN
(SELECT *,
Carrier + CAST(FlightNumber AS VARCHAR) + Suffix AS DataFlight
FROM Data.Leg) l ON c.LegKey = l.LegKey )
SELECT StagingFlight,
sac.DepartureDate,
sac.DepartureAirport,
cte.DataFlight,
cte.ScheduledDepartureDate,
cte.ScheduledDepartureAirport,
SUM(CASE
WHEN sac.CREWTYPE = 'F' THEN 1
ELSE 0
END) AS StagingCabinCrew,
SUM(CASE
WHEN sac.CREWTYPE = 'C' THEN 1
ELSE 0
END) AS StagingCockpitCrew,
SUM(CASE
WHEN cte.CrewType = 'F' THEN 1
ELSE 0
END) AS DataCabinCrew,
SUM(CASE
WHEN cte.CrewType = 'C' THEN 1
ELSE 0
END) AS DataCockpitCrew
FROM
(SELECT *,
Airline + CAST(FlightNumber AS VARCHAR) + Suffix AS StagingFlight,
ROW_NUMBER() OVER(PARTITION BY Airline + CAST(FlightNumber AS VARCHAR) + Suffix
ORDER BY UpdateId DESC) AS StageRowNumber
FROM Staging.SabreAssignedCrew) sac
LEFT JOIN CTE cte ON StagingFlight = DataFlight
AND sac.DepartureDate = cte.ScheduledDepartureDate
AND sac.DepartureAirport = cte.ScheduledDepartureAirport
AND sac.CREWTYPE = cte.CrewType
-- compare month and year separately; MONTH + YEAR sums can collide (1 + 2021 = 2 + 2020)
WHERE MONTH(sac.DepartureDate) = MONTH(GETDATE()) AND YEAR(sac.DepartureDate) = YEAR(GETDATE())
AND StageRowNumber = 1 --AND cte.ScheduledDepartureDate IS NOT NULL
--AND cte.ScheduledDepartureAirport IS NOT NULL
GROUP BY StagingFlight,
sac.DepartureDate,
sac.DepartureAirport,
cte.DataFlight,
cte.ScheduledDepartureDate,
cte.ScheduledDepartureAirport
The results are correct. All I need to do is add the condition StagingCabinCrew <> DataCabinCrew AND StagingCockpitCrew <> DataCockpitCrew.
If a row appears, we have found an error in the data. I need help adding this condition, because the columns it refers to are SUM(CASE ...) aggregates defined in the SELECT list, and I can't reference them in the WHERE clause.
I'm guessing you are trying to use a column alias in the same query that defines it.
You can't do this, because the alias won't be recognized in the WHERE clause: WHERE is evaluated before the SELECT list.
SELECT field1 + field2 as myField
FROM yourTable
WHERE myField > 3
You need to wrap it in a subquery or CTE:
with cte2 as (
SELECT field1 + field2 as myField
FROM yourTable
)
SELECT *
FROM cte2
WHERE myField > 3
or repeat the expression:
SELECT field1 + field2 as myField
FROM yourTable
WHERE field1 + field2 > 3
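Applied to the crew question, both patterns look like the minimal sketch below, run in SQLite from Python. The crew table, its columns and the rows are hypothetical stand-ins for the staging and data counts. Because these expressions are aggregates, the repeated form goes in HAVING rather than WHERE. The sketch uses OR so that a mismatch in either count is reported; use AND if both counts must differ. SQLite's alias rules differ from SQL Server's, so only the two portable patterns are shown, not the failing one.

```python
import sqlite3

# Hypothetical rows: (flight, source, crewtype); 'F' = cabin, 'C' = cockpit.
ROWS = [
    ("AB100", "staging", "F"), ("AB100", "staging", "F"), ("AB100", "staging", "C"),
    ("AB100", "data", "F"), ("AB100", "data", "F"), ("AB100", "data", "C"),
    ("AB200", "staging", "F"), ("AB200", "staging", "C"), ("AB200", "staging", "C"),
    ("AB200", "data", "F"), ("AB200", "data", "C"),
]

# Pattern 1: compute the aliases in a CTE, then filter on them outside it.
CTE_QUERY = """
WITH counts AS (
    SELECT flight,
           SUM(CASE WHEN source = 'staging' AND crewtype = 'F' THEN 1 ELSE 0 END) AS StagingCabinCrew,
           SUM(CASE WHEN source = 'staging' AND crewtype = 'C' THEN 1 ELSE 0 END) AS StagingCockpitCrew,
           SUM(CASE WHEN source = 'data' AND crewtype = 'F' THEN 1 ELSE 0 END) AS DataCabinCrew,
           SUM(CASE WHEN source = 'data' AND crewtype = 'C' THEN 1 ELSE 0 END) AS DataCockpitCrew
    FROM crew
    GROUP BY flight
)
SELECT * FROM counts
WHERE StagingCabinCrew <> DataCabinCrew OR StagingCockpitCrew <> DataCockpitCrew
"""

# Pattern 2: repeat the aggregate expressions, which must go in HAVING.
HAVING_QUERY = """
SELECT flight
FROM crew
GROUP BY flight
HAVING SUM(CASE WHEN source = 'staging' AND crewtype = 'F' THEN 1 ELSE 0 END)
       <> SUM(CASE WHEN source = 'data' AND crewtype = 'F' THEN 1 ELSE 0 END)
    OR SUM(CASE WHEN source = 'staging' AND crewtype = 'C' THEN 1 ELSE 0 END)
       <> SUM(CASE WHEN source = 'data' AND crewtype = 'C' THEN 1 ELSE 0 END)
"""


def mismatches(query):
    """Run one of the queries above against the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE crew (flight TEXT, source TEXT, crewtype TEXT)")
    conn.executemany("INSERT INTO crew VALUES (?, ?, ?)", ROWS)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(mismatches(CTE_QUERY))
    print(mismatches(HAVING_QUERY))
```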
I have some data:
CREATE TABLE #table (RID VARCHAR(10),
CommType INT,
CommunicationType INT,
VALUE VARCHAR(20),
lastDate DATETIME)
INSERT INTO #table (RID, CommType, CommunicationType, VALUE, lastDate)
VALUES
('00WAAS', 3, 0, 'mohan#gmail', '2012-06-15 15:23:49.653'),
('00WAAS', 3, 1, 'manasa#gmail', '2015-08-15 15:23:49.653'),
('00WAAS', 3, 2, 'mother#gmail', '2014-09-15 15:23:49.653'),
('00WAAS', 3, 2, 'father#gmail', '2016-01-15 15:23:49.653'),
('00WAAS', 3, 0, 'hello#gmail', '2013-01-15 15:23:49.653')
My query:
SELECT
TT.RID,
COALESCE(Homemail, BusinessMail, OtherMail) Mail
FROM
(SELECT
RID, MAX(Homemail) Homemail,
MAX(BusinessMail) BusinessMail,
MAX(OtherMail) OtherMail
FROM
(SELECT
RID,
CASE
WHEN CommType = 3 AND CommunicationType = 0 THEN VALUE
END AS Homemail,
CASE
WHEN CommType = 3 AND CommunicationType = 1 THEN VALUE
END AS BusinessMail,
CASE
WHEN CommType = 3 AND CommunicationType = 2 THEN VALUE
END AS OtherMail,
lastDate
FROM
#table) T
GROUP BY RID) TT
What I'm expecting:
If there is data for CommType = 3 and CommunicationType = 0, I need the related value with the latest date. If there is no data for that combination, I need the value for CommunicationType = 1 with the latest date, and if there is no data for CommunicationType = 1 either, the value for CommunicationType = 2 with the latest date.
I have tried CASE conditions, MAX and COALESCE.
If data is present for CommunicationType = 0, get the CommunicationType = 0 value with the latest date.
If no data is present for CommunicationType = 0, get the CommunicationType = 1 value with the latest date.
If no data is present for CommunicationType = 1, get the CommunicationType = 2 value with the latest date.
I'm not entirely sure I've understood the requirement, but I think you want:
One record returned for each RID.
The returned record should have a CommType of 3.
If there is more than one record with a CommType of 3, you want the record with the lowest CommunicationType.
If there is still more than one record, you want the one with the most recent lastDate.
This query uses the window function ROW_NUMBER to rank the available records within a subquery. PARTITION BY ensures each RID is ranked separately. The outer query returns all records with a rank of 1.
Query
SELECT
r.*
FROM
(
/* For each RID We want the lowest communication type with
* the most recent last date.
*/
SELECT
ROW_NUMBER() OVER (PARTITION BY RID ORDER BY CommunicationType, lastDate DESC) AS rn,
*
FROM
#table
WHERE
CommType = 3
) AS r
WHERE
r.rn = 1
;
Next Steps
This query works but could be better. For example, what should happen if two records have the same CommType, CommunicationType and lastDate? ROW_NUMBER would then pick one of them arbitrarily. Reading up on the differences between ROW_NUMBER, RANK, DENSE_RANK and NTILE will help you figure out your options here.
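A minimal sketch of the ROW_NUMBER query and of the tie case, run in SQLite from Python. It loads the question's sample rows into a table named comm (a hypothetical name, since #table is SQL Server temp-table syntax) and adds one hypothetical row that ties with hello#gmail. With RANK both tied rows get rank 1; ROW_NUMBER returns exactly one of them, with no guarantee which.

```python
import sqlite3

# The question's sample rows: (RID, CommType, CommunicationType, VALUE, lastDate).
ROWS = [
    ("00WAAS", 3, 0, "mohan#gmail", "2012-06-15 15:23:49.653"),
    ("00WAAS", 3, 1, "manasa#gmail", "2015-08-15 15:23:49.653"),
    ("00WAAS", 3, 2, "mother#gmail", "2014-09-15 15:23:49.653"),
    ("00WAAS", 3, 2, "father#gmail", "2016-01-15 15:23:49.653"),
    ("00WAAS", 3, 0, "hello#gmail", "2013-01-15 15:23:49.653"),
]
# Hypothetical extra row that ties with hello#gmail on every ORDER BY key.
TIE_ROW = ("00WAAS", 3, 0, "tie#gmail", "2013-01-15 15:23:49.653")

# Lowest CommunicationType wins; within it, the latest lastDate wins.
TEMPLATE = """
SELECT RID, VALUE
FROM (SELECT {rank_fn}() OVER (PARTITION BY RID
                               ORDER BY CommunicationType, lastDate DESC) AS rn, *
      FROM comm
      WHERE CommType = 3) AS r
WHERE rn = 1
"""


def pick(rows, rank_fn="ROW_NUMBER"):
    """Return sorted (RID, VALUE) pairs ranked 1; rank_fn must be a trusted name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comm (RID TEXT, CommType INTEGER, "
                 "CommunicationType INTEGER, VALUE TEXT, lastDate TEXT)")
    conn.executemany("INSERT INTO comm VALUES (?, ?, ?, ?, ?)", rows)
    result = sorted(conn.execute(TEMPLATE.format(rank_fn=rank_fn)).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(pick(ROWS))
    print(pick(ROWS + [TIE_ROW], "RANK"))
```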
If I understood you correctly, use ROW_NUMBER():
SELECT tt.RID, COALESCE(tt.Homemail, tt.BusinessMail, tt.OtherMail) AS Mail
FROM (
select s.RID,
MAX(CASE WHEN s.CommType = 3 AND s.CommunicationType = 0 THEN s.VALUE END) AS Homemail,
MAX(CASE WHEN s.CommType = 3 AND s.CommunicationType = 1 THEN s.VALUE END) AS BusinessMail,
MAX(CASE WHEN s.CommType = 3 AND s.CommunicationType = 2 THEN s.VALUE END) AS OtherMail
from (SELECT t.*, ROW_NUMBER() OVER(PARTITION BY t.RID, t.CommunicationType ORDER BY t.lastDate DESC) AS rnk
FROM #table t
WHERE t.CommType = 3) s
WHERE s.rnk = 1
GROUP BY s.RID) tt
I want a count, but it returns 1 for every record. Can you please suggest what to do?
SELECT Count(*),
innerTable.*
FROM (SELECT (SELECT NAME
FROM tours
WHERE tours.id = tourbooking.tourid) AS NAME,
(SELECT url
FROM tours
WHERE tours.id = tourbooking.tourid) AS Url,
(SELECT TOP 1 NAME
FROM tourimages
WHERE tourimages.tourid = tourbooking.tourid
ORDER BY id ASC) AS ImageName,
(SELECT duration + ' ' + CASE WHEN durationtype = 'd' THEN
'Day(s)' WHEN
durationtype =
'h' THEN 'Hour(s)' END
FROM tours
WHERE tours.id = tourbooking.tourid) AS Duration,
(SELECT Replace(Replace('<a> Adult(s) - <c> Children', '<a>', Sum
(CASE
WHEN [type] = 1 THEN 1
ELSE 0
END)),
'<c>',
Sum(CASE
WHEN [type] = 2 THEN 1
ELSE 0
END))
FROM tourperson
WHERE tourperson.bookingid = tourbooking.id) AS TotalPassengers
,
startdate,
createddate AS BookingDate,
id AS BookingID,
[status],
serviceprice
FROM tourbooking
WHERE memberid = 6)AS innerTable
GROUP BY innerTable.NAME,
innerTable.bookingdate,
innerTable.bookingid,
innerTable.duration,
innerTable.imagename,
innerTable.serviceprice,
innerTable.startdate,
innerTable.status,
innerTable.totalpassengers,
innerTable.url
You select records from tourbooking. One of the columns you select is id. This is probably the table's primary key and thus unique. (If not, you should hurry to change that name.)
You call this ID BookingID, and it is one of the columns you group by. So you get one result record per record in tourbooking. The number of records within such a "group" is of course 1; it is the one record you select and show.
If you built real groups, say a result record per day, then you'd get a real count, e.g. the number of bookings per day.
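A minimal sketch of this point, using a hypothetical tourbooking table with three bookings for member 6, run in SQLite from Python. Grouping by the unique id gives a count of 1 per row; grouping by createddate gives real per-day counts.

```python
import sqlite3

# Hypothetical bookings for member 6: (id, memberid, createddate).
BOOKINGS = [(1, 6, "2021-05-01"), (2, 6, "2021-05-01"), (3, 6, "2021-05-02")]


def counts(group_column):
    """COUNT(*) per group; group_column must be a trusted column name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tourbooking "
                 "(id INTEGER PRIMARY KEY, memberid INTEGER, createddate TEXT)")
    conn.executemany("INSERT INTO tourbooking VALUES (?, ?, ?)", BOOKINGS)
    sql = (f"SELECT {group_column}, COUNT(*) FROM tourbooking "
           f"WHERE memberid = 6 GROUP BY {group_column} ORDER BY {group_column}")
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(counts("id"))           # one group per booking
    print(counts("createddate"))  # one group per day
```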
Not sure where to start... but basically I have a report table, an account table, and an account history table. The account history table has zero or more records per account, where each record is the state of the account's cancelled flag after it changed.
There is other stuff going on, but basically I am looking to return the account detail data, with the state of the account's cancelled bit on the report's start date and end date as separate columns.
What is the best way to do this?
I have the working query below.
(Idea) Should I do separate joins on the history table, one for each date?
I guess I could do it in three separate queries (get the begin snapshot, get the end snapshot, and run the normal report query with a join to each snapshot).
Something else?
Expected output:
AccountID, OtherData, StartDateCancelled, EndDateCancelled
Test Tables:
CREATE TABLE #Report (ReportID INT, StartDate DATETIME, EndDate DATETIME)
CREATE TABLE #ReportAccountDetail (ReportID INT, AccountID INT, Cancelled BIT)
CREATE TABLE #AccountHistory (AccountID INT, ModifiedDate DATETIME, Cancelled BIT)
INSERT INTO #Report
SELECT 1,'1/1/2011', '2/1/2011'
--
INSERT INTO #ReportAccountDetail
SELECT 1 AS ReportID, 1 AS AccountID, 0 AS Cancelled
UNION
SELECT 1,2,0
UNION
SELECT 1,3,1
UNION
SELECT 1,4,1
--
INSERT INTO #AccountHistory
SELECT 2 AS CustomerID, '1/2/2010' AS ModifiedDate, 1 AS Cancelled
UNION--
SELECT 3, '2/1/2011', 1
UNION--
SELECT 4, '1/1/2010', 1
UNION
SELECT 4, '2/1/2010', 0
UNION
SELECT 4, '2/1/2011', 1
Current Query:
SELECT Accountid, OtherData,
MAX(CASE WHEN BeginRank = 1 THEN CASE WHEN BeginHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS StartDateCancelled,
MAX(CASE WHEN EndRank = 1 THEN CASE WHEN EndHistoryExists = 1 THEN HistoryCancelled ELSE DefaultCancel END ELSE NULL END ) AS EndDateCancelled
FROM
(
SELECT c.Accountid,
'OtherData' AS OtherData,
--lots of other data
ROW_NUMBER() OVER (PARTITION BY c.AccountID ORDER BY
CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS BeginRank,
CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END AS BeginHistoryExists,
ROW_NUMBER() OVER ( PARTITION BY c.AccountID ORDER BY
CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate desc) AS EndRank,
CASE WHEN ch.ModifiedDate <= Report.EndDate THEN 1 ELSE 0 END AS EndHistoryExists,
CAST( ch.Cancelled AS INT) AS HistoryCancelled,
0 AS DefaultCancel
FROM
#Report AS Report
INNER JOIN #ReportAccountDetail AS C ON Report.ReportID = C.ReportID
--Others joins related for data to return
LEFT JOIN #AccountHistory AS CH ON CH.AccountID = C.AccountID
WHERE Report.ReportID = 1
) AS x
GROUP BY AccountID, OtherData
Any input on writing better Stack Overflow questions is welcome. Thanks!
ROW_NUMBER() often surprises me and outperforms my expectations. In this case, however, I'd be tempted to just use correlated subqueries. At least, I'd test them against the alternatives.
Note: I would also use real tables, with real indexes, and a realistic volume of fake data. (If it's worth posting this question, I'm assuming that it's worth testing this realistically.)
SELECT
[Report].ReportID,
[Account].AccountID,
[Account].OtherData,
ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].StartDate ORDER BY ModifiedDate DESC), 0) AS StartDateCancelled,
ISNULL((SELECT TOP 1 Cancelled FROM AccountHistory WHERE AccountID = [Account].AccountID AND ModifiedDate <= [Report].EndDate ORDER BY ModifiedDate DESC), 0) AS EndDateCancelled
FROM
Report AS [Report]
LEFT JOIN
ReportAccountDetail AS [Account]
ON [Account].ReportID = [Report].ReportID
ORDER BY
[Report].ReportID,
[Account].AccountID
Note: For whatever reason, I've found that TOP 1 and ORDER BY is faster than MAX().
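Here is a minimal sketch of the correlated-subquery version, run in SQLite from Python with the question's test data. SQLite has no TOP or ISNULL, so ORDER BY ... LIMIT 1 and IFNULL stand in for them. The dates are rewritten as ISO strings on the assumption that '1/2/2010' and the others are month/day/year. Accounts with no history on or before a date fall back to 0, as in the answer.

```python
import sqlite3

# The question's test data, dates rewritten as ISO strings (assumed month/day/year).
REPORT = [(1, "2011-01-01", "2011-02-01")]
DETAIL = [(1, 1, 0), (1, 2, 0), (1, 3, 1), (1, 4, 1)]
HISTORY = [(2, "2010-01-02", 1), (3, "2011-02-01", 1),
           (4, "2010-01-01", 1), (4, "2010-02-01", 0), (4, "2011-02-01", 1)]

# Latest history row on or before each report date; no row means "not cancelled".
QUERY = """
SELECT r.ReportID, a.AccountID,
       IFNULL((SELECT h.Cancelled FROM AccountHistory AS h
               WHERE h.AccountID = a.AccountID AND h.ModifiedDate <= r.StartDate
               ORDER BY h.ModifiedDate DESC LIMIT 1), 0) AS StartDateCancelled,
       IFNULL((SELECT h.Cancelled FROM AccountHistory AS h
               WHERE h.AccountID = a.AccountID AND h.ModifiedDate <= r.EndDate
               ORDER BY h.ModifiedDate DESC LIMIT 1), 0) AS EndDateCancelled
FROM Report AS r
LEFT JOIN ReportAccountDetail AS a ON a.ReportID = r.ReportID
ORDER BY r.ReportID, a.AccountID
"""


def cancelled_snapshots():
    """Return (ReportID, AccountID, StartDateCancelled, EndDateCancelled) rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Report (ReportID INTEGER, StartDate TEXT, EndDate TEXT);
        CREATE TABLE ReportAccountDetail (ReportID INTEGER, AccountID INTEGER, Cancelled INTEGER);
        CREATE TABLE AccountHistory (AccountID INTEGER, ModifiedDate TEXT, Cancelled INTEGER);
    """)
    conn.executemany("INSERT INTO Report VALUES (?, ?, ?)", REPORT)
    conn.executemany("INSERT INTO ReportAccountDetail VALUES (?, ?, ?)", DETAIL)
    conn.executemany("INSERT INTO AccountHistory VALUES (?, ?, ?)", HISTORY)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in cancelled_snapshots():
        print(row)
```

Account 3 shows the difference between the two dates: its only change is on 2011-02-01, so it is not cancelled at the start date but is cancelled at the end date.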
As for your own query, I'd modify it slightly to use ISNULL for accounts with no history, instead of trying to make the Exists columns work.
I'd also join on the "other data" after all of the working out, rather than inside the inner-most query, so as to avoid having to group by all the "other data".
WITH
HistoricData AS
(
SELECT
Report.ReportID,
c.Accountid,
c.OtherData,
ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY CASE WHEN ch.ModifiedDate <= Report.StartDate THEN 1 ELSE 0 END DESC, ch.ModifiedDate DESC) AS BeginRank,
ROW_NUMBER() OVER (PARTITION BY c.ReportID, c.AccountID ORDER BY ch.ModifiedDate DESC) AS EndRank,
-- only a change on or before StartDate counts as the start-date state
CASE WHEN CH.ModifiedDate <= Report.StartDate THEN CAST(CH.Cancelled AS INT) END AS BeginCancelled,
-- MAX() does not accept BIT in SQL Server, so cast to INT
CAST(CH.Cancelled AS INT) AS Cancelled
FROM
#Report AS Report
INNER JOIN
#ReportAccountDetail AS C
ON Report.ReportID = C.ReportID
LEFT JOIN
#AccountHistory AS CH
ON CH.AccountID = C.AccountID
AND CH.ModifiedDate <= Report.EndDate
)
,
FlattenedData AS
(
SELECT
ReportID,
Accountid,
OtherData,
ISNULL(MAX(CASE WHEN BeginRank = 1 THEN BeginCancelled END), 0) AS StartDateCancelled,
ISNULL(MAX(CASE WHEN EndRank = 1 THEN Cancelled END), 0) AS EndDateCancelled
FROM
[HistoricData]
GROUP BY
ReportID,
AccountID,
OtherData
)
SELECT
*
FROM
[FlattenedData]
LEFT JOIN
[OtherData]
ON Whatever = YouLike
WHERE
[FlattenedData].ReportID = 1
And a final possible version...
WITH
ReportStartHistory AS
(
SELECT
*
FROM
(
SELECT
[Report].ReportID,
ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate DESC) AS SequenceID,
[History].*
FROM
Report AS [Report]
INNER JOIN
AccountHistory AS [History]
ON [History].ModifiedDate <= [Report].StartDate
)
AS [data]
WHERE
SequenceID = 1
)
,
ReportEndHistory AS
(
SELECT
*
FROM
(
SELECT
[Report].ReportID,
ROW_NUMBER() OVER (PARTITION BY [Report].ReportID, [History].AccountID ORDER BY [History].ModifiedDate DESC) AS SequenceID,
[History].*
FROM
Report AS [Report]
INNER JOIN
AccountHistory AS [History]
ON [History].ModifiedDate <= [Report].EndDate
)
AS [data]
WHERE
SequenceID = 1
)
SELECT
[Report].ReportID,
[Account].*,
ISNULL([ReportStartHistory].Cancelled, 0) AS StartDateCancelled,
ISNULL([ReportEndHistory].Cancelled, 0) AS EndDateCancelled
FROM
Report AS [Report]
INNER JOIN
ReportAccountDetail AS [Account]
ON [Account].ReportID = [Report].ReportID
LEFT JOIN
[ReportStartHistory]
ON [ReportStartHistory].ReportID = [Report].ReportID
AND [ReportStartHistory].AccountID = [Account].AccountID
LEFT JOIN
[ReportEndHistory]
ON [ReportEndHistory].ReportID = [Report].ReportID
AND [ReportEndHistory].AccountID = [Account].AccountID