Assign session number to a series of transactions - sql

CREATE TABLE [Transaction](
[TransactionID] [bigint] IDENTITY(1,1) NOT NULL,
[LocationID] [int] NOT NULL,
[KioskID] [int] NOT NULL,
[TransactionDate] [datetime] NOT NULL,
[TransactionType] [varchar](7) NOT NULL,
[Credits] [int] NOT NULL,
[StartingBalance] [int] NULL,
[EndingBalance] [int] NULL,
[SessionID] [int] NULL
);
Please refer to this fiddle for the sample data:
Link to SQL Fiddle
I'm trying to figure out if there is a way to assign a session number to a sequence of transactions in a single update.
A "Session" is defined as a number of deposits and purchases ending with a withdrawal. A Session has sequential transactions consisting of:
1 to n deposits (TransactionType = 'D'),
0 to n purchases (TransactionType = 'P') and
0 or 1 withdrawals (TransactionType = 'W')
With the same LocationID and KioskID. A session can end with a 0 balance or a withdrawal. First deposit with no session starts one. Only P transactions have balances. For D and W they are NULL.
LocationID, KioskID, SessionID must be unique.
I'm really hoping that there is a SQL way of doing this. I'd hate to have to loop through hundreds of millions of transactions to set sessions procedurally.

This should do it:
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
)
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd As SessionID
FROM markSessions
No triggers, cursors or client code needed.
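For illustration, here is a minimal, self-contained sketch with made-up rows (one kiosk, two sessions). It shows why the running total works: flag each session-ending row, take a running SUM of the flags, and the + 1 - SessionEnd term keeps the closing row inside the session it ends rather than pushing it into the next one.
-- Hypothetical mini-example: two sessions on one kiosk.
-- Rows 1-3 form session 1 (ends with the 'W'); rows 4-5 form session 2 (ends with a 0 balance).
WITH demo AS (
    SELECT *
    FROM (VALUES
          (1, 1, 1, 'D', NULL)
        , (2, 1, 1, 'P', 5)
        , (3, 1, 1, 'W', NULL)
        , (4, 1, 1, 'D', NULL)
        , (5, 1, 1, 'P', 0)
    ) AS t (TransactionID, LocationID, KioskID, TransactionType, EndingBalance)
),
markSessions AS (
    SELECT *,
           CASE WHEN TransactionType = 'W' THEN 1
                WHEN TransactionType = 'P' AND EndingBalance = 0 THEN 1
                ELSE 0 END AS SessionEnd
    FROM demo
)
SELECT TransactionID, TransactionType, SessionEnd,
       SUM(SessionEnd) OVER (PARTITION BY LocationID, KioskID ORDER BY TransactionID)
       + 1 - SessionEnd AS SessionID
FROM markSessions;
-- Returns SessionID = 1 for rows 1-3 and SessionID = 2 for rows 4-5.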
If you actually want to set the SessionID in the table, then you'd use an UPDATE statement like this:
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
), numberedSessions as
(
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd As NewSessionID
FROM markSessions
)
UPDATE numberedSessions
SET SessionID = NewSessionID
I am unable to test it, but the following should take into account pre-existing SessionIDs
;WITH markSessions as
(
SELECT *,
CASE
WHEN TransactionType='W' THEN 1
WHEN TransactionType='P' And EndingBalance=0 THEN 1
ELSE 0 END As SessionEnd
FROM [Transaction]
), numberedSessions as
(
SELECT *,
SUM(SessionEnd) OVER(PARTITION BY LocationID, KioskID ORDER BY TransactionID)
+ 1 - SessionEnd
+ COALESCE(MAX(SessionID) OVER (PARTITION BY LocationID, KioskID), 0) As NewSessionID
FROM markSessions
)
UPDATE numberedSessions
SET SessionID = NewSessionID
WHERE SessionID Is NULL
Note that this will only work if all new rows (those without SessionIDs) have higher TransactionIDs than the pre-existing rows (those that already have SessionIDs). It will definitely NOT work if new rows were added with TransactionIDs lower than the highest TransactionID already assigned a SessionID.
If you can have that situation, then you will likely have to reassign the old TransactionIDs.
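If that situation is possible, a hypothetical sanity check (not from the original answer) can flag the kiosks where it occurs, i.e. where a row without a SessionID has a lower TransactionID than a row that already has one:
-- Hypothetical check: lists (LocationID, KioskID) pairs where a new row
-- (SessionID IS NULL) sits below an already-numbered row, the case the
-- incremental UPDATE above cannot handle.
SELECT n.LocationID, n.KioskID, MIN(n.TransactionID) AS FirstNewID
FROM [Transaction] AS n
JOIN [Transaction] AS o
  ON o.LocationID = n.LocationID
 AND o.KioskID = n.KioskID
 AND o.SessionID IS NOT NULL
 AND o.TransactionID > n.TransactionID
WHERE n.SessionID IS NULL
GROUP BY n.LocationID, n.KioskID;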

Related

Update multiple latest record on table using loop

CREATE TABLE [dbo].[masterTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[CID] [int] NOT NULL,
[PID] [int] NOT NULL,
[Description] [nvarchar](max) NOT NULL,
[CreatedOn] [datetime] NOT NULL,
[State] [nchar](20) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
INSERT INTO [dbo].[masterTable]
([CID]
,[PID]
,[Description]
,[CreatedOn]
)
VALUES
(189
,186
,'FC1_189'
,GETUTCDATE()),
(189
,186
,'FC2_189'
,DATEADD(D, +1, GETUTCDATE())),
(190
,186
,'FC1_190'
,DATEADD(d, +2, GETUTCDATE())),
(190
,186
,'FC2_190'
,DATEADD(d, +3, GETUTCDATE())),
(191
,186
,'FC1_191'
,DATEADD(d, +4, GETUTCDATE())),
(191
,186
,'FC2_191'
,DATEADD(d, +5, GETUTCDATE()))
I have a table with 6 records. For each CID and PID combination I am trying to update the latest created record with the state 'Latest data', and the older records should have their state updated to 'Old data'. I tried the query below, but the update for the older records is not working.
Explanation: I currently have 6 records grouped by CID and PID.
The FC2_189 row is the latest record based on the CreatedOn column, so its State column should be updated to 'Latest data'. The other record, FC1_189, is older based on CreatedOn, so its State column should be updated to 'Old data'.
The same should happen with the FC2_190/FC1_190 and FC2_191/FC1_191 rows.
SELECT Description,
STATE,
CreatedOn,
PID,
CID,
ROW_NUMBER() OVER (
PARTITION BY ContractID ORDER BY CreatedOn DESC
) contractRN
INTO #ControlTable
FROM masterTable
DECLARE #i INT = 1
DECLARE #count INT
SELECT #count = Count(*)
FROM #ControlTable
WHILE #i <= #count
BEGIN
UPDATE masterTable SET State =
CASE WHEN CreatedOn = (SELECT MAX(CreatedOn) FROM masterTable)
THEN 'Latest data'
ELSE 'Old data'
END
SET #i = #i + 1
END
DROP TABLE #ControlTable
You don't need a loop or joins at all. You can simply calculate a row-number inside a CTE, then update the CTE.
WITH cte AS (
SELECT *,
rn = ROW_NUMBER() OVER (PARTITION BY PID, CID ORDER BY CreatedOn DESC)
FROM masterTable
)
UPDATE cte
SET State = CASE WHEN rn = 1
THEN 'Latest data'
ELSE 'Old data'
END;
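As a quick check (just a plain SELECT, assuming nothing beyond the table above), each (CID, PID) pair should now have exactly one row marked 'Latest data', the one with the most recent CreatedOn:
-- Verify the update: one 'Latest data' row per (CID, PID), the rest 'Old data'.
SELECT CID, PID, Description, CreatedOn, State
FROM dbo.masterTable
ORDER BY CID, PID, CreatedOn DESC;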

get trip period in sqlserver

I have a GPS app that saves data to the following table:
CREATE TABLE [dbo].[T_Tracking]
(
[id] [int] IDENTITY(1,1) NOT NULL,
[IMEI] [nvarchar](50) NULL,
[TrackTime] [datetime] NULL,
[Longitude] [nvarchar](50) NULL,
[Lattitude] [nvarchar](50) NULL,
[speed] [float] NULL,
[CarID] [int] NULL,
[Country] [nvarchar](50) NULL,
[City] [nvarchar](50) NULL,
[Area] [nvarchar](50) NULL,
[Street] [nvarchar](50) NULL,
[FullAddress] [nvarchar](150) NULL,
[Distance] [float] NULL
-- ...
)
I want to make a trip query pulling back start time & speed, and end time & speed.
This is my query:
SELECT id
, IMEI
, TrackTime as StartTime
, speed as StartSpeed
, CarID
, FullAddress
, (
SELECT TOP (1) TrackTime AS Expr1
FROM T_Tracking AS E2
WHERE (CarID = E1.CarID)
AND (id > E1.id)
AND (speed <5)
ORDER BY id desc
) AS StopTime
, (
SELECT TOP (1) speed AS Expr1
FROM T_Tracking AS E2
WHERE (CarID = E1.CarID)
AND (id > E1.id)
AND (speed <5)
ORDER BY id desc
) AS EndSpeed
FROM T_Tracking AS E1
WHERE (speed > 5)
order by id desc
It works fine, but to decide that it is the end of the trip, the car should have been stopped for 15 minutes (the car might stop in traffic for a minute or two, and we don't want that to count as the end of the trip).
How can I add this logic?
Additionally, I need to sum the Distance field to get the total trip distance.
Sample Table Data:
The desired result is:
Notes:
the GPS device saves a record every 30 seconds
the car may stop in traffic, so a stop only counts as the end of the trip if the car has been stopped for 15 minutes
a stop is not always speed = 0; treat speed < 5 as stopped (device accuracy / tolerance)
Distance is the distance between the current point and the previous one (i.e. the distance covered in that 30-second interval)
This is a demo of how you can get the stops. For simplicity I used INT to represent both time and speed; the query can easily be adapted to the DATETIME data type. A stop is logged if its duration is > 4.
SELECT CarID, stopped, grp
, min(TrackTime) stopStartTime
, max(TrackTime) + 1 - min(TrackTime) stopDuration
FROM (
SELECT id, TrackTime, CarID, stopped
, row_number() over (partition by CarID order by TrackTime) - row_number() over (partition by CarID, stopped order by TrackTime) grp
FROM (
SELECT id, TrackTime, CarID, CASE WHEN speed > 5 THEN 0 ELSE 1 END stopped
FROM (
-- Demo data
VALUES
(0,0,10, 2)
,(1,1,10, 8)
,(2,2,10, 8)
,(3,3,10, 4)
,(4,4,10, 4)
,(5,5,10, 4)
,(6,6,10, 4)
,(7,7,10, 4)
,(8,8,10, 4)
,(9,9,10, 8)
) T_Tracking (id, TrackTime, CarID, speed)
) g
) t
GROUP BY CarID, stopped, grp
HAVING max(TrackTime) + 1 - min(TrackTime) > 4
ORDER BY min(TrackTime)
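For what it's worth, here is a sketch (untested, with made-up timestamps) of how the same gaps-and-islands idea might look against real DATETIME values, reporting only stops of at least 15 minutes as the question requires:
-- Hypothetical adaptation of the demo above to DATETIME: report a stop only if
-- the gap between its first and last sample is at least 15 minutes.
SELECT CarID
     , MIN(TrackTime) AS stopStartTime
     , DATEDIFF(MINUTE, MIN(TrackTime), MAX(TrackTime)) AS stopMinutes
FROM (
    SELECT id, TrackTime, CarID, stopped
         , ROW_NUMBER() OVER (PARTITION BY CarID ORDER BY TrackTime)
         - ROW_NUMBER() OVER (PARTITION BY CarID, stopped ORDER BY TrackTime) AS grp
    FROM (
        SELECT id, TrackTime, CarID,
               CASE WHEN speed > 5 THEN 0 ELSE 1 END AS stopped
        FROM (VALUES
              (1, CAST('2023-01-01T08:00:00' AS datetime), 10, 40.0)
            , (2, CAST('2023-01-01T08:00:30' AS datetime), 10, 2.0)
            , (3, CAST('2023-01-01T08:20:30' AS datetime), 10, 1.0)
            , (4, CAST('2023-01-01T08:21:00' AS datetime), 10, 35.0)
        ) AS T_Tracking (id, TrackTime, CarID, speed)
    ) AS g
) AS t
WHERE stopped = 1
GROUP BY CarID, grp
HAVING DATEDIFF(MINUTE, MIN(TrackTime), MAX(TrackTime)) >= 15
ORDER BY MIN(TrackTime);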
Try this (untested):
select JourneyBounds.id
, JourneyBounds.IMEI
, JourneyBounds.TrackTime as StartTime
, JourneyBounds.speed as StartSpeed
, JourneyBounds.CarID
, JourneyBounds.FullAddress
, max(journey.TrackTime) StopTime
, max(case when journey.id = JourneyBounds.EndOfJourneyId then journey.speed else null end) EndSpeed
, JourneyBounds.Distance + sum(journey.Distance) TotalDistance
from (
select *
, (
select min(id)
from T_Tracking EndOfJourney
where EndOfJourney.Id > StartOfJourney.Id
and EndOfJourney.CarId = StartOfJourney.CarId
and EndOfJourney.speed < 5
--edit; car must have been stopped for 15 mins; so we need to check that the records after this stop confirm that (i.e. that the car does not move in that time)
and not exists (
select top 1 1
from T_Tracking WaitFifteenMins
where WaitFifteenMins.Id > EndOfJourney.Id
and WaitFifteenMins.TrackTime <= DateAdd(minute, 15, EndOfJourney.TrackTime)
and WaitFifteenMins.speed >= 5
)
--end of edit
) EndOfJourneyId
from T_Tracking StartOfJourney
where StartOfJourney.speed < 5
) JourneyBounds
inner join T_Tracking journey
on journey.CarId = JourneyBounds.CarId
and journey.id > JourneyBounds.Id
and journey.id <= JourneyBounds.EndOfJourneyId
group by JourneyBounds.id
, JourneyBounds.IMEI
, JourneyBounds.TrackTime
, JourneyBounds.speed
, JourneyBounds.CarID
, JourneyBounds.FullAddress
, JourneyBounds.Distance
having count(1) > 1
The JourneyBounds subquery gets all records with speed < 5 (i.e. the start records for any potential journeys). Additionally it pulls back the id of the first stop after that start (i.e. the first record with a greater id which has a speed of less than 5).
It then does an inner join pulling back all records for the same car which come after the start time, up to & including the end record for this journey. We then calculate the distance by summing all distances of the records on this journey.
The having count(1) > 1 at the end just says that if the car's start record is immediately followed by a stop, we can assume it hasn't moved / there was no journey. Presumably we don't want those non-journeys in our results.

Query optimization for convert VARBINARY to VARCHAR and charindex on it

I have a repository table which has around 18.7 million rows, and every month around 100 thousand to 500 thousand rows are added. The table structure is as follows:
CREATE TABLE [dbo].[my_table](
[id] [bigint] NULL,
[a_timestamp] [datetime] NULL,
[eventId] [bigint] NULL,
[userId] [varchar](255) NULL,
[customerid] [varchar](128) NULL,
[messageType] [varchar](100) NULL,
[message] [varbinary](max) NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
I have written the following query to get various counts for each month. The query currently takes around 10 minutes to execute. I need help to optimize this query and, if possible, bring the time down to a couple of minutes.
SELECT DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM [my_table]
WHERE messageType = 'Outbound'
AND userId NOT IN ('master', 'admin')
AND CHARINDEX('Retrieve Document',CONVERT(VARCHAR(MAX),[message])) > 1
GROUP BY DATEADD(month, DATEDIFF(month, 0,a_timestamp), 0)
ORDER BY MonthYear
I think the key reasons for the long execution time are as follows:
CHARINDEX('Retrieve Document', CONVERT(VARCHAR(MAX), [message])) > 1 - converting every row from VARBINARY to VARCHAR and searching it for 'Retrieve Document'
userId NOT IN ('master', 'admin') - filtering out the users in the list (the actual list is longer than 2 strings, around 10)
18.7 million rows in the table
A couple of points to note:
I didn't create this table and I can't change it.
I don't have SHOWPLAN permission.
I need to use this query in an Excel data connection and have the user run it from Excel. The user will only have SELECT privileges.
Given that you cannot change the existing table, it may be better to change your strategy.
Instead of running your query and rebuilding the complete result set every time, why don't you insert new results into another table (let's call it AccumulatedResults) on a monthly basis?
That way you are only handling the ~500K new records each time, which will be much faster than rebuilding the entire result set every time. The query would look a little like this:
INSERT INTO AccumulatedResults
(
MonthYear,
[COUNT],
UniqueCustomers,
UniqueUsers
)
SELECT
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0) AS MonthYear,
COUNT(*) AS [Count],
COUNT(DISTINCT customerid) AS [Unique Customers],
COUNT(DISTINCT userId) AS [Unique Users]
FROM
[my_table]
WHERE
messageType = 'Outbound' AND
userId NOT IN ('master', 'admin') AND
CHARINDEX('Retrieve Document', CONVERT(VARCHAR(MAX), [message])) > 1
-- This is a new condition; COALESCE covers the very first run, when
-- AccumulatedResults is still empty
AND DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
> COALESCE((SELECT MAX(MonthYear) FROM AccumulatedResults), '19000101')
GROUP BY
DATEADD(month, DATEDIFF(month, 0, a_timestamp), 0)
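The query above assumes an AccumulatedResults table already exists. A minimal sketch of what it might look like (the column names mirror the INSERT above; the primary key choice is an assumption):
-- Hypothetical definition of the AccumulatedResults table used above.
CREATE TABLE dbo.AccumulatedResults (
    MonthYear       datetime NOT NULL PRIMARY KEY,  -- first day of the month
    [COUNT]         int      NOT NULL,
    UniqueCustomers int      NOT NULL,
    UniqueUsers     int      NOT NULL
);
The Excel data connection could then read SELECT * FROM AccumulatedResults ORDER BY MonthYear instead of running the heavy aggregation every time.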

Duplicate a row multiple times

Basically I want to duplicate a row a variable number of times.
I have a table with the following structure:
CREATE TABLE [dbo].[Start](
[ID] [int] NOT NULL,
[Apt] [int] NOT NULL,
[Cost] [int] NOT NULL)
I want to duplicate each row in this table (Apt - 1) times, so in the end there will be Apt rows for each original row. Moreover, for each new row the value of Cost is decremented until it reaches 0. ID will stay the same, as there are no primary keys. If I have a record like this:
1 5 3
I need 4 new rows inserted in the same table and they should look like this
1 5 2
1 5 1
1 5 0
1 5 0
I have tried so far a lot of ways but I cannot make it work. Many thanks!
Try this:
DECLARE @Start TABLE (
[ID] [int] NOT NULL,
[Apt] [int] NOT NULL,
[Cost] [int] NOT NULL)
INSERT @Start (ID, Apt, Cost)
VALUES (1, 5, 3)
; WITH CTE_DIGS AS (
SELECT ROW_NUMBER() OVER(ORDER BY (SELECT 1)) AS rn
FROM master.sys.all_columns AS a
)
INSERT @Start (ID, Apt, Cost)
SELECT ID, Apt, CASE WHEN Cost - rn < 0 THEN 0 ELSE Cost - rn END
FROM @Start
INNER JOIN CTE_DIGS
ON Apt > rn
Try:
;with cte as
(select [ID], [Apt], [Cost], 1 counter from [Start]
union all
select [ID],
[Apt],
case sign([Cost]) when 1 then [Cost]-1 else 0 end [Cost],
counter+1 counter
from cte where counter < [Apt])
select [ID], [Apt], [Cost]
from cte
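One caveat worth noting with the recursive approach: SQL Server's default MAXRECURSION limit is 100 levels, so if Apt can exceed 100 the query above will fail with a "maximum recursion has been exhausted" error. A self-contained sketch (using a table variable as demo data) with the limit lifted:
-- Demo: Apt = 150 would exceed the default recursion limit of 100,
-- so OPTION (MAXRECURSION 0) removes the cap.
DECLARE @Start TABLE ([ID] int NOT NULL, [Apt] int NOT NULL, [Cost] int NOT NULL);
INSERT @Start (ID, Apt, Cost) VALUES (1, 150, 3);

;WITH cte AS
(
    SELECT [ID], [Apt], [Cost], 1 AS counter FROM @Start
    UNION ALL
    SELECT [ID], [Apt],
           CASE SIGN([Cost]) WHEN 1 THEN [Cost] - 1 ELSE 0 END,
           counter + 1
    FROM cte
    WHERE counter < [Apt]
)
SELECT [ID], [Apt], [Cost]
FROM cte
OPTION (MAXRECURSION 0);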

Remove duplicate row and update next row to current row and continue

I need a SELECT query.
Environment: SQL Server 2005 or newer.
Example:
In this sample table, if I select the top 20 rows, no duplicate records should appear, and the next record should take the duplicate's place within those 20.
Example:
123456 should not repeat within the 20 rows; if the 18th row is a duplicate, the 19th record should come in its place, the 20th should come in 19th place, the 21st in 20th place, and so on.
The ascending or descending order of the rows does not matter.
Lookup Table before
Id Name
123456 hello
123456 hello
123654 hi
123655 yes
LookUp Table after
Id Name
123456 hello
123654 hi
123655 yes
My table:
CREATE TABLE [dbo].[test](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ContestId] [int] NOT NULL,
[PrizeId] [int] NOT NULL,
[ContestParticipantId] [int] NOT NULL,
[SubsidiaryAnswer] [varchar](256) NOT NULL,
[SubsidiaryDifference] [bigint] NOT NULL,
[AttemptTime] [datetime] NOT NULL,
[ParticipantName] [varchar](250) NOT NULL,
[IsSubscribed] [bit] NOT NULL,
[IsNewlyRegistered] [bit] NOT NULL,
[IsWinner] [bit] NOT NULL,
[IsWinnerConfirmed] [bit] NOT NULL,
[IsWinnerExcluded] [bit] NOT NULL) ON [PRIMARY]
My question is: from this select, we actually need the first 20, but unique ones.
SELECT TOP 20 * FROM test order by SubsidiaryDifference
When we run the above query, there are currently some duplicates in the result. When there is a duplicate, we need to take it only once and then take the next row.
Does anyone know how to solve this?
Thanks in advance :)
Reading your question, it appears you don't really want to delete the rows from the table - you just want to display the TOP 20 distinct rows - you could try something like this:
;WITH LastPerContestParticipantId AS
(
SELECT
ContestParticipantId,
SubsidiaryDifference,
-- add whatever other columns you want to select here
ROW_NUMBER() OVER(PARTITION BY ContestParticipantId
ORDER BY SubsidiaryDifference) AS 'RowNum'
FROM dbo.Test
)
SELECT TOP (20)
ContestParticipantId,
-- add whatever other columns you want to select here
SubsidiaryDifference
FROM
LastPerContestParticipantId
WHERE
RowNum = 1
ORDER BY
SubsidiaryDifference
This will show you one row for each distinct ContestParticipantId, ordered by SubsidiaryDifference - try it!
Update #2: I've created a quick sample - it uses the data from your original post - plus an additional SubID column so that I can order rows of the same ID by something...
When I run this with my CTE query, I do get only one entry for each ID - so what exactly is "not working" for you?
DECLARE @test TABLE (ID INT, EntryName VARCHAR(50), SubID INT)
INSERT INTO @test
VALUES(123456, 'hello', 1), (123456, 'hello', 2), (123654, 'hi', 1), (123655, 'yes', 3)
;WITH LastPerId AS
(
SELECT
ID, EntryName,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SubID DESC) AS 'RowNum'
FROM @test
)
SELECT TOP (3)
ID, EntryName
FROM
LastPerId
WHERE
RowNum = 1
Gives an output of:
ID EntryName
123456 hello
123654 hi
123655 yes
No duplicates.
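For the simpler two-column lookup table at the top of the question, where duplicate rows are exact copies, plain DISTINCT would be enough (the table name LookupTable is assumed):
-- DISTINCT collapses identical rows before TOP 20 is applied.
SELECT DISTINCT TOP (20) Id, Name
FROM LookupTable
ORDER BY Id;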