How to increment a value in SQL based on a unique key - sql

Apologies in advance if some of the trigger solutions already cover this but I can't get them to work for my scenario.
I have a table of over 50,000 rows, all of which have an instrumentID, with roughly 5,000 distinct instrumentID values. There could be 100 rows with instrumentID = 1 and 50 with instrumentID = 2, etc., but they will have slightly different column entries. So I could write
SELECT * from tbl WHERE instrumentID = 1
and have it return 100 rows (I know this is easy stuff but just to be clear)
What I need to do is form an incrementing value each time an instrumentID is found, so I've tried stuff like this:
IntIndex INT IDENTITY(1,1),
dDateStart DATE,
IntInstrumentID INT,
IntIndex1 AS IntInstrumentID + IntIndex,
at the table create step.
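For context, a minimal sketch of the full CREATE TABLE that fragment implies (the table name tbl is taken from the query above; the columns are as posted):
CREATE TABLE tbl
(
    IntIndex INT IDENTITY(1,1),
    dDateStart DATE,
    IntInstrumentID INT,
    IntIndex1 AS IntInstrumentID + IntIndex
);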
However, I need IntIndex1 to increment each time an instrumentID is found, irrespective of where the record sits in the table, so that the last IntIndex1 value alone effectively gives a count of the records for that instrument. What the above does instead is increment across all rows of the table regardless of instrumentID, so you would get 5001, 4002, 4003, etc.
An example of what I want, for instrumentIDs 5000 and 4000:
intInstrumentID | IntIndex1
--------- ------------------
5000 | 5001
5000 | 5002
4000 | 4001
5000 | 5003
4000 | 4002
The reason I need to do this is that I need to join two tables based on these values (a start and an end date for each instrumentID). I have tried GROUP BY etc., but that can't work in both tables and the JOIN then doesn't work.
Many thanks

I'm not entirely sure I understand your problem, but if you just need IntIndex1 to join to, could you join to the following query rather than trying to actually keep the calculated value in the database:
SELECT *,
intInstrumentID + RANK() OVER(PARTITION BY intInstrumentID ORDER BY dDateStart ASC) AS IntIndex1
FROM tbl
Edit: If I understand your comment correctly (which is not certain!), then presumably you know that your end date and start date tables have exactly the same number of rows, which gives a one-to-one mapping between them based on their respective date order within instrumentID?
If that's the case then maybe this join is what you are looking for:
SELECT SD.intInstrumentID, SD.dDateStart, ED.dEndDate
FROM
(
    SELECT intInstrumentID,
           dDateStart,
           RANK() OVER(PARTITION BY intInstrumentID ORDER BY dDateStart ASC) AS IntIndex1
    FROM tblStartDate
) SD
JOIN
(
    SELECT intInstrumentID,
           dEndDate,
           RANK() OVER(PARTITION BY intInstrumentID ORDER BY dEndDate ASC) AS IntIndex1
    FROM tblEndDate
) ED
    ON SD.intInstrumentID = ED.intInstrumentID
    AND SD.IntIndex1 = ED.IntIndex1
If not, please will you post some example data for both tables and the expected results?

Related

Updating column according to index within group

In our databases we have a table called conditions which references a table called attributes.
So it looks like this (ignoring some other columns that aren't relevant to the question)
id | attribute_id | execution_index
-----------------------------------
1  | 1000         | 1
2  | 1000         | 2
3  | 1000         | 1
4  | 2000         | 1
5  | 2000         | 2
6  | 2000         | 2
In theory the combination of attribute_id and execution_index should always be unique, but in practice they're not, and the software ends up essentially using the id to decide which comes first between two conditions with the same execution index. We want to add a uniqueness constraint to the table, but before we do that we need to update the execution indexes. So essentially we want to group them by attribute_id, order them by execution_index then id, and give them new execution indexes so that it becomes
id | attribute_id | execution_index
-----------------------------------
1  | 1000         | 1
2  | 1000         | 3
3  | 1000         | 2
4  | 2000         | 1
5  | 2000         | 2
6  | 2000         | 3
I'm not sure how to do this without just ordering by attribute_id, execution_index, id and then iterating through, incrementing the execution_index by 1 each time and resetting it to 1 whenever the attribute_id changes. (That would work, but it would be slow, and someone is going to have to run this script on several dozen databases, so I'd rather it didn't take more than a couple of seconds per database.)
Really I'd like to do something along the lines of
UPDATE c
SET c.execution_index = [this needs to be the index within the group somehow]
FROM conditions c
GROUP BY c.attribute_id
ORDER BY c.execution_index ASC, c.id ASC
But I don't know how to make that actually work.
It looks like you can use an updatable CTE:
with cte as (
    select *,
           Row_Number() over(partition by attribute_id order by execution_index, id) new
    from conditions
)
update cte set execution_index = new
I would suggest adding a new column and first updating that and checking the results are as expected.
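A minimal sketch of that safety check, assuming SQL Server and a throwaway column (new_execution_index is a name made up here):
alter table conditions add new_execution_index int null;

with cte as (
    select *,
           Row_Number() over(partition by attribute_id order by execution_index, id) new
    from conditions
)
update cte set new_execution_index = new;

-- compare the proposed values before touching execution_index
select id, attribute_id, execution_index, new_execution_index
from conditions
order by attribute_id, new_execution_index;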
Example Fiddle
WITH cte AS
(
    SELECT
        *,
        ROW_NUMBER() OVER
        (
            PARTITION BY attribute_id
            ORDER BY execution_index, id
        ) AS RowNum
    FROM conditions
)
UPDATE cte
SET execution_index = RowNum
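Once the renumbering is verified, the uniqueness constraint mentioned in the question could then be added with something along these lines (the constraint name is just illustrative):
ALTER TABLE conditions
ADD CONSTRAINT UQ_conditions_attribute_execution UNIQUE (attribute_id, execution_index);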

The nearest row in the other table

One table is a sample of users and their purchases.
Structure:
Email | NAME | TRAN_DATETIME (Varchar)
So we have customer email + FirstName&LastName + Date of transaction
and the second table, which comes from a second system, contains all users, their sensitive data, and when they got registered in our system.
Simplified Structure:
Email | InsertDate (varchar)
My task is to count the difference in minutes between the rows inserted from sales (the first table) and the rows with users and their sensitive data.
The issue is that the second table contains many rows per email, and I want to find the row in the 2nd table that is nearest in time, because sometimes the difference may be a few minutes (a delay in either direction) and sometimes it can be a few days.
So for email x I have this row in the 1st table:
E_MAIL NAME TRAN_DATETIME
p****#****.eu xxx xxx 2021-10-04 00:03:09.0000000
But then I have 3 rows in the 2nd table, and the latest one is the one I want to use for the difference:
Email InsertDate
p****#****.eu 2021-05-20 19:12:07
p****#****.eu 2021-05-20 19:18:48
p****#****.eu 2021-10-03 18:32:30 <--
I wrote this query so far, but I have no idea how to match the nearest row in the 2nd table:
SELECT DISTINCT TOP (100)
     a.[E_MAIL]
    ,a.[NAME]
    ,a.[TRAN_DATETIME]
    ,CASE WHEN b.EMAIL IS NOT NULL THEN 'YES' ELSE 'NO' END AS 'EXISTS'
    ,ABS(CONVERT(INT, CONVERT(datetime, LEFT(a.[TRAN_DATETIME], 10), 120)) - CONVERT(INT, CONVERT(datetime, LEFT(b.[InsertDate], 10), 120))) AS 'DateAccuracy'
FROM [crm].[SalesSampleTable] a
LEFT JOIN [crm].[SensitiveTable] b ON a.[E_MAIL] = b.[EMAIL]
Totally untested: I'd need sample data and the database. The main areas of suspicion are the date casting and the date math, since I don't know what RDBMS and version this is, so consider the following "pseudo code".
We assign a row number ordered by the absolute difference in seconds between the dates; the rows with a row number of 1 win.
WITH CTE AS (
    SELECT A.*, B.*,
           -- smallest absolute difference in seconds first, so RN = 1 is the nearest row
           row_number() over (PARTITION BY A.e_mail
                              ORDER BY abs(datediff(second, cast(Tran_dateTime as datetime), cast(InsertDate as datetime)))) RN
    FROM [crm].[SalesSampleTable] a
    LEFT JOIN [crm].[SensitiveTable] b
        ON a.[E_MAIL] = b.[EMAIL])
SELECT * FROM CTE WHERE RN = 1
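Since the task is a difference in minutes, once the nearest row wins (RN = 1) the gap itself could be computed roughly like this, swapped in for the final SELECT above (a sketch; column names as in the question):
SELECT *,
       datediff(minute, cast(InsertDate as datetime), cast(TRAN_DATETIME as datetime)) AS MinutesDiff
FROM CTE
WHERE RN = 1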

Summing numbers don't match up?

I have a table with data like this (acc_v in query):
value | id
----------
100   | 1
300   | 2
200   | 1
As you can see there are multiple values per id. I want to sum all of the values across the ids and end up with this:
value | id
----------
300   | 1
300   | 2
;with accV as (
    SELECT d_id
          ,[period_end_date]
          ,max(a_value) as value
          ,id
    FROM bh
    where period_end_date = '2021-6-30'
    group by d_id, period_end_date, id
)
SELECT bh.id, sum(value)
FROM bh join accV on accV.id = bh.id
group by bh.id
For some reason the total values are adding up to significantly larger amounts than they should. I verified this by taking the original values and summing them in Excel. If anyone knows what I am doing wrong, the help is much appreciated.
You can use window functions:
select id, sum(sum(value)) over () as total_sum
from t
group by id;
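On the sample data above (a sketch, assuming the table is called t with columns value and id): the grouped sum per id is 100 + 200 = 300 for id 1 and 300 for id 2, and wrapping that inner sum in a window SUM(...) OVER () as in the answer returns the grand total (600) on every output row:
select id,
       sum(value) as value_per_id,
       sum(sum(value)) over () as total_sum
from t
group by id;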

SQL Remove Duplicates, save lowest of certain column

I've been looking for an answer to this but couldn't find anything matching this particular situation.
So I have a one table that I want to remove duplicates from.
__________________
| JobNumber-String |
| JobOp - Number |
------------------
So there are multiples of these two values; together they make the key for the row. I want to keep each distinct JobNumber with its lowest JobOp. How can I do this? I've tried a bunch of things, mainly the MIN function, but that only seems to work on the entire table, not just the JobNumber sets. Thanks!
Original Table Values:
JobNumber Jobop
123 100
123 101
456 200
456 201
780 300
Code Ran:
DELETE FROM table
WHERE CONCAT(JobNumber,JobOp) NOT IN
(
SELECT CONCAT(JobNumber,MIN(JobOp))
FROM table
GROUP BY JobNumber
)
Ending Table Values:
JobNumber Jobop
123 100
456 200
780 300
With SQL Server 2008 or higher you can enhance the MIN function with an OVER clause specifying a PARTITION BY section.
Please have a look at https://msdn.microsoft.com/en-us/library/ms189461.aspx
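A minimal sketch of what that looks like here (table and column names are the placeholders used elsewhere in this thread):
SELECT JobNumber, JobOp,
       MIN(JobOp) OVER (PARTITION BY JobNumber) AS MinJobOp
FROM table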
You can simply select the values you want to keep:
select JobNumber, min(JobOp) from table group by JobNumber
Then you can delete the records you don't want:
DELETE t FROM table t
LEFT JOIN (select JobNumber, min(JobOp) as minJobOp from table group by JobNumber) e
    ON t.JobNumber = e.JobNumber and t.JobOp = e.minJobOp
WHERE e.JobNumber is null
I like to do this with window functions:
with todelete as (
    select t.*, min(JobOp) over (partition by JobNumber) as minJobOp
    from table t
)
delete from todelete
where JobOp > minJobOp;
It sounds like you are not using the correct GROUP BY clause when using the MIN function. This sql should give you the minimum JobOp value for each JobNumber:
SELECT JobNumber, MIN(JobOp) FROM test.so_test GROUP BY JobNumber;
Using this in a subquery, along with CONCAT (this is from MySQL, SQL Server might use different function) because both fields form your key, gives you this sql:
SELECT * FROM so_test WHERE CONCAT(JobNumber,JobOp)
NOT IN (SELECT CONCAT(JobNumber,MIN(JobOp)) FROM test.so_test GROUP BY JobNumber);

How to efficiently get a value from the last row in bulk on SQL Server

I have a table like so
Id | Type | Value
--------------------
0 | Big | 2
1 | Big | 3
2 | Small | 3
3 | Small | 3
I would like to get a table like this
Type | Last Value
--------------------
Small | 3
Big | 3
How can I do this? I understand there is an SQL Server function called LAST_VALUE(...) OVER (...), but I can't get this to work with GROUP BY.
I've also tried using SELECT MAX(ID) and SELECT TOP 1, but this seems a bit inefficient since there would be a subquery for each value. The queries take too long when the table has a few million rows in it.
Is there a way to quickly get the last value for these, perhaps using LAST_VALUE?
You can do it using ROW_NUMBER:
select
    type,
    value
from
(
    select
        type,
        value,
        row_number() over (partition by type order by id desc) as RN
    from yourtable -- the source table; no name was given in the question
) TMP
where RN = 1
Can't test this now since SQL Fiddle doesn't seem to work, but hopefully that's ok.
The most efficient method might be not exists, which uses an anti-join for the underlying operator:
select type, value
from likeso l
where not exists (select 1 from likeso l2 where l2.type = l.type and l2.id > l.id)
For performance, you want an index on likeso(type, id).
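For example (the index name is just illustrative):
create index ix_likeso_type_id on likeso (type, id);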
I really wonder if there is a more efficient solution, but I use the following query for such needs:
Select Id, Type, Value
From ( Select *, Max (Id) Over (Partition By Type) As LastId
From #Table) T
Where Id = LastId