How do I refer to a record in sql that immediately precedes another record in a group? - sql

I have a weird update query to write.
Here's the table
PK-ID (int) --- FK-ID (int) --- Value (int)
In my data set, if I group by FK-ID and order by PK-ID, suppose this is an example of one group:
5 --- 10 --- 23
7 --- 10 --- 49
8 --- 10 --- 81
Due to a bug in some old software, records 7 and 8 have incorrect values. The correct value for 7 is (49-23) = 26 and the correct value for 8 is (81-49) = 32. Record 5 is correct.
I need to update each record to subtract the value of the record immediately preceding it when it is grouped by FK-ID and ordered by PK-ID. If there is no preceding record I do not need to change the value.
Is there a way to write a general sql update query to accomplish this? How would I (conditionally) retrieve the value of the preceding record in the group? I'm using SQL server 2008.
Thanks!

with ordered as (
select *, rn = row_number() over (partition by fk_id order by pk_id)
from tbl
)
update cur
set value = cur.value - prior.value
from ordered cur
join ordered prior on prior.fk_id = cur.fk_id
and prior.rn = cur.rn-1;

This is what I believe to be the correct answer, using a similar idea to the previous one. The toupdate subquery calculates the new values based on the rules in the question (update records with the same foreign key and consecutive primary keys). It does assume that the ids are numeric values of some sort.
with toupdate as (
select t.pkid, t.value - tprev.value as newval
from t join
t tprev
on t.pkid = tprev.pkid+1 and t.fkid = tprev.fkid
)
update t
set value = toupdate.newval
from toupdate
where t.pkid = toupdate.pkid

update t set value = value -
isnull((select top 1 value
from t t2
where t2.FKID=t.FKID
and t2.PKID<t.PKID
order by PKID desc),0);
Here is a SQLFiddle demo
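To check the arithmetic on the question's sample group, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server 2008 (an assumption made only so the example runs anywhere). The table name t and the lowercase column names are illustrative. It reads all corrected values first and writes them afterwards, so every subtraction uses the original, uncorrected value of the preceding record.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk_id INTEGER PRIMARY KEY, fk_id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(5, 10, 23), (7, 10, 49), (8, 10, 81)])

# Read phase: for each row, subtract the value of the closest earlier row
# in the same fk_id group. Rows with no predecessor yield NULL (None).
fixes = conn.execute("""
    SELECT cur.pk_id,
           cur.value - (SELECT prev.value
                        FROM t AS prev
                        WHERE prev.fk_id = cur.fk_id
                          AND prev.pk_id < cur.pk_id
                        ORDER BY prev.pk_id DESC
                        LIMIT 1) AS new_value
    FROM t AS cur
""").fetchall()

# Write phase: only touch rows that actually have a preceding record.
conn.executemany("UPDATE t SET value = ? WHERE pk_id = ?",
                 [(new_value, pk_id) for pk_id, new_value in fixes
                  if new_value is not None])

result = conn.execute("SELECT pk_id, value FROM t ORDER BY pk_id").fetchall()
print(result)
```

Record 5 keeps its value because it has no predecessor, while records 7 and 8 become 26 and 32, matching the values the question asks for.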

I hope this returns what you want (sorry, I cannot try it at the moment); you just need to incorporate it into an UPDATE.
WITH cte1 AS
(SELECT pk_id, fk_id, value, ROW_NUMBER() OVER (PARTITION BY fk_id ORDER BY pk_id)
as num
FROM your_table
)
SELECT a.*,
--CASE
-- WHEN b.pk_id IS NOT NULL THEN a.value-b.value
-- ELSE 0 END
a.value-b.value as valid_number
FROM cte1 a
--LEFT JOIN cte1 b ON (b.fk_id = a.fk_id AND b.num = a.num-1)
INNER JOIN cte1 b ON (b.fk_id = a.fk_id AND b.num = a.num-1)

Related

TSQL syntax to feed results into subquery

I'm after some help on how best to write a query that does the following. I think I need a subquery but I don't know how to use the data returned in the row to feed back into the subquery without hardcoding values? A subquery may not be the right thing here?
Ideally I only want 1 variable ...WHERE t_Date = '2018-01-01'
Desired Output:
The COUNT Criteria column has the following rules
Date < current row
Area = current row
Name = current row
Value = 1
For example, the first row indicates there are 2 records with Date < '2018-01-01' AND Area = 'Area6' AND Name = 'Name1' AND Value = 1
Example Data:
SQLFiddle: http://sqlfiddle.com/#!18/92ba3/4
Effectively I only want to return the first 2 rows but summarise the historic data into a column based on the output in that column.
The right way to do this is to use the cumulative sum functionality in ANSI SQL and SQL Server since 2012:
select t.*,
sum(case when t.value = 1 then 1 else 0 end) over (partition by t_area, t_name order by t_date)
from t;
This actually includes the current row. If you have only one row per date (for the area/name combo), then you can just subtract it or use a windowing clause:
select t.*,
sum(case when t.value = 1 then 1 else 0 end) over
(partition by t_area, t_name
order by t_date
rows between unbounded preceding and 1 preceding
)
from t;
Use a self join to find records in the same table that are related to a particular record:
SELECT t1.t_Date, t1.t_Area, t1.t_Name, t1.t_Value,
COUNT(t2.t_Name) AS COUNTCriteria
FROM Table1 as t1
LEFT OUTER JOIN Table1 as t2
ON t1.t_Area=t2.t_Area
AND t1.t_Name=t2.T_Name
AND t2.t_Date<t1.t_Date
AND t2.t_Value=1
GROUP BY t1.t_Date, t1.t_Area, t1.t_Name, t1.t_Value
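The self join above can be tried with a small runnable sketch. The question's example data did not survive in this page, so the rows below are hypothetical, chosen so that Area6/Name1 has two earlier rows with value 1 as the question describes. Python's sqlite3 module stands in for SQL Server here, and the table name t is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (t_date TEXT, t_area TEXT, t_name TEXT, t_value INTEGER)")
# Hypothetical rows: ISO date strings compare correctly as text.
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    ("2017-11-01", "Area6", "Name1", 1),
    ("2017-12-01", "Area6", "Name1", 0),
    ("2017-12-15", "Area6", "Name1", 1),
    ("2018-01-01", "Area6", "Name1", 1),
    ("2017-12-01", "Area6", "Name2", 1),
    ("2018-01-01", "Area6", "Name2", 0),
])

# The only variable is the date; the join conditions supply the
# "same area, same name, earlier date, value = 1" criteria per row.
counts = conn.execute("""
    SELECT t1.t_name, COUNT(t2.t_name) AS count_criteria
    FROM t AS t1
    LEFT JOIN t AS t2
           ON t2.t_area = t1.t_area
          AND t2.t_name = t1.t_name
          AND t2.t_date < t1.t_date
          AND t2.t_value = 1
    WHERE t1.t_date = ?
    GROUP BY t1.t_date, t1.t_area, t1.t_name, t1.t_value
    ORDER BY t1.t_name
""", ("2018-01-01",)).fetchall()
print(counts)
```

COUNT(t2.t_name) counts only matched rows, so a current row with no qualifying history still appears, with a count of 0.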

Querying database for records that don't have a 2nd record canceling it out

I have a table where I'm trying to find a set of particular records. Here's what my table looks like...
tblA
ID VouchID Action Amount
1 177-17 Add 700
2 177-17 Update 1
3 198-01 Add 600
4 198-01 Update 620
So what happens here, is if a record was canceled/deleted, the action would be 'Update' and Amount would be updated to 1. In other words, the VouchID = 177-17, would not be counted/be selected in this query...
What I'm hoping to do here is only select records, that don't have a corresponding Update record with Amount = 1
Select distinct vouchID from tblA where Action='add'
However, this query does not take into consideration VouchIDs that have an 'Update' action. An Update action can occur in two cases. For VouchID 177-17, the Update row has Amount = 1, which means that the Add action does not count; it is almost as if we removed the record altogether (it's just there for record keeping). For VouchID 198-01, the Update row has Amount = 620, which means that the amount was updated by 20 to 620; that is the record I hope to see in my end result.
Desired end result from above table:
ID VouchID Action Amount
3 198-01 Add 600
You could use LEAD (SQL Server 2012 and above):
WITH cte AS (
SELECT *, LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM table
)
SELECT *
FROM cte
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
EDIT
A non-recursive CTE can always be replaced with a simple subquery:
SELECT *
FROM (SELECT *,
LEAD(Amount) OVER(PARTITION BY VouchID ORDER BY ID) AS next_amount
FROM table) sub
WHERE (next_amount <> 1 OR next_amount IS NULL) AND Action='add';
EDIT:
Using EXISTS:
SELECT *
FROM table t1
WHERE Action='add'
AND NOT EXISTS (SELECT 1
FROM table t2
WHERE t1.VouchId = t2.VouchId
AND t2.Action='Update'
AND t2.Amount = 1);
What I'm hoping to do here is only select records, that don't have a
corresponding Update record with Amount = 1
Seems easy enough with NOT EXISTS():
Select distinct vouchID FROM MyTable t1 where Action='add'
AND NOT EXISTS(SELECT * FROM MyTable t2
WHERE Action='Update'
AND Amount=1
AND t2.VouchId=t1.VouchId);
Are you using SQL Server 2008 or better? If you are, I would try something like :
SELECT
ID, vouchID, Action, Amount
FROM tblA s
WHERE
Action='add'
AND NOT EXISTS(Select 1 from tblA l where l.vouchID = s.vouchID and l.Action = 'Update' and l.Amount = 1);
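The NOT EXISTS approach can be checked against the four sample rows from the question. This is a minimal sketch using Python's sqlite3 module in place of SQL Server (an assumption for portability). SQLite compares text case-sensitively by default, so the sketch matches 'Add' exactly as stored, whereas a typical case-insensitive SQL Server collation would also accept 'add'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Action" is double-quoted because ACTION is also an SQL keyword.
conn.execute('CREATE TABLE tblA (ID INTEGER PRIMARY KEY, VouchID TEXT, '
             '"Action" TEXT, Amount INTEGER)')
conn.executemany("INSERT INTO tblA VALUES (?, ?, ?, ?)", [
    (1, "177-17", "Add", 700),
    (2, "177-17", "Update", 1),
    (3, "198-01", "Add", 600),
    (4, "198-01", "Update", 620),
])

# Keep Add rows unless the same voucher has a cancelling Update (Amount = 1).
kept = conn.execute("""
    SELECT s.ID, s.VouchID, s."Action", s.Amount
    FROM tblA AS s
    WHERE s."Action" = 'Add'
      AND NOT EXISTS (SELECT 1
                      FROM tblA AS l
                      WHERE l.VouchID = s.VouchID
                        AND l."Action" = 'Update'
                        AND l.Amount = 1)
""").fetchall()
print(kept)
```

Only the Add row for 198-01 survives, which is the desired end result shown in the question.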

Getting row number for query

I have a query which will return one row. Is there any way I can find the row index of the row I'm querying when the table is sorted?
I've tried rowid but got #582 when I was expecting row #7.
Eg:
CategoryID Name
I9GDS720K4 CatA
LPQTOR25XR CatB
EOQ215FT5_ CatC
K2OCS31WTM CatD
JV5FIYY4XC CatE
--> C_L7761O2U CatF <-- I want this row (#5)
OU3XC6T19K CatG
L9YKCYAYMG CatH
XKWMQ7HREG CatI
I've tried rowid with unexpected results:
SELECT rowid FROM Categories WHERE CategoryID = 'C_L7761O2U' ORDER BY Name
EDIT: I've also tried J Cooper's suggestion (below), but the row numbers just aren't right.
using (var cmd = conn.CreateCommand()) {
cmd.CommandText = @"SELECT (SELECT COUNT(*) FROM Recipes AS t2 WHERE t2.RecipeID <= t1.RecipeID) AS row_Num
FROM Recipes AS t1
WHERE RecipeID = @recipeId
ORDER BY Name";
cmd.Parameters.AddWithValue("@recipeId", id);
idx = Convert.ToInt32(cmd.ExecuteScalar());
}
Here is a way to get the row number in SQLite:
SELECT CategoryID,
Name,
(SELECT COUNT(*)
FROM mytable AS t2
WHERE t2.Name <= t1.Name) AS row_Num
FROM mytable AS t1
ORDER BY Name, CategoryID;
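As a quick check of the counting technique, the sketch below loads the nine sample categories into an in-memory SQLite database through Python's sqlite3 module and asks for the position of a single row. The counting must compare the same column the table is sorted by (Name here); comparing the ID column instead gives a position in ID order, which is likely why the attempt in the question's edit produced unexpected numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Categories (CategoryID TEXT PRIMARY KEY, Name TEXT)")
conn.executemany("INSERT INTO Categories VALUES (?, ?)", [
    ("I9GDS720K4", "CatA"), ("LPQTOR25XR", "CatB"), ("EOQ215FT5_", "CatC"),
    ("K2OCS31WTM", "CatD"), ("JV5FIYY4XC", "CatE"), ("C_L7761O2U", "CatF"),
    ("OU3XC6T19K", "CatG"), ("L9YKCYAYMG", "CatH"), ("XKWMQ7HREG", "CatI"),
])

# Position of one row when sorted by Name: count the rows whose Name
# sorts at or before this row's Name (1-based, assumes unique names).
row_num = conn.execute("""
    SELECT (SELECT COUNT(*)
            FROM Categories AS t2
            WHERE t2.Name <= t1.Name) AS row_num
    FROM Categories AS t1
    WHERE t1.CategoryID = ?
""", ("C_L7761O2U",)).fetchone()[0]
print(row_num)
```

The count is 1-based, so CatF comes out as 6; subtract one for a zero-based index of 5.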
Here's a funny trick you can use in Spatialite to get the order of values. If you use the count() function with a WHERE clause limiting to only values >= the current value, then the count will actually give the order. So if I have a point layer called "mypoints" with columns "value" and "val_order" then:
SELECT value, (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value) AS val_order
FROM mypoints
ORDER BY value DESC;
Gives the descending order of the values.
I can update the "val_order" column this way:
UPDATE mypoints SET val_order = (
SELECT count(*) FROM mypoints AS my
WHERE my.value>=mypoints.value
);
What you are asking can be explained in two different ways, but I'm assuming you want to sort the resulting table and then number those rows according to the sort.
declare @resultrow int
select @resultrow = rn
from (select CategoryID, row_number() OVER (ORDER BY Name Asc) as rn
from Categories) numbered
WHERE CategoryID = 'C_L7761O2U'
select @resultrow

SQL if breaking number pattern, mark record?

I have the following query:
SELECT AccountNumber, RptPeriod
FROM dbo.Report
ORDER BY AccountNumber, RptPeriod
I get the following results:
123 200801
123 200802
123 200803
234 200801
344 200801
344 200803
I need to mark the record where the RptPeriod doesn't run consecutively for the account. For example, 344 200803 would have an X next to it, since it goes from 200801 to 200803.
This is for about 19321 rows, and I want it on a per-company basis: between different companies I don't care what the numbers are, I just want to show where there are breaks in the number pattern within the same company.
Any ideas?
Thanks!
OK, this is kind of ugly (double join + anti-join) but it gets the work done, AND is pure portable SQL:
SELECT *
FROM dbo.Report R1
, dbo.Report R2
WHERE R1.AccountNumber = R2.AccountNumber
AND R2.RptPeriod - R1.RptPeriod > 1
-- subsequent NOT EXISTS ensures that R1,R2 rows found are "next to each other",
-- e.g. no row exists between them in the ordering above
AND NOT EXISTS
(SELECT 1 FROM dbo.Report R3
WHERE R1.AccountNumber = R3.AccountNumber
AND R2.AccountNumber = R3.AccountNumber
AND R1.RptPeriod < R3.RptPeriod
AND R3.RptPeriod < R2.RptPeriod
)
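The portable query above can be tried against the six sample rows from the question. The sketch below uses Python's sqlite3 module rather than SQL Server (an assumption for portability) and selects only the account plus the two periods on either side of each break. The table name Report drops the dbo schema prefix, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Report (AccountNumber INTEGER, RptPeriod INTEGER)")
conn.executemany("INSERT INTO Report VALUES (?, ?)", [
    (123, 200801), (123, 200802), (123, 200803),
    (234, 200801),
    (344, 200801), (344, 200803),
])

# R1 and R2 are neighbouring periods of one account (no R3 in between)
# whose difference is larger than 1, i.e. a break in the sequence.
gaps = conn.execute("""
    SELECT R2.AccountNumber, R1.RptPeriod, R2.RptPeriod
    FROM Report AS R1, Report AS R2
    WHERE R1.AccountNumber = R2.AccountNumber
      AND R2.RptPeriod - R1.RptPeriod > 1
      AND NOT EXISTS (SELECT 1
                      FROM Report AS R3
                      WHERE R3.AccountNumber = R1.AccountNumber
                        AND R1.RptPeriod < R3.RptPeriod
                        AND R3.RptPeriod < R2.RptPeriod)
""").fetchall()
print(gaps)
```

Account 123 is not reported even though 200801 and 200803 differ by 2, because 200802 sits between them. Note that a plain difference greater than 1 would also flag a December to January transition such as 200812 to 200901.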
Something like this should do it:
-- cte lists all items by AccountNumber and RptPeriod, assigning an ascending integer
-- to each RptPeriod and restarting at 1 for each new AccountNumber
;WITH cte (AccountNumber, RptPeriod, Ranking)
as (select
AccountNumber
,RptPeriod
,row_number() over (partition by AccountNumber order by AccountNumber, RptPeriod) Ranking
from dbo.Report)
-- and then we join each row with each preceding row based on that "Ranking" number
select
This.AccountNumber
,This.RptPeriod
,case
when Prior.RptPeriod is null then '' -- Catches the first row in a set
when Prior.RptPeriod = This.RptPeriod - 1 then '' -- Preceding row's RptPeriod is one less than This row's RptPeriod
else 'x' -- Preceding row's RptPeriod is not exactly one less than This row's RptPeriod
end UhOh
from cte This
left outer join cte Prior
on Prior.AccountNumber = This.AccountNumber
and Prior.Ranking = This.Ranking - 1
(Edited to add comments)
WITH T
AS (SELECT *,
/*Each island of contiguous data will have
a unique AccountNumber,Grp combination*/
RptPeriod - ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) Grp,
/*RowNumber will be used to identify first record
per company, this should not be given an 'X'. */
ROW_NUMBER() OVER (PARTITION BY AccountNumber
ORDER BY RptPeriod ) AS RN
FROM Report)
SELECT AccountNumber,
RptPeriod,
/*Check whether first in group but not first over all*/
CASE
WHEN ROW_NUMBER() OVER (PARTITION BY AccountNumber, Grp
ORDER BY RptPeriod) = 1
AND RN > 1 THEN 'X'
END AS Flag
FROM T
SELECT *
FROM report r
LEFT JOIN report r2
ON r2.accountnumber = r.accountnumber
AND {r2.rptperiod is one day after r.rptPeriod}
JOIN report r3
ON r3.accountNumber = r.accountNumber
AND r3.rptperiod > r.rptPeriod
WHERE r2.rptPeriod IS NULL
I'm not sure of SQL Server's date logic syntax, but hopefully you get the idea. r will be all the records where the next rptPeriod is NULL (r2) and there exists at least one greater rptPeriod (r3). The query isn't super straightforward, I guess, but if you have an index on the two columns, it'll probably be the most efficient way to get your data.
Basically, you number rows within every account, then, using the row numbers, compare the RptPeriod values for the neighbouring rows.
It is assumed here that RptPeriod is the year and month encoded, for which case the year transition check has been added.
;WITH Report_sorted AS (
SELECT
AccountNumber,
RptPeriod,
rownum = ROW_NUMBER() OVER (PARTITION BY AccountNumber ORDER BY RptPeriod)
FROM dbo.Report
)
SELECT
r1.AccountNumber,
r1.RptPeriod,
CASE ISNULL(CASE WHEN r1.RptPeriod / 100 > r2.RptPeriod / 100 THEN 12 ELSE 0 END
+ r1.RptPeriod % 100 - r2.RptPeriod % 100, 1)
WHEN 1 THEN ''
ELSE 'X'
END AS Chk
FROM Report_sorted r1
LEFT JOIN Report_sorted r2
ON r1.AccountNumber = r2.AccountNumber AND r1.rownum = r2.rownum + 1
It could be complicated further with an additional check for gaps spanning a year and more, if you need that.
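One way to cover gaps spanning a year or more is to convert both periods to a month count before comparing. The sketch below shows that arithmetic in Python, assuming RptPeriod is an integer encoded as yyyymm; the function names months_between and is_break are hypothetical and not part of any answer above.

```python
def months_between(earlier: int, later: int) -> int:
    """Number of months from one yyyymm period to another."""
    earlier_year, earlier_month = divmod(earlier, 100)
    later_year, later_month = divmod(later, 100)
    return (later_year - earlier_year) * 12 + (later_month - earlier_month)


def is_break(previous: int, current: int) -> bool:
    """True when current is not exactly one month after previous."""
    return months_between(previous, current) != 1


print(is_break(200801, 200802))  # consecutive months in one year
print(is_break(200812, 200901))  # December to January
print(is_break(200801, 200803))  # one month missing
print(is_break(200712, 200901))  # gap spanning more than a year
```

The same expression translates directly to SQL as (year difference) * 12 + (month difference), using integer division and modulo by 100.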

Variant use of the GROUP BY clause in TSQL

Imagine the following schema and sample data (SQL Server 2008):
OriginatingObject
----------------------------------------------
ID
1
2
3
ValueSet
----------------------------------------------
ID OriginatingObjectID DateStamp
1 1 2009-05-21 10:41:43
2 1 2009-05-22 12:11:51
3 1 2009-05-22 12:13:25
4 2 2009-05-21 10:42:40
5 2 2009-05-20 02:21:34
6 1 2009-05-21 23:41:43
7 3 2009-05-26 14:56:01
Value
----------------------------------------------
ID ValueSetID Value
1 1 28
etc (a set of rows for each related ValueSet)
I need to obtain the ID of the most recent ValueSet record for each OriginatingObject. Do not assume that the higher the ID of a record, the more recent it is.
I am not sure how to use GROUP BY properly in order to make sure the set of results grouped together to form each aggregate row includes the ID of the row with the highest DateStamp value for that grouping. Do I need to use a subquery or is there a better way?
You can do it with a correlated subquery or using IN with multiple columns and a GROUP-BY.
Please note, a simple GROUP-BY can only give you the list of OriginatingObjectIDs and their latest DateStamps. In order to pull the relevant ValueSet IDs, the cleanest solution is to use a subquery.
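Multiple-column IN with GROUP-BY (probably faster, but SQL Server does not accept a multi-column IN, so this form only works on databases that support row value comparisons):
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
(V.OriginatingObjectID, V.DateStamp) IN
(
SELECT OriginatingObjectID, Max(DateStamp)
FROM ValueSet
GROUP BY OriginatingObjectID
)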
Correlated Subquery:
SELECT O.ID, V.ID
FROM OriginatingObject AS O, ValueSet AS V
WHERE O.ID = V.OriginatingObjectID
AND
V.DateStamp =
(
SELECT Max(DateStamp)
FROM ValueSet V2
WHERE V2.OriginatingObjectID = O.ID
)
SELECT OriginatingObjectID, id
FROM (
SELECT id, OriginatingObjectID, RANK() OVER(PARTITION BY OriginatingObjectID
ORDER BY DateStamp DESC) as ranking
FROM ValueSet) ranked
WHERE ranking = 1;
This can be done with a correlated sub-query. No GROUP-BY necessary.
SELECT
vs.ID,
vs.OriginatingObjectID,
vs.DateStamp,
v.Value
FROM
ValueSet vs
INNER JOIN Value v ON v.ValueSetID = vs.ID
WHERE
NOT EXISTS (
SELECT 1
FROM ValueSet
WHERE OriginatingObjectID = vs.OriginatingObjectID
AND DateStamp > vs.DateStamp
)
This returns exactly one row per OriginatingObjectID only if there cannot be two equal DateStamps for an OriginatingObjectID in the ValueSet table; with ties, every tied row is returned.
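To confirm the NOT EXISTS form against the sample ValueSet rows from the question, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server 2008 (an assumption for portability). The timestamps are stored as ISO-style text, which sorts chronologically, and the join to the Value table is left out because only the ValueSet IDs are needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ValueSet (ID INTEGER PRIMARY KEY, "
             "OriginatingObjectID INTEGER, DateStamp TEXT)")
conn.executemany("INSERT INTO ValueSet VALUES (?, ?, ?)", [
    (1, 1, "2009-05-21 10:41:43"),
    (2, 1, "2009-05-22 12:11:51"),
    (3, 1, "2009-05-22 12:13:25"),
    (4, 2, "2009-05-21 10:42:40"),
    (5, 2, "2009-05-20 02:21:34"),
    (6, 1, "2009-05-21 23:41:43"),
    (7, 3, "2009-05-26 14:56:01"),
])

# A row is the most recent for its object when no row of the same
# object carries a later DateStamp.
latest = conn.execute("""
    SELECT vs.OriginatingObjectID, vs.ID
    FROM ValueSet AS vs
    WHERE NOT EXISTS (SELECT 1
                      FROM ValueSet AS newer
                      WHERE newer.OriginatingObjectID = vs.OriginatingObjectID
                        AND newer.DateStamp > vs.DateStamp)
    ORDER BY vs.OriginatingObjectID
""").fetchall()
print(latest)
```

Object 2 illustrates why the ID cannot be used as a proxy for recency: its most recent row is ID 4, not the higher ID 5.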