SQL identifying where the previous row number contained xyz

I want to know how to write a SQL query over data to which I have assigned row numbers (partitioned by trade ID). When a row has the specific action 'AMEND' (e.g. the row 2 record had the AMEND action), I want to see what the row 3 action was.
If the row 3 action was 'UPDATE', I want a new field to say 'REMOVE' on both rows. If there is no row 3 for that trade, or the row 3 action is not 'UPDATE', I want the new field to say 'KEEP'.
Is this easily done?
Thanks

You have to self-join the table so that you can pull in the previous row's data:
with cte as (
    select *
         , row_number() over (partition by trade_id order by ......) as seq
    from table
)
select a.trade_id, a.seq, a.action, b.action as prev_action
into #temp
from cte a
left join cte b
       on a.trade_id = b.trade_id
      and a.seq = b.seq + 1
Here the two copies of the CTE are aliased as a and b, and the generated results are inserted into a temp table, with the previous row's action stored as prev_action.
Then you can structure your CASE expression based on the action from the previous row:
select
    action
  , case when (prev_action = 'UPDATE' and action = 'AMEND') then 'REMOVE'
         else 'KEEP'
    end as NewField
from #temp
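The question as worded looks at the row after the AMEND and wants both rows flagged. A minimal sketch of that variant, using LAG() and LEAD() (available from SQL Server 2012) instead of a self-join. The table name trades and the ordering column action_time are hypothetical stand-ins, since the original ordering is not given, and treating every row outside an AMEND/UPDATE pair as 'KEEP' is an assumption.

```sql
-- Assumed: a table trades(trade_id, action, action_time); action_time is a
-- hypothetical column standing in for whatever defines the row order.
WITH numbered AS (
    SELECT trade_id,
           action,
           action_time,
           LAG(action)  OVER (PARTITION BY trade_id ORDER BY action_time) AS prev_action,
           LEAD(action) OVER (PARTITION BY trade_id ORDER BY action_time) AS next_action
    FROM trades
)
SELECT trade_id,
       action,
       CASE
           -- the AMEND row whose next row is an UPDATE
           WHEN action = 'AMEND'  AND next_action = 'UPDATE' THEN 'REMOVE'
           -- the UPDATE row that directly follows an AMEND
           WHEN action = 'UPDATE' AND prev_action = 'AMEND'  THEN 'REMOVE'
           -- everything else, including an AMEND with no following row
           -- (next_action is NULL, so the comparison is not true)
           ELSE 'KEEP'
       END AS new_field
FROM numbered
ORDER BY trade_id, action_time;
```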

Related

Efficiently outer join two array columns row-wise in BigQuery table

I'll first state the question as simply as possible, then elaborate with more detail and an example.
Concise question without context
I have a table with rows containing columns of arrays. I need to outer join the elements of some pairs of these, compute some variables, and then aggregate the results back into a new array. I'm currently using a pattern where I:
unnest each column in the pair to be joined (cross join to PK of row)
full outer join the two on the PK and compute desired fields
group by PK to get back to single row with array column that summarizes the results
Is there a way to do this without the multiple unnesting and grouping back down?
More context and an example
I have a table which represents edits to an entity that is made up of multiple sub-records. Each row represents a single entity. There is a column before that contains the records before the edit, and another after that contains the records afterwards.
My goal is to label each sub-record with exactly one of the four valid edit types:
DELETE - record exists in before but not after
ADD - record exists in after but not before
EDIT - record exists in both before and after but any field was changed
NONE - record exists in both before and after and no fields were changed
Each of the sub-record values is represented by its ID and a hash of all of its fields. I've created some fake data and provided my initial implementation below. This works, but it seems very roundabout.
WITH source_data AS (
SELECT
1 AS pkField,
[
STRUCT(1 AS id, 1 AS fieldHash),
STRUCT(2 AS id, 2 AS fieldHash),
STRUCT(3 AS id, 3 AS fieldHash)
] AS before,
[
STRUCT(1 AS id, 1 AS fieldHash),
STRUCT(2 AS id, 0 AS fieldHash), -- record 2 edited
-- record 3 deleted
STRUCT(4 AS id, 4 AS fieldHash), -- record 4 added
STRUCT(5 AS id, 5 AS fieldHash) -- record 5 added
] AS after
)
SELECT
pkField,
ARRAY_AGG(STRUCT(
id,
CASE
WHEN beforeHash IS NULL THEN "ADD"
WHEN afterHash IS NULL THEN "DELETE"
WHEN beforeHash <> afterHash THEN "EDIT"
ELSE "NONE"
END AS editType
)) AS edits
FROM (
SELECT pkField, id, fieldHash AS beforeHash
FROM source_data
CROSS JOIN UNNEST(source_data.before)
)
FULL OUTER JOIN (
SELECT pkField, id, fieldHash AS afterHash
FROM source_data
CROSS JOIN UNNEST(source_data.after)
)
USING (pkField, id)
GROUP BY pkField
Is there a simpler and/or more efficient way to do this? Perhaps something that avoids the multiple unnesting and grouping back down?

I think what you have is already a simple and efficient approach!
Meanwhile, you can consider the optimized version below:
select pkField,
array(select struct(
id, case
when b.fieldHash is null then 'ADD'
when a.fieldHash is null then 'DELETE'
when b.fieldHash != a.fieldHash then 'EDIT'
else 'NONE'
end as editType
) edits
from (select id, fieldHash from t.before) b
full outer join (select id, fieldHash from t.after) a
using(id)
) edits
from source_data t
If applied to the sample data in your question, this labels record 1 as NONE, record 2 as EDIT, record 3 as DELETE, and records 4 and 5 as ADD.

SQL Server 2008 R2: update one occurrence of a group's NULL value and delete the rest

I have a table of orders which has multiple rows of orders missing a Type and I'm struggling to get the queries right. I'm pretty new to SQL so please bear with me.
I've illustrated an example in the picture below. I need help creating the query that will take the table on the left and UPDATE it to look like the table on the right.
The orders are sorted by group. Each group should have one instance of type OK (if a NULL or an OK already exists in the group), and no instances of NULL. I would like to achieve this by updating one of the group's orders with type NULL to have type OK and deleting the rest of that group's NULL rows.
I've managed to get the rows that I want to keep by
Create a temporary table where I insert the orders and replace NULL types with EMPTY
From the temporary table, get the existing OK orders for groups that already have one OK order, else an EMPTY order that should be changed to OK.
I've done this with the following:
SELECT * FROM Orders
SELECT *
INTO #modified
FROM
(SELECT
Id, IdGroup,
CASE WHEN Type IS NULL
THEN 'EMPTY'
ELSE Type
END Type
FROM
Orders) AS XXX
SELECT MIN(x.Id) Id, x.IdGroup, x.Type
FROM #modified x
JOIN
(SELECT
IdGroup, MIN (Type) AS min_Type
FROM #modified a
WHERE Type = 'OK' OR Type = 'EMPTY'
GROUP BY IdGroup) y ON y.IdGroup = x.IdGroup AND y.min_Type = x.Type
GROUP BY x.IdGroup, x.Type
DROP TABLE #modified
The rest of the EMPTY orders should after this step be deleted, but I don't know how to proceed from here. Maybe this is a poor approach from the beginning and maybe it could be done even easier?

Well done for writing a question that shows some effort and clearly explains what you're after. That's a rare thing unfortunately!
This is how I would do it:
First back up the table (I like to put backups into a different schema to keep things neat):
CREATE SCHEMA bak;
SELECT * INTO bak.Orders FROM dbo.Orders;
Now you can do a trial run on the bak table if you like.
Anyway...
Set all the NULL types to OK
UPDATE Orders SET Type = 'OK' WHERE Type IS NULL;
Now repeatedly delete redundant records. Find groups with more than one OK and delete one record from each:
DELETE Orders WHERE Id IN
(
    SELECT MIN(Id) Id
    FROM Orders
    WHERE Type = 'OK'
    GROUP BY IdGroup
    HAVING COUNT(*) > 1
);
You'll need to run that one a few times until it affects zero records.
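Rather than rerunning the DELETE by hand, the repetition can be scripted. A minimal sketch, assuming the Orders table from the question and that the NULL types have already been set to OK; the variable name @deleted is illustrative only.

```sql
-- Repeat the DELETE until no group has more than one OK row left.
DECLARE @deleted INT = 1;

WHILE @deleted > 0
BEGIN
    DELETE FROM Orders
    WHERE Id IN
    (
        SELECT MIN(Id)
        FROM Orders
        WHERE Type = 'OK'
        GROUP BY IdGroup
        HAVING COUNT(*) > 1
    );

    -- @@ROWCOUNT holds the number of rows removed by the DELETE above
    SET @deleted = @@ROWCOUNT;
END;
```

Each pass removes at most one row per group, so the number of passes equals the largest number of surplus OK rows in any group.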

Assuming there are no multiple OKs and each group has at least one Ok or NULL value, you can do:
select t.id, t.idGroup, t.Type
from lefttable t
where t.Type is not null and t.Type <> 'OK'
union all
select t.id, t.idGroup, 'OK'
from (select t.*, row_number() over (partition by idGroup order by coalesce(t.Type, 'ZZZ')) as seqnum
from lefttable t
where t.Type is null or t.Type = 'OK'
) t
where seqnum = 1;
Actually, this will work even if you do have multiple OKs, but it will keep only one of the rows.
The first subquery selects all rows that are neither OK nor NULL. The second chooses exactly one OK or NULL row per group and assigns it the type OK.

If you want to keep any OK ones in preference to a NULL, this will work. It creates a temp table with everything we need to work on (OK and NULL rows), and numbers them starting from one within each group, ordered so that OK records come before NULL ones. Then it makes sure the first record in each group is OK, and deletes all the rest.
Create table #work (Id int, RowNo int)
--Get a list of all the rows we need to work on, and number them for each group
--(order by type desc puts OK before nulls)
Insert into #work (Id, RowNo)
Select Id, ROW_NUMBER() over (partition by IdGroup order by type desc) as RowNo
From Orders O
where (type is null OR type = 'OK');
-- Make sure the one we keep is OK, not null
Update O set type = 'OK'
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo = 1 and O.type IS NULL;
--Delete the remaining ones (any rowno > 1)
Delete O
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo > 1;
drop table #work;

Can't you just delete the rows where Type equals null?
DELETE FROM Orders WHERE Type IS NULL

Order by data as per supplied Id in sql

Query:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
Right now the data is ordered by the auto ID or by whatever clause I pass in ORDER BY.
But I want the data returned in the same sequence as the IDs I have passed.
Expected Output:
All Data for 1247881
All Data for 174772
All Data for 808454
All Data for 2326154
Note:
The number of IDs to be passed will be 300 000.

One option would be to create a CTE containing the ration_card_id values and the order you are imposing on them, and then join to this table:
WITH cte AS (
SELECT 1247881 AS ration_card_id, 1 AS position
UNION ALL
SELECT 174772, 2
UNION ALL
SELECT 808454, 3
UNION ALL
SELECT 2326154, 4
)
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN cte t2
ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position
Edit:
If you have many IDs, then neither the answer above nor the answer given using a CASE expression will suffice. In this case, your best bet would be to load the list of IDs into a table, containing an auto increment ID column. Then, each number would be labelled with a position as its record is being loaded into your database. After this, you can join as I have done above.

If the desired order does not reflect a sequential ordering of some preexisting data, you will have to specify the ordering yourself. One way to do this is with a case statement:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
ORDER BY CASE ration_card_id
WHEN 1247881 THEN 0
WHEN 174772 THEN 1
WHEN 808454 THEN 2
WHEN 2326154 THEN 3
END
Stating the obvious, but note that this ordering is most likely not represented by any index, so the sort will not be able to use one.

Insert your ration_card_id values into a #temp table that has an identity column.
Rewrite your SQL query as:
SELECT a.*
FROM [MemberBackup].[dbo].[OriginalBackup] a
JOIN #temp b
on a.ration_card_id = b.ration_card_id
order by b.id
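For completeness, a sketch of how such a temp table might be set up, assuming T-SQL. The column names id and ration_card_id follow the query above; the INT types are an assumption. The rows are inserted one statement at a time so that the identity values follow the intended order.

```sql
-- Temp table whose identity column records the order in which IDs arrive.
CREATE TABLE #temp (
    id             INT IDENTITY(1, 1) PRIMARY KEY,
    ration_card_id INT NOT NULL
);

-- One INSERT per ID, in the order the results should be returned.
INSERT INTO #temp (ration_card_id) VALUES (1247881);
INSERT INTO #temp (ration_card_id) VALUES (174772);
INSERT INTO #temp (ration_card_id) VALUES (808454);
INSERT INTO #temp (ration_card_id) VALUES (2326154);
```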

CTE query not returning duplicate rows

For auditing purposes, I use triggers to insert changes into an audit table. I then use a CTE to recursively get all audit rows of a particular item.
This SQLFiddle adds a few audit rows and the CTE that I'm currently using.
Here is the CTE:
;WITH cteAudit AS
(
SELECT id, [user_name], date_time, item_parent_type, item_parent_id,
item_type, item_id, item_action, row_guid, 1 AS audit_level
FROM audit
WHERE
(item_type = 22 AND item_id = 925)
UNION ALL
SELECT a.id, a.[user_name], a.date_time, a.item_parent_type, a.item_parent_id,
a.item_type, a.item_id, a.item_action, a.row_guid, cteAudit.audit_level + 1
FROM audit a
INNER JOIN cteAudit
ON a.item_parent_id = cteAudit.item_id
AND a.item_parent_type = cteAudit.item_type
WHERE
a.item_parent_type <> a.item_type AND
a.item_parent_id <> a.item_id
)
SELECT
cteAudit.id,
cteAudit.[user_name],
cteAudit.date_time,
cteAudit.item_parent_id,
#itemtype AS item_type,
cteAudit.item_id,
item_action,
cteAudit.audit_level,
CONVERT(nvarchar(36), cteAudit.row_guid) AS row_guid
FROM cteAudit
ORDER BY date_time DESC, audit_level desc
However, with this CTE I have 2 problems:
The query does not return all 13 rows. It doesn't show rows 11-13. This is because the main table (customer) wasn't audited, since the user only changed the address of a contact person.
This happens because I modified the 3rd level (contact person), and this level has a link to the 2nd level (contact list). However, the 2nd level, which has the link to the 1st level (customer), was not modified and is therefore not in the audit table, so there is no link to the 1st level.
The query returns duplicate records.
How can I return all rows for the main table? And why is the query returning duplicate rows?

Your problem comes from either an incorrect query or incorrect data.
Your recursive query imposes these constraints:
WHERE
a.item_parent_type <> a.item_type AND
a.item_parent_id <> a.item_id
but for rows 11-13, item_parent_id equals item_id.
You also ask for
AND a.item_parent_type = cteAudit.item_type
but for rows 11-13 your types are flipped, so this condition does not match either.

SQL Server 2012 - updating a column based on row to row comparison

I have a table that contains dates and times. For example, the columns are Date, ExTime, NewTime, Status. I am ordering them by an ExpKey column that makes them show in the right order.
I want to do a row-by-row comparison and compare the ExTime column of the second row to the NewTime column of the first row. If ExTime < NewTime, then I want to update Status with a "1". Then I want to traverse the table row by row, so that the second row in the above example becomes the first and a new second row is pulled and used. Here is a sample of what I have now, but for some reason it is not hitting all of the rows.
UPDATE t
SET t.Status = 1
FROM MyTable t
CROSS APPLY (SELECT TOP 1 NewTime
FROM MyTable
WHERE ID = t.ID AND [Date] = t.[Date]
ORDER BY ExpKey) t1
WHERE t.Extime < t1.NewTime
This is not hitting all the rows like I want it to. I have the WHERE clause comparing the ID and Date fields to ensure that the rows belong to the same person. If the IDs or Dates are not the same, the rows do not belong to the same person, so I would not want to update the status. So basically, if the ID of row 2 = ID of row 1 and the Date of row 2 = Date of row 1, I want to compare the ExTime of row 2 and see if it is less than the NewTime of row 1. If so, then update the Status field of row 2.
Any help in figuring out why this works on some rows but not on all of them would be appreciated.
Ad.

On SQL Server 2012 you can easily update the status with the window function lag():
with cte as (
select
extime,
lag(newtime) over(partition by id, date order by expKey) as newtime,
status
from table1
)
update cte set status = 1 where extime < newtime;

I haven't tested this, but I've dealt with similar issues of comparing adjacent rows. I put this together off-the-cuff, so it may need tweaking, but give it a try.
;WITH CTE AS
( SELECT ID, [Date], ExpKey, ExTime, NewTime, [Status],
         ROW_NUMBER() OVER (PARTITION BY ID, [Date] ORDER BY ExpKey) AS Sort
  FROM MyTable
)
UPDATE row2
SET row2.[Status] = 1
FROM CTE row2
INNER JOIN CTE row1
    ON row1.ID = row2.ID
   AND row1.[Date] = row2.[Date]
   AND row1.Sort = row2.Sort - 1 --Join to prior row
WHERE row2.ExTime < row1.NewTime
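As for why the CROSS APPLY in the question misses rows: its TOP 1 ... ORDER BY ExpKey has no condition tying it to the current row, so it always returns the first row of each ID/Date group rather than the row just before the current one. A minimal sketch of a corrected version, assuming ExpKey is unique and increasing within each group (the alias p is illustrative):

```sql
UPDATE t
SET t.Status = 1
FROM MyTable t
CROSS APPLY (SELECT TOP 1 p.NewTime
             FROM MyTable p
             WHERE p.ID = t.ID
               AND p.[Date] = t.[Date]
               AND p.ExpKey < t.ExpKey     -- only rows before the current one
             ORDER BY p.ExpKey DESC) t1    -- nearest preceding row
WHERE t.ExTime < t1.NewTime;
```

The first row of each group has no preceding row, so CROSS APPLY returns nothing for it and it is left unchanged. This form does not rely on window functions such as LAG().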
