removing non-matching rows based on order in SQL within a WITH statement

removing non-matching rows based on order in SQL within a WITH statement - sql

I've got a many-to-many setup where there are items and item names(based on languageID)
I want to retrieve all names for a set id, where the name is replace with an alternate name (same itemID, but different languageID) when name is NULL.
I've set up a table that receives all combinations of itemids and itemnames, even the missing ones, and have the name ordered by an hasName flag, that is set based on name existing to 0,1 or 2. 0 means languageId and name exist, 1 means only name exists, and 2 means neither. I then sort the results: ORDER BY itemId, hasName, languageId this works well enough, because the top 1 row of every itemid meats the critera, and I can just pull that.
However I still need to process other queries using the result, so this doesn't work well, because as soon as I use a WITH statement, the order cannot be used, so it breaks the functionality
What I'm using instead is a join, where I select the top 1 matching row on the ordered table
the problem there is that the time to execute goes up 10x
any ideas what else I could try?
using SQL server 10.50
the slow query:
SELECT
*,
(SELECT top 1 ItemName FROM ItemNameMultiLang x WHERE x.ItemId = tc.ItemId ORDER BY ItemID, hasName, LangID) AS ItemName
FROM ItemCategories tc
ORDER BY ItemId

One way to approach this is with row_number(), so you can get the first row from itemNameMultiLang, which is what you want:
SELECT tc.*, inml.ItemName
FROM ItemCategories tc left outer join
(select inml.*, row_number() over (partition by inml.ItemId order by hasname, langId) as seqnum
from ItemNameMultiLang
) inml
on tc.ItemItem = inml.ItemId and
inml.seqnum = 1
ORDER BY tc.ItemId;

Related

SQL update depending on values from view

I thought this had to be a rather easy task, but I failed to see the complexity.
I have a table that I need refactoring, one of its FKs has duplicate entries only that they seem not to be as they have different case for names, say FTP and ftp for example
So, I've built a view that spits out a lowercase version of the duplicates with its id and the amount of times it has been used in the table I'm refactoring. The reason I do this count is so that I can assign the id of the version of the record that has been used the most, in the hope that it will not have such a big impact on the UI when the record changes visually
So, the output of the view is something like this:
Ok, so here I want to keep 'tagger' with id = 1 and 'mytag' with id = 4 as they are the ones with higher usage (countyCo)
Now, I have table tag, which holds all the details for 'tagger' and 'mytag', and then I have the table picture_tag that is the one relating the tags to the pictures.
Output of picture_tag could be:
I added a 'name_aux' with the original names from the tag table to make them easier to link
I've tried several queries, but the issue I'm facing is how to group all the ids of the rows that have the "same" name in the picture_tag table and then assign the id with the MAX countyCo for the group of entries in the view that have the same value
So far everything I've done can't manage to use more than one entry and ends up assigning 'tagger' to every entry, overwriting 'mytag'
My last attempt looked like this:
UPDATE picture_tag SET tag_id = (SELECT id FROM duplicate_stuff WHERE (countyCo, name) IN (
SELECT MAX(countyCO), name FROM duplicate_stuff dst
JOIN picture_tag ptag
ON dst.id = ptag.tag_id
AND dst.name = LOWER(ptag.name_aux)
GROUP BY name))
But the subquery returned more than one element.
Another attempt was this one:
UPDATE picture_tag ptag SET tag_id = (SELECT id FROM duplicate_stuff ds WHERE ds.name =
(SELECT LOWER(tag) FROM tag WHERE id = ds.id)
AND id = (SELECT id FROM tag WHERE id = ds.id)
ORDER BY ds.countyCo DESC
LIMIT 1
)
WHERE ptag.tag_id IN (SELECT id FROM duplicate_stuff)
But of course, the limit 1 has me overwriting the 'mytag' as well as the 'tagger'

You can use ROW_NUMBER to get the tags that are used the most. Then join this "duplicates" table with itself. And join the table that you want to update.
Something like this:
WITH cte_tags
as
(
SELECT *,ROW_NUMBER() over (PARTITION BY tagLower ORDER BY Counter DESC ) as rn
FROM duplicate_stuff
)
UPDATE upd
SET TagId = tagMax.TagId
FROM cte_tags tagMax
JOIN cte_tags tagNotMax on tagMax.tagLower = tagNotMax.tagLower and tagNotMax.rn>1
JOIN picture_tag upd on upd.tagId = tagNotMax.TagId
where tagmax.rn =1;

I ended up hitting the right algorithm:
UPDATE picture_tag ptag
SET tag_id = (SELECT id FROM duplicate_stuff
WHERE name = LOWER(ptag.name_aux)
ORDER BY countyCo DESC
LIMIT 1)
WHERE LOWER(name_aux) IN (SELECT name FROM duplicate_stuff)
Or to avoid issues with MSSQL and its limit when using IN:
UPDATE picture_tag ptag
SET tag_id = (SELECT id FROM duplicate_stuff
WHERE name = LOWER(ptag.name_aux)
ORDER BY countyCo DESC
LIMIT 1)
WHERE EXISTS (SELECT name FROM duplicate_stuff WHERE name = LOWER(name_aux))

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID,
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem--they're both remitts for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remits_To_Activate table, so only ONE remittance will be "activated" per Claim:
enter image description here

You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id,
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).

Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that that that does not depend on your DATE_OF_LATEST_REMIT table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).

A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as
REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

SQL Server 2008 R2: update one occurrence of a group's NULL value and delete the rest

I have a table of orders which has multiple rows of orders missing a Type and I'm struggling to get the queries right. I'm pretty new to SQL so please bear with me.
I've illustrated an example in the picture below. I need help creating the query that will take the table to the right and UPDATE it to look like the right table.
The orders are sorted by group. Each group should have one instance of type OK (IF A NULL OR OK ALREADY EXISTS), and no instances of NULL. I would like to achieve this by updating one of the groups' orders with type NULL to have type OK and delete the rest of the respective group's NULL rows.
I've managed to get the rows that I want to keep by
Create a temporary table where I insert the orders and replace NULL types with EMPTY
From the temporary table, get the existing OK orders for groups that already have one OK order, else an EMPTY order that should be changed to OK.
I've done this with the following:
SELECT * FROM Orders
SELECT *
INTO #modified
FROM
(SELECT
Id, IdGroup,
CASE WHEN Type IS NULL
THEN 'EMPTY'
ELSE Type
END Type
FROM
Orders) AS XXX
SELECT MIN(x.Id) Id, x.IdGroup, x.Type
FROM #modified x
JOIN
(SELECT
IdGroup, MIN (Type) AS min_Type
FROM #modified a
WHERE Type = 'OK' OR Type = 'EMPTY'
GROUP BY IdGroup) y ON y.IdGroup = x.IdGroup AND y.min_Type = x.Type
GROUP BY x.IdGroup, x.Type
DROP TABLE #modified
The rest of the EMPTY orders should after this step be deleted, but I don't know how to proceed from here. Maybe this is a poor approach from the beginning and maybe it could be done even easier?

Well done for writing a question that shows some effort and clearly explains what you're after. That's a rare thing unfortunately!
This is how I would do it:
First backup the table (I like to put them into a different schema to keep things neat)
CREATE SCHEMA bak;
SELECT * INTO bak.Orders FROM dbo.Orders;
Now you can do a trial run on the bak table if you like.
Anyway...
Set all the NULL types to OK
UPDATE Orders SET Type = 'OK' WHERE Type IS NULL;
Now repeatedly delete redundant records. Find records with more than one OK and delete them:
DELETE Orders WHERE ID In
(
SELECT MIN(Id) Id
FROM Orders
WHERE Type = 'OK';
GROUP BY idGroup
HAVING COUNT(*) > 1
);
You'll need to run that one a few times until it affects zero records

Assuming there are no multiple OKs and each group has at least one Ok or NULL value, you can do:
select t.id, t.idGroup, t.Type
from lefttable t
where t.Type is not null and t.Type <> 'OK'
union all
select t.id, t.idGroup, 'OK'
from (select t.*, row_number() over (partition by idGroup order by coalesce(t.Type, 'ZZZ')) as seqnum
from lefttable t
where t.Type is null or t.Type = 'OK'
) t
where seqnum = 1;
Actually, this will work even if you do have multiple OKs, but it will keep only of of the rows.
The first subquery selects all rows that are not OK or NULL. The second chooses exactly one of those group and assign the type as OK.

If you want to keep any OK ones in preference to a NULL, this will work. It creates a temp table with everything we need to work on (OK and NULL), and numbers them starting from one with each group, ordered so you list OK records before null ones. Then it makes sure all the first records are OK, and deletes all the rest
Create table #work (Id int, RowNo int)
--Get a list of all the rows we need to work on, and number them for each group
--(order by type desc puts OK before nulls)
Insert into #work (Id, RowNo)
Select Id, ROW_NUMBER() over (partition by IdGroup order by type desc) as RowNo
From Orders O
where (type is null OR type = 'OK');
-- Make sure the one we keep is OK, not null
Update O set type = 'OK'
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo = 1 and O.type IS NULL;
--Delete the remaining ones (any rowno > 1)
Delete O
from #Work W
inner join Orders O on O.Id = W.Id
Where W.RowNo > 1;
drop table #work;

Can't you just delete the rows where Type equals null?
DELETE FROM Orders WHERE Type IS NULL

SSRS Table of locations per item type

I have a basic query which shows what the latest product to be put in each location (FVTank) is:
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV101'
UNION
SELECT TOP 1
T0.[DateTime],
T0.[TankName],
T1.[Item]
FROM
t005_pci_data T0
INNER JOIN t001_fvbatch T1 ON T1.[FVBatch] = T0.[FVBatch]
WHERE
T0.[TankName] = 'FV102'
[...etc...]
ORDER BY
T0.[DateTime] DESC
Which gives a result like this:
What I'd like to do is create a summary page on SSRS which would display all the locations which currently hold each item. Ideally it would look something like this:
There are 50 locations and 7 main items so I need it to have 8 headers (one additional one for "other".)
Is there a way to do this in SSRS? Or is there a better solution by doing it in SQL?
Thank you.

Add an additional column to your dataset that calculates a row number for each Item, ordered by the DateTime field:
row_number() over (partition by Item order by DateTime desc) as rn
Judging by your source query in your question, this may be best included as a wrapping select around your final query:
select DateTime
,TankName
,Item
,row_number() over (partition by Item order by DateTime desc) as rn
from(
<Your original query here>
) a
You can then use this as your row group, as without one you will not get the top aligned format you are after in each Item x column. Remember to delete the rn column but keep the grouping:
When you run this report you will get the following format (I didn't bother typing out all your data into my dataset query, hence the missing values):

Sql Server Query Selecting Top and grouping by

SpousesTable
SpouseID
SpousePreviousAddressesTable
PreviousAddressID, SpouseID, FromDate, AddressTypeID
What I have now is updating the most recent for the whole table and assigning the most recent regardless of SpouseID the AddressTypeID = 1
I want to assign the most recent SpousePreviousAddress.AddressTypeID = 1
for each unique SpouseID in the SpousePreviousAddresses table.
UPDATE spa
SET spa.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa INNER JOIN Spouses ON spa.SpouseID = Spouses.SpouseID,
(SELECT TOP 1 SpousePreviousAddresses.* FROM SpousePreviousAddresses
INNER JOIN Spouses AS s ON SpousePreviousAddresses.SpouseID = s.SpouseID
WHERE SpousePreviousAddresses.CountryID = 181 ORDER BY SpousePreviousAddresses.FromDate DESC) as us
WHERE spa.PreviousAddressID = us.PreviousAddressID
I think I need a group by but my sql isn't all that hot. Thanks.
Update that is Working
I was wrong about having found a solution to this earlier. Below is the solution I am going with
WITH result AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) AS rowNumber, *
FROM SpousePreviousAddresses
WHERE CountryID = 181
)
UPDATE result
SET AddressTypeID = 1
FROM result WHERE rowNumber = 1

Presuming you are using SQLServer 2005 (based on the error message you got from the previous attempt) probably the most straightforward way to do this would be to use the ROW_NUMBER() Function couple with a Common Table Expression, I think this might do what you are looking for:
WITH result AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY SpouseID ORDER BY FromDate DESC) as rowNumber,
*
FROM
SpousePreviousAddresses
)
UPDATE SpousePreviousAddresses
SET
AddressTypeID = 2
FROM
SpousePreviousAddresses spa
INNER JOIN result r ON spa.SpouseId = r.SpouseId
WHERE r.rowNumber = 1
AND spa.PreviousAddressID = r.PreviousAddressID
AND spa.CountryID = 181
In SQLServer2005 the ROW_NUMBER() function is one of the most powerful around. It is very usefull in lots of situations. The time spent learning about it will be re-paid many times over.
The CTE is used to simplyfy the code abit, as it removes the need for a temporary table of some kind to store the itermediate result.
The resulting query should be fast and efficient. I know the select in the CTE uses *, which is a bit of overkill as we dont need all the columns, but it may help to show what is happening if anyone want to see what is happening inside the query.

Here's one way to do it:
UPDATE spa1
SET spa1.AddressTypeID = 1
FROM SpousePreviousAddresses AS spa1
LEFT OUTER JOIN SpousePreviousAddresses AS spa2
ON (spa1.SpouseID = spa2.SpouseID AND spa1.FromDate < spa2.FromDate)
WHERE spa1.CountryID = 181 AND spa2.SpouseID IS NULL;
In other words, update the row spa1 for which no other row spa2 exists with the same spouse and a greater (more recent) date.
There's exactly one row for each value of SpouseID that has the greatest date compared to all other rows (if any) with the same SpouseID.
There's no need to use a GROUP BY, because there's kind of an implicit grouping done by the join.
update: I think you misunderstand the purpose of the OUTER JOIN. If there is no row spa2 that matches all the join conditions, then all columns of spa2.* are returned as NULL. That's how outer joins work. So you can search for the cases where spa1 has no matching row spa2 by testing that spa2.SpouseID IS NULL.

UPDATE spa SET spa.AddressTypeID = 1
WHERE spa.SpouseID IN (
SELECT DISTINCT s1.SpouseID FROM Spa S1, SpousePreviousAddresses S2
WHERE s1.SpouseID = s2.SpouseID
AND s2.CountryID = 181
AND s1.PreviousAddressId = s2.PreviousAddressId
ORDER BY S2.FromDate DESC)
Just a guess.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

removing non-matching rows based on order in SQL within a WITH statement - sql

Related

SQL update depending on values from view

Modify my SQL Server query -- returns too many rows sometimes

SQL Server 2008 R2: update one occurrence of a group's NULL value and delete the rest

SSRS Table of locations per item type

Sql Server Query Selecting Top and grouping by

Categories

Resources