More than one row returned by a subquery used as an expression when UPDATE on multiple rows - sql

I'm trying to update rows in a single table by splitting them into two "sets" of rows.
The top part of the set should have its status set to X and the bottom part should have its status set to Y.
I've tried putting together a query that looks like this:
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
((SELECT id from x_status), 'X'),
((SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (folks.ids);
When I run this query I get the following error:
pq: more than one row returned by a subquery used as an expression
This makes sense: folks.ids is expected to return a list of IDs, hence the IN clause in the UPDATE statement. I suspect the problem is that I cannot return a list inside the VALUES statement in the FROM clause, as it turns into something like this:
(1, 2, 3, 4, 5, 5)
(6, 7, 8, 9, 1)
Is there a way to do this UPDATE using a CTE query at all? I could split this into two separate UPDATE queries, but a single CTE query would be better and, in theory, faster.

I think I understand now... if I get your problem, you want to set the status to 'X' for the five most recently registered records and 'Y' for everything else?
In that case I think the row_number() analytic function would work -- and it should do it in a single pass, two scans, and eliminating one order by. Let me know if something like this does what you seek.
with ranked as (
select
id, row_number() over (order by date_registered desc) as rn
from people
)
update people p
set
status = case when r.rn <= 5 then 'X' else 'Y' end
from ranked r
where
p.id = r.id
Any time you do an update from another data set, it's helpful to have a where clause that defines the relationship between the two datasets (the non-ANSI join syntax). This makes it iron-clad what you are updating.
Also I believe this code is pretty readable so it will be easier to build on if you need to make tweaks.
Let me know if I missed the boat.
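For anyone who wants to try the ranking idea without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, the `surname = 'foo'` filter is carried over from the question as an assumption, and a correlated subquery stands in for `UPDATE ... FROM` so that it runs on older SQLite builds (window functions still need SQLite 3.25 or newer).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, surname TEXT, "
    "date_registered TEXT, status TEXT)"
)
# Hypothetical data: eight 'foo' rows on consecutive days, plus one 'bar' row.
rows = [(i, "foo", f"2020-01-{i:02d}", None) for i in range(1, 9)]
rows.append((9, "bar", "2020-01-09", None))
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)

# Rank the 'foo' rows newest-first, then map each rank to a status.
conn.execute(
    """
    WITH ranked AS (
        SELECT id,
               row_number() OVER (ORDER BY date_registered DESC) AS rn
        FROM people
        WHERE surname = 'foo'
    )
    UPDATE people
    SET status = (
        SELECT CASE WHEN r.rn <= 5 THEN 'X' ELSE 'Y' END
        FROM ranked r
        WHERE r.id = people.id
    )
    WHERE id IN (SELECT id FROM ranked)
    """
)

# Map of id -> status after the update.
statuses = dict(conn.execute("SELECT id, status FROM people"))
```

With this data the five newest 'foo' rows (ids 4 to 8) end up with 'X', the older three with 'Y', and the 'bar' row is left untouched.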

So after more tinkering, I've come up with a solution.
The previous query fails because the IDs in the subqueries are not grouped into arrays, so each subquery expands into multiple rows where a single value is expected, as I suspected.
The solution is to wrap the IDs from each subquery in an ARRAY -- that way each subquery returns a single value in the ids column.
This is the query that does the job. Note that we must unnest the IDs in the WHERE clause:
WITH x_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
LIMIT 5
), y_status AS (
SELECT id
FROM people
WHERE surname = 'foo'
ORDER BY date_registered DESC
OFFSET 5
)
UPDATE people
SET status = folks.status
FROM (values
(ARRAY(SELECT id from x_status), 'X'),
(ARRAY(SELECT id from y_status), 'Y')
) as folks (ids, status)
WHERE id IN (SELECT * from unnest(folks.ids));

Related

SQL Server: Each GROUP BY expression must contain at least one column that is not an outer reference

scenario 1:
I have two tables, INFUSION_APP_APPOINTMENT and INFUSION_APP_NURSE_NOTES, where
INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID=INFUSION_APP_APPOINTMENT.ID, and I want to find the INFUSION_APP_NURSE_NOTES.IDs that share the same INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID.
For example, if INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID = 1 for the notes with INFUSION_APP_NURSE_NOTES.ID 12, 15 and 78, then I want to display all the
INFUSION_APP_NURSE_NOTES.IDs where INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID = 1.
I use the script below:
SELECT INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.ID
FROM INFUSION_APP_NURSE_NOTES
GROUP BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.ID
HAVING COUNT(INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID)>1
but it does not give me any records.
scenario 2:
I am running the script below with the intention of getting the duplicate records that have different INFUSION_APP_NURSE_NOTES.IDs but the same INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID.
SELECT INFUSION_APP_NURSE_NOTES.ID,INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.TYPE
FROM INFUSION_APP_NURSE_NOTES
WHERE
EXISTS (
SELECT 1 FROM INFUSION_APP_APPOINTMENT
WHERE
INFUSION_APP_NURSE_NOTES.ENABLE=1
AND INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID=INFUSION_APP_APPOINTMENT.ID
GROUP BY INFUSION_APP_NURSE_NOTES.ID
HAVING COUNT(INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID)>1
)
ORDER BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID;
but I get the error below:
SQL Error(164): Each GROUP BY expression must contain at least one
column that is not an outer reference
How can I solve it?
I want only the rows which have a common APPOINTMENT_ID but different n
The question is unclear. Finding duplicates is typically performed using ranking functions like ROW_NUMBER(). This query :
SELECT *,ROW_NUMBER() OVER (PARTITION BY APPOINTMENT_ID ORDER BY ID) as RN
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
Will rank notes for the same appointment by ID and return 1, 2, 3 etc starting from the earliest note. ORDER BY ID DESC would return 1 for the latest note.
This can be used in a subquery or CTE to find the first, last or duplicate records, e.g.:
with notes as (
SELECT *,ROW_NUMBER() OVER (PARTITION BY APPOINTMENT_ID ORDER BY ID) as RN
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
)
select *
from notes
where RN=1
Will return the first note per appointment while :
where RN>1
Will return only duplicates.
The question doesn't say what should be done with the duplicates though.
If the question is how to return all notes from appointments with multiple notes, a subquery can be used to return the APPOINTMENT_IDs that have more than one note. There's no need to include the INFUSION_APP_APPOINTMENT table though :
SELECT *
FROM INFUSION_APP_NURSE_NOTES
where
ENABLE=1 AND
APPOINTMENT_ID IN ( SELECT APPOINTMENT_ID
FROM INFUSION_APP_NURSE_NOTES
WHERE
ENABLE=1
group by APPOINTMENT_ID
having count(*)>1)
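To check the GROUP BY / HAVING subquery on a small data set, here is a rough sketch using Python's sqlite3 module. The rows are invented (IDs 12, 15 and 78 follow the example in the question), and SQLite is only standing in for SQL Server, so treat it as an illustration of the query shape rather than a test of your actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INFUSION_APP_NURSE_NOTES "
    "(ID INTEGER PRIMARY KEY, APPOINTMENT_ID INTEGER, ENABLE INTEGER)"
)
# Hypothetical rows: appointment 1 has three enabled notes, appointment 2
# has one, appointment 3 has one enabled and one disabled note.
conn.executemany(
    "INSERT INTO INFUSION_APP_NURSE_NOTES VALUES (?, ?, ?)",
    [(12, 1, 1), (15, 1, 1), (78, 1, 1), (20, 2, 1), (31, 3, 1), (32, 3, 0)],
)

query = """
    SELECT ID
    FROM INFUSION_APP_NURSE_NOTES
    WHERE ENABLE = 1
      AND APPOINTMENT_ID IN (
          SELECT APPOINTMENT_ID
          FROM INFUSION_APP_NURSE_NOTES
          WHERE ENABLE = 1
          GROUP BY APPOINTMENT_ID
          HAVING COUNT(*) > 1
      )
    ORDER BY ID
"""
# Only notes belonging to appointments with more than one enabled note.
note_ids = [row[0] for row in conn.execute(query)]
```

Only the notes of appointment 1 come back here, because appointment 3 has just one enabled note once the ENABLE filter is applied.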
Try this
SELECT INFUSION_APP_NURSE_NOTES.ID,INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID,INFUSION_APP_NURSE_NOTES.TYPE
FROM INFUSION_APP_NURSE_NOTES
WHERE
EXISTS (
SELECT COUNT(B.APPOINTMENT_ID), B.ID
FROM INFUSION_APP_APPOINTMENT A
INNER JOIN INFUSION_APP_NURSE_NOTES B ON B.APPOINTMENT_ID = A.ID
WHERE
B.ENABLE=1
GROUP BY B.ID
HAVING COUNT(B.APPOINTMENT_ID)>1
)
ORDER BY INFUSION_APP_NURSE_NOTES.APPOINTMENT_ID;

Modify my SQL Server query -- returns too many rows sometimes

I need to update the following query so that it only returns one child record (remittance) per parent (claim).
Table Remit_To_Activate contains exactly one date/timestamp per claim, which is what I wanted.
But when I join the full Remittance table to it, since some claims have multiple remittances with the same date/timestamps, the outermost query returns more than 1 row per claim for those claim IDs.
SELECT * FROM REMITTANCE
WHERE BILLED_AMOUNT>0 AND ACTIVE=0
AND REMITTANCE_UUID IN (
SELECT REMITTANCE_UUID FROM Claims_Group2 G2
INNER JOIN Remit_To_Activate t ON (
(t.ClaimID = G2.CLAIM_ID) AND
(t.DATE_OF_LATEST_REGULAR_REMIT = G2.CREATE_DATETIME)
)
where ACTIVE=0 and BILLED_AMOUNT>0
)
I believe the problem would be resolved if I included REMITTANCE_UUID as a column in Remit_To_Activate. That's the REAL issue. This is how I created the Remit_To_Activate table (trying to get the most recent remittance for a claim):
SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
MAX(claim_id) AS ClaimID
INTO Latest_Remit_To_Activate
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID
ORDER BY Claim_ID
Claims_Group2 contains these fields:
REMITTANCE_UUID,
CLAIM_ID,
BILLED_AMOUNT,
CREATE_DATETIME
Here are the 2 rows that are currently giving me the problem -- they're both remittances for the SAME CLAIM, with the SAME TIMESTAMP. I only want one of them in the Remit_To_Activate table, so only ONE remittance will be "activated" per claim:
You can change your query like this:
SELECT
p.*, latest_remit.DATE_OF_LATEST_REMIT
FROM
Remittance AS p inner join
-- no trailing comma before FROM, and no ORDER BY: SQL Server rejects ORDER BY in a derived table without TOP
(SELECT MAX(create_datetime) as DATE_OF_LATEST_REMIT,
claim_id
FROM Claims_Group2
WHERE BILLED_AMOUNT>0
GROUP BY Claim_ID) as latest_remit
on latest_remit.claim_id = p.claim_id;
This will give you only one row. Untested (so please run and make changes).
Without having more information on the structure of your database -- especially the structure of Claims_Group2 and REMITTANCE, and the relationship between them, it's not really possible to advise you on how to introduce a remittance UUID into DATE_OF_LATEST_REMIT.
Since you are using SQL Server, however, it is possible to use a window function to introduce a synthetic means to choose among remittances having the same timestamp. For example, it looks like you could approach the problem something like this:
select *
from (
select
r.*,
row_number() over (partition by cg2.claim_id order by cg2.create_datetime desc) as rn
from
remittance r
join claims_group2 cg2
on r.remittance_uuid = cg2.remittance_uuid
where
r.active = 0
and r.billed_amount > 0
and cg2.active = 0
and cg2.billed_amount > 0
) t
where t.rn = 1
Note that this does not depend on your Remit_To_Activate table at all, it having been subsumed into the inline view. Note also that this will introduce one extra column into your results, though you could avoid that by enumerating the columns of table remittance in the outer select clause.
It also seems odd to be filtering on two sets of active and billed_amount columns, but that appears to follow from what you were doing in your original queries. In that vein, I urge you to check the results carefully, as lifting the filter conditions on cg2 columns up to the level of the join to remittance yields a result that may return rows that the original query did not (but never more than one per claim_id).
A co-worker offered me this elegant demonstration of a solution. I'd never used "over" or "partition" before. Works great! Thank you John and Gaurasvsa for your input.
if OBJECT_ID('tempdb..#t') is not null
drop table #t
select *, ROW_NUMBER() over (partition by CLAIM_ID order by CLAIM_ID) as ROW_NUM
into #t
from
(
select '2018-08-15 13:07:50.933' as CREATE_DATE, 1 as CLAIM_ID, NEWID() as REMIT_UUID
union select '2018-08-15 13:07:50.933', 1, NEWID()
union select '2017-12-31 10:00:00.000', 2, NEWID()
) x
select *
from #t
order by CLAIM_ID, ROW_NUM
select CREATE_DATE, MAX(CLAIM_ID), MAX(REMIT_UUID)
from #t
where ROW_NUM = 1
group by CREATE_DATE

Order by data as per supplied Id in sql

Query:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
Right now the data is ordered by the auto id or whatever clause I'm passing in order by.
But I want the data to come back in the same sequence as the IDs I have passed.
Expected Output:
All Data for 1247881
All Data for 174772
All Data for 808454
All Data for 2326154
Note:
The number of IDs to be passed will be 300 000.
One option would be to create a CTE containing the ration_card_id values and the order which you are imposing, and then join to this table:
WITH cte AS (
SELECT 1247881 AS ration_card_id, 1 AS position
UNION ALL
SELECT 174772, 2
UNION ALL
SELECT 808454, 3
UNION ALL
SELECT 2326154, 4
)
SELECT t1.*
FROM [MemberBackup].[dbo].[OriginalBackup] t1
INNER JOIN cte t2
ON t1.ration_card_id = t2.ration_card_id
ORDER BY t2.position
Edit:
If you have many IDs, then neither the answer above nor the answer given using a CASE expression will suffice. In this case, your best bet would be to load the list of IDs into a table, containing an auto increment ID column. Then, each number would be labelled with a position as its record is being loaded into your database. After this, you can join as I have done above.
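To make the "load the IDs into a table" idea concrete, here is a small sketch using Python's sqlite3 module in place of SQL Server. The `wanted_ids` table, its `position` column and the sample rows are all made up for illustration; in SQL Server the auto-increment column would be an IDENTITY column instead of AUTOINCREMENT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OriginalBackup "
    "(auto_id INTEGER PRIMARY KEY, ration_card_id INTEGER)"
)
# Hypothetical data, deliberately stored in a different order than requested.
conn.executemany(
    "INSERT INTO OriginalBackup (ration_card_id) VALUES (?)",
    [(174772,), (2326154,), (1247881,), (808454,), (1247881,), (999,)],
)

# The IDs in the order the caller wants them back.
wanted = [1247881, 174772, 808454, 2326154]

# Each inserted ID gets the next auto-increment value as its position.
conn.execute(
    "CREATE TEMP TABLE wanted_ids "
    "(position INTEGER PRIMARY KEY AUTOINCREMENT, ration_card_id INTEGER)"
)
conn.executemany(
    "INSERT INTO wanted_ids (ration_card_id) VALUES (?)",
    [(i,) for i in wanted],
)

# Join on the ID and sort by the stored position.
ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT t1.ration_card_id
        FROM OriginalBackup t1
        JOIN wanted_ids t2 ON t1.ration_card_id = t2.ration_card_id
        ORDER BY t2.position, t1.auto_id
        """
    )
]
```

Because the position lives in a real table, this scales to a long ID list without generating a huge IN list or CASE expression.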
If the desired order does not reflect a sequential ordering of some preexisting data, you will have to specify the ordering yourself. One way to do this is with a CASE expression:
SELECT *
FROM [MemberBackup].[dbo].[OriginalBackup]
where ration_card_id in
(
1247881,174772,
808454,2326154
)
ORDER BY CASE ration_card_id
WHEN 1247881 THEN 0
WHEN 174772 THEN 1
WHEN 808454 THEN 2
WHEN 2326154 THEN 3
END
Stating the obvious, but note that this ordering is most likely not represented by any index, so the sort will not be able to use one.
Insert your ration_card_id values into a temp table (#temps below) with one identity column.
Rewrite your SQL query as:
SELECT a.*
FROM [MemberBackup].[dbo].[OriginalBackup] a
JOIN #temps b
on a.ration_card_id = b.ration_card_id
order by b.id

Select rows with missing value for each item

I'm trying to make a report to find rows in a table, which have a mistake, a missing item order. I.e.
ID Item Order
----------------
1 A 1
2 A 2
3 A 3
4 B 1
5 B 2
6 C 2
7 C 3
8 D 1
Note, that Item "C" is missing row with Order index "1". I need to find all items, which are missing index "1" and start with "2" or other.
One way I figured is this:
SELECT DIstinct(Item) FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE Order = 1)
But surprisingly (to me), it does not give me any results even though I know I have such items. I guess it first selects the items which are not in the sub-select and then applies DISTINCT to them, but what I wanted is to select the distinct Items and find which of them have no rows with "Order = 1".
Also, this code is to be executed over some 70 thousand rows, so it has to be feasible (another way I can think of is a CURSOR, but that would be very slow and possibly unstable?).
Regards,
Oak
The idea is sound, but there is one tiny detail with NOT IN that may be problematic. If the subquery after NOT IN returns any NULLs, the NOT IN can never evaluate to true: it is either false or unknown, so no row passes the filter. This may be the reason why you get no results. You can try NOT EXISTS, like in the other answer, or just
SELECT DISTINCT Item FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL)
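The NULL behaviour is easy to reproduce. Below is a minimal sketch using Python's sqlite3 module with invented rows, one of which has a NULL Item and Order 1. SQLite is used only because it ships with Python; the three-valued logic involved is standard SQL, so SQL Server should treat the NOT IN the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Items (ID INTEGER PRIMARY KEY, Item TEXT, "Order" INTEGER)'
)
# Hypothetical rows: item C has no Order 1, and row 6 has a NULL Item.
conn.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(1, "A", 1), (2, "A", 2), (3, "B", 1), (4, "C", 2), (5, "C", 3), (6, None, 1)],
)

# The subquery yields A, B and NULL, so NOT IN is never true for any row.
not_in = conn.execute(
    """
    SELECT DISTINCT Item FROM Items
    WHERE Item NOT IN (SELECT Item FROM Items WHERE "Order" = 1)
    """
).fetchall()

# Filtering the NULLs out of the subquery makes NOT IN behave as expected.
not_in_filtered = conn.execute(
    """
    SELECT DISTINCT Item FROM Items
    WHERE Item NOT IN (
        SELECT Item FROM Items WHERE "Order" = 1 AND Item IS NOT NULL
    )
    """
).fetchall()

# NOT EXISTS finds C; it also returns the NULL-item row itself,
# because nothing compares equal to NULL in the correlated subquery.
not_exists = conn.execute(
    """
    SELECT DISTINCT i1.Item FROM Items i1
    WHERE NOT EXISTS (
        SELECT 1 FROM Items i2
        WHERE i2.Item = i1.Item AND i2."Order" = 1
    )
    """
).fetchall()
```

Here `not_in` comes back empty, while both of the other queries report item C.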
You can find the missing orders using a HAVING clause. HAVING allows you to filter on aggregated records. In this case we are filtering for Items with a min Order in excess of 1.
The benefit of this approach over a sub query in the WHERE clause is SQL Server doesn't have to rerun the sub query multiple times. It should run faster on large datasets.
Example
/* HAVING allows us to filter on aggregated records.
*/
WITH SampleData AS
(
/* This CTE creates some sample records
* to experiment with.
*/
SELECT
r.*
FROM
(
VALUES
( 1, 'A', 1),
( 2, 'A', 2),
( 3, 'A', 3),
( 4, 'B', 1),
( 5, 'B', 2),
( 6, 'C', 2),
( 7, 'C', 3),
( 8, 'D', 1)
) AS r(ID, Item, [Order])
)
SELECT
Item,
COUNT([Order]) AS Count_Order,
MIN([Order]) AS Min_Order
FROM
SampleData
GROUP BY
Item
HAVING
MIN([Order]) > 1
;
Your query should work. The problem is probably that Item could be NULL. So try this:
SELECT Distinct(Item)
FROM ITEMS as I
WHERE I.Item NOT IN (SELECT Item FROM Items WHERE [Order] = 1 AND Item IS NOT NULL);
This is why NOT EXISTS is preferable to NOT IN.
I would do this, though, with an aggregation query:
select item
from items
group by item
having sum(case when [order] = 1 then 1 else 0 end) = 0;
You can use NOT EXISTS:
SELECT DISTINCT(i1.Item) FROM ITEMS i1
WHERE NOT EXISTS
(
SELECT 1 FROM Items i2
WHERE i1.Item = i2.Item AND i2.[Order] = 1
)
NOT IN has its issues, worth reading:
http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join
The main problem is that the results can be surprising if the target
column is NULLable (SQL Server processes this as a left anti semi
join, but can't reliably tell you if a NULL on the right side is equal
to – or not equal to – the reference on the left side). Also,
optimization can behave differently if the column is NULLable, even if
it doesn't actually contain any NULL values
because of this...
Instead of NOT IN, use a correlated NOT EXISTS for this query pattern.
Always. Other methods may rival it in terms of performance, when all
other variables are the same, but all of the other methods introduce
either performance problems or other challenges.

SQL WHERE IN (...) sort by order of the list?

Let's say I query a database with a where clause
WHERE _id IN (5,6,424,2)
Is there any way for the returned cursor to be sorted in the order in which the _ids were listed? That is, for the _id attribute in the Cursor, from first to last, to be 5, 6, 424, 2?
This happens to be on Android through a ContentProvider, but that's probably not relevant.
Select ID list using subquery and join with it:
select t1.*
from t1
inner join
(
select 1 as id, 1 as num
union all select 5, 2
union all select 3, 3
) ids on t1.id = ids.id
order by ids.num
One approach might be to do separate SQL queries with a UNION between each. You would obviously issue each query in the order you would like it returned to you.
...
order by
case when _id=5 then 1
when _id=6 then 2
end
etc.
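If the list is built at run time, the CASE expression can be generated from it. Here is a rough sketch in Python with sqlite3 (the same engine Android uses); `fetch_in_list_order` and the table `t` are made-up names, and the table name is interpolated directly, so it must come from trusted code, not user input.

```python
import sqlite3

def fetch_in_list_order(conn, table, ids):
    """Return the _id values of `table` that are in `ids`, in list order."""
    if not ids:
        return []
    # One placeholder per ID for the IN list ...
    placeholders = ", ".join("?" for _ in ids)
    # ... and one WHEN branch per ID, mapping it to its list position.
    whens = " ".join(f"WHEN ? THEN {pos}" for pos in range(len(ids)))
    sql = (
        f"SELECT _id FROM {table} "
        f"WHERE _id IN ({placeholders}) "
        f"ORDER BY CASE _id {whens} END"
    )
    # The IDs are bound twice: once for IN, once for the CASE branches.
    return [row[0] for row in conn.execute(sql, list(ids) + list(ids))]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(2,), (5,), (6,), (7,), (424,)])

result = fetch_in_list_order(conn, "t", [5, 6, 424, 2])
```

Each ID is bound twice, so very long lists can run into SQLite's limit on bound parameters; for those, joining against a table of IDs and positions is probably the safer route.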
You can join it to a virtual table that contains the list required in sort order
select tbl.*
from tbl
inner join (
select 1 as sorter, 5 as value union all
select 2, 6 union all
select 3, 424 union all
select 4, 2) X on tbl._id = X.value
ORDER BY X.sorter
List? You don't have a list! ;)
This:
WHERE _id IN (5,6,424,2)
is mere syntactic sugar for this:
WHERE (
_id = 5
OR _id = 6
OR _id = 424
OR _id = 2
)
SQL has but one data structure, being the table. Your (5,6,424,2) isn't a table either! :)
You could create a table of values but your next problem is that tables do not have any logical ordering. Therefore, as per @cyberkiwi's answer, you'd have to create a column explicitly to model the sort order. And in order to make it explicit to the calling application, ensure you expose this column in the SELECT clause of your query.