Change IN with JOIN to work around limit of 1000

I have an Oracle query to delete rows in a child table, but the query doesn't work because of too many values in the IN clause. Is there a different way to write this, using a join or something similar, to make it work?
delete from PROCESS
where PACKAGE_ID in (select id from PACKAGE where NAME like 'Test%');
I had used * instead of id in the inner select; when I switched to id, it worked. But I'm still curious whether this can be written in a different way, as there is a limit of 1000(?) items in an IN clause.

The limit of 1000 items in an IN clause only applies when you "manually" specify them. It doesn't apply when the items are returned by a sub-query.
I think the way you have it now is the way to go.

delete from
PROCESS
where
exists(select 1 from PACKAGE where NAME like 'Test%' and PACKAGE.id = PROCESS.PACKAGE_ID);
An index over (PACKAGE.id, PACKAGE.NAME) would be very helpful to speed up the sub-query.
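To compare the EXISTS form with the IN form, here is a minimal sketch using Python's sqlite3 module as a stand-in for Oracle (an assumption made purely so the example runs anywhere; it says nothing about Oracle's optimizer). The table contents and the helper names make_db and delete_children are made up for illustration. The correlation joins PACKAGE.id to PROCESS.PACKAGE_ID.

```python
import sqlite3


def make_db():
    """Build a tiny in-memory parent/child schema (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE process (id INTEGER PRIMARY KEY, package_id INTEGER);
        INSERT INTO package VALUES (1, 'Test A'), (2, 'Prod B'), (3, 'Test C');
        INSERT INTO process VALUES (10, 1), (11, 2), (12, 3), (13, 3);
    """)
    return conn


def delete_children(conn, use_exists):
    """Delete process rows whose parent package name starts with 'Test'.

    Returns the number of deleted rows.
    """
    if use_exists:
        # Correlated EXISTS: the subquery is tied to each process row.
        sql = """
            DELETE FROM process
            WHERE EXISTS (SELECT 1 FROM package
                          WHERE package.name LIKE 'Test%'
                            AND package.id = process.package_id)
        """
    else:
        # IN with a sub-query: no literal list, so no literal-list limit.
        sql = """
            DELETE FROM process
            WHERE package_id IN (SELECT id FROM package
                                 WHERE name LIKE 'Test%')
        """
    return conn.execute(sql).rowcount
```

Both variants remove the same child rows in this sketch; which one is faster on a real Oracle schema depends on the plan and the available indexes, so it is worth checking with EXPLAIN PLAN.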

You could try adding a GROUP BY
delete from PROCESS
where PACKAGE_ID in (select id from PACKAGE where NAME like 'Test%' GROUP BY id);

Had used * instead of id in the inner select there, so when I switched to id it worked.
That will not work, because it selects all columns, whereas you need just one.
If you expand the * you get something like the following, which makes no sense:
where PACKAGE_ID in (select id, something, foo, name from PACKAGE);
But I'm still curious if this can be written in a different way, as there is a limit of 1000(?) items in an in clause.
There is no such limit for a sub-select. In fact, that should be the best way to write this query.
There is a limit for a literal IN list (1000 expressions in Oracle, reported as ORA-01795):
where id in (1,2,3,4,5)
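If the ids really do come from the application rather than from a table, one common workaround is to split the literal list into batches that stay under the limit. The sketch below only builds the statements; the function name in_list_batches, the default of 1000, and the :1, :2, ... placeholder style are assumptions for illustration and should be adapted to whatever driver is in use.

```python
from typing import Iterator, List, Sequence, Tuple


def in_list_batches(
    ids: Sequence[int], max_items: int = 1000
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (statement, bind_values) pairs with at most max_items ids each.

    The placeholder style (:1, :2, ...) is an assumption; adjust it to the
    bind syntax of your database driver.
    """
    for start in range(0, len(ids), max_items):
        batch = list(ids[start:start + max_items])
        placeholders = ", ".join(f":{i + 1}" for i in range(len(batch)))
        sql = f"DELETE FROM PROCESS WHERE PACKAGE_ID IN ({placeholders})"
        yield sql, batch
```

Each yielded pair can be passed to the driver's execute call inside one transaction. When the ids already live in a table, the sub-query form needs none of this.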

Related

Rewrite SQL query to remove duplicate SELECTs in [WHERE xxx IN] condition

I need to execute the following query:
DELETE FROM notification
WHERE account_id IN ( SELECT id FROM missing )
OR receiver_id IN ( SELECT id FROM missing )
OR created_by_id IN ( SELECT id FROM missing )
RETURNING id
What bothers me is that it has to select the same values three times.
I am sure that there is a better, proper way of doing it.
Could you please suggest how this query might be rewritten?

You can use an EXISTS condition with an IN:
delete from notification n
where exists (select *
from missing m
where m.id in (n.account_id, n.receiver_id, n.created_by_id))
returning id;
Which is more or less the same as:
delete from notification n
using missing m
where m.id in (n.account_id, n.receiver_id, n.created_by_id)
returning n.id;
However, the majority of the time will be spent by the actual DELETE part, rather than by finding the rows. So unless missing is really huge or a really complicated subquery, I doubt you will see a big performance difference.
After a few simple tests (250000 rows in notifications, 10000 rows in missing) it seems that the original version is way faster than the EXISTS or USING alternative.
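A quick way to convince yourself that the two formulations delete the same rows is to run both against the same small data set. The sketch below uses Python's sqlite3 module instead of PostgreSQL (an assumption for portability), so it drops RETURNING and the table alias and compares the surviving rows instead. The sample rows and the helper name surviving_ids are made up.

```python
import sqlite3

# Original form: three separate IN sub-queries.
ORIGINAL = """
    DELETE FROM notification
    WHERE account_id    IN (SELECT id FROM missing)
       OR receiver_id   IN (SELECT id FROM missing)
       OR created_by_id IN (SELECT id FROM missing)
"""

# EXISTS form: one correlated sub-query checking all three columns.
EXISTS_FORM = """
    DELETE FROM notification
    WHERE EXISTS (SELECT 1 FROM missing
                  WHERE missing.id IN (notification.account_id,
                                       notification.receiver_id,
                                       notification.created_by_id))
"""


def surviving_ids(delete_sql):
    """Run delete_sql on a fresh sample database; return remaining ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE missing (id INTEGER PRIMARY KEY);
        CREATE TABLE notification (
            id INTEGER PRIMARY KEY,
            account_id INTEGER,
            receiver_id INTEGER,
            created_by_id INTEGER
        );
        INSERT INTO missing VALUES (7), (8);
        INSERT INTO notification VALUES
            (1, 7, 1, 1), (2, 1, 8, 1), (3, 1, 1, 7),
            (4, 1, 2, 3), (5, 8, 8, 8);
    """)
    conn.execute(delete_sql)
    rows = conn.execute("SELECT id FROM notification ORDER BY id")
    return [r[0] for r in rows]
```

This only checks equivalence of results on non-NULL data; it does not reproduce the timing difference mentioned above, which has to be measured on the real database.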

How to get a distinct result ordered by a different column? SQL Postgresql

I'm trying to figure the best way to perform this query in postgresql. I have a messages table and I want to grab the last message a user received from each distinct user. I need to select everything from the row.
I would think this is where I want to group by the sender's id "msgfromid", but when I do this it complains that I haven't included everything from my select statement in my group by statement, even though I only want to group by the one column, not all of them. If I use DISTINCT ON for the one column instead, it forces me to order by the DISTINCT ON column ("msgfromid") first, which prevents me from getting the row I need (the last message sent by each sender, ordered by "msgsenttime").
My goal is to make this as efficient as possible on my server and database.
This is what I have right now, not working. (This is a sub-query of another query I use to join relevant information afterwards but I figure that is irrelevant)
SELECT DISTINCT ON ("msgfromid") "msgfromid", "msgid", "msgtoid", "msgsenttime", "msgreadtime", "msgcontent", "msgreportstatus", "senderun", "recipientun"
FROM "messages"
WHERE "msgtoid" = ?
ORDER BY "msgsenttime" desc, "msgfromid"
I thought maybe if I pre-ordered them in a sub-query it would work but it just seems to randomly pick one anyway, and this can't be very efficient, even if it were to work, since I'm pulling every message out to begin with, right?:
SELECT DISTINCT ON ("msgfromid") "msgfromid", "msgid", "msgtoid", "msgsenttime", "msgreadtime", "msgcontent", "msgreportstatus", "senderun", "recipientun"
FROM
(
SELECT * FROM "messages"
WHERE "msgtoid" = ?
ORDER BY "msgsenttime" desc
) as "mqo"
Thanks for any help.

Your order by keys are in the wrong order:
SELECT DISTINCT ON ("msgfromid") m.*
FROM "messages" m
WHERE "msgtoid" = ?
ORDER BY "msgfromid", "msgsenttime" desc;
For DISTINCT ON, the keys in parentheses need to be the first keys in the ORDER BY.
If you want the final result ordered in a different way, then you need to use a subquery, with a different ORDER BY on the outer query.
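As an illustration of "pick one row per sender, then order the final result differently", here is a runnable sketch in Python with sqlite3. SQLite has no DISTINCT ON, so the sketch swaps in a correlated MAX sub-query to pick the latest message per sender; the PostgreSQL form is shown only as a comment. The column subset, sample rows, and helper names are assumptions, and if two messages from one sender share the same send time the stand-in returns both.

```python
import sqlite3

# PostgreSQL form of the idea (not executed here):
#   SELECT * FROM (
#       SELECT DISTINCT ON ("msgfromid") m.*
#       FROM "messages" m
#       WHERE "msgtoid" = ?
#       ORDER BY "msgfromid", "msgsenttime" DESC
#   ) s
#   ORDER BY "msgsenttime" DESC;

# SQLite stand-in: keep rows whose send time is the maximum for that sender,
# then order the final result by send time, newest first.
LATEST_PER_SENDER = """
    SELECT m.msgfromid, m.msgid, m.msgsenttime
    FROM messages m
    WHERE m.msgtoid = ?
      AND m.msgsenttime = (SELECT MAX(m2.msgsenttime)
                           FROM messages m2
                           WHERE m2.msgtoid = m.msgtoid
                             AND m2.msgfromid = m.msgfromid)
    ORDER BY m.msgsenttime DESC
"""


def demo_db():
    """Create a small messages table with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (msgid INTEGER PRIMARY KEY, msgfromid INTEGER,"
        " msgtoid INTEGER, msgsenttime INTEGER)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        [(1, 10, 1, 100), (2, 10, 1, 300), (3, 20, 1, 200),
         (4, 20, 1, 150), (5, 30, 2, 400), (6, 30, 1, 50)],
    )
    return conn


def latest_per_sender(conn, recipient_id):
    """Return (sender, msgid, senttime) for each sender's latest message."""
    return conn.execute(LATEST_PER_SENDER, (recipient_id,)).fetchall()
```

An index covering (msgtoid, msgfromid, msgsenttime) would likely help either form, though that is worth confirming with the planner's output.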

Selecting the biggest ZIP code from a column

I want to get the biggest ZIP code in the DB. Normally I do this:
SELECT *
FROM (
Select * From tbuser ORDER BY zip DESC
)
WHERE rownum = 1
With this code I can get the biggest zip code value without a duplicate row (since zip code is not a primary key).
But the main company in Japan said that I can't use it, because when the connection is slow or the DB holds a very large amount of data, it won't return the right row. It would be a great help if someone could suggest an alternative.

I want to get the biggest ZIP code in DB.
If you really only want the zip code, try this:
SELECT MAX(zip) FROM TBUSER;
This will use the index on the zip column (if it exists).
That being said, Oracle is usually smart enough to properly optimize sub-query selection using ROWNUM. Maybe your main company is more concerned about the possible "full table" ORDER BY in the subquery? OTOH, if the issue really is a "slow network", it may be worth taking some time with your DBA to look at the wire with a network analyzer or some other tool, to see whether your approach really leads to "excessive bandwidth consumption". I sincerely doubt it...
If you want to retrieve the whole row having the maximum zip code, here is a slight variation on another answer (in my opinion, this is one of the rare cases for using a NATURAL JOIN):
select * from t
natural join (select max(zip) zip from t);
Of course, in case of duplicates, this will return multiple rows. You will have to combine that with one of the several options posted in the various other answers to return only 1 row.
As an extra solution, and since you are not allowed to use ROWNUM (and assuming row_number is arbitrarily forbidden too), you can achieve the desired result using something as contrived as:
select * from t
where rowid = (
select min(t.rowid) rid from t
natural join (select max(zip) zip from t)
);
See http://sqlfiddle.com/#!4/3bd63/5
But honestly, there isn't any serious reason to expect that such a query will perform better than the simple ... ORDER BY something DESC) WHERE rownum <= 1 query.

This sounds to me like bad advice (masquerading as a rule) from a newbie database administrator who doesn't understand what he's looking at. That insight isn't going to help you, though. Rarely does a conversation starting with "you're an obstructionist incompetent" achieve anything.
So, here's the thing. First of all, you need to make sure there's an index on your zip column. It doesn't have to be a primary key.
Second, you can try explaining that Oracle's database servers do, in fact, optimize the ... ORDER BY something DESC) WHERE rownum <= 1 style of query. They do a good job of that. Your use case is very common.
But if that doesn't work on your DBA, try saying "I heard you" and do this.
SELECT * FROM (
SELECT a.*
FROM ( SELECT MAX(zip) zip FROM tbuser ) b
JOIN tbuser a ON (a.zip = b.zip)
) WHERE rownum <= 1
This will get one row with the highest numbered zip value without the ORDER BY that your DBA mistakenly believes is messing up his server's RAM pool. And, it's reasonably efficient. As long as zip has an index.

As you are looking for a way to get the desired record without rownum now, ...
... here is how to do it from Oracle 12c onward:
select *
from tbuser
order by zip desc fetch first 1 row only;
... and here is how to do it before Oracle 12c:
select *
from (select tbuser.*, row_number() over(order by zip desc) as rn from tbuser)
where rn = 1;
EDIT: As Sylvain Leroux pointed out, it is more work for the dbms to sort all records rather than just find the maximum. Here is a max query without rownum:
select *
from tbuser where rowid =
(select max(rowid) keep (dense_rank last order by zip) from tbuser);
But as Sylvain Leroux also mentioned, it also makes a difference whether there is an index on the column. Some tests I did show that with an index on the column, the analytic functions are slower than the traditional functions. Your original query would just go into the index, move to the highest value, pick the record and then stop. You won't get this any faster. My last query, while quite fast on a non-indexed column, is slower than yours on an indexed column.

Your requirements seem arbitrary, but this should give you the result you've requested.
SELECT *
FROM (SELECT * FROM tbuser
WHERE zip = (SELECT MAX(zip) FROM tbuser))
WHERE rownum = 1

OK - try something like this:
SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER);
Fetch a single row from a cursor based on the above statement, then close the cursor. If you're using PL/SQL you could do it like this:
FOR aRow IN (SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER))
LOOP
-- Do something with aRow
-- then force an exit from the loop
EXIT;
END LOOP;
Share and enjoy.
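The same "open a cursor, fetch one row, close the cursor" idea can be sketched from client code. The example below uses Python's sqlite3 module as a stand-in for an Oracle driver (an assumption, chosen so the snippet is runnable); the function name fetch_one_max_zip_row and the two-column tbuser layout in the usage note are hypothetical.

```python
import sqlite3


def fetch_one_max_zip_row(conn):
    """Return one row having the maximum zip, or None if the table is empty.

    When several rows share the maximum zip, which one comes back is
    unspecified, just as with the early EXIT from the PL/SQL loop.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT * FROM tbuser WHERE zip = (SELECT MAX(zip) FROM tbuser)"
        )
        return cur.fetchone()  # take the first row only
    finally:
        cur.close()  # release the cursor without reading further rows
```

For example, with a table created as CREATE TABLE tbuser (name TEXT, zip INTEGER) and a few rows inserted, fetch_one_max_zip_row(conn) returns a single (name, zip) tuple whose zip equals the table's maximum.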

I'm surprised nobody has posted this answer yet. I think this is the way you should do something like that.
SELECT *
FROM (
Select a.*, max(zip) over () max_zip
From tbuser a
)
WHERE zip=max_zip
and rownum = 1

Your query gets exactly one random row of all records having the max zip code. So it cannot be the problem that you retrieve a record with another zip code or more than one record or zero records (as long as there is at least one record in the table).
Maybe Japan simply expects one of the other rows with that zip code? Then you may just have to add another order criteria to get that particular desired row.
Another thought: As they are talking about slow connection speed, it may also be that they enter a new max zip code on one session, query with another and get the old max zip, because the insert statement of the other session hasn't gone through yet. But well, that's just the way this works of course.
BTW: A strange thing to select a maximum zip code. I guess that's just an example to illustrate the problem?

If you are getting multiple records from the MAX function (which should not be possible; I can't tell how it happens unless you post a screenshot), you can use DISTINCT in your SQL query to get a single record:
SELECT DISTINCT MAX(zipcode) FROM TableUSER

efficiently find subset of records as well as total count

I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!

I'm writing this off the top of my head, so you should definitely time it, but I believe the following CTE only requires you to write the conditions once, only returns the number of records you specify, has the correct total count added to each record, and is evaluated only once.
SQL Statement
WITH q AS (
SELECT record
FROM table
WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
SELECT COUNT(*) FROM q
) q2
WHERE rownum <= :desiredCount

A nested subquery should return the results you want
select record, cnt
from (select record, count(*) over () cnt
from table
where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate the accurate count. This seems unlikely to be what you want if you're trying to do an autocomplete particularly when Oracle may decide that it would be more efficient to do a table scan on table if :criteria is just b since that predicate isn't selective enough. Are you really sure that you need a completely accurate count of the number of results? Are you sure that your table is small enough/ your system is fast enough/ your predicates are selective enough for that to be a requirement that you could realistically meet? Would it be possible to return a less-expensive (but less-accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?
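The "cap the count" idea can be sketched as follows, using Python's sqlite3 module and LIMIT in place of Oracle's rownum (both assumptions for the sake of a runnable example). The table words, its column word, and the function autocomplete are hypothetical. Note that this sketch issues two small queries rather than one, and it does not escape % or _ in the user's prefix.

```python
import sqlite3


def autocomplete(conn, prefix, sample_size=3, count_cap=100):
    """Return a few matches plus a match count that stops at count_cap."""
    pattern = prefix + "%"  # note: % and _ in prefix are not escaped here

    # First query: only the handful of rows the UI will show.
    samples = [
        row[0]
        for row in conn.execute(
            "SELECT word FROM words WHERE word LIKE ? ORDER BY word LIMIT ?",
            (pattern, sample_size),
        )
    ]

    # Second query: count at most count_cap + 1 matches, so the database
    # can stop early instead of counting every matching row.
    (seen,) = conn.execute(
        "SELECT COUNT(*) FROM (SELECT 1 FROM words WHERE word LIKE ? LIMIT ?)",
        (pattern, count_cap + 1),
    ).fetchone()

    return {
        "sampleMatches": samples,
        "totalMatches": min(seen, count_cap),
        "moreThanCap": seen > count_cap,  # UI can render this as "100+"
    }
```

On Oracle the inner LIMIT would become a rownum (or FETCH FIRST) restriction inside the sub-query; whether the early stop actually pays off depends on the plan chosen for the LIKE predicate.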

How to get all results, except one row based on a timestamp?

I have a simple question (?) about SQL. I have come across this problem a few times before and I have always solved it, but I'm looking for a more elegant and perhaps faster solution.
The problem is that I would like to select all rows in a table except the one with the max value in a timestamp column (in this case it is a summary row, but it's not marked as such in any way, and it's not relevant to my result).
I could do something like this:
select * from [table] t
where loggedat < (select max(loggedat) from [table] where somecolumn='somevalue')
and somecolumn='somevalue'
But when working with large tables this seems kind of slow. Any suggestions?

If you don't want to change your DB structure, then your query (or one with a slight variation using <> instead of <) is the way to go.
You could add a column IsSummary bit to the table, and always mark the most recent row as true (and all others false). Then your query would change to:
Select * from [table] where IsSummary = 0 and somecolumn = 'somevalue'
This would sacrifice insert speed (since an insert would also trigger an update of the IsSummary value) in exchange for a faster select query.

If you don't mind one tiny (4 byte) extra column, then you could do it like this:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
FROM [table] t
WHERE somecolumn = 'somevalue'
/* and all the other filters you want */
) s
WHERE rownum > 1
In case you do mind the extra column, you'll just have to list the necessary columns explicitly in the outer SELECT.

It may not be the elegant SQL query you're looking for, but it would be trivial to do it in Java, PHP, etc, after fetching the results. To make it as simple as possible, use ORDER BY timestamp DESC and discard the first row.
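A minimal sketch of that client-side approach, in Python with sqlite3 (chosen here only so the example is runnable; the table name log, its columns, and the function name rows_without_latest are assumptions):

```python
import sqlite3


def rows_without_latest(conn, value):
    """Return matching rows, newest first, minus the single newest row."""
    cur = conn.execute(
        "SELECT id, loggedat FROM log"
        " WHERE somecolumn = ? ORDER BY loggedat DESC",
        (value,),
    )
    cur.fetchone()  # discard the first (most recent) row, if there is one
    return cur.fetchall()  # everything else, still ordered newest first
```

If several rows share the maximum timestamp, this drops only one of them, whereas the loggedat < max(loggedat) query drops them all, so the two are equivalent only when the newest timestamp is unique.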
