Selecting the biggest ZIP code from a column - SQL

I want to get the biggest ZIP code in the DB. Normally I do this:
SELECT *
FROM (
  SELECT * FROM tbuser ORDER BY zip DESC
)
WHERE rownum = 1
With this code I can get the biggest zip code value without duplicate rows (since zip code is not a primary key).
But the main company in Japan said that I can't use it, because when the connection is slow or the DB holds very large data, you can't get the right row. It would be a great help if someone could help.

I want to get the biggest ZIP code in DB.
If you really only want the zip code, try this:
SELECT MAX(zip) FROM TBUSER;
This will use the index on the zip column (if it exists).
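If that index is missing, creating one turns the MAX() into a cheap index lookup. A minimal sketch, using the table and column names from the question (the index name is made up):
CREATE INDEX tbuser_zip_ix ON tbuser (zip);
With the index in place, Oracle can typically answer MAX(zip) from the index alone (an INDEX FULL SCAN (MIN/MAX) in the plan).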
That being said, Oracle is usually smart enough to properly optimize sub-query selection using ROWNUM. Maybe your main company is more concerned about the possible "full table" ORDER BY in the subquery? OTOH, if the issue really is a "slow network", it might be worth taking some time with your DBA to check on the wire, with a network analyzer or some other tool, whether your approach really leads to "excessive bandwidth consumption". I sincerely doubt it...
If you want to retrieve the whole row having the maximum zip code, here is a slight variation on another answer (in my opinion, this is one of the rare cases for using a NATURAL JOIN):
select * from t
natural join (select max(zip) zip from t);
Of course, in case of duplicates, this will return multiple rows. You will have to combine that with one of the several options posted in the various other answers to return only 1 row.
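A minimal sketch of one such combination, capping the NATURAL JOIN result at a single row with ROWNUM (same table t as above):
select * from (
  select * from t
  natural join (select max(zip) zip from t)
) where rownum = 1;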
As an extra solution, and since you are not allowed to use ROWNUM (and assuming row_number is arbitrarily forbidden too), you can achieve the desired result with something as contrived as:
select * from t
where rowid = (
  select min(t.rowid) rid from t
  natural join (select max(zip) zip from t)
);
See http://sqlfiddle.com/#!4/3bd63/5
But honestly, there isn't any serious reason to hope that such a query will perform better than the simple ... ORDER BY something DESC) WHERE rownum <= 1 query.

This sounds to me like bad advice (masquerading as a rule) from a newbie database administrator who doesn't understand what he's looking at. That insight isn't going to help you, though. Rarely does a conversation starting with "you're an obstructionist incompetent" achieve anything.
So, here's the thing. First of all, you need to make sure there's an index on your zip column. It doesn't have to be a primary key.
Second, you can try explaining that Oracle's table servers do, in fact, optimize the ... ORDER BY something DESC) WHERE rownum <= 1 style of query. Their servers do a good job of that. Your use case is very common.
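One way to make that case is to show the DBA the actual execution plan. A minimal sketch using Oracle's EXPLAIN PLAN and the DBMS_XPLAN package (table and column names as in the question):
EXPLAIN PLAN FOR
  SELECT * FROM (SELECT * FROM tbuser ORDER BY zip DESC) WHERE rownum <= 1;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
With an index on zip, the plan should show a COUNT STOPKEY over a descending index scan, i.e. Oracle stops after the first row instead of sorting the whole table.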
But if that doesn't work on your DBA, try saying "I heard you" and do this.
SELECT * FROM (
  SELECT a.*
  FROM ( SELECT MAX(zip) zip FROM tbuser ) b
  JOIN tbuser a ON (a.zip = b.zip)
) WHERE rownum <= 1
This will get one row with the highest numbered zip value without the ORDER BY that your DBA mistakenly believes is messing up his server's RAM pool. And, it's reasonably efficient. As long as zip has an index.

As you are looking for a way to get the desired record without rownum now, ...
... here is how to do it from Oracle 12c onward:
select *
from tbuser
order by zip desc fetch first 1 row only;
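If you would rather get every row that ties on the maximum zip instead of an arbitrary single one, 12c also accepts a WITH TIES variant (a small extra, not something the question asked for):
select *
from tbuser
order by zip desc fetch first 1 rows with ties;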
... and here is how to do it before Oracle 12c:
select *
from (select tbuser.*, row_number() over(order by zip desc) as rn from tbuser)
where rn = 1;
EDIT: As Sylvain Leroux pointed out, it is more work for the DBMS to sort all records than to just find the maximum. Here is a max query without rownum:
select *
from tbuser where rowid =
(select max(rowid) keep (dense_rank last order by zip) from tbuser);
But as Sylvain Leroux also mentioned, it also makes a difference whether there is an index on the column. Some tests I did show that with an index on the column, the analytic functions are slower than the traditional ones. Your original query would just enter the index, go to the highest value, pick the record and stop. You won't get it any faster than that. My last query, while quite fast on a non-indexed column, is slower than yours on an indexed column.

Your requirements seem arbitrary, but this should give you the result you've requested.
SELECT *
FROM (SELECT * FROM tbuser
      WHERE zip = (SELECT MAX(zip) FROM tbuser))
WHERE rownum = 1

OK - try something like this:
SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER);
Fetch a single row from a cursor based on the above statement, then close the cursor. If you're using PL/SQL you could do it like this:
FOR aRow IN (SELECT *
             FROM TBUSER
             WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER))
LOOP
  -- Do something with aRow,
  -- then force an exit from the loop
  EXIT;
END LOOP;
Share and enjoy.

I'm surprised that nobody has posted this answer yet. I think this is the way; you could do something like this:
SELECT *
FROM (
  SELECT a.*, MAX(zip) OVER () max_zip
  FROM tbuser a
)
WHERE zip = max_zip
AND rownum = 1

Your query gets exactly one arbitrary row out of all records having the max zip code. So the problem cannot be that you retrieve a record with another zip code, more than one record, or zero records (as long as there is at least one record in the table).
Maybe Japan simply expects one of the other rows with that zip code? Then you may just have to add another order criterion to get that particular row.
Another thought: as they are talking about slow connection speed, it may also be that they enter a new max zip code in one session, query from another, and get the old max zip because the other session's insert hasn't been committed yet. But well, that's just the way this works, of course.
BTW: It's a strange thing to select a maximum zip code. I guess that's just an example to illustrate the problem?

If you are getting multiple records with the MAX function (which shouldn't be possible; without a screenshot I can't tell how you are), you can use DISTINCT in your SQL query to get a single record:
SELECT DISTINCT MAX(zipcode) FROM TableUSER

Related

efficiently find subset of records as well as total count

I'm writing a function in ColdFusion that returns the first couple of records that match the user's input, as well as the total count of matching records in the entire database. The function will be used to feed an autocomplete, so speed/efficiency are its top concerns. For example, if the function receives input "bl", it might return {sampleMatches:["blue", "blade", "blunt"], totalMatches:5000}
I attempted to do this in a single query for speed purposes, and ended up with something that looked like this:
select record, count(*) over ()
from table
where criteria like :criteria
and rownum <= :desiredCount
The problem with this solution is that count(*) over () always returns the value of :desiredCount. I saw a similar question to mine here, but my app will not have permissions to create a temp table. So is there a way to solve my problem in one query? Is there a better way to solve it? Thanks!
I'm writing this off the top of my head, so you should definitely time it, but I believe that using the following CTE
only requires you to write the conditions once
only returns the amount of records you specify
has the correct total count added to each record
and is evaluated only once
SQL Statement
WITH q AS (
  SELECT record
  FROM table
  WHERE criteria like :criteria
)
SELECT q1.*, q2.*
FROM q q1
CROSS JOIN (
  SELECT COUNT(*) FROM q
) q2
WHERE rownum <= :desiredCount
A nested subquery should return the results you want
select record, cnt
from (select record, count(*) over () cnt
      from table
      where criteria like :criteria)
where rownum <= :desiredCount
This will, however, force Oracle to completely process the query in order to generate the accurate count. This seems unlikely to be what you want if you're trying to do an autocomplete particularly when Oracle may decide that it would be more efficient to do a table scan on table if :criteria is just b since that predicate isn't selective enough. Are you really sure that you need a completely accurate count of the number of results? Are you sure that your table is small enough/ your system is fast enough/ your predicates are selective enough for that to be a requirement that you could realistically meet? Would it be possible to return a less-expensive (but less-accurate) estimate of the number of rows? Or to limit the count to something smaller (say, 100) and have the UI display something like "and 100+ more results"?
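If an exact total is not a hard requirement, one way to cap the counting work is to count over a pre-limited set. A rough sketch of that idea, reusing the names from the question (the 101 cutoff is an arbitrary example; the UI would render cnt = 101 as "100+ more results"):
select record, cnt
from (select record, count(*) over () cnt
      from (select record
            from table
            where criteria like :criteria
            and rownum <= 101))
where rownum <= :desiredCount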

Change IN with JOIN to work around limit of 1000

I have an Oracle query to delete rows in a child table, but the query doesn't work because there are too many values in the IN clause. Is there a different way I can write this, using a join or something, to make it work?
delete from PROCESS
where PACKAGE_ID in (select id from PACKAGE where NAME like 'Test%');
I had used * instead of id in the inner select there, so when I switched to id it worked. But I'm still curious whether this can be written in a different way, as there is a limit of 1000(?) items in an IN clause.
The limit of 1000 items in an IN clause only applies when you "manually" specify them. It doesn't apply when the items are returned by a sub-query.
I think the way you have it now is the way to go.
delete from PROCESS
where exists (select 1 from PACKAGE
              where NAME like 'Test%' and id = PROCESS.PACKAGE_ID);
An index over (PACKAGE.id, PACKAGE.NAME) would be very helpful to speed up the sub-query.
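A minimal sketch of that index (the index name is made up):
CREATE INDEX package_id_name_ix ON PACKAGE (id, NAME);
With both columns in the index, the correlated sub-query can be answered from the index without visiting the PACKAGE table at all.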
You could try adding a GROUP BY
delete from PROCESS
where PACKAGE_ID in (select id from PACKAGE where NAME like 'Test%' GROUP BY id);
I had used * instead of id in the inner select there, so when I switched to id it worked.
That will not work, because it selects all columns, whereas you need just one.
If you expand the * you get something like the following, which makes no sense:
where PACKAGE_ID in (select id, something, foo, name from PACKAGE);
But I'm still curious whether this can be written in a different way, as there is a limit of 1000(?) items in an IN clause.
There is no such limit for a sub-select. In fact, that should be the best way to write this query.
There is a limit (1000 in Oracle, which raises ORA-01795) for a literal IN list:
where id in (1,2,3,4,5)

How to get all results, except one row based on a timestamp?

I have a simple question (?) about SQL. I have come across this problem a few times before and have always solved it, but I'm looking for a more elegant and perhaps faster solution.
The problem is that I would like to select all rows in a table except the one with the max value in a timestamp column (in this case it's a summary row, but it's not marked as such in any way, and it's not relevant to my result).
I could do something like this:
select * from [table] t
where loggedat < (select max(loggedat) from [table] where somecolumn='somevalue')
and somecolumn='somevalue'
But when working with large tables this seems kind of slow. Any suggestions?
If you don't want to change your DB structure, then your query (or one with a slight variation using <> instead of <) is the way to go.
You could add a column IsSummary bit to the table, and always mark the most recent row as true (and all others false). Then your query would change to:
Select * from [table] where IsSummary = 0 and somecolumn = 'somevalue'
This would sacrifice slower speed on inserts (since an insert would also trigger an update of the IsSummary value) in exchange for faster speed on the select query.
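A rough sketch of that maintenance step, using the placeholder names from above (whether it runs in a trigger or in application code is up to you):
-- demote the previous summary row ...
UPDATE [table] SET IsSummary = 0
WHERE IsSummary = 1 AND somecolumn = 'somevalue';
-- ... then flag the new most recent row
UPDATE [table] SET IsSummary = 1
WHERE somecolumn = 'somevalue'
AND loggedat = (SELECT MAX(loggedat) FROM [table] WHERE somecolumn = 'somevalue');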
If you don't mind one tiny (4-byte) extra column, then you might go like this:
SELECT *
FROM (
  SELECT *, ROW_NUMBER() OVER (ORDER BY loggedat DESC) AS rownum
  FROM [table] t
  WHERE somecolumn = 'somevalue'
  /* and all the other filters you want */
) s
WHERE rownum > 1
In case you do mind the extra column, you'll just have to list the necessary columns explicitly in the outer SELECT.
It may not be the elegant SQL query you're looking for, but it would be trivial to do it in Java, PHP, etc, after fetching the results. To make it as simple as possible, use ORDER BY timestamp DESC and discard the first row.

Remove duplicate rows - Impossible to find a decisive answer

You'd think I came straight here to ask my question, but I googled an awful lot without finding a decisive answer.
Facts: I have a table with 3.3 million rows, 20 columns.
The first column is the primary key and thus unique.
I have to remove all rows where columns 2 through 11 are duplicated. In essence a basic question, but there are so many different approaches, whereas everyone seeks the same solution in the end: removing the duplicates.
I was personally thinking about GROUP BY HAVING COUNT(*) > 1.
Is that the way to go, or what do you suggest?
Thanks a lot in advance!
L
As a generic answer:
WITH cte AS (
  SELECT ROW_NUMBER() OVER (
    PARTITION BY <groupbyfield> ORDER BY <tiebreaker>) as rn
  FROM Table)
DELETE FROM cte
WHERE rn > 1;
I find this more powerful and flexible than GROUP BY ... HAVING. In fact, GROUP BY ... HAVING only gives you the duplicates; you're still left with the 'trivial' task of choosing a 'keeper' amongst them.
ROW_NUMBER() OVER (...) gives more control over how to distinguish among duplicates (the tiebreaker) and allows for behavior like 'keep the first 3 of the duplicates', not only 'keep just 1', which is really hard to achieve with GROUP BY ... HAVING (see the sketch below).
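For instance, a "keep the first 3" variant is just a different filter on the same row number (same placeholders as in the query above):
WITH cte AS (
  SELECT ROW_NUMBER() OVER (
    PARTITION BY <groupbyfield> ORDER BY <tiebreaker>) as rn
  FROM Table)
DELETE FROM cte
WHERE rn > 3;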
The other part of your question is how to approach this for 3.3M rows. Well, 3.3M is not really that big, but I would still recommend doing this in batches. Delete TOP 10000 at a time, otherwise you'll push a huge transaction into the log and might overwhelm your log drives.
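A sketch of that batching loop in T-SQL, repeating the delete until nothing is left (10000 is the batch size suggested above):
WHILE 1 = 1
BEGIN
  ;WITH cte AS (
    SELECT ROW_NUMBER() OVER (
      PARTITION BY <groupbyfield> ORDER BY <tiebreaker>) as rn
    FROM Table)
  DELETE TOP (10000) FROM cte
  WHERE rn > 1;

  IF @@ROWCOUNT = 0 BREAK;
END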
And the final question is whether this will perform acceptably. It depends on your schema. If the ROW_NUMBER() has to scan the entire table and spool to count, and you have to repeat this in batches N times, then it won't perform. An appropriate index will help. But I can't say anything more without knowing the exact schema involved (structure of clustered index/heap, all non-clustered indexes, etc.).
Group by the fields you want to be unique, and get an aggregate value (like min) for your pk field. Then insert those results into a new table.
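A sketch of that approach in T-SQL, with made-up names (pk is the primary key; col2 and col3 stand in for the ten columns that define a duplicate):
SELECT MIN(pk) AS pk, col2, col3
INTO keepers
FROM MyTable
GROUP BY col2, col3;
-- "keepers" now holds one surviving pk per duplicate group;
-- every row whose pk is not in it can then be deleted.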
If you have SQL Server 2005 or newer, then the easiest way would be to use a CTE (Common Table Expression).
You need to know what criteria you want to "partition" your data by - e.g. create partitions of data that is considered identical/duplicate - and then you need to order those partitions by something - e.g. a sequence ID, a date/time or something.
You didn't provide much details about your tables - so let me just give you a sample:
;WITH Duplicates AS
(
  SELECT
    OrderID,
    ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate DESC) AS RowN
  FROM
    dbo.Orders
)
DELETE FROM Duplicates
WHERE RowN > 1
The CTE ( WITH ... AS (...) ) gives you an "inline view" for the next SQL statement - it's not persisted or anything - it just lives for that next statement, and then it's gone.
Basically, I'm "grouping" (partitioning) my Orders by CustomerID, and ordering by OrderDate. So for each CustomerID, I get a new "group" of data, which gets a row number starting with 1. The ORDER BY OrderDate DESC gives the newest order for each customer the RowN = 1 value - this is the one order I keep.
All other orders for each customer are deleted based on the CTE (the WITH ... AS (...) expression).
You'll need to adapt this for your own situation, obviously - but the CTE with the PARTITION BY and ROW_NUMBER() are a very reliable and easy technique to get rid of duplicates.
If you don't want to deal with a new table, then just use DELETE TOP(1). Use a subquery to get all the rows that are duplicates, and then use DELETE TOP to delete where there are multiple rows. You might have to run it more than once if there is more than one duplicate, but you get the point.
DELETE TOP(1) FROM Table
WHERE Field IN (SELECT Field FROM Table GROUP BY Field HAVING COUNT(*) > 1)
You get the idea hopefully. This is just some pseudo code to help demonstrate.

which query is more preferable and why

I am trying to teach myself SQL and of course I would like to follow best practices.
I have created two queries to find the latest record:
select * from AppSurvey where
DateLastUsed >= ( SELECT MAX(DateLastUsed) FROM AppSurvey)
and
select top 1 * from AppSurvey order by DateLastUsed desc
Is one of these methods more efficient than the other, or does it really matter?
There is a similar post on this site about what you are trying to get at.
For autoincrement fields: MAX(ID) vs TOP 1 ID ORDER BY ID DESC
The preferred answer seems to be: "In theory, they will use same plans and run almost same time"
The first one could get more than one row, if your DateLastUsed column isn't unique.
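If ties are possible and you need deterministic output, you can either add a tiebreaker column to the TOP 1 query or explicitly ask for all tied rows. A sketch, assuming a hypothetical unique ID column:
select top 1 * from AppSurvey order by DateLastUsed desc, ID desc

-- or, to return every row sharing the latest DateLastUsed:
select top 1 with ties * from AppSurvey order by DateLastUsed desc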