Writing back a GROUP number in SQL - sql

I have an existing app I can’t modify. It needs to execute a SQL GROUP BY, but cannot. However it can and does read a GroupNumber field from the same table.
What I’m doing now is executing the grouping SQL statement, processing it in code and writing back the GroupNumber to the table so that App can do its thing. What I’d like to do is execute a single SQL statement to do both the grouping and the writeback in a single step. I can’t figure out how to do this, if indeed it’s possible. Simple example:
SELECT FirstName, LastName, Age
FROM Persons
WHERE ....
GROUP BY Age
ORDER BY Age
I execute this, then do
for ( i = 1; i <= result_set.n; i++ )
    Sql = "UPDATE Persons"
        + " SET GroupNumber = " + fixed( i )
        + " WHERE Age = " + fixed( result_set.Age[i] )
I need to do this every time a record gets added to the table (so yes, if someone younger than me gets added, my group number changes - don’t ask).

Clearly you want a trigger. However, trigger definitions vary from one database server to another. I'll hazard a guess that you are using some version of Microsoft SQL Server: the CREATE TRIGGER syntax and a couple of examples can be found at http://msdn.microsoft.com/en-us/library/ms189799.aspx. There might be a small complication with the trigger modifying the same table it reads its data from, but I believe most SQL databases generally allow that (SQLite may be one of the few where it is difficult).
Try that and see if that helps.
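A minimal sketch of the trigger idea, written against Python's bundled sqlite3 module only so that it can be run standalone. The trigger name persons_regroup and the sample rows are made up for illustration, the question's WHERE filter is left out, and SQL Server's CREATE TRIGGER syntax differs from what is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (FirstName TEXT, Age INTEGER, GroupNumber INTEGER);

-- After every insert, renumber all rows: a person's group is the number of
-- distinct ages less than or equal to their own (1 = youngest group).
CREATE TRIGGER persons_regroup AFTER INSERT ON Persons
BEGIN
    UPDATE Persons
    SET GroupNumber = (SELECT COUNT(DISTINCT p2.Age)
                       FROM Persons AS p2
                       WHERE p2.Age <= Persons.Age);
END;
""")

# Hypothetical sample data; each insert fires the trigger.
for name, age in [("Ann", 40), ("Bob", 30), ("Cid", 40), ("Dee", 25)]:
    conn.execute("INSERT INTO Persons (FirstName, Age) VALUES (?, ?)", (name, age))

groups = dict(conn.execute("SELECT FirstName, GroupNumber FROM Persons"))
print(groups)
```

Renumbering the whole table on every insert is the simplest thing that matches the question; on a large table it may be worth limiting the update to rows whose number actually changes.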

I'm not really sure what you are after, here is my best guess:
;WITH AllRows AS (--get one row per age, and number them
SELECT
Age, ROW_NUMBER() OVER (ORDER BY Age) AS RowNumber
FROM Persons
WHERE ...
GROUP BY Age
)
UPDATE p --update all the people, getting their GroupNumber based on their Age's row number
SET GroupNumber=a.RowNumber
FROM Persons p
INNER JOIN AllRows a ON p.Age=a.Age
WHERE GroupNumber IS NULL OR GroupNumber!=a.RowNumber
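I use SQL Server. The CTE and ROW_NUMBER() are standard SQL, but UPDATE ... FROM is not, so other databases may need a correlated subquery in the SET clause instead.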
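For databases without UPDATE ... FROM, the same idea can be sketched with a correlated subquery. The example below uses Python's sqlite3 module purely so it runs standalone; the sample rows are invented, the question's WHERE filter is omitted, and the CTE name AgeGroups is just an illustrative choice. Note that the row number is taken across the distinct ages (ORDER BY Age with no PARTITION BY), since numbering within each age would give every group the number 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (FirstName TEXT, Age INTEGER, GroupNumber INTEGER)")
conn.executemany(
    "INSERT INTO Persons (FirstName, Age) VALUES (?, ?)",
    [("Ann", 40), ("Bob", 30), ("Cid", 40), ("Dee", 25)],  # hypothetical data
)

# One statement: number the distinct ages, then write each number back.
conn.execute("""
WITH AgeGroups AS (
    SELECT Age, ROW_NUMBER() OVER (ORDER BY Age) AS RowNumber
    FROM Persons
    GROUP BY Age
)
UPDATE Persons
SET GroupNumber = (SELECT a.RowNumber
                   FROM AgeGroups AS a
                   WHERE a.Age = Persons.Age)
""")

groups = dict(conn.execute("SELECT FirstName, GroupNumber FROM Persons"))
print(groups)
```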

Related

How to optimize SQL delete query with subselect for Firebird?

The following query is extremely slow. It seems the subselect is executed for each row in the table?!
delete from HISTORY
where ID in (
select ID from (
select ID, ROW_NUMBER() over(partition by SOURCE order by ID desc) as NUM from HISTORY
) where NUM > 100
);
This is a cleanup query. It should delete everything but the 100 most recent records per SOURCE.
The time required seems to depend only on the number of records in the table and not on how many records are to be deleted. Even with only 10,000 records it takes several minutes. However, if I only execute the sub-select, it is fast.
Of course there is a PK on ID and a FK and index on SOURCE (both are Integer columns).
Firebird 3 added a DELETE option to the MERGE statement. It was first mentioned in the Release Notes and is now properly documented in the Firebird 3 SQL Reference.
Modelled on the examples there, the cleanup query would look like this:
merge into HISTORY HDel
using ( select ID, SOURCE, ROW_NUMBER() over
(partition by SOURCE order by ID desc) as NUM
from HISTORY ) HVal
on (HVal.NUM > 100) and (HVal.ID = HDel.ID) and (HVal.Source = HDel.Source)
WHEN MATCHED THEN DELETE
In your specific database the (HVal.Source = HDel.Source) condition seems redundant, but I still decided to add it to make the query as generic as possible for future readers. Better safe than sorry :-)
Firebird 2.x did not provide that feature. Without FB3's MERGE ... DELETE and window functions, one can fall back on explicit imperative programming and write good old loops. That means writing and executing a small PSQL program (either a persistent named stored procedure or an ad hoc EXECUTE BLOCK statement) that loops explicitly over the SOURCE values.
Something like this (I did not syntax-check it, just writing from memory):
execute block as
  declare variable SRC_VAL integer;
  declare variable ID_VAL integer;
begin
  for select distinct SOURCE from HISTORY into :SRC_VAL do
  begin
    -- reset, so a SOURCE with 100 or fewer rows deletes nothing
    ID_VAL = NULL;
    -- find the ID of the 101st most recent row for this SOURCE
    select first(1) skip(100) ID from HISTORY
      where SOURCE = :SRC_VAL
      order by ID desc
      into :ID_VAL;
    if (ID_VAL is not null) then
      delete from HISTORY
        where SOURCE = :SRC_VAL
          and ID <= :ID_VAL;
  end
end
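The loop-per-SOURCE approach can be tried outside Firebird as well. The sketch below is not Firebird code: it uses Python's sqlite3 module with LIMIT/OFFSET standing in for FIRST/SKIP, keeps only 2 rows per SOURCE instead of 100 so the sample stays small, and the sample rows are invented.

```python
import sqlite3

KEEP = 2  # the question keeps 100; 2 keeps the demo small

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HISTORY (ID INTEGER PRIMARY KEY, SOURCE INTEGER)")
conn.executemany(
    "INSERT INTO HISTORY (ID, SOURCE) VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2), (7, 3)],  # hypothetical data
)

# Loop over the distinct SOURCE values, as the PSQL block does.
for (src,) in conn.execute("SELECT DISTINCT SOURCE FROM HISTORY").fetchall():
    # ID of the first row past the KEEP most recent ones, if there is one.
    row = conn.execute(
        "SELECT ID FROM HISTORY WHERE SOURCE = ? ORDER BY ID DESC LIMIT 1 OFFSET ?",
        (src, KEEP),
    ).fetchone()
    if row is not None:
        # Everything at or below that ID is older than the rows we keep.
        conn.execute(
            "DELETE FROM HISTORY WHERE SOURCE = ? AND ID <= ?", (src, row[0])
        )

remaining = conn.execute("SELECT ID, SOURCE FROM HISTORY ORDER BY ID").fetchall()
print(remaining)
```

Each iteration touches only one SOURCE through its index, so the cost should depend mostly on how many rows are actually deleted.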

Selecting the biggest ZIP code from a column

I want to get the biggest ZIP code in DB. Normally I do this
SELECT *
FROM (
Select * From tbuser ORDER BY zip DESC
)
WHERE rownum = 1
With this code I can get the biggest zip code value without a duplicate row (since zip code is not a primary key).
But our main company in Japan said that I can't use it, because when the connection is slow or the DB holds a very large amount of data, it may not return the right row. It would be a great help if someone could suggest an alternative.
I want to get the biggest ZIP code in DB.
If you really only want the zip code, try that:
SELECT MAX(zip) FROM TBUSER;
This will use the index on the zip column (if it exists).
That being said, Oracle is usually smart enough to properly optimize a sub-query selection using ROWNUM. Maybe your main company is more concerned about the possible "full table" `ORDER BY` in the subquery? OTOH, if the issue really is a "slow network", it may be worth taking some time with your DBA to look at the wire with a network analyzer or some other tool, to see whether your approach really leads to "excessive bandwidth consumption". I sincerely doubt it...
If you want to retrieve the whole row having the maximum zip code, here is a slight variation on another answer (in my opinion, this is one of the rare cases for using a NATURAL JOIN):
select * from t
natural join (select max(zip) zip from t);
Of course, in case of duplicates, this will return multiple rows. You will have to combine that with one of the several options posted in the various other answers to return only 1 row.
As an extra solution, since you are not allowed to use ROWNUM (and assuming row_number is arbitrarily forbidden too), you can achieve the desired result using something as contrived as:
select * from t
where rowid = (
select min(t.rowid) rid from t
natural join (select max(zip) zip from t)
);
See http://sqlfiddle.com/#!4/3bd63/5
But honestly, there isn't any serious reason to hope that such query will perform better than the simple ... ORDER BY something DESC) WHERE rownum <= 1 query.
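To see the two variants side by side (just the maximum, and exactly one full row carrying it), here is a small sketch using Python's sqlite3 module. SQLite's rowid stands in for Oracle's ROWID, and the name column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbuser (name TEXT, zip INTEGER)")
conn.executemany(
    "INSERT INTO tbuser (name, zip) VALUES (?, ?)",
    [("a", 10001), ("b", 99950), ("c", 60601), ("d", 99950)],  # hypothetical data
)

# Only the value: a plain aggregate, no sorting of whole rows.
max_zip = conn.execute("SELECT MAX(zip) FROM tbuser").fetchone()[0]

# One whole row: among the rows sharing the maximum zip, keep the lowest rowid.
one_row = conn.execute("""
SELECT name, zip
FROM tbuser
WHERE rowid = (SELECT MIN(rowid)
               FROM tbuser
               WHERE zip = (SELECT MAX(zip) FROM tbuser))
""").fetchall()

print(max_zip, one_row)
```

Picking the lowest rowid makes the tie-break deterministic for a given table state, which "rownum = 1" alone does not promise.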
This sounds to me like bad advice (masquerading as a rule) from a newbie database administrator who doesn't understand what he's looking at. That insight isn't going to help you, though. Rarely does a conversation starting with "you're an obstructionist incompetent" achieve anything.
So, here's the thing. First of all, you need to make sure there's an index on your zip column. It doesn't have to be a primary key.
Second, you can try explaining that Oracle's table servers do, in fact, optimize the ... ORDER BY something DESC) WHERE rownum <= 1 style of query. Their servers do a good job of that. Your use case is very common.
But if that doesn't work on your DBA, try saying "I heard you" and do this.
SELECT * FROM (
  SELECT a.*
  FROM ( SELECT MAX(zip) zip FROM tbuser ) b
  JOIN tbuser a ON (a.zip = b.zip)
) WHERE rownum <= 1
This will get one row with the highest numbered zip value without the ORDER BY that your DBA mistakenly believes is messing up his server's RAM pool. And, it's reasonably efficient. As long as zip has an index.
As you are looking for a way to get the desired record without rownum now, ...
... here is how to do it from Oracle 12c onward:
select *
from tbuser
order by zip desc fetch first 1 row only;
... and here is how to do it before Oracle 12c:
select *
from (select tbuser.*, row_number() over(order by zip desc) as rn from tbuser)
where rn = 1;
EDIT: As Sylvain Leroux pointed out, it is more work for the dbms to sort all records rather than just find the maximum. Here is a max query without rownum:
select *
from tbuser where rowid =
(select max(rowid) keep (dense_rank last order by zip) from tbuser);
But as Sylvain Leroux also mentioned, it also makes a difference whether there is an index on the column. Some tests I did show that, with an index on the column, the analytic functions are slower than the traditional functions. Your original query would just go into the index, find the highest value, pick the record and then stop. You won't get any faster than that. My last query, while quite fast on a non-indexed column, is slower than yours on an indexed column.
Your requirements seem arbitrary, but this should give you the result you've requested.
SELECT *
FROM (SELECT * FROM tbuser
WHERE zip = (SELECT MAX(zip) FROM tbuser))
WHERE rownum = 1
OK - try something like this:
SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER);
Fetch a single row from a cursor based on the above statement, then close the cursor. If you're using PL/SQL you could do it like this:
FOR aRow IN (SELECT *
FROM TBUSER
WHERE ZIP = (SELECT MAX(ZIP) FROM TBUSER))
LOOP
-- Do something with aRow
-- then force an exit from the loop
EXIT;
END LOOP;
Share and enjoy.
I'm surprised that nobody has posted this answer yet. I think this is the way you should do something like that.
SELECT *
FROM (
Select a.*, max(zip) over () max_zip
From tbuser a
)
WHERE zip=max_zip
and rownum = 1
Your query gets exactly one random row of all records having the max zip code. So it cannot be the problem that you retrieve a record with another zip code or more than one record or zero records (as long as there is at least one record in the table).
Maybe Japan simply expects one of the other rows with that zip code? Then you may just have to add another order criteria to get that particular desired row.
Another thought: As they are talking about slow connection speed, it may also be that they enter a new max zip code on one session, query with another and get the old max zip, because the insert statement of the other session hasn't gone through yet. But well, that's just the way this works of course.
BTW: A strange thing to select a maximum zip code. I guess that's just an example to illustrate the problem?
If you are getting multiple records from the MAX function (which should not be possible; I can't tell how it happens in your case unless you post a screenshot), then you can use DISTINCT in your SQL query to get a single record:
SELECT DISTINCT MAX(zipcode) FROM TableUSER

SQL averaging update query

I got an update query that I have to do and I'm struggling with it.
I have 3 columns, ID, Income, AverageIncome.
ID: a string, but the rows are ordered alphabetically by it.
AverageIncome: the average of the previous 10 Income entries.
All the values of the AverageIncome are incorrect and I need to update them to be correct.
Any Tip?
Thanks!
In MySQL syntax, and assuming that the order is defined by the id column:
CREATE TEMPORARY TABLE my_table_copy AS SELECT * FROM my_table;
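UPDATE my_table t
SET average_income = (SELECT AVG(prev.income)
    -- LIMIT applies after aggregation, so pick the 10 rows first, then average
    FROM (SELECT tc.income
          FROM my_table_copy tc
          WHERE tc.id < t.id
          ORDER BY tc.id DESC
          LIMIT 10
         ) prev -- a derived table referencing t.id needs MySQL 8.0.14 or later
);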
DROP TABLE my_table_copy;
Of course, you will have to make sure that the CREATE TABLE and the UPDATE execute atomically, i.e. without any modification of the data between one and the other.
Also keep in mind that this is not a very good design, as other users already pointed out. You will have redundancy in your data. You might be better off with a view in this case.
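As a rough illustration of the view idea, the sketch below computes the average of the previous 10 incomes with a window frame instead of storing it. It uses Python's sqlite3 module so it runs standalone; the view name my_table_with_avg and the generated sample rows are assumptions, and window functions are only available in MySQL from 8.0 on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT PRIMARY KEY, income REAL)")
# Hypothetical data: ids id01..id12 with incomes 1.0..12.0.
conn.executemany(
    "INSERT INTO my_table (id, income) VALUES (?, ?)",
    [(f"id{n:02d}", float(n)) for n in range(1, 13)],
)

# The frame covers the 10 rows before the current one, excluding the current row.
conn.execute("""
CREATE VIEW my_table_with_avg AS
SELECT id,
       income,
       AVG(income) OVER (
           ORDER BY id
           ROWS BETWEEN 10 PRECEDING AND 1 PRECEDING
       ) AS average_income
FROM my_table
""")

averages = dict(conn.execute("SELECT id, average_income FROM my_table_with_avg"))
print(averages["id01"], averages["id11"], averages["id12"])
```

The first row has no predecessors, so its average comes back as NULL (None in Python); rows with fewer than 10 predecessors average whatever is available.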

How to find column information for an aggregate grouping

I have a complicated query written on SQL Server 2000 which in part contains a join onto a derived table. This table is unfortunately not returning exactly how I desired as the underlying data differs to what I expected. Say the data are like this:
USERID,OS,DATEINSTALLED
237,win,01/01/1980
237,dos,01/02/1978
237,lin,08/08/2002
132,win,03/03/1982
132,dos,03/07/2002
132,mac,03/07/2002
Then my query looked as so:
SELECT USERID, DATEINSTALLED = max(DATEINSTALLED)
FROM INSTALLATIONS
GROUP BY USERID
Which would give a result set of
237,08/08/2002
132,03/07/2002
But what I require is a result set of:
237,lin,08/08/2002
132,dos,03/07/2002
OR
237,lin,08/08/2002
132,mac,03/07/2002
I'm not really fussed if it picks up mac or dos but it must not give 3 rows; as what I need is one row per userid, with a max date and "a" valid OS for that combination. So mac or dos are valid, but win is not (for user 132).
As it's a derived table as part of a more complicated query I need to keep it as clean and simple as possible due to execution time (source table is a few hundred thousand rows in size). As implied by the tags I'm limited to SQL Server 2000.
Try this:
-- MAX(OS) picks one arbitrary but valid OS when several share the latest date
SELECT I.USERID, OS = MAX(I.OS), I.DATEINSTALLED
FROM INSTALLATIONS AS I
JOIN (
    SELECT USERID, DATEINSTALLED = max(DATEINSTALLED)
    FROM INSTALLATIONS
    GROUP BY USERID
) AS T
ON I.USERID = T.USERID AND I.DATEINSTALLED = T.DATEINSTALLED
GROUP BY I.USERID, I.DATEINSTALLED
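A runnable check of the join-back-to-the-maximum technique on the question's sample data, using Python's sqlite3 module. This variant qualifies every column and collapses ties with MAX(OS), so user 132 yields a single row. The dates are rewritten as ISO strings so they compare correctly as text, which assumes the originals are day/month/year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE INSTALLATIONS (USERID INTEGER, OS TEXT, DATEINSTALLED TEXT)"
)
conn.executemany(
    "INSERT INTO INSTALLATIONS VALUES (?, ?, ?)",
    [
        (237, "win", "1980-01-01"),
        (237, "dos", "1978-02-01"),
        (237, "lin", "2002-08-08"),
        (132, "win", "1982-03-03"),
        (132, "dos", "2002-07-03"),
        (132, "mac", "2002-07-03"),
    ],
)

# Join each user's rows to that user's latest date, then keep one OS per user.
result = conn.execute("""
SELECT I.USERID, MAX(I.OS) AS OS, I.DATEINSTALLED
FROM INSTALLATIONS AS I
JOIN (
    SELECT USERID, MAX(DATEINSTALLED) AS DATEINSTALLED
    FROM INSTALLATIONS
    GROUP BY USERID
) AS T
  ON I.USERID = T.USERID AND I.DATEINSTALLED = T.DATEINSTALLED
GROUP BY I.USERID, I.DATEINSTALLED
ORDER BY I.USERID
""").fetchall()

print(result)
```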

SQL select groups of distinct items in prepared statement?

I have a batch job that I run on a table which I'm sure I could write as a prepared statement. Currently it's all in Java and no doubt less efficient than it could be. For a table like so:
CREATE TABLE thing (
`tag` varchar,
`document` varchar,
`weight` float,
)
I want to create a new table that contains the top N entries for every tag. Currently I do this:
create new table with same schema
select distinct tag
for each tag:
select * limit N insert into the new table
This requires executing a query to get the distinct tags, then selecting the top N items for that tag and inserting them... all very inefficient.
Is there a stored procedure (or even a simple query) that I could use to do this? If dialect is important, I'm using MySQL.
(And yes, I do have my indexes sorted!)
Cheers
Joe
I haven't done this in a while (spoiled by CTE's in SQL Server), and I'm assuming that your data is ordered by weight; try
SELECT tag, document, weight
FROM thing
WHERE (SELECT COUNT(*)
FROM thing as t
WHERE t.tag = thing.tag AND t.weight > thing.weight -- rows of the same tag that outrank this one
) < N;
I think that will do it.
EDIT: corrected error in code; need < N, not <= N.
If you were using SQL Server, I would suggest using the ROW_NUMBER function, partitioned by tag, and selecting where row_number <= N. (In other words, order and number the rows for each tag according to their position within the tag group, then pick the top N rows from each group.) I found an article about simulating the ROW_NUMBER function in MySQL here:
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
See if this helps you out!
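For reference, the ROW_NUMBER approach can be sketched as a single INSERT ... SELECT. The example runs on Python's sqlite3 module; MySQL has supported window functions natively since 8.0. The target table name top_thing, the sample rows, and the reading of "top" as highest weight are all assumptions.

```python
import sqlite3

N = 2  # how many entries to keep per tag

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thing (tag TEXT, document TEXT, weight REAL)")
conn.execute("CREATE TABLE top_thing (tag TEXT, document TEXT, weight REAL)")
conn.executemany(
    "INSERT INTO thing VALUES (?, ?, ?)",
    [
        ("x", "d1", 0.9), ("x", "d2", 0.5), ("x", "d3", 0.7),
        ("y", "d4", 0.1), ("y", "d5", 0.4),
        ("z", "d6", 0.3),
    ],  # hypothetical data
)

# Number the rows inside each tag by descending weight, keep the first N.
conn.execute("""
INSERT INTO top_thing (tag, document, weight)
SELECT tag, document, weight
FROM (
    SELECT tag, document, weight,
           ROW_NUMBER() OVER (PARTITION BY tag ORDER BY weight DESC) AS rn
    FROM thing
)
WHERE rn <= ?
""", (N,))

top = conn.execute(
    "SELECT tag, document FROM top_thing ORDER BY tag, weight DESC"
).fetchall()
print(top)
```

This replaces the select-distinct-then-loop batch job with one statement, so the database can plan the whole operation at once.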