Generate rownum in a SELECT statement without using DB-specific functions - SQL

I want to get row numbers in a SQL SELECT statement, but it shouldn't be a DB-specific query; for example, I can't use Oracle's ROWNUM. Please let me know how I can achieve this.
My table has the columns pid, emplid, and desc, and the combination of pid and emplid is the primary key. Please suggest a query for this use case.
Thanks,
Shyam

The ROW_NUMBER() function is supported by most of the major RDBMSs, but MySQL only added it in version 8.0, so it really depends on how database-agnostic you need to be. If you want it truly agnostic, it might be best to move the numbering out of the database layer.
EDIT: valex's method of calculating the row number is probably a better option than moving it out of the DB.

To do it, your table has to have a unique ID-like field: anything that distinguishes one row from another. If it does, then:
select t1.*,
(select count(*) from t as t2 where t2.id <= t1.id) as row_num
from t as t1 order by t1.id
(The alias is row_num rather than row_number because ROW_NUMBER is a reserved word in MySQL 8.0.)
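A quick way to sanity-check this approach is to run it against an in-memory SQLite database from Python's standard sqlite3 module. This is a minimal sketch. The table name t and its rows are made up for illustration. The correlated subquery is evaluated once per output row, so on large tables the cost can grow roughly quadratically.

```python
import sqlite3

# In-memory stand-in for the table "t"; the ids and names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t (id, name) VALUES (?, ?)",
                 [(40, "c"), (10, "a"), (25, "b")])

# Portable row numbering: for each row, count the rows whose id is <= its id.
# Only standard SQL is used (no ROWNUM, no ROW_NUMBER()).
rows = conn.execute("""
    SELECT t1.id,
           (SELECT COUNT(*) FROM t AS t2 WHERE t2.id <= t1.id) AS row_num
    FROM t AS t1
    ORDER BY t1.id
""").fetchall()

for row_id, row_num in rows:
    print(row_id, row_num)
```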
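UPD: if you need two columns to define the order, the subquery has to compare them lexicographically (first by id1, then by id2 within the same id1):
select t1.*,
(select count(*) from t as t2
where t2.id1 < t1.id1 or (t2.id1 = t1.id1 and t2.id2 <= t1.id2))
as row_num
from t as t1 order by t1.id1, t1.id2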
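With a composite key such as the question's (pid, emplid), the subquery has to compare the two columns lexicographically. Simply requiring both columns to be <= gives wrong numbers as soon as the second column goes down while the first goes up. The sketch below uses Python's sqlite3 and hypothetical sample rows, and contrasts the lexicographic condition with the naive "both columns <=" one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "desc" is a reserved word, so it has to be quoted as a column name.
conn.execute('CREATE TABLE emp (pid INTEGER, emplid INTEGER, "desc" TEXT, '
             'PRIMARY KEY (pid, emplid))')
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(1, 5, "x"), (1, 7, "y"), (2, 1, "z"), (2, 3, "w")])

# Lexicographic comparison: rows with a smaller pid, or the same pid and a
# smaller-or-equal emplid, come at or before the current row.
lexicographic = conn.execute("""
    SELECT t1.pid, t1.emplid,
           (SELECT COUNT(*) FROM emp AS t2
             WHERE t2.pid < t1.pid
                OR (t2.pid = t1.pid AND t2.emplid <= t1.emplid)) AS row_num
    FROM emp AS t1
    ORDER BY t1.pid, t1.emplid
""").fetchall()

# Naive condition (both columns <=): undercounts for (2, 1) and (2, 3).
naive = [r[0] for r in conn.execute("""
    SELECT (SELECT COUNT(*) FROM emp AS t2
             WHERE t2.pid <= t1.pid AND t2.emplid <= t1.emplid)
    FROM emp AS t1
    ORDER BY t1.pid, t1.emplid
""")]

print(lexicographic)  # [(1, 5, 1), (1, 7, 2), (2, 1, 3), (2, 3, 4)]
print(naive)          # [1, 2, 1, 2]
```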

Related

SQL simple GROUP BY query

Is there a way to write a simple GROUP BY query in SQL Server without using COUNT, AVG, or SUM? I want to show all columns and group them by a single column.
SELECT * FROM [SPC].[dbo].[BoardSFC] GROUP BY boardsn
The query above works in MySQL but not in SQL Server. Is there a way to achieve this? Any suggestion would be great.
UPDATE: I just need to group my data by boardsn and get the rows where imulti equals 1.
I think you understand 'grouping data' differently from how it is implemented in SQL Server. You simply want rows that have the same value to appear together in the result, and that is ordering, not grouping. So maybe what you need is:
SELECT *
FROM [SPC].[dbo].[BoardSFC]
WHERE imulti = 1
ORDER BY boardsn
Is there a way to make it work in SQL Server? No, there is not. MySQL only lets you do this because it violates the SQL standards quite egregiously. Since MySQL 5.7, even MySQL rejects it by default (the ONLY_FULL_GROUP_BY SQL mode) unless the other columns are functionally dependent on the grouped column.
Whenever you use GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. SELECT * is only provided as a convenience when working with data interactively; in production code you should never use SELECT *.
You could use TOP 1 WITH TIES (SQL Server syntax) combined with ORDER BY ROW_NUMBER():
SELECT TOP 1 WITH TIES *
FROM [SPC].[dbo].[BoardSFC]
ORDER BY ROW_NUMBER() OVER (PARTITION BY boardsn ORDER BY imulti)
Or, more explicitly, use ROW_NUMBER() in a subquery:
SELECT *
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY boardsn ORDER BY imulti) as RN
FROM [SPC].[dbo].[BoardSFC]
) q
where RN = 1
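TOP 1 WITH TIES is SQL Server syntax, but the subquery form works in any database with window functions. As a minimal sketch, here it is run against SQLite (3.25 or later, for window functions) via Python's sqlite3. The station column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BoardSFC (boardsn TEXT, imulti INTEGER, station TEXT)")
conn.executemany("INSERT INTO BoardSFC VALUES (?, ?, ?)", [
    ("B1", 2, "s2"), ("B1", 1, "s1"), ("B2", 1, "s3"), ("B2", 3, "s4"),
])

# Number the rows inside each boardsn by ascending imulti, then keep
# only the first row of every group (all columns are preserved).
rows = conn.execute("""
    SELECT boardsn, imulti, station
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY boardsn ORDER BY imulti) AS rn
        FROM BoardSFC
    ) AS q
    WHERE rn = 1
    ORDER BY boardsn
""").fetchall()

print(rows)  # [('B1', 1, 's1'), ('B2', 1, 's3')]
```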

Re-indexing a column with either SQL or PL/SQL

I have several tables that use an ID number plus a column called xsequence, which together form the primary key. Currently, I have a bunch of data that looks like this:
ID_NUMBER,XSEQUENCE
001,2
001,5
001,8
002,1
002,6
What I need to end up with is:
ID_NUMBER,XSEQUENCE
001,1
001,2
001,3
002,1
002,2
What is the best way to go about this? Every time I try, I just end up spinning my wheels.
Try something like this:
select id_number,
row_number() over (partition by id_number order by xsequence) new_xsequence
from yourtable
ROW_NUMBER() is an analytic function that is really handy for this sort of thing. The PARTITION BY clause "resets" the counter at each new id_number (so 1, 2, 3 ... then it starts again at 1, 2, 3 ... etc.).
(PARTITION BY in analytic functions behaves very similarly to GROUP BY, except that it does not collapse the rows.)
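Here is a minimal sketch that runs the ROW_NUMBER() query on the sample data from the question. It uses SQLite 3.25+ through Python's sqlite3, and yourtable is the placeholder name from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id_number TEXT, xsequence INTEGER, "
             "PRIMARY KEY (id_number, xsequence))")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                 [("001", 2), ("001", 5), ("001", 8), ("002", 1), ("002", 6)])

# The counter restarts at 1 for every id_number and follows the old order.
rows = conn.execute("""
    SELECT id_number,
           ROW_NUMBER() OVER (PARTITION BY id_number ORDER BY xsequence) AS new_xsequence
    FROM yourtable
    ORDER BY id_number, new_xsequence
""").fetchall()

print(rows)  # [('001', 1), ('001', 2), ('001', 3), ('002', 1), ('002', 2)]
```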
[edit]
To UPDATE the original table, I actually prefer the MERGE statement; it's a bit simpler syntax-wise and seems a bit more intuitive ;)
MERGE INTO yourtable base
USING (
select rowid rid,
id_number,
row_number() over (partition by id_number order by xsequence) new_xsequence,
xsequence old_xsequence
from yourtable
) new
ON ( base.rowid = new.rid )
WHEN MATCHED THEN UPDATE
SET base.xsequence = new.new_xsequence
[edit]

SQL query to get single row value from an aggregate

I have an Oracle table with two columns, ID and START_DATE. I want to run a query to get the ID of the record with the most recent date. Initially I wrote this:
select id from (select * from mytable order by start_date desc) where rownum = 1
Is there a cleaner and more efficient way of doing this? I often run into this pattern in SQL and end up creating a nested query.
SELECT id FROM mytable WHERE start_date = (SELECT MAX(start_date) FROM mytable)
Still a nested query, but more straightforward and also, in my experience, more standard.
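Unlike the rownum = 1 version, the MAX() subquery returns every ID that shares the latest date. Here is a small sketch with Python's sqlite3 and made-up rows. The dates are stored as ISO-8601 strings, which compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, start_date TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?)", [
    (1, "2020-01-05"), (2, "2021-03-01"), (3, "2021-03-01"), (4, "2019-12-31"),
])

# Both rows dated 2021-03-01 match the maximum, so both ids come back.
ids = [row[0] for row in conn.execute("""
    SELECT id FROM mytable
    WHERE start_date = (SELECT MAX(start_date) FROM mytable)
    ORDER BY id
""")]

print(ids)  # [2, 3]
```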
This looks like a pretty clean and efficient solution to me. I don't think you can do much better, assuming of course that you have an index on start_date. If you want all IDs for the latest start date, then froadie's solution is better.

How do I calculate a moving average using MySQL?

I need to do something like:
SELECT value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
In addition to value_column1, I also need to retrieve a moving average of the previous 20 values of value_column1.
Standard SQL is preferred, but I will use MySQL extensions if necessary.
This is just off the top of my head, and I'm on the way out the door, so it's untested. I also can't imagine that it would perform very well on any kind of large data set. I did confirm that it at least runs without an error though. :)
SELECT
value_column1,
(
SELECT
AVG(T2.value_column1)
FROM
Table1 T2
WHERE
(
SELECT
COUNT(*)
FROM
Table1 T3
WHERE
T3.datetime_column1 BETWEEN T2.datetime_column1 AND T1.datetime_column1
) BETWEEN 1 AND 20
) AS moving_average
FROM
Table1 T1
Tom H's approach will work. If you have an identity column with no gaps, you can simplify it like this:
SELECT T1.id, T1.value_column1, avg(T2.value_column1)
FROM table1 T1
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id-19 AND T1.Id
GROUP BY T1.id, T1.value_column1
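Here is a minimal sketch of the self-join version in SQLite via Python's sqlite3. It uses a window of 3 instead of 20 to keep the data small, with hypothetical values, and it assumes contiguous ids. Note the GROUP BY, which the aggregate needs.

```python
import sqlite3

WINDOW = 3  # stands in for the question's 20

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, value_column1 REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)])

# Each row T1 joins to itself and the WINDOW - 1 rows before it (by id);
# this only works if the ids have no gaps.
rows = conn.execute("""
    SELECT T1.id, AVG(T2.value_column1) AS moving_average
    FROM table1 T1
    INNER JOIN table1 T2 ON T2.id BETWEEN T1.id - ? AND T1.id
    GROUP BY T1.id
    ORDER BY T1.id
""", (WINDOW - 1,)).fetchall()

print(rows)  # [(1, 10.0), (2, 15.0), (3, 20.0), (4, 30.0), (5, 40.0)]
```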
I realize that this answer is about 7 years too late. I had a similar requirement and thought I'd share my solution in case it's useful to someone else.
There are some MySQL extensions for technical analysis that include a simple moving average. They're really easy to install and use: https://github.com/mysqludf/lib_mysqludf_ta#readme
Once you've installed the UDF (per instructions in the README), you can include a simple moving average in a select statement like this:
SELECT TA_SMA(value_column1, 20) AS sma_20 FROM table1 ORDER BY datetime_column1
When I had a similar problem, I ended up using temp tables for a variety of reasons, but it made this a lot easier! What I did looks very similar to what you're doing, as far as the schema goes.
Make the schema something like (ID identity, start_date, end_date, value). When you select, use a subselect to average the previous 20 rows based on the identity ID.
Only do this if you find yourself already using temp tables for other reasons though (I hit the same rows over and over for different metrics, so it was helpful to have the small dataset).
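My solution adds a row number to the table using MySQL user variables. Note that MySQL leaves the evaluation order of user-variable assignments undefined relative to ORDER BY, and that setting variables inside expressions is deprecated in MySQL 8.0. The following example code may help:
SET @MA_period = 5;
select id1, tmp1.date_time, tmp1.c, avg(tmp2.c) from
(select @b:=@b+1 as id1, date_time, c from websource.EURUSD, (select @b:=0) bb order by date_time asc) tmp1,
(select @a:=@a+1 as id2, date_time, c from websource.EURUSD, (select @a:=0) aa order by date_time asc) tmp2
where id1 >= @MA_period and id1 >= id2 and id2 > (id1 - @MA_period)
group by id1, tmp1.date_time, tmp1.c
order by id1 asc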
In my experience, MySQL as of 5.5.x tends not to use indexes on dependent selects, whether a subquery or a join. This can have a very significant impact on performance when the dependent select criteria change on every row.
A moving average is an example of a query that falls into this category. Execution time may grow with the square of the number of rows. To avoid this, choose a database engine that can perform indexed lookups on dependent selects. I find PostgreSQL works effectively for this problem.
In MySQL 8, a window function with a frame can be used to compute the averages:
SELECT value_column1, AVG(value_column1) OVER (ORDER BY datetime_column1 ROWS 19 PRECEDING) as ma
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
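This calculates the average of the current row and the 19 preceding rows. Note that the WHERE clause is applied before the window function, so the first 19 averages on or after 2009-01-01 are taken over fewer than 20 rows.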
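The same frame syntax also works in SQLite 3.25+. Here is a minimal sketch via Python's sqlite3 with a 3-row window and hypothetical data. It spells out the frame as ROWS BETWEEN ... AND CURRENT ROW, which is equivalent to the shorthand above. The pre-2009 row shows that filtered-out rows never enter the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (datetime_column1 TEXT, value_column1 REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [
    ("2008-12-31 23:00:00", 1000),  # removed by WHERE before averaging
    ("2009-01-01 00:00:00", 10),
    ("2009-01-02 00:00:00", 20),
    ("2009-01-03 00:00:00", 30),
    ("2009-01-04 00:00:00", 40),
])

# 2 PRECEDING + current row = a 3-row window (the answer uses 19 for 20 rows).
rows = conn.execute("""
    SELECT datetime_column1,
           AVG(value_column1) OVER (
               ORDER BY datetime_column1
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS ma
    FROM table1
    WHERE datetime_column1 >= '2009-01-01 00:00:00'
    ORDER BY datetime_column1
""").fetchall()

for ts, ma in rows:
    print(ts, ma)  # averages: 10.0, 15.0, 20.0, 30.0
```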

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
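In current SQLite, both forms work. Here is a quick check using Python's sqlite3 with a hypothetical foo table (the payload column and the rows are made up).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, timestamp INTEGER, payload TEXT)")
conn.executemany("INSERT INTO foo VALUES (?, ?, ?)",
                 [(1, 100, "a"), (2, 300, "b"), (3, 200, "c")])

# Scalar-subquery comparison and IN-list membership give the same row here.
with_eq = conn.execute(
    "SELECT * FROM foo WHERE timestamp = (SELECT MAX(timestamp) FROM foo)").fetchall()
with_in = conn.execute(
    "SELECT * FROM foo WHERE timestamp IN (SELECT MAX(timestamp) FROM foo)").fetchall()

print(with_eq)  # [(2, 300, 'b')]
print(with_in)  # [(2, 300, 'b')]
```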
There are many ways to skin a cat.
If you have an identity column with auto-increment functionality, and the IDs increase in the same order as the timestamps, returning the last record by ID is faster because that column is indexed (unless, of course, you put an index on the timestamp column).
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
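Here is a minimal sketch of this anti-join with Python's sqlite3. The table is named events because table is a reserved word, and the rows are hypothetical. Adding an equality on a grouping column to the join condition turns it into "last row per customer".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, customer TEXT, timestamp INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, "x", 100), (2, "x", 300), (3, "y", 200), (4, "y", 150)])

# Keep T1 rows for which no later row exists (T2 stays NULL).
latest = conn.execute("""
    SELECT T1.*
    FROM events T1
    LEFT OUTER JOIN events T2 ON T2.timestamp > T1.timestamp
    WHERE T2.timestamp IS NULL
""").fetchall()

# Same idea per customer: only later rows of the same customer count.
latest_per_customer = conn.execute("""
    SELECT T1.*
    FROM events T1
    LEFT OUTER JOIN events T2
      ON T2.customer = T1.customer AND T2.timestamp > T1.timestamp
    WHERE T2.timestamp IS NULL
    ORDER BY T1.id
""").fetchall()

print(latest)               # [(2, 'x', 300)]
print(latest_per_customer)  # [(2, 'x', 300), (3, 'y', 200)]
```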
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It usually works better (for SQL Server, at least) when you want the last row for each group, for example the last row for each customer.
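You can simply do:
SELECT *, max(timestamp) FROM table
Edit:
This relies on SQLite-specific behavior. Since SQLite 3.7.11, when a query contains a single max() (or min()) aggregate, the bare columns take their values from the row that holds the maximum. Most other databases reject this query, so the portable option is what SquareCog suggested: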
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)
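For comparison, SQLite (3.7.11 and later) documents a special case for bare columns alongside a single max() or min(). The minimal sketch below uses Python's sqlite3 and a hypothetical events table. It shows the SQLite-only form and the portable subquery returning the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, timestamp INTEGER, payload TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?)",
                 [(1, 100, "a"), (2, 300, "b"), (3, 200, "c")])

# SQLite-specific: with a single MAX() aggregate, the bare columns (*)
# are taken from the row that holds the maximum timestamp.
bare = conn.execute("SELECT *, MAX(timestamp) FROM events").fetchone()

# Portable: compare against a scalar subquery.
portable = conn.execute(
    "SELECT * FROM events WHERE timestamp = (SELECT MAX(timestamp) FROM events)"
).fetchone()

print(bare)      # (2, 300, 'b', 300)
print(portable)  # (2, 300, 'b')
```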