SQL query to get single row value from an aggregate

I have an Oracle table with two columns, ID and START_DATE. I want to run a query to get the ID of the record with the most recent date. Initially I wrote this:
select id from (select * from mytable order by start_date desc) where rownum = 1
Is there a cleaner and more efficient way of doing this? I often run into this pattern in SQL and end up creating a nested query.

SELECT id FROM mytable WHERE start_date = (SELECT MAX(start_date) FROM mytable)
Still a nested query, but more straightforward and also, in my experience, more standard.

This looks like a pretty clean and efficient solution to me - I don't think you can get any better than that, assuming of course that you have an index on start_date. If you want all ids for the latest start date then froadie's solution is better.
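If you happen to be on Oracle 12c or later (the question doesn't say which version), the row-limiting clause is another way to write this without nesting; just a sketch of an alternative, not a claim that it performs any better:
-- Oracle 12c+ only; returns a single id even if several rows share the latest start_date
SELECT id FROM mytable ORDER BY start_date DESC FETCH FIRST 1 ROW ONLY;
Use FETCH FIRST 1 ROW WITH TIES instead if you want all ids for the latest date.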

Related

Remove case insensitive duplicates in sql (postgres)

I have a PostgreSQL database, and I'm trying to delete (or even just get the ids of) the older rows among the duplicates in my table, but only those that are duplicates because of case sensitivity, for example helLo and hello.
The table is quite large and my nested query takes a really long time. I wonder if there is a better, more efficient way to do this in one go, rather than splitting it up into multiple queries, because there are a lot of ids in question:
SELECT * FROM some_table AS t_out
WHERE (SELECT count(*) FROM some_table AS t_in
       WHERE t_out.text != t_in.text
       AND LOWER(t_in.text) = LOWER(t_out.text)
       AND t_in.created_at > t_out.created_at) > 1
Thanks!
Can you try
SELECT LOWER(text), ROW_NUMBER() OVER( PARTITION by LOWER(text) ORDER by created_at ) as rn
FROM some_table
You can then use the rn column as a filter
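For example (untested, and assuming the table has an id primary key, as your question suggests), you can wrap that in a subquery and delete everything except the newest row of each case-insensitive group:
-- rn = 1 is the newest row per LOWER(text) group because of the DESC ordering
DELETE FROM some_table
WHERE id IN (
    SELECT id FROM (
        SELECT id,
               ROW_NUMBER() OVER (PARTITION BY LOWER(text)
                                  ORDER BY created_at DESC) AS rn
        FROM some_table
    ) ranked
    WHERE rn > 1
);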
To help this query, create an expression index on LOWER(text). Include created_at in the index to help the date comparisons.
CREATE INDEX text_lower ON some_table(LOWER(text), created_at);
It's hard to test this without your data, though.
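To see whether the planner actually uses it on your data (plans depend heavily on table statistics and row counts), you can run the window query through EXPLAIN:
-- ideally you see an index scan on text_lower feeding the window step instead of a Sort node
EXPLAIN ANALYZE
SELECT LOWER(text),
       ROW_NUMBER() OVER (PARTITION BY LOWER(text) ORDER BY created_at) AS rn
FROM some_table;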

generate rownum in select statement without using db specific functions

I want to get row numbers in a SQL select statement, but it shouldn't be a DB-specific query - for example, I can't use Oracle's rownum. Please let me know how I can achieve this.
My table structure is as follows: pid, emplid, and desc as columns, and the combination of pid and emplid will be used as the primary key. So please suggest a query for this use case.
Thanks,
Shyam
The row_number() function is supported by most of the major RDBMSs, but I don't believe it's in MySQL, so it really depends on how agnostic you want it to be. It might be best to move it out of the database layer if you want it truly agnostic.
EDIT: valex's method of calculating the row number is probably a better option than moving it out of the DB.
To do it, your table has to have a unique Id-like field - anything to distinguish one row from another. If it does, then:
select t1.*,
(select count(id) from t as t2 where t2.id<=t1.id) as row_number
from t as t1 order by Id
Update: if you have two columns that define the ordering, the comparison has to be lexicographic (compare the first column, and use the second one only to break ties), so it will look like:
select t1.*,
       (select count(*) from t as t2
        where t2.id1 < t1.id1
           or (t2.id1 = t1.id1 and t2.id2 <= t1.id2)) as row_number
from t as t1 order by id1, id2
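Applied to the table from the question (the table name here is just a placeholder, since it wasn't given), that pattern becomes:
-- pid is compared first; emplid only breaks ties within the same pid
select t1.*,
       (select count(*) from mytable as t2
        where t2.pid < t1.pid
           or (t2.pid = t1.pid and t2.emplid <= t1.emplid)) as row_number
from mytable as t1
order by pid, emplid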

How does sql optimization work internally?

My previous question:
Date of max id: sql/oracle optimization
In my previous question, I was finding different ways of finding the date of the record with the highest id number. Below are several of the offered solutions, and their 'cost' as calculated by explain plan.
select date from table where id in (
select max(id) from table)
has a cost of 8
select date from table where rownum < 2 order by id desc;
has a cost of 5
select date from (select date from table order by id desc) where rownum < 2;
also has a cost of 5
with ranked_table as (select rownum as rn, date from table order by id desc)
select date from ranked_table where rn = 1;
has a cost of 906665
SELECT t1.date
FROM table t1
LEFT OUTER JOIN table t2
ON t1.id < t2.id
WHERE t2.id IS NULL;
has a cost of 1438619
Obviously the index on id is doing its job. But I was wondering, in what cases would the last two perform at least as well, if not better? I want to understand the benefits of doing it that way.
This was done in Oracle. All varieties can be discussed, but kindly say what your answer applies to.
Use solution #1 if you want the most portable SQL that will work on a wide variety of other brands of RDBMS (i.e. not all brands support rownum):
select date from table where id in (select max(id) from table);
Use solution #3 if you want the most efficient solution for Oracle:
select date from (select date from table order by id desc) where rownum < 2;
Note that solution #2 doesn't always give the right answer, because it returns the "first" two rows before it has sorted them by id. If this happens to return the rows with the highest id values, it's only by coincidence.
select date from table where rownum < 2 order by id desc;
Regarding the more complex queries #4 and #5 that give such a high cost, I agree I wouldn't recommend using them for such a simple task as fetching the row with the highest id. But understanding how to use subquery factoring and self-joins can be useful for solving other more complex types of queries, where the simple solutions simply don't do the job.
Example: given a hierarchy of threaded forum comments, show the "hottest" comments with the most direct replies.
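For instance, a minimal sketch of that comments example (the comments table and its columns are assumptions, not something from the question):
-- count the direct replies to each comment via a self-join on parent_id
SELECT c.id, COUNT(r.id) AS direct_replies
FROM comments c
LEFT OUTER JOIN comments r ON r.parent_id = c.id
GROUP BY c.id
ORDER BY direct_replies DESC;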
Almost all decent databases have introduced optimizer hints, which are not portable. There are default costs for joining tables, and you can advise the query optimizer to use, for example, nested loop joins or hash joins. A good explanation for Oracle can be found in the Oracle Performance Tuning Guide.
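For illustration only (the table names, columns, and the specific hint here are made-up examples, not tuned advice), an Oracle hint is written as a specially formatted comment right after the SELECT keyword:
-- USE_NL(t2) asks the optimizer to join t2 to the rest of the query with a nested loop
SELECT /*+ USE_NL(t2) */ t1.col1, t2.col2
FROM table_a t1
JOIN table_b t2 ON t2.a_id = t1.id;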

How do I calculate a moving average using MySQL?

I need to do something like:
SELECT value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
Except in addition to value_column1, I also need to retrieve a moving average of the previous 20 values of value_column1.
Standard SQL is preferred, but I will use MySQL extensions if necessary.
This is just off the top of my head, and I'm on the way out the door, so it's untested. I also can't imagine that it would perform very well on any kind of large data set. I did confirm that it at least runs without an error though. :)
SELECT
    T1.value_column1,
    (
        SELECT AVG(T2.value_column1)
        FROM Table1 T2
        WHERE (
            SELECT COUNT(*)
            FROM Table1 T3
            WHERE T3.datetime_column1 BETWEEN T2.datetime_column1 AND T1.datetime_column1
        ) BETWEEN 1 AND 20
    ) AS moving_average
FROM Table1 T1
Tom H's approach will work. You can simplify it like this if you have an identity column:
SELECT T1.id, T1.value_column1, AVG(T2.value_column1) AS moving_average
FROM table1 T1
INNER JOIN table1 T2 ON T2.id BETWEEN T1.id - 19 AND T1.id
GROUP BY T1.id, T1.value_column1
I realize that this answer is about 7 years too late. I had a similar requirement and thought I'd share my solution in case it's useful to someone else.
There are some MySQL extensions for technical analysis that include a simple moving average. They're really easy to install and use: https://github.com/mysqludf/lib_mysqludf_ta#readme
Once you've installed the UDF (per instructions in the README), you can include a simple moving average in a select statement like this:
SELECT TA_SMA(value_column1, 20) AS sma_20 FROM table1 ORDER BY datetime_column1
When I had a similar problem, I ended up using temp tables for a variety of reasons, but it made this a lot easier! What I did looks very similar to what you're doing, as far as the schema goes.
Make the schema something like ID identity, start_date, end_date, value. When you select, do a subselect avg of the previous 20 based on the identity ID.
Only do this if you find yourself already using temp tables for other reasons though (I hit the same rows over and over for different metrics, so it was helpful to have the small dataset).
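In case it helps, here is a rough sketch of that approach in MySQL (all names are illustrative, and the id ordering relies on the rows being inserted in datetime order):
-- 1. Stage the rows of interest with a fresh auto-increment id
CREATE TEMPORARY TABLE tmp_values (
    id INT AUTO_INCREMENT PRIMARY KEY,
    datetime_column1 DATETIME,
    value_column1 DECIMAL(18,4)
);
INSERT INTO tmp_values (datetime_column1, value_column1)
SELECT datetime_column1, value_column1
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
-- 2. Average the current row and the 19 preceding rows by id
SELECT t.datetime_column1, t.value_column1,
       (SELECT AVG(p.value_column1)
        FROM tmp_values p
        WHERE p.id BETWEEN t.id - 19 AND t.id) AS moving_average
FROM tmp_values t
ORDER BY t.id;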
My solution adds a row number to the table. The following example code may help:
set @MA_period = 5;
select id1, tmp1.date_time, tmp1.c, avg(tmp2.c)
from
  (select @b := @b + 1 as id1, date_time, c
   from websource.EURUSD, (select @b := 0) bb
   order by date_time asc) tmp1,
  (select @a := @a + 1 as id2, date_time, c
   from websource.EURUSD, (select @a := 0) aa
   order by date_time asc) tmp2
where id1 > @MA_period and id1 >= id2 and id2 > (id1 - @MA_period)
group by id1
order by id1 asc, id2 asc
In my experience, MySQL as of 5.5.x tends not to use indexes on dependent selects, whether a subquery or a join. This can have a very significant impact on performance where the dependent select criteria change on every row.
A moving average is an example of a query which falls into this category. Execution time may increase with the square of the rows. To avoid this, choose a database engine which can perform indexed look-ups on dependent selects. I find Postgres works effectively for this problem.
In MySQL 8, a window function frame can be used to obtain the averages.
SELECT value_column1, AVG(value_column1) OVER (ORDER BY datetime_column1 ROWS 19 PRECEDING) as ma
FROM table1
WHERE datetime_column1 >= '2009-01-01 00:00:00'
ORDER BY datetime_column1;
This calculates the average of the current row and 19 preceding rows.

Aggregate functions in WHERE clause in SQLite

Simply put, I have a table with, among other things, a column for timestamps. I want to get the row with the most recent (i.e. greatest value) timestamp. Currently I'm doing this:
SELECT * FROM table ORDER BY timestamp DESC LIMIT 1
But I'd much rather do something like this:
SELECT * FROM table WHERE timestamp=max(timestamp)
However, SQLite rejects this query:
SQL error: misuse of aggregate function max()
The documentation confirms this behavior (bottom of page):
Aggregate functions may only be used in a SELECT statement.
My question is: is it possible to write a query to get the row with the greatest timestamp without ordering the select and limiting the number of returned rows to 1? This seems like it should be possible, but I guess my SQL-fu isn't up to snuff.
SELECT * from foo where timestamp = (select max(timestamp) from foo)
or, if SQLite insists on treating subselects as sets,
SELECT * from foo where timestamp in (select max(timestamp) from foo)
There are many ways to skin a cat.
If you have an identity column with auto-increment functionality, a faster query would result if you return the last record by ID (assuming the IDs are assigned in the same order as the timestamps), due to the indexing of the column - unless of course you wish to put an index on the timestamp column.
SELECT * FROM TABLE ORDER BY ID DESC LIMIT 1
I think I've answered this question 5 times in the past week now, but I'm too tired to find a link to one of those right now, so here it is again...
SELECT
*
FROM
table T1
LEFT OUTER JOIN table T2 ON
T2.timestamp > T1.timestamp
WHERE
T2.timestamp IS NULL
You're basically looking for the row where no other row matches that is later than it.
NOTE: As pointed out in the comments, this method will not perform as well in this kind of situation. It will usually work better (for SQL Server at least) in situations where you want the last row for each customer (as an example).
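For comparison, here's a sketch of that "last row for each customer" case (the orders table and its columns are assumptions, not from the question); the same anti-join keeps only the row with no later row for that customer:
-- each customer's most recent order; ties on order_timestamp would return multiple rows
SELECT o1.*
FROM orders o1
LEFT OUTER JOIN orders o2
    ON o2.customer_id = o1.customer_id
   AND o2.order_timestamp > o1.order_timestamp
WHERE o2.order_timestamp IS NULL;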
You can simply do
SELECT *, max(timestamp) FROM table
Edit:
An aggregate function can't be used like this, so it gives an error. I guess what SquareCog suggested was the best thing to do:
SELECT * FROM table WHERE timestamp = (select max(timestamp) from table)