SQL getting record for maximum value: why not use "ORDER BY"? - sql

I know that the "select record corresponding to the maximum value for a field" has been exhaustively answered, but I was wondering why nobody suggested using an ORDER BY clause to get the right row.
For example, I have this table:
| other_field | target_field |
| 1 | 15 |
| 2 | 25 |
| 3 | 20 |
and I want to find the other_field value corresponding to the maximum target_field (e.g. in this case, I want to find 2).
Many people suggested using GROUP and JOIN, however my first idea was to use:
SELECT other_field FROM table ORDER by target_field DESC LIMIT 1;
Is there anything wrong with this? The only problem I can think of is that ordering might take longer than just finding the maximum (although, on the other hand, the JOIN might also take a while).
Thanks!
EDIT: sorry guys for the late replies, I'm new here and I was expecting to get some e-mails for notifications :)

Yes.
Without an index on target_field, it has to read and order every record before it can return any data, which is inefficient. It will return what you want, but not in the best possible way. Aggregate functions tend to do this better and more quickly.
With your current query, once the table grows much larger, it could take a long time to process. (With smaller data sets, you should be fine.)
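To make the comparison concrete, here is a minimal sketch that runs both approaches against the table from the question, using Python's built-in sqlite3 module. The table name t and the extra tied row are assumptions added for illustration; timings will differ between engines and are not measured here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (other_field INTEGER, target_field INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 15), (2, 25), (3, 20)])

ORDER_SQL = "SELECT other_field FROM t ORDER BY target_field DESC LIMIT 1"
MAX_SQL = (
    "SELECT other_field FROM t "
    "WHERE target_field = (SELECT MAX(target_field) FROM t)"
)

# Approach from the question: sort descending and keep the first row.
by_order = conn.execute(ORDER_SQL).fetchall()
# Aggregate approach: find the maximum, then fetch every row that has it.
by_max = conn.execute(MAX_SQL).fetchall()
print(by_order)  # [(2,)]
print(by_max)    # [(2,)]

# Hypothetical extra row that ties for the maximum.
conn.execute("INSERT INTO t VALUES (4, 25)")
tied_order = conn.execute(ORDER_SQL).fetchall()  # one row; which one is not defined
tied_max = conn.execute(MAX_SQL).fetchall()      # both tied rows
print(len(tied_order), len(tied_max))  # 1 2
```

Apart from speed, the practical difference is what happens on ties: LIMIT 1 silently keeps one of the tied rows, while the MAX subquery returns all of them.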

If you need a single value from one or more tables, use MAX with GROUP BY.
If you are querying only one table and need multiple columns, it is fine to use ORDER BY ... DESC.
If you need a single value from a single table, MAX is preferred there too.
I hope that makes sense.

You can try to use the following query :
select top 1 other_field from tester order by target_field desc;
It works well in Sybase. Not sure of other databases.

Related

Normalisation - best way to clean duplicate misspelled values in a sql column

+---------+
| Language|
+---------+
|Spanish |
|spanish |
|venezla |
|venezuala|
|irish |
|Irish |
+---------+
Best approach for normalising data in a sql column? I was thinking of converting to lower case and then using multiple replace functions. Is this the only way? Any insight appreciated thanks :)
There are many ways to do this in SQL; it depends on the scenario and how you want to use the data.
For the case above, you can apply the LOWER function and then select the DISTINCT values to get unique entries, instead of adding another REPLACE call every time you see a new mismatched value.
Or, if you want to work within a single table, you can delete the duplicates by combining ROW_NUMBER with LOWER.
Let me know how it goes and I can follow up with more details.
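A minimal sketch of the LOWER plus DISTINCT idea, run through Python's sqlite3 module. LOWER only fixes case, so the sketch also shows one possible way to handle the misspellings: a small lookup table. The table names langs and corrections, and the assumption that both misspellings should map to 'venezuela', are mine and not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langs (language TEXT)")
conn.executemany(
    "INSERT INTO langs VALUES (?)",
    [("Spanish",), ("spanish",), ("venezla",), ("venezuala",), ("irish",), ("Irish",)],
)

# Step 1: case differences collapse with LOWER + DISTINCT.
distinct_lower = [row[0] for row in conn.execute(
    "SELECT DISTINCT LOWER(language) AS lang FROM langs ORDER BY lang"
)]
print(distinct_lower)  # ['irish', 'spanish', 'venezla', 'venezuala']

# Step 2 (assumption): a hypothetical lookup table maps known misspellings
# to the intended value, instead of chaining REPLACE calls.
conn.execute("CREATE TABLE corrections (bad TEXT PRIMARY KEY, good TEXT)")
conn.executemany(
    "INSERT INTO corrections VALUES (?, ?)",
    [("venezla", "venezuela"), ("venezuala", "venezuela")],
)
normalised = [row[0] for row in conn.execute(
    "SELECT DISTINCT COALESCE(c.good, LOWER(l.language)) AS lang "
    "FROM langs AS l LEFT JOIN corrections AS c ON c.bad = LOWER(l.language) "
    "ORDER BY lang"
)]
print(normalised)  # ['irish', 'spanish', 'venezuela']
```

New misspellings then become a new row in the lookup table rather than a change to the query.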

Access 2016 SQL: Find minimum absolute difference between two columns of different tables

I haven't been able to figure out exactly how to put together this SQL string. I'd really appreciate it if someone could help me out. I am using Access 2016, so please only provide answers that will work with Access. I have two queries that both have different fields except for one in common. I need to find the minimum absolute difference between the two similar columns. Then, I need to be able to pull the data from that corresponding record. For instance,
qry1.Col1 | qry1.Col2
-----------|-----------
10245.123 | Have
302044.31 | A
qry2.Col1 | qry2.Col2
----------------------
23451.321 | Great
345622.34 | Day
Find the minimum absolute difference in a third query, qry3. For instance, Min(Abs(qry1!Col1 - qry2!Col1)). I imagine it would produce one of these tables for each value in qry1.Col1. For the value 10245.123,
qry3.Col1
----------
13206.198
335377.217
Since 13206.198 is the minimum absolute difference, I want to pull the record corresponding to that from qry2 and associate it with the data from qry1 (I'm assuming this uses a JOIN). Resulting in a fourth query like this,
qry4.Col1 (qry1.Col1) | qry4.Col2 (qry1.Col2) | qry4.Col3 (qry2.Col2)
----------------------------------------------------------------------
10245.123 | Have | Great
302044.31 | A | Day
If this is all doable in one SQL string, that would be great. If a couple of steps are required, that's okay as well. I would just like to avoid the time-consuming route of doing this with loops and RecordSet.FindFirst in VBA.
You can use a correlated subquery:
select q1.*,
(select top 1 q2.col2
from qry2 as q2
order by abs(q2.col1 - q1.col1), q2.col2
) as qry2_col2
from qry1 as q1;

How to sort string data that represents numbers

My client has a set of numeric data stored in a string field in a database. So of course it doesn't sort correctly. These rows sort like this:
105
3
44
When they should sort like this:
3
44
105
This is very much a legacy database and I can't change it at all. I also can't change the software that uses the database. The client doesn't own it or have the source code. It has never worked the way they want. However, there is an unused string field that I could use to sort on (only a small number of fields can be sorted on.)
What I would like to do is take the input data, derive a string from it, and store the new string in the unused field, such that when the data is sorted on this new data, the original data sorts correctly, i.e., numerically.
So, for an overly simplistic example, if the algorithm produced the following new data:
105 -> c
3 -> a
44 -> b
Then when the second column was sorted, the first column would look 'correct'.
The tricky bit is that when new rows are added to the database, they must also sort correctly, without having to regenerate the sort data for all rows. This is the part of the problem that has my brain in a twist. I'm not sure it's actually possible.
You can assume that the number will never be more than 5 'digits'.
I realize this is a total kludge, but since I can't change the system, I have to find a work around, rather than a quality solution. Welcome to the real world.
~~~~~~~~~~~~~~~~~~~~~~ S O L U T I O N ~~~~~~~~~~~~~~~~~~
I don't think this is an uncommon problem, so here are the results of Gordon's solution:
mysql> select * from t order by new;
+------+------------+
| orig | new |
+------+------------+
| 3 | 0000000003 |
| 44 | 0000000044 |
| 105 | 0000000105 |
+------+------------+
In most databases, you can just do:
order by cast(col as int)
This will convert the string representation to a number and use that for ordering. (In MySQL, the target type is spelled signed or unsigned rather than int.) There is no need for an additional column. If you add one, I would recommend adding a numeric column to contain the actual value.
If you really want to store something in the unused field, then you can left pad the number. How to do this depends on the database, but here is one typical method:
update t
set unused = right(concat('0000000000', col), 10);
Not all databases support these two specific functions, but all offer this basic functionality in some method.
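The reason left-padding survives new rows is that each key depends only on its own value, so nothing already stored has to be regenerated. A small Python sketch of that property follows; the helper name sort_key and the extra value "20" are made up for illustration.

```python
def sort_key(value: str, width: int = 10) -> str:
    """Left-pad a non-negative integer string with zeros to a fixed width."""
    return value.strip().zfill(width)


# Existing rows: original string -> derived sort string.
keys = {value: sort_key(value) for value in ["105", "3", "44"]}
ordered = sorted(keys, key=keys.get)
print(ordered)  # ['3', '44', '105']

# A new row only needs its own key; the stored keys stay untouched.
keys["20"] = sort_key("20")
ordered_with_new = sorted(keys, key=keys.get)
print(ordered_with_new)  # ['3', '20', '44', '105']
```

With at most 5 digits, a width of 5 would be enough; 10 just matches the update statement above.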
Try something like
SELECT column1 FROM table1 ORDER BY LENGTH(column1) ASC, column1 ASC
(Adjust the column and table name for your environment.)
This is a bit of a hack but works as long as the "numbers" in your string column are natural, non-negative numbers only.
If you are looking for a more sophisticated approach or algorithm, try searching for natural sort together with your DBMS.
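For comparison, here is a quick sketch of the two query-side options (casting, and ordering by length first) in SQLite via Python. The table and column names follow the example above; the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (column1 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("105",), ("3",), ("44",), ("20",)])


def fetch(order_by: str) -> list:
    """Return column1 values ordered by the given ORDER BY expression."""
    sql = "SELECT column1 FROM table1 ORDER BY " + order_by
    return [row[0] for row in conn.execute(sql)]


plain = fetch("column1")                           # string comparison
by_length = fetch("LENGTH(column1) ASC, column1 ASC")
by_cast = fetch("CAST(column1 AS INTEGER)")

print(plain)      # ['105', '20', '3', '44']
print(by_length)  # ['3', '20', '44', '105']
print(by_cast)    # ['3', '20', '44', '105']
```

The length trick breaks as soon as values carry leading zeros or signs, since '007' would sort among the three-digit numbers.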

Is there a way to apply a moving limit in SQL?

I have a large database I use for plotting and data examination. For simplicity, say it looks something like this:
| id | day | obs |
+----------+-----------+-----------+
| 1 | 500 | 4.5 |
| 2 | 500 | 4.4 |
| 3 | 500 | 4.7 |
| 4 | 500 | 4.8 |
| 5 | 600 | 5.1 |
| 6 | 600 | 5.2 |
...
This could be stock market data, where we have many points per day that are measured.
What I want to do is look at much longer trends, where the multiple points per day are unnecessarily resolved, and clog my plotting application. (I want to look at 30000 days, each has about 100 observations).
Is there a way to do something like SELECT ... LIMIT 1 PER "day"
I suppose I could perform a few SELECT DISTINCT queries to find correct ID's, but I'd rather do something simple if it is built in.
It doesn't matter if it's the first, last, or an average value per day. Just a single value. I just prefer what is fastest.
Also, I'd like to do this for Postgres, MySQL, and SQLite. My application is built to use all three and I frequently switch between them.
Thanks!
Background: This is for a Ruby on Rails plotting application, so a trick with ActiveRecord will work too. https://github.com/ZachDischner/Rails-Plotter
You need to tag your question with the brand of RDBMS you're using. Frequently for Rails developers, they're using MySQL, but the answer to your question depends on this.
For all brands except for MySQL, the correct and standard solution is to use windowing functions:
SELECT * FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY day) AS RN, *
FROM stockmarketdata
) AS t
WHERE t.RN = 1;
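For MySQL versions before 8.0, which don't support windowing functions, you can simulate them in a kind of clumsy way with session variables. This relies on the rows being processed in day order, which MySQL does not strictly guarantee:
SELECT t.* FROM (SELECT @day:=0, @r:=0) AS _init,
(
SELECT IF(day=@day, @r:=@r+1, @r:=1) AS RN, @day:=day AS d, stockmarketdata.*
FROM stockmarketdata
ORDER BY day
) AS t
WHERE t.RN = 1;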
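Since the question asks for something that runs on Postgres, MySQL, and SQLite alike, one alternative that needs neither window functions nor session variables is a plain GROUP BY joined back to the table. The sketch below runs it in SQLite only, through Python; it assumes id is unique and that "lowest id per day" is an acceptable choice of row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockmarketdata (id INTEGER PRIMARY KEY, day INTEGER, obs REAL);
INSERT INTO stockmarketdata VALUES
  (1, 500, 4.5), (2, 500, 4.4), (3, 500, 4.7), (4, 500, 4.8),
  (5, 600, 5.1), (6, 600, 5.2);
""")

# Inner query: the lowest id for each day.
# Outer join: fetch the full row that belongs to that id.
rows = conn.execute("""
SELECT s.id, s.day, s.obs
FROM stockmarketdata AS s
JOIN (SELECT day, MIN(id) AS first_id
      FROM stockmarketdata
      GROUP BY day) AS f
  ON f.first_id = s.id
ORDER BY s.day
""").fetchall()

print(rows)  # [(1, 500, 4.5), (5, 600, 5.1)]
```

Swapping MIN(id) for MAX(id) would pick the last row of each day instead.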
You left a lot of room for options with your statement:
It doesn't matter if it's the first, last, or an average value per day. Just a single value. I just prefer what is fastest.
So, I'm going to leave the id out of it and first propose the average of obs for each group as the simplest and probably the most practical option, though running aggregate functions may not be as fast as a plain limit:
MyModel.group(:day).average(:obs)
If you wanted the minimum:
MyModel.group(:day).minimum(:obs)
If you wanted the maximum:
MyModel.group(:day).maximum(:obs)
(Note: The following 2 examples are less efficient than just entering the SQL, but might be more portable.)
But you might want all three:
ActiveRecord::Base.connection.execute(MyModel.select('MIN(obs), AVG(obs), MAX(obs)').group(:day).to_sql).to_a
Or just the data without hashes:
ActiveRecord::Base.connection.exec_query(MyModel.select('MIN(obs), AVG(obs), MAX(obs)').group(:day).to_sql)
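For reference, the grouped calls above correspond roughly to a single GROUP BY query. The sketch below runs that query directly in SQLite through Python; the table name and sample rows are taken from the question, and the exact SQL that ActiveRecord generates is not shown here and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stockmarketdata (id INTEGER PRIMARY KEY, day INTEGER, obs REAL);
INSERT INTO stockmarketdata VALUES
  (1, 500, 4.5), (2, 500, 4.4), (3, 500, 4.7), (4, 500, 4.8),
  (5, 600, 5.1), (6, 600, 5.2);
""")

# One output row per day, with all three aggregates at once.
per_day = conn.execute("""
SELECT day, MIN(obs), AVG(obs), MAX(obs)
FROM stockmarketdata
GROUP BY day
ORDER BY day
""").fetchall()

for day, lowest, average, highest in per_day:
    print(day, lowest, round(average, 2), highest)
```

This query uses only standard aggregates, so it should carry over unchanged to Postgres and MySQL.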
If you want median, see this question which is more DB specific, and there are other related posts about it if you search.
And for more, some DB's like postgres have variance(...), stddev(...), etc. built-in.
Finally, check out the query section in the Rails guide and ARel for more info on constructing queries. You can do a limit in an ActiveRecord relation via first or limit for example, and in ARel, take lets you do a limit. Subqueries are possible too, as shown in answers to this question, and so is group by, etc. If you are sharing this project with others, try to limit the amount of non-portable SQL you are using unless you plan on adding support for other databases on your own and maintaining that.

Best way to use hibernate for complex queries like top N per group

I have been working for a while on a reporting application where I use Hibernate to define my queries. However, more and more I get the feeling that this is not the best approach for reporting use cases.
The queries return only some of the columns, and thus not typed objects (unless you cast all fields in Java).
It is hard to express queries without going straight into SQL or HQL.
My current problem is that I want to get the top N per group, for example the last 5 days per element in a group, where for each day I display the number of visitors.
The result should look like:
| RowName | 1-1-2009 | 2-1-2009 | 3-1-2009 | 4-1-2009 | 5-1-2009
| SomeName| 1 | 42 | 34 | 32 | 35
What is the best approach to transform the data which is stored per day per row to an output like this? Is it time to fall back on regular sql and work with untyped data?
I really want to use typed objects for my results but java makes my life pretty hard for that. Any suggestions are welcome!
Using the Criteria API, you can do this:
Session session = ...;
Criteria criteria = session.createCriteria(MyClass.class);
criteria.setFirstResult(0); // offsets are zero-based, so 0 starts at the first row
criteria.setMaxResults(5);
// ... any other criteria ...
List topFive = criteria.list();
To do this in vanilla SQL (and to confirm that Hibernate is doing what you expect) check out this SO post:
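Independently of that post, here is a minimal sketch of one plain-SQL way to get the top N rows per group: keep a row only if fewer than N rows in the same group have a later day. It runs in SQLite through Python; the visits table, its columns, and the sample data are hypothetical, and N is 2 to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, day TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        ("a", "2009-01-01", 1), ("a", "2009-01-02", 42), ("a", "2009-01-03", 34),
        ("b", "2009-01-01", 7), ("b", "2009-01-02", 9), ("b", "2009-01-03", 8),
    ],
)

N = 2  # how many of the most recent days to keep per name

# A row survives if fewer than N rows of the same name have a later day.
latest = conn.execute("""
SELECT v.name, v.day, v.visitors
FROM visits AS v
WHERE (SELECT COUNT(*)
       FROM visits AS v2
       WHERE v2.name = v.name AND v2.day > v.day) < ?
ORDER BY v.name, v.day
""", (N,)).fetchall()

for row in latest:
    print(row)
# ('a', '2009-01-02', 42)
# ('a', '2009-01-03', 34)
# ('b', '2009-01-02', 9)
# ('b', '2009-01-03', 8)
```

The correlated count is simple but grows expensive on large tables; on engines with window functions, ROW_NUMBER() OVER (PARTITION BY ...) is the usual alternative. Pivoting the days into columns, as in the desired report, would still be a separate step.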