Why is ORDER BY necessary in this SQL statement

This is LeetCode question 613:
Write an SQL query to report the shortest distance between any two points from the Point table.
Here's the table schema:
Point
+-------------+------+
| Column Name | Type |
+-------------+------+
| x           | int  |
+-------------+------+
x is the primary key column for this table.
Each row of this table indicates the position of a point on the X-axis.
I thought this would be a straightforward application of the LEAD function with an offset of 1, so my answer was:
SELECT MIN(dist) as shortest
FROM (SELECT ABS(x - LEAD(x, 1, NULL) OVER () ) AS dist
FROM POINT) AS dist_table;
My understanding was that OVER () makes LEAD work across all of the rows, without ever pairing a row with itself. However, this gave the wrong answer.
The only change from my answer to the right answer was using OVER (ORDER BY x), which makes no sense to me. The way I see it, we would still be taking the distances for every row (excluding a row paired with itself). If someone could please explain, I'd be truly grateful. I've been poring over articles to no avail.
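For illustration, here is a minimal sketch of why the ordering of the window matters, run through Python's sqlite3 module. It assumes the bundled SQLite is 3.25 or newer (the release that added window functions), and the sample points are made up. LEAD does not compare a row with every other row: it returns the single next row in the window's order. With OVER () that order is unspecified, so each point may be paired with an arbitrary neighbour. With OVER (ORDER BY x) each point is compared with the next larger point, and on a line the closest pair is always adjacent in sorted order.

```python
import sqlite3
from itertools import combinations

# Hypothetical sample data, deliberately inserted out of order.
points = [0, 5, 1, 4]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Point (x INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO Point (x) VALUES (?)", [(p,) for p in points])

# LEAD(x) OVER (ORDER BY x) pairs each point with the next larger point.
ordered_query = """
    SELECT MIN(dist) AS shortest
    FROM (SELECT ABS(x - LEAD(x, 1, NULL) OVER (ORDER BY x)) AS dist
          FROM Point) AS dist_table
"""
shortest = conn.execute(ordered_query).fetchone()[0]

# Brute-force check over every pair of points.
brute_force = min(abs(a - b) for a, b in combinations(points, 2))

print(shortest, brute_force)
```

If the rows happened to be read in the order 0, 5, 1, 4, LEAD would only see the gaps 5, 4, and 3 and would miss the true minimum of 1, which is the kind of wrong answer OVER () can produce.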

Related

SELECT MAX values for duplicate values in another column

I am having some trouble finding an answer for this one, so I apologize if it was somewhere else.
I have a table 'dbo.MileageImport' that has the following layout which I pulled to find duplicate entries:
| KEY      | DATA   |
---------------------
| V9864653 | 180288 |
| V9864653 | 22189  |
| V9864811 | 11464  |
| V9864811 | 12688  |
What I am having trouble with is that when I run the following SQL in a DB2 environment:
SELECT KEY, MIN(DATA)
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1);
It ends up pulling the following data:
| KEY      | DATA   |
---------------------
| V9864811 | 11464  |
| V9864653 | 180288 |
For some reason it's pulling the MIN value for V9864811, but not for V9864653. If I invert that and use MAX instead of MIN, it pulls the opposite values.
Is there something I am missing here so I can pull the MIN DATA value for only duplicate KEY records, or is there another way to do this? The report where this data comes from changes from month to month, so there could be different keys that end up being duplicated that I need to correct. Ultimately I am turning this into a DELETE statement to delete the lower of the two (or more) duplicated mileage entries.

Is your DATA column numeric, or is it a VARCHAR? If it is a VARCHAR, the values are compared as strings, so '180288' sorts before '22189'.
It is better to change the column to a numeric type if you can, for example an integer if the values are always whole numbers with no fractions.
If not, you could cast the values to an integer, but if there are lots of transactions or it's a big table, that will be slow and not ideal. It's bad practice to cast when you could just change the data type!
SELECT KEY, MIN(CAST(DATA as Int))
FROM dbo.MileageImport
GROUP BY KEY
HAVING (COUNT(KEY)>1)
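To see the difference between the two queries, here is a minimal sketch using Python's sqlite3 module rather than DB2. It assumes DATA is stored as text, which the question does not confirm; the table is created without the dbo schema, and the column names are quoted because KEY is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: DATA is a text column, as the answer suspects.
conn.execute('CREATE TABLE MileageImport ("KEY" TEXT, "DATA" TEXT)')
conn.executemany(
    "INSERT INTO MileageImport VALUES (?, ?)",
    [
        ("V9864653", "180288"),
        ("V9864653", "22189"),
        ("V9864811", "11464"),
        ("V9864811", "12688"),
    ],
)

# MIN over text compares character by character, so '180288' < '22189'.
text_min = dict(conn.execute(
    'SELECT "KEY", MIN("DATA") FROM MileageImport '
    'GROUP BY "KEY" HAVING COUNT("KEY") > 1'
))

# Casting first makes MIN compare the values as numbers.
numeric_min = dict(conn.execute(
    'SELECT "KEY", MIN(CAST("DATA" AS INTEGER)) FROM MileageImport '
    'GROUP BY "KEY" HAVING COUNT("KEY") > 1'
))

print(text_min)
print(numeric_min)
```

Under that assumption the text comparison reproduces the result from the question (180288 for V9864653), while the cast version returns 22189.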

Access 2016 SQL: Find minimum absolute difference between two columns of different tables

I haven't been able to figure out exactly how to put together this SQL string. I'd really appreciate it if someone could help me out. I am using Access 2016, so please only provide answers that will work with Access. I have two queries that both have different fields except for one in common. I need to find the minimum absolute difference between the two similar columns. Then, I need to be able to pull the data from that corresponding record. For instance,
qry1.Col1 | qry1.Col2
----------|----------
10245.123 | Have
302044.31 | A

qry2.Col1 | qry2.Col2
----------|----------
23451.321 | Great
345622.34 | Day
Find the minimum absolute difference in a third query, qry3, for instance Min(Abs(qry1!Col1 - qry2!Col1)). I imagine it would produce one of these tables for each value in qry1.Col1. For the value 10245.123,
qry3.Col1
----------
13206.198
335377.217
Since 13206.198 is the minimum absolute difference, I want to pull the record corresponding to that from qry2 and associate it with the data from qry1 (I'm assuming this uses a JOIN). Resulting in a fourth query like this,
qry4.Col1 (qry1.Col1) | qry4.Col2 (qry1.Col2) | qry4.Col3 (qry2.Col2)
----------------------------------------------------------------------
10245.123 | Have | Great
302044.31 | A | Day
If this is all doable in one SQL string, that would be great. If a couple of steps are required, that's okay as well. I would just like to avoid the time-consuming approach of using loops and RecordSet.FindFirst in VBA.

You can use a correlated subquery:
select q1.*,
(select top 1 q2.col2
from qry2 as q2
order by abs(q2.col1 - q1.col1), q2.col2
) as qry2_col2
from qry1 as q1;

Splitting long column values to manageable size for presenting data neatly

Hi, I was wondering if there is a way to split long column values. I am using SSRS to get the distinct values, with the number of product IDs against each category, into a matrix/pivot table. The problem is that the number of distinct categories makes it a nightmare to make the report look pretty, shall we say. Is there a dynamic way to split the columns into groups of, say, 10 to make the table look nicer and easier to read? I was thinking of using the IN operator with a list of values, but that means managing the data every time a new category gets added. Is there a dynamic way to present the data in the best way possible? There are 135 distinct category values.
I am also open to suggestions for making the report nicer if anyone has any thoughts. I am new to SSRS and trying to get to grips with it.
Here is an example of my problem: [image not included]

Are your column names coming back from the database under the SubCat field you note in the comments above? If so, I imagine your dataset looks something like this:
Subcat | Logno
---------+---------------
SubCatA | 34
SubCatB | 65
SubCatC | 120
SubCatD | 8
SubCatE | 19
You can extend this so that an index for each individual category is also returned, using the ROW_NUMBER() function. Add the field
ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
to your query. This will result in the following:
Subcat | LogNo | ColID
-----------+--------------+----------
SubCatA | 34 | 1
SubCatB | 65 | 2
SubCatC | 120 | 3
SubCatD | 8 | 4
SubCatE | 19 | 5
Now that there is a numeric identifier for each column, you can perform some logic on it so that the columns arrange themselves nicely on the page.
This solution involves a Tablix nested inside a Matrix, which is itself nested inside another Matrix, as follows:
First, create a Matrix (Matrix 1) and set its data source to your dataset. Set the Row Group Properties to group on the following expression, where '4' is the number of columns you wish to display horizontally.
=CInt(Floor((Fields!ColID.Value - 1) / 4))
Then, in the data section of the Matrix (bottom right corner), insert a rectangle and on it insert a new Matrix (Matrix 2). Remove the leftmost row. Set the column header to be the column name SubCat. This will automatically set the column grouping to SubCat.
Next, in the data section of Matrix 2, add a new rectangle and add a Tablix on it. Remove the header row, and set it to be one column wide only. Set the data to be the information you wish to display, i.e. LogNo.
Finally, delete the leftmost and topmost rows/columns from Matrix 1 to make it look tidier. (Note: delete the column or row only, not the associated groups!)
Then when the report is run it should look similar to the following. Note in my example SubCat = ColName, and LogNo = NumItems, and I have multiple values per SubCat.
Hopefully you find this helpful. If not, please ask for clarification.
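To make the grouping arithmetic concrete, here is a small sketch outside SSRS, using Python's sqlite3 module (assuming a bundled SQLite of 3.25 or newer for ROW_NUMBER). The table name Logs and the variable names are made up for illustration; the integer division mirrors the expression =CInt(Floor((Fields!ColID.Value - 1) / 4)).

```python
import sqlite3

COLUMNS_PER_ROW = 4  # the '4' in the SSRS grouping expression

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the sample dataset shown above.
conn.execute("CREATE TABLE Logs (SubCat TEXT, LogNo INTEGER)")
conn.executemany(
    "INSERT INTO Logs VALUES (?, ?)",
    [("SubCatA", 34), ("SubCatB", 65), ("SubCatC", 120),
     ("SubCatD", 8), ("SubCatE", 19)],
)

rows = conn.execute("""
    SELECT SubCat, LogNo,
           ROW_NUMBER() OVER (ORDER BY SubCat ASC) AS ColID
    FROM Logs
    ORDER BY ColID
""").fetchall()

# Same arithmetic as the row group expression: ColID 1-4 -> 0, 5-8 -> 1, ...
bands = {}
for sub_cat, log_no, col_id in rows:
    band = (col_id - 1) // COLUMNS_PER_ROW
    bands.setdefault(band, []).append(sub_cat)

print(bands)
```

Each key in bands corresponds to one row group of the outer Matrix, so five categories with four columns per row give one full row and one row with a single column.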

Can you do something like this:
The following gives the steps (in two columns, down then across)

Select N unique rows randomly while excluding 1 specific row (in SQLite DB)

I need to select x unique rows randomly from a table with n rows, while excluding 1 specific row (x is small, 3 for example). This can be done in several queries if needed, and I can also compute anything in a programming language (Java). The important thing is that it must run faster than O(n) and use O(x) memory; indefinite looping (retrying) is also undesirable.
Probability of selection should be equal for all rows (except the one which is excluded, of course)
Example:
| id | some data |
|————|———————————|
| 1 | … |
| 2 | … |
| 3 | … |
| 4 | … |
| 5 | … |
The algorithm is run with arguments (x = 3, exclude_id = 4), so:
it should select 3 different random rows from rows with id in 1,2,3,5.
(end of example)
I tried the following approach:
get row count (= n);
get the position of the excluded row by something like select count(*) where id < excluded_id, assuming id is monotonically increasing;
select the numbers from 0..n, obeying all the rules, by using some "clever" algorithms, it's something like O(x), in other words fast enough;
select these x rows one by one by using limit(index, 1) SQL clause.
However, it turned out that it's possible for rows to change positions (I'm not sure why), so the auto-generated ids are not monotonically increasing. In this case the second step (getting the position of the excluded row) produces the wrong result and the algorithm fails to do its job correctly.
How can this be improved?
Also, if this is vastly easier with a different SQL-like database system, it would be interesting, too. (the DB is on a server and I can install any software there as long as it's compatible with Ubuntu 14.04 LTS)

(I'm sorry for the confusion.) The algorithm I used is actually correct if the id is monotonically increasing. I just forgot that the id was not itself auto-generated: it was taken from another table where it is auto-generated, and it was possible to add these rows in a different order.
So I added another id for this table, which is auto-generated, and used it for row selection, and now it works as it should.
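The approach described in the question can be sketched roughly as follows, in Python with sqlite3 rather than Java. The table name items and the function pick_random_rows are hypothetical. The sketch assumes id is an auto-generated, increasing key, that exclude_id exists in the table, and that x is at most n - 1. It makes no claim about how the database executes OFFSET internally.

```python
import random
import sqlite3

def pick_random_rows(conn, x, exclude_id, rng=random):
    """Return x distinct random ids from items, never exclude_id."""
    n = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    # Position of the excluded row when rows are ordered by id.
    excluded_pos = conn.execute(
        "SELECT COUNT(*) FROM items WHERE id < ?", (exclude_id,)
    ).fetchone()[0]
    # Sample x distinct positions among the n - 1 remaining rows.
    positions = rng.sample(range(n - 1), x)
    # Shift positions at or after the excluded row by one to skip it.
    positions = [p + 1 if p >= excluded_pos else p for p in positions]
    return [
        conn.execute(
            "SELECT id FROM items ORDER BY id LIMIT 1 OFFSET ?", (p,)
        ).fetchone()[0]
        for p in positions
    ]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, some_data TEXT)"
)
conn.executemany("INSERT INTO items (some_data) VALUES (?)", [("row",)] * 5)

picked = pick_random_rows(conn, x=3, exclude_id=4)
print(picked)
```

Because random.sample draws without replacement from a range, no retry loop is needed and only x positions are held in memory.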

SQL getting record for maximum value: why not use "ORDER BY"?

I know that the "select record corresponding to the maximum value for a field" has been exhaustively answered, but I was wondering why nobody suggested using an ORDER BY clause to get the right row.
For example, I have this table:
| other_field | target_field |
| 1 | 15 |
| 2 | 25 |
| 3 | 20 |
and I want to find the other_field value corresponding to the maximum target_field (e.g. in this case, I want to find 2).
Many people suggested using GROUP and JOIN, however my first idea was to use:
SELECT other_field FROM table ORDER by target_field DESC LIMIT 1;
Is there anything wrong with this? The only problem I can think of is that ordering may take longer than just finding the maximum (although, on the other hand, the JOIN might also take a while).
Thanks!
EDIT: sorry guys for the late replies, I'm new here and I was expecting to get some e-mails for notifications :)

Yes.
Without a suitable index on target_field, it has to sort (or at least scan) every record before it can return any data, which can be highly inefficient. It will return what you want, but not in the best possible way. Aggregate functions tend to do this much better and much quicker.
With your current query, once you reach a much higher data load, it could take ages to process and materialize. (With smaller data sets, you should be fine.)

If you need a single value from one or more tables, use MAX and GROUP BY.
If you are querying only one table and need multiple columns, it is fine to use ORDER BY ... DESC.
If you need a single value from a single table, MAX is again preferred.
I hope that makes my points clear.

You can try the following query:
select top 1 other_field from tester order by target_field desc;
It works well in Sybase. Not sure of other databases.
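For comparison, here is a small sketch that runs both styles against the sample table from the question, using Python's sqlite3 module. The table is named tester as in the query above, since table is a reserved word; LIMIT 1 is SQLite's counterpart to top 1. This only shows that the two forms agree on this data, not which one is faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tester (other_field INTEGER, target_field INTEGER)")
conn.executemany("INSERT INTO tester VALUES (?, ?)",
                 [(1, 15), (2, 25), (3, 20)])

# Style 1: sort descending and keep the first row.
via_order_by = conn.execute(
    "SELECT other_field FROM tester ORDER BY target_field DESC LIMIT 1"
).fetchone()[0]

# Style 2: find the maximum with an aggregate, then look up its row.
via_max = conn.execute(
    "SELECT other_field FROM tester "
    "WHERE target_field = (SELECT MAX(target_field) FROM tester)"
).fetchone()[0]

print(via_order_by, via_max)
```

One behavioural difference worth keeping in mind: if several rows share the maximum target_field, the ORDER BY version returns just one of them, while the MAX subquery matches all of them.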
