How does order by clause works if two values are equal? - sql

This is my NEWSPAPER table.
National News A 1
Sports D 1
Editorials A 12
Business E 1
Weather C 2
Television B 7
Births F 7
Classified F 8
Modern Life B 1
Comics C 4
Movies B 4
Bridge B 2
Obituaries F 6
Doctor Is In F 6
When i run this query
select feature,section,page from NEWSPAPER
where section = 'F'
order by page;
It gives this output
Doctor Is In F 6
Obituaries F 6
Births F 7
Classified F 8
But in Kevin Loney's Oracle 10g Complete Reference the output is like this
Obituaries F 6
Doctor Is In F 6
Births F 7
Classified F 8
Please help me understand how is it happening?

If you need reliable, reproducible ordering to occur when two values in your ORDER BY clause's first column are the same, you should always provide another, secondary column to also order on. While you might be able to assume that they will sort themselves based on order entered (almost always the case to my knowledge, but be aware that the SQL standard does not specify any form of default ordering) or index, you never should (unless it is specifically documented as such for the engine you are using--and even then I'd personally never rely on that).
Your query, if you wanted alphabetical sorting by feature within each page, should be:
SELECT feature,section,page FROM NEWSPAPER
WHERE section = 'F'
ORDER BY page, feature;

In relational databases, tables are sets and are unordered. The order by clause is used primarily for output purposes (and a few other cases such as a subquery containing rownum).
This is a good place to start. The SQL standard does not specify what has to happen when the keys on an order by are the same. And this is for good reason. Different techniques can be used for sorting. Some might be stable (preserving original order). Some methods might not be.
Focus on whether the same rows are in the sets, not their ordering. By the way, I would consider this an unfortunate example. The book should not have ambiguous sorts in its examples.

When you use the SELECT statement to query data from a table, the order which rows appear in the result set may not be what you expected.
In some cases, the rows that appear in the result set are in the order that they are stored in the table physically. However, in case the query optimizer uses an index to process the query, the rows will appear as they are stored in the index key order. For this reason, the order of rows in the result set is undetermined or unpredictable.
The query optimizer is a built-in software component in the database
system that determines the most efficient way for an SQL statement to
query the requested data.

Related

Removing SQL Rows from Query if two rows have an identical ID but differences in the columns

I´m currently working stuck on a SQL issue (well, mainly because I can´t find a way to google it and my SQL skills do not suffice to solve it myself)
I´m working on a system where documents are edited. If the editing process is finished, users mark the document as solved. In the MSSQL database, the corresponding row is not updated but instead, a new row is inserted. Thus, every document that has been processed has [e.g.: should have] multiple rows in the DB.
See the following situation:
ID
ID2
AnotherCondition
Steps
Process
Solved
1
1
yes
Three
ATAT
AF
2
2
yes
One
ATAT
FR
2
3
yes
One
ATAT
EG
2
4
yes
One
ATAT
AF
3
5
no
One
ABAT
AF
4
6
yes
One
ATAT
FR
5
7
no
One
AVAT
EG
6
8
yes
Two
SATT
FR
6
9
yes
Two
SATT
EG
6
10
yes
Two
SATT
AF
I need to select the rows which have not been processed yet. A "processed" document has a "FR" in the "Solved" column. Sadly other versions of the document exist in the DB, with other codes in the "Solved" columns.
Now: If there is a row which has "FR" in the "Solved" column I need to remove every row with the same ID from my SELECT statement as well. Is this doable?
In order to achieve this, I have to remove the rows with the IDs 2 | 4 (because the system sadly isn´t too reliable I guess) | and 6 in my select statement. Is this possible in general?
What I could do is to filter out the duplicates afterwards, in python/js/whatever. But I am curious whether I can "remove" these rows directly in the SQL statement as well.
To rephrase it another time: How can I make a select statement which returns only (in this example) the rows containing the ID´s 1, 3 and 5?
If you need to delete all rows where every id doesn't have any "Solved = 'no'", you can use a DELETE statement that will exclude all "id" values that have at least one "Solved = 'no'" in the corresponding rows.
DELETE FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'no');
Check the demo here.
Edit. If you need to use a SELECT statement, you can simply reverse the condition in the subquery:
SELECT *
FROM tab
WHERE id NOT IN (SELECT id FROM tab WHERE Solved1 = 'yes');
Check the demo here.
I'm not sure I understand your question correct:
...every document that has been processed has [...] multiple rows in the DB
I need to find out which documents have not been processed yet
So it seems you need to find unique documents with no versions, this could be done using a GROUP BY with a HAVING clause:
SELECT
Id
FROM dbo.TableName
GROUP BY Id
HAVING COUNT(*) = 1

SQL aggregation for identical values

Suppose I have data something like this
id
project-id
thing-count
country
1
1
4
GBR
2
1
2
GBR
3
1
8
GBR
4
2
1
USA
5
2
4
USA
6
2
9
USA
I want to group the data using the project-id and keep the country. I know that the country does not vary within a project. There seem to be two ways I can do this:
SELECT
project-id,
MIN(country) AS country
FROM data
GROUP BY project-id
or
SELECT
project-id,
country
FROM data
GROUP BY
project-id,
country
These both work, but neither seem right. The first puts an extra burden on the GROUP BY since there's an unnecessary MIN calculation, while the second suggests to anyone reading the query that I want to GROUP BY the country data.
I'm always surprised that there is no FIRST, LAST, or ANY aggregation function, but as far as I can tell neither SQL Server, MySQL, nor Postgres have that.
How can I write his query so that the GROUP BY does not need to do extra work aggregating a column whose entries are identical, in a way that makes it obvious in the SQL that I do not care which of the values being aggregated is chosen to represent the set of values?
Two errors in my original question: FIRST and LAST. I need to constantly remind myself that SQL rows have no implicit ordering (think sets not lists) and so these aggregate functions would not make sense. But ANY does make sense, and I have expected it to be available time and time again.
In a comment on the question #amir-saleem points out that this does exist in MySQL. Using the ANY_VALUE aggregation function I could write
SELECT project-id, ANY_VALUE(country) as country, FROM data GROUP BY project-id
(N.B. ANY_VALUE looks like a useful aggregation function, but it is included in the Miscellaneous Functions documentation rather than the Aggregate Functions documentation; I do not know why.)
There is no similar aggregate function in Postgres as far as I can tell, though I could use a custom aggregate function to achieve this. Here are some community provided examples that would work in my case:
First/last (aggregate)
Aggregate_Random
(I have not investigated SQL Server based solutions.)
If I'm understanding correctly, you basically want a list of all project IDs and their associated countries?
If so, you can do some by simply adding a distinct clause after select as shown below
Select distinct project-id,country from data

sql use different columns for same query (directed graph as undirected )

Suppose I have a table of relationships like in a directed graph. For some pairs of ids there are both 1->2 and 2->1 relations, for others there are not. Some nodes are only present in one column.
a b
1 2
2 1
1 3
4 1
5 2
Now I want to work with it as undirected graph. For example, grouping, filtering using both columns present. For example filter node 5 and count neighbors of the rest
node neighbor_count
1 3
2 1
3 1
4 1
Is it possible to compose queries in such a way that first column a is used and then column b is used in the same manner?
I know it is achievable by doubling the table:
select a,count(distinct(b))
from
(select * from grap
union all
select b as a, a as b from grap)
where (not a in (5,6,7)) and (not b in (5,6,7))
group by a;
However, the real tables are quite large (10^9 - 1^10 of pairs). Would union require additional disk usage? A single scan through the base is already quite slow for me. Are there better ways to do this?
(Currently database is sqlite, but the less platform specific the answer the better)
The union all is generated only for the duration of the query. Does it use more disk space? Not permanently.
If the processing of the query requires saving the data out to disk, then it will use more temporary storage for intermediate results.
I would suggests, though, that if you want an undirected graph with this representation, then add in the addition pairs that are not already in the table. This will use more disk space. But you won't have to play games with queries.

Access SQL - Add Row Number to Query Result for a Multi-table Join

What I am trying to do is fairly simple. I just want to add a row number to a query. Since this is in Access is a bit more difficult than other SQL, but under normal circumstances is still doable using solutions such as DCount or Select Count(*), example here: How to show row number in Access query like ROW_NUMBER in SQL or Access SQL how to make an increment in SELECT query
My Issue
My issue is I'm trying to add this counter to a multi-join query that orders by fields from numerous tables.
Troubleshooting
My code is a bit ridiculous (19 fields, seven of which are long expressions, from 9 different joined tables, and ordered by fields from 5 of those tables). To make things simple, I have an simplified example query below:
Example Query
SELECT DCount("*","Requests_T","[Requests_T].[RequestID]<=" & [Requests_T].[RequestID]) AS counter, Requests_T.RequestHardDeadline AS Deadline, Requests_T.RequestOverridePriority AS Priority, Requests_T.RequestUserGroup AS [User Group], Requests_T.RequestNbrUsers AS [Nbr of Users], Requests_T.RequestSubmissionDate AS [Submitted on], Requests_T.RequestID
FROM (((((((Requests_T
INNER JOIN ENUM_UserGroups_T ON ENUM_UserGroups_T.UserGroups = Requests_T.RequestUserGroup)
INNER JOIN ENUM_RequestNbrUsers_T ON ENUM_RequestNbrUsers_T.NbrUsers = Requests_T.RequestNbrUsers)
INNER JOIN ENUM_RequestPriority_T ON ENUM_RequestPriority_T.Priority = Requests_T.RequestOverridePriority)
ORDER BY Requests_T.RequestHardDeadline, ENUM_RequestPriority_T.DisplayOrder DESC , ENUM_UserGroups_T.DisplayOrder, ENUM_RequestNbrUsers_T.DisplayOrder DESC , Requests_T.RequestSubmissionDate;
If the code above is trying to select a field from a table not included, I apologize - just trust the field comes from somewhere (lol i.e. one of the other joins I excluded to simply the query). A great example of this is the .DisplayOrder fields used in the ORDER BY expression. These are fields from a table that simply determines the "priority" of an enum. Example: Requests_T.RequestOverridePriority displays to the user as an combobox option of "Low", "Med", "High". So in a table, I assign a numerical priority to these of "1", "2", and "3" to these options, respectively. Thus when ENUM_RequestPriority_T.DisplayOrder DESC is called in order by, all "High" priority requests will display above "Medium" and "Low". Same holds true for ENUM_UserGroups_T.DisplayOrder and ENUM_RequestNbrUsers_T.DisplayOrder.
I'd also prefer to NOT use DCOUNT due to efficiency, and rather do something like:
select count(*) from Requests_T where Requests_T.RequestID>=RequestID) as counter
Due to the "Order By" expression however, my 'counter' doesn't actually count my resulting rows sequentially since both of my examples are tied to the RequestID.
Example Results
Based on my actual query results, I've made an example result of the query above.
Counter Deadline Priority User_Group Nbr_of_Users Submitted_on RequestID
5 12/01/2016 High IT 2-4 01/01/2016 5
7 01/01/2017 Low IT 2-4 05/06/2016 8
10 Med IT 2-4 07/13/2016 11
15 Low IT 10+ 01/01/2016 16
8 Low IT 2-4 01/01/2016 9
2 Low IT 2-4 05/05/2016 2
The query is displaying my results in the proper order (those with the nearest deadline at the top, then those with the highest priority, then user group, then # of users, and finally, if all else is equal, it is sorted by submission date). However, my "Counter" values are completely wrong! The counter field should simply intriment +1 for each new row. Thus if displaying a single request on a form for a user, I could say
"You are number: Counter [associated to RequestID] in the
development queue."
Meanwhile my results:
Aren't sequential (notice the first four display sequentially, but then the final two rows don't)! Even though the final two rows are lower in priority than the records above them, they ended up with a lower Counter value simply because they had the lower RequestID.
They don't start at "1" and increment +1 for each new record.
Ideal Results
Thus my ideal result from above would be:
Counter Deadline Priority User_Group Nbr_of_Users Submitted_on RequestID
1 12/01/2016 High IT 2-4 01/01/2016 5
2 01/01/2017 Low IT 2-4 05/06/2016 8
3 Med IT 2-4 07/13/2016 11
4 Low IT 10+ 01/01/2016 16
5 Low IT 2-4 01/01/2016 9
6 Low IT 2-4 05/05/2016 2
I'm spoiled by PLSQL and other software where this would be automatic lol. This is driving me crazy! Any help would be greatly appreciated.
FYI - I'd prefer an SQL option over VBA if possible. VBA is very much welcomed and will definitely get an up vote and my huge thanks if it works, but I'd like to mark an SQL option as the answer.
Unfortuantely, MS Access doesn't have the very useful ROW_NUMBER() function like other clients do. So we are left to improvise.
Because your query is so complicated and MS Access does not support common table expressions, I recommend you follow a two step process. First, name that query you already wrote IntermediateQuery. Then, write a second query called FinalQuery that does the following:
SELECT i1.field_primarykey, i1.field2, ... , i1.field_x,
(SELECT field_primarykey FROM IntermediateQuery i2
WHERE t2.field_primarykey <= t1.field_primarykey) AS Counter
FROM IntermediateQuery i1
ORDER BY Counter
The unfortunate side effect of this is the more data your table returns, the longer it will take for the inline subquery to calculate. However, this is the only way you'll get your row numbers. It does depend on having a primary key in the table. In this particular case, it doesn't have to be an explicitly defined primary key, it just needs to be a field or combination of fields that is completely unique for each record.

SQL Recursive Tables

I have the following tables, the groups table which contains hierarchically ordered groups and group_member which stores which groups a user belongs to.
groups
---------
id
parent_id
name
group_member
---------
id
group_id
user_id
ID PARENT_ID NAME
---------------------------
1 NULL Cerebra
2 1 CATS
3 2 CATS 2.0
4 1 Cerepedia
5 4 Cerepedia 2.0
6 1 CMS
ID GROUP_ID USER_ID
---------------------------
1 1 3
2 1 4
3 1 5
4 2 7
5 2 6
6 4 6
7 5 12
8 4 9
9 1 10
I want to retrieve the visible groups for a given user. That it is to say groups a user belongs to and children of these groups. For example, with the above data:
USER VISIBLE_GROUPS
9 4, 5
3 1,2,4,5,6
12 5
I am getting these values using recursion and several database queries. But I would like to know if it is possible to do this with a single SQL query to improve my app performance. I am using MySQL.
Two things come to mind:
1 - You can repeatedly outer-join the table to itself to recursively walk up your tree, as in:
SELECT *
FROM
MY_GROUPS MG1
,MY_GROUPS MG2
,MY_GROUPS MG3
,MY_GROUPS MG4
,MY_GROUPS MG5
,MY_GROUP_MEMBERS MGM
WHERE MG1.PARENT_ID = MG2.UNIQID (+)
AND MG1.UNIQID = MGM.GROUP_ID (+)
AND MG2.PARENT_ID = MG3.UNIQID (+)
AND MG3.PARENT_ID = MG4.UNIQID (+)
AND MG4.PARENT_ID = MG5.UNIQID (+)
AND MGM.USER_ID = 9
That's gonna give you results like this:
UNIQID PARENT_ID NAME UNIQID_1 PARENT_ID_1 NAME_1 UNIQID_2 PARENT_ID_2 NAME_2 UNIQID_3 PARENT_ID_3 NAME_3 UNIQID_4 PARENT_ID_4 NAME_4 UNIQID_5 GROUP_ID USER_ID
4 2 Cerepedia 2 1 CATS 1 null Cerebra null null null null null null 8 4 9
The limit here is that you must add a new join for each "level" you want to walk up the tree. If your tree has less than, say, 20 levels, then you could probably get away with it by creating a view that showed 20 levels from every user.
2 - The only other approach that I know of is to create a recursive database function, and call that from code. You'll still have some lookup overhead that way (i.e., your # of queries will still be equal to the # of levels you are walking on the tree), but overall it should be faster since it's all taking place within the database.
I'm not sure about MySql, but in Oracle, such a function would be similar to this one (you'll have to change the table and field names; I'm just copying something I did in the past):
CREATE OR REPLACE FUNCTION GoUpLevel(WO_ID INTEGER, UPLEVEL INTEGER) RETURN INTEGER
IS
BEGIN
DECLARE
iResult INTEGER;
iParent INTEGER;
BEGIN
IF UPLEVEL <= 0 THEN
iResult := WO_ID;
ELSE
SELECT PARENT_ID
INTO iParent
FROM WOTREE
WHERE ID = WO_ID;
iResult := GoUpLevel(iParent,UPLEVEL-1); --recursive
END;
RETURN iResult;
EXCEPTION WHEN NO_DATA_FOUND THEN
RETURN NULL;
END;
END GoUpLevel;
/
Joe Cleko's books "SQL for Smarties" and "Trees and Hierarchies in SQL for Smarties" describe methods that avoid recursion entirely, by using nested sets. That complicates the updating, but makes other queries (that would normally need recursion) comparatively straightforward. There are some examples in this article written by Joe back in 1996.
I don't think that this can be accomplished without using recursion. You can accomplish it with with a single stored procedure using mySQL, but recursion is not allowed in stored procedures by default. This article has information about how to enable recursion. I'm not certain about how much impact this would have on performance verses the multiple query approach. mySQL may do some optimization of stored procedures, but otherwise I would expect the performance to be similar.
Didn't know if you had a Users table, so I get the list via the User_ID's stored in the Group_Member table...
SELECT GroupUsers.User_ID,
(
SELECT
STUFF((SELECT ',' +
Cast(Group_ID As Varchar(10))
FROM Group_Member Member (nolock)
WHERE Member.User_ID=GroupUsers.User_ID
FOR XML PATH('')),1,1,'')
) As Groups
FROM (SELECT User_ID FROM Group_Member GROUP BY User_ID) GroupUsers
That returns:
User_ID Groups
3 1
4 1
5 1
6 2,4
7 2
9 4
10 1
12 5
Which seems right according to the data in your table. But doesn't match up with your expected value list (e.g. User 9 is only in one group in your table data but you show it in the results as belonging to two)
EDIT: Dang. Just noticed that you're using MySQL. My solution was for SQL Server. Sorry.
-- Kevin Fairchild
There was already similar question raised.
Here is my answer (a bit edited):
I am not sure I understand correctly your question, but this could work My take on trees in SQL.
Linked post described method of storing tree in database -- PostgreSQL in that case -- but the method is clear enough, so it can be adopted easily for any database.
With this method you can easy update all the nodes depend on modified node K with about N simple SELECTs queries where N is distance of K from root node.
Good Luck!
I don't remember which SO question I found the link under, but this article on sitepoint.com (second page) shows another way of storing hierarchical trees in a table that makes it easy to find all child nodes, or the path to the top, things like that. Good explanation with example code.
PS. Newish to StackOverflow, is the above ok as an answer, or should it really have been a comment on the question since it's just a pointer to a different solution (not exactly answering the question itself)?
There's no way to do this in the SQL standard, but you can usually find vendor-specific extensions, e.g., CONNECT BY in Oracle.
UPDATE: As the comments point out, this was added in SQL 99.