SQL CTE and ORDER BY affecting result set

I've pasted a very simplified version of my SQL query below. The problem I'm running into is that the ORDER BY clause is affecting which rows my CTE selects. I don't understand why: my thinking was that the SELECT inside the CTE executes first, and the ORDER BY then works on THOSE results.
Unfortunately, the behavior I'm seeing is that the inner SELECT is affected by the ORDER BY, giving me 'items' that are not in the TOP 10.
Here is an example of data:
(Indexed in reverse order by ID)
ID, Date
9600 2010-10-12
9599 2010-09-08
9598 2010-08-31
9597 2010-08-31
9596 2010-08-30
9595 2010-08-11
9594 2010-08-06
9593 2010-08-05
9592 2010-08-02
....
9573 2010-08-10
....
8174 2010-08-05
....
38 2029-12-20
My basic query:
;with results as(
select TOP 10 ID, Date
from dbo.items
)
SELECT ID
FROM results
query returns:
ID, Date
9600 2010-10-12
9599 2010-09-08
9598 2010-08-31
9597 2010-08-31
9596 2010-08-30
9595 2010-08-11
9594 2010-08-06
9593 2010-08-05
9592 2010-08-02
My query with the ORDER BY
;with results as(
select TOP 10 ID, Date
from dbo.items
)
SELECT ID
FROM results
ORDER BY Date DESC
query returns:
ID, Date
38 2029-12-20
9600 2010-10-12
9599 2010-09-08
9598 2010-08-31
9597 2010-08-31
9596 2010-08-30
9595 2010-08-11
9573 2010-08-10
9594 2010-08-06
8174 2010-08-05
Can anyone explain why the first query returns only IDs that are in the top 10 of the table, while the second query returns the top 10 of the entire table (after the sorting is applied)?

When you use SELECT TOP n, you must supply an ORDER BY if you want deterministic behaviour; otherwise the server is free to return any 10 rows it feels like. The behaviour you are seeing is perfectly valid.
To solve the problem, specify an ORDER BY inside the CTE:
WITH results AS
(
SELECT TOP 10 ID, Date
FROM dbo.items
ORDER BY ID DESC
)
SELECT ID
FROM results
ORDER BY Date
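The effect of putting the ORDER BY inside the CTE can be checked with a small, self-contained sketch. It runs against an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only so the example is runnable: SQLite has no TOP, so LIMIT plays that role here, and the function name and the reduced sample rows are illustrative.

```python
import sqlite3

def top_ids_then_sort(rows, n=10):
    """Keep the n highest IDs inside the CTE, then sort only those rows by date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, date TEXT)")
    conn.executemany("INSERT INTO items (id, date) VALUES (?, ?)", rows)
    query = """
        WITH results AS (
            SELECT id, date
            FROM items
            ORDER BY id DESC   -- decides WHICH rows are kept
            LIMIT ?            -- SQLite counterpart of TOP n
        )
        SELECT id
        FROM results
        ORDER BY date DESC     -- only reorders the rows already chosen
    """
    ids = [row[0] for row in conn.execute(query, (n,))]
    conn.close()
    return ids

# A few rows taken from the question; ID 38 has the latest date of all.
sample = [
    (9600, "2010-10-12"), (9599, "2010-09-08"), (9598, "2010-08-31"),
    (9573, "2010-08-10"), (8174, "2010-08-05"), (38, "2029-12-20"),
]
print(top_ids_then_sort(sample, n=3))  # [9600, 9599, 9598]
```

Because the inner ORDER BY fixes which rows survive, ID 38 no longer appears even though it has the latest date.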

I think you can add a new column such as
SELECT ROW_NUMBER() OVER (ORDER BY <ColumnName>) AS RowNo
followed by all your other columns. You can then filter the CTE on RowNo using BETWEEN, WHERE and similar clauses.

Related

Filter the rows of the last N group

Suppose I have created the following query (I use SQL Server), which returns the following output:
SELECT *
FROM DB
ORDER BY CLIENT_ID
In that case, how can I update the query above to select only the last 2 CLIENT_IDs? I should also be able to use any other number, such as the last 20, 60, 100, etc.
In my example the expected output would be
meaning that we see only the rows related to the 2 last clients which are client B99 and C93 (meaning that first client A19 is filtered out since it does not belong to the last 2)
This will give you the expected output. But as others have already mentioned, it's unclear what "last" means; I'm just guessing from your expected result.
Also, please don't post photos of tables next time.
SELECT A.CLIENT_ID, A.PRICE_BILL
FROM DB A
WHERE A.CLIENT_ID IN (
SELECT DISTINCT TOP(2) A.CLIENT_ID
FROM DB A
ORDER BY A.CLIENT_ID DESC
)
ORDER BY A.CLIENT_ID ASC, A.PRICE_BILL ASC
You can accomplish what you require using dense_rank() and filtering out the last 2 rankings.
The reason you use dense_rank is because it assigns the same ranking to ties thereby ranking all of the same CLIENT_ID the same. Also note the reverse ordering of the dense_rank to make it easy to filter out the last 2 values because they are ranked 1 & 2.
declare @MyTable table (CLIENT_ID varchar(3), PRICE_BILL int);
insert into @MyTable (CLIENT_ID, PRICE_BILL)
values
('A19',91), ('A19',29), ('A19',92)
, ('B99',85), ('B99',202)
, ('C93',399), ('C93',929), ('C93',929);
with cte as (
select *
, dense_rank() over (order by CLIENT_ID desc) dr
from @MyTable
)
select *
from cte
where dr < 3
order by CLIENT_ID;
Returns:
CLIENT_ID PRICE_BILL dr
B99 85 2
B99 202 2
C93 399 1
C93 929 1
C93 929 1
Note the provision of sample data as DDL+DML makes it much easier for people to assist.
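To make the "last N clients" count a parameter rather than a hard-coded 3, the dense_rank filter can take a bound value. The following is a minimal sketch run against in-memory SQLite via Python's sqlite3 (assumed here only for runnability; window functions need SQLite 3.25 or later). The table name bills and the function name are hypothetical.

```python
import sqlite3

def rows_for_last_n_clients(rows, n):
    """Return all rows belonging to the n highest CLIENT_ID values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bills (client_id TEXT, price_bill INTEGER)")
    conn.executemany("INSERT INTO bills VALUES (?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT client_id, price_bill,
                   -- ties share a rank, so every row of a client gets the same dr
                   DENSE_RANK() OVER (ORDER BY client_id DESC) AS dr
            FROM bills
        )
        SELECT client_id, price_bill
        FROM ranked
        WHERE dr <= ?
        ORDER BY client_id, price_bill
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result

# Sample data from the answer above.
bills = [
    ("A19", 91), ("A19", 29), ("A19", 92),
    ("B99", 85), ("B99", 202),
    ("C93", 399), ("C93", 929), ("C93", 929),
]
print(rows_for_last_n_clients(bills, 2))
```

Passing 20, 60 or 100 instead of 2 changes only the bound parameter, not the query text.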

Turn several queries into one in SQL Server

I have a table in SQL Server called schedule that has the following columns (and others not listed):
scheduleId roomId dateRegistered dateFreed
4564 2 2022-12-25 2022-12-26
4565 3 2022-12-25 2022-12-27
4566 15 2022-12-26 2022-12-27
4567 2 2022-12-28 2022-12-31
4568 3 2022-12-28 2022-12-30
In some part of my app I need to show all the rooms occupied at a certain date.
Currently I run a query like this:
SELECT TOP (1) *
FROM schedule
WHERE roomId = [theNeededRoom] AND dateFreed < [providedDate]
ORDER BY dateFreed DESC
The thing is that I have to run that query in a for loop to get the information for every room.
I'm sure there has to be a better way to do this in a single query that returns one row for each distinct roomId. How can I go about this?
Also, when a room is first registered, its dateFreed column is NULL. If I wanted to take this into account, how can I write the query so that, when dateFreed is NULL, that is the row that gets chosen?
You can use TOP(1) WITH TIES, while ordering on the last "dateFreed" date.
In order to have a "tied" value to match on, instead of ordering on "dateFreed DESC" we can use the ROW_NUMBER window function to generate a ranking on the same field (which will store 1 for each most recent "dateFreed" value, per "roomId").
SELECT TOP (1) WITH TIES *
FROM schedule
WHERE dateFreed < [providedDate]
ORDER BY ROW_NUMBER() OVER(PARTITION BY roomId ORDER BY dateFreed DESC)
SELECT
t.*
FROM
(
SELECT
roomId AS rId,
max(dateFreed) AS dateFreedMax
FROM
schedule s
GROUP BY
s.roomId
) AS t
WHERE
t.dateFreedMax < [providedDate]
OR t.dateFreedMax IS NULL
Or
SELECT roomId
FROM
schedule s
GROUP BY s.roomId, dateFreed
HAVING
max(dateFreed)<[providedDate] OR dateFreed IS NULL
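For the NULL case raised in the question, one possible approach is to number the rows per room so that a row with a NULL dateFreed sorts first, followed by the most recent dateFreed before the provided date. The sketch below uses in-memory SQLite through Python's sqlite3 purely so it can be run as-is (an assumption; the original is SQL Server, and window functions need SQLite 3.25 or later). Row 4569 and the function name are hypothetical additions for illustration.

```python
import sqlite3

def latest_row_per_room(rows, provided_date):
    """One row per roomId: a NULL dateFreed wins, else the latest dateFreed before the date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE schedule (scheduleId INTEGER, roomId INTEGER, "
        "dateRegistered TEXT, dateFreed TEXT)"
    )
    conn.executemany("INSERT INTO schedule VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT scheduleId, roomId, dateRegistered, dateFreed,
                   ROW_NUMBER() OVER (
                       PARTITION BY roomId
                       -- NULL rows get sort key 0, so they come first
                       ORDER BY CASE WHEN dateFreed IS NULL THEN 0 ELSE 1 END,
                                dateFreed DESC
                   ) AS rn
            FROM schedule
            WHERE dateFreed < :d OR dateFreed IS NULL
        )
        SELECT scheduleId, roomId, dateRegistered, dateFreed
        FROM ranked
        WHERE rn = 1
        ORDER BY roomId
    """
    result = conn.execute(query, {"d": provided_date}).fetchall()
    conn.close()
    return result

# Rows from the question, plus one hypothetical row (4569) that is not freed yet.
schedule_rows = [
    (4564, 2, "2022-12-25", "2022-12-26"),
    (4565, 3, "2022-12-25", "2022-12-27"),
    (4566, 15, "2022-12-26", "2022-12-27"),
    (4567, 2, "2022-12-28", "2022-12-31"),
    (4568, 3, "2022-12-28", "2022-12-30"),
    (4569, 15, "2022-12-29", None),
]
for row in latest_row_per_room(schedule_rows, "2023-01-01"):
    print(row)
```

The dates are stored as ISO-8601 text here, so string comparison matches chronological order; with a real date column the comparison would work on the dates directly.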

Remove duplicate rows based on field in a select query with PostgreSQL?

Consider the table mdl_files, which contains the following fields: id, contenthash, timecreated, filesize.
This table stores attachment files.
We consider all rows with the same content hash to be duplicates, and I want to keep only the oldest row (or the first if the dates are equal).
How can I do that?
The following query:
SELECT
id,
contenthash,
filesize,
to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash;
returns:
2480229 00002e87605311feb82b70473b61e81f0223c774 18178 2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0 1293 2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-10
1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-17
71568 0003c6aec5835964870902d697c06d21abf76bf7 139439 2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c 398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
I want to get this resultset:
2480229 00002e87605311feb82b70473b61e81f0223c774 18178 2016-10-05
2997411 0000bfd20ef84948eee6811ce5bbac03de42ccb0 1293 2017-03-31
1304839 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-10
71568 0003c6aec5835964870902d697c06d21abf76bf7 139439 2013-04-19
2959945 000419c19d77df7285e669614075b47414e3ab2c 398 2017-03-20
3483049 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
I want the following duplicated lines to be removed from the resultset:
1364656 000280169fc78d704a2d4569bfb6f42ea4a1d5ae 8203 2015-11-17
3483047 00061dc0bc2452304107ddc75e7ee2908c729905 28618 2017-08-17
Use DISTINCT ON:
SELECT DISTINCT ON (contenthash)
id,
contenthash,
filesize,
to_timestamp(timecreated) :: DATE
FROM mdl_files
ORDER BY contenthash, timecreated, id;
DISTINCT ON is a Postgres extension that returns exactly one row for each unique combination of the keys in parentheses. The row kept is the first one encountered according to the ORDER BY clause.
You can also use the ROW_NUMBER() window function to number the rows within each group and keep only the first one.
SELECT t.*
FROM (
SELECT
id,
contenthash,
filesize,
ROW_NUMBER() OVER (PARTITION BY contenthash,filesize order by timecreated) rn
FROM mdl_files
) t
where t.rn = 1
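The ROW_NUMBER approach can be tried on a toy table. This sketch partitions by contenthash only and adds id as a tie-breaker for equal timestamps, which is one reading of "first if dates are equal". It runs on in-memory SQLite through Python's sqlite3 (an assumption for runnability; the question targets PostgreSQL, and window functions need SQLite 3.25 or later). The sample rows and the function name are made up for illustration.

```python
import sqlite3

def oldest_row_per_hash(rows):
    """Keep one row per contenthash: the smallest timecreated, then the smallest id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mdl_files (id INTEGER PRIMARY KEY, contenthash TEXT, "
        "timecreated INTEGER, filesize INTEGER)"
    )
    conn.executemany("INSERT INTO mdl_files VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT id, contenthash, filesize
        FROM (
            SELECT id, contenthash, filesize,
                   ROW_NUMBER() OVER (
                       PARTITION BY contenthash
                       ORDER BY timecreated, id   -- id breaks ties on equal timestamps
                   ) AS rn
            FROM mdl_files
        ) AS t
        WHERE rn = 1
        ORDER BY contenthash
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Hypothetical rows: (id, contenthash, timecreated, filesize).
files = [
    (1, "aaa", 100, 10),
    (2, "aaa", 90, 10),   # older duplicate of "aaa", so this one is kept
    (3, "bbb", 50, 20),
    (4, "bbb", 50, 20),   # same timestamp as id 3, so id 3 is kept
]
print(oldest_row_per_hash(files))  # [(2, 'aaa', 10), (3, 'bbb', 20)]
```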
If you want to DELETE the duplicate data, you can use EXISTS in the WHERE clause.
DELETE
FROM mdl_files f WHERE EXISTS(
SELECT 1
FROM (
SELECT
id,
contenthash,
filesize,
ROW_NUMBER() OVER (PARTITION BY contenthash,filesize order by timecreated) rn
FROM mdl_files
) t
where t.rn > 1 and t.id = f.id
)

SQL: How to make a query that return last created row per each user from table's data

Consider following table's data
ID UserID ClassID SchoolID Created
2184 19313 10 28189 2010-10-25 14:16:39.823
46697 19313 10 27721 2011-04-04 14:50:49.433
•47423 19313 11 27721 2011-09-15 09:15:51.740
•47672 19881 11 42978 2011-09-19 17:31:12.853
3176 19881 11 42978 2010-10-27 22:29:41.130
22327 19881 9 45263 2011-02-14 19:42:41.320
46661 32810 11 41861 2011-04-04 14:26:14.800
•47333 32810 11 51721 2011-09-13 22:43:06.053
131 32810 11 51721 2010-09-22 03:16:44.520
I want to write a SQL query that returns the last created row for each UserID, so the result would be as below (the rows that begin with • above):
ID UserID ClassID SchoolID Created
47423 19313 11 27721 2011-09-15 09:15:51.740
47672 19881 11 42978 2011-09-19 17:31:12.853
47333 32810 11 51721 2011-09-13 22:43:06.053
You can use a CTE (Common Table Expression) with the ROW_NUMBER function:
;WITH LastPerUser AS
(
SELECT
ID, UserID, ClassID, SchoolID, Created,
ROW_NUMBER() OVER(PARTITION BY UserID ORDER BY Created DESC) AS RowNum
FROM dbo.YourTable
)
SELECT
ID, UserID, ClassID, SchoolID, Created
FROM LastPerUser
WHERE RowNum = 1
This CTE "partitions" your data by UserID, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by Created DESC - so the latest row gets RowNum = 1 (for each UserID) which is what I select from the CTE in the SELECT statement after it.
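The partition-and-number idea can be checked against the rows from the question. This is a minimal sketch on in-memory SQLite via Python's sqlite3 (assumed only to make it runnable; window functions need SQLite 3.25 or later), with Created stored as ISO-style text so that string order matches time order. The function name is illustrative.

```python
import sqlite3

def last_row_id_per_user(rows):
    """Return the ID of the most recently created row for each UserID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable (ID INTEGER, UserID INTEGER, ClassID INTEGER, "
        "SchoolID INTEGER, Created TEXT)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        WITH LastPerUser AS (
            SELECT ID, UserID,
                   ROW_NUMBER() OVER (
                       PARTITION BY UserID      -- numbering restarts for each user
                       ORDER BY Created DESC    -- newest row gets RowNum = 1
                   ) AS RowNum
            FROM YourTable
        )
        SELECT ID
        FROM LastPerUser
        WHERE RowNum = 1
        ORDER BY UserID
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids

# Rows from the question.
user_rows = [
    (2184, 19313, 10, 28189, "2010-10-25 14:16:39.823"),
    (46697, 19313, 10, 27721, "2011-04-04 14:50:49.433"),
    (47423, 19313, 11, 27721, "2011-09-15 09:15:51.740"),
    (47672, 19881, 11, 42978, "2011-09-19 17:31:12.853"),
    (3176, 19881, 11, 42978, "2010-10-27 22:29:41.130"),
    (22327, 19881, 9, 45263, "2011-02-14 19:42:41.320"),
    (46661, 32810, 11, 41861, "2011-04-04 14:26:14.800"),
    (47333, 32810, 11, 51721, "2011-09-13 22:43:06.053"),
    (131, 32810, 11, 51721, "2010-09-22 03:16:44.520"),
]
print(last_row_id_per_user(user_rows))  # [47423, 47672, 47333]
```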
I know this is an old question at this point, but I was having the same problem in MySQL, and I think I have figured out a standard sql way of doing this. I have only tested this with MySQL, but I don't believe I am using anything MySQL-specific.
select mainTable.* from YourTable mainTable, (
select UserID, max(Created) as Created
from YourTable
group by UserID
) dateTable
where mainTable.UserID = dateTable.UserID
and mainTable.Created = dateTable.Created

Sql Get the last 6

I need a SQL query that uses the record ID to get the times of the last 6 trips, along with the Nooftime value.
So the records in the table are like the following,
RecordID Nooftime Day&Time
1001 1 12/11/2009 14:11
1001 2 13/11/2009 12:11
1001 3 14/11/2009 11:11
1001 4 16/11/2009 14:11
1001 5 17/11/2009 14:11
1001 6 20/11/2009 13:11
1001 7 25/11/2009 09:11
I need a query that shows only the last 6 visits, on one line.
If you're dealing with SQL Server try
SELECT TOP 6 RecordID, NoOfTime, [Day&Time] FROM (table) ORDER BY [Day&Time] DESC
Oracle
SELECT x.*
FROM (SELECT t.*
FROM TABLE t
ORDER BY day&time DESC) x
WHERE ROWNUM <= 6
SQL Server
SELECT TOP 6 t.*
FROM TABLE t
ORDER BY day&time DESC
MySQL/Postgres
SELECT t.*
FROM TABLE t
ORDER BY day&time DESC
LIMIT 6
How about
SELECT TOP(6) *
FROM [Trips]
ORDER BY [Day&Time] DESC
I'm not sure what you mean by "in one line," but this will fetch the 6 most recent records.
For the same line portion, you can use a cursor if you are using SQL server 2008 (might work with 2005 I am not sure).
SELECT RecordID, NoofTime, Day&Time FROM table ORDER BY Day&Time DESC LIMIT 6;
Depending on your SQL engine, you will probably have to put some quotes around Day&Time.
The above syntax works with MySQL and PostgreSQL. For the various syntaxes used for this kind of request, have a look at select-limit.
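If "in one line" means a single string listing the last 6 visits, one option is to fetch the rows with a LIMIT query and join them in application code. The sketch below makes several assumptions: it uses in-memory SQLite via Python's sqlite3, the table name trips and the function name are hypothetical, and it orders by Nooftime because the sample dates are dd/mm/yyyy text, which does not sort chronologically as a string.

```python
import sqlite3

def last_visits_on_one_line(rows, record_id, n=6):
    """Fetch the n most recent visits for one RecordID and join them into one string."""
    conn = sqlite3.connect(":memory:")
    # The column name contains '&', so it is quoted as an identifier.
    conn.execute(
        'CREATE TABLE trips (RecordID INTEGER, Nooftime INTEGER, "Day&Time" TEXT)'
    )
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    query = """
        SELECT Nooftime, "Day&Time"
        FROM trips
        WHERE RecordID = ?
        ORDER BY Nooftime DESC
        LIMIT ?
    """
    visits = conn.execute(query, (record_id, n)).fetchall()
    conn.close()
    return ", ".join(f"{count}: {when}" for count, when in visits)

# Rows from the question.
trip_rows = [
    (1001, 1, "12/11/2009 14:11"), (1001, 2, "13/11/2009 12:11"),
    (1001, 3, "14/11/2009 11:11"), (1001, 4, "16/11/2009 14:11"),
    (1001, 5, "17/11/2009 14:11"), (1001, 6, "20/11/2009 13:11"),
    (1001, 7, "25/11/2009 09:11"),
]
print(last_visits_on_one_line(trip_rows, 1001))
```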