SQLServer SQL query with a row counter - sql

I have a SQL query, that returns a set of rows:
SELECT id, name FROM users where group = 2
I need to also include a column that has an incrementing integer value, so the first row needs to have a 1 in the counter column, the second a 2, the third a 3 etc
The query shown here is just a simplified example, in reality the query could be arbitrarily complex, with several joins and nested queries.
I know this could be achieved using a temporary table with an autonumber field, but is there a way of doing it within the query itself ?

For starters, something along the lines of:
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
However, it's important to note that the ROW_NUMBER() OVER (ORDER BY ...) construct only determines the values of Row_Counter, it doesn't guarantee the ordering of the results.
Unless the SELECT itself has an explicit ORDER BY clause, the results could be returned in any order, dependent on how SQL Server decides to optimise the query. (See this article for more info.)
The only way to guarantee that the results will always be returned in Row_Counter order is to apply exactly the same ordering to both the SELECT and the ROW_NUMBER():
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
ORDER BY my_order_column -- exact copy of the ordering used for Row_Counter
The above pattern will always return results in the correct order and works well for simple queries, but what about an "arbitrarily complex" query with perhaps dozens of expressions in the ORDER BY clause? In those situations I prefer something like this instead:
SELECT t.*
FROM
(
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY ...) AS Row_Counter -- complex ordering
FROM my_table
) AS t
ORDER BY t.Row_Counter
Using a nested query means that there's no need to duplicate the complicated ORDER BY clause, which means less clutter and easier maintenance. The outer ORDER BY t.Row_Counter also makes the intent of the query much clearer to your fellow developers.

In SQL Server 2005 and up, you can use the ROW_NUMBER() function, which has options for the sort order and the groups over which the counts are done (and reset).

The simplest way is to use a variable row counter. However it would be two actual SQL commands. One to set the variable, and then the query as follows:
SET #n=0;
SELECT #n:=#n+1, a.* FROM tablename a
Your query can be as complex as you like with joins etc. I usually make this a stored procedure. You can have all kinds of fun with the variable, even use it to calculate against field values. The key is the :=

Heres a different approach.
If you have several tables of data that are not joinable, or you for some reason dont want to count all the rows at the same time but you still want them to be part off the same rowcount, you can create a table that does the job for you.
Example:
create table #test (
rowcounter int identity,
invoicenumber varchar(30)
)
insert into #test(invoicenumber) select [column] from [Table1]
insert into #test(invoicenumber) select [column] from [Table2]
insert into #test(invoicenumber) select [column] from [Table3]
select * from #test
drop table #test

Related

SQL Server - Pagination Without Order By Clause

My situation is that a SQL statement which is not predictable, is given to the program and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain order by clause. I am not able to change the query result of this SQL Statement and I need to do pagination on it.
I searched for solution on the Internet, but all of them suggested to use an arbitrary column, like primary key, in order by clause. But it will change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Further more, if you use an order by clause on a column that appears multiple times in your resultset, the order of the result set is still not guaranteed inside groups of values in said column - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what will you get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding row count field to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
As other notified using anyway without using a sorted query will not be safe, But as you know about it and search about it, I can suggest using a query like this (But not recommended as a good way)
;with cte as (
select *,
row_number() over (order by (select 0)) rn
from (
-- Your query
) t
)
select *
from cte
where rn between (#pageNumber-1)*#pageSize+1 and #pageNumber*#pageSize
[SQL Fiddle Demo]
I finally found a simple way to do it without any order by on a specific column:
declare #start AS INTEGER = 1, #count AS INTEGER = 5;
select * from (SELECT *,ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
FROM (select * from mytable) AS t) AS t2 order by fakeCounter OFFSET #start ROWS
FETCH NEXT #count ROWS ONLY
where select * from mytable can be any query

ORDER BY in a partition - SELECT keyword

Please see the DDL below:
create table #names (name varchar(20), Gender char(1))
insert into #names VALUES ('Ian', 'M')
insert into #names values ('Marie', 'F')
insert into #names values ('andy', 'F')
insert into #names values ('karen', 'F')
and the SQL below:
select row_number() over (order by (select null)) from #names
This adds a unique number to each row. I could also do this (which does not add a unique row):
select row_number() over (partition by gender order by (name)) from #names
Why do you not need 'SELECT name', however you do not need SELECT null?
As far as I can tell, this is just a quirk of SQL Server. SQL Server does not permit constants in ORDER BY (nor in GROUP BY, which can occur in other contexts).
Probably the origin of this is the ORDER BY clause in a SELECT statement:
ORDER BY 1
where "1" is a column reference rather than a constant. To prevent confusion, (I am guessing), the designers of the language do not allow other constants there. After all, would ORDER BY 2 + 1 refer to the third column? To the sum of the values in the two columns? To the constant 3?
I think this was just carried over into the windows syntax. There is a way around it -- as you have seen -- by using a subquery. The following should also work:
ROW_NUMBER() ORDER BY (CASE WHEN NAME = NULL THEN 'Never Happens' ELSE 'Always' END)
Because a column is mentioned, this is permitted. But, = NULL never returns true, so a constant is used for the sorting. I use the SELECT NULL subquery, however.
The Order By clause has 4 basic syntax structures.
Specifying a single column defined in the select list
Specifying a column that is not defined in the select list
Specifying an alias as the sort column
Specifying an expression as the sort column
You can review the MSDN documentation here.
https://msdn.microsoft.com/en-us/library/ms188385.aspx
I believe that your SELECT NULL, or really any constant that you want to specify in your order by clause, would require the select because the database engine is evaluating the constant as a #4 structure, an expression. As proof, in my query example, I have used a COUNT(*) in lieu of your select null.
I believe when you specify Name or Group in your order by clause, you are actually using a different order by structure, possibly #1. Here is my proof from the execution plan and results of your corrected first and original second query's sort operation. I have removed the partition because it's not relevant to our discussion.
select row_number() over (order by (select null)),name from #names
select row_number() over (order by name),name from #names
select row_number() over (order by Gender),name from #names
I have attached the execution plans for the three queries.
As you can see, no Sort Operation is performed on the data that is passed to the Segment operator which handles the window function. This is mirrored in the results of these queries, also pictured below.
So basically, SQL Server just ignored or did not operate on your Order By clause sub-query, because it could not associate the values which you returned in the sub-query to a particular parent column using method #4 and the reason that you do not specify "SELECT name" in your Order By sub-query is because you are actually using a different Order By syntax structure.

Get count and result from SQL query in Go

I'm running a pretty straightforward query using the database/sql and lib/pq (postgres) packages and I want to toss the results of some of the fields into a slice, but I need to know how big to make the slice.
The only solution I can find is to do another query that is just SELECT COUNT(*) FROM tableName;.
Is there a way to both get the result of the query AND the count of returned rows in one query?
Conceptually, the problem is that the database cursor may not be enumerated to the end so the database does not really know how many records you will get before you actually read all of them. The only way to count (in general case) is to go through all the records in the resultset.
But practically, you can enforce it to do so by using subqueries like
select *, (select count(*) from table) from table
and just ignore the second column for records other than first. But it is very rude and I do not recommend doing so.
Not sure if this is what you are asking for but you can call the ##Rowcount function to return the count of the previous select statement that has been executed.
SELECT mytable.mycol FROM mytable WHERE mytable.foo = 'bar'
SELECT ##Rowcount
If you want the row count included in your result set you can use the the OVER clause (MSDN)
SELECT mytable.mycol, count(*) OVER(PARTITION BY mytable.foo) AS 'Count' FROM mytable WHERE mytable.foo = 'bar'
You could also perhaps just separate two SQL statements with the a ; . This would return a result set of both statements executed.
You would used count(*)
SELECT count(distinct last)
FROM (XYZTable)
WHERE date(FROM_UNIXTIME(time)) >= '2013-10-28' AND
id = 90 ;

Joining Two Same-Sized Resultsets by Row Number

I have two table functions that return a single column each. One function is guaranteed to return the same number of rows as the other.
I want to insert the values into a new two-column table. One colum will receive the value from the first udf, the second column from the second udf. The order of the inserts will be the order in which the rows are returned by the udfs.
How can I JOIN these two udfs given that they do not share a common key? I've tried using a ROW_NUMBER() but can't quite figure it out:
INSERT INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT udf1.[value], udf2.[value]
FROM dbo.udf1() udf1
INNER JOIN dbo.udf2() udf2 ON ??? = ???
This will not help you, but SQL does not guarantee row order unless it is asked to explicitly, so the idea that they will be returned in the order you expect may be true for a given set, but as I understand the idea of set based results, is fundamentally not guaranteed to work properly. You probably want to have a key returned from the UDF if it is associated with something that guarantees the order.
Despite this, you can do the following:
declare #val int
set #val=1;
Select Val1,Val2 from
(select Value as Val2, ROW_NUMBER() over (order by #val) r from udf1) a
join
(select Value as Val2, ROW_NUMBER() over (order by #val) r from udf2) b
on a.r=b.r
The variable addresses the issue of needing a column to sort by.
If you have the privlidges to edit the UDF, I think the better practice is to already sort the data coming out of the UDF, and then you can add ident int identity(1,1) to your output table in the udf, which makes this clear.
The reaosn this might matter is if your server decided to split the udf results into two packets. If the two arrive out of the order you expected, SQL could return them in the order received, which ruins the assumption made that he UDF will return rows in order. This may not be an issue, but if the result is needed later for a real system, proper programming here prevents unexpected bugs later.
In SQL, the "order returned by the udfs" is not guaranteed to persist (even between calls).
Try this:
WITH q1 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever1) rn
FROM udf1()
),
q2 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever2) rn
FROM udf2()
)
INSERT
INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT q1.value, q2.value
FROM q1
JOIN q2
ON q2.rn = q1.rn
PostgreSQL 9.4+ could append a INT8 column at the end of the udfs result using the WITH ORDINALITY suffix
-- set returning function WITH ORDINALITY
SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
ls | n
-----------------+----
pg_serial | 1
pg_twophase | 2
postmaster.opts | 3
pg_notify | 4
official doc: http://www.postgresql.org/docs/devel/static/functions-srf.html
related blogspot: http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-with-ordinality/

How to find the item index number in a query result

is there a simple way to get the number of the current item in a simple SELECT? I need to deliver a column based on a calculation that involves the number of the current index in the select. I am simplifing my problem to an extreme, but roughly speaking, here is an example:
SELECT column1 * ITEMINDEX FROM table
I hope I am being clear. I am using SQL Server. Thank you.
In SQL Server 2005+:
SELECT m.*, ROW_NUMBER() OVER (ORDER BY column) AS rn
FROM mytable m
SQL does not have concept of implicit row number, that's why you need ORDER BY clause to define the order of rows.