Joining Two Same-Sized Resultsets by Row Number - sql

I have two table functions that return a single column each. One function is guaranteed to return the same number of rows as the other.
I want to insert the values into a new two-column table. One column will receive the value from the first UDF, the second column the value from the second UDF. The rows should be inserted in the order in which they are returned by the UDFs.
How can I JOIN these two udfs given that they do not share a common key? I've tried using a ROW_NUMBER() but can't quite figure it out:
INSERT INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT udf1.[value], udf2.[value]
FROM dbo.udf1() udf1
INNER JOIN dbo.udf2() udf2 ON ??? = ???

This will not help you directly, but SQL does not guarantee row order unless you ask for it explicitly. The idea that the rows will come back in the order you expect may hold for a given set, but with set-based results it is fundamentally not guaranteed. You probably want the UDFs to return a key that is associated with whatever determines the order.
Despite this, you can do the following:
declare @val int
set @val=1;
Select Val1, Val2 from
(select [Value] as Val1, ROW_NUMBER() over (order by @val) r from dbo.udf1()) a
join
(select [Value] as Val2, ROW_NUMBER() over (order by @val) r from dbo.udf2()) b
on a.r = b.r
The variable addresses the issue of needing a column to sort by.
If you have the privileges to edit the UDFs, I think the better practice is to sort the data inside the UDF and add an ident int identity(1,1) column to the UDF's output table, which makes the intended order explicit.
The reason this matters is that the server may split the UDF results into several packets. If those arrive out of the order you expected, SQL Server could return them in the order received, which breaks the assumption that the UDF returns rows in order. This may not be an issue today, but if the result is needed later in a real system, careful programming here prevents unexpected bugs.
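A minimal sketch of that idea, assuming a multi-statement table function; dbo.SourceTable and SortColumn are invented placeholders, not from the original post:
CREATE FUNCTION dbo.udf1_ordered()
RETURNS @result TABLE
(
    ident   INT IDENTITY(1,1),  -- explicit ordering key exposed to callers
    [value] INT
)
AS
BEGIN
    -- Identity values are assigned following the ORDER BY of the INSERT ... SELECT,
    -- so ident reflects the intended order and can be joined on.
    INSERT INTO @result ([value])
    SELECT [value]
    FROM dbo.SourceTable    -- assumed source
    ORDER BY SortColumn;    -- assumed ordering column
    RETURN;
END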

In SQL, the "order returned by the udfs" is not guaranteed to persist (even between calls).
Try this:
WITH q1 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever1) rn
FROM udf1()
),
q2 AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY whatever2) rn
FROM udf2()
)
INSERT
INTO dbo.NewTwoColumnTable (Column1, Column2)
SELECT q1.value, q2.value
FROM q1
JOIN q2
ON q2.rn = q1.rn

PostgreSQL 9.4+ can append an int8 (bigint) column to the end of a set-returning function's result using the WITH ORDINALITY suffix:
-- set returning function WITH ORDINALITY
SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n);
       ls        | n
-----------------+----
 pg_serial       |  1
 pg_twophase     |  2
 postmaster.opts |  3
 pg_notify       |  4
Official doc: http://www.postgresql.org/docs/devel/static/functions-srf.html
Related blog post: http://michael.otacoo.com/postgresql-2/postgres-9-4-feature-highlight-with-ordinality/
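Applied to the question's scenario, a hedged sketch (assuming the same setup ported to Postgres, with udf1() and udf2() each returning a single value column):
INSERT INTO NewTwoColumnTable (Column1, Column2)
SELECT f1.value, f2.value
FROM udf1() WITH ORDINALITY AS f1(value, n)
JOIN udf2() WITH ORDINALITY AS f2(value, n) USING (n);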

Related

Incorporate a concatenation and count in a SQL update command?

I am looking for a way to update records so that each entry gets an incrementing number appended to the end of a string. In my case, I'm trying to update a field named FiberID. Each record should end up as JCK0.R000.Ax, where x is 1, 2, 3, ..., 24.
Ideal result:
FiberID
JCK0.R000.A1
JCK0.R000.A2
JCK0.R000.A3
... and so on until it reaches A24.
Here is an example of the data.
This seems so useful that I'm sure it has been discussed here before, but for whatever reason I'm not seeing anything.
You could use row_number() and an updatable CTE:
with cte as (
select
fiber_id,
concat(
fiber_id,
'.A',
cast(row_number() over (partition by fiber_id order by id) as varchar(2))
) new_fiber_id
from mytable
)
update cte set fiber_id = new_fiber_id
This assumes that you have a column called id that can be used to order records having the same fiber_id.
Side note: it is unclear why you should have exactly 24 numbers per fiber_id, and your sample data does not show that. This will assign increasing numbers to duplicate fiber_ids, regardless of how many there are.

SQL Server - Pagination Without Order By Clause

My situation is that an unpredictable SQL statement is given to the program, and I need to do pagination on top of it. The final SQL statement would be similar to the following one:
SELECT * FROM (*Given SQL Statement*) b
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
The problem here is that the *Given SQL Statement* is unpredictable. It may or may not contain an order by clause. I am not able to change the query result of this SQL statement, and I need to do pagination on it.
I searched for a solution on the Internet, but everything suggested using an arbitrary column, like the primary key, in the order by clause. But that would change the original order.
The short answer is that it can't be done, or at least can't be done properly.
The problem is that SQL Server (or any RDBMS) does not and can not guarantee the order of the records returned from a query without an order by clause.
This means that you can't use paging on such queries.
Furthermore, if you use an order by clause on a column whose values appear multiple times in your result set, the order within each group of equal values is still not guaranteed - quick example:
;WITH cte (a, b)
AS
(
SELECT 1, 'a'
UNION ALL
SELECT 1, 'b'
UNION ALL
SELECT 2, 'a'
UNION ALL
SELECT 2, 'b'
)
SELECT *
FROM cte
ORDER BY a
Both result sets are valid, and you can't know in advance what you will get:
a b
-----
1 b
1 a
2 b
2 a
a b
-----
1 a
1 b
2 a
2 b
(and of course, you might get other sorts)
The problem here is that the *Given SQL Statement" is unpredictable. It may or may not contain order by clause.
your inner query(unpredictable sql statement) should not contain order by,even if it contains,order is not guaranteed.
To get guaranteed order,you have to order by some column.for the results to be deterministic,the ordered column/columns should be unique
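For illustration, a minimal sketch of deterministic paging using invented table and column names; the unique id column breaks ties in created_at:
SELECT id, name, created_at
FROM dbo.Orders
ORDER BY created_at, id   -- id makes the ordering unique, hence deterministic
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;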
Please note: what I'm about to suggest is probably horribly inefficient and should really only be used to help you go back to the project leader and tell them that pagination of an unordered query should not be done. Having said that...
From your comments you say you are able to change the SQL statement before it is executed.
You could write the results of the original query to a temporary table, adding a row-number (identity) column to be used for subsequent pagination ordering.
Therefore any original ordering is preserved and you can now paginate.
But of course the reason for needing pagination in the first place is to avoid sending large amounts of data to the client application. Although this does prevent that, you will still be copying data to a temp table which, depending on the row size and count, could be very slow.
You also have the problem that the page size is coming from the client as part of the SQL statement. Parsing the statement to pick that out could be tricky.
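As a rough sketch of that temp-table idea (the column names and the inner SELECT are placeholders standing in for the given statement, not taken from the question):
CREATE TABLE #paged
(
    rn   INT IDENTITY(1,1),   -- row counter assigned as rows are inserted
    id   INT,
    name NVARCHAR(100)
);

INSERT INTO #paged (id, name)
SELECT id, name FROM dbo.SomeSource;   -- stand-in for the given SQL statement

SELECT id, name
FROM #paged
ORDER BY rn
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;

DROP TABLE #paged;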
As others have noted, paginating without a sorted query is not safe. But since you already know that and have researched it, here is a query you could use (though it is not recommended as good practice):
;with cte as (
    select *,
        row_number() over (order by (select 0)) rn
    from (
        -- Your query
    ) t
)
select *
from cte
where rn between (@pageNumber-1)*@pageSize+1 and @pageNumber*@pageSize
I finally found a simple way to do it without any order by on a specific column:
declare @start AS INTEGER = 1, @count AS INTEGER = 5;

select *
from (
    select *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS fakeCounter
    from (select * from mytable) AS t
) AS t2
order by fakeCounter
OFFSET @start ROWS FETCH NEXT @count ROWS ONLY
where select * from mytable can be any query

Compare result of two table functions using one column from each

According to the instructions here, I have created two functions that use EXECUTE FORMAT and return the same table of (int, smallint).
Sample definitions:
CREATE OR REPLACE FUNCTION function1(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
CREATE OR REPLACE FUNCTION function2(IN _tbl regclass, IN _tbl2 regclass,
IN field1 integer)
RETURNS TABLE(id integer, dist smallint)
Both functions return the exact same number of rows. Sample result (will always be ordered by dist):
(49,0)
(206022,3)
(206041,3)
(92233,4)
Is there a way to compare values of the second field between the two functions for the same rows, to ensure that both results are the same:
For example:
SELECT
function1('tblp1','tblp2',49),function2('tblp1_v2','tblp2_v2',49)
Returns something like:
(49,0) (49,0)
(206022,3) (206022,3)
(206041,3) (206041,3)
(92233,4) (133,4)
Although I am not expecting identical results (each function is a topK query and I have ties which are broken arbitrarily / with some optimizations in the second function for faster performance) I can ensure that both functions return correct results, if for each row the second numbers in the results are the same. In the example above, I can ensure I get correct results, because:
1st row 0 = 0,
2nd row 3 = 3,
3rd row 3 = 3,
4th row 4 = 4
despite the fact that for the 4th row, 92233!=133
Is there a way to get only the 2nd field of each function result, to batch compare them e.g. with something like:
SELECT COUNT(*)
FROM
(SELECT
function1('tblp1','tblp2',49).field2,
function2('tblp1_v2','tblp2_v2',49).field2 ) n2
WHERE function1('tblp1','tblp2',49).field2 != function2('tblp1_v2','tblp2_v2',49).field2;
I am using PostgreSQL 9.3.
Is there a way to get only the 2nd field of each function result, to batch compare them?
All of the following answers assume that rows are returned in matching order.
Postgres 9.3
With the quirky feature of exploding rows from SRF functions returning the same number of rows in parallel:
SELECT count(*) AS mismatches
FROM (
SELECT function1('tblp1','tblp2',49) AS f1
, function2('tblp1_v2','tblp2_v2',49) AS f2
) sub
WHERE (f1).dist <> (f2).dist; -- note the parentheses!
The parentheses around the row type are necessary to disambiguate from a possible table reference. Details in the manual here.
This defaults to Cartesian product of rows if the number of returned rows is not the same (which would break it completely for you).
Postgres 9.4
WITH ORDINALITY to generate row numbers on the fly
You can use WITH ORDINALITY to generate a row number on the fly, so you don't need to depend on pairing the results of SRF functions in the SELECT list:
SELECT count(*) AS mismatches
FROM function1('tblp1','tblp2',49) WITH ORDINALITY AS f1(id,dist,rn)
FULL JOIN function2('tblp1_v2','tblp2_v2',49) WITH ORDINALITY AS f2(id,dist,rn) USING (rn)
WHERE f1.dist IS DISTINCT FROM f2.dist;
This works for the same number of rows from each function as well as differing numbers (which would be counted as mismatch).
Related:
PostgreSQL unnest() with element number
ROWS FROM to join sets row-by-row
SELECT count(*) AS mismatches
FROM ROWS FROM (function1('tblp1','tblp2',49)
, function2('tblp1_v2','tblp2_v2',49)) t(id1, dist1, id2, dist2)
WHERE t.dist1 IS DISTINCT FROM t.dist2;
Related answer:
Is it possible to answer queries on a view before fully materializing the view?
Aside:
EXECUTE FORMAT is not, by itself, the set-returning plpgsql feature; RETURN QUERY (EXECUTE) is. format() is just a convenient function for building a query string and can be used anywhere in SQL or plpgsql.
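For illustration, a hedged sketch of that combination; the function name, parameter, and column are invented:
CREATE OR REPLACE FUNCTION get_ids(_tbl regclass)
  RETURNS TABLE (id integer) AS
$func$
BEGIN
   -- format() only builds the query string; RETURN QUERY EXECUTE returns its rows
   RETURN QUERY EXECUTE format('SELECT id FROM %s ORDER BY id', _tbl);
END
$func$ LANGUAGE plpgsql;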
The order in which the rows are returned from the functions is not guaranteed. If you can return a row number (rn in the example below) from the functions, then:
select
    count(f1.dist is distinct from f2.dist or null) as diff_count
from
    function1('tblp1','tblp2',49) f1
full join
    function2('tblp1_v2','tblp2_v2',49) f2 using (rn)
For future reference:
Checking difference in number of rows:
SELECT ABS(count(f1a.*) - count(f2a.*))
FROM  (SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) rn
       FROM   function1('tblp1','tblp2',49) f1) f1a
FULL JOIN
      (SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) rn
       FROM   function2('tblp1_v2','tblp2_v2',49) f2) f2a
USING (rn);
Checking difference in dist for same ordered rows:
SELECT count(*)
FROM  (SELECT f1.dist, row_number() OVER (ORDER BY f1.dist) rn
       FROM   function1('tblp1','tblp2',49) f1) f1a,
      (SELECT f2.dist, row_number() OVER (ORDER BY f2.dist) rn
       FROM   function2('tblp1_v2','tblp2_v2',49) f2) f2a
WHERE f1a.rn = f2a.rn
  AND f1a.dist <> f2a.dist;
A simple OVER () might also work since the functions' results are already ordered, but the ORDER BY inside the window is kept as an extra check.

How to retrieve the last 2 records from table?

I have a table with n number of records
How can I retrieve the nth record and the (n-1)th record from my table in SQL without using a derived table?
I have tried using ROWID as
select * from table where rowid in (select max(rowid) from table);
It gives the nth record, but I want the (n-1)th record as well.
And is there any other method, other than using max, derived tables and pseudo columns?
Thanks
You cannot depend on rowid to get you to the last row in the table. You need an auto-incrementing id or creation time to have the proper ordering.
You can use, for instance:
select *
from (select t.*, row_number() over (order by <id> desc) as seqnum
from t
) t
where seqnum <= 2
Although allowed in the syntax, the order by clause in a subquery is ignored (for instance http://docs.oracle.com/javadb/10.8.2.2/ref/rrefsqlj13658.html).
Just to be clear, rowids have nothing to do with the ordering of rows in a table. The Oracle documentation is quite clear that they specify a physical access path for the data (http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i6732). It is true that in an empty database, inserting records into a new table will probably create a monotonically increasing sequence of row ids. But you cannot depend on this. The only guarantees with rowids are that they are unique within a table and are the fastest way to access a particular row.
I have to admit that I cannot find good documentation on Oracle handling or not handling order by's in subqueries in its most recent versions. ANSI SQL does not require compliant databases to support order by in subqueries. Oracle syntax allows it, and it seems to work in some cases, at least. My best guess is that it would probably work on a single processor, single threaded instance of Oracle, or if the data access is through an index. Once parallelism is introduced, the results would probably not be ordered. Since I started using Oracle (in the mid-1990s), I have been under the impression that order bys in subqueries are generally ignored. My advice would be to not depend on the functionality, until Oracle clearly states that it is supported.
select * from (select * from my_table order by rowid desc) where rownum <= 2
and for rows between N and M:
select * from (
  select t.*, rownum rn from (
    select * from my_table order by rowid
  ) t
  where rownum <= M
) where rn >= N
Try this
select top 2 * from table order by rowid desc
Assuming rowid is a column in your table:
SELECT * FROM table ORDER BY rowid DESC LIMIT 2

SQLServer SQL query with a row counter

I have a SQL query, that returns a set of rows:
SELECT id, name FROM users where group = 2
I also need to include a column that has an incrementing integer value, so the first row needs to have a 1 in the counter column, the second a 2, the third a 3, etc.
The query shown here is just a simplified example, in reality the query could be arbitrarily complex, with several joins and nested queries.
I know this could be achieved using a temporary table with an autonumber field, but is there a way of doing it within the query itself ?
For starters, something along the lines of:
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
However, it's important to note that the ROW_NUMBER() OVER (ORDER BY ...) construct only determines the values of Row_Counter, it doesn't guarantee the ordering of the results.
Unless the SELECT itself has an explicit ORDER BY clause, the results could be returned in any order, dependent on how SQL Server decides to optimise the query. (See this article for more info.)
The only way to guarantee that the results will always be returned in Row_Counter order is to apply exactly the same ordering to both the SELECT and the ROW_NUMBER():
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY my_order_column) AS Row_Counter
FROM my_table
ORDER BY my_order_column -- exact copy of the ordering used for Row_Counter
The above pattern will always return results in the correct order and works well for simple queries, but what about an "arbitrarily complex" query with perhaps dozens of expressions in the ORDER BY clause? In those situations I prefer something like this instead:
SELECT t.*
FROM
(
SELECT my_first_column, my_second_column,
ROW_NUMBER() OVER (ORDER BY ...) AS Row_Counter -- complex ordering
FROM my_table
) AS t
ORDER BY t.Row_Counter
Using a nested query means that there's no need to duplicate the complicated ORDER BY clause, which means less clutter and easier maintenance. The outer ORDER BY t.Row_Counter also makes the intent of the query much clearer to your fellow developers.
In SQL Server 2005 and up, you can use the ROW_NUMBER() function, which has options for the sort order and the groups over which the counts are done (and reset).
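For example, a sketch using the question's users table (the ORDER BY column is chosen arbitrarily here):
SELECT id, name,
       ROW_NUMBER() OVER (ORDER BY name) AS counter,                        -- numbers the whole result set
       ROW_NUMBER() OVER (PARTITION BY [group] ORDER BY name) AS counter2   -- restarts at 1 for each [group]
FROM users;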
The simplest way is to use a variable row counter. However, it requires two actual SQL commands: one to set the variable, and then the query, as follows:
SET @n=0;
SELECT @n:=@n+1, a.* FROM tablename a
Your query can be as complex as you like with joins etc. I usually make this a stored procedure. You can have all kinds of fun with the variable, even use it to calculate against field values. The key is the :=
Here's a different approach.
If you have several tables of data that are not joinable, or you for some reason don't want to count all the rows at the same time but still want them to be part of the same row count, you can create a table that does the job for you.
Example:
create table #test (
rowcounter int identity,
invoicenumber varchar(30)
)
insert into #test(invoicenumber) select [column] from [Table1]
insert into #test(invoicenumber) select [column] from [Table2]
insert into #test(invoicenumber) select [column] from [Table3]
select * from #test
drop table #test