SQL Join doesn't match all elements - sql

I was working on this question: How can I randomly do a partial outer join in SQL. In the end my approach didn't work, because it can assign the same row multiple times.
But I ran into behavior I can't explain: the query doesn't return the expected number of rows.
SQL DEMO
WITH tableA as (
SELECT T.id
FROM ( VALUES (111), (222), (333), (444), (555) ) T(id)
), tableB as (
SELECT *, row_number() over (order by note) as rn
FROM ( VALUES ('a'), ('b'), ('c'), ('d'), ('e'),
('f'), ('g'), ('h'), ('i'), ('j'),
('k'), ('l'), ('m'), ('n'), ('o')
) T(note)
), parameter as (
SELECT 3 as row_limit, (SELECT MAX(rn) FROM tableB) as max_limit
), Nums AS (
SELECT n = ROW_NUMBER() OVER (ORDER BY [object_id])
FROM sys.all_objects
), random_id as (
SELECT tableA.*, T.n, floor(p.max_limit * RAND(convert(varbinary, newid()))) + 1 magic_number
FROM tableA
CROSS JOIN parameter p
CROSS JOIN (SELECT n
FROM Nums
CROSS JOIN parameter p
WHERE n <= p.row_limit ) T
)
-- SELECT * FROM random_id
SELECT R.*, note
FROM random_id R
JOIN tableB
ON R.magic_number = tableB.rn
ORDER BY id
The setup: tableA has 5 rows and tableB has 15 rows. I want 3 random tableB rows for each row in tableA, so in total the query should return 3 * 5 = 15 rows.
I create a row_number() from 1 to 15 in tableB to match against the magic number.
I create the random_id CTE to assign three random numbers to each row of tableA. Here you can see the 15 rows with a random number; it also shows the problem of the same value being assigned twice:
SELECT * FROM random_id;
But the JOIN returns a random number of rows, sometimes more and sometimes fewer than 15:
SELECT R.*, note
FROM random_id R
JOIN tableB
ON R.magic_number = tableB.rn
ORDER BY id
But if I use a LEFT JOIN instead, it always returns 15 rows.
Question: If the random_id CTE always returns 15 rows, how can the JOIN return more rows? And how can it return fewer, if all rn values are in tableB?
And why does the LEFT JOIN always return 15 rows?
I just tested another query where I include the n value along with the JOIN.

Unfortunately, you would have to persist the rows that use the magic_number to a temporary table or a similar construct.
rextester demo here: http://rextester.com/ICS74177
Unfortunately persisting it to a temporary table does result in a loss of elegance of the answer you were attempting. I have run into the same situation in the past trying to do the same thing and encountered the same bug. It's both somewhat exciting and ultimately disappointing when you run into it for the first time, so congrats on that at least!
I cannot explain it any better, so please upvote Paul White's answer here: https://dba.stackexchange.com/a/30348/43889
Reference:
bug with newid() and table expressions (ctes)
newid() In Joined Virtual Table Causes Unintended Cross Apply Behavior - Answer by Paul White
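A rough way to see why an inner join can return a varying number of rows is the toy Python model below. It assumes, as the references describe, that the NEWID()-based magic_number is not guaranteed to be computed once per random_id row: if the expression ends up being recomputed during the join, each tableB row is compared against a different random value, so one random_id row can match zero, one or several tableB rows. The function name and parameters are made up for illustration; this does not reproduce SQL Server's actual plan and does not try to explain the LEFT JOIN case.

```python
import random


def count_join_rows(re_evaluate: bool, seed: int,
                    a_rows: int = 5, per_row: int = 3, b_rows: int = 15) -> int:
    """Toy model: count rows an inner join on magic_number = rn would return."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(a_rows * per_row):            # the 15 rows of random_id
        magic = rng.randint(1, b_rows)           # one value per random_id row
        for rn in range(1, b_rows + 1):          # tableB.rn = 1..15
            # re_evaluate=True: a fresh random value for every comparison,
            # as if the expression were recomputed inside the join.
            current = rng.randint(1, b_rows) if re_evaluate else magic
            if current == rn:
                matches += 1
    return matches


print([count_join_rows(False, s) for s in range(5)])  # [15, 15, 15, 15, 15]
print([count_join_rows(True, s) for s in range(5)])   # typically not all 15
```

With re_evaluate=False every random_id row matches exactly one rn, which is the stability that persisting random_id to a temporary table buys you.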

Related

MS SQL Server how to instantly insert a 1 to 10 number column to table (virtual column)?

Any idea how to instantly add a number column (1 to 10) for each row's value in an existing table?
You can use a CROSS JOIN in concert with an ad-hoc tally table
Example
Select A.*
,B.Code
From YourTable A
Cross Join ( Select Top 10 Code=row_number() Over (Order By (Select NULL)) From master..spt_values n1 ) B
You can generate the rows with a recursive query, then cross join that with your table.
with codes as (
select 1 code
union all select code + 1 from codes where code < 10
)
select t.*, c.code
from mytable t
cross join codes c
For a small number of rows, I would expect the recursive query to be faster than TOP 10 against a large table.
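A runnable sketch of the recursive-CTE approach, using Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite spells it WITH RECURSIVE; in T-SQL the keyword is omitted, and recursion deeper than 100 levels needs OPTION (MAXRECURSION ...). The table mytable and its name column are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample table standing in for "mytable".
conn.execute("CREATE TABLE mytable (name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?)", [("x",), ("y",)])

rows = conn.execute("""
    WITH RECURSIVE codes(code) AS (
        SELECT 1
        UNION ALL
        SELECT code + 1 FROM codes WHERE code < 10  -- stop at 10
    )
    SELECT t.name, c.code
    FROM mytable AS t
    CROSS JOIN codes AS c
    ORDER BY t.name, c.code
""").fetchall()

print(len(rows))  # 20: each of the 2 rows repeated with codes 1..10
```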

Msg 120, Level 15, State 1, Procedure Generate_Exame, Line 6,The select list for the INSERT statement contains fewer items than the insert list

I want to insert into the Question table, which has these columns:
C#_T_F_Id, C#_T_F_Q, C#_T_F_Choices, C#_Mcq_Id, C#_MCQ_Q, C#_Choices
After executing the Generate_Exame procedure I get the error above. What should I do?
create procedure Generate_Exame
@course_id int
as
if @course_id = 600
begin
insert into [dbo].[Question](C#_T_F_Id, C#_T_F_Q, C#_T_F_Choices,
C#_Mcq_Id, C#_MCQ_Q, C#_Choices)
select *
from
(select top(3)
T.C#_T_F_Id, T.C#_T_F_Q, T.C#_T_F_Choices
from
C#_T_F T
order by
newid()) as t1
union all
select *
from
(select top(7)
C.C#_Mcq_Id C#_Q_id, C.C#_MCQ_Q C#_question, C.C#_Choices Choices
from
C#_MCQ C
order by
newid()) as t2)
end
If I understand well you want to:
Insert data into a table from a combined result set.
Combine two result sets side by side. The first one provides columns 1, 2, and 3, while the second one provides column 4, 5, and 6.
On top of this, the two result sets (left and right) do not have the same length. One has 3 rows, while the other has 7 rows. I assume these numbers may vary.
There's no set order for the rows on the left, or the rows on the right. You are producing them by ordering using a random UUID, so that can change every time you run the query.
In order to do this you need to produce a row number on each side. Then a simple full join will combine both result sets.
For example:
insert into [dbo].[Question] (
C#_T_F_Id, C#_T_F_Q, C#_T_F_Choices,
C#_Mcq_Id, C#_MCQ_Q, C#_Choices
)
select -- Step #4: produce combined rows, ready for insert
a.C#_T_F_Id, a.C#_T_F_Q, a.C#_T_F_Choices,
b.C#_Q_id, b.C#_question, b.Choices
from ( -- Step #1: Produce the left result set with row number (rn)
select *, row_number() over(order by ord) as rn
from (
select top(3)
T.C#_T_F_Id, T.C#_T_F_Q, T.C#_T_F_Choices,
newid() as ord
from C#_T_F T
order by ord
) x
) a
full join ( -- Step #2: Produce the right result set with row number (rn)
select *, row_number() over(order by ord) as rn
from (
select top(7)
C.C#_Mcq_Id C#_Q_id, C.C#_MCQ_Q C#_question, C.C#_Choices Choices,
newid() as ord
from C#_MCQ C
order by ord
) y
) b on a.rn = b.rn -- Step #3: Full join both result sets by row number (rn)
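The row-number pairing can be modelled in plain Python: sampling each side mimics TOP(n) ... ORDER BY newid(), and itertools.zip_longest pairs items by position while padding the shorter side with None, much like the NULLs a FULL JOIN on rn produces. The function build_exam and the pool contents are hypothetical; the sketch only illustrates the shape of the result, not the INSERT.

```python
import random
from itertools import zip_longest


def build_exam(tf_pool, mcq_pool, tf_count=3, mcq_count=7, seed=None):
    """Pair TOP(tf_count) and TOP(mcq_count) random picks side by side."""
    rng = random.Random(seed)
    # ~ SELECT TOP(n) ... ORDER BY newid(); raises ValueError if a pool is
    # smaller than requested (SQL's TOP would just return fewer rows).
    tf = rng.sample(tf_pool, tf_count)
    mcq = rng.sample(mcq_pool, mcq_count)
    # Position in each list acts as rn; missing partners become None (NULL).
    return list(zip_longest(tf, mcq))


rows = build_exam([f"tf{i}" for i in range(10)],
                  [f"mcq{i}" for i in range(20)], seed=1)
for row in rows:
    print(row)  # first 3 rows have both sides, last 4 have None on the left
```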
You have six columns in the INSERT clause, but only three columns come out of the UNION query.
-- You are inserting 6 columns
insert into [dbo].[Question](C#_T_F_Id, C#_T_F_Q, C#_T_F_Choices,
C#_Mcq_Id, C#_MCQ_Q, C#_Choices)
-- You are selecting only 3 columns.
select *
from
(select top(3)
T.C#_T_F_Id, T.C#_T_F_Q, T.C#_T_F_Choices
from
C#_T_F T
order by
newid()) as t1
union all
select *
from
(select top(7)
C.C#_Mcq_Id C#_Q_id, C.C#_MCQ_Q C#_question, C.C#_Choices Choices
from
C#_MCQ C
order by
newid()) as t2)
If you need six columns, you need to join the two SELECT statements in some way, based on a JOIN condition.

How to join two tables with the same number of rows in SQLite?

I have almost the same problem as described in this question. I have two tables with the same number of rows, and I would like to join them together one by one.
The tables are ordered, and I would like to keep this order after the join, if it is possible.
There is a rowid-based solution for MSSQL, but in SQLite rowid cannot be used if the table comes from a WITH statement (or RECURSIVE WITH).
It is guaranteed that the two tables have exactly the same number of rows, but this number is not known beforehand. It is also important to note that the same element may occur more than twice. The results are ordered, but none of the columns are unique.
Example code:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, table_b.s
FROM table_a
LEFT JOIN table_b ON ( table_a.rowid = table_b.rowid )
The result I would like to achieve is:
(2, 'valuex'),
(4, 'valuey'),
(5, 'valuez')
SQLFiddle: http://sqlfiddle.com/#!5/9eecb7/6888
This is quite complicated in SQLite -- because you are allowing duplicates. But you can do it. Here is the idea:
Summarize the table by the values.
For each value, get the count and offset from the beginning of the values.
Then use a join to associate the values and figure out the overlap.
Finally use a recursive CTE to extract the values that you want.
The following code assumes that n and s are ordered -- as you specify in your question. However, it would work (with small modifications) if another column specified the ordering.
You will notice that I have included duplicates in the sample data:
WITH table_a (n) AS (
SELECT 2 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 4 UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuey' UNION ALL
SELECT 'valuez' UNION ALL
SELECT 'valuez'
),
a as (
select a.n, count(*) as a_cnt,
(select count(*) from table_a a2 where a2.n < a.n) as a_offset
from table_a a
group by a.n
),
b as (
select b.s, count(*) as b_cnt,
(select count(*) from table_b b2 where b2.s < b.s) as b_offset
from table_b b
group by b.s
),
ab as (
select a.*, b.*,
max(a.a_offset, b.b_offset) as offset,
min(a.a_offset + a.a_cnt, b.b_offset + b.b_cnt) - max(a.a_offset, b.b_offset) as cnt
from a join
b
on a.a_offset + a.a_cnt - 1 >= b.b_offset and
a.a_offset <= b.b_offset + b.b_cnt - 1
),
cte as (
select n, s, offset, cnt, 1 as ind
from ab
union all
select n, s, offset, cnt, ind + 1
from cte
where ind < cnt
)
select n, s
from cte
order by n, s;
Here is a DB Fiddle showing the results.
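The run/offset idea behind the query above can be sketched in Python to check the logic. runs() plays the role of the a and b CTEs (value, count, offset), the overlap test mirrors the ab join condition, and repeating each pair cnt times stands in for the recursive cte. The helper names are hypothetical, and the inputs are assumed to be sorted and of equal length.

```python
from itertools import groupby


def runs(values):
    """Summarize a sorted list as (value, count, offset) tuples."""
    out, offset = [], 0
    for value, group in groupby(values):
        cnt = sum(1 for _ in group)
        out.append((value, cnt, offset))
        offset += cnt
    return out


def pair_by_runs(a, b):
    """Pair two sorted, equally long lists position by position via runs."""
    result = []
    for n, a_cnt, a_off in runs(a):
        for s, b_cnt, b_off in runs(b):
            # Overlap of positions [a_off, a_off + a_cnt) and [b_off, b_off + b_cnt)
            start = max(a_off, b_off)
            cnt = min(a_off + a_cnt, b_off + b_cnt) - start
            if cnt > 0:
                result.extend([(n, s)] * cnt)  # what the recursive cte expands
    return result


a = [2, 4, 4, 4, 5]
b = ["valuex", "valuey", "valuey", "valuez", "valuez"]
print(pair_by_runs(a, b))
# [(2, 'valuex'), (4, 'valuey'), (4, 'valuey'), (4, 'valuez'), (5, 'valuez')]
```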
I should note that this would be much simpler in almost any other database, using window functions (or perhaps variables in MySQL).
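For what it's worth, SQLite gained window functions in version 3.25.0 (2018), so on newer versions the simpler approach is available there too. A minimal sketch via Python's sqlite3 module, assuming the linked SQLite library is 3.25 or later. Ties in ORDER BY get arbitrary row numbers, which is harmless here because tied values are identical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute("""
    WITH table_a(n) AS (VALUES (2), (4), (4), (4), (5)),
         table_b(s) AS (VALUES ('valuex'), ('valuey'), ('valuey'),
                               ('valuez'), ('valuez')),
         a AS (SELECT n, ROW_NUMBER() OVER (ORDER BY n) AS rn FROM table_a),
         b AS (SELECT s, ROW_NUMBER() OVER (ORDER BY s) AS rn FROM table_b)
    SELECT a.n, b.s
    FROM a
    JOIN b ON a.rn = b.rn   -- pair the k-th row of each side
    ORDER BY a.rn
""").fetchall()
print(rows)
```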
Since the tables are ordered, you can add row_id values by comparing n values.
Still, to get better performance, the best approach would be to insert the ID values when creating the tables.
http://sqlfiddle.com/#!5/9eecb7/7014
WITH
table_a_a (n, id) AS
(
WITH table_a (n) AS
(
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
) ,
table_b_b (n, id) AS
(
WITH table_a (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
)
SELECT table_a.n, (select count(1) from table_a b where b.n <= table_a.n) id
FROM table_a
)
select table_a_a.n,table_b_b.n from table_a_a,table_b_b where table_a_a.ID = table_b_b.ID
Or convert the input set to a comma-separated list and try something like this:
http://sqlfiddle.com/#!5/9eecb7/7337
WITH RECURSIVE table_b( id,element, remainder ) AS (
SELECT 0,NULL AS element, 'valuex,valuey,valuz,valuz' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_b
WHERE remainder IS NOT NULL
),
table_a( id,element, remainder ) AS (
SELECT 0,NULL AS element, '2,4,5,7' AS remainder
UNION ALL
SELECT id+1,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, 0, INSTR( remainder, ',' ) )
ELSE
remainder
END AS element,
CASE
WHEN INSTR( remainder, ',' )>0 THEN
SUBSTR( remainder, INSTR( remainder, ',' )+1 )
ELSE
NULL
END AS remainder
FROM table_a
WHERE remainder IS NOT NULL
)
SELECT table_b.element, table_a.element FROM table_b, table_a WHERE table_a.element IS NOT NULL and table_a.id = table_b.id;
SQL
SELECT a1.n, b1.s
FROM table_a a1
LEFT JOIN table_b b1
ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
(SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
Explanation
The query simply counts the number of rows up until the current one for each table (based on the ordering column) and joins on this value.
Demo
See SQL Fiddle demo.
Assumptions
A single column is used for the ordering in each table. (But the query could easily be modified to allow multiple ordering columns.)
The ordering values in each table are unique.
The values in the ordering column aren't necessarily the same between the two tables.
It is known that table_a contains either the same or more rows than table_b. (If this isn't the case then a FULL OUTER JOIN would need to be emulated since SQLite doesn't provide one.)
No further changes to the table structure are allowed. (If they are, it would be more efficient to have pre-populated columns for the ordering).
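A quick way to try the correlated-count query is Python's sqlite3 module with physical copies of the sample tables (the CREATE/INSERT statements are assumptions, since the question uses CTEs). It relies on the answer's assumption that the ordering values are unique within each table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (n INTEGER);
    CREATE TABLE table_b (s TEXT);
    INSERT INTO table_a VALUES (2), (4), (5);
    INSERT INTO table_b VALUES ('valuex'), ('valuey'), ('valuez');
""")
rows = conn.execute("""
    SELECT a1.n, b1.s
    FROM table_a a1
    LEFT JOIN table_b b1
      -- position of a1 in table_a must equal position of b1 in table_b
      ON (SELECT COUNT(*) FROM table_a a2 WHERE a2.n <= a1.n) =
         (SELECT COUNT(*) FROM table_b b2 WHERE b2.s <= b1.s)
    ORDER BY a1.n
""").fetchall()
print(rows)  # [(2, 'valuex'), (4, 'valuey'), (5, 'valuez')]
```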
Either way...
Use something like
WITH
v_table_a (n, rowid) AS (
SELECT 2, 1
UNION ALL
SELECT 4, 2
UNION ALL
SELECT 5, 3
),
v_table_b (s, rowid) AS (
SELECT 'valuex', 1
UNION ALL
SELECT 'valuey', 2
UNION ALL
SELECT 'valuez', 3
)
SELECT v_table_a.n, v_table_b.s
FROM v_table_a
LEFT JOIN v_table_b ON ( v_table_a.rowid = v_table_b.rowid );
for "virtual" tables (with WITH or without),
WITH RECURSIVE vr_table_a (n, rowid) AS (
VALUES (2, 1)
UNION ALL
SELECT n + 2, rowid + 1 FROM vr_table_a WHERE rowid < 3
)
, vr_table_b (s, rowid) AS (
VALUES ('I', 1)
UNION ALL
SELECT s || 'I', rowid + 1 FROM vr_table_b WHERE rowid < 3
)
SELECT vr_table_a.n, vr_table_b.s
FROM vr_table_a
LEFT JOIN vr_table_b ON ( vr_table_a.rowid = vr_table_b.rowid );
for "virtual" tables using recursive WITHs (in this example the values differ from yours, but I guess you get the point) and
CREATE TABLE p_table_a (n INT);
INSERT INTO p_table_a VALUES (2), (4), (5);
CREATE TABLE p_table_b (s VARCHAR(6));
INSERT INTO p_table_b VALUES ('valuex'), ('valuey'), ('valuez');
SELECT p_table_a.n, p_table_b.s
FROM p_table_a
LEFT JOIN p_table_b ON ( p_table_a.rowid = p_table_b.rowid );
for physical tables.
I'd be careful with the last one, though. A quick test shows that rowid values are a) reused -- when some rows are deleted and others are inserted, the inserted rows get the rowids of the old rows (i.e. rowid in SQLite isn't unique past the lifetime of a row, whereas e.g. Oracle's rowid AFAIR is) -- and b) correspond to the order of insertion. But I don't know, and didn't find a clue in the documentation, whether that's guaranteed or subject to change in other/future implementations. Or maybe it's just a coincidence in my test environment.
(In general, the physical order of rows may change (even within the same database using the same DBMS, as a result of some reorganization) and is therefore not a good thing to rely on. Nor is it guaranteed that a query will return its result ordered by physical position in the table (it might use the order of some index instead, or have a partial result ordered some other way that influences the output's order). Consider designing your tables with common (sort) keys in corresponding rows, for ordering and for joining on.)
You can create temp tables to hold the CTE rows, then JOIN them on SQLite's rowid column.
CREATE TEMP TABLE temp_a(n integer);
CREATE TEMP TABLE temp_b(n VARCHAR(255));
WITH table_a(n) AS (
SELECT 2 n
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 5
)
INSERT INTO temp_a (n) SELECT n FROM table_a;
WITH table_b (n) AS
(
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
UNION ALL
SELECT 'valuew'
)
INSERT INTO temp_b (n) SELECT n FROM table_b;
SELECT *
FROM temp_a a
INNER JOIN temp_b b on a.rowid = b.rowid;
sqlfiddle:http://sqlfiddle.com/#!5/9eecb7/7252
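The temp-table approach can be run through Python's sqlite3 module as below. It assumes freshly created temp tables, where rowids are handed out 1, 2, 3, ... in insertion order; as other answers here point out, that is observed behaviour rather than something to build a design on. VALUES is used in place of the UNION ALL chains for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TEMP TABLE temp_a (n INTEGER);
    CREATE TEMP TABLE temp_b (n VARCHAR(255));
    WITH table_a(n) AS (VALUES (2), (4), (5), (5))
    INSERT INTO temp_a (n) SELECT n FROM table_a;
    WITH table_b(n) AS (VALUES ('valuex'), ('valuey'), ('valuez'), ('valuew'))
    INSERT INTO temp_b (n) SELECT n FROM table_b;
""")
# Join on the implicit rowid that SQLite assigned during the inserts.
rows = conn.execute(
    "SELECT a.n, b.n FROM temp_a a JOIN temp_b b ON a.rowid = b.rowid ORDER BY a.rowid"
).fetchall()
print(rows)
```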
It is possible to use the rowid inside a with statement but you need to select it and make it available to the query using it.
Something like this:
with tablea AS (
select id, rowid AS rid from someids),
tableb AS (
select details, rowid AS rid from somedetails)
select tablea.id, tableb.details
from
tablea
left join tableb on tablea.rid = tableb.rid;
However, as others have already warned you, it is a really bad idea. What if the app breaks after inserting into one table but before the other? What if you delete an old row? If you want to join two tables, you need to specify the field to join on. There are many things that could go wrong with this design. The closest sound alternative would be an incrementing id column that you store in the table and use in your application. Even simpler, merge them into one table.
Read this link for more information about the rowid: https://www.sqlite.org/lang_createtable.html#rowid
sqlfiddle: http://sqlfiddle.com/#!7/29fd8/1
The problem statement indicates:
The tables are ordered
If this means that the ordering is defined by the ordering of the values in the UNION ALL statements, and if SQLite respects that ordering, then the following solution may be of interest because, apart from small tweaks to the last three lines of the sample program, it adds just two lines:
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
That is, table A is table_a augmented with a rowid, and similarly for table B.
Unfortunately, there is a caveat, though it might just be the result of my not having found the relevant specifications. Before delving into that, however, here is the full proposed solution:
WITH
table_a (n) AS (
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 5
),
table_b (s) AS (
SELECT 'valuex'
UNION ALL
SELECT 'valuey'
UNION ALL
SELECT 'valuez'
),
A(rid,n) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, n FROM table_a),
B(rid,s) AS (SELECT ROW_NUMBER() OVER ( ORDER BY 1 ) rid, s FROM table_b)
SELECT A.n, B.s
FROM A LEFT JOIN B
ON ( A.rid = B.rid );
Caveat
The proposed solution has been tested against a variety of data sets using sqlite version 3.29.0, but whether or not it is, and will continue to be, "guaranteed" to work is unclear to me.
Of course, if SQLite offers no guarantees with respect to the ordering of the UNION ALL statements (that is, if the question is based on an incorrect assumption), then it would be interesting to see a well-founded reformulation.

How to find missing number from single column of table using SQL?

There are 999 rows, and they hold distinct numbers from 0-1000 with one number missing. How can I find that number using a SQL query?
Use something like this:
SELECT MIN(rn) AS firstMissedID -- assumes IDs start at 1; use rn - 1 in both places if they start at 0
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY ID) rn
FROM std) dt
WHERE rn < ID
I am assuming that the number column only contains numbers from 0 to 1000.
declare @minnumber int
declare @missingnumber int
select @minnumber=min(number) from Test
if(@minnumber=0)
begin
--500500 is the sum of the numbers 1 to 1000
select @missingnumber=500500-sum(number) from Test
end
else
begin
set @missingnumber=0
end
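The sum trick generalizes to any range: the numbers low..high add up to (low + high) * (high - low + 1) / 2, which is 500500 for 0..1000, so the missing value is that total minus the actual sum. A small Python sketch (the function name and defaults are illustrative). With this formula a missing 0 needs no special case, because it leaves the total unchanged and the difference comes out as 0:

```python
def missing_by_sum(numbers, low=0, high=1000):
    """Return the single value of low..high that is absent from numbers.

    Assumes numbers are distinct, all within low..high, with exactly one missing.
    """
    expected = (low + high) * (high - low + 1) // 2  # 500500 for 0..1000
    return expected - sum(numbers)


nums = [x for x in range(0, 1001) if x != 437]
print(missing_by_sum(nums))  # 437
```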
Use a tally table to get the list of 1000 rows, then use LEFT OUTER JOIN or NOT EXISTS to find the missing number:
WITH Tally (n) AS
(
-- 1000 rows
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) a(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) b(n)
CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) c(n)
)
SELECT Min(n)
FROM Tally T
LEFT OUTER JOIN YOURTABLE Y
ON T.N = Y.MISSING_NUM_COLUMN
WHERE Y.MISSING_NUM_COLUMN IS NULL
If you want the list of all missing numbers, remove the MIN from the last SELECT:
SELECT Missing_Numbers = n
FROM Tally T
LEFT OUTER JOIN YOURTABLE Y
ON T.N = Y.MISSING_NUM_COLUMN
WHERE Y.MISSING_NUM_COLUMN IS NULL
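The tally/anti-join pattern can be tried in Python's sqlite3 module, generating the tally with a recursive CTE instead of cross-joined VALUES lists. yourtable and missing_num_column are the placeholder names from the answer, filled here with made-up data (1..1000 without 17 and 600):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (missing_num_column INTEGER)")  # placeholder
conn.executemany(
    "INSERT INTO yourtable VALUES (?)",
    [(i,) for i in range(1, 1001) if i not in (17, 600)],
)
missing = [r[0] for r in conn.execute("""
    WITH RECURSIVE tally(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM tally WHERE n < 1000
    )
    SELECT t.n
    FROM tally t
    LEFT OUTER JOIN yourtable y ON t.n = y.missing_num_column
    WHERE y.missing_num_column IS NULL   -- keep tally rows with no match
    ORDER BY t.n
""")]
print(missing)  # [17, 600]
```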
Not sure if this is the best approach, but it'll work.
Create a temp table (#temptable1) with a column to hold integer values.
Fill the table with the values 1 to 1000, then run this query, which returns an entry only if it's not contained in your data table:
select value from #temptable1 where value not in (select Othervalue from othertable)
The select statement will return all missing values. (If Othervalue can be NULL, NOT IN returns no rows at all; NOT EXISTS avoids that.)

Pick random number within a range

I have a table (TableA) with an integer column (ColumnA), and there is already data in the table. I need to write a select statement that inserts into this table, with the integer column getting random values up to 5000. Each value must not already be in ColumnA of TableA.
Insert into TableA (columnA,<collist....>)
SELECT <newColId> ,<collist....> from TableB <where clause>
You can create a helper numbers table for this:
-- create helper numbers table, faster than online recursive CTE
-- can use master..spt_values, but actually numbers table will be useful
-- for other tasks too
create table numbers (n int primary key)
;with cte_numbers as (
select 1 as n
union all
select n + 1 from cte_numbers where n < 5000
)
insert into numbers
select n
from cte_numbers
option (maxrecursion 0);
and then insert some numbers you don't have in TableA (using join on row_number() so you can insert multiple rows at once):
;with cte_n as (
select n.n, row_number() over(order by newid()) as rn
from numbers as n
where not exists (select * from tableA as t where t.columnA = n.n)
), cte_b as (
select
columnB, row_number() over(order by newid()) as rn
from tableB
)
insert into TableA(columnA, columnB)
select n.n, b.ColumnB
from cte_b as b
inner join cte_n as n on n.rn = b.rn
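In plain Python, the same idea (keep only unused numbers, pick them in random order, pair them with the incoming rows) looks roughly like this. The function is hypothetical; rows beyond the number of available values are dropped, which is what the inner join on rn would do:

```python
import random


def assign_unused_numbers(existing, new_rows, upper=5000, seed=None):
    """Pair new rows with distinct random numbers from 1..upper not yet used."""
    rng = random.Random(seed)
    used = set(existing)
    available = [n for n in range(1, upper + 1) if n not in used]  # ~ NOT EXISTS
    k = min(len(available), len(new_rows))
    picks = rng.sample(available, k)        # random order, no repeats
    return list(zip(picks, new_rows))       # ~ inner join on rn; extras dropped


result = assign_unused_numbers(existing=[1, 2, 3], new_rows=["r1", "r2"],
                               upper=10, seed=7)
print(result)
```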
If you're sure that only one row from TableB will be inserted, you can use this query:
insert into TableA(columnA, columnB)
select
a.n, b.columnB
from tableB as b
outer apply (
select top 1 n.n
from numbers as n
where not exists (select * from tableA as t where t.columnA = n.n)
order by newid()
) as a
Note that it's better to have an index on ColumnA so the existence check is faster.
sql fiddle demo