UNION between select with 0 rows and select with 20745 gives 20740 - sql

In SQL Server 2008 I have a behaviour I don't understand.
I'm doing a UNION between two select statements.
First select returns 20745 rows
Second select returns 0 rows
When I using union bewteen the two selects, I get 20740 rows, I would exspect 20745 as union only returns distinct values.
To get the excepted result I used union all but there is something I don't understand about it. Does anyone have an explanation?

There must be duplicate rows in your first SELECT statement. Note that UNION eliminates duplicates from your result set.
If you want to return all rows, use UNION ALL instead.
Example:
--UNION ALL
WITH TableA(n) AS (
SELECT * FROM (
VALUES(1),(2),(3),(4),(1)
)t(n)
),
TableB(n) AS (
SELECT * FROM (
VALUES(10),(20),(30),(40)
)t(n)
)
SELECT n FROM TableA UNION ALL
SELECT n FROM TableB
The above will return:
n
-----------
1
2
3
4
1
10
20
30
40
While the UNION variant
SELECT n FROM TableA UNION
SELECT n FROM TableB
will return:
n
-----------
1
2
3
4
10
20
30
40

union removes duplicate results, regardless of whether they come from two different selects or from the same one. If you want to preserve duplicated, use union all instead:
SELECT *
FROM table1
UNION ALL
SELECT *
FROM table2

First select statement has duplicates :) That's normal behavior.
Try putting a distinct in the first select statement - it should also return 20740 rows.
That should help you better understand what is happening.

Related

SQL UNION ALL but with lots of columns on BigQuery?

Above image is a screenshot of my table just as a quick initial reference.
The focal point are the multiple mech columns (mech1, mech2, mech3, and mech4).
Board games in this tables have multiple attributes called mechanisms so I've separated them into 4 different columns.
So I've learned how to combine columns vertically via UNION ALL so that I can query the count of all unique game mechanisms in my table.
However, it got me wondering if there's a shorter and more efficient way to achieve what I've done:
WITH mechanism_info AS
(
WITH
mechanism_col_combined AS
(
SELECT mech1 AS all_mech_columns_combined
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
## There's no IS NOT NULL condition defined for column 'mech1' since there's at least one mechanism noted for a game.
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech2 IS NOT NULL
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech3 IS NOT NULL
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
WHERE mech4 IS NOT NULL
)
## Temporary table with all mechanism column in the collection combined.
SELECT DISTINCT(all_mech_columns_combined) AS unique_mechanisms, COUNT(*) AS count
FROM mechanism_col_combined
GROUP BY all_mech_columns_combined
ORDER BY all_mech_columns_combined
)
SELECT *
FROM mechanism_info
By querying this temp. table, SQL returns the information that I've anticipated as below:
unique_mechanisms | count
Acting | 1
Action Points | 3
Action Queue | 1
Action Retrieval | 1
Area Movement | 1
Auction/Bidding | 5
Bag Building | 1
Betting & Bluffing| 2
Bingo | 1
Bluffing | 7
Now, I want to shorten my code and I know there has to be a way to shorten the repetitive process of combining columns with UNION ALL.
And if there's any other tips or methods on how to shorten my query, please let me know!
Thank you.
You can convert the multiple columns [mech1, mech2, ...] into a column of array mech_arr and then using UNNEST to convert the column to have scalar value in each row.
For example:
WITH table1 AS (
SELECT 'AA' AS mech1, 'BB' AS mech2, 'CC' AS mech3,
UNION ALL SELECT 'AA' AS mech1, 'CC' AS mech2, 'EE' AS mech3
),
table2 AS (SELECT [mech1, mech2, mech3] AS mech_arr FROM table1)
SELECT mech, COUNT(*) AS mech_counts
FROM table2, UNNEST(mech_arr) AS mech
GROUP BY mech
Output
mech mech_counts
AA 2
BB 1
CC 2
EE 1
You could send join into the table, but the performance would not improve and the query would be just as long.
You can simplify as follows:
SELECT
mech_column,
count(*) "number"
FROM (
SELECT mech1 AS mech_column
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
) m
WHERE mech_column IS NOT NULL
GROUP BY mech_column
ORDER BY mech_column;
Didn't find a smoother way to query but I did find a way to remove the process of adding WHERE column IS NOT NULL for each and every columns that was used to vertically aggregate them into a single column:
mechanism_info AS
(
WITH
mechanism_col_combined AS
(
SELECT mech1 AS mech_columns
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech2
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech3
FROM `ckda-portfolio-2022.bg_collection.base`
UNION ALL
SELECT mech4
FROM `ckda-portfolio-2022.bg_collection.base`
## Removed all WHERE clause from the above columns
and added it below instead.
)
## Temporary table with all mechanism columns in the collection combined.
SELECT DISTINCT(mech_columns) AS mechanisms, COUNT(*) AS count
FROM mechanism_col_combined
WHERE mech_columns IS NOT NULL ## <--- Added here!
GROUP BY mech_columns
ORDER BY mech_columns
)
SELECT *
FROM mechanism_info
Since mechanism_info is a nested temp. table, I can just add WHERE mech_columns IS NOT NULL clause and condition to the initial temp. table's setting.
I'm still looking to reduce this query down to something more efficient. It's unfortunate that UNION ALL can't select multiple columns with a single call :(

Why does this sql snippet return 8 or 1 always?

What is the result of:
WITH Tbl AS (SELECT 5 AS A UNION SELECT 6 AS A)
SELECT COUNT(*) AS Tbl FROM Tbl AS A, Tbl AS B, Tbl AS C;
I know the result is supposed to be 8 but I don't know why. Also when I change both values (the 5 or 6) to the same thing it returns a table with the value 1 instead of 8 but all other instances it returns 8 no matter what numbers if they are different. I tested it out with an online sql executor.
Here is what the query does:
the common table expression (the subquery within the with clause) generates a derived table made of two rows
then, in the outer query, the from clause generates a cartesian product of this resultset twice: that's a total of 8 rows (2 * 2 * 2)
the select clause counts the number of rows - that's 8
The content of the rows in the with clause does not matter: this 5 and 6 could very well be foo and bar, or null and null, the result would be the same.
What makes a difference is the number of rows that the with clause generates. If it was generating just one row, you would get 1 as a result (1 * 1 * 1). If it was generating 3 rows, you would get 27 - and so on.
This expression:
WITH Tbl AS (SELECT 5 AS A UNION SELECT 6 AS A)
creates a (derived) table with two rows.
This expression:
WITH Tbl AS (SELECT 5 AS A UNION SELECT 5 AS A)
creates a (derived) table with one row, because UNION removes duplicates.
The rest of the query just counts the number of rows in the 3-way Cartesian product, which is either 111 or 222.

How to return the rows between 20th and 30th in Oracle Sql [duplicate]

This question already has answers here:
Oracle SQL: Filtering by ROWNUM not returning results when it should
(2 answers)
Closed 4 years ago.
so I have a large table that I'd like to output, however, I only want to see the rows between 20 and 30.
I tried
select col1, col2
from table
where rownum<= 30 and rownum>= 20;
but sql gave an error
I also tried --where rownum between 20 and 30
it also did not work.
so whats the best way to do this?
SELECT *
FROM T
ORDER BY I
OFFSET 20 ROWS --skips 20 rows
FETCH NEXT 10 ROWS ONLY --takes 10 rows
This shows only rows 21 to 30. Take care here that you need to sort the data, otherwise you may get different results every time.
See also here in the documentation.
Addendum: As in the possible duplicate link shown, your problem here is that there can't be a row with number 20 if there is no row with number 19. That's why the rownum-approach works to take only the first x records, but when you need to skip records you need a workaround by selecting the rownum in a subquery or using offset ... fetch
Example for a approach with using rownum (for lower oracle versions or whatever):
with testtab as (
select 'a' as "COL1" from dual
union all select 'b' from dual
union all select 'c' from dual
union all select 'd' from dual
union all select 'e' from dual
)
select * from
(select rownum as "ROWNR", testtab.* from testtab) tabWithRownum
where tabWithRownum.ROWNR > 2 and tabWithRownum.ROWNR < 4;
--returns only rownr 3, col1 'c'
Whenever you use rownum, it counts the rows that your query returns. SO if you are trying to filter by selecting all records between rownum 20 and 30, that is only 10 rows, so 20 and 30 dont exist. You can however, use WITH (whatever you want to name it) as and then wrap your query and rename your rownum column. This way you are selecting from your select. Example.
with T as (
select requestor, request_id, program, rownum as "ROW_NUM"
from fnd_conc_req_summary_v where recalc_parameters='N')
select * from T where row_num between 20 and 30;

Select Query by Pair of fields using an in clause

I have a table called players as follows:
First_Id Second_Id Name
1 1 Durant
2 1 Kobe
1 2 Lebron
2 2 Dwight
1 3 Dirk
I wish to write a select statement on this table to retrieve all rows whose first ids and second ids match a bunch of specified first and second ids.
So for example, I wish to select all rows whose first and second ids are as follows: (1,1), (1,2) and (1,3). This would retreive the following 3 rows:
First_Id Second_Id Name
1 1 Durant
1 2 Lebron
1 3 Dirk
Is it possible to write a select query in a manner such as:
SELECT *
FROM PLAYERS
WHERE (First_Id, Second_Id) IN ((1,1), (1,2) and (1,3))?
If there is a way to write the SQL similar to the above I would like to know. Is there a way to specify values for an IN clause that represents multiple rows as illustrated.
I'm using DB2.
This works on my DB2 (version 9.7 on Linux/Unix/Windows) by using this syntax:
SELECT *
FROM PLAYERS
WHERE (First_Id, Second_Id) IN (VALUES (1,1), (1,2), (1,3))
This syntax won't work on DB2 on the Mainframe (at least in version 9.1) because you can't substitute a sub-select with a VALUES expression. This syntax will work:
SELECT *
FROM PLAYERS
WHERE (First_Id, Second_Id) IN (SELECT 1, 1 FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 1, 2 FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 1, 3 FROM SYSIBM.SYSDUMMY1)
Here's a very similar solution in postgresql:
SELECT tmp_table.val1, tmp_table.val2
FROM tmp_table
WHERE (tmp_table.val1, tmp_table.val2) not in (select tmp_table2.val1, tmp_table2.val2 from tmp_table2);
With compound primary keys, I would concatenate the two ids and match compound strings.
select id1 + id2 as FullKey, *
from players
where FullKey in ('11','12','13')
(If ids are not strings, simply cast them as such.)
This syntax works in MySQL:
SELECT *
FROM PLAYERS
WHERE (First_Id, Second_Id) IN ((1,1), (1,2), (1,3))
SELECT * FROM <your table> where (<field1>, <field2>, ...) in (SELECT <field1>, <field2>, ... FROM <your table> where <your condition>)
This worked wonder for me.
This type of query works in DB2.
SELECT * FROM A
WHERE (C1, C2) IN (SELECT B1, B2 FROM B WHERE B3=1);
If data is needed in below order:
First_Id
Second_Id
Name
1
1
Durant
1
2
Lebron
1
3
Dirk
2
1
Kobe
2
2
Dwight
then simple and easy way would be:
select * from players order by First_id, second_id;
"where" clause can be used if any condition is needed on any of the column.

generate_series() equivalent in DB2

I'm trying to search the DB2 equivalent of generate_series() (the PostgreSQL-way of generating rows). I obviously don't want to hard-code the rows with a VALUES statement.
select * from generate_series(2,4);
generate_series
-----------------
2
3
4
(3 rows)
The where clause needs to be a bit more explicit about the bounds of the recursion in order for DB2 to suppress the warning. Here's a slightly adjusted version that does not trigger the warning:
with dummy(id) as (
select 2 from SYSIBM.SYSDUMMY1
union all
select id + 1 from dummy where id < 4
)
select id from dummy
I managed to write a recursive query that fits :
with dummy(id) as (
select 2 from SYSIBM.SYSDUMMY1
union all
select id + 1 from dummy where id + 1 between 2 and 4
)
select id from dummy
The query can be adapted to whatever for(;;) you can dream of.