I need to find all the unique combinations of values in a column of a table. For example, for the column values 1, 2, 3, 4, 5, I want the result to be [1,2], [1,3], [1,4], [1,5], [2,1], [2,3], etc.
I would appreciate any pointers on constructing a query to find these combinations.
thanks
You can do a cross join in BigQuery by using a subselect that adds a constant key value, then joining on that constant value.
For example, here is a query that will compute the cross join of {1, 2, 3} and {2, 4, 6}:
SELECT t1.num as first, t2.num as second
FROM (
  SELECT num, 1 as key
  FROM
    (SELECT 1 as num),
    (SELECT 2 as num),
    (SELECT 3 as num)) as t1
JOIN (
  SELECT num, 1 as key
  FROM
    (SELECT 2 as num),
    (SELECT 4 as num),
    (SELECT 6 as num)) as t2
ON t1.key = t2.key
WHERE t1.num <> t2.num
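Note that this uses a BigQuery "trick" to create the two input tables: in legacy BigQuery SQL, a comma between subselects in the FROM clause means UNION ALL rather than a join. If you were doing this with an existing table, it would look like: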
SELECT t1.num as first, t2.num as second
FROM (
  SELECT foo as num, 1 as key
  FROM [my_dataset.my_table]) as t1
JOIN (
  SELECT foo as num, 1 as key
  FROM [my_dataset.my_table]) as t2
ON t1.key = t2.key
WHERE t1.num <> t2.num
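To see that joining on a constant key really behaves like a cross join, here is a minimal sketch you can run locally. It assumes Python's built-in sqlite3 module as a stand-in for BigQuery, and the names `nums` and `join_key` are made up for the illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for BigQuery;
# `nums` and `join_key` are hypothetical names used only in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nums (num INTEGER)")
conn.executemany("INSERT INTO nums VALUES (?)", [(1,), (2,), (3,)])

# Self-join on a constant key: every row of t1 pairs with every row of t2.
constant_key_pairs = conn.execute("""
    SELECT t1.num, t2.num
    FROM (SELECT num, 1 AS join_key FROM nums) AS t1
    JOIN (SELECT num, 1 AS join_key FROM nums) AS t2
      ON t1.join_key = t2.join_key
    WHERE t1.num <> t2.num
    ORDER BY t1.num, t2.num
""").fetchall()

# The same pairs written with an explicit CROSS JOIN.
cross_join_pairs = conn.execute("""
    SELECT a.num, b.num
    FROM nums AS a CROSS JOIN nums AS b
    WHERE a.num <> b.num
    ORDER BY a.num, b.num
""").fetchall()

print(constant_key_pairs)
```

Both queries should return the same six ordered pairs for the values 1, 2, 3.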
A cross join might be useful.
See this demo: http://www.sqlfiddle.com/#!12/59af5/1
The ANSI SQL syntax uses a CROSS JOIN operator:
create table val( x int );
insert into val values(1),(2),(3),(4),(5);
SELECT a.x a, b.x b
FROM val a
CROSS JOIN val b
WHERE a.x <> b.x
ORDER BY a,b;
Another form of this query, without the CROSS JOIN keyword, should work on most DBMSs, but the ANSI form is recommended for clarity:
SELECT a.x a, b.x b
FROM val a, val b
WHERE a.x <> b.x
ORDER BY a,b;
Beware that a cross join on large datasets can kill your database performance: for 100 values it generates 100 x 100 = 10,000 rows, and for 1,000 values it generates 1,000,000 rows.
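To get a feel for how fast the row count grows, here is a small sketch that counts the pairs instead of listing them. It assumes Python's built-in sqlite3 module; the helper `count_pairs` is hypothetical. The `a.x < b.x` variant is only an option if [1,2] and [2,1] should count as the same combination, which the question does not seem to want.

```python
import sqlite3

def count_pairs(n):
    """Return (ordered_pairs, unordered_pairs) for the values 1..n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE val (x INTEGER)")
    conn.executemany("INSERT INTO val VALUES (?)", [(i,) for i in range(1, n + 1)])
    # Ordered pairs: [1,2] and [2,1] are both kept.
    ordered = conn.execute(
        "SELECT COUNT(*) FROM val a CROSS JOIN val b WHERE a.x <> b.x"
    ).fetchone()[0]
    # Unordered pairs: each combination is kept once.
    unordered = conn.execute(
        "SELECT COUNT(*) FROM val a CROSS JOIN val b WHERE a.x < b.x"
    ).fetchone()[0]
    conn.close()
    return ordered, unordered

print(count_pairs(5))    # five values, as in the question
print(count_pairs(100))  # the cross join itself produces 100 * 100 rows before filtering
```

After the `a.x <> b.x` filter, n values leave n * (n - 1) rows, so the growth is still quadratic.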
Related
I have been working with some big data in SQL/BigQuery and found that it has some holes in it that need to be filled with values in order to complete the dataset. What I'm struggling with is how to insert the missing values properly.
Say that I have multiple levels of a variable (1, 2, 3... no upper bound) and for each of these levels, they should have an A, B, C value. Some of these records will have data, others will not.
Current dataset:
level  value  data
1      A      1a_data
1      B      1b_data
1      C      1c_data
2      A      2a_data
2      C      2c_data
3      B      3b_data
What I want the dataset to look like:
level  value  data
1      A      1a_data
1      B      1b_data
1      C      1c_data
2      A      2a_data
2      B      NULL
2      C      2c_data
3      A      NULL
3      B      3b_data
3      C      NULL
What would be the best way to do this?
You need a CROSS join of the distinct levels with the distinct values and a LEFT join to the table:
SELECT l.level, v.value, t.data
FROM (SELECT DISTINCT level FROM tablename) l
CROSS JOIN (SELECT DISTINCT value FROM tablename) v
LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
ORDER BY l.level, v.value;
See the demo.
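For anyone who wants to try this without a BigQuery project, here is a minimal sketch of the same CROSS JOIN plus LEFT JOIN query on the sample data. It assumes Python's built-in sqlite3 module, and `tablename` is the same placeholder name as in the query.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (level INTEGER, value TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [(1, "A", "1a_data"), (1, "B", "1b_data"), (1, "C", "1c_data"),
     (2, "A", "2a_data"), (2, "C", "2c_data"), (3, "B", "3b_data")],
)

# Build every (level, value) combination, then attach data where it exists.
filled = conn.execute("""
    SELECT l.level, v.value, t.data
    FROM (SELECT DISTINCT level FROM tablename) l
    CROSS JOIN (SELECT DISTINCT value FROM tablename) v
    LEFT JOIN tablename t ON t.level = l.level AND t.value = v.value
    ORDER BY l.level, v.value
""").fetchall()

for row in filled:
    print(row)  # missing combinations show up with data = None (SQL NULL)
```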
We can use an INSERT INTO ... SELECT, with a cross join of the distinct levels and values playing the role of a calendar table:
INSERT INTO yourTable (level, value, data)
SELECT t1.level, t2.value, NULL
FROM (SELECT DISTINCT level FROM yourTable) t1
CROSS JOIN (SELECT DISTINCT value FROM yourTable) t2
LEFT JOIN yourTable t3
ON t3.level = t1.level AND
t3.value = t2.value
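WHERE t3.level IS NULL;  -- keep only combinations with no matching row, so rows whose data is already NULL are not inserted again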
I am looking for an answer to the following question:
Is it possible to rewrite every join as an equivalent subquery?
I know that columns from a subquery cannot be selected in the outer query.
I ran these two queries in SQL Server:
select distinct A.*, B.ParentProductCategoryID
from [SalesLT].[Product] as A
inner join [SalesLT].[ProductCategory] as B
  on A.ProductCategoryID = B.ProductCategoryID

select A.*
from [SalesLT].[Product] as A
where EXISTS (select B.ParentProductCategoryID
              from [SalesLT].[ProductCategory] as B
              where A.ProductCategoryID = B.ProductCategoryID)
Both of these queries return 293 rows, as I expected.
Now the problem is: how do I select the [SalesLT].[ProductCategory] column in the second case?
Do I need to correlate this subquery in the SELECT clause to get the column to show in the output?
Is It possible to rewrite every Join to equivalent Subquery
No, because joins can 1) remove rows or 2) multiply rows
ex 1)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3)
SELECT t1.num AS t1num, t2.num AS t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
The row containing value 1 from t1 was removed. This does not happen with a subquery in the SELECT list.
ex 2)
CREATE TABLE t1 (num int)
CREATE TABLE t2 (num int)
INSERT INTO t1 VALUES (1), (2), (3)
INSERT INTO t2 VALUES (2) ,(3), (3), (3), (3)
SELECT t1.num AS t1num, t2.num as t2num FROM t1 INNER JOIN t2 ON t1.num = t2.num
Gives output
t1num t2num
2 2
3 3
3 3
3 3
3 3
A subquery in the SELECT list would not change the number of rows returned from the table being queried.
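Both effects can be seen side by side in this minimal sketch, which assumes Python's built-in sqlite3 module instead of SQL Server. The `MAX()` inside the subquery is my own addition so that it returns a single value per row; SQL Server raises an error when a scalar subquery returns more than one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (num INTEGER)")
conn.execute("CREATE TABLE t2 (num INTEGER)")
conn.executemany("INSERT INTO t1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [(2,), (3,), (3,), (3,), (3,)])

# INNER JOIN: the row for 1 disappears and the row for 3 is multiplied.
joined = conn.execute("""
    SELECT t1.num AS t1num, t2.num AS t2num
    FROM t1 INNER JOIN t2 ON t1.num = t2.num
    ORDER BY t1.num
""").fetchall()

# Scalar subquery in the SELECT list: exactly one output row per row of t1.
# MAX() is an assumption of this sketch to guarantee a single value.
subqueried = conn.execute("""
    SELECT t1.num AS t1num,
           (SELECT MAX(t2.num) FROM t2 WHERE t2.num = t1.num) AS t2num
    FROM t1
    ORDER BY t1.num
""").fetchall()

print(len(joined), joined)
print(len(subqueried), subqueried)
```

The join returns five rows while the subquery version keeps the three rows of t1, with NULL where t2 has no match.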
In your example, you do an exists... this is not going to return the value from the 2nd table.
This is how I would subquery:
select A.*
,(SELECT B.ParentProductCategoryID
FROM [SalesLT].[ProductCategory] B
WHERE B.ProductCategoryID = A.ProductCategoryID) AS [2nd table ParentProductCategoryID]
from [SalesLT].[Product] as A
You might use
select A.*,
(
select B.ParentProductCategoryID
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID
) ParentProductCategoryID
from [SalesLT].[Product] as A
where EXISTS(select 1
from [SalesLT].[ProductCategory] as B
where A.ProductCategoryID=B.ProductCategoryID)
However, I find the JOIN version much more intuitive.
There is no way for you to use any data from the EXISTS subquery in the outer query. The only purpose of the subquery is to evaluate whether the EXISTS is true or false for each product.
I have tables
table1
col1 col2
a b
c d
and table2
mycol1 mycol2
e f
g h
i j
k l
I want to combine the two tables, which have no common field into one table looking like:
table 3
col1 col2 mycol1 mycol2
a b e f
c d g h
null null i j
null null k l
ie, it is like putting the two tables side by side.
I'm stuck! Please help!
Get a row number for each row in each table, then do a full join using those row numbers:
WITH CTE1 AS
(
SELECT ROW_NUMBER() OVER(ORDER BY col1) AS ROWNUM, * FROM Table1
),
CTE2 AS
(
SELECT ROW_NUMBER() OVER (ORDER BY mycol1) AS ROWNUM, * FROM Table2
)
SELECT col1, col2, mycol1, mycol2
FROM CTE1 FULL JOIN CTE2 ON CTE1.ROWNUM = CTE2.ROWNUM
This is assuming SQL Server >= 2005.
It would really help if you described why this problem needs to be solved. I'm guessing it is just to practice SQL syntax?
Anyway, since the rows don't have anything connecting them, we have to create a connection. I chose the ordering of their values. Since nothing connects them, that also raises the question of why you would want to put them next to each other in the first place.
Here is the complete solution: http://sqlfiddle.com/#!6/67e4c/1
The select code looks like this:
WITH rankedt1 AS
(
SELECT col1
,col2
,row_number() OVER (order by col1,col2) AS rn1
FROM table1
)
,rankedt2 AS
(
SELECT mycol1
,mycol2
,row_number() OVER (order by mycol1,mycol2) AS rn2
FROM table2
)
SELECT
col1,col2,mycol1,mycol2
FROM rankedt1
FULL OUTER JOIN rankedt2
ON rn1=rn2
Option 1: Single Query
You have to join the two tables, and if you want each row in table1 to match only one row in table2, you have to restrict the join somehow. Calculate row numbers in each table and join on that column. Row numbers are database-specific; here is a solution for MySQL:
SELECT
t1.col1, t1.col2, t2.mycol1, t2.mycol2
FROM
(SELECT col1, col2, @t1_row := @t1_row + 1 AS rownum FROM table1, (SELECT @t1_row := 0) AS r1) AS t1
LEFT JOIN
(SELECT mycol1, mycol2, @t2_row := @t2_row + 1 AS rownum FROM table2, (SELECT @t2_row := 0) AS r2) AS t2
ON t1.rownum = t2.rownum;
This assumes table1 is longer than table2; if table2 is longer, either use RIGHT JOIN or switch the order of the t1 and t2 sub-selects. Also note that you can specify the order of each table separately using an ORDER BY clause in the sub-selects.
(See select increment counter in mysql)
Option 2: Post-processing
Consider making two selects, and then concatenating the results with your favorite scripting language. This is a much more reasonable approach.
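As a rough illustration of the post-processing route, here is a sketch in Python. It assumes the built-in sqlite3 module for the two selects and `itertools.zip_longest` for the concatenation; the helper `side_by_side` is hypothetical.

```python
import sqlite3
from itertools import zip_longest

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT)")
conn.execute("CREATE TABLE table2 (mycol1 TEXT, mycol2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [("a", "b"), ("c", "d")])
conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                 [("e", "f"), ("g", "h"), ("i", "j"), ("k", "l")])

# Two independent selects, each with its own ordering.
rows1 = conn.execute("SELECT col1, col2 FROM table1 ORDER BY col1").fetchall()
rows2 = conn.execute("SELECT mycol1, mycol2 FROM table2 ORDER BY mycol1").fetchall()

def side_by_side(left, right, left_width, right_width):
    """Pair rows by position, padding the shorter side with None (NULL)."""
    pad_left = (None,) * left_width
    pad_right = (None,) * right_width
    return [
        (l if l is not None else pad_left) + (r if r is not None else pad_right)
        for l, r in zip_longest(left, right)
    ]

table3 = side_by_side(rows1, rows2, 2, 2)
for row in table3:
    print(row)
```

This works whichever table is longer, so there is no need to choose between LEFT and RIGHT JOIN.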
I have this query in oracle:
select * from table where col2 in (1,2,3,4);
Let's say I got this result:
col1 | col2
-----------
a 1
b 2
My 'in (1,2,3,4)' list has 20 or more options. How can I determine which values were not found in my table? In my example, 3 and 4 don't exist in the table.
You can't in the way you want.
You need to insert the values you want to find into a table and then select all the values which don't exist in the desired table.
Let's say the values you want to find are in A and you want to know which don't exist in B.
SELECT *
FROM table_a A
WHERE NOT EXISTS (SELECT *
FROM table_b B
WHERE B.col1 = A.col1);
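Here is a minimal sketch of that idea on the question's data. It assumes Python's built-in sqlite3 module rather than Oracle, and the names `wanted` and `your_table` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?)", [("a", 1), ("b", 2)])

# Put the values from the IN list into a (temporary) table of their own.
conn.execute("CREATE TEMP TABLE wanted (val INTEGER)")
conn.executemany("INSERT INTO wanted VALUES (?)", [(1,), (2,), (3,), (4,)])

# Select the wanted values that have no matching row in your_table.
missing = [row[0] for row in conn.execute("""
    SELECT w.val
    FROM wanted w
    WHERE NOT EXISTS (SELECT 1 FROM your_table t WHERE t.col2 = w.val)
    ORDER BY w.val
""")]

print(missing)
```

For the sample data this should list 3 and 4, the two values from the IN list that are not in the table.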
IN lists are stupid, or at least not very useful. Use a SQL Type collection to store your values instead because we can turn them into tables.
In this example I'm using the obscure SYS.KU$_OBJNUMSET type, which is the only nested table of Number I know of on 10g. (There's lots more in 11g).
So
select t.column_value
from table ( SYS.KU$_OBJNUMSET (1,2,3,4) ) t
left join your_table
on col2 = t.column_value
where col2 is null;
Here would be a way to do it if you're just using integers for your specific example:
SELECT *
FROM (
Select Rownum r
From dual
Connect By Rownum IN (1,2,3,4)
) T
LEFT JOIN YourTable T2 ON T.r = T2.Col2
WHERE T2.Col2 IS NULL
And the Fiddle.
This creates a table out of your where criteria 1,2,3,4 and uses that to LEFT JOIN on.
--EDIT
Because values aren't ints, here is another "ugly" option:
SELECT *
FROM (
Select 'a' r From dual UNION
Select 'b' r From dual UNION
Select 'c' r From dual UNION
Select 'd' r From dual
) T
LEFT JOIN YourTable T2 ON T.r = T2.Col2
WHERE T2.Col2 IS NULL
http://www.sqlfiddle.com/#!4/5e769/2
Good luck.
I have two tables in a database that share a primary key, and I want to find the set of rows in one that have no match in the other. For example,
Table1 has columns (ID, Name) and sample data: (1 ,John), (2, Peter), (3, Mary)
Table2 has columns (ID, Address) and sample data: (1, address2), (2, address2)
So how do I write a SQL query that fetches the rows from table1 whose ID is not in table2? In this case, (3, Mary) should be returned.
PS: The ID is the primary key for those two tables.
Try this
SELECT ID, Name
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table2)
Use LEFT JOIN
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
on a.ID = b.ID
WHERE b.id IS NULL
There are basically 3 approaches to that: not exists, not in and left join / is null.
LEFT JOIN with IS NULL
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON r.value = l.value
WHERE r.value IS NULL
NOT IN
SELECT l.*
FROM t_left l
WHERE l.value NOT IN
(
SELECT value
FROM t_right r
)
NOT EXISTS
SELECT l.*
FROM t_left l
WHERE NOT EXISTS
(
SELECT NULL
FROM t_right r
WHERE r.value = l.value
)
Which one is better? The answer is best broken down by major RDBMS vendor. Generally speaking, one should avoid select ... where ... in (select ...) when the number of records in the subquery is unknown. Some vendors limit the size of an IN list: Oracle, for example, has traditionally capped a literal IN list at 1,000 expressions (that cap applies to literal lists, not to subqueries). The best thing to do is to try all three and compare the execution plans.
Specifically for PostgreSQL, the execution plans of NOT EXISTS and LEFT JOIN / IS NULL are the same. I personally prefer the NOT EXISTS option because it shows the intent better. After all, the semantics are that you want to find the records in A whose primary key does not exist in B.
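To check that the three approaches agree on the question's sample data, here is a small sketch. It assumes Python's built-in sqlite3 module, so it says nothing about execution plans on PostgreSQL or Oracle; it only compares the results. Since ID is a primary key here, there are no NULLs to complicate the NOT IN case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Table2 (ID INTEGER PRIMARY KEY, Address TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?)",
                 [(1, "John"), (2, "Peter"), (3, "Mary")])
conn.executemany("INSERT INTO Table2 VALUES (?, ?)",
                 [(1, "address2"), (2, "address2")])

queries = {
    "left_join_is_null": """
        SELECT l.ID, l.Name FROM Table1 l
        LEFT JOIN Table2 r ON r.ID = l.ID
        WHERE r.ID IS NULL
    """,
    "not_in": """
        SELECT l.ID, l.Name FROM Table1 l
        WHERE l.ID NOT IN (SELECT ID FROM Table2)
    """,
    "not_exists": """
        SELECT l.ID, l.Name FROM Table1 l
        WHERE NOT EXISTS (SELECT NULL FROM Table2 r WHERE r.ID = l.ID)
    """,
}

# Run each variant and collect its rows under the variant's name.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```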
Old but still gold, specific to PostgreSQL though: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
Fast Alternative
I ran some tests (on Postgres 9.5) using two tables with ~2M rows each. The query below performed at least 5x better than the other queries proposed:
-- Count
SELECT count(*) FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2;
-- Get full row
SELECT table1.* FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2 JOIN table1 ON t1_not_in_t2.id=table1.id;
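A small sketch of the EXCEPT variant on toy data, assuming Python's built-in sqlite3 module rather than Postgres. SQLite does not accept parentheses around the operands of EXCEPT, so they are dropped here. The table contents are made up, and this checks the result only, not the speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE table2 (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "John"), (2, "Peter"), (3, "Mary")])
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (2,)])

# Count the ids that are in table1 but not in table2.
except_count = conn.execute("""
    SELECT count(*) FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) AS t1_not_in_t2
""").fetchone()[0]

# Join the surviving ids back to table1 to get the full rows.
except_rows = conn.execute("""
    SELECT table1.* FROM (
        SELECT id FROM table1 EXCEPT SELECT id FROM table2
    ) AS t1_not_in_t2
    JOIN table1 ON t1_not_in_t2.id = table1.id
""").fetchall()

print(except_count, except_rows)
```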
Keeping in mind the points made in @John Woo's comment/link above, this is how I typically would handle it:
SELECT t1.ID, t1.Name
FROM Table1 t1
WHERE NOT EXISTS (
SELECT TOP 1 NULL
FROM Table2 t2
WHERE t1.ID = t2.ID
)
-- For count
SELECT COUNT(ID) FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b);
-- For results
SELECT ID FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b);