Finding unique combinations of columns - sql

I'm trying to write a SELECT query but am having trouble, probably because I'm not familiar with SQL Server (I usually use MySQL).
Basically, what I need to do is find the number of unique combinations of 2 columns, one a Varchar and one a Double.
One column has fewer rows than the other, so I've been trying to figure out the right way to do this.
Essentially pretend Table.Varchar has in it:
Table.Varchar
--------------
apple
orange
and Table.Float has in it:
Table.Float
--------------
1
2
3
How could I write a query which returns
QueryResult
-------------
apple1
apple2
apple3
orange1
orange2
orange3
It's been a long day at work and I think I'm just overthinking this. What I've tried so far is to concatenate the two columns and then count, but it's not working. Any ideas for a better way to go about this?

Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2
This way, you generate the concatenated values.
Then group by the result and use COUNT:
select T.Concat, count(*) from
(Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2) T
group by T.Concat order by count(*) asc

If they are in the same table:
SELECT a.Field1, b.Field2
FROM [Table] a
CROSS JOIN [Table] b
or if they are in separate tables:
SELECT a.Field1, b.Field2
FROM [Table1] a
CROSS JOIN [Table2] b
Keep in mind that the above queries will match ALL records from the first table with ALL records from the second table, creating a cartesian product.
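To make the cartesian-product behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names fruits and numbers are hypothetical stand-ins for the two tables in the question; the point is only that a CROSS JOIN of 2 rows with 3 rows yields 2 x 3 = 6 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruits (name TEXT);   -- hypothetical stand-in for Table.Varchar
    CREATE TABLE numbers (val REAL);   -- hypothetical stand-in for Table.Float
    INSERT INTO fruits VALUES ('apple'), ('orange');
    INSERT INTO numbers VALUES (1), (2), (3);
""")

# Every row of fruits is paired with every row of numbers.
rows = conn.execute("""
    SELECT f.name, n.val
    FROM fruits AS f
    CROSS JOIN numbers AS n
    ORDER BY f.name, n.val
""").fetchall()

for name, val in rows:
    print(name, val)
print(len(rows))  # number of rows in the cartesian product
```

With large tables the row count grows multiplicatively, so it is usually worth reducing each side to its distinct values first.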

This will eliminate duplicates:
DECLARE @Varchar TABLE(v VARCHAR(32));
DECLARE @Float TABLE(f FLOAT);
INSERT @Varchar SELECT 'apple'
UNION ALL SELECT 'orange'
UNION ALL SELECT 'apple';
INSERT @Float SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3;
SELECT v.v + CONVERT(VARCHAR(12), f.f)
FROM @Varchar AS v
CROSS JOIN @Float AS f
GROUP BY v.v, f.f;

A cross join is a join where each record in one table is combined with each record of the other table. Select the distinct values from the table and join them.
select x.Varchar, y.Float
from (select distinct Varchar from theTable) x
cross join (select distinct Float from theTable) y
To find the number of combinations you don't have to actually return all combinations, just count them.
select
(select count(distinct Varchar) from theTable) *
(select count(distinct Float) from theTable)
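As a quick sanity check of the counting shortcut, the sketch below (Python's sqlite3, with a hypothetical table the_table and columns v and f) compares the product of the two distinct counts with the number of rows obtained by actually enumerating the distinct combinations. Under these assumptions the two numbers should agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE the_table (v TEXT, f REAL);  -- hypothetical names
    INSERT INTO the_table VALUES
        ('apple', 1), ('orange', 2), ('apple', 3), ('orange', 1);
""")

# Shortcut: multiply the number of distinct values on each side.
product_count = conn.execute("""
    SELECT (SELECT COUNT(DISTINCT v) FROM the_table) *
           (SELECT COUNT(DISTINCT f) FROM the_table)
""").fetchone()[0]

# Long way: build every distinct combination, then count the rows.
enumerated_count = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT v FROM the_table) AS x
    CROSS JOIN (SELECT DISTINCT f FROM the_table) AS y
""").fetchone()[0]

print(product_count, enumerated_count)
```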

Try this.
Possible combinations:
SELECT
DISTINCT T1.VarField+CONVERT(VARCHAR(12),T2.FtField) --Get Unique Combinations
FROM Table1 T1 CROSS JOIN Table2 T2 --From all possible combinations
WHERE T1.VarField IS NOT NULL AND T2.FtField IS NOT NULL --Making code NULL Proof
and to get just the count of possible combinations:
SELECT Count(DISTINCT T1.VarcharField + CONVERT(VARCHAR(12), T2.FloatField))
FROM Table1 T1
CROSS JOIN Table2 T2
WHERE T1.VarcharField IS NOT NULL AND T2.FloatField IS NOT NULL
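One edge case worth checking with any concatenate-then-count approach: two different pairs can produce the same concatenated string. The sketch below (Python's sqlite3, hypothetical tables t1 and t2, integer values to keep the formatting simple) compares counting distinct concatenations with counting distinct pairs. It is an illustration of the pitfall, not a claim about your data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (s TEXT);      -- hypothetical names
    CREATE TABLE t2 (n INTEGER);
    INSERT INTO t1 VALUES ('a'), ('a1');
    INSERT INTO t2 VALUES (1), (11);
""")

# 'a' || 11 and 'a1' || 1 both produce 'a11', so they collapse into one value.
distinct_concats = conn.execute("""
    SELECT COUNT(DISTINCT t1.s || t2.n)
    FROM t1 CROSS JOIN t2
""").fetchone()[0]

# Counting the pairs themselves keeps the two columns separate.
distinct_pairs = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT t1.s, t2.n FROM t1 CROSS JOIN t2) AS p
""").fetchone()[0]

print(distinct_concats, distinct_pairs)
```

If the text values can end in digits, grouping by the two columns (or concatenating with a separator that cannot occur in the data) avoids the undercount.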

Related

How can you figure out if Column A contains something from Column B?

I've been trying to figure out a way to grab information from Table A Column A compared to Table B Column A, for example:
TableA
Name
abcd_1234_efgh
zxcdde_gets_3214_
jkil_uelso_5555_aseil
uuuu_kkkk_iiii_3333
TableB
ID
1234
3214
5555
3333
I've tried doing an INNER JOIN from Table A to Table B then doing a WHERE TableA.A LIKE TableB.B, but I think I'm missing a section to make it work.
SELECT
a.Name,
b.ID
FROM
TableA a
INNER JOIN
TableB b
ON
a.Name LIKE CAST(b.ID AS STRING)
The result I want from it is:
Name ID
abcd_1234_efgh 1234
zxcdde_gets_3214_ 3214
jkil_uelso_5555_aseil 5555
uuuu_kkkk_iiii_3333 3333
But currently I'm getting nothing as a result. I believe I'm missing something or might be thinking of the wrong way to go about getting the result needed. Any help would be greatly appreciated!
-Maykid
You are close. I think this will work in BigQuery:
SELECT a.Name, b.ID
FROM TableA a INNER JOIN
TableB b
ON a.Name LIKE CONCAT('%', CAST(b.ID AS STRING), '%');
But you may really want:
SELECT a.Name, b.ID
FROM TableA a CROSS JOIN
UNNEST(SPLIT(a.Name, '_')) namepart JOIN
TableB b
ON namepart = CAST(b.ID AS STRING);
This looks at each part of the name separately and allows BigQuery to do an equality join, which should be more scalable.
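The reason the original query returned nothing is that LIKE without wildcards only matches when the whole string equals the pattern. A small sketch in Python's sqlite3 (not BigQuery, so CAST ... AS TEXT and || stand in for CAST ... AS STRING and CONCAT) shows the difference on the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Name TEXT);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES
        ('abcd_1234_efgh'), ('zxcdde_gets_3214_'),
        ('jkil_uelso_5555_aseil'), ('uuuu_kkkk_iiii_3333');
    INSERT INTO TableB VALUES (1234), (3214), (5555), (3333);
""")

# No wildcards: the pattern must match the entire name, so nothing matches.
without_wildcards = conn.execute("""
    SELECT a.Name, b.ID
    FROM TableA a
    INNER JOIN TableB b ON a.Name LIKE CAST(b.ID AS TEXT)
""").fetchall()

# Wildcards on both sides: the ID may appear anywhere in the name.
with_wildcards = conn.execute("""
    SELECT a.Name, b.ID
    FROM TableA a
    INNER JOIN TableB b ON a.Name LIKE '%' || CAST(b.ID AS TEXT) || '%'
    ORDER BY a.Name
""").fetchall()

print(len(without_wildcards), len(with_wildcards))
```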
Below is for BigQuery Standard SQL
#standardSQL
SELECT *
FROM `project.dataset.tableA`
CROSS JOIN `project.dataset.tableB`
WHERE REGEXP_CONTAINS(Name, id)
You can test and experiment with the above using the sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.tableA` AS (
SELECT 'abcd_1234_efgh' Name UNION ALL
SELECT 'zxcdde_gets_3214_' UNION ALL
SELECT 'jkil_uelso_5555_aseil' UNION ALL
SELECT 'uuuu_kkkk_iiii_3333'
), `project.dataset.tableB` AS (
SELECT '1234' id UNION ALL
SELECT '3214' UNION ALL
SELECT '5555' UNION ALL
SELECT '3333'
)
SELECT *
FROM `project.dataset.tableA`
CROSS JOIN `project.dataset.tableB`
WHERE REGEXP_CONTAINS(Name, id)
with result
Row Name id
1 abcd_1234_efgh 1234
2 zxcdde_gets_3214_ 3214
3 jkil_uelso_5555_aseil 5555
4 uuuu_kkkk_iiii_3333 3333
Note: REGEXP_CONTAINS gives you the full power of regular expressions, but it is a little expensive, so you can use STRPOS() instead, as in the example below:
#standardSQL
SELECT *
FROM `project.dataset.tableA`
CROSS JOIN `project.dataset.tableB`
WHERE STRPOS(Name, id) > 0
Quick Update:
I just realised that id is not a STRING but rather an INT in your question, so:
REGEXP_CONTAINS(Name, id) should be replaced with REGEXP_CONTAINS(Name, CAST(id AS STRING))
and the same goes for STRPOS(Name, id)
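A substring test matches the ID anywhere in the name, including inside a longer number. The sketch below uses Python's sqlite3, where instr() plays the role of STRPOS(), and contrasts it with a whole-token match that wraps the name in underscores. The second sample name (wxyz_12345_efgh) is a hypothetical value added only to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (Name TEXT);
    CREATE TABLE TableB (ID INTEGER);
    INSERT INTO TableA VALUES ('abcd_1234_efgh'), ('wxyz_12345_efgh');
    INSERT INTO TableB VALUES (1234);
""")

# Substring match: 1234 is also found inside 12345.
substring_matches = conn.execute("""
    SELECT a.Name FROM TableA a
    CROSS JOIN TableB b
    WHERE instr(a.Name, CAST(b.ID AS TEXT)) > 0
    ORDER BY a.Name
""").fetchall()

# Whole-token match: require an underscore on both sides of the ID.
# '!' is declared as the escape character so that '_' is taken literally.
token_matches = conn.execute("""
    SELECT a.Name FROM TableA a
    CROSS JOIN TableB b
    WHERE ('_' || a.Name || '_')
          LIKE ('%!_' || CAST(b.ID AS TEXT) || '!_%') ESCAPE '!'
    ORDER BY a.Name
""").fetchall()

print(substring_matches)
print(token_matches)
```

Whether the looser or stricter match is right depends on how the names are built; the split-and-join approach shown earlier gives the stricter behaviour.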
Given your data structure, maybe something like this helps (note the application of SAFE_CAST):
select name, c, t2.number from (
select t1.name, split(t1.name, "_") splitted from TableA t1
), unnest(splitted) c
left join TableB t2 on t2.number = SAFE_CAST(c as int64)
where number is not null

missing rows, how to select values

How to accomplish this: I have a bunch of numbers (for example: 2342423; 34443123; 3523423) and some of them are in my database table as primary key values. I want to select only those numbers which are not in my table. What is the best way to do this?
If it is just a few numbers you can do
select tmp.num
from
(
select 2342423 as num
union all
select 34443123
union all
select 3523423
) tmp
left join your_table t on t.id = tmp.num
where t.id is null
If it is more than a few numbers you should insert these into a table and left join against that table like this
select twntc.num
from table_with_numbers_to_check twntc
left join your_table t on t.id = twntc.num
where t.id is null
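Here is a runnable sketch of the first variant using Python's sqlite3. The table your_table and its contents are made up for the demonstration; the candidate numbers are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE your_table (id INTEGER PRIMARY KEY);
    INSERT INTO your_table VALUES (2342423), (42);  -- hypothetical contents
""")

# Build the candidate list inline, left join it to the table,
# and keep only the candidates that found no partner.
missing = [row[0] for row in conn.execute("""
    SELECT tmp.num
    FROM (
        SELECT 2342423 AS num
        UNION ALL SELECT 34443123
        UNION ALL SELECT 3523423
    ) AS tmp
    LEFT JOIN your_table AS t ON t.id = tmp.num
    WHERE t.id IS NULL
    ORDER BY tmp.num
""")]

print(missing)
```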

How to join several unrelated tables

I have five queries, and each of them returns a single-column, multiple-row output. I want to write a function which will contain all of these queries.
Can anyone help?
query 1:
Select Col1 as X from Table1;
query 2:
Select Col3 as Y from Table2;
From a function I want to get a table which will have columns
X, Y
How to club these queries under single function?
Add a ROW_NUMBER() to each of the queries and join them by the row number.
Depending on the number of rows returned by each query, you'd join them with an inner, left or full join.
The example below assumes that the two queries return the same number of rows.
WITH
CTE1
AS
(
SELECT Col1 as X, ROW_NUMBER() OVER(ORDER BY Col1) AS rn
FROM Table1
)
,CTE2
AS
(
SELECT Col3 as Y, ROW_NUMBER() OVER(ORDER BY Col3) AS rn
FROM Table2
)
SELECT
CTE1.X, CTE2.Y
FROM
CTE1
INNER JOIN CTE2 ON CTE1.rn = CTE2.rn
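The same query can be tried out with Python's sqlite3 (window functions need SQLite 3.25 or newer). The sample rows are invented; note that the pairing is purely positional, based on each column's own sort order, and says nothing about any real relationship between the values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Col1 TEXT);
    CREATE TABLE Table2 (Col3 INTEGER);
    INSERT INTO Table1 VALUES ('b'), ('a'), ('c');   -- hypothetical rows
    INSERT INTO Table2 VALUES (30), (10), (20);
""")

# Number the rows of each query, then line them up by that number.
paired = conn.execute("""
    WITH CTE1 AS (
        SELECT Col1 AS X, ROW_NUMBER() OVER (ORDER BY Col1) AS rn FROM Table1
    ),
    CTE2 AS (
        SELECT Col3 AS Y, ROW_NUMBER() OVER (ORDER BY Col3) AS rn FROM Table2
    )
    SELECT CTE1.X, CTE2.Y
    FROM CTE1
    INNER JOIN CTE2 ON CTE1.rn = CTE2.rn
    ORDER BY CTE1.rn
""").fetchall()

print(paired)
```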
Use the UNION operator:
SELECT
column_1
FROM
tbl_name_1
UNION ALL
SELECT
column_1
FROM
tbl_name_2;
If there is a relation between the two tables, try using a join.
Maybe a simple inner join would be possible here?
select Table1.Col1 as X, Table2.Col3 as Y from Table1
join Table2
on Table1.Col1_name = Table2.col3_name

SQL query to find record with ID not in another table

I have two tables with binding primary key in database and I desire to find a disjoint set between them. For example,
Table1 has columns (ID, Name) and sample data: (1 ,John), (2, Peter), (3, Mary)
Table2 has columns (ID, Address) and sample data: (1, address2), (2, address2)
So how do I create a SQL query so I can fetch the row with ID from table1 that is not in table2. In this case, (3, Mary) should be returned?
PS: The ID is the primary key for those two tables.
Try this
SELECT ID, Name
FROM Table1
WHERE ID NOT IN (SELECT ID FROM Table2)
Use LEFT JOIN
SELECT a.*
FROM table1 a
LEFT JOIN table2 b
on a.ID = b.ID
WHERE b.id IS NULL
There are basically 3 approaches to that: not exists, not in and left join / is null.
LEFT JOIN with IS NULL
SELECT l.*
FROM t_left l
LEFT JOIN
t_right r
ON r.value = l.value
WHERE r.value IS NULL
NOT IN
SELECT l.*
FROM t_left l
WHERE l.value NOT IN
(
SELECT value
FROM t_right r
)
NOT EXISTS
SELECT l.*
FROM t_left l
WHERE NOT EXISTS
(
SELECT NULL
FROM t_right r
WHERE r.value = l.value
)
Which one is better? The answer is best broken down by major RDBMS vendor. Generally speaking, one should avoid using select ... where ... in (select ...) when the number of records returned by the sub-query is unknown. Some vendors also limit the size of an IN list; Oracle, for example, caps a literal list at 1,000 expressions, although that limit does not apply to sub-queries. The best thing to do is to try all three and compare the execution plans.
Specifically for PostgreSQL, the execution plans of NOT EXISTS and LEFT JOIN / IS NULL are the same. I personally prefer the NOT EXISTS option because it shows the intent better. After all, the semantics are that you want to find records in A whose pk does not exist in B.
Old but still gold, specific to PostgreSQL though: https://explainextended.com/2009/09/16/not-in-vs-not-exists-vs-left-join-is-null-postgresql/
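Besides performance, the three forms differ in how they treat NULLs, which matters whenever the compared column is nullable (not the case for a primary key, but worth knowing). The sketch below uses Python's sqlite3 with the t_left / t_right names from above and a NULL added to t_right purely as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_left (value INTEGER);
    CREATE TABLE t_right (value INTEGER);
    INSERT INTO t_left VALUES (1), (2), (3);
    INSERT INTO t_right VALUES (1), (NULL);  -- the NULL is the interesting part
""")

def values(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# NOT IN: "2 NOT IN (1, NULL)" evaluates to NULL, so no row passes the filter.
not_in = values("""
    SELECT l.value FROM t_left l
    WHERE l.value NOT IN (SELECT value FROM t_right)
""")

not_exists = values("""
    SELECT l.value FROM t_left l
    WHERE NOT EXISTS (SELECT NULL FROM t_right r WHERE r.value = l.value)
    ORDER BY l.value
""")

left_join = values("""
    SELECT l.value FROM t_left l
    LEFT JOIN t_right r ON r.value = l.value
    WHERE r.value IS NULL
    ORDER BY l.value
""")

print(not_in, not_exists, left_join)
```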
Fast Alternative
I ran some tests (on Postgres 9.5) using two tables with ~2M rows each. The query below performed at least 5 times better than the other queries proposed:
-- Count
SELECT count(*) FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2;
-- Get full row
SELECT table1.* FROM (
(SELECT id FROM table1) EXCEPT (SELECT id FROM table2)
) t1_not_in_t2 JOIN table1 ON t1_not_in_t2.id=table1.id;
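For a quick correctness check of the EXCEPT approach (not a benchmark), here is a sketch using Python's sqlite3 and the sample data from the question. SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are dropped here; the logic is otherwise the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO table1 VALUES (1, 'John'), (2, 'Peter'), (3, 'Mary');
    INSERT INTO table2 VALUES (1, 'address2'), (2, 'address2');
""")

# IDs present in table1 but absent from table2.
only_in_t1 = conn.execute("""
    SELECT id FROM table1
    EXCEPT
    SELECT id FROM table2
""").fetchall()

# Join back to table1 to recover the full rows.
full_rows = conn.execute("""
    SELECT table1.*
    FROM (SELECT id FROM table1 EXCEPT SELECT id FROM table2) AS t1_not_in_t2
    JOIN table1 ON t1_not_in_t2.id = table1.id
""").fetchall()

print(only_in_t1, full_rows)
```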
Keeping in mind the points made in @John Woo's comment/link above, this is how I typically would handle it:
SELECT t1.ID, t1.Name
FROM Table1 t1
WHERE NOT EXISTS (
SELECT TOP 1 NULL
FROM Table2 t2
WHERE t1.ID = t2.ID
)
SELECT COUNT(ID) FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For count
SELECT ID FROM tblA a
WHERE a.ID NOT IN (SELECT b.ID FROM tblB b) --For results

SQL Inner Join On Null Values

I have a Join
SELECT * FROM Y
INNER JOIN X ON ISNULL(X.QID, 0) = ISNULL(y.QID, 0)
Isnull in a Join like this makes it slow. It's like having a conditional Join.
Is there any work around to something like this?
I have a lot of records where QID is Null
Anyone have a work around that doesn't entail modifying the data
You have two options
INNER JOIN x
ON x.qid = y.qid OR (x.qid IS NULL AND y.qid IS NULL)
or easier
INNER JOIN x
ON x.qid IS NOT DISTINCT FROM y.qid
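To see what the first option changes, the sketch below (Python's sqlite3, made-up rows) compares a plain equality join with the NULL-aware condition. A plain = never matches NULL to NULL, because the comparison evaluates to NULL rather than true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (qid INTEGER);
    CREATE TABLE y (qid INTEGER);
    INSERT INTO x VALUES (1), (NULL);        -- hypothetical rows
    INSERT INTO y VALUES (1), (NULL), (2);
""")

# Plain equality: only the 1 = 1 pair survives.
plain = conn.execute("""
    SELECT x.qid, y.qid FROM y
    INNER JOIN x ON x.qid = y.qid
""").fetchall()

# NULL-aware condition: the NULL/NULL pair is matched as well.
null_aware = conn.execute("""
    SELECT x.qid, y.qid FROM y
    INNER JOIN x ON x.qid = y.qid OR (x.qid IS NULL AND y.qid IS NULL)
    ORDER BY x.qid
""").fetchall()

print(plain)
print(null_aware)
```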
If you want null values from Y.QID to be included, then the fastest way is
SELECT * FROM Y
LEFT JOIN X ON y.QID = X.QID
Note: this solution is applicable only if you need null values from Left table i.e. Y (in above case).
Otherwise
INNER JOIN x ON x.qid IS NOT DISTINCT FROM y.qid
is the right way to do it.
This article has a good discussion on this issue. You can use
SELECT *
FROM Y
INNER JOIN X ON EXISTS(SELECT X.QID
INTERSECT
SELECT y.QID);
Are you committed to using the Inner join syntax?
If not you could use this alternative syntax:
SELECT *
FROM Y,X
WHERE (X.QID=Y.QID) or (X.QID is null and Y.QID is null)
I'm pretty sure that the join doesn't even do what you want. If there are 100 records in table a with a null qid and 100 records in table b with a null qid, then the join as written should make a cross join and give 10,000 results for those records. If you look at the following code and run the examples, I think that the last one is probably more the result set you intended:
create table #test1 (id int identity, qid int)
create table #test2 (id int identity, qid int)
Insert #test1 (qid)
select null
union all
select null
union all
select 1
union all
select 2
union all
select null
Insert #test2 (qid)
select null
union all
select null
union all
select 1
union all
select 3
union all
select null
select * from #test2 t2
join #test1 t1 on t2.qid = t1.qid
select * from #test2 t2
join #test1 t1 on isnull(t2.qid, 0) = isnull(t1.qid, 0)
select * from #test2 t2
join #test1 t1 on
t1.qid = t2.qid OR ( t1.qid IS NULL AND t2.qid IS NULL )
select t2.id, t2.qid, t1.id, t1.qid from #test2 t2
join #test1 t1 on t2.qid = t1.qid
union all
select null, null,id, qid from #test1 where qid is null
union all
select id, qid, null, null from #test2 where qid is null
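The row counts are easy to verify with a scaled-down run. The sketch below reproduces the same sample data in Python's sqlite3 (COALESCE stands in for ISNULL, and ordinary tables for the temp tables): with three NULLs on each side, the NULL-to-0 join produces 3 x 3 = 9 NULL pairings plus the single real match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test1 (id INTEGER PRIMARY KEY, qid INTEGER);
    CREATE TABLE test2 (id INTEGER PRIMARY KEY, qid INTEGER);
    INSERT INTO test1 (qid) VALUES (NULL), (NULL), (1), (2), (NULL);
    INSERT INTO test2 (qid) VALUES (NULL), (NULL), (1), (3), (NULL);
""")

def count(sql):
    """Return the single number produced by a COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]

# Plain equality: only qid = 1 matches.
plain_rows = count("""
    SELECT COUNT(*) FROM test2 t2
    JOIN test1 t1 ON t2.qid = t1.qid
""")

# NULLs mapped to 0: every NULL row pairs with every NULL row on the other side.
coalesced_rows = count("""
    SELECT COUNT(*) FROM test2 t2
    JOIN test1 t1 ON COALESCE(t2.qid, 0) = COALESCE(t1.qid, 0)
""")

print(plain_rows, coalesced_rows)
```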
Hey, it is kind of late to answer this, but I had the same question. What I realized is that you must have a record with an ID of 0 in your second table for this:
SELECT * FROM Y
INNER JOIN X ON ISNULL(Y.QID, 0) = ISNULL(X.QID, 0)
to work; it actually says: if there is none, then use 0. BUT what if the Y table does NOT have a record with an ID of 0?
So, I found this method, (and worked for my case):
SELECT
ISNULL(Y.QName, 'ThereIsNone') AS YTableQName
FROM
X
LEFT OUTER JOIN Y ON X.QID = Y.QID
This way you DON'T need a record with 0 ID value in your second table (which is Y in this case and Customers in my case), OR any record at all
UPDATE:
You can also take a look at this post for better understanding.
Basically you want to join two tables together where their QID columns are both not null, correct? However, you aren't enforcing any other conditions, such as that the two QID values match (which seems strange to me, but ok). Something as simple as the following (tested in MySQL) seems to do what you want:
SELECT * FROM `Y` INNER JOIN `X` ON (`Y`.`QID` IS NOT NULL AND `X`.`QID` IS NOT NULL);
This gives you every non-null row in Y joined to every non-null row in X.
Update: Rico says he also wants the rows with NULL values, why not just:
SELECT * FROM `Y` INNER JOIN `X`;
You could also use the coalesce function. I tested this in PostgreSQL, but it should also work for MySQL or MS SQL server.
INNER JOIN x ON coalesce(x.qid, -1) = coalesce(y.qid, -1)
This will replace NULL with -1 before evaluating it. Hence there must be no -1 in qid.
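The caveat about the sentinel can be shown directly. In the sketch below (Python's sqlite3, invented rows), x contains a genuine -1, so the COALESCE join wrongly pairs it with the NULL row in y; the explicit NULL check does not have that problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (qid INTEGER);
    CREATE TABLE y (qid INTEGER);
    INSERT INTO x VALUES (-1), (NULL), (5);  -- note the real -1 value
    INSERT INTO y VALUES (NULL), (5);
""")

# Sentinel approach: the real -1 is indistinguishable from a replaced NULL.
sentinel_rows = conn.execute("""
    SELECT x.qid, y.qid FROM y
    INNER JOIN x ON COALESCE(x.qid, -1) = COALESCE(y.qid, -1)
""").fetchall()

# Explicit NULL check: only NULL/NULL and 5/5 are paired.
explicit_rows = conn.execute("""
    SELECT x.qid, y.qid FROM y
    INNER JOIN x ON x.qid = y.qid OR (x.qid IS NULL AND y.qid IS NULL)
""").fetchall()

print(len(sentinel_rows), len(explicit_rows))
```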