select query respecting conditions - sql

I have a table with 4 columns (id, val1, val2, val3).
Does anyone know how to select the rows where val3 is the same but val1 is different?
For example:
row1: (id1, user1, matheos, cvn)
row2: (id2, user2, matheos, cvn)
row3: (id3, user3, Claudia, bnps)
Here I want to return row1 and row2.

Your explanation is not entirely clear, but the following query will find matching rows according to the criteria you specified:
select a.*, b.*
from my_table a
join my_table b on b.val3 = a.val3
and b.val1 <> a.val1
and b.id < a.id
To return each matching row separately, you can also do:
select *
from my_table a
where exists (
select null from my_table b where b.val3 = a.val3 and b.val1 <> a.val1
)
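As a quick check, here is a minimal sketch that runs the EXISTS form against the three example rows from the question. It assumes SQLite through Python's sqlite3 module (the question does not name a database) and assumes the "different" column is val1, as the question states. The helper name rows_sharing_val3 is made up for illustration.

```python
import sqlite3

def rows_sharing_val3(conn):
    """Return rows whose val3 also appears on a row with a different val1."""
    query = """
        SELECT a.id, a.val1, a.val2, a.val3
        FROM my_table a
        WHERE EXISTS (
            SELECT 1 FROM my_table b
            WHERE b.val3 = a.val3 AND b.val1 <> a.val1
        )
        ORDER BY a.id
    """
    return conn.execute(query).fetchall()

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id TEXT, val1 TEXT, val2 TEXT, val3 TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [
        ("id1", "user1", "matheos", "cvn"),
        ("id2", "user2", "matheos", "cvn"),
        ("id3", "user3", "Claudia", "bnps"),
    ],
)
print(rows_sharing_val3(conn))  # row1 and row2 only
```

row3 is left out because no other row shares its val3.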

Based on your explanation, you can try this:
select distinct t1.* from mytable t1
JOIN mytable t2 ON t1.val3 = t2.val3
and t1.val1 != t2.val1;

Related

efficient way to compare two tables in bigquery

I am interested in comparing whether two tables contain the same data.
I could do it like this:
#standardSQL
SELECT
key1, key2
FROM
(
SELECT
table1.key1,
table1.key2,
table1.column1 - table2.column1 as col1,
table1.col2 - table2.col2 as col2
FROM
`table1` AS table1
LEFT JOIN
`table2` AS table2
ON
table1.key1 = table2.key1
AND
table1.key2 = table2.key2
)
WHERE
col1 != 0
OR
col2 != 0
But when I want to compare all numerical columns, this gets hard, especially if I want to do it for multiple table combinations.
Therefore my question: is anyone aware of a way to iterate over all numerical columns and restrict the result set to those keys where any of these differences is not zero?
In Standard SQL, we found that a UNION ALL of two EXCEPT DISTINCT queries works for our use cases:
(
SELECT * FROM table1
EXCEPT DISTINCT
SELECT * from table2
)
UNION ALL
(
SELECT * FROM table2
EXCEPT DISTINCT
SELECT * from table1
)
This will produce differences in both directions:
rows in table1 that are not in table2
rows in table2 that are not in table1
Notes and caveats:
table1 and table2 must have the same width, with columns in the same order and of the same types.
This does not work directly with STRUCT or ARRAY data types. You should either UNNEST them or use TO_JSON_STRING to convert these data types first.
This does not work directly with GEOGRAPHY either; you must cast to text first using ST_AsText.
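To illustrate the two-directional EXCEPT idea outside BigQuery, here is a rough sketch using SQLite through Python's sqlite3 module. This is an adaptation, not the BigQuery query itself: SQLite has no EXCEPT DISTINCT keyword (its plain EXCEPT already removes duplicates) and does not accept parenthesized compound selects, so each half is wrapped in a subquery. The extra side column and the table contents are assumptions added for the demo.

```python
import sqlite3

def symmetric_difference(conn):
    """Rows present in exactly one of table1 / table2, tagged with their side."""
    query = """
        SELECT 'only_in_table1' AS side, *
        FROM (SELECT * FROM table1 EXCEPT SELECT * FROM table2)
        UNION ALL
        SELECT 'only_in_table2' AS side, *
        FROM (SELECT * FROM table2 EXCEPT SELECT * FROM table1)
        ORDER BY 1, 2, 3
    """
    return conn.execute(query).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key1 INTEGER, key2 INTEGER, column1 INTEGER, col2 INTEGER);
    CREATE TABLE table2 (key1 INTEGER, key2 INTEGER, column1 INTEGER, col2 INTEGER);
    INSERT INTO table1 VALUES (1, 1, 1, 2), (2, 2, 3, 4), (3, 3, 5, 6);
    INSERT INTO table2 VALUES (1, 1, 1, 29), (2, 2, 3, 4), (4, 4, 7, 8);
""")
for row in symmetric_difference(conn):
    print(row)
```

The row (2, 2, 3, 4) is identical in both tables and so does not appear. A key whose values changed shows up once per side.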
First, I want to point out some issues with your original query.
The main issues are 1) using LEFT JOIN, which misses keys that exist only in table2; and 2) using col != 0, which evaluates to NULL (and so filters the row out) when either side is missing.
Below is how it should be modified to capture ALL differences from both tables.
Run your original query and the one below, and you should see the difference:
#standardSQL
SELECT key1, key2
FROM
(
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2,
table1.column1 - table2.column1 AS col1,
table1.col2 - table2.col2 AS col2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
)
WHERE IFNULL(col1, 1) != 0
OR IFNULL(col2, 1) != 0
Alternatively, run your original query and the version above against dummy data to see the difference:
#standardSQL
WITH `table1` AS (
SELECT 1 key1, 1 key2, 1 column1, 2 col2 UNION ALL
SELECT 2, 2, 3, 4 UNION ALL
SELECT 3, 3, 5, 6
), `table2` AS (
SELECT 1 key1, 1 key2, 1 column1, 29 col2 UNION ALL
SELECT 2, 2, 3, 4 UNION ALL
SELECT 4, 4, 7, 8
)
SELECT key1, key2
FROM
(
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2,
table1.column1 - table2.column1 AS col1,
table1.col2 - table2.col2 AS col2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
)
WHERE IFNULL(col1, 1) != 0
OR IFNULL(col2, 1) != 0
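The difference between the two versions can be reproduced locally. The sketch below is an approximation in SQLite (via Python's sqlite3), not BigQuery. To avoid depending on FULL OUTER JOIN support in older SQLite builds, it emulates the full outer join with a LEFT JOIN plus a UNION ALL of the keys found only in table2. The constant names ORIGINAL and NULL_SAFE are made up for this example.

```python
import sqlite3

# The question's approach: LEFT JOIN plus a plain "!= 0" test.
ORIGINAL = """
    SELECT key1, key2 FROM (
        SELECT t1.key1 AS key1, t1.key2 AS key2,
               t1.column1 - t2.column1 AS col1,
               t1.col2 - t2.col2 AS col2
        FROM table1 t1
        LEFT JOIN table2 t2 ON t1.key1 = t2.key1 AND t1.key2 = t2.key2
    )
    WHERE col1 != 0 OR col2 != 0
    ORDER BY key1, key2
"""

# Emulated full outer join, with NULL differences treated as "different".
NULL_SAFE = """
    SELECT key1, key2 FROM (
        SELECT t1.key1 AS key1, t1.key2 AS key2,
               t1.column1 - t2.column1 AS col1,
               t1.col2 - t2.col2 AS col2
        FROM table1 t1
        LEFT JOIN table2 t2 ON t1.key1 = t2.key1 AND t1.key2 = t2.key2
        UNION ALL
        SELECT t2.key1, t2.key2, NULL, NULL
        FROM table2 t2
        LEFT JOIN table1 t1 ON t1.key1 = t2.key1 AND t1.key2 = t2.key2
        WHERE t1.key1 IS NULL
    )
    WHERE IFNULL(col1, 1) != 0 OR IFNULL(col2, 1) != 0
    ORDER BY key1, key2
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (key1 INTEGER, key2 INTEGER, column1 INTEGER, col2 INTEGER);
    CREATE TABLE table2 (key1 INTEGER, key2 INTEGER, column1 INTEGER, col2 INTEGER);
    INSERT INTO table1 VALUES (1, 1, 1, 2), (2, 2, 3, 4), (3, 3, 5, 6);
    INSERT INTO table2 VALUES (1, 1, 1, 29), (2, 2, 3, 4), (4, 4, 7, 8);
""")
print(conn.execute(ORIGINAL).fetchall())   # only the key whose values differ
print(conn.execute(NULL_SAFE).fetchall())  # also keys missing on either side
```

With this dummy data the original form reports only key (1, 1), while the NULL-safe form also reports (3, 3) and (4, 4), which exist in just one table.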
Secondly, the following greatly simplifies your overall query:
#standardSQL
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
WHERE TO_JSON_STRING(table1) != TO_JSON_STRING(table2)
You can test it with the same dummy data as above.
Note: with this solution you don't need to pick specific columns; it simply compares all of them. If you need to compare only specific columns, you still have to cherry-pick them, as in the example below:
#standardSQL
SELECT
IFNULL(table1.key1, table2.key1) key1,
IFNULL(table1.key2, table2.key2) key2
FROM `table1` AS table1
FULL OUTER JOIN `table2` AS table2
ON table1.key1 = table2.key1
AND table1.key2 = table2.key2
WHERE TO_JSON_STRING((table1.column1, table1.col2)) != TO_JSON_STRING((table2.column1, table2.col2))
You will need to specify which columns are the numerical ones, but comparing a serialized representation of all of them gives you a fast comparison:
#standardSQL
WITH table_a AS (
SELECT 1 id, 2 n1, 3 n2
), table_b AS (
SELECT 1 id, 2 n1, 4 n2
)
SELECT id
FROM table_a a
JOIN table_b b
USING(id)
WHERE TO_JSON_STRING([a.n1, a.n2]) != TO_JSON_STRING([b.n1, b.n2])
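The same "serialize, then compare" idea can be sketched client-side. The Python snippet below is only an illustration of the technique, with json.dumps standing in for TO_JSON_STRING; it does not call BigQuery. The function name differing_keys and its parameters are hypothetical.

```python
import json

def differing_keys(rows_a, rows_b, key_cols, compare_cols=None):
    """Return sorted keys whose serialized rows differ or exist on one side only.

    rows_a / rows_b are lists of dicts; compare_cols=None compares every column.
    """
    def index(rows):
        out = {}
        for row in rows:
            key = tuple(row[c] for c in key_cols)
            cols = compare_cols if compare_cols is not None else sorted(row)
            # One string per row, so a single comparison covers all columns.
            out[key] = json.dumps([row[c] for c in cols])
        return out

    a, b = index(rows_a), index(rows_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

table_a = [{"id": 1, "n1": 2, "n2": 3}]
table_b = [{"id": 1, "n1": 2, "n2": 4}]
print(differing_keys(table_a, table_b, key_cols=["id"]))
print(differing_keys(table_a, table_b, key_cols=["id"], compare_cols=["n1"]))
```

Comparing all columns flags id 1 because n2 differs. Restricting the comparison to n1 flags nothing.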

Nested query in Hive not working: ParseException

I want the entire query to run for each value returned by the subquery in the WHERE clause. I am unable to figure out what I am doing wrong here. Please help.
SELECT a.*, b.*, c.*
FROM table1 a, table2 b, table3 c
WHERE a.val1 = ( select val1 from table1 )
AND a.val2 = b.val3
AND a.val4 = c.val5;
Use IN instead of =, since the subquery can return more than one value:
SELECT a.*, b.*, c.*
FROM table1 a, table2 b, table3 c
WHERE a.val1 in ( select val1 from table1 )
AND a.val2 = b.val3
AND a.val4 = c.val5;

SQL join with distinct column on one table

Maybe I'm searching with the wrong words, because I can't find the answer elsewhere. I need to join two tables but make sure the ID from one of the tables is distinct. Something like the below:
SELECT B.COLUMN_A, B.COLUMN_B, B.COLUMN_C
FROM TABLE1 A
JOIN TABLE2 B
ON (Distinct) A.COLUMN_A = B.COLUMN_A;
The value A.COLUMN_A from TABLE1 needs to be DISTINCT.
I've tried the below but that didn't work:
SELECT B.COLUMN_A, B.COLUMN_B, B.COLUMN_C
FROM TABLE1 A
JOIN (SELECT DISTINCT COLUMN_A FROM TABLE2) B
ON A.COLUMN_A = B.COLUMN_A;
I keep getting an ORA-00904: invalid identifier error on B.COLUMN_C. If I try to use ) AS B then I get an ORA-00905: missing keyword error.
If you don't care about the other values, use GROUP BY:
SELECT b.column_a, b.column_b, b.column_c
FROM table1 a
JOIN (
SELECT column_a, max(column_b) as column_b, max(column_c) as column_c
FROM table2
GROUP BY column_a
) b ON a.column_a = b.column_a
Use a ROW_NUMBER to get a single row per COLUMN_A:
SELECT *
FROM table1 A
JOIN
(
SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY COLUMN_A ORDER BY COLUMN_A) AS rn
FROM table2 t
) B
ON A.column_a = B.column_a
AND B.rn = 1
Maybe you need something like this:
select * from
(
select column_a,column_b,column_c
from
(
select column_a,column_b,column_c, count(1) over (partition by column_a) as num
from tableB
)
where num = 1
)tB
inner join tableA
using (column_a)
The double nesting is not necessary, but I hope it makes the query more readable. Note that this keeps only the column_a values that occur exactly once in tableB.
If you need col_a, col_b, col_c, want to ensure col_a never repeats, and the particular col_b, col_c values are not important, then:
SELECT col_a, col_b, col_c
FROM table2
WHERE rowid IN ( SELECT min(A.rowid)
FROM table2 A, table1 B
WHERE B.col_a = A.col_a
GROUP BY A.col_a )
The query above chooses one row of table2 for each col_a value that is also present in table1, then uses that row's rowid to select all three columns.
You are not selecting any of the columns from TABLE1, so your join to (distinct) TABLE1 records is really just a semi-join, which is most easily expressed as:
SELECT B.COLUMN_A, B.COLUMN_B, B.COLUMN_C
FROM TABLE2 B
WHERE EXISTS ( SELECT 'at least one row in table1'
FROM TABLE1 A
WHERE A.COLUMN_A = B.COLUMN_A );
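The effect of the semi-join is easy to see on a small data set. The following is a sketch that uses SQLite through Python's sqlite3 rather than Oracle. The sample rows and the helper names plain_join and semi_join are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (column_a INTEGER);
    CREATE TABLE table2 (column_a INTEGER, column_b TEXT, column_c TEXT);
    -- column_a = 1 appears twice in table1, which is what duplicates join output
    INSERT INTO table1 VALUES (1), (1), (2);
    INSERT INTO table2 VALUES (1, 'b1', 'c1'), (2, 'b2', 'c2'), (3, 'b3', 'c3');
""")

def plain_join(conn):
    """Inner join: one output row per matching (table1, table2) pair."""
    return conn.execute("""
        SELECT b.column_a, b.column_b, b.column_c
        FROM table1 a
        JOIN table2 b ON a.column_a = b.column_a
        ORDER BY b.column_a
    """).fetchall()

def semi_join(conn):
    """EXISTS: each table2 row at most once, however many table1 rows match."""
    return conn.execute("""
        SELECT b.column_a, b.column_b, b.column_c
        FROM table2 b
        WHERE EXISTS (SELECT 1 FROM table1 a WHERE a.column_a = b.column_a)
        ORDER BY b.column_a
    """).fetchall()

print(plain_join(conn))  # the column_a = 1 row is repeated
print(semi_join(conn))   # each table2 row appears once
```

Since nothing is selected from table1, EXISTS avoids the duplication without needing DISTINCT or a derived table.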

SQL Server. Delete from Select

I am using SQL Server 2012, and have the following query. Let's call this query A.
SELECT a.col, a.fk
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col
I want to delete only the rows returned from query A, specifically the rows that match the returned col AND fk.
I am thinking of doing the following, but it will only delete rows that match on col:
delete from Table1
where col in (
SELECT a.col
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col)
Use the DELETE ... FROM ... JOIN syntax:
delete t1
from table1 t1
INNER JOIN (SELECT a.col, a.fk
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col) t2
ON t1.col=t2.col and t1.fk=t2.fk
You can combine the col and fk fields into a single unique value to match the wanted rows:
delete from Table1
where cast(col as varchar(50))+'//'+cast(fk as varchar(50)) in (
SELECT cast(a.col as varchar(50))+'//'+cast(a.fk as varchar(50))
FROM Table1 a
INNER JOIN (
select b.col
from Table1 b
group by b.col
having count(*) > 1)
b on b.col = a.col)
You can express Query A like this:
SELECT col, fk
FROM (
SELECT a.col, a.fk, COUNT(*) OVER (PARTITION BY a.col) AS [count]
FROM Table1 a
) counted
WHERE [count] > 1
Which leads to a nice way to do the DELETE using a CTE:
;WITH ToDelete AS (
SELECT a.col, a.fk, COUNT(*) OVER (PARTITION BY a.col) AS [count]
FROM Table1 a
)
DELETE FROM ToDelete
WHERE [count] > 1
This does give the same result as the DELETE statement in your question though.
If you want to delete all but one row with the duplicate col value you can use something like this:
;WITH ToDelete AS (
SELECT a.col, a.fk
, ROW_NUMBER() OVER (PARTITION BY a.col ORDER BY a.fk) AS [occurrence]
FROM Table1 a
)
DELETE FROM ToDelete
WHERE [occurrence] > 1
The ORDER BY clause will determine which row is kept.
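The two behaviours (delete every row of a duplicated col, or keep one row per col) can be tried out with the sketch below. It uses SQLite through Python's sqlite3, which does not support deleting through a CTE as SQL Server does, so both deletes are rewritten with subqueries. This is an emulation of the idea, not the T-SQL above. It also assumes fk is unique within each col. The helper names are made up.

```python
import sqlite3

def make_table():
    """Fresh in-memory Table1 with duplicated col values 'a' and 'c'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (col TEXT, fk INTEGER)")
    conn.executemany(
        "INSERT INTO Table1 VALUES (?, ?)",
        [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("c", 6)],
    )
    return conn

def remaining(conn):
    return conn.execute("SELECT col, fk FROM Table1 ORDER BY col, fk").fetchall()

def delete_all_duplicates(conn):
    """Remove every row whose col occurs more than once."""
    conn.execute("""
        DELETE FROM Table1
        WHERE col IN (SELECT col FROM Table1 GROUP BY col HAVING COUNT(*) > 1)
    """)

def delete_all_but_first(conn):
    """Keep the row with the lowest fk per col and remove the others."""
    conn.execute("""
        DELETE FROM Table1
        WHERE EXISTS (
            SELECT 1 FROM Table1 AS keep
            WHERE keep.col = Table1.col AND keep.fk < Table1.fk
        )
    """)

conn = make_table()
delete_all_duplicates(conn)
print(remaining(conn))  # only the non-duplicated col survives

conn = make_table()
delete_all_but_first(conn)
print(remaining(conn))  # one row per col, the one with the lowest fk
```

Keeping the lowest fk corresponds to ORDER BY a.fk in the ROW_NUMBER version. Reversing the comparison would keep the highest instead.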

SQL query to find ids where field1 = x and field1 = y

I have this table (simplified)
ID | Field1
---------------------------------
2 | Cat
2 | Goat
6 | Cat
6 | Dog
I need to find the IDs where a record exists whose Field1 value is Cat and, for the same ID, another record exists whose value is Goat. In this case, it would return only ID 2.
Doing something like the below will not work:
where Field1='Cat' and Field1='Goat'
I'm guessing I need some sort of subquery here? I'm not entirely sure. (Using SQL Server 2008)
Use:
SELECT t.id
FROM YOUR_TABLE t
WHERE t.field1 IN ('Cat', 'Goat')
GROUP BY t.id
HAVING COUNT(DISTINCT t.field1) = 2
The key here is using COUNT(DISTINCT ...) to count the distinct field1 values. It doesn't matter if an ID has Cat 3x and Dog 1x, since only the Cat and Goat rows pass the WHERE clause and duplicates are counted once.
Another option is INTERSECT, which returns the distinct values that are returned by both the query on the left and the query on the right of the INTERSECT operator:
SELECT a.id
FROM YOUR_TABLE a
WHERE a.field1 = 'Cat'
INTERSECT
SELECT b.id
FROM YOUR_TABLE b
WHERE b.field1 = 'Goat'
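Both approaches can be checked against the sample table from the question. This is a minimal sketch using SQLite through Python's sqlite3 instead of SQL Server 2008. The table name your_table follows the placeholder used above, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, field1 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(2, "Cat"), (2, "Goat"), (6, "Cat"), (6, "Dog")],
)

def ids_by_group(conn):
    """GROUP BY / HAVING: ids that have both distinct values."""
    rows = conn.execute("""
        SELECT id
        FROM your_table
        WHERE field1 IN ('Cat', 'Goat')
        GROUP BY id
        HAVING COUNT(DISTINCT field1) = 2
        ORDER BY id
    """).fetchall()
    return [r[0] for r in rows]

def ids_by_intersect(conn):
    """INTERSECT: ids returned by both single-value queries."""
    rows = conn.execute("""
        SELECT id FROM your_table WHERE field1 = 'Cat'
        INTERSECT
        SELECT id FROM your_table WHERE field1 = 'Goat'
        ORDER BY id
    """).fetchall()
    return [r[0] for r in rows]

print(ids_by_group(conn))
print(ids_by_intersect(conn))
```

On the sample data both functions return only ID 2. ID 6 has Cat but no Goat.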
Try this:
SELECT id FROM
(
SELECT id FROM <YOUR_TABLE> WHERE Field1 = 'Cat'
INTERSECT
SELECT id FROM <YOUR_TABLE> WHERE Field1 = 'Goat'
) a
Alternative:
SELECT a.ID
FROM <YOUR_TABLE> a INNER JOIN <YOUR_TABLE> b
ON a.ID = b.ID
WHERE a.Field1 = 'Cat'
AND b.Field1 = 'Goat'
Use a query like this:
SELECT table.ID FROM table INNER JOIN
(SELECT ID, COUNT(FIELD1) AS Expr1
FROM table GROUP BY ID
HAVING COUNT(FIELD1) > 1) SR ON table.ID = SR.ID WHERE table.FIELD1 = 'Cat'
To make the query more dynamic, replace the 'Cat' literal with a variable created with DECLARE. Note that this returns any ID that has a 'Cat' record and more than one record in total, not specifically a 'Goat' record.
SELECT DISTINCT t1.ID
FROM table t1, table t2
WHERE t1.ID=t2.ID AND t1.Field1 <> t2.Field1
Not tested, but something like this might work:
select t1.ID from tbl t1 inner join tbl t2 on t1.ID=t2.ID
where (t1.Field1='Cat' and t2.Field1='Goat')