Query a table that has 2 columns with multiple criteria - sql

I have a table with the following structure and example data:
Now I want to query the records whose value equals # and #.
For example, according to the above image, it should return 1 and 2:
id
-----
1
2
Also, if the parameters were #, # and $, it should return 1, because only the records with id 1 have all the given values.
id
-----
1

You can use GROUP BY and HAVING to get the Ids whose count of distinct matching values equals the number of items you're looking for:
SELECT Id
FROM Table
WHERE Value IN ('#','$')
GROUP BY Id
HAVING COUNT(DISTINCT Value) = 2
SELECT Id
FROM Table
WHERE Value IN ('#','$','#')
GROUP BY Id
HAVING COUNT(DISTINCT Value) = 3
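The GROUP BY / HAVING approach can be tried out locally. The following is a minimal sketch, not the answerer's own test setup: it assumes SQLite (through Python's sqlite3 module) as a stand-in database, a hypothetical table name `items`, and the hypothetical values A, B and C in place of the placeholder symbols used in the question.

```python
import sqlite3

# Hypothetical sample data: id 1 has A, B and C; id 2 has A and B; id 3 has only A.
ROWS = [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "A")]


def ids_with_all_values(conn, values):
    """Return the ids that have every value in `values` (GROUP BY / HAVING)."""
    wanted = sorted(set(values))  # de-duplicate so the count target is accurate
    placeholders = ", ".join("?" for _ in wanted)
    sql = (
        "SELECT Id FROM items "
        f"WHERE Value IN ({placeholders}) "
        "GROUP BY Id "
        "HAVING COUNT(DISTINCT Value) = ? "
        "ORDER BY Id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Id INTEGER, Value TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", ROWS)

print(ids_with_all_values(conn, ["A", "B"]))       # [1, 2]
print(ids_with_all_values(conn, ["A", "B", "C"]))  # [1]
```

De-duplicating the search values first matters: the number compared in HAVING has to be the number of distinct values in the IN list, otherwise no Id can ever match.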

There are several ways to do this.
The subquery method:
SELECT DISTINCT Id
FROM Table
WHERE Id IN (SELECT Id FROM Table WHERE Value = '#')
AND Id IN (SELECT Id FROM Table WHERE Value = '#');
The correlated subquery method:
SELECT DISTINCT t.Id
FROM Table t
WHERE EXISTS (SELECT 1 FROM Table a WHERE a.Id = t.Id and a.Value = '#')
AND EXISTS (SELECT 1 FROM Table b WHERE b.Id = t.Id and b.Value = '#');
And the INTERSECT method:
SELECT Id FROM Table WHERE Value = '#'
INTERSECT
SELECT Id FROM Table WHERE Value = '#';
Best performance will depend on RDBMS vendor, size of table, and indexes. Not all RDBMS vendors support all methods.
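To check that the three methods agree, here is a small sketch under assumptions: SQLite via Python's sqlite3 stands in for the RDBMS (it happens to support all three forms), the table is named `items`, and A and B are hypothetical values replacing the placeholder symbols.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (Id INTEGER, Value TEXT)")
# Hypothetical data: only id 1 has both A and B.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (3, "B")],
)

QUERIES = {
    "in_subquery": """
        SELECT DISTINCT Id FROM items
        WHERE Id IN (SELECT Id FROM items WHERE Value = 'A')
          AND Id IN (SELECT Id FROM items WHERE Value = 'B')
        ORDER BY Id""",
    "exists": """
        SELECT DISTINCT i.Id FROM items i
        WHERE EXISTS (SELECT 1 FROM items a WHERE a.Id = i.Id AND a.Value = 'A')
          AND EXISTS (SELECT 1 FROM items b WHERE b.Id = i.Id AND b.Value = 'B')
        ORDER BY i.Id""",
    "intersect": """
        SELECT Id FROM items WHERE Value = 'A'
        INTERSECT
        SELECT Id FROM items WHERE Value = 'B'
        ORDER BY Id""",
}

# Run every variant and collect the matching ids per method.
results = {name: [r[0] for r in conn.execute(sql)] for name, sql in QUERIES.items()}
for name, ids in results.items():
    print(name, ids)
```

On this data each variant reports only id 1; which one is fastest on a real table still depends on the vendor and indexes, as noted above.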

Maybe a multiple self join like this?
select
distinct t1.id
from
table t1
join table t2 on (t1.id=t2.id)
join table t3 on (t1.id=t3.id)
...
where
t1.value='#' and
t2.value='#' and
t3.value='$' and
...

Related

How to compare two tables in Hive based on counts

I have below hive tables
Table_1
ID
1
1
2
Table_2
ID
1
2
2
I am comparing the two tables based on the count of each ID in both tables. I need output like the below:
ID
1 - 2 records in Table 1 and 1 record in Table 2
2 - 1 record in Table 1 and 2 records in Table 2
Table_1 is the parent table.
I am using the queries below:
select count(*),ID from Table_1 group by ID;
select count(*),ID from Table_2 group by ID;
Just do a full outer join of your two queries on X.id = Y.id, then select from the result, handling the NULLs that appear on either side when an id is missing from one table.
SELECT COALESCE(X.id, Y.id) AS id,
CONCAT(COALESCE(X.cnt1, 0), ' entries in table 1, ', COALESCE(Y.cnt2, 0), ' entries in table 2')
FROM (SELECT COUNT(*) AS cnt1, id FROM table_1 GROUP BY id) X
FULL OUTER JOIN (SELECT COUNT(*) AS cnt2, id FROM table_2 GROUP BY id) Y
ON X.id = Y.id;
Try this. You can use a CASE expression to choose between "record" and "records".
SELECT m.id,
CONCAT (COALESCE(a.ct, 0), ' record in table 1, ', COALESCE(b.ct, 0),
' record in table 2')
FROM (SELECT id
FROM table_1
UNION
SELECT id
FROM table_2) m
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_1
GROUP BY id) a
ON m.id = a.id
LEFT JOIN (SELECT Count(*) AS ct,
id
FROM table_2
GROUP BY id) b
ON m.id = b.id;
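The UNION plus LEFT JOIN query, including the suggested CASE for singular versus plural, can be sketched as follows. This is an approximation rather than Hive itself: it assumes SQLite via Python's sqlite3 as a stand-in, so string concatenation uses `||` where Hive would use CONCAT, and it loads the sample rows from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (id INTEGER);
CREATE TABLE table_2 (id INTEGER);
INSERT INTO table_1 VALUES (1), (1), (2);
INSERT INTO table_2 VALUES (1), (2), (2);
""")

# Inner query: one row per id with both counts (0 when the id is missing).
# Outer query: build the text, picking 'record' or 'records' with CASE.
SQL = """
SELECT id,
       ct1 || CASE WHEN ct1 = 1 THEN ' record' ELSE ' records' END || ' in table 1 and ' ||
       ct2 || CASE WHEN ct2 = 1 THEN ' record' ELSE ' records' END || ' in table 2' AS summary
FROM (
    SELECT m.id AS id, COALESCE(a.ct, 0) AS ct1, COALESCE(b.ct, 0) AS ct2
    FROM (SELECT id FROM table_1 UNION SELECT id FROM table_2) m
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_1 GROUP BY id) a ON m.id = a.id
    LEFT JOIN (SELECT COUNT(*) AS ct, id FROM table_2 GROUP BY id) b ON m.id = b.id
)
ORDER BY id
"""

summary_rows = conn.execute(SQL).fetchall()
for row_id, summary in summary_rows:
    print(row_id, "-", summary)
```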
You could use this Python program to do a full comparison of 2 Hive tables:
https://github.com/bolcom/hive_compared_bq
If you want a quick comparison just based on counts, then pass the "--just-count" option (you can also specify the group by column with "--group-by-column").
The script also allows you to visually see all the differences on all rows and all columns if you want a complete validation.

Join table on Count

I have two tables in Access, one containing IDs (not unique) and a Name, and one containing IDs (not unique) and a Location. I would like to return a third table that contains only the IDs that appear more than once in either the Name table or the Location table.
Table 1
ID Name
1 Max
1 Bob
2 Jack
Table 2
ID Location
1 A
2 B
Basically in this setup it should return only ID 1 because 1 appears twice in Table 1 :
ID
1
I have tried to do a JOIN on the tables and then apply a COUNT but nothing came out.
Thanks in advance!
Here is one method that I think will work in MS Access:
(select id
from table1
group by id
having count(*) > 1
) union -- note: NOT union all
(select id
from table2
group by id
having count(*) > 1
);
MS Access does not allow union/union all in the from clause. Nor does it support full outer join. Note that the union will remove duplicates.
A simple GROUP BY with a HAVING clause should help you:
select ID
From Table1
Group by ID
having count(1)>1
union
select ID
From Table2
Group by ID
having count(1)>1
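The GROUP BY / HAVING / UNION query can be checked against the sample rows from the question. This sketch assumes SQLite via Python's sqlite3 as a stand-in for MS Access, so it only illustrates the logic, not Access-specific syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (ID INTEGER, Name TEXT);
CREATE TABLE Table2 (ID INTEGER, Location TEXT);
INSERT INTO Table1 VALUES (1, 'Max'), (1, 'Bob'), (2, 'Jack');
INSERT INTO Table2 VALUES (1, 'A'), (2, 'B');
""")

# Each branch finds the IDs repeated within one table;
# UNION (not UNION ALL) merges the branches and drops duplicate IDs.
SQL = """
SELECT ID FROM Table1 GROUP BY ID HAVING COUNT(*) > 1
UNION
SELECT ID FROM Table2 GROUP BY ID HAVING COUNT(*) > 1
"""

duplicate_ids = [row[0] for row in conn.execute(SQL)]
print(duplicate_ids)  # [1]
```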
Based on your description, you do not need to join the tables to find duplicate records. If your table is what you gave above, simply use:
With A
as
(
select ID,count(*) as Times From table group by ID
)
select * From A where A.Times>1
Not sure I understand what query you already tried, but this should work:
select table1.ID
from table1 inner join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1
Or, if you have IDs in one table but not the other:
select table1.ID
from table1 full outer join table2 on table1.id = table2.id
group by table1.ID
having count(*) > 1

sql combining count with other fields

Consider a scenario:
id name info done
-----------------------
1 abc x 0
2 abc y 1 <-- I have this id
3 pqr g 1
4 pqr h 0
5 pqr i 1 <-- I have this id
I have id for the last entry of every name.
The result I'm expecting consists of 2 things:
info for last entry of the name
number of done [having value 1] for that name
(1) can be easily achieved by select info from table where id = myid
But how can (2) be achieved in the same query? Can it be achieved in the same query?
Something like
select info, count(done) from table where id = myid group by name where ......
This is a bit complicated, but can be done using conditional aggregation:
select max(case when t.id = myid then info end), sum(done)
from table t
where t.name = (select name from table t2 where t2.id = myid);
The key is getting all the rows for the given name.
If you had multiple columns, then a correlated subquery might be the way to go:
select t.*,
(select sum(t2.done) from table t2 where t2.name = t.name) as numdone
from table t
where t.id = myid;
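The conditional aggregation query can be tried on the sample rows from the question. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for the database, the table is given the hypothetical name `entries`, and `myid` is passed as a named parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, info TEXT, done INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [(1, "abc", "x", 0), (2, "abc", "y", 1),
     (3, "pqr", "g", 1), (4, "pqr", "h", 0), (5, "pqr", "i", 1)],
)


def info_and_done_count(conn, myid):
    """Return (info of row `myid`, number of done rows sharing its name)."""
    sql = """
    SELECT MAX(CASE WHEN t.id = :myid THEN t.info END) AS info,
           SUM(t.done) AS numdone
    FROM entries t
    WHERE t.name = (SELECT name FROM entries t2 WHERE t2.id = :myid)
    """
    return conn.execute(sql, {"myid": myid}).fetchone()


print(info_and_done_count(conn, 5))  # ('i', 2)
print(info_and_done_count(conn, 2))  # ('y', 1)
```

The WHERE clause selects every row with the same name, SUM adds up their done flags, and the CASE inside MAX picks the info of the one row whose id matches.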
You could join the table back to itself on name to get this.
SELECT t1.id, t1.info, sum(t2.done) as number_of_done
FROM table t1 INNER JOIN table t2 on t1.name = t2.name
WHERE t1.id = myid
GROUP BY t1.id, t1.info
Considering done is either 1 or 0, you could just get the sum and display that.
select info, sum(done)
from table where id = myid
group by info
EDIT:
select info, s
from table
inner join (
select name, sum(done) as s
from table
group by name
) as zzz on zzz.name = table.name
where id = myid
If you want to display it alongside more detailed data, use a window function (sum rather than count, since count(done) would also count the rows where done is 0):
select info, sum(done) over (partition by name) ...
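A window-function version might look like the sketch below. It assumes SQLite 3.25 or later (for window function support) via Python's sqlite3, a hypothetical table name `entries`, and the sample rows from the question. The window is computed in a subquery so that filtering on the id does not shrink the partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, info TEXT, done INTEGER)"
)
conn.executemany(
    "INSERT INTO entries VALUES (?, ?, ?, ?)",
    [(1, "abc", "x", 0), (2, "abc", "y", 1),
     (3, "pqr", "g", 1), (4, "pqr", "h", 0), (5, "pqr", "i", 1)],
)

# The subquery attaches the per-name total to every row;
# the outer WHERE then keeps only the requested row.
SQL = """
SELECT info, numdone
FROM (
    SELECT id, info, SUM(done) OVER (PARTITION BY name) AS numdone
    FROM entries
)
WHERE id = ?
"""

last_pqr = conn.execute(SQL, (5,)).fetchone()
last_abc = conn.execute(SQL, (2,)).fetchone()
print(last_pqr, last_abc)  # ('i', 2) ('y', 1)
```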

Value present in more than one table

I have 3 tables. All of them have a column, id. I want to find out whether any value is common across the tables. Assuming that the tables are named a, b and c, if id value 3 is present in a and b, there is a problem. The query can/should exit at the first such occurrence. There is no need to probe further. What I have now is something like
( select id from a intersect select id from b )
union
( select id from b intersect select id from c )
union
( select id from a intersect select id from c )
Obviously, this is not very efficient. Database is PostgreSQL, version 9.0
id is not unique in the individual tables. It is OK to have duplicates in the same table. But if a value is present in just 2 of the 3 tables, that also needs to be flagged, and there is no need to check for existence in the third table or to check whether there are more such values. One value, present in more than one table, and I can stop.
Although id is not unique within any given table, it should be unique across the tables; a union of distinct id should be unique, so:
select id from (
select distinct id from a
union all
select distinct id from b
union all
select distinct id from c) x
group by id
having count(*) > 1
Note the use of union all, which preserves duplicates (plain union removes duplicates).
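A quick way to see the UNION ALL plus GROUP BY idea at work is the sketch below. It assumes SQLite via Python's sqlite3 instead of PostgreSQL 9.0, and the rows are hypothetical: 4 is the only id present in more than one table, while 1 and 5 are duplicated only within a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER);
CREATE TABLE b (id INTEGER);
CREATE TABLE c (id INTEGER);
INSERT INTO a VALUES (1), (1), (2);
INSERT INTO b VALUES (3), (4);
INSERT INTO c VALUES (4), (5), (5);
""")

# DISTINCT collapses duplicates inside each table, so after UNION ALL
# an id appears once per table that contains it.
SQL = """
SELECT id FROM (
    SELECT DISTINCT id FROM a
    UNION ALL
    SELECT DISTINCT id FROM b
    UNION ALL
    SELECT DISTINCT id FROM c
) x
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY id
"""

shared_ids = [row[0] for row in conn.execute(SQL)]
print(shared_ids)  # [4]
```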
I would suggest a simple join:
select a.id
from a join
b
on a.id = b.id join
c
on a.id = c.id
limit 1;
If you have a query that uses union or group by (or order by, but that is not relevant here), then you need to process all the data before returning a single row. A join can start returning rows as soon as the first values are found.
An alternative, but similar method is:
select a.id
from a
where exists (select 1 from b where a.id = b.id) and
exists (select 1 from c where a.id = c.id);
If a is the smallest table and id is indexed in b and c, then this could be quite fast.
Try this
select id from
(
select distinct id, 1 as t from a
union all
select distinct id, 2 as t from b
union all
select distinct id, 3 as t from c
) as t
group by id having count(t)=3
It is OK to have duplicates in the same table.
The query can/should exit at the first such occurrence.
SELECT 'OMG!' AS danger_bill_robinson
WHERE EXISTS (SELECT 1
FROM a,b,c -- maybe there is a place for old-style joins ...
WHERE a.id = b.id
OR a.id = c.id
OR c.id = b.id
);
Update: it appears the optimiser does not like Cartesian joins with 3 OR conditions. The query below is a bit faster:
SELECT 'WTF!' AS danger_bill_robinson
WHERE exists (select 1 from a JOIN b USING (id))
OR exists (select 1 from a JOIN c USING (id))
OR exists (select 1 from c JOIN b USING (id))
;
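The pairwise EXISTS check can be wrapped in a small helper that returns a yes/no answer. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for PostgreSQL, the helper names `build_db` and `has_shared_id` are hypothetical, and it says nothing about how either optimiser will actually plan the query.

```python
import sqlite3


def build_db(a_ids, b_ids, c_ids):
    """Create an in-memory database with tables a, b and c holding the given ids."""
    conn = sqlite3.connect(":memory:")
    for name, ids in (("a", a_ids), ("b", b_ids), ("c", c_ids)):
        conn.execute(f"CREATE TABLE {name} (id INTEGER)")
        conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in ids])
    return conn


def has_shared_id(conn):
    """True if any id occurs in at least two of the three tables."""
    sql = """
    SELECT EXISTS (SELECT 1 FROM a JOIN b USING (id))
        OR EXISTS (SELECT 1 FROM a JOIN c USING (id))
        OR EXISTS (SELECT 1 FROM c JOIN b USING (id))
    """
    return bool(conn.execute(sql).fetchone()[0])


print(has_shared_id(build_db([1, 1, 2], [3, 4], [4, 5])))  # True: 4 is in b and c
print(has_shared_id(build_db([1, 1, 2], [3], [4, 5, 5])))  # False: no overlap
```

Each EXISTS only needs one matching pair of rows, which fits the requirement that the check can stop at the first occurrence.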

SQL statement to conditionally select related records

I have a table with fields id (primary key) and fid. I want to get the record where id matches a particular value, as well as all related records that have its same fid value.
I can do this:
SELECT * FROM mytable
WHERE fid = (SELECT TOP 1 fid FROM mytable WHERE id = 'somevalue')
But I don't want the related records if the fid is a particular value (in my case an empty guid value).
Is there a way to do this in a single SQL statement? I am using SQL Server 2008 R2.
UPDATE:
Looking at the answers so far I think I may not have asked my question clearly. id and fid will never be equal. LEFT JOIN may be what I need, but I'm a bit SQL ignorant. What I'm hoping for is the following two queries as a single statement:
SELECT * FROM mytable WHERE id = 'somevalue'
SELECT * FROM mytable WHERE fid =
(SELECT TOP 1 fid FROM mytable
WHERE id = 'somevalue' AND fid != '00000000-0000-0000-0000-000000000000')
Based on your revision, the problem seems to be: select the row whose id is 'somevalue', plus all other rows that share that row's fid, unless that fid is the empty GUID.
The following captures this logic:
SELECT t.*
FROM mytable t left outer join
(SELECT TOP 1 fid
FROM mytable
WHERE id = 'somevalue' AND fid <> '00000000-0000-0000-0000-000000000000'
) t1
on t.fid = t1.fid
WHERE id = 'somevalue' or t1.fid is not null;
Because id is a primary key, the t1 subquery will return 0 or 1 rows. When it returns 0 rows, you will only get the original row matching 'somevalue'.
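The behaviour described above can be checked with a small sketch. It assumes SQLite via Python's sqlite3 in place of SQL Server 2008 R2, so `TOP 1` becomes `LIMIT 1`; the sample ids and fids are hypothetical, and `row_and_related` is an illustrative helper name.

```python
import sqlite3

EMPTY_GUID = "00000000-0000-0000-0000-000000000000"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id TEXT PRIMARY KEY, fid TEXT)")
# Hypothetical rows: a and b share fid f1; c and d both carry the empty GUID.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", "f1"), ("b", "f1"), ("c", EMPTY_GUID), ("d", EMPTY_GUID), ("e", "f2")],
)


def row_and_related(conn, some_id):
    """Ids of the row `some_id` plus rows sharing its fid, unless the fid is empty."""
    sql = """
    SELECT t.id
    FROM mytable t
    LEFT OUTER JOIN (
        SELECT fid FROM mytable
        WHERE id = :some_id AND fid <> :empty
        LIMIT 1
    ) t1 ON t.fid = t1.fid
    WHERE t.id = :some_id OR t1.fid IS NOT NULL
    ORDER BY t.id
    """
    params = {"some_id": some_id, "empty": EMPTY_GUID}
    return [row[0] for row in conn.execute(sql, params)]


print(row_and_related(conn, "a"))  # ['a', 'b']
print(row_and_related(conn, "c"))  # ['c']
```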
I'm not certain I understand your question, but I'll take a stab at it. What I think you're asking is if you can select all records from one table where either the id or fid fields equal a particular value, but you don't want the related fields if the particular value you're searching on equals an empty guid value. If so, here's how you can do it:
SELECT
*
FROM
mytable t1
LEFT JOIN
mytable t2 ON (t1.id = t2.fid) AND (t2.fid IS NOT NULL);
Is this what you were looking for?
I think this is what you are trying to do:
SELECT *
FROM mytable a
JOIN mytable b ON a.id = b.fid
WHERE a.id = 'somevalue';
This should return all records in a (joined with all records in b where a.id = b.fid) then filtered to show only records that have a.id = 'somevalue';
You could just add another clause to your sql statement like this:
SELECT * FROM mytable
WHERE fid = (SELECT TOP 1 fid FROM mytable WHERE id = 'somevalue'
AND fid != '00000000-0000-0000-0000-000000000000')
If you want more than one row, try a join as suggested by @zigdawgydawg.
Maybe this is what you are after:
select * from mytable
where id = 'somevalue'
or id = (select fid from mytable where id = 'somevalue')
Almost like zigdawgydawg's contribution, but slightly different:
SELECT * FROM mytable WHERE fid IN
(SELECT fid FROM mytable WHERE id = 'somevalue' )
AND NOT guid is null;