SQL statement for maximum common element in a set - sql

I have a table like this:
id  contact  value
1   A        2
2   A        3
3   B        2
4   B        3
5   B        4
6   C        2
Now I would like to get the maximum value that is common to every contact in a given set.
For example:
for the set {A,B} it should return 3;
for the set {A,C} it should return 2;
for the set {B} it should return 4.
What SQL statement(s) can do this?

Try this:
SELECT value, count(distinct contact) as cnt
FROM my_table
WHERE contact IN ('A', 'C')
GROUP BY value
HAVING cnt = 2
ORDER BY value DESC
LIMIT 1
This is MySQL syntax and may differ for your database. The number (2) in the HAVING clause is the number of elements in the set.
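To check the GROUP BY / HAVING approach against the sample data, here is a small sketch that runs it on SQLite through Python's sqlite3 module. SQLite is an assumption here; the answer itself targets MySQL. The helper name max_common_value is hypothetical. It builds one placeholder per contact, so the HAVING count always matches the size of the set instead of being hard-coded.

```python
import sqlite3

# Sample rows from the question: (id, contact, value).
SAMPLE = [(1, "A", 2), (2, "A", 3), (3, "B", 2), (4, "B", 3), (5, "B", 4), (6, "C", 2)]


def make_db():
    """Create an in-memory SQLite database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER, contact TEXT, value INTEGER)")
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", SAMPLE)
    return conn


def max_common_value(conn, contacts):
    """Hypothetical helper: largest value shared by every contact, or None."""
    contacts = sorted(set(contacts))  # de-duplicate so the HAVING count is right
    placeholders = ", ".join("?" for _ in contacts)
    sql = f"""
        SELECT value
        FROM my_table
        WHERE contact IN ({placeholders})
        GROUP BY value
        HAVING COUNT(DISTINCT contact) = ?  -- value must appear for every contact
        ORDER BY value DESC
        LIMIT 1
    """
    row = conn.execute(sql, [*contacts, len(contacts)]).fetchone()
    return row[0] if row else None
```

For example, max_common_value(make_db(), {"A", "B"}) returns 3, matching the question.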

SELECT max(value) FROM table WHERE contact IN ('A', 'C')
Edit: to get the maximum common value instead:
declare @contacts table ( contact nchar(10) )  -- the set of contacts to match
insert into @contacts values ('a')
insert into @contacts values ('b')
select MAX(value)
from MyTable
where (select COUNT(*) from @contacts) =
(select COUNT(DISTINCT t.contact)  -- contacts in the set that have this value
from MyTable t
join @contacts c on c.contact = t.contact
where t.value = MyTable.value)
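The same table-driven idea also works outside SQL Server. This sketch makes a few assumptions:

- It uses SQLite via Python's sqlite3.
- A temporary table named contacts stands in for the @contacts table variable.
- The function name max_common is hypothetical.

Because the set lives in a table, the query never hard-codes its size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, contact TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "A", 3), (3, "B", 2), (4, "B", 3), (5, "B", 4), (6, "C", 2)],
)
# Temporary table playing the role of the T-SQL @contacts table variable.
conn.execute("CREATE TEMP TABLE contacts (contact TEXT PRIMARY KEY)")

QUERY = """
SELECT MAX(m.value)
FROM MyTable m
WHERE (SELECT COUNT(*) FROM contacts) =
      (SELECT COUNT(DISTINCT t.contact)      -- contacts in the set having m.value
       FROM MyTable t
       JOIN contacts c ON c.contact = t.contact
       WHERE t.value = m.value)
"""


def max_common(contact_set):
    """Hypothetical helper: reload the contacts table, then run the query."""
    conn.execute("DELETE FROM contacts")
    conn.executemany("INSERT INTO contacts VALUES (?)", [(c,) for c in set(contact_set)])
    return conn.execute(QUERY).fetchone()[0]
```

SQLite compares text case-sensitively by default, unlike SQL Server's usual collation. That is why the sketch uses the uppercase contact names from the question.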

Most will tell you to use:
SELECT MAX(common.value)
FROM (SELECT t.value
      FROM my_table t
      WHERE t.contact IN ('A', 'C')
      GROUP BY t.value
      HAVING COUNT(DISTINCT t.contact) = 2) AS common
A couple of caveats:
The DISTINCT is key; without it, two rows with the same value and t.contact = 'A' would be counted twice.
The number compared in the HAVING clause has to equal the number of values listed in the IN clause.
My preference is to use JOINs:
SELECT MAX(t.value)
FROM my_table t
JOIN my_table t2 ON t2.value = t.value AND t2.contact = 'C'
WHERE t.contact = 'A'
The downside is that you need a self-join (a join to the same table) for every criterion (each contact value, in this case).
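The self-join version can be generated for any number of contacts by adding one join per extra contact. This sketch assumes SQLite via Python's sqlite3 and uses a hypothetical helper, max_common_by_joins. Contact names are passed as bound parameters; only the numbered aliases t1, t2, ... are written into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, contact TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "A", 2), (2, "A", 3), (3, "B", 2), (4, "B", 3), (5, "B", 4), (6, "C", 2)],
)


def max_common_by_joins(contacts):
    """Hypothetical helper: one self-join per contact after the first."""
    contacts = sorted(set(contacts))
    if not contacts:
        return None
    # t0 is filtered to the first contact in WHERE; t1..tN must share its value.
    joins = "".join(
        f" JOIN my_table t{i} ON t{i}.value = t0.value AND t{i}.contact = ?"
        for i in range(1, len(contacts))
    )
    sql = f"SELECT MAX(t0.value) FROM my_table t0{joins} WHERE t0.contact = ?"
    # Placeholders appear in the JOINs first, then in the WHERE clause.
    params = contacts[1:] + contacts[:1]
    return conn.execute(sql, params).fetchone()[0]
```

This makes the trade-off above concrete: the SQL grows by one join for every contact in the set.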

Related

SQL aggregate and filter functions

Consider the following table:
Number | Value
1      | a
1      | b
1      | a
2      | a
2      | a
3      | c
4      | a
5      | d
5      | a
I want to select the numbers whose rows all have the same value, so my result should be:
Number | Value
2      | a
3      | c
4      | a
I manage to get the right numbers using nested SQL statements like the one below. I am wondering if there is a simpler solution for my problem.
SELECT
a.n,
COUNT(n)
FROM
(
SELECT number n , value k
FROM testtable
GROUP BY number, value
) a
GROUP BY n
HAVING COUNT(n) = 1
You can try this:
SELECT NUMBER,MAX(VALUE) AS VALUE FROM TESTTABLE
GROUP BY NUMBER
HAVING MAX(VALUE)=MIN(VALUE)
You can also try this:
SELECT DISTINCT t.number, t.value
FROM testtable t
LEFT JOIN testtable t_other
ON t.number = t_other.number AND t.value <> t_other.value
WHERE t_other.number IS NULL
Another alternative, using NOT EXISTS:
select distinct num, val from testtable a
where not exists (
select 1 from testtable b
where a.num = b.num
and a.val <> b.val
)
http://sqlfiddle.com/#!9/dd080dd/5
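To compare the answers, here is a sketch that runs the MIN/MAX query and the NOT EXISTS query against the question's sample data. It assumes SQLite via Python's sqlite3. It uses the column names number and value from the question, whereas the fiddle uses num and val.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (number INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "a"), (2, "a"), (2, "a"),
     (3, "c"), (4, "a"), (5, "d"), (5, "a")],
)

# A number qualifies when its smallest and largest value are equal.
MIN_MAX = """
SELECT number, MAX(value) AS value
FROM testtable
GROUP BY number
HAVING MAX(value) = MIN(value)
ORDER BY number
"""

# A number qualifies when no row with the same number has a different value.
NOT_EXISTS = """
SELECT DISTINCT a.number, a.value
FROM testtable a
WHERE NOT EXISTS (SELECT 1 FROM testtable b
                  WHERE b.number = a.number AND b.value <> a.value)
ORDER BY a.number
"""

min_max_rows = conn.execute(MIN_MAX).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()
```

On this data both queries return [(2, 'a'), (3, 'c'), (4, 'a')].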

Count value across multiple columns

I am looking to count the number of times each value in a set of values occurs in a table. These values can occur in any of up to 10 different columns, and I need to increment the count regardless of which column a value is in. I know how to count values that are all in the same column, but not values spread across multiple columns.
Values can be added in any order. I have about a thousand
Cpt1 Cpt2 Cpt3 Cpt4 Cpt5
63047 63048 63048 NULL NULL
For this row, I'd expect this result:
63047 1
63048 2
You could use UNION ALL to treat them as one column:
SELECT col, COUNT(*)
FROM (SELECT col1 AS col FROM mytable
UNION ALL
SELECT col2 FROM mytable
UNION ALL
SELECT col3 FROM mytable
-- etc...
) t
WHERE col IS NOT NULL  -- skip empty cells
GROUP BY col
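Here is a runnable sketch of the UNION ALL idea. It assumes SQLite via Python's sqlite3 and the cpt1 to cpt5 column names from the question. Each branch of the union exposes its column as col, so the outer query can group on that name. NULL cells are filtered out before counting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (cpt1 INTEGER, cpt2 INTEGER, cpt3 INTEGER, cpt4 INTEGER, cpt5 INTEGER)"
)
conn.execute("INSERT INTO mytable VALUES (63047, 63048, 63048, NULL, NULL)")

# Column names come from a fixed list, not user input, so formatting them in is safe.
columns = [f"cpt{i}" for i in range(1, 6)]
stacked = "\nUNION ALL\n".join(f"SELECT {c} AS col FROM mytable" for c in columns)

sql = f"""
SELECT col, COUNT(*) AS cnt
FROM ({stacked}) t
WHERE col IS NOT NULL
GROUP BY col
ORDER BY col
"""
counts = conn.execute(sql).fetchall()
```

For the example row this gives [(63047, 1), (63048, 2)], the result the question expects.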
It's not entirely clear what your table exactly looks like, but I'm guessing that what you're looking for is:
SELECT row_count = COUNT(*),
row_count_with_given_value = SUM ( CASE WHEN field1 = 'myValue' THEN 1
WHEN field2 = 'myValue' THEN 1
WHEN field3 = 'myValue' THEN 1
WHEN field4 = 'myValue' THEN 1 ELSE 0 END)
FROM myTable
Assuming the fieldx columns are not NULL-able, you could write it like this too:
SELECT row_count = COUNT(*),
row_count_with_given_value = SUM ( CASE WHEN 'myValue' IN (field1, field2, field3, field4) THEN 1 ELSE 0 END)
FROM myTable
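The IN (...) variant is easy to check. This sketch makes a few assumptions:

- It uses SQLite via Python's sqlite3.
- The sample rows are hypothetical.
- 'myValue' is the placeholder value from the answer.

SQLite uses AS for column aliases instead of the T-SQL alias = expression form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?)",
    [
        ("x", "y", "z", "w"),              # no match
        ("myValue", "y", "z", "w"),        # match in field1
        ("x", "x", "myValue", "myValue"),  # two matches, still one row
        ("x", None, "z", "w"),             # a NULL and no match
    ],
)

SQL = """
SELECT COUNT(*) AS row_count,
       SUM(CASE WHEN ? IN (field1, field2, field3, field4) THEN 1 ELSE 0 END)
           AS row_count_with_given_value
FROM myTable
"""
row_count, with_value = conn.execute(SQL, ("myValue",)).fetchone()
```

This counts rows that contain the value at least once, not individual occurrences: the third row counts once even though 'myValue' appears twice. In the fourth row, the IN test evaluates to NULL, so the CASE falls through to ELSE 0.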
Something like this might work (after adapting to your value domain and data types):
create table t1
(i1 int,
i2 int,
i3 int);
insert into t1 values (1,0,0);
insert into t1 values (1,1,1);
insert into t1 values (1,0,0);
declare @i int = 0;
select @i = @i + i1 + i2 + i3 from t1;  -- add every column of every row
print @i;
drop table t1;
Output is: 5
Many databases support lateral joins of one type or another. These can be used to simplify this operation. Using the SQL Server/Oracle 12c syntax:
select v.cpt, count(*)
from t cross apply
(values (cpt1), (cpt2), . . .
) v(cpt)
where cpt is not null
group by v.cpt;

Select rows having the same features as others

I have the following table with 3 columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
Each Id has different features and each Feature has a value for that Id.
I need to write a query that returns the Ids that have exactly the same features and values as a given Id, taking into account only features whose names start with 'A'. For example, in the table above, searching with Id=1 should return Id=3, which has the same features starting with 'A' (AAA and ABB) with the same values.
I found a couple of different ways to do this, but all of them are very slow when the table has many rows (hundreds of thousands or more).
The approach that gives me the best performance is the following query:
select a2.Id
from (select a.FeatureName, a.Value
from Table1 a
where a.Id = 1) a1,
(select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.value = a2.value
group by a2.Id
having count(*) = 2
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*)= 2
Here the 2 in both HAVING clauses stands for @nFeatures, the number of features starting with 'A' for Id=1; I count them before running this query. I use INTERSECT to exclude Ids that match all of Id=1's features but also have other features starting with 'A'.
I think the slowest part is the second subquery:
select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%'
but I don't know how to make it faster. Maybe I will have to play with the indexes.
Any ideas on how I could write a fast query for this purpose?
So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
As for "how could I write a fast query for this purpose?": if it's not fast enough, create an index on (FeatureName, Value).
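Here is the EXISTS query run against the question's sample rows, assuming SQLite via Python's sqlite3. The index name ix_feature_value is hypothetical; it mirrors the (FeatureName, Value) index suggested in the answer. The query returns every 'A' row whose (FeatureName, Value) pair also appears under a different Id. Because of that, Id 2's ABB = 12 row is included too, even though Id 2 does not match Id 1 as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [(1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
     (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
     (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12)],
)
# Hypothetical index on the pair the EXISTS subquery compares.
conn.execute("CREATE INDEX ix_feature_value ON Table1 (FeatureName, Value)")

SQL = """
SELECT t.Id, t.FeatureName, t.Value
FROM Table1 t
WHERE t.FeatureName LIKE 'A%'
  AND EXISTS (SELECT 1 FROM Table1 t2
              WHERE t.Id <> t2.Id
                AND t.FeatureName = t2.FeatureName
                AND t.Value = t2.Value)
ORDER BY t.Id, t.FeatureName
"""
shared_rows = conn.execute(SQL).fetchall()
```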
I tried to avoid joining MyTable again just to fetch the data for the IDs that have matching FeatureName and Value values. Here's the query:
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is the SQL Fiddle demo. It has an extra condition in the inline view mt2, to perform this search only for id = 1.
I'm a little dense this morning; I'm not sure whether you wanted just the IDs or...
Here's my take on it...
You could probably move the where FeatureName like 'A%' into the inner query to filter the data on the initial table scan.
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id
A general solution is:
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.Value = b.Value
and a.rows = b.rows
GROUP BY a.id, b.id
HAVING COUNT(*) = MAX(a.rows)  -- every 'A' feature of a matched one in b
ORDER BY a.id, b.id
To limit the solution to a single ID, add a WHERE condition on a.id to the main query. The CTE is needed to get the correct number of rows for each id.
In the SQLFiddle demo I changed the test data slightly, adding another pair of IDs that share only one of the FeatureNames of ids 1 and 3.
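Here is a sketch of the general query, comparing values and using a HAVING check that every 'A' feature matched. It runs on SQLite 3.25 or later (needed for window functions) via Python's sqlite3. A few details are assumptions of this sketch:

- The CTE and its count column are renamed to feature_rows and n_rows, to stay clear of the ROWS keyword.
- Id 4 is a hypothetical extra row set that shares only AAA = 10 with ids 1 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER, FeatureName TEXT, Value INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(1, "AAA", 10), (1, "ABB", 12), (1, "BBB", 12),
     (2, "AAA", 15), (2, "ABB", 12), (2, "ACD", 7),
     (3, "AAA", 10), (3, "ABB", 12), (3, "CCC", 12),
     (4, "AAA", 10), (4, "ABD", 5)],  # id 4: hypothetical partial match
)

SQL = """
WITH feature_rows AS (
    SELECT id, FeatureName, Value,
           COUNT(id) OVER (PARTITION BY id) AS n_rows  -- 'A' features per id
    FROM test
    WHERE FeatureName LIKE 'A%'
)
SELECT a.id AS aID, b.id AS bID
FROM feature_rows a
JOIN feature_rows b
  ON a.id < b.id
 AND a.FeatureName = b.FeatureName
 AND a.Value = b.Value
 AND a.n_rows = b.n_rows
GROUP BY a.id, b.id
HAVING COUNT(*) = MAX(a.n_rows)  -- every 'A' feature of a matched one in b
ORDER BY a.id, b.id
"""
pairs = conn.execute(SQL).fetchall()
```

Only the pair (1, 3) survives. Without the HAVING clause, (1, 4) and (3, 4) would also appear, because they share one feature.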

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
1. select distinct GroupId from Groups -- and store locally
2. for each distinct GroupId: perform a set difference (EXCEPT) between the input and the table values for that group, and vice versa
3. return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I was hoping to leverage other SQL features to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains exactly the input values?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the HAVING clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible and handles various inclusion and exclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
EDIT:
The query failed to compile because of a missing table alias; updated with the correct alias.
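The first GROUP BY / HAVING query in this answer can be generated for any input list. This sketch assumes SQLite via Python's sqlite3 and uses a hypothetical helper, exact_groups. The helper emits one "contains this value" condition per wanted value, plus a final "no other values" condition, all with bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (groupid INTEGER, value TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)


def exact_groups(values):
    """Hypothetical helper: groups whose set of values equals `values`."""
    values = sorted(set(values))
    if not values:
        return []
    # One "this value is present" check per wanted value.
    contains = " AND ".join(
        "SUM(CASE WHEN value = ? THEN 1 ELSE 0 END) > 0" for _ in values
    )
    placeholders = ", ".join("?" for _ in values)
    sql = f"""
        SELECT groupid
        FROM GroupValues
        GROUP BY groupid
        HAVING {contains}
           AND SUM(CASE WHEN value NOT IN ({placeholders}) THEN 1 ELSE 0 END) = 0
        ORDER BY groupid
    """
    # Parameters: first the "contains" checks, then the NOT IN list.
    return [row[0] for row in conn.execute(sql, values + values)]
```

With the question's data, exact_groups(["a", "b", "c"]) returns [1] and exact_groups(["b", "a"]) returns [2].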
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))
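The double NOT EXISTS query (relational division) ports to SQLite almost unchanged. This sketch assumes Python's sqlite3. SQLite has no #temp syntax, so plain GroupValues and FindValues tables stand in for the temporary tables. find_groups is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (GroupID INTEGER, Val TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)
conn.execute("CREATE TABLE FindValues (Val TEXT)")

QUERY = """
SELECT gv.GroupID
FROM (SELECT DISTINCT GroupID FROM GroupValues) gv
-- no wanted value is missing from the group ...
WHERE NOT EXISTS (SELECT 1 FROM FindValues fv2
                  WHERE NOT EXISTS (SELECT 1 FROM GroupValues gv2
                                    WHERE gv2.GroupID = gv.GroupID
                                      AND gv2.Val = fv2.Val))
-- ... and the group has no value outside the wanted list
  AND NOT EXISTS (SELECT 1 FROM GroupValues gv3
                  WHERE gv3.GroupID = gv.GroupID
                    AND NOT EXISTS (SELECT 1 FROM FindValues fv3
                                    WHERE fv3.Val = gv3.Val))
ORDER BY gv.GroupID
"""


def find_groups(values):
    """Hypothetical helper: load the wanted values, then run the query."""
    conn.execute("DELETE FROM FindValues")
    conn.executemany("INSERT INTO FindValues VALUES (?)", [(v,) for v in values])
    return [row[0] for row in conn.execute(QUERY)]
```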

Combine an UPDATE query with GROUP BY on two attributes (PostgreSQL)

How to update a table based on count which is derived by group by on two attributes?
For example:
There's a table with columns a, b, c.
I need to update c based on the count of rows that share the same (a, b).
UPDATE in PostgreSQL has a FROM extension you could use:
update YourTable as yt1
set c = case when aggr.cnt > 5 then 'Q' else 'Z' end
from (
select a
, b
, count(*) as cnt
from YourTable
group by
a
, b
) as aggr
where aggr.a = yt1.a
and aggr.b = yt1.b
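Some databases lack PostgreSQL's UPDATE ... FROM. On those, a correlated subquery can do the per-row counting instead. (SQLite itself has supported UPDATE ... FROM since 3.33.0; the subquery form below also works on older versions.) This sketch assumes SQLite via Python's sqlite3 and uses hypothetical sample rows. The threshold of 5 and the 'Q'/'Z' values come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (a INTEGER, b INTEGER, c TEXT)")
# Hypothetical data: six rows share (1, 1), two rows share (2, 2).
conn.executemany(
    "INSERT INTO YourTable (a, b) VALUES (?, ?)", [(1, 1)] * 6 + [(2, 2)] * 2
)

# For each row, count the rows with the same (a, b) and pick the new c.
conn.execute("""
UPDATE YourTable
SET c = CASE
          WHEN (SELECT COUNT(*) FROM YourTable AS yt2
                WHERE yt2.a = YourTable.a AND yt2.b = YourTable.b) > 5
          THEN 'Q' ELSE 'Z'
        END
""")

summary = conn.execute(
    "SELECT a, b, c, COUNT(*) FROM YourTable GROUP BY a, b, c ORDER BY a"
).fetchall()
```

The (1, 1) rows end up with 'Q' and the (2, 2) rows with 'Z'.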