Select rows having the same features than others

Select rows having the same features than others - sql

I've the following table with 3 columns: Id, FeatureName and Value:
Id FeatureName Value
-- ----------- -----
1 AAA 10
1 ABB 12
1 BBB 12
2 AAA 15
2 ABB 12
2 ACD 7
3 AAA 10
3 ABB 12
3 CCC 12
.............
Each Id has different features and each Feature has a value for that Id.
I need to write a query which gives me the Ids that have exactly the same features and values than a given one, but only taking into account those whose name starts with 'A'. For example, in the top table, I can use that query to search for all the Ids that have the same features. For example, features with values where Id=1 would result Id=3 with same features starting with 'A' and same values for these features.
I found a couple of different ways to do this, but all of them go very slow when the table has lots of rows (more than hundred of thousands)
The way I obtain the best performance is using the next query:
select a2.Id
from (select a.FeatureName, a.Value
from Table1 a
where a.Id = 1) a1,
(select a.Id, a.FeatureName, a.Value
from Table1 a
where a.FeatureName like 'A%') a2
where a1.FeatureName = a2.FeatureName
and a1.value = a2.value
group by a2.Id
having count(*) = 2
intersect
select a.Id
from Table1 a
where a.FeatureName like 'A%'
group by a.Id
having count(*)= 2
where #nFeatures is the number of features starting by 'A' in Id=1. I counted them before calling this query. I make the intersection to avoid results that have the same parameters than Id=1 but also some others whose name starts with 'A'.
I think that the slowest part is the second subquery:
select a.Id, a.FeaureName, a.Value
from MyTable a
where a.FeatureName = 'A%'
but I don't know how to make it faster. Maybe I will have to play with the indexes.
Any idea of how could I write a fast query for this purpose?

So you want all rows where the combination of FeatureName and Value is not unique? You can use EXISTS:
SELECT t.*
FROM dbo.Table1 t
WHERE t.FeatureName LIKE 'A%'
AND EXISTS(SELECT 1 FROM dbo.Table1 t2
WHERE t.Id <> t2.ID
AND t.FeatureName = t2.FeatureName
AND t.Value = t2.Value)
Demo
how could I write a fast query for this purpose?
If it's not fast enough create an index on FeatureName + Value.

I tried to eliminate the join with MyTable again to select the data for the ID's that have matching FeatureName and Value values. Here's the query:
with joined_set as
(
SELECT
mt1.*, mt2.id as mt2_id, mt2.featurename as mt2_FeatureName, mt2.value as mt2_value
from
(
select *
from mytable
where featurename like 'A%'
) mt1
left join
(
select *
from mytable
where featurename like 'A%'
) mt2
on mt2.id <> mt1.id and mt2.FeatureName = mt1.featurename and mt2.value = mt1.value
)
select distinct id
from joined_set
where id not in
(select id
from joined_set
group by id
having SUM(
CASE
WHEN mt2_id is null THEN 1
ELSE 0
END
) <> 0
);
Here is the SQL Fiddle demo. It has an extra condition in the inline view mt2, to perform this search only for id = 1.

I'm a little dense this morning, I'm not sure if you wanted just the ID's or...
Here's my take on it...
You could probably move the where FeatureName like 'A%' into the inner query to filter the data on the initial table scan.
with dupFeatures (FeatureName, Value, dupCount)
as
(
select FeatureName, Value, count(*) as dupCount from MyTable
group by FeatureName, Value
having count(*) > 1
)
select MyTable.Id, dupFeatures.FeatureName,dupFeatures.Value
from dupFeatures
join MyTable on (MyTable.FeatureName = dupFeatures.FeatureName and
MyTable.Value = dupFeatures.Value )
where dupFeatures.FeatureName like 'A%'
order by FeatureName, Value, Id

A general solution is
With Rows As (
select id
, FeatureName
, Value
, rows = Count(id) OVER (PARTITION BY id)
FROM test
WHERE FeatureName LIKE 'A%')
SELECT a.id aID, b.id bID
FROM Rows a
INNER JOIN Rows b ON a.id < b.id and a.FeatureName = b.FeatureName
and a.rows = b.rows
GROUP BY a.id, b.id
ORDER BY a.id, b.id
to limit the solution to a group just add a WHERE condition on the main query for a.ID. The CTE is needed to get the correct number of rows for each id
SQLFiddle demo, in the demo I changed little the test data to have a another couple of ID with only one of the FeatureName of 1 and 3

Related

SQL aggregate and filter functions

Consider following table:
Number | Value
1 a
1 b
1 a
2 a
2 a
3 c
4 a
5 d
5 a
I want to choose every row, where the value for one number is the same, so my result should be:
Number | Value
2 a
3 c
4 a
I manage to get the right numbers by using nested
SQL-Statements like below. I am wondering if there is a simpler solution for my problem.
SELECT
a.n,
COUNT(n)
FROM
(
SELECT number n , value k
FROM testtable
GROUP BY number, value
) a
GROUP BY n
HAVING COUNT(n) = 1

You can try this
SELECT NUMBER,MAX(VALUE) AS VALUE FROM TESTTABLE
GROUP BY NUMBER
HAVING MAX(VALUE)=MIN(VALUE)

You can try also this:
SELECT DISTINCT t.number, t.value
FROM testtable t
LEFT JOIN testtable t_other
ON t.number = t_other.number AND t.value <> t_other.value
WHERE t_other.number IS NULL

Another alternative using exists.
select distinct num, val from testtable a
where not exists (
select 1 from testtable b
where a.num = b.num
and a.val <> b.val
)
http://sqlfiddle.com/#!9/dd080dd/5

How do you select all rows when data occurs in one of the rows?

I have a table with data
|FormID|Name|
1 A
1 B
2 A
2 C
3 B
3 C
I am trying to query all rows where Name 'A' appears, however i also want to get all rows with the same FormID when the name occurs
For example
Select * from table where name = 'A'
resultset
|FormID|Name|
1 A
2 A
1 B
2 C
Right now i am just querying for the FormID values where the name occurs and then doing another query with the FormID number (Select * from table where formID in (1,2) ) but there must be a way to do this in one sql statement

You can use exists:
select t.*
from t
where t.name = 'A' or
exists (select 1
from t t2
where t2.formid = t.formid and t2.name = 'A'
);
Actually, the first condition is not necessary, so this suffices:
select t.*
from t
where exists (select 1
from t t2
where t2.formid = t.formid and t2.name = 'A'
);

Another approach:
SELECT formid, name
FROM forms
WHERE formid IN (SELECT formid FROM forms WHERE name = 'A')
ORDER BY name;
gives
formid name
---------- ----------
1 A
2 A
1 B
2 C
Because the subquery in the IN doesn't depend on the current row being looked at, it only has to be evaluated once, making it more potentially more efficient for large tables.

How to sort the query result by the number of specific column in ACCESS SQL?

For example:
This is the original result
Alpha Beta
A 1
B 2
B 3
C 4
After Order by the number of Alpha, this is the result I want
Alpha Beta
B 2
B 3
A 1
C 4
I tried to use GroupBy and OrderBy, but ACCESS always ask me to include all columns.

Why is 'B' placed before 'A' ? I don't understand this order..
Any way, doesn't seem like you need a group by, not from your data sample, but for your desired result you can use CASE EXPRESSION :
SELECT t.alpha,t.beta FROM YourTable t
ORDER BY CASE WHEN t.alpha = 'B' THEN 1 ELSE 0 END DESC,
t.aplha,
t.beta
EDIT: Use this query:
SELECT t.alpha,t.beta FROM YourTable t
INNER JOIN(SELECT s.alpha,count(*) as cnt
FROM YourTable s
GROUP BY s.alpha) t2
ON(t.aplha = t2.alpha)
ORDER BY t2.cnt,t.alpha,t.beta

The query counts number of rows for every distinct Alpha and sorts. General Sql, tweak for ACCESS if needed.
SELECT t1.alpha,t1.beta
FROM t t1
JOIN (
SELECT t2.alpha, count(t2.*) AS n FROM t t2 GROUP BY t2.alpha
) t3 ON t3.alpha = t1.alpha
ORDER BY t3.n, t1.alpha, t1.beta

SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value

so basicially there is 1 question and 1 problem:
1. question - when I have like 100 columns in a table(and no key or uindex is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. problem - the example below shows the 1. question and my actual SQL-statement problem
Example:
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own select statement then I have to put the actual rowname into the GROUP BY and the GROUP BY doesn't group the NULL-value from the CASE but the actual value from the row. And because of that I would have to either join or subselect with all columns, since there is no key and no uindex, or somehow find another solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though "order items" can have one of two different "departements"(ZD, EK), the fields/rows for "ZD" and "EK" are always both filled. I need the grouping to consider the "departement" and only if the designated "departement" (ZD or EK) is changing, then I want a new group to be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING,
FROM TABLE
GROUP BY
ZD
EK
TABLE.DISTRIBUTOR
TABLE.DEPARTEMENT
This here worked in the SELECT and ZD, EK in the GROUP BY. Only problem was, even if EK was not the designated DEPARTEMENT, it still opened a new group if it changed, because he was using the real EK value and not the NULL from the CASE, as I was already explaining up top.

And here ladies and gentleman is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING,
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
#t-clausen.dk: Thank you!
#others: ...

Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
A.FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for datatypes that are not comparable

No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.az)
but either way, you're listing all of the fields.

I might tend to solve it something like this
WITH q as
(SELECT
Department
, (CASE WHEN DEPARTEMENT = 1 THEN ZD
WHEN DEPARTEMENT = 2 THEN EK
ELSE null
END) AS GRP
, DISTRIBUTOR
, SOMETHING
FROM mytable
)
SELECT
Department
, Grp
, Distributor
, sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
DEPARTEMENT
, GRP
, DISTRIBUTOR

If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)

SQL statement for maximum common element in a set

I have a table like
id contact value
1 A 2
2 A 3
3 B 2
4 B 3
5 B 4
6 C 2
Now I would like to get the common maximum value for a given set of contacts.
For example:
if my contact set was {A,B} it would return 3;
for the set {A,C} it would return 2
for the set {B} it would return 4
What SQL statement(s) can do this?

Try this:
SELECT value, count(distinct contact) as cnt
FROM my_table
WHERE contact IN ('A', 'C')
GROUP BY value
HAVING cnt = 2
ORDER BY value DESC
LIMIT 1
This is MySQL syntax, may differ for your database. The number (2) in HAVING clause is the number of elements in set.

SELECT max(value) FROM table WHERE contact IN ('A', 'C')
Edit: max common
declare #contacts table ( contact nchar(10) )
insert into #contacts values ('a')
insert into #contacts values ('b')
select MAX(value)
from MyTable
where (select COUNT(*) from #contacts) =
(select COUNT(*)
from MyTable t
join #contacts c on c.contact = t.contact
where t.value = MyTable.value)

Most will tell you to use:
SELECT MAX(t.value)
FROM TABLE t
WHERE t.contact IN ('A', 'C')
GROUP BY t.value
HAVING COUNT(DISTINCT t.*) = 2
Couple of caveats:
The DISTINCT is key, otherwise you could have two rows of t.contact = 'A'.
The number of COUNT(DISTINCT t.*) has to equal the number of values specified in the IN clause
My preference is to use JOINs:
SELECT MAX(t.value)
FROM TABLE t
JOIN TABLE t2 ON t2.value = t.value AND t2.contact = 'C'
WHERE t.contact = 'A'
The downside to this is that you have to do a self join (join to the same table) for every criteria (contact value in this case).

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas