Self Join to find duplicates but including all columns - sql

I want to find all entries in a logs table that share the same day and cause with at least one other entry, i.e. (day, cause) combinations that occur more than once. I already wrote a query that fetches the duplicated combinations; my problem is that I need access to all columns of the matching rows for later JOINs. The table looks like this:
| ID | DATE | CAUSE | USER | ... |
|--------------------------------------|
| x | 2017-01-01 | aaa | 100 | ... |
| x | 2017-01-02 | aaa | 101 | ... |
| x | 2017-01-03 | bbb | 101 | ... |
| x | 2017-01-03 | bbb | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-04 | ccc | 101 | ... |
| x | 2017-01-05 | aaa | 101 | ... |
| .....................................|
| .....................................|
| .....................................|
Query:
SELECT logs.* FROM
(SELECT day, cause FROM logs
GROUP BY day, cause HAVING COUNT(*) > 1) AS logsTwice, logs
WHERE logsTwice.day = logs.day AND logsTwice.cause = logs.cause
The sub select fetches exactly the right data (date and cause) but when I try to get the additional columns of these matches I get completely wrong data. What am I doing wrong?

You can just use window functions:
SELECT l.*
FROM (SELECT l.*,
COUNT(*) OVER (PARTITION BY day, cause) as cnt
FROM logs l
) l
WHERE cnt > 1;
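Window functions often perform better than the equivalent query using JOIN and GROUP BY, because the table only has to be read once, although the actual difference depends on the database and the available indexes.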

Try this:
SELECT logs.* FROM logs
inner join
(SELECT day, cause FROM logs GROUP BY day, cause HAVING COUNT(*) > 1) logsTwice
on logsTwice.day = logs.day AND logsTwice.cause = logs.cause
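To check the explicit join against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The numeric ids and the function name `duplicated_log_rows` are assumptions made for illustration (the question shows every id as `x`), and SQLite stands in for whatever database the asker uses.

```python
import sqlite3

def duplicated_log_rows(rows):
    """Return every full row whose (day, cause) pair occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (id INTEGER, day TEXT, cause TEXT, user INTEGER)")
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT logs.*
        FROM logs
        INNER JOIN (SELECT day, cause
                    FROM logs
                    GROUP BY day, cause
                    HAVING COUNT(*) > 1) AS logsTwice
            ON logsTwice.day = logs.day AND logsTwice.cause = logs.cause
        ORDER BY logs.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample rows from the question, with hypothetical ids 1..8.
sample = [
    (1, "2017-01-01", "aaa", 100),
    (2, "2017-01-02", "aaa", 101),
    (3, "2017-01-03", "bbb", 101),
    (4, "2017-01-03", "bbb", 101),
    (5, "2017-01-04", "ccc", 101),
    (6, "2017-01-04", "ccc", 101),
    (7, "2017-01-04", "ccc", 101),
    (8, "2017-01-05", "aaa", 101),
]

for row in duplicated_log_rows(sample):
    print(row)
```

Each returned row carries all columns, so it can be joined to other tables afterwards.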

You can try
SELECT l1.*
FROM logs l1
INNER JOIN logs l2
ON (l1.id <> l2.id
AND l1.day = l2.day
AND l1.cause = l2.cause
AND l1.user <> l2.user);

Related

In Oracle SQL how can i find all values in one column for which in another column exist more than one distinct value

I have an Oracle table like this
| id | code | info | More cols |
|----|------|------------------|-----------|
| 1 | 13 | The Thirteen | dggf |
| 1 | 18 | The Eighteen | ghdgffg |
| 1 | 18 | The Eighteen | |
| 1 | 9 | The Nine | ghdfgjgf |
| 1 | 9 | Die Neun | ghdfgjgf |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 75 | The Seventy-five | ghfgh |
| 1 | 2 | The Two | ghfgh |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | The Twenty-Seven | |
| 1 | 27 | el veintisiete | fghfg |
| . | . | . | . |
| . | . | . | . |
| . | . | . | . |
In this table I want to find all rows with values in column code which have more than one distinct value in the info column. So from the listed rows this would be the values 9 and 27 and the associated rows.
I tried to construct a first query like
SELECT code FROM mytable
WHERE COUNT(DISTINCT info) >1
but I get an "ORA-00934: group function is not allowed here" error. Also, I don't know how to express the condition COUNT(DISTINCT info) "for a fixed code".
You need HAVING together with GROUP BY; aggregate functions are not allowed in a WHERE clause:
SELECT code
FROM mytable
group by code
having COUNT(DISTINCT info) >1
I would write your query as:
SELECT code
FROM yourTable
GROUP BY code
HAVING MIN(info) <> MAX(info);
Writing the HAVING logic this way leaves the query sargable, meaning that an index on (code, info) should be usable.
You could also do this using exists logic:
SELECT DISTINCT code
FROM yourTable t1
WHERE EXISTS (SELECT 1 FROM yourTable t2 WHERE t2.code = t1.code AND t2.info <> t1.info);
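The question also asks for the associated rows, not only the codes. A minimal sketch of one way to get them is to feed the GROUP BY / HAVING query into an IN filter. It is run here with Python's sqlite3 module rather than Oracle, and the function name `rows_for_ambiguous_codes` is an assumption for illustration.

```python
import sqlite3

def rows_for_ambiguous_codes(rows):
    """Return all rows whose code is associated with more than one distinct info."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, code INTEGER, info TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, code, info
        FROM mytable
        WHERE code IN (SELECT code
                       FROM mytable
                       GROUP BY code
                       HAVING COUNT(DISTINCT info) > 1)
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample rows from the question (the "More cols" column is omitted).
sample = [
    (1, 13, "The Thirteen"),
    (1, 18, "The Eighteen"),
    (1, 18, "The Eighteen"),
    (1, 9, "The Nine"),
    (1, 9, "Die Neun"),
    (1, 75, "The Seventy-five"),
    (1, 75, "The Seventy-five"),
    (1, 2, "The Two"),
    (1, 27, "The Twenty-Seven"),
    (1, 27, "The Twenty-Seven"),
    (1, 27, "el veintisiete"),
]

matches = rows_for_ambiguous_codes(sample)
print(sorted({code for _, code, _ in matches}))
```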

List only repeating names

| personid | first | last | section |
| 1 | Jon | A | y3 |
| 2 | Bob | Z | t6 |
| 3 | Pat | G | h4 |
| 4 | Ron | Z | u3 |
| 5 | Sam | D | y3 |
| 6 | Sam | D | u3 |
| 7 | Pam | F | h4 |
I want to isolate all the repeat names, despite the other columns, like this:
| personid | first | last | section |
| 5 | Sam | D | y3 |
| 6 | Sam | D | u3 |
This is what I came up with but I cannot get it to work:
SELECT personid, last, first, section FROM d 01 WHERE EXISTS
(SELECT * FROM d 02 WHERE 02.last = 01.last AND 02.first = 01.first )
You could just do a window count and filter by that:
select personid, first, last, section
from (
select t.*, count(*) over(partition by first, last) cnt
from mytable t
) t
where cnt > 1
You must check that the 2 rows have different ids:
SELECT d1.personid, d1.last, d1.first, d1.section
FROM d d1 WHERE EXISTS (
SELECT *
FROM d d2
WHERE d1.personid <> d2.personid AND d2.last = d1.last AND d2.first = d1.first
)
Always qualify the column names with the table's name/alias and don't use numbers as aliases unless they are enclosed in backticks or square brackets.
Results:
| personid | last | first | section |
| -------- | ---- | ----- | ------- |
| 5 | D | Sam | y3 |
| 6 | D | Sam | u3 |
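To see why the id comparison matters, the following sketch runs the EXISTS query with and without it, using Python's sqlite3 module. The columns are renamed to `first_name` and `last_name` here to stay clear of SQL keywords, and the function name `repeated_names` is hypothetical.

```python
import sqlite3

def repeated_names(rows, require_other_row=True):
    """Return ids of people whose (first, last) name appears on another row.

    With require_other_row=False the id check is dropped, so every row
    matches itself and the filter has no effect.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE d (personid INTEGER, first_name TEXT, last_name TEXT, section TEXT)"
    )
    conn.executemany("INSERT INTO d VALUES (?, ?, ?, ?)", rows)
    id_check = "d1.personid <> d2.personid AND" if require_other_row else ""
    query = f"""
        SELECT d1.personid
        FROM d d1
        WHERE EXISTS (SELECT 1
                      FROM d d2
                      WHERE {id_check}
                            d2.last_name = d1.last_name
                        AND d2.first_name = d1.first_name)
        ORDER BY d1.personid
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result

people = [
    (1, "Jon", "A", "y3"),
    (2, "Bob", "Z", "t6"),
    (3, "Pat", "G", "h4"),
    (4, "Ron", "Z", "u3"),
    (5, "Sam", "D", "y3"),
    (6, "Sam", "D", "u3"),
    (7, "Pam", "F", "h4"),
]

print(repeated_names(people))                           # with the id check
print(repeated_names(people, require_other_row=False))  # without it
```

Without the id check each row finds itself in the subquery, which is why the original attempt returned the whole table.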
Another way to yield the same results as the other accepted answer:
SELECT A.personid,
       A.firstName,
       A.lastName,
       A.section
FROM personTable AS A
INNER JOIN (
    SELECT firstName,
           lastName,
           CASE
               WHEN COUNT(*) > 1 THEN 'Yes'
               ELSE 'Null'
           END AS UseName
    FROM personTable
    GROUP BY firstName, lastName) AS B
ON A.firstName = B.firstName AND A.lastName = B.lastName AND B.UseName = 'Yes'
This solution joins the table to a subquery over the same table. Because it is an inner join, only rows that match the subquery are returned. Since anything with a count less than 2 is filtered out, only the duplicates match.

SQL Server minimum value within column

I have table_1 with the following data:
| STORE | Add | dis | Cnt |
+-------+-----+-----+-----+
| 101 | X | abc | 2 |
| 101 | X | null| 3 |
| 101 | X |pqrd | 4 |
| 101 | X | null| 1 |
| 102 | y | null| 1 |
| 102 | y | xyz | 3 |
| 102 | y | pqr | 4 |
| 102 | y | null| 2 |
I tried to build a query that returns, for each store, the row from table_1 where [dis] is not null and [cnt] is the minimum. So my result should look like this:
| STORE | Add | dis | Cnt |
+-------+-----+-----+-----+
| 101 | X | abc | 2 |
| 102 | y | xyz | 3 |
My query looks like below :
SELECT store,add,dis,min(TMPLT_PRIORITY_NMBR)
FROM table_1 group by store,add;
But I get the following error:
Column 'dis' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
If I use [dis] in GROUP BY clause, I get the wrong result and giving max(dis) or min(dis) also provides the wrong result.
What would be the solution for this issue?
You could use rank to find the row with the minimal cnt value per store/add combination, and return all the columns from it:
SELECT store, [add], dis, cnt
FROM (SELECT *, RANK() OVER (PARTITION BY store, [add] ORDER BY cnt) AS rk
      FROM table_1
      WHERE dis IS NOT NULL) t
WHERE rk = 1
Another option would be to use first_value and min with over:
SELECT DISTINCT store,
       [add],
       FIRST_VALUE(dis) OVER (PARTITION BY store, [add] ORDER BY cnt) AS dis,
       MIN(cnt) OVER (PARTITION BY store, [add]) AS cnt
FROM table_1
WHERE dis IS NOT NULL
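The RANK() approach can be tried against the sample data with the sketch below. It runs on SQLite through Python's sqlite3 module instead of SQL Server, which is an assumption of this example; window functions need SQLite 3.25 or later, and the function name `min_cnt_rows` is hypothetical. The column name is written as `[add]` because ADD is a keyword in both systems.

```python
import sqlite3

def min_cnt_rows(rows):
    """Per (store, add), return the row with the smallest cnt among non-null dis."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (store INTEGER, [add] TEXT, dis TEXT, cnt INTEGER)")
    conn.executemany("INSERT INTO table_1 VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT store, [add], dis, cnt
        FROM (SELECT *,
                     RANK() OVER (PARTITION BY store, [add] ORDER BY cnt) AS rk
              FROM table_1
              WHERE dis IS NOT NULL) AS t
        WHERE rk = 1
        ORDER BY store
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Sample rows from the question; None stands for SQL NULL.
sample = [
    (101, "X", "abc", 2),
    (101, "X", None, 3),
    (101, "X", "pqrd", 4),
    (101, "X", None, 1),
    (102, "y", None, 1),
    (102, "y", "xyz", 3),
    (102, "y", "pqr", 4),
    (102, "y", None, 2),
]

for row in min_cnt_rows(sample):
    print(row)
```

Filtering out the NULL rows before ranking is what keeps the rows with cnt = 1 from winning; RANK() also returns several rows per group if the minimum is tied.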

Different results with aggregate and analytical with distinct functions

I can't understand why there is the difference in results between two queries:
1) select distinct sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled'
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 300 | 5 |
2) select distinct *
from (select sum(lot.lot_size) over (partition by lot.detail_id) buying_size, lot.f_detail_id
from buying
join lot on lot.lot_id = buying.lot_id
where exists (select 1
from selling s
join buying b on b.buying_id = s.buying_buying_id
where s.deal_id = 123456
and buying.deal_id = b.deal_id
and s.selling_detailid = buying.buying_detailid)
and buying.buying_status <> 'Canceled')
result:
|buying_size|f_detail_id |
|-----------|------------|
| 105 | 1 |
| 200 | 2 |
| 75 | 3 |
| 225 | 4 |
| 150 | 5 |
I know that this query could be written another way using GROUP BY, but what I mainly want to understand is why the result of the analytical function depends on DISTINCT.

Concatenated range descriptions in MySQL

I have data in a table looking like this:
+---+----+
| a | b |
+---+----+
| a | 1 |
| a | 2 |
| a | 4 |
| a | 5 |
| b | 1 |
| b | 3 |
| b | 5 |
| c | 5 |
| c | 4 |
| c | 3 |
| c | 2 |
| c | 1 |
+---+----+
I'd like to produce a SQL query which outputs data like this:
+---+-----------+
| a | 1-2, 4-5 |
| b | 1,3,5 |
| c | 1-5 |
+---+-----------+
Is there a way to do this purely in SQL (specifically, MySQL 5.1?)
The closest I have got is select a, concat(min(b), "-", max(b)) from test group by a;, but this doesn't take into account gaps in the range.
Use:
SELECT a, GROUP_CONCAT(x.island)
FROM (SELECT y.a,
             CASE
                 WHEN MIN(y.b) = MAX(y.b) THEN
                     CAST(MIN(y.b) AS CHAR)
                 ELSE
                     CONCAT(MIN(y.b), '-', MAX(y.b))
             END AS island
      FROM (SELECT t.a, t.b,
                   CASE
                       WHEN @prev_b = t.b - 1 THEN
                           @group_rank
                       ELSE
                           @group_rank := @group_rank + 1
                   END AS blah,
                   @prev_b := t.b
            FROM test t
            JOIN (SELECT @group_rank := 1, @prev_b := 0) r
            ORDER BY t.a, t.b) y
      GROUP BY y.a, y.blah) x
GROUP BY a
The idea is that if you assign a common value to each run of sequential values, you can then use MIN/MAX per run to get the appropriate values. For example:
a | b | blah
---------------
a | 1 | 1
a | 2 | 1
a | 4 | 2
a | 5 | 2
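The grouping idea can be illustrated outside SQL as well. The sketch below is plain Python, not the MySQL query, and it uses a slightly different trick to label the runs: within a sorted list, value minus position stays constant across consecutive values. The function names `describe_ranges` and `describe_all` are made up for this example.

```python
from itertools import groupby

def describe_ranges(values):
    """Collapse integers into a description such as '1-2,4-5'."""
    ordered = sorted(set(values))
    parts = []
    # value - position is constant inside a run of consecutive integers,
    # so it plays the role of the 'blah' group label in the table above.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in run]
        if len(run) == 1:
            parts.append(str(run[0]))
        else:
            parts.append(f"{run[0]}-{run[-1]}")
    return ",".join(parts)

def describe_all(rows):
    """Group (a, b) rows by a and describe the b values of each group."""
    grouped = {}
    for key, value in rows:
        grouped.setdefault(key, []).append(value)
    return {key: describe_ranges(vals) for key, vals in grouped.items()}

# Rows from the question's table.
rows = [
    ("a", 1), ("a", 2), ("a", 4), ("a", 5),
    ("b", 1), ("b", 3), ("b", 5),
    ("c", 5), ("c", 4), ("c", 3), ("c", 2), ("c", 1),
]

print(describe_all(rows))
```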
I also found Martin Smith's answer to another question helpful:
printing restaurant opening hours from a database table in human readable format using php