List only repeating names - sql

| personid | first | last | section |
| 1 | Jon | A | y3 |
| 2 | Bob | Z | t6 |
| 3 | Pat | G | h4 |
| 4 | Ron | Z | u3 |
| 5 | Sam | D | y3 |
| 6 | Sam | D | u3 |
| 7 | Pam | F | h4 |
I want to isolate all the repeated names, regardless of the other columns, like this:
| personid | first | last | section |
| 5 | Sam | D | y3 |
| 6 | Sam | D | u3 |
This is what I came up with but I cannot get it to work:
SELECT personid, last, first, section FROM d 01 WHERE EXISTS
(SELECT * FROM d 02 WHERE 02.last = 01.last AND 02.first = 01.first )

You could just do a window count and filter by that:
select personid, first, last, section
from (
select t.*, count(*) over(partition by first, last) cnt
from mytable t
) t
where cnt > 1

You must check that the 2 rows have different ids:
SELECT d1.personid, d1.last, d1.first, d1.section
FROM d d1 WHERE EXISTS (
SELECT *
FROM d d2
WHERE d1.personid <> d2.personid AND d2.last = d1.last AND d2.first = d1.first
)
Always qualify the column names with the table's name/alias and don't use numbers as aliases unless they are enclosed in backticks or square brackets.
See the demo.
Results:
| personid | last | first | section |
| -------- | ---- | ----- | ------- |
| 5 | D | Sam | y3 |
| 6 | D | Sam | u3 |
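To see why the id comparison matters, here is a minimal sketch that runs the EXISTS query with and without it. It assumes an in-memory SQLite database (not the asker's engine) loaded with the question's sample rows; the helper name `repeated` is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE d (personid INTEGER, first TEXT, last TEXT, section TEXT)")
conn.executemany(
    "INSERT INTO d VALUES (?, ?, ?, ?)",
    [(1, "Jon", "A", "y3"), (2, "Bob", "Z", "t6"), (3, "Pat", "G", "h4"),
     (4, "Ron", "Z", "u3"), (5, "Sam", "D", "y3"), (6, "Sam", "D", "u3"),
     (7, "Pam", "F", "h4")],
)

def repeated(require_other_row):
    """Return personids whose (first, last) pair matches some row in d.

    With require_other_row=False every row matches itself, so all ids come back.
    """
    extra = "AND d2.personid <> d1.personid" if require_other_row else ""
    sql = f"""
        SELECT d1.personid
        FROM d AS d1
        WHERE EXISTS (
            SELECT 1 FROM d AS d2
            WHERE d2.last = d1.last AND d2.first = d1.first {extra}
        )
        ORDER BY d1.personid
    """
    return [row[0] for row in conn.execute(sql)]

without_check = repeated(False)
with_check = repeated(True)
print(without_check)  # [1, 2, 3, 4, 5, 6, 7]
print(with_check)     # [5, 6]
```

Without the inequality, each row satisfies EXISTS through itself, which is why the original attempt returned the whole table.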

Another way to get the same results as the other answers:
SELECT A.personid,
A.firstName,
A.lastName,
A.section
FROM personTable AS A
INNER JOIN (
-- one row per name that occurs more than once
SELECT
firstName,
lastName
FROM
personTable
GROUP BY firstName, lastName
HAVING COUNT(*) > 1) AS B
ON A.firstName = B.firstName AND A.lastName = B.lastName
This solution joins the table to a subquery on itself. Since it is an inner join, it only returns the rows that match the subquery. Since the subquery filters out every name with a count of less than 2, only the duplicates match.

Related

Select from a concatenation of two columns after a left join

Problem description
Let the tables C and V have those values
>> Table V <<
| UnID | BillID | ProductDesc | Value | ... |
| 1 | 1 | 'Orange Juice' | 3.05 | ... |
| 1 | 1 | 'Apple Juice' | 3.05 | ... |
| 1 | 2 | 'Pizza' | 12.05 | ... |
| 1 | 2 | 'Chocolates' | 9.98 | ... |
| 1 | 2 | 'Honey' | 15.98 | ... |
| 1 | 3 | 'Bread' | 3.98 | ... |
| 2 | 1 | 'Yogurt' | 8.55 | ... |
| 2 | 1 | 'Ice Cream' | 7.05 | ... |
| 2 | 1 | 'Beer' | 9.98 | ... |
| 2 | 2 | 'League of Legends RP' | 40.00 | ... |
>> Table C <<
| UnID | BillID | ClientName | ... |
| 1 | 1 | 'Alexander' | ... |
| 1 | 2 | 'Tom' | ... |
| 1 | 3 | 'Julia' | ... |
| 2 | 1 | 'Tom' | ... |
| 2 | 2 | 'Alexander' | ... |
Table V has the values of each product, each associated with a bill number. Table C has the relationship between the client name and the bill number. However, the bill number is a counter that depends on the UnID, which is the store unit ID. That is, each store has its own bill number 1, number 2, etc. Also, the number of bills is not the same for every store.
Solution description
I'm trying to select from C left-joined to V, without success. Because each BillID depends on the UnID, I have to make the join consider the combination of those two columns.
I've used this script, but it gives me an error.
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
CONCAT(C.UnID, C.BillID) = CONCAT(V.UnID, V.BillID)
GROUP BY
V.ClientName
and SQL Server returns this error: 'CONCAT' is not a recognized built-in function name.
I'm using Microsoft SQL Server 2008 R2
Is the use of CONCAT wrong? Or is it the way I tried to SELECT? Could you give me a hand?
[Note: The tables I've presented are just for the purpose of explaining my difficulties. If you find any errors in the explanation, please let me know so I can correct them.]
You should be joining on the equality of the UnID and BillID columns in the two tables:
SELECT
c.ClientName,
COALESCE(SUM(v.Value), 0) AS total
FROM C c
LEFT JOIN V v
ON c.UnID = v.UnID AND
c.BillID = v.BillID
GROUP BY
c.ClientName;
In theory you could try joining on CONCAT(UnID, BillID). However, you could run into problems. For example, UnID = 1 with BillID = 23 would, concatenated together, be the same as UnID = 12 and BillID = 3.
Note: We wrap the sum with COALESCE, because should a given client have no entries in the V table, the sum would return NULL, which we then replace with zero.
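Both points above can be checked with a small sketch. It assumes an in-memory SQLite database rather than SQL Server 2008 R2, uses the question's sample rows, and adds one hypothetical client ('Maria', bill 3 of store 2) with no purchases to show the COALESCE behaviour.

```python
import sqlite3

# Joining on concatenated keys is ambiguous: two different (UnID, BillID)
# pairs can produce the same string.
collides = (str(1) + str(23)) == (str(12) + str(3))
print(collides)  # True

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE C (UnID INTEGER, BillID INTEGER, ClientName TEXT);
CREATE TABLE V (UnID INTEGER, BillID INTEGER, ProductDesc TEXT, Value REAL);
INSERT INTO C VALUES
  (1, 1, 'Alexander'), (1, 2, 'Tom'), (1, 3, 'Julia'),
  (2, 1, 'Tom'), (2, 2, 'Alexander'),
  (2, 3, 'Maria');  -- hypothetical client with no rows in V
INSERT INTO V VALUES
  (1, 1, 'Orange Juice', 3.05), (1, 1, 'Apple Juice', 3.05),
  (1, 2, 'Pizza', 12.05), (1, 2, 'Chocolates', 9.98), (1, 2, 'Honey', 15.98),
  (1, 3, 'Bread', 3.98),
  (2, 1, 'Yogurt', 8.55), (2, 1, 'Ice Cream', 7.05), (2, 1, 'Beer', 9.98),
  (2, 2, 'League of Legends RP', 40.00);
""")

rows = conn.execute("""
    SELECT c.ClientName, COALESCE(SUM(v.Value), 0) AS total
    FROM C AS c
    LEFT JOIN V AS v
      ON c.UnID = v.UnID AND c.BillID = v.BillID
    GROUP BY c.ClientName
""").fetchall()

# Round to cents to hide floating-point noise in the REAL column.
totals = {name: round(total, 2) for name, total in rows}
print(totals)
```

The two-column join condition never confuses store 1 / bill 23 with store 12 / bill 3, and the client without purchases gets a total of 0 instead of NULL.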
CONCAT is only available in SQL Server 2012 and later.
Here's one option.
SELECT
SUM(C.Value),
V.ClientName
FROM
C
LEFT JOIN
V
ON
cast(C.UnID as varchar(100)) + '-' + cast(C.BillID as varchar(100)) = cast(V.UnID as varchar(100)) + '-' + cast(V.BillID as varchar(100))
GROUP BY
V.ClientName

Postgres: Aggregate accounts into a single identity by common email address

I'm building a directory of users, where:
each user can have an account on one or more external services, and
each of these accounts can have one or more email addresses.
What I want to know is, how can I aggregate these accounts into single identities through common email addresses?
For example, let's say I have two services, A and B. For each service, I have a table that relates an account to one or more email addresses.
So if service A has these account email addresses:
account_id | email_address
-----------|--------------
1 | a@foo.com
1 | b@foo.com
2 | c@foo.com
and service B has these account email addresses:
account_id | email_address
-----------|--------------
3 | a@foo.com
3 | a@bar.com
4 | d@foo.com
I'd like to create a table that aggregates the email addresses of these accounts into a single user identity:
user_id | email_address
--------|--------------
X | a@foo.com
X | b@foo.com
X | a@bar.com
Y | c@foo.com
Z | d@foo.com
As you can see, account 1 from service A and account 3 from service B have been merged into a common user X, based on the common email address a@foo.com.
The closest answer I could find is this one, and I suspect the solution is a recursive CTE, but given the inputs and engine are different I'm having trouble implementing it.
Clarification: I'm looking for a solution that handles an arbitrary number of services, so perhaps the input table might be better off as:
service_id | account_id | email_address
-----------|------------|--------------
A | 1 | a@foo.com
A | 1 | b@foo.com
A | 2 | c@foo.com
B | 3 | a@foo.com
B | 3 | a@bar.com
B | 4 | d@foo.com
demo1:db<>fiddle, demo2:db<>fiddle
WITH combined AS (
SELECT
a.email as a_email,
b.email as b_email,
array_remove(ARRAY[a.id, b.id], NULL) as ids
FROM
a
FULL OUTER JOIN b ON (a.email = b.email)
), clustered AS (
SELECT DISTINCT
ids
FROM (
SELECT DISTINCT ON (unnest_ids)
*,
unnest(ids) as unnest_ids
FROM combined
ORDER BY unnest_ids, array_length(ids, 1) DESC
) s
)
SELECT DISTINCT
new_id,
unnest(array_cat) as email
FROM (
SELECT
array_cat(
array_agg(a_email) FILTER (WHERE a_email IS NOT NULL),
array_agg(b_email) FILTER (WHERE b_email IS NOT NULL)
),
row_number() OVER () as new_id
FROM combined co
JOIN clustered cl
ON co.ids <@ cl.ids
GROUP BY cl.ids
) s
Step by step explanation:
For explanation I'll take this dataset. This is a little bit more complex than yours. It can illustrate my steps better. Some problems don't occur in your smaller set. Think about the characters as variables for email addresses.
Table A:
| id | email |
|----|-------|
| 1 | a |
| 1 | b |
| 2 | c |
| 5 | e |
Table B
| id | email |
|----|-------|
| 3 | a |
| 3 | d |
| 4 | e |
| 4 | f |
| 3 | b |
CTE combined:
A JOIN of both tables on equal email addresses gives a touch point. The ids of accounts that share an email are collected into one array:
| a_email | b_email | ids |
|-----------|-----------|-----|
| (null) | a@bar.com | 3 |
| a@foo.com | a@foo.com | 1,3 |
| b@foo.com | (null) | 1 |
| c@foo.com | (null) | 2 |
| (null) | d@foo.com | 4 |
CTE clustered (sorry for the names...):
The goal is for every id to end up in exactly one array. In combined, an id can currently appear in several arrays; for example, the id 4 appears in both {5,4} and {4}.
First, order the rows by the length of their ids arrays, because the DISTINCT ON should later pick the longest array (the one holding the touch point, {5,4} rather than {4}).
Then unnest the ids arrays to get a basis for filtering. This ends in:
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| a | a | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| a | a | 1,3 | 3 |
| (null) | d | 3 | 3 |
| e | e | 5,4 | 4 |
| (null) | f | 4 | 4 |
| e | e | 5,4 | 5 |
After filtering with DISTINCT ON
| a_email | b_email | ids | unnest_ids |
|---------|---------|-----|------------|
| b | b | 1,3 | 1 |
| c | (null) | 2 | 2 |
| b | b | 1,3 | 3 |
| e | e | 5,4 | 4 |
| e | e | 5,4 | 5 |
We are only interested in the ids column with the generated unique id clusters. So we need all of them only once. This is the job of the last DISTINCT. So CTE clustered results in
| ids |
|-----|
| 2 |
| 1,3 |
| 5,4 |
Now we know which ids are combined and should share their data. Next we join the clustered ids against the original tables. Since we have already done this in the CTE combined, we can reuse that part (that is why it was factored out into its own CTE, by the way: we do not need another join of both tables in this step). The JOIN operator <@ says: JOIN if the "touch point" array of combined is contained in (is a subset of) the id cluster of clustered. This yields:
| a_email | b_email | ids | ids |
|---------|---------|-----|-----|
| c | (null) | 2 | 2 |
| a | a | 1,3 | 1,3 |
| b | b | 1,3 | 1,3 |
| (null) | d | 3 | 1,3 |
| e | e | 5,4 | 5,4 |
| (null) | f | 4 | 5,4 |
Now we are able to group the email addresses by using the clustered ids (rightmost column).
array_agg aggregates the mails of one column, array_cat concatenates the email arrays of both columns into one big email array.
Since some rows have a NULL email, we can filter these values out before aggregating with the FILTER (WHERE ...) clause.
Result so far:
| array_cat |
|-----------|
| c |
| a,b,a,b,d |
| e,e,f |
Now we group all email addresses for one single id. We have to generate new unique ids. That's what the window function row_number is for. It simply adds a row count to the table:
| array_cat | new_id |
|-----------|--------|
| c | 1 |
| a,b,a,b,d | 2 |
| e,e,f | 3 |
Last step is to unnest the array to get a row per email address. Since in the array are still some duplicates we can eliminate them in this step with a DISTINCT as well:
| new_id | email |
|--------|-------|
| 1 | c |
| 2 | a |
| 2 | b |
| 2 | d |
| 3 | e |
| 3 | f |
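For an arbitrary number of services, the same clustering can also be done outside the database. The following is a minimal sketch of a different technique (union-find over accounts and emails), not the SQL approach described above; the function name `merge_identities` and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict

def merge_identities(rows):
    """Cluster emails into identities.

    rows: iterable of (service_id, account_id, email_address) tuples.
    Returns a sorted list of sorted email lists, one list per identity.
    """
    parent = {}

    def find(node):
        # Find the cluster representative, halving the path as we go.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every account to each of its emails; shared emails then
    # connect accounts across services transitively.
    for service_id, account_id, email in rows:
        union(("account", service_id, account_id), ("email", email))

    groups = defaultdict(set)
    for node in list(parent):
        if node[0] == "email":
            groups[find(node)].add(node[1])
    return sorted(sorted(emails) for emails in groups.values())

rows = [
    ("A", 1, "a@foo.com"), ("A", 1, "b@foo.com"), ("A", 2, "c@foo.com"),
    ("B", 3, "a@foo.com"), ("B", 3, "a@bar.com"), ("B", 4, "d@foo.com"),
]
identities = merge_identities(rows)
for user_id, emails in enumerate(identities, start=1):
    print(user_id, emails)
```

Because accounts are keyed by (service_id, account_id), equal account numbers in different services stay distinct, and chains such as account 1 → a@foo.com → account 3 → a@bar.com collapse into one identity.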
OK, provided you only have two 'services', and assuming that to begin with you are not overly concerned with how to best represent the new key (I've used text as the easiest to hand), then please try the below query. This works for me on Postgres 9.6:
WITH shared_addr AS
(
SELECT foo.account_a, foo.account_b, row_number() OVER (ORDER BY foo.account_a) AS shared_id
FROM (
SELECT
a.account_id as account_a
, b.account_id as account_b
FROM
service_a a
JOIN
service_b b
ON
a.email_address = b.email_address
GROUP BY a.account_id, b.account_id
) foo
)
SELECT
bar.account_id,
bar.email_address
FROM
(
SELECT
'A-' || service_a.account_id::text AS account_id,
service_a.email_address
FROM service_a
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
WHERE shared_addr.account_b IS NULL
UNION ALL
SELECT
'B-' ||service_b.account_id::text,
service_b.email_address FROM service_b
LEFT OUTER JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
WHERE shared_addr.account_a IS NULL
UNION ALL
(
SELECT
'shared-' || shared_addr.shared_id::text,
service_b.email_address
FROM service_b
JOIN
shared_addr
ON
shared_addr.account_b = service_b.account_id
UNION
SELECT
'shared-' || shared_addr.shared_id::text,
service_a.email_address
FROM service_a
JOIN
shared_addr
ON
shared_addr.account_a = service_a.account_id
)
) bar
;

SQL - Rows that are repetitive with a particular condition

We have a table like this:
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 3 | Steve | SomeService3 | | | | 2 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
| 4 | Steve | SomeService4 | | | | 12 |
+----+-------+-----------------+----------------+-----------------+----------------+-----------------+
Every digit in a zone column is a tooth (dental numbering), so this means "John" has received "SomeService1" twice for tooth #3 in the first zone:
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| ID | Name | RecievedService | FirstZoneTeeth | SecondZoneTeeth | ThirdZoneTeeth | FourthZoneTeeth |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 1 | John | SomeService1 | 13 | | 4 | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
| 2 | John | SomeService1 | 34 | | | |
+----+------+-----------------+----------------+-----------------+----------------+-----------------+
Note that Steve has received services twice for tooth #2 (4th zone), but the two services are not the same.
I've written some code that gives me a table of duplicate rows (checking only the patient and the received service, using a GROUP BY clause), but I need to check the zones too.
I've tried this:
select ROW_NUMBER() over(order by vv.ID_sick) as RowNum,
bb.Radif,
bb.VCount as 'Count',
vv.ID_sick 'ID_Sick',
vv.ID_service 'ID_Service',
sick.FNamesick + ' ' + sick.LNamesick as 'Sick',
serv.NameService as 'Service',
vv.Mab_Service as 'MabService',
vv.Mab_daryafti as 'MabDaryafti',
vv.datevisit as 'DateVisit',
vv.Zone1,
vv.Zone2,
vv.Zone3,
vv.Zone4,
vv.ID_dentist as 'ID_Dentist',
dent.FNamedentist + ' ' + dent.LNamedentist as 'Dentist',
vv.id_do as 'ID_Do',
do.FNamedentist + ' ' + do.LNamedentist as 'Do'
from visiting vv inner join (
select ROW_NUMBER() OVER(ORDER BY a.ID_sick ASC) AS Radif,
count(a.ID_sick) as VCount,
a.ID_sick,
a.ID_service
from visiting a
group by a.ID_sick, a.ID_service, a.Zone1, a.Zone2, a.Zone3, a.Zone4
having count(a.ID_sick)>1)bb
on vv.ID_sick = bb.ID_sick and vv.ID_service = bb.ID_service
left join InfoSick sick on vv.ID_sick = sick.IDsick
left join infoService serv on vv.ID_service = serv.IDService
left join Infodentist dent on vv.ID_dentist = dent.IDdentist
left join infodentist do on vv.id_do = do.IDdentist
order by bb.ID_sick, bb.ID_service,vv.datevisit
But this code only returns rows in which all the teeth are repeated. What I want is rows in which even a single tooth is repeated.
How can I implement it?
I need to check the individual characters in the zones.
Note: the zone columns are of type varchar.
This is a bad datamodel for what you are trying to do. By storing the teeth as a varchar, you have kind of decided that you are not interested in single teeth, but only in the group of teeth. Now, however, you are trying to investigate on single teeth.
You'd want a datamodel like this:
service
+------------+--------+-----------------+
| service_id | Name | RecievedService |
+------------+--------+-----------------+
| 1 | John | SomeService1 |
+------------+--------+-----------------+
| 3 | Steve | SomeService3 |
+------------+--------+-----------------+
| 4 | Steve | SomeService4 |
+------------+--------+-----------------+
service_detail
+------------+------+-------+
| service_id | zone | tooth |
+------------+------+-------+
| 1 | 1 | 1 |
| 1 | 1 | 3 |
| 1 | 3 | 4 |
+------------+------+-------+
| 1 | 1 | 3 |
| 1 | 1 | 4 |
+------------+------+-------+
| 3 | 4 | 2 |
+------------+------+-------+
| 4 | 4 | 1 |
| 4 | 4 | 2 |
+------------+------+-------+
What you can do with the given datamodel is to create such table on-the-fly using a recursive query and string manipulation:
with unpivoted(service_id, name, zone, teeth) as
(
select recievedservice, name, 1, firstzoneteeth
from mytable where len(firstzoneteeth) > 0
union all
select recievedservice, name, 2, secondzoneteeth
from mytable where len(secondzoneteeth) > 0
union all
select recievedservice, name, 3, thirdzoneteeth
from mytable where len(thirdzoneteeth) > 0
union all
select recievedservice, name, 4, fourthzoneteeth
from mytable where len(fourthzoneteeth) > 0
)
, service_details(service_id, name, zone, tooth, teeth) as
(
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from unpivoted
union all
select
service_id, name, zone, substring(teeth, 1, 1), substring(teeth, 2, 10000)
from service_details
where len(teeth) > 0
)
, duplicates(service_id, name) as
(
select distinct service_id, name
from service_details
group by service_id, name, zone, tooth
having count(*) > 1
)
select m.*
from mytable m
join duplicates d on d.service_id = m.recievedservice and d.name = m.name;
A lot of work and a rather slow query due to a bad data model, but still feasible.
Rextester demo: http://rextester.com/JVWK49901
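The unpivot-and-split idea can be sketched outside SQL as well. The snippet below is an illustration under stated assumptions, not the query above: it takes the question's four sample rows as tuples, treats every character of a zone string as one tooth, and uses a made-up helper name, `rows_with_repeated_tooth`.

```python
from collections import Counter

# (ID, Name, RecievedService, zone 1, zone 2, zone 3, zone 4) from the question.
visits = [
    (1, "John", "SomeService1", "13", "", "4", ""),
    (2, "John", "SomeService1", "34", "", "", ""),
    (3, "Steve", "SomeService3", "", "", "", "2"),
    (4, "Steve", "SomeService4", "", "", "", "12"),
]

def rows_with_repeated_tooth(rows):
    """Return the IDs of rows sharing (name, service, zone, tooth) with another row."""
    counts = Counter()
    keys_by_row = {}
    for row_id, name, service, *zones in rows:
        # One key per tooth digit; a set, so a digit typed twice in the
        # same row is not counted as a repeat on its own.
        keys = {
            (name, service, zone_no, tooth)
            for zone_no, teeth in enumerate(zones, start=1)
            for tooth in teeth
        }
        keys_by_row[row_id] = keys
        counts.update(keys)
    return sorted(
        row_id
        for row_id, keys in keys_by_row.items()
        if any(counts[key] > 1 for key in keys)
    )

repeated_rows = rows_with_repeated_tooth(visits)
print(repeated_rows)  # [1, 2]
```

John's rows match on tooth 3 in zone 1, while Steve's rows share tooth 2 in zone 4 but differ in service, so they are not reported.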

Only include grouped observations where event order is valid

I have a table of dates for eye exams and eye wear purchases for individuals. I only want to keep instances where individuals bought their eye wear following an eye exam. In the example below, I would want to keep person 1, events 2 and 3 for person 2, person 3, but not person 4. How can I do this in SQL server?
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Eyewear| 1 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
| 4 | Eyewear| 1 |
| 4 | Exam | 2 |
The final result would look like
| Person | Event | Order |
| 1 | Exam | 1 |
| 1 | Eyewear| 2 |
| 2 | Exam | 2 |
| 2 | Eyewear| 3 |
| 3 | Exam | 1 |
| 3 | Eyewear| 2 |
Self join should work...
select
t.Person
,t.Event
,t.[Order]
from
yourTable t
inner join
yourTable t2 on t2.Person = t.Person
and t2.[Order] = (t.[Order] +1)
where
t2.Event = 'Eyewear'
and t.Event = 'Exam'
I haven't tried to optimize it but this seems to work:
create table t(
person varchar(10),
event varchar(10),
[order] varchar(10)
);
insert into t values
('1','Exam','1'),
('1','Eyewear','2'),
('2','Eyewear','1'),
('2','Exam','2'),
('2','Eyewear','3'),
('3','Exam','1'),
('3','Eyewear','2'),
('4','Eyewear','1'),
('4','Exam','2');
with xxx(person,event_a,seq_a,event_b,seq_b) as (
select a.person,a.event,a.[order],b.event,b.[order]
from t a join t b
on a.person = b.person
and a.[order] < b.[order]
and a.event like 'exam'
and b.event like 'eyewear'
)
select person,event_a event,seq_a [order] from xxx
union
select person,event_b event,seq_b [order] from xxx
order by 1,3
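The pairing approach can be tried quickly with the sketch below. It assumes an in-memory SQLite database instead of SQL Server, stores the order as an integer column named `seq` (a renamed stand-in for `[Order]`), and pairs each exam with any later eyewear purchase by the same person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (person INTEGER, event TEXT, seq INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "Exam", 1), (1, "Eyewear", 2),
     (2, "Eyewear", 1), (2, "Exam", 2), (2, "Eyewear", 3),
     (3, "Exam", 1), (3, "Eyewear", 2),
     (4, "Eyewear", 1), (4, "Exam", 2)],
)

kept = conn.execute("""
    WITH pairs AS (
        -- every exam together with each later eyewear purchase
        SELECT a.person, a.seq AS exam_seq, b.seq AS eyewear_seq
        FROM t AS a
        JOIN t AS b ON b.person = a.person AND b.seq > a.seq
        WHERE a.event = 'Exam' AND b.event = 'Eyewear'
    )
    SELECT person, 'Exam' AS event, exam_seq AS seq FROM pairs
    UNION
    SELECT person, 'Eyewear' AS event, eyewear_seq AS seq FROM pairs
    ORDER BY person, seq
""").fetchall()

for row in kept:
    print(row)
```

UNION (rather than UNION ALL) removes the duplicates that would appear if one exam were followed by several purchases.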

Concatenated range descriptions in MySQL

I have data in a table looking like this:
+---+----+
| a | b |
+---+----+
| a | 1 |
| a | 2 |
| a | 4 |
| a | 5 |
| b | 1 |
| b | 3 |
| b | 5 |
| c | 5 |
| c | 4 |
| c | 3 |
| c | 2 |
| c | 1 |
+---+----+
I'd like to produce a SQL query which outputs data like this:
+---+-----------+
| a | 1-2, 4-5 |
| b | 1,3,5 |
| c | 1-5 |
+---+-----------+
Is there a way to do this purely in SQL (specifically, MySQL 5.1?)
The closest I have got is select a, concat(min(b), "-", max(b)) from test group by a;, but this doesn't take into account gaps in the range.
Use:
SELECT a, GROUP_CONCAT(x.island)
FROM (SELECT y.a,
CASE
WHEN MIN(y.b) = MAX(y.b) THEN
CAST(MIN(y.b) AS CHAR(10))
ELSE
CONCAT(MIN(y.b), '-', MAX(y.b))
END AS island
FROM (SELECT t.a, t.b,
CASE
WHEN @prev_b = t.b -1 THEN
@group_rank
ELSE
@group_rank := @group_rank + 1
END AS blah,
@prev_b := t.b
FROM TABLE t
JOIN (SELECT @group_rank := 1, @prev_b := 0) r
ORDER BY t.a, t.b) y
GROUP BY y.a, y.blah) x
GROUP BY a
The idea is that if you assign a common value to each group of sequential values, you can then use MIN/MAX to get the appropriate values. For example:
a | b | blah
---------------
a | 1 | 1
a | 2 | 1
a | 4 | 2
a | 5 | 2
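The same grouping idea can be sketched in a few lines outside MySQL. This is an illustration with a made-up function name, `describe_ranges`; instead of user variables it uses the fact that value minus position stays constant inside a run of consecutive numbers.

```python
from itertools import groupby

def describe_ranges(values):
    """Render integers as comma-separated islands, e.g. [1, 2, 4, 5] -> '1-2,4-5'."""
    ordered = sorted(set(values))
    parts = []
    # Within a run of consecutive integers, value - index is constant,
    # so it serves as the group key (the role of "blah" above).
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in run]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ",".join(parts)

data = {"a": [1, 2, 4, 5], "b": [1, 3, 5], "c": [5, 4, 3, 2, 1]}
result = {key: describe_ranges(values) for key, values in data.items()}
print(result)  # {'a': '1-2,4-5', 'b': '1,3,5', 'c': '1-5'}
```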
I also found Martin Smith's answer to another question helpful:
printing restaurant opening hours from a database table in human readable format using php