SQL for distinct record comparison

I am comparing a table to itself, trying to determine whether an email in one record is used in any of the four email columns of another record.
To make this easier, let's look at a (simplified) example:
Name: Bob
Office Email: bob#aaa.com
Home Email: bob#home.com
Mobile Email: bobster#gmail.com
.
Name: Rob
Office Email: rob#bbb.com
Home Email: bob#home.com
Mobile Email: robert#gmail.com
My SQL statement looks like this:
select c1.ContactId id1, c1.FullName Name1, 'Office Email 1' EmailType1, c1.EMailAddress1 Email,
c2.ContactId id2, c2.FullName Name2,
CASE c1.EmailAddress1
WHEN c2.EMailAddress1 THEN 'Office Email 1'
WHEN c2.Si_OfficeEmail2nd THEN 'Office Email 2'
WHEN c2.EMailAddress2 THEN 'Mobile Email'
WHEN c2.pc_hmemail THEN 'Home Email'
ELSE '?'
END EmailType2,
CASE c1.EmailAddress1
WHEN c2.EMailAddress1 THEN c2.EMailAddress1
WHEN c2.Si_OfficeEmail2nd THEN c2.Si_OfficeEmail2nd
WHEN c2.EMailAddress2 THEN c2.EMailAddress2
WHEN c2.pc_hmemail THEN c2.pc_hmemail
ELSE '?'
END DuplicateEmail
from Contact c1, Contact c2
where (
LTRIM(RTRIM(c1.EMailAddress1 )) = LTRIM(RTRIM(c2.EMailAddress1))
Or LTRIM(RTRIM(c1.EMailAddress1 )) = LTRIM(RTRIM(c2.EMailAddress2))
Or LTRIM(RTRIM(c1.EMailAddress1 )) = LTRIM(RTRIM(c2.pc_hmemail))
Or LTRIM(RTRIM(c1.EMailAddress1 )) = LTRIM(RTRIM(c2.Si_OfficeEmail2nd))
)
And c1.ContactId <> c2.ContactId
And c1.StateCode = 0
and c2.StateCode = 0
order by c1.FullName, c2.FullName
Unfortunately, because Bob and Rob share the same email 'type' (Home Email), duplicated because of a typo, my query returns two records: one showing that Bob's email is duplicated in Rob's, and a second showing that Rob's email is duplicated in Bob's.
I only need one record. I'm sure this is a common problem, but I don't know how to describe it well enough to get anything useful out of a search engine.
Perhaps there is a better way of going about this? If not, other than jumping through a bunch of intermediate temporary tables to eliminate these equivalent records, is there a way to write a single query for this?

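The solution to your problem is to replace c1.ContactId <> c2.ContactId with c1.ContactId < c2.ContactId. Only the ordering of each pair with the smaller id on the left passes this condition, so every pair of contacts is returned once instead of twice. This is safe as long as the email comparison itself is symmetric (every email column of c1 checked against every email column of c2); if only c1.EMailAddress1 is compared, as in the query above, a match that exists only in the other direction can be dropped.
If you are looking at emails, you may find it faster to look at the emails directly. Something like the following returns every duplicated email, one row per contact and column: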
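To see the effect of `<` versus `<>`, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The `contact` table and its `office`, `home` and `mobile` columns are a hypothetical simplification of the Contact table; every email column of c1 is compared with every email column of c2, so the comparison is symmetric and the `<` filter loses nothing.

```python
import sqlite3

def make_demo_db():
    """Build a tiny in-memory contact table modelled on the Bob/Rob example."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT,
                              office TEXT, home TEXT, mobile TEXT);
        INSERT INTO contact VALUES
          (1, 'Bob', 'bob#aaa.com', 'bob#home.com', 'bobster#gmail.com'),
          (2, 'Rob', 'rob#bbb.com', 'bob#home.com', 'robert#gmail.com'),
          (3, 'Sue', 'sue#ccc.com', 'sue#home.com', 'sue#gmail.com');
    """)
    return conn

def duplicate_pairs(conn, one_row_per_pair):
    # '<' keeps only the ordering with the smaller id on the left;
    # '<>' keeps both orderings of every matching pair.
    op = "<" if one_row_per_pair else "<>"
    sql = f"""
        SELECT c1.name, c2.name
        FROM contact c1
        JOIN contact c2 ON c1.id {op} c2.id
        WHERE c1.office IN (c2.office, c2.home, c2.mobile)
           OR c1.home   IN (c2.office, c2.home, c2.mobile)
           OR c1.mobile IN (c2.office, c2.home, c2.mobile)
        ORDER BY c1.id, c2.id
    """
    return conn.execute(sql).fetchall()

conn = make_demo_db()
print(duplicate_pairs(conn, one_row_per_pair=False))  # [('Bob', 'Rob'), ('Rob', 'Bob')]
print(duplicate_pairs(conn, one_row_per_pair=True))   # [('Bob', 'Rob')]
```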
select e.*
from (select e.*, COUNT(*) over (partition by email) as NumTimes
from ((select contactId, 'Office' as which, EmailAddress1 as email
from Contact
) union all
(select contactId, 'Office2', Si_OfficeEmail2nd
from Contact
) union all
(select contactId, 'Home', pc_hmemail
from Contact
) union all
(select contactId, 'Mobile', EmailAddress2
from Contact
)
) e
where email is not null and email <> ''
) e
where NumTimes > 1
order by email
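Here is a runnable sketch of the same unpivot-and-count idea, using Python's sqlite3 module. It assumes SQLite 3.25 or later, which is needed for `COUNT(*) OVER (...)`. The hypothetical columns `office`, `office2`, `home` and `mobile` stand in for EMailAddress1, Si_OfficeEmail2nd, pc_hmemail and EMailAddress2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for Contact: office/office2/home/mobile play the roles of
# EMailAddress1/Si_OfficeEmail2nd/pc_hmemail/EMailAddress2.
conn.executescript("""
    CREATE TABLE contact (contactId INTEGER PRIMARY KEY,
                          office TEXT, office2 TEXT, home TEXT, mobile TEXT);
    INSERT INTO contact VALUES
      (1, 'bob#aaa.com', NULL, 'bob#home.com', 'bobster#gmail.com'),
      (2, 'rob#bbb.com', '',   'bob#home.com', 'robert#gmail.com');
""")

DUPLICATES_SQL = """
SELECT contactId, which, email, NumTimes
FROM (
    -- count how many (contact, column) slots hold each email
    SELECT e.*, COUNT(*) OVER (PARTITION BY email) AS NumTimes
    FROM (
        -- unpivot: one row per contact per email column
        SELECT contactId, 'Office' AS which, office AS email FROM contact
        UNION ALL SELECT contactId, 'Office2', office2 FROM contact
        UNION ALL SELECT contactId, 'Home', home FROM contact
        UNION ALL SELECT contactId, 'Mobile', mobile FROM contact
    ) e
    WHERE email IS NOT NULL AND email <> ''
) d
WHERE NumTimes > 1
ORDER BY email, contactId
"""

rows = conn.execute(DUPLICATES_SQL).fetchall()
print(rows)  # [(1, 'Home', 'bob#home.com', 2), (2, 'Home', 'bob#home.com', 2)]
```

The NULL and empty-string filter is applied before the window count, so blank columns never count as "duplicates" of each other.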

I'd first suggest continuing to normalise your data structure. A person may have several types of contact information, so the personID, typeID and value can be placed in a separate table. That table can in turn reference a type table, where you keep track of the different contact types (e.g. Home e-mail, Work e-mail, Twitter, LinkedIn, Facebook etc.). This not only makes your system easier to extend but also lets you run these kinds of queries much more efficiently.
The query to find any duplicated value would then be something like:
SELECT ci.value, COUNT(*) AS times_used
FROM contactinfo ci
GROUP BY ci.value
HAVING COUNT(*) > 1
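A minimal sketch of the normalised layout, run through Python's sqlite3. The table names follow the answer, except that `users` is used instead of `user` (a reserved word in several databases); all sample rows are hypothetical. Once the duplicated values are known, it joins back to list who uses them and as which type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE contacttype (type_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contactinfo (user_id INTEGER REFERENCES users(user_id),
                              type_id INTEGER REFERENCES contacttype(type_id),
                              value TEXT);
    INSERT INTO users VALUES (1, 'Bob'), (2, 'Rob');
    INSERT INTO contacttype VALUES (1, 'Office Email'), (2, 'Home Email'), (3, 'Mobile Email');
    INSERT INTO contactinfo VALUES
      (1, 1, 'bob#aaa.com'), (1, 2, 'bob#home.com'), (1, 3, 'bobster#gmail.com'),
      (2, 1, 'rob#bbb.com'), (2, 2, 'bob#home.com'), (2, 3, 'robert#gmail.com');
""")

# Every (user, type) row whose value is also used by at least one other row.
SHARED_SQL = """
SELECT u.username, ct.name, ci.value
FROM contactinfo ci
JOIN users u ON u.user_id = ci.user_id
JOIN contacttype ct ON ct.type_id = ci.type_id
WHERE ci.value IN (SELECT value FROM contactinfo
                   GROUP BY value HAVING COUNT(*) > 1)
ORDER BY ci.value, u.username
"""

rows = conn.execute(SHARED_SQL).fetchall()
print(rows)  # [('Bob', 'Home Email', 'bob#home.com'), ('Rob', 'Home Email', 'bob#home.com')]
```

Adding a new contact type is then an INSERT into `contacttype` rather than a new column plus another OR branch in every query.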

Related

SQL sub-query check if data exists in another table

I am new to SQL, and this question may already exist in some form. I am developing a web app, backed by PostgreSQL, that displays cards containing the title, description and other information about a place; a sub-query fetches the image data for each place. I want to add a sub-query or join that tells me which of the selected items are liked by the user (i.e. present in favoritePlaces with the corresponding user_id and place_id). Sample dynamically generated query:
SELECT places.place_id,
username,
title,
description,
visible,
score,
placelocation,
category,
price,
accessibility,
places.date,
dangerous,
url,
image_id
FROM
(SELECT *
FROM places
GROUP BY place_id
HAVING count(*) < 10) places
LEFT JOIN images ON images.place_id = places.place_id
WHERE description SIMILAR TO Concat('%', '', '%')
AND placelocation =placelocation
AND category =category
AND price =price
AND dangerous =dangerous
AND accessibility =accessibility
OR title SIMILAR TO Concat('%', '', '%')
AND visible=TRUE
AND placelocation =placelocation
AND category =category
AND price =price
AND dangerous =dangerous
AND accessibility =accessibility
The end goal is to know, for each item, whether it is liked or not (i.e. present in the favoritePlaces table for the current user). favoritePlaces table structure:
I looked at this answer: https://stackoverflow.com/a/44773730/14131447 but I am not sure how to implement it.
Sample data within the favoritePlaces table:
I solved the issue by adding a CASE WHEN EXISTS expression after the SELECT columns:
SELECT places.place_id,
username,
title,
description,
visible,
score,
placelocation,
category,
price,
accessibility,
places.date,
dangerous,
url,
image_id, CASE
WHEN EXISTS (select *
from "favoritePlaces"
where "favoritePlaces".place_id = places.place_id AND user_id=128
)
THEN 'true'
ELSE 'false'
END AS liked
FROM
(SELECT *
FROM places
GROUP BY place_id
HAVING count(*) < 10) places
LEFT JOIN images ON images.place_id = places.place_id
WHERE description SIMILAR TO Concat('%', '', '%')
AND placelocation =placelocation
AND category =category
AND price =price
AND dangerous =dangerous
AND accessibility =accessibility
OR title SIMILAR TO Concat('%', '', '%')
AND visible=TRUE
AND placelocation =placelocation
AND category =category
AND price =price
AND dangerous =dangerous
AND accessibility =accessibility;
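The core of this fix is the correlated EXISTS. Below is a stripped-down sketch using Python's sqlite3 as a stand-in for PostgreSQL; the `places` columns and all rows are hypothetical. In PostgreSQL, `EXISTS (...)` is already a boolean, so `EXISTS (...) AS liked` can replace the CASE that returns the text values 'true'/'false'. SQLite reports the same flag as 1/0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; only the columns needed for the flag are modelled.
conn.executescript("""
    CREATE TABLE places (place_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE "favoritePlaces" (user_id INTEGER, place_id INTEGER);
    INSERT INTO places VALUES (1, 'Lake'), (2, 'Castle'), (3, 'Museum');
    INSERT INTO "favoritePlaces" VALUES (128, 2), (99, 3);
""")

# EXISTS yields true/false per place; the user id is bound as a parameter
# instead of being pasted into the SQL string.
LIKED_SQL = """
SELECT p.place_id, p.title,
       EXISTS (SELECT 1 FROM "favoritePlaces" f
               WHERE f.place_id = p.place_id AND f.user_id = ?) AS liked
FROM places p
ORDER BY p.place_id
"""

rows = conn.execute(LIKED_SQL, (128,)).fetchall()
print(rows)  # [(1, 'Lake', 0), (2, 'Castle', 1), (3, 'Museum', 0)]
```

Place 3 is liked only by user 99, so it is flagged 0 for user 128.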

How to sort friends by last message time like whatsapp

I'm working on a chat app, and I want a query that pulls out the list of friends and sorts them by last message time, the way WhatsApp does.
Three tables in the database are important.
Table name: Users
Purpose: It stores the list of all registered users in the chat app.
Columns:- sn, matricno, fullname, password, faculty, department, level, year, study_centre, gender, email, phoneno and picture.
Table name: Friends
Purpose: It stores all the list of friends and friend requests.
Columns:- sn, user1, user2, date_initiated, status (1 = request sent, 2 = they are friends, 3 = they are no longer friends), date_accepted, date_unfriend
Table name: Messages
Purpose:- It stores all the messages that have been sent between friends
Columns:- sn, sender, recipient, content, date, mread (indicates whether the recipient has read the message)
So far, this query pulls the list of friends just the way I want; what is left is to bring in the messages table and sort by its date column:
SELECT *
FROM users
WHERE matricno IN (SELECT user2
FROM friends
WHERE user1 = 'NOU1213131415'
AND STATUS = '2'
UNION
SELECT user1
FROM friends
WHERE user2 = 'NOU1213131415'
AND STATUS = '2')
The picture below is an example of the chat list it pulls out
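I don't know which SQL dialect you use and haven't tested this, but you can do something like the following: for each accepted friend, take the latest message exchanged between that friend and you, then sort by it.
SELECT
u.*,
(SELECT MAX(m.date) FROM messages m
WHERE (m.sender = u.matricno AND m.recipient = 'NOU1213131415')
OR (m.sender = 'NOU1213131415' AND m.recipient = u.matricno)) AS max_date
FROM users u
JOIN friends f ON f.status = 2
AND ((f.user1 = 'NOU1213131415' AND f.user2 = u.matricno)
OR (f.user2 = 'NOU1213131415' AND f.user1 = u.matricno))
ORDER BY max_date DESC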
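A self-contained sketch of the "last message per friend" ordering, using Python's sqlite3. The trimmed-down tables and all sample rows (other than the matricno 'NOU1213131415') are hypothetical. Only messages between the friend and the current user count, and only friendships with status 2 are listed.

```python
import sqlite3

ME = 'NOU1213131415'
conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down versions of the users/friends/messages tables.
conn.executescript("""
    CREATE TABLE users (matricno TEXT PRIMARY KEY, fullname TEXT);
    CREATE TABLE friends (user1 TEXT, user2 TEXT, status INTEGER);
    CREATE TABLE messages (sender TEXT, recipient TEXT, content TEXT, date TEXT);
    INSERT INTO users VALUES ('NOU1213131415', 'Me'), ('A1', 'Ada'),
                             ('B2', 'Ben'), ('C3', 'Cy'), ('D4', 'Dee');
    INSERT INTO friends VALUES ('NOU1213131415', 'A1', 2), ('B2', 'NOU1213131415', 2),
                               ('NOU1213131415', 'C3', 2), ('NOU1213131415', 'D4', 1);
    INSERT INTO messages VALUES
      ('A1', 'NOU1213131415', 'hi', '2024-01-01 10:00'),
      ('NOU1213131415', 'B2', 'yo', '2024-01-02 09:00'),
      ('A1', 'B2', 'not with me', '2024-01-03 12:00');
""")

CHAT_LIST_SQL = """
SELECT u.matricno, u.fullname,
       -- latest message exchanged between this friend and me (either direction)
       (SELECT MAX(m.date) FROM messages m
         WHERE (m.sender = u.matricno AND m.recipient = :me)
            OR (m.sender = :me AND m.recipient = u.matricno)) AS last_date
FROM users u
JOIN friends f
  ON f.status = 2                                   -- accepted friendships only
 AND ((f.user1 = :me AND f.user2 = u.matricno)
   OR (f.user2 = :me AND f.user1 = u.matricno))
ORDER BY last_date DESC                             -- SQLite puts NULLs last here
"""

rows = conn.execute(CHAT_LIST_SQL, {"me": ME}).fetchall()
print(rows)
# [('B2', 'Ben', '2024-01-02 09:00'), ('A1', 'Ada', '2024-01-01 10:00'), ('C3', 'Cy', None)]
```

Friends you have never messaged get a NULL date; whether NULLs sort first or last under DESC differs between databases, so add an explicit NULLS LAST (or an equivalent) where your dialect needs it.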

SQL side grouping of field before plugging into Crystal Reports

My company uses numbers as a system identifier to branches. The problem is my end users do not like seeing 000201 as a branch name. Therefore I am trying to convert these numbers to a string and then roll up the satellite locations into the main branch. The branch format is as follows:
BBBBSS. For example, the Nashville main branch is 000201 and its satellites follow sequentially: 000202, 000203, 000204.
I want all of our details to roll up into "Nashville": any ORGID matching 0002** should be grouped into a single field named "Nashville".
Sorry if I'm not too clear. I've been banging my head against the wall so my thoughts are jumbled.
If I understand your question, there are at least two ways to accomplish this.
The first way is straightforward: add a WHEN for each branch to a CASE expression. If you have many branches, I'd go with the second way.
Select case SUBSTRING(cast(Branch.Id as varchar(10)), 1, 4 )
when '0002' then 'Nashville'
when '0003' then 'Branch 03'
when '0004' then 'Branch 04'
else SUBSTRING(cast(Branch.Id as varchar(10)), 1, 4 ) end OrgName,
COUNT(*)
from Branch group by SUBSTRING(cast(Branch.Id as varchar(10)), 1, 4 )
The second way is to keep a separate table holding the branch names, etc. For demonstration I will call this table OrgTable, with an OrgId column containing your "0002" ids and an OrgName column containing "Nashville".
Select OrgTable.OrgName, count(*)
from Branch
inner join OrgTable on ( OrgTable.OrgId = SUBSTRING(cast(Branch.Id as varchar(10)), 1, 4 ))
Group By OrgTable.OrgName
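I haven't checked the SQL above for syntax errors, but hopefully you get the picture. One caveat: this assumes Branch.Id is stored as text. If it is a numeric column, the cast drops the leading zeros (000201 becomes '201'), so use Branch.Id / 100 to get the branch part instead.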
HTH,
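A minimal sketch of the lookup-table approach, using Python's sqlite3. It assumes ORGID is stored as a number (so 000201 is 201) and keys the hypothetical `orgtable` on the numeric branch part; if your ids are stored as text, join on `SUBSTR(b.id, 1, 4)` and store '0002' in the lookup table instead. The branch ids other than the Nashville ones, and the name 'Branch 03', are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (id INTEGER);   -- ORGID stored as a number: 000201 -> 201
    CREATE TABLE orgtable (orgid INTEGER PRIMARY KEY, orgname TEXT);
    INSERT INTO branch VALUES (201), (202), (203), (204), (301), (302);
    INSERT INTO orgtable VALUES (2, 'Nashville'), (3, 'Branch 03');
""")

ROLLUP_SQL = """
SELECT o.orgname, COUNT(*) AS n
FROM branch b
JOIN orgtable o ON o.orgid = b.id / 100   -- integer division drops the SS satellite digits
GROUP BY o.orgname
ORDER BY o.orgname
"""

rows = conn.execute(ROLLUP_SQL).fetchall()
print(rows)  # [('Branch 03', 2), ('Nashville', 4)]
```

Keeping the names in a table means a new branch is a single INSERT, with no change to the query or the report.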
You can create a simple formula in CR to convert your ORGID to the branch name:
//Convert ORGID to string and save branch code
local stringvar branch := left(totext({Table.ORGID},0,''),4);
//Display city based on branch code
select branch
case "0002" : "Nashville"
case "0003" : "Other City"
<...>
default : "No matching branch!"

How to SELECT infinitely deep linked data with a SELF JOIN

I am trying to design an efficient database:
We are looking at advertisements in which sellers post their phone numbers. A seller can have more than one phone number and can post them in different ways...
simple image to clearly understand:
I want to find ALL advertisements and phones of the target seller, knowing just one of their phones [phone_1|phone_2|...|phone_4]!
Here is the table structure I want to use:
Notice that:
Advertisement_1010001 has THREE phones: ..0001, ..0002, ..0003.
Advertisement_1010003 has TWO phones: ..0003, ..0004.
Advertisement_1010004 has ONE phone: ..0004.
(all of the advertisements above were published by ONE seller, because ALL of their phones are LINKED!)
Advertisement_1010005 has TWO phones, but it belongs to another seller.
All is good, but how can I "SELECT phone, adver_num WHERE my_phone IN (head & tail_phones)" using just ONE query?
I am sure a self join will help me, but the way I am using it now is not quite right:
SELECT a1.`advert_site_num`,
a1.`head_phone`
FROM `test_table` a1
LEFT JOIN `test_table` a2 ON a1.`tail_phones` = a2.`head_phone`
UNION
SELECT a3.`advert_site_num`,
a3.`tail_phones`
FROM `test_table` a3
LEFT JOIN `test_table` a4 ON a3.`tail_phones` = a4.`head_phone`
ORDER BY `advert_site_num`, `head_phone`;
This returns all records linked by phone, without filtering on the phone I am searching for.
Why don't you just filter with WHERE? I'm not sure I understand your problem, but something like this:
SELECT `advert_site_num`, GROUP_CONCAT(`phone` SEPARATOR ',') AS `phones` FROM (
SELECT `advert_site_num`,
`tail_phones` AS `phone`
FROM `test_table`
WHERE `tail_phones` = myphone
UNION
SELECT `advert_site_num`,
`head_phone` AS `phone`
FROM `test_table`
WHERE `head_phone` = myphone
) AS `t`
GROUP BY `advert_site_num`;
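A WHERE filter only finds advertisements that contain the searched phone directly. Following the chain of linked phones to any depth is a job for a recursive common table expression. The sketch below runs on Python's sqlite3; it assumes a hypothetical `advert_phone` table with one row per (advertisement, phone) instead of the head/tail layout, and uses shortened phone numbers. MySQL supports `WITH RECURSIVE` from version 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advert_phone (advert_site_num INTEGER, phone TEXT);
    INSERT INTO advert_phone VALUES
      (1010001, '0001'), (1010001, '0002'), (1010001, '0003'),
      (1010003, '0003'), (1010003, '0004'),
      (1010004, '0004'),
      (1010005, '0005'), (1010005, '0006');
""")

LINKED_SQL = """
WITH RECURSIVE linked(phone) AS (
  SELECT ?                      -- the one phone we know
  UNION                         -- UNION (not UNION ALL) drops repeats, so cycles terminate
  SELECT ap2.phone
  FROM linked l
  JOIN advert_phone ap1 ON ap1.phone = l.phone                        -- adverts using a known phone
  JOIN advert_phone ap2 ON ap2.advert_site_num = ap1.advert_site_num  -- every phone on those adverts
)
SELECT advert_site_num, phone
FROM advert_phone
WHERE phone IN (SELECT phone FROM linked)
ORDER BY advert_site_num, phone
"""

def find_linked(phone):
    """Return every (advertisement, phone) reachable from `phone` through shared adverts."""
    return conn.execute(LINKED_SQL, (phone,)).fetchall()

print(find_linked('0004'))
# [(1010001, '0001'), (1010001, '0002'), (1010001, '0003'),
#  (1010003, '0003'), (1010003, '0004'), (1010004, '0004')]
```

Starting from '0004', the recursion reaches advertisement 1010003, which contributes '0003', and then 1010001 with its other phones. Advertisement 1010005 shares no phone with them and is never reached.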

Two Oracle DB sets of data: how do I select data that is in one set and not in the other, and vice versa?

I have these two subqueries that I got working:
(
SELECT GLOBAL_USERS.ID AS USER_ID, GLOBAL_USERS.USER_ID AS USERNAME,
GLOBAL_USERS.DEPARTMENT AS DEPARTMENT,
GLOBAL_USERS.FIRST_NAME AS FIRSTNAME, GLOBAL_USERS.LAST_NAME AS LASTNAME,
GLOBAL_USERS.TITLE AS TITLE,
USER_FIRST_ENTITLEMENTS.ENTITLEMENT_NAME AS ENTITLEMENTNAME,
USER_FIRST_ENTITLEMENTS.APPLICATION_NAME AS APPLICATIONNAME
FROM GLOBAL_USERS
INNER JOIN USER_FIRST_ENTITLEMENTS
ON GLOBAL_USERS.ID=USER_FIRST_ENTITLEMENTS.USER_ID AND
(USER_FIRST_ENTITLEMENTS.APPLICATION_NAME='MY APPLICATION NAME'
AND USER_FIRST_ENTITLEMENTS.ENTITLEMENT_NAME LIKE '%\fis\%')
) join1
(
SELECT DISTINCT injoin1.ID,injoin2.APPLICATION_ROLE_ID, injoin2.NAME FROM
(
SELECT GLOBAL_USERS.ID,USER_SECOND_ENTITLEMENTS.APPLICATION_ROLE_ID
FROM GLOBAL_USERS
INNER JOIN USER_SECOND_ENTITLEMENTS
ON GLOBAL_USERS.ID=USER_SECOND_ENTITLEMENTS.USER_ID
) injoin1,
(
SELECT USER_SECOND_ENTITLEMENTS.APPLICATION_ROLE_ID,SECOND_ENTITLEMENT_DEFINITIONS.NAME
FROM USER_SECOND_ENTITLEMENTS
INNER JOIN SECOND_ENTITLEMENT_DEFINITIONS
ON USER_SECOND_ENTITLEMENTS.APPLICATION_ROLE_ID=SECOND_ENTITLEMENT_DEFINITIONS.ID
) injoin2
WHERE injoin1.APPLICATION_ROLE_ID=injoin2.APPLICATION_ROLE_ID
) join2
Essentially, what I need to do now is check whether every join1.ENTITLEMENTNAME for each join1.USER_ID exists in join2.NAME for the same join2.ID. If it doesn't, I need a row containing join1.USERNAME, join1.DEPARTMENT, join1.FIRSTNAME, join1.LASTNAME, join1.TITLE, join1.ENTITLEMENTNAME and join1.APPLICATIONNAME.
At the same time, I need to check whether every join2.NAME for each join2.ID exists in join1.ENTITLEMENTNAME for the same join1.USER_ID.
I am a little lost on how to do this, but I am sure it is with some type of join.
Note: I am comparing join1.ENTITLEMENTNAME values with join2.NAME values, and the user ids are in join1.USER_ID and join2.ID.
I'm not worried about efficiency or speed, I just need functionality, so a simple answer will suffice.
Extra brownie points if you help me with the regex. The join2.NAME values are stored as plain names, but the join1.ENTITLEMENTNAME values are stored as "/fis/". I'm not exactly sure how to filter on that, so it would be helpful if anyone could explain.
Thanks in advance!
Have you looked into Oracle's MINUS operator? It returns the rows of the first query that do not appear in the second, so the result set is the difference between the two. It sounds like exactly what you need: take your first set and MINUS the second set, then do the reverse (second MINUS first) for the other direction.
I think the code below does what you described, but you might need to modify it to fit your needs exactly.
SELECT
GLOBAL_USERS.ID AS USER_ID,
GLOBAL_USERS.USER_ID AS USERNAME,
GLOBAL_USERS.DEPARTMENT AS DEPARTMENT,
GLOBAL_USERS.FIRST_NAME AS FIRSTNAME,
GLOBAL_USERS.LAST_NAME AS LASTNAME,
GLOBAL_USERS.TITLE AS TITLE,
USER_FIRST_ENTITLEMENTS.ENTITLEMENT_NAME AS ENTITLEMENTNAME,
USER_FIRST_ENTITLEMENTS.APPLICATION_NAME AS APPLICATIONNAME
FROM GLOBAL_USERS
INNER JOIN USER_FIRST_ENTITLEMENTS
ON GLOBAL_USERS.ID=USER_FIRST_ENTITLEMENTS.USER_ID
AND ( USER_FIRST_ENTITLEMENTS.APPLICATION_NAME='MY APPLICATION NAME'
AND USER_FIRST_ENTITLEMENTS.ENTITLEMENT_NAME LIKE '%\fis\%')
MINUS
SELECT
GLOBAL_USERS.ID AS USER_ID,
GLOBAL_USERS.USER_ID AS USERNAME,
GLOBAL_USERS.DEPARTMENT AS DEPARTMENT,
GLOBAL_USERS.FIRST_NAME AS FIRSTNAME,
GLOBAL_USERS.LAST_NAME AS LASTNAME,
GLOBAL_USERS.TITLE AS TITLE,
SECOND_ENTITLEMENT_DEFINITIONS.NAME AS ENTITLEMENTNAME, -- names come from the definitions table, as in join2
'MY APPLICATION NAME' AS APPLICATIONNAME -- constant so the column lines up with the first query
FROM GLOBAL_USERS
INNER JOIN USER_SECOND_ENTITLEMENTS
ON GLOBAL_USERS.ID=USER_SECOND_ENTITLEMENTS.USER_ID
INNER JOIN SECOND_ENTITLEMENT_DEFINITIONS
ON USER_SECOND_ENTITLEMENTS.APPLICATION_ROLE_ID=SECOND_ENTITLEMENT_DEFINITIONS.ID
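As far as the pattern goes: LIKE does not use regular expressions, and in Oracle neither '/' nor '\' has a special meaning in a LIKE pattern unless you declare an escape character with an ESCAPE clause. So if the names contain "/fis/", LIKE '%/fis/%' is enough; '%\fis\%' looks for backslashes instead.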
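Because the question asks for differences in both directions, one query can run the set difference twice and label each side. Here is a small sketch using Python's sqlite3, where `EXCEPT` is the standard-SQL spelling of Oracle's `MINUS`. The tables `first_set` and `second_set` are hypothetical stand-ins for join1 and join2, reduced to (user id, entitlement name) pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE first_set  (user_id INTEGER, name TEXT);
    CREATE TABLE second_set (user_id INTEGER, name TEXT);
    INSERT INTO first_set  VALUES (1, 'READ'), (1, 'WRITE'), (2, 'READ');
    INSERT INTO second_set VALUES (1, 'READ'), (2, 'READ'), (2, 'ADMIN');
""")

# (first EXCEPT second) UNION ALL (second EXCEPT first); in Oracle write MINUS for EXCEPT.
DIFF_SQL = """
SELECT 'only in first' AS side, user_id, name FROM
  (SELECT user_id, name FROM first_set EXCEPT SELECT user_id, name FROM second_set) a
UNION ALL
SELECT 'only in second', user_id, name FROM
  (SELECT user_id, name FROM second_set EXCEPT SELECT user_id, name FROM first_set) b
ORDER BY side, user_id, name
"""

rows = conn.execute(DIFF_SQL).fetchall()
print(rows)  # [('only in first', 1, 'WRITE'), ('only in second', 2, 'ADMIN')]
```

Only the columns you actually want to compare (user id and entitlement name) should go into the set difference; the descriptive columns such as department or title can be joined back afterwards.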