Find the distinct with different string values from SQL select - sql

I want to remove duplicate rows from the result of a SELECT query.
The following query produces redundant records. How can I get distinct results?
SELECT E.EMAIL_ID, T.FIRST_NAME, T.LAST_NAME, CY.COUNTRY_ID
FROM PLAYER P
INNER JOIN PLAYERTYPE T ON P.PLAYER_ID = T.PLAYER_ID
INNER JOIN PLAYER_CONTACT C ON T.PLAYER_ID = C.PLAYER_ID
INNER JOIN CONTACT_EMAIL E ON E.CONTACT_ID = C.CONTACT_ID
INNER JOIN COUNTRY_TABLE CY ON P.COUNTRY_ID = CY.COUNTRY_ID
WHERE CY.COUNTRY_CODE='AUS'
AND T.PLAYER_TYPE IN ('NEW', 'EXE')
Current Output:
FIRST_NAME  LAST_NAME  EMAIL_ID          COUNTRY_ID
MARK        CLARKSON   dfgh#gmail.com    04
MARK        CLARKSON   DFGH#GMAIL.com    04
CATH        SPEARS     tygh#yahoo.com    04
FESS        LOPEZ      fgvhb#yandex.com  04
FEXS        LOPEZ      byvg#google.com   04
FEXS        LOPEZ      BYVG#GOOGLE.COM   04
EOVA        SMITH      ghjkjh#sdf.com    04
Expected Output:
FIRST_NAME  LAST_NAME  EMAIL_ID          COUNTRY_ID
MARK        CLARKSON   dfgh#gmail.com    04
CATH        SPEARS     tygh#yahoo.com    04
FESS        LOPEZ      fgvhb#yandex.com  04
FEXS        LOPEZ      BYVG#GOOGLE.COM   04
EOVA        SMITH      ghjkjh#sdf.com    04
Tried
SELECT DISTINCT E.EMAIL_ID, T.FIRST_NAME, T.LAST_NAME, CY.COUNTRY_ID
FROM PLAYER P
INNER JOIN PLAYERTYPE T ON P.PLAYER_ID = T.PLAYER_ID
INNER JOIN PLAYER_CONTACT C ON T.PLAYER_ID = C.PLAYER_ID
INNER JOIN CONTACT_EMAIL E ON E.CONTACT_ID = C.CONTACT_ID
INNER JOIN COUNTRY_TABLE CY ON P.COUNTRY_ID = CY.COUNTRY_ID
WHERE CY.COUNTRY_CODE='AUS'
AND T.PLAYER_TYPE IN ('NEW', 'EXE')
SELECT T.FIRST_NAME, T.LAST_NAME, E.EMAIL_ID, CY.COUNTRY_ID
FROM PLAYER P
INNER JOIN PLAYERTYPE T ON P.PLAYER_ID = T.PLAYER_ID
INNER JOIN PLAYER_CONTACT C ON T.PLAYER_ID = C.PLAYER_ID
INNER JOIN CONTACT_EMAIL E ON E.CONTACT_ID = C.CONTACT_ID
INNER JOIN COUNTRY_TABLE CY ON P.COUNTRY_ID = CY.COUNTRY_ID
WHERE CY.COUNTRY_CODE='AUS'
AND T.PLAYER_TYPE IN ('NEW', 'EXE')
GROUP BY T.FIRST_NAME, T.LAST_NAME, E.EMAIL_ID, CY.COUNTRY_ID

Try using DISTINCT + LOWER:
SELECT DISTINCT T.FIRST_NAME,
T.LAST_NAME,
LOWER(E.EMAIL_ID) AS EMAIL_ID,
CY.COUNTRY_ID
FROM PLAYER P
INNER JOIN PLAYERTYPE T ON P.PLAYER_ID = T.PLAYER_ID
INNER JOIN PLAYER_CONTACT C ON T.PLAYER_ID = C.PLAYER_ID
INNER JOIN CONTACT_EMAIL E ON E.CONTACT_ID = C.CONTACT_ID
INNER JOIN COUNTRY_TABLE CY ON P.COUNTRY_ID = CY.COUNTRY_ID
WHERE CY.COUNTRY_CODE='AUS' AND T.PLAYER_TYPE IN ('NEW', 'EXE')
Output:
FIRST_NAME  LAST_NAME  EMAIL_ID          COUNTRY_ID
MARK        CLARKSON   dfgh#gmail.com    04
CATH        SPEARS     tygh#yahoo.com    04
FESS        LOPEZ      fgvhb#yandex.com  04
FEXS        LOPEZ      byvg#google.com   04
EOVA        SMITH      ghjkjh#sdf.com    04

The repeated rows seem to have either different email addresses or the same email address in different letter case. Try removing the email column, or simply apply LOWER() to the email (or whatever the equivalent is in your database engine).
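A minimal sketch of why the case difference matters, using Python's built-in sqlite3 module as a stand-in engine (an assumption; the original database is not named). The table is a cut-down, hypothetical version of CONTACT_EMAIL. In an engine that compares strings case-sensitively, DISTINCT keeps both spellings, while DISTINCT over LOWER() collapses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced version of CONTACT_EMAIL for illustration only.
conn.execute("CREATE TABLE contact_email (contact_id INTEGER, email_id TEXT)")
conn.executemany(
    "INSERT INTO contact_email VALUES (?, ?)",
    [(1, "dfgh#gmail.com"), (1, "DFGH#GMAIL.com"), (2, "tygh#yahoo.com")],
)

# SQLite compares text case-sensitively by default, so both spellings survive.
raw = conn.execute(
    "SELECT DISTINCT email_id FROM contact_email ORDER BY email_id"
).fetchall()

# Normalising the case first lets DISTINCT treat them as the same value.
normalised = conn.execute(
    "SELECT DISTINCT LOWER(email_id) AS email_id "
    "FROM contact_email ORDER BY email_id"
).fetchall()

print(len(raw), len(normalised))
```

Engines configured with a case-insensitive collation may already collapse such rows without LOWER(), so the behaviour is worth checking on the actual database.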

Using DISTINCT will only mask the problem. It might give you results you like, but it will hide a gremlin that will cause a problem down the road, particularly for performance if you end up dealing in massive data volumes.
I'm guessing that PLAYER_CONTACT has multiple rows per PLAYER_ID. If so, joining a single row from PLAYER to PLAYER_CONTACT will result in multiple output rows. I'm also guessing that CONTACT_EMAIL can contain multiple rows per contact. If so, joining to that will multiply your output by the number of rows per contact. So if you have 5 contacts per player and 5 emails per contact, joining both of these one-to-many tables will result in 25 output rows. That is what causes all the other columns to show duplicate values. Now imagine that it's 1:10,000 and 1:10,000: a single player would return 100,000,000 rows. You can then hide that with DISTINCT, but you'll use a lot of temp space and time writing to and reading from temp to do that sort, and you'll only hide the fact that there's a fundamental problem in your query granularity.
So it's a matter of changing your query approach. Do you really want all possible contacts and all possible emails for those contacts? Or do you want one row per player? If one row per player, you will either have to decide which contact and which email you want to return, or you will need to return some kind of LISTAGG string to show them all in a single row.
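The fan-out described above can be reproduced with a small sketch. It uses Python's sqlite3 module and hypothetical, reduced tables (one player, 5 contacts, 5 emails per contact); the email values are made up. GROUP_CONCAT is SQLite's counterpart to LISTAGG and is used here only to illustrate the "one row per player" option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (player_id INTEGER PRIMARY KEY);
CREATE TABLE player_contact (contact_id INTEGER PRIMARY KEY, player_id INTEGER);
CREATE TABLE contact_email (contact_id INTEGER, email_id TEXT);
INSERT INTO player VALUES (1);
""")

# One player with 5 contacts, each contact with 5 emails (invented data).
for contact_id in range(1, 6):
    conn.execute("INSERT INTO player_contact VALUES (?, 1)", (contact_id,))
    for n in range(5):
        conn.execute(
            "INSERT INTO contact_email VALUES (?, ?)",
            (contact_id, f"c{contact_id}_{n}@example.com"),
        )

# Joining both one-to-many tables multiplies the single player row: 5 * 5.
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM player p
    JOIN player_contact c ON c.player_id = p.player_id
    JOIN contact_email e ON e.contact_id = c.contact_id
""").fetchone()[0]

# One row per player, with every email folded into a single string.
per_player = conn.execute("""
    SELECT p.player_id,
           COUNT(*) AS email_count,
           GROUP_CONCAT(e.email_id, ', ') AS emails
    FROM player p
    JOIN player_contact c ON c.player_id = p.player_id
    JOIN contact_email e ON e.contact_id = c.contact_id
    GROUP BY p.player_id
""").fetchall()

print(joined_rows, len(per_player))
```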

Related

case expression inside this left join?

so I have this left join
LEFT JOIN LATERAL (SELECT d.country FROM db.patient_info d
WHERE d.id IN (SELECT DISTINCT st.category FROM db.surgery_types st, db.surgery_record sr
WHERE sr.id = st.surgery_record_id AND sr.surgery_type_id = m.id)
ORDER BY d.priority, d.country
LIMIT 1
) c ON TRUE
The issue is that sometimes d.country comes back null. How can I add a CASE expression in the left join so that when d.country IS NULL it returns 'USA'?
My results look like this
Patient Name  Surgery Type
Dave          USA
Richard       EU
Ben           EU
Sally         JP
Bob           null
Dicky         null
I want to modify the left join so that it looks more like this
Patient Name  Surgery Type
Dave          USA
Richard       EU
Ben           EU
Sally         JP
Bob           USA
Dicky         USA
Thoughts?
Use coalesce which returns the first non-null value.
-- I have no idea if this lateral join is valid.
LEFT JOIN LATERAL (
SELECT coalesce(d.country, 'USA') AS country
FROM db.patient_info d
WHERE d.id IN (
SELECT DISTINCT st.category
FROM db.surgery_types st, db.surgery_record sr
WHERE sr.id = st.surgery_record_id AND sr.surgery_type_id = m.id
)
ORDER BY d.priority, d.country
LIMIT 1
) c ON TRUE
Note that ORDER BY still sorts on the original d.country, nulls included, so LIMIT 1 might not pick the row you expect. You might want to split this into a CTE.
-- Again, no idea if the lateral join is valid,
-- just showing a technique.
with countries as (
SELECT coalesce(d.country, 'USA') as country, d.priority
FROM db.patient_info d
WHERE d.id IN (
SELECT DISTINCT st.category
FROM db.surgery_types st
JOIN db.surgery_record sr ON sr.id = st.surgery_record_id
-- Don't know what m is
WHERE sr.surgery_type_id = m.id
)
),
-- A single WITH introduces both CTEs, separated by a comma.
first_country as (
select country
from countries
order by priority, country
limit 1
)
select
...
LEFT JOIN first_country on true
Finally, it might be simpler and faster to update the table to set all null countries to USA, and then make the column not null.
Not looking into your business logic and whether a lateral join is needed at all or a scalar subquery in the select list of expressions would be enough, here is my suggestion.
CROSS JOIN LATERAL
(
select coalesce
(
( /* your lateral subquery in the brackets here */),
'USA'
) as country
) as c
You do not need left join anymore. Please note that this will only work if the subquery is scalar.
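A small sketch of the "wrap the whole subquery in COALESCE" idea, written as a scalar subquery in the select list because SQLite (used here through Python's sqlite3 module) has no LATERAL. The patient table and its info_id column are hypothetical stand-ins for the real schema. It covers both cases: a matching row whose country is NULL, and no matching row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables; only the shape matters for the illustration.
CREATE TABLE patient (name TEXT, info_id INTEGER);
CREATE TABLE patient_info (id INTEGER, country TEXT, priority INTEGER);
INSERT INTO patient VALUES ('Dave', 1), ('Sally', 2), ('Bob', 3), ('Dicky', 99);
INSERT INTO patient_info VALUES (1, 'USA', 1), (2, 'JP', 1), (3, NULL, 1);
""")

# Bob matches a row with a NULL country; Dicky matches no row at all.
# COALESCE around the scalar subquery handles both cases.
rows = conn.execute("""
    SELECT p.name,
           COALESCE(
               (SELECT d.country
                FROM patient_info d
                WHERE d.id = p.info_id
                ORDER BY d.priority, d.country
                LIMIT 1),
               'USA'
           ) AS country
    FROM patient p
    ORDER BY p.name
""").fetchall()

for name, country in rows:
    print(name, country)
```

Putting COALESCE only inside the subquery would fix Bob but not Dicky, since a subquery that returns no row yields NULL from the outside.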

How to Include Zero in a COUNT() Aggregate?

I have three tables. I join them and use WHERE with GROUP BY and COUNT, but I cannot get the countries with zero results in the output. I am still lost.
Here is the SQLfiddle
http://sqlfiddle.com/#!4/e330ec/7
CURRENT OUTPUT
(UKD) 3
(EUR) 2
(USA) 2
(CHE) 1
EXPECTED OUTPUT
(UKD) 3
(EUR) 2
(IND) 0
(LAO) 0
(USA) 2
(CHE) 1
You can use a RIGHT JOIN as suggested in another answer or you can reorder your joins and use a LEFT JOIN:
SELECT
C.COUNTRY_CODE,
COUNT(GAME_TYPE)
FROM
COUNTRY_TABLE C
LEFT JOIN PLAYER_TABLE P ON P.COUNTRY_ID = C.COUNTRY_ID
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID
WHERE
G.GAME_TYPE = 'GOLF'
OR G.GAME_TYPE IS NULL
GROUP BY
C.COUNTRY_CODE;
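Note the inclusion of OR G.GAME_TYPE IS NULL in the WHERE clause: if you only have G.GAME_TYPE = 'GOLF', the rows for countries without any match are filtered out after the joins. Be aware that a country whose players all play only other games is still filtered out by this WHERE clause; moving the GAME_TYPE condition into the ON clause avoids that.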
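A hedged sketch of one edge case with the OR ... IS NULL filter, using Python's sqlite3 module and invented rows (the table and column names follow the question, the data does not). IND has a player who only plays tennis, LAO has no players at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_table (country_id INTEGER, country_code TEXT);
CREATE TABLE player_table (player_id INTEGER, country_id INTEGER);
CREATE TABLE player_game_type (player_id INTEGER, game_type TEXT);
INSERT INTO country_table VALUES (1, 'UKD'), (2, 'IND'), (3, 'LAO');
INSERT INTO player_table VALUES (10, 1), (20, 2);   -- LAO has no players
INSERT INTO player_game_type VALUES (10, 'GOLF'), (20, 'TENNIS');
""")

# Filter in WHERE: IND's only joined row has game_type 'TENNIS',
# which is neither 'GOLF' nor NULL, so IND drops out entirely.
where_filter = conn.execute("""
    SELECT c.country_code, COUNT(g.game_type)
    FROM country_table c
    LEFT JOIN player_table p ON p.country_id = c.country_id
    LEFT JOIN player_game_type g ON g.player_id = p.player_id
    WHERE g.game_type = 'GOLF' OR g.game_type IS NULL
    GROUP BY c.country_code
    ORDER BY c.country_code
""").fetchall()

# Filter in ON: non-golf rows simply fail to match, the country row stays.
on_filter = conn.execute("""
    SELECT c.country_code, COUNT(g.game_type)
    FROM country_table c
    LEFT JOIN player_table p ON p.country_id = c.country_id
    LEFT JOIN player_game_type g
           ON g.player_id = p.player_id AND g.game_type = 'GOLF'
    GROUP BY c.country_code
    ORDER BY c.country_code
""").fetchall()

print(where_filter)
print(on_filter)
```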
As an option, you can apply the following steps:
convert the second LEFT JOIN to a RIGHT JOIN (since the desired missing abbreviations are in COUNTRY_TABLE, which sits on the right side)
turn the filtering condition G.GAME_TYPE = 'GOLF' from the WHERE clause into a match condition by moving it into the ON clause
such as
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM PLAYER_TABLE P
LEFT JOIN PLAYER_GAME_TYPE G
ON P.PLAYER_ID = G.PLAYER_ID
RIGHT JOIN COUNTRY_TABLE C
ON P.COUNTRY_ID = C.COUNTRY_ID
AND G.GAME_TYPE = 'GOLF'
GROUP BY C.COUNTRY_CODE;
Simply changing the join order of the tables can solve the problem:
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM COUNTRY_TABLE C -- get all countries
LEFT JOIN PLAYER_TABLE P ON P.COUNTRY_ID = C.COUNTRY_ID -- join all players
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID AND G.GAME_TYPE = 'GOLF' -- join only GOLF games
GROUP BY C.COUNTRY_CODE;
This is your query:
SELECT C.COUNTRY_CODE, COUNT(GAME_TYPE)
FROM PLAYER_TABLE P
LEFT JOIN PLAYER_GAME_TYPE G ON P.PLAYER_ID = G.PLAYER_ID
LEFT JOIN COUNTRY_TABLE C ON P.COUNTRY_ID = C.COUNTRY_ID
WHERE G.GAME_TYPE = 'GOLF'
GROUP BY C.COUNTRY_CODE;
This query seems to try to select all players, even when they are not golfers. This doesn't work, however, as WHERE G.GAME_TYPE = 'GOLF' removes all outer joined rows, so you end up with an inner join (all players who play golf). Finally, you outer join the countries table, which would give you players that don't belong to a country. Is this intended? I don't think so.
What you want is countries, so select from countries. Then properly outer join players and types in order to count them.
SELECT c.country_code, COUNT(g.game_type) as golfers_in_country
FROM country_table c
LEFT JOIN player_table p ON p.country_id = c.country_id
LEFT JOIN player_game_type g ON g.player_id = p.player_id AND g.game_type = 'GOLF'
GROUP BY c.country_code
ORDER BY c.country_code;
You can use a CTE to make this more readable. It is longer, but makes the intention crystal-clear. Structuring one's queries like this helps avoid mistakes.
WITH golfers AS
(
SELECT *
FROM player_table
WHERE player_id IN
(
SELECT player_id
FROM player_game_type
WHERE game_type = 'GOLF'
)
)
SELECT c.country_code, COUNT(g.player_id) as golfers_in_country
FROM country_table c
LEFT JOIN golfers g ON g.country_id = c.country_id
GROUP BY c.country_code
ORDER BY c.country_code;
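One detail the counting queries above rely on is that COUNT(column) skips NULLs while COUNT(*) counts every joined row, including the all-NULL row an outer join produces for a country without golfers. A minimal sketch with Python's sqlite3 module and invented data (table names follow the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country_table (country_id INTEGER, country_code TEXT);
CREATE TABLE player_table (player_id INTEGER, country_id INTEGER);
CREATE TABLE player_game_type (player_id INTEGER, game_type TEXT);
INSERT INTO country_table VALUES (1, 'UKD'), (2, 'IND');
INSERT INTO player_table VALUES (10, 1), (11, 1);   -- IND has no players
INSERT INTO player_game_type VALUES (10, 'GOLF'), (11, 'GOLF');
""")

# IND still produces one joined row (all NULLs on the right-hand side):
# COUNT(*) reports 1 for it, COUNT(g.game_type) reports 0.
counts = conn.execute("""
    SELECT c.country_code,
           COUNT(*)            AS joined_rows,
           COUNT(g.game_type)  AS golfers
    FROM country_table c
    LEFT JOIN player_table p ON p.country_id = c.country_id
    LEFT JOIN player_game_type g
           ON g.player_id = p.player_id AND g.game_type = 'GOLF'
    GROUP BY c.country_code
    ORDER BY c.country_code
""").fetchall()

for code, joined_rows, golfers in counts:
    print(code, joined_rows, golfers)
```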

self join after an inner join

I am trying to find which cities have the same name in different states. The city name and state name are in separate tables (cities and states) and can be inner joined over a separate common column.
select cities.city, states.state
from cities
inner join states on cities.commonid = states.commonid
After inner joining I need to self join to perform a function as follows
select c1.city, c1.state, c2.city, c2.state
from *joined table* c1 join
*joined table* c2
on c1.city = c2.city and c1.state <> c2.state
I am wondering how I can self join a table that is the result of another join in the same query.
The output should look like this:
+----------+--------+--------+--------+
| city1    | state1 | city2  | state2 |
+----------+--------+--------+--------+
| x        | melb   | x      | syd    |
| y        | bris   | y      | ACT    |
+----------+--------+--------+--------+
I assume that the table cities has a column like state_id that references a column state_id in the table states (change the names to the actual names of the columns).
First do a self join for cities with the conditions:
c1.city = c2.city AND c1.state_id < c2.state_id
The < operator makes sure that each pair of cities will be returned only once.
Then join 2 copies of states, because each of them will be used to get the name of the state for each of the 2 cities:
SELECT c1.city city1, s1.state state1,
c2.city city2, s2.state state2
FROM cities c1
INNER JOIN cities c2 ON c1.city = c2.city AND c1.state_id < c2.state_id
INNER JOIN states s1 ON s1.state_id = c1.state_id
INNER JOIN states s2 ON s2.state_id = c2.state_id
ORDER BY city1
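To illustrate the effect of < versus <> in the self join, here is a small sketch using Python's sqlite3 module. The state_id column is the same assumption the answer makes, and the rows are modelled on the sample output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- state_id is an assumed column name, as in the answer above.
CREATE TABLE states (state_id INTEGER, state TEXT);
CREATE TABLE cities (city TEXT, state_id INTEGER);
INSERT INTO states VALUES (1, 'melb'), (2, 'syd'), (3, 'bris'), (4, 'ACT');
INSERT INTO cities VALUES ('x', 1), ('x', 2), ('y', 3), ('y', 4), ('z', 1);
""")

query = """
    SELECT c1.city AS city1, s1.state AS state1,
           c2.city AS city2, s2.state AS state2
    FROM cities c1
    INNER JOIN cities c2 ON c1.city = c2.city AND c1.state_id {op} c2.state_id
    INNER JOIN states s1 ON s1.state_id = c1.state_id
    INNER JOIN states s2 ON s2.state_id = c2.state_id
    ORDER BY city1, state1
"""

# '<' keeps each pair once; '<>' also returns the mirrored pair.
pairs_once = conn.execute(query.format(op="<")).fetchall()
pairs_both_ways = conn.execute(query.format(op="<>")).fetchall()

print(pairs_once)
print(len(pairs_both_ways))
```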
You can wrap your inner join query in a subquery and give it an alias.
Then it will become something like below...
Select c.city, c.state from
(Select city, state from cities inner join states on cities.id = states.id) as c
Now make a self join for c.
I would perform a pre-query of all cities that occur in more than one state. Then join to the states table to see in which states they occur.
select
Duplicates.city,
s.state
from
( select c1.city
from cities c1
group by c1.city
having count(*) > 1 ) Duplicates
JOIN cities c2
on Duplicates.city = c2.city
JOIN states s
on c2.commonid = s.commonid
order by
Duplicates.city,
s.state
By NOT trying to do a cross-tab of just two city/state pairs, you get a single list. If one city name exists in 5 states, how would you plan on showing that? This way you see them all, alphabetized.
I would suggest a CTE:
with cs as (
select c.name as city_name, s.name as state_name
from cities c join
states s
on c.commonid = s.commonid
)
select cs1.*, cs2.*
from cs cs1 join
cs cs2
on cs1.city_name = cs2.city_name and cs1.state_name <> cs2.state_name;

Nested 'Where'?

I have a table named Actor, with only a column for City (CityId). I want to return the number of actors in a particular State (StateId). The catch, however, is that I have separate tables for City, County, and finally State (City has CountyId, County has StateId). How do I do this in a T-SQL query?
I have a solution that involves nested Select statements, something like:
SELECT COUNT(1)
FROM Actor a
WHERE a.CityId IN
(SELECT CityId FROM City WHERE CountyId IN...)
...but is there a more efficient way to do this? Thanks
You can use this query to get your output:
SELECT s.stateId, COUNT(a.ActorId)
FROM Actor a
INNER JOIN City c ON a.cityId = c.cityId
INNER JOIN County con ON c.countyId = con.countyId
INNER JOIN State s ON con.stateId = s.stateId
GROUP BY s.stateId
Use JOINS to query your data.
I am using INNER JOIN here.
Assuming that you have CountryId in your City Table, You can do it following way.
In case you don't have countryId in your City Table you have to apply one more INNER JOIN on State Table.
SELECT COUNT(1) FROM Actor a INNER JOIN
City b ON a.CityId = b.CityId
WHERE b.CountryId IN (...)
You can easily join the tables you have and then use the GROUP BY clause to find the total number of actors for each state.
I have guessed the column names; replace them with the actual names in your database.
SELECT State.StateId,
Count(ActorId) AS Total
FROM ACTOR
INNER JOIN City ON Actor.CityId = City.CityId
INNER JOIN County ON County.CountyId = City.CountyId
INNER JOIN State ON State.StateId = County.StateId
GROUP BY State.StateId
Assuming the relation names, you can do something like this with joins:
select s.ID, s.Name, count(*)
from Actors a
inner join Cities c on c.ID = a.CityID
inner join County cn on cn.ID = c.CountyID
inner join State s on s.ID = cn.StateID
group by s.ID, s.Name
If you only need the StateId you don't even need to join with states, this will do:
select cn.StateID, count(*)
from Actors a
inner join Cities c on c.ID = a.CityID
inner join County cn on cn.ID = c.CountyID
group by cn.StateID
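For the original question (the count for one particular state), the join chain can be combined with a simple filter. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than T-SQL, the table and column names are taken from the question, the data is invented, and count_actors_in_state is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actor (ActorId INTEGER, CityId INTEGER);
CREATE TABLE City (CityId INTEGER, CountyId INTEGER);
CREATE TABLE County (CountyId INTEGER, StateId INTEGER);
INSERT INTO City VALUES (1, 100), (2, 100), (3, 200);
INSERT INTO County VALUES (100, 10), (200, 20);
INSERT INTO Actor VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 3), (6, 3);
""")


def count_actors_in_state(state_id):
    """Count actors whose city lies in a county of the given state."""
    row = conn.execute(
        """
        SELECT COUNT(*)
        FROM Actor a
        INNER JOIN City c ON c.CityId = a.CityId
        INNER JOIN County cn ON cn.CountyId = c.CountyId
        WHERE cn.StateId = ?
        """,
        (state_id,),
    ).fetchone()
    return row[0]


print(count_actors_in_state(10), count_actors_in_state(20))
```

The State table is not joined here because County already carries StateId, in line with the last answer above.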

How to get the values from 2 tables together in SQL

I have 2 tables
1. emp_mst
empcode empname
001 abc
002 def
2. leavetotal
empcode leave
001 10
001 5
001 2
002 12
002 8
Now I am trying to get the empcode and empname from emp_mst
and the total leave days from leavetotal. I have no idea how to do it.
Thanks in advance.
In the FROM clause you can specify multiple tables; this results in a Cartesian product of the tables, so each row in one table is joined with every row in all the other tables. This is of course not what you want: you only want rows with the same empcode to be joined. That needs to be specified in the WHERE clause.
SELECT
MST.EMPCODE,
MST.EMPNAME,
SUM(LTO.LEAVE)
FROM
EMP_MST MST,
LEAVETOTAL LTO
WHERE
MST.EMPCODE = LTO.EMPCODE
GROUP BY
MST.EMPCODE,
MST.EMPNAME
JOIN / GROUP BY solution:
select e.empcode, e.empname, SUM(l.leave)
from emp_mst e
left join leavetotal l on e.empcode = l.empcode
group by e.empcode, e.empname
The LEFT JOIN lists even employees without any leave. (Use a plain JOIN if that is not needed.)
Correlated sub-select solution:
select e.empcode, e.empname,
(select SUM(l.leave) from leavetotal l
where e.empcode = l.empcode)
from emp_mst e
You are looking for a join, and then a group by:
SELECT em.empcode, empname, SUM(leave)
FROM emp_mst em
JOIN leavetotal l ON em.empcode = l.empcode
GROUP BY em.empcode, empname
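A runnable sketch of the difference between the LEFT JOIN and plain JOIN variants, using Python's sqlite3 module and the sample rows from the question. Two things are assumptions made for illustration: the column is renamed to leave_days (LEAVE is a reserved word in some engines), and employee 003 is added to show what happens when someone has no leave rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_mst (empcode TEXT, empname TEXT);
CREATE TABLE leavetotal (empcode TEXT, leave_days INTEGER);
-- 003 is a made-up employee without any leave rows.
INSERT INTO emp_mst VALUES ('001', 'abc'), ('002', 'def'), ('003', 'ghi');
INSERT INTO leavetotal VALUES
    ('001', 10), ('001', 5), ('001', 2), ('002', 12), ('002', 8);
""")

# LEFT JOIN keeps every employee; COALESCE turns the NULL sum into 0.
left_join_totals = conn.execute("""
    SELECT e.empcode, e.empname, COALESCE(SUM(l.leave_days), 0) AS total_leave
    FROM emp_mst e
    LEFT JOIN leavetotal l ON l.empcode = e.empcode
    GROUP BY e.empcode, e.empname
    ORDER BY e.empcode
""").fetchall()

# A plain JOIN silently drops employees that have no leave rows.
inner_join_totals = conn.execute("""
    SELECT e.empcode, e.empname, SUM(l.leave_days) AS total_leave
    FROM emp_mst e
    JOIN leavetotal l ON l.empcode = e.empcode
    GROUP BY e.empcode, e.empname
    ORDER BY e.empcode
""").fetchall()

print(left_join_totals)
print(inner_join_totals)
```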