distinct sql query - sql

I have a simple table with just name and email called name_email.
I am trying to fetch data out of it so that:
If two rows have the same name, but one has an email which is ending with ‘#yahoo.com’ and the other has a different email, then the one with the ‘#yahoo.com’ email should be discarded.
What would be the best way to get this data out?

Okay, I'm not going to get involved in yet another fight with those who say I shouldn't advocate database schema changes (yes, you know who you are :-), but here's how I'd do it.
1/ If you absolutely cannot change the schema, I would solve it with code (either real honest-to-goodness procedural code outside the database or as a stored procedure in whatever language your DBMS permits).
This would check the database for a non-yahoo address for the name and return it if present. If there is none, it would fall back to the yahoo address. If neither exists, it would return an empty data set.
2/ If you can change the schema and you want an SQL query to do the work, here's how I'd do it. Create a separate column in your table called CLASS which is expected to be set to 0 for non-yahoo addresses and 1 for yahoo addresses.
Create insert/update triggers to examine each addition or change of a row, setting the CLASS based on the email address (what it ends in). This guarantees that CLASS will always be set correctly.
When you query your table, order it by name and class, and only select the first row. This will give you the email address in the following preference: non-yahoo, yahoo, empty dataset.
Something like:
select name, email
from tbl
where name = '[name]'
order by name, class
fetch first row only;
If your DBMS doesn't have an equivalent to the DB2 "fetch first row only" clause, you'll probably still need to write code to only process one record.
If you want to process all names but only the specific desired email for that name, a program such as this will suffice (my views on trying to use a relational algebra such as SQL in a procedural way are pretty brutal, so I won't inflict them on you here):
# Get entire table contents sorted in name/class order.
resultSet = execQuery "select name, email from tbl order by name, class"
# Ensure different on first row
lastName = resultSet.value["name"] + "X"
# Process every single row returned.
while not resultSet.endOfFile:
    # Only process the first in each name group (lower-priority classes are ignored).
    if resultSet.value["name"] != lastName:
        processRow resultSet.value["name"] resultSet.value["email"]
    # Store the last name so we can detect next name group.
    lastName = resultSet.value["name"]
    # Advance to the next row; without this the loop never terminates.
    resultSet.moveNext
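The loop above can be sketched in runnable form with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name tbl and the class column follow the answer, while the sample rows and the preferred list are made up for the example.

```python
import sqlite3

# Hypothetical in-memory table with the name/email/class layout described above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT, email TEXT, class INTEGER)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        ("ann", "ann#yahoo.com", 1),    # class 1 = yahoo address
        ("ann", "ann#example.org", 0),  # class 0 = non-yahoo address
        ("bob", "bob#yahoo.com", 1),
    ],
)

preferred = []
last_name = None  # None never equals a real name, so the first row is kept
for name, email in conn.execute(
    "SELECT name, email FROM tbl ORDER BY name, class"
):
    # Keep only the first row of each name group (lowest class wins).
    if name != last_name:
        preferred.append((name, email))
    last_name = name

print(preferred)
```

Iterating over the cursor advances the result set automatically, so there is no separate "move to next row" step to forget.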

select ne.*
from name_email ne
where ne.email not like '%#yahoo.com'
   or not exists (
        select 1
        from name_email
        where name = ne.name
          and email not like '%#yahoo.com'
   )
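To see what this NOT EXISTS query returns, here is a small sketch that runs it against an in-memory SQLite database. The sample rows are invented for illustration, and the inner alias other is an addition for readability; other DBMSs should behave the same way for this query, but that is an assumption rather than something tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_email (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO name_email VALUES (?, ?)",
    [
        ("ann", "ann#yahoo.com"),
        ("ann", "ann#example.org"),
        ("bob", "bob#yahoo.com"),  # bob only has a yahoo address
    ],
)

query = """
SELECT ne.name, ne.email
FROM name_email ne
WHERE ne.email NOT LIKE '%#yahoo.com'
   OR NOT EXISTS (
        SELECT 1
        FROM name_email other
        WHERE other.name = ne.name
          AND other.email NOT LIKE '%#yahoo.com'
   )
ORDER BY ne.name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The yahoo row for ann is dropped because a non-yahoo row exists for the same name, while bob's yahoo row survives because it is the only one he has.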

You could use something like the following to exclude the unwanted email addresses:
SELECT name, email
FROM name_email
WHERE email NOT LIKE '%#yahoo.com' -- % is a wildcard, so joe#yahoo.com and guy#yahoo.com both match the pattern and are excluded
AND name = 'Joe Guy';
Or do it like this to include only the valid email address or domain:
SELECT name, email
FROM name_email
WHERE email LIKE '%#gmail.com'
AND name = 'Joe Guy';
This works well if you know ahead of time what specific names you are querying for and what email addresses or domains you want to exclude or include.
Or, if you don't care which email address you get back but want only one per name, you could use something like this:
SELECT name, MIN(email) AS email
FROM name_email
GROUP BY name;

You could do
SELECT TOP 1 email
FROM name_email
WHERE name = 'Joe Guy'
ORDER BY case when email like '%yahoo.com' then 1 else 0 end
So sort them by *#yahoo.com last and anything else first, and take the first one.
EDIT: sorry, misread the question - you want a list of each name, with only one email, and a preference for non-yahoo emails. Probably can use the above along with a group by, I'll have to rethink it.

To grab all the rows from the database without knowing (or needing to care) what the names are, showing each name once and skipping any email that ends in #yahoo.com:
SELECT name, MIN(email) AS email FROM name_email
WHERE email NOT LIKE '%#yahoo.com'
GROUP BY name;
That returns one row per name and junks the rows with #yahoo.com in the email. Be aware that a name whose only address is a #yahoo.com one is dropped from the result entirely.

Not very pretty, but I believe it should work
select
    ne.name
    ,ne.email
from
    name_email ne
    inner join (
        select
            name
            ,count(*) as emails_per_name
        from
            name_email
        group by name
    ) nec
        on ne.name = nec.name
where
    nec.emails_per_name = 1
    or (nec.emails_per_name > 1 and ne.email not like '%#yahoo.com')
That assumes the duplicate emails are in the yahoo.com domain, as specified in your question; those are excluded whenever there is more than one email per name.

If you are working with SQL Server 2005 or Oracle, you can easily solve your problem with a ranking (analytical) function.
select a.name, a.email
from (select name, email,
             row_number() over (partition by name
                                order by case
                                           when email like '%#yahoo.com' then 1
                                           when email like '%#gmail.com' then 1
                                           -- when ... (other 'generic' email) then 1
                                           else 0
                                         end) as rn
      from name_email) a
where a.rn = 1
By assigning different values to the various generic email names you can even have 'preferences'. As it is written here, if you have both a yahoo and a gmail address, you can't predict which one will be picked up.
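As a sketch of the "preferences" idea, the following runs a ROW_NUMBER() query in SQLite through Python. It assumes SQLite 3.25 or later (the first version with window functions), and the sample rows, the rank values 2/1/0 and the extra tie-breaker on email are illustrative choices, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_email (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO name_email VALUES (?, ?)",
    [
        ("ann", "ann#yahoo.com"),
        ("ann", "ann#gmail.com"),
        ("bob", "bob#yahoo.com"),
        ("cy", "cy#gmail.com"),
        ("cy", "cy#example.org"),
    ],
)

# Lower rank = more preferred: anything else (0), then gmail (1), then yahoo (2).
# The trailing "email" in ORDER BY makes ties within a rank deterministic.
query = """
SELECT name, email
FROM (
    SELECT name, email,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY CASE
                            WHEN email LIKE '%#yahoo.com' THEN 2
                            WHEN email LIKE '%#gmail.com' THEN 1
                            ELSE 0
                        END,
                        email
           ) AS rn
    FROM name_email
) ranked
WHERE rn = 1
ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)
```

With distinct rank values, ann gets the gmail address rather than the yahoo one, which is the predictable outcome the equal-valued version cannot guarantee.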

You could use a UNION for this. Select everything that is not a yahoo.com address, then add the yahoo.com records whose name does not appear in the first list.
SELECT name, email FROM name_email
WHERE email NOT LIKE '%yahoo.com'
UNION
SELECT name, email FROM name_email
WHERE name NOT IN (SELECT name FROM name_email
                   WHERE email NOT LIKE '%yahoo.com')

Related

SQL Query with records starts with and contain with

QUESTION: I need to write a query that finds records that start with a search term as well as records that merely contain it. Each subset should be sorted by a certain column, but the combined result should not be re-sorted as a whole: records starting with the term should come before the records that only contain it.
Example: I have a students table and I want all students whose names contain "Jhon". Students whose first name is "Jhon" should come first, followed by all students whose last name is "Jhon".
What I have tried so far:
I selected all records starting with the search term into temptable_A, then selected all records containing the search term, excluding those already in temptable_A, into temptable_B. Both temp tables held the expected results, so I inserted temptable_B into temptable_A, believing the new records would be appended at the end of the table. They are not: they come back interleaved and sorted, even though I haven't applied any sorting.
I tried the same with a MERGE statement, and it behaves the same way, with no useful result.
I also tried a UNION between the two subqueries (starts with and contains), but the resulting dataset doesn't reliably show the starts-with records on top.
Scenario:
Students Table with column
Id | Student
select * FROM students where name like 'jhon%'
UNION
select * FROM students where name like '%jhon%'
Use an order by. For instance:
select s.*
from students s
where s.name like '%jhon%'
order by charindex('jhon', s.name);
This orders by how far down 'jhon' is in the string. If you just want the ones that start 'jhon' first, you can use a case expression:
order by (case when name like 'jhon%' then 1 else 2 end)
You can do this with a case expression in the ORDER BY clause:
select * from students where Name like '%jhon%' -- select all students containing jhon
order by
    case -- order by place of match first
        when Name like 'jhon%' then 0
        when Name like '%jhon%' then 1
    end,
    Name -- then order by Name
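A quick way to check the CASE-based ordering is to run it against a throwaway SQLite table. This is a sketch with invented student names; it relies on SQLite's LIKE being case-insensitive for ASCII by default, which may differ in other DBMSs or collations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO students (name) VALUES (?)",
    [
        ("Sam Jhon",),
        ("Jhon Smith",),
        ("Alice Jhon",),
        ("Jhon Adams",),
        ("Mary Jones",),  # no match at all
    ],
)

query = """
SELECT name
FROM students
WHERE name LIKE '%jhon%'
ORDER BY CASE WHEN name LIKE 'jhon%' THEN 0 ELSE 1 END,  -- starts-with first
         name                                            -- then by name
"""
names = [row[0] for row in conn.execute(query)]
print(names)
```

A single query with an explicit ORDER BY avoids relying on insertion order or on UNION, neither of which guarantees row order.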

Oracle: Selecting Duplicates with where clause?

I have a question: Why can't I just use the following SQL query to get a list of unique eMail addresses from the PERSON table?
SELECT NOT DISTINCT Email FROM PERSON
I think the easiest and most common way to find the duplicated addresses is to group by the Email column and then keep the groups having a count greater than 1.
SELECT Email, COUNT(Email)
FROM PERSON
GROUP BY Email
HAVING COUNT(Email) > 1;
NOT DISTINCT is not working because it is not a valid expression.
DISTINCT is used to return only different values, so NOT before it is not working as you expect.
You asked to get a list of unique eMail addresses, which is done by:
SELECT DISTINCT(Email) FROM PERSON;
Standard SQL does not have NOT DISTINCT or anything like that.
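The difference between the two queries in this thread can be sketched with a small SQLite example. The rows are made up for illustration; the PERSON table and Email column follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PERSON (Email TEXT)")
conn.executemany(
    "INSERT INTO PERSON VALUES (?)",
    [("a#example.org",), ("a#example.org",), ("b#example.org",)],
)

# Addresses that occur more than once, with their counts.
duplicates = conn.execute(
    "SELECT Email, COUNT(Email) FROM PERSON "
    "GROUP BY Email HAVING COUNT(Email) > 1"
).fetchall()

# Every address exactly once, whether or not it was duplicated.
distinct_emails = conn.execute(
    "SELECT DISTINCT Email FROM PERSON ORDER BY Email"
).fetchall()

print(duplicates)
print(distinct_emails)
```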

How to have multiple NOT LIKE in SQL

I have two tables in a database. Contacts and Filter
In Contacts, I have ID, Name, and Email as the fields.
in Filter, I have ID and code
My objective is to query the entire table and export a list that has been filtered by the items in the Filter table (basically the same as could be achieved with grep -i -Ev ...). Basically, I want to filter out gmail, yahoo, or others.
So if I do
select distinct email from contacts where email not like '%gmail%'
One level of the filter works.
but if I do,
select distinct email from contacts where email not like '%gmail%' or not like '%yahoo%'
then things start to fail.
Before I start integrating the nested select on the Filter table, I cannot even get multiple conditions of the form "field not like X or field not like Y" working.
Any input is greatly appreciated.
sample data
name email
bob bob#gmail.com
joey joey#cisco.com
desired output
joey#cisco.com
UPDATE: Thank you all for your help. Answer to phase I of the question was to change from OR to AND. :)
Phase II: Instead of having a query that grows larger and larger, I would rather use a subquery to determine the items to exclude (meaning that if any of them matches, the row is excluded). I would then add yahoo, gmail, and protonmail as records in the code field of the filters table. With that, would it be
select distinct email from contacts where email not like in (select code from filters)
This fails as it says that the select has multiple records
UPDATE:
SELECT DISTINCT email FROM Contacts WHERE email NOT LIKE (select filters.code from filters where filters.id=4)
works.. but is only pulling one record as the filter. not all of them as filters.
You just need to use AND instead of OR.
SELECT distinct email
FROM
contacts
WHERE
email not like '%gmail%'
AND email not like '%yahoo%'
You can benefit from CHARINDEX like below, I think this will increase the performance of your query. Also, you can use group by instead of distinct, it will also help the performance.
select email
from contacts
where charindex('gmail',email) < 1
and charindex('yahoo',email) < 1
group by email
Two issues:
First, you need the column name in each condition, so add email after the OR:
select distinct email from contacts where email not like '%gmail%' or email not like '%yahoo%'
Second, you probably want both conditions to hold at the same time, so you need AND:
select distinct email from contacts where email not like '%gmail%' AND email not like '%yahoo%'
As others have noted, the correct boolean connector with NOT LIKE is AND, not OR.
You might see the logic using NOT:
select distinct email
from contacts
where not (email like '%gmail%' or email like '%yahoo%');
If your select has two NOT LIKE criteria connected by OR, then nearly everything meets the criteria: a gmail address is not like '%yahoo%' and a yahoo address is not like '%gmail%', so even those two pass. Only an address containing both strings would be rejected. Connecting the conditions with AND instead excludes an address as soon as it matches either pattern. The syntax requires you to provide the field name in both conditions. I feel that this code is easy to read and meets your needs.
SELECT distinct email
FROM contacts
WHERE email not like '%gmail%'
AND email not like '%yahoo%'
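For the "Phase II" part of the question (driving the exclusions from the filters table), one possible approach is a NOT EXISTS subquery that builds the LIKE pattern from each code. The sketch below runs it in SQLite; the sample rows come from the question plus one invented yahoo contact, and the || concatenation operator is an assumption about the target DBMS (SQL Server, for example, would use + or CONCAT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("CREATE TABLE filters (id INTEGER PRIMARY KEY, code TEXT)")
conn.executemany(
    "INSERT INTO contacts (name, email) VALUES (?, ?)",
    [("bob", "bob#gmail.com"), ("joey", "joey#cisco.com"), ("sue", "sue#yahoo.com")],
)
conn.executemany(
    "INSERT INTO filters (code) VALUES (?)",
    [("gmail",), ("yahoo",), ("protonmail",)],
)

# OR between two NOT LIKE conditions: every row passes at least one of them.
or_rows = conn.execute(
    "SELECT DISTINCT email FROM contacts "
    "WHERE email NOT LIKE '%gmail%' OR email NOT LIKE '%yahoo%'"
).fetchall()

# Table-driven exclusion: drop a contact if ANY filter code matches its email.
filtered = conn.execute(
    """
    SELECT DISTINCT c.email
    FROM contacts c
    WHERE NOT EXISTS (
        SELECT 1 FROM filters f
        WHERE c.email LIKE '%' || f.code || '%'
    )
    """
).fetchall()

print(len(or_rows), filtered)
```

Adding a new domain to exclude then only requires a new row in filters, not a longer WHERE clause.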

mysql query, two select

I apologize in advance, because I am not able to explain the problem precisely.
How do I get a value from the user_address table?
How do I pass the user ID into the second "select"?
select id, name, age,
(select address
from user_address
where user_id = ??user.id
ORDER BY address_name
LIMIT 1) AS address
from user
As an addendum to what already exists, you should probably not be relying on the specific order of rows in the database to give some sort of semantic meaning. If you have some better way of identifying which address you're after, you could use a join, such as:
select id, name, age, address
from user
inner join user_address
on user.id=user_address.user_id
where address_type='Home'
(adjust the where clause to whatever)
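For the form the question itself uses, the inner select can refer to the outer row through a table alias (a correlated subquery). The following is a sketch run in SQLite with invented rows; the aliases u and a are additions for clarity, and the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute(
    "CREATE TABLE user_address (user_id INTEGER, address_name TEXT, address TEXT)"
)
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)", [(1, "ann", 30), (2, "bob", 41)]
)
conn.executemany(
    "INSERT INTO user_address VALUES (?, ?, ?)",
    [(1, "work", "9 Office Rd"), (1, "home", "1 Main St")],
)

query = """
SELECT u.id, u.name, u.age,
       (SELECT a.address
        FROM user_address a
        WHERE a.user_id = u.id      -- outer row referenced via its alias
        ORDER BY a.address_name
        LIMIT 1) AS address
FROM user u
ORDER BY u.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Unlike an inner join, this keeps users that have no address at all; their address column is simply NULL.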
I assumed that you want something like the first address for each user
(each user may have several addresses).
Another possibility is that you want the first person who lives at a given address; the solution below doesn't cover that case.
SELECT u.id, u.name, u.age, a.ua AS address
FROM
(
    SELECT * FROM users
) u
INNER JOIN
(
    SELECT userID, MIN(address) AS ua
    FROM user_address
    GROUP BY userID
) a
ON u.id = a.userID
The syntax is for SQL Server; if you use MS Access, you can use First instead of Min.
Hope it helps
Asaf

Return all Fields and Distinct Rows

Whats the best way to do this, when looking for distinct rows?
SELECT DISTINCT name, address
FROM table;
I still want to return all fields, ie address1, city etc but not include them in the DISTINCT row check.
Then you have to decide what to do when there are multiple rows with the same value in the column you want the distinct check applied to, but with different values in the other columns. In that case, how does the query processor know which of the multiple values in the other columns to output? If you don't care, just write a GROUP BY on the distinct column, with Min() or Max() on all the other ones.
EDIT: I agree with the comments from others that as long as you have multiple dependent columns in the same table (e.g., Address1, Address2, City, State), this approach is going to give you mixed (and therefore inconsistent) results. If each column attribute in the table is independent (if the addresses are all in an Address table and only an AddressId is in this table), then it's not as significant an issue, because at least all the columns from a join to the Address table will come from the same address. But you are still getting a more or less random selection of one of the multiple addresses.
This will not mix and match your city, state, etc., and should even give you the last one added:
select b.*
from (
    select max(id) id, Name, Address
    from table
    group by Name, Address) as a
inner join table b
    on a.id = b.id
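Here is a runnable sketch of that max(id) join in SQLite. The table name people and its columns are hypothetical stand-ins (table itself is a reserved word), and the rows mirror the abc/123 example discussed in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        (1, "abc", "123", "def"),
        (2, "abc", "123", "hij"),  # same name/address, later id
        (3, "xyz", "9", "klm"),
    ],
)

query = """
SELECT b.*
FROM (
    SELECT MAX(id) AS id, name, address
    FROM people
    GROUP BY name, address
) a
INNER JOIN people b ON a.id = b.id
ORDER BY b.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because every output row comes from one real row (picked by id), the city always belongs to the address shown, which Max() applied per column cannot guarantee.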
When you have a mixed set of fields, some of which you want to be DISTINCT and others that you just want to appear, you require an aggregate query rather than DISTINCT. DISTINCT is only for returning single copies of identical fieldsets. Something like this might work:
SELECT name,
GROUP_CONCAT(DISTINCT address) AS addresses,
GROUP_CONCAT(DISTINCT city) AS cities
FROM the_table
GROUP BY name;
The above will get one row for each name. addresses contains a comma-delimited string of all the addresses for that name, each listed once. cities does the same for all the cities.
However, I don't see how the results of this query are going to be useful. It will be impossible to tell which address belongs to which city.
If, as is often the case, you are trying to create a query that will output rows in the format you require for presentation, you're much better off accepting multiple rows and then processing the query results in your application layer.
I don't think you can do this because it doesn't really make sense.
name | address | city | etc...
abc | 123 | def | ...
abc | 123 | hij | ...
if you were to include city, but not have it as part of the distinct clause, the value of city would be unpredictable unless you did something like Max(city).
You can do
SELECT Name, Address, Max(Address1), Max(City)
FROM table
GROUP BY Name, Address
Use @JBrooks's answer below. He has a better answer.
If you're using SQL Server 2005 or above, you can use the ROW_NUMBER function. This will get you the row with the lowest ID for each name. If you want to 'group' by more columns, add them to the PARTITION BY clause of ROW_NUMBER.
SELECT id, Name, Address, ...
FROM (select id, Name, Address, ...,
             ROW_NUMBER() OVER (PARTITION BY Name ORDER BY id) AS RowNo
      from table) sub
WHERE RowNo = 1