How do I search for a subset in a last name field? - sql-server-2000

I have a set of students at a high school. The counselors want to divide the students up by last name. Here is the break down:
Counselor 1: A to F
Counselor 2: G to Hr
Counselor 3: Hs to O
Counselor 4: P - Z
The first one is easy, I just do a:
where last_name like '[A-F]%'
but the second counselor is giving me grief, because if I do:
where last_name like '[G-Hr]%'
...I get all students with last names of G, H, and R. What is the best way to get what I want for counselor's 2 and 3?

For counselor 2, try:
WHERE last_name LIKE 'G%' OR last_name LIKE 'H[a-r]%'
For counselor 3, try:
WHERE last_name LIKE 'H[s-z]%' OR last_name LIKE '[I-O]%'

Related

What are the cases whereby EXCEPT and DISTINCT are different?

Looking into my notes for introduction to databases, I have stumbled upon a case that i do not understand (Between except and distinct).
It says so in my notes that:
The two queries below have the same results, but this will not be the case in general.
First query:
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan'
EXCEPT
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.last_name LIKE 'D%';
Second query:
Select DISTINCT c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan' AND NOT (c.last_name LIKE 'D%');
Could anyone provide me some insights as to what are cases whereby the results would differ?
Number 1 selects first, last & email from customers who are from Japan and whose last names do not start with D.
Number 2 selects first, last & email, where no two records have all 3 fields the same, where the customers are from Singapore and their last names do not begin with D.
I suppose I can imagine a table where these would yield the same results, but I don't think it would ever appear except in very contrived circumstances.
Joe Smith jsmith#abc.com Japan
Joe Smith jsmith#abc.com Singapore
Would be one of them. Both queries would yield Joe Smith jsmith#abc.com. Another case would be if no-one was from either country or everyone's last name started with D, then they would both yield nothing.
None of this is tested, and the EXCEPT statement is something I've read about but never had occasion to use.
The first is looking at Japan, the second at Singapore, so I don't see why these would generally -- or specifically -- return the same data.
Even if the countries were the same you have another issue with NULL values. So, if your data looks like this:
first_name last_name email country
xxx NULL a Japan
Your first query would return the row. The second would not.

SQL Newbie Stuck on Manager Flag Field Logic

This is my first month of being a Data Analyst, and I can't seem to find an answer that is specific enough to my problem here to help. I am having trouble getting this manager flag field to work and I think I'm getting confused by the joins.
The goal is to match the EMPLIDs of JOB_VW A to see if they exist in the Supervisor_ID column of Supervisor_VW K. Supervisor_VW K has ALL employees in the company (including supervisors) in the K.EMPLID column. Someone can be in both the supervisor ID column and EMPL column at the same time, but in different rows. The SUP ID is the EMPL ID of someone in a manager position.
For example:
Supervisor_VW K
EMPL ID EMPL NAME SUP ID SUP NAME
1 Smith, John 2 William, Mark
5 Jarvis, John 2 William, Mark
2 William, Mark 4 Rover, Spot
The results I am getting are as such
QUERY RESULTS
EMPLID EMPL NAME MANAGER FLAG
2 William, Mark Y
2 William, Mark Y
4 Rover, Spot Y
1 Smith, John N
5 Jarvis, John N
My current code is as follows:
SELECT CASE WHEN K.Supervisor_Id IS NULL THEN
'N'
ELSE
'Y'
END AS "ManagerFlag".....
FROM (SELECT K.*
FROM SUPERVISOR_VW K, JOB_VW A
WHERE K.SUPERVISOR_ID (+) = A.EMPLID
AND EXISTS (SELECT K1.EMPLID, K1.SUPERVISOR_ID
FROM SUPERVISOR_VW K1
WHERE K1.EMPLID IN K1.Supervisor_Id)
) K
So it seems that I am getting duplicate supervisor rows for every single employee that they supervise. If they supervise only one person, I get a singular row. If they supervise 20, I get 20 duplicate rows of that supervisor. HOWEVER, their employee that they supervise shows up in the table without issue and is properly labeled as N, no duplicates.
If anyone could help, please do! I appreciate you reading through my work, let me know if more info is required.
The usual way to achieve the desired result is like this:
select emplid, emplname,
case when emplid in (select supid from supervisor_vw) then 'Y'
else 'N' end as managerflag
from supervisor_vw;

Querying for swapped columns in SQL database

We've had a few cases of people entering in first names where last names should be and vice versa. So I'm trying to come up with a SQL search to match the swapped columns. For example, someone may have entered the record as first_name = Smith, last_name = John by accident. Later, another person may see that John Smith is not in the database and enter a new user as first_name = John, last_name = Smith, when in fact it is the same person.
I used this query to help narrow my search:
SELECT person_id, first_name, last_name
FROM people
WHERE first_name IN (
SELECT last_name FROM people
) AND last_name IN (
SELECT first_name FROM people
);
But if we have people named John Allen, Allen Smith, and Smith John, they would all be returned even though none of those are actually duplicates. In this case, it's actually good enough that I can see the duplicates in my particular data set, but I'm wondering if there's a more precise way to do this.
I would do a self join like this:
SELECT p1.person_id, p1.first_name, p1.last_name
FROM people p1
join people p2 on p1.first_name = p2.last_name and p1.last_name = p2.first_name
To also find typos on names I recommend this:
SELECT p1.person_id, p1.first_name, p1.last_name
FROM people p1
join people p2 on soundex(p1.first_name) = soundex(p2.last_name) and
soundex(p1.last_name) = soundex(p2.first_name)
soundex is a neat function that "hashes" words in a way that two words that sound the same get the same hash. This means Anne and Ann will have the same soundex. So if you had an Anne Smith and a Smith Ann the query above would find them as a match.
Interesting. This is a problem that I cover in Data Analysis Using SQL and Excel (note: I only very rarely mention books in my answers or comments).
The idea is to summarize the data to get a likelihood of a mismatch. So, look at the number of times a name appears as a first name and as a last name and then combine these. So:
with names as (
select first_name as name, 1.0 as isf, 0.0 as isl
from people
union all
select last_name, 0, 1
from people
),
nl as (
select name, sum(isf) as numf, sum(isl) as numl,
avg(isf) as p_f, avg(isl) as p_l
from names
group by name
)
select p.*
from people p join
nl nlf
on p.first_name = nlf.name join
nl nll
on p.last_name = nll.name
order by (coalesce(nlf.p_l, 0) + coalesce(nll.p_f, 0));
This orders the records by a measure of mismatch of the names -- the sum of the probabilities of the first name used by a last name and a last name used as a first name.

Trying to find duplication in records where address is different only in the one field and only by a certain number

I have a table of listings that has NAP fields and I wanted to find duplication within it - specifically where everything is the same except the house number (within 2 or 3 digits).
My table looks something like this:
Name Housenumber Streetname Streettype City State Zip
1 36 Smith St Norwalk CT 6851
2 38 Smith St Norwalk CT 6851
3 1 Kennedy Ave Campbell CA 95008
4 4 Kennedy Ave Campbell CA 95008
I was wondering how to set up a qry to find records like these.
I've tried a few things but can't figure out how to do it - any help would be appreciated.
Thanks
Are you looking to find something that shows the amount of these rows you have like this?
SELECT
StreenName,
City,
State,
Zip,
COUNT(*)
FROM YourTable
group by StreenName, City, State, Zip
HAVING COUNT(*) >1
Or maybe trying to find all of the rows that have the same street, city, state, and zip?
SELECT
A.HouseNumber,
A.StreetName,
A.City,
A.State,
A.Zip
FROM YourTable as A
INNER JOIN YourTable as B
ON A.StreetName = B.StreetName
AND A.City = B.City
AND A.State = B.State
AND A.Zip = B.Zip
AND A.HouseNumber <> B.HouseNumber
Here is one way to do it. You'll need a unique ID for the table to run this, as you wouldn't want to select the exact same person if theyre the only one there. This'll just spit out all the results where there is at least one duplicate.
Edit: Woops, just realized in comments it says varchar for the street number...hmm. So you could just run a cast on it. The OP never said anything about house numbers in varchar or being letters and numbers in the original post. As for letters in the street number field, I've been a third party shipping provider for 2 yrs in the past and I have never seen one; with the exception of an apt., which would be a diff field. Its just as likely that someone put varchar there for some other reason(leading 0's), or for no reason. Of oourse there could be, but no way of knowing whats in the field without response from OP. To run cast to int its the same except this for each instance: Cast(mt.HouseNumber as int)
select *
from MyTable mt
where exists (select 1
from MyTable mt2
where mt.name = mt2.name
and mt.street = mt2.street
and mt.state = mt2.state
and mt.city = mt2.city
and mt2.HouseNumber between (mt.HouseNumber -3) and (mt.HouseNumber +3)
and mt.UID != mt2.UID
)
order by mt.state, mt.city, mt.street
;
Not sure how to run the -3 +3 if there are letters involed...unless you know excatly where they are and you can just simply cut them out then cast.

Selecting values from one column based on a different columns value

Having trouble stating my problem succinctly here, so I'll just give an example.
Let's say I have a DB2 table about Students:
Name Class Grade
Billy J Econ A
Sarah S Maths B
Greg X Computes A-
Billy J Maths D
Greg X Maths C+
And I want to retrieve those students that are in both Econ and Maths, and display the information thusly:
Name Maths Grade Econ Grade
Billy J D A
How in the world can I accomplish this?
This solution will solve the problem for the two classes you named:
SELECT Name, Math.Grade AS MathsGrade, Econ.Grade AS EconGrade
FROM Students Math INNER JOIN Students Econ ON Math.Name = Econ.Name
WHERE Math.Class = 'Maths' AND Econ.Class = 'Econ'
The only thing that this solution doesn't do is include the spaces in your derived column names. You can do that by writing Maths Grade and Econ Grade in whatever characters DB2 uses for identifier quotes.
To be included students must have both a Maths and an Econ grade.
SELECT * from Students
where id in
(SELECT id from Students where Class = 'Econ')
AND id in
(SELECT id from Students where Class = 'Math');