matching names with SSN in a columnar table - sql

thanks in advance for any advice that you might have.
This is my first time attempting to query from a columnar database, so I'm a bit uncertain as to how to write a query that gives me the results that I'm looking for.
The table ("census_data") that I'm querying from has the following types of values (41 rows total):
plan_id  ssn_key    field  value
1        111111111  DOB    1732-02-22
1        111111111  DOR    1830-11-02
1        111111111  FNAME  GEORGE
1        111111111  LNAME  WASHINGTON
1        863283322  DOR    2020-03-22
As an FYI, in some cases, we might only have someone's SSN and DOB, but not their FNAME, LNAME, DOR (date of retirement), etc.
We're working with dummy data now and attempting to have queries in place for when we begin working with a large-scale data set.
We know that in some cases in the actual data set, there will be illogical data, such as a Date of Retirement ('DOR') that occurs in the future (assuming for our rules, a 'DOR' value must occur in the past in order for it to be valid).
We've written some queries that have given us the results that we're looking for, such as:
1) Give us the birthdays of all people with FNAME = 'GEORGE' and LNAME = 'WASHINGTON'
select [value]
from [testdb3].[dbo].[census_data]
where ssn_key in (select ssn_key
                  from census_data
                  where field = 'LNAME'
                    and [value] = 'WASHINGTON'
                    and ssn_key in (select ssn_key
                                    from census_data
                                    where field = 'FNAME'
                                      and [value] = 'GEORGE'))
  and field = 'DOB'
2) Give us all SSNs of people with a Date of Retirement after today
select [plan_id], [ssn_key], [field], [value]
from [testdb3].[dbo].[census_data] as cd
where cd.field = 'DOR' and cd.[value] > GETDATE()
As a reminder, the SSN values are in the 2nd column in our table, whereas the values for DOB, FNAME, DOR, LNAME, etc. are all in the 4th column of our table.
And here's where we're stumped. We're trying to write a query that gives us the first name of anyone with a date of retirement greater than today. We've spent a few hours trying to come up with something that works and have come up empty so far. If anyone has any thoughts on what the code would be, please let me know, I would greatly appreciate it. Thank you.
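One way to approach this is a self-join: treat the table once as the source of the 'DOR' rows and once as the source of the 'FNAME' rows, and link the two on ssn_key. Below is a minimal sketch, run against SQLite through Python's sqlite3 module rather than SQL Server, so the date comparison relies on ISO yyyy-mm-dd strings sorting chronologically. The 'MARTHA' row and the function name first_names_retiring_after are made up for illustration, and "today" is passed in as a parameter instead of calling GETDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE census_data (plan_id INTEGER, ssn_key TEXT, field TEXT, value TEXT)"
)
conn.executemany(
    "INSERT INTO census_data VALUES (?, ?, ?, ?)",
    [
        (1, "111111111", "DOB", "1732-02-22"),
        (1, "111111111", "DOR", "1830-11-02"),
        (1, "111111111", "FNAME", "GEORGE"),
        (1, "111111111", "LNAME", "WASHINGTON"),
        (1, "863283322", "DOR", "2020-03-22"),
        (1, "863283322", "FNAME", "MARTHA"),  # made-up row, not in the post
    ],
)


def first_names_retiring_after(conn, today):
    """Return first names of people whose DOR is later than `today` (yyyy-mm-dd)."""
    rows = conn.execute(
        """
        SELECT fn.value
        FROM census_data AS dor
        JOIN census_data AS fn
          ON fn.ssn_key = dor.ssn_key
         AND fn.plan_id = dor.plan_id
         AND fn.field = 'FNAME'
        WHERE dor.field = 'DOR'
          AND dor.value > ?
        ORDER BY fn.value
        """,
        (today,),
    ).fetchall()
    return [r[0] for r in rows]


# With a pretend "today" of 2000-01-01, only the 2020 retirement is in the future.
print(first_names_retiring_after(conn, "2000-01-01"))
```

The inner join drops anyone who has a 'DOR' row but no 'FNAME' row; a LEFT JOIN would keep them with a NULL first name. On SQL Server the same shape should work with the comparison written against GETDATE(), as in query 2 above.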

Related

What are the cases whereby EXCEPT and DISTINCT are different?

While looking through my notes for an introduction to databases, I stumbled upon a case that I do not understand (the difference between EXCEPT and DISTINCT).
My notes say:
The two queries below have the same results, but this will not be the case in general.
First query:
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan'
EXCEPT
Select c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.last_name LIKE 'D%';
Second query:
Select DISTINCT c.first_name,c.last_name,c.email
FROM customers as c
WHERE c.country = 'Japan' AND NOT (c.last_name LIKE 'D%');
Could anyone give me some insight into the cases where the results would differ?
Number 1 selects first, last & email from customers who are from Japan and whose last names do not start with D.
Number 2 selects first, last & email, where no two records have all 3 fields the same, where the customers are from Singapore and their last names do not begin with D.
I suppose I can imagine a table where these would yield the same results, but I don't think it would ever appear except in very contrived circumstances.
Joe Smith jsmith@abc.com Japan
Joe Smith jsmith@abc.com Singapore
Would be one of them. Both queries would yield Joe Smith jsmith@abc.com. Another case would be if no-one was from either country or everyone's last name started with D, then they would both yield nothing.
None of this is tested, and the EXCEPT statement is something I've read about but never had occasion to use.
The first is looking at Japan, the second at Singapore, so I don't see why these would generally -- or specifically -- return the same data.
Even if the countries were the same you have another issue with NULL values. So, if your data looks like this:
first_name last_name email country
xxx NULL a Japan
Your first query would return the row. The second would not.
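The NULL case can be checked with a small experiment. This is a sketch using SQLite through Python's sqlite3 module (an assumption; the question does not say which database is in use), with the single row from the example above. A NULL last_name makes LIKE evaluate to NULL, so the NOT (...) filter in the second query is not true and the row is dropped, while the EXCEPT version keeps it because the row never appears in the set being subtracted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (first_name TEXT, last_name TEXT, email TEXT, country TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('xxx', NULL, 'a', 'Japan')")

# First query from the question: Japan customers EXCEPT those with a 'D%' last name.
except_rows = conn.execute(
    """
    SELECT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.country = 'Japan'
    EXCEPT
    SELECT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.last_name LIKE 'D%'
    """
).fetchall()

# Second query from the question: DISTINCT with the negated LIKE in the WHERE clause.
distinct_rows = conn.execute(
    """
    SELECT DISTINCT c.first_name, c.last_name, c.email
    FROM customers AS c
    WHERE c.country = 'Japan' AND NOT (c.last_name LIKE 'D%')
    """
).fetchall()

print(except_rows)    # the NULL-last-name row survives
print(distinct_rows)  # the NULL-last-name row is filtered out
```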

SQL - Selecting Records with an odd number of a given attribute

I'm just brushing up on some SQL - in other words, I'm really rusty - and am a bit stuck at the moment. It's probably something trivial, but we'll see.
I'd like to select all people that possess an odd number of a certain attribute that isn't an integer (in this example, TransactionType). So, for example, take the following made-up test data, where these people are buying a car or making some similarly big purchase.
Name TransactionType Date
John Buy 5/1
John Cancel 5/1
John Buy 5/2
Joseph Buy 5/25
Joseph Cancel 5/25
Tanya Buy 5/28
I would like it to return the people who had an odd number of transactions; in other words, they ended up purchasing the item. So, in this case, John and Tanya would be selected and Joseph would not.
I know I can use the modulus operator here, but I'm a bit lost as to how to use it correctly.
I thought of using
count(TransactionType) % 2 != 0
in the where clause but that's obviously a no-go. Any pointers in the right direction would be very helpful. Let me know if this is unclear, and thanks!
You are close. You need a having clause instead of a where clause.
select Name
from [Transactions]
group by Name
having count(TransactionType) % 2 != 0
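To see the HAVING version work on the sample data from the question, here is a small sketch using SQLite through Python's sqlite3 module (an assumption; the question does not name a database). The table name transactions and the lower-case column names are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (name TEXT, transaction_type TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("John", "Buy", "5/1"),
        ("John", "Cancel", "5/1"),
        ("John", "Buy", "5/2"),
        ("Joseph", "Buy", "5/25"),
        ("Joseph", "Cancel", "5/25"),
        ("Tanya", "Buy", "5/28"),
    ],
)

# WHERE filters rows before grouping, so the aggregate test has to go in HAVING.
odd_buyers = [
    row[0]
    for row in conn.execute(
        """
        SELECT name
        FROM transactions
        GROUP BY name
        HAVING COUNT(transaction_type) % 2 != 0
        ORDER BY name
        """
    )
]

print(odd_buyers)  # John has 3 rows, Tanya has 1, Joseph has 2
```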
Wouldn't you be better off getting the latest status by transaction date, rather than relying on a count of TransactionType to determine it?
Something like this:
SELECT b.Name, b.TransactionType, b.[Date]
FROM (
    SELECT t1.Name, MAX(t1.[Date]) AS latestDate
    FROM [Transactions] t1
    GROUP BY t1.Name
) a
INNER JOIN [Transactions] b ON b.Name = a.Name AND a.latestDate = b.[Date]
WHERE b.TransactionType = 'Buy'
Assuming your dates are valid dates with times included, this should work.
If you only store the date portion, the max date would be the same for people who Buy and Cancel on the same date, so the query would return extra rows, some of them incorrect.

Trying to find duplication in records where address is different only in the one field and only by a certain number

I have a table of listings that has NAP fields and I wanted to find duplication within it - specifically where everything is the same except the house number (within 2 or 3 digits).
My table looks something like this:
Name  Housenumber  Streetname  Streettype  City      State  Zip
1     36           Smith       St          Norwalk   CT     6851
2     38           Smith       St          Norwalk   CT     6851
3     1            Kennedy     Ave         Campbell  CA     95008
4     4            Kennedy     Ave         Campbell  CA     95008
I was wondering how to set up a query to find records like these.
I've tried a few things but can't figure out how to do it - any help would be appreciated.
Thanks
Are you looking for something that shows how many of these rows you have, like this?
SELECT
StreetName,
City,
State,
Zip,
COUNT(*)
FROM YourTable
group by StreetName, City, State, Zip
HAVING COUNT(*) > 1
Or maybe trying to find all of the rows that have the same street, city, state, and zip?
SELECT
A.HouseNumber,
A.StreetName,
A.City,
A.State,
A.Zip
FROM YourTable as A
INNER JOIN YourTable as B
ON A.StreetName = B.StreetName
AND A.City = B.City
AND A.State = B.State
AND A.Zip = B.Zip
AND A.HouseNumber <> B.HouseNumber
Here is one way to do it. You'll need a unique ID for the table to run this, as you wouldn't want to select the exact same person if they're the only one there. This will just return all the rows where there is at least one duplicate.
Edit: Whoops, just realized that the comments say the street number is varchar... hmm. So you could just run a cast on it. The OP never said anything in the original post about house numbers being varchar or containing both letters and numbers. As for letters in the street number field, I was a third-party shipping provider for 2 years and never saw one, with the exception of an apartment, which would be a different field. It's just as likely that someone used varchar there for some other reason (leading 0's), or for no reason. Of course there could be letters, but there's no way of knowing what's in the field without a response from the OP. To cast to int, the query is the same except for this at each instance: Cast(mt.HouseNumber as int)
select *
from MyTable mt
where exists (select 1
from MyTable mt2
where mt.name = mt2.name
and mt.street = mt2.street
and mt.state = mt2.state
and mt.city = mt2.city
and mt2.HouseNumber between (mt.HouseNumber -3) and (mt.HouseNumber +3)
and mt.UID != mt2.UID
)
order by mt.state, mt.city, mt.street
;
Not sure how to run the -3/+3 check if there are letters involved, unless you know exactly where they are and can simply cut them out and then cast.
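For what it's worth, here is a minimal sketch of the EXISTS approach with the cast applied, run on the sample rows from the question using SQLite through Python's sqlite3 module (an assumption; the thread does not name the database). It treats the first column of the sample as a unique id, stores the house number as text to mimic varchar, and adds one made-up row (house number 90) to show a non-match. It assumes the house numbers are purely numeric.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE listings (
        id INTEGER, housenumber TEXT, streetname TEXT,
        streettype TEXT, city TEXT, state TEXT, zip TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO listings VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "36", "Smith", "St", "Norwalk", "CT", "6851"),
        (2, "38", "Smith", "St", "Norwalk", "CT", "6851"),
        (3, "1", "Kennedy", "Ave", "Campbell", "CA", "95008"),
        (4, "4", "Kennedy", "Ave", "Campbell", "CA", "95008"),
        (5, "90", "Smith", "St", "Norwalk", "CT", "6851"),  # made-up non-match
    ],
)

# A row is a likely duplicate if another row shares the rest of the address
# and its house number is within 3 of this one.
near_duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT a.id
        FROM listings AS a
        WHERE EXISTS (
            SELECT 1
            FROM listings AS b
            WHERE b.id <> a.id
              AND b.streetname = a.streetname
              AND b.streettype = a.streettype
              AND b.city = a.city
              AND b.state = a.state
              AND b.zip = a.zip
              AND ABS(CAST(b.housenumber AS INTEGER)
                      - CAST(a.housenumber AS INTEGER)) <= 3
        )
        ORDER BY a.id
        """
    )
]

print(near_duplicate_ids)  # rows 1-4 pair up; the made-up row 5 does not
```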

Logically merging 4 columns of the same information

I'm querying 3 different databases (4 total fields) for their "username" field given a particular machine name in our environment: SCCM, McAfee EPO, and ActiveDirectory.
The four columns are SCCM_TOP, SCCM_LAST, EPO, AD
Some of the tuples I get look like:
JOE, JOE, ADMINISTRATOR, JOE
or
JOE, SARAH, JOE, JOE
or
NULL, NULL, JOE, JOE
or
NULL, NULL, JOE, SARAH
The last example of which is the most difficult to code against.
I'm writing a CASE statement to help merge the information in an additive way to give one final column of the "best guess". At the moment, I'm weighing the most valid username based on another column, which is "age of the record" from each database.
CASE
WHEN ePO_Age <= CT_AGE AND NOT ePO_UN IS NULL THEN ePO_UN
WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_LAST_UN) IS NULL THEN SCCM_LAST_UN
WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_TOP_UN) IS NULL THEN SCCM_TOP_UN
WHEN NOT (AD_UN) IS NULL THEN AD_UN
ELSE NULL
END AS BestName,
But there has to be a better way to combine these records into one. My next step is to weigh the "average age" and then pick the username from there, discarding "Administrator".
Any thoughts or tricks?
You could benefit a little from the COALESCE function to get the first NON-NULL value and do something like:
COALESCE(CASE WHEN ePO_Age<=CT_AGE THEN ePO_UN END,
CASE WHEN SCCM_AGE IS NOT NULL THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END,
AD_UN) AS BestName
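As a rough check of how that COALESCE expression behaves, here is a sketch using SQLite through Python's sqlite3 module (an assumption; the question is about another database, but COALESCE and CASE behave the same way here). The table name machines, the machine names, and all the sample values are invented for illustration; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE machines (
        machine TEXT, ePO_Age INTEGER, CT_AGE INTEGER, SCCM_AGE INTEGER,
        ePO_UN TEXT, SCCM_LAST_UN TEXT, SCCM_TOP_UN TEXT, AD_UN TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO machines VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        # ePO record is fresh enough, so its username wins.
        ("PC1", 1, 5, 3, "JOE", "SARAH", "JOE", "JOE"),
        # ePO record is too old, so fall through to SCCM_LAST_UN.
        ("PC2", 9, 5, 3, "ADMINISTRATOR", "SARAH", "JOE", "JOE"),
        # No SCCM data at all, so fall through to AD_UN.
        ("PC3", 9, 5, None, "JOE", None, None, "SARAH"),
        # Nothing usable anywhere, so the result is NULL.
        ("PC4", None, 5, None, None, None, None, None),
    ],
)

# A CASE with no ELSE yields NULL when its condition is false or NULL,
# which lets COALESCE move on to the next candidate.
best_names = dict(
    conn.execute(
        """
        SELECT machine,
               COALESCE(CASE WHEN ePO_Age <= CT_AGE THEN ePO_UN END,
                        CASE WHEN SCCM_AGE IS NOT NULL
                             THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END,
                        AD_UN) AS BestName
        FROM machines
        """
    )
)

print(best_names)
```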
If you just want to get the most recent UserName that isn't null, try using UNION to combine the results from each table.
SELECT TOP 1 qry.UserName
FROM(
SELECT UserName, CreateDate
FROM UserNames_1
UNION ALL
SELECT UserName, CreateDate
FROM UserNames_2
UNION ALL
SELECT UserName, CreateDate
FROM UserNames_3
) AS qry
WHERE qry.UserName IS NOT NULL
ORDER BY qry.CreateDate DESC
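Here is the same idea as a runnable sketch, using SQLite through Python's sqlite3 module (an assumption), so TOP 1 becomes LIMIT 1. The three tables and their rows are invented placeholders, and CreateDate is stored as an ISO yyyy-mm-dd string so that it sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("UserNames_1", "UserNames_2", "UserNames_3"):
    conn.execute(f"CREATE TABLE {table} (UserName TEXT, CreateDate TEXT)")

conn.execute("INSERT INTO UserNames_1 VALUES ('JOE', '2024-01-05')")
conn.executemany(
    "INSERT INTO UserNames_2 VALUES (?, ?)",
    [(None, "2024-03-01"), ("ADMINISTRATOR", "2023-12-01")],
)
conn.execute("INSERT INTO UserNames_3 VALUES ('SARAH', '2024-02-10')")

# Stack all sources, drop NULL usernames, and keep the newest record.
latest = conn.execute(
    """
    SELECT qry.UserName
    FROM (
        SELECT UserName, CreateDate FROM UserNames_1
        UNION ALL
        SELECT UserName, CreateDate FROM UserNames_2
        UNION ALL
        SELECT UserName, CreateDate FROM UserNames_3
    ) AS qry
    WHERE qry.UserName IS NOT NULL
    ORDER BY qry.CreateDate DESC
    LIMIT 1
    """
).fetchone()[0]

print(latest)  # the newest row has a NULL username, so the next newest wins
```

Adding a condition such as qry.UserName <> 'ADMINISTRATOR' to the WHERE clause would be one way to discard generic accounts as well.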

SQL Server - copy data across tables , but copy the data only when it match with a specific column name

For example, I have these two tables:
dbo.fc_states
StateId  Name
6316     Alberta
6317     British Columbia
and dbo.fc_Query
Name            StatesName        StateId
Abbotsford      Quebec            NULL
Abee            Alberta           NULL
100 Mile House  British Columbia  NULL
OK, pretty straightforward: how do I copy the StateId over from fc_states to fc_Query, matching on StatesName? Let's say the result would be
Name            StatesName        StateId
Abee            Alberta           6316
100 Mile House  British Columbia  6317
Thanks. The state name column in both tables is of type text.
How about:
update fc_Query set StateId =
(select StateId from fc_states where fc_states.Name = fc_Query.StatesName)
That should give you the result you're looking for.
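A quick way to try the correlated-subquery update on the sample rows is the sketch below, which uses SQLite through Python's sqlite3 module (an assumption; the question is about SQL Server, but this UPDATE form is the same). Table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fc_states (StateId INTEGER, Name TEXT)")
conn.execute("CREATE TABLE fc_Query (Name TEXT, StatesName TEXT, StateId INTEGER)")
conn.executemany(
    "INSERT INTO fc_states VALUES (?, ?)",
    [(6316, "Alberta"), (6317, "British Columbia")],
)
conn.executemany(
    "INSERT INTO fc_Query VALUES (?, ?, ?)",
    [
        ("Abbotsford", "Quebec", None),
        ("Abee", "Alberta", None),
        ("100 Mile House", "British Columbia", None),
    ],
)

# For each fc_Query row, look up the StateId whose state name matches.
conn.execute(
    """
    UPDATE fc_Query
    SET StateId = (SELECT StateId
                   FROM fc_states
                   WHERE fc_states.Name = fc_Query.StatesName)
    """
)

state_ids = dict(conn.execute("SELECT Name, StateId FROM fc_Query"))
print(state_ids)
```

Rows with no matching state (Quebec here) get NULL from the subquery, so an UPDATE without a WHERE clause would also blank out any StateId that was already filled in for such rows.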
This is a different way than what Eddie did, I like MERGE for updates if they're not dead simple (like I wouldn't consider yours dead simple). So if you're bored/curious also try
WITH stateIds as
(SELECT name, MAX(stateID) as stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.statesname
WHEN MATCHED THEN UPDATE
SET fc_Query.stateid = convert(int, stID)
;
The first part, from "WITH" to the "GROUP BY name)", is a CTE. It creates a table-like thing, a name 'stateIds' that is as good as a table for the immediately following part of the query, where there's guaranteed to be only one row per state name. Then the MERGE looks for anything in fc_Query with a matching name, and if there's a match, it sets the stateid as you want. You can make a small edit if you don't want to overwrite existing stateids in fc_Query:
WITH stateIds as
(SELECT name, MAX(stateID) as stID
FROM fc_states
GROUP BY name)
MERGE fc_Query
USING stateIds
ON stateIds.name = fc_Query.statesname
AND fc_Query.stateid IS NULL
WHEN MATCHED THEN UPDATE
SET fc_Query.stateid = convert(int, stID)
;
And you can have it do something different to rows that don't match. So I think MERGE is good for a lot of applications. You need a semicolon at the end of MERGE statements, and you have to guarantee that there will only be one match or zero matches in the source (that is "stateids", my CTE) for each row in the target; if there's more than one match some horrible thing happens, Satan wins or the US economy falters, I'm not sure what, just never let it happen.