Logically merging 4 columns of the same information

Logically merging 4 columns of the same information - sql

I'm querying 3 different databases (4 total fields) for their "username" field given a particular machine name in our environment: SCCM, McAfee EPO, and ActiveDirectory.
The four columns are SCCM_TOP, SCCM_LAST, EPO, AD
Some of the tuples I get look like:
JOE, JOE, ADMINISTRATOR, JOE
or
JOE, SARAH, JOE, JOE
or
NULL, NULL, JOE, JOE
or
NULL, NULL, JOE, SARAH
The last example of which is the most difficult to code against.
I'm writing a CASE statement to help merge the information in an additive way to give one
final column of the "best guess". At the moment, I'm weighing the most valid username based on another column, which is "age of the record" from each database.
CASE
WHEN ePO_Age <= CT_AGE AND NOT ePO_UN IS NULL THEN ePO_UN
WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_LAST_UN) IS NULL THEN SCCM_LAST_UN
WHEN NOT (SCCM_AGE) IS NULL AND NOT (SCCM_TOP_UN) IS NULL THEN SCCM_TOP_UN
WHEN NOT (AD_UN) IS NULL THEN AD_UN
ELSE NULL
END AS BestName,
But there has to be a better way to combine these records into one. My next step is to weigh the "average age" and then pick the username from there, discarding "Administrator".
Any thoughts or tricks?

You could benefit a little from the COALESCE function to get the first NON-NULL value and do something like:
COALESCE(CASE WHEN ePO_Age<=CT_AGE THEN ePO_UN END,
CASE WHEN SCCM_AGE IS NOT NULL THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END,
AD_UN) AS BestName

If you just want to get the most recent UserName that isn't null, try using UNION to combine the results from each table.
SELECT TOP 1 qry.UserName
FROM(
SELECT UserName, CreateDate
FROM UserNames_1
UNION ALL
SELECT UserName, CreateDate
FROM UserNames_2
UNION ALL
SELECT UserName, CreateDate
FROM UserNames_3
) AS qry
WHERE qry.UserName IS NOT NULL
ORDER BY qry.CreateDate DESC
Have a SQL Fiddle

Related

matching names with SSN in a columnar table

thanks in advance for any advice that you might have.
This is my first time attempting to query from a columnar database, so I'm a bit uncertain as to how to write a query that gives me the results that I'm looking for.
The table ("census_data") that I'm querying from has the following types of values (41 rows total):
plan_id ssn_key field value
1 111111111 DOB 1732-02-22
1 111111111 DOR 1830-11-02
1 111111111 FNAME GEORGE
1 111111111 LNAME WASHINGTON
1 863283322 DOR 2020-03-22
As an FYI, in some cases, we might only have someone's SSN and DOB, but not their FNAME, LNAME, DOR (date of retirement), etc.
We're working with dummy data now and attempting to have queries in place for when we begin working with a large-scale data set.
We know that in some cases in the actual data set, there will be illogical data, such as a Date of Retirement ('DOR') that occurs in the future (assuming for our rules, a 'DOR' value must occur in the past in order for it to be valid).
We've written some queries that have given us the results that we're looking for, such as:
1) Give us the birthdays of all people with FNAME = 'GEORGE' and LNAME = 'WASHINGTON'
select [value] from [testdb3].[dbo].[census_data]
where ssn_key in (select ssn_key from census_data where field='LNAME'
and value='WASHINGTON' and ssn_key in
(select ssn_key from census_data where field='FNAME'
and [value]='GEORGE')) AND field='DOB'
2) Give us all SSNs of people with a Date of Retirement after today
select [plan_id], [ssn_key], [field], [value]
from [testdb3].[dbo].[census_data] as
cd where cd.field = 'DOR'and value > GETDATE()
As a reminder, the SSN values are in the 2nd column in our table, whereas the values for DOB, FNAME, DOR, LNAME, etc. are all in the 4th column of our table.
And here's where we're stumped. We're trying to write a query that gives us the first name of anyone with a date of retirement greater than today. We've spent a few hours trying to come up with something that works and have come up empty so far. If anyone has any thoughts on what the code would be, please let me know, I would greatly appreciate it. Thank you.

SQL - selecting multiple dates from a table for unique ID's - part 2

Gordon got me pretty straight on my former question and now I've learned a lot more :). my query now looks like this:
select ident, min(input_date) as 'date requested',
min(case when op='u' and user IN (lists tech's names)) as 'date assigned'
max(case when status='closed' then date end) as 'date completed'
max(case when op='u' then user end) as 'tech'
from table
so this give me all that i want except the column that's labelled as 'tech'. for the majority of the results, i get the correct results but for a handful it gives me either the wrong tech or the 'user'
so I've added a field so you can better understand - it's field called 'role'. when i run this query, i never want the person listed as 'user' in the role column to be listed as 'tech'. in query above, it lists the user instead of tech and it's always in cases where the user updated their request. so if you look at the table, in ident 1, tom is user and harry is tech. harry assigned it to himself on 2/2; however, tom updated request (to give more info) after harry assigned it. running the above query, will result in tom being listed as the tech in the tech field. my guess is b/c of max and tom is after harry alphabetically.
so i have 5 tech's to track (only 2 listed in table - bertha & harry). if i add an IN clause so that it reads like this:
max(case when op='u' and user IN ('bertha','harry',etc) then user end) as 'tech'
the results is it always gives me the last tech alphabetically rather than the one who assigned it/closed it.
my goal is to have the tech who assigns & closes it to be listed. if the request has not been assigned then the 'tech' field should read null.
the date fields all have time stamps so i could go about it that way and i guess i could target their role as well.
i know that max is probably the problem as the tech could assign it then the user could update but if i put IN statement in there I don't get why it lists a tech who's not associated w/that ident.
finally, the other scenario, in ident 2, bertha actually takes the assignment away from harry so i need to say bertha; however, harry will always be the one reported in field - probably due to max()
thanks

finding duplicate rows with different IDs based on multiple columns

please forgive me if my jargon is off. I'm still learning!
I just started using Teradata, and to be honest has been a lot of fun. however, I have hit a road block that has stumped me for a while.
I successfully selected a table from a database that looks like:
ID service date name
1 service1 1/5/15 john
2 service2 1/7/15 steve
3 service3 1/8/15 lola
4 service4 1/3/15 joan
5 service5 1/5/15 fred
6 service3 1/3/15 joan
7 service5 1/8/15 oscar
Now I want to search the data base again to find any duplicate IDs (example: to see if service service1 with date 1/5/15 with name john exists on another row with a different ID.)
At first, I did something like this:
SELECT ID, service, date, name
FROM table
WHERE table.service = ANY(service1, service2, service3, service4, service5, service3, service5)
AND table.date = ANY('1/5/15', '1/7/15, '1/8/15', '1/3/15', '1/5/15', '1/3/15', '1/8/15')
AND table.name = ANY('john', 'steve', 'lola', 'joan', 'fred', 'joan', 'oscar');
But this is giving me more rows than I wanted.
example:
ID service date name
92 service3 1/8/15 steve
is of no use to me since I am looking for IDs that have the same combination of service, date, and name as of any of the other IDs in the above table.
something like this would be favorable:
ID service date name
609 service3 1/8/15 lola
since it matches than of ID 3.
I was curious to see if it were possible to treat the three columns (service, date, name) as a vector and maybe select the rows that match it that way?
ex
......
WHERE (table.service, table.date, table.name) = ANY((service3,1/8/15,lola), (service1, 1/5/15, john), ...etc)
My Teradata is down right now, So I have yet to try the above example. Nevertheless, any thoughts/feedback is greatly appreciated!

The following query may be what you are trying to achieve. This selects IDs for which the combination of service, date, and name appears more than once.
SELECT t1.ID
FROM yourTable t1
INNER JOIN
(
SELECT service, date, name
FROM yourTable
GROUP BY service, date, name
HAVING COUNT(*) > 1
) t2
ON t1.service = t2.service AND
t1.date = t2.date AND
t1.name = t2.name

This is a simple task for a Windowed Aggregate:
SELECT *
FROM tab
QUALIFY
COUNT(*) OVER (PARTITION BY service, date, name) > 1
This counts the number of rows with the same combination of values (like Tim Biegeleisen's Derived Table) but unlike a Standard Aggregate it keeps all rows. The QUALIFY is a nice Teradata syntax extension to avoid a Derived Table.

Don't hardcode values in your query unless you absolutely have to. Instead, take the query you already wrote and join to that.
SELECT dupes.*
FROM (your query) yourquery
JOIN table dupes
ON yourquery.service = dupes.service
AND yourquery.date = dupes.date
AND yourquery.name = dupes.name

SQL - Selecting Records with an odd number of a given attribute

I'm just brushing up on some SQL - in other words, I'm really rusty - and am a bit stuck at the moment. It's probably something trivial, but we'll see.
I'd like to select all people that possess an odd number of a certain attribute that isn't an integer ( in this example, TransactionType). So, for example, take the following test/not real info where these people are buying a car or some similarly big purchase.
Name TransactionType Date
John Buy 5/1
John Cancel 5/1
John Buy 5/2
Joseph Buy 5/25
Joseph Cancel 5/25
Tanya Buy 5/28
I would like it to return the people who had an odd number of transactions; in other words, they ended up purchasing the item. So, in this case, John and Tanya would be selected and Joseph would not.
I know I can use the modulus operand here, but I'm a bit lost how to utilize it correctly.
I thought of using
count(TransactionType) % 2 != 0
in the where clause but that's obviously a no-go. Any pointers in the right direction would be very helpful. Let me know if this is unclear, and thanks!

You are close. You need a having clause instead of a where clause.
select Name
from table
group by Name
having count(TransactionType) % 2 != 0

Wouldn't you be better off getting the latest status by the transaction date and using that rather than relying on counting TransactionType to determine the latest status:
Something like this:
SELECT b.Name, b.TransactionType, b.[Date]
FROM (
SELECT Name, MAX(t1.[DATE]) latestDate
FROM [Transactions] t1
GROUP BY t1.Name
) a
INNER JOIN [Transactions] b ON b.Name = a.Name AND a.latestDate = b.[Date]
WHERE b.TransactionType = 'Buy'
Assuming your dates are valid dates with times included, this should work.
Sample SQL Fiddle
If you only store the date portion the max date would be the same for people that Buy and Cancel on the same date, therefore it would return more data and some incorrect records.

Oracle null values

Can somebody please explain the functionality of the below query in oracle db and why is it not returning the last null row. And also please explain me not in functionality in case of null values.
Table Store_Information
store_name Sales Date
Los Angeles $1500 Jan-05-1999
San Diego $250 Jan-07-1999
San Francisco $300 Jan-08-1999
Boston $700 Jan-08-1999
(null) $600 Jan-10-1999
SELECT *
FROM scott.Store_Information
WHERE store_name IN (null)
STORE_NAME SALES DATE
-------------------- -------------------- -------------------------
0 rows selected

SELECT *
FROM scott.Store_Information
WHERE store_name IS null;
NULL can not be "compared" as other (real) values. Therefor you have to use IS NULL or IS NOT NULL.
Here is a series of blog posts regarding this topic: http://momjian.us/main/blogs/pgblog/2012.html#December_26_2012

If the value you are looking for is a null value, the query should be:
SELECT *
FROM scott.Store_Information
WHERE store_name IS NULL;

Oracle NULLS are special:
nothing is equal to null
nothing is NOT equal to null
Since in is equivalent to a = any, this applies to in also.
So, you cannot use NULL in an in or not in clause and expect proper results.

Null is a special value which doesn't follow normal conventions of string or numeric comparison. Evaluating null using these common methods will always evaluate to FALSE unless you use some of the special built in functions.
Select * from Store_Information
where nvl(store_name,'') in ('')
Select * from store_information
where coalesce(store_name, 'Missing') in ('Missing')
Select * from store_information
where store_name is null

If you want your query select field with null value you can :
solution 1 :
use where ... IS NULL ...
solution 2 :
or if you want absolutely use a IN you can use a NVL
eg :
SELECT *
FROM scott.Store_Information
WHERE NVL(store_name, 'NULL_STORE_NAME') IN ('NULL_STORE_NAME')
but in the second case you make the assumption that you can't have a store name named 'NULL_STORE_NAME'... so usually it's better to use solution 1 in my opinion...

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Logically merging 4 columns of the same information - sql

You could benefit a little from the COALESCE function to get the first NON-NULL value and do something like: COALESCE(CASE WHEN ePO_Age<=CT_AGE THEN ePO_UN END, CASE WHEN SCCM_AGE IS NOT NULL THEN COALESCE(SCCM_LAST_UN, SCCM_TOP_UN) END, AD_UN) AS BestName

Related

matching names with SSN in a columnar table

SQL - selecting multiple dates from a table for unique ID's - part 2

finding duplicate rows with different IDs based on multiple columns

SQL - Selecting Records with an odd number of a given attribute

Oracle null values

Categories

Resources