BigQuery Union Distinct Where Value not in Preceding DataSet - sql

I am trying to reconcile some student database with GSuite emails, where usernames have been created inconsistently for years.
The gist of the query I am trying to make on BigQuery is:
Match Emails to Students from email Pattern 1 and union with
Match Emails to Students from email Pattern 2 and union with
Emails not in 1 and 2.
Or in SQL:
with mymatches as (
with emaildataset as (
select 'testA' as col
union all
select 'testB'
union all
select 'testC'
union all
select 'testD'
)
select * from emaildataset where col like '%A'
union distinct
select * from emaildataset where col like '%B'
),
emaildataset2 as (
select 'testA' as col
union all
select 'testB'
union all
select 'testC'
union all
select 'testD'
)
select * from mymatches
union distinct
select * from emaildataset2 where emaildataset2.col not in (select col from mymatches)
This runs happily, but when I run the real code, then I'm getting duplicates.
The real code is now:
with matchedEmails as (
with g as (
select * from gsuite.StudentUsers
union all
select * from gsuite.AlumniUsers
)
select
std.STDCODE,
g.*
from g
inner join quick.all_students_alumni as std
on split(lower(g.Email), '@')[offset(0)] = split(quick.studentEmail(std.FNAME, std.MNAME, std.LNAME, std.STATUSTYPE), '@')[offset(0)]
where g.OU like '/Student%' or OU like '/Alumni%'
union distinct select
std.STDCODE,
g.*
from g
inner join quick.all_students_alumni as std
on split(lower(g.Email), '@')[offset(0)] = split(quick.studentEmail(std.FNAME, '', std.LNAME, std.STATUSTYPE), '@')[offset(0)]
where g.OU like '/Student%' or OU like '/Alumni%'
)
select * from matchedEmails
union distinct select
'NOT MATCHED' as STDCODE,
g.*
from (
select * from gsuite.StudentUsers
union all
select * from gsuite.AlumniUsers
) as g
where g.Email not in (select Email from matchedEmails)
and g.OU like '/Student%' or OU like '/Alumni%'
As a result though, I am getting duplicates in the Email column, which--based on my knowledge and test above--should not be, due to the where g.Email not in (select Email from matchedEmails) clause.
Am I doing something wrong?

I think the very last WHERE clause should be fixed to look like this:
where g.Email not in (select Email from matchedEmails)
and (g.OU like '/Student%' or OU like '/Alumni%')
As you can see, the parentheses around g.OU like '/Student%' or OU like '/Alumni%' were missing. AND binds more tightly than OR, so without them every row whose OU starts with '/Alumni' passes the filter, whether or not its Email is already in matchedEmails.
Something else might still need fixing too, but this answers your question below:
As a result though, I am getting duplicates in the Email column, which--based on my knowledge and test above--should not be, due to the where g.Email not in (select Email from matchedEmails) clause.
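To see the precedence effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for BigQuery (AND binds tighter than OR in both). The table contents and example.com addresses are made up for illustration; only the shape of the WHERE clause comes from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical miniature versions of the tables from the question.
conn.executescript("""
    CREATE TABLE g (Email TEXT, OU TEXT);
    INSERT INTO g VALUES
        ('a@example.com', '/Student/2020'),
        ('b@example.com', '/Alumni/2015'),
        ('c@example.com', '/Alumni/2016');
    CREATE TABLE matchedEmails (Email TEXT);
    INSERT INTO matchedEmails VALUES ('a@example.com'), ('b@example.com');
""")

# Original filter: parsed as (NOT IN ... AND Student) OR Alumni.
without_parens = conn.execute("""
    SELECT Email FROM g
    WHERE g.Email NOT IN (SELECT Email FROM matchedEmails)
      AND g.OU LIKE '/Student%' OR OU LIKE '/Alumni%'
    ORDER BY Email
""").fetchall()

# Fixed filter: NOT IN ... AND (Student OR Alumni).
with_parens = conn.execute("""
    SELECT Email FROM g
    WHERE g.Email NOT IN (SELECT Email FROM matchedEmails)
      AND (g.OU LIKE '/Student%' OR OU LIKE '/Alumni%')
    ORDER BY Email
""").fetchall()

print(without_parens)  # the already matched alumni row b@ leaks through
print(with_parens)     # only the unmatched row c@ remains
```

In the unparenthesized version the already matched alumni address comes back a second time, which is the kind of duplicate described in the question.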

Related

Sql - which letter of the alphabet is not in the names

I am a beginner at SQL
I need to find out which letter of the alphabet is not in a list of names as first character.
How can I do that?
To look for a specific letter I can use the LIKE operator. However I don't know what to use to look for the letter that is not in the list....
The query I have to find the distinct first letters of the e-mail addresses is:
select left(EmailAddress,1), count(left(EmailAddress,1))
from Customers
group by left(EmailAddress,1)
order by left(EmailAddress,1)
There is no e-mail address that starts with a U.
But which query can I use to get that result?
You'd need a table of all letters. You can use an ad hoc derived table using UNION ALL and FROM-less SELECTs. One method then is to use a correlated subquery and NOT EXISTS to check for letters not in the first position of any e-mail address.
SELECT l.letter
FROM (SELECT 'a' letter
UNION ALL
SELECT 'b' letter
...
SELECT 'y' letter
UNION ALL
SELECT 'z' letter) l
WHERE NOT EXISTS (SELECT *
FROM customers c
WHERE left(lower(emailaddress), 1) = l.letter);
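For anyone who wants to try the NOT EXISTS approach without a server, here is a rough sketch with Python's sqlite3 module. It is an adaptation, not the exact query above: SQLite has no left(), so substr(..., 1, 1) is used instead, the letters live in a small helper table rather than a UNION ALL chain, and the customer rows are invented.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")

# Hypothetical customers: one address per letter, except none starting with 'u'.
# One address starts with an uppercase 'Z' to show why lower() is applied.
emails = [c + "@example.com" for c in string.ascii_lowercase if c not in "uz"]
emails.append("Zoe@example.com")
conn.execute("CREATE TABLE customers (emailaddress TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(e,) for e in emails])

# Helper table holding the full alphabet (stands in for the derived table).
conn.execute("CREATE TABLE letters (letter TEXT)")
conn.executemany("INSERT INTO letters VALUES (?)",
                 [(c,) for c in string.ascii_lowercase])

# Keep only the letters for which no address starts with that letter.
missing = [row[0] for row in conn.execute("""
    SELECT l.letter
    FROM letters AS l
    WHERE NOT EXISTS (SELECT 1
                      FROM customers AS c
                      WHERE substr(lower(c.emailaddress), 1, 1) = l.letter)
    ORDER BY l.letter
""")]

print(missing)
```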
One could left join to letters and filter the unmatched.
The example below uses a recursive CTE to generate the letters.
WITH RCTE_LETTERS AS
(
SELECT CHAR(ASCII('a')) AS Letter, ASCII('a') AS Code
UNION ALL
SELECT CHAR(code+1), code+1
FROM RCTE_LETTERS
WHERE code < ASCII('z')
)
, CTE_FIRST_LETTERS AS
(
SELECT DISTINCT
LOWER(LEFT(EmailAddress,1)) AS FirstLetter
FROM Customers
)
SELECT l.Letter
FROM RCTE_LETTERS AS l
LEFT JOIN CTE_FIRST_LETTERS AS fl
ON fl.FirstLetter = l.Letter
WHERE fl.FirstLetter IS NULL
ORDER BY l.Letter;
Or use an EXCEPT
WITH RCTE_LETTERS AS
(
SELECT CHAR(ASCII('a')) AS Letter, ASCII('a') AS Code
UNION ALL
SELECT CHAR(code+1), code+1
FROM RCTE_LETTERS
WHERE code < ASCII('z')
)
, CTE_FIRST_LETTERS AS
(
SELECT DISTINCT
LOWER(LEFT(EmailAddress,1)) AS FirstLetter
FROM Customers
)
SELECT Letter FROM RCTE_LETTERS
EXCEPT
SELECT FirstLetter FROM CTE_FIRST_LETTERS
ORDER BY Letter
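The recursive CTE plus EXCEPT idea also carries over to SQLite, which makes it easy to test locally from Python. This is a sketch under a few assumptions: SQLite spells the functions char(), unicode() and substr() instead of CHAR(), ASCII() and LEFT(), it wants WITH RECURSIVE, and the Customers rows below are made up.

```python
import sqlite3
import string

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (EmailAddress TEXT)")
# Hypothetical data: no address starts with 'u' or 'x'.
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [(c + "user@example.com",) for c in string.ascii_lowercase if c not in "ux"],
)

missing = [row[0] for row in conn.execute("""
    WITH RECURSIVE rcte_letters(letter, code) AS (
        SELECT char(unicode('a')), unicode('a')
        UNION ALL
        SELECT char(code + 1), code + 1
        FROM rcte_letters
        WHERE code < unicode('z')
    ),
    cte_first_letters AS (
        SELECT DISTINCT lower(substr(EmailAddress, 1, 1)) AS first_letter
        FROM Customers
    )
    SELECT letter FROM rcte_letters
    EXCEPT
    SELECT first_letter FROM cte_first_letters
    ORDER BY letter
""")]

print(missing)
```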
USE AdventureWorks2014
GO
SELECT left(e.EmailAddress,1) AS Letter, COUNT(left(e.EmailAddress,1)) AS CountEmailLetter
FROM Person.Person p INNER JOIN person.EmailAddress e ON p.BusinessEntityID = e.BusinessEntityID
WHERE left(e.EmailAddress,1) NOT IN
(
SELECT Char(number+65)
FROM master.dbo.spt_values
WHERE name IS NULL AND number < 26
)
GROUP BY left(e.EmailAddress,1)
ORDER BY left(e.EmailAddress,1)

Excluding a row that contains a specific value

I want to exclude people who have joined a specific group. For example, if some students signed up for an Orchestra club, and I want to retrieve a list of students who did NOT sign up for orchestra, how do I do so?
I am unable to simply do a Group By clause because some students may have joined multiple clubs, and would bypass the Where condition and still show up in the query,
as shown here.
I am thinking about using a CASE statement in the SELECT clause to flag the person as '1' if they have joined Orchestra, and '0' if they have not, but I'm struggling to write an aggregate CASE function, which would cause issues from the GROUP BY clause.
Any thoughts on how to flag people with a certain row value?
Apparently my table didn't get saved onto SQLFiddle so you can paste the code below on your own screen:
CREATE TABLE activity ( PersonId, Club) as
select 1, 'Soccer' from dual union
select 1, 'Orchestra' from dual union
select 2, 'Soccer' from dual union
select 2, 'Chess' from dual union
select 2, 'Bball' from dual union
select 3, 'Orchestra' from dual union
select 3, 'Chess' from dual union
select 3, 'Bball' from dual union
select 4, 'Soccer' from dual union
select 4, 'Bball' from dual union
select 4, 'Chess' from dual;
Use a HAVING clause instead of WHERE, with a CASE expression:
HAVING max(case when column = 'string' then 1 else 0 end) = 0
Add this after your GROUP BY.
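Applied to the activity data from the question, that could look like the sketch below. It runs on SQLite through Python's sqlite3 module purely for convenience (the question uses Oracle), so the table is created with plain INSERTs rather than the CREATE TABLE ... AS from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activity (PersonId INTEGER, Club TEXT)")
conn.executemany("INSERT INTO activity VALUES (?, ?)", [
    (1, "Soccer"), (1, "Orchestra"),
    (2, "Soccer"), (2, "Chess"), (2, "Bball"),
    (3, "Orchestra"), (3, "Chess"), (3, "Bball"),
    (4, "Soccer"), (4, "Bball"), (4, "Chess"),
])

# The flag is 1 on any Orchestra row; max() = 0 means the person has none.
not_in_orchestra = [row[0] for row in conn.execute("""
    SELECT PersonId
    FROM activity
    GROUP BY PersonId
    HAVING max(CASE WHEN Club = 'Orchestra' THEN 1 ELSE 0 END) = 0
    ORDER BY PersonId
""")]

print(not_in_orchestra)
```

Because the condition is evaluated per group, a person with several clubs is dropped as soon as one of their rows is Orchestra.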
How about selecting a list of user ids from the activity table and excluding it:
SELECT * FROM users WHERE id NOT IN
(SELECT PersonId FROM activity WHERE Club = 'Orchestra');
You could use a subquery to return a list of people to exclude.
-- Returns person 2 and 4.
SELECT
PersonId
FROM
activity
WHERE
PersonId NOT IN
(
-- People to exclude.
SELECT
PersonId
FROM
activity
WHERE
Club = 'Orchestra'
)
GROUP BY
PersonId
;
EDIT: Removed superfluous DISTINCT in subquery - thanks @mathguy.
select * from
(
select a.*, case when Club ='Orchestra' then 1 else 0 end flag
from activity a
) where flag =1; --> get some students signed up for an Orchestra club
select * from
(
select a.*, case when Club ='Orchestra' then 1 else 0 end flag
from activity a
) where flag =0; --> get students not signed up for an Orchestra club

SQL: Return a count of 0 with count(*)

I am using WinSQL to run a query on a table to count the number of occurrences of literal strings. When trying to do a count on a specific set of strings, I still want to see if some values return a count of 0. For example:
select letter, count(*)
from table
where letter in ('A', 'B', 'C')
group by letter
Let's say we know that 'A' occurs 3 times, 'B' occurs 0 times, and 'C' occurs 5 times. I expect to have a table returned as such:
letter count
A 3
B 0
C 5
However, the table never returns a row with a 0 count, which results like so:
letter count
A 3
C 5
I've looked around and saw some articles mentioning the use of joins, but I've had no luck in correctly returning a table that looks like the first example.
You can create an in-line table containing all letters that you look for, then LEFT JOIN your table to it:
select t1.col, count(t2.letter)
from (
select 'A' AS col union all select 'B' union all select 'C'
) as t1
left join table as t2 on t1.col = t2.letter
group by t1.col
On many platforms you can now use a VALUES clause instead of UNION ALL to create your inline table, like this:
select t.letter, count(mytable.letter)
from ( values ('A'),('B'),('C') ) as t(letter)
left join mytable on t.letter = mytable.letter
group by t.letter
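A quick way to check the zero-count behaviour is the following sketch on SQLite via Python's sqlite3 module. One assumption to flag: SQLite does not accept the as t(letter) column list on a derived table, so the VALUES list is wrapped in a CTE named wanted here; mytable and its contents are invented to match the counts in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (letter TEXT)")
# Hypothetical data: 'A' three times, 'C' five times, no 'B', plus an unrelated 'D'.
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [("A",)] * 3 + [("C",)] * 5 + [("D",)] * 2)

counts = conn.execute("""
    WITH wanted(letter) AS (VALUES ('A'), ('B'), ('C'))
    SELECT w.letter, count(m.letter)
    FROM wanted AS w
    LEFT JOIN mytable AS m ON m.letter = w.letter
    GROUP BY w.letter
    ORDER BY w.letter
""").fetchall()

print(counts)
```

The important detail is count(m.letter) rather than count(*): the unmatched 'B' row has a NULL on the right side of the join, which count(column) skips, so it comes out as 0 instead of 1.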
I'm not that familiar with WinSQL, but it's not pretty if you don't have the values that you want in the left most column in a table somewhere. If you did, you could use a left join and a conditional. Without it, you can do something like this:
SELECT all_letters.letter, IFNULL(letter_count.letter_count, 0)
FROM
(
SELECT 'A' AS letter
UNION
SELECT 'B' AS letter
UNION
SELECT 'C' AS letter
) all_letters
LEFT JOIN
(SELECT letter, count(*) AS letter_count
FROM table
WHERE letter IN ('A', 'B', 'C')
GROUP BY letter) letter_count
ON all_letters.letter = letter_count.letter

How to check existence of data in a table from a where clause in sql server 2008?

Suppose I have a table with columns user_id, name and the table contains data like this:
user_id name
------- -----
sou souhardya
cha chanchal
swa swapan
ari arindam
ran ranadeep
If I want to know whether these users (sou, cha, ana, agn, swa) exist in this table, I want output like this:
user_id it exists or not
------- -----------------
sou y
cha y
ana n
agn n
swa y
As ana and agn do not exist in the table, it must show "n" for them (as in the output above).
Assuming your existing checklist is not on the database, you will have to assemble a query containing those. There are many ways of doing it. Using CTEs, it would look like this:
with cte as
(
select 'sou' user_id
union all
select 'cha'
union all
select 'ana'
union all
select 'agn'
union all
select 'swa'
)
select
cte.user_id,
case when yt.user_id is null then 'n' else 'y' end
from cte
left join YourTable yt on cte.user_id = yt.user_id
This also assumes user_id is unique.
Here is the SQLFiddle with the proof of concept: http://sqlfiddle.com/#!3/e023a0/4
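If the fiddle link is unavailable, the same idea can be tried locally. The sketch below uses SQLite through Python's sqlite3 module instead of SQL Server 2008, with a VALUES list in the CTE as a shorthand for the UNION ALL chain; the table name users is a placeholder for YourTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [
    ("sou", "souhardya"), ("cha", "chanchal"), ("swa", "swapan"),
    ("ari", "arindam"), ("ran", "ranadeep"),
])

# Start from the checklist so that missing ids still produce a row.
rows = conn.execute("""
    WITH checklist(user_id) AS (
        VALUES ('sou'), ('cha'), ('ana'), ('agn'), ('swa')
    )
    SELECT c.user_id,
           CASE WHEN u.user_id IS NULL THEN 'n' ELSE 'y' END AS it_exists
    FROM checklist AS c
    LEFT JOIN users AS u ON u.user_id = c.user_id
""").fetchall()

exists_by_id = dict(rows)
print(exists_by_id)
```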
Assuming you're just testing this manually:
DECLARE @Users TABLE
(
[user_id] VARCHAR(50)
)
INSERT INTO @Users
SELECT 'sou'
UNION SELECT 'cha'
UNION SELECT 'ana'
UNION SELECT 'agn'
UNION SELECT 'swa'
-- Start from the checklist so that missing users still produce a row.
SELECT a.[user_id]
, b.[name]
, CASE
WHEN b.[user_id] IS NULL THEN 'N'
ELSE 'Y'
END AS [exists_or_not]
FROM @Users a
LEFT JOIN [your_table] b
ON a.[user_id] = b.[user_id]
You didn't provide quite enough information to provide a working example, but this should get you close:
select tbl1.user_id, case when tbl2.user_id is null then 'n' else 'y' end
from tbl1 left outer join tbl2 on tbl1.user_id = tbl2.user_id
;with usersToCheck as
(
select 'sou' as userid
union select 'cha'
union select 'ana'
union select 'agn'
union select 'swa'
)
select utc.userid,
(case when exists ( select * from usersTable as ut where ut.user_id = utc.userid) then 'y' else 'n' end)
from usersToCheck as utc

How can I treat a UNION query as a sub query

I have a set of tables that are logically one table split into pieces for performance reasons. I need to write a query that effectively joins all the tables together so that I can apply a single WHERE clause to the result. I have successfully used a UNION on the result of using the WHERE clause on each subtable explicitly, as in the following:
SELECT * FROM FRED_1 WHERE CHARLIE = 42
UNION
SELECT * FROM FRED_2 WHERE CHARLIE = 42
UNION
SELECT * FROM FRED_3 WHERE CHARLIE = 42
but as there are ten separate subtables, updating the WHERE clause each time is a pain. What I want is something like this:
SELECT *
FROM (
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3)
WHERE CHARLIE = 42
If it makes a difference the query needs to run against a DB2 database.
Here is a more comprehensive (sanitised) version of what I need to do.
select *
from ( select * from FRD_1 union select * from FRD_2 union select * from FRD_3 ) as FRD,
( select * from REQ_1 union select * from REQ_2 union select * from REQ_3 ) as REQ,
( select * from RES_1 union select * from RES_2 union select * from RES_3 ) as RES
where FRD.KEY1 = 123456
and FRD.KEY1 = REQ.KEY1
and FRD.KEY1 = RES.KEY1
and REQ.KEY2 = RES.KEY2
NEW INFORMATION:
It looks like the problem has more to do with the number of fields in the union than anything else. If I greatly restrict the fields I can get most of the syntax variations below working. Unfortunately, restricting the fields so much means the resulting query, while potentially useful, is not giving me the result I wanted. I've managed to get an additional 3 fields from one of the tables in addition to the 2 keys. Any more than that and the query fails.
I believe you have to give a name to your subquery result. I don't know db2 so I'm taking a shot in the dark, but I know this works on several other platforms.
SELECT *
FROM (
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3) AS T1
WHERE CHARLIE = 42
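Here is a small sketch of the aliased derived table, run on SQLite through Python's sqlite3 module. It only shows the shape of the query: SQLite itself does not insist on the alias, so this does not confirm what DB2 requires, and the ID column and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical subtables; (1, 42) is stored in two of them on purpose.
conn.executescript("""
    CREATE TABLE FRED_1 (ID INTEGER, CHARLIE INTEGER);
    CREATE TABLE FRED_2 (ID INTEGER, CHARLIE INTEGER);
    CREATE TABLE FRED_3 (ID INTEGER, CHARLIE INTEGER);
    INSERT INTO FRED_1 VALUES (1, 42), (2, 7);
    INSERT INTO FRED_2 VALUES (1, 42), (3, 42);
    INSERT INTO FRED_3 VALUES (4, 42), (5, 9);
""")

rows = conn.execute("""
    SELECT *
    FROM (
        SELECT * FROM FRED_1
        UNION
        SELECT * FROM FRED_2
        UNION
        SELECT * FROM FRED_3) AS T1
    WHERE CHARLIE = 42
    ORDER BY ID
""").fetchall()

print(rows)
```

The WHERE clause is written once, and because this is UNION rather than UNION ALL, the row that appears in both FRED_1 and FRED_2 comes back only once.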
If the logical implementation is a single table but the physical implementation is multiple tables then how about creating a view that defines the logical model.
CREATE VIEW VW_FRED AS
SELECT * FROM FRED_1
UNION
SELECT * FROM FRED_2
UNION
SELECT * FROM FRED_3
then it's a simple matter of
SELECT * FROM VW_FRED WHERE CHARLIE = 42
Again, I'm not familiar with db2 syntax but this gives you the general idea.
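The view approach can be tried out the same way. This is a sketch on SQLite via Python's sqlite3 module, not DB2; the ID column, the sample rows and the helper rows_for are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE FRED_1 (ID INTEGER, CHARLIE INTEGER);
    CREATE TABLE FRED_2 (ID INTEGER, CHARLIE INTEGER);
    CREATE TABLE FRED_3 (ID INTEGER, CHARLIE INTEGER);
    INSERT INTO FRED_1 VALUES (1, 42), (2, 7);
    INSERT INTO FRED_2 VALUES (3, 42);
    INSERT INTO FRED_3 VALUES (4, 42), (5, 9);

    -- The view hides the physical split behind one logical table.
    CREATE VIEW VW_FRED AS
        SELECT * FROM FRED_1
        UNION
        SELECT * FROM FRED_2
        UNION
        SELECT * FROM FRED_3;
""")

def rows_for(charlie):
    """Hypothetical helper: filter the combined view by CHARLIE."""
    return conn.execute(
        "SELECT ID, CHARLIE FROM VW_FRED WHERE CHARLIE = ? ORDER BY ID",
        (charlie,),
    ).fetchall()

print(rows_for(42))
print(rows_for(9))
```

Adding an eleventh subtable would then only mean changing the view definition, not every query that filters on CHARLIE.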
with
FRD as ( select * from FRD_1 union select * from FRD_2 union select * from FRD_3 ),
REQ as ( select * from REQ_1 union select * from REQ_2 union select * from REQ_3 ),
RES as ( select * from RES_1 union select * from RES_2 union select * from RES_3 )
SELECT * from FRD, REQ, RES
WHERE FRD.KEY1 = 123456
and FRD.KEY1 = REQ.KEY1
and FRD.KEY1 = RES.KEY1
and REQ.KEY2 = RES.KEY2
I'm not familiar with DB2 syntax but why aren't you doing this as an INNER JOIN or LEFT JOIN?
SELECT *
FROM FRED_1
INNER JOIN FRED_2
ON FRED_1.Charlie = FRED_2.Charlie
INNER JOIN FRED_3
ON FRED_1.Charlie = FRED_3.Charlie
WHERE FRED_1.Charlie = 42
If the values don't exist in FRED_2 or FRED_3 then use a LEFT/OUTER JOIN. I'm assuming that FRED_1 is a master table, and if a record exists then it will be in this table.
maybe:
SELECT * FROM
(select * from FRD_1
union
select * from FRD_2
union
select * from FRD_3) FRD
INNER JOIN (select * from REQ_1 union select * from REQ_2 union select * from REQ_3) REQ
on FRD.KEY1 = REQ.KEY1
INNER JOIN (select * from RES_1 union select * from RES_2 union select * from RES_3) RES
on FRD.KEY1 = RES.KEY1
WHERE FRD.KEY1 = 123456 and REQ.KEY2 = RES.KEY2