I am looking to add a single grand total for salaries to my table, which is also based on a selection of multiple columns. The code I'm stuck on is below:
SELECT country, state1, city, street, ID, lastname + ', ' + firstname AS 'Name', SUM(salary) AS 'AnnualSalary'
FROM geography1 JOIN address ON street = streetname JOIN employee ON ID = PID
WHERE termdate IS NULL
GROUP BY country, state1, city, street, gender, lastname, firstname
UNION ALL
SELECT COALESCE(country,'TOTAL'), NULL AS state1, NULL AS city, NULL AS street, NULL AS gender, NULL AS lastname, NULL AS lastname, SUM(salary) AS 'AnnualSalary'
FROM geography1 JOIN address ON street = streetname JOIN employee ON ID = PID
WHERE termdate IS NULL
GROUP BY ROLLUP(country);
The query above executes to include the grand total and additional rows that group by country totals, but the other columns that follow are null. Is there a way to rewrite this so that there is only a single grand total row?
I apologize in advance for being so new to this. I've looked at other questions and this is what I've been able to piece together. Thanks!
You can control the groupings using grouping sets. If you want the groups that you have plus the total for each country and the overall total, then:
SELECT country, state1, city, street, ID, lastname + ', ' + firstname AS Name,
       SUM(salary) AS AnnualSalary
FROM geography1 JOIN
     address
     ON street = streetname JOIN
     employee
     ON ID = PID
WHERE termdate IS NULL
-- every non-aggregated column in the SELECT list must appear in the detail grouping set
GROUP BY GROUPING SETS ( (country, state1, city, street, ID, lastname, firstname), (country), () );
For only a single grand total row, leave out the (country) set and keep just the detail set and the empty set ().
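To see the "detail rows plus one grand total" shape end to end, here is a minimal sketch using Python's built-in sqlite3 module. SQLite does not implement GROUPING SETS, so the sketch emulates GROUPING SETS ((detail columns), ()) with a UNION ALL of the detail query and an ungrouped total. The simplified employee table, its columns and its rows are made up for illustration and are not the asker's schema.

```python
import sqlite3

# Hypothetical, simplified table: the real query joins three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (country TEXT, city TEXT, name TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('US', 'Miami',   'Smith, Steve', 50000),
        ('US', 'Boston',  'Doe, Jane',    60000),
        ('CA', 'Toronto', 'Kent, Clark',  70000);
""")

# Detail rows first, then exactly one grand total row.
# is_total is a helper column used only to sort the total last.
rows = conn.execute("""
    SELECT country, city, name, AnnualSalary
    FROM (
        SELECT 0 AS is_total, country, city, name, SUM(salary) AS AnnualSalary
        FROM employee
        GROUP BY country, city, name
        UNION ALL
        SELECT 1, 'TOTAL', NULL, NULL, SUM(salary)
        FROM employee
    )
    ORDER BY is_total, country, city, name
""").fetchall()

for row in rows:
    print(row)
```

On SQL Server the same result should come from GROUPING SETS with only the detail set and (), which avoids scanning the tables twice.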
I seem to be having a moment and can't figure out why certain duplicates in a table are not getting deleted. I have a test table called QUERY that has names, addresses, DOBs, phones, etc. I want to delete the duplicates but keep the most recent record where the phone is not empty (keeping the most recent is preferable, although the code below doesn't do that). My code below isn't working and always gives 0 results. Example rows:
+-------+-------+----------+-------------+-------+-------+-------------+--------------+
| first | last  | DOB      | address     | city  | state | phonenumber | validitydate |
+-------+-------+----------+-------------+-------+-------+-------------+--------------+
| steve | smith | 19710922 | 123 Here St | Miami | FL    | 9545551212  | 201902       |
| steve | smith | 19710922 | 123 Here St | Miami | FL    |             | 202009       |
| steve | smith | 19710922 | 123 Here St | Miami | FL    | 9545551212  | 201802       |
+-------+-------+----------+-------------+-------+-------+-------------+--------------+
WITH Records AS
(
SELECT lastname, firstname, address, state, dateofbirth, phonenumber
,ROW_NUMBER() OVER (PARTITION BY lastname, firstname, address, state, dateofbirth, phonenumber order by validitydate) AS RecordInstance
,ROW_NUMBER() OVER (PARTITION BY lastname, firstname, address, state, dateofbirth, phonenumber order by
CASE
WHEN phonenumber ='' THEN 0
WHEN phonenumber IS NOT NULL THEN 1
Else 0
END) as [ToInclude]
FROM query
)
delete
FROM records
WHERE
RecordInstance > 1
and ToInclude = 0
Anyone see anything I am doing wrong?? Thanks in advance
I think this is what you want to do. The window function results have to be computed in a subquery before they can be filtered in WHERE:
SELECT *
FROM (
    SELECT q.*
    ,RANK() OVER (PARTITION BY lastname, firstname, address, state, dateofbirth, phonenumber order by validitydate) AS RecordInstance
    ,COUNT(*) OVER (PARTITION BY lastname, firstname, address, state, dateofbirth, phonenumber) AS DUPS
    FROM QUERY q
) t
WHERE DUPS > 2
AND RecordInstance = 1
AND phonenumber <> '' -- maybe? if you really don't want to delete duplicates with no phone number
If the required result is to delete all duplicate records whose phone number is not null or empty, you can try the query below:
WITH Records AS
(
SELECT lastname, firstname, address, state, dateofbirth, phonenumber
,ROW_NUMBER() OVER (PARTITION BY lastname, firstname, address, state, dateofbirth, phonenumber order by validitydate) AS RecordInstance
FROM query where phonenumber<>'' and phonenumber is not null
)
delete
FROM records
WHERE
RecordInstance > 1
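Try partitioning by the person only (leave phonenumber out of the partition) and sorting rows that have a phone number first. Add DESC to ValidityDate if the most recent record should be the one that is kept:
RecordInstance = ROW_NUMBER() OVER(PARTITION BY LastName, FirstName, Address, City, DateOfBirth ORDER BY PhoneNumber DESC, ValidityDate)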
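Putting the suggestions above together, here is a minimal runnable sketch with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). It assumes the goal stated in the question: per person, keep one row, preferring rows with a phone number and then the most recent validitydate. The DELETE goes through SQLite's rowid because SQLite cannot delete from a CTE the way SQL Server can; the helper names rid and rn are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE query (firstname TEXT, lastname TEXT, dateofbirth TEXT,
                        address TEXT, city TEXT, state TEXT,
                        phonenumber TEXT, validitydate TEXT);
    INSERT INTO query VALUES
        ('steve','smith','19710922','123 Here St','Miami','FL','9545551212','201902'),
        ('steve','smith','19710922','123 Here St','Miami','FL','','202009'),
        ('steve','smith','19710922','123 Here St','Miami','FL','9545551212','201802');
""")

# phonenumber is NOT part of the partition: otherwise rows with and without
# a phone number land in different partitions and never compete.
conn.execute("""
    DELETE FROM query
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY lastname, firstname, address, city, state, dateofbirth
                       ORDER BY CASE WHEN phonenumber IS NULL OR phonenumber = ''
                                     THEN 1 ELSE 0 END,   -- rows with a phone first
                                validitydate DESC         -- then the most recent
                   ) AS rn
            FROM query
        )
        WHERE rn > 1
    )
""")

remaining = conn.execute("SELECT phonenumber, validitydate FROM query").fetchall()
print(remaining)
```

With the three sample rows from the question, only the 201902 row with a phone number should survive. On SQL Server the same ROW_NUMBER expression can be used inside the original CTE followed by DELETE FROM Records WHERE rn > 1.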
I have a database table of people records with columns for UserID, FirstName, LastName, DOB, and Email address. FirstName, LastName, and Email are required values, but DOB can be null if the person didn't give that information, so a few rows could look like this:
FirstName  LastName  DOB         Email                    UserID
John       Doe       1990-01-01  johndoe#gmail.com        1
Jane       Doe       1990-02-01  janedoe#gmail.com        2
John       Doe       NULL        johndoe#gmail.com        3
Paul       Blart     1985-01-01  mallcop#gmail.com        4
Clark      Kent      NULL        ImNotSuperman#gmail.com  5
Paul       Blart     1985-01-01  mallcop#gmail.com        6
I am trying to write a query (part of a bigger program) to identify duplicate people records in the database. The requirements are that FirstName, LastName, and Email must be identical; if both records have a DOB then it must be identical too, but a record with a NULL DOB can still be labeled as a duplicate. So in the above table, the two John Doe records and the two Paul Blart records would be selected. I want to do this with a PARTITION BY clause. My initial attempt is:
SELECT COUNT(UserID) OVER (Partition BY FirstName, LastName, DOB, Email) AS Count,
DENSE_RANK() OVER (ORDER BY FirstName, LastName, DOB, Email) AS RANK,
UserID, FirstName, LastName, DOB, Email
FROM People
where COUNT(UserID) OVER (Partition BY FirstName, LastName, DOB, Email) > 1
This correctly selects the Paul Blart records as duplicates, but not the John Doe records, because one of them has a NULL DOB. Is there any way to make those records be selected as well?
This might be more simply expressed with EXISTS:
select t.*
from People t
where exists (
    select 1
    from People t1
    where
        t1.UserID <> t.UserID
        and t1.firstname = t.firstname
        and t1.lastname = t.lastname
        and t1.email = t.email
        and (t1.dob = t.dob or t1.dob is null or t.dob is null)
)
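You can do this using window functions. A row is a duplicate if another row shares its exact DOB, or if the same person (name and email) has more than one row and at least one of them has a NULL DOB:
select t.*
from (select p.*,
             count(*) over (partition by firstname, lastname, email, dob) as cnt,
             count(*) over (partition by firstname, lastname, email) as cnt_all,
             sum(case when dob is null then 1 else 0 end) over (partition by firstname, lastname, email) as cnt_null
      from People p
     ) t
where cnt > 1 or
      (cnt_null > 0 and cnt_all > 1);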
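Both ideas can be checked against the sample rows from the question. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25+ for window functions) and runs an EXISTS version and a window-function version side by side. The cnt_all counter (all rows per name and email) is an assumption added in this sketch so that a NULL-DOB row is also flagged when its only partner has a DOB.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE People (UserID INTEGER PRIMARY KEY, FirstName TEXT,
                         LastName TEXT, DOB TEXT, Email TEXT);
    INSERT INTO People VALUES
        (1, 'John',  'Doe',   '1990-01-01', 'johndoe#gmail.com'),
        (2, 'Jane',  'Doe',   '1990-02-01', 'janedoe#gmail.com'),
        (3, 'John',  'Doe',   NULL,         'johndoe#gmail.com'),
        (4, 'Paul',  'Blart', '1985-01-01', 'mallcop#gmail.com'),
        (5, 'Clark', 'Kent',  NULL,         'ImNotSuperman#gmail.com'),
        (6, 'Paul',  'Blart', '1985-01-01', 'mallcop#gmail.com');
""")

# EXISTS version: a NULL DOB on either side counts as a match.
exists_ids = [r[0] for r in conn.execute("""
    SELECT t.UserID
    FROM People t
    WHERE EXISTS (
        SELECT 1 FROM People t1
        WHERE t1.UserID <> t.UserID
          AND t1.FirstName = t.FirstName
          AND t1.LastName = t.LastName
          AND t1.Email = t.Email
          AND (t1.DOB = t.DOB OR t1.DOB IS NULL OR t.DOB IS NULL)
    )
    ORDER BY t.UserID
""")]

# Window version: exact-DOB duplicates, or several rows for the same
# person of which at least one has a NULL DOB.
window_ids = [r[0] for r in conn.execute("""
    SELECT UserID
    FROM (
        SELECT UserID,
               COUNT(*) OVER (PARTITION BY FirstName, LastName, Email, DOB) AS cnt,
               COUNT(*) OVER (PARTITION BY FirstName, LastName, Email) AS cnt_all,
               SUM(CASE WHEN DOB IS NULL THEN 1 ELSE 0 END)
                   OVER (PARTITION BY FirstName, LastName, Email) AS cnt_null
        FROM People
    )
    WHERE cnt > 1 OR (cnt_null > 0 AND cnt_all > 1)
    ORDER BY UserID
""")]

print(exists_ids, window_ids)
```

Both queries should return the two John Doe rows and the two Paul Blart rows, and leave Jane Doe and Clark Kent out. Note that the two versions can disagree on larger data: if one person has a NULL row plus rows with two different DOBs, the window version flags all of them, while EXISTS only pairs each dated row with the NULL row.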
I have been given a challenge that is a bit out of my scope, so I'm just going to jump right in.
I have a sample dataset in BigQuery you can find here for testing purposes: https://bigquery.cloud.google.com/table/robotic-charmer-726:bl_test_data.complex_problem
I need to figure out the SQL code to query my table and do the following:
By aggregating using the following rules (I'll start with just one email address, and add the other in at the end):
As a general note up front, everything is to be made lowercase such that Ben=ben when aggregating.
Email is the broadest aggregation, and is aggregated by the lowercase version.
The amounts for all of those lowercase emails are summed, as is pictured below in blue.
First and last names are considered next, and they are selected based on the sum amount of the lowercase of the first AND last name.
Note, first or last names are NOT considered separately. See below where Ben has a sum amount of 160 and Kathleen only has a sum amount of 150, but Kathleen is still selected because her full name has a sum amount higher than any other full name.
Next the lowercase full address of the SELECTED NAME is chosen based on the highest sum amount.
Similar to the names, the full address considers all columns together.
Now I'll add in another email address, and we'll do the same thing.
Each lowercase email address is considered separately. I'm now realizing that I should have made that more clear with my pictures, but I don't want to do it all again... too much work. So I hope I have made it clear enough.
I hope you find this to be a very fun challenge!
There are probably cleaner ways of doing this, but this will give you the answer you need:
select email, first_name, last_name, address, city, state, zip, total_amount amount
from (
  select d.email email, d.first_name first_name, d.last_name last_name, d.amount amount, d.total_amount total_amount, e.address address, e.city city, e.state state, e.zip zip, row_number() over (partition by e.email order by e.amount desc) ord
  from (
    select a.email email, a.first_name first_name, a.last_name last_name, b.amount amount, c.amount total_amount
    from (
      SELECT
        lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name, last_name)) as name_group, lower(address) address, lower(city) city, lower(state) state, lower(concat(address,city,state)) as location_group, zip, sum(amount) amount
      FROM [robotic-charmer-726:bl_test_data.complex_problem]
      group by 1,2,3,4,5,6,7,8,9
    ) a
    inner join (
      select email, first_name, last_name, name_group, amount
      from (
        select email, first_name, last_name, name_group, amount, row_number() over (partition by email order by amount desc) as ord
        from (
          select lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(concat(first_name,last_name)) as name_group, sum(amount) amount,
          from [robotic-charmer-726:bl_test_data.complex_problem]
          group by 1, 2, 3, 4
        )
      )
      where ord = 1
    ) b
    on a.name_group = b.name_group
    inner join (
      select lower(email) email, sum(amount) amount
      from [robotic-charmer-726:bl_test_data.complex_problem]
      group by 1
    ) c
    on a.email = c.email
    group by 1,2,3,4,5
  ) d
  inner join (
    select lower(email) email, lower(first_name) first_name, lower(last_name) last_name, lower(address) address, lower(city) city, lower(state) state, zip, lower(concat(lower(address),lower(city), lower(state), zip)) as location_group, sum(amount) amount
    from [robotic-charmer-726:bl_test_data.complex_problem]
    group by 1,2,3,4,5,6,7,8
  ) e
  on d.email = e.email and d.first_name = e.first_name and d.last_name = e.last_name
)
where ord = 1
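The three rules from the question (total per lowercase email, then the full name with the highest sum, then that name's address with the highest sum) can also be written as one step per CTE. The sketch below is not the BigQuery query above; it is a small illustration in Python's built-in sqlite3 module (SQLite 3.25+), with a made-up donations table and made-up rows chosen so that "ben" has a larger first-name total (160) than "kathleen" (150) but "kathleen smith" is the largest full name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donations (email TEXT, first_name TEXT, last_name TEXT,
                            address TEXT, city TEXT, state TEXT, zip TEXT,
                            amount INTEGER);
    INSERT INTO donations VALUES
        ('A@Example.com', 'Ben',      'Jones', '1 Main St', 'Austin', 'TX', '73301', 80),
        ('a@example.com', 'ben',      'Lee',   '2 Oak Ave', 'Austin', 'TX', '73301', 80),
        ('a@example.com', 'Kathleen', 'Smith', '3 Elm Rd',  'Dallas', 'TX', '75001', 100),
        ('a@example.com', 'kathleen', 'smith', '4 Pine Ln', 'Dallas', 'TX', '75002', 50),
        ('B@example.com', 'Ann',      'Ray',   '5 Bay St',  'Reno',   'NV', '89501', 40),
        ('b@example.com', 'ann',      'ray',   '5 bay st',  'reno',   'nv', '89501', 30),
        ('b@example.com', 'Tom',      'Fox',   '6 Hill Dr', 'Reno',   'NV', '89502', 60);
""")

result = conn.execute("""
    WITH norm AS (            -- everything lowercased once, up front
        SELECT lower(email) AS email, lower(first_name) AS first_name,
               lower(last_name) AS last_name, lower(address) AS address,
               lower(city) AS city, lower(state) AS state, zip, amount
        FROM donations
    ),
    email_total AS (          -- rule 1: sum per email
        SELECT email, SUM(amount) AS total_amount FROM norm GROUP BY email
    ),
    name_total AS (           -- rule 2: sum per full name within an email
        SELECT email, first_name, last_name, SUM(amount) AS amt
        FROM norm GROUP BY email, first_name, last_name
    ),
    top_name AS (
        SELECT email, first_name, last_name
        FROM (SELECT nt.*, ROW_NUMBER() OVER (PARTITION BY email ORDER BY amt DESC) AS ord
              FROM name_total nt)
        WHERE ord = 1
    ),
    addr_total AS (           -- rule 3: sum per full address of the selected name only
        SELECT n.email, n.address, n.city, n.state, n.zip, SUM(n.amount) AS amt
        FROM norm n
        JOIN top_name t ON t.email = n.email
                       AND t.first_name = n.first_name
                       AND t.last_name = n.last_name
        GROUP BY n.email, n.address, n.city, n.state, n.zip
    ),
    top_addr AS (
        SELECT email, address, city, state, zip
        FROM (SELECT a.*, ROW_NUMBER() OVER (PARTITION BY email ORDER BY amt DESC) AS ord
              FROM addr_total a)
        WHERE ord = 1
    )
    SELECT e.email, t.first_name, t.last_name,
           a.address, a.city, a.state, a.zip, e.total_amount
    FROM email_total e
    JOIN top_name t ON t.email = e.email
    JOIN top_addr a ON a.email = e.email
    ORDER BY e.email
""").fetchall()

for row in result:
    print(row)
```

Ties are broken arbitrarily by ROW_NUMBER here; if ties matter, an extra ORDER BY key would be needed. The name and address joins are scoped by email on purpose, so a name shared by two emails cannot leak from one into the other.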
I have this query :
select first_name, last_name, MAX(date)
from person p inner join
address a on
a.person_id = p.id
group by first_name, last_name
The tables are person(sid, last_name, first_name) and address(data, cp, sid, city).
My question is: how can I write a query that selects first_name, last_name, MAX(date), city and cp
without adding city and cp to the GROUP BY?
I mean, I want all 5 columns, but only for the rows grouped by first_name, last_name and date.
Many thanks
This is not possible. Say you have three John Smiths in your database, each of them having one or two addresses. When you group by name, which city do you want to get? The city of which John Smith, and of which of his addresses? As there is no implicit answer to this question, there is no way to write a SELECT statement without explicitly saying which city is to be selected.
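One way of "explicitly saying which city" is to state that you want the city and cp from each person's most recent address row. Below is a minimal sketch of that reading with Python's built-in sqlite3 module (SQLite 3.25+). The schema follows the join in the question (person.id, address.person_id); the rows are made up, and ranking per person id rather than per name is an assumption so that two different John Smiths stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
    CREATE TABLE address (person_id INTEGER, date TEXT, city TEXT, cp TEXT);
    INSERT INTO person VALUES (1, 'John', 'Smith'), (2, 'John', 'Smith'), (3, 'Jane', 'Roe');
    INSERT INTO address VALUES
        (1, '2019-01-01', 'Paris', '75001'),
        (1, '2021-06-01', 'Lyon',  '69001'),
        (2, '2020-03-01', 'Nice',  '06000'),
        (3, '2018-05-05', 'Lille', '59000');
""")

# Rank each person's addresses by date, newest first, and keep rank 1.
# city and cp then come from the same row as the maximum date.
latest = conn.execute("""
    SELECT first_name, last_name, date, city, cp
    FROM (
        SELECT p.first_name, p.last_name, a.date, a.city, a.cp,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY a.date DESC) AS rn
        FROM person p
        JOIN address a ON a.person_id = p.id
    )
    WHERE rn = 1
    ORDER BY last_name, first_name, city
""").fetchall()

for row in latest:
    print(row)
```

If the grouping really has to be by name rather than by person, change the PARTITION BY to first_name, last_name; the two John Smiths would then collapse into one row holding whichever of them has the latest address.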
I have a simple query on Oracle.
SELECT DISTINCT City, Name, Surname FROM Persons
Is there any alternative sql query for the same query without DISTINCT ?
Have a look at this article
For example:
select City, Name, Surname
from (
    select City, Name, Surname,
           row_number() over
               (partition by City, Name, Surname
                order by City) rownumber
    from Persons
) t
where rownumber = 1
SELECT City, Name, Surname FROM Persons
UNION
SELECT City, Name, Surname FROM Persons
SELECT City, Name, Surname
FROM Persons
GROUP BY City, Name, Surname
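A quick way to convince yourself that these alternatives return the same rows as DISTINCT is to run them side by side. The sketch below uses Python's built-in sqlite3 module rather than Oracle (an assumption made only so the example is self-contained; window functions need SQLite 3.25+), with a made-up Persons table that contains duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (City TEXT, Name TEXT, Surname TEXT);
    INSERT INTO Persons VALUES
        ('Paris', 'Ann', 'Lee'),
        ('Paris', 'Ann', 'Lee'),
        ('Paris', 'Bob', 'Lee'),
        ('Rome',  'Ann', 'Lee'),
        ('Rome',  'Ann', 'Lee');
""")

queries = {
    "distinct": "SELECT DISTINCT City, Name, Surname FROM Persons",
    "group_by": "SELECT City, Name, Surname FROM Persons "
                "GROUP BY City, Name, Surname",
    "union":    "SELECT City, Name, Surname FROM Persons "
                "UNION SELECT City, Name, Surname FROM Persons",
    "row_number": """
        SELECT City, Name, Surname
        FROM (SELECT City, Name, Surname,
                     ROW_NUMBER() OVER (PARTITION BY City, Name, Surname
                                        ORDER BY City) AS rownumber
              FROM Persons) t
        WHERE rownumber = 1""",
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

All four should print the same three rows. GROUP BY is usually the most direct replacement; the UNION form reads the table twice, so it is more of a curiosity than a recommendation.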