Using SQL Group By while keeping same varchar values - sql

I have a query that returns two rows. I want the one with the largest value, so I do a GROUP BY and then MAX. However, I have three other columns (varchar) that should stay consistent with the id that MAX brings in.
Example.
OId CId FName LName BName
18477 110 Hubba Bubba whoa
158 110 Test2 Person2 leee
What I want is
OId CId FName LName BName
18477 110 Hubba Bubba whoa
So I want to group them by CId, and for OId I want to keep the largest number. I can't use MIN or MAX for FName, LName, or BName because I want them to come from the row whose OId is selected. I don't want or need the FName, LName and BName of the other row.
I tried using SELECT TOP, but that only pulls in literally one row and I need multiple.
SQL
INSERT INTO #CustomerInfoAll(FName, LName, BName, OwnerId, CustomerId)
SELECT
-- what goes here --(o.FirstName) AS FName,
-- what goes here --(o.LastName) AS LName,
-- what goes here --(o.BusinessName) AS BName,
MAX(o.OId) AS OId,
(r.CId) AS CId
FROM Owner o
INNER JOIN Report r
ON o.ReportId = r.ReportId
WHERE r.CId IN (SELECT CId FROM #ThisReportAll)
AND r.Completed IS NOT NULL
GROUP BY r.CId
ORDER BY OId DESC;

Assuming you have SQL Server 2005 or higher:
INSERT INTO #CustomerInfoAll (FName, LName, BName, OwnerId, CustomerId)
SELECT
FirstName,
LastName,
BusinessName,
OId,
CId
FROM
(
SELECT
Seq = ROW_NUMBER() OVER (PARTITION BY r.CId ORDER BY o.OId DESC),
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
FROM
dbo.Owner o
INNER JOIN dbo.Report r
ON o.ReportId = r.ReportId
WHERE
EXISTS ( -- can be INNER JOIN instead if `CId` is unique in temp table
SELECT *
FROM #ThisReportAll tra
WHERE r.CId = tra.CId
)
AND r.Completed IS NOT NULL
GROUP BY
o.OId,
r.CId,
o.FirstName,
o.LastName,
o.BusinessName
) x
WHERE
x.Seq = 1;
DO use full schema names on all your objects (dbo.Owner and dbo.Report).
DO use a semi-join (an EXISTS clause) or INNER JOIN instead of IN when possible.
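To see the ROW_NUMBER() pattern in isolation, here is a minimal sketch that runs against an in-memory SQLite database instead of SQL Server (window functions need SQLite 3.25 or later). The single `owner` table and the third sample row are made up for illustration; they stand in for the Owner/Report join in the question.

```python
import sqlite3

# Hypothetical miniature of the question's data: one row per owner,
# already joined to its customer id (CId).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE owner (OId INTEGER, CId INTEGER, FName TEXT, LName TEXT, BName TEXT)"
)
conn.executemany(
    "INSERT INTO owner VALUES (?, ?, ?, ?, ?)",
    [
        (18477, 110, "Hubba", "Bubba", "whoa"),
        (158, 110, "Test2", "Person2", "leee"),
        (42, 111, "Solo", "Owner", "only"),  # invented extra customer
    ],
)

# Number the rows inside each CId from the largest OId down, then keep row 1.
rows = conn.execute(
    """
    SELECT OId, CId, FName, LName, BName
    FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (PARTITION BY CId ORDER BY OId DESC) AS Seq
        FROM owner o
    ) x
    WHERE Seq = 1
    ORDER BY CId
    """
).fetchall()

for row in rows:
    print(row)
```

Each CId yields exactly one row, and the name columns come from the same row as the largest OId, which is what MAX plus GROUP BY alone cannot guarantee.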

Compare 3 SQL Server tables and once matched based on some attribute put the result on one table when not matched put the result in another

I have a simple requirement. There are 3 SQL Server tables like this:
Table 1; columns
AID Name DOB Gender PostCode
Table 2; columns
BID Name DOB Gender PostCode
Table 3; columns
CID Name DOB Gender PostCode
I wish to inner join all the tables on Name, DOB, Gender and PostCode. If a record matches, I would like to put the result in one table; the remaining records should go into another table for further comparison/processing.
The first part is simple with this query:
SELECT AID, BID, CID, A.Name, A.DOB, A.Gender, A.PostCode
FROM TAB_A A
JOIN TAB_B B ON A.Name = B.Name
AND A.DOB = B.DOB
AND A.Gender = B.Gender
AND A.PostCode = B.PostCode
JOIN TAB_C C ON A.Name = C.Name
AND A.DOB = C.DOB
AND A.Gender = C.Gender
AND A.PostCode = C.PostCode
The second part is the problem: the remaining records that don't match need to go into a separate table.
For example, if there are 50 records in total across the 3 tables (A = 20, B = 20 and C = 10) and the query above returns 5 matching rows, I wish to store 95 records in the separate table.
Your help/answer will be appreciated.
Thanks
You can use union all to combine the tables. Then use a window function to count the matches by the columns you care about.
Return the rows where the match is less than 3:
select id, Name, DOB, Gender, PostCode
from (select id, Name, DOB, Gender, PostCode,
count(*) over (partition by Name, DOB, Gender, PostCode) as cnt
from ((select AID as id, Name, DOB, Gender, PostCode
from a
) union all
(select BID, Name, DOB, Gender, PostCode
from b
) union all
(select CID, Name, DOB, Gender, PostCode
from c
)
) abc
) abc
where cnt < 3;
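A runnable sketch of the same idea, using SQLite (3.25+) through Python rather than SQL Server. The sample rows and the extra `src` column that records the originating table are assumptions added for illustration. Like the query above, it assumes a given Name/DOB/Gender/PostCode combination appears at most once per table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; table and column names follow the answer above.
conn.executescript(
    """
    CREATE TABLE a (AID INTEGER, Name TEXT, DOB TEXT, Gender TEXT, PostCode TEXT);
    CREATE TABLE b (BID INTEGER, Name TEXT, DOB TEXT, Gender TEXT, PostCode TEXT);
    CREATE TABLE c (CID INTEGER, Name TEXT, DOB TEXT, Gender TEXT, PostCode TEXT);
    INSERT INTO a VALUES (1, 'Ann', '1990-01-01', 'F', 'AB1'),
                         (2, 'Bob', '1985-05-05', 'M', 'CD2');
    INSERT INTO b VALUES (10, 'Ann', '1990-01-01', 'F', 'AB1'),
                         (11, 'Cid', '1970-07-07', 'M', 'EF3');
    INSERT INTO c VALUES (20, 'Ann', '1990-01-01', 'F', 'AB1');
    """
)

# Stack the three tables, count how often each key combination occurs,
# and keep the rows whose combination is not present in all three tables.
unmatched = conn.execute(
    """
    SELECT src, id, Name
    FROM (
        SELECT src, id, Name,
               COUNT(*) OVER (PARTITION BY Name, DOB, Gender, PostCode) AS cnt
        FROM (
            SELECT 'a' AS src, AID AS id, Name, DOB, Gender, PostCode FROM a
            UNION ALL
            SELECT 'b', BID, Name, DOB, Gender, PostCode FROM b
            UNION ALL
            SELECT 'c', CID, Name, DOB, Gender, PostCode FROM c
        ) AS combined
    ) AS counted
    WHERE cnt < 3
    ORDER BY src, id
    """
).fetchall()

print(unmatched)
```

Ann appears in all three tables and is filtered out; Bob and Cid each appear once and are returned together with the table they came from.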

How to find columns that only have one value - Postgresql

I have 2 tables, person(email, first_name, last_name, postcode, place_name) and location(postcode, place_name). I am trying to find people that live in places where only one person lives. I tried using SELECT COUNT() but failed because I couldn't figure out what to count in this situation.
SELECT DISTINCT email,
first_name,
last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
WHERE 1 <=
(SELECT COUNT(?))
Conditions on an aggregate go in a HAVING clause. Every group that survives HAVING count(*) = 1 holds exactly one person, so min() simply returns that person's values:
SELECT min(email) AS email,
min(first_name) AS first_name,
min(last_name) AS last_name
FROM person
INNER JOIN location USING(postcode,
place_name)
GROUP BY postcode,
place_name
HAVING count(*) = 1
Window functions such as first_value are not needed here; in a grouped query PostgreSQL only accepts them over grouped columns or aggregates.
I would do this as follows. I find it plain and simple.
select p1.* from
person p1
join
(
select p.postcode, p.place_name, count(*) cnt from
person p
group by p.postcode, p.place_name
) t on p1.postcode = t.postcode and p1.place_name = t.place_name and t.cnt = 1
How does it work?
In the inner query (aliased t) we just count how many people live in each location.
Then we join the result of it (t) with the table person (aliased p1) and in the join we require t.cnt = 1. This is probably the most natural way of doing it, I think.
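The two steps can be checked on a tiny data set. This is a sketch using SQLite through Python instead of PostgreSQL, with made-up people and places:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two people share one location, one lives alone.
conn.executescript(
    """
    CREATE TABLE person (email TEXT, first_name TEXT, last_name TEXT,
                         postcode TEXT, place_name TEXT);
    INSERT INTO person VALUES
        ('a@example.com', 'Ada', 'Alpha', '1000', 'Springfield'),
        ('b@example.com', 'Ben', 'Beta',  '1000', 'Springfield'),
        ('c@example.com', 'Cy',  'Gamma', '2000', 'Shelbyville');
    """
)

# Step 1: the inner query on its own, one row per location with its head count.
per_location = conn.execute(
    """
    SELECT postcode, place_name, COUNT(*) AS cnt
    FROM person
    GROUP BY postcode, place_name
    ORDER BY postcode
    """
).fetchall()

# Step 2: join people back to the locations whose head count is exactly 1.
loners = conn.execute(
    """
    SELECT p1.email
    FROM person p1
    JOIN (
        SELECT postcode, place_name, COUNT(*) AS cnt
        FROM person
        GROUP BY postcode, place_name
    ) t ON p1.postcode = t.postcode
       AND p1.place_name = t.place_name
       AND t.cnt = 1
    """
).fetchall()

print(per_location)
print(loners)
```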
Thanks to the help of people here, I found this answer:
SELECT first_name,
last_name,
email
FROM person
WHERE (postcode, place_name) IN
(SELECT postcode,
place_name
FROM person
GROUP BY postcode,
place_name
HAVING COUNT(*) = 1)

SQL query issue with achieving an encompasses all effect

Below is the query I have been working on for this question.
Find the names of the companies which have employees residing in every city where employees of Mutual of Omaha live.
This means that if Mutual has employees in the cities Omaha, Lincoln, and Denver, the only companies it should return are those that have employees in all 3 of those cities. This should also return Mutual.
The below query returns the company which has an employee in any of those three cities. The lastname is there for me to manually check which employees it is counting.
SELECT COMPANY_NAME, e1.lastname
FROM EMPLOYEE E1,WORKS W1
WHERE E1.CITY IN (SELECT CITY
FROM EMPLOYEE E2,WORKS W2
WHERE E2.firstname = W2.firstname
AND E2.lastname = W2.lastname
AND W2.COMPANY_NAME= 'Mutual of Omaha')
AND E1.firstname = W1.firstname
AND E1.lastname = W1.lastname;
I realized I didn't put the tables down so here they are
employee (Lastname, FirstName, MidInitial, gender, street, city)
works (Lastname, FirstName, MidInitial, company_name, salary)
manages(Lastname, FirstName, MidInitial, ManagerLastname, MFirstName, MMidInitial, start-date)
It's not the most elegant piece of code, but try using this one:
WITH tab AS (
SELECT DISTINCT W1.COMPANY_NAME,
E1.CITY
FROM EMPLOYEE E1
JOIN WORKS W1 ON (E1.firstname = W1.firstname AND E1.lastname = W1.lastname)
WHERE E1.CITY IN (SELECT DISTINCT E2.CITY
FROM EMPLOYEE E2,WORKS W2
WHERE E2.firstname = W2.firstname
AND E2.lastname = W2.lastname
AND W2.COMPANY_NAME= 'Mutual of Omaha')
)
SELECT tab.COMPANY_NAME
FROM tab
GROUP BY tab.COMPANY_NAME
HAVING COUNT(tab.CITY) = (SELECT COUNT(sub.CITY) FROM tab sub WHERE COMPANY_NAME = 'Mutual of Omaha')
This option uses the bitwise AND operator, so it is limited in the number of cities it can handle at once (about 31, due to the int size limit).
------------Assign each distinct city a value that is x2 the previous (like binary counting)
create table #cityvalues (CityName varchar(100), ValueField int)
select distinct E1.CITY
into #while
FROM EMPLOYEE E1
while (select count(*) from #while) > 0
begin
insert into #cityvalues
select top 1 CITY, coalesce((select max(ValueField) from #cityvalues)*2, 1) from #while
delete from #while where CITY in (select CityName from #cityvalues)
end
--------------------------------------------------------------------------------
--------------Create a list of Company/City--------------------------------
create table #companycities (Compname varchar(100),CityName varchar(100))
insert into #companycities
select distinct
W1.COMPANY_NAME
,E1.CITY
FROM EMPLOYEE E1
JOIN WORKS W1 on E1.firstname = W1.firstname AND E1.lastname = W1.lastname
----------------------------------------------------------------------------
----This SUM function then creates a "list" of all cities for the company in a single field
select cc.Compname, sum(cv.ValueField) as AllCities
into #CompanyAllCities
from #companycities cc
join #cityvalues cv on cc.CityName = cv.CityName
group by Compname
-----------------------------------------------------------------
----------This query checks if the company's "list" contains the "list" for the joined company, excluding itself
select distinct cac1.Compname
from #CompanyAllCities cac1
join #CompanyAllCities cac2 on cac2.Compname = 'Mutual of Omaha'
where cac2.AllCities & cac1.AllCities = cac2.AllCities
and cac1.Compname <> 'Mutual of Omaha'
-------------------------------------------------------------
----------Tidy up after yourself----------
drop table #cityvalues,#CompanyAllCities,#companycities,#while
-----------------------------------------
I would start with a CTE that has each company with the cities they operate in.
Then there are several options, but a self-join with aggregation does the count that you want:
with cw as (
select distinct e.city, w.company_name
from employee e join
works w
on e.firstname = w.firstname and
e.lastname = w.lastname and
e.midinitial = w.midinitial
)
select cw.company_name
from cw join
cw cwo
on cw.city = cwo.city and
cwo.company_name = 'Mutual of Omaha'
group by cw.company_name
having count(*) = (select count(*) from cw where
cw.company_name = 'Mutual of Omaha');
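Here is a small sketch of that self-join, run in SQLite through Python. The `company_city` table is a hypothetical stand-in for the `cw` CTE (distinct company/city pairs), and the companies other than Mutual of Omaha are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the cw CTE: distinct (company, city) pairs.
conn.executescript(
    """
    CREATE TABLE company_city (company_name TEXT, city TEXT);
    INSERT INTO company_city VALUES
        ('Mutual of Omaha', 'Omaha'), ('Mutual of Omaha', 'Lincoln'),
        ('Acme', 'Omaha'), ('Acme', 'Lincoln'), ('Acme', 'Denver'),
        ('Initech', 'Omaha');
    """
)

# Pair every company's cities with Mutual of Omaha's cities; a company
# qualifies when it matched as many cities as Mutual of Omaha has in total.
companies = conn.execute(
    """
    SELECT cw.company_name
    FROM company_city cw
    JOIN company_city cwo
      ON cw.city = cwo.city
     AND cwo.company_name = 'Mutual of Omaha'
    GROUP BY cw.company_name
    HAVING COUNT(*) = (SELECT COUNT(*)
                       FROM company_city
                       WHERE company_name = 'Mutual of Omaha')
    ORDER BY cw.company_name
    """
).fetchall()

print(companies)
```

Acme covers both of Mutual of Omaha's cities (plus one more) and is returned, Initech covers only one and is not, and Mutual of Omaha matches itself.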
You can list and weight the cities where companies have employees, so you get a single number representing each company based on where its employees live, a sort of group number.
Then you can check which companies have a group number that contains every city bit of Mutual of Omaha's number.
;with
cities as(select ROW_NUMBER() over (order by city) city_id, city from (select distinct city from employee) c),
chk as (
select distinct company_name, city_id
from works w
join employee e on w.firstname = e.firstname and w.lastname = e.lastname
join cities c on c.city = e.city
),
cnt as (
select company_name, SUM(power(cast(2 as bigint), city_id-1)) n
from chk
group by company_name
)
select c.company_name
from cnt c
cross join (select n from cnt where company_name = 'Mutual of Omaha') m
where (c.n & m.n) = m.n
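The weighting idea is easier to follow outside SQL. This is only a sketch in plain Python with invented companies: each city gets its own bit, a company's number is the sum of its cities' bits, and the comparison decides what "match" means. Plain equality finds companies with exactly the same set of cities, while a bitwise AND also accepts companies that have extra cities.

```python
# Hypothetical data: which cities each company has employees in.
cities_by_company = {
    "Mutual of Omaha": {"Omaha", "Lincoln"},
    "Acme": {"Omaha", "Lincoln", "Denver"},
    "Initech": {"Omaha"},
}

# Give every distinct city its own bit: 1, 2, 4, ...
all_cities = sorted({city for cities in cities_by_company.values() for city in cities})
bit = {city: 1 << i for i, city in enumerate(all_cities)}

# A company's group number is the sum of the bits of its cities.
mask = {
    company: sum(bit[city] for city in cities)
    for company, cities in cities_by_company.items()
}

target = mask["Mutual of Omaha"]

# Equality: exactly the same set of cities as Mutual of Omaha.
same_cities = sorted(c for c, m in mask.items() if m == target)

# Bitwise AND: at least every city of Mutual of Omaha, extras allowed.
covers_all = sorted(c for c, m in mask.items() if m & target == target)

print(same_cities)
print(covers_all)
```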

Getting first line of a LEFT OUTER JOIN

I have 3 tables:
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ADDRESS FROM ADDRESSES
WHERE ROWNUM <2
ORDER BY UPDATED_DATE DESC)c
ON a.ID = c.ID
An ID can have only one name but can have multiple addresses, and I only want the latest one. This query returns the address as null even when there is an address, I guess because it fetches only the first address from the table and then tries to LEFT JOIN it to the ID, which it cannot find. What is the correct way of writing this query?
Try KEEP DENSE_RANK
Data source:
CREATE TABLE person
(person_id int primary key, firstname varchar2(4), lastname varchar2(9))
/
INSERT ALL
INTO person (person_id, firstname, lastname)
VALUES (1, 'john', 'lennon')
INTO person (person_id, firstname, lastname)
VALUES (2, 'paul', 'mccartney')
SELECT * FROM dual;
CREATE TABLE address
(person_id int, address_id int primary key, city varchar2(8))
/
INSERT ALL
INTO address (person_id, address_id, city)
VALUES (1, 1, 'new york')
INTO address (person_id, address_id, city)
VALUES (1, 2, 'england')
INTO address (person_id, address_id, city)
VALUES (1, 3, 'japan')
INTO address (person_id, address_id, city)
VALUES (2, 4, 'london')
SELECT * FROM dual;
Query:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select person_id,
min(city) -- can change this to max(city). will work regardless of min/max
-- important you do this to get the recent: keep(dense_rank last)
keep(dense_rank last order by address_id)
as recent_city
from address
group by person_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/2
Not every database has functionality similar to Oracle's KEEP DENSE_RANK. You can use a plain window function instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city, x.pick_one_only
from person p
left join (
select
person_id,
row_number() over(partition by person_id order by address_id desc) as pick_one_only,
city as recent_city
from address
) x on x.person_id = p.person_id and x.pick_one_only = 1
Live test: http://www.sqlfiddle.com/#!4/7b1c9/48
Or use tuple testing, which also works on databases that don't support window functions:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
person_id,city as recent_city
from address
where (person_id,address_id) in
(select person_id, max(address_id)
from address
group by person_id)
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/21
Not every database supports tuple testing like the preceding code, though. You can use a JOIN instead:
select
p.person_id, p.firstname, p.lastname,
x.recent_city
from person p
left join (
select
address.person_id,address.city as recent_city
from address
join
(
select person_id, max(address_id) as recent_id
from address
group by person_id
) r
ON address.person_id = r.person_id
AND address.address_id = r.recent_id
) x on x.person_id = p.person_id
Live test: http://www.sqlfiddle.com/#!4/7b1c9/24
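For a quick local check without Oracle, the JOIN variant can be run in SQLite through Python. This is a sketch: the data mirrors the sample above, plus one invented person with no address to show that the LEFT JOIN keeps them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE address (person_id INTEGER, address_id INTEGER PRIMARY KEY, city TEXT);
    INSERT INTO person VALUES (1, 'john', 'lennon'),
                              (2, 'paul', 'mccartney'),
                              (3, 'george', 'harrison');  -- invented: no address
    INSERT INTO address VALUES (1, 1, 'new york'), (1, 2, 'england'),
                               (1, 3, 'japan'), (2, 4, 'london');
    """
)

# Find each person's highest address_id, join back to get that address's
# city, then LEFT JOIN so people without any address still appear.
rows = conn.execute(
    """
    SELECT p.person_id, p.firstname, x.recent_city
    FROM person p
    LEFT JOIN (
        SELECT a.person_id, a.city AS recent_city
        FROM address a
        JOIN (
            SELECT person_id, MAX(address_id) AS recent_id
            FROM address
            GROUP BY person_id
        ) r ON a.person_id = r.person_id
           AND a.address_id = r.recent_id
    ) x ON x.person_id = p.person_id
    ORDER BY p.person_id
    """
).fetchall()

print(rows)
```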
You can use the analytic function RANK
(SELECT DISTINCT ID
FROM IDS) a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES) b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS ,
rank() over (partition by id
order by updated_date desc) rnk
FROM ADDRESSES) c
ON ( a.ID = c.ID
and c.rnk = 1)
Without having access to any database at the moment, you should be able to do
(SELECT DISTINCT ID
FROM IDS) a LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b ON a.ID = b.ID LEFT OUTER JOIN
(SELECT TOP 1 ADDRESS
FROM ADDRESSES
ORDER BY UPDATED_DATE DESC) c ON a.ID = c.ID
As you can see, the "TOP 1" on ADDRESSES returns only the first row of the result set.
Also, are you sure that a.ID and c.ID are the same?
I would imagine you need something like .... c ON a.ID = c.AddressID
If not, I'm not entirely sure how you link multiple addresses to a single ID.
(SELECT DISTINCT ID
FROM IDS)a
LEFT OUTER JOIN
(SELECT NAME, ID
FROM NAMES)b
ON a.ID = b.ID
LEFT OUTER JOIN
(SELECT ID, ADDRESS, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY UPDATED_DATE DESC) RN
FROM ADDRESSES
)c
ON a.ID = c.ID
AND c.RN=1

SQL Server 2005 Query remove duplicates via date

I searched and searched and can't seem to figure out this issue:
We have three tables which have data I need to collect and show in a view.
SELECT
C.FirstName, C.LastName,
aspnet_Membership.LoweredEmail,
MAX(Bill.Code) AS BCodes,
MAX(Bill.BillDate)
FROM
dbo.Client C
INNER JOIN
dbo.Bill ON C.Id = Bill.BId
INNER JOIN
dbo.aspnet_Membership ON aspnet_Membership.UserId = C.UserGUID
WHERE
((Bill.Code='ASDF'
OR Bill.Code='XYZ'
OR Bill.Code='QWE'
OR Bill.Code='JKL')
AND C.LastName!='Unassigned')
GROUP BY
LastName, FirstName, LoweredEmail, Code, BDate
Client table has: FirstName LastName and UserGuid
Bill table has: BCode, BillDate
aspnet_Membership table has: E-mail, UserId
RESULTS:
FirstName LastName E-mail BCode BillDate
FName1 Lname1 fname#isp.com XYZ 2010-05-13 00:00:00.000
Fname2 Lname2 fname2#isp2.com XYZ 2010-06-05 00:00:00.000
Fname2 Lname2 fname2#isp2.com ASD 2008-09-17 12:01:45.407
As you can see Fname2 shows up twice, only difference is in the BCode and BillDate.
How can I make this go with the latest date so I get Fname2 record with Bcode of XYZ with date of 2010-06-05.
Any help would be appreciated, thank you in advance.
Seeing that you're using SQL Server 2005, I would probably use a CTE (Common Table Expression) to do this - something like:
;WITH MyData AS
(
SELECT
c.FirstName, c.LastName,
asp.LoweredEmail,
b.Code AS BCodes, b.BillDate,
ROW_NUMBER() OVER (PARTITION BY c.LastName,c.FirstName
ORDER BY BillDate DESC) AS 'RowNum'
FROM
dbo.Client c
INNER JOIN
dbo.Bill b ON C.Id = b.BId
INNER JOIN
dbo.aspnet_Membership asp ON asp.UserId = c.UserGUID
WHERE
b.Code IN ('ASDF', 'JKL', 'QWE', 'XYZ')
AND c.LastName != 'Unassigned'
)
SELECT
FirstName, LastName, LoweredEmail, BCodes, BillDate
FROM
MyData
WHERE
RowNum = 1
This CTE with the ROW_NUMBER() clause will:
"partition" your data by (FirstName,LastName) - each pair of those values gets a new sequential "row number"
order those values within each partition by descending BillDate
So the resulting set of data has each newest entry for any (FirstName,LastName) group with RowNum = 1 - and that's the data I'm selecting from that CTE.
Does that work for you??
Perform a second join (using a LEFT JOIN) to find a later row in the Bill table, and then filter out any results where that join succeeds:
SELECT
C.FirstName, C.LastName,
aspnet_Membership.LoweredEmail,
MAX(Bill.Code) AS BCodes,
MAX(Bill.BillDate)
FROM dbo.Client C
INNER JOIN dbo.Bill
ON C.Id=Bill.BId
INNER JOIN dbo.aspnet_Membership
ON aspnet_Membership.UserId=C.UserGUID
LEFT JOIN dbo.Bill b2
ON Bill.BId = b2.BId and
b2.Code in ('ASDF','XYZ','QWE','JKL') and
b2.BillDate > Bill.BillDate
WHERE
b2.BId is null and
((Bill.Code='ASDF'
OR Bill.Code='XYZ'
OR Bill.Code='QWE'
OR Bill.Code='JKL')
AND C.LastName!='Unassigned')
GROUP BY C.LastName, C.FirstName, aspnet_Membership.LoweredEmail, Bill.Code, Bill.BillDate
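The "no later row exists" trick can be tried on its own. Below is a minimal sketch in SQLite through Python, with a single hypothetical `bill` table whose rows echo the sample results from the question (dates stored as ISO text, so string comparison orders them correctly).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down Bill table: client 2 has an old and a new bill.
conn.executescript(
    """
    CREATE TABLE bill (BId INTEGER, Code TEXT, BillDate TEXT);
    INSERT INTO bill VALUES (1, 'XYZ',  '2010-05-13'),
                            (2, 'XYZ',  '2010-06-05'),
                            (2, 'ASDF', '2008-09-17');
    """
)

# For each bill, look for a later bill of the same client. Rows where the
# LEFT JOIN finds nothing (b2.BId IS NULL) are the latest ones.
latest = conn.execute(
    """
    SELECT b.BId, b.Code, b.BillDate
    FROM bill b
    LEFT JOIN bill b2
      ON b2.BId = b.BId
     AND b2.BillDate > b.BillDate
    WHERE b2.BId IS NULL
    ORDER BY b.BId
    """
).fetchall()

print(latest)
```

If two bills of the same client share the latest date, both survive the filter, so add a tie-breaker to the join condition when that can happen.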