Trying to show Count of Person, Spouse, Kids - sql

I am doing:
select
count(distinct(ssn))
, count(distinct(ssn + spousename))
, count(distinct(ssn + spousename + kidname))
My problem is that if the spouse name or kid name is blank, there is no spouse or kid, so it should not be counted.
How can I exclude blank values from the count?
data:
SSN         Person  Spouse  Child
111-11....  John                    (no spouse, no kid)
222-22....  Jane    Jim     Jack
333-33....  Jerry           Jack    (no spouse)
333-33....  Jerry           Jill    (no spouse, second kid)
444-44....  John    Judy            (no kid)
The answer should be 4 people, 2 spouses and 3 kids, because I am counting unique values that are not blank.
I can't show the real SSNs and names, so the data above is made up.
Thank you!

You could try COALESCE:
select
count(distinct(ssn))
, count(distinct(ssn + spousename))
, count(distinct(ssn + spousename + coalesce(kidname,'')))

Try this:
SELECT
COUNT(ssn) AS people,
SUM(spouses) AS spouses,
SUM(children) AS children
FROM
(
SELECT
ssn,
MAX(person) AS person,
COUNT(spouse) AS spouses,
COUNT(child) AS children
FROM
TableName
GROUP BY
ssn
) BySSN
;

Providing your data in an accurate (and, of course, made-up!) and consumable (DDL+DML) format is critical to getting the correct answer.
Knowing whether you use NULL or blank also makes a big difference.
I am a big fan of counting exactly what you want to count, rather than relying on DISTINCT.
-- a table variable is declared with @; #-prefixed names are temp tables created with CREATE TABLE
declare @MyData table (SSN varchar(6), Person varchar(12), Spouse varchar(12), Child varchar(12));
insert into @MyData (SSN, Person, Spouse, Child)
values
('111-11', 'John', null, null),
('222-22', 'Jane', 'Jim', 'Jack'),
('333-33', 'Jerry', null, 'Jack'),
('333-33', 'Jerry', null, 'Jill'),
('444-44', 'John', 'Judy', null);
-- one row per person; COUNT(column) skips NULLs, so a missing spouse or child adds 0
with cte as (
select count(Spouse) Spouse, count(Child) Child
from @MyData
group by SSN
)
select count(*) [Num People], sum(Spouse) [Num Spouses], sum(Child) [Num Children]
from cte;
Returns:
Num People  Num Spouses  Num Children
4           2            3
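The question talks about *blank* names, while the queries above assume NULL. Here is a minimal sketch of handling both, run against SQLite through Python's sqlite3 module as a stand-in for SQL Server (the table name `my_data` and the mix of `''` and `None` in the sample are made up for illustration). NULLIF(x, '') turns a blank into NULL, and COUNT skips NULLs; NULLIF and COUNT(DISTINCT ...) also exist in SQL Server.

```python
import sqlite3


def count_family(rows):
    """Return (people, spouses, children), treating '' the same as NULL.

    rows is an iterable of (ssn, person, spouse, child) tuples.
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE my_data (ssn TEXT, person TEXT, spouse TEXT, child TEXT)"
    )
    con.executemany("INSERT INTO my_data VALUES (?, ?, ?, ?)", rows)
    # NULLIF(x, '') turns a blank into NULL; COUNT(expr) skips NULLs, so
    # blank and NULL spouse/child values are both left out of the totals.
    # DISTINCT inside each SSN group stops a spouse who appears on several
    # child rows from being counted more than once.
    query = """
        SELECT COUNT(*), SUM(spouses), SUM(children)
        FROM (
            SELECT ssn,
                   COUNT(DISTINCT NULLIF(spouse, '')) AS spouses,
                   COUNT(DISTINCT NULLIF(child, '')) AS children
            FROM my_data
            GROUP BY ssn
        ) AS by_ssn
    """
    result = con.execute(query).fetchone()
    con.close()
    return result


# Hypothetical sample: some "missing" values are '' and some are NULL (None).
sample = [
    ("111-11", "John", "", ""),
    ("222-22", "Jane", "Jim", "Jack"),
    ("333-33", "Jerry", None, "Jack"),
    ("333-33", "Jerry", None, "Jill"),
    ("444-44", "John", "Judy", ""),
]
print(count_family(sample))  # (4, 2, 3)
```

The DISTINCT per SSN matters if someone has both a spouse and two children: with a plain COUNT(Spouse), that spouse would be counted once per child row.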

Related

ms-access 2010: count duplicate names per household address

I am currently working with a spreadsheet in MS Access 2010 which contains about 130k rows of information about people who voted in a recent local election. Each row has their residential information (street name, number, postcode etc.) and personal information (title, surname, forename, middle name, DOB etc.). Each row represents an individual person rather than a household, so in many cases the same residential address appears more than once, because more than one person lives in that household.
What I want to achieve is basically to create a new field in this dataset called 'count'. I want this field to give me a count of how many different surnames reside at a single address.
Is there an SQL script that will allow me to do this in Access 2010?
+------------------+----------+-------+---------+----------+-------------+
| PROPERTYADDRESS1 | POSTCODE | TITLE | SURNAME | FORENAME | MIDDLE_NAME |
+------------------+----------+-------+---------+----------+-------------+
| FAKEADDRESS1     | EEE 5GG  | MR    | BLOGGS  | JOE      | N           |
| FAKEADDRESS2     | EEE 5BB  | MRS   | BLOGGS  | SUZANNE  | P           |
| FAKEADDRESS3     | EEE 5RG  | MS    | SMITH   | PAULINE  | S           |
| FAKEADDRESS4     | EEE 4BV  | DR    | JONES   | ANNE     | D           |
| FAKEADDRESS5     | EEE 3AS  | MR    | TAYLOR  | STUART   | A           |
+------------------+----------+-------+---------+----------+-------------+
The following syntax has got me close so far:
SELECT COUNT(electoral.SURNAME)
FROM electoral
GROUP BY electoral.UPRN
However, instead of returning all 130k or so rows, it only returns around 67k. Is there a way to change the query so it gives the same counts but returns every single row?
Any help is greatly appreciated!
Thanks
You could use something like this:
select *,
count(surname) over (partition by householdName)
from myTable
If you have only one column that contains the full name,
e.g. Rob Adams,
then you can split the surnames out into a separate column, which makes the select easier. This, for example, returns the first word:
SELECT LEFT('HELLO WORLD',CHARINDEX(' ','HELLO WORLD')-1)
In our example, take everything after the first space:
select right(surname, len(surname) - charindex(' ', surname)) as surname
Examples of how to use CHARINDEX, LEFT and RIGHT are here:
http://social.technet.microsoft.com/wiki/contents/articles/17948.t-sql-right-left-substring-and-charindex-functions.aspx
If there are any questions, leave a comment.
EDIT: I fixed a syntax error in the query; please try it again. This works on SQL Server.
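SQLite has no CHARINDEX, LEFT or RIGHT, so this sketch (run through Python's sqlite3 module, with a made-up `people` table) uses instr() and substr() for the same split at the first space. It assumes names of the form "Forename Surname": names with no space are skipped, and "Mary Ann Smith" would come back with surname "Ann Smith".

```python
import sqlite3


def split_surname(full_names):
    """Split 'Forename Surname' strings at the first space."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE people (full_name TEXT NOT NULL)")
    con.executemany("INSERT INTO people VALUES (?)", [(n,) for n in full_names])
    # instr() plays the role of CHARINDEX, substr() the role of LEFT/RIGHT.
    # Names without a space are filtered out rather than split badly.
    query = """
        SELECT full_name,
               substr(full_name, 1, instr(full_name, ' ') - 1) AS forename,
               substr(full_name, instr(full_name, ' ') + 1) AS surname
        FROM people
        WHERE instr(full_name, ' ') > 0
        ORDER BY rowid
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


print(split_surname(["Rob Adams", "Hello World", "Cher"]))
# [('Rob Adams', 'Rob', 'Adams'), ('Hello World', 'Hello', 'World')]
```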
Here is an example:
create table #temp (id int, PropertyAddress varchar(50), surname varchar(50), forname varchar(50))
insert into #temp values
(1, 'hiddenBase', 'Adamns' , 'Kara' ),
(2, 'hiddenBase', 'Adamns' , 'Anne' ),
(3, 'hiddenBase', 'Adamns' , 'John' ),
(4, 'QueensResidence', 'Queen' , 'Oliver' ),
(5, 'QueensResidence', 'Queen' , 'Moira' ),
(6, 'superSecretBase', 'Diggle' , 'John' ),
(7, 'NandaParbat', 'Merlin' , 'Malcom' )
select * from #temp
select *,
count (surname) over (partition by PropertyAddress) as CountMembers
from #temp
gives:
id  PropertyAddress  surname  forname  CountMembers
1   hiddenBase       Adamns   Kara     3
2   hiddenBase       Adamns   Anne     3
3   hiddenBase       Adamns   John     3
7   NandaParbat      Merlin   Malcom   1
4   QueensResidence  Queen    Oliver   2
5   QueensResidence  Queen    Moira    2
6   superSecretBase  Diggle   John     1
Your query should look like this:
select *,
count (SURNAME) over (partition by PropertyAddress) as CountFamilyMembers
from electoral
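To see that the window-function version returns every row, here is a small sketch using Python's sqlite3 module with the example data from above (the table is renamed `residents` and the columns snake_cased for SQLite). It assumes an SQLite build of 3.25 or later, which is when window functions were added.

```python
import sqlite3


def count_members(rows):
    """Return every input row plus how many people share its address."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE residents "
        "(id INTEGER, property_address TEXT, surname TEXT, forname TEXT)"
    )
    con.executemany("INSERT INTO residents VALUES (?, ?, ?, ?)", rows)
    # COUNT ... OVER (PARTITION BY ...) is computed per address but, unlike
    # GROUP BY, it does not collapse the rows, so every person is returned.
    query = """
        SELECT id, property_address, surname, forname,
               COUNT(surname) OVER (PARTITION BY property_address) AS count_members
        FROM residents
        ORDER BY id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


data = [
    (1, "hiddenBase", "Adamns", "Kara"),
    (2, "hiddenBase", "Adamns", "Anne"),
    (3, "hiddenBase", "Adamns", "John"),
    (4, "QueensResidence", "Queen", "Oliver"),
    (5, "QueensResidence", "Queen", "Moira"),
    (6, "superSecretBase", "Diggle", "John"),
    (7, "NandaParbat", "Merlin", "Malcom"),
]
for row in count_members(data):
    print(row)
```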
EDIT
If OVER (PARTITION BY ...) isn't supported, you can get a per-group count with GROUP BY instead, dropping the OVER clause:
select PropertyAddress, SURNAME, -- add any other fields you need here
count (SURNAME) as CountFamilyMembers
from electoral
group by PropertyAddress, SURNAME -- list every non-aggregated field from the select, one by one; you can't write group by *
GROUP BY creates an aggregate query, so it's by design that you get fewer records (one per UPRN).
To get the count for each row in the original table, you can join the table with the aggregate query:
SELECT electoral.*, elCount.NumberOfPeople
FROM electoral
INNER JOIN
(
SELECT UPRN, COUNT(*) AS NumberOfPeople
FROM electoral
GROUP BY UPRN
) AS elCount
ON electoral.UPRN = elCount.UPRN
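Here is a sketch of the same join-to-an-aggregate idea, run through Python's sqlite3 module instead of Access, with a cut-down `electoral` table (UPRN, SURNAME, FORENAME only; rows are made up). The `number_of_surnames` column is an addition aimed at the question's "how many different surnames" wording. As far as I know, Access SQL does not accept COUNT(DISTINCT ...), so in Access you would first select the distinct UPRN/SURNAME pairs in a subquery and count those.

```python
import sqlite3


def rows_with_household_counts(rows):
    """Return every (uprn, surname, forename) row with per-UPRN counts."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE electoral (uprn INTEGER, surname TEXT, forename TEXT)")
    con.executemany("INSERT INTO electoral VALUES (?, ?, ?)", rows)
    # The derived table has one row per UPRN; joining it back to electoral
    # attaches those counts to every original row instead of collapsing them.
    query = """
        SELECT e.uprn, e.surname, e.forename,
               c.number_of_people, c.number_of_surnames
        FROM electoral AS e
        INNER JOIN (
            SELECT uprn,
                   COUNT(*) AS number_of_people,
                   COUNT(DISTINCT surname) AS number_of_surnames
            FROM electoral
            GROUP BY uprn
        ) AS c ON e.uprn = c.uprn
        ORDER BY e.uprn, e.forename
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


voters = [
    (1, "BLOGGS", "JOE"),
    (1, "BLOGGS", "SUZANNE"),
    (1, "SMITH", "ANNE"),
    (2, "JONES", "PAULINE"),
]
for row in rows_with_household_counts(voters):
    print(row)
```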
Given the updated question, I want to post another answer. Try it like this:
create table #temp2 ( PropertyAddress1 varchar(50), POSTCODE varchar(20), TITLE varchar (20),
surname varchar(50), FORENAME varchar(50), MIDDLE_NAME varchar (50) )
insert into #temp2 values
('FAKEADDRESS1', 'EEE 5GG', 'MR', 'BLOGGS', 'JOE', 'N'),
('FAKEADDRESS1', 'EEE 5BB', 'MRS', 'BLOGGS', 'SUZANNE', 'P'),
('FAKEADDRESS2', 'EEE 5RG', 'MS', 'SMITH', 'PAULINE', 'S'),
('FAKEADDRESS3', 'EEE 4BV', 'DR', 'JONES', 'ANNE', 'D'),
('FAKEADDRESS4', 'EEE 3AS', 'MR', 'TAYLOR', 'STUART', 'A')
select PropertyAddress1, surname,count (#temp2.surname) as CountADD
into #countTemp
from #temp2
group by PropertyAddress1, surname
select * from #temp2 t2
left join #countTemp ct
on t2.PropertyAddress1 = ct.PropertyAddress1 and t2.surname = ct.surname
This yields:
PropertyAddress1  POSTCODE  TITLE  surname  FORENAME  MIDDLE_NAME  PropertyAddress1  surname  CountADD
FAKEADDRESS1      EEE 5GG   MR     BLOGGS   JOE       N            FAKEADDRESS1      BLOGGS   2
FAKEADDRESS1      EEE 5BB   MRS    BLOGGS   SUZANNE   P            FAKEADDRESS1      BLOGGS   2
FAKEADDRESS2      EEE 5RG   MS     SMITH    PAULINE   S            FAKEADDRESS2      SMITH    1
FAKEADDRESS3      EEE 4BV   DR     JONES    ANNE      D            FAKEADDRESS3      JONES    1
FAKEADDRESS4      EEE 3AS   MR     TAYLOR   STUART    A            FAKEADDRESS4      TAYLOR   1

Multiple group by row extracting in SQL

Let me show you my sample data structure first.
CREATE TABLE foobar (
id int primary key,
lastname varchar(100) not null,
firstname varchar(100) not null,
val1 int not null,
val2 int not null
)
Imagine sample entries like these:
1, Smith, Bob, 1, 3
2, Smith, Bob, 2, 1
3, Smith, Allen, 3, 4
i.e. the (lastname, firstname) pairs are not unique here.
Now see the query:
select lastname, firstname, avg(val1), avg(val2)
from foobar
where lastname = 'Smith'
group by lastname, firstname
sample output:
1, Smith, Bob, 1.5, 2
2, Smith, Allen, 3, 4
Now I want two different things:
1. get me all Smiths whose avg(val1) is the biggest value of all
2. get me all Smiths whose avg(val1) is the lowest value of all
sample:
1. 3 is the biggest value, therefore get me Allen
2. 1.5 is the smallest value, therefore get me Bob
I don't know how to do this efficiently.
My approach was to save the min and max of avg() and then join back to the same table on that value, but I feel like this is a bad solution.
Is there an efficient way?
I'm not sure what you mean by "biggest value of all". An average can never be larger than the largest value it is computed from (it only equals it when all the values are equal). Did you mean something else, such as the median?
If you just mean to return the largest and smallest entries in the table, use something like this:
select lastname, firstname, max(val1), max(val2)
from foobar
where lastname = 'Smith'
group by lastname, firstname;
select lastname, firstname, min(val1), min(val2)
from foobar
where lastname = 'Smith'
group by lastname, firstname;
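For the question as asked (the person or people whose average is the highest or lowest), here is a sketch of the asker's "compute the min/max of the averages, then match on it" idea done in a single statement with a CTE. It runs on SQLite through Python's sqlite3 module; the `extreme_average` function and its `pick` parameter are hypothetical. Ties return every matching person, which fits "get me all Smiths".

```python
import sqlite3


def extreme_average(rows, lastname, pick="MAX"):
    """Return (firstname, avg1, avg2) for whoever has the MAX or MIN AVG(val1)."""
    if pick not in ("MAX", "MIN"):
        raise ValueError("pick must be 'MAX' or 'MIN'")
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE foobar (id INTEGER PRIMARY KEY, lastname TEXT NOT NULL, "
        "firstname TEXT NOT NULL, val1 INTEGER NOT NULL, val2 INTEGER NOT NULL)"
    )
    con.executemany("INSERT INTO foobar VALUES (?, ?, ?, ?, ?)", rows)
    # Compute the per-person averages once, then keep only the rows whose
    # average equals the MAX (or MIN) of those averages.
    # pick was validated above, so formatting it into the SQL text is safe.
    query = f"""
        WITH averages AS (
            SELECT firstname, AVG(val1) AS avg1, AVG(val2) AS avg2
            FROM foobar
            WHERE lastname = ?
            GROUP BY lastname, firstname
        )
        SELECT firstname, avg1, avg2
        FROM averages
        WHERE avg1 = (SELECT {pick}(avg1) FROM averages)
        ORDER BY firstname
    """
    result = con.execute(query, (lastname,)).fetchall()
    con.close()
    return result


foobar_rows = [
    (1, "Smith", "Bob", 1, 3),
    (2, "Smith", "Bob", 2, 1),
    (3, "Smith", "Allen", 3, 4),
]
print(extreme_average(foobar_rows, "Smith", "MAX"))  # [('Allen', 3.0, 4.0)]
print(extreme_average(foobar_rows, "Smith", "MIN"))  # [('Bob', 1.5, 2.0)]
```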

How to replace NULL in a result set with the last NOT NULL value in the same column?

A colleague of mine has a problem with a sql query:-
Take the following as an example, two temp tables:-
select 'John' as name,10 as value into #names
UNION ALL SELECT 'Abid',20
UNION ALL SELECT 'Alyn',30
UNION ALL SELECT 'Dave',15;
select 'John' as name,'SQL Expert' as job into #jobs
UNION ALL SELECT 'Alyn','Driver'
UNION ALL SELECT 'Abid','Case Statement';
We run the following query on the tables to give us a joined resultset:-
select #names.name, #names.value, #jobs.job
FROM #names left outer join #jobs
on #names.name = #jobs.name
name  value  job
John  10     SQL Expert
Abid  20     Case Statement
Alyn  30     Driver
Dave  15     NULL
As 'Dave' does not exist in the #jobs table, he is given a NULL value as expected.
My colleague wants to modify the query so each NULL value is given the same value as the previous entry.
So the above would be:-
name  value  job
John  10     SQL Expert
Abid  20     Case Statement
Alyn  30     Driver
Dave  15     Driver
Note that Dave is now a 'Driver'
There may be more than one NULL value in sequence:
name  value  job
John  10     SQL Expert
Abid  20     Case Statement
Alyn  30     Driver
Dave  15     NULL
Joe   15     NULL
Pete  15     NULL
In this case Dave, Joe and Pete should all be 'Driver', as 'Driver' is the last non null entry.
There are probably better ways to do this. Here is one way to achieve the result: use a Common Table Expression (CTE), then use its output in an OUTER APPLY to find the previous person's job. The query uses id to sort the records and then determines what the previous person's job was. You need at least one criterion to sort the records, because the rows in a table are considered an unordered set.
Also, the assumption is that the first person in the sequence should have a job. If the first person doesn't have a job, then there is no value to pick from.
Script:
CREATE TABLE names
(
id INT NOT NULL IDENTITY
, name VARCHAR(20) NOT NULL
, value INT NOT NULL
);
CREATE TABLE jobs
(
id INT NOT NULL
, job VARCHAR(20) NOT NULL
);
INSERT INTO names (name, value) VALUES
('John', 10),
('Abid', 20),
('Alyn', 30),
('Dave', 40),
('Jill', 50),
('Jane', 60),
('Steve', 70);
INSERT INTO jobs (id, job) VALUES
(1, 'SQL Expert'),
(2, 'Driver' ),
(5, 'Engineer'),
(6, 'Barrista');
;WITH empjobs AS
(
-- no TOP 100 PERCENT / ORDER BY here: SQL Server does not guarantee ordering inside a CTE
SELECT
n.id
, n.name
, n.value
, j.job
FROM names n
LEFT OUTER JOIN jobs j
on j.id = n.id
)
SELECT e1.id
, e1.name
, e1.value
, COALESCE(e1.job , e2.job) job FROM empjobs e1
OUTER APPLY (
-- the most recent earlier row (by id) that has a job
SELECT
TOP 1 job
FROM empjobs e2
WHERE e2.id < e1.id
AND e2.job IS NOT NULL
ORDER BY e2.id DESC
) e2
ORDER BY e1.id;
Output:
ID  NAME   VALUE  JOB
--  -----  -----  ----------
1   John   10     SQL Expert
2   Abid   20     Driver
3   Alyn   30     Driver
4   Dave   40     Driver
5   Jill   50     Engineer
6   Jane   60     Barrista
7   Steve  70     Barrista
What do you mean by "last" non-null entry? You need a well-defined ordering for "last" to have a consistent meaning. Here's a query with data definitions that uses the "value" column to define last, and that might be close to what you want.
CREATE TABLE #names
(
id INT NOT NULL IDENTITY
, name VARCHAR(20) NOT NULL
, value INT NOT NULL PRIMARY KEY
);
CREATE TABLE #jobs
(
name VARCHAR(20) NOT NULL
, job VARCHAR(20) NOT NULL
);
INSERT INTO #names (name, value) VALUES
('John', 10),
('Abid', 20),
('Alyn', 30),
('Dave', 40),
('Jill', 50),
('Jane', 60),
('Steve', 70);
INSERT INTO #jobs (name, job) VALUES
('John', 'SQL Expert'),
('Abid', 'Driver' ),
('Alyn', 'Engineer'),
('Dave', 'Barrista');
with Partial as (
select
#names.name,
#names.value,
#jobs.job as job
FROM #names left outer join #jobs
on #names.name = #jobs.name
)
select
name,
value,
(
select top 1 P.job
from Partial as P
where P.job is not null
and P.value <= Partial.value
order by P.value desc
) as job
from Partial;
It might be more efficient to insert the data, then update.
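SQLite has no OUTER APPLY, so this sketch (run through Python's sqlite3 module) replaces the TOP 1 / OUTER APPLY pair from the first answer with a correlated scalar subquery using ORDER BY ... LIMIT 1. The idea is the same: for a row without a job, take the job of the nearest earlier row (by id) that has one. Tables and data mirror the first answer's script; INTEGER PRIMARY KEY stands in for IDENTITY.

```python
import sqlite3


def fill_forward(names, jobs):
    """Return (id, name, value, job) rows with NULL jobs carried forward."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "value INTEGER NOT NULL)"
    )
    con.execute("CREATE TABLE jobs (id INTEGER NOT NULL, job TEXT NOT NULL)")
    # On an empty table, INTEGER PRIMARY KEY assigns ids 1, 2, 3, ... in
    # insertion order, which is what this sketch relies on for "previous".
    con.executemany("INSERT INTO names (name, value) VALUES (?, ?)", names)
    con.executemany("INSERT INTO jobs (id, job) VALUES (?, ?)", jobs)
    query = """
        WITH empjobs AS (
            SELECT n.id, n.name, n.value, j.job
            FROM names AS n
            LEFT OUTER JOIN jobs AS j ON j.id = n.id
        )
        SELECT e1.id, e1.name, e1.value,
               COALESCE(e1.job, (
                   SELECT e2.job
                   FROM empjobs AS e2
                   WHERE e2.id < e1.id AND e2.job IS NOT NULL
                   ORDER BY e2.id DESC
                   LIMIT 1
               )) AS job
        FROM empjobs AS e1
        ORDER BY e1.id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


people = [("John", 10), ("Abid", 20), ("Alyn", 30), ("Dave", 40),
          ("Jill", 50), ("Jane", 60), ("Steve", 70)]
known_jobs = [(1, "SQL Expert"), (2, "Driver"), (5, "Engineer"), (6, "Barrista")]
for row in fill_forward(people, known_jobs):
    print(row)
```

As the first answer notes, a first row with no job has nothing to inherit and stays NULL (None in Python).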

sql recursive function - to find managers

Let's say I have the following table:
User_ID  Manager_ID
-------------------
Linda    Jacob
Mark     Linda
Kevin    Linda
Steve    Mark
John     Kevin
Basically the requirement is to pull all the managers under the user_id you are searching for. So for instance if I send in 'Linda' then it should return me:
'Mark', 'Kevin', 'Steve', 'John'
or if I send in 'Mark' then it should return me:
Steve
I have heard of recursive function but I am unsure of how to do this. Any help would be appreciated.
Use:
WITH hierarchy AS (
SELECT t.user_id
FROM YOUR_TABLE t
WHERE t.manager_id = 'Linda'
UNION ALL
SELECT t.user_id
FROM YOUR_TABLE t
JOIN hierarchy h ON h.user_id = t.manager_id)
SELECT x.*
FROM hierarchy x
Resultset:
user_id
--------
Mark
Kevin
John
Steve
Scripts:
CREATE TABLE [dbo].[YOUR_TABLE](
[user_id] [varchar](50) NOT NULL,
[manager_id] [varchar](50) NOT NULL
)
INSERT INTO YOUR_TABLE VALUES ('Linda','Jacob')
INSERT INTO YOUR_TABLE VALUES ('Mark','Linda')
INSERT INTO YOUR_TABLE VALUES ('Kevin','Linda')
INSERT INTO YOUR_TABLE VALUES ('Steve','Mark')
INSERT INTO YOUR_TABLE VALUES ('John','Kevin')
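The same recursive CTE can be tried out with Python's sqlite3 module. One difference to be aware of: SQLite spells it WITH RECURSIVE, while SQL Server uses plain WITH. The `reports_under` function is a hypothetical wrapper; the table and data are the ones from the script above.

```python
import sqlite3


def reports_under(pairs, manager):
    """Return everyone below manager in the hierarchy, sorted by name."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE your_table (user_id TEXT NOT NULL, manager_id TEXT NOT NULL)"
    )
    con.executemany("INSERT INTO your_table VALUES (?, ?)", pairs)
    # Anchor member: direct reports of the manager.
    # Recursive member: reports of anyone already found.
    query = """
        WITH RECURSIVE hierarchy(user_id) AS (
            SELECT user_id FROM your_table WHERE manager_id = ?
            UNION ALL
            SELECT t.user_id
            FROM your_table AS t
            JOIN hierarchy AS h ON h.user_id = t.manager_id
        )
        SELECT user_id FROM hierarchy
    """
    result = sorted(row[0] for row in con.execute(query, (manager,)))
    con.close()
    return result


pairs = [("Linda", "Jacob"), ("Mark", "Linda"), ("Kevin", "Linda"),
         ("Steve", "Mark"), ("John", "Kevin")]
print(reports_under(pairs, "Linda"))  # ['John', 'Kevin', 'Mark', 'Steve']
print(reports_under(pairs, "Mark"))   # ['Steve']
```

This assumes the data has no cycles (A manages B, B manages A); with a cycle, UNION ALL keeps recursing. SQL Server stops at its default MAXRECURSION limit of 100 and raises an error.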
The code example from Recursive Queries Using Common Table Expressions on MSDN shows exactly that.

Add or delete repeated row

I have an output like this:
id name date school school1
1 john 11/11/2001 nyu ucla
1 john 11/11/2001 ucla nyu
2 paul 11/11/2011 uft mit
2 paul 11/11/2011 mit uft
I would like to achieve this:
id name date school school1
1 john 11/11/2001 nyu ucla
2 paul 11/11/2011 mit uft
I am using a direct join, as in:
select distinct
a.id, a.name,
b.date,
c.school,
a1.id, a1.name,
b1.date,
c1.school
from table a, table b, table c,table a1, table b1, table c1
where
a.id=b.id
and...
Any ideas?
We will need more information such as what your tables contain and what you are after.
One thing I noticed is that you have a school column and then a school1 column. Normalization (first normal form) says you should not duplicate columns and append numbers to them to hold more information, even if you think the relationship will only ever have 1 or 2 additional items. You need to create a second table that associates each user with one or more schools.
I agree with everyone else that both your source table and your desired output are poor design. While you probably can't do anything about your source table, I recommend the following code and output:
Select id, name, date, school from MyTable
union
Select id, name, date, school1 from MyTable;
(repeat as necessary)
This will give you results in the format:
id name date school
1 john 11/11/2001 nyu
1 john 11/11/2001 ucla
2 paul 11/11/2011 mit
2 paul 11/11/2011 uft
(Note: UNION removes duplicate rows by itself, so DISTINCT isn't needed; UNION ALL would keep the duplicates.)
With this format, you could easily count the number of schools per student, number of students per school, etc.
If processing time and/or storage space is a factor here, you could then split this into 2 tables, 1 with the id,name & date, the other with the id & school (basically what JonH just said). But if you're just working up some simple statistics, this should suffice.
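A quick way to check the UNION approach is to run it against the asker's sample output. This sketch uses Python's sqlite3 module rather than the asker's database; `MyTable` and its columns come from the answer, and `date` is quoted only to be safe as a column name.

```python
import sqlite3


def schools_as_rows(rows):
    """Turn (id, name, date, school, school1) rows into one row per school."""
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE MyTable (id INTEGER, name TEXT, "date" TEXT, '
        "school TEXT, school1 TEXT)"
    )
    con.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", rows)
    # UNION (not UNION ALL) drops the mirrored duplicates such as
    # (nyu, ucla) / (ucla, nyu), leaving each student/school pair once.
    query = """
        SELECT id, name, "date", school FROM MyTable
        UNION
        SELECT id, name, "date", school1 FROM MyTable
        ORDER BY id, school
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


asker_output = [
    (1, "john", "11/11/2001", "nyu", "ucla"),
    (1, "john", "11/11/2001", "ucla", "nyu"),
    (2, "paul", "11/11/2011", "uft", "mit"),
    (2, "paul", "11/11/2011", "mit", "uft"),
]
for row in schools_as_rows(asker_output):
    print(row)
```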
This problem was just too irresistible, so I took a guess at the data structures we are dealing with. The technology wasn't specified in the question; this is Transact-SQL.
create table student
(
id int not null primary key identity,
name nvarchar(100) not null default '',
graduation_date date not null default getdate()
)
go
create table school
(
id int not null primary key identity,
name nvarchar(100) not null default ''
)
go
create table student_school_asc
(
student_id int not null foreign key references student (id),
school_id int not null foreign key references school (id),
primary key (student_id, school_id)
)
go
insert into student (name, graduation_date) values ('john', '2001-11-11')
insert into student (name, graduation_date) values ('paul', '2011-11-11')
insert into school (name) values ('nyu')
insert into school (name) values ('ucla')
insert into school (name) values ('uft')
insert into school (name) values ('mit')
insert into student_school_asc (student_id, school_id) values (1,1)
insert into student_school_asc (student_id, school_id) values (1,2)
insert into student_school_asc (student_id, school_id) values (2,3)
insert into student_school_asc (student_id, school_id) values (2,4)
select
s.id,
s.name,
s.graduation_date as [date],
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s1 where s1.rank_num = 1) as school,
(select max(name) from
(select name,
RANK() over (order by name) as rank_num
from school sc
inner join student_school_asc ssa on ssa.school_id = sc.id
where ssa.student_id = s.id) s2 where s2.rank_num = 2) as school1
from
student s
Result:
id name date school school1
--- ----- ---------- ------- --------
1 john 2001-11-11 nyu ucla
2 paul 2011-11-11 mit uft
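A swapped-in alternative to the two correlated RANK() subqueries is conditional aggregation: number each student's schools once with ROW_NUMBER(), then pick rank 1 and rank 2 with MAX(CASE ...). This sketch runs the idea on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), reusing the answer's table design and data; it is one possible rewrite, not the answer's own query.

```python
import sqlite3


def schools_side_by_side(students, schools, links):
    """Return (id, name, date, school, school1) with schools in name order."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                              graduation_date TEXT NOT NULL);
        CREATE TABLE school (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE student_school_asc (
            student_id INTEGER NOT NULL REFERENCES student (id),
            school_id INTEGER NOT NULL REFERENCES school (id),
            PRIMARY KEY (student_id, school_id)
        );
    """)
    con.executemany(
        "INSERT INTO student (name, graduation_date) VALUES (?, ?)", students
    )
    con.executemany("INSERT INTO school (name) VALUES (?)", [(s,) for s in schools])
    con.executemany("INSERT INTO student_school_asc VALUES (?, ?)", links)
    # Number each student's schools alphabetically, then pivot rn = 1 and
    # rn = 2 into two columns; a missing second school comes back as NULL.
    query = """
        WITH ranked AS (
            SELECT ssa.student_id, sc.name,
                   ROW_NUMBER() OVER (PARTITION BY ssa.student_id
                                      ORDER BY sc.name) AS rn
            FROM student_school_asc AS ssa
            JOIN school AS sc ON sc.id = ssa.school_id
        )
        SELECT s.id, s.name, s.graduation_date AS "date",
               MAX(CASE WHEN r.rn = 1 THEN r.name END) AS school,
               MAX(CASE WHEN r.rn = 2 THEN r.name END) AS school1
        FROM student AS s
        LEFT JOIN ranked AS r ON r.student_id = s.id
        GROUP BY s.id, s.name, s.graduation_date
        ORDER BY s.id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


students = [("john", "2001-11-11"), ("paul", "2011-11-11")]
schools = ["nyu", "ucla", "uft", "mit"]
links = [(1, 1), (1, 2), (2, 3), (2, 4)]
for row in schools_side_by_side(students, schools, links):
    print(row)
```

Adding a third school column would only need one more MAX(CASE WHEN r.rn = 3 ...) line, rather than another copy of the whole subquery.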