Is there an SQL query to count the number of people alive in a particular year, given each person's date of birth and date of death? - sql

I have a table showing the name, the date of birth and the date of death of people (1900-2000). I need to know the number of people for each year in a certain period of time, for example, in 1940 the population was 2.3 billion, in 1941 2.4 billion, in 1942 2.2 billion and so on until 1950.
I work in SAS Enterprise Guide, so the code may look a little different from standard SQL. At the very least I want to see something like this:
count of people | year
2.300.000.000 | 1940
2.400.000.000 | 1941
.....................
My attempt so far:
select
count(name),
from db
where bd<1jan1940 and dd>=1jan1940 and dd=<31dec1940
group by month

First of all, you must know the initial population at the end of 1899. Let's say that was 2 billion. Then add births minus deaths for each year. (You must access the table twice in order to do this, once for births and once for deaths.) Use SUM OVER to get a running total.
I am not sure which DBMS you are actually using, but this is pretty standard SQL:
select yr, 2000000000 + sum(births.cnt - deaths.cnt) over (order by yr)
from
(
select extract(year from bd) as yr, count(*) as cnt
from db
group by extract(year from bd)
) births
join
(
select extract(year from dd) as yr, count(*) as cnt
from db
group by extract(year from dd)
) deaths using (yr)
order by yr;
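As a minimal sketch of the running-total idea, here is a version that runs on SQLite through Python's sqlite3 module (it needs SQLite 3.25 or later for window functions). The sample rows and the initial population of 10 are made up for illustration. It also swaps the join for a UNION ALL of +1/-1 events, which is my own variation: with an inner join on the year, a year that has births but no deaths (or the other way round) would drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table db (name text, bd text, dd text)")
conn.executemany(
    "insert into db values (?, ?, ?)",
    [
        ("a", "1900-03-01", "1901-06-01"),
        ("b", "1900-07-15", None),          # no date of death: still alive
        ("c", "1901-02-10", "1903-01-05"),
        ("d", "1902-11-30", None),
    ],
)

INITIAL_POPULATION = 10  # assumed population at the end of 1899

rows = conn.execute(
    """
    with events as (
        -- every birth adds one person, every death removes one
        select cast(strftime('%Y', bd) as integer) as yr, 1 as delta from db
        union all
        select cast(strftime('%Y', dd) as integer) as yr, -1 as delta from db
        where dd is not null
    ),
    per_year as (
        select yr, sum(delta) as net from events group by yr
    )
    -- running total of the yearly net change on top of the initial population
    select yr, ? + sum(net) over (order by yr) as population
    from per_year
    order by yr
    """,
    (INITIAL_POPULATION,),
).fetchall()

for yr, population in rows:
    print(yr, population)
```

Each row is the population at the end of that year.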

data dob_data;
do i = 1 to 10000;
num = ceil(rand('UNIFORM',0,10));
dob = intnx('day','01JAN1899'd,ceil(rand('UNIFORM',1,36865)));
select (num);
when (1) dod = intnx('day',dob,ceil(rand('UNIFORM',1,36865)));
otherwise dod = .;
end;
output;
end;
format dob dod date9.;
drop num;
run;
data calendar;
do i=0 to 100;
year = 1900+i;
soy = intnx('year','01JAN1900'd,i,'s');
eoy = intnx('year','01JAN1900'd,i,'e');
output;
end;
format soy eoy date9.;
run;
proc sql;
create table pop as
select year,
sum(case when DOB < soy and coalesce(DOD,'31DEC2200'd) ge soy then 1 else 0 end) as Alive_At_Start,
sum(case when DOB between soy and eoy then 1 else 0 end) as Born_During,
sum(case when coalesce(DOD,'31DEC2200'd) between soy and eoy then -1 else 0 end) as Passed,
sum(case when DOB le eoy and coalesce(DOD,'31DEC2200'd) > eoy then 1 else 0 end) as Alive_At_End
from dob_data t1, calendar t2
group by year;
quit;
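The same calendar-join idea can be sketched outside SAS. The following is only an illustration on SQLite via Python's sqlite3 module: the table name people, the ISO date strings and the four sample rows are assumptions of mine, and the far-future date stands in for a missing date of death just like '31DEC2200'd above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table people (name text, dob text, dod text);
    insert into people values
        ('a', '1938-05-01', '1941-03-10'),
        ('b', '1940-02-20', null),
        ('c', '1941-09-09', '1941-12-01'),
        ('d', '1939-01-01', '1940-06-30');
    -- one row per year with its first and last day
    create table calendar (year integer, soy text, eoy text);
    insert into calendar values
        (1940, '1940-01-01', '1940-12-31'),
        (1941, '1941-01-01', '1941-12-31'),
        (1942, '1942-01-01', '1942-12-31');
""")

rows = conn.execute("""
    select c.year,
           sum(case when p.dob < c.soy
                     and coalesce(p.dod, '9999-12-31') >= c.soy then 1 else 0 end) as alive_at_start,
           sum(case when p.dob between c.soy and c.eoy then 1 else 0 end) as born_during,
           sum(case when p.dod between c.soy and c.eoy then 1 else 0 end) as died_during,
           sum(case when p.dob <= c.eoy
                     and coalesce(p.dod, '9999-12-31') > c.eoy then 1 else 0 end) as alive_at_end
    from calendar c
    cross join people p          -- every person is checked against every year
    group by c.year
    order by c.year
""").fetchall()

for row in rows:
    print(row)
```

For every year, alive_at_start + born_during - died_during equals alive_at_end, which is a handy sanity check on the boundary conditions.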

Related

How to solve a nested aggregate function in SQL?

I'm trying to use a nested aggregate function. I know that SQL does not support it, but I need something like the query below. Basically, I want to count the number of users for each day, but only the users that have not completed an order within the 15-day window before that day and that have completed at least one order within the 30-day window before it (days 30 to 16 before the day, as in the query). A regular subquery does not seem to solve this, because its values cannot change for each date. The "id" and "state" attributes belong to the orders. I'm using Fivetran with Snowflake.
SELECT
db.created_at::date as Date,
count(case when
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-15,Date) and dateadd(day,-1,Date)) then db.id end)
= 0) and
(count(case when (db.state = 'finished')
and (db.created_at::date between dateadd(day,-30,Date) and dateadd(day,-16,Date)) then db.id end)
> 0) then db.user end)
FROM
data_base as db
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
In other words, I want to transform the query below so that "current_date" changes for each date.
WITH completed_15_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-15,current_date) and dateadd(day,-1,current_date)
group by User
),
completed_16_days_before AS (
select
db.user as User,
count(case when db.state = 'finished' then db.id end) as Completed
from
data_base as db
where
db.created_at::date between dateadd(day,-30,current_date) and dateadd(day,-16,current_date)
group by User
)
SELECT
date(db.created_at) as Date,
count(distinct case when comp_15.Completed = 0 and comp_16.Completed > 0 then comp_15.user end) as "Total Users Churn",
count(distinct case when comp_15.Completed > 0 then comp_15.user end) as "Total Users Active",
week(Date) as Week
FROM
data_base as db
left join completed_15_days_before as comp_15 on comp_15.user = db.user
left join completed_16_days_before as comp_16 on comp_16.user = db.user
WHERE
db.created_at::date between '2020-01-01' and dateadd(day,-1,current_date)
GROUP BY Date
Does anyone have a clue on how to solve this puzzle? Thank you very much!
The following should give you roughly what you want. It is difficult to test without sample data, but it should be a good enough starting point that you can amend to get exactly what you need.
I've commented the code to explain what each section is doing.
-- set parameter for the first date you want to generate the resultset for
set start_date = TO_DATE('2020-01-01','YYYY-MM-DD');
-- calculate the number of days between the start_date and the current date
set num_days = (Select datediff(day, $start_date , current_date()+1));
--generate a list of all the dates from the start date to the current date
-- i.e. every date that needs to appear in the resultset
WITH date_list as (
select
dateadd(
day,
'-' || row_number() over (order by null),
dateadd(day, '+1', current_date())
) as date_item
from table (generator(rowcount => ($num_days)))
)
--Create a list of all the orders that are in scope
-- i.e. 30 days before the start_date up to the current date
-- amend WHERE clause to in/exclude records as appropriate
,order_list as (
SELECT created_at, rt_id
from data_base
where created_at between dateadd(day,-30,$start_date) and current_date()
and state = 'finished'
)
SELECT dl.date_item
,COUNT (DISTINCT ol30.RT_ID) AS USER_COUNT
,COUNT (ol30.RT_ID) as ORDER_COUNT
FROM date_list dl
-- get all orders between -30 and -16 days of each date in date_list
left outer join order_list ol30 on ol30.created_at between dateadd(day,-30,dl.date_item) and dateadd(day,-16,dl.date_item)
-- exclude records that have the same RT_ID as in the ol30 dataset but have a date between 0 and -15 days of the date in date_list
WHERE NOT EXISTS (SELECT ol15.RT_ID
FROM order_list ol15
WHERE ol30.RT_ID = ol15.RT_ID
AND ol15.created_at between dateadd(day,-15,dl.date_item) and dl.date_item)
GROUP BY dl.date_item
ORDER BY dl.date_item;
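To make the per-date logic easier to try out, here is a small sketch of the same pattern on SQLite through Python's sqlite3 module rather than Snowflake. The orders table, its user_id column and the sample rows are hypothetical, and a recursive CTE replaces the generator. I have also put the NOT EXISTS inside the join condition instead of the WHERE clause, so that a date on which every candidate user is excluded still comes back with a count of 0; treat that as an assumption about what the report should show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table orders (user_id integer, created_at text, state text);
    insert into orders values
        (1, '2020-01-01', 'finished'),
        (2, '2020-01-01', 'finished'),
        (2, '2020-01-10', 'finished'),
        (3, '2020-01-02', 'finished');
""")

rows = conn.execute("""
    with recursive date_list(date_item) as (
        -- one row per report date
        select date('2020-01-16')
        union all
        select date(date_item, '+1 day') from date_list
        where date_item < '2020-01-18'
    ),
    finished as (
        select user_id, date(created_at) as d
        from orders
        where state = 'finished'
    )
    select dl.date_item,
           count(distinct f30.user_id) as churned_users
    from date_list dl
    -- users with a finished order 30 to 16 days before the report date ...
    left join finished f30
           on f30.d between date(dl.date_item, '-30 days')
                        and date(dl.date_item, '-16 days')
          -- ... and no finished order 15 to 1 days before it
          and not exists (
              select 1
              from finished f15
              where f15.user_id = f30.user_id
                and f15.d between date(dl.date_item, '-15 days')
                              and date(dl.date_item, '-1 day'))
    group by dl.date_item
    order by dl.date_item
""").fetchall()

for date_item, churned_users in rows:
    print(date_item, churned_users)
```

User 2 never counts as churned here because the order on 2020-01-10 falls inside the 15-day window of every report date.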

List all months with a total regardless of null

I have a very small SQL table that lists courses attended and the date of attendance. I can use the code below to count the attendees for each month
select to_char(DATE_ATTENDED,'YYYY/MM'),
COUNT (*)
FROM TRAINING_COURSE_ATTENDED
WHERE COURSE_ATTENDED = 'Fire Safety'
GROUP BY to_char(DATE_ATTENDED,'YYYY/MM')
ORDER BY to_char(DATE_ATTENDED,'YYYY/MM')
This returns a list as expected for each month that has attendees. However I would like to list it as
January 2
February 0
March 5
How do I show the count results along with the nulls? My table is very basic
1234 01-JAN-15 Fire Safety
108 01-JAN-15 Fire Safety
1443 02-DEC-15 Healthcare
1388 03-FEB-15 Emergency
1355 06-MAR-15 Fire Safety
1322 09-SEP-15 Fire Safety
1234 11-DEC-15 Fire Safety
I just need to display each month and the total attendees for Fire Safety only. Not used SQL developer for a while so any help appreciated.
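You would need a calendar table to select the period you want to display. The course filter has to go in the join condition (in the WHERE clause it would discard the months without attendees), and the count has to be taken over a column of the attendance table so that empty months return 0. Simplified code would look like this:
select to_char(c.Date_dt,'YYYY/MM')
, COUNT (tca.DATE_ATTENDED)
FROM calendar c
left join TRAINING_COURSE_ATTENDED tca
on tca.DATE_ATTENDED = c.Date_dt
and tca.COURSE_ATTENDED = 'Fire Safety'
WHERE c.Date_dt between [period_start_dt] and [period_end_dt]
GROUP BY to_char(c.Date_dt,'YYYY/MM')
ORDER BY to_char(c.Date_dt,'YYYY/MM')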
You can generate the required set of year-months on the fly with a count of 0 and combine it with the real counts, as in the query below.
Select yrmth,sum(counter) from
(
select to_char(date_attended,'YYYYMM') yrmth,
COUNT (1) counter
From TRAINING_COURSE_ATTENDED Where COURSE_ATTENDED = 'Fire Safety'
Group By to_char(date_attended,'YYYYMM')
Union All
Select To_Char(2015||Lpad(Rownum,2,0)),0 from Dual Connect By Rownum <= 12
)
group by yrmth
order by 1
If you want to show multiple years, just change the second query to
Select To_Char(Year||Lpad(Month,2,0)) , 0
From
(select Rownum Month from Dual Connect By Rownum <= 12),
(select 2015+Rownum-1 Year from Dual Connect By Rownum <= 3)
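Try this (it lists every month that has at least one row for any course, with 0 where nobody attended Fire Safety):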
SELECT Trunc(date_attended, 'MM') Month,
Sum(CASE
WHEN course_attended = 'Fire Safety' THEN 1
ELSE 0
END) Fire_Safety
FROM training_course_attended
GROUP BY Trunc(date_attended, 'MM')
ORDER BY Trunc(date_attended, 'MM')
Another way to generate a calendar table inline:
with calendar (month_start, month_end) as
( select add_months(date '2014-12-01', rownum)
, add_months(date '2014-12-01', rownum +1) - interval '1' second
from dual
connect by rownum <= 12 )
select to_char(c.month_start,'YYYY/MM') as course_month
, count(tca.course_attended) as attended
from calendar c
left join training_course_attended tca
on tca.date_attended between c.month_start and c.month_end
and tca.course_attended = 'Fire Safety'
group by to_char(c.month_start,'YYYY/MM')
order by 1;
(You could also have only the month start in the calendar table, and join on trunc(tca.date_attended,'MONTH') = c.month_start, though if you had indexes or partitioning on tca.date_attended that might be less efficient.)
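For anyone who wants to experiment with the calendar-plus-left-join pattern without an Oracle instance, here is a rough equivalent on SQLite via Python's sqlite3 module. It uses the sample rows from the question with the dates rewritten in ISO format; the column name attendee_id is my own guess for the unnamed first column, and a recursive CTE stands in for CONNECT BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table training_course_attended (
        attendee_id integer, date_attended text, course_attended text);
    insert into training_course_attended values
        (1234, '2015-01-01', 'Fire Safety'),
        (108,  '2015-01-01', 'Fire Safety'),
        (1443, '2015-12-02', 'Healthcare'),
        (1388, '2015-02-03', 'Emergency'),
        (1355, '2015-03-06', 'Fire Safety'),
        (1322, '2015-09-09', 'Fire Safety'),
        (1234, '2015-12-11', 'Fire Safety');
""")

rows = conn.execute("""
    with recursive calendar(month_start) as (
        -- twelve month-start dates for 2015
        select date('2015-01-01')
        union all
        select date(month_start, '+1 month') from calendar
        where month_start < '2015-12-01'
    )
    select strftime('%Y/%m', c.month_start) as course_month,
           count(tca.attendee_id)           as attended   -- counts only matched rows
    from calendar c
    left join training_course_attended tca
           on strftime('%Y-%m', tca.date_attended) = strftime('%Y-%m', c.month_start)
          and tca.course_attended = 'Fire Safety'          -- filter in ON, not WHERE
    group by c.month_start
    order by c.month_start
""").fetchall()

counts = dict(rows)
for course_month, attended in rows:
    print(course_month, attended)
```

The two details that make the zero rows appear are the filter sitting in the ON clause and the count being taken over a column of the joined table instead of count(*).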

Not Exists query

I'm trying to find the clients who did not order in the last 2 years but ordered more than 500 this year. I wrote this query using a NOT EXISTS condition, but it still shows the wrong results.
Some suggestions would be appreciated.
My code:
SELECT
"Sales"."Kundennummer" as 'Neuer Kunde',
year("Sales"."Datum"),
sum("Sales"."Umsatz mit Steuer") as "Umsatz"
FROM "Sales"
WHERE year("Sales"."Datum") = '2017'
AND NOT EXISTS
(
SELECT "Sales"."Kundennummer"
FROM "Sales"
WHERE year("Sales"."Datum") = '2015'
AND year("Sales"."Datum") = '2016'
)
GROUP BY
"Sales"."Kundennummer",
"Sales"."Datum"
HAVING sum("Sales"."Umsatz mit Steuer") > 500
The query in the NOT EXISTS clause will probably yield 0 rows, since a row can't have Datum both 2015 and 2016. So it should probably be OR instead of AND.
Also, if you fix this, there is no link between the subquery and the superquery, which means that it will return rows for any customer (given that there exists a row with Datum either 2015 or 2016 in your table which I guess it does).
So, something like:
SELECT
sales."Kundennummer" as 'Neuer Kunde',
year(sales."Datum"),
sum(sales."Umsatz mit Steuer") as "Umsatz"
FROM "Sales" sales
WHERE year(sales."Datum") = '2017'
AND NOT EXISTS
(
SELECT salesI."Kundennummer"
FROM "Sales" salesI
WHERE salesI."Kundennummer" = sales."Kundennummer"
AND (year(salesI."Datum") = '2015'
OR year(salesI."Datum") = '2016')
)
GROUP BY
sales."Kundennummer",
year(sales."Datum")
HAVING sum(sales."Umsatz mit Steuer") > 500
Your EXISTS query is not correlated to the main query, i.e. it doesn't look for data for the Kundennummer in question, but whether there are any records in 2015 and 2016.
(You also have the condition for the years wrong: it uses AND where it must be OR. You should not put quotes around numbers like '2015', and you should not use single quotes for names like 'Neuer Kunde'.)
It should be
AND NOT EXISTS
(
SELECT *
FROM Sales other_years
WHERE other_years.Kundennummer = Sales.Kundennummer
AND year(other_years.Datum) in (2015, 2016)
)
or uncorrelated with NOT IN
AND Kundennummer NOT IN
(
SELECT Kundennummer
FROM Sales
WHERE year(Datum) in (2015, 2016)
)
Be aware though, that when using NOT IN the subquery must return no nulls. E.g. where 3 not in (1, 2, null) does not result in true, as one might expect, because the DBMS argues that the unknown value (null) might very well be a 3 :-)
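The NULL caveat is easy to reproduce. Below is a small sketch on SQLite via Python's sqlite3 module; the lower-case table and column names, the ISO dates and the sample rows (including one sale with a missing customer number) are assumptions made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (kundennummer integer, datum text, umsatz real);
    insert into sales values
        (1,    '2017-03-01', 600),
        (2,    '2017-04-01', 700),
        (2,    '2016-05-01', 100),
        (null, '2015-06-01', 50);   -- a 2015 sale with no customer number
""")

# NOT IN: the subquery returns (2, NULL), so "1 NOT IN (2, NULL)" is unknown
not_in = conn.execute("""
    select kundennummer from sales
    where strftime('%Y', datum) = '2017'
      and kundennummer not in (
          select kundennummer from sales
          where strftime('%Y', datum) in ('2015', '2016'))
""").fetchall()

# Correlated NOT EXISTS: the NULL row never matches, so customer 1 is kept
not_exists = conn.execute("""
    select s.kundennummer from sales s
    where strftime('%Y', s.datum) = '2017'
      and not exists (
          select 1 from sales o
          where o.kundennummer = s.kundennummer
            and strftime('%Y', o.datum) in ('2015', '2016'))
""").fetchall()

print("NOT IN:    ", not_in)
print("NOT EXISTS:", not_exists)
```

With NOT IN the result is empty, while NOT EXISTS returns customer 1. If you prefer NOT IN, adding "and kundennummer is not null" to the subquery avoids the problem.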
Here are three different ways to do it:
Joining 2 tables
select this_year_sales.kundenummer, this_year_sales.tot_umsatz
from (select sum(umsatz) tot_umsatz, kundenummer from sales where extract(year from (datum)) = extract(year from sysdate) group by kundenummer) this_year_sales
, (select kundenummer, max(datum) max_datum from sales where datum < trunc(sysdate, 'year') group by kundenummer) previous_sales
where this_year_sales.kundenummer = previous_sales.kundenummer
and extract(year from previous_sales.max_datum) < (extract(year from sysdate)-2)
and this_year_sales.tot_umsatz > 500;
Using NOT IN
select kundenummer, sum(umsatz)
from sales s
where extract(year from datum) = extract(year from sysdate)
and kundenummer not in (select kundenummer from sales where extract(year from datum) >= (extract(year from sysdate) - 2) and extract(year from datum) < extract(year from sysdate))
group by kundenummer
having sum(umsatz) > 500;
Using NOT EXISTS
select kundenummer, sum(umsatz)
from sales s
where extract(year from datum) = extract(year from sysdate)
and not exists(
select s1.kundenummer, s1.datum from sales s1 where extract (year from s1.datum) >= (extract(year from sysdate)-2) and extract(year from s1.datum) < extract (year from sysdate) and s1.kundenummer = s.kundenummer
)
group by kundenummer
having sum(umsatz) > 500;

Need to add total # of orders to summary query

In the following query, I need to add the total number of orders per order type (IHORDT). I tried count(t01.ihordt), but it is not valid. I need this order total to get the average amount per order.
Data expected:
Current:
IHORDT | current year | previous year
RTR | 100,000 | 90,000
INT | 2,000,000 | 1,500,000
New change: add one column to the above:
IHORDT | Total orders
RTR | 100
INT | 1000
SELECT T01.IHORDT
-- summarize by current year and previous year
,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS LastYear
,SUM( CASE WHEN YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP)
THEN (T02.IDSHP#*T02.IDNTU$) ELSE 0 END) AS CurYear
FROM ASTDTA.OEINHDIH
T01 INNER JOIN
ASTDTA.OEINDLID T02
ON T01.IHORD# = T02.IDORD#
WHERE T01.IHORDT in ('RTR', 'INT')
--------------------------------------------------------
AND ( YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP) - 1
OR YEAR(IHDOCD) = YEAR(CURRENT TIMESTAMP))
GROUP BY T01.IHORDT
To receive a count of records in a group you need to use count(*).
So here is a generic example:
select order_type,
sum(order_amount) as total_sales,
count(*) as number_of_orders
from order_header
group by order_type;
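One thing to watch when the header table is joined to a detail table, as in the question: count(*) then counts detail lines, not orders. Assuming the goal is the number of distinct orders, a COUNT(DISTINCT ...) on the order number is one way to get it. Here is a sketch on SQLite via Python's sqlite3 module; the tables order_header and order_detail and their columns are hypothetical stand-ins for OEINHDIH and OEINDLID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table order_header (order_no integer, order_type text);
    create table order_detail (order_no integer, qty integer, unit_price real);
    insert into order_header values (1, 'RTR'), (2, 'RTR'), (3, 'INT');
    insert into order_detail values
        (1, 2, 10.0), (1, 1, 5.0),     -- order 1 has two detail lines
        (2, 4, 10.0),
        (3, 1, 100.0), (3, 3, 50.0);   -- order 3 has two detail lines
""")

rows = conn.execute("""
    select h.order_type,
           sum(d.qty * d.unit_price)  as total_sales,
           count(*)                   as detail_lines,      -- one per joined row
           count(distinct h.order_no) as number_of_orders,  -- one per order
           sum(d.qty * d.unit_price) / count(distinct h.order_no) as avg_per_order
    from order_header h
    join order_detail d on d.order_no = h.order_no
    group by h.order_type
    order by h.order_type
""").fetchall()

for row in rows:
    print(row)
```

For RTR this gives 3 detail lines but only 2 orders, so the average per order is 65 / 2 = 32.5, not 65 / 3.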

Group report together, possibly with SQL?

I have a table called Register which contains the following fields:
Date, AMPM, Mark.
A day can have two records. It's fairly easy to select and display all the records in a list ordered by date ascending.
What I would like to do is display the data as a grid. Something along the lines of.
| Mon | Tues| Wed| Thurs| Fri | Sat
9/8/2014 | /\ | /P | /\ | L | /\ | /
Have a week beginning and then group the 5 together. I'm not even sure sql is the best option for this, but the groupby commands seem to suggest it may be able to do this.
The Data structure is as follows.
Date, AMPM, Mark
9/8/2014, AM, /
9/8/2014, PM, \
9/9/2014, AM, /
9/9/2014, PM, P
9/10/2014, AM, /
9/10/2014, PM, \
9/11/2014, PM, L
....
The mark field can contain a number of letters. P for instance means they are participating in a sporting activity. L means they were late.
Does anyone have any resources that could point me in the right direction? I'm not even sure what this type of report is called, or whether I should be using SQL or JavaScript to group this data into a presentable format. The / and \ represent the AM and PM marks.
The following query would get you the desired result. If you need Sunday also, you'll have to add a small condition to test for when days_after_last_Monday = 6 in the CASE statement.
select
last_Monday Week_Starting,
max(
case
when days_after_last_Monday = 0 then mark
else null
end) Mon, --if the # of days between previous Monday and reg_date is zero, then get the according mark
max(
case
when days_after_last_Monday = 1 then mark
else null
end) Tues,
max(
case
when days_after_last_Monday = 2 then mark
else null
end) Wed,
max(
case
when days_after_last_Monday = 3 then mark
else null
end) Thurs,
max(
case
when days_after_last_Monday = 4 then mark
else null
end) Fri,
max(
case
when days_after_last_Monday = 5 then mark
else null
end) Sat
from
(
select
reg_date,
last_Monday,
julianday(reg_date) - julianday(last_Monday) as days_after_last_monday, --determine the number of days between previous Monday and reg_date
mark
from
(
select
reg_date,
case
when cast (strftime('%w', reg_date) as integer) = 1 then date(reg_date, 'weekday 1')
else date(reg_date, 'weekday 1', '-7 days')
end last_monday, --determine the date of previous Monday
mark
from
(
select
reg_date,
group_concat(mark, '') mark --concatenate am and pm marks for each reg_date
from
(
SELECT
reg_date,
ampm,
mark
FROM register
order by reg_date, ampm --order by ampm so that am rows are selected before pm
)
group by reg_date
)
)
)
group by last_Monday
order by last_Monday;
SQL Fiddle demo
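If the fiddle is not available, the pivot can be tried locally. This is a condensed sketch of the same idea on SQLite via Python's sqlite3 module, not the exact query above: it assumes reg_date is stored as an ISO date string, picks the AM and PM marks explicitly instead of relying on the order of group_concat, and computes the Monday of each week with date(reg_date, '-6 days', 'weekday 1').

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table register (reg_date text, ampm text, mark text)")
conn.executemany(
    "insert into register values (?, ?, ?)",
    [
        ("2014-09-08", "AM", "/"), ("2014-09-08", "PM", "\\"),
        ("2014-09-09", "AM", "/"), ("2014-09-09", "PM", "P"),
        ("2014-09-10", "AM", "/"), ("2014-09-10", "PM", "\\"),
        ("2014-09-11", "PM", "L"),
    ],
)

days = ["mon", "tues", "wed", "thurs", "fri", "sat"]
# strftime('%w') returns '0' for Sunday, '1' for Monday, ... '6' for Saturday
pivot_cols = ",\n           ".join(
    f"max(case when strftime('%w', reg_date) = '{n}' then mark end) as {day}"
    for n, day in enumerate(days, start=1)
)

rows = conn.execute(f"""
    with daily as (
        -- one row per day: AM mark followed by PM mark
        select reg_date,
               coalesce(max(case when ampm = 'AM' then mark end), '') ||
               coalesce(max(case when ampm = 'PM' then mark end), '') as mark
        from register
        group by reg_date
    )
    -- going back 6 days and then forward to the next Monday yields
    -- the Monday on or before reg_date
    select date(reg_date, '-6 days', 'weekday 1') as week_starting,
           {pivot_cols}
    from daily
    group by 1
    order by 1
""").fetchall()

for row in rows:
    print(row)
```

Days without any record come back as None (NULL), which the presentation layer can render as an empty cell.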