Measure population on several dates - sql

I want to measure the population of our municipality (which consists of several places). I've got two tables: the first is a calendar table (Calender) with a row for the first day of every month.
The second table contains all the people who live or have lived in the municipality.
What I want is the population of each place on every first day of the month from my Calender table. I've put some sample data below (just a few records from the persons table, because it contains 100,000 records).
Calender table:
+----------+
| Date |
+----------+
| 1-1-2018 |
+----------+
| 1-2-2018 |
+----------+
| 1-3-2018 |
+----------+
| 1-4-2018 |
+----------+
Persons table
+-----+-----------+-----------+---------------+-------+
| BSN | Startdate | Enddate | Date of death | Place |
+-----+-----------+-----------+---------------+-------+
| 1 | 12-1-2000 | null | null | A |
+-----+-----------+-----------+---------------+-------+
| 2 | 10-5-2011 | null | 22-1-2018 | B |
+-----+-----------+-----------+---------------+-------+
| 3 | 16-12-2011| 10-2-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
| 4 | 9-11-2012 | null | null | B |
+-----+-----------+-----------+---------------+-------+
| 5 | 8-9-2013 | null | 27-3-2018 | A |
+-----+-----------+-----------+---------------+-------+
| 6 | 7-10-2017 | 28-3-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
My expected result:
+----------+-------+------------+
| Date | Place | Population |
+----------+-------+------------+
| 1-1-2018 | A | 2 |
+----------+-------+------------+
| 1-1-2018 | B | 4 |
+----------+-------+------------+
| 1-2-2018 | A | 2 |
+----------+-------+------------+
| 1-2-2018 | B | 3 |
+----------+-------+------------+
| 1-3-2018 | A | 2 |
+----------+-------+------------+
| 1-3-2018 | B | 2 |
+----------+-------+------------+
| 1-4-2018 | A | 1 |
+----------+-------+------------+
| 1-4-2018 | B | 1 |
+----------+-------+------------+
What I've tried so far, which doesn't seem to work:
SELECT a.Place
,c.Date
,(SELECT COUNT(DISTINCT(b.BSN))
FROM Person as b
WHERE b.Startdate < c.Date
AND (b.Enddate > c.Date OR b.Enddate is null)
AND (b.Date of death > c.Date OR b.Date of death is null)
AND a.Place = b.Place) as Population
FROM Person as a
JOIN Calender as c
ON a.Startdate <= c.Date
AND a.Enddate >= c.Date
GROUP BY Place, Date
I hope someone can help me find the problem. Thanks in advance.

First cross join Calender and the places to get the date/place pairs. Then left join the persons on the place and the date. Finally group by date and place to get the count of people for that day and place.
SELECT [ca].[Date],
[pl].[Place],
count([pe].[Place]) [Population]
FROM [Calender] [ca]
CROSS JOIN (SELECT DISTINCT
[pe].[Place]
FROM [Persons] [pe]) [pl]
LEFT JOIN [Persons] [pe]
ON [pe].[Place] = [pl].[Place]
AND [pe].[Startdate] <= [ca].[Date]
AND (coalesce([pe].[Enddate],
[pe].[Date of death]) IS NULL
OR coalesce([pe].[Enddate],
[pe].[Date of death]) > [ca].[Date])
GROUP BY [ca].[Date],
[pl].[Place]
ORDER BY [ca].[Date],
[pl].[Place];
Some notes and assumptions:
If you have a table listing the places, use that instead of the subquery aliased [pl]. I just had no other option with the given tables.
I believe the Date of death also implies an Enddate for the same day. You might want to consider a trigger, that sets the Enddate automatically to the Date of death if it isn't null. That would make things easier and probably more consistent.
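To check the approach above against the sample data, here is a minimal sketch that loads the rows into an in-memory SQLite database from Python and runs the same cross join / left join / group by. SQLite and Python are assumptions made only to keep the example self-contained (the bracket syntax in the answer suggests SQL Server). The dates are rewritten as ISO strings so they compare correctly as text, and DateOfDeath is a renamed stand-in for [Date of death].

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calender (Date TEXT);
CREATE TABLE Persons (BSN INTEGER, Startdate TEXT, Enddate TEXT,
                      DateOfDeath TEXT, Place TEXT);
INSERT INTO Calender VALUES
  ('2018-01-01'), ('2018-02-01'), ('2018-03-01'), ('2018-04-01');
INSERT INTO Persons VALUES
  (1, '2000-01-12', NULL,         NULL,         'A'),
  (2, '2011-05-10', NULL,         '2018-01-22', 'B'),
  (3, '2011-12-16', '2018-02-10', NULL,         'B'),
  (4, '2012-11-09', NULL,         NULL,         'B'),
  (5, '2013-09-08', NULL,         '2018-03-27', 'A'),
  (6, '2017-10-07', '2018-03-28', NULL,         'B');
""")

# Every date/place pair first, then the persons living there on that date.
rows = conn.execute("""
SELECT ca.Date, pl.Place, COUNT(pe.BSN) AS Population
FROM Calender ca
CROSS JOIN (SELECT DISTINCT Place FROM Persons) pl
LEFT JOIN Persons pe
  ON pe.Place = pl.Place
 AND pe.Startdate <= ca.Date
 AND (COALESCE(pe.Enddate, pe.DateOfDeath) IS NULL
      OR COALESCE(pe.Enddate, pe.DateOfDeath) > ca.Date)
GROUP BY ca.Date, pl.Place
ORDER BY ca.Date, pl.Place
""").fetchall()

for row in rows:
    print(row)
```

With the six sample persons this gives the same counts as the expected result in the question. Because of the left join, a place with nobody living there on a given date would still show up with a population of 0.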

Related

How to get the soonest date in relation to another date field

Say I have a date field in one table (table a):
+---------+------------+
| item_id | Date |
+---------+------------+
| 12333 | 10/12/2020 |
+---------+------------+
| 45678 | 10/12/2020 |
+---------+------------+
Then I have another table with another date, and it joins to the table above as so (they join on the primary key of table b):
+-------------+------------+-----------+------------+
| primary_key | date2 | item_id | Date |
| (table b) | (table b) | (table a) | (table a) |
+-------------+------------+-----------+------------+
| 45318 | 10/10/2020 | 12333 | 10/12/2020 |
+-------------+------------+-----------+------------+
| 45318 | 10/13/2020 | 12333 | 10/12/2020 |
+-------------+------------+-----------+------------+
| 45318 | 10/24/2020 | 12333 | 10/12/2020 |
+-------------+------------+-----------+------------+
| 75394 | 10/20/2020 | 45678 | 10/12/2020 |
+-------------+------------+-----------+------------+
You see the last column is from table a. I want to get table b's "date2" column to give me the soonest date after 10/12/2020, and remove the rest.
So for the example of 45318, I want to keep the second line only (the one that is 10/13/2020) since that is the soonest date after 10/12/2020.
If this doesn't make sense, let me know and I will fix it!
One method is apply:
select a.*, b.* -- or whatever columns you want
from a outer apply
(select top (1) b.*
from b
where b.item_id = a.item_id and
b.date2 >= '2020-10-12'
order by b.date2 asc
) b;
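If OUTER APPLY is not available in your database, a correlated subquery with MIN can do the same job. Below is a minimal sketch using Python's sqlite3 module. The table layout is an assumption based on the question (b carries item_id, as in the query above, and b_key is a made-up name for the key column), and the comparison uses a.Date instead of a hard-coded literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (item_id INTEGER, Date TEXT);
CREATE TABLE b (b_key INTEGER, item_id INTEGER, date2 TEXT);
INSERT INTO a VALUES (12333, '2020-10-12'), (45678, '2020-10-12');
INSERT INTO b VALUES
  (45318, 12333, '2020-10-10'),
  (45318, 12333, '2020-10-13'),
  (45318, 12333, '2020-10-24'),
  (75394, 45678, '2020-10-20');
""")

# For each row of a, take the earliest date2 that is strictly after a.Date.
rows = conn.execute("""
SELECT a.item_id,
       a.Date,
       (SELECT MIN(b.date2)
          FROM b
         WHERE b.item_id = a.item_id
           AND b.date2 > a.Date) AS next_date
FROM a
ORDER BY a.item_id
""").fetchall()

for row in rows:
    print(row)
```

The scalar subquery only returns one column. If you need several columns from the matching row of b, the apply version (or a join back on the found date) is the better fit.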

Remove Duplicate Result on Query

Could you help me solve this duplication problem? My query returns more than one row for the same record. I want only one row per id, with only the latest history entry of each record.
My Query:
SELECT DISTINCT ON(tickets.ticket_id,ticket_histories.created_at)
ticket.id AS ticket_id,
tickets.priority,
tickets.title,
tickets.company,
tickets.ticket_statuse,
tickets.created_at AS created_ticket,
group_user.id AS group_id,
group_user.name AS user_group,
ch_history.description AS ch_description,
ch_history.created_at AS ch_history
FROM
tickets
INNER JOIN company ON (company.id = tickets.company_id)
INNER JOIN (SELECT id,
tickets_id,
description,
user_id,
MAX(tickets.created_at) AS created_ticket
FROM
ch_history
GROUP BY id,
created_at,
ticket_id,
user_id,
description
ORDER BY created_at DESC LIMIT 1) AS ch_history ON (ch_history.ticket_id = ticket.id)
INNER JOIN users ON (users.id = ch_history.user_id)
INNER JOIN group_users ON (group_users.id = users.group_user_id)
WHERE company = 15
GROUP BY
tickets.id,
ch_history.created_at DESC;
Result of my query: it returns 3 or 5 rows with identical ids but different histories.
I want to return only one row per ticket id, with only the last recorded history of each ticket.
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:35.724485 | SAME COMPANY | people | 5 | TEST 1 | 2019-12-10 09:31:45.780667
49706 | 2 | INCLUDE DATA | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 2 | 2019-12-10 09:38:52.769515
49706 | 2 | ANY TITLE | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 3 | 2019-12-10 09:39:22.779473
49706 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TESTE 4 | 2019-12-10 09:42:59.50332
49706 | 2 | WHITESTRIPES | 1 | f | 2019-12-09 09:16:35.320708 | SAME COMPANY | people | 5 | TEST 5 | 2019-12-10 09:44:30.675434
I want it to return the following:
ticket_id | priority | title | company_id | ticket_statuse | created_ticket | company | user_group | group_id | ch_description | ch_history
-----------+------------+--------------------------------------+------------+-----------------+----------------------------+------------------------------------------------------+-----------------+----------+------------------------+----------------------------
49713 | 2 | REMOVE DATA | 1 | t | 2019-12-09 17:50:10.724485 | SAME COMPANY | people | 5 | TEST 1 | 2020-01-01 18:31:45.780667
49707 | 2 | INCLUDE DATA | 1 | f | 2019-12-11 19:22:21.320701 | SAME COMPANY | people | 5 | TEST 2 | 2020-02-05 16:38:52.769515
49708 | 2 | ANY TITLE | 1 | f | 2019-12-15 07:15:57.320950 | SAME COMPANY | people | 5 | TEST 3 | 2020-02-06 07:39:22.779473
49709 | 2 | NOTING ELSE MAT | 1 | f | 2019-12-16 08:30:28.320881 | SAME COMPANY | people | 5 | TESTE 4 | 2020-01-07 11:42:59.50332
49701 | 2 | WHITESTRIPES | 1 | f | 2019-12-21 11:04:00.320450 | SAME COMPANY | people | 5 | TEST 5 | 2020-01-04 10:44:30.675434
As shown above, the ch_description and ch_history fields should contain only the most recent record for each listed ticket, without duplicates. Could you help me get this result?
Two things jump out at me:
You have listed "created at" as part of your "distinct on", which is inherently going to give you multiple rows per ticket id (unless there happens to be only one history row).
The distinct on should make the subquery on the ticket history unnecessary... and even if you chose to do it this way, you again are going on the "created at" column, which will give you multiple results. The ideal subquery, should you choose this approach, would have been to group by ticket_id and only ticket_id.
Slightly related:
An alternative approach to the subquery would be an analytic function (windowing function), but I'll save that for another day.
I think the query you want, which will give you one row per ticket_id, based on the history table's created_at field would be something like this:
select distinct on (t.id)
<your fields here>
from
tickets t
join company c on t.company_id = c.id
join ch_history ch on ch.ticket_id = t.id
join users u on ch.user_id = u.id
join group_users g on u.group_user_id = g.id
where
company = 15
order by
t.id, ch.created_at desc -- this is what tells distinct on which record to choose (desc = latest history first)
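For completeness, the window-function alternative mentioned above could look roughly like this. It is a minimal sketch run through Python's sqlite3 module (window functions need SQLite 3.25 or newer), not the asker's real schema: the two tables are cut down to a few columns, and the history ids and shortened timestamps are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (id INTEGER, title TEXT);
CREATE TABLE ch_history (id INTEGER, ticket_id INTEGER,
                         description TEXT, created_at TEXT);
INSERT INTO tickets VALUES (49713, 'REMOVE DATA'), (49706, 'INCLUDE DATA');
INSERT INTO ch_history VALUES
  (1, 49713, 'TEST 1',  '2019-12-10 09:31:45'),
  (2, 49706, 'TEST 2',  '2019-12-10 09:38:52'),
  (3, 49706, 'TEST 3',  '2019-12-10 09:39:22'),
  (4, 49706, 'TESTE 4', '2019-12-10 09:42:59'),
  (5, 49706, 'TEST 5',  '2019-12-10 09:44:30');
""")

# Number the history rows per ticket, newest first, and keep only number 1.
rows = conn.execute("""
SELECT ticket_id, title, description, created_at
FROM (
  SELECT t.id AS ticket_id, t.title, ch.description, ch.created_at,
         ROW_NUMBER() OVER (PARTITION BY t.id
                            ORDER BY ch.created_at DESC) AS rn
  FROM tickets t
  JOIN ch_history ch ON ch.ticket_id = t.id
) ranked
WHERE rn = 1
ORDER BY ticket_id
""").fetchall()

for row in rows:
    print(row)
```

Each ticket comes back once, with its most recent history row. Unlike distinct on, this form is not specific to PostgreSQL.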

SQL: How to return just 1 previous date for a record, not all previous dates

I have a very simple table of ID's and Sign-in dates and I want to use SQL to make a column that shows the previous sign-in date:
Table: SIGNIN
| ID | Sign-in Date |
| A | 01/01/19 |
| B | 01/01/19 |
| C | 02/01/19 |
| A | 02/01/19 |
| A | 03/01/19 |
| B | 03/01/19 |
| A | 04/01/19 |
| C | 04/01/19 |
| B | 05/01/19 |
I've tried doing a join to itself but it's showing all previous sign-in dates rather than just the most recent.
SELECT [SIGNIN].ID
[SIGNIN].SignInDate
FROM [SIGNIN]
INNER JOIN [SIGNIN] as [Prev] on [SIGNIN].ID = [Prev].ID
and [SIGNIN].SignInDate < [Prev].SignInDate
ORDER BY [SIGNIN].ID, [SIGNIN].SignInDate
The result I want:
Table: SIGNIN
| ID | Sign-in Date | Previous |
| A | 01/01/19 | NULL |
| B | 01/01/19 | NULL |
| C | 02/01/19 | NULL |
| A | 02/01/19 | 01/01/19 |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 03/01/19 |
What I'm getting:
| ID | Sign-in Date | Previous |
| A | 01/01/19 | NULL |
| B | 01/01/19 | NULL |
| C | 02/01/19 | NULL |
| A | 02/01/19 | 01/01/19 |
| A | 03/01/19 | 01/01/19 |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 01/01/19 |
| A | 04/01/19 | 02/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 01/01/19 |
| B | 05/01/19 | 03/01/19 |
I'm certain this has been answered elsewhere before, but the biggest problem I'm having is not knowing how to word my problem!
EDIT: Really helpful responses so far, but is there a solution where I can change the date "cut-off" eg:
Cut off: 03/01/19
Table: The same
Desired result:
| ID | Sign-in Date | Previous |
| A | 03/01/19 | 02/01/19 |
| B | 03/01/19 | 01/01/19 |
| A | 04/01/19 | 03/01/19 |
| C | 04/01/19 | 02/01/19 |
| B | 05/01/19 | 03/01/19 |
I think that if you need to do that it's better to make an ordering column like:
SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]
So the end result would be like:
SELECT t.ID, t.SignInDate [Sign-In Date], t2.SignInDate as Previous
FROM (SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]) t
LEFT JOIN
(SELECT *, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY SignInDate) AS O FROM [SIGNIN]) t2 ON t.ID = t2.ID AND t.O = t2.O+1
Which should give Something akin to:
A 2019-01-01 NULL
A 2019-01-04 2019-01-01
A 2019-02-01 2019-01-04
B 2019-01-01 NULL
B 2019-01-05 2019-01-01
C 2019-01-01 NULL
Hope this helps.
Try using LAG assuming you're on a modern version of SQL Server.
SELECT [SIGNIN].ID,
[SIGNIN].SignInDate,
LAG([SIGNIN].SignInDate) OVER (PARTITION BY [SIGNIN].ID ORDER BY [SIGNIN].SignInDate) AS Previous
FROM [SIGNIN]
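Regarding the cut-off from the question's edit: the filter has to be applied after LAG has been computed, otherwise the earlier rows are gone before LAG can look at them. Here is a minimal sketch of that idea using Python's sqlite3 module (an assumption for the sake of a runnable example; it needs SQLite 3.25 or newer for window functions). The dates are read as DD/MM/YY and stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SIGNIN (ID TEXT, SignInDate TEXT);
INSERT INTO SIGNIN VALUES
  ('A', '2019-01-01'), ('B', '2019-01-01'), ('C', '2019-01-02'),
  ('A', '2019-01-02'), ('A', '2019-01-03'), ('B', '2019-01-03'),
  ('A', '2019-01-04'), ('C', '2019-01-04'), ('B', '2019-01-05');
""")

cutoff = "2019-01-03"

# Compute the previous sign-in over all rows first, then apply the cut-off
# in the outer query so rows before the cut-off still count as "previous".
rows = conn.execute("""
SELECT ID, SignInDate, Previous
FROM (
  SELECT ID, SignInDate,
         LAG(SignInDate) OVER (PARTITION BY ID
                               ORDER BY SignInDate) AS Previous
  FROM SIGNIN
) s
WHERE SignInDate >= ?
ORDER BY SignInDate, ID
""", (cutoff,)).fetchall()

for row in rows:
    print(row)
```

With the sample table this returns the five rows from the "Desired result" in the edit.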
You can use this:
SELECT [SIGNIN].ID,
[SIGNIN].SignInDate,
MAX([Prev].SignInDate) as Previous
FROM [SIGNIN]
LEFT JOIN [SIGNIN] as [Prev] on [SIGNIN].ID = [Prev].ID
and [SIGNIN].SignInDate > [Prev].SignInDate
GROUP BY [SIGNIN].ID, [SIGNIN].SignInDate
ORDER BY [SIGNIN].ID, [SIGNIN].SignInDate
Try something like this:
SELECT
ID, SignInDate,
LAG(SignInDate) OVER (PARTITION BY ID ORDER BY SignInDate) AS Previous
FROM SIGNIN
The following will give you almost what you are looking for, just without the nulls.
You should probably do a left outer or right outer join in the inner query, and some extra maneuver to add the null rows as well. I am a lit
select id, max(prev) as prev, signindate from
(
SELECT SIGNIN.ID,
SIGNIN.SignInDate as prev,
prev.signindate
FROM SIGNIN
JOIN SIGNIN as Prev on SIGNIN.ID = Prev.ID
and SIGNIN.SignInDate < Prev.SignInDate
ORDER BY SIGNIN.ID, SIGNIN.SignInDate
) a
group by 1,3
I like the APPLY solution because you can add any amount of columns from the matching row(s):
DECLARE @CutOffDate DATE = '2019-01-03'
SELECT
S.ID,
S.SignInDate,
PreviousSignInDate = R.SignInDate
FROM
[SIGNIN] AS S
OUTER APPLY (
SELECT TOP 1
P.* -- Can incorporate many columns (will also have to add them on the outermost SELECT list)
FROM
SIGNIN AS P
WHERE
S.ID = P.ID AND
P.SignInDate < S.SignInDate
ORDER BY
P.SignInDate DESC
) AS R
WHERE
S.SignInDate >= @CutOffDate
ORDER BY
S.SignInDate,
S.ID
For this case, you can use TOP 1 + ORDER BY to fetch the previous one, as long as you have the link S.ID = P.ID and making sure that P.SignInDate < S.SignInDate.
Also get used to writing dates in the YYYY-MM-DD format, since 03/01/19 might lead to confusion.
A correlated subquery is a very simple solution :
SELECT ID, SignInDate,
(SELECT top 1 SignInDate
FROM SIGNIN as S2
WHERE S2.ID = S1.ID and S2.SignInDate < S1.SignInDate
ORDER BY S2.SignInDate desc) as Previous
FROM SIGNIN as S1
ORDER BY S1.ID, S1.SignInDate

Adding new rows into query from nonexistent data in the database table

I have the following sample table:
+----------+------+-------+
| DATE | NAME | HOURS |
+----------+------+-------+
| 2018-5-3 | JOHN | 8 |
+----------+------+-------+
| 2018-5-9 | JOHN | 5 |
+----------+------+-------+
How can I generate a query that fills new rows to the existent data, e.g, sample query result:
+-----------+------+-------+
| DATE | NAME | HOURS |
+-----------+------+-------+
| 2018-5-1 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-2 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-3 | JOHN | 8 |
+-----------+------+-------+
| 2018-5-4 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-5 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-6 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-7 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-8 | JOHN | 0 |
+-----------+------+-------+
| 2018-5-9 | JOHN | 5 |
+-----------+------+-------+
| 2018-5-10 | JOHN | 0 |
+-----------+------+-------+
Note that I've added 0 in the HOURS column because JOHN has no hours on those dates (he only has hours on 2018-5-3 and 2018-5-9). I am currently trying to get this result. This is only the beginning of a big table I need to process, so I'll need to generate these filler rows per user. I tried a left/right join with previously generated dates, but it didn't work.
Can you advise me on the best way to accomplish this? Thanks.
Use generate_series() and left join:
select g.dte, t.name, coalesce(t.hours, 0) as hours
from generate_series('2018-05-01'::date, '2018-05-10'::date, interval '1 day') g(dte) left join
t
on g.dte = t.date;
For multiple users, you need to generate all the rows for all the users and then left join:
select g.dte, n.name, coalesce(t.hours, 0) as hours
from generate_series('2018-05-01'::date, '2018-05-10'::date, interval '1 day'
) g(dte) cross join
(select distinct name from t) n left join
t
on g.dte = t.date and n.name = t.name;
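If generate_series() is not available, a recursive CTE can build the date list instead. The sketch below shows the multi-user variant using Python's sqlite3 module, which is an assumption made only to keep the example runnable; the dates are zero-padded ISO strings, and the MARY row is hypothetical, added to show more than one user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (date TEXT, name TEXT, hours INTEGER);
INSERT INTO t VALUES
  ('2018-05-03', 'JOHN', 8),
  ('2018-05-09', 'JOHN', 5),
  ('2018-05-02', 'MARY', 4);  -- hypothetical second user
""")

# Build the days with a recursive CTE, pair every day with every user,
# then left join the real rows and fill the gaps with 0.
rows = conn.execute("""
WITH RECURSIVE days(dte) AS (
  SELECT '2018-05-01'
  UNION ALL
  SELECT date(dte, '+1 day') FROM days WHERE dte < '2018-05-10'
)
SELECT d.dte, n.name, COALESCE(t.hours, 0) AS hours
FROM days d
CROSS JOIN (SELECT DISTINCT name FROM t) n
LEFT JOIN t ON t.date = d.dte AND t.name = n.name
ORDER BY n.name, d.dte
""").fetchall()

for row in rows:
    print(row)
```

Every user gets one row per day from 2018-05-01 to 2018-05-10, with 0 hours wherever the table has no entry.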

Data aggregation with left-outer join

I am trying to pull some data with transaction counts, by branch, by week, which will later be used to feed some dynamic .Net charts.
I have a calendar table, I have a branch table and I have a transaction table.
Here is my DB info (only relevant columns included):
Branch Table:
ID (int), Branch (varchar)
Calendar Table:
Date (datetime), WeekOfYear(int)
Transaction Table:
Date (datetime), Branch (int), TransactionCount(int)
So, I want to do something like the following:
Select b.Branch, c.WeekOfYear, sum(TransactionCount)
FROM BranchTable b
LEFT OUTER JOIN TransactionTable t
on t.Branch = b.ID
JOIN Calendar c
on t.Date = c.Date
WHERE YEAR(c.Date) = @Year -- (SP accepts this parameter)
GROUP BY b.Branch, c.WeekOfYear
Now, this works EXCEPT when a branch doesn't have any transactions for a week, in which case NO RECORD is returned for that branch on that week. What I WANT is to get that branch, that week and "0" for the sum. I tried isnull(sum(TransactionCount), 0) - but that didn't work, either. So I will get the following (making up sums for illustration purposes):
+--------+------------+-----+
| Branch | WeekOfYear | Sum |
+--------+------------+-----+
| 1 | 1 | 25 |
| 2 | 1 | 37 |
| 3 | 1 | 19 |
| 4 | 1 | 0 | //THIS RECORD DOES NOT GET RETURNED, BUT I NEED IT!
| 1 | 2 | 64 |
| 2 | 2 | 34 |
| 3 | 2 | 53 |
| 4 | 2 | 11 |
+--------+------------+-----+
So, why doesn't the left-outer join work? Isn't that supposed to
Any help will be greatly appreciated. Thank you!
EDIT: SAMPLE TABLE DATA:
Branch Table:
+----+---------------+
| ID | Branch |
+----+---------------+
| 1 | First Branch |
| 2 | Second Branch |
| 3 | Third Branch |
| 4 | Fourth Branch |
+----+---------------+
Calendar Table:
+------------+------------+
| Date | WeekOfYear |
+------------+------------+
| 01/01/2015 | 1 |
| 01/02/2015 | 1 |
+------------+------------+
Transaction Table
+------------+--------+--------------+
| Date | Branch | Transactions |
+------------+--------+--------------+
| 01/01/2015 | 1 | 12 |
| 01/01/2015 | 1 | 9 |
| 01/01/2015 | 2 | 4 |
| 01/01/2015 | 2 | 2 |
| 01/01/2015 | 2 | 23 |
| 01/01/2015 | 3 | 42 |
| 01/01/2015 | 3 | 19 |
| 01/01/2015 | 3 | 7 |
+------------+--------+--------------+
If you want to return a query that contains each Branch and each week, then you'll need to first create a full list of that, then use a LEFT JOIN to the transactions to get the count. The code will be similar to:
select bc.Branch,
bc.WeekOfYear,
TotalTransaction = coalesce(sum(t.TransactionCount), 0)
from
(
select b.id, b.branch, c.WeekOfYear, c.date
from branch b
cross join Calendar c
-- if you want to limit the number of rows returned use a WHERE to limit the weeks
-- so far in the year or using the date column
WHERE c.date <= getdate()
and YEAR(c.Date) = @Year -- (SP accepts this parameter)
) bc
left join TransactionTable t
on t.Date = bc.Date
and bc.id = t.branch
GROUP BY bc.Branch, bc.WeekOfYear
This code will create in your subquery a full list of each branch with each date. Once you have this list, then you can JOIN to the transactions to get your total transaction count and you'd return each date as you want.
Bring in the Calendar before you bring in the transactions:
SELECT b.Branch, c.WeekOfYear, sum(TransactionCount)
FROM BranchTable b
INNER JOIN CalendarTable c ON YEAR(c.Date) = @Year
LEFT JOIN TransactionTable t ON t.Branch = b.ID AND t.Date = c.Date
GROUP BY b.Branch, c.WeekOfYear
ORDER BY c.WeekOfYear, b.Branch
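Here is a minimal sketch of the "branches and calendar first, transactions last" idea, run against the sample data from the question through Python's sqlite3 module. SQLite is an assumption made for a self-contained example, so strftime stands in for YEAR() and a bound parameter stands in for @Year; the table names follow the question's column list, and the dates are stored as ISO strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (ID INTEGER, Branch TEXT);
CREATE TABLE Calendar (Date TEXT, WeekOfYear INTEGER);
CREATE TABLE TransactionTable (Date TEXT, Branch INTEGER,
                               TransactionCount INTEGER);
INSERT INTO Branch VALUES
  (1, 'First Branch'), (2, 'Second Branch'),
  (3, 'Third Branch'), (4, 'Fourth Branch');
INSERT INTO Calendar VALUES ('2015-01-01', 1), ('2015-01-02', 1);
INSERT INTO TransactionTable VALUES
  ('2015-01-01', 1, 12), ('2015-01-01', 1, 9),
  ('2015-01-01', 2, 4),  ('2015-01-01', 2, 2), ('2015-01-01', 2, 23),
  ('2015-01-01', 3, 42), ('2015-01-01', 3, 19), ('2015-01-01', 3, 7);
""")

# Every branch is paired with every calendar day before the transactions
# are joined in, so a branch without transactions still gets a row.
rows = conn.execute("""
SELECT b.Branch, c.WeekOfYear,
       COALESCE(SUM(t.TransactionCount), 0) AS TotalTransactions
FROM Branch b
CROSS JOIN Calendar c
LEFT JOIN TransactionTable t
  ON t.Branch = b.ID AND t.Date = c.Date
WHERE strftime('%Y', c.Date) = ?
GROUP BY b.ID, b.Branch, c.WeekOfYear
ORDER BY c.WeekOfYear, b.ID
""", ("2015",)).fetchall()

for row in rows:
    print(row)
```

The fourth branch has no transactions in the sample data and comes back with a total of 0 instead of being dropped. The year filter is safe in the WHERE clause here because it only touches the calendar side, not the left-joined transaction table.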