Data aggregation with left-outer join - sql

I am trying to pull some data with transaction counts, by branch, by week, which will later be used to feed some dynamic .Net charts.
I have a calendar table, I have a branch table and I have a transaction table.
Here is my DB info (only relevant columns included):
Branch Table:
ID (int), Branch (varchar)
Calendar Table:
Date (datetime), WeekOfYear(int)
Transaction Table:
Date (datetime), Branch (int), TransactionCount(int)
So, I want to do something like the following:
Select b.Branch, c.WeekOfYear, sum(TransactionCount)
FROM BranchTable b
LEFT OUTER JOIN TransactionTable t
on t.Branch = b.ID
JOIN Calendar c
on t.Date = c.Date
WHERE YEAR(c.Date) = @Year -- (SP accepts this parameter)
GROUP BY b.Branch, c.WeekOfYear
Now, this works EXCEPT when a branch doesn't have any transactions for a week, in which case NO RECORD is returned for that branch on that week. What I WANT is to get that branch, that week and "0" for the sum. I tried isnull(sum(TransactionCount), 0) - but that didn't work, either. So I will get the following (making up sums for illustration purposes):
+--------+------------+-----+
| Branch | WeekOfYear | Sum |
+--------+------------+-----+
| 1 | 1 | 25 |
| 2 | 1 | 37 |
| 3 | 1 | 19 |
| 4 | 1 | 0 | //THIS RECORD DOES NOT GET RETURNED, BUT I NEED IT!
| 1 | 2 | 64 |
| 2 | 2 | 34 |
| 3 | 2 | 53 |
| 4 | 2 | 11 |
+--------+------------+-----+
So, why doesn't the left-outer join work? Isn't that supposed to return every branch, even when it has no matching transactions?
Any help will be greatly appreciated. Thank you!
EDIT: SAMPLE TABLE DATA:
Branch Table:
+----+---------------+
| ID | Branch |
+----+---------------+
| 1 | First Branch |
| 2 | Second Branch |
| 3 | Third Branch |
| 4 | Fourth Branch |
+----+---------------+
Calendar Table:
+------------+------------+
| Date | WeekOfYear |
+------------+------------+
| 01/01/2015 | 1 |
| 01/02/2015 | 1 |
+------------+------------+
Transaction Table
+------------+--------+--------------+
| Date | Branch | Transactions |
+------------+--------+--------------+
| 01/01/2015 | 1 | 12 |
| 01/01/2015 | 1 | 9 |
| 01/01/2015 | 2 | 4 |
| 01/01/2015 | 2 | 2 |
| 01/01/2015 | 2 | 23 |
| 01/01/2015 | 3 | 42 |
| 01/01/2015 | 3 | 19 |
| 01/01/2015 | 3 | 7 |
+------------+--------+--------------+

If you want the query to return every branch for every week, you first need to build the full list of branch/date combinations, then LEFT JOIN the transactions onto that list to get the count. The code will be similar to:
select bc.Branch,
bc.WeekOfYear,
TotalTransaction = coalesce(sum(t.TransactionCount), 0)
from
(
select b.id, b.branch, c.WeekOfYear, c.date
from branch b
cross join Calendar c
-- if you want to limit the number of rows returned use a WHERE to limit the weeks
-- so far in the year or using the date column
WHERE c.date <= getdate()
and YEAR(c.Date) = @Year -- (SP accepts this parameter)
) bc
left join TransactionTable t
on t.Date = bc.Date
and bc.id = t.branch
GROUP BY bc.Branch, bc.WeekOfYear
The subquery builds the full list of every branch paired with every date. Once you have that list, you can LEFT JOIN the transactions onto it and sum them, so every branch appears for every week, with 0 where there were no transactions.
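To check the cross-join-then-left-join idea against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only standing in for SQL Server here, so `strftime('%Y', ...)` replaces `YEAR()`, a `?` placeholder replaces the stored procedure parameter, and the dates are assumed to be stored as ISO strings. The table and column names follow the question.

```python
import sqlite3

# In-memory copy of the question's sample data (dates rewritten as ISO strings).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BranchTable (ID INTEGER, Branch TEXT);
CREATE TABLE Calendar (Date TEXT, WeekOfYear INTEGER);
CREATE TABLE TransactionTable (Date TEXT, Branch INTEGER, TransactionCount INTEGER);
INSERT INTO BranchTable VALUES
  (1, 'First Branch'), (2, 'Second Branch'), (3, 'Third Branch'), (4, 'Fourth Branch');
INSERT INTO Calendar VALUES ('2015-01-01', 1), ('2015-01-02', 1);
INSERT INTO TransactionTable VALUES
  ('2015-01-01', 1, 12), ('2015-01-01', 1, 9),
  ('2015-01-01', 2, 4),  ('2015-01-01', 2, 2), ('2015-01-01', 2, 23),
  ('2015-01-01', 3, 42), ('2015-01-01', 3, 19), ('2015-01-01', 3, 7);
""")

query = """
SELECT bc.Branch, bc.WeekOfYear,
       COALESCE(SUM(t.TransactionCount), 0) AS TotalTransaction
FROM (
    -- every branch paired with every calendar date in the requested year
    SELECT b.ID, b.Branch, c.WeekOfYear, c.Date
    FROM BranchTable b
    CROSS JOIN Calendar c
    WHERE strftime('%Y', c.Date) = ?
) bc
LEFT JOIN TransactionTable t
       ON t.Date = bc.Date AND t.Branch = bc.ID
GROUP BY bc.ID, bc.Branch, bc.WeekOfYear
ORDER BY bc.WeekOfYear, bc.ID
"""
rows = conn.execute(query, ("2015",)).fetchall()
for row in rows:
    print(row)
```

With this data the fourth branch shows up for week 1 with a total of 0 instead of disappearing, because the row comes from the branch/calendar list rather than from the transactions.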

Bring in the Calendar before you bring in the transactions:
SELECT b.Branch, c.WeekOfYear, COALESCE(SUM(t.TransactionCount), 0) AS TotalTransactions
FROM BranchTable b
INNER JOIN CalendarTable c ON YEAR(c.Date) = @Year
LEFT JOIN TransactionTable t ON t.Branch = b.ID AND t.Date = c.Date
GROUP BY b.Branch, c.WeekOfYear
ORDER BY c.WeekOfYear, b.Branch

Related

SQL - get summary of differences vs previous month

I have a table similar to this one:
| id | store | BOMdate |
| 1 | A | 01/10/2018 |
| 1 | B | 01/10/2018 |
| 1 | C | 01/10/2018 |
|... | ... | ... |
| 1 | A | 01/11/2018 |
| 1 | C | 01/11/2018 |
| 1 | D | 01/11/2018 |
|... | ... | ... |
| 1 | B | 01/12/2018 |
| 1 | C | 01/12/2018 |
| 1 | E | 01/12/2018 |
It contains the stores that are active at BOM (beginning of month).
How do I query it to get the number of stores that are new that month - those that were not active the previous month?
The output should be this:
| BOMdate | #newstores |
| 01/10/2018 | 3 | * no stores on previous month
| 01/11/2018 | 1 | * D is the only new active store
| 01/12/2018 | 2 | * store B was not active on November, E is new
I know how to count the first time that each store is active (nested select, taking the MIN(BOMdate) and then counting). But I have no idea how to check each month vs its previous month.
I use SQL Server, but I am interested in the differences in other platforms if there are any.
Thanks
How do I query it to get the number of stores that are new that month - those that were not active the previous month?
One option uses not exists:
select bomdate, count(*) cnt_new_stores
from mytable t
where not exists (
select 1
from mytable t1
where t1.store = t.store and t1.bomdate = dateadd(month, -1, t.bomdate)
)
group by bomdate
You can also use window functions:
select bomdate, count(*) cnt_new_stores
from (
select t.*, lag(bomdate) over(partition by store order by bomdate) lag_bomdate
from mytable t
) t
where bomdate <> dateadd(month, 1, lag_bomdate) or lag_bomdate is null
group by bomdate
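The NOT EXISTS version can be tried against the rows shown in the question with the sketch below, which runs it through Python's sqlite3 module. Two assumptions: the dates in the question are day/month/year (so 01/10/2018 is 1 October) and are stored here as ISO strings, and SQLite's `date(x, '-1 month')` stands in for `dateadd(month, -1, x)`. The rows elided with "..." in the question are not included.

```python
import sqlite3

# Only the rows that are visible in the question; dates converted to ISO format.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, store TEXT, bomdate TEXT);
INSERT INTO mytable VALUES
  (1, 'A', '2018-10-01'), (1, 'B', '2018-10-01'), (1, 'C', '2018-10-01'),
  (1, 'A', '2018-11-01'), (1, 'C', '2018-11-01'), (1, 'D', '2018-11-01'),
  (1, 'B', '2018-12-01'), (1, 'C', '2018-12-01'), (1, 'E', '2018-12-01');
""")

query = """
SELECT t.bomdate, COUNT(*) AS cnt_new_stores
FROM mytable t
WHERE NOT EXISTS (
    -- the same store, active exactly one month earlier
    SELECT 1
    FROM mytable t1
    WHERE t1.store = t.store
      AND t1.bomdate = date(t.bomdate, '-1 month')
)
GROUP BY t.bomdate
ORDER BY t.bomdate
"""
new_stores = conn.execute(query).fetchall()
for row in new_stores:
    print(row)
```

On these rows the counts come out as 3, 1 and 2, matching the expected output in the question.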
You can compare a date with the previous month's date using the T-SQL DATEDIFF function.
Using NOT EXISTS you can count the stores that did not appear in the previous month, and you can also list their names with the STRING_AGG function, which was introduced in SQL Server 2017.
select t.BOMDate, NewStoresCount = count(1), NewStores = STRING_AGG(t.store, ',')
from yourtable t
where not exists
(
-- qualify the outer columns; unqualified names here would bind to y
select 1
from yourtable y
where y.store = t.store and DATEDIFF(m, y.BOMDate, t.BOMDate) = 1
)
group by t.BOMDate

Measure population on several dates

I want to measure the population of our municipality (which consists of several places). I have two tables: the first is a calendar table with a row for the first day of every month.
My second table contains all the people who live or have lived in the municipality.
What I want is the population of each place on every first day of the month from my calendar table. I've put some raw data below (just a few records from the persons table, because it contains 100,000 records).
Calender table:
+----------+
| Date |
+----------+
| 1-1-2018 |
+----------+
| 1-2-2018 |
+----------+
| 1-3-2018 |
+----------+
| 1-4-2018 |
+----------+
Persons table
+-----+-----------+-----------+---------------+-------+
| BSN | Startdate | Enddate | Date of death | Place |
+-----+-----------+-----------+---------------+-------+
| 1 | 12-1-2000 | null | null | A |
+-----+-----------+-----------+---------------+-------+
| 2 | 10-5-2011 | null | 22-1-2018 | B |
+-----+-----------+-----------+---------------+-------+
| 3 | 16-12-2011| 10-2-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
| 4 | 9-11-2012 | null | null | B |
+-----+-----------+-----------+---------------+-------+
| 5 | 8-9-2013 | null | 27-3-2018 | A |
+-----+-----------+-----------+---------------+-------+
| 6 | 7-10-2017 | 28-3-2018 | null | B |
+-----+-----------+-----------+---------------+-------+
My expected result:
+----------+-------+------------+
| Date | Place | Population |
+----------+-------+------------+
| 1-1-2018 | A | 2 |
+----------+-------+------------+
| 1-1-2018 | B | 4 |
+----------+-------+------------+
| 1-2-2018 | A | 2 |
+----------+-------+------------+
| 1-2-2018 | B | 3 |
+----------+-------+------------+
| 1-3-2018 | A | 2 |
+----------+-------+------------+
| 1-3-2018 | B | 2 |
+----------+-------+------------+
| 1-4-2018 | A | 1 |
+----------+-------+------------+
| 1-4-2018 | B | 1 |
+----------+-------+------------+
What I've tried so far, which doesn't seem to work:
SELECT a.Place
,c.Date
,(SELECT COUNT(DISTINCT(b.BSN))
FROM Person as b
WHERE b.Startdate < c.Date
AND (b.Enddate > c.Date OR b.Enddate is null)
AND (b.Date of death > c.Date OR b.Date of death is null)
AND a.Place = b.Place) as Population
FROM Person as a
JOIN Calender as c
ON a.Startdate <= c.Date
AND a.Enddate >= c.Date
GROUP BY Place, Date
I hope someone can help me find the problem. Thanks in advance.
First cross join Calender and the places to get the date/place pairs. Then left join the persons on the place and the date. Finally group by date and place to get the count of people for that day and place.
SELECT [ca].[Date],
[pl].[Place],
count([pe].[Place]) [Population]
FROM [Calender] [ca]
CROSS JOIN (SELECT DISTINCT
[pe].[Place]
FROM [Persons] [pe]) [pl]
LEFT JOIN [Persons] [pe]
ON [pe].[Place] = [pl].[Place]
AND [pe].[Startdate] <= [ca].[Date]
AND (coalesce([pe].[Enddate],
[pe].[Date of death]) IS NULL
OR coalesce([pe].[Enddate],
[pe].[Date of death]) > [ca].[Date])
GROUP BY [ca].[Date],
[pl].[Place]
ORDER BY [ca].[Date],
[pl].[Place];
Some notes and assumptions:
If you have a table listing the places, use that instead of the subquery aliased [pl]. I just had no other option with the given tables.
I believe the Date of death also implies an Enddate for the same day. You might want to consider a trigger, that sets the Enddate automatically to the Date of death if it isn't null. That would make things easier and probably more consistent.
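As a quick check of the query above against the sample rows, here is a minimal sketch using Python's sqlite3 module in place of SQL Server. Assumptions: the dates in the question are day-month-year and are stored here as ISO strings, and the `Date of death` column is renamed to the hypothetical `DateOfDeath` to avoid quoting.

```python
import sqlite3

# Sample rows from the question, with dates converted to ISO format.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Calender (Date TEXT);
CREATE TABLE Persons (BSN INTEGER, Startdate TEXT, Enddate TEXT,
                      DateOfDeath TEXT, Place TEXT);
INSERT INTO Calender VALUES
  ('2018-01-01'), ('2018-02-01'), ('2018-03-01'), ('2018-04-01');
INSERT INTO Persons VALUES
  (1, '2000-01-12', NULL,         NULL,         'A'),
  (2, '2011-05-10', NULL,         '2018-01-22', 'B'),
  (3, '2011-12-16', '2018-02-10', NULL,         'B'),
  (4, '2012-11-09', NULL,         NULL,         'B'),
  (5, '2013-09-08', NULL,         '2018-03-27', 'A'),
  (6, '2017-10-07', '2018-03-28', NULL,         'B');
""")

query = """
SELECT ca.Date, pl.Place, COUNT(pe.BSN) AS Population
FROM Calender ca
CROSS JOIN (SELECT DISTINCT Place FROM Persons) pl  -- every date/place pair
LEFT JOIN Persons pe
       ON pe.Place = pl.Place
      AND pe.Startdate <= ca.Date
      AND (COALESCE(pe.Enddate, pe.DateOfDeath) IS NULL
           OR COALESCE(pe.Enddate, pe.DateOfDeath) > ca.Date)
GROUP BY ca.Date, pl.Place
ORDER BY ca.Date, pl.Place
"""
population = conn.execute(query).fetchall()
for row in population:
    print(row)
```

Counting a column from the left-joined table (rather than `COUNT(*)`) is what lets a date/place pair with nobody living there come out as 0 instead of 1.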

Repeat all rows in left table for each unique ID in other table

I have a team of people who are scored on up to three metrics: Sales, Leads and Hours.
I have a table (tblScores) in MS Access which holds these scores, but only where a score exists (e.g. if someone had no sales, there would be no Sales entry for them).
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Hours | 7.5 |
I am trying to create an SQL query that will output three records (each possible metric) for each User in the above table including null values where they don't have an entry for that metric. e.g
| USERID | Metric | Score |
----------------------------------
| 20511 | Sales | 12 |
| 20511 | Leads | 9 |
| 20511 | Hours | 8 |
| 20694 | Sales | 10 |
| 20694 | Leads | Null |
| 20694 | Hours | 7.5 |
I have set up another table (tblMetrics) with just these 3 metrics
| Metric |
---------------
| Sales |
| Leads |
| Hours |
and tried to do a left join on the metric table against the score table
SELECT tblMetrics.*, TblScores.UserID, TblScores.Score
FROM tblMetrics LEFT JOIN TblScores ON tblMetrics.Metric = TblScores.Metric;
but it is still not giving the desired output. Does anyone know if this is possible?
You need to do a CROSS JOIN first to generate all user/metric combinations, then do the LEFT JOIN to find which ones are missing; those come back as NULL.
I checked the Access syntax, and the CROSS JOIN should be written like this:
SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
And the Left Join should be
SELECT userMetric.*, S.Score
FROM ( SELECT DISTINCT M.Metric, S.USERID
FROM tblMetrics M, tblScores S
) userMetric
LEFT JOIN tblScores S
ON ( userMetric.USERID = S.USERID
AND userMetric.Metric = S.Metric )
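Here is a minimal sketch of the same two steps run against the sample data, using Python's sqlite3 module instead of Access (so this only illustrates the join logic, not Access-specific syntax). The table names follow the question; the alias `um` is just for illustration.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblMetrics (Metric TEXT);
CREATE TABLE tblScores (USERID INTEGER, Metric TEXT, Score REAL);
INSERT INTO tblMetrics VALUES ('Sales'), ('Leads'), ('Hours');
INSERT INTO tblScores VALUES
  (20511, 'Sales', 12), (20511, 'Leads', 9), (20511, 'Hours', 8),
  (20694, 'Sales', 10), (20694, 'Hours', 7.5);
""")

query = """
SELECT um.USERID, um.Metric, s.Score
FROM (
    -- step 1: every user paired with every metric
    SELECT DISTINCT m.Metric, s.USERID
    FROM tblMetrics m, tblScores s
) um
-- step 2: attach the score where one exists, NULL otherwise
LEFT JOIN tblScores s
       ON um.USERID = s.USERID AND um.Metric = s.Metric
ORDER BY um.USERID, um.Metric
"""
scores = conn.execute(query).fetchall()
for row in scores:
    print(row)
```

Each user gets three rows, and user 20694 gets a Leads row with a NULL score (`None` in Python).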

MS Access duplicate values using SUM

I'm having trouble writing a query in Microsoft Access 2016 that will show the sum of an Expense for a particular event, the sum of the signs that event produced, along with the year, event description and company name.
I think I am missing something simple, and am going to feel ridiculous once someone points it out. Hopefully I managed to format my question well enough that it is easy to spot!
Here are the tables involved, along with the dummy data I am testing with.
All_Company Company_Event
------------------ ---------------------------
| ID | Company | | ID | EventDescription |
|------|---------| |----|--------------------|
| 1 | Crapple | | 1 | Concert |
| 2 | Rito | | 2 | Party |
------------------ ---------------------------
Company_Target_Actual
----------------------------------------------------------------
| All_CompanyID | Company_EventID | Year | Quarter | Signed |
|----------------|-------------------|------|---------|--------|
| 1 | 2 | 2015 | 1 | 1 |
| 1 | 2 | 2015 | 2 | 0 |
| 1 | 2 | 2015 | 3 | 3 |
| 1 | 2 | 2015 | 4 | 1 |
----------------------------------------------------------------
Budget_Company_Expense
---------------------------------------------------------------------------------
| ID | All_CompanyID | Company_EventID | Year | Category | SubCategory| Expense |
---------------------------------------------------------------------------------
| 1 | 1 | 2 | 2015 | ABCD | 123 | 40 |
| 2 | 1 | 2 | 2015 | ABCD | cat | 113 |
| 3 | 1 | 2 | 2015 | ABCD | dog | 71 |
---------------------------------------------------------------------------------
This is my code for the query, I broke it up from the ugly Access long lines of code to make it easier to read.
SELECT DISTINCTROW All_Company.Company, Budget_Company_Expense.Year,
Budget_Company_Expense.Company_EventID, Company_Event.EventDescription,
Sum(Budget_Company_Expense.Expense) AS [Sum Of Expense USD],
Sum(Company_Target_Actual.Signed) AS [Sum Of Signed]
FROM Company_Event
INNER JOIN ((All_Company
INNER JOIN Company_Target_Actual
ON All_Company.[ID] = Company_Target_Actual.[All_CompanyID])
INNER JOIN Budget_Company_Expense
ON All_Company.[ID] = Budget_Company_Expense.[All_CompanyID])
ON Company_Event.[ID] = Budget_Company_Expense.[Company_EventID]
GROUP BY All_Company.Company, Budget_Company_Expense.Year,
Budget_Company_Expense.Company_EventID, Company_Event.EventDescription;
and here is the result from running my query
Result
-------------------------------------------------------------------------------------------
| Company | Year | Company_EventID | EventDescription | Sum of Expense USD | Sum of Signed|
-------------------------------------------------------------------------------------------
| Crapple | 2015 | 2 | Party | $896.00 | 15 |
-------------------------------------------------------------------------------------------
As you can see, it is summing as if the total signs (5) happened 3 times (the number of entries in the Budget_Company_Expense table), and vice versa for the Expense. Any help on my issue would be greatly appreciated,
and if I forgot any information that may help find my mistake, please let me know what else I can provide!
Consider splitting the query into two aggregations, one to sum Signed in Company_Target_Actual and the other to sum Expense in Budget_Company_Expense. Then join the two queries by Company, Event, and Year, which are the grouping factors.
The query below uses two derived tables (subqueries in the FROM/JOIN clause). However, you could just as well save either one as a separate query and then join them in the final query:
SELECT t1.Company, t1.Year, t1.Company_EventID, t1.EventDescription,
t2.[Sum Of Expense USD], t1.[Sum of Signed]
FROM
(SELECT ac.ID AS CompanyID, ac.Company, ca.Year, ca.Company_EventID, ev.EventDescription,
SUM(ca.Signed) AS [Sum Of Signed]
FROM (Company_Target_Actual ca
INNER JOIN Company_Event ev
ON ca.Company_EventID = ev.ID)
INNER JOIN All_Company ac
ON ca.All_CompanyID = ac.ID
GROUP BY ac.ID, ac.Company, ca.Year, ca.Company_EventID, ev.EventDescription) AS t1
INNER JOIN
(SELECT ac.ID AS CompanyID, ac.Company, be.Year, be.Company_EventID, ev.EventDescription,
SUM(be.Expense) AS [Sum Of Expense USD]
FROM (Budget_Company_Expense be
INNER JOIN Company_Event ev
ON be.Company_EventID = ev.ID)
INNER JOIN All_Company ac
ON be.All_CompanyID = ac.ID
GROUP BY ac.ID, ac.Company, be.Year, be.Company_EventID, ev.EventDescription) AS t2
ON t1.CompanyID = t2.CompanyID
AND t1.Company_EventID = t2.Company_EventID
AND t1.Year = t2.Year
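To see why the original query inflates both sums and why aggregating first fixes it, here is a minimal sketch on the dummy data using Python's sqlite3 module rather than Access. To keep it short it leaves out the two lookup tables (All_Company and Company_Event), and the aliases `SumSigned` and `SumExpense` are made up for the example.

```python
import sqlite3

# Dummy data from the question (lookup tables omitted for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Company_Target_Actual (
  All_CompanyID INTEGER, Company_EventID INTEGER, Year INTEGER,
  Quarter INTEGER, Signed INTEGER);
CREATE TABLE Budget_Company_Expense (
  ID INTEGER, All_CompanyID INTEGER, Company_EventID INTEGER, Year INTEGER,
  Category TEXT, SubCategory TEXT, Expense INTEGER);
INSERT INTO Company_Target_Actual VALUES
  (1, 2, 2015, 1, 1), (1, 2, 2015, 2, 0), (1, 2, 2015, 3, 3), (1, 2, 2015, 4, 1);
INSERT INTO Budget_Company_Expense VALUES
  (1, 1, 2, 2015, 'ABCD', '123', 40),
  (2, 1, 2, 2015, 'ABCD', 'cat', 113),
  (3, 1, 2, 2015, 'ABCD', 'dog', 71);
""")

# Joining the detail rows directly pairs each of the 4 target rows with each
# of the 3 expense rows, so every value is counted several times.
naive = conn.execute("""
SELECT SUM(be.Expense), SUM(ca.Signed)
FROM Company_Target_Actual ca
JOIN Budget_Company_Expense be ON be.All_CompanyID = ca.All_CompanyID
""").fetchone()

# Aggregating each table first leaves one row per company/event/year on each
# side, so the join no longer multiplies anything.
fixed = conn.execute("""
SELECT t2.SumExpense, t1.SumSigned
FROM (SELECT All_CompanyID, Company_EventID, Year, SUM(Signed) AS SumSigned
      FROM Company_Target_Actual
      GROUP BY All_CompanyID, Company_EventID, Year) t1
JOIN (SELECT All_CompanyID, Company_EventID, Year, SUM(Expense) AS SumExpense
      FROM Budget_Company_Expense
      GROUP BY All_CompanyID, Company_EventID, Year) t2
  ON t1.All_CompanyID = t2.All_CompanyID
 AND t1.Company_EventID = t2.Company_EventID
 AND t1.Year = t2.Year
""").fetchone()

print("naive:", naive)
print("fixed:", fixed)
```

The naive join reproduces the 896 and 15 from the question, while the pre-aggregated version gives 224 and 5.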

Filter by value in last row of LEFT OUTER JOIN table

I have a Clients table in PostgreSQL (version 9.1.11), and I would like to write a query to filter that table. The query should return only clients which meet one of the following conditions:
--The client's last order (based on orders.created_at) has a fulfill_by_date in the past.
OR
--The client has no orders at all
I've looked for around 2 months, on and off, for a solution.
I've looked at custom last aggregate functions in Postgres, but could not get them to work, and feel there must be a built-in way to do this.
I've also looked at Postgres last_value window functions, but most of the examples are of a single table, not of a query joining multiple tables.
Any help would be greatly appreciated! Here is a sample of what I am going for:
Clients table:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 2 | SecondClient |
| 3 | ThirdClient |
Orders table:
| order_id | client_id | fulfill_by_date | created_at |
-------------------------------------------------------
| 1 | 1 | 3000-01-01 | 2013-01-01 |
| 2 | 1 | 1999-01-01 | 2013-01-02 |
| 3 | 2 | 1999-01-01 | 2013-01-01 |
| 4 | 2 | 3000-01-01 | 2013-01-02 |
Desired query result:
| client_id | client_name |
----------------------------
| 1 | FirstClient |
| 3 | ThirdClient |
Try it this way
SELECT c.client_id, c.client_name
FROM clients c LEFT JOIN
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id ORDER BY created_at DESC) rnum
FROM orders
) o
ON c.client_id = o.client_id
AND o.rnum = 1
WHERE o.fulfill_by_date < CURRENT_DATE
OR o.order_id IS NULL
Output:
| CLIENT_ID | CLIENT_NAME |
|-----------|-------------|
| 1 | FirstClient |
| 3 | ThirdClient |
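For anyone who wants to try the ROW_NUMBER approach locally, here is a minimal sketch with the sample data using Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (needed for window functions), stores the dates as ISO strings, and uses `date('now')` in place of `CURRENT_DATE`.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (client_id INTEGER, client_name TEXT);
CREATE TABLE orders (order_id INTEGER, client_id INTEGER,
                     fulfill_by_date TEXT, created_at TEXT);
INSERT INTO clients VALUES
  (1, 'FirstClient'), (2, 'SecondClient'), (3, 'ThirdClient');
INSERT INTO orders VALUES
  (1, 1, '3000-01-01', '2013-01-01'),
  (2, 1, '1999-01-01', '2013-01-02'),
  (3, 2, '1999-01-01', '2013-01-01'),
  (4, 2, '3000-01-01', '2013-01-02');
""")

query = """
SELECT c.client_id, c.client_name
FROM clients c
LEFT JOIN (
    -- number each client's orders, newest first
    SELECT *, ROW_NUMBER() OVER (PARTITION BY client_id
                                 ORDER BY created_at DESC) AS rnum
    FROM orders
) o
       ON c.client_id = o.client_id
      AND o.rnum = 1                      -- keep only the latest order
WHERE o.fulfill_by_date < date('now')     -- latest order is overdue
   OR o.order_id IS NULL                  -- or the client has no orders
ORDER BY c.client_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Putting `o.rnum = 1` in the ON clause rather than the WHERE clause matters: it keeps clients without any orders in the result, so the `IS NULL` test can pick them up.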