How to exclude rows that have matching fields in other rows - sql

I have a table in MS Access 2013 that has a number of different columns. As part of the data that is entered into the main table, there are duplicates in certain columns. However when I 'pot up' the volumes of rows based on their status, I need to be able to exclude those with the same values in other columns.
------------------------------------------------------------
HeaderID | Date | Number | EffectiveDate | Reg | Status
------------------------------------------------------------
2 | 01/01/2016| 100001 | 01/12/2015 | 01 | Ready
3 | 01/01/2016| 100001 | 01/12/2015 | 02 | Ready
4 | 02/02/2016| 100002 | 12/11/2015 | R | Pending
5 | 02/02/2016| 100002 | 12/11/2015 | T | Pending
6 | 02/02/2016| 100002 | 12/11/2015 | N | Pending
7 | 15/09/2015| 100003 | 30/11/2015 | 01 | Ready
8 | 14/09/2015| 100004 | 20/02/2016 | 01 | New
I have the basic below code already:
Select
tbl_Progression.Status,
Count(tbl_Progression.HeaderID) AS CountofHeaderID
From tbl_Progression
Group By tbl_Progression.Status
I'm looking to be able to get the results to look like the below using the example data above, whereby the Status is counted by HeaderID but only counts once those records that have the same Date, Number and EffectiveDate (but different Reg) to look like this:
------------------------
Status | CountofHeaderID
------------------------
Pending | 1
Ready | 2
New | 1
Instead of what the current code is doing:
------------------------
Status | CountofHeaderID
------------------------
Pending | 3
Ready | 3
New | 1

MS Access doesn't support COUNT(DISTINCT). You can, however, use a subquery with DISTINCT (or GROUP BY):
Select p.Status, Count(*) as new_CountofHeaderID
From (select distinct p.status, p.Date, p.Number, pEffectiveDate
from tbl_Progression as p
) as p
Group By p.Status;

Related

Logic to read multiple rows in a table where flag = 'Y'

Consider the following scenario. I have a Customer table, which includes RowStart and EndDate logic, thus writing a new row every time a field value is updated.
Relevant fields in this table are:
RowStartDate
RowEndDate
CustomerNumber
EmployeeFlag
For this, I'd like to write a query, which will return an employee's period of tenure (EmploymentStartDate, and EmploymentEndDate). I.e. The RowStartDate when EmployeeFlag first became 'Y', and then the first RowStartDate where EmployeeFlag changed to 'N' (Ordered of course, by the RowStartDate asc). There is an additional complexity in that the Flag value may change between Y and N multiple times for a single person, as they may become staff, resign and then be employed again at a later date.
Example table structure is:
| CustomerNo | StaffFlag | RowStartDate | RowEndDate |
| ---------- | --------- | ------------ | ---------- |
| 12 | N | 2019-01-01 | 2019-01-14 |
| 12 | N | 2019-01-14 | 2019-03-02 |
| 12 | Y | 2019-03-02 | 2019-10-12 |
| 01 | Y | 2020-03-13 | NULL |
| 12 | N | 2019-10-12 | 2020-01-01 |
| 12 | Y | 2020-01-01 | NULL |
Output could be something like
| CustomerNo | StaffStartDate | StaffEndDate |
| ---------- | -------------- | ------------ |
| 12 | 2019-03-02 | 2019-10-12 |
| 01 | 2020-03-13 | NULL |
| 12 | 2021-01-01 | NULL |
Any ideas on how I might be able to solve this would be really appreciated.
Make sure you order the columns by ID and by dates:
select *
from yourtable
order by CustomerNumber asc,
EmployeeFlag desc,
RowStartDate asc,
RowEndDate asc
This gives you a list of all changes over time per employee.
Subsequently, you want to map two rows into a single row with two columns (two dates mapped into overall start and end date). Others have done this using the lead() function. For details please have a look here: Merging every two rows of data in a column in SQL Server

Complex nested aggregations to get order totals

I have a system to track orders and related expenditures. This is a Rails app running on PostgreSQL. 99% of my app gets by with plain old Rails Active Record call etc. This one is ugly.
The expenditures table look like this:
+----+----------+-----------+------------------------+
| id | category | parent_id | note |
+----+----------+-----------+------------------------+
| 1 | order | nil | order with no invoices |
+----+----------+-----------+------------------------+
| 2 | order | nil | order with invoices |
+----+----------+-----------+------------------------+
| 3 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
| 4 | invoice | 2 | invoice for order 2 |
+----+----------+-----------+------------------------+
Each expenditure has many expenditure_items and can the orders can be parents to the invoices. That table looks like this:
+----+----------------+-------------+-------+---------+
| id | expenditure_id | cbs_item_id | total | note |
+----+----------------+-------------+-------+---------+
| 1 | 1 | 1 | 5 | Fuit |
+----+----------------+-------------+-------+---------+
| 2 | 1 | 2 | 15 | Veggies |
+----+----------------+-------------+-------+---------+
| 3 | 2 | 1 | 123 | Fuit |
+----+----------------+-------------+-------+---------+
| 4 | 2 | 2 | 456 | Veggies |
+----+----------------+-------------+-------+---------+
| 5 | 3 | 1 | 34 | Fuit |
+----+----------------+-------------+-------+---------+
| 6 | 3 | 2 | 76 | Veggies |
+----+----------------+-------------+-------+---------+
| 7 | 4 | 1 | 26 | Fuit |
+----+----------------+-------------+-------+---------+
| 8 | 4 | 2 | 98 | Veggies |
+----+----------------+-------------+-------+---------+
I need to track a few things:
amounts left to be invoiced on orders (thats easy)
above but rolled up for each cbs_item_id (this is the ugly part)
The cbs_item_id is basically an accounting code to categorize the money spent etc. I have visualized what my end result would look like:
+-------------+----------------+-------------+---------------------------+-----------+
| cbs_item_id | expenditure_id | order_total | invoice_total | remaining |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 1 | 5 | 0 | 5 |
+-------------+----------------+-------------+---------------------------+-----------+
| 1 | 2 | 123 | 60 | 63 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 1 | 68 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 1 | 15 | 0 | 15 |
+-------------+----------------+-------------+---------------------------+-----------+
| 2 | 2 | 456 | 174 | 282 |
+-------------+----------------+-------------+---------------------------+-----------+
| | | | Rollup for cbs_item_id: 2 | 297 |
+-------------+----------------+-------------+---------------------------+-----------+
order_total is the sum of total for all the expenditure_items of the given order ( category = 'order'). invoice_total is the sum of total for all the expenditure_items with parent_id = expenditures.id. Remaining is calculated as the difference (but not greater than 0). In real terms the idea here is you place and order for $1000 and $750 of invoices come in. I need to calculate that $250 left on the order (remaining) - broken down into each category (cbs_item_id). Then I need the roll-up of all the remaining values grouped by the cbs_item_id.
So for each cbs_item_id I need group by each order, find the total for the order, find the total invoiced against the order then subtract the two (also can't be negative). It has to be on a per order basis - the overall aggregate difference will not return the expected results.
In the end looking for a result something like this:
+-------------+-----------+
| cbs_item_id | remaining |
+-------------+-----------+
| 1 | 68 |
+-------------+-----------+
| 2 | 297 |
+-------------+-----------+
I am guessing this might be a combination of GROUP BY and perhaps a sub query or even CTE (voodoo to me). My SQL skills are not that great and this is WAY above my pay grade.
Here is a fiddle for the data above:
http://sqlfiddle.com/#!17/2fe3a
Alternate fiddle:
https://dbfiddle.uk/?rdbms=postgres_11&fiddle=e9528042874206477efbe0f0e86326fb
This query produces the result you are looking for:
SELECT cbs_item_id, sum(order_total - invoice_total) AS remaining
FROM (
SELECT cbs_item_id
, COALESCE(e.parent_id, e.id) AS expenditure_id -- ①
, COALESCE(sum(total) FILTER (WHERE e.category = 'order' ), 0) AS order_total -- ②
, COALESCE(sum(total) FILTER (WHERE e.category = 'invoice'), 0) AS invoice_total
FROM expenditures e
JOIN expenditure_items i ON i.expenditure_id = e.id
GROUP BY 1, 2 -- ③
) sub
GROUP BY 1
ORDER BY 1;
db<>fiddle here
① Note how I assume a saner table definition with expenditures.parent_id being integer, and true NULL instead of the string 'nil'. This allows the simple use of COALESCE.
② About the aggregate FILTER clause:
Aggregate columns with additional (distinct) filters
③ Using short syntax with ordinal numbers of an SELECT list items. Example:
Select first row in each GROUP BY group?
can I get the total of all the remaining for all rows or do I need to wrap that into another sub select?
There is a very concise option with GROUPING SETS:
...
GROUP BY GROUPING SETS ((1), ()) -- that's all :)
db<>fiddle here
Related:
Converting rows to columns

Distinct lists on dates where an ID is present (i.e. intersects) on consecutive dates

I'm trying to make an MSSQL query that produces lists of apartment prices. The ultimate goal of the query is to calculate the percentage change in average prices of apartments. However, this final calculation (namely taking averages) is something I can fix in code provided that the list(s) of prices that are retrieved are correct.
What makes this tricky is that apartments are sold and new ones added all the time, so when comparing prices from week to week (I have weekly data), I only want to compare prices for apartments that have a recorded price in weeks (t-1, t), (t, t+1), (t+1,t+2) etc. In other words, some apartments that had a recorded price in time (t-1) might not be there at time t, and some apartments may have been added at time t (and thus weren't there at time t-1). I only want to select prices in week t-1 and t where some ApartmentID exists in both week t-1 and t to calculate the average change in week t.
Example data
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 2 | 2014-04-04 | 6 | 2 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
Solutions thoughts
I think it makes sense to produce two different lists, one for odd-numbered weeks and one for even-numbered weeks. List 1 would then contain Date, Price and ApartmentID that are valid for the tuples (t-1,t), (t+1,t+2), (t+3,t+4) etc. while list 2 would contain the same for the tuples (t,t+1),(t+2,t+3),(t+4,t+5) etc. The reason I think two lists are needed is that for any given week t, there are two sets of apartments and corresponding prices that need to be produced - one that is "forward compatible" and one that is "backwards compatible".
If two such lists can be produced, then the rest is simply an exercise in taking averages over each distinct date.
I'm not really sure to begin here. I played a little around with Intersect, but I'm pretty sure I need to nest queries to get this to work.
Result
Using the methodology described above would yield two lists.
List 1:
Notice how RegistrationID 2 and 6 disappear because they don't exist in on both dates 2014-04-04 and 2014-04-11. The same goes for RegistrationID 7 as this apartment doesn't exist for both 2014-04-19 and 2014-04-26.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 1 | 2014-04-04 | 5 | 1 |
| 3 | 2014-04-04 | 4 | 3 |
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
| 10 | 2014-04-26 | 4.1 | 3 |
| 11 | 2014-04-26 | 7.2 | 4 |
-------------------------------------------------------------
List 2:
Here, nothing disappears because every apartment is present in the tuples within the scope of this list.
-------------------------------------------------------------
| RegistrationID | Date | Price | ApartmentID |
-------------------------------------------------------------
| 4 | 2014-04-11 | 5.2 | 1 |
| 5 | 2014-04-11 | 4 | 3 |
| 6 | 2014-04-11 | 7 | 4 |
| 7 | 2014-04-19 | 5.1 | 1 |
| 8 | 2014-04-19 | 4.1 | 3 |
| 9 | 2014-04-19 | 7.1 | 4 |
-------------------------------------------------------------
Here's a solution. First, I get all the records from the table (I named it "ApartmentPrice"), computing the WeekOf (which is the Sunday of that week), PreviousWeek (the Sunday of the previous week), and NextWeek (the Sunday of the following week). I store that in a table variable (you could also put it in a CTE or a temp table).
declare #tempTable table(RegistrationId int, PriceDate date, Price decimal(8,2), ApartmentId int, WeekOf date, PreviousWeek date, NextWeek date)
Insert #tempTable
select ap.RegistrationId,
ap.PriceDate,
ap.Price,
ap.ApartmentId,
DATEADD(ww, DATEDIFF(ww,0,ap.PriceDate), 0) WeekOf,
DATEADD(ww, DATEDIFF(ww,0,dateadd(wk, -1, ap.PriceDate)), 0) PreviousWeek,
DATEADD(ww, DATEDIFF(ww,0,dateadd(wk, 1, ap.PriceDate)), 0) NextWeek
from ApartmentPrice ap
Then I join that table variable to itself where WeekOf equals either NextWeek or PreviousWeek. This gives the apartments that have a record in the adjoining week.
select distinct t.RegistrationId, t.PriceDate, t.Price, t.ApartmentId
from #tempTable t
join #tempTable t2 on t.ApartmentId = t2.ApartmentId and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
order by t.RegistrationId, t.ApartmentId, t.PriceDate
I'm using distinct because an apartment will appear more than once in the results if it does have an adjoining week record.
You can also find the average prices for each week like this:
select t.WeekOf, avg(distinct t.Price)
from #tempTable t
join #tempTable t2 on t.ApartmentId = t2.ApartmentId and (t.WeekOf = t2.PreviousWeek or t.WeekOf = t2.NextWeek)
group by t.WeekOf
order by t.WeekOf
Here's a SQL Fiddle. I added a few more rows to the test data to show that it handles dates that cross the end of the year boundary.

Create a pivot table from two tables based on dates

I have two MS Access tables sharing a one to many relationship. Their structures are like the following:
tbl_Persons
+----------+------------+-----------+
| PersonID | PersonName | OtherData |
+----------+------------+-----------+
| 1 | PersonA | etc. |
| 2 | PersonB | |
| 3 | PersonC | |
tbl_Visits
+----------+------------+------------+-----------------------
| VisitID | PersonID | VisitDate | dozens of other fields
+----------+------------+------------+-----------
| 1 | 1 | 09/01/13 |
| 2 | 1 | 09/02/13 |
| 3 | 2 | 09/03/13 |
| 4 | 2 | 09/04/13 | etc...
I wish to create a new table based on the VisitDate field, the column headings of which are Visit-n where n is 1 to the number of visits, Visit-n-Data1, Visit-n-Data2, Visit-n-Data3 etc.
MergedTable
+----------+----------+---------------+-----------------+----------+----------------+
| PersonID | Visit1 | Visit1Data1 | Visit1Data2... | Visit2 | Visit2Data1... |
+----------+----------+---------------+-----------
| 1 | 09/01/13 | | | 09/02/13 |
| 2 | 09/03/13 | | | 09/04/13 |
| 3 | etc. | |
I am really not sure how to do this. Whether SQL query or using DAO then looping through records and columns. It is essential that there is only 1 PersonID per row and all his data appears chronologically into columns.
Start of by ranking the visits with something like
SELECT PersonID, VisitID,
(SELECT COUNT(VisitID) FROM tbl_Visits AS C
WHERE C.PersonID = tbl_Visits.PersonID
AND C.VisitDate < tbl_Visits.VisitDate) AS RankNumber
FROM tbl_Visits
Use this query as a base for the 'pivot'
Since you seem to have some visits of persons on the same day (visit 1 and 2) the WHERE clause needs to be a bit more sophisticated. But I hope you get the basic concept.
Pivoting can be done with multiple LEFT JOINs.
I question if my solution will have a high performance, since I did not test it. It is easier in SQL Server than in MS Access to accomplish.

MySQL: How to select and display ALL rows from one table, and calculate the sum of a where clause on another table?

I'm trying to display all rows from one table and also SUM/AVG the results in one column, which is the result of a where clause. That probably doesn't make much sense, so let me explain.
I need to display a report of all employees...
SELECT Employees.Name, Employees.Extension
FROM Employees;
--------------
| Name | Ext |
--------------
| Joe | 123 |
| Jane | 124 |
| John | 125 |
--------------
...and join some information from the PhoneCalls table...
--------------------------------------------------------------
| PhoneCalls Table |
--------------------------------------------------------------
| Ext | StartTime | EndTime | Duration |
--------------------------------------------------------------
| 123 | 2010-09-05 10:54:22 | 2010-09-05 10:58:22 | 240 |
--------------------------------------------------------------
SELECT Employees.Name,
Employees.Extension,
Count(PhoneCalls.*) AS CallCount,
AVG(PhoneCalls.Duration) AS AverageCallTime,
SUM(PhoneCalls.Duration) AS TotalCallTime
FROM Employees
LEFT JOIN PhoneCalls ON Employees.Extension = PhoneCalls.Extension
GROUP BY Employees.Extension;
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 10 | 200 | 2000 |
| Jane | 124 | 20 | 250 | 5000 |
| John | 125 | 3 | 100 | 300 |
------------------------------------------------------------
Now I want to filter out some of the rows that are included in the SUM and AVG calculations...
WHERE PhoneCalls.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
...which will ideally result in a table looking something like this:
------------------------------------------------------------
| Name | Ext | CallCount | AverageCallTime | TotalCallTime |
------------------------------------------------------------
| Joe | 123 | 5 | 200 | 1000 |
| Jane | 124 | 10 | 250 | 2500 |
| John | 125 | 0 | 0 | 0 |
------------------------------------------------------------
Note that John has not made any calls in this date range, so his total CallCount is zero, but he is still in the list of results. I can't seem to figure out how to keep records like John's in the list. When I add the WHERE clause, those records are filtered out.
How can I create a select statement that displays all of the Employees and only SUMs/AVGs the values returned from the WHERE clause?
Use:
SELECT e.Name,
e.Extension,
Count(pc.*) AS CallCount,
AVG(pc.Duration) AS AverageCallTime,
SUM(pc.Duration) AS TotalCallTime
FROM Employees e
LEFT JOIN PhoneCalls pc ON pc.extension = e.extension
AND pc.StartTime BETWEEN "2010-09-12 09:30:00" AND NOW()
GROUP BY e.Name, e.Extension
The issue is when using an OUTER JOIN, specifying criteria in the JOIN section is applied before the JOIN takes place--like a derived table or inline view. The WHERE clause is applied after the OUTER JOIN, which is why when you specified the WHERE clause on the table being LEFT OUTER JOIN'd to that the rows you still wanted to see are being filtered out.