Access SQL: Calculating Percent Change between two rows - sql

I am trying to calculate the percentage change between two dates in Access.
The user inputs the start date, an end date and the desired percentage change. If the result matches the criteria it should return the name.
Please note this is the dd/mm/yy format.
This is the table I have:
Name | Payment | DateTime
John | 53.00 | 06/01/18
Mike | 23.16 | 12/07/18
Steve | 31.28 | 21/03/18
John | 58.30 | 22/04/18
For example, if the user inputs a start date of 01/01/18, an end date of 23/04/18, and a desired percentage change of 10 percent then it should return the name John.
Currently, this is the query I have:
SELECT Name
FROM UserPayments
WHERE DateTime >= '01/01/18' AND DateTime <= '23/04/18';
I would like to calculate the total percentage change for two entries. This should be the record date immediately after the first date and the date immediately before the second date.
I'm completely stuck on how to do this in Access (SQL).
I think I need to use a join, but I have not done these before.

There should be a Users table with a primary key of something like UserID, and that UserID should be used in the UserPayments table instead of the Name of the user.
Although it would by nice to use functions like First and Last, MS Access will not allow to use ORDER BY for sorting the details in a group defined by GROUP BY. Therefore, you will have to use subqueries to retrieve the two values for the percentage calculation. If you have a Users table like mentionned above and the DateTime value can be assumed to be unique per User, this could look like this:
SELECT Users.Name,
(SELECT TOP 1 Payment FROM UserPayments
WHERE UserID = Users.UserID AND DateTime Between #01/01/2018# And #04/23/2018#
ORDER BY DateTime DESC)
/
(SELECT TOP 1 Payment FROM UserPayments
WHERE UserID = Users.UserID AND DateTime Between #01/01/2018# And #04/23/2018#
ORDER BY DateTime ASC)
-1 AS PercentChange
FROM Users;
Be aware of rounding errors and format the result as a percentage.

Related

SQL max without group by

I would like to get one row with the maximum date. I cannot use group by as I need to retrieve all data in that row.
I have this:
ID Date Country
1 05/05/2019 US
2 05/06/2019 UK
I want to get this:
ID Date Country
2 05/06/2019 UK
I've tried the below but it didn't work for me
select TOP 1 ID, Date, country
from table
order by Date desc
I don't believe you. Here is a db<>fiddle that shows three different interpretations of the date in your sample data:
as a string
as mm/dd/yyyy
as dd/mm/yyyy
All three of them produce the same result.
I suspect that your actual data is more complicated and you have oversimplified the example for the question. Further, my suspicion is that the date column is stored as a string rather than a date.
As a string, you might have some hidden characters that affect the sorting (such as leading spaces).
If this is the case, fix the data type and your code will work.
This depends on what DB system you are using.
In Microsoft SQL server, you can use row_number() function:
select top 1 *
from facts
order by ROW_NUMBER() over (order by dateKey)
Can you try this?
select Top 1 ID,Date, country from table where date = max(date)
First set the DATE or DATETIME Datatype in your [Date] column
then try this code:
SELECT TOP 1 ID, [Date] , country FROM TableName ORDER BY Date DESC
SELECT ID,Date,Country from TableName Where Date = MAX(Date) AND Rownum <= 1

How many customers upgraded from Product A to Product B?

I have a "daily changes" table that records when a customer "upgrades" or "downgrades" their membership level. In the table, let's say field 1 is customer ID, field 2 is membership type and field 3 is the date of change. Customers 123 and ABC each have two rows in the table. Values in field 1 (ID) are the same, but values in field 2 (TYPE) and 3 (DATE) are different. I'd like to write a SQL query to tell me how many customers "upgraded" from membership type 1 to membership type 2 how many customers "downgraded" from membership type 2 to membership type 1 in any given time frame.
The table also shows other types of changes. To identify the records with changes in the membership type field, I've created the following code:
SELECT *
FROM member_detail_daily_changes_new
WHERE customer IN (
SELECT customer
FROM member_detail_daily_changes_new
GROUP BY customer
HAVING COUNT(distinct member_type_cd) > 1)
I'd like to see an end report which tells me:
For Fiscal 2018,
X,XXX customers moved from Member Type 1 to Member Type 2 and
X,XXX customers moved from Member Type 2 to Member type 1
Sounds like a good time to use a LEAD() analytical function to look ahead for a given customer's member_Type; compare it to current record and then evaluate if thats an upgrade/downgrade then sum results.
DEMO
CTE AS (SELECT case when lead(Member_Type_Code) over (partition by Customer order by date asc) > member_Type_Code then 1 else 0 end as Upgrade
, case when lead(Member_Type_Code) over (partition by Customer order by date asc) < member_Type_Code then 1 else 0 end as DownGrade
FROM member_detail_daily_changes_new
WHERE Date between '20190101' and '20190201')
SELECT sum(Upgrade) upgrades, sum(downgrade) downgrades
FROM CTE
Giving us: using my sample data
+----+----------+------------+
| | upgrades | downgrades |
+----+----------+------------+
| 1 | 3 | 2 |
+----+----------+------------+
I'm not sure if SQL express on rex tester just doesn't support the sum() on the analytic itself which is why I had to add the CTE or if that's a rule in non-SQL express versions too.
Some other notes:
I let the system implicitly cast the dates in the where clause
I assume the member_Type_Code itself tells me if it's an upgrade or downgrade which long term probably isn't right. Say we add membership type 3 and it goes between 1 and 2... now what... So maybe we need a decimal number outside of the Member_Type_Code so we can handle future memberships and if it's an upgrade/downgrade or a lateral...
I assumed all upgrades/downgrades are counted and a user can be counted multiple times if membership changed that often in time period desired.
I assume an upgrade/downgrade can't occur on the same date/time. Otherwise the sorting for lead may not work right. (but if it's a timestamp field we shouldn't have an issue)
So how does this work?
We use a Common table expression (CTE) to generate the desired evaluations of downgrade/upgrade per customer. This could be done in a derived table as well in-line but I find CTE's easier to read; and then we sum it up.
Lead(Member_Type_Code) over (partition by customer order by date asc) does the following
It organizes the data by customer and then sorts it by date in ascending order.
So we end up getting all the same customers records in subsequent rows ordered by date. Lead(field) then starts on record 1 and Looks ahead to record 2 for the same customer and returns the Member_Type_Code of record 2 on record 1. We then can compare those type codes and determine if an upgrade or downgrade occurred. We then are able to sum the results of the comparison and provide the desired totals.
And now we have a long winded explanation for a very small query :P
You want to use lag() for this, but you need to be careful about the date filtering. So, I think you want:
SELECT prev_membership_type, membership_type,
COUNT(*) as num_changes,
COUNT(DISTINCT member) as num_members
FROM (SELECT mddc.*,
LAG(mddc.membership_type) OVER (PARTITION BY mddc.customer_id ORDER BY mddc.date) as prev_membership_type
FROM member_detail_daily_changes_new mddc
) mddc
WHERE prev_membership_type <> membership_type AND
date >= '2018-01-01' AND
date < '2019-01-01'
GROUP BY membership_type, prev_membership_type;
Notes:
The filtering on date needs to occur after the calculation of lag().
This takes into account that members may have a certain type in 2017 and then change to a new type in 2018.
The date filtering is compatible with indexes.
Two values are calculated. One is the overall number of changes. The other counts each member only once for each type of change.
With conditional aggregation after self joining the table:
select
2018 fiscal,
sum(case when m.member_type_cd > t.member_type_cd then 1 else 0 end) upgrades,
sum(case when m.member_type_cd < t.member_type_cd then 1 else 0 end) downgrades
from member_detail_daily_changes_new m inner join member_detail_daily_changes_new t
on
t.customer = m.customer
and
t.changedate = (
select max(changedate) from member_detail_daily_changes_new
where customer = m.customer and changedate < m.changedate
)
where year(m.changedate) = 2018
This will work even if there are more than 2 types of membership level.

SQL Find latest record only if COMPLETE field is 0

I have a table with multiple records submitted by a user. In each record is a field called COMPLETE to indicate if a record is fully completed or not.
I need a way to get the latest records of the user where COMPLETE is 0, LOCATION, DATE are the same and no additional record exist where COMPLETE is 1. In each record there are additional fields such as Type, AMOUNT, Total, etc. These can be different, even though the USER, LOCATION, and DATE are the same.
There is a SUB_DATE field and ID field that denote the day the submission was made and auto incremented ID number. Here is the table:
ID NAME LOCATION DATE COMPLETE SUB_DATE TYPE1 AMOUNT1 TYPE2 AMOUNT2 TOTAL
1 user1 loc1 2017-09-15 1 2017-09-10 Food 12.25 Hotel 65.54 77.79
2 user1 loc1 2017-09-15 0 2017-09-11 Food 12.25 NULL 0 12.25
3 user1 loc2 2017-08-13 0 2017-09-05 Flight 140 Food 5 145.00
4 user1 loc2 2017-08-13 0 2017-09-10 Flight 140 NULL 0 140
5 user1 loc3 2017-07-14 0 2017-07-15 Taxi 25 NULL 0 25
6 user1 loc3 2017-08-25 1 2017-08-26 Food 45 NULL 0 45
The results I would like is to retrieve are ID 4, because the SUB_DATE is later that ID 3. Which it has the same Name, Location, and Date information and there is no COMPLETE with a 1 value.
I would also like to retrieve ID 5, since it is the latest record for the User, Location, Date, and Complete is 0.
I would also appreciate it if you could explain your answer to help me understand what is happening in the solution.
Not sure if I fully understood but try this
SELECT *
FROM (
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your_table t
) a
WHERE CompleteForNameLocationAndDate = 0 AND
SUB_DATE = LastSubDate
So what we have done here:
First, if you run just the inner query in Management Studio, you will see what that does:
The first max function will partition the data in the table by each unique Name,Location,Date set.
In the case of your data, ID 1 & 2 are the first partition, 3&4 are the second partition, 5 is the 3rd partition and 6 is the 4th partition.
So for each of these partitions it will get the max value in the complete column. Therefore any partition with a 1 as it's max value has been completed.
Note also, the convert function. This is because COMPLETE is of datatype BIT (1 or 0) and the max function does not work with that datatype. We therefore convert to INT. If your COMPLETE column is type INT, you can take the convert out.
The second max function partitions by unique Name, Location and Date again but we are getting the max_sub date this time which give us the date of the latest record for the Name,Location,Date
So we take that query and add it to a derived table which for simplicity we call a. We need to do this because SQL Server doesn't allowed windowed functions in the WHERE clause of queries. A windowed function is one that makes use of the OVER keyword as we have done. In an ideal world, SQL would let us do
SELECT *,
MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) AS CompleteForNameLocationAndDate,
MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE) AS LastSubDate
FROM your)table t
WHERE MAX(CONVERT(INT,COMPLETE)) OVER (PARTITION BY NAME,LOCATION,DATE) = 0 AND
SUB_DATE = MAX(SUB_DATE) OVER (PARTITION BY NAME, LOCATION, DATE)
But it doesn't allow it so we have to use the derived table.
So then we basically SELECT everything from our derived table Where
CompleteForNameLocationAndDate = 0
Which are Name,Location, Date partitions which do not have a record marked as complete.
Then we filter further asking for only the latest record for each partition
SUB_DATE = LastSubDate
Hope that makes sense, not sure what level of detail you need?
As a side, I would look at restructuring your tables (unless of course you have simplified to better explain this problem) as follows:
(Assuming the table in your examples is called Booking)
tblBooking
BookingID
PersonID
LocationID
Date
Complete
SubDate
tblPerson
PersonID
PersonName
tblLocation
LocationID
LocationName
tblType
TypeID
TypeName
tblBookingType
BookingTypeID
BookingID
TypeID
Amount
This way if you ever want to add Type3 or Type4 to your booking information, you don't need to alter your table layout

MS Access Query to get multiple counts from the same field

I'm querying imported data that has a date/time field that I can't format as date in the table.
Sample:
Ticket Name Date
INC000101 User1 9/5/2016 10:00:34AM
INC000102 User2 9/5/2016 12:02:00PM
INC000103 User1 9/7/2016 3:34:00PM
INC000104 User2 10/1/2016 9:30:23AM
INC000105 User1 10/5/2016 10:20:00AM
INC000106 USer2 10/6/2016 4:56:00PM
I'm trying to get a count of how many tickets each user has per month. Because the Date field comes from the database as a text field, I can't seem to make that format as date/time so I use "left" to filter by month. This is what I've used to get a return on a single User item for the month of October.
SELECT COUNT(*)
FROM 2016YTD
WHERE [Name]='User1' AND left(Date,3) = '10/';
I would like to add counts for User2 through UserX per month so that I can get a count row or column for each the quantity of tickets for each user each month in one report. Everything I've tried won't save the query due to syntax errors in one form or another. I've tried variations of the following query help post as well without success.
SELECT a.distributor_id,
(SELECT COUNT(*) FROM myTable WHERE level='personal' and distributor_id = a.distributor_id) as PersonalCount,
(SELECT COUNT(*) FROM myTable WHERE level='exec' and distributor_id = a.distributor_id) as ExecCount,
(SELECT COUNT(*) FROM myTable WHERE distributor_id = a.distributor_id) as TotalCount
FROM myTable a ;
I'm sure the answer is staring at me, just not sure what at the moment.
Thanks for reading
John
This is only an answer to your first question about how to deal with dates that are stored in text fields.
To get the count for all users for every month you can do:
SELECT [Name], Format([Date],'mmmm') AS Month, COUNT(*) as Count
FROM 2016YTD
GROUP BY [Name], Format([Date],'mmmm')
A text field containing a date that is always in the same format can be treated as a date with Format() so Format([Date],'mmmm') returns the full month name for each date.
you should just need conditional aggregation. I haven't looked at access in a while but probably just something like this:
SELECT
a.distributor_id
,Format([Date],'mmmm') as Month
,SUM(IIF(level='personal',1,0)) as PersonalCount
,SUM(IIF(level='exec',1,0)) as ExecCount
,COUNT(*) as TotalCount
FROM
myTable a
GROUP BY
a.distributor_id
,Format([Date],'mmmm')

Is there an established pattern for SQL queries which group by a range?

I've seen a lot of questions on SO concerning how to group data by a range in a SQL query.
The exact scenarios vary, but the general underlying problem in each is to group by a range of values rather than each discrete value in the GROUP BY column. In other words, to group by a less precise granularity than you're storing in the database table.
This crops up often in the real world when producing things like histograms, calendar representations, pivot tables and other bespoke reporting outputs.
Some example data (tables unrelated):
| OrderHistory | | Staff |
--------------------------- ------------------------
| Date | Quantity | | Age | Name |
--------------------------- ------------------------
|01-Jul-2012 | 2 | | 19 | Barry |
|02-Jul-2012 | 5 | | 53 | Nigel |
|08-Jul-2012 | 1 | | 29 | Donna |
|10-Jul-2012 | 3 | | 26 | James |
|14-Jul-2012 | 4 | | 44 | Helen |
|17-Jul-2012 | 2 | | 49 | Wendy |
|28-Jul-2012 | 6 | | 62 | Terry |
--------------------------- ------------------------
Now let's say we want to use the Date column of the OrderHistory table to group by weeks, i.e. 7-day ranges. Or perhaps group the Staff into 10-year age ranges:
| Week | QtyCount | | AgeGroup | NameCount |
-------------------------------- -------------------------
|01-Jul to 07-Jul | 7 | | 10-19 | 1 |
|08-Jul to 14-Jul | 8 | | 20-29 | 2 |
|15-Jul to 21-Jul | 2 | | 30-39 | 0 |
|22-Jul to 28-Jul | 6 | | 40-49 | 2 |
-------------------------------- | 50-59 | 1 |
| 60-69 | 1 |
-------------------------
GROUP BY Date and GROUP BY Age on their own won't do it.
The most common answers I see (none of which are consistently voted "correct") are to use one or more of:
a bunch of CASE statements, one per grouping
a bunch of UNION queries, with a different WHERE clause per grouping
as I'm working with SQL Server, PIVOT() and UNPIVOT()
a two-stage query using a sub-select, temp table or View construct
Is there an established generic pattern for dealing with such queries?
You can use some of the dimensional modeling techniques, such as fact tables and dimension tables. Order History can act as a fact table with DateKey foreign key relation to a Date dimension.
Date dimension can have a schema such as below:
Note that Date table is pre-filled with data up-to N number of years.
Using an example above, here is a sample query to get the result:
select CalendarWeek, sum(Quantity)
from OrderHistory a
join DimDate b
on a.DateKey = b.DateKey
group by CalendarWeek
For Staff table, you can store Birthday Key instead of age and let the query calculate the age and ranges.
Here is SQL Fiddle
Date dimension population script was taken from here.
As is often the case this SQL problem requires using more than one pattern in composition.
In this case the two you can use are
NTILE
Numbers Table
You can use NTITLE to create a set number of groups. However since you don't have each member of the groups represented you also need to use a numbers table Since you're using SQL Server you have it easy as you don't have to simulate either.
Here's an example for the Staff problem
WITH g as (
SELECT
NTILE(6) OVER (ORDER BY number) grp,
NUMBER
FROM
master..spt_values
WHERE
TYPE = 'P'
and number >=10 and number <=69
)
SELECT
CAST(min(g.number) as varchar) + ' - ' +
CAST(max(g.number) as varchar) AgeGroup ,
COUNT(s.age) NameCount
FROM
g
LEFT JOIN Staff s
ON g.NUMBER = s.Age
GROUP BY
grp
DEMO
You can apply this to dates as well it just requires some date to day maniplulation
Take a look at the OVER clause and its associated clauses: PARTITION BY, ROW, RANGE...
Determines the partitioning and ordering of a rowset before the
associated window function is applied. That is, the OVER clause
defines a window or user-specified set of rows within a query result
set. A window function then computes a value for each row in the
window. You can use the OVER clause with functions to compute
aggregated values such as moving averages, cumulative aggregates,
running totals, or a top N per group results.
My favorite case in this genre is where transactions must be grouped by fiscal quarter or fiscal year. The fiscal quarter or fiscal year boundaries of various enterprises can border on the bizarre.
My favorite way to implement this is to create a separate table for the attributes of a date. Let's call the table "Almanac". One of the columns in this table is the fiscal quarter, and another one is the fiscal year. The key to this table is of course the date. Ten years worth of data fill up 3,650 rows, plus a few for leap years. You then need a program that can populate this table from scratch. All the enterprise calendar rules are built into this one program.
When you need to group transaction data by fiscal quarter, you just join with this table over date, and then group by fiscal quarter.
I figure this pattern could be extended to groupings by other kinds of ranges, but I've never done it myself.
In your first example your intervals are regular so you can achieve the desired result simply by using functions. Below is an example that gets the data as you require it. The first query keeps the first column in date format (how I would preferably deal with it doing any formatting outside of SQL), the second does the string conversion for you.
DECLARE #OrderHistory TABLE (Date DATE, Quantity INT)
INSERT #OrderHistory VALUES
('20120701', 2), ('20120702', 5), ('20120708', 1), ('20120710', 3),
('20120714', 4), ('20120717', 2), ('20120728', 6)
SET DATEFIRST 7
SELECT DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date) AS WeekStart,
SUM(Quantity) AS Quantity
FROM #OrderHistory
GROUP BY DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date)
SELECT WeekStart,
SUM(Quantity) AS Quantity
FROM #OrderHistory
CROSS APPLY
( SELECT CONVERT(VARCHAR(6), DATEADD(DAY, 1 - DATEPART(WEEKDAY, Date), Date), 6) + ' to ' +
CONVERT(VARCHAR(6), DATEADD(DAY, 7 - DATEPART(WEEKDAY, Date), Date), 6) AS WeekStart
) ws
GROUP BY WeekStart
Something similar can be done for your age grouping using:
SELECT CAST(FLOOR(Age / 10.0) * 10 AS INT)
However this fails for 30-39 because there is no data for this group.
My stance on the matter would be, if you are doing the query as a one off, using a temp table, cte or case statement should work just fine, this should also extend to reusing the same query on small sets of data.
If you are likely to reuse the group however, or you are referring to significant amounts of data then create a permanent table with the ranges defined and indices applied to any columns required. This is the basis of creating dimensions in OLAP.
Couldn't you treat the age (or date) as a foreign key in a new, tiny table that is just ages (or dates) and their corresponding ranges? A join statement could provide a new table with a column that contains AgeGroups. With the new table you could use the standard group-by method.
It does seem reckless to make a new table for grouping, but it would be easy to make programatically and I think it would be easier to maintain (or drop and recreate) than a case statement or a where clause. If the result of this query is a one-off, a throwaway sql statement would probably work best, but I think my method makes the most sense for long-term use.
Well, some years ago with Oracle DB we did it the following way:
We had two tables: Sessions and Ranges. Ranges had foreign key that referenced Session.
When we needed to perform SQL, we created a new record in Sessions and several new records in Ranges that referred to that session.
Our SQL joined Ranges with filter by Session:
select sum(t.Value), r.Name
from DataTable t
join Ranges r on (r.Session = ? and r.Start t.MyDate)
group by r.Name
After we got results we deleted that record from Sessions and records from Ranges where deleted by cascade.
We had daemon job that purged Sessions from junk records that were leaked in case of extraordinary situation (killed processes, etc).
This worked perfectly. Since that time Oracle added new SQL clauses, and maybe they could be used instead. But on other RDBMSes this is still a valid way.
Another approach is to create a number of functions such as GET_YEAR_BY_DATE or GET_QUARTER_BY_DATE or GET_WEEK_BY_DATE (they would return start date of corresponding
period, for example, for any date return start date of year). And then group by them:
select sum(Value), GET_YEAR_BY_DATE(MyDate) from DataTable
group by GET_YEAR_BY_DATE(MyDate)