How to calculate percent change between two values in the same column - sql

I want to calculate a percent change between two values in the same column in a specific form and I have no idea if what I’m trying to do is even possible.
I have a table with 3 fields: month, country, and value.
order_month | country | value
------------+---------+------
2021-01     | UK      | 10
2022-02     | UK      | 20
2021-01     | France  | 20
2022-02     | France  | 18
2021-01     | Italy   | 25
2021-02     | Italy   | 35
What I'm struggling to get:
order_month | country | value
------------+---------+------
2021-01     | UK      | 10
2022-02     | UK      | 20
diff        | UK      | 10
2021-01     | France  | 20
2022-01     | France  | 18
diff        | France  | -2
2021-01     | Italy   | 25
2022-02     | Italy   | 35
diff        | Italy   | 10
I tried many things without success. Thanks a lot if you can help me on this.

You can use the LEAD/LAG window functions for this. I'd propose using this to create a new column for the difference, rather than hoping to add in a new row into the result to get the difference of the two rows above it.
Schema (MySQL v8.0)
CREATE TABLE data (
`order_month` date,
`country` VARCHAR(6),
`value` INTEGER
);
INSERT INTO data
(`order_month`, `country`, `value`)
VALUES
('2021-01-01', 'UK', '10'),
('2022-02-01', 'UK', '20'),
('2021-01-01', 'France', '20'),
('2022-02-01', 'France', '18'),
('2021-01-01', 'Italy', '25'),
('2022-02-01', 'Italy', '35');
Query #1
select *,
VALUE - Lead(VALUE) OVER (PARTITION BY COUNTRY ORDER BY ORDER_MONTH DESC) as Month_vs_Month
from data;
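order_month | country | value | Month_vs_Month
------------+---------+-------+---------------
2022-02-01  | France  | 18    | -2
2021-01-01  | France  | 20    | NULL
2022-02-01  | Italy   | 35    | 10
2021-01-01  | Italy   | 25    | NULL
2022-02-01  | UK      | 20    | 10
2021-01-01  | UK      | 10    | NULL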
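Since the question title asks for a percent change rather than a plain difference, here is a minimal sketch of how the same window-function idea could be extended. It runs against an in-memory SQLite database from Python instead of MySQL (an assumption made so the example is self-contained), and the column names diff and pct_change are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (order_month TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2021-01-01", "UK", 10),
        ("2022-02-01", "UK", 20),
        ("2021-01-01", "France", 20),
        ("2022-02-01", "France", 18),
        ("2021-01-01", "Italy", 25),
        ("2022-02-01", "Italy", 35),
    ],
)

# LAG fetches the previous row's value within each country, ordered by month.
# Multiplying by 100.0 forces floating-point division.
query = """
SELECT order_month,
       country,
       value,
       value - LAG(value) OVER w AS diff,
       ROUND(100.0 * (value - LAG(value) OVER w) / LAG(value) OVER w, 2) AS pct_change
FROM data
WINDOW w AS (PARTITION BY country ORDER BY order_month)
ORDER BY country, order_month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The first row of each country has no previous row, so both diff and pct_change come back as NULL (None in Python) there.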

Granted, I'm using SQL Server, but both it and MySQL support UNION ALL and the FIRST_VALUE analytic function, so I think this will work.
Assumptions:
Your order_month is a string; otherwise the value 'diff' can't be placed in that column.
The collation sorts digits before letters; otherwise the sort order could be off.
You're OK with the output being sorted by country.
WITH CTE AS (SELECT order_month, country, value
FROM data
UNION ALL
SELECT Distinct 'diff' order_month, country,
FIRST_VALUE(value) over (partition by country order by order_month DESC) -
FIRST_VALUE(value) over (partition by country order by order_month ASC) value
FROM data)
SELECT *
FROM CTE
ORDER BY country, order_month
Giving us:
+-------------+---------+-------+
| order_month | country | value |
+-------------+---------+-------+
| 2021-01-01 | France | 20 |
| 2022-02-01 | France | 18 |
| diff | France | -2 |
| 2021-01-01 | Italy | 25 |
| 2022-02-01 | Italy | 35 |
| diff | Italy | 10 |
| 2021-01-01 | UK | 10 |
| 2022-02-01 | UK | 20 |
| diff | UK | 10 |
+-------------+---------+-------+
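What this does:
In the CTE, use the FIRST_VALUE analytic twice per country: once ordered by month descending (the newest value) and once ascending (the oldest value).
We then subtract the older from the newer.
We then need to group and order the data, which is why the 'diff' rows are unioned with the original rows in a CTE that we select from.
I'm not a fan of overloading the order_month column with 'diff'.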

You need to create two subqueries or CTEs that isolate the values to the months you're analyzing.
Subquery example
select
country,
value as jan_value
from {{table}}
where order_month = '2022-01'
Do the same for February, then join the two subqueries on country to create a new data set with country, jan_value, and feb_value. From this data set you can determine the difference in the values.
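A minimal sketch of that join, run against an in-memory SQLite database from Python so it is self-contained. It assumes the two months being compared are '2021-01' and '2022-02' for every country, and the aliases start_value, end_value, and diff are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (order_month TEXT, country TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [
        ("2021-01", "UK", 10),
        ("2022-02", "UK", 20),
        ("2021-01", "France", 20),
        ("2022-02", "France", 18),
        ("2021-01", "Italy", 25),
        ("2022-02", "Italy", 35),
    ],
)

# One subquery per month, joined on country so both values sit on the same row.
query = """
SELECT a.country,
       a.value AS start_value,
       b.value AS end_value,
       b.value - a.value AS diff
FROM (SELECT country, value FROM data WHERE order_month = '2021-01') AS a
JOIN (SELECT country, value FROM data WHERE order_month = '2022-02') AS b
  ON a.country = b.country
ORDER BY a.country
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With an inner join, a country that is missing either month drops out of the result; a LEFT JOIN would keep it with a NULL difference.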

Related

How to find the average by day in SQL?

I am super new to SQL and am trying to figure out how to find the average by day. So YTD what were they averaging by day.
the table below is an example of the table I am working with
Study Date | ID | Subject
01\01\2018 | 123 | Math
01\01\2018 | 456 | Science
01\02\2018 | 789 | Science
01\02\2018 | 012 | History
01\03\2018 | 345 | Science
01\03\2018 | 678 | History
01\03\2018 | 921 | Art
01\03\2018 | 223 | Science
01\04\2018 | 256 | English
For instance, if I filter on just 'Science' as the Subject, the output I am looking for is: out of the 4 Science results, what are the daily average, min, and max YTD?
So my max in a day would be 2 Science subjects, my min would be 1, etc.
How can I write a query to output this information?
So far I know to get the YTD total it would be
select SUBJECT, count (ID)
from table
where SUBJECT='science' and year (Study_date)=2022
group by SUBJECT
what would be the next step I have to take?
If you want the min/max of the daily subject count, then you need two levels of aggregation:
select subject, sum(cnt_daily) as cnt,
min(cnt_daily) as min_cnt_daily, max(cnt_daily) as max_cnt_daily
from (
select study_date, subject, count(*) as cnt_daily
from mytable
where study_date >= '2022-01-01'
group by study_date, subject
) t
group by subject
The subquery aggregates by day and by subject, and computes the number of occurrences in each group. Then, the outer query groups by subject only, and computes the total count (that's the sum of the intermediate counts), along with the min/max daily values.
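The two-level aggregation can be tried out with a minimal sketch like the one below, which loads the question's sample rows into an in-memory SQLite database from Python. The ISO date strings and the extra avg_daily column are assumptions added for illustration. Note that the average is taken over the per-day counts, so only days on which the subject appears at all are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (study_date TEXT, id TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [
        ("2018-01-01", "123", "Math"),
        ("2018-01-01", "456", "Science"),
        ("2018-01-02", "789", "Science"),
        ("2018-01-02", "012", "History"),
        ("2018-01-03", "345", "Science"),
        ("2018-01-03", "678", "History"),
        ("2018-01-03", "921", "Art"),
        ("2018-01-03", "223", "Science"),
        ("2018-01-04", "256", "English"),
    ],
)

# Inner query: one row per (day, subject) with that day's count.
# Outer query: collapse the daily counts into total/min/max/avg per subject.
query = """
SELECT subject,
       SUM(cnt_daily) AS total,
       MIN(cnt_daily) AS min_daily,
       MAX(cnt_daily) AS max_daily,
       AVG(cnt_daily) AS avg_daily
FROM (
    SELECT study_date, subject, COUNT(*) AS cnt_daily
    FROM mytable
    GROUP BY study_date, subject
) AS t
WHERE subject = 'Science'
GROUP BY subject
"""
row = conn.execute(query).fetchone()
print(row)
```

For Science this gives a total of 4 across 3 days, so the daily average is 4/3 rather than 1.5.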
select Subject
,count(*) as total_count
,min(cnt) as min_daily_count
,max(cnt) as max_daily_count
,avg(cnt*1.0) as avg_daily_count
from
(
select *
,count(*) over(partition by Study_Date, Subject) as cnt
from t
) t
group by Subject
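Subject | total_count | min_daily_count | max_daily_count | avg_daily_count
--------+-------------+-----------------+-----------------+----------------
Art     | 1           | 1               | 1               | 1.000000
English | 1           | 1               | 1               | 1.000000
History | 2           | 1               | 1               | 1.000000
Math    | 1           | 1               | 1               | 1.000000
Science | 4           | 1               | 2               | 1.500000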

Get historical average and count of a value where a date could exist more than once

I have a table with multiple equal date entries and a value. I need a table that calculates the historical value and the count of entries per date. I want to use the data to create some charts in gnuplot/etc later.
Raw data:
date | value
------------+------
2017-11-26 | 5
2017-11-26 | 5
2017-11-26 | 5
2017-11-28 | 20
2017-11-28 | 5
2018-01-07 | 200
2018-01-07 | 5
2018-01-07 | 20
2018-01-15 | 5
2018-01-16 | 50
Output should be:
date | avg | count manual calc explanation
------------+--------+------- ---------------------------------------
2017-11-26 | 5 | 3 (5+5+5) / 3 = 5
2017-11-28 | 8 | 2 (5+5+5+20+5) / 5 = 8
2018-01-07 | 33.125 | 3 (5+5+5+20+5+200+5+20) / 8 = 33.125
2018-01-15 | 30 | 1 (5+5+5+20+5+200+5+20+5) / 9 = 30
2018-01-16 | 32 | 1 (5+5+5+20+5+200+5+20+5+50) / 10 = 32
If it is not possible to calculate two different columns, I would be fine for the avg column. For counting only the dates I have the solution "SELECT DISTINCT date, COUNT(date) FROM table_name GROUP BY date ORDER BY date"
I played around with DISTINCTs, GROUP BYs, JOINs, etc, but I did not find any solution. I found some other articles on the web, but no one covers a case where a date is more than once listed in the table.
You want a running average (total value divided by total count up to the row). This is done with window functions.
select
date,
sum(sum_value) over (order by date) as running_sum,
sum(cnt) over (order by date) as running_count,
sum(sum_value) over (order by date) /
sum(cnt) over (order by date) as running_average
from
(
select date, sum(value) as sum_value, count(*) as cnt
from mytable
group by date
) aggregated
order by date;
Demo: https://dbfiddle.uk/?rdbms=postgres_13&fiddle=fb13b63970cb096913a53075b8b5c8d7
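For readers without a Postgres instance, here is a minimal sketch of the same idea against an in-memory SQLite database from Python. It keeps the per-date count the question asked for next to the running average. The 1.0 multiplier is an addition needed because SQLite would otherwise do integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (date TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [
        ("2017-11-26", 5), ("2017-11-26", 5), ("2017-11-26", 5),
        ("2017-11-28", 20), ("2017-11-28", 5),
        ("2018-01-07", 200), ("2018-01-07", 5), ("2018-01-07", 20),
        ("2018-01-15", 5),
        ("2018-01-16", 50),
    ],
)

# Aggregate per date first, then accumulate sums and counts with window functions.
query = """
SELECT date,
       cnt,
       1.0 * SUM(sum_value) OVER (ORDER BY date)
           / SUM(cnt) OVER (ORDER BY date) AS running_average
FROM (
    SELECT date, SUM(value) AS sum_value, COUNT(*) AS cnt
    FROM mytable
    GROUP BY date
) AS aggregated
ORDER BY date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The rows match the expected output in the question: running averages of 5, 8, 33.125, 30, and 32 with per-date counts of 3, 2, 3, 1, and 1.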

(Presto) SQL: Group by on columns "A" and "B" and count column "C", but also include count of "C" grouped only by "A"

The title of the question feels a bit weird so if you can imagine a better one please feel free to help.
Hello,
imagine a situation like this - there's a "Sales" table with 3 columns: date, store and sale_price, each row indicates a single item sale:
date | store | sale_price
---------------+---------+------------
2021-09-01 | foo | 15
2021-09-01 | foo | 10
2021-09-01 | foo | 10
2021-09-01 | bar | 5
2021-09-02 | foo | 30
2021-09-02 | bar | 40
2021-09-02 | bar | 20
etc...
What I'm trying to do is create a query that groups by date and store, and counts how many items have been sold by each store in each day (so, disregarding the price). So far it's very easy, but for visualization purposes, I'm also trying to add an extra row, that per day also includes the aggregate of sale counts.
Here's the end result I'm looking for:
date | store | sales_count
---------------+-------------+------------
2021-09-01 | foo | 3
2021-09-01 | bar | 1
2021-09-01 | aggregate | 4
2021-09-02 | foo | 1
2021-09-02 | bar | 2
2021-09-02 | aggregate | 3
etc...
I know I can create this by doing a UNION ALL, but it is not super efficient because it scans the original table twice:
SELECT date,
store,
count(sale_price) AS sales_count
FROM sales
GROUP BY 1, 2
UNION ALL
SELECT date,
'aggregate' AS store,
count(sale_price) AS sales_count
FROM sales
GROUP BY 1
I also know that I can create an extra column using over() clause, and avoid scanning "sales" twice, but then I would have two different columns instead of just one like I'm looking for:
SELECT date,
store,
count(sale_price) AS sales_count,
sum(count(sale_price)) over(PARTITION BY date) AS sales_per_day
FROM sales
GROUP BY 1, 2
--->
date | store | sales_count | sales_per_day
---------------+-------------+--------------+-----------------
2021-09-01 | foo | 3 | 4
2021-09-01 | bar | 1 | 4
2021-09-02 | foo | 1 | 3
2021-09-02 | bar | 2 | 3
etc...
Is it even possible to achieve what I'm trying to do without scanning twice? Can the last two columns (sales_count and sales_per_day) be somehow merged?
Thanks in advance for your help.
You can use GROUPING SETS, CUBE and ROLLUP to aggregate at different levels within the same query. You can also use the GROUPING operation to determine which columns were considered in the group for a given output row:
WITH data(day, store, sale_price) AS (
VALUES
(DATE '2021-09-01', 'foo', 15),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'foo', 10),
(DATE '2021-09-01', 'bar', 5),
(DATE '2021-09-02', 'foo', 30),
(DATE '2021-09-02', 'bar', 40),
(DATE '2021-09-02', 'bar', 20)
)
SELECT day,
if(grouping(store) = 1, '<aggregate>', store),
count(sale_price) as sales_count
FROM data
GROUP BY GROUPING SETS ((day), (day, store))
ORDER BY day, grouping(store)

PHP Database Query - Group by month

An edit per the suggestions:
$sql=
"SELECT SysproCompanyJ.dbo.InvMovements.StockCode,
SysproCompanyJ.dbo.InvMaster.Description,
SysproCompanyJ.dbo.InvMovements.TrnYear,
SysproCompanyJ.dbo.InvMovements.Warehouse,
SysproCompanyJ.dbo.InvMovements.TrnMonth,
SysproCompanyJ.dbo.InvMovements.TrnQty,
SysproCompanyJ.dbo.InvMovements.TrnValue
FROM SysproCompanyJ.dbo.InvMovements,
SysproCompanyJ.dbo.InvMaster
WHERE SysproCompanyJ.dbo.InvMovements.StockCode = SysproCompanyJ.dbo.InvMaster.StockCode
AND SysproCompanyJ.dbo.InvMovements.Warehouse = 'S2'
GROUP BY SysproCompanyJ.dbo.InvMovements.TrnMonth";
The sample DB data would be:
Stockcode | Description | TrnYear | Warehouse | TrnMonth | TrnQty | TrnValue
PN1 | Part Number 1 | 2013 | S2 | 1 | 100 | 10.00
PN2 | Part Number 2 | 2013 | S2 | 1 | 200 | 125.00
PN3 | Part Number 3 | 2013 | S2 | 1 | 200 | 60.00
PN1 | Part Number 1 | 2013 | S2 | 2 | 300 | 560.00
PN4 | Part Number 4 | 2013 | S2 | 2 | 400 | 30.00
PN5 | Part Number 5 | 2013 | S2 | 2 | 100 | 230.00
I'm trying to break down the data into separate tables grouped by month and then having a variable to sum the total TrnValue by month.
The current query as is gives the following error
Warning: odbc_exec() [function.odbc-exec]: SQL error: [Microsoft][ODBC SQL Server Driver][SQL Server]Column 'SysproCompanyJ.dbo.InvMovements.StockCode' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause., SQL state 37000 in SQLExecDirect in C:\wamp\www\dacs\S2_2.php on line 69
You can't use columns in the select list unless they are inside an aggregate function (min, max, sum, count, ...) or are included in the GROUP BY.
Try something like this:
SELECT SysproCompanyJ.dbo.InvMovements.Warehouse,
SysproCompanyJ.dbo.InvMovements.TrnYear,
SysproCompanyJ.dbo.InvMovements.TrnMonth,
Sum(SysproCompanyJ.dbo.InvMovements.TrnQty) as sum_TrnQty,
Sum(SysproCompanyJ.dbo.InvMovements.TrnValue) as sum_TrnValue
FROM SysproCompanyJ.dbo.InvMovements,
SysproCompanyJ.dbo.InvMaster
WHERE SysproCompanyJ.dbo.InvMovements.StockCode = SysproCompanyJ.dbo.InvMaster.StockCode
AND SysproCompanyJ.dbo.InvMovements.Warehouse = 'S2'
GROUP BY SysproCompanyJ.dbo.InvMovements.Warehouse,
SysproCompanyJ.dbo.InvMovements.TrnYear,
SysproCompanyJ.dbo.InvMovements.TrnMonth
It usually doesn't make sense to include varchar columns (StockCode, Description) in any type of aggregate, and since they hold different values, you probably don't want them in the GROUP BY either.
When using GROUP BY in a SQL query, every field in the select list should either be grouped by or be a calculated (aggregate) value.
For example;
SELECT city, count(*), avg(price) FROM properties GROUP BY city;
Which will produce something like;
City | count | Avg
Paris | 3 | 166.666
This counts the number of rows per city. City is part of the GROUP BY; count(*) and avg(price) are calculated (aggregate) columns.
If we introduce another column to the query, 'country':
SELECT country, city, count(*), avg(price) FROM properties GROUP BY city;
The query gives an error, because country is neither grouped nor a calculated value. This error is quite logical: city names are not unique worldwide (e.g. Paris in the USA and Paris in France), so when grouping by city alone, the database cannot show a unique country name.
To resolve this, either include 'country' in the group by, or make it a calculated field;
SELECT country, city, count(*), avg(price) FROM properties GROUP BY country, city;
Will return the results grouped by country, then by city
Country | City | count | Avg
USA | Paris | 1 | 100.000
France | Paris | 2 | 200.000
Using a calculated value for country;
SELECT min(country), city, count(*), avg(price) FROM properties GROUP BY city;
Would return the results grouped by city, and the 'first' country in the group:
min(Country)| City | count | Avg
France | Paris | 3 | 166.666
Which is probably not logical; the results show 'France', but it includes results from Paris, USA
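The two ways of resolving the error can be compared side by side with a minimal sketch like this one, run against an in-memory SQLite database from Python. The three property rows and their prices are made-up sample data chosen to reproduce the counts and averages shown above. SQLite, unlike SQL Server, tolerates non-aggregated columns outside the GROUP BY, so the error itself is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE properties (country TEXT, city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO properties VALUES (?, ?, ?)",
    [
        ("USA", "Paris", 100.0),     # hypothetical sample rows
        ("France", "Paris", 150.0),
        ("France", "Paris", 250.0),
    ],
)

# Option 1: add country to the GROUP BY, giving one row per (country, city).
by_country_city = conn.execute(
    "SELECT country, city, COUNT(*), AVG(price) "
    "FROM properties GROUP BY country, city ORDER BY country"
).fetchall()

# Option 2: aggregate country with MIN, giving one row per city
# that mixes the French and American properties.
by_city = conn.execute(
    "SELECT MIN(country), city, COUNT(*), AVG(price) "
    "FROM properties GROUP BY city"
).fetchall()

print(by_country_city)
print(by_city)
```

The second result is a single row labelled 'France' with a count of 3, which illustrates why aggregating the country name is rarely what you want.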

SQL - Average number of records within a time period

I'm trying to compile some lifetime value information for customers within one of our databases.
We have an MS SQL Server database which stores all of our customer/transactional information.
My issue is that I don't have much experience when it comes to MS SQL Server (or SQL in general) - I'd like to be able to run a query against the database that pulls AVG number of loans, and AVG revenue based on three criteria:
1.) Loans be counted if they are 'approved'
2.) Loans from a customer_id only be counted if the first loan (first identified by date_created field) be on or after a certain 'mm/yyyy'
3.) I'm able to specify for how many months after the first 'mm/yyyy' to tally the number of loans / revenue to be included within the AVG
Here is what the database would look like:
customer_id | loan_status | date_created | revenue
111 | 'approved' | 2010-06-20 17:17:09 | 100.00
222 | 'approved' | 2010-06-21 09:54:43 | 255.12
333 | 'denied' | 2011-06-21 12:47:30 | NULL
333 | 'approved' | 2011-06-21 12:47:20 | 56.87
222 | 'denied' | 2011-06-21 09:54:48 | NULL
222 | 'approved' | 2011-06-21 09:54:18 | 50.00
111 | 'approved' | 2011-06-20 17:17:23 | 100.00
... loads' of records ...
555 | 'approved' | 2012-01-02 09:08:42 | 24.70
111 | 'denied' | 2012-01-05 02:10:36 | NULL
666 | 'denied' | 2012-02-05 03:31:16 | NULL
555 | 'approved' | 2012-02-17 09:32:26 | 197.10
777 | 'approved' | 2012-04-03 18:28:45 | 300.50
777 | 'approved' | 2012-06-28 02:42:01 | 201.80
555 | 'approved' | 2012-06-21 22:16:59 | 10.00
666 | 'approved' | 2012-09-30 01:17:20 | 50.00
If I wanted to find the average transaction count (approved transactions) and the average revenue per approved transaction for all customers whose first loan was in or after 2012-01, for a period of 4 months after that, how would I go about querying the database?
Any help is greatly appreciated.
Something like this (there may be a few typos here and there; tbl stands in for your table name)...
You could first calculate the minimum loan date:
select customer_id, min(date_created) as min_date from tbl where loan_status = 'approved' group by customer_id
Then you can join to it:
select t.customer_id, count(t.date_created) as loan_count, avg(t.revenue) as avg_revenue
from tbl t
join (
select customer_id, min(date_created) as min_date from tbl where loan_status = 'approved' group by customer_id ) s
on t.customer_id = s.customer_id
where t.date_created between s.min_date and DATEADD(month, 4, s.min_date) and t.loan_status = 'approved'
and s.min_date >= '20120101' -- only customers whose first loan is on or after 2012-01
group by t.customer_id
Rename tbl to your table name.
Specify dates in the format YYYYMMDD.
select t.customer_id, AVG(t.revenue) average_revenue
from
(
select customer_id
from tbl
group by customer_id
having min(date_created) >= '20120101'
) fl
join tbl t on t.customer_id = fl.customer_id
where t.loan_status = 'approved'
and t.date_created < '20120501' -- NOT including May the first, so Jan through Apr (4 months)
group by t.customer_id
If you mean 4 months after each customer's first loan, leave me a comment, state whether it's 4 calendar months (e.g. 15-Jan to 15-May) or up to the last day of the 4th month (15-Jan to 30-Apr), and I'll update the answer.
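For the per-customer reading (4 months counted from each customer's own first loan), a minimal sketch might look like the following. It uses an in-memory SQLite database from Python with a subset of the question's rows, so datetime(..., '+4 months') stands in for SQL Server's DATEADD(month, 4, ...). The table name loans, the CTE names, and the choice to take the first loan over all statuses (not just approved ones) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE loans (customer_id INTEGER, loan_status TEXT, "
    "date_created TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO loans VALUES (?, ?, ?, ?)",
    [
        (111, "approved", "2010-06-20 17:17:09", 100.00),
        (111, "approved", "2011-06-20 17:17:23", 100.00),
        (111, "denied", "2012-01-05 02:10:36", None),
        (555, "approved", "2012-01-02 09:08:42", 24.70),
        (555, "approved", "2012-02-17 09:32:26", 197.10),
        (555, "approved", "2012-06-21 22:16:59", 10.00),
        (666, "denied", "2012-02-05 03:31:16", None),
        (666, "approved", "2012-09-30 01:17:20", 50.00),
        (777, "approved", "2012-04-03 18:28:45", 300.50),
        (777, "approved", "2012-06-28 02:42:01", 201.80),
    ],
)

query = """
WITH first_loan AS (
    -- Each customer's first loan, kept only if it falls on or after 2012-01-01.
    SELECT customer_id, MIN(date_created) AS first_date
    FROM loans
    GROUP BY customer_id
    HAVING MIN(date_created) >= '2012-01-01'
),
per_customer AS (
    -- Approved loans within 4 months of that customer's first loan.
    SELECT l.customer_id,
           COUNT(*) AS loan_count,
           SUM(l.revenue) AS revenue
    FROM loans AS l
    JOIN first_loan AS f ON f.customer_id = l.customer_id
    WHERE l.loan_status = 'approved'
      AND l.date_created < datetime(f.first_date, '+4 months')
    GROUP BY l.customer_id
)
SELECT AVG(loan_count), AVG(revenue) FROM per_customer
"""
avg_loans, avg_revenue = conn.execute(query).fetchone()
print(avg_loans, avg_revenue)
```

Customers with no approved loan inside their window (666 in this sample) drop out of per_customer and so do not pull the averages down. Whether they should count as zero is a business decision the question leaves open.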