% diff of 2 columns in SQL - sql

I have 2 tables - countries (id, name, continent) and population_years (id, population, year, country_id). The data goes from 2000-2010 and I'm trying to calculate the percentage diff in average population for each continent across this time period. I'm trying to do it by creating a temporary table which produces an output of:
But when I try to calculate the % diff (as you can see from my code below), I don't know how to reference the 'avg pop 2000' and 'avg pop 2010' columns in the code as they haven't been assigned a variable that I can reference. In the code, I've used avg_pop_2010 and avg_pop_2000 to reference these columns - obviously this doesn't actually work.
WITH avg_pop AS( SELECT countries.continent,
ROUND(AVG(CASE WHEN population_years.year = 2000 THEN population_years.population END), 2) as 'avg pop 2000',
ROUND(AVG(CASE WHEN population_years.year = 2010 THEN population_years.population END), 2) as 'avg pop 2010'
FROM countries
JOIN population_years
WHERE population_years.country_id = countries.id
GROUP BY 1)
SELECT countries.continent, ROUND(((avg_pop_2010 - avg_pop_2000)/avg_pop_2000)*100.0, 2) AS '%diff'
FROM avg_pop;

Another option is to repeat the expressions - with a little optimization:
select
c.continent,
round(avg(case when py.year = 2000 then py.population end), 2) avg_pop_2000,
round(avg(case when py.year = 2010 then py.population end), 2) avg_pop_2010,
round(
100.* avg(case when py.year = 2010 then py.population else - py.population end)
/ avg(case when py.year = 2000 then py.population end),
2
) percent_diff
from countries c
inner join population_years py on py.country_id = c.id
where py.year in (2010, 2020)
group by c.continent
Side notes:
don't use single quotes for identifiers! They stand for literal strings in standard SQL; in general, you should prefer identifiers that do not need to be quoted. If quoting is needed, use standard double quotes ("), which SQLite recognizes
prefiltering the relevant years with a where clause makes the query more efficient
use standard join syntax; the join condition goes to the on clause of the join, not to the where clause
rounding, then computing the percent difference is not accurate; compute first, then round
table aliases make the query easier to write and read

WITH avg_pop AS( SELECT countries.continent,
ROUND(AVG(CASE WHEN population_years.year = 2000 THEN population_years.population END), 2) as avg_pop_2000,
ROUND(AVG(CASE WHEN population_years.year = 2010 THEN population_years.population END), 2) as avg_pop_2010
FROM countries
JOIN population_years
WHERE population_years.country_id = countries.id
GROUP BY 1)
SELECT continent, ROUND((( avg_pop_2000 - avg_pop_2010)/avg_pop_2010)*100, 2) AS '%diff'
FROM avg_pop;
Here is a small demo:
DEMO
When checking on other comments and answer I realized you need to give different aliases without '' and all will work...I have updated my answer and demo.

Related

Can I left join twice to do multiple calculations?

I am trying to calculate if a member shops in January, what proportion shop again in February and what proportion shop again within 3 months. Ultimately to create a table similar to the image attached.
I have tried the below code. The first left join works, but when I add the second one to calculate within_3months the error: "FROM keyword not found where expected" is shown (for the separate line). Can I left join twice or must I do separate scripts for columns?
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
select
year_month_january21
, count(distinct A.members) as num_of_mems_shopped_january21
, count(distinct B.members)as retained_february21
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
, count(distinct C.members)/count(distinct A.members) *100 as within_3months
from
(select
members
, year_month as year_month_january21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202101
group by
members
, year_month) A
left join
(select
members
, year_month as year_month_february21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202102
group by
members
, year_month) B on A.members = B.members
left join
(select
members
, year_month as year_month_3months
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month between 202102 and 202104
group by
members
, year_month) C on A.members = C.members
group by
year_month_january21;
I have tried left creating a separate time table and joining to this. It does not work. Doing calculations separately works but I must do this for multiple time frames so will take a long time.
The error isn't coming from the added left join, it's from the as 1month_retention_rate part, because it's an illegal name.
You can see that more simply with:
select dummy as 1month_retention_rate
from dual;
ORA-00923: FROM keyword not found where expected
You could change the column alias so it follows the naming rules (specifically here, does not start with a digit), or if that specific name is actually required then you could make it a quoted identifier - generally not a good option, but sometimes OK in the final output of a query.
fiddle
So in your code you would just change your new line
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
to something like
, count(distinct B.members)/count(distinct A.members) *100 as one_month_retention_rate
or with a quoted identifier
, count(distinct B.members)/count(distinct A.members) *100 as "1month_retention_rate"
fiddle - which still errors but now with ORA-00942 as I don't have your tables, and that is after changing your obfuscated schema/table names to something legal too.
There may be more efficient ways to perform the calculation, but that's a separate issue...
I could understand that you want to get :
count of all members who visited in Jan.
count of all members who visited in Jan and visited again in Feb.
count of all members who visited in Jan and visited again in Feb, Mars and April.
If my understanding is true then you could simplify your inner query using IF instead of LEFT JOIN .
Take a look on the following query. Assuming that table members have an ID field :
SELECT
mem_jan AS num_of_mems_shopped_january21,
mem_feb AS retained_february21,
mem_feb / mem_jan * 100 as 1month_retention_rate
mem_3m / mem_jan * 100 as within_3months
FROM(
SELECT
SUM(IF(mm_jan>0,1,0) AS mem_jan,
SUM(IF(mm_jan>0 AND mm_feb>0,1,0) AS mem_feb,
SUM(IF(mm_jan>0 AND mm_count_3m>0,1,0) AS mem_3m
FROM
(
SELECT
t.Id,
SUM(IF(year_month = 202101, 1,0)) AS mm_jan, /*visit for a member in Jan*/
SUM(IF(year_month = 202102, 1,0)) AS mm_feb, /*visit for a member in Feb*/
SUM(IF(year_month between 202102 and 202104,1,0)) AS mem_3m/*visit for a member in 3 months*/
FROM
table.members t
join table.date tm on t.dt_key = tm.date_key
WHERE
year_month between 202101 and 202104
GROUP BY
t.Id
) AS t1
) AS t2
This is not a final running query but it can explain my idea. According to your engine you may use CASE or IF THEN ELSE
Don't use multiple joins, count the shops per member per month and then use conditional aggregation.
In Oracle, that would be:
SELECT 202101 AS year_month,
COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END)
AS members_shopped_202101,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
AS members_retained_202102,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS one_month_retention_rate,
COUNT(CASE WHEN cnt_202101 > 0 AND (cnt_202102 > 0 OR cnt_202103 > 0 OR cnt_202104 > 0) THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS within_3months
FROM (
SELECT members,
year_month
FROM members m
INNER JOIN date d
ON m.dt_key = d.date_key
)
PIVOT (
COUNT(*)
FOR year_month IN (
202101 AS cnt_202101,
202102 AS cnt_202102,
202103 AS cnt_202103,
202104 AS cnt_202104
)
);

Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019)

Q) Write a query to return Territory and corresponding Sales Growth (compare growth between periods Q4-2019 vs Q3-2019).
Tables given-
Cust_Sales: -Cust_id,product_sku,order_date,order_value,order_id,month
Cust_Territory: cust_id,territory_id,customer_city,customer_pincode
Use tables FCT_CUSTOMER_SALES (which has sales for each Customer) and MAP_CUSTOMER_TERRITORY (which provides Territory-to-Customer mapping) for this question.
Output format-
TERRITORY_ID | SALES_GROWTH
My solution-
Select ((q2.claims - q1.claims)/q1.claims * 100) AS SALES_GROWTH , c.territory_id
From
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/07/2019 and 30/09/2019 group by c.territory_id) as q1.claims,
(select sum(s.order_value) from FCT_CUSTOMER_SALES s inner join MAP_CUSTOMER_TERRITORY c on s.customer_id=c.customer_id where s.order_datetime between 1/10/2019 and 31/12/2019 group by c.territory_id) as q2.claims
Group by c.territory_id
My solution is showing up as incorrect I would request anyone who can help me out with the solution and let me know where my mistake is
One option uses conditional aggregation. The idea is to filter the table on the two quarters at once, then use case expressions within the sum() aggregate function to compute the sales of each of them:
select
c.territory_id,
( sum(case when s.order_date >= date '2020-01-01' then s.order_value end)
- sum(case when s.order_date < date '2020-01-01' then s.order_value end)
) / (sum(case when s.order_date < date '2020-01-01' then s.order_value end)) * 100.0 as sales_growth
from fct_customer_sales s
inner join map_customer_territory c on s.customer_id = c.customer_id
where s.order_datetime >= date '2020-01-07' and s.order_datetime < '2020-01-01'
group by c.territory_id
You did not tell which database you are using, while date features are highly vendor-dependent. This uses the standard DATE syntax to declare the literal dates - you might need to adapat that if your database does not support it.

Is there a way to split sql results by year?

I currently have the following SQL statement:
SELECT
[Manager].[Name],
COUNT([Project].[ProjectId]) AS TotalProjects
FROM
([Project]
INNER JOIN
[Manager] ON [Project].[ManagerId] = [Manager].[ManagerId])
WHERE
[Project].[CurrentStatusId] = 5
GROUP BY
[Manager].[Name]
It currently spits out total projects by each manager. I would like to have it split the projects out by the years they were completed. So basically count the total projects for each manager for each year (2016, 2017, and so on), as well as the total projects all time. I can use the column [Project].[CurrentStatusDt] for the date.
Just add the project year to the SELECT and GROUP BY clauses:
SELECT
[Manager].[Name],
YEAR([Project].[CurrentStatusDt]) PojectYear,
COUNT([Project].[ProjectId]) AS TotalProjects
FROM [Project]
INNER JOIN [Manager] ON [Project].[ManagerId] = [Manager].[ManagerId]
WHERE [Project].[CurrentStatusId] = 5
GROUP BY [Manager].[Name], YEAR([Project].[CurrentStatusDt])
Side note: you don't need parentheses around the joins in SQL Server (this is a MS Access limitation).
EDIT
If you want to spread the years over columns instead of rows, then one solution is to use conditional aggregation
SELECT
[Manager].[Name],
SUM(CASE
WHEN [Project].[CurrentStatusDt] < CAST('2019-01-01' AS DATE)
THEN 1
ELSE 0
END) TotalProjects2018,
SUM(CASE
WHEN [Project].[CurrentStatusDt] >= CAST('2019-01-01' AS DATE)
THEN 1
ELSE 0
END) TotalProjects2019
FROM [Project]
INNER JOIN [Manager] ON [Project].[ManagerId] = [Manager].[ManagerId]
WHERE
[Project].[CurrentStatusId] = 5
AND [Project].[CurrentStatusDt] >= CAST('2018-01-01' AS DATE)
AND [Project].[CurrentStatusDt] < CAST('2020-01-01' AS DATE)
GROUP BY [Manager].[Name]

MS SQL Server 2012 query speed / optimization

First, I apologize if this has been answered elsewhere, but I was unable to find anything today. If it has been answered, the lack is me, not the search system.
I am having an issue where in a stored procedure works fairly quickly until I get to a specific point.
I have a database of POS sales data for a restaurant chain.
Among the various tables we have, the ones I need for this query are:
Item (the definitions of the various items; each row includes the SALES category the item belongs to; see caveats below)
Category (the definitions of the various categories items can be in)
CategoryItem (the mapping between the above)
HstItem (the historical sales of the items)
Caveats: There are 2 types of categories each item can be in:
sales categories: each item can be in one sales category at a time.
reporting categories: each item can be in a arbitrary number of these categories simultaneously.
I am needing to get sales totals for 6 date ranges for a specific REPORTING category (week to date, wtd ly, period to date, ptd ly, year to date, ytd ly).
All of that said, for all of my code the query / procedure runs in a decent amount of time, until to the section for the reporting category.
My select statement currently includes the following WHERE clause:
where hstitem.itemid in (select fkitemid from categoryitem where categoryitemid = ##)
I am looking for a more efficient / faster way to execute this.
Any assistance is greatly appreciated. Thanks in advance.
EDIT:
The full original query is as follows:
insert into #GnG (GStoreID, CurWkGNG, CurWkGNGLY,CurYTDGNG,CurYTDGNGLY,CurrPTDGNG,CurrPTDGNGLY)
select
hgi.FKStoreId,
CurWkGnG = sum(case when hgi.DateOfBusiness between #SDate and #EDate then hgi.price else 0 end),
CurWkGnGLY = sum(case when hgi.DateOfBusiness between #SDateLY and #EDateLY then hgi.price else 0 end),
CurYTDGnG =
case
when convert(varchar(10),opendate,126) between convert(varchar(10),#LYTDStart,126) and convert(varchar(10),#LYTDEnd,126) then sum(case when hgi.DateOfBusiness between DATEADD(day, (DATEPART(week, opendate) * 7 + DATEPART(weekday, opendate)) - (DATEPART(week, DATEADD(year, 1, opendate)) * 7 + DATEPART(weekday, DATEADD(year, 1, opendate))), DATEADD(year, 1, opendate)) and #CYTDEnd then hgi.price else 0 end)
else sum(case when hgi.DateOfBusiness between #CYTDStart and #CYTDEnd then hgi.price else 0 end)
end,
CurYTDGnGLY = sum(case when hgi.DateOfBusiness between #LYTDStart and #LYTDEnd then hgi.price else 0 end),
CurrPTDGnG = sum(case when hgi.DateOfBusiness between #CurrPtDStart and #CurrPtDEnd then hgi.price else 0 end),
CurrPTDGnGLY = sum(case when hgi.DateOfBusiness between #CurrPtDLYStart and #CurrPtDlyEnd then hgi.price else 0 end)
from hstGndItem hgi
join #StoresIncluded si
on hgi.FKStoreID = si.StoreID
where hgi.fkitemid in
(select fkitemid from categoryitem where categoryitemid = 25)
group by hgi.fkstoreid, opendate, comping
order by hgi.fkstoreid
Try converting the "IN" to a inner join like so :
FROM hstitem h inner join categoryitem c on c.fkitemid = h.itemid
where c.categoryitemid = ##
You can use WHERE EXISTS instead of IN to check if the ID exists in the table:
WHERE EXISTS
(
SELECT ci.fkitemid
FROM categoryitem ci
WHERE ci.categoryitemid = ## AND ci.fkitemid = hstitem.itemid
)
The difference between the IN clause and using EXISTS is that the sub-query inside the WHERE EXISTS will exit prematurely after a match have been found while the IN clause would wait until the sub-query is finished.
I apologize for the lag in my response here.
I found AN answer. Dont know if it is the right one or not, but it works for us.
I removed the code to use a sub select from the where, and am now generating a new table to hold the values that should pull. The new code to populate the table runs each morning around 0600. I then am having the main code simply join to that table to pull the single answer, rather than perform math based on the sub-query.
Thank you all for the suggestions.

how to return total sold p / year for several years in columnar format?

The SQL below give me the columns for Account, Name and the Total Sale for the year of 2015.But how, if possible, can I add another column for the previous year of 2014 ?
select a.AcctNo, b.Name, Sum(a.TotSold) as [ Total Sold ]
from Orders as A
Join Accounts as b on a.AcctNo = b.AcctNo
where (a.PurchaseDate between '1/1/2015' and '12/31/2015' )
Group by a.AcctNo, b.Name
You can use conditional aggregation:
select o.AcctNo, a.Name,
Sum(case when year(PurchaseDate) = 2015 then TotSold else 0 end) as tot_2015,
Sum(case when year(PurchaseDate) = 2014 then TotSold else 0 end) as tot_2014
from Orders o Join
Accounts a
on a.AcctNo = o.AcctNo
where PurchaseDate >= '2014-01-01'
PurchaseDate < '2016-01-01'
Group by o.AcctNo, a.Name ;
Notes:
Use ISO standard formats for date constants.
When using dates, try not to use between. It doesn't work as expected when there is a time component. The above inequalities work regardless of a time component.
Use table aliases that are abbreviations of the table name. That makes the query easier to follow.