How to sum a value grouped by product and year in SAS?

I need to get the sum value of volume for each stock in each year.
The data looks like
Date ID volume
2009 BA 100
2009 BA 20
2011 BA 100
2009 VOD 100
2009 VOD 150
2009 VOD 100
2013 BT 300
... ... ...
What I want is
Date ID sumvolume
2009 BA 120
2011 BA 100
2009 VOD 350
2013 BT 300
... ... ...
I used code
proc sql;
create table want as
select *, (select sum(volume) from data as sub where sub.date=main.date) as sumvolume
from data as main;
quit;
but this only gave the sumvolume for each year instead of the sumvolume for each stock in each year.
Can anyone help me with the code? Thanks in advance!

You can use a group by statement to use summary functions (like sum()) across the groups defined by variables in the group by statement.
proc sql;
create table want as select
date,
id,
sum(volume) as sumvolume
from data
group by id, date;
quit;
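For readers without SAS at hand, the same GROUP BY can be tried from Python with the standard-library sqlite3 module. This is only a sketch: SQLite stands in for PROC SQL here, and the table name trades is made up for the example.

```python
import sqlite3

# Sample rows from the question: (date, id, volume)
rows = [
    (2009, "BA", 100), (2009, "BA", 20), (2011, "BA", 100),
    (2009, "VOD", 100), (2009, "VOD", 150), (2009, "VOD", 100),
    (2013, "BT", 300),
]

conn = sqlite3.connect(":memory:")
# "trades" is a hypothetical table name, not one from the question
conn.execute("CREATE TABLE trades (date INTEGER, id TEXT, volume INTEGER)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows)

# One output row per (id, date) pair, with the volumes summed inside each group
want = conn.execute(
    "SELECT date, id, SUM(volume) AS sumvolume "
    "FROM trades GROUP BY id, date ORDER BY id, date"
).fetchall()

for row in want:
    print(row)
```

Because both id and date appear in the group by list, each stock/year combination collapses to a single row.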

You are getting the total volume per year because your subquery only uses where sub.date=main.date. If you added and sub.ID = main.ID to the where clause, you would get the sum per stock and year. That still would not give your expected output, though, because the * in your select statement and the missing group by keep every individual observation.
Instead of a subquery on the data table, you can use group by to get the result you want.
data work.data;
input Date ID $ volume;
datalines;
2009 BA 100
2009 BA 20
2011 BA 100
2009 VOD 100
2009 VOD 150
2009 VOD 100
2013 BT 300
;
data work.want;
input Date ID $ sumvolume;
datalines;
2009 BA 120
2011 BA 100
2009 VOD 350
2013 BT 300
;
proc sql;
create table work.wanted as
select Date, ID, sum(volume) as sumvolume
from work.data
group by Date, ID
;
quit;
I'm leaving one thing to you, the sorting of the resulting table.

If your source data isn't already sorted then do that first:
proc sort data=input ;
by ID date;
run ;
Then you can do it in one simple pass:
data output(drop=volume) ;
retain sumvolume ;
set input ;
by ID date ;
if first.date then sumvolume=volume ;
else sumvolume=sum(sumvolume,volume) ;
if last.date then output ;
run ;
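The first./last. logic of that data step can be mirrored in plain Python to see what each flag does. This is a rough sketch, not SAS semantics in full: the list rows stands in for the input data set, and the first/last flags are computed by hand from the neighbouring rows.

```python
# Sample rows from the question: (date, id, volume)
rows = [
    (2009, "BA", 100), (2009, "BA", 20), (2011, "BA", 100),
    (2009, "VOD", 100), (2009, "VOD", 150), (2009, "VOD", 100),
    (2013, "BT", 300),
]

# Equivalent of: proc sort; by ID date;
rows.sort(key=lambda r: (r[1], r[0]))

output = []
sumvolume = 0  # plays the role of the retained variable
for i, (year, stock_id, volume) in enumerate(rows):
    key = (stock_id, year)
    # first.date / last.date: does the BY group start or end on this row?
    first = i == 0 or (rows[i - 1][1], rows[i - 1][0]) != key
    last = i == len(rows) - 1 or (rows[i + 1][1], rows[i + 1][0]) != key
    sumvolume = volume if first else sumvolume + volume
    if last:
        output.append((year, stock_id, sumvolume))

for row in output:
    print(row)
```

The running total is reset on the first row of each ID/date group and written out only on the last one, so the data is read exactly once after sorting.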

Related

Create a funnel in SQL with 30 days delay

I have a table like this, with hundreds of records:
month_signup nb_signups month_purchase nb_purchases
01 100 01 10
02 200 02 20
03 150 03 10
Let's say I want to calculate the signup-to-purchase ratio month after month.
Normally I could just compute nb_purchases/nb_signups*100, but not here.
I want to calculate a signup-to-purchase ratio with a 1 month (or 30 days) delay.
To give the signups time to purchase, I want to divide nb_purchases from month 2 by nb_signups from month 1. So 20/100, for example, in my table.
I tried this, but I'm really not sure about it.
SELECT
month_signup
,SAFE_DIVIDE(CASE WHEN purchase_month BETWEEN signups_month AND DATE_ADD(signups_month, INTERVAL 30 DAY) THEN nb_purchases ELSE NULL END, nb_signups)*100 AS sign_up_to_purchase_ratio
FROM table
ORDER BY 1
You can use LEAD() function to get the next value of the current row, I'll provide a MySQL query syntax for this.
with cte as
(select month_signup, nb_signups, lead(nb_purchases) over (order by month_signup) as
nextPr from MyData
order by month_signup)
select cte.month_signup, (nextPr/cte.nb_signups)*100 as per from cte
where (nextPr/cte.nb_signups)*100 is not null;
You may replace (nextPr/cte.nb_signups) with the SAFE_DIVIDE function.
See the demo from db-fiddle.
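To make the shift explicit, here is a small Python sketch of what LEAD() does for this calculation: each month's signups are paired with the purchases of the following row. It assumes one row per month with no gaps, which the question does not guarantee.

```python
# Sample rows from the question:
# (month_signup, nb_signups, month_purchase, nb_purchases)
rows = [
    ("01", 100, "01", 10),
    ("02", 200, "02", 20),
    ("03", 150, "03", 10),
]

rows.sort(key=lambda r: r[0])  # same ordering as: over (order by month_signup)

ratios = []
# zip(rows, rows[1:]) pairs every row with the next one, like LEAD();
# the last month has no following row and is therefore dropped
for current, following in zip(rows, rows[1:]):
    month_signup, nb_signups = current[0], current[1]
    next_purchases = following[3]
    if nb_signups:  # skip months with zero signups instead of dividing by zero
        ratios.append((month_signup, next_purchases * 100 / nb_signups))

for month, ratio in ratios:
    print(month, ratio)
```

For month 01 this gives 20 purchases (from month 02) over 100 signups, the 20/100 case described in the question.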

How to calculate a percentage on different values from same column with different criteria

I'm trying to write a query in SSRS (using SQL) to calculate an income statement percentage of sales for each month (the year is a parameter chosen by the user at runtime). However, the table I have to use for the data lists all of the years, months, accounts, dollars, etc together and looks like this:
ACCT_YEAR ACCT_PERIOD ACCOUNT_ID CREDIT_AMOUNT
2021 1 4000 20000
2021 2 4000 25000
2021 1 5000 5000
2021 2 5000 7500
2021 1 6000 4000
2021 2 6000 8000
etc, etc (ACCOUNT_ID =4000 happens to be the sales account)
As an example,
I need to calculate
CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD=1, and ACCOUNT_ID=5000
/
CREDIT_AMOUNT when ACCT_YEAR = 2021, ACCT_PERIOD=1, and ACCOUNT_ID=4000
* 100
I would then do that for each ACCT_PERIOD in the ACCT_YEAR.
Hope that makes sense...What I want would look like this:
ACCT_YEAR ACCT_PERIOD ACCOUNT_ID PERCENTAGE
2021 1 5000 0.25
2021 2 5000 0.30
2021 1 6000 0.20
2021 2 6000 0.32
I'm trying to create a graph that shows the percentage of sales of roughly 10 different accounts (I know their specific account_ID's and will filter by those ID's) and use the line chart widget to show the trends by month.
I've tried CASE scenarios, OVER scenarios, and nested subqueries. I feel like this should be simple but I'm being hardheaded and not seeing the obvious solution.
Any suggestions?
Thank you!
One important behaviour to note is that window functions are applied after the where clause.
Because you need the window functions to be applied before any where clause (which would filter account 4000 out), they need to be used in one scope, and the where clause in another scope.
WITH
perc AS
(
SELECT
*,
credit_amount * 100.0
/
SUM(
CASE WHEN account_id = 4000 THEN credit_amount END
)
OVER (
PARTITION BY acct_year, acct_period
)
AS credit_percentage
FROM
your_table
)
SELECT
*
FROM
perc
WHERE
account_id IN (5000,6000)
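The two-scope idea can be checked outside the database with a few lines of Python. This is a sketch under the assumption that every year/period has a row for the sales account 4000; the variable names are illustrative only.

```python
# Sample rows from the question:
# (acct_year, acct_period, account_id, credit_amount)
rows = [
    (2021, 1, 4000, 20000), (2021, 2, 4000, 25000),
    (2021, 1, 5000, 5000), (2021, 2, 5000, 7500),
    (2021, 1, 6000, 4000), (2021, 2, 6000, 8000),
]
SALES_ACCOUNT = 4000

# Scope 1: sales total per (year, period), computed on ALL rows,
# i.e. before account 4000 is filtered out
sales = {}
for year, period, account, amount in rows:
    if account == SALES_ACCOUNT:
        sales[(year, period)] = sales.get((year, period), 0) + amount

# Scope 2: only now restrict to the accounts of interest
# (raises KeyError if a period has no sales row, see assumption above)
result = [
    (year, period, account, amount * 100.0 / sales[(year, period)])
    for year, period, account, amount in rows
    if account in (5000, 6000)
]

for row in result:
    print(row)
```

Filtering first would leave no account 4000 rows to divide by, which is why the denominator is built before the filter is applied.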
You just need to use a matrix with a parent column group for ACCT_YEAR and a child column group for ACCT_PERIOD. Then you can use your calculation. If you format the textbox as a percentage, you won't need to multiply it by 100.
Textbox value: =IIF(IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0) = 0, 0, IIF(ACCOUNT_ID=5000, Sum(CREDIT_AMOUNT), 0) / IIF(ACCOUNT_ID=4000, Sum(CREDIT_AMOUNT), 0))

How to index for a self join

I'm using SAS University Edition to analyze the following table (actually has 2.5M rows in it)
p_id c_id startyear endyear
0001 3201 2008 2013
0001 2131 2013 2015
0013 3201 2006 2010
where p_id is person_id and c_id is companyid.
I want to get number of colleagues (number of persons that worked during an overlapping span at the same companies) in a certain year, so I created a table with the distinct p_ids and do the following query:
PROC SQL;
UPDATE no_colleagues AS t1
SET c2007 = (
SELECT COUNT(DISTINCT t2.p_id) - 1
FROM table AS t2
INNER JOIN table AS t3
ON t3.p_id = t1.p_id
AND t3.c_id = t2.c_id
AND t3.startyear <= t2.endyear /* checks overlapping criteria */
AND t3.endyear >= t2.startyear /* checks overlapping criteria */
AND t3.startyear <= 2007 /* limits number of returns */
AND t2.startyear <= 2007 /* limits number of returns */
);
QUIT;
A single lookup on an indexed query (p_id, c_id, startyear, endyear) takes 0.04 seconds. The query above takes about 1.8 seconds for a single update, and does not use any indexes.
So my question is:
How to improve the query, and/or how to use indices to make sure the self join can use the indices?
Thanks in advance.
Based on your data, I'd do something like this, but maybe you need to tweak the code to fit your needs.
First, create a table with p_id, c_id, year.
So your first guy working at the company 3201 will have 6 observations in this table, one for each worked year.
data have_count;
set have;
do i=startyear to endyear;
worked_in = i;
output;
end;
drop i startyear endyear;
run;
Now you just count and aggregate:
proc sql;
select
worked_in as year
,c_id
,count(distinct p_id) as no_colleagues
from have_count
group by 1,2;
quit;
Result:
year c_id no_colleagues
2006 3201 1
2007 3201 1
2008 3201 2
2009 3201 2
2010 3201 2
2011 3201 1
2012 3201 1
2013 2131 1
2013 3201 1
2014 2131 1
2015 2131 1
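The expand-then-count approach is easy to verify on the three sample rows with a short Python sketch. The names have and people are just illustrative; the counting mirrors count(distinct p_id) grouped by year and company.

```python
# Sample rows from the question: (p_id, c_id, startyear, endyear)
have = [
    ("0001", "3201", 2008, 2013),
    ("0001", "2131", 2013, 2015),
    ("0013", "3201", 2006, 2010),
]

# Step 1: one entry per person, company and worked year (the do-loop)
people = {}
for p_id, c_id, startyear, endyear in have:
    for year in range(startyear, endyear + 1):  # endyear is inclusive
        people.setdefault((year, c_id), set()).add(p_id)

# Step 2: count distinct persons per (year, company)
counts = {key: len(ids) for key, ids in sorted(people.items())}

for (year, c_id), n in counts.items():
    print(year, c_id, n)
```

Note that, as in the result table above, the figure includes the person themselves; subtract 1 to get the number of other people at the company that year.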
A more efficient method:
1) Create a long format table for the results rather than wide format. This will be both easier to populate and easier to work with later.
create table colleagues_by_year (
p_id int,
year int,
colleagues int
);
Now this can be populated with a single insert statement. The only trick is getting the full list of years you want in the final table. There are a few options, but since I'm not too familiar with SAS SQL I'm going to go with a very simple one: a lookup table of years, to which you can join.
create table years (
year int
);
insert into years
values (2007),(2008),...
(A more sophisticated approach would be a recursive query that found the range of all years in the input data).
Now the final insert:
insert into colleagues_by_year
select p_id,
year,
count(*)
from colleagues
join years on
years.year between colleagues.startyear and colleagues.endyear
group by p_id,year
This won't have any rows where the number of colleagues for the year would be 0. If you wanted that you could make years be a left join and only count the rows where years.year is not null.

group yearmonth field by quarter in sql server

I have an int field in my database which represents year and month, e.g. 201501 stands for January 2015.
I need to group by the reporting_date field and show the data per quarter. The table is in the following format. reporting_date is an int field rather than a datetime, and interest_payment is a float.
reporting_date interest_payment
200401 5
200402 10
200403 25
200404 15
200406 5
200407 20
200408 25
200410 10
The output of the query should look like this:
reporting_date interest_payment
Q1 -2004 40
Q2 -2004 20
Q3 -2004 45
Q4 -2004 10
I tried using a normal group by statement
select reporting_date , sum(interest_payment) as interest_payment from testTable
group by reporting_date
but got a different result. Any help would be appreciated.
Thanks
Before grouping you need to calculate report_quarter, which (with integer division) is equal to
(reporting_date%100-1)/3
Then do the select:
select report_year, 'Q'+cast(report_quarter+1 as varchar(1)), SUM (interest_payment)
from
(
select
*,
(reporting_date%100 - 1)/3 as report_quarter,
reporting_date/100 as report_year
from #x
) T
group by report_year, report_quarter
order by report_year, report_quarter
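The integer arithmetic is worth checking on the sample rows. Below is a small Python sketch of the same calculation; the helper name quarter_label is made up for this example, and // is used where SQL Server's integer division is implied.

```python
# Sample rows from the question: (reporting_date, interest_payment)
rows = [
    (200401, 5), (200402, 10), (200403, 25), (200404, 15),
    (200406, 5), (200407, 20), (200408, 25), (200410, 10),
]

def quarter_label(reporting_date):
    """Turn a yyyymm integer such as 200407 into a label such as 'Q3 -2004'."""
    year, month = divmod(reporting_date, 100)  # 200407 -> (2004, 7)
    quarter = (month - 1) // 3 + 1             # months 1-3 -> 1, 4-6 -> 2, ...
    return f"Q{quarter} -{year}"

# Sum interest_payment per quarter label
totals = {}
for reporting_date, interest_payment in rows:
    label = quarter_label(reporting_date)
    totals[label] = totals.get(label, 0) + interest_payment

for label, total in totals.items():
    print(label, total)
```

With the sample rows, Q3 adds up to 20 + 25 = 45.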
I see two problems here:
You need to convert reporting_date into a quarter.
You need to SUM() the values in interest_payment for each quarter.
You seem to have the right idea for (2) already, so I'll just help with (1).
If the numbers are all 6 digits (see my comment above) you can just do some numeric manipulation to turn them into quarters.
First, extract the month by taking the remainder after dividing by 100: MOD(reporting_date, 100).
Then, convert that into a quarter using integer division: FLOOR((MOD(reporting_date, 100) - 1) / 3) + 1
Add a Q and the year if desired.
Finally, use that value in your GROUP BY.
You didn't specify which DBMS you are using, so you may have to convert the functions yourself.

MySQL - show field value only in first instance of each grouped value?

I don't think this is possible, but I would like to be proved otherwise.
I have written a simple report viewing class to output the results of various database queries. For the purpose of improving the display, when I have a report with grouped data, I would like to display the field value only on the first row of each unique value - and I would like to do this at the query level, or it would necessitate additional logic in my class to determine these special values.
It will probably help to illustrate my requirements with a simple example. Imagine this dataset:
Year Quarter Total
2008 Q1 20
2008 Q2 25
2008 Q3 35
2008 Q4 40
2009 Q1 15
2009 Q2 20
2009 Q3 30
2009 Q4 35
If possible, I would like the dataset returned as:
Year Quarter Total
2008 Q1 20
Q2 25
Q3 35
Q4 40
2009 Q1 15
Q2 20
Q3 30
Q4 35
Is there any way of doing this progammatically in MySQL?
SELECT CASE WHEN @r = year THEN NULL ELSE year END AS year,
quarter,
total,
@r := year
FROM (
SELECT @r := 0
) vars,
mytable
ORDER BY
year
@r here is a session variable. You can use these in MySQL like any variable in any procedural language.
First, it's initialized to zero inside the subquery.
Second, it's checked in the SELECT clause. If the current value of @r is not equal to year, then the year is output, else NULL is output.
Third, it's updated with the current value of year.
Why would you want to do this? What about existing records where the Year column is empty or null?
Beautifying the output belongs inside the report logic. In pseudocode it would be something like:
var lastYear = 0
foreach (record in records)
{
if (record.Year == lastYear)
{
print " "
}
else
{
print record.Year
lastYear = record.Year
}
// print the other columns
}
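A runnable Python version of that pseudocode might look like the following. It is only a sketch: the function name format_report and the fixed-width layout are assumptions, and None is used as the starting value so that a real year of 0 would still be printed.

```python
# Sample rows from the question: (year, quarter, total)
records = [
    (2008, "Q1", 20), (2008, "Q2", 25), (2008, "Q3", 35), (2008, "Q4", 40),
    (2009, "Q1", 15), (2009, "Q2", 20), (2009, "Q3", 30), (2009, "Q4", 35),
]

def format_report(records):
    """Return report lines, showing the year only on the first row of each group."""
    lines = []
    last_year = None
    for year, quarter, total in records:
        # Blank out the year when it repeats the previous row's year
        shown_year = "" if year == last_year else str(year)
        last_year = year
        lines.append(f"{shown_year:<4} {quarter} {total}")
    return lines

lines = format_report(records)
for line in lines:
    print(line)
```

The query stays a plain select ordered by year, and the blanking is purely a display concern.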
Not the answer you asked for, but...
Sounds like an iffy thing to be doing in MySQL in the first place. Just looking at the raw rows of data, the Q2 rows without a year don't make much sense as data rows. The issue is presentational, not a matter of fetching data. It sounds more like something to be written into your viewing class: when passed a certain parameter, for example, it will know not to repeat things like "2008".
This allows for greater reusability of code as well: rather than rewriting the query when you want to present the data differently, say by quarter rather than by year, you can just change one of the arguments of the viewing class so that the same query with a different order clause can output:
Quarter Year Total
Q1 2008 20
2009 15
Q2 2008 25
2009 20
...
It does not exactly match your request, but I would rather pivot the table. That lets you visually compare the figures from the 2 years, as you have one quarter per column:
SELECT Year,
SUM(IF(Quarter="Q1", Rev, 0)) AS Q1,
SUM(IF(Quarter="Q2", Rev, 0)) AS Q2,
SUM(IF(Quarter="Q3", Rev, 0)) AS Q3,
SUM(IF(Quarter="Q4", Rev, 0)) AS Q4
FROM t1 GROUP BY 1
ORDER BY 1
You then have:
YEAR Q1 Q2 Q3 Q4
2008
2009