sas is the column an even year? - sql

I currently have data regarding income streams over four years, the income stream fills a 4x4 matrix as follows:
Year1 Year2 Year3 Year4
2017 10 15 20 25
2018 10 15 20 25
2019 10 15 20 25
2020 10 15 20 25
The income streams come from a debt instrument where the rows indicate when the debt was issued and the columns indicate the flow of income coming in per year.
I am trying to build another matrix to highlight whether or not the debt instrument was priced that year to create this matrix:
Year1 Year2 Year3 Year4
2017 1 0 0 0
2018 0 1 0 0
2019 0 0 1 0
2020 0 0 0 1
This is a very simple example and it goes on to work on how often the instrument is repriced etc. for simplicity, is there a way to create an if or where function that returns a 1 or 0 depending on whether the year is equal to or within a certain amount of time of the instrument being issued.
So far i'm thinking along the lines
%Repricing = 1
data want;
set have;
if Year[i] <= &Repricing;
then Year[i]=1;
run;
It's probably obvious that SAS isn't my language of choice, TIA.

Here's one way to do this, but I basically renamed the years to match the years of data - this is logic that was in your text but not in your data.
*sample data;
data have;
input YearD Year2017 Year2018 Year2019 Year2020;
cards;
2017 10 15 20 25
2018 10 15 20 25
2019 10 15 20 25
2020 10 15 20 25
;
run;
data want;
set have;
*arrays for years and flags;
array _year(2017:2020) year2017-year2020;
array _flag(2017:2020) flag2017-flag2020;
*loop over array;
do i=2017 to hbound(_year);
/*check if year matches year in variable name*/
if put(yearD, 4.) = compress(vname(_year(i)),, 'kd')
then _flag(i)=1;
else _flag(i)=0;
end;
drop i;
run;

Related

SQL Where exists use value else use another

I have a table where I want to join to bring through an i.d, straight forward enough but I only want to bring through values that are 'live' (referenced by a 1 in the flag column below). On the latest year no values are live yet but I need these values brought through too. It might be easier to explain in an example.
Joining Table:
Company Year Product ID Flag
A 2019 X 100 0
A 2019 X 101 1
A 2019 Y 102 1
A 2019 Y 103 0
A 2019 Y 104 0
A 2020 X 105 1
A 2020 Y 106 0
A 2020 Y 107 1
A 2020 Y 108 0
A 2020 Z 109 1
A 2021 X 110 0
A 2021 Y 111 0
A 2021 Y 112 0
A 2021 Y 113 0
A 2021 Z 114 0
I need to bring through those values that have a 1 in the Flag column and then all values with a year of 2021 (when 2021 begins the values in the flag column for 2021 will swap to zeroes and 1s, with the need to only bring through the rows with a 1 in the flag column, again).
The need to bring through next years values will reoccur at the end of every year so the idea is to future proof this from further changes so adding a when year =2021 is not an option.
The original table has the company, year and product so when I join it will be on these three fields.
Any thoughts, let me know
Thanks
Is this what you want?
select t.*
from mytable t
where flag = 1 or year = extract(year from current_date)
This brings rows where flag has value 1 or where year is the current year.
Note that this uses standard date functions extract() and current_date - not all databases support this syntax, but they all have equivalent.

Pandas: Group by two columns to get sum of another column

I look most of the previously asked questions but was not able to find answer for my question:
I have following data.frame
id year month score num_attempts
0 483625 2010 01 50 1
1 967799 2009 03 50 1
2 213473 2005 09 100 1
3 498110 2010 12 60 1
5 187243 2010 01 100 1
6 508311 2005 10 15 1
7 486688 2005 10 50 1
8 212550 2005 10 500 1
10 136701 2005 09 25 1
11 471651 2010 01 50 1
I want to get following data frame
year month sum_score sum_num_attempts
2009 03 50 1
2005 09 125 2
2010 12 60 1
2010 01 200 2
2005 10 565 3
Here is what I tried:
sum_df = df.groupby(by=['year','month'])['score'].sum()
But this doesn't look efficient and correct. If I have more than one column need to be aggregate this seems like a very expensive call. for example if I have another column num_attempts and just want to sum by year month as score.
This should be an efficient way:
sum_df = df.groupby(['year','month']).agg({'score': 'sum', 'num_attempts': 'sum'})

sql running total math current quarter

Im trying to figure out the total for the quarter when the only data shown is a running total for the year:
Id Amount Periods Year Type Date
-------------------------------------------------------------
1 65 2 2014 G 4-1-12
2 75 3 2014 G 7-1-12
3 25 1 2014 G 1-1-12
4 60 1 2014 H 1-1-12
5 75 1 2014 Y 1-1-12
6 120 3 2014 I 7-1-12
7 30 1 2014 I 1-1-12
8 90 2 2014 I 4-1-12
In the data shown above. The items in type G and I are running totals for the period (in qtrs). If my query returns period 3, is there a sql way to get the data for the qtr? The math would involve retrieving the data for the 3rd period - 2nd period.
Right now my sql is something like:
SELECT * FROM data WHERE Date='4-1-12';
In this query, it will return row #1, which is a total for 2 periods. I would like it to return just the total for the 2nd period. Im looking to make this happen with SQLite.
Any help would be appreciated.
Thank alot
You want to subtract the running total of the previous quarter:
SELECT Id,
Year,
Type,
Date,
Amount - IFNULL((SELECT Amount
FROM data AS previousQuarter
WHERE previousQuarter.Year = data.year
AND previousQuarter.Type = data.Type
AND previousQuarter.Periods = data.Periods - 1
), 0) AS Amount
FROM data
The IFNULL is needed to handle a quarter that has no previous quarter.

Count and where conditions leades to perfomance issues?

I am working on a million data rows table.The table look likes below
Departement year Candidate Spent Saved
Electrical 2013 A 50 50
Electrical 2013 B 25 50
Electrical 2013 C 11 50
Electrical 2013 D 25 0
Electrical 2013 Dt 86 50
Electrical 2014 AA 50 50
Electrical 2014 BB 25 0
Electrical 2014 CH 11 50
Electrical 2014 DG 25 0
Electrical 2014 DH 0 50
Computers 2013 Ax 50 50
Computers 2013 Bc 25 50
Computers 2013 Cx 11 50
Computers 2013 Dx 25 0
Computers 2013 Dx 86 50
I am looking output like below.
Departement year NoOfCandidates NoOfCandidatesWith50$save NoOfCandidatesWith0$save
Electrical 2013 5 4 1
Electrical 2014 5 3 2
Computers 2013 5 4 1
I am using #TEMP tables for every count where conditions and left outer joining at last .So it takes me more time.
Is there any way so i can perform better for above Table .
Thanks in advance.
You want to do this as a single aggregation query. There is no need for temporary tables:
select department, year, count(*) as NumCandidates,
sum(case when saved = 50 then 1 else 0 end) as NumCandidatesWith50Save
sum(case when saved = 0 then 1 else 0 end) as NumCandidatesWith00Save
from table t
group by department, year
order by 1, 2;

Normalize a Table That Contains Monthly, Yearly and Quarterly Data

How do I normalize this table:
Frequency (PK) Year (PK) Quarter (PK) Month (PK) Value
Monthly 2013 1 1 1
Quarterly 2013 1 0 2
Yearly 2013 0 0 3
The table is not in 2nd normal form, because when Frequency = Yearly Value depends on a subset of the primary key (Frequency, Year)
I've thougt about adding a surrogate key. Then Quarter and Month columns could be nullable.
Surrogate (PK) Frequency Year Quarter Month Value
1 Monthly 2013 1 1 1
2 Quarterly 2013 1 NULL 2
3 Yearly 2013 NULL NULL 3
But this doesn't solve the problem, because the 2nd normal form definition also applies to candidate keys. Dividing the table into three tables based on Frequency doesn't sound like a good idea, because it will introduce if statemments into my business logic:
if (frequency == Monthly) then select from DataMonthly
I'm going to assume that a couple of year's worth of data might look something like this. Correct me if I'm wrong. (I'm going to ignore the issue of whether using zeroes is a good idea or a bad idea.)
Frequency Year Quarter Month Value
--
Monthly 2012 1 1 1
Monthly 2012 1 2 2
Monthly 2012 1 3 3
Monthly 2012 2 4 4
Monthly 2012 2 5 5
Monthly 2012 2 6 6
Monthly 2012 3 7 7
Monthly 2012 3 8 8
Monthly 2012 3 9 9
Monthly 2012 4 10 10
Monthly 2012 4 11 11
Monthly 2012 4 12 12
Quarterly 2012 1 0 2
Quarterly 2012 2 0 5
Quarterly 2012 3 0 8
Quarterly 2012 4 0 11
Yearly 2012 0 0 3
Monthly 2013 1 1 1
Monthly 2013 1 2 2
Monthly 2013 1 3 3
Monthly 2013 2 4 4
Monthly 2013 2 5 5
Monthly 2013 2 6 6
Monthly 2013 3 7 7
Monthly 2013 3 8 8
Monthly 2013 3 9 9
Monthly 2013 4 10 10
Monthly 2013 4 11 11
Monthly 2013 4 12 12
Quarterly 2013 1 0 2
Quarterly 2013 2 0 5
Quarterly 2013 3 0 8
Quarterly 2013 4 0 11
Yearly 2013 0 0 3
From that data we can deduce two functional dependencies. A functional dependency answers the question, "Given one value for the set of attributes 'X', do we know one and only one value for the set of attributes 'Y'?"
{Year, Quarter, Month}->Frequency
{Year, Quarter, Month}->Value
Given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Frequency}. And given one value for the set of attributes {Year, Quarter, Month}, we know one and only one value for the set of attributes {Value}.
The problem you were running into involved including "Frequency" as part of the primary key. It's really not.
This table could do probably without the [Frequency] and [Quarter] column.
Why do you want to have these in? Is there any added value in having the Quarterly and Yearly values precalculated in this table? Comment: Since it's Value's are not just the sum of it's Month's.
So [Quarter] is mandatory.
This will work too:
Year (PK) Quarter (PK) Month (PK) Value
2013 1 1 1
2013 1 0 2
2013 0 0 3
Yearly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 0 AND [Month] = 0
Quarterly results:
SELECT
[Value]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 0
Monthly results:
SELECT
[Value] AS [Results]
FROM [Table1]
WHERE [Year] = 2013 AND [Quarter] = 1 AND [Month] = 1
Would this work for you?