Editing multiple rows in Pandas according to their values - pandas

I have a column which goes in a pattern like this:
RS
RS
GF
NB
BP
TO
RS
GF
NB
BP
TO
...
I want to convert the RS values into RS1 and RS2. Where two RS rows appear in a row, the first should become RS1 and the second RS2. The single RS in the middle of the pattern should become RS1. This pattern repeats. How would I do this in pandas?

Assuming you have a DataFrame column whose pattern repeats every 11 rows:
df['col']
# col
#0 RS
#1 RS
#2 GF
#3 NB
#4 BP
#5 TO
#6 RS
#7 GF
#8 NB
#9 BP
#10 TO
#11 RS
#12 RS
# ...
then you can use simple slicing with .loc (the older .ix indexer has been removed from recent pandas versions):
df.loc[df.index[0::11], 'col'] = 'RS1'
df.loc[df.index[1::11], 'col'] = 'RS2'
df.loc[df.index[6::11], 'col'] = 'RS1'
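
If the 11-row period cannot be relied on, one possible alternative (not part of the answer above, only a sketch) is to look at the previous row instead: an RS that directly follows another RS becomes RS2, and every other RS becomes RS1. The sample frame below is made up for illustration and simply repeats the pattern from the question twice.

```python
import pandas as pd

# Hypothetical sample data: the pattern from the question, repeated twice.
pattern = ['RS', 'RS', 'GF', 'NB', 'BP', 'TO', 'RS', 'GF', 'NB', 'BP', 'TO']
df = pd.DataFrame({'col': pattern * 2})

# True where the row holds 'RS'.
is_rs = df['col'].eq('RS')
# True where the previous row held 'RS' (the first row has no predecessor).
prev_is_rs = is_rs.shift(fill_value=False)

# An RS directly after another RS is the second of a pair.
df.loc[is_rs & prev_is_rs, 'col'] = 'RS2'
# Every other RS is either the first of a pair or the lone one in the middle.
df.loc[is_rs & ~prev_is_rs, 'col'] = 'RS1'

print(df['col'].tolist())
```

This variant assumes RS never occurs more than twice in a row; with three or more consecutive RS rows, every one after the first would be labelled RS2.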

Related

SAS - Conditional input statement

I would like to use a conditional if...then...else to read in the following data set: one input statement should be used if source = 1 and another if source = 2. I'm not sure where my error is. This is what I have so far, followed by the problem I see. I'm also not sure whether the pointers are needed.
DATA results2;
infile datalines missover;
input @10 source 1. @;
if source = 1 then input @1 id @4 name $ @12 score;
else if source = 2 then input @1 id @4 score @12 name $;
DATALINES;
11 john 1 77
11 88 2 james
22 bobby 1 55
22 89 2 opey
;;;;
RUN;
It reads the id correctly, but the source is not correctly matched to the id, and the name and score are not read properly.
Thanks for helping!

dynamic import for file using wildmatch

dataimport:
LOAD
@1 AS CoCd,
@2 AS Period,
@3 AS [Doc. Date],
@4 AS [Pstng Date],
@6 AS Reference,
@7 AS DocumentNo,
@8 AS Crcy,
@9 AS Year,
@10 AS [Doc. Type]
FROM
\\cagta5454\Indirect\Clients\\zz Work-in-Progress\2014\data\*_110_*GLDetl*
;
I want help with dynamically importing a file.
Currently the file is located at
\\cagta5454\Indirect\Clients\\zz Work-in-Progress\2014\data\*_110_*GLDetl*
I am looking for a way to make the import dynamic,
something like creating a variable
$dataLocation = \\cagta5454\Indirect\Clients\\zz Work-in-Progress\2014\data\*_
and $datafiles = '110','121','141'
so that instead of hard-coding the number in the name of the data file, I can use a variable.
The for each ... next loop can work for you. For more examples, you can always refer to the QV Help file c:\Program Files\QlikView\English.chm or the online version of it at https://help.qlik.com/en-US/#
let dataFiles = '110, 121, 141';
for each i in $(dataFiles)
let dataLocation = '\\cagta5454\Indirect\Clients\\zz Work-in-Progress\2014\data\*_' & '$(i)' & '_*GLDetl*';
dataimport:
Load
@1 AS CoCd,
@2 AS Period,
@3 AS [Doc. Date],
@4 AS [Pstng Date],
@6 AS Reference,
@7 AS DocumentNo,
@8 AS Crcy,
@9 AS Year,
@10 AS [Doc. Type]
From
$(dataLocation)
;
next

loop through array to load data in qlikview

I am loading data in QlikView using this statement:
//Importing data from flat file
dataimport:
LOAD @1 AS CoCd,
@2 AS Period,
@3 AS [Doc. Date],
@4 AS [Pstng Date],
@5 AS TranslDate,
@6 AS Reference,
@7 AS DocumentNo,
@8 AS Crcy,
@9 AS Year,
@10 AS [Doc. Type]
FROM
\\cagesre005\*GLDetl*
(txt, codepage is 1252, no labels, delimiter is ';', msq)
where @10 = 'KA' or @10 = 'KG' or @10 = 'KR' or @10 = 'KH' or @10 = 'KN' or @10 ='AB' or @10 ='IK' or @10 ='IM' or @10 ='MM' or @10 ='RE' or @10 ='RN';
This statement loads the data perfectly, but it is not dynamic: if I want to filter on different document types, I have to edit the script directly. I am looking for a way to loop through an array containing these values and load the data into the table,
something like creating a variable
$(vDocTypes) = 'KA','KG','KR','KH','KN','AB','IK','IM','MM','RE' ,'RN';
which I can use in the where clause, so that it goes through the values in the array and loads the data.
You can always use the Match function:
set vDocTypes= 'KA','KG','KR','KH','KN','AB','IK','IM','MM','RE' ,'RN';
//Importing data from flat file
dataimport:
LOAD
@1 AS CoCd,
@2 AS Period,
@3 AS [Doc. Date],
@4 AS [Pstng Date],
@5 AS TranslDate,
@6 AS Reference,
@7 AS DocumentNo,
@8 AS Crcy,
@9 AS Year,
@10 AS [Doc. Type]
From
\\cagesre005\*GLDetl* (txt, codepage is 1252, no labels, delimiter is ';', msq)
Where
Match( @10, $(vDocTypes) ) > 0
;

SAS: How to use RETAIN statement to create a summed variable in the DATA step, equivalent to the SUM statement output in PROC PRINT

In SAS, I'm trying to create a variable that is the sum of another. In this case, I am trying to create two variables: Total_All_Ages, which is the sum of the 2013 US population POPESTIMATE2013, and Total_18Plus, which is the sum of the 2013 US population aged 18+ POPEST18PLUS2013.
I want the output of these variables to appear as though I had used the sum statement under proc print (where the sum appears at the bottom of the variable column in a new row). However, I do not want to use the print procedure. Instead, I want to create my output only using the data step.
The way I need to do this is with the retain (and input) statement.
My code is as follows:
data _NULL_;
retain Total_All_Ages Total_18Plus;
infile RAWfoldr DLM=',' firstobs=3 obs=53;
informat STATE $2. NAME $20.;
input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
keep STATE NAME POPESTIMATE2013 POPEST18PLUS2013 Total_All_Ages Total_18Plus;
format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
file print notitles;
if _n_=1 then put '=== U.S. Resident Population Estimates for All Ages and ===
Ages 18 or Older by State (in Alphabetical Order), 2013';
if _n_=1 then put ' ';
if _n_=1 then put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
if _n_=1 then put ' ';
put @5 STATE @16 NAME @40 POPESTIMATE2013 @55 POPEST18PLUS2013;
run;
You can see that in my input statement, I create the two variables that I mentioned. I also mention them in my retain statement. However, I'm not sure how to make them appear in my output in the way I specified.
I want them to appear as a Total line at the bottom of the output, like this:
POPESTIMATE2013 POPEST18PLUS2013
112312234 1234123412341234
23413412341234 213412341234
============ ============ ============
Total 23423423429 242234545345
Is there a way to put these variables on a new line at the very bottom of the output (sort of like how I put the variable labels using the if _n_=1 code)?
Let me know if I need to explain myself better. I appreciate any help with this. Thank you.
If I understand your question, you're almost there.
First, add end=eof to your infile statement. This initializes a variable "eof" that is equal to 0, but will equal 1 only when SAS is reading in the last line of data. This works in a set statement as well.
Next, add this do block, which will execute when SAS is on the last line of the file:
if eof then do;
put @5 9*'=' @40 11*'=' @55 11*'=';
put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
end;
Here, you use put statements to print the separator line (repeated = signs) and the totals. Complete code is below:
data _NULL_;
retain Total_All_Ages Total_18Plus;
infile RAWfoldr DLM=',' firstobs=3 obs=53 end=eof;
informat STATE $2. NAME $20.;
input SUMLEV REGION $ DIVISION STATE $ NAME $ POPESTIMATE2013 POPEST18PLUS2013 PCNT_POPEST18PLUS;
Total_All_Ages = sum(Total_All_Ages, POPESTIMATE2013);
Total_18Plus = sum(Total_18Plus, POPEST18PLUS2013);
keep STATE NAME POPESTIMATE2013 POPEST18PLUS2013 Total_All_Ages Total_18Plus;
format POPESTIMATE2013 comma11. POPEST18PLUS2013 comma11.;
file print notitles;
if _n_=1 then put '=== U.S. Resident Population Estimates for All Ages and ===
Ages 18 or Older by State (in Alphabetical Order), 2013';
if _n_=1 then put ' ';
if _n_=1 then put @5 'FIPS Code' @16 'State Name' @40 'All Ages' @55 'Ages 18 or Older';
if _n_=1 then put ' ';
put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11. @55 POPEST18PLUS2013 comma11.;
if eof then do;
put @5 9*'=' @40 11*'=' @55 11*'=';
put @5 'Total' @40 Total_All_Ages comma11. @55 Total_18Plus comma11.;
end;
run;
One final note on your code: you can right-align your numbers by specifying a format followed by "-r" in your put statement, e.g.:
put @5 STATE @16 NAME @40 POPESTIMATE2013 comma11.-r @55 POPEST18PLUS2013 comma11.-r;
This will override any format statement you have.

Error reading fixed-formats in SAS

Here are a few lines of data.
q 2016 55 59 580067.12 89453.03 74579.31 63005.34 54211.66
q 2016 60 64 826983.94119020.88 99145.49 85347.23 75223.34
q 2016 65 69 1080400.00139847.91116260.10103226.14 93063.24
q 2016 70 74 1086917.25120158.78100291.15 91782.05 85081.34
I saved that in a file called "junk.txt". The following two pieces of SAS code behave differently on those lines.
filename junk "junk.txt";
data temp;
infile junk ;
input
@1 TYPE $ 2.
@4 year 4.
@9 age1 2.
@12 age2 2.
@15 foo 10.2
@25 bar 9.2
@34 YSD1-YSD3 9.2;
run;
proc print;
data temp;
infile junk ;
input
@1 TYPE $ 2.
@4 year 4.
@9 age1 2.
@12 age2 2.
@15 foo 10.2
@25 bar 9.2
@34 YSD1 9.2
@43 YSD2 9.2
@52 YSD3 9.2;
run;
proc print;
I get an erroneous read from the first input, and a correct read from the second input. Trying to figure out what is going on. I actually have a lot more variables than 3, so being able to use the shortcut syntax would be useful to me.
A colleague of mine knew a syntax that I was not familiar with. The following behaves the same as my second technique.
filename junk "junk.txt";
data temp;
infile junk ;
input
@1 TYPE $ 2.
@4 year 4.
@9 age1 2.
@12 age2 2.
@15 foo 10.2
@25 bar 9.2
@34 (YSD1-YSD3) (9.2);
run;
proc print;