SPSS using value of one cell to call another cell - conditional-statements

Below is some data:
Test Day1 Day2 Score
A 1 2 100
B 1 3 62
C 3 4 90
D 2 4 20
E 4 5 80
I am trying to take the values from the columns 'Day1' and 'Day2' and use them to select the row numbers for the column Score. For example, for Test A I would like to find the sum of 100 and 62, because those are the values in the first and second rows of Score. For Test B I would like to find the sum of 100, 62 and 90.
Does anyone have any ideas on how to go about doing this? I am looking for something similar to the INDIRECT function in Excel. Thank you

The trick is to convert the variable "Score" into a row. I could not think of an easy way to avoid SAVE/GET - room for improvement.
file handle tmp
/name = "C:\DATA\Temp".
***.
data list free /Test (a1) Day1 (f8) Day2 (f8) Score (f8).
begin data
A 1 2 100
B 1 3 62
C 3 4 90
D 2 4 20
E 4 5 80
end data.
comp f = 1.
var wid all (12).
save out "tmp\data.sav".
***.
get "tmp\data.sav"
/keep score.
flip.
comp f = 1.
match files
/file "tmp\data.sav"
/table *
/by f
/drop case_lbl.
comp stat = 0.
do rep var = var001 to var005
/k = 1 to 5.
if range(k, Day1, Day2) stat = sum(stat, var).
end rep.
list Test Day1 Day2 Score stat.
The result:
Test Day1 Day2 Score stat
A 1 2 100 162
B 1 3 62 252
C 3 4 90 110
D 2 4 20 172
E 4 5 80 100
Number of cases read: 5 Number of cases listed: 5
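For comparison, the same row-range sum can be sketched in pandas (this is only a rough translation of the logic above, not part of the SPSS answer; column names follow the question):
import pandas as pd

df = pd.DataFrame({'Test': list('ABCDE'),
                   'Day1': [1, 1, 3, 2, 4],
                   'Day2': [2, 3, 4, 4, 5],
                   'Score': [100, 62, 90, 20, 80]})

# Day1/Day2 are 1-based row positions into the Score column.
df['stat'] = [df['Score'].iloc[d1 - 1:d2].sum()
              for d1, d2 in zip(df['Day1'], df['Day2'])]
# stat: 162, 252, 110, 172, 100 - the same values as the SPSS listing above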

Related

Pandas DataFrame subtract values

I'm new to Python.
I have a data frame (df) which has the following structure:
ID  rate  Sequential number
a   150   1
a   150   1
a   50    2
b   250   1
c   25    1
d   25    1
d   40    2
d   25    3
The IDs are customers, the rates are monthly rates, and Sequential number is a number that increases by 1 whenever the customer changes the monthly rate.
I want to do the following:
for every ID, find the maximum value in the column Sequential number and take the associated value in the column rate, then find the minimum value in the column Sequential number and take the associated value in the column rate, and subtract the two rates.
At the end I want to have an additional column in my data frame with the difference of the rates. Maybe the loop could do the following:
for id in df()
find max() in column Sequential number and get value in rates -
min () in column Sequential number and get value in rates
return difference
The new df_new should be this
ID  rate  Sequential number  rate_diff
a   150   1                  0
a   150   1                  0
a   50    2                  -100
b   250   1                  0
c   25    1                  0
d   25    1                  0
d   40    2                  0
d   30    3                  5
If an ID has only one entry, the rate_diff should be 0
I already tried a lambda function:
df['diff_rate'] = df.groupby('ID')['rate'].transform(lambda x : x-x.min())
but this returns
ID  rate  Sequential number  rate_diff
a   150   1                  100
a   150   1                  100
a   50    2                  0
b   250   1                  0
c   25    1                  0
d   25    1                  0
d   40    2                  15
d   30    3                  10
Maybe one of you has a small workaround for this! :-)
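The sample frame used by the answers below can be rebuilt roughly like this (a sketch; the rate for d's third row is taken as 30, matching the expected output and the answers' output rather than the 25 in the input table):
import pandas as pd

df = pd.DataFrame({'ID': list('aaabcddd'),
                   'rate': [150, 150, 50, 250, 25, 25, 40, 30],
                   'Sequential number': [1, 1, 2, 1, 1, 1, 2, 3]})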
One approach with indexing:
g = df.groupby('ID')['Sequential number']
IMAX = g.idxmax()
IMIN = g.idxmin()
df['rate_diff'] = 0
df.loc[IMAX, 'rate_diff'] = (df.loc[IMAX, 'rate'].to_numpy()
                             - df.loc[IMIN, 'rate'].to_numpy())
Another with groupby.transform+where:
g = df.sort_values(by=['ID', 'Sequential number']).groupby('ID')
m = g['Sequential number'].idxmax()
df['rate_diff'] = (g['rate'].transform(lambda x: x.iloc[-1] - x.iloc[0])
                   .where(df.index.isin(m), 0))
output:
ID rate Sequential number rate_diff
0 a 150 1 0
1 a 150 1 0
2 a 50 2 -100
3 b 250 1 0
4 c 25 1 0
5 d 25 1 0
6 d 40 2 0
7 d 30 3 5

Assigning Score based on Order Sequence in pandas

Following are the dataframes I have
score_df
col1_id col2_id score
1 2 10
5 6 20
records_df
date col_id
D1 6
D2 4
D3 1
D4 2
D5 5
D6 7
I would like to compute a score based on the following criteria:
When 2 occurs after 1, the score 10 should be assigned; likewise, when 1 occurs after 2, the score 10 should be assigned.
i.e. when (1,2) gives a score of 10, (2,1) also gets the same score of 10.
Considering (1,2): when 1 occurs for the first time we don't assign a score. We flag the row and wait for 2 to occur. When 2 occurs in the column, we give the score 10.
Considering (2,1): when 2 comes first, we assign the value 0 and wait for 1 to occur. When 1 occurs, we give the score 10.
So, on the first occurrence, don't assign the score; wait for the corresponding event to occur and then assign the score.
So, my result dataframe should look something like this
result
date col_id score
D1 6 0 -- Even though 6 is in the score list, it occurred for the first time, so 0
D2 4 0 -- 4 is not in the list at all
D3 1 0 -- 1 occurred for the first time, so 0
D4 2 10 -- 1 occurred previously and 2 occurs now, so we can assign 10
D5 5 20 -- 6 occurred previously, so we can assign 20
D6 7 0 -- 7 is not in the list
I have around 100k rows in both score_df and records_df. Looping and assigning scores takes too long. Can someone help with logic that avoids looping over the entire dataframe?
From what I understand, you can try melt for unpivoting and then merge. Keeping the index from the melted df, we check where the index is duplicated and, if so, return the score from the merge, else 0.
# Note: this assumes an extra 'uid' column present in both frames
# (it shows up in the output below); drop it from the keys if your data has none.
m = score_df.reset_index().melt(['index', 'uid', 'score'],
                                var_name='col_name', value_name='col_id')
final = records_df.merge(m.drop(columns='col_name'), on=['uid', 'col_id'], how='left')
c = final.duplicated(['index']) & final['index'].notna()
final = final.drop(columns='index').assign(score=lambda x: x['score'].where(c, 0))
print(final)
uid date col_id score
0 123 D1 6 0.0
1 123 D2 4 0.0
2 123 D3 1 0.0
3 123 D4 2 10.0
4 123 D5 5 20.0
5 123 D6 7 0.0
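If your frames look exactly like the ones in the question (no uid column), a stripped-down sketch of the same melt-and-merge idea would be:
import pandas as pd

score_df = pd.DataFrame({'col1_id': [1, 5], 'col2_id': [2, 6], 'score': [10, 20]})
records_df = pd.DataFrame({'date': ['D1', 'D2', 'D3', 'D4', 'D5', 'D6'],
                           'col_id': [6, 4, 1, 2, 5, 7]})

# One melted row per (pair index, member id); the pair index survives the melt.
m = (score_df.reset_index()
             .melt(id_vars=['index', 'score'], value_name='col_id')
             .drop(columns='variable'))

final = records_df.merge(m, on='col_id', how='left')

# The second time a pair index shows up, its partner has already occurred.
hit = final.duplicated('index') & final['index'].notna()
final['score'] = final['score'].where(hit, 0)
final = final.drop(columns='index')
# date col_id score: D1 6 0, D2 4 0, D3 1 0, D4 2 10, D5 5 20, D6 7 0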

iterrows() of 2 columns and save results in one column

In my data frame I want to iterate over two columns with iterrows() but save the result in one column. For example, df is:
x y
5 10
30 445
70 32
expected output is
points sequence
5 1
10 2
30 1
445 2
I know about iterrows() but it saves the output in two different columns. How can I get the expected output, and is there a way to generate the sequence number according to a condition? Any help will be appreciated.
First, never use iterrows, because it is really slow.
If you want a 1, 2 sequence numbered by column, convert the values to a numpy array with DataFrame.to_numpy and flatten with numpy.ravel, then build the sequence with numpy.tile:
df = pd.DataFrame({'points': df.to_numpy().ravel(),
                   'sequence': np.tile([1, 2], len(df))})
print (df)
points sequence
0 5 1
1 10 2
2 30 1
3 445 2
4 70 1
5 32 2
Do it this way, building the (value, sequence) pairs row by row:
>>> pd.DataFrame([(value, seq + 1) for _, row in df.iterrows()
...               for seq, value in enumerate(row)],
...              columns=['points', 'sequence'])
points sequence
0 5 1
1 10 2
2 30 1
3 445 2
4 70 1
5 32 2

pandas replace multiple values (that you do not know) on one column

What is the best way to change several values in a column ('Status') that differ from the only two values that you want to analyse?
As an example, my df is:
Id Status Email Product Age
1 ok g# A 20
5 not ok l# J 45
1 A a# A 27
2 B h# B 25
2 ok t# B 33
3 C b# E 23
4 not ok c# D 30
In the end, I want to have:
Id Status Email Product Age
1 ok g# A 20
5 not ok l# J 45
1 other a# A 27
2 other h# B 25
2 ok t# B 33
3 other b# E 23
4 not ok c# D 30
The greatest difficulty is that my df is huge, so I do not know all the other values different from 'ok' and 'not ok' (the values that I want to analyse).
Thanks in advance!
np.where + isin
df.Status=np.where(df.Status.isin(['ok','not ok']),df.Status,'Others')
df
Out[384]:
Id Status Email Product Age
0 1 ok g# A 20
1 5 not ok l# J 45
2 1 Others a# A 27
3 2 Others h# B 25
4 2 ok t# B 33
5 3 Others b# E 23
6 4 not ok c# D 30
use an apply
df['Status'] = df.apply(lambda x: 'other' if x['Status'] not in ['ok', 'not ok'] else x['Status'], axis=1)
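For what it's worth, the same replacement can also be written without apply, using Series.where (assuming the same column name):
# keep 'ok'/'not ok', replace everything else
df['Status'] = df['Status'].where(df['Status'].isin(['ok', 'not ok']), 'other')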

Retrieve value from different fields for each record of an Access table

I would be more than appreciative for some help here, as I have been having some serious problems with this.
Background:
I have a list of unique records. For each record I have a monotonically increasing pattern (either A, B or C), and a development position (1 to 5) assigned to it.
So each of the 3 patterns is set out in five fields representing the development period.
Problem:
I need to retrieve the percentages relating to the relevant development periods, from different fields for each row. It should be in a single column called "Output".
Example:
Apologies, not sure how to attach a table here, but laid out as a table (each field is a column, with Output being the desired result) the data is:
ID  Pattern  Dev  1    2    3    4    5     Output
1   A        1    20%  40%  60%  80%  100%  20%
2   B        5    15%  35%  65%  85%  100%  100%
3   C        3    25%  40%  60%  65%  100%  60%
4   A        4    20%  40%  60%  80%  100%  80%
5   C        2    25%  40%  60%  65%  100%  40%
In MS Excel, I could simply use an HLOOKUP or OFFSET function to do this. But how do I do this in Access? The best I have come up with so far is Output: Eval([Category]), but this doesn't achieve what I want, which is to read the "Dev" field and treat its value as a field name when building an expression.
In practice, I have more than 100 development periods to play with, and over 800 different patterns, so I don't think "switch" methods can work here.
Thanks in advance,
alch84
Assuming that
[ID] is a unique column (primary key), and
the source column for [Output] only depends on the value of [Dev]
then this seems to work:
UPDATE tblAlvo SET Output = DLOOKUP("[" & Dev & "]", "tblAlvo", "ID=" & ID)
Before:
ID Pattern Dev 1 2 3 4 5 Output
-- ------- --- -- -- -- -- --- ------
1 A 1 20 40 60 80 100
2 B 5 15 35 65 85 100
3 C 3 25 40 60 65 100
4 A 4 20 40 60 80 100
5 C 2 25 40 60 65 100
After:
ID Pattern Dev 1 2 3 4 5 Output
-- ------- --- -- -- -- -- --- ------
1 A 1 20 40 60 80 100 20
2 B 5 15 35 65 85 100 100
3 C 3 25 40 60 65 100 60
4 A 4 20 40 60 80 100 80
5 C 2 25 40 60 65 100 40
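For comparison only (not part of the Access answer): the same per-row lookup, where the value of Dev picks which column to read, can be sketched in pandas, assuming the period columns are named '1' through '5':
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                   'Pattern': ['A', 'B', 'C', 'A', 'C'],
                   'Dev': [1, 5, 3, 4, 2],
                   '1': [20, 15, 25, 20, 25],
                   '2': [40, 35, 40, 40, 40],
                   '3': [60, 65, 60, 60, 60],
                   '4': [80, 85, 65, 80, 65],
                   '5': [100, 100, 100, 100, 100]})

# For each row, read the column whose name matches that row's Dev value.
df['Output'] = df.apply(lambda r: r[str(r['Dev'])], axis=1)
# Output: 20, 100, 60, 80, 40 - matching the After table above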