I have pandas data frame like below.
df
Out[50]:
0 1 2 3 4 5 6 7 8 9 ... 90 91 92 93 94 95 96 97 \
0 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
1 0 1 1 1 0 0 1 1 1 1 ... 0 0 0 0 0 0 0 0
2 1 1 1 1 1 1 1 1 1 1 ... 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
4 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
5 1 0 0 1 1 1 1 0 0 0 ... 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
7 0 0 0 0 0 0 0 0 0 0 ... 1 1 1 1 1 1 1 1
[8 rows x 100 columns]
I have target variable as an array as below.
[1, -1, -1, 1, 1, -1, 1, 1]
How can I map this target variable to a data frame and convert it into lib SVM format?.
equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
df["labels"] = df.index.map[(equi)]
d = df[np.setdiff1d(df.columns,['indx','labels'])]
e = df.label
dump_svmlight_file(d,e,'D:/result/smvlight2.dat')er code here
ERROR:
File "D:/spyder/april.py", line 54, in <module>
df["labels"] = df.index.map[(equi)]
TypeError: 'method' object is not subscriptable
When I use
df["labels"] = df.index.list(map[(equi)])
ERROR:
AttributeError: 'RangeIndex' object has no attribute 'list'
Please help me to solve those errors.
I think you need convert index to_series and then call map:
df["labels"] = df.index.to_series().map(equi)
Or use rename of index:
df["labels"] = df.rename(index=equi).index
All together:
For difference of columns pandas has difference:
from sklearn.datasets import dump_svmlight_file
equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
df["labels"] = df.rename(index=equi).index
e = df["labels"]
d = df[df.columns.difference(['indx','labels'])]
dump_svmlight_file(d,e,'C:/result/smvlight2.dat')
Also it seems label column is not necessary:
from sklearn.datasets import dump_svmlight_file
equi = {0:1, 1:-1, 2:-1,3:1,4:1,5:-1,6:1,7:1}
e = df.rename(index=equi).index
d = df[df.columns.difference(['indx'])]
dump_svmlight_file(d,e,'C:/result/smvlight2.dat')
Excel Formulas I am trying to replicate in pandas:
Click here to download workbook
* Look at columns D, E and F
entsig and exsig are manual and can be changed. In real life they would be derived from the value of another column or a comparison of two other columns
ent = 1 if entsig previous = 1 and in = 0
in = 1 if ent previous = 1 or (in previous = 1 and ex = 0)
ex = 1 if exsig previous = 1 and in previous = 1
so either ent, in, or ex will always be = 1 but never more than one of them
import pandas as pd
df = pd.DataFrame(
[[0,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [0,0,0,0,0],
[0,1,0,0,0], [0,1,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [0,0,0,0,0],
[0,0,0,0,0], [0,0,0,0,0], [0,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0],
[0,0,0,0,0], [0,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0],
[1,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0], [0,1,0,0,0]],
columns=['entsig', 'exsig','ent', 'in', 'ex'])
for i in df.index:
df['ent'][(df.entsig.shift(1)==1) & (df['ent'].shift(1) == 0) & (df['in'].shift(1) == 0)]=1
df['ex'][(df.exsig.shift(1)==1) & (df['in'].shift(1)==1)]=1
df['in'][(df.ent.shift(1)==1) | ((df['in'].shift(1)==1) & (df['ex']==0))]=1
for j in df.index:
df['ent'][df['in'] == 1]=0
df['in'][df['ex']==1]=0
df['ex'][df['ex'].shift(1)==1]=0
df
results in
entsig exsig ent in ex
0 0 0 0 0 0
1 1 0 0 0 0
2 1 0 1 0 0
3 1 0 0 1 0
4 0 0 0 1 0
5 0 1 0 1 0
6 0 1 0 0 1
7 1 0 0 0 0
8 1 0 1 0 0
9 0 0 0 1 0
10 0 0 0 1 0
11 0 0 0 1 0
12 0 1 0 1 0
13 0 1 0 0 1
14 0 1 0 0 0
15 0 0 0 0 0
16 0 0 0 0 0
17 1 0 0 0 0
18 1 0 1 0 0
19 1 0 0 1 0
20 1 1 0 1 0
21 0 1 0 0 1
22 0 1 0 0 0
23 0 1 0 0 0
Question
How can I make this code faster? It runs slow because it's a loop but I have not been able to come up with a solution that does not use loops. Any ideas or comments are appreciated.
If we can assume every group of 1's in entsig is followed by at least one 1 in
exsig, then you could compute ent, ex and in like this:
def ent_in_ex(df):
entsig_mask = (df['entsig'].diff().shift(1) == 1)
exsig_mask = (df['exsig'].diff().shift(1) == 1)
df.loc[entsig_mask, 'ent'] = 1
df.loc[exsig_mask, 'ex'] = 1
df['in'] = df['ent'].shift(1).cumsum().subtract(df['ex'].cumsum(), fill_value=0)
return df
If we can make this assumption, then ent_in_ex is significantly faster:
In [5]: %timeit orig(df)
10 loops, best of 3: 185 ms per loop
In [6]: %timeit ent_in_ex(df)
100 loops, best of 3: 2.23 ms per loop
In [95]: orig(df).equals(ent_in_ex(df))
Out[95]: True
where orig is the original code:
def orig(df):
for i in df.index:
df['ent'][(df.entsig.shift(1)==1) & (df['ent'].shift(1) == 0) & (df['in'].shift(1) == 0)]=1
df['ex'][(df.exsig.shift(1)==1) & (df['in'].shift(1)==1)]=1
df['in'][(df.ent.shift(1)==1) | ((df['in'].shift(1)==1) & (df['ex']==0))]=1
for j in df.index:
df['ent'][df['in'] == 1]=0
df['in'][df['ex']==1]=0
df['ex'][df['ex'].shift(1)==1]=0
return df
I'm very new to VBA and programming, so this might be a dumb question. I have written the following code:
Function central(X)
Dim xc(300, 10), xa(200)
m = X.Rows.Count
n = X.Columns.Count
For j = 1 To n
xa(j) = 0
For i = 1 To m
xa(j) = xa(j) + X(i, j)
Next i
xa(j) = xa(j) / m
For i = 1 To m
xc(i, j) = X(i, j) - xa(j)
Next i
Next
central = xc()
End Function
This should output a matrix whose elements are subtracted from the average value of their columns.
My problem is that the output is shifted with one row and column. So for example for this table:
1 1 1
2 2 2
3 3 3
it gives me:
0 0 0
0 -1 -1
0 0 0
Thanks in advance!
I am trying to do something very similar to this question: mysql - UPDATEing row based on other rows
I have a table, called modset, of the following form:
member year y1 y2 y3 y1y2 y2y3 y1y3 y1y2y3
a 1 0 0 0 0 0 0 0
a 2 0 0 0 0 0 0 0
a 3 0 0 0 0 0 0 0
b 1 0 0 0 0 0 0 0
b 2 0 0 0 0 0 0 0
c 1 0 0 0 0 0 0 0
c 3 0 0 0 0 0 0 0
d 2 0 0 0 0 0 0 0
Columns 3:9 are binary flags to indicate which combination of years the member has records in. So I wish the result of an SQL update to look as follows:
member year y1 y2 y3 y1y2 y2y3 y1y3 y1y2y3
a 1 0 0 0 0 0 0 1
a 2 0 0 0 0 0 0 1
a 3 0 0 0 0 0 0 1
b 1 0 0 0 1 0 0 0
b 2 0 0 0 1 0 0 0
c 1 0 0 0 0 0 1 0
c 3 0 0 0 0 0 1 0
d 2 0 1 0 0 0 0 0
The code in the question linked above does something very close but only when it is a count of the distinct years in which the member has records. I need to base the columns on the specific values of the years in which the member has records.
Thanks in advance!
SOLUTION
SELECT member,
case when min(distinct(year)) = 1 and max(distinct(year)) = 1 then 1 else 0 end y1,
case when min(distinct(year)) = 1 and max(distinct(year)) = 2 then 1 else 0 end y1y2,
case when min(distinct(year)) = 1 and max(distinct(year)) = 3 and count(distinct(year)) = 2 then 1 else 0 end y1y3,
case when min(distinct(year)) = 1 and max(distinct(year)) = 3 and count(distinct(year)) = 3 then 1 else 0 end y1y2y3,
case when min(distinct(year)) = 2 and max(distinct(year)) = 2 then 1 else 0 end y2,
case when min(distinct(year)) = 2 and max(distinct(year)) = 3 then 1 else 0 end y2y3,
case when min(distinct(year)) = 3 then 1 else 0 end y3
INTO temp5
FROM modset
GROUP BY member;
UPDATE modset M
SET y1 = T.y1, y2 = T.y1, y3 = T.y3, y1y2 = T.y1y2, y1y3 = T.y1y3, y2y3 = T.y2y3, y1y2y3 = T.y1y2y3
FROM temp5 T
WHERE T.member = M.member;
What is the query you are using to return the indicators of the years the member has records in?
It sounds like you would want take your query results and use it in your update:
http://dev.mysql.com/doc/refman/5.0/en/update.html
It may look something like this:
UPDATE targetTable t, sourceTable s
SET t.y1 = s.y1, t.y2 = s.y2 -- (and so on...)
WHERE t.member = s.member AND t.year = m.year;
Example :
If a got word "don" then file will contain
ddd
ddo
ddn
dod
doo
don
dnd
dno
dnn
odd
odo
odn
ood
<...>
I have no idea to do this. Not less then 3 symbol words.
I presented a solution in Experts Exchange, which you may not be able to see (if you never payed them) so I copy it for you:
Question was:
I have n items and each item can be assigned a 1 or a 2. So I would like to get the matrix result that would generate all possible combinations.
For eg. if n= 3 , then the possible outcomes are : I need an algorithm that can generate this series for n . Please help thanks. ideally i would like to store the result in a datatable
1 1 1
1 1 2
1 2 1
2 1 1
2 1 2
1 2 2
2 2 1
2 2 2
Answer:
Dim HighestValue As Integer = 2 ' max value
Dim NrOfValues As Integer = 3 ' nr of values in one result
Dim Values(NrOfValues) As Integer
Dim i As Integer
For i = 0 To NrOfValues - 1
Values(i) = 1
Next
Values(NrOfValues - 1) = 0 ' to generate first as ALL 1
For i = 1 To HighestValue ^ NrOfValues
Values(NrOfValues - 1) += 1
For j As Integer = NrOfValues - 1 To 0 Step -1
If Values(j) > HighestValue Then
Values(j) = 1
Values(j - 1) += 1
End If
Next
Dim Result As String = ""
For j As Integer = 0 To NrOfValues - 1
Result = Result & CStr(Values(j))
Next
Debug.WriteLine(Result)
Next
Ok Here's the solution, you just need to change the Debug.Writeline with a write to your file
Dim HighestValue As Integer = 3 ' max value
Dim NrOfValues As Integer = 3 ' nr of values in one result
Dim Values(NrOfValues) As Integer
Dim i As Integer
For i = 0 To NrOfValues - 1
Values(i) = 1
Next
Values(NrOfValues - 1) = 0 ' to generate first as ALL 1
For i = 1 To HighestValue ^ NrOfValues
Values(NrOfValues - 1) += 1
For j As Integer = NrOfValues - 1 To 0 Step -1
If Values(j) > HighestValue Then
Values(j) = 1
Values(j - 1) += 1
End If
Next
Dim Result As String = ""
For j As Integer = 0 To NrOfValues - 1
If Values(j) = 1 Then Result = Result & "d"
If Values(j) = 2 Then Result = Result & "o"
If Values(j) = 3 Then Result = Result & "n"
'Result = Result & CStr(Values(j))
Next
Debug.WriteLine(Result)
Next