I load data from a csv file. When I try to save the labels column into a variable I am getting the error saying label, and the dataset was taken from https://www.kaggle.com/c/digit-recognizer/data
d0 = pd.read_csv('./test.csv')
print(d0.head(5)) # print first five rows of d0.
# to save the labels into a variable l.
l = (d0['label'])
got output:
pixel0 pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 \
0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
pixel9 ... pixel774 pixel775 pixel776 pixel777 pixel778 \
0 0 ... 0 0 0 0 0
1 0 ... 0 0 0 0 0
2 0 ... 0 0 0 0 0
3 0 ... 0 0 0 0 0
4 0 ... 0 0 0 0 0
pixel779 pixel780 pixel781 pixel782 pixel783
0 0 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 0 0 0 0 0
[5 rows x 784 columns]
I'm getting error
KeyError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3077 try:
-> 3078 return self._engine.get_loc(key)
3079 except KeyError:
KeyError: 'label'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
in
15
16 # save the labels into a variable l.
---> 17 l = (d0['label'])
18
19 # Drop the label feature and store the pixel data in d.
KeyError: 'label'
I cant understand the error described in anaconda
Dropped some error code as stack is showing add more detail error
Please help me on solving this problem
I have a Pandas dataframe which I want to transform in the following way: I have some sensor data from an intelligent floor which is in column "CAPACITANCE" (split by ",") and that data comes from the device indicated in column "DEVICE". Now I want to have one row with a column per sensor - each device has 8 sensors, so I want to have devices x 8 columns and in that column I want the sensor data from exactly that sensor.
But my code seems to be super slow since I have about 90.000 rows in that dataframe! Does anyone have a suggestion how to speed it up?
BEFORE:
CAPACITANCE DEVICE TIMESTAMP \
0 0.00,-1.00,0.00,1.00,1.00,-2.00,13.00,1.00 01,07 2017/11/15 12:24:42
1 0.00,0.00,-1.00,-1.00,-1.00,0.00,-1.00,0.00 01,07 2017/11/15 12:24:42
2 0.00,-1.00,-2.00,0.00,0.00,1.00,0.00,-2.00 01,07 2017/11/15 12:24:43
3 2.00,0.00,-2.00,-1.00,0.00,0.00,1.00,-2.00 01,07 2017/11/15 12:24:43
4 1.00,0.00,-2.00,1.00,1.00,-3.00,5.00,1.00 01,07 2017/11/15 12:24:44
AFTER:
01,01-0 01,01-1 01,01-2 01,01-3 01,01-4 01,01-5 01,01-6 01,01-7 \
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
01,02-0 01,02-1 ... 05,07-1 05,07-2 05,07-3 05,07-4 05,07-5 \
0 0 0 ... 0 0 0 0 0
1 0 0 ... 0 0 0 0 0
2 0 0 ... 0 0 0 0 0
3 0 0 ... 0 0 0 0 0
4 0 0 ... 0 0 0 0 0
05,07-6 05,07-7 TIMESTAMP 01,07-8
0 0 0 2017-11-15 12:24:42 1.00
1 0 0 2017-11-15 12:24:42 0.00
2 0 0 2017-11-15 12:24:43 -2.00
3 0 0 2017-11-15 12:24:43 -2.00
4 0 0 2017-11-15 12:24:44 1.00
# creating new dataframe based on the old one
floor_df_resampled = floor_df.copy()
floor_device = ["01,01", "01,02", "01,03", "01,04", "01,05", "01,06", "01,07", "01,08", "01,09", "01,10",
"02,01", "02,02", "02,03", "02,04", "02,05", "02,06", "02,07", "02,08", "02,09", "02,10",
"03,01", "03,02", "03,03", "03,04", "03,05", "03,06", "03,07", "03,08", "03,09",
"04,01", "04,02", "04,03", "04,04", "04,05", "04,06", "04,07", "04,08", "04,09",
"05,06", "05,07"]
# creating new columns
floor_objects = []
for device in floor_device:
for sensor in range(8):
floor_objects.append(device + "-" + str(sensor))
# merging new columns
floor_df_resampled = pd.concat([floor_df_resampled, pd.DataFrame(columns=floor_objects)], ignore_index=True, sort=True)
# part that takes loads of time
for index, row in floor_df_resampled.iterrows():
obj = row["DEVICE"]
sensor_data = row["CAPACITANCE"].split(',')
for idx, val in enumerate(sensor_data):
col = obj + "-" + str(idx + 1)
floor_df_resampled.loc[index, col] = val
floor_df_resampled.drop(["DEVICE"], axis=1, inplace=True)
floor_df_resampled.drop(["CAPACITANCE"], axis=1, inplace=True)
Like commented, I'm not sure why you want that many columns, but the new columns can be created as follows:
def explode(x):
dev_name = x.DEVICE.iloc[0]
ret_df = x.CAPACITANCE.str.split(',', expand=True).astype(float)
ret_df.columns = [f'{dev_name}-{col}' for col in ret_df.columns]
return ret_df
new_df = df.groupby('DEVICE').apply(explode).fillna(0)
and then you can merge this with the old data frame:
df = df.join(new_df)
Given a matrix Z[n,m]:
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 0
0 0 1 0 0
I'd like to check how many "1" there are in the different columns of the matrix. So given k=1 in this case, the problem should be unsatisfiable since in the column there are 2 "1", so "number of 1">k. I tried in this way but it doesn't work:
constraint forall(i in n, j in m) forall(k in n) k<=( Z[i,j]\/Z[k,j])
Where am I wrong?
In the case I have this variables how I can do?
int b;
int: k;
set of int: PEOPLE = 1..p;
set of int: STOPS = 1..s;
array [1..b, PEOPLE, STOPS] of var bool: Z;
Z[1]
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
1 0 0 0 0
0 0 1 0 0
Z[2]
0 1 0 0 0
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0
p = 5;
s =5;
k=1;
b=2;
So in this case the result should be:
Z[1]: 1 0 1 0 0 , the number of "1" is 2, "2 > K"
Z[2]: 0 1 0 0 0, the number of "1" is 1, "1<=K"
UNSATISFIABLE
I just solved in this way:
array [1..b, STOPS] of var bool: M;
constraint forall (m in 1..b) ( forall (j in STOPS) ( M[m,j]= exists([Z[m,i,j] | i in PEOPLE ])));
constraint forall (m in 1..b) ( let {
var int: s = sum (j in STOPS)(M[m,j]>0);
} in
s <= t );
thank you all for the answers :)
I am doing a News recommendation system and I need to build a table for users and news they read. my raw data just like this :
001436800277225 [12,456,157]
009092130698762 [248]
010003000431538 [361,521,83]
010156461231357 [173,67,244]
010216216021063 [203,97]
010720006581483 [86]
011199797794333 [142,12,86,411,201]
011337201765123 [123,41]
011414545455156 [62,45,621,435]
011425002581540 [341,214,286]
the first column is userID, the second column is the newsID.newsID is a index column, for example, after transformation, [12,456,157] in the first row means that this user has read the 12th, 456th and 157th news (in sparse vector, the 12th column, 456th column and 157th column are 1, while other columns have value 0). And I want to change these data into a sparse vector format that can be used as input vector in Kmeans or DBscan algorithm of sklearn.
How can I do that?
One option is to construct the sparse matrix explicitly. I often find it easier to build the matrix in COO matrix format and then cast to CSR format.
from scipy.sparse import coo_matrix
input_data = [
("001436800277225", [12,456,157]),
("009092130698762", [248]),
("010003000431538", [361,521,83]),
("010156461231357", [173,67,244])
]
NUMBER_MOVIES = 1000 # maximum index of the movies in the data
NUMBER_USERS = len(input_data) # number of users in the model
# you'll probably want to have a way to lookup the index for a given user id.
user_row_map = {}
user_row_index = 0
# structures for coo format
I,J,data = [],[],[]
for user, movies in input_data:
if user not in user_row_map:
user_row_map[user] = user_row_index
user_row_index+=1
for movie in movies:
I.append(user_row_map[user])
J.append(movie)
data.append(1) # number of times users watched the movie
# create the matrix in COO format; then cast it to CSR which is much easier to use
feature_matrix = coo_matrix((data, (I,J)), shape=(NUMBER_USERS, NUMBER_MOVIES)).tocsr()
Use MultiLabelBinarizer from sklearn.preprocessing
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df.newsID), columns=mlb.classes_)
12 41 45 62 67 83 86 97 123 142 ... 244 248 286 341 361 411 435 456 521 621
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 1 0 0 0 1 0
3 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 1 0 0 1 ... 0 0 0 0 0 1 0 0 0 0
7 0 1 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
8 0 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 1
9 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 1 0 0 0 0 0 0
I have a data file that contains a lot of extra data. I want to run a word macro that only keeps 5 lines (I could live with 6 if it makes it easier)
I found how to delete a row if it contains a string.
I want to keep the paragraphs that start with:
Record write time
Headband impedance
Headband Packets
Headband RSSI
Headband Status
I could live with keeping
Headband ID
I tried the following macro, based on a sample I saw here. But, I am getting an error.
Sub test()
'
' test Macro
Dim search1 As String
search1 = "record"
Dim search2 As String
search2 = "headb"
Dim para As Paragraph
For Each para In ActiveDocument.Paragraphs
Dim txt As String
txt = para.Range.Text
If Not InStr(LCase(txt), search1) Then
If Not InStr(LCase(txt), search2) Then
para.Range.Delete
End If
Next
End Sub
The error is: next without For.
I know that there may be a better way, and an open to any fix.
Sample data:
The data is:
ZEO Start data record
----------------
Record write time: 10/14/2014 20:32
Factory reset date: 10/14/2014 20:23
Headband ID: 01/01/1970 18:32
Headband impedance: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 255 241 247 190 165 154 150 156 162 177 223 202
Headband Packets: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 21 4 30 3 3 3 9 4 46 46 1
Headband RSSI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 6 254 254 250 5 255 4 3 249
Headband Status: 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 169 170 170
Hardware ID: 2
Software ID: 43
Sensor Life Reset date: Not recorded
sleep Stat Reset date: 10/14/2014 20:18
Awakenings: 0
Awakenings Average: 0
Start of night: 10/14/2014 20:28
End of night: 10/14/2014 20:32
Awakenings: 0
Awakenings Average: 0
Time in deep: 0
Time in deep average: 0
There is an End If missing. Add this immediately after the first End If - do you get the same error?
Update:
There is also an error in the If conditions. Check the InStr reference for return values.
You need to use something like If Not InStr(...) = 1 Then on both if statements.