TensorFlow 0.8 one-hot encoding

The data that I want to encode looks as follows:
print (train['labels'])
[ 0 0 0 ..., 42 42 42]
There are 43 classes, from 0 to 42.
I read that TensorFlow 0.8 has a new feature for one-hot encoding, so I tried to use it as follows:
trainhot=tf.one_hot(train['labels'], 43, on_value=1, off_value=0)
The only problem is that I think the output is not what I need:
print (trainhot[1])
Tensor("strided_slice:0", shape=(43,), dtype=int32)
Can someone nudge me in the right direction, please? :)

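The output is correct and expected. trainhot[1] is the one-hot label of the second (0-based index 1) training sample, which has the 1-D shape (43,). In TensorFlow 0.8, printing a tensor only shows its symbolic description (name, shape and dtype), not its values; to see the values you have to evaluate it in a session, e.g. with sess.run(). You can play with the code below to better understand tf.one_hot:
import tensorflow as tf

onehot = tf.one_hot([0, 0, 41, 42], 43, on_value=1, off_value=0)
with tf.Session() as sess:
    onehot_v = sess.run(onehot)  # evaluating the graph returns a NumPy array
    print("v: ", onehot_v)
    print("v shape: ", onehot_v.shape)
    print("v[1] shape: ", onehot[1])  # onehot[1] is still a symbolic Tensor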
output:
v: [[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1]]
v shape: (4, 43)
v[1] shape: Tensor("strided_slice:0", shape=(43,), dtype=int32)
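The printed arrays above are what a one-hot encoding produces: one row per label, with a single 1 in the column given by the label. A minimal pure-Python sketch of the same computation, useful for checking expected shapes without building a graph (the helper name `one_hot` is hypothetical and has no TensorFlow dependency):

```python
def one_hot(indices, depth, on_value=1, off_value=0):
    """Return a list of rows; row i has on_value at column indices[i]."""
    rows = []
    for idx in indices:
        row = [off_value] * depth
        # Out-of-range indices (e.g. -1) leave the whole row at off_value,
        # which is also how tf.one_hot treats them.
        if 0 <= idx < depth:
            row[idx] = on_value
        rows.append(row)
    return rows

v = one_hot([0, 0, 41, 42], 43)
print(len(v), len(v[0]))  # 4 43, i.e. the (4, 43) shape above
print(v[2].index(1))      # 41
```

Indexing `v[1]` gives a single row of length 43, matching the (43,) shape reported for `trainhot[1]`.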

Related

How to fix "tf.math.confusion_matrix()" error

I'm trying to compute the confusion matrix for a multiclass classification problem using tf.math.confusion_matrix(). The code snippet is as follows:
y_pred = model.predict(x_test)
y_pred = tf.argmax(y_pred, axis=1)
Y_test = tf.argmax(y_test, axis=1)
matrix = tf.math.confusion_matrix(Y_test, y_pred)
The output of Y_test is:
tf.Tensor(
[[0 2 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
[0 2 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 3 ... 0 0 0]
[0 0 2 ... 0 0 0]], shape=(2124, 279), dtype=int64)
The output of y_pred is:
tf.Tensor(
[[1 2 2 ... 0 0 0]
[0 2 3 ... 0 0 0]
[3 2 0 ... 3 1 3]
...
[3 1 0 ... 2 3 2]
[1 0 3 ... 1 1 2]
[1 0 2 ... 1 1 2]], shape=(2124, 279), dtype=int64)
Y_test[1] looks like this:
tf.Tensor(
[0 2 0 1 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], shape=(279,), dtype=int64)
y_pred[1] looks like this:
tf.Tensor(
[0 2 3 1 3 3 2 2 2 3 2 3 2 3 3 2 1 0 0 0 0 3 1 0 2 3 1 2 0 1 0 0 1 0 0 0 0
2 0 2 1 0 0 0 0 1 0 0 0 3 2 0 0 3 2 0 0 3 3 0 3 0 0 0 0 1 0 2 1 0 2 3 0 3
3 0 2 3 1 3 2 0 3 0 0 0 0 0 0 0 0 0 0 0 0 1 3 0 0 0 3 3 0 0 0 0 0 3 0 0 1
0 3 0 3 3 0 1 0 3 0 0 0 0 0 0 3 0 1 0 0 0 0 0 0 0 0 0 3 0 0 3 0 0 0 0 0 0
3 0 3 3 0 0 0 3 0 0 0 0 0 0 0 0 0 0 3 0 3 0 0 0 0 3 0 3 0 0 0 0 0 0 0 2 0
0 1 0 0 0 0 2 0 3 3 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 3 0 3 3 0 2 3 0 3 3
3 3 3 0 3 0 3 0 0 3 0 0 0 0 3 3 3 2 0 0 0 0 0 0 0 0 2 3 0 0 3 0 0 0 3 0 2
0 0 3 0 0 0 0 0 3 1 2 0 3 2 3 0 3 0 0 0], shape=(279,), dtype=int64)
And the error I'm getting is:
InvalidArgumentError: Dimensions [0,2) of indices[shape=[2124,2,279]] must match dimensions [0,2) of updates[shape=[2124,279]] [Op:ScatterNd]
How can this be solved?
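The tf.math.confusion_matrix documentation describes labels and predictions as 1-D tensors, while Y_test and y_pred here are both 2-D with shape (2124, 279); the shape mismatch reported by ScatterNd is consistent with that. One possible approach, assuming each of the 279 positions should count as an independent prediction, is to flatten both tensors before building the matrix (in TensorFlow, e.g. with tf.reshape(t, [-1])). The stdlib sketch below only illustrates what the matrix counts; `confusion_matrix` and `flatten` are hypothetical helpers, not TensorFlow APIs, and the toy values are made up:

```python
def confusion_matrix(labels, predictions, num_classes=None):
    """Count (true, predicted) pairs from two equal-length 1-D sequences.

    Rows are true labels and columns are predicted labels, the same
    orientation tf.math.confusion_matrix uses.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if num_classes is None:
        num_classes = max(max(labels), max(predictions)) + 1
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(labels, predictions):
        matrix[t][p] += 1
    return matrix


def flatten(rows):
    """Turn a 2-D nested list, e.g. (samples, 279), into one 1-D list."""
    return [x for row in rows for x in row]


# Toy 2-D stand-ins for Y_test and y_pred (hypothetical values)
y_true_2d = [[0, 2, 0], [1, 0, 3]]
y_pred_2d = [[0, 2, 3], [1, 0, 3]]
cm = confusion_matrix(flatten(y_true_2d), flatten(y_pred_2d), num_classes=4)
print(cm)  # [[2, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Whether flattening is the right interpretation depends on what the 279 positions mean; computing one matrix per position is another option.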

How can I change my index vector into sparse feature vector that can be used in sklearn?

I am building a news recommendation system and need to build a table of users and the news they have read. My raw data looks like this:
001436800277225 [12,456,157]
009092130698762 [248]
010003000431538 [361,521,83]
010156461231357 [173,67,244]
010216216021063 [203,97]
010720006581483 [86]
011199797794333 [142,12,86,411,201]
011337201765123 [123,41]
011414545455156 [62,45,621,435]
011425002581540 [341,214,286]
The first column is the userID and the second column holds the newsIDs. newsID is an index column: for example, after transformation, [12,456,157] in the first row means that this user has read the 12th, 456th and 157th news items (in the sparse vector, columns 12, 456 and 157 are 1, while all other columns are 0). I want to convert this data into a sparse vector format that can be used as the input to sklearn's KMeans or DBSCAN algorithms.
How can I do that?
One option is to construct the sparse matrix explicitly. I often find it easier to build the matrix in COO format and then convert it to CSR format.
from scipy.sparse import coo_matrix

input_data = [
    ("001436800277225", [12,456,157]),
    ("009092130698762", [248]),
    ("010003000431538", [361,521,83]),
    ("010156461231357", [173,67,244])
]
NUMBER_MOVIES = 1000 # number of columns; must be greater than the largest index in the data
NUMBER_USERS = len(input_data) # number of users in the model
# you'll probably want to have a way to look up the row index for a given user id.
user_row_map = {}
user_row_index = 0
# structures for COO format: row indices, column indices and values
I, J, data = [], [], []
for user, movies in input_data:
    if user not in user_row_map:
        user_row_map[user] = user_row_index
        user_row_index += 1
    for movie in movies:
        I.append(user_row_map[user])
        J.append(movie)
        data.append(1) # number of times the user watched the movie
# create the matrix in COO format, then convert it to CSR, which is much easier to use
# (KMeans and DBSCAN in sklearn accept sparse input)
feature_matrix = coo_matrix((data, (I, J)), shape=(NUMBER_USERS, NUMBER_MOVIES)).tocsr()
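In the COO format built above, I, J and data are parallel lists of (row, column, value) triplets; CSR instead stores, for each row, a slice of column indices and values delimited by an `indptr` array. A stdlib-only sketch of that conversion, with a hypothetical helper `coo_to_csr` (SciPy's `.tocsr()` also sums duplicate entries, but this is not SciPy's implementation):

```python
def coo_to_csr(rows, cols, vals, n_rows):
    """Convert COO triplets to CSR arrays (indptr, indices, data).

    Duplicate (row, col) pairs are summed, and column indices are
    sorted within each row.
    """
    per_row = [dict() for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, vals):
        per_row[r][c] = per_row[r].get(c, 0) + v
    indptr, indices, data = [0], [], []
    for entries in per_row:
        for c in sorted(entries):
            indices.append(c)
            data.append(entries[c])
        # Row i's entries live in indices[indptr[i]:indptr[i + 1]]
        indptr.append(len(indices))
    return indptr, indices, data


# The first two users from the question, as (I, J, data) triplets
I = [0, 0, 0, 1]
J = [12, 456, 157, 248]
data = [1, 1, 1, 1]
indptr, indices, vals = coo_to_csr(I, J, data, n_rows=2)
print(indptr)   # [0, 3, 4]
print(indices)  # [12, 157, 456, 248]
```

Because duplicates are summed, appending a 1 for every read means a user who read the same item twice ends up with a 2 in that cell.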
Alternatively, use MultiLabelBinarizer from sklearn.preprocessing (pass sparse_output=True if you want a sparse CSR matrix instead of a dense array):
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
# assumes df is a DataFrame whose newsID column holds the lists of news IDs
mlb = MultiLabelBinarizer()
pd.DataFrame(mlb.fit_transform(df.newsID), columns=mlb.classes_)
12 41 45 62 67 83 86 97 123 142 ... 244 248 286 341 361 411 435 456 521 621
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 1 0 0 0 1 0
3 0 0 0 0 1 0 0 0 0 0 ... 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 1 0 0 ... 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 0 0 1 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
6 1 0 0 0 0 0 1 0 0 1 ... 0 0 0 0 0 1 0 0 0 0
7 0 1 0 0 0 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0
8 0 0 1 1 0 0 0 0 0 0 ... 0 0 0 0 0 0 1 0 0 1
9 0 0 0 0 0 0 0 0 0 0 ... 0 0 1 1 0 0 0 0 0 0
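Note that the columns in the output above are the distinct newsID values that actually occur (12, 41, 45, ...), sorted, rather than positions 0 up to the maximum index. If column j must correspond to news item j, the COO approach above (or passing an explicit `classes` list to MultiLabelBinarizer) keeps that layout. A stdlib-only sketch of the mapping MultiLabelBinarizer performs when `classes` is not given, with a hypothetical helper `binarize`:

```python
def binarize(label_lists):
    """One column per distinct label (sorted); 1 where the row contains it."""
    classes = sorted({label for labels in label_lists for label in labels})
    column = {label: j for j, label in enumerate(classes)}
    matrix = []
    for labels in label_lists:
        row = [0] * len(classes)
        for label in labels:
            row[column[label]] = 1
        matrix.append(row)
    return classes, matrix


# newsID lists of the first four users in the question
news = [[12, 456, 157], [248], [361, 521, 83], [173, 67, 244]]
classes, matrix = binarize(news)
print(classes)    # [12, 67, 83, 157, 173, 244, 248, 361, 456, 521]
print(matrix[0])  # [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```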

Pharo FileSystem: How do I write a binary file?

TabularResources testExcelSheet from this project gives me a binary representation of an Excel file as a literal byte array.
````
testExcelSheet
^ #[80 75 3 4 20 0 6 0 8 0 0 0 33 0 199 122 151 144 120 1 0 0 32 6 0 0 19 0 8 2 91 67 111 110 116 101 110 116 95 84 121 112 101 115 93 46 120 109 108 32 162 4 2 40 160 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 .....
....0 109 108 80 75 1 2 45 0 20 0 6 0 8 0 0 0 33 0 126 148 213 45 209 1 0 0 250 10 0 0 16 0 0 0 0 0 0 0 0 0 0 0 0 0 233 36 0 0 120 108 47 99 97 108 99 67 104 97 105 110 46 120 109 108 80 75 5 6 0 0 0 0 13 0 13 0 74 3 0 0 232 38 0 0 0 0]
````
Question
How do I write this to disk to find out what kind of file it is?
Answer
(by Esteban, edited)
'./TabularATest1.xlsx' asFileReference writeStreamDo: [ :stream |
stream
binary;
nextPutAll: self testExcelSheet ]
The easiest way to do that is something like this:
'./file.bin' asFileReference writeStreamDo: [ :stream |
stream
binary;
nextPutAll: #[1 2 3 4 5 6 7 8 9 0] ]
So the trick is simply to tell the stream to be binary (by sending it #binary) :)
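The first four bytes of the literal array, 80 75 3 4, are the ASCII characters "PK" followed by the bytes 3 and 4, which is the signature of a ZIP local file header; .xlsx files are ZIP containers, so this is consistent with the array being an Excel workbook. For comparison outside Pharo, a Python sketch of the same idea (write the bytes in binary mode, then inspect the leading signature); `write_bytes` and `looks_like_zip` are hypothetical helpers, and only the first bytes from the question are used because the rest of the array is elided:

```python
import os
import tempfile

ZIP_SIGNATURE = bytes([80, 75, 3, 4])  # b"PK\x03\x04", ZIP local file header


def write_bytes(path, values):
    """Write a sequence of 0-255 integers to path in binary mode."""
    with open(path, "wb") as f:  # "b" plays the role of Pharo's #binary
        f.write(bytes(values))


def looks_like_zip(path):
    """True if the file starts with the ZIP local file header signature."""
    with open(path, "rb") as f:
        return f.read(4) == ZIP_SIGNATURE


# The first bytes of testExcelSheet, as shown in the question
header = [80, 75, 3, 4, 20, 0, 6, 0, 8, 0]
path = os.path.join(tempfile.mkdtemp(), "test.bin")
write_bytes(path, header)
print(looks_like_zip(path))  # True
```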

In a Word macro, delete everything that does not start with one of two strings

I have a data file that contains a lot of extra data. I want to run a Word macro that keeps only 5 lines (I could live with 6 if that makes it easier).
I found out how to delete a row if it contains a string.
I want to keep the paragraphs that start with:
Record write time
Headband impedance
Headband Packets
Headband RSSI
Headband Status
I could also live with keeping:
Headband ID
I tried the following macro, based on a sample I saw here, but I am getting an error.
Sub test()
    '
    ' test Macro
    Dim search1 As String
    search1 = "record"
    Dim search2 As String
    search2 = "headb"
    Dim para As Paragraph
    For Each para In ActiveDocument.Paragraphs
        Dim txt As String
        txt = para.Range.Text
        If Not InStr(LCase(txt), search1) Then
            If Not InStr(LCase(txt), search2) Then
                para.Range.Delete
            End If
    Next
End Sub
The error is: "Next without For".
I know there may be a better way, and I am open to any fix.
Sample data:
ZEO Start data record
----------------
Record write time: 10/14/2014 20:32
Factory reset date: 10/14/2014 20:23
Headband ID: 01/01/1970 18:32
Headband impedance: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 255 241 247 190 165 154 150 156 162 177 223 202
Headband Packets: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 21 4 30 3 3 3 9 4 46 46 1
Headband RSSI: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 14 0 0 6 254 254 250 5 255 4 3 249
Headband Status: 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 169 170 170
Hardware ID: 2
Software ID: 43
Sensor Life Reset date: Not recorded
sleep Stat Reset date: 10/14/2014 20:18
Awakenings: 0
Awakenings Average: 0
Start of night: 10/14/2014 20:28
End of night: 10/14/2014 20:32
Awakenings: 0
Awakenings Average: 0
Time in deep: 0
Time in deep average: 0
There is an End If missing. Add one immediately after the first End If. Do you still get the same error?
Update:
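There is also an error in the If conditions. InStr returns the 1-based position of the first match, or 0 if the string is not found, and Not applied to a number is a bitwise operation (Not 0 = -1, Not 5 = -6), so If Not InStr(...) Then is true for every paragraph. Check the InStr reference for its return values.
You need to use something like If Not InStr(...) = 1 Then in both If statements, so that a paragraph is deleted only when it starts with neither string.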
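With both fixes, the intended rule is: keep a paragraph only if, ignoring case, it starts with "record" or "headb", and delete it otherwise. A Python sketch of that rule (not VBA, and not a replacement for the macro), handy for checking which sample lines survive; the name `keep` is hypothetical and the sample values are shortened from the data above. Note that "Headband ID" is kept too, which the question says is acceptable:

```python
PREFIXES = ("record", "headb")  # the two search strings from the macro


def keep(paragraph):
    """True if the paragraph starts with one of the prefixes, ignoring case.

    Corresponds to InStr(LCase(txt), prefix) = 1 for at least one prefix.
    """
    return paragraph.lower().startswith(PREFIXES)


sample = [
    "ZEO Start data record",
    "Record write time: 10/14/2014 20:32",
    "Factory reset date: 10/14/2014 20:23",
    "Headband ID: 01/01/1970 18:32",
    "Headband Status: 255 255 255",  # values shortened
    "Sensor Life Reset date: Not recorded",
]
print([p.split(":")[0] for p in sample if keep(p)])
# ['Record write time', 'Headband ID', 'Headband Status']
```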

Jasper Report - Subreport only printed the first time

My problem is mainly related to subreports. My configuration is the following:
I have a main report as shown in the image:
Trueness associated subreport:
Each of the last 4 reports has the same structure: a page header and a detail band.
The main report passes the wavelength parameter and all the data sources (with all the info) to its subreports, and the last report has a conditional print expression on its detail band:
$F{wavelength}.intValue()==$P{wavelength}.intValue()
Each data source bean has a wavelength property and the data for each channel (ChX).
When the application runs, it generates 6 TruenessReports for the wavelengths (405, 450, ..., 690), and 48 subreports of each type (absorvance, reference, abs_error, rel_error).
The generated report looks like this (sorry, I cannot generate an actual one right now):
Wavelength: 405
Absorvances
Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch9 Ch10 Ch11 Ch12
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Reference Absorvances
Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch9 Ch10 Ch11 Ch12
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Absorvances Error
Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch9 Ch10 Ch11 Ch12
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Relative Errors
Ch1 Ch2 Ch3 Ch4 Ch5 Ch6 Ch7 Ch8 Ch9 Ch10 Ch11 Ch12
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0
Wavelength: 450
Absorvances
Reference Absorvances
Absorvances Error
Relative Errors
....
Wavelength: 690
Absorvances
Reference Absorvances
Absorvances Error
Relative Errors
So the last 4 subreports are printed only the first time; for the following ones (in my case, the 5 other wavelengths) nothing is printed, even though there is data for each associated wavelength.
Does anyone have any idea?