wikipedia
standard manual
when calculating the SHA-1, we need a sequence of logical functions, f0, f1,…, f79,
I noticed that the function definitions in Wikipedia and the standard manual are different.
oddly, when I chose the ones in the standard manual, the SHA-1 result went wrong.
I used online sha-1 calculators and found that everyone uses the functions written in wikipedia.
Why?
Here are the truth tables for both versions of 'choose' (0..19) and 'majority' (40..59) (for 'parity' 20..39 and 60..79 both sources use xor). Please identify the rows for which the ior result is different from the xor result; those are the cases for which the two formulas produce different results.
x
y
z
x^y
¬x^z
ior
xor
0
0
0
0
0
0
0
0
0
1
0
1
1
1
0
1
0
0
0
0
0
0
1
1
0
1
1
1
1
0
0
0
0
0
0
1
0
1
0
0
0
0
1
1
0
1
0
1
1
1
1
1
1
0
1
1
x
y
z
x^y
x^z
y^z
ior
xor
0
0
0
0
0
0
0
0
0
0
1
0
0
0
0
0
0
1
0
0
0
0
0
0
0
1
1
0
0
1
1
1
1
0
0
0
0
0
0
0
1
0
1
0
1
0
1
1
1
1
0
1
0
0
1
1
1
1
1
1
1
1
1
1
Hint: there are no differences. The results are always the same, and it doesn't matter which formula you use, as long as you do it correctly you get the correct result.
In fact, on checking my copy of 180-4 this is even stated in section 4.1, immediately above the section you quoted:
... Each of the algorithms [for SHA-1, SHA-256 group, and SHA-512 group] include Ch(x, y, z)
and Maj(x, y, z) functions; the exclusive-OR operation (⊕ ) in these functions may be replaced
by a bitwise OR operation (∨) and produce identical results.
If something you did 'went wrong', it's because you did something wrong, but nobody here is psychic so we have absolutely no idea at all what you did wrong.
The order in which DMBS execute queries is:
FROM -> JOIN -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY -> LIMIT
I was surprised to find out that this query works well on SQLite:
SELECT *, duration / 60 AS length_in_hours
FROM movies
WHERE length_in_hours > 10;
Why is this query working, while in some other cases it seems to fail?
OK, So I run EXPLAIN to see what's going on here.
.eqp full
PRAGMA vdbe_trace=true;
EXPLAIN
SELECT substr(name, 1, 1) AS initial
FROM names
WHERE initial = 'D';
The query results are:
addr opcode p1 p2 p3 p4 p5 comment
---- ------------- ---- ---- ---- ------------- -- -------------
0 Init 0 11 0 0 Start at 11
1 OpenRead 0 7 0 2 0 root=7 iDb=0; names
2 Rewind 0 10 0 0
3 Column 0 1 2 0 r[2]=names.name
4 Function 6 2 1 substr(3) 0 r[1]=func(r[2..4])
5 Ne 5 9 1 80 if r[1]!=r[5] goto 9
6 Column 0 1 7 0 r[7]=names.name
7 Function 6 7 6 substr(3) 0 r[6]=func(r[7..9])
8 ResultRow 6 1 0 0 output=r[6]
9 Next 0 3 0 1
10 Halt 0 0 0 0
11 Transaction 0 0 10 0 1 usesStmtJournal=0
12 Integer 1 3 0 0 r[3]=1
13 Integer 1 4 0 0 r[4]=1
14 String8 0 5 0 D 0 r[5]='D'
15 Integer 1 8 0 0 r[8]=1
16 Integer 1 9 0 0 r[9]=1
17 Goto 0 1 0 0
In addr 0, the Init opcode sends us to addr 11 which open new Transaction.
Right after that SQLite allocate the integer 1 FOUR TIMES (addr 12-13, 15-16). That's where I started to suspect SQLite may artificially duplicate the expression before the AS into the WHERE clause.
In opcodes 3-5, which happen before the SELECT (opcodes 6-8), we can see that SQLite duplicated our substr into the WHERE clause.
Parameter st(t) starting time of curfew on airport t
/11 540
22 540
33 540
44 540
55 540/;
Table arr(i,t) arrival
11 22 33 44 55
101 0 1 0 0 0
102 0 0 1 0 0
103 1 0 0 0 0
104 0 0 0 1 0
105 1 0 0 0 0
106 0 0 0 0 1
107 1 0 0 0 0;
Table dep(i,t) departure
11 22 33 44 55
101 1 0 0 0 0
102 0 1 0 0 0
103 0 0 1 0 0
104 1 0 0 0 0
105 0 0 0 1 0
106 1 0 0 0 0
107 0 0 0 0 1;
Variables
db(i) new estimated time of departure of flight i
r(i) new estimated time of arrival of flight ??
v(i) equal to 1 if flight i violates the curfew requirement
z total cost;
Binary Variables v(i);
Positive Variables db(i),r(i);
Equations
costt objective
c1(i) constraint
c2(i) constraint
c3(i,k) constraint
c4(i,i2,k) constaint
c5(i,t,tt) constraint
c6(i,t,tt) constraint
c7(i) constraint;
'''c5(i,t,tt)$((dep(i,t)=1 and arr(i,tt)=1))..v(i) =g= ifthen((db(i)>=(st(t)) or (r(i))>=(st(tt))),1,1-M);
c6(i,t,tt)$((dep(i,t)=1 and arr(i,tt)=1))..v(i) =l= ifthen((db(i)<(st(t)) or (r(i))<(st(tt))),1,1-M);'''
I don't understand why I got this error. But I know that these constraints are the cause of the error. I can not think of any other way of expressing this constraint in GAMS. Can you help me?
Constraint in the photo: https://imgur.com/DSrPxPE
You use the $( )-Condition on the equation domain of c5 to reduce the number of single equations.
When GAMS prepares the model for the solver it evaluates the $( )-condition and eliminates all (i,t,tt) combinations which fail the $( ) - condition. The problem is: your $( )-condition includes variables, whose values (in GAMS: 'levels') are determined during the optimization run of the model by a solver.
This means: GAMS cannot use the variables when it prepares the model for the solver, because their values are determined during the model run. That are the "endogenous relational operations".
One of features looks like this:
1 170,169,205,174,173,246,247,249,380,377,383,38...
2 448,104,239,277,276,99,154,155,76,412,139,333,...
3 268,422,419,124,1,17,431,343,341,435,130,331,5...
4 50,53,449,106,279,420,161,74,123,364,231,18,23...
5 170,169,205,174,173,246,247,249,380,377,383,38...
It tells us what categories the example belongs to.
How should I use it while solving classification problem?
I've tried to use dummy variables,
df=df.join(features['cat'].str.get_dummies(',').add_prefix('contains_'))
but we don't know where there are some other categories that were not mentioned in the training set, so, I do not know how to preprocess all the objects.
That's interesting. I didn't know str.get_dummies, but maybe I can help you with the rest.
You basically have two problems:
The set of categories you get later contains categories that were unknown while training the model. You have to get rid of these later.
The set of categories you get later does not contain all categories. You have to make sure, you generate dummies for them as well.
Problem 1: filtering out unknown/unwanted categories
The first problem is easy to solve:
# create a set of all categories, you want to allow
# either definie it as a fixed set, or extract it from your
# column like this (the output of the map is actually irrelevant)
# the result will be in valid_categories
valid_categories= set()
df['categories'].str.split(',').map(valid_categories.update)
# now if you want to normalize your data before you do the
# dummy encoding, you can cleanse the data by
# splitting it, creating an intersection and then joining
# it back again to get a string on which you can work with
# str.get_dummies
df['categories'].str.split(',').map(lambda l: valid_categories.intersection(l)).str.join(',')
Problem 2: generating dummies for all known categories
The second problem can be solved by just adding a dummy row, that
contains all categories e.g. with df.append just before you
call get_dummies and removing it right after get_dummies.
# e.g. you can do it like this
# get a new index value to
# be able to remove the row later
# (this only works if you have
# a numeric index)
dummy_index= df.index.max()+1
# assign the categories
#
df.loc[dummy_index]= {'id':999, 'categories': ','.join(valid_categories)}
# now do the processing steps
# mentioned in the section above
# then create the dummies
# after that remove the dummy line
# again
df.drop(labels=[dummy_index], inplace=True)
Example:
import io
raw= """id categories
1 170,169,205,174,173,246,247
2 448,104,239,277,276,99,154
3 268,422,419,124,1,17,431,343
4 50,53,449,106,279,420,161,74
5 170,169,205,174,173,246,247"""
df= pd.read_fwf(io.StringIO(raw))
valid_categories= set()
df['categories'].str.split(',').map(valid_categories.update)
# remove 154 and 170 for demonstration purposes
valid_categories.remove('170')
valid_categories.remove('154')
df['categories'].str.split(',').map(lambda l: valid_categories.intersection(l)).str.join(',').str.get_dummies(',')
Out[622]:
1 104 106 124 161 169 17 173 174 205 239 246 247 268 276 277 279 343 419 420 422 431 448 449 50 53 74 99
0 0 0 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1
2 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 1 0 0 0 0 0 0
3 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 1 1 0
4 0 0 0 0 0 1 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
You can see, that there are not columns for 154 and 170.
I have a data file which contain more than 2000 lines and 45001 columns.
The first column is actually a "string" which explains the data type.
Start from column #2, up to column #45001, the data is reprsented as
"1"
or
"0"
For example, the pattern of data in a line is
(0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0)
The total number of data is 25. Within this data line, there are 5 sub-groups which are made by only the number "1"s e.g. (11 111 1111 1 111 ). The "0"s in between the subgroups are assumed as "delimiter". The total of all "1"s is = 13.
I would like to calculate the ratio of
(total of all "1"s / total of number of sub-groups made only by "1"s)
That is
(13/5).
I tried with this code for calculating the total of all "1"s ;
awk -F '0' '{print NF}' < inputfile.in
This gives value 13.
But I donn't know how to go further from here to calcuate the ratio that I want.
I don't know how to find the number of sub-groups within each line beacuse the number of occurances of "1"s and "0"s are random.
Wish to get some kind help to sort this problem.
Appreciate any help in advance.
It is not clear to me from the description what the format of the input file is. Assume the input looks like:
$ cat file
0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
To count up the number of ones and the number of groups of ones and take their ratio:
$ awk '{f=0;s1=0;s2=0;for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}; print s1/s2}' file
2.6
Update: Handling all zeros
Suppose one of the lines in the file has all zeros:
$ cat file
0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
For the second line, both sums are zero which would lead to a divide by zero error. We can avoid that by adding an if statement which will print the ratio if one exists or 0/0 is it doesn't:
if (s2>0)print s1/s2; else print s1"/"s2
The complete code is now:
$ awk '{f=0;s1=0;s2=0;for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}; if (s2>0)print s1/s2; else print s1"/"s2}' file
2.6
0/0
How it works
The code uses three variables. f is a flag which is true (1) if we are currently in a group of ones and is false (0) otherwise. s1 is the the number of ones on the line. s2 is the number of groups of ones on the line.
f=0;s1=0;s2=0
At the beginning of each line, we initialize the variables.
for (i=2;i<=NF;i++){s1+=$i;if ($i && !f)s2++;f=$i}
We loop over each field on the line starting with field 2. If the field contains a 1, we increment counter s1. If the field is 1 and is the start of a new group, we increment s2.
if (s2>0)print s1/s2; else print s1"/"s2}
If we encountered at least one one, we print the ratio s1/s2. Otherwise, we print 0/0.
Here is an awk that does what you need:
cat file
data 0 0 0 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0
data 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
data 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
data 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
BMR_10#O24-BMR_6#O13-H13 1 1 1 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1
data 0 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1
awk '{$1="";$0="0 "$0" 0";t=split($0,b,"1")-1;gsub(/ +/,"");n=split($0,a,"[^1]+")-2;print (n?t/n:0)}' t
2.6
0
25
11
5.5
3