I have an LSTM model that I am using to predict the unemployment rate from Federal Reserve filings. It uses GloVe vectors and a vocab2index embedding, and training went as planned. However, when I try to feed a word embedding into the model to test prediction, it keeps throwing various errors.
Here is the model:
import numpy as np

def load_glove_vectors(glove_file=glove_embedding_vectors_text_file):
    """Load the glove word vectors"""
    word_vectors = {}
    with open(glove_file) as f:
        for line in f:
            split = line.split()
            word_vectors[split[0]] = np.array([float(x) for x in split[1:]])
    return word_vectors

def get_emb_matrix(pretrained, word_counts, emb_size=300):
    """Creates embedding matrix from word vectors"""
    vocab_size = len(word_counts) + 2
    vocab_to_idx = {}
    vocab = ["", "UNK"]
    W = np.zeros((vocab_size, emb_size), dtype="float32")
    W[0] = np.zeros(emb_size, dtype='float32')  # adding a vector for padding
    W[1] = np.random.uniform(-0.25, 0.25, emb_size)  # adding a vector for unknown words
    vocab_to_idx["UNK"] = 1
    i = 2
    for word in word_counts:
        # look the word up in the vectors passed in, not in a global
        if word in pretrained:
            W[i] = pretrained[word]
        else:
            W[i] = np.random.uniform(-0.25, 0.25, emb_size)
        vocab_to_idx[word] = i
        vocab.append(word)
        i += 1
    return W, np.array(vocab), vocab_to_idx

word_vecs = load_glove_vectors()
pretrained_weights, vocab, vocab2index = get_emb_matrix(word_vecs, counts)
Unfortunately when I feed this array
[array([ 3, 10, 6287, 6, 113, 271, 3, 6639, 104, 5105, 7525,
104, 7526, 9, 23, 9, 10, 11, 24, 7527, 7528, 104,
11, 24, 7529, 7530, 104, 11, 24, 7531, 7530, 104, 11,
24, 7532, 7530, 104, 11, 24, 7533, 7534, 24, 7535, 7536,
104, 7537, 104, 7538, 7539, 7540, 6643, 7541, 7354, 7542, 7543,
7544, 9, 23, 9, 10, 11, 24, 25, 8, 10, 11,
24, 3, 10, 663, 168, 9, 10, 290, 291, 3, 4909,
198, 10, 1478, 169, 15, 4621, 3, 3244, 3, 59, 1967,
113, 59, 520, 198, 25, 5105, 7545, 7546, 7547, 7546, 7548,
7549, 7550, 1874, 10, 7551, 9, 10, 11, 24, 7552, 6287,
7553, 7554, 7555, 24, 7556, 24, 7557, 7558, 7559, 6, 7560,
323, 169, 10, 7561, 1432, 6, 3134, 3, 7562, 6, 7563,
1862, 7144, 741, 3, 3961, 7564, 7565, 520, 7566, 4833, 7567,
7568, 4901, 7569, 7570, 4901, 7571, 1874, 7572, 12, 13, 7573,
10, 7574, 7575, 59, 7576, 59, 638, 1620, 7577, 271, 6488,
59, 7578, 7579, 7580, 7581, 271, 7582, 7583, 24, 669, 5932,
7584, 9, 113, 271, 3764, 3, 5930, 3, 59, 4901, 7585,
793, 7586, 7587, 6, 1482, 520, 7588, 520, 7589, 3246, 7590,
13, 7591])
into torch.LongTensor() I keep getting the following error:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
Any ideas on how to remedy? I am fairly new to AI in general, and I am an economist by trade so I am almost certain I have made a boneheaded error.
In Kotlin, when I build a multiline string like this:
val expected = """
|digraph Test {
|${'\t'}Empty1;
|${'\t'}Empty2;
|}
|""".trimMargin()
I see that the string lacks carriage return characters (ASCII code 13) when I output it via:
println("Expected bytes")
println(expected.toByteArray().contentToString())
Output:
Expected bytes
[100, 105, 103, 114, 97, 112, 104, 32, 84, 101, 115, 116, 32, 123, 10, 9, 69, 109, 112, 116, 121, 49, 59, 10, 9, 69, 109, 112, 116, 121, 50, 59, 10, 125, 10]
When some code I'm trying to unit test builds the same String via a PrintWriter it delineates lines via the lineSeparator property:
/*
* Line separator string. This is the value of the line.separator
* property at the moment that the stream was created.
*/
So I end up with a string which looks the same in output, but is composed of different bytes and thus is not equal:
Actual bytes
[100, 105, 103, 114, 97, 112, 104, 32, 84, 101, 115, 116, 32, 123, 13, 10, 9, 69, 109, 112, 116, 121, 49, 59, 13, 10, 9, 69, 109, 112, 116, 121, 50, 59, 13, 10, 125, 13, 10]
Is there a better way to address this during string declaration than splitting my multiline string into concatenated stringlets which can each be suffixed with char(13)?
Alternately, I'd like to do something like:
val expected = """
|digraph Test {
|${'\t'}Empty1;
|${'\t'}Empty2;
|}
|""".trimMargin().useLineSeparator(System.getProperty("line.separator"))
or .replaceAll() or such.
Does any standard method exist, or should I add my own extension function to String?
This did the trick.
System.lineSeparator()
Kotlin multiline strings are always compiled into string literals which use \n as the line separator. If you need to have the platform-dependent line separator, you can do replace("\n", System.getProperty("line.separator")).
As of Kotlin 1.2, there is no standard library method for this, so you should define your own extension function if you're using this frequently.
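The replacement described above is not specific to Kotlin. As a rough analogue in Python (not Kotlin; the helper name `with_line_separator` is made up for illustration), this sketch converts text that only uses `\n` into text that uses a chosen separator. With `\r\n` it reproduces the "Actual bytes" dump from the question.

```python
import os


def with_line_separator(text, separator=os.linesep):
    # Hypothetical helper: assumes `text` contains only LF line breaks,
    # which is what a Kotlin raw string compiles to.
    return text.replace("\n", separator)


expected = "digraph Test {\n\tEmpty1;\n\tEmpty2;\n}\n"

# Force CRLF explicitly so the demonstration does not depend on the platform.
windows_style = with_line_separator(expected, "\r\n")
print(list(windows_style.encode("ascii")))
```

Passing the separator as a parameter keeps a test deterministic on any operating system; the default falls back to the platform's own separator.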
For the following function:
func CycleClock(c *ballclock.Clock) int {
	for i := 0; i < fiveMinutesPerDay; i++ {
		c.TickFive()
	}
	return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}
Here c.BallQueue is a []int and CalculateBallCycle has the signature func CalculateBallCycle(s []int) int. I am seeing a huge performance drop between the for loop and the return statement.
I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:
func BenchmarkCycleClock(b *testing.B) {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		j := i
		b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				c, _ := ballclock.NewClock(j)
				CycleClock(c)
			}
		})
	}
}

func BenchmarkCycle24(b *testing.B) {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		j := i
		b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				c, _ := ballclock.NewClock(j)
				for k := 0; k < fiveMinutesPerDay; k++ {
					c.TickFive()
				}
			}
		})
	}
}

func BenchmarkCalculateBallCycle123(b *testing.B) {
	m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
	for n := 0; n < b.N; n++ {
		CalculateBallCycle(m)
	}
}
Using 123 balls, this gives the following result:
BenchmarkCycleClock/BallCount=123-8 200 9254136 ns/op
BenchmarkCycle24/BallCount=123-8 200000 7610 ns/op
BenchmarkCalculateBallCycle123-8 3000000 456 ns/op
Looking at this, there is a huge disparity between the benchmarks. I would expect the first benchmark to take roughly 8000 ns/op, since that would be the sum of the parts.
Here is the github repository.
EDIT:
I discovered that the result from the benchmark and the result from the running program are wildly different. I took what @yazgazan found and modified the benchmark function in main.go to roughly mimic BenchmarkCalculateBallCycle123 from main_test.go:
func Benchmark() {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		if i != 123 {
			continue
		}
		start := time.Now()
		// The result is discarded; only the elapsed time matters here.
		CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})
		duration := time.Since(start)
		fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
	}
}
This gave the output of:
Ballclock with 123 balls took 11.86748ms;
As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program takes around 11867480 ns/op?
You wrote that CalculateBallCycle() modifies the slice by design.
I can't speak to the correctness of that approach, but it is why the benchmark time of BenchmarkCalculateBallCycle123 is so different.
On the first run it does the expected thing, but on subsequent runs it does something completely different, because you're passing different data as input.
Benchmark this modified code:
func BenchmarkCalculateBallCycle123v2(b *testing.B) {
	m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
	for n := 0; n < b.N; n++ {
		tmp := append([]int{}, m...)
		CalculateBallCycle(tmp)
	}
}
This works around the behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.
The running time becomes more like the others:
BenchmarkCalculateBallCycle123-8 3000000 500 ns/op
BenchmarkCalculateBallCycle123v2-8 100 10483347 ns/op
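The effect is easy to reproduce outside Go. As an illustration only, here is a Python sketch in which `drain_and_count` is a hypothetical stand-in for any function that consumes its input in place. It is not the real CalculateBallCycle, whose body is not shown here. Calling it repeatedly on the same list does real work only the first time, while calling it on a fresh copy does the full work every time.

```python
def drain_and_count(queue):
    """Hypothetical stand-in for a function that consumes its input in place.

    Pops every element off the list and returns how many pops it performed.
    """
    steps = 0
    while queue:
        queue.pop()
        steps += 1
    return steps


balls = list(range(123))

# Reusing the same list: only the first call sees the original data.
shared_calls = [drain_and_count(balls) for _ in range(3)]

# Copying first: every call sees the original data and does the full work.
original = list(range(123))
copied_calls = [drain_and_count(list(original)) for _ in range(3)]

print(shared_calls)   # [123, 0, 0]
print(copied_calls)   # [123, 123, 123]
```

A benchmark loop that averages over many iterations hides the one expensive first call in the same way, which would be consistent with the gap between the two numbers above.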
In your CycleClock function, you are copying the c.BallQueue slice. You can improve performance significantly by calling CalculateBallCycle(c.BallQueue) instead, assuming CalculateBallCycle doesn't modify the slice.
For example:
func Sum(values []int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum
}

func BenchmarkNoCopy(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Sum(m)
	}
}

func BenchmarkWithCopy(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Sum(append([]int{}, m...))
	}
}
// BenchmarkNoCopy-4 20000000 73.5 ns/op
// BenchmarkWithCopy-4 5000000 306 ns/op
// PASS
There is a subtle bug in your tests.
Both BenchmarkCycleClock and BenchmarkCycle24 run the benchmark in a for loop, passing a closure to b.Run. Inside those closures you initialize the clocks using the loop variable i, like this: ballclock.NewClock(i).
The problem is that all instances of your anonymous function share the same variable. By the time the function is run by the test runner, the loop will have finished, and all of the clocks will be initialized using the same value: ballclock.MaxBalls.
You can fix this using a local variable:
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
	i := i
	b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			c, _ := ballclock.NewClock(i)
			CycleClock(c)
		}
	})
}
The line i := i stores a copy of the current value of i (different for each instance of your anonymous function).
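The capture pitfall described above, where closures that are called after the loop has finished all observe the final value of the shared variable, can be reproduced in Python as an analogy. This is not Go, and the function names are invented for the sketch. The default-argument trick plays the role of the `i := i` copy.

```python
def make_callbacks_shared():
    # Every lambda closes over the same variable i, which is 2 once the loop ends.
    return [lambda: i for i in range(3)]


def make_callbacks_copied():
    # The default argument is evaluated on each iteration,
    # so each lambda keeps its own copy of the value.
    return [lambda i=i: i for i in range(3)]


print([f() for f in make_callbacks_shared()])   # [2, 2, 2]
print([f() for f in make_callbacks_copied()])   # [0, 1, 2]
```

The difference only shows up when the closures run after the loop has advanced; a closure that is invoked immediately inside the iteration sees the current value either way.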
I need help extracting all the lines from a file that have the minimum number in the last column, i.e. 7 in this case.
The sample file is as below:
File-1.txt
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Here, I want to extract all the lines that have 7, which is the least value (minimum value) in the last column and save the output into another file File-2.txt by only extracting the values enclosed in [], as shown below.
File-2.txt
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
I could use awk to get the least value as "7" from the last column using the code as below:
awk 'BEGIN{getline;min=max=$NF}
NF{
max=(max>$NF)?max:$NF
min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt
and to print only the values in square brackets [] by using the awk code below:
awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt
but, I am stuck in assigning the least value obtained from first awk script, i.e. 7 in this case to extract the corresponding numbers enclosed in [], as shown in File-2.txt.
Any help in resolving this problem will be appreciated.
@Asha, try:
awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file
Will add description shortly too.
EDIT: Here is the explanation of the command.
awk '{
  Q=$NF;                      ##### Store the last field of the current line in Q.
  gsub(/.*\[|\]/,"");         ##### Remove everything up to and including [ and also remove ], since the required output has neither.
  $NF="";                     ##### Blank the last field, as it is not wanted in the output.
  A[Q]=A[Q]?A[Q] ORS $0:$0;   ##### Append the current line to A[Q], the newline-separated list of lines whose last field was Q.
  MIN=MIN<Q?(MIN?MIN:Q):Q}    ##### Keep the smallest last-field value seen so far in MIN.
END{print A[MIN]}             ##### In the END block, print all the lines stored under the index MIN.
' Input_file                  ##### Mentioning the Input_file here.
Alternatively, read the same file twice instead of using an array. This is slightly slower in practice, since the file is read two times, but it has no memory overhead.
awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
$NF==min{ print $2 }' file file
Explanation
awk -F'[][]' 'FNR==NR{ # This block we read file
# and will find whats minimum
if(min > $NF || min==""){
min=$NF # NF gives no of fields, assign the value of $NF to variable min
}
next
}
$NF==min{ # Here we read file 2nd time, if last field value is equal to minimum
print $2
}' file file
Input
$ cat file
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Output
$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
$NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
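The same "find the minimum first, then filter" logic can be sketched in Python for comparison. This is an illustrative sketch, not a replacement for the awk one-liner. The function name `shortest_paths` and the regular expression are assumptions about the line format shown in the question. Unlike the two-pass awk command, it keeps the parsed lines in memory.

```python
import re

# Matches "... [a, b, c] N" and captures the bracket contents and the trailing number.
LINE_PATTERN = re.compile(r"\[([^\]]*)\]\s*(\d+)\s*$")


def shortest_paths(lines):
    """Return the bracket contents of the lines whose last column is minimal."""
    parsed = []
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:  # skip blank or malformed lines
            parsed.append((match.group(1), int(match.group(2))))
    if not parsed:
        return []
    smallest = min(length for _, length in parsed)
    return [path for path, length in parsed if length == smallest]


sample = [
    "VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7",
    "VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8",
    "VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7",
]
for path in shortest_paths(sample):
    print(path)
```

The input does not need to be sorted, and the original line order is preserved in the output.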
Using sort as a helper gives neater code:
$ sort -t\] -nk 2 your_file |awk '$NF!=L && L{exit}{L=$NF;print $2}' FS='[][]'
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
Read the input once (e.g. for streamed or piped data) with minimal memory use:
awk -F'[][]' '
# init counter
NR == 1 { m = $3 + 1 }
# add or replace content into the buffer if counter is lower or equal
$3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }
# at the end, print buffer
END { print b }
' YourFile
$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'
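-F'[][]' sets FS to the regexp [][], which means "either [ or ]", so each input line is split into 3 fields.
-vmin=99999 sets the variable min to 99999. This variable stores the minimum value of the last field.
$NF <= min {min = $NF; print $2} if the current last field is less than or equal to the value stored in min,
then update min and output what we need. Because lines are printed as they are read, this relies on the input already being sorted by the last column in ascending order, as it is in the sample.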
I am trying to write a Swift implementation of the following Objective-C (header file) code.
#include <stddef.h>
#ifndef VO_CERTIFICATE_TYPE
#define VO_CERTIFICATE_TYPE
typedef struct _voCertificate
{
const char* bytes;
size_t length;
}
voCertificate;
#endif
static const char myCertificate_BYTES[] =
{
103, 92, -99, 33, 72, 48, 119, -72,
-77, 75, -88, 81, 113, -46, -119, -119,
5, 42, -33, 94, 23, 3, -112, 34,
-63, 75, -77, 26, -41, -69, 50, 71,
19, 121, 109, -60, 40, 18, 46, -86,
..........
};
voCertificate const myCertificate =
{
myCertificate_BYTES,
sizeof(myCertificate_BYTES)
};
//////////////////////////////////////
NSData *certificate = [NSData dataWithBytes:myCertificate.bytes length:myCertificate.length];
My best assumption was:
let myCertificate = [
103, 92, -99, 33, 72, 48, 119, -72,
-77, 75, -88, 81, 113, -46, -119, -119,
5, 42, -33, 94, 23, 3, -112, 34,
-63, 75, -77, 26, -41, -69, 50, 71,
19, 121, 109, -60, 40, 18, 46, -86,
........................]
var certificate = NSData(bytes: myCertificate as [Byte], length: myCertificate.count)
I also tried to reach the ObjC variable through the Bridging-Header, but I got an "Undefined symbols for architecture armv7" error.
I would really appreciate any help.
Your biggest problem is that the element type of your myCertificate array is Int, not Int8. Here is something that works for me. Note that I reconstructed the array from the NSData object to check that everything came out OK.
let myCertificate = Array<Int8>(arrayLiteral:
103, 92, -99, 33, 72, 48, 119, -72,
-77, 75, -88, 81, 113, -46, -119, -119,
5, 42, -33, 94, 23, 3, -112, 34,
-63, 75, -77, 26, -41, -69, 50, 71,
19, 121, 109, -60, 40, 18, 46, -86)
var certificate = NSData(bytes: myCertificate, length: myCertificate.count)
var buffer = [Int8](count: certificate.length, repeatedValue: 0)
certificate.getBytes(&buffer, length: certificate.length)
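To see why the element width matters, here is a small Python sketch (an analogy, not Swift) that uses the standard `struct` module. It packs the first few certificate values once as one-byte signed integers and once as eight-byte integers. The eight-byte width is an assumption that models a 64-bit Int; on a 32-bit target it would be four bytes, with the same effect.

```python
import struct

values = [103, 92, -99, 33, 72]

# One byte per value, like a C `const char[]` or a Swift [Int8].
as_int8 = struct.pack(f"{len(values)}b", *values)

# Eight bytes per value (little-endian), assumed here to model a 64-bit Int.
as_int64 = struct.pack(f"<{len(values)}q", *values)

print(len(as_int8), len(as_int64))          # 5 40
# Copying only len(values) bytes from the wide buffer does not yield the intended bytes.
print(list(as_int64[:len(values)]))         # [103, 0, 0, 0, 0]
print(list(struct.unpack("5b", as_int8)))   # [103, 92, -99, 33, 72]
```

With the wide element type, a length equal to the element count covers only the start of the buffer, so the bytes that end up in the data object are mostly padding zeros from the first value.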