[51, -42, 119, -85, -64, 126, 22, 127, -72, 72, 48, -66, -18, 45, 99, -119]
This is the ByteArray that I want to print as a String.
When I searched the internet, I found that
String(Bytes, Charsets.UTF_8)
would convert it to a String.
However, I get �؉���Q�t, which does not look like a correct conversion.
Why is that?
I want the String to consist of alphabet characters and numbers.
Firstly, you are specifying an array of signed bytes (indicated by negative numbers):
51, -42, 119, -85, -64, 126, 22, 127, -72, 72, 48, -66, -18, 45, 99, -119
Let's take a look at what these values look like as unsigned bytes (I used a conversion tool for this):
51, 214, 119, 171, 192, 126, 22, 127, 184, 72, 48, 190, 238, 45, 99, 137
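The signed-to-unsigned mapping can also be reproduced in a few lines instead of with an external tool. The sketch below is in Python rather than Kotlin, purely for illustration; the helper name to_unsigned is made up for this example.

```python
# Signed byte values from the question (JVM bytes range from -128 to 127).
signed = [51, -42, 119, -85, -64, 126, 22, 127, -72, 72, 48, -66, -18, 45, 99, -119]


def to_unsigned(values):
    """Map each signed byte to its unsigned value (0..255) by keeping the low 8 bits."""
    return [v & 0xFF for v in values]


unsigned = to_unsigned(signed)
print(unsigned)
```

Non-negative values are unchanged, while each negative value has 256 added to it (for example, -42 becomes 214).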
Assuming that by "Alphabet characters and numbers" you mean the English alphabet, an ASCII table will help you identify each character's decimal value, but as a rough guide:
"0"-"9" = 48-57
"A"-"Z" = 65-90
"a"-"z" = 97-122
Consider the following code sample:
fun main() {
    val bytes = byteArrayOf(51, -42, 119, -85, -64, 126, 22, 127, -72, 72, 48, -66, -18, 45, 99, -119)
    val string = bytes.toString(Charsets.US_ASCII)
    println(string)
}
As you can see, some of the values in the unsigned array fall outside the ranges for English letters and digits, which is why you end up with a string like "3�w��~�H0��-c�", depending on the charset you choose.
For reference:
| Charset | Result |
| --- | --- |
| Charsets.US_ASCII | 3�w��~�H0��-c� |
| Charsets.UTF_8 | 3�w��~�H0��-c� |
| Charsets.UTF_16 | ㏖瞫쁾ᙿ롈ゾ掉 |
| Charsets.UTF_32 | ���� |
| Charsets.ISO_8859_1 | 3Öw«À~¸H0¾î-c |
So, it really depends on exactly which encoding the array is using, and exactly what it is you're expecting the resulting string to be.
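If the 16 bytes are not encoded text at all but raw binary data (for example a digest or an identifier; this is an assumption, since the question does not say where the bytes come from), a string of letters and digits is usually obtained by hex- or Base64-encoding the bytes rather than by decoding them with a charset. A minimal sketch of that idea, in Python for illustration:

```python
import base64

# Signed byte values from the question.
signed = [51, -42, 119, -85, -64, 126, 22, 127, -72, 72, 48, -66, -18, 45, 99, -119]

# Build the raw byte string from the unsigned values.
raw = bytes(v & 0xFF for v in signed)

# Two common text representations of arbitrary binary data.
hex_text = raw.hex()                               # two hex digits per byte
b64_text = base64.b64encode(raw).decode("ascii")   # Base64 alphabet

print(hex_text)  # 33d677abc07e167fb84830beee2d6389
print(b64_text)
```

In Kotlin, something like bytes.joinToString("") { "%02x".format(it) } should give the same hex string, but whether hex is what the question is after remains a guess.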
In my code, I can filter a column by exact text, and that works without problems. However, I also need to filter another column by the beginning of its text.
The phrases in this column are:
A_2020.092222
A_2020.090787
B_2020.983898
B_2020.209308
So I need to get every row whose value starts with A_20 or B_20.
Thanks in advance
My code:
from bs4 import BeautifulSoup
import pandas as pd
import zipfile, urllib.request, shutil, time, csv, datetime, os, sys, os.path
#location
dt = datetime.datetime.now()
file_csv = "/home/Downloads/source.CSV"
file_csv_new = "/var/www/html/Data/Test.csv"
#open CSV
with open(file_csv, 'r', encoding='CP1251') as file:
    reader = csv.reader(file, delimiter=';')
    data = list(reader)
#list to dataframe
df = pd.DataFrame(data)
#filter UF
df = df.loc[df[9].isin(['PR','SC','RS'])]
#filter key
# A_ & B_
df = df.loc[df[35].isin(['A_20','B_20'])]
#print (df)
#Empty DataFrame
#Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
#Index: []
#[0 rows x 119 columns]
Give the following a try:
import pandas as pd

lst1 = ['A_2020.092222', 'A_2020.090787 ', 'B_2020.983898', 'B_2020.209308', 'C_2020.209308', 'D_2020.209308']
df = pd.DataFrame(lst1, columns=['Name'])
# str.startswith accepts a tuple of prefixes and matches any of them
df.loc[df.Name.str.startswith(('A_20', 'B_20'))]
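To see why isin returned an empty DataFrame while startswith works, here is a small standard-library sketch of the difference between exact matching and prefix matching. The sample string and its two-column layout are made up for illustration; column 0 stands in for column 35 of the real file.

```python
import csv
import io

# Hypothetical sample standing in for source.CSV (semicolon-delimited).
sample = "A_2020.092222;PR\nA_2020.090787;SC\nB_2020.983898;RS\nC_2020.209308;PR\n"

rows = list(csv.reader(io.StringIO(sample), delimiter=";"))

# Exact matching, which is what isin does: no cell is literally "A_20" or "B_20".
exact = [row for row in rows if row[0] in ("A_20", "B_20")]

# Prefix matching: str.startswith accepts a tuple of prefixes.
prefixed = [row for row in rows if row[0].startswith(("A_20", "B_20"))]

print(len(exact))
print([row[0] for row in prefixed])
```

Applied to the DataFrame from the question, the analogous filter would presumably be df.loc[df[35].str.startswith(('A_20', 'B_20'))], assuming column 35 contains only strings.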
System.out.print("What would you like to decode? ");
String fileName1 = console.next();
System.out.print("Save the results as: ");
resultFileName = console.next();
int token1;
Scanner inFile1 = new Scanner(new File(fileName1)).useDelimiter("[" + "," + " ]");
List<Integer> temps = new ArrayList<Integer>();
while (inFile1.hasNext()) {
    token1 = inFile1.nextInt();
    temps.add(token1);
}
for (int i = 0; i <= temps.size() - 1; i++) {
    int x = temps.get(i);
    System.out.print((char) x);
}
When I run this, it reports an error on the line "token1 = inFile1.nextInt();".
this is what is in the file: [89, 111, 117, 39, 114, 101, 32, 97, 108, 109, 111, 115, 116, 32, 116, 104, 101, 114, 101, 46, 32, 75, 101, 101, 112, 32, 117, 112, 32, 116, 104, 101, 32, 103, 111, 111, 100, 32, 119, 111, 114, 107, 33]
Use inFile1.hasNextInt() instead of inFile1.hasNext(). That way you can make sure the next token can be interpreted as an int.
I think inFile1.hasNext() returns true because you still have the character ] left in the file.
useDelimiter is only used to separate your ints. So you have to parse the [ and ] manually and call Scanner inFile1 = new Scanner(new File(fileName1)).useDelimiter(","); for parsing the ints.
Edit: for parsing the '[', for example, you can use the following (the [ must be escaped, because it is a regex metacharacter and an unescaped "[" does not compile):
Pattern p = Pattern.compile("\\[");
scanner.next(p);
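The underlying task (skip the brackets, commas, and spaces, then turn each number into a character) can be sketched independently of Scanner. The version below is in Python for brevity, not Java; decode_char_codes is a name invented for this example, and it pulls out the digit runs directly instead of configuring a delimiter.

```python
import re


def decode_char_codes(text):
    """Extract every run of digits from the text and map each code to its character."""
    codes = [int(token) for token in re.findall(r"\d+", text)]
    return "".join(chr(code) for code in codes)


# File content as shown in the question.
content = (
    "[89, 111, 117, 39, 114, 101, 32, 97, 108, 109, 111, 115, 116, 32, "
    "116, 104, 101, 114, 101, 46, 32, 75, 101, 101, 112, 32, 117, 112, 32, "
    "116, 104, 101, 32, 103, 111, 111, 100, 32, 119, 111, 114, 107, 33]"
)

print(decode_char_codes(content))  # You're almost there. Keep up the good work!
```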
For the following function:
func CycleClock(c *ballclock.Clock) int {
	for i := 0; i < fiveMinutesPerDay; i++ {
		c.TickFive()
	}
	return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}
where c.BallQueue is defined as []int and CalculateBallCycle is defined as func CalculateBallCycle(s []int) int, I am seeing a huge difference in running time between the for loop and the return statement.
I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:
func BenchmarkCycleClock(b *testing.B) {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		j := i
		b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				c, _ := ballclock.NewClock(j)
				CycleClock(c)
			}
		})
	}
}
func BenchmarkCycle24(b *testing.B) {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		j := i
		b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				c, _ := ballclock.NewClock(j)
				for k := 0; k < fiveMinutesPerDay; k++ {
					c.TickFive()
				}
			}
		})
	}
}
func BenchmarkCalculateBallCycle123(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
	for n := 0; n < b.N; n++ {
		CalculateBallCycle(m)
	}
}
Using 123 balls, this gives the following result:
BenchmarkCycleClock/BallCount=123-8 200 9254136 ns/op
BenchmarkCycle24/BallCount=123-8 200000 7610 ns/op
BenchmarkCalculateBallCycle123-8 3000000 456 ns/op
Looking at this, there is a huge disparity between the benchmarks. I would expect the first benchmark to take roughly 8000 ns/op, since that would be the sum of the parts.
Here is the github repository.
EDIT:
I discovered that the result from the benchmark and the result from the running program are wildly different. I took what @yazgazan found and modified the benchmark function in main.go to somewhat mimic BenchmarkCalculateBallCycle123 from main_test.go:
func Benchmark() {
	for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
		if i != 123 {
			continue
		}
		start := time.Now()
t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})
		duration := time.Since(start)
		fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
	}
}
This gave the output of:
Ballclock with 123 balls took 11.86748ms;
As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to run in 456 ns/op while the running program takes around 11867480 ns/op?
You wrote that CalculateBallCycle() modifies the slice by design.
I can't speak to the correctness of that approach, but it is why the benchmark time of BenchmarkCalculateBallCycle123 is so different.
On the first run it does the expected thing, but on subsequent runs it does something completely different, because you're passing different data as input.
Benchmark this modified code:
func BenchmarkCalculateBallCycle123v2(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
	for n := 0; n < b.N; n++ {
		tmp := append([]int{}, m...)
		CalculateBallCycle(tmp)
	}
}
This works around the behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.
The running time becomes more like the others:
BenchmarkCalculateBallCycle123-8 3000000 500 ns/op
BenchmarkCalculateBallCycle123v2-8 100 10483347 ns/op
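The effect can be reproduced in miniature. The sketch below is in Python rather than Go, and rounds_until_sorted is a hypothetical stand-in for CalculateBallCycle: it shares only the relevant property, namely that it modifies its argument in place, so a second call on the same data does far less work than the first.

```python
def rounds_until_sorted(values):
    """Bubble-sort the list IN PLACE and return the number of passes needed.

    Hypothetical stand-in for a function that mutates its input.
    """
    passes = 0
    swapped = True
    while swapped:
        swapped = False
        for i in range(len(values) - 1):
            if values[i] > values[i + 1]:
                values[i], values[i + 1] = values[i + 1], values[i]
                swapped = True
        passes += 1
    return passes


data = [3, 2, 1]

# Working on a copy: every call sees the original input and does the full work.
with_copy = [rounds_until_sorted(list(data)) for _ in range(3)]

# Reusing the same list: only the first call does the full work.
without_copy = [rounds_until_sorted(data) for _ in range(3)]

print(with_copy)     # [3, 3, 3]
print(without_copy)  # [3, 1, 1]
```

A benchmark loop that reuses the same input therefore mostly measures the cheap later calls, which matches the gap between the two benchmark results above.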
In your CycleClock function, you are copying the c.BallQueue slice. You can improve performance significantly by using CalculateBallCycle(c.BallQueue) instead (assuming CalculateBallCycle doesn't modify the slice)
For example:
func Sum(values []int) int {
	sum := 0
	for _, v := range values {
		sum += v
	}
	return sum
}
func BenchmarkNoCopy(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Sum(m)
	}
}
func BenchmarkWithCopy(b *testing.B) {
	for n := 0; n < b.N; n++ {
		Sum(append([]int{}, m...))
	}
}
// BenchmarkNoCopy-4 20000000 73.5 ns/op
// BenchmarkWithCopy-4 5000000 306 ns/op
// PASS
There is a subtle bug in your tests.
Both BenchmarkCycleClock and BenchmarkCycle24 run the benchmark in a for loop, passing a closure to b.Run. Inside those closures you initialize the clocks using the loop variable i, like this: ballclock.NewClock(i).
The problem is that all instances of your anonymous function share the same variable. By the time the function is run by the test runner, the loop will have finished, and all of the clocks will be initialized using the same value: ballclock.MaxBalls.
You can fix this using a local variable:
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
	i := i
	b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
		for n := 0; n < b.N; n++ {
			c, _ := ballclock.NewClock(i)
			CycleClock(c)
		}
	})
}
The line i := i stores a copy of the current value of i (different for each instance of your anonymous function).
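The general pitfall of closures sharing one loop variable is not specific to Go. As an illustration only, here is the same pattern in Python, where closures also look the variable up when they are called rather than when they are created; the function names are made up for this sketch. (Since Go 1.22, each loop iteration gets its own copy of the loop variable, so the i := i line is only needed with older toolchains.)

```python
def make_callbacks_shared():
    """All closures refer to the same variable i and see its final value."""
    callbacks = []
    for i in range(3):
        callbacks.append(lambda: i)
    return callbacks


def make_callbacks_copied():
    """A default argument stores the current value of i for each closure."""
    callbacks = []
    for i in range(3):
        callbacks.append(lambda i=i: i)
    return callbacks


shared = [f() for f in make_callbacks_shared()]
copied = [f() for f in make_callbacks_copied()]

print(shared)  # [2, 2, 2]
print(copied)  # [0, 1, 2]
```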
I need help with extracting all the lines from a file that have the minimum number in the last column, i.e. 7 in this case.
The sample file is as below:
File-1.txt
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Here, I want to extract all the lines that have 7, which is the least value (minimum value) in the last column and save the output into another file File-2.txt by only extracting the values enclosed in [], as shown below.
File-2.txt
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
I was able to use awk to get the least value, "7", from the last column with the code below:
awk 'BEGIN{getline;min=max=$NF}
NF{
max=(max>$NF)?max:$NF
min=(min>$NF)?$NF:min
}
END{print min,max}' File-1.txt
and to print only the values in square brackets [] by using the awk code below:
awk 'NR > 1 {print $1}' RS='[' FS=']' File-1.txt
However, I am stuck on using the least value obtained from the first awk script (7 in this case) to extract the corresponding numbers enclosed in [], as shown in File-2.txt.
Any help in resolving this problem will be appreciated.
@Asha, try:
awk '{Q=$NF;gsub(/.*\[|\]/,"");$NF="";A[Q]=A[Q]?A[Q] ORS $0:$0;MIN=MIN<Q?(MIN?MIN:Q):Q} END{print A[MIN]}' Input_file
EDIT: Here is a description of the code:
awk '{
Q=$NF;                      ##### Store the last field of the current line in variable Q.
gsub(/.*\[|\]/,"");         ##### Use global substitution to remove everything up to and including [ and to remove ] from the line, as required for the output.
$NF="";                     ##### Empty the last field of the line, since it is not needed in the output.
A[Q]=A[Q]?A[Q] ORS $0:$0;   ##### Append the current line to array A under index Q, separating the entries with ORS (a newline by default).
MIN=MIN<Q?(MIN?MIN:Q):Q}    ##### Keep the smallest last-field value seen so far in variable MIN.
END{print A[MIN]}           ##### In the END block, print the entry of A stored under index MIN, i.e. all lines that had the minimum value.
' Input_file                ##### Mentioning the Input_file here.
Reading the same file twice instead of using an array. In practice this is a bit slower, since the file is read two times, but it has no memory overhead.
awk -F'[][]' 'FNR==NR{if(min > $NF || min==""){ min=$NF} next }
$NF==min{ print $2 }' file file
Explanation
awk -F'[][]' 'FNR==NR{            # First pass over the file:
                                  # find the minimum of the last field
                 if(min > $NF || min==""){
                     min=$NF      # NF is the number of fields, so $NF is the last field; store it in min
                 }
                 next
              }
              $NF==min{           # Second pass: if the last field equals the minimum
                 print $2         # print the part between [ and ]
              }' file file
Input
$ cat file
VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 81, 112, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 109, 23, 125, 111] 7
VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 111] 8
VALID_PATH : [102, 80, 112, 109, 23, 125, 110, 111] 8
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 127, 88, 112, 109, 23, 125, 111] 9
VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 112, 37, 56, 23, 125, 110, 111] 9
VALID_PATH : [102, 80, 127, 6, 112, 37, 109, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 37, 56, 23, 125, 111] 10
VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 110, 111] 10
Output
$ awk -F'[][]' 'FNR==NR{ if(min > $NF || min==""){ min=$NF } next }
$NF==min{ print $2 }' file file
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
Using sort as a helper to keep the code neat:
$ sort -t\] -nk 2 your_file |awk '$NF!=L && L{exit}{L=$NF;print $2}' FS='[][]'
102, 112, 37, 109, 23, 125, 111
102, 112, 37, 56, 23, 125, 111
102, 80, 112, 109, 23, 125, 111
102, 81, 112, 109, 23, 125, 111
Read once (e.g. for streamed or piped input) with minimal memory use:
awk -F'[][]' '
# init counter
NR == 1 { m = $3 + 1 }
# add or replace content into the buffer if counter is lower or equal
$3 <= m { b = ( $3 == m ? b "\n" : "" ) $2; m = $3 }
# at the end, print buffer
END { print b }
' YourFile
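The single-pass idea above (keep a buffer, and reset it whenever a smaller value appears) can also be sketched in Python for readers less familiar with awk. The function name min_paths and the regular expression are choices made for this example; it assumes every relevant line ends with a bracketed list followed by an integer.

```python
import re

# Captures the text between [ and ] and the trailing integer.
LINE_PATTERN = re.compile(r"\[(.*?)\]\s*(-?\d+)\s*$")


def min_paths(lines):
    """Return the bracket contents of all lines whose last column is minimal."""
    best = None
    buffer = []
    for line in lines:
        match = LINE_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not have the expected shape
        value = int(match.group(2))
        if best is None or value < best:
            best = value
            buffer = [match.group(1)]  # smaller value found: start a new buffer
        elif value == best:
            buffer.append(match.group(1))
    return buffer


sample = [
    "VALID_PATH : [102, 80, 112, 37, 109, 23, 125, 111] 8",
    "VALID_PATH : [102, 80, 112, 109, 23, 125, 111] 7",
    "VALID_PATH : [102, 112, 37, 56, 23, 125, 111] 7",
    "VALID_PATH : [102, 80, 127, 6, 112, 109, 23, 125, 111] 9",
]
print("\n".join(min_paths(sample)))
```

Unlike approaches that print while reading, this does not depend on the input being sorted by the last column.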
$ awk -F'[][]' -vmin=99999 '$NF<=min{min=$NF;print $2}'
-F'[][]' sets FS to the regexp [][], which means "either [ or ]", so each input line is split into 3 fields.
-vmin=99999 sets the variable min to 99999. This variable stores the minimum value of the last field.
$NF <= min {min = $NF; print $2} if the current last field is less than or equal to the value stored in min,
then update min and output what we need.
Note that this prints lines as they are read, so it only gives the right result when the lines with the minimum value come first, as they do in the sample input.