I am trying to insert a row of 144 columns into my SQL database.
The problem is that the last 10 values of the new row are NULL, although they are supposed to be floats and ints.
This is an example of what my table contains after the INSERT INTO:
First column | Before last column | Last column
1            | NULL               | NULL
This is the SQL statement I am using:
INSERT INTO "historic_data2"
VALUES (28438, 163, 156, 1, 'FIST 2', 91, 81, 82, 84, 90, 6, '2 Pts Int M', 'Offensive', 0, '91_81_82_84_90', 86, 85, 0, 36, 62, 24, 0, 132, 86, 0, 83, 0, 0, 0, 0, 42, 77, 24, 0, 173, 107, 0, 204, 0, 0, 0, 0, 42, 77, 24, 0, 173, 107, 0, 204, 0, 0, 0, 81, 62, 34, 23, 19, 45, 32, 18, 9, 19, 0.5555555555555556, 0.5161290322580645, 0.5294117647058824, 0.391304347826087, 1.0, 82, 54, 34, 18, 28, 49, 27, 17, 8, 28, 0.5975609756097561, 0.5, 0.5, 0.4444444444444444, 1.0, 302, 233, 132, 89, 69, 168, 116, 69, 35, 69, 0.5562913907284768, 0.4978540772532189, 0.5227272727272727, 0.39325842696629215, 1.0, 214, 161, 84, 73, 53, 119, 79, 39, 36, 53, 0.5560747663551402, 0.4906832298136646, 0.4642857142857143, 0.4931506849315068, 1.0, 717, 544, 298, 233, 173, 416, 285, 175, 97, 173, 0.5801952580195258, 0.5238970588235294, 0.587248322147651, 0.41630901287553645, 1.0, 466, 315, 183, 128, 151, 357, 233, 138, 91, 151, 0.7660944206008584, 0.7396825396825397, 0.7540983606557377, 0.7109375,1.0,112)
I can't figure out how to solve this issue. My guess is that there is a hard limit on how many columns you can insert at once, but I don't know how to get around it.
Thank you in advance for your help
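A column limit is an unlikely cause here: PostgreSQL documents a maximum of 1600 columns per table. What is worth ruling out is a mismatch between the number of values and the number of columns. In PostgreSQL, an INSERT without a column list that supplies fewer values than the table has columns does not fail; the trailing columns simply get their default, which is NULL unless a default is declared. Below is a minimal Go sketch of a hypothetical helper, `countValues`, that counts the top-level entries in a VALUES tuple. It assumes single-quoted strings (with `''` as an escaped quote) and is a rough diagnostic, not a SQL parser.

```go
package main

import "fmt"

// countValues counts the top-level, comma-separated entries in a SQL VALUES
// tuple such as "(1, 'a,b', 2.5)". Hypothetical diagnostic helper: commas
// inside single-quoted strings or nested parentheses are ignored.
func countValues(tuple string) int {
	depth, commas := 0, 0
	inString, sawValue := false, false
	for i := 0; i < len(tuple); i++ {
		c := tuple[i]
		if inString {
			if c == '\'' {
				if i+1 < len(tuple) && tuple[i+1] == '\'' {
					i++ // '' is an escaped quote inside the string
				} else {
					inString = false
				}
			}
			continue
		}
		switch c {
		case '\'':
			inString = true
			if depth == 1 {
				sawValue = true
			}
		case '(':
			if depth == 1 {
				sawValue = true // a parenthesised expression is a value
			}
			depth++
		case ')':
			depth--
		case ',':
			if depth == 1 {
				commas++
			}
		case ' ', '\t', '\n', '\r':
			// whitespace between values
		default:
			if depth == 1 {
				sawValue = true
			}
		}
	}
	if !sawValue {
		return 0
	}
	return commas + 1
}

func main() {
	fmt.Println(countValues("(28438, 163, 'FIST 2', 0.5, 1.0, 112)")) // prints 6
}
```

Comparing the count with the table's actual column count (in PostgreSQL, for example, `SELECT count(*) FROM information_schema.columns WHERE table_name = 'historic_data2'`) shows whether every column really receives a value.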
In my code, I can filter one column by exact text values, and that works without problems. However, I also need to filter another column by the beginning of its text.
The values in this column look like this:
A_2020.092222
A_2020.090787
B_2020.983898
B_2020.209308
So I need to get every row whose value starts with A_20 or B_20.
Thanks in advance
My code:
from bs4 import BeautifulSoup
import pandas as pd
import zipfile, urllib.request, shutil, time, csv, datetime, os, sys, os.path
#location
dt = datetime.datetime.now()
file_csv = "/home/Downloads/source.CSV"
file_csv_new = "/var/www/html/Data/Test.csv"
#open CSV
with open(file_csv, 'r', encoding='CP1251') as file:
    reader = csv.reader(file, delimiter=';')
    data = list(reader)
#list to dataframe
df = pd.DataFrame(data)
#filter UF
df = df.loc[df[9].isin(['PR','SC','RS'])]
#filter key
# A_ & B_
df = df.loc[df[35].isin(['A_20','B_20'])]
#print (df)
#Empty DataFrame
#Columns: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...]
#Index: []
#[0 rows x 119 columns]
Give the following a try:
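import pandas as pd
lst1 = ['A_2020.092222', 'A_2020.090787', 'B_2020.983898', 'B_2020.209308', 'C_2020.209308', 'D_2020.209308']
df = pd.DataFrame(lst1, columns=['Name'])
# str.startswith accepts a tuple of prefixes; na=False treats missing values as non-matches
df.loc[df.Name.str.startswith(('A_20', 'B_20'), na=False)]
# For the question's DataFrame: df = df.loc[df[35].str.startswith(('A_20', 'B_20'), na=False)]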
For the following function:
func CycleClock(c *ballclock.Clock) int {
for i := 0; i < fiveMinutesPerDay; i++ {
c.TickFive()
}
return 1 + CalculateBallCycle(append([]int{}, c.BallQueue...))
}
Here c.BallQueue is a []int and CalculateBallCycle has the signature func CalculateBallCycle(s []int) int. I see a huge performance drop between the for loop and the return statement.
I wrote the following benchmarks to test. The first benchmarks the entire function, the second benchmarks the for loop, while the third benchmarks the CalculateBallCycle function:
func BenchmarkCycleClock(b *testing.B) {
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
j := i
b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n < b.N; n++ {
c, _ := ballclock.NewClock(j)
CycleClock(c)
}
})
}
}
func BenchmarkCycle24(b *testing.B) {
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
j := i
b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n < b.N; n++ {
c, _ := ballclock.NewClock(j)
for k := 0; k < fiveMinutesPerDay; k++ {
c.TickFive()
}
}
})
}
}
func BenchmarkCalculateBallCycle123(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n < b.N; n++ {
CalculateBallCycle(m)
}
}
Using 123 balls, this gives the following result:
BenchmarkCycleClock/BallCount=123-8 200 9254136 ns/op
BenchmarkCycle24/BallCount=123-8 200000 7610 ns/op
BenchmarkCalculateBallCycle123-8 3000000 456 ns/op
There is a huge disparity between the benchmarks. I would expect the first benchmark to take roughly 8000 ns/op, since that would be the sum of its parts.
Here is the GitHub repository.
EDIT:
I discovered that the results from the benchmark and from the running program differ widely. I took what @yazgazan found and modified the Benchmark function in main.go to roughly mimic BenchmarkCalculateBallCycle123 from main_test.go:
func Benchmark() {
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
if i != 123 {
continue
}
start := time.Now()
t := CalculateBallCycle([]int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16})
duration := time.Since(start)
fmt.Printf("Ballclock with %v balls took %s;\n", i, duration)
}
}
This gave the output of:
Ballclock with 123 balls took 11.86748ms;
As you can see, the total time was 11.86 ms, all of which was spent in the CalculateBallCycle function. What would cause the benchmark to report 456 ns/op while the running program takes around 11867480 ns (about 11.87 ms) per call?
You wrote that CalculateBallCycle() modifies the slice by design.
I can't speak to the correctness of that approach, but it is why the benchmark time of BenchmarkCalculateBallCycle123 is so different.
On the first run it does the expected thing, but on subsequent runs it does something completely different, because it receives different input data.
Benchmark this modified code:
func BenchmarkCalculateBallCycle123v2(b *testing.B) {
m := []int{8, 62, 42, 87, 108, 35, 17, 6, 22, 75, 116, 112, 39, 119, 52, 60, 30, 88, 56, 36, 38, 26, 51, 31, 55, 120, 33, 99, 111, 24, 45, 21, 23, 34, 43, 41, 67, 65, 66, 85, 82, 89, 9, 25, 109, 47, 40, 0, 83, 46, 73, 13, 12, 63, 15, 90, 121, 2, 69, 53, 28, 72, 97, 3, 4, 94, 106, 61, 96, 18, 80, 74, 44, 84, 107, 98, 93, 103, 5, 91, 32, 76, 20, 68, 81, 95, 29, 27, 86, 104, 7, 64, 113, 78, 105, 58, 118, 117, 50, 70, 10, 101, 110, 19, 1, 115, 102, 71, 79, 57, 77, 122, 48, 114, 54, 37, 59, 49, 100, 11, 14, 92, 16}
for n := 0; n < b.N; n++ {
tmp := append([]int{}, m...)
CalculateBallCycle(tmp)
}
}
This works around that behavior by making a copy of m, so that CalculateBallCycle modifies a local copy.
The running time then looks much more like the other measurements:
BenchmarkCalculateBallCycle123-8 3000000 500 ns/op
BenchmarkCalculateBallCycle123v2-8 100 10483347 ns/op
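The effect of reusing a slice that the function mutates can be reproduced without the ball clock. The sketch below uses a hypothetical stand-in, `bubbleSortSwaps`, which (like CalculateBallCycle, per the discussion above) modifies its argument in place: it sorts the slice and reports how many swaps it made. Only the first call on a shared slice does real work.

```go
package main

import "fmt"

// bubbleSortSwaps is a hypothetical stand-in for a function that mutates its
// input. It sorts s in place and returns the number of swaps performed.
func bubbleSortSwaps(s []int) int {
	swaps := 0
	for i := 0; i < len(s); i++ {
		for j := 0; j < len(s)-1-i; j++ {
			if s[j] > s[j+1] {
				s[j], s[j+1] = s[j+1], s[j]
				swaps++
			}
		}
	}
	return swaps
}

func main() {
	m := []int{5, 3, 4, 1, 2}
	// Reusing m: the first call sorts it in place, so the second call gets
	// already-sorted input and performs no swaps.
	fmt.Println(bubbleSortSwaps(m), bubbleSortSwaps(m)) // prints 8 0

	// Copying before each call gives every call the same unsorted input.
	m2 := []int{5, 3, 4, 1, 2}
	fmt.Println(bubbleSortSwaps(append([]int{}, m2...)), bubbleSortSwaps(append([]int{}, m2...))) // prints 8 8
}
```

This is the same reasoning behind the per-iteration copy in BenchmarkCalculateBallCycle123v2. If the copy itself should not count toward the measured time, `b.StopTimer()` and `b.StartTimer()` on `*testing.B` can bracket it, although toggling the timer on every iteration adds overhead of its own for very fast functions.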
In your CycleClock function, you are copying the c.BallQueue slice. You can improve performance significantly by calling CalculateBallCycle(c.BallQueue) instead (assuming CalculateBallCycle doesn't modify the slice).
For example:
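// In a _test.go file.
package main
import "testing"
// m is not defined in the original snippet; it is assumed to be a slice
// like the 123-element one in BenchmarkCalculateBallCycle123.
var m = func() []int {
s := make([]int, 123)
for i := range s {
s[i] = i
}
return s
}()
func Sum(values []int) int {
sum := 0
for _, v := range values {
sum += v
}
return sum
}
func BenchmarkNoCopy(b *testing.B) {
for n := 0; n < b.N; n++ {
Sum(m)
}
}
func BenchmarkWithCopy(b *testing.B) {
for n := 0; n < b.N; n++ {
// Copying allocates a new slice on every iteration.
Sum(append([]int{}, m...))
}
}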
// BenchmarkNoCopy-4 20000000 73.5 ns/op
// BenchmarkWithCopy-4 5000000 306 ns/op
// PASS
There is a subtle bug in your tests.
Both BenchmarkCycleClock and BenchmarkCycle24 run the benchmark in a for loop, passing a closure to b.Run. Inside those closures you initialize the clocks using the loop variable i, like this: ballclock.NewClock(i).
The problem is that all instances of your anonymous function share the same variable. By the time the function is run by the test runner, the loop will be finished, and all of the clocks will be initialized using the same value: ballclock.MaxBalls.
You can fix this using a local variable:
for i := ballclock.MinBalls; i <= ballclock.MaxBalls; i++ {
i := i
b.Run("BallCount="+strconv.Itoa(i), func(b *testing.B) {
for n := 0; n < b.N; n++ {
c, _ := ballclock.NewClock(i)
CycleClock(c)
}
})
}
The line i := i stores a copy of the current value of i (different for each instance of your anonymous function).
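The effect of `i := i` is easiest to see when the closures run after the loop has finished. A minimal, self-contained Go sketch (`collect` is a hypothetical name) that builds one closure per iteration and calls them afterwards:

```go
package main

import "fmt"

// collect builds one closure per loop iteration. The i := i line gives each
// closure its own copy of the loop variable.
func collect(n int) []func() int {
	var fns []func() int
	for i := 0; i < n; i++ {
		i := i // per-iteration copy captured by the closure below
		fns = append(fns, func() int { return i })
	}
	return fns
}

func main() {
	var got []int
	for _, f := range collect(3) {
		got = append(got, f())
	}
	fmt.Println(got) // prints [0 1 2]
}
```

Without the `i := i` line and with a Go version before 1.22, all three closures share one variable and each returns 3. Since Go 1.22, in modules that declare `go 1.22` or later, each iteration of a three-clause `for` loop gets its own variable, so the explicit copy is no longer needed there.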
I have an ECC public and private key generated with BouncyCastle:
Security.addProvider(new org.bouncycastle.jce.provider.BouncyCastleProvider());
ECNamedCurveParameterSpec ecSpec = ECNamedCurveTable
.getParameterSpec("secp192r1");
KeyPairGenerator g = KeyPairGenerator.getInstance("ECDSA", "BC");
g.initialize(ecSpec, new SecureRandom());
KeyPair pair = g.generateKeyPair();
System.out.println(Arrays.toString(pair.getPrivate().getEncoded()));
System.out.println(Arrays.toString(pair.getPublic().getEncoded()));
byte[] privateKey = new byte[]{48, 123, 2, 1, 0, 48, 19, 6, 7, 42, -122, 72, -50, 61, 2, 1, 6, 8, 42, -122, 72, -50, 61, 3, 1, 1, 4, 97, 48, 95, 2, 1, 1, 4, 24, 14, 117, 7, -120, 15, 109, -59, -35, 72, -91, 99, -2, 51, -120, 112, -47, -1, -115, 25, 48, -104, -93, 78, -7, -96, 10, 6, 8, 42, -122, 72, -50, 61, 3, 1, 1, -95, 52, 3, 50, 0, 4, 64, 48, -104, 32, 41, 13, 1, -75, -12, -51, -24, -13, 56, 75, 19, 74, -13, 75, -82, 35, 1, -50, -93, -115, -115, -34, -81, 119, -109, -50, -39, -57, -20, -67, 65, -50, 66, -122, 96, 84, 117, -49, -101, 54, -30, 77, -110, -122}
byte[] publicKey = new byte[]{48, 73, 48, 19, 6, 7, 42, -122, 72, -50, 61, 2, 1, 6, 8, 42, -122, 72, -50, 61, 3, 1, 1, 3, 50, 0, 4, 64, 48, -104, 32, 41, 13, 1, -75, -12, -51, -24, -13, 56, 75, 19, 74, -13, 75, -82, 35, 1, -50, -93, -115, -115, -34, -81, 119, -109, -50, -39, -57, -20, -67, 65, -50, 66, -122, 96, 84, 117, -49, -101, 54, -30, 77, -110, -122}
How can I convert them into the raw format that can be reused later with https://github.com/kmackay/micro-ecc/blob/master/uECC.h? I need a 24-byte private key and a 48-byte public key, but the encodings are currently 125 and 75 bytes.
This gives 24 and 48 bytes, but sometimes 25 or 49 when a 0 byte is added at the beginning:
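// Requires java.security.interfaces.ECPrivateKey and java.security.interfaces.ECPublicKey.
// Cast the key objects from the KeyPair (not the byte[] encodings) to the EC interfaces.
ECPrivateKey ecPrivateKey = (ECPrivateKey) pair.getPrivate();
System.out.println(ecPrivateKey.getS().toByteArray().length);
ECPublicKey ecPublicKey = (ECPublicKey) pair.getPublic();
System.out.println(ecPublicKey.getW().getAffineX().toByteArray().length + ecPublicKey.getW().getAffineY().toByteArray().length);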
In a Rails 4 app (versions: rails 4.2.3, PostgreSQL 9.3.5), I have model classes like the ones below:
class Message < ActiveRecord::Base
belongs_to :receiver, class_name: 'User'
belongs_to :sender, class_name: 'User'
validates :receiver, presence: true
validates :sender, presence: true
end
class User < ActiveRecord::Base
has_many :received_messages, class_name: 'Message', foreign_key: :receiver_id
has_many :sent_messages, class_name: 'Message', foreign_key: :sender_id
end
I want to get the collection of users who have NOT received a message from a specific user, so I wrote these scopes:
class User < ActiveRecord::Base
...
scope :received_messages_from, -> (user) {
includes(:received_messages).
where('messages.sender_id': user.id).
references(:received_messages)
}
scope :not_received_messages_from, -> (user) {
includes(:received_messages).
where.not(id: received_messages_from(user).select(:id)).
references(:received_messages)
}
end
I have these rows in the messages table:
message_00:
sender_user_id: 11
receiver_user_id: 12
message_01:
sender_user_id: 11
receiver_user_id: 12
message_02:
sender_user_id: 12
receiver_user_id: 11
message_11:
sender_user_id: 17
receiver_user_id: 11
message_12:
sender_user_id: 11
receiver_user_id: 17
message_13:
sender_user_id: 18
receiver_user_id: 12
message_14:
sender_user_id: 12
receiver_user_id: 18
message_15:
sender_user_id: 17
receiver_user_id: 12
message_16:
sender_user_id: 17
receiver_user_id: 13
message_17:
sender_user_id: 17
receiver_user_id: 14
So User.received_messages_from(User.find(17)).pluck(:id) returns [11, 12, 13, 14], and the result of User.not_received_messages_from(User.find(17)).pluck(:id) shouldn't contain these ids.
But the not_received_messages_from scope doesn't work: it returns users who have received messages from that user. It generates SQL like this (in this example, the user's id is 17):
SELECT "users"."id" FROM "users"
LEFT OUTER JOIN "messages" ON "messages"."receiver_id" = "users"."id"
WHERE ("users"."id"
NOT IN (
SELECT "users"."id" FROM "users"
WHERE "messages"."sender_id" = 17))
User.not_received_messages_from(User.find(17)).pluck(:id) returns:
[11, 12, 12, 12, 15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 17, 18, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
So I tried changing .select(:id) to .pluck(:id) in the where clause of the not_received_messages_from scope, and this works:
scope :not_received_messages_from, -> (user) {
includes(:received_messages).
where.not(id: received_messages_from(user).pluck(:id)).
references(:received_messages)
}
SQL:
SELECT "users"."id" FROM "users"
LEFT OUTER JOIN "messages" ON "messages"."receiver_id" = "users"."id"
WHERE ("users"."id" NOT IN (11, 12, 13, 14))
User.not_received_messages_from(User.find(17)).pluck(:id) returns:
[15, 16, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 17, 18, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
I think the only difference between the two SQL statements is whether a subquery or a static array of ids is passed to NOT IN. Why do the results differ?
This is likely because your sub-select is not returning the expected response.
SELECT "users"."id" FROM "users"
LEFT OUTER JOIN "messages" ON "messages"."receiver_user_id" = "users"."id"
WHERE ("users"."id"
NOT IN (
SELECT "users"."id" FROM "users"
WHERE "messages"."sender_user_id" = 17))
It's been a while since I've looked at PostgreSQL joins, but I don't know what, if anything, that sub-select would produce. It operates on a join, but which one? I haven't found anything in the documentation that explains that.