I have a NumPy array (inputs) of shape (30, 1). I want to insert a 31st value (e.g. x = 2). I am trying to use np.insert, but it gives me an out-of-bounds error.
np.insert(inputs,b+1,x)
IndexError: index 31 is out of bounds for axis 0 with size 30
Short answer: you need to insert it at index b, not b+1.
The index you pass to np.insert is the position before which the new element is inserted. Indexes are zero-based, so an array with 30 elements has 29 as its last index, and inserting at index 30 places the new element at the end:
>>> a
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
>>> np.insert(a,30,42)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 42])
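Since the array in the question has shape (30, 1) rather than (30,), the axis argument also matters. The following is a minimal sketch, assuming a stand-in array built with np.arange (the variable names flat, column and appended are illustrative, not from the question):

```python
import numpy as np

# Hypothetical stand-in for the (30, 1) array from the question.
inputs = np.arange(30).reshape(30, 1)
x = 2

# Without axis, np.insert flattens the input first, so the result is 1-D.
flat = np.insert(inputs, inputs.size, x)
print(flat.shape)  # (31,)

# With axis=0 a new row is inserted and the column shape is kept.
column = np.insert(inputs, inputs.shape[0], x, axis=0)
print(column.shape)  # (31, 1)

# Adding at the end can also be written with np.append.
appended = np.append(inputs, [[x]], axis=0)
print(appended.shape)  # (31, 1)
```

Using inputs.shape[0] as the index avoids hard-coding the length, so the same call works if the array grows.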
import numpy as np
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10, 12]],dtype=object)
target = data[:,0]
It has this error.
IndexError Traceback (most recent call last)
Input In [82], in <cell line: 9>()
data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
[2, 7, 8, 9, 10, 11],
[3, 12, 13, 14, 15, 16],
[4, 3, 4, 5, 6, 7, 10, 12]],dtype=object)
# Define the target data
----> 9 target = data[:,0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
How can I fix this without changing the elements in the data? When I made every row the same length, the error went away, but my real data has rows of variable length. Many thanks.
You have an array of objects, so you can't index along axis=1 because that axis does not exist (data.shape -> (4,)).
Use a list comprehension:
out = np.array([a[0] for a in data])
Output: array([10, 2, 3, 4])
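To see why the 2-D index fails, it can help to print the shape first. This is a small sketch using the data from the question; the names target and features are illustrative, and splitting off the remaining values is an assumption about what comes next:

```python
import numpy as np

data = np.array([[10, 20, 30, 40, 50, 60, 70, 80, 90],
                 [2, 7, 8, 9, 10, 11],
                 [3, 12, 13, 14, 15, 16],
                 [4, 3, 4, 5, 6, 7, 10, 12]], dtype=object)

# Rows have different lengths, so NumPy stores four Python lists
# in a 1-D object array instead of building a 2-D matrix.
print(data.shape)  # (4,)

# First element of every row.
target = np.array([row[0] for row in data])

# Remaining elements stay a plain list of lists because they are ragged.
features = [row[1:] for row in data]
```

The original lists are left unchanged; only new objects are built from them.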
I have a dataset like so -
15643, 14087, 12020, 8402, 7875, 3250, 2688, 2654, 2501, 2482, 1246, 1214, 1171, 1165, 1048, 897, 849, 579, 382, 285, 222, 168, 115, 92, 71, 57, 56, 51, 47, 43, 40, 31, 29, 29, 29, 29, 28, 22, 20, 19, 18, 18, 17, 15, 14, 14, 12, 12, 11, 11, 10, 9, 9, 8, 8, 8, 8, 7, 6, 5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
Based on domain knowledge, I know that larger values are the only ones we want to include in our analysis. How do I determine where to cut off our analysis? Should it be don't include 15 and lower or 50 and lower etc?
You can check the distribution with the quantile function, then remove values below the 1st or 2nd percentile. For example:
import numpy as np
data = np.array(data)
print(np.quantile(data, (.01, .02)))
Another method is to calculate the interquartile range (IQR) and set the lower bound for the analysis to Q1 - 1.5 * IQR:
Q1, Q3 = np.quantile(data, (0.25, 0.75))
data_floor = Q1 - 1.5 * (Q3 - Q1)
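Both cutoffs can be applied with a boolean mask. The sketch below uses a short, hypothetical right-skewed sample rather than the full dataset, and the variable names are illustrative. On counts this skewed, the IQR floor can land below the smallest value, in which case it filters nothing, so it is worth checking the number before relying on it:

```python
import numpy as np

# Hypothetical, heavily right-skewed sample standing in for the real data.
sample = np.array([1, 1, 1, 2, 5, 50, 1000])

# Percentile-based cutoff: keep values at or above the 1st percentile.
p01 = np.quantile(sample, 0.01)
kept_by_percentile = sample[sample >= p01]

# IQR-based floor, as described above.
q1, q3 = np.quantile(sample, (0.25, 0.75))
iqr_floor = q1 - 1.5 * (q3 - q1)
kept_by_iqr = sample[sample >= iqr_floor]

print(p01, iqr_floor)
print(kept_by_percentile.size, kept_by_iqr.size)
```

If neither cutoff removes anything useful, a threshold chosen from domain knowledge may be the more honest option.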
I keep getting the error message
More than one column has the same display name
but I cannot find the root cause. Any help is greatly appreciated!
SELECT
gl_ap_details.ledger_name,
gl_ap_details.company_code,
gl_ap_details.location_code,
gl_ap_details.cost_center,
gl_ap_details.account_number,
gl_ap_details.account_name,
gl_ap_details.product_code,
gl_ap_details.channel_code,
gl_ap_details.journal_name,
gl_ap_details.line_description,
gl_ap_details.gl_posted_date,
gl_ap_details.currency,
gl_ap_details.je_source,
gl_ap_details.je_category,
gl_ap_details.effective_date,
gl_ap_details.created_by,
gl_ap_details.invoice_num,
gl_ap_details.invoice_id,
gl_ap_details.invoice_date,
gl_ap_details.vendor_name,
gl_ap_details.vendor_number,
gl_ap_details.invoice_image,
gl_ap_details.po_number,
gl_ap_details.po_requestor,
gl_ap_details.period_name,
gl_ap_details.amount,
gl_ap_details.gl_posted_date,
gl_ap_details.project_code
FROM
wbr_global.gl_ap_details
WHERE
wbr_global.gl_ap_details.ledger_name = 'Amazon.com, Inc.'
AND cost_center IN ('1172')
AND period_name = 'JUL-21'
AND wbr_global.gl_ap_details.account_number = '60820'
GROUP BY
1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28;
val primes = generateSequence(2 to generateSequence(3) {it + 2}) {
val currSeq = it.second.iterator()
val nextPrime = currSeq.next()
nextPrime to currSeq.asSequence().filter { it % nextPrime != 0}
}.map {it.first}
println(primes.take(10).toList()) // prints [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
I tried to understand how this function works, but I find it hard to follow.
Could someone explain it? Thanks.
It generates an infinite sequence of primes using the "Sieve of Eratosthenes" (see here: https://en.wikipedia.org/wiki/Sieve_of_Eratosthenes).
This implementation uses a sequence of pairs to do this. The first element of every pair is the current prime, and the second element is a sequence of the integers larger than that prime that are not divisible by any previous prime.
It starts with the pair 2 to [3, 5, 7, 9, 11, 13, 15, 17, ...], which is given by 2 to generateSequence(3) { it + 2 }.
Using this pair, we create the next pair of the sequence by taking the first element of the sequence (which is now 3), and then removing all numbers divisible by 3 from the sequence (removing 9, 15, 21 and so on). This gives us this pair: 3 to [5, 7, 11, 13, 17, ...]. Repeating this pattern will give us all primes.
After creating a sequence of pairs like this, we are finally doing .map { it.first } to pick only the actual primes, and not the inner sequences.
The sequence of pairs will evolve like this:
2 to [3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, ...]
3 to [5, 7, 11, 13, 17, 19, 23, 25, 29, ...]
5 to [7, 11, 13, 17, 19, 23, 29, ...]
7 to [11, 13, 17, 19, 23, 29, ...]
11 to [13, 17, 19, 23, 29, ...]
13 to [17, 19, 23, 29, ...]
// and so on
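For readers who find the same idea easier to trace in another language, here is a rough Python sketch of the pattern described above. It is an illustration under that assumption, not a translation of the Kotlin standard library calls, and the function name primes is hypothetical:

```python
from itertools import count, islice

def primes():
    """Yield primes by repeatedly filtering a lazy stream of candidates."""
    candidates = count(3, 2)  # 3, 5, 7, 9, ... (odd numbers only)
    prime = 2
    while True:
        yield prime
        # The first surviving candidate is the next prime.
        nxt = next(candidates)
        # Wrap the stream in one more filter that drops multiples of it.
        # p=nxt binds the current value, avoiding late-binding surprises.
        candidates = filter(lambda n, p=nxt: n % p != 0, candidates)
        prime = nxt

print(list(islice(primes(), 10)))
```

As in the Kotlin version, each new prime adds another filter layer on top of the previous stream, so the chain grows with every prime produced.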
I am trying to do crossover on a Genetic Algorithm population using numpy.
I have sliced the population using parent 1 and parent 2.
population = np.random.randint(2, size=(4,8))
p1 = population[::2]
p2 = population[1::2]
But I am not able to figure out any lambda or numpy command to do a multi-point crossover over parents.
The concept is to take ith row of p1 and randomly swap some bits with ith row of p2.
I think you want to select from p1 and p2 at random, cell by cell.
To make it easier to follow, I've changed p1 to hold values from 10 to 15 and p2 to hold values from 20 to 25. Both were generated at random in these ranges.
In [66]: p1
Out[66]:
array([[15, 15, 13, 14, 12, 13, 12, 12],
[14, 11, 11, 10, 12, 12, 10, 12],
[12, 11, 14, 15, 14, 10, 13, 10],
[11, 12, 10, 13, 14, 13, 12, 13]])
In [67]: p2
Out[67]:
array([[23, 25, 24, 21, 24, 20, 24, 25],
[21, 21, 20, 20, 25, 22, 24, 22],
[24, 22, 25, 20, 21, 22, 21, 22],
[22, 20, 21, 22, 25, 23, 22, 21]])
In [68]: sieve=np.random.randint(2, size=(4,8))
In [69]: sieve
Out[69]:
array([[0, 1, 0, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 1],
[0, 1, 1, 0, 0, 1, 1, 0],
[0, 0, 0, 1, 1, 1, 1, 1]])
In [70]: not_sieve=sieve^1 # Complement of sieve
In [71]: pn = p1*sieve + p2*not_sieve
In [72]: pn
Out[72]:
array([[23, 15, 24, 14, 12, 20, 12, 25],
[14, 11, 11, 20, 25, 12, 10, 12],
[24, 11, 14, 20, 21, 10, 13, 22],
[22, 20, 21, 13, 14, 13, 12, 13]])
The numbers in the teens come from p1 when sieve is 1
The numbers in the twenties come from p2 when sieve is 0
This could probably be made more efficient, but is this the output you expect?
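One possible way to express the same cell-by-cell selection is np.where with a boolean mask, which avoids the two multiplications. This is a sketch under the assumption that a uniform random mask is the desired crossover; the names mask, child1 and child2 and the fixed seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable

population = rng.integers(2, size=(4, 8))
p1 = population[::2]   # even rows are the first parents
p2 = population[1::2]  # odd rows are the second parents

# True means "take the gene from p1", False means "take it from p2".
mask = rng.integers(2, size=p1.shape).astype(bool)

child1 = np.where(mask, p1, p2)
child2 = np.where(mask, p2, p1)  # complementary child
```

Producing the complementary child keeps every parent gene in the next generation, which some crossover schemes expect.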