Creating a variable that changes size in a for loop - variables

I have to create a fits file using the data from two IDL structures. This is not the basic problem.
My problem is that first I have to create a variable that contains the two structures.
To create this I used a for loop that will write at each step a new row of my variable.
The problem is that I cannot add the new row at the next step: each step overwrites the previous one, so at the end my FITS file has only one row instead of, say, 10000 rows.
This is what I also tried
for jj=0,h[1]-1 do begin
test[*,jj] = [sme.wave[jj], sme.smod[jj]]
print,test
endfor
but the * wildcard is messing up everything because now inside test I have the number corresponding to jj, not the values of sme.wave and sme.smod.
I hope that someone can understand what I asked and that can help me!
thank you in advance!
Chiara

Assuming your "sme.wave" and "sme.smod" structure fields contain 1-D arrays with the same number of elements as there are rows in "test", then your code should work. For example, I tried this and got the following output:
IDL> test = intarr(2, 10) ; all zeros
IDL> sme = {wave:indgen(10), smod:indgen(10)*2}
IDL> for jj=0, 9 do test[*,jj] = [sme.wave[jj], sme.smod[jj]]
IDL> print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
However, for better performance, you should instead take advantage of IDL's vectorized, multi-threaded array operations. Looping is typically much slower than something like the following:
IDL> test = intarr(2, 10) ; all zeros
IDL> sme = {wave:indgen(10), smod:indgen(10)*2}
IDL> test[0,*] = sme.wave
IDL> test[1,*] = sme.smod
IDL> print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18
Further, if you don't know what the size of "test" is ahead of time, and you want to append to the variable, i.e. add a row, then you can do this:
IDL> test = []
IDL> sme = {wave:Indgen(10), smod:Indgen(10)*2}
IDL> for jj=0, 9 do test = [[test], [sme.wave[jj], sme.smod[jj]]]
IDL> Print, test
0 0
1 2
2 4
3 6
4 8
5 10
6 12
7 14
8 16
9 18

Related

how to convert function output into list, dict or as data frame?

My issue is that I don't know how to use the output of a function properly. The output contains multiple lines (j = column, i = testresult).
I want to use the output for some other rules in other functions (e.g. if testresult (i) > 5 then do something).
I have a function with two loops. The function goes through every column and tests something. This works fine.
def test():
    scope = range(10)
    scope2 = range(len(df1.columns))
    for (j) in scope2:
        for (i) in scope:
            if df1.iloc[:,[j]].shift(i).loc[selected_week].item() > df1.iloc[:,[j]].shift(i+1).loc[selected_week].item():
                i + 1
            else:
                print(j,i)
                break
Output:
test()
1 0
2 3
3 3
4 1
5 0
6 6
7 0
8 1
9 0
10 1
11 1
12 0
13 0
14 0
15 0
I tried to convert it to a list, a dataframe, etc. However, I am missing something here.
What is the best way for that?
Thank you!
A fix of your code would be:
import pandas as pd

def test():
    out = []
    scope = range(10)
    scope2 = range(len(df1.columns))
    for j in scope2:
        for i in scope:
            if df1.iloc[:,[j]].shift(i).loc[selected_week].item() <= df1.iloc[:,[j]].shift(i+1).loc[selected_week].item():
                # collect the pair instead of printing it, then stop
                # at the first hit, as the original loop did
                out.append([i, j])
                break
    return pd.DataFrame(out)

out = test()
But you probably don't want to use loops, as they are slow. Please clarify what your input is with a minimal reproducible example, and what you are trying to achieve (expected output and logic); we can probably turn this into a vectorized solution.
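To make the "collect, then return" idea concrete, here is a minimal self-contained sketch. The data in df1, the value of selected_week, the function name first_non_increase and the column names "column" and "testresult" are all made up for illustration, since the question does not show the real input. It keeps the loop (it is not the vectorized version) but returns a DataFrame that other functions can filter.

```python
import pandas as pd

# Hypothetical stand-in for df1: one row per week, one column per series.
df1 = pd.DataFrame(
    {"a": [1, 2, 3, 4, 5], "b": [5, 4, 3, 2, 1], "c": [1, 3, 2, 4, 5]},
    index=[1, 2, 3, 4, 5],
)
selected_week = 5  # assumed index label


def first_non_increase(df, week, max_lag=10):
    """For each column, find the first lag i at which the value is not
    greater than the value one step earlier, and collect (column, lag)."""
    rows = []
    for j, name in enumerate(df.columns):
        col = df[name]
        for i in range(max_lag):
            current = col.shift(i).loc[week]
            previous = col.shift(i + 1).loc[week]
            # A comparison with NaN (shifted past the start) is False,
            # so running out of history also ends the streak.
            if not current > previous:
                rows.append({"column": j, "testresult": i})
                break
    return pd.DataFrame(rows, columns=["column", "testresult"])


result = first_non_increase(df1, selected_week)
# The result can now feed other rules, e.g. "testresult > 1".
flagged = result[result["testresult"] > 1]
```

Because the function returns a DataFrame instead of printing, a rule such as "testresult > 5" becomes an ordinary boolean filter on the result.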

Is there a way to detect special chars such as '?' or any, in a column in huge dataframe with thousands of records?

INPUT
A B C
0 1 2 3
1 4 ? 6
2 7 8 ?
... ... ... ...
551 4 4 6
552 3 7 9
There might be a '?' somewhere in between that is hard to spot. I tried
pd.to_numeric with errors='coerce'
but the output only shows the first 5 and last 5 rows, so I can't check all rows/columns for special chars.
So how do I actually deal with this problem and clean the dataset?
Once they are detected I know how to remove them and fill in the respective column mean values, so that's not an issue.
Please bear with me: I'm new to Stack Overflow and switching from a non-IT field.
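The below is an easy way to do it. Note that str.count treats its argument as a regular expression, so the string below is a character class that matches any one of the listed special characters:
special = r'[#_!$%^&*()<>?/\|}{~:]'
df['B'].str.count(special)
Please refer to the link below for a more general regex approach:
regex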
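Since the question mentions pd.to_numeric, here is a small sketch of how that route could be used to locate the bad cells across all columns without printing the whole frame. The three-row frame is a made-up stand-in for the real data, and the variable names are only illustrative.

```python
import pandas as pd

# Made-up sample with the same shape of problem as in the question.
df = pd.DataFrame(
    {"A": ["1", "4", "7"], "B": ["2", "?", "8"], "C": ["3", "6", "?"]}
)

# Convert every column; anything that is not a number becomes NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# A cell is "bad" if it had a value but could not be converted.
bad = numeric.isna() & df.notna()

bad_counts = bad.sum()            # number of bad cells per column
bad_rows = df[bad.any(axis=1)]    # only the rows containing a bad cell
```

Looking at bad_counts and bad_rows avoids the truncated display, because only the offending cells are shown.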

Processing loading table data

I have a text file "celldata.txt" containing a very simple table of data.
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
The problem is when it comes to accessing the data at a certain column and row.
My approach has been to load using loadTable.
Table table;
int numCols;
int numRows;
void setup() {
size(200,200);
table = loadTable("celldata.txt","tsv");
numRows=table.getRowCount();
numCols=table.getColumnCount();
}
void draw() {
background(255);
fill(0);
text(numRows +" "+ numCols,100,100); // Check num of cols and rows
println(table.getFloat(0,0));
}
Question 1: When I do this, it says the number of rows is 5 and the number of columns is just 1. Why is it not 5 x 4?
Question 2: Why is table.getFloat(0,0) "NaN" instead of the first element of the data?
I want to use a much bigger matrix later and access certain elements (of type double) with something like getFloat(i,j) and be able to loop through all elements.
Using the same example data as I, can someone please help me understand what is wrong with my code and how to access the textfile's data? Should I be using another method than loadTable?
You've told Processing that the file contains tab separated values (by using the "tsv" option), but your file contains space separated values.
Since your file does not contain any tabs, it reads the entire row as a single value. So the 0,0 position of your table is the text "1 2 3 4", which isn't a number, hence the NaN. This is also why it thinks your table only has one column.
You should modify your celldata.txt file to actually be separated by tabs instead of spaces:
1 2 3 4
5 6 7 8
9 10 11 12
1 2 3 4
2 3 4 5
You could also separate them by commas and then use the "csv" option.
If you're still having trouble, you can see what Processing is reading in by adding saveTable(table, "data/new.csv"); to the end of your setup() function and then looking at that file. It will be a list of values separated by commas, so you can see exactly where Processing thinks the cells of the table are.
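The same effect can be reproduced outside Processing. The following is only an illustration in Python of what splitting on the wrong delimiter does; it is not how loadTable is implemented, and to_float is a hypothetical helper that mimics the NaN result.

```python
import math

line = "1 2 3 4"  # first row of celldata.txt, separated by spaces

# Splitting on tabs finds no tab, so the whole row is one field.
tab_fields = line.split("\t")

# Splitting on whitespace gives the four cells that were expected.
space_fields = line.split()


def to_float(text):
    """Hypothetical helper: return NaN when the text is not a number."""
    try:
        return float(text)
    except ValueError:
        return math.nan


first_cell_as_tsv = to_float(tab_fields[0])      # NaN, like getFloat(0,0)
first_cell_as_ssv = to_float(space_fields[0])    # 1.0
```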

How to delete "1" followed by trailing zeros from Data Frame row values ?

From my "Id" column I want to remove the leading one and the zeros that follow it.
That is
1000003 becomes 3
1000005 becomes 5
1000011 becomes 11 and so on
Ignore -1, 10 and 1000000; they will be handled as special cases. But from the remaining rows I want to remove the "1" followed by zeros.
Well you can use modulus to get the end of the numbers (they will be the remainder). So just exclude the rows with ids of [-1,10,1000000] and then compute the modulus of 1000000:
print(df)
Id
0 -1
1 10
2 1000000
3 1000003
4 1000005
5 1000007
6 1000009
7 1000011
keep = df.Id.isin([-1,10,1000000])
# .loc avoids chained assignment, which may not write back to df
df.loc[~keep, 'Id'] = df.Id[~keep] % 1000000
print(df)
Id
0 -1
1 10
2 1000000
3 3
4 5
5 7
6 9
7 11
Edit: Here is a fully vectorized string slice version as an alternative (Like Alex' method but takes advantage of pandas' vectorized string methods):
keep = df.Id.isin([-1,10,1000000])
# .loc avoids chained assignment, which may not write back to df
df.loc[~keep, 'Id'] = df.Id[~keep].astype(str).str[1:].astype(int)
print(df)
Id
0 -1
1 10
2 1000000
3 3
4 5
5 7
6 9
7 11
Here is another way you could try to do it:
def f(x):
    """convert the value to a string, then select only the characters
    after the first one in the string, which is 1. For example,
    100005 would be 00005 and I believe it's returning 00005.0 from
    dataframe, which is why the float() is there. Then just convert
    it to an int, and you'll have 5, etc.
    """
    return int(float(str(x)[1:]))

# apply the function "f" to the dataframe and pass in the column 'Id'
df.apply(lambda row: f(row['Id']), axis=1)
I get that this question has been satisfactorily answered. But for future visitors, what I like about alex' answer is that it does not depend on there being a fixed number of zeros. The accepted answer will fail if you sometimes have 10005, sometimes 1000005 and so on.
However, to add something more to the way we think about it: if you know the offset is always going to be 1000000, you can do
# back up all values
foo = df.Id.copy()
# now, some will be negative or zero
df.Id = df.Id - 1000000
# put back those that are negative or zero (here, the first three rows)
mask = df.Id <= 0
df.loc[mask, 'Id'] = foo[mask]
It gives you the same as Karl's answer, but I typically prefer this kind of method for its readability.
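To illustrate the point about a varying number of zeros, here is a small sketch comparing the modulus and the string-slice approaches. The Series ids, with mixed widths, is made up for the comparison and is not data from the question.

```python
import pandas as pd

# Made-up ids: the last two have fewer zeros than the others.
ids = pd.Series([-1, 10, 1000000, 1000003, 10005, 100011])
special = ids.isin([-1, 10, 1000000])

# Modulus with a fixed 1000000: only works for 7-digit ids.
by_modulus = ids.copy()
by_modulus[~special] = ids[~special] % 1000000

# String slice: drops the leading "1" whatever the width is.
by_slice = ids.copy()
by_slice[~special] = ids[~special].astype(str).str[1:].astype(int)
```

Here by_modulus leaves 10005 and 100011 unchanged, while by_slice turns them into 5 and 11, so the choice depends on whether the width of the ids is really fixed.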

Understanding The Modulus Operator %

I understand the Modulus operator in terms of the following expression:
7 % 5
This would return 2 due to the fact that 5 goes into 7 once and then gives the 2 that is left over, however my confusion comes when you reverse this statement to read:
5 % 7
This gives me the value of 5 which confuses me slightly. Although the whole of 7 doesn't go into 5, part of it does so why is there either no remainder or a remainder of positive or negative 2?
If it is calculating the value of 5 based on the fact that 7 doesn't go into 5 at all why is the remainder then not 7 instead of 5?
I feel like there is something I'm missing here in my understanding of the modulus operator.
(This explanation is only for positive numbers since it depends on the language otherwise)
Definition
The modulus is the remainder of the Euclidean division of one number by another. % is called the modulo operation.
For instance, 9 divided by 4 equals 2 with a remainder of 1. Here, 9 / 4 = 2 (integer division) and 9 % 4 = 1.
In your example: 5 divided by 7 gives 0 with a remainder of 5 (5 % 7 == 5).
Calculation
The modulo operation can be calculated using this equation:
a % b = a - floor(a / b) * b
floor(a / b) represents the number of times you can divide a by b
floor(a / b) * b is the amount that was successfully shared entirely
The total (a) minus what was shared equals the remainder of the division
Applied to the last example, this gives:
5 % 7 = 5 - floor(5 / 7) * 7 = 5
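The equation can be checked directly. This is a minimal sketch for positive integers only, as in the explanation above; mod_by_formula is an illustrative name, and Python's // operator is used as floor(a / b).

```python
def mod_by_formula(a, b):
    """a % b computed as a - floor(a / b) * b, for positive integers."""
    times_b_fits = a // b          # floor(a / b)
    shared = times_b_fits * b      # the amount that was fully shared
    return a - shared              # what is left over


# The formula agrees with the built-in operator on the examples above.
checks = [(9, 4), (7, 5), (5, 7), (10, 7)]
results = [mod_by_formula(a, b) for a, b in checks]
```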
Modular Arithmetic
That said, your intuition was that it could be -2 and not 5. Actually, in modular arithmetic, -2 = 5 (mod 7) because there exists k in Z such that 7k - 2 = 5 (here k = 1).
You may not have learned modular arithmetic, but you have probably used angles and know that -90° is the same as 270° because it is modulo 360. It's similar, it wraps! So take a circle, and say that its perimeter is 7. Then you read off where 5 is. And if you try with 10, it lands at 3 because 10 % 7 is 3.
Two Steps Solution.
Some of the answers here are complicated for me to understand. I will try to add one more answer in an attempt to simplify the way how to look at this.
Short Answer:
Example 1:
7 % 5 = 2
Each person should get one pizza slice.
Divide 7 slices among 5 people: every one of the 5 people gets one slice and we end up with 2 slices remaining. 7 % 5 equals 2 because 5 fits into 7 once and 2 is left over.
Example 2:
5 % 7 = 5
Each person should get one pizza slice
It gives 5 because 5 is less than 7. You cannot share 5 items among 7 people so that each gets a whole one. So the division doesn't take place at all and you end up with the same amount you started with, which is 5.
Programmatic Answer:
The process is basically to ask two questions:
Example A: (7 % 5)
(Q.1) What number do we multiply 5 by in order to get 7?
Two Conditions: Multiplier starts from `0`. Output result should not exceed `7`.
Let's try:
Multiplier is zero 0 so, 0 x 5 = 0
Still, we are short so we add one (+1) to multiplier.
1 so, 1 x 5 = 5
We did not get 7 yet, so we add one (+1).
2 so, 2 x 5 = 10
Now we exceeded 7. So 2 is not the correct multiplier.
Let's go back one step (where we used 1) and keep in mind the result, which is 5. Number 5 is the key here.
(Q.2) How much do we need to add to the 5 (the number we just got from step 1) to get 7?
We deduct the two numbers: 7-5 = 2.
So the answer for: 7 % 5 is 2;
Example B: (5 % 7)
(Q.1) What number do we multiply 7 by in order to get 5?
Two Conditions: Multiplier starts from `0`. Output result should not exceed `5`.
Let's try:
0 so, 0 x 7 = 0
We did not get 5 yet, let's try a higher number.
1 so, 1 x 7 = 7
Oh no, we exceeded 5, let's get back to the previous step where we used 0 and got the result 0.
(Q.2) How much do we need to add to 0 (the number we just got from step 1) in order to reach the number on the left, 5?
It's clear that the number is 5. 5-0 = 5
5 % 7 = 5
Hope that helps.
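The two questions above translate almost word for word into a loop. This is just a sketch of that procedure for positive integers; remainder_by_steps is a made-up name, and in real code you would simply use %.

```python
def remainder_by_steps(dividend, divisor):
    """Follow the two questions literally (positive integers only)."""
    # Q.1: raise the multiplier from 0 while the product
    # still does not exceed the dividend.
    multiplier = 0
    while (multiplier + 1) * divisor <= dividend:
        multiplier += 1
    # Q.2: how much must be added to that product to reach the dividend?
    return dividend - multiplier * divisor


example_a = remainder_by_steps(7, 5)   # multiplier stops at 1, 7 - 5 = 2
example_b = remainder_by_steps(5, 7)   # multiplier stays at 0, 5 - 0 = 5
```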
As others have pointed out, modulus is based on the remainder system.
I think an easier way to think about modulus is as what remains after a dividend (the number to be divided) has been fully divided by a divisor. So if we think about 5 % 7: when you divide 5 by 7, 7 can go into 5 only 0 times, and when you subtract 0 (7*0) from 5 (just like we learnt back in elementary school), the remainder is 5 (the mod). See the illustration below.
      0
   ______
7 )   5
    - 0
   ______
      5
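With the same logic, -5 mod 7 will be -5 (only 0 7s can go into -5, and -5 - 0*7 = -5). This holds in languages whose % truncates the quotient toward zero, such as C and Java; languages that round the quotient down, such as Python, give 2 instead. By the same token, -5 mod -7 will also be -5.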
A few more interesting cases:
5 mod (-3) = 2 i.e. 5 - (-3*-1)
(-5) mod (-3) = -2 i.e. -5 - (-3*1) = -5+3
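These sign cases depend on how the language rounds the quotient, so they are worth checking in the language you use. As a sketch in Python: the % operator rounds the quotient down (the result takes the sign of the divisor), while math.fmod truncates toward zero and reproduces the values listed above.

```python
import math

# Truncating remainder (quotient rounded toward zero), as in the cases above.
truncated = [
    math.fmod(-5, 7),    # -5.0
    math.fmod(-5, -7),   # -5.0
    math.fmod(5, -3),    #  2.0
    math.fmod(-5, -3),   # -2.0
]

# Python's own % operator floors the quotient instead.
floored = [-5 % 7, -5 % -7, 5 % -3, -5 % -3]
```

For two positive operands both conventions agree, which is why 7 % 5 and 5 % 7 give the same answers everywhere.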
It's just about the remainders. Let me show you how
10 % 5=0
9 % 5=4 (because the remainder of 9 when divided by 5 is 4)
8 % 5=3
7 % 5=2
6 % 5=1
5 % 5=0 (because it is fully divisible by 5)
Now we should remember one thing, mod means remainder so
4 % 5=4
but why 4?
because 5 X 0 = 0
so 0 is the nearest multiple which is less than 4
hence 4-0=4
Modulus is a remainder system.
So 7 % 5 = 2.
5 % 7 = 5
3 % 7 = 3
2 % 7 = 2
1 % 7 = 1
Whether it is safe programming to use it inside a function to determine an array index is a different question, I guess.
Step 1: 5/7 = 0.71
Step 2: Take the part to the left of the decimal point, so we take 0 from 0.71 and multiply it by 7:
0*7 = 0
Step 3: 5-0 = 5. Therefore, 5 % 7 = 5
The modulus operator gives you the result in the 'least residue system'. For example, for mod 5 there are 5 integers counted: 0, 1, 2, 3, 4. In fact 19 = 12 = 5 = -2 = -9 (mod 7). The main difference is that a programming language returns a single representative of that class as the answer.
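As a quick check of the congruence above, here is a short sketch in Python. It relies on Python's % returning a value between 0 and 6 for a divisor of 7 even when the left operand is negative; languages such as C or Java would return a negative result for -2 and -9.

```python
# All of these numbers are congruent modulo 7 ...
numbers = [19, 12, 5, -2, -9]

# ... so in Python each one reduces to the same representative.
residues = [n % 7 for n in numbers]

# Equivalent statement: every pairwise difference is a multiple of 7.
differences_ok = all((a - b) % 7 == 0 for a in numbers for b in numbers)
```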
Let's put it this way:
the modulus operator performs the same division, but it does not care about the quotient; it cares about the remainder.
So let me take you through a simple example:
think of 5 as a block. For example, we have 3 blocks in 15 (with nothing left). But when that logic meets a number that is not a whole number of blocks, that is where the modulus comes in. Apply the same logic to 7: we have 1 block of 5 in 7, with 2 remaining in our hand. That is the modulus!
But you were asking about 5 % 7, right?
So take the same logic: how many blocks of 7 do we have in 5? 0,
so all 5 remain in our hand and the modulus returns 5.
That's it.
Another way to find the remainder is given below.
Statement: for a given dividend and divisor, the remainder is always the same.
ex: 26 divided by 7 gives R: 5
This can be found easily by finding the largest multiple of the divisor that does not exceed the
dividend and taking the difference of the two.
21 is the largest multiple of 7 that does not exceed 26, because the next multiple, 28, is greater than 26.
So 21 is the closest multiple of 7 at or below 26.
Now take the difference (26 - 21) = 5, which is the remainder.
Note: the divisor has to be used as given and not reduced, e.g. if 14 is the divisor, the largest multiple is 14 and the remainder is 26 - 14 = 12.
As you say, the % sign is used to take the modulus (division remainder).
In w3schools' JavaScript Arithmetic page we can read in the Remainder section what I think to be a great explanation
In arithmetic, the division of two integers produces a quotient and a
remainder.
In mathematics, the result of a modulo operation is the
remainder of an arithmetic division.
So, in your specific case, when you try to divide 7 bananas into a group of 5 bananas, you're able to create 1 group of 5 (quotient) and you'll be left with 2 bananas (remainder).
If you try to divide 5 bananas into a group of 7, you won't be able to, so you're left with the 5 bananas again (remainder).