Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I am having some difficulty writing awk/sed code to find the distance between every row and the last row of a file. To be more specific, suppose I have a file f1 as follows.
1 2 3
4 5 6
7 8 9
.
.
.
51 52 53
30 31 32
where the first column is the x coordinate, the second column is the y coordinate, and the third column is the z coordinate. How do I create a file containing the distance between the first row and the last row (i.e. the distance between (1,2,3) and (30,31,32)), the second row and the last row, the third row and the last row, and so on, up to the penultimate row and the last row? If f1 has n rows, then the output file (let's call it f2) would therefore have n-1 rows.
I have been stuck on this for a long time, so any help would be much appreciated. Thanks!
Use tac to get last line first:
$ tac file | awk '(NR == 1){ x=$1; y=$2; z=$3; next } {
print sqrt((x-$1)^2 + (y-$2)^2 + (z-$3)^2)
}' | tac
50.2295
45.0333
39.8372
36.3731
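For comparison, the same calculation can be sketched in Python. This is only an illustration, not part of the answer above: the rows are the five explicit rows from the question (the elided rows are left out), and the function name `distances_to_last` is made up for this example.

```python
import math

# The five rows that are spelled out in the question; the rows hidden
# behind the dots are not included.
rows = [
    (1, 2, 3),
    (4, 5, 6),
    (7, 8, 9),
    (51, 52, 53),
    (30, 31, 32),
]

def distances_to_last(points):
    """Return the Euclidean distance from every row except the last
    to the last row, in the original row order."""
    last = points[-1]
    return [math.dist(p, last) for p in points[:-1]]

distances = distances_to_last(rows)
for d in distances:
    print(f"{d:.4f}")
```

Unlike the tac pipeline, this reads the last row directly by index, so the order does not need to be reversed twice. `math.dist` requires Python 3.8 or later.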
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 1 year ago.
So, let's say I have 5 items: A, B, C, D and E. Item A comes in sizes 1 and 2, item B comes in sizes 2 and 3, C comes in 1 and 3, D comes in 1 and E comes in 3. Now, I am considering 2 table options, as follows:
Table 1
Name Size
A 1
A 2
B 2
B 3
C 1
C 3
D 1
E 3
Another option is Table 2, as follows:
Name
A1
A2
B2
B3
C1
C3
D1
E3
Now, which of these 2 tables is actually a better option? What are the advantages and disadvantages (if any) of each of the 2 tables above? One thing that I can think of is that, if I use table 1, I can easily extract all items by size, no matter what item I want. So, for instance, if I want to analyze this month's sales of items of size 1, it's easy to do it with Table 1. I can't seem to see the same advantage if I use table 2. What do you guys think? Please kindly enlighten me on this matter. Thank you in advance for your kind assistance, everyone. Cheers! :)
I don't understand why you are even considering the second table. What purpose does it serve, and how does it help you? Plain and simple, you have a one-to-many relationship: an item comes in one or more sizes. Just saying that sentence should scream option 1 ONLY. Option 2 will make your life a living hell because it goes against normalization guidelines by packing two pieces of data (name and size) into one column, and it has no real benefit.
Option 1 says I have an item and it can have one or more sizes associated with it.
Item Size
A 1
A 2
A 3
B 1
C 1
C 2
Then you can run simple queries such as: give me all items that have more than one size; give me any item that has only one size; give me all the sizes of the item with id A; and so on.
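To make those queries concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table name `item_size` and its column names are assumptions made for this example; the rows are the ones from Table 1 in the question.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_size ("
    " name TEXT NOT NULL,"
    " size INTEGER NOT NULL,"
    " PRIMARY KEY (name, size))"
)
conn.executemany(
    "INSERT INTO item_size (name, size) VALUES (?, ?)",
    [("A", 1), ("A", 2), ("B", 2), ("B", 3),
     ("C", 1), ("C", 3), ("D", 1), ("E", 3)],
)

def names(sql, params=()):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]

# All items available in size 1.
size_one = names("SELECT name FROM item_size WHERE size = ? ORDER BY name", (1,))

# Items that come in more than one size.
multi_size = names(
    "SELECT name FROM item_size GROUP BY name HAVING COUNT(*) > 1 ORDER BY name"
)

# Items that come in exactly one size.
single_size = names(
    "SELECT name FROM item_size GROUP BY name HAVING COUNT(*) = 1 ORDER BY name"
)

# All sizes of item A.
sizes_of_a = names("SELECT size FROM item_size WHERE name = ? ORDER BY size", ("A",))

print(size_one, multi_size, single_size, sizes_of_a)
```

With the single-column layout of option 2, each of these queries would first have to split strings such as `A1` back into a name and a size.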
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am new to pandas.
I have been trying to solve the following problem: I want to drop any row that has a duplicate A but a non-duplicate B.
Here is the kind of output I want:
[image: desired output]
IIUC, this is what you need
a = (df['A'].ne(df['A'].shift())).ne((df['B'].ne(df['B'].shift())))
df[~a].reset_index(drop=True)
Output
A B
0 2 z
1 3 x
2 3 x
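The comparison in that one-liner can be traced step by step in plain Python. This is a sketch of the same rule without pandas, under the assumption that the rule is "keep a row when A and B either both changed or both repeated relative to the previous row". The sample rows are hypothetical, since the original data was only posted as an image.

```python
# Hypothetical sample rows (A, B); not the asker's actual data.
rows = [(1, "x"), (1, "y"), (2, "z"), (3, "x"), (3, "x")]

def keep_consistent_rows(rows):
    """Keep a row when 'A changed' and 'B changed' agree, comparing each
    row with the one before it. The first row has no predecessor, so
    both values count as changed and the row is kept."""
    kept = []
    prev = None
    for a, b in rows:
        a_changed = prev is None or a != prev[0]
        b_changed = prev is None or b != prev[1]
        if a_changed == b_changed:
            kept.append((a, b))
        prev = (a, b)
    return kept

result = keep_consistent_rows(rows)
print(result)
```

Here the row `(1, "y")` is dropped because A repeats the previous row while B does not.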
I think you need:
cond=(df.eq(df.shift(-1))|df.eq(df.shift())).all(axis=1)
pd.concat([df[~cond].groupby('A').last().reset_index(),df[cond]])
A B
0 2 y
2 3 x
3 3 x
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a matrix of data in Excel with a large number of values ranging from around 50 to ~2000. I am trying to find the 200 largest values in the table (found using the LARGE() function) and then, for each of those 200 values, return its marginal values (the row and column labels). I found a function that does this here. It returns a string with the marginal values, but when there are multiple instances of the value being searched for, it simply returns the same one each time. How would I go about finding all instances of all 200 values? Below is an example of how the data would look.
| 1 2
---------
1 |10 12
2 |12 14
3 |11 13
4 |2 12
5 |9 14
6 |10 12
7 |15 9
8 |15 16
After using the LARGE function to find the top 5 values (16, 15, 15, 14, and 14), it would need to return the following:
9-2
8-1
9-1
2-2
5-2
Any help is greatly appreciated. Would prefer not to use VBA but to use functions built into Excel, but I am open to any solution using Excel, including those that reformat the data.
If you can guarantee that no value will be duplicated in the same columns then:
=INDEX($1:$1,AGGREGATE(15,6,COLUMN($B$2:$J$3)/($B$2:$J$3=L1),COUNTIF($L$1:L1,L1)))&" - "&INDEX(A:A,AGGREGATE(15,6,ROW(INDEX($A$1:$J$3,0,AGGREGATE(15,6,COLUMN($B$2:$J$3)/($B$2:$J$3=L1),COUNTIF($L$1:L1,L1))))/(INDEX($A$1:$J$3,0,AGGREGATE(15,6,COLUMN($B$2:$J$3)/($B$2:$J$3=L1),COUNTIF($L$1:L1,L1)))=L1),1))
Where L1:L5 have your values from the Large.
Now that you flipped the data:
=INDEX(A:A,AGGREGATE(15,6,ROW($B$2:$C$9)/($B$2:$C$9=L1),COUNTIF($L$1:L1,L1)))&" - "&INDEX($1:$1,AGGREGATE(15,6,COLUMN(INDEX($A$1:$C$9,AGGREGATE(15,6,ROW($B$2:$C$9)/($B$2:$C$9=L1),COUNTIF($L$1:L1,L1)),0))/(INDEX($A$1:$C$9,AGGREGATE(15,6,ROW($B$2:$C$9)/($B$2:$C$9=L1),COUNTIF($L$1:L1,L1)),0)=L1),1))
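The idea behind these formulas, taking the k-th largest value and then the next unused position that holds it, can also be sketched outside Excel. The Python below is only an illustration of that logic, not a replacement for the formulas: it uses the sample table from the question, and the function name `top_positions` and the "row-column" label format are assumptions.

```python
# Sample table from the question: row label -> values in columns 1 and 2.
table = {
    1: (10, 12),
    2: (12, 14),
    3: (11, 13),
    4: (2, 12),
    5: (9, 14),
    6: (10, 12),
    7: (15, 9),
    8: (15, 16),
}

def top_positions(table, k):
    """Return the k largest values with their 'row-column' labels.
    Duplicate values each get their own position; ties are ordered
    by row, then by column."""
    cells = [
        (value, row, col)
        for row, values in table.items()
        for col, value in enumerate(values, start=1)
    ]
    cells.sort(key=lambda cell: (-cell[0], cell[1], cell[2]))
    return [(value, f"{row}-{col}") for value, row, col in cells[:k]]

top5 = top_positions(table, 5)
for value, label in top5:
    print(value, label)
```

Sorting every cell once avoids the repeated lookups per value, so duplicates within the same row or column are handled without any special case.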
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have about 1000 records in an Excel spreadsheet in two columns:
Column 1:
- Row 1- Name:
- Row 2- Company:
- Row 3- Tel. No:
- Row 4- Email:
- Row 5- Web-address:
- Row 6- Name:
- Row 7- Company:
- Row 8- Tel.No:
etc
Column 2
- Row 1- Mike A
- Row 2- Microsoft
- Row 3- 78544587455
- Row 4- mike#microsoft.com
- Row 5- www.microsoft.com
- Row 6- Steve B
- Row 7- Google
- Row 8- 1521557547
What I need is the same data in 5 columns, so that the vertical data is laid out horizontally, if that makes sense.
So the end result will look like this:
Name | Company | Tel no. | Email | Website |
Mike A |Microsoft| 78544587455|mike#microsoft.com|www.microsoft.com
Steve B| Google | 1521557547 | etc
Any ideas for the VBA?
This is exactly what you need. Please read through the answer, especially items 1, 7 and 8.
You want to reshape column data (column B) into matrix data (depending on your choice, you may need an additional transposition).
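The reshaping step itself is small. As a language-neutral illustration (written in Python rather than VBA), the sketch below cuts the values from column 2 into rows of five. The values are the ones shown in the question; the second record is incomplete there, so the missing fields are padded with empty strings. The names `reshape` and `FIELDS_PER_RECORD` are made up for this example.

```python
FIELDS_PER_RECORD = 5  # Name, Company, Tel no., Email, Website

# Column 2 values as listed in the question (the second record is cut off).
values = [
    "Mike A", "Microsoft", "78544587455", "mike#microsoft.com", "www.microsoft.com",
    "Steve B", "Google", "1521557547",
]

def reshape(values, width):
    """Split a flat list into rows of `width` items, padding the last
    row with empty strings if the list runs out early."""
    rows = []
    for start in range(0, len(values), width):
        chunk = values[start:start + width]
        chunk += [""] * (width - len(chunk))
        rows.append(chunk)
    return rows

records = reshape(values, FIELDS_PER_RECORD)
for record in records:
    print(" | ".join(record))
```

This assumes every record has the same five fields in the same order; a record with a missing row in the sheet would shift everything after it.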
Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I have a problem computing the standard deviation (equation here). My question is: how can I get the sum of ([X interval] - mean) over a set of data when certain criteria must be met?
For example, the data is:
Gender Grade
M 36
M 32
F 25
F 40
I have obtained the N needed in the equation via COUNTIFS and the mean via SUMIFS. The problem is getting the sum over the range (X interval minus mean) without dedicating a cell or column to that range. In the given example, I want the standard deviation of Grade with respect to gender. A helper column for X interval minus mean would be hard to maintain if, say, the gender in record 2 were changed to 'F'.
Any thoughts on how this may be done?
With a little algebra the sd formula can be rewritten as
$$\mathrm{sd} = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / n}{n}}$$
which can be implemented with SUMIFS, COUNTIFS and SUMPRODUCT
Assuming gender data is in range A1:A4 and grade in B1:B4 and criteria in C1 use
=SQRT( (SUMPRODUCT($B$1:$B$4,$B$1:$B$4,--($A$1:$A$4=C1)) -
SUMIFS($B$1:$B$4,$A$1:$A$4,C1)^2/COUNTIFS($A$1:$A$4,C1)) /
COUNTIFS($A$1:$A$4,C1) )
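As a quick check of the rewritten formula, here is a small Python sketch that applies the same sum-of-squares form to the sample data from the question. It computes the population standard deviation (dividing by n), which is what the formula above does. The function name `conditional_sd` is an assumption for this example.

```python
import math

# Sample data from the question: (gender, grade).
records = [("M", 36), ("M", 32), ("F", 25), ("F", 40)]

def conditional_sd(records, criterion):
    """Population standard deviation of the grades whose gender matches
    `criterion`, computed as sqrt((sum(x^2) - sum(x)^2 / n) / n)."""
    grades = [grade for gender, grade in records if gender == criterion]
    n = len(grades)
    if n == 0:
        raise ValueError(f"no records match {criterion!r}")
    sum_of_squares = sum(g * g for g in grades)  # SUMPRODUCT part
    total = sum(grades)                          # SUMIFS part
    return math.sqrt((sum_of_squares - total ** 2 / n) / n)

sd_m = conditional_sd(records, "M")
sd_f = conditional_sd(records, "F")
print(sd_m, sd_f)
```

For the male grades 36 and 32 the mean is 34 and both deviations are 2, so the result is 2; for the female grades 25 and 40 it is 7.5.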