I am trying to get the value of 'id' in the vmstat result.
However, I found out that the position of the 'id' column differs between platforms such as Linux/AIX/HP...
## Linux
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 35268 117568 158244 1849104 0 0 3 11321 5 2 9 15 73 3 0
So I think I should find the string 'id', note its position, and then read the value at that position in the next row.
How can I do that with an awk script?
This one-liner does what you want when vmstat output is piped into it:
awk '{for(i=NF;i>0;i--)if($i=="id"){x=i;break}}END{print $x}'
It first finds the index of the 'id' column in the header row, then prints the corresponding column of the last line.
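For readers who want the same logic outside awk, here is a minimal Python sketch. It assumes the header row containing 'id' comes before the data row and that the last non-empty line is the sample to read. The function name `vmstat_idle` and the hard-coded sample text are illustrative only.

```python
def vmstat_idle(output: str) -> str:
    """Return the value in the 'id' column of the last line of vmstat output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    idx = None
    for fields in rows:
        if "id" in fields:
            # Remember where the header token 'id' sits on this platform.
            idx = fields.index("id")
    if idx is None:
        raise ValueError("no 'id' column found")
    # Read the same position in the last row, as the awk END block does.
    return rows[-1][idx]


sample = """procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 35268 117568 158244 1849104 0 0 3 11321 5 2 9 15 73 3 0"""

print(vmstat_idle(sample))
```

Because the column is looked up by name rather than by a fixed number, the same function should work on platforms where 'id' sits in a different position.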
I have a column that uses bits to record the status of every mission. The position of a bit is the mission number, and 1/0 indicates whether that mission was successful; the bits are logically independent even though they are stored together.
For instance, 1010 (stored as a decimal number) means a user finished the 2nd and 4th missions successfully. The table looks like:
uid status
a 1100
b 1111
c 1001
d 0100
e 0011
Now I need to calculate, for every mission, how many users passed it. E.g. for mission 1 it's 0+1+1+0+1 = 3, while for mission 2 it's 0+1+0+0+1 = 2.
I can use the formula FLOOR(status%POWER(10,n)/POWER(10,n-1)) to get the bit for mission n of every user, but this means I need to run my query n times, and the status is now 64 bits long...
Is there any elegant way to do this in one query? Any help is appreciated....
The obvious approach is to normalise your data:
uid mission status
a 1 0
a 2 0
a 3 1
a 4 1
b 1 1
b 2 1
b 3 1
b 4 1
c 1 1
c 2 0
c 3 0
c 4 1
d 1 0
d 2 0
d 3 1
d 4 0
e 1 1
e 2 1
e 3 0
e 4 0
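As a rough illustration of why the normalised layout helps, the following Python sketch expands the question's sample statuses into one row per user and mission and then sums per mission in a single pass. It assumes the status is available as a string of 0/1 digits with mission 1 as the rightmost digit; the names `statuses`, `normalise` and `passed` are made up for this example.

```python
from collections import Counter

# Sample data from the question; the rightmost digit is mission 1.
statuses = {"a": "1100", "b": "1111", "c": "1001", "d": "0100", "e": "0011"}


def normalise(statuses):
    """Yield one (uid, mission, status) row per user and mission."""
    for uid, digits in statuses.items():
        for mission, digit in enumerate(reversed(digits), start=1):
            yield uid, mission, int(digit)


# With one row per user and mission, the per-mission total is a plain sum.
passed = Counter()
for uid, mission, status in normalise(statuses):
    passed[mission] += status

print(dict(sorted(passed.items())))
```

In SQL the same aggregation over the normalised table would be a single GROUP BY on the mission column.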
Alternatively, you can store a bitwise integer (or just do what you're currently doing) and process the data in your application code (e.g. a bit of PHP)...
uid status
a 12
b 15
c 9
d 4
e 3
<?php
$input = 15; // value comes from a query
$missions = array(1,2,3,4); // not really necessary in this particular instance
for( $i=0; $i<4; $i++ ) {
$intbit = pow(2,$i);
if( $input & $intbit ) {
echo $missions[$i] . ' ';
}
}
?>
Outputs '1 2 3 4'
Just convert the value to a string, remove the '0's, and calculate the length. Assuming that the value really is a decimal:
select length(replace(cast(status as char), '0', '')) as num_missions
from t;
Here is a db<>fiddle using MySQL. Note that the conversion to a string might look a little different in Hive, but the idea is the same.
If it is stored as an integer, you can use the bin() function to convert the integer to a string of binary digits. This is supported in both Hive and MySQL (the original tags on the question).
Bit fiddling in databases is usually a bad idea and suggests a poor data model. Your data should have one row per user and mission. Attempts at optimizing by stuffing things into bits may work sometimes in some programming languages, but rarely in SQL.
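The string trick from this answer (convert to text, drop the '0' characters, take the length) can be sketched in Python as follows. Note that it gives the number of missions passed per user, not the number of users per mission. The function name `missions_passed` and its flag are hypothetical and only serve the illustration.

```python
def missions_passed(status: int, stored_as_decimal_digits: bool = True) -> int:
    """Count the 1 digits of a status value.

    If the value holds literal 0/1 decimal digits (e.g. 1100), use its
    decimal text; otherwise treat it as a bitwise integer (e.g. 12).
    """
    text = str(status) if stored_as_decimal_digits else bin(status)[2:]
    # Removing the zeros leaves only the ones, so the length is the count.
    return len(text.replace("0", ""))


print(missions_passed(1100))
print(missions_passed(12, stored_as_decimal_digits=False))
```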
I am trying to rename around 100 dummy variables with the values from a separate variable.
I have a variable products, which stores information on what products a company sells and have generated a dummy variable for each product using:
tab products, gen(productid)
However, the variables are named productid1, productid2 and so on. I would like these variables to take the values of the variable products instead.
Is there a way to do this in Stata without renaming each variable individually?
Edit:
Here is an example of the data that will be used. There will be duplications in the product column.
And then I have run the tab command to create a dummy variable for each product to produce the following table.
sort product
tab product, gen(productid)
I noticed it updates the labels to show what each variable represents.
What I would like to do is to assign the value to be the name of the variable such as commercial to replace productid1 and so on.
Using your example data:
clear
input companyid str10 product
1 "P2P"
2 "Retail"
3 "Commercial"
4 "CreditCard"
5 "CreditCard"
6 "EMFunds"
end
tabulate product, generate(productid)
list, abbreviate(10)
sort product
levelsof product, local(new) clean
tokenize `new'
ds productid*
local i 0
foreach var of varlist `r(varlist)' {
local ++i
rename `var' ``i''
}
Produces the desired output:
list, abbreviate(10)
+---------------------------------------------------------------------------+
| companyid product Commercial CreditCard EMFunds P2P Retail |
|---------------------------------------------------------------------------|
1. | 3 Commercial 1 0 0 0 0 |
2. | 5 CreditCard 0 1 0 0 0 |
3. | 4 CreditCard 0 1 0 0 0 |
4. | 6 EMFunds 0 0 1 0 0 |
5. | 1 P2P 0 0 0 1 0 |
6. | 2 Retail 0 0 0 0 1 |
+---------------------------------------------------------------------------+
Arbitrary strings might not be legal Stata variable names. This will happen if they (a) are too long; (b) start with any character other than a letter or an underscore; (c) contain characters other than letters, numeric digits and underscores; or (d) are identical to existing variable names. You might be better off making the strings into variable labels, where only an 80 character limit bites.
This code loops over the variables and does its best:
gen long obs = _n
foreach v of var productid? productid?? productid??? {
su obs if `v' == 1, meanonly
local tryit = product[r(min)]
capture rename `v' `=strtoname("`tryit'")'
}
Note: code not tested.
EDIT: Here is a test. I added code for variable labels. The data example and code show that repeated values and values that could not be variable names are accommodated.
clear
input str13 products
"one"
"two"
"one"
"three"
"four"
"five"
"six something"
end
tab products, gen(productsid)
gen long obs = _n
foreach v of var productsid*{
su obs if `v' == 1, meanonly
local value = products[r(min)]
local tryit = strtoname("`value'")
capture rename `v' `tryit'
if _rc == 0 capture label var `tryit' "`value'"
else label var `v' "`value'"
}
drop obs
describe
Contains data
obs: 7
vars: 7
size: 133
-------------------------------------------------------------------------------
storage display value
variable name type format label variable label
-------------------------------------------------------------------------------
products str13 %13s
five byte %8.0g five
four byte %8.0g four
one byte %8.0g one
six_something byte %8.0g six something
three byte %8.0g three
two byte %8.0g two
-------------------------------------------------------------------------------
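The naming constraints (a) to (d) listed in the answer above can be sketched as a small check. This is a Python approximation written for illustration, not Stata's strtoname(): the 32-character limit is an assumption stated here, reserved words are not handled, and `name_problems` is a made-up helper.

```python
import re

MAX_NAME_LENGTH = 32  # assumed maximum length of a Stata variable name


def name_problems(candidate, existing):
    """Return the reasons why candidate could not be used as a variable name."""
    problems = []
    if len(candidate) > MAX_NAME_LENGTH:
        problems.append("too long")
    if not re.match(r"[A-Za-z_]", candidate):
        problems.append("bad first character")
    if re.search(r"[^A-Za-z0-9_]", candidate):
        problems.append("illegal character")
    if candidate in existing:
        problems.append("already in use")
    return problems


print(name_problems("six something", {"products"}))
print(name_problems("Commercial", {"companyid", "product"}))
```

An empty list means none of the four checks fired for that string.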
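Another solution is to use the extended macro function variable label, which copies the label of a variable (here the placeholder varname) into a local macro:
local varlabel: variable label varname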
The tested code is:
clear
input companyid str10 product
1 "P2P"
2 "Retail"
3 "Commercial"
4 "CreditCard"
5 "CreditCard"
6 "EMFunds"
end
tab product, gen(product_id)
* get the list of product id variables
ds product_id*
* loop through the product id variables and change the
* variable name to its label
foreach var of varlist `r(varlist)' {
local varlabel: variable label `var'
display "`varlabel'"
local pos = strpos("`varlabel'","==")+2
local varlabel = substr("`varlabel'",`pos',.)
display "`varlabel'"
rename `var' `varlabel'
}
I'm trying to use gnuplot 4.6 patchlevel 6 to visualize some data from a file test.dat which looks like this:
#Pkg 1
type min max avg
small 1 10 5
medium 5 15 7
large 10 20 15
#Pkg 2
small 3 9 5
medium 5 13 6
large 11 17 13
(Note that the values are actually separated by tabs even though it shows as spaces here.)
My gnuplot commands are
reset
set datafile separator "\t"
plot 'test.dat' index 0 using 2:xticlabels(1) title col, '' using 3 title col, '' using 4 title col
This works fine as long as there is only a single data block in test.dat. When I add the second block spurious data points appear. Why is that and how can it be fixed?
For the record: using stats on the file yields only the expected results. It reports two data blocks for the full file, and correct values (for min, max and sum) when I select one of the two blocks using index.
As mentioned in the comment to the question, one has to explicitly repeat the index 0 specification in every part of the plot command:
plot 'test.dat' index 0 using 2, '' index 0 using 3, ...
Otherwise '' refers to all blocks in the data file.
Suppose I have a table as follows:
id name length
1 A 21.5
2 B 12.4
3 C 0
4 D 17
5 E 1
I wish to get:
id name length
1 A 21.5
5 E 1
That is, all rows whose length has an integer part ending in 1.
length is a numeric column.
It's a very simple thing to do in a programming language, but it does not seem natural in SQL. How can I do that efficiently and simply?
My only thought is to convert the field to text, drop everything after the '.', convert the rest to an array and pick the character in the last position. This would probably work, but it seems like a very bad solution.
You can use FLOOR and modulo division:
SELECT *
FROM tab
WHERE FLOOR(length) % 10 = 1;
SqlFiddleDemo
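To see why FLOOR followed by modulo 10 picks the right rows, here is the same test applied to the sample table in Python. It is only a sketch and assumes non-negative lengths, since floor and modulo can behave differently across languages for negative numbers. The helper name `integer_part_ends_in_one` is invented for the example.

```python
import math

# Sample rows from the question: (id, name, length).
rows = [(1, "A", 21.5), (2, "B", 12.4), (3, "C", 0), (4, "D", 17), (5, "E", 1)]


def integer_part_ends_in_one(length):
    """True when the last digit of the integer part of length is 1."""
    # floor drops the fractional part; % 10 keeps the last integer digit.
    return math.floor(length) % 10 == 1


selected = [row for row in rows if integer_part_ends_in_one(row[2])]
print(selected)
```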
I want to convert data into a specific format in Apache Pig so that I can use a reporting tool on top of it.
For example:
10:00,abc
10:00,cde
10:01,abc
10:01,abc
10:02,def
10:03,efg
The output should be in the following format:
abc cde def efg
10:00 1 1 0 0
10:01 2 0 0 0
10:02 0 0 1 0
The main problem here is that a value can occur multiple times in a row, depending on the different values available in the sample csv file, up to a total of 120.
Any suggestions to tackle this are more than welcome.
Thanks
Gagan
Try something like the following:
A = load 'data' using PigStorage(',') as (key:chararray,value:chararray);
B = foreach A generate key,(value=='abc'?1:0) as abc,(value=='cde'?1:0) as cde,(value=='efg'?1:0) as efg;
C = group B by key;
-- SUM adds up the 0/1 flags; COUNT would count every row, including the zeros
D = foreach C generate group as key, SUM(B.abc) as abc, SUM(B.cde) as cde, SUM(B.efg) as efg;
That should get you a count of the occurrences of each value for each key.
EDIT: I just noticed the limit of 120 in the question. If the counts must not go above 120, add the following:
E = foreach D generate key,(abc>120?'OVER 120':(chararray)abc) as abc,(cde>120?'OVER 120':(chararray)cde) as cde,(efg>120?'OVER 120':(chararray)efg) as efg;
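The Pig script hard-codes one column per value. As a point of comparison, the following Python sketch builds the same kind of pivot but discovers the columns from the data, which may be easier to reason about when there are many distinct values. It is an illustration in a different language, not a Pig solution, and the names `records` and `pivot` are assumptions.

```python
from collections import Counter, defaultdict

# Sample input from the question as (time, value) pairs.
records = [
    ("10:00", "abc"), ("10:00", "cde"), ("10:01", "abc"),
    ("10:01", "abc"), ("10:02", "def"), ("10:03", "efg"),
]


def pivot(records):
    """Return (columns, table) where table maps each key to its counts per column."""
    counts = defaultdict(Counter)
    for key, value in records:
        counts[key][value] += 1
    # Columns are the distinct values, found from the data rather than hard-coded.
    columns = sorted({value for _, value in records})
    table = {key: [counts[key][col] for col in columns] for key in sorted(counts)}
    return columns, table


columns, table = pivot(records)
print("      " + " ".join(columns))
for key, row in table.items():
    print(key, " ".join(f"{n:>3}" for n in row))
```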