How do I use Perl to parse the output of the sqlplus command? - sql

I have an SQL file which will give me an output like below:
10|1
10|2
10|3
11|2
11|4
.
.
.
I am using this in a Perl script like below:
my @tmp_cycledef = `sqlplus -s $connstr \@DLCycleState.sql`;
After this statement, @tmp_cycledef holds all the output of the SQL query.
I want to show the output as:
10 1,2,3
11 2,4
How could I do this using Perl?
EDIT:
I am using the following code:
foreach my $row (@tmp_cycledef)
{
chomp $row;
my ($cycle_code,$cycle_month)= split /\s*\|\s*/, $row;
print "$cycle_code, $cycle_month\n";
$hash{$cycle_code}{$cycle_month}=1
}
foreach my $num ( sort keys %hash )
{
my $h = $hash{$num};
print join(',',sort keys %$h),"\n";
}
the first print statement prints:
2, 1
2, 10
2, 11
2, 12
3, 1
3, 10
3, 11
but the output is always
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12
1,10,11,12

Well, this one is actually how you might do it in perl:
# two must-have pragmas for perl development
use strict;
use warnings;
Perl allows variables to be created as they are used: $feldman = some_function() means that you now have the variable $feldman in the current package. The bad part is that you can type $fldman and spend a long time finding out why what you thought was $feldman has no value. Turning on strictures means that your code fails to compile if it encounters an undeclared variable. You declare a variable with a my or our statement (or, in older Perl code, a use vars statement).
Turning on warnings alerts you when you're not getting the values you expect. Warnings often tend to be too touchy, but they are generally a good thing to develop code with.
my %hash; # the base object for the data
Here, I've declared a hash variable that I creatively called %hash. The sigil (pronounced "sijil") "%" tells you that it is a map of name-value pairs. This my statement declares the variable and makes it legal for the compiler. The compiler will now complain about any use of %hsh.
The next item is a foreach loop (which can be abbreviated "for"). The loop will process the list of lines in @tmp_cycledef, assigning each one in turn to $row (my $row).
We chomp the line first, removing the end-of-line character for that platform.
We split the line on the '|' character, creating a list of strings that had been separated by a pipe.
And then we store it in a two-layered hash, since we want to group the values by at least the first number. We could do this with an array, creating an array at the location in the hash like so: push @{$hash{$key}}, $val, but I typically want to collapse duplicates (not that there were any duplicates in your sample).
Here:
foreach my $row ( @tmp_cycledef ) {
    chomp $row; # removes the end-of-line character when present.
    my ( $key, $val ) = split /\|/, $row;
    # One of the best ways to merge lists is a presence-of idea
    # with the hash holding whether the value is present
    $hash{$key}{$val} = 1;
}
Once we have the data in the structure, we need to iterate over both levels of hash keys. You wanted to separate the "top level" numbers by lines, but you wanted the second numbers concatenated on the same line. So we print a line for each of the first numbers and join the list of strings stored for each number on the same line, delimited by commas. We also sort the list: { $a <=> $b } just takes two keys and numerically compares them, so you get a numeric order.
# If the keys were alphabetic, the default sort would do: sort keys %hash
foreach my $num ( sort { $a <=> $b } keys %hash ) {
    my $h = $hash{$num};
    print "$num ", join( ',', sort { $a <=> $b } keys %$h ), "\n";
}
As noted in the code comment, sort compares in character order by default, so for alphabetic keys you can just say sort keys %hash.
To help you out, you really need to read some of these:
strictures
warnings
perldata
perlfunc -- especially my, foreach, chomp, split, keys, sort and join
And the data structure tutorial

Use a hash of arrays to collect all the values for a single key together, then print them out:
init hash
for each line:
    parse into key|value
    append value to hash[key]
for each key in hash: # you can sort it, if needed
    print out key, list of values

If your input is sorted (as it is in the provided sample), you don't actually need to bother with the hash of arrays/hashes. The code is a bit longer, but doesn't require you to understand references and should run faster for large datasets:
#!/usr/bin/perl
use strict;
use warnings;
my @tmp_cycledef = <DATA>;
my $last_key;
my @values;
for (@tmp_cycledef) {
    chomp;
    my ($key, $val) = split '\|';
    # Seed $last_key with the first key value on the first pass
    $last_key = $key unless defined $last_key;
    # The key has changed, so it's time to print out the values associated
    # with the previous key, then reset everything for the new one
    if ($key != $last_key) {
        print "$last_key " . join(',', @values) . "\n";
        $last_key = $key;
        @values = ();
    }
    # Add the current value to the list of values for this key
    push @values, $val;
}
# Don't forget to print out the final key when you're done!
print "$last_key " . join(',', @values) . "\n";
__DATA__
10|1
10|2
10|3
11|2
11|4

Related

Awk get unique elements from array

file.txt:
INTS11:P446P&INTS11:P449P&INTS11:P518P&INTS11:P547P&INTS11:P553P
PLCH2:A1007int&PLCH1:D987int&PLCH2:P977L
I am attempting to create a hyperlink by transforming the content of a file. The hyperlink will have the following style:
somelink&gene=<gene>[&gene=<gene>]&mutation=<gene:key>[&mutation=<gene:key>]
where INTS11:P446P corresponds to gene:key for example
The problem is that I am looping over each row to create an array that contains the genes as values, and thus multiple duplicated entries can be found for the same gene.
My attempt is the following
Split on & and store in a
For each element in a, split on : and add a[i] to array b
The problem is that I don't know how to get unique values from my array. I found this question but it talks about files and not arrays like in my case.
The code:
awk '@include "join"
{
    split($0,a,"&")
    for ( i = 1; i <= length(a); i++ ) {
        split(a[i], b, ":");
        genes[i] = "&gene="b[1];
        keys[i] = "&mutation="b[1]":"b[2]
    }
    print "somelink"join(genes, 1, length(genes),SUBSEP)join(keys, 1, length(keys),SUBSEP)
    delete genes
    delete keys
}' file.txt
will output:
somelink&gene=INTS11&gene=INTS11&gene=INTS11&gene=INTS11&gene=INTS11&mutation=INTS11:P446P&mutation=INTS11:P449P&mutation=INTS11:P518P&mutation=INTS11:P547P&mutation=INTS11:P553P
somelink&gene=PLCH2&gene=PLCH1&gene=PLCH2&mutation=PLCH2:A1007int&mutation=PLCH1:D987int &mutation=PLCH2:P977L
I wish to obtain something like this (notice how many &gene= entries there are):
somelink&gene=INTS11&mutation=INTS11:P446P&INTS11:P449P&INTS11:P518P&INTS11:P547P&INTS11:P553P
somelink&gene=PLCH2&gene=PLCH1&mutation=PLCH2:A1007int&mutation=PLCH1:D987int&mutation=PLCH2:P977L
EDIT:
My problem was partly solved thanks to Pierre Francois's answer, which was to use SUBSEP. My other issue is that I want to get only unique elements from my arrays genes and keys.
Thank you.
Supposing you want to remove the spaces between the fields concatenated with awk's join function, the 4th argument you have to provide to join is the magic value SUBSEP and not an empty string "" as you did. Try:
awk '@include "join"
{
    split($0,a,"&")
    for ( i = 1; i <= length(a); i++ ) {
        split(a[i], b, ":");
        genes[i] = "&gene="b[1];
        keys[i] = "&mutation="b[1]":"b[2]
    }
    print "somelink"join(genes, 1, length(genes),SUBSEP)join(keys, 1, length(keys),SUBSEP)
    delete genes
    delete keys
}' file.txt
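For the remaining part of the question (unique elements), a common awk idiom is a lookup array that records what has already been emitted, as in seen[value]++. The following is only a sketch under some assumptions: it takes the same GENE:KEY&GENE:KEY input as file.txt, it builds the strings directly instead of going through gawk's join library, and the names make_links, seen_gene and seen_mut are illustrative and not part of the original code.

```shell
# make_links (hypothetical helper): reads GENE:KEY&GENE:KEY... lines from stdin
# or from the files given as arguments, and prints one link per input line.
make_links() {
    awk '{
        n = split($0, parts, "&")
        genes = ""; muts = ""
        for (i = 1; i <= n; i++) {
            split(parts[i], b, ":")              # b[1] is the gene name
            # append only the first time a gene or a gene:key pair is seen
            if (!seen_gene[b[1]]++)    genes = genes "&gene=" b[1]
            if (!seen_mut[parts[i]]++) muts  = muts "&mutation=" parts[i]
        }
        print "somelink" genes muts
        delete seen_gene                         # reset for the next input line
        delete seen_mut
    }' "$@"
}

printf '%s\n' 'PLCH2:A1007int&PLCH1:D987int&PLCH2:P977L' | make_links
```

For that sample line the sketch prints somelink&gene=PLCH2&gene=PLCH1&mutation=PLCH2:A1007int&mutation=PLCH1:D987int&mutation=PLCH2:P977L, so PLCH2 appears only once among the genes. Running make_links file.txt would process the whole file the same way.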

Multiply all values in a %hash and return a %hash with the same structure

I have some JSON stored in a database column that looks like this:
pokeapi=# SELECT height FROM pokeapi_pokedex WHERE species = 'Ninetales';
-[ RECORD 1 ]------------------------------------------
height | {"default": {"feet": "6'07\"", "meters": 2.0}}
As part of a 'generation' algorithm I'm working on I'd like to take this value into a %hash, multiply it by (0.9..1.1).rand (to allow for a 'natural' 10% variance in the height), and then create a new %hash in the same structure. My select-height method looks like this:
method select-height(:$species, :$form = 'default') {
my %heights = $.data-source.get-height(:$species, :$form);
my %height = %heights * (0.9..1.1).rand;
say %height;
}
Which actually calls my get-height routine to get the 'average' heights (in both metric and imperial) for that species.
method get-height (:$species, :$form) {
my $query = dbh.prepare(qq:to/STATEMENT/);
SELECT height FROM pokeapi_pokedex WHERE species = ?;
STATEMENT
$query.execute($species);
my %height = from-json($query.row);
my %heights = self.values-or-defaults(%height, $form);
return %heights;
}
However, I get the following error on execution (I assume because I'm trying to multiply the hash as a whole rather than the individual elements of the hash):
$ perl6 -I lib/ examples/height-weight.p6
{feet => 6'07", meters => 2}
Odd number of elements found where hash initializer expected:
Only saw: 1.8693857987465123e0
in method select-height at /home/kane/Projects/kawaii/p6-pokeapi/lib/Pokeapi/Pokemon/Generator.pm6 (Pokeapi::Pokemon::Generator) line 22
in block <unit> at examples/height-weight.p6 line 7
Is there an easier (and working) way of doing this without duplicating my code for each element? :)
Firstly, there is an issue with the logic of your code. Initially, you get a hash of values, "feet": "6'07\"", "meters": 2.0, parsed out of JSON, with meters being a number and feet being a string. Next, you try to multiply it by a random value. While that will work for a number, it won't for a string. Perl 6 allomorphs allow you to do that, actually: say "5" * 3 will print 15, but the X'Y" pattern is too complex for Perl 6 to understand naturally.
So you likely need to convert it before processing, and to convert it back afterwards.
The second thing is the exact line that leads to the error you are observing.
Consider this:
my %a = a => 5;
%a = %a * 10 => 5; # %a becomes a hash with a single value of 10 => 5
# It happens because when a Hash is used in math ops, its size is used as a value
# Thus, if you have a single value, it'll become 1 * 10, thus 10
# And for %a = a => 1, b => 2; %a * 5 will be evaluated to 10
%a = %a * 10; # error, the key is passed, but not a value
To work directly on hash values, you want to use map method and process every pair, for example: %a .= map({ .key => .value * (0.9..1.1).rand }).
Of course, it can be golfed or written in another manner, but the main issue is resolved this way.
You've accepted @Takao's answer. That solution requires manually digging into %hash to get to leaf hashes/lists and then applying map.
Given that your question's title mentions "return ... same structure" and the body includes what looks like a nested structure, I think it's important there's an answer providing some idiomatic solutions for automatically descending into and duplicating a nested structure:
my %hash = :a{:b{:c,:d}}
say my %new-hash = %hash».&{ (0.9 .. 1.1) .rand }
# {a => {b => {c => 1.0476391741359872, d => 0.963626602773474}}}
# Update leaf values of original `%hash` in-place:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
# Same effect:
%hash »*=» (0.9..1.1).rand;
# Same effect:
%hash.deepmap: { $_ = (0.9..1.1).rand }
Hyperops (eg ») iterate one or two data structures to get to their leaves and then apply the op being hypered:
say %hash».++ # in-place increment leaf values of `%hash` even if nested
.&{ ... } calls the closure in braces using method call syntax. Combining this with a hyperop one can write:
%hash».&{ $_ = (0.9 .. 1.1) .rand }
Another option is .deepmap:
%hash.deepmap: { $_ = (0.9..1.1).rand }
A key difference between hyperops and deepmap is that the compiler is allowed to iterate data structures and run hyperoperations in parallel in any order whereas deepmap iteration always occurs sequentially.

Perl6 split function adding extra elements to array

my @r = split("", "hi");
say @r.elems;
--> output: 4
split is adding two extra elements to the array, one at the beginning and another at the end.
I have to do shift and pop after every split to correct for this.
Is there a better way to split a string?
If you're splitting on the empty string, you will get an empty element at the start and the end of the returned list as there is also an empty string before and after the string.
What you want is .comb without parameters, written out completely functionally:
"hi".comb.elems.say; # 2
See https://docs.raku.org/routine/comb#(Str)_routine_comb for more info.
The reason for this is that when you use an empty Str “” as the delimiter, it is the same as if you had used the regex /<|wb>/, which matches next to characters. So it also matches before the first character and after the last character. Perl 5 removes these “extra” strings for you in this case (and in this case only), which is likely where the confusion lies.
What Perl 6 does instead is allow you to explicitly :skip-empty values
'hi'.split('') :skip-empty
'hi'.split('', :skip-empty)
split("", "hi") :skip-empty
split("", "hi", :skip-empty)
Or to specify what you actually want
'hi'.comb( /./ )
'hi'.comb( 1 )
'hi'.comb
comb( /./, 'hi' )
comb( 1, 'hi' )

Why is assignment to a list of variables inconsistent?

To save 2 values from a list returned by a sub and throw the third away, one can write:
(my $first, my $second) = (1, 2, 3);
print $first, "\n";
print $second, "\n";
exit 0;
and it works as expected (in both perl5 and perl6). If you want just the first, however:
(my $first) = (1, 2, 3);
print $first, "\n";
exit 0;
... you get the whole list. This seems counter-intuitive - why the inconsistency?
This should be due to the single argument rule. You get the expected behaviour by adding a trailing ,:
(my $first,) = (1, 2, 3);
Note that while this works as declarations return containers, which are first-class objects that can be put in lists, you're nevertheless doing it 'wrong':
The assignments should read
my ($first, $second) = (1, 2, 3);
and
my ($first) = (1, 2, 3);
Also note that the parens on the right-hand side are superfluous as well (it's the comma that does list construction); the more idiomatic versions would be
my ($first, $second) = 1, 2, 3;
and
my ($first) = 1, 2, 3;
(my $first, ) = (1,2,3);
dd $first; # OUTPUT«Int $first = 1␤»
In your first example you assign a list (or a part thereof) to a list of containers. Your second example does exactly what you ask it to do: a list of values is assigned to one container. In Perl 5, the list is constructed by the parentheses (in this case), whereas in Perl 6 the list is constructed by the comma. The latter is used in my example to get what is asked for.
I would argue that it's Perl 5 that is inconsistent as sometimes lists are constructed by commas, parentheses or brackets.
my ($first,$,$third) = (1,2,3);
dd $first, $third; # OUTPUT«Int $first = 1␤Int $third = 3␤»
You can skip one or many list elements by adding anonymous state variables. This also leads to a shortcut to your first example.
my $first,$ = 1,2,3;
dd $first; # OUTPUT«Any $first = Any␤»

How can I do a SQL like group by in AWK? Can I calculate aggregates for different columns?

I would like to run splits on csv files in unix and run aggregates on some columns. I want to group by on several columns if possible on each of the split up files using awk.
Does anyone know some unix magic that can do this?
here is a sample file:
customer_id,location,house_hold_type,employed,income
123,Florida,Head,true,100000
124,NJ,NoHead,false,0
125,Florida,NoHead,true,120000
126,Florida,Head,true,72000
127,NJ,Head,false,0
I want to get counts grouping on location, house_hold_type as well as AVG(income) for the same group by conditions.
How can I split a file and run awk with this?
This is the output I expect. The format could be different, but this is the overall data structure I am expecting. I will humbly accept other ways of presenting the information:
location:[counts:['Florida':3, 'NJ':2], income_avgs:['Florida':97333, 'NJ':0]]
house_hold_type:[counts:['Head':3, 'NoHead':2], income_avgs:['Head':57333, 'NoHead':60000]]
Thank you in advance.
awk deals best with columns of data, so the input format is fine. The output format could be managed, but it will be much simpler to output it in columns as well:
#set the input and output field separators to comma
BEGIN {
    FS = ",";
    OFS = FS;
}
#skip the header row
NR == 1 {
    next;
}
#for all remaining rows, store counters and sums for each group
{
    count[$2,$3]++;
    sum[$2,$3] += $5;
}
#after all data, display the aggregates
END {
    print "location", "house_hold_type", "count", "avg_income";
    #for every key we encountered
    for(i in count) {
        #split the key back into "location" and "house_hold_type"
        split(i,a,SUBSEP);
        print a[1], a[2], count[i], sum[i] / count[i];
    }
}
Sample input:
customer_id,location,house_hold_type,employed,income
123,Florida,Head,true,100000
124,NJ,NoHead,false,0
125,Florida,NoHead,true,120000
126,Florida,Head,true,72000
127,NJ,Head,false,0
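and output (awk does not guarantee the order of for (i in count), so the rows may come out in a different order):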
location,house_hold_type,count,avg_income
Florida,Head,2,86000
Florida,NoHead,1,120000
NJ,NoHead,1,0
NJ,Head,1,0
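The question also asked for counts and averages per single column (by location alone, and by house_hold_type alone). A minimal sketch of that variant follows, assuming the same CSV layout with a header row and income in column 5. The helper name group_by_col is hypothetical, the grouping column is passed in with awk's standard -v option, and the result is piped through sort because the order of for (k in count) is unspecified.

```shell
# group_by_col (hypothetical helper): $1 is the 1-based column to group on;
# a CSV with a header row is read from stdin. Income is assumed to be column 5.
group_by_col() {
    awk -F, -v col="$1" '
        NR == 1 { next }                         # skip the header row
        { count[$col]++; sum[$col] += $5 }       # per-group counter and income sum
        END {
            for (k in count)
                printf "%s,%d,%.0f\n", k, count[k], sum[k] / count[k]
        }' | sort
}

group_by_col 2 <<'EOF'
customer_id,location,house_hold_type,employed,income
123,Florida,Head,true,100000
124,NJ,NoHead,false,0
125,Florida,NoHead,true,120000
126,Florida,Head,true,72000
127,NJ,Head,false,0
EOF
```

With the sample data this prints Florida,3,97333 and NJ,2,0. Calling group_by_col 3 on the same input groups by house_hold_type instead and gives Head,3,57333 and NoHead,2,60000.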