Generating sequential number lists in tcsh - scripting

I've been trying to find a workaround to defining lists of sequential numbers explicitly in tcsh, i.e. instead of doing:
set i = ( 1 2 3 4 5 6 7 8 9 10 )
I would like to do something like this (knowing it doesn't work):
set i = ( 1..10 )
This would be especially useful in foreach loops (I know I can use while, just trying to look for an alternative).
Looking around I found this:
foreach $number (`seq 1 1 9`)
...
end
Found that here. They say it would generate a list of numbers starting with 1, with increments of 1, ending in 9.
I tried it, but it didn't work; apparently seq isn't a command on my system. Does it exist, or is this plain wrong?
Any other ideas?

seq certainly exists, but perhaps not on your system, since it is not in the POSIX standard. I also just noticed you have two errors in your command. Does the following work?
foreach number ( `seq 1 9` )
echo $number
end
Notice the omission of the dollar sign and the backticks around the seq command.
If that still doesn't work, you could emulate seq with awk (any POSIX awk will do):
foreach number ( `awk 'BEGIN { for (i=1; i<=9; i++) print i; exit }'` )
Update
Two more alternatives:
If your machine has no seq it might have jot (BSD/OSX):
foreach number ( `jot 9` )
I had never heard of jot before, but it looks like seq on steroids.
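For instance, jot can also take a count, a begin value, an end value, and a step (syntax per the BSD manual; worth verifying on your system, since this example is mine, not from the original answer):
foreach number ( `jot - 1 9 2` )
echo $number
end
Here the dash tells jot to infer the count from the other arguments, so this prints the odd numbers 1 3 5 7 9.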
Use bash with its built-in brace expansion (note that tcsh itself does not understand numeric ranges like {1..9}, so this means switching the loop to bash):
for number in {1..9}; do echo "$number"; done

Related

Struggling with an awk script - need help to get this done; just need your suggestion or logic

I have an SQL file and need to filter the data:
-- Edit this file by adding your SQL below each question.
-------------------------------------------------------------------------------
-------------------------------------------------------------
-- The following queries are based on the 1994 census data.
-------------------------------------------------------------
.read 1994-census-summary-1.sql
-- 4. what is the average age of people from China?
select avg(age)
from census
where native_country ='China';
-- 5. what is the average age of people from Taiwan?
select avg(age)
from census
where native_country ='Taiwan';
-- 6. which native countries have "land" in their name?
select distinct(native_country)
from census
where native_country like '%land%';
--------------------------------------------------------------------------------------
-- The following queries are based on the courses-ddl.sql and courses-small.sql data
--------------------------------------------------------------------------------------
drop table census;
.read courses-ddl.sql
.read courses-small-1.sql
-- 11. what are the names of all students who have taken some course? Don't show duplicates.
select distinct(name)
from student
where tot_cred > 0;
-- 12. what are the names of departments that offer 4-credit courses? Don't list duplicates.
select distinct(dept_name)
from course
where credits=4;
-- 13. What are the names and IDs of all students who have received an A in a computer science class?
select distinct(name), id
from student natural join takes natural join course
where dept_name="Comp. Sci." and grade="A";
If I run
./script.awk -v ID=6 file.sql
Note that the problem id is passed to the awk script as variable ID on the command line, like this:
-v ID=6
how can I get the result to look like the following?
Result:
select distinct(native_country) from census where native_country like '%land%';
With your shown samples, in GNU awk, please try the following code using its match function. Here id is an awk variable holding the value that should be looked up in the lines of your Input_file. Note that -v RS= puts awk into paragraph mode, so this assumes the queries are separated by blank lines. I have also used exit to print only the very first match and get out of the program, to save some time/cycles; if you have more than one match, simply remove it from the following code.
awk -v RS= -v id="6" '
match($0,/(\n|^)-- ([0-9]+)\.[^\n]*\n(select[^;]*;)/,arr) && arr[2]==id{
gsub(/\n/,"",arr[3])
print arr[3]
exit
}
' Input_file
One option with awk could be matching the start of the line with -- 6., where 6 is the ID.
Then move to the next line and set a flag variable seen to record that the part you want has started.
Then collect every line while seen is set.
Set seen to 0 when encountering an "empty" line.
Concatenate the lines that you want in the output into a single line, and at the end remove the trailing space.
gawk -v ID=6 '
match($0, "^-- "ID"\\.") {
seen=1
next
}
/^[[:space:]]*$/ {
seen=0
}
seen {
a = a $0 " "
}
END {
sub(/ $/, "", a)
print a
}
' file.sql
Or as a single line
gawk -v ID=6 'match($0,"^-- "ID"\\."){seen=1;next};/^[[:space:]]*$/{seen=0};seen{a=a$0" "};END{sub(/ $/,"",a);print a}' file.sql
Output
select distinct(native_country) from census where native_country like '%land%';
Another option with GNU awk is setting the record separator to an "empty" line and using a regex with a capture group to match all lines after the initial -- ID match that do not start with a space:
gawk -v ID=6 '
match($0, "\\n-- "ID"\\.[^\\n]*\\n(([^[:space:]][^\\n]*(\\n|$))*)", m) {
gsub(/\n/, " ", m[1])
print m[1]
}
' RS='^[[:space:]]*$' file

Foxpro String Variable combination in FOR loop

As in the title, there is an error in my first piece of code, in the FOR loop: "Command contains unrecognized phrase". I am wondering whether the string+variable method is wrong.
ALTER TABLE table1 ADD COLUMN prod_n c(10)
ALTER TABLE table1 ADD COLUMN prm1 n(19,2)
ALTER TABLE table1 ADD COLUMN rbon1 n(19,2)
ALTER TABLE table1 ADD COLUMN total1 n(19,2)
There are prm2... until total5, in which the numbers represent the month.
FOR i=1 TO 5
REPLACE ALL prm+i WITH amount FOR LEFT(ALLTRIM(a),1)="P" AND batch_mth = i
REPLACE ALL rbon+i WITH amount FOR LEFT(ALLTRIM(a),1)="R" AND batch_mth = i
REPLACE ALL total+i WITH sum((prm+i)+(rbon+i)) FOR batch_mth = i
ENDFOR
Thanks for the help.
There are a number of things wrong with the code you posted above. Cetin has mentioned a number of them, so I apologize if I duplicate some of them.
PROBLEM 1 - in your ALTER TABLE commands I do not see where you create fields prm2, prm3, prm4, prm5, rbon2, rbon3, etc.
And yet your FOR loop would be trying to write to those fields as the loop variable i increases from 1 to 5 - if the other parts of your code were correct.
PROBLEM 2 - You cannot concatenate a String to an Integer so as to create a Field Name like you attempt to do with prm+i or rbon+i.
Cetin's code suggestions would work (again as long as you had the #2, #3, etc. fields defined). However in Foxpro and Visual Foxpro you can generally do a task in a variety of ways.
Personally, for readability I'd approach your FOR LOOP like this:
FOR i=1 TO 5
* --- Keep in mind that unless fields #2, #3, #4, & #5 are defined ---
* --- The following will Fail ---
cFld1 = "prm" + STR(i,1) && define the 1st field
cFld2 = "rbon" + STR(i,1) && define the 2nd field
cFld3 = "total" + STR(i,1) && define the 3rd field
REPLACE ALL &cFld1 WITH amount ;
FOR LEFT(ALLTRIM(a),1)="P" AND batch_mth = i
REPLACE ALL &cFld2 WITH amount ;
FOR LEFT(ALLTRIM(a),1)="R" AND batch_mth = i
* --- compute this month's total from the two fields just replaced ---
REPLACE ALL &cFld3 WITH &cFld1 + &cFld2 ;
FOR batch_mth = i
NEXT
NOTE - it might be good if you would learn to use VFP's debug tools so that you can examine your code execution line by line in the VFP development mode, and also examine the variable values.
Breakpoints are good, but you have to already have the Trace window open for the break to work.
SET STEP ON is the debug command that I generally use, so that program execution will stop and AUTOMATICALLY open the Trace window for looking at code execution and/or variable values.
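For instance, a minimal sketch (mine, not from the original code) showing where SET STEP ON could go so that execution suspends on each pass and you can inspect the field-name variables in the Trace window:
FOR i=1 TO 5
cFld1 = "prm" + STR(i,1)
SET STEP ON && execution suspends here and the Trace window opens automatically
REPLACE ALL &cFld1 WITH amount FOR LEFT(ALLTRIM(a),1)="P" AND batch_mth = i
ENDFOR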
Do you mean you have fields named prm1, prm2, prm3 ... prm12 that represent the months, and you want to update them in a loop? If so, you need to understand that a "fieldName" is a "name", and thus you need to use a "name expression" to use it as a variable. That is:
prm+i
would NOT work, but:
( 'prm'+ ltrim(str(m.i)) )
would.
For example here is your code revised:
For i=1 To 5
Replace All ('prm'+Ltrim(Str(m.i))) With amount For Left(Alltrim(a),1)="P" And batch_mth = m.i
Replace All ('rbon'+Ltrim(Str(m.i))) With amount For Left(Alltrim(a),1)="R" And batch_mth = m.i
* ????????? REPLACE ALL ('total'+Ltrim(Str(m.i))) WITH sum((prm+i)+(rbon+i)) FOR batch_mth = i
Endfor
However, I must admit your code doesn't make sense to me. Maybe it would be better if you explained what you are trying to do and gave some simple data with the result you expect (as code - you can use FAQ 50 on foxite to create code for data).

Apache Pig STORE based on condition

I'm reading from a CSV file, and after grouping the data I'm doing a count operation. Is there any way to store the data into a folder named bad if the count is 0 and into good if the count is > 0? I tried the code below, but it is not working.
CODE :
STORE countVal INTO '/user/cloudera/good' IF countVal > 0 ;
Use the SPLIT function. Refer to:
https://pig.apache.org/docs/r0.7.0/piglatin_ref2.html#SPLIT
SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6);
There are a couple of ways to do this:
1) Use the SPLIT operator to split the data based on the criteria.
SPLIT data INTO good IF count>0, bad IF (count==0);
2) Use a FOREACH with the bincond operator to tag each row based on the criterion.
X = FOREACH A GENERATE data, (count>0 ? 'good' : 'bad');
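Putting the two together for the original question, a minimal sketch could look like this (assuming the grouped relation countVal carries a field named count; STORE itself has no IF clause, so the split has to happen first):
SPLIT countVal INTO goodRecs IF count > 0, badRecs IF count == 0;
-- each branch is stored to its own folder
STORE goodRecs INTO '/user/cloudera/good';
STORE badRecs INTO '/user/cloudera/bad';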

Perl: for (min .. max) uses random order, but I want it in order 0,1,2,...

I am a total beginner to Perl, Oracle SQL, and everything else, and I have to write a script to parse an Excel file and write the values into an Oracle SQL database.
Everything is good so far, but it writes the rows into the database in random order.
for ($row_min .. $row_max) {...insert into db code $sheetValues[$_][col0] etc...}
I don't get why the rows are inserted in a random order.
And obviously, how can I get them in order? excel_row 0 => db_row 0, and so on...
The values in the array are in order! The number of rows is dynamic.
Thanks for your help; I hope you got all the information you need.
Edit:
&parseWrite;
sub parseWrite {
my @sheetValues;
my $worksheet = $workbook->worksheet(0);
my ($row_min, $row_max) = $worksheet->row_range();
print "| Zeile $row_min bis $row_max |";
my ($col_min, $col_max) = $worksheet->col_range();
print " Spalte $col_min bis $col_max |<br>";
for my $row ($row_min .. $row_max) {
for my $col ($col_min .. $col_max) {
my $cell = $worksheet->get_cell ($row,$col);
next unless $cell;
$sheetValues[$row][$col] = $cell->value();
print $sheetValues[$row][$col] .
"(".$row."," .$col.")"."<br>";
}
}
for ($row_min .. $row_max) {
my $sql="INSERT INTO t_excel (
a,b,c,d,e
) VALUES (
'$sheetValues[$_][0]',
'$sheetValues[$_][1]',
'$sheetValues[$_][2]',
'$sheetValues[$_][3]',
'$sheetValues[$_][4]'
)";
$dbh->do($sql);
}
}
By "in order" I mean the order my PL/SQL Developer 8.0.3 (given by my company) shows with
SELECT * FROM t_excel;
(screenshot omitted)
But shell = (2,0), maggie = (0,0), and 13 = (1,0) in the array.
The rows are being inserted in the order you expect. I believe the mistaken assumption here is that SELECT will return rows in the same order they're inserted. This is not true. While implementations may make it seem like it does, SELECT has no default order. You're thinking a table is basically like a big list, INSERT is adding to the end of it, and SELECT just iterates through it. That's not a bad approximation, but it can lead you to make bad assumptions. The reality is that you can say little for sure about how a table is stored.
SQL is a declarative language, which means you tell the computer what you want. This is different from most other languages, where you tell the computer what to do. SELECT * FROM sometable says "give me all the rows and all their columns in the table". Since you didn't give an order, the database can return them in whatever order it likes. Contrast this with the procedural meaning, which would be "iterate through all the rows in the table", as if the table were some sort of list.
Most languages encourage you to take advantage of how data is stored. Declarative languages prevent you from knowing how data is stored.
If you want your SELECT to be ordered, you have to give it an ORDER BY.
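For example, a sketch only (the row_num column is hypothetical; you would have to add it to t_excel and fill it with the Excel row number in each INSERT): an explicit ordering column makes excel_row => db_row trivial.
ALTER TABLE t_excel ADD (row_num NUMBER);
-- populate row_num with $row during the INSERT loop, then:
SELECT * FROM t_excel ORDER BY row_num;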

Pig Latin - Extracting fields meeting two different filter criteria from chararray line and grouping in a bag

I am new to Pig Latin.
I want to extract all lines that match a filter criterion (contain the word "line_token") from log files, and then from these matching lines extract two different fields meeting two separate field-match criteria. Since the lines aren't structured well, I am loading them as a chararray.
When I try to run the following code, I get the error
"Invalid resource schema: bag schema must have tuple as its field"
I have tried to perform an explicit cast to a tuple, but that does not work.
input_lines = LOAD '/inputdir/' AS ( line:chararray);
filtered_lines = FILTER input_lines BY (line MATCHES '.*line_token1.*' );
tokenized_lines = FOREACH filtered_lines GENERATE FLATTEN(TOKENIZE(line)) AS tok_line;
my_wordbag = FOREACH tokenized_lines {
word1 = FILTER tok_line BY ( $0 MATCHES '.*word_token1.*' ) ;
word2 = FILTER tok_line BY ( $0 MATCHES '.*word_token2.*' ) ;
GENERATE word1 , word2 as my_tuple ;
-- I also tried --> GENERATE (word1 , word2) as my_tuple ;
}
dump my_wordbag;
I suppose I am taking a very wrong approach.
Please note - my logs aren't structured well, so I can't mend the way I load them.
Post loading and initial filtering for the lines of interest (which is straightforward), I guess I need to do something different rather than tokenize the line and iterate through the fields trying to find matches.
Or maybe I should use joins?
Also, if I knew the structure of the line beforehand to be all text fields, would loading it differently (not as a chararray) make it an easier problem?
For now I made a compromise - I added an extra filter clause to my original line filter and settled for picking just one field from the line. When I get back to it I will try with joins and post that code ... - here's my working code that gets me a useful output, but not all that I want.
-- read input lines from poorly structured log
input_lines = LOAD '/log-in-dir-in-hdfs' AS ( line:chararray) ;
-- Filter for line filter criteria and date interested in passed as arg
filtered_lines = FILTER input_lines BY (
( line MATCHES '.*line_filter1.*' )
AND ( line MATCHES '.*line_filter2.*' )
AND ( line MATCHES '.*$forDate.*' )
) ;
-- Tokenize every line
tok_lines = FOREACH filtered_lines
GENERATE TOKENIZE(line) AS tok_line;
-- Pick up specific field frm tokenized line based on column filter criteria
fnames = FOREACH tok_lines {
fname = FILTER tok_line BY ( $0 MATCHES '.*field_selection.*' ) ;
GENERATE FLATTEN(fname) as nnfname;
}
-- Count occurrences of that field and store it with the field name
-- My original intent is to store another field name as well
-- I will do that once I figure how to put both of them in a tuple
flgroup = FOREACH fnames
GENERATE FLATTEN(TOKENIZE((chararray)$0)) as cfname;
grpfnames = group flgroup by cfname;
readcounts = FOREACH grpfnames GENERATE COUNT(flgroup), group ;
STORE readcounts INTO '/out-dir-in-hdfs';
As I understand it, after the FLATTEN operation you have a single line (tok_line) in each row, and you want to extract 2 words from each line. REGEX_EXTRACT will help you achieve this (it takes the source string, the regex, and the index of the capture group to return). I'm not a regex expert, so I will leave writing the regex part up to you.
data = FOREACH tokenized_lines
GENERATE
REGEX_EXTRACT(tok_line, <first word regex goes here>, 1) as firstWord,
REGEX_EXTRACT(tok_line, <second word regex goes here>, 1) as secondWord;
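For instance, if the words you are after always appear as key=value tokens (a purely hypothetical format, just for illustration), the first call might look like REGEX_EXTRACT(tok_line, 'word_token1=(\\S+)', 1).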
I hope this helps.
You must refer to the alias, not the column.
So:
word1 = FILTER tokenized_lines BY ( $0 MATCHES '.*word_token1.*' ) ;
word1 and word2 are going to be aliases as well, not columns.
What do you need the output to look like?