Delete records from a file using Jython

How can I delete n lines from the top (top = 2) and from the bottom (bottom = 1) of a file using Jython? The data is saved in sample.txt, and the file size might be in the MB/GB range.
sample.txt contains the lines below:
1,a
2,b
3,c
4,d
5,e
6,f
Expected output:
3,c
4,d
5,e
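
A minimal sketch of one way to do this in plain Python (it also runs under Jython 2.x), assuming there is room for a temporary copy next to the original file: it streams the input instead of loading it into memory, skips the first top lines, holds the last bottom lines back in a small buffer, and finally replaces the original with the trimmed copy. The function name strip_lines and the .tmp suffix are just illustrative choices, not anything from the question.

import os
from collections import deque

def strip_lines(path, top=2, bottom=1):
    tmp_path = path + '.tmp'          # temporary copy written next to the input
    src = open(path, 'r')
    dst = open(tmp_path, 'w')
    try:
        # Skip the first `top` lines without reading the whole file into memory.
        for _ in range(top):
            src.readline()
        # Hold back the last `bottom` lines in a rolling buffer; any line pushed
        # out of the buffer cannot be one of the last `bottom`, so write it out.
        tail = deque()
        for line in src:
            tail.append(line)
            if len(tail) > bottom:
                dst.write(tail.popleft())
    finally:
        src.close()
        dst.close()
    os.rename(tmp_path, path)         # replace the original with the trimmed copy

strip_lines('sample.txt', top=2, bottom=1)

Run against the sample above, this leaves exactly the three middle records (3,c, 4,d, 5,e).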

Related

How to display the date of each file as the first element of each line with bash/awk?

I have 7 txt files which are the output of the df -m command on AIX 7.2.
I need to keep only the first column and the second column for one filesystem, so I do this:
cat *.txt | grep hd4 | awk '{print $1","$2}' > test1.txt
And the output is:
/dev/hd4,384.00
/dev/hd4,394.00
/dev/hd4,354.00
/dev/hd4,384.00
/dev/hd4,484.00
/dev/hd4,324.00
/dev/hd4,384.00
Each file is created from the crontab and the filenames are:
df_command-2019-09-03-12:50:00.txt
df_command-2019-08-28-12:59:00.txt
df_command-2019-08-29-12:51:00.txt
df_command-2019-08-30-12:52:00.txt
df_command-2019-08-31-12:53:00.txt
df_command-2019-09-01-12:54:00.txt
df_command-2019-09-02-12:55:00.txt
I would like to keep only the date from the filename; I'm able to do that:
test=df_command-2019-09-03-12:50:00.txt
echo $test | cut -d'-' -f2,3,4
Output:
2019-09-03
But I would like to put each date as the first element of each line of my test1.txt:
2019-08-28,/dev/hd4,384.00
2019-08-29,/dev/hd4,394.00
2019-08-30,/dev/hd4,354.00
2019-08-31,/dev/hd4,384.00
2019-09-01,/dev/hd4,484.00
2019-09-02,/dev/hd4,324.00
2019-09-03,/dev/hd4,384.00
Do you have any idea how to do that?
This awk may do:
awk '/hd4/ {split(FILENAME,a,"-");print a[2]"-"a[3]"-"a[4]","$1","$2}' *.txt > test1.txt
/hd4/ finds lines containing hd4
split(FILENAME,a,"-") splits the filename into array a on -
print a[2]"-"a[3]"-"a[4]","$1","$2 prints year-month-day, field 1 and field 2
> test1.txt redirects the output to test1.txt
Date output file: dates.txt
2019-08-20
2019-08-08
2019-08-01
File system data: fsys.txt
/dev/hd4,384.00
/dev/hd4,394.00
/dev/hd4,354.00
paste can be used to append the files as columns. Use -d to specify comma as the separator.
paste -d ',' dates.txt fsys.txt

Line count within text files having multi-line and single-line records

I am using the UTL_FILE utility in Oracle to get the data into a CSV file, using a script, so I am getting a set of text files.
Case 1:
A sample of the output in the test1.csv file is:
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"
Now I am counting the number of records in test1.csv with the Linux command:
egrep -c "^\"[0-9]" test1.csv
Here I am getting the record count as:
2 (according to Linux)
But if I calculate the number of records by using select * from test; I get:
COUNT(*)
---------- (according to the database)
2
Case 2:
A sample of the output in the test2.csv file is:
"sno","name","p"
"","",""
"","","ramesh is in USA"
"","",""
Now I am counting the number of records in test2.csv with the Linux command:
egrep -c "^\"[0-9]" test2.csv
Here I am getting the record count as:
0 (according to Linux)
But if I calculate the number of records by using select * from test; I get:
COUNT(*)
---------- (according to the database)
2
Can anybody help me count the exact number of records in case 1 and case 2 using a single command?
Thanks in advance.
The columns in the two cases are different, so to make it generic I wrote a Perl script that prints the row count. It builds a regex from the header line and uses it to count the rows; I assumed that the first line always defines the number of columns.
#!/usr/bin/perl -w
open(FH, $ARGV[0]) or die "Failed to open file";
# Get the columns from the HEADER and use them to construct the regex
my $head = <FH>;
my @col = split(",", $head);    # columns array
my $col_cnt = scalar(@col);     # column count
# Read the rest of the rows
my $rows;
while(<FH>) {
    $rows .= $_;
}
# Create the regex based on the number of columns.
# E.g. for 3 columns the regex should be
# ".*?",".*?",".*?"
# which matches anything between " and "
my $i = 0;
while($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);
# /s to treat the data as a single line
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt);
Just store it as row_count.pl and run it as ./row_count.pl filename
egrep -c test1.csv doesn't have a search term to match against, so it will try to use test1.csv itself as the regular expression to search for. I have no idea how you managed to get it to return 2 for your first example.
A usable egrep command that will actually produce the number of records in the files is egrep '"[[:digit:]]*"' test1.csv, assuming your examples are accurate.
timp@helez:~/tmp$ cat test.txt
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"
timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test.txt
2
timp@helez:~/tmp$ cat test2.txt
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"
timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test2.txt
2
Alternatively you might do better to add an extra value to your SELECT statement. Something like SELECT 'recmatch.,.,',sno,name FROM TABLE; instead of SELECT sno,name FROM TABLE; and then grep for recmatch.,., though that's something of a hack.
In your second example your lines do not start with " followed by a number, which is why the count is 0. You can try egrep -c "^\"([0-9]|\")" to catch empty first-column values. But in fact it might be simpler to count all lines and subtract 1 for the header row.
e.g.
count=$(( $(wc -l < test.csv) - 1 ))

Manipulating the awk output depending on the number of occurrences

I don't know how to word it well. I have an input file where the first column of each row is an index. I need to convert this input file into a multi-column output file so that the blocks of rows sharing an index sit side by side, each block starting on the first output line.
I have an input file in the following format:
1 11.32 12.55
1 13.32 17.55
1 56.77 33.22
2 34.22 1.112
3 12.13 13.14
3 12.55 34.55
3 22.44 12.33
3 44.32 77.44
The expected output should be:
1 11.32 12.55 2 34.22 1.112 3 12.13 13.14
1 13.32 17.55 3 12.55 34.55
1 56.77 33.22 3 22.44 12.33
3 44.32 77.44
Is there an easy way I can do this in awk?
Something like this, in bash:
paste <(grep '^1 ' input.txt) <(grep '^2 ' input.txt) <(grep '^3 ' input.txt)
paste has an option to set the delimiter if you don't want the default tab characters used, or you could post-process the tabs with expand...
EDIT: For an input file with many more tags, you could take this sort of approach:
awk '{print > ("/tmp/output" $1 ".txt")}' input.txt
paste /tmp/output*.txt > final-output.txt
The awk line outputs each line to a file named after the first field of the line, then paste recombines them.
EDIT: as pointed out in a comment below, you might have issues if you end up with more than 9 intermediate files. One way around that would be something like this:
paste /tmp/output[0-9].txt /tmp/output[0-9][0-9].txt > final-output.txt
Add additional arguments as needed if you have more than 99 files... or more than 999... If that's the case, though, a python or perl solution might be a better route...
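As a rough illustration of that Python route (not part of the original answer; input.txt and final-output.txt are placeholder names), the sketch below groups rows by their first field and writes the groups side by side, so no intermediate files and no glob-ordering issues are involved:

from collections import OrderedDict

groups = OrderedDict()                      # keeps the tags in first-seen order
with open('input.txt') as f:
    for line in f:
        if not line.strip():
            continue                        # ignore blank lines
        groups.setdefault(line.split()[0], []).append(line.rstrip('\n'))

columns = list(groups.values())
height = max(len(col) for col in columns)   # the largest group sets the row count
with open('final-output.txt', 'w') as out:
    for i in range(height):
        row = [col[i] if i < len(col) else '' for col in columns]
        out.write('\t'.join(row) + '\n')    # empty cells mimic paste's behaviour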
If all you need is independently running columns (without trying to line up matching items between the columns or anything like that) then the simplest solution might be something like:
awk '{print > ($1 ".OUT")}' FILE; paste 1.OUT 2.OUT 3.OUT
The only issue with that is it won't fill in missing columns so you will need to fill those in yourself to line up your columns.
If the column width is known in advance (and the same for every column) then using:
paste 1.OUT 2.OUT 3.OUT | sed -e 's/^\t/ \t/;s/\t\t/\t \t/'
where those spaces are the width of the column should get you what you want. I feel like there should be a way to do this in a more automated fashion but can't think of one offhand.

awk: how to delete first and last value on entire column

I have data that is comprised of several columns. In one column I would like to delete the two commas located at the beginning and at the end of the value. My data looks something like this:
a ,3,4,3,2,
b ,3,4,5,1,
c ,1,5,2,4,5,
d ,3,6,24,62,3,54,
Can someone teach me how to delete the first and last commas on this data? I would appreciate it.
$ awk '{gsub(/^,|,$/,"",$NF)}1' file
a 3,4,3,2
b 3,4,5,1
c 1,5,2,4,5
d 3,6,24,62,3,54
awk '{sub(/,/,""); print substr($0,1,length($0)-1)}' input.txt
Output:
a 3,4,3,2
b 3,4,5,1
c 1,5,2,4,5
d 3,6,24,62,3,54
You can do it with sed too:
sed -e 's/,//' -e 's/,$//' file
That says "substitute the first comma on the line with nothing" and then "substitute a comma followed by end of line with nothing".
If you want it to write a new file, do this:
sed -e 's/,//' -e 's/,$//' file > newfile.txt