line count within text files having multiple lines and single lines - sql

I am using the UTL_FILE utility in Oracle to spool data into CSV files with a script, so I end up with a set of text files.
Case 1:
A sample of the output in the test1.csv file is:
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"
Now I count the number of records in test1.csv with the following Linux command:
egrep -c "^\"[0-9]" test1.csv
This gives a record count of
2 (according to Linux)
but if I count the records in the database with select count(*) from test; I get
COUNT(*)
---------- (according to the database)
2
Case 2:
A sample of the output in the test2.csv file is:
"sno","name","p"
"","",""
"","","ramesh is in USA"
"","",""
Now I count the number of records in test2.csv with the same Linux command:
egrep -c "^\"[0-9]" test2.csv
This gives a record count of
0 (according to Linux)
but if I count the records in the database with select count(*) from test; I get
COUNT(*)
---------- (according to the database)
2
Can anybody help me count the exact number of records in case 1 and case 2 with a single command?
Thanks in advance.

The columns in the two cases are different. To make it generic I wrote a Perl script that prints the number of rows. It builds a regex from the header line and uses it to count the rows; I assumed that the first line always gives the number of columns.
#!/usr/bin/perl -w
open(FH, $ARGV[0]) or die "Failed to open file";

# Get columns from the HEADER and use them to construct the regex
my $head = <FH>;
my @col = split(",", $head);   # columns array
my $col_cnt = scalar(@col);    # column count

# Read the rest of the rows into one string
my $rows;
while (<FH>) {
    $rows .= $_;
}

# Create the regex based on the number of columns.
# E.g. for 3 columns the regex should be
# ".*?",".*?",".*?"
# i.e. it matches anything between " and "
my $i = 0;
while ($i < $col_cnt) {
    $col[$i++] = "\".*?\"";
}
my $regex = join(",", @col);

# /s to treat the data as a single line
# /g for global matching
my @row_cnt = $rows =~ m/($regex)/sg;
print "Row count:" . scalar(@row_cnt);
Just store it as row_count.pl and run it as ./row_count.pl filename
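For example, against the test1.csv sample shown at the top, the expected output (not from an actual run, just what the regex match should give for that data) is:
$ ./row_count.pl test1.csv
Row count:2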

egrep -c test1.csv doesn't have a search term to match for, so it's going to try to use test1.csv as the regular expression it tries to search for. I have no idea how you managed to get it to return 2 for your first example.
A usable egrep command that will actually produce the number of records in the files is egrep '"[[:digit:]]*"' test1.csv, assuming your examples are accurate.
timp@helez:~/tmp$ cat test.txt
"sno","name"
"1","hari is in singapore
ramesh is in USA"
"2","pong is in chaina
chang is in malaysia
vilet is in uk"
timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test.txt
2
timp@helez:~/tmp$ cat test2.txt
"sno","name"
"1","hari is in singapore"
"2","ramesh is in USA"
timp@helez:~/tmp$ egrep -c '"[[:digit:]]*"' test2.txt
2
Alternatively you might do better to add an extra value to your SELECT statement. Something like SELECT 'recmatch.,.,',sno,name FROM TABLE; instead of SELECT sno,name FROM TABLE; and then grep for recmatch.,., though that's something of a hack.
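With that marker column in place, counting records on the Linux side becomes a plain pattern match; a rough sketch (assuming the marker string never shows up in the data itself or in the header line):
egrep -c 'recmatch\.,\.,' test1.csv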

In your second example the lines do not start with " followed by a number, which is why the count is 0. You can try egrep -c "^\"([0-9]|\")" test2.csv to also catch rows whose first column is empty. But in fact it might be simpler to count all the lines and subtract 1 for the header row.
e.g.
count=$(( $(wc -l < test.csv) - 1 ))
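That only covers the second case, where every record sits on a single line. If you want one command that handles both samples, a rough sketch (my assumption: continuation lines of a multi-line field never start with a double quote, which holds for the data shown) is:
awk 'NR > 1 && /^"/ { n++ } END { print n+0 }' test.csv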

Related

Sed/Awk: how to find and remove two lines if a pattern in the first line is being repeated; bash

I am processing text files with thousands of records per file. Each record is made up of two lines: a header that starts with ">", followed by a line with a long string of the characters "-AGTCNR". The header has 10 fields separated by "|", whose first field is a unique identifier for each record, e.g. ">KEN096-15", and a record is termed a duplicate if it has the same identifier. Here is what a simple record looks like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTAGGAAATTGATTAGTACCTTTAATATT----CCGAAT---
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTTTCCT----TAAATAAT-----
Now I am trying to delete the repeats, such as the duplicate records of "ACRJP458-10" and "PMANL2431-12".
Using a bash script I have extracted the unique identifiers and stored the repeated ones in a variable "$duplicate_headers". Currently, I am trying to find any repeated instances of their two-line records and delete them as follows:
for i in "$@"
do
    unset duplicate_headers
    duplicate_headers=`grep ">" $1 | awk 'BEGIN { FS="|"}; {print $1 "\n"; }' | sort | uniq -d`
    for header in `echo -e "${duplicate_headers}"`
    do
        sed -i "/^.*\b${header}\b.*$/,+1 2d" $i
        #sed -i "s/^.*\b${header}\b.*$//,+1 2g" $i
        #sed -i "/^.*\b${header}\b.*$/{$!N; s/.*//2g; }" $i
    done
done
The final result (with thousands of records in mind) will look like:
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
$ awk -F'[|]' 'NR%2{f=seen[$1]++} !f' file
>ACML500-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_-2
----TAAGATTTTGACTTCTTCCCCCATCATCAAGAAGAATTGT-------
>ACRJP458-10|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----------TCCCTTTAATACTAGGAGCCCCTGACATAGCCTTTCCTAAATAAT-----
>ASILO303-17|Dip|gs-Par|sp-Par vid|subsp-NA|co
-------TAAGATTCTGATTACTCCCCCCCTCTCTAACTCTTCTTCTTCTATAGTAGATG
>ASILO326-17|Dip|gs-Goe|sp-Goe par|subsp-NA|c
TAAGATTTTGATTATTACCCCCTTCATTAACCAGGAACAGGATGA---------------
>CLT100-09|Lep|gs-Col|sp-Col elg|subsp-NA|co-Buru
AACATTATATTTGGAATTT-------GATCAGGAATAGTCGGAACTTCTCTGAA------
>PMANL2431-12|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_
----ATGCCTATTATAATTGGAGGATTTGGAAAACCTTTAATATT----CCGAAT
>STBOD057-09|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
ATCTAATATTGCACATAGAGGAACCTCNGTATTTTTTCTCTCCATCT------TTAG
>TBBUT582-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
-----CCCCCTCATTAACATTACTAAGTTGAAAATGGAGCAGGAACAGGATGA
>TBBUT583-11|Lep|gs-NA|sp-NA|subsp-NA|co-Buru|site-NA|lat_N
TAAGATTTTGACTCATTAA----------------AATGGAGCAGGAACAGGATGA
>AFBTB001-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGCTCCATCC-------------TAGAAAGAGGGG---------GGGTGA
>AFBTB003-09|Col|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
TAAGATTTTGACTTCTGC------CATGAGAAAGA-------------AGGGTGA
>AFBTB002-09|Cole|gs-NA|sp-NA|subsp-NA|co-Ethi|site-NA|lat_N
-------TCTTCTGCTCAT-------GGGGCAGGAACAGGG----------TGA
To run it on multiple files at once, use this to remove duplicates across all of the files:
awk -F'[|]' 'FNR%2{f=seen[$1]++} !f' *
or this to only remove duplicates within each file:
awk -F'[|]' 'FNR==1{delete seen} FNR%2{f=seen[$1]++} !f' *
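For anyone who finds the one-liners terse, here is the same program spread over several lines with comments (identical logic, nothing added):
awk -F'[|]' '
    FNR % 2 {              # odd lines within each file are the ">" header lines
        f = seen[$1]++     # f = how many times this identifier (field 1) was seen before
    }
    !f                     # print the line while f is 0, i.e. a header seen for
                           # the first time and the sequence line that follows it
' *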

What did I do wrong? Not sorting properly with awk

Hi, so basically I have a 'temp' text file containing a long list of email addresses (with some repeats). What I'm trying to output is the email addresses in order of highest frequency, followed by the total number of unique email addresses at the end.
awk '{printf "%s %s\n", $2, $1} END {print "total "NR}' temp | sort -n | uniq -c -i
So far I got the output I wanted except for the fact that it's not ordered in terms of highest frequency. Instead, it's in alphabetical order.
I've been stuck on this for a few hours now and have no idea why. I know I probably did something wrong but I'm not sure. Please let me know if you need more information and if the code I provided was not the problem. Thank you in advance.
edit: I've also tried doing sort -nk1 (output has frequency in first column) and even -nk2
edit2: Here is a sample of my 'temp' file
aol.com
netscape.net
yahoo.com
yahoo.com
adelphia.net
twcny.rr.com
charter.net
yahoo.com
edit 3:
expected output:
33 aol.com
24 netscape.net
18 yahoo.com
5 adelphia.net
4 twcny.rr.com
3 charter.net
total 6
(no repeat emails, 6 total unique email addresses)
Sample input modified to include an email with two instances
$ cat ip.txt
aol.com
netscape.net
yahoo.com
yahoo.com
adelphia.net
twcny.rr.com
netscape.net
charter.net
yahoo.com
Using perl
$ perl -lne '
$c++ if !$h{$_}++;
END
{
@k = sort { $h{$b} <=> $h{$a} } keys %h;
print "$h{$_} $_" foreach (@k);
print "total ", $c;
}' ip.txt
3 yahoo.com
2 netscape.net
1 adelphia.net
1 charter.net
1 aol.com
1 twcny.rr.com
total 6
$c++ if !$h{$_}++ increments the counter only for input lines not seen before, and increments the hash value keyed by the input line. The default initial value is 0 for both.
After processing all input lines:
@k = sort { $h{$b} <=> $h{$a} } keys %h gets the keys sorted by descending numeric hash value
print "$h{$_} $_" foreach (@k) prints each hash value and key, following the sorted keys @k
print "total ", $c prints the total number of unique lines
Can be written in single line if preferred:
perl -lne '$c++ if !$h{$_}++; END{@k = sort { $h{$b} <=> $h{$a} } keys %h; print "$h{$_} $_" foreach (@k); print "total ", $c}' ip.txt
Reference: How to sort perl hash on values and order the keys correspondingly
In GNU awk, using @Sundeep's data:
$ cat program.awk
{ a[$0]++ }                                  # count domains
END {
    PROCINFO["sorted_in"] = "@val_num_desc"  # iterate the for loop in descending numeric value order
    for (i in a) {                           # this runs in desc order
        print a[i], i
        j++                                  # count total unique domains
    }
    print "total", j
}
Run it:
$ awk -f program.awk ip.txt
3 yahoo.com
2 netscape.net
1 twcny.rr.com
1 aol.com
1 adelphia.net
1 charter.net
total 6
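Note that PROCINFO["sorted_in"] is specific to GNU awk. With a plain POSIX awk, one rough workaround (my own sketch, not part of the original answer) is to drop the in-awk ordering and pipe through sort instead; the "total" line ends up last because sort -rn treats it as 0:
awk '{ a[$0]++ } END { for (i in a) { print a[i], i; j++ } print "total", j }' ip.txt | sort -rn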
Updated / Summary
Summarising a few tested approaches here for this handy sorting tool:
Using bash (In my case v4.3.46)
sortedfile="$(sort temp)" ; countedfile="$(uniq -c <<< "$sortedfile")" ; uniquefile="$(sort -rn <<< "$countedfile")" ; totalunique="$(wc -l <<< "$uniquefile")" ; echo -e "$uniquefile\nTotal: $totalunique"
Using sh/ash/busybox (Though they aren't all the same, they all worked the same for these tests)
time (sort temp > /tmp/sortedfile ; uniq -c /tmp/sortedfile > /tmp/countedfile ; sort -rn /tmp/countedfile > /tmp/uniquefile ; totalunique="$(cat /tmp/uniquefile | wc -l)" ; cat /tmp/uniquefile ; echo "Total: $totalunique")
Using perl (see this answer https://stackoverflow.com/a/40145395/3544399)
perl -lne '$c++ if !$h{$_}++; END{@k = sort { $h{$b} <=> $h{$a} } keys %h; print "$h{$_} $_" foreach (@k); print "Total: ", $c}' temp
What was tested
A file temp was created using a random generator:
@domain.com was different in the unique addresses
Duplicated addresses were scattered
File had 55304 total addresses
File has 17012 duplicate addresses
A small sample of the file looks like this:
24187@9674.com
29397@13000.com
18398@27118.com
23889@7053.com
24501@7413.com
9102@4788.com
16218@20729.com
991@21800.com
4718@19033.com
22504@28021.com
Performance:
For the sake of completeness it's worth mentioning the performance;
            perl:           sh:             bash:
Total:      17012           17012           17012
real        0m0.119s        0m0.838s        0m0.973s
user        0m0.061s        0m0.772s        0m0.894s
sys         0m0.027s        0m0.025s        0m0.056s
Original Answer (Counted total addresses and not unique addresses):
tcount="$(cat temp | wc -l)" ; sort temp | uniq -c -i | sort -rn ; echo "Total: $tcount"
tcount="$(cat temp | wc -l)": Make Variable with line count
sort temp: Group email addresses ready for uniq
uniq -c -i: Count occurrences allowing for case variation
sort -rn: Sort according to numerical occurrences and reverse the order (highest on top)
echo "Total: $tcount": Show the total addresses at the bottom
Sample temp file:
john@domain.com
john@domain.com
donald@domain.com
john@domain.com
sam@domain.com
sam@domain.com
bill@domain.com
john@domain.com
larry@domain.com
sam@domain.com
larry@domain.com
larry@domain.com
john@domain.com
Sample Output:
5 john@domain.com
3 sam@domain.com
3 larry@domain.com
1 donald@domain.com
1 bill@domain.com
Total: 13
Edit: See comments below regarding use of sort

How to check whether two strings from a file match strings in another file using AWK?

I have two huge files and I need to count how many entries of file 1 exist in file 2.
File 1 contains two ids per line, source and destination, like below:
11111111111111|22222222222222
33333333333333|44444444444444
55555555555555|66666666666666
11111111111111|44444444444444
77777777777777|22222222222222
44444444444444|00000000000000
12121212121212|77777777777777
01010101010101|01230123012301
77777777777777|97697697697697
66666666666666|12121212121212
File 2 contains the list of valid ids, which will be used to filter file 1:
11111111111111
22222222222222
44444444444444
77777777777777
00000000000000
88888888888888
66666666666666
99999999999999
12121212121212
01010101010101
What I am struggling to achieve is a way to count how many lines of file 1 have both of their ids present in file 2. A line is only counted when both numbers on that line exist in file 2.
From file 1:
11111111111111|22222222222222 will be counted because both ids exist in file 2, and so will 77777777777777|22222222222222 for the same reason.
33333333333333|44444444444444 will not be counted because 33333333333333 does not exist in file 2, and the same goes for 55555555555555|66666666666666, whose first id does not exist in file 2.
So for the example above the count should be 6, and just printing that number is enough; there is no need to edit either file.
awk -F'|' 'FNR == NR { seen[$0] = 1; next }
seen[$1] && seen[$2] { ++count }
END { print count }' file2 file1
Explanation:
1) FNR == NR (the record number within the current file equals the overall record number) is only true for the first input file, which is file2 (the order of the arguments matters!). So for every line of file2 we record the number in seen.
2) For the remaining lines (file1, given second on the command line), if |-separated fields 1 and 2 (-F'|') were both seen in file2, we increment count by one.
3) In the END block, print the count.
Caveat: every unique number from file2 is held in memory. But that is also what makes it fast, since file2 does not have to be re-read over and over.
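With the sample file1 and file2 above, this is the output I would expect, matching the six qualifying lines listed in the question:
$ awk -F'|' 'FNR == NR { seen[$0] = 1; next } seen[$1] && seen[$2] { ++count } END { print count }' file2 file1
6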
Don't know how to do it in awk but if you are open to a quick-and-dirty bash script that someone can help make efficient, you could try this:
searcher.sh
-------------
#!/bin/bash

file1="$1"
file2="$2"

# split by pipe
while IFS='|' read -ra line; do
    # find 1st item in file2. If found, find 2nd item in file2
    grep -q "${line[0]}" "$file2"
    if [ $? -eq 0 ]; then
        grep -q "${line[1]}" "$file2"
        if [ $? -eq 0 ]; then
            # print line since both items were found in file2
            echo "${line[0]}|${line[1]}"
        fi
    fi
done < "$file1"
Usage
------
bash searcher.sh file1 file2
Result using your example
--------------------------
$ time bash searcher.sh file1 file2
11111111111111 | 22222222222222
11111111111111 | 44444444444444
77777777777777 | 22222222222222
44444444444444 | 00000000000000
12121212121212 | 77777777777777
66666666666666 | 12121212121212
real 0m1.453s
user 0m0.423s
sys 0m0.627s
That's really slow on my old PC.
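If you only need the count rather than the matching lines themselves, you could simply pipe the script's output into wc, which should print 6 for the sample data:
bash searcher.sh file1 file2 | wc -l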

how to parse a number from sql output

Hi, I need to parse a number from sql output:
COUNT(*)
----------
924
140
173
583
940
77
6 rows selected.
If the first line is less than 10 I want to create an empty file.
The problem is I don't know how to parse it; the numbers keep changing (from 0 to ca. 10 000).
The question is very unclear, so I'll make some assumptions. You get the output above from sql, either into a file or on stdout, and you would like to test whether the first line containing digits is less than 10. Correct?
This is one way to do it.
sed -n '3p' log | awk '{ print ($1 < 10) ? "true" : "false" }'
sed is used to print the 3rd line from your example,
which is then piped into awk to make the comparison.
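To go one step further and actually create the empty file when the value is below 10, a minimal sketch along the same lines (assuming the first count is always on line 3 of the output, and using a placeholder path):
first=$(sed -n '3p' log)
if [ "$first" -lt 10 ]; then
    touch /path/to/file/to/be/created   # placeholder path
fi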
...or putting it together in bash
#!/bin/bash

while read variable
do
    if [[ "$variable" =~ ^[0-9]+$ ]]
    then
        break
    fi
done < input

if [ "$variable" -lt 10 ]
then
    echo 'less than 10'
    # add your code here, e.g.
    # touch /path/to/file/to/be/created
fi