extract a pattern if a number is higher than a specific integer using sed or awk

I want to extract the lines that contain numbers which do not exceed a specific integer. For example, if I have the following input:
INTEGER ( 16 )
INTEGER ( 16 )
INTEGER ( 6 )
INTEGER ( 18 )
I want to keep only the lines that contain INTEGER ( n ) with n <= 16, so the output should be:
INTEGER ( 16 )
INTEGER ( 16 )
INTEGER ( 6 )

If you can be sure that there are always spaces before and after the digits, then you could use this awk:
awk '$3 <= 16' file
This simply checks whether the third field is less than or equal to 16.
However, it might be safer to use something like this:
awk -F'[^0-9]+' '/INTEGER *\( *[0-9]+ *\)/ && $2 <= 16' file
This sets the field separator to any number of non-digit characters, so the first field is empty and the second field contains the digits you're interested in. If the line matches the pattern (which is flexible with respect to spacing) and the digits are less than or equal to 16, the line is printed.
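For comparison, the same filter can be sketched in Python. This is only an illustration of the idea behind the second awk command, not a replacement for it; the names PATTERN, keep_lines and sample are made up for this example.

```python
import re

# Same shape as the awk pattern: INTEGER, optional spaces, "(", digits, ")".
PATTERN = re.compile(r"INTEGER *\( *([0-9]+) *\)")

def keep_lines(lines, limit=16):
    """Return the lines whose INTEGER ( n ) value is at most `limit`."""
    kept = []
    for line in lines:
        match = PATTERN.search(line)
        # Keep the line only if the pattern is present and n <= limit.
        if match and int(match.group(1)) <= limit:
            kept.append(line)
    return kept

sample = ["INTEGER ( 16 )", "INTEGER ( 16 )", "INTEGER ( 6 )", "INTEGER ( 18 )"]
print("\n".join(keep_lines(sample)))
```

Unlike the awk version, this takes the number from the matched pattern itself rather than from the first run of digits on the line.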

How do you detect blank lines in Fortran?

Given an input that looks like the following:
123
456
789

42
23
1337

3117
I want to iterate over this file in chunks separated by blank lines in Fortran (any version is fine). For example, let's say I wanted to take the average of each chunk (e.g. mean(123, 456, 789), then mean(42, 23, 1337), then mean(3117)).
I've tried iterating through the file normally (e.g. with READ), reading in each line as a string, converting it to an int, and doing whatever math I want to do on each chunk. The trouble is that Fortran "helpfully" ignores blank lines in my text file, so when I try to compare against the empty string to check for a blank line, I never actually get a .True. on that comparison.
I feel like I'm missing something basic here. This is typical functionality in every other modern language, so I'd be surprised if Fortran didn't somehow have it.
If you're using so-called "list-directed" input (format = '*'), Fortran treats spaces, commas, and blank lines specially.
To your point, there is a relevant feature: the blank= specifier of read.
read(iunit,'(i10)',blank="ZERO",err=1,end=2) array
It accepts two values:
blank="ZERO" treats blanks inside a numeric field, other than leading blanks, as zeros;
blank="NULL" is the default and ignores blanks inside a numeric field.
With either value, a field that is entirely blank is read as zero.
If all your input values are positive, you could use blank="ZERO" and then use the location of zero values to process your data.
EDIT: as @vladimir-f has correctly pointed out, you have blanks not only between lines but also after the numbers on most lines, so this strategy will not work.
You can instead load everything into an array, and process it afterwards:
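program array_with_blanks
  implicit none
  integer :: ierr, num, iunit
  integer, allocatable :: array(:)

  open(newunit=iunit, file='stackoverflow', form='formatted', iostat=ierr)
  if (ierr /= 0) stop 'cannot open input file'

  allocate(array(0))
  do
    ! An all-blank field read with an integer edit descriptor yields 0,
    ! so blank lines show up as zeros in the array.
    read(iunit, '(i10)', iostat=ierr) num
    if (ierr /= 0) exit       ! end of file or read error
    array = [array, num]      ! grow the array by one element
  end do
  close(iunit)

  print *, array
end program array_with_blanks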
Just read each line into a character variable (but note Francescalus's comment on the format), then read from that variable as an internal file.
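program stuff
  implicit none
  integer io, n, value, sum
  character (len=1000) line

  n = 0
  sum = 0
  io = 0
  open( 42, file="stuff.txt" )
  do while( io == 0 )
    ! Read the whole line as text so that blank lines are not skipped.
    read( 42, "( a )", iostat = io ) line
    if ( io /= 0 .or. line == "" ) then
      ! End of a chunk (blank line or end of file): print its mean and reset.
      if ( n > 0 ) print *, ( sum + 0.0 ) / n
      n = 0
      sum = 0
    else
      ! Internal read: parse the integer out of the character variable.
      read( line, * ) value
      n = n + 1
      sum = sum + value
    end if
  end do
  close( 42 )
end program stuff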
Output for the sample input:
456.000000
467.333344
3117.00000
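To make the chunking logic easier to compare with other languages, here is a rough Python sketch of the same idea: accumulate values until a blank line (or the end of input), then emit the mean. The function name chunk_means and the sample list are assumptions for illustration only; lines containing only spaces are treated as blank, and trailing spaces after a number are tolerated.

```python
def chunk_means(lines):
    """Average each run of integer lines; blank lines separate the runs."""
    means = []
    chunk = []
    for line in lines:
        if line.strip():
            # Non-blank line: int() ignores surrounding whitespace.
            chunk.append(int(line))
        elif chunk:
            # Blank line closes the current chunk.
            means.append(sum(chunk) / len(chunk))
            chunk = []
    if chunk:
        # The last chunk may not be followed by a blank line.
        means.append(sum(chunk) / len(chunk))
    return means

sample = ["123", "456", "789", "", "42", "23  ", "1337", " ", "3117"]
print(chunk_means(sample))
```

As in the Fortran program, consecutive blank lines do not produce empty chunks.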

How to extract just numeric value with REGEXP_EXTRACT in BigQuery?

I am trying to extract just the numbers from a particular column in BigQuery.
The fields concerned have this format: value = "Livraison_21J|Relais_19J" or "RELAIS_15 DAY"
I am trying to extract the number of days for each value preceded by the keyword "Relais".
The days range from 1 to 100.
I used this to do so:
SELECT CAST(REGEXP_EXTRACT(delivery, r"RELAIS_([0-9]+J)") as string) as relayDay
FROM TABLE
I want to be able to extract just the number of days regardless of the string that comes after the numbers, be it "J" or "DAY".
Sample data :
RETRAIT_2H|LIVRAISON_5J|RELAIS_5J | 5J
LIVRAISON_21J|RELAIS_19J | 19J
LIVRAISON_21J|RELAIS_19J | 19J
RETRAIT_2H|LIVRAISON_3J|RELAIS_3J | 3J
You may use
REGEXP_EXTRACT(delivery, r"(?:.*\D)?(\d+)\s*(?:J|DAY)")
Details
(?:.*\D)? - an optional non-capturing group that matches 0+ chars other than line break chars, as many as possible, and then a non-digit char (this pattern is required to advance the index to the location right before the last sequence of digits, not the last digit)
(\d+) - Group 1 (just what the REGEXP_EXTRACT returns): one or more digits
\s* - 0+ whitespaces
(?:J|DAY) - J or DAY substrings.
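The pattern can be tried outside BigQuery. Below is a small Python sketch that uses the same expression with the re module; BigQuery uses RE2, but this particular pattern only relies on syntax both engines share. The helper name relay_days is hypothetical.

```python
import re

# Same expression as in the REGEXP_EXTRACT call above.
PATTERN = re.compile(r"(?:.*\D)?(\d+)\s*(?:J|DAY)")

def relay_days(delivery):
    """Return the last number followed by J or DAY, or None if there is none."""
    match = PATTERN.search(delivery)
    return int(match.group(1)) if match else None

for value in ["RETRAIT_2H|LIVRAISON_5J|RELAIS_5J",
              "LIVRAISON_21J|RELAIS_19J",
              "RELAIS_15 DAY"]:
    print(value, relay_days(value))
```

Note that the pattern picks the last number followed by J or DAY in the string, so it relies on the RELAIS part coming last, as it does in the sample data.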

regex - match exactly 10 digits with at least one symbol or space between them

I'm trying to write a query in Oracle SQL to get rows that contain invalid 10-digit numbers, i.e. with other symbols in between the digits.
For example:
(111) 111-1111 #10 digit number with some symbols and spaces in between
111-111-1111
(111)111-1111
111)111-1111
(111) 11 1-1111
I.e., it should match strings with exactly 10 digits that are not all consecutive, because there are symbols between them.
So it should not match the following example:
111 #consecutive 3 digit number
11 1 #3 digit number with spaces
11-1 #3 digit number with symbol in between
1111111111 #consecutive 10 digit number
And I'm using REGEXP_LIKE, something like this
select * from table where REGEXP_LIKE(column, ?)
Any help is much appreciated. Thanks.
You could use a combination of a regex and length; the latter to exclude a pure 10-digit number without other characters:
regexp_like(col, '^[ .()-]*(\d[ .()-]*){10}$') and length(col) > 10
In the [ .()-] class you would list all the characters that you would allow as symbols among the digits. Note that - needs to be the last in that list or else be escaped.
If you would allow any non-digit to occur among the 10 digits, you can use \D:
regexp_like(col, '^\D*(\d\D*){10}$') and length(col) > 10
So: the string should have length greater than 10, and the total number of digits must be exactly 10. This can be done without regular expressions (which should make it faster):
... where length(str) > 10 and
length(str) = 10 + length(translate(str, 'z0123456789', 'z'))
translate will translate the letter z to itself and all the other characters (digits) to nothing. Having to include the z is annoying, but unavoidable; translate will return NULL if any of its arguments is NULL. The second condition says the length of the input str is exactly 10 more than the length of the string with all digits removed - so there are exactly 10 digits.
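Both checks are easy to sanity-test outside the database. The Python sketch below mirrors the two approaches (digit counting and the \D-based regex) under the assumption that only the ASCII digits 0-9 count as digits; the function names are made up for this example.

```python
import re

DIGITS = "0123456789"
# Any non-digits, then exactly ten digits, each optionally followed by non-digits.
TEN_DIGITS = re.compile(r"[^0-9]*(?:[0-9][^0-9]*){10}")

def has_ten_split_digits(value):
    """Counting version: exactly 10 digits plus at least one other character."""
    digit_count = sum(ch in DIGITS for ch in value)
    return len(value) > 10 and digit_count == 10

def has_ten_split_digits_re(value):
    """Regex version: the whole string must fit the ten-digit pattern."""
    return len(value) > 10 and TEN_DIGITS.fullmatch(value) is not None

for text in ["(111) 111-1111", "111)111-1111", "1111111111", "11-1"]:
    print(text, has_ten_split_digits(text), has_ten_split_digits_re(text))
```

The two functions should agree on every input; the counting version avoids the regex engine, which is the point the second answer makes about speed.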

Explain why storing the value of printf in a variable and then printing it gives an extra value?

int d;
d=printf("\n%d%d%d%d",1,2,3,4);
printf("%d",d);
The code prints 12345 (after a leading newline).
I don't understand why an integer greater than the last one is being printed.
printf returns the total number of characters written. The first printf call writes the newline character plus the four digits produced by the four arguments, which adds up to 5. So the return value is 5, which is what the second call prints.
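The same effect can be reproduced in Python, where sys.stdout.write also returns the number of characters it wrote. This is only an analogy to the C code, and write_and_count is a hypothetical helper name.

```python
import sys

def write_and_count(fmt, *args):
    """Write the formatted text to stdout and return how many characters were written."""
    return sys.stdout.write(fmt % args)

# "\n1234" is five characters: one newline plus four digits.
d = write_and_count("\n%d%d%d%d", 1, 2, 3, 4)
print(d)
```

The final print appends the returned count directly after 1234, which is why the combined output looks like one number.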

count occurrences of string in substring with condition

I need to count how often a number is present in a string. It should count every occurrence with a whitespace in front, except those followed by a =.
For example:
If I need to know how many "1" there are in this string: "this is a 1 ramdnom string with 2 numbers 1 with 1=something", it should return 2, as the third one is followed by an =.
To find the occurrences I am using this: occurences = mystring.Split(" 1").Length - 1
But how to exclude those followed by a =?
Thanks
Something like,
Dim occurrences = Regex.Matches(yourString, "\W[0-9]([^=]|$)").Count
If you'd like to do replacements, use a Regex.Replace overload.
Breaking it down, this expression matches
\W // any non-word character, whitespace included (use \s to match whitespace only)
[0-9] // any decimal digit
( // either
[^=] // not =
| // or
$ // the end of the string
)
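As a variation, and not the regex from the answer above, a negative lookahead avoids consuming the character after the number, so adjacent occurrences such as " 1 1" are both counted. Here is a minimal Python sketch; the function name count_occurrences is an assumption, and the extra \d in the lookahead is my addition so that searching for 1 does not count 12.

```python
import re

def count_occurrences(text, number):
    """Count `number` when preceded by whitespace and not followed by '=' or a digit."""
    # \s requires whitespace in front; (?![=\d]) rejects "1=" and longer numbers like "12".
    pattern = r"\s" + re.escape(number) + r"(?![=\d])"
    return len(re.findall(pattern, text))

text = "this is a 1 ramdnom string with 2 numbers 1 with 1=something"
print(count_occurrences(text, "1"))
```

The same lookahead syntax, (?!...), is also supported by .NET's Regex class.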