Logically impossible to fetch this particular string? - sql

I have 3 strings which are random and look somewhat like this
1) ENTL.COMPENSATION REM REVERSE PAYMENT COUPON ON ISIN //IT0004889033 IN A TRIPARTY //TRANSACTION WITH 95724
2) 01P ISIN DE000A1H36U5 QTY 44527000, //C/P 19696
3) COUPON ISIN XS0820547742 QTY 466750,
The goal is to extract the values IT0004889033, DE000A1H36U5, and XS0820547742. In all three strings the expected value comes right after the word ISIN, so one could use ISIN as a reference and take whatever follows it. However, that is apparently not allowed: the value must be found without relying on any neighbouring text as a reference.
The expected value, such as IT0004889033, is 12 characters long: the first 2 characters are letters, the next 9 are alphanumeric, and the last one is a digit. With only this information, is it possible to do a wildcard search or something similar to fetch this 12-character value?
I'm totally lost on this one logically.

You mentioned that ISIN should not be used as a reference. Therefore, the only thing for sure is that the string to be found starts with 2 letters, followed by 9 letters and/or numbers, and ends with a number.
I saved your example text as tmp, and ran the following egrep command... seems to work for me:
jim@debian:~/tmp$ egrep -o "[a-zA-Z]{2}[a-zA-Z0-9]{9}[0-9]{1}" tmp
IT0004889033
DE000A1H36U5
XS0820547742
This solution is stricter than the other answers because the pattern has a fixed length: every match it returns is exactly 12 characters long.
I hope this helps!
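A minimal sketch of the same pattern wrapped in a shell function, assuming a grep that supports -o and -w (common in GNU and BSD grep, but not required by POSIX). The function name extract_isin is made up for illustration. The -w option is an addition to the command above: it rejects a 12-character run that is only part of a longer alphanumeric token.

```shell
# Print every standalone 12-character token of the form:
# 2 letters, 9 letters or digits, 1 digit.
# Assumption: grep supports -o (print only the match) and -w (whole words).
extract_isin() {
  grep -owE '[A-Za-z]{2}[A-Za-z0-9]{9}[0-9]' "$@"
}
```

It can be called on a file (extract_isin tmp) or fed through a pipe.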

Using grep -oP:
grep -oP 'ISIN\W+\K\w+' file
IT0004889033
DE000A1H36U5
XS0820547742
If grep -P isn't available, you can use awk:
awk -F '.*ISIN[^0-9a-zA-Z]*| ' '{print $2}' file
IT0004889033
DE000A1H36U5
XS0820547742
Or else:
awk -F '.*ISIN[^[:alnum:]]*| ' '{print $2}' file


Generating 10 random numbers in a range in an awk script

So I'm trying to write an awk script that generates passwords from names read from a .csv file. The password should be the first 3 letters of the last name, then the number of characters in the fourth field, then a space and a random number between 1 and 200. So far the letters and the character count work fine, but I can't get the syntax of my for loop right for the random numbers. Here is an example of the input:
Danette,Suche,Female,"Kingfisher, malachite"
Corny,Chitty,Male,"Seal, southern elephant"
And desired output:
Suc21 80
Chi23 101
For 10 rows total. My code looks like this:
BEGIN{
FS=",";OFS=","
}
{print substr($2,0,3)length($4)
for(i=0;i<10;i++){
echo $(( $RANDOM % 200 ))
}}
Then I've been running it like
awk -F"," -f script.awk file.csv
But it only shows the 3 characters and the length of the fourth field, no random numbers. If anyone is able to point out where I'm screwing up, it would be much appreciated. Thanks, guys.
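You can use rand(), which returns a random number from 0 up to (but not including) 1; scale and truncate it to get an integer between 1 and 200. Note that awk string positions start at 1, so substr($2,1,3) is the portable way to take the first three characters:
awk -F, '{print substr($2,1,3)length($4),int(rand()*200)+1}' file.csv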
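A hedged sketch of the full requirement, assuming the fourth CSV field is always the only double-quoted one, as in the sample input. With -F, the quoted field is cut at its embedded comma, so length($4) counts only part of it; splitting on the double quote instead keeps the field whole. srand() is called because many awk implementations return the same rand() sequence on every run unless they are seeded. The function name make_passwords is hypothetical.

```shell
# Build "<first 3 letters of last name><length of quoted field> <random 1..200>".
# Assumption: each line looks like  first,last,gender,"quoted text".
make_passwords() {
  awk -F'"' '
    BEGIN { srand() }                     # seed from the time of day
    {
      split($1, cols, ",")                # $1 holds: first,last,gender,
      num = int(rand() * 200) + 1         # integer in the range 1..200
      print substr(cols[2], 1, 3) length($2), num
    }' "$@"
}
```

Running make_passwords file.csv on the sample rows gives lines that start with Suc21 and Chi23, each followed by a different random number.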
Looking at your script:
BEGIN{
FS=",";OFS=","
}
{print substr($2,0,3)length($4)
for(i=0;i<10;i++){
echo $(( $RANDOM % 200 ))
}}
There is no echo function defined in GNU AWK. If you wish to use a shell command, you can use the system function; however, keep in mind that it returns the command's exit status and prints the command's output directly, with no way to alter it, so you need to design the command so that it produces the desired output by itself.
Let file.txt content be
A
B
C
then
awk '{printf "%s ",$0;system("echo ${RANDOM}%200 | bc")}' file.txt
might give output
A 95
B 139
C 1
Explanation: first I use printf so that no newline is appended automatically, and output the whole line followed by a space; then I execute a command that outputs a random value in the range:
echo ${RANDOM}%200 | bc
It simply feeds the value of RANDOM followed by %200 into the bc calculator, which prints the result, a value from 0 to 199. Note that system() runs the command through sh, and RANDOM is a bash feature rather than a POSIX sh one, so this relies on sh supporting it.
If you are not dead set on using the RANDOM variable, the rand function can be used without that hassle.
(tested with gawk 4.2.1 and bc 1.07.1)

How to use awk to count the occurrences of a word beginning with something?

I have a file that looks like this:
**FID IID**
1 RQ50131-0
2 469314
3 469704
4 469712
5 RQ50135-2
6 469720
7 470145
I want to use awk to count the occurrences of IDs beginning with 'RQ' in column 2.
For this little snapshot, the count should be 2. The numbers after RQ differ, so I want to count anything that begins with RQ.
I am using this code
awk -F '\t' '{if(match("^RQ$",$2))print}'|wc -l ID.txt > RQ.txt
But I don't get an output.
Tabs are used as field delimiters by default (same as spaces), so you can omit -F '\t'.
You can use
awk '$2 ~ /^RQ/{cnt++} END{print cnt}' ID.txt > RQ.txt
Whenever field 2 starts with RQ, cnt is incremented; once the whole file has been processed, cnt is printed.
You did
{if(match("^RQ$",$2))print}
but the compulsory arguments to the match function are string, regexp, in that order. Also, do not use $ if you are interested in finding strings that start with something, as $ denotes the end. After fixing these issues the code would be
{if(match($2,"^RQ"))print}
Disclaimer: this answer only describes how to fix the problems in your current code; it does not suggest any other improvements.
Apart from the reversed parameters for match, the file name ID.txt should come right after the closing single quote of the awk program, not after wc -l.
As you want to print the whole line, you can omit both the if statement and the print statement: match returns the index at which the matching substring begins, or 0 if there is no match, and awk prints the line by default whenever the pattern evaluates to a nonzero value.
awk 'match($2,"^RQ")' ID.txt | wc -l > RQ.txt
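A small sketch of the counting variant that needs no wc -l, assuming whitespace-separated columns as in the sample. Printing cnt + 0 makes awk print 0 rather than an empty line when no ID matches. The name count_rq is only illustrative.

```shell
# Count the lines whose second column starts with RQ.
count_rq() {
  awk '$2 ~ /^RQ/ { cnt++ } END { print cnt + 0 }' "$@"
}
```

For example, count_rq ID.txt > RQ.txt writes the count to RQ.txt.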

How to compare digits after find with awk, egrep

I have a file.txt that contains a lot of information. The input in the file looks like:
<ss>283838<ss>
.
.
<ss>111 from 4444<ss>
.
<ss>255<ss>
The numbers can have any number of digits.
I need to find and compare these 2 numbers.
If they are equal, print the name of the file and say that they are equal; if not, say that they are not equal. Only one line in the file has numbers with the word "from" between them.
I tried the following:
awk '/[0-9]+ from./ {print $0}' file.txt | egrep -o '[0-9]+'
With this command I get those two numbers, but I am stuck now and do not know how to compare them.
With your shown samples, please try the following. In short: extract the two numbers with regular expressions, then compare them to cover three cases: the first is greater, the second is greater, or they are equal.
awk '
match($0,/<[a-zA-Z]+[0-9]+/){
  val1=substr($0,RSTART,RLENGTH)
  gsub(/[^0-9]*/,"",val1)    # keep only the digits of the first number
  match($0,/[0-9]+[a-zA-Z]+>/)
  val2=substr($0,RSTART,RLENGTH)
  gsub(/[^0-9]*/,"",val2)    # keep only the digits of the second number
  # Adding 0 forces a numeric comparison instead of a string comparison.
  if(val1+0>val2+0){
    print "val1("val1 ")is Greater than val2("val2")"
  }
  if(val2+0>val1+0){
    print "val2("val2 ")is Greater than val1("val1")"
  }
  if(val1+0==val2+0){
    print "val1("val1 ")is equals to val2("val2")"
  }
}' Input_file
For your current shown sample, the output will be as follows:
val2(333)is Greater than val1(222)
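As an alternative sketch that targets the "N from M" line directly, assuming exactly one such line per file and single spaces around "from", as in the question's sample. The function name compare_from is hypothetical. Adding 0 to each value forces a numeric comparison, so 0042 and 42 count as equal.

```shell
# Report whether the two numbers around "from" are equal, with the file name.
# Assumption: the file is passed as an argument so that FILENAME is set.
compare_from() {
  awk '
    match($0, /[0-9]+ from [0-9]+/) {
      split(substr($0, RSTART, RLENGTH), nums, " from ")
      verdict = (nums[1] + 0 == nums[2] + 0) ? "equal" : "not equal"
      print FILENAME ": " nums[1] " and " nums[2] " are " verdict
      exit                                # only one such line is expected
    }' "$@"
}
```

Calling compare_from file.txt on the question's sample would report that 111 and 4444 are not equal.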

awk to remove 5th column from N column with fixed delimiter

I have a file with N columns.
I want to remove the 5th column counted from the last column.
The delimiter is "|".
I tested with a simple example, as shown below:
bash-3.2$ echo "1|2|3|4|5|6|7|8" | nawk -F\| '{print $(NF-4)}'
4
Expecting result:
1|2|3|5|6|7|8
How should I change my command to get the desired output?
If I understand you correctly, you want to use something like this:
sed -E 's/\|[^|]*((\|[^|]*){4})$/\1/'
This matches a pipe character \| followed by any number of non-pipe characters [^|]*, then captures 4 more of the same pattern ((\|[^|]*){4}). The $ at the end matches the end of the line. The first part of the match (i.e. the fifth field from the end) is dropped.
Testing it out:
$ sed -E 's/\|[^|]*((\|[^|]*){4})$/\1/' <<<"1|2|3|4|5|6|7"
1|2|4|5|6|7
You could achieve the same thing using GNU awk with gensub but I think that sed is the right tool for the job in this case.
If your version of sed doesn't support extended regex syntax with -E, you can modify it slightly:
sed 's/|[^|]*\(\(|[^|]*\)\{4\}\)$/\1/'
In basic mode, pipes are interpreted literally, but the parentheses for capture groups and the curly braces need to be escaped.
AWK is your friend :
Sample Input
A|B|C|D|E|F|G|H|I
A|B|C|D|E|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|D|O|R|Q|U|I
A|B|C|D|E|F|G|H|I|E|O|Q
A|B|C|D|E|F|G|H|I|X
A|B|C|D|E|F|G|H|I|J|K|L
Script
awk 'BEGIN{FS="|";OFS="|"}
{$(NF-5)="";sub(/\|\|/,"|");print}' file
Sample Output
A|B|C|E|F|G|H|I
A|B|C|D|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|O|R|Q|U|I
A|B|C|D|E|F|H|I|E|O|Q
A|B|C|D|F|G|H|I|X
A|B|C|D|E|F|H|I|J|K|L
What we did here
As you are aware, awk has special variables that store each field of the record, ranging from $1, $2 up to $(NF).
Excluding a column counted from the end is as simple as:
Emptying the column, i.e. $(NF-5)="". Note that $(NF-5) is the sixth field from the end; use $(NF-4) if you want the fifth.
Removing the consecutive || left in the record by the above step, i.e. sub(/\|\|/,"|")
Another alternative, using @sjsam's input file:
$ rev file | cut -d'|' --complement -f6 | rev
A|B|C|E|F|G|H|I
A|B|C|D|F|G|H|I|A
A|B|C|D|E|F|G|H|I|F|E|O|R|Q|U|I
A|B|C|D|E|F|H|I|E|O|Q
A|B|C|D|F|G|H|I|X
A|B|C|D|E|F|H|I|J|K|L
I'm not sure whether you want the 5th from the last or the 6th, but it's easy to adjust.
Thanks for the help and guidance.
Below is what I tested:
bash-3.2$ echo "1|2|3|4|5|6|7|8|9" | nawk 'BEGIN{FS="|";OFS="|"} {$(NF-4)="!";print}' | sed 's/|!//'
Output: 1|2|3|4|6|7|8|9
I further tested this on the file that I extracted from the system, and it worked fine.
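For comparison, a sketch that rebuilds the record field by field instead of blanking a field and patching the separators afterwards, so records that already contain empty fields or a "!" are left intact. It assumes "|" as the delimiter; the function name drop_from_end and its argument n (the position counted from the last field) are made up for illustration.

```shell
# Remove the n-th field counted from the end of each |-separated record.
# Usage: drop_from_end N [file...]
drop_from_end() {
  n="$1"; shift
  awk -F'|' -v n="$n" '{
    skip = NF - n + 1                     # index of the field to drop
    out = ""; sep = ""
    for (i = 1; i <= NF; i++) {
      if (i == skip) continue
      out = out sep $i
      sep = "|"
    }
    print out
  }' "$@"
}
```

With n set to 5 this removes the fifth field from the end; with 6 it reproduces the sample output of the awk answer above.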

Getting numerical sub-string of fields using awk

I was wondering how I can get the numerical sub-string of fields using awk in a text file like the one shown below. I am already familiar with the substr() function. However, since the length of the fields is not fixed, I have no idea how to separate the text from the numerical part.
A.txt
"Asd.1"
"bcdujcd.2"
"mshde.3333"
"deuhdue.777"
P.S. All the numbers are separated from text part with a single dot (.).
You may try like this:
$ echo "bcdujcd.2" | awk -F'[^0-9]+' '{print $2}'
If you don't care about any non-digit parts of the line and only want to see the digit parts as output you could use:
awk '{gsub(/[^[:digit:]]+/, " ")}7' A.txt
which will generate:
1
2
3333
777
as output (there's a leading space on each line for the record).
If there can only be one number field per line, then the replacement above can be "" instead of " " in the gsub and the leading space will go away. The replacement with the space keeps multiple numerical fields separated by a space if they occur on a single line (i.e. "foo.88.bar.11" becomes 88 11 instead of 8811).
If you just need the second (period delimited) field of each line of that sort then awk -F. '{print $2}' will do that.
$ awk -F'[".]' '{print $3}' file
1
2
3333
777
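A sketch that covers the stated format (text, a single dot, then the number), assuming the lines may or may not carry surrounding double quotes. The name numeric_part is illustrative only.

```shell
# Print whatever follows the last dot on each line, without double quotes.
numeric_part() {
  awk '{
    s = $0
    gsub(/"/, "", s)                      # drop the surrounding quotes, if any
    sub(/^.*\./, "", s)                   # remove everything up to the last dot
    print s
  }' "$@"
}
```

It can be run as numeric_part A.txt.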