Using variables to initialize regular expressions in awk

I want to initialize a variable with a regular expression and then use it for pattern matching, but the results are not what I expect. For example, I have:
BEGIN {
item_code_pattern=/ITM-CD-10/ ;
}
$0 ~ $item_code_pattern{ print ; }
Records that do not contain the pattern ITM-CD-10 also show up in the output.
What should the correct boolean expression before the block be?
Thanks

You want to use a regular string, and to refer to the variable without a $:
awk '
BEGIN {
item_code_pattern = "ITM-CD-10" ;
}
$0 ~ item_code_pattern { print ; }
'
The /pattern/ construct checks whether $0 matches the given pattern, so your original code is equivalent to saying:
item_code_pattern = $0 ~ "ITM-CD-10"
Since $0 is empty in the BEGIN section, item_code_pattern is set to 0. That makes $item_code_pattern the same as $0, so every record matches itself.
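A minimal sketch that reproduces the difference, assuming a POSIX sh and any POSIX awk. The two-line sample data and the function names buggy and fixed are hypothetical:

```shell
#!/bin/sh
# Hypothetical sample data: only the first line carries ITM-CD-10.
sample() { printf 'ITM-CD-10 widget\nITM-CD-20 gadget\n'; }

# The question's version: /ITM-CD-10/ in BEGIN is matched against the empty $0,
# so item_code_pattern is 0, $item_code_pattern is $0, and every line matches itself.
buggy() {
  sample | awk 'BEGIN { item_code_pattern = /ITM-CD-10/ }
                $0 ~ $item_code_pattern'
}

# Corrected version: store the pattern as a plain string and reference it without $.
fixed() {
  sample | awk 'BEGIN { item_code_pattern = "ITM-CD-10" }
                $0 ~ item_code_pattern'
}

buggy    # prints both lines
fixed    # prints only the ITM-CD-10 line
```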

You need to drop the $ and the / symbols (and there's no need for a BEGIN block, just assign the variable on the command line):
awk '$0 ~ item_code_pattern' item_code_pattern=ITM-CD-10
When you use $, some versions of awk will emit an error while others will silently convert the variable to an integer value of 0 so that $item_code_pattern is exactly the same as $0, and the code $0 ~ $item_code_pattern is the tautology $0 ~ $0.
If you insist on using a BEGIN block, the syntax is:
BEGIN { item_code_pattern="ITM-CD-10" }
$0 ~ item_code_pattern
Note that { print } is the default action when a pattern has no action, so it is redundant.
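For comparison, here is a sketch of the three usual ways to hand the pattern to awk. The sample data and function names are hypothetical. All three select the same lines. The difference is timing: a -v assignment happens before BEGIN runs, while a command-line assignment operand only takes effect when awk reaches it in the operand list, which is after BEGIN:

```shell
#!/bin/sh
# Hypothetical sample data.
sample() { printf 'ITM-CD-10 widget\nITM-CD-20 gadget\nITM-CD-10 spare\n'; }

# 1. Assigned in BEGIN: a plain string, no slashes, and no $ when it is used.
via_begin()   { sample | awk 'BEGIN { pat = "ITM-CD-10" } $0 ~ pat'; }

# 2. Assignment operand: applied just before awk reads the input that follows it
#    (standard input here), i.e. after BEGIN has already run.
via_operand() { sample | awk '$0 ~ pat' pat=ITM-CD-10; }

# 3. -v: applied before BEGIN runs, so the value is visible inside BEGIN too.
via_v()       { sample | awk -v pat=ITM-CD-10 '$0 ~ pat'; }

# The timing difference only shows up inside BEGIN:
awk -v pat=ITM-CD-10 'BEGIN { print "[" pat "]" }'   # [ITM-CD-10]
awk 'BEGIN { print "[" pat "]" }' pat=ITM-CD-10      # [] (no input is read, so the operand is never applied)
```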

Related

Issue converting github.com/*/raw/* URLs to raw.githubusercontent.com URLs using AWK

Given the following example URLs:
urls.txt
https://github.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt
https://github.com/34730/asd/raw/master/adaway-export
https://github.com/568475513/secret_domain/raw/master/filter.txt
https://github.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt
https://github.com/CipherOps/MiscHostsFiles/raw/master/MiscAdTrackingHostBlock.txt
https://github.com/DK-255/Pi-hole-list-1/raw/main/Ads-Blocklist
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/adservers-and-trackers.txt
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/unwanted-iranian.txt
https://github.com/DandelionSprout/adfilt/raw/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://github.com/DavidTai780/AdGuard-Home-Private-Rules/raw/master/hosts.txt
https://github.com/DivineEngine/Profiles/raw/master/Quantumult/Filter/Guard/Advertising.list
https://github.com/Hariharann8175/Indicators-of-Compromise-IOC-/raw/master/Ransomware%20URL's
https://github.com/JumbomanXDA/host/raw/main/hosts
https://github.com/Kees1958/W3C_annual_most_used_survey_blocklist/raw/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://github.com/KurzGedanke/kurzBlock/raw/master/kurzBlock.txt
https://github.com/MajkiIT/polish-ads-filter/raw/master/polish-adblock-filters/adblock.txt
https://github.com/MitaZ/Better_Filter/raw/master/Quantumult_X/Filter.list
https://github.com/MrWaste/Ad-BlockList-2019-08-31/raw/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://github.com/Neo23x0/signature-base/raw/master/iocs/c2-iocs.txt
https://github.com/Pentanium/ABClientFilters/raw/master/ko/korean.txt
https://github.com/Phentora/AdguardPersonalList/raw/master/blocklist.txt
https://github.com/ShadowWhisperer/BlockLists/raw/master/Lists/Malware
https://github.com/SlashArash/adblockfa/raw/master/adblockfa.txt
https://github.com/SukkaW/Surge/raw/master/List/domainset/reject_sukka.conf
https://github.com/Th3M3/blocklists/raw/master/tracking%26ads.list
https://github.com/TonyRL/blocklist/raw/master/hosts
https://github.com/UnbendableStraw/samsungnosnooping/raw/master/README.md
https://github.com/UnluckyLuke/BlockUnderRadarJunk/raw/master/blockunderradarjunk-list.txt
https://github.com/VernonStow/Filterlist/raw/master/Filterlist.txt
https://github.com/What-Zit-Tooya/Ad-Block/raw/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://github.com/XionKzn/PiHole-Lists/raw/master/PiHole/Blocklist_HOSTS.txt
https://github.com/YanFung/Ads/raw/master/Mobile
https://github.com/Yuki2718/adblock/raw/master/adguard/tracking-plus.txt
https://github.com/Yuki2718/adblock/raw/master/japanese/jp-filters.txt
https://github.com/ZYX2019/host-block-list/raw/master/Custom.txt
https://github.com/abc45628/hosts/raw/master/hosts
https://github.com/aleclee/DNS-Blacklists/raw/master/AdHosts.txt
https://github.com/angelics/pfbng/raw/master/ads/ads-domain-list.txt
https://github.com/blocklistproject/Lists/raw/master/ransomware.txt
https://github.com/cchevy/macedonian-pi-hole-blocklist/raw/master/hosts.txt
https://github.com/craiu/mobiletrackers/raw/master/list.txt
https://github.com/curutpilek12/adguard-custom-list/raw/main/custom
https://github.com/damengzhu/banad/raw/main/jiekouAD.txt
https://github.com/deletescape/noads/raw/master/lists/add-switzerland.txt
https://github.com/doadin/Pi-Hole-Blocklist/raw/main/block.list
https://github.com/dreammjow/MyFilters/raw/main/src/filters.txt
https://github.com/durablenapkin/block/raw/master/streaming.txt
https://github.com/eEIi0A5L/adblock_filter/archive/master.zip
https://github.com/easylist-thailand/easylist-thailand/raw/master/subscription/easylist-thailand.txt
https://github.com/fandagroupofficial/hosts/raw/main/pihole/ads
https://github.com/fandagroupofficial/hosts/raw/main/pihole/log
https://github.com/fandagroupofficial/hosts/raw/main/pihole/trackers
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Popups_Head
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Xiaomi-info
https://github.com/farrokhi/adblock-iran/raw/master/filter.txt
https://github.com/fskreuz/blocklists/raw/dev/domains.txt
https://github.com/ftpmorph/ftprivacy/raw/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://github.com/hell-sh/Evil-Domains/raw/master/evil-domains.txt
https://github.com/hosts-file/BulgarianHostsFile/raw/master/bhf.txt
https://github.com/igorskyflyer/ad-void/raw/main/AdVoid.Core.txt
https://github.com/jackrabbit335/UsefulLinuxShellScripts/raw/master/Hosts%20%26%20sourcelist/blacklist.txt
https://github.com/jakdev121/AMS2/raw/master/pi_indo_ads.txt
https://github.com/jakejarvis/ios-trackers/raw/master/blocklist.txt
https://github.com/jasirfayas/jBlocklist/raw/master/domains.lst
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.ghostery_bugs.conf
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.zz-extra-servers-manual.conf
https://github.com/jdlingyu/ad-wars/raw/master/hosts
https://github.com/jlonborg/piblacklist/raw/main/blacklist.txt
https://github.com/joaopinto14/PiHole/raw/main/adverts
https://github.com/kang49/kang49regexblacklistproject/raw/main/blacklist
https://github.com/lesong/Surge/raw/main/rule/BanProgramAD.list
https://github.com/lhie1/Rules/raw/master/Auto/REJECT.conf
https://github.com/mayesidevel/PiHoleLists/raw/master/MiscBlocklist
https://github.com/meinhimmel/hosts/raw/master/hosts
https://github.com/mhhakim/pihole-blocklist/raw/master/custom-blocklist.txt
https://github.com/migueldemoura/ublock-umatrix-rulesets/raw/master/Hosts/ads-tracking
https://github.com/minoplhy/filters/raw/main/Resources/blocked.txt
https://github.com/monojp/hosts_merge/raw/master/hosts_blacklist.txt
https://github.com/mtbnunu/ad-blocklist/raw/master/kr-list.txt
https://github.com/mtxadmin/ublock/raw/master/hosts/_telemetry
https://github.com/mullvad/dns-adblock/raw/main/lists/doh/adblock/custom
https://github.com/muxcc/AdsBlockLists/raw/master/aumm.hosts
https://github.com/nimasaj/uBOPa/raw/master/uBOPa.txt
https://github.com/notracking/hosts-blocklists/raw/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/dns.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/filter.txt
https://github.com/olegwukr/polish-privacy-filters/raw/master/adblock.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-government-malware.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-nolovia.txt
https://github.com/pathforwardit/BlockList/raw/main/DomainList
https://github.com/pirat28/IHateTracker/raw/master/iHateTracker.txt
https://github.com/sa-ki13/jmsf/raw/master/japanese_mobile_site_dns_filter.txt
https://github.com/saurane/Turkish-Blocklist/raw/master/Blocklist/domains.txt
https://github.com/scomper/surge-list/raw/master/reject.list
https://github.com/sirsunknight/QuantumultX/raw/master/Filter/Radical-Advertising
https://github.com/smed79/blacklist/raw/master/hosts.txt
https://github.com/soteria-nou/domain-list/archive/master.zip
https://github.com/stamparm/maltrail/raw/master/trails/static/suspicious/pua.txt
https://github.com/sutchan/dnsmasq_ads_filter/raw/main/dnsmasq-ads-filter-list.txt
https://github.com/svetlyobg/svet-custom-domains/raw/master/ads-domains
https://github.com/tomzuu/blacklist-named/raw/master/ad.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/phishing.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/pushing.sites.conf
https://github.com/uBlockOrigin/uAssets/raw/master/filters/badware.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/filters.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/privacy.txt
https://github.com/unchartedsky/adguard-kr/raw/master/adguard-kr.txt
https://github.com/unflac/adFILTER/raw/master/filter.txt
https://github.com/vokins/ad/raw/main/ad.list
https://github.com/willianreis89/ADsBlock/raw/master/list.txt
https://github.com/wrysunny/ad_list/raw/master/adlist.txt
https://github.com/xOS/Config/raw/Her/Surge/RuleSet/Advertising.list
https://github.com/xinggsf/Adblock-Plus-Rule/raw/master/rule.txt
https://github.com/xlimit91/xlimit91-block-list/raw/master/blacklist.txt
https://github.com/xylagbx/ADBLOCK/raw/master/BLOCK/customadblockdomain.txt
https://github.com/ziozzang/adguard/raw/master/filter.txt
https://github.com/zznidar/BAR/raw/master/BAR-list
I'm using this command:
awk 'BEGIN{FS=OFS="/"}{if ($6~/^raw$/){$3="raw.githubusercontent.com"; for(i=0;i<=NF;++i) if (i!=6) {printf("%s%s",$i,(i==NF)?"\n":OFS)}}}' urls.txt
To produce this desired output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
But it yields this output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt/https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/raw/master/adaway-export/https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/raw/master/filter.txt/https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt/https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
Why is it printing a semblance of the original URL before the correct output?
Here is the above code formatted legibly with gawk -o-:
BEGIN {
	FS = OFS = "/"
}
{
	if ($6 ~ /^raw$/) {
		$3 = "raw.githubusercontent.com"
		for (i = 0; i <= NF; ++i) {
			if (i != 6) {
				printf "%s%s", $i, (i == NF) ? "\n" : OFS
			}
		}
	}
}
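Your only real problem is that awk fields, arrays, and strings all start at 1, not 0, so your loop should have started at 1, not 0. As written, the first time through your loop $i is $0. That is the whole record, already rebuilt with the new $3, so printf outputs the entire modified URL and a / before it prints the individual fields.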
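Here is a sketch of the questioner's loop with the index starting at 1. The separator variable sep is an addition that is not in the original. It means the trailing newline no longer depends on i == NF, which also covers a URL whose last field is raw. The function name to_raw is hypothetical, and $6 == "raw" is used as an equivalent of $6 ~ /^raw$/:

```shell
#!/bin/sh
# Sketch: fields are $1..$NF; $0 is the whole record, which caused the unwanted prefix.
to_raw() {
  awk 'BEGIN { FS = OFS = "/" }
  $6 == "raw" {
    $3 = "raw.githubusercontent.com"
    sep = ""                        # no separator before the first field
    for (i = 1; i <= NF; ++i)
      if (i != 6) {                 # skip the "raw" field
        printf "%s%s", sep, $i
        sep = OFS
      }
    print ""                        # end the output line
  }'
}

echo 'https://github.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt' | to_raw
```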
Having said that, I think what you want is the following with a couple of other things tidied up:
$ cat tst.awk
BEGIN { FS=OFS="/" }
sub(/^raw$/,RS,$6) && sub(OFS RS,"") {
    $3 = "raw.githubusercontent.com"
    print
}
$ awk -f tst.awk urls.txt
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
https://raw.githubusercontent.com/CipherOps/MiscHostsFiles/master/MiscAdTrackingHostBlock.txt
https://raw.githubusercontent.com/DK-255/Pi-hole-list-1/main/Ads-Blocklist
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/adservers-and-trackers.txt
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/unwanted-iranian.txt
https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://raw.githubusercontent.com/DavidTai780/AdGuard-Home-Private-Rules/master/hosts.txt
https://raw.githubusercontent.com/DivineEngine/Profiles/master/Quantumult/Filter/Guard/Advertising.list
https://raw.githubusercontent.com/Hariharann8175/Indicators-of-Compromise-IOC-/master/Ransomware%20URL's
https://raw.githubusercontent.com/JumbomanXDA/host/main/hosts
https://raw.githubusercontent.com/Kees1958/W3C_annual_most_used_survey_blocklist/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://raw.githubusercontent.com/KurzGedanke/kurzBlock/master/kurzBlock.txt
https://raw.githubusercontent.com/MajkiIT/polish-ads-filter/master/polish-adblock-filters/adblock.txt
https://raw.githubusercontent.com/MitaZ/Better_Filter/master/Quantumult_X/Filter.list
https://raw.githubusercontent.com/MrWaste/Ad-BlockList-2019-08-31/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://raw.githubusercontent.com/Neo23x0/signature-base/master/iocs/c2-iocs.txt
https://raw.githubusercontent.com/Pentanium/ABClientFilters/master/ko/korean.txt
https://raw.githubusercontent.com/Phentora/AdguardPersonalList/master/blocklist.txt
https://raw.githubusercontent.com/ShadowWhisperer/BlockLists/master/Lists/Malware
https://raw.githubusercontent.com/SlashArash/adblockfa/master/adblockfa.txt
https://raw.githubusercontent.com/SukkaW/Surge/master/List/domainset/reject_sukka.conf
https://raw.githubusercontent.com/Th3M3/blocklists/master/tracking%26ads.list
https://raw.githubusercontent.com/TonyRL/blocklist/master/hosts
https://raw.githubusercontent.com/UnbendableStraw/samsungnosnooping/master/README.md
https://raw.githubusercontent.com/UnluckyLuke/BlockUnderRadarJunk/master/blockunderradarjunk-list.txt
https://raw.githubusercontent.com/VernonStow/Filterlist/master/Filterlist.txt
https://raw.githubusercontent.com/What-Zit-Tooya/Ad-Block/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://raw.githubusercontent.com/XionKzn/PiHole-Lists/master/PiHole/Blocklist_HOSTS.txt
https://raw.githubusercontent.com/YanFung/Ads/master/Mobile
https://raw.githubusercontent.com/Yuki2718/adblock/master/adguard/tracking-plus.txt
https://raw.githubusercontent.com/Yuki2718/adblock/master/japanese/jp-filters.txt
https://raw.githubusercontent.com/ZYX2019/host-block-list/master/Custom.txt
https://raw.githubusercontent.com/abc45628/hosts/master/hosts
https://raw.githubusercontent.com/aleclee/DNS-Blacklists/master/AdHosts.txt
https://raw.githubusercontent.com/angelics/pfbng/master/ads/ads-domain-list.txt
https://raw.githubusercontent.com/blocklistproject/Lists/master/ransomware.txt
https://raw.githubusercontent.com/cchevy/macedonian-pi-hole-blocklist/master/hosts.txt
https://raw.githubusercontent.com/craiu/mobiletrackers/master/list.txt
https://raw.githubusercontent.com/curutpilek12/adguard-custom-list/main/custom
https://raw.githubusercontent.com/damengzhu/banad/main/jiekouAD.txt
https://raw.githubusercontent.com/deletescape/noads/master/lists/add-switzerland.txt
https://raw.githubusercontent.com/doadin/Pi-Hole-Blocklist/main/block.list
https://raw.githubusercontent.com/dreammjow/MyFilters/main/src/filters.txt
https://raw.githubusercontent.com/durablenapkin/block/master/streaming.txt
https://raw.githubusercontent.com/easylist-thailand/easylist-thailand/master/subscription/easylist-thailand.txt
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/ads
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/log
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/trackers
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Popups_Head
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Xiaomi-info
https://raw.githubusercontent.com/farrokhi/adblock-iran/master/filter.txt
https://raw.githubusercontent.com/fskreuz/blocklists/dev/domains.txt
https://raw.githubusercontent.com/ftpmorph/ftprivacy/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://raw.githubusercontent.com/hell-sh/Evil-Domains/master/evil-domains.txt
https://raw.githubusercontent.com/hosts-file/BulgarianHostsFile/master/bhf.txt
https://raw.githubusercontent.com/igorskyflyer/ad-void/main/AdVoid.Core.txt
https://raw.githubusercontent.com/jackrabbit335/UsefulLinuxShellScripts/master/Hosts%20%26%20sourcelist/blacklist.txt
https://raw.githubusercontent.com/jakdev121/AMS2/master/pi_indo_ads.txt
https://raw.githubusercontent.com/jakejarvis/ios-trackers/master/blocklist.txt
https://raw.githubusercontent.com/jasirfayas/jBlocklist/master/domains.lst
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.ghostery_bugs.conf
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.zz-extra-servers-manual.conf
https://raw.githubusercontent.com/jdlingyu/ad-wars/master/hosts
https://raw.githubusercontent.com/jlonborg/piblacklist/main/blacklist.txt
https://raw.githubusercontent.com/joaopinto14/PiHole/main/adverts
https://raw.githubusercontent.com/kang49/kang49regexblacklistproject/main/blacklist
https://raw.githubusercontent.com/lesong/Surge/main/rule/BanProgramAD.list
https://raw.githubusercontent.com/lhie1/Rules/master/Auto/REJECT.conf
https://raw.githubusercontent.com/mayesidevel/PiHoleLists/master/MiscBlocklist
https://raw.githubusercontent.com/meinhimmel/hosts/master/hosts
https://raw.githubusercontent.com/mhhakim/pihole-blocklist/master/custom-blocklist.txt
https://raw.githubusercontent.com/migueldemoura/ublock-umatrix-rulesets/master/Hosts/ads-tracking
https://raw.githubusercontent.com/minoplhy/filters/main/Resources/blocked.txt
https://raw.githubusercontent.com/monojp/hosts_merge/master/hosts_blacklist.txt
https://raw.githubusercontent.com/mtbnunu/ad-blocklist/master/kr-list.txt
https://raw.githubusercontent.com/mtxadmin/ublock/master/hosts/_telemetry
https://raw.githubusercontent.com/mullvad/dns-adblock/main/lists/doh/adblock/custom
https://raw.githubusercontent.com/muxcc/AdsBlockLists/master/aumm.hosts
https://raw.githubusercontent.com/nimasaj/uBOPa/master/uBOPa.txt
https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/dns.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/filter.txt
https://raw.githubusercontent.com/olegwukr/polish-privacy-filters/master/adblock.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-government-malware.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-nolovia.txt
https://raw.githubusercontent.com/pathforwardit/BlockList/main/DomainList
https://raw.githubusercontent.com/pirat28/IHateTracker/master/iHateTracker.txt
https://raw.githubusercontent.com/sa-ki13/jmsf/master/japanese_mobile_site_dns_filter.txt
https://raw.githubusercontent.com/saurane/Turkish-Blocklist/master/Blocklist/domains.txt
https://raw.githubusercontent.com/scomper/surge-list/master/reject.list
https://raw.githubusercontent.com/sirsunknight/QuantumultX/master/Filter/Radical-Advertising
https://raw.githubusercontent.com/smed79/blacklist/master/hosts.txt
https://raw.githubusercontent.com/stamparm/maltrail/master/trails/static/suspicious/pua.txt
https://raw.githubusercontent.com/sutchan/dnsmasq_ads_filter/main/dnsmasq-ads-filter-list.txt
https://raw.githubusercontent.com/svetlyobg/svet-custom-domains/master/ads-domains
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/ad.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/phishing.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/pushing.sites.conf
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/privacy.txt
https://raw.githubusercontent.com/unchartedsky/adguard-kr/master/adguard-kr.txt
https://raw.githubusercontent.com/unflac/adFILTER/master/filter.txt
https://raw.githubusercontent.com/vokins/ad/main/ad.list
https://raw.githubusercontent.com/willianreis89/ADsBlock/master/list.txt
https://raw.githubusercontent.com/wrysunny/ad_list/master/adlist.txt
https://raw.githubusercontent.com/xOS/Config/Her/Surge/RuleSet/Advertising.list
https://raw.githubusercontent.com/xinggsf/Adblock-Plus-Rule/master/rule.txt
https://raw.githubusercontent.com/xlimit91/xlimit91-block-list/master/blacklist.txt
https://raw.githubusercontent.com/xylagbx/ADBLOCK/master/BLOCK/customadblockdomain.txt
https://raw.githubusercontent.com/ziozzang/adguard/master/filter.txt
https://raw.githubusercontent.com/zznidar/BAR/master/BAR-list
The only slightly tricky part is sub(/^raw$/,RS,$6) && sub(OFS RS,""), which is how you remove a mid-record field in awk. First, convert the field to a string that matches RS, since RS can't be present inside a record. (We can use RS directly when it's a literal string like \n rather than a regexp.) So we change raw to \n in the 6th field, which means the record now contains /\n/. We then remove /\n, which removes the 6th field together with the / that preceded it.
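The same trick can be wrapped up to delete any field. This is a hypothetical helper (del_field is not part of the answer). It is sketched under the assumptions that RS is the default newline, that FS and OFS are both "/", and that OFS contains no regex metacharacters. Field 1 has no preceding /, so for that case the helper removes the following / instead:

```shell
#!/bin/sh
# Hypothetical helper generalising the sub() trick: delete field n from each record.
del_field() {
  awk -v n="$1" 'BEGIN { FS = OFS = "/" }
  {
    $n = RS                    # RS cannot occur inside a record, so it is a safe marker
    if (!sub(OFS RS, ""))      # fields 2..NF: remove the marker and the "/" before it
      sub(RS OFS, "")          # field 1: remove the marker and the "/" after it
    print
  }'
}

echo 'a/b/c/d' | del_field 2   # a/c/d
echo 'a/b/c/d' | del_field 1   # b/c/d
echo 'a/b/c/d' | del_field 4   # a/b/c
```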

How can I store the length of a line in a variable within an awk script?

I have this simple awk script, which I use to check the number of characters in the first line.
If the first line has more or fewer than 10 characters, I want to store the number of characters in a variable.
Somehow the first print statement works, but storing that result in a variable doesn't.
Please help.
I tried removing the dollar sign (thelength=(length($0)))
and removing the parentheses (thelength=length($0)), but then it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
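There are two issues, both caused by mixing ksh and awk syntax:
$(...) is not command substitution inside awk. $(expr) is a field reference, so thelength=$(length($0)) sets thelength to the field whose number is the line's length, which is normally an empty string. Use thelength=length($0)
awk variables do not take a leading $ when they are referenced, because $thelength is again a field reference. Use print ... ,thelength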
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
    if (NR==1)
        if (length($0)!=10)
        {
            print(length($0))
            thelength=length($0)
            print "The length of the first line is: ",thelength;
            exit 1;
        }
}
END { print "STOP" }' "$1"
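To see why the original printed the wrong thing, here is a small sketch. The input "abc def" (7 characters, 2 fields) and the name field_demo are hypothetical. Inside awk, $ followed by any expression selects a field, whereas a plain variable simply holds the number:

```shell
#!/bin/sh
# Sketch: inside awk, $(expr) is a field reference, not shell command substitution.
field_demo() {
  echo 'abc def' | awk '{
    thelength = length($0)                                      # the number 7
    printf "length: %d\n", thelength
    printf "$(1 + 1) is field 2: %s\n", $(1 + 1)                # "def"
    printf "$(length($0)) is field 7: [%s]\n", $(length($0))    # no 7th field: empty
    printf "$thelength is field %d: [%s]\n", thelength, $thelength
  }'
}

field_demo
```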

awk - send variable to .awk script and use as conditional

I want to send a variable to my external .awk script to use as a conditional. The following script is however not working.
Here first is the command:
awk -v myVar="optA" -f /users/test/fixit.awk /users/test/input.txt > /users/test/output.txt
The sample fixit.awk script is:
BEGIN { printf "TITLE:\nDocuments \n", myVar, FS=" "; }
if (myVar="optA")
printf myVar
else
printf "OptB"
Can someone please help diagnose the problem?
In awk, an assignment is also an expression whose value is the value that was assigned. If you write
if (myVar = "optA")
you are actually testing the value of the assignment, the string optA. That string is non-empty and therefore always true, and myVar is overwritten as a side effect. You want
if (myVar == "optA")
for comparison instead of assignment.
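Here is a quick sketch of the difference, using hypothetical function names. Both pass the shell argument in with -v, as in the question:

```shell
#!/bin/sh
# "optA" is a non-empty string, so `if (myVar = "optA")` is always true (and
# overwrites myVar); == compares without modifying anything.
branch_assign()  { awk -v myVar="$1" 'BEGIN { if (myVar = "optA") print "optA branch"; else print "OptB branch" }'; }
branch_compare() { awk -v myVar="$1" 'BEGIN { if (myVar == "optA") print "optA branch"; else print "OptB branch" }'; }

branch_assign optB    # optA branch (wrong)
branch_compare optB   # OptB branch
```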
Also, you can't have "naked" statements like this. Your if/else clause either has to be part of the BEGIN block:
BEGIN {
    printf "TITLE:\nDocuments \n", myVar, FS=" "
    if (myVar=="optA")
        printf myVar
    else
        printf "OptB"
}
to execute once, or in a separate block if it should be executed for every single line (less likely, though):
BEGIN { printf "TITLE:\nDocuments \n", myVar, FS=" " }
{
    if (myVar=="optA")
        printf myVar
    else
        printf "OptB"
}
As an aside, the way you use printf doesn't make much sense: you could either do
print "TITLE:\nDocuments\n" myVar
or
printf "TITLE:\nDocuments\n%s\n", myVar
And for printf myVar or printf "OptB", unless you explicitly don't want that newline, you can as well use print myVar and print "OptB".
And finally, that FS= assignment looks a bit out of place and is probably not needed as " " is the default value of FS.
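Putting those fixes together, here is a sketch of a corrected fixit.awk. The program text is kept in a shell variable so the example is self-contained; saved as fixit.awk, it would be run with -f as in the question. Because it has only a BEGIN block, awk would not read input.txt at all. The function name run is hypothetical:

```shell
#!/bin/sh
# Body of a corrected fixit.awk, held in a shell variable for this sketch.
fixit='
BEGIN {
    print "TITLE:\nDocuments"
    if (myVar == "optA")      # comparison, not assignment
        print myVar
    else
        print "OptB"
}'

# With only a BEGIN block, awk never reads any input files.
run() { awk -v myVar="$1" "$fixit"; }

run optA
run optB
```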

awk: non-terminated string

I'm trying to run the command below, and it's giving me the error shown. Any thoughts on how to fix it? I would rather have this be a one-line command than a script.
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } ' |
awk -F\" ' { print "url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/'{ print $1 }'\?schema\=1\.3\.0\&form\=json\&pretty\=true\&token\=582EVTY78-03iBkTAf0JAhwOBx\&account\=room_event\"" } '
awk: non-terminated string url = "ht... at source line 1
context is
>>> <<<
awk: giving up
source line number 2
The line below outputs a single column of IDs:
grep "id\": \"http://room.event.assist.com/event/room/event/" failed_events.txt |
head -n1217 |
awk -F/ ' { print $7 } '
156512145
898545774
454658748
898432413
I'm looking to get the IDs above into a string like so:
" url = "string...'ID'string"
Take a look at what you have in the last awk:
awk -F\"
' #single start here
{ print " #double starts for print, no ends
url \= \"http\:\/\/room\.event\.assist\.com\/event\/room\/event\/
' #single ends here???
{ print $1 }'..... #single again??? ...
(rest codes)
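Do you really want to print the literal text { print } ? I don't think so. Why are you nesting print? Note that the single quote before { print $1 } ends the shell-quoted awk program. As a result, the program awk actually receives ends inside the string that print opened, which is the non-terminated string awk reports. The remaining pieces become separate shell arguments.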
Most of the elements of your pipe can be expressed right inside awk.
I can't tell exactly what you want to do with the last awk script, but here are some points:
Your "grep" is really just looking for a string of text, not a regexp.
You can save time and simplify things if you use awk's index() function instead of a RE.
Output formats are almost always best handled using printf().
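Here is a small sketch of the first point. In a regexp, . matches any character, so the dots in the grep pattern are wildcards, whereas index() looks for the literal text. The input roomXevent is made up for illustration:

```shell
#!/bin/sh
# "." in a regexp is a wildcard, so /room.event/ also matches "roomXevent".
echo 'roomXevent' | awk '/room.event/ { print "regex matched" }'             # regex matched
# index() returns the position of the literal string, or 0 if it is absent.
echo 'roomXevent' | awk 'index($0, "room.event") { print "index matched" }'  # (no output)
echo 'room.event' | awk 'index($0, "room.event") { print "index matched" }'  # index matched
```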
Since you haven't provided your input data, I can't test this code, so you'll need to adapt it if it doesn't work. But here goes:
awk -F/ '
BEGIN {
string="id\": \"http://room.event.assist.com/event/room/event/";
fmt="url = http://example.com/event/room/event/%s?schema=whatever\n";
}
count == 1217 { nextfile; }
index($0, string) {
split($7, a, "\"");
printf(fmt, a[1]);   # split() numbers its elements from 1; there is no a[0]
count++;
}' failed_events.txt
If you like, you can use awk's -v option to pass in the string variable from a shell script calling this awk script. Or if this is a stand-alone awk script (using #! shebang), you could refer to command line options with ARGV.
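Here is a self-contained sketch along the same lines. It assumes each matching input line looks like the hypothetical sample below, so that $7 is the ID followed by a quote and comma, as in the question's awk -F/ '{ print $7 }' step. It uses the host and query string from the question, reads the ID from a[1], and replaces head -n1217 with an exit after max matches. The names make_urls and max are hypothetical:

```shell
#!/bin/sh
# Assumed input shape (hypothetical sample):
#   "id": "http://room.event.assist.com/event/room/event/156512145",
make_urls() {
  awk -F/ -v max="${1:-1217}" '
    BEGIN {
      string = "id\": \"http://room.event.assist.com/event/room/event/"
      fmt = "url = \"http://room.event.assist.com/event/room/event/%s?schema=1.3.0&form=json&pretty=true&token=582EVTY78-03iBkTAf0JAhwOBx&account=room_event\"\n"
    }
    index($0, string) {
      split($7, a, "\"")         # $7 is e.g. 156512145", so a[1] is the bare ID
      printf fmt, a[1]
      if (++count == max) exit   # stop after max matches, like head -n1217
    }'
}

printf '%s\n' '"id": "http://room.event.assist.com/event/room/event/156512145",' | make_urls
```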

AWK -- How to assign a variable's value from matching regex which comes later?

I have this awk script:
/regex2/{
var = $1}
/regex1/{
print var}
which I ran on this input file:
regex1
This should be assigned as variable regex2
I got no output. The desired output is "This".
I then thought of using BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently a BEGIN block can't do regex matching. Any suggestions?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
Otherwise, you'll need to read the file twice. The following stores the last occurrence of /regex2/ in var during the first pass and, while re-reading the file, prints var for each occurrence of /regex1/. Doing all the first-pass work inside the NR==FNR block also stops /regex1/ from firing during the first pass, which would otherwise print an empty line:
awk 'NR==FNR { if (/regex2/) var = $1; next } /regex1/ { print var }' file.txt file.txt
Typically this sort of thing is done with a flag:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; f = 0 }
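Here is a sketch of the flag approach as a complete one-pass command, run against the question's two-line input. Printing var as soon as it is set is an assumption about what is wanted, and flag_demo is a hypothetical name:

```shell
#!/bin/sh
# Remember that regex1 was seen, then take $1 from the next line matching regex2.
flag_demo() {
  awk '/regex1/ { f = 1 }
       f && /regex2/ { var = $1; f = 0; print var }'
}

printf 'regex1\nThis should be assigned as variable regex2\n' | flag_demo   # This
```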