AWK invalid char ''' in expression

The awk command doesn't execute in my script, and when I check it in an online awk validator, it says my script has invalid char ''' in expression. Surprisingly, I haven't changed anything in the awk command, and it was running fine until yesterday.
My Script:
awk '
BEGIN{ FS="," }
{
machinename=$1
compornot=$3
for (i=1; i<=NF; i++) {
if ($1 == machinename) {
if (compornot == "compliant" && $3 == "compliant") {
value = "NA"
} else if (compornot == "noncompliant" && $3 == "noncompliant") {
value = "No"
} else {
value = "Yes"
}
}
}
print $1 "," $2 "," compornot "," value "," $3
}
' $SCRIPT_PATH/$SCRIPT_NAME/kraft_ansible_temp.csv | tee $SCRIPT_PATH/$SCRIPT_NAME/kraft_ansible_temp.csv
Error in online AWK validator:
gawk: prog.awk:1: awk '
gawk: prog.awk:1: ^ invalid char ''' in expression
gawk: prog.awk:1: awk '
gawk: prog.awk:1: ^ syntax error

The "script" you posted is a shell script, not an awk script. It does not conform to awk's syntax but rather a shell's, hence the syntax error.
The code between the two single quotes, on the other hand, is valid awk, so that is what you should give the awk syntax validator.
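A minimal sketch of that advice, assuming a POSIX shell: the awk part (everything between the single quotes) lives in its own file, which is what you would paste into a validator or run with awk -f. The names compliance.awk and data.csv and the sample rows are hypothetical. The sketch also writes to a temporary file and then renames it, because piping into tee on the same file that awk is reading can truncate that file before awk gets to read it.

```shell
#!/bin/sh
# Sketch: keep the awk program in its own file (pure awk, no shell quoting),
# then rewrite the CSV through a temporary file instead of tee-ing onto itself.
# compliance.awk, data.csv and the sample rows are hypothetical.

cat > compliance.awk <<'EOF'
BEGIN { FS = "," }
{
    machinename = $1
    compornot = $3
    for (i = 1; i <= NF; i++) {
        if ($1 == machinename) {
            if (compornot == "compliant" && $3 == "compliant") {
                value = "NA"
            } else if (compornot == "noncompliant" && $3 == "noncompliant") {
                value = "No"
            } else {
                value = "Yes"
            }
        }
    }
    print $1 "," $2 "," compornot "," value "," $3
}
EOF

# hypothetical sample input
printf '%s\n' 'host1,web,compliant' 'host2,db,noncompliant' 'host3,app,unknown' > data.csv

# awk -f reads the program from the file; output goes to a temp file first,
# and only replaces data.csv if awk succeeded
awk -f compliance.awk data.csv > data.csv.tmp && mv data.csv.tmp data.csv
```

After running it, data.csv holds one line per input row with the computed column added, e.g. host1,web,compliant,NA,compliant.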

Related

Issue converting github.com/*/raw/* URLs to raw.githubusercontent.com URLS using AWK

Given the following example URLs:
urls.txt
https://github.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt
https://github.com/34730/asd/raw/master/adaway-export
https://github.com/568475513/secret_domain/raw/master/filter.txt
https://github.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt
https://github.com/CipherOps/MiscHostsFiles/raw/master/MiscAdTrackingHostBlock.txt
https://github.com/DK-255/Pi-hole-list-1/raw/main/Ads-Blocklist
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/adservers-and-trackers.txt
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/unwanted-iranian.txt
https://github.com/DandelionSprout/adfilt/raw/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://github.com/DavidTai780/AdGuard-Home-Private-Rules/raw/master/hosts.txt
https://github.com/DivineEngine/Profiles/raw/master/Quantumult/Filter/Guard/Advertising.list
https://github.com/Hariharann8175/Indicators-of-Compromise-IOC-/raw/master/Ransomware%20URL's
https://github.com/JumbomanXDA/host/raw/main/hosts
https://github.com/Kees1958/W3C_annual_most_used_survey_blocklist/raw/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://github.com/KurzGedanke/kurzBlock/raw/master/kurzBlock.txt
https://github.com/MajkiIT/polish-ads-filter/raw/master/polish-adblock-filters/adblock.txt
https://github.com/MitaZ/Better_Filter/raw/master/Quantumult_X/Filter.list
https://github.com/MrWaste/Ad-BlockList-2019-08-31/raw/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://github.com/Neo23x0/signature-base/raw/master/iocs/c2-iocs.txt
https://github.com/Pentanium/ABClientFilters/raw/master/ko/korean.txt
https://github.com/Phentora/AdguardPersonalList/raw/master/blocklist.txt
https://github.com/ShadowWhisperer/BlockLists/raw/master/Lists/Malware
https://github.com/SlashArash/adblockfa/raw/master/adblockfa.txt
https://github.com/SukkaW/Surge/raw/master/List/domainset/reject_sukka.conf
https://github.com/Th3M3/blocklists/raw/master/tracking%26ads.list
https://github.com/TonyRL/blocklist/raw/master/hosts
https://github.com/UnbendableStraw/samsungnosnooping/raw/master/README.md
https://github.com/UnluckyLuke/BlockUnderRadarJunk/raw/master/blockunderradarjunk-list.txt
https://github.com/VernonStow/Filterlist/raw/master/Filterlist.txt
https://github.com/What-Zit-Tooya/Ad-Block/raw/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://github.com/XionKzn/PiHole-Lists/raw/master/PiHole/Blocklist_HOSTS.txt
https://github.com/YanFung/Ads/raw/master/Mobile
https://github.com/Yuki2718/adblock/raw/master/adguard/tracking-plus.txt
https://github.com/Yuki2718/adblock/raw/master/japanese/jp-filters.txt
https://github.com/ZYX2019/host-block-list/raw/master/Custom.txt
https://github.com/abc45628/hosts/raw/master/hosts
https://github.com/aleclee/DNS-Blacklists/raw/master/AdHosts.txt
https://github.com/angelics/pfbng/raw/master/ads/ads-domain-list.txt
https://github.com/blocklistproject/Lists/raw/master/ransomware.txt
https://github.com/cchevy/macedonian-pi-hole-blocklist/raw/master/hosts.txt
https://github.com/craiu/mobiletrackers/raw/master/list.txt
https://github.com/curutpilek12/adguard-custom-list/raw/main/custom
https://github.com/damengzhu/banad/raw/main/jiekouAD.txt
https://github.com/deletescape/noads/raw/master/lists/add-switzerland.txt
https://github.com/doadin/Pi-Hole-Blocklist/raw/main/block.list
https://github.com/dreammjow/MyFilters/raw/main/src/filters.txt
https://github.com/durablenapkin/block/raw/master/streaming.txt
https://github.com/eEIi0A5L/adblock_filter/archive/master.zip
https://github.com/easylist-thailand/easylist-thailand/raw/master/subscription/easylist-thailand.txt
https://github.com/fandagroupofficial/hosts/raw/main/pihole/ads
https://github.com/fandagroupofficial/hosts/raw/main/pihole/log
https://github.com/fandagroupofficial/hosts/raw/main/pihole/trackers
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Popups_Head
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Xiaomi-info
https://github.com/farrokhi/adblock-iran/raw/master/filter.txt
https://github.com/fskreuz/blocklists/raw/dev/domains.txt
https://github.com/ftpmorph/ftprivacy/raw/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://github.com/hell-sh/Evil-Domains/raw/master/evil-domains.txt
https://github.com/hosts-file/BulgarianHostsFile/raw/master/bhf.txt
https://github.com/igorskyflyer/ad-void/raw/main/AdVoid.Core.txt
https://github.com/jackrabbit335/UsefulLinuxShellScripts/raw/master/Hosts%20%26%20sourcelist/blacklist.txt
https://github.com/jakdev121/AMS2/raw/master/pi_indo_ads.txt
https://github.com/jakejarvis/ios-trackers/raw/master/blocklist.txt
https://github.com/jasirfayas/jBlocklist/raw/master/domains.lst
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.ghostery_bugs.conf
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.zz-extra-servers-manual.conf
https://github.com/jdlingyu/ad-wars/raw/master/hosts
https://github.com/jlonborg/piblacklist/raw/main/blacklist.txt
https://github.com/joaopinto14/PiHole/raw/main/adverts
https://github.com/kang49/kang49regexblacklistproject/raw/main/blacklist
https://github.com/lesong/Surge/raw/main/rule/BanProgramAD.list
https://github.com/lhie1/Rules/raw/master/Auto/REJECT.conf
https://github.com/mayesidevel/PiHoleLists/raw/master/MiscBlocklist
https://github.com/meinhimmel/hosts/raw/master/hosts
https://github.com/mhhakim/pihole-blocklist/raw/master/custom-blocklist.txt
https://github.com/migueldemoura/ublock-umatrix-rulesets/raw/master/Hosts/ads-tracking
https://github.com/minoplhy/filters/raw/main/Resources/blocked.txt
https://github.com/monojp/hosts_merge/raw/master/hosts_blacklist.txt
https://github.com/mtbnunu/ad-blocklist/raw/master/kr-list.txt
https://github.com/mtxadmin/ublock/raw/master/hosts/_telemetry
https://github.com/mullvad/dns-adblock/raw/main/lists/doh/adblock/custom
https://github.com/muxcc/AdsBlockLists/raw/master/aumm.hosts
https://github.com/nimasaj/uBOPa/raw/master/uBOPa.txt
https://github.com/notracking/hosts-blocklists/raw/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/dns.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/filter.txt
https://github.com/olegwukr/polish-privacy-filters/raw/master/adblock.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-government-malware.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-nolovia.txt
https://github.com/pathforwardit/BlockList/raw/main/DomainList
https://github.com/pirat28/IHateTracker/raw/master/iHateTracker.txt
https://github.com/sa-ki13/jmsf/raw/master/japanese_mobile_site_dns_filter.txt
https://github.com/saurane/Turkish-Blocklist/raw/master/Blocklist/domains.txt
https://github.com/scomper/surge-list/raw/master/reject.list
https://github.com/sirsunknight/QuantumultX/raw/master/Filter/Radical-Advertising
https://github.com/smed79/blacklist/raw/master/hosts.txt
https://github.com/soteria-nou/domain-list/archive/master.zip
https://github.com/stamparm/maltrail/raw/master/trails/static/suspicious/pua.txt
https://github.com/sutchan/dnsmasq_ads_filter/raw/main/dnsmasq-ads-filter-list.txt
https://github.com/svetlyobg/svet-custom-domains/raw/master/ads-domains
https://github.com/tomzuu/blacklist-named/raw/master/ad.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/phishing.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/pushing.sites.conf
https://github.com/uBlockOrigin/uAssets/raw/master/filters/badware.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/filters.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/privacy.txt
https://github.com/unchartedsky/adguard-kr/raw/master/adguard-kr.txt
https://github.com/unflac/adFILTER/raw/master/filter.txt
https://github.com/vokins/ad/raw/main/ad.list
https://github.com/willianreis89/ADsBlock/raw/master/list.txt
https://github.com/wrysunny/ad_list/raw/master/adlist.txt
https://github.com/xOS/Config/raw/Her/Surge/RuleSet/Advertising.list
https://github.com/xinggsf/Adblock-Plus-Rule/raw/master/rule.txt
https://github.com/xlimit91/xlimit91-block-list/raw/master/blacklist.txt
https://github.com/xylagbx/ADBLOCK/raw/master/BLOCK/customadblockdomain.txt
https://github.com/ziozzang/adguard/raw/master/filter.txt
https://github.com/zznidar/BAR/raw/master/BAR-list
I'm using this command:
awk 'BEGIN{FS=OFS="/"}{if ($6~/^raw$/){$3="raw.githubusercontent.com"; for(i=0;i<=NF;++i) if (i!=6) {printf("%s%s",$i,(i==NF)?"\n":OFS)}}}' urls.txt
To produce this desired output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
But it yields this output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt/https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/raw/master/adaway-export/https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/raw/master/filter.txt/https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt/https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
Why is it printing a semblance of the original URL before the correct output?
Here is the above code formatted legibly with gawk -o-:
BEGIN {
FS = OFS = "/"
}
{
if ($6 ~ /^raw$/) {
$3 = "raw.githubusercontent.com"
for (i = 0; i <= NF; ++i) {
if (i != 6) {
printf "%s%s", $i, (i == NF) ? "\n" : OFS
}
}
}
}
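Your only real problem is that awk fields, arrays, and strings all start at 1, not 0, so your loop should have started at 1, not 0. As written, the first time through your loop $i is $0, the whole record (already rebuilt with the new $3 but still containing raw), so your printf outputs that plus a / before the fields you actually wanted.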
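For reference, a sketch of the original one-liner with just the loop start fixed (plus $6 == "raw" as an equivalent of $6 ~ /^raw$/). to_raw_url is a hypothetical wrapper name that reads URLs on stdin:

```shell
#!/bin/sh
# Sketch: the original command with the loop starting at 1 instead of 0.
# to_raw_url is a hypothetical wrapper name; it reads URLs on stdin.
to_raw_url() {
    awk 'BEGIN { FS = OFS = "/" }
    $6 == "raw" {
        $3 = "raw.githubusercontent.com"
        for (i = 1; i <= NF; ++i)      # $1 is the first field; $0 would be the whole record
            if (i != 6)                # skip the "raw" path segment
                printf "%s%s", $i, (i == NF ? "\n" : OFS)
    }'
}

printf '%s\n' \
    'https://github.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt' \
    'https://github.com/eEIi0A5L/adblock_filter/archive/master.zip' | to_raw_url
```

Only the first URL is printed, as https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt. Like the original, it assumes raw is never the last field, since the newline is only emitted when i == NF.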
Having said that, I think what you want is the following with a couple of other things tidied up:
$ cat tst.awk
BEGIN { FS=OFS="/" }
sub(/^raw$/,RS,$6) && sub(OFS RS,"") {
$3 = "raw.githubusercontent.com"
print
}
$ awk -f tst.awk urls.txt
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
https://raw.githubusercontent.com/CipherOps/MiscHostsFiles/master/MiscAdTrackingHostBlock.txt
https://raw.githubusercontent.com/DK-255/Pi-hole-list-1/main/Ads-Blocklist
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/adservers-and-trackers.txt
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/unwanted-iranian.txt
https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://raw.githubusercontent.com/DavidTai780/AdGuard-Home-Private-Rules/master/hosts.txt
https://raw.githubusercontent.com/DivineEngine/Profiles/master/Quantumult/Filter/Guard/Advertising.list
https://raw.githubusercontent.com/Hariharann8175/Indicators-of-Compromise-IOC-/master/Ransomware%20URL's
https://raw.githubusercontent.com/JumbomanXDA/host/main/hosts
https://raw.githubusercontent.com/Kees1958/W3C_annual_most_used_survey_blocklist/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://raw.githubusercontent.com/KurzGedanke/kurzBlock/master/kurzBlock.txt
https://raw.githubusercontent.com/MajkiIT/polish-ads-filter/master/polish-adblock-filters/adblock.txt
https://raw.githubusercontent.com/MitaZ/Better_Filter/master/Quantumult_X/Filter.list
https://raw.githubusercontent.com/MrWaste/Ad-BlockList-2019-08-31/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://raw.githubusercontent.com/Neo23x0/signature-base/master/iocs/c2-iocs.txt
https://raw.githubusercontent.com/Pentanium/ABClientFilters/master/ko/korean.txt
https://raw.githubusercontent.com/Phentora/AdguardPersonalList/master/blocklist.txt
https://raw.githubusercontent.com/ShadowWhisperer/BlockLists/master/Lists/Malware
https://raw.githubusercontent.com/SlashArash/adblockfa/master/adblockfa.txt
https://raw.githubusercontent.com/SukkaW/Surge/master/List/domainset/reject_sukka.conf
https://raw.githubusercontent.com/Th3M3/blocklists/master/tracking%26ads.list
https://raw.githubusercontent.com/TonyRL/blocklist/master/hosts
https://raw.githubusercontent.com/UnbendableStraw/samsungnosnooping/master/README.md
https://raw.githubusercontent.com/UnluckyLuke/BlockUnderRadarJunk/master/blockunderradarjunk-list.txt
https://raw.githubusercontent.com/VernonStow/Filterlist/master/Filterlist.txt
https://raw.githubusercontent.com/What-Zit-Tooya/Ad-Block/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://raw.githubusercontent.com/XionKzn/PiHole-Lists/master/PiHole/Blocklist_HOSTS.txt
https://raw.githubusercontent.com/YanFung/Ads/master/Mobile
https://raw.githubusercontent.com/Yuki2718/adblock/master/adguard/tracking-plus.txt
https://raw.githubusercontent.com/Yuki2718/adblock/master/japanese/jp-filters.txt
https://raw.githubusercontent.com/ZYX2019/host-block-list/master/Custom.txt
https://raw.githubusercontent.com/abc45628/hosts/master/hosts
https://raw.githubusercontent.com/aleclee/DNS-Blacklists/master/AdHosts.txt
https://raw.githubusercontent.com/angelics/pfbng/master/ads/ads-domain-list.txt
https://raw.githubusercontent.com/blocklistproject/Lists/master/ransomware.txt
https://raw.githubusercontent.com/cchevy/macedonian-pi-hole-blocklist/master/hosts.txt
https://raw.githubusercontent.com/craiu/mobiletrackers/master/list.txt
https://raw.githubusercontent.com/curutpilek12/adguard-custom-list/main/custom
https://raw.githubusercontent.com/damengzhu/banad/main/jiekouAD.txt
https://raw.githubusercontent.com/deletescape/noads/master/lists/add-switzerland.txt
https://raw.githubusercontent.com/doadin/Pi-Hole-Blocklist/main/block.list
https://raw.githubusercontent.com/dreammjow/MyFilters/main/src/filters.txt
https://raw.githubusercontent.com/durablenapkin/block/master/streaming.txt
https://raw.githubusercontent.com/easylist-thailand/easylist-thailand/master/subscription/easylist-thailand.txt
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/ads
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/log
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/trackers
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Popups_Head
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Xiaomi-info
https://raw.githubusercontent.com/farrokhi/adblock-iran/master/filter.txt
https://raw.githubusercontent.com/fskreuz/blocklists/dev/domains.txt
https://raw.githubusercontent.com/ftpmorph/ftprivacy/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://raw.githubusercontent.com/hell-sh/Evil-Domains/master/evil-domains.txt
https://raw.githubusercontent.com/hosts-file/BulgarianHostsFile/master/bhf.txt
https://raw.githubusercontent.com/igorskyflyer/ad-void/main/AdVoid.Core.txt
https://raw.githubusercontent.com/jackrabbit335/UsefulLinuxShellScripts/master/Hosts%20%26%20sourcelist/blacklist.txt
https://raw.githubusercontent.com/jakdev121/AMS2/master/pi_indo_ads.txt
https://raw.githubusercontent.com/jakejarvis/ios-trackers/master/blocklist.txt
https://raw.githubusercontent.com/jasirfayas/jBlocklist/master/domains.lst
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.ghostery_bugs.conf
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.zz-extra-servers-manual.conf
https://raw.githubusercontent.com/jdlingyu/ad-wars/master/hosts
https://raw.githubusercontent.com/jlonborg/piblacklist/main/blacklist.txt
https://raw.githubusercontent.com/joaopinto14/PiHole/main/adverts
https://raw.githubusercontent.com/kang49/kang49regexblacklistproject/main/blacklist
https://raw.githubusercontent.com/lesong/Surge/main/rule/BanProgramAD.list
https://raw.githubusercontent.com/lhie1/Rules/master/Auto/REJECT.conf
https://raw.githubusercontent.com/mayesidevel/PiHoleLists/master/MiscBlocklist
https://raw.githubusercontent.com/meinhimmel/hosts/master/hosts
https://raw.githubusercontent.com/mhhakim/pihole-blocklist/master/custom-blocklist.txt
https://raw.githubusercontent.com/migueldemoura/ublock-umatrix-rulesets/master/Hosts/ads-tracking
https://raw.githubusercontent.com/minoplhy/filters/main/Resources/blocked.txt
https://raw.githubusercontent.com/monojp/hosts_merge/master/hosts_blacklist.txt
https://raw.githubusercontent.com/mtbnunu/ad-blocklist/master/kr-list.txt
https://raw.githubusercontent.com/mtxadmin/ublock/master/hosts/_telemetry
https://raw.githubusercontent.com/mullvad/dns-adblock/main/lists/doh/adblock/custom
https://raw.githubusercontent.com/muxcc/AdsBlockLists/master/aumm.hosts
https://raw.githubusercontent.com/nimasaj/uBOPa/master/uBOPa.txt
https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/dns.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/filter.txt
https://raw.githubusercontent.com/olegwukr/polish-privacy-filters/master/adblock.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-government-malware.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-nolovia.txt
https://raw.githubusercontent.com/pathforwardit/BlockList/main/DomainList
https://raw.githubusercontent.com/pirat28/IHateTracker/master/iHateTracker.txt
https://raw.githubusercontent.com/sa-ki13/jmsf/master/japanese_mobile_site_dns_filter.txt
https://raw.githubusercontent.com/saurane/Turkish-Blocklist/master/Blocklist/domains.txt
https://raw.githubusercontent.com/scomper/surge-list/master/reject.list
https://raw.githubusercontent.com/sirsunknight/QuantumultX/master/Filter/Radical-Advertising
https://raw.githubusercontent.com/smed79/blacklist/master/hosts.txt
https://raw.githubusercontent.com/stamparm/maltrail/master/trails/static/suspicious/pua.txt
https://raw.githubusercontent.com/sutchan/dnsmasq_ads_filter/main/dnsmasq-ads-filter-list.txt
https://raw.githubusercontent.com/svetlyobg/svet-custom-domains/master/ads-domains
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/ad.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/phishing.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/pushing.sites.conf
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/privacy.txt
https://raw.githubusercontent.com/unchartedsky/adguard-kr/master/adguard-kr.txt
https://raw.githubusercontent.com/unflac/adFILTER/master/filter.txt
https://raw.githubusercontent.com/vokins/ad/main/ad.list
https://raw.githubusercontent.com/willianreis89/ADsBlock/master/list.txt
https://raw.githubusercontent.com/wrysunny/ad_list/master/adlist.txt
https://raw.githubusercontent.com/xOS/Config/Her/Surge/RuleSet/Advertising.list
https://raw.githubusercontent.com/xinggsf/Adblock-Plus-Rule/master/rule.txt
https://raw.githubusercontent.com/xlimit91/xlimit91-block-list/master/blacklist.txt
https://raw.githubusercontent.com/xylagbx/ADBLOCK/master/BLOCK/customadblockdomain.txt
https://raw.githubusercontent.com/ziozzang/adguard/master/filter.txt
https://raw.githubusercontent.com/zznidar/BAR/master/BAR-list
The only slightly tricky part in that is sub(/^raw$/,RS,$6) && sub(OFS RS,""), which is how you remove a mid-record field in awk. First convert the field to a string that matches RS, since that can't be present in the input (we can use RS directly when it's a literal string like \n rather than a regexp). So we change raw to \n in the 6th field, which means the record now contains /\n/, and then remove /\n, thereby removing the 6th field and the / that precedes it.
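The same trick can be packaged as a small awk function. This is a sketch, not the answer's own code: del_field and drop_raw_segment are hypothetical names, and it assumes n > 1 (there must be a separator before the field), an OFS with no regex metacharacters, and the default RS of a newline.

```shell
#!/bin/sh
# Sketch: the RS-placeholder trick wrapped in a reusable awk function.
# del_field/drop_raw_segment are hypothetical names; assumes n > 1,
# a regex-safe OFS, and the default RS ("\n").
drop_raw_segment() {
    awk 'function del_field(n) {
        $n = RS            # a newline can never appear inside a record read with RS="\n"
        sub(OFS RS, "")    # remove the placeholder and the separator before it; $0 is re-split
    }
    BEGIN { FS = OFS = "/" }
    $6 == "raw" {
        del_field(6)
        $3 = "raw.githubusercontent.com"
        print
    }'
}

printf '%s\n' 'https://github.com/DivineEngine/Profiles/raw/master/Quantumult/Filter/Guard/Advertising.list' | drop_raw_segment
```

Because sub() on $0 re-splits the record, NF drops by one after del_field(6), and later assignments such as $3 = ... work on the shortened record.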

Multiple options of nf for identify duplicate in different positions awk?

I hope you are well. I would like to know whether it is possible to do something like this in awk.
I need something like multiple cases on NF:
for NF = 7 the PK is $1,$5, but for NF = 8 it is $1,$6.
INPUT
AAA|BBB|CCC|DDD|111|20220129|JONH1
AAA|XXX|YYY|DDD|444|20210115|JONH2
AAA|B10|CCC|DDD|000|20200127|JONH3
AAA|BBB|MMM|DDD|444|20200131|JONH4
AAA|BBB|CCC|DDD|777|0054256|JONH5|MARY
AAA|BBB|CCC|DDD|111|0036000|JONH5|MARY
AAA|BBB|CCC|DDD|888|0089999|CENTRAL|MARY
AAA|BBB|CCC|DDD|999|0054256|JONH5|MARY
AAA|BBB|CCC|DDD|202|0054256|JONH5|MARY|MIAMI|FL
DESIRED OUTPUTS
file PK_OK_1
AAA|BBB|CCC|DDD|111|20220129|JONH1
AAA|B10|CCC|DDD|000|20200127|JONH3
file DUPLICATE_PK_1
AAA|XXX|YYY|DDD|444|20210115|JONH2
AAA|BBB|MMM|DDD|444|20200131|JONH4
file PK_OK_2
AAA|BBB|CCC|DDD|111|0036000|JONH5|MARY
AAA|BBB|CCC|DDD|888|0089999|CENTRAL|MARY
file DUPLICATE_PK_2
AAA|BBB|CCC|DDD|777|0054256|JONH5|MARY
AAA|BBB|CCC|DDD|999|0054256|JONH5|MARY
file INVALID_LENGHT
AAA|BBB|CCC|DDD|202|0054256|JONH5|MARY|MIAMI|FL
My code is something like this (NOM_ARCH is a variable):
BEGIN { FS="|";
OFS="|"
}
NF == 7 {
if (!seen[$1,$5]) {
print > NOM_ARCH".PK_OK_1"; seen[$1,$5]=1
}else{
print > NOM_ARCH".DUPLICATE_PK_1"
}
next
}
NF == 8 {
if (!seen[$1,$6]) {
print > NOM_ARCH".PK_OK_2"; seen[$1,$6]=1
}else{
print > NOM_ARCH".DUPLICATE_PK_2"
}
next
}
{ print > NOM_ARCH".INVALID_LENGHT" }
With your shown samples, please try the following awk code.
awk '
BEGIN{ FS=OFS="|" }
{
key=""
if(NF==7){ key=(NF FS $1 FS $5) }
if(NF==8){ key=(NF FS $1 FS $6) }
}
FNR==NR{
arr1[key]++
next
}
NF==7{
outputFile=(arr1[key]==1?"file.PK_OK_1":"file_DUPLICATE_PK_1")
}
NF==8{
outputFile=(arr1[key]==1?"file.PK_OK_2":"file_DUPLICATE_PK_2")
}
NF!=7 && NF!=8{
outputFile="file_INVALID_LENGTH"
}
{
print > (outputFile)
}
' Input_file Input_file
Or use the following code without ternary operators, as per the OP's request:
awk '
BEGIN{ FS=OFS="|" }
{
key=""
if(NF==7){ key=(NF FS $1 FS $5) }
if(NF==8){ key=(NF FS $1 FS $6) }
}
FNR==NR{
arr1[key]++
next
}
NF==7{
if(arr1[key]==1){ outputFile="file.PK_OK_1" }
else { outputFile="file_DUPLICATE_PK_1"}
}
NF==8{
if(arr1[key]==1){ outputFile="file.PK_OK_2" }
else { outputFile="file_DUPLICATE_PK_2"}
}
NF!=7 && NF!=8{
outputFile="file_INVALID_LENGTH"
}
{
print > (outputFile)
}
' Input_file Input_file
Explanation: a detailed, commented version of the above.
## Starting awk program from here.
awk '
## Starting BEGIN section of this program from here, setting FS and OFS to | here.
BEGIN{ FS=OFS="|" }
##Starting main program from here.
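{
##Reset key first so a line with any other NF never reuses the key of the previous line.
key=""
##Checking condition if NF is 7 then set key to NF FS $1 FS $5 (NF is part of the key so 7-field and 8-field keys can never collide).
if(NF==7){ key=(NF FS $1 FS $5) }
##Checking condition if NF is 8 then set key to NF FS $1 FS $6.
if(NF==8){ key=(NF FS $1 FS $6) }
}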
##Checking condition FNR==NR which will be TRUE when 1st time Input_file is being read.
FNR==NR{
##Creating array arr1 with index of key and keep increasing same key value with 1 here.
arr1[key]++
##next will skip all further statements from here.
next
}
##Checking condition if NF==7 then do following.
NF==7{
##Setting outputFile(where contents will be written to), either file.PK_OK_1 OR file_DUPLICATE_PK_1 depending upon value of arr1.
##Basically it uses ternary operators ? and :
##The value after ? is used if condition arr1[key]==1 is TRUE.
##The value after : is used if condition arr1[key]==1 is FALSE.
outputFile=(arr1[key]==1?"file.PK_OK_1":"file_DUPLICATE_PK_1")
}
##Checking condition if NF==8 then do following.
NF==8{
##Setting outputFile(where contents will be written to), either file.PK_OK_2 OR file_DUPLICATE_PK_2 depending upon value of arr1.
outputFile=(arr1[key]==1?"file.PK_OK_2":"file_DUPLICATE_PK_2")
}
##Checking condition if NF is neither 7 nor 8 then do following.
NF!=7 && NF!=8{
##Setting outputFile(where contents will be written to) to file_INVALID_LENGTH here.
outputFile="file_INVALID_LENGTH"
}
{
##Printing current line to outputFile(already set its value above)
print > (outputFile)
}
##Mentioning Input_file names here.
' Input_file Input_file
Normally I'd recommend a first pass with sort and uniq -c for efficiency, but I started out assuming the wrong requirements and wrote most of this under that assumption, then tweaked it for the real requirements. So here's how to do it all in one awk script:
$ cat tst.awk
BEGIN {
FS=OFS="|"
map[7] = 1
map[8] = 2
}
{ key = $1 FS $(NF-2) FS NF }
NR==FNR {
cnt[key]++
next
}
{
if ( NF in map ) {
sfx = ( cnt[key]>1 ? "DUPLICATE_PK" : "PK_OK" ) "_" map[NF]
}
else {
sfx = "INVALID_LENGTH"
}
print > (nom_arch "." sfx)
}
$ awk -v nom_arch='foo' -f tst.awk file file
$ head foo.*
==> foo.DUPLICATE_PK_1 <==
AAA|XXX|YYY|DDD|444|20210115|JONH2
AAA|BBB|MMM|DDD|444|20200131|JONH4
==> foo.DUPLICATE_PK_2 <==
AAA|BBB|CCC|DDD|777|0054256|JONH5|MARY
AAA|BBB|CCC|DDD|999|0054256|JONH5|MARY
==> foo.INVALID_LENGTH <==
AAA|BBB|CCC|DDD|202|0054256|JONH5|MARY|MIAMI|FL
==> foo.PK_OK_1 <==
AAA|BBB|CCC|DDD|111|20220129|JONH1
AAA|B10|CCC|DDD|000|20200127|JONH3
==> foo.PK_OK_2 <==
AAA|BBB|CCC|DDD|111|0036000|JONH5|MARY
AAA|BBB|CCC|DDD|888|0089999|CENTRAL|MARY
I corrected the spelling of LENGTH above.
Note that NF is included in key = $1 FS $(NF-2) FS NF to avoid a potential case pointed out by @rowboat, where a line with 7 fields has the same $1 and $(NF-2) as a line with 8 fields; without NF in the key we would count that as one key seen twice when it should be 2 separate counts of 1.
We could have used NF-6 instead of map[NF] when setting the sfx but the map[] is useful for identifying valid NF values too and there may be other values of NF in future for which the sfx can't be determined by just subtracting 6.
This uses GNU awk for arrays of arrays:
# classify.awk
BEGIN {
FS = "|"
ok[7] = ".PK_OK_1"; dup[7] = ".DUPLICATE_PK_1"
ok[8] = ".PK_OK_2"; dup[8] = ".DUPLICATE_PK_2"
}
NF < 7 || NF > 8 {
print > (nom_arch ".INVALID_LENGTH")
next
}
{
pk = $1 SUBSEP (NF == 7 ? $5 : $6)
count[NF][pk]++
lines[NF][pk] = lines[NF][pk] $0 ORS
}
END {
for (nf in count)
for (pk in count[nf]) {
outfile = nom_arch (count[nf][pk] == 1 ? ok[nf] : dup[nf])
sub(ORS"$", "", lines[nf][pk])
print lines[nf][pk] > outfile
}
}
Then this will produce the desired output files
gawk -f classify.awk -v nom_arch="foo" file
The awk SUBSEP variable is used in array keys when you do something like
var[x,y] = 10
awk uses the value of SUBSEP to join the values of x and y.
The default SUBSEP value is "\034" (octal 034, the ASCII "file separator" control character), which is unlikely to appear in text data.
This version is more portable; it does not require GNU awk:
BEGIN {
FS = "|"
ok[7] = ".PK_OK_1"; dup[7] = ".DUPLICATE_PK_1"
ok[8] = ".PK_OK_2"; dup[8] = ".DUPLICATE_PK_2"
}
NF < 7 || NF > 8 {
print > (nom_arch".INVALID_LENGTH")
next
}
{
pk = NF SUBSEP $1 SUBSEP (NF == 7 ? $5 : $6)
count[pk]++
lines[pk] = lines[pk] $0 ORS
}
END {
for (pk in count) {
sub(ORS"$", "", lines[pk])
nf = pk; sub(SUBSEP".*", "", nf)
outfile = nom_arch (count[pk] == 1 ? ok[nf] : dup[nf])
print lines[pk] > outfile
}
}
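Both versions rely on how awk builds multi-subscript keys. A minimal sketch (subsep_demo is a hypothetical name) showing that var[x, y] stores a single string key joined with SUBSEP, which split() can take apart again and which (x, y) in var looks up:

```shell
#!/bin/sh
# Sketch: how var[x, y] keys are built from SUBSEP and taken apart again.
# subsep_demo is a hypothetical name.
subsep_demo() {
    awk 'BEGIN {
        var["AAA", 444] = 10                  # key is stored as "AAA" SUBSEP "444"
        for (k in var) {
            if (k == "AAA" SUBSEP 444)        # the comma in var[x, y] joins with SUBSEP
                print "key is AAA SUBSEP 444"
            split(k, part, SUBSEP)            # split on SUBSEP to recover x and y
            print part[1], part[2], var[k]
        }
        if (("AAA", 444) in var)              # (x, y) in array uses the same joined key
            print "found"
    }'
}

subsep_demo
```

The portable version above does the same thing in reverse when it strips everything from the first SUBSEP onward to recover nf from pk.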
If it's OK to put the first occurrence of a dup in with the OKs, then one pass is easy:
NOM_ARCH=/tmp/mytest
awk -v nom_arch="$NOM_ARCH" ' BEGIN { FS=OFS="|" }
{ if (NF ~ /^[78]$/) { key=($1 FS $(NF-2) FS NF) } else { print > (nom_arch ".INVALID_LENGTH"); next; }
print > ( nom_arch "." ( seen[key]++ ? "DUPLICATE_PK" : "PK_OK" ) "_" (NF-6) ) } ' file
cf. AAA|XXX|YYY|DDD|444|20210115|JONH2 and AAA|BBB|CCC|DDD|777|0054256|JONH5|MARY, which land in the OK files as the first hits for their keys, while subsequent dups get seen and directed elsewhere. Note that it might still be faster to shift those records between smaller files on a second pass after the fact.
Personally, I'd probably just split the records to key-sorted files by NF first. Then the second pass each is easy.
NOM_ARCH=/tmp/mytest
# this pre-sort is likely the slow part, though smaller files and in parallel
awk 'BEGIN { FS=OFS="|" } { k2=NF-2; print | "sort -t\\| -k1,1 -k"k2","k2">NF"NF; }' file
# extglob must be enabled before the line using !(...) is parsed, so it needs its own line
shopt -s extglob
cat NF!([78]) > $NOM_ARCH.INVALID_LENGTH &
for f in NF[78]; do
awk -v nom_arch="$NOM_ARCH" '
BEGIN { FS=OFS="|"; lastkey=""; lastrec=""; }
END { if(""!=lastrec){print lastrec>f} }
{ key=($1 FS $(NF-2));
if ( key==lastkey ) {
f=(nom_arch".DUPLICATE_PK_"NF-6);
if(""!=lastrec){print lastrec>f}
print $0>f;
lastrec="";
} else {
if(""!=lastrec){print lastrec>f}
f=(nom_arch".PK_OK_"NF-6);
lastkey=($1 FS $(NF-2));
lastrec=$0;
}
}' "$f" &
done
wait
Now your data should be sorted to files. This likely reorders the records in those files (see below), so if that matters you should add sorts to those outputs as well.
mytest.PK_OK_1:
AAA|B10|CCC|DDD|000|20200127|JONH3
AAA|BBB|CCC|DDD|111|20220129|JONH1
mytest.PK_OK_2:
AAA|BBB|CCC|DDD|111|0036000|JONH5|MARY
AAA|BBB|CCC|DDD|888|0089999|CENTRAL|MARY
mytest.DUPLICATE_PK_1:
AAA|BBB|MMM|DDD|444|20200131|JONH4
AAA|XXX|YYY|DDD|444|20210115|JONH2
mytest.DUPLICATE_PK_2:
AAA|BBB|CCC|DDD|777|0054256|JONH5|MARY
AAA|BBB|CCC|DDD|999|0054256|JONH5|MARY
mytest.INVALID_LENGTH:
AAA|BBB|CCC|DDD|202|0054256|JONH5|MARY|MIAMI|FL
This uses more disk space but less memory than an internal lookup table, and is likely a lot slower.
YMMV.

awk - ternary conditional expression not working

Input (sample)
=== account ===
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
title,altTitle,platform,url,
__collate-by-account.awk
#! /usr/bin/awk -f
#
# Group together lines (records) by account name
BEGIN { FS = ":" }
### generate headers ###
{s = $1}
{if (s != p)
print "\n\n=== ", s " ==="
}
{p = s}
### process records ###
# print field $2 to last field
{for (i = 2; i <= NF; ++i)
# {if (i!=NF) printf $i":"; else printf $i}
{ i != NF ? printf $i":" : printf $i }
}
{printf "\n"}
This part works as intended:
{if (i!=NF) printf $i":"; else printf $i}
Why doesn't this work:
{ i != NF ? printf $i":" : printf $i }
Getting the following errors:
awk: scripts/utils/metadata/__collate-by-account.awk:18: { i != NF ? printf $i":" : printf $i }
awk: scripts/utils/metadata/__collate-by-account.awk:18: ^ syntax error
awk: scripts/utils/metadata/__collate-by-account.awk:18: { i != NF ? printf $i":" : printf $i }
awk: scripts/utils/metadata/__collate-by-account.awk:18: ^ syntax error
Solution, thanks to @James Brown:
### process records ###
# print field $2 to last field
{for (i = 2; i <= NF; ++i)
{ printf "%s%s",$i,(i!=NF?":":"") }
}
{printf "\n"}
Explanation:
First off, note that printf is a statement, not an expression, so it can't appear anywhere inside the ternary operator: neither in the condition being evaluated nor in either of the two results. The ternary operator can only choose between values.
printf formats and prints the results
%s%s are format specifiers: they output (substitute) the next 2 arguments as strings:
https://www.gnu.org/software/gawk/manual/html_node/Format-Modifiers.html
https://en.wikipedia.org/wiki/Printf_format_string
$i simply outputs the field that's being looped over; see the for loop above
(i!=NF?":":"")
output ":" if i is not equal to NF,
otherwise output empty string ""
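Putting the pieces together, here is a minimal runnable sketch of the whole collate script with the ternary fix. The sample records, the `records`/`out` shell variables and the simplified header line (without the two leading blank lines the original prints) are assumptions for illustration. The ternary stays a pure expression and only supplies printf's second argument.

```shell
#!/bin/sh
# Sketch of the collate script with the ternary fix.
# The records below are hypothetical, in account:field:field... form.
records='acct1:t1,a1,p1,u1,
acct1:t2:extra
acct2:t3,a3,p3,u3,'

out=$(printf '%s\n' "$records" | awk '
BEGIN { FS = ":" }
# Print a header whenever the account name (field 1) changes
$1 != prev { print "=== " $1 " ==="; prev = $1 }
{
    # The ternary is an expression, so it can be a printf argument:
    # it yields ":" between fields and "" after the last one.
    for (i = 2; i <= NF; ++i)
        printf "%s%s", $i, (i != NF ? ":" : "")
    printf "\n"
}')
printf '%s\n' "$out"
```

The second record shows the separator logic at work: its fields t2 and extra are rejoined as t2:extra, with no trailing colon.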

How to translate a column value in the file using awk with tr command in unix

Details:
Input file : file.txt
P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2
Expected output:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
It gives the correct result when I run tr on a single value:
/tmp>echo "P123456789"|tr "0-9" "5-9"|tr "A-Z" "X-Z"
Z678999999
But when I do the same inside an awk command, I get syntax errors instead of a result:
/tmp>$ awk 'BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }' /tmp/file.txt >/tmp/file.txt.tmp
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
awk: BEGIN { FS=OFS="," } { $1=echo $1|tr "0-9" "5-9"|tr "A-Z" "X-Z";$2="COLUMN2"); print }
awk: ^ syntax error
Can anyone help please?
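To do what you wanted without changing your logic, build the shell pipeline as a string, read its output back into $1 with getline, and close() the command so it can run again for the next line:
awk line:
awk -F, -v OFS="," '{ cmd = "echo \"" $1 "\" | tr \"0-9\" \"5-9\" | tr \"A-Z\" \"X-Z\""; cmd | getline $1; close(cmd) } 7'
with your data:
kent$ echo "P123456789,COLUMN2
P123456790,COLUMN2
P123456791,COLUMN2"|awk -F, -v OFS="," '{ cmd = "echo \"" $1 "\" | tr \"0-9\" \"5-9\" | tr \"A-Z\" \"X-Z\""; cmd | getline $1; close(cmd) } 7'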
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
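One caveat with `cmd | getline`: awk keeps the pipe open, keyed by the command string, until `close(cmd)` is called. If two rows had the same $1, the second getline on that string would hit end-of-file and leave $1 unchanged. The small sketch below (the command "echo P123" is just a placeholder) shows the return values of getline before and after close().

```shell
#!/bin/sh
# Demonstrates why close() matters for "cmd | getline": awk keeps the
# pipe open per command string, so repeating the same string reads EOF.
result=$(awk 'BEGIN {
    cmd = "echo P123"
    r1 = (cmd | getline a)   # 1: read one line, a = "P123"
    r2 = (cmd | getline b)   # 0: same pipe, already at end of file
    close(cmd)
    r3 = (cmd | getline c)   # 1: after close(), awk runs the command again
    print r1, r2, r3
}')
printf '%s\n' "$result"
```

It also keeps the number of simultaneously open pipes at one, which matters on large files with many distinct values.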
$ cat tst.awk
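function tr(old,new,str, oldA,newA,strA,i,j) {
# Split the "from" set, the "to" set and the string into single characters
# (an empty separator in split() works in gawk, mawk and BWK awk, but is not guaranteed by POSIX)
split(old,oldA,"")
split(new,newA,"")
split(str,strA,"")
str = ""
for (i=1;i in strA;i++) {
# Try each "from" character in turn until sub() replaces this character;
# sub() treats oldA[j] as a regex, so characters like "." would need escaping
for (j=1;(j in oldA) && !sub(oldA[j],newA[j],strA[i]);j++)
;
str = str strA[i]
}
return str
}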
BEGIN { FS=OFS="," }
{ print tr("P012345678","Z567899999",$1), $2 }
$ awk -f tst.awk file
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
Unfortunately, AWK does not have a built-in translation function. You could write one like Ed Morton has done, but I would reach for (and highly recommend) a more powerful tool. Perl, for example, can process fields using the autosplit (-a) command-line switch:
-a turns on autosplit mode when used with a -n or -p. An implicit split command to the @F array is done as the first thing inside the implicit while loop produced by the -n or -p.
You can type perldoc perlrun for more details.
Here's my solution:
perl -F, -lane '$F[0] =~ tr/0-9/5-9/; $F[0] =~ tr/A-Z/X-Z/; print join(",", @F)' file.txt
Results:
Z678999999,COLUMN2
Z678999995,COLUMN2
Z678999996,COLUMN2
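If you want to stay in plain awk, a character lookup with index() and substr() can mimic tr without regexes or external commands. This is a minimal sketch, not either answer's code: the helper name `tr`, its argument order, and the padding rule (repeat the last replacement character, as `tr "0-9" "5-9"` does) are our own choices, and the sample data is piped in rather than read from file.txt.

```shell
#!/bin/sh
# Sketch: a tr(1)-like helper in plain awk using index() and substr().
# The function name "tr" and its argument order are our own choice.
result=$(printf '%s\n' P123456789,COLUMN2 P123456790,COLUMN2 P123456791,COLUMN2 |
awk '
function tr(s, from, to,    out, i, c, p) {
    # Like tr(1), pad the replacement set by repeating its last character
    # ("to" must be non-empty)
    while (length(to) < length(from))
        to = to substr(to, length(to), 1)
    out = ""
    for (i = 1; i <= length(s); i++) {
        c = substr(s, i, 1)
        p = index(from, c)               # 0 if c is not in the "from" set
        out = out (p ? substr(to, p, 1) : c)
    }
    return out
}
BEGIN { FS = OFS = "," }
{
    # Same two steps as: tr "0-9" "5-9" | tr "A-Z" "X-Z"
    $1 = tr($1, "0123456789", "56789")
    $1 = tr($1, "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "XYZ")
    print
}')
printf '%s\n' "$result"
```

Because index() does a plain string search, characters such as "." need no escaping here, unlike a sub()-based approach.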

Parsing errors in awk blocks

awk 'BEGIN
{
INPUTFILE ='XXX'; iterator =0;
requestIterator =0;
storageFlag =T;
printFlag =F;
currentIteration =F;
recordCount =1;
while (getline < "'"$INPUTFILE"'")
{
requestArray[requestIterator]++;
requestIterator++;
}
}
if ($1 ~ /RequestId/)
{
FS = "=";
if($2 in requestArray)
{
storage[iterator] =$0;
printFlag =T;
next
}
else
{
storageFlag =F;
next
}
}
else
{
if((storageFlag =='T' && $0 != "EOE"))
{
storage[iterator]=$0; iterator++;
}
else {if(storageFlag == 'F')
{
next
}
else
{
if(printFlag == 'T')
{
for(details in storage)
{
print storage[details] >> FILE1;
delete storage[details];
}
printFlag =F;
storageFlag =T;
next
}
}'
I am getting syntax errors in the above code. Could you please help me?
awk: BEGIN{INPUTFILE =XXXX;iterator =0;requestIterator =0;storageFlag =T;printFlag =F;currentIteration =F;recordCount =1;while (getline < ""){requestArray[requestIterator]++;requestIterator++;}}if ($1 ~ /RequestId/){FS = "=";if($2 in requestArray){storage[iterator] =$0;printFlag =T;next}else{storageFlag =F;next}}else{if((storageFlag ==T && $0 != EOE)){storage[iterator]=$0;iterator++;}else{if(storageFlag == F){next}else{if(printFlag == T){for(details in storage){print storage[details] >> XXXX;delete storage[details];}printFlag = F;storageFlag =T;next}}}}
awk: ^ syntax error
awk: ^ syntax error
Quotes are the problem. The first single quote in INPUTFILE ='XXX' is parsed by the shell as the closing match for the one before BEGIN, and from then on all the parsing is broken.
Either fix the quoting or just put the awk program into a separate file rather than inline.
# STARTING POINT - known bad
awk 'BEGIN { INPUTFILE ='XXX'; iterator =0; ... '
It has to be rewritten to remove all of the single quotes inside the outer pair (awk strings use double quotes):
awk 'BEGIN { INPUTFILE ="XXX"; iterator =0; ... '
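Or, if you want double quotes outside, the awk strings inside need escaped double quotes, because awk has no single-quoted strings ('XXX' fails in awk itself with invalid char ''' in expression). Any $1, $0 etc. must then also be written as \$1 so the shell does not expand them:
awk "BEGIN { INPUTFILE = \"XXX\"; iterator = 0; ... "
Escaping the single quotes as \' does not work either, because the shell does no backslash processing inside single quotes. If awk really needs a literal ' character, close the shell quote, add an escaped quote and reopen it ('\''), or write it as "\047" inside an awk string:
awk 'BEGIN { q = "\047"; print q "XXX" q }'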
All of your problems go away if you put the awk script into a separate file rather than inlining it in the shell. The shell never sees the program, so its quotes and $ signs no longer clash with awk's!
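A minimal sketch of the separate-file approach, assuming hypothetical file names (ids.txt, prog.awk) and a simplified RequestId=... record format rather than the full logic of the question. The shell value is passed with -v instead of being spliced in with "'"$INPUTFILE"'", and the getline loop compares against 0, since getline returns -1 when the file cannot be opened and a bare while (getline < file) would then loop forever.

```shell
#!/bin/sh
# Sketch: keep the awk program in its own file and pass shell values
# with -v instead of splicing them into a quoted string.
# File names (ids.txt, prog.awk) and the record format are hypothetical.
dir=$(mktemp -d)
printf '%s\n' 101 103 > "$dir/ids.txt"

# Quoting the heredoc delimiter ('EOF') stops the shell from expanding
# $1, $2 or touching any quotes inside the awk program.
cat > "$dir/prog.awk" <<'EOF'
BEGIN {
    # Compare with > 0: getline returns -1 when the file cannot be read,
    # and a bare while (getline < file) would then loop forever.
    while ((getline id < idfile) > 0)
        wanted[id] = 1
    close(idfile)
    FS = "="
}
$1 == "RequestId" && ($2 in wanted) { print "matched " $2 }
EOF

result=$(printf '%s\n' RequestId=101 RequestId=102 RequestId=103 |
    awk -v idfile="$dir/ids.txt" -f "$dir/prog.awk")
printf '%s\n' "$result"
```

Setting FS inside BEGIN is enough here because it takes effect before the first record is read.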