extract info from a tag using awk - awk

I have multi columns file and i want to extract some info in column 71.
I want to extract using tags which the value can be anything, for example i want to just extract AC=* ; AF=* , where the value can be anything .
I found similar question and gave it a try but it didn't work
Extract columns with values matching a specific pattern
Column 71 looks like this:
AC=14511;AC_AFR=382;AC_AMR=1177;AC_Adj=14343;AC_EAS=5;AC_FIN=427;AC_Het=11813;AC_Hom=1265;AC_NFE=11027;AC_OTH=97;AC_SAS=1228;AF=0.137;AN=106198;AN_AFR=8190;AN_AMR=10424;AN_Adj=99264;AN_EAS=7068;AN_FIN=6414;AN_NFE=51090;AN_OTH=658;AN_SAS=15420;BaseQRankSum=1.73;ClippingRankSum=-1.460e-01;DB;DP=1268322;FS=0.000;GQ_MEAN=190.24;GQ_STDDEV=319.67;Het_AFR=358;Het_AMR=1049;Het_EAS=5;Het_FIN=399;Het_NFE=8799;Het_OTH=83;Het_SAS=1120;Hom_AFR=12;Hom_AMR=64;Hom_EAS=0;Hom_FIN=14;Hom_NFE=1114;Hom_OTH=7;Hom_SAS=54;InbreedingCoeff=0.0478;MQ=60.00;MQ0=0;MQRankSum=0.037;NCC=270;POSITIVE_TRAIN_SITE;QD=21.41;ReadPosRankSum=0.212;VQSLOD=4.79;culprit=MQ;DP_HIST=30|3209|1539|1494|30007|7938|4130|2038|1310|612|334|185|97|60|31|25|9|11|7|33,0|66|339|1048|2096|2665|2626|1832|1210|584|323|179|89|54|31|22|7|9|4|15;GQ_HIST=84|66|56|82|3299|568|617|403|250|319|436|310|28566|2937|827|834|451|186|217|12591,15|15|13|16|25|11|22|28|18|38|52|31|65|76|39|83|93|65|97|12397;CSQ=T|ENSG00000186868|ENST00000334239|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11502.1|ENSP00000334886|TAU_HUMAN|B4DSE3_HUMAN|UPI0000000C16||||2/8||ENST00000334239.8:c.134-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000570299|Transcript|intron_variant&non_coding_transcript_variant||||||rs754512|1||1|MAPT|HGNC|6893|processed_transcript||||||||||2/6||ENST00000570299.1:n.262-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000340799|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS45716.1|ENSP00000340438|TAU_HUMAN||UPI000004EEE6||||3/10||ENST00000340799.5:c.221-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000262410|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11501.1|ENSP00000262410|TAU_HUMAN||UPI0000EE80B7||||4/13||ENST00000262410.5:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000446361|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11500.1|ENSP00000408975|TAU_HUMAN||UPI000004EEE5||||2/9||ENST00000446361.3:c.134-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000574436|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11499.1|ENSP00000460965|TAU_HUMAN||UPI000002D754||||3/10||ENST00000574436.1:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000571987|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11501.1|ENSP00000458742|TAU_HUMAN||UPI0000EE80B7||||3/12||ENST00000571987.1:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000415613|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS45715.1|ENSP00000410838|TAU_HUMAN||UPI0001AE66E9||||3/13||ENST00000415613.2:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000571311|Transcript|intron_variant&NMD_transcript_variant||||||rs754512|1||1|MAPT|HGNC|6893|nonsense_mediated_decay|||ENSP00000460048||I3L2Z2_HUMAN|UPI00025A2E6E||||4/4||ENST00000571311.1:c.*176-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000535772|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS56033.1|ENSP00000443028|TAU_HUMAN|B4DSE3_HUMAN|UPI000004EEE4||||4/10||ENST00000535772.1:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000576518|Transcript|stop_gained|5499|7|3|K/*|Aag/Tag|rs754512|1||1|MAPT|HGNC|6893|protein_coding|||ENSP00000458621||I3L170_HUMAN&B4DSE3_HUMAN|UPI0001639A7C|||1/7|||ENST00000576518.1:c.7A>T|ENSP00000458621.1:p.Lys3Ter|T:0.1171|||||||||15792962|||||POSITION:0.00682261208576998&ANN_ORF:-255.6993&MAX_ORF:-255.6993|PHYLOCSF_WEAK|ANC_ALLELE|LC,T|ENSG00000186868|ENST00000420682|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS45716.1|ENSP00000413056|TAU_HUMAN||UPI000004EEE6||||2/9||ENST00000420682.2:c.221-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000572440|Transcript|non_coding_transcript_exon_variant&non_coding_transcript_variant|2790|||||rs754512|1||1|MAPT|HGNC|6893|retained_intron|||||||||1/1|||ENST00000572440.1:n.2790A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000351559|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS11499.1|ENSP00000303214|TAU_HUMAN||UPI000002D754||||4/11||ENST00000351559.5:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000344290|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding|YES|CCDS45715.1|ENSP00000340820|TAU_HUMAN||UPI0001AE66E9||||4/14||ENST00000344290.5:c.308-94A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000347967|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding|||ENSP00000302706|TAU_HUMAN|B4DSE3_HUMAN|UPI0000173D91||||4/10||ENST00000347967.5:c.32-100A>T||T:0.1171|||||||||15792962||||||||,T|ENSG00000186868|ENST00000431008|Transcript|intron_variant||||||rs754512|1||1|MAPT|HGNC|6893|protein_coding||CCDS56033.1|ENSP00000389250|TAU_HUMAN|B4DSE3_HUMAN|UPI000004EEE4||||3/9||ENST00000431008.3:c.308-94A>T||T:0.1171|||||||||15792962||||||||
The code that i tried:
awk '{
for (i = 1; i <= NF; i++) {
if ($i ~ /AC|AF/) {
printf "%s %s ", $i, $(i + 1)
}
}
print ""
}'
I keep getting syntax error.
output wanted :
AC=14511;AF=0.137

Whenever you have name=value pairs, it's usually simplest to first create an array that maps names to values (n2v[] below) and then you can just access the values by their names.
$ cat file
AC=1;AC_AFR=2;AF=3 AC=4;AC_AFR=5;AF=6
$ cat tst.awk
{
delete n2v
split($2,tmp,/[;=]/)
for (i=1; i in tmp; i+=2) {
n2v[tmp[i]] = tmp[i+1]
}
prt("AC")
prt("AF")
}
function prt(name) { print name, "=", n2v[name] }
$ awk -f tst.awk file
AC = 4
AF = 6
Just change $2 to $71 for your real input.

Something like this should do it (in Gnu awk due to switch):
$ awk '{split($71,a,";");for(i in a )if(a[i]~/^AF/) print a[i]}' foo
AF=0.137
You split the field $71 by ;s, loop thru the array you split to looking for desired match. For multiple matches use switch:
$ awk '{
split($0,a,";");
for(i in a )
switch(a[i]) {
case /^AF=/:
b=b a[i] OFS;
break;
case /^AC=/:
b=b a[i] OFS;
break
}
sub(/.$/,"\n",b);
printf b
}' foo
AC=14511 AF=0.137
EDIT: Now it buffers output to a variable and prints it in the end. You can control the separator with OFS.

Related

Issue converting github.com/*/raw/* URLs to raw.githubusercontent.com URLS using AWK

Given the following example URLs:
urls.txt
https://github.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt
https://github.com/34730/asd/raw/master/adaway-export
https://github.com/568475513/secret_domain/raw/master/filter.txt
https://github.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt
https://github.com/CipherOps/MiscHostsFiles/raw/master/MiscAdTrackingHostBlock.txt
https://github.com/DK-255/Pi-hole-list-1/raw/main/Ads-Blocklist
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/adservers-and-trackers.txt
https://github.com/DRSDavidSoft/additional-hosts/raw/master/domains/blacklist/unwanted-iranian.txt
https://github.com/DandelionSprout/adfilt/raw/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://github.com/DavidTai780/AdGuard-Home-Private-Rules/raw/master/hosts.txt
https://github.com/DivineEngine/Profiles/raw/master/Quantumult/Filter/Guard/Advertising.list
https://github.com/Hariharann8175/Indicators-of-Compromise-IOC-/raw/master/Ransomware%20URL's
https://github.com/JumbomanXDA/host/raw/main/hosts
https://github.com/Kees1958/W3C_annual_most_used_survey_blocklist/raw/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://github.com/KurzGedanke/kurzBlock/raw/master/kurzBlock.txt
https://github.com/MajkiIT/polish-ads-filter/raw/master/polish-adblock-filters/adblock.txt
https://github.com/MitaZ/Better_Filter/raw/master/Quantumult_X/Filter.list
https://github.com/MrWaste/Ad-BlockList-2019-08-31/raw/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://github.com/Neo23x0/signature-base/raw/master/iocs/c2-iocs.txt
https://github.com/Pentanium/ABClientFilters/raw/master/ko/korean.txt
https://github.com/Phentora/AdguardPersonalList/raw/master/blocklist.txt
https://github.com/ShadowWhisperer/BlockLists/raw/master/Lists/Malware
https://github.com/SlashArash/adblockfa/raw/master/adblockfa.txt
https://github.com/SukkaW/Surge/raw/master/List/domainset/reject_sukka.conf
https://github.com/Th3M3/blocklists/raw/master/tracking%26ads.list
https://github.com/TonyRL/blocklist/raw/master/hosts
https://github.com/UnbendableStraw/samsungnosnooping/raw/master/README.md
https://github.com/UnluckyLuke/BlockUnderRadarJunk/raw/master/blockunderradarjunk-list.txt
https://github.com/VernonStow/Filterlist/raw/master/Filterlist.txt
https://github.com/What-Zit-Tooya/Ad-Block/raw/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://github.com/XionKzn/PiHole-Lists/raw/master/PiHole/Blocklist_HOSTS.txt
https://github.com/YanFung/Ads/raw/master/Mobile
https://github.com/Yuki2718/adblock/raw/master/adguard/tracking-plus.txt
https://github.com/Yuki2718/adblock/raw/master/japanese/jp-filters.txt
https://github.com/ZYX2019/host-block-list/raw/master/Custom.txt
https://github.com/abc45628/hosts/raw/master/hosts
https://github.com/aleclee/DNS-Blacklists/raw/master/AdHosts.txt
https://github.com/angelics/pfbng/raw/master/ads/ads-domain-list.txt
https://github.com/blocklistproject/Lists/raw/master/ransomware.txt
https://github.com/cchevy/macedonian-pi-hole-blocklist/raw/master/hosts.txt
https://github.com/craiu/mobiletrackers/raw/master/list.txt
https://github.com/curutpilek12/adguard-custom-list/raw/main/custom
https://github.com/damengzhu/banad/raw/main/jiekouAD.txt
https://github.com/deletescape/noads/raw/master/lists/add-switzerland.txt
https://github.com/doadin/Pi-Hole-Blocklist/raw/main/block.list
https://github.com/dreammjow/MyFilters/raw/main/src/filters.txt
https://github.com/durablenapkin/block/raw/master/streaming.txt
https://github.com/eEIi0A5L/adblock_filter/archive/master.zip
https://github.com/easylist-thailand/easylist-thailand/raw/master/subscription/easylist-thailand.txt
https://github.com/fandagroupofficial/hosts/raw/main/pihole/ads
https://github.com/fandagroupofficial/hosts/raw/main/pihole/log
https://github.com/fandagroupofficial/hosts/raw/main/pihole/trackers
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Popups_Head
https://github.com/faralai/Pihole-Rules/raw/master/Fara-Xiaomi-info
https://github.com/farrokhi/adblock-iran/raw/master/filter.txt
https://github.com/fskreuz/blocklists/raw/dev/domains.txt
https://github.com/ftpmorph/ftprivacy/raw/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://github.com/hell-sh/Evil-Domains/raw/master/evil-domains.txt
https://github.com/hosts-file/BulgarianHostsFile/raw/master/bhf.txt
https://github.com/igorskyflyer/ad-void/raw/main/AdVoid.Core.txt
https://github.com/jackrabbit335/UsefulLinuxShellScripts/raw/master/Hosts%20%26%20sourcelist/blacklist.txt
https://github.com/jakdev121/AMS2/raw/master/pi_indo_ads.txt
https://github.com/jakejarvis/ios-trackers/raw/master/blocklist.txt
https://github.com/jasirfayas/jBlocklist/raw/master/domains.lst
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.ghostery_bugs.conf
https://github.com/javabean/dnsmasq-antispy/raw/master/dnsmasq.zz-extra-servers-manual.conf
https://github.com/jdlingyu/ad-wars/raw/master/hosts
https://github.com/jlonborg/piblacklist/raw/main/blacklist.txt
https://github.com/joaopinto14/PiHole/raw/main/adverts
https://github.com/kang49/kang49regexblacklistproject/raw/main/blacklist
https://github.com/lesong/Surge/raw/main/rule/BanProgramAD.list
https://github.com/lhie1/Rules/raw/master/Auto/REJECT.conf
https://github.com/mayesidevel/PiHoleLists/raw/master/MiscBlocklist
https://github.com/meinhimmel/hosts/raw/master/hosts
https://github.com/mhhakim/pihole-blocklist/raw/master/custom-blocklist.txt
https://github.com/migueldemoura/ublock-umatrix-rulesets/raw/master/Hosts/ads-tracking
https://github.com/minoplhy/filters/raw/main/Resources/blocked.txt
https://github.com/monojp/hosts_merge/raw/master/hosts_blacklist.txt
https://github.com/mtbnunu/ad-blocklist/raw/master/kr-list.txt
https://github.com/mtxadmin/ublock/raw/master/hosts/_telemetry
https://github.com/mullvad/dns-adblock/raw/main/lists/doh/adblock/custom
https://github.com/muxcc/AdsBlockLists/raw/master/aumm.hosts
https://github.com/nimasaj/uBOPa/raw/master/uBOPa.txt
https://github.com/notracking/hosts-blocklists/raw/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/dns.txt
https://github.com/npljy/npljy.github.io/raw/main/blocks/filter.txt
https://github.com/olegwukr/polish-privacy-filters/raw/master/adblock.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-government-malware.txt
https://github.com/parseword/nolovia/raw/master/skel/hosts-nolovia.txt
https://github.com/pathforwardit/BlockList/raw/main/DomainList
https://github.com/pirat28/IHateTracker/raw/master/iHateTracker.txt
https://github.com/sa-ki13/jmsf/raw/master/japanese_mobile_site_dns_filter.txt
https://github.com/saurane/Turkish-Blocklist/raw/master/Blocklist/domains.txt
https://github.com/scomper/surge-list/raw/master/reject.list
https://github.com/sirsunknight/QuantumultX/raw/master/Filter/Radical-Advertising
https://github.com/smed79/blacklist/raw/master/hosts.txt
https://github.com/soteria-nou/domain-list/archive/master.zip
https://github.com/stamparm/maltrail/raw/master/trails/static/suspicious/pua.txt
https://github.com/sutchan/dnsmasq_ads_filter/raw/main/dnsmasq-ads-filter-list.txt
https://github.com/svetlyobg/svet-custom-domains/raw/master/ads-domains
https://github.com/tomzuu/blacklist-named/raw/master/ad.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/phishing.sites.conf
https://github.com/tomzuu/blacklist-named/raw/master/pushing.sites.conf
https://github.com/uBlockOrigin/uAssets/raw/master/filters/badware.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/filters.txt
https://github.com/uBlockOrigin/uAssets/raw/master/filters/privacy.txt
https://github.com/unchartedsky/adguard-kr/raw/master/adguard-kr.txt
https://github.com/unflac/adFILTER/raw/master/filter.txt
https://github.com/vokins/ad/raw/main/ad.list
https://github.com/willianreis89/ADsBlock/raw/master/list.txt
https://github.com/wrysunny/ad_list/raw/master/adlist.txt
https://github.com/xOS/Config/raw/Her/Surge/RuleSet/Advertising.list
https://github.com/xinggsf/Adblock-Plus-Rule/raw/master/rule.txt
https://github.com/xlimit91/xlimit91-block-list/raw/master/blacklist.txt
https://github.com/xylagbx/ADBLOCK/raw/master/BLOCK/customadblockdomain.txt
https://github.com/ziozzang/adguard/raw/master/filter.txt
https://github.com/zznidar/BAR/raw/master/BAR-list
I'm using this command:
awk 'BEGIN{FS=OFS="/"}{if ($6~/^raw$/){$3="raw.githubusercontent.com"; for(i=0;i<=NF;++i) if (i!=6) {printf("%s%s",$i,(i==NF)?"\n":OFS)}}}' urls.txt
To produce this desired output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
But it yields this output:
https://raw.githubusercontent.com/2RDLive/Pi-Hole/raw/master/Blacklist.txt/https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/raw/master/adaway-export/https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/raw/master/filter.txt/https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/raw/master/Regular%20Hosts.txt/https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
...
Why is it printing a semblance of the original URL before the correct output?
Here is the above code formatted legibly with gawk -o-:
BEGIN {
FS = OFS = "/"
}
{
if ($6 ~ /^raw$/) {
$3 = "raw.githubusercontent.com"
for (i = 0; i <= NF; ++i) {
if (i != 6) {
printf "%s%s", $i, (i == NF) ? "\n" : OFS
}
}
}
}
Your only real problem is that awk fields, arrays, and strings all start at 1, not 0, so your loop should have started at 1, not 0. As written first time through your loop print $i is doing print $0.
Having said that, I think what you want is the following with a couple of other things tidied up:
$ cat tst.awk
BEGIN { FS=OFS="/" }
sub(/^raw$/,RS,$6) && sub(OFS RS,"") {
$3 = "raw.githubusercontent.com"
print
}
$ awk -f tst.awk urls.txt
https://raw.githubusercontent.com/2RDLive/Pi-Hole/master/Blacklist.txt
https://raw.githubusercontent.com/34730/asd/master/adaway-export
https://raw.githubusercontent.com/568475513/secret_domain/master/filter.txt
https://raw.githubusercontent.com/BlackJack8/iOSAdblockList/master/Regular%20Hosts.txt
https://raw.githubusercontent.com/CipherOps/MiscHostsFiles/master/MiscAdTrackingHostBlock.txt
https://raw.githubusercontent.com/DK-255/Pi-hole-list-1/main/Ads-Blocklist
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/adservers-and-trackers.txt
https://raw.githubusercontent.com/DRSDavidSoft/additional-hosts/master/domains/blacklist/unwanted-iranian.txt
https://raw.githubusercontent.com/DandelionSprout/adfilt/master/Alternate%20versions%20Anti-Malware%20List/AntiMalwareHosts.txt
https://raw.githubusercontent.com/DavidTai780/AdGuard-Home-Private-Rules/master/hosts.txt
https://raw.githubusercontent.com/DivineEngine/Profiles/master/Quantumult/Filter/Guard/Advertising.list
https://raw.githubusercontent.com/Hariharann8175/Indicators-of-Compromise-IOC-/master/Ransomware%20URL's
https://raw.githubusercontent.com/JumbomanXDA/host/main/hosts
https://raw.githubusercontent.com/Kees1958/W3C_annual_most_used_survey_blocklist/master/EU_US%2Bmost_used_ad_and_tracking_networks
https://raw.githubusercontent.com/KurzGedanke/kurzBlock/master/kurzBlock.txt
https://raw.githubusercontent.com/MajkiIT/polish-ads-filter/master/polish-adblock-filters/adblock.txt
https://raw.githubusercontent.com/MitaZ/Better_Filter/master/Quantumult_X/Filter.list
https://raw.githubusercontent.com/MrWaste/Ad-BlockList-2019-08-31/master/Pi-Hole%20BackUps/Black%20List/All%20Server%20Black%20List
https://raw.githubusercontent.com/Neo23x0/signature-base/master/iocs/c2-iocs.txt
https://raw.githubusercontent.com/Pentanium/ABClientFilters/master/ko/korean.txt
https://raw.githubusercontent.com/Phentora/AdguardPersonalList/master/blocklist.txt
https://raw.githubusercontent.com/ShadowWhisperer/BlockLists/master/Lists/Malware
https://raw.githubusercontent.com/SlashArash/adblockfa/master/adblockfa.txt
https://raw.githubusercontent.com/SukkaW/Surge/master/List/domainset/reject_sukka.conf
https://raw.githubusercontent.com/Th3M3/blocklists/master/tracking%26ads.list
https://raw.githubusercontent.com/TonyRL/blocklist/master/hosts
https://raw.githubusercontent.com/UnbendableStraw/samsungnosnooping/master/README.md
https://raw.githubusercontent.com/UnluckyLuke/BlockUnderRadarJunk/master/blockunderradarjunk-list.txt
https://raw.githubusercontent.com/VernonStow/Filterlist/master/Filterlist.txt
https://raw.githubusercontent.com/What-Zit-Tooya/Ad-Block/main/Main-Blocklist/Ad-Block-HOSTS.txt
https://raw.githubusercontent.com/XionKzn/PiHole-Lists/master/PiHole/Blocklist_HOSTS.txt
https://raw.githubusercontent.com/YanFung/Ads/master/Mobile
https://raw.githubusercontent.com/Yuki2718/adblock/master/adguard/tracking-plus.txt
https://raw.githubusercontent.com/Yuki2718/adblock/master/japanese/jp-filters.txt
https://raw.githubusercontent.com/ZYX2019/host-block-list/master/Custom.txt
https://raw.githubusercontent.com/abc45628/hosts/master/hosts
https://raw.githubusercontent.com/aleclee/DNS-Blacklists/master/AdHosts.txt
https://raw.githubusercontent.com/angelics/pfbng/master/ads/ads-domain-list.txt
https://raw.githubusercontent.com/blocklistproject/Lists/master/ransomware.txt
https://raw.githubusercontent.com/cchevy/macedonian-pi-hole-blocklist/master/hosts.txt
https://raw.githubusercontent.com/craiu/mobiletrackers/master/list.txt
https://raw.githubusercontent.com/curutpilek12/adguard-custom-list/main/custom
https://raw.githubusercontent.com/damengzhu/banad/main/jiekouAD.txt
https://raw.githubusercontent.com/deletescape/noads/master/lists/add-switzerland.txt
https://raw.githubusercontent.com/doadin/Pi-Hole-Blocklist/main/block.list
https://raw.githubusercontent.com/dreammjow/MyFilters/main/src/filters.txt
https://raw.githubusercontent.com/durablenapkin/block/master/streaming.txt
https://raw.githubusercontent.com/easylist-thailand/easylist-thailand/master/subscription/easylist-thailand.txt
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/ads
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/log
https://raw.githubusercontent.com/fandagroupofficial/hosts/main/pihole/trackers
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Popups_Head
https://raw.githubusercontent.com/faralai/Pihole-Rules/master/Fara-Xiaomi-info
https://raw.githubusercontent.com/farrokhi/adblock-iran/master/filter.txt
https://raw.githubusercontent.com/fskreuz/blocklists/dev/domains.txt
https://raw.githubusercontent.com/ftpmorph/ftprivacy/master/regex-blocklists/smartphone-and-general-ads-analytics-regex-blocklist-ftprivacy.txt
https://raw.githubusercontent.com/hell-sh/Evil-Domains/master/evil-domains.txt
https://raw.githubusercontent.com/hosts-file/BulgarianHostsFile/master/bhf.txt
https://raw.githubusercontent.com/igorskyflyer/ad-void/main/AdVoid.Core.txt
https://raw.githubusercontent.com/jackrabbit335/UsefulLinuxShellScripts/master/Hosts%20%26%20sourcelist/blacklist.txt
https://raw.githubusercontent.com/jakdev121/AMS2/master/pi_indo_ads.txt
https://raw.githubusercontent.com/jakejarvis/ios-trackers/master/blocklist.txt
https://raw.githubusercontent.com/jasirfayas/jBlocklist/master/domains.lst
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.ghostery_bugs.conf
https://raw.githubusercontent.com/javabean/dnsmasq-antispy/master/dnsmasq.zz-extra-servers-manual.conf
https://raw.githubusercontent.com/jdlingyu/ad-wars/master/hosts
https://raw.githubusercontent.com/jlonborg/piblacklist/main/blacklist.txt
https://raw.githubusercontent.com/joaopinto14/PiHole/main/adverts
https://raw.githubusercontent.com/kang49/kang49regexblacklistproject/main/blacklist
https://raw.githubusercontent.com/lesong/Surge/main/rule/BanProgramAD.list
https://raw.githubusercontent.com/lhie1/Rules/master/Auto/REJECT.conf
https://raw.githubusercontent.com/mayesidevel/PiHoleLists/master/MiscBlocklist
https://raw.githubusercontent.com/meinhimmel/hosts/master/hosts
https://raw.githubusercontent.com/mhhakim/pihole-blocklist/master/custom-blocklist.txt
https://raw.githubusercontent.com/migueldemoura/ublock-umatrix-rulesets/master/Hosts/ads-tracking
https://raw.githubusercontent.com/minoplhy/filters/main/Resources/blocked.txt
https://raw.githubusercontent.com/monojp/hosts_merge/master/hosts_blacklist.txt
https://raw.githubusercontent.com/mtbnunu/ad-blocklist/master/kr-list.txt
https://raw.githubusercontent.com/mtxadmin/ublock/master/hosts/_telemetry
https://raw.githubusercontent.com/mullvad/dns-adblock/main/lists/doh/adblock/custom
https://raw.githubusercontent.com/muxcc/AdsBlockLists/master/aumm.hosts
https://raw.githubusercontent.com/nimasaj/uBOPa/master/uBOPa.txt
https://raw.githubusercontent.com/notracking/hosts-blocklists/master/dnscrypt-proxy/dnscrypt-proxy.blacklist.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/dns.txt
https://raw.githubusercontent.com/npljy/npljy.github.io/main/blocks/filter.txt
https://raw.githubusercontent.com/olegwukr/polish-privacy-filters/master/adblock.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-government-malware.txt
https://raw.githubusercontent.com/parseword/nolovia/master/skel/hosts-nolovia.txt
https://raw.githubusercontent.com/pathforwardit/BlockList/main/DomainList
https://raw.githubusercontent.com/pirat28/IHateTracker/master/iHateTracker.txt
https://raw.githubusercontent.com/sa-ki13/jmsf/master/japanese_mobile_site_dns_filter.txt
https://raw.githubusercontent.com/saurane/Turkish-Blocklist/master/Blocklist/domains.txt
https://raw.githubusercontent.com/scomper/surge-list/master/reject.list
https://raw.githubusercontent.com/sirsunknight/QuantumultX/master/Filter/Radical-Advertising
https://raw.githubusercontent.com/smed79/blacklist/master/hosts.txt
https://raw.githubusercontent.com/stamparm/maltrail/master/trails/static/suspicious/pua.txt
https://raw.githubusercontent.com/sutchan/dnsmasq_ads_filter/main/dnsmasq-ads-filter-list.txt
https://raw.githubusercontent.com/svetlyobg/svet-custom-domains/master/ads-domains
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/ad.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/phishing.sites.conf
https://raw.githubusercontent.com/tomzuu/blacklist-named/master/pushing.sites.conf
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/badware.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/filters.txt
https://raw.githubusercontent.com/uBlockOrigin/uAssets/master/filters/privacy.txt
https://raw.githubusercontent.com/unchartedsky/adguard-kr/master/adguard-kr.txt
https://raw.githubusercontent.com/unflac/adFILTER/master/filter.txt
https://raw.githubusercontent.com/vokins/ad/main/ad.list
https://raw.githubusercontent.com/willianreis89/ADsBlock/master/list.txt
https://raw.githubusercontent.com/wrysunny/ad_list/master/adlist.txt
https://raw.githubusercontent.com/xOS/Config/Her/Surge/RuleSet/Advertising.list
https://raw.githubusercontent.com/xinggsf/Adblock-Plus-Rule/master/rule.txt
https://raw.githubusercontent.com/xlimit91/xlimit91-block-list/master/blacklist.txt
https://raw.githubusercontent.com/xylagbx/ADBLOCK/master/BLOCK/customadblockdomain.txt
https://raw.githubusercontent.com/ziozzang/adguard/master/filter.txt
https://raw.githubusercontent.com/zznidar/BAR/master/BAR-list
The only slightly tricky part in that is sub(/^raw$/,RS,$6) && sub(OFS RS,"") which is how you remove a mid-record field in awk - first convert the field to a string that matches RS since that can't be present in the input (we can use RS directly when it's a string like \n rather than a regexp) so we changed raw to \n in the 6th field which meant the record now contained /\n/ and then removed /\n thereby removing the 6th field and preceding /.

awk to print of column headers if field has dot in it

In the tab-delimited file below I am trying to use awk to print out the headers of the fields if they contain a single . (dot). The other fields should not contain a . and I am going to use another awk to detect there data type (either alpha or integer --- could be a decimal). The below seems close but not working as expected. Thank you :).
file
Index HGMD Sanger Classification Pop
1 . . VUS .36
awk
awk -F'\t' '$2 && $3 ~ /./ && FNR == 1 {printf "dot detected in fields: ORS $0"}' file
Index HGMD Sanger Classification
desired output
dot detected in fields: HGMD, Sanger
Assuming you want the headers of the columns that have a single dot on any line (HGMD and Sanger here):
Index HGMD Sanger Classification Pop
1 . 2 VUS .36
1 . . VUS .36
One solution would be:
awk -F'\t' 'NR==1 {for (i=0 ; i <= NF ; i++) headers[i] = $i; } # 1
NR!=1 {for (i=0 ; i <= NF ; i++) if ($i == ".") dots[i] = 1} # 2
END { printf "Dots in fields: ";
for (x in headers) if (dots[x]) printf "%s ", headers[x]; # 3
printf "\n"
} ' file
(1) collect the headers from the first input line to array headers.
(2) On other input lines, compare the value to a single dot, and set the entry in array dots to record any found dots.
(3) Finally, print out headers of the columns with dots[i] set.
Output is Dots in fields: HGMD Sanger, i.e. they are only listed once.
The dot matches any character in a regex, so $3 ~ /./ in your snippet would be true if field 3 contained any character. Also, $2 && $3 ~ ... would first test field 2 for truthiness (an empty string is falsy), and then do the match on field 3.
Use an Awk as below
awk 'BEGIN{FS="\t"}NR==1{for(i=1;i<=NF;i++) header[i]=$i}{for(i=1;i<=NF;i++) { if (match($i,/^\.$/)) { print header[i] } } }' file
HGMD
Sanger
The idea is to get the header information from the first line hashed by index 1..n and when processing the actual lines, if the . is encountered, get the hashed value from the array and print it.
awk '
NR==1 { split($0,hdr); next }
{
for (i=1; i<=NF; i++) {
if ($i != ".") {
delete hdr[i]
}
}
}
END {
printf "dot detected in fields"
for (i in hdr) {
printf "%s %s", (c++?",":":"), hdr[i]
}
print ""
}
' file
dot detected in fields: HGMD, Sanger

missing field and extra space after using for loop in awk

I need to use an awk script to extract some information from a file.
I have a title line which has 11 field and I split it to an array called titleList.
Student Number:Name:Lab1:Lab2:Lab3:Lab4:Lab5:Lab6:Exam1:Exam2:Final
After finding a proper line I need to print the fields which proceeds by the titles for example if the result is :
92839342:Robert Bloomingdale:9:26:18:22:9:12:25:39:99
I must print it in this way:
Student Number:92839342 Name:Robert Bloomingdale Lab1:9 Lab2:26 Lab3:18
Lab4:22 Lab5:9 Lab6:12 Exam1:25 Exam2:39 Final:99
I use a for loop to manage it:
for (i=0 ;i<=NF ;i++)
{
printf "%s %s %s %s",titleList[i],":",$i," "
}
everything look good except the result which has 2 problems:
first there is an extra space between each result and second the last field of the searched line is missing
Student Number : 92839342 Name : Robert Bloomingdale Lab1 : 9 Lab2 : 26
Lab3:18 Lab4 : 22 Lab5 : 9 Lab6 : 12 Exam1 : 25 Exam2 : 39 Final
what should I do?
is there any problem with \n at the end of the search result?
You can correct the amount of extra whitespace between fields by correcting the printf statement:
awk -F ":" 'NR == 1 { split($0, array, FS) } NR >= 2 { for (i=1; i<=NF; i++) printf "%s:%s ", array[i], $i; printf "\n" }' file.txt
Contents of file.txt:
Student Number:Name:Lab1:Lab2:Lab3:Lab4:Lab5:Lab6:Exam1:Exam2:Final
92839342:Robert Bloomingdale:9:26:18:22:9:12:25:39:99
Results:
Student Number:92839342 Name:Robert Bloomingdale Lab1:9 Lab2:26 Lab3:18 Lab4:22 Lab5:9 Lab6:12 Exam1:25 Exam2:39 Final:99
EDIT:
Also, your missing the last value because the file you're working with probably has windows newline endings. To fix this, run: dos2unix file.txt before running your awk code. Alternatively, you can set awk's record separater so that it understands newline endings:
awk 'BEGIN { RS="\r\n"; FS=":" } NR == 1 { split($0, array, FS) } NR >= 2 { for (i=1; i<=NF; i++) printf "%s:%s ", array[i], $i; printf "\n" }' file.txt
EDIT:
The above requires GNU awk, split() splits on the FS by default so no need to use that as an arg, it's common to use "next" rather than specifying opposite conditions, and it's common to use print "" instead of printf "\n" so you use the ORS setting rather than hard-coding it's value in output statements. So, the above should be tweaked to:
gawk 'BEGIN { RS="\r\n"; FS=":" } NR == 1 { split($0, array); next } { for (i=1; i<=NF; i++) printf "%s:%s ", array[i], $i; print "" }' file.txt

awk Joining n fields with delimiter

How can I use awk to join various fields, given that I don't know how many of them I have? For example, given the input string
aaa/bbb/ccc/ddd/eee
I use -F'/' as delimiter, do some manipulation on aaa, bbb, ccc, ddd, eee (altering, removing...) and I want to join it back to print something line
AAA/bbb/ddd/e
Thanks
... given that I don't know how many of them I have?
Ah, but you do know how many you have. Or you will soon, if you keep reading :-)
Before giving you a record to process, awk will set the NF variable to the number of fields in that record, and you can use for loops to process them (comments aren't part of the script, I've just put them there to explain):
$ echo pax/is/a/love/god | awk -F/ '{
gsub (/god/,"dog",$5); # pax,is,a,love,dog
$4 = ""; # pax,is,a,,dog
$6 = $5; # pax,is,a,,dog,dog
$5 = "rabid"; # pax,is,a,,rabid,dog
printf $1; # output "pax"
for (i = 2; i <= NF; i++) { # output ".<field>"
if ($i != "") { # but only for non-blank fields (skip $4)
printf "."$i;
}
}
printf "\n"; # finish line
}'
pax.is.a.rabid.dog
This shows manipulation of the values, as well as insertion and deletion.
The following will show you how to process each field and do some example manipulations on them.
The only caveat of using the output field separator OFS is that "deleted" fields will still have delimiters as shown in the output below; however it makes the code much simpler if you can live with that.
awk '
BEGIN{FS=OFS="/"}
{
for(i=1;i<=NF;i++){
if($i == "aaa")
$i=toupper($i)
else if($i ~ /c/)
$i=""
else if($i ~ /^eee$/)
$i="e"
}
}1' <<<'aaa/bbb/ccc/ddd/eee'
Output
AAA/bbb//ddd/e
This might work for you:
echo "aaa/bbb/ccc/ddd/eee" |
awk 'BEGIN{FS=OFS="/"}{sub(/../,"",$4);NF=4;print}'
aaa/bbb/ccc/d
To delete fields not at the end use a function to shuffle the values:
echo "aaa/bbb/ccc/ddd/eee" |
awk 'func d(n){for(x=n;x<=NF-1;x++){y=x+1;$x=$y}NF--};BEGIN{FS=OFS="/"}{d(2);print}'
aaa/ccc/ddd/eee
Deletes the second field.
awk -F'/' '{ # I'd suggest to add them to an array, like:
# for (i=1;i<=NF;i++) {a[i]=$i }
# Now manipulate your elements in the array
# then finally print them:
n = asorti(a, dest)
for (i=1;i<=n;i++) { output+=dest[i] "/") }
print gensub("/$","","g",output)
}' INPUTFILE
Doing it this way you can delete elements as well. Note deleting an item can be done like delete array[index].

Use Awk to Print every character as its own column?

I am in need of reorganizing a large CSV file. The first column, which is currently a 6 digit number needs to be split up, using commas as the field separator.
For example, I need this:
022250,10:50 AM,274,22,50
022255,11:55 AM,275,22,55
turned into this:
0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55
Let me know what you think!
Thanks!
It's a lot shorter in perl:
perl -F, -ane '$,=","; print split("",$F[0]), #F[1..$#F]' <file>
Since you don't know perl, a quick explanation. -F, indicates the input field separator is the comma (like awk). -a activates auto-split (into the array #F), -n implicitly wraps the code in a while (<>) { ... } loop, which reads input line-by-line. -e indicates the next argument is the script to run. $, is the output field separator (it gets set iteration of the loop this way, but oh well). split has obvious purpose, and you can see how the array is indexed/sliced. print, when lists as arguments like this, uses the output field separator and prints all their fields.
In awk:
awk -F, '{n=split($1,a,""); for (i=1;i<=n;i++) {printf("%s,",a[i])}; for (i=2;i<NF;i++) {printf("%s,",$i)}; print $NF}' <file>
I think this might work. The split function (at least in the version I am running) splits the value into individual characters if the third parameter is an empty string.
BEGIN{ FS="," }
{
n = split( $1, a, "" );
for ( i = 1; i <= n; i++ )
printf("%s,", a[i] );
sep = "";
for ( i = 2; i <= NF; i++ )
{
printf( "%s%s", sep, $i );
sep = ",";
}
printf("\n");
}
here's another way in awk
$ awk -F"," '{gsub(".",",&",$1);sub("^,","",$1)}1' OFS="," file
0,2,2,2,5,0,10:50 AM,274,22,50
0,2,2,2,5,5,11:55 AM,275,22,55
Here's a variation on a theme. One thing to note is it prints the remaining fields without using a loop. Another is that since you're looping over the characters in the first field anyway, why not just do it without using the null-delimiter feature of split() (which may not be present in some versions of AWK):
awk -F, 'BEGIN{OFS=","} {len=length($1); for (i=1;i<len; i++) {printf "%s,", substr($1,i,1)}; printf "%s", substr($1,len,1);$1=""; print $0}' filename
As a script:
BEGIN {FS = OFS = ","}
{
len = length($1);
for (i=1; i<len; i++)
{printf "%s,", substr($1, i, 1)};
printf "%s", substr($1, len, 1)
$1 = "";
print $0
}