removing whitespace between variable strings in a file - removing-whitespace

I have 718 similarly formatted files that need a little cleanup before a program can use them. Each original file begins with a space on the very first line, which needs to go. The DNA sequences should not have a space between every 10 bases (it's okay that the sequences are broken up over multiple lines). Below, I first show what the original file looks like, followed by how it should look.
Original:
14 128
Alydidae_Micrelytrinae_Leptocorisini_Stenocoris_tipuloides_CMF_0174_S59_L005 caggacccga ggttcaacag cgagattgac atgaggacag gttacaagac
Coreidae_Coreinae_Acanthocephalini_Acanthocephala_thomasi_CMF0028_UQ caggacccgc gatttaacag tgagatagac atgcgaacag gctacaagac
Coreidae_Coreinae_Anisoscelini_Anisoscelis_alipes_CMF0018_UQ caggacccgc ggtttaacag tgagatagac atgcgaactg gctacaagac
Coreidae_Coreinae_Mictini_Anoplocnemis_sp_CMF0020_UQ caggacccgc gcttcaacag tgagatagac atgcgaacag gctataagac
Coreidae_Coreinae_Mictini_Mygdonia_tuberculosa_CMF0053_UQ caggacccgc gcttcaacag tgagatagac atgcgaacag gctataagac
Coreidae_Coreinae_Nematopodini_Mozena_nr_lineolata_CMF0026_UQ caggacccgc ggttcaacag cgatatagac atgcgaacag gctacaggac
Coreidae_Coreinae_Nematopodini_Thasus_neocalifornicus_CMF0190_UQ caggacccgc gtttcaacag cgagatcgat atgcggacag ggtacaagac
Coreidae_Pseudophloeinae_Clavigrallini_Clavigralla_sp_CMF_0335_S81_L005_UQ caggatccga ggttcaacag cgagatagac atgaggacag gttacaaaac
Coreidae_Pseudophloeinae_Pseudophloeini_Myla_sp_CMF_0091_S35_L005_UQ caggacccga ggttcaacag cgagatagac atgcggacag gctataaaac
Largidae_Largus_sp_CMF_0230_S65_L005_UQ caggacccga ggttcaacag cgaaatagac atgaggactg gctataagac
Pentatomidae_Halyomorpha_halys_halhal1 caggatccga ggttcaacag cgaaatcgac atgaggactg gctacaagac
Pyrrhocoridae_Dysdercus_mimus_CMF_0110_S42_L005_UQ caggatcctc gtttcaacag cgaaatcgac atgagaacag gttacaagac
Pyrrhocoridae_Dysdercus_suturellus_CMF_0305_S71_L005_UQ caggatcctc gtttcaacag cgaaatcgac atgagaacag gttacaagac
Rhopalidae_Serinethinae_Serinethini_Jadera_haematoloma_CMF_0281_S69_L005_UQ caggaccccc gttttaacag tgaaatagac atgcgaaccg gttacaagac
taacactatc ctctgcggcc ccatctctaa ctacgaaggt gatgtgattg
caacactatc ctctgtgggc ccatctctaa ctacgaagga gaggtgatag
caacaccatc ctctgtgggc ctatttctaa ctacgaaggg gaggtgatag
caacactata ctctgcgggc ctatatccaa ctacgaagga gaggtgattg
caacacgata ctctgtgggc ctatatctaa ctacgaagga gaggtgatag
gaacaccatc ctttgcgggc cgatctccaa ctacgagggg gaggtgatcg
caacaccatc ctctgcgggc ctatctccaa ctacgaaggg gaggtgatcg
caacaccatc ctctgtggac ccatctctaa ctacgaagga gaagtgatag
caacaccatc ctctgcgggc ccatctccaa ctacgaaggg gaggtgatcg
tcataccatt ctatgtgggc ctatttcaaa ttacgaaggg gaagtgatcg
taacaccatc ctctgcggcc ccatttccaa ctacgaaggc gaagtgattg
caacacaata ctctgcggac ccatatcgaa ctacgaaggt gaagtcatag
caacacaata ctctgcggac ccatatcgaa ctacgaaggt gaagtcatag
ccacaccatc ctctgcggac ccatctccaa ctacgaaggt gaggtgatag
gagttgccca gatcatcaac aagactga
gagtagctca gatcatcaac aagaccga
gggtagctca gatcatcaac aagacgga
gagtagctca gatcatcaat aagactga
gagtagctca gatcatcaat aagaccga
gggtggcaca gatcatcaac aagacgga
gagtggctca gatcatcaac aagacgga
gcgtcgcaca gatcatc--- --------
gcgtcgcaca gatcataaac aagaccga
gggtagccca gatcataaac aaaacaga
gagtcgccca gatcatcaac aaaactga
gagtggcgca gatcatcatt aaaaccga
gagtggcgca gatcatcaat aaaacgga
gagtagccca gatcatcaac aagacgga
How it should look after processing:
14 128
Alydidae_Micrelytrinae_Leptocorisini_Stenocoris_tipuloides_CMF_0174_S59_L005 caggacccgaggttcaacagcgagattgacatgaggacaggttacaagac
Coreidae_Coreinae_Acanthocephalini_Acanthocephala_thomasi_CMF0028_UQ caggacccgcgatttaacagtgagatagacatgcgaacaggctacaagac
Coreidae_Coreinae_Anisoscelini_Anisoscelis_alipes_CMF0018_UQ caggacccgcggtttaacagtgagatagacatgcgaactggctacaagac
Coreidae_Coreinae_Mictini_Anoplocnemis_sp_CMF0020_UQ caggacccgcgcttcaacagtgagatagacatgcgaacaggctataagac
Coreidae_Coreinae_Mictini_Mygdonia_tuberculosa_CMF0053_UQ caggacccgcgcttcaacagtgagatagacatgcgaacaggctataagac
Coreidae_Coreinae_Nematopodini_Mozena_nr_lineolata_CMF0026_UQ caggacccgcggttcaacagcgatatagacatgcgaacaggctacaggac
Coreidae_Coreinae_Nematopodini_Thasus_neocalifornicus_CMF0190_UQ caggacccgcgtttcaacagcgagatcgatatgcggacagggtacaagac
Coreidae_Pseudophloeinae_Clavigrallini_Clavigralla_sp_CMF_0335_S81_L005_UQ caggatccgaggttcaacagcgagatagacatgaggacaggttacaaaac
Coreidae_Pseudophloeinae_Pseudophloeini_Myla_sp_CMF_0091_S35_L005_UQ caggacccgaggttcaacagcgagatagacatgcggacaggctataaaac
Largidae_Largus_sp_CMF_0230_S65_L005_UQ caggacccgaggttcaacagcgaaatagacatgaggactggctataagac
Pentatomidae_Halyomorpha_halys_halhal1 caggatccgaggttcaacagcgaaatcgacatgaggactggctacaagac
Pyrrhocoridae_Dysdercus_mimus_CMF_0110_S42_L005_UQ caggatcctcgtttcaacagcgaaatcgacatgagaacaggttacaagac
Pyrrhocoridae_Dysdercus_suturellus_CMF_0305_S71_L005_UQ caggatcctcgtttcaacagcgaaatcgacatgagaacaggttacaagac
Rhopalidae_Serinethinae_Serinethini_Jadera_haematoloma_CMF_0281_S69_L005_UQ caggacccccgttttaacagtgaaatagacatgcgaaccggttacaagac
taacactatcctctgcggccccatctctaactacgaaggtgatgtgattg
caacactatcctctgtgggcccatctctaactacgaaggagaggtgatag
caacaccatcctctgtgggcctatttctaactacgaaggggaggtgatag
caacactatactctgcgggcctatatccaactacgaaggagaggtgattg
caacacgatactctgtgggcctatatctaactacgaaggagaggtgatag
gaacaccatcctttgcgggccgatctccaactacgagggggaggtgatcg
caacaccatcctctgcgggcctatctccaactacgaaggggaggtgatcg
caacaccatcctctgtggacccatctctaactacgaaggagaagtgatag
caacaccatcctctgcgggcccatctccaactacgaaggggaggtgatcg
tcataccattctatgtgggcctatttcaaattacgaaggggaagtgatcg
taacaccatcctctgcggccccatttccaactacgaaggcgaagtgattg
caacacaatactctgcggacccatatcgaactacgaaggtgaagtcatag
caacacaatactctgcggacccatatcgaactacgaaggtgaagtcatag
ccacaccatcctctgcggacccatctccaactacgaaggtgaggtgatag
gagttgcccagatcatcaacaagactga
gagtagctcagatcatcaacaagaccga
gggtagctcagatcatcaacaagacgga
gagtagctcagatcatcaataagactga
gagtagctcagatcatcaataagaccga
gggtggcacagatcatcaacaagacgga
gagtggctcagatcatcaacaagacgga
gcgtcgcacagatcatc-----------
gcgtcgcacagatcataaacaagaccga
gggtagcccagatcataaacaaaacaga
gagtcgcccagatcatcaacaaaactga
gagtggcgcagatcatcattaaaaccga
gagtggcgcagatcatcaataaaacgga
gagtagcccagatcatcaacaagacgga
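One possible way to do this cleanup is sketched below in Python. It is only a sketch under stated assumptions: the function name clean_phylip is hypothetical, the first number on the header line is taken to be the number of named sequences (14 in the example), and each taxon name is assumed to contain no spaces.

```python
def clean_phylip(text):
    """Strip the leading space from the header and the spaces inside sequences."""
    lines = text.splitlines()
    header = lines[0].strip()             # drops the leading space on line 1
    n_taxa = int(header.split()[0])       # assumption: first number = taxon count
    out = [header]
    for idx, line in enumerate(lines[1:], start=1):
        line = line.strip()
        if idx <= n_taxa and line:
            # First block: keep the single space between name and sequence.
            name, _, seq = line.partition(" ")
            out.append(name + " " + seq.replace(" ", ""))
        else:
            # Later blocks hold sequence data only.
            out.append(line.replace(" ", ""))
    return "\n".join(out) + "\n"
```

To process all 718 files, the function could be applied to each file's contents in a loop (for example with pathlib.Path.glob), writing the result to a new file so the originals stay untouched.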

get the index of string element in np1 while it has a substring in np2

my code is as below:
import numpy as np
keywordlist = ['cpp-4.8.5', 'CUnit-2.1.3', 'CUnit-devel', 'doxygen-1.8.5', 'e2fsprogs-1.42.9', 'e2fsprogs-libs', 'epel-release', 'fuse3-devel', 'fuse3-libs', 'gcc-4.8.5', 'gcc-c++', 'gcc-gfortran', 'ghc-array', 'ghc-base', 'ghc-bytestring', 'ghc-containers', 'ghc-deepseq', 'ghc-directory', 'ghc-filepath', 'ghc-json', 'ghc-mtl', 'ghc-old', 'ghc-parsec', 'ghc-pretty', 'ghc-regex', 'ghc-regex', 'ghc-ShellCheck', 'ghc-syb', 'ghc-text', 'ghc-time', 'ghc-transformers', 'ghc-unix', 'git-1.8.3.1', 'graphviz-2.30.1', 'help2man-1.41.1', 'ibacm-22.4', 'keyutils-libs', 'krb5-devel', 'krb5-libs', 'krb5-workstation', 'lcov-1.13', 'libaio-devel', 'libblkid-2.23.2', 'libcom_err-1.42.9', 'libcom_err-devel', 'libgcc-4.8.5', 'libgfortran-4.8.5', 'libgomp-4.8.5', 'libibumad-22.4', 'libibverbs-22.4', 'libiscsi-devel', 'libkadm5-1.15.1', 'libmount-2.23.2', 'libpmem-1.5.1', 'libpmemblk-1.5.1', 'libpmemblk-devel', 'libpmem-devel', 'libquadmath-4.8.5', 'libquadmath-devel', 'librdmacm-22.4', 'libselinux-2.5', 'libselinux-devel', 'libselinux-python', 'libselinux-utils', 'libsepol-devel', 'libsmartcols-2.23.2', 'libss-1.42.9', 'libstdc++-4.8.5', 'libstdc++-devel', 'libunwind-1.2', 'libunwind-devel', 'libuuid-2.23.2', 'libuuid-devel', 'libverto-devel', 'libXaw-1.0.13', 'libXScrnSaver-1.2.2', 'make-3.82', 'nasm-2.10.07', 'numactl-devel', 'numactl-libs', 'openssl-1.0.2k', 'openssl-devel', 'openssl-libs', 'pcre-devel', 'perl-Digest', 'perl-Digest', 'perl-GD', 'perl-Git', 'python-2.7.5', 'python2-pycodestyle', 'python-libs', 'rdma-core', 'rdma-core', 'sg3_utils-1.37', 'sg3_utils-libs', 'ShellCheck-0.3.8', 'util-linux', 'zlib-devel']
np1 = np.array(keywordlist)
# ['cpp-4.8.5' 'CUnit-2.1.3' 'CUnit-devel' 'doxygen-1.8.5' ... 'ShellCheck-0.3.8' 'util-linux' 'zlib-devel']
result = ['epel-release-7-12.noarch', 'rdma-core-22.4-5.el7.x86_64', 'cpp-4.8.5-44.el7.x86_64', 'doxygen-1.8.5-4.el7.x86_64', 'ghc-base-4.6.0.1-26.4.el7.x86_64', 'libuuid-2.23.2-65.el7.x86_64', 'python-libs-2.7.5-89.el7.x86_64', 'libkadm5-1.15.1-50.el7.x86_64', 'libmount-2.23.2-65.el7.x86_64', 'libquadmath-4.8.5-44.el7.x86_64', 'util-linux-2.23.2-65.el7.x86_64', 'libss-1.42.9-19.el7.x86_64', 'keyutils-libs-1.5.8-3.el7.x86_64', 'e2fsprogs-libs-1.42.9-19.el7.x86_64', 'ghc-pretty-1.1.1.0-26.4.el7.x86_64', 'libXaw-1.0.13-4.el7.x86_64', 'libselinux-2.5-15.el7.x86_64', 'libibverbs-22.4-5.el7.x86_64', 'libselinux-utils-2.5-15.el7.x86_64', 'libgomp-4.8.5-44.el7.x86_64', 'libblkid-2.23.2-65.el7.x86_64', 'gcc-c++-4.8.5-44.el7.x86_64', 'e2fsprogs-1.42.9-19.el7.x86_64', 'CUnit-devel-2.1.3-8.el7.x86_64', 'make-3.82-24.el7.x86_64', 'numactl-libs-2.0.12-5.el7.x86_64', 'perl-Git-1.8.3.1-23.el7_8.noarch', 'openssl-libs-1.0.2k-19.el7.x86_64', 'gcc-4.8.5-44.el7.x86_64', 'CUnit-2.1.3-8.el7.x86_64', 'ghc-syb-0.4.0-35.el7.x86_64', 'gcc-gfortran-4.8.5-44.el7.x86_64', 'libselinux-python-2.5-15.el7.x86_64', 'sg3_utils-libs-1.37-19.el7.x86_64', 'fuse3-libs-3.6.1-4.el7.x86_64', 'libquadmath-devel-4.8.5-44.el7.x86_64', 'libgfortran-4.8.5-44.el7.x86_64', 'krb5-workstation-1.15.1-50.el7.x86_64', 'librdmacm-22.4-5.el7.x86_64', 'sg3_utils-1.37-19.el7.x86_64', 'libsmartcols-2.23.2-65.el7.x86_64', 'fuse3-devel-3.6.1-4.el7.x86_64', 'python-2.7.5-89.el7.x86_64', 'openssl-1.0.2k-19.el7.x86_64', 'libgcc-4.8.5-44.el7.x86_64', 'libaio-devel-0.3.109-13.el7.x86_64', 'ghc-old-locale-1.0.0.5-26.4.el7.x86_64', 'libcom_err-1.42.9-19.el7.x86_64', 'git-1.8.3.1-23.el7_8.x86_64', 'krb5-libs-1.15.1-50.el7.x86_64']
np2 = np.array(result)
# ['epel-release-7-12.noarch' 'rdma-core-22.4-5.el7.x86_64' ... 'krb5-libs-1.15.1-50.el7.x86_64']
expectation = ['cpp-4.8.5-39.el7.x86_64', 'CUnit-2.1.3-8.el7.x86_64', 'CUnit-devel-2.1.3-8.el7.x86_64', 'doxygen-1.8.5-4.el7.x86_64', 'e2fsprogs-1.42.9-17.el7.x86_64', 'e2fsprogs-libs-1.42.9-17.el7.x86_64', 'epel-release-latest-7.noarch', 'fuse3-devel-3.6.1-4.el7.x86_64', 'fuse3-libs-3.6.1-4.el7.x86_64', 'gcc-4.8.5-39.el7.x86_64', 'gcc-c++-4.8.5-39.el7.x86_64', 'gcc-gfortran-4.8.5-39.el7.x86_64', 'ghc-array-0.4.0.1-26.4.el7.x86_64', 'ghc-base-4.6.0.1-26.4.el7.x86_64', 'ghc-bytestring-0.10.0.2-26.4.el7.x86_64', 'ghc-containers-0.5.0.0-26.4.el7.x86_64', 'ghc-deepseq-1.3.0.1-26.4.el7.x86_64', 'ghc-directory-1.2.0.1-26.4.el7.x86_64', 'ghc-filepath-1.3.0.1-26.4.el7.x86_64', 'ghc-json-0.7-4.el7.x86_64', 'ghc-mtl-2.1.2-27.el7.x86_64', 'ghc-old-locale-1.0.0.5-26.4.el7.x86_64', 'ghc-parsec-3.1.3-31.el7.x86_64', 'ghc-pretty-1.1.1.0-26.4.el7.x86_64', 'ghc-regex-base-0.93.2-29.el7.x86_64', 'ghc-regex-tdfa-1.1.8-11.el7.x86_64', 'ghc-ShellCheck-0.3.8-1.el7.x86_64', 'ghc-syb-0.4.0-35.el7.x86_64', 'ghc-text-0.11.3.1-2.el7.x86_64', 'ghc-time-1.4.0.1-26.4.el7.x86_64', 'ghc-transformers-0.3.0.0-34.el7.x86_64', 'ghc-unix-2.6.0.1-26.4.el7.x86_64', 'git-1.8.3.1-23.el7_8.x86_64', 'graphviz-2.30.1-21.el7.x86_64', 'help2man-1.41.1-3.el7.noarch', 'ibacm-22.4-2.el7_8.x86_64', 'keyutils-libs-devel-1.5.8-3.el7.x86_64', 'krb5-devel-1.15.1-46.el7.x86_64', 'krb5-libs-1.15.1-46.el7.x86_64', 'krb5-workstation-1.15.1-46.el7.x86_64', 'lcov-1.13-1.el7.noarch', 'libaio-devel-0.3.109-13.el7.x86_64', 'libblkid-2.23.2-63.el7.x86_64', 'libcom_err-1.42.9-17.el7.x86_64', 'libcom_err-devel-1.42.9-17.el7.x86_64', 'libgcc-4.8.5-39.el7.x86_64', 'libgfortran-4.8.5-39.el7.x86_64', 'libgomp-4.8.5-39.el7.x86_64', 'libibumad-22.4-2.el7_8.x86_64', 'libibverbs-22.4-2.el7_8.x86_64', 'libiscsi-devel-1.9.0-7.el7.x86_64', 'libkadm5-1.15.1-46.el7.x86_64', 'libmount-2.23.2-63.el7.x86_64', 'libpmem-1.5.1-2.1.el7.x86_64', 'libpmemblk-1.5.1-2.1.el7.x86_64', 'libpmemblk-devel-1.5.1-2.1.el7.x86_64', 
'libpmem-devel-1.5.1-2.1.el7.x86_64', 'libquadmath-4.8.5-39.el7.x86_64', 'libquadmath-devel-4.8.5-39.el7.x86_64', 'librdmacm-22.4-2.el7_8.x86_64', 'libselinux-2.5-15.el7.x86_64', 'libselinux-devel-2.5-15.el7.x86_64', 'libselinux-python-2.5-15.el7.x86_64', 'libselinux-utils-2.5-15.el7.x86_64', 'libsepol-devel-2.5-10.el7.x86_64', 'libsmartcols-2.23.2-63.el7.x86_64', 'libss-1.42.9-17.el7.x86_64', 'libstdc++-4.8.5-39.el7.x86_64', 'libstdc++-devel-4.8.5-39.el7.x86_64', 'libunwind-1.2-2.el7.x86_64', 'libunwind-devel-1.2-2.el7.x86_64', 'libuuid-2.23.2-63.el7.x86_64', 'libuuid-devel-2.23.2-63.el7.x86_64', 'libverto-devel-0.2.5-4.el7.x86_64', 'libXaw-1.0.13-4.el7.x86_64', 'libXScrnSaver-1.2.2-6.1.el7.x86_64', 'make-3.82-24.el7.x86_64', 'nasm-2.10.07-7.el7.x86_64', 'numactl-devel-2.0.12-5.el7.x86_64', 'numactl-libs-2.0.12-5.el7.x86_64', 'openssl-1.0.2k-19.el7.x86_64', 'openssl-devel-1.0.2k-19.el7.x86_64', 'openssl-libs-1.0.2k-19.el7.x86_64', 'pcre-devel-8.32-17.el7.x86_64', 'perl-Digest-1.17-245.el7.noarch', 'perl-Digest-MD5-2.52-3.el7.x86_64', 'perl-GD-2.49-3.el7.x86_64', 'perl-Git-1.8.3.1-23.el7_8.noarch', 'python-2.7.5-88.el7.x86_64', 'python2-pycodestyle-2.5.0-1.el7.noarch', 'python-libs-2.7.5-88.el7.x86_64', 'rdma-core-22.4-2.el7_8.x86_64', 'rdma-core-devel-22.4-2.el7_8.x86_64', 'sg3_utils-1.37-19.el7.x86_64', 'sg3_utils-libs-1.37-19.el7.x86_64', 'ShellCheck-0.3.8-1.el7.x86_64', 'util-linux-2.23.2-63.el7.x86_64', 'zlib-devel-1.2.7-18.el7.x86_64']
np3 = np.array(expectation)
# ['cpp-4.8.5-39.el7.x86_64' 'CUnit-2.1.3-8.el7.x86_64' ... 'util-linux-2.23.2-63.el7.x86_64' 'zlib-devel-1.2.7-18.el7.x86_64']
ready = []
for i in keywordlist:
    for j in result:
        x = np.char.startswith(j, i)
        if x:
            ready.append(np3[np.where(np.char.startswith(np3, i))])
np4 = np.array(ready)
# [array(['cpp-4.8.5-39.el7.x86_64'], dtype='<U39') array(['CUnit-2.1.3-8.el7.x86_64'], dtype='<U39') ... array(['util-linux-2.23.2-63.el7.x86_64'], dtype='<U39')]
notready = [i for i in np3 if i not in np4]
print(f"not ready: {notready}")
The purpose is to check each keyword in the keyword list against every element of np2.
If any element of np2 starts with a keyword, or the keyword is a substring of any element of np2, get the index of the element in expectation that also starts with that keyword, and collect those elements into np4.
Finally, notready is made up of the elements that are in np3 but not in np4.
To make this more concrete: I have a bunch of rpm files to be installed, which is the expectation list.
The keyword list holds the first two components of each rpm file name.
Result is the standard output listing the rpm files that are already installed.
Taking cpp-4.8.5 as an example, I can see cpp-4.8.5-44.el7.x86_64 in result, which means cpp-4.8.5-44.el7.x86_64 is currently installed. So cpp-4.8.5-39.el7.x86_64 can be removed from expectation, since cpp-4.8.5-*.rpm has been installed successfully. The next step is to deal with the items left in expectation.
My question is: is there an easier or more efficient way to get a result equivalent to notready? Maybe with other NumPy built-in methods, but without a for loop.
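For illustration, the logic described above can be sketched in plain Python (not NumPy) on a small subset of the data. The function name not_ready is hypothetical, and the sketch assumes that prefix matching is what is wanted, as in the code above; it relies on str.startswith accepting a tuple of prefixes.

```python
def not_ready(keywords, installed, expected):
    """Return the expected packages whose keyword is not yet installed."""
    # Keywords for which at least one installed package name starts with them.
    installed_kw = tuple(
        {k for k in keywords if any(p.startswith(k) for p in installed)}
    )
    # str.startswith accepts a tuple; an empty tuple never matches.
    return [p for p in expected if not p.startswith(installed_kw)]


keywords = ["cpp-4.8.5", "CUnit-2.1.3", "graphviz-2.30.1"]
installed = ["cpp-4.8.5-44.el7.x86_64", "CUnit-2.1.3-8.el7.x86_64"]
expected = [
    "cpp-4.8.5-39.el7.x86_64",
    "CUnit-2.1.3-8.el7.x86_64",
    "graphviz-2.30.1-21.el7.x86_64",
]
print(not_ready(keywords, installed, expected))
```

As with the original loop, a short keyword such as rdma-core also covers longer names such as rdma-core-devel, so the keywords may need to be made more specific if that is not intended.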

Shortcut to move to a specific part of a text file

I'd like to create a keyboard shortcut (such as CTRL+T) that automatically moves the cursor to the line after the occurrence of a fixed text, such as &todo.
Example:
foo
bar
&todo
fix bug #783
blah
blah2
Pressing CTRL+T would automatically move the cursor to the line beginning with fix ....
Currently I'm doing it like this:
CTRL F
enter &todo, ENTER
ESCAPE (closes the Search bottom panel)
HOME
DOWN ARROW (moves to next line)
but this requires too many actions.
How to do that in a single key shortcut?
The best solution is to use a plugin.
The plugin below does what you require. It will find the next occurrence of pattern (i.e. the &todo marker) below the current cursor position, move the cursor to the line below it, and centre that position in the window. If the pattern is not found below the current cursor position it will search again from the top of the buffer providing a wrap around feature.
Copy and paste the following Python code into a buffer and save it in your Sublime Text config User folder as GoToPattern.py.
import sublime
import sublime_plugin


class GotoPatternCommand(sublime_plugin.TextCommand):
    def run(self, edit, pattern):
        sels = self.view.sel()
        # Optional flags; see API.
        flags = sublime.LITERAL | sublime.IGNORECASE
        start_pos = sels[0].end() if len(sels) > 0 else 0
        find_pos = self.view.find(pattern, start_pos, flags)
        if not find_pos and start_pos > 0:
            # Begin search again at the top of the buffer; wrap around
            # feature, i.e. do not stop the search at the buffer's end.
            find_pos = self.view.find(pattern, 0, flags)
        if not find_pos:
            sublime.status_message("'{}' not found".format(pattern))
            return
        sels.clear()
        sels.add(find_pos.begin())
        self.view.show_at_center(find_pos.begin())
        row, col = self.view.rowcol(find_pos.begin())
        self.view.run_command("goto_line", {"line": row + 2})
        # Uncomment for: cursor to the end of the line.
        # self.view.run_command("move_to", {"to": "eol"})
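The wrap-around search can also be tried outside Sublime Text. The following is a rough sketch in plain Python, not part of the plugin: the function name line_after_pattern is hypothetical, and it works on a string and a character offset instead of a view and a selection.

```python
import re


def line_after_pattern(text, pattern, cursor=0):
    """Return the 1-based number of the line after the next match, or None."""
    # Literal, case-insensitive search, mirroring the LITERAL | IGNORECASE flags.
    rx = re.compile(re.escape(pattern), re.IGNORECASE)
    match = rx.search(text, cursor)
    if match is None and cursor > 0:
        # Wrap around: search again from the top of the text.
        match = rx.search(text)
    if match is None:
        return None
    # Newlines before the match give the 0-based row; +2 is the next line, 1-based.
    return text.count("\n", 0, match.start()) + 2
```

With the example buffer from the question, the function returns the number of the line beginning with fix ..., whether the search starts at the top or below the marker.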
Add key bindings:
// The pattern arg, i.e. "&todo", can be changed to anything you want
// and other key bindings can also be added to use different patterns.
{"keys": ["???"], "command": "goto_pattern", "args": {"pattern": "&todo"}}
Add a Command Palette entry to Default.sublime-commands if you want:
{"caption": "GoToPattern: &todo", "command": "goto_pattern", "args": {"pattern": "&todo"}},
These links may be useful to you: ST v. 2 API and ST v. 3 API.
P.S. Did you know that Sublime Text has bookmarks? [Just in case you didn't.]
The accepted answer is really better, and I ended up using it.
For reference, here is an old solution I used: first create a gototodo.py file in "C:\Users\User\AppData\Roaming\Sublime Text 2\Packages\User\" containing:
import sublime, sublime_plugin

class GototodoCommand(sublime_plugin.TextCommand):
    def run(self, edit):
        contents = self.view.substr(sublime.Region(0, self.view.size())) # https://stackoverflow.com/questions/20182008/sublime-text-3-api-get-all-text-from-a-file
        a = contents.find('&todo')
        cursors = self.view.sel()
        cursors.clear()
        location = sublime.Region(a, a)
        cursors.add(location)
        self.view.show_at_center(location)
        (row, col) = self.view.rowcol(self.view.sel()[0].begin()) # go to the next line
        self.view.run_command("goto_line", {"line": row+2})
Then add this in "C:\Users\User\AppData\Roaming\Sublime Text 2\Packages\User\Default (Windows).sublime-keymap":
{ "keys": ["ctrl+t"], "command": "gototodo" }
Done!

Finding next all values in Array

I need help finding all particular values after a certain match in an array.
Alpha 24835 line 24837 node 24780 destination 11.a1.v2.bt.13.91 next 24801
Alpha 24840 line 22543 node 24784 destination 10.a1.32.b2.12.10 next 24637
Alpha 24855 line 24734 node 24798 destination 10.a1.cb.62.41.31 next 24564
Alpha 24861 line 24947 node 24800 destination 12.g3.55.b7.76.19 next 24435
Alpha 24890 line 23538 node 24880 destination 10.b1.59.v5.25.33 next 24543
This is an exact example of the scenario. I would like to output destinations: when I find node 24784 in the second row (which I can look up with array(node)), I would like to display the destinations from that row onward. When my required node is 24800, I need only two destinations as output, i.e. 12.g3.55.b7.76.19 and 10.b1.59.v5.25.33.
The basic structure here is a list of associative arrays, but Tcl arrays don't really lend themselves to being put in lists. It would be easier to put Tcl dicts in a list.
You can get a list of dicts like this:
set data [split [string trim {
Alpha 24835 line 24837 node 24780 destination 11.a1.v2.bt.13.91 next 24801
Alpha 24840 line 22543 node 24784 destination 10.a1.32.b2.12.10 next 24637
Alpha 24855 line 24734 node 24798 destination 10.a1.cb.62.41.31 next 24564
Alpha 24861 line 24947 node 24800 destination 12.g3.55.b7.76.19 next 24435
Alpha 24890 line 23538 node 24880 destination 10.b1.59.v5.25.33 next 24543
}] \n]
Then you can find the list index of the dict that holds the searched-for node like this:
set node 24784
set idx [lsearch -index 5 $data $node]
and print out the list of destinations like this:
if {$idx >= 0} {
puts [lmap item [lrange $data $idx end] {dict get $item destination}]
}
The number of remaining destinations is [llength [lrange $data $idx end]], assuming that $idx >= 0.
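For comparison, the same idea (a list of dicts, find the matching row, slice to the end) can be sketched in Python. This is only an illustration of the approach, not a translation that the Tcl code depends on; the function name destinations_from is hypothetical, and each line is assumed to consist of alternating keys and values.

```python
def destinations_from(text, node):
    """Return the destinations from the row whose node matches, to the end."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        # Alternating words are keys and values: Alpha, line, node, ...
        rows.append(dict(zip(parts[0::2], parts[1::2])))
    for i, row in enumerate(rows):
        if row["node"] == node:
            return [r["destination"] for r in rows[i:]]
    return []  # node not found


data = """
Alpha 24835 line 24837 node 24780 destination 11.a1.v2.bt.13.91 next 24801
Alpha 24840 line 22543 node 24784 destination 10.a1.32.b2.12.10 next 24637
Alpha 24855 line 24734 node 24798 destination 10.a1.cb.62.41.31 next 24564
Alpha 24861 line 24947 node 24800 destination 12.g3.55.b7.76.19 next 24435
Alpha 24890 line 23538 node 24880 destination 10.b1.59.v5.25.33 next 24543
"""
print(destinations_from(data, "24800"))
```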
Documentation: dict, if, llength, lmap, lmap replacement, lrange, lsearch, puts, set, split, string

Replace values in a file conditional on their value

I have a file full of numbers that range 10.00-10.66, 20.67-21.33, 30.67-31.33 and 40.34-42.00.
Example input:
10.21 21.12 10.50 30.80
30.91 31.12 31.00 10.30
21.21 20.99 20.90 31.20
41.71 41.72 10.10 41.80
I want to convert the file such that:
10.00-10.20 = 0|0:[DOSE]
10.21-10.66 = .|.:[DOSE]
20.90-21.10 = 1|0:[DOSE]
20.67-20.89 = .|.:[DOSE]
21.11-21.33 = .|.:[DOSE]
30.90-31.10 = 0|1:[DOSE]
30.67-30.89 = .|.:[DOSE]
31.11-31.33 = .|.:[DOSE]
41.80-42.00 = 1|1:[DOSE]
41.34-41.79 = .|.:[DOSE]
Example output:
.|.:10.21 .|.:21.12 .|.:10.50 .|.:30.80
0|1:30.91 .|.:31.12 0|1:31.00 .|.:10.30
.|.:21.21 1|0:20.99 1|0:20.90 .|.:31.20
.|.:41.71 .|.:41.72 0|0:10.10 1|1:41.80
I can think of a way to do this in R, but the actual file is roughly 1000*5000000 elements in size, and I don't think R can cope!
Is there a way to conditionally replace all elements in a file dependent on their value with an in-line text editor like sed or awk? Alternative programs are welcome!
A simple way to do this in awk would be like this:
{
    for (i = 1; i <= NF; ++i) {
        if ($i >= 10 && $i <= 10.2) $i = "0|0:" $i
        else if ($i >= 10.21 && $i <= 10.66) $i = ".|.:" $i
        # etc.
    }
    print
}
That is, loop through each field of each record and add the strings you want depending on the value of the field. You can put the script in a file and run it like awk -f script.awk input_file
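Since alternative programs are welcome, here is a rough sketch of the same per-field loop in Python with the full mapping filled in. Two things are assumptions rather than part of the answer above: values are compared as integer hundredths to sidestep floating-point edge cases, and every value outside the four confident ranges is labelled .|. (the question lists those ranges explicitly). The names RANGES, label and convert_line are hypothetical.

```python
# (low, high, tag) in hundredths, taken from the ranges in the question.
RANGES = [
    (1000, 1020, "0|0"),
    (2090, 2110, "1|0"),
    (3090, 3110, "0|1"),
    (4180, 4200, "1|1"),
]


def label(token):
    """Prefix one value with its tag, keeping the original text of the number."""
    hundredths = round(float(token) * 100)
    for low, high, tag in RANGES:
        if low <= hundredths <= high:
            return f"{tag}:{token}"
    return f".|.:{token}"  # assumption: anything else is uncertain


def convert_line(line):
    """Convert one whitespace-separated line of values."""
    return " ".join(label(t) for t in line.split())
```

Reading the input line by line and writing each converted line immediately keeps memory use flat regardless of file size, which matters for a file of this scale.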

itext setField duplicates the field value

I am having a strange problem with iText and AcroFields. I created a PDF and added the AcroFields. Now when I call form.setField ('a field name', "a value") and display or print the PDF, the value appears twice (once in a smaller font and once in the font intended for that document). I checked the structure of the document and it doesn't look like my AcroFields are duplicated. What could be the cause of this?
Thanks in advance
Pascal
Please find link here: https://drive.google.com/file/d/0B8O5n5QFSSNrSGVlNllOcEJHRzQ/edit?usp=sharing
I am on Ubuntu. Maybe that's why! I am using evince to look at the file, however I get the same result when I print it. I included a screenshot of what I see. https://drive.google.com/file/d/0B8O5n5QFSSNrWXJyY2VpSkt5NE0/edit?usp=sharing
When I say duplicated, I should say shadowed. The value of the field is first displayed without font styling then overwritten with the required font.
The code I showed is pretty straightforward. The 2 arrays hold the names of the fields and their associated values. If the value is xxxx I set the field value to its index in that array. As you can see in the screenshot, it gets shadowed too. My printout looks exactly like the screenshot. I haven't tried it on another platform yet.
Here is the code, written in Groovy:
File mergeForm (String path, Map fields, Map values, String newFile) {
    println "Merge Form: $path"
    def file = grailsApplication.mainContext.getResource(path)?.inputStream
    if (file == null)
        return null
    def reader = new PdfReader(file)
    def stamper = new PdfStamper(reader, new FileOutputStream(newFile))
    def form = stamper.getAcroFields()
    fields.eachWithIndex { k, v, i ->
        def val = ""
        if (v instanceof Closure) {
            val = v(values)
        }
        else if (v == '_xxxx_') {
            val = "${i + 1}"
        }
        else if (values[v]) {
            val = values."$v"
        }
        println "setting value[$i]: ${val} to: $k"
        form.setField (k, val)
    }
    stamper.close()
    return new File (newFile)
}
Summing it up
The issue seems to be caused by multiple field annotations in the PDF at hand for each field; these annotations differ somewhat and therefore have different appearances.
In detail
Looking at the document version BOE-267-L1-Rev-1.unlocked-with-fields.pdf we will inspect the topmost field on the first page, "This Claim is Filed for Fiscal Year 20". We see that the page object 9 in its annotations array (in object 265) has (among many others) object 304 and object 180 which both are annotations of that field!
304 0 obj
<<
/Ff 12582912
/MaxLen 2
/F 4
/Type/Annot
/Subtype/Widget
/T(This Claim is Filed for Fiscal Year 20)
/P 9 0 R
/Q 1
/MK<<>>
/FT/Tx
/Rect[166.765 693.57 188.965 701.479]
/DA(/Arial 8 Tf 0 g)
/AA<</F 333 0 R/K 334 0 R>>
>>
endobj
...
180 0 obj
<<
/Ff 0
/F 4
/Type/Annot
/Subtype/Widget
/DR<</Font<</Helv 2 0 R>>>>
/T(This Claim is Filed for Fiscal Year 20)
/V()
/AP<</N 179 0 R>>
/P 9 0 R
/BS<</W 0.5/S/S>>
/FT/Tx
/Rect[165.4 706.28 187.6 714.19]
/DA(/Helv 0 Tf 0 g )
>>
endobj
The definitions of these describe slightly different positions on the page:
/Rect[166.765 693.57 188.965 701.479]
...
/Rect[165.4 706.28 187.6 714.19]
and different default appearance strings
/DA(/Arial 8 Tf 0 g)
...
/DA(/Helv 0 Tf 0 g )
Thus, it is not a surprise that you get multiple, non-identical appearances of this field. The actual surprise is that the version filled by iText on Adobe Reader does not display double values.
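The comparison made above can be written down as a small check. The sketch below uses plain Python dicts holding the values copied from objects 304 and 180; it does not read a PDF or use any PDF library, and the names widgets and conflicting_widgets are hypothetical.

```python
from collections import defaultdict

# Values copied from the two widget annotations shown above.
widgets = [
    {"obj": 304, "T": "This Claim is Filed for Fiscal Year 20",
     "Rect": (166.765, 693.57, 188.965, 701.479), "DA": "/Arial 8 Tf 0 g"},
    {"obj": 180, "T": "This Claim is Filed for Fiscal Year 20",
     "Rect": (165.4, 706.28, 187.6, 714.19), "DA": "/Helv 0 Tf 0 g "},
]


def conflicting_widgets(widgets):
    """Map each field name to its widget objects when /Rect or /DA differ."""
    by_name = defaultdict(list)
    for w in widgets:
        by_name[w["T"]].append(w)
    report = {}
    for name, group in by_name.items():
        rects = {w["Rect"] for w in group}
        appearances = {w["DA"].strip() for w in group}
        if len(group) > 1 and (len(rects) > 1 or len(appearances) > 1):
            report[name] = [w["obj"] for w in group]
    return report


print(conflicting_widgets(widgets))
```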
@Bruno: someone might want to look into this as soon as there is some time.
The other fields have duplicate appearances, too. Most often the page positions are nearly identical, but the default appearance strings still differ, which results in multiple, non-identical appearances for them as well.