While read loop and command with file output - while-loop

I have run into an issue making a while loop (yes, I am new at this..).
I have a file, lines_to_find.txt, containing a list of names that I would like to find in another (large) file, file_to_search.fasta.
When a line from lines_to_find.txt is found in file_to_search.fasta, I would like the matching lines to be printed to a new file: output_file.fasta.
So I have a command similar to grep that takes the sequences (for that is what's in the large file) and prints them to a new file:
obigrep -D SEARCHWORD INPUTFILE.fasta > OUTPUTFILE.fasta
Now I would like the search word to be replaced with the contents of lines_to_find.txt: each line should be read and matched against file_to_search.fasta. Output should preferably be one file, containing the sequence hits from all lines in lines_to_find.txt.
I tried this:
while read line
do
obigrep -D '$line' file_to_search.fasta >> outputfile.fasta
done < lines_to_find.txt
But my output file just ends up empty.
What am I doing wrong?
Am I just building the while read loop wrong?
Are there other ways to do it?
I'm open to all suggestions, and as I am new, please point out obvious beginner flaws.
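One likely culprit in the loop above is the single quotes: inside '...' the shell does not expand variables, so the command receives the literal text $line rather than each name. A minimal sketch of the same loop with double quotes follows. It uses plain grep as a stand-in so that it runs without OBITools installed; the function name search_all and its three arguments are hypothetical, and the real command would go where the comment indicates.

```shell
# Hypothetical helper: search INPUT for every line of NAMES, collect hits in OUTPUT.
# grep is only a stand-in here; with OBITools installed, the grep line would
# presumably become:  obigrep -D "$line" "$input" >> "$output"
search_all() {
    names=$1
    input=$2
    output=$3
    : > "$output"                                  # start with an empty output file
    while IFS= read -r line || [ -n "$line" ]; do  # also handle a last line without newline
        [ -z "$line" ] && continue                 # skip blank lines in the name list
        # Double quotes let the shell expand $line; single quotes would not.
        grep -e "$line" "$input" >> "$output" || true   # no hit for one name is not an error
    done < "$names"
}
```

It would be called as `search_all lines_to_find.txt file_to_search.fasta outputfile.fasta`. Emptying the output file first avoids hits piling up across repeated runs, since >> only ever appends.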

Related

Comparing a .TTL file to a CSV file and extract "similar" results into a new file

I have a large CSV file that is filled with millions of different lines of which each have the following format:
/resource/example
Now I also have a .TTL file in which each line possibly has the exact same text. Now I want to extract every single line from that .TTL file containing the same text as my current CSV file into a new CSV file.
I think this is possible using grep, but that is a Linux command and I am very, very inexperienced with it. Is it possible to do this in Windows? I could write a Python script that compares the two files, but since both files contain millions of lines I think that would literally take days to execute. Could anyone point me in the right direction on how to do this?
Thanks in advance! :)
Edit:
Example line from .TTL file:
<nl.dbpedia.org/resource/Algoritme>; <purl.org/dc/terms/subject>; <nl.dbpedia.org/resource/Categorie:Algoritme>; .
Example line from current CSV file:
/resource/algoritme
So with these two example lines it should export the line from the .TTL file into a new CSV file.
Using GNU awk. First read the CSV and hash it to a. Then compare each entry in a against each row in the TTL file:
$ awk 'BEGIN { IGNORECASE = 1 }    # ignoring the case
NR==FNR { a[$1]; next }            # hash csv to a hash
{
    for (i in a)                   # each entry in a
        if ($0 ~ i) {              # check against every record of ttl
            print                  # if match, output matched ttl record
            next                   # and skip to next ttl record
        }
}' file.csv file.ttl
<nl.dbpedia.org/resource/Algoritme>; <purl.org/dc/terms/subject>; <nl.dbpedia.org/resource/Categorie:Algoritme>; .
Because every TTL record is tested against every CSV entry, this might be slow on large files. It could probably be made faster, but not based on the info offered in the OP.
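If the CSV entries are meant as literal text rather than regular expressions, a different technique from the awk loop above may be worth trying: grep can load all patterns from a file and search for them as fixed strings in one pass. This is a sketch, not a drop-in replacement; the function name match_fixed is hypothetical, and it assumes a grep is available (on Windows that would mean something like WSL, Cygwin, or Git Bash).

```shell
# Hypothetical helper: print every line of the TTL file ($2) that contains
# any line of the CSV file ($1) as a literal substring, ignoring case.
match_fixed() {
    # -i  ignore case (Algoritme vs algoritme)
    # -F  treat patterns as fixed strings, not regular expressions
    # -f  read the patterns, one per line, from the given file
    grep -i -F -f "$1" "$2"
}
```

It would be called as `match_fixed file.csv file.ttl > result.csv`. One caveat: an empty line in the pattern file matches every line of the input, so blank lines should be removed from the CSV first.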

How to use command prompt to list file names in directory but exclude 1st 3 characters

I'm trying to write some VBA code that sends a line of code to the command prompt and executes it. I have that part down, but I need help getting the actual command to work.
I want to list all of the files in a specific folder that have the .doc file extension, but I want to exclude the first three characters of each filename that gets printed to my output text file. (Note: I'm using VBA because this is one of several different commands I'd like to get into a single VBA macro, and I cannot use batch files because they are blocked on my system, so I'd like to work directly with the command prompt.)
The following code works and gives me the file names without the file extension (i.e. ABC201704.doc will return as ABC201704):
%comspec% /c for %i in (C:\Test\ABC*.doc) do @echo %~ni >> C:\Test\Output.txt
However, I don't know how to modify this so that it doesn't include the first 3 characters (ie. I'd like it to return 201704 instead of ABC201704).
Any help would be greatly appreciated! I tried using the following link, but I couldn't figure out how to get that to work for my situation.
Not tested:
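@echo off
setlocal enableDelayedExpansion
for %%a in ("C:\Test\ABC*.doc") do (
    rem %%~na is the file name without extension (%%~nxa would keep .doc)
    set "docname=%%~na"
    rem skip the first 3 characters
    echo !docname:~3!
)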
In command prompt:
cmd /v:on /c "for %a in ("C:\Test\ABC*.doc") do set "docname=%~na" & echo !docname:~3!"

Sejda merging PDFs from CSV filelist names

I recently installed sejda-console for merging PDF files from the command line.
The names of the input pdf files are in a CSV file named filelist-inputs.csv like this:
./Temp/source/046032.pdf,./Temp/source/048155.pdf
./Temp/source/049278.pdf,./Temp/source/050818.pdf,./Temp/source/052962.pdf
./Temp/source/052962.pdf,./Temp/source/054117.pdf
I need one output PDF for the first line of the CSV file list, another for the second line, another for the third line, and so on.
I tried a command line like this:
~$ sejda-console merge -l filelist-inputs.csv -o ./Temp/target/merged[FILENUMBER####].pdf
But it only creates a single file named literally merged[FILENUMBER####].pdf, when I want 3 files:
merged0001.pdf
merged0002.pdf
merged0003.pdf
I've simplified the problem; in reality I need to merge more than 3500 PDF files into 700 output files.
Sejda takes all the values in the CSV and generates a single merged PDF. There isn't any option or setting in Sejda to achieve what you asked; you will need some scripting to loop through the CSV lines, create a CSV per line, and feed each one to Sejda.
The output file name merged[FILENUMBER####].pdf is literally used because the PDF merge task generates one output file and it expects an explicit output file name. Prefixes like [CURRENTPAGE] or [FILENUMBER] are valid when used as -p argument in tasks generating multiple output PDF files (split tasks etc).
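The scripting suggested above could look roughly like the following sketch. It writes each line of the list to its own temporary CSV and runs the same merge command from the question on it, using only the -l and -o options already shown there. The function name merge_per_line, the SEJDA_CMD override, and the merged0001.pdf naming scheme are assumptions for illustration, not part of Sejda.

```shell
# Hypothetical helper: one merged PDF per line of a CSV file list.
#   $1 = CSV file list (one comma-separated group of PDFs per line)
#   $2 = output directory
# SEJDA_CMD is an assumed override so a different executable can be swapped in.
merge_per_line() {
    list=$1
    outdir=$2
    workdir=$(mktemp -d) || return 1
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        [ -z "$line" ] && continue             # ignore blank lines
        n=$((n + 1))
        # Named .csv in case the tool infers the list format from the
        # file extension (an assumption).
        printf '%s\n' "$line" > "$workdir/line.csv"
        outfile=$(printf '%s/merged%04d.pdf' "$outdir" "$n")
        # </dev/null keeps the command from consuming the loop's input.
        "${SEJDA_CMD:-sejda-console}" merge -l "$workdir/line.csv" -o "$outfile" < /dev/null
    done < "$list"
    rm -rf "$workdir"
}
```

It would be called as `merge_per_line filelist-inputs.csv ./Temp/target`, which should produce merged0001.pdf, merged0002.pdf, and so on, one per non-empty line.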

how can I find the total number of lines in a file and also detect the empty lines by using CGI and Perl

I have a script which reads a text file and prints it. How can I detect the empty lines in the file and ignore them?
Is there any way to find out the total number of lines in the file without running a loop like this?
while (<$file>) {
    $linenumbers++;
}
To print the number of non-empty lines in a file:
perl -le 'print scalar(grep{/./}<>)'

Windows Batch File - Using Append with File Name that has spaces

I am creating a batch file to consolidate some hard-coded text with a few other existing text files.
For this I am using the following:
set "txtFile=.\text.txt"
call:Append "C:\test 123\test.txt" %textFile%
When I execute it, it throws an error because it cannot handle the path, which contains spaces.
How should this be addressed?
I have no idea what your append batch file is doing, but you can simply use copy to concatenate two files.
It's not clear to me what needs to be appended to what, but the following will append the contents of text.txt to C:\test 123\test.txt by writing everything to C:\test 123\test.txt.
set txtFile=.\text.txt
copy "C:\test 123\test.txt" /a + %txtFile% /a "C:\test 123\test.txt"
If you want a different output file, just change the last parameter.
Btw, it's better not to rely on a specific working directory.
The following:
set txtFile=%~dp0text.txt
makes sure that the text.txt in the same directory as your batch file is the one that is used.