How to add an asterisk around each name in a list of filenames and then join them into one line using Notepad++

I have a list of file names (about 4000).
For example:
A-67569
H-67985
J-87657
K-85897
...
I need to put an asterisk before and after each file name, and then join all the names into a single line.
Example:
*A-67569* *H-67985* *J-87657* *K-85897* and so on...
Note that there is a space between the file names.
Forgot to mention, I'm trying to do this with Notepad++
How can I do it?
Please advise.
Thanks

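C# example for converting the list to a string with the edits applied:
using System.Collections.Generic;
List<string> list = new List<string> { "A-67569", "H-67985", "J-87657", "K-85897" };
string outString = "";
// wrap every item in asterisks and separate the items with a space
foreach (string item in list)
{
    outString += "*" + item + "* ";
}
Content of outString: *A-67569* *H-67985* *J-87657* *K-85897* (followed by a trailing space; outString.TrimEnd() removes it)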

Use the Replace dialog of Notepad++ (Search > Replace...):
1. Select Extended (\n \r \t \0 \x...) as the search mode at the bottom of the Replace window.
2. In the Find what field enter \r\n, and in the Replace with field enter * * (asterisk, space, asterisk).
3. Click Replace All.
Note that you have to add the single asterisk before the first name and after the last name manually.
If this doesn't work, in step 2 try only \n or only \r instead of \r\n.
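For comparison, the same idea can be sketched outside Notepad++ in Python: every line break becomes "* *", and the two outer asterisks are added in code instead of by hand. The function name join_with_asterisks is made up for this illustration.

```python
def join_with_asterisks(text):
    """Join the non-empty lines of text with '* *' and add the outer asterisks."""
    # splitlines() handles \r\n, \n and \r, so the line-ending question goes away
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return ""
    # "* *" between names is what the Replace step inserts;
    # the leading and trailing "*" are the ones added manually in Notepad++
    return "*" + "* *".join(lines) + "*"


if __name__ == "__main__":
    listing = "A-67569\r\nH-67985\r\nJ-87657\r\nK-85897\r\n"
    print(join_with_asterisks(listing))
```

Blank lines are skipped here, which the plain Notepad++ replace would not do.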

You can also use Regular expression as the Search Mode.
Find what:
(\S+)(\R|$)
Replace with:
*$1* 
Note the space after the closing asterisk.
For the file
A-67569
H-67985
J-87657
K-85897
Output:
*A-67569* *H-67985* *J-87657* *K-85897*
Explanation of the regex:
(\S+) means: find one or more characters that are not whitespace.
(\R|$) means: find any line ending or the end of the file.
(\S+)(\R|$) means: find any group of non-whitespace characters that is followed by a line ending or the end of the file.
Explanation of Replace with:
The $ symbol refers to a captured group; $1 is the first group, in this case the group (\S+).
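A rough Python equivalent of this regex approach, as a sketch only: Python's re module has no \R, so \r?\n is used in its place, and the replacement here puts an asterisk on both sides of the captured name, as the question's expected output requires. The name wrap_names is hypothetical.

```python
import re


def wrap_names(text):
    """Wrap each whitespace-free token that ends a line in asterisks and join on one line."""
    # (\S+)          group 1: one or more non-whitespace characters
    # (?:\r?\n|$)    a line ending or the end of the text (stand-in for \R|$)
    joined = re.sub(r"(\S+)(?:\r?\n|$)", r"*\1* ", text)
    # every replacement ends with a space, so drop the last one
    return joined.rstrip()


if __name__ == "__main__":
    print(wrap_names("A-67569\nH-67985\nJ-87657\nK-85897"))
```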

Is there a possibility to delete a string that starts with a ">" and ends with a newline in postgresql via sql update statement?

Let's say I have a database with about 50k entries in a column called content.
This column contains strings that cause problems for my further work.
I need to clean them up for all the rows of that table.
Any ideas?
Here is an example:
'user wrote:
-----------------------------------------------------
> Some text
> that vary too much and I dont need it actually
> here is end of the text
The text I actually need.'
I would like to remove all of the unnecessary part, so that the only thing left in this case is:
'The text I actually need.'
This should delete all lines that start with a >:
regexp_replace(textcol, E'^>.*\n', '', 'gn');
The g flag is needed to delete all such lines, and the n flag makes the ^ match the position right after each line break.
I use an “extended” string literal (the leading E) so that I can write a newline as \n.
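To see what this pattern does and does not remove, here is a small Python sketch of the same regex (not PostgreSQL code). re.MULTILINE plays the role of the n flag for ^, Python's . already stops at newlines, and re.sub replaces all matches like the g flag. The function name drop_quoted_lines is an assumption for the example.

```python
import re


def drop_quoted_lines(text):
    """Delete every line that starts with '>' including its trailing newline."""
    return re.sub(r"^>.*\n", "", text, flags=re.MULTILINE)


if __name__ == "__main__":
    sample = (
        "user wrote:\n"
        "-----\n"
        "> Some text\n"
        "> here is end of the text\n"
        "The text I actually need."
    )
    print(drop_quoted_lines(sample))
```

Note that only the lines beginning with > disappear; the "user wrote:" line and the dashed line are left in place and would need their own pattern.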

REGEX check only when start of line

Suppose we want to keep an entire line of a string only if a particular word, e.g. 'test', appears at the start of the line.
If it appears anywhere else in the line, the entire line should be removed.
E.g.
if function_test()=5; //here this entire line should be removed
test sample =5; //here this entire line should be kept
From Oracle 10g R2 on you should be able to use the anchor \A to require the match at the beginning of the string (will only work for single-line strings thus).
http://www.regular-expressions.info/oracle.html
What do you mean by keep / remove lines? Where is this regex supposed to run? I.e. is it part of an SQL command, part of a grep, or something else?
Regarding SQL you can use LIKE operator:
WHERE line LIKE 'test%'
You can use substring too:
WHERE substring(line, 1, 4) = 'test'
Using grep or any other language, you can specify start of line, e.g.:
grep '^test' bigfile.txt
Try...
...
WHEN REGEXP_LIKE(string, '^test', 'i') THEN
-- this is a good line; do what you want with it or return string
END
...
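The line filter that these answers describe can be sketched in Python as follows. This is an illustration under the assumption that the input is a multi-line string; keep_lines_starting_with is a made-up name, and re.IGNORECASE mirrors the 'i' flag used with REGEXP_LIKE.

```python
import re


def keep_lines_starting_with(text, word="test"):
    """Return only the lines of text that begin with word (case-insensitive)."""
    # ^ anchors the match at the start of each individual line being tested
    pattern = re.compile(r"^" + re.escape(word), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]


if __name__ == "__main__":
    source = "if function_test()=5;\ntest sample =5;"
    print(keep_lines_starting_with(source))
```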

AutoIt ControlSend() function occasionally replaces hyphens with underscores

I'm using an AutoIt script to automate interaction with a GUI, and part of the process involves using the ControlSend() function to place a file path into a combo box. The majority of the time, the process works properly, but occasionally ( ~ 1/50 calls to the function? ) a single hyphen in the filepath is replaced with an underscore. The script is to be run unsupervised for bulk data processing, and such an error typically results in a forced-focus popup that screams "The file could not be found!" and halts further processing.
Unfortunately, due to the character limit of the combo box, I cannot supply all 16 arguments with a single call, and I am forced to load each of the images individually using the following for-loop:
;Iterate through each command line argument (file path)
For $i = 1 To $CmdLine[0]
    ;click the "Disk" Button to load an image from disk
    ControlClick("Assemble HDR Image", "", "[CLASS:Button; TEXT:Disk; Instance:1]")
    ;Give the dialogue time to open before entering text
    Sleep(1000)
    ;Send a single file path to the combo box
    ControlSend("Open", "" , "Edit1", $CmdLine[$i])
    ;"Press Enter" to load the image
    Send("{ENTER}")
Next
In an errant run, the file path
C:\my\file\path\hdr_2016-04-22T080033_00_rgb
                        ^Hyphen
is converted to
C:\my\file\path\hdr_2016_04-22T080033_00_rgb
                        ^Underscore
Due to the existence of both hyphens and underscores in the file name, it is difficult to perform a programmatic correction (e.g. replace all underscores with hyphens).
What can be done to correct or prevent such an error?
This is both my first attempt at GUI automation and my first question on SO, and I apologize for my lack of experience, poor wording, or deviations from StackOverflow convention.
Just use ControlSetText instead of ControlSend as it will set the complete Text at once and won't allow other keystrokes (like Shift) to interfere with the many virtual keystrokes that the Send-function fires.
If the hyphen is the problem and you need to replace it, you can do it like this:
#include <File.au3>
; your path
$sPath = 'C:\my\file\path'
; get all files from this path (flag 1 = files only)
$aFiles = _FileListToArray($sPath, '*', 1)
; if all your file names look like this (with or without hyphen), you can work with "StringRegExpReplace"
; 'hdr_2016-04-22T080033_00_rgb'
$sPattern = '(\D+\d{4})(.)(.+)'
; it means:
; 1st group: (\D+\d{4})
;   \D+    one or more non-digits, i.e. "hdr_"
;   \d{4}  exactly four digits, i.e. "2016"
; 2nd group: (.)
;   .      any single character (hyphen, underscore or other), i.e. "-"
; 3rd group: (.+)
;   .+     any character, one or more times, i.e. "04-22T080033_00_rgb"
; now change the file name in all cases where this pattern matches
Local $sTmpName
For $i = 1 To $aFiles[0]
    ; check for pattern match
    If StringRegExp($aFiles[$i], $sPattern) Then
        ; replace the 2nd group with an underscore
        $sTmpName = StringRegExpReplace($aFiles[$i], $sPattern, '\1_\3')
        FileMove($sPath & '\' & $aFiles[$i], $sPath & '\' & $sTmpName)
    EndIf
Next
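To check what the three groups of this pattern capture, the regex can be tried out in Python. This is only a sketch of the pattern's behaviour, not AutoIt code; set_separator and its sep parameter are hypothetical and simply let the character after the four-digit year be forced to either an underscore or a hyphen.

```python
import re

# group 1: non-digits followed by four digits, group 2: one character, group 3: the rest
PATTERN = re.compile(r"(\D+\d{4})(.)(.+)")


def set_separator(name, sep="_"):
    """Replace the single character after the four-digit year with sep."""
    # a function replacement avoids any escaping issues if sep contains a backslash
    return PATTERN.sub(lambda m: m.group(1) + sep + m.group(3), name, count=1)


if __name__ == "__main__":
    name = "hdr_2016-04-22T080033_00_rgb"
    print(PATTERN.match(name).groups())
    print(set_separator(name, "_"))
    print(set_separator("hdr_2016_04-22T080033_00_rgb", "-"))
```

Passing "-" as sep turns the errant name from the question back into the original one, because only the character directly after the year is touched.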

How to check that whole string matching to pattern instead find substrings that matching using NSRegularExpression? [duplicate]

I would like to write a regular expression that starts with the string "wp" and ends with the string "php" to locate a file in a directory. How do I do it?
Example file: wp-comments-post.php
This should do it for you: ^wp.*php$
Matches
wp-comments-post.php
wp.something.php
wp.php
Doesn't match
something-wp.php
wp.php.txt
^wp.*\.php$ Should do the trick.
The .* means "any character, repeated 0 or more times". The next . is escaped because it's a special character, and you want a literal period (".php"). Don't forget that if you're typing this in as a literal string in something like C#, Java, etc., you need to escape the backslash because it's a special character in many literal strings.
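As a quick check of the escaped pattern, here is a Python sketch that tests whole file names. It uses re.fullmatch, which requires the entire string to match and so makes the ^ and $ anchors unnecessary; is_wp_php is a name invented for this example.

```python
import re

# "wp", then anything, then a literal ".php"
WP_PHP = re.compile(r"wp.*\.php")


def is_wp_php(filename):
    """True if the whole file name starts with 'wp' and ends with '.php'."""
    return WP_PHP.fullmatch(filename) is not None


if __name__ == "__main__":
    for candidate in ["wp-comments-post.php", "wp.php", "something-wp.php", "wp.php.txt"]:
        print(candidate, is_wp_php(candidate))
```

With re.search and no anchors, the same pattern would also accept something-wp.php, because a matching substring is enough there.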
Example:
ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn
1) START\w*END
returns: STARTasdasdsdaasdEND - matches START, END and the word characters between them
2) START\d*END
returns, for an input that contains START12121212END: START12121212END - matches START, END and the digits between them
3) START\d*_\d*END
returns, for an input that contains START1212_1212END: START1212_1212END - matches START, END and the digits between them, separated by an underscore
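The three patterns can be tried out with a short Python sketch. The helper first_match is hypothetical, and the two digit inputs are made-up strings that contain the matches listed above.

```python
import re


def first_match(pattern, text):
    """Return the first substring of text matching pattern, or None if there is none."""
    found = re.search(pattern, text)
    return found.group(0) if found else None


if __name__ == "__main__":
    letters = "ajshdjashdjashdlasdlhdlSTARTasdasdsdaasdENDaknsdklansdlknaldknaaklsdn"
    print(first_match(r"START\w*END", letters))
    print(first_match(r"START\d*END", "xxSTART12121212ENDyy"))
    print(first_match(r"START\d*_\d*END", "xxSTART1212_1212ENDyy"))
    # \d* does not accept letters, so the second pattern finds nothing here
    print(first_match(r"START\d*END", letters))
```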

Mark duplicate headers in a fasta file

I have a big FASTA file which I want to modify. It basically consists of many sequences with headers that start with ">". My problem is that some of the headers are not unique, even though the sequences are unique.
Example:
>acrdi|AD19082
STSTAFPLLTQFYGCAIIILVLAMCCSCLVYAMYFMNSSGLQTHESTVTQKVKDFSLQ
WLQPILFGCSWRHRLIAKSRRNRSKIQPMTGTEPPWNESKDAFENLKTWALNKQNRNCLL
EINFLEAKDFIVMCKDVVCFEEDDKDERNLNLCLKTLTEAFRFLRNCCAETPKNQSFVIS
SGVAKQAIEVILILLRPVFQEREKGTEVITDTIRSGLQLLGNTVVKNIDTQEFIWNCCCP
QFFLDVLLSRHHSIQDCLCMIIFNCLNQQRRLQLVNNPKIISQIVHLCADKSLLEWGYFI
LDCLICEGLFPDLYQGMEFDPLARIILLDLFQVKITDALDESSERTERTETPKELYASSL
NYLAEQFETYFIDIIQRLQQLDYSSNDFFQVLVVTRLLSLLSTSTGLKSSMTGLQDRASL
LETCVDLLRETSKPEAKAAFKRPGTSYWEYVLPTFP
>acrdi|AD19082
MLRQSEPPWNESKDAFENLKTWALNKQNRNCLLEINFLEAKDFIVMCKDVVCFEEDDKDE
RNLNLCLKTLTEAFRFLRNCCAETPKNQSFVISSGVAKQAIEVILILLRPVFQEREKGTE
VITDTIRSGLQLLGNTVVKNIDTQEFIWNCCCPQFFLDVLLSRHHSIQDCLCMIIFNCLN
QQRRLQLVNNPKIISQIVHLCADKSLLEWGYFILDCLICEGLFPDLYQGMEFDPLARIIL
LDLFQVKITDALDESSERTERTETPKELYASSLNYLAEQFETYFIDIIQRLQQLDYSSND
FFQVLVVTRLLSLLSTSTGLKSSMTGLQDRASLLETCVDLLRETSKPEAKAAFSNVSSFP
HSVDSGRISPSHGFQRDLVRVIGNMCYQHFPNQEKVRELDGIPLLLDHCNIDDHNPYICQ
WAIFAIRNVLENNKENQDIVASIHPLGLADMSRLQQFGVDAVEFDGEKI
Now I want to find all duplicates in my big Fasta File and append numbers to the duplicates, so that I know which duplicate it is (1,2,3,...,x). When a new duplicate is found (one with another header), the counter should start from the beginning.
The output should be something like this:
>acrdi|AD19082
STSTAFPLLTQFYGCAIIILVLAMCCSCLVYAMYFMNSSGLQTHESTVTQKVKDFSLQ
WLQPILFGCSWRHRLIAKSRRNRSKIQPMTGTEPPWNESKDAFENLKTWALNKQNRNCLL
EINFLEAKDFIVMCKDVVCFEEDDKDERNLNLCLKTLTEAFRFLRNCCAETPKNQSFVIS
SGVAKQAIEVILILLRPVFQEREKGTEVITDTIRSGLQLLGNTVVKNIDTQEFIWNCCCP
QFFLDVLLSRHHSIQDCLCMIIFNCLNQQRRLQLVNNPKIISQIVHLCADKSLLEWGYFI
LDCLICEGLFPDLYQGMEFDPLARIILLDLFQVKITDALDESSERTERTETPKELYASSL
NYLAEQFETYFIDIIQRLQQLDYSSNDFFQVLVVTRLLSLLSTSTGLKSSMTGLQDRASL
LETCVDLLRETSKPEAKAAFKRPGTSYWEYVLPTFP
>acrdi|AD19082-1
MLRQSEPPWNESKDAFENLKTWALNKQNRNCLLEINFLEAKDFIVMCKDVVCFEEDDKDE
RNLNLCLKTLTEAFRFLRNCCAETPKNQSFVISSGVAKQAIEVILILLRPVFQEREKGTE
VITDTIRSGLQLLGNTVVKNIDTQEFIWNCCCPQFFLDVLLSRHHSIQDCLCMIIFNCLN
QQRRLQLVNNPKIISQIVHLCADKSLLEWGYFILDCLICEGLFPDLYQGMEFDPLARIIL
LDLFQVKITDALDESSERTERTETPKELYASSLNYLAEQFETYFIDIIQRLQQLDYSSND
FFQVLVVTRLLSLLSTSTGLKSSMTGLQDRASLLETCVDLLRETSKPEAKAAFSNVSSFP
HSVDSGRISPSHGFQRDLVRVIGNMCYQHFPNQEKVRELDGIPLLLDHCNIDDHNPYICQ
WAIFAIRNVLENNKENQDIVASIHPLGLADMSRLQQFGVDAVEFDGEKI
I would prefer a method using awk or sed, so that I can easily modify the code to run on all files in a directory.
I have to admit that I am just starting to learn programming and parsing, but I hope this is not a stupid question.
Thanks in advance for the help.
An awk script:
BEGIN {
    OFS="\n";
    ORS=RS=">";
}
{
    name = $1;
    $1 = "";
    suffix = names[name] ? "-" names[name] : "";
    print name suffix $0, "\n";
    names[name]++;
}
The above uses ">" as the record separator and looks at the first field of each record (the header name, which may be duplicated). For each record it prints, it appends a suffix to the header name for every additional time that name appears (i.e. '-1' for the first duplicate, '-2' for the second, ...).
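The counting logic can also be sketched in Python. This is not the awk approach above but a line-by-line variant of it, and it assumes that the whole header line (not just its first word) identifies a sequence. The name mark_duplicate_headers is made up for the example.

```python
from collections import defaultdict


def mark_duplicate_headers(lines):
    """Append -1, -2, ... to repeated FASTA header lines; the first occurrence stays as is."""
    seen = defaultdict(int)  # header line -> how many times it has appeared so far
    result = []
    for line in lines:
        if line.startswith(">"):
            count = seen[line]
            seen[line] += 1
            if count:
                # second occurrence gets -1, third gets -2, and so on
                line = f"{line}-{count}"
        result.append(line)
    return result


if __name__ == "__main__":
    fasta = [">acrdi|AD19082", "STSTAF", ">acrdi|AD19082", "MLRQSE"]
    print("\n".join(mark_duplicate_headers(fasta)))
```

Each header keeps its own counter, so a different duplicated header starts again at -1.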