Closed. This question needs debugging details. It is not currently accepting answers.
Closed 8 years ago.
I have a string "some text #texttext some other text #texttagtext". I need to get all the words that start with the '#' symbol. If two or more # symbols appear together (##, ###, ...), I need to replace them with a single '#'. Could anyone help me with a regular expression? Thanks in advance.
Regex:
(?<=^|\s)#+(?=\S+)
Replacement string:
#
In Objective-C string literals, each backslash must be escaped once more (e.g. \\s and \\S).
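For comparison, here is a minimal sketch of the same replacement in Python (assuming Python rather than Objective-C). Python's `re` module only accepts fixed-width lookbehind. Because `^` has width 0 and `\s` has width 1, `(?<=^|\s)` is rejected there, so this sketch swaps in the equivalent `(?<!\S)`, which means "not preceded by a non-whitespace character". The sample text is illustrative.

```python
import re

text = "some text ##texttext some other text ###texttagtext"

# (?<!\S) matches at the start of the string or right after whitespace,
# just like (?<=^|\s), but it is fixed-width, which Python's re requires.
# #+ grabs the whole run of hashes; (?=\S) checks a word follows.
collapsed = re.sub(r"(?<!\S)#+(?=\S)", "#", text)

print(collapsed)  # some text #texttext some other text #texttagtext
```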
To find all the words that start with #:
(?<=^|\s)#\S+
\S+ matches one or more non-whitespace characters.
OR
(?<=^|\s)#\w+
\w+ matches one or more word characters (letters, digits and underscore).
To find all the words that start with one or more #:
(?<=^|\s)#+\S+
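The three search patterns above can be compared side by side in a short Python sketch. As before, `(?<=^|\s)` is written as `(?<!\S)` because Python's `re` module requires fixed-width lookbehind. The sample string is a made-up variant of the one in the question, with a doubled `##` added to show where the patterns differ.

```python
import re

text = "some text #texttext some other text ##texttagtext"

# '#' followed by non-space characters. Because \S also matches '#',
# this pattern accepts "##texttagtext" as well.
with_nonspace = re.findall(r"(?<!\S)#\S+", text)

# '#' followed by word characters only, so "##texttagtext" is skipped.
with_word = re.findall(r"(?<!\S)#\w+", text)

# One or more '#' followed by non-space characters.
with_hashes = re.findall(r"(?<!\S)#+\S+", text)

print(with_nonspace)  # ['#texttext', '##texttagtext']
print(with_word)      # ['#texttext']
print(with_hashes)    # ['#texttext', '##texttagtext']
```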
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I have a long text file with the following format:
>foo_bar
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGGCAGCCGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGTATTATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
>bar_foo
TATGTTCTGCAACTGTATAATGGTATAAAAACATTGCAAAATGTAATGAAACTTGTTATTTTGTGAAATACATTCTATAAATATCACTATTTCATGAAAA
ATATTGAAAATCATTTATTTTCGACAAGTAGAACCATAGGTTCTGTAATTGTAAATAGTTCTGCAAACTTAACCTGTTTTGCAGAAGAATATGTTTTCAC
TAGTTAACTTGTAGAATGTTTAGGATTGTTAAAATTTTTAACAAAATAAGATTTTATAGAACATGATTTGCAAAATAACACATTTTGCAATATTTTTATA
CCATATATAGTTGCAGAACATATGGGGACTACGGTACTACGGTAAATATGTGGACTACATGGAACTTGTTCAGATACATCTGGAGCAAAGAGCCACCGCT
CTAAATTATCTCTTCTCATTTCCAGCTGCATATCTCTCATGCTAAATTATCTCTACAAATCATGACCTCTCTTAGCAATCTCCCTGAGCATCTCCGTAGG
GAGCAGATATTCACCCGTCTTCCGATGAAAGACCTAATGGTCCTCGCATCTGCAAGTCATGTCTTGCGTTAATCTTTCTCTCTCTTTTTGTGGAATCCCA
TCTCTCCTCTTATCAACTAAACCAGATACAGTTTGCACCAACTTTCTTCACTCCCCTGTTACATGAGAAGGCCAGACTTAGGTAGCTTCTGAATCAGAAC
CCGGTCATTCCAAGCATGGGATTTCTTGTTGATCTCTTGTTTTTATGTAATAGTGATCATTTGATATCTGGTGTTGATGGGAATTCAGATGTATGGGACT
TTGTTTATTGTTGATGTGGAATTCTTATATTTTACTGTGTACTATAAAATTTTAGTGATACCTACTATCTATTGTATAAATTGATTAATTGATGTTCTTA
I.e., there is a header line which begins with a ">", and then an arbitrary number of lines with no more than 100 letters in them. I would like to find the positions within the non-header lines that match either "GCAGC" or "GCTGC". Overlapping match sites would both get recorded individually.
An example output would be a three-column text file. The first column would contain the header line for that block of text minus the ">". The second column would contain the start position of a pattern match, i.e. the number of characters into the text block, excluding line-break characters. The third column would record which of the two patterns was matched. E.g.:
foo_bar 109 GCAGC
bar_foo 58289 GCTGC
I'm not sure how complex this task is, and in particular whether there is a memory-efficient way to perform it in a streaming fashion. awk or sed seem like two utilities that might work, but the required command is beyond my limited understanding of them.
A tiny tweak on yesterday's answer:
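sub(/^>/,"") {
    hdr = $0      # header line: ">" removed, remember the block name
    off = 0       # sequence characters seen so far in this block
    next
}
{
    while ( match($0,/GC[AT]GC/) ) {
        # off + RSTART = position within the whole block, not just this line
        print hdr, off + RSTART, substr($0,RSTART,RLENGTH)
        # blank out the first matched char so overlapping matches are still found
        $0 = substr($0,1,RSTART-1) " " substr($0,RSTART+1)
    }
    off += length($0)   # blanking keeps the length; matches spanning a line break are not detected
}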
Please get the book Effective AWK Programming, 5th Edition, by Arnold Robbins to learn the basics of awk.
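If Python is available, the same streaming idea can be sketched so that positions are counted across the whole block and matches spanning a line break are also caught. It carries the last four characters of each line into the next scan, which is the motif length minus one. This is a sketch under the file layout shown in the question, not a drop-in replacement. `scan`, `MOTIF` and `CARRY` are illustrative names.

```python
import io
import re
from typing import Iterable, Iterator, Optional, Tuple

# A lookahead with a capture group lets finditer report overlapping matches.
MOTIF = re.compile(r"(?=(GC[AT]GC))")
# Motif length minus one: how much of the previous text to carry over so
# that matches spanning a line break are still found (and none twice).
CARRY = 4


def scan(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], int, str]]:
    """Yield (header, 1-based position in block, motif) for each match."""
    header: Optional[str] = None
    consumed = 0  # sequence characters seen so far in the current block
    tail = ""     # last CARRY characters of the block so far
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            header, consumed, tail = line[1:], 0, ""
            continue
        buf = tail + line
        base = consumed - len(tail)  # 0-based block offset of buf[0]
        for m in MOTIF.finditer(buf):
            yield header, base + m.start() + 1, m.group(1)
        consumed += len(line)
        tail = buf[-CARRY:]


sample = io.StringIO(">foo_bar\nAAGCA\nGCTTT\n>bar_foo\nGCTGCAGC\n")
for hdr, pos, motif in scan(sample):
    print(hdr, pos, motif)
# foo_bar 3 GCAGC
# bar_foo 1 GCTGC
# bar_foo 4 GCAGC
```

For a real file, pass an open handle instead, e.g. `with open("sequences.txt") as fh:` (the file name is hypothetical). Memory use stays bounded by one line plus the four carried characters.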
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 2 years ago.
I have a CSV file, shown below, with only one column (CUST_CODE). The header and every row are wrapped in curly quotation marks:
“CUST_CODE”
“CST001001”
“CST000235”
“CST010231”
“CST010235”
“CST010231”
“CST010235”
“CST010231”
“CST040015”
I tried to read this file with pandas and got this error:
'utf-8' codec can't decode byte 0x93 in position 0: invalid start byte
I also tried passing the encoding as ascii and utf-8, but neither worked.
Try passing encoding='cp1252' instead. Byte 0x93 is the left curly double quotation mark (“) in Windows-1252, and it is not a valid start byte in UTF-8. Replace 'Documents\Book1.csv' below with the path to your file:
import pandas as pd

df = pd.read_csv(r'Documents\Book1.csv', encoding='cp1252')
df
“CUST_CODE”
0 “CST001001”
1 “CST000235”
2 “CST010231”
3 “CST010235”
4 “CST010231”
5 “CST010235”
6 “CST010231”
7 “CST040015”
Here is a Wikipedia article with more information about that encoding: https://en.wikipedia.org/wiki/Windows-1252 . A quote from the article:
"...common result was that all the quotes and apostrophes (produced by "smart quotes" in word-processing software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read."
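The following stdlib-only Python sketch shows why cp1252 works and how the curly quotes could be removed afterwards. The bytes below are a hypothetical stand-in for the file. In Windows-1252, 0x93 and 0x94 are the curly quotes “ and ”, so UTF-8 rejects the very first byte while cp1252 decodes it.

```python
import csv
import io

# Hypothetical file contents: 0x93 and 0x94 are “ and ” in Windows-1252.
raw = b"\x93CUST_CODE\x94\r\n\x93CST001001\x94\r\n\x93CST000235\x94\r\n"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # the same "invalid start byte" error as in the question

text = raw.decode("cp1252")

# The csv module treats only the ASCII '"' as a quote character, so the
# curly quotes survive parsing and have to be stripped explicitly.
rows = [row[0].strip("\u201c\u201d") for row in csv.reader(io.StringIO(text))]
header, codes = rows[0], rows[1:]
print(header, codes)  # CUST_CODE ['CST001001', 'CST000235']
```

In pandas, a comparable cleanup after reading with encoding='cp1252' would be the `.str.strip("“”")` accessor, applied to the column and to `df.columns`.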
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In the following example, I need to trim the content of the second quoted string on line 5 so that only the part up to, but not including, the decimal point remains.
The contents of the quotes vary, so everything between the opening " and the . must be captured; it cannot be matched with a search string based on the text in between.
The line number may also change in the future; however, the line can always be found by searching for "Item".
The process should use awk, grep, cat, sed, or a combination of them, because of the limitations of the proprietary environment/OS. I have searched around but couldn't find anything that works as desired.
filename: data.json
{
"Brand": "Marketside",
"Price": "3.97",
"SKU": "48319448",
"Item": "12-ct_Large_Grade_A(Brown_Organic).48319448",
}
An example of a successful output would be:
12-ct_Large_Grade_A(Brown_Organic)
The requirement to rely exclusively on line-oriented tools to manipulate JSON seems extremely misdirected. When manipulating structured formats, use tools which understand the structured format.
jq '.Item|split(".")[0]' data.json
to extract up to the first dot; or
jq '.Item|sub("[.][^.]*$";"")' data.json
to discard the text from the last dot until the end of the field.
(jq doesn't like the superfluous last comma after the Item in your pseudo-JSON, though.)
There is no doubt in anyone's mind that your acute problem as stated can be solved with a simple Awk or sed script. What happens then - what already happened here - is that you discover additional requirements which were not obvious from the toy example you posted. A proper, portable solution copes with JSON samples with strings with embedded commas and escaped double quotes, and continues to work when the superficial JSON format changes because a component somewhere upstream is updated to put all the JSON on a single line or whatever.
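To illustrate the structured-parsing approach in a general-purpose language, here is a Python sketch. It assumes Python happens to be available, which the question suggests it may not be. The two jq expressions translate roughly as shown:

```python
import json

# The sample from the question, minus the trailing comma after "Item":
# like jq, Python's json module rejects that comma as invalid JSON.
doc = """{
  "Brand": "Marketside",
  "Price": "3.97",
  "SKU": "48319448",
  "Item": "12-ct_Large_Grade_A(Brown_Organic).48319448"
}"""

item = json.loads(doc)["Item"]

# Like jq's .Item|split(".")[0]: keep the text before the first dot.
first_part = item.split(".")[0]

# Like jq's .Item|sub("[.][^.]*$";""): drop the last dot and what follows.
without_last = item.rsplit(".", 1)[0]

print(first_part)    # 12-ct_Large_Grade_A(Brown_Organic)
print(without_last)  # 12-ct_Large_Grade_A(Brown_Organic)
```

The two only differ when the value contains more than one dot, e.g. "a.b.c" gives "a" versus "a.b".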
Here is an awk:
awk -F'.' '/Item/{split(substr($0,1,length($0)-length($NF)-1),a,"\"");print a[4]}' data.json
12-ct_Large_Grade_A(Brown_Organic)
It searches for Item and then prints everything from the opening " up to the last .:
Split the line on . (the field separator).
Find the length of the last field after the split: length($NF).
Subtract it, plus one for the dot itself, from the total length to get the length of the text before the last .: length($0)-length($NF)-1.
Then split that first part on " and print the 4th element.
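The same steps can be mirrored in a line-oriented Python sketch that does not parse JSON at all. `extract_item` is a hypothetical helper. Like the awk version, it assumes the key and value sit on one line, but it returns the first matching line's value rather than printing every match.

```python
def extract_item(lines):
    """Return the Item value up to its last '.', or None (hypothetical helper)."""
    for line in lines:
        if "Item" not in line:           # awk: /Item/
            continue
        cut = line.rfind(".")            # awk: length($0)-length($NF)-1
        if cut == -1:
            continue                     # no dot on this line: nothing to trim
        parts = line[:cut].split('"')    # awk: split(..., a, "\"")
        if len(parts) > 3:
            return parts[3]              # awk's a[4] (awk arrays are 1-based)
    return None


sample = [
    "{",
    '"Brand": "Marketside",',
    '"Price": "3.97",',
    '"SKU": "48319448",',
    '"Item": "12-ct_Large_Grade_A(Brown_Organic).48319448",',
    "}",
]
print(extract_item(sample))  # 12-ct_Large_Grade_A(Brown_Organic)
```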
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need to copy the first word of a variable in my batch script into another variable.
For example, if %hello% contains "apples are awesome" and I extract its first word into %hi%, then %hi% should contain "apples".
Thanks in advance.
This can be done using a for loop:
for /f %%h in ("%hello%") do set "hi=%%h"
The behaviour of "for /f" here is to split its input into lines (there is only one, assuming your variable contains no newline characters). It then splits each line into tokens on spaces and tabs (you can change the delimiters with the "delims=[chars]" option) and passes the first token of each line to the specified command (you can use "tokens=n,n,..." to get at tokens other than the first on the line).
Note that, AIUI, the loop variable that receives the word can only have a single-letter name, so you can't use %%hi as you requested.
(This is all untested, as I'm not at a machine running Windows at the moment, but ought to work if I'm reading the documentation correctly.)
set "hi="
for %%h in (%hello%) do if not defined hi set "hi=%%h"
echo %hi%
should work, as would
set "hi="
for %%h in (%hello%) do set "hi=%%h"&goto done
:done
echo %hi%
Note that the set "var=string" syntax ensures that trailing spaces on the line, as left by some editors, are not included in the value assigned.
You don't say clearly whether the value of hello is apples are awesome or "apples are awesome". The first is a string of three space-separated words; the second is a single string containing one "word" that contains spaces. I've assumed the former.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I would like to create an input mask that looks like C-HG__.
But because C represents an optional character or space in a VB.NET mask, it won't let me.
Please assist.
Try using the escape element \ to turn the C into a literal, i.e. write \C instead of C:
MSDN has a fairly nice write-up. Here's an excerpt:
\
Escape. Escapes a mask character, turning it into a literal. "\\" is the escape sequence for a backslash.
Possible duplicate of this question and/or this question.