removing space for a url string inside a text file [closed] - awk

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have a very big text file (1 GB), and in a few places the HTTP URL field contains a space.
For example, in the lines below there is a space in "brad pitt" and in "[30 wet=]". They should be changed to "bradpitt" and "[30wet=]", but such spaces can occur in any url or trim_url. I am currently finding these places with my program and then fixing them manually in vim. Is there a way to do this with awk/sed?
0.0 q:hello url:http://sapient.com/bapper/30/brad pitt/C345/surf trim_url:http://sapient.com/bapper/30/brad pitt/C345 rating:good
0.0 q:hello url:http://sick.com/bright/[30 wet=]/sound trim_url:http://sick.com/bright/[30 wet=]rating:good
What I tried to do was sed:
sed -i -e 's/*http*[:space:]*/*http*/g' test.txt

Using perl and a proper module to URI encode the URL:
perl -MURI::Escape -pe 's!(https?://)(.*)!$1 . uri_escape($2)!e' file
You can even edit the file in place with the -i switch (just like sed): perl -MURI::Escape -i -pe [...]
Output
0.0 q:hello url:http://sapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%2Fsurf%20trim_url%3Ahttp%3A%2F%2Fsapient.com%2Fbapper%2F30%2Fbrad%20pitt%2FC345%20rating%3Agood
0.0 q:hello url:http://sick.com%2Fbright%2F%5B30%20wet%3D%5D%2Fsound%20trim_url%3Ahttp%3A%2F%2Fsick.com%2Fbright%2F%5B30%20wet%3D%5Drating%3Agood
URI::Escape - Percent-encode and percent-decode unsafe characters
Note
As msanford said in the comments, spaces in a URL are meaningful. You can't simply remove them without breaking the link and making the resource unreachable.
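If the spaces really are stray characters, as the question assumes, the removal itself can be sketched in Python. This is only a sketch under assumptions: every line is laid out like the two samples, and the only fields that follow url: are trim_url: and rating:. The names strip_url_spaces and FIELD_START are made up for illustration.

```python
import re

# Assumed field names, taken from the two sample lines; adjust for the real file.
FIELD_START = re.compile(r"(?:trim_url|rating):")

def strip_url_spaces(line):
    """Drop spaces inside url/trim_url values, keeping the space before each field."""
    head, sep, tail = line.partition(" url:")
    if not sep:                      # no url field on this line: leave it alone
        return line
    tokens = tail.split(" ")
    out = [tokens[0]]
    for tok in tokens[1:]:
        if FIELD_START.match(tok):   # a new field starts here: keep the separator
            out.append(" " + tok)
        else:                        # otherwise the space was inside a URL: glue
            out.append(tok)
    return head + sep + "".join(out)

sample = ("0.0 q:hello url:http://sick.com/bright/[30 wet=]/sound "
          "trim_url:http://sick.com/bright/[30 wet=]rating:good")
print(strip_url_spaces(sample))
```

For a 1 GB file the function could be applied line by line while streaming from the input to a new file, so nothing has to fit in memory. A URL fragment that itself begins with "rating:" or "trim_url:" right after a space would be mistaken for a field, so spot-check the result.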

Quantitavely replace digit (as counter) with string in sed [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 months ago.
Let's say I have the following file:
balloons:
- 2
- 3
Each number above represents how many times I want to print the string. For example, I would like to process this into the following output:
balloons:
- red
- red
- blue
- blue
- blue
I only have red and blue balloons. The digits will vary from one file to another, so my search string would be a simple regex search sed -e "/[[:digit:]]\+/ perform_my_action"
Try:
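awk 'BEGIN{color[1]="red"; color[2]="blue"}  # colors by position, since the counts vary per file
/^-[ \t]+[0-9]+/{n++; for(i=1;i<=$2;i++) print "- " color[n]; next}  # n-th count line: repeat the n-th color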
1
' file

Return not so similar codes from a single group [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 months ago.
I have a list of product codes grouped in blocks of 2 or 3 lines. I need to return the groups whose codes are neither identical nor consecutive.
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
In this case, only the third group should be returned:
9003763
9003728
I would use GNU AWK for this task in the following way. Let the content of file.txt be
9003103
9003103

9003978
9003979

9003763
9003728

9003543
9003543
9003543
then
awk 'BEGIN{RS=""}{diff=$NF-$1;diff=diff>0?diff:-diff}diff>NF' file.txt
gives output
9003763
9003728
Explanation: I set RS to the empty string to trigger paragraph mode, so every block separated by blank lines is treated as a single record. Then for each block I compute the absolute value of the difference between the first and last field; if the difference is bigger than the number of fields, the block is printed.
(tested in GNU Awk 5.0.1)
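For readers who want to check the rule outside awk, here is a rough Python sketch of the same logic. It assumes, as paragraph mode does, that the groups are separated by blank lines; the function name unusual_groups is made up for illustration.

```python
import re

def unusual_groups(text):
    """Return the blocks whose first and last codes differ by more than the block size."""
    blocks = re.split(r"\n\s*\n", text.strip())       # blank lines separate blocks
    result = []
    for block in blocks:
        codes = [int(tok) for tok in block.split()]   # the fields of the block
        # same test as the awk one-liner: |last - first| > number of fields
        if codes and abs(codes[-1] - codes[0]) > len(codes):
            result.append(codes)
    return result

sample = ("9003103\n9003103\n\n9003978\n9003979\n\n"
          "9003763\n9003728\n\n9003543\n9003543\n9003543\n")
print(unusual_groups(sample))
```

Like the awk version, this only compares the first and last code of each block, so a 3-line block with an odd middle value would not be flagged.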

Need help using awk or similar to print/output a partial line of a JSON file [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
In the following example, I need to extract the content within the second set of quotes on line 5, up to but not including the decimal point.
The contents of the quotes vary so everything between " and . must be captured and cannot be matched by using a search string based on any contents between.
It is also possible that in the future the line number may change, however, the line can always be found by searching for "Item".
The process should utilize awk, grep, cat, sed or a combination of them due to the limitations of the proprietary environment/OS. I have searched around but wasn't able to find anything that would work as desired.
filename: data.json
{
"Brand": "Marketside",
"Price": "3.97",
"SKU": "48319448",
"Item": "12-ct_Large_Grade_A(Brown_Organic).48319448",
}
An Example of a successful output would be:
12-ct_Large_Grade_A(Brown_Organic)
The requirement to rely exclusively on line-oriented tools to manipulate JSON seems extremely misdirected. When manipulating structured formats, use tools which understand the structured format.
jq '.Item|split(".")[0]' data.json
to extract up to the first dot; or
jq '.Item|sub("[.][^.]*$";"")' data.json
to discard the text from the last dot until the end of the field.
(jq doesn't like the superfluous last comma after the Item in your pseudo-JSON, though.)
There is no doubt in anyone's mind that your acute problem as stated can be solved with a simple Awk or sed script. What happens then - what already happened here - is that you discover additional requirements which were not obvious from the toy example you posted. A proper, portable solution copes with JSON samples with strings with embedded commas and escaped double quotes, and continues to work when the superficial JSON format changes because a component somewhere upstream is updated to put all the JSON on a single line or whatever.
Here is an awk solution:
awk -F'.' '/Item/{split(substr($0,1,L=length($0)-length($NF)-1),a,"\"");print a[4]}'
12.ct.Large.Grade.A(Brown_Organic)
It searches for Item and then prints from the second opening " to the last .
Split the line by . (the field separator set with -F)
Find the length of the last part after the split: length($NF)
Subtract this length, plus one for the dot itself, from the total to find the position just before the last .: length($0)-length($NF)-1
Then split that first part by " and print the 4th piece.
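The same steps can be sketched in Python for anyone who wants to trace them. This is a sketch, not a JSON parser: it assumes the Item line looks like the one in the question, with the value as the fourth quote-delimited piece, and the function name item_name is made up for illustration.

```python
def item_name(lines):
    """Return the Item value up to (not including) its last dot, or None."""
    for line in lines:
        if "Item" in line:                    # find the line by its key, not its number
            value = line.split('"')[3]        # 4th piece when splitting on double quotes
            return value.rsplit(".", 1)[0]    # cut at the last dot, if there is one
    return None

data = [
    '{',
    '"Brand": "Marketside",',
    '"Price": "3.97",',
    '"SKU": "48319448",',
    '"Item": "12-ct_Large_Grade_A(Brown_Organic).48319448",',
    '}',
]
print(item_name(data))
```

Unlike the awk one-liner, which cuts at the last dot on the whole line, this cuts at the last dot inside the quoted value. Both break if the value contains an escaped double quote, which is the kind of case the jq answer warns about.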

Setting A variable with the first word from another variable [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I need to pull the first word from a variable in my batch script to another variable.
For example, if %hello% contained "apples are awesome" and the first word was pulled out and put into %hi%,
%hi% would be "apples".
Thanks in advance.
This can be done using a for loop:
for /f %%h in ("%hello%") do [command that uses %%h]
The behaviour of "for /f" in this circumstance is to split its input into lines (there is only one, assuming there are no newline characters in your input variable), then split each line into tokens on spaces and tabs (you can change the delimiters with the "delims=[chars]" option), and pass the first token of each line to the specified command (you can use "tokens=n,n,..." to get at tokens other than the first).
Note that AIUI you can only use a single letter variable name for the variable to receive the word, so you can't use %%hi as you requested.
(This is all untested, as I'm not at a machine running Windows at the moment, but ought to work if I'm reading the documentation correctly.)
set "hi="
for %%h in (%hello%) do if not defined hi set "hi=%%h"
echo %hi%
should work, as would
set "hi="
for %%h in (%hello%) do set "hi=%%h"&goto done
:done
echo %hi%
Note that the set "var=string" syntax ensures that trailing spaces on the line, as left by some editors, are not included in the value assigned.
You don't say clearly whether the value of hello is apples are awesome or "apples are awesome". The first is a string of three words separated by spaces; the second is a single string containing one "word" which contains spaces. I've assumed the former.

Input Mask start with letter C in VB.net [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
I would like to create an input mask that looks like this: C-HG__.
But because C represents an optional character or space in VB.NET masking, it won't let me.
Please assist.
Try using the escape element: \
MSDN has a fairly nice write-up. Here's an excerpt:
\
Escape. Escapes a mask character, turning it into a literal. "\\" is the escape sequence for a backslash.