Extracting a specific value from a text file - awk

I am running a script that outputs data.
I am specifically trying to extract one number. However, each time I run the script and get the output file, the number I am interested in sits in a different position (due to the log nature of the output file).
I have tried several awk, sed, and grep commands, but I can't get any of them to work, as many rely on the position of the word or number remaining constant.
This is what I am dealing with. The value I require is the bold one:
Energy initial, next-to-last, final =
-5.96306582435 -5.96306582435 -5.96349956298

You can try
awk '{print $(i++%3+6)}' infile
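If the value you need is the last one (the final energy) and the numbers sit on the line after the header shown above, here is a sketch that does not depend on field positions at all (infile stands for your output file):
awk '/Energy initial, next-to-last, final/ { getline; print $NF }' infile
This prints the last field of the line that follows each header, wherever that header appears in the file.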

Recursively search directory for occurrences of each string from one column of a .csv file

I have a CSV file--let's call it search.csv--with three columns. For each row, the first column contains a different string. As an example (punctuation of the strings is intentional):
Col 1,Col 2,Col 3
string1,valueA,stringAlpha
string 2,valueB,stringBeta
string'3,valueC,stringGamma
I also have a set of directories contained within one overarching parent directory, each of which has a subdirectory we'll call source, such that the path to source would look like this: ~/parentDirectory/directoryA/source
What I would like to do is search the source subdirectories for any occurrences--in any file--of each of the strings in Col 1 of search.csv. Some of these strings will need to be manually edited, while others can be categorically replaced. I run the following command:
awk -F "," '{print $1}' search.csv | xargs -I# grep -Frli # ~/parentDirectory/*/source/*
What I would want is a list of files that match the criteria described above.
My awk call gets a few hits, followed by xargs: unterminated quote. There are some single quotes in some of the strings in the first column that I suspect may be the problem. The larger issue, however, is that when I did a sanity check on the results I got (which seemed far too few to be right), there was a vast discrepancy. I ran the following:
ag -l "searchTerm" ~/parentDirectory
Where searchTerm is a substring of many (but not all) of the strings in the first column of search.csv. In contrast to my above awk-based approach which returned 11 files before throwing an error, ag found 154 files containing that particular substring.
Additionally, my current approach is too low-resolution even if it didn't error out, in that it wouldn't distinguish between which results are for which strings, which would be key to selectively auto-replacing certain strings. Am I mistaken in thinking this should be doable entirely in awk? Any advice would be much appreciated.
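One way to sidestep the xargs quoting problem while keeping track of which string produced which hits is to loop over the strings and call grep once per string. A sketch, assuming the Col 1 strings contain no commas (the header row is skipped):
tail -n +2 search.csv | cut -d, -f1 | while IFS= read -r s; do
  printf '== %s ==\n' "$s"
  grep -Frl -- "$s" ~/parentDirectory/*/source/
done
Because each string is passed to grep as a quoted variable rather than re-parsed by the shell, embedded single quotes such as the one in string'3 are harmless, and the header printed before each group shows which files matched which string.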

How to remove unix timestamp specific data from a flatfile

I have a huge file containing a list like this
email#domain.com^B1569521698
email2#domain.com,#2domain.com^B1569521798
email3#domain.com,test#2domain.com^B1569521898
email10000#domain.com^B1569521998
..
..
The file is named /usr/local/email/whitelist
The number after ^B is a unix timestamp
I need to remove from the list all the rows having a timestamp smaller than (e.g.) 1569521898.
I tried using various awk/sed combinations with no result.
The character ^B you notice is a control character. The first 32 control characters, ASCII codes 0 through 31 (0x00-0x1F), form a special set of non-printing characters: they perform various printer and display control operations rather than displaying symbols. This particular one stands for STX, or Start of Text.
You can type control characters in a shell as Ctrl+v Ctrl+b, or you can use the octal representation directly (\002).
awk -F '\002' '($2 >= 1569521898)'
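To actually rewrite the whitelist with that filter, a sketch using a temporary file (the path is the one given in the question):
awk -F '\002' '($2 >= 1569521898)' /usr/local/email/whitelist > /usr/local/email/whitelist.tmp &&
  mv /usr/local/email/whitelist.tmp /usr/local/email/whitelist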
Since you have control characters in your Input_file, could you please try the following once. This is written and tested with the given samples only.
awk '
match($0, /\002[0-9]+/){
  val = substr($0, RSTART+1, RLENGTH-1)
  if (val+0 >= 1569521898) { print }
  val = ""
}
' Input_file

Sed script needed to insert LF before each time match in a large single string

I have a lengthy string into which I need to insert a line feed before each instance of a time stamp.
03:38:11,03/07/2017,node,cpu,user,sys,idle,intr/s,ctxt/s,0,0,0,9,91,0,1,0,24,75,0,total,0,17,83,2370,3574,1,0,3,4,
93,1,1,10,4,86,1,total,7,4,89,2922,4653,03:39:11,03/07/2017,node,cpu,user,sys,idle,intr/s,ctxt/s,0,0,4,25,71,0,1,5
,16,79,0,total,4,21,75,2487,3876,1,0,0,3,97,1,1,1,1,98,1,total,1,2,98,2880,4728,03:40:11,03/07/2017,node,cpu,user,
sys,idle,intr/s,ctxt/s,0,0,1,30,69,0,1,1,30,69,0,total,1,30,69,3237,4344,1,0,3,49,47,1,1,10,47,43,1,total,6,48,45,
3920,5702,
I need to format it like this:
03:38:11,03/07/2017,node,cpu,user,sys,idle,intr/s,ctxt/s,0,0,0,9,91,0,1,0,24,75,0,total,0,17,83,2370,3574,1,0,3,4,93,1,1,10,4,86,1,total,7,4,89,2922,4653,
03:39:11,03/07/2017,node,cpu,user,sys,idle,intr/s,ctxt/s,0,0,4,25,71,0,1,5,16,79,0,total,4,21,75,2487,3876,1,0,0,3,97,1,1,1,1,98,1,total,1,2,98,2880,4728,
03:40:11,03/07/2017,node,cpu,user,sys,idle,intr/s,ctxt/s,0,0,1,30,69,0,1,1,30,69,0,total,1,30,69,3237,4344,1,0,3,49,47,1,1,10,47,43,1,total,6,48,45,3920,5702,
I am currently trying to use the following:
sed -e 's/^[[:digit:]][[:digit:]]\:[[:digit:]][[:digit:]]/\n&/g' cpu.log
The ^ anchor restricts the match to the start of the line, so sed only matches the first time stamp. Remove it and you should be fine.
To avoid replacing the first, maybe massage the script to require something before the match (hard-coding a comma would seem to work, based on your sample data); or just post-process the output to remove the first newline.
sed 's/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/\n&/g'
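If you would rather not touch the first time stamp at all, a sketch of the comma-anchored variant mentioned above (GNU sed, since \n in the replacement is a GNU extension):
sed 's/,\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\)/,\n\1/g' cpu.log
Only time stamps preceded by a comma get a newline inserted, so the very first one is left alone and no leading blank line appears.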

Find duplicate records with only text case difference

I have a log file with 8M entries/records with URLs. I'd like to find duplicate URLs (same URLs) where the only difference is their text case.
Example:
origin-www.example.com/this/is/hard.html
origin-www.example.com/this/is/HARD.html
origin-www.example.com/this/is/Hard.html
In this case, there are three entries that are duplicates apart from case.
Output should be just the count (-c) and a new file with the duplicates.
Use the typical awk '!seen[$0]++' file trick combined with tolower() or toupper() to compare all lines in the same case:
$ awk '!seen[tolower($0)]++' file
origin-www.example.com/this/is/hard.html
For a different output, or for counters, provide a valid example of the desired output.
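If you do want the counts and a separate file with just the duplicated URLs, a minimal sketch (duplicates.txt is a placeholder name; each duplicate group is reported once, in lowercase, with its count):
awk '{ count[tolower($0)]++ }
     END { for (u in count) if (count[u] > 1) print count[u], u }' file > duplicates.txt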

Awk Sum skipping Special Character Row

I am trying to take the sum of a particular column in a file i.e. column 18.
I am using an awk command along with printf to display it in the proper decimal format.
SUM=`cat ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt|awk -F"" '{s+=$18}END{printf("%24.2f\n", s)}'
The above command skips the rows in the file which have a special character in column 5 (RÉPARATIONS). Hence awk skips these rows and doesn't include them in the sum. Please help me resolve this issue so that the sum covers all rows.
There is a missing backtick in your example; it should be:
SUM=`cat ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt|awk -F"" '{s+=$18}END{printf("%24.2f\n", s)}'`
But you should not use backticks; you should use parentheses: $(code)
Using cat to feed data to awk is also the wrong way to do it; pass the path to awk instead:
SUM=$(awk -F"" '{s+=$18} END {printf "%24.2f\n",s}' ${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt)
This may not resolve your problem, but it gives more correct code.
If you show us your input file, it would help us understand the problem.
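One more thing worth checking: -F"" tells gawk to treat every single character as its own field, so a multibyte character such as É in column 5 can shift or break what ends up in position 18. If the file actually has a delimiter, splitting on it keeps column 18 stable; a sketch assuming a pipe-delimited file (the pipe is an assumption, since the real format is not shown):
SUM=$(awk -F'|' '{ s += $18 } END { printf "%24.2f\n", s }' "${INF_TARGET_FILE_PATH}/${EXTRACT_NAME}_${CURRENT_DT}.txt")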