I have one big file with content like the following:
dn: CN=Brower\, Stephen,OU=Recipients,OU=Mailboxes,OU=Exchange2000,DC=raritanval,DC=edu
changetype: modify
replace: department
department: Computer Science
-
dn: CN=Brower\, Stephen,OU=Recipients,OU=Mailboxes,OU=Exchange2000,DC=raritanval,DC=edu
changetype: modify
replace: description
description: Computer Science
-
I want to split this into multiple files, each containing one record from the line starting with "dn:" through the "-" line; that is, the search starts at dn: and everything up to and including - goes into one file, and so on. The files that are created also need to follow some standard naming convention.
An awk script can be helpful. Since you are running on AIX, you might want to get GNU awk (or gawk) installed. gawk has many more useful features than vanilla awk.
awk 'BEGIN{fnum=1} $1=="dn:"{flag=1; x=$0; next} flag{x=x "\n" $0} $1=="-"{flag=0; fn="file" fnum; print x > fn; close(fn); fnum++}'
This dumps the data into files that are sequentially numbered file1, file2, ...
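If "standard name" means deriving each file's name from the record itself rather than a bare sequence number, here is a sketch (assuming GNU awk; the .ldif extension and the sanitizing rule are my own choices, and the sequence number stays in the name because records can share the same dn, as in your sample):
gawk 'BEGIN{fnum=1}
$1=="dn:"{flag=1; x=$0; next}
flag{x=x "\n" $0}
$1=="-"{
    flag=0
    name=x
    sub(/\n.*/, "", name)               # keep only the first (dn:) line
    sub(/^dn: */, "", name)             # drop the "dn: " prefix
    gsub(/[^A-Za-z0-9._-]/, "_", name)  # replace anything unsafe in a file name
    fn=fnum "_" name ".ldif"
    print x > fn; close(fn); fnum++
}'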
I have a lot of files, where I would like to edit only those lines that start with private.
In principle I want to
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_"); print}' filename
but this only prints out the modified part of the file, and not everything.
Question
Does gawk have a way similar to sed -i inplace?
Or is there a much simpler way to do the above with either sed or gawk?
Just move the final print outside of the filtered pattern. eg:
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")} {print}'
usually, that is simplified to:
gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1'
You really, really, really (emphasis on "really") do not want to use something like sed -i to edit the files "in-place". (I put "in-place" in quotes because GNU sed does not edit the files in place, but creates new files with the same name.) Doing so is a recipe for data corruption, and if you have a lot of files you don't want to take that risk. Just write the files into a new directory tree. It will make recovery much simpler.
eg:
d=backup/$(dirname "$filename")
mkdir -p "$d"
awk '...' "$filename" > "backup/$filename"
Consider what happens if you use something like -i, which puts backup files in the same directory structure. If you're modifying files in bulk and the process is stopped half-way through, how do you recover? If you put the output into a separate tree, recovery is trivial. Your original files are untouched and pristine, and there are no concerns if your filtering process is terminated prematurely or inadvertently run multiple times. sed -i is a plague on humanity and should never be used. Don't spread the plague.
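For example, to process a whole tree this way (a sketch; the *.txt pattern and the backup/ output root are placeholders, and it assumes file names without embedded newlines):
find . -name '*.txt' -print | while read -r filename; do
    d=backup/$(dirname "$filename")
    mkdir -p "$d"
    gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1' "$filename" > "backup/$filename"
done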
GNU awk has had in-place editing since version 4.1.0.
And you should put the print outside the regex-matched block.
Try this:
gawk '/^private/{gsub(/[\/-]/, "_")} 1' filename
or, once you have made a backup of the file:
gawk -i inplace '/^private/{gsub(/[\/-]/, "_")} 1' filename
You forgot the ^ anchor; you need it to change only lines starting with private, otherwise every line containing private will be modified.
And yes, you can combine the two gsubs into a single one.
The sed command to do the same would be:
sed '/^private/{s/[/-]/_/g;}' filename
Add the -i option when you are done testing it.
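A quick demonstration of the non-in-place form (the sample file is made up):
$ printf 'private a/b-c\npublic d/e-f\n' > sample.txt
$ gawk '/^private/{gsub(/[\/-]/, "_")} 1' sample.txt
private a_b_c
public d/e-f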
I want to use a large awk script that was designed to take a particular input. For example "city zipcode street housenumber", so $2 is zipcode, etc...
Now the input is provided to me in a new format. In my example "city" is now missing. The new file is "zipcode street housenumber" (not for real, just trying to make the example simple)
but I happen to know that the city is a constant for that input (which is why it's not in the dataset). So if I run it through the original script $2 is now street, and everything is one field off.
I could first process the input file to prepend the city name to each line (using awk, sed, or whatever), then run it through the original script, but I would prefer to run only one script that supports both formats. I could add a command-line option that tells it the city, but I don't know how to insert it in front of the current record at the top of the script so that the rest of the script can be unchanged. It looks like I can change a field but what I want to do is "shift" the fields right so I can modify $1.
Did I mention I am a complete awk novice? (Perl is my poison.)
I think I fixed my own problem; I'm doing the following (I haven't figured out how to do this conditionally based on a command-line option yet, but see the sketch below):
NF+=1;
for(i=NF; i>1; --i) $(i)=$(i-1);
$1="Vancouver";
I had the loop wrong in my comment above, but the basic idea of manipulating NF and copying fields into each other seems to work.
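For the command-line option part, here is a sketch (my own names; it assumes the short format is recognizable by its field count, 3 instead of 4, that the city is passed in with -v, and that input.txt is a placeholder):
awk -v city="Vancouver" '
city != "" && NF == 3 {            # short-format record: shift fields right
    NF += 1
    for (i = NF; i > 1; --i) $i = $(i - 1)
    $1 = city                      # ...and put the city up front
}
{ print }                          # stand-in for the original script body
' input.txt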
Something along these lines should do it. First, some mock test data (since none was provided):
$ cat file
1 2 3 4
2 3 4
The awk:
$ awk -v c=V '{           # define an external var
    if (NF == 3)          # if the record has only three fields
        $0 = c FS $0      # prepend the var to the record
    print $1              # print the first field
}' file
Output:
1
V
Just a clarification before starting: the server where the script needs to run is an AIX box. The shell is ksh, so I do not have the advanced features Bash provides.
OK, this is what I need to accomplish:
I have two files. Both of them have two columns, separated by a comma.
File "A" has ~170K lines, and it looks like this:
0000A7AED4F0C9FB1ADC14134700CadsevDDD4A000CEDCF.ext,\\server000005\F$\DICE\0035\
0000C3793C4CD6095947E44618D4Eadsev397460011D036.ext,\\server000005\F$\DICE\0020\
0001168DDDA4DF294E37753FE891BadsevB33900011EEA3.ext,\\server000005\F$\DICE\0088\
00014E6A3AFF0911D95A933778895adsev6C81E00088E97.ext,\\server000005\F$\DICE\0009\
0001A65FA90EC0E6640E1589C4B66adsev6FE1F00088EB9.ext,\\server000005\F$\DICE\0009\
0001C5AA0A9AC8E4EDFC69C483160adsev881CC001157ED.ext,\\server000005\F$\DICE\0034\
0003270ED2D2AB11739029711A233adsev55605000CFC63.ext,\\server000005\F$\DICE\0028\
000327C08A0ECD8F23EE6AE42B3C3adsevE35F00011481D.ext,\\server000005\F$\DICE\0061\
0003423C2592EF9D0AD9A7E2B595Cadsev6ABD9000D3501.ext,\\server000005\F$\DICE\0022\
00035862746EFB2098EC965F31328adsev66800000DA8CF.ext,\\server000005\F$\DICE\0021\
File "B" has ~2Million lines, and it looks like this:
0000294A3F3997slredA9D7ADBEE0C0CDE67C100001245C.nlo,\\server000002\F$\RESTORE_DICE\DICE\0083\
00003FFF21F5DAslred8F20FCF0A5CEE9920A4A00016835.nlo,\\server000002\F$\RESTORE_DICE\DICE\0029\
00005B1FFB996Fslred065F708695ADDD987AF9002139AD.nlo,\\server000002\F$\RESTORE_DICE\DICE\0157\
00005CF3C87456slred41FDB077914EB04FFA2B001F9D74.nlo,\\server000002\F$\RESTORE_DICE\DICE\0103\
00006BD33D737FslredD717F08A20F44F2B878500011050.nlo,\\server000002\F$\RESTORE_DICE\DICE\0094\
00008254F4D661slred6C05CFC91D9BCB82EDD800077FFA.nlo,\\server000002\F$\RESTORE_DICE\DICE\0082\
000092391392E3slredB744E98697FA39CEEDCD0004FB66.nlo,\\server000002\F$\RESTORE_DICE\DICE\0032\
0000945EDBB916slredAB08CD7AA8B825E1F55C0000FDC0.nlo,\\server000002\F$\RESTORE_DICE\DICE\0093\
0000C3793C4CD6slredE44618D4E0A2C50397460011D036.nlo,\\server000002\F$\RESTORE_DICE\DICE\0146\
0000D0DA56260DslredF30BCC9CDFF2A4556A7500039400.nlo,\\server000002\F$\RESTORE_DICE\DICE\0054\
In both cases, column 1 is a filename and column 2 is its path.
What I need:
For each line in file A, grab the filename, look it up in file B, and compose a new line, to be printed into a new file "C", consisting of the following three fields:
Column 1: Path for the file as shown in file B.
Column 2: Path for the file as shown in file A.
Column 3: File name.
I have tried to keep this short, as I have been "accused" of verbal diarrhea in previous questions, but feel free to let me know if I am missing important details here.
Just so you know, I have working batch and ksh code for this that works fine with smaller files (fewer lines), though neither does the job in this case. (The CMD batch one only processes a couple thousand lines an hour, while the ksh one refuses to even read such files due to memory limitations.)
Thank you guys for being always there!
Martín.
It's like 3 lines and a couple of temporary files if you don't have access to ksh93 (I think AIX uses ksh88, not ksh93?), bash, zsh, or another shell that understands <(command) redirection...
$ sort -t, -k1,1 filea > sorted_filea
$ sort -t, -k1,1 fileb > sorted_fileb
$ join -t, -j1 -o '2.2 1.2 0' sorted_filea sorted_fileb > filec
(fewer if one or both of the files are already sorted on filename.)
If you do have one of those shells:
$ join -t, -j1 -o '2.2 1.2 0' <(sort -t, -k1,1 filea) <(sort -t, -k1,1 fileb) > filec
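If join is unavailable or the sort step turns out to be too slow, awk can do the lookup in a single pass instead; a sketch (it holds the smaller file A, ~170K lines, in memory and streams file B):
awk -F, -v OFS=, '
NR == FNR { patha[$1] = $2; next }       # file A: remember each path, keyed by filename
$1 in patha { print $2, patha[$1], $1 }  # file B: emit B-path, A-path, filename
' filea fileb > filec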
I think awk will be the solution to my problem. My tools are limited b/c I'm using busybox on ESXi 4.0u1. I have a log file from a VM backup program (ghettoVCB). I need to scan this file for the expression
"Failed to clone disk : There is not enough space on the file system for the selected operation"
In my file, this is around line '43'. The previous field (in AWK vocab) represents the VM name that I want to print to an output text file. In my example the VM name is TEST12-RH4-AtlassianTest.
awk 'BEGIN { RS="" }
/There is not enough space/ {
    print $17
}'
print $17 is hard-coded, and I don't want this. I want to find the field that is one less than the first field on the line returned by the regex above. Any suggestions are appreciated.
[Awk Input File]
Update (Optimized version)
awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
Proof of Concept
$ awk 'NR==1{print $NF}' RS="Failed to clone" input-awk.txt
TEST12-RH4-AtlassianTest
Update 2 (Uber optimized version)
Technically, the following would be the uber optimized version but it leaves too much chance for false hits on the record separator, although it works for your sample input.
awk 'NR<2{print $NF}' RS="Fa" input-awk.txt
Update 3 (Ultimate mega-kill optimized version)
I wouldn't use this in production code, but it just goes to show you there is always a way to make it simpler. If somebody can beat this for code golf purposes, I'd certainly like to see it!
awk '!a++,$0=$NF' RS="Fa" input-awk.txt
Original
Assuming your VM name is always the last field in the record you want to print, this works:
awk '/not enough space/{split(pre,a);print a[pNF]}{pre=$0;pNF=NF}' input-awk.txt
So couldn't you use something like
awk '{if (/not enough space/) print foo; foo=$17}' input-awk.txt
I currently have a request to build a shell script to get some data from a table using SQL (Oracle). The query I'm running returns a number of rows. Is there a way to use something like a result set?
Currently, I'm redirecting the output to a file, but I'm not able to reuse the data for further processing.
Edit: Thanks for the reply Gene. The result file looks like:
UNIX_PID 37165
----------
PARTNER_ID prad
--------------------------------------------------------------------------------
XML_FILE
--------------------------------------------------------------------------------
/mnt/publish/gbl/backup/pradeep1/27241-20090722/kumarelec2.xml
pradeep1
/mnt/soar_publish/gbl/backup/pradeep1/11089-20090723/dataonly.xml
UNIX_PID 27654
----------
PARTNER_ID swam
--------------------------------------------------------------------------------
XML_FILE
--------------------------------------------------------------------------------
smariswam2
/mnt/publish/gbl/backup/smariswam2/10235-20090929/swam2.xml
There are multiple rows like this. My requirement is to write this program using only shell script.
I need to take each of the pid and check if the process is running, which I can take care of.
My question is how do I check for each PID so I can loop and get corresponding partner_id and the xml_file name? Since it is a file, how can I get the exact corresponding values?
Your question is pretty short on specifics (a sample of the file to which you've redirected your query output would be helpful, as well as some idea of what you actually want to do with the data), but as a general approach, once you have your query results in a file, why not use the power of your scripting language of choice (ruby and perl are both good choices) to parse the file and act on each row?
Here is one suggested approach. It wasn't clear from the sample you posted, so I am assuming that this is actually what your sample file looks like:
UNIX_PID 37165 PARTNER_ID prad XML_FILE /mnt/publish/gbl/backup/pradeep1/27241-20090722/kumarelec2.xml pradeep1 /mnt/soar_publish/gbl/backup/pradeep1/11089-20090723/dataonly.xml
UNIX_PID 27654 PARTNER_ID swam XML_FILE smariswam2 /mnt/publish/gbl/backup/smariswam2/10235-20090929/swam2.xml
I am also assuming that:
There is a line-feed at the end of
the last line of your file.
The columns are separated by a single
space.
Here is a suggested bash script (not optimal, I'm sure, but functional):
#! /bin/bash
cat myOutputData.txt |
while read line;
do
myPID=`echo $line | awk '{print $2}'`
isRunning=`ps -p $myPID | grep $myPID`
if [ -n "$isRunning" ]
then
echo "PARTNER_ID `echo $line | awk '{print $4}'`"
echo "XML_FILE `echo $line | awk '{print $6}'`"
fi
done
The script iterates through every line (row) of the input file. It uses awk to extract column 2 (the PID), and then does a check (using ps -p) to see if the process is running. If it is, it uses awk again to pull out and echo two fields from the file (PARTNER ID and XML FILE). You should be able to adapt the script further to suit your needs. Read up on awk if you want to use different column delimiters or do additional text processing.
Things get a little more tricky if the output file contains one row for each data element (as you indicated). A good approach here is to use a simple state mechanism within the script and "remember" whether or not the most recently seen PID is running. If it is, then any data elements that appear before the next PID should be printed out. Here is a commented script to do just that with a file of the format you provided. Note that you must have a line-feed at the end of the last line of input data or the last line will be dropped.
#! /bin/bash
cat myOutputData.txt |
while read line;
do
# Extract the first (myKey) and second (myValue) words from the input line
myKey=`echo $line | awk '{print $1}'`
myValue=`echo $line | awk '{print $2}'`
# Take action based on the type of line this is
case "$myKey" in
"UNIX_PID")
# Determine whether the specified PID is running
isRunning=`ps -p $myValue | grep $myValue`
;;
"PARTNER_ID")
# Print the specified partner ID if the PID is running
if [ -n "$isRunning" ]
then
echo "PARTNER_ID $myValue"
fi
;;
*)
# Check to see if this line represents a file name, and print it
# if the PID is running
inputLineLength=${#line}
if (( $inputLineLength > 0 )) && [ "$line" != "XML_FILE" ] && [ -n "$isRunning" ]
then
isHyphens=`expr "$line" : -`
if [ "$isHyphens" -ne "1" ]
then
echo "XML_FILE $line"
fi
fi
;;
esac
done
I think that we are well into custom software development territory now so I will leave it at that. You should have enough here to customize the script to your liking. Good luck!
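For what it's worth, the same state machine can be written more compactly in awk; a sketch, assuming the report format shown above:
awk '
$1 == "UNIX_PID" {
    # remember whether this PID is alive; ps -p exits non-zero when it is not
    running = (system("ps -p " $2 " > /dev/null 2>&1") == 0)
    next
}
$1 == "PARTNER_ID" { if (running) print "PARTNER_ID " $2; next }
/^-+$/ || $0 == "XML_FILE" || NF == 0 { next }   # skip separators, headers, blanks
{ if (running) print "XML_FILE " $0 }
' myOutputData.txt
Like the bash version, this prints the bare names (e.g. pradeep1) as XML_FILE lines, since nothing in the format distinguishes them from paths.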