How to hexdump a few lines of a large file using only line numbers (not bytes)

I am using hexdump to display a file in a readable format. The file is quite large and I only know the line numbers; the lengths of the lines are uncertain (maybe 10 characters, maybe 100).
Is there any option to hexdump just a few lines, say 5 to 10 or 17 to 25?
I have read the man page and a better explanation from here, but I couldn't find my answer.
So please help me.
Thanks.

Is it possible to cut the right lines from the file and pipe them into hexdump? For example,
sed -n '17,25p' myfile | od -x
might be close to what you need if you want to dump lines 17 to 25 of myfile.
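A minimal sketch of the same idea, assuming GNU sed and hexdump's -C canonical hex+ASCII format; the 25q stops sed from reading the rest of the large file once the range has been printed:
# dump lines 17-25 of myfile in hex+ASCII, then stop reading the file
sed -n '17,25p;25q' myfile | hexdump -C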

How to extract the first column from a TSV file?

I have a file containing some data and I want to use only the first column as stdin for my script, but I'm having trouble extracting it.
I tried using this:
awk -F"\t" '{print $1}' inputs.tsv
but it only shows the first letter of the first column. I tried some other things, but they either show the entire file or just the first letter of the first column.
My file looks something like this:
Harry_Potter 1
Lord_of_the_rings 10
Shameless 23
....
You can use cut, which is available on all Unix and Linux systems:
cut -f1 inputs.tsv
You don't need to specify the -d option because tab is the default delimiter. From man cut:
-d delim
Use delim as the field delimiter character instead of the tab character.
As Benjamin has rightly stated, your awk command is indeed correct. The shell passes the literal two characters \t as the argument, and awk interprets them as a tab, while other commands like cut may not.
I'm not sure why you are getting just the first character as the output.
You may want to take a look at this post:
Difference between single and double quotes in Bash
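For what it's worth, both quoting styles hand the same two characters, a backslash and a t, to awk, which then interprets them as a tab. A quick way to see what the shell actually passes (this printf check is just an illustration, not from the original answers):
# %s prints its argument verbatim: both forms output \t unexpanded
printf '%s\n' "\t" '\t'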
Try this (better to rely on a real CSV parser...):
csvcut -c 1 -t file
Check out csvkit.
Output:
Harry_Potter
Lord_of_the_rings
Shameless
Note:
As @RomanPerekhrest said, you should fix your broken sample input (we saw spaces where tabs are expected...).

Using awk on a folder and adding file name to output rows

I should start by thanking you all for all the work you put into the answers on this site. I have spent many hours reading through them but have not found anything fitting my question yet, hence my own post.
I have a folder with multiple subfolders and txt files within those. In column 7 of those files there are gene names (I do genetics for a living :)). These are the strings I am trying to extract. In short, I would like to search the whole folder for any rows within any of the files that contain a particular gene name/string. I have been using grep for this, writing something like:
grep -r GENE . > GENE.txt
Simple, but I need to be able to tweak the search further, and it seems that awk is the way to go.
So I tried using awk. I wrote something like this:
awk '$7 == "GENENAME"' FOLDER/* > GENENAME.txt
This works well (and now I can specify that the string has to be in a particular column, which I cannot do with grep, right?).
However, in contrast to grep, which writes the file name at the start of each row, I now cannot directly see which file each row in my output file comes from (which mostly defeats the point of the search). Adding the name of the origin file somewhere to each row seems like something that should absolutely be doable, but I am not able to figure it out.
The files I am searching within change (or rather get more numerous), but otherwise my search will always be for some specific string in column 7 of the same big folder. How can I get this working?
Thank you in advance,
Elisabet E
You can use FNR (the record number within the current input file) to print the row number and FILENAME to print the file's name; together they tell you which file and which row each matching line came from. For instance:
sample.csv:
aaa 123
bbb 456
aaa 789
command:
awk '$1 =="aaa"{print $0, FNR, FILENAME}' sample.csv
The output is:
aaa 123 1 sample.csv
aaa 789 3 sample.csv
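With more than one input file, FNR restarts at 1 for each file while NR keeps counting across all of them, which is exactly why FNR is the right variable here. For example (other.csv is a hypothetical second file):
awk '$1 == "aaa" {print $0, FNR, FILENAME}' sample.csv other.csv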
Sounds like you're looking for:
awk '$7 == "GENENAME"{print FILENAME, $0}' FOLDER/*
If not then edit your question to clarify with sample input and expected output.
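One caveat, since the question mentions subfolders: a glob like FOLDER/* only matches the top level. A sketch that recurses with find (assuming the data files end in .txt, which is a guess about your layout):
find FOLDER -type f -name '*.txt' -exec awk '$7 == "GENENAME" {print FILENAME, $0}' {} +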

How to print the 'nth + x' lines after a match is found?

I have a file which contains the output below. I want only the lines which contain the actual vm_id values.
I want to match the pattern 'vm_id' and print from the 2nd line after it, and all following lines, until 'rows' is reached.
FILE BEGIN:
vm_id
--------------------------------------
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
6ffac9a9-1b6b-4600-8114-1ca0666951be
47b5e6d1-6ddd-424a-ab08-18ee35b54ebf
cc0e8b36-eba3-4846-af08-67ab72d911fc
1b8c2766-92b7-477a-bc92-797a8cb74271
c37bf1d8-a6b2-4099-9d98-179b4e573c64
(6 rows)
datacenter=
FILE END:
So the resulting output would be;
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
6ffac9a9-1b6b-4600-8114-1ca0666951be
47b5e6d1-6ddd-424a-ab08-18ee35b54ebf
cc0e8b36-eba3-4846-af08-67ab72d911fc
1b8c2766-92b7-477a-bc92-797a8cb74271
c37bf1d8-a6b2-4099-9d98-179b4e573c64
Also, the number of VM IDs will vary; this example has 6 while others could have 3 or 300.
I have tried the following, but they only output the single line that's specified:
awk 'c&&!--c;/vm_id/{c=2}'
and
awk 'c&&!--c;/vm_id/{c=2+1}'
$ awk '/rows/{f=0} f&&(++c>2); /vm_id/{f=1}' file
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
6ffac9a9-1b6b-4600-8114-1ca0666951be
47b5e6d1-6ddd-424a-ab08-18ee35b54ebf
cc0e8b36-eba3-4846-af08-67ab72d911fc
1b8c2766-92b7-477a-bc92-797a8cb74271
c37bf1d8-a6b2-4099-9d98-179b4e573c64
If you wanted that first line of hex(?) printed too, just change the number c is compared against from 2 to 1 (or to 3 or 127 or however many lines you want to skip after hitting the vm_id line):
$ awk '/rows/{f=0} f&&(++c>1); /vm_id/{f=1}' file
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
bf6c4f90-2e71-4253-a7f6-dbe5d666d3a4
6ffac9a9-1b6b-4600-8114-1ca0666951be
47b5e6d1-6ddd-424a-ab08-18ee35b54ebf
cc0e8b36-eba3-4846-af08-67ab72d911fc
1b8c2766-92b7-477a-bc92-797a8cb74271
c37bf1d8-a6b2-4099-9d98-179b4e573c64
What about this:
awk '/vm_id/{p=1;getline;next}/\([0-9]+ rows/{p=0}p'
I'm setting the p flag on vm_id and resetting it on the ([0-9]+ rows) line.
Also, sed comes to mind; the command follows basically the same logic as the awk command above:
sed -n '/vm_id/{n;:a;n;/([0-9]* rows)/!{p;ba}}'
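Broken out with comments (GNU sed, where comments on their own lines are allowed inside a script; file stands in for the sample input above):
sed -n '
/vm_id/ {
  # skip the dashed separator line under the header
  n
  :a
  # read the next line
  n
  # until the "(N rows)" line: print it and loop back to :a
  /([0-9]* rows)/! {
    p
    ba
  }
}
' file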
Another thing: if it is safe to assume that the only GUIDs in your input file are the VM IDs, grep might be the tool of choice:
grep -Eo '([0-9a-f]+-){4}([0-9a-f]+)'
It's not 100% bulletproof in this form, but it should be good enough for most use cases.
Bulletproof would be:
grep -Eoi '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}'
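As written, both grep commands read from standard input; to run one against the sample shown in the question, pass the file name (file here is just a stand-in):
grep -Eoi '[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}' file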

Very long line causing problems

I have a problem with a file containing ~80,000 lines. It is a large file of 23 GB. I have managed to chunk up similar files of that size with the following command:
awk '{fn = NR % 24; print > ("file1_" fn)}' file1
However, this command stalls on this one problem file. The problem file has a very long line of 3 billion characters (the longest lines in the other files are less than 1 billion), and I am guessing this is the problem.
I would like to get rid of this long line and proceed, but this is proving difficult. I thought simply using the following would work:
awk 'length < 1000000000' file1 > file2
However, this is also still running after 3.5 hours. Is there a fast way of going through the file that stops counting the moment the number of characters in a line exceeds, e.g., 1 billion, and moves on to the next line?
Maybe you could try combining the two awk commands into one; it could be faster because it processes your monster file only once. But you have to test.
awk '{fn = NR % 24; if(length< 1000000000) print > ("file1_" fn)}' file1
Try using sed to delete lines longer than a certain number of characters
# delete lines longer than 65 characters
sed '/^.\{65\}/d' file
You can also use a 2-step approach:
# use sed to output the line numbers containing lines
# longer than a certain number of characters
sed -n '/^.\{65\}/=' file
and then use that list to build a skip-list in awk, i.e. if NR equals any of those numbers, skip that line; a sketch of that follows.
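A minimal sketch of that two-step approach, assuming GNU sed and a POSIX awk; skip.txt and file.trimmed are made-up names for illustration:
# step 1: list the numbers of all lines longer than 65 characters
sed -n '/^.\{65\}/=' file > skip.txt
# step 2: the first pass (NR==FNR) loads those numbers into an array;
# the second pass prints only lines whose number is not in the array
awk 'NR==FNR {skip[$1]; next} !(FNR in skip)' skip.txt file > file.trimmed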

How to get a few lines from a .gz compressed file without uncompressing it

How to get the first few lines from a gzipped file?
I tried zcat, but it's throwing an error:
zcat CONN.20111109.0057.gz | head
CONN.20111109.0057.gz.Z: A file or directory in the path name does not exist.
zcat(1) can be supplied by either compress(1) or by gzip(1). On your system, it appears to be compress(1) -- it is looking for a file with a .Z extension.
Switch to gzip -cd in place of zcat and your command should work fine:
gzip -cd CONN.20111109.0057.gz | head
Explanation
-c --stdout --to-stdout
Write output on standard output; keep original files unchanged. If there are several input files, the output consists of a sequence of independently compressed members. To obtain better compression, concatenate all input files before compressing them.
-d --decompress --uncompress
Decompress.
On some systems (e.g., macOS), you need to use gzcat.
On a Mac you need to use < with zcat:
zcat < CONN.20111109.0057.gz | head
If a contiguous range of lines is needed, one option might be:
gunzip -c file.gz | sed -n '5,10p;11q' > subFile
where lines 5 through 10 (both inclusive) of file.gz are extracted into a new subFile; the 11q stops sed from reading any further. For the sed options, refer to the manual.
If every, say, 5th line is required:
gunzip -c file.gz | sed -n '1~5p' > subFile
which prints the 1st line, then every 5th line after it (lines 1, 6, 11, ...; this uses GNU sed's first~step addressing).
If you want to use zcat, this will show the first 10 lines:
zcat your_filename.gz | head
Let's say you want the first 16 lines:
zcat your_filename.gz | head -n 16
This awk snippet will let you show not just the first few lines but any range you specify. It will also add line numbers, which I needed for debugging an error message pointing to a certain line way down in a gzipped file.
gunzip -c file.gz | awk -v from=10 -v to=20 'NR>=from { print NR,$0; if (NR>=to) exit 1}'
Here is the awk snippet used in the one-liner above. In awk, NR is a built-in variable (the number of records read so far), which is usually equivalent to a line number. The from and to variables are picked up from the command line via the -v options.
NR>=from {
print NR,$0;
if (NR>=to)
exit 1
}