split video (avi/h264) on keyframe

split video (avi/h264) on keyframe - indexing

Hallo.
I have a big video file. ffmpeg, tcprobe and other tool say, it is an h264-stream in an AVI-container.
Now i'd like to cut out small chunks form the video.
Problem: The index of the video seam corrupted/destroyed. I kind of fixed this via mplayer -forceidx -saveidx <IndexFile> <BigVideoFile>. The Problem here is, that I'm now stuck with mplayer/mencoder which can use this index file via -loadidx <IndexFile>. I have tried correcting the index like described in man aviindex (mplayer -frames 0 -saveidx mpidx broken.avi ; aviindex -i mpidx -o tcindex ; avimerge -x tcindex -i broken.avi -o fixed.avi), but this didn't fix my video - meaning that most tools i've tested couldn't search in the video file.
Problem: I cut out parts of the video via following command: mencoder -loadidx in.idx -ss 8578 -endpos 20 -oac faac -ovc x264 -sws 9 -lavfopts format=mp4 -x264encopts <LotsOfOpts> -of lavf -vf scale=800:-10,harddup in.avi -o out.mp4. Now here the problem is, that some videos are corrupted at the beginning. I think this is because the fact, that i do not necessarily cut at keyframe.
Questions:
What is the best way to fix the index of an avi "inline" so that every tool can again work as expected with it?
How can i split at the keyframes? Is there an mencoder-option for this?
Are Keyframes coming in a frequency? How to find out this frequency? (So with a bit of math it should be possible to calculate the next keyframe and cut there)
Is ther perhaps some completely other way to split this movie? Doing it by hand is no option, i've to cut out 1000+ chunks ...
Thanks a lot!

https://spreadsheets.google.com/ccc?key=0AjWmZ0umsuZHdHNzZVhuMTkxTHdYbUdCQzF3cE51Snc&hl=en lists the various options available for splitting accurately.

I would attempt to use avidemux to repair the file before doing anything. You may also have better results using an MP4 Based Container than AVI.
As for ensuring your specified intervals are right at the keyframe, I would suggest encoding with FFMPEG with the: -g 1 option before using the split below to ensure every frame is in fact a keyframe. FFMPEG refers to keyframs as GOP or Groups of Pictures instead.
ffmpeg -i input.avi -g 1 -vcodec copy -acodec copy out.avi
Then multiple splits (with FFMPEG) :
ffmpeg -i input.avi -ss 00:00:10 -t 00:00:30 out1.avi -ss 00:00:35 -t 00:00:30 out2.avi
Some more options to try:
x264 Mapping FFMPEG encoding in linux

Related

Error in using nco ncremap to remap one netcdf file to grid of another

I have a data set with multiple netcdf files with the same variables and structure, though the grid shifts in the time series periodically. I am working on simply remapping one file to another. However, the following command when run with the linked data files, 2016090618.nc and 2016090712.nc:
ncremap -d 2016090618.nc -i 2016090712.nc -o outputfile_2016090712.nc
results in the following error:
Input #00: /content/drive/MyDrive/2016090712.nc
Grid(src): /tmp/ncremap_tmp_grd_src.nc.pid198744
Grid(dst): /tmp/ncremap_tmp_grd_dst.nc.pid198744
Map/Wgt : /tmp/ncremap_tmp_map_nco_nco_con.nc.pid198744
ncks: ERROR nco_rgr_wgt() reports frc_out == frac_b contains all zeros
ncremap: ERROR Failed to horizontally regrid. cmd_rgr[0] failed. Debug this:
ncks -O -t 2 --no_tmp_fl --gaa remap_script=ncremap --gaa remap_command="'/usr/bin/ncremap -d 2016090618.nc -i 2016090712.nc -o outputfile_2016090712.nc'" --gaa remap_hostname=e3d132815114 --gaa remap_version=4.9.1 --hdr_pad=10000 --rgr lat_nm_out=lat#lon_nm_out=lon --map="/tmp/ncremap_tmp_map_nco_nco_con.nc.pid198744" "/content/drive/MyDrive/Projects/20220014_CMM3/RDRS_input_data/CaPA_coarse/2016090712.nc" "outputfile_2016090712.nc"
This is being run in Google Colab with nco installed (thus the /content/drive/MyDrive path, and I omitted the exclamation mark from the ncremap command above).
I have tried to unpack the data with the -U flag and looked at the -R argument to no avail.
Incidentally, the cdo command below works fine to remap the file, but results in changes of the variable organization and naming that doesn't work well for my purposes, so I am trying to solve this with nco.
cdo remapbil,2016090618.nc 2016090712.nc outputfile_2016090712.nc

The good news is that newer versions of NCO do not die like you show above, so you might try upgrading to NCO 5.1.4:
zender#spectral:~/Downloads$ ncremap --version
ncremap, the NCO regridder and grid, map, and weight-generator, version 5.1.5-alpha02 "Champignons"
...
zender#spectral:~/Downloads$ ncremap -d 2016090618.nc -i 2016090712.nc -o outputfile_2016090712.nc
Input #00: /Users/zender/Downloads/2016090712.nc
Grid(src): /var/folders/ct/rzzvxlqn4_3f9cr8wgn2pm480000gn/T/ncremap_tmp_grd_src.nc.pid33012
Grid(dst): /var/folders/ct/rzzvxlqn4_3f9cr8wgn2pm480000gn/T/ncremap_tmp_grd_dst.nc.pid33012
Map/Wgt : /var/folders/ct/rzzvxlqn4_3f9cr8wgn2pm480000gn/T/ncremap_tmp_map_nco_nco_con.nc.pid33012
zender#spectral:~/Downloads$
The bad news is that the input files, and thus the output file, all contain NaN values. NCO does not like NaN for reasons described here. So I cannot tell whether it works as intended. BTW, if you want bilinear rather than conservative regridding, then use ncremap --alg_typ=bilinear ....

Issues converting a small Hex value to a Binary value

I am trying to take the contents of a file that has a Hex number and convert that number to Binary and output to a file.
This is what I am trying but not getting the binary value:
xxd -r -p Hex.txt > Binary.txt
The contents of Hex.txt is: ff
I have also tried FF and 0xFF, but would like to just use ff since the device I am pulling the info from has it in that format.
Instead of 11111111 which it should be, I get a y with 2 dots above it.
If I change it to ee, I get an i with 2 dots. It seems to be reading it just fine but according to what I have read on the xxd -r -p command, it is not outputing it in the correct format.
The other ways I have found to convert Hex to Binary have either also not worked or is a pretty big Bash script that seems unnecessary to do what I thought would be a simple task.
This also gives me the y with 2 dots.
$ for i in $(cat Hex.txt) ; do printf "\x$i" ; done > Binary.txt
For some reason almost every solution I find gives me this format instead of a human readable Binary value with 1s and 0s.
Any help is appreciated. I am planning on using this in a script to pull the Relay values from Digital Loggers devices using curl and giving Home Assistant a readable file to record the Relay State. Digital Loggers curl cmd gives the state of all 8 relays at once using Hex instead of being able to pull the status of a specific relay.

If "file.txt" contains:
fe
0a
and you run this:
perl -ane 'printf("%08b\n",hex($_))' file.txt
You'll get this:
11111110
00001010
If you use it a lot, you might want to make a bash function of it in your login profile along these lines - being extremely respectful of spaces and semi-colons that might look unnecessary:
bin(){ perl -ane 'printf("%08b\n",hex($_))' $1 ; }
Then you'll be able to do:
bin file.txt
If you dislike Perl for some reason, you can achieve something similar without it as follows:
tr '[:lower:]' '[:upper:]' < file.txt |
while read h ; do
echo "obase=2; ibase=16; $h" | bc
done

How would you crack this (MD5 HashCat)?

I was given this file:
hashes.txt
experthead:e10adc3949ba59abbe56e057f20f883e
interestec:25f9e794323b453885f5181f1b624d0b
ortspoon:d8578edf8458ce06fbc5bb76a58c5ca4
reallychel:5f4dcc3b5aa765d61d8327deb882cf99
simmson56:96e79218965eb72c92a549dd5a330112
bookma:25d55ad283aa400af464c76d713c07ad
popularkiya7:e99a18c428cb38d5f260853678922e03
eatingcake1994:fcea920f7412b5da7be0cf42b8c93759
heroanhart:7c6a180b36896a0a8c02787eeafb0e4c
edi_tesla89:6c569aabbf7775ef8fc570e228c16b98
liveltekah:3f230640b78d7e71ac5514e57935eb69
blikimore:917eb5e9d6d6bca820922a0c6f7cc28b
johnwick007:f6a0cb102c62879d397b12b62c092c06
flamesbria2001:9b3b269ad0a208090309f091b3aba9db
oranolio:16ced47d3fc931483e24933665cded6d
spuffyffet:1f5c5683982d7c3814d4d9e6d749b21e
moodie:8d763385e0476ae208f21bc63956f748
nabox:defebde7b6ab6f24d5824682a16c3ae4
bandalls:bdda5f03128bcbdfa78d8934529048cf
I thought I had to separate them, for example I put the experthead, interestec, etc. in one file named wordtext.txt and e10adc3949ba59abbe56e057f20f883e, etc in another file called hash.txt.
I then ran this:
hashcat -m 0 -a 0 /Users/myname/Desktop/hash.txt /Users/myname/Desktop/wordtext.txt -O
but I couldn't get anything. And then I googled e10adc3949ba59abbe56e057f20f883e and the output was 123456 so now I don't know how to approach this problem.

Just leave the hashes (erase the plaintext) on the txt file, hashcat will sort them out by itself. What I do is: hashcat.exe -m 0 -a 0 hashFile.txt dict.txt --show

The file appears to be in username:hash format. By default, hashcat assumes that only hashes are in the target file.
You can change this behavior with hashcat's --username option.

You don't need to place the -O at the end. It should work perfectly without it, but you do need hashcat.exe in the beginning.

How can I get the disk IO trace with actual input values?

I want to generate some trace file from disk IO, but the problem is I need the actual input data along with timestamp, logical address and access block size, etc.
I've been trying to solve the problem by using the "blktrace | blkparse" with "iozone" on the ubuntu VirtualBox environment, but it seems not working.
There is an option in blkparse for setting the output format to show the packet data, -f "%P", but it dose not print anything.
below is the command that i use:
$> sudo blktrace -a issue -d /dev/sda -o - | blkparse -i - -o ./temp/blktrace.sda.iozone -f "%-12C\t\t%p\t%d\t%S:%n:%N\t\t%P\n"
$> iozone -w -e -s 16M -f ./mnt/iozone.dummy -i 0
In the printing format "%-12C\t\t%p\t%d\t%S:%n:%N\t\t%P\n", all other things are printed well, but the "%P" is not printed at all.
Is there anyone who knows why the packet data is not displayed?
OR anyone who knows other way to get the disk IO packet data with actual input value?

As far as I know blktrace does not capture the actual data. It just capture the metadata. One way to capture real data is to write your own kernel module. Some students at FIU.edu did that in this paper:
"I/O deduplication: Utilizing content similarity to ..."
I would ask this question in linux-btrace mailing list as well:
http://vger.kernel.org/majordomo-info.html

High-res images from PDFS

I'm working on a project in which I need to extract a TIFF per page from multi-page PDFs. The PDFs contain images only and there is one image per page (I believe they were made on some kind of photocopier/scanner, but haven't confirmed this). The TIFFs are then used to create several other derivative versions of the document so the higher the resolution the better.
I've found two recipes, both with helpful aspects, but neither is ideal. Hoping someone can help me tune one of them, or offer a third option.
Recipe 1, pdfimages and ImageMagick:
First do:
$ pdfimages $MY_PDF.pdf foo"
Which results in several .pbm files (named foo-000.pbm, foo-001.pbm), etc.
Then for each *.pbm do:
$ convert $each -resize 3200x3200\> -quality 100 $new_name.tif
Pro: The resultant TIFFs are a healthy 3300+ pixels on the long dimension, (-resize just serves to normalize everything)
Con: The orientation of the pages is lost, and they come out rotated different directions (they follow logical patterns, so probably they are the orientation in which they were fed to the scanner??).
Recipe 2 Imagemagick solo:
convert +adjoin $MY_PDF.pdf pages.tif
This gives me a TIFF per page (pages-0.tif, pages-1.tif, etc.).
Pro: Orientation stays!
Con: The long dimension of the resultant file is < 800 px, which is too small to be useful, and it looks as though there is some compression applied.
How can I ditch the scaling of the image stream in the PDF, but retain the orientation? Is there some more magick in ImageMagick that I'm missing? Something else entirely?

Sorry for the noise on this old topic, but google took me here as one of the top results and it might take others, so I thought I'd post the solution for the TO's question that I found here: http://robfelty.com/2008/03/11/convert-pdf-to-png-with-imagemagick
In Short: You have to tell ImageMagick at which density it should scan the PDF.
so convert -density 600x600 foo.pdf foo.png will tell ImageMagick to treat the PDF as if it had a 600dpi resolution and thus output much larger PNGs. In my case, the resulting foo.png was sized 5000x6600px. You can optionally add -resize 3000x3000 or whatever size you require and it will be scaled down.
Note that as long as you only have vector images or text in your PDF-files, density might be set as high as needed. If the PDF contains rasterized images, it won't look good if you set it higher than those images' dpi, surprise! :)
Chris

I wanted to share my solution...it may not work for everyone, but since nothing else has come around maybe it will help someone else. I wound up going with the first option in my question, which was to use pdfimages to get large images that were rotated every which way. I then found a way to use OCR and word counts to guess at the orientation, which got me from (estimated) 25% rotated accurately to above 90%.
The flow is as follows:
Use pdfimages (apt-get install poppler-utils) to get a set of pbm
files (not shown below).
For each file:
Make four versions, rotated 0, 90, 180, and 270 degrees (I refer to them as "north", "east", "south", and "west" in my code).
OCR each. The two with the lowest word count are likely the right-side up and upside down versions. This was over 99% accurate in my set of images processed to date.
From the two with the lowest word count, run the OCR output through a spell check. The file with the least spelling errors (i.e. most recognizable words) is likely to be correct. For my set this was about 93% (up from 25%) accurate based on a sample of 500.
YMMV. My files are bitonal and highly textual. The source images are an average of 3300 px on the long side. I can't speak to greyscale or color, or files with a lot of images. Most of my source PDFs are bad scans of old photocopies, so the accuracy might be even better with cleaner files. Using -despeckle during the rotation made no difference and slowed things down considerably (~5×). I chose ocrad for speed and not accuracy since I only need rough numbers and am throwing away the OCR. Re: performance, my nothing-special Linux desktop machine can run the whole script over about 2-3 files/per second.
Here's the implementation in a simple bash script:
#!/bin/bash
# Rotates a pbm file in place.
# Pass a .pbm as the only arg.
file=$1
TMP="/tmp/rotation-calc"
mkdir $TMP
# Dependencies:
# convert: apt-get install imagemagick
# ocrad: sudo apt-get install ocrad
ASPELL="/usr/bin/aspell"
AWK="/usr/bin/awk"
BASENAME="/usr/bin/basename"
CONVERT="/usr/bin/convert"
DIRNAME="/usr/bin/dirname"
HEAD="/usr/bin/head"
OCRAD="/usr/bin/ocrad"
SORT="/usr/bin/sort"
WC="/usr/bin/wc"
# Make copies in all four orientations (the src file is north; copy it to make
# things less confusing)
file_name=$(basename $file)
north_file="$TMP/$file_name-north"
east_file="$TMP/$file_name-east"
south_file="$TMP/$file_name-south"
west_file="$TMP/$file_name-west"
cp $file $north_file
$CONVERT -rotate 90 $file $east_file
$CONVERT -rotate 180 $file $south_file
$CONVERT -rotate 270 $file $west_file
# OCR each (just append ".txt" to the path/name of the image)
north_text="$north_file.txt"
east_text="$east_file.txt"
south_text="$south_file.txt"
west_text="$west_file.txt"
$OCRAD -f -F utf8 $north_file -o $north_text
$OCRAD -f -F utf8 $east_file -o $east_text
$OCRAD -f -F utf8 $south_file -o $south_text
$OCRAD -f -F utf8 $west_file -o $west_text
# Get the word count for each txt file (least 'words' == least whitespace junk
# resulting from vertical lines of text that should be horizontal.)
wc_table="$TMP/wc_table"
echo "$($WC -w $north_text) $north_file" > $wc_table
echo "$($WC -w $east_text) $east_file" >> $wc_table
echo "$($WC -w $south_text) $south_file" >> $wc_table
echo "$($WC -w $west_text) $west_file" >> $wc_table
# Take the bottom two; these are likely right side up and upside down, but
# generally too close to call beyond that.
bottom_two_wc_table="$TMP/bottom_two_wc_table"
$SORT -n $wc_table | $HEAD -2 > $bottom_two_wc_table
# Spellcheck. The lowest number of misspelled words is most likely the
# correct orientation.
misspelled_words_table="$TMP/misspelled_words_table"
while read record; do
txt=$(echo $record | $AWK '{ print $2 }')
misspelled_word_count=$(cat $txt | $ASPELL -l en list | wc -w)
echo "$misspelled_word_count $record" >> $misspelled_words_table
done < $bottom_two_wc_table
# Do the sort, overwrite the input file, save out the text
winner=$($SORT -n $misspelled_words_table | $HEAD -1)
rotated_file=$(echo $winner | $AWK '{ print $4 }')
mv $rotated_file $file
# Clean up.
if [ -d $TMP ]; then
rm -r $TMP
fi

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas