How to create mfra box for ismv file if it is not present? - live-streaming

I am working on Live Smooth Streaming, which creates an ismv file.
I want to copy this ismv file to another location.
But because streaming is still in progress, the copied file is corrupted.
I tested this file using Mp4Explorer and compared it with other ismv files.
I found that the copied ismv file lacks the mfra atom.
Please suggest how to add an mfra atom to the ismv file so that the copied file can be played.

If you're live streaming, I would not expect the Movie Fragment Random Access (mfra) box to be present. I believe it is only used server side to allow for easy fragment extraction and is never passed across to the player. You'll also see the mfra in any mp4 files you have on your local disk.
What you are likely capturing is the individual fragments of the stream. These are made from the Movie Fragment (moof) and Media data (mdat). The moof has two other atoms inside, Movie Fragment Header (mfhd) and the Track Fragment (traf).
If you are trying to reassemble the file from fragments, you will need to do this (conceptually):
[ftyp][moov][fragment][fragment]...[fragment][mfra]
Where each [] is a box that may contain other boxes. See Alex Zambelli's blog for some good info. You will need to create the [ftyp][moov] and the [mfra] to properly represent all of your other fragments.
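To check which of these top-level boxes a copied file actually contains, something like the following Python sketch could be used. It assumes the usual ISO base media box header (a 32-bit big-endian size followed by a four-character type, with a 64-bit size when the size field is 1). The helper names are made up for illustration, and the sketch only lists boxes; it does not build an mfra.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in the buffer."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            size = struct.unpack_from(">Q", data, offset + 8)[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the file.
            size = len(data) - offset
        if size < header:
            break  # malformed header, stop rather than loop forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def has_mfra(data: bytes) -> bool:
    """Return True if an mfra box is present at the top level."""
    return any(box_type == "mfra" for box_type, _, _ in iter_top_level_boxes(data))


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a toy box (hypothetical helper, used only for the demo below)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


if __name__ == "__main__":
    sample = (make_box(b"ftyp", b"isml") + make_box(b"moov")
              + make_box(b"moof") + make_box(b"mdat", b"\x00" * 4))
    print([t for t, _, _ in iter_top_level_boxes(sample)], has_mfra(sample))
```

For a real file, read it with open(path, "rb").read() and pass the bytes in; a file copied mid-stream may also end in a truncated box, which this loop still reports using the size declared in its header.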

Related

Embedding PDF graphics in PDF output file programmatically

I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (e.g. adding a logo to a laid-out page). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF and render its contents onto a Cairo surface, but I wonder if there is a simpler shortcut (e.g. embed the PDF file as a binary stream, and then point directly to that stream).
If you can go via an external route, such as reading the PDF and rendering it into the existing PDF using Cairo, that would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form XObject", which is a sort of reusable piece of content. Add this to the /XObject entry in the destination page's resources, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new XObject.
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
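As a rough illustration of steps b) and c) only, here is a small Python sketch that picks an unused XObject name and builds the content-stream operators that would paint it. The function names and the "Stamp" prefix are hypothetical, and the hard parts (copying the stamp's resources and wrapping its page as a Form XObject) are not shown.

```python
from typing import Set


def fresh_xobject_name(existing: Set[str], prefix: str = "Stamp") -> str:
    """Return a name not yet used in the destination page's /XObject dictionary."""
    n = 0
    while f"{prefix}{n}" in existing:
        n += 1
    return f"{prefix}{n}"


def stamp_operators(name: str, scale: float = 1.0,
                    tx: float = 0.0, ty: float = 0.0) -> bytes:
    """Build operators that paint the XObject, scaled and translated.

    q/Q save and restore the graphics state, cm sets the transformation
    matrix, and Do paints the named XObject.
    """
    ops = f"q {scale:g} 0 0 {scale:g} {tx:g} {ty:g} cm /{name} Do Q\n"
    return ops.encode("ascii")


if __name__ == "__main__":
    name = fresh_xobject_name({"Im1", "Stamp0"})
    print(name, stamp_operators(name, scale=0.5, tx=72, ty=36))
```

The returned bytes would be appended to the destination page's content stream after the Form XObject has been registered under the chosen name.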

PDF editing directly then deleting edits still leaves pdf corrupted

My PDF looked fine until I edited it, and now it still appears to be corrupted even after I took out my edits. A file diff program is saying that the two files are the same, but only one is displaying the information.
To reproduce:
1) Open PDF and make sure there is stuff inside of it
2) Open PDF in a text editor and add text at the top
3) Open PDF normally and it is empty
4) Delete text added in step 2
5) PDF is still corrupted despite having SAME file contents
This also happens if I literally copy and paste the code from a PDF into a different file and try to open that. It won't open.
Is there any way to add text to a PDF and have it not corrupt?
PDF is a binary format. Even if it looks quite text'ish, it is not text. In particular PDF files usually contain binary data streams, e.g. for images or embedded fonts or compressed arbitrary content. Furthermore, PDFs rely on PDF objects starting at offsets noted in a cross reference table or stream in the file.
Many text editors, though, do not only apply the changes you type in to a document but also do other stuff, like unifying line breaks (DOS CRLF or Unix LF or Mac CR), replacing byte sequences they could not interpret by a special character (e.g. the Unicode REPLACEMENT CHARACTER) or dropping them altogether, etc.
The former (unifying line breaks) moves the data without updating the cross reference information, rendering it useless. If the bytes interpreted as line break characters were actually parts of binary stream data, the stream data also is damaged.
The latter (byte sequence replacement) usually damages contents of streams in the PDF with compressed data or other sensitive binary data beyond repair. Depending on the sequence length, this also moves data and so invalidates cross references.
Thus, using a text editor to edit a PDF usually is a sure way to break a PDF.
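The offset problem can be seen with a toy example. The Python sketch below builds a deliberately tiny, incomplete PDF-like byte string (an assumption for illustration, not a usable document), then checks whether the offset recorded after startxref still points at the xref keyword once line breaks are rewritten or text is added at the top. The function names are made up.

```python
def build_tiny_pdf() -> bytes:
    """Build a toy PDF skeleton whose startxref offset is correct."""
    header = b"%PDF-1.4\n"
    obj1 = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    body = header + obj1
    xref_offset = len(body)
    # Each cross-reference entry is exactly 20 bytes long.
    xref = (b"xref\n0 2\n"
            b"0000000000 65535 f \n"
            + b"%010d 00000 n \n" % len(header))
    trailer = (b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
               b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n")
    return body + xref + trailer


def startxref_is_valid(pdf: bytes) -> bool:
    """Check that the offset after the last 'startxref' points at 'xref'."""
    marker = pdf.rfind(b"startxref")
    if marker < 0:
        return False
    tail = pdf[marker + len(b"startxref"):].split()
    if not tail or not tail[0].isdigit():
        return False
    offset = int(tail[0])
    return pdf[offset:offset + 4] == b"xref"


if __name__ == "__main__":
    original = build_tiny_pdf()
    crlf = original.replace(b"\n", b"\r\n")   # what "unifying line breaks" does
    prefixed = b"some text\n" + original      # text added at the top
    print(startxref_is_valid(original),
          startxref_is_valid(crlf),
          startxref_is_valid(prefixed))
```

In both modified copies the recorded offset no longer lands on the cross reference table, which is the kind of damage described above.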
Is there any way to add text to a PDF and have it not corrupt?
Yes, by using PDF-aware software, e.g. Adobe Acrobat, though there are others as well. If you prefer a programming approach, use a good general purpose PDF library. There are such libraries for many programming platforms.
For a very few types of changes, one can also use a hex editor (only replacing some bytes, not inserting or removing anything), but you really should know what you are doing.
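What "only replacing some bytes" means can be sketched in Python as a replacement that refuses to change the file length. The function name and the sample bytes are hypothetical, and this is only safe for bytes that are stored uncompressed and unencrypted in the file.

```python
def patch_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace one occurrence of old with new without moving any other byte."""
    if len(old) != len(new):
        raise ValueError("replacement must have exactly the same length")
    if data.count(old) != 1:
        raise ValueError("expected exactly one occurrence of the old bytes")
    return data.replace(old, new)


if __name__ == "__main__":
    fragment = b"<< /Title (Draft 1) >>"
    patched = patch_same_length(fragment, b"Draft 1", b"Final 2")
    print(patched, len(patched) == len(fragment))
```

Because the length is unchanged, every offset in the cross reference information stays valid.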

How to use ctags for code documentation

I have some source code that I want to document without touching the code. For every source file (e.g., example.cpp, example.f90, etc.) I would like to have a separate documentation file (e.g., example.cpp.doc, example.f90.doc) that has some metadata (ctag) linking it to the original source file.
Ideally I could open the source file and the documentation file in parallel views in my favorite editor (Vim) and have the two files synced so that they scroll together. In this manner, I can keep my documentation visually in line with the untouched source code.
I know this is likely to be a unique scenario. But I'm hoping someone else has already figured this out.
Is this even a possibility?
Create the initial .doc structure outside of Vim such that the "metadata" you want to keep is in the same line number as the original file.
Then open the two files in different Vim windows with vim -O example.cpp example.cpp.doc. At this point use :windo set scrollbind to enable scroll binding, which lets you navigate in either window while keeping both in sync.
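One possible way to create that initial .doc structure is a small Python script that writes one placeholder line per source line, so that line N of the documentation sits next to line N of the code. The function name and the "N:" placeholder format are assumptions; any per-line marker would do.

```python
from pathlib import Path


def write_doc_skeleton(source: Path) -> Path:
    """Create <source>.doc with exactly as many lines as the source file."""
    with source.open("rb") as handle:
        # Iterating a binary file splits on b"\n", which matches how Vim
        # counts lines in a file with Unix line endings.
        line_count = sum(1 for _ in handle)
    doc = source.with_name(source.name + ".doc")
    doc.write_text(
        "".join(f"{number}:\n" for number in range(1, line_count + 1)),
        encoding="utf-8",
    )
    return doc


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(write_doc_skeleton(Path(name)))
```

Notes then have to stay on their own line: adding or removing lines in the .doc file would shift everything below it out of alignment with the source.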

A Table of Contents Page for a Scanned PDF

I was given some really old but very useful hand-written notes recently and in a bid to preserve them, I had them scanned into a file in the PDF format. What I have is a 35 page PDF but I want to add a contents page at the beginning so that I can use the first page to click my way to a specific topic.
More precisely,
I want a page which says
Topic 1
Topic 2
Topic 3
...
Each one should be linked to a page of my choosing.
I've explored a lot of standard tools out there to help me with this, like LibreOffice, pdftk etc., but the solution does not appear to be in the form of a simple application and a few clicks. My hunch is that this will require a program written in a suitable language. I'd want this program to work as follows:
ProgramName Input.pdf CustomTOC.txt
Where CustomTOC.txt could be a simple ASCII table containing two columns, one column being the title and the second column being the page number. The output of this program will be another PDF file which contains one page appended at the beginning of Input.pdf containing a table of contents with hyperlinks to the right pages.
I have managed to solve this problem, though I don't think this is the best way to do it. I have written a Python program that accepts two mandatory inputs: the input PDF file and a '|'-separated ASCII table containing titles and page numbers. A third, optional argument is the name of the output PDF file. If it is not provided, the original input file is overwritten.
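The original code is not posted, so the following is only a guess at how the '|'-separated table could be read. The function name is hypothetical, and it assumes one "title | page" pair per line with the page number in the last column.

```python
from typing import List, Tuple


def parse_toc(text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form 'Title | page' into (title, page) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last '|' so a title may itself contain the character.
        title, _, page = line.rpartition("|")
        entries.append((title.strip(), int(page)))
    return entries


if __name__ == "__main__":
    print(parse_toc("Topic 1 | 2\nTopic 2 | 10\n"))
```

A line without a '|' or with a non-numeric page raises ValueError, which is probably preferable to silently producing a wrong link.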
How does the code work? It uses a system call to 'pdftk' to burst the PDF file into its constituent pages. It then writes a .tex file containing a \listoffigures command for the first page, with the hyperref package ensuring that each entry links to its figure. The latter part of the .tex code contains one figure insertion statement per page, each inserting the PDF file for that page; captions are given only to those pages that have an entry in the provided TOC table.
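Since the actual program is not shown, here is a hedged Python sketch of what generating such a .tex file might look like. The function name, the pg_0001.pdf style file names (assumed to be what pdftk's burst produces by default), and the exact LaTeX preamble are all assumptions rather than the author's code.

```python
from typing import Dict


def build_tex(page_count: int, toc: Dict[int, str]) -> str:
    """Build LaTeX source: a list of figures, then one figure per burst page."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        r"\usepackage{hyperref}",
        r"\begin{document}",
        r"\listoffigures",
        r"\clearpage",
    ]
    for page in range(1, page_count + 1):
        lines.append(r"\begin{figure}[p]")
        lines.append(r"\includegraphics[width=\textwidth]{pg_%04d.pdf}" % page)
        if page in toc:
            # Only pages listed in the TOC table get a caption, so only
            # they appear in the list of figures.
            lines.append(r"\caption{%s}" % toc[page])
        lines.append(r"\end{figure}")
        lines.append(r"\clearpage")
    lines.append(r"\end{document}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_tex(3, {1: "Topic 1", 3: "Topic 2"}))
```

Titles containing LaTeX special characters (such as % or &) would need escaping before being placed in a caption; the sketch does not handle that.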
Why is the code not ideal? It has too many dependencies: it relies on a system call to pdftk, and it requires LaTeX to be installed on the machine along with the graphics package. In the current version of the code, the PDF on each page has some offset, which I am trying to fix using the geometry package with custom margin settings. I will try to post the code once this problem is solved.
A more ideal solution would not require LaTeX and would use a PDF library within Python to achieve the same effect. Comments and suggestions welcome!

Extract layers from PDF file to HTML

I have a PDF file, containing layers.
For example, on some pages, there are graphs, with additional data displayed on top of that graph, when clicking (layers).
Now I need to try to fetch all these layers out of the PDF file, or to be precise, I need ALL the data from that PDF file, including layers. The PDF file contains JavaScript to show/hide the layers when appropriate.
What is the best approach? Is there any tool that actually works for my intentions? Or should I write something myself? (If this is possible, of course.)
Edit:
Here you can download the PDF file:
http://www.2shared.com/document/IutUfDfr/OR_erasmus.html
The password for viewing is: erasmus
I do not know if there are any tools per se, but if you cannot find any you might do the following:
For each combination of on/off layers that you are interested in, walk all pages and collect the content streams. Tokenize those and cut out the content you do not want to see (the commands you need to monitor to determine this are BDC and EMC). Save the stream again with the clipped content (naturally, save the results in different files). You need something to read the PDF object structure and update some objects (there are lots of libraries for that), plus you need to be able to parse the content streams.
Now you will have a set of PDF files without layers (optional content), for which there are plenty of tools to render to HTML etc.
Note: optional content <--> layer switches in the PDF viewer usually are 1:1 but the standard supports a full n:m mapping. I would concentrate on the real optional content blocks that can be turned on/off to keep things simple.
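The "tokenize and cut" step might look roughly like the Python sketch below. It is a deliberate simplification: it assumes the content stream has already been decompressed to text, splits tokens on whitespace only (so string operands containing spaces, inline images and comments are not handled), and treats a block as optional content when it is opened with /OC /Name BDC. The function name and the MC0/MC1 property names are hypothetical; in a real file those names are resolved through the page's /Properties resources.

```python
from typing import Set


def strip_optional_content(stream: str, hidden: Set[str]) -> str:
    """Drop every '/OC /Name BDC ... EMC' block whose Name is in hidden."""
    out = []
    depth = 0          # nesting depth of all marked-content sections
    skip_from = None   # depth at which the hidden section was opened
    for tok in stream.split():
        if tok in ("BMC", "BDC"):
            depth += 1
            if (skip_from is None and tok == "BDC" and len(out) >= 2
                    and out[-2] == "/OC" and out[-1].lstrip("/") in hidden):
                del out[-2:]        # remove the operands already emitted
                skip_from = depth
                continue
        elif tok == "EMC":
            if skip_from is not None and depth == skip_from:
                skip_from = None    # matching end of the hidden block
                depth -= 1
                continue
            depth -= 1
        if skip_from is None:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    demo = "q /OC /MC0 BDC 0 0 10 10 re f EMC /OC /MC1 BDC 1 0 0 rg EMC Q"
    print(strip_optional_content(demo, {"MC0"}))
```

A real implementation would use a proper content-stream tokenizer from a PDF library; the depth counter is only there so that nested BMC/BDC sections inside a hidden block are removed together with it.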
You can use this tool to extract images and text, even from locked PDFs:
http://download.cnet.com/Able2Extract/3000-2079_4-10249654.html
I use it myself sometimes, and it can also convert to HTML.