Setting up box metadata for MediaSource to work - html5-video

I'm using mp4box to add h264 to a fragmented DASH container (mp4) and appending the m4s (DASH) media segments. I don't know how to preserve order in the box metadata, and I'm not sure whether it can be edited when using mp4box. Basically, if I look at the segments in this demo:
http://francisshanahan.com/demos/mpeg-dash
there is a lot of information in the m4s files that specifies order: the mfhd atom that holds the sequence number, and the sidx that keeps the earliest presentation time (which is identical to the base media decode time in the tfdt). On top of that, my sampleDuration values are zeroed in the sample entries in the trun (track fragment run).
I have tried to edit my m4s files with mp4parser without luck. Has anybody else taken h264 and built up a MediaSource stream?
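For comparison, here is a minimal MediaSource sketch, assuming a baseline avc1 codec string and made-up segment names; switching the SourceBuffer to 'sequence' mode makes the browser order the fragments by append order instead of the timestamps carried in the tfdt/sidx boxes:

```javascript
// Minimal MediaSource pipeline: init segment first, then the m4s fragments in order.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');
  // 'sequence' mode: playback order follows append order rather than the
  // presentation times baked into the fragments.
  sb.mode = 'sequence';

  const segments = ['init.mp4', 'seg-1.m4s', 'seg-2.m4s', 'seg-3.m4s']; // hypothetical names
  for (const url of segments) {
    const data = await (await fetch(url)).arrayBuffer();
    await new Promise((resolve) => {
      sb.addEventListener('updateend', resolve, { once: true });
      sb.appendBuffer(data);
    });
  }
  mediaSource.endOfStream();
});
```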

Related

Audiobook chapters that don't start at beginning of file

We've implemented a SMAPI service and are attempting to serve up an audiobook. We can select the audiobook and start playback, but we run into issues when we want to move between chapters because our audio files are not split by chapter. Each audiobook is divided into roughly equal-length parts, and we have information on which part and how far into the part each chapter starts.
So we've run into an issue where our getMetadata response is giving back the chapters of the audiobook because that's how we'd like a user to be able to navigate the book, but our getMediaURI responses for each chapter are giving back URLs for the parts the audio files are divided into, and we seem to be unable to start at a specific position in those files.
Our first attempt to resolve the issue was to include positionInformation in our getMediaURI response. That would still leave us with an issue of ending a chapter at the appropriate place, but might allow us to start at the appropriate place. But according to the Sonos docs, you're not meant to include position information for individual audiobook chapters, and it seems to be ignored.
Our second thought, and possibly a better solution, was to use the httpHeaders section of the getMediaURI response to set a Range header for only the section of the file that corresponds to the chapter. But Sonos appears to have issues with us setting a Range header, and seems to either ignore our header or break when we try to play a chapter. We assume this is because Sonos is trying to set its own Range headers.
Our current thought is that we might be able to pass the media URLs through some sort of proxy, adjusting the Sonos Range header by adding an offset to the start and end values based on where the chapter starts in the audio file.
So right now we return <fileUrl> from getMediaURI and Sonos sends a request like this:
<fileUrl>
Range: bytes=100-200
Instead we would return <proxyUrl>?url=<urlEncodedFileUrl>&offset=3000 from getMediaURI. Sonos would send something like this:
<proxyUrl>?url=<urlEncodedFileUrl>&offset=3000
Range: bytes=100-200
And the proxy would redirect to something like this:
<fileUrl>
Range: bytes=3100-3200
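A minimal sketch of such a proxy (Express, Node 18+, with hypothetical url and offset query parameters); note that it streams the upstream response back rather than redirecting, since a redirect alone wouldn't let us rewrite the Range header the player sends:

```javascript
// Hypothetical /proxy endpoint: shifts the incoming Range header by a fixed
// byte offset and streams the upstream response back to the player.
const express = require('express');
const { Readable } = require('stream');
const app = express();

app.get('/proxy', async (req, res) => {
  const fileUrl = req.query.url;
  const offset = parseInt(req.query.offset, 10) || 0;

  // Rewrite "bytes=100-200" to "bytes=3100-3200" when offset is 3000.
  const headers = {};
  const match = /bytes=(\d+)-(\d*)/.exec(req.headers.range || '');
  if (match) {
    const start = parseInt(match[1], 10) + offset;
    const end = match[2] ? parseInt(match[2], 10) + offset : '';
    headers.Range = `bytes=${start}-${end}`;
  }

  const upstream = await fetch(fileUrl, { headers });
  res.status(upstream.status);
  // Note: the upstream Content-Range reflects the shifted offsets; a picky
  // client may need it mapped back before forwarding.
  for (const name of ['content-type', 'content-length', 'content-range', 'accept-ranges']) {
    const value = upstream.headers.get(name);
    if (value) res.setHeader(name, value);
  }
  Readable.fromWeb(upstream.body).pipe(res);
});

app.listen(3000);
```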
Has anyone else dealt with audio files that don't match up one-to-one with their chapters? How did you deal with it?
The simple answer is that Sonos players respect the duration of the file, not the duration expressed in the metadata. You can't get around this with positionInformation or Cloud Queues.
However, the note that you shouldn't use positionInformation for chapters in an audiobook seems incorrect, so I removed it. The Saving and Resuming documentation states that you should include it if a user is resuming listening. You could use this to start playback at a specific position in the audio file. Did you receive an error when you attempted to do this?
Note that you would not be able to stop playback within the file (for example, if a chapter ended before the file ended). The player would play the entire file before stopping. The metadata would also not change until the end of the file. So, for example, if the metadata for the file is "Chapter 2" and chapter 2 ends before the end of the file, the Sonos app would still display "Chapter 2" until the end of the file.
Also note that the reporting APIs have been deprecated. See Add Reporting for the new reporting endpoint that your service should host.

Is it possible to slice a video file blob and then re-encode it server side?

I've been absolutely banging my head on this one and I would like a sanity check: mainly whether what I want to do is even possible, as I am severely constrained by React Native, which has pretty dodgy Blob support.
We all know that video encoding is expensive, so instead of forcing the user to encode with ffmpeg I would like to delegate the whole process to the backend. That's all good, except that sometimes you might want to trim 30s out of a video, and it's pointless to upload 3+ minutes of it.
So I had this idea of slicing the blob of the video file:
// Rough byte offsets derived from the trim times (blob.slice expects integers)
const startOffset = Math.floor((startTime * blobSize) / duration);
const endOffset = Math.floor((endTime * blobSize) / duration);
const slicedBlob = blob.slice(startOffset, endOffset);
// Passing a content type as the third argument to slice() is ignored here
Something like this; the problem is that the file becomes totally unreadable once it reaches the backend.
React Native cannot handle Blob uploads, so they are converted to base64, which works fine for the whole video, but not for the sliced blob.
This happens even if I keep the beginning intact:
const slicedBlob = blob.slice(0, endOffset);
I feel like the reason is that the file becomes an application/octet-stream, which might impact the decoding?
I am at a bit of a loss here, as I cannot understand whether this is a React Native issue with blobs or whether it simply cannot be done.
Thanks for any input.
P.S. I prefer to stick to vanilla Expo without external libraries; I am aware that one exists to handle blobs, but I'm not keen on ejecting or relying on external libraries if possible.
You cannot simply cut off chunks of a file and have it readable on the other side. For example, in an mp4 the video resolution is stored in only one place; if those bytes get removed, the decoder has no idea how to decode the video.
Yes, it is possible to repackage the video client side by rewriting the container and dropping full GOPs, but it would be about 1,000 lines of code for you to write and would be limited to certain codecs and containers.
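To see why an arbitrary byte slice breaks, it can help to walk the top-level boxes of the file. A quick sketch (plain JS, assuming the whole file fits in memory and ignoring 64-bit box sizes) that logs box types and sizes; typically you'll see one ftyp, one moov holding all the decoding metadata, and one large mdat holding the samples:

```javascript
// List the top-level MP4 boxes (4-byte big-endian size + 4-byte ASCII type).
async function listBoxes(blob) {
  const view = new DataView(await blob.arrayBuffer());
  let offset = 0;
  while (offset + 8 <= view.byteLength) {
    const size = view.getUint32(offset);
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7));
    console.log(`${type} @ ${offset}, ${size} bytes`);
    if (size < 8) break; // size 0 or 1 (extended size) not handled in this sketch
    offset += size;
  }
}
```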

Replacing bytes of an uploaded file in Amazon S3

I understand that in order to upload a file to Amazon S3 using Multipart, the instructions are here:
http://docs.aws.amazon.com/AmazonS3/latest/dev/llJavaUploadFile.html
How do I go about replacing the bytes (say, in the range 4-1523) of an uploaded file? Do I need to make use of Multipart Upload to achieve this? Or do I fire a REST call with the range specified in the HTTP header?
Appreciate any advice.
Objects in S3 are immutable.
If it's a small object, you'll need to upload the entire object again.
If it's an object over 5MB in size, then there is a workaround that allows you to "patch" a file, using a modified approach to the multipart upload API.
Background:
As you know, a multipart upload allows you to upload a file in "parts," with minimum part size 5MB and maximum part count 10,000.
However, a multipart "upload" doesn't mean you have to "upload" all the data again if some or all of it already exists in S3 and you can address it.
PUT part/copy allows you to "upload" the individual parts by specifying octet ranges in an existing object. Or more than one object.
Since uploads are atomic, the "existing object" can be the object you're in the process of overwriting, since it remains unharmed and in place until you actually complete the multipart upload.
But there appears to be nothing stopping you from using the copy capability to provide the data for the parts you want to leave the same, avoiding the actual upload, and then using a normal PUT part request to upload the parts that you want to have different content.
So, while not a byte-range patch with granularity down to a single octet, this can be useful for emulating an in-place modification of a large file. Examples of valid "parts" would be replacing a minimum 5 MB chunk on a 5 MB boundary for files smaller than 50 GB, or replacing a minimum 500 MB chunk on a 500 MB boundary for objects up to 5 TB, with minimum part sizes varying between those two extremes, because of the requirement that a multipart upload have no more than 10,000 parts. The catch is that a part must start at an appropriate offset, and you need to replace the whole part.
Michael's answer covers the background of the issue pretty well. Just adding the actual steps to perform to achieve this, in case you're wondering.
List object parts using ListParts
Identify the part that has been modified
Start a multipart upload
Copy the unchanged parts using UploadPartCopy
Upload the modified part
Finish the upload to save the modification
Skip step 2 if you already know which part has to be changed.
Tip: Each part has an ETag, which is the MD5 hash of that part. This can be used to verify whether a particular part has been changed.
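A rough sketch of those steps with the AWS SDK for JavaScript v3, assuming a 15 MB object split into three 5 MB parts where only the middle part changed (bucket, key, and part layout are made up for illustration):

```javascript
import {
  S3Client,
  CreateMultipartUploadCommand,
  UploadPartCopyCommand,
  UploadPartCommand,
  CompleteMultipartUploadCommand,
} from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });
const Bucket = 'my-bucket';     // hypothetical
const Key = 'large-object.bin'; // hypothetical
const PART = 5 * 1024 * 1024;   // 5 MB part size

// 1. Start a multipart upload that will overwrite the object in place.
const { UploadId } = await s3.send(new CreateMultipartUploadCommand({ Bucket, Key }));

// 2. "Upload" the unchanged parts by copying byte ranges from the existing object.
const copy = async (PartNumber, start, end) => {
  const { CopyPartResult } = await s3.send(new UploadPartCopyCommand({
    Bucket, Key, UploadId, PartNumber,
    CopySource: `${Bucket}/${Key}`,
    CopySourceRange: `bytes=${start}-${end}`,
  }));
  return { PartNumber, ETag: CopyPartResult.ETag };
};
const part1 = await copy(1, 0, PART - 1);
const part3 = await copy(3, 2 * PART, 3 * PART - 1);

// 3. Upload the part whose content actually changed.
const { ETag } = await s3.send(new UploadPartCommand({
  Bucket, Key, UploadId, PartNumber: 2,
  Body: newMiddlePartBuffer, // your new 5 MB chunk (not defined here)
}));
const part2 = { PartNumber: 2, ETag };

// 4. Complete the upload; the object is replaced atomically.
await s3.send(new CompleteMultipartUploadCommand({
  Bucket, Key, UploadId,
  MultipartUpload: { Parts: [part1, part2, part3] },
}));
```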

Watson speech to text in Node-RED crashes Node-RED application

I am using Node-RED and I am trying to store text-to-speech data in a Cloudant database. That works fine, and I can also get it out in msg.payload.speech, but when I feed it into Speech To Text, my whole app crashes with this error:
ERR Dropped log message: message too long (>64K without a newline)
so it seems that the Speech To Text node cannot handle large messages. It also seems that Text to Speech makes a very long string regardless of what you inject. One word or a whole paragraph does not make any difference.
Is there a way to get around this issue in Node-RED?
What happens if you split the audio that you feed to the STT service into smaller chunks? Does that work? How much audio are you trying to feed?
If you give us more details about what you are trying to accomplish then we should be able to help.
Can you also explain the problem that you are experiencing with TTS, what do you mean with "Text to Speech makes a very long string regardless of what you inject"?
thank you
Thanks for your reaction.
What I basically want to do is use the S2T node in Node-RED. I have put a .wav file in a Cloudant database, and when I feed this .wav file into the S2T node, the app crashes. I used several ways to get the speech into the database: 1. via the Text to Speech node, 2. by adding the .wav file to the database manually.
When I look in Cloudant, it is one long line of characters, so I placed the wave file across different lines, which did not help. Then I split the wave file into smaller chunks; this did not work either, probably because the wave file loses its structure.
The next thing I tried was a flac file, which is also supported by T2S and S2T and is compressed (roughly a factor of 10), so it would be less than 64K. But I got the message that only wav files are supported. Then I looked in the code of the S2T node and found out that only wav is supported (the Watson S2T service in Bluemix supports more audio formats).

How do I know the duration of a sound in Web Audio API

It's basically all in the title: how do I know, or how can I access, the duration of a sound node in the Web Audio API? I was expecting something like source.buffer.size or source.buffer.duration to be available.
The only alternative I can think of, in case this is not possible to accomplish, is to read the file metadata.
Assuming that you are loading audio, when you decode it with context.decodeAudioData the resulting AudioBuffer has a .duration property. This is the same buffer you use to create the source node.
You can look at the SoundJS implementation, although there are easier to follow tutorials out there too.
Hope that helps.
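A small sketch of that approach, assuming a hypothetical sound.mp3 URL:

```javascript
const context = new AudioContext();

async function logDuration(url) {
  // Fetch and decode; the decoded AudioBuffer carries the duration in seconds.
  const response = await fetch(url);
  const audioBuffer = await context.decodeAudioData(await response.arrayBuffer());
  console.log(`duration: ${audioBuffer.duration} seconds`);

  // The same buffer is what you assign to the source node for playback.
  const source = context.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(context.destination);
  source.start();
}

logDuration('sound.mp3'); // hypothetical URL
```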
Good and bad news; nodes do not have length - but I bet you can achieve what you want another way.
Audio nodes are sound processors or generators. The duration of a sound processed or made by a node can change; to the computer it is all the same, since a lack of sound is just a buffer full of zeroes instead of other values.
So, if you needed to dynamically determine the duration of a sound, you could write a node which timed 'silences' in the input audio it received.
However, I suspect from your mention of metadata that you are loading existing sounds, in which case you should indeed use their metadata, or determine the length by loading them into an audio element and reading the duration from that.
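For the audio element route, a quick sketch (hypothetical sound.mp3 URL) that reads .duration once the metadata has loaded:

```javascript
// The duration is available as soon as the metadata has been parsed,
// before the full file has downloaded.
const audio = new Audio('sound.mp3'); // hypothetical URL
audio.addEventListener('loadedmetadata', () => {
  console.log(`duration: ${audio.duration} seconds`);
});
```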