Uploading a large video to GCS with a signed URL, but the video is broken - vue.js

I have an application on Google App Engine (GAE), using Vue.js on the frontend and Flask on the backend.
My app lets users upload large videos, which it then analyzes. Since GAE's request size limit is 32MB, I let users upload directly to Google Cloud Storage (GCS) using a signed URL.
The problem I am facing: the user can successfully upload the video to GCS, but on the backend (Flask), when downloading the video for analysis, I get the error below:
*** OSError: MoviePy error: failed to read the duration of file /tmp/source_video.mp4.
Here are the file infos returned by ffmpeg:
ffmpeg version 4.2.2-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 8 (Debian 8.3.0-6)
configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7444500] Format mov,mp4,m4a,3gp,3g2,mj2 detected only with low score of 1, misdetection possible!
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x7444500] moov atom not found
/tmp/source_video.mp4: Invalid data found when processing input
Flask download code:
import datetime

from google.cloud import storage
from moviepy.editor import VideoFileClip


class Analyser:
    def __init__(self):
        self.gcs_bucket = 'bucket_name'
        self.gcs_blob_video = 'videos/'

    def __storage_bucket(self):
        client = storage.Client()
        bucket = client.get_bucket(self.gcs_bucket)
        bucket.cors = [
            {
                "origin": ["*"],
                "responseHeader": ["Access-Control-Allow-Origin"],
                "method": ["PUT", "POST", "GET"],
                "maxAgeSeconds": 3600,
            }
        ]
        bucket.patch()
        return bucket

    def __generate_upload_signed_url(self, bucket, blob_name):
        blob = bucket.blob(blob_name)
        return blob.generate_signed_url(
            version='v4',
            expiration=datetime.timedelta(minutes=15),
            method='PUT',
        )

    def analyze_video(self, pid):
        src_filepath = '/tmp/source_video.mp4'
        bucket = self.__storage_bucket()
        blob = bucket.blob(self.gcs_blob_video + 'filename.mp4')
        blob.download_to_filename(src_filepath)
        # error occurs here
        video = VideoFileClip(src_filepath)
Vue.js upload code:
Note: the upload itself succeeds; the file does appear in GCS.
async uploadVideo(_: any, video: File): Promise<boolean> {
  // the signed URL is returned by Flask
  const signed_url = "https://storage.googleapis.com/bucket_name....";
  const formData = new FormData();
  formData.append('file', video);
  const response_upload_file = await fetch(
    signed_url,
    {
      method: 'PUT',
      body: formData
    }
  );
  return true;
}
I've searched Google quite a bit and still haven't found a solution.
I really don't know whether the file is corrupted during the upload or when I download it for analysis.
I would be very grateful if someone could suggest a solution to this problem.
Thanks for reading.

You are using FormData with the HTTP PUT method, which corrupts the upload: a signed PUT URL expects the raw binary data stream, not MIME multipart data.
If you actually want to use FormData, see the GCS guide "Upload an object with HTML forms", which uses a POST policy document rather than a signed PUT URL.

Related

Agora Cloud Recording doesn't record mixed in audio files

Hi, I have been successfully recording an Agora audio call in which one person speaks in a broadcast role and, during the call, mixes in a number of audio files.
All the audio was being recorded until we upgraded to Flutter 2 and the associated upgraded packages.
Now all that is recorded is the broadcaster's voice, with no mixed-in audio.
The broadcaster and audience members can all hear the mixed-in audio within the call without issue.
The code (Flutter) is similar to this:
// Mix audio into a valid RTC session, with default settings
final playing = await session.playAudioFile(path, () {
  state = MessagePlayerState.STOPPED;
  if (!disposing) {
    whenFinished();
  }
});
The recording options are as follows (my UID is a hardcoded string that is not the same as any participant UIDs):
http.Response response = await http.post(
  Uri.https(AGORA_REST_URL, '$AGORA_REST_API_VERSION/$appId/cloud_recording/resourceid/$resourceId/mode/mix/start'),
  headers: <String, String>{
    HttpHeaders.authorizationHeader: 'Basic $basicAuth',
    HttpHeaders.contentTypeHeader: 'application/json; charset=UTF-8',
  },
  body: jsonEncode(<String, dynamic>{
    'cname': channelName,
    'uid': uid,
    'clientRequest': {
      'recordingConfig': {
        'channelType': 0,
        'streamTypes': 2, // TODO: Should be a streamTypes of 0 (audio only), but get failures.
        'audioProfile': 1,
        'videoStreamType': 0,
        'maxIdleTime': 120,
        'transcodingConfig': {
          'width': 360,
          'height': 640,
          'fps': 30,
          'bitrate': 600,
          'maxResolutionUid': '1',
          'mixedVideoLayout': 1
        },
        'recordingFileConfig': {
          'avFileType': ['hls', 'mp4']
        }
      },
      'storageConfig': {
        'vendor': 1,
        'region': 3,
        'bucket': AWS_RECORDING_BUCKET, // TODO: Env Var
        'accessKey': AWS_BUCKET_ACCESS_KEY,
        'secretKey': AWS_BUCKET_SECRET_KEY,
      }
    },
  }),
);
The m3u8 and ts files are present in the S3 bucket.
Adjusting the metadata tags in S3 results in a file that plays fine in Safari, but no mixed-in audio is heard.
Converting the file to AAC with ffmpeg shows this error:
[hls @ 0x7fd6cc808200] Opening '2838cfc6254e9fec2e3088976f39d7ce_bip_20210618014151427.ts' for reading
[mpegts @ 0x7fd6cc00a600] Packet corrupt (stream = 0, dts = 1437390).
size= 480kB time=00:00:30.69 bitrate= 128.1kbits/s speed=1.49e+03x
video:0kB audio:470kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.093976%
And the result is the same as from the S3 bucket.
Any help or hints appreciated.
This can be closed/ignored. It turns out we had an edge condition that did not show when the app was used normally, but if, for instance, you wanted a very stage-managed recording to show off to others, it broke.

Cannot upload files with ACL public-read to Digital Ocean spaces

I'm trying to upload images to a Digital Ocean space from the browser. These images should be public. I'm able to upload the images successfully.
However, though the ACL is set to public-read, the uploaded files are always private.
I know they're private because a) the dashboard says the permissions are "private", b) the public URLs don't work, and c) manually changing the permissions to "public" in the dashboard fixes everything.
Here's the overall process I'm using:
1. Create a pre-signed URL on the backend
2. Send that URL to the browser
3. Upload the image to that pre-signed URL
Any ideas why the images aren't public?
Code
The following examples are written in TypeScript and use AWS's v3 SDK.
Backend
This generates the pre-signed URL used to upload a file.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const client = new S3Client({
  region: 'nyc3',
  endpoint: 'https://nyc3.digitaloceanspaces.com',
  credentials: {
    accessKeyId: process.env.DIGITAL_OCEAN_SPACES_KEY,
    secretAccessKey: process.env.DIGITAL_OCEAN_SPACES_SECRET,
  },
})

const command = new PutObjectCommand({
  ACL: 'public-read',
  Bucket: 'bucket-name',
  Key: fileName,
  ContentType: mime,
})

const url = await getSignedUrl(client, command)
The pre-signed URL is then sent to the browser.
Frontend
This is the code on the client to actually upload the file to Digital Ocean. file is a File object.
const uploadResponse = await fetch(url, {
  headers: {
    'Content-Type': file.type,
    'Cache-Control': 'public,max-age=31536000,immutable',
  },
  body: file,
  method: 'PUT',
})
Metadata
AWS SDK: 3.8.0
It turns out that for Digital Ocean, you also need to send the public-read ACL as a header in the PUT request.
// front-end
const uploadResponse = await fetch(url, {
  headers: {
    'Content-Type': file.type,
    'Cache-Control': 'public,max-age=31536000,immutable',
    'x-amz-acl': 'public-read', // add this line
  },
  body: file,
  method: 'PUT',
})
I don't have the reputation to comment, hence adding a response. Thank you @Nick; this is one of the few working code examples I have seen for DigitalOcean pre-signed URLs. While the official DigitalOcean description here mentions that Content-Type is needed when uploading with pre-signed URLs, there is no example code.
Another mistake that prevented me from uploading a file using pre-signed URLs in DigitalOcean was using 'Content-Type': 'multipart/form-data' together with FormData().
After seeing this post, I followed @Nick's suggestion of using a File() object and 'Content-Type': '<relevant_mime>'. Then the file upload worked like a charm. This is also not covered in the official docs.
Try this to force the ACL to public in Digital Ocean Spaces:
s3cmd --access_key=YOUR_ACCESS_KEY --secret_key=YOUR_SECRET_KEY --host=YOUR_BUCKET_REGION.digitaloceanspaces.com --host-bucket=YOUR_BUCKET_NAME.YOUR_BUCKET_REGION.digitaloceanspaces.com --region=YOUR_BUCKET_REGION setacl s3://YOUR_BUCKET_NAME --acl-public

How to get a pre-signed URL that downloads a file with HTTP compression

Here is my code in Node.js:
const downloadURL = await s3.getSignedUrlPromise('getObject', {
  Bucket: BUCKET_NAME,
  Key: 'key to a large json file',
});
Once I have the URL, I want to download a very large JSON file stored in S3 from the browser. Since it is large, I would like to use HTTP compression, which would compress a 20MB JSON file to less than 1MB. I could not find anywhere how to do this, or whether it is possible at all with the S3 APIs.
I also tried the following when using the signed URL to download the file, and it does not seem to work:
const dataRes = await fetch(downloadURL, {
  headers: {
    'Accept-Encoding': 'gzip, deflate',
  },
  method: 'GET',
});
Hope somebody could help me out. Thanks a lot!
After doing some study, I have resolved this. Posting here in the hope it helps others.
You cannot ask S3 to compress a file on the fly in getObject, or when using a signed URL for getObject.
You have to store the gzipped file in S3 in the first place. On Linux, compress it with the command below:
gzip -9 <file to compress>
Then upload the zipped file to S3.
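One detail worth adding at the upload step: setting the object's Content-Encoding metadata to gzip lets clients decompress transparently. A sketch of the parameters for s3.putObject (AWS SDK v2, matching getSignedUrlPromise above); the gzippedPutParams helper is illustrative and compresses in-process with zlib, equivalent to running gzip -9 on the file:

```typescript
import { gzipSync } from 'zlib';

// Build the parameter object you would pass to s3.putObject(...).promise().
// Returning plain params keeps the sketch independent of the SDK client.
function gzippedPutParams(bucket: string, key: string, json: string) {
  return {
    Bucket: bucket,
    Key: key,
    Body: gzipSync(json, { level: 9 }), // same compression level as `gzip -9`
    ContentEncoding: 'gzip',            // stored as object metadata
    ContentType: 'application/json',
  };
}
```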
Use the code below to generate the signed URL:
const downloadURL = await s3.getSignedUrlPromise('getObject', {
  Bucket: BUCKET_NAME,
  Key: 'key to a large zipped json file',
  ResponseContentEncoding: 'gzip',
  ResponseContentType: 'application/json',
});
Use the code below to download from the signed URL:
const res = await fetch(downloadURL);
const jsonData = await res.json();

Google cloud speech syncrecognize "INVALID_ARGUMENT"

I have worked through the "overview" tutorial: https://cloud.google.com/speech/docs/getting-started
Then I tried to use my own audio file. I uploaded a .flac file with a sample rate of 16000Hz.
I only changed the sync-request.json file below to point to my own audio file hosted on Google Cloud Storage (gs://my-bucket/test4.flac):
{
  "config": {
    "encoding": "flac",
    "sample_rate": 16000
  },
  "audio": {
    "uri": "gs://my-bucket/test4.flac"
  }
}
The file itself is recognized correctly, but the request returns an "INVALID_ARGUMENT" error:
{
  "error": {
    "code": 400,
    "message": "Unable to recognize speech, code=-73541, possible error in recognition config. Please correct the config and retry the request.",
    "status": "INVALID_ARGUMENT"
  }
}
As per this answer, all encodings support only single-channel (mono) audio.
I was creating the FLAC file with this command:
ffmpeg -i test.mp3 test.flac
which produced the error: Sample rate in request does not match FLAC header
Adding -ac 1 (setting the number of audio channels to 1) fixed the issue:
ffmpeg -i test.mp3 -ac 1 test.flac
Here is my full Node.js code
const Speech = require('#google-cloud/speech');
const projectId = 'EnterProjectIdGeneratedByGoogle';
const speechClient = Speech({
projectId: projectId
});
// The name of the audio file to transcribe
var fileName = '/home/user/Documents/test/test.flac';
// The audio file's encoding and sample rate
const options = {
encoding: 'FLAC',
sampleRate: 44100
};
// Detects speech in the audio file
speechClient.recognize(fileName, options)
.then((results) => {
const transcription = results[0];
console.log(`Transcription: ${transcription}`);
}, function(err) {
console.log(err);
});
The sample rate can be 16000, 44100, or other valid values, and the encoding can be FLAC or LINEAR16. See the Cloud Speech docs.
My bad: as per the docs at https://cloud.google.com/speech/docs/basics, the .flac file has to be 16-bit PCM.
Summary:
Encoding: FLAC
Channels: 1 @ 16-bit
Sample rate: 16000Hz
Note: take care not to export a stereo (2-channel) file, which throws a different error (only one channel is accepted): Google speech API internal server error -83104
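Putting the summary together, a sync-request.json that satisfies all three constraints would look like the following (same placeholder bucket URI as in the question; the encoding is written as "FLAC" per the answer above, and the referenced file must itself be mono, 16-bit PCM FLAC at 16000Hz):

```json
{
  "config": {
    "encoding": "FLAC",
    "sample_rate": 16000
  },
  "audio": {
    "uri": "gs://my-bucket/test4.flac"
  }
}
```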

Obj-C MongoDB Driver to Meteor CollectionFS

I have written an Objective-C app that can write files to my Meteor MongoDB database. Using RadMongoDB (https://github.com/timburks/RadMongoDB), I write an image over to my Mongo GridFS .files and .chunks collections.
// Defining RadMongoDB
RadMongoDB *rad = [[RadMongoDB alloc] init];

// Connection dictionary
NSDictionary *connection = @{
  @"host" : @"127.0.0.1",
  @"port" : [NSNumber numberWithInt:3002]
};
int num = [rad connectWithOptions:connection];
[rad writeFile:path2 withMIMEType:@"image/png" inCollection:@"contacts" inDatabase:@"meteor"];
The image (path2) is successfully written to GridFS. In my Meteor MongoDB shell I can see the files successfully written in both the .chunks and .files collections (screenshots omitted).
These GridFS files are linked to a CollectionFS (https://github.com/CollectionFS/Meteor-CollectionFS) collection containing a variety of other pictures that have been inserted via a Meteor app. The problem is pulling out the image written by the driver using CollectionFS. It is apparent that files written to GridFS by the driver do not get processed by the file handlers. I therefore tried re-forcing all of the files through (a CollectionFS file-handler reset), but that still didn't work. (JavaScript below; note that ContactsFS is the CollectionFS collection corresponding to the GridFS contacts collection.)
// Reset
ContactsFS.find({}).forEach(function(doc) {
  ContactsFS.update({ _id: doc._id }, { $set: { handledAt: null, fileHandler: {} } });
});

// Set complete to true
ContactsFS.update(fileRecord, { $set: { complete: true } });
I have come to the conclusion that the way the driver interacts with GridFS is very different from how Meteor and CollectionFS read and write to it. Is there any way to fix this? I am desperate for help, thanks!
EDIT:
After setting the uploaded file's complete = true, the file handler attempts to act on the driver-inserted file. However, I now receive a server-side error (screenshot omitted).
I believe this is because of how CollectionFS reads the GridFS file. The GridFS image's data is stored by the Obj-C driver as a Uint8Array (as seen in screenshot 1). I have tried setting each parameter on the Obj-C driver image so that CollectionFS will be happy:
ContactsFS.update(imageid, {$set: {handledAt: null}});
ContactsFS.update(imageid, {$set: {uploadDate: date}});
ContactsFS.update(imageid, {$set: {countChunks: 1}});
ContactsFS.update(imageid, {$set: {numChunks: 1}});
ContactsFS.update(imageid, {$set: {length: len}});
ContactsFS.update(imageid, {$set: {uploading: false}});
ContactsFS.update(imageid, {$set: {encoding: encode}});
// Setting complete to true will send the file through the file handlers
ContactsFS.update(imageid, {$set: {complete: true}});
Still nothing. How do I get around this problem?
Try this:
var len = "" + fileRecord.plength;
var chunkSize = 256 * 1024; // set this to whatever chunk size RadMongoDB is using
var chunkCount = Math.ceil(fileRecord.plength / chunkSize);

ContactsFS.update(imageid, {$set: {
  handledAt: null,
  uploadDate: Date.now(),
  countChunks: chunkCount,
  numChunks: chunkCount,
  length: len,
  uploading: false,
  encoding: 'binary'
}});
ContactsFS.update(imageid, {$set: {complete: true}});
Requires the fix discussed in this issue, too.