I have worked through the "overview tutorial": https://cloud.google.com/speech/docs/getting-started
Then I tried to use my own audio file. I uploaded a .flac file with a sample rate of 16000 Hz.
I only changed the sync-request.json file below to point at my own audio file hosted on Google Cloud Storage (gs://my-bucket/test4.flac):
{
  "config": {
    "encoding": "flac",
    "sample_rate": 16000
  },
  "audio": {
    "uri": "gs://my-bucket/test4.flac"
  }
}
The file itself is found, but the request returns an "INVALID_ARGUMENT" error:
{
  "error": {
    "code": 400,
    "message": "Unable to recognize speech, code=-73541, possible error in recognition config. Please correct the config and retry the request.",
    "status": "INVALID_ARGUMENT"
  }
}
As per this answer, all encodings support only 1-channel (mono) audio.
I was creating the FLAC file with this command:
ffmpeg -i test.mp3 test.flac
The request then failed with "Sample rate in request does not match FLAC header".
But adding -ac 1 (setting the number of audio channels to 1) fixed this issue:
ffmpeg -i test.mp3 -ac 1 test.flac
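If you also need to force the 16 kHz / 16-bit format mentioned further down (these extra flags are standard ffmpeg options, not something from the original answer), a conversion along these lines should work:
ffmpeg -i test.mp3 -ac 1 -ar 16000 -sample_fmt s16 test.flac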
Here is my full Node.js code
const Speech = require('@google-cloud/speech');
const projectId = 'EnterProjectIdGeneratedByGoogle';
const speechClient = Speech({
  projectId: projectId
});
// The name of the audio file to transcribe
var fileName = '/home/user/Documents/test/test.flac';
// The audio file's encoding and sample rate
const options = {
  encoding: 'FLAC',
  sampleRate: 44100
};
// Detects speech in the audio file
speechClient.recognize(fileName, options)
  .then((results) => {
    const transcription = results[0];
    console.log(`Transcription: ${transcription}`);
  }, function(err) {
    console.log(err);
  });
The sample rate can be 16000, 44100, or other valid values, and the encoding can be FLAC or LINEAR16. See the Cloud Speech docs.
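For the 16000 Hz FLAC from the original question, only the options object would change (a sketch assuming the rest of the snippet above stays the same):
const options = {
  encoding: 'FLAC',
  sampleRate: 16000
};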
My bad: as the docs say (https://cloud.google.com/speech/docs/basics), the .flac file has to be 16-bit PCM.
Summary:
Encoding: FLAC
Channels: 1
Bit depth: 16-bit
Sample rate: 16000 Hz
/!\ Pay attention not to export a stereo (2-channel) file, which throws another error (only one channel is accepted): Google Speech API internal server error -83104.
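To verify that a file actually matches these constraints before uploading it, ffprobe (shipped with ffmpeg, so assuming it is installed) prints the relevant stream details:
ffprobe -hide_banner -select_streams a -show_streams test4.flac
Look for channels=1, sample_rate=16000 and sample_fmt=s16 in the output.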
Related
I want to upload a video using the Instagram Graph API, but the API asks for a URL (and not a local path) for the video parameter. The video I want to upload is on my desktop (local), so I don't know how to make this work.
When I put a local path instead of the URL, I get the following message:
{
  "error": {
    "message": "The video file you selected is in a format that we don't support.",
    "type": "OAuthException",
    "code": 352,
    "error_subcode": 2207026,
    "is_transient": false,
    "error_user_title": "Unsupported format",
    "error_user_msg": "The video format is not supported. Please check the specs for the supported streams format",
    "fbtrace_id": "AA3qcWZCmMsp0OfEE_m1cLJ"
  }
}
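The Graph API can only pull the video from a publicly reachable URL, so one approach, sketched here under assumptions (the S3 bucket, local path, Graph API version, and the media_type/video_url/creation_id fields reflect my reading of the current docs rather than anything in the original post, and the global fetch requires Node 18+), is to upload the local file somewhere public first and then pass that URL:

const fs = require('fs');
const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');

async function publishLocalVideo(igUserId, accessToken) {
  // 1. Make the local file reachable over HTTP. Any public host works; a
  //    publicly readable S3 object is used here purely as an example.
  //    (Hypothetical bucket name and local path.)
  const s3 = new S3Client({ region: 'us-east-1' });
  await s3.send(new PutObjectCommand({
    Bucket: 'my-public-bucket',
    Key: 'uploads/video.mp4',
    Body: fs.readFileSync('/Users/me/Desktop/video.mp4'),
    ContentType: 'video/mp4',
  }));
  const videoUrl = 'https://my-public-bucket.s3.amazonaws.com/uploads/video.mp4';

  // 2. Create a media container that points at the public URL...
  const createRes = await fetch(
    `https://graph.facebook.com/v19.0/${igUserId}/media` +
      `?media_type=VIDEO&video_url=${encodeURIComponent(videoUrl)}` +
      `&access_token=${accessToken}`,
    { method: 'POST' }
  );
  const { id: creationId } = await createRes.json();

  // 3. ...and publish it. (In practice the container needs some processing
  //    time; its status should be polled before calling media_publish.)
  await fetch(
    `https://graph.facebook.com/v19.0/${igUserId}/media_publish` +
      `?creation_id=${creationId}&access_token=${accessToken}`,
    { method: 'POST' }
  );
}

Note that the 352/2207026 error above is about the container/codec itself, so the file may also need re-encoding (e.g. H.264/AAC in MP4) regardless of where it is hosted.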
Hi, I have been successfully recording an Agora audio call where one person speaks in a broadcast role and, during the call, mixes in a number of audio files.
All the audio was being recorded until we upgraded to Flutter 2 and the associated upgraded packages.
Now all that is recorded is the broadcaster's voice, with none of the mixed-in audio.
The broadcaster and audience members can all hear the mixed-in audio within the call without issue.
The code (Flutter) is similar to this:
// Mix an audio file into a valid RTC session, with default settings
final playing = await session.playAudioFile(path, () {
  state = MessagePlayerState.STOPPED;
  if (!disposing) {
    whenFinished();
  }
});
The recording options are as follows (my UID is a hardcoded string that is not the same as any participant UID):
http.Response response = await http.post(
  Uri.https(AGORA_REST_URL, '$AGORA_REST_API_VERSION/$appId/cloud_recording/resourceid/$resourceId/mode/mix/start'),
  headers: <String, String>{
    HttpHeaders.authorizationHeader: 'Basic $basicAuth',
    HttpHeaders.contentTypeHeader: 'application/json; charset=UTF-8',
  },
  body: jsonEncode(<String, dynamic>{
    'cname': channelName,
    'uid': uid,
    'clientRequest': {
      'recordingConfig': {
        'channelType': 0,
        'streamTypes': 2, // TODO: Should be a streamTypes of 0 (audio only), but get failures.
        'audioProfile': 1,
        'videoStreamType': 0,
        'maxIdleTime': 120,
        'transcodingConfig': {
          'width': 360,
          'height': 640,
          'fps': 30,
          'bitrate': 600,
          'maxResolutionUid': '1',
          'mixedVideoLayout': 1
        },
        'recordingFileConfig': {
          'avFileType': ['hls', 'mp4']
        }
      },
      'storageConfig': {
        'vendor': 1,
        'region': 3,
        'bucket': AWS_RECORDING_BUCKET, // TODO: Env Var
        'accessKey': AWS_BUCKET_ACCESS_KEY,
        'secretKey': AWS_BUCKET_SECRET_KEY,
      }
    },
  }),
);
The m3u8 and ts files are present in the S3 bucket.
Adjusting the metadata tags in S3 results in a file that plays fine in Safari, but no mixed-in audio is heard.
Converting the file to AAC with ffmpeg shows this error:
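(A conversion of this general shape, with a placeholder playlist name, reproduces the kind of output shown below; the exact invocation is not in the original post.)
ffmpeg -i recording.m3u8 -c:a aac output.aac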
[hls @ 0x7fd6cc808200] Opening '2838cfc6254e9fec2e3088976f39d7ce_bip_20210618014151427.ts' for reading
[mpegts @ 0x7fd6cc00a600] Packet corrupt (stream = 0, dts = 1437390).
size= 480kB time=00:00:30.69 bitrate= 128.1kbits/s speed=1.49e+03x
video:0kB audio:470kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 2.093976%
And the result is the same as from the S3 bucket.
Any help or hints appreciated.
This can be closed/ignored. It turns out we had an edge condition that did not show up when the app was used normally, but it broke if, for instance, you wanted a very stage-managed recording to show off to others.
I'm working on an AWS Lambda application that needs to take a TIFF file and convert it to a PDF. I'm using ImageMagick extensively, so the easiest thing to do was: convert input.tif output.pdf. That works fine in my Mac environment, but it fails to convert to a true PDF in the Lambda environment.
The ImageMagick build on Lambda seems to not support PDFs. If I run convert -list format in the Lambda environment, there's no entry for PDF. Here's my test Lambda function:
const im = require('imagemagick');
const fs = require('fs');

exports.handler = (event, context, callback) => {
  var inputFileName = 'input.tif';
  var imagesPath = 'assets/images';
  var outputFile = '/tmp/output.pdf';
  var args = [
    imagesPath + '/' + inputFileName,
    '-format',
    'pdf',
    outputFile
  ];
  im.convert(args, function(err, stdout, stderr) {
    if (err) throw err;
    console.log('stdout:', stdout);
    var imageRef = fs.readFileSync(outputFile);
    callback(null, {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/pdf',
        'Content-Disposition': 'attachment; filename=output.pdf'
      },
      body: imageRef.toString('base64'),
      isBase64Encoded: true
    });
  });
};
When I run identify output.pdf (i.e. the downloaded file), the file is reported as a TIFF file:
/Users/myuser/Downloads/output.pdf TIFF 517x243 517x243+0+0 8-bit CMYK 1.1314MiB 0.000u 0:00.009
So ImageMagick seems to just be passing it through as a TIFF file.
I've tried using tiff2pdf - which is installed locally; not sure about Lambda - but that doesn't even work on my Mac. I get an error saying:
tiff2pdf: No support for /path/to/input.tif with 5 samples per pixel.
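The "5 samples per pixel" matches the identify output above (8-bit CMYK is four channels, plus presumably an alpha channel). A possible workaround, sketched here as an untested assumption rather than a confirmed fix, is to flatten the TIFF to plain RGB with ImageMagick first and only then hand it to tiff2pdf:
convert input.tif -alpha off -colorspace sRGB rgb.tif
tiff2pdf -o output.pdf rgb.tif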
I created a Bluemix app to get the proper credentials and am using Fiddler to call Text to Speech (TTS) to record prompts. Recordings use the default "Michael" voice; I want Allison.
If I try passing in "voice", I get the following error, even when I specify "Michael" as my choice:
{
  "code_description": "Bad request",
  "code": 400,
  "error": "The argument(s) [u'voice'} are not allowed."
}
This is my payload:
{
  "text": "Hello, this is Dora. How are you today?",
  "voice": "en-US_AllisonVoice"
}
I have a developer account; do I need to sign up for something else to use "voice", even if I pass in the default "Michael"?
I think your problem is in the way you are specifying the voice parameter.
The voice and text parameters can be sent as query parameters in a GET.
Examples
1. Curl
curl -u "{username}":"{password}" "https://stream.watsonplatform.net/text-to-speech/api/v1/synthesize?voice=en-US_AllisonVoice&text=Hello%2C%20this%20is%20Dora.%20How%20are%20you%20today%3F"
2. Node
var watson = require('watson-developer-cloud');
var fs = require('fs');

var text_to_speech = watson.text_to_speech({
  username: '<username>',
  password: '<password>',
  version: 'v1'
});

var params = {
  text: 'Hello, this is Dora. How are you today?',
  voice: 'en-US_AllisonVoice',
  accept: 'audio/wav'
};

// Pipe the synthesized text to a file
text_to_speech.synthesize(params).pipe(fs.createWriteStream('output.wav'));
See the Text to Speech API Reference for more examples on how to call the service.
Try the example above here:
https://text-to-speech-demo.mybluemix.net/api/synthesize?voice=en-US_AllisonVoice&text=Hello%2C%20this%20is%20Dora.%20How%20are%20you%20today%3F
Powered by the demo app: https://text-to-speech-demo.mybluemix.net
I want to have audio-only stream communication between peers. I changed parts of kurento-utils.js to get only the audio stream via getUserMedia, but it's not working.
I used the node-hello-world example, changing
WebRtcPeer.prototype.userMediaConstraints = {
  audio : true,
  video : {
    mandatory : {
      maxWidth : 640,
      maxFrameRate : 15,
      minFrameRate : 15
    }
  }
};
to
WebRtcPeer.prototype.userMediaConstraints = {
  audio : true,
  video : false
};
Is it possible to use the Kurento service for an audio-only stream?
This is indeed possible with Kurento. There are two ways of doing this, depending on the desired scope of the modification:
Per WebRTC endpoint: when you process the SDP offer sent by the client, you get an SDP answer from KMS that you have to send back. After invoking the processOffer method, you can tamper with the SDP to remove all video parts; that way, your client will send back only audio (see the sketch after this list).
Globally: you can edit the /etc/kurento/sdp_pattern.txt file, removing all video-related parts; this will force SdpEndpoints (the parent class of WebRtcEndpoint) to use only audio.
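A minimal sketch of the per-endpoint option in Node.js (the processOffer callback shape follows the node-hello-world tutorial; the helper name and the way the answer is sent back are assumptions):

// Strip every video m-section from an SDP string, leaving the audio section untouched.
function removeVideoFromSdp(sdp) {
  return sdp
    .split(/(?=^m=)/m)  // session part + one chunk per m-section
    .filter(function (section) { return section.indexOf('m=video') !== 0; })
    .join('');
}

// Roughly where it would go in the node-hello-world flow:
webRtcEndpoint.processOffer(sdpOffer, function (error, sdpAnswer) {
  if (error) return callback(error);
  var audioOnlyAnswer = removeVideoFromSdp(sdpAnswer);
  // Send audioOnlyAnswer back to the browser instead of the raw sdpAnswer.
  callback(null, audioOnlyAnswer);
});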
EDIT 1
The file sdp_pattern.txt is deprecated in KMS 6.1.0, so method 2 shouldn't be used.
EDIT 2
There was an issue with the kurento-utils library, and the client was not correctly setting OfferToReceiveAudio. It was fixed some time ago, and you shouldn't need to tamper with the SDPs now.
git origin: https://github.com/Kurento/kurento-tutorial-js.git
git branch: 6.6.0
My solution was simply changing var offerVideo = true; to var offerVideo = false; in the generateOffer function of the kurento-utils.js file.
My approach is to modify the options that you pass to the WebRtcPeer.
var options = {
  onicecandidate: onIceCandidate,
  iceServers: iceServers,
  mediaConstraints: {
    audio: true,
    video: false
  }
}
Also, in kurento-utils.js, the mediaConstraints you pass in are overridden by this line:
constraints.unshift(MEDIA_CONSTRAINTS);
So comment it out.