How to pass an audio stream recorded with WebRTC to the Google Speech API for real-time transcription?

What I'm trying to do is get real-time transcription for video recorded in the browser with WebRTC. The use case is basically real-time subtitles, like Google Hangouts has.
I have a WebRTC program running in the browser that sends WebM blobs back to the server. They contain linear32 audio encodings, but Google Speech-to-Text only accepts linear16 or FLAC.
Is there a way to convert linear32 to linear16 in real time?
Otherwise, has anyone been able to hook up WebRTC with Google Speech to get real-time transcription working?
Any advice on where to look to solve this problem would be great.
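For context, linear32 here is 32-bit float PCM with samples in [-1, 1], so converting to linear16 is just a per-sample clamp and scale to the signed 16-bit range. A minimal sketch of that conversion (hypothetical helper, assuming you already have a Float32Array of samples):
// Convert Float32 PCM samples in [-1, 1] to 16-bit signed PCM (LINEAR16).
function floatTo16BitPCM(float32Samples) {
  const int16 = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the 16-bit signed range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
The resulting int16.buffer is what you would write to the Speech API stream.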

Check out this repository, it might help you: https://github.com/muaz-khan/Translator
Translator.js is a JavaScript library built on top of the Google Speech-Recognition & Translation API to transcribe and translate voice and text. It supports many locales and brings globalization to WebRTC!

I had the same problem and failed with WebRTC. I recommend you use the Web Audio API instead if you are just interested in transcribing the audio from the video.
Here is how I did it with a Node.js server and a React client app. It is uploaded to GitHub here.
You need an audio worklet script. (Put it in the public folder because that is where the API expects to find it)
recorderWorkletProcessor.js (saved in public/src/worklets/recorderWorkletProcessor.js)
/**
 * An in-place replacement for ScriptProcessorNode using AudioWorklet
 */
class RecorderProcessor extends AudioWorkletProcessor {
// 0. Determine the buffer size (this is the same as the 1st argument of ScriptProcessor)
bufferSize = 2048;
// 1. Track the current buffer fill level
_bytesWritten = 0;
// 2. Create a buffer of fixed size
_buffer = new Float32Array(this.bufferSize);
constructor() {
super();
this.initBuffer();
}
initBuffer() {
this._bytesWritten = 0;
}
isBufferEmpty() {
return this._bytesWritten === 0;
}
isBufferFull() {
return this._bytesWritten === this.bufferSize;
}
/**
 * @param {Float32Array[][]} inputs
 * @returns {boolean}
 */
process(inputs) {
// Grabbing the 1st channel similar to ScriptProcessorNode
this.append(inputs[0][0]);
return true;
}
/**
 *
 * @param {Float32Array} channelData
 */
append(channelData) {
if (this.isBufferFull()) {
this.flush();
}
if (!channelData) return;
for (let i = 0; i < channelData.length; i++) {
this._buffer[this._bytesWritten++] = channelData[i];
}
}
flush() {
// trim the buffer if ended prematurely
const buffer = this._bytesWritten < this.bufferSize ? this._buffer.slice(0, this._bytesWritten) : this._buffer;
// NOTE: the input rate is hard-coded to 44100 here; it should match your AudioContext's actual sample rate
const result = this.downsampleBuffer(buffer, 44100, 16000);
this.port.postMessage(result);
this.initBuffer();
}
downsampleBuffer(buffer, sampleRate, outSampleRate) {
if (outSampleRate == sampleRate) {
return buffer;
}
if (outSampleRate > sampleRate) {
throw new Error("downsample rate should be smaller than the original sample rate");
}
var sampleRateRatio = sampleRate / outSampleRate;
var newLength = Math.round(buffer.length / sampleRateRatio);
var result = new Int16Array(newLength);
var offsetResult = 0;
var offsetBuffer = 0;
while (offsetResult < result.length) {
var nextOffsetBuffer = Math.round((offsetResult + 1) * sampleRateRatio);
var accum = 0,
count = 0;
for (var i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
accum += buffer[i];
count++;
}
// clamp to [-1, 1] before scaling to the 16-bit signed range
result[offsetResult] = Math.max(-1, Math.min(1, accum / count)) * 0x7fff;
offsetResult++;
offsetBuffer = nextOffsetBuffer;
}
return result.buffer;
}
}
registerProcessor("recorder.worklet", RecorderProcessor);
Install socket.io-client on the front end:
npm i socket.io-client
React component code
/* eslint-disable react-hooks/exhaustive-deps */
import { default as React, useEffect, useState, useRef } from "react";
import { Button } from "react-bootstrap";
import Container from "react-bootstrap/Container";
import * as io from "socket.io-client";
const sampleRate = 16000;
const getMediaStream = () =>
navigator.mediaDevices.getUserMedia({
audio: {
deviceId: "default",
sampleRate: sampleRate,
sampleSize: 16,
channelCount: 1,
},
video: false,
});
interface WordRecognized {
final: boolean;
text: string;
}
const AudioToText: React.FC = () => {
const [connection, setConnection] = useState<io.Socket>();
const [currentRecognition, setCurrentRecognition] = useState<string>();
const [recognitionHistory, setRecognitionHistory] = useState<string[]>([]);
const [isRecording, setIsRecording] = useState<boolean>(false);
const [recorder, setRecorder] = useState<any>();
const processorRef = useRef<any>();
const audioContextRef = useRef<any>();
const audioInputRef = useRef<any>();
const speechRecognized = (data: WordRecognized) => {
if (data.final) {
setCurrentRecognition("...");
setRecognitionHistory((old) => [data.text, ...old]);
} else setCurrentRecognition(data.text + "...");
};
const connect = () => {
connection?.disconnect();
const socket = io.connect("http://localhost:8081");
socket.on("connect", () => {
console.log("connected", socket.id);
setConnection(socket);
});
socket.emit("send_message", "hello world");
socket.emit("startGoogleCloudStream");
socket.on("receive_message", (data) => {
console.log("received message", data);
});
socket.on("receive_audio_text", (data) => {
speechRecognized(data);
console.log("received audio text", data);
});
socket.on("disconnect", () => {
console.log("disconnected", socket.id);
});
};
const disconnect = () => {
if (!connection) return;
connection?.emit("endGoogleCloudStream");
connection?.disconnect();
processorRef.current?.disconnect();
audioInputRef.current?.disconnect();
audioContextRef.current?.close();
setConnection(undefined);
setRecorder(undefined);
setIsRecording(false);
};
useEffect(() => {
(async () => {
if (connection) {
if (isRecording) {
return;
}
const stream = await getMediaStream();
audioContextRef.current = new window.AudioContext();
await audioContextRef.current.audioWorklet.addModule(
"/src/worklets/recorderWorkletProcessor.js"
);
audioContextRef.current.resume();
audioInputRef.current =
audioContextRef.current.createMediaStreamSource(stream);
processorRef.current = new AudioWorkletNode(
audioContextRef.current,
"recorder.worklet"
);
processorRef.current.connect(audioContextRef.current.destination);
audioContextRef.current.resume();
audioInputRef.current.connect(processorRef.current);
processorRef.current.port.onmessage = (event: any) => {
const audioData = event.data;
connection.emit("send_audio_data", { audio: audioData });
};
setIsRecording(true);
} else {
console.error("No connection");
}
})();
return () => {
if (isRecording) {
processorRef.current?.disconnect();
audioInputRef.current?.disconnect();
if (audioContextRef.current?.state !== "closed") {
audioContextRef.current?.close();
}
}
};
}, [connection, isRecording, recorder]);
return (
<React.Fragment>
<Container className="py-5 text-center">
<Container fluid className="py-5 bg-primary text-light text-center ">
<Container>
<Button
className={isRecording ? "btn-danger" : "btn-outline-light"}
onClick={connect}
disabled={isRecording}
>
Start
</Button>
<Button
className="btn-outline-light"
onClick={disconnect}
disabled={!isRecording}
>
Stop
</Button>
</Container>
</Container>
<Container className="py-5 text-center">
{recognitionHistory.map((tx, idx) => (
<p key={idx}>{tx}</p>
))}
<p>{currentRecognition}</p>
</Container>
</Container>
</React.Fragment>
);
};
export default AudioToText;
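The server below pulls in Express, Socket.IO, the Google Cloud Speech client and a couple of middleware packages, so install those first (matching the require calls in server.js):
npm i express socket.io @google-cloud/speech cors morgan body-parser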
server.js
const express = require("express");
const speech = require("@google-cloud/speech");
//use logger
const logger = require("morgan");
//use body parser
const bodyParser = require("body-parser");
//use cors
const cors = require("cors");
const http = require("http");
const { Server } = require("socket.io");
const app = express();
app.use(cors());
app.use(logger("dev"));
app.use(bodyParser.json());
const server = http.createServer(app);
const io = new Server(server, {
cors: {
origin: "http://localhost:3000",
methods: ["GET", "POST"],
},
});
//TODO: run this in the terminal first to set up credentials: export GOOGLE_APPLICATION_CREDENTIALS="./speech-to-text-key.json"
const speechClient = new speech.SpeechClient();
io.on("connection", (socket) => {
let recognizeStream = null;
console.log("** a user connected - " + socket.id + " **\n");
socket.on("disconnect", () => {
console.log("** user disconnected ** \n");
});
socket.on("send_message", (message) => {
console.log("message: " + message);
setTimeout(() => {
io.emit("receive_message", "got this message" + message);
}, 1000);
});
socket.on("startGoogleCloudStream", function (data) {
startRecognitionStream(this, data);
});
socket.on("endGoogleCloudStream", function () {
console.log("** ending google cloud stream **\n");
stopRecognitionStream();
});
socket.on("send_audio_data", async (audioData) => {
io.emit("receive_message", "Got audio data");
if (recognizeStream !== null) {
try {
recognizeStream.write(audioData.audio);
} catch (err) {
console.log("Error calling google api " + err);
}
} else {
console.log("RecognizeStream is null");
}
});
function startRecognitionStream(client) {
console.log("* StartRecognitionStream\n");
try {
recognizeStream = speechClient
.streamingRecognize(request)
.on("error", console.error)
.on("data", (data) => {
const result = data.results[0];
const isFinal = result.isFinal;
const transcription = data.results
.map((result) => result.alternatives[0].transcript)
.join("\n");
console.log(`Transcription: `, transcription);
client.emit("receive_audio_text", {
text: transcription,
final: isFinal,
});
});
} catch (err) {
console.error("Error streaming google api " + err);
}
}
function stopRecognitionStream() {
if (recognizeStream) {
console.log("* StopRecognitionStream \n");
recognizeStream.end();
}
recognizeStream = null;
}
});
server.listen(8081, () => {
console.log("WebSocket server listening on port 8081.");
});
// =========================== GOOGLE CLOUD SETTINGS ================================ //
// The encoding of the audio file, e.g. 'LINEAR16'
// The sample rate of the audio file in hertz, e.g. 16000
// The BCP-47 language code to use, e.g. 'en-US'
const encoding = "LINEAR16";
const sampleRateHertz = 16000;
const languageCode = "en-US"; //en-US
const alternativeLanguageCodes = ["en-US", "ko-KR"];
const request = {
config: {
encoding: encoding,
sampleRateHertz: sampleRateHertz,
languageCode: languageCode,
//alternativeLanguageCodes: alternativeLanguageCodes,
enableWordTimeOffsets: true,
enableAutomaticPunctuation: true,
enableWordConfidence: true,
enableSpeakerDiarization: true,
diarizationSpeakerCount: 2,
model: "video",
//model: "command_and_search",
useEnhanced: true,
speechContexts: [
{
phrases: ["hello", "안녕하세요"],
},
],
},
interimResults: true,
};

Related

Getting error from react-native-sms-retriever

I am using the react-native-sms-retriever package to listen for SMS on my mobile, but it gives me an error after several minutes have passed, even though I have added a remove listener in useEffect.
let isApiSubscribed = true;
useEffect(async() => {
firstInput.current.focus();
if(isApiSubscribed){
_onSmsListenerPressed();
}
return () => {
isApiSubscribed = false;
SmsRetriever.removeSmsListener();
}
}, []);
const _onSmsListenerPressed = async () => {
try {
const registered = await SmsRetriever.startSmsRetriever();
if (registered) {
SmsRetriever.addSmsListener((event) => {
const message = event.message;
const otpCode = /(\d{4})/g.exec(message)[1];
const output = [];
const sNumber = otpCode.toString();
for (var i = 0, len = sNumber.length; i < len; i += 1) {
output.push(+sNumber.charAt(i));
}
const [currentPin1, currentPin2, currentPin3, currentPin4] = output;
setOtp({
...otp,
pin1: currentPin1,
pin2: currentPin2,
pin3: currentPin3,
pin4: currentPin4,
});
SmsRetriever.removeSmsListener();
isApiSubscribed = false;
Keyboard.dismiss();
});
}
} catch (error) {
console.log(JSON.stringify(error));
}
};

Getting pc.iceConnectionState Checking, failed in pc.oniceconnectionstatechange event in webRTC

I'm using WebRTC for video calling in a React Native app.
If I call someone else and the receiver accepts the call, I get the receiver's stream, but there is a problem on the receiver's side:
the receiver gets the remote stream but it shows a blank view.
import AsyncStorage from '@react-native-async-storage/async-storage';
import {
RTCIceCandidate,
RTCPeerConnection,
RTCSessionDescription,
} from 'react-native-webrtc';
import io from '../scripts/socket.io';
const PC_CONFIG = {
iceServers: [
{ url: 'stun:motac85002'},
],
};
export const pc = new RTCPeerConnection(PC_CONFIG);
// Signaling methods
export const onData = data => {
handleSignalingData(data.data);
};
const ENDPOINT = 'http://52.52.75.250:3000/';
const socket = io(ENDPOINT);
// const PeerConnection = () => {
const sendData = async data => {
const roomId = await AsyncStorage.getItem('roomId');
const userId = parseInt(await AsyncStorage.getItem('user_id'));
socket.emit('data', roomId, userId, data);
};
export const createPeerConnection = async(stream, setUsers) => {
try {
pc.onicecandidate = onIceCandidate;
const userId = parseInt(await AsyncStorage.getItem('user_id'));
pc.onaddstream = e => {
setUsers(e.stream);
};
pc.addStream(stream)
pc.oniceconnectionstatechange = function () {
// console.log('ICE state: ', pc);
console.log('iceConnectionState', pc.iceConnectionState);
if (pc.iceConnectionState === "failed" ||
pc.iceConnectionState === "disconnected" ||
pc.iceConnectionState === "closed") {
console.log('iceConnectionState restart', userId);
// console.log('ICE state: ', pc);
// Handle the failure
pc.restartIce();
}
};
console.log('PeerConnection created', userId);
// sendOffer();
} catch (error) {
console.error('PeerConnection failed: ', error);
}
};
export const callSomeone = () => {
pc.createOffer({}).then(setAndSendLocalDescription, error => {
console.error('Send offer failed: ', error);
});
};
const setAndSendLocalDescription = sessionDescription => {
pc.setLocalDescription(sessionDescription);
sendData(sessionDescription);
};
const onIceCandidate = event => {
if (event.candidate) {
sendData({
type: 'candidate',
candidate: event.candidate,
});
}
};
export const disconnectPeer = () =>{
pc.close();
}
const sendAnswer = () => {
pc.createAnswer().then(setAndSendLocalDescription, error => {
console.error('Send answer failed: ', error);
});
};
export const handleSignalingData = data => {
switch (data.type) {
case 'offer':
pc.setRemoteDescription(new RTCSessionDescription(data));
sendAnswer();
break;
case 'answer':
pc.setRemoteDescription(new RTCSessionDescription(data));
break;
case 'candidate':
pc.addIceCandidate(new RTCIceCandidate(data.candidate));
break;
}
};
// }
// export default PeerConnection
Can anyone please tell me why the video stream is not displaying on the receiver's side?
Also, there is a remoteStream issue on a Motorola device. Why is this happening?
This statement:
const PC_CONFIG = {
iceServers: [
{ url: 'stun:motac85002'},
],
};
Has two potential issues:
First, the parameter for the iceServers object should be urls, not url. (Although it wouldn't surprise me if the browsers accepted either).
Second, as I mentioned in comments to your question, the STUN address itself looks to be a local address instead of an Internet address. That might explain why you aren't seeing any srflx or UDP candidates in the SDP. And as such, that might explain connectivity issues.
So instead of the above, could you try this:
const PC_CONFIG= {iceServers: [{urls: "stun:stun.stunprotocol.org"}]};
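If you want to confirm that the new STUN server actually yields srflx candidates, you can log the raw candidate strings inside the onIceCandidate handler you already have and look for "typ srflx"; a small sketch based on the code in the question:
const onIceCandidate = event => {
  if (event.candidate) {
    // "typ host" / "typ srflx" / "typ relay" tells you which kind of candidate was gathered
    console.log('ICE candidate:', event.candidate.candidate);
    sendData({
      type: 'candidate',
      candidate: event.candidate,
    });
  }
};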

react-native-background-actions issue for ios

I am using react-native-background-actions to run a background task, but when I exit the app, the task only runs for about 30 seconds and then stops.
How can I make it run forever, or at least until the app is terminated?
In my code, every second I increase 'COUNT' and save it to storage.
// BackgroundTaskService.js
import AsyncStorage from '@react-native-community/async-storage';
import BackgroundService from 'react-native-background-actions';
import BackgroundJob from 'react-native-background-actions';
let sleep = ms => new Promise(resolve => setTimeout(resolve, ms));
const increaseCountTask = async taskDataArguments => {
const {delay} = taskDataArguments;
await new Promise(async resolve => {
for (let i = 0; BackgroundJob.isRunning(); i++) {
var value = await AsyncStorage.getItem('COUNT');
if(!value) {
await AsyncStorage.setItem('COUNT', "2");
}
var value = await AsyncStorage.getItem('COUNT');
await AsyncStorage.setItem('COUNT', (parseInt(value) + 1).toString());
value = await AsyncStorage.getItem('COUNT');
console.log('value', value);
await sleep(delay);
}
});
};
const options = {
taskName: 'Demo',
taskTitle: 'Demo Running',
taskDesc: 'Demo',
taskIcon: {
name: 'ic_launcher',
type: 'mipmap',
},
color: '#ff00ff',
parameters: {
delay: 1000,
},
actions: '["Exit"]',
};
const start = () => {
BackgroundService.start(increaseCountTask, options);
};
const stop = () => {
BackgroundService.stop();
};
export default {
start,
stop,
};
// App.js
BackgroundTaskService.start();
Are you testing on a physical device, and have you enabled Background App Refresh in the Xcode settings as well as on your physical device? The documentation of another package, react-native-background-task, indicates that such packages must be tested on a physical device, or else the iOS simulator will have to emulate the tasks.
I attempted to rework your code. I am currently getting this to work with BackgroundService instead of BackgroundJob for running in the background of the app. I am also testing what happens when the app is closed and should be able to get back to you soon.
import BackgroundService from 'react-native-background-actions';
import AsyncStorage from "#react-native-async-storage/async-storage";
import BackgroundJob from 'react-native-background-actions';
import {Alert} from "react-native";
const sleep = (time) => new Promise((resolve) => setTimeout(() => resolve(), time));
// You can do anything in your task such as network requests, timers and so on,
// as long as it doesn't touch UI. Once your task completes (i.e. the promise is resolved),
// React Native will go into "paused" mode (unless there are other tasks running,
// or there is a foreground app).
const veryIntensiveTask = async (taskDataArguments) => {
// Example of an infinite loop task
const { delay } = taskDataArguments;
await new Promise( async (resolve) => {
for (let i = 0; BackgroundService.isRunning(); i++) {
console.log(i);
console.log("Running background task");
Alert.alert("Running background task");
await sleep(delay);
}
// for (let i = 0; BackgroundJob.isRunning(); i++) {
//
// var value = await AsyncStorage.getItem('COUNT');
// if(!value) {
// await AsyncStorage.setItem('COUNT', "2");
// }
// var value = await AsyncStorage.getItem('COUNT');
// await AsyncStorage.setItem('COUNT', (parseInt(value) + 1).toString());
// value = await AsyncStorage.getItem('COUNT');
// console.log('value', value);
// await sleep(delay);
//
// }
});
};
const options = {
taskName: 'Example',
taskTitle: 'ExampleTask title',
taskDesc: 'ExampleTask description',
taskIcon: {
name: 'ic_launcher',
type: 'mipmap',
},
color: '#ff00ff',
linkingURI: 'yourSchemeHere://chat/jane', // See Deep Linking for more info
parameters: {
delay: 10000,
},
};
const start = async () => {
return BackgroundService.start(veryIntensiveTask, options);
};
const stop = async () => {
return BackgroundService.stop();
};
export default {
start,
stop,
};
// await BackgroundService.start(veryIntensiveTask, options);
// await BackgroundService.updateNotification({taskDesc: 'New ExampleTask description'}); // Only Android, iOS will ignore this call
// // iOS will also run everything here in the background until .stop() is called
// await BackgroundService.stop();

Testing a Cloudflare Worker with HTMLRewriter fails as it's undefined

I have a test for my Cloudflare Worker that looks like this:
const workerScript = fs.readFileSync(
path.resolve(__dirname, '../pkg-prd/worker.js'),
'utf8'
);
describe('worker unit test', function () {
// this.timeout(60000);
let worker;
beforeEach(() => {
worker = new Cloudworker(workerScript, {
bindings: {
HTMLRewriter
},
});
});
it('tests requests and responses', async () => {
const request = new Cloudworker.Request('https://www.example.com/pathname')
const response = await worker.dispatch(request);
console.log(response);
// const body = await response.json();
expect(response.status).to.eql(200);
// expect(body).to.eql({message: 'Hello mocha!'});
});
});
In my worker I do something like this:
const response = await fetch(BASE_URL, request);
const modifiedResponse = new Response(response.body, response);
// Remove the webflow badge
class ElementHandler {
element(element) {
element.append('<style type="text/css">body .w-webflow-badge {display: none!important}</style>', {html: true})
}
}
console.log(3);
return new HTMLRewriter()
.on('head', new ElementHandler()).transform(modifiedResponse);
Now when I run my test I get this error message:
● worker unit test › tests requests and responses
TypeError: Cannot read property 'transform' of undefined
at evalmachine.<anonymous>:1:1364
at FetchEvent.respondWith (node_modules/#dollarshaveclub/cloudworker/lib/cloudworker.js:39:17)
What seems to be wrong?
The HTMLRewriter I created looks like this:
function HTMLRewriter() {
const elementHandler = {};
const on = (selector, handler) => {
if (handler && handler.element) {
if (!elementHandler[selector]) {
elementHandler[selector] = [];
}
elementHandler[selector].push(handler.element.bind(handler));
}
};
const transform = async response => {
const tempResponse = response.clone();
const doc = HTMLParser.parse(await tempResponse.text());
Object.keys(elementHandler).forEach(selector => {
const el = doc.querySelector(selector);
if (el) {
elementHandler[selector].map(callback => {
callback(new _Element(el));
});
}
});
return new Response(doc.toString(), response);
};
return {
on,
transform
};
}
Since HTMLRewriter() is called with new, the function needs to be a constructor. In JavaScript, a constructor function should set properties on this and should not return a value. But your function is written to return a value.
So, try changing this:
return {
on,
transform
};
To this:
this.on = on;
this.transform = transform;
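Equivalently, you could make the shim an ES6 class, which is constructible with new by definition. A sketch reusing the same HTMLParser and _Element helpers from your version (note that on returns this, so the chained .on(...).transform(...) call in your worker resolves):
class HTMLRewriter {
  constructor() {
    this.elementHandler = {};
  }
  on(selector, handler) {
    if (handler && handler.element) {
      if (!this.elementHandler[selector]) {
        this.elementHandler[selector] = [];
      }
      this.elementHandler[selector].push(handler.element.bind(handler));
    }
    // returning the instance allows chaining, like the real HTMLRewriter
    return this;
  }
  async transform(response) {
    const tempResponse = response.clone();
    const doc = HTMLParser.parse(await tempResponse.text());
    Object.keys(this.elementHandler).forEach(selector => {
      const el = doc.querySelector(selector);
      if (el) {
        this.elementHandler[selector].forEach(callback => callback(new _Element(el)));
      }
    });
    return new Response(doc.toString(), response);
  }
}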

How to share screen using webRTC

I need to get screen sharing working. It already works for camera video sharing. This is the code:
public n = navigator as any;
ngAfterViewInit(): void {
const video = this.myVideo.nativeElement;
let peerx: any;
this.n.getUserMedia =
this.n.getUserMedia ||
this.n.webkitGetUserMedia ||
this.n.mozGetUserMedia ||
this.n.msGetUserMedia;
}
this.n.getUserMedia( // this.n.mediaDevices.getDisplayMedia
{
video: {
madatory: {
chromeMediaSource: 'screen',
maxWidth: 1920,
maxHeight: 1080,
minAspectRatio: 1.77
},
}
},
(stream: any) => {
peerx = new SimplePeer({
initiator: location.hash === '#init',
trickle: false,
stream,
});
peerx.on('signal', (data) => {
const strData = JSON.stringify(data);
console.log(strData);
this.targetpeer = data;
});
peerx.on('stream', (streamX: any) => {
if ('srcObject' in video) {
video.srcObject = streamX;
} else {
video.src = window.URL.createObjectURL(streamX);
}
const playPromise = video.play();
if (playPromise !== undefined) {
playPromise
.then((_) => {
video.play();
})
.catch((error) => {
console.log(`Playing was prevented: ${error}`);
});
}
});
If I change the line 'this.n.getUserMedia(....)' to 'this.n.mediaDevices.getDisplayMedia(...)', I do not get the 'signal' (the key that I need to paste on the other client to connect).
You are attempting to mix a constraints style that was needed years ago, when a Chrome extension was required, with getDisplayMedia. That isn't going to work.
const stream = await navigator.mediaDevices.getDisplayMedia({video: true})
See here for a canonical sample.
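Wiring getDisplayMedia into the SimplePeer setup from the question would then look roughly like this (a sketch, untested):
async function startScreenShare() {
  // getDisplayMedia prompts the user to pick a screen/window; no legacy chromeMediaSource constraints needed
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const peerx = new SimplePeer({
    initiator: location.hash === '#init',
    trickle: false,
    stream,
  });
  peerx.on('signal', (data) => {
    // same as in the camera version: this is the key you paste on the other peer
    console.log(JSON.stringify(data));
  });
  return peerx;
}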