How to build a group video call with React Native & WebRTC

Saigon Technology
11 min read · May 8, 2024

1. Overview

1.1 React Native

React Native is a framework that brings React’s declarative UI model to mobile platforms. It allows developers to create cross-platform apps that render natively on iOS and Android from a single codebase. This enables faster development and easier maintenance compared to traditional native app development.

1.2 WebRTC

WebRTC stands for Web Real-Time Communication. It allows video, voice, and generic data to be sent between peers, and it is available in all modern browsers as well as in native clients for all major platforms.

A WebRTC application typically goes through the following flow:

  • Access media devices (microphone, camera)
  • Create peer-to-peer connections
  • Discover peer connections
  • Start streaming

2. How does WebRTC work

To better understand how WebRTC works, we will go through some technical terms.

2.1 Peer-to-peer (P2P) connection

A peer-to-peer connection allows two or more end devices to share resources and communicate with each other directly, without relaying the data through a separate server.

In a P2P connection, each end device is considered a “peer”. Each peer acts as both client and server: it shares resources with other peers and receives resources from them.

2.2 Signaling server

Two peers cannot find each other on their own. A Signaling server lets them exchange the information they need to connect: SDP and ICE Candidates. Each peer gathers its ICE Candidates with the help of the ICE servers in its configuration, which are STUN or TURN servers, and sends them to the other peer. This exchange process is called signaling.

A Signaling server can be any real-time channel that both peers can reach, for example Firebase Firestore, a WebSocket server, or OneSignal.

2.3 SDP

SDP stands for Session Description Protocol. An SDP message describes the details of a real-time communication session between two peers, such as the media types and codecs each peer supports. As a simple example, if a device’s camera is turned off, the SDP includes this information when it is sent to the other device.

Exchanging SDP is part of creating a P2P connection. The SDP negotiation involves two steps: offer and answer.
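To make the content of an SDP message more concrete, here is a minimal sketch that lists the media sections of an SDP string together with their direction. Direction attributes such as recvonly or inactive are one way an SDP can signal that a peer is not sending a given medium. The function name mediaSections and the sample SDP are made up for illustration and are not part of the demo source.

```typescript
// One media section ("m=" line) of an SDP and its direction attribute.
type MediaSection = { kind: string; direction: string }

const DIRECTIONS = ['sendrecv', 'sendonly', 'recvonly', 'inactive']

// List media sections and their direction. Session-level attributes
// (those before the first "m=" line) are ignored in this sketch.
function mediaSections(sdp: string): MediaSection[] {
  const sections: MediaSection[] = []
  for (const line of sdp.split(/\r?\n/)) {
    if (line.indexOf('m=') === 0) {
      // "m=<media> <port> <proto> <fmt> ..." ; sendrecv is the default direction
      sections.push({ kind: line.slice(2).split(' ')[0], direction: 'sendrecv' })
    } else if (sections.length > 0 && line.indexOf('a=') === 0) {
      const attribute = line.slice(2)
      if (DIRECTIONS.indexOf(attribute) !== -1) {
        sections[sections.length - 1].direction = attribute
      }
    }
  }
  return sections
}

// Hypothetical, heavily shortened SDP: audio is sent and received,
// video is only received (for example, the local camera is not sent).
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=sendrecv',
  'm=video 9 UDP/TLS/RTP/SAVPF 96',
  'a=recvonly',
].join('\r\n')

console.log(mediaSections(sampleSdp))
```

In a real app you rarely parse SDP by hand; the WebRTC API produces and consumes it for you. The sketch is only meant to show what kind of information travels inside an offer or answer.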

2.4 ICE candidates

To put it simply, ICE Candidates are the addresses through which a peer can be reached by other peers over the internet. One client can have multiple ICE Candidates, each containing a transport protocol, an IP address, and a port number.
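As an illustration, an ICE candidate is transported as a single text line. The sketch below pulls the transport protocol, IP address, and port out of such a line. The helper name parseCandidate and the sample values (a documentation-range IP address) are assumptions for this example, and only the leading fields of the line are handled.

```typescript
// The leading fields of an ICE candidate line.
type ParsedCandidate = {
  foundation: string
  component: number
  protocol: string
  priority: number
  address: string
  port: number
  candidateType: string // e.g. host, srflx (via STUN), relay (via TURN)
}

// Parse "candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ..."
function parseCandidate(line: string): ParsedCandidate {
  const fields = line.replace(/^candidate:/, '').trim().split(/\s+/)
  if (fields.length < 8 || fields[6] !== 'typ') {
    throw new Error(`Not a valid candidate line: ${line}`)
  }
  return {
    foundation: fields[0],
    component: Number(fields[1]),
    protocol: fields[2].toLowerCase(),
    priority: Number(fields[3]),
    address: fields[4],
    port: Number(fields[5]),
    candidateType: fields[7],
  }
}

// Hypothetical candidate discovered through a STUN server.
const sampleCandidate =
  'candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 10.0.0.5 rport 46154'

console.log(parseCandidate(sampleCandidate))
```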

2.5 Visualize making one P2P connection

How exactly does WebRTC establish a peer-to-peer connection using the terms above? A simple diagram visualizes the steps.

In this example, Peer-1 is the device that wants to communicate with Peer-2.

First, Peer-1 creates an Offer containing the SDP and ICE Candidates generated by the WebRTC API and sends it to the Signaling server.

Meanwhile, Peer-2 listens to the Signaling server for incoming Offers. After receiving Peer-1’s Offer, Peer-2 creates an Answer with the same data structure and sends it back to the Signaling server.

Now, both devices have each other’s configuration. The peer connection is established.
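The exchange above can be sketched without any WebRTC or Firestore dependency. In the following minimal simulation, the InMemorySignaling class is a hypothetical stand-in for the Signaling server, and the SDP strings are placeholders. It only shows the order of messages, not real media negotiation.

```typescript
type Description = { type: 'offer' | 'answer'; sdp: string }
type Listener = (from: string, description: Description) => void

// Hypothetical in-memory stand-in for a Signaling server.
class InMemorySignaling {
  private listeners = new Map<string, Listener>()

  // Register the callback that receives messages addressed to `peer`.
  listen(peer: string, listener: Listener): void {
    this.listeners.set(peer, listener)
  }

  // Deliver a description from one peer to another (dropped if nobody listens).
  send(from: string, to: string, description: Description): void {
    const listener = this.listeners.get(to)
    if (listener) {
      listener(from, description)
    }
  }
}

const signaling = new InMemorySignaling()
const exchangeLog: string[] = []

// Peer-2 waits for an Offer and replies with an Answer.
signaling.listen('Peer-2', (from, description) => {
  exchangeLog.push(`Peer-2 received ${description.type} from ${from}`)
  signaling.send('Peer-2', from, { type: 'answer', sdp: 'placeholder answer SDP' })
})

// Peer-1 waits for the Answer to its Offer.
signaling.listen('Peer-1', (from, description) => {
  exchangeLog.push(`Peer-1 received ${description.type} from ${from}`)
})

// Peer-1 starts the exchange by sending its Offer.
signaling.send('Peer-1', 'Peer-2', { type: 'offer', sdp: 'placeholder offer SDP' })

console.log(exchangeLog)
```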

2.6 Visualize making multiple P2P connections

What about another scenario, where a device named Peer-3 wants to join the communication between Peer-1 and Peer-2?

Let’s assume Peer-1 and Peer-2 have already established a connection following the steps in Section 2.5. A simple example demonstrates how Peer-3 can establish connections with Peer-1 and Peer-2.

When Peer-3 joins the communication, it needs to create one Offer for each current participant. In this case, there are already two participants in the group, so Peer-3 creates two Offers and sends them to the Signaling server.

Although Peer-1 and Peer-2 already have a P2P connection, they still need to listen for new Offers coming from the Signaling server. When each of them receives an Offer, it creates an Answer and sends it back to the Signaling server.

As a result, Peer-3 receives Answers from both peers, and the P2P connections between the three peers are established successfully.
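The rule "one Offer per existing participant" can be written as a small pure function. This is only a sketch; the names PendingOffer and offersForNewcomer are made up for illustration.

```typescript
// One pending Offer: who sends it (requester) and who must answer (responder).
type PendingOffer = { requester: string; responder: string }

// A newcomer sends one Offer to every participant already in the room.
function offersForNewcomer(existing: string[], newcomer: string): PendingOffer[] {
  return existing
    .filter(participant => participant !== newcomer)
    .map(responder => ({ requester: newcomer, responder }))
}

const newOffers = offersForNewcomer(['Peer-1', 'Peer-2'], 'Peer-3')
console.log(newOffers)
// Peer-3 is the requester of both Offers; Peer-1 and Peer-2 are the responders.
```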

2.7 Simplified steps to create P2P connections in terms of implementation

What does the Offeror do?

  • Initialize the peer connection
  • Add the local media stream to the created peer connection
  • Create an offer and store it as the local description
  • Send the offer SDP and ICE Candidates to the Signaling server
  • Add an event listener for the incoming answer
  • Add an event listener to track the answerer’s incoming media stream
  • Add an event listener for the answerer’s incoming ICE Candidates
  • When the answer arrives, store it as the remote description and add the answerer’s ICE Candidates to the peer connection

What does the Answerer do?

  • Initialize the peer connection
  • Add the local media stream to the created peer connection
  • Receive the offer from the Signaling server and store it as the remote description
  • Create an answer and store it as the local description
  • Send the answer SDP and ICE Candidates back to the Signaling server
  • Add an event listener to track the offeror’s incoming media stream
  • Add an event listener for the offeror’s incoming ICE Candidates
  • Add the offeror’s ICE Candidates to the peer connection
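The order of the "store as local description" and "store as remote description" steps matters. The sketch below models it as a small state machine using a simplified subset of the WebRTC signaling states (stable, have-local-offer, have-remote-offer). The step names and the applyStep and runSteps helpers are made up for illustration; a real RTCPeerConnection tracks this state for you.

```typescript
type SignalingState = 'stable' | 'have-local-offer' | 'have-remote-offer'
type Step = 'setLocalOffer' | 'setRemoteOffer' | 'setLocalAnswer' | 'setRemoteAnswer'

// Allowed transitions in this simplified model.
const transitions: Record<SignalingState, Partial<Record<Step, SignalingState>>> = {
  stable: { setLocalOffer: 'have-local-offer', setRemoteOffer: 'have-remote-offer' },
  'have-local-offer': { setRemoteAnswer: 'stable' },
  'have-remote-offer': { setLocalAnswer: 'stable' },
}

// Apply one step, rejecting steps that are out of order.
function applyStep(state: SignalingState, step: Step): SignalingState {
  const next = transitions[state][step]
  if (!next) {
    throw new Error(`Cannot ${step} while ${state}`)
  }
  return next
}

// Run a sequence of steps starting from the initial "stable" state.
function runSteps(steps: Step[]): SignalingState {
  return steps.reduce<SignalingState>((state, step) => applyStep(state, step), 'stable')
}

// Offeror: local offer first, then the remote answer.
console.log(runSteps(['setLocalOffer', 'setRemoteAnswer']))
// Answerer: remote offer first, then the local answer.
console.log(runSteps(['setRemoteOffer', 'setLocalAnswer']))
```

Both sequences end in the stable state, which is the point at which the two peers have agreed on a session.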

Section 3 demonstrates a detailed implementation of the steps above.

3. Building demo: Group video call

3.1 Overview features

The source code of this demo is here. The basic features of the group video call include:

  • Create a new room or join a room with an existing room ID
  • Control the microphone and camera
  • When one user hangs up, the call continues until everyone has left the room
  • No limit on the number of participants

Next, we will go through how we implemented these features step by step.

3.2 Initialize React Native app

First, initialize the React Native project from the command line:

npx react-native@latest init YourProjectName

Install the react-native-webrtc module:

npm i react-native-webrtc

If you take a look at the source code, you may wonder why there are plenty of packages besides react-native-webrtc. That is because we used rn-base-project-typescript to generate the source template for quick initialization. It takes only one command to install all the packages needed to start the RN project and to generate a best-practice structure that is ready for implementation, without spending too much time on initialization. More information is here.

Back to our demo: you also need to do some extra steps depending on your target platform.

For iOS, add permissions for camera and microphone in Info.plist:

<key>NSCameraUsageDescription</key>
<string>Camera Permission description</string>
<key>NSMicrophoneUsageDescription</key>
<string>Microphone Permission description</string>

For Android, add permissions in AndroidManifest.xml:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.CHANGE_NETWORK_STATE" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

To run the project, run the command for your target platform:

npx react-native run-android

npx react-native run-ios

3.3 Setup Signaling server structure with Firebase Firestore

In this demo, we used Firebase Firestore as our Signaling Server.

For installing and configuring Firestore, I recommend checking the original React Native Firebase documentation, because it is already clear and informative. The document is here.

Next, we jump to the Firestore structure. First, we create a collection named Rooms to manage all room calls.

Each room key is the name of the user who created the room. Please note that we used the name as the room ID so that joining a room is faster, without copying and pasting a key across several devices. In real situations, we recommend using a UUID instead.

Each Room has 2 collections: Participants and Connections.

Participants is a collection listing all active participants in the room.

Each participant has information about their name and their microphone and camera status. If a participant turns off their microphone, the microphone value (mic) becomes false. The same logic applies to the camera value.

The Connections collection contains all peer-to-peer connections. For example, if there are 3 participants in the room, there should be 3 peer-to-peer connections in the list.

The number of connections is given by the following formula, where n is the total number of participants:

n(n - 1) / 2
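As a quick check of the formula, here is a minimal sketch. The function name connectionCount is made up for illustration.

```typescript
// Number of peer-to-peer connections when every participant connects to every other one.
function connectionCount(participants: number): number {
  if (!Number.isInteger(participants) || participants < 0) {
    throw new RangeError('participants must be a non-negative integer')
  }
  if (participants < 2) {
    return 0 // nobody to connect to
  }
  return (participants * (participants - 1)) / 2
}

console.log(connectionCount(3)) // 3
console.log(connectionCount(5)) // 10
```

The count grows quadratically, so each additional participant adds one more connection for every user already in the room.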

Each connection has 2 collections, answerCandidates and offerCandidates, and contains 4 key-value pairs:

  • answer: SDP of the responder
  • offer: SDP of the requester
  • requester: name of the requester
  • responder: name of the responder

answerCandidates holds the list of the answerer’s ICE Candidates, which the offeror tracks and adds to its PeerConnection. Likewise, offerCandidates holds the offeror’s ICE Candidates, which the answerer tracks and adds to its own PeerConnection.
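The structure described in this section can be summarized as TypeScript types. This is a sketch under assumptions: the type names are made up for illustration, and the document key format follows the `${requester}-${userName}` pattern used later in listenPeerConnections.

```typescript
// Session description as stored in Firestore (a plain object).
type SessionDescription = { type: string; sdp: string }

// Document in Rooms/{roomId}/Participants
type ParticipantDoc = { name: string; mic: boolean; camera: boolean }

// Document in Rooms/{roomId}/Connections.
// Each connection document also owns two sub-collections,
// offerCandidates and answerCandidates, holding ICE Candidates.
type ConnectionDoc = {
  requester: string
  responder: string
  offer: SessionDescription
  answer?: SessionDescription // absent until the responder replies
}

// Connection documents are keyed as "<requester>-<responder>".
function connectionId(requester: string, responder: string): string {
  return `${requester}-${responder}`
}

const joiningParticipant: ParticipantDoc = { name: 'Peer-3', mic: true, camera: false }

const pendingConnection: ConnectionDoc = {
  requester: joiningParticipant.name,
  responder: 'Peer-1',
  offer: { type: 'offer', sdp: 'placeholder offer SDP' },
}

console.log(connectionId(pendingConnection.requester, pendingConnection.responder)) // Peer-3-Peer-1
```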

So now we’re all set for the implementation part.

3.4 Construct states and refs

Location: src/screens/HomeComponent/HomeScreen.tsx

const [roomId, setRoomId] = useState('')
const [localStream, setLocalStream] = useState<MediaStream | undefined>()
const [userName, setUserName] = useState('')
const [screen, setScreen] = useState(Screen.CreateRoom)
const [remoteMedias, setRemoteMedias, remoteMediasRef] = useStateRef<{ [key: string]: MediaControl }>({})
const [remoteStreams, setRemoteStreams, remoteStreamsRef] = useStateRef<{ [key: string]: MediaStream }>({})
const [peerConnections, setPeerConnections] = useStateRef<{ [key: string]: RTCPeerConnection }>({})
const [totalParticipants, setTotalParticipants] = useState(0)
const [localMediaControl, setLocalMediaControl] = useState({
  mic: microphonePermissionGranted,
  camera: cameraPermissionGranted,
})

Explanation:

  • roomId: the room ID that the current user created or entered to join
  • localStream: the media stream of the user’s local device
  • screen: the type of screen currently displayed; the default value is CreateRoom

src/screens/HomeComponent/types.ts

export enum Screen {
  CreateRoom, // create or join room
  InRoomCall, // participate in room call
}

  • remoteMedias: a map of the microphone and camera status of other users’ remote devices, keyed by user name

export type MediaControl = {
  mic: boolean
  camera: boolean
}

  • remoteStreams: same as localStream, but for other users’ MediaStreams
  • peerConnections: all P2P connections between the current user and other users
  • totalParticipants: the number of users in a room call
  • localMediaControl: same as remoteMedias, but for the local user; it represents the status of the user’s local media devices
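Because remoteMedias is a map keyed by user name, updates to it are easiest to reason about as pure functions that return a new object. The following is a minimal sketch; the helper names upsertRemoteMedia and removeRemoteMedia are made up for illustration and do not appear in the demo source.

```typescript
type MediaControl = { mic: boolean; camera: boolean }
type RemoteMedias = { [key: string]: MediaControl }

// Add or replace the status of one remote user without mutating the previous map.
function upsertRemoteMedia(prev: RemoteMedias, userName: string, control: MediaControl): RemoteMedias {
  return { ...prev, [userName]: control }
}

// Drop one remote user, for example after they leave the room.
function removeRemoteMedia(prev: RemoteMedias, userName: string): RemoteMedias {
  const next = { ...prev }
  delete next[userName]
  return next
}

const emptyMedias: RemoteMedias = {}
const withAlice = upsertRemoteMedia(emptyMedias, 'alice', { mic: true, camera: false })
const withoutAlice = removeRemoteMedia(withAlice, 'alice')

console.log(withAlice, withoutAlice)
```

Functions of this shape can be passed to a state setter in the form setRemoteMedias(prev => upsertRemoteMedia(prev, name, control)), which matches the prev => ({ ...prev, ... }) pattern used in the demo.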

3.5 Request camera & microphone permission

Check and request permission with custom hook from hooks/useRequestPermissions.ts

const {
  // Boolean value if permission is granted
  cameraPermissionGranted,
  microphonePermissionGranted,
  // request permission methods
  requestMicrophonePermission,
  requestCameraPermission,
} = usePermission()

The layout when requesting permission:

After permission is granted, we call the openMediaDevices method to show our local MediaStream:

const openMediaDevices = useCallback(async (audio: boolean, video: boolean) => {
  // get media devices stream from WebRTC API
  const mediaStream = await mediaDevices.getUserMedia({
    audio,
    video,
  })

  // init peer connection to show user's track locally
  const peerConnection = new RTCPeerConnection(peerConstraints)

  // add track from created mediaStream to peer connection
  mediaStream.getTracks().forEach(track =>
    peerConnection.addTrack(track, mediaStream)
  )

  // set mediaStream in localStream
  setLocalStream(mediaStream)
}, [])
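The peerConstraints object passed to RTCPeerConnection is not shown in this article. A typical configuration, sketched here under assumptions, lists the ICE servers (STUN or TURN) mentioned in Section 2.2. The local type names are made up so the example is self-contained, the STUN URL is a commonly used public server, and the commented-out TURN entry is a hypothetical placeholder, not a real server.

```typescript
// Minimal local types so this sketch does not depend on react-native-webrtc.
type IceServer = { urls: string | string[]; username?: string; credential?: string }
type PeerConstraints = { iceServers: IceServer[] }

const peerConstraints: PeerConstraints = {
  iceServers: [
    // Public STUN server, used to discover the device's public address.
    { urls: 'stun:stun.l.google.com:19302' },
    // Hypothetical TURN server for networks where a direct connection fails:
    // { urls: 'turn:turn.example.com:3478', username: 'user', credential: 'secret' },
  ],
}

console.log(peerConstraints.iceServers.map(server => server.urls))
```

STUN alone is usually enough for testing on ordinary home networks; restrictive networks generally need a TURN relay as well.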

As a result, our camera preview is displayed on the screen:

3.6 Create room

The default screen is the CreateRoom screen. Once the user has entered a name, the “Create room” button is enabled and the user can tap it. This action triggers the createRoom function:

const createRoom = useCallback(async () => {
  // create room with current userName and set createdDate as current datetime
  const roomRef = database.collection(FirestoreCollections.rooms).doc(userName)
  await roomRef.set({ createdDate: new Date() })

  // add the current user to the participants collection of room "userName"
  roomRef.collection(FirestoreCollections.participants).doc(userName)
    .set({
      // control mic and camera status of current user's device
      mic: localMediaControl?.mic,
      camera: localMediaControl?.camera,
      name: userName,
    })

  setRoomId(roomRef.id) // store new created roomId
  setScreen(Screen.InRoomCall) // navigate to InRoomCall screen

  // add listener to new peer connection in Firestore
  await listenPeerConnections(roomRef, userName)
}, [userName, localMediaControl, listenPeerConnections])

The function navigates to the InRoomCall screen immediately.

To explain further: a user who joins is the one who sends an offer to each existing participant. That is why createRoom only contains a method to listen for new incoming connections. It would be redundant to create an offer and initiate a peer connection while no one else is in the room.

listenPeerConnections does the following:

  • Listen for new changes in Connections and loop through them to find Offers sent to the current user. The current user can have multiple connections, so the data is created one by one during the loop. Each connection is distinguished by the requester’s name, which is used as the key in the states and refs defined above.

const listenPeerConnections = useCallback(async (roomRef, userName) => {
  roomRef.collection(FirestoreCollections.connections).onSnapshot(
    connectionSnapshot => {
      // looping changes from collection Connections
      connectionSnapshot.docChanges().forEach(async change => {
        if (change.type === 'added') {
          const data = change.doc.data()
          // find connections that request answer from current user
          if (data.responder === userName) {

  • Get control and MediaStream data from requester to store in remoteMedias and remoteStreams

if (data.responder === userName) {
  // get requester's location from collection Participants
  const requestParticipantRef = roomRef
    .collection(FirestoreCollections.participants)
    .doc(data.requester)

  // get requester's data from requester's location
  const requestParticipantData = (await requestParticipantRef.get()).data()

  // store requester's control status in remoteMedias
  setRemoteMedias(prev => ({
    ...prev,
    [data.requester]: {
      mic: requestParticipantData?.mic,
      camera: requestParticipantData?.camera,
    },
  }))

  // init requester's remoteStream to add track data from Peer Connection later
  setRemoteStreams(prev => ({
    ...prev,
    [data.requester]: new MediaStream([]),
  }))

  • Init PeerConnection and control MediaStream for both requester and responder

// init PeerConnection
const peerConnection = new RTCPeerConnection(peerConstraints)

// add current user's stream to created PC (Peer Connection)
localStream?.getTracks().forEach(track => {
  peerConnection.addTrack(track, localStream)
})

// get requester's MediaStream from PC
peerConnection.addEventListener('track', event => {
  event.streams[0].getTracks().forEach(track => {
    // read through the ref so the listener sees the latest streams, not a stale closure
    const remoteStream = remoteStreamsRef.current[data.requester] ?? new MediaStream([])
    remoteStream.addTrack(track)

    // and store in remoteStreams as it's initialized before
    setRemoteStreams(prev => ({
      ...prev,
      [data.requester]: remoteStream,
    }))
  })
})

  • Process the SDP data inside the Offer and Answer

// get location of connection between requester and current user
const connectionsCollection = roomRef.collection(FirestoreCollections.connections)
const connectionRef = connectionsCollection.doc(`${data.requester}-${userName}`)

// get data from requester-user's connection
const connectionData = (await connectionRef.get()).data()

// receive offer SDP and set as remoteDescription
const offer = connectionData?.offer
await peerConnection.setRemoteDescription(offer)

// create answer SDP and set as localDescription
const answerDescription = await peerConnection.createAnswer()
await peerConnection.setLocalDescription(answerDescription)

// send answer to Firestore
const answer = {
  type: answerDescription.type,
  sdp: answerDescription.sdp,
}
await connectionRef.update({ answer })

  • Collect and add ICE Candidates for both users

// create answerCandidates collection
const answerCandidatesCollection = connectionRef.collection(
  FirestoreCollections.answerCandidates
)

// add current user's ICE Candidates to answerCandidates collection
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate) {
    answerCandidatesCollection.add(event.candidate.toJSON())
  }
})

// collect Offer's ICE candidates from offerCandidates collection and add in PC
connectionRef
  .collection(FirestoreCollections.offerCandidates)
  .onSnapshot(iceCandidateSnapshot => {
    iceCandidateSnapshot.docChanges().forEach(async iceCandidateChange => {
      if (iceCandidateChange.type === 'added') {
        await peerConnection.addIceCandidate(
          new RTCIceCandidate(iceCandidateChange.doc.data())
        )
      }
    })
  })

  • Store Peer Connection

setPeerConnections(prev => ({
  ...prev,
  [data.requester]: peerConnection,
}))
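The filtering step at the start of listenPeerConnections (only newly added connections whose responder is the current user) can be isolated as a pure function, which makes it easy to test without Firestore. This is a sketch under assumptions: the types below are simplified stand-ins for Firestore document changes, where the real API exposes the data through change.doc.data(), and the function name incomingRequesters is made up for illustration.

```typescript
// Simplified stand-ins for Firestore document changes.
type ConnectionData = { requester: string; responder: string }
type ConnectionChange = { type: 'added' | 'modified' | 'removed'; data: ConnectionData }

// Names of the users whose newly added Offers are addressed to `userName`.
function incomingRequesters(changes: ConnectionChange[], userName: string): string[] {
  return changes
    .filter(change => change.type === 'added' && change.data.responder === userName)
    .map(change => change.data.requester)
}

const sampleChanges: ConnectionChange[] = [
  { type: 'added', data: { requester: 'Peer-3', responder: 'Peer-1' } },
  { type: 'added', data: { requester: 'Peer-3', responder: 'Peer-2' } },
  { type: 'modified', data: { requester: 'Peer-2', responder: 'Peer-1' } },
]

console.log(incomingRequesters(sampleChanges, 'Peer-1'))
// Only Peer-3's new Offer is picked up; the modified connection is ignored.
```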

So we’re done with listening for new connections. Let’s move on to how the Offeror creates offers for the Answerers.

3.7 Join room

The scenario is different here because the current user is now the one who joins the room: they enter the existing room ID and tap the “Join” button.

….

Read more at: https://saigontechnology.com/blog/how-to-build-group-video-call-with-react-native-webrtc
