Secure voice communication: The lay of the land in open source

Gus Andrews
Apr 15, 2015

--

Originally published November 6, 2014, at 12:42 pm at openitp.org.
Download a PDF of this report.

Activity: Market research on open-source alternatives to VoIP clients.
Takeaway:
No one tool provides end-to-end encryption, with open-source code, across mobile and desktop platforms. Many secure voice apps have severe usability issues, and some of these issues make it likely that users will make partially-encrypted or unencrypted calls without realizing what they’re doing.

Introduction

At OpenITP’s Secure User Practices (SUP) project, 2014 was the year of Internet voice calls. Every day people use Skype, Google Hangouts, Apple’s FaceTime, and other multimedia apps to organize sensitive activities in difficult places. None of those options provides open-source, end-to-end security. That is to say, each one requires you to trust your service provider not to eavesdrop on your calls. We tried to find out what role free software plays in this landscape.

To that end, we collaborated with teams working on open-source voice clients. By getting our hands dirty working directly with the projects, we hoped to discover opportunities for OpenITP’s community to get involved. To really understand these tools, we performed user tests on a couple of projects, Linphone and CSipSimple, in July. A report from those tests is available on the SUP blog, and is the main basis for comments on usability in this report.

We surveyed the field and noted which tools focus on business users, which try to serve the role of traditional telephone networks, and which primarily aim at serving some kind of public interest. In the process we gained a clearer sense of how large and how active these teams are and were able to assess their efforts to support users.

For historical reasons, a good deal of secure voice client development is centered in France. Jitsi, Linphone, and CSipSimple all have key developers there. American developers care about voice too. The Guardian Project, which also works on CSipSimple, and Ostel.co are centered in New York City while Open Whisper Systems’s developers are predominantly on the West Coast.

After looking into each of these clients, we found one strong end-to-end encryption option, Redphone/Signal, from Open Whisper Systems, and one promising replacement for Google Hangouts and Skype: Jitsi Meet. In user testing, we found that other options, like Linphone, Jitsi desktop, and CSipSimple, unfortunately make it too easy for users to fall into using an end-to-server SIP connection without realizing their call is essentially only half encrypted. Many secure voice apps have severe usability issues, in part because they prioritize SIP without making configuration easier for users.

Additionally, we know of only one trusted, ready-to-go service provider which both provides end-to-end encryption and makes its server-side code open and usable for others — Ostel.co — and its development appears to have slowed. Open Whisper Systems has yet to release their server-side code for Redphone to the public, making it difficult to set up a server which is compatible with their client. Other open-source SIP server setups are available, but mere availability is not the same thing as a deployed, redundant ecology of trusted service providers which users can switch to or from as servers are compromised, attacked, or shut down.

The result is that it is far too easy for users to stumble into a frustratingly unworkable tool, an end-to-server encrypted connection which is effectively only half encrypted, and/or a fully encrypted solution with the vulnerability of being routed over a single, closed-source point of failure. We are hopeful about Redphone/Signal and Jitsi Meet, however, and also the possibility of future development related to Ostel.co.

We hope this report on our findings will give insight into where effort might be effective, and which hurdles remain to be overcome.

See the end of the report for a glossary of technical terms.

Tool by Tool

Open Whisper Systems

Open Whisper Systems’s Redphone/Signal app has been hailed as one of the most usable in the field. Their team has made active efforts to ensure their tools are well-designed and usable by everyday people who don’t have technical assistance ready to hand. Meanwhile, their text message offering, Textsecure, interfaces seamlessly with a user’s existing text-message system and contacts, and enables encrypted texting (with other users who have the app) with a minimum of hassle. It is, in fact, one of the few apps in our field that our staff find usable on a daily basis.

Open Whisper Systems is in the process of turning their Redphone secure voice app (which uses ZRTP) and their Textsecure SMS app into Signal, a combined solution for mobile devices. Redphone has previously been available only for Android; Signal is now available for iOS. The team hopes to be able to make integrated Signal available for Android sometime in 2015.

Redphone, Textsecure, and Signal connections go through Whisper Systems’ own servers; users cannot use these tools to connect to other SIP servers. This is a trade-off: while Redphone provides end-to-end encryption, its servers are essentially a single point of failure vulnerable to attack. Other developers in the space have expressed a desire to integrate Redphone into the secure voice call ecology, but Whisper Systems has not, as of yet, released the server-side source code for Redphone. Specifications for the Open Whisper Systems servers are available, meaning developers could make their own servers with similar specs, as well as custom Redphone clients; however, this would involve more guesswork and prove more challenging than if the code were available, and the result might not be as reliable. (The server-side code for Textsecure is available, meanwhile, and there are other entities running Textsecure servers.)

Another potential drawback of Redphone/Signal is that the services rely on Google Cloud Messaging and Apple Push Notification for signaling. One developer in our field suggested that the Google dependency, in particular, might make these services unavailable in China and Iran. Whisper Systems counters that Google Cloud Messaging still works over VPNs, so this should not be a problem. However, this would require users to know how to set up a VPN and route their calls over it, an extra technical step.

Whisper Systems handles user help via a searchable online knowledge base to which users can pose questions that have not yet been answered. Staff are active in addressing user questions. Their projects are participating in Transifex; Textsecure has been translated into over 50 languages.

About 10 developers are contributing regularly to these projects; see the following for more detail:

https://www.openhub.net/p/redphone

https://github.com/WhisperSystems/RedPhone/pulse/weekly

https://github.com/WhisperSystems/Signal-iOS/pulse/weekly

https://github.com/WhisperSystems/TextSecure/pulse/weekly

Jitsi and Jitsi Meet

Jitsi provides a desktop application for Windows, Mac, and Linux, plus an experimental alpha for Android. The desktop app and the alpha provide secure video, voice, and chat via SRTP and ZRTP. Users may connect to any SIP service they choose, as well as a number of popular chat protocols. Both end-to-end and end-to-server encryption are possible with this tool.

The Jitsi team has also developed the browser-based Jitsi Meet application, which supports multi-party audio and video calls. Both Meet and the other applications run on the Jitsi Videobridge router.

The Jitsi team is focusing their development work on Jitsi Videobridge and on Meet at the moment. They are aware of the usability problems with their desktop client, which is far more difficult to set up. Generally, they have encouraged OpenITP to focus any usability research on Jitsi Meet, their web product, as that is where the biggest wins for ease-of-use can be achieved. We have suggested a few changes to the desktop client’s interface, however, and they have been able to address some of them.

Jitsi Meet compares favorably to Google Hangouts and similar popular tools in its simplicity and ease of use. In fact, because Meet is a standalone service which sets up a call immediately when a user accesses the service, it can be faster to set up than Hangouts. (Our trainers and developers have groused that setting up a Google Hangout takes a good amount of hunting around various Google services before one remembers how to start a meeting.)

Meet currently only works on Chrome. Protocol solutions which would allow the service to work on Firefox have been filed as feature requests with Mozilla. The Jitsi team has been in touch with Mozilla, and they expect to be able to start testing Meet with Firefox in December 2014.

Meet uses DTLS/SRTP, not ZRTP, as browsers do not support ZRTP natively. Supporting ZRTP would involve using JavaScript, and given that JavaScript also has its vulnerabilities, the Jitsi team doesn’t see that as an improvement over offering DTLS/SRTP.

Meet’s encryption runs from sender to bridge and from bridge to recipient. Neither DTLS/SRTP nor ZRTP can provide end-to-end encryption for multi-party calls, which Meet supports, so end-to-server encryption is a consequence of the protocol. As a result, if communication happens via a compromised server, the server is a point of vulnerability. It will be possible for users to run their own Meet servers, however, eliminating the need to trust third-party infrastructure. Jitsi currently runs one, and the Jitsi team set up a server for OpenITP and the Open Technology Institute to test and experience. Other Meet servers are run by the IETF, UNICEF, and RENATER, the organization which provides Internet service to French universities and research institutes.
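The difference between these arrangements can be made concrete with a small model. The following Python sketch is illustrative only: the `Hop` class and the `exposed_parties` function are hypothetical names, not part of Jitsi or any SIP library, and the model ignores metadata such as who called whom. It simply lists which parties, beyond the two callers, are in a position to see call contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One leg of a call, for example caller -> server (hypothetical model)."""
    sender: str
    receiver: str
    encrypted: bool  # is this leg encrypted on the wire?


def exposed_parties(hops, end_to_end):
    """Return who, besides the two callers, could see the call contents.

    hops: the legs of the call in order, first caller to last recipient.
    end_to_end: True if only the two callers hold the media keys.
    """
    if end_to_end:
        # Only the callers can decrypt, whatever happens on each leg.
        return set()

    callers = {hops[0].sender, hops[-1].receiver}
    exposed = set()
    for hop in hops:
        if not hop.encrypted:
            # Anyone watching this leg sees the contents.
            exposed.add(f"network between {hop.sender} and {hop.receiver}")
        for party in (hop.sender, hop.receiver):
            if party not in callers:
                # An intermediate server decrypts what it relays.
                exposed.add(party)
    return exposed


# A bridge-style call: both legs encrypted, but the bridge sits in the middle.
bridge_call = [Hop("alice", "bridge", True), Hop("bridge", "bob", True)]
print(exposed_parties(bridge_call, end_to_end=False))

# A "half encrypted" call: the second leg is in the clear.
half_call = [Hop("alice", "server", True), Hop("server", "bob", False)]
print(exposed_parties(half_call, end_to_end=False))
```

In this model, running your own bridge does not remove the bridge from the exposed set; it changes who operates it.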

The primary audience for the Jitsi desktop client, meanwhile, is businesses. Many of Jitsi’s users are in an office, with available tech support to help them with configuration. Unsupported end users may find it difficult to set up the desktop client for secure voice calling. Trainers we work with have expressed a wish that the team would improve and simplify the client’s interface and configuration. The Jitsi team counters that in order to give users the option to connect to a wide range of SIP servers rather than a few dedicated ones, a desktop client needs to have a great deal of configurability, with options very close to the surface. Regardless, the result is a desktop client that is difficult to configure and use.

The Jitsi team consists of 10 developers who work via the BlueJimp company and about 3–5 volunteers at a given time. Jitsi provides user support through its mailing lists and IRC channel. Developers are responsive to users in need. Their documentation is static (rather than question-and-answer), but it does include videos and screenshots.

Users who want to report problems face difficult hurdles. Jitsi’s system for vetting bugs requires a good deal of commitment from users, as is true of the many open-source projects which handle bugs via an email list. Users are asked to check in with developers and look at already-known issues, subscribe to the user list, send in their bug report, and get a developer to adopt the bug. Bugs which cannot be immediately addressed are backed up in their tracker.

Linphone

Linphone has SIP voice clients for Windows, Mac, Linux, Android, iOS, and Windows Phone (though the latter is released under a proprietary license). They recently added a browser-based client as well as video capabilities. Linphone offers a free SIP account service, but users may also select other SIP servers. Both end-to-end and end-to-server encryption are possible with this tool, and it supports SRTP and ZRTP. All versions are available in English and French; the iOS version is also available in Russian.

As per a message on Linphone’s user list in August, developer Guillaume Bienkowski counts the number of developers actively contributing at eight (which seems about right). Linphone’s site notes that the tool was launched in 2001, and claims it was the first open-source SIP client on Linux.

Linphone development is supported by a company called Belledonne Communications. Our communications with Belledonne have only managed to raise one person, an account manager. Mail to the Linphone developer list about our user research on their tool did not elicit any response from the developer team, though Guillaume Bienkowski, who appears to be the person closest to the developers, sometimes responds on that list.

Linphone’s account manager wrote, “Our users are mainly geeks and developers and their questions are advanced.” They do not appear to think of their tool as primarily serving free-speech activists, journalists, or others at risk who may not have advanced technical skills.

The Linphone team provides user support via a mailing list, and also has a lightweight FAQ and user guide. Discussion on the user list tends to be highly technical, with discussions of compiling and debugging the code. Comments on the list in August suggested that users are feeling unsupported and looking for other SIP tools (and not finding Jitsi to be a workable replacement). Response time from Linphone devs to issues raised on the list can be long, sometimes taking weeks.

Usability specialists participating in our July tests noted that the Linphone interface is somewhat idiosyncratic, not following the standards of the mobile OSes on which we tested it. This caused users some consternation.

CSipSimple

CSipSimple is available for Android. SRTP and ZRTP are both available options. Users may select from a dizzyingly wide range of SIP services, including ostel.co, which is run by a Guardian Project-affiliated developer. Both end-to-end and end-to-server encryption are possible with this tool.

We know of two CSipSimple developer efforts: one led by the Guardian Project, a development team which focuses primarily on tools for journalists, activists, and human rights workers, and one led by Régis Montoya. Both currently appear dormant. While Régis has been responsive to our queries about usability and was enthusiastic about the results of our user testing, he notes that he doesn’t currently have time to work on CSipSimple, and he has been a one-person team on that project. The Guardian Project’s fork of the project was worked on by Lee Azzarello, who also maintained Ostel.co. However, Lee is no longer officially with the Guardian Project. For more on Ostel.co, see below.

Nathan Freitas at Guardian is unsure whether he will commit more resources to developing CSipSimple or Ostel.co. He says he has considered combining CSipSimple and Chatsecure, but he also sees Redphone, Textsecure and Signal addressing the secure-cell-phone-replacement space for most users.

The Guardian team offers user support through a range of channels which are relatively easy to find on their site (though the best way to get a response is not clear): a support email address, a (relatively technical) bug reporter, IRC, and email lists.

Ostel.co

Ostel.co is currently supported by Lee Azzarello at Open Hosting. This SIP service provider was formerly funded and supported by the Guardian Project, but that team has moved away from supporting back-end infrastructure.

Open Hosting is a commercial company, and management of Ostel.co is not currently a large part of Lee’s role there. But the company is seeing strong and increasing business interest in secure calling solutions and could develop Ostel.co further for business interests in the future, provided they could also devote resources to improving on an existing voice client to the point where it is more usable. Lee sees Linphone as the most user-friendly front-end for Ostel.co, though it also needs usability improvements. He forked a version of Linphone on GitHub while working at the Guardian Project last year, though again, he is not actively working on development at this time.

Ostel.co’s help page redirects to the Guardian Project’s bug tracker. The website is currently only in English. It has been translated into five other languages, but Lee has not had time to get those on the web yet.

Analysis of potential directions for the field

The priorities of Jitsi, Linphone, and Ostel.co suggest that in many cases secure voice development is being driven by the needs of business, not of end-user consumers. The result is SIP clients which confront users with configuration options: a mess of possible servers, protocols, and even video settings which are meant to be left to an IT professional. None of these tools currently offers users end-to-end encryption from the start, with end-to-server SIP options moved to the background. With Guardian, Régis, and Ostel.co not currently developing clients, the end-user-friendly options are going to be Jitsi Meet and Whisper Systems’ Redphone and Signal tools. Meet has technical barriers to providing end-to-end encryption. Redphone and Signal run over a single system and may be unavailable in Iran and China, and their server-side source code is not currently open for others to view or modify, which makes them hard for other developers to interface with.

Our user tests on Linphone and CSipSimple in July indicated that, left to their own devices in these two apps’ complicated setup process, users would not reliably find their way to an end-to-end encrypted calling setup — or sometimes even to encrypted calling options at all. Both apps accidentally nudged users towards making calls over the traditional phone network by presenting users with a dial pad and failing to make it clear how to call alphanumeric account names. At best, these calls will be encrypted end-to-server. At worst, users might end up believing that phone-network calls could be made secure.

Signal appears to be cutting this Gordian knot of calling options by giving users only one server to connect to. This should ensure users follow a path of end-to-end encryption in a way the other applications do not.

However, the single service provider has the risk of becoming a central point of failure. It would be beneficial to have other tools in the ecology which can allow for connection to more networks. Still, the existing options should do more to provide users a frictionless path to end-to-end security.

Jitsi Meet also offers a more-usable direction for secure voice and video. It does, however, require users to know where to find a trusted server, and there is some risk of a man-in-the-middle attack without ZRTP available.

Additionally, testing in our office has revealed some of Jitsi Meet’s current hangups. A few of our trainers and colleagues who have tried the tool found it too slow to be usable. Conversations with the Jitsi team have indicated possible culprits. One possibility may be distance from the server. Another could be the fact that at present, Jitsi serves up the highest-quality-possible video to all users. This is bound to be a problem for anyone on a poor connection or a low-end computer. Jitsi Meet is quite new, and their team will be working to address these issues. However, trainers have expressed skepticism that the problems could be overcome for users in areas with low-quality connections and computers, where streaming video is difficult. (Video can be turned off and audio streamed alone in Meet, however.)

If it is determined that alternatives to Signal and Meet should remain viable in the ecology, Régis and/or Lee/Ostel.co should be supported to revamp CSipSimple or Linphone. This would mean developing a modern, standardized, usable interface; hiding advanced configuration options; presenting a clear path to end-to-end encryption as the primary use case; and translating confusing system messages into plain, understandable language (users should see “User is not online” when that is true, not “User could not be found,” for example). This would go a long way toward developing a viable client.

A recent conversation suggested that Open Whisper Systems would also like to be more certain that users understand when Textsecure/Redphone/Signal communications are encrypted and when they are not. User tests on this fundamental issue would enlighten other developers in this space, not just the Whisper Systems team.

Other open-source tools for secure voice

Blink http://icanblink.com For Windows and Linux; has a Pro version for Mac. Mac version has ability to sync with a number of Mac services, like address book and iCloud. Supports SIP and SRTP, TLS for chat and file transfer.

Simlar https://www.simlar.org/en/ — In alpha. Available for Android and iOS. Traffic appears to go over one server, located in Germany. Could not find information about its encryption protocol. 2–3 developers working on the project.

Mumble http://wiki.mumble.info/wiki/Main_Page — For Windows, Mac, Linux, iOS, and Android. Primarily targeted at the gaming community, and as such has advanced features for audio and group management which our users may not need. Uses TLS for encryption.

Tox https://tox.im/ — Currently in very early stages of development. Planning to do video as well as voice. Mentions RTP as the protocol; not clear what else they’re using. Has no user support options yet. Developers chat over IRC and Reddit.

SFLPhone https://projects.savoirfairelinux.com/projects/sflphone — For Linux, and it appears they are working on Android as well. “The SFLphone project’s goal is to create a robust enterprise-class desktop phone. While it can serve home users very well, it is designed with a hundred-calls-a-day receptionist in mind.” Another project with a French-language developer base; Github indicates it is based in Montreal. Supports TLS, SRTP, and ZRTP; does video but it’s not clear if it’s encrypted. Apparently available in Russian, Chinese, and Vietnamese as well as English, French, Spanish, and other languages.

Previous research

The Guardian Project has written up a review of open source telephony, for the purposes of their own grant work.

Glossary of terms

SIP — a way (or “protocol”) of managing audio and video communications online; one kind of VoIP (voice over internet protocol).

TLS — a way (“protocol”) of encrypting SIP which relies on pre-established certificates to verify the authenticity of participants in the call. This model is considered vulnerable to faked certificates, and less secure than SRTP/ZRTP.

SRTP and ZRTP: SRTP is an encrypted version of RTP, a way (“protocol”) of transmitting audio and video data. With SRTP alone, the encryption keys are arranged behind the scenes, with nothing for users to check, which is believed to make it vulnerable to man-in-the-middle attacks. ZRTP is a way of agreeing on SRTP keys directly between the two callers. It is considered more secure, as each caller’s software displays a short verbal key which the users read aloud to each other; they can then choose to terminate a call where the software has generated mismatched keys.

End-to-server encryption: Encryption that only happens between one user and the server their call or message initially goes to. The message is not encrypted between that server and the recipient of the message, making it vulnerable to attack on that half of the communication. Useful to companies doing business abroad: if they are doing business in China and do not want their trade secrets to be captured by the Chinese government, encrypting communication between employees in China and a server in the US presumably offers the protection they need.

End-to-end encryption: Encryption available over the entirety of an online communication, from sender to receiver and everything in between.
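To illustrate the verbal-key check described in the ZRTP entry above, here is a toy Python sketch. It is not ZRTP's actual key agreement or its real short-string encoding: the tiny Diffie-Hellman parameters, the function names, and the five-character string are all assumptions chosen for readability. It shows only the general idea that two callers who share a secret directly see the same short string, while callers who each unknowingly share a secret with an interceptor see different ones.

```python
import hashlib

# Toy Diffie-Hellman parameters, far too small for real use (assumption).
P = 2**127 - 1  # a Mersenne prime
G = 3


def public_key(private):
    """Public value a party sends to the other side."""
    return pow(G, private, P)


def shared_secret(my_private, their_public):
    """Secret computed from my private value and the public value I received."""
    return pow(their_public, my_private, P)


def short_auth_string(secret):
    """Reduce a shared secret to a few characters a person can read aloud."""
    digest = hashlib.sha256(secret.to_bytes(16, "big")).hexdigest()
    return digest[:5]


def direct_call(alice_private, bob_private):
    """Alice and Bob exchange public values with each other, unmodified."""
    alice_sas = short_auth_string(shared_secret(alice_private, public_key(bob_private)))
    bob_sas = short_auth_string(shared_secret(bob_private, public_key(alice_private)))
    return alice_sas, bob_sas


def intercepted_call(alice_private, bob_private, mallory_private):
    """Mallory swaps in her own public value on both sides of the call."""
    mallory_public = public_key(mallory_private)
    alice_sas = short_auth_string(shared_secret(alice_private, mallory_public))
    bob_sas = short_auth_string(shared_secret(bob_private, mallory_public))
    return alice_sas, bob_sas


a, b = direct_call(1234567, 7654321)
print("direct call strings match:", a == b)

a, b = intercepted_call(1234567, 7654321, 1111111)
print("intercepted call strings match:", a == b)
```

The check only helps if the callers actually read the strings to each other and hang up on a mismatch, which is why it matters how clearly an app presents this step.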

--

Gus Andrews

Researcher, educator, and speaker on human factors in tech. My policy work has been relied on by the EFF and US State Department. Author of keepcalmlogon.com