Talk to your Rocket.Chat today! Say hello to the newly integrated speech-to-text (GSoC 2018 project)

Gabriel Engel
August 1, 2018
·
min read

This series of blog posts will share the Google Summer of Code projects that our 2018 cohort of students are working on. Community members are encouraged to join them in creating and testing these exciting new areas of features.
Each project will have their open source repository listed. Please feel free to drop in and say hello, or join in the action with your own issue or code contribution.

Talk to your Rocket.Chat today with newly integrated speech-to-text

Student: Karan Bedi


Mentors: Gabriel Delavald, Pierre Lehnen

Welcome back to our 2018 GSoC blog series! Up next is Karan’s project that will enable you to literally talk to your Rocket.Chat. He developed this speech-to-text integration to connect some of the well-known speech-to-text APIs currently on the market directly with Rocket.Chat. The project also has wider implications for the accessibility of Rocket.Chat in the near future. It integrates open source speech-to-text engines for users with disabilities, as well as those who prefer an on-premise, full-featured installation.

Integrated Speech to Text

Solving Rocket.Chat Roadies’ #1 problem

One of Rocket.Chat’s long time collaborators spends a good part of every working day on the road and cannot safely send typed messages so often relies on recording audio messages. However, noisy environments often make this difficult and a solution was needed.

Some advanced mobile devices (and desktop PCs) have native speech-to-text technology and can resolve part of the issue. Moving speech-to-text technology so that it resides completely within Rocket.Chat will henceforth allow the solution to be enjoyed by everyone – regardless of the access device.

Karan’s project adds this long-demanded feature to Rocket.Chat. For his project, Karan used the Google Voice API to communicate what the recorded message sent by the user says, the engine then decodes the audio and translates it into text, all within a matter of seconds. The bonus is that Google Voice API can be swapped for other engines like IBM Watson or Bing Speech API because they all work almost the same (you send audio, they return text).

Speech API

Crafting the user experience – UI/UX improvements

One of the biggest challenges for alterntative input such as speech is the design and crafting of an actually usable user interface. Just translating existing visual designs typically does not result in acceptable user experience.

This project aimed at improving this by configuring the API/engine’s connection and usage attributes, and giving Rocket.Chat users the ability to click a button, record a message, send to the desired API/engine and return the results to the Rocket.Chat editing message box. In short, users will now have a fast, in-platform speech-to-text experience that facilitates user multi-tasking or those who do not wish to type their messages because they are busy doing something else, satisfying the Rocket.Chat roadies’ demand.

Blasting through technical challenges

Speech-to-text is a technology that has gained popularity recently, with some providers like Google, Microsoft and IBM improving their engines and providing some open APIs to translate audio files into textual content. However, since this is pretty new, there are still some challenges when trying to work with different providers, like different encodings and file formats that are expected on different APIs.

Karan’s idea was to provide an easy way to connect different speech-to-text solutions to Rocket.Chat. Currently we have only implemented the speech-to-text feature using the Google Speech API, but it is easily replaceable with other engines such as IBM Watson or Bing Speech API; we will investigate these other solutions at a later date.

Rocket.Chat is carrying out a review to make sure everything works properly but hope to go live with this feature towards the end of September to accompany that release.

A long and winding road

Karan has been an active open source community member within Rocket.Chat even before the official beginning of GSoC. He is the author of many well-received pull requests – landing him esteemed respect from many team members.

But Karan’s journey with us on GSoC 2018 has been rather eventful. He started with our core team on the Global Search project. But due to some of our own internal logistic problems, the project’s scope increased and it became unfeasible for it to continue as a GSoC project.

Thanks to Karan’s tenacity and agility, the crisis was averted. We were able to quickly pair him with his second winning GSoC proposal – speech-to-text APIs. Karan ended up working with two sets of mentors in his short GSoC 2018 term.

This exposure to a wide cross-section of the Rocket.Chat team will hopefully facilitate his future endeavours with our open source project.

About the creator: Karan Bedi

Karan Bedi is an undergrad student pursuing a major in Applied Mathematics at IIT Roorkee. In his own words:

Programming, good food, music and travelling are what get me going. My areas of interests are financial mathematics, numerical analysis, machine learning and natural language processing.

Karan Bedi

Karan BediApplied Mathematics student

Karan says that taking part in GSoC 2018 with Rocket.Chat over the summer has been ‘one of the best experiences’ of his life. He wants to continue contributing to open source software and hopes to become an increasingly active member of the OSS community.

Any parting thoughts Karan?

I sincerely thank the organization members for their continuous support in project and non-project related matters.

Karan Bedi

Karan BediApplied Mathematics student

Great news! Thank you for your contribution to Rocket.Chat this summer!

Get started with Rocket.Chat’s secure collaboration platform

Talk to sales

Frequently asked questions about <anything>

Gabriel Engel is the CEO and co-founder of Rocket.Chat, the leading open source communications platform.
Gabriel Engel
Related Article:
Team collaboration: 5 reasons to improve it and 6 ways to master it
Want to collaborate securely with your team?
Deploy Rocket.Chat on-premise or in the cloud and keep your conversations private.
  • Digital sovereignty
  • Federation capabilities
  • Scalable and white-labeled
Talk to sales
Looking for a HIPAA-ready communications platform?
Enable patients and healthcare providers to securely communicate without exposing their data.
  • Highly scalable and secure
  • Full patient conversation history
  • HIPAA-ready
Talk to sales
The #1 communications platform for government
Deploy Rocket.Chat on-premise, in the cloud, or air-gapped environment.
  • Digital sovereignty
  • Trusted by National Geospatial-Intelligence Agency (NGA), the US Army, the US Navy, and the US Air Force
  • Matrix federation capabilities
Talk to sales
Want to customize Rocket.Chat according to your own preferences?
See behind the engine and change the code how you see fit.
  • Open source code
  • Highly secure and scalable
  • Unmatched flexibility
Talk to sales
Looking for a secure collaboration platform?
Keep your conversations private while enjoying a seamless collaboration experience with Rocket.Chat.
  • End-to-end encryption
  • Cloud or on-prem deployment
  • Supports compliance with HIPAA, GDPR, FINRA, and more
Talk to sales
Want to build a highly secure in-app chat experience?
Use Rocket.Chat’s APIs, frameworks, and managed backend to build a secure in-app or live chat experience for your customers.
  • Supports compliance with HIPAA, GDPR, FINRA, and more
  • Highly secure and flexible
  • On-prem or cloud deployment
Talk to sales

Our best content, once a week

Share this on:

Get your free, personalized demo now!

Build the most secure chat experience for your team or customers

Book demo