THRIVEE Use of Advanced Technologies


This document outlines supporting evidence for the advanced technologies behind how THRIVEE delivers virtual care. THRIVEE uses advanced technology to augment its core telebehavioral health platform and to aid providers in delivering virtual care. These technologies augment the patient and provider experience, enhancing provider insights for identification and diagnosis and improving workflow. THRIVEE intends to integrate real-time speech-to-text voice recognition in 2018, adding advanced voice recognition, machine learning, and artificial intelligence. THRIVEE will add advanced facial recognition in 2019 and beyond as the technologies mature and where they demonstrate improvement in the delivery of care.

Speech-to-text voice recognition is used to create real-time transcripts of virtual care encounters and group counseling sessions and to support the identification of depression and anxiety. Facial recognition analysis gives providers enhanced real-time tools to improve clinical accuracy and care delivery insights. THRIVEE speech-to-text uses commercial cloud computing technologies for voice recognition. Natural language processing leverages machine learning to find insights and relationships within the text. These insights and relationships are presented as tools within the virtual visit encounter and as flags for post-visit review and follow-up.
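
As a rough illustration of how transcript text might be turned into flags for post-visit review, the sketch below scans transcript segments for terms in a small lexicon. It is a deliberately simplified stand-in for machine-learning-based natural language processing, not THRIVEE's method. The `FLAG_TERMS` lexicon, the `Flag` record, and the `flag_transcript` function are hypothetical names introduced here for illustration, and the lexicon is not a clinical instrument.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon for illustration only; not a validated clinical tool.
FLAG_TERMS = {
    "hopeless": "depression",
    "worthless": "depression",
    "panic": "anxiety",
    "worried": "anxiety",
}


@dataclass
class Flag:
    segment_index: int  # position of the transcript segment in the visit
    term: str           # lexicon term that was matched
    category: str       # category the term maps to in FLAG_TERMS


def flag_transcript(segments):
    """Return one Flag per lexicon term found in each transcript segment."""
    flags = []
    for index, text in enumerate(segments):
        # Lowercase and split into simple word tokens before lookup.
        for word in re.findall(r"[a-z']+", text.lower()):
            category = FLAG_TERMS.get(word)
            if category is not None:
                flags.append(Flag(index, word, category))
    return flags


# Made-up transcript segments, used only to exercise the function.
segments = [
    "I have been feeling hopeless lately.",
    "Work is fine, I guess.",
    "I get worried and then the panic starts.",
]
flags = flag_transcript(segments)
for flag in flags:
    print(flag)
```

A production system would replace the lexicon lookup with a trained language model, but the output shape (segment, matched evidence, category) is one plausible way to surface flags for follow-up.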


The clinical criteria for diagnosis across the continuum of anxiety and depression disorders are well established, as are clinical observations of anxiety from auditory and visual cues. Yet the accuracy of diagnosis in the primary care setting is lacking. A 2011 study of Canadian primary care clinics assessed 840 patients, comparing usual care against a validated instrument for structured diagnostic psychiatric interview. Using this comparison, the researchers found detection rates by primary care providers to be low. Misdiagnosis rates indicative of type II error (i.e., false negatives) were 71.0% for generalized anxiety disorder and 65.9% for major depressive disorder. Therefore, primary care providers may benefit from tools that aid in the identification of anxiety and depression.
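
The relationship between a false negative (type II) rate and the detection rate can be made concrete with a small calculation. The sketch below is illustrative: the function names are introduced here, and the counts are chosen only to reproduce the reported 71.0% rate. They are not the study's actual case counts.

```python
def false_negative_rate(true_positives, false_negatives):
    """Share of true cases that the assessment failed to detect."""
    total_cases = true_positives + false_negatives
    if total_cases == 0:
        raise ValueError("no true cases to evaluate")
    return false_negatives / total_cases


def sensitivity(true_positives, false_negatives):
    """Share of true cases that were detected (1 minus the miss rate)."""
    return 1.0 - false_negative_rate(true_positives, false_negatives)


# Illustrative counts only: 71 missed out of 100 true cases gives 71.0%.
gad_miss_rate = false_negative_rate(true_positives=29, false_negatives=71)
gad_sensitivity = sensitivity(true_positives=29, false_negatives=71)

print(f"miss rate: {gad_miss_rate:.1%}, detection rate: {gad_sensitivity:.1%}")
```

Read this way, a 71.0% false negative rate means that only about 29% of patients who met criteria on the structured interview were identified under usual care.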


Audio cues can aid in the real-time identification and diagnosis of depression using verbal attributes and vocal acoustics. Verbal attributes such as word cadence, pauses, and vocal inflection are features that can be extracted from audio files for analysis. pyAudioAnalysis, an open-source library for audio file analysis, has been used to estimate a patient's depression from real-time audio analysis with 70% accuracy. The researchers estimated a clinical depression score using the Beck Depression Inventory (BDI), a validated instrument for measuring depression. The accuracy of the analysis was evaluated using the 2013 Continuous Audio/Visual Emotion and Depression Recognition Challenge (AVEC) dataset. Vocal acoustics can also be used to identify depression, because patients in a depressive state exhibit different vocal patterns, specifically in higher energy bands. Classification of depression is possible using Mel-Scale Frequency Cepstral Coefficients (MFCC). A 2018 study of 36 patients with major depressive disorder and 36 healthy controls used the open-source OpenSMILE speech feature extraction tool to analyze acoustic features. The researchers found that the second dimension of MFCC was able to discriminate between patient groups with a sensitivity of 77.8% and specificity of 86.1%.
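
To show what extracting a verbal attribute such as pauses can look like, the sketch below computes the fraction of low-energy frames in an audio signal. It uses plain short-time energy thresholding from the Python standard library. It does not use pyAudioAnalysis or OpenSMILE, and it is not the method of the cited studies. The frame length, the energy threshold, the 16 kHz sample rate, and the synthetic signal are all assumptions chosen for illustration.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def pause_ratio(samples, frame_len=160, threshold=1e-4):
    """Fraction of frames whose energy falls below the silence threshold."""
    energies = frame_energies(samples, frame_len)
    if not energies:
        return 0.0
    silent_frames = sum(1 for energy in energies if energy < threshold)
    return silent_frames / len(energies)


# Synthetic one-second signal at an assumed 16 kHz sample rate:
# half a second of a 200 Hz tone followed by half a second of silence.
rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(rate // 2)]
silence = [0.0] * (rate // 2)

ratio = pause_ratio(tone + silence)
print(f"pause ratio: {ratio:.2f}")
```

In practice a feature like this would be one of many inputs (alongside cadence, inflection, and spectral features such as MFCC) to a trained classifier or regression model.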


Analysis of video to identify depression and anxiety in real time involves facial recognition and advanced algorithmic analysis. A generalized process involves mapping facial features and landmarks, identifying transient changes, and correlating those changes to behavioral and emotional states. A key underlying assumption of many forms of video analysis is that “motion must be smooth in relation to the frame rate.” Humans process emotional observations with a tendency to focus on and identify the eyes and the mouth. In contrast, Facial Emotion Recognition (FER) technology uses an input image to identify facial landmarks well beyond the eyes and mouth, extracts features, and then classifies those features. The Facial Action Coding System (FACS) is the benchmark for facial analysis; it categorizes the facial muscle activation of an expression into Action Units (AU). A 2016 experimental trial that induced emotional states via experimental tasks and conditions yielded 80-90% accuracy for facial recognition identification of stress and anxiety.14 This readily exceeds the high variation noted among human observers for detection of emotional state and trait anxiety using visual and audio cues. Photoplethysmography (PPG), a readily available optical method to detect microvascular blood flow changes, may aid in FER identification of emotional state along with other bio-signaling methods.
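
The "identify transient changes" step of the generalized process can be sketched as a comparison of landmark positions between consecutive frames. The example below assumes that a separate face-tracking stage has already produced (x, y) landmark coordinates for each frame. The function names, the pixel threshold, and the three-landmark frames are hypothetical and far smaller than a real landmark set.

```python
import math


def mean_displacement(prev_landmarks, curr_landmarks):
    """Average Euclidean distance moved by corresponding landmarks."""
    if not prev_landmarks or len(prev_landmarks) != len(curr_landmarks):
        raise ValueError("frames must contain the same non-zero number of landmarks")
    total = sum(
        math.hypot(cx - px, cy - py)
        for (px, py), (cx, cy) in zip(prev_landmarks, curr_landmarks)
    )
    return total / len(prev_landmarks)


def transient_frames(frames, threshold):
    """Indices of frames whose landmarks moved more than threshold since the prior frame."""
    return [
        index
        for index in range(1, len(frames))
        if mean_displacement(frames[index - 1], frames[index]) > threshold
    ]


# Toy frames with three landmarks each (for example two brow points and a mouth point).
frames = [
    [(10.0, 10.0), (20.0, 10.0), (15.0, 20.0)],
    [(10.0, 10.0), (20.0, 10.0), (15.0, 20.5)],  # small drift
    [(10.0, 7.0), (20.0, 7.0), (15.0, 24.5)],    # brows raise, mouth opens
]
changed = transient_frames(frames, threshold=2.0)
print(changed)
```

The smooth-motion assumption matters here: if the frame rate is too low relative to the movement, large displacements between frames no longer distinguish a genuine expression change from ordinary head motion.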


The benefit of multi-modal observation using both audio and visual cues is greatly improved accuracy and performance with deeper clinical insights. Various fusion schemes and automated algorithms exist to merge and subsequently analyze audio and video feeds. A 2015 review of automated analysis of combined audio and video for depression detection cited classification accuracy between 70% and 80%, with some studies reporting accuracy as high as 90%.
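
One of the simplest fusion schemes is decision-level (late) fusion, in which each modality produces its own score and the scores are then combined. The sketch below is a minimal example of that idea using a weighted average. The function names, the equal weighting, and the 0.5 decision threshold are assumptions for illustration, not values taken from the cited review.

```python
def late_fusion(audio_prob, video_prob, audio_weight=0.5):
    """Weighted average of per-modality probabilities (decision-level fusion)."""
    for name, value in (
        ("audio_prob", audio_prob),
        ("video_prob", video_prob),
        ("audio_weight", audio_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return audio_weight * audio_prob + (1.0 - audio_weight) * video_prob


def classify(audio_prob, video_prob, audio_weight=0.5, threshold=0.5):
    """True when the fused score meets or exceeds the decision threshold."""
    return late_fusion(audio_prob, video_prob, audio_weight) >= threshold


# Hypothetical per-modality scores from separate audio and video models.
fused = late_fusion(audio_prob=0.8, video_prob=0.4)
print(f"fused score: {fused:.2f}, flagged: {classify(0.8, 0.4)}")
```

Feature-level (early) fusion, which concatenates audio and video features before a single model is trained, is a common alternative. Late fusion is shown here because each modality can be developed and validated independently.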
