Described is a technology that provides highly accurate speech-recognized text transcripts of conversations, particularly telephone or meeting conversations. Speech is received for recognition when it is at a high quality and separate for each user, that is, independent of any transmission. Moreover, because the speech is received separately, a personalized recognition model adapted to each user's voice and vocabulary may be used. The separately recognized text is then merged into a transcript of the communication. The transcript may be labeled with the identity of each user that spoke the corresponding speech. The output of the transcript may be dynamic as the conversation takes place, or may occur later, such as contingent upon each user agreeing to release his or her text. The transcript may be incorporated into the text or data of another program, such as to insert it as a thread in a larger email conversation or the like.