1. A non-transitory computer-readable storage medium including one or more sequences of instructions that, when executed by one or more processors, cause:
rendering a thumbnail at a particular location on a touch-sensitive display of an electronic device, the thumbnail being representative of a digital video segment corresponding to the thumbnail, wherein the digital video segment is stored in a memory associated with the electronic device, the digital video segment including an audio track and a video track;
receiving a touch and hold input at the particular location, wherein the touch and hold input indicates selection of the digital video segment;
while receiving the touch and hold input, capturing an audio segment, using a microphone of the electronic device, the audio segment having a duration corresponding to a first period of time over which the touch and hold input is received; and
combining the digital video segment and the audio segment into a digital media message in response to the touch and hold input, wherein the audio segment replaces at least a portion of the audio track of the digital video segment in the digital media message such that the portion of the audio track of the digital video segment is not included in the digital media message, wherein the video track of the digital video segment is presented simultaneously with the audio segment when the digital media message is played, and wherein a second period of time that the video track is presented is determined based on the duration of the audio segment.
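
By way of illustration only, the capturing and combining steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions and does not limit the claim: the names `VideoSegment`, `AudioSegment`, `DigitalMediaMessage`, `capture_duration`, and `combine` are hypothetical, the captured audio is assumed to replace the original audio track starting at time zero, and the video presentation time is assumed to equal the audio duration, which is only one way of determining it "based on" that duration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class VideoSegment:
    """Hypothetical stand-in for a stored digital video segment."""
    video_track_s: float  # length of the video track, in seconds
    audio_track_s: float  # length of the original audio track, in seconds


@dataclass
class AudioSegment:
    """Hypothetical stand-in for audio captured by the microphone."""
    duration_s: float


@dataclass
class DigitalMediaMessage:
    """Hypothetical result of combining the video and the captured audio."""
    audio: AudioSegment
    video_play_s: float                   # the "second period of time"
    replaced_audio: Tuple[float, float]   # original audio range left out


def capture_duration(touch_down_s: float, touch_up_s: float) -> float:
    """Audio duration corresponds to how long the touch and hold lasted."""
    if touch_up_s < touch_down_s:
        raise ValueError("touch release precedes touch start")
    return touch_up_s - touch_down_s


def combine(video: VideoSegment, audio: AudioSegment) -> DigitalMediaMessage:
    """Combine video and captured audio into one message.

    Assumption: replacement starts at 0 s and covers as much of the
    original audio track as the captured audio spans.
    """
    replaced_end = min(audio.duration_s, video.audio_track_s)
    return DigitalMediaMessage(
        audio=audio,
        # Assumption: video presentation time equals the audio duration.
        video_play_s=audio.duration_s,
        replaced_audio=(0.0, replaced_end),
    )


# Example: a 3.5 s touch and hold over the thumbnail of a 10 s video.
held_s = capture_duration(touch_down_s=2.0, touch_up_s=5.5)
message = combine(VideoSegment(10.0, 10.0), AudioSegment(held_s))
```

In this sketch the first 3.5 seconds of the original audio track are excluded from the message and the video track is presented for 3.5 seconds alongside the captured audio. Other embodiments could, for example, loop or extend the video track when the captured audio is longer than the video.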