Audio-to-video synchronization (also known as lip sync or, when it is absent, as lip sync error or lip flap) refers to the relative timing of the audio (sound) and video (image) components during creation, post-production (mixing), transmission, reception, and playback processing. AV synchronization can be an issue in television, videoconferencing, and film.
In industry terminology, lip sync error is expressed as the amount of time by which the audio departs from perfect synchronization with the video. A positive value indicates that the audio leads the video, and a negative value indicates that the audio lags the video. This terminology and the standardized numeric expression of lip sync error are used in the professional broadcast industry, as evidenced by various professional papers, standards such as ITU-R BT.1359-1, and the other references below.
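The sign convention can be illustrated with a short sketch. The function names and the idea of comparing two presentation times in milliseconds are assumptions made for illustration; they are not taken from ITU-R BT.1359-1 or any other standard.

```python
def lip_sync_error_ms(audio_presented_ms, video_presented_ms):
    """Signed lip sync error for one event (e.g. a clap).

    Positive: the sound was presented before the matching picture
    (audio leads). Negative: the sound came after it (audio lags).
    """
    return video_presented_ms - audio_presented_ms


def describe(error_ms):
    """Render a signed error in words (hypothetical helper)."""
    if error_ms > 0:
        return f"audio leads video by {error_ms:g} ms"
    if error_ms < 0:
        return f"audio lags video by {-error_ms:g} ms"
    return "audio and video are in sync"


# A clap is heard at 1000 ms but seen at 1040 ms: audio leads by 40 ms.
print(describe(lip_sync_error_ms(1000, 1040)))
# A clap is heard at 1065 ms but seen at 1040 ms: audio lags by 25 ms.
print(describe(lip_sync_error_ms(1065, 1040)))
```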
Digital or analog audio-video streams and video files usually contain some form of synchronization mechanism, either interleaved video and audio data or explicit relative timestamps on the data. Processing must respect this relative timing, for example by stretching or interpolating between the received data to cover gaps. If the processing does not respect the relative timing, the AV-sync error will increase whenever data is lost because of transmission errors or because of missing or mistimed processing.
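The effect of ignoring timestamps can be shown with a simplified simulation. It rests on several assumptions that are not from the text: audio arrives in 20 ms packets, each stamped with its intended presentation time; the video is presented exactly on its timestamps; and two audio packets are lost in transit. It is an illustrative sketch, not a model of any particular player or protocol.

```python
PACKET_MS = 20  # assumed audio packet duration


def naive_play_times(timestamps_ms):
    """Play received packets back to back, ignoring their timestamps."""
    return [i * PACKET_MS for i in range(len(timestamps_ms))]


def timestamped_play_times(timestamps_ms):
    """Play each packet at its own timestamp, leaving gaps for lost data."""
    return list(timestamps_ms)


def sync_error_ms(play_times_ms, timestamps_ms):
    """Signed error per packet; positive means the audio leads the video.

    Video for content stamped t is assumed to be shown at time t, so the
    error is the timestamp minus the time the audio was actually played.
    """
    return [ts - played for played, ts in zip(play_times_ms, timestamps_ms)]


# Packets stamped 0..100 ms were sent; those stamped 40 and 60 were lost.
received = [0, 20, 80, 100]

naive_errors = sync_error_ms(naive_play_times(received), received)
timed_errors = sync_error_ms(timestamped_play_times(received), received)

print("ignoring timestamps: ", naive_errors)
print("respecting timestamps:", timed_errors)
```

In this sketch the player that ignores timestamps closes the gap left by the lost packets, so all later audio is played 40 ms early and the error persists. The player that respects timestamps leaves the gap (which a real system might fill by stretching or interpolating) and stays in sync.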