Statistical video modeling is a fundamental problem that plays an important role in many content-based video analysis applications. Recent research shows that Gaussian mixture models (GMMs) offer good capability and flexibility for joint spatio-temporal video modeling. However, the major bottlenecks of this kind of approach are the high computational complexity and possible over-fitting during model learning. In this work, we improve the efficiency and robustness of GMM training by incorporating key-frame extraction into model learning. Specifically, key-frame extraction is formulated as a feature selection process for video segmentation. We therefore propose a new generative video model that embeds frame saliency in GMM-based video modeling. The proposed model speeds up GMM training significantly by optimally selecting a set of video frames with strong saliency. More interestingly, every video frame can be characterized by a saliency indicator that exhibits specific joint spatio-temporal characteristics of the visual features. This new video model may also lead to new functionalities for content-based video analysis.


