We present a regularized reconstruction model to address video summarization.
We assume a video can be viewed as a subspace formed by a selected subset of
frames, with frames represented as a sparse linear combination of these
selected frames. Our method selects frames that contribute to the
reconstruction of the entire video by leveraging both the structure and
similarity between sparse codes. The structure is provided by groups of frames
showing subtle or significant changes, while the similarity ensures a balanced
contribution from the frames in these groups. We propose an optimization
problem to produce a sparse representation capturing the relevance of each
frame, solving this non-smooth problem using proximal gradient methods. We
compared our method with state-of-the-art methods through experiments using a
standard dataset and a new dataset for volleyball phase analysis. Our results
demonstrate that our method produces effective summaries and outperforms
existing methods.