We present a regularized reconstruction model for video summarization. We assume that a video can be viewed as a subspace spanned by a selected subset of its frames, so that every frame can be represented as a sparse linear combination of the selected frames. Our method selects the frames that contribute most to reconstructing the entire video by exploiting both the structure of the sparse codes and the similarity between them. The structure is provided by groups of frames that exhibit subtle or significant changes, while the similarity ensures a balanced contribution from the frames within each group. We formulate an optimization problem whose solution is a sparse representation capturing the relevance of each frame, and we solve this non-smooth problem with proximal gradient methods. We compare our method with state-of-the-art methods in experiments on a standard dataset and on a new dataset for volleyball phase analysis. The results show that our method produces effective summaries and outperforms existing methods.
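To make the reconstruction idea concrete, the following is a minimal sketch of a proximal gradient solver for a simplified, generic row-sparse self-representation problem, min_C 0.5 * ||X - XC||_F^2 + lam * sum_i ||C[i, :]||_2. It is not the model proposed here: the group-structure and similarity regularizers are omitted, and the names `row_soft_threshold`, `frame_relevance`, `lam`, and `n_iter` are illustrative assumptions. Each column of `X` is assumed to hold the feature vector of one frame.

```python
import numpy as np


def row_soft_threshold(C, tau):
    """Proximal operator of tau * sum_i ||C[i, :]||_2 (row-wise group shrinkage)."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    # Rows with norm <= tau are set to zero; the others are shrunk toward zero.
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * C


def frame_relevance(X, lam=1.0, n_iter=500):
    """Proximal gradient descent on the simplified objective (assumed, not the paper's).

    X has shape (d, n): d features, n frames. Returns n non-negative scores,
    the row norms of C; a zero score means the frame is not used to
    reconstruct any other frame.
    """
    n = X.shape[1]
    G = X.T @ X                              # Gram matrix of the frames
    lipschitz = np.linalg.norm(X, 2) ** 2    # Lipschitz constant of the smooth gradient
    step = 1.0 / max(lipschitz, 1e-12)
    C = np.zeros((n, n))
    for _ in range(n_iter):
        grad = G @ C - G                     # gradient of 0.5 * ||X - X C||_F^2
        C = row_soft_threshold(C - step * grad, step * lam)
    return np.linalg.norm(C, axis=1)
```

Under these assumptions, a summary of `k` frames could be read off as `np.argsort(scores)[::-1][:k]`, where `scores = frame_relevance(X)`. Larger values of `lam` drive more rows of `C` to zero and therefore yield shorter summaries.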