Exploring screen summarization with large language and multimodal models