Future Directions
LAVE opens up more questions than it answers: LLM-based content editing and agent assistance is a young field, and much of its potential remains unexplored. Below, we highlight several future directions in addition to those outlined in the paper.
Towards Delightful Mixed-Initiative Video Editing
Mixed-initiative interaction, a concept made prominent in the 1990s and 2000s by HCI+AI pioneers such as Eric Horvitz and Marti Hearst, advocates for systems in which humans and AI collaborate efficiently. With the advent of LLM-based agents, it is timely to revisit and expand on these ideas. We built LAVE on mixed-initiative principles, but there is room to go further. For example, while LAVE's agent contributes to the editing process, it only responds to user prompts; it cannot autonomously initiate edits or monitor editing behaviors to offer proactive support. The agent could also be extended to consider users' GUI manipulation history to provide personalized assistance. However, caution is warranted: agents that engage proactively without solicitation can be perceived as intrusive. Future studies are needed to balance proactive engagement with user preferences, ensuring a delightful human-AI co-creation experience that does not overwhelm users.
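To make this concrete, below is a minimal sketch of opt-in proactive assistance. Everything here is hypothetical rather than part of LAVE: the `EditEvent` log, the `ProactiveAssistant` class, and the repeated-trim heuristic are illustrative stand-ins for whatever triggers a future study might validate.

```python
from dataclasses import dataclass, field
import time

@dataclass
class EditEvent:
    """A single GUI manipulation, e.g., trimming or reordering a clip."""
    action: str
    clip_id: str
    timestamp: float = field(default_factory=time.time)

class ProactiveAssistant:
    """Watches the edit history and offers help only when the user opts in.

    Hypothetical sketch: LAVE's agent is purely reactive today, and the
    threshold below is illustrative, not empirically tuned.
    """

    def __init__(self, proactivity_opt_in: bool, repeat_threshold: int = 3):
        self.opt_in = proactivity_opt_in
        self.repeat_threshold = repeat_threshold
        self.history: list[EditEvent] = []

    def record(self, event: EditEvent) -> str | None:
        self.history.append(event)
        if not self.opt_in:
            return None  # never interrupt users who did not ask for help
        # Heuristic trigger: repeated trims of one clip may signal struggle.
        recent = [e for e in self.history[-10:]
                  if e.action == "trim" and e.clip_id == event.clip_id]
        if len(recent) >= self.repeat_threshold:
            return (f"You've trimmed clip {event.clip_id} several times. "
                    "Want a suggested in/out point based on its transcript?")
        return None

assistant = ProactiveAssistant(proactivity_opt_in=True)
for _ in range(3):
    tip = assistant.record(EditEvent(action="trim", clip_id="beach_04"))
print(tip)  # offers help only after repeated edits, and only if opted in
```

The key design choice is that proactivity is gated on an explicit user preference, reflecting the tension between helpfulness and intrusiveness discussed above.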
Transitioning to a Multi-Agent Architecture for In-Context Interaction within Functions
LAVE currently employs a single plan-and-execute agent equipped with functions such as brainstorming and storyboarding, providing a unified interface for language interaction. However, this design limits in-depth, interactive discussion around a specific action. For example, if users are dissatisfied with the ideas produced by the brainstorming function, there is no straightforward way to negotiate with the planning agent to refine those ideas iteratively; their feedback is more likely to trigger a new brainstorming execution than a refinement of the ideas already in context. To overcome this limitation, LAVE could evolve into a multi-agent system: a dedicated planning agent focused solely on planning, plus specialized agents for each function, such as brainstorming or storyboarding. Users could then engage a particular agent for a focused discussion within the context of that function.
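One way such a design could be structured is sketched below. The `PlanningAgent`, `FunctionAgent`, and `chat_completion` stub are hypothetical, not LAVE's implementation; the point is that each function agent keeps its own conversation context, so follow-up feedback refines existing results instead of triggering a fresh execution.

```python
def chat_completion(messages) -> str:
    """Stub standing in for an LLM API call (e.g., OpenAI's chat API)."""
    return f"[LLM reply given {len(messages)} messages of context]"

class FunctionAgent:
    """A specialized agent that keeps its own conversation context,
    so users can iteratively refine results within one function."""

    def __init__(self, name: str, system_prompt: str):
        self.name = name
        self.messages = [{"role": "system", "content": system_prompt}]

    def chat(self, user_message: str) -> str:
        self.messages.append({"role": "user", "content": user_message})
        reply = chat_completion(self.messages)
        self.messages.append({"role": "assistant", "content": reply})
        return reply

class PlanningAgent:
    """Routes each request to a function agent, then delegates the dialogue.

    Unlike a single plan-and-execute agent, follow-up turns stay with the
    same FunctionAgent, so feedback refines the ideas already in context."""

    def __init__(self):
        self.agents = {
            "brainstorm": FunctionAgent("brainstorm", "You generate video ideas."),
            "storyboard": FunctionAgent("storyboard", "You sequence clips into stories."),
        }
        self.active: FunctionAgent | None = None

    def handle(self, user_message: str) -> str:
        # Toy routing rule; a real planner would classify intent with the LLM.
        for name, agent in self.agents.items():
            if name in user_message.lower():
                self.active = agent
        if self.active is None:
            return "Which function would you like to use?"
        return self.active.chat(user_message)

planner = PlanningAgent()
planner.handle("Brainstorm ideas for my beach footage")
print(planner.handle("Make idea #2 more upbeat"))  # refines in-context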
Integrating Video Editing Knowledge into LLMs for Grounded Assistance
LAVE currently uses the standard GPT-4 model to support a range of video editing tasks. This works because the pre-trained LLM already has a decent grasp of storytelling and video editing techniques, along with strong information extraction and summarization skills. Still, there is room to integrate domain-specific knowledge, for example through fine-tuning or few-shot prompting, to align suggestions more closely with desired editing styles. For instance, the brainstorming function could be designed to generate video ideas in the style of particular creators.
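As a concrete illustration, few-shot prompting for style alignment might look like the sketch below. The exemplar titles and prompt structure are hypothetical, not LAVE's actual brainstorming prompt; the resulting message list would be sent to the LLM in place of a zero-shot request.

```python
# Hypothetical exemplars written in a target creator's voice.
STYLE_EXAMPLES = [
    "POV: you packed one bag and chased the sunset down Highway 1",
    "Rating every street taco in Oaxaca until my card declined",
]

def build_brainstorm_prompt(clip_summaries: list[str]) -> list[dict]:
    """Prepend few-shot style exemplars so generated ideas mimic that style."""
    examples = "\n".join(f"- {t}" for t in STYLE_EXAMPLES)
    footage = "\n".join(f"- {s}" for s in clip_summaries)
    return [
        {"role": "system",
         "content": "You brainstorm video ideas. Match the style of these "
                    f"example titles:\n{examples}"},
        {"role": "user",
         "content": f"Suggest three video ideas for this footage:\n{footage}"},
    ]

messages = build_brainstorm_prompt(["drone shot of coastline", "night market b-roll"])
```

Fine-tuning would pursue the same goal by baking such exemplars into the model weights rather than the prompt, trading per-request flexibility for consistency.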
Weaving AI Video Generation into the LAVE Editing Workflow
LAVE is fundamentally a video editing tool, designed for users who already have a collection of footage to edit. However, video creation workflows vary widely: users may not start with an existing collection, or may find their current footage insufficient for the project they envision. This is where recent advances in video generation models, such as Sora and Pika, become valuable. These models can produce B-roll to augment a user's collection, or even supply entirely generated footage for editing. LAVE focuses on editing and these models on generation, so the two naturally complement each other. Future research could explore their interplay to enhance the video creation workflow.
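One speculative way to weave generation into the workflow is to expose it as another tool the agent can plan with, filling storyboard beats that the user's library cannot cover. Everything below is hypothetical: the `text_to_video` stub stands in for a generation model API (neither Sora nor Pika is actually called), and the substring-matching heuristic is a placeholder for real footage retrieval.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    description: str
    source: str  # "captured" or "generated"

def text_to_video(prompt: str, seconds: int = 4) -> Clip:
    """Stub for a video generation call; returns a placeholder clip."""
    return Clip(clip_id=f"gen_{abs(hash(prompt)) % 10_000}",
                description=prompt, source="generated")

def fill_footage_gaps(storyboard: list[str], library: list[Clip]) -> list[Clip]:
    """For each storyboard beat, reuse captured footage when a rough match
    exists; otherwise generate B-roll to fill the gap."""
    timeline = []
    for beat in storyboard:
        match = next((c for c in library
                      if beat.lower() in c.description.lower()), None)
        timeline.append(match or text_to_video(beat))
    return timeline

library = [Clip("beach_01", "waves at sunset on the beach", "captured")]
timeline = fill_footage_gaps(["waves at sunset", "aerial city at night"], library)
print([(c.clip_id, c.source) for c in timeline])
```

In this framing, generation becomes one more function the planning agent can invoke, keeping the editing timeline, rather than the generator, at the center of the workflow.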