I watched the promo video for Google's Gemini multimodal LLM this morning: https://www.youtube.com/watch?v=UIZAiXYceBI
It's very impressive, but I find it misleading and hard to take seriously because of what it doesn't show. The performance is meaningless without context. How was the AI trained? What prompting, scripting, puppeteering, and editing went into producing such a smooth demo, which transitions from task to task without any explanation?
Without that, it's impossible to evaluate the AI's performance or viability for any task.
It's problematic that Google presents Gemini as a general-purpose autonomous agent that "just works" out of the box. Intelligence is never like that. It's a process that turns information into behavior. To understand the behavior, we need to understand the process, and this process is nothing like what happens in a person's mind. If we interpret it in human terms, we will get burned, and that is exactly the interpretation Google is promoting here.