In December 2017, I ran all one hundred of Time's Most Influential Images of All Time through Google's Cloud Vision API, an image recognition tool, in an attempt to answer two questions: "How does Artificial Intelligence see the world?" and "How does Artificial Intelligence interpret visual information?"
About the Cloud Vision API:
Interpreting Insight From Your Images: Easily detect broad sets of objects in your images, from flowers, animals, or transportation to thousands of other object categories commonly found within images.
Leveraging the Power of the Web: Vision API uses the power of Google Image Search to find topical entities like celebrities, logos, or news events.
Applying Labels to Images: Detect broad sets of categories within an image, ranging from modes of transportation to animals.
Identifying Explicit Content: Detect explicit content like adult content or violent content within an image.
Identifying Logos: Detect popular product logos within an image.
Identifying Landmarks: Detect popular natural and man-made structures within an image.
Reading and Extracting Text from Images: Detect and extract text within an image, with support for a broad range of languages, along with support for automatic language identification.
Scanning for Facial Detection: Detect multiple faces within an image, along with the associated key facial attributes like emotional state or wearing headwear.
Identifying Image Attributes: Detect general attributes of the image, such as dominant colors and appropriate crop hints.
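To make the feature list above concrete, here is a minimal sketch of how a single request can ask for all of these analyses at once. It only builds the JSON body for the Cloud Vision REST endpoint (images:annotate) and does not send anything, so no API key is needed. The function name build_annotate_request and the max_results default are illustrative assumptions, not part of the original analysis.

```python
import base64
import json

# Feature type names used by the Cloud Vision REST API, one per capability
# described above (labels, web entities, explicit content, logos, landmarks,
# text, faces, image attributes, crop hints).
FEATURE_TYPES = [
    "LABEL_DETECTION",
    "WEB_DETECTION",
    "SAFE_SEARCH_DETECTION",
    "LOGO_DETECTION",
    "LANDMARK_DETECTION",
    "TEXT_DETECTION",
    "FACE_DETECTION",
    "IMAGE_PROPERTIES",
    "CROP_HINTS",
]


def build_annotate_request(image_bytes, max_results=10):
    """Build the JSON body for an images:annotate call (hypothetical helper)."""
    # The REST API expects the raw image bytes as a base64 string.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    features = [{"type": t, "maxResults": max_results} for t in FEATURE_TYPES]
    return {"requests": [{"image": {"content": encoded}, "features": features}]}


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read with open(..., "rb").
    body = build_annotate_request(b"placeholder, not a real image")
    print(json.dumps(body, indent=2))
```

The resulting body would be sent as a POST to the images:annotate endpoint with your own credentials. Each requested feature comes back as its own block of annotations in the response.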
The Image Analyses:
Running Time's The Most Influential Images of All Time through Google Cloud Vision surfaced a few interesting trends, along with a couple of interesting tidbits of analysis:
The Cloud Vision API is an incredibly smart and interesting tool that gets a lot of things about images right, but also a lot of things wrong.
The system sometimes lacks awareness of what it is attempting to label or identify, which can lead to incorrect or upsetting analyses.
Sometimes the image associations relate directly to the picture, but other times they are way off.
The image labels can be contradictory, since Artificial Intelligence is often guessing at what might be there.
Since the tool is told to search for text in every image, it will occasionally 'create' text that is not actually there.
The recognition is often missing emotional intelligence and contextual awareness of what an image is conveying.
Empathy and Human Understanding are two major attributes that Artificial Intelligence is currently lacking.
The system is trained to get smarter, so it will be interesting to track changes and confidence levels over time to see how it grows and learns.
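As a rough sketch of how confidence levels could be tracked over time, the snippet below compares the label scores from two runs on the same image. It assumes each response is a dictionary shaped like the API's JSON output, with a labelAnnotations list whose entries carry a description and a score. The helper names and the sample values are hypothetical and are not results from this analysis.

```python
def label_scores(response):
    """Map each label description to its confidence score (0.0 to 1.0)."""
    return {a["description"]: a["score"] for a in response.get("labelAnnotations", [])}


def compare_runs(earlier, later):
    """Return {label: (earlier_score, later_score)}; None marks a missing label."""
    before, after = label_scores(earlier), label_scores(later)
    return {
        label: (before.get(label), after.get(label))
        for label in sorted(set(before) | set(after))
    }


if __name__ == "__main__":
    # Made-up sample responses, for illustration only.
    run_2017 = {"labelAnnotations": [
        {"description": "vehicle", "score": 0.62},
        {"description": "crowd", "score": 0.81},
    ]}
    run_later = {"labelAnnotations": [
        {"description": "crowd", "score": 0.90},
        {"description": "protest", "score": 0.74},
    ]}
    for label, (old, new) in compare_runs(run_2017, run_later).items():
        print(f"{label}: {old} -> {new}")
```

A label that appears in only one run shows up with None on the other side, which makes dropped or newly added labels easy to spot alongside score changes.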
Below are a few excerpts from the analysis that highlight some errors, flaws, or quirks in the system:
There have been some other interesting Computer Vision, Deep Learning, and Machine Learning assessments and editorials, including:
What Do Tesla's Autopilot Vehicles See?
Simple Pictures that State of the Art AI Can't Recognize
Google's Artificial Intelligence can Dream
Smart Software Can be Tricked Into Seeing What Isn't There
These Images Show how Google's AI Sees the World