An introduction to the Microsoft Cognitive Services Visual Search API
In this post, we’ll look at the features and capabilities of the Bing Visual Search API and walk through code examples that show you how to call it. We’ll also see how the API can surface image-related insights that can benefit your software projects or business.
What is Bing Visual Search API?
The Bing Visual Search API lets you build rich functionality similar to Bing.com/images. When you upload an image or provide a URL to one, the API can identify a variety of details about it, including visually similar images, shopping sources, web pages that include the image, and more. The API can also read barcodes and QR codes.
If you use the Bing Image Search API, you can use insight tokens attached to the API’s search results instead of uploading an image.
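As a rough illustration of the token route, here is a minimal Python sketch (not the post's C# code). It assumes the request carries a `knowledgeRequest` form field whose JSON holds either `imageInfo.imageInsightsToken` or `imageInfo.url`, as the public Visual Search documentation described it. The helper name and the token value are made up for the example.

```python
import json


def build_knowledge_request(insights_token=None, image_url=None):
    """Build the JSON for the (assumed) `knowledgeRequest` form field.

    Exactly one of insights_token or image_url must be supplied.
    """
    if (insights_token is None) == (image_url is None):
        raise ValueError("pass exactly one of insights_token or image_url")
    image_info = {}
    if insights_token is not None:
        # Token taken from a previous Bing Image Search result.
        image_info["imageInsightsToken"] = insights_token
    else:
        # Publicly reachable URL of the image.
        image_info["url"] = image_url
    return json.dumps({"imageInfo": image_info})


# Hypothetical token, for illustration only.
print(build_knowledge_request(insights_token="example-token"))
```

Either way, no image bytes need to be uploaded: the service looks the image up from the token or fetches it from the URL.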
In the example below, Bing Visual Search has recognised the image and context and has provided search results of visually similar products.
Features and Insights
It doesn’t end there – at the time of writing the API also lets you discover the following insights:
- Visually similar images: a list of images that are visually similar to the image provided
- Shopping sources: places where you can buy the product shown in the input image
- Annotations: tags about entities present in the image such as popular places and landmarks, celebrities, animals, flowers, and other daily objects
- Related searches: related searches made by others or that are based on the contents of the image
- Web data resource: web pages that include the input image
- Recipes: web pages that include recipes for making the dish shown in the input image
Visual Search returns a diverse set of terms (tags) derived from the input image. You can use tags to fetch additional images, or even to cluster and present images (just like Pinterest does), helping you or your users explore related images or concepts. For example, if you upload an image of a slice of carrot cake, the assigned tags may include cakes, desserts and sweets.
Another feature worth highlighting is object detection. For example, if you supply an image that contains several items of fashion or home furnishings, or several celebrities, the API will include one or more bounding boxes for each item or celebrity it recognises in the image.
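To make tags and bounding boxes a little more concrete, here is a small Python sketch (not the post's C# code) that walks a response and lists each tag with its box. The sample data is hand-written, and the field names (`tags`, `displayName`, `boundingBox`, `queryRectangle`, `topLeft`, `bottomRight`) are assumptions based on the documented response shape, so check them against a real response.

```python
# Hand-written sample shaped like a Visual Search response (assumed field names).
SAMPLE_RESPONSE = {
    "tags": [
        {"displayName": "", "actions": []},  # assumed default tag for the whole image
        {
            "displayName": "Carrot cake",
            "boundingBox": {
                "queryRectangle": {
                    "topLeft": {"x": 0.1, "y": 0.2},
                    "bottomRight": {"x": 0.6, "y": 0.9},
                }
            },
            "actions": [],
        },
    ]
}


def summarise_tags(response):
    """Return (tag name, (top-left, bottom-right) or None) for each tag."""
    rows = []
    for tag in response.get("tags", []):
        name = tag.get("displayName") or "(whole image)"
        rect = tag.get("boundingBox", {}).get("queryRectangle")
        if rect:
            corners = (
                (rect["topLeft"]["x"], rect["topLeft"]["y"]),
                (rect["bottomRight"]["x"], rect["bottomRight"]["y"]),
            )
        else:
            corners = None  # tag applies to the image as a whole
        rows.append((name, corners))
    return rows


for name, corners in summarise_tags(SAMPLE_RESPONSE):
    print(name, corners)
```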
Consuming the Visual Search API
Now that you’ve got a good idea of the key features of the Bing Visual Search API, let’s look at how you can consume it! Consuming the Bing Visual Search API involves this high-level process:
- Create a Cognitive Services API account (this needs an Azure subscription and you can get a free account here)
- Construct and send your request to the API endpoint
- Process the response that gets returned by the API
Like most of the other APIs within Cognitive Services, you can invoke the API using the REST endpoint, or if you prefer to write a little less code you can use the dedicated SDK.
The SDK encapsulates most of the low-level coding, such as encoding your images to byte arrays, setting the HTTP request headers and deserialising the raw HTTP response into a set of POCOs. Quickstarts are available for each language the SDK supports: C#, Node.js and Python.
If you prefer full control of how the REST request is constructed and how the response is processed, you also have this option (I prefer this!) and it’s what we’ll look at next.
Using C# with the Bing Visual Search REST API
Prior to running this code you’ll need Visual Studio, a Cognitive Services API account and access to the Bing Search APIs. In this example, we’ll build a console application that:
- Loads a file from disk into a binary object
- Sets the image boundaries and formatting headers (these are important when uploading a local image)
- Defines the form data for the POST request that contains the image binary we’ve just loaded
- Constructs a web request and sends it to the Bing Visual Search API
- Parses the response returned by the Bing Visual Search API and displays key information related to the image in the console application
I won’t detail every line of the following C# example, as some of the methods are self-explanatory. You can, however, see the bulk of the main logic below:
Bing Image Search
One thing to highlight, however, is the BingImageSearch method. It sets up the necessary parameters and builds the web request that gets sent to the Bing Visual Search API.
Before we look at the code in this method, it’s worth mentioning that because you’re dealing with images (binary data), the data needs to be sent in a format that the Bing Visual Search API can understand.
One of the first things you need to do when dealing with local images is to add boundary strings. These format the data so that the Bing Visual Search API knows where your form data (the image) begins and ends. In our example, we use a few constants to store these values:
Another important value to set is the Content-Disposition header. This header is mandatory, and its name parameter must be set to “image”. We have two variables that define these:
We then use these values to build strings that indicate the location of the image in the POST request.
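For readers who want to see the shape of that payload, here is a minimal Python sketch of the same idea (not the post's C# code). It wraps raw image bytes in a single-part multipart/form-data body with `name="image"`. The `batch_` boundary prefix and the helper name are illustrative choices, not values taken from the post.

```python
import uuid

CRLF = "\r\n"


def build_multipart_body(image_bytes, filename, boundary=None):
    """Wrap image bytes in a one-part multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    # Any string that does not occur in the payload works as a boundary.
    boundary = boundary or "batch_" + uuid.uuid4().hex
    head = (
        "--" + boundary + CRLF
        + 'Content-Disposition: form-data; name="image"; filename="'
        + filename + '"' + CRLF
        + CRLF  # blank line separates part headers from part content
    )
    tail = CRLF + "--" + boundary + "--" + CRLF  # closing boundary
    body = head.encode("utf-8") + image_bytes + tail.encode("utf-8")
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type


body, content_type = build_multipart_body(b"\x89PNG...", "photo.png")
print(content_type)
print(len(body), "bytes")
```

The same boundary string must appear in both the body and the Content-Type header, which is why the helper returns the two together.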
You can see the code that forms the method BingImageSearch in the screenshot below. Here we’re passing in the image boundaries, the content (the actual image) and content type (image) which all get added to a POST request:
We then send this POST request to the Bing Visual Search API which will return JSON that contains rich information related to the image you’ve supplied.
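A hedged Python sketch of the send step follows (again, not the post's C# code). The endpoint URL and the `Ocp-Apim-Subscription-Key` header are assumptions based on the v7 Cognitive Services conventions, so confirm both against your own resource. The function names are hypothetical, and nothing is sent until `send_request` is called.

```python
import json
import urllib.request

# Assumed v7 endpoint; check the value shown for your own resource.
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v7.0/images/visualsearch"


def build_request(body, content_type, subscription_key, endpoint=ENDPOINT):
    """Create (but do not send) the POST request carrying the multipart body."""
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": content_type,
        },
        method="POST",
    )


def send_request(request):
    """Send the request and decode the JSON response into a dict."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder body and key; a real call needs a valid multipart body and key.
request = build_request(b"...", "multipart/form-data; boundary=b1", "YOUR_KEY")
print(request.get_method(), request.full_url)
```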
Full source code can be downloaded here if you want to look at the lower level detail of each method.
For reference, we’re supplying an image from Wikipedia of Satya Nadella. Using this as an example with our code, we can make a call to the Bing Visual Search API and retrieve valuable insights.
Web Pages that include the Image
If we run the console application and place a breakpoint on the code, we can look at the JSON response in Visual Studio and examine the Actions node (which belongs to the Tags node). In this node we can see there is an actionType called “PagesIncluding”.
This insight contains web pages that also contain the image that we’ve just passed to the Bing Visual Search API.
If you take the URL which is highlighted above and paste it into your browser, you’ll see the following image is returned (which is hosted on www.technologyrecord.com) and is similar to our Wikipedia image!
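The same lookup can be done in code rather than in the debugger. This Python sketch (not the post's C# code) collects the host page URLs from every “PagesIncluding” action. The sample response is hand-written with placeholder URLs, and the `data.value` and `hostPageUrl` field names are assumptions based on the documented response shape.

```python
# Hand-written sample; the URLs are placeholders, not real results.
SAMPLE_RESPONSE = {
    "tags": [
        {
            "displayName": "",
            "actions": [
                {
                    "actionType": "PagesIncluding",
                    "data": {
                        "value": [
                            {"hostPageUrl": "https://example.com/news/article-1"},
                            {"hostPageUrl": "https://example.org/profile"},
                        ]
                    },
                },
                {"actionType": "VisualSearch", "data": {"value": []}},
            ],
        }
    ]
}


def pages_including(response):
    """Return the host page URLs listed under every PagesIncluding action."""
    urls = []
    for tag in response.get("tags", []):
        for action in tag.get("actions", []):
            if action.get("actionType") != "PagesIncluding":
                continue
            for item in action.get("data", {}).get("value", []):
                url = item.get("hostPageUrl")
                if url:
                    urls.append(url)
    return urls


for url in pages_including(SAMPLE_RESPONSE):
    print(url)
```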
Unsure how to verbalise a search term but have an image to hand? The Bing Visual Search API can return the search terms that other users have used to find similar images.
Take this image for example (all right, we all know what this is!):
After sending this image to the Bing Visual Search API, we get back the following search terms, each of which leads to similar images:
These are just some examples of the types of insight that can be returned by the Bing Visual Search API. You can find a full list of the available insights on offer here.
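Related search terms can be pulled out of the response in much the same way as any other insight. The Python sketch below (not the post's C# code) assumes they arrive under a “RelatedSearches” action whose items carry a `text` field. That is how the documentation described it, but treat both names as assumptions. The sample data is invented.

```python
# Invented sample data shaped like the relevant part of a response.
SAMPLE_RESPONSE = {
    "tags": [
        {
            "actions": [
                {
                    "actionType": "RelatedSearches",
                    "data": {
                        "value": [
                            {"text": "carrot cake recipe"},
                            {"text": "carrot cake icing"},
                            {"text": "carrot cake recipe"},  # duplicate on purpose
                        ]
                    },
                }
            ]
        }
    ]
}


def related_search_terms(response):
    """Return related search terms in first-seen order, without duplicates."""
    terms = []
    for tag in response.get("tags", []):
        for action in tag.get("actions", []):
            if action.get("actionType") != "RelatedSearches":
                continue
            for query in action.get("data", {}).get("value", []):
                text = query.get("text")
                if text and text not in terms:
                    terms.append(text)
    return terms


print(related_search_terms(SAMPLE_RESPONSE))
```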
Sometimes it can be difficult to describe a query in text and having an API like Bing Visual Search gives you another channel to drive search.
It certainly has its benefits and paves the way for innovative solutions, and I see a few use cases for the API:
Recommendation Engines: Use the API to search for image sources that are like yours (places, animals, flowers, celebrities etc). Extract the returned metadata and package it into datasets that your application can use to auto-suggest “similar products”.
Deal Scanner: Interested in building a service that finds deals online? Use the API to surface details of retailers that sell products that are in the images you supply! Take the returned metadata and run queries over key fields such as cost.
Reputation Management / Copyright Infringement: Use the API to find out which websites are rendering your images! Index these URLs and use the information to quickly identify websites that are hosting your images without your permission!
These are just some ideas and I’m sure you have your own!
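As one possible starting point for the reputation management idea, this Python sketch groups page URLs by host and skips hosts you have approved. It assumes you already have a list of page URLs extracted from the API response. The function name and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_by_host(page_urls, allowed_hosts=()):
    """Group page URLs by host name, skipping hosts you have approved."""
    allowed = {host.lower() for host in allowed_hosts}
    grouped = defaultdict(list)
    for url in page_urls:
        host = urlparse(url).netloc.lower()
        if host and host not in allowed:
            grouped[host].append(url)
    return dict(grouped)


# Placeholder URLs standing in for pages returned by the API.
urls = [
    "https://example.com/gallery/1",
    "https://Example.com/gallery/2",
    "https://my-own-site.example/about",
]
print(group_by_host(urls, allowed_hosts=["my-own-site.example"]))
```

Lower-casing the host keeps `Example.com` and `example.com` in the same bucket, so each site appears once in the resulting index.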
In this blog post we’ve looked at the Bing Visual Search API. We’ve explored some of the features and some of the rich insights it can infer from images you supply. We’ve also looked at a sample C# application that can identify search terms that users are supplying to find images just like yours.
The API contains many more insights, and you can find out more about them here. By adding other Bing / Cognitive Services APIs into the mix, the possibilities really are endless!
To find out more about Cognitive Services, Azure, and the Cognitive Search Services, contact the Grey Matter Bing Search team: +44 (0)1364 655 133 or firstname.lastname@example.org