Wednesday, April 10, 2019

Automatically tag images with Sitecore PowerShell Extensions

A brief introduction

Last month I had the pleasure of participating in the Sitecore Hackathon with two colleagues. As preparation we spent a couple of hours looking around for APIs that could be fun to integrate with, and which would hopefully fit into one of the announced categories. We had our eyes on IBM's Watson API, which provides a wide range of services such as translation, tone analysis and visual recognition.

We ended up choosing (and winning, yay!) the Sitecore PowerShell Extensions category. We made a module that did image recognition and tone analysis on command. If you're curious about what we did, you can check out our Hackathon 2019 GitHub documentation or the YouTube presentation.

Tagging images

The Hackathon entry was made with ribbon buttons and modals, to show off some flashy functionality and make it easily understandable.
Personally I think the tone analysis isn't quite mature enough as a service to be incorporated in actual Sitecore solutions (however, it may just be me who's bad at reading the results).
But I really like the potential of the image tagging. Letting editors with a large image catalog search images based on their content could be a great tool. I decided to spend a bit of time automating it, and do a quick write-up on it here. It's almost a pure reuse of the Hackathon module; I just used the events integration point as a trigger instead, and made it write all keywords automatically.

Convert-ToBase64

To encode the API key for the Authorization header, we need a helper that converts a string to Base64.
<#
    .SYNOPSIS
        Converts the input text to a Base64-encoded string
    
    .PARAMETER Text
        Specifies the text to encode
    
    .OUTPUTS
        - Base64-encoded string
#>
function Convert-ToBase64 {
    #(Text)
    [CmdletBinding()]
    param(
        [ValidateNotNullOrEmpty()]
        [string]$Text
    )
    $bytes = [System.Text.Encoding]::ASCII.GetBytes($Text)
    [System.Convert]::ToBase64String($bytes)
}
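
To see what this encoding is used for, here is a minimal sketch of building the Basic authorization value and decoding it again to check the round trip. New-BasicAuthHeader is a hypothetical name used only for this illustration (it is not part of the Hackathon module), and the key is a placeholder.

```powershell
# Hypothetical helper for illustration; not part of the Hackathon module.
function New-BasicAuthHeader {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [ValidateNotNullOrEmpty()]
        [string]$ApiKey
    )
    # The credentials are the literal user name "apikey", a colon, and the key.
    $bytes = [System.Text.Encoding]::ASCII.GetBytes("apikey:$ApiKey")
    "Basic " + [System.Convert]::ToBase64String($bytes)
}

# 'not-a-real-key' is a placeholder value.
$header = New-BasicAuthHeader -ApiKey 'not-a-real-key'

# Strip the scheme and decode the token again to confirm the round trip.
$token = $header.Substring('Basic '.Length)
$decoded = [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($token))
$decoded
```

The decoded value should be the same "apikey:<key>" string that went in, which is a quick way to check the header before sending a real request.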

The Hackathon submission had two additional helpers for storing the endpoint and key settings. While I do think these are pretty nice, I've omitted them from this blog post to keep it on point (otherwise I would have had to post the templates and settings hierarchy too). If you want the settings, grab them from the package in the Hackathon link in the introduction. In this example I have simply typed the endpoint and the API key directly into the Invoke-WatsonVisualRecognition function below. Note that I haven't supplied my own API key; if you want to try it out, get a free API key here.

Invoking the external API

Next we get ready to invoke the Watson API. Normally I would use Invoke-WebRequest, but we simply couldn't get it to work when uploading the image, so we fell back to the good old .NET classes.

Invoke-WatsonVisualRecognition
Import-Function -Name Using-Object
Import-Function -Name Convert-ToBase64
<#
    .SYNOPSIS
        Analyzes the image through IBM's Watson API
    
    .PARAMETER ImageStream
        Specifies the image to analyze
    
    .OUTPUTS
        - Array of image keywords and scores
#>
function Invoke-WatsonVisualRecognition {
    #($ImageStream)
    [CmdletBinding()]
    param(
        [Parameter(Mandatory=$true)]
        [System.IO.Stream]$ImageStream
    )
    
    $endpoint = "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
    $apikey = "[INSERT YOUR API KEY HERE!]"
    $auth = Convert-ToBase64 -Text "apikey:$($apikey)"

    # We could not get Watson to accept an Invoke-WebRequest call with the image stream in the body, so we make a good old .NET request instead
    $request = [System.Net.HttpWebRequest]::Create($endpoint)
    $request.Method = 'POST'
    $request.Headers.Add([System.Net.HttpRequestHeader]::Authorization, "Basic $($auth)")
    $stream = $request.GetRequestStream()
    $ImageStream.CopyTo($stream)
    $stream.Close()
    New-UsingBlock ($response = $request.GetResponse()) {
        New-UsingBlock ($reader = New-Object System.IO.StreamReader($response.GetResponseStream(), [System.Text.Encoding]::UTF8)) {
            $json = $reader.ReadToEnd() | ConvertFrom-Json
            $json.images[0].classifiers[0].classes
        }
    }
}
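
For reference, the last two lines inside the function expect a response shaped roughly like the sample below. The JSON is hand-written for illustration (it is not real Watson output, and a real response may carry more fields); the sketch only shows how the class list is picked out of it.

```powershell
# Hand-written sample in the shape the function reads; not real Watson output.
$sampleJson = @'
{
  "images": [
    {
      "classifiers": [
        {
          "classes": [
            { "class": "beach", "score": 0.92 },
            { "class": "sand",  "score": 0.81 },
            { "class": "dog",   "score": 0.35 }
          ]
        }
      ]
    }
  ]
}
'@

$json = $sampleJson | ConvertFrom-Json

# Same property path as in Invoke-WatsonVisualRecognition:
# first image, first classifier, all of its classes.
$classes = $json.images[0].classifiers[0].classes

# Each entry exposes the keyword and its score.
$classes | ForEach-Object { "{0} ({1})" -f $_.class, $_.score }
```

Since only one image is sent per request, reading the first image and the first classifier is enough here.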

Tying it all together

Now we just need to make sure that every time an image is saved, the keywords are set. First we make a function that calls the API.

Tag-Image
Import-Function -Name Invoke-WatsonVisualRecognition
Import-Function -Name Using-Object

<# 
    .SYNOPSIS
        Analyses the image on Watson and adds the relevant tags to the keywords field
#>
function Tag-Image {
    # Find the media stream and send it to Watson for analysis
    $item = Get-Item -Path .
    $mediaItem = [Sitecore.Data.Items.MediaItem]$item
    $stream = $mediaItem.GetMediaStream()
    $result = Invoke-WatsonVisualRecognition -ImageStream $stream

    # Join the class names into a comma-separated keyword list
    $selectedImageTags = [string]::Join(", ", $result.class)

    # Disable events while saving, so this edit does not raise item:saved again
    New-UsingBlock ([Sitecore.Data.Events.EventDisabler]@{}) {
        $item.Editing.BeginEdit()
        $item.Fields["Keywords"].Value = $selectedImageTags
        $item.Editing.EndEdit()
    }
}
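
Every class comes back with a score, so one possible refinement is to keep only the more confident classes before writing the field. The following is a minimal sketch under my own assumptions: Join-ImageTags, its parameters and the 0.5 default threshold are hypothetical and not part of the Hackathon module, and the sample data is made up.

```powershell
# Hypothetical helper; the name, parameters and default threshold are assumptions.
function Join-ImageTags {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$Classes,
        [double]$MinimumScore = 0.5
    )
    # Keep only the classes at or above the threshold and take their names.
    $tags = $Classes |
        Where-Object { $_.score -ge $MinimumScore } |
        ForEach-Object { $_.class }

    # Wrap in @() so zero or one match is still treated as a collection.
    @($tags) -join ", "
}

# Made-up classifications with the same class/score properties Watson returns.
$sample = @(
    [pscustomobject]@{ class = "beach"; score = 0.92 },
    [pscustomobject]@{ class = "sand";  score = 0.81 },
    [pscustomobject]@{ class = "dog";   score = 0.35 }
)

$keywords = Join-ImageTags -Classes $sample -MinimumScore 0.5
$keywords
# Output: beach, sand
```

Inside Tag-Image, the result of such a helper could be assigned to the Keywords field in place of the plain join. Where to set the threshold is a judgment call that depends on your images.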

Then we make an integration point for the item:saved event, which is only enabled if the item has the Image or Jpeg template. Note that you need to enable the configuration in SPE in order to use these integration points.

Left side: the structure of the scripts. Right side: the item:saved integration point

The last thing is to make sure the Keywords field is indexed. For that we need to go to /sitecore/system/Settings/Buckets/Search Types/Text in the master database and add keywords to the Field field, making the resulting value: _content,_name,_displayname,keywords

Now I can search for the image contents in the image dialog:

A demo of uploading an image, showing the keywords, and searching through image keywords


Disclaimer

While I'm generally impressed with the image recognition in Watson, I've had multiple test examples that yielded pretty bad results. It can be a help in finding images, but it's not perfect. There's also the option to try similar services, like Azure Cognitive Services or Amazon Rekognition.