Do you want to learn how to integrate the Cognitive Services Computer Vision API into your apps? In this post, I’ll show you how to use Power Apps and Microsoft Cognitive Services Computer Vision to automatically create captions for the images you upload.
We recently worked on an interesting job where a detective wanted to add captions to the pictures he takes at a crime scene without having to go back through each picture and document what it shows. We can use the Azure service Microsoft Computer Vision to accomplish this and add captions and inscriptions to pictures automatically.
Computer Vision can do many cool things: it can document a picture with captions, perform Optical Character Recognition (OCR), or identify celebrities and even employees, to name a few. In this case, I’ll add a picture control so that the app goes through and describes what is in the picture.
- To begin, I open the Azure service and a blank application in Power Apps.
- Next, in my Azure tenant, I create a brand-new resource by clicking on Create a resource and searching for the Computer Vision resource. Once I open that, I click on Create. You can integrate this service into Power Automate (formerly Microsoft Flow) as well as Power Apps.
- Something to note: you get a fair number of calls for free with this, but once you’ve used them, you’ll have to pay.
- Once created, you’ll go back to your Azure service dashboard and your Computer Vision item and click on the keys area which will give you a key for when you create a connection.
- Now I go back to my Power Apps environment and create a connection. I click on Data sources and create a new connection to the Computer Vision API, using the key from the previous step.
- From the Media tab pull down I select Add Picture to add a picture on the left side of my app which will allow my users to upload their own pictures.
- On the right side I want it to show the words that describe the picture on the top and the caption that describes it on the bottom. To do this I will add a collection which will come back as a table of all the things it sees in the picture. I’ll need a gallery and a collection for that gallery.
- To start, I’ll click on Add Media Button on the left under Screens and then add an On Change event. I keep it simple: the event runs whenever the picture changes. This is an automated approach, so when you change the picture the app analyzes it automatically.
- I’ll use a collection, which is a type of variable that holds a table. I’ll use a function called ClearCollect; this purges the collection and reloads it each time we change the picture. Watch my video to see the exact code here.
- In this demo I want to describe the contents of the image. When I add description in my code, it is going to load up all the things that describe this image.
- Now when you go to Collections under File, you will see the new collection, although we still must populate it.
- Next, I’ll add my gallery where it will show a list of all the descriptors it thinks it sees. I choose a vertical gallery, and have it point to my collection and I choose Title as I only care about the words.
- In the code bar (see video) I need to set it up to navigate through the sub tables and get the tag values. Now when I change the picture, I will get a list of tags in my gallery.
- Now I want to get a caption listed under that gallery. I’ll add a label under the gallery and click on Text. I’ll use some If/Then statements to check how confident the API is in its caption, so if the confidence is at or below a certain level, the label won’t show that caption. I must say, it’s generally pretty good at describing pictures, with an incorrect pick-up here or there.
- With our code we’ll need to navigate down through the multiple tables nested in the collection. Below is the code for how to do this, including how to display the confidence level:
- First(First(colImageDesc).captions).text & " " & Text(First(First(colImageDesc).captions).confidence*100,"[$-en-US]##.#%")
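To make the nesting easier to picture, here is a minimal sketch in Python rather than Power Apps code. It assumes the collection holds one record with a `tags` table and a `captions` table, which is the shape the formula above implies. The sample data, the helper names (`first`, `caption_label`, `tag_list`) and the 0.5 threshold are all made up for illustration; they are not taken from the connector or from my app.

```python
# Hypothetical sample shaped like the colImageDesc collection: one record
# holding a table of tags and a table of captions (assumed shape).
col_image_desc = [
    {
        "tags": ["outdoor", "car", "street"],
        "captions": [{"text": "a car parked on a street", "confidence": 0.9234}],
    }
]


def first(rows):
    """Mimic Power Apps First(): return the first record, or None if empty."""
    return rows[0] if rows else None


def caption_label(collection, min_confidence=0.5):
    """Return 'caption text NN.N%' or an empty string when confidence is too low."""
    record = first(collection)
    if record is None:
        return ""
    caption = first(record.get("captions", []))
    # The If/Then idea from the label: hide captions below the threshold.
    if caption is None or caption["confidence"] < min_confidence:
        return ""
    return f'{caption["text"]} {caption["confidence"] * 100:.1f}%'


def tag_list(collection):
    """Return the descriptor words that would feed the gallery."""
    record = first(collection)
    return list(record.get("tags", [])) if record else []


print(caption_label(col_image_desc))  # caption followed by its confidence
print(tag_list(col_image_desc))       # the list of tags for the gallery
```

The two nested `first` calls play the same role as `First(First(colImageDesc).captions)` in the label formula, and the threshold check is where the If/Then logic would decide whether to show the caption at all.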
You can see a little more detail and just how it works in my video demo below, but this is a pretty simple application using Computer Vision to quickly get tags and captions from pictures that are uploaded to the app. Very cool stuff. There’s a lot you can do with Microsoft Cognitive Services so be sure to go to the Microsoft website to learn more or contact us.
If you want to learn more Power Apps, we’ve got you covered. We have Power Apps courses as part of our library of 55+ courses in our On-Demand Learning platform which also includes courses on Azure, business intelligence, Power BI, SQL Server and much more.
How about starting with a FREE course? Our App in a Day course has 7+ hours of Power Apps content at no cost to you! Click the link below for your free course!