All right. So in the last lesson, you saw how we were able to take the image that the user picked in the imagePickerController, convert it into a CIImage, and then pass that CIImage into our detect method.

Now, once the image goes into this detect method, we do a couple of other things. Firstly, we load up our model using the imported Inceptionv3 model, and then we create a request that asks the model to classify whatever data we pass in. The data that we pass in is defined over here using a handler, and then we use that image handler to perform the request of classifying the image. Once that process completes, this callback gets triggered, we get back a request or an error, and we print out the results that we got from the classification.

As you can see, the results were pretty accurate. But now we have to dig into those results, because we want to see whether the top result, the one with the highest confidence, was classified as a hotdog or as not a hotdog, making our app more like the SeeFood app from Silicon Valley. So instead of printing all of the results to the console, what we want is to change the title bar text up here to say "Hotdog!" if the image contains a hotdog, or "Not Hotdog!" if the image is classified as not containing a hotdog.

So how can we do that? Well, the first thing we can do is delete this print statement. Then we're going to check the results that we get back for the first item, which is usually the one with the highest confidence score. In this case, it was 82 percent confident that the image contained a computer keyboard or keypad.

So in order to tap into that first result, we can write if let firstResult = results.first. That was easy, right? Now, inside here, we can tap into firstResult, and we can check: if firstResult.identifier contains the string "hotdog", then the classification we got back is pretty confident that there's a hotdog in the image. In that case, we're going to change the navigation bar's title to say hotdog, so we'll write self.navigationItem.title = "Hotdog!". And if the first result doesn't contain the word "hotdog", then we're going to set the navigationItem's title to "Not Hotdog!" instead.
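Here's a rough sketch of what the detect method inside the view controller looks like at this point. It's a reconstruction from the walkthrough rather than a copy of the project file, so details like the method name, parameter label, and error messages may differ slightly in your own code:

```swift
import CoreML
import Vision

// This lives inside the view controller, which is why it can reach self.navigationItem.
func detect(image: CIImage) {
    // Wrap the imported Inceptionv3 model so Vision can work with it.
    guard let model = try? VNCoreMLModel(for: Inceptionv3().model) else {
        fatalError("Loading the CoreML model failed.")
    }

    // The request asks the model to classify whatever data the handler passes in.
    let request = VNCoreMLRequest(model: model) { request, error in
        guard let results = request.results as? [VNClassificationObservation] else {
            fatalError("Model failed to process the image.")
        }

        // The first observation is the one with the highest confidence.
        if let firstResult = results.first {
            if firstResult.identifier.contains("hotdog") {
                self.navigationItem.title = "Hotdog!"
            } else {
                self.navigationItem.title = "Not Hotdog!"
            }
        }
    }

    // The handler wraps the user's image; performing the request runs the classification.
    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
```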
All right. So that's pretty simple, right? We're basically using optional binding (if let) to make sure that the results we get back definitely have a first value, and then we're using that value to check whether its identifier contains the word "hotdog".

What you see in the console at the moment is an array of all of the VNClassificationObservations. If you have a look at the first VNClassificationObservation up here, you can see that it contains a number of properties, and one of those properties is this number here: the confidence the model has in its own prediction. So for the image we saw earlier on, the model is 82 percent confident that it's a computer keyboard. And this string that says "Computer keyboard, keypad" is the identifier property of the firstResult. In this line of code, we're checking whether that string contains the word "hotdog". In other words, we're taking the model's most confident prediction and checking whether it's a hotdog prediction. If it is, then we're pretty certain that the image contains a hotdog and we show the user "Hotdog!". If not, then we show them the "Not Hotdog!" variant.

So let's give it a spin, shall we? All right, I've got the app loaded up on my phone, and I'm going to first take a picture of the keyboard and hit Use Photo. Let's see what we get back. Not Hotdog! All right, let's try it with the real deal now. Let's try it with a hotdog. Use Photo. Hotdog! Brilliant. So our app is working.

I don't know how many hotdogs this Inceptionv3 model was trained on, but from my testing, it's been able to pick up most of the hotdog images I can find on Google. Now, it can be a bit more variable when you try it on a live food item. I've tried a few times, and depending on the orientation, sometimes it identifies it as "Not Hotdog". So I'm sure there's still more work for Google to do on Inception, but it's already performing pretty well.

So there it is. There's our remarkably simple app, where we managed to tap into the camera, take photos, and classify those images all on the device.
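One optional aside before we move on: each VNClassificationObservation exposes that confidence value as a number between 0 and 1, so if you ever wanted to show it to the user rather than just reading it in the console, you could fold it into the title. This is just an illustrative sketch, not part of the lesson's code; the percentage formatting is my own:

```swift
// Inside the same completion handler, after casting to [VNClassificationObservation]:
if let firstResult = results.first {
    // confidence is a Float between 0 and 1, so scale it up for a rough percentage.
    let confidencePercent = Int(firstResult.confidence * 100)
    let label = firstResult.identifier.contains("hotdog") ? "Hotdog!" : "Not Hotdog!"

    // e.g. "Hotdog! (82%)"
    self.navigationItem.title = "\(label) (\(confidencePercent)%)"
}
```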
I can show you: if I switch into airplane mode, you can see that the classification still works even though I have no internet connection whatsoever. So there you go. That's how incredibly simple it is to implement image recognition using Vision and CoreML. I hope you've learned something quite useful and that you can start implementing these image recognition machine learning models in your own iOS apps from now on.

My recommendation is to use Inceptionv3, because I've found from testing that it's the most accurate. But, of course, feel free to play around with the other models that Apple has provided, and you'll see that some of them are better suited to certain situations and scenarios.

So I'm looking forward to hearing about all of the awesome apps that you've created using CoreML. And if you've made anything really cool, then please post it in the discussion; we would love to congratulate you and have a play with it.

Now, in the next module, we have a bonus tutorial for you. It's an optional tutorial, so you don't have to do it. In the next tutorial, we show you how you can recreate the Hotdog app using the IBM Bluemix API. So we're basically using a different technology stack to achieve the same purpose, i.e., Visual Recognition. As I mentioned, this is completely optional: you're going to be building the same Hotdog or Not Hotdog app, although with some neat add-on features. But if you're not interested in recreating the same app again, then you can go ahead and skip the next module and go straight to our Intermediate CoreML module, where we're going to be building a really awesome plant identification app.

So I'll see you in one of those modules. See you soon.