All right. So in the last lesson, you saw how we were able to take the image that the user picked in the imagePickerController, convert it into a CIImage, and then pass that CIImage into our detect method.

Now, once the image goes into this detect method, we do a couple of other things. Firstly, we load up our model using the imported Inceptionv3 model, and then we create a request that asks the model to classify whatever data we pass in. The data that we pass in is defined over here using a handler, and then we use that image handler to perform the request of classifying the image. Once that process completes, this callback gets triggered, we get back a request or an error, and we print out the results that we got from the classification.

As you can see, the results were pretty accurate. But now we have to dig into those results, because we want to see whether the top result, the one with the highest confidence, was classified as a hotdog or as not a hotdog, making our app more like the SeeFood app from Silicon Valley. So instead of printing all of the results to the console, what we want is to change the title bar text up here to say "Hotdog!" if the image contains a hotdog, or "Not Hotdog!" if the image is classified as not containing a hotdog.

So how can we do that? Well, the first thing we can do is delete this print statement. Then we're going to check the results that we get back for the first item, which is usually the one with the highest confidence score. In this case, it was 82 percent confident that the image contained a computer keyboard or keypad.

So in order to tap into that first result, we can write if let firstResult = results.first. That was easy, right? Now, inside here, we can tap into firstResult, and we can check: if firstResult.identifier contains the string "hotdog", then the classification we got back is pretty confident that there's a hotdog in the image. In that case, we're going to change the navigation bar's title to say hotdog, so we'll write self.navigationItem.title = "Hotdog!". And if the first result doesn't contain the word "hotdog", then we're going to set the navigationItem's title to "Not Hotdog!" instead.
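Here's a rough sketch of what the detect method inside the view controller looks like at this point. It's a reconstruction from the walkthrough rather than a copy of the project file, so details like the method name, parameter label, and error messages may differ slightly in your own code:

```swift
import CoreML
import Vision

// This lives inside the view controller, which is why it can reach self.navigationItem.
func detect(image: CIImage) {
    // Wrap the imported Inceptionv3 model so Vision can work with it.
    guard let model = try? VNCoreMLModel(for: Inceptionv3().model) else {
        fatalError("Loading the CoreML model failed.")
    }

    // The request asks the model to classify whatever data the handler passes in.
    let request = VNCoreMLRequest(model: model) { request, error in
        guard let results = request.results as? [VNClassificationObservation] else {
            fatalError("Model failed to process the image.")
        }

        // The first observation is the one with the highest confidence.
        if let firstResult = results.first {
            if firstResult.identifier.contains("hotdog") {
                self.navigationItem.title = "Hotdog!"
            } else {
                self.navigationItem.title = "Not Hotdog!"
            }
        }
    }

    // The handler wraps the user's image; performing the request runs the classification.
    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
```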
All right. So that's pretty simple, right? We're basically using optional binding (if let) to make sure that the results we get back definitely have a first value, and then we're using that value to check whether its identifier contains the word "hotdog".

What you see in the console at the moment is an array of all of the VNClassificationObservations. If you have a look at the first VNClassificationObservation up here, you can see that it contains a number of properties, and one of those properties is this number here: the confidence the model has in its own prediction. So for the image we saw earlier on, the model is 82 percent confident that it's a computer keyboard. And this string that says "Computer keyboard, keypad" is the identifier property of the firstResult. In this line of code, we're checking whether that string contains the word "hotdog". In other words, we're taking the model's most confident prediction and checking whether it's a hotdog prediction. If it is, then we're pretty certain that the image contains a hotdog and we show the user "Hotdog!". If not, then we show them the "Not Hotdog!" variant.

So let's give it a spin, shall we? All right, I've got the app loaded up on my phone, and I'm going to first take a picture of the keyboard and hit Use Photo. Let's see what we get back. Not Hotdog! All right, let's try it with the real deal now. Let's try it with a hotdog. Use Photo. Hotdog! Brilliant. So our app is working.

I don't know how many hotdogs this Inceptionv3 model was trained on, but from my testing, it's been able to pick up most of the hotdog images I can find on Google. Now, it can be a bit more variable when you try it on a live food item. I've tried a few times, and depending on the orientation, sometimes it identifies it as "Not Hotdog". So I'm sure there's still more work for Google to do on Inception, but it's already performing pretty well.

So there it is. There's our remarkably simple app, where we managed to tap into the camera, take photos, and classify those images all on the device.
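One optional aside before we move on: each VNClassificationObservation exposes that confidence value as a number between 0 and 1, so if you ever wanted to show it to the user rather than just reading it in the console, you could fold it into the title. This is just an illustrative sketch, not part of the lesson's code; the percentage formatting is my own:

```swift
// Inside the same completion handler, after casting to [VNClassificationObservation]:
if let firstResult = results.first {
    // confidence is a Float between 0 and 1, so scale it up for a rough percentage.
    let confidencePercent = Int(firstResult.confidence * 100)
    let label = firstResult.identifier.contains("hotdog") ? "Hotdog!" : "Not Hotdog!"

    // e.g. "Hotdog! (82%)"
    self.navigationItem.title = "\(label) (\(confidencePercent)%)"
}
```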
I can show you: if I switch into airplane mode, you can see that the classification still works even though I have no internet connection whatsoever. So there you go. That's how incredibly simple it is to implement image recognition using Vision and CoreML. I hope you've learned something quite useful and that you can start implementing these image recognition machine learning models in your own iOS apps from now on.

My recommendation is to use Inceptionv3, because I've found from testing that it's the most accurate. But, of course, feel free to play around with the other models that Apple has provided, and you'll see that some of them are better suited to certain situations and scenarios.

So I'm looking forward to hearing about all of the awesome apps that you've created using CoreML. And if you've made anything really cool, then please post it in the discussion; we would love to congratulate you and have a play with it.

Now, in the next module, we have a bonus tutorial for you. It's an optional tutorial, so you don't have to do it. In the next tutorial, we show you how you can recreate the Hotdog app using the IBM Bluemix API. So we're basically using a different technology stack to achieve the same purpose, i.e., Visual Recognition. As I mentioned, this is completely optional: you're going to be building the same Hotdog or Not Hotdog app, although with some neat add-on features. But if you're not interested in recreating the same app again, then you can go ahead and skip the next module and go straight to our Intermediate CoreML module, where we're going to be building a really awesome plant identification app.

So I'll see you in one of those modules. See you soon.