WEBVTT

00:00:00.740 --> 00:00:04.830
Let's briefly discuss how C++ compilers typically implement late binding to

00:00:04.830 --> 00:00:07.850
support virtual functions.

00:00:08.240 --> 00:00:12.640
Virtual functions add a small overhead to the size of every object of your class,

00:00:12.650 --> 00:00:16.460
which could matter if your project relies on them.

00:00:16.840 --> 00:00:20.580
I took a few steps back and restored the definition of the play()

00:00:20.580 --> 00:00:24.060
function in the base class from before we made it pure virtual.

00:00:24.440 --> 00:00:28.250
I think that this will make it easier to explain how virtual functions work.

00:00:29.540 --> 00:00:31.940
When you mark a function as virtual,

00:00:31.950 --> 00:00:35.180
the compiler generates a virtual table for the

00:00:35.180 --> 00:00:37.960
class that contains that virtual function.

00:00:38.340 --> 00:00:42.380
This virtual table is nothing more than a simple array of pointers,

00:00:42.390 --> 00:00:45.260
specifically, pointers to functions.

00:00:45.740 --> 00:00:49.790
Each slot in that array is dedicated to one virtual function,

00:00:49.790 --> 00:00:53.050
but since we only have one, this table is small:

00:00:53.060 --> 00:00:54.760
it only has one element.

00:00:55.140 --> 00:00:57.030
If we had more virtual functions,

00:00:57.040 --> 00:01:00.150
each one of them would have its own fixed position in this table.

00:01:00.640 --> 00:01:03.390
Since the derived classes can override this virtual

00:01:03.390 --> 00:01:07.450
function from the base class, they also get their own virtual table.

00:01:08.340 --> 00:01:10.680
We know that each of these classes has its own

00:01:10.680 --> 00:01:12.450
version of the play() function,

00:01:12.460 --> 00:01:15.810
and the instructions for executing these functions are

00:01:15.810 --> 00:01:18.750
also stored in a dedicated part of memory, often called the code or text segment.

00:01:19.440 --> 00:01:22.710
The function pointers in the virtual table point

00:01:22.710 --> 00:01:25.460
to the appropriate function in memory.

00:01:25.840 --> 00:01:29.620
Since every class in this example has its own function override,

00:01:29.630 --> 00:01:34.260
these pointers point to the function defined inside of that class.

00:01:34.640 --> 00:01:36.790
So let's say that I want to call the play()

00:01:36.790 --> 00:01:41.960
function from the Instrument pointer, which actually points to the Guitar object.

00:01:42.540 --> 00:01:47.080
Before executing the function, the program checks the virtual table of the object,

00:01:47.080 --> 00:01:50.660
which in this case is a Guitar object.

00:01:51.040 --> 00:01:55.030
The compiler knows that the position of this function pointer is 0,

00:01:55.030 --> 00:01:59.760
so the generated code uses this function pointer to find the correct function in memory.

00:02:00.440 --> 00:02:02.220
Once it finds the correct function,

00:02:02.230 --> 00:02:07.160
it binds the call to that function and executes it.

00:02:08.039 --> 00:02:11.220
This is why it's called late binding: the function

00:02:11.220 --> 00:02:13.760
is bound at the time of the execution.

00:02:14.140 --> 00:02:18.010
But what if I remove this overriding function from the

00:02:18.010 --> 00:02:22.710
derived Synth class? When the compiler builds the virtual table for this class,

00:02:22.710 --> 00:02:26.100
it finds no appropriate override, so by default,

00:02:26.110 --> 00:02:30.440
this function pointer will point to the function defined in the base class,

00:02:30.450 --> 00:02:32.150
the Instrument class.

00:02:32.940 --> 00:02:36.160
And now when we try to call this play() function on the

00:02:36.160 --> 00:02:38.960
Synth object through the Instrument pointer,

00:02:38.970 --> 00:02:41.290
the virtual table will be checked again,

00:02:41.300 --> 00:02:44.520
and this pointer will tell the program to execute the

00:02:44.520 --> 00:02:47.060
function from the Instrument class.

00:02:47.840 --> 00:02:51.130
So these virtual tables help the program find the

00:02:51.130 --> 00:02:53.360
correct virtual function in memory.

00:02:53.740 --> 00:02:56.110
And we don't have to manage these tables,

00:02:56.120 --> 00:02:58.630
they are generated by the compiler and used automatically at run time.

00:02:58.670 --> 00:03:03.460
The overhead I mentioned earlier comes from the virtual pointer.

00:03:04.140 --> 00:03:06.180
Whenever you define a virtual function,

00:03:06.190 --> 00:03:09.480
every object of that base class and of all its derived classes needs

00:03:09.480 --> 00:03:14.250
to store an extra hidden member, known as the virtual pointer.

00:03:14.640 --> 00:03:16.270
We don't manage this pointer,

00:03:16.280 --> 00:03:21.060
but we have to be aware that it increases the size of every object.

00:03:21.440 --> 00:03:24.570
The only purpose of this hidden pointer is to point to the

00:03:24.570 --> 00:03:27.560
virtual table that belongs to that class.

00:03:27.940 --> 00:03:31.750
This is how a program gets access to the correct virtual table.

00:03:32.140 --> 00:03:36.870
And this pointer is inherited just like any other member of the base class,

00:03:36.870 --> 00:03:40.360
so all of the derived classes also have it.

00:03:40.740 --> 00:03:42.730
So let's say again that I'm calling the play()

00:03:42.730 --> 00:03:46.560
function from the Instrument pointer, which points to the Guitar object.

00:03:47.140 --> 00:03:52.750
This pointer is only aware of the inherited part of this Guitar object.

00:03:53.140 --> 00:03:55.200
As far as this pointer is concerned,

00:03:55.210 --> 00:03:59.710
this object is just an Instrument, and it only has these two members,

00:03:59.720 --> 00:04:02.660
the Boolean value and the virtual pointer,

00:04:02.670 --> 00:04:06.060
because the number of strings is declared in Guitar, not in Instrument.

00:04:07.440 --> 00:04:10.790
But that's great because the Instrument pointer is still

00:04:10.790 --> 00:04:13.250
aware of this virtual pointer member,

00:04:13.260 --> 00:04:16.130
and since this virtual pointer belongs to the Guitar object,

00:04:16.140 --> 00:04:19.860
that means that it is pointing to the Guitar's virtual table.

00:04:20.240 --> 00:04:24.320
And since the Guitar's virtual table has the correct function pointer,

00:04:24.330 --> 00:04:26.960
the overridden function will be used.

00:04:27.740 --> 00:04:29.500
That's a good idea, right?

00:04:29.500 --> 00:04:31.510
But as I already mentioned,

00:04:31.520 --> 00:04:36.140
using virtual functions adds a size overhead because we

00:04:36.140 --> 00:04:39.260
need to store this virtual pointer in every object.

00:04:39.640 --> 00:04:43.360
To show you this, I will print out the size of every class.

00:04:44.340 --> 00:04:48.860
Before we added virtual functions, the Instrument class needed only one byte of data.

00:04:49.240 --> 00:04:54.660
And now you can see that all of these objects suddenly need 16 bytes of memory.

00:04:55.740 --> 00:04:58.310
Since the virtual pointer is just a pointer,

00:04:58.320 --> 00:05:01.630
each object needs an additional 8 bytes of memory,

00:05:01.750 --> 00:05:04.260
at least on my 64‑bit computer.

00:05:04.840 --> 00:05:08.500
So if the Instrument class only required 1 byte,

00:05:08.510 --> 00:05:11.350
shouldn't the size now be 9 bytes?

00:05:11.840 --> 00:05:12.400
You might expect so,

00:05:12.400 --> 00:05:15.550
but the reason for this bigger increase is

00:05:15.550 --> 00:05:18.260
something known as structure padding.

00:05:18.640 --> 00:05:19.530
As I mentioned,

00:05:19.540 --> 00:05:22.240
all of the object members have to be stored next to

00:05:22.240 --> 00:05:24.460
each other in one block of memory.

00:05:24.840 --> 00:05:29.360
But the total size of this block will depend on the alignment of data.

00:05:29.740 --> 00:05:33.380
In essence, the CPU processes data in cycles,

00:05:33.390 --> 00:05:37.650
and its so‑called word length determines the maximum number of

00:05:37.650 --> 00:05:40.860
bytes that it can process in each cycle.

00:05:41.240 --> 00:05:45.090
On 32‑bit machines, this word length is 4 bytes,

00:05:45.100 --> 00:05:49.160
and on 64‑bit machines, like mine, it's 8 bytes.

00:05:50.040 --> 00:05:53.000
This makes sense: if the word were only 1 byte,

00:05:53.010 --> 00:05:56.570
we would need 4 cycles to process one 4‑byte integer,

00:05:56.580 --> 00:06:00.360
but with an 8‑byte word we can process 2 of them in a single cycle.

00:06:00.840 --> 00:06:02.510
To achieve better performance,

00:06:02.520 --> 00:06:05.660
the compiler aligns an object's data in memory.

00:06:06.940 --> 00:06:09.250
If we just put members next to each other,

00:06:09.260 --> 00:06:12.040
the Instrument class would only need 9 bytes,

00:06:12.050 --> 00:06:15.760
but the virtual pointer would straddle two 8‑byte words.

00:06:16.140 --> 00:06:20.060
The CPU prefers to read each value whole, in a single cycle.

00:06:20.640 --> 00:06:21.770
To make this happen,

00:06:21.770 --> 00:06:25.820
the compiler moves this pointer into the next block of 8 bytes

00:06:25.820 --> 00:06:29.250
and adds 7 padding bytes after the Boolean.

00:06:30.240 --> 00:06:35.660
Now we can read the Boolean in one cycle and the whole pointer in the next one.

00:06:37.040 --> 00:06:39.680
And the same goes for the derived classes.

00:06:39.690 --> 00:06:43.330
Their virtual pointer is also moved to the next 8‑byte block,

00:06:43.340 --> 00:06:48.460
but the Boolean and the integer member both fit in the first block and can be read in one cycle.

00:06:49.240 --> 00:06:51.850
If you remove the virtual functions and check the

00:06:51.850 --> 00:06:54.130
size of the Guitar or Synth class,

00:06:54.140 --> 00:06:58.260
you might be surprised that it's 8 bytes instead of 5.

00:06:58.640 --> 00:07:02.640
This is because the processor also prefers data to be aligned so that

00:07:02.650 --> 00:07:06.350
each value starts at an address that is a multiple of its own size.

00:07:06.360 --> 00:07:07.190
In this case,

00:07:07.190 --> 00:07:10.850
a 4‑byte integer should start either at the beginning of an

00:07:10.860 --> 00:07:13.760
8‑byte block or exactly in its middle.

00:07:14.640 --> 00:07:18.750
So the compiler inserts 3 padding bytes after the Boolean to make this happen.

00:07:19.740 --> 00:07:22.490
I should note that the exact amount of padding is not specified by the language.

00:07:22.500 --> 00:07:24.870
It's decided by the compiler and the target platform,

00:07:24.880 --> 00:07:27.700
and just because it works like this on my computer,

00:07:27.710 --> 00:07:30.260
that doesn't mean that yours will do the same.

00:07:30.640 --> 00:07:34.140
The point of this little digression was to show you how this virtual

00:07:34.140 --> 00:07:37.770
pointer can increase the size of your objects,

00:07:37.780 --> 00:07:40.060
in this case, significantly.

00:07:40.440 --> 00:07:44.310
So don't use virtual functions just because the cool kids

00:07:44.310 --> 00:07:46.560
are doing it; approach them wisely.
