1 00:00:00,090 --> 00:00:04,650 What is the best strategy to delay your containers in order to keep them in a specific order, like doing 2 00:00:04,910 --> 00:00:06,840 DB, web, Rabbit, and so on? 3 00:00:06,840 --> 00:00:12,740 So the number one thing is, this isn't even a DevOps thing; this is a distributed computing thing, isn't 4 00:00:12,780 --> 00:00:13,070 it. 5 00:00:13,200 --> 00:00:17,400 I'm assuming you're talking about production, not a local development workflow with something like Docker 6 00:00:17,400 --> 00:00:18,360 Compose. 7 00:00:18,420 --> 00:00:23,130 But if you're talking about production, all your apps have to be able to fail or retry. 8 00:00:24,240 --> 00:00:31,170 So whether or not this is before Docker, basically whatever you're using there, it has 9 00:00:31,170 --> 00:00:37,030 to recover in some fashion from not being able to talk to other services outside of its own. 10 00:00:37,440 --> 00:00:39,780 And this is a core principle of distributed computing. 11 00:00:39,780 --> 00:00:49,940 In fact, if you look it up, a good resource here is 12 Factor, 12factor.net, which is sort of a 12 00:00:49,970 --> 00:00:55,920 decade-old set of principles around the mindset of cloud native and distributed computing. Those 13 00:00:56,310 --> 00:01:03,810 are two different but similar types of things. Really, it's about: if I've got a bunch of servers 14 00:01:03,810 --> 00:01:08,550 or a bunch of things that my servers have to talk to, how do I orchestrate all of those to be available 15 00:01:08,550 --> 00:01:09,960 when they need to be available? 16 00:01:09,960 --> 00:01:15,360 And the answer is you can't just control startup, right, because startup is only a part of the problem.
17 00:01:15,480 --> 00:01:21,990 When you have to replace a container, if the other containers lose connection to it, because a container 18 00:01:21,990 --> 00:01:26,460 or any other service goes down for a second, all those other services that are using it have to be able to 19 00:01:26,460 --> 00:01:27,350 recover. 20 00:01:27,390 --> 00:01:33,060 And so, unlike the old days where we had a single server and we put the database and the website on the 21 00:01:33,060 --> 00:01:39,240 same server, and that was always available and online until it went down, that was easy. 22 00:01:39,240 --> 00:01:43,470 But now, in this world where we have distributed computing, your containers and all of your services have to 23 00:01:43,470 --> 00:01:44,210 keep that in mind. 24 00:01:44,220 --> 00:01:51,270 So they either need to have a retry, which, if you're doing development, most develop... sorry, 25 00:01:51,270 --> 00:01:58,200 most database drivers have retry built into them, so they will actually retry. You know, 26 00:01:58,240 --> 00:02:03,600 like MongoDB in Node.js actually even has a buffer protocol where, if it can't connect, it'll hold the 27 00:02:03,600 --> 00:02:07,410 commands for a little bit to wait for the database to come back online. 28 00:02:07,530 --> 00:02:12,360 It's just built into the driver for your development language, so there's lots of stuff out there like 29 00:02:12,360 --> 00:02:12,670 that. 30 00:02:12,720 --> 00:02:17,150 And if your app doesn't do anything like that and it just fails, 31 00:02:17,280 --> 00:02:21,930 the nice thing if you're using container orchestration is that part of the job of that orchestrator 32 00:02:21,930 --> 00:02:26,760 is, if the container just crashes because it loses connection to something, then the orchestrator will 33 00:02:26,820 --> 00:02:29,840 restart it; it will basically start a new copy of that somewhere else. 34 00:02:30,030 --> 00:02:32,910 And that's one way to recover from failure.
35 00:02:32,940 --> 00:02:39,420 It's a little bit cleaner and less taxing on your systems if they just retry, but another way in Docker 36 00:02:39,420 --> 00:02:46,260 to do it is to just let your apps crash, essentially, and then Docker will restart them based on your 37 00:02:46,260 --> 00:02:46,690 settings. 38 00:02:46,710 --> 00:02:50,010 So I know that's probably not the little click-button 39 00:02:50,010 --> 00:02:50,910 answer. A lot of 40 00:02:50,910 --> 00:02:56,490 people might just answer, "Oh, you need to add retry to your Docker Compose" or something, but that's not 41 00:02:56,490 --> 00:03:01,170 a production solution, because that only has to do with the original startup. And if you 42 00:03:01,170 --> 00:03:06,780 even Google for something like wait-for-it scripts, those don't really solve the whole problem either, 43 00:03:06,810 --> 00:03:09,850 because, one of the things is, if you're going to start using containers, you're going 44 00:03:09,850 --> 00:03:15,780 to be updating them more often. That's part of the progress of implementing the DevOps mindset: things 45 00:03:15,780 --> 00:03:20,220 are going to be updated more often than they were in the past, because that's one of the core tenets 46 00:03:20,220 --> 00:03:23,220 of DevOps: continually evolving and improving. 47 00:03:23,250 --> 00:03:29,340 So when you start doing that, that means that any one piece of your puzzle has to be able to handle any 48 00:03:29,340 --> 00:03:35,100 other piece of the puzzle going down, and you can't really do that with startup order, if you know what 49 00:03:35,100 --> 00:03:35,710 I mean. 50 00:03:35,730 --> 00:03:37,020 Hopefully that helps. 51 00:03:37,300 --> 00:03:42,830 It's a tough problem to solve if you're dealing with legacy apps, but it's a continuum. 52 00:03:42,840 --> 00:03:48,870 You have to continually work on the process of getting your apps to all handle failure.
53 00:03:48,870 --> 00:03:50,630 Essentially, it's not an easy problem. 54 00:03:50,670 --> 00:03:52,290 It's a process to go through.
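The two recovery approaches described in the answer (retry inside the app, or crash and let the orchestrator restart the container) can be sketched roughly as follows. This is only an illustration, not something shown in the talk: the names `connect_with_retry` and `connect_or_exit`, the `connect` callable, and the delay values are all hypothetical, and a real database driver may already provide its own retry behavior.

```python
import random
import sys
import time


def connect_with_retry(connect, attempts=5, base_delay=0.5, max_delay=8.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is a hypothetical zero-argument callable that returns a
    connection object or raises OSError (ConnectionError is a subclass).
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except OSError as exc:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            # Exponential backoff, capped at max_delay, plus random jitter
            # so many containers do not all retry at the same instant.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay += random.uniform(0, delay / 2)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.1f}s",
                  file=sys.stderr)
            sleep(delay)


def connect_or_exit(connect, **retry_options):
    """Retry first; if the dependency stays down, exit non-zero.

    A non-zero exit is what lets a restart policy or an orchestrator
    notice the failure and start a fresh copy of the container.
    """
    try:
        return connect_with_retry(connect, **retry_options)
    except OSError:
        sys.exit(1)
```

Because the same function can be called on every reconnect, not only at boot, this covers a dependency going down mid-run, which is the case startup ordering cannot handle.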