DrupalCon Portland 2013

Automatically Testing Your Drupal Infrastructure

Barry Jaspan


Excerpt from the automatic transcript of the video, generated by YouTube.

So, I'm assuming everyone's had enough of PowerPoint. I'm going to be using the latest in presentation technologies, known as less, for today's presentation. So less is more, and this is a talk about DevOps-y stuff, so: command line. OK.

So I'm talking about testing infrastructure. It's DrupalCon, so I'm talking about testing Drupal infrastructure. This talk is going to be about what it means when you are configuring servers to do things, to run your app, and the server infrastructure has some amount of functionality in and of itself, and how one tests that. Lots of people have given excellent talks and written a lot about how you test your website; there are lots of tools for that. That's not what this is about. This is about actually making sure the infrastructure behaves the way you think it will.

So, the first question... we have an advanced page-forward technique. How's that? Good. OK. So first I'm going to talk about what a Drupal infrastructure

looks like. It starts off... whoops... we all kind of start here, right? One server, you're running Apache and MySQL, and it works pretty well. If this is your infrastructure, there's not a whole lot you need to do, though I think you'll see before I'm done that there's still some stuff, even here, you'd want to test.

But very quickly, assuming your site becomes more successful, you go beyond this. So the first thing you might do is say, well, let's separate our web server and our database server so we can scale them both independently and bring more resources to bear. And then perhaps after that you realize you need more than one web server... sorry, I got my slides wrong. Maybe you want a DB slave server, so you can offload some of your heavy read queries. Then you want multiple web servers, so now of course you need a load balancer spanning your multiple web servers. And of course, if you have multiple web servers and you're using Drupal, which tends to have the files directory written in the file system somewhere, you need some kind of file server. Whether that's NFS, or perhaps you're using S3, or whatever it is, you've got to store your files somewhere.

And that's great, this works pretty well. But now, at this point, your system is running on six or seven servers. Probably that means you're getting relatively successful, and someone's going to ask you: hmm, why do we have single points of failure? So you get into the HA game. You start off, maybe you have two load balancers in case one of them goes down. Eventually you need your file system to be highly available, and then you do your database. I'm not talking today about how you implement a master-master database with failover, but there's a variety of ways, and you do one of them.

You have a system where you have all these moving parts now, right? Multiple web nodes, some form of failover or redundancy or replication or something across all these components. And now this is a pretty complex system, and I haven't added in memcache or Redis and worker queues. Drupal 7 supports job queues, and so you need queue workers, and you end up with a fairly hefty infrastructure. So this is what we're talking about testing: the stuff in this picture, not the app itself.

So, server configuration is software. When you have one server, you can run apt-get install Apache

and your site will come up, and you can install MySQL, that's not too hard, and you can edit your vhosts file by hand. But pretty soon, especially when you've got that many machines going, you're going to realize you want to automate the configuration of your servers.

So there are tools that let you do that, and they're pretty good. At Acquia Cloud we happen to use Puppet, but Chef is out there, and Ansible, and there are some others, and there are new ones coming out all the time. Generally speaking they're great, and they do fairly similar things: they can install packages and cron jobs and file permissions and users and whatever. But the important thing is that the whole value of those tools is that you're turning your server configuration from a manual task that your sysadmin performs into software. Software has bugs. That means software must be tested, like all software.

If you go read the principles of continuous integration on any of the million web pages that write about them, you will see that the ones pertaining to testing tell you that your tests need to be automated, they need to be as fast as possible, and they need to run in a clone of the production environment. It does no good to run your tests, or it does less good to run your tests, in something that's not like your production environment, because obviously you test in something that's not your production environment, then you push your code to production, and it's not going to work. I wanted to put

"Jaspan's law," but that seemed a little pretentious. So I have this aphorism, which says: if it isn't tested, it doesn't work. And I cannot tell you how true that is.

Brief example. On Acquia Cloud, customers can download their various logs: Apache access and error logs, and the PHP error log, and their MySQL slow log, and all that sort of stuff. We recently rolled out a change in the way we implemented that, and after we rolled out the change, the only log users could download was their MySQL slow log. All the others didn't work. Turns out we had a test of our log downloading system that tested the MySQL slow log, and so, guess what, that one worked. What we discovered was that we had messed something up where our testing environment wasn't complete. It wasn't an exact replica of production, and the mode on the file was different from what was expected, so you couldn't download them. So we quickly identified the problem and rolled out a fix, and then all of the logs except one worked, because we tested all of the other logs except that one. If it's not tested, it doesn't work.

So... whoops, that's not what I meant. There we go. So I want to take a brief diversion and talk about unit tests and system tests. A lot of people who talk about testing talk about

unit tests. Unit tests are great; we should all have them. The way unit tests work is they isolate individual components and they test those. If you have dependencies, like if you have a module that talks to a database, you inject a fake database that hard-codes the results it's going to return. So you can say: when I call this function, I expect this query, I'll simulate a return value, and then the rest of the code should behave in a predictable way.

They're great, but it turns out that they don't really work for infrastructure that well. I mean, they can test some of your code, but you cannot mock out the entire real world. You cannot mock out the operating system. A very good example of this is something we encountered recently, where

we use Puppet, and we install six or seven cron jobs on one of our types of servers. It's been working great for years, and then just recently, in the last month or so, one of our tests failed. I went and dug through the logs, and dug and dug and dug, and eventually I discovered that it looked an awful lot like one of our cron jobs just hadn't run at the exact moment that our tests expected it to. And I discovered this great little bug in cron.

When you use the crontab program to install a new cron file, what the crontab program does is write it into /var/spool/cron/crontabs/username, and then the file, I think it calls it crontab, or maybe it names it after the username, I don't remember. The cron daemon, at the top of every minute, wakes up and stats all the files in those directories. If a file's mod time has changed, it says, oh, the file's new, loads in the new file, and then does whatever it says to do.

Because we're using Puppet, we have six or seven cron jobs that get installed automatically. Puppet is relatively fast at doing that, and it installed all six of those cron jobs within the one second at the beginning of the minute. So between 00:00 and 00:01, after Puppet had updated four of those six cron jobs, the cron daemon woke up, statted the file, said, oh, it's changed, and loaded it in. So the cron daemon loaded in the new file, and then within the same second Puppet wrote two more cron jobs to that file. One minute later cron woke up and statted the file. File mod times have a resolution of one second, so cron said, oh, the file hasn't changed, and didn't load it in.

There's no way any unit test of your software is ever going to catch issues like that. Our system tests didn't catch it for a year or more, but eventually it turned up, and so we were able to fix that, and that's one more cause of random failures that we don't have.

So the important thing about system tests is they're end to end, right? We write our system tests to test our infrastructure as the application

will, as a Drupal site will. So we launch real servers. We happen to run in EC2, so we launch real EC2 servers. We set real DNS entries for them with our DNS vendor's API. Everything that the Drupal site can do, we exercise in the tests themselves, and I can talk about exactly what those things are in a few minutes. But the point is that we're operating outside the environment; we're basically doing black-box testing of our infrastructure.
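The cron race described a moment ago comes down to a change detector that compares modification times at one-second resolution. Here is a minimal Python sketch of that failure mode. It does not touch cron or Puppet at all, and it is not a claim about how any particular cron implementation is written; CoarseMtimeWatcher, simulate_race, and the job names are hypothetical, and timestamps are passed in explicitly so the run is deterministic.

```python
class CoarseMtimeWatcher:
    """Reloads a file only when its whole-second mtime differs from the last one seen."""

    def __init__(self):
        self.last_seen_mtime = None
        self.loaded_jobs = []

    def wake_up(self, file_mtime, file_jobs):
        # Truncate to whole seconds, mimicking a one-second mtime resolution.
        mtime = int(file_mtime)
        if mtime != self.last_seen_mtime:
            self.last_seen_mtime = mtime
            self.loaded_jobs = list(file_jobs)  # snapshot of the file right now
            return True
        return False


def simulate_race():
    watcher = CoarseMtimeWatcher()
    jobs = []
    minute_start = 600.0  # arbitrary number standing in for the top of a minute

    # Four jobs are written early in the first second of the minute.
    jobs += ["job1", "job2", "job3", "job4"]
    mtime = minute_start + 0.2
    first_reload = watcher.wake_up(mtime, jobs)  # watcher sees a new mtime

    # Two more jobs land later in the same second; the truncated mtime is unchanged.
    jobs += ["job5", "job6"]
    mtime = minute_start + 0.7

    # On the next wake-up the watcher compares mtimes and sees no change.
    second_reload = watcher.wake_up(mtime, jobs)
    return first_reload, second_reload, watcher.loaded_jobs, jobs


if __name__ == "__main__":
    first, second, loaded, on_disk = simulate_race()
    print("reloaded on first wake-up: ", first)
    print("reloaded on second wake-up:", second)
    print("jobs the watcher knows about:", loaded)
    print("jobs actually in the file:   ", on_disk)
```

Nothing in the sketch is wrong in isolation, which is the point: the missing jobs only show up when something outside the system checks that every job actually ran.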

It's hard. There's an unbelievable number of race conditions. There's a number of things that come up that actually would have been fine in production, because you're writing a test and you want to make sure that, you know, you write your tests and say, make sure within 20 seconds that such and so happens. You're going to find out one day it's going to take 22 seconds, and your test is going to fail. Actually it would have been fine in production, but you had to put some timeout in, so then your test fails. So you go fix it, and you end up iterating over this process for a long time until you get all the timeouts just right.

These tests are the hardest thing I've ever done. I've been engineering since, you know, 30 years ago, and this is the hardest code I've ever written. But without them we could not run our infrastructure, because there's just no way we would ever keep it reliable, because there are so many little details, little ways that things can break.

OK, so what are

the kinds of things we do? So first of all, we build servers, right? We launch servers in the cloud and we configure them with Puppet and some other software. We always start from a known reference base image, so for example we start with the official Ubuntu 12.04 image. What we don't do is incrementally evolve images. We don't launch an image, install a package, and make a snapshot, and then next week say, oh, now we want Redis, so then we install Redis and make a snapshot. Because when you evolve server images that way, what you end up with is a server image that you don't know how to reproduce, and I guarantee you're going to someday forget to write down in your changelog file exactly what you did and exactly what config file you edited.
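Going back to the timeout problem described earlier (a check that must pass "within 20 seconds" and one day takes 22), system tests of this kind typically poll a condition until a deadline instead of asserting once. The following is a minimal Python sketch of such a helper, not the speaker's code. The names wait_until, FakeClock, and demo are assumptions made for illustration, and the fake clock exists only so the example runs instantly and deterministically.

```python
import time


def wait_until(condition, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll condition() until it is truthy or `timeout` seconds have elapsed.

    Returns the seconds waited; raises TimeoutError if the deadline passes.
    """
    start = clock()
    while True:
        if condition():
            return clock() - start
        if clock() - start >= timeout:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(interval)


class FakeClock:
    """Deterministic stand-in for the real clock (illustration only)."""

    def __init__(self):
        self.now = 0.0

    def monotonic(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def demo(timeout):
    clock = FakeClock()

    # Hypothetical condition: the thing under test becomes ready after 22 seconds.
    def ready():
        return clock.now >= 22

    return wait_until(ready, timeout, interval=1.0,
                      clock=clock.monotonic, sleep=clock.sleep)


if __name__ == "__main__":
    try:
        demo(20)
    except TimeoutError as exc:
        print("20-second budget:", exc)
    print("30-second budget: ready after", demo(30), "seconds")
```

Keeping the timeout as an explicit parameter makes the iteration described in the talk cheap: when a check that is fine in production starts taking 22 seconds, the fix is one number rather than a rewritten test.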

[ ... ]

Note: the remaining 5,838 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.