Transcripción
Extracto de la transcripción automática del vídeo realizada por YouTube.
okay good afternoon um first one when you're coming down I put some stickers there so feel free to take them I've got plenty of stuff to share with everybody so please ask questions and then you get a t-shirt or maybe a book I don't know okay so
I can hear myself it's kind of strange anyway um expert troubleshooting so I'm going to talk about some things on troubleshooting with MySQL this is like the table of contents so I'll talk a little bit about problems for types of problems and what
the problem is with problems I'll talk a little bit about instrumentation different ways of instrumenting for my squirrel but also a little bit more general about different aspects like response time capacity things like that then I'll go into tree
different types of troubleshooting that we often have to do in pro kona as part of our consulting business is individual slow query troubleshooting seeing how we can optimize individual queries and global performance problems sometimes you have just the slow
server because of some reason and a very interesting one is the intermittent performance problems so performance problems that happen at very random times for a very brief amount of time sometimes that's really a problem for some applications costs a lot
of money so there's a it's difficult to troubleshoot but i'll show you some tools on how you can actually yeah get more insight in that and try to find a solution for it i'll use this thing here okay so the problem so the thing i see a lot
is is when when some of our customers come to us they say I hey I've got a problem what what's the problem here so usually we say like we hear things like hate the website is down or the database doesn't work anymore and then it's like sometimes
it's like really what what's wrong I mean what's really going on here and and and usually the information that we get from people is kinda yeah scarce or maybe not really what the real problem is so if the most important thing here is is it's
hard to find out what really the problem is and in this case yeah I can't login that's that's one example and go and I'll show you some more about it a little later on another one is we hear is like Tom doesn't respond anymore how do you
know I've had several discussions with some of my friends on IRC like they have some problems and then we try to troubleshoot it just for fun and we try to help them and and then it's like okay how do you know that Tom car doesn't work anymore
because I don't know did you check the error logs I didn't did you have so many money drinks nope so there's a lot of times we don't have the right instrumentation the right tools that right data to actually find out what is really going on
is there something going on is there any change of behavior with compared to things that were happening and yeah just before the problem happens things like that also another problem is is guessing sometimes they say okay we're this is a tomcat example
here I like to use Tomcat as an example it's probably something to do with the connection pool okay why kind of a problem so the thing is is finding out the real problem it's hard what's really going on the problem with those definitions about
these things is that we kind of limit ourselves when we talk about some problem with a customer we try to yet not try to listen to what they actually already think is going on when they say hey it's probably tomcat connection pooling we try to say hey
that's not not not something that we want to hear we want to know what do you see what do you what is the application how is it behaving because otherwise we kind of limit and we our area focus we kind of narrow down the vision that we have and we kind
of only focus on connection pooling then but there's like no real indication that there's something going wrong with connection pooling so the lesson number one is don't trust anybody including yourself and guessing is also kind of lying to yourself
so that that's an important lesson that I've heard over the last couple of years so we're talking all about measuring an instrumentation here so you've got to measure like a boss so instrumentation very important there were some talks yesterday
about instrumentation as well sorry about that I need to drink sometimes so it's the branch of mechanical engineering that deals with measurement and control so one famous quote here is you can't control what you can't measure so that's a very
important thing we need to measure we need to have the metrics we need to know all kinds of things the example here is try to use this can you see that the laser not really okay so if we look at a car we all know how fast is our car going how far did I go
since I last got some new gas how much fuel am I consuming these are all kinds of metrics that we have and we should have that any our application as well but also in our database and very important to be able to understand the real problem so why do we need
to instrument in a lot of cases there's a a guess and that guest I said you're kind of lying to yourself but in many ways you're right you probably are experienced in that area so you might think that this is the problem and maybe you're right
most of the time the example of like logging is slow so we've got logging on a website takes 15 seconds so we've gots a bunch of web service and we've got a mysql database okay so where do you think the problem would be the first guess always is
its mysql there's only my one my SQL box so can't be the web servers you know there's many of them it scales and everything so okay you're thinking the database probably right there's a lot of challenges with databases compared to applications
so yeah we're good but we kind of need to know what's going on here so proving important thing that we don't often see in applications is that you try to need to monitor instrument what your application is doing how much time it is spending at
various types of tasks i think it's doing and one way to represent this is for showing afro like this so where you got submitted a login form and that's the amount of time that it takes so it's a sequence diagram so this only takes a small part
so checking if the user exists goes really fast but somehow we kind of update the last login date of a user and that seems to take a lot of time so by having some kind of instrumentation like this it's I don't know it's not easy to make something
like this in your application server but at least knowing at which stages of your application or your complete architecture it is spending time on how much time you kind of can figure out ok we spent most time here so if we want to make things faster or if
you want to fix our solution it's probably something to do with updating the last login date ok so that takes a lot of time so why are why is law it why is this taking so long so this is the query that's being used so we're updating users and we
said the last login date to now where ID so user ID so a very simple thing yeah probably very easy one ID is probably primary key so there should not be a table scan here so it should be pretty fast schema is also very very simple very straightforward so we've
got ID primary key a lot of cons and we've got last login date which is one of the columns now if you do show full process list what we see is an it's a little hard to actually read but we can see that the second query is the one that is updating the
users and setting a last login date for one particular users and all the other users are actually locked they're waiting waiting until the other one completes so in this case it's using my eyes m and my isaam has stable level locking and you can only
write one one query at the same time so only one right can happen at the same time so that kind of yeah is a kind of a problem but you can see that this takes two seconds so this is the time that the query is running so two seconds is quite a lot to just update
that small column why is that so here we understand that we can see by running show process list that this is causing a problem if we would like just look at general metrics like uptime load cpu usage we could see like okay load is 0 dot 88 it doesn't
mean that even though there's a lot of things going on here and a lot of locking and I love waiting here doesn't mean that cpu must be like one hundred percent use Lotus very low here so by looking at global metrics you don't always get the information
you need or you don't always see that yeah the problem is not in the database because there's no cpu load that does not always true in many cases is it it is not another thing we could do is we could do show global status like slow query see how many
slow queries we had over a certain amount of time and we can see that there were only three slow queries since that was since the start of the mysql database however slow queries by itself we're configured long query time set to one second so you could
assume that if it takes longer than one second it should be a slow query but that's not true because lock time as we can see here they're all locked that time doesn't count for this long query time so another problem another thing that you have
[ ... ]
Nota: se han omitido las otras 4.702 palabras de la transcripción completa para cumplir con las normas de «uso razonable» de YouTube.