Transcript
Excerpt from the automatic transcription of the video generated by YouTube.
Alright, so hi everyone. My name is Ian. I work for Google, on Google+; I've been doing that for about eight months now, so it's still slightly confusing and weird. This talk is about event stream processing, as you can guess from the title. It has quite a bit of code in it, and that's all up on that GitHub link. That's the joind.in URL; please do go and comment on there, not just for me but for all the speakers you see. It's really useful for us because it tells us what we did right and what we did wrong, and it's also really useful for the organizers because it lets them see who really connected with the audience, the kind of things they should look out for next year, and the kind of problems they should look out for. And if you want to ping me, mba php.net will get to me, or grab me on Google+ or on Twitter.
So, one small plug before we start: if you're interested in things like event stream processing, you may also be interested in messaging, and I wrote a kind of essay, a tiny ebook thing of about 80 pages, over Christmas last year. It's on Leanpub; you can get it from that URL, or just search for the title, and it's free. I would love for anyone who is interested in messaging, or thinks they might be, to take a look at it and give me some feedback. I've already had a lot of great stuff, and I'll be updating it soon. So that's one small plug.
What I wanted to talk about was really a lot of things that have been of interest to me over the last few years. I've been looking at how you
can do prediction from the point of view of machine learning: support vector machines, document classification, things like that. So you build a model and then make a prediction based on it. And I started looking at messaging systems, at firehoses and event-based systems, and at how you could make decisions now. What I got interested in was this concept of maybe combining the two in some way: trying to make those kinds of decisions in near real time, based on data as it is moving rather than data in a bucket.
We're really good at analyzing data that's in a bucket. We've got databases where you can do incredibly complex queries, and you can do them very, very fast. Even with a traditional relational database, your MySQL and your Postgres and all the commercial ones, you can whack a bunch of SSDs and a lot of RAM in a machine and do huge queries really quickly. If your queries are too huge, you can use Hadoop and things like BigQuery and all these amazing big data tools we have, and you can process some seriously large volumes of data relatively fast. But it still takes time: you put the data into a bucket, you do some analysis, and you look at it. And when you look at it, what you're almost always doing is trying to decide what you should do next; you're trying to make a prediction about how things are going to happen.
So I wondered: could we take this stuff that we have with messaging systems, with queues, with real-time systems, and do some of our analysis as it's going along? That's the idea that I wanted to look at as the basis for this talk. I think we have to think about data in motion, and we have to think about that motion in
terms of the movement of events. An event is real simple: it's just a programmatic representation of an occurrence out in the real world or in another system. It's a thing that has changed. From my point of view, all an event is is a tuple, a list with some elements in it: A, B, C. It could be a complex object, it could be a string, it could be whatever, but it's effectively a list, and that's all we need to know about it to actually start doing things with it.
But an event on its own is not interesting. It's like a single record of a database table: there's not enough of it. Where it starts getting interesting is when we think about an event stream. An event stream is just a series of related events. Now, they don't have to be related; you could have a sort of non-homogeneous event stream, but that's not so common, and it's not what we're really focusing on here. So an event stream is the same kind of event, a series of them that has occurred over time with some kind of ordering, like a stock market price. So that would be the code for excellent, and that's their price at various points over time. It doesn't say anything about what that time is, and it doesn't say anything about how that's been generated, just that there is a stream of events.
When you have multiple streams of events that are related in some way, you have an event cloud. An event cloud is not just a set of streams, but a set of streams where there is some connection between them.
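As a rough illustration of these three definitions, here is a minimal Python sketch. It is not code from the talk: the ticker symbols, the prices, and the names `price_stream` and `latest_prices` are all made up for the example. An event is a plain tuple, a stream is an ordered series of the same kind of event, and a cloud is a set of related streams.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape for this sketch: (symbol, price).
# Per the talk, an event is just a tuple of elements.
Event = Tuple[str, float]


def price_stream(symbol: str, prices: Iterable[float]) -> Iterator[Event]:
    """Yield one event per price, preserving the order in which they occurred."""
    for price in prices:
        yield (symbol, price)


# An event cloud: several related streams, keyed here by made-up ticker symbols.
cloud: Dict[str, List[Event]] = {
    "AAA": list(price_stream("AAA", [10.0, 10.5, 10.2])),
    "BBB": list(price_stream("BBB", [99.0, 98.5])),
}


def latest_prices(event_cloud: Dict[str, List[Event]]) -> Dict[str, float]:
    """Return the most recent price seen on each non-empty stream."""
    return {
        symbol: events[-1][1]
        for symbol, events in event_cloud.items()
        if events
    }


print(latest_prices(cloud))
```

Note that the stream carries ordering but no timestamps, which matches the description above: the events say nothing about what the time is or how they were generated.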
So a stock market itself is an event cloud. We think of something like an index, so this is the price for the FTSE 100. What that represents is a series of streams of the prices of different stocks. They're independent: if one company goes out and does something amazing, then maybe their price will go up; if they do something bad, maybe their price will go down. But changes can affect all of the stocks in an index; they can cause some to go up and others to go down, and there's a correlation, a relation between them. So there are certain things you can extract. If you have a bad economic outlook, maybe everyone goes down. If you have an index tracker, how they work is by investing in stocks in rough proportion to how much of an index they make up. So if a stock becomes less valuable and makes up less of an index, then maybe that index tracker will sell it and buy other ones, and that causes a rippling change outwards. Analyzing these things is therefore pretty complicated. If you
have a cloud of events, trying to extract useful things from them is hard, which is why a field called complex event processing developed. Complex event processing is a world of big, chunky services. Some are open source, such as Esper, which is an excellent open source project you should check out, designed to do clever queries across a series of event streams. So you could do something like look for instances where an item has been placed onto a shelf, so it's been scanned on a shelf in a shop, and it's gone out the door, which you've tracked with RFID door sensors, but it hasn't gone through a checkout, so it's been nicked. That you can do as a query, and you write it either in a kind of SQL-like language or, in a lot of CEP systems, with a visual query builder, because the queries are just complicated. They're too hard for humans to reason about and easily code up, so you build them in a more straightforward drag-and-drop way, where you can see the flow of events.
So these are, you know, big, generally expensive systems from people like Informatica and TIBCO and huge names like that. One of those systems was from a company called StreamBase, and this chap Darach Ennis worked there. If you know the London Erlang or Java or, these days, Node communities, or many others, you may have encountered him. He's a really, really smart guy, a really nice guy. When he left StreamBase, he started looking at the idea of event processing and complex event processing, and realized that for all the complexity in there, there's a good amount you can do with just a tiny bit of the work. You can do a Pareto principle thing and say, I can get eighty percent of the ability to look at all these streams and do something with just twenty percent of the work.
So he did that, and he built a little library on Node.js called EEP, embeddable event processing. What this did was look at all the parts of a CEP system, of an event processing system, and identify which bits are straightforward to implement, and therefore which bits we can do really fast. One of the things Node is good at is asynchronous, callback-based, event-driven applications; it can do those really quickly. And he figured if you can do something very, very fast, that's almost
[ ... ]
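The shoplifting query described in the extract (an item scanned onto a shelf, later seen by the door sensors, never seen at a checkout) can be sketched in plain Python. This is only an illustration under stated assumptions, not how Esper, StreamBase, or EEP express it: the event kinds `"shelf"`, `"checkout"`, and `"door"`, the tag IDs, and the function name `unpaid_exits` are all hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

# Hypothetical event shape for this sketch: (kind, item_id),
# where kind is one of "shelf", "checkout", or "door".
Event = Tuple[str, str]


def unpaid_exits(events: Iterable[Event]) -> Iterator[str]:
    """Yield the ID of each shelved item that reaches the door without a checkout.

    Events are processed one at a time, in order, so the check happens
    while the data is moving rather than after it has been stored.
    """
    shelved: Set[str] = set()
    paid: Set[str] = set()
    for kind, item_id in events:
        if kind == "shelf":
            shelved.add(item_id)
        elif kind == "checkout":
            paid.add(item_id)
        elif kind == "door" and item_id in shelved and item_id not in paid:
            yield item_id


events = [
    ("shelf", "tag-1"),
    ("shelf", "tag-2"),
    ("checkout", "tag-1"),
    ("door", "tag-1"),
    ("door", "tag-2"),
]
print(list(unpaid_exits(events)))
```

A real CEP engine would add things this sketch leaves out, such as time windows and expiry of old state, which is part of why those queries get hard to reason about by hand.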
Note: the other 4,119 words of the full transcript have been omitted to comply with YouTube's "fair use" rules.