Caching Techniques – Web Development

Okay, I’d like to go over each of the approaches we’ve talked about and the different properties they have. The first approach was no caching: on every page view we were doing a DB read, and when we submitted a new piece of art there were no DB reads. Then we did what I would describe as the naive caching approach, which was basically a simple cache: if the cache is empty, do the DB read, and if it’s not, return the cached result. We’ll add a third column on the edge here called Bugs. So naive caching only does a DB read on a cache miss, doesn’t do any reading on a submit, and is full of bugs. [LAUGH] Or at least it has one bug: the front page would become out of date, or stale.
So then we started clearing the cache on every submit. This has the same properties, a DB read on a page view (the first one after each submit) and no DB read on a submit, but it has no bugs. Then we improved to refreshing the cache. This means we’re no longer doing any DB reads on page views, or only very rarely: basically only when our app first starts, the cache is empty, and that first page view comes in. Every other page view after that is cached, which is a really nice property to have. We do one DB read per submission, and it works.
Now, the difference between the earlier approaches and this one is that a page view hardly ever hits the database, and that’s a really nice property to have. You should always strive for the situation where a normal, not-logged-in, basic viewer of your website doesn’t touch the database. I’m going to condense that down to the notion that simple users shouldn’t touch the database. They’re just lurkers, they’re just reading. They’re not changing the site, so they shouldn’t be touching the database. Everything should be cached and ready to go for them.
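To make the properties above concrete, here is a minimal Python sketch; it is not the lecture’s code. FakeDB, FrontPage, run, the "front" key and the plain dict standing in for memcache are all hypothetical. It counts how many DB reads happen during page views versus during submissions for each of the four approaches, and whether the last page view shows the newest art.

```python
class FakeDB:
    """Hypothetical in-memory stand-in for the database; it counts reads."""

    def __init__(self):
        self.arts = []
        self.reads = 0

    def put(self, art):
        self.arts.append(art)

    def top_arts(self, limit=10):
        self.reads += 1
        return list(reversed(self.arts))[:limit]  # newest first


class FrontPage:
    def __init__(self, db, strategy):
        self.db = db
        self.strategy = strategy  # "none", "naive", "clear" or "refresh"
        self.cache = {}           # a plain dict standing in for memcache

    def view(self):
        if self.strategy == "none":
            return self.db.top_arts()              # DB read on every page view
        if "front" not in self.cache:              # cache miss: one DB read
            self.cache["front"] = self.db.top_arts()
        return self.cache["front"]

    def submit(self, art):
        self.db.put(art)                           # the write always happens
        if self.strategy == "clear":
            self.cache.pop("front", None)          # next page view re-reads the DB
        elif self.strategy == "refresh":
            self.cache["front"] = self.db.top_arts()  # one DB read per submission
        # "naive" does nothing here, so the cached front page goes stale (the bug)


def run(strategy, views=5):
    """Submit two pieces of art, each followed by several page views."""
    db = FakeDB()
    page = FrontPage(db, strategy)
    view_reads = submit_reads = 0
    shown = None
    for art in ("art1", "art2"):
        before = db.reads
        page.submit(art)
        submit_reads += db.reads - before
        for _ in range(views):
            before = db.reads
            shown = page.view()
            view_reads += db.reads - before
    correct = shown == ["art2", "art1"]
    return view_reads, submit_reads, correct
```

With the defaults, run returns (view reads, submit reads, correct): (10, 0, True) for "none", (1, 0, False) for "naive", (2, 0, True) for "clear" and (0, 2, True) for "refresh". Clearing and refreshing cost about the same number of reads here; the difference is who pays for them, a reader on the next page view or the person submitting.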
That makes the user experience better, because the request is faster, and it keeps your load down, because you can add many, many of those users. Since they’re just bouncing off the cache, you don’t have to do very much work to serve them; you don’t actually have to hit the database.
Now, there’s a fourth approach that we haven’t implemented yet, which is the most aggressive of all of these. To distinguish it from refreshing the cache, I’m going to call it updating the cache. I’ll talk about this approach in just a sec, but with it we can get to the state where, on a simple page view, we do zero DB reads ever, which is slightly better than rarely. We don’t do any database reads on submission either, and it works. This is a really nice property to have. Of course, we still do our database writes. You’ll notice we haven’t been optimizing writes at all, because you’ve got to store the submission at some point. But you can cut the database reads on both page views and submissions down to zero by keeping your cache completely up to date, and I’ll show you how we might do that right now.
Okay, we’re going to look at this picture one more time. We know all the pieces now: the user, ASCIIChan, the database, and our cache. Let’s talk about a new situation. Let’s pretend our cache is already warm; it’s got some pictures in there. The user reads the front page, which hits our cache, which returns the result that we send back to the user. Nothing too complex there; we’re not hitting the database because we’re only doing reads. Now, when we do a database write, we’re going to send that write to the database and simultaneously send it to our cache as well.
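A minimal sketch of updating the cache, under the same kind of assumptions: FakeDB, FrontPageCache, FRONT_KEY and MAX_ARTS are hypothetical names, and a dict stands in for memcache. The front page is read from the database once to warm the cache; after that, every submission writes to the database and then edits the cached page directly.

```python
FRONT_KEY = "front"   # hypothetical cache key for the front page
MAX_ARTS = 10         # hypothetical front-page length


class FakeDB:
    """Hypothetical in-memory stand-in for the database; it counts reads."""

    def __init__(self):
        self.arts = []
        self.reads = 0

    def put(self, art):
        self.arts.append(art)

    def top_arts(self, limit=MAX_ARTS):
        self.reads += 1
        return list(reversed(self.arts))[:limit]  # newest first


class FrontPageCache:
    def __init__(self, db):
        self.db = db
        self.cache = {}  # a plain dict standing in for memcache

    def warm(self):
        # The only DB read: run once at startup, by the app or a separate script.
        self.cache[FRONT_KEY] = self.db.top_arts()

    def view(self):
        if FRONT_KEY not in self.cache:  # safety net if nobody warmed the cache
            self.warm()
        return self.cache[FRONT_KEY]

    def submit(self, art):
        self.db.put(art)  # write to the database first...
        if FRONT_KEY in self.cache:
            # ...then edit the cached page in place: no re-read, no clearing.
            self.cache[FRONT_KEY] = ([art] + self.cache[FRONT_KEY])[:MAX_ARTS]
        # If the cache is cold, the next view() warms it from the database,
        # which already contains the new art.
```

After one warm() call, any number of submissions and page views leaves the read counter at 1. The lecture describes the two writes as simultaneous; this sketch orders them database first, so if put raises an exception, the cache never shows art that was not stored.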
This gets a little bit more complex. We send the write to the database, and instead of immediately rereading from the database to update our cache, or clearing our cache, we just update the cache. We say: okay, this affects the front page, so let’s find the front page in the cache and insert our new piece of ASCII art into it. From then on, the follow-up request, the redirect to slash, is going to bounce right off the cache again. So we never did a database read during this whole process; we’re only writing. The only time we would do a database read is when we start up the app for the first time and serve the first request, or maybe we have a program that does that for us. So no user ever does a database read.
This is exactly how we do it on Reddit now. Every listing you can look at is stored in its own cache: we have a different cache key for every listing, for every sorting, for every subreddit, etcetera. When you submit a link or vote, we have to update all of the cache keys for the listings that could be affected by that action. So this introduces a trade-off of complex inserts versus database reads. On the flip side, users are pummeling the site all the time and they never read from the database. So we get speed in exchange for complex inserts, which is nice, but complexity is complexity. On ASCIIChan we probably don’t need to do this right now; our site just isn’t at that scale. A cache stampede isn’t a realistic threat because we don’t have that many users. But if we did, this is the kind of approach we’d want to take, and this is the name of the game when caching.
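The Reddit-style version, where one action touches many cache keys, might look like the following sketch. Link, FakeDB, ListingCache, the "scope:sort" key format and the two sorts ("new" and "top") are all hypothetical and are not Reddit’s actual schema. A submission is inserted into every listing it can appear in, and a vote rewrites the link’s score in each of those listings and re-sorts them, all without a database read.

```python
from dataclasses import dataclass


@dataclass
class Link:
    id: int          # hypothetical: ids increase over time, so "new" sorts by id
    subreddit: str
    score: int = 0


class FakeDB:
    """Hypothetical stand-in for the database (only writes in this sketch)."""

    def __init__(self):
        self.links = {}

    def put(self, link):
        self.links[link.id] = Link(link.id, link.subreddit, link.score)


class ListingCache:
    SORTS = ("new", "top")

    def __init__(self, db):
        self.db = db
        self.cache = {}  # "scope:sort" -> list of (score, link_id) in display order

    def affected_keys(self, link):
        # Every listing this link can appear in: global and its subreddit, every sort.
        return [f"{scope}:{sort}" for scope in ("all", link.subreddit)
                for sort in self.SORTS]

    def _store(self, key, entries):
        if key.endswith(":new"):
            entries.sort(key=lambda e: e[1], reverse=True)  # newest first
        else:
            entries.sort(reverse=True)  # highest score first, ties go to the newer id
        self.cache[key] = entries

    def submit(self, link):
        self.db.put(link)  # the write we can't avoid
        for key in self.affected_keys(link):
            self._store(key, self.cache.get(key, []) + [(link.score, link.id)])

    def vote(self, link, delta):
        link.score += delta
        self.db.put(link)
        # Every entry for this link carries its score, so each one is rewritten;
        # only the "top" order actually changes.
        for key in self.affected_keys(link):
            updated = [(link.score, i) if i == link.id else (s, i)
                       for s, i in self.cache.get(key, [])]
            self._store(key, updated)

    def listing(self, scope, sort):
        # Page views read only from the cache.
        return [link_id for _, link_id in self.cache.get(f"{scope}:{sort}", [])]
```

To keep it short, listings here are never truncated. A real listing would be capped at some length, and then a downvoted link ought to let the next-best link back in, which the cache alone cannot know; that is the kind of extra complexity a fully accurate cache demands.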
If you want to keep the cache totally accurate without doing database reads, you’re going to have to write more complex code. So one thing to keep in mind: the more accurate the cache, the more complex the code. These are the decisions you’ll make as you build your website. And as you’re scaling, this is probably the ultimate solution to look to when you’re caching a database, if the solution works for you.

5 thoughts on “Caching Techniques – Web Development”

  1. But how do you validate that the data is written to the database while a POST request is sent…
    What if the data gets updated in the cache and is never written to the database?

  2. I guess the DB write is async. What if the database write fails for a long time? Does the system have replicated cache nodes? Does the cache have any kind of persistence?
