July 8th, 2024 × #SEO #Sitemaps #Web Development
Perfect Sitemaps for SEO
Wes and Scott discuss why you need a sitemap, what should be in it, and how to generate and submit it properly for SEO.
- Wes is back from paternity leave
- Scott almost missed the boat to a conference
- Scott built a new website and realized he needs to optimize for SEO
- Scott wonders why we need sitemaps and what should be in them nowadays
- Sitemaps help search engines index and monitor pages better
- Sitemaps don't help ranking but help crawlers find relevant content
- Sitemap formats: XML, RSS, plain text file
- Plain text sitemap is just URLs, one per line
- Sitemap can be named anything, not just sitemap.xml
- But sitemap.xml is a standard worth following
- XML sitemap is most flexible and allows more metadata
- Last mod date is the only sitemap field both Google and Bing use now
- Bing uses sitemap priority but Google ignores it
- We should update the Syntax sitemap fields
- Getting all Syntax content indexed recently got much easier
- Parameters and future/unpublished pages should not be in the sitemap
- Only published, non-redirect pages should be included
- Ways to generate a sitemap: meta framework built-ins or custom route
- Hand-writing a large sitemap takes too much effort
- Store sitemap pages as data objects first before outputting as XML
- Validate sitemap with online tools before submitting it
- Submit sitemap to Bing and Google webmaster tools
- Cache sitemaps to avoid heavy DB and bandwidth loads
Transcript
Scott Tolinski
Welcome to Syntax. In this Monday Hasty Treat, we're gonna be talking about how you can have the perfect sitemap for your application, so that way you can land some SEO scores that are awesome, but not only that, get all your pages listed accurately and all the things we love about search engines. My name is Scott Tolinski. I'm a developer from Denver, and with me, as always, is Wes Bos. What's up? Hey. Hi. Good. I'm kinda back. I recorded one before this, but I was on paternity leave for a couple weeks. And,
Wes Bos
shout out to CJ for filling in there. But it feels good to be back in the horse
Wes is back from paternity leave
Scott Tolinski
and recording back in my office. It's real nice. It does feel good to be back in the horse. I also am back. I was at JS Nation and... Yeah. How's that? Amsterdam. Incredible. Amazing conference. Maybe the best conference I've been to in terms of, like, organization, structure, and all that stuff. One of the things I really love for me as a speaker is they put every single thing that I was supposed to be a part of on my Google Calendar. That way, I knew where I was supposed to be at all times. And for somebody like me, that's really, really necessary.
Wes Bos
I love that because you don't have to fight with the time zones either. You just look at your phone. Where do I need to be right now? Although, while I was there, I literally missed the boat.
Scott almost missed the boat to a conference
Scott Tolinski
So there was a boat to the boat, and I missed the boat to the boat. Like, I ran up to it, and it was like one of those cruise ship things. They're, like, leaving. But, luckily, one of the attendees there had a car, and he's like, don't worry. We'll drive to it. So we drove to the big boat. I honestly wish I would have missed the boat because that boat was hot, and there was a lot of people in it. And it was like... Yeah. It felt like ESLint. Yeah. It felt like it was going to sink. I was very nervous. But, you know, other than that, it was awesome, man. And then we got to explore Amsterdam for a week, and then we flew over to Italy and went to Venice and Florence. Like, I did the trip with my kids, so it was, like, a really cool experience for them getting to see all that stuff. Awesome. Yeah. But, also, one nice thing about being back is that I get to see all of the errors that occurred while I was gone in our Sentry, which always happens. So if you wanna be able to track all the stuff that happens while you're away, so that way you can fix it when you get back, or maybe get the alerts while you're on vacation and have them go to somebody else to fix, head on over to sentry.io. It tracks all your errors and the issues in your application, and it really makes solving bugs easy, is really what it comes down to. And this show is brought to you by Sentry. So thank you so much, Sentry.
Scott built a new website and realized he needs to optimize for SEO
Scott Tolinski
So let's talk about sitemaps. While I was on the plane to the conference, I decided to build a new website for myself as part of, like, my talk, and it's not done. It's still dusty. I haven't even touched the mobile layout for this thing yet. But what I wanted is I wanted to have all of my demos be, like, embeds in this website. And... Yeah. I've never had, like, a blog or anything. My websites have always just been very simple. And this time, I was like, well, if I'm gonna do all these demos for the talk, they're gonna all be embedded iframes. Let's kinda have this be a blog of some kind. So I made it a blog. And the next thing you know, I'm thinking, well, now I've got a blog.
Scott wonders why we need sitemaps and what should be in them nowadays
Scott Tolinski
I should be really thinking about SEO on this thing. So, sitemap time. And then it got me thinking, what about sitemaps? It's something that I've kind of just done on every site. I have a sitemap sort of route for SvelteKit that I just port over to each application, or I run a generator or something, and I just use a sitemap.
Scott Tolinski
And I began thinking about the individual fields. Like, what's actually necessary? What do I need in here? And how has it changed over time? Like, what is a sitemap doing for me? So I figured this would be a really great episode to just get into all about sitemaps. Why do you need them? What can you do with them? What do you even need within a sitemap in 2024?
Wes Bos
Yeah. That's good. And you built the initial sitemap for the Syntax website as well, and that was really nice because, not anymore, but probably over the last 6 months, I've been watching the Google Webmaster Tools trying to get our content indexed. Since we made, like, a pretty major shift in terms of, like, additional pages, there was a lot more content on the website than on the old one. It was kinda interesting to see, like, how do you tell Google, hey.
Wes Bos
There's now, what, an extra 1500 pages on this website.
Scott Tolinski
Yeah. And that's really the answer: the sitemap. So you might be wondering, why do I need a sitemap? What does it do for me? How does it improve SEO? All of those things. Well, first and foremost, a sitemap really allows for better indexing and monitoring of your indexed pages.
Sitemaps help search engines index and monitor pages better
Scott Tolinski
So it lets the search engines know exactly the map of your site, hence the name sitemap. It lets the search engines know the structure of your pages. And in turn, it allows the search engines to also be aware of potential changes and updates to certain pages and when to essentially relook at those pages.
Scott Tolinski
And with that improved idea of structure for your application comes along better SEO. Now a sitemap itself doesn't, like, help with the ranking itself. Right? Having a sitemap isn't just going to get you ranked high, but it's going to help the crawlers find the relevant content on your applications
Sitemaps don't help ranking but help crawlers find relevant content
Wes Bos
and understand, like, what the general structure of your application is without it having to guess. Right? Yeah. Yeah. You can't use it to trick Google into indexing pages that are not linked from anywhere. Like, Google still has to be able to find that this is a page you're telling me about, but where have you linked to it from? Right? Is it being linked from another website? Is it being internally linked from inside of the site? Like, for us, it was the transcript page. It was a brand new page, and I wanted all of that to be indexed because it's a lot of good information. And that's, like, very good for SEO if you're searching for a specific topic. In fact, I find that when I Google for a specific Syntax episode, often the transcript page will actually come up before the actual show notes page because the transcript page has literally every word we've spoken inside of it. Yeah. But, initially, I had a hard time getting those indexed by Google. And it was a mix of, like, how often is it updated, should I be crawling this page, and all the stuff we'll talk about today.
Scott Tolinski
Yeah. Totally. So first and foremost, let's get started. What kind of formats do you need, or can you use, for a sitemap? You know, I always personally just thought it was XML only. And looking into it, I'm surprised to see this. So according to Google and all the information out there, you can have a sitemap that's in XML, like what most of them are. You can also have a sitemap that's just RSS.
Plain text sitemap is just URLs, one per line
Scott Tolinski
So if you're publishing an RSS feed of your entire application, you can use that as the sitemap. And third, you can have a text file, a TXT file, which is literally just one page per line.
Scott Tolinski
Incredibly simple. Right? Just straight up one full URL per line. That's it. That works as a sitemap. That's a totally valid sitemap. And what's the file called? sitemap.txt? Correct. Yep.
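As a rough sketch of the plain-text format described here: the helper name `buildTextSitemap` is hypothetical, and dropping relative URLs and duplicates is an assumption about sensible behavior, not something stated in the episode.

```typescript
// Hypothetical helper: builds the body of a plain-text sitemap,
// which is just one fully qualified URL per line.
function buildTextSitemap(urls: string[]): string {
  const seen: { [url: string]: boolean } = {};
  const lines: string[] = [];

  for (const raw of urls) {
    const url = raw.trim();
    // Assumption: silently skip anything that is not an absolute http(s) URL.
    if (!/^https?:\/\//i.test(url)) continue;
    // Assumption: list each URL only once.
    if (seen[url]) continue;
    seen[url] = true;
    lines.push(url);
  }

  return lines.length > 0 ? lines.join("\n") + "\n" : "";
}
```

The returned string could be served as `sitemap.txt` or under any other path, since the path is supplied when the sitemap is submitted.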
Wes Bos
And does a sitemap have to be... I might be getting ahead of ourselves right now, but does a sitemap have to be named sitemap.xml? Or... That's a good question. Like, is there a meta tag that you can put?
Sitemap can be named anything, not just sitemap.xml
Scott Tolinski
This is one I have to Google. I did not even think about this. It's something you just always do. You know? Which is funny. You can name your sitemap anything you want, according to Google. When you submit it to the search engines' developer consoles, or whatever they are, search consoles, you just specify the path to it. I had no idea. I would have assumed that you would need to name it sitemap just based on the convention. But according to Google, you could name it anything you want, and you can place it anywhere on any route. So it doesn't need to be just forward slash sitemap.
Wes Bos
That's really handy if, for whatever reason, you don't have control over top level routes because your application doesn't allow you to do that. It would be nice to be able to do that. I probably would still try my darnedest to make it sitemap.xml because, like robots.txt, it's a standard
But sitemap.xml is a standard worth following
Scott Tolinski
that there's no reason not to follow unless you absolutely can't. Right? Yeah. So on top of that, you might be wondering, why would I use .txt or RSS or XML? You'd use RSS if you already had an RSS feed for your entire application. It exists. You don't need to make another one. XML is probably what you're going to go with anyways because, as you'll see, it allows for more information. It also allows for indexing of, like, media itself, whereas you can't do that with the text file.
XML sitemap is most flexible and allows more metadata
Scott Tolinski
The types of things, and we'll even get into this after, like, which additional metadata even makes sense in a sitemap today. But XML is, you know, the most verbose of these options, probably, maybe other than RSS. I mean, a TXT is, like, super simple. But XML gives you flexibility to be able to add more information, so you're probably gonna use XML anyways.
Scott Tolinski
Apparently, the limit for a sitemap is 50,000 URLs and 50 megabytes. If you go over 50,000 URLs, you can create multiple different sitemaps. You can have many sitemaps.
Wes Bos
Yeah. If you go and peruse... I do this a little bit myself to find unlisted URLs on websites.
Wes Bos
Like, my wife was really excited about this dress coming out once, so I wrote a little scraper that would download the sitemap every so often. The sitemap often lists even all the images that were uploaded to the website, all of the pages that are on the website. And, often, those pages are public, but they're not linked anywhere just yet. So it's kinda security by obscurity. So you can download the sitemap and see all of the pages of the website, and you can sort of peruse through that looking for unlisted pages.
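A scraper like the one described could be sketched roughly as below. This is not Wes's actual code: `extractLocs` and `findNewUrls` are hypothetical names, the download step is left out, and the regex-based parsing assumes a simple sitemap with no CDATA sections.

```typescript
// Decodes the five predefined XML entities (ampersand last, to avoid double decoding).
function decodeXmlEntities(value: string): string {
  return value
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Pulls every <loc> value out of an already-downloaded sitemap XML string.
function extractLocs(xml: string): string[] {
  const locs: string[] = [];
  const pattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match: RegExpExecArray | null = pattern.exec(xml);
  while (match !== null) {
    locs.push(decodeXmlEntities(match[1]));
    match = pattern.exec(xml);
  }
  return locs;
}

// Returns the URLs that appear in the current download but not in the previous one.
function findNewUrls(previous: string[], current: string[]): string[] {
  const known: { [url: string]: boolean } = {};
  for (const url of previous) known[url] = true;
  return current.filter((url) => !known[url]);
}
```

Run on a schedule, `findNewUrls` would surface pages that are public but not linked anywhere yet.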
Wes Bos
But often, especially with, like, Shopify websites, you'll find sort of like an index sitemap, and then it links off to a tags sitemap and a product sitemap and a blog post sitemap. Each one has their own sitemap.
Scott Tolinski
Yep. Yeah. And that's particularly useful when you have a whole lot of content.
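The split-and-index setup discussed here might look something like the following. The 50,000 figure comes from the episode and the `sitemapindex` element from the sitemaps.org protocol. The function names and the `sitemap-N.xml` naming in the usage note are assumptions for illustration.

```typescript
const MAX_URLS_PER_SITEMAP = 50000;

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Splits a long URL list into groups that each fit in one sitemap file.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Builds the index file that points at each child sitemap.
function buildSitemapIndex(sitemapUrls: string[]): string {
  const entries = sitemapUrls
    .map((url) => "  <sitemap><loc>" + escapeXml(url) + "</loc></sitemap>")
    .join("\n");
  return (
    '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
    entries +
    "\n</sitemapindex>\n"
  );
}
```

Each chunk would be written out as its own file (for example `sitemap-1.xml`, `sitemap-2.xml`), and only the index would need to be submitted.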
Scott Tolinski
You also want for your URLs in a sitemap to be fully qualified, which means you need the whole dang URL.
Scott Tolinski
Relative URLs aren't gonna cut it. You want that HTTPS in there. You want the whole deal. You want that whole dang URL to make sure that that's what it's looking for. Right? It's not looking for anything relative here. Let's talk about fields and metadata for XML-based sitemaps. And I've got a pop quiz for you, Wes. We have several different potential metadata fields. Let's say priority, change frequency, and last mod.
Last mod date is the only sitemap field both Google and Bing use now
Scott Tolinski
Which of those actually matter in 2024?
Wes Bos
Priority, change frequency, and last modified.
Wes Bos
I would say, like, priority doesn't matter because the days of telling Google what's important are over. They can figure that out themselves.
Wes Bos
I'm gonna say frequency is important because if you have a page that is frequently updated, that needs to be reindexed every hour or something, versus something that's like a blog post and you'll never update that again... The answer is that change frequency
Scott Tolinski
is completely ignored by both Bing and Google.
Scott Tolinski
Priority is also ignored by Google, as you mentioned, but Bing does use priority.
Bing uses sitemap priority but Google ignores it
Scott Tolinski
So it does kind of make sense to still use priority. It doesn't make any sense to use change frequency, but, apparently, Bing does use priority. Google ignores it.
Scott Tolinski
Last mod, though, is the one that all of them care about.
Scott Tolinski
And last mod only matters if you're using it consistently.
Scott Tolinski
So, like, let's imagine every single time you update your entire application, the sitemap rebuilds and sets all of your pages to be last mod today. You know? Okay. Then that's not good. You really want it to truly be when this page was last modified for Google to actually use that in an important way to know which content has been recently updated. So, yeah, this was new to me because I personally used priority and change freq, but did not use last mod. So I was, like, using the 2 that didn't matter. Oh.
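To make the last mod point concrete, here is a minimal sketch that takes the date from each page's own record rather than from the build time. The `Page` shape and its `updatedAt` field are assumptions. The `YYYY-MM-DD` output is one of the W3C date forms the sitemap protocol accepts.

```typescript
// Assumed shape: each page record carries its own modification timestamp.
interface Page {
  url: string;
  updatedAt: Date;
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Formats a date as YYYY-MM-DD in UTC.
function toLastmod(date: Date): string {
  return date.toISOString().slice(0, 10);
}

// Uses the page's real modification date, never "now" at build time.
function buildUrlEntry(page: Page): string {
  return (
    "<url><loc>" + escapeXml(page.url) + "</loc>" +
    "<lastmod>" + toLastmod(page.updatedAt) + "</lastmod></url>"
  );
}
```

Calling `toLastmod(new Date())` for every page would be the "everything changed today" pattern described above, which is the thing to avoid.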
We should update the Syntax sitemap fields
Scott Tolinski
What about the Syntax website? Maybe we need to change that? I believe the Syntax website has priority, and that's it. Maybe it does have change frequency. We should definitely change that, though. It's a bit of a pain to do that, but we should definitely change it. It's change frequency and priority
Wes Bos
are on the Syntax one. And you're telling us you only need last mod?
Scott Tolinski
Yeah. And even then, last mod is like, you know, it only helps if you have content that is changing. Right? Yeah. If your content is published once and doesn't change, last mod isn't really helping you out that much. Okay. Yeah. So maybe just nuke them all and, like, because we've been going through
Getting all Syntax content indexed recently got much easier
Wes Bos
the Syntax website, and it's crazy looking at the webmaster tools, like, over the last 6 months, a year, getting all of the pages indexed and finally to a point where Google knows about every single page. Because, like, even when we migrated, there was a point where, like, you couldn't find specific episodes on Google. Like, it was not finding them at all, so we had to really work at that. But then Google changed their algorithm recently, and I posted a tweet about this. We mentioned it on the last episode as well.
Wes Bos
The amount that we're showing up on search results is just... we went right up with that algorithm change. So you Sanity.
Wes Bos
We're not even doing the best practice here, and Google's, like, obviously showing our stuff a lot more frequently,
Scott Tolinski
recently, which is good. And you're not gonna get docked for using priority and change frequency. It's just that Google does not care about them. Priority, like I said, Bing has it, so you might as well have it anyways. What types of things shouldn't be in your sitemap? Typically, dynamic user pages and account pages, so, like, my account, the stuff that's not going to really need to be indexed. Right? You don't want a search engine to take you to the my account page because that's something that's hidden behind the login wall. Right? So any dynamic user pages shouldn't be in your sitemap. Yeah. URLs with parameters. So parameter-based URLs should not be in the sitemap because, if you think about it, parameters can be anything. Right? That's another one we had to figure out with the canonical
Wes Bos
URLs on the Syntax website: we have forward slash shows, and that needs to be indexed.
Parameters and future/unpublished pages should not be in the sitemap
Wes Bos
And then we also have forward slash shows and type equals hasty, tasty, or supper. That needs to be indexed.
Wes Bos
But the pages of every single one of them, like page 1, page 2, etcetera, those don't need to be indexed because... well, no. The pages do need to be indexed, but some of the search filters do not need to be indexed. And I remember I had to write a very complex thing to sort of figure out what the canonical URL was because there's unlimited combinations of the query params of, like, pages, how many per page. That was the other one. And a couple other filters. And if you go into the Google Webmaster Tools, it says something like 6,000 pages are not being indexed because you told us not to.
Wes Bos
And I was like, good. Like, those yeah. We don't want you to index the page 4 of 15 per page.
Scott Tolinski
Yep.
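The canonical URL logic described above is not shown in the episode. A much simplified sketch of the idea, assuming a hypothetical allow-list of `type` and `page` and treating everything else (such as a per-page count) as noise, might be:

```typescript
// Hypothetical allow-list: only these query params produce distinct, indexable pages.
const INDEXABLE_PARAMS = ["type", "page"];

function canonicalUrl(rawUrl: string): string {
  const withoutHash = rawUrl.split("#")[0];
  const questionAt = withoutHash.indexOf("?");
  if (questionAt === -1) return withoutHash;

  const base = withoutHash.slice(0, questionAt);
  const pairs = withoutHash.slice(questionAt + 1).split("&");

  // Walk the allow-list (not the incoming order) so the output order is stable.
  const kept: string[] = [];
  for (const name of INDEXABLE_PARAMS) {
    for (const pair of pairs) {
      if (pair.indexOf("=") !== -1 && pair.split("=")[0] === name) {
        kept.push(pair);
        break; // first occurrence wins
      }
    }
  }

  // Assumption: page=1 shows the same content as no page param at all.
  const filtered = kept.filter((pair) => pair !== "page=1");
  return filtered.length > 0 ? base + "?" + filtered.join("&") : base;
}
```

With this sketch, `/shows?perPage=15&page=4&type=hasty` and `/shows?type=hasty&page=4` both collapse to the same canonical URL, so the endless filter combinations stop looking like separate pages.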
Scott Tolinski
Redirects also should not be in your sitemap or duplicate or disallowed pages,
Wes Bos
things that you have, like, being blocked in your robots.txt. I got one more here, and this is a problem we had: these shows.
Only published, non-redirect pages should be included
Wes Bos
Basically, the way that we create our sitemap is we just query the database for all the shows, and we query the database for all the guests. And, basically, anything that's a page, we just query it and use a function to generate the URL for it. Right? But in that case, we forgot to filter out future shows, and it was telling Google, hey. There's a page here.
Wes Bos
And then Google would go to that URL, and it would find this page is coming soon.
Wes Bos
And that was a bit of an issue because when it was published, then Google would not know about the content until it eventually crawled that page again. So we had to filter that out and say,
Scott Tolinski
only show pages that are obviously in a published state and not, you know, a future one. Yeah. Right. That's a good one here. So, practically, how do you make a sitemap? Well, there's lots of ways. If you're using a meta framework, many of these meta frameworks these days have, like, a generator plugin that will just do it for you. I know there's certainly one for SvelteKit, and I can't imagine there's not one for Next.js. That seems like something that'd be very obvious to have. So you could just Google Next.js sitemap or any of that. And oftentimes, what they are doing, the way they work is that it's a part of, like, a build step. You put in, like, a post build script. Once your application is built, it would run a post build script. It would scan the routes the way that the application works, and then it would publish your sitemap for you. It's hard to do with the last modified, the last mod. So that one might be one that you have to, you know, work in dynamically. That gets a little bit easier in, like, a CMS-based site if you have that kind of metadata property on your field.
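Going back to the filtering rule from a moment ago (published pages only, nothing scheduled for the future, no redirects), a minimal sketch could look like this. The `Show` shape, with a nullable `publishedAt` and an optional `redirectTo`, is an assumption and not the actual Syntax data model.

```typescript
// Assumed record shape for illustration only.
interface Show {
  slug: string;
  publishedAt: Date | null; // null means it is still a draft
  redirectTo?: string; // set when the page only redirects elsewhere
}

// Keeps only shows that are live right now and serve their own content.
function sitemapEligible(shows: Show[], now: Date): Show[] {
  return shows.filter(
    (show) =>
      show.publishedAt !== null &&
      show.publishedAt.getTime() <= now.getTime() &&
      !show.redirectTo
  );
}
```

Passing `now` in as an argument keeps the function easy to test. In a real query, the same conditions would more likely live in the database `where` clause.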
Ways to generate a sitemap: meta framework built-ins or custom route
Scott Tolinski
The way that I've always built the sitemap... I don't personally feel like I need these generators, although the generators are great because they can save you from missing things.
Scott Tolinski
But, essentially, you're just creating an XML file. And the way that I've always done it is just have a route that returns an XML file that I've created as a string. So you create the string. You load in your collections from data. You loop over them, and you output those routes. That way, you have full control over it. You could even insert the last mod property if you have an updated-at field in that data, and then that becomes dynamic and easy for you to do, instead of these generated ones, which have a little bit harder time with that type of thing. So that's how I've always done it. But, again, the generated ones are nice and simple and make sure you don't forget anything, or you can create it yourself. I don't recommend just making these by hand. That would take forever.
Scott Tolinski
Even my website, Tolinski, has, like, you know, 70 links or something on it, 70 pages or something already, and I just made the dang site. So it just feels like that would be a hard thing to do. So find a generator or a way to do it that's baked into your platform. Typically, every platform has a way.
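The "build the XML as a string in a route" approach described above could be sketched as follows. This is not Scott's actual route: `Entry`, `renderSitemap`, and the `origin` parameter are hypothetical, and the framework-specific route handler is left out.

```typescript
// Assumed shape of the records loaded from your collections.
interface Entry {
  path: string; // e.g. "/posts/hello"
  updatedAt?: Date; // optional: only emit lastmod when it is really known
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Loops over the entries and concatenates the sitemap XML by hand.
function renderSitemap(origin: string, entries: Entry[]): string {
  let xml = '<?xml version="1.0" encoding="UTF-8"?>\n';
  xml += '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n';
  for (const entry of entries) {
    xml += "  <url>\n";
    // Sitemap URLs must be fully qualified, so prefix the origin.
    xml += "    <loc>" + escapeXml(origin + entry.path) + "</loc>\n";
    if (entry.updatedAt) {
      xml += "    <lastmod>" + entry.updatedAt.toISOString().slice(0, 10) + "</lastmod>\n";
    }
    xml += "  </url>\n";
  }
  xml += "</urlset>\n";
  return xml;
}
```

A server route would return this string with a `Content-Type` of `application/xml`. The exact handler signature depends on the framework.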
Hand-writing a large sitemap takes too much effort
Wes Bos
Yeah. Like, they have this concept of pages. If it's, like, totally from scratch, like your personal website or the Syntax website, where, like, there's no concept of a page, right, you can just, like Scott said, concatenate a string and throw it out the door. I would probably keep an array of pages and just store them as, like, objects and then grab some sort of, like, JSON-to-XML plugin off npm and then convert it on the way out the door.
Store sitemap pages as data objects first before outputting as XML
Wes Bos
Sitemaps are pretty simple, so I don't know if that's overkill versus just concatenating a string or not. But when it comes to, like, oh, did I already add this one? You know, does this URL exist previously? Well, let me search for it in the array.
Wes Bos
If that's the case, then it's sometimes nicer to deal with, like, an actual data object first. And then right before you kick it out the door, convert it to XML, because it sucks working with XML.
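The objects-first idea might be sketched like this. `SitemapBuilder` is a hypothetical name, and the XML is serialized by hand here rather than with a JSON-to-XML package from npm, purely to keep the example self-contained.

```typescript
interface SitemapPage {
  loc: string;
  lastmod?: string; // already formatted, e.g. "2024-07-08"
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

class SitemapBuilder {
  private pages: SitemapPage[] = [];

  // "Did I already add this one?" becomes a plain array search.
  has(loc: string): boolean {
    return this.pages.some((page) => page.loc === loc);
  }

  // Returns false (and adds nothing) when the URL is already present.
  add(page: SitemapPage): boolean {
    if (this.has(page.loc)) return false;
    this.pages.push(page);
    return true;
  }

  // Conversion to XML happens only at the very end.
  toXml(): string {
    const body = this.pages
      .map((page) => {
        const lastmod = page.lastmod
          ? "<lastmod>" + escapeXml(page.lastmod) + "</lastmod>"
          : "";
        return "  <url><loc>" + escapeXml(page.loc) + "</loc>" + lastmod + "</url>";
      })
      .join("\n");
    return (
      '<?xml version="1.0" encoding="UTF-8"?>\n' +
      '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n' +
      body +
      "\n</urlset>\n"
    );
  }
}
```

For a very large site, a lookup keyed by URL would be faster than scanning the array, but for most sitemaps the simple search is plenty.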
Scott Tolinski
Yeah. For sure. So what do you do once you have it? You validate it. There's a ton of these validation things online. You just search sitemap validator, paste in your sitemap link. It'll typically tell you if something's broken in it. Even if you submit it to webmaster tools, it will tell you what's wrong with it.
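Before reaching for an online validator, a few of the rules mentioned in this episode can be checked locally. The sketch below is not a substitute for a real validator: `checkSitemapUrls` is a hypothetical helper, and it only covers fully qualified URLs, the 50,000 URL limit, duplicates, and the protocol's requirement that each URL be shorter than 2,048 characters.

```typescript
const MAX_URLS = 50000;
const MAX_LOC_LENGTH = 2048;

// Returns a list of human-readable problems; an empty list means nothing was flagged.
function checkSitemapUrls(urls: string[]): string[] {
  const problems: string[] = [];
  const seen: { [url: string]: boolean } = {};

  if (urls.length > MAX_URLS) {
    problems.push("too many URLs: " + urls.length + " (limit " + MAX_URLS + ")");
  }

  urls.forEach((url, index) => {
    const label = "entry " + (index + 1) + ": ";
    if (!/^https?:\/\/[^\s\/]+/i.test(url)) {
      problems.push(label + "not a fully qualified URL: " + url);
    } else if (url.length >= MAX_LOC_LENGTH) {
      problems.push(label + "URL is 2,048 characters or longer");
    } else if (seen[url]) {
      problems.push(label + "duplicate URL: " + url);
    }
    seen[url] = true;
  });

  return problems;
}
```

An empty result only means these particular checks passed. Search Console and Bing Webmaster Tools will still report anything else they dislike after submission.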
Scott Tolinski
And then after you have it, you'll want to submit it to your search engines so that way they become aware of it. So the 2 that you really need to worry about are the Bing Webmaster Tools, bing.com/webmasters/about.
Validate sitemap with online tools before submitting it
Scott Tolinski
We have the link in the show notes or Google Search Console.
Scott Tolinski
Those are the 2 big ones. Right? Because I believe DuckDuckGo even uses Bing's search.
Scott Tolinski
If I'm not correct about that, please, somebody update me. But either way, these are the 2 places you wanna create an account and submit them. I know Google requires you to have a TXT record within your DNS to connect your site, then you submit the sitemap. Bingo. Bingo. It is all good. And it typically tells you, again, like, which pages are being indexed, and it gives you some good feedback on your application. So if you have anything that's a long-running website that you want to be indexed, webmaster tools and Google Search Console are 2 things that you're gonna want to be familiar with regardless. You know what's one thing?
Submit sitemap to Bing and Google webmaster tools
Wes Bos
I was just looking at our search console, and it says discovered videos.
Wes Bos
That's probably worth doing. I often wonder that. You know, like, you go to the video tab of Google search, how do you get your video to show up on that tab? I thought it was a mix of, like, the proper XML or... what's that? LD JSON? Yeah. JSON-LD, which is for linking data. That's used sort of like a meta tag. But instead of putting it in the head of the document, you simply just dump the JSON into the body, and Google will pick it up there. But it looks like there's also specific video tags for sitemap.xml, which will tell Google about videos, which is neat.
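For the JSON-LD route mentioned here, a minimal sketch using the schema.org `VideoObject` type follows. The `VideoInfo` shape and the `videoJsonLd` helper are assumptions, and the video-specific sitemap tags are not shown. The resulting script tag can sit in either the head or the body of the page.

```typescript
// Assumed input shape; property names mirror schema.org VideoObject fields.
interface VideoInfo {
  name: string;
  description: string;
  thumbnailUrl: string;
  uploadDate: string; // ISO 8601, e.g. "2024-07-08"
  contentUrl: string;
}

// Renders a JSON-LD script tag describing one video.
function videoJsonLd(video: VideoInfo): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    name: video.name,
    description: video.description,
    thumbnailUrl: video.thumbnailUrl,
    uploadDate: video.uploadDate,
    contentUrl: video.contentUrl,
  };
  // Escape "<" so text inside the data cannot close the script tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return '<script type="application/ld+json">' + json + "</script>";
}
```

Whether a video actually shows up in search still depends on Google's own eligibility rules, so treat this as the markup half of the job only.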
Scott Tolinski
Sick.
Wes Bos
Cool. Yeah.
Wes Bos
One more tip I have here is cache them. Your sitemap can be one of the largest files that is accessible on your website. And if they are generated on demand, that can be very taxing on your database. Yes. It's literally querying every single record in your database in a lot of cases and looping over it, or at least pages. And then that file itself is fairly large because it's all text. Right? And it's possibly an attack vector against your bill, both your database bill as well as, if you're using something like a Render or a Vercel to generate the sitemap.xml and you don't have the proper caching headers on those, then that could be somewhere where somebody could just continually hit it, and it will cause a very large bandwidth bill on your end. So throwing caching headers on it, putting a CDN in front of it, probably a good idea.
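Two small pieces can illustrate the caching advice: an in-memory cache so the database is not queried on every hit, and a set of response headers a CDN can honor. Both helpers are hypothetical, and the one-hour figure in the usage note is an arbitrary assumption, not a recommendation from the episode.

```typescript
// Wraps an expensive generator so it runs at most once per TTL window.
function createCachedSitemap(
  generate: () => string,
  ttlMs: number,
  now: () => number = Date.now
): () => string {
  let cached: string | null = null;
  let expiresAt = 0;

  return function getSitemap(): string {
    const time = now();
    let value = cached;
    if (value === null || time >= expiresAt) {
      value = generate(); // e.g. query the database and build the XML
      cached = value;
      expiresAt = time + ttlMs;
    }
    return value;
  };
}

// Headers that let browsers and a CDN cache the response.
function sitemapCacheHeaders(maxAgeSeconds: number): { [name: string]: string } {
  return {
    "Content-Type": "application/xml; charset=utf-8",
    "Cache-Control": "public, max-age=" + maxAgeSeconds + ", s-maxage=" + maxAgeSeconds,
  };
}
```

A route might call `createCachedSitemap(buildXml, 60 * 60 * 1000)` once at startup, where `buildXml` stands for whatever function generates the sitemap, and send `sitemapCacheHeaders(3600)` with each response. On serverless platforms the in-memory part may not survive between invocations, which is where the headers and a CDN matter most.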
Cache sitemaps to avoid heavy DB and bandwidth loads
Scott Tolinski
Sick. Cool. Well, that's all I got for sitemaps. If y'all have any additional tips or tricks, leave a comment on the video. But that should be enough, and you can find this video on YouTube if you wanna leave that comment: youtube.com/@syntaxfm.
Scott Tolinski
That's all I got for you. Thanks so much for watching or listening.
Wes Bos
Grab a t-shirt. sentry.shop.
Wes Bos
Peace.