[Solved] URL not found (404): Googlebot not finding dynamic routes in a React application served using Amazon S3 and CloudFront

Google search console URL not found screen

You find out that Amazon S3 lets you host a static website, CloudFront provides an amazing and insanely cheap CDN, and AWS Certificate Manager (ACM) provides free public SSL certificates for your websites. You sign up for AWS and get a 1-year free trial. Basically, one year of free static website hosting. Now all you need is a static website that acts dynamic (with client-side navigation, API fetches, etc.). You love React, you are good at it, and you think: why not leverage that knowledge to take advantage of AWS's services? You use React, Redux, Redux-Saga, and React Router (or a similar combination of libraries) to build an awesome website. You put your build files in an S3 bucket, configure it to host a static website, and use CloudFront for CDN services and Route 53 to manage DNS.

All is fine and dandy. Your website runs smoothly, and you can navigate between URLs and browse the whole site without trouble. You expect Google's search bot to do the same. You go to Google Search Console and ask Google to index your website and all its URLs, only to find that the dynamic URLs in your website cannot be discovered by Google. You see error messages like the one above.

A quick search turns up numerous results, most of which revolve around checking whether your code throws errors because Googlebot's rendering tool does not support some ES6 syntax and keywords. This involves going back to your code, adding an error detection mechanism, displaying the error in the main website view, deploying the new build to S3, invalidating the old index.html file in CloudFront so that it picks up the latest file, resolving the issue based on the errors you see in Googlebot's render result, adding polyfills, and so on. And all of this may not even solve your issue. The solution is probably quite simple.
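If you do end up redeploying, the invalidation step mentioned above can be scripted instead of clicked through. The following is a minimal sketch, assuming the third-party boto3 library is installed and AWS credentials are configured; the helper names and the distribution ID you pass in are hypothetical and not part of the original setup.

```python
import time


def build_invalidation_batch(paths):
    """Build the InvalidationBatch structure that CloudFront expects."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": f"deploy-{time.time_ns()}",
    }


def invalidate_index(distribution_id):
    """Ask CloudFront to drop its cached copy of /index.html."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(["/index.html"]),
    )
```

Calling `invalidate_index("YOUR_DISTRIBUTION_ID")` (a placeholder) after each upload should make CloudFront fetch the new index.html from S3 on the next request.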

Solution

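When Google's bot tries to access a URL, it requests that exact URL from the server; in this case, CloudFront and S3. But since this is a static website, no file exists at that path: all the routes in your application are generated on the client side, and only files such as index.html actually exist in the bucket. Clicking links inside the app works because the router changes the URL in the browser without asking the server for a new page. Google's bot, however, asks the server directly and receives a 404 error. To solve this, you need to let CloudFront know that you want to handle 4xx and 5xx errors yourself.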
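To make the failure concrete, here is a toy model of a static host in Python. It is only an illustration of the lookup logic, not how S3 or CloudFront are implemented, and the object keys and route are hypothetical.

```python
# Toy model: a static host can only serve object keys that were uploaded.
BUCKET_OBJECTS = {"index.html", "static/js/main.js"}  # hypothetical build output


def static_host_status(path: str) -> int:
    """Return the HTTP status a plain static host would give for a request path."""
    key = path.lstrip("/") or "index.html"  # "/" maps to the index document
    return 200 if key in BUCKET_OBJECTS else 404


print(static_host_status("/"))             # the root exists as index.html
print(static_host_status("/products/42"))  # a client-side route has no file behind it
```

The first call prints 200 and the second prints 404, which is the situation a crawler runs into when it requests a route directly.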

To do this, go to your CloudFront console and select the distribution for this particular website. Now click on the "Error Pages" tab.

CloudFront error pages tab

The next step is to click on the "Create Custom Error Response" button. This takes you to a page where you create the custom error response settings. Set the fields' values as shown below.

CloudFront error settings

Save the settings.
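The same change can be scripted. The exact values from the screenshot are not reproduced in text here, so the sketch below assumes the configuration commonly used for single-page apps: map error code 404 to the page /index.html and return it with response code 200. It also assumes the third-party boto3 library and configured AWS credentials; the function names and the caching TTL are illustrative choices, not values from this article.

```python
import copy


def with_spa_fallback(distribution_config, error_codes=(404,)):
    """Return a copy of a CloudFront DistributionConfig with SPA fallback rules added."""
    cfg = copy.deepcopy(distribution_config)
    existing = cfg.get("CustomErrorResponses", {}).get("Items") or []
    # Keep unrelated rules, replace any rule for the codes we manage.
    kept = [item for item in existing if item["ErrorCode"] not in error_codes]
    added = [
        {
            "ErrorCode": code,
            "ResponsePagePath": "/index.html",  # assumed root document
            "ResponseCode": "200",              # the API expects a string here
            "ErrorCachingMinTTL": 10,           # assumed value, tune as needed
        }
        for code in error_codes
    ]
    items = kept + added
    cfg["CustomErrorResponses"] = {"Quantity": len(items), "Items": items}
    return cfg


def apply_spa_fallback(distribution_id):
    """Fetch the current config, add the fallback rules, and push the update."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("cloudfront")
    current = client.get_distribution_config(Id=distribution_id)
    client.update_distribution(
        Id=distribution_id,
        DistributionConfig=with_spa_fallback(current["DistributionConfig"]),
        IfMatch=current["ETag"],  # required so concurrent edits are not overwritten
    )
```

If your distribution reads from a private bucket rather than the S3 website endpoint, missing files may come back as 403 instead of 404; in that case passing `error_codes=(403, 404)` to `with_spa_fallback` is a possible adjustment.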

Now your CloudFront distribution knows not to return 404 errors to anyone crawling a specific URL. Instead, it serves your root application for those requests, and the client-side router renders the matching route.
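Before asking Google to re-crawl, you can check the status code a deep link returns. This is a minimal sketch using only the Python standard library; the function name is hypothetical, and sending a Googlebot-style User-Agent is only an approximation of what the real crawler does.

```python
import urllib.error
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def fetch_status(url, user_agent=GOOGLEBOT_UA):
    """Return the HTTP status code for a GET request to the given URL."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return err.code
```

Call `fetch_status` with the full URL of one of your client-side routes. Once the distribution has finished deploying and the custom error response is in place, a route that previously returned 404 should return 200.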