In response to our analysis of the recent site issues, the devs decided to upgrade the Amazon database server we use. The upgrade doubles both the processor count and the memory, and switches the first tier of storage from magnetic disk to solid-state. Hopefully this will eliminate most of the gateway timeout errors we have been seeing, both now and for some time to come. I should add, however, that some of the errors have shown a different set of symptoms, ones we can’t clearly tie to the subsystem we are replacing.
Anyway, the Amazon control console says the upgrade is proceeding, with no glitches detected. The upgrade UI advised me that the outage would last only 2 minutes, followed by a much longer period of full functionality at reduced performance while the data replicated. Since it has been over an hour now, that scenario obviously didn’t apply to us. I suspect that because we have been running on technology that is probably two generations behind their current offering, we have to wait through a longer process.
I’ll keep you posted.
Not much to add. The Amazon console shows me some stats like I/O for the old instance, so I can tell that things are still happening. But there’s no clue as to how far along we are.
Progress! The good news is Amazon says the new database server is active. The bad news is that the website isn’t back yet. Presumably I need to update the main server’s connection settings so it can reach the new database. Once again, I’m not sure how long that will take, since I don’t really know what I am doing.
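For anyone curious, the change in question is usually just a one-line edit: the web server has the old database hostname in a config file somewhere, and it needs to point at the new endpoint instead. Here is a minimal sketch of that kind of swap; the file, variable name, and hostnames are all hypothetical, since the real Eterna config isn’t shown here (a temp file stands in for it).

```shell
# Hypothetical sketch: repoint the web app at the new database endpoint.
# The config path, variable name, and hostnames are invented for
# illustration; a temp file stands in for the real config.
CONF=$(mktemp)
echo 'DB_HOST=old-db.example.rds.amazonaws.com' > "$CONF"

# Swap the old hostname for the new one in place.
sed -i 's/old-db/new-db/' "$CONF"

# Show the updated setting.
cat "$CONF"
```

After a change like this, the web server process typically has to be restarted or reloaded before it picks up the new hostname.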
Everyone: I think I restored access to the (new and improved!) database, but I’m not sure. Please post here if you see issues.
Working fine now, it seems! Much more responsive, and submissions are working. Well done, all.
Nice job! Pages are loading much quicker.
9/28 7PM EST. FYI - Submissions had been working seamlessly for me, but I just had two submission hangups within a couple of minutes.
Thank you for the report, Gerry. I checked the logs, and there was no gateway timeout error around that time, so your hangup had a different root cause. This is not a huge surprise: while the vast majority of apparent hangups were timeouts caused by the database server failing to answer a query within 90 seconds, two other sets of symptoms have been observed.
I don’t know whether this is your kind of thing, but if you want to get involved specifically with helping track down the other causes, and/or more generally working to improve the Eterna website, send me a PM including an email address, and I will email you an invitation to join the Player-Driven Development Slack team.
(This invitation is open to any Eterna player, not just Gerry.)