Ongoing Server Issues...

17 posts in this topic

This is a quick appeal to anyone who experiences the dreaded "500" error or sees a Sitelock "can't access site" error page: please post in this thread as soon as possible after it happens.

We know the site outages occur when the 24:24 thread is posted. I've experienced the outage myself, but what I can't tell is if errors are happening on the site outside that short window of time when the 24:24 thread goes live. So it would help us to know if anyone has connection issues to the site at other times of the day.

What we believe is happening is one of two things -

1. The site is choking on the 24:24 post upload. It's several megabytes in size and well within server resource limits, but it can be slow to upload, taking up to 5 minutes to complete. Sometimes it stalls midway and knocks out FOH website access for everyone at Club Czar and us all here at FOH, only for the thread to show up and the issue to disappear a few minutes later. Other times it gets through with no problems at all.

2. When the 24:24 thread is posted, we get a traffic spike. Again, the peak is well within server resource limits, but there's a very definite spike as the thread is posted. If traffic were to blame, I'd expect the problem to follow the steady, ramp-like build-up we see as people progressively log onto the site over 10-15 minutes, rather than an immediate spike we can pinpoint down to the second.

Regardless of the cause, the site issues appear to correlate with the upload and/or site stalling. What we don't know is if the upload is causing the site errors, or if it's the traffic spike causing the upload error, or a combination of both.
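One way to tell a ramp from a spike is to count requests per second around the posting time. The following is only a rough sketch: the function names, the 5x threshold and the sample timestamps are all made up for illustration, and nothing here comes from the actual FOH server logs.

```python
from collections import Counter
from datetime import datetime, timedelta


def requests_per_second(timestamps):
    """Count requests in each one-second bucket."""
    return Counter(ts.replace(microsecond=0) for ts in timestamps)


def peak_second(timestamps):
    """Return (second, count) for the busiest one-second bucket."""
    counts = requests_per_second(timestamps)
    return max(counts.items(), key=lambda item: item[1])


def looks_like_spike(timestamps, ratio=5.0):
    """Heuristic (assumed threshold): the busiest second carries at
    least `ratio` times the load of the median second."""
    counts = sorted(requests_per_second(timestamps).values())
    median = counts[len(counts) // 2]
    return counts[-1] >= ratio * median


# Made-up sample data: 2 requests per second for 10 seconds,
# plus a burst of 38 extra requests landing in second 5.
start = datetime(2017, 1, 1, 10, 30, 0)
sample = [start + timedelta(seconds=s) for s in range(10) for _ in range(2)]
sample += [start + timedelta(seconds=5)] * 38

busiest, count = peak_second(sample)
print(busiest, count, looks_like_spike(sample))
```

A gradual build-up gives a peak close to the median, so the heuristic stays false. A burst that lands in a single second stands out straight away.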

The issue for us is consistency - we can have no problems with 500 users online concurrently one day, then the site will fall over with only 250 online the next. On both occasions, most site visitors are here for the 24:24 thread and I expect everyone's trying to get in to view the offer. If it was definitely due to a traffic spike, surely 500 users would cause us more problems on a busy day than 250 on a quiet day, but it's completely random whether we have a problem or not.

Long story short - we know there's a problem when the 24:24 thread is posted. What we don't yet know for sure is why or what's causing it. If you have problems accessing the site, please post in this thread rather than emailing Rob. That way I can see the timestamps, get a site notification and jump on to check it out myself, which would greatly help our cause.


I start having issues 2-3 minutes before the time when the 24:24 should be posted.

I have a feeling that this is a traffic issue, because I know for a fact that the thread sometimes gets posted 2-3 minutes early and it's easy to miss the deal.

I usually start refreshing the page (F5) pretty often starting 2-3 minutes before the scheduled post time, and if everyone is doing the same, there is your increased traffic that brings the server to its knees. I have never had any issues with slowdowns or not being able to load a post other than 2-5 minutes before the scheduled 24:24 time.


That makes perfect sense. However, 2-3 minutes before the thread goes live is also when it's uploading, which makes this difficult to pinpoint. Could be the chicken, could be the egg...

I've been on FOH with over 500 other site visitors and mashed the refresh button while waiting for the 24:24 thread to appear without any problems. Then the next day, half that number are on the site doing exactly the same thing and we'll get issues.
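As a back-of-envelope illustration of why headcount alone may not predict the load: the request rate depends on how often people refresh as well as how many are online. The 500 and 250 user figures are from this thread. The refresh intervals are pure assumptions, and the function name is hypothetical.

```python
def refresh_requests_per_second(users, refresh_interval_s):
    """Average page requests per second if every user refreshes once
    per `refresh_interval_s` seconds (a simplifying assumption)."""
    if refresh_interval_s <= 0:
        raise ValueError("refresh interval must be positive")
    return users / refresh_interval_s


# 500 users refreshing every 2 seconds (assumed interval)
busy_day = refresh_requests_per_second(500, 2.0)
# 250 users refreshing every second (assumed interval)
quiet_day = refresh_requests_per_second(250, 1.0)
print(busy_day, quiet_day)
```

Under these assumed intervals, both days produce the same average request rate, so half the visitors can hit the server just as hard.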

We've doubled our server resources today and will see how things pan out on Monday. If the issue continues, we'll look into further upgrades.


I was experiencing the 500 errors again this morning at 10:34AM Brisbane time.  There was no gradual decline in performance - everything was working fine, and then it just stopped.  Even after the 24:24 posting had appeared on the forum, I experienced further timeouts trying to load the post itself.  


 

1 hour ago, MrGlass said:

I was experiencing the 500 errors again this morning at 10:34AM Brisbane time.  There was no gradual decline in performance - everything was working fine, and then it just stopped.  Even after the 24:24 posting had appeared on the forum, I experienced further timeouts trying to load the post itself.  

Thanks for the update, MrGlass. Rob reported the same thing this morning.

I think we can rule out the upload: the 24:24 thread was pre-loaded an hour early, and the site still crashed when it was set live.

The click frenzy is still overwhelming 24 CPUs and 16 GB of RAM, so we have our work cut out trying to get enough resources to handle the spike.

We're investigating Amazon Web Services to improve our capacity to ride it out. Thanks for your patience everyone, we'll get there in the end.


Neither Tapatalk nor a regular web browser would load. On the web browser, I got the Sitelock page. Can't say exactly when, but it was right before and up to about 10 minutes after the normal 24:24 posting time.

5 minutes ago, jmg said:

Neither Tapatalk nor a regular web browser would load. On the web browser, I got the Sitelock page. Can't say exactly when, but it was right before and up to about 10 minutes after the normal 24:24 posting time.

That's our window of chaos.

We're pretty confident we know what we need to do now and a solution is in the works. As a Kiwi supermodel once said, "It won't hippen overnight, but it will hippen." Please bear with us for the next week or two (at a guess) while we sort things out behind the scenes.


I typically get the error leading up to the daily posting. Then it settles down and goes back to normal.

It would be best if Rob would send the 24:24 to my personal email about an hour before he posts to the site. Thanks.



I'm getting essentially the same page as colt45 on 3 different computers. No access for the last 36 hours at least. I am still able to browse on my iPhone though.


Since it went up, I can't get the 24:24 pictures on the website or on Tapatalk; other threads are normal. When I select a photo, I get a Sitelock message. Didn't stop me buying a box of Cazadores though...

Thunder & Lightening '75-'15





Thanks gents.

The Sitelock page appears when our content distribution network is unable to serve the 24:24 thread due to the volume of members madly clicking refresh to get in before everyone else. The Sitelock page won't display for everyone, and if you see it, it should clear within a few minutes of the 24:24 thread going live.

Corylax and Akela3rd, hitting refresh should clear the Sitelock page, which may be stuck in your web browser cache. To force a page refresh, press Ctrl+F5 or hold Ctrl while left-clicking the refresh button.

Until we get more resources to handle the madness, this will continue if everyone keeps clicking the crap out of the refresh button when the 24:24 thread goes live. It spikes the concurrent CPU threads by 500% and overwhelms 24 CPUs and 16 GB of RAM. We doubled server resources on Friday and it looks to have made no difference. 

We're reviewing a proposal from our webhost today regarding migrating to Amazon Web Services to handle the 24:24 traffic spike and will get this fixed asap!

If there was a better way to post the 24:24 specials, we'd do it. 


Just checking in to see how the site has been performing. AFAIK, everything's been pretty good for the last couple of weeks since we ditched the Sitelock CDN, which was causing us more problems than it solved.

The root of the server performance issue doesn't look to be traffic capacity when the 24:24 thread is posted, but a faulty MySQL configuration failing during the 24:24 upload. It also appears that MySQL wasn't set up to handle the number of threads we are tossing at it during the spike. Both issues combined to cause a nasty fubar 2-3 times a week.
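To illustrate why a cap on simultaneous threads can fail on a burst yet cope with the same number of requests spread out, here is a toy simulation. It is only a sketch under stated assumptions: the cap of 10, the 500 ms service time and the request patterns are invented, the function name is hypothetical, and the thread doesn't say which MySQL setting was actually at fault (a connection limit such as MySQL's `max_connections` is just one example of this kind of cap).

```python
def count_rejected(arrival_times_ms, service_time_ms, max_connections):
    """Count requests refused because every connection slot was busy.

    Times are integer milliseconds. A request holds a slot from its
    arrival until arrival + service_time_ms; there is no queueing.
    """
    finish_times = []  # finish times of requests currently holding a slot
    rejected = 0
    for arrival in sorted(arrival_times_ms):
        # Free the slots of requests that have finished by this arrival.
        finish_times = [t for t in finish_times if t > arrival]
        if len(finish_times) < max_connections:
            finish_times.append(arrival + service_time_ms)
        else:
            rejected += 1
    return rejected


# 100 requests spread evenly, one every 100 ms (made-up pattern)
spread = [i * 100 for i in range(100)]
# The same 100 requests all arriving at the same instant (made-up pattern)
burst = [0] * 100

print(count_rejected(spread, 500, 10))  # no request finds all slots busy
print(count_rejected(burst, 500, 10))   # only the first 10 get a slot
```

The total number of requests is identical in both runs. Only the timing differs, which fits a problem that shows up at the moment the thread goes live rather than scaling with how many people are online.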

We tested a lot of stuff during this ordeal to pin down the source of the problem. My gut told me something was causing a memory leak (we ruled out pretty much everything else) and I'm confident I was close to the mark.

We've got some server optimisations underway that should resolve this bottleneck and hopefully sort this out once and for all. Those changes may cause other unforeseen issues, so please let me know if you see anything odd going on in the coming days. There will likely be a short outage to reboot the server after MySQL is updated, but other than that it should be business as usual.


Things are definitely better.  Look forward to the future optimizations!


The site is still locking up badly just before and after the 24:24 thread goes up. Today was particularly slow: it took almost 2 full minutes from "click" to appearance. Just a heads up.


Cheers Cory

We were forced to do a few security upgrades from IP board which have thrown our previous fixes/patches asunder. Will have them sorted as best we can in the next week. 

On 4/8/2017 at 7:28 PM, El Presidente said:

Cheers Cory

We were forced to do a few security upgrades from IP board which have thrown our previous fixes/patches asunder. Will have them sorted as best we can in the next week. 

Security is tops, and a constant battle these days. It hadn't been anywhere near that slow for a while, so I thought I would make sure you got a heads up. Keep doing what you do! :thumbsup:

