Situation:
Take a very highly successful osc LoadedV store that has been in business for 5 years and move it to a highly modified proprietary osCMAX derivative. Go from 5,000 to 30,000 products, change the payment processor, change the shipping system from ship by price to ship via UPS/USPS, change servers, move from subdirectory store to root directory store, and more.
Part of the agreement was that we could dump the old customers and orders tables .
What Happened:
We created a temp store and reloaded the old products and the new products from FOUR CSV files via a series of routines that duplicated what easypopulate does, but in sections since not all the data was in one file. We backed up the database, tarballed the structure and moved to the new server. We then modified the configure.txt files to use the IP of the new store and validated the PHP/MySQL installation. We ran test purchase through payflow pro test server. We got the new SSL certificate. We cleaned up and created an .htaccess to redirect from the /catalog subdirectory to the root.
We swapped out the domain name late on a Sunday night, the stores slowest time. We moved from payflow pro test servers to payflow pro live servers via the admin. We reset the configure.txt to use domain names and SSL.
What Bombed:
The first thing we saw was the site moved over and was functional. About midday Monday we noticed that no orders were being placed. We got a call from a customer who said they were getting a CC error after they submitted the confirmation. We moved to "under maintenance" moved back to the payflow test servers and tried again. payflow worked under test. We then moved payflow to live servers and it failed. After 14 hours on the phone with paypal tech support we determined that messages were not getting to their server, we had the customer start the process of opening an authorize account and continued efforts. About 10 minutes before we all agreed it was either up or we were moving to authorize we noticed that there was one location in the code with a hardcoded reference to the payflow test server. On a whim we rewrote the line to default to the live server. Payflow was functional.
At this point the server looked good and we decided to go live again. We took down the "under maintenance". It was now early Tuesday AM. Orders started to come in. We created a sitemap and submitted it to google.
Early Wednesday we noticed that CPU usage was spiking (via "top") and that mysql was running at 99% of the CPU. We loaded mytop on the server (that meant loading the cpan stuff that the server didn't have on it originally). At about noon we were able to investigate the transactions via mytop and saw that one transaction was killing the server. We killed that transaction and the server returned to normal. Investigation showed that a website with 30,000 products should NOT be running printable catalog. Printable catalog issues a select distinct query with a very complex set of joins. Even if you set via the admin to only do 10 items per page, it doesn't use a "limit" in the query, but queries the entire table and then uses the limitation value set by the admin to limit the for loop against ALL the products in an array.
By late Wednesday we had removed the link from the front page of the site and the site seemed to be working better. We continued to see spikes of 99% CPU usage so we used mytop to watch and see what was going on. We noticed the query being run and hunted it down and by mid Thursday we had determined the problem. Now, we use a proprietary version of allprods that among other things allows product selection of allprods by alpha (a-z 1-0), but it's default behavior without an alpha being passed in is to act like allprods normally does and return all products. Again, this would not be too bad if we weren't returning 30,000 products into an array in memory and then processing it. So, we modified allprods to default to alpha "a" if no alpha was passed in and that eliminated the spiking.
This takes us to Thursday. At about this time we started to see the CPU on mysql go back to 99%. A quick look showed that even though we had taken all links to printable catalog off the site, it was still getting run. We looked to see who was hitting it and it was googlebot (that sitemap we submitted was coming back to bite us on the ass). We edited the file to NOT run the query and return a message to run advanced search.
Later in the day we noticed sales were down and looked at google analytics and the log files. We started to see a trend in google analytics that was disturbing. Site traffic was dropping daily and traffic was now down about 50%. A quick check showed that google search referrals were down drastically. So, searches were started against the google for the site and it was found that the SEO enhancements were working for default site terms. The site had moved up to page 1 on almost 70% of the terms that we SEO'd for and was on page 2 or 3 for the rest.
Further analysis showed that for the actual products that had been in the store before, products that were not added in the upgrade, we had dropped off google completely in many cases. This was bad as the site had most of their products on page 1 when someone searched for that products name. Something we had done had pissed off google.
It was now midday Friday and we determined that the .htacess file was the problem. Since all the product IDs had changed in the reloading of the products when google was going to the old subdirectory and the rewrite took place, it went to a new product. This was causing google to trash all the product listings.
Throughout the weekend a series of changes to the .htaccess files were made, but all to no avail. By now traffic had dropped to 33% of the old site.
On Monday morning a copy of the old products table was made. in addition a copy of the new products table was made. The .htaccess file was modified to only throw a 404 not found and redirect to the new root index.php file if anything vcame in for any page that was not there (anything in the old sub-directory) the /catalog subdirectory was created and a new DB was created with the old products table and the new products tables loaded into the new database. Then a single file was added to the catalog sub-directory called product_info.php that contained the following code.
a check of the headers when a query went to the old product code in the /catalog/product_info.php was made at Check webserver resultcodes and HTTP headers :: oyoy.eu search engine tools and it showed that all worked well as long as the system didn't generate a empty new products_id. In that case you get the 301 redirect followed by a 302 redirect as the system can't locate a value for the products_id and it throws out the link to the product with the lowest products_id in the database.Code:<? $db = "hostmachine"; // if you're gonna use this code, change these 4 values :-) $dbp = "dbpassword"; $dbu = "dbuser"; $dbn = "dbname"; $db1 = mysql_connect($db, $dbu, $dbp); $conn = mysql_select_db($dbn); $opid = $_GET['products_id']; $query = "select p.products_id from products as p, oldproducts as op where op.products_id = '".$opid."' and op.products_model = p.products_model"; $getid = mysql_query($query); $row = mysql_fetch_array($getid); $newloc = "http://www.domain.com/product_info.php?products_id=". $row['products_id']; header("HTTP/1.0 301 moved permanently"); header( "Location: $newloc"); ?>
So, that's where the site is tonight.
Conclusion:
Now, most people will never do EVERYTHING we did all at once. This was a case where the site had no choice. Continued operation was reducing margins with the ship-by-price shipping model and since the site is a deep discounter there just weren't margins to continue to lose. But, if you are ever going to do any of these things, from moving a site from a sub-directory to a root directory, reloading products, changing to payflow pro, adding a large number of products, etc. There are any number of lessons here to be learned. My best suggestion is Do Not Do Them All At The Same Time, but even then, each of the challenges that we ran into here were unique among themselves and while no one single challenge was in itself terrible, the snowballing effect of multiple challenges all having to be cleared at once was extreme.
As for the site, we are waiting for google to hopefully some back since they got 302's when they were getting the .htacess rewrite and the changes we made should correct the google drops and get the site back to it's old positions on the individual products within a week or two.
Now, before you ask why someone would go out and post this horror story on-line, we have to go back to why I gave talks for years at DECUS (before most of your time - it was the Digital Equipment Computer Users Society, one of the largest computer user groups in the world, we had two conferences per year, one on each coast, attended by thousands of geeks like me, with some few like me actually giving sessions on what we learned the hard way so others wouldn't have to) and the WWW Conferences (yes I spoke/presented at WWW I in Boston and WWW II in Chicago). A great part of teaching others is to be able to stand up and tell them the errors you made and what you did to fix them, so that they won't make the same errors and so that they will learn how to look for something when they do make an error. We old grey hairs like to call these "learning opportunities". So, I hope that the dread of what you just read, and my last week, will help you the next time you move an active store. I know that I sure learned a great deal and will be more than ready to move my next one (which just so happened to be next month when we are moving a store from CRELoaded to our version of oscMAX).
All the best.




LinkBack URL
About LinkBacks






so endith the lesson



Bookmarks