So this is more of a rant than a tip, but believe it or not, it comes up VERY often with users who are new to directory software like IndexU.
A user pays $99 for a copy of IndexU and another $65 for a copy of the Extreme DMOZ Extractor, and since they CAN extract all of DMOZ, they do.
Then they somehow manage to find a way to upload some of that data to their server and it dies, badly.
The complaints after that are “It’s IndexU’s fault because of this or that,” and they blame it on various things.
The issue is not whether IndexU can handle the data; it’s whether your server can handle the data.
I don’t give a hoot if you have a dual Xeon with 4GB of RAM and a 300GB hard drive. There is a LOT more to web hosting than just space.
What matters are things like disk performance, I/O, optimization and throughput. You cannot honestly expect a web server (let’s be honest, it’s almost the same as a PC) to run one million queries in less than a second so you can update your site quickly.
I don’t care what you think, it’s not possible because of the HARDWARE. Anyone who thinks they can run a mirror of DMOZ just because they have a hosting account or a dedicated server is just fooling themselves and proving that they know absolutely nothing about what they are doing.
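If you want to see the arithmetic for yourself, here is a rough back-of-the-envelope sketch in Python. The queries-per-second figures are made-up assumptions for illustration only, not measurements of IndexU or of any real server, and the function name is hypothetical. Plug in whatever rate your own box actually sustains.

```python
def seconds_for_queries(num_queries: int, queries_per_second: float) -> float:
    """Return how long num_queries take when run back to back at a sustained rate."""
    if queries_per_second <= 0:
        raise ValueError("queries_per_second must be positive")
    return num_queries / queries_per_second


# ASSUMED sustained rates, purely illustrative. Measure your own server.
ASSUMED_RATES = {
    "shared hosting": 200,
    "dedicated server": 2000,
}

for label, rate in ASSUMED_RATES.items():
    secs = seconds_for_queries(1_000_000, rate)
    print(f"{label}: {secs / 60:.1f} minutes for one million queries")
```

Even with the more generous assumed rate, one million queries works out to minutes, not "less than a second".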
Add to that the fact that DMOZ already exists, and most major search engines use the DMOZ data anyway. You would be doing nothing more than getting yourself banned from search engines for duplicate content while adding your nightmares to my support tickets and emails.
Forget mirroring DMOZ, and make something unique.









