The official blog of PBworks

Archive for the ‘Geek’ Category

Browser trends

  • Filed under: Geek
Friday
Feb 2,2007

We aggregate all kinds of usage data to help us make better decisions about how to improve PBwiki. Here are a few graphs I found interesting as a web geek. Enjoy.

Firefox
[Graph: Firefox distribution]

Internet Explorer
[Graph: Internet Explorer adoption]

Browser family
[Graph: Browser family]

Thursday
Jan 25,2007

We’ve been bringing up a number of new servers at the San Francisco data center. We’ve got some great Core2Duo machines which draw between 0.75A and 1.10A but have pretty substantial horsepower. So far so good, and almost all of the machines went in happy and stayed that way.

An interesting aspect of hosting servers at a data center on the west coast is that there’s plenty of space and lots of connectivity, but fairly scarce power. This is important for two reasons. First, you need to be careful not to put too many servers in one cabinet, or you’ll blow your 20A fuse and all the servers will shut off simultaneously. Second, and nearly as scary, you may not get enough cooling from the building’s overtaxed and potentially undersized air conditioners, and you’ll slowly cook your servers to an early death.
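The cabinet limit is simple arithmetic, and a rough sketch may make it concrete. The 20A circuit and the 0.75A to 1.10A per-machine draw come from the text above; the function name and the optional 80% safety margin are assumptions added for illustration, not figures from the data center.

```php
<?php
// Hypothetical helper: how many servers fit on one circuit if each draws
// at most $amps_per_server? $usable_fraction is an assumed safety margin.
function max_servers(float $circuit_amps, float $amps_per_server, float $usable_fraction = 1.0): int {
    return (int) floor($circuit_amps * $usable_fraction / $amps_per_server);
}

echo max_servers(20.0, 1.10), "\n";       // 18: worst-case draw, no margin
echo max_servers(20.0, 0.75), "\n";       // 26: best-case draw, no margin
echo max_servers(20.0, 1.10, 0.8), "\n";  // 14: worst-case draw, assumed 80% margin
```

Sizing against the worst-case draw is the conservative choice, since every machine can be busy at the same moment.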

[We use a great system called syslog which collects all kinds of system stats and logs in one place for all of our servers - it makes it simple to collect and plot data like 'fan speed, cpu temperature, and cpu workload of all machines for the last few days, data points every 15 minutes'.]

You can see on the plot below that on first power-up, “sf3” was running hot.

This is one of two machines where the factory hard drives were flaky and I replaced them with drives from another vendor. Because the new drives are full-on SATA and our cases only supply standard ATX-style 4-pin drive power, I needed a jumper cable. This is just a few inches of 4 wires, plus connectors on the ends. Turns out I’d done two dumb things.

I left these jumper cables hanging down a little near the motherboard, an inch away from a fan exhaust. I’d also stashed the folded-up spare IDE cable in an evidently unused space within the case.

On power-up we find, as shown in the graph below, that the CPU is running 30F too hot. Once I was home from the data center and had time to build the thermal report for all the servers, the two machines with SATA jumpers stood out: sf3 was the extreme example, but sf9 was also running 20F too hot. A half-hour drive each way plus a few minutes of fiddling with the cables and baffles, and the CPU temp was back under control.

24 hours of temperature plots
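A thermal report like the one described above could flag outliers automatically. The following is only a sketch: it assumes the CPU temperatures have already been pulled out of the logs into an array keyed by host, and it compares each host against the fleet median. The function names, the 15F threshold, and every reading below are invented for illustration; only the host names sf3 and sf9 and their rough offsets echo the story.

```php
<?php
// Median of a non-empty list of numbers.
function median(array $values): float {
    if (count($values) === 0) {
        throw new InvalidArgumentException('median() needs at least one value');
    }
    sort($values);
    $n = count($values);
    $mid = intdiv($n, 2);
    return $n % 2 ? (float) $values[$mid] : ($values[$mid - 1] + $values[$mid]) / 2;
}

// Return hosts running at least $threshold_f degrees above the fleet median,
// mapped to their excess in degrees F, hottest first.
function hot_hosts(array $temps_by_host, float $threshold_f = 15.0): array {
    $baseline = median(array_values($temps_by_host));
    $hot = [];
    foreach ($temps_by_host as $host => $temp) {
        $delta = $temp - $baseline;
        if ($delta >= $threshold_f) {
            $hot[$host] = $delta;
        }
    }
    arsort($hot);
    return $hot;
}

// Hypothetical readings in degrees F (not real data from the fleet).
$readings = ['sf1' => 98, 'sf2' => 100, 'sf3' => 130, 'sf4' => 99, 'sf5' => 101, 'sf9' => 120];
print_r(hot_hosts($readings));  // sf3 and sf9 stand out
```

The median is used instead of the mean so that one or two overheated machines do not drag the baseline up and hide themselves.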

PHP OO pitfalls

  • Filed under: Geek
Monday
Jan 8,2007

I should have picked up on this earlier, but PHP does some crazy stuff with objects. Because they’re not objects. Take the following:

  function _init_storage($options) {
    // Plain assignment: $se receives a copy of the storage engine object
    if(!$se = $options['storage_engine']) {
      trigger_error("Error instantiating MetaStash - no storage_engine" );
    }

    $this->engine = $se;
  }

That looks fine: we’ve got some options defined at our constructor, and it’s time to connect to a storage engine and blit out (or in) the permanent version of the data. But what if we want to change the storage engine object we were passed? This happens for PBwiki because there’s a notion of the expected place for some data, and then the notion of ‘but right now it’s over here’. As we transition from one storage infrastructure to another, we’re going to have several reasonable places to look, and the meta information here can indicate the canonical location for the data. So MetaStash calls change_docroot() on the storage engine to make sure subsequent calls to it will source the right data.

Turns out that passing in the object as a parameter and assuming it’s treated like a pointer or handle is incorrect. Ugh. Passing an object happens by value – PHP makes a deep copy of the object, so any changes made during that call apply only to the copy.

The corrected code follows:

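  function _init_storage($options) {
    // =& binds $se to the array entry by reference instead of copying it
    if(!$se =& $options['storage_engine']) {
      trigger_error("Error instantiating MetaStash - no storage_engine" );
    }

    // Keep a reference, so change_docroot() acts on the shared engine
    $this->engine =& $se;
  }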

What galls me is there’s not much warning about this in the docs and apparently no way to detect/trap this behavior at runtime. Ideally I want a development-only check which watches for this but so far no dice.
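For anyone who wants to see the difference in isolation, here is a minimal sketch. The object copying described in this post matches PHP 4’s object model; from PHP 5 on, objects are passed around as handles, so this example uses an array (which PHP still copies on assignment) as a stand-in for the storage engine. The two function names and the docroot values are hypothetical.

```php
<?php
// Plain assignment: $se is a copy, so the caller's engine is untouched.
function change_docroot_by_value(array $options, string $docroot): void {
    $se = $options['storage_engine'];
    $se['docroot'] = $docroot;
}

// Reference assignment: $se aliases the array entry, and because the caller
// stored that entry by reference, it aliases the caller's engine too.
function change_docroot_by_reference(array $options, string $docroot): void {
    $se =& $options['storage_engine'];
    $se['docroot'] = $docroot;
}

$engine  = ['docroot' => '/old'];             // stand-in for the engine object
$options = ['storage_engine' => &$engine];    // caller stores a reference

change_docroot_by_value($options, '/new');
echo $engine['docroot'], "\n";                // /old

change_docroot_by_reference($options, '/new');
echo $engine['docroot'], "\n";                // /new
```

Note that both halves matter: the caller has to put a reference into the options array, and the callee has to bind to it with =&. Dropping either one silently brings the copy back.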

-Nathan (grumpy)

Under the hood

  • Filed under: Geek
Sunday
Jan 7,2007

What makes PBwiki tick?

For stability:
- Debian Sarge Linux – rock solid with very good package management
- grsecurity – paranoid security, the small overhead is worth the peace of mind
- MySQL – just works
- Apache – a known quantity

For speed:
- PHP – it’s fast and dead simple
- memcached – very fast, very stable
- eAccelerator – we have a lot of code to compile, and this extension speeds up time-to-first-byte by 250ms

For coolness:
- FCKeditor – the basis for our new rich text editor
- syslog – we funnel all of the system parametrics from all of our machines, plus all Apache and application-level logging, into syslog, one file per machine per day; all told it’s about 750MB of data per day

Our next generation architecture (soon) will add:

- mogilefs – distributed file storage system that uses commodity hardware to automatically spread spare copies of files around a number of servers
- squid – caching reverse proxy
- pound – super-fast traffic director

And now you know.

-Nathan

PBwiki loves Core2Duo

  • Filed under: Geek
Sunday
Jan 7,2007

We’re working behind the scenes to seriously ramp up our server fleet. When David started PBwiki, the core servers were a bunch of 2001-vintage VA Linux boxes from Craigslist: essentially “San Carlos: 400 pounds of 2U servers $10 OBO, you pick them up tonight”. Since those heady days of off-brand ramen and Mountain Dew we’ve accumulated a few generations of machines, with a handful of 0.5U Rackable units still providing low-load services. Today the bulk of PBwiki’s heavy lifting is done by a few white-label 1U P4 3GHz machines with twin SATA drives. This piecemeal approach has produced two interesting outcomes: a very capital-efficient company and a very diverse server stack. We’ve got hardware from several generations of chipsets; bogomips ratings span 1×1300 through 4×6100, and RAM runs from 512MB through 4GB. We’ve got IDE, SCSI, SCSI RAID, and SATA, plus 8 different ethernet controller chipsets.

Normally this kind of diversity isn’t such a big deal — just install whatever OS and let the drivers fight for survival on every boot. We add to the challenge with our security policy, which has us using a monolithic kernel, patched with grsecurity and identical across all the production machines. We use Debian which makes this a bit easier but it’s still a lot of things to keep straight.

We’ve recently embarked on a project to build One Heck Of A Server Cluster with more or less modern hardware bought new. PBwiki is the baby sister and she’s about to get her own prom dress. The funny part of all of this is of course the orders of magnitude we’re talking about versus the first dot-com boom circa 1999 — We’ll spend less for ten of these new machines than a single Sun E420, and each one is easily five or six times faster than the one E420.

So we ordered up a sample machine which had chips we were familiar with, and specs and a price that made sense. The machine was a dual-core P4 3GHz with a single 500GB SATA drive in your average white-label 0.5U case. One great thing about modern hardware is the monitoring you get: temperatures, voltages, and fan speeds are all measurable if you’ve got the right driver code compiled into your kernel (see above). After some yelling at ssh terminals I got lm-sensors working on this new machine and we got some numbers out. As an aside, we’ve set up syslog so every machine in our system sends all of its health, security, and operational log events to a pair of machines with big hard drives, and we have an ongoing record of everything (logins, temperature changes, hard drive health measurements, etc.) in one place. Very handy.

Since it’s a dual-core machine, running one thread of “while(1) {}” will occupy one CPU, and two threads of the same will run both CPUs at full throttle.
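A load generator along these lines might look like the sketch below. It is not the script used for the test: it swaps the endless loop for a time-limited one, and it uses forked processes instead of threads. It assumes command-line PHP on a Unix-like system with the pcntl extension; without pcntl it falls back to loading a single core. The function names are hypothetical.

```php
<?php
// Spin in a tight loop until the deadline passes; returns the iteration count.
function burn_cpu(float $seconds): int {
    $deadline = microtime(true) + $seconds;
    $iterations = 0;
    while (microtime(true) < $deadline) {
        $iterations++;
    }
    return $iterations;
}

// Fork one busy worker per requested core and wait for all of them.
// Returns the number of workers actually started.
function load_cores(int $workers, float $seconds): int {
    if (!function_exists('pcntl_fork')) {
        burn_cpu($seconds);        // no pcntl: load one core in-process
        return 1;
    }
    $pids = [];
    for ($i = 0; $i < $workers; $i++) {
        $pid = pcntl_fork();
        if ($pid === 0) {          // child: burn, then exit
            burn_cpu($seconds);
            exit(0);
        }
        if ($pid > 0) {            // parent: remember the child
            $pids[] = $pid;
        }
    }
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }
    return count($pids);
}
```

Calling load_cores(1, 600) and then load_cores(2, 600) while watching the sensor readings would roughly reproduce the one-CPU and two-CPU measurements.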

Intel Dual Pentium 4 @ 3GHz
Ambient air temp:   65F
Idle CPU temp:      131F
One CPU running:    176F
Two CPUs running:   186F

So, those numbers are terrifying. At idle the CPU ran almost 70F above ambient, and running full steam the delta was a full 120F. We looked around for a T-max-ever spec for these parts, and the general consensus was that our numbers were a bit high but ‘yep those chips run hot, don’t they’. Unbelievable.

So we decided to give the newer Core2Duo chips a try. They’re more expensive and require a slightly more modern motherboard and so on, which means added cost there as well. They’re clocked a bit slower in general, but between deeper caches and some other differences the real performance is actually equivalent or better. This new motherboard required a bit more yelling at the ssh terminal to get lm-sensors working (spot the trend?). All in all the extra few hundred bucks will be worth it; check out the new test results:

Intel Core2Duo @ 2.4GHz
Ambient air temp:   66F
Idle CPU temp:      73F
One CPU running:    87F
Two CPUs running:   98F

Yeah – with both cores of the Core2Duo running full-throttle the chip surface is nearly 90F cooler than the equivalent Pentium 4 (98F versus 186F). Sold.
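To put the two test runs side by side, the short sketch below subtracts the ambient temperature from each CPU reading. The readings are copied from the two result lists above; the array layout and the function name are just illustrative choices.

```php
<?php
// Readings in degrees F, copied from the two test runs above.
$tests = [
    'Pentium 4 @ 3GHz'  => ['ambient' => 65, 'idle' => 131, 'one' => 176, 'two' => 186],
    'Core2Duo @ 2.4GHz' => ['ambient' => 66, 'idle' => 73,  'one' => 87,  'two' => 98],
];

// Subtract ambient from each CPU reading to get the rise over room temperature.
function rise_over_ambient(array $r): array {
    return [
        'idle' => $r['idle'] - $r['ambient'],
        'one'  => $r['one']  - $r['ambient'],
        'two'  => $r['two']  - $r['ambient'],
    ];
}

foreach ($tests as $chip => $readings) {
    $rise = rise_over_ambient($readings);
    printf("%-18s idle +%dF, one core +%dF, two cores +%dF\n",
        $chip, $rise['idle'], $rise['one'], $rise['two']);
}
// Pentium 4 @ 3GHz   idle +66F, one core +111F, two cores +121F
// Core2Duo @ 2.4GHz  idle +7F, one core +21F, two cores +32F
```

At full load the Pentium 4 sits 121F above ambient against 32F for the Core2Duo, and the absolute gap between the two chips is 186F minus 98F, or 88F.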

The heat output is really important for us because the Bay Area is out of power. We’ve got enough fiber optic cable running up and down the peninsula that you can trip over it if you’re not careful, but the limiting factor for adding machines to our server cages will be how much power they consume and how much heat they put out. Any heat the computers give off has to be removed by big, expensive air conditioners, which eat up more power than the servers burned through in the first place just to keep the whole thing from melting down. It’s really the difference between being able to rack 5 of the Pentium 4 boxes or 10 of the Core2Duo ones.

In hindsight this makes a good deal of sense given the data I’ve seen from the Mac crowd (myself included) — the previous generation of MacBook Pro machines were almost painfully hot in general use. I’ve got a current-generation unit (with basically the same chip as in the second test box) and it’s never more than warm, even running several virtual machines at once and doing all sorts of divx decoding in the background. Very impressive work from Intel getting these chips to run more efficiently.

-Nathan

Wanted: Javascript Guru

Monday
Sep 18,2006

PBwiki is looking to hire a full-time senior-level Javascript programmer.

(more…)

And there goes the continent…

  • Filed under: Geek, News
Tuesday
Sep 12,2006

Well, it looks like PBwiki is now blocked in China. Time for a party shirt. Maybe they didn’t appreciate my recent trip there, where I dressed up like the Red Army and had Chinese tourists gigglingly ask me for a picture. I swear governments have no sense of humor.

In all honesty, this is a little disheartening. PBwiki’s #2 most popular language after English is Chinese by a very good margin, due in part to our excellent internationalization support and full-text Unicode-compliant search. So without a great deal of cleverness and tomfoolery on our part, we may have to say goodbye to China for now. :(

The technical details follow, for the interested: traceroutes from two IPs behind the same router – one that runs the production pbwiki.com site (and is therefore blocked) and another that acts as a development box (and is not blocked). And yes, we’ve verified that people in China cannot access PBwiki.

(more…)

Interesting PHP Quirk w/require_once

  • Filed under: Geek
Tuesday
Sep 12,2006

So, as many of you astute readers may have noticed, we use PHP on the backend. Nathan and I ran into a curious quirk today: if you do a require_once() in a function’s context, the variables set in the required file are not visible as globals once the function returns, even to functions in the required file.

The moral of the story? If you use globals in your includes, put all the require_once()s that you might need at the top of your PHP. Don’t try to be clever with conditional inclusion of your libraries.
(more…)
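A small self-contained reproduction of the quirk, as a sketch: it writes a tiny library to a temporary file, includes it from inside a function, and then calls a function from that library. The file contents and the names $greeting, get_greeting() and load_library() are all hypothetical.

```php
<?php
// Write a tiny hypothetical library to a temp file so the example is self-contained.
$lib = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($lib, '<?php
$greeting = "hello";        // looks like a global, but lands in the including scope
function get_greeting() {
    global $greeting;       // reads the real global scope
    return $greeting;
}
');

function load_library(string $path) {
    require_once $path;     // $greeting becomes a local variable of load_library()
    return $greeting;       // so it is visible here...
}

$inside = load_library($lib);   // "hello"
$after  = get_greeting();       // NULL: no global $greeting was ever set
unlink($lib);

var_dump($inside, $after);
```

Functions defined in the included file end up global no matter where the include happens, which is why get_greeting() is callable at all. Only the variables get trapped in the function’s local scope.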

The PBwiki API

  • Filed under: Geek
Friday
Sep 8,2006

So we’re going to be working on an API for all you PBwiki coders out there to make it easy to pull PBwiki content into applications and other web services, push content from other sources into PBwiki, and get alerted when a wiki is updated. We’re just in the first design phases of rolling out the API – check out our official Developer Wiki at http://api.pbwiki.com/ for more information as it comes out or put in your two cents about what you’d like to see on the developer forums. :)