Archive for the 'performance' Category

YUICompressor’s CSSMin

Wednesday, March 10th, 2010

Honored to be a part of the YUI project, I am now helping with the maintenance of the CSSMin part of the YUICompressor. My changes are now part of the trunk on github, so I'm official. Next on the agenda is documenting the thing, so that's what I'll try to do here, maybe in a few posts. You know, divide and conquer.

PHP, Java and a JavaScript port

Originally written in PHP by Isaac Schlueter and ported to Java by Julien Lecomte, CSSMin got a JavaScript port by yours truly some time ago. Because, after all, JavaScript is the language of the web, isn't it?

You can play with the latest git version of the JS port online here.

I'm also happy to report that the JS port is now used in PageSpeed and YSlow (as you probably know Firefox extensions are written in JavaScript)

Page Speed

YSlow

Building

If you want to play on your own with the source version of YUICompressor without waiting for the next release, you can build it like so:

  1. Checkout or download the code from http://github.com/yui/yuicompressor/
  2. Navigate to the root yuicompressor/ directory
  3. Type ant and hit enter

In order for this to work you need a somewhat recent Java SDK installed and also Ant running. (On the Mac, just do port install apache-ant to get Ant)

This is for the Java version, the JS version needs no building, of course.

Tests

There's a bunch of new tests now (and if you want to contribute to the project, you can always write more tests and test cases for any bugs), you can run them with the suite script that Isaac wrote:

  1. cd tests/
  2. ./suite.sh

One thing I added (and loved it) is to run the tests using the JS port as well. Since the JS min part is using Mozilla's Rhino (slightly modified), Rhino is part of the code. So I'm using this already available JavaScript interpreter to run the JS port. Convenient.

The procedure to write new tests is simple:

  1. Create source CSS file in the tests/ directory, e.g. new-test.css
  2. Create a new file with the expected result and name it with a .min extension, e.g. new-test.css.min

You can use the handy-dandy online version to help with the tests creation.

Next time

With those details out of the way, the next time I'll talk more about the different things that CSSMin does to your CSS code. Thanks for reading!

 

Uncompressed data in base64? Probably not

Thursday, February 4th, 2010

The beauty of experimentation is that failures are just as fun as successes. Warning: this post is about a failure, so you can skip it altogether :)

The perf advent calendar was my attempt to flush out a bunch of stuff, tools and experiments I was doing but never had the time to talk about. I guess 24 days were not enough. Here's another little experiment I made some time ago and forgot about. Let me share before it disappears to nothing with the next computer crash.

I've talked before about base64-encoded data URIs. I mentioned that according to my tests base64 encoding adds on average 33% to the file size, but gzipping brings it back, sometimes to less than the original.

Then I saw a comment somewhere (reddit? hackernews?) that the content before base64-encoding better be uncompressed, because it will be gzipped better after that. It made sense, so I had to test.

"Whoa, back it up... beep, beep, beep" (G. Constanza)

When using data URIs you essentially do this:

  1. take a PNG (which contains compressed data),
  2. base64 encode it
  3. shove it into a CSS
  4. serve the resulting CSS gzipped (compressed)

See how it goes: compress - encode - compress again. Compressing already compressed data doesn't sound like a good idea, so it sounds believable that skipping the first compression might give better results. Turns out it's not exactly the case.

Uncompressed PNG?

The PNG format contains information in "chunks". At the very least there's header (IHDR), data (IDAT) and end (IEND) chunks. There could be other chunks such as transparency, background and so on, but these three are required. The IDAT data chunk is compressed to save space, but it looks like it doesn't have to be.

PNGOut has an option to save uncompressed data, like
$ pngout -s4 -force file.png

This is what I tried - took several compressed PNGs, uncompressed them (with PNGOut's -s4), then encoded both with base64 encoding, put them in CSS, gzip the CSS and compared file sizes.

Code

<?php
// images to work with
$images = array(
  'html.png',
  'at.png',
  'app.png',
  'engaged.png',
  'button.png',
  'pivot.png'
);
//$images[] = 'sprt.png';
//$images[] = 'goog.png';
//$images[] = 'amzn.png';
//$images[] = 'wiki.png';

// css strings to write to files
$css1 = "";
$css2 = "";

foreach ($images as $i) {

  // create a "d" file, d as in decompressed
  copy($i, "d$i");
  $cmd = "pngout -s4 -force d$i";
  exec($cmd);

  // selector
  $sel = str_replace('.png', '', $i);

  // append new base64'd image 
  $file1 = base64_encode(file_get_contents($i));
  $css1 .= ".$sel {background-image: url('data:image/png;base64,$file1');}\n";
  $file2 = base64_encode(file_get_contents("d$i"));
  $css2 .= ".$sel {background-image: url('data:image/png;base64,$file2');}\n";

}

// write and gzip files
file_put_contents('css1.css', $css1);
file_put_contents('css2.css', $css2);
exec('gzip -9 css1.css');
exec('gzip -9 css2.css');

?>

Results

I tried to keep the test reasonable and used real life images - first the images that use base64 encoding in Yahoo! Search results. Then kept adding more files to grow the size of the result CSS - added Y!Search sprite, Google sprite, Amazon sprite and Wikipedia logo.

test with compressed PNG, bytes with uncompressed PNG, bytes difference, %
Y!Search images 700 1506 54%
previous + Y!Search sprite 5118 8110 36%
previous + Google sprite 27168 40836 33%
previous + Amazon sprite + Wikipedia logo 55804 79647 29%

Clearly starting with compressed images is better. Looks like the difference becomes smaller as the file sizes increase, it's possible that for very big files starting with uncompressed image could be better, but shoving more than 50K of images inline into a CSS file seems to be missing the idea of data URIs. I believe the idea is to use data URIs (instead of sprites) for small decoration images. If an image is over 50K it better be a separate request and cached, otherwise s small CSS tweak will invalidate the cached images.

 

One-click Minifier Gadget (OMG) - initial checkin

Sunday, January 31st, 2010

So I've been thinking and talking to folks about this idea of having one-stop shop for all your minification needs. Minification of JS and CSS as well as image optimization helps site performance by reducing download sizes. This is good. But not a lot of people do it.

People don't do it, because it's a PITA :) It's simple enough, but with deadlines upon you and all that, you don't want to do an extra step. That's why having a build process helps, by automating this. But setting up a build process is yet another PITA. So it goes.

So my idea was to help busy designers and developers, that wouldn't invest their time researching which minifiers are good, downloading setting up, learning about the 10+ PNG optimization tools... That's how the the idea for the one-click OMG tool came about. (One-drag is more appropriate, come to think of it...) One tool that runs on all operating systems - Win, Mac, Linux - and delivers all minification and optimization tools you need as one package.

Running

Running the tool is as simple as drag/dropping a bunch of files and directories. Here I've dropped "wordpress" directory. The tool recursively looks into the dropped files for things it can optimize. More information here.

OMG screenshot

Download

Version 0.0.1 is here. It doesn't do image optimization, only JS and CSS minification, but please feel free to download and give it a shot. Unzip the package for your OS and run omg.exe (Windows), OMG.app (Mac), or the omg binary (Linux)

Open source

The code is on GitHub. Fork and enjoy.

The developer's notes are there too - how to setup, run, package. Also a list of todos if you want to help.

Next?

This is just a preliminary version. Feel free to join, comment, suggest. Hate the name? Say so :)

Personally, looks like my plate is very full for the next moth or two, so I probably won't be actively working on the tool. I hope though the foundation is good enough and relatively documented, should be easy to pick up if anyone's interested in contributing.

Built with XUL

This has been a learning experience for me with XULRunner. I loved it. I love the idea of being able to create cross-OS desktop apps with JavaScript alone.

Behind the scenes, I'm using my JavaScript port of YUICompressor's CSSmin and Doug Crockford's JSMin. JSMin should be replaced with YUICompressor (or Google closure compiler) in the next release.

 

Performance job offers

Thursday, January 28th, 2010

I'm sure quite a few of you my fellow readers are crazy about web performance. And if you're seeking new challenges, timing can't be any better. Below are three excellent opportunities in three of the most high-traffic sites on the planet.

  • Yahoo
    Yahoo! Search is hiring a senior performance engineer. Yep, you'll be working with me and a bunch of incredible folks.
  • eBay
    eBay is hiring a performance engineer. I had the pleasure of delivering a tech talk there, it looks like a great place to be, fast-paced, and they do take performance seriously, lot of opportunities to sharpen your perf teeth (I don' have a URL, hit me up ssttoo at gmail if you're interested)
  • Facebook
    Facebook is hiring a performance engineer. Depending on who you trust, FB is #2 or #3 most popular site, so the challenge is definitely there. I've spoken to several awesome people, like David Wei performance engineer and researcher extraordinaire, and let me tell you, things are happening and you'll never be bored, even for a second.

And, not perf-related, but an extraordinary opportunity at YUI was announced today, it almost sounds too good to be true. One of the most important thing about a job is the people you'll be working with. Well, with YUI you can't wish for a higher concentration of front-end brain power. It's scary :)

 

The performance business pitch

Thursday, December 24th, 2009

Dec 24 This post is the last article in the 2009 performance advent calendar experiment.

The idea for this post actually came from the awesome Jeremy Hubert. I met him after a tech talk at Yahoo! I co-delivered and we talked about the presentation. The talk contained an opening part with the various business stats that were presented at Velocity this year (The Year of the Business Metrics) - not that I thought anyone at the tech talk needed to hear about business metrics. I believe developers are naturally attracted to performance, the same way as we are attracted to writing good quality maintainable code. We just like what we do and want to be good at it. The reason I added the business part is because I wanted to give the attendees something to "take home" and present to their bosses should their bosses need convincing. Jeremy suggested I should've instead created a downloadable presentation for people who are interested. So here is the said "business pitch" presentation plus added front-end pitch.

» Download the pitch (PPT)

I intentionally kept the slides clean without much formatting. My hope is that you'll find these slides useful to skin, update, mashup and build upon and deliver to your bosses and clients that need to be convinced to invest in performance and give you the time you need to work on performance improvements.

The slides are also on slideshare. In case you happen to have your client attentive and you don't have the pitch handy, you can always deliver it from slideshare's full screen view.

The rest of the post is the same as the contents of the powerpoint slides. The text under each slide is also added as a comment in the downloadable pitch.

It goes like this:

The Business of Performance

Does anybody like to wait in line at the cache register? Probably not. The same way probably no one likes one to sit and wait for slow web pages to finish loading.

The question is how does page loading time relate to the site's business metrics - metrics such as revenue, user satisfaction and repeat visits? And why should we invest in making pages load faster?

The way to test the impact of slow pages on business metrics is traditionally done with split testing. You randomly separate the users into two buckets. Half get normal page, half get the same page but artificially slowed down at the server side. Then you slow down a little more and see how the business metrics change.

Conducting such experiments people quickly realize that slow is bad and stop the experiment in order not to lose users. So experiments usually end at half a second delay.

Bing.com (while formerly live.com) was brave enough to introduce artificial slow downs as long as 2 seconds. The results were that every one of the monitored business metric changed exponentially for the worse.

Just focusing on the revenue metric, you can see that making the page 1 second slower causes 2.8% drop in revenue. With 2 seconds delay the revenue drops 4.3%.

Similar experiment was done at Google where the search results page was artificially delayed with 0.4 seconds. This experiment shows two additional points:

  1. The negative effect on business metrics gets worse with time. The users may tolerate the slower page during first few visits but will then search increasingly less and even abandon the site
  2. The second point is that even after removing the delay, things didn't get back to normal. There was an after-effect. Some people left the site for good, moved to the competition and didn't return.

Similar experiment was done at Yahoo – as soon as the page gets 0.4 seconds slower, there is a drop in the fill page traffic. This means users leave the page before it's loaded completely (before the onload event fires) – either hit Back button or click away from the page.

At AOL there was a measurement which tied the performance of individual pages with the number of visits those pages get.

The results were pretty clear – slow pages get fewer visits. Users learn to avoid them.

While the previous results showed how business metrics go down as the site gets slower, there is one study from Shopzilla that shows a more positive perspective – how business metrics improve as the site gets faster.

Shopzilla redesigned their site and went from page loading times around 6 seconds down to 1.2 seconds. As a result all metrics improved. The revenue increased with 7 to 12%, they got 25% more page views, the paid search traffic (from Search Engine Marketing) increased. And the required infrastructure to support the site was cut in half, which means they saved money while making more money.

Another piece of stats from Shopzilla illustrates how even the smallest change can have a positive impact. They moved static components to be served from a domain without cookies and this one small change resulted in 0.5% more revenue.

In Yahoo's list of performance best practices, this improvement is listed under number 24 (of 34) meaning that there's much more important and beneficial improvements to be made, but still – every little bit counts.

With regards to cutting down expenses and saving money, here's an example of what happened when Netflix enabled gzip compression for plain text components (gzipping is Yahoo's performance recommendation #4, meaning it's pretty important). Netflix noticed a sudden drop in outbound network traffic - they lowered their bandwidth bill by 43%.

When it comes to business metrics, different sites measure success differently. For some it's the number of new user signups, for some it's time spent on the site, for others – the amount of sales. But in general, what can we expect when improving performance and page loading times? More repeating visits, more time spend on the site viewing more pages, better conversions and revenue, happier users. All that while paying less for hardware and bandwidth.

Also more search engine traffic. Since April 2008 Google takes into account your page loading time when ranking the sponsored search links (AdWords). Shopzilla's stats are a testament to that. And Matt Cutts from Google has mentioned at a conference that there's lobbying inside Google to extend the load time as factor for organic search results too. So far this hasn't happened, but we can speculate that in the future a faster site may rank higher in Google's search results bringing more new visitors to the site.

So a faster site means more business. And where do we start?

To answer that let's look at one page and see where the time is spent for loading this page.
(This is where you can substitute this example with your page, or your client's page. Then run the page through WebPageTest.org and copy the waterfalls to replace in the next two slides)

Here is the so called "waterfall view" showing how does the page load. The first bar is the page itself. All the others are different page components required by the page such as images, scripts and stylesheets.

As you can see only 10% of the time is spent on assembling the page from different sources such as data from the database. We can call this portion server-side time.

Once the server has done constructing the page and has sent it to the browser, the rest is browser time. And that's where 90% of the time is spent – downloading all the page components. We can call this the front-end time.

Many people believe that the front-end time is not so important because all these images and other page components get downloaded once and then cached.

But even for repeat visits with full cache there's still a few components to be loaded, and the scripts and styles to be executed from the cache. In this page example, still only 38% of the total time is server time.

And Yahoo's research shows that 50-60% of all page visits (and 20% of all page views) will always be empty cache experience.

So the front-end matters and this is where the time is spent. For best ROI this is where you should be focusing – improving the front-end performance.

This is actually good news, because improving front-end performance is much easier than changing back-end systems, databases, infrastructure or reengineering server-side applications.

It's easier to improve the front-end and it results in greater overall benefits. Gutting 10% in half won't make a big difference, while cutting the 90% part in half will have a significant effect on the page loading time and therefore on the business metrics.

Credits/URLs

Thanks!

I hope you enjoyed the the calendar and this last article. It was quite the experience for me, writing a post a night, sometimes way too late into the night. Many thanks to the guest bloggers - Christian, Eric and Ara (2 posts!) - thanks to you guys I was able to catch some sleep. Hopefully next year we can make it a bigger community effort with more contributors and a whole separate web site.

 

CSS performance: UI with fewer images

Wednesday, December 23rd, 2009

Dec 23 This post is the one-before-last article in the 2009 performance advent calendar experiment.

Often performance improvements come with their drawbacks, sometimes improving performance causes pains in other parts of the development process or strips stuff from the final product. Sometimes there's even a conflict where you have to pick: slow, unusable and beautiful or fast and looking like hacked with a blunt axe. But it doesn't have to be this way.

This post outlines some approaches to achieving common UI elements using CSS tricks in as many browsers as possible while using as fewer images as possible. Some of the tricks are brand new, some are very, very old, IE5.5. old. They all have in common the "fewer or no images" mantra. Using fewer images comes with some pronounced benefits:

  • less time spent in Photoshop
  • lighter page, less HTTP requests, less image payload
  • fewer elements in the sprite to maintain (and sometimes fewer sprites) which means longer lived sprites with fewer updates and cache invalidation
  • generally easier maintenance - it's much easier to change a color value than to update and push a new image version

Sometimes some browsers may not be fully supported but that's ok - as long as there's progressive enhancement and the basic page is usable, people rarely notice 1px glows and other ornaments.

So let's get started. BTW, a test page with the stuff discussed in the post is here.

Rounded corners

Yep, let's tackle the biggie.

Forget rounded corners in browser that don't support border-radius. Period. It may be hard to argue this case, but definitely try. Doing rounded corners any other way than border-radius is a pain - it adds markup bloat, it makes you create more images or sprite elements. It's tougher to update. Just forget it. Forget rounded corners in IE < 9 (as rumor has it border-radius is coming to IE9). People may argue that IE is important for your audience. No doubt that's true, but rounded corners are not so important for the audience. Show your designer Yahoo Search results page - the sidebar on the left-hand side. Not very rounded in IE. Do you think this was an easy battle - losing rounded corners in IE for such a high-profile site? Ask the man who won the battle ;)

So starting with a normal module - head, body and border:

The markup - nice and clean:

  <div class="module">
    <div class="hd"><h3>This is the header</h3></div>
    <div class="bd">
      <p>Here comes the content</p>
      <p>Here comes some more</p>
      <p>You can never have too much content, because
         content is king, right?
      </p>

    </div>
  </div>

Some fairly simple border radius to support Firefox, Webkit (Safari, Chrome, iPhone...) and, since a few days ago, Opera 10.5 alpha:

.module {
  -moz-border-radius: 9px;
  -webkit-border-radius: 9px;
  border-radius: 9px;
}

Result:

This is it! Easy-peasy, lemon squeezy.

Now, it's a little annoying to write three declarations for the same thing, but, hey - beats images and extra markup hands down. Also annoying are the differences when setting individual corners (-moz-border-radius-topleft is -webkit-border-top-left-radius). In this case we need to also round the header (class .hd) so it doesn't bleed through the beautifully rounded corners:

.hd {
  -moz-border-radius: 8px 8px 0 0;
  -webkit-border-top-left-radius: 8px;
  -webkit-border-top-right-radius: 8px;
  border-radius: 8px 8px 0 0;
}

Verdict:

  • Full support: Firefox, Safari, Chrome, Opera 10.5
  • Fallbacks: IE (corners are not rounded)

Drop shadows and glows

Another favorite effect designers love - dropping shadows. It's easy to enhance that existing .module without any new images:

.module {
  /* offset left, top, thickness, color with alpha */
  -webkit-box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  -moz-box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  box-shadow: 5px 5px 5px rgba(0, 0, 0, 0.5);
  /* IE */
  filter:progid:DXImageTransform.Microsoft.dropshadow(OffX=5, OffY=5, Color='gray');
  /* slightly different syntax for IE8 */
  -ms-filter:"progid:DXImageTransform.Microsoft.dropshadow(OffX=5, OffY=5, Color='gray')";
}

And now our module casts a shadow:

Now two notes for IE: first the shadow doesn't have alpha so it's not as nice and second, this filter may not play along with other filters in the same module. But the shadow is cast and that's a check for IE too, even IE5.5!

You may notice that in this case we basically need to more or less repeat the same declaration three times and the IE declaration two times. This is irksome, but hopefully keeping the strings close together should help gzip compression.

As for glowing, it's the same thing in FF, Webkit, Opera, only without any offset. For IE, there's a different filter called glow:

.glow {
  -webkit-box-shadow: 0 0 10px rgba(50, 50, 50, 0.8);
  -moz-box-shadow: 0 0 10px rgba(50, 50, 50, 0.8);
  box-shadow: 0 0 10px rgba(50, 50, 50, 0.5);
  filter:progid:DXImageTransform.Microsoft.glow(Strength=5, Color='gray');
  -ms-filter:"progid:DXImageTransform.Microsoft.glow(Strength=3, Color='gray')";
}

I added these declaration to a new class .glow so I can add the class name to modules that need to glow. The result:

The result as it glows in IE:

Now you see why I added only 3 pixels glow in IE and whole 5 in the rest. The IE glow is a little .. interesting. Also in IE8 (could be my VM, in IE6 XP no VM all looks OK) the glow seems to move slightly when you hover over the module.

Verdict for shadows and glows:

  • Full support: FF, Safari, Chrome, Opera, IE5.5 and up

More info:

Gradients

Ah, gradients. Sometimes so subtle that we, muggles and other mere mortals, don't see them even when we try our hardest. But for the designer they could be life/death situation.

Let's make the head (class .hd) of our module a gradient without any images:

.hd {background-image: -moz-linear-gradient(top, #641d1c, #f00);
  background-image: -webkit-gradient(linear, left top, left bottom, from(#641d1c), to(#f00));
  filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#ff641d1c,endColorstr=#ffff0000);
  -ms-filter: "progid:DXImageTransform.Microsoft.gradient(startColorstr=#ff641d1c,endColorstr=#ffff0000)";
}

The result:

What a beautiful (code-speaking, of course, not so sure about visually beautiful) module. It has rounded corners, drop shadows and a gradient and so far we haven't used even a single image. Which means this reddish module can become blue, green or, god forbid, pink - with a single tweak in the code, the CMS or the user preferences (if you're building a social network for example).

Gradients verdict:

  • Full support: FF, Safari, Chrome, IE
  • Fallbacks: Opera (solid color)

More info:

... and RGBA for all

Being able to set the transparency of the background without affecting the transparency of the foreground (the text) is quite handy. That's why there's rgba() in CSS (red, green, blue, alpha). IE is not yet supporting it, but we can use the gradient filter which does support transparency. In this case we don't need the actual gradient so we set start and end color to the same thing. Also the background: transparent is needed for the whole thing to work in IE:

.rgba {
  background-color: transparent;
  background-color: rgba(200,200,200,0.8);
  filter:progid:DXImageTransform.Microsoft.gradient(startColorstr=#99dddddd,endColorstr=#99dddddd);
  -ms-filter: "progid:DXImageTransform.Microsoft.gradient(startColorstr=#99dddddd,endColorstr=#99dddddd)";
}

The result is pleasantly cross-browser:

RGBA verdict

  • Full support: Firefox, Safari, Opera, Chrome, IE

Rotating images

It happens that sometimes you use the same image only flipped. For example open/close thingies, menus and such. How about reusing the same image and rotating it with CSS?

.arrow {background: url(arrow.png) no-repeat; display: block; float: left; width: 33px; height: 33px;}
.right{ /* this is the original image*/ }
.left {
  -moz-transform: rotate(180deg);-webkit-transform: rotate(180deg); -o-transform: rotate(180deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=2);
  -ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=2)";}
.up {
  -moz-transform: rotate(270deg);-webkit-transform: rotate(270deg); -o-transform: rotate(270deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=3);
  -ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=3)";}
.down {
  -moz-transform: rotate(90deg);-webkit-transform: rotate(90deg); -o-transform: rotate(90deg);
  filter: progid:DXImageTransform.Microsoft.BasicImage(rotation=1);
  -ms-filter:"progid:DXImageTransform.Microsoft.BasicImage(rotation=1)";
}

Here's the result. Single image:

Arrow

Result:

the HTML:

<span class="arrow right"></span>
<span class="arrow left"></span>
<span class="arrow up"></span>
<span class="arrow down"></span>

You may notice that the CSS could be quite verbose for saving such small images. It's highly recommended you add the rotation code to a class and use the class name when necessary instead of repeating the same long declaration for every use case or image. Then pray to the gods of compression that this thing gzips well ;)

Verdict

  • Full support: Firefox, IE, Safari, Opera, Chrome

Multiple UI elements with the same background image

The last few tricks have something in common - they each use one background image. The background images are very small - usually around 100 bytes. The tiny image has some transparency to it and is placed as a background-image which sits on top of a background-color. Because of the transparency, the background color shines through, but differently depending on the transparency level of the image above it.

The result is - different UI elements with different colors (and even different hover colors) which can be part of CMS or part of user's skinning and they all reuse the same tiny background. So what can we do this way? A lot of interesting background effects, but here's a few.

Glossy buttons

Here's the end result:

All these buttons share the same background image. The image is 1x1000 and repeated horizontally. The 1000 is just to be safe, very safe, because 50, 100 or 1000 doesn't affect the file size which just a mere 100 bytes. The upper half of the image is a little less transparent. The lower half is 100% transparent. When placed on top of the solid color the whole thing looks shiny and glossy. And you can change the color any way you like.

The HTML:

<p class="button">button1<p>
<p class="button button2">button2<p>
<p class="button button3">button3<p>
<p class="button button4">button4<p>
<p class="button button5">button5<p>
<p class="button button6">button6<p>

And the CSS can't be simpler:

.button {
  background-image:url(http://tools.w3clubs.com/mask/mask.php?x=1000&type=h);
  background-position: center;
}
.button:hover {background-color: #F29222;}
.button2 {background-color: #A41D1C;}
.button3 {background-color: #0F6406;}
.button4 {background-color: #333f79;}
.button5 {background-color: black;}
.button6 {background-color: orange; color: black;}

Actually in the test page I have inlined the image with data URI to save the whole HTTP request for such a teeny image.

As you can see in the URL of the background, I've done a little script to generate some background images:
http://tools.w3clubs.com/mask/mask.php?x=1000&type=h
The image generator's source code is right here.

Stripes

Same technique - but used to generate striped background:

It's basically the same code, only using a different call to the image generator to give us a different background image.

HTML:

  <div class="module stripe earth glow">
    <div class="bd">
      <p>striped background</p>
    </div>

  </div>

  <div class="module stripe tech glow">
    <div class="hd phony stripe"><h3>stripity-stripes</h3></div>
    <div class="bd">
      <p>striped background with the same background image</p>
    </div>
  </div>

CSS:

.stripe {background-image: url(http://tools.w3clubs.com/mask/mask.php?type=stripe);}
.earth  {background-color: olive;}
.tech   {background-color: #bbb;}
.phony  {background-color: #0F6406;}

Again, this image can be a data URI so we save the single HTTP request.

And another gradient

So if you don't like the previously discussed way to do gradients, here's another one. The same trick with the solid color background and a semi-transparent image on top.

Result:

The background images as generated by the service are:
// lighter at the top
http://tools.w3clubs.com/mask/mask.php?type=gradient
// darker at the top
http://tools.w3clubs.com/mask/mask.php?type=gradient&flip=1

Again, you can see the test page here and the source for the image generation is here.

For yet another example of this technique check my post on this (abandoned) blog phonydev.com. There I take an image and a mask image generated by the same script and overlay to achieve an iPhone-like glossy button.

iphone glossiness

Thanks!

Kind of long post, but I hope you're excited about removing a bunch of images from your future designs. If I've omitted some details, please let me know in the comments.

 

iPhone caching

Tuesday, December 22nd, 2009

Dec 22 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come - only 2 to go!

Some time ago there was a post on YUIBlog highlighting the findings of Wayne Shea and Tenni Theurer on the caching behavior of the iPhone. Curious if things have changed several iPhone OS updates later, I ran some experiments with OS3 and OS3.1.

Highlights

  • iPhone will not cache components bigger than 15K (was 25K in the previous experiment)
  • total cache is about 1.5MB (was 500K)
  • short-term memory cache will store components up to 1941K provided the component size does not divide by 4
  • powering off the device still clears all the cache, Expires headers still important
  • not sure since when, but closing all tabs clears the cache too
  • consider HTML5 application cache to improve cacheability and provide offline experience – components in the application cache can be bigger than 15K and stay cached even after you clear Safari's cache from the Settings app (or clear it any other way)

15 is the new 25

Previous experiment showed that the iPhone will not cache components bigger that 25K, but this number is now down to 15K. And this is ungzipped size, meaning if you have for example a 20K JavaScript and your server sends it gzipped down to 10K, it will not be cached by the iPhone.

So minification can definitely help you here. Minification may not be as beneficial as gzipping in desktop browsers, but it may be the difference between a cache hit or miss in the iPhone. Minify, then gzip. And, in general, try to keep your component sizes low - that always helps in any browser.

Total cache size

First off, how was the experiment run? I simply tailed the access log of a server, in order to monitor which components get requested:

$ tail -f ~/logs/http/access.log | grep "XXX.YYY.ZZZ.000"

Where XXX.YYY.ZZZ.000 is the IP address. You can also replace that with just "iPhone" if you think no one else is hitting this site with an iPhone.

Then I requested components with different sizes and look at what the log tail says.

Once it was clear that 15K is the maximum components size, the next question was how many of those 15K components can be cached before iPhone runs out of space allocated for caching? And the result was 105. Attempting to cache one more file resulted in removing existing ones from the cache. Then playing with the size of the very last component helped nail the exact number of bytes available in the cache – 105 * 15 + 7 = 1582K.

So the total cache is just a little over 1.5 MB (earlier research showed 0.5MB for earlier iPhone OS). Have in mind that this is the total cache shared between all pages, it's not per domain or per tab.

Memory cache

The iPhone has a little bit more cache space in the form of memory cache. The memory cache has no per-component size limit, so your components can be as big as the memory cache – 1941 bytes (don't ask how many attempts it took to nail this number).

There a little catch though, probably due to a bug, and it's that if the component size divides by 4, it won't be cached in memory. That's true for JavaScripts – they will be requested on every page load. CSS files with sizes over 15 that divide by 4 will not be requested in the same tab session, but if you close the tab and open the page in another tab, they will be requested again.

So components under 15K go to the disk cache, a 16K component is never cached (so is 20, 24, …, 36,.. 100,… 1024 and so on), 17K and up (up to almost 2 megs) components are cached in memory.

This memory cache is very unreliable as you can guess, because it gets cleared very often, it's probably useful only in the same user session on your web site.

Some observations

  • Closing all tabs, except for blank ones (as when you do "New Page") and then closing Safari is the same as clearing the cache from Settings. So it's a good idea in your normal daily use to leave at least one page open, before you close Safari in order not to cause the implicit cache clearing.
  • Tapping the reload icon in the address bar sends unconditional requests for all components, without the If-Modified-Since header and ignoring the Expires header. So to speed up your browsing, refresh the page by tapping the address bar and then tapping GO, don't use the refresh icon.
  • The tests I ran were using OS 3.0 and OS 3.1.2. I got my phone last Christmas which means that according to this page it came with OS 2.2. I don't remember if I ran any tests with OS2.2. too. I do remember though that I tried some tests back when I didn't have a phone and asked help from Ryan Grove and Nicole Sullivan. Chatting over IM they were loading pages while I was tail-ing the server log. Those tests must have been with OS 2 or OS2.1. Back then the results showed that the limit for a component that gets stored in the cache was 10Meg, which was also the total size of the cache. Now I have my doubts that back then we only tested memory cache, not disk cache. In any event, it's important to note that those restrictions are all software restrictions. No matter what model the phone, it's the OS that sets the limits. Different models with the same OS will behave the same when it comes to caching sizes.

HTML5 offline application cache

All in all we can safely summarize that the iPhone has no cache to speak of. 15K per component is nothing and the limit of 1.5Megs shared with all other pages will have your cached components kicked out pretty quickly.

So – what's an iPhone performance optimizer to do? Use HTML5 goodies.

One thing to consider is some form of client storage supported by the mobile webkit (either key-value or SQLite), but that will require some of your javascript to handle the caching, expiration and so on.

Another thing to do is use the offline application cache. It's meant to support applications to work offline, but it also ends up being useful to improve caching of online applications as well.

What you need is a simple text file called a manifest. In there you list all the components required by your application. E.g.

CACHE MANIFEST

/root/path/to/images/image.jpg
or/maybe/relative/paths/too/scripts.js
http://example.org/good-old/absolute/path/oojs-home.jpg

It's important to serve this file with content type text/cache-manifest.

Then in your html tag just point to that file. Let's say you called the manifest mycache.manifest (could be dynamic, php, or anything, as long as it's served with the proper content type). So your HTML should start like the following:

<!DOCTYPE html>
<html manifest="mycache.manifest">

And this is it. There's also JavaScript API available if you want to play with the items stored in the cache.

Now the browser will request the manifest file every time (Expires header won't help you here) and if its contents is changed it will quietly and unobtrusively download the updated components in the background and the next time the user will see the updated page. Needless to say it makes sense that all the pages of the site share the same manifest.

The best part is that when using offline cache none of the restrictions mentioned above apply – you can store files bigger than 15K and you don't share your total cache space with anyone.

I used this for a personal project – whomsy.com if you want to play with some requests and monitor the traffic (e.g. using the iPhone simulator and Charles proxy).

Manifest – not just for iPhone?

The manifest idea sounds so nice, it makes sense to apply it to desktop browsers as well. In a way it's similar to the idea of having web archives – a zip with all page components (sorry, can't find the URL for the proposal for web archives right now). You can include the manifest on your homepage and, after onload, the browser preloads all the other components used by internal pages (no need to do preloading yourself in JavaScript, no need to worry about spriting images and so on). This is absolutely doable and it also provides the side benefit that your app is available offline, which is the original purpose of the offline cache :) However the application of this technique is a little limited.

  • Browser support – currently Webkit browsers support manifests, and IE doesn't. Firefox does support offline cache, however it shows an ugly confirmation message at the top of the page and you normally don't want to stress out your users with warning messages (see screenshot below)
  • The "homepage" meaning the page that includes the cache manifest also gets cached implicitly. This could be undesired behavior for many sites, but could work just fine for Ajaxy one-page-sites and applications.

Overall, I'm pretty enthusiastic about using offline cache for the purposes of normal page caching too. The future is here, it's just not evenly distributed (nor supported in IE) yet :)

 

Progressive rendering via multiple flushes

Monday, December 21st, 2009

Dec 21 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Perceived page loading time is just as important as the real loading time. And when it comes to user perception, visible indication of progress is always good. The user gets feedback that something is going on (and in the right direction) and feels much better.

Using multiple content flushes allows you to improve both the real and the perceived performance. Let's see how.

The head flush()

While your server is busy stitching the HTML from different sources - web services, database, etc - the browser (and hence the user) just sits and waits. Why don't we make it work and start downloading components we know will be absolutely needed, such as the logo, the sprite, css, javascript... While the server is busy, you can send a part of the HTML, for example the whole <head> of the document. In there you can put the references to external components such as the CSS, which then the browser can head-start downloading while waiting for the whole HTML response.

<html>
<head>
  <title>the page</title>
  <link rel=".." href="my.css" />
  <script src="my.js"></script>
</head>
<?php flush(); ?>
<body>
  ...

Doing something like this will result in shorter waterfalls, because more downloads can happen in parallel. In the waterfall below the page is not yet completed at 0.4 seconds, yet the browser has already requested more components.

One step further - multiple flushes

While having the browser busy is good and the whole page loads faster, can we do better? How about letting the user see something while the server is still busy? Remember - show something "in a blink of an eye". And how about doing the flushing several times, hence rendering the page is stages. This will help show usable partial versions of the page without waiting on potentially long-loading page or waiting some blocking JavaScripts to load.

Here's an example - Google search results.

The header part of that page (chunk #1) doesn't need any complicated logic. True, the page title and pre-filling the input box are dynamic parts, but this is just a simple echo of the user input, nothing that requires complex work. So out goes the header. Notice that the number of search results is not visible yet. In this chunk there's the logo, so the sprite is downloaded. If this page was using external CSS, it would be included in the head too.

Then, the search results, the meat of the page. Out it goes, as a static HTML chunk #2.

So far the page is done, but not quite yet. There's some progressive enhancement of the page which requires JavaScript. And JavaScript blocks. So include it in the footer as chunk #3.

The page is usable even without the JavaScript and without the footer. The user mostly cares about the results, so chunk #2 is what matters most. Chunk #3 can even get lost in transfer. As for chunk #1 - it's just to give feedback that "hey, we're working on your query". The first chunk actually tricks the user to believe that the query is already done. "Heck", concludes the user seeing that the page is already coming up, "that was FAST" :)

Boring details - HTTP 1.1 chunked encoding

So how does this work actually, how come the HTML is served in parts?

The answer is - HTTP 1.1 chunked encoding. A normal HTTP response looks like:

Headers...
[One empty line]
<html><body>response...

A chunked response would be like:

Headers...
Transfer-Encoding: chunked
More headers...

size of chunk #1
<html><body>...chunk #1...

size of chunk #2
...chunk #2...

size of chunk #3
...chunk #3 </body><html>

0 (meaning "the end!")

The chunk sizes are given as a hexadecimal. Here's an example response (from the wikipedia article)

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25
This is the data in the first chunk

1C
and this is the second one

0

Chunking/flushing strategies

There's basically two paths you can take when it comes to chunking:

Application-level chunking
when your web app knows when to flush, based on some logic. The Google search above is an example of application-level flushing - header, body, footer are parts of the page, known to the web application
Server-level chunking
when your application doesn't worry about how the content is delivered, but leaves this task to the server. The server can choose some strategy for flushing, for example once every 4K of output. Google search does this when the user agent doesn't support gzip - it flushes out every 4k. Bing.com does it similarly - flushing about every 1k (sometimes 2K, sometimes less) regardless of the user agent's support for gzip. Interestingly enough bing's first chunk is often non-readable characters - just something that tells the browser - hey, I'm alive, here's your first byte

Amazon is an interesting example of doing a mix of both strategies - it looks like sometimes it's server level (e.g. in the middle of an html tag) but sometimes it looks like the chunk contains (or wraps up) a page section. Amazon is also a good example of focusing on what's important on a page (Why is the user here? What do they want? What do we want them to do?) and making sure it's rendered first.

The areas I've marked in this screenshot do not correspond to chunks exactly, but they show how the page renders progressively, using a combination of chunked response and source order:

  1. #1 - header. Every page has one. Get done with it.
  2. #2 - buy now. This is what we want the user to do.
  3. #3 - image. The user wants to see what they're buying. Probably also helps Amazon minimize merchandise returns :)
  4. #4 - title/price. Kind of important too.

The rest of the page - reviews, comments, also buy... all this can wait, it's all secondary. Most of it is way below the fold anyway.

Tools: none

Unfortunately, as far as I know, there's no tool that offers visibility into those chunks - what is the contents of each one and how does it looks like.

Fiddler let's you see the encoded chunked response, but that doesn't help too much. At least it gives you an idea that the response was chunked - you can see under Inspectors/Transformer that there is a "Chunked Transfer-Encoding" checkbox.

Also in Fiddler under the next tab - Headers, you can see the chunked encoding header. And also Fiddler helpfully tells you that the response has been encoded (the yellow message on top)

HTTPWatch let's you see the incomprehensible response, but it also tells you the number of chunks. Note that the number includes the last 0 in the response, so when it says 4 chunks it means actually 3.

I also tried to fill the void in the tools department by attempting a Firefox extension. Unfortunately it didn't work, I couldn't find API exposed to extensions that would give me access to the raw encoded response. Looks like it would be possible as an extension to HTTPwatch or Fiddler though - both offer extensibility, both show the raw response.

For own consumption I did a PHP script to request the page and give me the chunks ungzipped. It's very primitive but you can give it a shot here. Test with Yahoo! Search for example.

Works with gzip!

A common concern is whether chunked encoding works together with gzipping the response. Yes, it does. In this presentation Steve Souders sheds some light (PPT, see slide #66) on how to address common issues and also gives flush() equivalents in languages other than PHP.

There's a number of things that can be in the way of successfully implementing chunked encoding+gzip including:

  • call ob_flush() in addition to flush() and careful if you have several output buffers started, you may need to iterate and flush all of them
  • some browsers may require some minimal amount of content before starting to parse, IE needs at least 256 bytes
  • you may need a newer version of Apache
  • DeflateBufferSize in Apache may be set too high for your chunk size
  • check the the user-contributed comments in the php.net manual for flush() for helpful advice and ideas
  • there's ob_implicit_flush() setting which may flush for you instead of you doing flush() every time

Do it!

It may be tricky to implement multiple (or even single) flushing, but it's well worth it. There's some server setup hurdles when it comes to gzip, but once you figure it out, you only do it once. As a reward you get faster loading times, plus progressive rendering so your page not only has faster time-to-onload by feels that way too.

 

Extreme JavaScript optimization

Sunday, December 20th, 2009

Dec 20 This article is part of the 2009 performance advent calendar experiment. Today's article is a second contribution from Ara Pehlivanian (here's the first).

Ara PehlivanianAra Pehlivanian has been working on the Web since 1997. He's been a freelancer, a webmaster, and most recently, a Front End Engineer at Yahoo! Ara's experience comes from having worked on every aspect of web development throughout his career, but he's now following his passion for web standards-based front-end development. When he isn't speaking and writing about best practices or coding professionally, he's either tweeting as @ara_p or maintaining his personal site at http://arapehlivanian.com/.

There's an odd phenomenon underway in the JavaScript world today. Though the language has remained relatively unchanged for the past decade, there's an evolution afoot among its programmers. They're using the same language that brought us scrolling status bar text to write some pretty heavy duty client-side applications. Though this may seem like we're entering a Lada in an F1 race, in reality we've spent the last ten years driving an F1 race car back and forth in the driveway. We were never using the language at its full potential. It took the discovery of Ajax to launch us out of the driveway and onto the race track. But now that we're on the track, there's a lot of redlining and grinding of gears going on. Not very many people it seems, know how to drive an F1 race car. At least not at 250 mph.

The thing of it is, it's pretty easy to put your foot to the floor and get up to 60 mph. But very soon you'll have to shift gears if you want to avoid grinding to a halt. It's the same thing with writing large client-side applications in JavaScript. Fast processors give us the impression that we can do anything and get away with it. And for small programs it's true. But writing lots of bad JavaScript can very quickly get into situations where your code begins to crawl. So just like an average driver needs training to drive a race car, we need to master the ins and outs of this language if we're to keep it running smoothly in large scale applications.

Variables

Let's take a look at one of the staples of programming, the variable.Some languages require you to declare your variables before using them, JavaScript doesn't. But just because it isn't required doesn't mean you shouldn't do it. That's because in JavaScript if a variable isn't explicitly declared using the 'var' keyword, it's considered to be a global, and globals are slow. Why? Because the interpreter needs to figure out if and where the variable in question was originally declared, so it goes searching for it. Take the following example.

function doSomething(val) {
    count += val;
};

Does count have a value assigned to it outside the scope of doSomething? Or is it just not being declared correctly? Also, in a large program, having such generic global variable names makes it difficult to keep collisions from happening.

Loops

Searching the scope chain for where count is declared in the example above isn't such a big deal if it happens once. But in large-scale web applications, not very much just happens once. Especially when loops are concerned. The first thing to remember about loops, and this isn't just for JavaScript, is to do as much work outside the loop as possible. The less you do in the loop, the faster your loop will be. That being said, let's take a look at the most common practice in JavaScript loops that can be avoided. Take a look at the following example and see if you can spot it:

for (var i = 0; i < arr.length; i++) {
    // some code here
}

Did you see it? The length of the array arr is recalculated every time the loop iterates. A simple fix for this is to cache the length of the array like so:

for (var i = 0, len = arr.length; i < len; i++) {
    // some code here
}

This way, the length of the array is calculated just once and the loop refers to the cached value every time it iterates.

So what else can we do to improve our loop's performance? Well, what other work is being done on every iteration? Well, we're evaluating whether the value of i is less than the value of len and we're also increasing i by one. Can we reduce the number of operations here? We can if the order in which our loop is executed doesn't matter.

for (var i = 100; i--; ) {
    // some code here
}

This loop will execute 50% faster than the one above because on every iteration it simply subtracts a value from i, and since that value is not "falsy," in other words it isn't 0, then the loop goes on. The moment the value hits 0, the loop stops.

You can do this with other kinds of loops as well:

while (i--) {
    // some code here
}

Again, because the evaluation and the operation of subtracting 1 from i is being done at the same time, all the while loop needs is for i to be falsy, or 0, and the loop will exit.

Caching

I touched briefly on caching above when we cached the array length in a variable. The same principle can be applied in many different places in JavaScript code. Essentially, what we want to avoid doing is sending the interpreter out to do unnecessary work once it's already done it once. So for example, when it comes to crawling the scope chain to find a global variable for us, caching it the reference locally will save the interpreter from fetching it every time. Here, let me illustrate:

var aGlobalVar = 1;

function doSomething(val) {
    var i = 1000, agv = aGlobalVar;
    while (i--) {
        agv += val;
    };
    aGlobalVar = agv;
};

doSomething(10);

In this example, aGlobalVar is only fetched twice, not over a thousand times. We fetch it once to get its value, then we go to it again to set its new value. If we had used it inside the while loop, the interpreter would have gone out to fetch that variable a thousand times. In fact, the loop above takes about 3ms to run whereas if avg += val; were replaced with aGlobalVar += val; then the loop would take about 10ms to run.

Property Depth

Nesting objects in order to use dot notation is a great way to namespace and organize your code. Unforutnately, when it comes to performance, this can be a bit of a problem. Every time a value is accessed in this sort of scenario, the interpreter has to traverse the objects you've nested in order to get to that value. The deeper the value, the more traversal, the longer the wait. So even though namespacing is a great organizational tool, keeping things as shallow as possible is your best bet at faster performance. The latest incarnation of the YUI Library evolved to eliminate a whole layer of nesting from its namespacing. So for example, YAHOO.util.Anim is now Y.Anim.

Summary

These are just a few examples of how to improve your code's performance by paying attention to how the JavaScript interpreter does its work. Keep in mind though that browsers are continually evolving, even if the language isn't. So for example, today's browsers are introducing JIT compilers to speed up performance. But that doesn't mean we should be any less vigilant in our practices. Because in the end, when your web app is a huge success and the world is watching, every millisecond counts.

 

The new game show: “Will it reflow?”

Saturday, December 19th, 2009

Dec 19 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Intrigued by Luke Smith's comment and also Alois Reitbauer's comment on the previous post about rendering I did some more testing with dynaTrace and SpeedTracer. Also prompted by this tweet, I wanted to provide an example of avoiding reflows by using document fragments as well as hiding elements with display: none. (btw, sorry that I'm slow to respond to tweets and blog comments, just too much writing lately with the crazy schedule, but I do appreciate every tweet and comment!)

So welcome to the new game show: "Will it reflow?" where we'll look into a few cases where it's not so clear if the browser will do a reflow or just a repaint. The test page is here.

Changing classnames

The first test is fairly straightforward - we only want to check what happens when you change the class name of an element. So using "on" and "off" class names and changing them on mouse over.

.on {background: yellow; border: 1px solid red;}
.off {background: white; border: 1px dashed green;}

Those CSS rules shouldn't trigger a reflow, because no geometry is being changed. Although the test is pushing it a bit by changing borders, which may affect geometry, but not in this case.

The test code:

// test #1 - class name change - will it reflow?
var onoff = document.getElementById('on-off');
onoff.onmouseover = function(){
  onoff.className = 'on' ;
};
onoff.onmouseout = function(){
  onoff.className = 'off';
};

Sooo.. will it reflow?

In Chrome - no! In IE - yes.

In IE, even changing the class name declarations to only change color, which is sure not to cause reflow, still caused a reflow. Looks like in IE, any type of className change causes a reflow.

cssText updates

The recommended way to update multiple styles in one shot (less DOM access, less reflows) is to update the element's style.cssText property. But.. will it reflow when the style changes do not affect geometry?

So let's have an element with a style attribute:

...style="border: 1px solid red; background: white"...

The JavaScript to update the cssText:

// test #2 - cssText change - will it reflow?
var csstext = document.getElementById('css-text');
csstext.onmouseover = function(){
  csstext.style.cssText += '; background: yellow; border: 1px dashed green;';
};
csstext.onmouseout = function(){
  csstext.style.cssText += '; background: white; border: 1px solid red;';
};

Will it reflow?

In Chrome - no! In IE - yes.

Even having cssText (and the initial style) only play with color, there's still a reflow. Even trying to just write the cssText property (as opposed to read/write with += ) still causes a reflow. The fact that cssText property is being updated causes IE to reflow. So there might be cases where setting individual properties separately (like style.color, style.backgroundColor and so on) which don't affect geometry, might be preferable to touching the cssText.

Next contestant in the game show is...

addRule

Will the browser reflow when you update stylesheet collections programatically? Here's the test case using addRule and removeRule (which in Firefox are insertRule/deleteRule):

// test #3 - addRule - will it reflow?
var ss = document.styleSheets[0];
var ruletext = document.getElementById('ruletext');
ruletext.onmouseover = function(){
  ss.addRule('.bbody', 'color: red');
};
ruletext.onmouseout = function(){
  ss.removeRule(ss.rules.length - 1);
};

Will it? Will it?

In Chrome - yes. The fact that style rules in the already loaded stylesheet have been touched, causes Chrome to reflow and repaint. Even though class .bbody is never used. Same when creating a new rule with selector body {...} - reflow, repaint.

In IE there's a repaint definitely, and there's also a kind of reflow. Looks like dynaTrace shows two kinds of rendering calculation indicators: "Calculating generic layout" and "Calculating flow layout". Not sure what is the difference (web searches disappointingly find nada/zero/rien for the first string and my previous blog post for the second). Hopefully "generic" would be less expensive than "flow".

display: none

In my previous post I boldly claimed that elements with display: none will not have anything to do with the render tree. IE begs to differ (thanks to dynaTrace folks for pointing that out).

A good way to minimize reflows is to update the DOM tree "offline" out of the live document. One way to do so is to hide the element while updates are taking place and then show it again.

Here's a test case where rendering and geometry are affected by simply adding more text content to an element by creating new text nodes.

// test #4 - display: none - will it reflow
var computed, tmp;
var dnonehref = document.getElementById('display-none');
var dnone = document.getElementById('bye');
if (document.body.currentStyle) {
  computed = dnone.currentStyle;
} else {
  computed = document.defaultView.getComputedStyle(dnone, '');
}

dnonehref.onmouseover = function() {
  dnone.style.display = 'none';
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  dnone.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
}
dnonehref.onmouseout = function() {
  dnone.style.display = 'inline';
}

Will it reflow?

In Chrome - no. Although it does do "restyle" (calculating non-geometric styles) every time a text node is added. Not sure why this restyling is needed.

In IE - yes. Unfortunatelly display: none seems to have no effect on rendering in IE, it still does reflows. I tried with removing the show/hide code and having the element hidden from the very beginning (with an inline style attribute). Same thing - reflow.

document fragment

Another way to preform updates off-DOM is to create a document fragment and once ready, shove the fragment into the DOM. The beauty is that the children of the fragment get copied, not the fragment itself, which makes this method pretty convenient.

Here's the test/example. And will it reflow?

// test #5 - fragment - will it reflow
var fraghref = document.getElementById('fragment');
var fragment = document.createDocumentFragment();
fraghref.onmouseover = function() {
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
  fragment.appendChild(document.createTextNode('No mo tests. '));
  tmp = computed.backgroundColor;
}
fraghref.onmouseout = function() {
  dnone.appendChild(fragment);
}

In Chrome - no! And no rendering activities take place until the fragment is added to the live DOM. Then, just like with display: none a restyle is being performed for every new text node inserted. And even though the behavior is the same for fragments as for updating hidden elements, fragments are still preferable, because you don't need to hide the element (which will cause another reflow) initially.

In IE - no reflow! Only when you add the final result to the live DOM.

Thanks!

Thank you for reading. Tomorrow if all goes well there should be a final post related to JavaScript and then moving on to ... other topics ;)

 

DOM access optimization

Friday, December 18th, 2009

Dec 18 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

This blog series has sailed from the shores of networking, passed down waterfalls and reflows, and arrived in ECMAScriptland. Now, turns out there's one bridge to cross to get to DOMlandia.

(OK, I need to get some sleep, evidently. Anyway.) Ara Pehlivanian talked about strategies for loading JavaScript code. Yesterday's post was about rendering and how you can prevent making things worse in JavaScript. Today's post will be about DOM access optimizations and, if all is good, tomorrow's post will round up the JavaScript discussion with some techniques for extreme optimization.

What's with the DOM

Document Object Model (DOM) is a language-independent API for accessing and working with a document. Could be an HTML document, or XML, SVG and so on. DOM is not ECMAScript. ECMAScript is just one way to work with the DOM API. They both started in the web browser but now things are different. ECMAscript has many other uses and so has the DOM. You can generate a page server side, using the DOM is you like. Or script Photoshop with ECMAScript.

All that goes to show that ECMAScript and DOM are now separate, they make sense on their own, they don't need each other. And they are kept separate by the browsers.

For example WebCore is the layout, rendering and DOM library used by WebKit, while JavaScriptCore (most recently rewritten as SquirrelFish) is the implementation of ECMAScript. In IE - Trident (DOM) and JScript. In Firefox - Gecko (DOM) and SpiderMonkey (ECMAScript).

The toll bridge

An excellent analogy I heard in this video from John Hrvatin of MSIE is that we can think of the DOM as a piece of land and JavaScript/ECMAScript as another piece of land. Both connected via a toll bridge. I tried to illustrate this analogy here.

domland and ecmaland connected with a bridge

All your JavaScript code that doesn't require a page - code such as loops, ifs, variables and a handful of built-in functions and objects - lives in ECMALand. Anything that starts with document.* lives in DOMLand. When your JavaScript needs to access the DOM, you need to cross that bridge to DOMlandia. And the bad part is that it's a toll bridge and you have to pay a fee every time you cross. So, the more you cross that bridge, the more you pay your performance toll.

How bad?

So, how serious is that performance penalty? Pretty serious actually. DOM access and manipulations is probably the most expensive activity you do in your JavaScript, followed by layouting (reflowing and painting activities). When you look for problems in your JavaScript (you use a profile instead of shooting in the dark, of course, but still) most likely it's the DOM that's slowing you down.

As an illustration, consider this bad, bad code:

// bad
for (var count = 0; count < 15000; count++) {
    document.getElementById('here').innerHTML += 'a';
}

This code is bad because it touches the DOM twice on every loop tick. It doesn't cache the reference to the DOM element, it looks for that element every time. Then this code also updates the live DOM which means it causes a reflow and a repaint (which are probably buffered by the browsers and executed in batches, but still bad).

Compare with the following code:

// better
var content = '';
for (var count = 0; count < 15000; count++) {
    content += 'a';
}
document.getElementById('here').innerHTML += content;

Here's we only touch the DOM twice at the end. The whole time otherwise we work in ECMAland with a local variable.

And how bad is the bad example? It's over 100 times worse in IE6,7 and Safari, over 200 times worse in FF3.5 and IE8 and about 50 times worse in Chrome. We're not talking percentages here - we talk 100 times worse.

Now obviously this is a bad and made up example, but it does show the magnitude of the problem with DOM access.

Mitigating the problem - don't touch the DOM

How to speed up DOM access? Simply do less of it. If you have a lot of work to do with the DOM, cache references to DOM elements so you don't have to query the DOM tree every time to find them. Cache the values of the DOM properties if you'll do a chunk of of work with them. And by cache I mean simply assign them to local variables. Use selectors API where available instead of crawling the DOM yourself (upgrade your JavaScript library if it's not taking advantage of the selectors API). Be careful with HTML Collections.

// bad
document.getElementById('my').style.top = "10px";
document.getElementById('my').style.left = "10px";
document.getElementById('my').style.color = "#dad";

// better
var mysty = document.getElementById('my').style;
mysty.top = "10px";
mysty.left = "20px";
mysty.color = "#dad";

// better
var csstext = "; top: 10px; left: 10px; color: #dad;";
document.getElementById('my').style.cssText += csstext

Basically, every time you find you're accessing some property or object repeatedly, assign it to a local variable and work with that local variable.

HTMLCollections

HTMLCollections are objects returned by calls to document.getElementsByTagName(), document.getElementsByClassName() and others, also by accessing the old-style collections document.links, document.images and the like. These HTMLCollection objects are array-like, list-like objects that contain pointers to DOM elements.

The special thing about them is that they are live queries against the underlying document. And they get re-run a lot, for example when you loop though the collection and access its length. The fact that you touch the length requires re-querying of the document so that the most up-to-date information is returned to you.

Here's an example:

// slow
var coll = document.getElementsByTagName('div');
for (var count = 0; count < coll.length; count++) {
    /* do stuff */
}

// faster
var coll = document.getElementsByTagName('div'),
    len = coll.length;
for (var count = 0; count < len; count++) {
    /* do stuff */
}

The slower version requeries the document, the faster doesn't because we use the local value for the length. How slower is the slower? Depends on the document and how many divs in it, but in my tests anywhere between 2 times slower (Safari) to 200 times slower (IE7)

Another thing you can do (especially if you'll loop the collection a few times) is to copy the collection into an array beforehand. Accessing the array elements will be significantly faster than accessing the DOM elements in the collection, again 2 to 200 times faster.

Here's an example function that turns the collection into an array:

function toArray(coll) {
    for (var i = 0, a = [], len = coll.length; i < len; i++) {
        a[i] = coll[i];
    }
    return a;
}

If you do that you also need to account for the one-off cost of copying that collection to an array.

Using event delegation

Event delegation is when you attach event listener to a parent element and it handles all the events for the children because of the so-called event bubbling It's a graceful way to relieve the browser from a lot of extra work. The benefits:

  • You need to write less event-attaching code.
  • You will usually use fewer functions to handle the events because you're attaching one function to handle parent events, not individual function for each child element. This means less functions to store in memory and keep track of.
  • Less events the browser needs to monitor
  • Easier to detach event handlers when an element is removed and therefore easier to prevenk IE memory leaks. Sometimes you don't even need to detach the event handler if children change, but the event-handling parent stays the same.

Thanks for reading!

  • Don't touch the DOM when you can avoid it, cache DOM access to local references
  • Cache length of HTMLCollections to a local variable while looping (good practice for any collections or arrays looping anyway). Copy the collection to an array if you'll be looping several times.
  • Use event delegation

Links

 

Rendering: repaint, reflow/relayout, restyle

Thursday, December 17th, 2009

Dec 17 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

Nice 5 "R" words in the title, eh? Let's talk about rendering - a phase that comes in the Life of Page 2.0 after, and sometimes during, the waterfall of downloading components.

So how does the browser go about displaying your page on the screen, given a chunk of HTML, CSS and possibly JavaScript.

The rendering process

Different browsers work differently, but the following diagram gives a general idea of what happens, more or less consistently across browsers, once they've downloaded the code for your page.

Rendering process in the browser

  • The browser parses out the HTML source code (tag soup) and constructs a DOM tree - a data representation where every HTML tag has a corresponding node in the tree and the text chunks between tags get a text node representation too. The root node in the DOM tree is the documentElement (the <html> tag)
  • The browser parses the CSS code, makes sense of it given the bunch of hacks that could be there and the number of -moz, -webkit and other extensions it doesn't understand and will bravely ignore. The styling information cascades: the basic rules are in the User Agent stylesheets (the browser defaults), then there could be user stylesheets, author (as in author of the page) stylesheets - external, imported, inline, and finally styles that are coded into the style attributes of the HTML tags
  • Then comes the interesting part - constructing a render tree. The render tree is sort of like the DOM tree, but doesn't match it exactly. The render tree knows about styles, so if you're hiding a div with display: none, it won't be represented in the render tree. Same for the other invisible elements, like head and everything in it. On the other hand, there might be DOM elements that are represented with more than one node in the render tree - like text nodes for example where every line in a <p> needs a render node. A node in the render tree is called a frame, or a box (as in a CSS box, according to the box model). Each of these nodes has the CSS box properties - width, height, border, margin, etc
  • Once the render tree is constructed, the browser can paint (draw) the render tree nodes on the screen

The forest and the trees

Let's take an example.

HTML source:

<html>
<head>
  <title>Beautiful page</title>
</head>
<body>

  <p>
    Once upon a time there was
    a looong paragraph...
  </p>

  <div style="display: none">
    Secret message
  </div>

  <div><img src="..." /></div>
  ...

</body>
</html>

The DOM tree that represents this HTML document basically has one node for each tag and one text node for each piece of text between nodes (for simplicity let's ignore the fact that whitespace is text nodes too) :

documentElement (html)
    head
        title
    body
        p
            [text node]

        div
            [text node]

        div
            img

        ...

The render tree would be the visual part of the DOM tree. It is missing some stuff - the head and the hidden div, but it has additional nodes (aka frames, aka boxes) for the lines of text.

root (RenderView)
    body
        p
            line 1
	    line 2
	    line 3
	    ...

	div
	    img

	...

The root node of the render tree is the frame (the box) that contains all other elements. You can think of it as being the inner part of the browser window, as this is the restricted area where the page could spread. Technically WebKit calls the root node RenderView and it corresponds to the CSS initial containing block, which is basically the viewport rectangle from the top of the page (0, 0) to (window.innerWidth, window.innerHeight)

Figuring out what and how exactly to display on the screen involves a recursive walk down (a flow) through the render tree.

Repaints and reflows

There's always at least one initial page layout together with a paint (unless, of course you prefer your pages blank :)). After that, changing the input information which was used to construct the render tree may result in one or both of these:

  1. parts of the render tree (or the whole tree) will need to be revalidated and the node dimensions recalculated. This is called a reflow, or layout, or layouting. (or "relayout" which I made up so I have more "R"s in the title, sorry, my bad). Note that there's at least one reflow - the initial layout of the page
  2. parts of the screen will need to be updated, either because of changes in geometric properties of a node or because of stylistic change, such as changing the background color. This screen update is called a repaint, or a redraw.

Repaints and reflows can be expensive, they can hurt the user experience, and make the UI appear sluggish.

What triggers a reflow or a repaint

Anything that changes input information used to construct the rendering tree can cause a repaint or a reflow, for example:

  • Adding, removing, updating DOM nodes
  • Hiding a DOM node with display: none (reflow and repaint) or visibility: hidden (repaint only, because no geometry changes)
  • Moving, animating a DOM node on the page
  • Adding a stylesheet, tweaking style properties
  • User action such as resizing the window, changing the font size, or (oh, OMG, no!) scrolling

Let's see a few examples:

var bstyle = document.body.style; // cache

bstyle.padding = "20px"; // reflow, repaint
bstyle.border = "10px solid red"; // another reflow and a repaint

bstyle.color = "blue"; // repaint only, no dimensions changed
bstyle.backgroundColor = "#fad"; // repaint

bstyle.fontSize = "2em"; // reflow, repaint

// new DOM element - reflow, repaint
document.body.appendChild(document.createTextNode('dude!'));

Some reflows may be more expensive than others. Think of the render tree - if you fiddle with a node way down the tree that is a direct descendant of the body, then you're probably not invalidating a lot of other nodes. But what about when you animate and expand a div at the top of the page which then pushes down the rest of the page - that sounds expensive.

Browsers are smart

Since the reflows and repaints associated with render tree changes are expensive, the browsers aim at reducing the negative effects. One strategy is to simply not do the work. Or not right now, at least. The browser will setup a queue of the changes your scripts require and perform them in batches. This way several changes that each require a reflow will be combined and only one reflow will be computed. Browsers can add to the queued changes and then flush the queue once a certain amount of time passes or a certain number of changes is reached.

But sometimes the script may prevent the browser from optimizing the reflows, and cause it to flush the queue and perform all batched changes. This happens when you request style information, such as

  1. offsetTop, offsetLeft, offsetWidth, offsetHeight
  2. scrollTop/Left/Width/Height
  3. clientTop/Left/Width/Height
  4. getComputedStyle(), or currentStyle in IE

All of these above are essentially requesting style information about a node, and any time you do it, the browser has to give you the most up-to-date value. In order to do so, it needs to apply all scheduled changes, flush the queue, bite the bullet and do the reflow.

For example, it's a bad idea to set and get styles in a quick succession (in a loop), like:

// no-no!
el.style.left = el.offsetLeft + 10 + "px";

Minimizing repaints and reflows

The strategy to reduce the negative effects of reflows/repaints on the user experience is to simply have fewer reflows and repaints and fewer requests for style information, so the browser can optimize reflows. How to go about that?

  • Don't change individual styles, one by one. Best for sanity and maintainability is to change the class names not the styles. But that assumes static styles. If the styles are dynamic, edit the cssText property as opposed to touching the element and its style property for every little change.
    // bad
    var left = 10,
        top = 10;
    el.style.left = left + "px";
    el.style.top  = top  + "px";
    
    // better 
    el.className += " theclassname";
    
    // or when top and left are calculated dynamically...
    
    // better
    el.style.cssText += "; left: " + left + "px; top: " + top + "px;";
  • Batch DOM changes and perform them "offline". Offline means not in the live DOM tree. You can:
    • use a documentFragment to hold temp changes,
    • clone the node you're about to update, work on the copy, then swap the original with the updated clone
    • hide the element with display: none (1 reflow, repaint), add 100 changes, restore the display (another reflow, repaint). This way you trade 2 reflows for potentially a hundred
  • Don't ask for computed styles excessively. If you need to work with a computed value, take it once, cache to a local var and work with the local copy. Revisiting the no-no example from above:
    // no-no!
    for(big; loop; here) {
        el.style.left = el.offsetLeft + 10 + "px";
        el.style.top  = el.offsetTop  + 10 + "px";
    }
    
    // better
    var left = el.offsetLeft,
        top  = el.offsetTop
        esty = el.style;
    for(big; loop; here) {
        left += 10;
        top  += 10;
        esty.left = left + "px";
        esty.top  = top  + "px";
    }
  • In general, think about the render tree and how much of it will need revalidation after your change. For example using absolute positioning makes that element a child of the body in the render tree, so it won't affect too many other nodes when you animate it for example. Some of the other nodes may be in the area that needs repainting when you place your element on top of them, but they will not require reflow.

Tools

Only about a year ago, there was nothing that can provide any visibility into what's going on in the browser in terms of painting and rendering (not that I am aware of, it's of course absolutely possible that MS had a wicked dev tool no one knew about, buried somewhere in MSDN :P). Now things are different and this is very, very cool.

First, MozAfterPaint event landed in Firefox nightlies, so things like this extension by Kyle Scholz showed up. mozAfterPaint is cool, but only tells you about repaints.

DynaTrace Ajax and most recently Google's SpeedTracer (notice two "trace"s :)) are just excellent tools for digging into reflows and repaints - the first is for IE, the second for WebKit.

Some time last year Douglas Crockford mentioned that we're probably doing some really stupid things in CSS we don't know about. And I can definitely relate to that. I was involved in a project for a bit where increasing the browser font size (in IE6) was causing the CPU go up to 100% and stay like this for 10-15 minutes before finally repainting the page.

Well, the tools are now here, we don't have excuses any more for doing silly things in CSS.

Except, maybe, speaking of tools..., wouldn't it be cool if the Firebug-like tools showed the render tree in addition to the DOM tree?

A final example

Let's just take a quick look at the tools and demonstrate the difference between restyle (render tree change that doesn't affect the geometry) and reflow (which affects the layout), together with a repaint.

Let's compare two ways of doing the same thing. First, we change some styles (not touching layout) and after every change, we check for a style property, totally unrelated to the one just changed.

bodystyle.color = 'red';
tmp = computed.backgroundColor;
bodystyle.color = 'white';
tmp = computed.backgroundImage;
bodystyle.color = 'green';
tmp = computed.backgroundAttachment;

Then the same thing, but we're touching style properties for information only after all the changes:

bodystyle.color = 'yellow';
bodystyle.color = 'pink';
bodystyle.color = 'blue';

tmp = computed.backgroundColor;
tmp = computed.backgroundImage;
tmp = computed.backgroundAttachment;

In both cases, these are the definitions of the variables used:

var bodystyle = document.body.style;
var computed;
if (document.body.currentStyle) {
  computed = document.body.currentStyle;
} else {
  computed = document.defaultView.getComputedStyle(document.body, '');
}

Now, the two example style changes will be executed on click on the document. The test page is actually here - restyle.html (click "dude"). Let's call this restyle test.

The second test is just like the first, but this time we'll also change layout information:

// touch styles every time
bodystyle.color = 'red';
bodystyle.padding = '1px';
tmp = computed.backgroundColor;
bodystyle.color = 'white';
bodystyle.padding = '2px';
tmp = computed.backgroundImage;
bodystyle.color = 'green';
bodystyle.padding = '3px';
tmp = computed.backgroundAttachment;

// touch at the end
bodystyle.color = 'yellow';
bodystyle.padding = '4px';
bodystyle.color = 'pink';
bodystyle.padding = '5px';
bodystyle.color = 'blue';
bodystyle.padding = '6px';
tmp = computed.backgroundColor;
tmp = computed.backgroundImage;
tmp = computed.backgroundAttachment;

This test changes the layout so let's called it "relayout test", the source is here.

Here's what type of visualization you get in DynaTrace for the restyle test.

DynaTrace

Basically the page loaded, then I clicked once to execute the first scenario (requests for style info every time, at about 2sec), then clicked again to execute the second scenario (requests for styles delayed till the end, at about 4sec)

The tool shows how the page loaded and the IE logo shows onload. Then the mouse cursor is over the rendering activity following the click. Zooming into the interesting area (how cool is that!) there's a more detailed view:

dynatrace

You can clearly see the blue bar of JavaScript activity and the following green bar of rendering activity. Now, this is a simple example, but still notice the length of the bars - how much more time is spent rendering than executing JavaScript. Often in Ajax/Rich apps, JavaScript is not the bottleneck, it's the DOM access and manipulation and the rendering part.

OK, now running the "relayout test", the one that changes the geometry of the body. This time check out this "PurePaths" view. It's a timeline plus more information about each item in the timeline. I've highlighted the first click, which is a JavaScript activity producing a scheduled layout task.

dynatrace

Again, zooming into the interesting part, you can see how now in addition to the "drawing" bar, there's a new one before that - the "calculating flow layout", because in this test we had a reflow in addition to the repaint.

dynatrace

Now let's test the same page in Chrome and look at the SpeedTracer results.

This is the first "restyle" test zoomed into the interesting part (heck, I think I can definitely get cused to all that zooming :)) and this is an overview of what happened.

speedtracer

Overall there's a click and there's a paint. But in the first click, there's also 50% time spent recalculating styles. Why is that? Well, this is because we asked for style information with every change.

Expanding the events and showing hidden lines (the gray lines were hidden by Speedtracer because they are not slow) we can see exactly what happened - after the first click, styles were calculated three times. After the second - only once.

speedtracer

Now let's run the "relayout test". The overall list of events looks the same:

speedtracer

But the detailed view shows how the first click caused three reflows (because it asked for computed style info) and the second click caused only one reflow. This is just excellent visibility into what's going on.

speedtracer

A few minor differences in the tools - SpeedTracer didn't show when the layout task was scheduled and added to the queue, DynaTrace did. And then DynaTrace didn't show the details of the difference between "restyle" and "reflow/layout", like SpeedTracer did. Maybe simply IE doesn't make a difference between the two? DynaTrace also didn't show three reflows instead of one in the different change-end-touch vs. change-then-touch tests, maybe that's how IE works?

Running these simple examples hundreds of times also confirms that for IE it doesn't matter if you request style information as you change it.

Here's some more data points after running the tests with enough repetitions:

  • In Chrome not touching computed styles while modifying styles is 2.5 times faster when you change styles (restyle test) and 4.42 times faster when you change styles and layout (relayout test)
  • In Firefox - 1.87 times faster to refrain from asking computed styles in restyle test and 1.64 times faster in the relayout test
  • In IE6 and IE8, it doesn't matter

Across all browsers though changing styles only takes half the time it takes to change styles and layout. (Now that I wrote it, I should've compared changing styles only vs. changing layout only). Except in IE6 where changing layout is 4 times more expensive then changing only styles.

Parting words

Thanks very much for working through this long post. Have fun with the tracers and watch out for those reflows! In summary, let me go over the different terminology once again.

  • render tree - the visual part of the DOM tree
  • nodes in the render tree are called frames or boxes
  • recalculating parts of the render tree is called reflow (in Mozilla), and called layout in every other browser, it seems
  • Updating the screen with the results of the recalculated render tree is called repaint, or redraw (in IE/DynaTrace)
  • SpeedTracer introduces the notion of "style recalculation" (styles without geometry changes) vs. "layout"

And some more reading if you find this topic fascinating. Note that these reads, especially the first three, are more in depth, closer to the browser, as opposed to closer to the developer which I tried to do here.

 

How To Measure Web Site Performance

Wednesday, December 16th, 2009

Dec 16 This article is part of the 2009 performance advent calendar. Today's article is a contribution from Eric Goldsmith. Please welcome Eric and stay tuned for the articles to come.

Eric GoldsmithEric Goldsmith (@GoldsmithEric), Operations Architect at AOL, has more than 20 years of experience providing technical leadership in the areas of product development, engineering and operations. At AOL he has led efforts to deliver the highest levels of performance and availability for top Web sites, including: AOL.com; AIM.com; and AOL Video; among others.

His areas of expertise include Performance Analysis, Capacity Planning, Network Engineering, and Software Development. Prior to AOL, Eric worked for companies such as UUNet, WorldCom and CompuServe, as well as telecom and Internet startups. He holds a BS in Computer Science from The Ohio State University.

When trying to quantify the performance of a Web site, we most commonly mean the response time. The two most common methods of gathering response time data are from Field Metrics and Synthetic Measurement.

Field Metrics measure response time from real user traffic, and generally rely on JS instrumentation of the pages, or toolbars to collect data. Synthetic Measurement involves loading pages in one of a myriad of tools designed to collect metrics. Each method has its strengths and weaknesses - but that's a discussion for another time.

Synthetic Measurement is an easy way to get started quantifying your site performance. But there are some important guidelines for getting accurate results.

Test Speed

A common mistake people make when testing the response time of a Web site is testing on their office network. According to Speedtest.net, my office network gives me 53 Mbps. A typical DSL user gets about 1.5 Mbps, or 35 times slower.

How much difference does this make in practice?

CNN Response Times

I tested cnn.com with Webpagtest at two different speeds. At DSL speeds (1.5 Mbps) it loads in ~15 seconds, while at FIOS speeds (20 Mbps) it loads in ~ 3 seconds - a 5x difference.

This demonstrates how a developer testing from his workstation in the office could see a 3s load time, and conclude that all is well. While a home user on DSL could see a 15s load time and abandon the site due to slowness.

As mentioned above, you can test at different speeds with Webpagetest. And if you want to simulate different speeds from your desktop, try AKMA Labs' Network Delay Simulator.

Test Location

Does a site perform differently from LA vs. NY? US vs. UK?

This double waterfall excerpt (the waterfalls from two test executions overlayed together) demonstrates the difference in response time for a page measured from East and West coasts.

East vs. West

A difference of ~ 2 seconds of load time, based solely on where the measurement was taken.

Now, this is a particularly egregious example, and it's possible to design your site so geographic differences are minimized. But you have to know there's a problem before you can solve it.

So, where should you test your site? From where your users are. Most web analytics tools will provide this information. For example, here's what Google Analytics shows for Webpagetest:

Usage by Geography

The ability to test from different locations is available from Webpagetest (if you're interested in hosting a location, let me know), and from Keynote via their KITE tool.

Test Iterations

Can you accurately determine the response time of your site with a single measurement?

The figure below is a typical response time distribution for a Web site.

Response Time Distribution

If you only took a single measurement, where would it fall on the distribution? For example, it could be the left-most red circle - or the right-most (or anywhere else). The delta between those two points is more than 4 seconds. So, what's the response time of your site?

There are two things to consider when trying to answer that: how many measurements (samples) do you need, and how do you coalesce them into a representative number?

Determining the number of measurements needed can get complicated. But in general, 30 or more is suggested.

How to aggregate all that data into a response time number is subject all it's own, and a whole 'nother discussion. It's common practice to just average them. That's not an ideal approach, but it's a starting point, and gets the dialog started.

 

JavaScript loading strategies

Tuesday, December 15th, 2009

Dec 15 This article is part of the 2009 performance advent calendar experiment. Today's article is a contribution from Ara Pehlivanian, author of two JavaScript books. Please welcome Ara and stay tuned for the articles to come.

Ara PehlivanianAra Pehlivanian has been working on the Web since 1997. He's been a freelancer, a webmaster, and most recently, a Front End Engineer at Yahoo! Ara's experience comes from having worked on every aspect of web development throughout his career, but he's now following his passion for web standards-based front-end development. When he isn't speaking and writing about best practices or coding professionally, he's either tweeting as @ara_p or maintaining his personal site at http://arapehlivanian.com/.

JavaScript has a dark side to it that not many people are aware of. It causes the browser to stop everything that it's doing until the script has been downloaded, parsed and executed. This is in sharp contrast to the other dependencies which get loaded in parallel--limited only by the number of connections the browser and server are able to create. So why is this a problem?

Good question! Before I can answer that, I need to explain how the browser goes about building a page. The first thing a it's does once it receives an HTML document from the server is to build the DOM--an object representation of the document in memory. As the browser goes about converting HTML into the DOM, it invariably encounters references to external dependencies such as CSS documents and images. Every time it does so, it fires off a request to the server for that dependency. It doesn't need to wait for one to be loaded before requesting another, it makes as many requests as it's capable of. This way, the page gets built one node at a time and as the dependencies come in, they're put in their correct placeholders. What gums up the works though, is when a JavaScript dependency is encountered. When this happens, the browser stops building the DOM and waits for that file to arrive. Once it receives the file, it parses and executes it. Only once all of that's done does the browser continue building the DOM. I suspect this has to do with wanting to provide as stable a DOM to the script as possible. If things were in flux while the script attempted to access or even modify a DOM node, things could get dicey. Either way, the time it takes before the browser can continue depends entirely on the size and complexity of the script file that's being loaded.

Now imagine loading a 200k JavaScript file right in the <head> of a document. Say it's a JavaScript file that's not only heavy but also does some fairly complex computing that takes half a second to complete. Imagine now what would happen if that file took a second to transfer. Did you guess? Yup, the page would be blank until that transfer and the computation were complete. A second and a half of a blank page that the visitor has to endure. Given that most people don't spend more than a few seconds on the average web page, that's an eternity of staring at a blank page.

Reduce

So how can this problem be overcome? Well, the first thing that should be done, is to reduce as much as possible, the amount of data that's being sent over the pipe. The smaller the JavaScript file, the less waiting the visitor has to do. So what can be done to reduce file size? JavaScript files can be run through a minifier such as YUI Compressor (which removes unnecessary white space and formatting, as well as comments, and is proven to reduce file size by 40-60%). Also, if at all possible, servers should be set up to gzip files before they're sent. This can drastically reduce the number of bytes that get transferred since JavaScript is plain text, and plain text compresses really well.

Defer

So, once you've made sure your file is as small as possible, what next? Well, the first thing is to make sure the visitor has something to look at while the script is loading. Instead of loading JavaScript files in the document's <head>, put your <script> tags immediately before your page's closing </body> tag. That way, the browser will have built the DOM and begun inserting images and applying CSS long before it encounters your script tags. This also means that your code will execute faster because it won't need to wait for the page's onload event--which only fires once all the page's dependencies are done loading.

So with the script tags placed at the end of the document, when the browser does encounter them, will still halt operations for however long it needs to, but at this point the visitor is reading your page and unaware of what's going on behind the scenes. You've just bought yourself the time to surreptitiously load your script files.

Go Async

There is another way to load JavaScript files which won't block your browser, and that's to insert the script tags into your page using JavaScript. Dynamically including a script tag into the DOM causes it to be loaded asynchronously. The only trouble with that is that you can't rely on the code within the script file to be available immediately after you've included it. What you'll need is a callback function that is executed once your script is done loading. There are several ways of doing this. A lot of libraries have built in async script loading functionality, so you're likely better off using that. But if you want to do it yourself, be ready to deal with the idiosyncrasies of different browsers. For example, where one browser will fire off an onload event for the script, another will not.

Be Lazy

So now that we know how to load scripts behind the scenes, is there anything more we can do to improve performance? Of course.

Say for example your page loads up a large script that gives your site a fancy navigation menu. What if the user never uses the navigation menu? What if they only navigate your site through links in your content? Did you really need to load that script in the first place? What if you could load the necessary code only when it was needed? You can. It's a technique called lazy loading. The principle is simple, instead of binding your fancy navigation script to the menu in your page, you'd bind a simple loader script instead. It would detect an onmouseover event for example, and then insert a script tag with the fancy nav code into the page. Once the tag is done loading, a callback function wires up all the necessary events and presto bingo, your nav menu starts working. This way, your site doesn't have to needlessly bog visitors down with code they'll never use.

Bite Size

In keeping with lazy loading, try to also load only the core components that are needed to make your page work. This is especially the case when it comes to libraries. A lot of the time a library will force you to load up a huge amount of code when all you want to do is add an event handler, or modify class names. If the library doesn't let you pull down only what you need, try ripping out what you want and only load that instead. There's no point in forcing visitors to download 60k of code when all you need is 4k of it.

Do You Need It?

Finally, the best way to speed up JavaScript load times is to not include any JavaScript at all. A lot of times people go nuts for the latest fad and include it in their site without even asking themselves if they really need it. Does this fancy accordion thing actually help my visitors get to my content easier? Does fading everything in and out and bouncing things all over the place actually improve my site's usability? So the next time you feel like adding a three dimensional spinning rainbow tag cloud to your site, ask yourself, "do I really need this?"

Note from Stoyan:

I'd like to thank Ara for the great article, it's pleasure for me to be the blog host!

Also wanted to offer some additional links for your reading pleasure:

Please comment if you can think of more good resources on the topic.

 

Free-falling waterfalls

Monday, December 14th, 2009

Dec 14 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

In this serias of performance posts, so far we've looked at having fewer components in the waterfall (meaning less HTTP requests) and also making the components as small as possible. The next task is to make sure that the waterfall is as short as possible - meaning let it fall freely, without interruptions and have the browser download as many components as possible in parallel.

Some ways to make the waterfall fall free include having:

  • fewer DNS lookups
  • parallel downloads using several domains (this point contradicts the previous, ain't performance fun)
  • fewer redirects (ideally no redirects)
  • non-blocking scripts and styles
  • smaller request/reponse headers, which includes using fewer cookies

Reducing DNS lookups

When you request a component, the browser needs to resolve the hosthame of the component to an IP address. This is known as a DNS lookup and you can see those lookups in the waterfall charts. The DNS time may look negligible (plus DNS lookups get cached by browsers and operating systems), but they sometimes take ridiculous amount of time. It depends on many factors, often beyond your control, so the best thing to do is change what you do control and that is - require fewer DNS lookups. Carlos Bueno has an excellent writeup on DNS lookups here.

You should limit the number of DNS lookups the browser needs to perform, ideally to no more than 2 to 4, according to Yahoo's studies.

Parallel downloads

Browsers have limits on how many components they download from the same domain at the same time. In older browsers, including IE6 and IE7 this limit is 2. This can definitely slow down your waterfall significantly, when you have a greater number of components to download.

Newer browsers have increased that limit to 4 (Safari, Opera 10) or 6 (FF3, IE8), so this should be less of an issue. But at the end it depends on your page - how many components and how many people on IE6,7.

Below is an image of how IE7 loads a page with 8 images, where each image is artificially delayed to take 2 seconds. Downloading two components at a time, IE spends at least 8 seconds on the images (2-4-6-8) or a total time of over 9 seconds.

Loading the same page in IE8 is shown below. IE8 loads 6 components at a time, so the 8 images are loaded in two batches (1st - 6 images, 2nd - 2 images), for a total time of 4 seconds spend on images. Overall, the whole page loads in 5 seconds.

Now, to work around this limitation in older browsers a common technique is to create dummy subdomains, like img1.example.org, img2.example.org and so on, so that more components can be downloaded in parallel (the limitation is per domain). If you're going to do this, remember to balance this optimization with the fewer DNS lookups recommendation, don't spread on too many domains. Look carefully at your waterfalls to find the balanced point. Again, the general recommendation is that a page should require up to 2-4 domains tops.

Quick sidetrack: two URLs for your toolbox.

  1. For the up-to-the-moment insight into different browser limits and capabilities, bookmark Google's BrowserScope project. Check the "Network" tab for example to see the parallel downloads limitations across browsers
  2. Cuzillion (by Steve Souders) lets you quickly create test pages. The waterfalls above are actually coming from a page created with Cuzillion and tested in AOL's WebPageTest

No redirects

Redirects are bad for your waterfall. They do nothing for the user but just slow down the experience. Think about what happens: the browser makes a request, waits for the response, the response says "no, no, go get your component from way over there", so the poor old browser starts again - makes a new request, waits for the response.

So, avoid redirects, be they server-side redirects or client-side (JavaScript or meta-tag redirects).

As an illustration how bad redirects can be - consider the waterfall below. It's from a real page, not a made up case. So it all looks like the page could finish loading after about 1.1 seconds, but then a redirect occurs at 0.9s, takes half a second and then points to a 1x1 blank GIF. Obviously it's some sort of stats tracking image. But the bad parts are that: a/ it's an IMG tag, therefore delays onload and b/ there's a redirect. At the end, the page loads in 1.7 seconds instead of 1.1 seconds. The user experience suffers for no reason. The way to fix this is simply remove the image from the IMG tag and load it with new Image().src = "1x1.gif" this way taking it out of the onload flow. Then remove that redirect. For such stats tracking cases a 204 No Content response is the appropriate way to go.

Blocking script and styles

Scripts block downloads, hence slow down your free-falling waterfall. This is an important topic which deserves an article of its own, so stay tuned.

And what about stylesheets, do they block other downloads in the waterfall? Turns out stylesheets are mostly fine, but they could also block in these cases:

  • in Firefox before version 3 (probably no need to worry about it)
  • in all browsers, if followed by an inline script

The second one is interesting as much as it's surprising. It's probably not a good idea to have inline script tags scattered all around the HTML to begin with. And since this can cause the stylesheets to block the downloads of the other components, it should be avoided at all costs.

Be sure to check Steve Souders' blog post for more information. Credit to Steve - I believe he was the first to take note and report this issue.

So check your waterfall if you see a stylesheet that blocks, look around it in the markup for any inline script tags that can be moved further down.

Cookies and other HTTP headers

We talked about making the responses smaller. But we can also optimize the requests by making the HTTP headers smaller. You can take a look at your request and response headers and see if you're not sending too much.

Looking at my blog I see this Server header:

Server: Apache/2.2.11 (Unix) mod_ssl/2.2.11 OpenSSL/0.9.7a Phusion_Passenger/2.2.4 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635

Seems to me like too much information.

There's also an ETag:
Etag: "9f38013-2af0-453b354a20bc0"

While ETags may be help with caching, when you have far-future Expires header, they are not necessary. And if you have a multi-machine setup, ETags can actually be bad (YSlow will warn you for this issue)

While you have some control over the response headers, you have very little control over the request headers. It's your users' browsers that control them. But you do have control over the Cookie header and this is where the bigger savings will actually come from because Cookie is often the biggest part of the HTTP headers

Reduce cookies

You should aim at sending the least possible amount of cookies. Be careful when you write cookies. Make them smaller and write them to the appropriate sub-domain name. If you have a blog at blog.example.org and a main site at www.example.org, then don't write the blog.example.org cookies at *.example.org level.

Hotmail have talked about how they compress their cookies before writing them. That's also an idea if you have big Cookie headers.

Cookie-less domains for components

Better yet, for static components that don't have any use for these cookies, just don't send them. Setup static.example.org and don't write cookies for this domain. Then put your static components there.

A curious piece of stats here - Philip Dixon reported (slides) that after Shopzilla.com moved their static components to a cookie-free domain they made more money.

Images to non-cookie domain resulted in
0.5% top line revenue increase!

"top line" means revenue (maybe you know that but I had to check :)). This is a fascinating idea - that you can make more money by improving something as simple as cookie-less components. So, every little bit helps. Keep making your site faster and ... you never know.

www or no-www

This point also adds to the good old www vs. no-www flamewar. If you opt for no www, then in IE you cannot write cookies to example.org, but you'll write to *.example.org. This means your static.example.org will see all the cookies too. This post has some more info on the topic.

If you've already polluted your top-level domain with long-term cookies, the remedy would be to just buy a new "clean" domain, like examplestatic.org, never ever write cookies to it and use it for your static components.

Thanks!

And that's it for today's post, thank you for reading and may HTTP be with you ;)

 

Give PNG a chance

Sunday, December 13th, 2009

Dec 13 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the articles to come.

People are often afraid to use PNG because they think that:
a/ it doesn't work in all browsers, or
b/ filesizes are bigger than GIF

While these have some grain of truth to them, they are mostly misconceptions. Before addressing them, one quick background point - what's PNG8 and why it's cool.

PNG8

There are several types of PNG files, which can be grouped into those main kinds:

  • Truecolor PNGs with or without alpha transparency channel, also known as PNG24 and/or PNG32 (the one with alpha)
  • Grayscale PNGs with or without alpha
  • Indexed PNG, aka palette PNG, aka PNG8

PNG8 is like a GIF - it has a palette of 256 colors and supports transparency. While GIF supports true/false transparency (a pixel is either transparent or it isn't), PNG8 supports variable alpha transparency. Right there you see - PNG8 can do anything that GIF can, plus more.

There's a little glitch in IE6 where a semi-transparent pixels in PNG8 are seen as fully transparent, just like a GIF. So here's an option for progressive enhancement - you use the same image and IE6 gets a degraded GIF-like experience, while modern browsers get the full experience.

Here's an example, taken from this excellent article - modern browsers get the light bulb with the glow:

IE6 and under get the gracefully degraded experience and no glow:

Another pain point is that Photoshop doesn't produce semi-transparent PNG8 (although they came up with the name "png8" instead of saying palette or indexed PNG). Only Fireworks does export alpha transparent PNG8, which makes it a bit of a challenge. You also need a good designer to undertake this tricky task of making sure the image looks OK in both experiences. One way is to assume you're working with a GIF and then upgrade the experience with the carefully selected semi-transparent pixels. It could also help you keep the gif-like version in a layer and use other layers for the semi-transparent stuff, so you can quickly preview what the image will look like in IE6.

In any event - the important thing to remember is that in the worst case (IE6) PNG8 is as good as a GIF.

PNG doesn't work in browsers?

PNG works in browsers since forever with the exception of two edge cases:

  • the glitch where PNG8's semi-transparency is gone in IE6 (see above), but here GIF can't help you either
  • transparency in truecolor PNGs is shown as a solid (usually grey) color in IE6. But again - GIF can't help here either, because it doesn't support alpha (variable) transparency to begin with. People often use GIF to "solve" this problem (moving to GIF will mean potentially losing colors), but if you can solve with a GIF, you can solve it even better (and with smaller filesize) with PNG8

Another solution to the second problem is to use IE's AlphaImageLoader CSS filter (and there's a number of scripts to do so automatically), but this filter has serious performance drawbacks and should only be used as a last resort. Three things to try before resorting to AlphaImageLoader:

  1. Try PNG8 for progressively enhanced experience
  2. Try without transparency - if the background is a solid color, convert the image to use the solid color. In imagemagick you can use -flatten for this purpose:
    $ convert source.png -flatten -background yellow result.png
  3. Forget about IE6 :)

If you end up using AlphaImageLoader, make sure you use the underscore hack so that only IE6 users experience the performance degradation.

#some-element {
    background: url(image.png);
    _background: none;
    _filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(src='image.png', sizingMethod='crop');
}

PNGs are bigger than GIFs?

This misconception comes from the fact that people compare truecolor PNG with GIF which is not a fair comparison because you often compare image with thousands of colors (the PNG24) with an image with 256 colors (GIF). Often people work on an image in Photoshop or another program and when they decide to export for the web, they try PNG24, see that it's bigger and switch to GIF. But in this step GIF may strip a lot of colors. And if you're going to strip colors, well, PNG8 will give you the same colors and smaller filesize. (Another thing is that sometimes Photoshop does a poor job exporting the PNG8. If the PNG8 looks crappy, but the GIF is OK, then export as a GIF but then convert to PNG with another utility, such as optipng)

Again - PNG8 is the file format comparable to GIF and it's almost always smaller in filesize than GIF.

Comparing GIF vs. PNG filesizes

(This and the next experiment is something I did over an year ago, bored to death in the middle of the ocean on the board of a Carnival cruise ship, but since then I never really looked at the data. So here's my chance to flush some old data and clean up 20 Gigs of lets-keep-just-in-case test images :) )

Using Yahoo! image search web service I downloaded some GIFs (matching the queries "logo", "mail" and "graph"), ended up a little over 1700 images. Then I used optipng to convert them all to PNG and see the results.

I used OptiPNG simply with no special options:

$ optipng *.gif

As the next experiment will show, optipng can do better, so can pngout for example. So consider these results the least you can do to make GIFs smaller (by turning them to PNG)

So some stats from the experiment:

  • The average, median actually, GIF image on the web (last year, judging from this small sample) is 525x388 and has 139 colors (I just love semi-useless stats ;) )
  • The median GIF is 24K
  • After conversion to PNG, the median becomes 18K
  • The median savings from converting all GIFs to PNG is about 23%

Interestingly enough 4% of the images were smaller as GIFs - utter disappointment (and don't tell anyone!). So I had to try just a little harder. I didn't run OptiPNG with its best -o7 option, but ran PNGOut instead. The results is that now only 4 of the 1706 images were smaller as GIF. I'm pretty sure that trying a little harder (with PNGSlim, see yesterday's post) would've probably fixed it, but 4 out of 1700 is something I could live with. BTW, the images where OptiPNG failed to produce smaller PNG, then PNGOut converted with the ratio of 21% median savings. Not bad for taming the few shrew GIFs.

BTW, some GIFs lost over 100K of filesize, the max was over almost 600K savings! So, you never know.

If you like to look at numbers, here's a csv dump - the optipng results and the selected few that ran through PNGout.

So, take-home message: turn your GIFs into PNGs and win at minimum 20% fewer bytes over the network.

Comparing PNG optimizers

For this experiment I downloaded over 12000 images (again, Yahoo! search API) and ran them through a bunch of optimizers, sometimes with different options. In retrospect, it's probably not that useful of an experiment, because (see previous post) different optimizers specialize in different areas - compression, pre-compression filtering, chunks removal, etc, and your best bet is to run several tools. But still it's at least some data points (the cvs dump is here)

The images were 1000 matches for each of the searches for "baby", "background", "bkg", "flower", "graph", "graphic", "icon", "illustration", "kittens" (of course), "logo", "monkeys", "png", "transparency". After removing 4xxs, 5xxs and other mishaps and cleaning up a bit, I ended with over 10000 images. I ran them thorough:

  • pngcrush - pngcrush -rem alla -reduce before.png after.png
  • pngcrush-none - to keep all chunks pngcrush -rem none -reduce before.png after.png
  • pngcrush-brute - more filter attempts - pngcrush -rem alla -brute -reduce before.png after.png
  • pngout - pngout /q /y /force before.png after.png. default compression level in PNGOut is "extreme", so I tried two less extreme below
  • pngout-match - pngout /s2 /q /y /force before.png after.png
  • pngout-intense - pngout /s1 /q /y /force before.png after.png
  • pngrewrite - pngrewrite before.png after.png PNGRewrite only works with PNG8, it also converts truecolor to PNG8 whenever the truecolor happens to be under 256 colors,
  • optipng - optipng before.png -force -out after.png. OptiPNG's default level is 2 (of 7) so I had to try below and above the default:
  • optipng1 - optipng before.png -o1 -force -out after.png
  • optipng3 - optipng before.png -o3 -force -out after.png
  • optipng7 - optipng before.png -o7 -force -out after.png
  • advpng - cp before.png after.png; advpng -z -f -q after.png
  • advpng-insane - with the "insane" 4th level of compression cp before.png after.png; advpng -z4 -f -q after.png
  • deflopt - cp before.png after.png; deflopt -s -f after.png
  • pngoptimizercl -cp before.png after.png; pngoptimizercl -file:"after.png"

And the results:

Tool Median time to run Median savings Success rate
pngcrush 0.25s 6.06% 93.85%
pngcrush-none 0.23s 5.58% 90.22%
pngcrush-brute 3.08s 8.10% 96.31%
pngout 1.89s 12.21% 94.35%
pngout-match 0.22s 13.89% 44.57%
pngout-intense 1.63s 12.10% 94.22%
pngrewrite 0.07s 29.84% 22.37%
optipng 0.23s 7.32% 93.21%
optipng1 0.10s 4.24% 85.16%
optipng3 0.66s 7.10% 94.26%
optipng7 4.13s 7.57% 94.81%
advpng 0.34s 11.55% 52.47%
advpng-insane 0.76s 15.64% 56.09%
deflopt 0.34s 0.44% 96.94%
pngoptimizercl 0.48s 9.71% 97.99%

"Success rate" is how often the tool managed to produce a smaller result than the original. For example PNGRewrite's success rate is pretty low, because it only works with up to 256 colors. Median time to run is the median value that the tool takes to optimize one image.

And now, madames et monsieurs, introducing...

Give PNG a chance (.com)

I hope you'll find this as funny as I do, I thought it was pretty funny, at least in my head :)

My secret goal was that everybody who hears the song or watches the video, will think twice the next time when doing "Save for the Web..." in Photoshop.

Enjoy!

Music: Drums from GarageBand, I play two guitars, also bass (a guitar with effect actually) and vocals. If you think you hear a woman's voice, it's still me, with "Helium" effect. The MP3 is here. If you want to experiment with the song yourself, here's a zip with each channel as an MP3.

Video: It may be lousy, but it's all web dev :) It's all JavaScript and CSS. The video is a screen capture of the Safari window. Also there are no images, only HTML entities. Heavy use of -webkit-* animations and transitions. The source and a live version you can play in Safari is here. The StarWars-like effect is borrowed from here.

The http://givepngachance.com URL is currently pretty blank, but I intend to add more PNG-related stuff there. Oh, and the lyrics.

Thanks!

Thanks for reading. And watching. And listening. Peace. And give PNG a chance :)

 

Big list of image optimization tools

Saturday, December 12th, 2009

Dec 12 This post is part of the 2009 performance advent calendar experiment (12 articles down, 12 more to go). Stay tuned for the articles to come.

Let's continue the topic of reducing file sizes started with the previous post and talk about making images smaller.

Engineer's guide to smaller images

Just to set the frame of the discussion - this is not about using Photoshop or setting the quality of the JPEGs and so on. I realize that we, web developers, wear many hats - we're designers, client/server coders, Apache/Linux admins, database heros. But this post is not about using image programs and assumes that you or your designer has already created the images to be used on the site with the appropriate colors, quality and so on.

Now, repeat after me: you should never take an image from the designer and put it up on the web.

Most often this image is bigger than it should be. It's not the designer's fault, it's usually the software used to produce the image.

You shouldn't put an image up on your server before running it through a few tools. These tools are free, open source, cross-platform and can be run on the command line, hence scripted and run in batches over a large number of files - by you, or even better, automatically by a build deployment process. It's OK to batch-run those files without human intervention, because these tools simply optimize the files, they don't change the pixel information, so the "after" images look exactly like "before", only smaller.

Selecting the right file format

The first step towards leaner images is to select the correct file format. There are three options:

  1. JPEG for photos. Photos contain millions of colors and smooth transitions of colors. Blue skies, clouds, sunset, your dog, lolcats - all photos.
  2. GIF is for the occasional "loading..." progress animation. This is it, no other uses for GIFs.
  3. PNG is for everything that's not a photo or an animation. That includes all icons, graphs, buttons, gradients, and what not. Any image with sharp transitions of colors. Think (but don't use for) text. Sharp transitions become "dirty" in JPEG.

PNG is an interesting topic for a follow up post, for now let's stop here. If it's not a photo, it should be PNG. An edge case is a screenshot for example. Depends on what's on the screen of course, but most often JPEG will give a smaller size if you can live with the artifacts around the sharp edges (like text).

Optimizing GIFs

Unfortunately many people still use GIFs even for non-animated images. That's a mistake. PNG is a superior format and yields smaller file sizes.

People still use GIFs because they think either that a/ GIF is smaller than PNG or b/ there's lack of support for PNG in browsers. These are misconceptions and I'll talk more about them tomorrow.

So, the way to optimize a GIF is to convert it to PNG.

You can use many tools to turn your GIFs to PNGs, including ImageMagick and OptiPNG.

# option 1: ImageMagick (if you know the filename)
$ convert logo.gif logo.png

# option 2: ImageMagick again (if you just convert all files in a directory)
$ mogrify -format png *.gif

# option 3: OptiPNG
$ optipng *.gif

These are just some of the options, I'm sure there are others.

After the conversion you can optimize the new PNGs like all other PNGs

Optimizing PNGs

There are various ways to write a PNG file. Unfortunately not all image editing programs do a good job at writing PNGs for the minimal file size.

Luckily, to fill the void, there's a great number of tools that excel in writing small PNGs. There are different ways to optimize a PNG:

  1. Stripping out "chunks" - PNG is an extensible format. Extensions come in the form of chunks and most chunks are not needed for the web.
  2. Reducing the number of colors and switching between PNG types - truecolor PNG, grayscale, palette...
  3. Chosing the best "filter". Filters are a pre-compression step. You can compress any type of file, but when you know the file is an image, you can do better. Filters are for this purpose.
  4. Optimizing the actual DEFLATE compression algorithm

Different tools specialize in one or more of these areas. So the more tools you run, the better the results will be. But you have to run at least one tool, always. You'll be surprised how unoptimized are most PNGs coming from common commercial image programs.

So, to optimize a PNG you shoul run as many of the following programs as possible:

# optipng (skip -o7 to run faster)
$ optipng -o7 my.png

# pngcrush (skip -brute to run faster)
$ pngcrush -rem alla -brute -reduce my.png my.png.temp
$ mv my.png.temp my.png

# pngout - closed source, non-windows binaries here
# (add parameter -s2 to run faster)
$ pngout my.png

# advpng (use -z2 to run faster)
$ advpng -z4 my.png

# deflopt - windows only
$ deflopt my.png

Other tools to note include PNGrewrite, PNGNQ and PNGquant, but they are limited because they deal only with PNG8 (256 colors) files. PNGNQ and PNGQuant are actually converters from truecolor to PNG8, so they are not guaranteed to be lossless. PNGreqwrute is safe to use, it will just silently fail if the file has more than 256 colors, so there's nothing to lose.

Oh, and another, excellent tool - PNGOptimizer, windows-only has both command line interface and a GUI.

PNGSlim for the hardcore PNG optimization

If you're really serious about optimizing your PNGs, the tool is called PNGSlim. It's a Windows-only batch file that runs pretty much all tools above and runs them (especially PNGOut) with all kinds of parameters, hundreds of times. So it can take a while to run.

Optimizing JPEGs

JPEG is a lossy format (you lose information every time you save it, even if you choose 100% quality), but there are some operations that can be done losslessly - such as tweaking comments and meta information, cropping, rotating to 90, 180, 270 degrees. The tool that does this magic is called JPEGTran and is likely already on your unix/linux box. If not - here's how to install it (for Windows - get the .exe here)

So to optimize a JPEG losslessly you remove the meta information and optimize the so called Huffman tables. For bigger JPEGs (bigger than 10K) you can also convert the image to progressive coding.

# strip meta and optimize
$ jpegtran -copy none source.jpg > destination.jpg

# strip meta and convert to progressive coding
$ jpegtran -copy none -progressive source.jpg > destination.jpg

# keep all meta but still optimize
$ jpegtran -copy all source.jpg > destination.jpg

Important note on stripping meta

Only strip meta information from images you own the rights for and have permission. Otherwise you're committing a crime. Photographers put important copyright information in meta markers.

Optimizing GIF animations

Remember - no GIFs other than animations. For animations, run GIFsicle (pronounced "yo' mama" :) ) berfore you put them up:

# GIFSicle
$ gifsicle -O2 source.gif > destination.gif

More tools?

These are the core tools for image optimization. There's a number of wrapper tools that are more or less UIs on top of these, because there are people who don't like consoles (really?!). I'll list the few that I can think of, please comment if you know of others, especially for windows. It's nice to give nice UIs to give to designers so they can drag-drop optimize images too.

  • smush.it, created by yours truly and Nicole Sullivan, now part of YSlow - runs pngcrush, jpegtran, gifsicle
  • PageSpeed runs optipng, jpegtran
  • PunyPNG - originally inspired by smush.it, but more advanced
  • ImageOptim - Nice easy UI for Mac, runs most of the tools above (hope your company firewall doesn't block the site because of the domain name ;))
  • PNGSquash another UI for Mac, runs advpng, pngcrush, optipng
  • PNG Monster is for Windows, runs many PNG tools, you can drag/drop on it
  • IrfanView - my favorite image viewer for Windows has a plugin to use PNGOut
  • WP-Smushit is a WordPress plugin by Alex Dunae which sends all your image uploads to smush.it for optimization. Talk about easy!

More reading

Series of articles on YUIBlog:

presentations:

and more:

Thanks!

Thank you for reading. Now you have a whole lot of tools/toys to install and play with. Image optimization is an easy way to improve performance, it's just running a bunch of tools. You don't need to worry that the quality will suffer (so the designer won't be disappointed in you :)). So you can only win. You may win quite a bit, you may win just a little (anywhere between 5 to 30% savings is what I've seen on random live sites when I was working on and testing smush.it). The thing is you'll almost always win something.

And because it's human to forget to optimize the images before you push them live, do take the time to setup the optimization step as part of the automated deployment process.

And to summarize once again the steps of what this automated process would be:

  1. Convert GIFs to PNG. Then relax and take a deep breath (instead of flaming the unfortunate soul who created the GIFs) while you string replace "gif" with "png" in all your styleshets
  2. Run PNGs through optipng, pngcrush, pngout, any or all of the tools listed above
  3. Run JPEGtran
  4. Run GIFsicle
 

Reducing the payload: compression, minification, 204s

Friday, December 11th, 2009

Dec 11 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the next articles.

After removing all the extra HTTP requests you possibly can from your waterfall, it's time to make sure that those that are left are as small as they can be. Not only this makes your pages load faster, but it also helps you save on the bandwidth bill. Your weapons for fighting overweight component include: compression and minification of text-based files such as scripts and styles, recompression of some downloadable files, and zero-body components. (A follow-up post will talk about optimizing images.)

Gzipping plain text components

Hands down the easiest and at the same time quite effective optimization - turning on gzipping for all plain text components. It's almost a crime if you don't do it. Doesn't "cost" any development time, just a simple flip of a switch in Apache configuration. And the results could be surprisingly pleasant.

When Bill Scott joined Netflix, he noticed that gzip is not on. So they turned it on. And here's the result - the day they enabled it, the outbound traffic pretty much dropped in half (slides)

Netflix traffic after turning on gzipping

Gzip FAQ

How much improvement can you expect from gzip?
On average - 70% reduction of the file size!
Any drawbacks?
Well, there's a certain cost associated with the server compressing the response and the browser uncompressing it, but it's negligible compared to the benefits you get
Any browser quirks?
Sure, IE6, of course. But only in IE6 service pack 1 and fixed for after that. You can boldly ignore this edge case, but if you're extra paranoid you can disable gzip for this user agent
How to tell if it's on?
Run YSlow/PageSpeed and they'll will warn you if it's not on. If you don't have any of those tools just look at the HTTP headers with any other tool, e.g. Firebug, webpagetest.org. You should see the header:

Content-Encoding: gzip

provided, of course, that your browser claimed it supports compression by sending the header:

Accept-Encoding: gzip, deflate
What types of components should you gzip?
All text components:

  • javascripts
  • css
  • plain text
  • html, xml, including any other XML-based format such as SVG, also IE's .htc
  • JSON responses from web service calls
  • anything that's not a binary file...

You should also gzip @font-files like EOT, TTF, OTF, with the exception of WOFF. Average about 40% to be won there with font files.

How-to turn on gzipping

Ideally you need control over the Apache configuration. If not full control, at least most hosting providers will offer you ability to tweak configuration via .htaccess. If your host doesn't, well, change the host.

So just add this to .htaccess:

AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml application/javascript application/json

If you're on Apache before version 2 or your unfriendly host don't allow any access to configuration, not all is lost. You can make PHP do the gzipping for you. It's not ideal but the gzip benefits are so pronounced that it's worth the try. This article describes a number of different options for gzipping when dealing with uncooperative hosts.

Rezipping

As Billy Hoffman discovered, there's potential for file size reduction with common downloadable files, which are actually zip files in disguise. Such files include:

  • Newer MS Office documents - DOCX, XLSX, PPTX
  • Open Office documents - ODT, ODP, ODS
  • JARs (Java Applets, anyone?)
  • XPI Firefox extensions
  • XAP - Silverlight applications

These ZIP files in disguise are usually not compressed with the maximum compression. If you allow such downloads from your website, consider recompressing them beforehand with maximum compression.

There could be anywhere from 1 to 30% size reduction to be won, definitely worth the try, especially since you can do it all on the command line, as part of the build process, etc. (re)Compress once, save bandwidth and offer faster downloads every time ;)

15% uncompressed traffic

Tony Gentilcore of Google reported his findings that a significant chunk of their traffic is still sent uncompressed. Digging into it he realized there's a number of anti-virus software and firewalls that will mingle with the browser's Accept-Encoding header changing into the likes of:

Accept-Encoding: xxxx, deflxxx
Accept-Enxoding: gzip, deflate

Since this is an invalid header, the server will decide that the browser doesn't support gzip and send uncompressed response. And why would the retarded anti-virus program do it? Because it doesn't want to deal with decompression in order to examine the content. Probably not to slow down the experience? In doing so it actually hurts the user to a greater extend.

So compression is important, but unfortunately it's not always present. That's why minification helps - not only because compressing minified responses is even smaller, but because sometimes there is no compression despite your best efforts.

Minification

Minification means striping extra code from your programs that is not essential for execution. The code in question is comments, whitespace, etc from styles and scripts, but also renaming variables with shorter names, and various other optimizations.

This is best done with a tool, of course, and luckily there a number of tools to help.

Minifying JavaScript

Some of the tools to minify JavaScript include:

How much size reduction can you expect from minification? To answer that I ran jQuery 1.3.2. through all the tools mentioned above (using hosted versions) and compared the sizes before/after and with/without gzipping the result of minification.

The table below lists the results. All the % figures are % of the original, so smaller is better. 29% means the file was reduced to 29% of its original version, or a saving of 71%

File original size size, gzipped % of original gzip, % of original
original 120619 35088 100.00% 29.09%
closure-advanced 49638 17583 41.15% 14.58%
closure 55320 18657 45.86% 15.47%
jsmin 73690 21198 61.09% 17.57%
packer 39246 18659 32.54% 15.47%
shrinksafe 69516 22105 57.63% 18.33%
yui 57256 19677 47.47% 16.31%

As you can see gzipping alone gives you about 70% savings, minification alone cuts script sizes with more than half and both combined (minifying then gzipping) can make your scripts 85% leaner. Verdict: do it. The concrete tool you use probably doesn't really matter all that much, pick anything you're comfortable with to run before deployment (or best, automatically during a build process)

Minifying CSS

In addition to the usual stripping of comments and whitespaces, more advanced CSS minification could include for example:

// before
#mhm {padding: 0px 0px 0px 0px;}
// after
#mhm{padding:0}

// before
#ha{background: #ff00ff;}
// after
#ha{background:#f0f}
//...

A CSS minifier is much less powerful than a JS minifier, it cannot rename properties or reorganize them, because the order matters and for example text-decoration:underline cannot get any shorter than that.

There's not a lot of CSS minifiers, but here's a few I tested:

  • YUI compressor - yes, the same YUI compressor that does JavaScript minification. I've actually ported the CSS minification part of it to JavaScript (it's in Java otherwise) some time ago. There's even an online form you can paste into to test. The CSS minifier is regular expression based
  • Minify is a PHP based JS/CSS minification utility started by Ryan Grove. The CSS minifier part is also with regular expressions, I have the feeling it's also based on YUICompressor, at least initially
  • CSSTidy - a parser and an optimizer written in PHP, but also with C version for desktop executable. There's also a hosted version. It's probably the most advanced optimizer in the list, being a parser it has a deeper understanding of the structure of the styleshets
  • HTML_CSS from PEAR - not exactly an optimizer but more of a general purpose library for creating and updating stylesheets server-side in PHP. It can be used as a minifier, by simply reading, then printing the parsed structure, which strips spaces and comments as a side effect.

Trying to get an average figure of the potential benefits, I ran these tools on all stylesheets from csszengarden.com, collected simply like:

<?php
$urlt = "http://csszengarden.com/%s/%s.css";
for ($i = 1; $i < 214; $i++) {
  $id = str_pad($i, 3, "0", STR_PAD_LEFT);
  $url = sprintf($urlt, $id, $id);
  file_put_contents("$id.css", file_get_contents($url));
}
?>

3 files gave a 404, so I ran the tools above on the rest 210 files. CSSTidy ran twice - once with its safest settings (which even keep comments in) and then with the most aggressive. The "safe" way to use CSSTidy is like so:

<?php
// dependencies, instance
include 'class.csstidy.php';
$css = new csstidy();

// options
$css->set_cfg('preserve_css',true);
$css->load_template('high_compression');

// parse
$css->parse($source_css_code);

// result
$min = $css->print->plain();
?>

The aggressive minification is the same only without setting the preserve_css option.

Running Minify is simple:

<?php
// dependencies, instance
require 'CSS.php';
$minifier = new Minify_CSS();

// minify in one shot
$min = $minifier->minify($source_css_string_or_url);

As for PEAR::HTML_CSS, since it's not a minifier, you only need to parse the input and print the output.

<?php
require 'HTML/CSS.php';

$options = array(
    'xhtml' => false,
    'tab' => 0,
    'oneline' => true,
    'groupsfirst' => false,
    'allowduplicates' => true,
);

$css = new HTML_CSS($options);
$css->parseFile($input_filename);
$css->toFile($output_filename);
// ... or alternatively if you want the result as a string
// $minified = $css->toString();

So I ran those tools on the CSSZenGarden 200+ files and the full table of results is here, below are just the averages:

  Original YUI Minify CSSTidy-safe CSSTidy-small PEAR
raw 100% 68.18% 68.66% 84.44% 63.29% 74.60%
gzipped 30.36% 19.89% 20.74% 28.36% 19.44% 20.20%

Again, the numbers are percentage of the original, so smaller is better. As you can see, on average gzip alone gives you 70% size reduction. The minification is not so successful as with JavaScript. Here even the best tool cannot reach 40% reduction (for JS it was usually over 50%). But nevertheless, gzip+minification on average gives you a reduction of 80% or more. Verdict: do it!

An important note here is that in CSS we deal with a lot of hacks. Since the browsers have parsing issues (which is what hacks often exploit), what about a poor minifier? How safe are the minifiers? Well, that's a subject for a separate study, but I know I can at least trust the YUICompressor, after all it's used by hundreds of Yahoo! developers daily and probably thousands non-Yahoos around the world. PEAR's HTML_CSS library also looks pretty safe because it has a simple parser that seems to tolerate all kinds of hacks. CSSTidy also claims to tolerate a lot of hacks, but given that the last version is two years old (maybe new hacks have surfaced meanwhile) and the fact that it's the most intelligent optimizer (knows about values, colors and so on) it should be approached with care.

204

Let's wrap up this lengthy posting with an honorable mention of the 204 No Content response (blogged before). It's the world's smallest componet, the one that has no body and a Content-Length of 0.

Often people use 1x1 GIFs for logging and tracking purposes and other types of requests that don't need a response. If you do this, you can return a 204 status code and no response body, only headers. Look no further that Google search results with your HTTP sniffer ON to see examples of 204 responses.

The way to send a 204 response from PHP is simply:

header("HTTP/1.0 204 No Content");

A 204 response saves just a little bit but, hey, every little bit helps.

And remember the mantra: every extra bit is a disservice to the user :)

Thank you for reading!

Stay tuned for the next article continuing the topic of reducing the component sizes as much as possible.

 

Caching vs. inlining

Thursday, December 10th, 2009

Dec 10 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the next articles.

Looking back at the life of Page 2.0. and all the opportunities for optimization, the posts I've put up so far as part of this advent calendar have been mostly around optimizing the waterfall - making it shorter by having fewer components in it. This post will be probably the last in this area of optimization. After that let's talk about having smaller components and then move away from the waterfall and talk about what happens next - rendering, user interaction, JavaScript...

Caching

Caching comes into the picture for repeat page visits. When the user comes to your page for the second time, wouldn't it be beautiful if all components are in the cache and the waterfall is really short?

Here are the two views - first and repeat - from visiting the same page.

first view repeat view

Ideally the repeat views should be with full cache and everything should be loaded from the local disk. That's what many people still assume: "Who cares about CSS? It's in the cache. Images? How do you mean sprites? But it's all in the cache, right?" Wrong. There are surprisingly few visits with full cache that we're used to believe.

This research
demonstrates that on an extremely popular site - yahoo.com - about 50% of the visits are empty cache visits and 20% of all page *views*.

You cannot control what the users do what their cache, but you can suggest that their browsers keep a copy of your files for as long as they can.

"Never expire" policy

The never expire policy is to set a far future Expires (or Cache-Control) header effectively saying to the browser - hey this copy is good for a long time, keep it and reuse it.

The way to set the expires header for your static components is to tweak the Apache configuration (in .htaccess if your host gives you no other options) for example like so:

ExpiresActive On
ExpiresByType application/x-javascript "access plus 10 years"
ExpiresByType text/css "access plus 10 years"
ExpiresByType image/png "access plus 10 years"

Now if you access a PNG on Dec 10th 2009, Apache will add this header to the HTTP response:

Expires: Sun, 10 Dec 2019 05:47:47 GMT

The browser should never request this file again till 2019, effectively "forever".

The drawback is, of course, that you cannot modify this file anymore, since some users have already cached it forever and ever till the end of 2019. If you need to change the file, you have to save it under a different name and update all references to it. For the new name some use consecutive numbers, some use timestamps, some even hashes of the content, so that the name reflects the actual body of the component.

Caching vs. inlining

External components have a shot at being cached. But on the other hand having inline components (script, styles) means having fewer HTTP requests. Which one is better?

It depends from one case to the next but generally inlining is better for first visits (for homepages) and external components are better for repeat visits. But homepages are so 1998, with the modern search engines people find deep content pages and don't browse from the homepage. There are exceptions, of course. Some pages are common user homepages (the first page when the browser loads).

Anyway - inline vs. external? How about both?

Inlining, then caching

During first visits you can inline some components, then once the page is loaded, lazy-load the same components but this time as external files (and proper far-future Expires date).

For repeat visits, you only link to the external components as they are already in the cache!

So how do you determine first visits? How can you tell that the user's cache is empty?

You can use cookies and assume missing cookie means empty cache. Drawbacks: 1. someone can delete cache without deleting cookies and 2. what happens when the external component updates

Both of these scenarios are not so bad, because the page will still work, only less optimally. And you can take care of 2. by introducing (complexity and) a component version information in the cookie.

Oh, and drawback 3. - writing longer term cookies increases the size of your headers.

Another option is to track sessions only and assume empty cache for all new sessions. This way you don't need longer-term (and unreliable, see drawback 1.) cookies. In this case the problem might be that you may be missing out on optimizing the first visit in the session. The user may have visited the site yesterday, closed the browser and there she comes today with the full bag of cached components. But you assume first visit = empty handed visit, so you don't benefit from the components already in the cache during the first page view.

Which case is the right for you? Only your logs and experimentation can tell.

For an example in the wild, check Bing's search results. The first visit in the session you get a number of inline stylesheets, which then get lazy-loaded as external files. Second search in the same session and you get the stylesheets as external files.

Thanks for reading!

And that concludes the "fewer HTTP requests" part of the calendar. Tomorrow's topic will be about making smaller those components that slip through the cracks of the HTTP reduction effort.

 

Duplicates and near-duplicates

Wednesday, December 9th, 2009

Dec 9 This post is part of the 2009 performance advent calendar experiment. Stay tuned for the next articles.

One of Yahoo!'s first batch of performance best practices has always been "Avoid duplicate scripts" (check Steve Souders' post). Later we added "... and styles". This is a pretty obvious, kind of a "Duh!" type of recommendation, it's like saying "Avoid sleep() in your server-side scripts". But it didn't come up out of thin air, duplicates were noticed on some quite high-profile sites.

Duplicates are easy to spot (and YSlow will warn you), but let's talk a bit about another concept - let's call it near-duplicates - when two components are similar, almost the same, but not quite.

Duplicate scripts and styles

As a refresher and a quick illustration of the effects of duplicate scripts and styles, fire off your HTTP sniffer and hit this test page.

(btw, this is a simple page I put up to test different YSlow scenarios, you can actually use it as a web service of sorts to create any type of components with different options)

Firefox 2 downloading both duplicate styles and scripts:

FF2 duplicate scripts and styles

IE6 and duplicate scripts:

IE duplicate scripts

Exact details of when/which browsers chose to download duplicates are not that interesting, it's obviously bad to waste time downloading the same resource. Even if no repeating download happens, the browser still has to parse through and execute the script/style for a second time.

Even if you have iframes you don't need to repeat the same JS/CSS in each frame, you can "borrow" them from the parent page, here's an example.

Near-duplicates

Near duplicates can be:

  • components with the exact same response bodies but different URLs causing the browser to do double work
  • components (images) that are too close to each other - in terms of looks or purpose. Only one component should be selected in this case.

Same component, different URLs

This could happen especially when you have user-generated content such as image uploads for profile photos and avatars in social sites, forums, images people put in comments on MySpace and so on.

Also images of stuff for sale (Craigslist, eBay). Often different sellers offering the same item would take the same photo from the manufacturer's site and upload it over and over again.

Luckily, PageSpeed warns about components with identical content, so those can be identified:

In the screenshot above, you see one image (2.3K) repeated 3 times, another (the iPhone, 1.7K) is repeated 4 times, and yet another one (2.8K) repeated 2 times.

It's not exactly trivial to avoid this type of duplications with user-generated content (for example, the first poster may delete the photo, in which case the second poster's photo will need to "shine through"). But it's not impossible, using for example a hash of the component's content as an identifier.

Loading...

Ajax loading indicators are a great idea to give feedback to the user that something is happening. They come in all shapes and sizes... sometimes on the same page, unfortunately. And again, sometimes it's the same stock image but used at different stages of gradual "ajaxification" of the page and with different URLs.

As we're moving more and more towards modular pages and client-side logic, often different modules on the same page are coded by different teams at different times, independently, without being aware of each other's assets. This way of building pages has it's challenges and one is that common components, such as Ajax loading indicators, should be shared.

Too similar modules

Along the same lines - different modules are sometimes created by different designers at different times. The result - one rounded corner box with 1px shadow and one with 2px shadow, both on the same page. Or two different shades of the same gray color, which no one can tell apart. That's just a waste. (See Nicole Sullivan's presentation for illustration, e.g. slides 44, 45)

Below is an example, can you tell that these 5 rounded corner boxes are all different - slightly different shadow, color or radius? How many different boxes does this page need?

Different sizes of the same image

It's highly recommended to not scale images in HTML (or CSS). If you need a 100x100 image you don't use a 400x400 one with <img width="100" height="100" ... />. That's a good rule of thumb... to break sometimes ;)

In cases where the same image is used with different sizes and likely even on the same page, it may be beneficial to reuse the same bigger image and scale it down, because this could be saving extra HTTP requests of downloading the same (but slightly smaller) image.

Facebook is an example, the same hairy guy on the screenshot has two images with different sizes. It's actually the same image but resized in CSS.

The relevant CSS which shows the profile image in LARGE and SMALL (and looks like there's even a TINY view, although I couldn't find an example on this page)

.UIProfileImage_LARGE{width:50px;height:50px}
.UIProfileImage_SMALL{width:32px;height:32px}
.UIProfileImage_TINY{width:25px;height:25px}

Thank you!

Thanks for reading! Reducing HTTP requests is critical for page performance. You've merged your scripts and styles as much as reasonable, you've crafted CSS sprites and inlined images with data URIs. Now it's time to look at what's left - are there components that are way too similar, are there any near-duplicates or even exact duplicates? Same image on different backgrounds? Ever-so-subtle gradients and shadows? Time to pick up the old axe and cut.

 

Collecting web data with a faster, free server

Tuesday, December 8th, 2009

Dec 8 This article is part of the 2009 performance advent calendar experiment. This is also the first ever guest post to this blog.
Please welcome the world-famous Christian Heilmann! And stay tuned for the next articles.

Christian HeilmannChris Heilmann is a self confessed data junkie and worked for over 12 years as a professional web developer. Having published several books on JavaScript, Accessibility and web development using web services he right now works as a developer evangelist for the Yahoo Developer Network. He blogs at http://wait-till-i.com/, has all his talks and videos at http://icant.co.uk/ and can be found on Twitter as @codepo8.

 

RSS is a wonderful format to get information from all kind of different sources. It is dead easy to provide, has a predictable (albeit limited) format and is very easy to use. The problem of course is that with the amount of different feeds used in one page its performance goes down.

The reason is the classic HTTP request issue - the more you negotiate, find and pull the slower your page renders. Therefore you need to try to shorten the time the calls happen.

Say you want to pull the following five RSS feeds and display them:

  • http://code.flickr.com/blog/feed/rss/
  • http://feeds.delicious.com/v2/rss/codepo8?count=15
  • http://www.stevesouders.com/blog/feed/rss
  • http://www.yqlblog.net/blog/feed/
  • http://www.quirksmode.org/blog/index.xml

The least effective way of doing that is pulling and displaying them one after the other:

Retrieving five RSS feeds using curl takes about 11 seconds.

$oldtime = microtime(true);
$url = 'http://code.flickr.com/blog/feed/rss/';
$content[] = get($url);
$url = 'http://feeds.delicious.com/v2/rss/codepo8?count=15';
$content[] = get($url);
$url = 'http://www.stevesouders.com/blog/feed/rss';
$content[] = get($url);
$url = 'http://www.yqlblog.net/blog/feed/';
$content[] = get($url);
$url = 'http://www.quirksmode.org/blog/index.xml';
$content[] = get($url);
display($content);
echo '<p>Time spent: <strong>' . (microtime(true)-$oldtime) .'</strong></p>';
function get($url){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  return $output;
}
function display($data){
  foreach($data as $d){
    $obj = simplexml_load_string($d);
    echo '<div><h2><a href="'.$obj->channel->link.'">'.
          $obj->channel->title.'</a></h2>';
    echo '<ul>';
    foreach($obj->channel->item as $i){
      echo '<li><a href="'.$i->link.'">'.$i->title.'</a></li>';
    }
    echo '</ul></div>';
  }
}

Using Stoyan's Multi Curl function you already realise quite an increase in speed.

Retrieving five RSS feeds with multicurl takes about 2.8 seconds.

$data = array(
  'http://code.flickr.com/blog/feed/rss/',
  'http://feeds.delicious.com/v2/rss/codepo8?count=15',
  'http://www.stevesouders.com/blog/feed/rss',
  'http://www.yqlblog.net/blog/feed/',
  'http://www.quirksmode.org/blog/index.xml'

);
$r = multiRequest($data);
display($r);

function multiRequest($data, $options = array()) {
  // array of curl handles
  $curly = array();
  // data to be returned
  $result = array();
  // multi handle
  $mh = curl_multi_init();
  // loop through $data and create curl handles
  // then add them to the multi-handle
  foreach ($data as $id => $d) {
    $curly[$id] = curl_init();
    $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
    curl_setopt($curly[$id], CURLOPT_URL,            $url);
    curl_setopt($curly[$id], CURLOPT_HEADER,         0);
    curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
    // post?
    if (is_array($d)) {
      if (!empty($d['post'])) {
        curl_setopt($curly[$id], CURLOPT_POST,       1);
        curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
      }
    }
    // extra options?
    if (!empty($options)) {
      curl_setopt_array($curly[$id], $options);
    }
    curl_multi_add_handle($mh, $curly[$id]);
  }
  // execute the handles
  $running = null;
  do {
    curl_multi_exec($mh, $running);
  } while($running > 0);
  // get content and remove handles
  foreach($curly as $id => $c) {
    $result[$id] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
  }
  // all done
  curl_multi_close($mh);
  return $result;
}
function display($data){
  foreach($data as $d){
    $obj = simplexml_load_string($d);
    echo '<div><h2><a href="'.$obj->channel->link.'">'.
          $obj->channel->title.'</a></h2>';
    echo '<ul>';
    foreach($obj->channel->item as $i){
      echo '<li><a href="'.$i->link.'">'.$i->title.'</a></li>';
    }
    echo '</ul></div>';
  }
}

However, there are still two things that are annoying:

  • You pull far more data than you really need
  • You do all the request from your server

Yahoo Pipes has been used for that kind of task for quite a while, but the issue was that it is a visual interface and therefore hard to maintain. The server was also not the best performing out there.

The good news is that there is a new(er) kid on the block in Yahoo Land called YQL running on a massively fast server farm and with one purpose: making it easier to use web services, mix them and get only the data back that you want.

YQL in itself is a web service and you post queries to it that access other web services in a SQL style syntax. Normally you'd get RSS feeds using the RSS table, like so:

select * from rss where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the single RSS in the YQL console (you need a Yahoo account to log in). You can also see the single RSS retrieval output.

The issue with this is that it only retrieves the items of the RSS feed and not the title, which we need for the headings. Therefore we need to use the XML table:

select * from xml where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the single XML in the YQL console (you need a Yahoo account to log in). You can also see the single XML retrieval output.

This gives us the same data the normal cURL calls give us. The cool thing about YQL is though that you can filter the data you get back to the bare minimum. In our case, this means replacing the * with the title and the link of the feed and of the items:

select channel.title,channel.link,channel.item.title,channel.item.link 
  from xml 
  where url="http://feeds.delicious.com/v2/rss/codepo8?count=15"

See the filtered RSS in the YQL console (you need a Yahoo account to log in). You can also see the filtered RSS retrieval output.

In order to use this with all of our RSS feeds, we can use the in() command:

select channel.title,channel.link,channel.item.title,channel.item.link
    from xml where url in(
      'http://code.flickr.com/blog/feed/rss/',
      'http://feeds.delicious.com/v2/rss/codepo8?count=15',
      'http://www.stevesouders.com/blog/feed/rss',
      'http://www.yqlblog.net/blog/feed/',
      'http://www.quirksmode.org/blog/index.xml'
    )

Check the aggregation in the console or the aggregation output.

This leaves all the hard work to the Yahoo Server farm. YQL pulls all the RSS feeds, adds one after the other and then gives it back to us as XML. We could simply use the generated URL from the console, but it is much more versatile to assemble the query in PHP:

$data = array(
  'http://code.flickr.com/blog/feed/rss/',
  'http://feeds.delicious.com/v2/rss/codepo8?count=15',
  'http://www.stevesouders.com/blog/feed/rss',
  'http://www.yqlblog.net/blog/feed/',
  'http://www.quirksmode.org/blog/index.xml'
);
$url ='http://query.yahooapis.com/v1/public/yql?q=';
$query = "select channel.title,channel.link,channel.item.title,channel.item.link from xml where url in('".implode("','",$data)."')";
$url.=urlencode($query).'&format=xml';
$content = get($url);
display($content);
function get($url){
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  $output = curl_exec($ch);
  curl_close($ch);
  return $output;
}
function display($data){
  $data = simplexml_load_string($data);
  $sets = $data->results->rss;
  $all = sizeof($sets);
  for($i=0;$i<$all;$i++){
    $r = $sets[$i];
    $title = $r->channel->title.'';
    if($title != $oldtitle){
      echo '<div><h2><a href="'.($r->channel->link.'').'">'.
           ($r->channel->title.'').'</a></h2><ul>';
    }
      echo '<li><a href="'.($r->channel->item->link.'').'">'.
           ($r->channel->item->title.'').'</a></li>';
    if($title != $sets[$i+1]->channel->title.''){
      echo '</ul></div>';
    }
    $oldtitle = $r->channel->title.'';
  };
}

As you can see the loop to display the different RSS feeds a bit clunky and we could use an open YQL table to move the whole conversion to a server-side JavaScript. However, as it is the performance of this way of retrieving the RSS feeds beats all the others hands-down already:

Retrieving five RSS feeds using YQL takes about half a second

You can try it yourself, get the demo code from GitHub and run it on your own server to see the magic of YQL.

 

Data URIs, MHTML and IE7/Win7/Vista blues

Monday, December 7th, 2009

Dec 7 This is the seventh in the series of performance articles as part of my 2009 performance advent calendar experiment. Stay tuned for the next articles.

Let's start this post as a dialog:

- Data URIs are a way to embed the base64-encoded content of images inside HTML and CSS. Inlining images in markup or stylesheets helps you save precious HTTP requests. It's an alternative to CSS sprites.
- BUT! IE6 and IE7 don't support data URIs
- Yes, for them you can inline the images using MHTML
- BUT! Looks like IE7 on Vista and IE7 on Win7 have problems with MHTML
- [Sigh...] For those browser/OS combos there's a solution too.

I'll quickly go over what are data URIs and how MHTML addresses IE6 and IE7. If you're familiar with these, feel free to skip down to the Vista blues part.

Data URIs

Here's how to use data URIs in your pages:

  1. take an image:
    horoscopes icon image
  2. Use PHP to read this image's binary content and encode it using base64 encoding:
    $ php -r "echo base64_encode(file_get_contents('horoscopes.png'));"
    iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAMAAADXqc3KAAADAFBMVEX///8mPnru9...
    ...more scary stuff goes here...
    dajNGGzlDAAAAABJRU5ErkJggg==
  3. To use this image in CSS, paste the encoded image content in the stylesheet following this syntax:
    .horoscopes {
        background-image: url("data:image/png;base64,iVBOR...rkJggg==");
    }

    Note: the trailing == is part of the image content it's not part of the syntax

  4. Alternatively if you want to use this image in the HTML, you can go like:
    <img src="data:image/png;base64,iVBOR...rkJggg==" />

And that's about it for data URIs, it's quite simple actually.

Data URIs in the wild

To see real-life, high-traffic sites using data URIs look no further than your favorite search engines.

Yahoo! Search using data URI in CSS for a button background:

yahoo search screenshot

Google Search using data URI in an IMG tag for a video thumb:

Google search screenshot

Base64-encoded filesizes

The base64 encoding adds about 33% to the filesize. But, then you gzip the result, the compression brings the size back to the original. Plus or minus. Interestingly enough, sometimes after base64 and gzip, the result is smaller than the original. But don't count on that.

Theoretically also when you base64 encode a bunch of images and inline them in the same CSS or HTML, you'll have a larger string to compress, so increasing the chance of repetitions and compressing better. (Todo: test this assumption)

In any event, the benefit of reducing HTTP requests will likely greatly outweigh any fluctuation in the file size.

MHTML

IE8 supports data URIs, but earlier IEs do not. For them you can use MHTML (Multipart HTML, blogged previously here)

Continuing with the previous example, in order to display the horoscopes image in IE < 8 you can use a stylesheet like this:

/*
Content-Type: multipart/related; boundary="_MY_BOUNDARY_SEPARATOR"

--_MY_BOUNDARY_SEPARATOR
Content-Location:horoscopes
Content-Transfer-Encoding:base64

iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAMAAADXqc3KAAADAFBMVEX///8mPnru9....U5ErkJggg==
*/

.horoscopes{*background-image:url(mhtml:http://example.org/styles.css!horoscopes);}

A few notes on this example:

  • the image content is base64-encoded just like before
  • if you need more images, use your chosen separator prepended with --, in this case --_MY_BOUNDARY_SEPARATOR
  • Content-Location:horoscopes is how you give a name to the image in order to use it later
  • then later in your stylesheet you can reference the name in the stylesheet using http://url/styles.css!horoscopes
  • you need absolute URLs in the mhtml:http://.... part (that's retarded, hope I'm mistaken. Didn't work for me with relative URLs)
  • you can use a single CSS file to support both old IE as well as new browsers, but you'll have to repeat the image stream twice, probably a better approach is to use browser specific stylesheets

For more examples of MHTML (which is also used for multipart email messages) check this and for a brilliant example of using one MHTML to embed images in HTML (not CSS), check Hedger's test page here (appears 404 at the moment of writing). For a tool to automate the dataURI-zation/MHTML-ization, be sure to check Nicholas Zakas' CSSEmbed

The problem with IE7 on Vista (and Win7)

So far we have data URIs and MHTML fallback. Turns out we're not done yet, because IE7 on Vista and Windows 7 fails with the MHTML example. Two points:

  • IE8 is the default in Windows 7, but it comes with several IE7 modes (either by default or turned on by users when their favorite pages break). Compatibility view, ie7 mode, ie8 in ie7 mode, blah-blah, it's all pretty confusing, but all modes with the exception of IE8 in proper IE8 standards don't work with the MHTML
  • I tested Windows 7 evaluation copy, but I have the bad, bad feeling that Windows 7 normal copy will be as broken when it comes to MHTML. I believe it has something to do with security, if anyone finds an article on MSDN, or anywhere, please share.

The solution to the Vista problem is to split the CSS shown above (the one that uses MHTML) in two: one which contains the encoded images (the commented part) and a second one which only has the normal CSS selectors part.

In other words have something like:

  1. mhtml.txt:
    Content-Type: multipart/related; boundary="_MY_BOUNDARY_SEPARATOR"
    
    --_MY_BOUNDARY_SEPARATOR
    Content-Location:horoscopes
    Content-Transfer-Encoding:base64
    
    iVBORw0KGgoAAAANSUhEUgAAABgAAAAYCAMAAADXqc3KAAADAFBMVEX///8mPnru9....U5ErkJggg==
    

    No need to use comments. Could be .txt, .css or anything you like. In my tests the content-type didn't matter - text/html, text/css, text/plain all worked

  2. styles.css:
    .horoscopes{*background-image:url(mhtml:http://example.org/mhtml.txt!horoscopes);}

    The normal CSS simply references the new MHTML file appending !identifier

  3. in your HTML you don't need to reference the MHTML document, you just point your link tag to styles.css as usual and styles.css contains the reference to the MHTML doc

Time to celebrate? Almost.

Another ugly bug surfaces - this procedure outlined above works only once. Once the MHTML.txt is cached, it stops working. That means repeating visits to the page or other pages using the same mhtml. The effect also means hovering on the same page. Say you have two images - one normal and one mouse over, both in the same MHTML. The normal works and the hover breaks, mouseout breaks too. Insane, isn't it?

There's a solution though - you should make the browser request the MHTML document every time. This is kind of backwards when it comes to performance optimization, where we want to cache everything. It's pretty unfortunate. The best you can do is avoid sending the MHTML every time, but only send a "not-modified" header. In order for this to work, the browser has to send If-Modified-Since or If-None-Match.

So you can send your MHTML file with an ETag (yes, ETags can help sometimes ;)) Then the browser will send an If-None-Match and you can happily reply "304 Not Modified".

Sounds complicated? It sure is. And, remember, all this is only for IE7 (or any IE7 mode) on Vista and Windows 7. All other IEs on all other platforms work with the easy MHTML and IE8 works with data URIs.

Everything is a trade-off (see last night's post) and I can understand how you may think "That is one big tradeoff". There's always the option of serving normal images to the affected browser/os and just don't implement this optimization for them.

But it's good to know there is a solution.

DataSprites class

Let me offer this DataSprites PHP class I coded that takes care of all the scenarios. It takes a bunch of image filenames as input and produces:

  • a CSS containing data URIs for normal browsers (example output)
  • an MTHML CSS fallback for IE6,7 (example output)
  • a CSS that refers to an additional MHTML for IE7 on Vista, Win7 (example output - css and mhtml). The MHTML in this case is sent out with an ETag (derived from the input image filenames) and consecutive requests with the same ETag return 304 Not Modified

You can see a test page here and view the source code here. The directory listing is also available if you want to grab the images for testing.

The perfect use case for such an approach is when you want to combine background images on the fly. When you would normally use CSS sprites, but you want to produce them dynamically at run time (maybe because you have too many combinations?).

I've tested this in Safari, Firefox, Opera, IE6, IE7 and IE8 (in all compat modes) on Vista and Windows 7 evaluation copy. If you find the test page is not working for you, please let me know.

In this DataSprites class, I'm checking the problem OS looking for the strings "Windows NT 6" and "Windows NT 7" in the user agent string. My Win7 evaluation copy had "Windows NT 6.1" in the UA, so I may be a little forwards-aggressive with the check, but somehow I thought Win7 proper should have "Windows NT 7" in the UA string. And also that Win7 will suffer from the same issue as Vista and Windows 7 evaluation.

UPDATE: Added a hover example.

UPDATE 2: Recorded a screencast to demo the IE7/Vista experience

Thanks!

Thanks for reading! Now you know all about data URIs, MHTML, and maybe a little too much about IE7/Vista's challenges :) Ready to use data URIs and save some HTTP requests?

 

The pain points of having fewer components

Sunday, December 6th, 2009

Dec 6 This is the sixth in the series of performance articles as part of my 2009 performance advent calendar experiment. Stay tuned for the next articles.

Last night I talked about the benefits of reducing the number of page components and the resulting elimination of HTTP overhead. A little carried away into making the case (because, surprisingly, I still meet people that are not convinced about the benefits) I didn't talk about some of the drawbacks and pain points associated with reducing the number of components. Kyle Simpson posted a comment reminding that it's not all roses (also see his blog post).

On drawbacks

A lot of the activities related to performance optimization have drawbacks associated with them. That's the nature of the game. Rarely an improvement is a clear win. Usually you need to sacrifice something - be it development time, maintainability, browser support, standards compliance (reluctantly), accessibility (never!). It's up to the team to decide where to draw the line. My personal favorite type of optimizations are those that (for example image optimization) are taken care of automatically, by a build process, offline (as opposed to runtime), at the stage right before deployment. But this also has drawbacks - you need to setup the build process to begin with, a build process that doesn't introduce errors, doesn't require more testing, is easy and friendly to use, doesn't take forever and so on.

So, drawbacks.

Here's how Jerome K. Jerome jokes about trade-offs in his Three Men in a Boat:

"...everything has its drawbacks, as the man said when his mother-in-law died, and they came upon him for the funeral expenses"

It's never easy, is it?

Combining creates atomic components

Say you merge three JavaScript files - 1.js, 2.js and 3.js to create all.js. A few days later you change 2.js. This invalidates all the cached versions of all.js that existing users have in their browser cache. You need to create a new bundle, say all20091206.js. Then a few more days pass and you fix a bug in 3.js. Again, you need a new bundle. Than the library (1.js) comes up with a new verison and you upgrade - yet another bundle.

While first-time visitors benefit from the fewer requests, repeating visitors don't take much advantage of the cache with such frequent updates. In optimization we also say - set the expiration date in far future, but what's the point if you're going to change the component every other day.

One way to deal with this is organizational - set up a release schedule. You don't push that new feature today when you know that the library update is coming up tomorrow. Queue the changes and roll them out in batches (could be pre-defined intervals). This is important in larger organizations where more than one team may touch parts of the bundled components and push out the whole uber-component.

Another way is to identify which files change relatively often and put the movable parts in a separate bundle. This way you end up with one core, slow moving component and one that changes quickly. When you change it, you at least don't invalidate the core.

But in order to make such a call you need to measure and analyze. And answer the two questions:

  1. How many repeat visits do I have? Forums and online communities are more sticky for example. Other sites may be surprised how few visitors come with full cache.
  2. Which files to put in the changing bundle? Sure not all change at the same rate. Also the rate of changes also changes. e.g. a new feature matures and has less bugs, therefore requires less updates. When do I move a piece of new code into the slow-to-change core?

Facebook have gone a pretty scientific in this regard. David Wei And Changhao Jiang presented their system for monitoring the usage of the site and adaptively bundling features together. They show how they have (slide 26) total of 6 megs JS and 2 megs CSS. Merging all of that into one atomic component is unthinkable, so more intelligent methods are a must.

So the third way is to follow common user paths across the pages visited in a session and bundle together the files that the user is likely to need in his/her pattern of site navigation.

Yet another way to combine stuff is to have pre- and postload bundles. One has the bare minimum to get the user interacting with the page, the other has more of the bells and whistles.

In summary, the atomic components create challenges of their own. That shouldn't be used as an excuse though. As you see, there are ways.

Atomic bundles and personal projects

I doubt that the options above are really ever going to work for a personal project or a blog though. We're lazy people and since the site ain't broken... why fix it?

A personal project usually gets us excited initially and then the interest fades away for an extended period of time. If you don't change something now while you're into the project, then changes happen less often as the time passes. (You also forget how the code works and are reluctant to touch/break it)

A good strategy to optimize a personal project is to come back to it a week of two after a major update. Things are still fresh in your head, but if you haven't changed anything in two weeks, you probably won't touch it for another year. So 'tis the time for a one-off little project - turn on gzipping, minify CSS and JS (which otherwise you changed much too often a week ago), combine components.

Post-launch one-off optimization sprints don't work in bigger organizations and projects, but they can be a "cost"-effective way to optimize a small, personal site.

Merging components is an extra step

Scenario 1: why merge now when in the p.m. I'll work that thing again?
Scenario 2: I have to move to another task in the p.m. and I barely have the time to fix this bug before lunch, let alone to combine components and stuff.

Merging components is boring, there's nothing particularly challenging to it. How hard could it be?

$ cat 1.js 2.js 3.js > all.js

And it's an extra step.

Any extra work you introduce, any extra step has a great chance of being skipped. That gradually forgotten.

That's why there are two options for the optimizations:

  1. Make them easy to do. Otherwise no one's going to do it.
  2. Make them harder to not do it

What I mean is for example take the few minutes to create a script for this new project or feature. The script minifies, combines and pushes to the server. All that with one command line. And especially if you do it several times a day, you can keep one console open and just hit ARROW UP + ENTER to repeat the command after you're done with a change. Two keyboard strokes sure beats opening up an ftp program, logging in, navigating your HDD, navigating the server... tens of clicks easily.

To summarize: it's best not to underestimate the laziness, we shouldn't ignore the fact that we're lazy, but embrace it. Make the right thing easy and the wrong thing hard.

Run-time merging

It's best to keep the static components static. Static is simple, static is fast. But sometimes in the interest of merging, minification and so on in a very dynamic (or technically lacking) environment, you may resort to dynamic file bundling. Sometimes also the little pieces can be merged in so many combinations that dynamic components make sense.

Take for example the combo handler script used by YUI. You have a combo script at:
http://yui.yahooapis.com/combo
which simply takes a list of files to merge, finds them on the disk and spits them out with the correct Expires header.

For example here's a URL produced by the dependency configurator:

http://yui.yahooapis.com/combo?3.0.0/build/yui/yui-min.js&3.0.0/build/oop/oop-min.js&3.0.0/build/event-custom/event-custom-min.js&3.0.0/build/attribute/attribute-base-min.js&3.0.0/build/base/base-base-min.js

The little pieces that make up the YUI library can be combined in so many ways that pre-merging all possible bundles is not an option.

As part of YUI3, there was a PHP loader recently released which you can use to host your own combo solution - doesn't have to do anything with YUI. Or you can roll your own combo script, how hard could be to have a script that merges a number of files. Check Ed Eliot's script for inspiration/borrowing.

SmartOptimizer is another open source project that you can use for comboing, minification and so on.

If you do create your own combo script, don't forget to store the merged results in disk cache directory, so you don't have to re-merge the same bundles over and over.

CSS sprites are a pain

While combining text source files like CSS and JS is fairly trivial, creating sprites is a pain. Not only you have to create the image, but them get the coordinates and write the CSS rules. Then when an element in the sprite changes, it may affect the others and you may have to move them around, then update the CSS. Luckily there are tools.

Here are some tools and resources that should help with the sprites. CSS sprites are an excellent way to reduce HTTP components, especially when it comes to tiny icons, those that are sometimes smaller than the HTTP request/response headers needed to download them. It's a technique that is still underused.

And now, re-introducing CSSSprites.com :)

CSSSprites.com

CSSSprites.com is a fun weekend project I did two and a half years ago. Back then I was even more convinced that this technique is wildly underused, still considered unstable by many people, although it was fearlessly in use by prominent sites such as yahoo.com. There was also a bit of lack of understanding how to build sprites and also, just like today, it's just a pain to create them.

So I wanted to raise awareness of the technique by creating an online tool for sprites generation, probably the first tool (which explains why I was able to get such a domain name). The idea was just to let people know that this technique exists, it's not rocket science and you can start quickly by using the tool. The tool was actually quite ugly and I posted a comment on css-tricks.com asking if anyone wants to contribute a skin.

CSSSprites.com: BEFORE
csssprites.com - before

CSSSprites.com: AFTER
csssprites.com - after

Luckily, Chris Coyier who runs css-tricks.com responded and sent me a skin. It was exactly 2 years minus 2 days ago.

(I remember January 2nd 2008 right after new year's, I was writing a list of things I want to accomplish in 2008. It was a list of unfinished projects actually. I decided I was sick of not finishing projects so 2008 would be the year of completion. I wasn't going to start anything new until I finish all abandoned projects. Needless to say, as all new year resolutions, it didn't work. I actually started something pretty cool and at the same time never finished csssprites.com with Chris' skin)

So today, after two years of procrastination, I sat down and updated the site. Additional kick in the behind was that few days ago I had to take the site down, because the exceptionally reliable host site5.com in their care for the well being of the sites on the shared hosting, shut down all my sites, including this blog, because csssprites.com was consuming too much resources. I had to take it down. Today it's proudly back up hosted by dreamhost, the all-hero host that survived the smush.it explosion.

What's new in the tool?

  • nicer skin!
  • the old "skin" is sill available
  • some options - padding between sprite elements, background color, border size, choice of top vs. left alignment of the elements
  • no more gifs are being generated
  • the generated PNG is optimized with pngout and pngcrush
  • upgrade to YUI3

So give it a spin and comment with any problems/wishes and so on. I haven't posted the code yet, it's not in a presentable statge, plus the "core" of it, the generation of the sprite image, is already out there.

That's all folks

When I talked about optimizing personal project above, I had in mind sites like csssprites.com. Currently it's not optimized at all - it's a sprites tool that doesn't use sprites. But that's because I only worked one evening on it and it's probably broken here and there. I intend to revise it in a few weeks when it stabilizes.

So, parting words - combining components into atomic dowanloads is not without drawbacks, just like most other performance optimizations. But this doesn't mean we shouldn't do it. Care and analysing users' behavior is a way to optimize the process of optimizing but meanwhile just go with your feeling and look at the metrics to see if you were right. Rinse, repeat. When in doubt - err on the side of less HTTP requests.

 

Reducing the number of page components

Saturday, December 5th, 2009

Dec 5 This is the fifth in the series of performance articles as part of my 2009 performance advent calendar experiment. Stay tuned for the next articles.

Let's talk a but about waterfall optimization - the first thing that happens in Mr.Page's life. The best way to optimize and speed up the waterfall is to have less stuff in it. The fewer page components, the faster the page - simple as that.

Fewer components vs. component weight

The size of the page components, meaning their size in kB, is important. It makes sense - smaller pages will load faster, 100K JavaScript will load faster than 150K. It's important to keep sizes low, but it should be clear that the number of components is even more important than their filesize.

Why? Because every HTTP request has overhead.

OK, but how bad can it be, someone might ask. If you look at an HTTP request - it has a header and a body. A 100K body will greatly outweigh the size of the headers, no matter how bloated they are.

Here's the headers for a request to Yahoo! Search:

Host: search.yahoo.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5;) Firefox/3.5.5
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

To that request the server responds with the body (content) of the response prepended with some headers like:

HTTP/1.1 200 OK
Date: Sat, 05 Dec 2009 07:36:25 GMT
P3P: policyref="http://p3p.yahoo.com/w3c/p3p.xml", CP="CAO DSP COR CUR... blah, blah"
Set-Cookie: sSN=nTMt3Lo2...crazy stuff...nLvwVxUU; path=/;domain=.search.yahoo.com
Cache-Control: private
Connection: close
Content-Type: text/html; charset=ISO-8859-1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

<html lang="en"><head><meta...

This is 352 bytes request header and 495 bytes response header. Not so bad, eh? No matter how hard you try to make your cookies monster-size, the response body (9k gzipped in this case) will always be significantly bigger. So what's the problem with the overhead of the HTTP requests then?

The size of the headers is a problem when you make requests for small components - say requests for small icons - for example 1K or under. In this case you exchange 1K headers to get 1K of useful data to present to the user. Clearly a waste. Additionally this 1K of headers can grow once you start writing more cookies. It may very well happen that the HTTP headers size is bigger than the actual icon you need. And even if the headers are not bigger that the component, they are still big when you think percent-wise. 1K of 10K is 10%.

But the HTTP headers size is only one (and the smaller) of the problems.

The bigger problem is the HTTP connection overhead.

HTTP connection overhead

What happens (at a high level) when you type a URL and hit Enter? The browser sends a request to the server. Which server? The browser needs to know the IP address of the server, so if it doesn't have it in the cache, it makes a DNS lookup. Than the browser establishes a connection to the server. Then it waits for the first byte of the response from the server. Then it receives the full response (payload).

Here's how this looks like graphically represented by webpagetest.org

And the color legend:

  1. DNS lookup
  2. Initial connection
  3. TTFB (Time to first byte)
  4. Payload

So what do we have here - looks like in this particular case the browser is downloading content about 40% of the time. The rest of the time it's... well, not downloading content. How's that for an overhead. And the part of not downloading can be even greater, this above was just one example.

Now how about this - a bird-eye view of the 4 last pages in webpagetest.org's test history - just some random pages people have tested.

Do you see a lot of blue (time spent downloading content). Not as much as you would've hoped. There's some DNS lookups, some orange... and OMG, talk about going green! :)

In fact you may notice that the smaller the component, the smaller is the blue part.

What does all this tell us?

  1. A significant chunk of time is spent in activities other than downloading.
  2. Smaller components still incur HTTP overhead and for them the relative penalty (relative to their size) is atrocious.

So what's a performance optimizer to do? Reduce the number of components and so pay fewer penalties.

Just remove stuff

Truth is - a lot of stuff on the pages today is not needed. Features no one likes or uses clutter the page and make it heavier. Well, what can you do, the boss/client/marketing guy wants that feature there. What you can do is at least try. You can introduce some science into the marketing activities - measure how much a specific feature is used. Or if you already have the data - look at it. Decide what can a page go without.

It's going to be tough convincing people to remove stuff. After all you spend time developing it. Someone dreamt up that feature initially. Someone (not the users) loves it. People hate to let go. But still, it's worth to try.

Combine components

Now that the phase of convincing people to remove stuff is done, what's left needs to be combined. How do you combine components? Simple - all JavaScripts go into a single file, all CSS into a single file. All decoration images go into a sprite.

JavaScript example (from a page to remain anonymous)

Before:

<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.2.6/jquery.min.js"></script>
<script src="/javascripts/application.js?1258423604"></script>
<script src="/javascripts/ui/minified/jquery.ui.all.min.js?1258423604"></script>
<script src="/javascripts/ui-ext/ui.bgiframe.min.js?1258423604"></script>
<script src="/javascripts/ui-ext/ui.stars.pack.js?1258423604"></script>
<script src="/javascripts/ui-ext/ui.dimensions.js?1258423604"></script>
<script src="/javascripts/ext/jquery.form.min.js?1258423604"></script>

After:

<script src="/javascripts/all.js"></script>

Sizes, gzipped: 70029 bytes before, 65194 bytes after. Simply merging files and there's even 6.9% saving!
And the more important saving: 6 less HTTP requests

Repeat for CSS. Before:

/stylesheets/general.css?1258423604
/stylesheets/global.css
/stylesheets/ui.stars.css
/stylesheets/themes/enation/enation.all.css
/public/template/css/132/1245869225

After:

<link type="text/css" rel="stylesheet" href="/stylesheets/all.css" />

Sizes, gzipped: before 14781 bytes, after 13352 bytes, saving 9.6%.
But the bigger saving: 4 less HTTP requests.

If you wonder how come the sizes before and after are different since we merely concatenate the contents of the files, well, the savings come from gzip compression. When you have more characters in the file, there's more chance that some will repeat which means they will compress better. That's one. And then, the compression itself has an overhead which you incur once for the whole bundle of files, as opposed to for every file.

Now - let's make decoration images into sprites. Before:

... 15 image requests, 6.8K

After: (1 sprited image)

Result size: 1.4K, 7 times smaller!

Here the savings are so dramatic partially because the source files are GIFs and the result is a PNG8 but that's a whole other post.

So in conclusion: file concatenation is just awesome. You save both: bytes to download and, much more importantly, HTTP requests. Less of the green stuff in the waterfall!

x-type component concatenation

So far we combined .js with .js, css with css and images with images. How about a cross-component type concatenation?

You can inline images inside HTML and CSS (and why not JS if you want) using data URIs (another post coming).

And you can inline CSS and JS inside HTML too.

This means you can have your whole application inside of just one HTML file if you wish. Inside of the HTML you have inline styles, scripts and images.

Combining CSS with JS

Now how about mixing CSS and JS into one component. You can do that, and it's especially suitable for lazy-loaded, widget-type functionality.

Say you've loaded the page, then the user clicks some rarely used button. You haven't downloaded that content that is supposed to wow the user upon click of the button. So you send a request to grab it. The new content may come in the form of a JSON string. And what if the new content requires some stylesheet that was not part of the base page? You'll have to make another request to download that stylesheet too.

Or, you can download both content and styles in the same JSON response. You simply inline the style information as a string in the JSON. So:

1. uses clicks, you request feature.js which goes like:

{"content":"<p class=\"wow\">I'm a feature</p>", "style": "wow{font-size: 60px}"}

2. You process the JSON and shove the content into the page

var o = JSON.parse(xhr.responseText);
$('result').innerHTML = o.content;

3. You add the styles to the head:

var wow = document.createElement('style');
wow.type = "text/css";
if (wow.textContent) { // FF, Safari
    wow.textContent = o.style;
} else {
    wow.styleSheet.cssText = o.style; // FF, IE
}
document.documentElement.firstChild.appendChild(wow);

Nice and simple. Makes the features (that progressively enhance the page) atomic and self-contained.

More to reducing components?

For more and creative ways to reduce HTTP components you can take a look at MXHR and Comet

Another thing to check is the Keep-Alive setting on your server. Remember how there were 4 steps in the component download. When you request a second component you can have the connection open so that you don't need to re-establish it (skipping step 2). And since the DNS lookup was already made you get rid of step 1. Skipping 2 out of 4 is not bad at all.

Summary

Reducing the number of page of components is the top priority of any web performance optimization effort. HTTP requests are costly. In part because of the headers size overhead, but mostly because of the connection overhead. The browser spends a disturbing amount of time not downloading stuff, and we can't allow this!

 

Psychology of performance

Friday, December 4th, 2009

Dec 4 This is the fourth in the series of performance articles as part of my 2009 performance advent calendar experiment. Stay tuned for the next articles.

Measuring time is an important activity in your performance efforts. After all, how else would you know if gzipping your javascripts helped or it didn't matter, or it slowed you down (because your server is now busier)? You have to measure and monitor what you measure in order to quantify whether or not you're making progress. With the exception of your own personal sites you have to measure time in order to show your boss or client that you are, in fact, working on something that matters. Time is important but the milliseconds don't tell the whole story.

Remember folks, we're dealing with humans here. The users of our sites are mainly humans (with the exception of a few (ro)bots, spiders and worms) and humans are irrational, some would say predictably so. Not only are we irrational, but we also have a very skewed idea of reality, we're easy to manipulate and our senses are not nearly as accurate as we believe them to be.

Mind hacks

"Mind hacks" and "Your brain: the missing manual" are excellent reads that show how unreliable our brains and our senses are. There's no other way actually, the reality is extremely complex and there's so much information to absorb and process and make sense of at any given time. Doing everything properly is just not possible. We need to take shortcuts in information processing, make guesses and sometimes just make up stuff so that out mental idea or the outside world is complete and consistent.

Magicians for example use our attention and processing flaws to do "impossible" things. Optical illusions work the same way. In the image above both A and B squares are the same color. Have fun figuring it out (short of using a color picker). It is the same rgb (100, 100, 100) or #646464 but you literally have to hide everything else on that image in order to see it. And even then, when you look at the image again your mind will still refuse to believe that it's the same color. Not only we're easily swayed, but we're also reluctant to accept a fact even after it has been proven to us.

The problem with that image is that we perceive things relatively, compare new information with previous knowledge and make a decision based on that. We're also suckers for patterns.

That's just the way we work - imagine you wake up tomorrow and before getting out of bed you decide to consider all the options ahead of you, their pros and cons. The number of choices is so daunting, you'll never do anything, you'll just lay there and calculate the risks and uncertainties associated with your every move. Because, you know, getting out of bed has its drawbacks (and is, generally, wildly overrated ;))

Time

Back to the time - the unit of measure that helps us judge how fast a page is. Needless to say, our idea of time is also inaccurate. And relative. Think about it - time flies when you're doing something you love and crawls when you're on the line at the cache register (although smart folks have decided to put celebrity magazines there to keep you distracted from the thought how much you hate waiting).

Time is relative to our enjoyment of what we do.

Turns out time also flows differently depending on the age - perceived 3 minutes for a 20 year old are in reality 3:03 and for a 60 year old 3 minutes are in reality 3:40.

Time is also relative to our expectations.

Maister's First Law of Service (service as in fast food service) states that:

Service = Perception – Expectation

When people perceive that you exceed expectations, they are happy and everything is fast and pleasurable. So you have to care about how users perceive your page load time and also what their expectations were. Naturally, both of these are subjective to begin with.

Competition

When it comes to expectations you should consider what type of site you have and how does the competition's sites compare. This will give you an idea of what experience the users expect. If you're in the social networking business, no one expects you to be search engine-fast. People come to comment to their friends' photos, to write messages and so on. They can wait a second more when they'll be spending minutes crafting their comment, they are probably thinking about the comment already and don't necessarily pay attention to the load time. The search engine experience is different - users are hunting for information, they want to move away from your page ASAP and get to their answer.

Painful = slow

Back to relativity of time. You've probably noticed that when you're in pain time goes slower, when you're frustrated with a site that is in your way of finishing a task, you find it pretty slow. When your bank site prompts you with half-baked extra security features and prevents you from paying that late bill, you don't care that time-to-onload of that page was under a second. You still hate the experience and think the site was slow (paying the bill actually took you longer than you imagined it should take).

On the opposite, when you're happy and having fun, you can tolerate a slower site and it will actually feel faster than the miserable experience with the objectively faster bank site.

People also expect certain types of pages to behave a certain way. If it's a light page with one-two graphics, it better be fast. People will perceive it as slow if it doesn't meet their expectations. On the other hand, a YouTube page is ok to be slower, you're there to watch funny cats (and enjoy yourself) and everybody has learned through experience that web video is normally slower than text-only pages.

Another illustration of pain and frustration being equal to slow. In WWII people who were likely to be captured and tortured for information were trained to deal with pain by counting. Because pain makes time run slow, the prisoner feels the torture go on forever. When you count, you have a more realistic idea of time and that helps.

Speed doesn't matter

Of course it does, just wanted to make sure you're paying attention :)

There's a fascinating study (and podcast) that claim "the truth about download time". The findings were that when people complete their task they perceive the site as fast, although it may be slower than another site that frustrated them.

This study was done in 2001 and things have changed since then. I mean they talk about 36 seconds page load times for Amazon. I found it particularly amusing how they measured the real page load time - interns with stop watches looking at the videos of the usability study and checking when the Netscape logo stops spinning (actually was it IE's logo that was spinning and NS had some shooting stars?)

These days we have data showing that sub-second delays make you lose visitors (sometimes forever) and it's clear that when given a faster and slower version of the same site, people will prefer the faster, naturally. This study was comparing sites against each other and looks like with different tasks.

But still, the findings that task completion determines you speed perception is fascinating and something to keep in mind when designing user interactions. If you bog down the user with pages and pages of lengthy forms with insanely strict and annoying validation, then no amount of super fast page loading will cause the people judge your site fast.

Flow

Flow is a concept that aims to answer the question what makes us tick, motivates us and makes us happy. In the words of the flow researcher Mihály Csíkszentmihályi flow is:

Being completely involved in an activity for its own sake. The ego falls away. Time flies. Every action, movement, and thought follows inevitably from the previous one, like playing jazz. Your whole being is involved, and you're using your skills to the utmost.

Web page performance can play critical role in stimulating the sense of flow. If a page takes too long, you leave the flow, you start thinking about things other than what you do, the brain wanders and the experience is ruined. On the other hand - clear goals, immediate feedback, balance between skills and challenge help create and maintain the state of flow.

It's an interesting topic, you can read up a bit on how flow relates to web sites here (see the pdf) and here.

"In a blink of an eye" is fast, right? As fast as it gets. Turns out a blink of an eye lasts on average 300 to 400 miliseconds. Can you load your page in a blink of an eye? Probably pretty difficult, but what you can do is provide some feedback in this time window. This way help the user maintain flow and get an idea something is going on. That's why progressive rendering is really important (subject for a separate post) - to let the user know that there was no error and the page is coming. Feedback, progressive rendering and progress indicators of any kind communicate and reassure.

Talking about milliseconds here's another number to keep in mind - 200ms. Our eyes move in a succession about 5 times a second or every 200ms, taking snapshots of the outside world. But these 200ms are relative too. When something appears or moves we're likely to focus on it and stay focused (who knows maybe there's a danger in there :) ) for more than 200ms before the eye moves to examine the rest of the world. If that thing dissapears, we're likely to move our eyes faster. That's why if you want to highlight something on the screen, say you showed object A and then B, B is more likely to be noticed if you first hide A.

Transitions

Things in real world don't just pop and disappear. It's actually pretty scary when someone we didn't notice all of a sudden seem to appear from nowhere. In real world objects move from one state to another. You can simulate that on the screen with transitions and animations. Otherwise people are taken by surprise a little bit and are feeling not in control which may hurt the state of flow. Often if you animate something on the screen, it will feel more natural even if it takes longer. But not too long, of course, not to make it look unnatural (As a friend said for my first YUI animation - it felt painful and made him tense waiting for it to finish, because I was so proud I can animate stuff on the page that I made the animation take its time). Using easing animations is also preferable.

White is fast

Google Reader used to have this blue background on the left hand side menu (where the list of feeds is), but it's now white. Turns out they made a user study to ask people what they think given the two options and nothing else changed in the app. People consistently said that the version with the white background was faster, although it's the same page. How crazy is this?

Clutter

One last thought - clutter. In these times of TV and limited attention span (who doesn't suffer from ADD, raise hand!) an often mentioned advise is to minimize clutter. It makes sense - the less information you have to process, the less anxious you feel about the new environment (the new web page just loaded). We don't have control over how we percieve things - a flashing animation will attract our attention away from the reason why we're on this page. And we won't like that. So it's good to avoid clutter, too many options, decoration images of beautiful people shaking hands and so on. Less stuff will also mean less markup, images, less page weight, which is great for performance. But clutter is also... relative.

Clutter is not simply a lot of things. Clutter is a lot of useless things and distractions. If you have a lot of exactly what you need, that's not so bad. (If I had a room full of guitars and gear and virtually no option to walk in this room, I won't complain, I'd say it's awesome.)

Also - you can consider culture differences. In Asia people often find it better if they have more content and options to parse through and make sense of on the same page and they don't find it cluttered. They would prefer to wait a bit in order to have their options in front of them. For illustration - just checked that yahoo.com has 146 links on the page and Yahoo! Taiwan has 334.

Parting words

And this concludes day 4 of the performance advent calendar. Takeaways?

  • Time is relative
  • Keep flow in mind
  • Consider the competition, the purpose of the page, the user expectations
  • Try to answer "what's the user perceived load time?"
  • Provide feedback
  • Transitions
  • Progressive rendering
  • Strive to render something (part of the page) in a blink of eye