<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>coding@scribd</title>
	<atom:link href="http://coding.scribd.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://coding.scribd.com</link>
	<description>... the bits behind the docs ...</description>
	<lastBuildDate>Wed, 02 May 2012 03:16:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='coding.scribd.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://1.gravatar.com/blavatar/5550ac56b96f650fbcbe2043812ebf1d?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>coding@scribd</title>
		<link>http://coding.scribd.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://coding.scribd.com/osd.xml" title="coding@scribd" />
	<atom:link rel='hub' href='http://coding.scribd.com/?pushpress=hub'/>
		<item>
		<title>Why zooming on mobile is broken (and how to fix it)</title>
		<link>http://coding.scribd.com/2012/02/29/why-zooming-on-mobile-is-broken-and-how-to-fix-it/</link>
		<comments>http://coding.scribd.com/2012/02/29/why-zooming-on-mobile-is-broken-and-how-to-fix-it/#comments</comments>
		<pubDate>Wed, 29 Feb 2012 21:11:39 +0000</pubDate>
		<dc:creator>matthiaskramm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=384</guid>
		<description><![CDATA[Traditional zooming In a standard PDF viewer, suppose you&#8217;re reading a two-column document and you zoom into the left column. Now suppose that the font size is still too small to read comfortably. You zoom in further: What happens is that even though the font is now the size you want, it also cuts off [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=384&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<span style="text-align:center; display: block;"><a href="http://coding.scribd.com/2012/02/29/why-zooming-on-mobile-is-broken-and-how-to-fix-it/"><img src="http://img.youtube.com/vi/6VjVlhJGs6I/2.jpg" alt="" /></a></span>
<h2>Traditional zooming</h2>
<p>In a standard PDF viewer, suppose you&#8217;re reading a two-column document and you zoom into the left column. Now suppose that the font size is still too small to read comfortably. You zoom in further:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/three1.png" alt="" /></p>
<p>The font is now the size you want, but the left and right edges of the text are cut off.</p>
<h2>Scribd&#8217;s new reflow zoom</h2>
<p>With the new Scribd Android reader, what happens instead is this:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/three2.png" alt="" /></p>
<p>As soon as the left and right edges of the text hit the border, the app starts &#8216;reflowing&#8217; the text, nicely matching it to the screen size. Essentially, the document has been reformatted into a one-column document with no page breaks for mobile reading.</p>
<p>For a clearer understanding of what this means, please watch the video at the beginning of this post.</p>
<h2>How it works</h2>
<p>In order to render a &#8216;reflowed&#8217; version of the document text, we have to analyze the document beforehand (we actually do this offline, on our servers).</p>
<p>In particular, we have to:</p>
<ol>
<li>Analyze the layout and detect the reading order of the text</li>
<li>Detect and join back words where hyphens were used for line-wrapping</li>
<li>Remove page numbers, headers/footers, table of contents etc.</li>
<li>Interleave images with the text</li>
</ol>
<p>I&#8217;d like to talk about the first two of them here.</p>
<h2>Detecting the reading order of the text</h2>
<p>For starters, we need to figure out the reading order of the content on a page. In other words, given a conglomeration of characters on a page, how do we &#8220;connect the dots&#8221; so that all the words and sentences make sense and are in the right order?</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/read1.png" alt="" width="255" /><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/read2.png" alt="" width="255" /></p>
<p>Thankfully, PDF tends to store characters in reading order in its content stream.<br />
It doesn&#8217;t always (and what to do if it doesn&#8217;t is a topic for a whole blog post),<br />
but when it does, determining the reading order is as easy as reading the index of<br />
characters in the page content stream from the PDF.</p>
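<p>As a rough illustration of the easy case, here is a minimal Python sketch. The <code>Glyph</code> record and its <code>stream_index</code> field are hypothetical names for whatever a PDF parser hands back; they are assumptions for this example, not part of any particular library or of our server-side code.</p>

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str          # the character drawn on the page
    x: float           # horizontal position (hypothetical units)
    y: float           # vertical position (hypothetical units)
    stream_index: int  # position of the glyph in the page content stream


def text_in_stream_order(glyphs):
    """Return the page text, assuming the content stream is in reading order."""
    ordered = sorted(glyphs, key=lambda g: g.stream_index)
    return "".join(g.char for g in ordered)


# Glyphs as they might arrive from some other pass, not yet in stream order.
glyphs = [
    Glyph("i", 20.0, 700.0, 1),
    Glyph("H", 10.0, 700.0, 0),
    Glyph("!", 30.0, 700.0, 2),
]
print(text_in_stream_order(glyphs))
```

<p>This only covers documents whose content stream really is in reading order; the harder case needs actual layout analysis.</p>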
<h2>Detecting hyphenation, and joining back words</h2>
<p>Determining whether a hyphen at the end of a line is there because a word was hyphenated, or whether it&#8217;s just a so-called em dash, is trickier, especially since not everybody uses the typographically correct version of the em dash (Unicode U+2014). Consider these example sentences:</p>
<pre>The grass would be only rustling in the wind, and 
the pool rippling to the waving of the reeds—
the rattling teacups would change to tinkling sheep-
bells. The Cat grinned when it saw Alice. It looked good-
natured, she thought.</pre>
<p>When implementing an algorithm for detecting all these cases, it&#8217;s useful to have a dictionary handy (preferably in all the languages you&#8217;re supporting; for Scribd, that&#8217;s quite a few). That allows you to look up that &#8220;sheep-bell&#8221; is a word, whereas &#8220;reedsthe&#8221; is not.</p>
<p>It&#8217;s even better if the dictionary also stores word probabilities, allowing you to determine that &#8220;good-natured&#8221; is more probable than &#8220;natured&#8221;.</p>
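<p>To make the dictionary idea concrete, here is a minimal Python sketch. The word counts are made up for the example, and <code>join_across_line_break</code> is a hypothetical helper name; a real system would load a table per language and would also need to recognize true dashes.</p>

```python
# Hypothetical word counts; a real system would load one table per language.
WORD_COUNTS = {
    "rippling": 120,
    "sheep-bells": 4,
    "good-natured": 90,
    "goodnatured": 1,
}


def join_across_line_break(left, right, counts=WORD_COUNTS):
    """Join a fragment ending in '-' with the fragment starting the next line."""
    if not left.endswith("-"):
        raise ValueError("left fragment must end with a hyphen")
    dropped = left[:-1] + right   # hyphen was only there for line-wrapping
    kept = left + right           # hyphen is part of the word itself
    best = max((dropped, kept), key=lambda word: counts.get(word, 0))
    if counts.get(best, 0) == 0:
        # Neither spelling is a known word, so the hyphen was probably a
        # stand-in for a dash: leave the two fragments as separate words.
        return left + " " + right
    return best


print(join_across_line_break("rip-", "pling"))
print(join_across_line_break("sheep-", "bells"))
print(join_across_line_break("good-", "natured"))
print(join_across_line_break("reeds-", "the"))
```

<p>Comparing counts rather than just checking membership is what lets the more probable spelling win when both candidates appear in the dictionary.</p>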
<h2>Learn more</h2>
<p>If you want to try this out for yourself, you can download our implementation<br />
<a href="http://market.android.com/details?id=com.scribd.app.reader0">from the Android market</a>.</p>
<p>Right now, we have a choice selection of books and documents that offer this functionality. Soon, we will roll it out to a large share of our content.</p>
<p><em>Matthias Kramm</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2012/02/29/why-zooming-on-mobile-is-broken-and-how-to-fix-it/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/22c8a54e73393ef203e0d2b5b4f4cce8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">matthiaskramm</media:title>
		</media:content>

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/three1.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/three2.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/read1.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/read2.png" medium="image" />
	</item>
		<item>
		<title>What Start-Ups Like to See in Resumes from College Students and Entry-Level Candidates</title>
		<link>http://coding.scribd.com/2011/10/04/what-start-ups-like-to-see-in-resumes-from-college-students-and-entry-level-candidates/</link>
		<comments>http://coding.scribd.com/2011/10/04/what-start-ups-like-to-see-in-resumes-from-college-students-and-entry-level-candidates/#comments</comments>
		<pubDate>Tue, 04 Oct 2011 20:15:53 +0000</pubDate>
		<dc:creator>robojenny</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=378</guid>
		<description><![CDATA[This post was originally posted on jenny.webs.com. Last week, I represented Scribd at my alma mater Carnegie Mellon&#8217;s job fair, the TOC. While the quality of the students that we met was incredible, beyond what our recruiters had expected, the overall quality of the resume writing was honestly atrocious. I&#8217;m sure college seniors feel like they [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=378&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><em>This post was originally posted on jenny.webs.com.</em></p>
<p>Last week, I represented Scribd at my alma mater Carnegie Mellon&#8217;s job fair, the <a href="http://toc.web.cmu.edu/" target="_blank">TOC</a>. While the quality of the students that we met was incredible, beyond what our recruiters had expected, the overall quality of the resume writing was honestly atrocious. I&#8217;m sure college seniors feel like they don&#8217;t have a lot yet to put on a resume, but there are certainly ways to stand out. This post is specifically about how to make your resume stand out to a start-up.</p>
<p>Why a start-up? <a href="http://techcrunch.com/2011/01/25/tc-cribs-take-a-doc-on-the-wild-side-at-scribd-with-bonus-go-karts/" target="_blank">Start-ups are a lot of fun</a>. The atmosphere is also closer to college life, which makes for an easier transition. Working at a start-up is also a really great way to learn a ton, since they are generally small with a lot of work and interesting problems. The start-ups I&#8217;ve worked at include <a href="http://www.overturecorp.com/" target="_blank">Overture Technologies</a>, <a href="http://webs.com/" target="_blank">webs.com</a>, and now <a href="http://scribd.com/" target="_blank">Scribd</a>. The hours are extremely flexible (anything before 11am is considered &#8220;early&#8221;), the offices are filled with toys, snacks and drinks, and I really respected the intelligence, abilities, and passion of my co-workers. The last point goes to show why these companies look for these qualities more than for a list of skills. This post describes how to showcase these points on your resume.</p>
<p><strong><span style="font-size:small;">List Personal Projects</span></strong><br />
Personal projects are a way to differentiate yourself from other candidates; they show that programming is part of who you are in life, not just your job. Personal projects can be as small and simple as a little game or a tutorial for a new language or framework you went through not because you needed to know it, but because you wanted to learn more about it. We weigh these things more heavily than academic or work projects. I was flabbergasted to find that when I asked one student why his personal projects weren&#8217;t on his resume, he answered, &#8220;They weren&#8217;t real projects; I just did them for fun.&#8221; That is exactly the reason they <em>should</em> be on your resume! Not everyone programs for fun. We want people for whom programming is fun, not just work.</p>
<p><strong><span style="font-size:small;">List Only Academic Projects that Differentiate You</span></strong><br />
Most companies send alumni back to represent them at college job fairs. Being a Carnegie Mellon School of Computer Science alumna, I could easily recognize all the academic projects listed on various students&#8217; resumes. While &#8220;Defusing a Binary Bomb&#8221; is really a great project and in fact the one I like to use as an example of why I thought our homework projects were really well-written, it is something that every CMU computer science student has to do. Meanwhile, projects in elective classes or ones where you must self-design your project help a potential employer understand what sort of problems interest you. Specifying that it was an elective helps recruiters less familiar with various academic programs parse your resume.</p>
<p><strong><span style="font-size:small;">Specify Links that Reference You and Your Work</span></strong><br />
Especially for college students, it is huge when we see someone who has a github account. Whether the github account shows original code, forked projects, or simply following other projects, it shows not only interest in the industry but also that you are already a part of the community. Links to projects are also useful; we can not only read about your work, we can see it first hand. Even links to twitter accounts or blogs are good as well. Particularly if you are applying to a social media start-up, it is good to see that you are a user of social media tools yourself and already have some domain knowledge.</p>
<p><strong><span style="font-size:small;">List your Hobbies</span></strong><br />
We review so many resumes. Hobbies make us feel like you&#8217;re more of a person than just a list of skills and qualifications. It also may help us determine whether you&#8217;d be a culture fit with the company.</p>
<p><strong><span style="font-size:small;">Realize the Skills List is <em>Not</em> the Most Important Part of Your Resume</span></strong><br />
While more traditional companies may look for a checklist of skills, this is not the start-up mentality; start-ups look for smart, passionate people who can learn and pick up anything. I remember being an entry-level candidate and listing skills that I had, but didn&#8217;t really want to pursue at a company (like my sysadmin experience). I thought it was better to fill my resume with anything I could do rather than leave it off. As a result, of course, I piqued the interest of several companies wanting me to do a role I was not interested in. I like to state it this way to candidates: if you are in the middle of figuring out a problem on Friday and you really have to leave work to go to a friend&#8217;s birthday dinner, what sort of problem would make you more likely to want to continue figuring it out over the weekend rather than waiting until Monday to get back to it? I&#8217;m not saying don&#8217;t list all the various skills you have, but know your passions and be sure to be forthright about what you are passionate about versus what you just &#8220;know how to do&#8221; or are simply &#8220;willing to do&#8221;.</p>
<p>The start-up hiring mentality is just different from the traditional hiring mentality; what your parents advise you or what your college teaches may not be applicable here. Passion, intelligence, and love of the industry are what matter. If you are looking to apply to both start-ups and non-start-ups, I actually advise you to make two different resumes that emphasize things differently and make your own choice on what type of company you prefer after you get to see their offices and meet the employees.</p>
<p>If you are interested in working at Scribd (located in San Francisco) or webs.com (located in the DC area), please check out their job pages at <a href="http://www.scribd.com/jobs" target="_blank">http://www.scribd.com/jobs</a> and <a href="http://webs.com/Careers/" target="_blank">http://webs.com/Careers/</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2011/10/04/what-start-ups-like-to-see-in-resumes-from-college-students-and-entry-level-candidates/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/19e23efb135e24234b9793c5fa43a044?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">robojenny</media:title>
		</media:content>
	</item>
		<item>
		<title>Shrink your iOS app by turning PNG compression up to 11</title>
		<link>http://coding.scribd.com/2011/09/07/shrink-your-ios-app-by-turning-png-compression-up-to-11/</link>
		<comments>http://coding.scribd.com/2011/09/07/shrink-your-ios-app-by-turning-png-compression-up-to-11/#comments</comments>
		<pubDate>Thu, 08 Sep 2011 04:57:26 +0000</pubDate>
		<dc:creator>jaredfriedman</dc:creator>
				<category><![CDATA[iOS development]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=367</guid>
		<description><![CDATA[This post was written by John Engelhart, an iOS developer at Scribd and author of the JSONKit library. So you have a lot of PNG images in your iPhone app&#8230; When I started here at Scribd, we were just a few weeks away from launching our first iPhone app&#8211; Float. Being a new hire obviously [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=367&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>This post was written by John Engelhart, an iOS developer at Scribd and author of the <a href="https://github.com/johnezang/JSONKit">JSONKit</a> library.</p>
<h2>So you have a lot of PNG images in your iPhone app&hellip;</h2>
<p>When I started here at Scribd, we were just a few weeks away from launching our first iPhone app&ndash; <a href="http://itunes.apple.com/us/app/float-reader/id447992005?mt=8">Float</a>.</p>
<p>Being a new hire obviously meant that I didn&rsquo;t know the code base.  Being just a few weeks away from launch obviously meant that there was a strong focus on getting something out the door.  So the first thing I did was start making huge, sweeping fundamental architecture changes like swapping out all the XML REST stuff for JSON, and replacing the JSON parser that was currently being used with <a href="https://github.com/johnezang/JSONKit">JSONKit</a>, because <a href="https://github.com/johnezang/JSONKit">JSONKit</a> is <em>really, really fast</em>.  Just look at those graphs!  Does it happen to parse JSON correctly?  Are numbers arbitrarily and silently truncated to 32 or 64 bits haphazardly?  Are floating point values preserved correctly when round tripped?  Who cares! That simple graph tells me everything I need to know about those complicated technical issues: <em>it&rsquo;s fast!</em>  Anyone who suggested this had anything to do with the fact that I was the author of <a href="https://github.com/johnezang/JSONKit">JSONKit</a> was quickly silenced&hellip;</p>
<p>Oh, no, wait&hellip; that&rsquo;s right, that&rsquo;s not the way it happened&hellip;  In reality it was obvious that no matter how much I might like to contribute to getting the app out the door, odds were that I would either slow things down or screw something important up because of my unfamiliarity with the code base.  One thing that caught my eye was that the application had a lot of <code>PNG</code> image assets, and in my various adventures in the great city of life, I knew that you could often easily make <code>PNG</code> images even smaller.</p>
<p>This seemed like a good project that I could work on:</p>
<ul>
<li>It was independent of what everyone else was doing, so no one would have to stop and explain how something in the app worked.</li>
<li>It was something that would probably either work or it wouldn&rsquo;t.  It would also be pretty unambiguous about whether or not it was causing problems.</li>
<li>It could be easily and trivially backed out if a problem was found, even up until the very last second&hellip; as long as you kept the original <code>PNG</code> images, which seemed pretty obvious.</li>
<li>I could actually contribute to the app that was going to ship in a few weeks, even if it only meant that I &ldquo;saved a few bytes that the end user has to download and takes up on their iPhone&rdquo;.</li>
</ul>
<h2>Small details</h2>
<blockquote><p>In astronomy, you first enjoy three or four years of confusing classes, impossible problem sets, and sneers from the faculty.  Having endured that, you&rsquo;re rewarded with an eight-hour written exam, with questions like: &ldquo;How do you age-date meteorites using the elements Samarium and Neodymium?&rdquo;  If you survive, you win the great honor and pleasure of an oral exam by a panel of learned professors.</p>
<p>I remember it vividly.  Across a table, five profs.  I&rsquo;m frightened, trying to look casual as sweat drips down my face.  But I&rsquo;m keeping afloat; I&rsquo;ve managed to babble superficially, giving the illusion that I know something.   Just a few more questions, I think, and they&rsquo;ll set me free.  Then the examiner over at the end of the table&mdash;the guy with the twisted little smile&mdash;starts sharpening his pencil with a penknife.</p>
<p>&ldquo;I&rsquo;ve got just one question, Cliff,&rdquo; he says, carving his way through the Eberhard-Faber.  &ldquo;Why is the sky blue?&rdquo;</p>
<p>My mind is absolutely, profoundly blank.  I have no idea.  I look out the window at the sky with the primitive, uncomprehending wonder of a Neanderthal contemplating fire.  I force myself to say something&mdash;anything.  &ldquo;Scattered light,&rdquo; I reply.  &ldquo;Uh, yeah, scattered sunlight.&rdquo;</p>
<p>&ldquo;Could you be more specific?&rdquo;</p>
<p>Well, words came from somewhere, out of some deep instinct of self-preservation.  I babbled about the spectrum of sunlight, the upper atmosphere, and how light interacts with molecules of air.</p>
<p>&ldquo;Could you be more specific?&rdquo;</p>
<p>I&rsquo;m describing how air molecules have dipole moments, the wave-particle duality of light, scribbling equations on the blackboard, and&hellip;</p>
<p>&ldquo;Could you be more specific?&rdquo;</p>
<p>An hour later, I&rsquo;m sweating hard.  His simple question&mdash;a five-year-old&rsquo;s question&mdash;has drawn together oscillator theory, electricity and magnetism, thermodynamics, even quantum mechanics.  Even in my miserable writhing, I admired the guy.</p>
</blockquote>
<p>While &ldquo;saving a few bytes&rdquo; might seem trivial, <a href="https://plus.google.com/u/2/107117483540235115863/posts/gcSStkKxXTw">small details like that matter to me</a>.  Whether or not someone is willing to pay attention to the small details can say a lot about them.  The above quote from Clifford Stoll&rsquo;s <em>The Cuckoo&rsquo;s Egg: Tracking a Spy Through the Maze of Computer Espionage</em> is sort of like the culmination of a lot of small details&ndash; the sky is blue for a reason, often for seemingly trivial, small details&hellip; but those small details form a long, causally related chain.  I think it also eloquently illustrates that while small details matter, knowing which small details matter, and the causal relationship between them, is just as important.  Just knowing that &ldquo;Why is the sky blue?&rdquo; is an interesting question can reveal just as much about someone.</p>
<p>There&rsquo;s a lot of small, trivial details involved in something as simple as &ldquo;optimizing an iOS device&rsquo;s PNG images&rdquo;.  For example, once Xcode.app has built the app, you cannot modify any of the files in the application&rsquo;s bundle because that will invalidate its code signing.  There&rsquo;s also the small detail that the <code>PNG</code> images that end up in your application&rsquo;s bundle aren&rsquo;t <a href="http://www.w3.org/TR/PNG/">PNG standard</a> conforming, but are actually an Apple proprietary <code>PNG</code> extension.</p>
<h2>Turning iPhone PNG optimization up to eleven</h2>
<p>Xcode.app has a build setting that you may not be aware of&ndash; <code>Compress PNG Files</code>, and for new Xcode.app iPhone projects it is set to <code>Yes</code> by default.</p>
<p>For the vast majority of projects the only time it is ever set is when the project was initially created&hellip; which is probably one of the reasons why you&rsquo;ve never heard of it.  If you did happen to notice the <code>Compress PNG Files</code> build setting, the only other option is <code>No</code>.  Given these two choices, who wouldn&rsquo;t want their <code>PNG</code> files compressed?  <code>Yes</code>, please!</p>
<h3>What it does</h3>
<p>When you build your project, and the target is an iOS device, not the simulator, the <code>Compress PNG Files</code> build setting causes any <code>PNG</code> resources that are copied into your application&rsquo;s bundle to go through a preprocessing step that optimizes them for iOS devices.</p>
<p>Apple has not published any of the details as to what it specifically means to &ldquo;<a href="http://developer.apple.com/library/ios/#qa/qa1681/_index.html">optimize a <code>PNG</code> image for iOS devices</a>&rdquo;, but others have <a href="http://iphonedevwiki.net/index.php/CgBI_file_format">reverse engineered</a> at least some of it:</p>
<ul>
<li>Extra critical chunk (<code>CgBI</code>).</li>
<li>Byteswapped (<code>RGBA</code> &ndash;&gt; <code>BGRA</code>) pixel data, presumably for high-speed direct blitting to the framebuffer.</li>
<li><a href="http://www.zlib.net/"><code>zlib</code></a> header, footer, and CRC removed from the <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a>.</li>
<li>Premultiplied alpha (<code>color&prime; = color * alpha / 255</code>).</li>
</ul>
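<p>The byteswap and premultiplication steps from the list above can be sketched in a few lines of Python. This is only an illustration of the formula; the function name is hypothetical, and the rounding mode (integer division) is an assumption, since Apple has not documented what its tool does.</p>

```python
def to_premultiplied_bgra(rgba):
    """Convert straight-alpha RGBA bytes to premultiplied BGRA bytes."""
    if len(rgba) % 4 != 0:
        raise ValueError("expected 4 bytes per pixel")
    out = bytearray(len(rgba))
    for i in range(0, len(rgba), 4):
        r, g, b, a = rgba[i:i + 4]
        # color' = color * alpha / 255 (integer division is an assumption)
        out[i] = b * a // 255      # blue moves to the first byte
        out[i + 1] = g * a // 255
        out[i + 2] = r * a // 255  # red moves to the third byte
        out[i + 3] = a             # alpha itself is left untouched
    return bytes(out)


pixel = bytes([255, 128, 0, 128])  # orange at roughly 50% alpha
print(list(to_premultiplied_bgra(pixel)))  # [0, 64, 128, 128]
```

<p>Note that premultiplication throws away color information wherever alpha is below 255, which is one reason the result is no longer a standard <code>PNG</code>.</p>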
<h3>Like <code>gzip -9</code>, except this one goes to <code>gzip -11</code></h3>
<p>Most <code>PNG</code> optimization tools tend to perform optimizations at the <code>PNG</code> level, such as:</p>
<ul>
<li>Color reduction (e.g., 24-bit <code>RGB</code> to 256 indexed color conversion).</li>
<li>Bit depth reduction (e.g., 8 bits each for Red, Green, and Blue down to 4 bits each).</li>
<li>Optimizing some of the <a href="http://www.zlib.net/"><code>zlib</code></a> library&rsquo;s user-tunable settings.</li>
<li><code>PNG</code> filter optimization.</li>
</ul>
<p>The <a href="http://www.w3.org/TR/PNG/"><code>PNG</code> standard</a> specifies a number of <a href="http://www.w3.org/TR/PNG/#9Filters">predefined filters</a> that can be applied to an image and that often improve compression.  It&rsquo;s difficult to tell in advance which filter will give the best results for a particular image, so <code>PNG</code> optimizers usually try several of them.  As you can probably imagine, the number of combinations of different options grows rather quickly, so there is usually an option to specify how many of the different combinations will be tried in an effort to optimize the <code>PNG</code> image&rsquo;s size.  As is often the case with such brute force techniques, the amount of time it takes to try the different combinations tends to grow exponentially, while the improvements gained for the extra effort shrink just as quickly&ndash; the dreaded diminishing returns, where more and more work gets you less and less of an improvement.</p>
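<p>Here is a minimal Python sketch of that brute force idea, assuming only two of the standard filter types (None and Sub) and one filter for the whole image. Real optimizers try more filters, often choose per scanline, and vary the <code>zlib</code> settings as well; the function names are hypothetical.</p>

```python
import zlib


def filter_none(row, bpp):
    # Filter type 0: the scanline is stored unchanged.
    return bytes([0]) + bytes(row)


def filter_sub(row, bpp):
    # Filter type 1: each byte minus the byte one pixel (bpp bytes) to the left.
    out = bytearray([1])
    for i, value in enumerate(row):
        left = row[i - bpp] if i >= bpp else 0
        out.append((value - left) % 256)
    return bytes(out)


def best_filter(rows, bpp):
    """Compress the image once per filter and report the smallest result."""
    sizes = {}
    for name, apply_filter in (("none", filter_none), ("sub", filter_sub)):
        filtered = b"".join(apply_filter(row, bpp) for row in rows)
        sizes[name] = len(zlib.compress(filtered, 9))
    return min(sizes, key=sizes.get), sizes


# A smooth horizontal gradient: 64 rows of 256 grayscale pixels.
rows = [bytes(range(256)) for _ in range(64)]
name, sizes = best_filter(rows, bpp=1)
print(name, sizes)
```

<p>On a gradient like this one the Sub filter turns every row into a run of identical bytes, which is why it compresses so much better than the unfiltered data.</p>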
<p>One <code>PNG</code> optimization tool stands apart from the rest, however: the <code>advpng</code> optimizer from the <a href="http://advancemame.sourceforge.net/comp-readme.html">AdvanceCOMP</a> recompression utilities.  This <code>PNG</code> optimizer does most of its optimization at the <a href="http://www.zlib.net/"><code>zlib</code></a> level&ndash; instead of using the standard <a href="http://www.zlib.net/"><code>zlib</code></a> library, it uses the <a href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a> (the standard that defines the <a href="http://www.zlib.net/"><code>zlib</code></a> compression format) implementation from the <a href="http://en.wikipedia.org/wiki/7-Zip">7-Zip</a> / <a href="http://en.wikipedia.org/wiki/Lzma">LZMA</a> compression engine.  Most of the time, the <a href="http://en.wikipedia.org/wiki/7-Zip">7-Zip</a> / <a href="http://en.wikipedia.org/wiki/Lzma">LZMA</a> <a href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a> / <a href="http://www.zlib.net/"><code>zlib</code></a> compression engine is able to do a better job, and thus produce a smaller compressed result, than the standard <a href="http://www.zlib.net/"><code>zlib</code></a> library at its maximum compression setting.</p>
<p>However, the <code>advpng</code> tool does not perform any of the optimization strategies that the common <code>PNG</code> optimizers use, and in fact will undo any of the optimizations that they performed when it recompresses the result using the <a href="http://en.wikipedia.org/wiki/7-Zip">7-Zip</a> / <a href="http://en.wikipedia.org/wiki/Lzma">LZMA</a> compression engine.  And you can forget about using it on quirky, proprietary <code>PNG</code> image formats that aren&rsquo;t <code>PNG</code> standards compliant&hellip;</p>
<h3>What would be great is&hellip;</h3>
<p>The majority of a <code>PNG</code> image is contained in the <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a>&ndash; it contains the actual pixels that make up the image.  The <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a> is compressed using standard <a href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a> / <a href="http://www.zlib.net/">zlib</a> compression.  What&rsquo;s really needed is a tool that just recompresses the <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a> using the <a href="http://en.wikipedia.org/wiki/7-Zip">7-Zip</a> / <a href="http://en.wikipedia.org/wiki/Lzma">LZMA</a> compression engine, while leaving everything else unmodified.</p>
<p>Well, <em>Good News, Everyone!</em>  Just such a tool exists: the <code>advpngidat</code> tool, which is part of <a href="https://github.com/scribd/advancecomp">Scribds AdvanceCOMP fork on github.com</a>.  Not only that, it happens to work correctly with Apples non-standard <code>PNG</code> format!  This means you can make the <code>PNG</code> images in your iOS applications bundle even smaller.  Naturally, <em>your milage may vary</em>, and it wont be able to make every <code>PNG</code> smaller, but it can usually compress your iOS <code>PNG</code> images an additional 5% &ndash; 7%.</p>
<h3>Turning Xcode.app up to eleven</h3>
<p>So how do you turn your iOS project&rsquo;s <code>PNG</code> compression up to eleven using Xcode.app?  You use Scribd&rsquo;s Xcode.app <code>PNG</code> optimizer enhancement, also <a href="https://github.com/scribd/Xcode-OptimizePNG">available on github.com</a>.</p>
<p><strong>Important:</strong> Scribd&rsquo;s Xcode.app <code>PNG</code> optimizer enhancement directly modifies configuration files that are private to Xcode.app!</p>
<p>While the Xcode.app <code>PNG</code> optimizer enhancement modifies private Xcode.app files, the changes it makes are relatively benign:</p>
<ul>
<li>It modifies some <code>.xcspec</code> files that are used to enable the <code>Compress PNG Files</code> build setting in the GUI by changing the build setting from a <code>boolean</code> to a multiple choice.</li>
<li>It modifies some related files to modify and add the descriptions that are displayed in help and info displays.</li>
<li>It modifies some <code>perl</code> and shell scripts that perform the actual copy and &ldquo;optimize the <code>PNG</code> image for iOS devices&rdquo; step so that, depending on the additional build setting options, they pass the optimized <code>PNG</code> image to <code>advpngidat</code> for additional compression.</li>
</ul>
<p>The end result is this: the <code>Compress PNG Files</code> build setting, which was a simple <code>Yes</code> / <code>No</code> boolean, turns into a multiple-choice build setting:</p>
<table>
<thead>
<tr>
<th align="left">  Setting  </th>
<th align="left"> Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"> <code>None</code>    </td>
<td align="left"> Identical to the unmodified <code>Compress PNG Files</code> <code>No</code> setting.</td>
</tr>
<tr>
<td align="left"> <code>Low</code>     </td>
<td align="left"> Identical to the unmodified <code>Compress PNG Files</code> <code>Yes</code> setting.  This uses the Apple proprietary version of <a href="http://pmt.sourceforge.net/pngcrush/"><code>pngcrush</code></a> to optimize <code>PNG</code> files for iOS devices.</td>
</tr>
<tr>
<td align="left"> <code>Medium</code>  </td>
<td align="left"> The compressed <code>PNG</code> files from the <code>Low</code> setting are further optimized by the <code>advpngidat</code> command.</td>
</tr>
<tr>
<td align="left"> <code>High</code>    </td>
<td align="left"> The same as <code>Medium</code>, except a handful of carefully chosen <code>-m</code> compression methods that work much better in practice are used instead of the default heuristic used by <a href="http://pmt.sourceforge.net/pngcrush/"><code>pngcrush</code></a>.</td>
</tr>
<tr>
<td align="left"> <code>Extreme</code> </td>
<td align="left"> The same as <code>Medium</code>, except <a href="http://pmt.sourceforge.net/pngcrush/"><code>pngcrush</code></a> is passed the <code>-brute</code> option which tries all of the compression method permutations.<br /><strong>Warning:</strong> This can take a <strong><em>very</em></strong> long time!</td>
</tr>
</tbody>
</table>
<h4>It even goes to twelve, but your puny iOS device can&rsquo;t handle it&hellip;</h4>
<p>Unfortunately, you should not use the <code>High</code> and <code>Extreme</code> settings.  While iOS versions &lt; 5.0 had no problems with <code>PNG</code> images compressed with either setting, iOS 5.0 will not correctly display <code>PNG</code> images compressed at either <code>High</code> or <code>Extreme</code>.  Although it depends on the particulars of the image, some images will be displayed using the wrong colors.  Of course, there could be other problems as well, as the image format is an unpublished, non-standard <code>PNG</code> extension.</p>
<p>That being said, the <code>Medium</code> compression setting seems to work just fine&ndash; the only optimization it does is recompress the <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a> using a better <a href="http://www.ietf.org/rfc/rfc1950.txt">RFC 1950</a> / <a href="http://www.zlib.net/">zlib</a> compression engine.  Everything else in the <code>PNG</code> file is passed through unmodified.</p>
<h3>Help fight random entropy!</h3>
<p>Take a look at Scribd&rsquo;s <a href="https://github.com/scribd/advancecomp">AdvanceCOMP fork</a> and <a href="https://github.com/scribd/Xcode-OptimizePNG">Xcode.app <code>PNG</code> optimizer enhancement</a> (which requires the <code>advpngidat</code> tool from the <a href="https://github.com/scribd/advancecomp">AdvanceCOMP fork</a>), both available on github.com.  After reading the documentation, and assuming you&rsquo;re comfortable with modifying some of Xcode.app&rsquo;s private files, install them both.</p>
<p>Once installed, simply set your Xcode.app iOS project&rsquo;s <code>Compress PNG Files</code> build setting to <code>Medium</code>, and do your part in the fight against random entropy!</p>
<h2>Just how many useless bytes were saved?</h2>
<table>
<tr>
<th>  Setting  </th>
<th> Size (bytes) </th>
<th> % of <code>Low</code> </th>
<th> % of <code>Extreme</code> </th>
</tr>
<tr>
<td align="left"> <code>Low</code>     </td>
<td align="right">                 9740448 </td>
<td align="right">        100.0% </td>
<td align="right">         131.3% </td>
</tr>
<tr>
<td align="left"> <code>Medium</code>  </td>
<td align="right">                 8969108 </td>
<td align="right">         92.1% </td>
<td align="right">         120.9% </td>
</tr>
<tr>
<td align="left"> <code>High</code>    </td>
<td align="right">                 7756942 </td>
<td align="right">         79.6% </td>
<td align="right">         104.6% </td>
</tr>
<tr>
<td align="left"> <code>Extreme</code> </td>
<td align="right">                 7418479 </td>
<td align="right">         76.2% </td>
<td align="right">         100.0% </td>
</tr>
</table>
<p>As previously mentioned, a problem was discovered with iOS 5.0 with some images compressed using either <code>High</code> or <code>Extreme</code>.  This is most likely due to the fact that the Apple proprietary &ldquo;optimized for iOS devices&rdquo; format seems to only use a <a href="http://www.w3.org/TR/PNG/#9Filters">PNG filter</a> setting of <code>None</code>.  This means that the decompressed result can be used without any additional per-pixel filter processing.</p>
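<p>For a standard <code>PNG</code>, you can check which filter types an optimizer actually chose by inflating the <code>IDAT</code> data and looking at the first byte of every scanline.  The sketch below does that for non-interlaced images only; <code>filter_histogram</code> is a hypothetical helper written for this post, and it does not understand Apple&rsquo;s non-standard format.</p>

```ruby
require 'zlib'

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b
FILTER_NAMES  = %w[None Sub Up Average Paeth].freeze
# Samples per pixel, keyed by the IHDR color type.
CHANNELS = { 0 => 1, 2 => 3, 3 => 1, 4 => 2, 6 => 4 }.freeze

# Count how often each PNG filter type is used, one count per scanline.
def filter_histogram(bytes)
  bytes = bytes.b
  raise ArgumentError, 'not a PNG file' unless bytes.start_with?(PNG_SIGNATURE)

  ihdr = nil
  idat = ''.b
  pos = PNG_SIGNATURE.bytesize
  while pos + 8 <= bytes.bytesize
    length, type = bytes.byteslice(pos, 8).unpack('Na4')
    data = bytes.byteslice(pos + 8, length)
    ihdr = data if type == 'IHDR'
    idat << data if type == 'IDAT'
    pos += 12 + length
  end
  raise ArgumentError, 'missing IHDR chunk' unless ihdr

  width, _height, depth, color, _comp, _filter, interlace = ihdr.unpack('NNCCCCC')
  raise ArgumentError, 'interlaced images are not handled' unless interlace.zero?

  # Every scanline is one filter-type byte followed by the packed pixel bytes.
  stride = 1 + (width * CHANNELS.fetch(color) * depth + 7) / 8
  raw = Zlib::Inflate.inflate(idat)
  counts = Hash.new(0)
  (0...raw.bytesize).step(stride) do |offset|
    counts[FILTER_NAMES.fetch(raw.getbyte(offset))] += 1
  end
  counts
end
```

<p>An image whose histogram contains only <code>None</code> can be used straight after decompression; any other filter type means each scanline has to be reconstructed from its neighbours first.</p>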
<p>So, in the end, we were only able to use the <code>Medium</code> setting, which only optimizes a <code>PNG</code> image&rsquo;s <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a>, leaving the rest of the bytes completely unmodified.  Still, this resulted in a savings of 7.9%, which translates into roughly 753 KB (771,340 bytes) shaved off the final application bundle.</p>
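<p>The arithmetic behind those figures is easy to recheck.  This small Ruby snippet recomputes each size as a percentage of the <code>Low</code> result and the savings from <code>Low</code> to <code>Medium</code>, using only the byte counts from the table; the constant and method names are just for illustration.</p>

```ruby
# Bundle PNG sizes in bytes, copied from the table above.
SIZES = {
  'Low'     => 9_740_448,
  'Medium'  => 8_969_108,
  'High'    => 7_756_942,
  'Extreme' => 7_418_479
}.freeze

# Size as a percentage of a baseline, rounded to one decimal place.
def percent_of(size, baseline)
  (100.0 * size / baseline).round(1)
end

SIZES.each do |setting, size|
  puts format('%-8s %9d bytes  %5.1f%% of Low', setting, size, percent_of(size, SIZES['Low']))
end

saved_bytes = SIZES['Low'] - SIZES['Medium']
puts format('Medium saves %d bytes (%.1f KiB, %.1f%% of Low)',
            saved_bytes, saved_bytes / 1024.0, 100.0 * saved_bytes / SIZES['Low'])
```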
<h2>One more thing&hellip;</h2>
<p>The <code>advpngidat</code> compression tool isn&rsquo;t just for &ldquo;optimized for iOS devices&rdquo; <code>PNG</code> images; it can be used on regular <code>PNG</code> images too.  This can be a useful addition to any workflow that passes <code>PNG</code> images through one of the common <code>PNG</code> optimization tools (e.g., <a href="http://optipng.sourceforge.net/"><code>optipng</code></a> and <a href="http://pmt.sourceforge.net/pngcrush/"><code>pngcrush</code></a>).  As an example, any web site that has a large number of static <code>PNG</code> images can use a simple shell script to process all of the static <code>PNG</code> images with something like <a href="http://optipng.sourceforge.net/"><code>optipng</code></a>, and then process the <a href="http://optipng.sourceforge.net/"><code>optipng</code></a> results with <code>advpngidat</code>.</p>
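<p>Such a two-pass workflow could be scripted along the following lines.  This is only a sketch: it assumes both tools are on your <code>PATH</code>, and the <code>advpngidat</code> arguments shown are an assumption modelled on <code>advpng</code>&rsquo;s <code>-z</code> (recompress) and <code>-4</code> (maximum effort) options, so check the tool&rsquo;s own documentation before relying on them.</p>

```ruby
# Optimizers to run on every file, in order. Each entry is an argv prefix;
# the file path is appended when the command is built.
OPTIMIZERS = [
  %w[optipng -o2],       # optipng's -o option selects the optimization level
  %w[advpngidat -z -4]   # ASSUMPTION: same options as advpng; verify before use
].freeze

# Build the full list of commands: for each file, every optimizer in order.
def optimize_commands(paths, optimizers = OPTIMIZERS)
  paths.flat_map { |path| optimizers.map { |argv| argv + [path] } }
end

# Run the commands, reporting (but not aborting on) failures.
def optimize!(paths, optimizers = OPTIMIZERS)
  optimize_commands(paths, optimizers).each do |argv|
    system(*argv) or warn "command failed: #{argv.join(' ')}"
  end
end
```

<p>For a static site this might be invoked as <code>optimize!(Dir.glob('public/images/**/*.png'))</code>, where the directory name is just an example.</p>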
<p>In fact, the <code>advpngidat</code> tool effectively does what is on the roadmap for the <a href="http://optipng.sourceforge.net/"><code>optipng</code></a> tool:</p>
<blockquote><ul>
<li>Plans for version 0.8:
<ul>
<li>Additional trials that use the powerful <a href="http://www.7-zip.org/">7zip deflation</a>.</li>
</ul>
</li>
</ul>
</blockquote>
<p>&hellip; which is exactly what <code>advpngidat</code> does today&ndash; the only &ldquo;optimization&rdquo; it performs is it recompresses the <a href="http://www.w3.org/TR/PNG/#11IDAT"><code>IDAT</code> chunk</a> using the &ldquo;powerful <a href="http://www.7-zip.org/">7zip deflation</a>&rdquo; compressor. If the recompressed result happens to be bigger than the original, then the <code>PNG</code> image is left unmodified.  Otherwise, the <code>PNG</code> image is replaced with the smaller, optimized result.</p>
<p>This is really something that every web site with static <code>PNG</code> images should do.  You only need to perform the &ldquo;optimization&rdquo; on an image once, and every request for that <code>PNG</code> image after that point will use the smaller, optimized result.  You don&rsquo;t have to be a rocket scientist to figure out the benefits: fewer bytes to send means pages load that much faster, and if you happen to pay for the amount of bandwidth you use&hellip; it means a simple, one-time run through <code>advpngidat</code> can save you real money.</p>
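<p>The &ldquo;recompress, then keep whichever is smaller&rdquo; logic can be sketched in a few lines of Ruby.  Note the substitution: this sketch uses Ruby&rsquo;s standard <code>zlib</code> binding at its maximum level, not the 7-Zip deflate engine that <code>advpngidat</code> uses, so it shows the structure of the approach rather than its compression gains.  The method names are invented for this example, and only standard <code>PNG</code> files are handled.</p>

```ruby
require 'zlib'

PNG_SIGNATURE = "\x89PNG\r\n\x1a\n".b

# Yield [type, data, raw] for each chunk; raw is the chunk's exact bytes.
def each_png_chunk(bytes)
  pos = PNG_SIGNATURE.bytesize
  while pos + 8 <= bytes.bytesize
    length, type = bytes.byteslice(pos, 8).unpack('Na4')
    yield type, bytes.byteslice(pos + 8, length), bytes.byteslice(pos, 12 + length)
    pos += 12 + length
  end
end

# Serialize one IDAT chunk: length, type, data, CRC over type + data.
def idat_chunk(data)
  body = 'IDAT'.b + data
  [data.bytesize].pack('N') + body + [Zlib.crc32(body)].pack('N')
end

# Return the PNG with its IDAT data recompressed, or the input bytes
# unchanged when recompression does not make the data smaller.
def recompress_idat(bytes)
  bytes = bytes.b
  raise ArgumentError, 'not a PNG file' unless bytes.start_with?(PNG_SIGNATURE)

  old_idat = ''.b
  each_png_chunk(bytes) { |type, data, _raw| old_idat << data if type == 'IDAT' }
  return bytes if old_idat.empty?

  new_idat = Zlib::Deflate.deflate(Zlib::Inflate.inflate(old_idat), Zlib::BEST_COMPRESSION)
  return bytes if new_idat.bytesize >= old_idat.bytesize

  out = PNG_SIGNATURE.dup
  written = false
  each_png_chunk(bytes) do |type, _data, raw|
    if type != 'IDAT'
      out << raw                    # every other chunk is copied byte for byte
    elsif !written
      out << idat_chunk(new_idat)   # all IDAT chunks collapse into one
      written = true
    end
  end
  out
end
```

<p>Because the pixel data is only inflated and deflated again, the decoded image is identical; the only bytes that change are those of the <code>IDAT</code> chunk.</p>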
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2011/09/07/shrink-your-ios-app-by-turning-png-compression-up-to-11/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7246ec579efd29d66c26bf28a706e5ba?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">jaredfriedman</media:title>
		</media:content>
	</item>
		<item>
		<title>Clean Up Your Project</title>
		<link>http://coding.scribd.com/2011/05/08/clean-up-your-project/</link>
		<comments>http://coding.scribd.com/2011/05/08/clean-up-your-project/#comments</comments>
		<pubDate>Mon, 09 May 2011 05:42:35 +0000</pubDate>
		<dc:creator>Scribd</dc:creator>
				<category><![CDATA[iOS development]]></category>
		<category><![CDATA[iOS]]></category>
		<category><![CDATA[rake]]></category>
		<category><![CDATA[ruby]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=363</guid>
		<description><![CDATA[This post was written by Sam Soffes, an iOS developer at Scribd, and originally posted on his blog here. Many of the apps I work on are usually 100% custom. There is rarely any system UI components visible to the user. Styling the crap out of apps like this makes for tons of images in [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=363&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><i> This post was written by Sam Soffes, an iOS developer at Scribd, and originally posted on his blog <a href="http://samsoff.es/posts/clean-up-your-project">here</a>.  </i></p>
<p>Many of the apps I work on are usually 100% custom. There are rarely any system UI components visible to the user. Styling the crap out of apps like this makes for tons of images in my iOS projects to get everything the way the designer wants. I&#8217;m starting to <code>drawRect:</code> stuff more these days because it makes it easier to reuse, but anyway.</p>
<p>There are literally hundreds of images in the <a href="http://samsoff.es/posts/im-moving-to-san-francisco">Scribd</a> app I&#8217;ve been working on. Designers changing their mind plus everything custom leaves a lot of images behind that are no longer used. Our application was starting to be several megs and a lot of it was unused images. So&#8230; being the programmer I am, <em>I wrote a script</em>.</p>
<div id="gist-947827" class="gist">
<div class="gist-file">
<div class="gist-data gist-syntax">
<div class="gist-highlight">
<pre><div class="line" id="LC1"><span class="n">desc</span> <span class="s1">'Remove unused images'</span></div><div class="line" id="LC2"><span class="n">task</span> <span class="ss">:clean_assets</span> <span class="k">do</span></div><div class="line" id="LC3">&nbsp;&nbsp;<span class="nb">require</span> <span class="s1">'set'</span></div><div class="line" id="LC4"><br /></div><div class="line" id="LC5">&nbsp;&nbsp;<span class="n">all</span> <span class="o">=</span> <span class="no">Set</span><span class="o">.</span><span class="n">new</span></div><div class="line" id="LC6">&nbsp;&nbsp;<span class="n">used</span> <span class="o">=</span> <span class="no">Set</span><span class="o">.</span><span class="n">new</span></div><div class="line" id="LC7">&nbsp;&nbsp;<span class="n">unused</span> <span class="o">=</span> <span class="no">Set</span><span class="o">.</span><span class="n">new</span></div><div class="line" id="LC8"><br /></div><div class="line" id="LC9">&nbsp;&nbsp;<span class="c1"># White list</span></div><div class="line" id="LC10">&nbsp;&nbsp;<span class="n">used</span><span class="o">.</span><span class="n">merge</span> <span class="sx">%w{Icon Icon-29 Icon-50 Icon-58 Icon-72 Icon-114}</span></div><div class="line" id="LC11"><br /></div><div class="line" id="LC12">&nbsp;&nbsp;<span class="n">regex</span> <span class="o">=</span> <span class="sr">/\[UIImage imageNamed:@"([a-zA-Z0-9\-_]+).png"\]/</span></div><div class="line" id="LC13">&nbsp;&nbsp;<span class="no">Dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'Classes/*.m'</span><span class="p">)</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">path</span><span class="o">|</span></div><div class="line" id="LC14">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">used</span><span class="o">.</span><span class="n">merge</span> <span class="no">File</span><span class="o">.</span><span 
class="n">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="o">.</span><span class="n">read</span><span class="o">.</span><span class="n">scan</span><span class="p">(</span><span class="n">regex</span><span class="p">)</span><span class="o">.</span><span class="n">flatten</span></div><div class="line" id="LC15">&nbsp;&nbsp;<span class="k">end</span></div><div class="line" id="LC16"><br /></div><div class="line" id="LC17">&nbsp;&nbsp;<span class="no">Dir</span><span class="o">.</span><span class="n">glob</span><span class="p">(</span><span class="s1">'Resources/Images/*.png'</span><span class="p">)</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">path</span><span class="o">|</span></div><div class="line" id="LC18">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">next</span> <span class="k">if</span> <span class="n">path</span><span class="o">.</span><span class="n">include?</span> <span class="s1">'@2x.png'</span></div><div class="line" id="LC19">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">all</span> <span class="o">&lt;&lt;</span> <span class="n">path</span><span class="o">.</span><span class="n">gsub</span><span class="p">(</span><span class="sr">/Resources\/Images\/([a-zA-Z0-9\-_]+).png/</span><span class="p">,</span> <span class="s2">"</span><span class="se">\\</span><span class="s2">1"</span><span class="p">)</span></div><div class="line" id="LC20">&nbsp;&nbsp;<span class="k">end</span></div><div class="line" id="LC21"><br /></div><div class="line" id="LC22">&nbsp;&nbsp;<span class="n">unused</span> <span class="o">=</span> <span class="n">all</span> <span class="o">-</span> <span class="n">used</span></div><div class="line" id="LC23">&nbsp;&nbsp;<span class="n">unused</span><span class="o">.</span><span class="n">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span></div><div class="line" 
id="LC24">&nbsp;&nbsp;&nbsp;&nbsp;<span class="sb">`rm -f Resources/Images/</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="sb">.png Resources/Images/</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="sb">@2x.png`</span></div><div class="line" id="LC25">&nbsp;&nbsp;<span class="k">end</span></div><div class="line" id="LC26"><br /></div><div class="line" id="LC27">&nbsp;&nbsp;<span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">all</span><span class="o">.</span><span class="n">length</span><span class="si">}</span><span class="s2"> total found"</span></div><div class="line" id="LC28">&nbsp;&nbsp;<span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">used</span><span class="o">.</span><span class="n">length</span><span class="si">}</span><span class="s2"> used found"</span></div><div class="line" id="LC29">&nbsp;&nbsp;<span class="nb">puts</span> <span class="s2">"</span><span class="si">#{</span><span class="n">unused</span><span class="o">.</span><span class="n">length</span><span class="si">}</span><span class="s2"> deleted"</span></div><div class="line" id="LC30"><span class="k">end</span></div><div class="line" id="LC31"><br /></div></pre>
</div></div>
<div class="gist-meta">
            <a href="https://gist.github.com/raw/947827/a9b1a4ee2d0f04786ba9592a8a12997a82a97994/clean_assets.rb" style="float:right;">view raw</a><br />
            <a href="https://gist.github.com/947827#file_clean_assets.rb" style="float:right;margin-right:10px;color:#666;">clean_assets.rb</a><br />
            <a href="https://gist.github.com/947827">This Gist</a> brought to you by <a href="http://github.com">GitHub</a>.
          </div>
</div>
</div>
<p>It basically searches all of your source files for references of the form <code>[UIImage imageNamed:@"image_name_here.png"]</code>. Then it looks at all of the images on disk and removes any you didn&#8217;t reference. I set up a whitelist for icons and other images I don&#8217;t reference directly. You might need to tweak the paths a bit to work for your setup.</p>
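<p>If you want to see what would be deleted before anything is removed, the same set logic can be pulled out into side-effect-free helpers.  This is a sketch rather than part of the original rake task: <code>referenced_images</code> and <code>unused_images</code> are names made up here, and the regex escapes the dot before <code>png</code>, which the task above leaves unescaped.</p>

```ruby
require 'set'

# Matches [UIImage imageNamed:@"name.png"] and captures the bare name.
IMAGE_REF = /\[UIImage imageNamed:@"([a-zA-Z0-9\-_]+)\.png"\]/

# Names of all images referenced in the given source-code strings.
def referenced_images(sources)
  sources.each_with_object(Set.new) do |source, used|
    used.merge(source.scan(IMAGE_REF).flatten)
  end
end

# Names of images on disk that no source file (or whitelist entry) mentions.
# Retina @2x variants are skipped; they follow their 1x counterpart.
def unused_images(image_paths, sources, whitelist = [])
  all = image_paths
        .reject { |path| path.include?('@2x.png') }
        .map { |path| File.basename(path, '.png') }
        .to_set
  all - referenced_images(sources) - whitelist.to_set
end
```

<p>Feeding it <code>Dir.glob('Resources/Images/*.png')</code> and the contents of your <code>.m</code> files gives you a dry-run list you can review before deleting anything.</p>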
<p>Hopefully this little <a href="http://railscasts.com/episodes/66-custom-rake-tasks">rake task</a> helps someone clean up their project too.</p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2011/05/08/clean-up-your-project/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a81bdc1f371888f61300acc220e7ab03?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scribd</media:title>
		</media:content>
	</item>
		<item>
		<title>How to Drastically Improve Your App with an Afternoon and Instruments</title>
		<link>http://coding.scribd.com/2011/05/08/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/</link>
		<comments>http://coding.scribd.com/2011/05/08/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/#comments</comments>
		<pubDate>Mon, 09 May 2011 05:35:54 +0000</pubDate>
		<dc:creator>Scribd</dc:creator>
				<category><![CDATA[iOS development]]></category>
		<category><![CDATA[cocoa]]></category>
		<category><![CDATA[instruments]]></category>
		<category><![CDATA[iOS]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=356</guid>
		<description><![CDATA[This post is by Sam Soffes, an iOS engineer at Scribd, and was originally posted on his blog here Recently I managed to make the Scribd iOS application way better with some simple tweaks. I wanted to write a quick post about what I did that really helped that will probably help most people. This [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=356&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p><i> This post is by Sam Soffes, an iOS engineer at Scribd, and was originally posted on his blog <a href="http://samsoff.es/posts/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments">here</a> </i><br />
</p>
<p>Recently I managed to make the Scribd iOS application way better with some simple tweaks.  I wanted to write a quick post about the changes that really helped, since they will probably help most people too. This stuff is a bit application specific, but I think you&#8217;ll see parallels to your application.</p>
<h3>Symptoms</h3>
<p>The Scribd application pulls a ton of data from the network and puts it in Core Data when you log in for the first time. From using the application, I noticed that performance totally sucks at first and then goes back to normal. (My table views all scroll at 60fps, but I&#8217;ll save that for another post. Sorry. Had to throw that in there. I&#8217;m way proud.) This was troubling since it usually works really great (okay, now I&#8217;m done bragging about my cells), so I investigated.</p>
<p>Just so you know, I am doing all of my networking, data parsing, and insertion into Core Data on background threads via <code>NSOperationQueue</code>.</p>
<h3>The Problems</h3>
<p>After running Instruments with the object allocations instrument, I noticed that I was using about 22MB of memory while it was downloading all of this data. In my opinion, that is way too high. I&#8217;ll add that to the list of stuff to mess with.</p>
<p>I also noticed that my <code>NSDate</code> category for parsing <a href="http://en.wikipedia.org/wiki/ISO_8601">ISO8601</a> date strings (standard way to put a date into <a href="http://en.wikipedia.org/wiki/JSON">JSON</a>) was taking about 7.4 seconds using the timer instrument. Totally unacceptable. Added to the list.</p>
<p>After messing around for a little while longer, I noticed that a lot of time was being spent in one of my <code>NSString</code> categories, specifically in <code>NSRegularExpression</code>. This sounds annoying, so I&#8217;ll save that for last.</p>
<h3>The Solutions</h3>
<h4>Memory</h4>
<p>I had a few guesses on how to cut memory usage while converting large amounts of JSON strings into <code>NSManagedObject</code>s. My best guess was that a ton of objects needed to be autoreleased but the <code>NSAutoreleasePool</code> wasn&#8217;t being drained until the operation finished. The simple solution for this is to <em>add a well-placed <code>NSAutoreleasePool</code> around problem code</em>. This took a few tries to get in the right spot. I would put it where I thought most of the temporary objects were being created and then watch the object allocations instrument to make sure it got flatter.</p>
<p>Here was my first try:</p>
<p><img src="http://assets.samsoff.es/posts/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/try-1.png" alt="First Try" /></p>
<p>See how it goes up and drops sharply down a bit and then builds up for awhile then finally drops off? That&#8217;s a sign there is another loop nested deeper down that should have a pool around it. For the first one, it did a little and then drained (probably because it did less stuff in that operation). Since the second giant hump (note the peak of that is 23MB or so) doesn&#8217;t drop off for awhile, I know to look for another loop deeper down. Hopefully that makes sense. Once you get in there, it will suddenly hit you after stumbling around for a bit. You&#8217;ll see.</p>
<p>After moving it to a more nested loop, here&#8217;s the result:</p>
<p><img src="http://assets.samsoff.es/posts/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/try-2.png" alt="Second Try" /></p>
<p>Once I got it in the right spot, <em>it was using under 2MB of memory for the entire process!</em> Score! Next problem.</p>
<h4>Date Stuff</h4>
<p>The date stuff had me stumped for awhile. I was using <a href="https://github.com/square/iso8601parser">ISO8601Parser</a> (a subclass of <code>NSFormatter</code>) which was working really, really well compared to <code>NSDateFormatter</code>. After looking at the timer instrument, I saw that most of that time was spent in system classes like <code>NSCFCalendar</code>. I assumed there was a better way. I tried switching back to <code>NSDateFormatter</code>, but that didn&#8217;t work well and still wasn&#8217;t great memory and speed wise.</p>
<p>As a disclaimer, I am all about Objective-C. I love it. I&#8217;m not one of those engineers that says &#8220;hey, we should rewrite this in C&#8221; all the time, but hey, we should rewrite this in C. I did&#8230; and the result was astounding!</p>
<p>Here&#8217;s the code:</p>
<div id="gist-840291" class="gist">
<p><code></p>
<div class="gist-file">
<div class="gist-data gist-syntax">
<div class="gist-highlight">
<pre><div class="line" id="LC1"><span class="cp">#include &lt;time.h&gt;</span></div><div class="line" id="LC2"><br /></div><div class="line" id="LC3"><span class="o">+</span> <span class="p">(</span><span class="n">NSDate</span> <span class="o">*</span><span class="p">)</span><span class="nl">dateFromISO8601String:</span><span class="p">(</span><span class="n">NSString</span> <span class="o">*</span><span class="p">)</span><span class="n">string</span> <span class="p">{</span></div><div class="line" id="LC4">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">string</span><span class="p">)</span> <span class="p">{</span></div><div class="line" id="LC5">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">return</span> <span class="nb">nil</span><span class="p">;</span></div><div class="line" id="LC6">&nbsp;&nbsp;&nbsp;&nbsp;<span class="p">}</span></div><div class="line" id="LC7">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC8">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">struct</span> <span class="n">tm</span> <span class="n">tm</span><span class="p">;</span></div><div class="line" id="LC9">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">time_t</span> <span class="n">t</span><span class="p">;</span>    </div><div class="line" id="LC10">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC11">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">strptime</span><span class="p">([</span><span class="n">string</span> <span class="nl">cStringUsingEncoding:</span><span class="n">NSUTF8StringEncoding</span><span class="p">],</span> <span class="s">"%Y-%m-%dT%H:%M:%S%z"</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">tm</span><span class="p">);</span></div><div class="line" id="LC12">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">tm</span><span class="p">.</span><span class="n">tm_isdst</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span 
class="p">;</span></div><div class="line" id="LC13">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">t</span> <span class="o">=</span> <span class="n">mktime</span><span class="p">(</span><span class="o">&amp;</span><span class="n">tm</span><span class="p">);</span></div><div class="line" id="LC14">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC15">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">return</span> <span class="p">[</span><span class="n">NSDate</span> <span class="nl">dateWithTimeIntervalSince1970:</span><span class="n">t</span> <span class="o">+</span> <span class="p">[[</span><span class="n">NSTimeZone</span> <span class="n">localTimeZone</span><span class="p">]</span> <span class="n">secondsFromGMT</span><span class="p">]];</span></div><div class="line" id="LC16"><span class="p">}</span></div><div class="line" id="LC17"><br /></div><div class="line" id="LC18"><br /></div><div class="line" id="LC19"><span class="o">-</span> <span class="p">(</span><span class="n">NSString</span> <span class="o">*</span><span class="p">)</span><span class="n">ISO8601String</span> <span class="p">{</span></div><div class="line" id="LC20">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">struct</span> <span class="n">tm</span> <span class="o">*</span><span class="n">timeinfo</span><span class="p">;</span></div><div class="line" id="LC21">&nbsp;&nbsp;&nbsp;&nbsp;<span class="kt">char</span> <span class="n">buffer</span><span class="p">[</span><span class="mi">80</span><span class="p">];</span></div><div class="line" id="LC22"><br /></div><div class="line" id="LC23">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">time_t</span> <span class="n">rawtime</span> <span class="o">=</span> <span class="p">[</span><span class="n">self</span> <span class="n">timeIntervalSince1970</span><span class="p">]</span> <span class="o">-</span> <span class="p">[[</span><span class="n">NSTimeZone</span> <span class="n">localTimeZone</span><span class="p">]</span> <span class="n">secondsFromGMT</span><span 
class="p">];</span></div><div class="line" id="LC24">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">timeinfo</span> <span class="o">=</span> <span class="n">localtime</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rawtime</span><span class="p">);</span></div><div class="line" id="LC25"><br /></div><div class="line" id="LC26">&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">strftime</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="s">"%Y-%m-%dT%H:%M:%S%z"</span><span class="p">,</span> <span class="n">timeinfo</span><span class="p">);</span></div><div class="line" id="LC27">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC28">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">return</span> <span class="p">[</span><span class="n">NSString</span> <span class="nl">stringWithCString:</span><span class="n">buffer</span> <span class="nl">encoding:</span><span class="n">NSUTF8StringEncoding</span><span class="p">];</span></div><div class="line" id="LC29"><span class="p">}</span></div></pre>
</div></div>
</p></div>
</div>
<p></code></p>
<p>See, it&#8217;s not too crazy. <em>Using the C date stuff took my date parsing from 7.4 seconds to 300 ms. Talk about a performance boost!</em> (I updated <a href="http://github.com/samsoffes/sstoolkit">SSToolkit</a>&#8217;s <a href="https://github.com/samsoffes/sstoolkit/blob/master/SSToolkit/NSDate%2BSSToolkitAdditions.h">NSDate category</a> to use this new code.)</p>
</article>
<h4>Regular Expression</h4>
<p>I have several <code>NSString</code> categories in my application for doing various things. Some of them were called throughout the process I was trying to optimize. I drilled down in the time profiler instrument and realized that <code>[NSRegularExpression regularExpressionWith...]</code> was taking a ton of the time. This totally makes sense, since it compiles your regex to use later and I was doing it each time. Simple solution:</p>
<div id="gist-840291" class="gist">
<p></code></p>
<div class="gist-file">
<div class="gist-data gist-syntax">
<div class="gist-highlight">
<pre><div class="line" id="LC1"><span class="o">-</span> <span class="p">(</span><span class="n">NSString</span> <span class="o">*</span><span class="p">)</span><span class="n">camelCaseString</span> <span class="p">{</span></div><div class="line" id="LC2">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">static</span> <span class="n">NSRegularExpression</span> <span class="o">*</span><span class="n">regex</span> <span class="o">=</span> <span class="nb">nil</span><span class="p">;</span></div><div class="line" id="LC3">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">regex</span><span class="p">)</span> <span class="p">{</span></div><div class="line" id="LC4">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span class="n">regex</span> <span class="o">=</span> <span class="p">[[</span><span class="n">NSRegularExpression</span> <span class="n">alloc</span><span class="p">]</span> <span class="nl">initWithPattern:</span><span class="s">@"(?:_)(.)"</span> <span class="nl">options:</span><span class="mi">0</span> <span class="nl">error:</span><span class="nb">nil</span><span class="p">];</span></div><div class="line" id="LC5">&nbsp;&nbsp;&nbsp;&nbsp;<span class="p">}</span></div><div class="line" id="LC6">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC7">&nbsp;&nbsp;&nbsp;&nbsp;<span class="c1">// Use regex...</span></div><div class="line" id="LC8">&nbsp;&nbsp;&nbsp;&nbsp;</div><div class="line" id="LC9">&nbsp;&nbsp;&nbsp;&nbsp;<span class="k">return</span> <span class="n">string</span><span class="p">;</span></div><div class="line" id="LC10"><span class="p">}</span></div></pre>
</div></div>
</p></div>
</div>
<p></code></p>
<p>This was actually the easiest part <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<h3>Conclusions</h3>
<p>So using Instruments to track down slow or bad code is really easy once you get the hang of it. Start with the leaks instrument if you're new. You shouldn't have any (known) leaks in your application.</p>
<p>Once you get that down (or get so frustrated trying to track it down that you give up and move on to something else), try the object allocations instrument next. You can watch the graph and see how many objects you have alive. If you see a big spike that never goes down, you are most likely holding on to a lot of memory you don't need; because you still have a reference to it, it doesn't show up in leaks. Adding autorelease pools around loops that do lots of processing always helps.</p>
<p>Finally, use the time profiler instrument to see what's taking a long time and optimize the crap out of it. This is the most fun, since it's easy to see what's happening and how much of an improvement your latest changes made. The key to making this instrument useful is the checkboxes on the left. Turning on Objective-C only or toggling the inverted stack tree is really useful.</p>
<h3>This is Hard</h3>
<p>Don't feel bad, especially if you're new to this. This stuff is hard. All of my solutions I listed above are pretty simple. I spent almost an entire day coming up with those few things. The majority of the time you spend will be tracking down problems. Fixing them is usually pretty simple, especially after you've done it a few times. This is hard. You're smart. <img src='http://s0.wp.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2011/05/08/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a81bdc1f371888f61300acc220e7ab03?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scribd</media:title>
		</media:content>

		<media:content url="http://assets.samsoff.es/posts/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/try-1.png" medium="image">
			<media:title type="html">First Try</media:title>
		</media:content>

		<media:content url="http://assets.samsoff.es/posts/how-to-drastically-improve-your-app-with-an-afternoon-and-instruments/try-2.png" medium="image">
			<media:title type="html">Second Try</media:title>
		</media:content>
	</item>
		<item>
		<title>FlashHeed: Fixing the Flash Z-index Problem For Ads</title>
		<link>http://coding.scribd.com/2010/11/13/flashheed-fixing-the-flash-z-index-problem-for-ads/</link>
		<comments>http://coding.scribd.com/2010/11/13/flashheed-fixing-the-flash-z-index-problem-for-ads/#comments</comments>
		<pubDate>Sat, 13 Nov 2010 17:53:55 +0000</pubDate>
		<dc:creator>James Yu</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[display order]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[flash ads]]></category>
		<category><![CDATA[flashheed]]></category>
		<category><![CDATA[google ads]]></category>
		<category><![CDATA[opaque]]></category>
		<category><![CDATA[transparent]]></category>
		<category><![CDATA[wmode]]></category>
		<category><![CDATA[z-index]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=326</guid>
		<description><![CDATA[Here at Scribd, we&#8217;ve moved on from Flash and are embracing HTML5 as the open standard for reading on the web. Unfortunately, the ad industry has not quite caught up yet, as many ads are still flash, and probably will be for some time. The dreaded problem that most web developers come across is the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=326&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[<p>Here at Scribd, we&#8217;ve moved on from Flash and are embracing HTML5 as the open standard for reading on the web. Unfortunately, the ad industry has not quite caught up yet, as many ads are still Flash, and probably will be for some time.</p>
<p>The dreaded problem that most web developers come across is the z-index issue with Flash elements. When the <code>wmode</code> param is not set, or is set to <code>window</code>, Flash elements will always be on top of your DOM content. No matter what kind of z-index voodoo you attempt, your content will never break through the Flash. This is because Flash, when in window mode, is actually rendered on a layer above all web content.</p>
<p>There is <a href="http://slightlymore.co.uk/flash-and-the-z-index-problem-solved/">a lot of chatter about this issue</a>, and the simple solution is to set the <code>wmode</code> parameter to <code>opaque</code> or <code>transparent</code>. This works when you control and deliver the Flash content yourself. However, this is not the case for Flash ads.</p>
<p>The majority of Flash ads don&#8217;t specify a <code>wmode</code> parameter, which puts Flash into <code>window</code> mode. Why they don&#8217;t specify <code>transparent</code> or <code>opaque</code> is a mystery to me. This is a nightmare for pages that have UI elements that depend on z-index, like dropdown menus and lightboxes. Google even has <a href="http://adsense.blogspot.com/2007/05/clarification-on-accidental-clicks.html">an article about avoiding these sorts of elements altogether when they are in close proximity to ads</a>.</p>
<p>I personally disagree with the notion of redesigning your UI because of display ordering issues with Flash ads. It just doesn&#8217;t make sense from a product standpoint. And look! Even YouTube, owned by Google, has z-index Flash issues with their own ads:</p>
<p><img src="http://scribdtech.files.wordpress.com/2010/11/screen-shot-2010-11-12-at-12-05-25-pm.png?w=580" /></p>
<p>So, to solve all this, I wrote some JavaScript that will dynamically add the correct <code>wmode</code> parameter. I call it <a href="https://github.com/scribd/flash_heed">FlashHeed</a>. You can get it now <a href="https://github.com/scribd/flash_heed">on the GitHub repo</a>.</p>
<p>It works reliably in all major browsers, and has no dependencies, so feel free to drop it into your Prototype- or jQuery-dependent website.</p>
<p>The usage is simple: just include the FlashHeed JavaScript in the head of your page, and call it like so:</p>
<p><code><br />
FlashHeed.heed();<br />
</code></p>
<p>And you&#8217;re done. All the Flash ads on the page will now heed the z-index ordering. No more embarrassing lightbox and dropdown menu occlusions.</p>
<p>Under the hood, FlashHeed injects the correct <code>wmode</code> parameter and actually forces the Flash to re-render. This is the only reliable way that I&#8217;ve found to kick the Flash into the correct wmode.</p>
<p><em>Update 11/14/10:</em></p>
<p>Note that FlashHeed will not work on Flash ads or elements that are embedded inside iframes, due to cross-domain policies. Unfortunately, I don&#8217;t have a solution for those. If anyone has a suggestion, please comment below.</p>
<p><em><a href="http://www.twitter.com/jamesjyu/">James Yu</a>, Lead Developer at Scribd</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2010/11/13/flashheed-fixing-the-flash-z-index-problem-for-ads/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/659c57f1b4bb70b1f7b9c39995f1ad9f?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">Tom</media:title>
		</media:content>

		<media:content url="http://scribdtech.files.wordpress.com/2010/11/screen-shot-2010-11-12-at-12-05-25-pm.png" medium="image" />
	</item>
		<item>
		<title>Vanity Profile URLs in Rails</title>
		<link>http://coding.scribd.com/2010/09/01/vanity-user-profile-urls-in-rails/</link>
		<comments>http://coding.scribd.com/2010/09/01/vanity-user-profile-urls-in-rails/#comments</comments>
		<pubDate>Wed, 01 Sep 2010 20:43:56 +0000</pubDate>
		<dc:creator>Scribd</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://scribdtech.wordpress.com/?p=5</guid>
		<description><![CDATA[One feature shared by many social networking sites is &#34;vanity&#34; short profile URLs. My Twitter page could have easily been the RESTfully predictable http://twitter.com/users/riscfuture, but thanks to short profile URLs it is http://twitter.com/riscfuture. Even Facebook got in the game recently with their &#34;Facebook Usernames&#34; feature. Of course, in classic Facebook style, getting the vanity URL [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=coding.scribd.com&amp;blog=6638842&amp;post=5&amp;subd=scribdtech&amp;ref=&amp;feed=1" width="1" height="1" />]]></description>
			<content:encoded><![CDATA[
<p>One feature shared by many social networking sites is &quot;vanity&quot; short profile<br />
    URLs. My Twitter page could have easily been the RESTfully predictable <tt><a href="http://twitter.com/users/riscfuture" target="_blank">http://twitter.com/users/riscfuture</a></tt>, but thanks to short profile URLs it is <tt><a href="http://twitter.com/riscfuture" target="_blank">http://twitter.com/riscfuture</a></tt>.</p>
<p>Even Facebook got in the game recently with their &quot;Facebook Usernames&quot; feature. Of course, in classic Facebook style, getting the vanity URL is a multi-step process with an application and the associated land-grab. At Scribd I kept it a little simpler, and I&#039;m assuming you&#039;d like to keep it simple for your Rails website as well.</p>
<p>In order for this system to work, we&#039;re going to have to lay down a few ground rules:</p>
<ul>
<li><strong>No user whose username conflicts with a controller name can have a short URL.</strong> You can&#039;t sign up on Scribd with the username &quot;documents&quot; and prevent anyone from seeing their document list.
    </li>
<li><strong>No user whose username conflicts with another defined route can have a short URL.</strong> Remember that the routes file defines named or custom routes and resources, but with the default routes, normal controllers do not need an entry in that file.
    </li>
<li><strong>Users with reserved characters in their names must have these characters escaped or dealt with.</strong> If I sign up with the username &quot;foo/bar&quot;, that slash can&#039;t be left unescaped, or the router will misunderstand the address.
    </li>
<li><strong>Usernames must be case-insensitively unique.</strong> Every browser expects <tt><a href="http://scribd.com/foo" target="_blank">scribd.com/foo</a></tt> to be the same as <tt><a href="http://scribd.com/FOO" target="_blank">scribd.com/FOO</a></tt>.
    </li>
<li><strong>Any user who cannot be given a short URL for the above reasons must have a fallback URL.</strong> This is where you fall back to your less pretty <tt>/users/123</tt> URL. (Or perhaps <tt>/users/123-foo-bar</tt> for SEO purposes.)
    </li>
</ul>
<p>Note that it&#039;s not enough to simply build a list of your controllers and stick them in a <tt>validates_exclusion_of</tt> validation. You want to be able to claim new routes for yourself even if users have already signed up with conflicting logins, and gracefully revert those users to a fallback profile URL.</p>
<p>Ultimately the question we need to answer is this: <strong>Given a user name, will a vanity URL conflict with an existing route?</strong> There are a lot of really hard ways of going about this, many of which will break over time. I opted to go with a reliable (if somewhat slow) way of doing this: I build a list of known routes, strip them down to their first path component, then build an array of these reserved names. A known route might be, for instance, <tt>/documents/:id</tt>; its first path component is &quot;documents.&quot; Thus, a user whose login is &quot;documents&quot; cannot have a vanity URL.</p>
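<p>As a rough sketch of that stripping step, here is the idea in plain Ruby with no Rails dependency. The helper name <tt>reserved_names</tt> and the sample route paths are made up for illustration; the real code works on the Rails route objects rather than strings.</p>

```ruby
# Hypothetical helper: reduce route paths to their first path component.
def reserved_names(route_paths)
  names = route_paths.map do |path|
    # "/documents/:id" splits into ["", "documents", ":id"]; drop the blanks.
    first = path.split('/').reject { |part| part.empty? }.first
    # Strip a trailing ".:format" so "ratings.:format" becomes "ratings".
    first ? first.sub(/\.:format\z/, '') : nil
  end
  # Drop the root route (nil), duplicates, and parameter or wildcard routes.
  names.compact.uniq.reject { |name| name.start_with?(':', '*') }
end

routes = ['/documents/:id', '/documents', '/ratings.:format', '/:login', '/']
p reserved_names(routes) # prints ["documents", "ratings"]
```

<p>Note that the <tt>/:login</tt> entry drops out of the result, which is what keeps a catch-all vanity route from reserving every name.</p>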
<p>There are some points to note for this system:</p>
<ul>
<li><strong>You&#039;ll get a few false positives.</strong> If <tt>/documents/:id</tt> is a valid route, but <tt>/documents</tt> is not (say you had no <tt>index</tt> action), this system would still disallow a user named &quot;documents&quot;. You can easily solve this by tweaking the code below, though.
    </li>
<li><strong>No attention is paid to HTTP methods.</strong> Theoretically, if you had a route like <tt>/upload</tt> whose only acceptable method is <tt>POST</tt>, you could still use <tt>GET /upload</tt> to refer to a user named &quot;upload&quot;. I have intentionally avoided doing this, however; good web design dictates that varying the HTTP method of a request only varies the manner in which you interact with the resource represented by the URL; a single URL should represent the same resource regardless of which method is used in the request.
    </li>
</ul>
<p>In order to eke speed out wherever we can, we generate the list of reserved routes once, at launch, and cache it for the lifetime of the process. We do this in a module in <tt>lib/</tt>:</p>
<pre><code> 
module FancyUrls
  def self.generate_cached_routes
    # Find all routes we have, take the first part (/xxx/) and remove some unwanted ones
    @cached_routes = ActionController::Routing::Routes.routes.map do |route|
      segs = route.segments.inject(&quot;&quot;) { |str, s| str &lt;&lt; s.to_s }
      segs.sub! /^\/(.*?)\/.*$/, &#039;\\1&#039;
 
      # Some routes accept a :format parameter (ratings.:format).
      segs.sub! /\.:format$/, &#039;&#039;
      segs
    end
 
    # All possible controllers for /:controller/:action/:id route
    @cached_routes += ActionController::Routing.possible_controllers.map do |c|
      # Use only the first path component for controllers with multiple path components
      c.sub /^(.*?)\/.*$/, &#039;\\1&#039;
    end
    @cached_routes.uniq!
    # Remove routes whose first path component is a variable or wildcard
    @cached_routes.reject! { |route| route.starts_with?(&#039;:&#039;) or route.starts_with?(&#039;*&#039;) }
    # Remove the root route.
    @cached_routes.delete &#039;/&#039;
  end
 
  def self.cached_routes
    @cached_routes
  end
end
</code></pre>
<p>The top method combines two arrays: the first, a list of routes from the defined routes, and the second, a list of the app&#039;s controllers. It then filters out some non-applicable routes and stores the list in an instance variable. The list consists of only the first path component of a route.</p>
<p>The method is called <tt>generate_cached_routes</tt> because it&#039;s called when the server process starts, as part of the <tt>environment.rb</tt> file. The cached results are accessed with the <tt>cached_routes</tt> method.</p>
<p>So given this method, how do we test if a user is eligible for URL &quot;vanitization?&quot; It&#039;s simple:</p>
<pre><code> 
module FancyUrls
  def user_name_valid_for_short_url?(login)
    not FancyUrls.cached_routes.include?(login)
  end
end
</code></pre>
<p>The method is simple: If the user&#039;s name is in our list of reserved routes, then it&#039;s not valid for URL shortening. Easy peasy.</p>
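<p>One detail the check above leaves open is the case-insensitivity rule from the list of ground rules. A minimal sketch of one way to handle it, assuming the reserved names are stored lowercase; the constant <tt>RESERVED_NAMES</tt> and the method name <tt>short_url_allowed?</tt> are stand-ins for illustration, not part of the Scribd code:</p>

```ruby
require 'set'

# Hypothetical stand-in for FancyUrls.cached_routes, stored lowercase.
# A Set gives constant-time membership checks.
RESERVED_NAMES = Set.new(%w[documents ratings upload users])

# Compare case-insensitively, since /FOO and /foo should resolve alike.
def short_url_allowed?(login)
  !RESERVED_NAMES.include?(login.downcase)
end

puts short_url_allowed?('riscfuture') # prints true
puts short_url_allowed?('Documents')  # prints false
```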
<p>So now we can reasonably quickly determine whether or not a user gets a vanity profile URL. The next step is to write a <tt>user_profile_url</tt> method that, given a user, returns either the vanity or full profile URL, as appropriate. To do this, first we will need to add our vanity URLs to the bottom of our <tt>routes.rb</tt> file:</p>
<pre><code> 
# Install the non-vanity user profile route above the vanity route so people
# who don&#039;t have shortenable logins can still have a URL to their profile page.
map.long_profile &#039;users/:id&#039;, :controller =&gt; &#039;users&#039;, :action =&gt; &#039;show&#039;, :conditions =&gt; { :method =&gt; :get }
# Install the vanity user profile route above the default routes but below all
# resources.
map.short_profile &#039;:login&#039;, :controller =&gt; &#039;users&#039;, :action =&gt; &#039;show&#039;, :conditions =&gt; { :method =&gt; :get }
 
# Install the default routes as the lowest priority.
map.connect &#039;:controller/:action/:id&#039;
map.connect &#039;:controller/:action/:id.:format&#039;
</code></pre>
<p>What&#039;s going on here? Well, at the very bottom of the <tt>routes.rb</tt> file, we are installing the old Rails standby, the <tt>:controller/:action</tt> routes. Newer Rails ideology is often to leave these routes out, so adjust your routes file as appropriate. Above those routes, but otherwise of the lowest priority, is our vanity route. Anywhere above that route is our traditional profile URL. (If you have a RESTful users controller, you could of course replace the top route with a <tt>resources</tt> call.)</p>
<p>At first glance there&#039;s a chicken-and-egg problem: We&#039;re checking if a user is &quot;vanitizable&quot; using the routes file, but now the routes file contains the vanity URL route. We solved this problem earlier in the <tt>generate_cached_routes</tt> method:</p>
<pre><code> 
# Remove routes whose first path component is a parameter or wildcard
@cached_routes.reject! { |route| route.starts_with?(&#039;:&#039;) or route.starts_with?(&#039;*&#039;) }
</code></pre>
<p>This line of code filters out any routes that start with a parameter or wildcard, among them the <tt>short_profile</tt> named route.</p>
<p>With the routes squared away, we move on to the problem of users with logins containing reserved characters. <a href="http://www.rfc-editor.org/rfc/rfc1738.txt" target="_blank">RFC 1738</a> defines what characters must be encoded in a URL:</p>
<blockquote><p> Thus, only alphanumerics, the special characters &quot;$-_.+!*&#039;(),&quot;, and<br />
    reserved characters used for their reserved purposes may be used<br />
    unencoded within a URL.
</p></blockquote>
<p>Characters aside from these in usernames must either be encoded or otherwise dealt with. Beyond RFC 1738, we should additionally consider the dollar sign and plus characters (&quot;$&quot; and &quot;+&quot;) reserved because they often serve special roles in URLs as well. And because this is a Rails app, we should consider the period (&quot;.&quot;) reserved as well, as it is used by Rails to indicate the <tt>format</tt> parameter.</p>
<p>So if a user has any reserved character in his login, what do we do? The obvious solution is to percent-encode it, creating a string like &quot;foo%2Fbar&quot;, but some might find that ugly. You could also replace these characters with dashes (or some other stand-in character), creating &quot;foo-bar&quot;, but then you run into trouble if someone actually signs up with the username &quot;foo-bar&quot;. If you&#039;re making a new website, you may opt to disallow these characters from usernames. At Scribd we use a combination of approaches: Some reserved characters (like spaces) are simply not allowed in usernames; others are allowed but by using one of these characters you &quot;give up&quot; your vanity URL, instead using the fallback profile URL.</p>
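<p>To make the first two options concrete, here is a small sketch using only the Ruby standard library. The helper name <tt>dasherize_login</tt> is hypothetical, and <tt>URI.encode_www_form_component</tt> is just one convenient encoder (it follows form-encoding rules, so it is an assumption that those rules suit your URLs).</p>

```ruby
require 'uri'

login = 'foo/bar'

# Option 1: percent-encode the login so the slash survives routing.
encoded = URI.encode_www_form_component(login)
puts encoded # prints foo%2Fbar

# Option 2: swap anything outside letters, digits, "_" and "-" for a dash.
def dasherize_login(login)
  login.gsub(/[^A-Za-z0-9_-]/, '-')
end

# The dash approach is lossy: two distinct logins collapse to one slug.
puts dasherize_login('foo/bar') == dasherize_login('foo-bar') # prints true
```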
<p>If you choose to allow certain reserved characters in your usernames, but disallow those people vanity URLs, you will have to modify the <tt>user_name_valid_for_short_url?</tt> like so:</p>
<pre><code> 
def user_name_valid_for_short_url?(login)
  not (login.include?(&#039;.&#039;) or FancyUrls.cached_routes.include?(login))
end
</code></pre>
<p>This example allows users to have periods in their login, but disallows those users their vanity URLs.</p>
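<p>If you want to treat more than the period as reserved, a whitelist can be easier to maintain than a chain of <tt>include?</tt> calls. The following is only a sketch: the constant <tt>SAFE_LOGIN</tt> and the method <tt>login_url_safe?</tt> are hypothetical names, and the character set is my reading of the rules above (the RFC 1738 specials minus &quot;$&quot;, &quot;+&quot; and &quot;.&quot;).</p>

```ruby
# Characters treated as safe in a vanity URL: alphanumerics plus a
# subset of the RFC 1738 specials, leaving out "$", "+" and ".".
SAFE_LOGIN = /\A[A-Za-z0-9\-_!*'(),]+\z/

# True only when every character of the login is in the safe set.
def login_url_safe?(login)
  SAFE_LOGIN.match?(login)
end

puts login_url_safe?('riscfuture') # prints true
puts login_url_safe?('john.doe')   # prints false
puts login_url_safe?('foo/bar')    # prints false
```

<p>A login that fails this test would simply fall back to the long profile URL.</p>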
<p>With our vanity routes defined, we can implement the <tt>user_profile_url</tt> method:</p>
<pre><code> 
module FancyUrls
  def user_profile_url(person, options={})
    login = login_for_user(person)
    raise ArgumentError, &quot;No such user #{person.inspect}&quot; unless login
 
    if user_name_valid_for_short_url?(login) then
      short_profile_url options.merge(:login =&gt; login)
    else
      long_profile_url options.merge(:id =&gt; person)
    end
  end
 
  private
 
  def login_for_user(user_or_id)
    return (if user_or_id.is_a?(User) then
      user_or_id.login
    else
      Rails.cache.fetch(&quot;login:#{user_or_id}&quot;) { User.find_by_id(user_or_id, :select =&gt; &#039;login&#039;).try(:login) }
    end)
  end
end
</code></pre>
<p>The method is simple enough: We check if the user can have a vanity URL, and if so, we return it; otherwise we return the standard profile URL. I included two small optimizations: We cache the login to avoid database lookups with each method call, and we only select the fields we care about from our <tt>users</tt> table.</p>
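<p>The caching optimization relies on fetch-style semantics: the block runs only on a cache miss, and its result is remembered. Here is a tiny in-memory illustration of that behavior; the <tt>LoginCache</tt> class and the <tt>USERS</tt> hash are invented stand-ins for the Rails cache and the database.</p>

```ruby
# Minimal in-memory stand-in for a cache with fetch semantics:
# return the stored value, or run the block once and remember the result.
class LoginCache
  def initialize
    @store = {}
  end

  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end
end

# Hypothetical "database": user id mapped to login.
USERS = { 1 => 'riscfuture', 2 => 'documents' }
lookups = 0
cache = LoginCache.new

2.times do
  cache.fetch('login:1') do
    lookups += 1 # counts how often we hit the "database"
    USERS[1]
  end
end

puts lookups # prints 1
```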
<p>And with that, we&#039;ve got our URLs! Simply include your module as a helper and call <tt>user_profile_url</tt> to generate profile URLs as opposed to <tt>url_for</tt> or the named resource routes or whatever else you might have been using.</p>
<p>We&#039;re not quite done yet, though. What happens when a user who haplessly registered the username &quot;ratings&quot; gets screwed because we just launched our ratings feature? With the system I&#039;ve shown above, the moment we deploy our new feature, any links to that user&#039;s profile page would automatically revert to the normal profile URLs.</p>
<p>Good web practice teaches us that when we change the URL for a resource, we should respond with a 301 to any client that tries to access the old URL.  Obviously, since the <tt>/ratings</tt> URL now points to a different web page, we can&#039;t do that. Any users who visit external web pages and click a link to that user&#039;s profile URL will find themselves on your brand new ratings page. I have implemented no particular fix for this problem, as I believe most websites add very, very few controllers and named routes in comparison to the number of users they have. In other words, the problem is small enough that it&#039;s probably not worth solving.</p>
<p>We can solve the flip side of this problem, though: Once a website launches its vanity URL feature, there will still be bunches of external links to the old, longer profile URLs. We can respond to these requests with 301s to inform people that those links are now outdated. This also helps with SEO, getting people&#039;s new profile URLs on the Google index and getting the old ones off.</p>
<p>We do this by including code in the profile page&#039;s controller action to redirect if necessary:</p>
<pre><code> 
class UsersController
  def show
    if params[:id] then
      @user = User.find(params[:id])
      return head(:moved_permanently, :location =&gt; user_profile_url(@user)) if user_name_valid_for_short_url?(@user.login)
    elsif params[:login] then
      @user = User.with_login(params[:login]).first or raise ActiveRecord::RecordNotFound
    else
      raise ActiveRecord::RecordNotFound
    end
  end
end
</code></pre>
<p>We have this <tt>if</tt> statement at the start of our <tt>show</tt> method because the method is doing double-duty: It responds to both the <tt>short_profile</tt> and <tt>long_profile</tt> named routes. In the former, the variable portion of the URL is stored in the <tt>login</tt> parameter; in the latter, the <tt>id</tt> parameter. You could of course opt to dispatch the two URLs to two separate actions; either way, make sure you respond to unnecessarily long profile URLs with a 301.</p>
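<p>Stripped of Rails, the decision the action makes can be written as a small pure function, which is handy for reasoning about the cases. Everything here (<tt>RESERVED</tt>, <tt>LOGINS_BY_ID</tt>, <tt>profile_response</tt>) is a hypothetical stand-in, not the production code.</p>

```ruby
# Hypothetical reserved list and user table, for illustration only.
RESERVED = %w[documents ratings users]
LOGINS_BY_ID = { '1' => 'riscfuture', '2' => 'documents' }

# Returns [status, location]; location is nil when the page renders in place.
def profile_response(params)
  if params[:id]
    login = LOGINS_BY_ID[params[:id]]
    return [404, nil] unless login
    # Long URL requested: redirect only if a vanity URL exists for this user.
    return [301, "/#{login}"] unless RESERVED.include?(login.downcase)
    [200, nil]
  elsif params[:login]
    LOGINS_BY_ID.value?(params[:login]) ? [200, nil] : [404, nil]
  else
    [404, nil]
  end
end

p profile_response(id: '1')             # prints [301, "/riscfuture"]
p profile_response(id: '2')             # prints [200, nil]
p profile_response(login: 'riscfuture') # prints [200, nil]
```

<p>The second call shows the fallback case: the user named &quot;documents&quot; keeps the long URL, so no redirect is issued.</p>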
<p>And with that, you&#039;ve got your vanity URLs. All it comes down to is a little bit of route-foo and some speed optimizations here and there. The solution here is tailored to the needs of Scribd; I&#039;ve done my best to outline those needs and how they impacted our code. You should think about how you want to do vanity URLs on your website and take this code as a guide to implementing your own solution. Vanity URLs take a little extra time to implement, but in return you are rewarded with users who are more willing to share their profile pages, improved SEO, and that glowy feeling you get when you increase your site&#039;s Web 2.0-ishness.</p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2010/09/01/vanity-user-profile-urls-in-rails/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/a81bdc1f371888f61300acc220e7ab03?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">scribd</media:title>
		</media:content>
	</item>
		<item>
		<title>Plan B: Font Fallbacks</title>
		<link>http://coding.scribd.com/2010/08/26/plan-b-font-fallbacks/</link>
		<comments>http://coding.scribd.com/2010/08/26/plan-b-font-fallbacks/#comments</comments>
		<pubDate>Thu, 26 Aug 2010 20:54:56 +0000</pubDate>
		<dc:creator>matthiaskramm</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://coding.scribd.com/?p=294</guid>
		<description><![CDATA[This is the fourth post in our series about Scribd&#8217;s HTML5 conversion. The whole process is neatly summarized in the following flowchart: In our previous post we wrote about how we encode glyph polygons from various document formats into browser fonts. We described how an arbitrary typeface from a document can be sanitized and converted [...]]]></description>
			<content:encoded><![CDATA[<p>This is the fourth post in our series about Scribd&#8217;s HTML5 conversion.  The whole process is neatly summarized in the following flowchart: </p>
<p><a href="http://www.scribd.com/doc/34787845/Scribd-in-HTML5-How-it-Works"><img border="0" src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/flowchart.jpg" /></a></p>
<p>In our <a href="http://coding.scribd.com/2010/06/24/repolygonizing-fonts/">previous post</a> we wrote about how we encode glyph polygons from various document formats into browser fonts. We described how an arbitrary typeface from a document can be sanitized and converted to a so-called &#8220;@font-face&#8221;, a font that browsers can display.</p>
<p>The next challenge the aspiring HTML5 engineer faces is a browser that still refuses to render the font, even after you have hand-crafted a @font-face (including self-intersecting all the font polygons and throwing together all the required .ttf, .eot and .svg files). After all, there are still browsers out there that just don&#8217;t support custom fonts, most importantly those on mobile devices like Google&#8217;s Android, or on e-book readers like Amazon&#8217;s Kindle.</p>
<p> Luckily enough, CSS has long had a syntax for specifying font fallbacks in case a @font-face (or, for that matter, a system font) can&#8217;t be displayed:</p>
<p><pre>    &lt;style type="text/css"&gt;
    .p {
	font-family:
	    myfontface, /* preferred typeface */
	    Arial,      /* fallback 1 */
	    sans-serif; /* fallback 2 */
    }
    &lt;/style&gt;
</pre>
</p>
<p>There are a number of fonts one can rely on to be available as fallbacks:</p>
<div style="font-family:Arial;">Arial (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Courier;">Courier (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Georgia;">Georgia (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Times;">Times (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Trebuchet;">Trebuchet (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Verdana;">Verdana (+ <b>bold</b>,<i>italic</i>)</div>
<div style="font-family:Comic Sans MS;">Comic Sans MS (+ <b>bold</b>)</div>
<p>(Yes, that&#8217;s right: every single browser out there supports Comic Sans MS.)</p>
<p>However, it&#8217;s not always trivial to replace a given font with a font from this list. In the worst case, an array of polygons for a subset of the font&#8217;s glyphs is really all we have (not all documents store proper font names, let alone a complete set of glyphs or font attributes), so we don&#8217;t know much about the font face at hand: Is it bold? Is it italic? Does it have serifs? Is it maybe script?</p>
<p>Luckily though, those features can be derived from the font polygons with reasonable effort: </p>
<h2>Detecting bold face glyph polygons</h2>
<p>The boldness of a typeface is also referred to as the &#8220;blackness&#8221;. This suggests a simple detection scheme: Find out how much of a given area will be covered by a couple of &#8220;representative&#8221; glyphs. <br /> The easiest way to do this is to just render the glyph to a tiny bitmap and add up the pixels:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/letter_F.png" /></p>
<p>A more precise way is to measure the area of the polygon directly, e.g. using a scanline algorithm.</p>
<p>If the actual area is noticeably larger than the area we &#8220;expect&#8221;, e.g. for the letter F at a given size, that is an indicator that we&#8217;re dealing with a bold face.</p>
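<p>As a rough illustration of the direct area measurement, here is a minimal Python sketch. It uses the shoelace formula instead of a scanline pass, and it assumes glyph contours are plain lists of (x, y) points with holes wound opposite to outer outlines. The function names, the reference value <tt>expected</tt> and the 25% tolerance are made up for this example and are not taken from our converter.</p>

```python
def signed_area(contour):
    """Signed area of a closed contour [(x, y), ...] via the shoelace formula.

    Counter-clockwise contours come out positive, clockwise ones negative.
    """
    total = 0.0
    for i, (x1, y1) in enumerate(contour):
        x2, y2 = contour[(i + 1) % len(contour)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def blackness(contours, em_size):
    """Fraction of the em square covered by a glyph.

    Assumes holes are wound opposite to outer outlines, so that summing the
    signed areas subtracts the holes.
    """
    return abs(sum(signed_area(c) for c in contours)) / float(em_size * em_size)


def looks_bold(contours, em_size, expected, tolerance=0.25):
    """True if the glyph covers noticeably more area than the reference value.

    `expected` is a hypothetical reference blackness for the same character
    in a regular-weight face.
    """
    return blackness(contours, em_size) > expected * (1.0 + tolerance)
```

<p>In practice one would probably average the result over several representative glyphs rather than trust a single letter.</p>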
<h2>Detecting italic face glyph polygons</h2>
<p>A trivial italic typeface (more precisely: an oblique typeface) can be created from a base font by slanting every character slightly to the right. In other words, the following matrix is applied to every character:<br />
<table style="font-family:Trebuchet;">
<tr>
<td rowspan="2" style="font-size:300%;">(</td>
<td>&nbsp;1&nbsp;</td>
<td>&nbsp;s&nbsp;</td>
<td rowspan="2" style="font-size:300%;">)</td>
</tr>
<tr>
<td>&nbsp;0&nbsp;</td>
<td>&nbsp;1&nbsp;</td>
</tr>
</table>
<p></p>
<p>(Here <span style="font-family:Trebuchet;">s</span> is the horizontal displacement per unit of height, so a point (x, y) moves to (x + s&middot;y, y).)</p>
<p>In order to find out whether a typeface at hand is slanted in such a way, we use the fact that a normal (non-italic) typeface has a number of vertical edges, for example in the letters L,D,M,N,h,p:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/v-normal.png" /></p>
<p>In an italic typeface, these vertical edges &#8220;disappear&#8221; (become non-vertical):</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/v-italic.png" /></p>
<p>In other words, we can spot an italic typeface by the relative absence of strictly vertical polygon segments or, more generally, by the mean (or median) angle of all non-curved segments that are more vertical than horizontal.</p>
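<p>The median-angle test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: straight segments are given as pairs of (x, y) points with the y axis pointing up, and the 5 degree threshold as well as the function names are invented for the example.</p>

```python
import math
from statistics import median


def shear(segments, s):
    """Apply the slant matrix ((1, s), (0, 1)) to every segment endpoint."""
    return [((x1 + s * y1, y1), (x2 + s * y2, y2))
            for (x1, y1), (x2, y2) in segments]


def slant_angle(segments):
    """Median deviation from vertical, in degrees, of the steep segments.

    Positive values lean to the right. Returns None if no segment is more
    vertical than horizontal.
    """
    angles = []
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1
        if abs(dy) > abs(dx):      # more vertical than horizontal
            if dy < 0:             # orient every segment upwards
                dx, dy = -dx, -dy
            angles.append(math.degrees(math.atan2(dx, dy)))
    return median(angles) if angles else None


def looks_italic(segments, threshold_degrees=5.0):
    """True if the steep segments lean away from vertical by more than the threshold."""
    angle = slant_angle(segments)
    return angle is not None and abs(angle) > threshold_degrees
```

<p>Shearing the straight edges of an upright &#8220;L&#8221; with s = 0.2 yields a median angle of atan(0.2), roughly 11 degrees, which this sketch would flag as italic.</p>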
<h2>Detecting the font family</h2>
<p> As for the actual font family, we found that two features are fairly characteristic of a given font:</p>
<ul>
<li>The number of corners (i.e., singularities of the first derivative) of all the glyph outlines</li>
<li>The sign of (w1-w2) for all pairs of glyphs with widths w1 and w2</li>
</ul>
<p> For example, displayed below are the corners of two fonts (Trebuchet and Courier) and the extracted feature string: </p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/trebuchet.png" alt="" /></p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/courier.png" alt="" /></p>
<p>Of course, for a font to be mapped against a browser font, we typically only have a subset of n glyphs, hence we can only use the number of corners of a few glyphs.</p>
<p> The second feature, comparing signs of glyph-width differences, gives us more data to work with, as n glyphs generate <tt>n*(n-1)/2</tt> differences (entries in the difference matrix, whose lower-left half mirrors the upper-right half with the signs flipped): </p>
<p style="vertical-align:top;"><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/arial.png" alt="" /><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/times.png" alt="" /></p>
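<p>To make the width-difference feature concrete, here is a small Python sketch. It assumes we already know which character each glyph represents and have one advance width per character. The matching score (fraction of agreeing signs) and all names are assumptions made for this example; the widths in the usage note are invented, not real font metrics.</p>

```python
from itertools import combinations


def width_signature(widths):
    """Sign of (w1 - w2) for every pair of glyphs, in a fixed (sorted) order.

    widths: dict mapping a character to its advance width.
    Returns a dict {(a, b): -1, 0 or 1} with n*(n-1)/2 entries.
    """
    sig = {}
    for a, b in combinations(sorted(widths), 2):
        d = widths[a] - widths[b]
        sig[(a, b)] = (d > 0) - (0 > d)
    return sig


def similarity(sig_a, sig_b):
    """Fraction of shared glyph pairs on which two signatures agree."""
    shared = set(sig_a).intersection(sig_b)
    if not shared:
        return 0.0
    return sum(sig_a[p] == sig_b[p] for p in shared) / len(shared)


def best_match(unknown_widths, candidates):
    """Name of the candidate font whose signature agrees best with the unknown one."""
    sig = width_signature(unknown_widths)
    return max(candidates,
               key=lambda name: similarity(sig, width_signature(candidates[name])))
```

<p>Because only the signs are compared, the unknown font does not have to be scaled to the same em size as the candidates: <tt>best_match({"i": 25, "m": 90, "L": 60}, {"mono": {"i": 600, "m": 600, "L": 600}, "proportional": {"i": 220, "m": 830, "L": 560}})</tt> picks <tt>"proportional"</tt>.</p>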
<p> Notice that we assume in our detection approach that we actually <b>know</b> what a given glyph represents (i.e., that glyph 76 in a font is supposed to look like an &#8220;L&#8221;). This is not always the case; we&#8217;ll write about that in an upcoming post.  </p>
<p> Here&#8217;s a random selection of fonts from our documents (left) and the corresponding replacement (right): </p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/compare.png" alt="Comparison of original font and new font" /></p>
<p> And, as always, if you want to see the results of these algorithms for yourself, just grab a PDF (or any other document format), <a href="http://www.scribd.com/upload">upload it to Scribd</a>, and then download it to a (non @font-face-enabled?) <a href="http://support.scribd.com/entries/107964-using-your-mobile-device-with-scribd">mobile device of your choice</a>.</p>
<p style="text-align:left;"><em>-Matthias Kramm</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2010/08/26/plan-b-font-fallbacks/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/22c8a54e73393ef203e0d2b5b4f4cce8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">matthiaskramm</media:title>
		</media:content>

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/flowchart.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/letter_F.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/v-normal.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/v-italic.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/trebuchet.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/courier.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/arial.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/times.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/Aed3iac0/compare.png" medium="image">
			<media:title type="html">Comparison of original font and new font</media:title>
		</media:content>
	</item>
		<item>
		<title>Repolygonizing Fonts</title>
		<link>http://coding.scribd.com/2010/06/24/repolygonizing-fonts/</link>
		<comments>http://coding.scribd.com/2010/06/24/repolygonizing-fonts/#comments</comments>
		<pubDate>Thu, 24 Jun 2010 16:48:16 +0000</pubDate>
		<dc:creator>matthiaskramm</dc:creator>
				<category><![CDATA[Scribd Reader]]></category>

		<guid isPermaLink="false">http://scribdtech.wordpress.com/?p=171</guid>
		<description><![CDATA[This is the third of a four-part series on the technology behind Scribd&#8217;s HTML viewing experience. You might like to read part 1, “Facing Fonts in HTML” and part 2, “The Perils of Stacking,” if you haven&#8217;t already. Part 4, “Plan B: Font Fallbacks” is coming soon. Every single day, Scribd processes over 150,000,000 polygons in [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the third of a four-part series on the technology  behind <a href="http://www.scribd.com/documents/5/Image-Cluster-Compression">Scribd&#8217;s HTML viewing experience</a>. You might like to read part 1, “<a href="http://coding.scribd.com/2010/05/17/facing-font-in-html/">Facing Fonts in HTML</a>” and part 2, “<a href="http://coding.scribd.com/2010/06/01/the-perils-of-stacking/">The Perils of Stacking</a>,” if you haven&#8217;t already. Part 4, “Plan B: Font Fallbacks” is coming soon.</em></p>
<p>Every single day, Scribd processes over 150,000,000 polygons in order to convert your uploaded documents into our new HTML format (among others).</p>
<p>So why on earth would something like this be necessary? In order to get to that, we first have to talk a bit about fonts. A font, in its simplest form, is just a bundle of polygons, with a Unicode index attached to each one. It’s these font polygons this post is about.</p>
<p>Of course, Truetype fonts and their descendants (.eot fonts, OpenType fonts etc.) store much more than that, with a typical font containing thousands of lines of vertex-adjusting bytecode programs (hence the name “font program”), multiple encoding tables, a <a href="http://developer.apple.com/fonts/TTRefMan/RM06/Chap6OS2.html">vast amount of miscellaneous font metrics</a>, etc. This is what this post is <em>not</em> about.</p>
<p>For our HTML conversion, we had to solve the following problem: How do you encode an arbitrary polygon into a font glyph for use in a @font-face declaration? The answer is font repolygonization, which is the process of optimizing the polygons in a given font. It turns out that to repolygonize a font properly, you first have to know a thing or two about fill-rules.</p>
<p>Polygons in computer graphics generally come in two “flavors”: even-odd-filled or nonzero-filled.  These two schemes use different semantics to determine whether a given point is inside or outside of the glyph. For even-odd, every line in the polygon separates something on the inside from something on the outside:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/evenodd.png" alt="" /><br />
<em> Even-odd filled glyphs</em></p>
<p>For nonzero, on the other hand, the direction of the line matters. Two lines in the same direction mean the area after the lines is still filled; a hole in a polygon has to be drawn with a line going in the opposite direction:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/nonzero.png" alt="" /><br />
<em> Nonzero-filled glyphs (with direction indicators)</em></p>
<p>For even-odd polygons, the segment direction doesn’t matter, but for nonzero-filled polygons it’s also necessary to draw segments in the right direction. Truetype and embedded OpenType (eot) fonts use the nonzero fill style. For .svg fonts, you get to choose (see the <a href="http://www.w3.org/TR/SVG/painting.html#FillProperties">fill-rule property</a>). So to properly encode any polygon as a font glyph, we need to “fix” the fill-rule.</p>
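<p>The difference between the two rules is easy to see in code. The following Python sketch (an illustration only, with contours assumed to be lists of (x, y) points) shoots a ray to the right of a point and records both the number of edge crossings and the signed winding number. Even-odd looks at the parity of the first; nonzero checks whether the second is zero.</p>

```python
def crossings(point, contours):
    """Cast a ray to the right of `point` and inspect every edge it crosses.

    Returns (winding_number, crossing_count).
    """
    px, py = point
    winding = 0
    count = 0
    for contour in contours:
        for i, (x1, y1) in enumerate(contour):
            x2, y2 = contour[(i + 1) % len(contour)]
            if (y1 > py) == (y2 > py):
                continue  # edge does not straddle the ray's height
            # x coordinate where the edge crosses the horizontal line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:
                count += 1
                winding += 1 if y2 > y1 else -1
    return winding, count


def inside_even_odd(point, contours):
    """Even-odd rule: inside if the ray crosses an odd number of edges."""
    return crossings(point, contours)[1] % 2 == 1


def inside_nonzero(point, contours):
    """Nonzero rule: inside if upward and downward crossings do not cancel."""
    return crossings(point, contours)[0] != 0
```

<p>For a square with a smaller square inside it, both wound in the same direction, a point in the middle is outside under even-odd (a hole) but inside under nonzero. Reversing the inner square makes both rules agree on a hole, which is exactly the kind of &#8220;fix&#8221; a glyph needs before it goes into a nonzero-filled font.</p>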
<p>Another thing that’s important for properly encoding a Truetype font is that <strong>all segments need to be continuous</strong>. Depending on where the source polygon comes from, this isn’t necessarily the case. In some vector graphics formats (e.g. SWF), polygon segments can be drawn in an arbitrary order and still define a filled shape:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/mixed.png" alt="" /><br />
<em> degenerate glyph (even-odd filled)</em></p>
<p>We found that while we’re recoding and fixing a polygon, a convenient way to store the end result is as a <strong>hybrid polygon</strong> which is, at the same time, even-odd as well as circular:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/hybrid.png" alt="" /><br />
<em> Self intersected, both even-odd as well as nonzero-filled glyphs</em></p>
<p>At Scribd, we greatly rely on this hybrid approach when repolygonizing fonts in order to be able to produce output files for both our Flash and HTML reader platforms at the same time. “But wait!” you cry, “you said you generate HTML fonts from fonts stored in PDF files?  Why aren&#8217;t those already encoded with the right fill-rule and continuous segments?” Good question.</p>
<p>First of all, we simplify font polygons, thus potentially breaking the type of fill-rule. Secondly, some “fonts” in a PDF are no more than a vector drawing in font’s clothing. Finally, font characters might get overlapped, transformed, combined, intersected etc. during conversion.</p>
<p>So how do you convert a polygon to a different fill-rule? This gets us back to the polygon intersection of those 150,000,000 polygons per day.  You intersect the polygon with itself and, at the same time, relabel the resulting segments according to a fill-rule of your choice. We actually not only intersect the polygon with itself but also with various elements like clipshapes in order to remove invisible or half-hidden elements on the page.</p>
<p>In theory, this is a rather simple approach: For every character element to be drawn on the page, intersect that character’s polygon with itself and all other page polygons it possibly interacts with, then store the result of that intersection in the font. There are two perhaps non-obvious problems, though:</p>
<ol>
<li>This polygon intersection needs to be <strong>fast</strong>. We process hundreds of thousands of pages every day.</li>
<li>It needs to be <strong>very</strong> stable. We want to introduce zero display errors.</li>
</ol>
<p>So while slow processing can easily be countered by buying some more fancy hardware, getting a polygon intersection to be stable is something entirely different. For example, here are some classic pathological cases that need to be correctly handled.  Notice that cases like these are actually the rule rather than the exception.</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/cases.jpg" alt="" /></p>
<p>Horizontal lines throw off scanline algorithms, multiple intersections in a single point lead to subtle numeric errors, and intersections taking place on top of lines can introduce hairline fragments that will cause the rendering engine displaying these polygons to “bleed.”</p>
<p>While it’s possible to deal with each of these cases, floating point arithmetic can actually cause them to occur <strong>after or during</strong> the computation: For example, if you add a point to a polygon segment because of an intersection, the resulting small change in the segment slope can cause it to overlap another segment it was previously disjoint from.</p>
<p>There are basically three ways to deal with this precision problem:</p>
<ul>
<li>Introduce some random perturbation to all polygon endpoints. This nicely takes care of horizontal lines and three-point intersections, and makes the probability of floating point errors introducing extraneous overlaps very small. However, when concatenating intersections, the perturbations tend to accumulate. It’s possible to counteract this with <a href="http://www.cs.tau.ac.il/~danha/courses/SEM2002/moshe-symb-perturbations.ppt">symbolic perturbation</a>, but still, this is cheating. We wouldn’t be solving the problem, only making it harder to reproduce.</li>
<li>Work with infinite precision. In other words, do away with floating point errors by introducing enough bits in the numeric representation of the coordinates so that we never have to do any rounding. This approach has been verified in practice (see e.g. <a href="http://portal.acm.org/citation.cfm?id=161015">Fortune &amp; Van Wyk</a>). The main problem is going back from the infinite precision representation to the finite precision final version (which we need to actually store the data).</li>
<li>Work on an integer grid, and use <strong>snap-rounding</strong>. Don’t let the floating point numbers lure us into the security of false precision, and explicitly deal with all special cases that can occur.</li>
</ul>
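<p>To give a feeling for the second option, here is a small Python sketch that intersects two segments with exact rational arithmetic, using the standard library&#8217;s <tt>Fraction</tt> type. It assumes integer endpoints and is only meant to show where the rounding problem moves to; it is not the arithmetic used in our converter.</p>

```python
from fractions import Fraction


def intersect(seg_a, seg_b):
    """Exact intersection point of two segments with integer endpoints.

    Returns (x, y) as Fractions, or None if the segments are parallel or do
    not meet. No rounding happens anywhere in this function.
    """
    (x1, y1), (x2, y2) = seg_a
    (x3, y3), (x4, y4) = seg_b
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None  # parallel or collinear
    # Parameters of the crossing point along seg_a (t) and seg_b (u)
    t = Fraction((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3), denom)
    u = Fraction((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1), denom)
    if not (0 <= t <= 1 and 0 <= u <= 1):
        return None  # the infinite lines cross outside the segments
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
```

<p>For the segments (0,0)&ndash;(7,3) and (0,2)&ndash;(7,0) this returns exactly (14/5, 6/5). The point lies on both segments with no error at all, but the moment it is stored at finite precision it moves off both of them, which is the conversion problem mentioned above.</p>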
<p>Let&#8217;s look at the last item in more detail. Take these glyphs, for example:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/before.png" alt="" /></p>
<p>A snap-rounded version would look like this:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/after.png" alt="" /></p>
<p>You see that snap-rounding doesn’t just reposition all polygon endpoints, it also puts all the intersection points on the integer grid. Also, <strong>all lines that come into the vicinity of a used grid point are routed through that grid point as well.</strong> This way, no polygon edges can ever come to lie too close to the vertex of another edge. And, while snap-rounding causes things to look blocky in the example above, this is because we chose a very coarse-grained grid for illustration.  In our production system we operate with a grid spacing that is a tiny fraction of a pixel (or for glyphs, a tiny fraction of an em square).</p>
<p>Intersecting two polygons while at the same time grid-rounding the results also uses only slightly more computation time than standard scanline algorithms: while a <a href="http://oai.dtic.mil/oai/oai?verb=getRecord&amp;metadataPrefix=html&amp;identifier=ADA058768">Bentley &amp; Ottmann scanline sweep</a> needs O(n log n + k log n) time, snap-rounding can be done in up to O(E log n), with E the <a href="http://www.springerlink.com/content/p46t8502v6q6g687/">description complexity of the crossing pattern</a>, which for typical polygons is of order (n+k). The nice thing about intersection using grid-rounding is that it’s rock-solid. There&#8217;s no corner case that’s not explicitly handled, and overcomplicated intersection patterns automatically simplify themselves. If you’re interested in the nitty-gritty details of our implementation, we basically used the <a href="http://ect.bell-labs.com/who/hobby/93_2-27.pdf">algorithm from Hobby</a> with a few additional ideas from <a href="http://www.springerlink.com/content/p46t8502v6q6g687/">Hershberger</a> and <a href="http://cccg.ca/proceedings/2007/07a1full.pdf">Bhattacharya et al</a> in order to produce a grid-rounded glyph polygon intersection in a single scanline pass.</p>
<p>Also, even though we process those 150 million polygons a day, only 10% of document conversion time is actually spent on that. All the rest goes into image processing. We’ll talk about that in one of our next posts.</p>
<p><em>—Matthias Kramm</em></p>
<p><strong>Coming Soon</strong>: <em>Plan B: Font Fallbacks</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2010/06/24/repolygonizing-fonts/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/22c8a54e73393ef203e0d2b5b4f4cce8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">matthiaskramm</media:title>
		</media:content>

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/evenodd.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/nonzero.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/mixed.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/hybrid.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/cases.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/before.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/pYur8Mnv/after.png" medium="image" />
	</item>
		<item>
		<title>The Perils of Stacking</title>
		<link>http://coding.scribd.com/2010/06/01/the-perils-of-stacking/</link>
		<comments>http://coding.scribd.com/2010/06/01/the-perils-of-stacking/#comments</comments>
		<pubDate>Tue, 01 Jun 2010 19:01:18 +0000</pubDate>
		<dc:creator>matthiaskramm</dc:creator>
				<category><![CDATA[Scribd Reader]]></category>

		<guid isPermaLink="false">http://scribdtech.wordpress.com/?p=156</guid>
		<description><![CDATA[This is the second of a four-part series on the technology behind Scribd’s HTML viewing experience. You might like to read part 1, “Facing Fonts in HTML” and part 3, “Repolygonizing Fonts,” if you haven&#8217;t already. Part 4, “Plan B: Font Fallbacks” is coming soon. A document page, unlike an image, isn’t really just a two-dimensional [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is the second of a four-part series on the technology  behind <a href="http://www.scribd.com/documents/5/Image-Cluster-Compression">Scribd’s HTML viewing experience</a>. You might like to read part 1, “<a href="http://coding.scribd.com/2010/05/17/facing-font-in-html/">Facing Fonts in HTML</a>” and part 3, “<a href="http://coding.scribd.com/2010/06/24/repolygonizing-fonts/">Repolygonizing Fonts</a>,” if you haven&#8217;t already. Part 4, “Plan B: Font Fallbacks” is coming soon.</em></p>
<p>A document page, unlike an image, isn’t really just a two-dimensional thing.</p>
<p>It’s not until you’ve been forced to dig into the internals of the PDF format that you come to appreciate the rich structure an innocent looking document page gives you. Vector fills, gradient patterns and semi-transparent bitmaps fight over dominance in the z-order stack, while clip-polygons slice away whole portions of the page, only to be faded into the background by an omnipotent hierarchical transparency group afterwards. So how does one convert this multitude of multi-layer objects into an html page, which is basically nothing more than a background image with a bit of text on top?</p>
<p>To understand the problem better, here’s a stacked diagram of a page:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/before.jpg" border="0" alt="" /></p>
<p>At the bottom of the stack we have a bitmap (drawn first), then some text, followed by vector graphics, and finally another block of text on top. We don&#8217;t currently support vector graphics in HTML <em>(stay tuned &#8230;);</em> instead, we convert polygons to images, which presents us with the challenge of finding a z-order of bitmaps and text elements that preserves the drawing order of the original page, while also simplifying the structure.</p>
<p>An optimal solution of transforming the above document page into a bitmap/text layering might look like this:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/after.jpg" border="0" alt="" /></p>
<p>You see that here we merged two images into one even though they were not adjacent in the rendering stack, by using the fact that the text between the two images didn&#8217;t intersect with both of them.</p>
<p>This was a simple case where it’s actually enough to put one solitary bitmap in the background of a page. It also may happen that you have to put transparent images on top of the text (i.e., give them a higher z-index value). Notice that this requires the <a href="http://homepage.ntlworld.com/bobosola/">IE6 transparency hack</a>.</p>
<p>In order to figure out whether or not an element on the page shares display space (i.e., pixels) with another element, we keep a boolean bitmap around during conversion:</p>
<table>
<tbody>
<tr>
<td><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/origbig.jpg" alt="" /></td>
<td><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/boolbig.png" alt="" /></td>
</tr>
<tr>
<td><em>Element on page</em></td>
<td><em>Corresponding boolean bitmap</em></td>
</tr>
<tr>
<td></td>
<td></td>
</tr>
</tbody>
</table>
<p>This bitmap tells us which regions of a page currently have been drawn by e.g.  polygon operations, and thus which pixels need to be checked against new text objects in preceding layers. In fact, we actually keep two of those bitmaps around; one for keeping track of the area currently occupied by the next bitmap layer we’re going to add to the display stack, and one for keeping track of the same thing for text objects.</p>
<p>There&#8217;s an interesting fact about this approach: as long as the things drawn so far onto a bitmap and the HTML text fields don’t overlap, we&#8217;re free to choose the order in which we draw them.</p>
<p>In other words, it&#8217;s not until we’ve drawn the first object intersecting with another layer that we decide which of those two layers to dump out to the page first.</p>
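<p>The deferred decision described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the production algorithm: <code>plan_layers</code> and <code>pixels</code> are hypothetical names, objects are reduced to rectangles, and a set of pixel coordinates stands in for each boolean bitmap.</p>

```python
def pixels(box):
    """Pixels covered by a half-open box (x0, y0, x1, y1); stands in for rasterizing."""
    x0, y0, x1, y1 = box
    return {(x, y) for y in range(y0, y1) for x in range(x0, x1)}


def plan_layers(objects):
    """Group objects into layers, bottom-most first.

    objects: (kind, name, box) tuples in drawing order; kind is "bitmap" or "text".
    Returns a list of (kind, [names]) tuples.
    """
    pending = {"bitmap": [], "text": []}        # objects not yet written out
    covered = {"bitmap": set(), "text": set()}  # one coverage set per pending layer
    layers = []

    def flush(kind):
        if not pending[kind]:
            return
        if layers and layers[-1][0] == kind:
            layers[-1][1].extend(pending[kind])  # adjacent same-kind layers merge
        else:
            layers.append((kind, list(pending[kind])))
        pending[kind].clear()
        covered[kind].clear()

    for kind, name, box in objects:
        other = "text" if kind == "bitmap" else "bitmap"
        area = pixels(box)
        if not covered[other].isdisjoint(area):
            # First real intersection: the other layer has to go underneath.
            flush(other)
        pending[kind].append(name)
        covered[kind].update(area)

    # Nothing pending overlaps any more, so the order is free; start with the
    # kind of the last written layer so that it can be merged into it.
    first = layers[-1][0] if layers else "bitmap"
    flush(first)
    flush("text" if first == "bitmap" else "bitmap")
    return layers


page = [
    ("bitmap", "photo", (0, 0, 10, 10)),
    ("text", "caption", (0, 2, 10, 4)),    # drawn on top of the photo
    ("bitmap", "chart", (0, 6, 10, 9)),    # does not touch the caption
    ("text", "label", (12, 0, 20, 2)),
]
print(plan_layers(page))
```

<p>For this made-up page the sketch yields two layers, one bitmap layer holding <code>photo</code> and <code>chart</code> and one text layer holding <code>caption</code> and <code>label</code>, which mirrors the merge of two non-adjacent images shown earlier. If <code>chart</code> overlapped <code>caption</code>, it would end up in a third layer on top of the text.</p>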
<p>Here&#8217;s an example of a document page being rendered step by step:</p>
<p>The background image is put on the lowermost layer. Notice that the background also contains graphical elements for the equations on this page:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/background.jpg" border="1" alt="" /></p>
<p>This is the text layer, consisting of normal HTML markup using fonts extracted from the document:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/foreground.jpg" border="1" alt="" /></p>
<p>The two layers are combined to produce the final output:</p>
<p><img src="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/combined.jpg" border="1" alt="" /></p>
<p>Take a look at <a href="http://www.scribd.com/documents/5/image-cluster-compression">the actual technical paper as converted</a>. And of course, if you want to see your own docs htmlified, just <a href="http://www.scribd.com/upload">upload them to Scribd.com</a>!</p>
<p style="text-align:left;"><em>—Matthias Kramm</em></p>
<p style="text-align:left;"><strong>Next:</strong><em> <a href="http://coding.scribd.com/2010/06/24/repolygonizing-fonts/">Repolygonizing Fonts</a><br />
</em></p>
]]></content:encoded>
			<wfw:commentRss>http://coding.scribd.com/2010/06/01/the-perils-of-stacking/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://0.gravatar.com/avatar/22c8a54e73393ef203e0d2b5b4f4cce8?s=96&#38;d=identicon&#38;r=G" medium="image">
			<media:title type="html">matthiaskramm</media:title>
		</media:content>

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/before.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/after.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/origbig.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/boolbig.png" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/background.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/foreground.jpg" medium="image" />

		<media:content url="http://blog.scribd.com.s3.amazonaws.com/rexoot5D/combined.jpg" medium="image" />
	</item>
	</channel>
</rss>
