Canadian Information Processing Society (CIPS)
 
 

CIPS CONNECTIONS

INTERVIEWS by STEPHEN IBARAKI, I.S.P.

World Authority/Contributor to the PHP Project and Apache Module Author

This week, Stephen Ibaraki has an exclusive interview with an international authority in PHP, George Schlossnagle.

George is a Principal at OmniTI Computer Consulting, a Maryland-based tech company specializing in high-volume web and email systems. Before joining OmniTI, George led technical operations at several high-profile web sites where he gained experience managing PHP in very large enterprise environments. George is a frequent contributor to the PHP community. His work can be found in the PHP core, as well as in the PEAR and PECL extension repositories.

George is a frequent speaker on PHP and web technology. He is a regular speaker at PHP-Con, ApacheCon, the International PHP Conference as well as many other local and regional engagements. His writings have appeared in SELECT magazine (Journal of the International Oracle Users Group) and PHP Magazine. He has forthcoming articles in php|architect and Oracle Technology Network. His book, Advanced PHP Programming, published by Sams Publishing, is garnering wide attention.

Before entering information technology, George trained to be an applied mathematician and served a two-year stint as a teacher in the Peace Corps. His experience has taught him to value an interdisciplinary approach to problem solving that favors root-cause analysis over simply addressing symptoms.

Discussion:

Q: George, you are one of the world’s foremost authorities in PHP. We appreciate you taking the time to speak with us.

A: Thanks, it's my pleasure.

Q: Give us a life history: how did you get into computing, and what valuable lessons have you learned?

A: My first computer was an Apple IIe, which I got during middle school. Unlike other more prolific hackers, I was only a casual user and only did simple programming. My first 'hardcore' exposure to computing was in 1993 when I interned in a summer program at Argonne National Labs. I had been looking for an internship in pure mathematics (what I was majoring in), but since that type of internship is very rare I ended up doing research on wavelet techniques for solving fluid mechanics problems. This involved writing numerical analysis software in Fortran.

Over the next several years I continued studying mathematics in graduate school, crossing back and forth over the line between theoretical and computational analysis. When I finally left graduate school the tech bubble was still inflating and the internet seemed a natural place to go. I had been maintaining my grad school department's UNIX systems and was competent in Perl and teaching myself C (a big break from the Fortran that many people still use in computational mathematics!), so I got a job doing systems administration at iVillage.com. I enjoyed the high-pressure environment of dot-com operations, so after a year there, I joined Community Connect, Inc. (CCI) which runs the online community sites BlackPlanet.com, AsianAvenue.com and MiGente.com.

At CCI I was Director of Operations and was responsible for the architecture and all technical aspects of the site. The CCI management gave me great leeway with infrastructure changes, and in the process of growing the site to 130 million dynamic page requests per day, I learned a tremendous amount about running PHP on a large site.

In 2002, I left CCI to join my brother at OmniTI, a small consulting company he had formed and for which I had been doing off-hours work for several years. OmniTI is a small shop and we specialize in building scalable web and email architectures. For the past year I've devoted most of my work time to developing Ecelerity, a high-speed MTA.

Q: What is your most surprising experience?

A: Gosh, that's a hard one. In the technical venue, I would have to say the first time someone I didn't know personally used a piece of open source software I wrote. On a personal front, however, my wife and I are expecting our first child in the fall, so I've been told that I will have many surprises ahead.

Q: Do you have any humorous stories to share?

A: I have a rather dark sense of humor, so for me a funny experience would be something like traveling to Portland for a conference and finding my reservation cancelled and all the nearby hotels fully booked. It was funny in retrospect at least.

Q: Please share your experiences in the Peace Corps.

A: After two years in graduate school I began to feel that the academic life was not for me; however I was afraid that if I took a job in industry I would become dependent on the income and never leave, so I went looking for something with a fixed tenure. Offering two years abroad, the Peace Corps seemed perfect. After a rather long application process, I had the opportunity to serve as a secondary school Math teacher in Nepal. My fellow volunteers and I underwent intensive in-country language training, and I was stationed in the village of Khamlalung, in Tehrathum district which lies in the eastern hills of Nepal.

Khamlalung was pretty remote - it lies two days from the nearest road and doesn't have electricity or running water. It was also very beautiful. From the top of the ridgeline above my house, there was an incredible view of Kangchenjunga (the third-highest mountain in the world). I spent two years in Khamlalung teaching 4th through 10th grade Math and English, and working on a number of community projects.

Living in the developing world is an amazingly different experience from living in the US. Aside from the obvious problems such as language difficulties, lack of amenities, and loneliness, there are a wide range of cultural differences that you can either choose to embrace or distance yourself from. I tried my best to embrace them and made a number of great friends as a result. It was both the hardest thing I've ever done, and the most rewarding.

Unfortunately, in the past couple of years Khamlalung (like many places in Nepal) has become embroiled in military actions between Maoist insurgents and the Nepali army, hurting many friends, acquaintances and students.

Q: You are a regular speaker at PHP-Con, ApacheCon, and the International PHP Conferences. Please share some valuable speaking and technical tips from these conferences.

A: I think too many speakers underestimate their audiences. Speakers often target the difficulty of their topic at the median audience member, and as a result end up going too slowly for half the people there. I try to make my talks a bit more advanced, or at least more fast-paced. I would rather have people ask me questions after the session than to bore everyone to death.

Q: Can you share your tips from your writings in SELECT magazine and PHP Magazine?

A: SELECT was the first non-academic journal I wrote for. In retrospect the article was pretty dry - it covered capacity planning techniques for running Oracle on Solaris. Although the material was narrowly focused, the basic idea was simple: in a web environment where you have a large number of connections to a database, the key to avoiding resource exhaustion is controlling connection concurrency. To do this, you measure the average resource utilization for a single connection and then use that to determine how many connections you can feasibly support. This general strategy works for everything from sizing MySQL instances to webserver instances, to any sort of client/server application.
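The sizing arithmetic described above can be sketched as follows. This is a minimal illustration rather than code from the SELECT article; it is written in Python for brevity, it considers memory only, and the function name and all figures are hypothetical.

```python
def max_connections(total_mb, reserved_mb, per_connection_mb):
    """Return how many concurrent connections fit in the memory budget.

    total_mb:          physical memory on the server
    reserved_mb:       memory set aside for the OS and shared buffers
    per_connection_mb: measured average memory use of one connection
    """
    if per_connection_mb <= 0:
        raise ValueError("per_connection_mb must be positive")
    usable_mb = total_mb - reserved_mb
    if usable_mb <= 0:
        return 0
    return int(usable_mb // per_connection_mb)


# Hypothetical figures: an 8192 MB server, 2048 MB reserved,
# and 12 MB measured per connection on average.
limit = max_connections(8192, 2048, 12)
print(limit)
```

In practice the same calculation would be repeated for each constrained resource (CPU, file descriptors, and so on), with the smallest result used as the connection limit.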

My PHP Magazine article described the inner workings of compiler caches. In PHP, all data that is created during a request is destroyed at the end of it, including the parse tree for the script that was executed. This means that every time a script is run (or included by another script), it must be read in from disk and parsed before it is executed. For large scripts, or scripts with a large number of includes, this can be extremely expensive. A compiler cache saves the results from the initial parse of every script and allows you to avoid the parse overhead on subsequent requests. Because it runs inside the Zend Engine (the scripting core of PHP), a compiler cache is completely transparent to the programmer and requires no modification of PHP code to function. It's about the closest you can come to a configuration setting like 'fast = true'.
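The idea of a compiler cache can be sketched in a few lines. The example below is a Python analogy, not a description of how APC or any PHP cache is implemented: it assumes that keeping compiled code objects in memory, keyed by file path and modification time, is an acceptable stand-in for caching a parse tree. The class name CompileCache is hypothetical.

```python
import os


class CompileCache:
    """Keep compiled code objects in memory; recompile only when a file changes."""

    def __init__(self):
        self._cache = {}   # path -> (mtime_ns, code object)
        self.compiles = 0  # how many times a read-and-compile was actually needed

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # cache hit: no disk read, no parse
        with open(path, "r", encoding="utf-8") as f:
            source = f.read()
        code = compile(source, path, "exec")
        self.compiles += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path):
        """Execute the script in a fresh namespace and return that namespace."""
        namespace = {}
        exec(self.load(path), namespace)
        return namespace
```

Each call to run() still executes the script from scratch with fresh data, which mirrors the per-request model described above; only the read-and-parse step is skipped on repeat calls.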

Q: What can you share from your articles in php|architect and Oracle Technology Network?

A: I've written two articles for php|architect, the most recent an introduction to regular expressions. Many PHP programmers are scared off by regular expressions, mostly because they are a relatively complex and terse language in and of themselves. That's unfortunate, because when properly used regular expressions are an incredibly powerful tool for analyzing and modifying text (which is a major part of web programming). Hopefully my article dispels some of the mystery surrounding regular expressions.
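As a small illustration of analyzing and modifying text with one pattern, here is a sketch in Python (it is not taken from the php|architect article). The sample text and function names are hypothetical; the pattern syntax is close to the Perl-compatible syntax that PHP's preg functions accept, although the replacement syntax shown is Python's.

```python
import re

# Match ISO-style dates such as 2004-03-15; named groups document each part.
DATE = re.compile(r"\b(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\b")


def find_dates(text):
    """Analyze: return a (year, month, day) tuple for every date in the text."""
    return [m.group("year", "month", "day") for m in DATE.finditer(text)]


def to_us_format(text):
    """Modify: rewrite each date as MM/DD/YYYY and leave other text alone."""
    return DATE.sub(r"\g<month>/\g<day>/\g<year>", text)


sample = "Posted 2004-03-15, updated 2004-04-02."
print(find_dates(sample))
print(to_us_format(sample))
```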

My OTN article is part of their series 'A Hitchhiker's Guide To PHP'. It discusses how to avoid common pitfalls when building large sites around PHP. The series also contains articles by core developers Rasmus Lerdorf and Wez Furlong, so it's well worth checking out.

Q: Describe your collaborations with Sterling Hughes, a core PHP contributor.

A: Sterling and I are good friends. We co-presented a tutorial on performance tuning PHP at the International PHP Conference in 2003, and he was a technical editor on Advanced PHP Programming. He's made contributions to the APC and APD extensions for PHP. Like many of the best technical relationships, our exchange of ideas has had much more impact on our respective projects than any direct collaboration. We're hoping to write a book together later this year.

Q: Your most recent book has a strong endorsement from Rasmus Lerdorf, creator of PHP. Share your experiences with the book and with Rasmus.

A: I met Rasmus for the first time at the ApacheCon 2000 conference. Although I doubt he remembers the meeting, it was the first inspiration I had for writing the APC compiler cache for PHP. Our conversation went something like this:

GS: I'd like to use a compiler cache but the Zend Accelerator is really expensive.

RL: So go write your own. It can't be that hard, the hooks all have to be there.

GS: (panicked) Uhm… yeah, I guess so

And a couple months later, after coming to terms with the fact that he had to be right, APC was born.

Advanced PHP Programming is my first book. The PHP book market is flooded with introductory texts that walk you up from a basic level, but I wanted a book that would be interesting to people that already know PHP well. In the personal reviews I've read on my book, a number of people have said things along the lines of "I didn't think that there was anything to learn about PHP that I didn't already know." If I can please that sort of reviewer, I'll consider the book a success.

Q: What ten compelling tips can you share from the book?

A: In many ways the most critical aspect of being a professional programmer is being organized. These tips are pretty obvious and not specific to PHP, but enough people ignore them that they are worth mentioning:

1) Document your code. "Self-documenting code" tends to mean that you're too lazy to correctly document your code. Undocumented code is hard not only for others to understand but, once you have written any volume of code, for you as well (at least when you become old and forgetful like me). Documentation should be insightful, and guide the reader through potentially confusing logic.

2) Keep your API consistent. If you have a set of functions that require a resource (say a database handle), make sure it always appears in the same position. This greatly reduces human error and keeps people from constantly having to reference your documentation or source code.

3) Organize your code. Organization means several things. First, reduce duplicated code by abstracting it into functions. Second, group similar functions and classes together in include files so that you know where to find what you are looking for. PHP parses and executes all includes at runtime, so don't go overboard with creating a large library tree, but also don't be afraid to have to include 5 or 10 files on every page. It's always easier to combine things than to pull them apart.

4) Use a change-control system religiously. I like CVS, because it's easy to find developers that know it; but Subversion, BitKeeper and SourceSafe are fine options as well. The critical part is that you should be able to easily roll your project back to any point in time and have a complete record of all changes to the source.

5) Test your code. Using Unit Testing consistently is a hard habit to acquire, but well worth it. The largest obstacle to refactoring or enhancing a large code base is that a small change may have unexpected effects throughout the rest of the project. A comprehensive Unit Testing suite is your insurance policy against this uncertainty.
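To make tip 5 concrete, here is a minimal unit-test sketch. It is an illustration only, written in Python with the standard unittest module rather than a PHP test framework, and the slugify helper is a hypothetical function invented for the example.

```python
import unittest


def slugify(title):
    """Hypothetical helper: turn a page title into a URL-friendly slug."""
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Advanced PHP Programming"),
                         "advanced-php-programming")

    def test_strips_punctuation_and_extra_spaces(self):
        self.assertEqual(slugify("  Hello,   World! "), "hello-world")

    def test_empty_title_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


# Run the suite in-process so a failure is reported without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

With tests like these in place, a later rewrite of slugify can be checked by rerunning the suite instead of re-reading every caller.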

On the performance/scalability side of things:

6) Use a compiler cache. I talked a bit about why this is useful above, but it's worth reinforcing that a compiler cache is by far the easiest way to improve the performance of your site.

7) Look for opportunities to use caching techniques in your code. Most dynamic websites are not wholly dynamic, in the sense that their data is often static for seconds, minutes or even hours. Exploit this short-term static-ness for performance benefits.

8) Profile your code. It is pure hubris to think you know where all the bottlenecks are in your code. Using a profiler (like APD or Xdebug) will help you gain insight into the code path taken through your scripts, and where time is being spent.

9) Control your resource usage. Making 100 database calls on a page or making inline calls to remote SOAP services is just crazy. Don't tie your own performance to that of third-party services which you can't directly control.

10) Design your projects for horizontal scalability. Websites grow and shrink in popularity, and your application should be able to grow and shrink with incremental addition and subtraction of hardware resources.
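Tip 7 can be illustrated with a small time-based cache. This is a sketch under stated assumptions, not code from the book: it is written in Python, it keeps entries in process memory only, and the names TTLCache and build_front_page are hypothetical.

```python
import time


class TTLCache:
    """Reuse a computed value until it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so the cache can be tested without sleeping
        self._store = {}     # key -> (expires_at, value)

    def get(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive work
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def build_front_page():
    calls.append(1)  # stands in for database queries and templating
    return "<html>front page</html>"


cache = TTLCache(ttl_seconds=60)
cache.get("front_page", build_front_page)
cache.get("front_page", build_front_page)  # served from the cache
print(len(calls))
```

The time-to-live is the knob: data that may be a minute stale gets a 60-second TTL, while truly per-request data bypasses the cache entirely.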

Q: Can you provide debugging tips?

A: Many people like stepping debuggers (which allow you to step through your code instruction by instruction, inspecting and modifying variables as you go). Xdebug is one of the better PHP debuggers in that sense. In an article I can no longer find a reference for, Martin Fowler contends that the use of modern debuggers actually slows the development of bug-free code. The idea is that by prematurely focusing on minutiae, you tend to lose the forest for the trees. Instead, comprehensive test suites help you quickly find the location of your potential bug.

Q: What future books can we expect from you?

A: There are a few I'm thinking about, but since the proposals aren't finished yet their details are secret for now. What I can say is that they all revolve around PHP and Apache, which are the two technologies I am most fond of.

Q: What are the most important trends to watch, and what recommendations can you provide?

A: Well, I'm not a very good predictor of technology trends. I remember looking at eBay back in 1999 and thinking 'Boy, these guys have no product.' But since you asked...

1) RSS and syndication formats. They change the way the web is traversed.

2) Convergence of scripting languages. We are already seeing this with many of the Microsoft languages compiling to the CLR. Sterling and Thies Arntzen are working on a PHP compiler that targets Parrot, the engine that will power Perl 6. This convergence is powerful as it helps pool development resources to make all the concerned languages stronger.

3) Anti-Spam technologies are on the brink of a revolution. There are many proposed standards on the table including Yahoo!'s DomainKeys, Microsoft's Caller-ID, and SPF. It will be interesting to see where the next year or two takes us there.

4) Look for PHP to continue penetrating the corporate sphere. Although it tends to get little publicity, the number of high-profile companies using PHP for both internal and external web-based applications is growing.

Q: What are your top recommended resources for both businesses and IT professionals?

A: I read tech news voraciously, so it's hard to come up with a short list of resources. I subscribe to several hundred RSS news feeds -- everything from major news portals such as Yahoo!'s aggregated feeds, the BBC and Wired to small weblogs. Some of my favorite sources of information (like the BoingBoing weblog or Bruce Schneier's Cryptogram) aren't really technical resources, but are very interesting and thoughtful reads. Basically, if it's smart and/or funny I'll read it. RSS feeds make it easy to scan through news items efficiently to determine if I'm really interested without spending the entire day surfing the web.

The only resource I can't imagine being without is Google. How I functioned before its invention is lost on me. News feeds provide a broad net for information gathering, and knowing how to use a search engine well allows you to drill as deep as you want into any item.

Q: What kind of computer setup do you have?

A: I use a Linux desktop (Red Hat 7.3, because it works and I don't want to break it by upgrading) and an Apple G4 PowerBook. I've tried to standardize on one or the other, but there are too many aspects of each that I like.

Q: If you were doing this interview, what three questions would you ask of someone in your position and what would be your answers?

A: Q1: Why PHP?

A1: PHP has a number of things going for it in the web space. It's highly portable, relatively fast, and has a very shallow learning curve. Unlike a language like Java, where you really need skilled technicians to program it, you can teach a non-programmer basic PHP in hours. PHP also has an extremely strong community behind it, so there is good assurance that the language will continue to evolve.

Q2: In Advanced PHP Programming, you've chosen to target PHP5, which is still in beta. What motivated that choice?

A2: In truth, I hope the book is equally applicable to both versions of PHP. My goal was to share my experiences on how to write solid, manageable PHP applications. Ninety percent of the book is completely agnostic to PHP versions. Aside from that general philosophy, PHP5's object model is a real step up from that in PHP4, and I wanted to showcase some of the changes.

Q3: When is a final release of PHP5 coming out?

A3: When it's done.

Q: Do you have any more comments to add?

Q: George, thank you again for your time and consideration in doing this interview.

A: Thank you.