Canadian Information Processing Society (CIPS)
 
 

CIPS CONNECTIONS

INTERVIEWS by STEPHEN IBARAKI, I.S.P.

International Authority in Perl, PHP, Java, Lingo, AppleScript, HTML and Web Design; Web Developer, Trainer and Writer

This week, Stephen Ibaraki has an exclusive interview with Matt Zandstra, an international authority in Perl, PHP, Java, Lingo, AppleScript, HTML, and Web design and development, who is also a Web developer, trainer and writer.

Matt is a widely respected and accomplished programmer and writer/author. His company, Corrosive (http://www.corrosive.co.uk), develops enterprise applications, conducts open source and open standards training seminars, and provides technical consultancy. He is the author of SAMS Teach Yourself PHP in 24 Hours. He is currently working on a book about PHP 5 and design patterns. His recent article 'Reflecting on PHP 5' appeared in the January edition of Linux Magazine.

Discussion:

Q: Matt, you are a busy leading authority/specialist in Web programming and design. We appreciate you taking the time to speak with us.

A: Thank you for asking me.

Q: It is interesting that you have a BA (Hons) in Philosophy with Literature from Sussex. Give us a life history, explain how you got into computing, and describe some past positions.

A: Academically I was interested in the humanities, particularly history and literature. In my spare time I played with early microcomputers like the ZX 81 and Dragon 32, first typing in code listings from magazines and later learning the rudiments of BASIC to write my own simple games and homework cheaters.

After I graduated from university I acquired my first Macintosh and fell in love. The GUI was a revelation, information space made tangible. I spent months customizing my desktop and learning every package I could lay my hands on. I became fascinated by desktop publishing and I started to pick up small design jobs.

At last I borrowed money to get a dial-up account and that changed everything. It was as if I'd been staring into a cardboard box for years, and someone opened it up. I didn't know where to start. I signed up for a full-time computer media course at an organization called Bibliotech, learning Photoshop, Quark, and, of course, HTML. The HTML teacher dropped out suddenly after the third week, and I was asked to take over as lecturer and then to help set up their Internet production department.

We did good work at Bibliotech, learning as we went, but it was mainly static – brochure stuff and an e-zine. I wanted to do more: particularly forums and article feedback mechanisms. I bought myself a book on Perl and a book on Unix, and that was it. I couldn't stop.

That was a period of sponge learning for me. Javascript, Java, shell script, Lingo. I was never sure whether I learned to fulfill projects, or I chose projects in order to learn more.

I left Bibliotech and, with my business partner, Max Guglielmino, founded Corrosive, which is where, apart from a fantastic year as Web Manager for Time Out, I have been ever since.

In terms of coding I have moved from breadth to depth, focusing on core enterprise technologies such as PHP, Java and XML, and placing much greater emphasis on the principles of programming and patterns of software design.

Q: You have worked as lead developer on substantial projects for Unilever, the BBC, the Office for National Statistics, Virgin Clothing, Time Out, and the Credit Card Research Group. Share a few “surprising” stories from your work.

A: I suppose the greatest surprises have always come from failures in communication. That was the hardest lesson we learned as a company. For many of our early projects we walked away from initial meetings convinced we knew what we were building, and that the client did too. Then halfway through production, or even later, the client would innocently ask about some feature we hadn't discussed. We'd all pore over the initial requirement documentation and subject every sentence to rigorous literary criticism. Usually the client was absolutely right; the new requirement could indeed be inferred from the high-level specification. It is our responsibility to define requirements with our clients, so we took the hit. We soon bumped communication up our list of priorities.

Q: Can you describe your work at Corrosive?

A: It comes down to talking, planning and writing. All our projects begin with a requirements phase which involves a lot of back-and-forth with the client. This part of a project is often a hard sell. Some clients understandably want to cut to production as fast as possible. It is amazing, though, how much time and money we end up saving by focusing on requirements and design early on.

I am happiest, though, writing code and I still get to do a lot of coding. I especially like it when a project makes the transition from diagrams and mock-ups to functional code. My business partner, Max, is responsible for graphic design and interface, and the moment that a project runs with its clothes on, that is, dressed up with a properly designed interface, is always magic.

I'm doing a lot of writing and training at the moment, so senior coder Tolan Blundell is handling much of the programming. We're both object-oriented design pattern junkies, so it's a good fit. Writing is an important part of what I do. It complements the courses we run and gives me the opportunity to research and try things out.

We started out in a training organization and we've never left that side of things behind. We recently extended our suite of open source/open standards training packages, and I'm busy creating course materials and looking for partners to help expand the service.

Q: What makes your book, SAMS Teach Yourself PHP in 24 Hours Third Edition, a compelling read and essential resource? With so many books on the market, how is it different from the “other” books?

A: A book at this level is hard to pitch right. I want to make it friendly for the novice and still offer a valuable resource for the experienced programmer. I hope I've found the right curve to do that. Certainly the feedback from the previous editions suggested that we got it right in the past, so I'm hoping we have carried it forward.

This edition is updated to cover PHP 5 and that's reflected throughout the book in code examples and design decisions. PHP 5 will have some impact on programming style, with object-oriented techniques becoming more central to many projects. The book reflects this, without abandoning traditional approaches, of course.

PHP 5 no longer comes bundled with MySQL, and the book reflects this, covering the excellent SQLite which is bundled. We also look at the PEAR::DB package which is an excellent way of making projects usable across a range of SQL databases.

PEAR (http://pear.php.net/) is coming into its own at the moment and really expanding the possibilities of the language, so we introduce PEAR and a few of its many packages.

Q: Share five of your high-powered PHP productivity techniques.

A:

  1. Use version control software. This is particularly important if you are working with other programmers on your project, but you'll benefit even if you are a solo programmer. Version control software (like the ubiquitous CVS) does two jobs for you. It lets you back up your project incrementally, so if you take a wrong turn it is easy to roll back and try again. It also merges changes from different locations, making it easy for multiple programmers to work on a project without overwriting one another's work. If you're not already using CVS and this convinces you to start, then I expect a thank-you email when it saves your project or job. Take a look at http://cvsbook.red-bean.com to read more.

  2. Automate documentation. Documenting software can make your project much easier to manage. New programmers will find their feet faster, and you will be able to chase and prevent bugs with greater efficiency. If you leave documentation to the end of a project you won't do it at all. We all know it. So use phpDocumentor (http://phpdocu.sourceforge.net) which lets you add inline documentation as you work, and outputs a polished set of hyperlinked HTML pages which provide a splendid overview of your project.

  3. Automate tests. If you are working with objects and classes, take a look at PHPUnit (http://pear.php.net/package/PHPUnit). By creating suites of automated tests you can catch many bugs as they occur. All projects are networks of interdependency. Changing code in one part of a project can have unforeseen consequences in another. A test framework can give you early warning of problems like this, and help you pinpoint the affected areas.

  4. Organize your code into packages. Break your code into library files, and organize your libraries into separate directories. Use a naming scheme for your classes (PEAR packages use a good one) to prevent name clashes. Place your package directories in one of PHP's include_path directories, and reference them using require_once():

    require_once( "directory_name/classname.php" );

    This makes it easy to share library code between projects, and promotes reuse.

  5. Write housekeeping code. PHP is useful for more than Web scripting. PHP can be run from the command line, too. Take full advantage of that fact and create tools that reduce the drudgery of maintaining a large project. If you are using version control software, then create housekeeping software to handle tasks like installation and for setting values specific to your sandbox.
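Tips 2 and 5 can be illustrated together with a minimal sketch: a command-line helper that fills sandbox-specific values into a configuration template, documented with a phpDocumentor-style comment block. The @@KEY@@ placeholder syntax, the function name fill_template() and the paths are all hypothetical, chosen only for illustration.

```php
<?php
// Housekeeping sketch: fill sandbox-specific values into a config template.
// The @@KEY@@ placeholder syntax and every name below are hypothetical.

/**
 * Replace @@KEY@@ placeholders in a template with sandbox values.
 *
 * @param string $template text containing @@KEY@@ placeholders
 * @param array  $settings map of placeholder name => replacement value
 * @return string the template with known placeholders filled in
 */
function fill_template($template, array $settings) {
    foreach ($settings as $key => $value) {
        $template = str_replace("@@{$key}@@", $value, $template);
    }
    return $template;
}

$template = "db_file=@@DB_FILE@@\nbase_url=@@BASE_URL@@\n";
$sandbox  = array(
    'DB_FILE'  => '/home/example/sandbox/data.db',
    'BASE_URL' => 'http://localhost/sandbox',
);

print fill_template($template, $sandbox);
```

Run from the command line, a script like this could be called by each developer after checking a project out of version control, so that sandbox settings never need to be committed.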

Q: Talk about embedding PHP scripts in Web pages.

A: By its nature, of course, PHP is designed to be embedded in Web pages. Strangely enough, this does not mean that this is always the best plan. For larger projects you should try to separate the logic of your application from its presentation. If you place too much application logic in your web pages you will soon find that code being duplicated from page to page, making your project a nightmare to maintain. You can process a browser's request in a central script, and then delegate to a PHP page for output, or you can allow individual pages to handle requests, delegating application logic to library code. In either case you should try to limit the PHP in your HTML pages to that necessary for outputting data and presenting navigation.
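As a rough sketch of that separation, the example below keeps application logic in one function and limits the presentation function to looping and output. The function names and the sample data are hypothetical; in a real project the first function would typically live in a library file and the second in the page itself.

```php
<?php
// Application logic: lives in library code (hypothetical name and data).
function get_headlines() {
    // In a real project this might query a database.
    return array(
        'Banana sales reach all time high',
        'Domestic banana use beggars belief',
    );
}

// Presentation: nothing but looping and output, with values escaped for HTML.
function render_headlines(array $headlines) {
    $html = "<ul>\n";
    foreach ($headlines as $headline) {
        $html .= "  <li>" . htmlspecialchars($headline) . "</li>\n";
    }
    return $html . "</ul>\n";
}

print render_headlines(get_headlines());
```

Because render_headlines() only receives an array, the source of the headlines can change without any edit to the output code.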

Q: How would you set up a secure PHP environment?

A: First and foremost, never trust external input. It often seems a neat trick to use a string submitted via a pulldown input or similar as the basis for a system call or SQL query. You cannot be sure, though, that a request has come from the Web page you designed; it may have been constructed by someone with their own agenda. So using a submitted string as a system command could have disastrous consequences. Test user input before you use it, and at the very least escape it using functions such as sqlite_escape_string() or escapeshellcmd().
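One way to test input before use is to compare it against a list of known-good values and only then escape it. The sketch below assumes a hypothetical 'report' request field and hypothetical report names; it uses escapeshellarg(), a sibling of escapeshellcmd() that quotes a single argument.

```php
<?php
// Accept a submitted value only if it appears in a list of known-good values.
// The report names and the choose_report() helper are hypothetical.
function choose_report($submitted) {
    $allowed = array('daily', 'weekly', 'monthly');
    // Strict comparison: only exact string matches pass.
    return in_array($submitted, $allowed, true) ? $submitted : null;
}

$input  = isset($_GET['report']) ? $_GET['report'] : 'weekly';
$report = choose_report($input);

if ($report === null) {
    print "unknown report\n";
} else {
    // Even a validated value is escaped before it goes near a shell.
    $command = 'ls -l ' . escapeshellarg("reports/{$report}.txt");
    print "would run: {$command}\n";
}
```

The command is only printed here, not executed; the point is that nothing derived from the request reaches the shell without passing both checks.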

As of PHP 4.2, PHP has shipped with the php.ini directive register_globals set to 'off'. This reverses the previous default behavior which caused any fields submitted in a request to be created as a global variable in your script. Although it can be useful to have a 'name' form field automatically appear as the variable $name, for example, it is also fraught with danger. Malicious users may be able to infect scripts by passing them parameters, thereby creating variables that could overwrite your own. By setting register_globals to 'off' in your php.ini file (or per directory, for example in an Apache .htaccess file; ini_set() comes too late, because the variables are created before your script runs), you can close down this danger altogether. Acquire form data with the $_GET, $_POST, or $_REQUEST superglobal arrays instead.

If you are sharing a server with other users, you may also wish to have your session data saved somewhere other than the default directory, which is usually /tmp and is easily readable by all. You can change the session directory with the session.save_path php.ini directive.

Q: Comment on accessing databases.

A: Database code is often problematic. It's easy to write code that ties you to a particular database server (MySQL, for example, or MSSQL). This may not be a problem if you are writing for your server alone, but if you want to ship your code then you may run into problems when different users need to use the application with different databases.

There are various approaches to this problem, but one of the best is to use the PEAR::DB package. This allows you to abstract your database-specific code entirely. If you use PEAR::DB and avoid non-standard SQL, you should find that your project will run seamlessly with many databases.

Q: How would you create dynamic charts and graphs on Web pages?

A: PHP provides a powerful set of functions for working with images and fonts on the fly. Some of these depend upon external libraries, as well as the way that PHP was compiled, but you can test your set up with the gd_info() function, which outputs an array containing configuration information.
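A minimal sketch of that check, followed by a tiny bar chart, might look like this. The helper name gd_png_available(), the sample values and the output file name are hypothetical; the sketch assumes nothing about how PHP was compiled and simply reports when GD is missing.

```php
<?php
// Report whether the GD extension is loaded and can write PNG images.
// The function name gd_png_available() is hypothetical.
function gd_png_available() {
    if (!function_exists('gd_info')) {
        return false; // GD was not compiled in or is not enabled
    }
    $info = gd_info();
    return !empty($info['PNG Support']);
}

if (gd_png_available()) {
    // Draw a tiny bar chart: one filled rectangle per value.
    $values = array(3, 7, 5);
    $image  = imagecreatetruecolor(90, 80);
    $bar    = imagecolorallocate($image, 200, 160, 0);
    foreach ($values as $i => $value) {
        $left = $i * 30 + 5;
        imagefilledrectangle($image, $left, 80 - $value * 10, $left + 20, 79, $bar);
    }
    $path = sys_get_temp_dir() . '/chart.png';
    imagepng($image, $path); // write the chart to a temporary file
    print "wrote {$path}\n";
} else {
    print "GD with PNG support is not available\n";
}
```

In a Web page the same image could be sent straight to the browser by calling header("Content-type: image/png") and then imagepng($image) with no file name.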

Depending upon your installation's capabilities you can use the image functions to create navigation elements, counters, bar charts, and so on. We cover the basics of this in SAMS Teach Yourself PHP in 24 Hours, but you can also find an excellent presentation online at http://www.nyphp.org/content/presentations/GDintro that covers the topic exhaustively.

Q: Describe the management of state information using cookies.

A: Although you can work with cookies directly in your scripts, the best way to maintain state information from request to request is by using PHP's session support. HTTP is a stateless protocol, so you need to pass information around from server hit to server hit. PHP's session support makes this all but transparent.

To start or resume a session, begin your script with session_start() and then save or retrieve variables via the superglobal $_SESSION array:

// first.html
session_start();
$_SESSION['name'] = "bob";

// second.html
session_start();
if ( ! empty( $_SESSION['name'] ) ) {
    print "hello, {$_SESSION['name']}";
}

Any elements set in $_SESSION are saved between requests, and available throughout the user's session.

Q: Give us more about creating dynamic Web applications using PHP.

A: Many well-organized Web applications are divided into distinct tiers or layers. Each tier has its own distinct responsibilities. A presentation tier, for example, is responsible for displaying data, and for providing channels for user interaction.

The lines of communication between layers should be limited and well defined, so the presentation layer will have little or no knowledge about the mechanism by which the data it is displaying was acquired.

This dislocation can help make systems robust and easy to change. By making the presentation tier independent of the wider system, we can make significant changes in that system without causing an impact on presentation. By the same token, we should be able to alter an interface to work differently (shifting from HTML to Flash, perhaps, or XML) without being forced to make major changes to wider application logic.

However complicated things might become, you will probably need three basic tiers: the presentation tier, responsible for display; the domain tier, where the business of your application gets done; and the data tier, where your application handles data storage and retrieval.

When you sketch out your application, consider how your classes and functions might fit into this model. Try to define clear lines of communication between your tiers.

Although using global variables can be an easy way of storing data so that all parts of your system can access information, they tend to undermine the separation of application tiers. Use them sparingly, and concentrate on passing messages through method or function calls.
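The three tiers can be sketched with one small class each. All class and method names here are hypothetical, and an in-memory array stands in for a real database; the point is only that each tier talks to the next through a narrow method call rather than through globals.

```php
<?php
// Data tier: storage and retrieval only (an array stands in for a database).
class NewsStore {
    private $rows = array(
        array('headline' => 'Banana sales reach all time high', 'type' => 'world'),
        array('headline' => 'Domestic banana use beggars belief', 'type' => 'home'),
    );

    public function findAll() {
        return $this->rows;
    }
}

// Domain tier: application rules; knows nothing about HTML.
class NewsService {
    private $store;

    public function __construct(NewsStore $store) {
        $this->store = $store;
    }

    public function headlinesByType($type) {
        $headlines = array();
        foreach ($this->store->findAll() as $row) {
            if ($row['type'] === $type) {
                $headlines[] = $row['headline'];
            }
        }
        return $headlines;
    }
}

// Presentation tier: display only; receives plain data through a method call.
class NewsPage {
    public function render(array $headlines) {
        $out = '';
        foreach ($headlines as $headline) {
            $out .= '<b>' . htmlspecialchars($headline) . "</b><br />\n";
        }
        return $out;
    }
}

$service = new NewsService(new NewsStore());
$page    = new NewsPage();
print $page->render($service->headlinesByType('home'));
```

Swapping NewsStore for a class backed by a real database, or NewsPage for one that emits XML, would leave NewsService untouched.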

Q: Can you provide debugging tips?

A: Although I have colleagues who swear by their IDEs and debugging tools, I personally tend to rely on well-defined code in conjunction with unit tests. When I'm bug-hunting I'll print or log messages that confirm the contents of variables at key points in the script, allowing me to trace the movement of data in the script from the start of execution through to the point at which the error first became obvious. Generally it does not take long to zero in on the offending code, especially if I have kept various elements of the project orthogonal (that is, independent of one another).

Another trick that aids the bug-hunt is to make PHP's error reporting stricter than you would normally have it when a project is live. You can do this with the error_reporting() function:

error_reporting(E_ALL);

Setting this level of error reporting will cause the engine to report every footling undefined variable, and might highlight problem areas in your code.

Finally, there are debugging tools available for PHP. You can see them listed at http://www.zend.com/manual/debugger.php.

Q: What additional tips can you give from your writings?

A: 1) If you've been scared off XML in the past by the amount of groundwork you must lay down in order to get good results, you should look into SimpleXML, the new XML extension that will ship with PHP 5. SimpleXML can automatically load an XML document, and present it as an object-based data structure. This is much more intuitive than it sounds. Given an XML document like this:

<banana-news>

        <newsitem type="world">

                <headline>Banana sales reach all time high</headline>

                <byline>William Curvey</byline>

        </newsitem>

        <newsitem type="home">

                <headline>Domestic banana use beggars belief</headline>

                <byline>Charles Split</byline>

        </newsitem>

</banana-news>

we can use SimpleXML to access elements with the following simple code:

$simple_element = simplexml_load_file("banana.xml");

foreach ( $simple_element->newsitem as $item ) {
    print "<b>{$item->headline}</b><br />\n";
    print "<i>{$item->byline}</i><br />\n\n";
}

SimpleXML will provide the easiest possible access to XML documents, and it's well worth checking out.

2) When you're debugging your code, don't forget that the print_r() function can be used to dump data in variables. This is particularly useful if the variable you wish to examine is complex: a multi-dimensional array, for example, or an object.

3) The print statement in PHP supports the 'here document' syntax. This involves a user-defined keyword to define the start and end of the string. Everything in between is printed, and variables are replaced as they are in double-quoted strings. This is very useful when you want to print multiple lines in one go, like this:

$num = 42;

print <<<BLOCK

            everything here

            is printed.

            Variables are substituted for their values ($num, for example).

BLOCK;

This can be quicker and easier than concatenating multiple strings into one for printing.

4) PHP's support for objects has been revolutionized in PHP 5. In addition to abstract classes and interfaces, and private and protected methods and properties, the language now supports class type hinting. This allows you to enforce a runtime check of an object passed to a method to ensure that it belongs to a particular type. So if I define some classes to play with:

class Shape {}

class Square extends Shape {}

class Color {}

and create a PHP 5 class intended to work with Shape objects:

class PaintyThing {
    private $shape;

    public function addShape( Shape $shape ) {
        $this->shape = $shape;
    }
}

$thing = new PaintyThing();

The addShape() method includes a class type hint: it asserts that PaintyThing::addShape() will only accept objects of type Shape. We do this by putting the type (Shape) in front of the argument ($shape).

So the following code will work without error:

$thing->addShape( new Shape() );

This code will also work, because the Square class extends the Shape class:

$thing->addShape( new Square() );

But the PHP engine will throw a fatal error if we try to pass a Color object to addShape():

$thing->addShape( new Color() );

// Fatal error: Argument 1 must be an instance of shape in ...

We can use these hints to make code safer, because the PHP engine enforces the object interfaces passed to our methods. 
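When the type of an object is not known in advance, one option is to test it with the instanceof operator before calling the hinted method. The sketch below is a variation on the classes above, not the original example: PaintyThing is changed to keep an array of shapes, and the countShapes() method is added purely for illustration.

```php
<?php
class Shape {}
class Square extends Shape {}
class Color {}

// Variation on PaintyThing: stores every shape it is given.
class PaintyThing {
    private $shapes = array();

    public function addShape(Shape $shape) {
        $this->shapes[] = $shape;
    }

    public function countShapes() {
        return count($this->shapes);
    }
}

$thing      = new PaintyThing();
$candidates = array(new Shape(), new Square(), new Color());

foreach ($candidates as $candidate) {
    // instanceof is true for Shape and for any subclass of Shape.
    if ($candidate instanceof Shape) {
        $thing->addShape($candidate);
    } else {
        print get_class($candidate) . " is not a Shape, skipping\n";
    }
}

print $thing->countShapes() . " shapes added\n";
```

The type hint remains the last line of defence; the instanceof test simply lets the calling code decide what to do with an unsuitable object instead of halting.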

Q: Now provide us with those valuable rare “special gems” that only you know.

A: Well I'm just another coder, so there's very little that 'only I know'. One thing that didn't make the book is the Reflection API. 

The Reflection API is to PHP what java.lang.reflect is to Java. It is a suite of classes that model and work with the classes, functions and extensions in your script at runtime. So given a class called 'TestMe' we can get an instance of Reflection_Class and use its methods to find out about it:

$class = new Reflection_Class( 'TestMe' );

if ( $class->isAbstract() ) {
    print "An abstract class\n";
}

We can get an array of Reflection_Method objects that describe the TestMe class's methods:

$methods = $class->getMethods();

foreach( $methods as $method ) {
    if ( $method->isConstructor() ) {
        print $method->getName()." is a constructor";
    }
}

The Reflection API can give us information about the arguments methods expect, and about the properties of a class. We can even use it to find the file that contains a class's source code, and the start and end line numbers for the class or method. This feature will be useful in the creation of documentation tools, class browsers and systems designed to work with third-party classes at runtime (calling methods according to a naming convention, perhaps).
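A short sketch of those queries follows. Note that it uses the class names from released versions of PHP 5, ReflectionClass and ReflectionMethod, which are written without the underscore seen in the pre-release snippets above. The TestMe class and its methods are stand-ins invented for the example.

```php
<?php
// Released versions of PHP 5 name these classes ReflectionClass and
// ReflectionMethod (no underscore). TestMe is a stand-in class.
class TestMe {
    public function __construct($name, $count = 1) {}
    public function run() {}
}

$class = new ReflectionClass('TestMe');

// Where is the class defined?
print $class->getFileName() . ": lines "
    . $class->getStartLine() . "-" . $class->getEndLine() . "\n";

// What arguments does each method expect?
foreach ($class->getMethods() as $method) {
    $names = array();
    foreach ($method->getParameters() as $param) {
        $names[] = '$' . $param->getName();
    }
    print $method->getName() . "(" . implode(", ", $names) . ")\n";
}
```

A documentation tool or class browser could build its listings from exactly this kind of loop, without parsing any source code itself.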

You can find the full documentation for the Reflection API at http://sitten-polizei.de/php/reflection_api/docs/language.reflection.html.

Other new features that are worth a look are PHP's new support for object cloning and iteration: these and other Zend engine enhancements are detailed at http://www.php.net/zend-engine-2.php.
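As a brief illustration of both features, the sketch below clones an object and then iterates over it with foreach. The Account class and its fields are hypothetical, chosen only to show the mechanics.

```php
<?php
class Account {
    public $owner;
    public $balance;
    private $id;

    public function __construct($owner, $balance, $id) {
        $this->owner   = $owner;
        $this->balance = $balance;
        $this->id      = $id;
    }

    // __clone() runs on the new copy, so the copy can reset any field
    // that must not be shared with the original.
    public function __clone() {
        $this->id = 0;
    }

    public function getId() {
        return $this->id;
    }
}

$first  = new Account('bob', 20, 101);
$second = clone $first;   // a separate object, not a second reference
$second->owner = 'harry';

print $first->owner . " / " . $second->owner . "\n";

// foreach over an object visits the properties visible from this scope,
// which here means the public ones only.
foreach ($second as $property => $value) {
    print "{$property}: {$value}\n";
}
```

Changing $second leaves $first untouched, and the private $id never appears in the loop because it is not accessible outside the class.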

Q: What future books can we expect from you?

A: I am working on something that will cover object-oriented PHP, which should be appearing later in the year. I work intensively with other languages, particularly Java, so I'm waiting for the right topic to come along in that field, too.

Q: What are the most important trends to watch, and what do you recommend?

A:  I don't know about 'most important', but these are the PHP-related issues I have my eye on at the moment:

  1. Object-oriented code with PHP 5. The changes brought about by PHP 5 will mean an intensified interest in object-oriented design, and some intense debates about best practice.
  2. PEAR, the officially supported library of packages to extend the functionality of PHP, will continue to grow in extent and use. The code in PEAR packages will feed back into the best practice debate.
  3. Although XML support still requires a third party library, the SimpleXML extension should make working with XML even easier than it was. More of my projects have an XML element, and I'm guessing that this reflects a trend.
  4. PHP is an HTML-oriented language; it's even part of the name. Now that PHP is always available on the command line, I'm expecting to see more use made of it as a non-Web tool. I'm currently working on a command line utility or two. Another very interesting non-Web application for PHP is PHP-GTK (http://gtk.php.net/).
  5. Now that SQLite is bundled with PHP, we can expect to see more applications using the library. When creating applications at Corrosive we will be developing with both SQLite and MySQL in mind, and we will be using PEAR::DB for our database code.

Q: What are your top recommended resources for both businesses and IT professionals?

A: Well, I can tell you what I've been looking at lately: 

1) http://php.net and http://www.zend.com. These might sound like obvious choices, but the quality of documentation and news on all the official PHP sites is so high that I find myself consulting one or other of them on a daily basis.

2) The Register http://www.theregister.co.uk: I get a lot of general industry news and not a little entertainment here. I particularly enjoy the sarcasm, and where it's partial I often find myself sharing the prejudice.

3) http://c2.com/cgi/wiki: If you're serious about objects and design, you will spend a lot of time here sooner or later.

4) http://www.phppatterns.com: A good resource in itself if you are interested in PHP and object-oriented design. It is made even more useful by the quality of the links included in articles. A ten minute lunchtime read can end up taking half the afternoon, causing much fevered re-evaluation of received wisdom along the way.

Q: What kind of computer setup do you have?

A: I'm typing this using vi on a remote server running an ancient Red Hat distribution. The notes will find their way back to my laptop via CVS. The laptop is a noisy Compaq Presario 2500 running Red Hat 9. The final copy will be entered using OpenOffice. I have various end-of-life Macs and PCs running either Yellow Dog or Red Hat. The Macs generally run OS X as well. We have a laptop running XP in the living room for testing, games and kids' Web sites.

Q: If you were doing this interview, what two questions would you ask of someone in your position and what would be your answers?

A: Q1: Why do you write?
A1: One of the best things about writing is the time one can legitimately spend in coffee bars. I live in Brighton on the English coast, and it specializes in cafes. On a more serious level, I have always done it – it seems natural to do so. Perhaps it flows from teaching. As I prepare course notes, I find I fix my understanding of the topic I am teaching, and over the years it has become second nature to write, even when I'm not teaching the subject in question.

Q2: You mention that you specialize in open source and open standards software. Is there a rationale for that?
A2: I was using open source software before I fully understood what it was. I started working with Perl running on Linux, then I downloaded MacPerl, and worked with it there. I learned how to configure Apache. When I finally read Eric Raymond's 'The Cathedral and the Bazaar', I was amazed and excited. That so many people could organize naturally to produce such fantastic products seemed a miracle. We happen to have most experience of open source tools and environments so that's a strong reason for sticking there, but we also like the freedom it gives us as developers. We don't want to be tied into platforms that demand expensive and recurring license payments from us in order that we continue programming, and we don't want to be forced by one proprietary vendor to work with the products of another. As developers we think that releasing core code under an open source license can only benefit us and our clients. A library will mature as other developers contribute to and work with it. So clients benefit from continuing development while we are freed up to focus on their specific needs. As far as open standards are concerned: after seeing the disaster that proprietary enhancements made of HTML in the late 90s, we like standards, and we want to see them complied with.

Q: Matt, thank you again for your time and consideration in doing this interview.

A: Thank you. It has been an interesting experience!