Stephen Ibaraki: Interviews with leading business and IT experts

http://www.cips.ca

CIPS CONNECTIONS

INTERVIEWS by STEPHEN IBARAKI, FCIPS, I.S.P., ITCP/IP3P, MVP, DF/NPA, CNP, FGITCA

Chat with Leslie Lamport ACM Turing Award Recipient in 2014 (Nobel Prize of Computing); World-Renowned Distinguished Researcher

This week, Stephen Ibaraki has an exclusive interview with Leslie Lamport.

Leslie Lamport first started working with computers using vacuum tubes to build digital circuits while attending the Bronx High School of Science in New York. He went on to get his BS from MIT, MA and PhD from Brandeis, all in Mathematics. In addition, Lamport has been bestowed with honorary doctorates from France's Université de Rennes in 2003, Germany's Christian-Albrechts-Universität zu Kiel in 2003, Switzerland's École Polytechnique Fédérale de Lausanne in 2004, the Università della Svizzera Italiana in 2006, and France's Université Henri Poincaré in 2007. Turing Award

His first employer was the Mitre Corporation and Marlboro College while still a student, then Massachusetts Computer Associates while finishing his PhD and after completion. From 1977-1985 Leslie contributed to SRI International, 1985-2001 DEC/Compaq and 2001-present Microsoft Research.

Leslie's honors include:

National Academy of Engineering (1991)
PODC Influential Paper Award (2000)
Honorary Doctorate, University of Rennes (2003)
Honorary Doctorate, Christian Albrechts University, Kiel (2003)
Honorary Doctorate, Ecole Polytechnique Fédérale de Lausanne (2004)
IEEE Piore Award (2004)
Edsger W. Dijkstra Prize in Distributed Computing (2005)
Honorary Doctorate, Università della Svizzera Italiana, Lugano (2006)
ACM SIGOPS Hall of Fame Award (2007)
Honorary Doctorate, Université Henri Poincaré, Nancy (2007)
LICS 1988 Test of Time Award (2008)
IEEE John von Neumann Medal (2008)
National Academy of Sciences (2011)
ACM SIGOPS Hall of Fame Award (2012)
Jean-Claude Laprie Award in Dependable Computing (2013)
ACM SIGOPS Hall of Fame Award (2013)
ACM A.M. Turing Award (2013, awarded in 2014)

Leslie is a legend in computing circles as evidenced in his significant body of published work. His foundational work in the theory of distributed computing is acknowledged by computing luminaries worldwide including ACM President Vint Cerf — a previous Turing Award recipient; Wen-Hann Wang, Intel's Corporate Vice-President and Managing Director of Intel Labs; Alfred Spector, Google Vice-President of Research, Microsoft co-founder Bill Gates. One of the most cited papers in computing history is his 1978 paper Time, Clocks, and the Ordering of Events in a Distributed System. From the IT World article, "the Turing Award citation notes that Lamport originated causality and logical clocks, replicated state machines and sequential consistency. Along with others, he invented the notion of Byzantine failure and algorithms for reaching agreement despite such failures; he contributed to the development and understanding of proof methods for concurrent systems, notably by introducing the notions of safety and liveness as the proper generalizations of partial correctness and termination to the concurrent setting."

As noted by Bob Taylor, founder and manager of the Xerox Palo Alto Research Center and Digital Equipment Corp.'s Systems Research Center, "The Internet is based on distributed-systems technology, which is, in turn, based on a theoretical foundation invented by Leslie. So if you enjoy using the Internet, then you owe Leslie."

To listen to the interview, click on this MP3 file link

The latest blog on the interview can be found in the IT Managers Connection (IMC) forum where you can provide your comments in an interactive dialogue.
http://blogs.technet.com/b/cdnitmanagers/

DISCUSSION:

Interview Time Index (MM:SS) and Topic

:00:38: When did you hear of this extraordinary honour, recipient of what is widely considered the Nobel Prize in Computing, the 2013 ACM Turing Award? What was the reaction from your colleagues and your family?
"....I received tons of congratulatory emails. It hasn't had any time to affect how they act with me — I don't expect it to have an effect...."

:01:08: How will the Turing Award impact your work, your influence, your thinking?
"....I hope that it might get people to take what I say a little more seriously, especially now when I'm devoting a good part of my activity trying to show people how they should go about programming...."

:01:45: Can you describe what were the catalysts early on to drive your passion towards science? Even in your early history you were building digital circuits in vacuum tubes. What made you do all of those things?
"....An interest in digital circuits came about, if I recall, from a book I just randomly picked out from a library shelf, a book about the mathematics of boolean circuits and that got me and a friend going around scrounging vacuum tubes and trying to build little digital circuits...."

:02:49: Your Paxos algorithm is widely used today and you wrote Paxos made Simple to explain the simplicity of the algorithm. Can you talk more about this Paxos algorithm?
"....I suspect that in five or ten years people will look at it and say 'boy that really is simple, almost obvious, why did he get a prize for doing that'? But that's the way that science progresses....It's an algorithm that is used to implement full power in distributed systems. It's sitting down at a fundamental level that's independent of what the system does and it's a tool that you can use to build lots of different kinds of distributed systems...."

:04:07: Can you talk about how Paxos has influenced Microsoft Research and maybe some of the products that are out there?
"....I'm not sure it has influenced Microsoft Research any more than it has influenced any other part of computer science. People who were interested in distributed computing will know the algorithm and some of them have the interest in modifying it, extending it, handling special cases and all that sort of thing. As far as products, basically any distributed full powered system that people build these days is most likely going to have Paxos in use somewhere inside of it...."

:05:21: Due to the profound nature of the work as you mentioned it's in Windows Azure storage and I guess in the Rest Availability Proxy and the Cosmos data storage and query system. Your work in general has been applied to the Windows server transaction protocol and the modeling in the Oslo platform for model driven applications which was inspired from your work in temporal logic of actions, so I can see all the implications of it within the organization.
"....The way Paxos works, where there are no failures the algorithm is essentially what any reasonably good engineer would have written, the way they would have approached the problem. The difference between using Paxos and not using Paxos is in the engineering system, but in the engineering world if you ask an ordinary engineer or programmer to program a distributed system they'd say you program something that looks like Paxos in the absence of failures and then you would just write some code that would somehow try to handle the failure situation and most likely it would handle some failures and wouldn't handle other failures. What Paxos does is give you a complete algorithm that tells you what to do in any case. If you build your system on top of Paxos then you know that it's not just going to handle the ordinary cases or the cases you are able to think of, you know that it will do the right thing no matter what happens, even in those cases that you wouldn't think of...."

:08:12: I'm thinking of your paper on Time, Clocks and the Ordering of Events in a Distributed System and how that came about, and where you wrote an introduction to a paper called the Maintenance of Duplicate Databases by Thomas and Johnson and you came to some realization that it did not preserve causality and that led you to this idea of Time Stamps and how could it be applied to Complex systems such as banking, airlines and reservations systems. Can you talk more about how that came about – this Time Clocks, Ordering of Events in a Distributed System?
"....I was not writing the introduction to the Johnson and Thomas paper. I received a copy of their technical report and that led me to think about the problem and understand two things. First, they had not gotten it quite right in the sense that their algorithm could seemingly violate causality (that one command issued to the system might have affected a second command, but the way that they ordered the command in their algorithm the second command could have appeared or been executed first before the first command apparently violating causality). I was able to fix that. The second thing I realized is that the algorithm didn't apply just to the particular problem they were solving (that it could be applied to basically implement any distributed system in the same sense that Paxos can be used to implement any distributed system), the difference is that in the Times, Clocks paper there is an assumption that no process failed and the later Paxos algorithm of course considered the possibility of failure...."

:10:23: I guess the applications of these would be things like the banking and airline reservations systems that are out there?
"....That could be an application, on the other hand if you look at banking and airline reservations systems they have to tolerate failure so they would probably be using something like the Paxos algorithm....What was explained in the Time, Clock paper is how you could implement this arbitrary state machine in which things happen one at a time and implement it in a distributed system where you have commands coming from different places...."

:12:06: Another piece of important work was this bakery algorithm as described in your "A New Solution of Dijkstra's Concurrent Programming Problem". Can you talk more about this, the problem of tackling mutual exclusion?
"....Bakery algorithm is theoretically very interesting in the sense that it was the first mutual exclusion algorithm that did not depend on some lower level mechanism, but it's still mutual exclusion. Mutual exclusion means that the mutual exclusion problem is given a system of thoughtful processes; each one of them has some sort of command that wants to be able to execute, to have them synchronized so that no two of them are executing their commands at the same time....It's a very nice algorithm so it makes a nice example that people have used a lot, and it's interesting because of this property of being the true mutual exclusion algorithm that doesn't have any practical impact, any practical use these days...."

:13:52: Another area that's very much famous with this is the Byzantine Generals problem regarding triple modular redundancy. Leslie gets into more detail about resolving the issues and the idea of signatures and so on.
"....After I wrote the Time, Clocks paper the next step was to handle failures....I extended the Time, Clocks work to handle arbitrary faults and delayed the algorithm I came up with which makes use of digital signatures. At the time hardly anybody knew about digital signatures....I had an algorithm that extended the Time, Clocks algorithm and it was published in a little known Journal and almost nobody saw that paper. Shortly after doing that work I moved to SRI and the group at SRI had a NASA contract for building a highly reliable distributed network of computers for flying an airplane. People had discovered that the commonly used algorithms for trying to achieve reliability didn't work in the presence of completely arbitrary failures, particularly that the triple modular redundancy were to handle one failure in three computers and just have them vote on the output, and the idea being that as long as two of the computers were correct they would produce the right output and therefore the vote would choose the right value regardless of the what the third computer did. People at SRI realized that didn’t work in the face of arbitrary failures and if you look carefully at the problem you'll discover that the triple modular redundancy doesn't work in the presence of that. Their algorithm didn’t assume a digital signature and in fact their proof that three didn't work they needed four didn't handle the case where they had digital signatures, so my algorithm actually could handle one failure with just three computers. At any rate we published a paper containing their contribution and my contribution and I realized that was a very important result. The whole nature of the problem, the realization that a computer telling different information to different other computers makes certain things impossible, and I decided that it really needed a nice story to go with it to make it popular so I phrased the problem in terms of Byzantine Generals...."

:19:45: It's a fascinating history and story on the Byzantine Generals and what you were able to achieve and the story behind it. Can you get into more detail about that?
"....I should say that I learned about the value of a story from Dijkstra (a great computer scientist and he started the whole field of concurrent computing)....So when I devised the Paxos algorithm, I thought this is a nice algorithm, this is important, I need a story and so I created a story. Unfortunately that story was not a very good story and it just confused the heck out of people and nobody would have understood the algorithm....Except Butler Lampson read the paper and he got it right away and he went around lecturing telling people that's the way they should build their reliable distributed systems...."

:21:56: You have this TLA and this TLA+ specification language and the PlusCal algorithm and the tools associated with it. You also wrote a book published in 2002 about specifying systems using the TLA+ language and tools for hardware and software engineers. Can you give an overview of the reason behind it and the value?
"....I realized I've come to understand that a lot of the problems (certainly in programming distributed systems, but also in any kind of programming), really have to do with the high level design of what you should do before you start proving. That is, you need to think carefully about what you're doing and write down what you're doing before you start implementing....Through what in retrospect seems like a surprisingly long path, I came to a method of writing specifications that's essentially based on ordinary mathematics (the kind you learn in math classes, sets and functions and things like that), and that language is TLA+. Thanks to some other people (especially my colleague Yuan Yu who wrote a model checker for TLA+), there are tools that designers can use to check their designs or algorithm designers can check their algorithms before they start coding them...."

:25:56: You have this document preparation system that you created. Can you give a little bit of history behind that?
"....I was going to write a book and typeset it myself and I decided that TeX was the right typesetting system to use, but to use TeX I would need a set of macros. For example you would write a macro to produce a bulleted list, so I decided to write a set of macros that I would need in my book and I figured with a little extra effort I could make those macros usable by other people and that was the set of macros that I created for LaTeX...."

:27:10: What are your current research interests?
"....Most of my work is around TLA+ and largely writing about how people should use it, or not so much about using TLA+ as trying to get people to think mathematically about what they are doing because I believe that's the best way of thinking rigorously is using mathematics....Most of the research involved in TLA+ was done years ago, but there's still some things to do. One of the projects that I'm involved with is a real research project which is building a theorem prover for TLA+...."

:29:50: What are the broader implications and applications of your current work?
"....The application of my current work is the TLA work. Regarding the language TLA+, I know that one domain of applicability is in the design of distributed systems and there I can very confidently say that if you are building a large distributed system and you want it to work right and you use TLA+ on it, it will really help you....But the broader message I have I think is applicable to everyone who builds computer systems whether they are little programs or large healthcare systems: it's important to think before you build and to think above the code level and the best way to think rigorously is to think mathematically...."

:32:02: You've done a lot of research over the years and some of the most recognized research ever. There must be some lessons you can distill from doing all of that - can you share any that you may have learned?
"....That's a very difficult question because when I look back at my career I can look at any particular piece of work that I've done and say 'That was just a stroke of luck!'....Looking back it's clear that it couldn't have been just good luck because nobody is that lucky — there's just too many things. What it is that allowed me to have so much more luck than most other people I don't really know. The only thing I can think of is my background in mathematics and the fact that I'm very good at thinking rigorously and mathematically...."

:35:00: Leslie Lamport talks about an important quality that is very important for a researcher to have.
"....There are some problems that are just innately complicated and ugly and don't have any nice solutions, and one of the abilities one needs as a researcher is to be able to tell whether a problem is something that can have a nice elegant solution or is one of those ugly tar pits that working on will get you nowhere. I think I was pretty good at that and at least I didn't waste too much time in tar pits. I don't know how many great problems I missed because I incorrectly thought they were tar pits, but that is one ability that a researcher needs...."

:36:37: You've encountered many people in your past and present, can you name one or two or more that stand out who really inspired you to become the computer scientist that you are?
"....The people who I think influenced my thinking the most, I would single out Richard Palais who was my thesis advisor and who has become a colleague and a friend....Edsger Dijkstra, not as a mentor but through his writings he set a standard of clear thinking that I naturally adopted....I learned a lot from the many colleagues that I encountered throughout my career. I counted that and I had the good fortune to have about a dozen Turing Award winners who I would consider as colleagues in the sense of the people I've had significant technical conversations with...."

:38:42: There's the ACM and they have many resources that have supported your work in some ways so which kinds of assets do you find are the most valuable for your work from the ACM?
"....Throughout my career the ACM Journals have pretty much been the pre-eminent ones....The ACM has influenced me through their Journals and the conferences...."

:39:49: If you were to summarize your long and distinguished career is there a lesson you'd want to share to the broad audience?
"....I think one thing that has contributed a lot to my success is my ability as a writer. This is not so much as good writing, but it's the ease with which writing comes. I can sit down and write and it's a natural thing for me to do....So the one lesson I would want to teach probably everybody in your audience is that writing and writing well is important and takes practice and you should be practicing it all the time....A more fundamental piece of advice is to be a craftsman; that is, everything you do you should try to do well. If you are a programmer you should try to write a beautiful program and if you are writing an email you should try to write a well written piece of email...."

:42:49: Leslie, with your demanding schedule, we are indeed fortunate to have you come in to do this interview. Thank you for sharing your substantial wisdom with our audience.