Stephen Ibaraki: Interviews with leading business and IT experts

http://www.cips.ca

CIPS CONNECTIONS

INTERVIEWS by STEPHEN IBARAKI, FCIPS, I.S.P., ITCP/IP3P, MVP, DF/NPA, CNP, FGITCA

Chat with Jeff Dean ACM Infosys Foundation Award recipient in 2013 (Computing's Top Prize for young innovators); World-renowned researcher — Part 2

This week, Stephen Ibaraki has an exclusive interview with Jeff Dean.

Jeff Dean joined Google in 1999 and is currently a Google Fellow in Google's Systems Infrastructure Group. He co-developed the MapReduce computational framework and is a co-designer and co-implementor of heavily-used distributed storage systems, including BigTable and Spanner. He co-designed and implemented five generations of Google's crawling, indexing, and query serving systems, as well as major pieces of Google's initial advertising and AdSense for Content systems. He has also worked on large-scale machine learning and machine translation software, and has designed and implemented many of Google's low-level software libraries and developer tools.

Prior to joining Google, Jeff was a researcher at Digital Equipment Corporation's Western Research Laboratory where he worked on optimizing compilers, profiling software and hardware, and information retrieval algorithms for the web. He has a Ph.D. and M.S. in Computer Science from the University of Washington and a B.S., summa cum laude, in Computer Science and Economics from the University of Minnesota.

ACM Infosys Foundation Award announcement:
http://www.acm.org/press-room/news-releases/2013/infosys-award-12

To listen to the interview, click on this MP3 file link

The latest blog on the interview can be found in the IT Managers Connection (IMC) forum where you can provide your comments in an interactive dialogue.
http://blogs.technet.com/b/cdnitmanagers/

DISCUSSION:

Interview Time Index (MM:SS) and Topic

:00:43: When did you hear of this extraordinary honour, recipient of what is widely considered the top prize in computing for younger researchers, the 2012 ACM Infosys Foundation Award? How did you feel at the time? What was the reaction from your colleagues and your family?
"....I was quite honored and very pleased especially to win it with Sanjay because we've worked very closely together over the years. A number of colleagues sent me congratulatory emails, which is nice and my eldest daughter who wants to be a computer scientist, thought it was pretty cool...."

:01:20: How will the ACM Infosys Foundation Award impact your work, your influence and your thinking into the future?
"....I think it's always nice to be recognized for the work you've done so I'm very honored and pleased that the community has recognized the work that Sanjay and I have done...."

:01:38: What are your broader life goals that you want to achieve and how will you achieve them?
"....There are a number of different areas where computing can be used to augment what people are able to do to make certain things easier, and to complement human capabilities with things that computers do really well. In general I'm interested in a broad set of problems along those lines...."

:02:17: How did your 2004 research paper MapReduce: Simplified Data Processing on Large Clusters resolve the core computing challenge of scale in keeping up with the demand for a search service?
"....The origin of the MapReduce paper and work that Sanjay and I were working on was redesigning the core indexing and crawling system for Google search services....Prior versions that we worked on had essentially hand parallelized different phases of the indexing process....We kind of squinted at all these different phases and looked at what common interface we could put in so that we could have a nice separation of a library that deals with a lot of the messy issues of automatically parallelizing across lots of machines dealing with fault tolerance, failures and have a nice abstraction for people to write these parallel computations in...."

:04:15: You described the way software can solve what seem to be hardware problems and address or produce the underlying designs that hide the complexities of managing enormous clusters of computers from programmers. Can you get into more detail?
"....Google's computing operations have always been composed, at the hardware level, of lots and lots of relatively modest-sized computers that are themselves not particularly reliable.... So we end up writing software above the hardware level that takes a whole bunch of these machines and allows you to treat them as a reliable cluster of machines even if individual machines fail...."

:05:41: One can say that your work led to the advent of cloud computing or at least many aspects of cloud computing where computing power is provided as a utility to consumers, with all the hardware and implementation details abstracted away. Can you comment on that?
"....I think that kind of sharing of underlying resources across a bunch of disparate applications, jobs and even independent users or groups of users is pretty important when you start to get into large scale because you don't want those resources to sit idle...."

:06:52: Jeff gets into some of the underlying design concepts of MapReduce.
"....The main idea in MapReduce is the abstraction the programmer writes is they typically write two fairly simple functions. They write a Map function which processes a record of some input data and extracts something of interest from it....The Reduce function would be given all the data that I've collected together for a particular MapTile....then lets me go ahead and render that MapTile....And it turns out that if you squint at a lot of problems in the right way you can actually fit a surprisingly large number of different kinds of computational problems in this framework...."

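As a rough illustration of the two-function abstraction described above, here is a minimal single-process sketch in Python. It is not Google's MapReduce API: the names map_reduce, map_feature and render_tile, the sample features and the tile size are all hypothetical, and the real system distributes the map, grouping and reduce steps across many machines.

```python
from collections import defaultdict

def map_reduce(records, map_fn, reduce_fn):
    """Single-process sketch: run map, group values by key, then run reduce."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

# Hypothetical input: (feature name, x, y), coordinates in arbitrary units.
FEATURES = [("road A", 3, 4), ("lake B", 12, 1), ("road C", 8, 9), ("park D", 15, 7)]
TILE_SIZE = 10  # assumed tile width and height

def map_feature(record):
    """Map: extract the tile a feature falls on and emit (tile, feature name)."""
    name, x, y = record
    yield (x // TILE_SIZE, y // TILE_SIZE), name

def render_tile(tile, names):
    """Reduce: stand-in for rendering, here just the sorted features on the tile."""
    return sorted(names)

tiles = map_reduce(FEATURES, map_feature, render_tile)
```

The caller writes only map_feature and render_tile; the grouping step in the middle is what a library would parallelize.
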
:10:20: How about handling failures?
"....Because all the executions are essentially not modifying an existing state, but just producing a temporary state on disk or in some data structures it's fairly easy to throw part of it away and replace it with a newly executed version stored on some other machine...."

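The point about re-execution can be sketched as follows, assuming each task is a pure function of its input. The function run_with_retries, the worker names and the fails_on set that simulates crashes are all hypothetical; a real scheduler would detect failures at run time rather than read them from a table.

```python
def run_with_retries(tasks, workers, task_fn, fails_on):
    """Run each task on the first worker that does not crash on it.

    Tasks only read their input and produce fresh output, so a crashed
    attempt can be thrown away and the task re-run on another worker.
    """
    results = {}
    for task_id, data in tasks.items():
        for worker in workers:
            if (worker, task_id) in fails_on:
                continue  # simulated crash: discard this attempt, try the next worker
            results[task_id] = (worker, task_fn(data))
            break
        else:
            raise RuntimeError(f"no worker could run task {task_id}")
    return results

tasks = {"split-0": [1, 2, 3], "split-1": [4, 5], "split-2": [6]}
fails_on = {("w1", "split-1")}  # assumed: worker w1 crashes on split-1
results = run_with_retries(tasks, ["w1", "w2"], sum, fails_on)
```

Because no attempt changes shared state, the retry needs no rollback step: the failed attempt is simply ignored.
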
:11:55: You've also done some work on file systems working with huge files to be efficiently distributed across thousands of servers. Can you talk more about that?
"....Once you have a large cluster of machines there's a few things that you want to be able to do....You'd like it to be the case that the file system can deal with failures of individual disks somewhat in a similar way to the MapReduce master dealing with failures of MapReduce computations. You'd like the file system master to notice that a particular machine had failed, understand what data was stored on its disk and be able to recover from that...."

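One way to picture the recovery step described above is a master that tracks which machines hold each piece of data and re-copies anything that falls below a target replica count. This is only a sketch under stated assumptions: the function recover_from_failure, the chunk and machine names, and the target of three replicas are hypothetical and are not taken from the interview.

```python
def recover_from_failure(chunk_locations, failed, live_machines, target_replicas=3):
    """Drop the failed machine from every chunk and top up missing replicas."""
    for chunk, holders in chunk_locations.items():
        holders.discard(failed)
        # Machines that are alive and do not already hold this chunk.
        candidates = sorted(m for m in live_machines if m != failed and m not in holders)
        while len(holders) < target_replicas and candidates:
            holders.add(candidates.pop(0))  # stands in for copying the data over
    return chunk_locations

locations = {
    "chunk-1": {"m1", "m2", "m3"},
    "chunk-2": {"m2", "m4", "m5"},
}
recover_from_failure(locations, "m2", ["m1", "m3", "m4", "m5", "m6"])
```

After the call, every chunk is back to three holders and none of them is the failed machine.
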
:13:04: Can you now discuss some of the concepts behind BigTable?
"....BigTable was a system that Sanjay and I and a bunch of other people worked on to essentially give you a high level interface. It's called BigTable because you can kind of view it as a 3-dimensional spreadsheet in some ways...."

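The "3-dimensional spreadsheet" image can be sketched as a table whose cells are addressed by row, column and timestamp. The excerpt does not name the three dimensions, so that reading is an assumption based on the published description of BigTable; the class TinyTable and the sample keys are hypothetical, and the real system is distributed and persistent rather than an in-memory dictionary.

```python
class TinyTable:
    """In-memory sketch: each (row, column) cell keeps timestamped versions."""

    def __init__(self):
        self._cells = {}  # (row, column) -> {timestamp: value}

    def put(self, row, column, timestamp, value):
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if timestamp is None or t <= timestamp]
        return versions[max(eligible)] if eligible else None

table = TinyTable()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 5, "<html>v2</html>")

latest = table.get("com.example/index", "contents:html")
as_of_3 = table.get("com.example/index", "contents:html", timestamp=3)
```

Here latest is the version written at timestamp 5, while as_of_3 falls back to the version written at timestamp 1.
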
:16:04: There are limitations and probably compromises you made with your file system and you solved those problems by incorporating BigTable. Can you talk more about that?
"....The main observation is that you probably want different abstractions at different levels for different kinds of uses...."

:17:42: You have GFS evolving into Colossus, where does it go from there?
"....Colossus is essentially GFS with a much more scalable master for the metadata in the file system....We have a newer system we've worked on called Spanner that essentially is a higher level interface more like BigTable, but allows an abstraction that spans multiple datacenters in different geographic locations...."

:18:41: Jeff shares some additional ideas from the design concepts behind Spanner.
"....One nice property of Spanner is that it builds replication in at a fairly low level...."

:19:52: For example, you have a situation where you have a single machine or a couple of machines or a small cluster and there are issues about failures or performance. What happens when you scale this on a global level? What is the impact of scale on otherwise easy problems?
"....Most of our large scale systems do have to deal with failure and deal with it at a fairly integrated level within the abstraction they are trying to provide. MapReduce deals with failures because when you are running large computations on thousands of machines you are going to get some failures. BigTable deals with failures in similar ways. (If a tablet server ever fails, the master has got logic to go and deal with all the tablets that server was serving.) That's a nice property, that the underlying abstraction can worry about the failure and that frees higher level users of that system to not have to worry about it in their use of the system...."

:21:42: I guess there's a link to performance as well?
"....When dealing with failures you want recovery to be very fast..."

:23:22: Your co-winner of the award is Sanjay Ghemawat. What is your working style when the two of you collaborate?
"....When we're working together we often write code together at the same screen and I find that we're pretty effective because we often have the same view on how to solve a particular problem and when we don't, we throw out a bunch of ideas and we each pretty quickly understand the pros and cons of what the other person is suggesting and can do a pretty good job of assessing what is the right decision to make here. We generally don't have big disagreements, we have data-driven discussions and eventually we settle on one thing...."

:25:23: In what ways does internet growth change what you do?
"....There are large amounts of data and also large amounts of computation you want to apply to that data in order to do interesting things with it...."

:27:26: Jeff you have an incredible research history, very extensive, you've worked on a lot of things which will have a lasting impact. Are there some valuable lessons you wish to share from some of your top research areas in the past?
"....One of the things I enjoy doing is finding new areas to work in and bring the skills that I do have to apply to problems in some other area, because I find that is a really effective way of collaborating with other people who have skills that are complementary to the ones that I have...."

:28:46: What are your current research interests and your future research interests and their broader implications and applications?
"....For the last year and a half or so I've been working and collaborating with a bunch of people on essentially scaling up large-scale machine learning systems...."

:30:33: All of these bits and pieces: Google Now, Knowledge Graph, this Star Trek concept, Kurzweil and this idea of singularity — what do you think about all of this?
"....I think it's kind of a natural evolution of how Google's services should be used...."

:33:45: What do you see as the most difficult challenge in research and what are the areas of controversy?
"....I think having machines do a really good job at understanding natural language — that is a pretty fundamental problem....There is controversy about people worrying about computers looking over their shoulders a lot and understanding that's too much from a privacy perspective...."

:36:05: What do you see as the big challenges, questions and/or problems that will get young people interested and into computing? What do you think will be the big topics in 10 years? What do you think this research will look like?
"....I think in 10 years there will still be huge unanswered questions and interesting research to do in the area of building truly intelligent computing systems....It's really an exciting field because you can have a group of 2, 3 or 4 people produce something that is used by hundreds of millions or billions of people using your software and benefiting from the capabilities that it has. There are very few careers where small groups of people can have that kind of influence..."

:38:07: What are the things that have inspired you or created challenges for you in your education at the University of Washington and the University of Minnesota that were inflection points in your lifetime of contributions, and then extend that to your work experiences and perhaps to your mentors?
"..... I think that was a freak occurrence that I ended up stumbling across this source code that this person had published. That got me into programming because then I could modify the program to do what I wanted and I would learn a lot this way....At the University of Minnesota in my senior year I took a Parallel Computing class as an elective from Vipin Kumar who was a new professor at the time (he is now Department Chair). That got me interested in the field of parallel computing....Everyone I've worked with has taught me something. That is something you want in a career — interesting people around you that have different skills so that you can pick up what they know...."

:43:48: This is a pretty broad question. You choose the topic area. What do you see as one, two or three top challenges facing us today?
"....Figuring out ways to get the entire world connected to the internet, and along with that have everyone in the world have access to high-quality educational materials....Individualized medicine where we are able to understand a tremendous amount of past data and being able to look at outcomes from analysing large datasets from people with similar conditions to the ones you have....If we can build systems that can learn from experience and can learn very quickly and do a good job of understanding perceptual data and natural language that's going to really open up a lot of interesting applications that don't exist today purely because the underlying capabilities aren't good enough...."

:46:50: Jeff, with your demanding schedule, we are indeed fortunate to have you come in to do this interview. Thank you for sharing your substantial wisdom with our audience.

Music by Sunny Smith Productions and Shaun O'Leary