Canadian Information Processing Society (CIPS)



Chat with Sanjay Ghemawat ACM Infosys Foundation Award recipient in 2013 (Computing's Top Prize for young innovators); World-renowned researcher — Part 1

This week, Stephen Ibaraki has an exclusive interview with Sanjay Ghemawat.

Sanjay Ghemawat has been at Google since 1999. He is currently a Google Fellow in the Systems Infrastructure Group. He has worked on many distributed systems (MapReduce, BigTable, GFS, Spanner), performance tools, indexing systems, compression schemes, memory management, data representation languages, RPC systems, and other systems at Google.

Prior to Google, Sanjay was a researcher at Digital Equipment Corporation's Systems Research Center, where he worked on Java virtual machines, optimizing compilers, and profiling systems. He has a PhD and M.S. from MIT, and a B.S. from Cornell, all in Computer Science.

ACM Infosys Foundation Award announcement:

To listen to the interview, click on this MP3 file link

The latest blog on the interview can be found in the IT Managers Connection (IMC) forum where you can provide your comments in an interactive dialogue.


Interview Time Index (MM:SS) and Topic

:00:48: When did you hear of this extraordinary honour, recipient of what is widely considered the top prize in computing for younger researchers, the 2012 ACM Infosys Foundation Award? How did you feel at the time? What was the reaction from your colleagues and from your family?
"....I know some of the past winners and I was honored to be associated with them. I also feel very grateful especially to my co-recipient Jeff who has been responsible for a lot of the hard work behind this award and all of our colleagues who wished us well and worked well with us over the years and helped us with all the systems we have built. My family was very excited, especially my parents...."

:01:47: How will the ACM Infosys Foundation Award impact your work, your influence and your thinking into the future?
"....My biggest hope is that I will continue to get the chance that I've already had to work on great problems with great colleagues and hopefully more interesting problems in the future...."

:02:16: What are your personal life or career goals that you want to achieve and how will you achieve them?
"....I hope I can continue building systems, programming, working with Jeff and finding challenging things that require a lot of thought behind them to work on in the future...."

:03:06: How did your 2004 research paper MapReduce: Simplified Data Processing on Large Clusters resolve the core computing challenges of scale in keeping up with the demand for a search service?
"....What MapReduce does is that it takes all of the common pieces out of writing a problem and kind of abstracts things away so that the programmer can actually focus on the bits of the problem that they're interested in, and MapReduce takes care of all of the mechanics of partitioning the data and having it be processed on hundreds of computers in parallel...."

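To make the split described in the answer above concrete, here is a minimal single-process sketch in Python. It is not Google's implementation: the names map_reduce, word_count_map and word_count_reduce and the num_partitions parameter are hypothetical, and the "partitions" are plain in-memory dictionaries standing in for separate machines. The point is only that the programmer writes the two small functions while the generic driver handles partitioning and collecting results.

```python
from collections import defaultdict


def map_reduce(inputs, map_fn, reduce_fn, num_partitions=4):
    """Generic driver: the part that is written once and reused."""
    # Map phase: every input record may emit any number of (key, value) pairs.
    # Each pair is routed to a partition by hashing its key, which stands in
    # for sending it to one of several machines.
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for record in inputs:
        for key, value in map_fn(record):
            partitions[hash(key) % num_partitions][key].append(value)

    # Reduce phase: each partition is processed independently of the others.
    results = {}
    for partition in partitions:
        for key, values in partition.items():
            results[key] = reduce_fn(key, values)
    return results


# The problem-specific part: two small functions for counting words.
def word_count_map(line):
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word, counts):
    return sum(counts)


counts = map_reduce(
    ["the quick fox", "The lazy dog", "the end"],
    word_count_map,
    word_count_reduce,
)
print(counts["the"])
```

Swapping in a different pair of map and reduce functions changes the computation without touching the driver.
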
:04:23: In what way can software solutions solve what seem to be hardware problems?
"....Hard drives will crash sometimes, computers will stop working...what our higher-level software does is hides those problems from the programmer by managing failures internally...."

:05:59: What are the underlying designs that hide the complexities of managing enormous clusters of computers from programmers?
"....What should happen is that the software system should provide an abstraction which hides the details of that location from the programmer and internally manage it....This is very helpful because once you provide this level of location independency you can solve a whole bunch of other problems...."

:07:30: How has your work led to the advent of cloud computing, where computing power is provided as a utility to consumers, with all the hardware and implementation details abstracted away?
"....I think a lot of things have come together which have made now the right time for cloud computing to take off. Where our work has contributed a fair bit is that our work has focused on providing practical access to large computing resources such as large amounts of storage, large amounts of computation for dealing with very large data problems...."

:09:11: How do these abstractions make it easy for programmers to take advantage of large-scale computational resources by automatically parallelizing computations across machines and transparently handling failures?
"....Proportionally complicated as the complexity of the problem that the programmer is focusing on goes up, MapReduce itself takes care of the other work, the common work across all of these problems (which is partitioning the work up, collecting the results, handling failures, managing I/O, etc.)...."

:12:30: Can you share the thinking behind a file system that allows huge files to be efficiently distributed across thousands of servers?
"....What GFS (the file system work that you mentioned) was designed to do was to automate the placement and management of the data replicas....We took some shortcuts — we said let's take all the metadata for all of the files that is going to fit into one memory into the memory of one machine, have that machine handle the placement of all of the files on other machines...."

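The shortcut described in the answer above, keeping all file metadata in the memory of one machine that decides where data lives, can be sketched roughly as follows. This is an illustration under stated assumptions, not the GFS design itself: the Master class, its method names, the round-robin placement policy and the default of three replicas are all hypothetical choices made for the example.

```python
class Master:
    """Toy metadata server: all placement information lives in one dict."""

    def __init__(self, servers, replicas=3):
        if replicas > len(servers):
            raise ValueError("need at least as many servers as replicas")
        self.servers = list(servers)
        self.replicas = replicas          # assumed replication factor
        self.chunk_locations = {}         # (filename, chunk_index) -> servers
        self._next = 0                    # round-robin cursor (assumed policy)

    def place_chunk(self, filename, chunk_index):
        """Pick distinct servers for a new piece of a file and remember them."""
        chosen = [
            self.servers[(self._next + i) % len(self.servers)]
            for i in range(self.replicas)
        ]
        self._next = (self._next + 1) % len(self.servers)
        self.chunk_locations[(filename, chunk_index)] = chosen
        return chosen

    def locate(self, filename, chunk_index):
        """Clients ask by name; they never hard-code a machine."""
        return self.chunk_locations[(filename, chunk_index)]


master = Master(["s0", "s1", "s2", "s3"])
master.place_chunk("web.log", 0)
master.place_chunk("web.log", 1)
print(master.locate("web.log", 0))
```

Because clients only ever ask the master where a piece of a file is, replicas can be moved or re-created without any change to client programs.
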
:15:06: You already talked about some of the concepts behind MapReduce. Can you drill even deeper about what are some of the underlying design concepts behind it?
"....Some of the concepts, the Map and Reduce function go back to functional programming many decades ago. The similar functions have been present in functional programming languages for a while and people have explored using them for parallelism before...."

:16:24: You also worked on something called BigTable. Can you share the design concepts behind it?
"....Our earlier file system work was focused around a relatively small number, say a million large files, and we would manage the metadata of these files in one machine. BigTable was motivated by places where this assumption fell down....So now what we have done is we have taken a large collection of very fine-grained pieces of data and spread them across a lot of machines, and have a system take care of management of the read and write operations we want to support for these pieces of data...."

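One way to picture "fine-grained pieces of data spread across a lot of machines" is a table whose keys are divided among several shards, with the system rather than the caller deciding which shard handles each read or write. The sketch below is only an illustration: the ShardedTable class, its methods and the choice of splitting by sorted key ranges are assumptions made for the example, and each shard is a local dictionary standing in for a machine.

```python
import bisect


class ShardedTable:
    """Toy key-value table split into key ranges, one range per shard."""

    def __init__(self, split_keys):
        # split_keys are sorted boundaries; N boundaries give N + 1 shards.
        self.split_keys = sorted(split_keys)
        self.shards = [{} for _ in range(len(self.split_keys) + 1)]

    def shard_for(self, key):
        """Find the shard whose key range contains this key."""
        return bisect.bisect_right(self.split_keys, key)

    def write(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def read(self, key):
        return self.shards[self.shard_for(key)][key]


table = ShardedTable(["g", "n", "t"])
table.write("apple", "row for apple")
table.write("melon", "row for melon")
table.write("zebra", "row for zebra")
print(table.shard_for("melon"), table.read("melon"))
```

Callers use only read and write; which shard serves a key is an internal detail that can change as the data grows.
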
:18:19: What is the impact of scale on otherwise easy problems (e.g., failures, performance)?
"....Suppose your program is going to have to scale up to run on a thousand machines. At some point you can't ignore failures anymore just because they happen enough in a large cluster of machines that your program will never finish if your strategy with dealing with failure is to restart everything from scratch when anything fails....What happens is that even though the average machine responds very fast, the slowest machine for any given query is going to be much slower so your total query-handling time, as far as the client is concerned, is going to be pretty poor because it's going to be gated by the performance of the slowest machine...."

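The slowest-machine effect described above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that each machine is independently slow on a given request with probability p_slow and that a query must wait for every machine it touches; the function name and the 1 percent figure are hypothetical, not numbers from the interview.

```python
def prob_query_hits_slow_machine(p_slow: float, fan_out: int) -> float:
    """Chance that at least one of fan_out machines is slow for a query.

    Assumes machines are slow independently with probability p_slow, so the
    chance that all of them are fast is (1 - p_slow) ** fan_out.
    """
    return 1.0 - (1.0 - p_slow) ** fan_out


# Illustrative numbers only: each machine is slow 1% of the time.
for fan_out in (1, 10, 100, 1000):
    print(fan_out, round(prob_query_hits_slow_machine(0.01, fan_out), 3))
```

Under these assumptions, a query that fans out to 100 machines waits on at least one slow machine about 63 percent of the time, even though any single machine is slow only 1 percent of the time.
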
:20:28: How does your close relationship with real products influence your research decisions?
"....The big challenge here is to make sure that the research you do, the systems you build to handle these problems are not too focused on the immediate problem. You have to step back and abstract a little bit so what you design for this particular product or problem that you are running into is that the design itself is more general and is longer lasting than what you need for just this product...."

:21:56: You've talked about this relationship between real products and your research. Can you name some of the products that you've influenced and in what way?
"....One of the key pieces of web search is the query serving system....We took a step back and said what are the patterns that are in all of these programs that we are writing, and the result of that was the MapReduce programming model. We were able to formalize the common parts away so that the problem specific parts could be extracted into simple Map and Reduce functions and everything else could be written once and then reused for different combinations in the Map and Reduce functions...."

:23:44: You've touched on this but let's drill further. Why are failures the hard part of distributed systems?
"....Let's go back to MapReduce....You run the computation for A on one machine, you run B's computation on another machine. You don't allow them to communicate with each other directly and if one of them fails you can restart it without having to worry about restarting other pieces of work. Of course there are other mechanical things you will need to add to MapReduce to deal with failures....All of these things add up and they add a fair bit of complexity to a system like MapReduce, but one of the goals of MapReduce is to take on all of that complexity and at least remove it for the programmers who are using it...."

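The restart idea in the answer above, where pieces of work do not communicate directly so a failed piece can be rerun on its own, might be sketched like this. It is a simplified single-process illustration: run_tasks, make_flaky_worker and the max_attempts limit are hypothetical names, and a raised RuntimeError stands in for a machine failure.

```python
def run_tasks(tasks, worker, max_attempts=5):
    """Run independent tasks, rerunning only the ones that fail."""
    results = {}
    for task_id, payload in tasks.items():
        for _attempt in range(max_attempts):
            try:
                results[task_id] = worker(payload)
                break  # this task is done; other tasks are unaffected
            except RuntimeError:
                continue  # simulated machine failure: retry this task only
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results


def make_flaky_worker(fail_first_n=1):
    """Build a worker that fails the first fail_first_n tries per payload."""
    attempts = {}

    def worker(payload):
        attempts[payload] = attempts.get(payload, 0) + 1
        if attempts[payload] <= fail_first_n:
            raise RuntimeError("simulated machine failure")
        return payload * payload

    return worker, attempts


worker, attempts = make_flaky_worker()
results = run_tasks({"A": 3, "B": 4}, worker)
print(results, attempts)
```

Each task here fails once and is retried once; neither task's failure forces the other to start over, and the caller sees only the final results.
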
:27:01: Your co-winner of the award is Jeff Dean. What is your working style when the two of you collaborate?
"....Jeff and I have been working together for many years and I love working with him. What we do is we typically sit at the same desk and we program....We argue the pros and cons of different approaches, spot mistakes that might be creeping in....very, very fast communication and back and forth...."

:28:48: In what ways does internet growth change what you do?
"....The systems we've been building are fairly practical systems, practical in the sense that we will make tradeoffs based on reality and assumptions about different numbers....You build systems for one scale, then a few years down the road the scale has changed by a factor of 10 or 20, and at that time you start having to think again about whether we should redesign, re-implement or if we can extend the lifetime of our system...."

:30:40: You talk about this updated system; can you talk about the concepts behind this updated system?
"....This updated file system is not my work. I was busy working on some other systems, but the design was in some sense fairly similar to GFS design....The metadata for Colossus (the new file system), is also spread across many machines so therefore now you can scale the system up so that it can handle hundreds of millions of files easily...."

:33:00: How does the work you've done extend to other domains (for example other fields even beyond computing)?
"....This is the thing that we had to be a bit careful about when we were designing it, to make sure that we took enough of a step back away from the search specific problems that we were trying to solve to come up with something which could extend to other places. It has been very helpful to us to have done that...."

:35:07: What are the practical applications and implications of your work influencing our daily lives?
"....Most of the systems we build are never directly exposed to end users....But there are some impacts that we can point to. For example, when Gmail was initially launched many years ago, it guaranteed a gigabyte of storage per user. It was launched on April 1 and most people thought this was an April Fool's joke because people couldn't believe that we would be able to afford giving a gigabyte of storage to each user....This was made possible because we had worked hard on the infrastructure to manage this large amount of data efficiently...."

:36:44: Can you additionally profile your extensive research history, its lasting impact and any valuable lessons you wish to share from your top research areas?
"....It kind of goes against something people often say which is 'don't reinvent the wheel', but sometimes if you are having problems with your wheel you have to step back and think about how do I fix the problem, and sometimes reinventing the wheel is the right thing to do...."

:39:25: What are your current research interests?
"....If you look at Google and all of the services that Google provides, behind them there are a whole bunch of other internal services. User-facing services are composed out of these lower level services, thousands of them....I'm trying to figure out what is a good way to simplify the deployment of such a collection of internally related services. There really isn't anything concrete there yet....Some compression schemes, some domain-specific data that we have but again it's preliminary work and there's not much more to say at the moment...."

:40:59: Can you extrapolate some of the broader implications and applications of your current work?
"....We are running into complexity and management issues and we are trying to figure out a simple way of dwindling the burden of this management for people...."

:42:22: Both as part of this Google team, but also if you were able to go in any direction, what would be your future research interests?
"....It's a tough question for me to answer because I've always had a tough time formulating and working on a problem in abstract. I am much better if I can look at problems that either I'm encountering or people around me are encountering and take a step back from that and have the research fall out of the real problems that we are running into...."

:43:52: The Knowledge Graph....Google Now....Google User Research Labs....What is your thinking on these areas?
"....You mentioned a whole bunch of interesting stuff that is going on at Google and at other places too. I'm not personally working on any of them at the moment but I'm really excited about what people are working on....Promising things are happening and I suspect that five years down the road we'll be in a very different place than we are today, especially when it comes to interactions of computers and how much they understand about what we are asking them...."

:47:00: What are your most difficult challenges in research and what valuable lessons do you wish to share?
"....One challenge is managing complexity....Other challenges....when thinking of designs for systems, it's really good to be able to work them out with pencil and paper and to toss out a whole bunch of designs and see which ones will work out...."

:52:04: Are there areas of controversy in the areas that you research?
"....One of the things is we are moving into a multi-core world: most processors that people have, even on phones, have 2, 4, 8 processors in them. It's up in the air as to what is the best programming model for multi-core...."

:53:27: Describe the types of research being created or updated that will drive our experiences in five or ten years? Can you paint a picture for our audience what this experience may be like?
"....I think you did a really good job of answering this question just a few minutes ago when you started describing the vision of a Star Trek computer....Five, ten years down the road I suspect that most people will not be interacting with computers using keyboards....Moore's Law is still going. We are squeezing more transistors into chips. People who know better than I tell me that's not going to continue for much longer either, and that's going to cause us to make significant changes in how we build systems...."

:55:55: What specific challenges in your education at Cornell and MIT were catalysts to inflection points in your lifetime of contributions and how/why did this happen?
"....The thing I remember most from Cornell was I was taking a distributed systems class and really enjoyed the thinking and the papers that we were reading there, and that is what got me interested in distributed systems....I find myself fortunate that I lucked out at going into the right places where I could develop the skills that I needed and found the people that I had a chance to work with...."

:57:28: What specific challenges in your work were catalysts to inflection points in your lifetime of contributions and how/why did this happen?
"....One of the catalysts for research on different projects that Jeff and I worked on were hard problems we were running into. Let me mention GFS....There's a similar story behind MapReduce — as I mentioned already we were working on stuff that looked very similar to programs that we had written before and we were kind of writing the same thing over and over again. The catalysts were the pain we were dealing with...."

:59:21: Sanjay, you laid many of the foundational pillars in your pioneering work. Distilling from your experiences, what are the greater burning challenges and research problems for today's youth to solve to inspire them to go into computing?
"....There's a tension here today, you might say that we should highlight the great research problems in computer science and that might draw young people to be interested in computer science. There's an alternative path that you can show people that computer science or software programming at least, is about building things. It's a good idea to get people involved in building things and show them how much fun it is....I really like building things and communicating that to young people might be a good way to get them interested in computing...."

:01:01:34: Past, present, and future: name three who inspire you and why is this so?
"....The first person I'd like to mention is going pretty far back when I was growing up in India, my uncle....The other person is Professor Liskov who was my advisor when I was in grad school, I learned a lot both directly from her and being part of her group....The third person I'd like to mention is Jeff Dean who is the co-winner of this award. We have had an amazing productive relationship for the last fifteen or sixteen years since before we joined Google, we worked at the same lab...."

:01:04:00: You choose the topic area and it could be in any domain. What do you see as the top challenges facing us today?
"....I think we could have a huge impact on the world if we can figure out how to make computing more accessible to more people. This is not a challenge about computing, it's more about infrastructure and availability and so forth...Another interesting challenge that's coming up is in the field of biology....I suspect that in the next 10 years or so, there's going to be huge opportunities in the combination of computing and biology in terms of improving our health...."

:01:06:24: Sanjay, because you have had such a long and distinguished career, do you have any lessons that you want to share with the audience?
"....I have one important lesson which was important for me....Find something you like to do, find people you know you like working with and then work with them...."

:01:07:15: Sanjay, with your demanding schedule, we are indeed fortunate to have you come in to do this interview. Thank you for sharing your substantial wisdom with our audience.


Music by Sunny Smith Productions and Shaun O'Leary