CIPS Connections

6/26/2003 11:01:30 AM
A Discussion with Mike Daconta, Chief Scientist, APG, McDonald Bradley, Inc.
Interview by Stephen Ibaraki, I.S.P.

This week, Stephen Ibaraki, I.S.P., has an exclusive interview with the world-renowned Michael C. Daconta, Chief Scientist, APG, McDonald Bradley, Inc. (www.mcbrad.com).

Among his many talents, Michael is a developer, writer, and design and architecture guru specializing in such diverse areas as the Semantic Web, XML, XML User Interface Language (XUL), Java, C++, dynamic memory management, and JavaScript. His work can be followed at www.daconta.net.

Discussion:

Q: With your busy schedule, it’s a real pleasure to have you do this interview and share your insights with the audience. Thank you for agreeing to this interview.

A: You are welcome. This certainly has been a busy year, but I enjoy discussing information technology and where it is going. I'd also like to thank you for taking time to do the interview.

Q: Michael, can you describe your latest project work and real-world tips you can pass on?

A: I am currently working on two related projects for the United States Department of Defense: the Virtual Knowledge Base and the Net-Centric Enterprise Services architecture. The DIA's Virtual Knowledge Base is an interoperability framework to integrate heterogeneous data stores (databases, HTML, message traffic, etc.) into a single virtual repository. Net-Centric Enterprise Services (NCES) is a wide-ranging program to transform the United States Department of Defense by improving the horizontal fusion of information.

There have been many lessons learned over the last few years. Here are some tips from those real-world experiences:

• Document versus Remote Procedure Call (RPC)-based web services. This is a critical issue for interoperability. Everyone who creates a web service must specify in the WSDL SOAP binding whether the style is document or RPC; in other words, whether the web service transaction involves XML documents or parameters and a return value for method calls. The RPC style is clearly an XML form of traditional RPC, while the document style makes web services more message-oriented. In terms of interoperability, the additional design required to transact XML documents instead of RPC parameters provides better context, validation and application-independent abstraction to any number of clients. This is a critical component of “net-centricity”, which attempts to eliminate our reliance on “point-to-point” interfaces. Example 3 and Example 4 of the WSDL 1.1 specification (available at http://www.w3.org/TR/wsdl) demonstrate the difference between these two styles. I plan on writing an article that demonstrates the differences between the two approaches more fully. In short, use document-based web services to improve interoperability and addressability of your information systems. RPC-based web services do not exploit the full benefits of XML and offer little or no improvement over CORBA.

• Weak XML Design. A while ago I wrote an article for XML Journal entitled “Are elements and attributes interchangeable?” The article focused on the design issues and tradeoffs in this decision (by the way, the answer is “no”). One point the article tried to make was that many people treat such a distinction in overly-simplistic ways. Because XML-based markup languages are easy to create, sometimes too little thought is put into their design. For example, let’s say I am creating a recursive structure to display a business organization. I could do something like this:

<Employee type="President" name="Joe">
   <Employee type="Vice President" name="Sam">
      <Employee type="Director" name="Bill">
         …
      </Employee>
   </Employee>
   <Employee type="Vice President" name="Harry">
      …
   </Employee>
</Employee>

Over-reliance on the type attribute has a significant deficiency: this document cannot be completely validated by standard validation methods, because the nesting rules depend on the value of the type attribute, and that dependency cannot be expressed in a DTD or XML Schema (though it may be possible in other schema languages). Thus it would be better to model the organization like so:

<President name="Joe">
   <VicePresident name="Sam">
      <Director name="Bill">
         …
      </Director>
   </VicePresident>
   …
</President>

• Maturity and performance of RDF stores are improving. When we initially started VKB this was a stumbling block to adoption. Now there are commercial implementations in addition to increased maturity of the open source offerings. This year will be the year they are ready for prime time.

• People too often confuse taxonomies with ontologies. I have to explain the difference between taxonomies (and topic maps) and ontologies too often. The confusion lies in the fact that a taxonomy may be an ontology if the classes defined follow a formal subclass relation; however, if they do not follow a subclass relation, then the taxonomy is not an ontology. Thus, the key question is whether a defined taxonomy or classification scheme is suitable for inference.

• Web service interfaces and polymorphism. The technique of defining a standard web-service interface which can be implemented by any number of service providers is a powerful use of web services. This implements the object oriented principle of polymorphism in the web-services environment. One example of this technique is the Web Services for Remote Portals (WSRP) Specification from OASIS at http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrp.

• Metadata registries are still in flux. Many people parrot the marketing hype of the web-services “triumvirate” as SOAP, WSDL and UDDI. While SOAP and WSDL are de facto standards, the same cannot be said for UDDI. First, the UDDI classification capability is weak. Second, it has no current or planned support for ontologies. Finally, its information model is business-centric and often uses overly abstract names like tModel, publisherAssertion, and instanceParms. The bottom line is that there is much competition in this space (ISO/IEC 11179, LDAP, ebXML, RDF, UDDI) and the jury is still out.
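
To make the document-versus-RPC tip above more concrete, here is a minimal Python sketch of the two SOAP body shapes. It is illustrative only: the urn:example:org namespace, the getEmployeeTitle operation and the EmployeeQuery document are hypothetical names invented for this example, not taken from the WSDL 1.1 specification or from any project mentioned in the interview.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ORG_NS = "urn:example:org"  # hypothetical application namespace (assumption)


def soap_envelope(payload):
    """Wrap a payload element in a SOAP Envelope/Body pair."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


def rpc_style_request(name):
    """RPC style: a wrapper named after the remote method, one child per parameter."""
    call = ET.Element(f"{{{ORG_NS}}}getEmployeeTitle")  # hypothetical method name
    ET.SubElement(call, "name").text = name
    return soap_envelope(call)


def document_style_request(name):
    """Document style: the body carries a business document that a schema can
    describe and validate independently of any method signature."""
    doc = ET.Element(f"{{{ORG_NS}}}EmployeeQuery")  # hypothetical document type
    employee = ET.SubElement(doc, f"{{{ORG_NS}}}Employee")
    ET.SubElement(employee, f"{{{ORG_NS}}}Name").text = name
    return soap_envelope(doc)


if __name__ == "__main__":
    for build in (rpc_style_request, document_style_request):
        print(ET.tostring(build("Joe"), encoding="unicode"))
```

In the RPC shape the client has to know a method name and its parameter list; in the document shape it only has to know the document vocabulary, which is the property the tip credits with better interoperability.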

Q: What do you see on the horizon that businesses and IT professionals “must” be aware of to be competitive?

A: Here are the top five things IT professionals and executives should be examining now:

• Portals and standard, reusable portlets (JSR 168) – Portals are web aggregation points for specific communities. They are also great vehicles for organizations to implement business process reorganization (BPR), Enterprise Application Integration (EAI) and Enterprise Information Integration (EII) all in a single project.

• Ontologies and axioms (specifically OWL and the UML profile for OWL). As is made clear in our just-released book, ontologies have really come of age and are ready for prime-time use. Both the revised RDF specifications and OWL will become W3C recommendations within the next few months.

• Inference and rule engines. While ontologies provide a formal fact base, rules can be used to infer new information and perform actions on the knowledge base. Although there are no current standards in this area, there are some promising efforts like RuleML and some standards efforts on the drawing board.

• Web service orchestration (OWL-S) and Web Services for Remote Portals (WSRP). Web services will proliferate both on intranets and the internet. If done correctly (see the comment above on RPC versus document-based web services), they will be a catalyst for the next phase of data evolution, which is the semantic web (more on this later). Once these services are widespread, they will need to work in concert both in terms of workflow and in terms of semantic interoperability. This requires a much higher degree of data modeling for discovery and orchestration. The classic orchestration example is the travel service which unites flight, auto and hotel booking. In addition to orchestration, using web services as portlets is important to allow reuse among various portals, and this is the goal of the WSRP specification underway in OASIS (www.oasis-open.org).

• Metadata registries and the incorporation of ontologies into them. In February I attended a metadata registries conference in New Mexico. It was a very good conference that really highlighted both the fragmentation and demand in this emerging space. The next step in these registries is to integrate the emerging semantic web concepts into the registry (specifically ontologies). So far, having looked at all the available technologies, I am most impressed with the ebXML information model. Additionally, I was told by the technical lead of that effort that ontology support would be added to the next version of the registry.
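
The point about inference and rule engines in the list above can be sketched in a few lines of Python. This is a hand-rolled forward-chaining loop over (subject, predicate, object) triples, not RuleML, OWL or any real rule engine; the two rules and the sample facts (Bill, Director, Manager, Employee) are assumptions chosen purely for illustration.

```python
def infer(facts):
    """Apply two rules repeatedly until no new facts appear.

    Rule 1: (A subClassOf B) and (B subClassOf C)  =>  (A subClassOf C)
    Rule 2: (x type A)       and (A subClassOf B)  =>  (x type B)
    """
    facts = set(facts)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    new.add((s, "subClassOf", o2))  # Rule 1
                elif p == "type":
                    new.add((s, "type", o2))  # Rule 2
        if new <= facts:  # fixed point reached: nothing new was derived
            return facts
        facts |= new


# Hypothetical fact base: a small class hierarchy plus one individual.
base = {
    ("Director", "subClassOf", "Manager"),
    ("Manager", "subClassOf", "Employee"),
    ("Bill", "type", "Director"),
}

if __name__ == "__main__":
    for fact in sorted(infer(base) - base):
        print(fact)
```

Only the three base facts are stated; the statements that Bill is a Manager and an Employee are derived by the rules. The derivation is sound only because the hierarchy follows a formal subclass relation, which is the same condition that decides whether a taxonomy is suitable for inference.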

Q: Can you give specific predictions for businesses about where technology is going in two years, and in five years in the following areas: telephony, security, pervasive computing, networking, the desktop, the web, data storage, Voice over IP, IPv6, and other areas you feel require consideration?

A: Though I do not consider myself an expert in all these areas, I will do my best to give an assessment of where they are going.

• Telephony – the most interesting aspect of this area is the integration of cell phones with the PDA, network connectivity and digital cameras. I wrote a J2ME pitfall in the More Java Pitfalls book! It was great to do some code in this exciting area. If you have not done any J2ME development, I highly recommend it. You don’t need to have a device to code in this area, as the emulators are nicely done. Other exciting developments in the telephone space are Push-to-Talk (PTT) features, bandwidth expansion, GPS integration, downloadable code, and mobile code. This space is well-suited to semantic web technologies and web-services; for example, the CC/PP RDF vocabulary.

• Security. Obviously web services security is very hot right now. Kevin Smith discusses SAML, XACML and other XML security technologies in our new Semantic Web book. In fact, Kevin is doing a talk at JavaOne this year on web services security.

• Pervasive computing – this is defined differently by different people. In general, the goal in this area is for computing to become transparent: in everything, networked and taken for granted.

• Networking – For those who don’t have broadband at home – I highly recommend it and it is worth the cost. The best feature of high-speed internet is not actually the speed of the line but instead is the “always-on” aspect of broadband. It makes the internet much more useful to not have a separate “connect” phase. Of course, the speed will continue to improve and that will open us up to better multimedia, cheap video conferencing, and better collaboration software.

• Desktop computing – with the rise of pervasive computing, the desktop computing arena is waning. It is more important to seamlessly connect our plethora of devices than to concentrate on a single device; thus Apple has it right with the digital hub concept.

• The World Wide Web – Tim Berners-Lee’s vision will come true. We will have a Collaborative web and a Semantic web. More on this later.

• Data storage – we are pushing towards Terabytes on devices the size of a postage stamp! This is one of the reasons the semantic web is a must! Our storage capacity has significantly outstripped our ability to manage the information we can store. We have spent the last several decades ramping up and refining our ability to create information (in a plethora of mostly proprietary formats) without considering the discovery problem. Vannevar Bush must be rolling over in his grave. This is our most pressing imperative and we ignore it to the detriment of productivity and progress.

• Voice over IP – I am not very knowledgeable in this area. In general, this technology does not seem to be a big issue as long as long distance rates keep dropping. Competition in the long distance market has been great and has consistently reduced the price for consumers. If only we could get that same competition in the local and broadband space. FCC, are you listening?

• IPv6 – I expect a graceful transition to this based on a dual capability path in equipment and equipment upgrades. We have to do this, and IT managers should plan this into their next equipment upgrades for routers and other network equipment.

• 64-bit computing – I am surprised that 64-bit computing does not get more press. It is an absolutely necessary transition in the next year or so for power users. The transition in the mainstream will be within 2 years (but I hope sooner). The bottom line is that we are bumping up against the 4GB limit of 32-bit addresses for memory-intensive applications like high-fidelity games and multimedia. It is disappointing that the major vendors and Microsoft have not made this switch sooner. Let’s not wait until there are massive complaints to get 64-bit computers and applications out there. We have known about this for a long time, so let’s make the switch and get it over with.

• XML/RDF data stores – I predict we will see more of the big players get into this space. Especially all the database vendors. If they do not, the open source variants will eat their lunch as these data stores gain prominence. This is especially important in the registry space.

• Ontologies – as we discuss in detail in our book, these are truly the next big thing in data modeling, model-driven architecture, expert systems and the web. With OWL, they will cross the chasm from early adopters to mainstream adoption.
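
The 4GB limit mentioned in the 64-bit computing item above is simple arithmetic, sketched here in Python. The sketch assumes byte-addressable memory and a flat address space; real operating systems reserve part of that space, so the memory usable by one application is smaller.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB)  # 4
    print(addressable_bytes(64) // GIB)  # 17179869184
```

A 32-bit address can name 2^32 bytes, which is exactly 4 GiB, while a 64-bit address raises the theoretical ceiling by a factor of 2^32.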

Q: As an international expert in Java, can you share insights and tips from your latest book? What prompted you to write this book?

A: In March of this year, the book More Java Pitfalls was released by John Wiley and Sons, Inc. I am proud of this book and the team that worked on it because we worked hard at perfecting how to write about pitfalls. Some people seem to think that writing about pitfalls is a slam on Java – that is not true. Every complex system has pitfalls for inexperienced and average users (in our case, programmers). Sometimes it is the system designer’s fault and sometimes it is the user’s fault. In the introduction to the book, I display a taxonomy of pitfalls, shown in the figure below.

See Figure 1 for the diagram.

Looking at the pitfall taxonomy, the initial split is between the programmer’s fault and the platform designer’s fault. So, pitfalls are not a slam on the Java platform; instead, they are the sharing of our hard-won experience with other programmers. That brings me to the next key point about the book: its use in peer mentoring programs. Peer mentoring is a concept I have been talking about at my company for a while. In fact, there is an article about peer mentoring on the McDonald Bradley website.

One of the best uses of the pitfalls book is in a formal mentoring program. For example, a pitfall is a great discussion item for a brown-bag lunch training session that gets programmers together in an informal session.

As for tips, the whole book is a set of 50 pitfalls and the tips that resolve or work around each pitfall. There are several sample pitfalls posted on the companion website at: http://www.wiley.com/legacy/compbooks/daconta

Q: There’s considerable momentum around the “Semantic Web.” Can you describe your book on this topic and why it’s important for IT professionals and businesses to be aware of this concept and related technologies?

A: About two weeks ago, John Wiley & Sons, Inc released my new book titled The Semantic Web: A guide to the future of XML, Web Services and Knowledge Management. I feel our book is truly unique on this subject in the following ways:

1. It is the First and Only Semantic Web Book for Technical Managers and Development leads. The authors are senior technologists who offer a critical analysis of the technologies and strategic guidance to senior Information Technology Professionals. The authors are not cheerleaders for the technology, which allows them to give honest, accurate assessments. All previous books on the subject have been graduate-level or implementation-level books for developers.

2. It explains all the pieces of the Semantic Web and how current technologies fit into the puzzle. Specifically, it discusses XML and its family of specifications, RDF, RDFS, Web Services, Taxonomies and Ontologies. No major web specification is left unanalyzed.

3. It reveals the Return-on-Investment (ROI) for Semantic Web Initiatives and other aspects of the business case for semantic web technologies.

4. It gives your Organization a clear, step-by-step Roadmap to preparing for and implementing Semantic Web technologies today.

5. It introduces, explains and explores the implications of new transformational concepts like the “Smart Data Continuum”, “Semantic Levels”, “Non-contextual Modeling”, and “Combinatorial Experimentation”. These new concepts are inventions of the authors, derived from their real-world experience, and their applicability to today’s business environment is demonstrated.

I’d like to take a moment to highlight the concept of the smart data continuum as it is particularly relevant to understanding both what the semantic web is and how it is just another step along a larger path of the evolution of data fidelity. Below is the diagram of the smart data continuum.

See Figure 2 for the diagram of the smart data continuum.

While the smart data continuum is covered in more detail in my book, I will summarize the key points here. The smart data continuum reveals the path of data evolution along a continuum of increasing intelligence. So, in a nutshell, the semantic web is a shift in moving the intelligence (or smarts) from applications to the data. The first major development along this path was in the 1970’s when the term GIGO, or garbage-in-garbage-out, characterized the dependence of programs on correct data. This elevated the importance of data in programs and led to object-oriented programming languages. In the 1990’s, the advent of HTML and the success of the World Wide Web caused another shift in the data continuum, from proprietary schemas to open schemas. Today, with web services, we are again undergoing a shift in the data evolution, from interoperable syntax to interoperable semantics. So, the figure above highlights the progression of increasing intelligence from proprietary schemas (office documents, databases), to open schemas (XML), to multi-vocabulary schemas (taxonomies and namespaces) and finally to inference and automated reasoning (ontologies).

6. Lastly, the book is taken from the authors’ real-world experience transforming the Department of Defense (DOD) and Intelligence Community into a net-centric environment. The principles and practices espoused in the book are guiding the next-generation military towards net-centricity.

Q: Can you share more of your ideas around the Semantic Web?

A: Here are a few other ideas I am either actively experimenting with or exploring:

• Semantic chains – one problem with metadata is that it is potentially infinite. Unfortunately, you cannot burden every potential application with metadata layers that it has no interest in. Thus, the rule with metadata must be to provide “just enough”. That is, an XML or XML/RDF document must provide a minimal set of metadata to enable the most common processing of that data; however, the document must include a reference or link back to a larger pool of metadata which provides context for the data or clarification of the data. In turn, that pool may link back to another, possibly yet larger, pool of metadata to provide yet more context, and thus we have a semantic chain of metadata pools stretching back as far as it needs to go.

• Multitagging – yet another problem with metadata, especially in application areas that require cross-domain processing, is the fact that content can only be marked up (or governed) by a single DTD or Schema. In essence, each markup language can be viewed as a perspective on a piece of content. If there are multiple perspectives on that content, the only current way to achieve capture of those perspectives is to either transform the document into multiple separate markup languages (where the content itself is repeated) or attempt to nest different tags within a single document. Unfortunately, nesting breaks down due to the fact that it is illegal for tags to be interlaced (and rightly so). Therefore, I have created a simple XML format that allows markup language tags to be treated as separate layers on the content. The analogy would be to think of tags as an overlay (like acetate) on top of the content and thus multitagging will allow us to have multiple overlays overlaid on top of a single body of content. This is especially relevant for applications involving automated tagging, re-tagging and temporary tagging. After a successful prototype, I plan on releasing the specification to the W3C as a Note.

• Asymmetric search – in order to combat asymmetric threats (threats that do not follow symmetrical thinking), we need to significantly improve our ability to formally relate conceptual and physical entities in our information spaces. In my opinion, this is both our current biggest problem and biggest opportunity. I am not talking about simple link analysis or simple labeled links. This is a difficult problem that goes against the normal tide of traditional thought and traditional information processing. Look at the way most programming languages treat links (references or pointers) as second-class citizens that get buried inside other structures. More and more we are seeing that the links between the dots are where we are sorely lacking. I encourage every person with an entrepreneurial bent to study this problem and think of innovative solutions. We desperately need better ways to capture, manage and exploit formal relations (with their own set of metadata) in our information systems. The next killer application is “Relation authoring, sharing and discovery.” It is important to understand that this is not just a technical problem; it is also a functional knowledge engineering problem.

• I am working on several other ideas but will save discussion of them for a future interview.
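
The multitagging idea in the list above, tags treated as separate overlays on one body of content, can be sketched in Python. This is not the XML format described in the interview, which is not shown here; it is a hypothetical in-memory model in which each layer stores tags as character ranges over a shared string. The layer names, tag names and sample sentence are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    tag: str
    start: int  # inclusive character offset into the content
    end: int    # exclusive character offset into the content


# One body of content, stored once.
content = "Acme hired Joe Smith in Boston"

# Two hypothetical perspectives (layers) on the same content.
layers = {
    "hr": [Span("HiringEvent", 5, 20)],      # "hired Joe Smith"
    "geo": [Span("PersonAtPlace", 11, 30)],  # "Joe Smith in Boston"
}


def text_of(span):
    """Return the slice of content that a span covers."""
    return content[span.start:span.end]


def interlaced(a, b):
    """True if two spans overlap without one containing the other, which is
    the case that cannot be written as nested tags in a single XML document."""
    return a.start < b.start < a.end < b.end or b.start < a.start < b.end < a.end


def tags_at(offset):
    """List the (layer, tag) pairs whose spans cover a character offset."""
    return [
        (layer, span.tag)
        for layer, spans in layers.items()
        for span in spans
        if span.start <= offset < span.end
    ]


if __name__ == "__main__":
    hr, geo = layers["hr"][0], layers["geo"][0]
    print(text_of(hr), "|", text_of(geo))
    print(interlaced(hr, geo))
    print(tags_at(12))
```

Because each layer refers to the content by offsets instead of wrapping it, a layer can be added, replaced or discarded without touching the content or the other layers, which suits the automated tagging, re-tagging and temporary tagging uses mentioned above.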

Q: What future book titles and articles can we expect from you?

A: In the near term, I plan on writing some articles on some of the ideas knocking around in my head. Especially the ones related to the semantic web and practical examples of applications in that area.

As for books, there are many possibilities. I will most likely stay on the dual track of exploring pitfalls in other complex systems (XML, web services, .Net) and exploring the intricacies and applications of semantic web technologies.

Q: With your deep knowledge of the entire IT industry, what other pointers would you like to give the readers?

A: Even though the economy is in a funk, that is (and should be) tangential to progress in the IT industry. Hopefully it will dampen some of the hype surrounding IT and let us get back to solving problems, increasing productivity and improving our effectiveness in using and sharing knowledge. The really important message for IT is that real progress is being made and continues to be made in all aspects of software development. The industry is maturing and both reliable and user-friendly IT systems are possible.

Lastly, here are some final tips:

- Forget operating systems – we are now on the layer “above” the operating system. If you are tied to a particular operating system or protocol, you need a better IT Architect.
- Knowledge management activities must pervade every aspect of the organization, especially capture. 90% of useful information is lost through informal or non-existent capture methods.
- Beware of expensive, proprietary solutions: open standards and open source are the way to go for most projects. Especially open standards.
- Never forget the human aspects of computing. Have your developers interact with users regularly in a social setting. Also, start a mentoring program in your organization.
- The next leap in information technology is all about information fidelity. We have spent the last 30 years mostly on graphics fidelity (3D, user interfaces, etc.). That era is mostly done. The next wave is information fidelity: this means fine-grained metadata, relations between data elements, and links between discovery and production of information.

Q: Thank you for taking the time to share with our readers. We look forward to reading your books and articles.

A: Thanks for allowing me to discuss these important issues with your readers. I am always interested in the thoughts of other IT professionals so they can feel free to email me at mike@daconta.net.