CIPS Connections
6/26/2003
11:01:30 AM
A Discussion with Mike Daconta, Chief Scientist, APG, McDonald Bradley, Inc.
Interview by Stephen Ibaraki, I.S.P.
This week, Stephen Ibaraki, I.S.P., has an
exclusive interview with the world-renowned Michael C. Daconta, Chief Scientist,
APG, McDonald Bradley, Inc. (www.mcbrad.com).
Among his array of talents, Michael is a developer, writer, and design and
architecture guru specializing in such diverse areas as the Semantic Web,
XML, the XML User Interface Language (XUL), Java, C++, dynamic memory
management, and JavaScript. His work can be followed at www.daconta.net.
Discussion:
Q: With your busy schedule, it’s a real pleasure to have you do this
interview and share your insights with the audience. Thank you for agreeing
to this interview.
A: You are welcome. This certainly has been a busy year, but I enjoy
discussing information technology and where it is going. I'd also like to
thank you for taking time to do the interview.
Q: Michael, can you describe your latest project work and real-world tips you
can pass on?
A: I am currently working on two related projects for the United States
Department of Defense: the Virtual Knowledge Base and the Net-Centric
Enterprise Services architecture. The DIA's Virtual Knowledge Base is an
interoperability framework to integrate heterogeneous data stores (databases,
HTML, message traffic, etc.) into a single virtual repository. Net-Centric
Enterprise Services (NCES) is a wide-ranging program to transform the
United States Department of Defense by improving the horizontal fusion of
information.
There have been many lessons learned over the last few years. Here are some
tips from those real-world experiences:
• Document versus Remote Procedure Call (RPC)-based web services. This is a
critical issue for interoperability. Everyone who creates a web service must
specify in the WSDL SOAP binding whether the style is document or RPC; in
other words, whether the web service transaction involves XML documents or
parameters and a return argument for the method calls. The RPC style is
clearly an XML form of traditional RPC, while the document binding makes web
services more message-oriented. In terms of interoperability, the additional
design required to transact XML documents rather than RPC parameters provides
better context, validation and application-independent abstraction to any
number of clients. This is a critical component of “net-centricity”, which
attempts to eliminate our reliance on “point-to-point” interfaces. Examples 3
and 4 of the WSDL 1.1 specification (available at
http://www.w3.org/TR/wsdl) demonstrate the difference between these two
styles. I plan on writing an article that demonstrates the differences
between the two approaches more fully. In short, use document-based web
services to improve interoperability and addressability of your information
systems. RPC-based web services do not exploit the full benefits of XML and
offer little or no improvement over CORBA.
• Weak XML Design. A while ago I wrote an article for XML Journal entitled
“Are elements and attributes interchangeable?” The article focused on the
design issues and tradeoffs in this decision (by the way, the answer is “no”).
One point the article tried to make was that many people treat such a
distinction in overly-simplistic ways. Because XML-based markup languages are
easy to create, sometimes too little thought is put into their design. For
example, let’s say I am creating a recursive structure to display a business
organization. I could do something like this:
<Employee type="President" name="Joe">
  <Employee type="Vice President" name="Sam">
    <Employee type="Director" name="Bill">
      …
    </Employee>
  </Employee>
  <Employee type="Vice President" name="Harry">
    …
  </Employee>
</Employee>
Over-reliance on the type attribute has a significant deficiency: this
document cannot be completely validated by standard validation methods,
because the nesting rules would depend on the value of the type attribute,
which cannot be expressed in a DTD or XML Schema (though this may be possible
in other schema languages). Thus it would be better to model this like so:
<President name="Joe">
  <VicePresident name="Sam">
    <Director name="Bill">
      …
    </Director>
  </VicePresident>
  …
</President>
• Maturity and performance of RDF stores are improving. When we initially
started VKB, this was a stumbling block to adoption. Now there are commercial
implementations in addition to increased maturity of the open source
offerings. This year will be the year they are ready for primetime.
• People too often confuse taxonomies with ontologies. I have to explain the
difference between taxonomies (and topic maps) and ontologies too often. The
confusion lies in the fact that a taxonomy may be an ontology if the classes
defined follow a formal subclass relation; however, if they do not follow a
subclass relation, then the taxonomy is not an ontology. Thus, the key
question is whether a defined taxonomy or classification scheme is suitable
for inference.
• Web service interfaces and polymorphism. The technique of defining a
standard web-service interface which can be implemented by any number of
service providers is a powerful use of web services. This implements the
object-oriented principle of polymorphism in the web-services environment.
One example of this technique is the Web Services for Remote Portals (WSRP)
Specification from OASIS at
http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsrp.
• Metadata registries are still in flux. Many people parrot the marketing
hype of the web-services “triumvirate” as SOAP, WSDL and UDDI. While SOAP and
WSDL are de-facto standards, the same cannot be said for UDDI. First, the
UDDI classification capability is weak. Second, it has no current or planned
support for ontologies. Finally, its information model is business-centric
and often uses overly-abstract names like tModel, publisherAssertion, and
instanceParms. The bottom line is that there is much competition in this
space (ISO/IEC 11179, LDAP, ebXML, RDF, UDDI) and the jury is still out.
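As a rough illustration of the weak XML design tip above: once each role has its own element name, the nesting rules reduce to a per-element list of permitted children, which is the kind of rule a DTD content model can express. The Python sketch below hand-rolls such a check. The ALLOWED_CHILDREN table and the nesting_errors function are hypothetical names made up for this illustration, not part of any schema language or library, and a real project would rely on a DTD or XML Schema validator instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, keyed on element name, in the spirit of
# DTD element declarations (an assumption for illustration only).
ALLOWED_CHILDREN = {
    "President": {"VicePresident"},
    "VicePresident": {"Director"},
    "Director": set(),
}


def nesting_errors(element):
    """Return a list of nesting violations found at or below element."""
    allowed = ALLOWED_CHILDREN.get(element.tag)
    if allowed is None:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in allowed:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        errors.extend(nesting_errors(child))
    return errors


GOOD = (
    '<President name="Joe">'
    '<VicePresident name="Sam"><Director name="Bill"/></VicePresident>'
    "</President>"
)
BAD = '<Director name="Bill"><President name="Joe"/></Director>'

if __name__ == "__main__":
    print(nesting_errors(ET.fromstring(GOOD)))
    print(nesting_errors(ET.fromstring(BAD)))
```

With the type-attribute version, every element is named Employee, so a table keyed on element names could not tell a president from a director.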
Q: What do you see on the horizon that businesses and IT professionals “must”
be aware of to be competitive?
A: Here are the top five things IT professionals and executives should be
examining now:
• Portals and standard, reusable portlets (JSR 168) – Portals are web
aggregation points for specific communities. They are also great vehicles for
organizations to implement business process reorganization (BPR), Enterprise
Application Integration (EAI) and Enterprise Information Integration (EII)
all in a single project.
• Ontologies and axioms (specifically OWL and the UML profile for OWL). As is
made clear in our just-released books, ontologies have really come of age
and are ready for primetime use. Both the revised RDF specifications and OWL
will become W3C recommendations within the next few months.
• Inference and rule engines. While ontologies provide a formal fact base,
rules can be used to infer new information and perform actions on the
knowledge base. Although there are no current standards in this area, there
are some promising efforts like RuleML and some standards efforts on the
drawing board.
• Web service orchestration (OWL-S) and Web Services for Remote Portals
(WSRP). Web services will proliferate both on intranets and the internet. If
done correctly (see the comment above on RPC versus document-based web
services), they will be a catalyst for the next phase of data evolution,
which is the semantic web (more on this later). Once these services are widespread,
they will need to work in concert both in terms of workflow and in terms of
semantic interoperability. This requires a much higher degree of data
modeling for discovery and orchestration. The classic orchestration example
is the travel service which unites flight, auto and hotel booking. In
addition to orchestration, using web services as portlets is important to
allow reuse among various portals and this is the goal of the WSRP
specification underway in OASIS (www.oasis-open.org).
• Metadata registries and the incorporation of ontologies into them. In
February I attended a metadata registries conference in New Mexico. It was a
very good conference that really highlighted both the fragmentation and
demand in this emerging space. The next step in these registries is to
integrate the emerging semantic web concepts into the registry (specifically
ontologies). So far, having looked at all the available technologies, I am
most impressed with the ebXML information model. Additionally, I was told by
the technical lead of that effort that ontology support would be added to the
next version of the registry.
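The point above about inference and rule engines, and the earlier remark that a taxonomy is only suitable for inference when it follows a formal subclass relation, can be made concrete with a small sketch. The Python below is a naive forward-chaining loop over (subject, predicate, object) triples with two hard-coded rules. The predicate names "subClassOf" and "type", the sample facts and the infer function are assumptions chosen for illustration; this is not RuleML, OWL or any standard rule language.

```python
def infer(facts):
    """Apply two rules repeatedly until no new facts appear.

    Rule 1: A subClassOf B and B subClassOf C  =>  A subClassOf C
    Rule 2: x type A       and A subClassOf B  =>  x type B
    """
    known = set(facts)
    while True:
        derived = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    derived.add((s, "subClassOf", o2))
                elif p == "type":
                    derived.add((s, "type", o2))
        if derived <= known:
            return known
        known |= derived


# Hypothetical fact base for illustration.
FACTS = {
    ("Director", "subClassOf", "Manager"),
    ("Manager", "subClassOf", "Employee"),
    ("Bill", "type", "Director"),
}

if __name__ == "__main__":
    for triple in sorted(infer(FACTS) - FACTS):
        print(triple)
```

Nothing in the fact base states that Bill is an Employee; that fact follows only because the class hierarchy uses a formal subclass relation that the rules can chain over.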
Q: Can you give specific predictions for businesses about where technology is
going in two years, and in five years in the following areas: telephony,
security, pervasive computing, networking, the desktop, the web, data
storage, Voice over IP, IPv6, and other areas you feel require consideration?
A: Though I do not consider myself an expert in all these areas, I will do my
best to give an assessment of where they are going.
• Telephony – the most interesting aspect of this area is the integration of
cell phones with the PDA, network connectivity and digital cameras. I wrote a
J2ME pitfall in the More Java Pitfalls book! It was great to do some code in
this exciting area. If you have not done any J2ME development, I highly
recommend it. You don’t need to have a device to code in this area, as the emulators
are nicely done. Other exciting developments in the telephone space are
Push-to-Talk (PTT) features, bandwidth expansion, GPS integration,
downloadable code, and mobile code. This space is well-suited to semantic web
technologies and web-services; for example, the CC/PP RDF vocabulary.
• Security. Obviously web services security is very hot right now. Kevin
Smith discusses SAML, XACML and other XML security technologies in our new
Semantic Web book. In fact, Kevin is doing a talk at JavaOne this year on web
services security.
• Pervasive computing – this is defined differently by different people. In
general, the goal in this area is for computing to become transparent: in
everything, networked and taken for granted.
• Networking – For those who don’t have broadband at home – I highly
recommend it and it is worth the cost. The best feature of high-speed
internet is not actually the speed of the line but instead is the “always-on”
aspect of broadband. It makes the internet much more useful to not have a
separate “connect” phase. Of course, the speed will continue to improve and
that will open us up to better multimedia, cheap video conferencing, and
better collaboration software.
• Desktop computing – with the idea of pervasive computing, the desktop
computing arena is waning. It is more important to seamlessly connect our
plethora of devices than to concentrate on a single device; thus Apple has it
right with the digital hub concept.
• The World Wide Web – Tim Berners-Lee’s vision will come true. We will have
a Collaborative web and a Semantic web. More on this later.
• Data storage – we are pushing towards Terabytes on devices the size of a
postage stamp! This is one of the reasons the semantic web is a must! Our
storage capacity has significantly outstripped our ability to manage the
information we can store. We have spent the last several decades ramping up
and refining our ability to create information (in a plethora of mostly
proprietary formats) without considering the discovery problem. Vannevar Bush
must be rolling over in his grave. This is our most pressing imperative and
we ignore it to the detriment of productivity and progress.
• Voice over IP – I am not very knowledgeable in this area. In general, this
technology does not seem to be a big issue as long as long distance rates
keep dropping. Competition in the long distance market has been great and has
consistently reduced the price for consumers. If only we could get that same
competition in the local and broadband space. FCC, are you listening?
• IPv6 – I expect a graceful transition to this based on a dual capability
path in equipment and equipment upgrades. We have to do this, and IT managers
should plan this into their next equipment upgrades for routers and other
network equipment.
• 64-bit computing – I am surprised that 64-bit computing does not get more
press. It is an absolutely necessary transition in the next year or so for
power users. The transition in the mainstream will be within 2 years (but I
hope sooner). The bottom line is that we are bumping up against the 4GB limit
of 32-bit addresses for memory-intensive applications like high-fidelity games
and multimedia. It is disappointing that the major vendors and Microsoft have
not made this switch sooner. Let’s not wait until there are massive
complaints to get 64-bit computers and applications out there. We have known
about this for a long time so let’s make the switch and get it over with.
• XML/RDF data stores – I predict we will see more of the big players get
into this space, especially the database vendors. If they do not, the
open source variants will eat their lunch as these data stores gain
prominence. This is especially important in the registry space.
• Ontologies – as we discuss in detail in our book, these are truly the next
big thing in data modeling, model-driven architecture, expert systems and the
web. With OWL, they will cross the chasm from early adopters to mainstream
adoption.
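The 4GB limit mentioned in the 64-bit computing item above is plain arithmetic: an address of n bits can name 2^n distinct bytes. A short Python check follows; the function name addressable_bytes is made up for this illustration, and the sketch assumes byte-addressable memory and ignores any operating system or hardware limits below the theoretical ceiling.

```python
GIB = 2 ** 30  # bytes in one binary gigabyte


def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


if __name__ == "__main__":
    print(addressable_bytes(32) // GIB)  # 4
    print(addressable_bytes(64) // GIB)  # 17179869184
```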
Q: As an international expert in Java, can you share insights and tips from
your latest book? What prompted you to write this book?
A: In March of this year, the book More Java Pitfalls was released by John
Wiley and Sons, Inc. I am proud of this book and the team that worked on it
because we worked hard at perfecting how to write about pitfalls. Some people
seem to think that writing about pitfalls is a slam on Java – that is not
true. Every complex system has pitfalls for inexperienced and average users
(in our case, programmers). Sometimes it is the system designer’s fault and
sometimes it is the user’s fault. In the introduction to the book, I
display a taxonomy of pitfalls, as shown in the figure below.
See Figure 1 for the diagram.
Looking at the pitfall taxonomy, the initial split is between the
programmer’s fault and the platform designer’s fault. So, pitfalls are not a
slam on the Java platform; instead, they are the sharing of our hard-won
experience with other programmers. That brings me to the next key point about
the book: its use in peer mentoring programs. Peer mentoring is a concept I
have been talking about at my company for a while. In fact, there is an
article about peer mentoring on the McDonald Bradley website.
One of the best uses of the pitfalls book is in a formal mentoring program.
For example, a pitfall is a great discussion item for a brown-bag lunch
training session that gets programmers together in an informal session.
As for tips, the whole book is a set of 50 pitfalls and the tips that resolve
or work around each pitfall. There are several sample pitfalls posted on the
companion website at: http://www.wiley.com/legacy/compbooks/daconta
Q: There’s considerable momentum around the “Semantic Web.” Can you describe
your book on this topic and why it’s important for IT professionals and
businesses to be aware of this concept and related technologies?
A: About two weeks ago, John Wiley & Sons, Inc. released my new book,
titled The Semantic Web: A Guide to the Future of XML, Web Services, and
Knowledge Management. I feel our book is truly unique on this subject in the
following ways:
1. It is the first and only Semantic Web book for technical managers and
development leads. The authors are senior technologists who offer a critical
analysis of the technologies and strategic guidance to senior Information
Technology professionals. The authors are not cheerleaders for the
technology; the aim is to give honest, accurate assessments. All previous
books on the subject have been graduate-level or implementation-level books
for developers.
2. It explains all the pieces of the Semantic Web and how current technologies
fit into the puzzle. Specifically, it discusses XML and its family of
specifications, RDF, RDFS, Web Services, Taxonomies and Ontologies. No major
web specification is left unanalyzed.
3. It reveals the Return-on-Investment (ROI) for Semantic Web Initiatives and
other aspects of the business case for semantic web technologies.
4. It gives your Organization a clear, step-by-step Roadmap to preparing for
and implementing Semantic Web technologies today.
5. It introduces, explains and explores the implications of new
transformational concepts like the “Smart Data Continuum”, “Semantic Levels”,
“Non-contextual Modeling”, and “Combinatorial Experimentation”. These new
concepts are inventions of the authors, derived from their real-world
experience, and their applicability to today’s business environment is
demonstrated.
I’d like to take a moment to highlight the concept of the smart data
continuum as it is particularly relevant to understanding both what the
semantic web is and how it is just another step along a larger path of the
evolution of data fidelity.
See Figure 2 for the diagram of the smart data continuum.
While the smart data continuum is covered in more detail in my book, I will
summarize the key points here. The smart data continuum reveals the path of
data evolution along a continuum of increasing intelligence. So, in a
nutshell, the semantic web is a shift in moving the intelligence (or smarts)
from applications to the data. The first major development along this path
was in the 1970s, when the term GIGO, or garbage-in-garbage-out,
characterized the dependence of programs on correct data. This elevated the
importance of data in programs and led to object-oriented programming
languages. In the 1990s, the advent of HTML and the success of the World
Wide Web caused another shift in the data continuum from proprietary schemas
to open schemas. Today, with web services, we are again undergoing a shift in
the data evolution from interoperable syntax to interoperable semantics. So,
the figure above highlights the progression of increasing intelligence from
proprietary schemas (office documents, databases), to open schemas (XML), to
multi-vocabulary schemas (taxonomies and namespaces) and finally to inference
and automated reasoning (Ontologies).
6. Lastly, the book is taken from the authors’ real-world experience
transforming the Department of Defense (DoD) and Intelligence Community into
a net-centric environment. The principles and practices espoused in the book
are guiding the next-generation military towards net-centricity.
Q: Please share more of your ideas around the Semantic Web.
A: Here are a few other ideas I am either actively experimenting with or
exploring:
• Semantic chains – one problem with metadata is that it is potentially
infinite. Unfortunately, you cannot burden every potential application with
metadata layers that it has no interest in. Thus, the rule with metadata must
be to provide “just enough”: an XML or XML/RDF document must provide a
minimal set of metadata to enable the most common processing of that data;
however, the document must include a reference or link back to a larger pool
of metadata which provides context for the data or clarification of the data.
In turn, that pool may link back to another, possibly yet larger pool of
metadata to provide yet more context and thus we have a semantic chain of
metadata pools stretching back as far as it needs to go.
• Multitagging – yet another problem with metadata, especially in application
areas that require cross-domain processing, is the fact that content can only
be marked up (or governed) by a single DTD or Schema. In essence, each markup
language can be viewed as a perspective on a piece of content. If there are
multiple perspectives on that content, the only current way to achieve
capture of those perspectives is to either transform the document into
multiple separate markup languages (where the content itself is repeated) or
attempt to nest different tags within a single document. Unfortunately,
nesting breaks down due to the fact that it is illegal for tags to be
interlaced (and rightly so). Therefore, I have created a simple XML format
that allows markup language tags to be treated as separate layers on the
content. The analogy would be to think of tags as an overlay (like acetate)
on top of the content and thus multitagging will allow us to have multiple
overlays overlaid on top of a single body of content. This is especially
relevant for applications involving automated tagging, re-tagging and
temporary tagging. After a successful prototype, I plan on releasing the
specification to the W3C as a Note.
• Asymmetric search – in order to combat asymmetric threats (threats that do
not follow symmetrical thinking), we need to significantly improve our
ability to formally relate conceptual and physical entities in our
information spaces. In my opinion, this is both our current biggest problem
and biggest opportunity. I am not talking simple link analysis or simple
labeled links. This is a difficult problem that is against the normal tide of
traditional thought, and traditional information processing. Look at the way
most programming languages have links (references or pointers) as
second-class citizens that get buried inside other structures. More and more
we are seeing that the links between the dots are where we are sorely lacking.
I encourage every person with an entrepreneurial bent to study this problem
and think of innovative solutions. We desperately need better ways to
capture, manage and exploit formal relations (with their own set of metadata)
in our information systems. The next killer application is “Relation
authoring, sharing and discovery.” It is important to understand that this is
not just a technical problem; it is also a functional knowledge engineering
problem.
• I am working on several other ideas but will save discussion of them for a
future interview.
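The multitagging idea above, tags treated as separate overlays on one body of content, can be pictured with character offsets. The XML format described in the answer is not shown in this interview, so the Python sketch below is only a guess at the underlying idea and not that format: each layer is a list of spans pointing into a single shared string, so spans from different layers may interlace without the content being repeated. The names Span, LAYERS, span_for, interlaced and tags_at, as well as the sample sentence and layer names, are all hypothetical.

```python
from dataclasses import dataclass

# The single body of content that every layer points into.
CONTENT = "Joe Smith flew to Paris on Monday"


@dataclass(frozen=True)
class Span:
    tag: str
    start: int  # inclusive character offset into CONTENT
    end: int    # exclusive character offset into CONTENT


def span_for(tag, phrase):
    """Build a span covering the first occurrence of phrase in CONTENT."""
    start = CONTENT.index(phrase)
    return Span(tag, start, start + len(phrase))


# Two independent overlays (perspectives) on the same content.
LAYERS = {
    "people": [span_for("Person", "Joe Smith")],
    "travel": [span_for("Trip", "Smith flew to Paris")],
}


def text_of(span):
    """Return the stretch of CONTENT that a span covers."""
    return CONTENT[span.start:span.end]


def interlaced(a, b):
    """True if the spans partially overlap, which nested XML tags cannot do."""
    first, second = sorted((a, b), key=lambda s: (s.start, s.end))
    return first.start < second.start < first.end < second.end


def tags_at(offset):
    """List (layer, tag) pairs for every span covering the given offset."""
    return sorted(
        (layer, span.tag)
        for layer, spans in LAYERS.items()
        for span in spans
        if span.start <= offset < span.end
    )


if __name__ == "__main__":
    person, trip = LAYERS["people"][0], LAYERS["travel"][0]
    print(text_of(person), "|", text_of(trip), "|", interlaced(person, trip))
    print(tags_at(CONTENT.index("Smith")))
```

Because the Person and Trip spans interlace, they could not both be written as nested tags in one well-formed XML document; keeping them in separate layers sidesteps that restriction.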
Q: What future book titles and articles can we expect from you?
A: In the near term, I plan on writing some articles on some of the ideas
knocking around in my head. Especially the ones related to the semantic web
and practical examples of applications in that area.
As for books, there are many possibilities. I will most likely stay on the
dual track of exploring pitfalls in other complex systems (XML, web services,
.Net) and exploring the intricacies and applications of semantic web
technologies.
Q: With your deep knowledge of the entire IT industry, what other pointers
would you like to give the readers?
A: Even though the economy is in a funk, that is (and should be) tangential
to progress in the IT industry. Hopefully it will dampen some of the hype
surrounding IT and let us get back to solving problems, increasing
productivity and improving our effectiveness in using and sharing knowledge.
The really important message for IT is that real progress is being made and
continues to be made in all aspects of software development. The industry is
maturing and both reliable and user-friendly IT systems are possible.
Lastly, here are some final tips:
- Forget operating systems – we are now on the layer “above” the operating
system. If you are tied to a particular operating system or protocol, you
need a better IT Architect.
- Knowledge management activities must pervade every aspect of the
organization, especially capture: 90% of useful information is lost to
informal or non-existent capture methods.
- Beware of expensive, proprietary solutions: open standards and open source
are the way to go for most projects. Especially open standards.
- Never forget the human aspects of computing. Have your developers interact
with users regularly in a social setting. Also, start a mentoring program in
your organization.
- The next leap in information technology is all about information fidelity.
We have spent the last 30 years mostly on graphics fidelity (3D, user
interfaces, etc.). That era is mostly done. The next wave is information
fidelity: this means fine-grained metadata, relations between data elements,
and links between discovery and production of information.
Q: Thank you for taking the time to share with our readers. We look
forward to reading your books and articles.
A: Thanks for allowing me to discuss these important issues with your
readers. I am always interested in the thoughts of other IT professionals so
they can feel free to email me at mike@daconta.net.