New
search technologies promise to put an end to frustrating
searches that don't quite satisfy user requirements.
By Rajiv Singh
In
an age of information overflow, information retrieval
or search is increasingly being acknowledged as a critical
issue, and the focus on search technologies is stronger
now than at any time before. Today, search is one of the
most challenging problems faced by individuals, businesses,
governments and related organisations.
The
transformation that is occurring in the area of 'search'
is indeed radical. As computers become more pervasive
in our daily lives, greater fluidity, or more human-like
interaction, will be sought with these machines.
Finding
information is a need not just for businesses, but also
at an individual level. This is particularly true today
as an increasing number of personal tasks, be it buying
a product or getting directions or phone numbers,
begin to go online. A basic search box, with no further
interaction than the feeding of a keyword, is no longer
enough to serve anybody's purpose.
A
large number of search enhancement tools are now talking
of context. "Not so long ago, content was king. In
2006, context will reign," says Susan Feldman,
vice president, content technologies research, IDC. Context
is just a term, however, and its varied usage refers to
a diverse range of ideas from domain-specific search
engines to personalisation. At a general level, though,
search engines have to address a specific problem:
the fact that people are never going to ask the right
questions or, given the limitations of search technologies,
that the right answers are never going to be thrown up.
The quiet revolution
For an analyst such as Susan Feldman, the emergence of search
technology issues is part of a larger development that has been
under way for the last decade. Discussing the issue in an article,
Search: the quiet revolution, Feldman says, "Search
technology has been around for more than four decades.
But only in the past ten years, as the worldwide web has
become an integral part of the technology landscape, has
it occupied a prominent place in our work and personal
lives. And only in the past four years has search finally
become a hot and lucrative area of technology development.
Why the delay?" Feldman's reasoning, as enumerated
below, suggests that a historical conjunction of sorts
has now been reached, allowing search technologies
to emerge at the forefront of technology-related issues.
According
to Feldman, to be truly effective, search technologies
require computing power to sort through massive amounts
of text or data. This has now been achieved, she says,
even with our desktop machines. With computing power no
longer a problem, she points out, sophisticated language
analysis and complex matching algorithms are now coming
into play in aid of search technologies.
An
added spur to the advancement of such technologies has
been the discovery by companies that lost information
puts them at risk of non-compliance. Negative issues
such as these are not the only reason why information
access is a critical issue for firms today. The fact is
that search technologies are also turning out to be a
positive aid in product development and decision-making
processes.
Finally,
as Feldman points out, our professional and personal lives
have been transformed in the past decade at an individual
level. Increasingly, we are looking for tools that will
allow us to sustain both professional and personal tasks
- from scheduling meetings with a client, to buying movie
tickets for the family.
Even
as a historical conjunction of demand, technology and
computing power may now have come to pass, Feldman says
that the kind of access to information that individuals
and organisations require may still be some distance away.
What is lacking, she says, is a deep understanding of
information interactions, and how to automate them
effectively.
Unstructured information
Language is the basis of all our interactions, be it with
the computer or in cyberspace, and language, as Feldman
points out, is "complex and ambiguous." This
is where frustrations with the sophistication of search
technologies, as they exist, begin to be experienced by
individuals and organisations alike. This is also the
point at which a company such as Autonomy Corporation
now enters the picture.
Late
last month, content management and enterprise search specialist
Autonomy Corporation Plc announced its half-year results
up to 2006, and reported a trebling in sales and a 351
per cent jump in operating profits to $27.9 million. A
company release quoted Autonomy founder and chief executive,
Dr Michael Lynch, as saying that during the past
year unstructured information issues had gone "prime
time".
Autonomy
says that the last few years have seen an explosive growth
in the use of unstructured information, a definition that
includes documents, emails, telephone conversations
and multimedia. According to Autonomy, unstructured
information has traditionally been difficult for computers
to understand and use. Interestingly, it says that more
than 85 per cent of all information inside an enterprise
is now unstructured.
Of
course, unstructured information issues are what
Autonomy specialises in, a space that it defines as Meaning-Based
Computing (MBC). According to Autonomy, MBC enables
"computers to understand the relationships
that exist between disparate pieces of information
and perform sophisticated analysis operations with real
business value, automatically and in real-time."
The disparate pieces of information that MBC can
act on are things like email and documents, as well as
PDFs, voice over IP and other types of content.
As
Lynch explains, information in the IT world is divided
into two distinct groups: structured material that
goes in relational databases, and all the unstructured
material that doesn't fit into IT infrastructures very
well. It is this unstructured material that is
now exploding in terms of usage.
According
to Autonomy, MBC solves the problem of accessing unstructured
information by applying a new breed of applications
"which not only uncovers, but also makes sense of the 85 per
cent of enterprise information that remains hidden to
all other technologies including keyword search engines
and relational databases." Autonomy says MBC would
also be an active aid in tracking illegal activity.
Accessing
these disparate pieces of information in a meaningful
way is perhaps what Feldman is referring to when she speaks
about information interactions.
A broad church
According to Lynch, unlike structured data, which allows
IT to automate, the problem with unstructured data is
that it has not proved to be amenable to automation -
so far. The aim of MBC, Lynch says, is to enable companies
to attempt a similar process with unstructured information.
According
to Lynch, a variety of technologies go into MBC, from
speech recognition to text understanding, creating platforms
that take the search and interactivity process yet another
step ahead. In an interview with James Murray (ITWeek,
21 August 2006), Lynch refers to this application of
technologies as "a broad church."
It's
an interesting analogy that Lynch uses, for the origins
of what he refers to as MBC have a quaint connection
to churches, ministers and some very interesting English
history. An 18th-century English minister and mathematician,
Thomas Bayes, set about trying to prove the possibility
that God exists and arrived at a theorem that discussed
the mathematical probability of things.
Lynch
says that MBC traces its origins to Bayes's enterprising
work and also builds on the work of Claude Shannon, 'the
Father of Information Theory', whose Principles of
Information enable identification of the patterns
that naturally occur in text. These two sets of research,
says Lynch, lie at the heart of MBC. The underlying pattern-recognition
algorithm, derived from Bayes's formulations, enables computers
to comprehend context, generalise from words to an idea,
and, according to Lynch, grasp the root concepts beneath
the play of syntax.
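To give a flavour of how a Bayesian approach can move from words to a topic, here is a minimal sketch of a naive Bayes text categoriser. It is not Autonomy's algorithm, whose details the company does not spell out here; it is a textbook illustration of Bayes's theorem, under which the probability of a topic given some words is proportional to the prior probability of the topic multiplied by the probability of those words given the topic. The class name, the two topics and the training sentences are all invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


class NaiveBayesCategoriser:
    """Hypothetical toy categoriser; not any vendor's actual method."""

    def __init__(self):
        self.doc_counts = Counter()              # documents seen per topic
        self.word_counts = defaultdict(Counter)  # word frequencies per topic
        self.vocab = set()                       # every word seen in training

    def train(self, topic, text):
        self.doc_counts[topic] += 1
        for word in tokenize(text):
            self.word_counts[topic][word] += 1
            self.vocab.add(word)

    def log_scores(self, text):
        """Return log(prior) + sum of log(likelihood) for each topic."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for topic, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # prior P(topic)
            total_words = sum(self.word_counts[topic].values())
            for word in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words do not zero the score.
                count = self.word_counts[topic][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            scores[topic] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented training data for two made-up topics.
model = NaiveBayesCategoriser()
model.train("finance", "bank loan interest rate")
model.train("finance", "credit risk and bank capital")
model.train("motoring", "engine design and car safety")
model.train("motoring", "new car engine test")

print(model.classify("car engine recall"))
print(model.classify("bank interest"))
```

The word "recall" never appears in the training text, yet the first query is still assigned to a topic on the strength of the words around it, which is a very small-scale version of generalising from words to an idea.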
However,
for now, Lynch says, "we are just at the beginning
of this movement, but in a few years' time you will see
unstructured information used and processed all over the
place."
Currently,
Lynch confesses, MBC is dominated by enterprise search,
but the technologies being developed to aid enterprises
will soon begin to have a wider use. One such radical
technology is implicit query. Instead of pausing
one's work and seeking the help of a search engine with
a query, implicit query technology would read
what is on the screen at any time, be it an email or a
web page, and, at the press of one key, understand it
and summon up related information. Hyperlinking
is another interesting technology, which would provide
links to internal and external information based on whatever
one is working on. Smart or Active folders would do the filing
by themselves, allowing, for instance, all documents related
to Autonomy to be filed under that company's name. This,
says Lynch, is the direction in which search technology
is broadly headed.
Even
as MBC evolves, Autonomy says that more than 16,000 blue-chip
corporations and government agencies are now accessing
the pattern matching algorithms in its products to extract
meaning from unstructured information. These range from
the US Department of Homeland Security, which is using
MBC across 21 agencies to monitor suspected terrorist
groups, to the Ford Motor Company, which is using Autonomy's
applications to transform the text, audio and video files
in its research libraries into meaningful reference material
in order to speed up work on new projects, and
a financial giant like Zurich Financial Services, which
is using MBC applications to prioritise research emanating
from more than 500 of its sources for its risk managers.
John
Deere, Sprint, Whirlpool, Saab Ericsson, LexisNexis China
and Rio Tinto are the major new clients that Autonomy sold
its software to in the second quarter. The company has
also signed new business with multiple government, defence
and intelligence agencies around the globe, including
the US, the UK, the Netherlands, France, Italy and Singapore.
The
new additions join a roster of existing clients that includes
BAE Systems, Boeing, DaimlerChrysler, Shell, AOL, BBC,
Reuters, Hutchison 3G, Ericsson, T-Mobile, Philips, Coca-Cola,
Kraft Foods, Nestle, Lloyds TSB, GlaxoSmithKline,
KPMG, Citigroup, ABN AMRO, Deutsche Bank, Nomura and the
US Securities and Exchange Commission.
According
to Feldman, search technologies such as text mining,
text analytics, categorisation, speech analysis, and translation
will eventually be embedded in a majority of people-facing
applications, such as cell phones, cars, gas pumps, home
entertainment centres, call centres and transit systems.
Feldman
feels that as the search needs of people and organisations
continue to develop and grow more complex, what the times
really call for is not so much a search engine as a true
information discovery platform, incorporating all search-related
technologies. Given the pace at which information technologies
develop, that may occur sooner than one may think.