This is a suite of research projects related to user interfaces for
search, text data mining and empirical computational linguistics,
and automating web site evaluation.
Cha-Cha:
Cha-Cha is a search interface for heterogeneous web intranets,
such as those found at large universities, corporations, and
government sites.
WebTANGO:
The goal of the WebTango project is to develop tools and
techniques to improve the web design process via the
application of automated evaluation techniques.
FLAMENCO: We are exploring new ways to incorporate
metadata into search interfaces.
LINDI: We are developing a text data mining system for
automated discovery of new information from large text
collections.
Berkeley Academic Business Language (BABL): The Berkeley Academic
Business Language (BABL) is an evolving set of models and
associated XML schemas for the domain of university education
and operations. Common data models facilitate the reuse of
information resources and databases, opening up possibilities
for creating more user-friendly applications that integrate
legacy "stovepipes" or that enable entirely new applications.
The XML Application Platform: The XML application platform
is a Java framework for implementing applications that can be
characterized as "forms moving around within and between
organizations." The platform represents all data models, business
rules, and workflow specifications as externalized XML documents
rather than scattering them throughout the application code.
The "Center in a Box": A generic set of XML schemas,
transforms, and style sheets for building a highly structured
and automated web site for a "center" or similar organization.
The navigation framework, site map, tables of contents, and
links are all created by transforms from XML instances.
The Cheshire II project is developing a next-generation online catalog
and full-text information retrieval system using advanced IR techniques.
The Cheshire II system was designed to overcome twin problems of topical
searching in online catalogs: search failure and information overload.
The system incorporates a client/server architecture with implementations
of current information retrieval standards including Z39.50 and SGML.
Economics-Informed Network Design
p2pecon@berkeley:
From file-sharing to mobile ad-hoc networks, community
networking to application layer overlays, the peer-to-peer (p2p)
networking paradigm promises to revolutionize the way we design,
build and use the communications network of tomorrow. The
fundamental premise of p2p systems is that individual peers
voluntarily contribute resources to the system. However, the
inherent tension between individual rationality and collective
welfare produces a misalignment of incentives in the grassroots
provisioning of p2p services. We combine economic foundations
(e.g., from game theory, agency theory, public finance, industrial
organization) with the rigors of system design and validation
methodologies to design p2p systems that are technically and
economically sound.
The 100x100 Project:
The 100x100 Project brings together economists, security and
networking experts, network operators, and policy specialists to
create blueprints for a network that goes beyond today's Internet.
Drawing on technology trends and the experience of the past 30
years, these scientists are re-prioritizing the fundamental
principles that underlie network design to craft networks that
will be ubiquitous in scale, revolutionary in bandwidth,
economically self-sustaining, resistant to attack, and tractable
to manage.
The Denali Project:
The Denali Project is a multi-institutional collaborative
research project developing next generation scalable services for
the global Internet, including: scalable performance-predictable
communication, scalable multicast for efficient data dissemination,
scalable storage for next generation information services, and
design principles for scalable services.
Mobile Media Metadata (MMM): We are creating software for
cameraphones that addresses long-standing challenges in consumer
media creation, sharing, management, and reuse by leveraging the
spatio-temporal context and social community of media capture and
use (when, where, and by and with whom media is captured, shared,
and used). We use contextual metadata gathered from cameraphones
and cameraphone users to infer media content, context, and
community and thereby help automate media annotation, retrieval,
sharing, and reuse on mobile devices. We have conducted fairly
large-scale deployments and user testing of our MMM prototypes,
with 60 users using MMM1 on the Nokia 3650 cameraphone in
2003-2004 and 60 users using MMM2 on the Nokia 7610 cameraphone in
2004-2005. Our SIMS graduate student users in IS202 Information
Organization and Retrieval have also worked in project teams to
develop numerous innovative mobile media application concepts based on MMM1 and on MMM2.
Social Uses of Personal Media: This sister project, led by
Prof. Nancy Van House, is investigating a central problem for
technology
design: predicting users and uses for emerging technologies,
i.e., doing user-centered design for users and uses that don't
yet exist. We use the term "social uses" to describe
the higher-level motives that guide the specific actions that
users perform. These social uses and the associated findings
from our social science research have significant implications
for mobile media technology design and inform our development
of design methods aimed at projecting and designing for future
uses and users of mobile media technology.
Media Streams Metadata Exchange (MSMDX): The MSMDX project is creating
a platform for collaboratively annotating, retrieving, sharing,
and remixing multimedia content on the World Wide Web. This
platform will be used to discover whether the power of distributed
social networks together with semantic web technology can be
exploited to solve the problem of how to generate useful
machine-readable descriptions of multimedia content. The
usefulness of the descriptions produced will be evaluated by
building innovative media services that rely on them.
Active Capture: Active Capture software and interaction
design automate the capture of stills and video for, and of,
users. By integrating capture, processing, and interaction,
Active Capture automates the traditional processes of direction
and cinematography. Using real-time media analysis in an
interactive control loop, Active Capture software structures
the user's interaction with a capture device to record reusable,
annotated media assets.
Adaptive Media: The Adaptive Media project is researching
and developing software for the mass customization and
personalization of media by structuring media assets into
Adaptive Media Templates (AMTs). AMTs encode media assets in
such a way that they can co-adapt input media assets and compute
a unique customized and/or personalized result. We are extending
our research in Adaptive Media to include the development of media
components that understand their contents and the principles of
their recombination.
This study is an attempt to measure how much information is produced
in the world each year. We look at several media and estimate yearly
production, accumulated stock, rates of growth, and other variables
of interest. (See also the original "How Much
Information?" study, released in 2000.)
The Metadata Research Program explores information retrieval in a
networked environment. We design, build, and experiment with front-end
prototypes, strategic search commands, entry vocabulary modules, and
multi-database navigation.
- Unfamiliar Metadata:
DARPA-sponsored project "Search Support for Unfamiliar Metadata
Vocabularies." Searching is likely to be effective and efficient
only when the searcher is familiar with the classification,
categorizing, and indexing schemes (metadata vocabularies) being
searched. The rapid increase in network-accessible databases and
the widespread adoption of metadata vocabularies mean that
searches will increasingly be in metadata vocabularies that are
unfamiliar to the searcher. To provide a cost-effective remedy,
the project will develop Entry Vocabulary Modules that accept
topical statements in the searcher's terms ("query vocabulary")
and respond with a ranked list of terms in the system's vocabulary
("entry vocabulary"). July 1997 - Dec 2001.
- Seamless Searching of Numeric and Textual Resources:
A research project to demonstrate improved access to textual
material and numerical data on the same topic when searching
two very different kinds of databases: bibliographical (for
books, articles, patents, etc.) and numerical data-sets
(socio-economic databases). Entry Vocabulary Indexes developed
in the "Unfamiliar Metadata" project are being used. A National
Library Leadership Project funded by the Institute of Museum and
Library Services (IMLS), Oct 1999 - Sept 2002.
- Translingual Information Management:
Investment in the creation of online bibliographies and digital
libraries has resulted in a body of tens of millions of
pre-categorized and pre-classified records in all languages.
This vast infrastructure can be broken down into carefully
coded language fragments: titles, metadata, and sometimes
summaries or full text of documents. The goal is to show how
these resources can be used to improve crosslingual searching,
information management, and resources for language engineering.
Funded under the DARPA TIDES program, Feb 2000 - Jan 2003.
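As a rough illustration of the Entry Vocabulary Module idea described under "Unfamiliar Metadata" above, the following Python sketch maps a searcher's query words to a ranked list of metadata vocabulary terms. It is a minimal sketch, not the project's implementation: the sample records, the function names build_index and suggest_terms, and the simple co-occurrence scoring are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical training records: (free-text title, assigned vocabulary terms).
# Both the titles and the vocabulary terms are invented for this example.
RECORDS = [
    ("car engine emissions testing", ["Motor vehicles", "Air pollution"]),
    ("electric car battery design", ["Motor vehicles", "Electric batteries"]),
    ("smog and urban air quality", ["Air pollution"]),
]


def build_index(records):
    """Count how often each title word co-occurs with each vocabulary term."""
    index = defaultdict(Counter)
    for title, terms in records:
        for word in set(title.lower().split()):
            for term in terms:
                index[word][term] += 1
    return index


def suggest_terms(index, query, limit=5):
    """Return vocabulary terms ranked by summed co-occurrence with query words."""
    scores = Counter()
    for word in set(query.lower().split()):
        # Words never seen in training contribute nothing to the scores.
        scores.update(index.get(word, {}))
    # Sort by descending score, then alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


INDEX = build_index(RECORDS)
```

With these sample records, a query such as "car emissions" ranks "Motor vehicles" first, because both query words co-occur with that term in the training records. A query whose words never appear in the records returns an empty list.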
Several SIMS faculty and students are participating in the UC Berkeley
Digital Library project. The goal of this project is to develop the
technologies for intelligent access to massive, distributed collections
of multi-media documents including photographs, satellite images, videos,
full text documents, and "multivalent" documents comprised of multiple
terabyte databases.