ARCHIVED SITE (last updated Fall 2005)
For current information, please visit http://www.ischool.berkeley.edu

 

 
 
University of California, Berkeley School of Information Management and Systems
       
 

Research Projects

   
 

The Bailando Projects

This is a suite of research projects related to user interfaces for search, text data mining and empirical computational linguistics, and automating web site evaluation.

  • Cha-Cha: Cha-Cha is a search interface for heterogeneous web intranets, such as those found at large universities, corporations, and government sites.

  • WebTANGO: The goal of the WebTango project is to develop tools and techniques that improve the web design process through automated evaluation.

  • FLAMENCO: We are exploring new ways to incorporate metadata into search interfaces.

  • LINDI: We are developing a text data mining system for automated discovery of new information from large text collections.

The Center for Document Engineering Projects

  • Berkeley Academic Business Language (BABL): The Berkeley Academic Business Language (BABL) is an evolving set of models and associated XML schemas for the domain of university education and operations. Common data models facilitate the reuse of information resources and databases, opening up possibilities for creating more user-friendly applications that integrate legacy "stovepipes" or that enable entirely new applications.

  • The XML Application Platform: The XML application platform is a Java framework for implementing applications that can be characterized as "forms moving around within and between organizations." The platform represents all data models, business rules, and workflow specifications as externalized XML documents rather than scattering them throughout the application code.

  • The "Center in a Box": A generic set of XML schemas, transforms, and style sheets for building a highly structured and automated web site for a "center" or similar organization. The navigation framework, site map, tables of contents, and links are all created by transforms from XML instances.

Cheshire II

The Cheshire II project is developing a next-generation online catalog and full-text information retrieval system using advanced IR techniques. The Cheshire II system was designed to overcome twin problems of topical searching in online catalogs: search failure and information overload. The system incorporates a client/server architecture with implementations of current information retrieval standards including Z39.50 and SGML.

Economics-Informed Network Design

  • p2pecon@berkeley: From file-sharing to mobile ad-hoc networks, community networking to application layer overlays, the peer-to-peer (p2p) networking paradigm promises to revolutionize the way we design, build and use the communications network of tomorrow. The fundamental premise of p2p systems is that individual peers voluntarily contribute resources to the system. However, the inherent tension between individual rationality and collective welfare produces a misalignment of incentives in the grassroots provisioning of p2p services. We combine economic foundations (e.g., from game theory, agency theory, public finance, industrial organization) with the rigors of system design and validation methodologies to design p2p systems that are technically and economically sound.

  • The 100x100 Project: The 100x100 Project brings together economists, security and networking experts, network operators, and policy specialists to create blueprints for a network that goes beyond today's Internet. Drawing on technology trends and the experience of the past 30 years, these scientists are re-prioritizing the fundamental principles that underlie network design to craft networks that will be ubiquitous in scale, revolutionary in bandwidth, economically self-sustaining, resistant to attack, and tractable to manage.

  • The Denali Project: The Denali Project is a multi-institutional collaborative research project developing next generation scalable services for the global Internet, including: scalable performance-predictable communication, scalable multicast for efficient data dissemination, scalable storage for next generation information services, and design principles for scalable services.

Garage Cinema Research Projects

  • Mobile Media Metadata (MMM): We are creating software for cameraphones that addresses long-standing challenges in consumer media creation, sharing, management, and reuse by leveraging the spatio-temporal context and social community of media capture and use (when, where, and by and with whom media is captured, shared, and used). We use contextual metadata gathered from cameraphones and cameraphone users to infer media content, context, and community and thereby help automate media annotation, retrieval, sharing, and reuse on mobile devices. We have conducted fairly large-scale deployments and user testing of our MMM prototypes, with 60 users using MMM1 on the Nokia 3650 cameraphone in 2003-2004 and 60 users using MMM2 on the Nokia 7610 cameraphone in 2004-2005. Our SIMS graduate student users in IS202 Information Organization and Retrieval have also worked in project teams to develop numerous innovative mobile media application concepts based on MMM1 and MMM2.

  • Social Uses of Personal Media: This sister project, led by Prof. Nancy Van House, is investigating a central problem for technology design: predicting users and uses for emerging technologies, i.e., doing user-centered design for users and uses that don't yet exist. We use the term "social uses" to describe the higher-level motives that guide the specific actions that users perform. These social uses and the associated findings from our social science research have significant implications for mobile media technology design and inform our development of design methods aimed at projecting and designing for future uses and users of mobile media technology.

  • Media Streams Metadata Exchange (MSMDX): The MSMDX project is creating a platform for collaboratively annotating, retrieving, sharing, and remixing multimedia content on the World Wide Web. This platform will be used to discover whether the power of distributed social networks together with semantic web technology can be exploited to solve the problem of how to generate useful machine-readable descriptions of multimedia content. The usefulness of the descriptions produced will be evaluated by building innovative media services that rely on them.

  • Active Capture: Active Capture software and interaction design automate the capture of stills and video for, and of, users. By integrating capture, processing, and interaction, Active Capture automates the traditional processes of direction and cinematography. Using real-time media analysis in an interactive control loop, Active Capture software structures the user's interaction with a capture device to record reusable, annotated media assets.

  • Adaptive Media: The Adaptive Media project is researching and developing software for the mass customization and personalization of media by structuring media assets into Adaptive Media Templates (AMTs). AMTs encode media assets in such a way that they can co-adapt input media assets and compute a unique customized and/or personalized result. We are extending our research in Adaptive Media to include the development of media components that understand their contents and the principles of their recombination.

How Much Information? 2003

This study is an attempt to measure how much information is produced in the world each year. We look at several media and estimate yearly production, accumulated stock, rates of growth, and other variables of interest. (See also the original "How Much Information?" study, released in 2000.)

Metadata Research Program

The Metadata Research Program explores information retrieval in a networked environment. We design, build, and experiment with front-end prototypes, strategic search commands, entry vocabulary modules, and multi-database navigation.
  • Unfamiliar Metadata: DARPA-sponsored project "Search Support for Unfamiliar Metadata Vocabularies." Searching is likely to be effective and efficient only when the searcher is familiar with the classification, categorizing, and indexing schemes (metadata vocabularies) being searched. The rapid increase in network-accessible databases and the widespread adoption of metadata vocabularies mean that searches will increasingly be in metadata vocabularies that are unfamiliar to the searcher. To provide a cost-effective remedy, the project will develop Entry Vocabulary Modules that accept topical statements in the searcher's terms ("query vocabulary") and respond with a ranked list of terms in the system's vocabulary ("entry vocabulary"). July 1997 - Dec 2001.

  • Seamless Searching of Numeric and Textual Resources: A research project to demonstrate improved access to textual material and numerical data on the same topic when searching two very different kinds of databases: bibliographical (for books, articles, patents, etc.) and numerical data-sets (socio-economic databases). Entry Vocabulary Indexes developed in the "Unfamiliar Metadata" project are being used. A National Library Leadership Project funded by the Institute of Museum and Library Services (IMLS), Oct 1999 - Sept 2002.

  • Translingual Information Management: Investment in the creation of online bibliographies and digital libraries has resulted in a body of tens of millions of pre-categorized and pre-classified records in all languages. This vast infrastructure can be broken down into carefully coded language fragments: titles, metadata, and sometimes summaries or full text of documents. The goal is to show how these resources can be used to improve crosslingual searching, information management, and resources for language engineering. Funded under the DARPA TIDES program, Feb 2000 - Jan 2003.
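The Entry Vocabulary Module idea described under "Unfamiliar Metadata" (map a searcher's own words to a ranked list of terms in the system's vocabulary) can be illustrated with a minimal sketch. This is not the project's implementation: the co-occurrence counting used here is a simple stand-in technique, and the training records, function names, and vocabulary terms are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training records: free-text titles paired with the
# controlled-vocabulary terms an indexer assigned to them.
TRAINING = [
    ("cars and trucks on the highway", ["Motor vehicles"]),
    ("electric cars and batteries", ["Motor vehicles", "Energy storage"]),
    ("lithium batteries for storage", ["Energy storage"]),
]


def build_index(records):
    """Count how often each free-text word co-occurs with each vocabulary term."""
    index = defaultdict(Counter)
    for text, terms in records:
        for word in set(text.lower().split()):
            for term in terms:
                index[word][term] += 1
    return index


def suggest_terms(index, query, limit=5):
    """Return vocabulary terms ranked by co-occurrence with the query words."""
    scores = Counter()
    for word in set(query.lower().split()):
        # .get avoids adding empty entries for unseen words (assumed behaviour
        # of this sketch, not of any real Entry Vocabulary Module).
        scores.update(index.get(word, {}))
    return [term for term, _ in scores.most_common(limit)]


index = build_index(TRAINING)
```

Calling suggest_terms(index, "batteries") ranks "Energy storage" ahead of "Motor vehicles" because the word appears with that term in more of the sample records. A real module would need far larger training data and a better association measure than raw counts.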

NSF Digital Libraries

Several SIMS faculty and students are participating in the UC Berkeley Digital Library project. The goal of this project is to develop the technologies for intelligent access to massive, distributed collections of multimedia documents including photographs, satellite images, videos, full-text documents, and "multivalent" documents composed of multiple terabyte databases.