Seminar 02/03 - Abstracts


SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery

Mark D. Hill,
Computer Sciences Department,
University of Wisconsin-Madison

Availability is increasingly important for shared memory multiprocessors,
but the market for commercial servers prefers that availability
not come at the cost of appreciably more hardware or a significant
degradation in performance. Implementation trends toward less-reliable
deep submicron transistors necessitate architectural techniques that
increase availability.

We develop an availability solution, called SafetyNet, that uses a
unified, lightweight checkpoint/recovery mechanism to support multiple
long-latency fault detection schemes.  SafetyNet efficiently coordinates
checkpoints across the system in logical time and minimizes runtime
overhead by pipelining checkpoint validation with subsequent parallel
execution.

Initial results [Daniel Sorin, Milo Martin, Mark Hill, David Wood, ISCA
2002] show that SafetyNet can tolerate dropped coherence messages or the
loss of an interconnection network switch (a) without adding statistically
significant overhead during fault-free execution and (b) without crashing
when tolerated faults occur. The talk will also touch upon future
directions toward tolerating hardware and software design errors.
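
As an illustration of the general checkpoint/recovery idea (a toy software
sketch only; the class name, structure, and single-memory scope are
assumptions for exposition, not SafetyNet's actual logging hardware):

```python
# Toy model of checkpoint/recovery via undo logging (illustrative only).

class CheckpointedMemory:
    """Logs the old value of each location on its first write in the
    current checkpoint interval, so a detected fault can be undone by
    rolling the log back to the last validated checkpoint."""

    def __init__(self):
        self.mem = {}
        self.log = []        # undo log for the current (unvalidated) interval
        self.logged = set()  # locations already logged this interval

    def write(self, addr, value):
        if addr not in self.logged:          # log old value once per interval
            self.log.append((addr, self.mem.get(addr)))
            self.logged.add(addr)
        self.mem[addr] = value

    def validate_checkpoint(self):
        """Fault detection found no error: discard the undo log."""
        self.log.clear()
        self.logged.clear()

    def recover(self):
        """A fault was detected: undo all writes since the last checkpoint."""
        for addr, old in reversed(self.log):
            if old is None:
                self.mem.pop(addr, None)     # location did not exist before
            else:
                self.mem[addr] = old
        self.log.clear()
        self.logged.clear()
```

Validation can lag execution: new writes keep logging while an earlier
interval is still being checked, which is the pipelining the abstract refers to.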



Blue Gene: building PetaFLOPS computers

Jose G. Castaños
IBM T. J. Watson Research Center

Blue Gene is a research project currently underway at
IBM T.J. Watson Research Center. It has two equally
important goals: advance the state of the art of
biomolecular simulation and advance the state of the
art in computer design and software for extremely large-scale
systems. In this talk we will describe the architectures
of the two high performance parallel computers that our
team is developing: BG/L, designed in collaboration
with LLNL, is capable of achieving 200 TeraFLOPS and will
target scientific and emerging commercial applications;
BG/C is a new multithreaded architecture with 1 PetaFLOPS peak
performance that represents a radical departure from current
designs.  We will also present their system software and
programming environment and provide an overview of the
applications that are driving our research.



Data Cache Power Reduction Through Way Determination

Alex Veidenbaum
Information and Computer Science
University of California, Irvine

Set-associative data caches consume a significant portion
of CPU power, as high as 20+ percent of the total in some
processors.  The reason is that parallel access to all the
cache ways reduces latency and improves performance.
This research introduces the concept of "way determination",
a mechanism to determine the correct cache way prior to
cache access.  It is not a prediction and thus incurs no
time penalty when it fails, only the energy penalty
of a parallel access to all the cache ways.  The accuracy
of the mechanism and its impact on power are evaluated
using SPEC benchmarks on an "Alpha 21264"-like architecture.
It is shown that D-cache power can be reduced by 30 to 55%
for 2- and 8-way set associative caches, respectively.  The
corresponding total CPU power reductions are 7 and 13%.
Way determination has no impact on performance.
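
To illustrate the general idea (a hypothetical software model; the table
organization, fill policy, and energy accounting below are assumptions for
exposition, not the design evaluated in the talk):

```python
# Illustrative model of "way determination": a table remembers which way
# holds each block, so a later access can probe only that way instead of
# probing all ways in parallel (hypothetical structure, for exposition).

class WayDeterminedCache:
    def __init__(self, num_sets=4, num_ways=2):
        self.num_sets, self.num_ways = num_sets, num_ways
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.way_table = {}   # block address -> known way
        self.ways_probed = 0  # energy proxy: tag comparisons performed

    def access(self, addr):
        """Returns True on a cache hit, False on a miss (with fill)."""
        s, tag = addr % self.num_sets, addr // self.num_sets
        way = self.way_table.get(addr)
        if way is not None and self.tags[s][way] == tag:
            self.ways_probed += 1            # single-way, low-energy access
            return True
        self.ways_probed += self.num_ways    # fall back to parallel access
        for w in range(self.num_ways):
            if self.tags[s][w] == tag:
                self.way_table[addr] = w     # learn the way for next time
                return True
        w = addr % self.num_ways             # trivial fill policy for the sketch
        self.tags[s][w] = tag
        self.way_table[addr] = w
        return False
```

Note that a stale table entry only costs energy (the fallback parallel probe),
never extra time, which is the sense in which this is not a prediction.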



Customization of Operating Systems and Software Stacks

Nacho Navarro
Computer Architecture Dept., UPC

Our goal is to automatically customize applications along with Linux
systems to outperform hand-customized systems in cost, performance,
reliability, and time to market while preserving compatibility. The
research efforts aim at deep program analysis on applications and systems
code, crosscutting optimization that spans traditional software
boundaries, and deriving and utilizing customized architectures and
microarchitectures. We argue that customizability, adaptability, and
reliability are mutually dependent and must be approached in a coherent
framework in order to reach a feasible solution.



The Loop Optimizer of the Open64/ORC Compiler

Eduard Santamaria
Computer Architecture Dept., UPC

In this presentation we describe the workings of the loop optimizer
(LNO, Loop Nest Optimizer) of the Open64/ORC compiler, formerly
known as Pro64.

We first briefly review the components that make up the compiler,
and then focus on the loop optimizer.

The LNO consists of three phases: a first phase that identifies and
prepares the loops that are candidates for optimization; a second
phase that determines and applies the transformations needed to
optimize the loops found in the previous phase; and a third phase
that applies a series of transformations aimed at reducing register
pressure in the loop body.

In this presentation we explain the first phase of the LNO in
detail. In particular, we show how this phase builds loop nests of
maximal depth by applying transformations such as loop fusion and
loop distribution.

We also discuss the dependence analysis that the LNO performs to
ensure that these transformations preserve the program semantics.
 



A New Generation of Systematic Programming Tools

James Larus
Microsoft Research

To produce better software, software developers require a new generation
of programming languages and tools that apply the enormous computational
resources on a programmer's desk to help find errors and inconsistencies
in programs. Although these tools alone will not find, let alone
eliminate, all programming errors, they have already demonstrated that
they can improve program quality and reduce development cost. The
Software Productivity Tools group in Microsoft Research has developed a
variety of programming tools, which use simple, partial specifications
and sophisticated program analysis to find errors systematically. This
talk argues that this approach is beneficial, describes some existing
tools, and points out the many open research directions.



Technology innovation in the New HP

Rich Zippel & Paolo Faraboschi
HP Labs

The talk covers some of the major research directions of the New HP,
which now combines the strengths of HP Labs and Compaq Research. We will
cover Internet systems and storage, utility data centers, mobile and
multimedia, and digital imaging. In addition, we will preview the
upcoming HP Labs activities planned to start in Barcelona.



Array Recovery and High Level Transformations for DSP Applications

Mike O'Boyle
University of Edinburgh

Efficient implementation of DSP applications is critical for many
embedded systems.  Optimising compilers for application programs
written in C largely focus on code generation and scheduling, which,
with their growing maturity, are providing diminishing returns.  As DSP
applications typically make extensive use of pointer arithmetic, the
alternative use of high-level transformations has been largely
ignored. This paper develops an array recovery technique that
automatically converts pointers to arrays, enabling the empirical
evaluation of high-level source-to-source transformations. High-level
techniques were applied to the DSPstone benchmarks on three platforms:
the TriMedia TM-1000, the Texas Instruments TMS320C6201 and the Analog
Devices SHARC ADSP-21160. On average, the best transformation gave a
factor of 2.43 improvement across the platforms. In certain cases a
speedup of 5.48 was found for the SHARC, 7.38 for the TM-1 and 2.3 for
the C6201. These preliminary results justify pointer-to-array conversion
and further investigation into the use of high-level techniques
in embedded compilers.



Dynamic Verification of Memory Coherence

Jason Cantin,
University of Wisconsin-Madison

Fault-tolerance is of growing importance for high-end computer
systems. It is especially important for shared-memory
multiprocessors, due to the increased complexity and
availability demands of commercial applications.

Our research focuses on the fault-tolerance of coherent shared
memories. In particular, we are trying to develop methods for
efficiently detecting violations of coherence and consistency
during execution. Ideally, such mechanisms can be implemented
in hardware and combined with rollback recovery mechanisms.

With efficient mechanisms for detecting violations of coherence
and consistency, there are possibilities for improving performance.
First, the detection mechanisms can be used to handle the corner
cases, allowing the protocols to use more aggressive optimizations
and speculation. Second, it may be possible to run a parallel
application on hardware with a memory model weaker than the one
for which it was compiled.

This is a work in progress. Our initial results show that detecting
violations of coherence and consistency is in general an intractable
problem. However, with additional information from the protocol the
problem can be made tractable. We are now focusing on online detection,
and seeking to minimize overheads so that hardware implementations
are feasible.



Route Control Technology: Improving the Performance, Cost, and
Reliability of Internet Access Through BGP Multihoming

Jose Miguel Pulido
RouteScience Technologies

The Internet is formed by a set of interconnected autonomous systems
(ASs). This interconnection is realized through the Border Gateway
Protocol (BGP). BGP was designed to facilitate the interconnection of
ASs and to let Internet service providers apply policies to these
connections. BGP has proven very robust at the tasks for which it was
designed, especially in the face of the unprecedented growth the
Internet has seen in recent years, in the number of users and systems
as well as in the number of applications that run over it. This same
growth, however, imposes new requirements on the network, demanded by
those users and applications. Examples of these requirements include
bandwidth optimization and improving the performance perceived by
users of web and VoIP applications, among others. These requirements
increasingly affect BGP because of the growing number of companies
that use multiple Internet access providers (multihoming), mainly to
avoid depending on a single provider, and that want to get the most
out of their investment in connectivity. BGP was not designed to take
cost and/or performance into account, and it cannot meet these
requirements.

Route control technology alleviates the limitations of BGP by
measuring, in real time, the performance along several paths as well
as the bandwidth consumption, and by choosing, also in real time, the
best route according to user-defined requirements. In this talk we
present route control technology and describe how it offers a
solution to the limitations of BGP. More specifically, we describe
different techniques for obtaining performance measurements
simultaneously across different access providers. We also describe
how decisions are made and implemented, and we quantify the benefits
obtained by comparing the decisions made with those BGP would have
taken.
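
As a rough illustration of user-policy-driven route selection (a
hypothetical policy and function, for exposition only; real route-control
products use far richer measurements and decision logic):

```python
# Hypothetical route-selection policy over per-provider measurements.

def choose_route(measurements, max_latency_ms, max_utilization):
    """measurements: provider -> (latency_ms, link_utilization in 0..1).
    Returns the lowest-latency provider satisfying both user-defined
    constraints, or None if no provider qualifies."""
    feasible = [
        (latency, name)
        for name, (latency, util) in measurements.items()
        if latency <= max_latency_ms and util <= max_utilization
    ]
    return min(feasible)[1] if feasible else None

paths = {"ISP-A": (80, 0.9), "ISP-B": (120, 0.4), "ISP-C": (95, 0.5)}
# ISP-A is fastest but its link is nearly saturated, so the policy
# picks the fastest provider whose link is within the utilization cap.
best = choose_route(paths, max_latency_ms=150, max_utilization=0.7)  # "ISP-C"
```

BGP alone, seeing only AS-path attributes rather than measured latency or
utilization, has no way to express a decision like this.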



Simulating a $2M Commercial Server on a $2K PC

Mark D. Hill
Computer Sciences Department
University of Wisconsin-Madison

Many future multiprocessor servers will execute large commercial
workloads, such as database management systems and web servers.  Thus,
simulations of new multiprocessor designs should run these workloads.
However, simulating expensive servers running these large workloads
on low-cost personal computers presents many challenges.  This talk
discusses the Wisconsin Multifacet project's approach to commercial
workload simulation.

See http://www.cs.wisc.edu/multifacet/papers/computer03_simulation.pdf
or the Multifacet project page.



Intel® Pentium® M Processor (Banias) - Next Generation Mobile Processor

Ronny Ronen,
Microprocessor Research Labs, Intel

The Intel® Pentium® M processor (aka "Banias") is the first microprocessor
designed specifically for the requirements of tomorrow's mobile PCs.  In
this presentation we first highlight the general trends in processor
microarchitecture and then provide an in-depth overview of the Banias
processor, describing its microarchitecture and focusing on advanced
power-reduction features and capabilities.



Accurate Time Modeling within Full System Simulation

Gustav Hållberg
Vice President of Product Management
Virtutech
 

An old adage in simulation has it that one has to strike a balance between
simulation accuracy and the size of the workload being simulated. Virtutech
Simics (http://www.simics.com) provides the user with a framework within which
the simulation accuracy can be changed dynamically over a long simulation. This
makes it possible to run very large workloads, like the SPEC suites on a
complete server-class system, and, in a sampling fashion, turn on cycle-accurate
timing models of the CPU microarchitecture, memory systems, etc.

We will focus on how Simics exposes interfaces for hooking in timing models,
while it takes care of the functional parts of the simulation. The separation
of timing from functionality is fundamental to how one models systems with
Simics. We will discuss why this separation exists, what it means, and why it
is of great help to those wanting to do detailed modeling.



L1 data cache latency reduction via Load-Store Queue locality

Alex Veidenbaum
University of California at Irvine

High-performance processors use a large set-associative L1 data
cache with multiple ports to achieve high ILP.  As clock speeds
increase, such a cache has a multi-cycle latency that is expected
to increase further in the future.  The latency is an important
source of performance (and energy) loss in the CPU.
This paper proposes a solution to this problem that exploits locality
in the data previously fetched from the L1 cache.  It is achieved
by modifying the Load/Store Queue (LSQ) design to allow "caching"
of previously accessed data values on both loads and stores after
the corresponding memory access instruction has been committed.
This inexpensive solution is shown to significantly improve performance,
especially for integer programs. A 32-entry modified LSQ design allows
an average of 38% of the loads in the SpecINT95 benchmarks and 18% in
the SpecFP95 benchmarks to get their data from the LSQ.  A 1-cycle LSQ
access, as compared to a 3-cycle L1 cache access, results in up to a 10%
overall performance increase.  It also leads to a 40% reduction in
L1 data cache energy consumption.
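
The general idea can be sketched in software (an illustrative model only;
the eviction policy, latencies, and interface below are assumptions for
exposition, not the proposed hardware design):

```python
from collections import OrderedDict

# Toy model of retaining committed load/store values in the LSQ so that
# later loads can hit there in 1 cycle instead of the multi-cycle L1.

class ValueCachingLSQ:
    def __init__(self, entries=32, lsq_latency=1, l1_latency=3):
        self.entries = entries
        self.lsq_latency, self.l1_latency = lsq_latency, l1_latency
        self.buf = OrderedDict()  # addr -> value, kept in LRU order

    def _remember(self, addr, value):
        self.buf[addr] = value
        self.buf.move_to_end(addr)
        if len(self.buf) > self.entries:   # evict the oldest retained entry
            self.buf.popitem(last=False)

    def store(self, addr, value, memory):
        memory[addr] = value
        self._remember(addr, value)        # keep the value after commit

    def load(self, addr, memory):
        """Returns (value, latency_in_cycles)."""
        if addr in self.buf:               # hit in the LSQ: fast access
            self.buf.move_to_end(addr)
            return self.buf[addr], self.lsq_latency
        value = memory.get(addr, 0)        # miss: go to the L1 data cache
        self._remember(addr, value)
        return value, self.l1_latency
```

A load that hits on a value left behind by an earlier committed store or load
pays 1 cycle instead of 3, which is the source of the reported speedup.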



Token Coherence: Enabling Faster Multiprocessor Servers by Decoupling Performance and Correctness

Mark D. Hill,
Computer Sciences Department,
University of Wisconsin-Madison

Database and web servers are an important part of society's
information infrastructure.  As the number of clients (e.g., wireless
or embedded devices) continues to grow, service providers demand
high-performance and cost-effective hardware systems for running
server workloads.

Token Coherence is a new hardware technique for increasing the
performance of commercial server workloads running on moderate-sized
multiprocessors.  These shared-memory multiprocessor servers use
cache-coherence protocols to provide the abstraction of a unified
shared memory.  The performance of existing cache-coherence protocols
is currently constrained either by requiring global request ordering
or by the extra latency added by request indirections.

Token Coherence avoids these performance bottlenecks by decoupling
correctness requirements from performance aspects of the system.  In
this new framework, the correctness substrate uses token counting to
explicitly encode read and write permissions, guaranteeing correct
shared memory semantics in all cases (without the overhead of
indirection or global ordering).  A separate performance protocol
provides high performance in the common case, relying on a more
conservative (less efficient) mechanism only in rare cases.  This
approach can increase performance, reduce cost, and perhaps reduce
protocol design errors.  I will also briefly discuss wider
applications of this decoupled approach in solving other problems in
designing hardware and software systems.
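
The token-counting invariant at the heart of the correctness substrate can
be illustrated in a few lines (a minimal sketch; the real protocol also
needs token transfer, persistent requests, and race handling):

```python
# Minimal model of the token-counting invariant (illustrative only).

class TokenBlock:
    """Each memory block has a fixed number of tokens, conserved across
    the system: a processor may write only while holding all tokens and
    may read only while holding at least one. Since tokens are conserved,
    a writer and a reader (or two writers) can never coexist."""

    def __init__(self, total_tokens):
        self.total = total_tokens

    def can_read(self, held):
        return 1 <= held <= self.total

    def can_write(self, held):
        return held == self.total

b = TokenBlock(total_tokens=4)
# A writer holding all 4 tokens excludes every reader: no other
# processor can simultaneously hold the >= 1 token reading requires.
```

Because safety follows from counting alone, the performance protocol is free
to move tokens along any fast path without global ordering or indirection.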



Global Technology Outlook

Dr. Walter Heh
IBM Industry Solutions Labs., Zurich
 

IBM's Research Division is one of the largest and most successful
industrial research organizations; in the area of Information Technology
it is certainly the largest and, measured for example by its track record
of past and current patent disclosures, the most successful one.

The talk depicts the major trends in IT as observed and predicted by IBM
Research, as seen in the year 2003. The presentation is structured
"bottom-up" in three parts, from technology over infrastructure to
business value, and draws for its argument on IBM and non-IBM
technologies as well as on selected IBM Research projects.

1.  Basic Technologies ("Faster, Cheaper, Better") refers to an
    acceleration of the technical development of all base technologies
    (processors, conventional storage, communication, and display
    technologies). There is no trace of a slowdown; on the contrary.
    The central technology, integrated circuits on silicon, both drives
    and depends on development in the direction of genuine
    nanotechnology. This accumulated, dramatic technology growth must
    cause the next big revolution, which leads to level 2.

2.  The Infrastructure ("The Connected Planet"): Miniaturization, lower
    power consumption, and wireless technologies allow for connected
    computation anywhere. Universal connectivity and the mobile
    collaboration of people and devices revolutionize the space
    dimension. As a consequence, the Internet has to evolve, too. At
    the same time, e-business and technology advances transform the way
    IT value is delivered to customers, toward "On Demand Computing":
    optimized resource allocation drives a utility-like model
    (e-Utility); a major step in built-in automation on all levels
    reduces management costs (Autonomic Computing); and the Internet
    moves from information delivery to large-scale computing and
    service delivery (Grid Computing). The equivalent development on
    the software and services side means easier integration on all
    levels: at the user surface through portals, for services through
    Web Services, and via software hubs in the core of the back office.
    Software and software quality also become dominant in the field of
    embedded systems, for example in the automotive industry.

3.  Two major trends in business value are outlined: Large, even very
    large, amounts of data can be analyzed efficiently and allow
    customer-centric business decisions (harvesting insight); privacy
    issues and identity management are key tasks here. Large processing
    power (often associated with large data sets) makes it possible to
    model and simulate the behavior of large systems, not only in
    science and engineering but also in the economy. This allows
    continual optimization of supply chains, value chains, nationwide
    car traffic, and global business systems, which is mandatory for a
    sustainable world.



Using Remote Memory Communication
for Self-Healing Systems

Liviu Iftode
Department of Computer Science
University of Maryland
 

In this talk I will introduce a novel self-healing approach for
computer systems based on Back-Door, a system architecture that allows
monitoring and repair actions to be performed on a remote operating system or
application image without using remote CPU cycles. The key ingredient of the
Back-Door architecture is the Remote Memory Communication technology
provided by standards like the Virtual Interface Architecture or InfiniBand;
more specifically, the support for remote DMA read and write operations.
I will also present a preliminary prototype we developed as a proof of
concept, which supports non-intrusive failure detection and recovery of
active client connections in a cluster of servers.
 



The Gelato Federation

Michel Bernard
HP-Switzerland

The Gelato Federation is a world-wide consortium of research organizations
dedicated to enabling scalable, open-source, Linux-based Intel Itanium
Processor Family computing solutions that address real-world problems in
academic, government, and industrial research. The presentation will
introduce the current Gelato members, the Federation's specific areas of
interest, and the prospects for further development.



Agents, Crawlers, and Web Search

Ricardo Baeza-Yates
Center for Web Research
DCC, Universidad de Chile

Agents and crawlers are very similar, but they have different goals. In this
talk we revisit the use of agents to retrieve information on the Web,
contrast them with crawlers, and analyze their viability, both technically
and in terms of the quality of the results.



Dynamic Degree of Use Prediction

Gurindar S. Sohi
University of Wisconsin-Madison

Microprocessors today expend considerable resources in communicating values,
calling for sophisticated value communication structures that are optimized
for the communication requirements of a value.  The degree of use of a value
(the number of times the value is going to be used) provides the most essential
information about its communication requirements.  Gathering this
information is the first step toward microarchitectural techniques that optimize
value communication.  This talk will describe degree-of-use predictors,
which are proposed as a way of predicting the degree of use of a value.
We will also consider potential optimizations, such as dynamic dead instruction
elimination, that use knowledge of a value's communication requirements.

This is work with Adam Butts.
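
A minimal sketch of a last-value degree-of-use predictor (the table
organization and training policy here are illustrative assumptions, not
necessarily the predictor proposed in this work):

```python
# Hypothetical last-value degree-of-use predictor, indexed by the PC of
# the producing instruction: predict that the next value it produces will
# be consumed the same number of times as the last one was.

class DegreeOfUsePredictor:
    def __init__(self):
        self.table = {}  # producer PC -> last observed degree of use

    def predict(self, pc, default=1):
        return self.table.get(pc, default)

    def train(self, pc, observed_degree):
        """Called at retirement, once the value's consumers are known."""
        self.table[pc] = observed_degree

p = DegreeOfUsePredictor()
p.train(0x400, 0)          # the last value from PC 0x400 was never consumed
if p.predict(0x400) == 0:  # predicted dead: a candidate for dynamic
    pass                   # dead instruction elimination
```

A predicted degree of zero is exactly the case the dead-instruction
optimization mentioned above would act on.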



Optimal recovery schemes for fault tolerant distributed computing

Lars Lundberg
Parallel Architectures and Applications
for Real-Time Systems
Blekinge Institute of Technology

Clusters and distributed systems offer fault tolerance and high
performance through load sharing. When all computers are up and running, we
would like the load to be evenly distributed among them. When one or more
computers break down, the load on those computers must be redistributed to the
other computers in the cluster. The redistribution is determined by the
recovery scheme. The recovery scheme should keep the load as evenly
distributed as possible even when the most unfavorable combinations of
computers break down, i.e., we want to optimize the worst-case behavior. In
this paper we define recovery schemes that are optimal for a number of
important cases. We also show that the problem of finding optimal recovery
schemes corresponds to the mathematical problem of Golomb rulers. These
provide optimal recovery schemes for clusters of up to 373 computers.
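
For reference, a Golomb ruler is a set of integer marks whose pairwise
differences are all distinct; the sketch below checks only that property
(the correspondence between rulers and recovery schemes is the paper's
contribution and is not reproduced here):

```python
from itertools import combinations

# A Golomb ruler: integer marks with all pairwise differences distinct.

def is_golomb_ruler(marks):
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))

# {0, 1, 4, 9, 11} is a known optimal 5-mark ruler (length 11).
assert is_golomb_ruler([0, 1, 4, 9, 11])
assert not is_golomb_ruler([0, 1, 2, 4])  # 1-0 == 2-1: a repeated difference
```

Intuitively, the distinct-differences property is what prevents the load of
any pair of failed computers from piling up on the same surviving machine.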
 



Applying the DEVS Formalism to Modeling Processor Architectures

Gabriel Wainer
Dept. of Systems and Computer Engineering
Carleton University (Ottawa, Canada)

In recent years there has been a growing trend toward developing
formalisms to specify the behavior of discrete-event dynamic systems.
The DEVS (Discrete EVent System Specification) formalism was created in
the 1970s with the goal of modeling discrete-event systems that can be
simulated independently of the models created. A DEVS model is a
rigorously defined entity that allows models to be coupled
hierarchically through modular interfaces. The technique has been used
to study a variety of complex systems, showing that it reduces the
complexity of developing simulations and facilitates the reuse of
existing models while simultaneously increasing the execution speed of
the simulators.

The CD++ tool implements the basic concepts of DEVS theory and provides
an environment for developing complex models. As an implementation of
the concepts defined by the DEVS formalism, it makes model development
considerably simpler; in addition, the inclusion of parallel simulation
techniques improves the performance of the simulations. The tool has
been used to model a variety of applications. One of the complex
applications we have developed was created with the goal of
demonstrating the simplicity of model development and serving as an
educational tool in computer organization courses. The idea was to
develop a simulated computer (called Alfa-1) based on the architecture
of the SPARC processors. To verify the feasibility of our proposal, all
the components of the processor were developed by students in a
computer organization course. The final integration of the components
and the definition of the Control Unit, as well as the integration
testing, were carried out by teaching assistants. The result is a
processor-architecture simulation tool that can be used in courses in
the area.

This presentation will introduce the main features of the DEVS
formalism, show practical implementation aspects using the CD++ tool,
and illustrate its use in modeling processor architectures by
presenting the results of the Alfa-1 project.
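
The interface of a DEVS atomic model can be sketched as follows (a toy
illustration of the formalism's four characteristic functions, not CD++'s
actual API; the `Processor` model and its `service_time` are assumptions):

```python
# Minimal DEVS-style atomic model skeleton: internal/external transition
# functions, an output function, and a time-advance function.

INFINITY = float("inf")

class Processor:
    """Toy atomic model: accepts a job, stays busy for `service_time`,
    then emits the finished job and returns to the passive phase."""

    def __init__(self, service_time=5.0):
        self.service_time = service_time
        self.phase, self.job = "passive", None

    def time_advance(self):              # ta: time until the next internal event
        return self.service_time if self.phase == "busy" else INFINITY

    def external_transition(self, job):  # delta_ext: react to an input event
        if self.phase == "passive":
            self.phase, self.job = "busy", job

    def output(self):                    # lambda: emitted at internal-event time
        return self.job

    def internal_transition(self):       # delta_int: state change after output
        self.phase, self.job = "passive", None
```

A simulator only ever calls these four functions, which is what lets models
like the Alfa-1 components be developed independently and coupled later.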



Topology and correlations in the Internet

Romualdo Pastor-Satorras
UPC

The Internet is a world-wide network for information exchange,
composed of personal computers, servers, and routers (in charge of
controlling and routing information traffic), linked by
different kinds of physical connections. In this talk we will
discuss the topological properties of the Internet, discovered
from the analysis of Internet maps at different resolution scales.
This analysis will show that the Internet is a complex network
with scale-invariant connectivity properties. The inherent
hierarchical structure of this network is reflected in the
presence of connectivity correlations, which constitute a new and
important element to be taken into account in future modeling
efforts for this system.



Advanced Research in Infrastructure for e-Business on the Internet

Professor Veljko Milutinovic
University of Belgrade

This presentation defines the major bottlenecks in the research related to
the infrastructure for e-business on the Internet (hardware, software,
system, and communications), and proceeds in two parts. In part one, for
each of the four major bottlenecks, an overview is given of the
ongoing research at Stanford, MIT, and UC Berkeley. In part two, for the
same four bottlenecks, an overview is given of the recent and ongoing
research done at the University of Belgrade (led by Professor
Milutinovic) for industry in the USA and for selected universities in the
EU. Results of this research include prototypes for a number of commercial
products and about 40 papers published recently in IEEE journals. Topics
covered by that research and this presentation include: on-chip and
on-board accelerators for PC software, microprocessor improvements for
modern e-business, efficient integration of computing and communications,
genetic Internet search, customer-satisfaction-based Internet search,
engines for e-ducation, e-tourism, technology transfer, and scientific
interchange on the Internet, semantic web analysis, etc. Each particular
topic can be expanded into a self-contained separate talk.



An Architectural Perspective on Soft Errors from Cosmic Radiation

Shubu Mukherjee,
Intel Corp.

Moore's Law, the continuous exponential growth in transistors per
chip, has brought tremendous progress in the functionality and
performance of semiconductor devices, particularly microprocessors.  Each
succeeding technology generation has also introduced new obstacles to
maintaining this growth rate. Transient faults from cosmic ray strikes have
emerged as a key challenge whose importance is likely to increase
significantly in the next few generations.

In this talk, I will show why single-bit upsets from cosmic ray strikes
will be a critical constraint in future microprocessor design.  Then, I
will describe a technique called Redundant Multithreading that can provide
a cost-effective solution for detecting such single-bit upsets in
microprocessor devices.



Program Transformations for Managing Shared Caches on Simultaneous Multithreading Processors

Dimitris Nikolopoulos
Department of Computer Science
College of William & Mary

Simultaneous multithreading processors use shared on-chip caches,
which yield more cost-effective designs. Sharing a cache between
simultaneously executing threads can accelerate synchronization
and communication due to implicit prefetching, but it
causes excessive conflict misses and stresses the memory bandwidth
if the threads do not actually share data through the cache.
This talk presents software solutions for virtually
partitioning shared caches between threads on multithreaded
processors and for scheduling threads so as to use the memory
bandwidth without saturating or underutilizing it. We will discuss
the use of three methods originating in the optimizing-compilers
literature (dynamic tiling, copying, and padded block data layout)
to resolve conflicts due to sharing of the cache between threads.
We will also present a simple algorithm that combines these
transformations, and two runtime methods to detect cache sharing and
react to it at runtime, one using minimal kernel extensions and one
using the processor's hardware counters. Finally, we will present
a novel thread scheduling algorithm that
attempts to co-schedule high-bandwidth with low-bandwidth threads
to avoid saturation of the memory bandwidth.
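
Classic loop tiling, the starting point for the dynamic tiling mentioned
above, can be illustrated as follows (a generic fixed-tile sketch; the
talk's dynamic variant instead chooses the tile size at runtime to fit a
thread's share of the cache, which this sketch does not attempt):

```python
# Generic loop tiling for matrix multiply: iterate over small blocks so
# the working set of each block fits in the (partition of the) cache.

def matmul_tiled(A, B, n, tile):
    """Multiply two n x n matrices using tile x tile blocks."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                # Work within one block whose data fits in the cache share
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i][j] += a * B[k][j]
    return C
```

Shrinking `tile` when another thread competes for the cache is the essence
of making the transformation "dynamic".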



Adaptive Bubble Routing

Ramon Beivide
Universidad de Cantabria

This talk is devoted to Bubble Flow Control (BFC) and Adaptive Bubble Routing (ABR), two achievements of our research group. BFC is a very competitive technique for avoiding packet deadlock and enhancing network performance under Virtual Cut-Through switching. BFC virtual rings can be combined with adaptive virtual channels to provide high-performance adaptive routing. ABR has recently been chosen by IBM for the design of the data network of the BlueGene/L supercomputer. Paper 1 addressed these topics.

ABR can be highly robust in the presence of faults. As another outcome, we have recently developed a fault-tolerant ABR that exhibits high survivability at very low cost. Paper 2 shows the basis of this mechanism. ABR can be used within both regular and irregular topologies and under different environments. Several seminal ABR-based designs have been successfully simulated for LANs, SANs, clusters, and cc-NUMA multiprocessors. Papers 3 and 4 highlighted these applications.



 

That's all folks!!!!!