## Department of Mathematical Modelling

Technical University of Denmark

Building 321/305

DK-2800 Lyngby

Denmark

Telephone: +45 4588 1433

Telefax : +45 4588 2673

E-mail : imm@imm.dtu.dk

## Distributed-memory matrix computations

## Susanne M. Balle

## Summary

The main goal of this project is to investigate, develop, and implement algorithms for numerical linear algebra on parallel computers in order to acquire expertise in methods for parallel computations. An important motivation for analyzaing and investigating the potential for parallelism in these algorithms is that many scientific applications rely heavily on the performance of the involved dense linear algebra building blocks. Even though we consider the distributed-memory as well as the shared-memory programming paradigm, the major part of the thesis is dedicated to distributed-memory architectures. We emphasize distributed-memory massively parallel computers - such as the Connection Machines model CM-200 and model CM-5/CM-5E - available to us at UNI-C and at Thinking Machines Corporation. The CM-200 was at the time this project started one of the few existing massively parallel computers. Several areas in the numerical linear algebra field are investigated and they illustrate the problems that arise as well as the techniques that are related to the use of massively parallel computers:

- Study of Strassen's matrix-matrix multiplication on the Connection Machine model CM-200. What performance can we expect to achieve? Why?
- Solving systems of linear equations using a Strassen-type matrix-inversion algorithm. A good way to solve systems of linear equations on massively parallel computers?
- Aspects of computing the singular value decomposition on the Connec-tion Machine CM-5/CM-5E. What are the quidelines to follow in order to achieve an efficient, highly parallel and scalable implementation of the considered algorithms? What are the numerical properties of our implementation?
- A relatively new algorithm to compute the singular vectors of bidiagonal matrix via Rutishauser's qd algorithm is investigated. this algorithm is built on top of several scan-operations. What difficulties occur when implementing this algorithm to massively parallel computers?
## IMM Ph.D Thesis 15, 1995

Last modi fied April 18, 1997Go back

For further information, please contact, Finn Kuno Christensen, IMM, Bldg. 321, DTU

Phone: (+45) 4588 1433. Fax: (+45) 4588 2673, E-mail: fkc@imm.dtu.dk