The International Workshop on Parallel and Symbolic Computation (PASCO) is a series of workshops dedicated to the promotion and advancement of parallel algorithms and software in all areas of symbolic mathematical computation. The ubiquity of parallel architectures and deep memory hierarchies has spurred a new quest for parallel mathematical algorithms and software capable of exploiting every level of parallelism: from hardware acceleration technologies (multicore and multiprocessor systems-on-chip, GPGPUs, FPGAs) to cluster and global computing platforms. To push the limits of symbolic and algebraic computation, beyond optimization of the application itself, the effective use of a large number of resources (memory and specialized computing units) is expected to improve performance across multiple criteria: time, energy consumption, resource usage, and reliability. In this context, the design and implementation of mathematical algorithms with provable and adaptive performance is a major challenge.
Proceedings Downloads
Techniques and tools for implementing IEEE 754 floating-point arithmetic on VLIW integer processors
- Claude-Pierre Jeannerod,
- Christophe Mouilleron,
- Jean-Michel Muller,
- Guillaume Revy,
- Christian Bertin,
- Jingyan Jourdan-Lu,
- Hervé Knochel,
- Christophe Monat
Recently, some high-performance IEEE 754 single precision floating-point software has been designed, which aims at best exploiting some features (integer arithmetic, parallelism) of the STMicroelectronics ST200 Very Long Instruction Word (VLIW) ...
Fifteen years after DSC and WLSS2 what parallel computations I do today: invited lecture at PASCO 2010
A second wave of parallel and distributed computing research is rolling in. Today's multicore/multiprocessor computers facilitate everyone's parallel execution. In the mid 1990s, manufacturers of expensive mainframe parallel computers faltered and ...
Exploiting multicore systems with Cilk
The increasing prevalence of multicore processors has led to a renewed interest in parallel programming. Cilk is a language extension to C and C++ designed to simplify programming shared-memory multiprocessor systems. The workstealing scheduler in Cilk ...
Automated performance tuning
This tutorial presents automated techniques for implementing and optimizing numeric and symbolic libraries on modern computing platforms including SSE, multicore, and GPU. Obtaining high performance requires effective use of the memory hierarchy, short ...
Roomy: a system for space limited computations
There are numerous examples of problems in symbolic algebra in which the required storage grows far beyond the limitations even of the distributed RAM of a cluster. Often this limitation determines how large a problem one can solve in practice. Roomy ...
Generic design of Chinese remaindering schemes
We propose a generic design for Chinese remainder algorithms. A Chinese remainder computation consists in reconstructing an integer value from its residues modulo coprime integers. We also propose an efficient linear data structure, a radix ladder, for ...
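The paper's generic design and radix-ladder data structure are not shown here; as background, a minimal textbook Chinese remainder reconstruction might look like the following sketch (the function name `crt` is ours):

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct the integer x (mod prod(moduli)) from its residues
    x mod m_i, for pairwise-coprime moduli m_i."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the inverse of Mi modulo m (Python 3.8+)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

# 23 ≡ 2 (mod 3), 3 (mod 5), 2 (mod 7)
print(crt([2, 3, 2], [3, 5, 7]))  # → 23
```

Each residue contributes independently, which is what makes modular (Chinese remainder) schemes attractive for parallel computation: the per-prime work is embarrassingly parallel and only the reconstruction is shared.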
A complete modular resultant algorithm targeted for realization on graphics hardware
This paper presents a complete modular approach to computing bivariate polynomial resultants on Graphics Processing Units (GPU). Given two polynomials, the algorithm first maps them to a prime field for sufficiently many primes, and then processes each ...
Parallel operations of sparse polynomials on multicores: I. multiplication and Poisson bracket
The multiplication of sparse multivariate polynomials in recursive representation is revisited to take advantage of multicore processors. We take care of memory management and load balancing in order to obtain linear speedup. The ...
Parallel computation of the minimal elements of a poset
Computing the minimal elements of a partially ordered finite set (poset) is a fundamental problem in combinatorics with numerous applications such as polynomial expression optimization, transversal hypergraph generation and redundant component removal, ...
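The paper's contribution is a parallel method; as a point of reference, the sequential quadratic-scan baseline for minimal elements is simple (the function below is our illustration, not the authors' algorithm):

```python
def minimal_elements(elems, leq):
    """Minimal elements of a finite poset given by leq(a, b) meaning a <= b:
    keep x iff no distinct y in the set lies strictly below it (O(n^2) scan)."""
    return [x for x in elems
            if not any(leq(y, x) and y != x for y in elems)]

# Divisibility order: a <= b iff a divides b.
divides = lambda a, b: b % a == 0
print(minimal_elements([4, 6, 8, 12, 9], divides))  # → [4, 6, 9]
```

Here 8 and 12 are discarded because 4 (and 6) lie below them; parallel versions distribute the pairwise comparisons while avoiding redundant ones.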
Parallel disk-based computation for large, monolithic binary decision diagrams
Binary Decision Diagrams (BDDs) are widely used in formal verification. They are also widely known for consuming large amounts of memory. For larger problems, a BDD computation will often start thrashing due to lack of memory within minutes. This work ...
Parallel arithmetic encryption for high-bandwidth communications on multicore/GPGPU platforms
In this work we study the feasibility of high-bandwidth, secure communications on generic machines equipped with the latest CPUs and General-Purpose Graphical Processing Units (GPGPU). We first analyze the suitability of current Nehalem CPU ...
Exact sparse matrix-vector multiplication on GPU's and multicore architectures
We propose different implementations of the sparse matrix-dense vector multiplication (SpMV) for finite fields and rings Z/mZ. We take advantage of graphics processors (GPUs) and multicore architectures. Our aim is to improve the speed of SpMV in ...
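The paper's GPU formats are not reproduced here; the core kernel over Z/mZ, in the common compressed sparse row (CSR) layout, can be sketched as follows (our illustration, with the reduction deferred to one modulo per row):

```python
def spmv_mod(vals, cols, row_ptr, x, m):
    """y = A·x over Z/mZ, with A in CSR form: row i owns the entries
    vals[row_ptr[i]:row_ptr[i+1]] at column indices cols[...]."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += vals[k] * x[cols[k]]
        y.append(s % m)   # one reduction per row keeps the inner loop cheap
    return y

# A = [[1, 0, 2],
#      [0, 3, 0]]  over Z/5Z, x = (1, 1, 1)
print(spmv_mod([1, 2, 3], [0, 2, 1], [0, 2, 3], [1, 1, 1], 5))  # → [3, 3]
```

Rows are independent, so the outer loop parallelizes directly across GPU threads or cores; in fixed-width arithmetic the delayed reduction additionally requires an overflow bound on the row sums.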
Parallel Gaussian elimination for Gröbner bases computations in finite fields
Polynomial system solving is one of the important areas of Computer Algebra with many applications in Robotics, Cryptology, Computational Geometry, etc. To this end computing a Gröbner basis is often a crucial step. The most efficient algorithms [6, 7] ...
A quantitative study of reductions in algebraic libraries
How much of existing computer algebra libraries is amenable to automatic parallelization? This is a difficult topic, yet of practical importance in the era of commodity multicore machines. This paper reports on a quantitative study of reductions in the ...
Parallel sparse polynomial division using heaps
We present a parallel algorithm for exact division of sparse distributed polynomials on a multicore processor. This is a problem with significant data dependencies, so our solution requires fine-grained parallelism. Our algorithm manages to avoid ...
A high-performance algorithm for calculating cyclotomic polynomials
The nth cyclotomic polynomial, Φn(z), is the monic polynomial whose ϕ(n) distinct roots are the primitive nth roots of unity. Φn(z) can be computed efficiently as a quotient of terms of the form (1 - z^d) by way of a method the authors call the Sparse ...
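The quotient in question comes from the classical Möbius formula Φn(z) = Π_{d|n} (z^{n/d} - 1)^{μ(d)}. The sketch below implements that formula directly with dense coefficient lists (it is not the authors' Sparse Power Series algorithm, which avoids materializing the large intermediate products):

```python
def cyclotomic(n):
    """Coefficients of Phi_n(z), constant term first, via
    Phi_n(z) = prod_{d | n} (z^{n/d} - 1)^{mu(d)}."""
    def mobius(k):
        mu, p = 1, 2
        while p * p <= k:
            if k % p == 0:
                k //= p
                if k % p == 0:
                    return 0          # squared prime factor
                mu = -mu
            p += 1
        return -mu if k > 1 else mu

    def mul(f, d):                    # f(z) * (z^d - 1)
        g = [0] * (len(f) + d)
        for i, c in enumerate(f):
            g[i] -= c
            g[i + d] += c
        return g

    def div(f, d):                    # exact division f(z) / (z^d - 1)
        q = [0] * (len(f) - d)
        for i in range(len(q) - 1, -1, -1):
            q[i] = f[i + d] + (q[i + d] if i + d < len(q) else 0)
        return q

    divisors = [d for d in range(1, n + 1) if n % d == 0]
    f = [1]
    for d in divisors:                # all multiplications first, so that
        if mobius(d) == 1:            # every later division is exact
            f = mul(f, n // d)
    for d in divisors:
        if mobius(d) == -1:
            f = div(f, n // d)
    return f

print(cyclotomic(6))   # → [1, -1, 1], i.e. z^2 - z + 1
```

Because every factor (z^d - 1) has only two terms, each multiplication and exact division costs O(deg) coefficient operations, which is the observation sparse power-series methods exploit much more aggressively.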
Accuracy versus time: a case study with summation algorithms
In this article, we focus on numerical algorithms for which, in practice, parallelism and accuracy do not cohabit well. In order to increase parallelism, expressions are reparsed, implicitly using mathematical laws like associativity, and this reduces ...
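The reparsing issue can be seen in a standard compensated-summation example (ours, not taken from the article): a left-to-right sum silently drops small terms next to large ones, while Neumaier's variant of Kahan's algorithm carries the lost low-order bits along:

```python
def neumaier_sum(xs):
    """Compensated summation (Neumaier's improvement of Kahan's algorithm):
    c accumulates the rounding error of every addition."""
    s, c = 0.0, 0.0
    for x in xs:
        t = s + x
        if abs(s) >= abs(x):
            c += (s - t) + x   # low-order bits of x were rounded away
        else:
            c += (x - t) + s   # low-order bits of s were rounded away
        s = t
    return s + c

xs = [1.0, 1e16, 1.0, -1e16] * 1000   # exact sum is 2000.0
print(sum(xs))           # → 0.0   (every 1.0 is absorbed by 1e16)
print(neumaier_sum(xs))  # → 2000.0
```

Python's `math.fsum` returns the correctly rounded sum as well; the point of the article's trade-off is that reassociating such sums for parallelism changes which of these errors occur.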
Polynomial homotopies on multicore workstations
Homotopy continuation methods to solve polynomial systems scale very well on parallel machines. We examine their parallel implementation on multiprocessor multicore workstations using threads. With more cores we speed up pleasingly parallel path tracking ...
Parallel computations in modular group algebras
We report on the parallelisation of the algorithm to compute the normalised unit group V (FpG) of a modular group algebra FpG of a finite p-group G over the field of p elements Fp in the computational algebra system GAP. We present its distributed ...
Cache-oblivious polygon indecomposability testing
We examine a cache-oblivious reformulation of the (iterative) polygon indecomposability test of [19]. We analyse the cache complexity of the iterative version of this test within the ideal-cache model and identify the bottlenecks affecting its memory ...
Parallel sparse polynomial interpolation over finite fields
We present a probabilistic algorithm to interpolate a sparse multivariate polynomial over a finite field, represented with a black box. Our algorithm modifies the algorithm of Ben-Or and Tiwari from 1988 for interpolating polynomials over rings with ...
Spiral-generated modular FFT algorithms
This paper presents an extension of the Spiral system to automatically generate and optimize FFT algorithms for the discrete Fourier transform over finite fields. The generated code is intended to support modular algorithms for multivariate polynomial ...
High performance linear algebra using interval arithmetic
In this paper, we describe implementations of interval matrix multiplication and verified solution to a linear system, using entirely BLAS routines, which are fully optimized and parallelized.
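One standard way to reduce interval matrix multiplication to ordinary (BLAS-friendly) products is Rump's midpoint-radius representation; whether this matches the paper's exact scheme is an assumption, and the outward rounding of the floating-point products, which a verified implementation must add, is omitted in this sketch:

```python
def interval_matmul(Am, Ar, Bm, Br):
    """Enclosure of [A]·[B] for midpoint-radius interval matrices (Am, Ar)
    and (Bm, Br), using only ordinary matrix products. Directed rounding
    is deliberately omitted, so this is a sketch, not a verified routine."""
    n, k, m = len(Am), len(Bm), len(Bm[0])

    def matmul(X, Y):   # plain n×k by k×m product (a GEMM call in practice)
        return [[sum(X[i][t] * Y[t][j] for t in range(k)) for j in range(m)]
                for i in range(n)]

    absAm = [[abs(v) for v in row] for row in Am]
    BmBr = [[abs(Bm[i][j]) + Br[i][j] for j in range(m)] for i in range(k)]
    Cm = matmul(Am, Bm)                 # midpoint: Am·Bm
    T1 = matmul(absAm, Br)              # radius:  |Am|·Br + Ar·(|Bm| + Br)
    T2 = matmul(Ar, BmBr)
    Cr = [[T1[i][j] + T2[i][j] for j in range(m)] for i in range(n)]
    return Cm, Cr

# [0.5, 1.5] · [1.75, 2.25] is enclosed by midpoint 2.0, radius 1.375
print(interval_matmul([[1.0]], [[0.5]], [[2.0]], [[0.25]]))
```

The appeal of this reduction is exactly what the abstract describes: the three products map onto fully optimized and parallelized BLAS routines, at the cost of some overestimation of the radii.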
Parallel computation of determinants of matrices with polynomial entries for robust control design
In this paper we consider computing determinants of polynomial matrices symbolically. Determinant computation of matrices with polynomial entries in a small number of variables is of particular interest since it commonly appears in solving engineering ...
Cache friendly sparse matrix-vector multiplication
Sparse matrix-vector multiplication or SpMXV is an important kernel in scientific computing. For example, the conjugate gradient method (CG) is an iterative linear system solving process where multiplication of the coefficient matrix A with a dense ...
Parallelising the computational algebra system GAP
We report on the project of parallelising GAP, a system for computational algebra. Our design aims to make concurrency facilities available for GAP users, while preserving as much of the existing code base (about one million lines of code) with as few ...
Index Terms
- Proceedings of the 4th International Workshop on Parallel and Symbolic Computation