The 10th IEEE International Symposium on
Parallel and Distributed Processing with Applications
Leganés, Madrid, 10-13 July 2012





The Performance Dimensions of PC Servers

CPU clock frequencies are no longer increasing, so the other performance dimensions, such as vectors and cores, must be deployed in an optimal way. In the introduction we define the seven performance dimensions, both inside the core and across cores and nodes. We also cover the basics of the performance monitoring subsystem inside modern CPUs, which must be used to understand the performance issues of a given software program. We explain in detail why data-oriented programming is vital for reaching good performance inside each core. We then define the vocabulary and the issues of multithreaded programming and survey the multithreading software environment: pthreads, OpenMP, Threading Building Blocks and Cilk Plus. Multinode programming is also covered. Practical examples are used throughout the tutorial to reinforce the theoretical teaching of performance-oriented programming. In the final session we sum up our own experience with clusters of PC servers (including accelerators) and give the students a summary of lessons learnt as well as a perspective on the foreseeable future.


Andrzej Nowak is a staff researcher at CERN openlab - a collaboration of CERN and industrial partners HP, Intel, Oracle, Siemens and Huawei. Andrzej's early research concerned operating systems security, mobile systems security, and wireless technologies. Prior to joining openlab, he worked at Intel, where he investigated custom performance optimizations of the Linux kernel and took part in developing one of the first implementations of the IEEE 802.16 "WiMax Mobile" standard. In 2007, Andrzej became a member of the CERN openlab as a Marie Curie Fellow sponsored by the European Commission. His current research is focused on performance tuning, parallelism and modern many-core processor architectures. Another part of Andrzej's activities is related to educational work both within and outside of CERN.


A Hands-On Tutorial With MPI One-Sided Communication

This half-day tutorial covers all major aspects of MPI one-sided communication (also known as MPI RMA). Topics will include the MPI active-target and passive-target communication models, noncontiguous communication using MPI datatypes, MPI's shared-data consistency model, opportunities for performance tuning, and a preview of the recently passed MPI 3.0 extensions to one-sided communication. Topics will be covered with a focus on application use cases and hands-on, actionable examples that will provide users with take-home templates for common one-sided patterns.
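As a flavor of the active-target model covered in the tutorial, here is a minimal MPI-2 sketch (not one of the tutorial's own templates) in which each rank exposes one integer in a window and writes its rank number into its right neighbor's window; the two fences open and close the access epoch. It requires an MPI library and a launcher such as mpiexec to run:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Expose one int per process for remote memory access. */
    int buf = -1;
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Active-target synchronization: fences delimit the epoch. */
    MPI_Win_fence(0, win);
    int me = rank;
    /* Put my rank into the window of my right neighbor. */
    MPI_Put(&me, 1, MPI_INT, (rank + 1) % size, 0, 1, MPI_INT, win);
    MPI_Win_fence(0, win);

    /* After the closing fence, buf holds the left neighbor's rank. */
    printf("rank %d received %d\n", rank, buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

Note that `buf` may only be read after the closing fence: between the two fences its contents are undefined under MPI's consistency model, one of the subtleties the tutorial addresses.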


David Goodell is a software developer at the Mathematics and Computer Science division of Argonne National Laboratory. He primarily works on the MPICH2 project, a widely-portable, high-quality implementation of the MPI standard. His research interests include communications software for parallel programming, system software portability, and lock-free algorithms for parallel programming.


Jim Dinan is the James Wallace Givens postdoctoral fellow at Argonne National Laboratory. He received his Ph.D. from the Department of Computer Science at The Ohio State University. His research interests include parallel programming models, parallel algorithms and applications, runtime systems, dynamic load balancing, computer architecture, and energy-aware computing.


Programming the GPU with CUDA

This tutorial gives a comprehensive introduction to programming the GPU using the Compute Unified Device Architecture (CUDA). CUDA is an architecture and software paradigm designed for general-purpose computing, and hence does not require explicit use of vertices, textures, colors, pixels and the other elements of traditional graphics programming. CUDA was born in late 2006 as a way to program a many-core GPU architecture through extensions to the C language, and it is available to Windows, Linux and Mac OS users. A compiler generates executable code for the GPU, which the CPU sees as a many-core co-processor/accelerator. Since its inception, CUDA has achieved extraordinary speed-up factors in a wide range of grand-challenge applications and has continuously grown in popularity within the High Performance Computing community. CUDA is now taught at more than 500 universities worldwide, and it shares a range of computational interfaces with two competitors: OpenCL, championed by the Khronos Group, and DirectCompute, led by Microsoft. Third-party wrappers are also available for Python, Perl, Java, Fortran, Ruby, Lua, Haskell, MATLAB and IDL.

The tutorial is organized into two parts. First, we describe the CUDA architecture through its hardware generations, up to the Fermi models. Second, we illustrate how to program applications using those resources, transforming typical sequential CPU programs into parallel codes. We emphasize the CUDA thread hierarchy, structured into kernels, grids and blocks, and the CUDA memory hierarchy, decomposed into caches, texture, constant and shared memory, plus a large register file. For the programmer, the CUDA model is a collection of threads running in parallel, any of which can access any memory location; as expected, however, performance improves when threads use closer memories and/or read data collectively in groups. Illustrative examples are used to discuss fundamental building blocks in CUDA, programming tricks, memory optimizations and performance issues on single graphics cards and even multi-GPU systems.


Manuel Ujaldon is an Associate Professor in the Computer Architecture Department at the University of Malaga (Spain) and a Conjoint Senior Lecturer at the School of Electrical Engineering and Computer Science of the University of Newcastle (Australia). He worked in the 1990s on parallelizing compilers, finishing his PhD thesis in 1996 by developing a data-parallel compiler for sparse matrix and irregular applications. Over this period he was a member of the HPF and MPI Forums, and worked as a post-doc in the Computer Science Department of the University of Maryland, College Park. He joined the GPGPU movement early in 2003 using Cg, and wrote the first book in Spanish about programming GPUs for general-purpose computing, in which he described how to map irregular applications and linear algebra algorithms onto GPUs. He adopted CUDA when it was first released, focusing on image processing and biomedical applications, and over the past five years he has authored more than 40 papers in journals and international conferences in these two areas. Dr. Ujaldon has been recognized as an NVIDIA Academic Partner (2008-2011), NVIDIA Teaching Center (2011-2013), NVIDIA Research Center (2012) and, finally, CUDA Fellow (2012). He has taught more than 30 courses on CUDA programming worldwide, including at ACM and IEEE conferences and in academic programs at European, North American and Australian universities.


Parallel I/O in Practice

I/O on HPC systems is a black art. This tutorial sheds light on the state of the art in parallel I/O and provides the knowledge necessary for attendees to make the best use of the I/O resources available to them. We cover the I/O software stack from parallel file systems at the lowest layer, through intermediate layers (such as MPI-IO), to high-level I/O libraries (such as HDF5). We emphasize ways of using these interfaces that result in high performance, and benchmarks on real systems are used throughout to show real-world results.

The tutorial first discusses parallel file systems (PFSs) in detail. We cover general concepts and examine four examples: GPFS, Lustre, PanFS, and PVFS. We then examine the upper layers of the I/O stack, covering POSIX I/O, MPI-IO, Parallel netCDF, and HDF5. We discuss interface features, show code examples, and describe how application calls translate into PFS operations. Finally, we discuss I/O best practices.


Dries Kimpe is Assistant Computer Scientist at Argonne National Laboratory. He received his master's degree from Ghent University (Belgium) and, in 2008, his PhD from KU Leuven (Belgium). Dries Kimpe is also a fellow of the Computation Institute at the University of Chicago. His research interests include parallel file systems, programming models for high performance computing and numerical simulation. His current research focuses on the integration of storage into high performance computing systems (in-system storage), and the development of petascale storage systems.


Copyright ISPA-2012. Created and Maintained by ISPA-2012 Web Team.