
An introduction to the Notur high-performance computing facilities

This page provides information about the resources and services provided by the Notur project. The information is primarily for potential and new users of the national compute facilities.


Contents:

   1. General information
   2. Use of the resources
   3. Proposals, allocations, projects
   4. Access: user account, passwords, secure shell, data transfer
   5. Storage
   6. Execution of applications
   7. Programming
   8. Publications and acknowledgments
   9. Support


1. General information

The Notur project provides the national infrastructure for high-performance computing. The project operates a number of high-end supercomputer facilities. The facilities are available to individuals and groups at the Norwegian universities and colleges, the Meteorological Institute, and any other projects that are funded by the Research Council of Norway (Norges forskningsråd) or the Ministry of Education and Research (Kunnskapsdepartementet).

Detailed information about the Notur project is available on other pages, including:

  • A brief description of the Notur project and its organization
  • A description of the available hardware facilities
  • A list of available software on each of the facilities
  • How to get access to the facilities, including allocations and user accounts
  • Technical support for the facilities, including help-desk assistance, end-user documentation and application support

2. Use of the resources

The compute facilities are operated by the university partners of the Notur project. Each facility is set up as a shared resource: at any time it is used by multiple users, and applications from different users run simultaneously. Even though the facilities are large in terms of the number of processors, memory and storage, their capacity is limited. If one or more of these limits is exceeded, the performance of the overall system may be seriously degraded.

To ensure that all users are satisfied with the services provided, that the facilities are used efficiently, and that they deliver smooth, reliable and robust service to all users, some additional rules, guidelines and recommendations for use of the facilities are necessary. These are described on these pages.

The local regulations for use of the general IT-infrastructure at each university also apply to the Notur facilities operated by that university. The support staff can provide further information on these local regulations if needed.

It is important that the user complies with the rules and regulations of the Notur project and those of the universities. This applies to all types of resources provided by the Notur project, including compute facilities (processors, memory, and processor/node interconnect), disk storage and secondary storage (tape), and the network. If you are in doubt whether your (intended) usage of a facility conforms to the rules and regulations, please contact the local support staff.

3. Proposals, allocations, projects

Access to the compute facilities is granted by application. In the proposal, one applies for an allocation (quota) on one or more facilities. An allocation is a number of processor (CPU) hours. Allocation Units make it possible to aggregate allocations from multiple facilities whose processors differ in speed: the Allocation Units for a facility are processor hours scaled by the speed of that facility's processors.
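As an illustration of how Allocation Units let allocations on different facilities be added up, the following Python sketch multiplies the CPU hours on each facility by a per-facility speed factor and sums the results. The facility names (`facility_a`, `facility_b`) and the speed factors are hypothetical placeholders, not actual Notur values.

```python
# Hypothetical speed factors relative to a reference processor (not real Notur values).
SPEED_FACTOR = {"facility_a": 1.0, "facility_b": 1.5}


def allocation_units(facility: str, cpu_hours: float) -> float:
    """Convert CPU hours on one facility to Allocation Units."""
    return cpu_hours * SPEED_FACTOR[facility]


def total_allocation_units(cpu_hours_per_facility: dict[str, float]) -> float:
    """Aggregate CPU hours on several facilities into a single AU total."""
    return sum(allocation_units(f, h) for f, h in cpu_hours_per_facility.items())


if __name__ == "__main__":
    # 100 000 CPU hours on facility_a plus 40 000 CPU hours on facility_b
    print(total_allocation_units({"facility_a": 100_000, "facility_b": 40_000}))  # 160000.0
```

Because faster processors are weighted more heavily, an hour on `facility_b` counts as 1.5 units in this example, so the two requests can be compared and summed on one scale.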

The procedures and criteria for submitting a proposal for an allocation are described on the Notur web pages. Proposals are evaluated by a Resource Allocation Committee (RFK) appointed by the Research Council of Norway. In short, allocations are granted to researchers and projects that propose activities with a sound scientific objective that can feasibly be carried out with the requested allocation.

Allocations are granted per (allocation) period. An allocation period is a 6-month period, starting April 1 or October 1.
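The rule for allocation periods can be expressed as a small function that returns the period containing a given date, assuming each period runs either from April 1 to September 30 or from October 1 to March 31. The name `allocation_period` is illustrative only.

```python
from datetime import date


def allocation_period(day: date) -> tuple[date, date]:
    """Return (first day, last day) of the allocation period containing `day`.

    Assumes periods run April 1 to September 30 and October 1 to March 31.
    """
    if 4 <= day.month <= 9:
        return date(day.year, 4, 1), date(day.year, 9, 30)
    if day.month >= 10:
        return date(day.year, 10, 1), date(day.year + 1, 3, 31)
    # January to March belong to the period that started the previous October.
    return date(day.year - 1, 10, 1), date(day.year, 3, 31)
```

For instance, `allocation_period(date(2011, 2, 14))` returns the period from 1 October 2010 to 31 March 2011; the only subtle case is January to March, which belongs to the period that began in the previous calendar year.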

Each allocation is tied to a project. The applicant will normally be the project responsible (or manager). The project responsible must inform UNINETT Sigma of any changes to the information provided in the proposal and of any deviations that may lead to usage of the facilities that differs from what was originally planned.

Several users can be attached to one project. Users can request to be added to a project, but the project responsible must approve the request. Users connected to a project share the allocation given to that project. Each user in a project must ensure that the allocation is used only for the activities specified in the project proposal. Failure to comply with these requirements may lead to closure of user accounts or to reduction or withdrawal of project allocations.

Project names are of the form nn****k, where * is a digit [0-9].
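A project name can be checked against the nn****k pattern with a regular expression. This is a minimal sketch assuming names are lowercase and contain exactly four digits, as the pattern suggests; `is_valid_project_name` is a hypothetical helper, not a Notur tool.

```python
import re

# Literal "nn", exactly four digits, literal "k" (the nn****k pattern).
PROJECT_NAME = re.compile(r"nn[0-9]{4}k")


def is_valid_project_name(name: str) -> bool:
    """Return True if the whole string matches the nn****k pattern."""
    return PROJECT_NAME.fullmatch(name) is not None
```

Using `fullmatch` rather than `search` rejects strings that merely contain a valid name, such as `nn1234kx`.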

Each project that receives an allocation is required to submit a Usage Report once a year.

Other regulations with respect to allocations are:

  • Once the allocation for a project on a facility is exhausted during the on-going allocation period, it is no longer possible for that project to execute computational tasks on that facility. In case more resources are needed, the project must apply for extra allocations by using the normal procedures.
  • Moving allocations between Notur facilities (for the same project) is only possible after a request is sent to and approved by the RFK.
  • Moving allocations between different projects (from project A to project B) is in principle not possible. Instead, project B must apply for extra allocations.
  • The Notur project monitors whether allocations are actually being used during the on-going allocation period. If a project is found not to be using its allocation, the project (responsible) will be asked about the expected usage for the remainder of the period. If the project expects to use considerably less than the assigned allocation, or does not respond within a reasonable time, the allocation may be reduced.
  • Unused allocations at the end of the period cannot be moved to a future period.

Each facility has a command-line tool, cost, that gives a brief status of the accounting statistics for a user or project, including the total allocation (quota) and the usage (CPU hours consumed) so far.
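The rule that a project cannot run jobs once its allocation is exhausted can be sketched as a simple check on the quota and usage figures that cost reports. The helpers below are hypothetical: they do not parse the output of cost, and they assume that a job is charged its number of processors multiplied by its wall-clock hours, which may not match the accounting rules on every facility.

```python
def requested_cpu_hours(processors: int, walltime_hours: float) -> float:
    """Assumed charge for a job: processors multiplied by wall-clock hours."""
    return processors * walltime_hours


def remaining_cpu_hours(quota: float, used: float) -> float:
    """CPU hours left in the current allocation period (never negative)."""
    return max(quota - used, 0.0)


def can_submit(quota: float, used: float, processors: int, walltime_hours: float) -> bool:
    """True if the job's assumed charge fits in what is left of the allocation."""
    return requested_cpu_hours(processors, walltime_hours) <= remaining_cpu_hours(quota, used)


if __name__ == "__main__":
    # Hypothetical figures, e.g. copied by hand from the output of cost.
    print(can_submit(quota=1000.0, used=900.0, processors=16, walltime_hours=6.0))  # True: 96 <= 100
```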

4. Access: user account, passwords, secure shell, data transfer

Once an allocation (and a project) has been established, users who will make use of the granted allocation can be added to the project. Users are added by sending an application for a user account to UNINETT Sigma.

Users must inform UNINETT Sigma in case the information that was provided in the application for a user account has changed.

Information about passwords for user accounts is provided on the Notur web pages.

A Secure Shell (SSH) client is the standard, and required, tool for connecting to the facilities. SSH is a network protocol for exchanging data over a secure channel between two computers; encryption provides confidentiality and integrity of the data. SSH uses public-key cryptography to authenticate the remote computer and, if necessary, to let the remote computer authenticate the user. SSH is typically used to log in to a remote machine and execute commands, but it also supports tunneling and the forwarding of arbitrary TCP ports and X11 connections. Further information on how to log in with SSH is available in the Notur documentation.

The facilities are stand-alone systems and do not mount remote file systems. Files can be transferred securely to and from the facilities with the scp or sftp utilities. In a Linux environment, the on-line manual pages provide more information (type 'man scp' or 'man sftp').

5. Storage

Directories. Each user has one private storage area (home directory) on each of the compute facilities where the user has an account. In addition, on most of the facilities the user also has access to one or more shared storage areas (work directories) that can be used by several users simultaneously.

  • The user's home directory (/home) is intended for permanent data only. This includes source code, binary code, scripts, fixed input data and final computed results. There may be a default size limit (quota) on home directories, which may differ between facilities. Large-volume data must not be stored in the home directory for a long time, but must be moved to storage on the user's local machine, or to a Notur archival service (tape storage) if available. Data in the home directory that is not accessed on a regular basis must be compressed. The home directory is often on a slow file system (or may be NFS-mounted). Because of these performance limitations, the home directory must never be used for demanding I/O or large temporary storage during computation.
  • Work directories (/work or /scratch) are intended for intensive I/O and temporary storage during computation. Work directories typically reside on faster disks than home directories and are considerably larger than a single user's home directory. Work directories are shared by several users. To prevent work directories from filling up, the user is required to remove his/her data from these directories once the data is no longer needed. Work directories are purged at regular intervals; see Purging policies below.

Disk quotas are not regulated uniformly across the facilities. In practice, you will have a quota on your home directory (a few gigabytes), but no limit on the work directories. If you need more space in your home directory than your current quota allows, contact the local support staff with a request to increase it.

Backup policies. Data in user home directories is backed up. Earlier versions of files are kept for at least 90 days, and deleted files for at least half a year (182 days). The backup policy may differ between sites; contact the local support staff for details. Data in work directories is not backed up.

Purging policies. In case one of the work file systems is getting full, the system administrator may remove files without prior notice. To make sure there is always sufficient temporary storage available for running (on-going) applications, there may exist special routines for automatic cleanup of work directories. Automatic cleanup normally removes the older files first. In situations with high demand for temporary storage, files may be deleted after just a few days.
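The oldest-first cleanup described above could look roughly like the following Python sketch, which only selects files and deletes nothing. It assumes that a file's age is measured by its last access time; an actual site routine may use modification time, per-user rules or other criteria, and `WorkFile` and `select_for_purge` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class WorkFile:
    path: str
    size_bytes: int
    last_access: float  # seconds since the epoch (assumed age criterion)


def select_for_purge(files: list[WorkFile], bytes_to_free: int) -> list[WorkFile]:
    """Pick files to delete, oldest first, until at least bytes_to_free is reached."""
    selected: list[WorkFile] = []
    freed = 0
    for wf in sorted(files, key=lambda f: f.last_access):
        if freed >= bytes_to_free:
            break
        selected.append(wf)
        freed += wf.size_bytes
    return selected
```

The practical consequence for users is visible in the sorting step: whatever has not been touched for the longest time is the first to go, regardless of how important it is.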

Do not keep important data in work directories for an extended period of time.

Users with special needs for temporary storage on a facility should contact the system administration of the facility before starting the application(s).

Attempts by the user to circumvent the purpose of the work directories and the cleanup routines by using creative techniques will be recorded and the user in question may be denied further access to the resource.

Compression of data. A user is strongly encouraged to compress data as much as possible. The following commands can be used to compress data without loss of information:

  • 'gzip file' creates the compressed file file.gz
  • 'gunzip file.gz' recreates the original uncompressed file
  • In Linux, type 'man gzip' for on-line information
  • 'bzip2 file' creates the compressed file file.bz2
  • 'bunzip2 file.bz2' recreates the original uncompressed file
  • In Linux, type 'man bzip2' for on-line information
As a rule, large files in a user's home directory should always be compressed. Large uncompressed files can be kept in the work directories.
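The same lossless compression can be done from Python with the standard-library `gzip` and `bz2` modules, which is convenient when a workflow already runs in Python. This sketch mimics the command-line tools by writing file.gz or file.bz2 and removing the original; the `compress` and `decompress` helpers are illustrative, and for everyday use the gzip and bzip2 commands listed above are simpler.

```python
import bz2
import gzip
import shutil
import tempfile
from pathlib import Path

# Map each method to (file opener, suffix), mirroring the gzip and bzip2 tools.
METHODS = {"gzip": (gzip.open, ".gz"), "bzip2": (bz2.open, ".bz2")}


def compress(path: Path, method: str = "gzip") -> Path:
    """Write path.gz or path.bz2, then remove the original (like 'gzip file')."""
    opener, suffix = METHODS[method]
    target = path.with_name(path.name + suffix)
    with open(path, "rb") as src, opener(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


def decompress(path: Path) -> Path:
    """Restore the original file and remove the archive (like 'gunzip file.gz')."""
    opener = {suffix: op for op, suffix in METHODS.values()}[path.suffix]
    target = path.with_name(path.name[: -len(path.suffix)])
    with opener(path, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


if __name__ == "__main__":
    original = b"line of simulation output\n" * 10_000
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "results.txt"
        for method in METHODS:
            data.write_bytes(original)
            packed = compress(data, method)
            print(method, packed.name, packed.stat().st_size, "bytes")
            assert decompress(packed).read_bytes() == original  # lossless round trip
```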

Long-term storage. The storage areas on the compute facilities are not meant for long-term storage. A user's data may be deleted once his/her user account is closed. Projects that need to store data sets that must survive shifts in hardware and software technologies, or data sets that must be shared by several groups or communities, must consider applying for allocations on the NorStore facilities.

6. Execution of applications

The operating systems on the Notur facilities are variants of Unix and Linux.

Batch usage. Each Notur compute facility uses a batch system, software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., (batch) jobs, among the available resources on the facility. A user must submit all his/her jobs to the batch system. The batch system uses a scheduler that starts jobs on the facility based on the available resources and the job specifications supplied by the users. A job specification is a file (script) given to the batch system that contains user-supplied parameters such as job priority, maximum allowed run-time, and the number of processors and amount of memory requested, as well as the locations of the application binaries, input data and output data. Submission procedures and choice of parameters vary between the facilities, and the user must become acquainted with the local setup before submitting jobs.

If there are insufficient resources available to execute a job (e.g., because the system is in use by other jobs), execution of the job is postponed and the batch system places the job in a queue. The job remains queued until the resources requested in its batch script become available. The batch system uses sophisticated algorithms to optimize the use of the resources and to ensure fair sharing of the overall resource between all users.
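The queueing behaviour can be illustrated with a deliberately simplified first-come-first-served scheduler that starts jobs only while their requested processors are free. Production batch systems use more sophisticated policies (for example priorities, fair sharing and backfilling), so this Python sketch, with its hypothetical `Job` and `schedule` names, shows only the basic idea of a job waiting for resources.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    processors: int


def schedule(queue: deque[Job], free_processors: int) -> tuple[list[Job], int]:
    """Start queued jobs in order while their requested processors are free.

    Strict first-come-first-served: if the job at the head of the queue does
    not fit, it and every job behind it stay queued.
    Returns the started jobs and the number of processors still free.
    """
    started: list[Job] = []
    while queue and queue[0].processors <= free_processors:
        job = queue.popleft()
        free_processors -= job.processors
        started.append(job)
    return started, free_processors


if __name__ == "__main__":
    queue = deque([Job("a", 32), Job("b", 64), Job("c", 8)])
    started, free = schedule(queue, free_processors=80)
    print([j.name for j in started], free, [j.name for j in queue])  # ['a'] 48 ['b', 'c']
```

In the example, job c would fit in the 48 free processors but waits behind b; avoiding such idle capacity is one reason real schedulers use backfilling.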

Interactive usage. Interactive execution of applications (i.e., execution of applications directly from the command-line) circumvents the batch system. Interactive usage is permitted for administrative tasks, like text editing, data handling, compilation and short test runs for program development.

Interactive usage of the facilities with resource-demanding applications is in principle not allowed; in particular, applications that occupy processors for a long period must not be run interactively. Some of the facilities reserve a small part of the system for interactive use, but even there restrictions may apply to the jobs that can be run, for example limits on run-time or number of processors.

The user must know the local policies for interactive usage before starting resource-demanding interactive applications. In case the user violates these policies, the corresponding jobs may be terminated without prior notice.

In case you are not sure whether your applications can be executed on the facilities in batch mode or (the allowed) interactive modes, please contact the local support staff and UNINETT Sigma.

Software and job requirements. Users are expected to know their software applications as well as the requirements of each job they submit to the facilities. Important properties of application software are scalability and run-time efficiency. Important properties of jobs are expected run-time, memory demands and (temporary) storage demands. The Notur compute facilities are parallel systems, capable of running applications that use many processors simultaneously. However, an application may scale poorly: limitations in the software can mean that using more processors does not reduce the run-time. The number of processors requested for each job must therefore be chosen with care.
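One way to judge how many processors to request is to measure the run-time on 1 and on N processors and compute the speedup and parallel efficiency. The sketch below also includes Amdahl's law, a simple and widely used model (not taken from the Notur documentation) stating that if only a fraction p of the work can run in parallel, the speedup on N processors is at most 1 / ((1 - p) + p / N). The function names are illustrative.

```python
def speedup(t1: float, tn: float) -> float:
    """Measured speedup: run-time on 1 processor divided by run-time on N."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency: speedup divided by the number of processors (1.0 is ideal)."""
    return speedup(t1, tn) / n


def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's law: best-case speedup when only part of the work parallelizes."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

For example, with p = 0.9 the model gives a speedup of about 5.3 on 10 processors and still less than 10 on 1000 processors, so requesting many processors for such an application mostly wastes allocation.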

It is good practice for users to verify that submitted applications behave as expected during run-time, especially applications that consume large resources (run-time, etc.). It is not the responsibility of the system support staff to detect errors made by users. However, the support staff may terminate jobs that misuse resources and interfere with other jobs running on the facilities.

Certain applications require large amounts (gigabytes) of input data or produce large amounts of intermediate and/or output data. Reading or writing large amounts of data from or to disk can increase the overall run-time of an application considerably. Always use the work directories of the facilities when large amounts of data are involved.

Software licensing. The user is not allowed to install or use software on any of the Notur installations if that will violate the licensing conditions that are attached to this software.

Job priorities. All users and jobs are normally treated equally on each of the facilities. If you believe that (some of) your jobs should be executed with higher priority, you must send a request to the support staff of the facility in question. Once a job has been submitted, it may no longer be possible to change its priority, so make such a request as early as possible. If the request concerns many jobs, or jobs that will occupy a significant fraction of the resources, you must supply sufficient detail and justification.

7. Programming

For researchers who are not familiar with multi-processor (parallel) computers, it is important to learn how parallel computers can be used most efficiently. A sequential application (using one processor) will not necessarily execute much faster on a parallel computer than on a local desktop. To benefit from a parallel computer, an application must be able to use several processors simultaneously, distributing the required arithmetic calculations equally over the available processors without changing the result of the overall computation. Ideally, a computation using N processors simultaneously runs approximately N times faster than the same computation on a single processor.
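The idea of distributing the arithmetic equally over N processors without changing the result can be illustrated in Python by splitting a sum into nearly equal chunks, computing the partial sums in separate processes, and adding them up. This is only a sketch of the decomposition idea; applications on the facilities are typically written in compiled languages such as C/C++ or Fortran, and the function names here are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor


def split_evenly(n_items: int, n_workers: int) -> list[range]:
    """Split range(n_items) into n_workers contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_workers)
    chunks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def partial_sum_of_squares(chunk: range) -> int:
    """Work done by one worker: the sum of i*i over its own chunk."""
    return sum(i * i for i in chunk)


def parallel_sum_of_squares(n_items: int, n_workers: int) -> int:
    """Compute the partial sums in separate processes and combine them."""
    chunks = split_evenly(n_items, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))
```

Run from a script, `parallel_sum_of_squares(1_000_000, 4)` placed inside an `if __name__ == "__main__":` block should return the same value as `sum(i * i for i in range(1_000_000))`; the guard is needed because `ProcessPoolExecutor` may start worker processes by re-importing the main module. Integer arithmetic keeps the combined result identical to the sequential one, whereas with floating-point data a different order of additions can change the last digits of the result.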

Programming a parallel computer is in many ways similar to programming your local desktop in a Linux environment. You need software that is written in a standard programming language (e.g., C/C++, Fortran) that can be compiled using one of the compilers that are available on the system. Once you have created a binary that can be executed on the computer and the necessary input data, you can submit a job to the batch system.

This section does not attempt to provide full information on parallelization techniques of computer programs.

8. Publications and acknowledgments

Users are strongly encouraged to acknowledge the use of the Notur facilities in all their publications (journals, magazines, newspapers, etc.). It is also highly appreciated if you can send us copies of publications that acknowledge the Notur project, or send us pointers to any sources in the media where references to the Notur project can be found.

9. Support

Contact the support staff in case you are in doubt about any of the rules and regulations for a specific facility or to get assistance in resolving problems or executing your applications.

Information about the help-desk and advanced user support services provided by the Notur project is available on the Notur web pages.

In case you believe that you have not been given professional or adequate assistance, contact UNINETT Sigma.