2018 Resource Allocations Competition Results

List of Resource Allocation Competitions 2018 Awards

Summary
Computational Resources
– CPU Allocations
– GPU Allocations
– Cloud Allocations
Persistent Disk Storage Allocations
Review Process
Scaling for Compute Requests
Monetary Value of the 2018 Allocations

Summary

Canada’s national advanced research computing (ARC) platform is delivered through the partnership of Compute Canada, regional organizations (WestGrid, Compute Ontario, Calcul Québec and ACENET) and institutions across Canada. Providing researchers with access to the infrastructure and expertise they need to accomplish globally competitive, data-driven, transformative research, it serves the needs of more than 11,000 researchers, including over 3,900 faculty based at Canadian institutions as of January 1, 2018.

Recent investments have enabled a renewal of Canada’s national ARC platform — the incorporation of the new Stage 1 systems, Cedar (SFU), Graham (Waterloo), and Niagara (Toronto), yielded approximately 60PB of new raw storage capacity and 133,552 core years.

However, the dual challenge of the retirement of legacy systems and an ongoing growth in researcher demand for resources meant that demand continued to outstrip supply. The 2018 RAC competition received the highest number of applications in its history with 469 projects applying for an allocation — an increase of 15% over requests made in 2017. Unfortunately, due to the challenges discussed above, this year’s RAC was only able to award 55.1% of the total compute requested, 73% of the total storage requested, and 20.5% of the total GPUs requested.

In general, 80% of resources are reserved for the Resource Allocation Competitions (RAC), leaving 20% for use via the Rapid Access Service (RAS). Those with RAC awards will have a higher priority on the clusters; however, all users have access to modest quantities of compute, storage and cloud resources via RAS as soon as they have a Compute Canada account.

If you have questions about the terminology used in this page, please consult the Technical Glossary.

Table 1: Applications submitted to the Resource Allocation Competitions between 2011 and 2018.

Year  Total  Year-on-year increase
2018  469    14.70%
2017  409    11.70%
2016  366    4.60%
2015  350    20.30%
2014  291    37.90%
2013  211    32.70%
2012  159    17.80%
2011  135
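
The year-on-year figures in Table 1 follow from the application counts alone. As a minimal sketch (the function name `year_on_year_increase` is illustrative and not part of any Compute Canada tooling), the percentages can be reproduced like this:

```python
# Application counts copied from Table 1.
applications = {
    2011: 135, 2012: 159, 2013: 211, 2014: 291,
    2015: 350, 2016: 366, 2017: 409, 2018: 469,
}

def year_on_year_increase(counts):
    """Return {year: percent increase over the previous year}, rounded to 1 decimal."""
    years = sorted(counts)
    return {
        year: round((counts[year] - counts[prev]) / counts[prev] * 100, 1)
        for prev, year in zip(years, years[1:])
    }

increases = year_on_year_increase(applications)
for year in sorted(increases, reverse=True):
    print(f"{year}: {increases[year]:.1f}%")
```

The first year in the series (2011) has no previous year to compare against, so it has no entry in the result.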

Computational Resources

CPU Allocations

Based on available computing resources, 55.1% of CPU (core year) requests were met by RAC 2018. New systems (Cedar, Graham, Niagara), which are faster and have more memory than older systems, provided nearly 75% of the available capacity or approximately 146,000 cores. In addition, CFI made an exception and allowed for a year’s extension of a number of legacy systems to help address the capacity shortfall. This resulted in a modest increase in available cores.

The new systems were allocated at close to 80% capacity, leaving approximately 20% for new users and smaller development projects that did not receive an allocation through the RAC process. As mentioned above, a number of legacy systems, including Briarée, Mp2 and Frontenac, were also allocated on an exceptional basis. These are smaller systems, some with older, less capable CPUs, so their allocation rates were closer to 60% of capacity, leaving room for opportunistic and bursty use.

Table 2: 2018 Compute Allocations per System

System       CPU cores available (100% capacity)*  Cores requested  Cores allocated  % allocated vs. available
Cedar        54,912                                90,272           44,063           80.24%
Graham       28,448                                47,896           22,916           80.55%
Niagara      60,000                                99,724           51,631           86.05%
Briarée      7,560                                 10,220           4,772            63.12%
Mp2          30,984                                35,889           15,218           49.11%
Frontenac    3,500                                 3,956            2,244            64.11%
Orca**       8,880                                 0                4,355            49.04%
MS2**        1,936                                 0                1,514            78.20%
Glooscap**   2,000                                 0                1,577            78.85%
Placencia**  3,200                                 0                2,463            76.97%
Orcinus**    9,600                                 0                7,801            81.26%
Total        211,020                               287,957          158,554          75.14%

 As of April 26, 2018

*This provides the total number of available cores. Generally, approximately 80% of these cores are allocated for RAC, leaving 20% for use through RAS for new users and development projects that did not receive a RAC allocation, as well as for system outages and upgrades.

** These systems were added to the allocation pool, on an exceptional basis, to help make some extra cores available to address capacity shortages.
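
The last column of Table 2 is the ratio of allocated cores to available cores. The sketch below recomputes it for the three new systems and for the overall total, using figures copied from the table. The helper name `allocation_rate` is an assumption made for illustration.

```python
# (available cores, allocated cores) for the three new systems, from Table 2.
new_systems = {
    "Cedar": (54_912, 44_063),
    "Graham": (28_448, 22_916),
    "Niagara": (60_000, 51_631),
}

# "Total" row of Table 2, covering all systems.
TOTAL_AVAILABLE = 211_020
TOTAL_ALLOCATED = 158_554

def allocation_rate(available, allocated):
    """Percentage of available cores that were allocated."""
    return allocated / available * 100

rates = {name: allocation_rate(*cores) for name, cores in new_systems.items()}
overall_rate = allocation_rate(TOTAL_AVAILABLE, TOTAL_ALLOCATED)

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
print(f"All systems: {overall_rate:.2f}%")
```

Rates near 80% match the general policy of reserving roughly 20% of each system for RAS use, outages and upgrades.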

Table 3: Historical Compute Ask vs. Allocation

Year  Supply: allocatable CPU CY  Need: total CY requested  Provided: total CY allocated  Shortfall (CY)  % of demand awarded
2018  211,020                     287,957                   158,632                       129,325         55.10%
2017  182,760                     255,63                    148,10                        107,538         57.90%
2016  155,952                     237,86                    128,46                        109,399         54.00%
2015  161,888                     191,690                   123,699                       67,991          64.50%
2014  190,466                     172,989                   133,508                       39,481          77.20%
2013  187,227                     142,106                   126,677                       15,429          89.10%
2012  189,024                     103,845                   87,312                        16,533          84.10%
2011  132,316                     72,848                    75,471                        -2,623          103.60%
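
In Table 3 the shortfall is the number of requested core years (CY) minus the number allocated, and the final column is the allocated share of the request. A small sketch under that reading, using three rows copied from the table (the function name `shortfall_and_share` is hypothetical):

```python
# (requested core years, allocated core years) copied from Table 3.
compute_history = {
    2018: (287_957, 158_632),
    2015: (191_690, 123_699),
    2014: (172_989, 133_508),
}

def shortfall_and_share(requested, allocated):
    """Return (unmet core years, percent of demand awarded rounded to 1 decimal)."""
    return requested - allocated, round(allocated / requested * 100, 1)

summary = {year: shortfall_and_share(*row) for year, row in compute_history.items()}
for year, (shortfall, share) in summary.items():
    print(f"{year}: shortfall {shortfall:,} CY, {share}% of demand awarded")
```

Note that the shortfall is measured against what was allocated, not against the allocatable supply, so it can exceed the gap between demand and supply.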

 

GPU Allocations

GPU resources were more constrained than CPU resources. As Table 4 shows, requests for GPUs have increased sixfold since 2015. In 2018, nearly 1,000 new GPU devices became available as part of the Graham and Cedar clusters. Unfortunately, some older systems were removed from service at the same time, which meant that the allocation success rate was lower in 2018 at 20.5%, compared with 37.5% in 2017.

Table 4: Historical GPU demand vs. supply (GPU years)

Year  Supply: allocatable GPUs  Need: GPUs requested  Provided: total GPUs allocated  Shortfall (GPUs)  % of need awarded
2018  976                       4,092                 840                             3,252             20.50%
2017  1,420                     2,790                 1,047                           1,743             37.50%
2016  373                       1,357                 269                             1,088             19.80%
2015  482                       608                   300                             308               49.30%
2014  n/a                       420                   308                             112               73.30%
2013  n/a                       390                   259                             131               66.40%
2012  n/a                       10                    10                              0                 100.00%

 

Cloud Allocations

The Arbutus cluster at the University of Victoria has 10,336 allocatable virtual CPUs. These are available via RAC and RAS, and are also used for internal Compute Canada services such as software builds and hosting. A further 36 legacy nodes remain in service as part of Cloud East at l’Université de Sherbrooke (UdeS). Requests for virtual CPUs in RAC 2018 increased by 36%. Between Arbutus and the additional nodes at Cloud East, this year’s RAC was able to allocate 95% of the total virtual CPUs requested. In total, cloud storage was allocated at 94% of its capacity for 2018.

Storage at the University of Victoria will be increased to 6.1 PB early in the RAC 2018 year.

Persistent Disk Storage Allocations

The incorporation of the new Stage 1 systems Cedar, Graham, Arbutus and Niagara yielded approximately 60PB of new raw storage capacity  in 2018. “Nearline” capacity, to relocate data from online (disk) to nearline (tape), is under development and is expected to be available for RAC 2019. Until then, some of the nearline capacity requirements are being met by HPSS and Mammouth Archive.  

As of early 2018, a total of 45.5PB of persistent disk storage was allocated from the 54.6PB of allocatable supply.

Table 5: 2018 Storage Need v. Supply by Storage Type (TB)

Storage type                        Supply (TB)  Need: storage requested (TB)  Provided: storage allocated (TB)  % of demand awarded
Project                             30,100       32,307                        28,737                            89%
dCache                              5,600        7,356                         5,532                             75%
Cloud                               *4,920       5,398                         4,612                             85%
Nearline (HPSS & Mammouth Archive)  14,000       17,503                        6,607                             38%
Total                               54,620       62,564                        45,487                            73%

* This estimate of available cloud storage is dependent on the details of a migration to the new and more efficient erasure coded redundancy configuration.

Review Process

Submissions are evaluated for technical feasibility and scientific excellence. For the 2018 competitions, 469 applications were evaluated. Virtually all RAC applicants request resources to support research programs and highly qualified personnel (HQP) that are already funded through tri-council and other peer-reviewed sources.

Technical Review: carried out by technical staff
  • Technical adjustments are made to ensure requests are compliant with policy and aligned with the technical capabilities of available resources.
Scientific Review: a disciplinary peer review panel evaluates each proposal
  • Each proposal receives multiple independent reviews;
  • Scientific committees meet to discuss the applications;
  • The peer review panel may or may not recommend specific cuts for an application;
  • The peer review panel gives a final science score.

Scaling for Compute Requests

As described above, there were insufficient ARC resources to fully meet the allocations requested through RAC 2018.  

As a result, a scaling function (see chart below) was applied to RAC 2018 so that allocation decisions could be made despite insufficient capacity. This function, which is endorsed by the RAC Chairs committee, was set so that only applications with a science score of 2.0 or higher received an allocation, and applications with a score of 5 received at most 83% of their total request. Applicants who did not receive a compute allocation can still make opportunistic use of system resources via the Rapid Access Service.

Table 6: Scaling Parameters for Compute Allocations

Scaling parameter                                       2018  2017
Minimum Science Score for an allocation                 2.0   2.2
% of CPU request allocated at minimum Science Score     10%   16%
% of CPU request allocated to 4.0 Science Score         61%   72%
% of CPU request allocated to 5.0 Science Score         83%   87.5%
Number of applications below minimum allocatable score  18    55
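
Table 6 gives only three points on the 2018 scaling function. To illustrate how a science score might translate into an awarded share, the sketch below connects those points with straight lines. This linear interpolation is an assumption made purely for illustration. The actual shape of the function between the published points is not stated in the text, and the name `scaled_percent` is hypothetical.

```python
from bisect import bisect_right

# (science score, % of CPU request allocated) from the 2018 column of Table 6.
ANCHORS_2018 = [(2.0, 10.0), (4.0, 61.0), (5.0, 83.0)]

def scaled_percent(score, anchors=ANCHORS_2018):
    """Percent of a CPU request awarded for a given science score.

    ASSUMPTION: values between the published anchor points are linearly
    interpolated; the real RAC function may have a different shape.
    """
    if score < anchors[0][0]:
        return 0.0  # below the minimum score, no allocation is made
    if score >= anchors[-1][0]:
        return anchors[-1][1]  # capped at the value for the top score
    scores = [s for s, _ in anchors]
    i = bisect_right(scores, score) - 1  # segment that contains the score
    (s0, p0), (s1, p1) = anchors[i], anchors[i + 1]
    return p0 + (p1 - p0) * (score - s0) / (s1 - s0)

# Hypothetical request of 1,000 core years with a science score of 4.5.
awarded = 1_000 * scaled_percent(4.5) / 100
print(f"Awarded under the linear assumption: {awarded:.0f} core years")
```

Under this assumption a score of 4.5 falls halfway between the 61% and 83% anchors, so the hypothetical request would be scaled to 72%.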


Monetary Value of the 2018 Allocations

These values represent an average across the national ARC platform’s facilities and include total capital and operational costs incurred to deliver the resources and associated services. These are not commercial or market values. For the 2018 competition, the value of the resources allocated was calculated on a per-year basis using the following rates:

Table 7: Historical Financial Value of RAC Awards

Financial value of award              2018       2017     2016       2015
1 core year                           $156.78    $188.84  $279.00    $275.00
1 GPU year                            $2,960.77  $566.52  $1,100.00  $1,100.00
1 TB of project storage / year        $36.48     $128.00  $173.00    $190.00
1 VCPU year                           $91.05     $40.50   NA         NA
1 TB of cloud storage (Ceph) / year   $236.81    $178.50  NA         NA

 

Costs for CPUs also reflect inclusion of 42,044 legacy CPU cores valued at $279/year each.  Capital costs are not included for legacy cores, only operational costs. The GPU methodology was improved from 2017 to be more accurate.  Some legacy storage and other resources were allocated for 2018, but costs are not differentiated from the new equipment.

With the exception of GPUs, the valuation of each of these resources decreases each year as older, more expensive, resources are retired and replaced with newer, more cost-effective, resources.
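
As a rough illustration of how the 2018 rates in Table 7 translate into the value of a single award, the sketch below multiplies each allocated quantity by its per-year rate and sums the results. The example award is invented for illustration, and the dictionary keys and the function name `award_value` are assumptions rather than terms used by the competition.

```python
from decimal import Decimal

# 2018 per-year rates in dollars, copied from Table 7.
RATES_2018 = {
    "core_year": Decimal("156.78"),
    "gpu_year": Decimal("2960.77"),
    "project_tb_year": Decimal("36.48"),
    "vcpu_year": Decimal("91.05"),
    "cloud_tb_year": Decimal("236.81"),
}

def award_value(award, rates=RATES_2018):
    """Sum quantity x rate over every resource type in the award."""
    return sum((rates[kind] * Decimal(qty) for kind, qty in award.items()), Decimal("0"))

# Hypothetical award: 100 core years, 2 GPU years, 10 TB of project storage.
example_award = {"core_year": 100, "gpu_year": 2, "project_tb_year": 10}
print(f"Estimated 2018 value: ${award_value(example_award):,.2f}")
# prints "Estimated 2018 value: $21,964.34"
```

`Decimal` is used instead of floating point so that the dollar amounts are computed exactly.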
