A Unique Made-in-Canada Solution
The new national data infrastructure provides finer-grained access control, down to specific data items: files, objects, databases, or any of the content they include. The supporting technologies include cloud-based services, object storage software, and traditional file systems.
Researchers and their projects will be able to strictly limit data access to authorized individuals or groups. They will also be able to make selected items public.
New investments, spanning through late 2017, will deploy approximately 62 petabytes (PB) of persistent, available online storage across the four national sites. This will be backed up by a comparable quantity of tape storage. This represents an $18 million investment.
Ongoing investments are targeted to exceed 100 PB in 2018 and 250 PB by 2020.
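To put these targets in proportion, the arithmetic can be sketched in a few lines of Python. This is an illustration only: the variable and function names are invented here, and the cost figure assumes the whole $18 million is attributed to the 62 PB of online storage, even though the investment also covers the tape backup.

```python
# Figures quoted in the text; all names below are illustrative.
INITIAL_PB = 62                 # online storage deployed through late 2017
TARGET_2018_PB = 100            # target to exceed in 2018
TARGET_2020_PB = 250            # target by 2020
INVESTMENT_DOLLARS = 18_000_000


def growth_multiple(start_pb: float, end_pb: float) -> float:
    """Return how many times larger end_pb is than start_pb."""
    return end_pb / start_pb


def cost_per_online_pb(investment: float, online_pb: float) -> float:
    """Rough upper bound on cost per online petabyte.

    Assumption: the entire investment is attributed to online capacity,
    although the text says it also pays for comparable tape storage.
    """
    return investment / online_pb


if __name__ == "__main__":
    print(f"2018 target vs. initial: {growth_multiple(INITIAL_PB, TARGET_2018_PB):.2f}x")
    print(f"2020 target vs. initial: {growth_multiple(INITIAL_PB, TARGET_2020_PB):.2f}x")
    print(f"Cost per online PB (upper bound): ${cost_per_online_pb(INVESTMENT_DOLLARS, INITIAL_PB):,.0f}")
```

Under these assumptions, the 2020 target is roughly four times the initial deployment.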
- It’s cost-effective: Commodity-based storage building blocks are being used to provide capacity.
- It’s resilient: Off-site backups, data replication, and other mechanisms will assure against data loss.
- It’s high performance: The storage building blocks, combined with software-defined storage for all the access modalities, assure rapid access — even over the Internet.
- It’s well-supported: Compute Canada’s experts are here to help you! Support is available by email at firstname.lastname@example.org and on campus at all member institutions.
- It’s scalable: A national data infrastructure needs to expand and scale. Online storage, backups, and all the ways of accessing data will continue to grow over time, to enable Canada’s digital leadership.
Storage requests from last year’s Compute Canada resource allocation process were nearly double the available resources. The community of advanced research computing (ARC) users continues to grow, more than doubling in the past five years.
There will be a storage component for each of the four new national sites in Canada to support a federated national data infrastructure for all Canadian researchers. This approach is possible because of Compute Canada’s national platform and federated model for service delivery. Individual institutions or organizations will have opportunities to deploy storage locally and can federate their local repositories into the national system for a scalable solution to their storage needs.
The national data infrastructure will greatly expand existing capacities on large-scale file systems, which are used to store file-based data. The new national computational systems will deliver a consistent experience across usernames and passwords, access points on systems, and centralized backups. The national helpdesk will provide centralized support for all storage and computational systems, in coordination with local and regional support.
The national data infrastructure will also include object storage systems. Object storage provides a common interface to data objects, regardless of their online locations. This will allow transparent data replication, to increase availability and resiliency. Object storage also provides advanced features for access control, enabling everything from highly limited access to full public access.
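The range from highly limited to fully public access can be pictured with a minimal sketch. Everything here is hypothetical: the `StoredObject` class, its fields, and the `can_read` function are invented for illustration and do not describe the actual object storage software or its API.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional, Set


@dataclass
class StoredObject:
    """Hypothetical record of one data object and who may read it."""
    key: str
    owner: str
    public: bool = False
    allowed_users: Set[str] = field(default_factory=set)
    allowed_groups: Set[str] = field(default_factory=set)


def can_read(obj: StoredObject,
             user: Optional[str] = None,
             groups: Iterable[str] = ()) -> bool:
    """Return True if the (possibly anonymous) requester may read obj."""
    if obj.public:
        return True                      # full public access
    if user is None:
        return False                     # anonymous users see public items only
    if user == obj.owner or user in obj.allowed_users:
        return True                      # access granted to a named individual
    # Access granted through membership in any authorized group.
    return bool(obj.allowed_groups & set(groups))


if __name__ == "__main__":
    private = StoredObject("survey/raw.csv", owner="alice", allowed_groups={"lab-a"})
    shared = StoredObject("survey/summary.pdf", owner="alice", public=True)
    print(can_read(private, "bob", ["lab-a"]))   # group member
    print(can_read(private, "carol", ["lab-b"])) # not authorized
    print(can_read(shared))                      # anonymous, public item
```

The sketch checks the public flag first so that published items never require a login, then falls back to individual and group grants.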
Data that are not under active use will have special consideration in the national data infrastructure. When a researcher needs to retain a dataset as part of an active project but is not using it, the dataset will be moved efficiently to backup tape. This greatly extends the capacity of the online storage. Furthermore, tape storage uses no electricity at rest, so it is a ‘green’ approach to storage.
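One way to picture such a policy is a simple idle-time rule, sketched below. The `Dataset` class, the function names, and the 180-day threshold are all assumptions made for illustration; the text does not state how inactive data will actually be identified.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Dataset:
    """Hypothetical record of a dataset held on online storage."""
    name: str
    size_tb: float
    last_access: datetime


def select_for_tape(datasets: List[Dataset],
                    now: datetime,
                    idle_days: int = 180) -> List[Dataset]:
    """Return datasets not accessed within idle_days (an assumed threshold)."""
    cutoff = now - timedelta(days=idle_days)
    return [d for d in datasets if d.last_access < cutoff]


def online_tb_freed(to_move: List[Dataset]) -> float:
    """Online capacity, in TB, released by moving the given datasets to tape."""
    return sum(d.size_tb for d in to_move)


if __name__ == "__main__":
    now = datetime(2017, 12, 1)
    inventory = [
        Dataset("genomes-2015", 40.0, datetime(2016, 3, 10)),
        Dataset("climate-runs", 25.5, datetime(2017, 11, 20)),
        Dataset("survey-archive", 10.0, datetime(2017, 1, 5)),
    ]
    idle = select_for_tape(inventory, now)
    print([d.name for d in idle], online_tb_freed(idle))
```

A real system would likely also consider project status and researcher requests, not last access time alone.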