Three features distinguish our project: 1) all data will be managed in an integrated environment; 2) the platform will be designed so that it can be reused by other projects with similar needs in different environments; 3) the platform will facilitate the reuse of data for future needs in a way that conforms with the concept of biobanks [i] and the F.A.I.R. Data Principles [ii].
Two distinct types of data must be managed:
- Data produced directly by the medical imaging hardware such as X-ray based scanners, nuclear medicine systems, single-photon emission computed tomography systems (SPECT), positron-emission tomography systems (PET), and in some cases, hybrid systems. The data produced by these systems include the images themselves, and in many cases, radiation dose structured reports (RDSR). As a general rule, these data are provided in the standard DICOM format, which greatly simplifies their collection.
- Data produced by MEDIRAD researchers. These data consist primarily of quantifications of the radiation doses absorbed by tissues following medical procedures, principally:
- Exposure to X-rays.
- Injection of radiopharmaceuticals for PET and SPECT imaging.
- Injection of 131I for the treatment of thyroid cancer.
SYSTEM ARCHITECTURE:
The architecture of the integrated system, referred to as the Imaging and Radiation Dose Biobank (IRDBB), is displayed in Figure 1.
The main components of the architecture are:
1. A web server (IRDBB_UI), which manages the user interface and interaction with users, in particular the upload of images and database queries.
2. The KHEOPS image sharing platform. KHEOPS is used for image storage, to provide a DICOMweb interface to the images for third-party analysis software, and to provide an interface for third-party reporting systems.
3. A FHIR repository is used to manage non-DICOM data.
4. A semantic data manager, referred to as the Semantic Translator, is used to populate the semantic database using information derived from the imported images, including the metadata relevant for dosimetry, as well as other associated data sources. The saved semantic data are instances of classes defined by the OntoMEDIRAD ontology, which was designed to explicitly define all the information that is used by the MEDIRAD project.
5. A semantic database, implemented using the Stardog [iii] RDF triplestore.
6. An identity and access manager (IAM), implemented using Red Hat’s Keycloak [iv] software.
Each component runs in its own Docker container. The containers communicate with one another via RESTful HTTP services.
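As an illustration of the kind of output the Semantic Translator produces, the following minimal sketch serializes a few metadata fields of an imported series as RDF triples in N-Triples syntax. The namespaces, class name, and property names (`ImagingSeries`, `hasModality`, `hasPatient`) are hypothetical placeholders and are not taken from the OntoMEDIRAD ontology.

```python
from urllib.parse import quote

# Hypothetical namespaces and term names, for illustration only;
# the actual OntoMEDIRAD IRIs are not given in the text.
ONTO = "http://example.org/ontomedirad#"
DATA = "http://example.org/irdbb/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Encode a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def series_to_ntriples(meta):
    """Translate a dict of series metadata into a list of N-Triples lines."""
    subject = iri(DATA + "series/" + quote(meta["SeriesInstanceUID"], safe=""))
    patient = iri(DATA + "patient/" + quote(meta["PatientPseudonym"], safe=""))
    triples = [
        (subject, iri(RDF_TYPE), iri(ONTO + "ImagingSeries")),
        (subject, iri(ONTO + "hasModality"), literal(meta["Modality"])),
        (subject, iri(ONTO + "hasPatient"), patient),
    ]
    return ["{} {} {} .".format(s, p, o) for s, p, o in triples]


if __name__ == "__main__":
    example = {
        "SeriesInstanceUID": "1.2.3",
        "PatientPseudonym": "P-001",
        "Modality": "CT",
    }
    print("\n".join(series_to_ntriples(example)))
```

In a deployment, lines of this kind would be sent to the triplestore through its own loading interface, which is not shown here.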
IRDBB can be accessed through any common web browser. Once a user account has been provided by an administrator, users can begin uploading data, or perform requests to the service to access data.
Users must log in with two-factor authentication (2FA), combining a personal password with a time-based one-time password [v] (TOTP). Red Hat’s FreeOTP [vi] or Google’s Authenticator [vii], running on the user’s smartphone, are suggested implementations of the TOTP algorithm. Hardware-based tokens are supported for users who do not have access to a smartphone.
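For readers unfamiliar with TOTP, the following minimal sketch shows how a code is derived from a shared secret and the current time, following RFC 6238 with HMAC-SHA1. It is illustrative only: in IRDBB the verification is handled by Keycloak, and the `verify` helper with its one-step tolerance window is an assumption of this sketch, not a description of Keycloak’s configuration.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (RFC 6238, HMAC-SHA1) for the given Unix time."""
    now = time.time() if at is None else at
    counter = max(0, int(now // step))  # number of elapsed time steps
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, at: Optional[float] = None,
           window: int = 1, step: int = 30) -> bool:
    """Accept a code from the current time step or `window` steps either side."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret, now + k * step, step, len(submitted)), submitted)
        for k in range(-window, window + 1)
    )


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # test secret from RFC 6238
    print(totp(shared_secret))
```

The tolerance window compensates for small clock differences between the server and the user’s device.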
All access to IRDBB and the KHEOPS backend is logged to an Elasticsearch database and can be visualized using the Kibana software [viii].
Data upload and query:
Data upload consists of the following steps: 1) The user drags and drops a directory of DICOM or non-DICOM files onto the upload interface; 2) The user provides the patient pseudonym and selects the MEDIRAD clinical study to which the patient belongs; 3) The imported data are de-identified, using a profile specific to each clinical study, by JavaScript code running within the user’s web browser, which ensures that no identifying information is transmitted beyond the user’s computer; 4) The imported data are indexed and stored in the KHEOPS platform or the FHIR repository, depending on their type; 5) Key data (data type, identifiers, the specific patient, data provenance, etc.) are extracted and interpreted, then translated to RDF and inserted into the semantic database.
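The idea of a per-study de-identification profile (step 3) can be sketched as a table mapping each attribute to an action. The sketch below is written in Python for illustration only; in IRDBB this step is performed by JavaScript in the browser. The profile contents, the action names (`keep`, `remove`, `pseudonym`), and the default of removing unlisted attributes are assumptions of this example, not the actual MEDIRAD profiles.

```python
# Hypothetical profile for one clinical study: attribute keyword -> action.
# The real MEDIRAD profiles are not described in the text.
EXAMPLE_PROFILE = {
    "PatientName": "pseudonym",
    "PatientID": "pseudonym",
    "PatientBirthDate": "remove",
    "InstitutionName": "remove",
    "Modality": "keep",
    "StudyInstanceUID": "keep",
}


def deidentify(attributes, profile, pseudonym, default="remove"):
    """Return a de-identified copy of `attributes` according to `profile`.

    Attributes not listed in the profile receive the `default` action, so
    unknown attributes are dropped unless the profile explicitly keeps them.
    """
    cleaned = {}
    for keyword, value in attributes.items():
        action = profile.get(keyword, default)
        if action == "keep":
            cleaned[keyword] = value
        elif action == "pseudonym":
            cleaned[keyword] = pseudonym
        elif action != "remove":
            raise ValueError("unknown action {!r} for {}".format(action, keyword))
    return cleaned


if __name__ == "__main__":
    original = {
        "PatientName": "DOE^JANE",
        "PatientID": "123456",
        "PatientBirthDate": "19700101",
        "Modality": "CT",
        "StudyInstanceUID": "1.2.3",
        "ReferringPhysicianName": "SMITH^JOHN",
    }
    print(deidentify(original, EXAMPLE_PROFILE, pseudonym="P-001"))
```

Defaulting to removal is the conservative choice: an attribute that the profile author did not anticipate is never transmitted.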
Semantic data are queried through the user’s web browser. The query service offers a list of predefined SPARQL queries. Responses are provided as a CSV file, which can, for example, be imported into an Excel spreadsheet. DICOM series can also be downloaded directly from the response interface.
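One way such a CSV response could be produced is by flattening a result set in the standard SPARQL 1.1 Query Results JSON Format. The sketch below is illustrative and does not describe how the IRDBB query service is actually implemented; the variable names in the sample data are hypothetical.

```python
import csv
import io


def bindings_to_csv(results):
    """Convert a SPARQL 1.1 JSON result set into CSV text.

    One header row holds the variable names; one row follows per solution.
    Variables left unbound in a solution become empty cells.
    """
    variables = results["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(variables)
    for binding in results["results"]["bindings"]:
        writer.writerow([binding.get(var, {}).get("value", "") for var in variables])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical result set with two variables.
    sample = {
        "head": {"vars": ["patient", "modality"]},
        "results": {
            "bindings": [
                {
                    "patient": {"type": "literal", "value": "P-001"},
                    "modality": {"type": "literal", "value": "CT"},
                },
                {"patient": {"type": "literal", "value": "P-002"}},
            ]
        },
    }
    print(bindings_to_csv(sample), end="")
```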
Reporting:
Reporting is provided through the KHEOPS Report Provider interface. Users select the DICOM studies for which reports are to be generated and are then redirected to an external reporting system. The generated report (in both PDF and DICOM SR formats) is then stored and indexed so that its information is available in the IRDBB semantic database.
Cohort Partitioning:
Users can access the imported de-identified DICOM images directly through the KHEOPS image sharing platform. Within KHEOPS, the imported MEDIRAD data is available in a KHEOPS album that contains all of the MEDIRAD images. Users are then able to generate new albums of subsets of images, which can then be shared amongst colleagues.
Online Visualization of DICOM data:
DICOM data can be visualized directly from the KHEOPS platform. Several visualization options are available: the web-based Open Health Imaging Foundation [ix] (OHIF) viewer, the multiplatform Weasis [x] viewer, and the OsiriX [xi] and Horos [xii] viewers on macOS.
Standards Based Access to DICOM Data (DICOMweb):
Users can generate secure tokens on the KHEOPS platform that can be used to access the MEDIRAD images through the standard DICOMweb protocol. By using KHEOPS albums, users can create virtual PACSs that contain the imaging data to be analyzed. This functionality allows users to access DICOM data directly through RESTful HTTP requests from commonly used software such as 3D Slicer [xiii], MATLAB [xiv], Python scripts, etc.
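As an illustration of such a Python script, the sketch below builds a DICOMweb QIDO-RS study search and extracts the Study Instance UIDs from the DICOM JSON response. The base URL is a placeholder, and passing the token as an HTTP Bearer credential is an assumption of this example; the KHEOPS documentation should be consulted for the exact endpoint and token usage.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_qido_request(base_url, token, **filters):
    """Build a QIDO-RS 'search for studies' request (GET {base}/studies)."""
    url = base_url.rstrip("/") + "/studies"
    if filters:
        url += "?" + urlencode(filters)
    return urllib.request.Request(
        url,
        headers={
            "Authorization": "Bearer " + token,  # assumed token scheme
            "Accept": "application/dicom+json",
        },
    )


def study_uids(dicom_json):
    """Extract Study Instance UIDs (tag 0020000D) from a DICOM JSON result list."""
    uids = []
    for study in dicom_json:
        uids.extend(study.get("0020000D", {}).get("Value", []))
    return uids


def search_studies(base_url, token, **filters):
    """Send the request and return the matching Study Instance UIDs."""
    with urllib.request.urlopen(build_qido_request(base_url, token, **filters)) as response:
        body = response.read()
    # An empty body (no matches) is treated as an empty result list.
    return study_uids(json.loads(body)) if body else []


if __name__ == "__main__":
    # Placeholder URL and token; no network request is made here.
    request = build_qido_request("https://kheops.example.org/api", "TOKEN", ModalitiesInStudy="CT")
    print(request.full_url)
```

Because the token is tied to an album, the set of studies returned is limited to that album, which is what makes the album behave like a virtual PACS.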
RESULTS:
The IRDBB system has been designed to cover the many needs of the MEDIRAD project subtasks. The imaging modalities currently supported include CT, PET, SPECT, and NM. Currently, two workflows for non-DICOM data generated from dosimetric calculations have been taken into account.
The first workflow concerns estimated doses of radiation absorbed by organs and tissues as a result of thoracic imaging (in both pediatric and adult populations). The second workflow concerns estimated doses of radiation absorbed by tissues as a result of 131I radiotherapy for thyroid cancer. In this case, the estimation is performed by Monte Carlo simulation, using SPECT imaging to identify the biodistribution of the radiopharmaceutical, together with an estimation of its pharmacokinetics.
For each workflow, an ad hoc data structure was defined in the form of an XML schema. The schema defines the encoding of the data to be stored in the FHIR repository, as well as the origin of the data. For example, a 3D map of absorbed doses is described by information such as its type, filename, and format, a reference to the method used to calculate the absorbed dose, and the name of the software used to perform the analysis. All necessary elements of the vocabulary are described in the OntoMEDIRAD ontology.
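To make the example of the 3D absorbed dose map concrete, the sketch below builds an XML descriptor that carries the fields listed above. The element names and sample values are hypothetical and are not taken from the MEDIRAD XML schemas, which are not reproduced here.

```python
import xml.etree.ElementTree as ET


def dose_map_descriptor(filename, file_format, method, software):
    """Build an XML descriptor for a 3D absorbed dose map.

    Element names are illustrative placeholders, not the MEDIRAD schema.
    """
    root = ET.Element("AbsorbedDoseMap")
    ET.SubElement(root, "Type").text = "3D absorbed dose map"
    ET.SubElement(root, "Filename").text = filename
    ET.SubElement(root, "Format").text = file_format
    ET.SubElement(root, "CalculationMethod").text = method
    ET.SubElement(root, "Software").text = software
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical values for illustration.
    print(dose_map_descriptor(
        filename="dose_map_P-001.nii",
        file_format="NIfTI",
        method="Monte Carlo simulation",
        software="ExampleDoseTool 1.0",
    ))
```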
[i] ESR Position paper on imaging biobank. Insights Imaging 6(4):403-10, (2015) DOI 10.1007/s13244-015-0409-x
[ii] Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016) DOI 10.1038/sdata.2016.18
[iii] https://www.stardog.com
[iv] https://www.keycloak.org
[v] https://tools.ietf.org/html/rfc6238
[vi] https://freeotp.github.io
[vii] https://github.com/google/google-authenticator
[viii] https://www.elastic.co
[ix] http://ohif.org
[x] https://nroduit.github.io
[xi] https://www.osirix-viewer.com
[xii] https://horosproject.org
[xiii] https://www.slicer.org
[xiv] https://www.mathworks.com/products/matlab.html