The Department of Informatics (DataSUS) of the Brazilian Health Ministry hosts anonymized microdata files from several health information systems, covering themes such as mortality, live births, hospitalizations, and communicable diseases.
These files are hosted on a public FTP server, but access to it is geographically restricted to Brazil.
To offer an alternative way to access these files, I created a partial mirror of the FTP server on S3-compatible object storage. This setup allows worldwide access, distribution of files through a CDN, and redundancy in case the DataSUS FTP server fails.
Available health information systems and files
Currently, the following health information systems are mirrored:
- SIM – Sistema de Informações de Mortalidade
- SINASC – Sistema de Informações de Nascidos Vivos
- SINAN – Sistema de Informações de Agravos de Notificação
- SIH – Sistema de Informações Hospitalares do SUS
- SIA – Sistema de Informações Ambulatoriais do SUS
- CNES – Cadastro Nacional de Estabelecimentos de Saúde
File Access
The S3 mirror is available at this endpoint:
https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com
- The S3 mirror uses the same directory structure as the FTP server.
- All files available on the FTP server are mirrored, except expanded XML and CSV files.
Mirror update and files tree
The mirror is synced daily at 3:00 am Brazilian time. On each update, some file lists are produced:
The S3 bucket versioning option was enabled on November 7, 2024. Since then, the version history of all files (including deleted files) has been kept.
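Because versioning is enabled, an older copy of a file can in principle be requested with the standard S3 GetObject `versionId` query parameter. The sketch below only builds such a URL. It assumes (1) that the bucket policy lets anonymous users read older versions, which is not stated here, (2) that you already have a version ID, for example from the S3 ListObjectVersions operation if the bucket permits it, and (3) that the origin hostname should be used rather than the CDN one, since a CDN may not forward query strings. `versioned_url` and `ORIGIN_ENDPOINT` are hypothetical names.

```python
from urllib.parse import quote, urlencode

# Origin (non-CDN) hostname; assumed because a CDN may not pass query
# strings such as versionId through to the bucket.
ORIGIN_ENDPOINT = "https://datasus-ftp-mirror.nyc3.digitaloceanspaces.com"


def versioned_url(key: str, version_id: str, endpoint: str = ORIGIN_ENDPOINT) -> str:
    """Build a GetObject URL that asks S3 for a specific object version.

    Whether anonymous users may read non-current versions depends on the
    bucket policy (an assumption, not confirmed by the mirror's docs).
    """
    path = quote(key.lstrip("/"), safe="/")  # keep "/" separators, encode the rest
    return f"{endpoint}/{path}?{urlencode({'versionId': version_id})}"
```

For example, `versioned_url("SIM/CID10/DORES/DOBA2022.dbc", "<version-id>")` appends `?versionId=...` to the origin URL of the Bahia 2022 SIM file.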
How to access a file?
Look up the desired file in the full path list and append its path to the endpoint URL.
Example: SIM file for Bahia, 2022
https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com/SIM/CID10/DORES/DOBA2022.dbc
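The lookup-and-append step can be scripted. This is a minimal sketch: `mirror_url` is a hypothetical helper, and the only facts it relies on are the endpoint above and the rule that paths mirror the FTP directory tree. It also percent-encodes characters that are not safe in a URL path, in case a path contains spaces.

```python
from urllib.parse import quote

# Endpoint given in the "File Access" section (CDN-backed hostname).
MIRROR_ENDPOINT = "https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com"


def mirror_url(ftp_path: str, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the mirror URL for a path copied from the full path list.

    Leading slashes are stripped so "/SIM/..." and "SIM/..." give the same URL;
    unsafe characters are percent-encoded while "/" separators are kept.
    """
    clean = ftp_path.strip().lstrip("/")
    return f"{endpoint.rstrip('/')}/{quote(clean, safe='/')}"


print(mirror_url("SIM/CID10/DORES/DOBA2022.dbc"))
```

Running it prints the same Bahia 2022 URL shown in the example.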
Update logs
A log of the latest update is provided, and all update logs are stored in the rclone-logs folder.
How to access an old update log?
First, locate the log file name here, then access the file. Example:
https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com/rclone-logs/rclone_datasus_log_2024-11-07_03:04:55.txt
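Since the sync time varies by a few minutes, a log URL cannot be guessed from the date alone; you need the exact name from the listing. Given a list of names copied from that listing, the sketch below picks the logs written on a given day. The name pattern `rclone_datasus_log_YYYY-MM-DD_HH:MM:SS.txt` is inferred from the single example above and is an assumption; `log_timestamp`, `log_urls_for_day`, and `LOG_BASE` are hypothetical names.

```python
from __future__ import annotations

import re
from datetime import date, datetime
from typing import Optional

LOG_BASE = "https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com/rclone-logs"

# Pattern inferred from the example log name (an assumption about all logs).
LOG_NAME = re.compile(
    r"^rclone_datasus_log_(\d{4}-\d{2}-\d{2})_(\d{2}:\d{2}:\d{2})\.txt$"
)


def log_timestamp(name: str) -> Optional[datetime]:
    """Return the timestamp encoded in a log file name, or None if it doesn't match."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    return datetime.strptime(f"{m.group(1)} {m.group(2)}", "%Y-%m-%d %H:%M:%S")


def log_urls_for_day(names: list[str], day: date) -> list[str]:
    """Return URLs of the logs written on `day`, oldest first."""
    stamped = []
    for name in names:
        ts = log_timestamp(name)
        if ts is not None and ts.date() == day:
            stamped.append((ts, name))
    # Colons are left unencoded, matching the example URL above.
    return [f"{LOG_BASE}/{name}" for _, name in sorted(stamped)]
```

Names that do not match the pattern are ignored rather than raising an error, so a pasted listing can contain unrelated entries.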
CDN
The files are cached in a CDN (content delivery network) to increase transfer speeds. This cache is refreshed every hour. To access a file directly, bypassing the CDN, remove "cdn." from the hostname. Example:
https://datasus-ftp-mirror.nyc3.digitaloceanspaces.com/SIM/CID10/DORES/DOBA2022.dbc
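Switching from the CDN address to the direct address only changes the hostname. A minimal sketch of that rewrite, assuming the CDN hostname always contains a separate "cdn" label as in the two examples; `without_cdn` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def without_cdn(url: str) -> str:
    """Turn a CDN URL into the matching origin URL by dropping the "cdn" host label.

    datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com
      -> datasus-ftp-mirror.nyc3.digitaloceanspaces.com
    URLs whose host has no "cdn" label are returned unchanged.
    """
    parts = urlsplit(url)
    origin_host = ".".join(label for label in parts.netloc.split(".") if label != "cdn")
    return urlunsplit(parts._replace(netloc=origin_host))
```

Going direct may be useful right after the daily sync, when the CDN copy can lag behind the bucket for up to an hour.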
Costs
This S3 mirror is free to use, but I pay running costs for storage and data transfer at DigitalOcean. Please use it responsibly.
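One way to keep transfer volume down is to download each file once and reuse a local copy. The sketch below is a suggestion, not part of the mirror's own tooling: `fetch_once`, `cached_path`, and the default `datasus-cache` directory are hypothetical. It writes to a temporary ".part" file and renames it only after the download completes, so an interrupted transfer does not leave a truncated file in the cache.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import quote

MIRROR_ENDPOINT = "https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com"


def cached_path(ftp_path: str, cache_dir: Path) -> Path:
    """Local path that reproduces the mirror's directory layout under cache_dir."""
    return cache_dir.joinpath(*ftp_path.strip("/").split("/"))


def fetch_once(ftp_path: str, cache_dir: Path = Path("datasus-cache")) -> Path:
    """Download a file from the mirror only if it is not already cached locally."""
    target = cached_path(ftp_path, cache_dir)
    if target.exists():
        return target  # reuse the local copy instead of downloading again
    target.parent.mkdir(parents=True, exist_ok=True)
    url = f"{MIRROR_ENDPOINT}/{quote(ftp_path.strip('/'), safe='/')}"
    tmp = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)  # stream to disk without loading it all in memory
    tmp.replace(target)  # move into place only after the download finished
    return target
```

Because the mirror is re-synced daily and DataSUS sometimes updates existing files, a cached copy can go stale; delete it from the cache directory to force a fresh download.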
Script
If you are curious to see how this works, check this code repository.