S3 DataSUS FTP mirror

The Department of Informatics (DataSUS) of the Brazilian Health Ministry hosts microdata anonymized files from several health information systems, covering themes such as mortality, newborns, hospitalization, and transmissible diseases.

Those files are hosted in a public FTP server, but its access is geographically restricted to Brazil.

In order to offer an alternative way to access those files, I created a partial mirror of the FTP server in a S3 object storage architecture. This structure allows worldwide access, CDN distribution of files, and a redudancy in case of failure of the DataSUS FTP server.

Available health information systems and files

Currently, the following health information systems are mirrored:

  • SIM – Sistema de Informações de Mortalidade
  • SINASC – Sistema de Informações de Nascidos Vivos
  • SINAN – Sistema de Informações de Agravos de Notificação
  • SIH – Sistema de Informações Hospitalares do SUS
  • SIA – Sistema de Informações Ambulatoriais do SUS
  • CNES – Cadastro Nacional de Estalecimentos de Saúde

File Access

The S3 mirror is available at this endpoint:

https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com
Note
  • The file structure at the S3 mirror follows the same directory structure of the FTP server.
  • All files available at the FTP are mirrored, except expanded XML and CSV files.

Mirror update and files tree

The mirror is synced daily at 3:00 am Brazilian time. On each update, some file lists are produced:

Tip

The S3 bucket versioning option was enabled on November 7, 2024. Since then, the version history of all files (including deleted files) has been kept.

How to access a file?

Check the desired file name at full path list and append it to the endpoint access.

Example: SIM file for Bahia, 2022

https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com/SIM/CID10/DORES/DOBA2022.dbc

Update logs

A log of the last update is provided and all update logs are stored in the folder rclone-logs

How to access an old update log?

Fist, locate the log file name here, then access the file. Example:

https://datasus-ftp-mirror.nyc3.cdn.digitaloceanspaces.com/rclone-logs/rclone_datasus_log_2024-11-07_03:04:55.txt

CDN

The files are cached in a CDN (content delivery network) to increase transfer speeds. This cache is refreshed every hour. To access directly the file, without the CDN, remove the cdn from the address. Example:

https://datasus-ftp-mirror.nyc3.digitaloceanspaces.com/SIM/CID10/DORES/DOBA2022.dbc

Costs

This S3 mirror is available for free use, but I have running costs for storage and transfer volume at Digital Ocean. Please use it carefully and consciously.

Script

If you are curious to see how this works, check this code repository.

Back to top