Raphael Saldanha
Blog
Categories
All (12)
arrow (1)
climate (2)
clustering (1)
covid19 (1)
data (1)
database (4)
dtwclust (1)
duckdb (3)
epidemiology (1)
package (1)
parquet (3)
rates (1)
rle (1)
sequences (1)
sih (1)
sqlite (2)
tidyrates (2)
time series (1)
warm spell (1)
zonal statistics (1)
Handling 187 million hospital admissions in Brazil with DuckDB
A patient geographical flow study
duckdb
sih
Under ideal circumstances, every hospital admission would take place in the patient's city of residence. This facilitates the travel of the patient and family to the…
Jun 11, 2024
Persistent heat in Brazil: 1993 and 2023
climate
Periods of persistent heat, or heat waves, are defined as sequences of days with temperatures above a reference value. These sequences of days of extreme heat are direct consequences…
Apr 22, 2024
Writing R packages with large datasets
package
data
This post is a guide for those writing R packages that contain datasets, presenting approaches and solutions to some pitfalls.
Apr 11, 2024
Query local parquet files
database
parquet
arrow
After releasing parquet files with zonal statistics of climate indicators for Brazilian municipalities, I received some inquiries about how to query the files in an…
Apr 4, 2024
How hot was 2023 in different Brazilian municipalities?
climate
zonal statistics
According to the latest Copernicus report, 2023 was an exceptionally hot year. The image below, from this report, circulated widely on social media.
Nov 27, 2023
Univariate and multivariate time series clustering
Examples with Brazilian climate data
clustering
time series
dtwclust
In this post, we will try some strategies to cluster univariate and multivariate time series in R with the {dtwclust} package.
Nov 14, 2023
Age-adjusted COVID-19 mortality rates for Brazilian municipalities
covid19
tidyrates
In this post, we will compute crude and age-adjusted COVID-19 mortality rates for Brazilian municipalities, by epidemiological week, from 2020 to 2022.
Nov 7, 2023
Counting consecutive sequences of events: run length encoding and warm spell occurrence example
rle
warm spell
sequences
Some days ago I was trying to count how many times consecutive sequences with values higher than a reference appear in a data frame.
Nov 6, 2023
Crude and adjusted rates in a tidy way
rates
epidemiology
tidyrates
Rates allow counts to be compared across multiple classes with different population sizes. For example, 10 disease cases that occur in 100 population…
Nov 1, 2023
SQLite database conversion to DuckDB and Parquet files
database
sqlite
duckdb
parquet
DuckDB is a relatively new database engine that stores data in a single file, just like SQLite, but is very fast and designed for data science workflows.
Oct 24, 2023
Query remote parquet files with DuckDB
database
duckdb
parquet
DuckDB has a very interesting extension called httpfs that allows querying CSV and Parquet files remotely, including on S3 storage.
Oct 24, 2023
Some tips for working with SQLite databases
database
sqlite
Databases are very useful for handling larger-than-memory datasets, a common problem in Data Science. Several database engines work very well with R and Posit has a nice guide…
Oct 20, 2023