What I Use
Literature Review
- typeset.io - a platform that helps researchers, academics, and authors write, format, and prepare documents for publication.
- scholarcy - The AI-powered article summarizer.
- elicit - AI-based research assistant.
Journal Finders
Journal Finder by Elsevier: This tool allows you to enter your manuscript title and abstract to find journals that are a good match for your research. It also provides information on journal metrics and submission guidelines.
Springer Journal Suggester: Springer’s tool helps you find the right journal for your research by analyzing your abstract or keywords and suggesting journals that publish articles in your field.
PubMed Journal Selector: If your research is in the biomedical or life sciences, PubMed’s Journal Selector can assist you in finding journals that match your keywords and research area.
JANE (Journal/Author Name Estimator): JANE is a free online tool that helps you find journals and authors based on the text of your article’s title and abstract.
JournalGuide: This tool provides a comprehensive database of journals in various fields and allows you to search for journals by keywords or browse by subject area.
Scopus Journal Finder: Scopus, a bibliographic database, offers a journal finder feature that helps you find journals related to your research area based on keywords or article titles.
DOAJ (Directory of Open Access Journals): If you’re interested in open-access journals, DOAJ is a directory of freely available scholarly journals that you can search by subject or keyword.
Scimago Journal & Country Rank: Scimago provides information about journals, including rankings and citation data, which is useful for identifying suitable journals for your research or assessing the impact and prestige of journals in your field.
Data Wrangling
Data Visualization
- ggplot2 for the vast majority of the graphics, together with hrbrthemes for styling.
- patchwork to put graphics together.
- ggraph and igraph for most of the network-related graphics.
- plotly and other html widgets for interactive graphics.
- RColorBrewer, viridis, and colormap to control color in charts.
- ggrepel and other ggplot2 extensions that make your life simpler.
- heatmaply for most of the heatmaps.
Publication-ready Tables
Reproducible Research
- R Markdown to produce statistical reports.
- Quarto to build 95% of the website for my courses and others.
Statistical Modeling
- easystats for easy statistical modeling, visualization, and reporting
Data Science
- NumPy for scientific computing.
- Pandas for data wrangling and analysis
- Matplotlib for data visualization
- Seaborn for advanced statistical visualizations
- Plotly for interactive data visualization
- researchpy to summarize data and perform statistical tests.
- Dask for big data analysis
- scikit-learn for machine learning
- scikit-image for life science image manipulation
Data Resources
The Project Open Data Dashboard gives overview statistics of available government data from various agencies.
Guide to Open Data Publishing & Analytics - A good article describing best practices for publishing data openly. It is also a good read for those who want to analyze others’ data.
A short list of data related R packages - packages that either access data or include data
Some Data Sources
Kaggle Data - A growing number of datasets used in Kaggle data analysis contests and available for any other use.
Nasdaq Data Link - mainly finance related data
NHANES - longstanding and thorough survey done by CDC
SEER - Cancer data
CDC WONDER - list of mainly CDC online databases
Healthy People Website - contains among other things links to various data sources
HCUP - collection of health-related databases, focusing on US-wide and state-specific samples of ER and hospital visits. Not free, but not too expensive.
Clinical Study Data Request - a way to get (tedious) access to clinical trial data
EMA Clinical Data Portal - looks like a way to get access to some clinical trial data for EMA registered studies.
MIMIC - a free and open database of critical care patient visits to a Boston hospital.
Data.gov - federal government data platform.
Analyze Survey Data for Free - Step by Step Instructions to Explore Public Microdata from an Easy to Type Website
Inter-university Consortium for Political and Social Research (ICPSR) - access to various social and behavioral sciences data.
A list hosted by Microsoft with links to various data sources
Infectious Disease Specific
General
- Project Tycho - infectious disease data
- http://www.viprbrc.org
- http://eupathdb.org
- ClinEpiDB - a database of (a few) clinical epidemiology studies, focusing on infectious diseases.
- ImmPort
Influenza
- http://www.fludb.org
- <gisaid.org>
- http://www.cdc.gov/flu/index.htm
- http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html
- http://www.SystemsInfluenza.org
TB
- OTIS on CDC WONDER: http://wonder.cdc.gov/tb.html
Cancer Bioinformatics
Data Sources
The Cancer Genome Atlas (TCGA): TCGA is a comprehensive collection of multi-dimensional cancer genomics data covering multiple cancer types.
International Cancer Genome Consortium (ICGC): ICGC provides high-quality genomic and clinical data from various cancer projects worldwide.
Gene Expression Omnibus (GEO): GEO is a public repository hosted by the National Center for Biotechnology Information (NCBI) containing a vast collection of gene expression data, including cancer datasets.
European Genome-phenome Archive (EGA): EGA is a repository for secure storage and sharing of human genetic and phenotypic data, including cancer datasets.
National Cancer Institute (NCI) Genomic Data Commons (GDC): GDC is an open-access data portal providing access to a wide range of cancer genomics datasets.
cellxgene.cziscience.com - Download and visually explore reference-quality data to understand the functionality of human tissues at the cellular level with Chan Zuckerberg CELL by GENE Discover (CZ CELLxGENE Discover).
10XGenomics - High-performance in situ from the single cell leader
Analysis Tools
UCSC Xena: An online exploration tool for public and private, multi-omic and clinical/phenotype data
GEO2R: GEO2R is an interactive web tool that allows users to compare two or more groups of Samples in a GEO Series in order to identify genes that are differentially expressed across experimental conditions. Results are presented as a table of genes ordered by P-value, and as a collection of graphic plots to help visualize differentially expressed genes and assess data set quality. GEO2R uses a variety of R packages from the Bioconductor project. Bioconductor is an open-source software project based on the R programming language that provides tools for the analysis of high-throughput genomic data.
GEPIA2: GEPIA2 is a web-based tool for analyzing gene expression data in cancer. It stands for Gene Expression Profiling Interactive Analysis 2 and is an updated version of the original GEPIA tool. GEPIA2 allows users to explore gene expression patterns, perform survival analyses, and visualize gene expression data across various cancer types.
TIMER2.0: TIMER is a comprehensive resource for the systematic analysis of immune infiltrates across diverse cancer types. This version of the web server provides immune infiltrate abundances estimated by multiple immune deconvolution methods, and lets users dynamically generate high-quality figures to explore tumor immunological, clinical, and genomic features.
UALCAN: UALCAN is a web-based platform that provides interactive and comprehensive analysis of cancer transcriptome data. It enables users to explore gene expression patterns, perform survival analyses, and compare gene expression between tumor and normal samples across different cancer types. UALCAN utilizes data from The Cancer Genome Atlas (TCGA) to facilitate cancer research and provide insights into tumor biology.
cBioPortal for Cancer Genomics: cBioPortal hosts a large collection of cancer genomics datasets, allowing users to explore and visualize the data.
GREIN (GEO RNA-seq Experiments Interactive Navigator): GREIN is an interactive web platform that provides user-friendly options to explore and analyze GEO RNA-seq data. It is powered by a back-end computational pipeline for uniform processing of RNA-seq data and by a large number (>6,000) of already processed datasets. These datasets were retrieved from GEO and reprocessed consistently by the back-end GEO RNA-seq experiments processing pipeline (GREP2).
OncoLnc: OncoLnc is a web resource that provides survival analysis and expression correlation for genes of interest across multiple cancer datasets.
UCSC Cancer Genomics Browser: The UCSC Cancer Genomics Browser offers a comprehensive collection of cancer genomics data integrated with genomic annotations.
ONCOMINE: ONCOMINE is a powerful web-based platform for the analysis and visualization of cancer transcriptomic data. It provides researchers with access to a vast collection of publicly available gene expression datasets derived from cancer studies. ONCOMINE allows users to explore gene expression patterns, identify potential biomarkers, and compare gene expression between different cancer types or subtypes.
R Packages
TCGAbiolinks: An R/Bioconductor package for integrative analysis with GDC data. TCGAbiolinks can access the National Cancer Institute (NCI) Genomic Data Commons (GDC) through its GDC Application Programming Interface (API) to search, download, and prepare relevant data for analysis in R.
maftools: Summarize, Analyze and Visualize MAF Files. This package attempts to summarize, analyze, annotate and visualize MAF files in an efficient manner from either TCGA sources or any in-house studies as long as the data is in MAF format.
SummarizedExperiment: The SummarizedExperiment container contains one or more assays, each represented by a matrix-like object of numeric or other mode. The rows typically represent genomic ranges of interest and the columns represent samples.
MutationalPatterns: Comprehensive genome-wide analysis of mutational processes. The package covers a wide range of patterns, including mutational signatures, transcriptional and replicative strand bias, lesion segregation, genomic distribution, and association with genomic features, which are collectively meaningful for studying the activity of mutational processes.
GenVisR: Short for “Genomic Visualizations in R,” this tool provides visualization capabilities tailored to a variety of genomic data types, including data common in cancer research such as somatic mutations, copy number variations, and more.
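The container layout described for SummarizedExperiment above (one or more assay matrices, with rows as features and columns as samples) may be easier to picture with a small sketch. The code below is plain Python and is not the Bioconductor API: the class name `MiniExperiment`, its fields, and the toy gene and sample names are all hypothetical, chosen only to show how several assays share the same row and column labels and stay aligned when samples are subset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniExperiment:
    """Toy stand-in for a SummarizedExperiment-style container (hypothetical)."""

    row_names: list[str]  # features, e.g. genes or genomic ranges
    col_names: list[str]  # samples
    assays: dict[str, list[list[float]]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Every assay must be a rows x columns matrix matching the shared labels.
        for name, matrix in self.assays.items():
            if len(matrix) != len(self.row_names):
                raise ValueError(f"assay {name!r}: wrong number of rows")
            if any(len(row) != len(self.col_names) for row in matrix):
                raise ValueError(f"assay {name!r}: wrong number of columns")

    def subset_samples(self, keep: list[str]) -> MiniExperiment:
        """Return a new container restricted to the given samples, in that order."""
        idx = [self.col_names.index(sample) for sample in keep]
        new_assays = {
            name: [[row[j] for j in idx] for row in matrix]
            for name, matrix in self.assays.items()
        }
        return MiniExperiment(list(self.row_names), list(keep), new_assays)


# Toy data (made up): two genes measured in three samples, stored as two assays.
se = MiniExperiment(
    row_names=["geneA", "geneB"],
    col_names=["tumor1", "tumor2", "normal1"],
    assays={
        "counts": [[10, 20, 5], [0, 3, 8]],
        "normalized": [[1.0, 2.0, 0.5], [0.0, 0.3, 0.8]],
    },
)

# Subsetting by sample keeps every assay aligned with the same labels.
tumors = se.subset_samples(["tumor1", "tumor2"])
print(tumors.col_names)
print(tumors.assays["counts"])
```

Keeping the labels and all assays in one object is the point of the design: a single subsetting call cannot leave one matrix out of step with the others.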
Teaching Tools
- Bioicons - a collection of free drawings and diagrams on biological topics, which can be used in teaching (or research) presentations.
- learnr - R package that allows development of interactive web-based R tutorials.
- Feedback at scale - tutorial for using learnr and gradethis as teaching tools.
- Teaching Statistics and Data Science Online - materials for several teacher workshops taught by Mine Çetinkaya-Rundel.