Stats and R
https://statsandr.com/
Recent content on Stats and R
Mon, 16 Dec 2019 00:00:00 +0000
Paper: 'EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number'
https://statsandr.com/blog/paper-epilps-a-fast-and-flexible-bayesian-tool-for-estimation-of-the-time-varying-reproduction-number/
Wed, 19 Oct 2022 00:00:00 +0000
Introduction Motivation Getting started A simulated example Smoothing the epidemic curve and estimating \(\mathcal{R}_t\) USA hospitalization data References
Introduction
A colleague (and friend) of mine recently published a research paper entitled “EpiLPS: A fast and flexible Bayesian tool for estimation of the time-varying reproduction number” in PLoS Computational Biology.
I am not in the habit of sharing research papers to which I did not contribute.
How to keep yourself updated with the latest R news?
https://statsandr.com/blog/how-to-keep-up-to-date-with-the-latest-r-news/
Thu, 13 Oct 2022 00:00:00 +0000
Introduction How do I keep track? Twitter Newsletters Conclusion
Introduction
At the end of one of the training sessions I gave on R, a student asked me the following question:
How do you keep yourself updated with the latest R news?
It is true that R, being open source (meaning that everyone can contribute), is evolving rapidly.
One-sample Wilcoxon test in R
https://statsandr.com/blog/one-sample-wilcoxon-test-in-r/
Thu, 07 Jul 2022 00:00:00 +0000
Introduction When? Data How? Combine statistical test and plot References
Introduction
In a previous article, we showed how to do a two-sample Wilcoxon test in R. Remember that there are actually two versions of this test:
The Mann-Whitney-Wilcoxon test (also referred to as the Wilcoxon rank sum test or Mann-Whitney U test), used to compare two independent samples. This test is the non-parametric version of the Student’s t-test for independent samples.
Koh-Lanta 2022: the ambassadors probability problem
https://statsandr.com/blog/koh-lanta-2022-ambassadors-probability-problem/
Mon, 16 May 2022 00:00:00 +0000
Introduction Before 2022 In 2022 Probabilities computation in R First draw Second draw Third draw Game limited to 3 draws Game limited to 5 draws Game limited to 100 draws Game limited to the number of necessary draws Final winning probabilities Visual representations Coded into a function Conclusion
Introduction
There is a popular TV show broadcast in France and the French-speaking part of Belgium called “Koh-Lanta”.
Paper: 'Semi-Markov modeling for cancer insurance'
https://statsandr.com/blog/paper-semi-markov-modeling-for-cancer-insurance/
Wed, 06 Apr 2022 00:00:00 +0000
I am happy to announce that our paper entitled “Semi-Markov modeling for cancer insurance” has been accepted for publication in the European Actuarial Journal.
Advancements in medicine and biostatistics have already resulted in a better access to insurance for people diagnosed with cancer. This materializes into the “right to be forgotten” adopted in several EU member states, granting access to insurance after a waiting period of at most 10 years starting at the end of the successful therapeutic protocol.
Kruskal-Wallis test, or the nonparametric version of the ANOVA
https://statsandr.com/blog/kruskal-wallis-test-nonparametric-version-anova/
Thu, 24 Mar 2022 00:00:00 +0000
Introduction Data Kruskal-Wallis test Aim and hypotheses Assumptions In R Interpretations Post-hoc tests Dunn test Combination of statistical results and plot Summary References
Introduction
In a previous article, we showed how to do an ANOVA in R to compare three or more groups.
Remember that, as for many statistical tests, the one-way ANOVA requires that some assumptions are satisfied in order to be able to use and interpret the results.
Stats and R is 2 years old!
https://statsandr.com/blog/statsandr-is-2-years-old/
Thu, 16 Dec 2021 00:00:00 +0000
Introduction Analytics Users and page views Page views over time Page views per channel Page views per day of week and month of year Page views per month and year Top performing pages Page views by country User engagement by devices Browser information End note
Introduction
Stats and R was launched exactly two years ago. Like last year, I think it is a good time to do a review of the past 12 months by sharing some figures about the audience of the blog.
What statistical test should I do?
https://statsandr.com/blog/what-statistical-test-should-i-do/
Thu, 02 Dec 2021 00:00:00 +0000
Being a teaching assistant in statistics for students with diverse backgrounds, I have the chance to see what is globally not well understood by students.
I have realized that it is usually not a problem for students to do a specific statistical test when they are told which one to use (as long as they have good resources and they have been attentive during classes, of course). However, it appears that the task is much more difficult for them when they need to choose what test to do.
Multiple linear regression made simple
https://statsandr.com/blog/multiple-linear-regression-made-simple/
Mon, 04 Oct 2021 00:00:00 +0000
Introduction Simple linear regression: reminder Principle Equation Interpretations of coefficients \(\widehat\beta\) Another interpretation of the intercept Significance of the relationship Correlation does not imply causation Conditions of application Visualizations Multiple linear regression Principle Equation Interpretations of coefficients \(\widehat\beta\) Conditions of application How to choose a good linear model? \(P\)-value associated to the model Coefficient of determination \(R^2\) Parsimony Visualizations To go further Print model’s parameters Extract model’s equation Automatic reporting Predictions Linear hypothesis tests Overall effect of categorical variables Interaction Summary References
Introduction
Remember that descriptive statistics is a branch of statistics that allows you to describe the data at hand.
Running pace calculator in R Shiny
https://statsandr.com/blog/running-pace-calculator/
Mon, 15 Mar 2021 00:00:00 +0000
Introduction Running pace calculator How to use it? Code
Introduction
If you are a runner yourself, you are certainly aware of how important preparation is before a race. For the preparation of my first marathon, I used to rely on a training plan.
This running plan was great, but an important piece of information was missing: the running pace. Most of the time, the distance and the time were given, but I needed to figure out the pace myself.
Hypothesis test by hand
https://statsandr.com/blog/hypothesis-test-by-hand/
Wed, 27 Jan 2021 00:00:00 +0000
Descriptive versus inferential statistics Motivations and limitations Hypothesis test Why? When? How? Method A: Comparing the test statistic with the critical value Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Finding the critical value Step #4: Concluding and interpreting the results Why don’t we accept \(H_0\)? Method B: Comparing the p-value with the significance level \(\alpha\) Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Computing the p-value Step #4: Concluding and interpreting the results Method C: Comparing the target parameter with the confidence interval Step #1: Stating the null and alternative hypothesis Step #2: Computing the confidence interval Step #3: Concluding and interpreting the results Which method to choose?
How to track the performance of your blog in R?
https://statsandr.com/blog/track-blog-performance-in-r/
Wed, 16 Dec 2020 00:00:00 +0000
Introduction Prerequisites Analytics Users, page views and sessions Sessions over time Sessions per channel Sessions per day of week Sessions per day and time Sessions per month and year Top performing pages Time-normalized page views Page views by country Browser information User engagement by devices Content Finding topics Content distribution A small note about ads Future plans Thank you note
Introduction
Stats and R was launched on December 16, 2019.
Paper: 'Waiting period from diagnosis for mortgage insurance issued to cancer survivors'
https://statsandr.com/blog/waiting-period-cancer-survivors/
Mon, 23 Nov 2020 00:00:00 +0000
I am happy to announce that our paper entitled “Waiting period from diagnosis for mortgage insurance issued to cancer survivors” has been published in the European Actuarial Journal.
Here is a brief summary of it:
Massart (2018) testimonial illustrates the difficulties faced by patients having survived cancer to access mortgage insurance securing home loan. Data collected by national registries nevertheless suggest that excess mortality due to some types of cancer becomes moderate or even negligible after some waiting period.
ANOVA in R
https://statsandr.com/blog/anova-in-r/
Mon, 12 Oct 2020 00:00:00 +0000
Introduction Data Aim and hypotheses of ANOVA Underlying assumptions of ANOVA Variable type Independence Normality Equality of variances - homogeneity Another method to test normality and homogeneity Outliers ANOVA Preliminary analyses ANOVA in R Interpretations of ANOVA results What’s next? Post-hoc test Issue of multiple testing Post-hoc tests in R and their interpretation Tukey HSD test Dunnett’s test Other p-values adjustment methods Visualization of ANOVA and post-hoc tests on the same plot Summary References
Introduction
ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population means are different.
Why do I have a data science blog? 7 benefits of sharing your code
https://statsandr.com/blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/
Wed, 02 Sep 2020 00:00:00 +0000
#1 Learn by writing #2 Get feedback #3 Personal note to remind my future self #4 Contribute to the open source community #5 Stay humble, stay curious #6 Learn to be less perfectionist and to prioritize #7 Build connections and professional relationships How to start your own blog?
My blog statsandr.com was launched in December 2019. Although 9 months of writing is a very short period compared to others, I can already say that it’s been an incredible and very enriching adventure!
Graphics in R with ggplot2
https://statsandr.com/blog/graphics-in-r-with-ggplot2/
Fri, 21 Aug 2020 00:00:00 +0000
Introduction Data Basic principles of {ggplot2} Create plots with {ggplot2} Scatter plot Line plot Combination of line and points Histogram Density plot Combination of histogram and densities Boxplot Barplot Further personalization Title and axis labels Axis ticks Log transformations Limits Scales for better axis formats Legend Shape, color, size and transparency Text and labels Smooth and regression lines Facets Themes Interactive plot with {plotly} Combine plots with {patchwork} Flip coordinates Save plot Managing dates Highlight data with {gghighlight} Tip To go further
Introduction
R is known to be a really powerful programming language when it comes to graphics and visualizations (in addition to statistics and data science of course!
Mortgage calculator in R Shiny
https://statsandr.com/blog/mortgage-calculator-r-shiny/
Fri, 14 Aug 2020 00:00:00 +0000
Introduction Mortgage calculator How to use the mortgage calculator? Code of the app
Introduction
I recently moved out and bought my first apartment. Of course, I could not pay it entirely with my own savings, so I had to borrow money from the bank. I visited a couple of banks operating in my country and asked for a mortgage.
If you already bought your house or apartment in the past, you know how it goes: the bank analyzes your financial and personal situation and makes an offer based on your propensity to repay the bank.
Outliers detection in R
https://statsandr.com/blog/outliers-detection-in-r/
Tue, 11 Aug 2020 00:00:00 +0000
Introduction Descriptive statistics Minimum and maximum Histogram Boxplot Percentiles Z-scores Hampel filter Statistical tests Grubbs’s test Dixon’s test Rosner’s test Additional remarks References
Introduction
An outlier is a value or an observation that is distant from other observations, that is to say, a data point that differs significantly from other data points. Enderlein (1987) goes even further as the author considers outliers as values that deviate so much from other observations one might suppose a different underlying sampling mechanism.
Wilcoxon test in R: how to compare 2 groups under the non-normality assumption?
https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/
Sun, 07 Jun 2020 00:00:00 +0000
Introduction Two different scenarios Independent samples Paired samples Combination of plot and statistical test Independent samples Paired samples Assumption of equal variances References
Introduction
In a previous article, we showed how to compare two groups under different scenarios using the Student’s t-test. The Student’s t-test requires that the distributions follow a normal distribution in the presence of small samples. In this article, we show how to compare two groups when the normality assumption is violated, using the Wilcoxon test.
How to publish a Shiny app? An example with shinyapps.io
https://statsandr.com/blog/how-to-publish-shiny-app-example-with-shinyapps-io/
Fri, 29 May 2020 00:00:00 +0000
Introduction Prerequisite Step-by-step guide Additional notes Settings of your app Publish your dataset
Introduction
The COVID-19 virus led many people to create interactive apps and dashboards. A reader recently asked me how to publish a Shiny app she just created. Similarly to a previous article where I show how to upload R code on GitHub, I thought it would be useful to some people to see how I publish my Shiny apps so they could do the same.
Correlation coefficient and correlation test in R
https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/
Thu, 28 May 2020 00:00:00 +0000
Introduction Data Correlation coefficient Between two variables Correlation matrix: correlations for all variables Interpretation of a correlation coefficient Visualizations A scatterplot for 2 variables Scatterplots for several pairs of variables Another simple correlation matrix Correlation test For 2 variables For several pairs of variables Combination of correlation coefficients and correlation tests Correlograms Correlation does not imply causation References
Introduction
Correlations between variables play an important role in a descriptive analysis.
How to upload your R code on GitHub? An example with an R script on MacOS
https://statsandr.com/blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/
Sun, 24 May 2020 00:00:00 +0000
Introduction Prerequisite Step-by-step guide Additional notes
Introduction
A few days ago, a colleague asked me how to upload some R code on GitHub in order to make it accessible to everyone. Due to the lockdown, I could not just go into his office and show him on his computer. So I sent him several screenshots showing, step by step, how to do so.
Right before I deleted the screenshots I’d just taken, I thought that perhaps they would be useful for other people, so I wrote this article.
Press
https://statsandr.com/press/
Sun, 24 May 2020 00:00:00 +0000
In the news
Here is a roll-up of press mentions of the blog:
How can we predict the evolution of COVID 19 in Belgium? (UCLouvain: in English & in French)
Evolution of COVID-19 hospital admissions in Belgium (LN24)
Contact
You can contact me here.
Social profiles
Twitter Medium LinkedIn GitHub
COVID-19 in Belgium: is it over yet?
https://statsandr.com/blog/covid-19-in-belgium-is-it-over-yet/
Fri, 22 May 2020 00:00:00 +0000
Introduction New hospital admissions Overall By period Zooming in Patients in hospitals Patients in intensive care Confirmed cases By province By age group and sex Static Dynamic By age group, sex and province
Introduction
Note 1: The present article was written on May 22, 2020 and has been updated infrequently. The current situation regarding COVID-19 in Belgium may therefore be different from what is presented below.
One-proportion and chi-square goodness of fit test
https://statsandr.com/blog/one-proportion-and-goodness-of-fit-test-in-r-and-by-hand/
Wed, 13 May 2020 00:00:00 +0000
Introduction In R Data One-proportion test Assumption of prop.test() and binom.test() Chi-square goodness of fit test Assumptions Does my distribution follow a given distribution? Observed frequencies Expected frequencies Observed vs. expected frequencies By hand One-proportion test Verification in R Goodness of fit test Verification in R
Introduction
In a previous article, I presented the Chi-square test of independence in R which is used to test the independence between two categorical variables.
A package to download free Springer books during Covid-19 quarantine
https://statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/
Sun, 26 Apr 2020 00:00:00 +0000
Update Introduction Installation Download all books at once Create a table of Springer books Download only specific books By title By author By subject Improvements Acknowledgments
Update
The promotion has ended so it is no longer possible to download the books through R. If you did not download the books in time, you can still have access to them via this link.
COVID-19 in Belgium
https://statsandr.com/blog/covid-19-in-belgium/
Tue, 31 Mar 2020 00:00:00 +0000
Introduction Top R resources on Coronavirus Coronavirus dashboard for your own country Motivations, limitations and structure of the article Analysis of Coronavirus in Belgium A classic epidemiological model: the SIR model Fitting a SIR model to the Belgium data Reproduction number \(R_0\) Using our model to analyze the outbreak if there was no intervention More summary statistics Additional considerations Ascertainment rates More sophisticated models Modelling the epidemic trajectory using log-linear models Estimating changes in the effective reproduction number \(R_e\) More sophisticated projections Conclusion References
Introduction
The Novel COVID-19 Coronavirus is still spreading quickly in several countries and it does not seem like it is going to stop anytime soon as the peak has not yet been reached in many countries.
How to create a simple Coronavirus dashboard specific to your country in R?
https://statsandr.com/blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/
Mon, 23 Mar 2020 00:00:00 +0000
Introduction Top R resources on Coronavirus Coronavirus dashboard: the case of Belgium How to create your own Coronavirus dashboard Additional notes Data Open source Accuracy Publish your dashboard
Coronavirus dashboard: the case of Belgium
Introduction
The Novel COVID-19 Coronavirus is the hottest topic right now. Every day, the media and newspapers share the number of new cases and deaths in several countries, try to measure the impacts of the virus on citizens and remind us to stay home in order to stay safe.
How to do a t-test or ANOVA for more than one variable at once in R?
https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/
Thu, 19 Mar 2020 00:00:00 +0000
Introduction Perform multiple tests at once Concise and easily interpretable results T-test Additional p-value adjustment methods ANOVA To go even further Update with the {ggstatsplot} package References
Introduction
As part of my teaching assistant position in a Belgian university, students often ask me for some help in their statistical analyses for their master’s thesis.
A frequent question is how to compare groups of patients in terms of several quantitative continuous variables.
Top 100 R resources on COVID-19 Coronavirus
https://statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/
Thu, 12 Mar 2020 00:00:00 +0000
R Shiny apps and dashboards Coronavirus tracker Coronavirus dashboard from the {coronavirus} package Visualization of Covid-19 Cases Modeling COVID-19 Spread vs Healthcare Capacity COVID-19 Data Visualization Platform Coronavirus 10-day forecast Coronavirus (COVID-19) across the world COVID-19 outbreak Flatten the Curve Explore the spread of Covid-19 Governments and COVID-19 Simulating COVID-19 Epidemic in Togo - West Africa Covid-19 Prediction Covid-19 Dashboard Healthcare worker deaths from novel Coronavirus (COVID-19) in the US Covid-19 Hospitalizations in Belgium COVIDMINDER: Where you live matters!
How to perform a one-sample t-test by hand and in R: test on one mean
https://statsandr.com/blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/
Mon, 09 Mar 2020 00:00:00 +0000
Introduction Null and alternative hypothesis Hypothesis testing Two versions of the one-sample t-test How to compute the one-sample t-test by hand? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Different underlying distributions for the critical value How to compute the one-sample t-test in R? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Confidence interval Combination of plot and statistical test Scenario 2: variance of the population is unknown Assumptions References
Introduction
After having written an article on the Student’s t-test for two samples (independent and paired samples), I believe it is time to explain in detail how to perform one-sample t-tests by hand and in R.
The 9 concepts and formulas in probability that every data scientist should know
https://statsandr.com/blog/the-9-concepts-and-formulas-in-probability-that-every-data-scientist-should-know/
Tue, 03 Mar 2020 00:00:00 +0000
What is probability? 1. A probability is always between 0 and 1 2. Compute a probability 3. Complement of an event 4. Union of two events 5. Intersection of two events 6. Independence of two events 7. Conditional probability Bayes’ theorem Example 8. Accuracy measures False negatives False positives Sensitivity Specificity Positive predictive value Negative predictive value 9. Counting techniques Multiplication Example Permutation Example By hand In R Combination Example By hand In R
What is probability?
FAQ - Frequently asked questions
https://statsandr.com/faq/
Mon, 02 Mar 2020 00:00:00 +0000
Who is behind this blog? What is your background? Why did you launch this blog? What technology and theme do you use to write this blog and the articles? I am new to this blog, to R or to statistics, where can I start? Can I reuse or translate the content of your blog? I would like to replicate an analysis you have done in one of your articles, can I have access to the entire code?
Student's t-test in R and by hand: how to compare two groups under different scenarios?
https://statsandr.com/blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/
Fri, 28 Feb 2020 00:00:00 +0000
Introduction Null and alternative hypothesis Hypothesis testing Different versions of the Student’s t-test How to compute Student’s t-test by hand? Scenario 1: Independent samples with 2 known variances Scenario 2: Independent samples with 2 equal but unknown variances Scenario 3: Independent samples with 2 unequal and unknown variances Scenario 4: Paired samples where the variance of the differences is known Scenario 5: Paired samples where the variance of the differences is unknown How to compute Student’s t-test in R?
Correlogram in R: how to highlight the most correlated variables in a dataset
https://statsandr.com/blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/
Sat, 22 Feb 2020 00:00:00 +0000
Introduction Correlation matrix Correlogram Correlation test Code {ggstatsplot} package {lares} package All possible correlations Correlation of one variable against all others References
Introduction
Correlation, often computed as part of descriptive statistics, is a statistical tool used to study the relationship between two variables, that is, whether and how strongly pairs of variables are associated.
Correlations are measured between 2 variables at a time. Therefore, for datasets with many variables, computing correlations can become quite cumbersome and time consuming.
Getting started in R markdown
https://statsandr.com/blog/getting-started-in-r-markdown/
Tue, 18 Feb 2020 00:00:00 +0000
R Markdown: what, why and how? Before you start Components of a .Rmd file YAML header Code chunks Text Code inside text Highlight text like it is code Images Tables Additional notes and useful resources
If you have spent some time writing code in R, you probably have heard of generating dynamic reports incorporating R code, R outputs (results) and text or comments.
The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R
https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/
Thu, 13 Feb 2020 00:00:00 +0000
What is clustering analysis? Application 1: Computing distances Solution k-means clustering Application 2: k-means clustering Data kmeans() with 2 groups Quality of a k-means partition nstart for several initial centers and better stability kmeans() with 3 groups Optimal number of clusters Elbow method Silhouette method Gap statistic method Consensus-based algorithm Visualizations Manual application and verification in R Solution by hand Solution in R Hierarchical clustering Application 3: hierarchical clustering Data Solution by hand Single linkage Complete linkage Average linkage Solution in R Single linkage Optimal number of clusters Complete linkage Average linkage k-means versus hierarchical clustering What’s next?
Contribute
https://statsandr.com/contribute/
Sat, 08 Feb 2020 00:00:00 +0000
Stats and R welcomes guest posts that provide unique insight into statistics and R.
How can you contribute?
If you want to contribute and write for statsandr.com, please submit your article through this contribution form.
Once your guest post is received, I will review it and inform you about the decision (i.e., accepted, rejected, or accepted with minor changes).
Submission rules and guidelines
Before submitting your article, please read the following points:
An efficient way to install and load R packages
https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/
Fri, 31 Jan 2020 00:00:00 +0000
What is an R package and how to use it? Inefficient way to install and load R packages More efficient way Most efficient way {pacman} package {librarian} package
What is an R package and how to use it?
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions, which are collections of functions and datasets developed and published by R users, are called packages.
Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R
https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/
Wed, 29 Jan 2020 00:00:00 +0000
What is a normal distribution? Empirical rule Parameters Probabilities and standard normal distribution Areas under the normal distribution in R and by hand Ex. 1 In R By hand Ex. 2 In R By hand Ex. 3 In R By hand Ex. 4 In R By hand Ex. 5 Why is the normal distribution so crucial in statistics?
Fisher's exact test in R: independence test for a small sample
https://statsandr.com/blog/fisher-s-exact-test-in-r-independence-test-for-a-small-sample/
Tue, 28 Jan 2020 00:00:00 +0000
Introduction Hypotheses Example Data Observed frequencies Expected frequencies Fisher’s exact test in R Conclusion and interpretation Combination of plot and statistical test References
Introduction
After presenting the Chi-square test of independence by hand and in R, this article focuses on the Fisher’s exact test.
Independence tests are used to determine if there is a significant relationship between two categorical variables. There exist two different types of independence tests:
Chi-square test of independence by hand
https://statsandr.com/blog/chi-square-test-of-independence-by-hand/
Mon, 27 Jan 2020 00:00:00 +0000
Introduction Hypotheses How the test works? Example Observed frequencies Expected frequencies Test statistic Critical value Conclusion and interpretation
Introduction
Chi-square tests of independence test whether two qualitative variables are independent, that is, whether there exists a relationship between two categorical variables. In other words, this test is used to determine whether the values of one of the 2 qualitative variables depend on the values of the other qualitative variable.
Chi-square test of independence in R
https://statsandr.com/blog/chi-square-test-of-independence-in-r/
Mon, 27 Jan 2020 00:00:00 +0000https://statsandr.com/blog/chi-square-test-of-independence-in-r/Introduction Data Chi-square test of independence in R Conclusion and interpretation Combination of plot and statistical test Introduction This article explains how to perform the Chi-square test of independence in R and how to interpret its results. To learn more about how the test works and how to do it by hand, I invite you to read the article “Chi-square test of independence by hand”.
To briefly recap what has been said in that article, the Chi-square test of independence tests whether there is a relationship between two categorical variables.How to create a timeline of your CV in R?
https://statsandr.com/blog/how-to-create-a-timeline-of-your-cv-in-r/
Sun, 26 Jan 2020 00:00:00 +0000https://statsandr.com/blog/how-to-create-a-timeline-of-your-cv-in-r/Introduction Minimal reproducible example How to personalize it Additional note Introduction In this article, I show how to create a timeline of your CV in R. A CV timeline illustrates key information about your education, work experiences and extra activities. The main advantage of CV timelines compared to a regular CV is that they make you stand out immediately by being visually appealing and easier to scan.RStudio addins, or how to make your coding life easier?
https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/
Sun, 26 Jan 2020 00:00:00 +0000https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/What are RStudio addins? Installation Addins Esquisse ggThemeAssist Questionr Recoding factors Reordering factors Categorize a numeric variable Remedy Styler Snakecaser ViewPipeSteps Ymlthis Reprex Blogdown What are RStudio addins? Although I have been using RStudio for several years, I only recently discovered RStudio addins. Since then, I have been using these addins almost every time I use RStudio.
What are RStudio addins?Descriptive statistics in R
https://statsandr.com/blog/descriptive-statistics-in-r/
Wed, 22 Jan 2020 00:00:00 +0000https://statsandr.com/blog/descriptive-statistics-in-r/Introduction Data Minimum and maximum Range Mean Median First and third quartile Other quantiles Interquartile range Standard deviation and variance Summary Coefficient of variation Mode Correlation Contingency table Mosaic plot Barplot Histogram Boxplot Dotplot Scatterplot Line plot QQ-plot For a single variable By groups Density plot Correlation plot Advanced descriptive statistics {summarytools} package Frequency tables with freq() Cross-tabulations with ctable() Descriptive statistics with descr() Data frame summaries with dfSummary() describeBy() from the {psych} package aggregate() function summaryBy() from {doBy} group_by() and summarise() from {dplyr} Introduction This article explains how to compute the main descriptive statistics in R and how to present them graphically.Support my work
https://statsandr.com/support/
Tue, 21 Jan 2020 00:00:00 +0000https://statsandr.com/support/On this blog, I share my knowledge in the form of free articles and tutorials about statistics and R. My goal with the blog is to help people to understand statistical concepts (through examples and in plain English), and to apply them in R. When possible, I also contribute to open source projects on GitHub.
All the articles, Shiny apps and code are open source and available to everyone (code available directly in the articles or on GitHub).Tips and tricks in RStudio and R Markdown
https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/
Tue, 21 Jan 2020 00:00:00 +0000https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/Run code Insert a comment in R and R Markdown Knit a R Markdown document Code snippets Ordered list in R Markdown New code chunk in R Markdown Reformat code RStudio addins {pander} and {report} for aesthetics Extract equation model with {equatiomatic} Print model’s parameters Pipe operator %>% Others If you have the chance to work with an experienced programmer, you may be amazed by how fast she can write code.Descriptive statistics by hand
https://statsandr.com/blog/descriptive-statistics-by-hand/
Sat, 18 Jan 2020 00:00:00 +0000https://statsandr.com/blog/descriptive-statistics-by-hand/Introduction Location versus dispersion measures Location Minimum and maximum Mean Median Odd number of observations Even number of observations Mean vs. median \(1^{st}\) and \(3^{rd}\) quartiles \(q_{0.25}\), \(q_{0.75}\) and \(q_{0.5}\) A note on deciles and percentiles Mode Quantitative variables Qualitative variables Dispersion Range Standard deviation Standard deviation for a population Standard deviation for a sample Variance Variance for a population Variance for a sample Standard deviation vs.What is the difference between population and sample?
https://statsandr.com/blog/what-is-the-difference-between-population-and-sample/
Sat, 18 Jan 2020 00:00:00 +0000https://statsandr.com/blog/what-is-the-difference-between-population-and-sample/Introduction Sample vs. population Why a sample? Representative sample Paired samples Conclusion Introduction People often fail to properly distinguish between population and sample. This distinction is, however, essential in any statistical analysis, starting with descriptive statistics, where the formulas for variance and standard deviation differ depending on whether we face a sample or a population.
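To make the difference between the two formulas concrete, here is a small sketch on made-up observations. Base R's `var()` uses the sample formula (denominator \(n - 1\)); the population version (denominator \(n\)) is obtained here by rescaling, which is only one of several ways to compute it.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9) # made-up observations, for illustration only
n <- length(x)

# var() in base R uses the sample formula (denominator n - 1)
var_sample <- var(x)

# Population variance (denominator n), obtained by rescaling the sample variance
var_population <- var_sample * (n - 1) / n

c(sample = var_sample, population = var_population)
```

The sample variance is always slightly larger than the population variance computed on the same values, and the gap shrinks as \(n\) grows.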
Moreover, the branch of statistics called inferential statistics is often defined as the science of drawing conclusions about a population from observations made on a representative sample of that population.A Shiny app for inferential statistics by hand
https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/
Wed, 15 Jan 2020 00:00:00 +0000https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/A Shiny app for inferential statistics: hypothesis tests and confidence intervals
Statistics is divided into four main branches:
Descriptive statistics Inferential statistics Predictive analysis Exploratory analysis Descriptive statistics provide a summary of the data; they help explain the data in a concise way without losing too much information. Data can be summarized numerically or graphically. See descriptive statistics by hand or in R to learn more about this branch of statistics.A Shiny app for simple linear regression by hand and in R
https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/
Wed, 15 Jan 2020 00:00:00 +0000https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/A Shiny app to perform simple linear regression (by hand and in R)
Simple linear regression is a statistical method to summarize and study relationships between two variables. When more than two variables are of interest, it is referred to as multiple linear regression. See this article on linear regression for more details.
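For readers who want to see both routes side by side outside the app, the following sketch fits a simple linear regression with `lm()` and then recomputes the slope and intercept from the usual least-squares formulas. It uses R's built-in `cars` dataset, which is my choice for illustration and not necessarily the data used in the app.

```r
# Built-in cars dataset: stopping distance (dist) as a function of speed
fit <- lm(dist ~ speed, data = cars)
coef(fit) # intercept and slope estimated by lm()

# The same estimates computed "by hand" from the least-squares formulas
slope <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept <- mean(cars$dist) - slope * mean(cars$speed)

c(intercept = intercept, slope = slope)
```

Both approaches return the same two numbers, which is a useful sanity check when learning the formulas.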
In this article, we focus only on a Shiny app which allows you to perform simple linear regression by hand and in R:World map of visited countries in R
https://statsandr.com/blog/world-map-of-visited-countries-in-r/
Thu, 09 Jan 2020 00:00:00 +0000https://statsandr.com/blog/world-map-of-visited-countries-in-r/If, like me, you like traveling as much as R, you might want to draw a world map in R of the countries you have visited. Below is an example with the countries I have visited as of January 2020:A practical guide on optimal asset allocation
https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/A Shiny app with an example of optimal asset allocation
In his book A Random Walk Down Wall Street, Burton G. Malkiel recommends an optimal asset allocation depending on age. As an amateur investor, I thought it would be useful to develop a Shiny app which depicts his advice for other interested investors. Here is the link to the app:
Optimal asset allocation How to use this app?Draw a word cloud with a R Shiny app
https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/Word cloud in a Shiny app
Below is a Shiny app to help you draw a word cloud:
Word cloud Word clouds are particularly useful as part of text mining analyses. They are also useful for analyzing string and character variables in any dataset (see the different data types in R).
How to use this app? In Word source you can see two examples of word clouds with preloaded texts.How to embed a Shiny app in blogdown?
https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/If you have developed and deployed a Shiny app and would like to embed it in blogdown, follow these steps:
create a new post as usual add output: html_document if it is not already included in the YAML metadata insert the following HTML code in the body of the post: <iframe height="800" width="100%" frameborder="no" src="https://antoinesoetewey.shinyapps.io/statistics-201/"> </iframe> You should replace the URL with the URL of your deployed Shiny app (after src=, do not forget that the URL should start with http:// or https:// and should be surrounded by "A guide on how to read statistical tables
https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/
Mon, 06 Jan 2020 00:00:00 +0000https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/Shiny app to compute probabilities for the main probability distributions
Below is a Shiny app to help you read the main statistical tables:
Statistics-101 This Shiny app helps you to compute probabilities for the main probability distributions.
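The same kind of probabilities can also be obtained directly in the R console. The sketch below uses base R's `pnorm()` and `pbinom()`; the distributions, parameters and values of x are arbitrary choices for illustration and do not reproduce the app's own code.

```r
# Standard normal distribution, x = 1.96
p_lower <- pnorm(1.96)                        # lower tail: P(X <= 1.96)
p_upper <- pnorm(1.96, lower.tail = FALSE)    # upper tail: P(X > 1.96)
p_interval <- pnorm(1.96) - pnorm(-1.96)      # interval: P(-1.96 <= X <= 1.96)

# Binomial distribution with n = 10 trials and success probability 0.5
p_binom <- pbinom(3, size = 10, prob = 0.5)   # lower tail: P(X <= 3)

c(lower = p_lower, upper = p_upper, interval = p_interval, binomial = p_binom)
```

Other distributions follow the same naming pattern in base R (for instance `pt()` for the Student distribution and `pchisq()` for the Chi-square distribution).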
How to use this app? Open the app via this link Choose the distribution Set the parameter(s) of the distribution (the parameters depend of course on the chosen distribution) Select whether you want to find the lower tail, upper tail or an interval Choose the value of x On the right panel (or below depending on the size of your screen) you will see:Newsletter
https://statsandr.com/subscribe/
Tue, 31 Dec 2019 00:00:00 +0000https://statsandr.com/subscribe/By subscribing to this newsletter you will be notified each time a new article is published. You can unsubscribe at any time and your email address will never be shared.
Data types in R
https://statsandr.com/blog/data-types-in-r/
Mon, 30 Dec 2019 00:00:00 +0000https://statsandr.com/blog/data-types-in-r/What data types exist in R? Numeric Integer Character Factor Logical This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.
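As a quick illustration, the type of an object can be checked with `class()`. The values below are arbitrary examples chosen for this sketch, one per data type.

```r
# One arbitrary example value per data type
examples <- list(
  numeric   = 3.14,
  integer   = 3L,            # the L suffix makes the value an integer
  complex   = 1 + 2i,
  character = "hello",
  factor    = factor(c("a", "b", "a")),
  logical   = TRUE
)

# class() reports the data type of each object
types <- sapply(examples, class)
types
```

Each element of `types` matches the name given to the corresponding example.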
What data types exist in R? The 6 most common data types in R are:
Numeric Integer Complex Character Factor Logical Datasets in R are often a combination of these 6 different data types.Variable types and examples
https://statsandr.com/blog/variable-types-and-examples/
Mon, 30 Dec 2019 00:00:00 +0000https://statsandr.com/blog/variable-types-and-examples/Introduction Different types of variables for different types of statistical analysis Big picture Quantitative Discrete Continuous Qualitative Nominal Ordinal Variable transformations From continuous to discrete From quantitative to qualitative Additional notes Misleading data encoding Introduction If you happen to work with datasets frequently, you probably know that each row of your dataset represents a different experimental unit (also called observation) and each column represents a different characteristic (called variable):How to create an interactive booklist with automatic Amazon affiliate links in R?
https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/
Thu, 26 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/Introduction Requirements Create a booklist Create it in Excel then import it Create it directly in R Make it interactive Add URLs with your affiliate link to the table Extract affiliate link Append the book title and author to make it automatic Add links to the interactive table Final result Introduction Booklists are a useful way to share the books you have read and which you recommend to other readers and/or to promote the books you have written.Terms and policies
https://statsandr.com/terms/
Wed, 25 Dec 2019 00:00:00 +0000https://statsandr.com/terms/This is my personal blog written and edited by me (Antoine Soetewey). Your use of this website, in any and all forms, constitutes an acceptance of these terms and policies. This page is reviewed and revised from time to time.
All content provided is for informational purposes only. The articles and posts on this website are my own and do not necessarily represent the positions, strategies, or opinions of my employer or its subsidiaries.Data manipulation in R
https://statsandr.com/blog/data-manipulation-in-r/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/blog/data-manipulation-in-r/Introduction Vectors Concatenation seq() and rep() Assignment Elements of a vector Type and length Finding the vector type Modifications of type and length Numerical operators Logical operators all() and any() Operations on character strings vector Orders and vectors Factors Creating factors Properties Handling Lists Creating lists Handling Getting details on an object Data frames Line and column names Subset a data frame First or last observations Random sample of observations Based on row or column numbers Based on variable names Based on one or multiple criterion Create a new variable Transform a continuous variable into a categorical variable Sum and mean in rows Sum and mean in column Categorical variables and labels management Recode categorical variables Change reference level Rename variable names Create a data frame manually Merging two data frames Add new observations from another data frame Add new variables from another data frame Missing values Remove NAs Impute NAs Scale Dates and times Dates Times Extraction from dates Exporting and saving Looking for help Introduction Not all data frames are as clean and tidy as you would expect.Links
https://statsandr.com/links/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/links/ Below is a list of useful links.
Contributions to online blogs:
www.r-bloggers.com antoinesoetewey.medium.com rweekly.org Additional resources:
R for Data Science and Advanced R (excellent free books on R, written by Garrett Grolemund and Hadley Wickham) statisticsbyjim.com delladata.fr (in French) Sitemap
https://statsandr.com/sitemap/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/sitemap/A list of all the pages and articles found on the blog. If you cannot find what you are looking for, do not hesitate to contact me.
For you robots out there, an XML version is available for digesting as well.
Pages Home Blog Tags About Contact Subscribe to the newsletter FAQ - Frequently asked questions Contribute - Guest post Support the blog Press Useful links Terms and policies Sitemap How to import an Excel file in RStudio?
https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/
Wed, 18 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/Introduction Transform an Excel file to a CSV file R working directory Get working directory Set working directory User-friendly method Via the console Via the text editor Import your dataset User-friendly way Via the text editor Import SPSS (.sav) files Introduction As we have seen in this article on how to install R and RStudio, R is useful for many kinds of computational tasks and statistical analyses.Contact
https://statsandr.com/contact/
Tue, 17 Dec 2019 00:00:00 +0000https://statsandr.com/contact/Thanks in advance for contacting me.
In order for me to answer you as soon as possible, here are the best communication methods:
Due to the increasing number of questions received, responding to each of them by email has become unmanageable and unproductive. Therefore, if you have a question, I invite you to add it as a comment at the end of the corresponding article. This way, other readers can benefit from the discussion and that saves me from answering the same thing several times.How to install R and RStudio?
https://statsandr.com/blog/how-to-install-r-and-rstudio/
Tue, 17 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-install-r-and-rstudio/What is R and RStudio? R RStudio How to install R and RStudio? The main components of RStudio Examples of code Calculator Comments Store and print values Vectors Matrices Generate random values Plot What is R and RStudio? R The statistical program R is nothing more than a programming language, mainly used for data manipulation and to perform statistical analyses. At the time of writing, this language is (one of) the leading programs in statistics, although not the only programming language used by statisticians.About me
https://statsandr.com/about/
Mon, 16 Dec 2019 00:00:00 +0000https://statsandr.com/about/My name is Antoine Soetewey. I am a PhD candidate in statistics at UCLouvain (Belgium) within the Institute of Statistics, Biostatistics and Actuarial Sciences. My research interests focus on survival analysis and bio-statistical procedures applied to cancer patients.
In parallel with my doctoral thesis, I am a teaching assistant for several courses in statistics and probability at bachelor’s and master’s level. I also provide training/workshops and consulting in data science, statistics and R programming as part of UCLouvain’s technology platform for Statistical Methodology and Computing Service.Hello World!
https://statsandr.com/blog/hello-world/
Mon, 16 Dec 2019 00:00:00 +0000https://statsandr.com/blog/hello-world/hello world
This is the first post for the blog Stats and R, just to introduce it. This blog aims at helping academics and professionals working with data to grasp important concepts in statistics and to apply them in R.
The goal of this website is to make statistics easy to understand by illustrating with examples and using plain English. When possible, for all statistical concepts covered here, I also write an article on how to apply these concepts in R.