Stats and R
https://statsandr.com/
Recent content on Stats and R
Hugo -- gohugo.io
en
Mon, 16 Dec 2019 00:00:00 +0000
Hypothesis test by hand
https://statsandr.com/blog/hypothesis-test-by-hand/
Wed, 27 Jan 2021 00:00:00 +0000
Descriptive versus inferential statistics Motivations and limitations Hypothesis test Why? When? How? Method A: Comparing the test statistic with the critical value Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Finding the critical value Step #4: Concluding and interpreting the results Why don’t we accept \(H_0\)? Method B: Comparing the p-value with the significance level \(\alpha\) Step #1: Stating the null and alternative hypothesis Step #2: Computing the test statistic Step #3: Computing the p-value Step #4: Concluding and interpreting the results Method C: Comparing the target parameter with the confidence interval Step #1: Stating the null and alternative hypothesis Step #2: Computing the confidence interval Step #3: Concluding and interpreting the results Which method to choose?
How to track the performance of your blog in R?
https://statsandr.com/blog/track-blog-performance-in-r/
Wed, 16 Dec 2020 00:00:00 +0000
Introduction Prerequisites Analytics Users, page views and sessions Sessions over time Sessions per channel Sessions per day of week Sessions per day and time Sessions per month and year Top performing pages Time-normalized page views Page views by country Browser information User engagement by devices Content Finding topics Content distribution A small note about ads Future plans Thank you note
Introduction
Stats and R was launched on December 16, 2019.
Paper: 'Waiting period from diagnosis for mortgage insurance issued to cancer survivors'
https://statsandr.com/blog/waiting-period-cancer-survivors/
Mon, 23 Nov 2020 00:00:00 +0000
I am happy to announce that our paper entitled “Waiting period from diagnosis for mortgage insurance issued to cancer survivors” has been published in the European Actuarial Journal.
Here is a brief summary of it:
Massart (2018) testimonial illustrates the difficulties faced by patients having survived cancer to access mortgage insurance securing home loan. Data collected by national registries nevertheless suggest that excess mortality due to some types of cancer becomes moderate or even negligible after some waiting period.
ANOVA in R
https://statsandr.com/blog/anova-in-r/
Mon, 12 Oct 2020 00:00:00 +0000
Introduction Data Aim and hypotheses of ANOVA Underlying assumptions of ANOVA Variable type Independence Normality Equality of variances - homogeneity Another method to test normality and homogeneity ANOVA Preliminary analyses ANOVA in R Interpretations of ANOVA results What’s next? Post-hoc test Issue of multiple testing Post-hoc tests in R and their interpretation Tukey HSD test Dunnett’s test Other p-values adjustment methods Visualization of ANOVA and post-hoc tests on the same plot Summary
Introduction
ANOVA (ANalysis Of VAriance) is a statistical test to determine whether two or more population means are different.
Why do I have a data science blog? 7 benefits of sharing your code
https://statsandr.com/blog/7-benefits-of-sharing-your-code-in-a-data-science-blog/
Wed, 02 Sep 2020 00:00:00 +0000
#1 Learn by writing #2 Get feedback #3 Personal note to remind my future self #4 Contribute to the open source community #5 Stay humble, stay curious #6 Learn to be less perfectionist and to prioritize #7 Build connections and professional relationships How to start your own blog?
My blog statsandr.com was launched in December 2019. Although 9 months of writing is a very short period compared to others, I can already say that it’s been an incredible and very enriching adventure!
Graphics in R with ggplot2
https://statsandr.com/blog/graphics-in-r-with-ggplot2/
Fri, 21 Aug 2020 00:00:00 +0000
Introduction Data Basic principles of {ggplot2} Create plots with {ggplot2} Scatter plot Line plot Combination of line and points Histogram Density plot Combination of histogram and densities Boxplot Barplot Further personalization Title and axis labels Axis ticks Log transformations Limits Scales for better axis formats Legend Shape, color, size and transparency Text and labels Smooth and regression lines Facets Themes Interactive plot with {plotly} Combine plots with {patchwork} Flip coordinates Save plot Managing dates Tip To go further
Introduction
R is known to be a really powerful programming language when it comes to graphics and visualizations (in addition to statistics and data science of course!
Mortgage calculator in R Shiny
https://statsandr.com/blog/mortgage-calculator-r-shiny/
Fri, 14 Aug 2020 00:00:00 +0000
Introduction Mortgage calculator How to use the mortgage calculator? Code of the app
Introduction
I recently moved out and bought my first apartment. Of course, I could not pay for it entirely with my own savings, so I had to borrow money from the bank. I visited a couple of banks operating in my country and asked for a mortgage.
If you have already bought a house or an apartment, you know how it goes: the bank analyzes your financial and personal situation and makes an offer based on your propensity to repay the bank.
Outliers detection in R
https://statsandr.com/blog/outliers-detection-in-r/
Tue, 11 Aug 2020 00:00:00 +0000
Introduction Descriptive statistics Minimum and maximum Histogram Boxplot Percentiles Hampel filter Statistical tests Grubbs’s test Dixon’s test Rosner’s test Additional remarks References
Introduction
An outlier is a value or an observation that is distant from other observations, that is to say, a data point that differs significantly from other data points. Enderlein (1987) goes even further as the author considers outliers as values that deviate so much from other observations one might suppose a different underlying sampling mechanism.
Wilcoxon test in R: how to compare 2 groups under the non-normality assumption
https://statsandr.com/blog/wilcoxon-test-in-r-how-to-compare-2-groups-under-the-non-normality-assumption/
Sun, 07 Jun 2020 00:00:00 +0000
Introduction Two different scenarios Independent samples Paired samples Assumption of equal variances
Introduction
In a previous article, we showed how to compare two groups under different scenarios using the Student’s t-test. The Student’s t-test requires that the distributions follow a normal distribution. In this article, we show how to compare two groups when the normality assumption is violated, using the Wilcoxon test.
The Wilcoxon test is a non-parametric test, meaning that it does not rely on data belonging to any particular parametric family of probability distributions.
How to publish a Shiny app: example with shinyapps.io
https://statsandr.com/blog/how-to-publish-shiny-app-example-with-shinyapps-io/
Fri, 29 May 2020 00:00:00 +0000
Introduction Prerequisite Step-by-step guide Additional notes
Introduction
The COVID-19 virus led many people to create interactive apps and dashboards. A reader recently asked me how to publish a Shiny app she just created. Similarly to a previous article where I show how to upload R code on GitHub, I thought it would be useful to some people to see how I publish my Shiny apps so they could do the same.
Correlation coefficient and correlation test in R
https://statsandr.com/blog/correlation-coefficient-and-correlation-test-in-r/
Thu, 28 May 2020 00:00:00 +0000
Introduction Data Correlation coefficient Between two variables Correlation matrix: correlations for all variables Interpretation of a correlation coefficient Visualizations A scatterplot for 2 variables Scatterplots for several pairs of variables Another simple correlation matrix Correlation test For 2 variables For several pairs of variables Combination of correlation coefficients and correlation tests Correlation does not imply causation References
Introduction
Correlations between variables play an important role in a descriptive analysis.
How to upload your R code on GitHub: example with an R script on MacOS
https://statsandr.com/blog/how-to-upload-r-code-on-github-example-with-an-r-script-on-mac-os/
Sun, 24 May 2020 00:00:00 +0000
Introduction Prerequisite Step-by-step guide Additional notes
Introduction
A few days ago, a colleague asked me how to upload some R code on GitHub in order to make it accessible to everyone. Due to the lockdown, I could not just go into his office and show him on his computer. So I sent him several screenshots showing, step by step, how to do so.
Right before I deleted the screenshots I’d just taken, I thought that perhaps they would be useful to other people, so I wrote this article.
Press
https://statsandr.com/press/
Sun, 24 May 2020 00:00:00 +0000
In the news
Here is a roll-up of press mentions of the blog:
How can we predict the evolution of COVID 19 in Belgium? (UCLouvain: in English & in French)
Evolution of COVID-19 hospital admissions in Belgium (LN24)
Contact
You can contact me here.
Social profiles
Twitter Medium LinkedIn GitHub
COVID-19 in Belgium: is it over yet?
https://statsandr.com/blog/covid-19-in-belgium-is-it-over-yet/
Fri, 22 May 2020 00:00:00 +0000
Introduction New hospital admissions Overall By period Zooming in Patients in hospitals Patients in intensive care Confirmed cases By province By age group and sex Static Dynamic By age group, sex and province
Introduction
Note 1: The present article was written on May 22, 2020 and has been updated infrequently. The current situation regarding COVID-19 in Belgium may therefore differ from what is presented below.
One-proportion and goodness of fit test (in R and by hand)
https://statsandr.com/blog/one-proportion-and-goodness-of-fit-test-in-r-and-by-hand/
Wed, 13 May 2020 00:00:00 +0000
Introduction In R Data One-proportion test Assumption of prop.test() and binom.test() Chi-square goodness of fit test Does my distribution follow a given distribution? Observed frequencies Expected frequencies Observed vs. expected frequencies By hand One-proportion test Verification in R Goodness of fit test Verification in R
Introduction
In a previous article, I presented the Chi-square test of independence in R which is used to test the independence between two categorical variables.
A package to download free Springer books during Covid-19 quarantine
https://statsandr.com/blog/a-package-to-download-free-springer-books-during-covid-19-quarantine/
Sun, 26 Apr 2020 00:00:00 +0000
Update Introduction Installation Download all books at once Create a table of Springer books Download only specific books By title By author By subject Improvements Acknowledgments
Update
The promotion has ended, so it is no longer possible to download the books through R. If you did not download the books in time, you can still have access to them via this link.
COVID-19 in Belgium
https://statsandr.com/blog/covid-19-in-belgium/
Tue, 31 Mar 2020 00:00:00 +0000
Introduction Top R resources on Coronavirus Coronavirus dashboard for your own country Motivations, limitations and structure of the article Analysis of Coronavirus in Belgium A classic epidemiological model: the SIR model Fitting a SIR model to the Belgium data Reproduction number \(R_0\) Using our model to analyze the outbreak if there was no intervention More summary statistics Additional considerations Ascertainment rates More sophisticated models Modelling the epidemic trajectory using log-linear models Estimating changes in the effective reproduction number \(R_e\) More sophisticated projections Conclusion References
Introduction
The Novel COVID-19 Coronavirus is still spreading quickly in several countries and it does not seem like it is going to stop anytime soon as the peak has not yet been reached in many countries.
How to create a simple Coronavirus dashboard specific to your country in R
https://statsandr.com/blog/how-to-create-a-simple-coronavirus-dashboard-specific-to-your-country-in-r/
Mon, 23 Mar 2020 00:00:00 +0000
Introduction Top R resources on Coronavirus Coronavirus dashboard: the case of Belgium How to create your own Coronavirus dashboard Additional notes Data Open source Accuracy Publish your dashboard
Coronavirus dashboard: the case of Belgium
Introduction
The Novel COVID-19 Coronavirus is the hottest topic right now. Every day, the media and newspapers share the number of new cases and deaths in several countries, try to measure the impacts of the virus on citizens and remind us to stay home in order to stay safe.
How to do a t-test or ANOVA for more than one variable at once in R
https://statsandr.com/blog/how-to-do-a-t-test-or-anova-for-many-variables-at-once-in-r-and-communicate-the-results-in-a-better-way/
Thu, 19 Mar 2020 00:00:00 +0000
Introduction Perform multiple tests at once Concise and easily interpretable results T-test Additional p-value adjustment methods ANOVA To go even further References
Introduction
As part of my teaching assistant position in a Belgian university, students often ask me for help with the statistical analyses for their master’s thesis.
A frequent question is how to compare groups of patients in terms of several quantitative continuous variables.
Top 100 R resources on Novel COVID-19 Coronavirus
https://statsandr.com/blog/top-r-resources-on-covid-19-coronavirus/
Thu, 12 Mar 2020 00:00:00 +0000
R Shiny apps and dashboards Coronavirus tracker Coronavirus dashboard from the {coronavirus} package COVID-19 Global Cases Visualization of Covid-19 Cases Modeling COVID-19 Spread vs Healthcare Capacity COVID-19 Data Visualization Platform Coronavirus 10-day forecast Coronavirus (COVID-19) across the world COVID-19 outbreak Comparing Corona trajectories Flatten the Curve Explore the spread of Covid-19 Governments and COVID-19 Simulating COVID-19 Epidemic in Togo - West Africa Covid-19 Prediction Covid-19 Dashboard Healthcare worker deaths from novel Coronavirus (COVID-19) in the US Covid-19 Hospitalizations in Belgium COVIDMINDER: Where you live matters!
How to perform a one sample t-test by hand and in R: test on one mean
https://statsandr.com/blog/how-to-perform-a-one-sample-t-test-by-hand-and-in-r-test-on-one-mean/
Mon, 09 Mar 2020 00:00:00 +0000
Introduction Null and alternative hypothesis Hypothesis testing Two versions of the one sample t-test How to compute the one sample t-test by hand? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Different underlying distributions for the critical value How to compute the one sample t-test in R? Scenario 1: variance of the population is known Scenario 2: variance of the population is unknown Confidence interval Assumptions
Introduction
After having written an article on the Student’s t-test for two samples (independent and paired samples), I believe it is time to explain in detail how to perform one sample t-tests by hand and in R.
The 9 concepts and formulas in probability that every data scientist should know
https://statsandr.com/blog/the-9-concepts-and-formulas-in-probability-that-every-data-scientist-should-know/
Tue, 03 Mar 2020 00:00:00 +0000
What is probability? 1. A probability is always between 0 and 1 2. Compute a probability 3. Complement of an event 4. Union of two events 5. Intersection of two events 6. Independence of two events 7. Conditional probability Bayes’ theorem Example 8. Accuracy measures False negatives False positives Sensitivity Specificity Positive predictive value Negative predictive value 9. Counting techniques Multiplication Example Permutation Example By hand In R Combination Example By hand In R
What is probability?
FAQ - Frequently asked questions
https://statsandr.com/faq/
Mon, 02 Mar 2020 00:00:00 +0000
Who is behind this blog?
What is your background?
Why did you launch this blog?
What technology and theme do you use to write this blog and the articles?
I am new to this blog, to R or to statistics, from where can I start?
Can I use your code or material in my own project?
I would like to replicate an analysis you have done in one of your article, can I have access to the entire code?
Student's t-test in R and by hand: how to compare two groups under different scenarios
https://statsandr.com/blog/student-s-t-test-in-r-and-by-hand-how-to-compare-two-groups-under-different-scenarios/
Fri, 28 Feb 2020 00:00:00 +0000
Introduction Null and alternative hypothesis Hypothesis testing Different versions of the Student’s t-test How to compute Student’s t-test by hand? Scenario 1: Independent samples with 2 known variances Scenario 2: Independent samples with 2 equal but unknown variances Scenario 3: Independent samples with 2 unequal and unknown variances Scenario 4: Paired samples where the variance of the differences is known Scenario 5: Paired samples where the variance of the differences is unknown How to compute Student’s t-test in R?
Correlogram in R: how to highlight the most correlated variables in a dataset
https://statsandr.com/blog/correlogram-in-r-how-to-highlight-the-most-correlated-variables-in-a-dataset/
Sat, 22 Feb 2020 00:00:00 +0000
Introduction Correlation matrix Correlogram Correlation test Code {lares} package All possible correlations Correlation of one variable against all others References
Introduction
Correlation, often computed as part of descriptive statistics, is a statistical tool used to study the relationship between two variables, that is, whether and how strongly pairs of variables are associated.
Correlations are measured between only 2 variables at a time. Therefore, for datasets with many variables, computing correlations can become quite cumbersome and time consuming.
Getting started in R markdown
https://statsandr.com/blog/getting-started-in-r-markdown/
Tue, 18 Feb 2020 00:00:00 +0000
R Markdown: what, why and how? Before you start Components of a .Rmd file YAML header Code chunks Text Code inside text Highlight text like it is code Images Tables Additional notes and useful resources
If you have spent some time writing code in R, you have probably heard of generating dynamic reports incorporating R code, R outputs (results) and text or comments. In this article, I will explain how R Markdown works and give you the basic elements you need to get started easily in the production of these dynamic reports.
The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R
https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/
Thu, 13 Feb 2020 00:00:00 +0000
What is clustering analysis? Application 1: Computing distances Solution k-means clustering Application 2: k-means clustering Data kmeans() with 2 groups Quality of a k-means partition nstart for several initial centers and better stability kmeans() with 3 groups Optimal number of clusters Elbow method Silhouette method Gap statistic method NbClust() Visualizations Manual application and verification in R Solution by hand Solution in R Hierarchical clustering Application 3: hierarchical clustering Data Solution by hand Single linkage Complete linkage Average linkage Solution in R Single linkage Optimal number of clusters Complete linkage Average linkage k-means versus hierarchical clustering References
What is clustering analysis?
Contribute
https://statsandr.com/contribute/
Sat, 08 Feb 2020 00:00:00 +0000
Stats and R welcomes guest posts that provide unique insight into statistics and R.
How can you contribute?
If you want to contribute and write for statsandr.com, please submit your article through this contribution form.
Once your guest post is received, I will review it and inform you about the decision (i.e., accepted, rejected, or accepted with minor changes).
Submission rules and guidelines
Before submitting your article, please read the following points:
An efficient way to install and load R packages
https://statsandr.com/blog/an-efficient-way-to-install-and-load-r-packages/
Fri, 31 Jan 2020 00:00:00 +0000
What is a R package and how to use it? Inefficient way to install and load R packages More efficient way Most efficient way {pacman} package {librarian} package
What is a R package and how to use it?
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions, which are collections of functions and datasets developed and published by R users, are called packages.
Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R
https://statsandr.com/blog/do-my-data-follow-a-normal-distribution-a-note-on-the-most-widely-used-distribution-and-how-to-test-for-normality-in-r/
Wed, 29 Jan 2020 00:00:00 +0000
What is a normal distribution? Empirical rule Parameters Probabilities and standard normal distribution Areas under the normal distribution in R and by hand Ex. 1 In R By hand Ex. 2 In R By hand Ex. 3 In R By hand Ex. 4 In R By hand Ex. 5 Why is the normal distribution so crucial in statistics? How to test the normality assumption Histogram Density plot QQ-plot Normality test References
What is a normal distribution?
Fisher's exact test in R: independence test for a small sample
https://statsandr.com/blog/fisher-s-exact-test-in-r-independence-test-for-a-small-sample/
Tue, 28 Jan 2020 00:00:00 +0000
Introduction Hypotheses Example Data Observed frequencies Expected frequencies Fisher’s exact test in R Conclusion and interpretation References
Introduction
After presenting the Chi-square test of independence by hand and in R, this article focuses on Fisher’s exact test.
Independence tests are used to determine if there is a significant relationship between two categorical variables. There exist two different types of independence tests:
the Chi-square test (the most common)
the Fisher’s exact test
On the one hand, the Chi-square test is used when the sample is large enough (in this case the \(p\)-value is an approximation that becomes exact when the sample becomes infinite, which is the case for many statistical tests).
Chi-square test of independence by hand
https://statsandr.com/blog/chi-square-test-of-independence-by-hand/
Mon, 27 Jan 2020 00:00:00 +0000
Introduction Hypotheses How the test works? Example Observed frequencies Expected frequencies Test statistic Critical value Conclusion and interpretation
Introduction
Chi-square tests of independence test whether two qualitative variables are independent, that is, whether there exists a relationship between two categorical variables. In other words, this test is used to determine whether the values of one of the 2 qualitative variables depend on the values of the other qualitative variable.
Chi-square test of independence in R
https://statsandr.com/blog/chi-square-test-of-independence-in-r/
Mon, 27 Jan 2020 00:00:00 +0000
Introduction Data Chi-square test of independence in R Conclusion and interpretation Combination of plot and statistical test
Introduction
This article explains how to perform the Chi-square test of independence in R and how to interpret its results. To learn more about how the test works and how to do it by hand, I invite you to read the article “Chi-square test of independence by hand”.
To briefly recap what has been said in that article, the Chi-square test of independence tests whether there is a relationship between two categorical variables.
How to create a timeline of your CV in R?
https://statsandr.com/blog/how-to-create-a-timeline-of-your-cv-in-r/
Sun, 26 Jan 2020 00:00:00 +0000
Introduction Minimal reproducible example How to personalize it Additional note
Introduction
In this article, I show how to create a timeline of your CV in R. A CV timeline illustrates key information about your education, work experiences and extra activities. The main advantage of CV timelines compared to a regular CV is that they make you stand out immediately by being visually appealing and easier to scan. They also allow you to better present your “story” by showing the chronology of your jobs and activities and thus explain how you got to where you are today.
RStudio addins, or how to make your coding life easier
https://statsandr.com/blog/rstudio-addins-or-how-to-make-your-coding-life-easier/
Sun, 26 Jan 2020 00:00:00 +0000
What are RStudio addins? Installation Addins Esquisse Questionr Recoding factors Reordering factors Categorize a numeric variable Remedy Styler Snakecaser ViewPipeSteps Ymlthis Reprex Blogdown
What are RStudio addins?
Although I have been using RStudio for several years, I only recently discovered RStudio addins. Since then, I have been using these addins almost every time I use RStudio.
What are RStudio addins?
RStudio addins are extensions which provide a simple mechanism for executing advanced R functions from within RStudio.
Descriptive statistics in R
https://statsandr.com/blog/descriptive-statistics-in-r/
Wed, 22 Jan 2020 00:00:00 +0000
Introduction Data Minimum and maximum Range Mean Median First and third quartile Other quantiles Interquartile range Standard deviation and variance Summary Coefficient of variation Mode Correlation Contingency table Mosaic plot Barplot Histogram Boxplot Dotplot Scatterplot Line plot QQ-plot For a single variable By groups Density plot Correlation plot Advanced descriptive statistics {summarytools} package Frequency tables with freq() Cross-tabulations with ctable() Descriptive statistics with descr() Data frame summaries with dfSummary() describeBy() from the {psych} package aggregate() function
Introduction
This article explains how to compute the main descriptive statistics in R and how to present them graphically.
Support the blog
https://statsandr.com/support/
Tue, 21 Jan 2020 00:00:00 +0000
On this blog, I provide free articles and tutorials about statistics and R. My goal with the blog is to help people understand statistical concepts (through examples and in plain English), and apply them in R.
All the articles, Shiny apps and code are open source and available to everyone. I also try to reply to all questions I receive from readers by email or as comments.
I work on the blog voluntarily in my spare time (when I am not too busy working on my PhD thesis).
Tips and tricks in RStudio and R Markdown
https://statsandr.com/blog/tips-and-tricks-in-rstudio-and-r-markdown/
Tue, 21 Jan 2020 00:00:00 +0000
Run code Insert a comment in R and R Markdown Knit a R Markdown document Code snippets Ordered list in R Markdown New code chunk in R Markdown Reformat code RStudio addins {pander} and {report} for aesthetics Extract equation model with {equatiomatic} Pipe operator %>% Others
If you have the chance to work with an experienced programmer, you may be amazed by how fast she can write code.
Descriptive statistics by hand
https://statsandr.com/blog/descriptive-statistics-by-hand/
Sat, 18 Jan 2020 00:00:00 +0000
Introduction Location versus dispersion measures Location Minimum and maximum Mean Median Odd number of observations Even number of observations Mean vs. median \(1^{st}\) and \(3^{rd}\) quartiles \(q_{0.25}\), \(q_{0.75}\) and \(q_{0.5}\) A note on deciles and percentiles Mode Mode for qualitative variables Dispersion Range Standard deviation Standard deviation for a population Standard deviation for a sample Variance Variance for a population Variance for a sample Standard deviation vs.
What is the difference between population and sample?
https://statsandr.com/blog/what-is-the-difference-between-population-and-sample/
Sat, 18 Jan 2020 00:00:00 +0000
Introduction Sample vs. population Why a sample? Representative sample Paired samples Conclusion
Introduction
People often fail to properly distinguish between population and sample. The distinction is, however, essential in any statistical analysis, starting with descriptive statistics, where the formulas for variance and standard deviation differ depending on whether we face a sample or a population.
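The difference between the two sets of formulas can be sketched in R. This is a minimal illustration with made-up data (the vector `x` is hypothetical and not taken from the article). R's built-in `var()` and `sd()` use the sample formulas, dividing by \(n - 1\), so the population versions are computed by hand here.

```r
# Hypothetical data, for illustration only
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# Sample variance and standard deviation: var() and sd() divide by n - 1
sample_var <- var(x)
sample_sd <- sd(x)

# Population variance and standard deviation: divide by n instead
pop_var <- sum((x - mean(x))^2) / n
pop_sd <- sqrt(pop_var)

# The two variances differ only by the factor (n - 1) / n
all.equal(pop_var, sample_var * (n - 1) / n)
```

With few observations the two results can differ noticeably; the gap shrinks as \(n\) grows.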
Moreover, the branch of statistics called inferential statistics is often defined as the science of drawing conclusions about a population from observations made on a representative sample of that population.
A Shiny app for inferential statistics by hand
https://statsandr.com/blog/a-shiny-app-for-inferential-statistics-by-hand/
Wed, 15 Jan 2020 00:00:00 +0000
A Shiny app for inferential statistics: hypothesis tests and confidence intervals
Statistics is divided into four main branches:
Descriptive statistics
Inferential statistics
Predictive analysis
Exploratory analysis
Descriptive statistics provide a summary of the data; they help explain the data in a concise way without losing too much information. Data can be summarized numerically or graphically. See descriptive statistics by hand or in R to learn more about this branch of statistics.
A Shiny app for simple linear regression by hand and in R
https://statsandr.com/blog/a-shiny-app-for-simple-linear-regression-by-hand-and-in-r/
Wed, 15 Jan 2020 00:00:00 +0000
A Shiny app to perform simple linear regression (by hand and in R)
Simple linear regression is a statistical method to summarize and study relationships between two variables. When more than two variables are of interest, it is referred to as multiple linear regression.
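For reference, the by-hand computation can be sketched in the usual least-squares notation (the symbols below are assumptions, not taken from this excerpt). For the model \(y_i = \beta_0 + \beta_1 x_i + \varepsilon_i\), the estimates are:

\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}, \qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]

As a hypothetical example, with \(x = (1, 2, 3, 4, 5)\) and \(y = (2, 4, 5, 4, 5)\) we get \(\bar{x} = 3\), \(\bar{y} = 4\), a numerator of 6 and a denominator of 10, hence \(\hat{\beta}_1 = 0.6\) and \(\hat{\beta}_0 = 4 - 0.6 \times 3 = 2.2\).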
In this article, we focus only on a Shiny app which allows you to perform simple linear regression by hand and in R:
Statistics-202 Here is the entire code (or see the latest version on GitHub) in case you would like to enhance it.World map of visited countries in R
https://statsandr.com/blog/world-map-of-visited-countries-in-r/
Thu, 09 Jan 2020 00:00:00 +0000https://statsandr.com/blog/world-map-of-visited-countries-in-r/If, like me, you like traveling as much as R, you might want to draw a world map in R of the countries you have visited. Below is an example with the countries I have visited as of January 2020:A practical guide on optimal asset allocation
https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/practical-guide-on-optimal-asset-allocation/A Shiny app with an example of optimal asset allocation
In his book A Random Walk Down Wall Street, Burton G. Malkiel advises readers on an optimal asset allocation depending on age. As an amateur investor, I thought it would be useful to develop a Shiny app which depicts his advice for other interested investors. Here is the link to the app:
Optimal asset allocation Here is the entire code (or see the latest version on GitHub) in case you would like to enhance it.Draw a word cloud with a R Shiny app
https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/draw-a-word-cloud-with-a-shiny-app/Word cloud in a Shiny app
Below is a Shiny app to help you draw a word cloud:
Word cloud Here is the entire code (or see the latest version on GitHub) in case you would like to enhance it. See an example of how to use this app after the embedded code.
Word clouds are particularly useful as part of text mining analyses. They are also useful for analyzing string and character variables in any dataset (see the different data types in R).How to embed a Shiny app in blogdown?
https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/
Tue, 07 Jan 2020 00:00:00 +0000https://statsandr.com/blog/how-to-embed-a-shiny-app-in-blogdown/If you have developed and deployed a Shiny app and would like to embed it in blogdown, follow these steps:
create a new post as usual; add runtime: shiny (and output: html_document if it is not already included) in the YAML metadata; insert the following HTML code in the body of the post: <iframe height="800" width="100%" frameborder="no" src="https://antoinesoetewey.shinyapps.io/statistics-201/"> </iframe> Replace the URL with the URL of your deployed Shiny app (after src=, do not forget that the URL should start with http:// or https:// and should be surrounded by "A guide on how to read statistical tables
https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/
Mon, 06 Jan 2020 00:00:00 +0000https://statsandr.com/blog/a-guide-on-how-to-read-statistical-tables/Shiny app to compute probabilities for the main probability distributions
Below is a Shiny app to help you read the main statistical tables:
Statistics-101 This Shiny app helps you to compute probabilities for the main probability distributions.
Here is the entire code (or see the latest version on GitHub) in case you would like to enhance it. See an example of how to use this app after the embedded code.Newsletter
https://statsandr.com/subscribe/
Tue, 31 Dec 2019 00:00:00 +0000https://statsandr.com/subscribe/By subscribing to this newsletter you will be notified each time a new article is published. You can unsubscribe at any time and your email address will never be shared.
Data types in R
https://statsandr.com/blog/data-types-in-r/
Mon, 30 Dec 2019 00:00:00 +0000https://statsandr.com/blog/data-types-in-r/What data types exist in R? Numeric Integer Character Factor Logical This article presents the different data types in R. To learn about the different variable types from a statistical point of view, read “Variable types and examples”.
What data types exist in R? The 6 most common data types in R are:
Numeric Integer Complex Character Factor Logical Datasets in R are often a combination of these 6 different data types.Variable types and examples
https://statsandr.com/blog/variable-types-and-examples/
Mon, 30 Dec 2019 00:00:00 +0000https://statsandr.com/blog/variable-types-and-examples/Big picture Quantitative Discrete Continuous Qualitative Nominal Ordinal Variable transformations From continuous to discrete From quantitative to qualitative Additional notes Different types of variables for different types of statistical analysis Misleading data encoding This article presents the different variable types from a statistical point of view. To learn about the different data types in R, read “Data types in R”.How to create an interactive booklist with automatic Amazon affiliate links in R?
https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/
Thu, 26 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-create-an-interactive-booklist-with-automatic-amazon-affiliate-links-in-r/Introduction Requirements Create a booklist Create it in Excel then import it Create it directly in R Make it interactive Add URLs with your affiliate link to the table Extract affiliate link Append the book title and author to make it automatic Add links to the interactive table Final result Introduction Booklists are a useful way to share the books you have read and which you recommend to other readers and/or to promote the books you have written.Terms and policies
https://statsandr.com/terms/
Wed, 25 Dec 2019 00:00:00 +0000https://statsandr.com/terms/This is my personal blog written and edited by me. Your use of this website, in any and all forms, constitutes an acceptance of these terms and policies. This page is reviewed and revised from time to time.
All content provided is for informational purposes only. The articles and posts on this website are my own and do not necessarily represent the positions, strategies, or opinions of my employer or its subsidiaries.Data manipulation in R
https://statsandr.com/blog/data-manipulation-in-r/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/blog/data-manipulation-in-r/Introduction Vectors Concatenation seq() and rep() Assignment Elements of a vector Type and length Finding the vector type Modifications of type and length Numerical operators Logical operators all() and any() Operations on character strings vector Orders and vectors Factors Creating factors Properties Handling Lists Creating lists Handling Getting details on an object Data frames Line and column names Subset a data frame First or last observations Random sample of observations Based on row or column numbers Based on variable names Based on one or multiple criterion Create a new variable Transform a continuous variable into a categorical variable Sum and mean in rows Sum and mean in column Categorical variables and labels management Recode categorical variables Change reference level Rename variable names Create a data frame manually Merging two data frames Add new observations from another data frame Add new variables from another data frame Missing values Remove NAs Impute NAs Scale Dates and times Dates Times Extraction from dates Exporting and saving Looking for help Introduction Not all data frames are as clean and tidy as you would expect.Links
https://statsandr.com/links/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/links/ Below is a list of useful links.
Contributions to online blogs:
www.r-bloggers.com antoinesoetewey.medium.com rweekly.org Additional resources:
R for Data Science and Advanced R (excellent free books on R, written by Garrett Grolemund and Hadley Wickham) statisticsbyjim.com delladata.fr (in French) www.r-users.com (jobs in R) Sitemap
https://statsandr.com/sitemap/
Tue, 24 Dec 2019 00:00:00 +0000https://statsandr.com/sitemap/A list of all the pages and articles found on the blog. If you cannot find what you are looking for, do not hesitate to contact me.
For you robots out there, an XML version is also available for digesting.
Pages Home Blog Tags About Contact Subscribe to the newsletter FAQ - Frequently asked questions Contribute - Guest post Support the blog Press Useful links Terms and policies Sitemap How to import an Excel file in RStudio?
https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/
Wed, 18 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-import-an-excel-file-in-rstudio/Introduction Transform an Excel file to a CSV file R working directory Get working directory Set working directory User-friendly method Via the console Via the text editor Import your dataset User-friendly way Via the text editor Import SPSS (.sav) files Introduction As we have seen in this article on how to install R and RStudio, R is useful for many kinds of computational tasks and statistical analyses.Contact
https://statsandr.com/contact/
Tue, 17 Dec 2019 00:00:00 +0000https://statsandr.com/contact/Thanks in advance for contacting me.
In order for me to answer you as soon as possible, here are the best communication methods:
If you have a question or a suggestion related to an article, I invite you to add it as a comment at the end of the corresponding article so other readers can benefit from the discussion. For mistakes or bugs (they can happen, we are all human after all), you can inform me by raising an issue on GitHub. For all other requests, please use the contact form below. If you need to send a file, first fill in the contact form; I will reply, and from there you will be able to send me your file (this is to limit spam). For a quick answer to your question, make sure to first check the FAQ and comments from other readers.How to install R and RStudio?
https://statsandr.com/blog/how-to-install-r-and-rstudio/
Tue, 17 Dec 2019 00:00:00 +0000https://statsandr.com/blog/how-to-install-r-and-rstudio/What is R and RStudio? R RStudio The main components of RStudio Examples of code Calculator Comments Store and print values Vectors Matrices Generate random values Plot What is R and RStudio? R The statistical program R is nothing more than a programming language, mainly used for data manipulation and to perform statistical analyses. At the time of writing, this language is one of the leading programs in statistics, although not the only programming language used by statisticians.About
https://statsandr.com/about/
Mon, 16 Dec 2019 00:00:00 +0000https://statsandr.com/about/Hello, my name is Antoine Soetewey. I am a PhD student in statistics at UCLouvain (Belgium) within the Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA). My research interests focus on survival analysis and bio-statistical procedures applied to cancer patients.
In parallel with my doctoral thesis, I am a teaching assistant for several courses in statistics and probability at the bachelor’s and master’s levels. I also provide trainings/workshops and consulting in statistics and R programming as part of UCLouvain's technology platform for Statistical Methodology and Computing Service (SMCS).Hello World!
https://statsandr.com/blog/hello-world/
Mon, 16 Dec 2019 00:00:00 +0000https://statsandr.com/blog/hello-world/hello world
This is the first post for the blog Stats and R, just to introduce it. This blog aims to help academics and professionals working with data grasp important concepts in statistics and apply them in R.
The goal of this website is to make statistics easy to understand by illustrating with examples and using plain English. When possible, for all statistical concepts covered here, I also write an article on how to apply these concepts in R.