What is the PSL-Explore search engine?
It is a tool that aggregates the resources of the libraries, archive services, and museums of PSL's institutions, along with their scientific publications.
There you will find:
How do I find a resource?
Enter a query (subject, author, title of a journal or book, etc.) and run the search.
Query-writing rules:
Results list:
Results are displayed sorted by relevance. You can change the default sort order by clicking the TRIÉ PAR ("sorted by") button.
Given the large number of resources indexed by the search engine, it is recommended that you refine the results using the filters on the left of the screen.
Results can be limited:
Several filters can be combined (for example: online resources + French).
Tabs:
In the results list, you will find several tabs for each document:
Data visualization is a recurrent problem in all domains of data-driven science.
Despite the existence of a plethora of data visualization possibilities, there is an over-prevalence of bar plots. Given that bar plots reveal precious little about the distribution of data, this type of visualization can be misleading.
A “plotting revolution” has been started by a group of young researchers at the DEC (the ENS’s Department of Cognitive Studies). The project’s goal is to communicate about the limitations of bar plots and the necessity of clear and complete data visualization.
They have created a campaign to encourage discussion about data visualization, and specifically about the drawbacks of the ever-present bar plots found throughout scientific journals. You can read more about their project on their Kickstarter page, which explains their motivations and goals.
Why is this such a big issue? Let’s take a look.
We’ve all been there. Everyone (including myself!) has published a paper with bar plots; they are, after all, the most common form of data visualization in many scientific fields.
Hopefully after this demonstration, though, you will be convinced that they are an inadequate tool for representing distributions, and should be left for count data if you feel a table won’t suffice.
For this tutorial, let us picture two sets of fake data. Both of them have the same mean (100) and the same standard deviation (20), but as we’ll see they are very different in terms of their distributions.
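The original demonstration was prepared in R, and its data are not reproduced here. As a minimal sketch, two such data sets could be simulated with Python's standard library as follows. The sample size, the seed, the `rescale` helper, and the use of log-transformed uniform draws for the second set are all assumptions made for illustration, not the author's method.

```python
import math
import random
import statistics

def rescale(values, mean=100.0, sd=20.0):
    """Shift and scale values so the sample mean and SD hit the targets."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]

rng = random.Random(1)  # fixed seed (assumed) so the fake data are reproducible
n = 1000                # sample size (assumed)

# Set 1: draws from a normal distribution.
normal_data = rescale([rng.gauss(0, 1) for _ in range(n)])

# Set 2: log of uniform draws on (0, 1]. Most values sit near the top,
# with a long tail of low values (an assumed stand-in for the
# "logarithmic" data set).
log_data = rescale([math.log(1.0 - rng.random()) for _ in range(n)])

for name, data in [("normal", normal_data), ("log", log_data)]:
    print(f"{name:>6}: mean = {statistics.fmean(data):.1f}, "
          f"sd = {statistics.stdev(data):.1f}")
```

Because both sets are rescaled after sampling, each reports a mean of 100.0 and a standard deviation of 20.0, whatever their shape.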
Let us first make some histograms to show that the two data sets have very different distributions. The first data set has a normal distribution, while the second one is logarithmic.
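To give a rough idea of what such histograms show without a plotting library, the sketch below prints text histograms of two simulated data sets. It regenerates its own hypothetical data so that it runs on its own; the data-generation recipe, the bin count, and the `bin_counts` helper are assumptions for illustration.

```python
import math
import random
import statistics

def rescale(values, mean=100.0, sd=20.0):
    """Shift and scale values so the sample mean and SD hit the targets."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]

rng = random.Random(1)
n = 1000
normal_data = rescale([rng.gauss(0, 1) for _ in range(n)])
log_data = rescale([math.log(1.0 - rng.random()) for _ in range(n)])

BINS = 12  # number of equal-width bins (assumed)

def bin_counts(values, low, high, bins=BINS):
    """Count how many values fall in each of `bins` equal-width bins."""
    step = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        counts[min(int((v - low) / step), bins - 1)] += 1
    return counts

# Use one shared axis so the two histograms are directly comparable.
low = min(normal_data + log_data)
high = max(normal_data + log_data)

for name, data in [("normal", normal_data), ("log", log_data)]:
    counts = bin_counts(data, low, high)
    print(name)
    for i, c in enumerate(counts):
        left_edge = low + i * (high - low) / BINS
        print(f"{left_edge:7.1f} | " + "#" * (c * 50 // max(counts)))
```

The normal set should print a roughly symmetric hump, while the log set piles up near its upper end and trails off toward low values.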
This difference can also be seen in the boxplots, although to a lesser extent. The box for the normal distribution has roughly evenly sized quartiles on either side of the median, and only a couple of outliers.
Conversely, the logarithmic data have a strongly left-skewed distribution with a long lower tail, as can be seen from the many low-value outliers.
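The numbers behind a boxplot can be computed directly. The sketch below regenerates hypothetical data (same assumed recipe: a normal sample and a log-transformed uniform sample, both rescaled to mean 100 and SD 20) and counts outliers using the common 1.5 × IQR fence rule, which is an assumption about how the original boxplots flagged outliers.

```python
import math
import random
import statistics

def rescale(values, mean=100.0, sd=20.0):
    """Shift and scale values so the sample mean and SD hit the targets."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]

rng = random.Random(1)
n = 1000
normal_data = rescale([rng.gauss(0, 1) for _ in range(n)])
log_data = rescale([math.log(1.0 - rng.random()) for _ in range(n)])

def box_stats(values):
    """Quartiles plus outlier counts beyond the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_outliers": sum(v < low_fence for v in values),
        "high_outliers": sum(v > high_fence for v in values),
    }

for name, data in [("normal", normal_data), ("log", log_data)]:
    s = box_stats(data)
    print(f"{name:>6}: Q1 = {s['q1']:.1f}, median = {s['median']:.1f}, "
          f"Q3 = {s['q3']:.1f}, low outliers = {s['low_outliers']}, "
          f"high outliers = {s['high_outliers']}")
```

In the skewed set, the median sits above the mean and the low outliers far outnumber those of the normal set, which is exactly the asymmetry a boxplot makes visible.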
However, if we turn to bar plots, we see that they are identical! A bar plot shows only the mean and an error bar, and both data sets have the same mean and standard deviation. The bar plots therefore completely hide the difference between the two data sets.
This can be problematic if, for instance, you want to run a statistical test for differences between these two groups.
First, many common statistical tests assume normally distributed data, and a bar plot will not warn you that part of your data is non-normal.
Second, if you run a test that looks only at means, you won’t be able to tell that the two groups differ with regard to their distributions.
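To illustrate the second point, the sketch below compares a statistic that looks only at means (Welch's t) with one that is sensitive to the whole distribution (the two-sample Kolmogorov-Smirnov statistic D). Both are hand-written with the standard library, the data are regenerated under the same assumed recipe, and the 5% critical value uses a common large-sample approximation; none of this is taken from the original demonstration.

```python
import math
import random
import statistics

def rescale(values, mean=100.0, sd=20.0):
    """Shift and scale values so the sample mean and SD hit the targets."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return [mean + sd * (v - m) / s for v in values]

rng = random.Random(1)
n = 1000
normal_data = rescale([rng.gauss(0, 1) for _ in range(n)])
log_data = rescale([math.log(1.0 - rng.random()) for _ in range(n)])

def welch_t(a, b):
    """Welch's t statistic: difference in means over its standard error."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

def ks_statistic(a, b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past the value x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

t = welch_t(normal_data, log_data)
d = ks_statistic(normal_data, log_data)
# Large-sample approximation of the 5% critical value for D.
d_crit = 1.36 * math.sqrt((len(normal_data) + len(log_data))
                          / (len(normal_data) * len(log_data)))

print(f"Welch t = {t:.3f}")
print(f"KS D    = {d:.3f} (approx. 5% critical value {d_crit:.3f})")
```

Because the two means are equal by construction, the t statistic is essentially zero, while D exceeds the approximate critical value: the groups differ, but only in ways a means-only test cannot see.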
Whenever plotting data with a distribution, boxplots are to be preferred over bar plots.
N.B. The present demonstration was prepared using R, a free, open-source statistics tool. For more information on installing and using R, check out Page Piccinini’s R course on her blog.