Fantastic plots and how to draw them

October 29, 2020

Information

Toma Tebaldi
Associate Research Scientist, Section of Hematology, Yale Cancer Center, Yale School of Medicine
YCCEH Seminar
October 15, 2020

ID5825


00:00The topic of today's talk is
00:04data visualization.
00:05It will be very general:
00:09how to design a strategy,
00:12some issues, and some principles to guide
00:15the visualization of our data.
00:20So data visualization is important to
00:22explore the data, and this is particularly
00:25crucial since nowadays data are becoming
00:28much more complex and much bigger,
00:31and so in general there
00:33is a rise of data science,
00:35not only in research,
00:37and not only in biological research.
00:40The second function of data
00:43visualization
00:45is to communicate the data, and
00:47that may be the most traditional one.
00:50That's what
00:54creating publication figures,
00:55for example,
00:56is about: communicating data to others,
01:01because communicating data visually is
01:04more efficient than words in general.
01:09So, regarding how to represent complex data, here
01:13I collected three
01:16general challenges and aims.
01:18Whenever you plot the data, it is
01:22important that the plots and
01:24the representations are precise,
01:27so they're truthful.
01:28That means that distortion has
01:31to be avoided as much as possible.
01:34This is not always achievable,
01:36so distortion is sometimes unavoidable.
01:39Think, for example, about
01:41when you plot 2D maps
01:44to represent 3D data.
01:46But the point is that the
01:48distortion must not convey
01:49the message of the figure,
01:51so it has to be something that is not
01:54related to the main message of the figure.
01:57Otherwise it's a problem.
01:58Then the second point is clarity:
02:01the figure must not be ambiguous.
02:06And the third one is efficiency.
02:09So every
02:11drop of ink, every pixel is precious,
02:14so each decision in making your plot,
02:17each decision on the color, on the size,
02:20on the number of layers
02:24that you plot is important.
02:28Everything has to have a purpose,
02:32so you should reduce the
02:34so-called chartjunk.
02:36At the bottom of the slide you see a
02:38quotation from Edward
02:39Tufte. I discovered, by the way,
02:43only yesterday (I
02:45never knew it, and I have been here
02:48for three years)
02:50that Edward Tufte,
02:52one of the
02:56most celebrated visualization
02:58experts, is here in New Haven.
03:03And the quotation says that with an
03:05image you have to give the viewer
03:07the greatest number of ideas in the
03:09shortest time and with the least possible
03:11ink.
03:13This is another general representation
03:15of what you should consider
03:18to make a good visualization.
03:20This is also very general;
03:23it's not only about science.
03:25Basically, the criteria are
03:27organized in four different sets:
03:29you need to represent the information,
03:32but the figure also needs
03:34to communicate a story, and
03:37that's the concept of the figure.
03:40This is connected also with the goal
03:43of the figure, so the function:
03:47what is the message that you
03:50want to display. And also the visual
03:53form is important.
03:55Obviously the weight of these four
03:58different layers is different in
04:00different applications of images,
04:02so visual form is probably more
04:05important for artistic displays,
04:07while for scientific displays
04:09information, goal, and
04:12story are probably more important.
04:14This doesn't mean that you should not
04:17consider the visual part as well.
04:20Ideally, the perfect visualization
04:22is at the center of these four sets.
04:28So this was the introduction. Now
04:31the rest of the presentation will be
04:33structured around some very concrete examples,
04:35and it's also organized in
04:37a way that is interactive,
04:39so I will show
04:41some examples of figures and I will
04:44ask you what could be wrong with each figure.
04:48Starting with this one:
04:49this is a figure that is very
04:52frequent in scientific publications.
04:55It's a bar plot, and
04:58it's actually the most frequent type of image
05:01that you can find in biomedical journals.
05:08Do you have any ideas of what
05:11could be wrong with this?
05:13Not pretty. Yes.
05:18It's lacking data.
05:21It's not showing you
05:23the data distribution.
05:24Yes, exactly.
05:26It's putting the treatment on the left,
05:29which I never like.
05:31I always want the control on the
05:33left. Oh yes, OK.
05:35Yes, that's true,
05:36so it has a lot of
05:42visual problems, but
05:44the main thing is that it doesn't
05:47show the data.
05:49That's the main drawback of this image,
05:52and, particularly in recent years,
05:54this is a trend that a lot of
05:57publishers are also requesting in images.
05:59So the principle is that you ideally need to
06:02always show the data points in every figure,
06:05because you should show the data that
06:08make up your figures, and for a
06:11bar plot this means that you have to show
06:13the data points.
06:16So here you see an example of how this
06:19bar plot can be represented with the
06:22data points. Here
06:25on the right you see the single
06:28data points and also a summary
06:30statistic that could be, for example,
06:33the mean plus or minus the standard deviation
06:35for the treatment and the control.
06:38In general, showing bar plots
06:40with only the mean and the standard
06:42deviation is a problem, and there was
06:45a publication five years ago
06:47that showed some of the issues,
06:49for example,
06:50that different data distributions
06:52can lead to the same bar plot.
06:55You see an example here.
06:57In A
06:59you see a bar plot representation
07:01of a distribution of data, and all the
07:04distributions that you see from B to E
07:06could be represented
07:08by that bar plot.
07:09So the ideal situation would be
07:11what you see here in plot
07:14B, where you have data that are
07:18symmetrically distributed.
07:19If the distribution
07:21of your real data is
07:23like this, the bar plot is less problematic.
07:26But, for example, in C you see a situation
07:28where you have an outlier, and
07:31this would mean that the
07:33supposed difference that you are
07:35showing in the bar plot is not real:
07:38it's present only because
07:39you have this outlier point,
07:41while most of the other data
07:44overlap in the two distributions.
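
The point that different distributions can hide behind the same bar can be sketched with a few made-up numbers. The three groups below are hypothetical (they are not the data from the slide) and were chosen so that they share the same mean, which is the height a bar plot would draw.

```python
from statistics import mean, median

# Hypothetical measurements, chosen so that all three groups share a mean of 10.
symmetric = [8, 9, 10, 11, 12]      # evenly spread around the center
with_outlier = [7, 7, 7, 7, 22]     # one extreme point pulls the mean up
bimodal = [5, 5, 5, 15, 15, 15]     # two separate clusters, nothing near 10

groups = {"symmetric": symmetric, "with_outlier": with_outlier, "bimodal": bimodal}

for name, values in groups.items():
    # A bar plot would draw only the mean; the raw values tell the real story.
    print(f"{name:13s} mean={mean(values):5.1f} median={median(values):5.1f} values={values}")
```

The bar heights would be identical, while the raw values show three different situations. The error bars would differ here, so this only illustrates the bar height, not the full figure from the talk.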
07:46Sometimes, as you see in D,
07:49this plot could hide some patterns in
07:52the data; that's what you see here.
07:55Do you see my cursor?
07:59Yes, OK.
08:00So this, for example, shows that
08:05the distributions that you see
08:07here are bimodal.
08:08This could be linked, for example,
08:11to replicates, for example
08:12technical replicates and
08:13biological replicates,
08:14or it could be an important
08:16property of the data.
08:18Nevertheless,
08:18it's something that you cannot
08:20see if you represent the data with
08:23a bar plot. Bar plots also
08:25hide the number of data points that are
08:27used to build
08:29the bars themselves, and so,
08:31for example, in E you see a situation
08:33where you have an unequal number of
08:35points for the black and the white
08:39bars on the left and the right.
08:42This is also a problem when you
08:45want to show paired data in bar plots.
08:49So again here you see a situation
08:53where the bar plot
08:54is the same
08:57for the situations that you see
08:59displayed in B, C, and D, and B, C, and D are
09:01very different situations. Here
09:03you could imagine, for example,
09:04that these data are obtained from single
09:06patients treated with a drug,
09:09and you measure a parameter of the patients,
09:11so the information related to each
09:14patient has to be connected. That is the
09:17meaning of the paired plot.
09:20The situation in B shows that the
09:22drug has a consistent effect on all
09:25the patients, and you can see that
09:28calculating, for each patient,
09:30the difference between the dots on
09:32the left and on the right gives rise
09:35to this plot here below,
09:38where all the differences are
09:40positive and also consistent. In
09:42C you see a situation where
09:44the drug has
09:46very different effects depending
09:48on the patient,
09:50so that the distribution of
09:52the differences is skewed.
09:53And by the way,
09:55this line represents the median
09:57difference that you see across
09:59patients for the treatment.
10:01And the third plot, D,
10:03has a composition of effects.
10:05So here you see that again
10:08the difference is bimodal.
10:09That means that there are patients
10:11that do not respond to the drug,
10:14as you see here with the
10:15horizontal lines, and some patients
10:17that responded to the drug.
10:19So the resulting distribution of the
10:21differences, as you see here, is bimodal.
10:24The problem with bar plots,
10:25and the problem
10:26also
10:27if you use bar plots with paired
10:29data, is that you don't see
10:31any of this structure,
10:33so you are losing it.
10:34So the best way is always to show
10:36the dots of your distribution,
10:38maybe together with the bar plots,
10:40and, if the data are paired, also
10:42to show the connections
10:44between the dots.
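
A minimal sketch of the paired-data argument, using hypothetical before/after values per patient (not the data from the slide): the two scenarios have identical group means, so two bars would look the same, but the per-patient differences reveal a consistent effect in one case and a responder/non-responder split in the other.

```python
from statistics import mean, median

# Hypothetical paired measurements: one (before, after) pair per patient.
consistent = [(10, 14), (12, 16), (8, 12), (11, 15), (9, 13), (10, 14)]
mixed = [(10, 10), (12, 12), (8, 8), (11, 19), (9, 17), (10, 18)]

def paired_differences(pairs):
    """Return after - before for each patient, keeping the pairing intact."""
    return [after - before for before, after in pairs]

for name, pairs in (("consistent", consistent), ("mixed", mixed)):
    before = [b for b, _ in pairs]
    after = [a for _, a in pairs]
    diffs = paired_differences(pairs)
    print(name, "group means:", mean(before), mean(after),
          "differences:", diffs, "median difference:", median(diffs))
```

In this made-up example even the median difference is the same in both scenarios, which is why plotting the individual differences (or the connecting lines) is more informative than any single summary.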
10:49There is also an issue about the
10:52choice of displaying the mean of your data,
10:56for example, versus the median,
10:58or showing the standard deviation
11:01versus the standard error of the mean.
11:04Mean and median are ways to represent
11:07a summary of the centrality of a distribution.
11:11The mean is preferable if you
11:14suppose your data are, for example,
11:17symmetrically distributed,
11:18for example if you assume that the data
11:21have a normal or Gaussian distribution,
11:23while the median
11:25is the point that represents
11:28the middle of your data,
11:30the middle of your distribution,
11:32and it's more generally applicable,
11:34independently of the shape
11:36of the distribution of the data.
11:38So here you see an example where you
11:41have four different sample populations
11:44and you plot the mean plus the standard deviation.
11:48That's the most conventional
11:50way that you see in publications:
11:52the mean plus the standard deviation.
11:55And for the median of the population, you
11:58see the single points, and the
12:00horizontal bar represents the median.
12:02So an important point about
12:06mean versus median is that the
12:08mean should be used only with
12:11symmetrical distributions;
12:13otherwise it can be misleading,
12:15while the median is more
12:17generally appropriate.
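
The difference between the two summaries can be illustrated with a small hypothetical sample in which one value is replaced by increasingly extreme outliers. The numbers are made up for illustration only.

```python
from statistics import mean, median

# Hypothetical sample; the last value is replaced by increasingly extreme outliers.
base = [9, 10, 10, 11, 12]

for outlier in (12, 50, 500):
    sample = base[:-1] + [outlier]
    print(f"outlier={outlier:4d} mean={mean(sample):6.1f} median={median(sample):4.1f}")
```

The mean follows the outlier, while the median of this sample stays at 10 regardless of how extreme the last value becomes.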
12:19When you have an outlier like that,
12:21would you always recommend the median?
12:24Sorry? When you have an outlier
12:27like in the third group there.
12:29Yeah, then it makes more sense to
12:31use the median.
12:33Yeah, another point
12:34is that the median is more robust
12:37with data with outliers,
12:38much more robust with outliers,
12:40and the mean is not,
12:42so the presence of an outlier, as
12:44you see here in C, can shift the mean a lot,
12:48while the median is affected,
12:50but not so much,
12:53especially by the magnitude
12:55of the outlier,
12:56I would say. So
12:58Toma, a question, right?
12:59So, also,
13:02knowing there's a difference
13:03between mean and median,
13:05one of the things I heard
13:07(of course, I haven't looked
13:09into this deeply enough myself)
13:11is that for the median the distribution,
13:13unlike for the mean, does not necessarily follow
13:15a Gaussian or normal distribution,
13:17so that from a statistical point of view
13:20it is going to be a little hard to calculate
13:23certain significance, etc.,
13:24based on median data.
13:26Is that true?
13:27Or is it simply a misnomer?
13:30How to calculate the
13:32significance of differences,
13:34that's a different issue.
13:36It depends on the approach.
13:39If you choose parametric tests,
13:41such as the t-test or the ANOVA,
13:44those tests assume that the
13:47distribution is Gaussian, is normal.
13:49Yeah, so you need to be careful.
13:52Usually, if it is a repeated measure,
13:55so if you're testing a repeated
13:56measure, you assume the error
13:59is distributed in a Gaussian way,
14:01but that is not always the case.
14:03For example,
14:04if you're comparing two populations of
14:07genes with a signal for each gene,
14:09you just have to check it.
14:11Well, so this is something that I
14:13think will be particularly important
14:15for experimental scientists, right?
14:17Because, you know, as experimentalists,
14:18when we are trained, we know, OK,
14:21when we design an experiment,
14:22we do several replicates so we can
14:24draw an error bar, without thinking
14:26why and how to deal with it.
14:28And if you go to a statistician,
14:30they will tell you, oh look,
14:32if you're going to use the t-test,
14:35you have to show me first that this is
14:37actually largely a normal distribution
14:39before you can actually use the t-test.
14:42Whereas for the vast majority
14:43of people in the lab,
14:45that's not how they will
14:46think about it in the first place,
14:49and they are also not trained enough
14:51to know how to prove
14:53or disprove that that's the case.
14:55So what would you suggest,
14:57especially when we're doing
14:58experiments where you cannot do
15:00200 replicates for each experiment?
15:02What would be a good
15:04approach in that regard?
15:06Yeah, so there is a tradeoff between
15:09the ideal situation, where
15:11you would always have
15:14enough data points so that you can
15:16understand the shape of the distribution,
15:19and the real-case scenario, where you
15:22can do only as many replicates as you can,
15:24and so usually you have to assume
15:27that the distribution is normal.
15:33Ideally, you should always check it.
15:36And again, if we are repeating measures
15:39and you are collecting a measure
15:41of the same thing in a repeated
15:44way, you can assume that
15:46if the error is stochastic,
15:48it should be normally distributed.
15:50So you assume that the distribution of
15:52the error is Gaussian, and it makes sense.
15:55But, for example, in other situations
15:57where you have a lot of measurements,
16:00and measurements of different entities,
16:02for example the expression of
16:04different genes, when we're doing
16:07a comparison of genes,
16:08then this assumption is less probable,
16:11is less likely,
16:12and you should have enough data points
16:15so that you can switch from parametric
16:18tests to nonparametric ones,
16:23which don't make assumptions
16:25on the underlying distribution.
16:26These are, for example,
16:27the Wilcoxon test or the
16:30Mann-Whitney test.
16:31And the problem is that you need more
16:34replicates, because if n,
16:36the sample size, is less than five,
16:39you cannot reach statistical
16:43significance as it is accepted, below 0.05,
16:47but it's usually the more correct way.
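
Why small samples cannot reach significance with rank-based tests can be worked out by counting. The sketch below assumes exact two-sided tests without ties; the function names are hypothetical helpers, not part of any library. The exact threshold depends on the test and the design, so the numbers are a guide to the order of magnitude rather than a restatement of the figure given in the talk.

```python
from math import comb

def min_p_mann_whitney(n1, n2):
    """Smallest two-sided exact p-value for two independent groups (no ties).

    It is reached when the groups do not overlap at all: only 2 of the
    comb(n1 + n2, n1) equally likely rank arrangements are that extreme.
    """
    return 2 / comb(n1 + n2, n1)

def min_p_signed_rank(n):
    """Smallest two-sided exact p-value for n paired differences (no ties or zeros).

    It is reached when all differences have the same sign: 2 of the 2**n
    equally likely sign patterns.
    """
    return 2 / 2 ** n

for n in range(3, 8):
    print(f"n={n}: Mann-Whitney (n per group) {min_p_mann_whitney(n, n):.4f}, "
          f"signed-rank (n pairs) {min_p_signed_rank(n):.4f}")
```

Under these assumptions, three replicates per group can never give p below 0.05 with the Mann-Whitney test, and the paired signed-rank test needs at least six pairs, however large the effect is.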
16:53But the standard is not to use that,
16:55and I remember there was a case
16:57where a paper was in review.
16:59It was from
17:02Young bean, and I remember we performed
17:05the Wilcoxon test and the reviewers
17:07asked why we didn't do the parametric
17:10test, so they asked for the opposite.
17:13They asked us to go against the
17:15ideal situation.
17:18I think this is very helpful.
17:20I think it's really,
17:21you know, telling,
17:22especially for people who
17:24are not familiar with the
17:26t-test and also the Wilcoxon test.
17:29I'd really suggest
17:30you look into that.
17:32These things can be very helpful.
17:33Yeah, and obviously
17:35it's important, when you plan,
17:37if you can, to have enough data
17:40points to perform a nonparametric test.
17:43In the high-throughput
17:44experiments that we see now,
17:45for example single cell, that's
17:47not a problem anymore, because
17:49you usually have a lot of data
17:51points, and so that's less of a
17:53problem than it sometimes was
17:55in the past, because
17:58the number of data points is increasing.
18:00And just
18:01a generic comment. A
18:04lot of our group are
18:10hematology researchers,
18:11and neither Blood nor Blood Advances requires
18:16investigators in their papers
18:17to show all the
18:19data points. And now,
18:21I'm on the publication committee.
18:22We've actually talked about this,
18:24but we go by the Journal of Cell Biology
18:28instructions to authors and prep for figures,
18:30and that is a Rockefeller Press
18:32publication. And they
18:34haven't. So they have genome research
18:36and germ cell bio Med and stuff,
18:38so they haven't come around to making
18:41people show all their dots etc.
18:43But a number of journals, as you know,
18:46half like JC I you know JC AI
18:50advances etc. They might not
18:51even review your paper if
18:53you show, for instance,
18:54your plots on the left here.
18:56Well, it might, you know,
18:57might not even go out
18:59for review. The pre-reviewers will say,
19:01you know, your figures are inadequate
19:03for our instructions to authors, etc.
19:05So I think some journals are
19:07coming around to: this is the way
19:09we really want to see the data.
19:12Yeah, I think there is a shift
19:15in the paradigm, let's say,
19:16and it will take years.
19:18But, for example, I have a slide here,
19:21and this is from my experience:
19:24for example,
19:25all the family of the Nature journals
19:28already have these policies for figures.
19:31So this is something I received after
19:34the review of a paper, as editorial
19:38guidelines, and because of these
19:41policies I had to change a
19:44lot of figures.
19:47One of the
19:49policies, as you see here,
19:50the last one, is that for sample
19:52sizes that are less than 10
19:54they want you to plot
19:57the individual data points, and
19:58so they don't accept bar graphs
20:00anymore.
20:03And then, for example,
20:05if you have some statistics, such as
20:08error bars, with fewer than three replicates,
20:11you have to
20:13remove them and you have
20:15to show the data without the
20:18statistics, without the error bars.
20:20Then there is also a point that
20:23usually is not satisfied.
20:25When you plot some statistical
20:28significance values,
20:29they don't accept
20:31the stars anymore.
20:33You have to provide the
20:35precise P value in the figure.
20:38That means that if you have some stars,
20:40you have to
20:42convert the stars to the precise P
20:45value before publishing, and
20:47then you also have to provide the
20:49precise sample size for each of your bars.
20:52For example,
20:53I mean,
20:53I think in the past it was enough
20:56to provide a range, like from
20:58three to six replicates,
21:00but now they really want the number
21:04for each population,
21:05for each sample that you have.
21:08So these,
21:10in my experience, were things
21:12that I had to provide,
21:15but after the review, so it was
21:18at the editorial
21:22stage of acceptance of the paper,
21:24and I think this is true now for all
21:28the family of the Nature
21:32journals.
21:34Can I add something?
21:36Although this is only for
21:38publication, that's the goal of publication,
21:39it's important that we start
21:42practicing all these rules in
21:44our daily life, because it's so
21:46painful to have to do this
21:49when you're trying to get
21:51the figures into the journal.
21:53It's a lot easier to do it while you're
21:56making the figures in real life.
21:59Yeah, so obviously it
22:01saves work to do it before.
22:02Yeah, it saves work, because otherwise
22:05you have to redo all the figures. So
22:10yeah, I also echo that,
22:11and also just want to say that, you know,
22:14I used to just use Excel to plot things.
22:17But since many of my lab members
22:19started to use GraphPad Prism to plot,
22:22that has made a huge difference in
22:24converting between different types
22:25of plots, for this kind of thing.
22:28If you have a bar graph
22:30in that software,
22:31then you can very easily change that to a
22:34bar graph with the individual dots displayed.
22:36So it's very easy to work with.
22:40Yeah, about that, I also have something
22:42at the end of the presentation.
22:44Basically there are a lot of tools now,
22:46more or less commercial, but
22:49they are readily available,
22:51and they can switch between many different formats
22:55starting with the same initial data,
22:58basically formatted as a table,
23:01so that from the same table you can switch
23:03to many different visualizations.
23:06So that's true,
23:08and it's probably easier now to make these
23:11plots with single dots than it was in the past.
23:15Without respect.
23:18OK, so that was the main point of this part.
23:22I had a part on the standard
23:24deviation versus the standard error.
23:26That's another issue, because the
23:28standard error is basically the
23:29standard deviation divided by the square
23:32root of the number of experiments,
23:34and usually the standard
23:36error is displayed.
23:37But you have to be careful, because it's
23:40a measure that tends to go to zero
23:43just because you increase the number
23:45of replicates or the number of points.
23:48So you see an example here where
23:50it seems, by plotting the standard
23:52error, that the black bar and the
23:55white bar have the same
23:58spread of the data.
23:59But if you look at the standard
24:02deviation, you see that this is
24:04an effect of the fact that
24:06the black bar has a higher spread,
24:08but also more points,
24:10and that's why the standard
24:12error seems the same.
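
The relationship described above, standard error = standard deviation / sqrt(n), can be checked numerically. The two groups below are hypothetical and constructed so that the second has twice the spread and four times the points; `sem` is a small helper defined here, not a library function.

```python
from math import sqrt
from statistics import stdev

def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))

# Hypothetical groups centered on 10: the second one is twice as spread out
# but has four times as many points.
d = sqrt(20)
narrow_small = [8, 12, 8, 12]          # n = 4
wide_large = [10 - d, 10 + d] * 8      # n = 16

for name, values in (("narrow, n=4", narrow_small), ("wide, n=16", wide_large)):
    print(f"{name:12s} SD={stdev(values):.3f} SEM={sem(values):.3f}")
```

The two standard errors come out equal even though one group is twice as variable, which is the situation shown with the black and white bars.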
24:16So that's another issue.
24:18Obviously, for publication the
24:20standard error of the mean is preferred,
24:23because it usually gives the impression
24:26of the data being less spread out.
24:30But especially with different
24:31numbers of samples
24:33in different bars,
24:35this could be misleading.
24:40And all these issues were presented
24:42in this paper, published
24:44five years ago in PLOS Biology.
24:49I will skip this,
24:50since we will touch on it later,
24:52but an alternative solution, if
24:54you have enough data points,
24:56I would say more than 10,
24:59an alternative to
25:01showing bar plots
25:03is to show the distribution of
25:05the data as a box-and-whisker plot,
25:07as you see here. I have some slides
25:11later with more details on this.
25:14OK, so this is the next example.
25:17It's a pie chart with the
25:19usage of different browsers,
25:22so this is an image from outside science.
25:26This is a classic example
25:29in visualization lessons.
25:31So what could be wrong with this?
25:39There's no n. Yeah, so that's, yes,
25:44there is no n, so
25:47you don't know how many,
25:50how many data points were used in
25:52order to build the frequencies.
25:55Obviously pie charts are used to display
25:59frequencies and proportions of some
26:02classes that sum up to 100 or to one.
26:05The main problem is that,
26:07so the idea is that
26:10you should avoid pie charts.
26:12Displaying
26:15information on a proportion or
26:17a percentage as a pie chart is
26:21not the best choice,
26:23because it was shown that humans
26:27are very bad at reading angles,
26:30so we're not very
26:32precise in understanding differences between
26:35angles, and so between the sizes of the
26:39slices of the pie, and so usually if you
26:44convert the pie chart into a bar plot
26:47the information is much clearer.
26:49It's true that the pie
26:51chart is more aesthetically
26:53appealing, but the bar plot
26:55is, in any circumstance,
26:57usually more effective in displaying,
26:59for example, differences in the
27:02usage of these browsers.
27:04So this has been a long-standing issue, and
27:07in many presentations there is always
27:10this suggestion to avoid pie charts altogether.
27:13There are also some examples of this.
27:15These are three pie charts, and you can see
27:19that they are different from each other,
27:22but it's very difficult to
27:24understand the difference.
27:25The difference is in the size
27:28of the slices of the three pies,
27:31but it's very difficult,
27:32for example, to understand in each
27:35pie which one is the largest slice
27:38and to draw comparisons. It's much
27:41easier to understand these issues,
27:44such as which slice is larger, if the information is
27:48displayed not as pie charts
27:51but as bar charts.
27:54On the web
27:56I also found this provocative
27:59label of pie charts as liars.
28:03So in general it would be better to avoid
28:06displaying information as a pie chart
28:08and prefer a bar chart instead
28:10to show the same information.
28:14OK, so that was faster.
28:16This is another example.
28:18What could be wrong with this plot?
28:20Again, we have a treatment,
28:21we have a control.
28:22This time we see the data points.
28:26The scale is wrong, so it hides
28:29the distribution at the lower end.
28:31Yes, exactly, so this is a case where
28:34most of the data are compressed,
28:37since they have very different magnitudes.
28:39Most of the data are compressed
28:42here in a very small part of the
28:46plot, and we cannot really understand
28:48how they are distributed,
28:51because most of the plot is devoted
28:53to these two outliers.
28:56So this is an issue with measures
28:58that have different magnitudes.
29:00In my experience it happens,
29:03for example, in gene expression measurements,
29:07because they can vary,
29:08especially with sequencing,
29:10over four to five
29:13orders of magnitude. The main way to solve this
29:16issue is to log transform the data,
29:19so instead of plotting on a
29:21linear scale, to log transform
29:24the scale of the data. This
29:26reduces the distance
29:28between these two points,
29:30the outliers,
29:31and allows you to see also the
29:34distribution of the points
29:36that here seem all compressed.
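
The effect of the transformation can be quantified on a small set of hypothetical expression values (made up for illustration, and strictly positive, since the logarithm is not defined for zero or negative values). The helper `share_of_axis` is an illustrative function that measures how much of the axis the bulk of the data occupies.

```python
from math import log10

# Hypothetical expression values spanning about five orders of magnitude.
values = [3, 8, 15, 40, 90, 250, 120000, 300000]

def share_of_axis(points, lo, hi):
    """Fraction of the axis length occupied by the given points."""
    return (max(points) - min(points)) / (hi - lo)

bulk = values[:6]  # everything except the two extreme points
linear_share = share_of_axis(bulk, min(values), max(values))

logged = [log10(v) for v in values]
log_share = share_of_axis(logged[:6], min(logged), max(logged))

print(f"linear scale: bulk of the data uses {linear_share:.2%} of the axis")
print(f"log10 scale:  bulk of the data uses {log_share:.2%} of the axis")
```

On the linear axis the six smaller values share a tiny sliver of the plot, while on the log axis they occupy a substantial part of it.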
29:40So usually log transformation allows you to
29:43capture information on the differences
29:45between your points more clearly,
29:48not in all cases, but in some
29:50cases, than displaying the
29:52information on a linear scale,
29:55especially when you have a large range
29:58between your minimal and maximal
30:00measurements. An alternative way
30:04is to use panel breaks.
30:07Personally I prefer log
30:10transformation over panel breaks, because
30:13it is mathematically more
30:16elegant,
30:18but there are situations where
30:20you can choose. This is an example.
30:24You have a bar chart.
30:26You have a huge difference between
30:28the measurements of A to D and E and F.
30:31This is how you solve the problem by
30:35introducing a break in your panel,
30:38so from 25 to 200 to 210, and this is the
30:41equivalent solution by log transformation.
30:44As you see,
30:46the two solutions
30:49give a similar result.
30:51But here you insert a manual break in
30:54the data, and this could be misleading.
30:57Here you solve the issue by log
31:00transforming all the measurements.
31:02So this is an advantage,
31:04because it affects all the measurements,
31:07while this panel break
31:09affects only, for example,
31:11these two bars and could distort the data.
31:18Another scenario where you should
31:22consider log transformation is this.
31:26This could be a plot that shows gene
31:30expression levels from RNA-seq for a
31:33population of genes.
31:35Each gene is a dot,
31:37and you see that
31:39there is a wide range,
31:41and you see that there are outliers, like,
31:44for example, genes of ribosomal proteins;
31:46histones usually
31:49are in this part of the plot,
31:52but most of the genes, 90% of
31:54your genes, are in this part of the
31:56plot and you cannot really see them.
32:01You cannot really inspect them,
32:03because most of the plot is
32:05dedicated to a few outliers.
32:07So again, here is a situation
32:09where you can log transform
32:11both coordinates.
32:12Let's say that here is the
32:15control and this is the treatment;
32:17this will allow you to see in more
32:20detail the differences in the
32:22expression of the bulk of your genes.
32:28In a situation like this,
32:30you should also consider,
32:32if you're interested,
32:34for example, in showing differences in
32:37expression between a control,
32:39for example R1, and
32:41the treatment, R2,
32:43that you also have the possibility to show,
32:48as the Y axis,
32:50the differences in the log values.
32:52So this representation
32:53here is the same as this,
32:56but it maximizes the visualization
32:58of the differences in the
33:00expression levels of genes.
33:01This is something that you
33:04find called an MA plot.
33:07It was introduced with the
33:09analysis of microarray data,
33:11but you can find it also
33:13with sequencing data.
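
The coordinates of an MA plot can be sketched as follows. The talk states that the Y axis is the difference of the log values; using the average of the two log values as the X axis is the usual convention for this plot and is assumed here. The expression values and the function name `ma_values` are hypothetical.

```python
from math import log2

def ma_values(control, treatment):
    """Return (A, M) per gene: A = mean log2 expression, M = log2 fold change."""
    pairs = []
    for r1, r2 in zip(control, treatment):
        a = (log2(r1) + log2(r2)) / 2   # X axis: average log expression
        m = log2(r2) - log2(r1)         # Y axis: difference of the log values
        pairs.append((a, m))
    return pairs

# Hypothetical expression values for four genes (R1 = control, R2 = treatment).
r1 = [4, 64, 1024, 16]
r2 = [4, 128, 1024, 4]

for (a, m), x, y in zip(ma_values(r1, r2), r1, r2):
    print(f"R1={x:5d} R2={y:5d} A={a:5.2f} M={m:+5.2f}")
```

Genes that do not change sit at M = 0 whatever their expression level, so the vertical axis is devoted entirely to the differences between conditions.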
33:15Sometimes these two different
33:16visualizations are used
33:17depending on the aim of the figure,
33:20so you will find this one
33:21especially when the message of the
33:23figure is that you don't see big
33:25differences between the two conditions,
33:27while if the message is that you find big
33:29differences between the two conditions,
33:31you will mostly find this visualization.
33:35Here I would just point out that
33:37in any sequencing experiment
33:40you will probably never find any gene
33:44that is in this area, because
33:48for most of the genes
33:50the main
33:51determinant
33:53is the basal expression level.
33:55Usually your perturbations do not
33:57affect the expression of a gene so much
34:00that the gene ends up in this
34:03area of the plot, at the edges.
34:06And that's why this visualization
34:08is much more efficient in capturing
34:10the expression differences,
34:12because they are scaled to
34:15the expression at baseline.
34:19OK, so now I have a section, I don't
34:21know if I have time, a section
34:24about how to display distributions.
34:29So let's say
34:29you still have 15
34:32minutes, and if we go a little over, that's
34:34OK. OK, so when you have to
34:37represent the distribution of data,
34:39you have many choices.
34:40The histogram is one of the most used choices.
34:44It has the advantage that it can present
34:48in detail the shape of the
34:51distribution of your data.
34:53Basically, you have a variable of
34:56interest, usually a continuous
34:58variable, and you want to show
35:01how this variable is distributed.
35:03So you divide the range of the values into
35:06some bins and then you count the number
35:09of points that fall inside each bin.
35:12The issue with histograms is
35:14that you should be careful, when
35:17building histograms and when looking
35:19at histograms, because there are
35:22some arbitrary parameters
35:24in building a histogram,
35:26mainly the choice of the bin size.
35:30So this is an example where the same
35:33distribution of data, that is the
35:35distribution of the price of Airbnb
35:38apartments in a French city, has been
35:40binned in two different ways.
35:43So here is the price, and here the bin size,
35:47so the size of each bin, is 10
35:52dollars,
35:53while here on the right it is
35:57$2. So you can see that using more
36:00granular bins allows you to see
36:04the presence of some accumulations
36:06in your data that you cannot really
36:09see with the larger bin size,
36:11and this could be important because
36:14these are probably
36:16accumulations of prices that are
36:19due to the fact that they are prices
36:22that are commonly used
36:23by many different Airbnbs,
36:26because they are multiples of 50 or 100,
36:29for example.
36:30But the fact is that depending on
36:33the choice of the bin,
36:34you see a different story.
36:38And so you should always be
36:42careful to select a bin size
36:45that doesn't distort the data too much.
36:49There are also software tools
36:51that calculate, depending on
36:53your data, depending on where
36:56your points are placed, the best
36:58size of the bins so that you
37:02reduce the distortion of your data.
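To make the effect of the bin size concrete, here is a small sketch that bins the same values with a bin size of 10 and of 2, and then computes one data-driven bin width. The Freedman-Diaconis rule used here is only one commonly used rule, chosen as an assumption; the talk does not say which rule the software tools apply. The prices are made up for illustration.

```python
import math
import statistics

def histogram_counts(values, bin_width, start=0):
    """Count the values falling in each bin [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = math.floor((v - start) / bin_width)
        left_edge = start + k * bin_width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    return dict(sorted(counts.items()))

def freedman_diaconis_width(values):
    """Bin width = 2 * IQR / n^(1/3) (Freedman-Diaconis rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2 * (q3 - q1) / len(values) ** (1 / 3)

# Hypothetical prices with accumulations at round values (50 and 60).
prices = [48, 50, 50, 50, 52, 55, 57, 58, 59, 60, 60, 60, 61, 63]
coarse = histogram_counts(prices, bin_width=10, start=40)
fine = histogram_counts(prices, bin_width=2, start=40)
print(coarse)
print(fine)
print(round(freedman_diaconis_width(prices), 2))
```

With the coarse bins the accumulations at 50 and 60 are absorbed into two large bars, while the fine bins show them as separate spikes.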
37:09An alternative way to represent
37:10a distribution is to use a density plot.
37:13So a density plot is basically
37:15a smoothing of a histogram.
37:18Here you select the bin size, and here
37:21you smooth the shape of the distribution
37:23so that you have a continuous function.
37:27This is graphically nice,
37:30and it allows you to compare,
37:31for example, the distributions of
37:33two variables, as you see here
37:35in green and in violet,
37:37and the advantage is that you can also see
37:40complex shapes of the distribution,
37:42for example here the bimodality,
37:44or here the presence of this shoulder
37:46of the distribution.
37:48The pitfall is similar to the histogram,
37:51so you should always be careful
37:53in selecting
37:54how much you smooth the distribution.
37:56And here you see an example.
37:58So these are the
37:59single
38:02points that were used
38:04in order to build the distribution.
38:07They were randomly chosen from a normal
38:10distribution, and you can see that the
38:12problem is similar to the bin size.
38:14So here you have to select basically
38:19a wavelength in order to approximate
38:21the function with a curve,
38:23and depending on the wavelength,
38:26the resolution of the
38:27wavelength that you choose,
38:29the result is different.
38:31So you could have this kind of plot
38:34that seems to show a lot of local peaks,
38:38but by smoothing more you have
38:41instead the normal distribution
38:43from which you drew the data.
38:46So there is a balance. There is a pitfall
38:48in choosing bins that are too
38:51large, or here excessive smoothing,
38:53because this oversimplifies
38:54the original distribution,
38:55but on the other side,
38:57if you take a resolution that is too small,
39:01too granular,
39:02you can obtain strange effects.
39:04So you could see, for example,
39:06peaks that depend on the
39:09extraction of random numbers.
39:11Again,
39:11also in this case there is software
39:15that, given the original data,
39:18your original raw data, can calculate the
39:22optimal smoothing wavelength in order
39:26to avoid distortions based on your data.
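The smoothing trade-off can be sketched with a plain Gaussian kernel density estimate, where the smoothing wavelength of the talk corresponds to what is usually called the bandwidth. The sketch draws points from a normal distribution, smooths them once with a deliberately tiny bandwidth and once with a data-driven one, and counts the local peaks. Silverman's rule of thumb is used as one common automatic choice; that choice, the sample size and the helper names are assumptions made for illustration.

```python
import math
import random
import statistics

def gaussian_kde(points, bandwidth):
    """Return a density function built as an average of Gaussian bumps."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))
    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )
    return density

def silverman_bandwidth(points):
    """Rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(points, n=4)
    spread = min(statistics.stdev(points), (q3 - q1) / 1.34)
    return 0.9 * spread * len(points) ** (-1 / 5)

def count_local_peaks(density, lo, hi, steps=400):
    """Count strict local maxima of the density on a regular grid."""
    ys = [density(lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(50)]
rough = gaussian_kde(sample, bandwidth=0.05)  # too granular
smooth = gaussian_kde(sample, bandwidth=silverman_bandwidth(sample))
peaks_rough = count_local_peaks(rough, -4, 4)
peaks_smooth = count_local_peaks(smooth, -4, 4)
print(peaks_rough, peaks_smooth)
```

The under-smoothed curve shows many peaks that only reflect the random draw, while the curve with the larger bandwidth is much closer to the normal distribution the points came from.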
39:29A compact way to represent a
39:31distribution is the box whisker plot,
39:34and here you can see how a box
39:36whisker plot can be obtained
39:39from this distribution of 20 points.
39:41So basically the box whisker plot
39:43represents as a box 50% of the data
39:46of the distribution.
39:48Usually you have a central line
39:50that is the median.
39:51It's important:
39:52not the mean,
39:53in the box whisker it is always the median.
39:56These are the first quartile
39:58and the third quartile,
40:00so the 25th percentile of the data and the
40:0375th percentile of the data.
40:05So in the box you have 50%
40:07of your data, the central
40:0950% of your data.
40:11Then you have the whiskers.
40:14The standard definition of the
40:17whisker length is that they are as
40:20long as the interquartile range,
40:22that's the distance between Q1 and Q3, times 1.5.
40:27And you see these as the whiskers
40:30of your plot.
40:32So these collect most of the
40:34distribution of your data.
40:36The data that are outside the whiskers
40:38are considered to be outliers.
40:40For example,
40:41here you see these three points.
40:44They are outside the whiskers,
40:46and so these usually are individually
40:49displayed in the box whisker plot and are
40:52considered to be outliers according
40:54to this definition of the whiskers.
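The anatomy just described can be written down in a few lines. This is a sketch under assumptions: quartiles are computed with Python's `statistics.quantiles` using the "inclusive" method (plotting programs differ slightly in how they compute quartiles), and each whisker is drawn to the most extreme data point that lies within 1.5 times the IQR from the box. The data values are made up.

```python
import statistics

def box_whisker_stats(values, k=1.5):
    """Quartiles, whisker ends and outliers for the default 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # most extreme point within the fence
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = box_whisker_stats(values)
print(stats)
```

In this example 1.5 times the IQR is 6.75, but the whiskers stop at 1 and 9 because there are no data beyond those points inside the fences, so the two whiskers have different lengths; the value 30 is displayed as an outlier.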
40:57Yes,
40:58if you wanted
40:59to make these plots, yeah,
41:00is there an easy way to do it,
41:03or do you, like, you personally,
41:05just do it in R or something?
41:08Well, box plots, I don't think
41:10you can do them with Excel,
41:13but for example with Prism
41:15or Origin you can totally.
41:20I think the only limitation is
41:22Excel, but to be honest, I didn't
41:24check the last version of Excel.
41:27Right, for us to think about,
41:28you know, we have our data and there
41:31are many different ways of plotting it,
41:33but it sounds like Prism might be the
41:35way to go to try to do it, unless
41:37you're somebody like you
41:38who knows how to put it into R.
41:41Yes, probably. So Prism is one.
41:45I give you another option that
41:47I usually use,
41:48Origin, with respect to Prism.
41:50I think it has
41:52more power,
41:54so there are more things that you
41:56can do with Origin than with Prism,
41:59I think because it was designed
42:01for the physics community,
42:04but the tradeoff is always complexity.
42:06So Prism has less power,
42:09fewer choices, but it's easier
42:10to use than Origin.
42:13But both share the same philosophy,
42:15so that you need to provide the data
42:18in a spreadsheet format, and they are
42:21available in the software library at.
42:24OK, thank you. Toma, can you say
42:27the name of the other, not Prism
42:29but the other program?
42:31I have a slide afterwards where I show
42:34it. It's Origin. OK, thanks, yeah.
42:37I have a question. So,
42:38so my initial understanding is that
42:40the whisker length represents the
42:4295th percentile of the data range.
42:45But here it says the whisker
42:47length is 1.5 times the IQR length.
42:50But if that's the case,
42:52why would the whisker on the left side
42:54and the whisker on the right side
42:57have different lengths?
43:02Um, so that could be, for example,
43:06because here you have, so that's
43:09the maximal length of the whisker.
43:12But if the minimum of your
43:14data is here,
43:17the whisker stops. So that's why.
43:19So, I see, here you have outliers, and
43:22so the whisker can extend to the maximum
43:25point, that is 1.5 times this measure.
43:27But if, before the maximal distance,
43:30here you meet the minimal point,
43:32the whisker ends there,
43:34so that's why.
43:36OK, I see. It's also true that this
43:39whisker definition can be customized,
43:41so this is the default interpretation.
43:43I don't know who decided this.
43:46I don't have the original publication,
43:48but you can choose to define the whiskers
43:50differently, so that's why,
43:52also in a journal
43:54paper, when you do a box plot,
43:56you always have to specify in the statistical
43:59methods how you designed your box plot.
44:02So you have to state how,
44:04for example, the whiskers were defined.
44:07Because sometimes it's true that,
44:08for example,
44:09the whiskers can represent like
44:1195% of the distribution.
44:13Right, so this is just the default,
44:15but it can be customized,
44:17so there are different choices.
44:21I have a question regarding the
44:24distributions again, maybe it's a
44:26continuation of what you just said.
44:30Some software provides a default value
44:32for the bin size and for the smoothing
44:36and all that, say like MATLAB, which
44:38I've been trying to put this into.
44:41How reliable do you think that is,
44:44the default values, and what would you suggest?
44:48Most of the time,
44:49most of the time, so I don't,
44:52I don't have experience with MATLAB,
44:53but probably it is the
44:56same as in R. So most of the time
44:58there is a sort of optimization there,
45:01so most of the time it is fine. But
45:07sometimes, especially if you
45:09have a distribution of data
45:11but you also have a point
45:14with an accumulation of data,
45:16you could have problems in the
45:20plot.
45:22But I don't have an example.
45:25OK, so like 95% of the time I'm OK
45:30with the solution that is
45:33provided by the MATLAB or R built-in tool.
45:38For example, sometimes when you compare
45:40two distributions with a different size,
45:42with a different number of points,
45:44that can be a problem,
45:48because sometimes,
45:50if you're comparing for example a
45:52distribution of 10 points with
45:54a distribution of 1000 points,
45:56adopting the same wavelength
45:58could be a problem,
45:59and so you need to change it manually.
46:03So, yes,
46:05that probably could be
46:08a practical example of when it's not ideal,
46:12because the software,
46:13if you are trying to compare
46:1610 points versus 1000 points,
46:18tries to define a common wavelength,
46:21but sometimes this leads
46:23to distorted images.
46:25I don't have an example to show.
46:29That's good enough, thank you.
46:32And well, I can leave the note.
46:35Sometimes you can also see a
46:37notch inside your box whisker.
46:39So the notch is this
46:42feature that represents a measure
46:44of uncertainty for the median.
46:47So sometimes it is useful to have
46:49these, because if you are comparing
46:51a lot of box whisker plots you can
46:54look at the uncertainty as if it were a
46:56sort of standard error of the median.
46:59And so if two box whisker plots overlap in
47:02their notches,
47:03it probably means that the
47:05medians are not statistically
47:08significantly different.
47:09This could be a way to
47:12use the notch to help
47:15the interpretation of the data
47:16and the comparison of different
47:18distributions. And that's why
47:19box whisker plots are so popular,
47:21because they allow you to represent
47:24a distribution of data in
47:25a very compact format.
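A sketch of how a notch can be computed and compared follows. It assumes the commonly used approximation median ± 1.57 × IQR / √n; the exact constant and the quartile method vary between plotting programs, and the talk does not specify them. The three groups are made-up values.

```python
import math
import statistics

def median_notch(values, c=1.57):
    """Approximate uncertainty interval around the median (the notch)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = c * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True when the two notch intervals share at least one value."""
    (lo_a, hi_a), (lo_b, hi_b) = median_notch(a), median_notch(b)
    return lo_a <= hi_b and lo_b <= hi_a

group_a = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.1, 6.4, 6.8, 7.0]
group_b = [4.5, 4.9, 5.3, 5.6, 5.8, 6.0, 6.3, 6.7, 7.1, 7.4]
group_c = [9.0, 9.4, 9.9, 10.1, 10.5, 10.8, 11.0, 11.3, 11.6, 12.0]
print(notches_overlap(group_a, group_b), notches_overlap(group_a, group_c))
```

Overlapping notches, as for groups A and B, suggest that the medians are probably not significantly different; this is a visual guide and not a replacement for a statistical test.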
47:27This is another display of the
47:29anatomy of the box whisker,
47:31but it doesn't add anything
47:33to what I said before.
47:35So here is an example where box whisker
47:38plots are used in order to compare
47:42four different distributions.
47:43So the advantage is that they allow
47:46easy comparison, so it's easy to
47:49compare the distributions of A, B, C and D.
47:52The problem they can have is that they
47:55hide the shape of the distribution.
47:58And also usually they hide the
48:01number of points that were used
48:03to build the box whisker.
48:05Sometimes you can encode the number
48:08of points so the cardinality the
48:10size of the distribution as the
48:13width of the box whisker,
48:15but it's rarely used because it's not
48:18very visually beautiful, I would say.
48:21So one solution could be to
48:23overlay on the box whisker
48:26plot a jitter plot.
48:27So a jitter plot represents the single
48:30points that were used to build
48:32the box whisker plot.
48:34So while on the Y axis there
48:36are the precise values, on the X
48:39axis they are randomly
48:43placed. Let's say there are
48:45also methods that do not place
48:47these points randomly but in a
48:49pseudo-random way that captures
48:50the shape of the distribution,
48:52and I think that that kind of plot
48:55is also present in GraphPad
48:58Prism. So the advantage of this
49:00is that you can see, for example,
49:03that the distribution is bimodal,
49:06because you see that there are
49:08these high densities of points, and
49:10the box whisker plot cannot capture
49:12that: you cannot see from a box
49:14whisker plot that a distribution is
49:17bimodal. And for example here you
49:19can see that this box whisker
49:21plot is based on
49:24much less data than the others.
49:26So a solution for this
49:29is to enclose the box whisker
49:32plot in a violin plot.
49:35So violin plot representations
49:37like these allow you to see the
49:40same information as a box whisker,
49:43but also information on the shape
49:46of the distribution. Basically,
49:49in a violin plot
49:50you add a density plot that is
49:54parallel to the vertical axis.
49:58And here, by using a violin plot, you can see
50:00that this,
50:01that this distribution
50:03has one peak. This one is bimodal.
50:08And you can also add the number here,
50:11instead of encoding the number,
50:13the size of the distribution, as
50:15the width of the distribution.
50:19So this is an example of
50:21comparisons between different
50:23ways to show a distribution.
50:25Here you see the histogram with the
50:28corresponding density plot,
50:29the same distribution
50:31visualized as a box plot,
50:33and visualized as a violin plot that
50:36captures both the features of a box
50:39plot and the density distribution.
50:42And this is for a normal distribution.
50:44This is for a bimodal distribution, where you
50:47can see that the box plot doesn't capture,
50:50so the box plot can capture the fact
50:53that the data are not symmetrical, and
50:55you see for example that the distance
50:58from this edge of the box
51:01to the median is much more than this one.
51:04So the box whisker is good at capturing
51:07asymmetrical distributions but not
51:09the presence of more than one peak,
51:11so not the complex shape of the distribution.
51:15And there is a website here where
51:17you can see a lot of
51:21examples where a different choice of
51:23visualization can lead to a different
51:26conclusion, as here.
51:29It's also true that the violin plot
51:32is not efficient because
51:35you're showing the same
51:37information twice.
51:39So this is aesthetically pleasant,
51:42but it is not efficient because
51:44you're basically repeating this
51:46density twice, above and below,
51:49and so that's why,
51:52for efficiency,
51:55there are recent visualization
51:57strategies such as the raincloud plot.
52:00So the raincloud plot shows a box
52:03whisker plot in the middle, half a violin
52:06plot here, and then also the single points.
52:09So that's probably one of the
52:12most complete, exhaustive ways to
52:14represent a distribution of data.
52:16And they're called raincloud because
52:18of this effect: this should be the cloud,
52:20and this is the rain that falls
52:22below.
52:23So you can find information on
52:25how to plot these by
52:28following this link.
52:29Another, yeah.
52:31Quick question, is there a,
52:35how to say, a restriction or limitation
52:38as to how many data points are required
52:43for generating a reliable violin plot?
52:50Generally not. So probably more than 10,
52:55I would say, because otherwise,
52:58you can see it empirically,
53:00because if the data are too few you can
53:03see that the violin basically has a sort
53:06of wave around each point of your data.
53:10So as a general threshold
53:15I would say 10 points would be
53:19like the minimum number. And I'm asking
53:21that question because of course if
53:23you have a lot of data points,
53:26these would be informative.
53:27But if you have, let's say,
53:29less than 10 or a small number,
53:31this could be really distorting
53:33or faking the. Yeah, yeah, that's
53:35true. That's why I would
53:37say 10, because below 10
53:39probably the best strategy is to show
53:41the single points and then a summary
53:44such as the mean or median plus
53:46some variation, a standard deviation,
53:48but not the distribution
53:50as a violin plot.
53:51So that's for like less than 10 data points.
53:57Alright, when the data are too many,
53:59for example, it doesn't make
54:00sense to show the single points,
54:03because
54:04they overlap each other and so you
54:06don't see anything. That happens
54:08when you have more than 1000 points,
54:10and so the best solution in that case
54:12is for example to show only the violin.
54:18So there is a range for which
54:22the best solution is to show the
54:24single data points with a crossbar,
54:27so an element which captures the mean or
54:30median plus the standard deviation or
54:33confidence interval. There is a
54:35range in the middle, from 10
54:38to some hundreds, where the violin
54:40plot and the box whisker plot are
54:43the best options to visualize.
54:44And when you have many,
54:47many data points, more than 1000 probably,
54:48if you want to capture the distribution,
54:51then the violin plot alone, rather
54:53than the single points, is the best way.
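These rules of thumb can be summarized as a small helper. It is a sketch of the thresholds mentioned in the answer (about 10 and about 1000 points); the function name and the decision to treat everything between 10 and 1000 as the middle range are assumptions, since the answer only says "from 10 to some hundreds".

```python
def suggest_distribution_plot(n_points):
    """Suggest a display for one distribution, given its number of points."""
    if n_points < 10:
        # Too few points for a reliable density estimate.
        return "single points with a crossbar (mean or median, SD or CI)"
    if n_points <= 1000:
        # Middle range: a summary of the distribution works well.
        return "box whisker plot or violin plot"
    # Too many points: single points overlap and hide each other.
    return "violin plot only"

for n in (6, 150, 20000):
    print(n, "->", suggest_distribution_plot(n))
```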
55:01Did it, did it answer?
55:06Yeah, that was awesome.
55:08That is a great explanation.
55:11OK, another alternative
55:13way to maximize the efficiency of the
55:16violins, that I saw a lot
55:18with single cell data, for example,
55:20is the use of split violin plots.
55:23So you use the violin plot to show a
55:26comparison between two distributions.
55:29So you see here, this plot shows the
55:32representation of the age of females and
55:34males using different social
55:37media: Instagram, Facebook, Twitter.
55:38So it's a way to show, using
55:41half of a violin plot, differences
55:44in the distributions, and this can
55:46be used when you have a contrast
55:49of two conditions or you want to
55:52compare two distributions,
55:53also in single cell.
55:57About the violin plots,
55:58like in such cases, yeah,
56:00so what determines the height of the peaks?
56:03Is it that everything is normalized
56:05so that the total area is the same,
56:07or the maximum height is the same?
56:10So most of the time,
56:12so you have choices usually. So you can
56:16choose to have the same maximum height,
56:20and
56:22that's usually what you find.
56:24So you plot them in a way
56:27that the range is the same,
56:30from here to here, from here to here.
56:33The alternative is to use the
56:35real criterion for a density,
56:38and that should be that the
56:41area under each of these is equal to 1,
56:45so that the two have the same area.
56:48Another alternative is to have an
56:50area that is proportional to
56:52the number of observations.
56:54But I think that visually most
56:57of the time you find that
57:00the criterion, in order
57:02to have balanced plots,
57:05is to have the same range,
57:07meaning from here to the maximum,
57:09for all the plots,
57:11independently of the area and
57:13independently of the number of points.
57:18It's probably not the best solution from
57:20the point of view of communication,
57:21but it's the most used. OK, thank you.
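The three scaling choices in this answer can be sketched as a function that turns a density curve into violin half-widths. The mode names "width", "area" and "count" are labels chosen for this sketch, and the two density curves are made-up values sampled on a grid with spacing 1/3, so that each has area 1.

```python
def scale_violin(density, n_obs, mode, max_n=None):
    """Turn density values (area 1 under the curve) into violin half-widths."""
    if mode == "area":
        # True density: every violin has the same area.
        return list(density)
    if mode == "width":
        # Same maximum width for every violin, whatever its area.
        peak = max(density)
        return [d / peak for d in density]
    if mode == "count":
        # Area proportional to the number of observations.
        return [d * n_obs / max_n for d in density]
    raise ValueError(f"unknown mode: {mode}")

narrow = [0.0, 0.5, 2.0, 0.5, 0.0]  # sharp peak, hypothetical, n = 10
wide = [0.3, 0.7, 1.0, 0.7, 0.3]    # flat shape, hypothetical, n = 1000

same_width = [scale_violin(d, n, "width") for d, n in ((narrow, 10), (wide, 1000))]
by_count = [scale_violin(d, n, "count", max_n=1000) for d, n in ((narrow, 10), (wide, 1000))]
print(same_width)
print(by_count)
```

With the "width" choice both violins reach the same maximum and look balanced, but the difference in area and in the number of points is no longer visible.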
57:27A variation of this is also the
57:30use of ridgeline plots.
57:32They allow you to compare a
57:35lot of different densities.
57:37For example, here you see a comparison
57:40of the density of temperatures in
57:43different months in a location, if I
57:45remember, Lincoln, NE, and this
57:48is used
57:52a lot now, in these years,
57:54with single cell data.
57:55For example,
57:56here you see that it is used to
57:58compare the distribution of the
58:01expression of 1 gene leads A
58:03or CL5 in different population
58:05of cells that probably
58:07come from some blood sample,
58:10different populations, and
58:11this allows you to see,
58:13for example, marker genes, or to
58:16see how the expression of a gene is
58:19specific for a population of cells.
58:22So that's why I included it, because I
58:25see that the frequency of this plot,
58:28especially in the single cell
58:31visualization field, is increasing a lot.
58:34I have this section of the
58:36presentation that we could skip.
58:38In general the message about this
58:40is that Venn diagrams are good
58:43when you have two sets,
58:45but
58:46if there are more, they are a bad way to
58:49represent intersections between sets.
58:51And this actually is a plot that
58:54was published in Nature, and it's
58:56about a comparison of the genome
58:59of the banana with other species.
59:01So the problem in general is that
59:04when you have more than two, three,
59:06four, but also two sets,
59:09it's not the best way to
59:11visualize intersections to use
59:13the traditional Venn diagrams.
59:15So a table is probably more effective
59:18than this, because the areas are
59:21not proportional to the size,
59:23and it's quite confusing to see
59:26the specific intersections. And
59:28so an alternative way
59:30that was developed in recent years
59:32was to use the concept of these
59:35UpSet plots, so to represent the
59:37intersections in a matrix format.
59:40So you represent, for
59:42some object, for
59:44example
59:44a gene, whether it is present only in
59:46list A, only in list D, only in list C,
59:49in the intersection between A and D, or
59:51present in all the lists.
59:53So you can use this matrix format
59:55to show the intersections, and then
59:58you can display the cardinality
01:00:00of each
01:00:01intersection, so the number of genes,
01:00:03for example, that are only in the EGFR
01:00:06pathway, that you see here,
01:00:08the number of genes that are in
01:00:11common between the EGFR and PTEN pathways,
01:00:14that you see here.
01:00:15So this is a way to show the cardinality
01:00:19of the global lists that you see here.
01:00:22And also you can rank the intersections
01:00:24between the different sets according
01:00:26to their size, to their cardinality.
01:00:29So it's much clearer
01:00:30to show the structure of the intersections
01:00:34than using a Venn diagram.
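The table behind an UpSet plot can be sketched as follows: each item is assigned to the exact combination of lists that contain it, and the combinations are then ranked by cardinality. The list names A, C and D follow the example in the talk; the gene identifiers and the function name are made up for illustration.

```python
from collections import Counter

def exclusive_intersections(named_sets):
    """Count the items found in exactly each combination of sets,
    ranked by cardinality (largest first)."""
    membership = Counter()
    all_items = set().union(*named_sets.values())
    for item in all_items:
        combo = tuple(sorted(
            name for name, members in named_sets.items() if item in members
        ))
        membership[combo] += 1
    # Sort by decreasing size, then by name so the order is reproducible.
    return sorted(membership.items(), key=lambda kv: (-kv[1], kv[0]))

gene_lists = {
    "A": {"g1", "g2", "g3", "g4", "g5"},
    "C": {"g4", "g6"},
    "D": {"g3", "g4", "g5", "g7", "g8", "g9"},
}
ranked = exclusive_intersections(gene_lists)
for combo, size in ranked:
    print(" & ".join(combo), size)
```

Each row of the result corresponds to one column of the UpSet matrix, with the count shown as the bar above it.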
01:00:38I skip this because
01:00:41these were some examples of bad
01:00:44usage of graphics in politics,
01:00:47and a lot, looking online,
01:00:50are related to Fox News,
01:00:52of bad usage of data display.
01:00:54So the final part could be how to
01:00:56draw these plots. There is
01:00:58relative they were trying to.
01:01:00Not make the point. Yeah,
01:01:02well, they were trying
01:01:04to give a message by distorting the.
01:01:09Yeah, this for example is an
01:01:11issue, whether you always need to
01:01:13include the zero in your plot.
01:01:15So this is controversial.
01:01:17Let's say that in general,
01:01:19in bar plots excluding it is a bad idea,
01:01:21but for example it can be a good
01:01:23idea in time series,
01:01:25and that's because in bar plots
01:01:27the height of the bar is
01:01:30the main message of the figure,
01:01:32while for example here in a
01:01:34time series the main message
01:01:36is how the two trajectories
01:01:38evolve and are interconnected.
01:01:40So the main issue is the horizontal
01:01:43axis, and so you can skip the zero.
01:01:47So again,
01:01:47it depends on how much the inclusion
01:01:50or exclusion of the zero distorts
01:01:53the main message of the figure.
01:01:56So how to draw plots?
01:01:58So here there is an outline
01:02:00of the software that you have.
01:02:03So this is some commercial
01:02:05software, starting from Excel.
01:02:07It's probably the most used or available,
01:02:10but it doesn't allow you to plot all the
01:02:13solutions that I showed before. But
01:02:16for example
01:02:17GraphPad
01:02:18Prism
01:02:18or Origin Pro are two software packages
01:02:21that are available, and with those
01:02:23you should be able, in an
01:02:26environment that is similar to Excel,
01:02:28to produce most of the plots that
01:02:30you saw in the presentation today.
01:02:33So this is commercial software that
01:02:35doesn't require programming
01:02:36skills. On this side
01:02:38you see the main solutions
01:02:40that are used by data scientists,
01:02:42but that require programming
01:02:44skills. So the two most
01:02:46common languages in data science are
01:02:48R and Python. For R you have the ggplot
01:02:53library, for Python
01:02:54you have matplotlib or Seaborn.
01:02:58These require programming,
01:02:59but I would say that the advantage
01:03:02nowadays of using these is that you can
01:03:05find a lot, really a lot, of examples,
01:03:09because there are a lot of websites
01:03:12where you can choose
01:03:15your data visualization
01:03:17type and you already see the code
01:03:20that you can use in order
01:03:22to produce the plot.
01:03:23So I would say that you just need
01:03:26to know how to insert, or how to
01:03:28load, your table of data into the programming
01:03:31environment.
01:03:33And then most of the difficulties
01:03:35are probably in fixing details.
01:03:37So it's very easy to produce a plot,
01:03:40different plots.
01:03:41It's more complicated to adapt the
01:03:44small things to your taste.
01:03:48But
01:03:48the suggestion is that if you
01:03:50do a lot of visualization,
01:03:52it's worth investing in this.
01:03:57Here you see maybe a future perspective,
01:04:01which could be online solutions.
01:04:04They're already available, so with some,
01:04:06for example, you can produce UpSet plots,
01:04:09or raincloud plots,
01:04:10or some other, like, exotic type
01:04:13of data visualization online.
01:04:15So there are websites,
01:04:17web servers, where you can insert
01:04:20your data as tables and they produce
01:04:23the plot that you want, and you have
01:04:27some sort of interactivity.
01:04:28So that could be the future,
01:04:32where web servers provide you with
01:04:34the main programming environment,
01:04:36you just need to enter the data,
01:04:38and you can see interactively,
01:04:44by the interaction with the web server,
01:04:46how to customize the plot.
01:04:48Most of the solutions right now are
01:04:53commercial, and so you need to pay,
01:04:56and that's the drawback of this.
01:04:58But it could probably be the
01:05:00future, matching programming
01:05:02with ease of use.
01:05:06This is a useful resource that
01:05:08you can also use to decide which
01:05:11kind of plot you want.
01:05:13So there are a lot of these trees, so that
01:05:16depending on what you want to represent,
01:05:19one numeric variable, two numeric
01:05:21variables or categorical variables,
01:05:22you can follow the tree and arrive
01:05:24at the best graphical
01:05:26solutions to display your data.
01:05:28So I suggest you visit it,
01:05:30also to look at what
01:05:33kinds of possibilities for data
01:05:35representation you have. Online
01:05:37there are many of these sites
01:05:39now, and that's why it's easy to
01:05:42look at the documentation and also
01:05:44to retrieve and reproduce the code.
01:05:46This is another example. I close with this
01:05:52quotation, that I find particularly
01:05:54related to data visualization:
01:05:57science is not nature itself but
01:06:00nature under our observation.
01:06:03And so the science of data visualization
01:06:05is a way to allow more adherence between
01:06:10observation, science and visualization.
01:06:17Thank you come on.