Clarity of language

The purpose of writing reports is to communicate your findings to your audience. Each sentence should be readable and completely clear. Try not to confuse the reader. Think of each paragraph as a slide in a presentation, and answer three questions:

  1. Why is this important enough to be in your text? How does it add to your story?
  2. What information are you trying to give to your audience? What is the takeaway sentence?
  3. How do you support this claim? How did you get this result?

Put yourself in the reader’s shoes. Usually, a reader looks at your abstract and then jumps to the results section. Once they find something that interests them, they go to the methods to verify the results and get the details. If the overall work interests them, they will then read your discussion.

The overall flow should be:

  • Introduction: Explain any relevant terms you will use in your work. Avoid sidetracks; don’t confuse your audience by explaining every detail. Just cite the relevant papers, and readers who want to learn more can read them.
  • Methods: All details of how you did the work go into the methods. Give enough information that the reader can understand and repeat your methods.
  • Results: The results follow the same order as the methods. Start each paragraph with a summary sentence explaining what you did, why you did it, and what you found. Follow up with the details and numbers, and refer to tables and figures as needed. Avoid giving the details of the methods; those belong in the methods section. Keep only the figures and tables that add to your story.
  • Discussion/Conclusion: Start by summarizing your work and the most noticeable results. Any interpretations or probable outcomes of your results can be discussed here. Difficulties, limitations, and future work are also discussed in this section. Do not repeat sentences from the results or methods.

Common mistakes

  • Replacing simple words with synonyms found on the web that have more syllables and sound harder to pronounce, in the belief that complexity makes your work seem more sophisticated.
  • Long sentences that go on forever, chained together with “which”, “and”, or “that”.
  • Using “it”, “the”, “she”, or “he” constantly without making clear what they refer to.
  • Abbreviating too much or too little.
  • Figures and tables that are never mentioned in the text. Every figure and table should be referenced in your text.
  • When you are not sure about something, don’t scramble around it or avoid explaining it. That only makes things worse.
  • Switching verb tenses. Keep the same tense throughout a section, e.g., “We (I) did …”, “We (I) found that …”.
  • Phrases that signal personal bias, such as “I believe” or “I think”. Avoid them.

Tips

  • Start by making an outline, then expand each bullet into a paragraph.

    For example, an outline for the introduction of a paper studying Parkinson’s disease using miRNA in the substantia nigra could be:

    1. What is Parkinson’s disease?
    2. The mechanism is unknown
    3. Why use miRNA?
    4. Why use substantia nigra tissue?
    5. What is coming in this paper?

    You can read the resulting introduction in [Hoss, Andrew G., et al. “microRNA profiles in Parkinson’s disease prefrontal cortex.” Frontiers in Aging Neuroscience 8 (2016): 36.]

  • Use punctuation carefully; it makes your text much more readable.

  • Read as many papers as you can, and learn from good examples. Papers published in highly recognized journals are good examples to learn from. After you read a paper to learn its content, read it again for the language.

  • Read your own paper as if you were the reader. Try to grade your own work.

  • When you watch a presentation or read a paper, try to notice when you get confused and what went wrong. Learn from other people’s mistakes.

Good examples

Here, the first sentence explains why the authors did the experiment (To determine whether changes in DNA methylation were associated with changes in gene expression) and what they did (we measured transcript levels using mRNA-Seq). The following sentences explain the results:

To determine whether changes in DNA methylation were associated with changes in gene expression, we measured transcript levels using mRNA-Seq. Overall, the genes we find to be differentially regulated have a high degree of overlap (P < 1E-109) with a previous study (27) that used microarrays (Fig. S1F). Genes that overlap in both studies and decrease in expression in the STHdhQ111 cells are associated with developmental processes, neuron migration, regulation of signaling, and regulation of neural precursor cell proliferation (Dataset S2). Genes that increase in expression in the STHdhQ111 cells in both our study and the previously published microarray data are associated with categories including extracellular matrix organization, signal transduction, and cell differentiation (Dataset S2). [Ng, Christopher W., et al. “Extensive changes in DNA methylation are associated with expression of mutant huntingtin.” Proceedings of the National Academy of Sciences 110.6 (2013): 2354-2359.]

Average examples

Example 1

Original:

Then a set of Welch T test (with unequal variance) was conducted between same genes in different clusters, and the p values for the tests were adjusted with the FDR value. 1000 genes were filtered out to be the most significant markers for the variances between two clusters. Within these 1000 genes, the analyst did a cross validation with the supplementary material provided by the original paper. In the supplementary material there is a table of the logFC data of cancer vs control, the analyst checked the data and located genes with large differences in the number of fold changes vs control (for example, genes have large positive value in logFC in C3 vs Control and have a very negative value in logFC in C4 vs Control at the same time and vice versa), and check if these genes are in the 1000 genes located from the adjusted p value filter.

The purpose is unclear. Many things (the analyst, the original paper, the supplementary material) are referred to without being introduced first. The methods are not clear. The term “cross validation” is used without being explained.

Rewritten:

To determine whether the expression of each gene has a similar distribution in the detected patient clusters, we applied Welch’s t-test to each gene across clusters. The top 1000 genes with adjusted p-value < xxx were selected as the markers for the clusters. We compared these genes with the fold change values given by the original authors. [The criteria for the validation are unclear and should be rewritten.]

Example 2

Original:

Data was read as CSV format, as comma separated files. Based on Marisa et al. we selected genes using three well defined metrics and these filters were applied to the RMA normalized and ComBat adjusted expression matrix data for this analysis. These filters included; (1) Genes expressed in at least 20% of the samples with expression values greater than log2(15); (2) Have a variance significantly different from the median variance of all probe sets using a threshold of p<0.01 then applying Chi-square statistic; and (3) genes with a coefficient of variation greater than 0.186. Additionally, a quantile chi-square test was applied to the dataset with degrees of freedom (N-1) and alpha level equal to 0.01. All filtering analysis were done on R version 3.5.1 with built in packages in Rstudio.

The CSV format and the R version are unnecessary details that do not add to the flow (part of accuracy). The text “in at least 20% of the samples” is ambiguous. For “expression values greater than log2(15)”, did the authors mean at least 15 reads? If the normalization method used to obtain the expression values is explained earlier, remove log2 and just give the number. Rewrite the sentences to make the method clearer. Report all numbers with a reasonable precision (part of accuracy), for example, 20% of samples, log2(15), 0.01, and 0.186. Why did you choose 0.186?

Rewritten:

Variable genes were selected based on the three variability metrics suggested in Marisa et al. First, genes were required to have a minimum of 15 reads in at least xxx samples (20% of all samples). Then, genes were filtered based on their variance to ensure they have a significantly higher variance than the median variance of all genes, as assessed by a chi-square test (p-value < 0.01). Finally, genes with a coefficient of variation greater than 0.186 were selected. These genes were used in the downstream analysis.

Example 3

Original:

It is interesting to see if any certain feature of a TR, i.e. pattern length, copy number or annotation would be predictive of VNTRs. In order to study that, we fitted a generalized regression model (glm) to predict whether a TR is VNTR, or if any VNTR will become common or hyper.

Rewritten:

We constructed generalized linear models to determine whether specific TR features, e.g., pattern length, copy number, array length, or genomic annotation, are predictive of VNTR status. Four models were designed: one to predict whether a TR is a VNTR, and three more to predict VNTR subtypes (private, common, hyper).