AQA Syllabus focus:
'Peer review, implications for the economy, reliability across methods, test-retest, inter-observer reliability, validity and improving validity.'
This section explains how psychologists judge research quality through peer review, reliability, and validity, and why these standards matter for scientific credibility, practical decision-making, and the effective use of research funding.
Peer review
Before research is published in a journal or awarded funding, it is usually checked by other specialists in the same field. This helps psychology maintain scientific standards and prevents weak evidence from being treated as trustworthy.
Peer review is the assessment of research by other experts in the relevant area before the work is published or funded.
In peer review, reviewers typically examine whether the research question is worthwhile, whether the method is appropriate, whether the analysis supports the conclusions, and whether the work meets ethical and scientific standards.
Psychologists value peer review because it acts as a form of quality control.
It can identify poor methods or unsupported conclusions.
It can reduce the chance of fraudulent or careless work being accepted.
It can help editors and funding bodies decide which work deserves publication or financial support.
It encourages researchers to present clear evidence and justified conclusions.
Because published findings often shape later studies, textbooks, and applied practice, weaknesses missed at this stage can spread widely.
However, peer review is not perfect. Reviewers may show bias toward familiar theories, well-known researchers, or prestigious institutions. It can also be slow, and genuinely original ideas may be rejected because they do not fit current expectations. This means peer review improves standards, but it does not guarantee that only the best research will be accepted.
Implications for the economy
Peer review has important economic implications because research money, staff time, and equipment are limited. If funding is awarded to high-quality studies, resources are used more efficiently and findings are more likely to benefit education, health, and public policy.
Poor peer review can waste money in several ways.
Weak studies may receive funding that could have gone to better research.
Unreliable or invalid findings may lead to ineffective interventions.
Time may be spent following up results that are inaccurate or impossible to apply.
Public trust in science may fall, making future investment less likely.
Reliability
A central question in research methods is whether findings are consistent. If a study or measure gives very different results each time, psychologists cannot have much confidence in it.
Reliability is the extent to which a measure or research procedure produces consistent results.
Reliability applies across research methods. In experiments, it is improved when procedures, instructions, and materials are kept the same for all participants. In questionnaires or interviews, clear wording and a consistent order of questions help reduce inconsistency. In observations, precise coding systems and careful training make recording more dependable.
A study with low reliability may appear persuasive, but its findings are difficult to trust because the pattern may not reappear when the procedure is repeated.
Reliable findings are easier to repeat and check. If a measure lacks reliability, differences in results may reflect random variation rather than a real psychological effect.
Test-retest reliability
One way to assess consistency is to repeat the same measure with the same people.
Test-retest reliability is the consistency of a measure over time, assessed by giving the same test to the same participants on two separate occasions.
A high level of similarity between the two sets of scores suggests the measure is stable. This is especially useful for questionnaires or tests that are supposed to measure a fairly lasting characteristic. A difficulty is choosing the time gap. If it is too short, participants may remember their earlier responses. If it is too long, genuine change may happen, so lower consistency may not mean the measure is poor.
Inter-observer reliability
When research involves recording behavior, consistency between observers is especially important.
Inter-observer reliability is the extent to which two or more observers agree when recording the same behavior.
High inter-observer reliability shows that the recording system is clear enough for different people to use in the same way. It can be improved by training observers, defining behaviors carefully, and checking agreement before the main research begins. Low agreement suggests the categories are vague or that important judgments are being made differently by each observer.
Validity
Consistency alone is not enough. Psychologists also need to know whether a study actually measures what it claims to measure and whether the conclusions drawn from the findings are accurate.
Validity is the extent to which a study or measure accurately reflects what it is intended to assess.
A measure can be reliable without being valid.
For example, it may produce the same result each time but still fail to capture the behavior, trait, or process the researcher is interested in. This is why validity is often seen as more important than reliability when judging the overall quality of research.
Validity matters because weakly valid research can produce misleading conclusions. If the measure is inappropriate, or if the research situation does not reflect the intended behavior, the findings may appear convincing while actually telling psychologists very little.
Improving validity
Psychologists can improve validity by making sure that methods match the aim of the investigation as closely as possible.
Use measures that clearly relate to the concept being studied.
Define variables in specific, precise terms so they can be assessed accurately.
Remove ambiguous wording from questions, rating scales, or instructions.
Make scoring criteria clear so judgments are based on the same standards.
Check that materials and tasks genuinely represent the behavior or process of interest.
Use settings or procedures that allow behavior to reflect what the researcher wants to study.
Compare findings with other relevant evidence to see whether they support the same conclusion.
Researchers therefore judge validity by asking whether the evidence truly supports the claim being made, not just whether the procedure was neat or consistent.
Reliability and validity are closely connected.

A simple 2×2 diagram that classifies measurement outcomes by whether they are reliable and whether they are valid. This provides a quick visual summary of how consistency and accuracy relate, supporting evaluation-style answers that compare reliability with validity. Source
A measure that is not consistent is very unlikely to be accurate, so good reliability supports validity. However, reliability on its own is not enough: the method must also capture the right thing.
Practice Questions
Outline one implication for the economy of poor peer review in psychological research. (2 marks)
1 mark for identifying a valid implication, such as waste of research funding, wasted staff time, or reduced public investment in science.
1 mark for elaboration linked to poor peer review, for example that weak studies may receive money instead of better-quality research.
Explain what psychologists mean by reliability and validity. Discuss one way of assessing reliability and one way of improving validity in psychological research. (6 marks)
1 mark for defining reliability as consistency of a measure or procedure.
1 mark for defining validity as whether a study or measure assesses what it claims to assess.
Up to 2 marks for one way of assessing reliability:
1 mark for naming or briefly describing a method such as test-retest reliability or inter-observer reliability.
1 additional mark for elaboration, such as repeating a test over time or checking agreement between observers.
Up to 2 marks for one way of improving validity:
1 mark for identifying a relevant method, such as clearer operational definitions, removing ambiguous questions, or using more appropriate measures.
1 additional mark for explaining how this makes the measure or conclusions more accurate.
FAQ
Peer review for publication focuses on whether completed research is strong enough to enter the scientific record.
Peer review for grants happens before the study is carried out, so reviewers judge the value of the proposal, the quality of the planned method, and whether the project deserves funding over competing applications.
Because grant decisions happen earlier, mistakes at that stage can shape what research gets done at all.
Yes. Peer review is a filter, not a guarantee of perfection.
After publication, other psychologists may try to replicate the findings, reanalyze the data, or identify problems in the method or interpretation. This is sometimes called post-publication scrutiny.
A study may therefore pass peer review and still later be challenged, corrected, or even withdrawn.
These systems are designed to reduce bias.
In blind review, reviewers do not know the author’s identity.
In double-blind review, neither side knows the other’s identity.
The aim is to make judgments depend more on the quality of the work than on reputation, institution, gender, or previous status in the field.
This does not remove all bias, but it can reduce some common sources of unfairness.
A measure can be consistent but still contain cultural, language, or social bias.
For example, the wording may be understood differently by different groups, or the content may reflect experiences that are more familiar to some participants than others. In that case, the measure may give stable scores while still failing to assess everyone equally accurately.
So fairness depends on more than reliability alone.
The file-drawer problem refers to studies with null or unexciting findings being less likely to be published.
If only striking results are published, psychologists may get a distorted picture of the evidence. This can lead to more money being spent on ideas that seem stronger than they really are.
Economically, that means time and funding may be directed toward lines of research that look promising because unsuccessful studies remain hidden.
