The application of statistics and research methods can take many forms in applied sociology. At the graduate level, skill with statistical software tools is essential, but at the bachelor’s level it is more important that a sociologist understand the meaning of basic statistics and know how to use core research methods. Program evaluation and needs assessment require knowledge of only basic statistics and research methods. Both applications are widely used and make excellent basic applied sociology skills.
Measuring effectiveness and outcomes at various levels of an organization has taken on increasing importance. Accountability is required not just of for-profit organizations but, increasingly, of nonprofit ones as well.
A major shift has occurred in how social and human service organizations are evaluated. Once the emphasis was on activity: staff counted how many people they served. Now the emphasis is on outcomes: achieving, and measuring, demonstrable improvement in the lives of the people served.
In the context of heightened scrutiny of how tax money and charitable contributions are spent, many human service and social service programs are required to conduct program evaluation; those that are not required to should do so voluntarily. The people who work in these organizations often believe they are doing great work because they are so busy, and they assume their programs are having the desired impact because working with their clients feels rewarding. Sadly, these professionals may be caught off guard when questions are raised, especially in the media, about their program outcomes. Good people and good programs may not deserve to be shut down merely because they cannot produce hard data showing their effectiveness, but they can be shut down. Smart program managers and professionals must have the skills to evaluate their programs and demonstrate their value.
Introduction to Evaluation
Evaluation is a methodological area that is closely related to, but distinguishable from, more traditional social research. Evaluation utilizes many of the same methodologies used in traditional social research, but because evaluation takes place within a political and organizational context, it requires group skills, management ability, political dexterity, sensitivity to multiple stakeholders, and other skills that social research in general does not rely on as much. Here we introduce the idea of evaluation and some of the major terms and issues in the field.
Definitions of Evaluation
Probably the most frequently given definition is:
Evaluation is the systematic assessment of the worth or merit of some object
This definition is hardly perfect. There are many types of evaluations that do not necessarily result in an assessment of worth or merit — descriptive studies, implementation analyses, and formative evaluations, to name a few. Better perhaps is a definition that emphasizes the information-processing and feedback functions of evaluation. For instance, one might say:
Evaluation is the systematic acquisition and assessment of information to provide useful feedback about some object
Both definitions agree that evaluation is a systematic endeavor, and both use the deliberately ambiguous term ‘object,’ which could refer to a program, policy, technology, person, need, activity, and so on. The latter definition emphasizes acquiring and assessing information rather than assessing worth or merit, because all evaluation work involves collecting and sifting through data and making judgments about the validity of the information and of the inferences we derive from it, whether or not an assessment of worth or merit results.
The Goals of Evaluation
The generic goal of most evaluations is to provide “useful feedback” to a variety of audiences including sponsors, donors, client-groups, administrators, staff, and other relevant constituencies. Most often, feedback is perceived as “useful” if it aids in decision-making. But the relationship between an evaluation and its impact is not a simple one — studies that seem critical sometimes fail to influence short-term decisions, and studies that initially seem to have no influence can have a delayed impact when more congenial conditions arise. Despite this, there is broad consensus that the major goal of evaluation should be to influence decision-making or policy formulation through the provision of empirically-driven feedback.
Evaluation Strategies
The term ‘evaluation strategies’ refers to broad, overarching perspectives on evaluation. These strategies encompass the most general groups or “camps” of evaluators, although, at its best, evaluation work borrows eclectically from the perspectives of all of them. Four major groups of evaluation strategies are discussed here.
Scientific-experimental models are probably the most historically dominant evaluation strategies. Taking their values and methods from the sciences, especially the social sciences, they emphasize the desirability of impartiality, accuracy, objectivity, and the validity of the information generated. Included under scientific-experimental models would be: the tradition of experimental and quasi-experimental designs; objectives-based research that comes from education; econometrically oriented perspectives, including cost-effectiveness and cost-benefit analysis; and the recent articulation of theory-driven evaluation.
The second class of strategies comprises management-oriented systems models. Two of the most common are PERT, the Program Evaluation and Review Technique, and CPM, the Critical Path Method. Both have been widely used in business and government in the United States. It would also be legitimate to include in this category the Logical Framework or “Logframe” model developed at the U.S. Agency for International Development, as well as general systems theory and operations research approaches. Two management-oriented systems models were originated by evaluators: the UTOS model, where U stands for Units, T for Treatments, O for Observing Operations, and S for Settings; and the CIPP model, where the C stands for Context, the I for Input, the first P for Process, and the second P for Product. These management-oriented systems models emphasize comprehensiveness in evaluation, placing evaluation within a larger framework of organizational activities.
The third class of strategies consists of the qualitative/anthropological models. They emphasize the importance of observation, the need to retain the phenomenological quality of the evaluation context, and the value of subjective human interpretation in the evaluation process. Included in this category are the approaches known in evaluation as naturalistic or ‘Fourth Generation’ evaluation; the various qualitative schools; critical theory and art criticism approaches; and the ‘grounded theory’ approach of Glaser and Strauss, among others.
Finally, a fourth class of strategies is termed participant-oriented models. As the term suggests, they emphasize the central importance of the evaluation participants, especially clients and users of the program or technology. Client-centered and stakeholder approaches are examples of participant-oriented models, as are consumer-oriented evaluation systems.
With all of these strategies to choose from, how to decide? Debates that rage within the evaluation profession — and they do rage — are generally battles between these different strategists, with each claiming the superiority of their position. In reality, most good evaluators are familiar with all four categories and borrow from each as the need arises. There is no inherent incompatibility between these broad strategies — each of them brings something valuable to the evaluation table. In fact, in recent years attention has increasingly turned to how one might integrate results from evaluations that use different strategies, are carried out from different perspectives, and employ different methods. Clearly, there are no simple answers here. The problems are complex, and the methodologies needed will and should be varied.
Types of Evaluation
There are many different types of evaluations depending on the object being evaluated and the purpose of the evaluation. Perhaps the most important basic distinction in evaluation types is that between formative and summative evaluation. Formative evaluations strengthen or improve the object being evaluated — they help form it by examining the delivery of the program or technology, the quality of its implementation, and the assessment of the organizational context, personnel, procedures, inputs, and so on. Summative evaluations, in contrast, examine the effects or outcomes of some object — they summarize it by describing what happens subsequent to delivery of the program or technology; assessing whether the object can be said to have caused the outcome; determining the overall impact of the causal factor beyond only the immediate target outcomes; and, estimating the relative costs associated with the object.
Formative evaluation includes several evaluation types:
needs assessment determines who needs the program, how great the need is, and what might work to meet the need
evaluability assessment determines whether an evaluation is feasible and how stakeholders can help shape its usefulness
structured conceptualization helps stakeholders define the program or technology, the target population, and the possible outcomes
implementation evaluation monitors the fidelity of the program or technology delivery
process evaluation investigates the process of delivering the program or technology, including alternative delivery procedures
Summative evaluation can also be subdivided:
outcome evaluations investigate whether the program or technology caused demonstrable effects on specifically defined target outcomes
impact evaluation is broader and assesses the overall or net effects — intended or unintended — of the program or technology as a whole
cost-effectiveness and cost-benefit analysis address questions of efficiency by standardizing outcomes in terms of their dollar costs and values
secondary analysis reexamines existing data to address new questions or use methods not previously employed
meta-analysis integrates the outcome estimates from multiple studies to arrive at an overall or summary judgment on an evaluation question
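To make the meta-analysis idea concrete, here is a minimal sketch in Python of one common technique, an inverse-variance weighted average of effect estimates. The numbers are hypothetical, and the text does not prescribe this particular method; it is only one standard way to combine estimates from several studies.

```python
# Hypothetical effect estimates (e.g., standardized mean differences) and
# their variances from three studies of the same program.
effects = [0.30, 0.50, 0.10]
variances = [0.04, 0.09, 0.01]

# Inverse-variance weighting: more precise studies (smaller variance)
# receive more weight in the combined estimate.
weights = [1.0 / v for v in variances]
summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
print(round(summary, 3))  # 0.169
```

Note how the summary estimate is pulled toward the third study’s result: its small variance makes it the most heavily weighted of the three.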
Evaluation Questions and Methods
Evaluators ask many different kinds of questions and use a variety of methods to address them. These are considered within the framework of formative and summative evaluation as presented above.
In formative research the major questions and methodologies are:
What is the definition and scope of the problem or issue, or what’s the question?
Methods for formulating and conceptualizing the problem might be used here, including brainstorming, focus groups, nominal group techniques, Delphi methods, brainwriting, stakeholder analysis, synectics, lateral thinking, input-output analysis, and concept mapping.
Where is the problem and how big or serious is it?
The most common method used here is needs assessment, which can include analysis of existing data sources, sample surveys, interviews of constituent populations, qualitative research, expert testimony, and focus groups.
How should the program or technology be delivered to address the problem?
Some of the methods already listed apply here, as do detailing methodologies like simulation techniques; multivariate methods like multiattribute utility theory and exploratory causal modeling; decision-making methods; and project planning and implementation methods like flow charting, PERT/CPM, and project scheduling.
How well is the program or technology delivered?
Qualitative and quantitative monitoring techniques, the use of management information systems, and implementation assessment would be appropriate methodologies here.
The questions and methods addressed under summative evaluation include:
What type of evaluation is feasible?
Evaluability assessment can be used here, as well as standard approaches for selecting an appropriate evaluation design.
What was the effectiveness of the program or technology?
One would choose from observational and correlational methods for demonstrating whether desired effects occurred, and quasi-experimental and experimental designs for determining whether observed effects can reasonably be attributed to the intervention and not to other sources.
What is the net impact of the program?
Econometric methods for assessing cost effectiveness and cost/benefits would apply here, along with qualitative methods that enable us to summarize the full range of intended and unintended impacts.
Clearly, this introduction is not meant to be exhaustive. Each of these methods, and the many not mentioned, is supported by an extensive methodological research literature. This is a formidable set of tools. But the need to improve, update, and adapt these methods to changing circumstances means that methodological research and development must have a major place in evaluation work.
Measuring Variables and Gathering Data
After the hypothesis has been formulated, the sociologist is ready to begin the actual research. Data must be gathered via one or more of the research designs examined later in this chapter, and variables must be measured. Data can be either quantitative (numerical) or qualitative (nonnumerical). Data gathered through a questionnaire are usually quantitative. The answers a respondent gives to a questionnaire are coded for computer analysis. For example, if a question asks whether respondents consider themselves to be politically conservative, moderate, or liberal, those who answer “conservative” might receive a “1” for computer analysis; those who choose “moderate” might receive a “2”; and those who say “liberal” might receive a “3.”
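The coding scheme just described can be sketched in a few lines of Python. The responses below are invented for illustration; the 1/2/3 codes follow the scheme in the text.

```python
# Hypothetical answers to "Do you consider yourself politically
# conservative, moderate, or liberal?"
responses = ["moderate", "liberal", "conservative", "moderate", "liberal"]

# Coding scheme from the text: conservative -> 1, moderate -> 2, liberal -> 3
codes = {"conservative": 1, "moderate": 2, "liberal": 3}

# Translate each verbal answer into its numeric code for computer analysis.
coded = [codes[r] for r in responses]
print(coded)  # [2, 3, 1, 2, 3]
```

Once answers are coded this way, they can be tabulated and analyzed with standard statistical software.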
Data gathered through observation and/or intensive interviewing, research designs discussed later in this chapter, are usually qualitative. If a researcher interviews college students at length to see what they think about dating violence and how seriously they regard it, the researcher may make simple comparisons, such as “most” of the interviewed students take dating violence very seriously, but without really statistically analyzing the in-depth responses from such a study. Instead, the goal is to make sense of what the researcher observes or of the in-depth statements that people provide to an interviewer and then to relate the major findings to the hypothesis or topic the researcher is investigating.
The measurement of variables is a complex topic and lies far beyond the scope of this discussion. Suffice it to say that accurate measurement of variables is essential in any research project. In a questionnaire, for example, a question should be worded clearly and unambiguously. Take the following question, which has appeared in national surveys: “Do you ever drink more than you think you should?” This question is probably meant to measure whether the respondent has an alcohol problem. But some respondents might answer yes to this question even if they only have a few drinks per year if, for example, they come from a religious background that frowns on alcohol use; conversely, some respondents who drink far too much might answer no because they do not think they drink too much. A researcher who interpreted a yes response from the former respondents as an indicator of an alcohol problem or a no response from the latter respondents as an indicator of no alcohol problem would be in error.
As another example, suppose a researcher hypothesizes that younger couples are happier than older couples. Instead of asking couples how happy they are through a questionnaire, the researcher decides to observe couples as they walk through a shopping mall. Some interesting questions of measurement arise in this study. First, how does the researcher know who is a couple? Second, how sure can the researcher be of the approximate age of each person in the couple? The researcher might be able to distinguish people in their 20s or early 30s from those in their 50s and 60s, but age measurement beyond this gross comparison might often be in error. Third, how sure can the researcher be of the couple’s degree of happiness? Is it really possible to determine how happy a couple is by watching them for a few moments in the mall? What exactly does being happy look like, and do all people look this way when they are happy? These and other measurement problems in this particular study might be so severe that the study should not be done, at least if the researcher hopes to publish it.
After any measurement issues have been resolved, it is time to gather the data. For the sake of simplicity, let’s assume the unit of analysis is the person. A researcher who is doing a study “from scratch” must decide which people to study. Because it is certainly impossible to study everybody, the researcher studies only a sample, or subset, of the population of people in whom the researcher is interested. Depending on the purpose of the study, the population of interest varies widely: it can be the adult population of the United States, the adult population of a particular state or city, all young women aged 13–18 in the nation, or countless other variations.
Many researchers who do survey research (discussed in a later section) study people selected for a random sample of the population of interest. In a random sample, everyone in the population (whether it be the whole U.S. population or just the population of a state or city, all the college students in a state or city or all the students at just one college, and so forth) has the same chance of being included in the survey. The ways in which random samples are chosen are too complex to fully discuss here, but suffice it to say the methods used to determine who is in the sample are equivalent to flipping a coin or rolling some dice. The beauty of a random sample is that it allows us to generalize the results of the sample to the population from which the sample comes. This means that we can be fairly sure of the attitudes of the whole U.S. population by knowing the attitudes of just 400 people randomly chosen from that population.
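The logic of generalizing from a random sample can be sketched in Python. The population size and the 60% attitude figure below are invented for illustration; the point is that a random sample of 400 yields a proportion close to the population’s true proportion.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# A hypothetical population of 100,000 adults, 60% of whom hold some
# attitude (coded 1) and 40% of whom do not (coded 0).
population = [1] * 60_000 + [0] * 40_000

# Draw a random sample of 400 people: every member of the population
# has the same chance of being chosen.
sample = random.sample(population, 400)

# The sample proportion should land close to the population's 60%.
proportion = sum(sample) / len(sample)
print(proportion)
```

Running this repeatedly with different seeds would show the sample proportion clustering around 0.60, which is why results from a properly drawn random sample can be generalized to the population.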
Other researchers use nonrandom samples, in which members of the population do not have the same chance of being included in the study. If you ever filled out a questionnaire after being approached in a shopping mall or campus student center, it is very likely that you were part of a nonrandom sample. While the results of the study (marketing research or social science research) for which you were interviewed might have been interesting, they could not necessarily be generalized to all students or all people in a state or in the nation because the sample for the study was not random.
A specific type of nonrandom sample is the convenience sample, which refers to a nonrandom sample that is used because it is relatively quick and inexpensive to obtain. If you ever filled out a questionnaire during a high school or college class, as many students have done, you were very likely part of a convenience sample—a researcher can simply go into class, hand out a survey, and have the data available for coding and analysis within a few minutes. Convenience samples often include students, but they also include other kinds of people. When prisoners are studied, they constitute a convenience sample, because they are definitely not going anywhere. Partly because of this fact, convenience samples are also sometimes called captive-audience samples.
Another specific type of nonrandom sample is the quota sample. In this type of sample, a researcher tries to ensure that the makeup of the sample resembles one or more characteristics of the population as closely as possible. For example, on a campus of 10,000 students where 60% of the students are women and 40% are men, a researcher might decide to study 100 students by handing out a questionnaire to those who happen to be in the student center building on a particular day. If the researcher decides to have a quota sample based on gender, the researcher will select 60 female students and 40 male students to receive the questionnaire. This procedure might make the sample of 100 students more representative of all the students on campus than if it were not used, but it still does not make the sample entirely representative of all students. The students who happen to be in the student center on a particular day might be very different in many respects from most other students on the campus.
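The quota procedure just described can be sketched in Python. The stream of student-center arrivals below is simulated and hypothetical; the quotas (60 women, 40 men) follow the example in the text.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical stream of people passing through the student center on
# one day, each tagged by gender (simulated 50/50 here for simplicity).
arrivals = [{"id": i, "gender": random.choice(["woman", "man"])}
            for i in range(500)]

# Quotas matching the campus makeup: 60 women and 40 men.
quota = {"woman": 60, "man": 40}
sample = []
for person in arrivals:
    if quota[person["gender"]] > 0:   # still need people of this gender?
        sample.append(person)
        quota[person["gender"]] -= 1
    if len(sample) == 100:            # both quotas filled; stop recruiting
        break

women = sum(1 for p in sample if p["gender"] == "woman")
print(women, 100 - women)  # 60 40
```

Note that the quotas control only the gender makeup of the sample; who ends up in it still depends entirely on who happened to walk by, which is why a quota sample remains nonrandom.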
As we shall see later when research design is discussed, the choice of a design is very much related to the type of sample that is used. Surveys lend themselves to random samples, for example, while observation studies and experiments lend themselves to nonrandom samples.
After all data have been gathered, the next stage is to analyze the data. If the data are quantitative, the analysis will almost certainly use highly sophisticated statistical techniques beyond the scope of this discussion. Many statistical analysis software packages exist for this purpose, and sociologists learn to use one or more of these packages during graduate school. If the data are qualitative, researchers analyze their data (what they have observed and/or what people have told them in interviews) in ways again beyond our scope. Many researchers now use qualitative analysis software that helps them uncover important themes and patterns in the qualitative data they gather. However qualitative or quantitative data are analyzed, it is essential that the analysis be as accurate as possible. To go back to a point just made, this means that variable measurement must also be as accurate as possible, because even expert analysis of inaccurate data will yield inaccurate results. As a phrase from the field of computer science summarizes this problem, “garbage in, garbage out.” Data analysis can be accurate only if the data are accurate to begin with.
This is a derivative of SOCIOLOGY: UNDERSTANDING AND CHANGING THE SOCIAL WORLD by a publisher who has requested that they and the original author not receive attribution, which was originally released and is used under CC BY-NC-SA. This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.