In August of 2016, schools using the edX platform agreed on a common data structure (standards) for their MOOCs in order to facilitate research on student learning and to allow for comparisons of data across schools. Of course, education researchers working with massive data sets are not the only ones who require standardized, well-organized data. With increasing frequency, a wide range of decision-makers — from higher ed administrators to marketing analysts to individual consumers — collect and analyze data in order to help answer a broad spectrum of questions and inform their decisions. These analyses rely on well-organized, and often standardized, data.
Well-organized data files form the basis for answering questions such as:
- What attributes of smartphones are important to 21-30 year-old consumers?
- Which HDTV should I buy?
- Are my students learning what I want them to learn?
I emphasize the term well-organized because organized data is essential for reliable analysis and subsequent decision-making.
Unfortunately, from an analysis point of view, data of all sorts may lack the organization required for a particular task. From my experience in assessment and evaluation (A&E) in higher education, the most problematic and least-organized data sets are those that have been collected for one reason but are re-purposed to answer a new set of pressing questions or inform a new set of decisions. In the A&E business, we call these “administrative data.”
One example of administrative data is the information that we collect during the registration process for TLL’s Kaufman Teaching Certificate Program (KTCP). We ask registrants to indicate which of the following responses best describes their role on campus:
- Graduate student
- Postdoctoral associate/fellow
- Research scientist
- Other (please describe)
When registrants select “Other,” common responses include:
- Visiting fellow
- Affiliate faculty
- Visiting scholar
- Research scholar
This information is collected purely for administrative purposes, and it is easy to see why using it for analysis would prove problematic. Suppose we wanted to understand whether the experiences of KTCP participants who are MIT visitors differ substantively from those of participants who are not visitors. Might someone who selected “Research scientist” or “Instructor” also be a visitor?
As an A&E specialist, I am frequently approached by faculty, administrators, and staff who ask me to help them answer pressing questions related to student learning and/or course design. Early in the process, I ask for their available data. Many times, however, the data has been organized for administrative purposes rather than to answer the questions posed. This, as you can imagine, makes the task of answering their questions rather difficult. In many instances, they will need to collect additional data and organize it with the pressing questions specifically in mind.
Returning to the KTCP example: if someone in TLL came to me and said they wanted to understand the differences in participant experience between visitors and non-visitors, I would suggest that they ask registrants the following revised set of questions:
1. Which of the following best describes your role on campus?
- Graduate student
- Instructor (if selected, answer #2)
- Research scientist (if selected, answer #2)
- Fellow / Scholar (if selected, answer #2)
- Faculty (if selected, answer #2)
2. Is your role considered a visiting position here at MIT?
Making this distinction explicit during data collection makes it straightforward to identify who is visiting MIT and who is not. The new data file will contain two columns of data, one for a participant’s role at MIT and another for their visiting status, instead of a single column containing all of the information.
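To illustrate why a separate visiting-status column helps, here is a minimal Python sketch. Everything in it is invented for illustration: the field names (`role`, `visiting`, `satisfaction`), the sample records, and the idea of a satisfaction score are assumptions, not actual KTCP data or the way TLL stores it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical registration records: one field for role, one for visiting status.
# The "satisfaction" values are made up purely to have something to compare.
records = [
    {"role": "Instructor", "visiting": "No", "satisfaction": 4},
    {"role": "Research scientist", "visiting": "Yes", "satisfaction": 5},
    {"role": "Fellow / Scholar", "visiting": "Yes", "satisfaction": 3},
    {"role": "Research scientist", "visiting": "No", "satisfaction": 4},
    {"role": "Faculty", "visiting": "No", "satisfaction": 5},
]


def mean_by_visiting_status(rows):
    """Group rows by the 'visiting' field and average the satisfaction score."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["visiting"]].append(row["satisfaction"])
    return {status: mean(scores) for status, scores in groups.items()}


summary = mean_by_visiting_status(records)
print(summary)
```

Because visiting status lives in its own field, the comparison is a simple grouping step. With a single free-text role column, the same comparison would first require guessing which responses imply a visiting position.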
Other improvements could include modifying the categories so they are mutually exclusive and clear to respondents (e.g., will the difference between “Fellow” and “Research scientist” be known to all respondents?). In addition, although including “Other” as a catch-all response category can be very tempting to the survey creator, it is impossible to control what information respondents will provide there. In our KTCP example, if we are really interested in learning about differences between visiting and non-visiting respondents, we should not rely on respondents to volunteer that information in the “Other” category; instead, we should ask them directly, as in the revised questions.
If you’re planning to work with your own data (or to provide it to an A&E specialist who will analyze it for you), here are a few additional pointers on data collection and organization that will help you build a more solid foundation upon which to answer your research questions:
- Ensure that each respondent (student, program, department, etc.) has a unique identifier. For students, the MIT ID is typically used. Two things are important here: a) no two students can share the same MIT ID and b) no student should have more than one MIT ID. It sounds like this wouldn’t happen, but it does!
- Freeze your data file (save your data file at one particular point in time) and work with that file for analysis purposes. This is extremely important if you work with “live” data (i.e., data that is constantly being updated by you or someone else). When data keeps changing, it is very difficult to get a clear picture of what is going on. Freezing the data allows you to analyze the data at a given point in time. Of course, you may have to time the exact moment when you freeze your data in a way that is meaningful for the purposes of your questions.
- When collecting data, make sure that you categorize your units of analysis (i.e., students, programs, departments) into unambiguous and mutually exclusive categories. No respondent should fall into two or more categories at the same time.
- Finally, when faced with pressing questions, consider whether you actually have the data to answer them. If you don’t, schedule the time to collect the information. Also, if you’re thinking of conducting an assessment down the road, develop your data “wish list” now. An organized list of the data you’ll need to answer your questions will save you a lot of hassle and make it far more likely that your evidence-based decision rests on a complete and organized data file.
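Some of these pointers can be checked with a few lines of code. The following is a minimal Python sketch, not a prescribed workflow: the field names (`mit_id`, `email`, `role`), the sample rows, the list of allowed roles, and the helper functions are all hypothetical, and using an email address to spot one person holding two IDs is only one possible assumption.

```python
import shutil
from collections import Counter, defaultdict
from datetime import date
from pathlib import Path

# Assumed list of agreed-upon, mutually exclusive role categories.
ALLOWED_ROLES = {"Graduate student", "Instructor", "Research scientist",
                 "Fellow / Scholar", "Faculty"}


def duplicate_ids(rows, id_field="mit_id"):
    """Return IDs that appear on more than one row (one row per respondent assumed)."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(i for i, n in counts.items() if n > 1)


def people_with_multiple_ids(rows, person_field="email", id_field="mit_id"):
    """Return people (matched on person_field) who are listed under more than one ID."""
    ids = defaultdict(set)
    for row in rows:
        ids[row[person_field]].add(row[id_field])
    return sorted(p for p, found in ids.items() if len(found) > 1)


def invalid_roles(rows, role_field="role"):
    """Return rows whose role is not one of the agreed categories."""
    return [row for row in rows if row[role_field] not in ALLOWED_ROLES]


def freeze(path, today=None):
    """Copy a live data file to a date-stamped snapshot and return the new path."""
    path = Path(path)
    stamp = (today or date.today()).isoformat()
    frozen = path.with_name(f"{path.stem}_frozen_{stamp}{path.suffix}")
    shutil.copy2(path, frozen)
    return frozen


# Made-up rows that deliberately contain each kind of problem.
rows = [
    {"mit_id": "900000001", "email": "a@example.edu", "role": "Faculty"},
    {"mit_id": "900000002", "email": "b@example.edu", "role": "Instructor"},
    {"mit_id": "900000002", "email": "c@example.edu", "role": "Visiting scholar"},
    {"mit_id": "900000003", "email": "a@example.edu", "role": "Faculty"},
]

print(duplicate_ids(rows))
print(people_with_multiple_ids(rows))
print(invalid_roles(rows))
```

Calling something like `freeze("registrations.csv")` (a hypothetical file name) before starting an analysis leaves the live file free to keep changing while the analysis runs against the dated copy.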
If you’re interested in learning more about data organization and how best to work with administrative data, one of our A&E specialists would be happy to speak with you. We provide one-on-one consultations and workshops about all things related to assessment and evaluation!
For example, see “edX universities say ‘no’ to mediocre on-line learning,” eCampus News, August 31, 2016.
(“Data Pool” from Janet Rankin / cc by-nc-sa)