Project #3: Summarizing poll responses

Suppose that we have an inconveniently large mass of polling data, and we'd like to summarize it so as to clarify its political implications. The procedures that you'll develop in this lab will automate the process of tallying and classifying the poll results in various ways.

Step 1

I've prepared a starting point for you, at /home/stone/courses/scheme/examples/polls.ss. Copy it into your home directory and open it with DrScheme. It requires the teachpack /home/stone/courses/scheme/teachpacks/hop.ss; ask DrScheme to add that teachpack, if you haven't already done so.

Step 2

The program you loaded in step 1 includes a description of the format in which the polling data will reach you and a sample data set: a list of more than a thousand reports of individual polling sessions. Each of those reports is supposed to include a six-digit number identifying the person polled (the "respondent"), a two-digit number indicating the county in which she resides, indications of her party affiliation and how she voted in the 2004 presidential election, her positions on sixteen different issues, and a two-digit number identifying the interviewer who posed the questions.

Study the opening comments and examine a few of the reports to make sure that you understand how the incoming data are structured.

Step 3

I've provided some procedures for cleaning up the data, that is, for excluding incorrectly formatted reports and duplicates (multiple reports involving the same respondent). Unfortunately, the procedures for excluding duplicates aren't working: if you look closely at the clean-sample-data list that is supposed to have emerged from the cleaning-up process, you'll find that it still has duplicates in it (including no fewer than five identical reports from the respondent with ID number 354374).

Your first job, then, is to figure out what the programmer did wrong and fix it.
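To make the goal concrete, here is a minimal sketch of what a working duplicate filter might look like. It is not the code in polls.ss, and it assumes a hypothetical report layout in which the respondent's ID number is the first element of each report; the names report-id, id-present?, and remove-duplicate-respondents are invented for this illustration.

```scheme
;; ASSUMPTION: each report is a list whose first element is the
;; respondent's ID number. The real layout is documented in polls.ss.
(define report-id car)

;; Does any report in REPORTS carry the given ID?
(define (id-present? id reports)
  (cond ((null? reports) #f)
        ((= id (report-id (car reports))) #t)
        (else (id-present? id (cdr reports)))))

;; Keep the first report for each respondent ID and drop later ones.
(define (remove-duplicate-respondents reports)
  (let loop ((rest reports) (kept '()))
    (cond ((null? rest) (reverse kept))
          ((id-present? (report-id (car rest)) kept)
           (loop (cdr rest) kept))
          (else (loop (cdr rest) (cons (car rest) kept))))))

;; Toy data, not taken from the sample poll.
(define toy-reports
  '((354374 79 a) (100001 12 b) (354374 79 a) (354374 79 a)))

(remove-duplicate-respondents toy-reports)
;; => ((354374 79 a) (100001 12 b))
```

This version compares each report against everything kept so far, so its running time grows with the square of the number of reports. That is probably acceptable for a thousand or so reports, but it is one design choice among several.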

Step 4

Once the data have been successfully cleaned up, it is possible to classify and summarize them in a variety of ways. I've provided a simple example -- a procedure, party-tally, that counts the number of respondents who declared a particular party affiliation. And I've shown how to use this procedure to build a table (in the form of an association list, sample-party-tallies) that summarizes the party affiliations of the respondents from the sample poll.

Prepare a similar table showing how the respondents voted in the 2004 Presidential election.
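One way to think about such a table is as a map over a list of keys, with a count computed for each key. The sketch below illustrates the idea under assumptions of my own: the report layout (id county party vote ...), the vote symbols Bush, Kerry, and none, and the names report-vote, count-matching, and tally-table are all hypothetical, and the provided party-tally procedure may well be organized differently.

```scheme
;; ASSUMPTION: a report looks like (id county party vote ...).
;; The real layout is documented in polls.ss.
(define report-vote cadddr)

;; Count the elements of LS that satisfy PRED.
(define (count-matching pred ls)
  (let loop ((rest ls) (count 0))
    (cond ((null? rest) count)
          ((pred (car rest)) (loop (cdr rest) (+ count 1)))
          (else (loop (cdr rest) count)))))

;; Build an association list pairing each key with the number of
;; reports whose selected field is that key.
(define (tally-table field keys reports)
  (map (lambda (key)
         (cons key
               (count-matching
                (lambda (report) (eq? (field report) key))
                reports)))
       keys))

;; Toy data with hypothetical vote symbols.
(define toy-reports
  '((100001 79 Republican Bush)
    (100002 12 Democrat Kerry)
    (100003 79 independent Kerry)
    (100004 5 Democrat none)))

(tally-table report-vote '(Bush Kerry none) toy-reports)
;; => ((Bush . 1) (Kerry . 2) (none . 1))
```

Passing the field selector as an argument means the same procedure could tally party affiliations or any other symbolic field.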

Step 5

Many readers and users of poll results prefer to see such tallies expressed as percentages rather than raw counts. Design, write, and test a procedure, convert-table-to-percentages, that takes as its argument an association list in which the values are positive integers and produces a similar association list in which the values have been converted to percentages of their sum. (Round each percentage to a whole number instead of keeping all the decimal places.) Apply your procedure to sample-party-tallies and check the result.
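The arithmetic involved is small enough to sketch. The version below is only one possible shape for such a procedure, and it assumes that each entry of the association list is a dotted pair (key . count); if sample-party-tallies uses a different entry shape, the selectors would need to change. The helper table-total is a name invented here.

```scheme
;; ASSUMPTION: the table is a list of (key . count) pairs.

;; Sum of all the counts in TABLE.
(define (table-total table)
  (if (null? table)
      0
      (+ (cdr (car table)) (table-total (cdr table)))))

;; Replace each count with its rounded percentage of the total.
(define (convert-table-to-percentages table)
  (let ((total (table-total table)))
    (map (lambda (entry)
           (cons (car entry)
                 (round (/ (* 100 (cdr entry)) total))))
         table)))

(convert-table-to-percentages '((Democrat . 2) (Republican . 1)))
;; 2 of 3 is 66.67 percent and 1 of 3 is 33.33 percent,
;; so the rounded values are 67 and 33.
```

Because each percentage is rounded separately, the results need not add up to exactly 100: three equal counts, for instance, each come out as 33.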

Step 6

Design, write, and test a procedure issue-response-by-party-tally that takes four arguments (a list of poll reports, an issue number in the range from 1 to 16, a party-affiliation symbol, and a response symbol: yes, no, or no-opinion) and returns a natural number indicating how many of the respondents who declared that party affiliation gave that response when asked about that issue:

> (issue-response-by-party-tally clean-sample-data 3 'Republican 'yes)
166
> (issue-response-by-party-tally clean-sample-data 12 'other 'no-opinion)
1
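The general shape of such a tally is a count over reports that satisfy two conditions at once. The sketch below shows that shape on toy data; it assumes a hypothetical layout (id county party vote responses interviewer) in which responses is a list of symbols, one per issue, and its toy reports carry only three responses each to keep the example short. The real layout in polls.ss may differ, in which case the two selectors are the only parts that should need rewriting.

```scheme
;; ASSUMPTION: a report looks like
;;   (id county party vote responses interviewer)
;; where RESPONSES is a list of symbols, one per issue.
(define (report-party report) (list-ref report 2))

;; Issues are numbered from 1, list positions from 0.
(define (report-response report issue)
  (list-ref (list-ref report 4) (- issue 1)))

;; Count the reports from PARTY that gave RESPONSE on ISSUE.
(define (issue-response-by-party-tally reports issue party response)
  (let loop ((rest reports) (count 0))
    (cond ((null? rest) count)
          ((and (eq? (report-party (car rest)) party)
                (eq? (report-response (car rest) issue) response))
           (loop (cdr rest) (+ count 1)))
          (else (loop (cdr rest) count)))))

;; Toy data with three issues per report instead of sixteen.
(define toy-reports
  '((100001 79 Republican Bush (yes no no-opinion) 11)
    (100002 12 Republican Bush (yes yes no) 11)
    (100003 79 Democrat Kerry (no yes yes) 12)))

(issue-response-by-party-tally toy-reports 1 'Republican 'yes) ; => 2
(issue-response-by-party-tally toy-reports 3 'Democrat 'yes)   ; => 1
```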

Step 7

Construct a list of sixteen association lists, one for each issue. Each of the association lists should have the response symbols yes, no, and no-opinion as its keys and, as the corresponding values, the (rounded) percentages of independent respondents who gave each of those answers when asked about the issue in the clean-sample-data poll. (By "independent respondents," I mean respondents who gave their party affiliation as "independent.")

Step 8

Poweshiek County is county 79. Determine whether the percentage of Poweshiek County respondents who answered yes on issue number 13 was higher or lower than the percentage of respondents from other counties who answered yes on that issue.
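A comparison like this amounts to splitting the reports into two groups and computing the same percentage for each. Here is a rough sketch on toy data, again assuming the hypothetical layout (id county party vote responses interviewer) with a list of response symbols in the fifth position; the names keep, percent-yes, and in-county? are invented for the illustration, and the toy reports carry only two responses each.

```scheme
;; ASSUMPTION: a report looks like
;;   (id county party vote responses interviewer)
(define (report-county report) (list-ref report 1))
(define (report-response report issue)
  (list-ref (list-ref report 4) (- issue 1)))

;; Keep the elements of LS that satisfy PRED.
(define (keep pred ls)
  (cond ((null? ls) '())
        ((pred (car ls)) (cons (car ls) (keep pred (cdr ls))))
        (else (keep pred (cdr ls)))))

;; Unrounded percentage of REPORTS answering yes on ISSUE, or #f
;; when REPORTS is empty and no percentage is defined.
(define (percent-yes reports issue)
  (if (null? reports)
      #f
      (/ (* 100 (length (keep (lambda (report)
                                (eq? (report-response report issue) 'yes))
                              reports)))
         (length reports))))

;; Make a predicate that recognizes reports from COUNTY.
(define (in-county? county)
  (lambda (report) (= (report-county report) county)))

;; Toy data with two issues per report.
(define toy-reports
  '((100001 79 Republican Bush (yes no) 11)
    (100002 79 Democrat Kerry (no no) 11)
    (100003 12 Democrat Kerry (yes yes) 12)
    (100004 5 independent Kerry (yes no) 12)))

(percent-yes (keep (in-county? 79) toy-reports) 1) ; => 50
(percent-yes (keep (lambda (report) (not ((in-county? 79) report)))
                   toy-reports)
             1)                                    ; => 100
```

The percentages are left unrounded here so that two groups that differ by less than a percentage point can still be told apart; rounding can be applied afterwards for display.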

Step 9

Was voter turnout in the 2004 Presidential election higher among Republicans or Democrats participating in this poll?

Was the rate of "voter defection" -- voters belonging to one party voting for the other party's Presidential candidate -- higher among Republicans or Democrats participating in this poll?

Were independents who voted for Bush more likely to answer yes on issue 13 than those who voted for Kerry?

Which Presidential candidate was favored by respondents who answered no on issue 7?

Step 10

Devise similar questions that could be answered by mining the results of the clean-sample-data poll. Answer them.