Stated-preference surveys

Note: This material comes from the Travel Survey Manual Chapter 21. It was originally composed by Gonçalo Correia and Mark Bradley, drawing also from the FHWA’s Travel Survey Manual’s Chapter 13.

# Introduction

This chapter explains the basic theory of Stated-Preference (SP) surveys, with emphasis on transport mode choices, and provides some examples from recent practice. The chapter begins by distinguishing SP surveys from related survey experiments. Then, the four main stages of SP experiment design are described, focusing on the construction of choice alternatives and how these are presented to the respondents. The chapter continues with discrete choice model theory and calibration using SP data sets, with emphasis on the binary logit model, and concludes with some specific examples.

In general transportation planning, surveys are meant to capture travelers’ current travel behavior. For instance, one is interested in knowing the actual mode a traveler is using, actual travel times, destinations, and so forth. This is known as Revealed Preference (RP) data, as the traveler is currently experiencing that behavior and making a choice based on his or her knowledge of the available travel options. Another type of data is based on Stated Responses (SR), in which hypothetical situations are presented to the respondents, who are then asked to choose based on the given attributes for each alternative, without necessarily experiencing them in real situations. SP is a very popular sub-class of SR methods, focused on estimating the utility function for alternatives (Lee-Gosselin 1995). Other question types in this class are stated intentions, stated tolerance, stated adaptation, and stated prospect, as discussed later in this chapter.

SR surveys offer a great advantage for overcoming the problem of the “new option”, whereby an analyst seeks to forecast the use of a new alternative (such as high-speed rail), particularly when the new option is very different from existing alternatives with which the respondent is familiar. The use of a new alternative is not reflected in RP data collected on choices made in real markets.

Another aspect where RP data often fails is that one is not able to correctly identify the alternatives that were not chosen. The decision maker faces options while having imperfect information, and not knowing all his/her alternatives. Other times the decision maker will only have access to attribute information on the chosen alternatives, e.g. having information on the current automobile trip and no information on all public transport alternatives possible in the area.

As noted above, the most common type of SR question is the SP variety, where the respondent is asked to choose, rank or rate different alternatives based on their attributes (e.g., travel time, travel cost, and wait time), thus giving information on the way the choice is made. In a choice-based SP survey, the respondent is asked to choose the preferred alternative. In a ranking experiment the respondent has to rank the alternatives in order of preference. In a rating experiment each alternative must be classified using a scale which measures its attractiveness to the respondent. An example of a choice-based experiment is shown in Figure 21.1, where the objective is to understand how respondents make the choice between the automobile and bus alternatives. Because each alternative is identified with its name, Automobile and Bus, this is called a “labeled” experiment and allows analysts to calibrate alternative-specific constants for such clear labels in DCMs. An alternative type of “non-labeled” experiment might simply seek to study bus users’ preferences among various service features, and ask respondents to choose between various bus service descriptions that differ only in terms of the service attributes, but have no overall distinguishing feature.

# Figure 21.1 Example of a Labeled Stated-Preference Experiment

There are at least four other types of SR surveys: Stated Intentions, Stated Tolerance, Stated Adaptation and Stated Prospect, as described below (and as extracted from Lee-Gosselin (1995)):

Stated Intentions: This is perhaps the simplest form of SR question. Typically one or two new choice alternatives are described, and respondents are asked if they would use the new alternative or not, in the form of a binary yes or no question (or they may be asked to rate how likely they would be to use the alternative). Such questions may be useful to get a simple overall indication of the demand for a new alternative, but do not contain enough detail to model the demand for alternatives with different attribute levels or under different scenarios. Also, a very simple type of choice question may not be sufficient to get respondents to consider their likely choice behavior very carefully.

Stated Tolerance: Techniques included in this class do not ask respondents to respond to alternative behavioral outcomes represented by specific attributes and attribute levels. Instead, respondents are asked to identify the conditions under which they would take a particular action or accept a particular behavioral outcome. The basic type of information sought is responses to questions such as: “Under what circumstances could you imagine yourself doing the following?” One form of this approach that received much attention in transportation planning in the 1980’s was the “transfer price” (TP) method. The respondent would consider two choice alternatives, and then be asked to imagine that the cost of one of the alternatives changes and indicate at what level he or she would switch to the other alternative. For example, auto commuters could be asked to consider their best transit option and indicate how high fuel prices would have to rise before they would switch to commuting by transit. This method thus gives a direct quantitative measure of the difference in utility between two alternatives, but it has been questioned whether or not travelers can respond very accurately to such questions. A related method that is popular in the field of environmental economics is that of “contingent valuation” (CV), in which people are asked to directly value a “good” that is not actually available for purchase. For example, people can be asked how much they would be willing to pay in additional taxes if it could ensure that everyone in their city has access to good public transportation. This could be thought of in the context of stated tolerance—how much of such a tax would people be willing to tolerate?

Stated Adaptation: Techniques included in this class ask respondents to indicate in a relatively open-ended manner how they would respond when faced with a particular set of constraints. The basic type of information sought is the response to questions such as: “What would you do differently if you were faced with the following specific constraints?”

Stated Prospect: With these techniques, neither the list of possible behavioral outcomes nor a detailed set of constraints is predetermined. Instead, respondents are typically presented with some sort of general scenario (e.g., an energy shortage) as a way of initiating the process of eliciting behavioral outcomes and constraints. Measurement methods for these techniques involve the use of simulation gaming techniques. The basic type of information sought is the response to questions such as: “Under what circumstances would you be likely to change your travel behavior and how would you go about it?”

Of the classes of SR techniques, SP surveys are the most important source of data for developing a Discrete Choice Model (DCM) to represent traveler decisions when faced with new travel alternatives and transportation policy actions. DCMs have played an important role in transportation modeling for the last 25 years. “They are namely used to provide a detailed representation of the complex aspects of transportation demand, based on strong theoretical justifications” (Bierlaire, 1997).

In this Chapter of the Travel Survey Manual we focus on SP experiments. In the following section the design and deployment of the experiments is explained focusing on the attributes and their levels as well as the media and the way to present the choices to the respondents. The next section presents the main analysis that can be conducted through the calibration of DCMs based on SP information.

# Designing SP Experiments

The design of SP experiments involves the following stages:

Designing the experiment in terms of the alternatives and attribute levels;
Choosing the survey media;
Defining the context for the exercise; and
Designing the sampling plan.

Each of these stages is described in turn below.

# Designing the Experiment in Terms of Alternatives and Attribute Levels

“An experiment defined in scientific terms involves the observation of the effect upon one variable, a response variable, given the manipulation of the levels of one or more other variables” (Hensher, et al., 2005). This is a general definition that can be applied to any science or field of research and to any problem that involves a stimulus and a response.

In developing an experimental design, the first step is to specify the types of choice alternatives, the choice attributes, and the attribute levels to be included in the analysis. Consider the example of a binary choice where the respondent is asked to choose between driving or riding a bus based on the following attributes:

Travel time difference between the auto and the bus;
Cost difference between the auto and the bus in percentage;
Number of bus transfers.

In general, a minimum of three attributes is usually needed to provide a realistic context for the SP exercise. The attributes associated with a particular SP exercise should represent as much as possible those factors that are important in the choice process. Experience suggests that the number of attributes presented to a respondent should be limited to six or seven. Presenting respondents with more attributes makes the exercise increasingly difficult for respondents to deal with and may in some instances limit the usefulness of the data. (Note: Researchers tend to have different opinions in this regard, and some SP experiments have included over a dozen choice attributes—see Louviere, et al. (2000) for examples, and Jones and Bradley (2006) for further discussion.)

The classical approach, and the one most widely used and tested for building these experiments, establishes attribute levels for the explanatory variables and then uses these levels to build a design that is presented to the respondents.

The full factorial design that would result from all possible combinations of these levels usually contains too many choices. For instance, if one defines three levels for each of the attributes described in the example above, the full factorial design would result in 3 x 3 x 3 = 27 different treatment combinations, too many to be answered by one respondent (Table 21.1).

# Table 21.1 Full Factorial Design

A note should be given on the minimum number of levels to use for the attributes: a minimum of three levels is required to detect non-linear relationships between attributes and preferences. Therefore when non-linear relationships are thought to exist, at least three levels should be used.
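The full factorial enumeration described above can be sketched in a few lines of Python. This is only an illustration: the attribute names and level values are assumptions chosen for the auto/bus example, not values taken from the manual.

```python
from itertools import product

# Hypothetical attribute levels for the auto/bus example (illustrative only).
levels = {
    "time_diff_min": [-10, 0, 10],   # auto minus bus travel time, minutes
    "cost_diff_pct": [-30, 0, 30],   # auto versus bus cost difference, percent
    "bus_transfers": [0, 1, 2],
}

def full_factorial(levels):
    """Return every combination of attribute levels as a list of dicts."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

treatments = full_factorial(levels)
print(len(treatments))  # 3 x 3 x 3 treatment combinations
```

Adding a fourth three-level attribute would triple the number of treatments, which is why the reduction strategies that follow are usually needed.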

There are several ways to reduce the number of different treatment combinations required. These include the following:

Use “fractional-factorial” designs;
Remove options that will “dominate” or be “dominated” by all other options in the choice set;
Separate the alternatives into “blocks,” so that the full factorial design (or a larger fractional factorial) is completed, with different groups of respondents each responding to a different subset of options; and
Carry out a series of experiments with each individual, offering different attributes, but with at least one attribute common to all.

Use of a fractional factorial design - A fractional orthogonal design can be generated with many statistical software packages (SAS, SPSS, etc.). These programs search for a combination of the levels that results in zero correlation between the attributes, a property known as orthogonality, meaning independence between variables (Table 21.2). By guaranteeing that the attributes in the alternatives are uncorrelated, one helps to ensure that the effect of each attribute can be estimated as independently from the others as possible. (Recently, however, so-called “d-optimal” design approaches have been introduced that are more statistically efficient than orthogonal designs under certain conditions. See Rose et al. (2008) and Bliemer and Rose (2008) for more details.)

# Table 21.2 Fractional Orthogonal Design

Orthogonality among the attributes allows estimating the main effect of one variable on choice, independently of the effects that the other variables may have. For instance, in the Auto/Bus mode choice example, if the choices presented to the respondents always had the same level for Travel Time Difference and Cost Difference, meaning total collinearity between the two vectors, it would not be possible to estimate the main effect of each of these variables on choice, because one cannot tell whether the response is driven by one attribute or the other. (This problem tends to occur in RP data, as one does not control the attribute levels. Although perfect collinearity is very unlikely to appear in real-life data, correlations can sometimes be problematic; for instance, travel time and travel distance can be highly correlated.)
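As a rough illustration of orthogonality, the sketch below builds a nine-run fraction of the 3 x 3 x 3 design and checks that every pair of attribute columns is uncorrelated. The construction (third column equal to the sum of the first two, modulo 3) is a standard Latin-square device chosen here for illustration; it is not necessarily the design shown in Table 21.2, and the 0/1/2 level coding is an assumption.

```python
from itertools import combinations, product
from collections import Counter

def orthogonal_fraction():
    """Nine-run fraction of the 3x3x3 design, with levels coded 0, 1, 2.

    The third column is (a + b) mod 3, a Latin-square construction.
    """
    return [(a, b, (a + b) % 3) for a, b in product(range(3), repeat=2)]

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    return cov / (vx * vy) ** 0.5

design = orthogonal_fraction()
columns = list(zip(*design))
for i, j in combinations(range(3), 2):
    # Count how often each pair of levels occurs in columns i and j.
    pair_counts = Counter(zip(columns[i], columns[j]))
    print(i, j, pearson(columns[i], columns[j]), set(pair_counts.values()))
```

A fraction this small supports estimation of main effects only; it leaves no room to estimate interactions between the attributes separately.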

While the fractional factorial approach can significantly reduce the number of treatments needed for an SP exercise, it typically does so by ignoring some or all interaction effects. If interactions among attributes are, in fact, significant, their effects will be loaded onto the individual main effects, which will bias the estimate of the relative importance of individual attributes on response. The degree of bias will depend on the significance of the interaction effects. If this bias occurs, the main effects are said to be “confounded” with interaction effects. If interactions are expected to be important (e.g., the effect of real-time information for transit services may be highly related to the level of service frequency or waiting time), then a fractional design should be selected that allows unbiased estimation of those specific interaction terms.

Removing Dominant/Dominated Options - This approach applies primarily to SP exercises presented as choice experiments. With this approach, those choice alternatives that dominate or are dominated in each attribute by every other alternative included in the choice set can be excluded. The only potential drawback with this approach is that any respondents choosing alternatives at random or illogically will not be easily identified based on an analysis of their responses. Note that this approach can disrupt the orthogonality of the statistical design and introduce correlations between parameter estimates.
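A minimal sketch of this screening step follows. It assumes that lower values are better for every attribute (time, cost, transfers) and uses made-up choice sets; the function names are hypothetical.

```python
def dominates(a, b):
    """True if alternative a is at least as good as b on every attribute
    and strictly better on at least one. Assumes lower values are better."""
    return all(a[k] <= b[k] for k in a) and any(a[k] < b[k] for k in a)

def is_trivial(choice_set):
    """True if one alternative dominates, or is dominated by, all the others."""
    for a in choice_set:
        others = [b for b in choice_set if b is not a]
        if all(dominates(a, b) for b in others):
            return True
        if all(dominates(b, a) for b in others):
            return True
    return False

# Hypothetical (auto, bus) choice sets described by time, cost and transfers.
choice_sets = [
    [{"time": 20, "cost": 3.0, "transfers": 0},
     {"time": 35, "cost": 4.0, "transfers": 1}],   # auto wins on everything
    [{"time": 20, "cost": 5.0, "transfers": 0},
     {"time": 35, "cost": 2.0, "transfers": 1}],   # a genuine trade-off
]
kept = [cs for cs in choice_sets if not is_trivial(cs)]
print(len(kept))
```

Only the second choice set survives, because it forces the respondent to trade time against cost.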

Block Design - Another approach involves dividing the total number of treatment combinations included in an experimental design into sub-sets (or blocks). The sample of respondents is randomly divided into groups, with each group receiving a different block. This approach can be implemented by including the block number as an additional “attribute” in the design so that block membership is orthogonal to the choice attribute levels. In that way, each respondent will face all of the levels of the various choice attributes in a balanced way, which increases the efficiency of parameter estimation. Research by Hess, et al. (2008) has shown that the use of proper blocking of a large fractional factorial design is very important to ensure efficient parameter estimation.
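The idea of treating the block number as an extra orthogonal attribute can be sketched as follows. The modulo-3 blocking rule and the 0/1/2 level coding are illustrative assumptions, not the procedure used in any particular study.

```python
from itertools import product
from collections import Counter

def blocked_full_factorial():
    """Split the 27-run 3x3x3 design (levels coded 0, 1, 2) into three blocks.

    The block number (a + b + c) mod 3 behaves like a fourth attribute
    that is orthogonal to the three choice attributes.
    """
    blocks = {0: [], 1: [], 2: []}
    for a, b, c in product(range(3), repeat=3):
        blocks[(a + b + c) % 3].append((a, b, c))
    return blocks

blocks = blocked_full_factorial()
for number, runs in blocks.items():
    # Count how often each level of each attribute occurs within the block.
    balance = [Counter(column) for column in zip(*runs)]
    print(number, len(runs), balance)
```

Each block contains nine treatments, and within each block every level of every attribute appears the same number of times, so no group of respondents sees only the high or only the low levels of an attribute.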

Common Attributes - With this approach the attributes to be evaluated are divided among two or more experimental designs. At least one common attribute must appear in each design to allow comparison of relative preferences over all the attributes included. In practical examples, cost is often used as a common “linking” variable, since it has a metric, quantitative meaning that tends to be transferable across choice contexts. The question of how best to divide the attributes into different sets should be answered mainly from the respondents’ point of view: which variables make the most sense to trade off against one another. For example, if one wishes to evaluate many transit service attributes, then one could separate out the attributes related to the station/stop from those related to the trip inside the vehicle, using fare as a common linking attribute.

When generating choice sets via a factorial design, some alternatives that are generated may not be plausible and may affect the respondents’ confidence in the survey. For instance, a respondent might find a “+60%” cost difference between the auto and the bus mode to be strange and unrealistic when he usually drives only a few minutes from home to work and has no parking expenses. The respondent may try to imagine that situation, but his routine experience may keep him from understanding the hypothetical situation.

A common solution is to create designs that are built (“pivoted”) around the actual reported experiences of the respondents (Hensher, 2004). This is done by using information gathered in an earlier stage, where the respondent is asked about his actual experience and behavior (an RP observation), characterized by the same attributes that are later used to specify the alternatives. Rose et al. (2008) state that “The use of a respondent’s experience, embodied in a reference alternative, to derive the attribute levels of the experiment has come about in recognition of a number of supporting theories in behavioral and cognitive psychology, and economics, such as prospect theory, case based decision theory and minimum regret theory.” (They also warn, however, that care should be taken when using such customized designs in combination with d-optimal design approaches.)
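A pivoted design can be sketched as percentage shifts applied to the respondent's reported trip. The function name, the reported values and the percentages below are all hypothetical; real studies choose the pivot ranges attribute by attribute.

```python
def pivot_levels(reported, deviations):
    """Build attribute levels as percentage shifts around reported values.

    reported:   dict of attribute -> value from the respondent's actual trip
    deviations: dict of attribute -> list of percentage changes to apply
    """
    return {
        attr: [round(value * (1 + pct / 100), 2) for pct in deviations[attr]]
        for attr, value in reported.items()
    }

# Hypothetical respondent: a 30-minute trip costing 4.00 currency units.
reported_trip = {"travel_time_min": 30.0, "cost": 4.0}
# Hypothetical pivot percentages around the reported values.
deviations = {"travel_time_min": [-20, 0, 20], "cost": [-25, 0, 25]}

pivoted = pivot_levels(reported_trip, deviations)
print(pivoted)
```

Because the levels scale with each respondent's own trip, a short commute is never paired with implausibly large time or cost differences.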

# Choosing the Survey Media

Unless the SP exercise is very simple, some sort of visual presentation of the alternatives and attribute levels will be necessary to allow respondents to understand what is being presented to them. This is particularly true for choice and rating exercises, in which the respondent must compare two or more alternatives. This limits the usefulness of telephone interviews, unless the respondent has received survey materials in advance. Reading a large set of variables over the telephone would make it impossible for the respondent to memorize and compare the alternatives.

The format and layout of the instrument used for the exercise will depend to some extent on the type of response sought (i.e., choice, ranking or rating). For choice exercises, respondents will be comparing two or more alternatives at the same time. The alternatives comprising the choice set should appear together on a card, sheet of paper or computer screen. For ranking exercises, having each alternative on a separate card is very useful, since this approach allows the respondent to spread them out and physically arrange them in their order of preference; however, with more modern software this can also be done on a computer screen. With rating data, it is usually only necessary to consider one alternative at a time independently from other alternatives. Therefore, a wide range of layouts is possible for these responses.

It is always useful and in some cases essential (e.g., when respondents are expected to complete the exercises on their own) to provide materials describing the alternatives, attributes, and attribute levels included in the exercise. This could include drawings or pictures of new travel modes (e.g., high-speed trains) or sample schedules and route maps for new transit services.

When SP designs are customized based on respondents’ actual reported choice situations, this approach can be handled most efficiently using computer-based technology, because customized branching can obtain a clearer picture of each distinct respondent’s choices, and realistic alternative scenarios can then be constructed to understand the respondent’s behavior. Although the survey itself is simple and straightforward for the respondent, there is significant behind-the-scenes programming used to resolve this complexity. The ability to survey respondents effectively using sophisticated methods allows the researcher to obtain the critical data he or she needs while making the survey experience simple and clear for the respondent.

One important advantage of computer-based surveys is the ability to immediately geocode the respondent’s origin and destination, and search databases to obtain realistic attribute levels for the O-D pair, allowing the construction of more realistic SP experiments for the respondent later in the survey (TRB, 2006).

For many transit researchers, survey services are a very good solution to develop a survey at low cost and to learn firsthand about web-based surveys and how the process works. However, researchers often find that online services and generic survey software do not meet their needs. For example, longitudinal surveys cannot be created that track one respondent over time using such tools. Nor can SP surveys for mode choice studies be produced effectively using less expensive online survey services, although there is much more expensive software that does allow for advanced online mode choice surveys to be created. Features such as online geocoding and linking transit schedules are typically not incorporated into these surveys. Advanced validation cannot be accomplished, as these tools are not capable of, for example, comparing a zip code with a data table of zip codes to confirm if a respondent’s answer is an existing zip code or not. For general market research purposes, the most popular software for computer-based SP surveys is sold by Sawtooth Software. In the field of travel demand modeling, however, SP surveys are typically designed and fielded by consulting firms, often in collaboration with survey firms. In the US, Resource Systems Group has been conducting web-based SP surveys of travel behavior since the mid-1990’s.

# Defining the Context for the Exercise

A key objective in the design of SP exercises is to establish as much realism as possible. The following points noted by Jones (1989) are particularly relevant to building realism into the context of the exercise, the options that are presented and the responses that are permitted:

Focus on very specific rather than general behavior, i.e., ask respondents how they would respond to a particular product or service under a specific set of conditions rather than in general;
Use a realistic choice context that respondents have actually experienced or one that they feel they could be placed into;
Use existing or realistic levels of attributes within the experimental design so that the alternatives are built around these levels;
Limit the range over which attribute levels are varied to those values that respondents perceive to be possible;
Wherever possible, incorporate checks on the answers given;
Allow for the effect of day-to-day variability on choices;
Make sure that all variables relevant to the choice process are included in the analysis;
Where possible, simplify the presentation of choice exercises (e.g., by highlighting the attribute levels that are different between alternatives);
Make sure that constraints on choice are taken into account (e.g., fixed arrival times at work); and
Allow respondents to opt for a response outside the set of the experimental alternatives (e.g., if all alternatives in a mode choice exercise are too expensive, the respondent may choose not to make the trip, so “neither” should be included as a possible response).

Because of the nature of this type of survey, where the respondent is asked to state his or her action according to attributes of alternatives that he or she has not experienced, it is extremely important to word the questions carefully so as not to implicitly induce a specific answer.

The FHWA Travel Survey Manual (1996) provides some examples of confusing SP survey-related questions to avoid, and how they can be improved upon:

Questions Outside Respondent’s Experience:

Problem: “The agency is considering building a rail transit system similar to the one in Washington, DC.”

Improvement: “The agency is considering building a rail transit system.”

Technical Terms:

Problem: “Did you use an HOV lane for any part of your work trip?”

Improvement: “For any part of your trip from home to work, did you use a carpool lane that requires autos to have more than two people in them?”

Uncommon Idiom:

Problem: “With which mode did you make the trip?”

Improvement: “How did you get there?” List of modes provided by interviewer or questionnaire.

Omit Names of Alternatives:

Problem: “Under these circumstances would you choose to take the maglev system described above or would you choose to take the other alternative?”

Improvement: “Under these circumstances, would you choose to take choice A or choice B?”

Vary Descriptions of Alternatives:

Problem: An SP question refers to a two-page description of a proposed new mode developed by the equipment manufacturer, and asks respondents to select between it and the mode they use now for different combinations of travel times and costs.

Improvement: The description of the new mode should be minimized and well-balanced with positive and negative attributes. All alternatives should receive similar descriptions.

Link Personalities to Questions:

Problem: “Governor Williamson has proposed increases in transit service in the Mudville area. How do you feel about this proposal? Do you strongly agree, agree, disagree, or strongly disagree with it?”

Improvement: “How do you feel about the proposal to increase transit service in the Mudville area? Do you strongly agree, agree, disagree, or strongly disagree with it?”

Link Institutions to Questions:

Problem: “Please rate the bus service offered by the public transit agency, City Transit: excellent, good, fair, or poor?”

Improvement: “Please rate the bus service in your area. Is it excellent, good, fair or poor?”

# Designing the Sampling Plan

The same sampling issues associated with other surveys also apply to SP data. The difference with SP surveys is that each respondent typically provides responses to more than one choice exercise (rather than the single response typically obtained in an RP survey). For example, if 50 respondents each complete 5 choice exercises, this would result in 250 data records. It is important to note that even with 250 responses, the sample size from the standpoint of assessing statistical precision is still 50. The fact that there are five data records for each respondent (i.e., five “repeated measures”) provides more information about each respondent, but not necessarily more about the population as a whole. Only an adequately sized random sample can do this. However, because SP survey sample sizes tend to be smaller than the sample sizes that are typical for RP household travel surveys (500-1500 respondents for a typical SP survey, as compared to 3,000-15,000 households for a typical regional travel survey), purely random sampling is rarely adequate, and it is often necessary to set quotas for specific sub-samples to ensure that there is sufficient data to estimate separate models and/or parameters for key market segments. In practice, sample size quotas or targets are often set for variables such as trip purpose (e.g., commute, non-commute), time of day (e.g., AM peak, PM peak, off peak), actual mode used (e.g., auto, transit), and vehicle occupancy (e.g., drive alone, drive with passenger(s)). Obviously, the sample size and quotas to be used in any specific context need to be tailored to the research purpose, the study area, and the survey budget.
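The point about repeated measures can be illustrated with the standard design-effect approximation for clustered observations. This formula is not taken from the manual, and the within-respondent correlation `rho` is an assumed quantity: setting it to 1 reproduces the conservative view above that only the 50 respondents count, while 0 would treat all 250 records as independent.

```python
def effective_sample_size(respondents, tasks_per_respondent, rho):
    """Approximate effective sample size for repeated choice tasks.

    rho is the assumed correlation between responses from the same person.
    The design effect is 1 + (tasks_per_respondent - 1) * rho.
    """
    records = respondents * tasks_per_respondent
    design_effect = 1 + (tasks_per_respondent - 1) * rho
    return records / design_effect

for rho in (0.0, 0.5, 1.0):  # illustrative values, not estimates
    print(rho, effective_sample_size(50, 5, rho))
```

In practice the true value lies somewhere between the two extremes, which is why the number of respondents, not the number of records, is the safer basis for judging precision.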

# Validity of Stated-Preference Results

A concern often voiced about the use of SP data is that people do not necessarily do what they say they will do. Therefore a key issue associated with SP data is validity. Pearmain, et al. (1991) have reviewed a number of studies in which the validity of predictions of choice behavior based on SP techniques was investigated. Based on this review, they concluded that the results of most of these studies seemed encouraging, suggesting that SP techniques can predict choice behavior for the sample being studied with a reasonable degree of accuracy. However, they noted that most of the reported studies of validity had the following shortcomings:

The research was not done in a systematic way;
The research was carried out as a by-product of a practically-oriented study;
Some of the studies were based on incorrectly applied prediction methods; and
Typically the reported research only concerned the reproduction of existing behavior of the sample being studied; few studies deal with the generalization of predictions to entire populations, and very few look at the ability to predict behavioral changes in response to changed circumstances.

They concluded that additional systematic validity research is needed before definitive findings and general guidelines can be given. And, indeed, much validation research has been done since the early 1990’s, some of it reported by Louviere, et al. (2000). It is difficult to make general statements about the validity of SP research, however, because the validity of any particular study depends mainly on the quality and care with which the experiment is conceived, designed, and administered. This is partly a science, but also an art to some extent, and often there is no substitute for experience when it comes to surveys and market research. Therefore, it is very difficult to design and carry out meaningful SP research without obtaining at least some advice and input from persons and/or firms who have experience with this specific type of survey.

It is sometimes claimed that SP methods tend to produce mode-specific biases (alternative-specific constants) for new transit modes that are unrealistically high, particularly when they are divided by the time or cost coefficients and interpreted in terms of dollars or minutes advantage. One possible reason for “optimistic” forecasts of transit use with SP methods is “non-commitment” bias—the fact that respondents in hypothetical situations can imagine the attractive features of having a new transit alternative available, and can indicate that they would use the new service without having to actually commit themselves to the inconvenience and uncertainty of shifting to a new mode of travel. In actual situations, people may find it more difficult to change their habitual travel patterns. This potential problem with SP methods has long been recognized, and has been a major impetus for improvements in survey and design methods to make the experiments as realistic as possible. As described earlier, this can be done by customizing the choice context and attribute levels to mirror trips that respondents have actually made, and by prompting respondents to recall any choice constraints or circumstances of their actual trip that would make it difficult for them to change to another mode. Even with the most realistic, customized surveys, however, it is likely that some potential for non-commitment bias still exists. Furthermore, combined analysis of SP data with RP data, while useful in many ways, does not address this particular issue, because the RP data does not provide any information about modes that do not already exist.
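The conversion of a mode-specific constant into minutes or dollars mentioned above is a simple ratio of coefficients. The sketch below uses made-up coefficient values purely for illustration; the function name is hypothetical.

```python
def constant_as_advantage(asc, coefficient):
    """Express an alternative-specific constant in the units of an attribute.

    With a negative time (or cost) coefficient, a positive constant is worth
    the same utility as a saving of -asc / coefficient minutes (or dollars).
    """
    return -asc / coefficient

# Hypothetical utility coefficients for a new transit mode (illustrative only).
asc_new_mode = 0.60   # utility units
beta_time = -0.03     # utility per minute of travel time
beta_cost = -0.20     # utility per dollar of fare

print(constant_as_advantage(asc_new_mode, beta_time))  # equivalent minutes
print(constant_as_advantage(asc_new_mode, beta_cost))  # equivalent dollars
```

With these assumed values the constant is worth roughly a 20-minute time saving or a 3-dollar fare reduction. Expressing the constant this way makes it easier to judge whether an estimated bias is implausibly large.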

One possible method for investigating this issue further is the “cheap talk” method, tested by Lu, Fowkes and Wardman (2006). In an SP survey to measure the demand for improved rail service, the authors included the following instructions (so-called “cheap talk”) for a random subsample of the respondents: “Previous surveys have sometimes found that people say they would be happy to pay extra for improved trains but when the fare is raised and the improved trains are provided, people say they would prefer the cheaper fare with the old trains. Bearing this in mind, as you read through the following choices, please imagine you will actually have to pay the fare stated.” (This particular example was not for a completely new rail mode, but one can imagine analogous text for that situation.) The authors found that including this additional text did significantly influence the results, lowering the estimated constant term for shifting to the improved type of train.

In general terms, it would seem that researchers should make more use of the split-sample approach described in the preceding example. By including well-designed variations in the survey protocol and instructions for different subsamples of respondents, one can measure the influences that certain aspects of the hypothetical choice context have on the respondents. Although one cannot eliminate biases in this way, one can at least gain some idea of their possible magnitude, and then select the most conservative result for forecasting.
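As a rough illustration of how a split-sample comparison might be summarized, the sketch below compares the alternative-specific constant estimated on a control subsample with the one estimated on a subsample that received modified instructions. The numbers are hypothetical, the function name is invented for this example, and the test assumes the two subsamples were drawn at random and modeled separately, so that the two estimates are independent.

```python
import math

def constant_difference_test(b_control, se_control, b_treated, se_treated):
    """Return the difference between two independently estimated constants
    and its z-statistic (difference divided by its standard error)."""
    diff = b_treated - b_control
    se_diff = math.sqrt(se_control ** 2 + se_treated ** 2)
    return diff, diff / se_diff

# Hypothetical constants for the new mode (utility units) and their standard errors
diff, z = constant_difference_test(b_control=0.80, se_control=0.15,
                                   b_treated=0.35, se_treated=0.15)

# The more conservative choice for forecasting is the smaller constant
conservative = min(0.80, 0.35)

print(f"difference = {diff:.2f}, z = {z:.2f}, conservative constant = {conservative:.2f}")
```

A z-statistic well beyond about 2 in absolute value would suggest that the change in instructions had a measurable effect on the estimated constant.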

Another validity issue for SP surveys arises in the context of road pricing policies—the possibility that some respondents will state their choices as a sort of protest against the policy of introducing new tolls, regardless of the toll level. This is a type of strategic response bias that is analogous to the positive bias described for transit SP studies above. In this case, however, the bias will tend to be negative, with SP respondents perhaps less receptive to using tolled facilities in the hypothetical contexts than they would be in reality. It may also be more difficult to avoid or measure using a method such as the “cheap talk” approach described earlier, as it would seem more difficult to influence what tends to be a political or attitudinal phenomenon using simple verbal instructions. It may be more effective, in fact, to include a series of attitudinal questions to identify respondents who are most fervently opposed to the introduction of tolls, and then to estimate a different bias constant for such individuals. The difference between that estimate and a bias constant estimated on the remaining part of the sample would give an estimate of the size of this strategic bias.

# Combining Stated- and Revealed-Preference Data

The results of choice-oriented SP techniques are analogous to RP choice data collected as part of travel surveys. This gives rise to the possibility of combining these two types of data for model development and forecasting (Ben-Akiva, et al. 1994). One approach would be simply to pool these two types of data. It has been shown, however, that this naïve pooling of SP and RP choice data can lead to seriously biased models. The key problem, noted by Bates (1988), Bradley and Kroes (1992), and others, is that these two types of data are subject to different types of errors, making it unlikely that they share a common distribution of unobservables.

Specifically, SP choice experiments use carefully controlled scenarios to eliminate, as much as possible, the influence of extraneous variables so that the choice can be analyzed only as a function of the specific attributes and covariates that the researcher intends. That means that SP data tends to have a much higher “signal to noise ratio” than RP data, where the researcher has no control over the choice context. When estimating discrete choice models (DCMs), however, the scale of the coefficients is directly proportional to the amount of explained variation relative to the residual, unexplained variation (i.e., the “signal to noise ratio”). This means that models based on SP data tend to have coefficients with higher scale relative to RP-based models. In a predictive context, this means that SP-based models will tend to be more sensitive, and have higher elasticities. Therefore, although the relative values of SP-based parameter estimates are suitable for forecasting, the absolute levels may not be. It is advisable to re-scale SP-based parameter estimates using RP data before a model is used in forecasting. If the SP model includes a new alternative, then obviously there is no RP data for that particular alternative. However, there may be RP data available for choices among two or more existing alternatives, and that data is usually adequate for re-scaling purposes.
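The scale argument can be written out compactly. The notation here is an assumption introduced for illustration and does not come from the original text: $\beta$ is the underlying vector of taste parameters (assumed to be the same in both data sets), $\sigma_{RP}$ and $\sigma_{SP}$ are the standard deviations of the unexplained utility in each data source, and $\mu$ is the relative scale.

$$
\hat{\beta}^{RP} \propto \frac{\beta}{\sigma_{RP}}, \qquad
\hat{\beta}^{SP} \propto \frac{\beta}{\sigma_{SP}}, \qquad
\mu = \frac{\sigma_{RP}}{\sigma_{SP}}
\;\;\Rightarrow\;\;
\hat{\beta}^{SP} \approx \mu \, \hat{\beta}^{RP}
$$

If the SP data are less noisy, then $\sigma_{SP} < \sigma_{RP}$ and $\mu > 1$, so every SP coefficient is inflated by roughly the same factor. Ratios of coefficients are unaffected, because $\mu$ cancels:

$$
\frac{\hat{\beta}^{SP}_{\text{time}}}{\hat{\beta}^{SP}_{\text{cost}}}
\approx
\frac{\mu \, \hat{\beta}^{RP}_{\text{time}}}{\mu \, \hat{\beta}^{RP}_{\text{cost}}}
=
\frac{\hat{\beta}^{RP}_{\text{time}}}{\hat{\beta}^{RP}_{\text{cost}}}
$$

Under these assumptions, re-scaling amounts to dividing the SP-based coefficients by an estimate of $\mu$ obtained from RP data.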

A number of approaches have been developed to combine SP data and RP data for model estimation in a way that accounts for differences in error components. A sequential estimation procedure, described in Ben-Akiva and Morikawa (1990), can be carried out using readily available software. A more statistically efficient simultaneous approach has also been developed, but it requires specialized software. This simultaneous approach has been adapted to use a form of nested logit estimation that is possible with existing software packages (Bradley and Daly, 1994).

# Discrete Choice Models (DCMs) and the Willingness to Pay (WTP)

As noted earlier, DCMs have been used extensively for transport mode choice studies. These may be calibrated with RP and/or SP data. The objective of this chapter on SP surveys is not to explain DCMs; for that, we recommend Ben-Akiva and Lerman’s (1985) Discrete Choice Analysis and Train’s (2002) Discrete Choice Methods with Simulation, which are recognized as two of the best textbooks for understanding these models. DCMs may be applied to all consumer choices, and in fact research on them started in the psychology and marketing fields before they were used in engineering.

One of the most interesting by-products of DCMs is the possibility of extracting trade-offs among the attributes of the alternatives. Among these trade-offs, one of the most important is the willingness to pay (WTP): the amount of money that respondents are willing to pay in order to obtain some benefit. In linear utility specifications, WTP measures are calculated simply by dividing one parameter estimate by another. At least one of the parameters has to be measured in monetary units in order to produce a financial indicator.

One important WTP measure is the value of travel time savings (VTTS), where the travel time parameter is divided by the cost parameter, producing a trade-off between the two measured in monetary units per unit of travel time. Because the utility is linear and compensatory in its parameters, the value of this ratio measures the amount of money an individual is willing to pay in order to save one unit of time spent travelling, with all other attributes held constant.

$$
\mathrm{VTTS} = \frac{\beta_{\text{time}}}{\beta_{\text{cost}}} \qquad \text{(Eq. 1)}
$$

These measures are very useful for pricing the use of road infrastructure or for measuring the value of non-numerical attributes such as air or water quality. The latter is extremely important for measuring the value of environmental externalities and incorporating them in a cost-benefit analysis. One should note that both parameters must be statistically significant in order to produce accurate measures of the VTTS.
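As a minimal sketch of the VTTS ratio in Eq. 1, the example below divides a travel time coefficient by a cost coefficient and converts the result to dollars per hour. The coefficient values, the units (utility per minute and utility per dollar), and the function name are assumptions made for illustration and are not estimates from any study cited here.

```python
def value_of_time(beta_time_per_min, beta_cost_per_dollar):
    """Return the value of travel time savings in dollars per hour,
    given linear-utility coefficients for time (per minute) and cost (per dollar)."""
    if beta_cost_per_dollar == 0:
        raise ValueError("cost coefficient must be non-zero")
    dollars_per_minute = beta_time_per_min / beta_cost_per_dollar
    return dollars_per_minute * 60.0

# Hypothetical coefficients; both are negative because time and cost reduce utility
beta_time = -0.05   # utility per minute of travel time
beta_cost = -0.25   # utility per dollar of cost

vtts = value_of_time(beta_time, beta_cost)
print(f"VTTS = ${vtts:.2f} per hour")  # about 12 dollars per hour
```

The same ratio applies to any other attribute, for example dividing a waiting time coefficient by the cost coefficient to obtain a value of waiting time.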

Another example of an interesting WTP measure is the value of the time waiting for the bus or metro service. When comparing this with the value of travel time, one often observes a statistically, and practically, significant difference where waiting time is more costly than moving time. Other times, the number of transfers attribute may be shown to the respondent, resulting in an estimate of the average monetary value for this movement inside a terminal.

In several European countries, SP methods have been used to derive measures of the monetary values of various types of travel time to be used in cost-benefit analysis and other forms of economic evaluation—see Bates, et al. (1987) and Bradley and Gunn (1991), for example. Methods have also been developed to explicitly estimate the parameters of a log-normal distribution of VTTS across the population (Ben-Akiva, et al. 1993), and these methods are now more accessible with the introduction of software packages to perform “mixed logit” estimation.
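When VTTS is treated as log-normally distributed across the population, the mean and the median differ, which matters when a single value is carried into an economic evaluation. The short sketch below shows the relationship, assuming that the natural logarithm of VTTS is normally distributed with mean `mu` and standard deviation `sigma`. The parameter values and the function name are hypothetical.

```python
import math

def lognormal_summary(mu, sigma):
    """Median and mean of X when ln(X) ~ Normal(mu, sigma^2)."""
    median = math.exp(mu)
    mean = math.exp(mu + 0.5 * sigma ** 2)
    return median, mean

# Hypothetical distribution: median VTTS of 10 dollars per hour, sigma of 0.8
median, mean = lognormal_summary(mu=math.log(10.0), sigma=0.8)
print(f"median = {median:.2f}, mean = {mean:.2f}")
```

Because the log-normal distribution is skewed to the right, the mean always exceeds the median, and the gap grows with `sigma`.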

# Examples

As mentioned above, study of the demand for public transportation has traditionally been the most common applied context for SP methods, particularly in Europe, but also in other parts of the world. Many of these studies have focused on the demand for (and willingness to pay for) improvements to existing transit services. The studies can include a large number of service attributes, including not only basic service levels such as fare, journey time, frequency, and number of transfers, but also various physical amenities in the vehicles and station environments, ticket purchase options, differences between classes of service, and a variety of other service features. Another common use of SP methods has been to predict the demand for a new type of public transportation service, for instance a new rail line where none exists at present. SP experiments have been used for many such projects, including major projects such as Eurotunnel and other high-speed rail corridors around the world.

The first example below was administered as part of the Seattle Household Activity Survey, carried out for the Puget Sound Regional Council (PSRC) and Washington State DOT in 2006 by MORPACE International, teamed with Cambridge Systematics and Mark Bradley Research and Consulting. Full documentation is available in the “PSRC 2006 Household Activity Survey Analysis Report” (https://www.psrc.org/sites/default/files/finalreport_2006.pdf). This example used the phone-mail-phone survey approach that is typical for household travel surveys. Respondents who reported a trip in a relevant transit corridor as part of their travel diary data were then sent a customized follow-up SP questionnaire with hypothetical mode choice scenarios. They were then contacted once again by telephone to retrieve their answers. The introductory text and an example choice scenario are shown below. Note that the choice scenarios included an auto alternative and two different transit alternatives, bus and rail. Also note that, in addition to typical time and cost variables, the experiment includes a seat availability variable and a service reliability variable.

Another mode choice example from a web-based survey is shown below. This example was from a 2006 study carried out by Resource Systems Group (RSG) to model mode choice between JFK airport and lower Manhattan in New York City. Note how the use of graphics and shading can make the choice scenarios clearer and more attractive for respondents.

Another common use of SP methods is to study road pricing for funding road construction and reducing congestion. In the U.S. over the last decade, this has been by far the most common context for using SP methods. The types of pricing most often depicted in the studies are new tolled lanes alongside existing lanes, such as high occupancy toll (HOT) lanes, or else totally new highways on which all lanes are tolled. A few SP studies have also looked at downtown area pricing and cordon-based pricing.

The first example shown below was carried out as a follow-up to the 2006 PSRC Household and Activity Survey, the same as for the mode choice example above. Respondents who had reported a trip in a relevant highway corridor were selected to participate in a follow-on Stated Preference survey. Customized SP scenarios were created based on the reported trip and mailed to respondents. A sample SP scenario is shown below. In addition to the toll and travel time variables, which are included in all SP experiments of this type, this experiment has two additional variables of interest:

Distance traveled: Because the free route may be an entirely different road than the tolled route, there may be a significant difference in terms of distance. In typical RP data, distance is so highly correlated with travel time that it is not feasible to estimate separate time and distance coefficients. This SP data allows us to estimate such an effect.
Reliability of travel time: Here, a significant extra delay was defined as “more than 15 minutes late” (beyond the usual travel time), and the scenarios were varied in terms of how often such a delay occurs, allowing us to estimate the effect of the frequency of delay.

A second example is from a study carried out for San Francisco County (SFCTA) to study the possibility of implementing cordon pricing around specific areas of Downtown San Francisco. An SP survey was carried out in 2007 to aid in modeling the effects of such a policy and setting effective levels of cordon charge to influence traffic levels at different times of day. Auto travelers to downtown were intercepted and participated in a web-based SP interview. The web-based experiment was designed and carried out by Resource Systems Group (RSG). An example choice screen from the survey is shown below.

In contrast to the previous SP example from Seattle, this experiment did not include a non-tolled auto alternative, because, in the context of cordon pricing, that would mean not traveling to downtown San Francisco at all. (Additional survey questions about that possibility were asked after the SP questions.) However, a transit option was included, because transit to downtown San Francisco is a very viable alternative, and because part of the stated reason for cordon pricing would be to provide funding to maintain and improve transit services. For the auto alternatives, the variables used for this study were similar to those used for the Seattle SP described in the previous section, except that:

The definition of the peak period used for a given respondent was customized based on their actual departure time, and the duration and timing of the peak pricing period was varied across respondents, allowing a more detailed analysis of departure time shifting behavior.
For a given respondent, the effect of reliability was measured by fixing the frequency of delay and varying the length of the delay across the alternatives—the opposite of how it was presented in the Seattle SP. (Note that frequency was varied randomly across respondents, with ‘1 out of 10 trips’ used for half of the sample and ‘1 out of 5 trips’ used for the other half.)
All three auto alternatives involve using the same route, so there is no difference in distance.

A third example below is from an SP survey on HOT lane options carried out in the Minneapolis region to study the use of the MnPASS dynamically priced managed lane. SP questions were developed to measure willingness to pay for use of the HOT lane. SP tradeoff questions were asked of all respondents who reported making a reference trip as a solo driver on I-394. An interesting feature of this SP experiment is that it was carried out completely via telephone with no visual aids. This was possible because the choice context was very simple, with just two choice alternatives and two attributes: travel time and toll. The tradeoff questions were asked using the wording shown below. The value T in brackets was the time reported by the respondent as the fastest reasonable time in which they could make the trip if there were no congestion. The values X and Y were varied using an experimental design, with several sets of values used per respondent to reflect different time/cost tradeoffs. For the last series of questions, the time savings Y was held constant and the toll level X was varied adaptively to find the point where the respondent would switch between using the toll lane and the free lane. This is an adaptation of the “transfer price” method described earlier in this chapter, and provides an individual-level estimate of the willingness to pay for each respondent. For more information on this study, see Bradley and Zmud (2006).

Now assume you’re making the same trip in the future that you just told me about. It’s a trip on the same day of the week, at the same time of day, for the same purpose, and you’re under the same time pressures. You enter the freeway, I-394, and find out that you can make this trip using a toll lane and paying via electronic toll collection if you want to.

If you were to use the general traffic lanes on I-394, your trip would take [T + Y minutes] and be free. If you were to use the toll lane you would pay $X and your trip would take [T minutes], saving Y minutes. Now under these conditions, which would you choose to do?

Use the toll lane, pay $X and save Y minutes

Use the general lane for free.
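The adaptive variation of the toll X with the time saving Y held constant can be sketched as a simple search for the switching point. The source does not describe the exact rule used in the MnPASS survey, so the bisection below is only one plausible scheme. The function name, the toll bounds, the stopping tolerance, and the simulated respondent are all hypothetical.

```python
def find_switch_toll(would_pay, low=0.0, high=10.0, tolerance=0.25):
    """Bisect on the toll until the switching point is bracketed within `tolerance`.

    `would_pay(toll)` returns True if the respondent chooses the toll lane.
    Assumes the respondent would pay at `low` and would refuse at `high`.
    """
    while high - low > tolerance:
        toll = (low + high) / 2.0
        if would_pay(toll):
            low = toll    # toll lane still chosen: switching point is higher
        else:
            high = toll   # free lane chosen: switching point is lower
    return (low + high) / 2.0

# Simulated respondent whose (hypothetical) maximum acceptable toll is $3.40
estimate = find_switch_toll(lambda toll: toll <= 3.40)

# Individual-level value of time, assuming a fixed time saving of Y = 10 minutes
minutes_saved = 10.0
implied_vtts_per_hour = estimate / minutes_saved * 60.0

print(f"switching toll = ${estimate:.2f}, implied VTTS = ${implied_vtts_per_hour:.2f}/hour")
```

In a telephone interview each call to `would_pay` corresponds to one tradeoff question, so a coarser tolerance keeps the number of questions small.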

A third example context for SP research is the choice of where to park. Parking price and supply can be some of the most influential policy levers to influence travel behavior. The first example below is from a study of using on-campus parking versus using a shuttle bus from remote parking near a university in Mexico City. This survey was carried out during personal interviews with students and staff who were intercepted on campus, and the choice situations were presented on individual cards. Note the use of graphics to help clarify the meaning of the attributes.

The second parking example below is also for a choice between on-site parking and remote parking with a shuttle bus, and was also carried out during interviews with people intercepted on-site, this time with visitors to Muir Woods in the Bay Area.

# Conclusions

This chapter on Stated Preference surveys examines how to build experiments which allow for calibration of DCMs to better appreciate and anticipate travel behavior, particularly in the presence of new alternatives. Due to the stated nature of the responses (rather than verifiable, revealed behavior), it is very important to build the right context for the experiment and ask the right questions.

While demand model specification and estimation was not the focus of this Chapter, the usefulness of such SP data for DCM calibration was demonstrated, via calculation of trade-offs. One of these trade-offs is the willingness to pay in order to save travel time, a key measure in enhancing existing transport services and designing new ones.

# References

Bates, J., (1988). Econometric Issues in Stated-Preference Analysis. Journal of Transport Economics and Policy, XXII(1), 59-69.

Bates, J.J., M. Bradley, M. Wardman, A. Fowkes, H. Gunn and several others. (1987). The Value of Travel Time Saving. A report of research undertaken for the U.K. Department of Transport. Policy Journals, Newbury, UK.

Ben-Akiva, M. and Morikawa, T., (1990). Estimation of Switching Models from Revealed-Preferences and Stated Intentions. Transportation Research 24A(6), 485-495.

Ben-Akiva, M. E. and Lerman, S. R., (1985). Discrete choice analysis: theory and application to travel demand. Cambridge, MA: MIT Press.

Ben-Akiva, M., D. Bolduc, and M. Bradley. (1993). Estimation of travel choice models with randomly distributed values of time. Transportation Research Record 1413: 88-97. Transportation Research Board, Washington, D.C.

Ben-Akiva, M., M. Bradley, T. Morikawa, J. Benjamin, T. Novak, H. Oppewal and V. Rao. (1994). Combining revealed and stated preferences data. Marketing Letters 5(2): 335-349. Springer, Amsterdam.

Bierlaire, M. (1997) Discrete Choice Models. Available from: http://roso.epfl.ch/mbi/papers/discretechoice/paper.html. Access date: 11 October 2009.

Bliemer, M.C.J. and J.M. Rose (2008). “Construction of Experimental Designs for Mixed Logit Models Allowing For Correlation Across Choice Observations”. Paper presented at the Transportation Research Board Conference, Washington, DC, January 2008.

Bradley, M. and H. Gunn (1991). A stated preference analysis of values of travel time in the Netherlands. Transportation Research Record 1285. Transportation Research Board, Washington, D.C.

Bradley, M. and Kroes, E. (1992). Forecasting Issues in Stated-Preference Research. Selected Readings in Transport Survey Methodology. E. Ampt, A. Richardson and A. Meyburg (eds.). Melbourne, Eucalyptus Press. pp. 89-107.

Bradley, M. and A. Daly. (1994). Estimation of logit choice models using mixed stated preference and revealed preference information. In Understanding Travel Behavior in an Era of Urban Change. P. Stopher and M. Lee-Gosselin (eds.). Pergamon Press, Oxford.

Bradley, M. and J. Zmud. (2006) Validating Willingness to Pay Estimates for Tolled Facilities through Panel Survey Methods. Paper presented at the 11th International Conference on Travel Behavior Research. Kyoto.

Tierney et al., FHWA (1996). Travel Survey Manual. Cambridge Systematics, Inc., Cambridge, MA.

Lee-Gosselin, M. (1995) The Scope and Potential of Interactive Stated-Response Data Collection Methods. TRB’s Conference on Household Travel Surveys, Irvine, CA.

Hensher, D. A., (2004). Identifying the influence of stated choice design dimensionality on willingness to pay for travel time savings. Journal of Transport Economics and Policy, 38, 425-446.

Hensher, D. A., Rose, J. M. and Greene, W. H., (2005). Applied Choice Analysis - A Primer. Cambridge: Cambridge University Press.

Hess, S., C. Smith, S. Falzarano, and J. Stubits (2008). “Measuring the Effects of Different Experimental Designs and Survey Administration Methods using an Atlanta Managed Lanes Stated Preference Survey”. Paper presented at the Transportation Research Board Conference, Washington, DC, January 2008.

Jones, P. (1989) An Overview of Stated-Preference Techniques. (Note –this needs more details, Goncalo…)

Jones, P.M. and M. Bradley (2006). “Stated Preference Surveys: An Assessment” in Travel Survey Methods: Quality and Future Directions. P. Stopher and C. Stecher (eds.). Elsevier Science.

Louviere, J.J., D.A. Hensher and J.D. Swait. (2000). Stated Choice Methods: Analysis and Applications. Cambridge University Press.

Lu, H., Fowkes, A. S., and Wardman, M. R. (2006). “The influence of stated preference (SP) design on the incentive to bias in responses”. Paper presented at the European Transport Conference, Strasbourg, October 2006.

Pearmain, D., Swanson, J., Kroes, E. and Bradley, M., (1991). Stated-Preference Techniques: A Guide to Practice. Steer Davies Gleave and Hague Consulting Group.

Rose, J. M., Bliemer, M. C. J., Hensher, D. A. and Collins, A. T., (2008). Designing efficient stated choice experiments in the presence of reference alternatives. Transportation Research Part B, 42(4), 395-406.

Sawtooth Software http://www.sawtoothsoftware.com/

Train, K. E., (2002). Discrete Choice Methods with Simulation. Cambridge University Press.

TRB (2006). Web-Based Survey Techniques. to be completed

The Online Travel Survey Manual provides a comprehensive overview of travel surveys. It is curated by Transportation Research Board’s Travel Survey Methods Committee (ABJ40).