COST BENEFIT ANALYSIS
CPI 2007 Revision Initiative
Revised June 1, 2005
Report on Plans for Continuous Revision of CPI Geographic and Housing Samples
Introduction
The Consumer Price Index (CPI) is the principal source of information concerning trends in consumer prices and inflation in the United States, and is one of the Nation’s most important economic indicators. The measure is used extensively for economic analysis and policy formulation in both the public and private sectors. The CPI also is used to adjust payments to social security recipients and to federal and military retirees, and for a number of entitlement programs such as food stamps and school lunches. In addition, the CPI is used to adjust individual income tax brackets, exemption amounts, and other tax parameters for changes due to inflation.
In order to maintain the accuracy and currency of the CPI, comprehensive updatings of the CPI have been undertaken by the Bureau of Labor Statistics (BLS) about every ten years. There have been six such Revisions in the history of the CPI. Revision periods provide an opportunity to reflect changes in the geographic distribution of the population and in consumers’ buying habits; to incorporate improvements in technology and index methodology; to update survey techniques; and to modernize computer system hardware and software. In the past, Revisions were funded through periodic budget increments. The most recent CPI Revision was funded through a multi-year initiative beginning in FY 1995. In addition to sample updating, it included projects to develop a new housing estimation system, a computer-assisted data collection system, and a new Telephone-based Point of Purchase Survey (TPOPS).
In December 1998, the BLS announced it would update the consumption expenditure weights in the Consumer Price Index for all Urban Consumers (CPI-U) and the CPI for Urban Wage Earners and Clerical Workers (CPI-W) to the 1999-2000 period, effective with release of data for January 2002. Additionally, CPI expenditure weights would be updated at two-year intervals subsequent to the 2002 updating. This policy change represented the first major step in moving from a decennial Revision schedule to a more accelerated process. The 1999-2000 weights, which were introduced in 2002 as scheduled, replaced 1993-95 weights that were first used in the index effective with January 1998 data. The next weight update will occur effective with release of CPI data for January 2004, when the weights will be updated to the 2001-02 period. As a result of this change, expenditure weight data will be, on average, "two years old" when introduced into the CPI, and four years old when replaced. By contrast, the 1993-95 weights were, on average, 3½ years old in January 1998, and they replaced weights that were about 15 years old.
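The age figures quoted above can be checked with simple date arithmetic. The following is a minimal sketch, assuming that the age of a set of weights is measured from the midpoint of its reference period and that a January index marks the start of its calendar year; the function name is hypothetical and the convention is an illustration, not a statement of BLS methodology.

```python
def weight_age(ref_start_year, ref_end_year, as_of_year):
    """Average age, in years, of expenditure weights at the start of as_of_year.

    Assumes the reference period covers whole calendar years, so its
    midpoint is (first year + last year + 1) / 2.
    """
    midpoint = (ref_start_year + ref_end_year + 1) / 2
    return as_of_year - midpoint


print(weight_age(1999, 2000, 2002))  # 1999-2000 weights when introduced
print(weight_age(1999, 2000, 2004))  # the same weights when replaced
print(weight_age(1993, 1995, 1998))  # 1993-95 weights when introduced
```

Under these assumptions the three calls give two, four, and 3.5 years, matching the figures in the paragraph above.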
In FY 2002, the BLS received funds to take further steps in revising and updating the Consumer Price Index on a continuous basis. Outlet sample rotation—that is, updating of stores and other establishments in which prices are collected—is now being completed on a four-year rather than the previous five-year cycle. Beginning in FY 2003, item samples for a significant proportion of index categories will be reselected midway between each four-year outlet sample rotation. Continuous modernization also is underway in the computer systems area. Work is now focused on upgrading and improving the infrastructure underlying the commodity and service components of the index.
The last important part of the Continuous Updating program initiated in 2002 was to conduct an evaluation of whether or not the continuous revision process can be extended to revising and updating the samples of geographic areas in which prices are collected and housing units for which rents are collected.1 This report provides the results obtained thus far from that ongoing evaluation.
The sample of geographic areas on which the index is built is selected on the basis of population data from the Decennial Census. The current CPI geographic sample was based on the 1990 Census and has been used in the index since 1998. In the past, new areas were introduced into the official CPI over a short period of time, necessitating a temporary but sharp increase, or “spike,” in the number of staff and related program resources. For example, during the last revision in the CPI area sample, 36 new areas were introduced in February 1998. In order to carry out the work associated with introducing the new areas, BLS had to increase staff and related resources significantly over a short period of time. Training and managing these resources can be both costly and somewhat wasteful, since the staff levels must be reduced when the revision period concludes. Moreover, the rapid increase in workload can create an environment conducive to data collection or processing errors.
Revising the sample of housing units is the other major Census-based CPI Revision activity that BLS has not yet converted to a continuous process. CPI prices are collected using two surveys: the Housing survey, used for the Residential Rent and Owners’ Equivalent Rent indexes, and the Commodities and Services (C&S) survey, used for all other item categories. As discussed above, outlet and item samples for C&S categories are rotated on a regular basis. When the geographic sample changes, the C&S rotation process must be redirected to a different set of areas, with some added complexity arising from the necessity of introducing entire outlet and item samples simultaneously in the new areas. This adds to the overall C&S rotation cost but does not represent a dramatic increase in resource requirements.
In contrast to C&S, there is no current process, and hence no source of funding, for housing sample rotation. Historically, updated samples of housing units for measuring changes in rental values have been introduced about every ten years, along with introduction of the area samples. During the last Revision, for example, housing samples in both the new and continuing CPI areas were introduced into the index in January 1999, one year after the introduction of the area sample. The Decennial Census has provided the necessary information on the location and types of housing units within geographic areas. Elimination of some questions from the 2000 Decennial Census short form has reduced its value in selecting locations within cities in which to collect rent data. A more fundamental deficiency in the use of Census data as a frame, however, is the inability to update the housing sample more frequently than decennially in response to changes in the neighborhood locations and types of housing units in which people live. Although the CPI samples are augmented with samples of newly constructed housing units, this is only a partial solution, not a true sample rotation process.
Housing sample reselection is a very costly activity, and the BLS cannot achieve a level updating budget without identifying a means of rotating housing samples on a continuous basis. This is likely to require that alternatives to the Decennial Census be found for sample frame and weighting information, both in new and continuing geographic areas.
It should be emphasized that reselections of the geographic and housing samples are almost inseparable activities from an operational standpoint. In particular, continuous area rotation probably requires that the BLS find a way to select housing samples without reliance on Census data alone. Otherwise, housing samples in the last rotating areas will be initiated long after the Census year on which their housing sample designs are based. This would work against the goal of maintaining timely and representative CPI samples. Meanwhile, it would be impossible to rotate the area sample for C&S pricing while leaving housing pricing in the old area sample. Field workload considerations make it infeasible to maintain collection of housing data in one set of cities and collection of prices of other commodities and services in a different set of cities.
Adding to the task of developing a continuous process for housing sample updating is a dissatisfaction with the operational methods used in the past. It is hoped that alternatives to these methods of locating and initiating rental housing units can be found that are less costly and more effective.
The remainder of this report lays out an approach to continuous rotation of the area and housing samples in turn, highlighting which planning activities have been accomplished and what research issues have yet to be resolved.
PSU Revision
This section discusses the necessary first step in designing a rotation plan for CPI areas; namely, the selection of the areas that will comprise the new sample2. This step is now complete. The new sample will also be employed in the Consumer Expenditure (CE) Survey beginning in 2005. To enable the CE household sample to be selected in coordination with the sampling for other Federal household surveys, it was necessary to provide the list of CPI/CE areas to the Census Bureau in July of 2002.
The new area sample was selected based on the 2000 Census of Population. It was first necessary to determine the basic constraints on the process, in particular the constraints imposed by the CPI area publication structure. Changes in populations between 1990 and 2000 required us to address the publication issue, since the currently published metropolitan areas—selected, as noted above, based on the 1990 Census—are no longer the largest areas in population. It was therefore necessary either to increase the number of published cities, or to drop some that are currently published.
In late FY 2001, joint meetings were initiated among representatives of the CPI and CE programs, as well as the BLS’s Division of Price and Index Number Research and Office of Survey Methods Research, to review what was done in the past and what changes might be desired for the future. Discussions focused on whether a reduction in the number of primary sampling units (PSUs), either certainty or non-certainty, could lead to an increase in the accuracy of the CPI. Several scenarios were developed and simulated using detailed cost and variance information taken from the CPI Item-Outlet Optimization Model. Based on these simulations and other considerations, it was decided that the number of published metropolitan areas would be reduced by three, with Milwaukee, Kansas City, and Cincinnati being dropped, while the West D (non-metropolitan) stratum would move from unpublished to published status.
Background. Currently, CPI series are published for 27 metropolitan areas with 1990 populations of at least 1.5 million. (Phoenix was added as the 27th published area in January 2002.) These include the 25 largest areas in 1990, plus Honolulu and Anchorage, which have much smaller populations; publication of the latter two areas is considered justifiable because of their unique locations. In addition, CPI data are published for smaller metropolitan areas (referred to as the B/C strata in CPI publications) in four Census regions and for non-metropolitan (D-size) urban areas in two Census regions.3 Data are collected but not published for the non-metropolitan West region, and the CPI has no D PSUs in the Northeast region because the population in that stratum was so small in 1990.
The total number of CPI PSUs is 87: of these, 31 are in the 27 published metropolitan areas (New York comprises three PSUs and Los Angeles and Washington/Baltimore each comprise two), 46 in the B/C strata and 10 in the D strata. Non-metropolitan, primarily rural PSUs also are sampled in all four regions for the CE program only. Altogether, there are 38 CPI “index areas,” for which basic CPI indexes are computed: the 31 A PSUs, the four regional B/C strata, and the three non-empty regional D strata.
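The counts in this paragraph can be reconciled with a short tally. The sketch below uses only figures stated in the text; the variable names are illustrative.

```python
# PSU counts by stratum type in the current (1990 Census-based) sample.
current_psus = {"A": 31, "B/C": 46, "D": 10}
total_psus = sum(current_psus.values())

# A-stratum PSUs: 27 published areas, with New York split into three PSUs
# and Los Angeles and Washington/Baltimore split into two PSUs each.
a_psus = 27 + (3 - 1) + (2 - 1) + (2 - 1)

# Index areas: every A PSU, the four regional B/C strata,
# and the three non-empty regional D strata.
index_areas = current_psus["A"] + 4 + 3

print(total_psus, a_psus, index_areas)
```

The tally reproduces the 87 PSUs, 31 A PSUs, and 38 index areas cited above.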
Approximately 87 percent of the 1990 U.S. population were members of households covered by the CPI-U. This share will increase when samples based on the 2000 Census are incorporated in the CPI, both because of changing populations and because of changing OMB area definitions. Notably, the CPI-U population will be defined to include all residents of metropolitan and micropolitan Core Based Statistical Areas (CBSAs). Also, New England CBSAs will be defined using county rather than town boundaries.
The CPI’s D stratum will then be made up of the new “micropolitan” CBSAs. Partly as a consequence of this definitional change, the population of the Northeast D stratum will be large enough to justify selection of PSUs in that stratum. The increased population will not be sufficient to justify publication of the Northeast D stratum, but the West D stratum will be publishable, unlike the current situation.
It should be noted that specification of PSUs for the CPI and CE was based on preliminary OMB area definitions and populations. It was decided that postponing selection until these were declared final would have imposed unacceptable delay in implementing the new geographic sample.
Selection of New PSU Sample. Table 1 shows the 2000 populations of the 27 published metropolitan areas as proportions of the total CPI-U population. The table also shows that two metropolitan areas, Sacramento and San Antonio, now exceed some published areas in population. This made it necessary for the BLS to reassess the list of cities for which separate CPI index series would be calculated and published.
Selection of PSUs was carried out using statistical methods aimed at minimizing index variance. The most critical constraint imposed on this probabilistic selection process was the specification of published metropolitan areas, which would be selected with certainty. This decision determined the population boundary between the A (certainty area) strata and the non-certainty B/C and D strata. By reducing the number of published A cities, the standard error of the U.S. CPI can be improved, at the cost of reducing the level of geographic index detail provided to the public.
Simulations were used to evaluate several options. The options were distinguished primarily by the set of published certainty cities. One other issue that was investigated in the simulations was whether it would be possible to reduce the number of sampled B/C and D PSUs with only small upward impacts on variance. If this were true, the cost savings from reducing the total number of PSUs potentially could more than offset the variance loss by permitting an increase in the total number of sample quotes. The simulations demonstrated, however, that this hypothesis was false. Eliminating B/C PSUs led to increases in index variance that were far too large to be justifiable on the basis of cost savings.
There was no support for increasing the number of published metropolitan areas beyond the present 27. Besides implying the publication of local area indexes with high variances, such a strategy would increase the overall U.S.-level CPI variance.
It was also decided that ending publication of Anchorage and Honolulu was not appropriate. Those cities’ published indexes date from the early 1960s, and their deletion has been rejected in the past. Moreover, examination of price indexes is at least partially supportive of the hypothesis that Anchorage and Honolulu have unique inflation experiences: over the last 10 years their rates of price increase have been among the lowest of all published CPI cities.
The option ultimately selected was to drop three current A PSUs—Milwaukee, Kansas City, and Cincinnati—without adding either Sacramento or San Antonio, the largest metropolitan areas not currently published. The three dropped cities all gained population at a slower rate than the A cities as a whole, and their deletion represents a partial step toward the most efficient possible sample design for the CPI. Simulations showed that this change would slightly reduce the estimated six-month U.S. All Items standard error.
Examination of the population figures in Table 1 shows that there is a particularly large interval between Cincinnati and the next larger A PSU, Portland, providing what can be viewed as a natural point at which to place the A stratum boundary. The 296,000 population difference between Cincinnati and Portland is wider, for example, than the 290,000 range that contains Cincinnati, Sacramento, Kansas City, San Antonio, and Milwaukee. This means that the A boundary could be defined somewhere in this interval, at a population level that would neither narrowly include nor narrowly exclude any city.
This change also reduces the number of index areas, to 36. This offers another potential gain, from the mitigation of any small sample bias that may arise when basic indexes are computed using small numbers of prices. Allocating the same total number of quotes to a smaller number of index areas is likely to reduce the number of small item samples.
Having specified the number of certainty, self-representing PSUs, the CPI program was then able to select the complete set of 86 PSUs for the new, 2000 Census-based sample. Initially, the CPI divided the remaining urban population into 58 equivalent strata. Each stratum was designated to cover about as much population as the smallest A-sized city. Based on the population, 42 strata were designated for the medium-sized cities (previous B/C-sized cities are referred to in the table as X-sized) and the remaining 16 were designated for the micropolitan areas (D-sized cities are referred to in the table as Y-sized). Geographic areas were then mapped into the strata based on their population and on their longitude and latitude, variables that had been shown to be most significant in explaining price differentials between areas. The result was 58 strata, each composed of two or more "price change-equivalent" areas. One PSU was then selected from each stratum using a "keyfitzed" probability sampling method that increases the probability of reselecting a PSU currently in the sample, provided that it has not lost population between the two selection periods. For a more complete description of the sample selection process, see the article by William Johnson, Owen Shoemaker, and Yeon Rhee, “Redesigning the Consumer Price Index Area Sample,” attached as Appendix I.
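The idea behind a "keyfitzed" selection can be illustrated with the textbook Keyfitz procedure for drawing one unit per stratum. The sketch below is an assumption about the general technique, not the BLS implementation described in Appendix I; the function name, the PSU labels, and the selection probabilities are all hypothetical. A previously selected PSU whose selection probability has not fallen is always retained; otherwise it is retained with probability new/old, and failing that a replacement is drawn from the PSUs whose probabilities rose, in proportion to their gains.

```python
def keyfitz_conditional(old_probs, new_probs, old_unit):
    """Selection distribution for the new sample, given the old selected unit.

    old_probs and new_probs map each PSU in one stratum to its selection
    probability under the old and new designs (each set sums to 1).
    """
    p_old, p_new = old_probs[old_unit], new_probs[old_unit]
    if p_new >= p_old:
        # Probability did not fall: retain the old PSU with certainty.
        return {old_unit: 1.0}
    # Probability fell: retain with probability p_new / p_old; otherwise
    # switch to a PSU that gained, in proportion to the size of its gain.
    gains = {u: new_probs[u] - old_probs[u]
             for u in new_probs if new_probs[u] > old_probs[u]}
    total_gain = sum(gains.values())
    keep = p_new / p_old
    dist = {old_unit: keep}
    for unit, gain in gains.items():
        dist[unit] = (1.0 - keep) * gain / total_gain
    return dist


# Hypothetical stratum of three PSUs (illustrative probabilities only).
old = {"PSU_1": 0.5, "PSU_2": 0.3, "PSU_3": 0.2}
new = {"PSU_1": 0.4, "PSU_2": 0.35, "PSU_3": 0.25}

# Averaging over the old selection recovers the new design probabilities.
marginal = {u: 0.0 for u in new}
for old_unit, p in old.items():
    for unit, q in keyfitz_conditional(old, new, old_unit).items():
        marginal[unit] += p * q
print(marginal)
```

The final loop shows why the approach is attractive: overlap with the old sample is increased, yet each PSU's overall chance of selection still equals its probability under the new design.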
The new geographic sample will include the 22 largest urban areas, comprising 26 PSUs, plus Anchorage and Honolulu. Another 27 smaller areas from the current sample are included in the new sample as well. Thirty-one areas will be new to the CPI sample, replacing 32 current sample areas.
The areas in the new sample are shown in Table 2. As the table shows, four PSUs will be allocated to the West D stratum, permitting publication of that stratum index for the first time.
Proposed Rotation Schedule. Over the past year the BLS has developed a revised schedule for introducing the new geographic areas into the CPI sample. As noted above, in 1998 all the new areas in the 1990 Census-based sample were introduced simultaneously. In the proposed new schedule, only four or five areas will be introduced in any one year, thereby smoothing out the required resource level. Under this plan, the 31 new areas will be divided into seven groups; the first group will be used in the CPI for the first time in 2008, and the last group will be used for the first time in 2014.
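The grouping arithmetic can be sketched as follows. The report states only that each yearly group contains four or five areas; which years receive the larger groups is an assumption made here for illustration, and the function name is hypothetical.

```python
def split_into_groups(n_areas, first_year, last_year):
    """Spread n_areas as evenly as possible over the first-use years."""
    years = list(range(first_year, last_year + 1))
    base, extra = divmod(n_areas, len(years))
    # Assumption: the larger groups are placed in the earliest years.
    return {year: base + (1 if i < extra else 0)
            for i, year in enumerate(years)}


schedule = split_into_groups(31, 2008, 2014)
for year, size in schedule.items():
    print(year, size)
```

With 31 areas and seven years, an even split necessarily yields three groups of five and four groups of four, consistent with the "four or five areas" per year described above.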
Chart 1 displays the schedule for bringing in the seven PSU groups. In each area, several steps must be completed prior to initial use of price data in the index: TPOPS collection, outlet sample processing, and initiation. As shown in the chart, each new geographic area will roll into the CPI through a four-year process. During the first two years, the BLS will develop a sample of retail establishments and outlets from which to collect prices by working with the Bureau of the Census to conduct a TPOPS survey in the new area. The TPOPS survey will be an extension of our current survey process, which is carried out in every PSU on a continuous four-year rotation cycle. The regular sampling schedule of item and PSU categories will be adjusted to accommodate the need for complete, timely outlet samples in the new areas. In year three, BLS will process the data collected in the TPOPS survey and establish a field presence in the new geographic areas, hiring and training staff. In year four, BLS will initiate pricing of the CPI sample in retail establishments selected to represent the geographic area. At the conclusion of year four, the new geographic area will replace an existing area in the computation of the commodities and services component of the official CPI. At the start of each year, a new set of geographic areas will begin the four-year process. When fully implemented, both Census TPOPS and BLS processing and initiation activities will be underway simultaneously, although in different geographic areas. Tables 3 and 4 below show the activities in more detail and contrast the new process with the existing one. Table 3 addresses adding geographic areas and Table 4 addresses dropping areas.
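The overlap among groups can be made concrete with a small timeline sketch. It assumes that the four process years immediately precede the year in which a group is first used in the index (so a group first used in 2008 would begin TPOPS collection in 2004); that alignment is an inference from the text rather than a stated fact, and the names below are hypothetical.

```python
# Stage of work in each year of the four-year process, per the text.
STAGES = {
    1: "TPOPS collection",
    2: "TPOPS collection",
    3: "TPOPS processing, hiring and training",
    4: "initiation of pricing",
}


def stage_in_year(first_use_year, calendar_year):
    """Stage a group is in during calendar_year, or None if not active.

    Assumption: process year 1 is four years before first use in the CPI.
    """
    process_year = calendar_year - (first_use_year - 4) + 1
    return STAGES.get(process_year)


def activities(calendar_year, first_use_years):
    """Map each active group (keyed by first-use year) to its stage."""
    active = {}
    for first_use in first_use_years:
        stage = stage_in_year(first_use, calendar_year)
        if stage is not None:
            active[first_use] = stage
    return active


print(activities(2007, range(2008, 2015)))
```

Under this alignment, four groups are active in 2007: one being initiated, one being processed and staffed, and two in TPOPS collection, which is the steady-state overlap the paragraph describes.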
Table 5 lists the CPI geographic areas in order of their priority for introduction (deletion) to (from) the CPI. PSUs were grouped into four prioritized categories based on the importance of getting the new geographic areas into the CPI. Within the new geographic areas, the first category, referred to as "geographic holes," are the most important to get into the index. These nine new geographic areas are those selected to represent strata for which there is no priced area in the current sample design. For example, Augusta, Maine and Ithaca, New York are small cities in the Northeast, a stratum of the population not currently represented in the CPI geographic sample. It is important to add these areas as quickly as possible in order to represent fully the U.S. urban population. The ten geographic areas in category 1 that can be dropped immediately are labeled "strata duplicates." They can be dropped because they were not reselected to continue in the CPI and are in geographic strata for which another continuing area is present. For example, Johnstown, Pennsylvania is in the same stratum as Reading, Pennsylvania. As noted above, Johnstown was not reselected from the 2000 Decennial CPI area update, while Reading was. Because they are both in the same geographic stratum, it is unnecessary, and in fact inefficient, to price both of these areas.
Each of the remaining three categories consists of sets of matched new and dropping geographic areas. The transition from the old sample to the new for these areas will be accomplished through the matched adding and dropping of areas simultaneously. Each category is prioritized based on the quality of the pairings. For example, the second group ("150-mile similar match") consists of five new geographic areas matched with five dropping areas that are within 150 miles of one another. These are the least similar matches and therefore are the highest priority in adding and dropping. For example, we can continue to price in Chanute, Kansas until we are able to add Springfield, Missouri, although Chanute is a small or C-sized area while Springfield is a medium-sized or B-sized one. Next in priority are the three index area matches. For each of these three new geographic areas, there currently exists an area that can remain in the CPI as a proxy until the new area is introduced. While the proxy area is not from the same stratum as the new area, the match is similar in terms of index, or publication, area. For example, Gainesville, Florida is in the same publication area, medium-sized South, as Jacksonville, Florida. Finally, there are 14 pairs of strata matches. In these cases, it is relatively unimportant when they are rotated because the new area and the dropping area are both from the same stratum, and therefore the choice of which area to price is a stochastic event. For example, it is a matter of chance whether we selected Dayton or Bellefontaine, Ohio for pricing, as both are part of the Y-208 stratum in the North Central U.S.
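The four priority categories can be tallied to confirm that they account for every added and dropped area. The counts below are those given in the text; the data layout is only an illustration.

```python
# (category, new areas added, current areas dropped), in priority order.
categories = [
    ("geographic holes / strata duplicates", 9, 10),
    ("150-mile similar matches", 5, 5),
    ("index area matches", 3, 3),
    ("strata matches", 14, 14),
]

new_total = sum(added for _, added, _ in categories)
dropped_total = sum(dropped for _, _, dropped in categories)
print(new_total, dropped_total)
```

The totals agree with the 31 new areas and 32 dropped areas reported for the new sample.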
In addition to adding new areas, converting to the new sample will include redefining the geographic boundaries of continuing PSUs to conform to the updated OMB definitions, as counties are added to PSUs or moved from one PSU to another. For example, Hampden and Hampshire counties in Massachusetts were moved from the Springfield, Massachusetts PSU to the Boston SMSA, leaving Franklin County as the residual component of Springfield. The CPI introduces these geographic changes as quickly as possible, as part of the sample rotation. For expenditure weights, based on data from the Consumer Expenditure Survey, the new geographic definitions are fully reflected in the CPI beginning with the data for January 2008. For TPOPS, the conversion to the new geographic definitions will take place such that all samples drawn for initiation and use in the CPI after January 2008 will reflect the new geography. It should be noted that until the remaining C&S samples are rotated out of the CPI they will be based on the old geographic definitions. Housing samples will similarly be updated so that new samples initiated in anticipation of the January 2008 index will reflect the new definitions, while older samples will reflect current definitions until updated through replacement or rotation.
Tradeoff of Timeliness and Efficiency. The new approach allows staff to be retained longer, leading to a more efficient and skilled workforce. The corresponding disadvantage is that it will take longer than in the past to complete the area sample rotation. To the extent that rapid, significant population shifts have taken place or will take place over the next several years, the efficiency of the CPI could be reduced by the longer retention of areas that were selected on the basis of the 1990 Census.
It should be emphasized, however, that every two years the BLS updates the expenditure weights attached to item-area categories in the CPI. As part of those biennial updates, the population weights for CPI sample index areas are updated. Thus, the loss in efficiency from retaining PSUs longer than would otherwise be the case comes only from the fact that the retained PSUs may provide less efficient estimates of spending patterns for the index areas that they represent. For example, until rotated in 2011, the index for Stratum X364 will be estimated based on expenditure patterns and prices that are reflective of Gainesville (stratum X350). While both areas represent southern X-sized cities, their division into separate strata indicates that they do not share the same price-change-determining characteristics. Another unavoidable feature of the proposed PSU rotation schedule is that there will be a temporary increase in sampling error during 2008, resulting from the existence of nine PSU strata that contain no current CPI PSUs. Four of the nine new PSUs in those strata cannot be introduced efficiently until 2009. It will be prudent, therefore, to provide for a slight increase in item and outlet sample sizes to ensure that the resource-efficiency benefits of the new plan are not offset by any deterioration in estimation accuracy. Design simulations that omitted two PSUs in X300 and two PSUs in X400 showed that the CPI could expect the 6-month percent change standard error to rise by about 1.95 percent, from 0.103 to 0.105, and the variance to rise by 3.94 percent, from 0.01065 to 0.01107. Offsetting this loss of efficiency from the four missing “holes” would require about a five percent increase in overall sample size.
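The standard error and variance figures above are mutually consistent, as the following sketch shows. The only inputs are the two variances quoted in the text; the final step assumes, purely for illustration, that variance is inversely proportional to total sample size, which is a simplification of the design model used in the simulations.

```python
import math

var_full = 0.01065     # variance with all PSUs in place (from the text)
var_holes = 0.01107    # variance with the four PSUs missing (from the text)

# Standard errors are the square roots of the variances.
se_full = math.sqrt(var_full)
se_holes = math.sqrt(var_holes)

var_increase_pct = (var_holes / var_full - 1) * 100
se_increase_pct = (se_holes / se_full - 1) * 100

# Simplifying assumption: variance falls in proportion to 1 / sample size,
# so restoring the original variance needs the same proportional increase.
sample_increase_pct = var_increase_pct

print(round(se_full, 3), round(se_holes, 3))
print(round(se_increase_pct, 2), round(var_increase_pct, 2))
print(round(sample_increase_pct, 1))
```

Under the simple inverse-proportionality assumption the required sample increase is roughly four percent, the same order of magnitude as the five percent figure cited above.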
Future PSU Sample Updates. After 2014, it is expected that another sample will begin to be introduced into the CPI on a rolling or continuous basis. It is likely that, following past practice, the 2010 Census will provide the population data underlying this next rotation. If an alternative source of data becomes available on a continuous basis, however, that source may become the basis for the post-2014 rotations. The Interagency sample redesign task force (the Consortium) has recently begun an examination of sample selection methodologies that would place less reliance on the Decennial Census but would likely require full funding for the American Community Survey.
Relationship to CE Sample. The CE survey will introduce the new geographic sample in 2005. This means that the 2005-2006 expenditure weights used in the CPI-U and CPI-W beginning in 2008 will be based on the new sample. Final index values of the Chained CPI for All Urban Consumers, or C-CPI-U, are based on current expenditure data, so the CPI will need to develop a reverse mapping so that the current geographic sample will be reflected in the weights used in the final C-CPI-U indexes for 2005 through 2008. We are currently discussing whether, and how, the CE could adopt a rolling geographic rotation of its household samples in the future. A final decision on this will depend in part on the work currently underway through the Consortium.
Housing Revision
Background. The CPI housing sample is a sample of rental housing units that supports the two largest item categories in the index, Residential Rent and Owners’ Equivalent Rent. Revision of the housing sample, like geographic sample revision, historically has taken place at approximately ten-year intervals using Decennial Census data. The current housing samples were selected based on 1990 Census data. The samples were introduced with the January 1999 CPI, simultaneously with introduction of a new housing index estimation system and a new computer-assisted data collection methodology (CADC). Since these new samples were introduced in 1999, they have been augmented with “new construction” samples drawn using post-1990 construction permit data obtained from the Bureau of the Census. This process attempts to keep the sample representative over time.
Revising the CPI housing sample involves selecting a sample of rental units for each new CPI geographic area (PSU) as well as replacing rental samples in each continuing area. The process that was used to select rental units for the current CPI sample is described in the article by Frank Ptacek and Robert Baskin, “Revision of the CPI housing sample and estimators,” attached as Appendix II. First, each of the CPI geographic areas was divided into small pieces of geography, called segments. Data from the Decennial Census short form provided segment-level information from which expenditure estimates were developed for renters and owners. This information was then used to select statistically representative samples of segments.
The next step in the sampling process was to select a random sample of renters in each segment. The Census information on individual occupant households is not available to BLS, however, due to confidentiality rules. As a result, CPI staff had to develop a method for selecting rental unit samples without reference to the Census micro data on housing tenure.
BLS began by having field staff compile a listing of all the addresses in each selected segment by canvassing the neighborhood and physically recording each address. From each listing, a subset of housing units was selected for a personal visit process called screening, which determined whether the units were occupied by renters or owners. Units occupied by renters and otherwise eligible for inclusion in the sample were then initiated for ongoing collection of rent.
The process of listing and screening, which was conducted in 1997 and 1998, was very costly and in many cases it did not produce the desired result. In particular, in areas dominated by owners the number of renters found fell far below expectations. The specific causes for this low yield are not completely clear, although timing likely played an important role. The process began in 1997, long after the end of the 1990 reference period. The delay was unavoidable and stemmed from a number of factors—Decennial collection and processing, selection of PSUs, and delivery of Census data for segment creation and selection. During this time, however, household movements and overall shifts in tenure toward homeownership may have changed the 1990 data on which the sample was based. Other possible contributing factors are inaccuracies in Census data at the low level used by BLS and errors by BLS staff in the use of Census data.
Purchased Housing Lists. Because of these past problems, BLS decided to evaluate a new approach for selecting renters for the CPI. The major feature of the new approach is the use of address lists and tenure information that have been developed, and that are updated regularly, in the private sector. The idea is to replace in whole or in part the listing and screening process used by BLS in the past. BLS conducted a study to identify providers of address lists and the characteristics of those lists. The first issue of concern is whether the coverage of the lists is adequate: that is, whether the lists have enough coverage at the geographic level required by BLS that a sample drawn from them would be representative of the universe of housing units.
BLS identified three vendors whose lists BLS wanted to evaluate further. The address lists for these vendors also contained qualitative information concerning housing tenure. The BLS contracted with Westat, a private statistical consulting firm, to evaluate the lists for two geographic areas, Richmond and Baltimore. The evaluation consisted of two parts. The first concerned the accuracy of the address lists in terms of the numbers of addresses relative to the Census. The second part was to conduct telephone and personal interviews to assess the accuracy of individual addresses on the lists, as well as to identify tenure and compare the results to the lists. One of the three vendors was eliminated from consideration prior to the evaluation since Westat determined that their list data did not contain up-to-date geographic information.
The Westat analysis with respect to coverage is attached as Appendix III. Westat compared the numbers of addresses from each of two vendors at the Census block level and compared those block-level counts to the 2000 Census file. They concluded that the gross coverage rate was relatively good for both vendors, but was better for the lists from one vendor, Marketing Systems Group (MSG). At the block level, the MSG information on tenure also appeared to be more accurate.
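A block-level coverage comparison of this kind can be sketched as follows. The block identifiers and counts are invented for illustration, and the ratio used here (vendor addresses divided by Census housing units, summed over blocks) is an assumed reading of "gross coverage rate," not the formula documented by Westat.

```python
def gross_coverage_rate(vendor_counts, census_counts):
    """Total vendor addresses divided by total Census housing units.

    Both arguments map a block identifier to a count. Blocks missing from
    the vendor list contribute zero addresses.
    """
    census_total = sum(census_counts.values())
    if census_total == 0:
        raise ValueError("Census counts sum to zero; rate is undefined")
    vendor_total = sum(vendor_counts.get(block, 0) for block in census_counts)
    return vendor_total / census_total


def block_level_ratios(vendor_counts, census_counts):
    """Per-block ratio of vendor addresses to Census units (skips empty blocks)."""
    return {
        block: vendor_counts.get(block, 0) / units
        for block, units in census_counts.items()
        if units > 0
    }


# Hypothetical counts for three blocks (not actual Richmond or Baltimore data).
census = {"block_A": 120, "block_B": 80, "block_C": 50}
vendor = {"block_A": 114, "block_B": 84, "block_C": 40}

rate = gross_coverage_rate(vendor, census)
ratios = block_level_ratios(vendor, census)
print(f"gross coverage: {rate:.3f}")
for block, ratio in ratios.items():
    print(f"{block}: {ratio:.2f}")
```

A gross rate near 1.0 can hide offsetting over- and under-coverage in individual blocks, which is why the per-block ratios are reported alongside it.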
Appendix IV presents the second part of the Westat evaluation, which concerned the accuracy of the reported tenure status. The MSG housing tenure information is in the form of codes. Each address has a code ranging from 0 to 9. A 0 means there is a high probability that the unit is occupied by a renter. A 9 means there is a high probability the unit is occupied by an owner. Westat focused on units in low-renter areas and confined the analysis to units with codes of 0-8, because of the CPI’s overriding need to use the lists to identify renters and because our previous work had verified that values of 9 reliably identified owners. Again, the MSG lists appeared to outperform the lists from the other vendor. About 8 percent of the addresses on the lists could not be located and some percentage of the units had incorrect tenure status. The quality of the list information was sufficiently high to warrant further consideration, however.
Based on the results of the Richmond and Baltimore evaluations, BLS decided to broaden this line of analysis. Westat will be asked to obtain and analyze address list data for about 550 segments from a representative sample of that portion of the 1990 BLS sample that is continuing. For Richmond and Baltimore we focused on areas with a renter percentage of 40 percent or less. This time the sample of segments will be representative of all types of segments.
The vendors for this analysis will be slightly different from the first study. Only MSG lists will be included; the other vendor will be dropped based on the findings by Westat mentioned above. In addition, address lists from another company, ADVO, will be included in the analysis. ADVO produces lists that contain addresses only. Their lists are purported to be the most complete lists of deliverable addresses available commercially.
The plan is to compare the two lists at the block group level both in terms of the numbers of addresses and the degree of matching or compatibility between them. The address lists produced from our listing operation will also be compared with the two private sector lists. We will be trying to find out if it is necessary to start with the ADVO list in order to have a complete frame of addresses for sampling.
The other major goal of this study will be to compare the tenure information from MSG with data from the screening part of our housing survey. BLS is also in the process of obtaining summary-level data for 29 CPI pricing areas. The summary data will be counts of addresses for every ZIP code in each of the 29 PSUs. The listing will contain data for the MSG, ADVO and 2000 Census samples updated by a company named Claritas. The data will be used to evaluate the adequacy of the coverage of the lists over a large portion of the CPI area sample.
Other Housing Issues. In addition to the use of private-sector data as an alternative to the Census for rental unit sampling, the CPI program is evaluating several other issues in designing the revised housing sample. One concerns the sampling and weighting of segments for the survey. In the last Revision, probability-proportional-to-size (pps) sampling was employed where size = expenditure. However, the 2000 Decennial Census does not provide block-level estimates of expenditure, nor will the Census Bureau’s American Community Survey (ACS). The BLS is considering ways to produce block-level expenditures from block-group or tract-level data and is also considering sampling at the block group level rather than the block level in the current sample design. Before that can be done, however, we must derive comparable rent and rental equivalence measures from Census rents and home values.
In the last Revision, home values were regressed against monthly rent values to obtain monthly rental equivalence values. This process appears to be flawed in two ways. First, the model chosen imposed a "cap" on the rental equivalence values produced. Second, rented units likely have different values from owned units in the same segment. The BLS is currently exploring alternative ways of formulating monthly rental equivalence values, for example a user cost approach. Another alternative is to fall back on pps sampling with size = number of units, as was done in the 1987 CPI Revision process. However, expenditures are the appropriate measure of size for the CPI, and the average level of expenditures per unit is different for owners and renters. According to the Census, for example, the ratio of renters to owners is roughly 1 to 2. According to weights derived from the CE Survey, the ratio of rental expenditure to owner expenditure is roughly 1 to 3. If we decide to use the 1987 approach, we will need to modify the process to account for this difference.
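One way such a modification might look is sketched below. With renters to owners at roughly 1 to 2 by count and 1 to 3 by expenditure, an owner unit carries about (3/2) / (1/1) = 1.5 times the expenditure of a renter unit. The sketch uses that factor to build an expenditure-like size measure from unit counts and then draws a systematic pps sample. The segment names and unit counts are invented, and the weighting rule is an illustrative assumption rather than a method BLS has adopted.

```python
import random

# Implied per-unit expenditure of an owner relative to a renter:
# (owner share of expenditure / owner share of units), relative to renters.
OWNER_FACTOR = (3 / 2) / (1 / 1)  # = 1.5


def size_measure(renter_units, owner_units, owner_factor=OWNER_FACTOR):
    """Expenditure-like measure of size built from unit counts (assumed rule)."""
    return renter_units + owner_factor * owner_units


def pps_systematic(sizes, n, start=None):
    """Systematic pps selection of n units from a dict of name -> size.

    A unit whose size exceeds the sampling interval can be hit more than once.
    """
    total = sum(sizes.values())
    interval = total / n
    if start is None:
        start = random.uniform(0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cumulative, k = [], 0.0, 0
    for name, size in sizes.items():
        cumulative += size
        while k < n and points[k] < cumulative:
            selected.append(name)
            k += 1
    return selected


# Hypothetical segments: (renter units, owner units).
segments = {"seg1": (10, 100), "seg2": (60, 20), "seg3": (30, 30), "seg4": (5, 150)}
sizes = {name: size_measure(r, o) for name, (r, o) in segments.items()}

# A fixed start makes the illustration reproducible; omit it for a random draw.
sample = pps_systematic(sizes, n=2, start=100.0)
print(sizes)
print(sample)
```

Under a pure unit-count measure, seg2 (80 units) and seg3 (60 units) would differ more in size than they do under the adjusted measure, because seg2 is dominated by renters.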
Research also is ongoing to determine the housing unit attributes that most directly predict rent change. This could be used to justify the expanded use of “helper segments,” which are employed in the current housing sample to represent segments with extremely low numbers of renters to sample, or to provide guidance in selecting a stratification scheme for the new sample design.
Housing Rotation. A major goal of the CPI program is to update the rental unit samples within metropolitan areas on a faster than decennial basis. Most CPI outlet samples are rotated every four years, and starting in 2003 many item samples will be reselected midway between outlet sample rotations—that is, every two years. Housing rotation has remained on a once-a-decade frequency because of the absence of more timely sampling frames and data to use for sample weighting. Lists may provide an alternative sampling frame. Meanwhile, the ACS, if funded by Congress, could provide local area rent and housing value data on a continuous basis.
The BLS currently envisions a six-year rotation cycle for housing samples within PSUs, once the introduction of the new and continuing PSU housing samples is completed. This rotation cycle would correspond to the six collection panels that are now used in the CPI rental unit sample: each panel is now priced two times per year, and in the future we expect that we would reselect one of these panels each year in each PSU.
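The six-year cycle can be sketched as a simple schedule. The starting year and the assignment of panels to pricing months below are hypothetical; the text says only that there are six panels, that each is priced twice a year, and that one panel per PSU would be reselected each year.

```python
N_PANELS = 6


def panel_to_reselect(year, first_year, n_panels=N_PANELS):
    """Panel (1..n_panels) reselected in a given year of the rotation."""
    if year < first_year:
        raise ValueError("year precedes the start of rotation")
    return (year - first_year) % n_panels + 1


def pricing_months(panel, n_panels=N_PANELS):
    """Two pricing months six months apart (assumed: panel 1 -> months 1 and 7)."""
    return (panel, panel + n_panels)


FIRST_YEAR = 2008  # hypothetical first rotation year
schedule = {year: panel_to_reselect(year, FIRST_YEAR) for year in range(2008, 2015)}
for year, panel in schedule.items():
    print(year, "reselect panel", panel, "priced in months", pricing_months(panel))
```

Each panel comes up for reselection once every six years, so no rental unit selected under the plan would stay in the sample for more than one full cycle.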
We are currently looking at the ACS as the source of updating our weights on an ongoing basis. If funding is obtained for FY 2004 and beyond, the ACS will provide estimates suitable to our needs in 2008.
If we use the ACS (or, for that matter, any other source of weighting data), a method of updating weights of the existing sample, particularly in non-self representing PSUs, will need to be devised. During 2003 the CPI program will test alternative sources and approaches for updating the weights. While the solution is not readily apparent at this time, we feel confident that an approach can be developed that will facilitate an ongoing rotation of the housing sample.
Possible Housing Rotation Plan. Chart 2 shows one potential plan for selecting housing samples in new and continuing index areas, and for subsequently rotating those samples on a six-year cycle.
As depicted above, the plan is very similar to the geographic plan presented in Chart 1. For the new geographic areas, we will begin with a complete new housing sample. Similar to the situation with C&S samples, it will take several years to bring the new sample into the CPI in each area. During the first year of the new area housing process, we will procure the lists for the segments selected for the CPI for those areas. During the second year, addresses will be selected and field economists will screen the selected units and begin the initiation stage. After initiation, the units will be priced semi-annually for use in the CPI. As is the case with the C&S sample, a new group of geographic areas will begin the process each year. Also as with the C&S sample, some activity will be underway in each PSU.
For the continuing areas, a rotation process will be undertaken that again looks similar to the current C&S process, where one-fourth of the sample is updated in each PSU annually. For housing, however, one panel (one-sixth) of the sample will be rotated each year. The rotation process will include the same set of steps as the process for establishing a sample for new geographic areas: identifying strata; procuring lists of housing units; screening for renters; initiating the new rent sample; and pricing and use in the CPI.
Conclusion
In the past year, considerable progress has been made on developing a plan for introducing the new sample of geographic areas and revising the housing sample. A new approach has been developed for adding new PSUs and dropping PSUs no longer needed. The approach spreads the workload over a period of about ten years and eliminates the resource spikes characteristic of previous revisions. At the same time those new PSUs that are needed to reflect the 2000 population distribution will be introduced first. Other PSUs that are merely replacing PSUs in the current sample will be introduced in the out years. Work has begun on the cost of the new approach, but final estimates will depend on the operational aspects of the new housing sample design.
Work on the new housing sample is also progressing. It now appears likely that it will be possible to make important improvements in the design of the new housing sample. In the first half of 2003 we expect to complete work that will result in a decision that address lists available in the private sector can be used both to list housing units and to streamline the screening process for finding renters from which to collect rents on an ongoing basis. It is anticipated that such a result will free up resources that can be used to improve the accuracy of the CPI. Equally important, it will provide a source of data that can be used to update the sample in each PSU on a regular basis. The other data needed to establish a regular updating process are data on expenditures for rent and rental equivalence. We plan to study ways in which the American Community Survey can be used to generate such data.
The current schedule calls for completing design work for the new housing sample by September 1, 2003. Detailed cost estimates for introducing the revised PSU sample and the new housing sample, including updating, will be completed by the end of 2003, in time for the 2006 budget cycle. This schedule will provide enough time to permit introduction of new geographic areas in 2008, the date when CE expenditure data for the new PSU sample will be first used in compilation of the official CPI.
Table 1. Population of Largest Metropolitan Area PSUs (2000 CBSA Definitions)
| PSU | Index Area Code | Population | Percent of Total CPI-U Population |
| Los Angeles | A419 | 12,365,627 | 4.8% |
| Chicago | A207 | 9,172,106 | 3.6% |
| New York | A109 | 8,008,278 | 3.1% |
| New York suburbs | A110 | 7,718,773 | 3.0% |
| Boston | A103 | 7,098,363 | 2.8% |
| San Francisco | A422 | 7,039,362 | 2.7% |
| New Jersey suburbs | A111 | 6,708,052 | 2.6% |
| Philadelphia | A102 | 6,188,463 | 2.4% |
| Detroit | A208 | 5,456,428 | 2.1% |
| Dallas | A316 | 5,275,921 | 2.1% |
| Washington DC | A312 | 5,027,797 | 2.0% |
| Houston | A318 | 4,715,407 | 1.8% |
| Atlanta | A319 | 4,201,220 | 1.6% |
| Los Angeles suburbs | A420 | 4,008,018 | 1.6% |
| Miami | A320 | 3,876,380 | 1.5% |
| Seattle | A423 | 3,554,760 | 1.4% |
| Phoenix | A429 | 3,251,876 | 1.3% |
| Minneapolis | A211 | 3,136,198 | 1.2% |
| Cleveland | A210 | 2,945,831 | 1.1% |
| San Diego | A424 | 2,813,833 | 1.1% |
| St. Louis | A209 | 2,693,603 | 1.0% |
| Denver | A433 | 2,629,980 | 1.0% |
| Baltimore | A313 | 2,552,994 | 1.0% |
| Pittsburgh | A104 | 2,431,087 | 0.9% |
| Tampa | A321 | 2,395,997 | 0.9% |
| Portland | A425 | 2,275,095 | 0.9% |
| Cincinnati | A213 | 1,979,202 | 0.8% |
| Sacramento (New) | | 1,838,116 | 0.7% |
| Kansas City | A214 | 1,776,062 | 0.7% |
| San Antonio (New) | | 1,711,703 | 0.7% |
| Milwaukee | A212 | 1,689,572 | 0.7% |
| Honolulu | A426 | 876,156 | 0.3% |
| Anchorage | A427 | 319,605 | 0.1% |
| Total of Above | | 137,731,865 | 53.6% |
| All Other CBSAs | | 119,278,302 | 46.4% |
| Total CPI-U Population | | 257,010,167 | 100.0% |
Table 2. New PSUs by Index Area
| Publication Area | PSU Name | 2008 Index Area Code | 2000 Decennial Population |
| 1 | Philadelphia-Wilmington-Atlantic City, PA-DE-NJ | A102 | 6,188,463 |
| 2 | Boston-Brockton-Nashua, MA-NH-ME-CT | A103 | 7,098,363 |
| 3 | Pittsburgh, PA | A104 | 2,431,087 |
| 4 | New York, NY | A109 | 8,008,278 |
| 5 | New York-Connecticut suburbs | A110 | 7,718,773 |
| 6 | New Jersey-Pennsylvania suburbs | A111 | 6,661,750 |
| 7 | Chicago-Gary-Kenosha, IL-IN-WI | A207 | 9,172,106 |
| 8 | Detroit-Ann Arbor-Flint, MI | A208 | 5,456,428 |
| 9 | St. Louis, MO-IL | A209 | 2,693,603 |
| 10 | Cleveland-Akron, OH | A210 | 2,945,831 |
| 11 | Minneapolis-St. Paul, MN-WI | A211 | 3,136,198 |
| 12 | Washington, DC-MD-VA-WV | A312 | 5,027,797 |
| 13 | Baltimore, MD | A313 | 2,552,994 |
| 14 | Dallas-Fort Worth, TX | A316 | 5,275,921 |
| 15 | Houston-Galveston-Brazoria, TX | A318 | 4,715,407 |
| 16 | Atlanta, GA | A319 | 4,201,220 |
| 17 | Miami-Fort Lauderdale, FL | A320 | 3,876,380 |
| 18 | Tampa-St. Petersburg-Clearwater, FL | A321 | 2,395,997 |
| 19 | Los Angeles County, CA | A419 | 12,365,627 |
| 20 | Los Angeles suburbs, CA | A420 | 4,008,018 |
| 21 | San Francisco, CA | A422 | 7,039,362 |
| 22 | Seattle-Tacoma-Bremerton, WA | A423 | 3,554,760 |
| 23 | San Diego, CA | A424 | 2,813,833 |
| 24 | Portland-Salem, OR-WA | A425 | 2,275,095 |
| 25 | Honolulu, HI | A426 | 876,156 |
| 26 | Anchorage, AK | A427 | 319,605 |
| 27 | Phoenix-Mesa, AZ | A429 | 3,251,876 |
| 28 | Denver-Boulder-Greeley, CO | A433 | 2,629,980 |
| 29 | Northeast X's | | 10,891,754 |
| | Providence, RI | X100 | |
| | Reading, PA | X100 | |
| | Syracuse, NY | X100 | |
| | Sharon, PA | X100 | |
| 30 | North Central X's | | 24,774,378 |
| | South Bend, IN | X200 | |
| | Rochester, MN | X200 | |
| | Springfield, MO | X200 | |
| | Madison, WI | X200 | |
| | Milwaukee-Racine, WI | X200 | |
| | Cincinnati-Hamilton, OH-KY-IN | X200 | |
| | Decatur, IL | X200 | |
| | Lincoln, NE | X200 | |
| | Elkhart-Goshen, IN | X200 | |
| | Kansas City, MO-KS | X200 | |
| | Saginaw-Bay City-Midland, MI | X200 | |
| | Youngstown-Warren, OH | X200 | |
| 31 | South X's | | 47,517,342 |
| | Tulsa, OK | X300 | |
| | Roanoke, VA | X300 | |
| | Louisville, KY | X300 | |
| | Clarksville, TN | X300 | |
| | New Orleans, LA | X300 | |
| | Knoxville, TN | X300 | |
| | Tuscaloosa, AL | X300 | |
| | Fort Hood, TX | X300 | |
| | Jacksonville, FL | X300 | |
| | El Paso, TX | X300 | |
| | San Antonio, TX | X300 | |
| | Baton Rouge, LA | X300 | |
| | Greenville-Spartanburg-Anderson, SC | X300 | |
| | Norfolk-Virginia Beach-Newport News, VA-NC | X300 | |
| | Ocala, FL | X300 | |
| | Fort Myers-Cape Coral, FL | X300 | |
| | Florence, SC | X300 | |
| | Birmingham, AL | X300 | |
| 32 | West X's | | 15,944,435 |
| | Sacramento, CA | X499 | |
| | Boise City, ID | X499 | |
| | Las Vegas, NV-AZ | X499 | |
| | Bellingham, WA | X499 | |
| | Fresno, CA | X499 | |
| | Merced, CA | X499 | |
| | Provo-Orem, UT | X499 | |
| | Yuma, AZ | X499 | |
| 33 | Northeast Y's | | 2,942,759 |
| | Augusta, ME | Y100 | |
| | Ithaca, NY | Y100 | |
| 34 | North Central Y's | | 8,717,815 |
| | Whitewater, WI | Y200 | |
| | Bellefontaine, OH | Y200 | |
| | Brookings-Madison, SD | Y200 | |
| | Macomb, IL | Y200 | |
| 35 | South Y's | | 12,322,746 |
| | Valdosta, GA | Y300 | |
| | Henderson, NC | Y300 | |
| | Eagle Pass, TX | Y300 | |
| | Picayune, MS | Y300 | |
| | Winchester, VA | Y300 | |
| | Greenwood, MS | Y300 | |
| 36 | West Y's | | 5,274,554 |
| | Newport, OR | Y400 | |
| | Bend-Redmond, OR | Y400 | |
| | El Centro, CA | Y400 | |
| | Prescott, AZ | Y400 | |
Note: PSUs in bold text are new selections; non-bolded PSUs are continuing.
Table 3. New Geographic Areas
| Process | New Process | Current Process |
| TPOPS at Census | Years 1 and 2. Quarterly survey conducted by Census; 12.5% of all POPS categories sampled in a given quarter. 2 years (8 quarters) required to complete a full TPOPS sample. | Quarterly survey conducted by Census; 6.25% of all POPS categories sampled in a given quarter. 4 years (16 quarters) required to complete a full TPOPS sample. |
| TPOPS Processing at BLS | Years 2 and 3. Quarterly data will be stockpiled until the full TPOPS sample is received. All other processing is identical to the current process. | Data are stockpiled until 2 quarters are available. Processing includes outlier review, address coding and collapsing, and outlet/item sample selection. |
| Pre-Initiation Processing at BLS (Field) | Year 3. Outlet/item samples will be processed for the entire geographic area at once. Otherwise the process is identical to the current process. | Field activities include parsing of samples into individual EA assignments, collapsing to existing outlets, and refining address and contact information. |
| Initiation of Sample in Field | Year 4. The entire outlet/item sample will be initiated at the same time. | 12.5% of the geographic area's outlet/item sample is rotated each half-year (item-outlet rotation). In addition, 12.5% of the area's item samples are updated in the existing outlets (within-outlet rotation). |
| Pricing | Year 5. Same as current process. | The entire outlet/item sample is priced according to the pricing schedule assignment (either monthly or bimonthly). |
Table 4. Dropping Geographic Areas
| Process | New Process | Current Process |
| TPOPS at Census | Year 1. All TPOPS data collection will cease at Census 48 months (16 quarters) prior to the date at which a geographic area will drop from the official CPI. | Quarterly survey conducted by Census; 6.25% of all POPS categories sampled in a given quarter. 4 years required to complete a full TPOPS sample. |
| TPOPS Processing at BLS | Years 1 and 2. Data collected prior to the cessation of data collection at Census will be processed per the existing procedures. | Data are stockpiled until 2 quarters are available. Processing includes outlier review, address coding and collapsing, and outlet/item sample selection. |
| Pre-Initiation Processing at BLS (Field) | Years 2 and 3. Data collected prior to the cessation of data collection at Census will be processed per the existing procedures. | Field activities include parsing of samples into individual EA assignments, collapsing to existing outlets, and refining address and contact information. |
| Initiation of Sample in Field | Years 2 and 3. Outlet/item samples selected from TPOPS data collected prior to the cessation of data collection at Census will be initiated in the field per the existing procedures. | 12.5% of the geographic area's outlet/item sample is rotated each half-year. |
| Pricing | Year 5. All price data collection will cease in January of the drop year. | The entire outlet/item sample is priced according to the pricing schedule assignment (either monthly or bimonthly). |
Table 5. New Areas by Priority of Introduction
| Group | Code | Added Areas | Dropping Areas |
| Group 1 | | Geographic holes | Strata Duplicates |
| | Y102 | Augusta, ME | Springfield, MA |
| | Y104 | Ithaca, NY | Buffalo, NY |
| | Y430 | El Centro, CA | Burlington, VT |
| | Y432 | Prescott, AZ | Johnstown, PA |
| | Y206 | Whitewater, WI | Albany, GA |
| | X342 | Louisville, KY | Brownsville, TX |
| | X354 | New Orleans, LA | Amarillo, TX |
| | X476 | Bellingham, WA | Melbourne, FL |
| | X480 | Merced, CA | Faribault, MN |
| | | | Statesboro, GA |
| Group 2 | | 150-mile "similar" Match | |
| | X348 | Clarksville, TN-KY | Evansville, IN |
| | X230 | Springfield, MO | Chanute, KS |
| | Y316 | Henderson, NC | Richmond, VA |
| | Y324 | Greenwood, MS | Pine Bluff, AR |
| | X106 | Providence, RI | Hartford |
| Group 3 | | Index-Area Matches | |
| | X364 | Jacksonville, FL | Gainesville, FL |
| | X362 | Fort Hood, TX | Beaumont, TX |
| | X478 | Fresno, CA | Modesto, CA |
| Group 4 | | Strata Matches | |
| | Y208 | Bellefontaine, OH | Dayton, OH |
| | Y322 | Winchester, VA | Morristown, TN |
| | Y318 | Eagle Pass, TX | Lafayette, LA |
| | X336 | Roanoke, VA | Raleigh, NC |
| | X360 | Tuscaloosa, AL | Florence, AL |
| | X214 | South Bend, IN | Columbus, OH |
| | Y314 | Valdosta, GA | Arcadia, FL |
| | Y212 | Macomb, IL | Mt. Vernon, IL |
| | X334 | Tulsa, OK | Oklahoma City, OK |
| | X470 | Sacramento, CA | Chico, CA |
| | X358 | Knoxville, TN | Chattanooga, TN |
| | X216 | Rochester, MN | Wausau, WI |
| | X366 | El Paso, TX | Midland, TX |
| | Y426 | Newport, OR | Pullman, WA |
Chart 1. PSU Rotation Plan, 2004-2014
| Group | Action | Census TPOPS Collection | Processing | Initiation | Pricing or Drop |
| Group 1 | Adding 5 PSUs | 2004 | 2006 | 2007 | Pricing, 2008 |
| Group 1 | Dropping 10 PSUs | None, from 2004 | None | None | Drop PSUs, 2008 |
| Group 2 | Adding 4 PSUs | 2005 | 2007 | 2008 | Pricing, 2009 |
| Group 3 | Adding 5 PSUs | 2006 | 2008 | 2009 | Pricing, 2010 |
| Group 3 | Dropping 5 PSUs | None, from 2006 | None | None | Drop PSUs, 2010 |
| Group 4 | Adding 5 PSUs | 2007 | 2009 | 2010 | Pricing, 2011 |
| Group 4 | Dropping 5 PSUs | None, from 2007 | None | None | Drop PSUs, 2011 |
| Group 5 | Adding 4 PSUs | 2008 | 2010 | 2011 | Pricing, 2012 |
| Group 5 | Dropping 4 PSUs | None, from 2008 | None | None | Drop PSUs, 2012 |
| Group 6 | Adding 4 PSUs | 2009 | 2011 | 2012 | Pricing, 2013 |
| Group 6 | Dropping 4 PSUs | None, from 2009 | None | None | Drop PSUs, 2013 |
| Group 7 | Adding 4 PSUs | 2010 | 2012 | 2013 | Pricing, 2014 |
| Group 7 | Dropping 4 PSUs | None, from 2010 | None | None | Drop PSUs, 2014 |
List of Appendices
Appendix I. Johnson, William H., Owen J. Shoemaker, and Yeon W. Rhee (2002), “Redesigning the Consumer Price Index Area Sample,” Proceedings of the Section on Government Statistics, American Statistical Association.
Appendix II. Ptacek, Frank, and Robert M. Baskin (1996), “Revision of the CPI housing sample and estimators,” Monthly Labor Review, December.
Appendix III. Westat (September 25, 2002), Evaluation of Vendor Lists Using Census Data.
Appendix IV. Westat (February 26, 2003), Evaluation of Vendor Lists: Comparison of Vendor and Interview Data.
Redesigning the Consumer Price Index Area Sample
William H. Johnson, Owen J. Shoemaker, and Yeon W. Rhee
U. S. Bureau of Labor Statistics, 2 Mass Ave NE, Room 3655, Washington, DC 20212
KEY WORDS: multistage, stratified, controlled selection, overlap
	
Any opinions expressed in this paper are those of the authors and do not constitute policy of the Bureau of Labor Statistics.
	
This paper describes the PSU selection process for the next CPI Revision. The U. S. Consumer Price Index (CPI) employs a multistage sample design that has been revised every ten years. The first stage consists of selecting primary sampling units (PSUs) which are formed from Metropolitan or Micropolitan Core Based Statistical Areas (CBSAs) based on preliminary definitions by the Office of Management and Budget.
	
The PSU selection process for the next CPI Revision is quite similar to the process of selecting the sample for the 1998 CPI Revision (see Williams et al.). The biggest difference has been the use of variance models of six-month index change for the Commodities and Services part of the CPI-U in determining the set of certainty PSUs and the distribution of non-certainty PSUs across Census region by size class combinations. Alternative methodologies for stratifying PSUs prior to selection were considered, and work on modeling CPI-U change since 1992 influenced the selection of stratifying variables. All of the programs involved in the work on selecting the 1998 CPI Revision PSU sample were updated or rewritten.
	
	
The process of selecting the PSU sample involves six steps:
Determine the PSUs selected with certainty
Determine the number of non-certainty PSUs and their distribution across regions
Stratify the non-certainty PSUs
Use Keyfitzing to improve expected overlap
Use controlled selection to generate a set of sampling patterns and weights
Select a sample of PSUs
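Step 4 refers to Keyfitz's procedure for retaining as much of the old sample as possible while still honoring the new selection probabilities. The paper does not spell out the computation, so the following is only a textbook-style sketch for the simplest case: one PSU selected per stratum, with the same set of PSUs in the old and new stratum. The PSU labels and probabilities are invented for illustration.

```python
def keyfitz_conditional_probs(old_probs, new_probs, old_selected):
    """New-sample selection probabilities, given the PSU selected in the old sample."""
    p_old = old_probs[old_selected]
    p_new = new_probs[old_selected]
    # Keep the old PSU for sure if its probability did not fall;
    # otherwise keep it with probability new/old.
    retain = 1.0 if p_new >= p_old else p_new / p_old
    cond = {psu: 0.0 for psu in new_probs}
    cond[old_selected] = retain
    if retain < 1.0:
        # When the old PSU is released, choose among PSUs whose probability
        # rose, in proportion to the size of the increase.
        increases = {
            psu: new_probs[psu] - old_probs[psu]
            for psu in new_probs
            if new_probs[psu] > old_probs[psu]
        }
        total_increase = sum(increases.values())
        for psu, inc in increases.items():
            cond[psu] += (1.0 - retain) * inc / total_increase
    return cond


def marginal_new_probs(old_probs, new_probs):
    """Unconditional selection probabilities implied by the procedure."""
    marginal = {psu: 0.0 for psu in new_probs}
    for old_sel, p in old_probs.items():
        cond = keyfitz_conditional_probs(old_probs, new_probs, old_sel)
        for psu, c in cond.items():
            marginal[psu] += p * c
    return marginal


# Hypothetical stratum with three PSUs.
old = {"A": 0.5, "B": 0.3, "C": 0.2}
new = {"A": 0.4, "B": 0.25, "C": 0.35}

marginal = marginal_new_probs(old, new)
# Probability that the old PSU stays in the sample.
overlap = sum(old[p] * keyfitz_conditional_probs(old, new, p)[p] for p in old)
# Overlap if the new PSU were instead drawn independently of the old one.
independent_overlap = sum(old[p] * new[p] for p in old)
print(marginal, overlap, independent_overlap)
```

In this example the procedure reproduces the new probabilities exactly while raising the chance of keeping the old PSU well above what an independent draw would give, which is the sense in which it "improves expected overlap."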
	
	
The first step in the process of selecting the PSU sample is to determine which PSUs are certainty PSUs. To do so, it was first necessary to identify the candidate certainty PSUs. The most likely certainty PSUs are those which are already certainty PSUs in the existing CPI area sample. However, with the shift to CBSA-based definitions it became necessary to determine what the new definitions of the current certainty PSUs are likely to be. The certainty cities were mapped along with preliminary CBSA definitions. It was assumed that a CBSA would either be entirely included or entirely excluded from these areas. In cases where a CBSA was partially contained in a current certainty PSU, the probability of the outside counties being in the final definition given to BLS by the Census Bureau was examined as part of the assessment of whether to include or exclude the CBSA.
	
After the expected definitions of the current certainty cities were decided, the remaining possible certainty cities were the remaining individual metropolitan CBSAs. The largest metropolitan CBSAs outside of the current certainty cities were determined and considered for inclusion in the list of new certainty PSUs.
	
Next it was necessary to determine the criteria for PSUs to be selected with certainty. There were several possible options. The entire CPI-U population to be represented is the total population contained in all metropolitan and micropolitan CBSAs. This population is 257,010,167.
	
The options considered included:
1,500,000 – the population cutoff used previously for determining certainty cities
1,680,000 – a population cutoff that wouldn’t cause the loss of any current certainty cities
1,800,000 – a population cutoff which was considered for use previously
2,141,751 – the population cutoff obtained by using 120 half sample equivalents (HSEs) to represent the total population of 257,010,167
2,570,102 – the population cutoff obtained by using 100 HSEs to represent the population of 257,010,167
4,283,501 – the population cutoff obtained by using 60 HSEs to represent the population of 257,010,167
	
A half sample equivalent is a unit of sample size. Each certainty city will receive at least two HSEs and each selected non-certainty city will receive one HSE.
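The HSE-based cutoffs above are simply the total population divided by the number of HSEs. The sketch below reproduces the 120-HSE and 100-HSE figures and applies them to three populations taken from Table 1; the same division for 60 HSEs gives a value within a few persons of the listed cutoff. The function names are illustrative only.

```python
TOTAL_CPI_U_POPULATION = 257_010_167


def hse_cutoff(total_population, n_hse):
    """Population represented by one half sample equivalent, rounded."""
    return round(total_population / n_hse)


def above_cutoff(populations, cutoff):
    """Areas whose population meets or exceeds the cutoff."""
    return [area for area, pop in populations.items() if pop >= cutoff]


cutoffs = {n: hse_cutoff(TOTAL_CPI_U_POPULATION, n) for n in (120, 100)}

# Populations from Table 1.
examples = {"Denver": 2_629_980, "Baltimore": 2_552_994, "Cincinnati": 1_979_202}

for n_hse, cutoff in cutoffs.items():
    print(n_hse, "HSEs ->", f"{cutoff:,}", above_cutoff(examples, cutoff))
```

Baltimore illustrates how sensitive the certainty list is to the choice: it clears the 120-HSE cutoff but falls just short of the 100-HSE cutoff.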
	
The option of using 1,500,000 as a population cutoff for determining certainty cities was dropped as it would add too many certainty cities to be affordable. Each certainty city must have enough sample for its individual city CPI to be publishable on at least a semi-annual basis. This makes the certainty cities much more costly than non-certainty cities.
	
The decision as to which set of cities should be selected with certainty required information with which to compare the various possible sets of certainty PSUs. In order to compare the various options, the model used for optimizing the CPI Commodities and Services sample was generalized (see Leaver et al). This model attempts to select outlet and item sample sizes for groups of PSUs which will produce the lowest variance given the available budget for travel and data collection. The model was generalized by allowing the number of non-certainty PSUs in each non-self-representing index area to be a variable that the optimization program could optimize over. This created the need for an additional constraint, though, as the number of non-certainty PSUs was determined by the total number of HSEs minus the number of HSEs used by the certainty PSUs.
	
In addition, the relative importances for each index area and group of items had to be recalculated for each scenario. The populations used for calculating the population relative importances were from the 2000 Census. The cost weights used for calculating the relative importance of groups of items were from the 1999 Consumer Expenditure survey. An index area as used in this paper is either a certainty city or a Census region by size class combination. There are four Census regions: Northeast, Midwest, South, and West. There are two size classes corresponding to metropolitan and micropolitan CBSAs. Note that some micropolitan CBSAs are part of the current certainty cities and thus their population should be included with the certainty city and not with the non-self-representing index area covering micropolitan CBSAs in the Census region in which the PSU resides.
	
Some additional options were explored. Even though we currently allocate one HSE to each non-certainty PSU, there was interest in what would happen if two HSEs were allocated to each non-certainty PSU. It would be expected to roughly halve the number of non-certainty PSUs, but the effect on variance was less obvious. Also, there was concern that the grossly uneven relative importances of the index areas may have a negative impact on sample allocation and on the variance of the all U.S. – all items CPI-U. Thus an option was explored where the largest Census region, the South, was broken apart using Census divisions. The South was divided into two index areas, one being the South Atlantic division and the other index area being composed of the East South Central and West South Central divisions. New variance components for the optimization model were calculated for the new index areas.
	
The optimization model yielded a result with non-integer numbers of PSUs in each non-self representing index area. These values were rounded to even integers in such a way that the total number of HSEs added up correctly. The optimization model was then rerun using these fixed numbers of PSUs to provide results that could be compared with results from other scenarios. The information was used in determining what the set of certainty PSUs would be.
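The text does not say how the rounding to even integers was carried out. One simple way to round a set of non-integer allocations to even integers while preserving a fixed total is a largest-remainder rule applied to pairs of PSUs. The sketch below is an assumption for illustration only; the function name `round_to_even` and the example values are hypothetical.

```python
import math


def round_to_even(values, total):
    """Round each value to an even integer so that the results sum to `total`.

    Works in units of two PSUs: floor each half-value, then hand out the
    remaining pairs to the entries with the largest fractional remainders.
    """
    if total % 2 != 0:
        raise ValueError("total must be even")
    halves = [v / 2 for v in values]
    floors = [math.floor(h) for h in halves]
    short = total // 2 - sum(floors)
    if not 0 <= short <= len(values):
        raise ValueError("values are not consistent with the requested total")
    # Indices ordered by fractional remainder, largest first.
    order = sorted(range(len(values)),
                   key=lambda i: halves[i] - floors[i], reverse=True)
    for i in order[:short]:
        floors[i] += 1
    return [2 * f for f in floors]


# Hypothetical optimizer output for three index areas, with 30 PSUs in total.
print(round_to_even([7.3, 12.9, 9.8], 30))
```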
	
The list of certainty PSUs is not yet public information and can’t be included in this paper. Some of the results that were found can be discussed. In comparing the allocation of one vs. two HSEs to each non-certainty PSU, it was found that allocating two HSEs to each non-certainty PSU increased the modeled standard error of six-month CPI change for C&S by an average of 13.6% across the scenarios. This was primarily due to the large contribution of the between-PSU component of variance in non-self-representing index areas. This was surprising given that the PSU components of variance are so small compared to other components of variance. However, the much smaller divisor of the PSU component of variance, as compared to other components, allowed it to make a greater contribution to the total variance. In all cases the PSU component of variance ended up contributing more than 50% of the total variance for all of the index areas representing metropolitan CBSAs.
	
Dividing the South based on Census divisions also ended up increasing the total variance. It appears based upon the model used that it is preferable to have fewer and larger index areas with larger PSU samples than to have a larger number of smaller index areas. This is again a result of the large contribution of the between PSU component of variance of non-self representing index areas.
	
Once the decision was made on a set of certainty PSUs, the number of PSUs in each non-self representing index area was also determined based on the output of the optimization program from the chosen scenario. The chosen design did shift towards having more PSUs in the West region and slightly fewer elsewhere. There are more of what are called C-size PSUs as the population they cover has grown greatly in relative importance between 1990 and 2000. For the 1998 CPI Revision sample, the C PSUs were the urban part of areas outside of metropolitan statistical areas. The C PSUs now represent the micropolitan CBSA population, excluding those CBSAs which are part of a certainty PSU. Having the CPI-U population be the total population in CBSAs resulted in an increase in the total percent of the U.S. population covered by the CPI-U.
	
	
Non-certainty PSUs are grouped together into strata and one PSU is selected from each stratum. (see Dippo et al) It is desirable that the PSUs within a stratum be homogeneous. The first task was to determine by what measure the PSUs should be homogeneous.
	
In the early 1990s, work was done on modeling CPI-U change for certainty PSUs by variables we had available from Census as well as geographic variables. None of these models were especially promising. However, for the 1998 CPI Revision, a four-variable model using normalized latitude, normalized longitude, normalized latitude squared, and percent urban was chosen for use in three out of four Census regions and a model consisting of seven Census variables was chosen for the South region. Once a model was chosen, the strata were formed so as to be as homogeneous as possible with respect to these variables, subject to the restriction that strata should have roughly equal population. (see Williams et al)
	
This research was updated by examining the predictive power of these models for more recent time periods as well as examining their value in modeling CPI-U change for non-self representing PSUs and for modeling changes in the housing index. The chosen models have performed worse since they were originally researched and no other really good models have been found. Thus the chosen model this time was simply the four variable model from before with normalized longitude squared included for the purpose of symmetry.
	
Given the relatively weak predictive power of the chosen model, two other options were also examined: using no stratification and using a purely geographic stratification.
	
With no stratification, the PSUs would be drawn from each region by size class without replacement and with probability proportional to expenditure. This was done for simulation purposes with SAS PROC SURVEYSELECT.
	
The purely geographic stratification was based on Peano ordering the PSUs based on the median latitude and longitude of the centroids of the counties composing the PSUs. Examples of Peano curves can be found at http://www.contrib.andrew.cmu.edu/~malin/java/PeanoHilbert.html. The Peano curve for a grid is based on a recursive N-shaped pattern. In each region by size class combination, the points representing the PSUs were placed on a grid. The calculation of an ordering value is based on interleaving the digits of the binary representations of the coordinates of the PSUs. Once the PSUs are ordered, the ordered list of PSUs in each region by size class is cut into the appropriate number of strata. The cut points are made so that the population in each stratum is roughly the same. An attempt was also made to place the cut points so that when there was a large jump in the calculated ordering value between two points, the two points would fall in different strata. This purely geographic stratification ended up producing strata which looked like rectangular stripes.
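The ordering and cutting steps can be sketched in Python. This is a minimal sketch, assuming the PSU coordinates have already been converted to non-negative integer grid positions; the function names, the 16-bit width, and the four example PSUs are hypothetical. Interleaving binary digits in this way is often called Morton or Z-order. The adjustment of cut points at large jumps in the ordering value is not attempted here.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Ordering value built by interleaving the binary digits of x and y."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)        # digits of x go to even positions
        key |= ((y >> b) & 1) << (2 * b + 1)    # digits of y go to odd positions
    return key


def cut_into_strata(psus, n_strata):
    """Order PSUs by their interleaved key and cut into equal-population strata.

    psus: list of (name, x, y, population) with integer grid coordinates.
    A PSU is assigned by the midpoint of its share of the cumulative population.
    """
    ordered = sorted(psus, key=lambda p: interleave_bits(p[1], p[2]))
    total = sum(p[3] for p in ordered)
    strata = [[] for _ in range(n_strata)]
    running = 0
    for name, _, _, pop in ordered:
        idx = min(int((running + pop / 2) * n_strata / total), n_strata - 1)
        strata[idx].append(name)
        running += pop
    return strata


# Hypothetical PSUs on a 2 x 2 grid with equal populations.
example_psus = [("D", 1, 1, 100), ("A", 0, 0, 100), ("C", 0, 1, 100), ("B", 1, 0, 100)]
print(cut_into_strata(example_psus, 2))
```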
	
In order to cluster PSUs so that they are similar with respect to the five-variable model discussed above, a program using a hill-climbing algorithm by Friedman and Rubin was used. This program first rescales all of the variables so that they are of roughly equal importance. It does this by calculating an unstratified population-weighted sum of squares for each of the variables and then multiplying the values of the variables by ten divided by the square root of the sum of squares:

$x'_{ij} = 10\,x_{ij} / \sqrt{SS_i}$

where $x_{ij}$ is the value of the ith variable for the jth PSU, $p_j$ is the population of the jth PSU (the weight used in the sum of squares), and $SS_i$ is the unstratified population-weighted sum of squares of the ith variable.

The program then attempts to minimize the stratified total sum of squares, given the total number of strata, which is an input to the program. This program repeats the minimization procedure to form strata in each Census region by size class. The program is constrained on the size of the strata, and these constraints were estimated using the minimum and maximum stratum populations from the geographic stratification and adjusting them by 10%.
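The rescaling and the objective being minimized can be illustrated with a short Python sketch. It assumes that the sums of squares are taken about the population-weighted mean, which the text does not state explicitly; the function names and example values are hypothetical, and the hill-climbing search itself is not shown.

```python
import math


def weighted_ss(values, pops):
    """Population-weighted sum of squares about the weighted mean (assumed form)."""
    total = sum(pops)
    mean = sum(p * v for p, v in zip(pops, values)) / total
    return sum(p * (v - mean) ** 2 for p, v in zip(pops, values))


def rescale(values, pops):
    """Multiply a variable by ten divided by the square root of its sum of squares."""
    factor = 10 / math.sqrt(weighted_ss(values, pops))
    return [v * factor for v in values]


def stratified_total_ss(variables, pops, strata):
    """Sum of within-stratum weighted sums of squares over all variables.

    variables: list of value lists, one per variable.
    strata: list of lists of PSU indices.
    """
    total = 0.0
    for members in strata:
        stratum_pops = [pops[j] for j in members]
        for var in variables:
            total += weighted_ss([var[j] for j in members], stratum_pops)
    return total


# Hypothetical single variable observed on four equally sized PSUs.
values = [1.0, 2.0, 3.0, 4.0]
pops = [1.0, 1.0, 1.0, 1.0]
compact = stratified_total_ss([values], pops, [[0, 1], [2, 3]])
spread = stratified_total_ss([values], pops, [[0, 3], [1, 2]])
print(compact, spread)
```

In this toy case, grouping neighboring values gives the smaller stratified total, which is the kind of grouping the clustering program searches for.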
	
	
Given our budgetary limitations, it is generally desirable to keep as many of our current PSUs in the next sample as possible.
The first step was to determine what is meant by an overlap PSU. Given the considerable changes in definitions of the PSUs it is possible that part of a PSU might currently be in the CPI sample but not other parts. The preliminary definition was that 30% of the counties or 30% of the 2000 population of a PSU currently be covered by the CPI sample. This was complicated by the fact that counties are composed of Minor Civil Divisions (MCDs) in the Northeast region. Current CPI PSUs in the Northeast are defined at the MCD level, while the new PSUs are defined at the county level. It was decided that a county composed of MCDs was overlap if at least 5% of its 2000 population was overlap. A PSU composed of MCDs is considered overlap as long as 30% of the counties are overlap and at least one of those counties has at least 30% of its 2000 population being overlap based on MCDs.
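The overlap rules described above can be written as simple predicates. The sketch below is one reading of the text: the thresholds are treated as "at least" conditions, and the function names and example inputs are hypothetical.

```python
def psu_is_overlap(counties_covered, counties_total, pop_covered, pop_total,
                   threshold=0.30):
    """Preliminary rule: 30% of counties or 30% of 2000 population is covered."""
    return (counties_covered / counties_total >= threshold
            or pop_covered / pop_total >= threshold)


def mcd_county_is_overlap(overlap_pop, county_pop, threshold=0.05):
    """A county composed of MCDs is overlap if 5% of its 2000 population is."""
    return overlap_pop / county_pop >= threshold


def mcd_psu_is_overlap(counties, county_share=0.30, pop_share=0.30):
    """Rule for PSUs composed of MCDs.

    counties: list of (overlap_pop, total_pop) pairs, one per county.
    Requires 30% of counties to be overlap and at least one of those
    counties to have 30% of its 2000 population overlap.
    """
    flagged = [(o, t) for o, t in counties if mcd_county_is_overlap(o, t)]
    enough_counties = len(flagged) / len(counties) >= county_share
    one_heavy = any(o / t >= pop_share for o, t in flagged)
    return enough_counties and one_heavy


# Hypothetical three-county PSUs in the Northeast.
print(mcd_psu_is_overlap([(40, 100), (2, 100), (0, 100)]))
print(mcd_psu_is_overlap([(10, 100), (0, 100), (0, 100)]))
```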
	
The inherited Keyfitzing procedure attempts to increase the likelihood of selecting PSUs which are overlap, or which have a greater relative importance in 2000 than in 1990. Some changes in the program had to be made due to the massive redefinition of PSUs. The Keyfitzing procedure operates at the level of the intersection of a new stratum with a stratum for the 1998 CPI Revision PSU sample. Due to redefinitions, there are many cases where only part of a new PSU lies within one of these intersections. Thus the PSUs were broken in pieces for the purpose of Keyfitzing and then the pieces were added together to give the total new probability of selection of a PSU.
	
The procedure works as follows:
For each Region × City Size × New Stratum i × Old Stratum j, calculate the new probability of PSU k, or of the part of PSU k, being selected:
[equation missing]
where the new probability is the probability of selection of the intersection of PSU k with new stratum i and old stratum j.
	
There are several possible cases:
a) The intersection is empty, so there are no PSUs to consider.
b) The intersection is a single PSU k. Then the Keyfitzed probability is [equation missing].
c) There is no PSU in the intersection which was selected in the old sample. For each PSU k in the intersection, assign the Keyfitz probability as follows:
If [condition missing] then [equation missing].
If [condition missing] then [equation missing].
Here the old probability is the probability of selection of PSU k intersected with new stratum i and old stratum j based on 1990 populations.
d) A PSU s was selected in the old sample and at least partially resides in the intersection:
If [condition missing] then [equation missing], and [equation missing] for all other PSUs k within the intersection.
Here the new and old probabilities are based on the old PSU definition for PSU s intersected with new stratum i and old stratum j. The Keyfitz probability for new PSUs within the intersection of new stratum i and old stratum j is calculated by determining what percentage of the 2000 population of the old PSU s resides within each of the new PSUs.
e) A PSU s was selected in the old sample and at least partially resides in the intersection:
If [condition missing] then [equation missing].
If k is a PSU in the intersection other than s, then
if [condition missing] then [equation missing];
else if [condition missing] then [equation missing].
After this procedure has been done for each intersection of new and old strata then the PSUs are reaggregated and their total probabilities of selection are determined.
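For orientation, the textbook form of the Keyfitz procedure for selecting one unit per stratum can be sketched in Python. This is not the variant described above, which works on intersections of new and old strata and on parts of PSUs; the function names and the three-PSU example are hypothetical. In the textbook form, a previously selected unit whose probability has not decreased is retained; otherwise it is retained with probability new/old, and if dropped, a replacement is drawn from the units whose probability increased, in proportion to the increase.

```python
def keyfitz_conditional(old, new, selected):
    """Probabilities of the new selection, given the unit selected in the old sample.

    old, new: dicts mapping unit -> selection probability (each sums to one).
    """
    gain = {k: new[k] - old[k] for k in new if new[k] > old[k]}
    total_gain = sum(gain.values())
    if new[selected] >= old[selected] or total_gain == 0:
        # Probability did not decrease: keep the old unit with certainty.
        return {k: 1.0 if k == selected else 0.0 for k in new}
    keep = new[selected] / old[selected]
    cond = {k: (1 - keep) * gain.get(k, 0.0) / total_gain for k in new}
    cond[selected] = keep
    return cond


def unconditional(old, new):
    """Overall selection probabilities implied by the conditional scheme."""
    total = {k: 0.0 for k in new}
    for s, p_old in old.items():
        if p_old == 0:
            continue
        for k, c in keyfitz_conditional(old, new, s).items():
            total[k] += p_old * c
    return total


# Hypothetical stratum with three PSUs.
old_probs = {"A": 0.5, "B": 0.3, "C": 0.2}
new_probs = {"A": 0.4, "B": 0.4, "C": 0.2}
print(keyfitz_conditional(old_probs, new_probs, "A"))
print(unconditional(old_probs, new_probs))
```

The second printout reproduces the new probabilities (up to floating-point rounding), which is the property that lets the procedure raise expected overlap without changing each unit's overall probability of selection.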
	
The selection of a stratification was made on the basis of the total expected number of overlap PSUs. It turned out that the stratifications with the highest overlap were from the clustering procedure using normalized latitude, normalized longitude, normalized latitude squared, normalized longitude squared, and percent of population which is urban. As the clustering procedure had been run multiple times, there was usually more than one stratification to choose from in each Census region by size class. It turned out that having a lower total sum of squares did not equate with having higher expected overlap.
	
The following table summarizes the expected number of overlap PSUs for the various options examined, both pre- and post-Keyfitzing:
	
	
	
	
	
| Region – City size | #overlap PSUs No stratification | #overlap PSUs Peano ordering | #overlap PSUs clustering program | 
| X100 | 1.07 | 1.01 | 0.98 | 
| X200 | 5.00 | 4.56 | 4.30 | 
| X300 | 5.06 | 5.01 | 4.96 | 
| X499 | 1.45 | 1.37 | 1.40 | 
| C100 | 0.10 | 0.05 | 0.05 | 
| C200 | 0.24 | 0.23 | 0.22 | 
| C300 | 0.27 | 0.30 | 0.30 | 
| C400 | 0.35 | 0.34 | 0.34 | 
| X000 | 12.58 | 11.95 | 11.64 | 
	
| Region – City size | #overlap PSUs Peano ordering after Keyfitzing | #overlap PSUs clustering program after Keyfitzing | 
| X100 | 2.43 | 2.82 | 
| X200 | 6.56 | 8.40 | 
| X300 | 6.44 | 7.59 | 
| X499 | 3.10 | 4.46 | 
| C100 | 0.05 | 0.05 | 
| C200 | 0.23 | 0.22 | 
| C300 | 0.30 | 0.30 | 
| C400 | 0.34 | 0.34 | 
| X000 | 18.53 | 23.27 | 
	
	
	
It is hoped that the number of overlap PSUs selected is not much less than the expected number of overlap PSUs. Thus a procedure called controlled selection was used. A program used to do the controlled selection for the 1998 CPI Revision PSU sample could not be successfully compiled and run in our current computing environment. An alternative called PC Consel (see Lin) was investigated. We had some success with this program; however, in the South region it would not give a solution, apparently because no exact solution to the controlled selection problem exists. Thus a new SAS IML program was written in order to handle the controlled selection problem.
	
The following is a description of the controlled selection problem:
	
Create a 3-dimensional grid of stratum × state × overlap status. Sum the probabilities of selection of the PSUs in each cell. A pattern describes an entire sample. In each cell it has either a zero (select zero PSUs from this cell) or a one (select one PSU from this cell). The controlled selection problem is to find a set of patterns $P_1, \dots, P_n$ with probabilities of selection $p_1, \dots, p_n$ such that $\sum_i p_i\,P_i(x,y,z) = \pi_{xyz}$, where $P_i(x,y,z)$ is the value of zero or one for the ith pattern for stratum x, state y, and overlap status z, and $\pi_{xyz}$ is the sum of probabilities of selection of PSUs in the cell for stratum x, state y, and overlap status z.

In addition there are constraints with respect to the number of PSUs selected per state and per overlap status. These constraints are imposed on each individual pattern. Let $S_i$ be the total probability of PSUs in state i. Let $\lfloor S_i \rfloor$ be the integer part of $S_i$. Then each pattern must contain either $\lfloor S_i \rfloor$ or $\lfloor S_i \rfloor + 1$ PSUs in state i. The sum of probabilities of patterns having $\lfloor S_i \rfloor$ PSUs is $1 - (S_i - \lfloor S_i \rfloor)$ and the sum of probabilities of patterns having $\lfloor S_i \rfloor + 1$ PSUs is $S_i - \lfloor S_i \rfloor$.

Let $O$ be the sum of probabilities of selection of overlap PSUs across all strata and states. Let $\lfloor O \rfloor$ be the integer part of $O$. Then each pattern must select $\lfloor O \rfloor$ or $\lfloor O \rfloor + 1$ overlap PSUs. The sum of probabilities of patterns with $\lfloor O \rfloor$ overlap PSUs is $1 - (O - \lfloor O \rfloor)$ and the sum of probabilities of patterns with $\lfloor O \rfloor + 1$ overlap PSUs is $O - \lfloor O \rfloor$.
	
The above constraints on the set of patterns comprise the controlled selection problem. Once this problem is solved, a pattern is selected based on the probabilities of the patterns. If there is more than one PSU corresponding to a cell with a value of one, then a single PSU is selected with probability proportional to its probability of selection within its stratum.
	
Note that there isn’t necessarily a solution for the controlled selection problem. If there is no exact solution, then it is desirable to have a partial set of patterns whose probabilities sum as close to one as possible.
	
The program randomly generates patterns by selecting a value of zero or one in each cell of the pattern using the probability in that cell. The program then verifies that the pattern meets the state and overlap constraints. If the pattern violates any constraints then the pattern is discarded and a new pattern is generated. If the pattern meets the state and overlap constraints then the pattern is kept and it is assigned a probability. The probability assigned to the pattern is the smallest remaining probability in any cell where a PSU was selected or the smallest remaining probability of the state and overlap controls met:
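For each state i, with total probability $S_i$ and integer part $\lfloor S_i \rfloor$, the probability associated with the constraint is the remaining part of $1 - (S_i - \lfloor S_i \rfloor)$ if $\lfloor S_i \rfloor$ PSUs are selected and the remaining part of $S_i - \lfloor S_i \rfloor$ if $\lfloor S_i \rfloor + 1$ PSUs are selected.
For the overlap constraint, with total overlap probability $O$ and integer part $\lfloor O \rfloor$, the associated probability is the remaining part of $1 - (O - \lfloor O \rfloor)$ if $\lfloor O \rfloor$ overlap PSUs are selected in the pattern and the remaining part of $O - \lfloor O \rfloor$ if $\lfloor O \rfloor + 1$ overlap PSUs are selected.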
	
The probability assigned to the pattern is the minimum of the cell probabilities, the state constraint probabilities, and the overlap constraint probability.
	
Once the pattern has a probability, that probability is deducted from each cell where a PSU was selected as well as from the state and overlap constraints met. For example, if the pattern probability is 0.2 and the number of PSUs in a state with 2.4 expected PSUs is 2, then the 0.6 probability initially assigned to selecting 2 instead of 3 PSUs in that state would be reduced to 0.4.
	
The new problem with the probabilities subtracted now goes through the same procedure until all probability is exhausted.
	
The way the patterns are constructed and the probabilities assigned, the sum of probabilities of patterns where a given PSU is selected will add up to the probability of the given PSU being selected. In addition, the probabilities associated with the state and overlap constraints will add up properly.
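The pattern-generation loop described above can be sketched in Python. This is a simplified illustration and not the SAS IML program: the function names, the toy cells, the requirement of exactly one selected cell per stratum, and the scaling of each cell's draw by the probability not yet assigned are all assumptions made for the sketch.

```python
import math
import random


def split_control(total, tol=1e-9):
    """Probabilities of selecting floor(total) or floor(total) + 1 units."""
    low = math.floor(total + tol)
    frac = max(total - low, 0.0)
    return {low: 1.0 - frac, low + 1: frac}


def controlled_selection(cells, max_tries=20000, seed=0, tol=1e-9):
    """cells maps (stratum, state, is_overlap) -> summed selection probability.

    Returns (patterns, unassigned): a list of (selected_cells, probability)
    pairs and the probability that could not be assigned to any pattern.
    """
    rng = random.Random(seed)
    remaining = dict(cells)
    strata = sorted({x for x, _, _ in cells})
    state_total = {}
    for (_, y, _), p in cells.items():
        state_total[y] = state_total.get(y, 0.0) + p
    state_ctrl = {y: split_control(t) for y, t in state_total.items()}
    overlap_ctrl = split_control(sum(p for (_, _, z), p in cells.items() if z))

    patterns, unassigned = [], 1.0
    for _ in range(max_tries):
        if unassigned <= tol:
            break
        # Draw a zero or one in each cell from its remaining probability,
        # scaled by the probability still unassigned (an assumption).
        chosen = [c for c, r in remaining.items() if rng.random() < r / unassigned]
        # Assumption: a valid pattern selects exactly one cell per stratum.
        if sorted(x for x, _, _ in chosen) != strata:
            continue
        n_state = {y: 0 for y in state_ctrl}
        for _, y, _ in chosen:
            n_state[y] += 1
        n_overlap = sum(1 for _, _, z in chosen if z)
        limits = [remaining[c] for c in chosen]
        limits += [state_ctrl[y].get(n, 0.0) for y, n in n_state.items()]
        limits.append(overlap_ctrl.get(n_overlap, 0.0))
        prob = min(limits)
        if prob <= tol:
            continue  # a state or overlap control is violated or used up
        # Deduct the pattern probability from the cells and controls it used.
        for c in chosen:
            remaining[c] -= prob
        for y, n in n_state.items():
            state_ctrl[y][n] -= prob
        overlap_ctrl[n_overlap] -= prob
        unassigned -= prob
        patterns.append((tuple(sorted(chosen)), prob))
    return patterns, unassigned


# Hypothetical problem: two strata (A, B), two states, overlap PSUs only in A.
toy_cells = {("A", "MD", True): 0.5, ("A", "VA", True): 0.5,
             ("B", "MD", False): 0.5, ("B", "VA", False): 0.5}
toy_patterns, toy_left = controlled_selection(toy_cells)
for cells_in_pattern, prob in toy_patterns:
    print(prob, cells_in_pattern)
```

When no exact solution exists, the loop stops after `max_tries` attempts and the returned `unassigned` value shows how far the partial set of patterns falls short of a total probability of one.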
	
	
References:
	
	
Dippo, Cathryn S., and Jacobs, Curtis A., "Area Sample Redesign for the Consumer Price Index," Proceedings of the Survey Research Methods Section, American Statistical Association, 1983, 118-123.
	
Leaver, Sylvia G., Johnson, William, Shoemaker, Owen, and Benson, Thomas S., "Sample Redesign for the Introduction of the Telephone Point of Purchase Survey Frames in the Commodities and Services Component of the U.S. Consumer Price Index," Proceedings of the Section on Government Statistics, American Statistical Association, 1999, 292-297.
	
Lin, Ting-Kwong, "Some Improvements on an Algorithm for Controlled Selection," Proceedings of the Survey Research Methods Section, American Statistical Association, 1992, 407-410.
	
Williams, J.L., Brown, E.F., Zion, G.R., "The Challenge of Redesigning the Consumer Price Index Area Sample," Proceedings of the Survey Research Methods Section, American Statistical Association (Vol. 1), 1993, 200-205.
	
Evaluation of Vendor Lists Using Census Data
Submitted by:
Westat
1650 Research Blvd.
Rockville, MD 20850
September 25, 2002
Table of Contents
Section
1 Introduction
2 Data
3 Evaluation of Lists at Block Level
3.1 Coverage of Lists
3.2 Accuracy of Renter Distribution
3.3 Assessment Using CPI Data
4 Summary
List of Appendices
A Detailed Tables
List of Tables
1 Number of housing units from SF1, MSG, and Dunhill, by county
2 Percent of census blocks not found in lists, by SF1 characteristics
3 Percent renter using two different coding schemes for the MSG file, by county
4 Percent renter for the SF1, MSG, and Dunhill files, by county
5 Percent of blocks in MSG and SF1, by percent renter
6 Percent of blocks in Dunhill and SF1, by percent renter
7 Percent of blocks in MSG and SF1, by percent renter and state
8 Percent of blocks in Dunhill and SF1, by percent renter and state
9 Percent of blocks in MSG and SF1, by percent renter and county
10 Percent of blocks in Dunhill and SF1, by percent renter and county
11 Percent of blocks in MSG and SF1, by percent renter and block size
12 Percent of blocks in Dunhill and SF1, by percent renter and block size
13 Ratios of the number of occupied housing units in a block for MSG and Dunhill to SF1, by geography
14 Ratios of the number of rented housing units in a block for MSG and Dunhill, by geography
15 Ratios of the number of owned housing units in a block for MSG and Dunhill, by geography
16 Number of blocks in CPI and SF1, by percent renter and CPI sample size
17 Percent of blocks in CPI and SF1, by percent renter and CPI sample size
18 Percent of blocks in CPI and MSG, by percent renter and CPI sample size
19 Percent of blocks in CPI and Dunhill, by percent renter and CPI sample size
The Office of Prices and Living Conditions of the Bureau of Labor Statistics (BLS) is exploring the use of purchased lists of addresses to enhance or replace the in-person listing and screening processes used to identify renters in the Consumer Price Index (CPI) Housing Survey. This report describes the evaluation of the lists based on aggregates of census block level data. Another task and report will evaluate the lists at the individual housing unit level based on data collected from a sample of housing units.
To conduct the research, we contacted three vendors to obtain lists of housing units for selected counties in Baltimore and Richmond Metropolitan Statistical Areas. The counties included in the Baltimore MSA were Baltimore County, Howard County, and Queen Anne’s County. In the Richmond MSA, the two counties included were Hanover County and Henrico County.
Westat purchased lists from Marketing Systems Group (MSG) and Dunhill International and contacted Experian. Experian could not provide data at this time because the data on housing units in its files do not contain 2000 Decennial Census block identifiers needed for matching the data to the census files. The costs of obtaining the lists were $15,000 for the MSG list and $11,000 for the Dunhill list. Both the MSG and Dunhill lists use U.S. Postal delivery addresses as the base and then append additional data from other sources. The prime data source for the MSG list is Info USA and Dunhill uses Knowledgebase. BLS also provided us with block-level data from the CPI Housing Survey in the targeted counties.
This report gives census block-level summaries that compare the lists from the vendors with the corresponding block-level data from the 2000 Decennial Census. Comparisons are made for the total number of housing units, the number and proportion of rented housing units, and the number and proportion of owned housing units. These comparisons are given by state, county, percent renter, and by block size using the data from the 2000 Decennial Census. In addition to summary information in this report, a detailed dataset containing block-level data from each list and from the census is being submitted. This report also compares the block level information from the lists with CPI sample information. A limitation of the comparisons to the CPI data is that the data from the CPI are only available for a sample of blocks in the MSAs and the sample size in some of the blocks is small.
As noted above, the second task will involve selecting a sample of households from the lists and comparing the data for the housing unit to data collected from the sampled housing units. An important goal of this task is to examine the accuracy of tenure (own/rent) data from the lists. This task will be covered in a separate report.
Both MSG and Dunhill were asked to provide a list for the specified counties in the two MSAs. The list was to include data identifying the state, county, tract, tract subgroup, block group, block, address, name (where available), telephone number (where available), and tenure (where available). The list from Dunhill had more than 5,000 cases with missing data for the tract subgroup field. We filled these blanks with zeros and Dunhill verified with their data source that this was appropriate.
The list must contain census blocks because the records are matched to block-level data from 2000 Decennial Census Summary File 1 (SF1). A total of 13,262 blocks with at least one occupied housing unit were identified and extracted from the SF1 in the five specified counties.
The MSG file contained 557,559 records from the five specified counties. Each record corresponds to a housing unit. Of these records 350,845 (63%) have a telephone number available. The tenure variable on the MSG list has values ranging from 0 to 9. The value 0 means the housing unit is a ‘definite’ renter and the value 9 means the unit is a ‘definite’ owner. As the values go from 1 to 8 the likelihood of the unit being a rental decreases.
The Dunhill file contained 471,868 records from the five specified counties. As with the MSG file, each record corresponds to a housing unit. Of these records 272,405 (58%) have a telephone number available. The tenure variable from Dunhill is a dichotomy that indicates the unit is either owned or rented.
Both the MSG and the Dunhill files do classify the housing units by census block, but this data is inferred from the address using various geographic coding techniques. For example, MSG indicated that the block was assigned based on the ZIP+4 Code. Thus, it is possible, and even likely in some cases, that the data for a housing unit is classified as being in a particular census block but it actually falls in another block. This report cannot assess this type of error because we do not have access to housing unit level data from the 2000 census.
The CPI Housing Survey data is for a sample of 2000 Census blocks in the five counties. BLS created a block-level summary with the number of housing units sampled in the block, the number of these that are owned and the number of these that are rented. The file contains 285 blocks that summarize the data of 2,799 sampled housing units. Of the 285 blocks, we could not match five of the blocks to the SF1 data.
The first issue addressed is the coverage of the MSG and Dunhill lists. By coverage, we mean the number of housing units identified in the list as compared to the number of occupied housing units counted in the SF1 file. We assume the SF1 file is complete and accurate for this purpose.
The first step of the process was to determine if the two lists could be matched to a corresponding census block from the SF1 file in the selected counties. After summarizing the lists at the block-level and dealing with the blank data in the Dunhill file, the housing units from the two files were matched to the SF1 blocks. The MSG file had records in 809 blocks that were not in the SF1 file (in the selected counties) and the Dunhill file contained records in 1,234 blocks that were not in the SF1 file.
Table 1 gives the number of occupied housing units in the SF1 file and in the lists from MSG and Dunhill. The numbers are given for all the data from the lists, and just for the blocks that match to a census block on the SF1 file. The records from the lists that do not match to the SF1 blocks were eliminated from subsequent analysis and are not further discussed unless specifically noted.
Even after restricting the analysis to the matching blocks, the MSG list still has either about the same or more housing units than the SF1 file for each of the five counties. The list from Dunhill has fewer housing units than both the MSG list and the SF1 file in each county.
Table 1. Number of housing units from SF1, MSG, and Dunhill, by county
| Geography | Number of housing units | SF1 | MSG | Dunhill |
| Total | Total | 544,477 | 557,559 | 471,868 |
| | In matching blocks | | 548,929 | 456,579 |
| County | | | | |
| Baltimore | Total | 299,877 | 303,191 | 258,377 |
| | In matching blocks | | 299,869 | 252,954 |
| Howard | Total | 90,043 | 96,146 | 81,435 |
| | In matching blocks | | 93,568 | 78,551 |
| Queen Anne's | Total | 15,315 | 15,308 | 14,376 |
| | In matching blocks | | 14,642 | 13,305 |
| Hanover | Total | 31,121 | 32,981 | 30,084 |
| | In matching blocks | | 32,539 | 27,360 |
| Henrico | Total | 108,121 | 109,933 | 87,596 |
| | In matching blocks | | 108,311 | 84,409 |
NOTE: The SF1 counts are for occupied housing units.
We also examined the files to determine if there were some blocks with occupied housing units in the SF1 without any corresponding data from the lists. Out of the 13,262 census blocks in the SF1 file in the five counties, the list from MSG had data from 12,329 blocks and the list from Dunhill had data from 12,176 blocks. To investigate the census blocks without corresponding data from the lists, we examined the characteristics of the SF1 blocks that were not matched to the lists. Table 2 gives the percentage distribution of the 933 SF1 blocks that were not found in the MSG file and the 1,086 SF1 blocks that were not found in the Dunhill file by the characteristic of the block in the SF1 file. The distributions for MSG and Dunhill are relatively consistent by the SF1 characteristic. Two variables that are highly related to the missingness are block size and percent renter. Over 20 percent of the SF1 blocks with 10 or fewer occupied housing units do not have a corresponding block-level record in either the MSG or Dunhill file. The blocks with zero percent renter in the SF1 file are missing at a high rate for both MSG and Dunhill, as are the blocks with over 40 percent renters. The missingness also varies considerably by county.
Coverage comparisons at this level indicate that the gross coverage rate for the MSG file is better than that of the Dunhill file. In fact, the MSG file contains more housing units than the SF1 data file. The Dunhill coverage rate is good, but its counts fall short of the SF1 counts.
The tables in the rest of the report are typically given as either percentages or ratios for the numbers in the blocks that are in both the SF1 and the list (MSG or Dunhill). The appendix contains the counts of the data. The counts for the SF1 blocks that are not in the list are also given in these tables. The percentages for all the SF1 blocks can be computed from these tables, although these percentages are not the most informative ones for evaluating the accuracy of the list data.
Table 2. Percent of census blocks not found in lists, by SF1 characteristics
| SF1 block characteristic | MSG missing blocks | Dunhill missing blocks | 
| 
				 | 
				 | 
				 | 
| Total | 7.0% | 8.2% | 
| State | 
				 | 
				 | 
| Maryland | 7.2% | 8.5% | 
| Virginia | 6.6% | 7.5% | 
| 
				 | 
				 | 
				 | 
| County | 
				 | 
				 | 
| Baltimore | 5.5% | 6.0% | 
| Howard | 9.0% | 10.8% | 
| Queen Anne’s | 22.4% | 16.8% | 
| Hanover | 16.1% | 15.0% | 
| Henrico | 4.6% | 3.7% | 
| 
				 | 
				 | 
				 | 
| Block size (occupied units) | 
				 | 
				 | 
| 0<s≤10 | 20.3% | 22.9% | 
| 10<s≤30 | 3.0% | 3.7% | 
| 31 or more | 1.5% | 2.2% | 
| 
				 | 
				 | 
				 | 
| Percent renter | 
				 | 
				 | 
| r=0 | 12.8% | 14.3% | 
| 0<r≤10 | 1.2% | 1.9% | 
| 10<r≤20 | 3.0% | 3.8% | 
| 20<r≤30 | 4.4% | 5.6% | 
| 30<r≤40 | 4.6% | 6.9% | 
| r>40 | 12.6% | 14.1% | 
As noted previously, the MSG file contains a variable with codes ranging from 0 to 9 that could be used to classify the tenure status of the unit. To summarize the data, we computed two binary own/rent variables from these data. The first variable classified units with codes of 0 to 6 as renters and 7 to 9 as owners. The second variable classified units with codes of 0 to 5 as renters and 6 to 9 as owners. Table 3 shows the percent renters for each county using both variables and the SF1 percent renter using all the data from the MSG file prior to matching. Because the second scheme (0 to 5 renters/6 to 9 owners) gives a closer match to the SF1 percent, we use this scheme throughout the report unless otherwise noted.
Table 3. Percent renter using two different coding schemes for the MSG file, by county
| County | SF1 | MSG (0-6/7-9) | MSG (0-5/6-9) | 
| 
				 | 
				 | 
				 | 
				 | 
| Baltimore | 32.5% | 34.3% | 31.8% | 
| Howard | 26.2% | 29.6% | 27.0% | 
| Queen Anne's | 16.6% | 21.8% | 15.1% | 
| Hanover | 15.7% | 17.8% | 13.7% | 
| Henrico | 34.3% | 37.7% | 34.3% | 
NOTE: These percentages use all the MSG data, not just those in the matching blocks.
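The two coding schemes compared in Table 3 can be sketched as a single cutoff applied to the MSG tenure code. This is a minimal illustration only: the function names and the example codes are hypothetical, and the report does not describe how the classification was actually implemented.

```python
def tenure_from_msg_code(code, max_renter_code):
    """Classify an MSG tenure code (0-9) as 'renter' or 'owner'.

    Codes at or below max_renter_code count as renters; the rest as owners.
    """
    if not 0 <= code <= 9:
        raise ValueError(f"MSG tenure code out of range: {code}")
    return "renter" if code <= max_renter_code else "owner"


def percent_renter(codes, max_renter_code):
    """Percent of units classified as renters under the given cutoff."""
    renters = sum(
        1 for c in codes if tenure_from_msg_code(c, max_renter_code) == "renter"
    )
    return 100.0 * renters / len(codes)


# Hypothetical unit-level codes for illustration only (not data from the report).
example_codes = [0, 2, 5, 6, 6, 7, 8, 9, 9, 9]

scheme_1 = percent_renter(example_codes, max_renter_code=6)  # 0-6 renters / 7-9 owners
scheme_2 = percent_renter(example_codes, max_renter_code=5)  # 0-5 renters / 6-9 owners
```

Because code 6 moves from renter to owner under the second scheme, the second scheme can never give a higher percent renter than the first, which matches the pattern in every row of Table 3.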
Table 4 gives the percent renter for the SF1, MSG, and Dunhill files for all records (not just the matching blocks). The MSG percentages use the second scheme (0 to 5 renters/6 to 9 owners). This table shows that the percent renter from the MSG list is closer to the SF1 percent renter than that from the Dunhill list for all counties except Hanover. Even within county, the differences between the percentages are not large. The percent renter from the Dunhill list differs from the SF1 percent by more than 6 percentage points in three of the five counties, while the MSG percent renter never differs from the SF1 percent by more than 2 percentage points. This analysis suggests the MSG list might match more closely than the Dunhill list on this characteristic, but further investigation using the matching blocks is required and is given below.
Table 4. Percent renter for the SF1, MSG, and Dunhill files, by county
| County | SF1 | MSG | Dunhill | 
| 
				 | 
				 | 
				 | 
				 | 
| Baltimore | 32.5% | 31.8% | 26.3% | 
| Howard | 26.2% | 27.0% | 24.0% | 
| Queen Anne’s | 16.6% | 15.1% | 24.2% | 
| Hanover | 15.7% | 13.7% | 17.1% | 
| Henrico | 34.3% | 34.3% | 23.5% | 
NOTE: These percentages use all the MSG data, not just those in the matching blocks. The MSG percent rental uses codes 0 to 5.
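The county-level comparison in Table 4 can be verified with a few lines of arithmetic. The sketch below transcribes the table values and computes the absolute difference, in percentage points, between each list and the SF1; the variable names are illustrative assumptions.

```python
# Percent renter by county, transcribed from Table 4: (SF1, MSG, Dunhill).
TABLE_4 = {
    "Baltimore": (32.5, 31.8, 26.3),
    "Howard": (26.2, 27.0, 24.0),
    "Queen Anne's": (16.6, 15.1, 24.2),
    "Hanover": (15.7, 13.7, 17.1),
    "Henrico": (34.3, 34.3, 23.5),
}


def abs_diff(a, b):
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(abs(a - b), 1)


# For each county: (MSG vs. SF1 difference, Dunhill vs. SF1 difference).
diffs = {
    county: (abs_diff(msg, sf1), abs_diff(dunhill, sf1))
    for county, (sf1, msg, dunhill) in TABLE_4.items()
}

max_msg_diff = max(m for m, _ in diffs.values())
dunhill_over_6 = [c for c, (_, d) in diffs.items() if d > 6.0]
dunhill_closer = [c for c, (m, d) in diffs.items() if d < m]
```

The largest MSG difference is 2.0 points (Hanover), the Dunhill difference exceeds 6 points in Baltimore, Queen Anne's, and Henrico, and Hanover is the only county where Dunhill is closer to the SF1 than MSG.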
The tables below compare the distribution of the renter-occupied housing units in the lists to the SF1 distribution for the matching blocks (the 12,329 MSG blocks that match to the SF1 and the 12,176 Dunhill blocks that match to the SF1). Table 5 gives the percent distribution for the matching MSG blocks categorized by the SF1 percent renter distribution. For example, 51 percent of the blocks that are classified by the SF1 as having no renters (r=0) are also classified into this category in the MSG file. The diagonal elements where the SF1 and the MSG categories are identical are in bold. The agreement for blocks with more than 40 percent renters is 72 percent, the highest diagonal value in the table. However, this is partially due to the categorization scheme. For example, if only two categories were used, then 95 percent of the blocks categorized as 40 percent or less renters would be in the same category using the MSG data.
Table 6 gives the same distribution for the Dunhill file compared to the SF1 data. The distribution for the Dunhill file is similar to that of the MSG file. If only two categories were used to summarize these data, then 97 percent of the blocks categorized as 40 percent or less renters would be in the same category using the Dunhill data. The main difference between the MSG and Dunhill percent distributions is in the extreme categories. For the census blocks with 0 to 10 percent renters, the list from Dunhill has more blocks in the same category than the list from MSG. On the other hand, the list from MSG has a higher percentage for categories with more than 20 percent renters.
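The row-percent cross-tabulations and the two-category collapse described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the category labels as strings, and the example block values are all hypothetical, and the report does not specify how its tabulations were programmed.

```python
from collections import Counter

CATEGORIES = ["r=0", "0<r<=10", "10<r<=20", "20<r<=30", "30<r<=40", "r>40"]


def renter_category(r):
    """Map a block's percent renter (0-100) to one of the six table categories."""
    if r == 0:
        return "r=0"
    for upper, label in zip((10, 20, 30, 40), CATEGORIES[1:5]):
        if r <= upper:
            return label
    return "r>40"


def row_percent_crosstab(pairs):
    """pairs: iterable of (SF1 percent renter, list percent renter) per matched block.

    Returns {sf1_category: {list_category: percent of that SF1 row}}.
    Rows with no blocks are omitted.
    """
    counts = Counter((renter_category(s), renter_category(l)) for s, l in pairs)
    table = {}
    for row in CATEGORIES:
        row_total = sum(counts[(row, col)] for col in CATEGORIES)
        if row_total:
            table[row] = {
                col: 100.0 * counts[(row, col)] / row_total for col in CATEGORIES
            }
    return table


def two_category_agreement(pairs, cutoff=40):
    """Percent of blocks at or below the cutoff in SF1 that the list also places there."""
    low = [(s, l) for s, l in pairs if s <= cutoff]
    return 100.0 * sum(1 for _, l in low if l <= cutoff) / len(low)


# Hypothetical block-level values for illustration only (not data from the report).
blocks = [(0, 0), (0, 5), (8, 9), (15, 35), (45, 60), (50, 30)]
crosstab = row_percent_crosstab(blocks)
agreement = two_category_agreement(blocks)
```

Collapsing to two categories raises the measured agreement because any misclassification among the five categories at or below 40 percent no longer counts as a disagreement.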
Table 5. Percent of blocks in MSG and SF1, by percent renter
| 
				 | MSG | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 51% | 24% | 13% | 4% | 3% | 5% | 100% | 
| 0<r≤10 | 22% | 49% | 18% | 6% | 2% | 2% | 100% | 
| 10<r≤20 | 17% | 30% | 27% | 15% | 6% | 4% | 100% | 
| 20<r≤30 | 15% | 17% | 25% | 22% | 11% | 9% | 100% | 
| 30<r≤40 | 15% | 6% | 21% | 19% | 18% | 21% | 100% | 
| r>40 | 11% | 2% | 4% | 5% | 7% | 72% | 100% | 
| Total | 27% | 28% | 17% | 9% | 5% | 15% | 100% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table 6. Percent of blocks in Dunhill and SF1, by percent renter
| 
				 | Dunhill | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 49% | 28% | 14% | 4% | 2% | 3% | 100% | 
| 0<r≤10 | 19% | 58% | 17% | 3% | 1% | 2% | 100% | 
| 10<r≤20 | 21% | 37% | 29% | 8% | 3% | 3% | 101% | 
| 20<r≤30 | 20% | 24% | 34% | 12% | 5% | 4% | 99% | 
| 30<r≤40 | 20% | 13% | 27% | 19% | 9% | 11% | 99% | 
| r>40 | 12% | 3% | 8% | 9% | 10% | 58% | 100% | 
| Total | 27% | 33% | 19% | 6% | 3% | 11% | 99% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The next tables give the same data broken down by state and county. Table 7 is the percent renter distribution by state for the MSG file and Table 8 is the percent renter distribution by state for the Dunhill file. The distributions are very similar across states for both the MSG and Dunhill files, with no remarkable differences. Table 9 gives the percent renter distribution by county for the MSG file and Table 10 is the percent renter distribution by county for the Dunhill file. As expected, the distributions by renter status from both sources are more variable at the county level than at the state level. Two counties, Queen Anne’s and Hanover, are very different from the other counties. These two counties have much lower diagonal percentages in the over 40 percent renter category for both MSG and Dunhill than the other counties. For example, Queen Anne’s County has only 26 percent in the diagonal for the over 40 percent category for MSG, while the average for this category across all the counties is 72 percent (the corresponding percentage for the Dunhill file is 21 percent, compared to the overall average of 58 percent). We are not aware of any reason for the problems in these two counties, but the result does suggest that a wider range of counties may have to be examined to assess the quality of the lists.
Table 7. Percent of blocks in MSG and SF1, by percent renter and state
| 
				 | MSG | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| MD | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 51% | 25% | 12% | 4% | 3% | 5% | 100% | 
| 0<r≤10 | 22% | 50% | 18% | 6% | 2% | 2% | 100% | 
| 10<r≤20 | 17% | 31% | 27% | 15% | 6% | 4% | 100% | 
| 20<r≤30 | 17% | 17% | 25% | 22% | 11% | 8% | 100% | 
| 30<r≤40 | 17% | 7% | 19% | 18% | 17% | 23% | 101% | 
| r>40 | 11% | 2% | 4% | 5% | 6% | 72% | 100% | 
| Total | 27% | 29% | 17% | 8% | 5% | 15% | 101% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| VA | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 52% | 23% | 13% | 5% | 3% | 4% | 100% | 
| 0<r≤10 | 24% | 45% | 19% | 7% | 3% | 2% | 100% | 
| 10<r≤20 | 17% | 28% | 28% | 15% | 7% | 5% | 100% | 
| 20<r≤30 | 13% | 17% | 26% | 24% | 11% | 9% | 100% | 
| 30<r≤40 | 11% | 5% | 25% | 21% | 20% | 18% | 100% | 
| r>40 | 10% | 2% | 4% | 5% | 8% | 70% | 99% | 
| Total | 27% | 25% | 18% | 10% | 6% | 14% | 100% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table 8. Percent of blocks in Dunhill and SF1, by percent renter and state
| 
				 | Dunhill | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| MD | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 47% | 28% | 15% | 4% | 2% | 4% | 100% | 
| 0<r≤10 | 17% | 60% | 17% | 4% | 1% | 2% | 101% | 
| 10<r≤20 | 18% | 36% | 31% | 8% | 3% | 3% | 99% | 
| 20<r≤30 | 18% | 20% | 36% | 15% | 6% | 5% | 100% | 
| 30<r≤40 | 18% | 11% | 24% | 23% | 12% | 12% | 100% | 
| r>40 | 11% | 2% | 6% | 9% | 10% | 62% | 100% | 
| Total | 24% | 34% | 19% | 7% | 4% | 12% | 100% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| VA | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 53% | 27% | 13% | 3% | 1% | 2% | 99% | 
| 0<r≤10 | 26% | 52% | 16% | 3% | 1% | 1% | 99% | 
| 10<r≤20 | 27% | 37% | 25% | 7% | 1% | 1% | 98% | 
| 20<r≤30 | 23% | 31% | 31% | 8% | 4% | 3% | 100% | 
| 30<r≤40 | 24% | 17% | 33% | 12% | 5% | 9% | 100% | 
| r>40 | 14% | 4% | 11% | 9% | 11% | 50% | 99% | 
| Total | 32% | 32% | 19% | 6% | 3% | 9% | 101% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table 9. Percent of blocks in MSG and SF1, by percent renter and county
| 
				 | MSG | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Baltimore County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 49% | 27% | 13% | 4% | 2% | 4% | 99% | 
| 0<r≤10 | 21% | 49% | 20% | 7% | 2% | 1% | 100% | 
| 10<r≤20 | 16% | 32% | 27% | 16% | 6% | 3% | 100% | 
| 20<r≤30 | 16% | 18% | 25% | 23% | 11% | 7% | 100% | 
| 30<r≤40 | 17% | 9% | 21% | 17% | 16% | 21% | 101% | 
| r>40 | 7% | 2% | 4% | 3% | 6% | 78% | 100% | 
| Total | 25% | 29% | 17% | 9% | 5% | 15% | 100% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Howard County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 53% | 26% | 11% | 3% | 3% | 3% | 99% | 
| 0<r≤10 | 21% | 60% | 12% | 3% | 1% | 3% | 100% | 
| 10<r≤20 | 16% | 33% | 30% | 12% | 4% | 6% | 101% | 
| 20<r≤30 | 13% | 17% | 17% | 20% | 13% | 19% | 99% | 
| 30<r≤40 | 17% | 2% | 8% | 21% | 17% | 35% | 100% | 
| r>40 | 9% | 2% | 2% | 7% | 4% | 76% | 100% | 
| Total | 28% | 35% | 13% | 6% | 4% | 14% | 100% | 
Table 9. Percent of blocks in MSG and SF1, by percent renter and county (Continued)
| 
				 | MSG | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Queen Anne's County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 56% | 11% | 9% | 5% | 6% | 12% | 99% | 
| 0<r≤10 | 27% | 42% | 22% | 3% | 2% | 4% | 100% | 
| 10<r≤20 | 26% | 25% | 21% | 13% | 5% | 10% | 100% | 
| 20<r≤30 | 23% | 11% | 30% | 15% | 11% | 10% | 100% | 
| 30<r≤40 | 18% | 2% | 20% | 22% | 20% | 18% | 100% | 
| r>40 | 39% | 2% | 10% | 15% | 8% | 26% | 100% | 
| Total | 36% | 18% | 17% | 10% | 7% | 12% | 100% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Hanover County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 59% | 25% | 9% | 3% | 1% | 3% | 100% | 
| 0<r≤10 | 30% | 51% | 13% | 3% | 0% | 3% | 100% | 
| 10<r≤20 | 21% | 43% | 20% | 9% | 0% | 6% | 99% | 
| 20<r≤30 | 23% | 27% | 21% | 10% | 5% | 13% | 99% | 
| 30<r≤40 | 32% | 6% | 26% | 24% | 3% | 9% | 100% | 
| r>40 | 20% | 7% | 8% | 5% | 14% | 45% | 99% | 
| Total | 35% | 33% | 14% | 6% | 2% | 9% | 99% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Henrico County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 49% | 22% | 14% | 6% | 4% | 5% | 100% | 
| 0<r≤10 | 22% | 43% | 21% | 9% | 4% | 2% | 101% | 
| 10<r≤20 | 16% | 23% | 30% | 17% | 9% | 4% | 99% | 
| 20<r≤30 | 11% | 15% | 27% | 27% | 12% | 8% | 100% | 
| 30<r≤40 | 7% | 5% | 24% | 21% | 24% | 19% | 100% | 
| r>40 | 8% | 1% | 4% | 5% | 7% | 75% | 100% | 
| Total | 24% | 23% | 19% | 11% | 7% | 16% | 100% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table 10. Percent of blocks in Dunhill and SF1, by percent renter and county
| 
				 | Dunhill | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Baltimore County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 45% | 30% | 16% | 4% | 2% | 4% | 101% | 
| 0<r≤10 | 17% | 60% | 17% | 4% | 1% | 2% | 101% | 
| 10<r≤20 | 16% | 38% | 32% | 9% | 3% | 3% | 101% | 
| 20<r≤30 | 14% | 22% | 39% | 15% | 6% | 4% | 100% | 
| 30<r≤40 | 14% | 12% | 25% | 25% | 12% | 11% | 99% | 
| r>40 | 6% | 2% | 6% | 9% | 10% | 67% | 100% | 
| Total | 22% | 35% | 19% | 7% | 4% | 13% | 100% | 
Table 10. Percent of blocks in Dunhill and SF1, by percent renter and county (Continued)
| 
				 | Dunhill | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Howard County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 46% | 34% | 12% | 3% | 2% | 2% | 99% | 
| 0<r≤10 | 13% | 67% | 14% | 2% | 1% | 2% | 99% | 
| 10<r≤20 | 18% | 38% | 30% | 6% | 2% | 6% | 100% | 
| 20<r≤30 | 12% | 14% | 30% | 20% | 7% | 16% | 99% | 
| 30<r≤40 | 19% | 6% | 21% | 19% | 19% | 17% | 101% | 
| r>40 | 11% | 2% | 7% | 6% | 10% | 64% | 100% | 
| Total | 24% | 40% | 16% | 5% | 4% | 11% | 100% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Queen Anne's County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 61% | 8% | 15% | 6% | 4% | 5% | 99% | 
| 0<r≤10 | 23% | 49% | 19% | 5% | 1% | 2% | 99% | 
| 10<r≤20 | 33% | 25% | 23% | 8% | 6% | 5% | 100% | 
| 20<r≤30 | 44% | 11% | 25% | 11% | 4% | 4% | 99% | 
| 30<r≤40 | 41% | 9% | 18% | 16% | 5% | 11% | 100% | 
| r>40 | 52% | 2% | 10% | 12% | 4% | 21% | 101% | 
| Total | 44% | 19% | 18% | 8% | 4% | 7% | 100% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Hanover County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 57% | 24% | 12% | 3% | 2% | 3% | 101% | 
| 0<r≤10 | 24% | 55% | 13% | 4% | 1% | 3% | 100% | 
| 10<r≤20 | 29% | 39% | 20% | 8% | 2% | 2% | 100% | 
| 20<r≤30 | 26% | 30% | 28% | 8% | 1% | 7% | 100% | 
| 30<r≤40 | 35% | 6% | 35% | 6% | 3% | 13% | 98% | 
| r>40 | 25% | 4% | 19% | 12% | 14% | 26% | 100% | 
| Total | 35% | 34% | 17% | 6% | 3% | 6% | 101% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| Henrico County | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 52% | 28% | 14% | 3% | 1% | 2% | 100% | 
| 0<r≤10 | 27% | 51% | 18% | 3% | 0% | 1% | 100% | 
| 10<r≤20 | 27% | 37% | 27% | 7% | 1% | 1% | 100% | 
| 20<r≤30 | 22% | 32% | 32% | 8% | 5% | 2% | 101% | 
| 30<r≤40 | 22% | 19% | 33% | 13% | 5% | 8% | 100% | 
| r>40 | 12% | 4% | 10% | 9% | 11% | 54% | 100% | 
| Total | 31% | 31% | 20% | 6% | 3% | 10% | 101% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The last set of tables of this nature gives the distributions for the MSG and Dunhill files broken down by the size of the block, measured as the number of occupied housing units computed from the SF1 data. Table 11 shows the distribution for the MSG data. A relatively consistent pattern is that as the block size increases the percentages in the main diagonal increase, with only a few minor departures. In other words, the blocks with larger numbers of occupied units are more accurately classified with the MSG data. The pattern for the Dunhill data given in Table 12 is similar, with one exception. For blocks with no renters in the SF1, the percentages classified as having no renters by the Dunhill data decrease as the block size increases. It is possible that this has something to do with the way the Dunhill data are classified by renter status, but we are uncertain about how this process works.
Table 11. Percent of blocks in MSG and SF1, by percent renter and block size
| 
				 | MSG | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=0<s≤10 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 60% | 8% | 13% | 5% | 5% | 9% | 100% | 
| 0<r≤10 | 38% | 20% | 28% | 4% | 4% | 6% | 100% | 
| 10<r≤20 | 39% | 10% | 19% | 14% | 7% | 10% | 99% | 
| 20<r≤30 | 44% | 11% | 17% | 11% | 7% | 9% | 99% | 
| 30<r≤40 | 33% | 2% | 26% | 12% | 10% | 17% | 100% | 
| r>40 | 35% | 3% | 8% | 8% | 8% | 39% | 101% | 
| Total | 50% | 8% | 15% | 7% | 6% | 15% | 101% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=10<s≤30 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 48% | 32% | 13% | 5% | 2% | 1% | 101% | 
| 0<r≤10 | 32% | 35% | 21% | 7% | 3% | 2% | 100% | 
| 10<r≤20 | 20% | 28% | 29% | 13% | 6% | 4% | 100% | 
| 20<r≤30 | 11% | 21% | 30% | 21% | 9% | 7% | 99% | 
| 30<r≤40 | 6% | 11% | 22% | 26% | 17% | 19% | 101% | 
| r>40 | 3% | 2% | 7% | 11% | 13% | 63% | 99% | 
| Total | 30% | 28% | 20% | 10% | 5% | 8% | 101% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=31 or more | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 30% | 56% | 10% | 2% | 1% | 1% | 100% | 
| 0<r≤10 | 14% | 61% | 16% | 5% | 2% | 2% | 100% | 
| 10<r≤20 | 4% | 40% | 29% | 19% | 5% | 3% | 100% | 
| 20<r≤30 | 3% | 15% | 25% | 32% | 15% | 10% | 100% | 
| 30<r≤40 | 3% | 5% | 13% | 18% | 32% | 30% | 101% | 
| r>40 | 1% | 1% | 2% | 2% | 3% | 91% | 100% | 
| Total | 10% | 40% | 15% | 9% | 4% | 22% | 100% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental. The variable ‘s’ is the number of occupied housing units in the block.
Table 12. Percent of blocks in Dunhill and SF1, by percent renter and block size
| 
				 | Dunhill | 
				 | |||||
| SF1 | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=0<s≤10 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 64% | 8% | 14% | 5% | 3% | 7% | 101% | 
| 0<r≤10 | 46% | 17% | 26% | 6% | 3% | 3% | 101% | 
| 10<r≤20 | 52% | 7% | 17% | 13% | 6% | 6% | 101% | 
| 20<r≤30 | 43% | 9% | 23% | 12% | 8% | 7% | 102% | 
| 30<r≤40 | 42% | 3% | 17% | 13% | 9% | 15% | 99% | 
| r>40 | 38% | 2% | 10% | 8% | 8% | 33% | 99% | 
| Total | 54% | 7% | 15% | 8% | 5% | 12% | 101% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=10<s≤30 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 44% | 35% | 17% | 3% | 1% | 1% | 101% | 
| 0<r≤10 | 33% | 41% | 19% | 5% | 1% | 1% | 100% | 
| 10<r≤20 | 25% | 36% | 26% | 8% | 2% | 2% | 99% | 
| 20<r≤30 | 21% | 29% | 31% | 12% | 4% | 3% | 100% | 
| 30<r≤40 | 15% | 22% | 34% | 14% | 9% | 7% | 101% | 
| r>40 | 8% | 6% | 17% | 15% | 11% | 44% | 101% | 
| Total | 31% | 34% | 21% | 6% | 2% | 5% | 99% | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| BLKSIZE=31 or more | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| r=0 | 17% | 74% | 7% | 1% | 1% | 0% | 100% | 
| 0<r≤10 | 8% | 73% | 15% | 2% | 1% | 2% | 101% | 
| 10<r≤20 | 3% | 50% | 37% | 6% | 2% | 2% | 100% | 
| 20<r≤30 | 3% | 28% | 47% | 14% | 4% | 5% | 101% | 
| 30<r≤40 | 1% | 11% | 30% | 36% | 11% | 11% | 100% | 
| r>40 | 1% | 2% | 3% | 7% | 11% | 76% | 100% | 
| Total | 6% | 49% | 18% | 6% | 4% | 18% | 101% | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental. The variable ‘s’ is the number of occupied housing units in the block.
In addition to these cross-tabulations, we computed some ratios at the block level to provide a more complete description of the accuracy of the lists. The three ratios computed for each list were: (1) the ratio of the number of occupied housing units in the list to the number of occupied housing units in the SF1; (2) the ratio of the number of rented units in the list to the number of rented units in the SF1; and (3) the ratio of the number of owned units in the list to the number of owned units in the SF1. One important note concerning these tables is that the summaries of the ratios given below are computed at the block level. Thus, each block is treated equally, irrespective of the number of housing units in the block.
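The block-level ratios and their unweighted summaries can be sketched as below. This is an illustration under stated assumptions: the data layout, the function names, and the example blocks are hypothetical, and skipping blocks with a zero SF1 denominator is an assumption, since the report does not say how such blocks were handled.

```python
import statistics


def block_ratios(blocks, key):
    """Ratio of list count to SF1 count for each block.

    blocks: list of dicts with 'sf1' and 'list' sub-dicts of unit counts.
    Blocks whose SF1 count is zero are skipped (an assumption; the report
    does not state how a zero denominator was treated).
    """
    return [b["list"][key] / b["sf1"][key] for b in blocks if b["sf1"][key] > 0]


def summarize(ratios):
    """Unweighted block-level summary: every block counts once."""
    return {
        "mean": statistics.mean(ratios),
        "stdev": statistics.stdev(ratios),
        "median": statistics.median(ratios),
    }


# Hypothetical blocks for illustration only (not data from the report).
blocks = [
    {"sf1": {"occupied": 10, "rented": 2, "owned": 8},
     "list": {"occupied": 10, "rented": 1, "owned": 9}},
    {"sf1": {"occupied": 40, "rented": 20, "owned": 20},
     "list": {"occupied": 30, "rented": 10, "owned": 20}},
    {"sf1": {"occupied": 4, "rented": 0, "owned": 4},
     "list": {"occupied": 8, "rented": 2, "owned": 6}},
]

occupied = summarize(block_ratios(blocks, "occupied"))
rented = block_ratios(blocks, "rented")

# For contrast: a unit-weighted (pooled) ratio, which the tables do NOT use.
pooled = sum(b["list"]["occupied"] for b in blocks) / sum(
    b["sf1"]["occupied"] for b in blocks
)
```

In this example the mean of the block-level ratios exceeds one while the pooled ratio is below one, which shows how treating every block equally lets small blocks with large ratios pull the mean upward.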
Table 13 gives the ratios of the number of occupied housing units from the lists to the SF1 overall and by state and county. The means of the ratios for MSG and Dunhill are both greater than unity, indicating overcoverage. This finding is consistent with the results in Table 1 (see footnote 7). As noted earlier, the main source of variation in the ratios for both the MSG and Dunhill files is the county rather than the state.
More interesting than the means are the distributions of the ratios for the two lists. The medians from the MSG file are always close to unity, while those for the Dunhill file are always less than unity. Looking at all the percentiles, the distribution of the MSG ratios is shifted upward compared to the Dunhill distribution, often by about 0.10 to 0.25. The standard deviation is also an interesting measure of the stability of the ratios. The standard deviation of the ratio for Dunhill is larger than for MSG, but this appears to be largely a function of the data for Henrico County in Virginia. In this county, the Dunhill distribution is severely shifted downward and yet there are some large ratios; for the rented-unit ratios in Table 14, the median ratio is 0.47, the 90th percentile ratio is 2.0, and the standard deviation is 12.6. This causes the Dunhill ratios to be less stable for the aggregates at the Virginia and overall levels.
Table 14 gives the ratios of the number of rented housing units from the lists to the SF1 overall and by state and county. Table 15 gives the corresponding ratios of the number of owned units. The ratios for the rented units in Table 14 are not as stable as those for the owned units in Table 15 because the denominator (the number of rented units in the SF1 file) may be small and this will cause the ratio to be unstable. Nonetheless, the pattern in Table 14 for rented units is consistent with that observed in Table 13 for all occupied units. The Dunhill file has severe problems for Henrico county. However, this pattern does not persist for the ratios of the number of owned units given in Table 15. These ratios are relatively stable and the Dunhill ratios are closer to the MSG ratios. In particular, the ratios for Henrico county from the Dunhill file are very similar to the MSG ratios and are not out of line with the ratios for other counties. Thus, the discrepancy in the Dunhill file is due to the differences in both the number of all occupied and the number of rented units in Henrico county, not the number of owned units in the county.
Table 13. Ratios of the number of occupied housing units in a block for MSG and Dunhill to SF1, by geography
| 
				 | 
				 | 
				 | Stand. | Percentiles | ||||
| Geography | List | Mean | dev. | 90 | 75 | 50 | 25 | 10 | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| All | MSG | 1.31 | 2.79 | 1.82 | 1.20 | 1.00 | 0.84 | 0.56 | 
| 
				 | Dunhill | 1.17 | 5.55 | 1.55 | 1.03 | 0.88 | 0.69 | 0.47 | 
| MD | MSG | 1.33 | 3.18 | 1.83 | 1.20 | 1.00 | 0.84 | 0.56 | 
| 
				 | Dunhill | 1.16 | 2.74 | 1.60 | 1.05 | 0.88 | 0.70 | 0.49 | 
| VA | MSG | 1.26 | 1.70 | 1.75 | 1.20 | 1.00 | 0.85 | 0.57 | 
| 
				 | Dunhill | 1.19 | 8.93 | 1.50 | 1.00 | 0.86 | 0.67 | 0.46 | 
| Baltimore County | MSG | 1.26 | 2.22 | 1.67 | 1.17 | 1.00 | 0.86 | 0.63 | 
| 
				 | Dunhill | 1.11 | 2.21 | 1.46 | 1.00 | 0.88 | 0.72 | 0.50 | 
| Howard County | MSG | 1.60 | 6.17 | 2.22 | 1.36 | 1.04 | 0.86 | 0.56 | 
| 
				 | Dunhill | 1.36 | 4.68 | 1.90 | 1.19 | 0.92 | 0.74 | 0.50 | 
| Queen Anne's County | MSG | 1.43 | 2.74 | 2.33 | 1.42 | 0.93 | 0.50 | 0.26 | 
| 
				 | Dunhill | 1.31 | 2.42 | 2.00 | 1.26 | 0.80 | 0.50 | 0.25 | 
| Hanover County | MSG | 1.49 | 2.64 | 2.50 | 1.50 | 1.00 | 0.71 | 0.40 | 
| 
				 | Dunhill | 1.21 | 1.78 | 2.00 | 1.25 | 0.87 | 0.60 | 0.36 | 
| Henrico County | MSG | 1.18 | 1.28 | 1.58 | 1.14 | 1.00 | 0.88 | 0.64 | 
| 
				 | Dunhill | 1.19 | 10.15 | 1.29 | 1.00 | 0.85 | 0.70 | 0.49 | 
Table 14. Ratios of the number of rented housing units in a block for MSG and Dunhill to SF1, by geography
| 
				 | 
				 | 
				 | Stand. | Percentiles | |||||
| Geography | List | Mean | dev. | 90 | 75 | 50 | 25 | 10 | |
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | |
| All | MSG | 1.57 | 5.92 | 3.00 | 1.50 | 0.92 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.48 | 8.61 | 2.50 | 1.00 | 0.60 | 0.00 | 0.00 | |
| MD | MSG | 1.60 | 6.36 | 3.00 | 1.57 | 0.92 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.58 | 6.90 | 3.00 | 1.22 | 0.67 | 0.09 | 0.00 | |
| VA | MSG | 1.50 | 4.85 | 3.00 | 1.50 | 0.93 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.26 | 11.40 | 2.00 | 1.00 | 0.48 | 0.00 | 0.00 | |
| Baltimore County | MSG | 1.62 | 6.79 | 3.00 | 1.58 | 0.93 | 0.04 | 0.00 | |
| 
				 | Dunhill | 1.59 | 7.39 | 3.00 | 1.20 | 0.67 | 0.20 | 0.00 | |
| Howard County | MSG | 1.86 | 6.00 | 4.00 | 1.94 | 1.00 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.78 | 5.34 | 3.27 | 1.67 | 0.91 | 0.17 | 0.00 | |
| Queen Anne's County | MSG | 1.03 | 1.85 | 2.00 | 1.00 | 0.50 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.17 | 4.53 | 2.00 | 1.00 | 0.25 | 0.00 | 0.00 | |
| Hanover County | MSG | 1.15 | 2.79 | 2.08 | 1.00 | 0.50 | 0.00 | 0.00 | |
| 
				 | Dunhill | 1.27 | 5.69 | 2.00 | 1.00 | 0.50 | 0.00 | 0.00 | |
| Henrico County | MSG | 1.61 | 5.32 | 3.00 | 1.67 | 1.00 | 0.17 | 0.00 | |
| 
				 | Dunhill | 1.25 | 12.63 | 2.00 | 1.00 | 0.47 | 0.00 | 0.00 | |
Table 15. Ratios of the number of owned housing units in a block for MSG and Dunhill to SF1, by geography
| 
				 | 
				 | 
				 | Stand. | Percentiles | ||||
| Geography | List | Mean | dev. | 90 | 75 | 50 | 25 | 10 | 
| 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
				 | 
| All | MSG | 1.50 | 3.85 | 2.00 | 1.26 | 1.00 | 0.82 | 0.52 | 
| 
				 | Dunhill | 1.54 | 3.25 | 2.00 | 1.16 | 0.92 | 0.75 | 0.50 | 
| MD | MSG | 1.53 | 3.80 | 2.14 | 1.27 | 1.00 | 0.83 | 0.53 | 
| 
				 | Dunhill | 1.59 | 3.43 | 2.00 | 1.16 | 0.92 | 0.75 | 0.50 | 
| VA | MSG | 1.43 | 3.95 | 2.00 | 1.25 | 1.00 | 0.81 | 0.52 | 
| 
				 | Dunhill | 1.43 | 2.81 | 2.00 | 1.17 | 0.92 | 0.75 | 0.50 | 
| Baltimore County | MSG | 1.54 | 4.14 | 2.00 | 1.23 | 1.00 | 0.85 | 0.60 | 
| 
				 | Dunhill | 1.65 | 3.71 | 2.00 | 1.14 | 0.92 | 0.77 | 0.55 | 
| Howard County | MSG | 1.55 | 2.53 | 2.50 | 1.40 | 1.03 | 0.84 | 0.57 | 
| 
				 | Dunhill | 1.45 | 2.53 | 2.19 | 1.22 | 0.94 | 0.76 | 0.50 | 
| Queen Anne's County | MSG | 1.42 | 2.44 | 2.50 | 1.47 | 1.00 | 0.50 | 0.24 | 
| 
				 | Dunhill | 1.29 | 2.02 | 2.20 | 1.30 | 0.87 | 0.50 | 0.25 | 
| Hanover County | MSG | 1.63 | 3.98 | 2.67 | 1.52 | 1.00 | 0.73 | 0.37 | 
| 
				 | Dunhill | 1.29 | 2.57 | 2.00 | 1.27 | 0.90 | 0.66 | 0.36 | 
| Henrico County | MSG | 1.37 | 3.94 | 1.79 | 1.20 | 1.00 | 0.83 | 0.57 | 
| 
				 | Dunhill | 1.47 | 2.88 | 1.93 | 1.13 | 0.93 | 0.77 | 0.57 | 
Overall, the analysis suggests that the MSG data correspond more closely to the SF1 data than the Dunhill data when the data are aggregated to the block-level. The variability of the data with respect to the SF1 data by county raises some concerns about whether the findings can be generalized to other states and counties not included in this analysis.
The other source of data that can be used to evaluate the data from the lists is the block-level summary data from the CPI Housing Survey as provided by BLS. As we noted earlier, the main limitation associated with using these data is that they are sample data and do not cover many of the blocks in the areas and even those included are only covered for a sample of housing units.
We attempt to deal with the small sample size in two ways. First, instead of examining the more complete distribution of the percent of renters as done in the previous section, we classify each block into either a 40 percent or less renter category or a more than 40 percent renter category. Second, we create three groups of blocks depending on the number of sampled housing units that were found in the CPI sample. The three groups are fewer than 5 sampled occupied units, 5 to 9 sampled occupied units, and 10 or more sampled occupied units. Any categorization of a block based on a small sample size is obviously tenuous, so the small and medium categories are not very informative for this analysis. Unfortunately, most of the blocks fall into these less informative categories. There are 147 blocks with fewer than 5 sampled units, 72 with between 5 and 9 sampled units, and only 61 with 10 or more sampled units.
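The two groupings described above can be sketched as simple threshold functions. The function names and group labels are illustrative assumptions; the block counts are the ones reported in the paragraph.

```python
def renter_group(percent_renter):
    """Two-way split for the CPI comparison: 40 percent or less vs. more than 40."""
    return "r<=40" if percent_renter <= 40 else "r>40"


def sample_size_group(sampled_units):
    """Group a block by its number of sampled occupied units in the CPI."""
    if sampled_units < 5:
        return "ss<5"
    if sampled_units < 10:
        return "5<=ss<10"
    return "ss>=10"


# Block counts by group as reported in the paragraph above.
reported_counts = {"ss<5": 147, "5<=ss<10": 72, "ss>=10": 61}
total_blocks = sum(reported_counts.values())
share_large = round(100.0 * reported_counts["ss>=10"] / total_blocks, 1)

print(f"{total_blocks} blocks in total; {share_large}% have 10 or more sampled units")
```

Only about one block in five falls into the largest sample-size group, which is why the analysis that follows is restricted to coarse aggregates.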
An even more troublesome problem for the analysis is the fact that there are very few rental units in the blocks with 10 or more sampled units in the CPI. Table 16 shows the number of blocks cross-classified by the percent renter in the CPI and the SF1, by CPI sample size. This table only counts the blocks in the CPI and the SF1 that match. The appendix contains tables that include the five blocks that did not match when merged with the SF1 file.
Table 16. Number of blocks in CPI and SF1, by percent renter and CPI sample size
| 
				 | CPI | 
				 | |
| SF1 | 0≤r≤40 | r>40 | Total | 
| 
				 | 
				 | 
				 | 
				 | 
| All blocks | 
				 | 
				 | 
				 | 
| 0≤r≤40 | 200 | 7 | 207 | 
| r>40 | 21 | 52 | 73 | 
| 
				 | 
				 | 
				 | 
				 | 
| ss<5 | 
				 | 
				 | 
				 | 
| 0≤r≤40 | 87 | 6 | 93 | 
| r>40 | 16 | 38 | 54 | 
| 
				 | 
				 | 
				 | 
				 | 
| 5≤ss<10 | 
				 | 
				 | 
				 | 
| 0<r40 | 55 | 1 | 56 | 
| r>40 | 2 | 14 | 16 | 
| 
				 | 
				 | 
				 | 
				 | 
| ss>9 | 
				 | 
				 | 
				 | 
| 0<r40 | 58 | 0 | 58 | 
| r>40 | 3 | 0 | 3 | 
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Because of the sample size limitations, we restrict our analysis to overall aggregates by the CPI block sample size. The appendix has more complete tables of the counts by state, county, and block size; for those interested in the details of the matching of the CPI and SF1 files, the appendix tables also include the missing data. Table 17 shows the percent distribution of the blocks by the percent renter for the CPI and the SF1. As with the earlier data, when coarse categories are used, the percentage agreement is high.
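As a rough illustration of how a row of counts is turned into a row-percent distribution, the following Python sketch can be used. The function name and the rounding to one decimal place are assumptions; the example inputs are the first-row counts of the "All blocks" and 5-to-9 panels of Table 16.

```python
def row_percentages(counts, ndigits=1):
    """Convert one row of block counts into percentages of the row total."""
    total = sum(counts)
    if total == 0:
        raise ValueError("row has no blocks")
    return [round(100.0 * c / total, ndigits) for c in counts]


# First row of the "All blocks" panel of Table 16: 200 and 7 blocks.
all_blocks_row = row_percentages([200, 7])
# First row of the 5-to-9 sampled units panel of Table 16: 55 and 1 blocks.
medium_blocks_row = row_percentages([55, 1])
```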
Tables 18 and 19 have the same format as Table 17, but instead of the SF1 data the MSG and Dunhill data are tabulated. The percentages in the main diagonal cells are high for these tables, but not quite as large as they are in Table 17 using the SF1 file data. The main difference is that both lists are less accurate for blocks that are identified as having more than 40 percent renters in the CPI. However, this difference cannot be given much credence if we assume the SF1 data are more accurate than the CPI data.
Table 17. Percent of blocks in CPI and SF1, by percent renter and CPI sample size
| SF1 (rows) / CPI (columns) | 0<r≤40 | r>40 | Total |
| All blocks | | | |
| 0<r≤40 | 96.6% | 3.4% | 100.0% |
| r>40 | 29.5% | 70.5% | 100.0% |
| ss<5 | | | |
| 0<r≤40 | 93.5% | 6.5% | 100.0% |
| r>40 | 31.0% | 69.0% | 100.0% |
| 5≤ss<10 | | | |
| 0<r≤40 | 98.2% | 1.8% | 100.0% |
| r>40 | 11.8% | 88.2% | 100.0% |
| ss>9 | | | |
| 0<r≤40 | 100.0% | 0% | 100.0% |
| r>40 | 100.0% | 0% | 100.0% |
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Table 18. Percent of blocks in CPI and MSG, by percent renter and CPI sample size
| MSG (rows) / CPI (columns) | 0<r≤40 | r>40 | Total |
| All blocks | | | |
| 0<r≤40 | 94.4% | 5.6% | 100.0% |
| r>40 | 42.2% | 57.8% | 100.0% |
| ss<5 | | | |
| 0<r≤40 | 90.0% | 10.0% | 100.0% |
| r>40 | 38.6% | 61.4% | 100.0% |
| 5≤ss<10 | | | |
| 0<r≤40 | 96.0% | 4.0% | 100.0% |
| r>40 | 40.9% | 59.1% | 100.0% |
| ss>9 | | | |
| 0<r≤40 | 100.0% | 0% | 100.0% |
| r>40 | 100.0% | 0% | 100.0% |
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Table 19. Percent of blocks in CPI and Dunhill, by percent renter and CPI sample size
| Dunhill (rows) / CPI (columns) | 0<r≤40 | r>40 | Total |
| All blocks | | | |
| 0<r≤40 | 91.5% | 8.5% | 100.0% |
| r>40 | 39.7% | 60.3% | 100.0% |
| ss<5 | | | |
| 0<r≤40 | 88.7% | 11.3% | 100.0% |
| r>40 | 34.0% | 66.0% | 100.0% |
| 5≤ss<10 | | | |
| 0<r≤40 | 87.9% | 12.1% | 100.0% |
| r>40 | 42.9% | 57.1% | 100.0% |
| ss>9 | | | |
| 0<r≤40 | 100.0% | 0% | 100.0% |
| r>40 | 100.0% | 0% | 100.0% |
NOTES: Table may not add to 100 percent due to rounding.
The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Lists of housing units by census block are available from vendors, and these lists might be used to enhance or replace the in-person listing and screening process currently employed in the CPI Housing Survey. To evaluate the quality of the lists, we purchased lists from MSG and Dunhill for five counties in the Baltimore and Richmond Metropolitan Statistical Areas. The costs of obtaining the lists were $15,000 for the MSG list and $11,000 for the Dunhill list.
This report compares the data from the lists with the corresponding block-level data from the 2000 Decennial Census as contained in the SF1 data file. Since the CPI Housing Survey is especially interested in blocks with 40 percent or less renters, the comparisons focused on the distributions of the percent renters. In addition, BLS provided us with block-level data from the CPI Housing Survey in the five counties and these data are also summarized in this report.
The MSG file contained 557,559 housing unit records from the five specified counties, and 63 percent of these units have a telephone number available. The Dunhill file contained 471,868 records from the five specified counties, and 58 percent of these housing units have a telephone number available.
The first issue considered in the report was the coverage of the lists. After matching the lists to the SF1 file, the MSG list had either about the same number of housing units as the SF1 file or more for each of the five counties, while the list from Dunhill had fewer housing units than the MSG list and the SF1 file in each county. These comparisons suggest that the gross coverage rate for the MSG file is better than that of the Dunhill file, but the Dunhill coverage rate is still relatively good.
The second evaluation was of the accuracy of the percent renter classifications from the two vendor files as compared to the SF1 data. The Dunhill list differed by more than 6 percentage points from the SF1 percent in three of the five counties, while the MSG list never differed from the SF1 percent by more than 2 percentage points when aggregates were examined. The analysis was then restricted to the matching blocks and again the MSG list appeared to be more accurate. The largest differences between the MSG and Dunhill percent renter distributions were for the Census blocks with 0 to 10 percent renter, where the list from Dunhill was more accurate, and for categories with more than 20 percent renters where the MSG list was more accurate.
The percent renter distributions by state and county were also examined. While the distributions were very similar by state, the county distributions were rather odd. Two counties had very low match rates as compared to other counties. This finding could indicate a local component to the quality of the lists.
Further analysis of the ratios of the number of housing units and rented units for the two lists also produced an unusual finding at the county level. The list from Dunhill for Henrico county in Virginia did not track with the SF1 data. This difference caused some of the summaries of aggregates at higher levels such as the state level to be poorer for the Dunhill list.
The analysis of the percent renter distribution indicated that the MSG data correspond more closely to the SF1 data than the Dunhill data do when the data are aggregated to the block level. However, the variability of the data by county raises some concerns about whether the findings can be generalized to other states and counties not included in this analysis.
The last analysis used the CPI Housing Survey data. The small sample size severely limited the ability to use these data to evaluate the vendor lists. We attempted to deal with the small sample size by using coarser percentage renter distributions and aggregating only to groups based on the sample size in the CPI blocks. Despite these efforts, no definitive conclusions can be drawn from these data with respect to the quality of the lists of the vendors.
The next task in the project involves selecting a sample of households from the lists and comparing the list data for each housing unit to data collected from the sampled housing units. An important goal of this task is to examine the accuracy of tenure data from the lists at the housing unit level. The report from that task will greatly supplement the information from this analysis.
Appendix A
Detailed Tables
Table A-1. Number of blocks in MSG and SF1 when codes 0 to 5 are treated as renters, by percent renter
| SF1 (rows) / MSG (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| r=0 | 1,724 | 825 | 425 | 152 | 97 | 155 | 495 | 3,873 |
| 0<r≤10 | 783 | 1,724 | 650 | 215 | 79 | 67 | 44 | 3,562 |
| 10<r≤20 | 370 | 644 | 582 | 332 | 127 | 93 | 66 | 2,214 |
| 20<r≤30 | 148 | 163 | 244 | 215 | 103 | 84 | 44 | 1,001 |
| 30<r≤40 | 80 | 34 | 114 | 104 | 98 | 113 | 26 | 569 |
| r>40 | 189 | 34 | 76 | 90 | 119 | 1,277 | 258 | 2,043 |
| Total | 3,294 | 3,424 | 2,091 | 1,108 | 623 | 1,789 | 933 | 13,262 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table A-2. Number of blocks in MSG and SF1 when codes 0 to 6 are treated as renters, by percent renter
| SF1 (rows) / MSG (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| r=0 | 1,307 | 838 | 657 | 206 | 151 | 219 | 495 | 3,873 |
| 0<r≤10 | 400 | 1,500 | 1,021 | 363 | 145 | 89 | 44 | 3,562 |
| 10<r≤20 | 217 | 391 | 727 | 461 | 207 | 145 | 66 | 2,214 |
| 20<r≤30 | 95 | 71 | 218 | 262 | 179 | 132 | 44 | 1,001 |
| 30<r≤40 | 47 | 20 | 85 | 110 | 126 | 155 | 26 | 569 |
| r>40 | 138 | 19 | 66 | 78 | 110 | 1,374 | 258 | 2,043 |
| Total | 2,204 | 2,839 | 2,774 | 1,480 | 918 | 2,114 | 933 | 13,262 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table A-3. Number of blocks in MSG and SF1, by state and percent renter
| SF1 (rows) / MSG (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| MD | | | | | | | | |
| r=0 | 1,158 | 574 | 282 | 98 | 63 | 109 | 347 | 2,631 |
| 0<r≤10 | 545 | 1,274 | 464 | 146 | 53 | 43 | 32 | 2,557 |
| 10<r≤20 | 243 | 435 | 373 | 217 | 78 | 57 | 50 | 1,453 |
| 20<r≤30 | 97 | 98 | 145 | 125 | 63 | 49 | 30 | 607 |
| 30<r≤40 | 57 | 23 | 64 | 61 | 57 | 77 | 22 | 361 |
| r>40 | 132 | 21 | 52 | 61 | 75 | 893 | 172 | 1,406 |
| Total | 2,232 | 2,425 | 1,380 | 708 | 389 | 1,228 | 653 | 9,015 |
| VA | | | | | | | | |
| r=0 | 566 | 251 | 143 | 54 | 34 | 46 | 148 | 1,242 |
| 0<r≤10 | 238 | 450 | 186 | 69 | 26 | 24 | 12 | 1,005 |
| 10<r≤20 | 127 | 209 | 209 | 115 | 49 | 36 | 16 | 761 |
| 20<r≤30 | 51 | 65 | 99 | 90 | 40 | 35 | 14 | 394 |
| 30<r≤40 | 23 | 11 | 50 | 43 | 41 | 36 | 4 | 208 |
| r>40 | 57 | 13 | 24 | 29 | 44 | 384 | 86 | 637 |
| Total | 1,062 | 999 | 711 | 400 | 234 | 561 | 280 | 4,247 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Percent number based on codes 0 to 5.
Table A-4. Number of blocks in MSG and SF1, by county and percent renter
| SF1 (rows) / MSG (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| Baltimore County | | | | | | | | |
| r=0 | 815 | 445 | 217 | 72 | 36 | 67 | 201 | 1,853 |
| 0<r≤10 | 411 | 944 | 376 | 128 | 44 | 23 | 17 | 1,943 |
| 10<r≤20 | 175 | 341 | 290 | 176 | 63 | 31 | 27 | 1,103 |
| 20<r≤30 | 69 | 77 | 108 | 99 | 45 | 28 | 13 | 439 |
| 30<r≤40 | 40 | 21 | 50 | 40 | 39 | 51 | 9 | 250 |
| r>40 | 72 | 16 | 37 | 33 | 59 | 756 | 97 | 1,070 |
| Total | 1,582 | 1,844 | 1,078 | 548 | 286 | 956 | 364 | 6,658 |
| Howard County | | | | | | | | |
| r=0 | 206 | 101 | 42 | 13 | 12 | 12 | 60 | 446 |
| 0<r≤10 | 93 | 266 | 54 | 14 | 6 | 14 | 10 | 457 |
| 10<r≤20 | 25 | 52 | 48 | 19 | 6 | 10 | 12 | 172 |
| 20<r≤30 | 9 | 12 | 12 | 14 | 9 | 13 | 6 | 75 |
| 30<r≤40 | 8 | 1 | 4 | 10 | 8 | 17 | 4 | 52 |
| r>40 | 12 | 3 | 3 | 10 | 6 | 105 | 32 | 171 |
| Total | 353 | 435 | 163 | 80 | 47 | 171 | 124 | 1,373 |
| Queen Anne’s County | | | | | | | | |
| r=0 | 137 | 28 | 23 | 13 | 15 | 30 | 86 | 332 |
| 0<r≤10 | 41 | 64 | 34 | 4 | 3 | 6 | 5 | 157 |
| 10<r≤20 | 43 | 42 | 35 | 22 | 9 | 16 | 11 | 178 |
| 20<r≤30 | 19 | 9 | 25 | 12 | 9 | 8 | 11 | 93 |
| 30<r≤40 | 9 | 1 | 10 | 11 | 10 | 9 | 9 | 59 |
| r>40 | 48 | 2 | 12 | 18 | 10 | 32 | 43 | 165 |
| Total | 297 | 146 | 139 | 80 | 56 | 101 | 165 | 984 |
| Hanover County | | | | | | | | |
| r=0 | 159 | 66 | 25 | 8 | 3 | 8 | 96 | 365 |
| 0<r≤10 | 82 | 138 | 36 | 7 | | 7 | 7 | 277 |
| 10<r≤20 | 38 | 76 | 36 | 16 | | 11 | 9 | 186 |
| 20<r≤30 | 18 | 21 | 16 | 8 | 4 | 10 | 8 | 85 |
| 30<r≤40 | 11 | 2 | 9 | 8 | 1 | 3 | 1 | 35 |
| r>40 | 20 | 7 | 8 | 5 | 14 | 45 | 42 | 141 |
| Total | 328 | 310 | 130 | 52 | 22 | 84 | 163 | 1,089 |
| Henrico County | | | | | | | | |
| r=0 | 407 | 185 | 118 | 46 | 31 | 38 | 52 | 877 |
| 0<r≤10 | 156 | 312 | 150 | 62 | 26 | 17 | 5 | 728 |
| 10<r≤20 | 89 | 133 | 173 | 99 | 49 | 25 | 7 | 575 |
| 20<r≤30 | 33 | 44 | 83 | 82 | 36 | 25 | 6 | 309 |
| 30<r≤40 | 12 | 9 | 41 | 35 | 40 | 33 | 3 | 173 |
| r>40 | 37 | 6 | 16 | 24 | 30 | 339 | 44 | 496 |
| Total | 734 | 689 | 581 | 348 | 212 | 477 | 117 | 3,158 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Percent number based on codes 0 to 5.
Table A-5. Number of blocks in MSG and SF1, by block size and percent renter
| SF1 (rows) / MSG (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| BLKSIZE=0<s10 | | | | | | | | |
| r=0 | 878 | 121 | 193 | 67 | 68 | 130 | 438 | 1,895 |
| 0<r≤10 | 27 | 14 | 20 | 3 | 3 | 4 | 7 | 78 |
| 10<r≤20 | 146 | 38 | 71 | 50 | 27 | 38 | 40 | 410 |
| 20<r≤30 | 92 | 23 | 35 | 23 | 15 | 19 | 30 | 237 |
| 30<r≤40 | 62 | 4 | 49 | 23 | 19 | 32 | 22 | 211 |
| r>40 | 168 | 13 | 36 | 36 | 40 | 185 | 171 | 649 |
| Total | 1,373 | 213 | 404 | 202 | 172 | 408 | 708 | 3,480 |
| BLKSIZE=10<s30 | | | | | | | | |
| r=0 | 728 | 483 | 191 | 77 | 27 | 20 | 50 | 1,576 |
| 0<r≤10 | 488 | 524 | 315 | 107 | 45 | 32 | 33 | 1,544 |
| 10<r≤20 | 187 | 267 | 271 | 125 | 55 | 33 | 23 | 961 |
| 20<r≤30 | 48 | 94 | 133 | 93 | 41 | 33 | 12 | 454 |
| 30<r≤40 | 14 | 24 | 48 | 57 | 37 | 42 | 4 | 226 |
| r>40 | 12 | 8 | 24 | 37 | 46 | 219 | 34 | 380 |
| Total | 1,477 | 1,400 | 982 | 496 | 251 | 379 | 156 | 5,141 |
| BLKSIZE=30 or more | | | | | | | | |
| r=0 | 118 | 221 | 41 | 8 | 2 | 5 | 7 | 402 |
| 0<r≤10 | 268 | 1,186 | 315 | 105 | 31 | 31 | 4 | 1,940 |
| 10<r≤20 | 37 | 339 | 240 | 157 | 45 | 22 | 3 | 843 |
| 20<r≤30 | 8 | 46 | 76 | 99 | 47 | 32 | 2 | 310 |
| 30<r≤40 | 4 | 6 | 17 | 24 | 42 | 39 | | 132 |
| r>40 | 9 | 13 | 16 | 17 | 33 | 873 | 53 | 1,014 |
| Total | 444 | 1,811 | 705 | 410 | 200 | 1,002 | 69 | 4,641 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental. The variable ‘s’ is the number of occupied housing units in the block.
Percent number based on codes 0 to 5.
Table A-6. Number of blocks in Dunhill and SF1, by percent renter
| SF1 (rows) / Dunhill (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| r=0 | 1,632 | 925 | 479 | 117 | 54 | 114 | 486 | 3,873 |
| 0<r≤10 | 680 | 2,027 | 589 | 117 | 25 | 58 | 6 | 3,562 |
| 10<r≤20 | 452 | 779 | 617 | 169 | 54 | 58 | 50 | 2,214 |
| 20<r≤30 | 185 | 230 | 324 | 118 | 47 | 41 | 37 | 1,001 |
| 30<r≤40 | 108 | 69 | 145 | 101 | 50 | 57 | 32 | 569 |
| r>40 | 213 | 47 | 137 | 156 | 179 | 1,023 | 185 | 2,043 |
| Total | 3,270 | 4,077 | 2,291 | 778 | 409 | 1,351 | 796 | 13,262 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table A-7. Number of blocks in Dunhill and SF1, by state and percent renter
| SF1 (rows) / Dunhill (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| MD | | | | | | | | |
| r=0 | 1,053 | 632 | 335 | 83 | 42 | 88 | 398 | 2,631 |
| 0<r≤10 | 421 | 1,514 | 427 | 88 | 19 | 45 | 43 | 2,557 |
| 10<r≤20 | 248 | 503 | 428 | 117 | 43 | 48 | 66 | 1,453 |
| 20<r≤30 | 100 | 112 | 206 | 87 | 32 | 31 | 39 | 607 |
| 30<r≤40 | 60 | 36 | 79 | 77 | 40 | 40 | 29 | 361 |
| r>40 | 135 | 26 | 75 | 106 | 117 | 755 | 192 | 1,406 |
| Total | 2,017 | 2,823 | 1,550 | 558 | 293 | 1,007 | 767 | 9,015 |
| VA | | | | | | | | |
| r=0 | 579 | 293 | 144 | 34 | 12 | 26 | 154 | 1,242 |
| 0<r≤10 | 259 | 513 | 162 | 29 | 6 | 13 | 23 | 1,005 |
| 10<r≤20 | 204 | 276 | 189 | 52 | 11 | 10 | 19 | 761 |
| 20<r≤30 | 85 | 118 | 118 | 31 | 15 | 10 | 17 | 394 |
| 30<r≤40 | 48 | 33 | 66 | 24 | 10 | 17 | 10 | 208 |
| r>40 | 78 | 21 | 62 | 50 | 62 | 268 | 96 | 637 |
| Total | 1,253 | 1,254 | 741 | 220 | 116 | 344 | 319 | 4,247 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table A-8. Number of blocks in Dunhill and SF1, by county and percent renter
| SF1 (rows) / Dunhill (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| Baltimore County | | | | | | | | |
| r=0 | 743 | 487 | 255 | 58 | 26 | 67 | 217 | 1,853 |
| 0<r≤10 | 329 | 1,146 | 335 | 69 | 11 | 32 | 21 | 1,943 |
| 10<r≤20 | 168 | 404 | 344 | 96 | 31 | 30 | 30 | 1,103 |
| 20<r≤30 | 57 | 93 | 165 | 64 | 24 | 17 | 19 | 439 |
| 30<r≤40 | 33 | 29 | 61 | 61 | 29 | 27 | 10 | 250 |
| r>40 | 62 | 21 | 54 | 85 | 100 | 646 | 102 | 1,070 |
| Total | 1,392 | 2,180 | 1,214 | 433 | 221 | 819 | 399 | 6,658 |
| Howard County | | | | | | | | |
| r=0 | 173 | 126 | 46 | 12 | 7 | 9 | 73 | 446 |
| 0<r≤10 | 58 | 294 | 63 | 11 | 6 | 10 | 15 | 457 |
| 10<r≤20 | 28 | 60 | 48 | 9 | 3 | 10 | 14 | 172 |
| 20<r≤30 | 8 | 10 | 21 | 14 | 5 | 11 | 6 | 75 |
| 30<r≤40 | 9 | 3 | 10 | 9 | 9 | 8 | 4 | 52 |
| r>40 | 15 | 3 | 10 | 8 | 13 | 86 | 36 | 171 |
| Total | 291 | 496 | 198 | 63 | 43 | 134 | 148 | 1,373 |
| Queen Anne’s County | | | | | | | | |
| r=0 | 137 | 19 | 34 | 13 | 9 | 12 | 108 | 332 |
| 0<r≤10 | 34 | 74 | 29 | 8 | 2 | 3 | 7 | 157 |
| 10<r≤20 | 52 | 39 | 36 | 12 | 9 | 8 | 22 | 178 |
| 20<r≤30 | 35 | 9 | 20 | 9 | 3 | 3 | 14 | 93 |
| 30<r≤40 | 18 | 4 | 8 | 7 | 2 | 5 | 15 | 59 |
| r>40 | 58 | 2 | 11 | 13 | 4 | 23 | 54 | 165 |
| Total | 334 | 147 | 138 | 62 | 29 | 54 | 220 | 984 |
| Hanover County | | | | | | | | |
| r=0 | 154 | 64 | 32 | 7 | 5 | 8 | 95 | 365 |
| 0<r≤10 | 65 | 147 | 34 | 11 | 3 | 9 | 8 | 277 |
| 10<r≤20 | 50 | 69 | 35 | 14 | 3 | 4 | 11 | 186 |
| 20<r≤30 | 19 | 22 | 21 | 6 | 1 | 5 | 11 | 85 |
| 30<r≤40 | 11 | 2 | 11 | 2 | 1 | 4 | 4 | 35 |
| r>40 | 24 | 4 | 18 | 11 | 13 | 25 | 46 | 141 |
| Total | 323 | 308 | 151 | 51 | 26 | 55 | 175 | 1,089 |
| Henrico County | | | | | | | | |
| r=0 | 425 | 229 | 112 | 27 | 7 | 18 | 59 | 877 |
| 0<r≤10 | 194 | 366 | 128 | 18 | 3 | 4 | 15 | 728 |
| 10<r≤20 | 154 | 207 | 154 | 38 | 8 | 6 | 8 | 575 |
| 20<r≤30 | 66 | 96 | 97 | 25 | 14 | 5 | 6 | 309 |
| 30<r≤40 | 37 | 31 | 55 | 22 | 9 | 13 | 6 | 173 |
| r>40 | 54 | 17 | 44 | 39 | 49 | 243 | 50 | 496 |
| Total | 930 | 946 | 590 | 169 | 90 | 289 | 144 | 3,158 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
Table A-9. Number of blocks in Dunhill and SF1, by block size and percent renter
| SF1 (rows) / Dunhill (columns) | r=0 | 0<r≤10 | 10<r≤20 | 20<r≤30 | 30<r≤40 | r>40 | Missing | Total |
| BLKSIZE=0<s10 | | | | | | | | |
| r=0 | 901 | 106 | 192 | 73 | 41 | 96 | 486 | 1,895 |
| 0<r≤10 | 33 | 12 | 19 | 4 | 2 | 2 | 6 | 78 |
| 10<r≤20 | 186 | 24 | 62 | 46 | 21 | 21 | 50 | 410 |
| 20<r≤30 | 86 | 18 | 45 | 23 | 15 | 13 | 37 | 237 |
| 30<r≤40 | 75 | 5 | 31 | 24 | 17 | 27 | 32 | 211 |
| r>40 | 178 | 10 | 48 | 38 | 38 | 152 | 185 | 649 |
| Total | 1,459 | 175 | 397 | 208 | 134 | 311 | 796 | 3,480 |
| BLKSIZE=10<s30 | | | | | | | | |
| r=0 | 666 | 527 | 260 | 40 | 10 | 17 | 56 | 1,576 |
| 0<r≤10 | 498 | 615 | 288 | 73 | 11 | 19 | 40 | 1,544 |
| 10<r≤20 | 237 | 339 | 244 | 73 | 18 | 20 | 30 | 961 |
| 20<r≤30 | 91 | 128 | 135 | 53 | 19 | 14 | 14 | 454 |
| 30<r≤40 | 32 | 49 | 75 | 30 | 19 | 15 | 6 | 226 |
| r>40 | 27 | 19 | 58 | 49 | 36 | 148 | 43 | 380 |
| Total | 1,551 | 1,677 | 1,060 | 318 | 113 | 233 | 189 | 5,141 |
| BLKSIZE=30 or more | | | | | | | | |
| r=0 | 65 | 292 | 27 | 4 | 3 | 1 | 10 | 402 |
| 0<r≤10 | 149 | 1,400 | 282 | 40 | 12 | 37 | 20 | 1,940 |
| 10<r≤20 | 29 | 416 | 311 | 50 | 15 | 17 | 5 | 843 |
| 20<r≤30 | 8 | 84 | 144 | 42 | 13 | 14 | 5 | 310 |
| 30<r≤40 | 1 | 15 | 39 | 47 | 14 | 15 | 1 | 132 |
| r>40 | 8 | 18 | 31 | 69 | 105 | 723 | 60 | 1,014 |
| Total | 260 | 2,225 | 834 | 252 | 162 | 807 | 101 | 4,641 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental. The variable ‘s’ is the number of occupied housing units in the block.
Table A-10. Number of blocks in CPI and SF1, by percent renter
| | CPI 0<r≤40, SF1 0<r≤40 | CPI 0<r≤40, SF1 r>40 | CPI 0<r≤40, SF1 missing | CPI r>40, SF1 0<r≤40 | CPI r>40, SF1 r>40 | CPI r>40, SF1 missing | Total |
| Total | 200 | 21 | 2 | 7 | 52 | 3 | 285 |
| MD | 104 | 14 | 1 | 1 | 30 | 2 | 152 |
| VA | 96 | 7 | 1 | 6 | 22 | 1 | 133 |
| ss=1 | 87 | 16 | 2 | 6 | 38 | 2 | 151 |
| ss=2 | 55 | 2 | | 1 | 14 | 1 | 73 |
| ss=3 | 58 | 3 | | | | | 61 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Table A-11. Number of blocks in CPI and MSG, by percent renter
| | CPI 0<r≤40, MSG 0<r≤40 | CPI 0<r≤40, MSG r>40 | CPI 0<r≤40, MSG missing | CPI r>40, MSG 0<r≤40 | CPI r>40, MSG r>40 | CPI r>40, MSG missing | Total |
| Total | 186 | 35 | 2 | 11 | 48 | 3 | 285 |
| MD | 95 | 23 | 1 | 4 | 27 | 2 | 152 |
| VA | 91 | 12 | 1 | 7 | 21 | 1 | 133 |
| ss=1 | 81 | 22 | 2 | 9 | 35 | 2 | 151 |
| ss=2 | 48 | 9 | | 2 | 13 | 1 | 73 |
| ss=3 | 57 | 4 | | | | | 61 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Table A-12. Number of blocks in CPI and Dunhill, by percent renter
| | CPI 0<r≤40, Dunhill 0<r≤40 | CPI 0<r≤40, Dunhill r>40 | CPI 0<r≤40, Dunhill missing | CPI r>40, Dunhill 0<r≤40 | CPI r>40, Dunhill r>40 | CPI r>40, Dunhill missing | Total |
| Total | 194 | 27 | 2 | 18 | 41 | 3 | 285 |
| MD | 101 | 17 | 1 | 6 | 25 | 2 | 152 |
| VA | 93 | 10 | 1 | 12 | 16 | 1 | 133 |
| ss=1 | 86 | 17 | 2 | 11 | 33 | 2 | 151 |
| ss=2 | 51 | 6 | | 7 | 8 | 1 | 73 |
| ss=3 | 57 | 4 | | | | | 61 |
NOTES: The variable ‘r’ is the percentage of the occupied housing units in the block that are rental.
The variable ‘ss’ is the number of housing units in the CPI sample.
Evaluation of Vendor Lists:
Comparison of Vendor and Interview Data
Final Report
Submitted to:
Bureau of Labor Statistics
2 Massachusetts Ave, NE
Washington D.C 20212
Submitted by:
Westat
1650 Research Blvd.
Rockville, MD 20850
February 26, 2003
Table of Contents
Section
1 Introduction
2 Methods
2.1 Sample Design
2.2 Questionnaire
Interviewer Training
Data Collection
Weighting and Estimation of Variances
Interviewing Results
3 Results
3.1 Quality of Contact Information
3.2 Quality of Tenure Status
4 Discussion
List of Appendices
A Supplementary Tables
B Telephone Questionnaire
C In-Person Questionnaire
D Training Agenda
E Advance Letter
List of Tables
1 Universe Totals for Sample Frame
2 Final Result by Mode of Interview
3 Final Result by Match Between Vendors
4a Final Result by MSG Tenure Status
4b Final Result by Collapsed MSG Tenure Status
5 Final Result by Dunhill Tenure Status
6 Final Result by Percent Renters on Block
7 MSG Tenure Status by Survey Report of Tenure Status
8 MSG Tenure Status by Survey Report of Tenure Status for the Two Sample Groups
9 Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for the Two Sample Groups
10 MSG Tenure Status by Survey Report of Tenure Status for Mode Groups
11 Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Mode Groups
12 MSG Tenure Status by Survey Report of Tenure Status for Percent Renters on Block
13 Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Percent Renters on Block
14 MSG Tenure Status by Survey Report of Tenure Status for Each County
15 Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County
The Office of Prices and Living Conditions of the Bureau of Labor Statistics (BLS) is exploring the use of purchased lists of addresses to enhance or replace the in-person listing and screening processes used to identify renters in the Consumer Price Index (CPI) Housing Survey. If a list can be used to identify renters, it is anticipated that there could be some cost savings associated with the CPI survey. Perhaps more importantly, it may provide a way to enhance the survey’s ability to identify renters in high-owner areas, which has been a difficult problem for the CPI in the past.
To conduct the research, we purchased two lists of housing units for selected counties in the Baltimore and Richmond Metropolitan Statistical Areas. The lists were purchased from Marketing Systems Group (MSG) and Dunhill International. The MSG list cost $15,000 and the Dunhill list cost $11,000. Both the MSG and Dunhill lists use U.S. Postal delivery addresses as the base and append additional data from other sources. The prime data source for the MSG list is Info USA, and Dunhill uses Knowledgebase.
A previous report provided information on the accuracy of these two lists at a block level (Westat, 2002). The purpose of this report is to describe an evaluation of each list at an individual housing unit level. The next section provides an overview of the methods used to collect the data. The third section describes the results of the evaluation and the final section summarizes the results.
This project focused on evaluating the quality of information on tenure status provided by the MSG and Dunhill lists (hereafter referred to as “tenure status”). This variable is being considered for use in sampling for the CPI Housing Survey. There is interest in using the variable to improve the efficiency of identifying renters in high-owner neighborhoods. To conduct the evaluation, a sample of housing units was drawn in selected Maryland and Virginia counties. These states were chosen because they were relatively close to Westat, which allowed for some cost savings when conducting the interviews. In Maryland, the counties were Baltimore County, Howard County and Queen Anne’s County. In Virginia, the counties were Hanover County and Henrico County.
The sample was selected by dividing the households into those with and those without a telephone number, as indicated by the vendor for each unit. Samples were drawn from each of these groups with the intent of having field interviewers make attempts to contact and collect information on the ownership status of the household. The field interviewers were instructed to conduct a telephone interview for those units that had a telephone number and conduct an in-person interview with those where no telephone number was provided.
A short, 10-item questionnaire was administered to each household. The primary focus of the interview was to assess tenure status. These data were then used to assess the quality of the information provided by the vendor files.
The first step in sampling for the project was to create a sampling frame for selecting households. The sampling frame was created from the data files from the vendors (MSG and Dunhill) for the two selected MSAs, but was further restricted as described below. The coverage of the frame was limited in the following ways: (1) only blocks where the Census reported more than 0 and no more than 40 percent renters were included (blocks with 0 renters were excluded); (2) only households in the MSG file were included (if a household was identified only from the Dunhill file, it was excluded); (3) only housing units in the MSG file with a tenure value from ‘0’ to ‘8’ were included (households with a tenure value of ‘9’ on the MSG file, which are most likely to be owned units, were excluded); (4) housing units with no telephone number listed were included only if they were in block groups with a large enough number of housing units. In addition, households with no telephone number listed in Queen Anne’s County in Maryland were excluded.
These restrictions were imposed to efficiently estimate the reliability of the vendor files in blocks that had a high proportion of owners. In addition, these restrictions kept the costs of the data collection to an acceptable level.
The initial step of frame construction was matching the data files from Dunhill and MSG at the housing unit level. This was done using matching software. The matching specifications required both units to be in the same census block (thus, state, county, tract, and block had to be identical). Matching then took place using address, name and telephone number from the two vendor files. Priority in the statistical matching was given to units whose address in both files matched exactly. Records with similar addresses and the same telephone or name were also considered a match.
From this matching, three files were created corresponding to records that appeared on both MSG and Dunhill (M&D), MSG only (M-o), and Dunhill only (D-o). As noted above, units from the D-o file were eliminated from the sampling frame, because the project team was unsure of the quality of the information on the Dunhill file at the time of sampling. Table 1 provides the number of households in each of the groups (M&D; M-o; D-o). The M&D and the M-o files were concatenated at this point.
The next step was to delete housing units from the frame when the MSG variable indicating ownership status had the highest possible value of ‘9’ (indicating a high degree of certainty that the unit was owned). The reason for this restriction was that previous analysis by BLS found that the ‘9’ value was highly reliable, in the sense that it matched closely with actual data from the CPI survey. The second row in Table 1 provides the universe totals after taking out records where the MSG tenure variable had a value of ‘9’. Eliminating these units substantially changes the composition of the universe. Over half (65%) of the records are eliminated from the universe. The cases deleted are disproportionately in the group where the two vendor files match one another.
The next step was to stratify the housing units by the presence of a telephone number (although we sometimes refer to these as telephone and nontelephone strata, the stratification is based on whether the vendor file contained a telephone number for the household rather than whether the household actually had a telephone). The third and fourth rows of Table 1 give the universe counts in the two strata. The last step in creating the sampling frame was to exclude from the nontelephone stratum those housing units that were either in Queen Anne’s County or in block groups with 11 or fewer households. The next to last row in Table 1 provides the count for the restricted nontelephone stratum, and the last row gives the final universe totals overall (across both telephone and nontelephone strata).
The telephone stratum was further partitioned by county and tenure status (owned or rented). If the MSG tenure value was less than 6, the household was classified as a renter, and if it was greater than 5, it was classified as an owner. In effect, the telephone stratum was divided into 10 strata (the five counties crossed by the two tenure statuses). The sample was then allocated to each stratum with the goal of obtaining 600 completed telephone interviews after accounting for response rates and eligibility rates. The sample size allocated was the same for each tenure status in a county (e.g., in Baltimore 243 cases were sampled from the owned stratum and 243 cases were sampled from the rented stratum). The total sample size selected by county was: 486 in Baltimore, 202 in Howard, 250 in Queen Anne’s, 142 in Hanover, and 420 in Henrico. Within the 10 strata, the households were selected with simple random sampling. Originally, a subsample of 500 of the total 1,500 sampled cases was selected to be a reserve sample. Later, due to lower than anticipated yields, the entire telephone reserve sample was released.
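The classification rule and the equal allocation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dictionaries with `county` and `msg_tenure` keys) and the function names are hypothetical, and the report does not say what software was actually used to draw the sample.

```python
import random

def classify_tenure(msg_tenure):
    """Sampling classification from the text: MSG values 0-5 are renters, 6-8 owners."""
    if not 0 <= msg_tenure <= 8:
        raise ValueError("the frame only contains MSG tenure values 0 through 8")
    return "rent" if msg_tenure < 6 else "own"

def stratified_srs(frame, per_stratum_n, seed=0):
    """Simple random sample without replacement within each (county, tenure) stratum.

    frame: list of dicts with hypothetical keys 'county' and 'msg_tenure'.
    per_stratum_n: dict mapping county -> sample size for EACH tenure stratum.
    """
    rng = random.Random(seed)
    strata = {}
    for household in frame:
        key = (household["county"], classify_tenure(household["msg_tenure"]))
        strata.setdefault(key, []).append(household)
    sample = {}
    for (county, tenure), units in strata.items():
        n = min(per_stratum_n[county], len(units))  # take all if the stratum is small
        sample[(county, tenure)] = rng.sample(units, n)
    return sample

# County totals reported in the text; each is split evenly between the two tenure strata.
county_totals = {"Baltimore": 486, "Howard": 202, "Queen Anne's": 250,
                 "Hanover": 142, "Henrico": 420}
per_stratum = {county: total // 2 for county, total in county_totals.items()}
```

With these inputs, `per_stratum["Baltimore"]` is 243, matching the example in the text, and the county totals sum to the 1,500 telephone cases.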
The design of the nontelephone sample differed from the simple stratified design of the telephone stratum in order to reduce data collection costs. A two-stage sample design was used, where 16 block groups were selected from Maryland (across Baltimore and Howard counties) and 10 block groups were selected from Virginia (Hanover and Henrico counties). Within each state, the block groups were sorted by county and then selected by systematic sampling with probabilities proportional to the number of households in the block group.
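A minimal sketch of systematic selection with probability proportional to size is shown below. It assumes the block groups have already been sorted by county and that no block group is larger than the sampling interval (otherwise a unit could be hit more than once). The function name and arguments are hypothetical, not taken from the project's software.

```python
import random

def pps_systematic(units, sizes, n, seed=0):
    """Select n units by systematic sampling with probability proportional to size.

    units: block group identifiers, already sorted (e.g., by county).
    sizes: household counts for each block group, in the same order.
    """
    total = float(sum(sizes))
    interval = total / n                      # sampling interval on the size scale
    next_point = random.Random(seed).random() * interval  # random start in [0, interval)
    selected = []
    cumulative = 0.0
    for unit, size in zip(units, sizes):
        cumulative += size
        # A unit is selected each time a selection point falls in its size range.
        while next_point < cumulative and len(selected) < n:
            selected.append(unit)
            next_point += interval
    return selected
```

Sorting by county before selection spreads the sample across counties roughly in proportion to their household counts, which is the usual reason for sorting ahead of a systematic draw.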
In the second stage of sampling, households were sampled from the sampled block groups. As in the telephone sample, households were first classified as owned or rented, and an equal size sample of 8 owned households and 8 rented households was selected from each block group by simple random sampling. If the block group had fewer than 8 owned (or rented) households, then all of the units in that tenure were included in the sample. Overall, 401 households were sampled (15 fewer than would have been selected if each block group had enough units of the appropriate tenure). The final step was to assign about one-third of the cases to the reserve sample. To do this, the sample cases were sorted by state, block group, and tenure status, and an equal probability, systematic sample was selected. Of the 401 cases, only 268 were released to the field; the remaining 133 were reserve cases that were never released.
As noted above, the initial sample released for interviewing was 1,268 households, 1,000 from the telephone stratum and 268 without a telephone number. Midway through the field period the lower yield rates required releasing the additional 500 cases with telephone numbers to achieve the targeted number of 800 completed interviews. Thus, the final sample size was 1,768 households.
Based on questions proposed by BLS, Westat developed two questionnaires – one for the telephone cases (Appendix B) and one for the in-person cases (Appendix C). They differ in that the telephone version verifies the respondent’s address, and also asks if the address is for a business. After identifying a respondent who lives in the home and is at least 18 years old, the survey asks whether the home is rented or owned, and how much the rent or mortgage is.
Seven field interviewers and one field supervisor were trained. Interviewers received 10 hours of training: 4 hours of home study and 6 hours in person. In-person training was held October 14, 2002 at Westat’s Rockville offices. Roughly one week before that date, interviewers were sent a general interviewer training manual that describes Westat interviewing procedures. Interviewers were asked to read this manual and complete test questions about the material. The in-person session consisted of lecture and role plays. Topics included an overview of the project and purpose of the survey; administering the questionnaires; contact procedures; and administrative procedures. The training agenda appears in Appendix D. The field supervisor attended the same training session as the field interviewers, but also met with project staff separately to discuss supervisory tasks and responsibilities.
Data collection began October 15, 2002. The field period was originally scheduled to end November 26, 2002. However, data collection was extended to December 31, 2002 in order to ensure that the second sample release received sufficient contact attempts.
The field supervisor held weekly one-on-one telephone conversations with each interviewer to review outstanding cases, discuss any unusual situations the interviewer may have encountered, and advise the interviewer of any procedural updates. In turn, the field supervisor participated in weekly conference calls with Westat project staff to report case status and any outstanding interviewer issues.
All sampled persons were sent an advance letter (Appendix E) on BLS letterhead. The letter notified them that the study would be taking place, and that they would be contacted by an interviewer for a very brief interview. All sampled persons who completed an interview were sent a thank you letter and brief self-administered survey, the purpose of which was to validate that they had completed the interview as reported by the field interviewer.
About halfway through data collection, it became evident that the field interviewers were having difficulty with the telephone cases. Therefore, on November 11, 2002, the Westat Telephone Research Center (TRC) began calling the telephone cases which field interviewers had been unable to contact. In addition, the reserve sample consisted of telephone cases only, and those were assigned directly to the TRC. Calls to reserve sample households began on November 25, 2002.
For the field cases, we sent a short survey to all households that completed an interview, asking them to verify that a field interviewer had contacted them to ask questions about their ownership status and mortgage or rental amount. For a sample of 10 percent of the telephone cases, the field supervisor placed a follow up call to the household if the survey was not returned.
2.5 Weighting and Estimation of Variances
The tables in this report contain estimates from the survey. The estimates are based on the weighted counts of the number of households with different characteristics, where the weights account for sampling from the frame and contain an adjustment for nonresponse. The first step of weighting was to produce a baseweight that is the inverse of the probability of selecting the household from the frame. Note that these base weights do not contain any adjustments for households that were eliminated from the sampling frame as described above.
In the 10 telephone strata (defined above as the cross of county and tenure status), the households were selected by simple random sampling. Thus, the inverse of the probability of selection for every household within a county by tenure status is equal to the number of units in the sampling frame in that stratum divided by the number sampled. The weight can be written as
$$w_{t,hi} = \frac{N_{t,hi}}{n_{t,hi}}$$
where $N_{t,hi}$ is the number of households in the sampling frame in the telephone stratum for county h and tenure status i, and $n_{t,hi}$ is the number of sampled households in the telephone stratum for county h and tenure status i.
In the nontelephone stratum, the baseweight is the product of the inverses of the selection probabilities at the two stages of sampling. The probability of selecting a block group in a state is proportional to $S_j/S_\cdot$, where $S_j$ is the number of households in block group j and $S_\cdot$ is the sum of the $S_j$ over all the nontelephone households in the sampling frame. At the second stage, the probability of selecting a household in tenure status k in sampled block group j is $8/T_{jk}$, where $T_{jk}$ is the number of households in block group j that are in tenure status k. Thus, the baseweight is the product of the inverse of these two terms and can be written as
$$w_{n,jk} = \frac{S_\cdot}{p\,S_j} \cdot \frac{T_{jk}}{8}$$
where p is the number of block groups selected in the state. Since the reserve sample was not released for field work, the final baseweight uses the number of households released in tenure status k in place of the fixed number 8 presented above.
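The two baseweight calculations can be illustrated with a short sketch. The function and argument names are hypothetical, and the numbers in the usage note are made up purely to show the arithmetic; they are not values from the study.

```python
def telephone_base_weight(frame_count, sample_count):
    """Inverse of the selection probability under simple random sampling in a stratum."""
    return frame_count / sample_count

def nontelephone_base_weight(s_total, s_j, p, t_jk, n_released):
    """Inverse of the product of the two stage-wise selection probabilities.

    s_total:    households over the nontelephone frame being sampled from
    s_j:        households in block group j
    p:          number of block groups selected in the state
    t_jk:       households in block group j with tenure status k
    n_released: households released in that block group and tenure status
                (8 in the nominal design, fewer once the reserve is held back)
    """
    prob_block_group = p * s_j / s_total
    prob_household = n_released / t_jk
    return 1.0 / (prob_block_group * prob_household)
```

For example, with the hypothetical values `s_total=20000`, `s_j=500`, `p=16`, `t_jk=120`, and `n_released=8`, the block group is selected with probability 0.4 and the household with probability 8/120, so the baseweight is 37.5.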
If responses had been obtained for all sampled households, estimates using the baseweights would be unbiased. However, a nonresponse adjustment to the baseweights was used to compensate because some households did not respond. All the sampled and released households were divided into three categories (respondents, nonrespondents and ineligibles) based on their participation in the survey. These three categories are described below.
Category 1: Respondents. This group consists of all eligible sample units that participated in the survey.
Category 2: Nonrespondents. This group consists of all eligible sample units that did not provide substantially complete and usable survey data.
Category 3: Ineligibles. This group consists of all sample units that were ineligible or out of scope for the survey.
To reduce the bias of the estimates, the nonresponse adjustments were computed for households within adjustment classes or cells that were relatively homogeneous with respect to response rates. The cells were formed by examining the response rates for several characteristics that were available from the sampling frame. The table below defines the non-response adjustment cells.
| Cell | Mode | State | MSG Tenure Value | A(nr) | 
| 1 | Telephone number | MD | 0,1,2,3 | 1.70186 | 
| 2 | Telephone number | MD | 4,5,6 | 2.21435 | 
| 3 | Telephone number | MD | 7,8 | 1.63997 | 
| 4 | Telephone number | VA | 0,1,2,3 | 1.37690 | 
| 5 | Telephone number | VA | 4,5,6 | 1.56302 | 
| 6 | Telephone number | VA | 7,8 | 1.45393 | 
| 7 | No Telephone number | MD | ALL VALUES | 1.20481 | 
| 8 | No Telephone number | VA | ALL VALUES | 1.13723 | 
Within each of the 8 cells, the baseweight was multiplied by a nonresponse adjustment factor. The nonresponse adjustment factor is the ratio of the sum of the baseweights for respondents and nonrespondents to the sum of the baseweights for the respondents. The nonresponse adjustment factor for cell c can be written as
$$A_c^{(nr)} = \frac{\sum_{i \in R_c} w_i + \sum_{i \in N_c} w_i}{\sum_{i \in R_c} w_i}$$
where $w_i$ is the baseweight for household i (omitting the subscripts), $R_c$ is the set of responding households in cell c, and $N_c$ is the set of nonresponding households in cell c. The factors are given in the table above. The nonresponse adjusted weight for household i ($w_i^{(nr)}$) is
$$w_i^{(nr)} = \begin{cases} A_c^{(nr)}\, w_i & \text{if } i \in c \text{ and } i \text{ is a responding household,} \\ 0 & \text{otherwise.} \end{cases}$$
These nonresponse adjusted weights are used in the report.
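The cell-level adjustment can be sketched as follows. The record layout and function name are hypothetical. The sketch also assumes that nonrespondents and ineligible units both receive a final weight of zero, which is one reading of the description above rather than a documented detail.

```python
from collections import defaultdict

def nonresponse_adjust(households):
    """Compute cell-level nonresponse factors and adjusted weights.

    households: list of dicts with hypothetical keys 'cell', 'base_weight', and
    'status' (one of 'respondent', 'nonrespondent', 'ineligible').
    Returns (factors by cell, adjusted weights in input order).
    """
    resp_sum = defaultdict(float)
    nonresp_sum = defaultdict(float)
    for hh in households:
        if hh["status"] == "respondent":
            resp_sum[hh["cell"]] += hh["base_weight"]
        elif hh["status"] == "nonrespondent":
            nonresp_sum[hh["cell"]] += hh["base_weight"]
    # Factor = (respondent + nonrespondent baseweights) / respondent baseweights.
    factors = {cell: (resp_sum[cell] + nonresp_sum[cell]) / resp_sum[cell]
               for cell in resp_sum}
    adjusted = [
        hh["base_weight"] * factors[hh["cell"]] if hh["status"] == "respondent" else 0.0
        for hh in households
    ]
    return factors, adjusted
```

Within a cell, the adjusted weights of the respondents sum to the baseweights of all eligible units, so respondents stand in for the nonrespondents in the same cell.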
In order to compute the precision of the estimates, weights that can be used with replication variance software were also created. A total of 126 replicate weights were created using the following procedure. A total of 12 variance strata were created: 10 for the telephone strata (defined by county and tenure status) and 2 for the nontelephone strata (defined by state). Within each of the 10 telephone strata, the sampled households were sorted in the order of selection and systematically assigned to variance units labelled 1 to 10 (thus there were 10 variance units in each of the 10 variance strata for the telephone strata). In the nontelephone strata, the households in the same primary sampling unit (the block groups) were assigned to the same variance unit (thus there were 16 variance units in Maryland and 10 in Virginia). Consequently, every sampled household was assigned to one of 12 variance strata and, within its stratum, to one of 10 variance units (for the telephone strata) or to one of 16 or 10 variance units (for the nontelephone strata in Maryland and Virginia, 26 in total).
Using this structure, the replicate weights were created using a stratified jackknife procedure in a standard fashion. Within a variance stratum, the replicate baseweight was created by deleting the baseweight for all households in the same variance unit (by making it zero) and increasing the baseweight for the other households in the same variance stratum. The weights for the other strata are not altered. Since there were 10 variance strata with 10 variance units in the telephone strata, this results in 100 replicate weights. The remaining 26 replicate weights are associated with the 16 and 10 variance units in the nontelephone strata. For example, consider the first replicate weight, which is associated with the first telephone stratum and its first variance unit. Replicate weight 1 is set to zero for all households in this first telephone stratum that are in variance unit 1. Replicate weight 1 for households from the same telephone stratum but a different variance unit is the household’s baseweight times 10/9 (the number of variance units in the stratum divided by one less than this number). Replicate weight 1 for all other households that are not in the first variance stratum is equal to their baseweight. The process for the other replicate weights follows the same procedure.
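The delete-one-unit construction described above can be sketched as follows. The record layout (`stratum`, `unit`, `base_weight`) and the function name are hypothetical. The sketch follows the verbal description rather than the internals of any particular variance package.

```python
def jackknife_replicate_weights(households):
    """Build stratified delete-one-unit jackknife replicate baseweights.

    households: list of dicts with hypothetical keys 'stratum', 'unit', 'base_weight'.
    Returns (replicates, weights): replicates[r] is the (stratum, unit) deleted in
    replicate r, and weights[r][k] is the replicate-r weight of household k.
    """
    units_by_stratum = {}
    for hh in households:
        units_by_stratum.setdefault(hh["stratum"], set()).add(hh["unit"])
    replicates = [(s, u) for s in sorted(units_by_stratum)
                  for u in sorted(units_by_stratum[s])]
    weights = []
    for stratum, unit in replicates:
        n_units = len(units_by_stratum[stratum])
        factor = n_units / (n_units - 1)   # e.g., 10/9 in a telephone stratum
        row = []
        for hh in households:
            if hh["stratum"] != stratum:
                row.append(hh["base_weight"])           # other strata are unchanged
            elif hh["unit"] == unit:
                row.append(0.0)                         # deleted variance unit
            else:
                row.append(hh["base_weight"] * factor)  # rest of the stratum scaled up
        weights.append(row)
    return replicates, weights
```

With 10 telephone strata of 10 variance units each, plus 16 and 10 variance units in the two nontelephone strata, this yields 100 + 26 = 126 replicates, in line with the count given in the text.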
Since the weights were adjusted for nonresponse, the same nonresponse adjustment method was used to create replicate nonresponse adjusted weights. Essentially, the nonresponse adjustment factors $A_c^{(nr)}$ were first computed for each of the 126 replicate weights and then these adjustments were multiplied by the corresponding base replicate weights to produce the nonresponse adjusted replicate weights.
The precision of the estimates was then computed using these replicate weights. We used WesVar (version 4) and the JKn method (this corresponds to the stratified replication method described above). The variance of an estimate using this replication method can be written as
$$v(\hat{\theta}) = \sum_{h} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h} \left(\hat{\theta}_{(hi)} - \hat{\theta}\right)^2$$
where $\hat{\theta}_{(hi)}$ is the replicate estimate for stratum h when variance unit i is deleted (computed using the corresponding replicate nonresponse adjusted weight), $n_h$ is the number of variance units in stratum h, and $\hat{\theta}$ is the estimate using the full sample nonresponse adjusted weight.
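As a rough illustration of the stratified delete-one jackknife (JKn) variance, the sketch below sums the squared deviations of the replicate estimates from the full-sample estimate, scaling each by (n_h - 1)/n_h for its variance stratum. The function and argument names are hypothetical. This is a generic implementation of the standard JKn formula, not WesVar code.

```python
def jkn_variance(full_estimate, replicate_estimates, units_per_stratum):
    """Stratified delete-one jackknife (JKn) variance estimate.

    replicate_estimates: dict mapping (stratum, unit) -> estimate computed with the
                         replicate weight that deletes that variance unit.
    units_per_stratum:   dict mapping stratum -> number of variance units n_h.
    """
    variance = 0.0
    for (stratum, _unit), estimate in replicate_estimates.items():
        n_h = units_per_stratum[stratum]
        variance += (n_h - 1) / n_h * (estimate - full_estimate) ** 2
    return variance
```

A quick sanity check: for a single stratum of four equally weighted observations 1, 2, 3, 4, the delete-one means are 3, 8/3, 7/3, and 2, and the formula returns 5/12, which equals the familiar s²/n for the sample mean.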
Table 2 shows final result codes for the total sample of 1,768 and by mode of interview. If one counts “wrong or nonexistent address” as a complete, then the overall response rate is 69.7%, with 86.1% in person and 66.2% on the telephone. These rates exclude businesses and non-working numbers from the denominator. If one excludes the bad addresses from both the numerator and denominator, the overall response rate is 64.0%, with 59.2% on the telephone and 84.9% for the in-person component.
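The two response-rate definitions can be checked against the counts in Table 2 with a short calculation. The dictionary keys and function name below are hypothetical, and the in-person "NA" entry for non-working numbers is treated as zero, which is an assumption.

```python
# Counts taken from Table 2 (Final Result by Mode of Interview).
TABLE_2 = {
    "telephone": {"complete": 629, "business": 32, "nonworking": 186,
                  "wrong_address": 220, "total": 1500},
    "in_person": {"complete": 208, "business": 1, "nonworking": 0,
                  "wrong_address": 22, "total": 268},
}

def response_rates(counts):
    """Return (high, low) response rates as proportions.

    High: wrong/nonexistent addresses count as completes.
    Low:  wrong/nonexistent addresses are dropped from numerator and denominator.
    Businesses and non-working numbers are excluded from both denominators.
    """
    eligible = counts["total"] - counts["business"] - counts["nonworking"]
    high = (counts["complete"] + counts["wrong_address"]) / eligible
    low = counts["complete"] / (eligible - counts["wrong_address"])
    return high, low

# Overall rates use the sum of the two modes.
overall = {key: TABLE_2["telephone"][key] + TABLE_2["in_person"][key]
           for key in TABLE_2["telephone"]}
```

Rounded to one decimal place, `response_rates(overall)` gives 69.7% and 64.0%, the telephone counts give 66.2% and 59.2%, and the in-person counts give 86.1% and 84.9%, matching the bottom two rows of Table 2.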
There is some variation in response rates across different types of units. For example, the response rate varied by whether the two vendor files matched on the address (Table 3), with those units that matched having a higher response rate than those that did not match (66.6% vs. 58.3%). Table 4a shows the response rates by the full set of MSG tenure codes. Table 4b collapses the tenure codes, treating “0-2” as renters and “3-8” as owners. When doing this, owners have a higher response rate than renters (64.8% vs. 55.5%). Interestingly, the opposite seems to be the case when using the Dunhill tenure indicator (Table 5), although these data only apply to units that matched with the MSG file. There is no discernible pattern in the response rates by the percent of renters on the block (Table 6).
Two different types of analyses were conducted. The first examined the quality of the contact information provided on the files. The second looked at the quality of the tenure information. The latter was assessed by comparing the tenure status provided by the two different vendor files to what was collected when completing the interview.
Both the address and the telephone number for the sampled unit were provided in the files. To evaluate the address information, the results listed in Table 2 were used. For the in-person interviews, these data were used to calculate the proportion of housing units where there was an incorrect address. There were a total of 268 in-person sample units. In all cases, the interviewer was in a position to either confirm or reject the address provided by the MSG file. Using these data, it is estimated that approximately 8.2 percent of the addresses in the file do not exist. This takes the 22 cases where a wrong address was identified and divides it by 268.
The telephone portion of the sample provides information on the quality of the telephone numbers provided in the files. One measure of quality is the number of telephone numbers that are for a business. In addition to the code for a business, the interview results used in this calculation are the non-working numbers and completed interviews. None of the other result codes could be used to determine whether the unit called was a business or not. From this, it is estimated that approximately 4 percent were for a business. A second measure of quality is the proportion of non-working numbers. There were a total of 1500 numbers in this portion of the sample. Of these, 186 were non-working or 12.4 percent.
Finally, interviewers asked each telephone respondent whether the address provided by the vendor was the same as the address of the respondent. For 220 cases, the respondent noted the address was different. The denominator for this proportion includes all those cases where it could be determined whether the address was correct: the wrong-address cases, the businesses, and the completed interviews. Using these as a base, approximately 25 percent of the telephone numbers were for the wrong address.
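The contact-information measures in this section follow directly from the Table 2 counts. The sketch below restates each ratio. The variable names are hypothetical, and the denominator for the telephone wrong-address rate (wrong addresses plus businesses plus completed interviews) is inferred from the description above.

```python
# Counts from Table 2.
tel_complete, tel_business, tel_nonworking, tel_wrong_address = 629, 32, 186, 220
tel_total = 1500
inperson_wrong_address, inperson_total = 22, 268

# Share of in-person addresses that did not exist.
bad_address_rate = inperson_wrong_address / inperson_total

# Share of telephone numbers that reached a business, among cases where this
# could be determined (businesses, non-working numbers, completed interviews).
business_rate = tel_business / (tel_business + tel_nonworking + tel_complete)

# Share of all sampled telephone numbers that were non-working.
nonworking_rate = tel_nonworking / tel_total

# Share of telephone numbers attached to the wrong address, among cases where
# the address could be checked.
wrong_address_rate = tel_wrong_address / (tel_wrong_address + tel_business + tel_complete)
```

These work out to roughly 8.2%, 3.8%, 12.4%, and 25.0% respectively, consistent with the rounded figures quoted in the text.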
Tables 7 through 15 provide a comparison of the information that was obtained during the interview with that indicated on each of the two vendor files. Each of these tables presents the percent of cases comparing the two measures. The percentages were calculated with data that were weighted as described above in section 2.5. Standard errors and significance tests were estimated using the procedure described above. The standard errors for the percentages were suppressed if the denominator of the percentage was based on fewer than 20 unweighted cases.
Many of the comparisons are in the form of 2 x 2 tables comparing the tenure status on the data file with the survey report. For these tables, two different statistics are presented. One is the Gross Difference Rate (GDR). This is a measure of disagreement between the vendor and interview. It is interpreted as the proportion of households that were misclassified by the vendor record (assuming the interview is correct). For each 2x2 table of the counts:
| | | Interview | |
| | | Own | Rent |
| Vendor | Own | a | b |
| | Rent | c | d |
the GDR is computed as:
GDR = (b+c)/n, where n = a+b+c+d.
The 2x2 tables presented below are expressed as weighted percentages of the total. Also included in the tables are the standard errors of this percentage, the weighted and unweighted counts. One can compute the GDR from these percentages as the sum of the two off-diagonal elements (b+c).
The second measure listed in these tables is the Net Difference Rate (NDR). This is a measure of the net bias in the vendor data. It is computed by:
NDR=(b-c)/n.
This can be computed from the tables discussed below by subtracting the off-diagonal elements, again using the weighted percentages.
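Both rates can be computed from the four cells of the 2x2 table in a few lines. The function name is hypothetical, and the counts in the usage note are invented for illustration only.

```python
def difference_rates(a, b, c, d):
    """Gross and net difference rates for a vendor-by-interview tenure table.

    a: vendor own,  interview own     b: vendor own,  interview rent
    c: vendor rent, interview own     d: vendor rent, interview rent
    Cells may be counts or weighted percentages of the total.
    """
    n = a + b + c + d
    gdr = (b + c) / n   # share of households where vendor and interview disagree
    ndr = (b - c) / n   # positive when "owner" errors outnumber "renter" errors
    return gdr, ndr
```

For a hypothetical table with a = 70, b = 20, c = 3, and d = 7, the GDR is 0.23 and the NDR is 0.17: vendor and interview disagree for 23% of households, and the disagreement is mostly among units the vendor coded as owned.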
Table 7 compares the full MSG tenure status variable with the survey report. The MSG data have a total of 9 values, ranging from 0 to 8. The low value (0) represents units that the source is most confident are rented, while the high value represents units that the source is most confident are owned.8
As can be seen from Table 7, the number of cases for the low values of the MSG tenure variable is quite small. This is partly because the sample was selected from blocks that had a relatively high percentage of owners. The relationship between the survey report and the MSG tenure status is statistically significant (chi-square = 26, df=8; p<.0001). As the MSG values get higher, there is a greater chance that the survey report will be an “owner”. The match with the survey report seems to have a point of inflection where MSG is equal to ‘3’. In this case, 73.1% reported on the survey that they owned. This compares to values of less than 40% for the 0-2 values.
Table 8 provides these data broken out by the matching status between the two vendor files. The pattern evident on Table 7 holds for the sample where the MSG and the Dunhill file match. The pattern is slightly different for the sample where there are no matches. For this portion of the sample, there seem to be many more owners, with only one value having a majority of self-reported renters (tenure status = 4).
Table 9 provides the comparison for each of the vendor files. To make this comparison, the MSG tenure variable was collapsed into two categories. Renters were defined as those with values 0-2, while owners were defined with values 3-8. These values were selected because of the pattern noted in Table 7 above, which found a point of inflection in self-reported ownership between the “2” and “3” value. Table 9 breaks out the comparison by whether or not there was a match between the two vendor files. Where there was a match, the MSG has a GDR value of 21.9, indicating that these data did not match the survey report 21.9% of the time. A large portion of the disagreement was when the MSG indicated the household was being owned, but the interview file indicated it was actually rented. The latter bias is quantified by a large positive NDR of 16.9%. That is, the error rate was higher for vendor identified owner units than renter units. This NDR is significantly different from zero, as indicated by the relatively small standard error (2.7).
A similar pattern emerges when doing this comparison for the MSG data that did not match to the Dunhill file. These data are less likely to correspond to the survey reports at all, with a GDR of 37.2%. This is significantly different from the group that did match (difference in GDRs = 15.3; p<.05). The NDR for the group that did not match (30.6%) is in the same direction as for the group that did match and is quite a bit larger (difference in NDRs = 13.7; p<.05).
A different pattern emerges for the Dunhill file. For this source, 28.5% of households disagreed with the interview on tenure status, with the largest error being when Dunhill indicated a renter but the interview listed the house as being owned. In other words, the NDR for the Dunhill file is in a different direction than for the comparable MSG data. This is further reinforced when comparing the percent of units identified as a renter by the vendor that were consistent with the survey information. Of the units identified by the MSG as a renter, 71.9% were reported as rented in the survey (6.4/8.9 = 71.9), compared to 45.7% for the Dunhill file (14.5/31.7 = 45.7). As shown at the bottom of the table, this difference of 26.2 percentage points (71.9 – 45.7 = 26.2) is statistically significant (z=3.4).
More generally, the pattern of the GDR’s across the vendor data-sets indicates that the MSG data are slightly more accurate than the Dunhill data (p<.05). The opposite is true for the NDRs (p<.05).
The difference in the direction of the NDRs is partly a function of the way the MSG tenure status variable was created. If one splits the MSG data by defining an “owner” using a higher value of this variable, then the NDR becomes negative (like Dunhill). When doing this, however, the GDR also goes up. For example, using the values of “0-5” to define a “renter” on the MSG file increases the GDR to approximately 40% and shifts the NDR to -17%.
The comparison between the two data sources is complicated by the way the sample was selected. As noted in section 2.5, the units the MSG file identified as “certainty” owners (MSG code = 9) were eliminated from the frame. Taking these out of the current analysis, then, should lead to an overstatement of the total error in the files. In particular, it may drive the MSG file to a higher level of error, since the sampling was done after the most accurate records on the file had been eliminated.
The match between the vendor and interview data does vary slightly by the mode of the interview (Table 10). There is a consistent tendency for the in-person households to have a slightly higher percentage of owners across all groups. However, only one of these differences is statistically significant at conventional levels (value of “3” – 86.6 vs. 59.7; p<.05).
Comparison between the vendor files for the different modes is shown in Table 11, which restricts the sample to only those cases where the MSG and Dunhill files matched each other.9 For the Dunhill data, the GDRs are 31.6 and 25.4 for the in-person and telephone modes, respectively. Similar, but slightly lower, GDRs occur for the MSG file (21.3 and 22.5). Neither of the differences in the GDRs between vendors is statistically significant at conventional levels, although the in-person difference approaches significance (z=1.7; p<.10). For the Dunhill telephone data, there appears to be very little or no net bias. The owners identified by Dunhill are about as likely to be misidentified as the renters, as indicated by the NDR of -.3. It is not clear whether this is an effect of mode of interview or sample design. The households that had a telephone number in the vendor files were assigned to the telephone mode, while those without a telephone number were assigned to in-person interviewing. Mode and type of unit, therefore, are confounded in this comparison.
As with the total sample, the MSG file does a better job of identifying renters than the Dunhill file for both modes of interviews. The difference in the percent of renters identified by the MSG and Dunhill that have a survey indicating the unit is rented is 23.6% and 22.0% for the in-person and telephone, respectively (bottom of each panel of Table 11). These are significant at p<.10 (z=1.7, 1.8).
For the MSG data, the GDRs are directly correlated with the percent of persons renting on the block (Table 13): they range from 14.6 to 42.9 as one moves from low renter (0% - 10%) to high renter (31% - 40%) blocks. This pattern is not evident for the Dunhill file, where the GDRs range only from 25.8 (0% - 10% renter) to 29.0 (31% - 40% renter).
For the NDR’s, the MSG data also show a direct relationship with the percent of renters on the block, although it is not quite as strong as found for the GDRs. The NDRs range from a low of 11.8 in the 0% - 10% blocks to 49.8 in the 31% - 40% blocks. The Dunhill data do not show as strong a pattern in this direction. The highest NDR is for the block with the most renters (19.6), but this has a relatively high standard error. This estimate is not statistically different from 0, using a 95% confidence interval.
The MSG data identify renters better than the Dunhill data for the low-renter blocks (0% - 10% renters). The percentage of renters identified by each vendor that is consistent with the survey is 23.6 percentage points higher for the MSG data, which is statistically significant (z=2.2). The direction of this difference is the same for the other types of blocks, but only approaches statistical significance for the blocks with the most renters (31% - 40% renters; difference = 12.7; z=1.6).
The final set of tables is for the individual counties. The data disaggregated by the full MSG tenure variable have very small sample sizes, which makes them difficult to interpret (Table 14). For the collapsed MSG variable (Table 15), there is no clear pattern across the counties. For the GDRs, for example, no one county stands out as being particularly accurate or inaccurate. While there is more variation for the NDRs, these have larger standard errors and many are not significantly different from 0.
How one views the quality of the information from the two vendor files depends on how they will be used. Assuming an in-person contact for all addresses, the data described above indicate that approximately 8 percent of the addresses will not exist. For the MSG file, between 21 percent and 37 percent (depending on the matching status to the Dunhill file) of the records indicating tenure status are in error, as judged by the tenure status reported during the interview. The overall bias depends on how the MSG tenure-status variable is collapsed. The approach taken above was to use a relatively low value of the MSG tenure status variable to serve as a cutoff for designating a unit as an owner. Under this scenario, the bias tended to be greatest among those that the vendor identified as an owner, but turned out to be a renter. However, if one uses a larger value of the MSG tenure variable to define an “owner” (e.g., “5”), then the bias is in the opposite direction (i.e., larger for units identified by the vendor as “renters”). When switching coding schemes in this way, the gross difference rate goes up as well (e.g., from around 25 to 40).
The Dunhill file exhibited higher gross difference rates and lower net difference rates than the MSG file. The direction of the net difference rates was also different across the two files, at least as defined in the above tables. The NDR for the MSG file tended to indicate that the error was largest for units identified as owners, while for the Dunhill file the error tended to be greatest for units identified as renters. If one plans on using the file to identify renters, therefore, the MSG file is preferable.
For the MSG file, there was a correlation between the percent of renters on the block and the accuracy of the information. The more renters on the block, the greater the error. In addition, the direction of this error seemed to vary, with the “owner” designation of the MSG file being in error more often as the proportion of renters on the block increased. This pattern was not evident for the Dunhill file. Neither the gross nor the net error rates varied systematically by the percent of renters on the block. The MSG file seemed to be best at identifying renters, when compared to the Dunhill file, for those blocks that had the fewest renters. If the greatest interest is to identify renters on those blocks where there are mostly owners, then the MSG file seems to perform the best relative to the Dunhill file.
Table 1. Universe Totals for Sample Frame.
| | MSG & Dunhill Match | MSG only | Dunhill only | Total |
| TOTAL | 266,795 | 63,083 | 29,898 | 359,776 |
| Total without MSG Tenure = 9 | 65,435 | 30,045 | | 95,480 |
| Telephone number | 35,953 | 15,101 | | 51,054 |
| No telephone number | 29,482 | 14,944 | | 44,426 |
| No telephone number without Queen Anne’s County | 28,300 | 14,393 | | 42,693 |
| Final Sample Frame | 64,253 | 29,494 | | 93,747 |
Table 2. Final Result by Mode of Interview.
| Result Code | Total Sample | Telephone | In Person | 
| Telephone Interview Completed | 629 | 629 | NA | 
| In-person Interview Completed | 208 | NA | 208 | 
| Business | 33 | 32 | 1 | 
| Final Breakoff | 2 | 2 | 0 | 
| Final Other | 18 | 3 | 15 | 
| Final Refusal | 203 | 190 | 13 | 
| Language Barrier | 6 | 6 | 0 | 
| Maximum Telephone Call Attempts (12) Reached | 80 | 80 | NA | 
| No Answer | 156 | 150 | 6 | 
| No Eligible Respondent Found | 5 | 2 | 3 | 
| Non-working Telephone Number | 186 | 186 | NA | 
| Wrong Address/No Such Address | 242 | 220 | 22 | 
| Total | 1768 | 1500 | 268 | 
| High response rate+ | 69.7% | 66.2% | 86.1% | 
| Low response rate* | 64.0% | 59.2% | 84.9% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
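The two response-rate formulas in the footnotes can be checked against the counts in the table. The following is a minimal sketch, assuming the "Total Sample" column of Table 2 and assuming that "Completes" means telephone plus in-person completed interviews; the dictionary keys and the function name are illustrative and do not come from the report.

```python
# Counts from the "Total Sample" column of Table 2.
# Key names are illustrative; "completes" is assumed to be the sum of
# telephone and in-person completed interviews.
TABLE2_TOTAL = {
    "completes": 629 + 208,
    "wrong_address": 242,
    "final_breakoff": 2,
    "final_other": 18,
    "final_refusal": 203,
    "language_barrier": 6,
    "maximum_calls": 80,
    "no_answer": 156,
    "no_eligible_respondent": 5,
}

# Outcomes counted as nonresponse in both footnote formulas.
NONRESPONSE_KEYS = (
    "final_breakoff",
    "final_other",
    "final_refusal",
    "language_barrier",
    "maximum_calls",
    "no_answer",
    "no_eligible_respondent",
)


def response_rates(counts):
    """Return (high, low) response rates as fractions."""
    nonresponse = sum(counts[key] for key in NONRESPONSE_KEYS)
    completes = counts["completes"]
    wrong_address = counts["wrong_address"]
    # Low rate: wrong addresses are left out of the calculation entirely.
    low = completes / (nonresponse + completes)
    # High rate: wrong addresses are added to numerator and denominator.
    high = (completes + wrong_address) / (nonresponse + completes + wrong_address)
    return high, low


high, low = response_rates(TABLE2_TOTAL)
print(f"High response rate: {high:.1%}")
print(f"Low response rate: {low:.1%}")
```

Under these assumptions the sketch gives 69.7% and 64.0%, matching the "Total Sample" column. Business and non-working telephone number outcomes do not enter either formula.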
Table 3. Final Result by Match Between Vendor Files.
| Result Code | Address Match Between Vendors | No Address Match Between Vendors | 
| Telephone Interview Completed | 454 | 175 | 
| In-person Interview Completed | 148 | 60 | 
| Business | 18 | 15 | 
| Final Breakoff | 1 | 1 | 
| Final Other | 12 | 6 | 
| Final Refusal | 145 | 58 | 
| Language Barrier | 3 | 3 | 
| Maximum Telephone Call Attempts (12) Reached | 49 | 31 | 
| No Answer | 90 | 66 | 
| No Eligible Respondent Found | 2 | 3 | 
| Non-working Telephone Number | 103 | 83 | 
| Wrong Address/No Such Address | 140 | 102 | 
| Total | 1165 | 603 | 
| High response rate+ | 71.1% | 66.7% | 
| Low response rate* | 66.6% | 58.3% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
Table 4a. Final Result by MSG Tenure Status.
| | Rent | | | | | | | | Own |
| Result Code | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| Telephone Interview Completed | 8 | 6 | 29 | 76 | 79 | 69 | 62 | 130 | 170 |
| In-person Interview Completed | 0 | 4 | 9 | 16 | 33 | 40 | 32 | 48 | 26 | 
| Business | 0 | 1 | 1 | 4 | 8 | 7 | 4 | 6 | 2 | 
| Final Breakoff | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 
| Final Other | 0 | 0 | 2 | 1 | 2 | 7 | 2 | 3 | 1 | 
| Final Refusal | 2 | 4 | 11 | 13 | 26 | 36 | 23 | 40 | 48 | 
| Language Barrier | 0 | 2 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 
| Maximum Telephone Call Attempts (12) Reached | 0 | 1 | 5 | 7 | 13 | 15 | 6 | 16 | 17 |
| No Answer | 1 | 2 | 13 | 12 | 28 | 21 | 22 | 31 | 26 | 
| No Eligible Respondent Found | 1 | 0 | 1 | 0 | 0 | 0 | 2 | 0 | 1 | 
| Non-working Telephone Number | 3 | 1 | 15 | 31 | 37 | 27 | 20 | 34 | 18 | 
| Wrong Address/ No Such Address | 2 | 9 | 26 | 32 | 38 | 44 | 32 | 30 | 29 | 
| Total | 17 | 30 | 112 | 192 | 264 | 269 | 206 | 339 | 339 | 
| High response rate+ | 71.4% | 67.9% | 66.7% | 79.0% | 68.5% | 65.1% | 69.2% | 69.6% | 70.5% | 
| Low response rate* | 66.7% | 52.6% | 54.3% | 73.6% | 61.9% | 57.1% | 62.7% | 66.2% | 67.6% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
Table 4b. Final Result by Collapsed MSG Tenure Status.
| Result Code | Own | Rent | 
| Telephone Interview Completed | 586 | 43 | 
| In-person Interview Completed | 195 | 13 | 
| Business | 31 | 2 | 
| Final Breakoff | 2 | 0 | 
| Final Other | 16 | 2 | 
| Final Refusal | 186 | 17 | 
| Language Barrier | 4 | 2 | 
| Maximum Telephone Call Attempts (12) Reached | 74 | 6 | 
| No Answer | 140 | 16 | 
| No Eligible Respondent Found | 3 | 2 | 
| Non-working Telephone Number | 167 | 19 | 
| Wrong Address/ No Such Address | 205 | 37 | 
| Total | 1609 | 159 | 
| High response rate+ | 69.9% | 67.4% | 
| Low response rate* | 64.8% | 55.5% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
Table 5. Final Result by Dunhill Tenure Status.
| Result Code | Own | Rent | 
| Telephone Interview Completed | 337 | 117 | 
| In-person Interview Completed | 97 | 51 | 
| Business | 9 | 9 | 
| Final Breakoff | 1 | 0 | 
| Final Other | 8 | 4 | 
| Final Refusal | 121 | 24 | 
| Language Barrier | 2 | 1 | 
| Maximum Telephone Call Attempts (12) Reached | 34 | 15 | 
| No Answer | 64 | 26 | 
| No Eligible Respondent Found | 2 | 0 | 
| Non-working Telephone Number | 65 | 38 | 
| Wrong Address/No Such Address | 86 | 54 | 
| Total | 826 | 339 | 
| High response rate+ | 69.2% | 76.0% | 
| Low response rate* | 65.2% | 70.6% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
Table 6. Final Result by Percent Renters on Block.
| Result Code | 0-10% | 11-20% | 21-30% | 31-40% | 
| Telephone Interview Completed | 291 | 180 | 107 | 51 | 
| In-person Interview Completed | 114 | 59 | 19 | 16 | 
| Business | 12 | 15 | 5 | 1 | 
| Final Breakoff | 1 | 1 | 0 | 0 | 
| Final Other | 9 | 8 | 1 | 0 | 
| Final Refusal | 101 | 62 | 23 | 17 | 
| Language Barrier | 2 | 2 | 0 | 2 | 
| Maximum Telephone Call Attempts (12) Reached | 35 | 23 | 14 | 8 | 
| No Answer | 69 | 51 | 17 | 19 | 
| No Eligible Respondent Found | 3 | 1 | 1 | 0 | 
| Non-working Telephone Number | 70 | 54 | 33 | 29 | 
| Wrong Address/No Such Address | 98 | 61 | 50 | 33 | 
| Total | 805 | 517 | 270 | 176 | 
| High response rate+ | 69.6% | 67.0% | 75.9% | 68.5% | 
| Low response rate* | 64.8% | 61.8% | 69.2% | 59.3% | 
+ = (Completes + Wrong Address) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes + Wrong Address)
* (Completes) / (Final Breakoff + Final Other + Final Refusal + Language Barrier + Maximum Calls + No Answers + No Eligible Respondent + Completes)
Table 7. MSG Tenure Status by Survey Report of Tenure Status.
| 
				 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* | 
| Own | 20.1% | 34.9% | 37.4% | 73.1% | 68.7% | 71.8% | 82.0% | 68.9% | 83.3% | 71.7% | 
| Standard error | - | - | (9.7) | (6.4) | (6.3) | (6.2) | (4.4) | (4.5) | (3.4) | (2.1) | 
| Unweighted n | 2 | 2 | 12 | 61 | 74 | 77 | 75 | 133 | 163 | 599 | 
| Weighted N | 85 | 380 | 1506 | 6013 | 7296 | 6714 | 7784 | 10460 | 11889 | 52126 | 
| Rent | 79.9% | 65.1% | 62.6% | 26.9% | 31.3% | 28.2% | 18.0% | 31.1% | 16.8% | 28.3% | 
| Standard error | - | - | (9.7) | (6.4) | (6.3) | (6.2) | (4.4) | (4.5) | (3.4) | (2.1) | 
| Unweighted n | 6 | 8 | 25 | 29 | 33 | 27 | 19 | 43 | 27 | 217 | 
| Weighted N | 337 | 709 | 2519 | 2209 | 3323 | 2641 | 1711 | 4724 | 2392 | 20564 | 
| Total* | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 
| Unweighted n | 8 | 10 | 37 | 90 | 107 | 104 | 94 | 176 | 190 | 816 | 
| Weighted N | 442 | 1088 | 4025 | 8223 | 10618 | 9355 | 9495 | 15184 | 14281 | 72690 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
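The effect of moving the owner cutoff on the MSG tenure variable can be illustrated with the weighted counts in Table 7. The sketch below assumes that a unit is coded "owner" when its MSG value is at or above the cutoff. It also assumes the gross difference rate is the share of units where vendor and survey disagree, and that the net difference rate is (vendor owner, survey renter) minus (vendor renter, survey owner). Both conventions are inferred from the figures reported with Table 9, and the function name is illustrative. Because Table 7 pools all interviewed units, the results are not expected to match the later tables exactly.

```python
# Weighted N from Table 7, indexed by MSG tenure value 0 through 8.
SURVEY_OWN = [85, 380, 1506, 6013, 7296, 6714, 7784, 10460, 11889]
SURVEY_RENT = [337, 709, 2519, 2209, 3323, 2641, 1711, 4724, 2392]


def rates_for_cutoff(cutoff):
    """Return (GDR, NDR) in percent when MSG values >= cutoff mean 'owner'.

    The cutoff rule and the sign convention of the NDR are assumptions
    made for this illustration.
    """
    total = sum(SURVEY_OWN) + sum(SURVEY_RENT)
    # Vendor says owner (value >= cutoff) but the respondent reports renting.
    vendor_own_survey_rent = sum(SURVEY_RENT[cutoff:])
    # Vendor says renter (value < cutoff) but the respondent reports owning.
    vendor_rent_survey_own = sum(SURVEY_OWN[:cutoff])
    gdr = 100.0 * (vendor_own_survey_rent + vendor_rent_survey_own) / total
    ndr = 100.0 * (vendor_own_survey_rent - vendor_rent_survey_own) / total
    return gdr, ndr


for cutoff in (3, 5):
    gdr, ndr = rates_for_cutoff(cutoff)
    print(f"cutoff {cutoff}: GDR = {gdr:.1f}, NDR = {ndr:.1f}")
```

With a cutoff of 3 the net difference is positive (error concentrated among vendor-identified owners). With a cutoff of 5 the gross difference is larger and the net difference changes sign.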
Table 8. MSG Tenure Status by Survey Report of Tenure Status for the Two Sample Groups (Column Percents).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| Address Match Between Vendors | ||||||||||
| Own Standard error Unweighted n Weighted N | 24.4% - 2 85 | 9.4% - 1 73 | 32.4% (10.7) 6 1150 | 67.7% (6.5) 36 3745 | 86.2% (4.0) 50 5400 | 76.1% (8.6) 54 4987 | 86.3% (4.3) 54 5866 | 69.3% (5.0) 99 7957 | 86.5% (3.3) 134 9780 | 74.2% (2.3) 436 39043 | 
| Rent Standard error Unweighted n Weighted N | 75.7% - 5 264 | 90.6% - 8 709 | 67.6% (10.7) 23 2398 | 32.3% (6.5) 20 1789 | 13.8% (4.0) 13 864 | 23.9% (8.6) 17 1566 | 13.7% (4.3) 11 929 | 30.7% (5.0) 34 3526 | 13.5% (3.3) 19 1529 | 25.8% (2.3) 150 13573 | 
| Total* Unweighted n Weighted N | 100.0% 7 349 | 100.0% 9 782 | 100.0% 29 3548 | 100.0% 56 5533 | 100.0% 63 6264 | 100.0% 71 6554 | 100.0% 65 6795 | 100.0% 133 11482 | 100.0% 153 11309 | 100.0% 586 52616 | 
| No Address Match Between Vendors | ||||||||||
| Own Standard error Unweighted n Weighted N | 0% - 0 0 | 100.0% - 1 306 | 74.5% - 6 356 | 84.4% (8.9) 25 2269 | 43.5% (12.0) 24 1895 | 61.6% (11.9) 23 1726 | 71.1% (9.6) 21 1918 | 67.6% (8.3) 34 2503 | 71.0% (10.7) 29 2109 | 65.2% (4.9) 163 13083 | 
| Rent Standard error Unweighted n Weighted N | 100.0% - 1 73 | 0% - 0 0 | 25.5% - 2 122 | 15.6% (8.9) 9 421 | 56.5% (12.0) 20 2459 | 38.4% (11.9) 10 1075 | 29.0% (9.6) 8 782 | 32.4% (8.3) 9 1198 | 29.0% (10.7) 8 863 | 34.8% (4.9) 67 6992 | 
| Total* Unweighted n Weighted N | 100.0% 1 73 | 100.0% 1 306 | 100.0% 8 478 | 100.0% 34 2689 | 100.0% 44 4354 | 100.0% 33 2802 | 100.0% 29 2699 | 100.0% 43 3701 | 100.0% 37 2972 | 100.0% 230 20075 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 9. Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for the Two Sample Groups (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| Address Match Between Vendors | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 71.7% (2.5) 427 37735 | 19.4% (2.3) 114 10202 | 91.1% (1.9) 541 47937 | 
| Rent Standard error Unweighted n Weighted N | 2.5% (1.0) 9 1308 | 6.4% (1.5) 36 3370 | 8.9% (1.9) 45 4678 | 
| Total* Standard error Unweighted n Weighted N | 74.2% (2.3) 436 39043 | 25.8% (2.3) 150 13573 | 100.0% - 586 52616 | 
| DUNHILL | |||
| Own Standard error Unweighted n Weighted N | 57.0% (2.9) 351 30011 | 11.3% (1.9) 70 5945 | 68.3% (2.7) 421 35956 | 
| Rent Standard error Unweighted n Weighted N | 17.2% (2.7) 85 9032 | 14.5% (1.9) 80 7628 | 31.7% (2.7) 165 16660 | 
| Total* Standard error Unweighted n Weighted N | 74.2% (2.3) 436 39043 | 25.8% (2.3) 150 13573 | 100.0% - 586 52616 | 
| | GDR | SE+ | NDR | SE+ |
| Address Match | ||||
| MSG | 21.9 | 2.3 | 16.9 | 2.7 |
| Dunhill | 28.5 | 3.2 | -5.9 | 3.4 |
| Dunhill-MSG | 6.6 | 3.2 | -22.8 | 2.6 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 26.3
Standard error = 7.5
Z = 3.5
* Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate
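The gross and net difference rates reported for Table 9 can be reproduced from the four percent-of-total cells of each vendor cross-tabulation. The sketch below assumes that the GDR is the sum of the two off-diagonal cells and that the NDR is (vendor owner, survey renter) minus (vendor renter, survey owner). The reported figures are consistent with this reading, but the function and its argument names are illustrative only.

```python
def difference_rates(own_own, own_rent, rent_own, rent_rent):
    """Return (GDR, NDR) in percent from a 2x2 table of percents.

    Argument names read as <vendor tenure>_<survey tenure>. The sign
    convention for the NDR is an assumption inferred from the table.
    """
    total = own_own + own_rent + rent_own + rent_rent
    gdr = 100.0 * (own_rent + rent_own) / total  # all disagreements
    ndr = 100.0 * (own_rent - rent_own) / total  # net direction of error
    return gdr, ndr


# Address Match group, percent-of-total cells from Table 9.
msg_gdr, msg_ndr = difference_rates(71.7, 19.4, 2.5, 6.4)
dun_gdr, dun_ndr = difference_rates(57.0, 11.3, 17.2, 14.5)

print(f"MSG:     GDR = {msg_gdr:.1f}, NDR = {msg_ndr:.1f}")
print(f"Dunhill: GDR = {dun_gdr:.1f}, NDR = {dun_ndr:.1f}")
```

A positive NDR indicates that the vendor file overstates owners relative to the survey report; a negative NDR indicates that it overstates renters. The standard errors in the table depend on the sample design and are not recomputed here.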
Table 9. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for the Two Sample Groups (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| No Address Match Between Vendors | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 61.9% (5.0) 156 12421 | 33.9% (4.9) 64 6797 | 95.7% (1.7) 220 19217 | 
| Rent Standard error Unweighted n Weighted N | 3.3% (1.7) 7 662 | 1.0% (0.6) 3 195 | 4.3% (1.7) 10 857 | 
| Total* Standard error Unweighted n Weighted N | 65.2% (4.9) 163 13083 | 34.8% (4.9) 67 6992 | 100.0% - 230 20075 | 
| | GDR | SE+ | NDR | SE+ |
| No Address Match | ||||
| MSG | 37.2 | 5.0 | 30.6 | 5.4 |
| (No Match) – (Match) | ||||
| MSG | 15.3 | 5.8 | 13.7 | 6.4 |
* Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table 10. MSG Tenure Status by Survey Report of Tenure Status for Mode Groups (Column Percents).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| In Person | ||||||||||
| Own Standard error Unweighted n Weighted N | 0% - 0 0 | 50.1% - 1 306 | 45.8% - 4 1013 | 86.6% - 14 3552 | 65.9% (9.8) 21 4152 | 70.2% (9.3) 27 4091 | 88.2% (6.0) 29 4892 | 63.0% (7.4) 33 5295 | 75.7% (10.5) 20 3325 | 71.2% (3.6) 149 26627 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 49.9% - 3 305 | 54.3% - 5 1200 | 13.4% - 2 551 | 34.1% (9.8) 10 2147 | 29.8% (9.3) 10 1737 | 11.8% (6.0) 3 652 | 37.0% (7.4) 15 3107 | 24.3% (10.5) 6 1068 | 28.8% (3.6) 54 10767 | 
| Total* Unweighted n Weighted N | 0% 0 0 | 100.0% 4 612 | 100.0% 9 2213 | 100.0% 16 4103 | 100.0% 31 6299 | 100.0% 37 5828 | 100.0% 32 5545 | 100.0% 48 8402 | 100.0% 26 4394 | 100.0% 203 37394 | 
| Telephone | ||||||||||
| Own Standard error Unweighted n Weighted N | 20.1% - 2 85 | 15.4% - 1 73 | 27.2% (9.6) 8 493 | 59.7% (5.4) 47 2461 | 72.8% (6.4) 53 3143 | 74.4% (6.4) 50 2623 | 73.2% (6.5) 46 2892 | 76.2% (4.2) 100 5165 | 86.6% (2.6) 143 8564 | 72.2% (2.0) 450 25499 | 
| Rent Standard error Unweighted n Weighted N | 79.9% - 6 337 | 84.6% - 5 403 | 72.8% (9.6) 20 1319 | 40.3% (5.4) 27 1659 | 27.2% (6.4) 23 1176 | 25.6% (6.4) 17 904 | 26.8% (6.5) 16 1058 | 23.8% (4.2) 28 1617 | 13.4% (2.6) 21 1323 | 27.8% (2.0) 163 9797 | 
| Total* Unweighted n Weighted N | 100.0% 8 422 | 100.0% 6 477 | 100.0% 28 1812 | 100.0% 74 4120 | 100.0% 76 4319 | 100.0% 67 3527 | 100.0% 62 3950 | 100.0% 128 6782 | 100.0% 164 9887 | 100.0% 613 35296 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 11. Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Mode Groups (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| In-Person | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 73.0% (4.2) 110 19181 | 17.8% (4.1) 24 4673 | 90.8% (3.6) 134 23853 | 
| Rent Standard error Unweighted n Weighted N | 3.5% (1.9) 3 912 | 5.7% (2.6) 8 1506 | 9.2% (3.6) 11 2417 | 
| Total** Standard error Unweighted n Weighted N | 76.5% (3.9) 113 20092 | 23.5% (3.9) 32 6178 | 100.0% - 145 26271 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 54.7% (5.4) 82 14364 | 9.8% (3.4) 14 2567 | 64.5% (5.0) 96 16931 | 
| Rent Standard error Unweighted n Weighted N | 21.8% (5.0) 31 5728 | 13.8% (3.2) 18 3611 | 35.6% (5.0) 49 9340 | 
| Total** Standard error Unweighted n Weighted N | 76.5% (3.9) 113 20092 | 23.5% (3.9) 32 6178 | 100.0% - 145 26271 | 
| | GDR | SE+ | NDR | SE+ |
| In-Person | ||||
| MSG | 21.3 | 4.0 | 14.3 | 4.9 |
| Dunhill | 31.6 | 6.0 | -12.0 | 6.1 |
| Dunhill-MSG | 10.3 | 5.7 | -26.4 | 4.6 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 23.6
Standard error = 13.2
Z = 1.8
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate
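The comparison "(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct)" can be sketched from the weighted counts in the "Rent" rows of the in-person panel of Table 11. The sketch assumes that "correct" means the survey respondent also reported renting, and it takes the standard error of 13.2 directly from the report rather than re-deriving it, since that would require the survey design information. The variable and function names are illustrative.

```python
def percent_correct(correct_weighted, total_weighted):
    """Percent of vendor-identified renters confirmed as renters by the survey."""
    return 100.0 * correct_weighted / total_weighted


# In-person panel of Table 11, weighted N from the vendor "Rent" rows:
# (survey report = Rent) over the row total.
msg_correct = percent_correct(1506, 2417)
dunhill_correct = percent_correct(3611, 9340)

difference = msg_correct - dunhill_correct

# Standard error as reported in the table notes (not recomputed here).
REPORTED_SE = 13.2
z = difference / REPORTED_SE

print(f"MSG renters correct:     {msg_correct:.1f}%")
print(f"Dunhill renters correct: {dunhill_correct:.1f}%")
print(f"Difference = {difference:.1f}, Z = {z:.1f}")
```

Under these assumptions the difference and Z value agree with the 23.6 and 1.8 reported for the in-person group.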
Table 11. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Mode Groups (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Telephone | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 70.4% (2.7) 317 18554 | 21.0% (2.2) 90 5530 | 91.4% (1.5) 407 24084 | 
| Rent Standard error Unweighted n Weighted N | 1.5% (0.7) 6 397 | 7.1% (1.4) 28 1864 | 8.6% (1.5) 34 2261 | 
| Total** Standard error Unweighted n Weighted N | 71.9% (2.5) 323 18951 | 28.1% (2.5) 118 7394 | 100.0% - 441 26345 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 59.4% (2.3) 269 15647 | 12.8% (1.8) 56 3378 | 72.2% (2.2) 325 19025 | 
| Rent Standard error Unweighted n Weighted N | 12.5% (1.8) 54 3304 | 15.3% (2.0) 62 4016 | 27.8% (2.2) 116 7320 | 
| Total** Standard error Unweighted n Weighted N | 71.9% (2.5) 323 18951 | 28.1% (2.5) 118 7394 | 100.0% - 441 26345 | 
| | GDR | SE+ | NDR | SE+ |
| Telephone | ||||
| MSG | 22.5 | 2.5 | 19.5 | 2.2 |
| Dunhill | 25.4 | 2.2 | 0.3 | 2.8 |
| Dunhill-MSG | 2.9 | 2.7 | -19.2 | 2.5 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 27.6
Standard error = 8.6
Z = 3.2
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate
Table 12. MSG Tenure Status by Survey Report of Tenure Status for Percent Renters on Block (Column Percents).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| 0-10% | ||||||||||
| Own Standard error Unweighted n Weighted N | 100.0% - 1 12 | 30.8% - 1 73 | 25.3% - 2 365 | 89.6% (7.3) 30 3326 | 73.8% (15.3) 36 3350 | 77.2% (7.4) 34 3276 | 91.0% (5.7) 46 4572 | 76.1% (6.1) 75 6362 | 86.6% (4.0) 104 7893 | 79.7% (3.8) 329 29230 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 69.2% - 2 165 | 74.7% - 8 1076 | 10.4% (7.3) 4 385 | 26.2% (15.3) 8 1190 | 22.8% (7.4) 9 968 | 9.1% (5.7) 4 455 | 24.0% (6.1) 17 2003 | 13.4% (4.0) 14 1225 | 20.4% (3.8) 66 7466 | 
| Total* Unweighted n Weighted N | 100.0% 1 12 | 100.0% 3 238 | 100.0% 10 1441 | 100.0% 34 3712 | 100.0% 44 4540 | 100.0% 43 4243 | 100.0% 50 5027 | 100.0% 92 8366 | 100.0% 118 9118 | 100.0% 395 36696 | 
| 11-20% | ||||||||||
| Own Standard error Unweighted n Weighted N | 28.3% - 1 73 | 0% - 0 0 | 54.8% - 5 480 | 79.1% (10.0) 18 1981 | 76.4% (8.0) 23 2462 | 67.8% (15.2) 29 2255 | 84.1% (7.5) 18 2182 | 69.0% (8.9) 38 2569 | 72.3% (7.1) 36 2480 | 72.4% (3.5) 168 14482 | 
| Rent Standard error Unweighted n Weighted N | 71.7% - 3 186 | 100.0% - 1 73 | 45.2% - 5 396 | 20.9% (10.0) 5 523 | 23.6% (8.0) 12 759 | 32.2% (15.2) 9 1070 | 15.9% (7.5) 6 413 | 31.0% (8.9) 15 1152 | 27.8% (7.1) 9 953 | 27.6% (3.5) 65 5525 | 
| Total* Unweighted n Weighted N | 100.0% 4 259 | 100.0% 1 73 | 100.0% 10 876 | 100.0% 23 2504 | 100.0% 35 3220 | 100.0% 38 3325 | 100.0% 24 2595 | 100.0% 53 3721 | 100.0% 45 3433 | 100.0% 233 20007 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 12. (cont.) MSG Tenure Status by Survey Report of Tenure Status for Percent Renters on Block (Column Percents).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| 21-30% | ||||||||||
| Own Standard error Unweighted n Weighted N | 0 - 0 0 | 0% - 0 0 | 54.8% - 5 660 | 37.8% (12.7) 9 490 | 73.3% (11.7) 11 810 | 88.7% (6.8) 11 975 | 47.3% (20.4) 5 559 | 63.5% (16.9) 14 819 | 87.6% (8.0) 19 1198 | 61.1% (4.9) 74 5511 | 
| Rent Standard error Unweighted n Weighted N | 0 - 0 0 | 100.0% - 5 470 | 45.2% - 4 545 | 62.2% (12.7) 12 807 | 26.7% (11.7) 6 295 | 11.3% (6.8) 5 124 | 52.7% (20.4) 8 623 | 36.5% (16.9) 5 471 | 12.4% (8.0) 3 170 | 38.9% (4.9) 48 3505 | 
| Total* Unweighted n Weighted N | 100.0% 0 0 | 100.0% 5 470 | 100.0% 9 1205 | 100.0% 21 1298 | 100.0% 17 1105 | 100.0% 16 1099 | 100.0% 13 1182 | 100.0% 19 1290 | 100.0% 22 1368 | 100.0% 122 9017 | 
| 31-40% | ||||||||||
| Own Standard error Unweighted n Weighted N | 0% - 0 0 | 100.0% - 1 306 | 0% - 0 0 | 30.4% - 4 216 | 38.5% - 4 674 | 30.2% - 3 208 | 68.3% - 6 471 | 39.3% - 6 710 | 87.8% - 4 318 | 41.7% (7.1) 28 2903 | 
| Rent Standard error Unweighted n Weighted N | 100.0% - 3 152 | 0% - 0 0 | 100.0% - 8 502 | 69.6% - 8 493 | 61.6% - 7 1079 | 69.8% - 4 480 | 31.8% - 1 219 | 60.7% - 6 1098 | 12.2% - 1 44 | 58.4% (7.1) 38 4067 | 
| Total* Unweighted n Weighted N | 100.0% 3 152 | 100.0% 1 306 | 100.0% 8 502 | 100.0% 12 709 | 100.0% 11 1753 | 100.0% 7 688 | 100.0% 7 690 | 100.0% 12 1807 | 100.0% 5 363 | 100.0% 66 6971 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 13. Collapsed MSG and Dunhill Tenure Status by Survey Report for Percent Renters on Block (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| 0% - 10% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 81.3% (2.7) 250 22115 | 13.2% (2.4) 36 3591 | 94.5% (1.9) 286 25707 | 
| Rent Standard error Unweighted n Weighted N | 1.4% (1.1) 3 377 | 4.1% (1.8) 8 1119 | 5.5% (1.9) 11 1496 | 
| Total** Standard error Unweighted n Weighted N | 82.7% (2.9) 253 22492 | 17.3% (2.9) 44 4711 | 100.0% - 297 27203 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 65.3% (3.9) 204 17753 | 8.4% (2.3) 25 2290 | 73.7% (3.6) 229 20043 | 
| Rent Standard error Unweighted n Weighted N | 17.4% (3.7) 49 4739 | 8.9% (2.5) 19 2421 | 26.3% (3.6) 68 7160 | 
| Total** Standard error Unweighted n Weighted N | 82.7% (2.9) 253 22492 | 17.3% (2.9) 44 4711 | 100.0% - 297 27203 | 
| | GDR | SE+ | NDR | SE+ |
| 0% - 10% Renter | ||||
| MSG | 14.6 | 2.3 | 11.8 | 2.9 |
| Dunhill | 25.8 | 4.4 | -9.0 | 4.4 |
| Dunhill-MSG | 11.2 | 4.5 | -20.8 | 3.6 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 41.0
Standard error = 17.9
Z = 2.3
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate.
Table 13. (cont.) Collapsed MSG and Dunhill Tenure Status by Survey Report for Percent Renters on Block (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| 11% - 20% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 71.5% (4.6) 114 10761 | 21.5% (4.0) 36 3236 | 93.0% (2.5) 150 13997 | 
| Rent Standard error Unweighted n Weighted N | 2.7% (1.8) 3 402 | 4.4% (1.5) 9 655 | 7.0% (2.5) 12 1057 | 
| Total** Standard error Unweighted n Weighted N | 74.2% (4.2) 117 11163 | 25.9% (4.2) 45 3891 | 100.0% - 162 15054 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 53.6% (5.0) 93 8069 | 11.2% (2.9) 21 1685 | 64.8% (6.1) 114 9755 | 
| Rent Standard error Unweighted n Weighted N | 20.6% (5.6) 24 3093 | 14.7% (3.0) 24 2206 | 35.2% (6.1) 48 5299 | 
| Total** Standard error Unweighted n Weighted N | 74.2% (4.2) 117 11163 | 25.9% (4.2) 45 3891 | 100.0% - 162 15054 | 
| | GDR | SE+ | NDR | SE+ |
| 11% - 20% Renter | ||||
| MSG | 24.2 | 4.4 | 18.8 | 4.4 |
| Dunhill | 31.7 | 4.3 | -9.4 | 7.6 |
| Dunhill-MSG | 7.6 | 6.5 | -28.2 | 6.5 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 20.4
Standard error = 21.3
Z = 1.0
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate.
Table 13. (cont.) Collapsed MSG and Dunhill Tenure Status by Survey Report for Percent Renters on Block (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| 21% - 30% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 49.4% (9.6) 44 3048 | 25.6% (6.4) 25 1581 | 75.0% (12.3) 69 4629 | 
| Rent Standard error Unweighted n Weighted N | 8.6% (6.1) 3 530 | 16.4% (8.0) 9 1015 | 25.0% (12.3) 12 1545 | 
| Total** Standard error Unweighted n Weighted N | 58.0% (6.8) 47 3578 | 42.1% (6.8) 34 2596 | 100.0% - 81 6174 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 41.7% (9.3) 38 2575 | 15.4% (4.7) 15 952 | 57.1% (10.0) 53 3527 | 
| Rent Standard error Unweighted n Weighted N | 16.2% (5.7) 9 1003 | 26.6% (6.5) 19 1645 | 42.9% (10.0) 28 2647 | 
| Total** Standard error Unweighted n Weighted N | 58.0% (6.8) 47 3578 | 42.1% (6.8) 34 2596 | 100.0% - 81 6174 | 
| | GDR | SE+ | NDR | SE+ |
| 21% - 30% Renter | ||||
| MSG | 34.2 | 6.1 | 17.0 | 10.9 |
| Dunhill | 31.7 | 6.0 | -0.8 | 8.6 |
| Dunhill-MSG | -2.5 | 5.3 | -17.9 | 6.0 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 3.6
Standard error = 14.2
Z = 0.3
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate.
Table 13. (cont.) Collapsed MSG and Dunhill Tenure Status by Survey Report for Percent Renters on Block (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| 31% - 40% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 43.3% (10.1) 19 1811 | 42.9% (10.4) 17 1794 | 86.1% (4.8) 36 3605 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 13.9% (4.8) 10 581 | 13.9% (4.8) 10 581 | 
| Total** Standard error Unweighted n Weighted N | 43.3% (10.1) 19 1811 | 56.7% (10.1) 27 2374 | 100.0% - 46 4185 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 38.6% (10.1) 16 1614 | 24.3% (11.7) 9 1018 | 62.9% (8.1) 25 2631 | 
| Rent Standard error Unweighted n Weighted N | 4.7% (2.8) 3 197 | 32.4% (7.6) 18 1357 | 37.1% (8.1) 21 1554 | 
| Total** Standard error Unweighted n Weighted N | 43.3% (10.1) 19 1811 | 56.7% (10.1) 27 2374 | 100.0% - 46 4185 | 
| | GDR | SE+ | NDR | SE+ |
| 31% - 40% Renter | ||||
| MSG | 42.9 | 10.4 | 42.9 | 10.4 |
| Dunhill | 29.0 | 11.7 | 19.6 | 12.3 |
| Dunhill-MSG | -13.8 | 7.1 | -23.3 | 7.4 |
(% of MSG Renters that are correct) – (% of Dunhill Renters that are correct) = 12.7
Standard error = 7.5
Z = 1.7
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error. GDR = Gross Difference Rate; NDR = Net Difference Rate.
Table 14. MSG Tenure Status by Survey Report of Tenure Status for Each County (Column Percent).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| Baltimore County, MD | ||||||||||
| Own Standard error Unweighted n Weighted N | 25.0% - 1 73 | 63.3% - 2 380 | 47.0% - 5 748 | 75.9% (9.5) 22 4036 | 69.7% (9.1) 29 4779 | 72.7% (11.5) 22 3269 | 72.4% (7.4) 20 3128 | 61.8% (7.1) 40 5107 | 82.5% (5.3) 62 6626 | 70.8% (3.5) 203 28146 | 
| Rent Standard error Unweighted n Weighted N | 75.0% - 3 220 | 36.7% - 3 220 | 53.1% - 7 846 | 24.1% (9.5) 12 1284 | 30.3% (9.1) 11 2076 | 27.3% (11.5) 6 1225 | 27.6% (7.4) 8 1190 | 38.2% (7.1) 20 3157 | 17.5% (5.3) 12 1410 | 29.2% (3.5) 82 11627 | 
| Total* Unweighted n Weighted N | 100.0% 4 293 | 100.0% 5 600 | 100.0% 12 1594 | 100.0% 34 5320 | 100.0% 40 6854 | 100.0% 28 4494 | 100.0% 28 4318 | 100.0% 60 8264 | 100.0% 74 8036 | 100.0% 285 39773 | 
| Howard County, MD | ||||||||||
| Own Standard error Unweighted n Weighted N | 0 - 0 0 | 0% - 0 0 | 30.9% - 3 566 | 50.0% - 6 550 | 64.6% - 3 622 | 63.1% - 7 1403 | 92.2% - 14 2194 | 77.3% - 15 1402 | 84.6% (6.0) 22 1524 | 67.2% (2.5) 70 8261 | 
| Rent Standard error Unweighted n Weighted N | 0 - 0 0 | 100.0% - 2 183 | 69.1% - 9 1268 | 50.0% - 6 550 | 35.4% - 2 340 | 36.9% - 5 822 | 7.9% - 2 187 | 22.7% - 3 412 | 15.4% (6.0) 4 277 | 32.8% (2.5) 33 4040 | 
| Total* Unweighted n Weighted N | 100.0% 0 0 | 100.0% 2 183 | 100.0% 12 1834 | 100.0% 12 1100 | 100.0% 5 962 | 100.0% 12 2226 | 100.0% 16 2381 | 100.0% 18 1814 | 100.0% 26 1801 | 100.0% 103 12300 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 14. (cont) MSG Tenure Status by Survey Report of Tenure Status for Each County (Column Percent).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| Queen Anne’s County, MD | ||||||||||
| Own Standard error Unweighted n Weighted N | 100.0% - 1 12 | 0 - 0 0 | 100.0% - 1 12 | 80.0% - 4 47 | 50.0% - 6 91 | 68.8% - 11 167 | 58.3% - 7 200 | 82.1% (6.4) 23 486 | 88.9% - 16 338 | 74.2% (5.2) 69 1353 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 0 - 0 0 | 0% - 0 0 | 20.0% - 1 12 | 50.0% - 6 91 | 31.3% - 5 76 | 41.7% - 5 143 | 17.9% (6.4) 5 106 | 11.1% - 2 42 | 25.8% (5.2) 24 470 | 
| Total* Unweighted n Weighted N | 100.0% 1 12 | 100.0% 0 0 | 100.0% 1 12 | 100.0% 5 58 | 100.0% 12 182 | 100.0% 16 243 | 100.0% 12 342 | 100.0% 28 592 | 100.0% 18 380 | 100.0% 93 1822 | 
| Hanover County, VA | ||||||||||
| Own Standard error Unweighted n Weighted N | 0 - 0 0 | 0% - 0 0 | 0% - 0 0 | 60.0% - 4 135 | 75.7% - 6 251 | 74.3% - 8 295 | 100.0% - 6 513 | 79.8% (8.3) 18 1419 | 76.0% - 14 793 | 70.7% (6.9) 56 3406 | 
| Rent Standard error Unweighted n Weighted N | 0 - 0 0 | 100.0% - 3 305 | 100.0% - 5 222 | 40.0% - 3 90 | 24.3% - 1 80 | 25.7% - 3 102 | 0% - 0 0 | 20.3% (8.3) 5 360 | 24.0% - 3 251 | 29.3% (6.9) 23 1411 | 
| Total* Unweighted n Weighted N | 100.0% 0 0 | 100.0% 3 305 | 100.0% 5 222 | 100.0% 7 226 | 100.0% 7 331 | 100.0% 11 397 | 100.0% 6 513 | 100.0% 23 1780 | 100.0% 17 1044 | 100.0% 79 4818 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 14. (cont) MSG Tenure Status by Survey Report of Tenure Status for Each County (Column Percent).
| | Rent | | | | | | | | Own | |
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | Total* |
| Henrico County, VA | ||||||||||
| Own Standard error Unweighted n Weighted N | 0% - 0 0 | 0 - 0 0 | 49.4% - 3 179 | 82.0% (7.0) 25 1245 | 67.9% (7.3) 30 1553 | 79.2% (7.6) 29 1579 | 90.2% (5.4) 28 1749 | 74.8% (6.6) 37 2046 | 86.4% (5.9) 49 2608 | 78.4% (2.8) 201 10960 | 
| Rent Standard error Unweighted n Weighted N | 100.0% - 3 117 | 0 - 0 0 | 50.6% - 4 184 | 18.0% (7.0) 7 274 | 32.1% (7.3) 13 735 | 20.8% (7.6) 8 416 | 9.9% (5.4) 4 191 | 25.2% (6.6) 10 689 | 13.6% (5.9) 6 412 | 21.6% (2.8) 55 3017 | 
| Total* Unweighted n Weighted N | 100.0% 3 117 | 100.0% 0 0 | 100.0% 7 363 | 100.0% 32 1519 | 100.0% 43 2288 | 100.0% 37 1995 | 100.0% 32 1940 | 100.0% 47 2734 | 100.0% 55 3020 | 100.0% 256 13977 | 
* Totals may not add up due to rounding.
- Denominator of percent is less than 20 unweighted cases.
Table 15. Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Baltimore County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 71.5% (3.8) 156 20970 | 21.6% (3.7) 47 6332 | 93.1% (2.0) 203 27302 | 
| Rent Standard error Unweighted n Weighted N | 2.8% (1.3) 6 822 | 4.1% (1.7) 12 1212 | 6.9% (2.0) 18 2034 | 
| Total** Standard error Unweighted n Weighted N | 74.3% (3.7) 162 21792 | 25.7% (3.7) 59 7544 | 100.0% - 221 29336 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 57.7% (4.7) 129 16913 | 11.3% (3.0) 28 3325 | 69.0% (4.0) 157 20237 | 
| Rent Standard error Unweighted n Weighted N | 16.6% (4.4) 33 4879 | 14.4% (2.9) 31 4219 | 31.0% (4.0) 64 9098 | 
| Total** Standard error Unweighted n Weighted N | 74.3% (3.7) 162 21792 | 25.7% (3.7) 59 7544 | 100.0% - 221 29336 | 
| | GDR | SE+ | NDR | SE+ |
| Baltimore County, MD | ||||
| MSG | 24.4 | 3.8 | 18.8 | 4.0 |
| Dunhill | 28.0 | 5.4 | -5.3 | 5.2 |
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
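The GDR and NDR figures can be checked against the weighted counts in the cross-tabulation above. The sketch below is a minimal illustration. It assumes the usual definitions: the gross difference rate is the share of units whose file classification disagrees with the survey report, and the net difference rate is the file "Own" share minus the survey "Own" share. The sign convention is inferred from the published figures rather than stated in the report, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TenureCrossTab:
    """Weighted counts; first word is the file status, second the survey report."""
    own_own: float
    own_rent: float
    rent_own: float
    rent_rent: float

    @property
    def total(self) -> float:
        return self.own_own + self.own_rent + self.rent_own + self.rent_rent


def gross_difference_rate(tab: TenureCrossTab) -> float:
    """Percent of units classified differently by the file and the survey."""
    return 100.0 * (tab.own_rent + tab.rent_own) / tab.total


def net_difference_rate(tab: TenureCrossTab) -> float:
    """File 'Own' percent minus survey 'Own' percent (assumed sign convention)."""
    return 100.0 * (tab.own_rent - tab.rent_own) / tab.total


# Weighted N for Baltimore County, MD, from Table 15.
msg = TenureCrossTab(own_own=20970, own_rent=6332, rent_own=822, rent_rent=1212)
dunhill = TenureCrossTab(own_own=16913, own_rent=3325, rent_own=4879, rent_rent=4219)

msg_gdr = round(gross_difference_rate(msg), 1)
msg_ndr = round(net_difference_rate(msg), 1)
dunhill_gdr = round(gross_difference_rate(dunhill), 1)
dunhill_ndr = round(net_difference_rate(dunhill), 1)
```

Under these assumptions the computed values agree with the published Baltimore County figures (24.4 and 18.8 for MSG, 28.0 and -5.3 for Dunhill). A positive NDR indicates that the file overstates ownership relative to the survey, and a negative NDR indicates that it understates ownership.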
Table 15. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Howard County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 66.1% (6.1) 54 6305 | 14.7% (4.3) 14 1404 | 80.8% (6.8) 68 7710 | 
| Rent Standard error Unweighted n Weighted N | 5.0% (4.1) 2 475 | 14.3% (3.7) 10 1360 | 19.2% (6.8) 12 1834 | 
| Total** Standard error Unweighted n Weighted N | 71.0% (5.0) 56 6780 | 29.0% (5.0) 24 2764 | 100.0% - 80 9544 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 48.7% (5.8) 43 4647 | 10.9% (5.3) 9 1039 | 59.6% (8.6) 52 5686 | 
| Rent Standard error Unweighted n Weighted N | 22.4% (6.0) 13 2133 | 18.1% (5.2) 15 1725 | 40.4% (8.6) 28 3858 | 
| Total** Standard error Unweighted n Weighted N | 71.0% (5.0) 56 6780 | 29.0% (5.0) 24 2764 | 100.0% - 80 9544 | 
| | GDR | SE+ | NDR | SE+ |
| Howard County, MD | ||||
| MSG | 19.7 | 3.8 | 9.7 | 7.5 |
| Dunhill | 33.2 | 5.0 | -11.5 | 10.2 |
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table 15. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Queen Anne’s County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 78.5% (6.4) 52 1056 | 20.6% (6.5) 14 278 | 99.1% (0.9) 66 1333 | 
| Rent Standard error Unweighted n Weighted N | 0.9% (0.9) 1 12 | 0% - 0 0 | 0.9% (0.9) 1 12 | 
| Total** Standard error Unweighted n Weighted N | 79.4% (6.5) 53 1067 | 20.6% (6.5) 14 278 | 100.0% - 67 1345 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 67.4% (6.7) 45 906 | 11.3% (4.0) 8 151 | 78.6% (6.1) 53 1058 | 
| Rent Standard error Unweighted n Weighted N | 12.0% (3.8) 8 161 | 9.4% (4.5) 6 126 | 21.4% (6.1) 14 287 | 
| Total** Standard error Unweighted n Weighted N | 79.4% (6.5) 53 1067 | 20.6% (6.5) 14 278 | 100.0% - 67 1345 | 
| | GDR | SE+ | NDR | SE+ |
| Queen Anne’s County, MD | ||||
| MSG | 21.5 | 6.4 | 19.8 | 6.6 |
| Dunhill | 23.2 | 4.1 | -0.7 | 6.6 |
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table 15. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Hanover County, VA | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 74.5% (12.6) 36 2116 | 8.1% (4.0) 5 229 | 82.5% (13.9) 41 2345 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 17.5% (13.9) 7 497 | 17.5% (13.9) 7 497 | 
| Total** Standard error Unweighted n Weighted N | 74.5% (12.6) 36 2116 | 25.6% (12.6) 12 726 | 100.0% - 48 2842 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 67.1% (13.7) 32 1906 | 10.4% (3.9) 5 297 | 77.5% (11.6) 37 2203 | 
| Rent Standard error Unweighted n Weighted N | 7.4% (2.6) 4 209 | 15.1% (10.2) 7 429 | 22.5% (11.6) 11 639 | 
| Total** Standard error Unweighted n Weighted N | 74.5% (12.6) 36 2116 | 25.6% (12.6) 12 726 | 100.0% - 48 2842 | 
| | GDR | SE+ | NDR | SE+ |
| Hanover County, VA | ||||
| MSG | 8.1 | 4.0 | 8.1 | 4.0 |
| Dunhill | 17.8 | 4.6 | 3.1 | 4.9 |
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table 15. (cont) Collapsed MSG and Dunhill Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total)*.
| | Survey Report | | |
| | Own | Rent | Total** |
| Henrico County, VA | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 76.3% (3.7) 129 7288 | 20.5% (3.9) 34 1960 | 96.9% (1.1) 163 9248 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 3.2% (1.1) 7 301 | 3.2% (1.1) 7 301 | 
| Total** Standard error Unweighted n Weighted N | 76.3% (3.7) 129 7288 | 23.7% (3.7) 41 2261 | 100.0% - 170 9549 | 
| Dunhill | |||
| Own Standard error Unweighted n Weighted N | 59.1% (3.6) 102 5639 | 11.9% (2.6) 20 1133 | 70.9% (2.9) 122 6772 | 
| Rent Standard error Unweighted n Weighted N | 17.3% (3.2) 27 1650 | 11.8% (2.5) 21 1128 | 29.1% (2.9) 48 2778 | 
| Total** Standard error Unweighted n Weighted N | 76.3% (3.7) 129 7288 | 23.7% (3.7) 41 2261 | 100.0% - 170 9549 | 
| | GDR | SE+ | NDR | SE+ |
| Henrico County, VA | ||||
| MSG | 20.5 | 3.9 | 20.5 | 3.9 |
| Dunhill | 29.1 | 3.7 | -5.4 | 4.5 |
* Excludes units that did not match Dunhill file.
** Totals may not add up due to rounding.
+ SE = Standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
APPENDIX A
SUPPLEMENTARY TABLES
Table A1. Collapsed MSG Tenure Status by Survey Report of Tenure Status for Mode Groups (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| In-Person | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 67.7% (3.8) 144 25308 | 24.8% (3.7) 46 9262 | 92.5% (2.6) 190 34570 | 
| Rent Standard error Unweighted n Weighted N | 3.5% (1.5) 5 1319 | 4.0% (1.9) 8 1506 | 7.6% (2.6) 13 2825 | 
| Total* Standard error Unweighted n Weighted N | 71.2% (3.6) 149 26627 | 28.8% (3.6) 54 10767 | 100.0% - 203 37394 | 
| Telephone | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 70.4% (2.1) 439 24848 | 21.9% (1.9) 132 7737 | 92.3% (1.1) 571 32585 | 
| Rent Standard error Unweighted n Weighted N | 1.9% (0.6) 11 651 | 5.8% (1.0) 31 2059 | 7.7% (1.1) 42 2711 | 
| Total* Standard error Unweighted n Weighted N | 72.2% (2.0) 450 25499 | 27.8% (2.0) 163 9797 | 100.0% - 613 35296 | 
| | GDR | SE+ | NDR | SE+ |
| In-Person | ||||
| MSG | 28.3 | 3.8 | 21.2 | 4.2 |
| Telephone | ||||
| MSG | 23.8 | 2.1 | 20.1 | 2.0 |
* Totals may not add up due to rounding.
+ SE = standard error.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table A2. Collapsed MSG Tenure Status by Survey Report of Tenure Status for Percent Renters on Block (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| 0% - 10% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 78.4% (3.6) 325 28780 | 17.0% (3.5) 56 6225 | 95.4% (1.5) 381 35005 | 
| Rent Standard error Unweighted n Weighted N | 1.2% (0.8) 4 450 | 3.4% (1.3) 10 1241 | 4.6% (1.5) 14 1691 | 
| Total* Standard error Unweighted n Weighted N | 79.7% (3.8) 329 29230 | 20.4% (3.8) 66 7466 | 100.0% - 395 36696 | 
| 11% - 20% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 69.6% (3.9) 162 13928 | 24.3% (3.4) 56 4870 | 94.0% (1.9) 218 18798 | 
| Rent Standard error Unweighted n Weighted N | 2.8% (1.5) 6 553 | 3.3% (1.1) 9 655 | 6.0% (1.9) 15 1209 | 
| Total* Standard error Unweighted n Weighted N | 72.4% (3.5) 168 14482 | 27.6% (3.5) 65 5525 | 100.0% - 233 20007 | 
| | GDR | SE+ | NDR | SE+ |
| 0% - 10% Renter | ||||
| MSG | 18.2 | 3.4 | 15.7 | 3.8 |
| 11% - 20% Renter | ||||
| MSG | 27.1 | 3.7 | 21.6 | 3.6 |
* Totals may not add up due to rounding.
+ SE = standard error (based on unweighted sample size).
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table A2. (cont) Collapsed MSG Tenure Status by Survey Report of Tenure Status for Percent Renters on Block (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| 21% - 30% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 53.8% (6.7) 69 4851 | 27.6% (5.3) 39 2490 | 81.4% (8.8) 108 7341 | 
| Rent Standard error Unweighted n Weighted N | 7.3% (4.3) 5 660 | 11.3% (5.8) 9 1015 | 18.6% (8.8) 14 1675 | 
| Total* Standard error Unweighted n Weighted N | 61.1% (4.9) 74 5511 | 38.9% (4.9) 48 3505 | 100.0% - 122 9017 | 
| 31% - 40% Renter | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 37.3% (6.5) 27 2597 | 49.0% (7.7) 27 3414 | 86.2% (5.4) 54 6011 | 
| Rent Standard error Unweighted n Weighted N | 4.4% (4.4) 1 306 | 9.4% (2.9) 11 654 | 13.8% (5.4) 12 960 | 
| Total* Standard error Unweighted n Weighted N | 41.7% (7.1) 28 2903 | 58.4% (7.1) 38 4067 | 100.0% - 66 6971 | 
| | GDR | SE+ | NDR | SE+ |
| 21% - 30% Renter | ||||
| MSG | 34.9 | 4.9 | 20.3 | 8.3 |
| 31% - 40% Renter | ||||
| MSG | 49.0 | 7.7 | 49.0 | 7.7 |
* Totals may not add up due to rounding.
+ SE = standard error (based on unweighted sample size).
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table A3. Collapsed MSG Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| Baltimore County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 67.8% (3.7) 195 26945 | 26.0% (3.4) 69 10341 | 93.8% (1.6) 264 37286 | 
| Rent Standard error Unweighted n Weighted N | 3.0% (1.2) 8 1201 | 3.2% (1.2) 13 1285 | 6.3% (1.6) 21 2487 | 
| Total* Standard error Unweighted n Weighted N | 70.8% (3.5) 203 28146 | 29.2% (3.5) 82 11627 | 100.0% - 285 39773 | 
| Howard County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 62.6% (2.9) 67 7694 | 21.0% (4.4) 22 2588 | 83.6% (6.1) 89 10283 | 
| Rent Standard error Unweighted n Weighted N | 4.6% (3.3) 3 566 | 11.8% (3.6) 11 1451 | 16.4% (6.1) 14 2018 | 
| Total* Standard error Unweighted n Weighted N | 67.2% (2.5) 70 8261 | 32.8% (2.5) 33 4040 | 100.0% - 103 12300 | 
| | GDR | SE+ | NDR | SE+ |
| Baltimore County, MD | ||||
| MSG | 29.0 | 3.7 | 23.0 | 3.5 |
| Howard County, MD | ||||
| MSG | 25.7 | 2.9 | 16.4 | 7.2 |
* Totals may not add up due to rounding.
+ SE = standard error based on unweighted sample size.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table A3. (cont) Collapsed MSG Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| Queen Anne’s County, MD | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 73.0% (5.2) 67 1329 | 25.8% (5.2) 24 470 | 98.7% (0.9) 91 1799 | 
| Rent Standard error Unweighted n Weighted N | 1.3% (0.9) 2 23 | 0% - 0 0 | 1.3% (0.9) 2 23 | 
| Total* Standard error Unweighted n Weighted N | 74.2% (5.2) 69 1353 | 25.8% (5.2) 24 470 | 100.0% - 93 1822 | 
| Hanover County, VA | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 70.7% (6.9) 56 3406 | 18.4% (7.6) 15 884 | 89.1% (8.7) 71 4290 | 
| Rent Standard error Unweighted n Weighted N | 0% - 0 0 | 10.9% (8.7) 8 527 | 10.9% (8.7) 8 527 | 
| Total* Standard error Unweighted n Weighted N | 70.7% (6.9) 56 3406 | 29.3% (6.9) 23 1411 | 100.0% - 79 4818 | 
| | GDR | SE+ | NDR | SE+ |
| Queen Anne’s County, MD | ||||
| MSG | 27.1 | 5.2 | 24.5 | 5.2 |
| Hanover County, VA | ||||
| MSG | 18.4 | 7.6 | 18.4 | 7.6 |
* Totals may not add up due to rounding.
+ SE = standard error based on unweighted sample size.
GDR = Gross Difference Rate; NDR = Net Difference Rate
Table A3. (cont) Collapsed MSG Tenure Status by Survey Report of Tenure Status for Each County (Percent of Total).
| | Survey Report | | |
| | Own | Rent | Total* |
| Henrico County, VA | |||
| MSG | |||
| Own Standard error Unweighted n Weighted N | 77.1% (3.1) 198 10781 | 19.4% (3.0) 48 2716 | 96.6% (1.1) 246 13497 | 
| Rent Standard error Unweighted n Weighted N | 1.3% (0.8) 3 179 | 2.2% (0.8) 7 301 | 3.4% (1.1) 10 480 | 
| Total* Standard error Unweighted n Weighted N | 78.4% (2.8) 201 10960 | 21.6% (2.8) 55 3017 | 100.0% - 256 13977 | 
| | GDR | SE+ | NDR | SE+ |
| Henrico County, VA | ||||
| MSG | 20.7 | 3.2 | 18.2 | 3.0 |
* Totals may not add up due to rounding.
+ SE = Standard error based on unweighted sample size.
GDR = Gross Difference Rate; NDR = Net Difference Rate
APPENDIX B
TELEPHONE QUESTIONNAIRE
CPI Housing Tenure Survey
TELEPHONE VERSION
Hello, my name is [NAME] and I am calling for the Bureau of Labor Statistics, an agency of the U.S. government. We would like to conduct a 3 minute survey with an adult in this household.
[IF NEEDED: We recently sent you a letter about this study, which is examining ways to improve how the government measures housing costs.]
Before I begin, I need to verify your address.
1. Is this [RECITE ADDRESS FROM ASSIGNMENT LABEL]?
 YES
 NO GO TO END
2. Is this address for a business or a residence?
 RESIDENCE
 BUSINESS GO TO END
3. May I please speak with someone who is at least 18 years old and who lives at this address?
 YES
 NO GO TO END
[OPTIONAL: I’d like to speak with someone who is at least 18 years old and who lives at this address. Would that be you?]
The purpose of this study is to help improve the way the government collects information about housing costs.
I work for a research company called Westat. We are conducting this study for the Bureau of Labor Statistics. Westat and the Bureau of Labor Statistics will use the information you provide for statistical purposes only and will hold the information in confidence to the full extent permitted by law.
I have just a few questions to ask you about your home.
4. First, is this house or apartment owned or being bought by you or someone in your household?
 YES GO TO Q7
 NO
5. Is this house or apartment being rented by you or someone in your household?
 YES
 NO GO TO END
6. How much is your current monthly rent?
$ GO TO END
 DON’T KNOW GO TO END
 REFUSED GO TO END
7. If your home were to be rented out, about how much would it rent for per month?
$
 DON’T KNOW
 REFUSED
8. What is the least you would accept in rent?
$
 DON’T KNOW
 REFUSED
9. Do you currently make a mortgage payment?
 YES
 NO GO TO END
10. What is your mortgage payment each month?
$
 DON’T KNOW
 REFUSED
END: Those are all the questions I have for you today. Thank you very much for your participation.
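The skip pattern of the telephone questionnaire above can be summarized in a short routing sketch. This is an illustration only, not the system used to administer the survey; the names `SKIPS`, `ALWAYS_END`, `next_item`, and `interview_path` are hypothetical. Any response without an explicit skip instruction is assumed to continue to the next numbered item.

```python
# Item order as printed in the telephone version.
ORDER = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Q7", "Q8", "Q9", "Q10"]

# Explicit skip instructions printed next to specific response options.
SKIPS = {
    ("Q1", "NO"): "END",        # address not confirmed
    ("Q2", "BUSINESS"): "END",  # not a residence
    ("Q3", "NO"): "END",        # no adult resident available
    ("Q4", "YES"): "Q7",        # owners skip the rent items
    ("Q5", "NO"): "END",        # neither owned nor rented
    ("Q9", "NO"): "END",        # no mortgage payment
}

# Items after which the interview ends regardless of the response.
ALWAYS_END = {"Q6", "Q10"}


def next_item(current, answer):
    """Return the next item to ask, or "END"."""
    if current in ALWAYS_END:
        return "END"
    if (current, answer) in SKIPS:
        return SKIPS[(current, answer)]
    return ORDER[ORDER.index(current) + 1]


def interview_path(answers):
    """List the items asked, given a dict of item -> response."""
    asked = []
    item = "Q1"
    while item != "END":
        asked.append(item)
        item = next_item(item, answers.get(item))
    return asked


owner_path = interview_path(
    {"Q1": "YES", "Q2": "RESIDENCE", "Q3": "YES", "Q4": "YES", "Q9": "YES"}
)
renter_path = interview_path(
    {"Q1": "YES", "Q2": "RESIDENCE", "Q3": "YES", "Q4": "NO", "Q5": "YES"}
)
```

In this sketch an owner with a mortgage is routed through Q1 to Q4 and then Q7 to Q10, while a renter is routed through Q1 to Q6. The in-person version in Appendix C follows the same pattern without the address and residence checks, with the items renumbered accordingly.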
APPENDIX C
IN-PERSON QUESTIONNAIRE
CPI Housing Tenure Survey
IN-PERSON VERSION
Hello, my name is [NAME] and I’m here for the Bureau of Labor Statistics, an agency of the U.S. government. We would like to conduct a 3 minute survey with an adult in this household.
[IF NEEDED: We recently sent you a letter about this study, which is examining ways to improve how the government measures housing costs.]
1. May I please speak with someone who is at least 18 years old and who lives here?
 YES
 NO GO TO END
[OPTIONAL: I’d like to speak with someone who is at least 18 years old and who lives here. Would that be you?]
The purpose of this study is to help improve the way the government collects information about housing costs.
I work for a research company called Westat. We are conducting this study for the Bureau of Labor Statistics. Westat and the Bureau of Labor Statistics will use the information you provide for statistical purposes only and will hold the information in confidence to the full extent permitted by law.
I have just a few questions to ask you about your home.
2. First, is this house or apartment owned or being bought by you or someone in your household?
 YES GO TO Q5
 NO
3. Is this house or apartment being rented by you or someone in your household?
 YES
 NO GO TO END
4. How much is your current monthly rent?
$ GO TO END
 DON’T KNOW GO TO END
 REFUSED GO TO END
5. If your home were to be rented out, about how much would it rent for per month?
$
 DON’T KNOW
 REFUSED
6. What is the least you would accept in rent?
$
 DON’T KNOW
 REFUSED
7. Do you currently make a mortgage payment?
 YES
 NO GO TO END
8. What is your mortgage payment each month?
$
 DON’T KNOW
 REFUSED
END: Those are all the questions I have for you today. Thank you very much for your participation.
APPENDIX D
TRAINING AGENDA
CPI Housing Tenure Survey
Interviewer Training Agenda
October 14, 2002
9:00 am ID Badges
10:00 am Staff Introductions
10:15 am Project Overview
10:30 am The Case Folder
11:30 am The Questionnaires
12:15 pm LUNCH and ID Badges
1:00 pm Contact Procedures and Frequently Asked Questions
2:00 pm Role Plays (Contact and Questionnaire)
3:00 pm Administrative Procedures
-Interviewer Edit
-Mailing Procedures
-Weekly Report Calls
-Time Sheets
4:30 pm Questions and Wrap-up
APPENDIX E
ADVANCE LETTER
 
  
 
  
1 As stated in the Decision Paper for the Continuous Updating initiative, “By FY 2003, the goal is to produce a new plan for carrying out these activities that will include the level of effort and resources required. These planning activities will, of necessity, also address the implications of a continuous CPI revision for the Consumer Expenditure Survey program.”
	
2 Note that this document describes the original 86-area design. The final initiative, with a reduced funding level, is based on a 75-area design. The costs and benefits for the IT components do not change as a result of this change in area design, and the document has not been updated to reflect the smaller area design.
3 Prior to the 1998 CPI Revision, non-self-representing metropolitan areas were published in two size strata in each region. These B and C strata were combined in 1998 and designated the B/C strata to facilitate comparison with earlier published data.
4 Experian indicated they may update their files at some later date.
5 Some census blocks contain no occupied housing units, and those blocks were excluded from the analysis. Including blocks with no occupied housing units, there are 16,549 blocks in the SF1.
6 Clearly, some changes have happened between 2000 and 2002 but we have no means of evaluating these changes.
7 The means are not identical to ratios that could be computed from Table 1 because each block is counted equally in the means.
8 The original MSG file had a value of ‘9’. As noted in the sample design section, these cases were eliminated from the sample frame.
9 See Appendix A for the corresponding tables comparing the MSG data with survey reports for the entire sample.
	 
		
	
| File Type | application/msword | 
| File Title | Outline for Report on PSU and Housing Continuous Revision | 
| Author | John S. Greenlees | 
| Last Modified By | Mason_C | 
| File Modified | 2005-06-17 | 
| File Created | 2005-06-16 |