NATIONAL CENTER FOR EDUCATION STATISTICS NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS
	
	
National Assessment of Educational Progress (NAEP)
2025 Long-Term Trend (LTT) Clearance Package
	
	
	
	
	
	
Appendix C
NAEP 2025 Long-Term Trend (LTT) Sampling Memo
	
	
	
	
OMB# 1850-0928 v.35
	
	
	
	
 
	
	
	
June 2024
	
	
| Date: | February 27, 2024 | 
| Memo: | 2025-m01v01s | 
| To: | William Ward, NCES; Amy Dresher, ETS; David Freund, ETS; Yue Jia, ETS; Pat Stearns, Pearson; Lauren Byrne; Rob Dymowski; Marcie Hickman; Jacquie Hogan; Kavemuii Murangi; Lisa Rodriguez; William Wall; Lee Harding; Veronique Lieber; Amy Lin; Yiting Long | 
| From: | Leslie Wallace | 
| Reviewers: | Tom Krenzke, Keith Rust, Lloyd Hicks | 
| Subject: | Sample Design for 2025 NAEP year – Overview – DRAFT | 
1. Introduction
This memo describes the sample design as discussed at the NAEP 2025 Design Summit held on February 9, 2024. Subsequent discussions have explored possible changes to this design; however, none of those changes has been approved at the time of this writing.
For the 2025 NAEP year, the sample design involves two major programs, one for the Long-Term Trend (LTT) and one for the Field Test (FT). The purpose of the Field Test is to learn more about the implications and consequences of moving from NAEP-provided devices to school-based equipment in the administration of student assessments. These two programs involve the following components:
National long-term trend assessments in mathematics and reading at ages 9, 13, and 17; and
Field test assessments in mathematics and reading at grades 4, 8, and 12.
Below is a summary list of the features of the 2025 sample design.
LTT is a paper-and-pencil assessment.
LTT will be administered in public and private schools located in the 50 states and the District of Columbia.
LTT will employ a four-stage design: selecting primary sampling units (PSUs), selecting schools, selecting students, and assigning a subject to each student.
LTT age 13 will be conducted in fall 2024 (October – December), age 9 in winter 2025 (January – March), and age 17 in spring 2025 (March – May).
The LTT samples will employ moderate oversampling of public schools with moderate-to-high proportions of Black, Hispanic, and American Indian/Alaska Native (AIAN) students. There will also be limited oversampling of Black, Hispanic, and AIAN students within some public schools.
FT is a digitally based assessment administered on computer laptops or tablets: either NAEP-provided Chromebooks (CB) or school-based equipment (SBE).
FT will employ a four-stage two-phase design. In Phase 1, a large sample of schools will be selected and screened for SBE-eligibility (including touch screen). In Phase 2, eligible schools will be subsampled to yield the desired numbers of assessed students and participating schools by device treatment (CB/SBE) shown in Table 2. Note that some SBE-eligible schools will be assigned to use NAEP-provided Chromebooks.
FT will primarily be administered in public schools located in the 50 states, the District of Columbia, and Puerto Rico (mathematics only at grades 4 and 8).
The FT samples will not employ any oversampling related to Black, Hispanic, or AIAN students.
Administering FT on a small scale in private schools and Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS) in a handful of locations across the country has been discussed since the Design Summit, but not decided upon.
At the Design Summit the possibility of a Field Trial for the 2025 Field Test was raised. The Alliance recommends not conducting such a Field Trial, but this decision is pending. A Field Trial is not included in the remainder of this memo.
The target sample sizes of assessed students are shown in Tables 1 and 2 (which also show the approximate numbers of participating schools) by program and school type (and device treatment for Field Test).
Table 1. Target sample sizes of assessed students and expected number of participating schools for 2025 NAEP LTT
| | Program | Public school students | Private school students | Total students | Season fielded | 
| Age 9 | | | | | | 
| LTT Math (O) | LTT | 7,200 | 800 | 8,000 | Winter | 
| LTT Reading (O) | LTT | 7,200 | 800 | 8,000 | Winter | 
| Subtotal | LTT | 14,400 | 1,600 | 16,000 | Winter | 
| Schools | LTT | 369 | 80 | 449 | Winter | 
| Age 13 | | | | | | 
| LTT Math (O) | LTT | 7,200 | 800 | 8,000 | Fall | 
| LTT Reading (O) | LTT | 7,200 | 800 | 8,000 | Fall | 
| Subtotal | LTT | 14,400 | 1,600 | 16,000 | Fall | 
| Schools | LTT | 400 | 80 | 480 | Fall | 
| Age 17 | | | | | | 
| LTT Math (O) | LTT | 7,400 | 600 | 8,000 | Spring | 
| LTT Reading (O) | LTT | 7,400 | 600 | 8,000 | Spring | 
| Subtotal | LTT | 14,800 | 1,200 | 16,000 | Spring | 
| Schools | LTT | 423 | 48 | 471 | Spring | 
| LTT SUBTOTAL | LTT | 43,600 | 4,400 | 48,000 | | 
| Schools | LTT | 1,192 | 208 | 1,400 | | 
(O) = Operational
Table 2. Target sample sizes of assessed students and expected number of participating schools for 2025 NAEP Field Test
| Components | Program | Public school students (CB) | Public school students (SBE) | Private school students | Total | Season fielded | 
| Grade 4 | | | | | | | 
| Math | FT | 1,500 | 4,950 | 0 | 6,450 | Winter | 
| Reading | FT | 1,000 | 3,300 | 0 | 4,300 | Winter | 
| Subtotal | FT | 2,500 | 8,250 | 0 | 10,750 | Winter | 
| Schools | FT | 64 | 212 | 0 | 276 | Winter | 
| Math – Puerto Rico | FT | 250 | 750 | 0 | 1,000 | Winter | 
| Schools | FT | 13 | 37 | 0 | 50 | Winter | 
| Grade 4 Subtotal | FT | 2,750 | 9,000 | 0 | 11,750 | Winter | 
| Schools | FT | 77 | 249 | 0 | 326 | Winter | 
| Grade 8 | | | | | | | 
| Math | FT | 1,500 | 4,950 | 0 | 6,450 | Winter | 
| Reading | FT | 1,000 | 3,300 | 0 | 4,300 | Winter | 
| Subtotal | FT | 2,500 | 8,250 | 0 | 10,750 | Winter | 
| Schools | FT | 63 | 206 | 0 | 269 | Winter | 
| Math – Puerto Rico | FT | 250 | 750 | 0 | 1,000 | Winter | 
| Schools | FT | 13 | 37 | 0 | 50 | Winter | 
| Grade 8 Subtotal | FT | 2,750 | 9,000 | 0 | 11,750 | Winter | 
| Schools | FT | 76 | 243 | 0 | 319 | Winter | 
| Grade 12 | | | | | | | 
| Math | FT | 1,500 | 4,950 | 0 | 6,450 | Winter | 
| Reading | FT | 1,000 | 3,300 | 0 | 4,300 | Winter | 
| Grade 12 Subtotal | FT | 2,500 | 8,250 | 0 | 10,750 | Winter | 
| Schools | FT | 71 | 236 | 0 | 307 | Winter | 
| FT SUBTOTAL | FT | 8,000 | 26,250 | 0 | 34,250 | | 
| Schools | FT | 224 | 728 | 0 | 952 | | 
| 2025 GRAND TOTAL (including LTT) | | 77,850 (all public school students) | | 4,400 | 82,250 | | 
| Schools | | 2,144 (all public schools) | | 208 | 2,352 | | 
(CB) = NAEP-provided Chromebook, (SBE) = School-based equipment
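As a cross-check on Tables 1 and 2, the subtotals and grand totals can be recomputed from the age-level and grade-level rows. The short Python sketch below uses figures copied from the two tables; the dictionary layout and the names are illustrative only and are not part of the design.

```python
# Assessed-student and school targets copied from Tables 1 and 2.
# The data layout and all names here are illustrative only.
ltt_students = {            # age -> (public, private)
    9: (14_400, 1_600),
    13: (14_400, 1_600),
    17: (14_800, 1_200),
}
ltt_schools = {9: (369, 80), 13: (400, 80), 17: (423, 48)}

ft_students = {             # grade -> (CB, SBE), Puerto Rico included
    4: (2_750, 9_000),
    8: (2_750, 9_000),
    12: (2_500, 8_250),
}
ft_schools = {4: (77, 249), 8: (76, 243), 12: (71, 236)}


def column_sums(table):
    """Sum each position of the tuples in a {key: tuple} mapping."""
    return tuple(sum(col) for col in zip(*table.values()))


ltt_public, ltt_private = column_sums(ltt_students)
ft_cb, ft_sbe = column_sums(ft_students)

# All Field Test students are in public schools, so the public grand
# total is LTT public plus both Field Test device treatments.
grand_public = ltt_public + ft_cb + ft_sbe
grand_total = grand_public + ltt_private
grand_schools = sum(column_sums(ltt_schools)) + sum(column_sums(ft_schools))

print(ltt_public, ltt_private)      # 43600 4400
print(ft_cb, ft_sbe)                # 8000 26250
print(grand_public, grand_total)    # 77850 82250
print(grand_schools)                # 2352
```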
2. Assessment Types
The assessment types for 2025 are shown in Table 3. As mentioned in Section 1, the operational assessment for LTT includes mathematics and reading in public and private schools. The Field Test includes mathematics and reading in grades 4, 8, and 12 in mainland public and private (pending approval) schools. In Puerto Rico public schools, mathematics will be administered in grades 4 and 8.
Table 3. NAEP 2025 assessment types
| Type | Subjects | Age or grade | Schools | Comments | 
| Operational | Math, reading | Ages 9, 13, 17 | Public, Private | LTT sample | 
| Field Test | Math, reading | Grades 4, 8, 12 | Public, Private (pending approval) | Mainland FT samples | 
| Field Test | Math | Grades 4, 8 | Public | Puerto Rico FT samples | 
3. Sample Design and Sample Sizes
As mentioned above, for the 2025 NAEP year, there are two major sample types: LTT and Field Test. While the detailed target counts of LTT and Field Test assessed students are provided in Tables 1 and 2, a summary of major points is included in the sections that follow.
3.1 The LTT Sample
The LTT sample consists of the public and private school samples for the operational assessments in reading and mathematics for students aged 9, 13, and 17. The target student sample size is 16,000 assessed students per age, or 48,000 students total. Note that for ages 9 and 13, 10 percent of the assessed students are allocated to private schools. This roughly represents a proportional sample, as about 10 percent of the student population attends private schools. For age 17, slightly more of the sample is allocated to public schools and less to private schools to try to ensure reportable results (for public and private combined) based on anticipated response rates. Consequently, for all three ages the sample size for private schools is only large enough for reliable reporting at the overall private school level.
Primary Sampling Units Selection
The sample for the LTT assessments is based on a clustered design using LTT primary sampling units (PSUs) for reasons of operational efficiency. A sample of 108 PSUs has been selected from a frame of approximately 980 PSUs. All sampled schools will be drawn from within the sampled PSUs.
The PSUs were created from aggregates of counties. Data on counties were obtained from the 2020 Census Demographic and Housing Characteristics File (DHC). Each Metropolitan Statistical Area (MeSA) constitutes a PSU, except that MeSAs that cross state boundaries were split into separate PSUs according to Census region boundaries. Non-metropolitan PSUs were formed by aggregating counties into geographic units of sufficient minimum size to provide enough schools to constitute a workload of about 1% of the total sample. These PSUs were made of contiguous counties where possible, and nearly-contiguous counties (separated by MeSA counties) otherwise.
The PSUs were stratified using an index that distinguished PSUs based on homogeneity. The index was generated by a LASSO regression that identified PSU characteristics related to NAEP achievement in past assessments among a set of variables representative of NAEP reporting subgroups. A sample of 108 PSUs was selected for the 2025 LTT samples. Thirty-two large MeSAs were selected with certainty, and the remaining sample was a stratified probability proportional to size (PPS) sample, where the size measure was a function of the number of children as given in the 2020 Decennial Census. Note that 2025 marks the first NAEP year with PSUs selected from the new 2020-Census-based PSU frame.
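The combination of certainty selections and a PPS draw can be illustrated with a small Python sketch. It is not the procedure used to draw the 108 PSUs: the function name, the toy frame, and the rule that a unit becomes a certainty selection when its size is at least the sampling interval are all assumptions made for illustration, and the explicit stratification described above is omitted.

```python
import random


def pps_with_certainty(units, n, seed=2025):
    """Select n units with probability proportional to size (PPS).

    units: list of (name, size) pairs in frame order.
    Returns (certainty_names, sampled_names).
    """
    rng = random.Random(seed)
    certainty, remaining = [], list(units)

    # Assumption: a unit is a certainty selection when its size is at
    # least the sampling interval. Repeat, because removing large units
    # shrinks the interval for the units that are left.
    while True:
        k = n - len(certainty)
        if k <= 0 or not remaining:
            return [name for name, _ in certainty], []
        interval = sum(size for _, size in remaining) / k
        large = [u for u in remaining if u[1] >= interval]
        if not large:
            break
        certainty.extend(large)
        remaining = [u for u in remaining if u[1] < interval]

    # Systematic PPS: a random start, then equally spaced selection
    # points laid over the cumulated sizes of the remaining units.
    start = rng.random() * interval
    points = [start + i * interval for i in range(k)]
    sampled, cumulated, next_point = [], 0.0, 0
    for name, size in remaining:
        cumulated += size
        while next_point < k and points[next_point] < cumulated:
            sampled.append(name)
            next_point += 1
    return [name for name, _ in certainty], sampled


# Toy frame: two very large units and five small ones (made-up sizes).
frame = [("A", 500), ("B", 400), ("C", 30), ("D", 25),
         ("E", 20), ("F", 15), ("G", 10)]
certainty, sampled = pps_with_certainty(frame, n=4)
print(certainty)  # ['A', 'B']
print(sampled)    # two of C..G, depending on the random start
```

In this toy frame, A and B exceed the first interval of 250 and are taken with certainty; the other two selections are then drawn from the five small units with an interval of 50.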
Overlap Control at School-Level
A desired feature of the LTT sample is to avoid overlap with other samples, including the Program for International Student Assessment (PISA), which will be conducted in the spring of 2025. However, the timing of the NAEP LTT and PISA school sampling activities makes coordinating such overlap control difficult for 2025. Under the current schedule, the PISA school frame (including school probabilities of selection) will be constructed in March, the NAEP LTT samples will be selected in March or April, and the PISA schools will be selected in May. Even if overlap control can be implemented, it is likely that not all schools from the PISA sample will be avoided, because some PSUs have only a small number of schools on the frame.
Stratification and Oversampling
The plan for the LTT sample design is to draw separate public and private school samples for each age, which has proven advantages:
it permits the timing of sample selection to vary between public and private schools, should this prove necessary;
it allows us to readily assume different response and eligibility rates for public and private schools;
it makes it easier to use different sort variables in the selection of public schools and private schools; and
it allows for a late change to the sample sizes that differs between public and private schools.
For the LTT samples, explicit stratification has taken place at the PSU level. For schools within PSUs, stratification gains are achieved by sorting the school file for each age prior to systematic selection. As in past national samples, the expectation is that within the set of certainty MeSA PSUs within a census region, PSU will not necessarily be the highest level sort variable. Thus, type of location will be used as the primary sort variable. Consider for example the large MeSAs in the Midwest region. The design is aimed primarily at getting the correct balance of city, suburban, town, and rural schools crossed by city size and distance from urbanized areas, as a priority over getting exactly a proportional representation from each MeSA (Chicago, Detroit, Minneapolis), although of course it should be possible to get a high degree of control over both of these characteristics. The sort of the schools will use other variables beyond the type of location variable, such as a race/ethnicity percentage variable. The exact set of variables used in sorting the schools prior to sampling will be specified in the particular sampling specification memos.
In addition, for the LTT samples, we will implement oversampling of certain public schools. To increase the likelihood that the results for American Indian/Alaska Native (AIAN) students can be reported for the operational samples, we will oversample high-AIAN public schools. That is, a public school with 5 percent or more AIAN enrollment will be given four times the chance of selection of a public school of the same size with a lower AIAN percentage. Research into oversampling schemes that could benefit AIAN students indicates that this approach should be effective in increasing the sample sizes of AIAN students, without inducing undesirably large design effects on the sample, either overall or for particular subgroups. In addition, high minority public schools for LTT that are not oversampled for AIAN enrollment will be oversampled for Black and Hispanic enrollment. That is, a public school with 15 percent or more Black and Hispanic combined enrollment will be given twice the chance of selection of a public school of the same size with a lower percentage of these two groups. This approach is effective in increasing the sample sizes of Black and Hispanic students, without inducing undesirably large design effects on the sample, either overall or for particular subgroups. Beyond this, we will also implement the oversampling of AIAN, Black, and Hispanic students at the student level in schools not being oversampled at the school level.
The preliminary 2022-23 CCD and the updated 2021-22 PSS school files serve as the basis for the public and private school frames for the LTT sample.
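The school-level oversampling rules described above can be expressed as a multiplier applied to a school's measure of size. The Python sketch below is one reading of those rules, not the specification: the function names are hypothetical, enrollment is used as a stand-in for the measure of size, and the factors are expressed relative to a public school of the same size that is not oversampled.

```python
def oversampling_factor(pct_aian, pct_black_hispanic):
    """Relative chance of selection for a public school in the LTT sample.

    Thresholds and factors come from the memo text; treating them as a
    single multiplier relative to a non-oversampled school of the same
    size is an assumption of this sketch.
    """
    if pct_aian >= 5.0:
        return 4.0          # high-AIAN public school
    if pct_black_hispanic >= 15.0:
        return 2.0          # high Black and Hispanic combined enrollment
    return 1.0              # not oversampled at the school level


def adjusted_measure_of_size(enrollment, pct_aian, pct_black_hispanic):
    """Hypothetical size measure after applying the oversampling factor."""
    return enrollment * oversampling_factor(pct_aian, pct_black_hispanic)


# Three made-up public schools, each with 100 age-eligible students.
print(adjusted_measure_of_size(100, pct_aian=6.0, pct_black_hispanic=40.0))  # 400.0
print(adjusted_measure_of_size(100, pct_aian=1.0, pct_black_hispanic=20.0))  # 200.0
print(adjusted_measure_of_size(100, pct_aian=1.0, pct_black_hispanic=5.0))   # 100.0
```

The AIAN check comes first because the memo applies the Black and Hispanic oversampling only to schools that are not already oversampled for AIAN enrollment.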
3.2 The Field Test Samples
The Field Test samples consist of public school samples for grades 4, 8, and 12. They will be used for learning more about the implications and consequences of moving from NAEP-provided devices to school-based equipment in the administration of student assessments. The Field Test samples will comprise the following:
The Field Test mathematics and reading session, in which the two subjects are spiraled, will occur at grades 4, 8, and 12. The digitally based assessment (DBA) will be device delivered and conducted in public schools in a sample of PSUs. The session has targets of 6,450 assessed students in mathematics and 4,300 assessed students in reading at each grade. A decision on whether to include private schools is pending.
The Field Test mathematics session in Puerto Rico will occur in grades 4 and 8. The DBA will be device delivered and conducted only in public schools. The session has a target of 1,000 assessed students at each grade.
Primary Sampling Units Selection
Similar to the LTT sample design, the samples for the Field Test assessments will be based on a clustered design using PSUs. A sample of 108 PSUs has been selected from a frame of approximately 980 PSUs (the same frame used to select the LTT PSUs). For the Field Test, the 32 largest MeSAs were again selected with certainty, and the remaining sample was a stratified probability proportional to size (PPS) sample selected in the same manner as that described above for the LTT noncertainty PSUs, with one exception: care was taken to minimize overlap between the noncertainty PSUs selected for Field Test and those selected for LTT. This procedure was largely successful; however, the two PSU samples have 37 PSUs in common (the 32 certainty PSUs plus 5 noncertainty PSUs). All sampled schools will be drawn from within the sampled PSUs.
Overlap Control at School-Level
The selection process for Field Test schools will avoid overlap with the LTT school samples. Because the Field Test does not require a nationally representative sample, overlap will be avoided by removing LTT schools from the school lists within each Field Test PSU. If possible, PISA schools will be removed from the Field Test school lists as well. The timing of the NAEP Field Test and PISA sampling activities may mean that this can be done for Phase 2, but not Phase 1. In addition, each school will conduct the Field Test in at most one grade.
Stratification and Oversampling
For the Field Test, explicit stratification will take place at the PSU level. Although a probability-based design will be used, deviations from a strict probability-based sample will occur due to avoiding overlap as discussed above. Stratification will be used for the purpose of having a good variety of area characteristics, as well as school and student characteristics to serve the purposes of the Field Test. For schools within PSUs, stratification will occur by sorting the school file prior to systematic selection. As in past national samples, the expectation is that, within the set of certainty MeSA PSUs within a census region, PSU will not necessarily be the highest level sort variable. Thus, type of location will be used as the primary sort variable. The design is aimed primarily at getting the correct balance of city, suburban, town, and rural schools, as a priority over getting exactly a proportional representation from each MeSA. The sort of the schools will use other variables beyond the type of location variable, such as a race/ethnicity percentage variable. The exact set of variables used in sorting the schools prior to sampling will be specified in the particular sampling specification memos.
No oversampling of schools or students will occur in the Field Test. Schools will be selected with probabilities proportionate to size.
For Puerto Rico, the sample will not be clustered. The sampling frame of schools will be stratified using the type of location variable. No oversampling of schools or students will occur in the Field Test in Puerto Rico. Schools will be selected with probabilities proportionate to size.
The preliminary 2022-23 CCD serves as the basis for the public school frames for the Field Test. If private schools are included, the 2021-22 PSS frame will be used.
Two-Phase Sample Design
As mentioned earlier, the Field Test sample will be selected in two phases. In Phase 1, a large sample of schools will be selected and screened for SBE-eligibility (including touch screen). In Phase 2, eligible schools will be subsampled to yield the desired numbers of assessed students and participating schools. Table 4 presents the number of sampled schools, the expected number of participating schools, and the expected number of assessed students by grade, device treatment, and phase for the mainland and Puerto Rico.
Table 4. NAEP 2025 two-phase design
| Location and grade | Device treatment | Phase 1 sampled schools | Phase 2 expected sampled schools | Phase 2 expected participating schools | Phase 2 expected assessed students | 
| Mainland grade 4 | CB | 1,400 (CB and SBE combined) | 390 (CB and SBE combined) | 64 | 2,500 | 
| Mainland grade 4 | SBE | | | 212 | 8,250 | 
| Mainland grade 8 | CB | 1,400 (CB and SBE combined) | 390 (CB and SBE combined) | 63 | 2,500 | 
| Mainland grade 8 | SBE | | | 206 | 8,250 | 
| Mainland grade 12 | CB | 1,600 (CB and SBE combined) | 390 (CB and SBE combined) | 71 | 2,500 | 
| Mainland grade 12 | SBE | | | 236 | 8,250 | 
| Puerto Rico grade 4 | CB | 200 (CB and SBE combined) | 70 (CB and SBE combined) | 13 | 250 | 
| Puerto Rico grade 4 | SBE | | | 37 | 750 | 
| Puerto Rico grade 8 | CB | 200 (CB and SBE combined) | 70 (CB and SBE combined) | 13 | 250 | 
| Puerto Rico grade 8 | SBE | | | 37 | 750 | 
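The ratios implied by Table 4 can be derived with a few lines of Python. The sketch below only divides figures copied from the table; the ratios themselves are not stated in the memo, the field names are illustrative, and it assumes that the Phase 1 and Phase 2 sampled-school counts cover both device treatments (CB and SBE) combined.

```python
# Figures copied from Table 4; keys and field names are illustrative.
# Assumption: phase1 and phase2 counts cover CB and SBE combined.
table4 = {
    "Mainland grade 4": dict(phase1=1_400, phase2=390,
                             schools=(64, 212), students=(2_500, 8_250)),
    "Mainland grade 8": dict(phase1=1_400, phase2=390,
                             schools=(63, 206), students=(2_500, 8_250)),
    "Mainland grade 12": dict(phase1=1_600, phase2=390,
                              schools=(71, 236), students=(2_500, 8_250)),
    "Puerto Rico grade 4": dict(phase1=200, phase2=70,
                                schools=(13, 37), students=(250, 750)),
    "Puerto Rico grade 8": dict(phase1=200, phase2=70,
                                schools=(13, 37), students=(250, 750)),
}

implied = {}
for label, row in table4.items():
    participating = sum(row["schools"])      # CB + SBE schools
    assessed = sum(row["students"])          # CB + SBE students
    implied[label] = {
        # Share of Phase 1 schools carried into the Phase 2 sample.
        "phase2_share": row["phase2"] / row["phase1"],
        # Expected participating schools per Phase 2 sampled school.
        "participating_share": participating / row["phase2"],
        # Expected assessed students per participating school.
        "students_per_school": assessed / participating,
    }

for label, ratios in implied.items():
    print(label, {name: round(value, 3) for name, value in ratios.items()})
```

For mainland grade 4, for example, 390 of the 1,400 Phase 1 schools (about 28 percent) are expected in the Phase 2 sample, and 276 of those 390 (about 71 percent) are expected to participate.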
Possible Inclusion of Private Schools and DDESS Schools in Field Test
Recent discussions have included the idea of conducting the Field Test in some private schools and Department of Defense Domestic Dependent Elementary and Secondary Schools (DDESS) so that something can be learned about the implications and consequences of moving from NAEP-provided devices to school-based equipment in these types of schools. Regarding private schools, the Alliance proposes including 3 to 4 Catholic dioceses in the Field Test, and selecting 20 schools at grade 4, 16 schools at grade 8, and 12 schools at grade 12 for Phase 1. The goal would be to get 5, 4, and 3 participating schools at grades 4, 8, and 12, respectively, for Phase 2. All private schools would conduct the assessments using school-based equipment.
DDESS schools are normally in scope for national assessments like a Field Test because they are public and located in the United States. However, it is possible that none (or very few) will be selected, both because their student population is small relative to the student population as a whole and because the sample is clustered within PSUs. Should NCES want to ensure the selection of a certain number of DDESS schools in the Field Test sample, one approach might be to place the DDESS schools in the sampled PSUs into a separate stratum and oversample them at a rate designed to yield the desired number of participating schools.
At the time of this writing, whether to include private schools or DDESS schools in the Field Test has not been decided. Any such schools and students included in the Field Test would be in addition to the numbers of schools and students presented in Table 2.
4. New Schools
For the LTT public school samples, a sample of new schools will be selected to compensate for the fact that files used to create the NAEP LTT school sampling frame will be two years out of date at the time of assessment. The new school samples will be drawn using a three-stage design. The first stage is the selection of the LTT sample PSUs, as discussed above. At the second stage, a national sample of school districts will be selected from the LTT sample PSUs. The sampled districts will be asked to review lists of their respective schools and identify new schools. Frames of new schools will be constructed from these updates, and at the third stage, new schools will be drawn with probability proportional to size using the same sampling rates as their corresponding original school samples.
For the LTT private school sample as well as for the Field Test sample, new school samples will not be selected. This is in keeping with recent similar national-only assessments in years when the state-level assessments are not being conducted.
5. Substitute Samples
A portion of the eligible 2025 LTT sample schools at each age will choose not to participate in the assessment. To reduce nonresponse bias, substitute school samples will be selected for the 2025 LTT sample. The order for selecting substitute schools will be from “oldest” to “youngest”; that is, age 17, then 13, and then 9. This ordering of samples by age is necessary since no school can be selected as a substitute more than once and there are fewer schools available to serve as substitutes at the higher ages. This will be done separately for public and private schools. The general steps for selecting substitutes will be to put the substitute frames in their original sampling sort order and to take the “nearest neighbor” of each original sampled school as its potential substitute, excluding schools that have already been selected for NAEP 2025 and neighbors that lie across a PSU or state boundary from the original school.
The nearest neighbor will be the school adjacent (immediately preceding or succeeding) to the original school in the sorted frame with the closer estimated age enrollment value. If estimated age enrollment of both potential substitute schools differs from the original school by the exact same amount, the selection procedure will randomly choose one of the schools. If neither the preceding school nor the succeeding school is eligible to be a substitute, then the sampled school will not be assigned a substitute.
In addition, sampled private schools whose school affiliation is unknown will not get substitutes, nor can private schools of unknown affiliation that are not in the sample serve as substitutes. Also, new schools will not get substitute schools nor serve as substitutes.
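The nearest-neighbor rule for substitutes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the production procedure: the `School` record, its field names, and the `pick_substitute` function are hypothetical, and the exclusions for private schools of unknown affiliation and for new schools are left out for brevity.

```python
import random
from dataclasses import dataclass


@dataclass
class School:
    school_id: str
    psu: str
    state: str
    enrollment: int          # estimated age enrollment
    in_naep_2025: bool       # already selected for NAEP 2025


def pick_substitute(frame, i, used, rng):
    """Return the substitute for frame[i], or None if neither neighbor qualifies.

    frame: schools in their original sampling sort order.
    used:  ids already assigned as substitutes (a school serves at most once).
    """
    original = frame[i]
    candidates = []
    for j in (i - 1, i + 1):             # immediately preceding / succeeding
        if 0 <= j < len(frame):
            school = frame[j]
            if (not school.in_naep_2025
                    and school.school_id not in used
                    and school.psu == original.psu
                    and school.state == original.state):
                candidates.append(school)
    if not candidates:
        return None
    # Keep the neighbor(s) closest in enrollment; break exact ties at random.
    best = min(abs(s.enrollment - original.enrollment) for s in candidates)
    closest = [s for s in candidates
               if abs(s.enrollment - original.enrollment) == best]
    choice = rng.choice(closest)
    used.add(choice.school_id)
    return choice


# Made-up frame in sort order; s2, s4, and s5 are original sample schools.
frame = [
    School("s1", "P1", "X", 100, False),
    School("s2", "P1", "X", 120, True),
    School("s3", "P1", "X", 125, False),
    School("s4", "P2", "X", 130, True),
    School("s5", "P2", "X", 90, True),
]
used, rng = set(), random.Random(0)
first = pick_substitute(frame, 1, used, rng)
second = pick_substitute(frame, 3, used, rng)
print(first.school_id)   # s3
print(second)            # None
```

Here s3 is chosen for s2 because its enrollment differs by 5 rather than 20. School s4 gets no substitute: its preceding neighbor is in a different PSU and its succeeding neighbor is already in the sample.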
Substitute schools will not be selected for the Field Test samples given the need for the screening phase. Anticipated response rates will be considered when determining subsampling rates for Phase 2.
6. Student Sampling
For the LTT sample, students within the sampled schools will be selected with equal probability, except in public schools where oversampling of AIAN, Black and Hispanic students will take place. In addition to this, student sample sizes for LTT within each school are determined as the combined result of several factors:
1. We wish to take all students in relatively small schools.
2. We do not wish to have a sample that is too clustered for any assessment subject.
3. We do not wish to have many physical sessions that contain only a very small number of students, as this is inefficient.
4. We do not wish to overburden the schools with unduly large student samples.
The plans for LTT below reflect the design that results from considering each of these factors and balancing them.
LTT Private Schools and Oversampled Public Schools
In all private schools and public schools that are oversampled (as described in Section 3.1), the target sample size is 50 assessed students. We will select all students up to 50. In schools with more than 50 such students, we will select 50.
LTT Non-Oversampled Public Schools
In public schools not oversampled at the school level (i.e., under 5% AIAN and under 15% Black and Hispanic students), we will select 50 students plus an oversample of up to 5 additional AIAN, Black, and Hispanic students. The maximum number of sample students will be 55 in these schools.
Field Test Schools
For the Field Test the target sample size will be 50 assessed students in the grade of interest. We will select all students up to 50. In schools with more than 50 students in the grade, we will select 50. In Puerto Rico, the corresponding threshold value is 25 students.
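The within-school sample size rules above reduce to a few caps, sketched below in Python. The function and parameter names are hypothetical, and the parameter `extra_minority_available` (the number of AIAN, Black, and Hispanic students who were not picked up by the base sample of 50) is an assumption about how the additional oversample would be counted; the memo does not specify that mechanism.

```python
def ltt_sample_size(eligible, oversampled_or_private, extra_minority_available=0):
    """Students sampled in an LTT school under the rules in this section.

    eligible: age-eligible students in the school.
    oversampled_or_private: True for private schools and for public
        schools oversampled at the school level.
    extra_minority_available: AIAN, Black, and Hispanic students not in
        the base sample (an assumption of this sketch).
    """
    base = min(eligible, 50)
    if oversampled_or_private:
        return base
    # Non-oversampled public school: up to 5 additional students.
    extra = min(5, extra_minority_available, eligible - base)
    return base + max(extra, 0)


def ft_sample_size(enrolled_in_grade, puerto_rico=False):
    """Students sampled in a Field Test school (threshold 25 in Puerto Rico)."""
    cap = 25 if puerto_rico else 50
    return min(enrolled_in_grade, cap)


print(ltt_sample_size(30, oversampled_or_private=True))                                  # 30
print(ltt_sample_size(80, oversampled_or_private=True))                                  # 50
print(ltt_sample_size(200, oversampled_or_private=False, extra_minority_available=12))   # 55
print(ltt_sample_size(200, oversampled_or_private=False, extra_minority_available=3))    # 53
print(ft_sample_size(60))                                                                # 50
print(ft_sample_size(60, puerto_rico=True))                                              # 25
```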
7. Weighting Requirements
Weighting the 2025 assessments is off-contract and will be the responsibility of the 2024-29 Sampling and Weighting contractor. That said, the LTT operational samples typically require a single set of weights for each subject (LTT Math and LTT Reading at ages 9, 13, and 17), applied to reflect probabilities of selection, school and student nonresponse, any trimming, and the random assignment to the particular subject. LTT preliminary weights are typically developed as required to meet the needs of the Design, Analysis, and Reporting (DAR) contractor.
For the Field Test sample, the usual plan is to not produce ‘operational’ weights, although preliminary weights are typically provided to the DAR contractor.