3.5 Synthetic Population

From NFTPO Model
<h3 id="id-3.5SYNTHETICPOPULATION-Overview"><span style="color: rgb(255,102,0);">Overview</span></h3><p class="BodyParagraph">In trip-based models, trip rates are applied to aggregate households grouped in Traffic Analysis Zones (TAZs) to generate trips. In an activity-based model (ABM), by contrast, activity and trip choices are simulated for each individual person in each household. It is therefore necessary to first develop a “synthetic population” of the region’s residents. A synthetic population is a list of households and persons that matches observed or forecast distributions of socioeconomic attributes and is created by sampling detailed Census microdata. The result is a set of individual household agents and person agents that are the subjects of the simulation.</p><p class="BodyParagraph">Before they are used in the simulation, synthetic populations are stored in data tables, often in a relational database or an equivalently structured set of files. Typically, there are separate tables for household records and person records. The household records file describes household-level socio-demographic attributes such as household income, size, and number of workers. Similarly, the person records file describes person-level attributes such as age, gender, and employment status. Person records are linked to household records through a shared household ID number.</p><h4 class="BodyParagraph" id="id-3.5SYNTHETICPOPULATION-TABLE3-2SAMPLEHOUSEHOLDRECORDSFILE">TABLE 3-2 SAMPLE HOUSEHOLD RECORDS FILE</h4><div class="table-wrap">
{| class="wikitable"
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhno</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhsize</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhvehs</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhwkrs</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhftw</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhptw</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhret</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhoad</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhuni</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhhsc</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hh515</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhcu5</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhincome</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hownrent</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hrestype</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhparcel</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhtaz</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhexpfac</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>samptype</strong></p>
|-
| class="confluenceTd" | <p>1</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">75095</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">2699</p>
| class="confluenceTd" | <p align="center">227</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">11</p>
|-
| class="confluenceTd" | <p>2</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">58074</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">2699</p>
| class="confluenceTd" | <p align="center">227</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">11</p>
|-
| class="confluenceTd" | <p>3</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">0</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">30288</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">3</p>
| class="confluenceTd" | <p align="center">23479</p>
| class="confluenceTd" | <p align="center">227</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">11</p>
|-
| class="confluenceTd" | <p>4</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">0</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">1802</p>
| class="confluenceTd" | <p align="center">-1</p>
| class="confluenceTd" | <p align="center">3</p>
| class="confluenceTd" | <p align="center">37105</p>
| class="confluenceTd" | <p align="center">227</p>
| class="confluenceTd" | <p align="center">1</p>
| class="confluenceTd" | <p align="center">11</p>
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-3SAMPLEPERSONRECORDSFILE">TABLE 3-3 SAMPLE PERSON RECORDS FILE</h4><div class="table-wrap">
 +
{|  class="wikitable"
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>hhno</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pno</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pptyp</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pagey</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pgend</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pwtyp</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pwpcl</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pwtaz</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pwautime</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pwaudist</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pstyp</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pspcl</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pstaz</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>psautime</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>psaudist</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>puwmode</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>puwarrp</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>puwdepp</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>ptpass</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>ppaidprk</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pdiary</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>pproxy</strong></p>
 +
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>psexpfac</strong></p>
 +
|-
 +
|  class="confluenceTd" | <p>1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">2</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 +
|  class="confluenceTd" | <p align="center">41</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|-
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p>193</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">17392</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">1</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
 
|  class="confluenceTd" | <p align="center">0</p>
 
|  class="confluenceTd" | <p align="center">0</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|-  
+
|-
|  class="confluenceTd" | <p>77</p>
+
|  class="confluenceTd" | <p>2</p>
|  class="confluenceTd" | <p align="center">232</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">3</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 +
|  class="confluenceTd" | <p align="center">34</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">2</p>
 
|-
 
|  class="confluenceTd" | <p>18</p>
 
|  class="confluenceTd" | <p align="center">5042</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">4</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">3</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">1</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">3</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-3SAMPLEPERSONRECORDSFILE">TABLE 3-3 SAMPLE PERSON RECORDS FILE</h4><div class="table-wrap">
+
| class="confluenceTd" | <p align="center">0</p>
{| class="confluenceTable"  
+
class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong> TAZ</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>HHID</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Person ID</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Age</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Works from Home</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Employment Status</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Gender</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Hours Worked per Week</strong></p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|-
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p>77</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">232</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">22</p>
+
|-
 +
|  class="confluenceTd" | <p>3</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">1</p>
+
|  class="confluenceTd" | <p align="center">5</p>
|  class="confluenceTd" | <p align="center">2</p>
+
|  class="confluenceTd" | <p align="center">26</p>
|  class="confluenceTd" | <p align="center">9</p>
 
|-
 
|  class="confluenceTd" | <p>77</p>
 
|  class="confluenceTd" | <p align="center">232</p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">24</p>
 
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">0</p>
 
|  class="confluenceTd" | <p align="center">0</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">45</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|-  
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p>77</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">232</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
|  class="confluenceTd" | <p align="center">3</p>
+
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
| class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
|  class="confluenceTd" | <p align="center">0</p>
+
|-
 +
|  class="confluenceTd" | <p>4</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 +
|  class="confluenceTd" | <p align="center">4</p>
 +
|  class="confluenceTd" | <p align="center">34</p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">0</p>
 
|  class="confluenceTd" | <p align="center">0</p>
|}</div><p class="BodyParagraph">Population synthesized by any synthetic population generator may be used with DaySim as long as the required household and person socio-demographic attributes are provided to it in the appropriate format. PopGen, the synthetic population generator developed at Arizona State University (ASU) was chosen for this effort primarily for two reasons. First, it has the ability to control for both household and person level demographic attributes simultaneously. Second, it has an easy-to-use and simple graphical user interface (GUI).</p><h3 id="id-3.5SYNTHETICPOPULATION-PreparingSyntheticPopulationsforDaySim"><span style="color: rgb(255,102,0);">Preparing Synthetic Populations for DaySim</span></h3><p class="BodyParagraph">The design of the synthetic population should support the design of the activity-based model (DaySim in this case) and provide the variables it needs. In addition, the activity-based model should only rely on information that can be realistically provided in the synthetic population.</p><p class="BodyParagraph">Population synthesis generally consists of the synthesis of two sub-populations – those living in regular households and those living in non-institutionalized group quarters such as college dormitories. For this effort, an additional segment of population was synthesized which comprised of seasonal households. These segments were established to reflect the differences in travel patterns associated with these sub-populations as well as to provide the ability to support seasonal analyses. For example, the seasonal population is generally older than the permanent population, has lower levels of workforce participation, and clusters in certain geographic areas. 
All of these attributes influence travel patterns and the demand for travel.</p><p class="BodyParagraph">There are three major steps in creating a synthetic population:</p><ol><li>Specifying the inputs to the process—the control variables and sample households as well as the level of geographic resolution. Specifying the control variables is essential. In addition, there is often an additional step of specifying additional, uncontrolled variables to be added to the synthetic population.</li><li>Actually running a program that produces the synthetic households.</li><li>The third major step would be transforming the model-generated outputs into characteristics of the population that will be used throughout the rest of the model system. This could involve creating categorical variables out of continuous variables, reformulating income, or allocating households from the zonal level to a finer level of geographic resolution, such as a parcel.</li></ol><h4 id="id-3.5SYNTHETICPOPULATION-DaySimPersonTypes"><em>DaySim Person Types</em></h4><p class="BodyParagraph">Although person are being modeled in disaggregate form in an ABM, it is often useful to create person type categories. DaySim uses 8 such person types. Person type categories may be used for various purposes:</p><ol><li>As a basic segmentation for certain models, such as daily activity pattern models</li><li>To summarize and compare observed versus estimated data and calibrate models</li><li>As explanatory variables in models</li><li>As constraints on alternatives that are available; for example, work and school activities are only available to workers and student; and driving is restricted by age</li></ol><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-4DAYSIMPERSONTYPES">TABLE 3-4 DAYSIM PERSON TYPES</h4><div class="table-wrap">
+
|  class="confluenceTd" | <p align="center">-1</p>
{|  class="confluenceTable"  
+
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">0</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">-1</p>
 +
|  class="confluenceTd" | <p align="center">1</p>
 +
|}</div><p class="BodyParagraph">A population produced by any synthetic population generator can be used with DaySim, as long as the required household and person socio-demographic attributes are supplied in the appropriate format. PopulationSim is an open platform for population synthesis and survey weighting. It emerged from Oregon DOT’s desire to build a shared, open platform that could be easily adapted for statewide, regional, and urban transportation planning needs. It can control for household-level and person-level demographic attributes simultaneously.</p><h3 id="id-3.5SYNTHETICPOPULATION-PreparingSyntheticPopulationsforDaySim"><span style="color: rgb(255,102,0);">Preparing Synthetic Populations for DaySim</span></h3><p class="BodyParagraph">The design of the synthetic population should support the design of the activity-based model (DaySim in this case) and provide the variables it needs. Conversely, the activity-based model should rely only on information that the synthetic population can realistically provide.</p><p class="BodyParagraph">Population synthesis generally covers two sub-populations: people living in regular households and people living in non-institutionalized group quarters, such as college dormitories. For this effort, an additional population segment consisting of seasonal households was also synthesized. These segments were established to reflect differences in the travel patterns of these sub-populations and to support seasonal analyses. For example, the seasonal population is generally older than the permanent population, has lower workforce participation, and is clustered in certain geographic areas. 
All of these attributes influence travel patterns and the demand for travel.</p><p class="BodyParagraph">There are three major steps in creating a synthetic population:</p><ol><li>Specifying the inputs to the process: the control variables, the sample households, and the level of geographic resolution. Specifying the control variables is essential. Often there is an additional step of specifying uncontrolled variables to be added to the synthetic population.</li><li>Running a program that produces the synthetic households.</li><li>Transforming the program outputs into the population characteristics used throughout the rest of the model system. This can involve creating categorical variables from continuous variables, reformulating income, or allocating households from the zonal level to a finer level of geographic resolution, such as a parcel.</li></ol><h4 id="id-3.5SYNTHETICPOPULATION-DaySimPersonTypes"><em>DaySim Person Types</em></h4><p class="BodyParagraph">Although persons are modeled in disaggregate form in an ABM, it is often useful to group them into person type categories. DaySim uses eight person types. Person type categories may be used for several purposes:</p><ol><li>As a basic segmentation for certain models, such as daily activity pattern models</li><li>To summarize and compare observed versus estimated data, and to calibrate models</li><li>As explanatory variables in models</li><li>As constraints on the available alternatives; for example, work and school activities are available only to workers and students, and driving is restricted by age</li></ol><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-4DAYSIMPERSONTYPES">TABLE 3-4 DAYSIM PERSON TYPES</h4><div class="table-wrap">
{| class="wikitable"
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>No.</strong></p>
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Person Type</strong></p>
Line 134: Line 278:
| class="confluenceTd" | <p align="center">Unemployed</p>
| class="confluenceTd" | <p align="center">None</p>
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-ControlAttributesandTargetDistributions"><em>Control Attributes and Target Distributions</em></h4><p class="BodyParagraph">Population synthesis requires three major inputs: a set of control attributes, target distributions of those attributes, and disaggregate sample data. The first step is to identify the control attributes and their levels. Next, target distributions of the control attributes and their levels are derived for appropriate geographic units. These target distributions are also known as marginal control totals, since they represent the margins of a joint distribution of multiple attributes. Typically, each attribute is controlled at the finest spatial resolution at which it can be feasibly and reliably estimated. If control totals are not accurate for a particular spatial unit, they can be specified at a coarser geography.</p><p class="BodyParagraph">The following considerations are usually important in choosing control variables:</p><ul><li>The number of control variables matters. With too few, the synthetic population may not accurately reflect the true population. With too many, the sample may become too sparse: there may be no sample households with a particular combination of control-attribute values, which can distort the synthetic population.</li><li>Control attributes may be single- or multi-dimensional. A multi-dimensional attribute can be treated as a single-dimensional attribute whose number of categories equals the product of the numbers of categories of the individual attributes. The primary advantage of multi-dimensional attributes is more precise regional control over the correlation between attributes; the disadvantage, again, is sample sparsity.</li><li>The best control variables are meaningful attributes that are roughly “orthogonal” to each other, meaning that their variation in the population is largely independent. 
Conversely, if two attributes are highly correlated, controlling for both may achieve little more than controlling for just one.</li><li>Finally, different sets of control attributes may be used for the base and forecast years when some attributes cannot be forecast accurately. This is not ideal, however, so the ability to forecast marginal control totals should be considered when specifying the control attributes for the base year.</li></ul><p class="BodyParagraph">Target distributions of control variables for the base year can be obtained from a variety of data sources, including the following:</p><ul><li>Decennial Census: ~100% sample</li><li>American Community Survey (ACS) summary files: ~3% annual sample; the rolling 5-year sample covers roughly 15% of the population</li><li>Census Transportation Planning Products (CTPP)</li><li>Other zonal data developed locally (TAZs)</li></ul><p class="BodyParagraph">For the forecast year, regional socioeconomic forecasts or outputs from a land-use model are often used.</p><p class="BodyParagraph">The following tables list the control attributes and their levels, along with the specific data sources used to obtain the corresponding target distributions. All the distributions were obtained at the TAZ level.</p><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-5HOUSEHOLDCONTROLDATAFORPERMANENTHOUSEHOLDS">TABLE 3-5 HOUSEHOLD CONTROL DATA FOR PERMANENT HOUSEHOLDS</h4><div class="table-wrap">
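<p class="BodyParagraph">The sketch below illustrates the bullet on multi-dimensional attributes above. It combines household size (1, 2, 3, 4 or more persons) and number of workers (0 to 3) into a single joint attribute. The level codes and the helper <code>joint_categories</code> are hypothetical. Pairs with more workers than persons cannot occur, so dropping them leaves the 13 categories listed for "Household size and workers joint" in Table 3-5, rather than the full 4 × 4 = 16 product. Each of those 13 cells needs enough seed households, which is where the sparse-sample concern comes from.</p>

```python
from itertools import product

# Levels of the two attributes being combined (illustrative codes).
SIZE_LEVELS = [1, 2, 3, 4]    # persons; 4 stands for "4 or more persons"
WORKER_LEVELS = [0, 1, 2, 3]  # workers per household


def joint_categories(sizes, workers):
    """Combine two attributes into one joint attribute.

    Returns the number of candidate (size, workers) pairs and a mapping from
    each feasible pair to a category number 1..K. Pairs with more workers
    than persons cannot occur, so they are dropped.
    """
    candidates = list(product(sizes, workers))
    feasible = [(s, w) for s, w in candidates if w <= s]
    return len(candidates), {pair: k for k, pair in enumerate(feasible, start=1)}


n_candidates, categories = joint_categories(SIZE_LEVELS, WORKER_LEVELS)
print(n_candidates)        # 16 = 4 size levels x 4 worker levels
print(len(categories))     # 13 feasible joint categories
print(categories[(2, 1)])  # 4, i.e. "2 persons, 1 worker"
```

<p class="BodyParagraph">Numbering the feasible pairs in size-then-workers order reproduces the category numbers used in the table, from 1 ("1 person, no worker") to 13 ("4 or more persons, 3 workers").</p>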
+
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-ControlAttributesandTargetDistributions"><em>Control Attributes and Target Distributions</em></h4><p class="BodyParagraph">Population synthesis requires three major inputs: a set of control attributes, target distributions of those attributes, and disaggregate sample data. The first step is to identify the control attributes and their levels. Next, target distributions of the control attributes and their levels are derived for appropriate geographic units. These target distributions are also known as marginal control totals, since they represent the margins of a joint distribution of multiple attributes. Typically, each attribute is controlled at the finest spatial resolution at which it can be feasibly and reliably estimated. If control totals are not accurate for a particular spatial unit, they can be specified at a coarser geography.</p><p class="BodyParagraph">The following considerations are usually important in choosing control variables:</p><ul><li>The number of control variables matters. With too few, the synthetic population may not accurately reflect the true population. With too many, the sample may become too sparse: there may be no sample households with a particular combination of control-attribute values, which can distort the synthetic population.</li><li>Control attributes may be single- or multi-dimensional. A multi-dimensional attribute can be treated as a single-dimensional attribute whose number of categories equals the product of the numbers of categories of the individual attributes. The primary advantage of multi-dimensional attributes is more precise regional control over the correlation between attributes; the disadvantage, again, is sample sparsity.</li><li>The best control variables are meaningful attributes that are roughly “orthogonal” to each other, meaning that their variation in the population is largely independent. 
Conversely, if two attributes are highly correlated, controlling for both may achieve little more than controlling for just one.</li><li>Finally, different sets of control attributes may be used for the base and forecast years when some attributes cannot be forecast accurately. This is not ideal, however, so the ability to forecast marginal control totals should be considered when specifying the control attributes for the base year.</li></ul><p class="BodyParagraph">Target distributions of control variables for the base year can be obtained from a variety of data sources, including the following:</p><ul><li>Decennial Census: ~100% sample</li><li>American Community Survey (ACS) summary files: ~3% annual sample; the rolling 5-year sample covers roughly 15% of the population</li><li>Bureau of Economic and Business Research (BEBR)</li><li>Census Transportation Planning Products (CTPP)</li><li>Other zonal data developed locally (TAZs)</li></ul><p class="BodyParagraph">For the forecast year, regional socioeconomic forecasts or outputs from a land-use model are often used.</p><p class="BodyParagraph">The following table lists each PopulationSim control: the target name, the geography at which it is controlled, the seed table it applies to, its relative importance, the corresponding field in the control data, and the expression that identifies matching seed records.</p><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-PopulationSim Controls">TABLE 3-5 PopulationSim Controls</h4><div class="table-wrap">
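<p class="BodyParagraph">The <code>expression</code> column in the controls table holds pandas-style boolean expressions that select seed records for each target. The sketch below shows one way such expressions can be turned into a per-household incidence table, which is the input a balancing step works on. It is an illustration under stated assumptions, not PopulationSim's own code. It uses NumPy arrays in a <code>SimpleNamespace</code> in place of pandas DataFrames, a made-up four-household seed, and a hypothetical helper <code>build_incidence</code>. It also ignores the <code>geography</code> and <code>importance</code> columns.</p>

```python
import numpy as np
from types import SimpleNamespace

# Hypothetical seed sample of four households and their ten persons.
# Column names follow the controls table; all values are made up.
households = SimpleNamespace(
    WGTP=np.array([20.0, 15.0, 30.0, 10.0]),  # initial household weights
    NP=np.array([1, 2, 4, 3]),                # persons in household
    NWESR=np.array([0, 1, 2, 3]),             # workers in household
    AGEHOH=np.array([70, 35, 45, 52]),        # age of householder
    HHINCADJ=np.array([18000, 52000, 120000, 75000]),
)
persons = SimpleNamespace(
    hh_index=np.array([0, 1, 1, 2, 2, 2, 2, 3, 3, 3]),  # row of the person's household
    SEX=np.array([2, 1, 2, 1, 2, 1, 2, 1, 1, 2]),
    AGEP=np.array([70, 35, 33, 45, 44, 12, 3, 52, 50, 20]),
)

# A subset of the controls table: (target, seed_table, expression).
CONTROLS = [
    ("num_hh", "households", "(households.WGTP > 0) & (households.WGTP < np.inf)"),
    ("hh_size_1", "households", "households.NP == 1"),
    ("hh_size_4", "households", "households.NP >= 4"),
    ("hh_age_65_abv", "households", "(households.AGEHOH > 64) & (households.AGEHOH <= np.inf)"),
    ("hh_inc_100_plus", "households", "(households.HHINCADJ > 99999) & (households.HHINCADJ <= 999999999)"),
    ("person_male", "persons", "persons.SEX == 1"),
    ("person_age5to17", "persons", "(persons.AGEP >= 5) & (persons.AGEP <= 17)"),
]


def build_incidence(controls, households, persons):
    """Map each control target to a per-household incidence array.

    Household expressions give 0/1 per household; person expressions are
    evaluated per person and counted up to the person's household.
    """
    n_hh = len(households.NP)
    names = {"np": np, "households": households, "persons": persons}
    incidence = {}
    for target, seed_table, expr in controls:
        # eval() is acceptable here only because the expressions come from
        # a trusted configuration file, not from user input.
        mask = eval(expr, {"__builtins__": {}}, names)
        if seed_table == "households":
            incidence[target] = mask.astype(int)
        else:
            counts = np.bincount(persons.hh_index, weights=mask.astype(float), minlength=n_hh)
            incidence[target] = counts.astype(int)
    return incidence


incidence = build_incidence(CONTROLS, households, persons)
for target, column in incidence.items():
    print(target, column.tolist())
```

<p class="BodyParagraph">A balancing step then compares the weighted total of each column, for example <code>(incidence["person_male"] * households.WGTP).sum()</code>, against the control total for that target.</p>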
{| class="confluenceTable"  
+
{| class="wikitable"  
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Household Attribute</strong></p>
+
|-
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Category Number</strong></p>
+
! target
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Categories</strong></p>
+
! geography
| class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Data Source</strong></p>
+
! seed_table
|-
+
! importance
| class="confluenceTd" | <p>Householder unit type</p>
+
! control_field
| class="confluenceTd" | <p align="center">1</p>
+
! expression
| class="confluenceTd" | <p align="center">Single family dwelling</p>
+
|-
| class="confluenceTd" | <p align="center">NERPM TAZ Data</p>
+
| num_hh
|-  
+
| MAZ
| class="confluenceTd" | <p></p>
+
| households
| class="confluenceTd" | <p align="center">2</p>
+
| 1000000000
| class="confluenceTd" | <p align="center">Multi family dwelling</p>
+
| HHS
| class="confluenceTd" | <p align="center"></p>
+
| (households.WGTP > 0) & (households.WGTP < np.inf)
|-
+
|-
| class="confluenceTd" | <p>Presence of children</p>
+
| hh_size_1
| class="confluenceTd" | <p align="center">1</p>
+
| TAZ
| class="confluenceTd" | <p align="center">Yes</p>
+
| households
| class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
+
| 5000
|-  
+
| HHSIZE1_S3
| class="confluenceTd" | <p></p>
+
| households.NP == 1
| class="confluenceTd" | <p align="center">2</p>
+
|-
| class="confluenceTd" | <p align="center">No</p>
+
| hh_size_2
| class="confluenceTd" | <p align="center"></p>
+
| TAZ
|-
+
| households
| class="confluenceTd" | <p>Householder age</p>
+
| 5000
| class="confluenceTd" | <p align="center">1</p>
+
| HHSIZE2_S3
| class="confluenceTd" | <p align="center">15 to 24 years</p>
+
| households.NP == 2
| class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
+
|-
|-  
+
| hh_size_3
| class="confluenceTd" | <p></p>
+
| TAZ
| class="confluenceTd" | <p align="center">2</p>
+
| households
| class="confluenceTd" | <p align="center">25 to 54 years</p>
+
| 5000
| class="confluenceTd" | <p align="center"></p>
+
| HHSIZE3_S3
|-
+
| households.NP == 3
| class="confluenceTd" | <p></p>
+
|-
| class="confluenceTd" | <p align="center">3</p>
+
| hh_size_4
| class="confluenceTd" | <p align="center">55 to 64 years</p>
+
| TAZ
| class="confluenceTd" | <p align="center"></p>
+
| households
|-  
+
| 5000
| class="confluenceTd" | <p></p>
+
| HHSIZE4M_S3
| class="confluenceTd" | <p align="center">4</p>
+
| households.NP >= 4
| class="confluenceTd" | <p align="center">65 to 74 years</p>
+
|-
| class="confluenceTd" | <p align="center"></p>
+
| hh_age_15_to_44
|-  
+
| TAZ
| class="confluenceTd" | <p></p>
+
| households
| class="confluenceTd" | <p align="center">5</p>
+
| 5000
| class="confluenceTd" | <p align="center">75 years and over</p>
+
| HHAGE1_S3
| class="confluenceTd" | <p align="center"></p>
+
| (households.AGEHOH > 14) & (households.AGEHOH <= 44)
|-
+
|-
| class="confluenceTd" | <p>Household income (annual)</p>
+
| hh_age_45_to_64
| class="confluenceTd" | <p align="center">1</p>
+
| TAZ
| class="confluenceTd" | <p align="center">Less than $20,000</p>
+
| households
| class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
+
| 5000
|-  
+
| HHAGE2_S3
| class="confluenceTd" | <p></p>
+
| (households.AGEHOH > 44) & (households.AGEHOH <= 64)
| class="confluenceTd" | <p align="center">2</p>
+
|-
| class="confluenceTd" | <p align="center">$20,000 to $39,999</p>
+
| hh_age_65_abv
|  class="confluenceTd" | <p align="center"></p>
+
| TAZ
|-  
+
| households
| class="confluenceTd" | <p></p>
+
| 5000
| class="confluenceTd" | <p align="center">3</p>
+
| HHAGE3_S3
| class="confluenceTd" | <p align="center">$40,000 to $59,999</p>
+
| (households.AGEHOH > 64) & (households.AGEHOH <= np.inf)
| class="confluenceTd" | <p align="center"></p>
+
|-
|-
+
| hh_wrks_0
| class="confluenceTd" | <p></p>
+
| TAZ
| class="confluenceTd" | <p align="center">4</p>
+
| households
| class="confluenceTd" | <p align="center">$60,000 to $99,999</p>
+
| 5000
| class="confluenceTd" | <p align="center"></p>
+
| HHWRK1_S3
|-
+
| households.NWESR == 0
| class="confluenceTd" | <p></p>
+
|-
|  class="confluenceTd" | <p align="center">5</p>
+
| hh_wrks_1
|  class="confluenceTd" | <p align="center">$100,000 or more</p>
+
| TAZ
|  class="confluenceTd" | <p align="center"></p>
+
| households
|-
+
| 5000
|  class="confluenceTd" | <p>Household size</p>
+
| HHWRK2_S3
|  class="confluenceTd" | <p align="center">1</p>
+
| households.NWESR == 1
|  class="confluenceTd" | <p align="center">1 person</p>
+
|-
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
+
| hh_wrks_2
|-
+
| TAZ
|  class="confluenceTd" | <p></p>
+
| households
|  class="confluenceTd" | <p align="center">2</p>
+
| 5000
|  class="confluenceTd" | <p align="center">2 persons</p>
+
| HHWRK3_S3
|  class="confluenceTd" | <p align="center"></p>
+
| households.NWESR == 2
|-
+
|-
|  class="confluenceTd" | <p></p>
+
| hh_wrks_3m
|  class="confluenceTd" | <p align="center">3</p>
+
| TAZ
|  class="confluenceTd" | <p align="center">3 persons</p>
+
| households
|  class="confluenceTd" | <p align="center"></p>
+
| 5000
|-
+
| HHWRK4_S3
|  class="confluenceTd" | <p></p>
+
| households.NWESR >= 3
|  class="confluenceTd" | <p align="center">4</p>
+
|-
|  class="confluenceTd" | <p align="center">4 persons</p>
+
| hh_inc_0_25
|  class="confluenceTd" | <p align="center"></p>
+
| TAZ
|-
+
| households
|  class="confluenceTd" | <p></p>
+
| 5000
|  class="confluenceTd" | <p align="center">5</p>
+
| HHINC1_S3
|  class="confluenceTd" | <p align="center">5 persons</p>
+
| (households.HHINCADJ > -999999999) & (households.HHINCADJ <= 24999)
|  class="confluenceTd" | <p align="center"></p>
+
|-
|-
+
| hh_inc_25_60
|  class="confluenceTd" | <p></p>
+
| TAZ
|  class="confluenceTd" | <p align="center">6</p>
+
| households
|  class="confluenceTd" | <p align="center">6 persons</p>
+
| 5000
|  class="confluenceTd" | <p align="center"></p>
+
| HHINC2_S3
|-
+
| (households.HHINCADJ > 24999) & (households.HHINCADJ <= 59999)
|  class="confluenceTd" | <p></p>
+
|-
|  class="confluenceTd" | <p align="center">7</p>
+
| hh_inc_60_100
|  class="confluenceTd" | <p align="center">7 or more persons</p>
+
| TAZ
|  class="confluenceTd" | <p align="center"></p>
+
| households
|-
+
| 5000
|  class="confluenceTd" | <p>Household size and workers joint</p>
+
| HHINC3_S3
|  class="confluenceTd" | <p align="center">1</p>
+
| (households.HHINCADJ > 59999) & (households.HHINCADJ <= 99999)
|  class="confluenceTd" | <p align="center">1 person, no worker</p>
+
|-
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
+
| hh_inc_100_plus
|-
+
| TAZ
|  class="confluenceTd" | <p></p>
+
| households
|  class="confluenceTd" | <p align="center">2</p>
+
| 5000
|  class="confluenceTd" | <p align="center">1 person, 1 worker</p>
+
| HHINC4_S3
|  class="confluenceTd" | <p align="center"></p>
+
| (households.HHINCADJ > 99999) & (households.HHINCADJ <= 999999999)
|-
+
|-
|  class="confluenceTd" | <p></p>
+
| person_male
|  class="confluenceTd" | <p align="center">3</p>
+
| SCOUNTY
|  class="confluenceTd" | <p align="center">2 persons, no worker</p>
+
| persons
|  class="confluenceTd" | <p align="center"></p>
+
| 1000
|-
+
| MALE_S
|  class="confluenceTd" | <p></p>
+
| persons.SEX == 1
|  class="confluenceTd" | <p align="center">4</p>
+
|-
|  class="confluenceTd" | <p align="center">2 persons, 1 worker</p>
+
| person_female
|  class="confluenceTd" | <p align="center"></p>
+
| SCOUNTY
|-
+
| persons
|  class="confluenceTd" | <p></p>
+
| 1000
|  class="confluenceTd" | <p align="center">5</p>
+
| FEMALE_S
|  class="confluenceTd" | <p align="center">2 persons, 2 workers</p>
+
| persons.SEX == 2
|  class="confluenceTd" | <p align="center"></p>
+
|-
|-
+
| person_age0to4
|  class="confluenceTd" | <p></p>
+
| SCOUNTY
|  class="confluenceTd" | <p align="center">6</p>
+
| persons
|  class="confluenceTd" | <p align="center">3 persons, no worker</p>
+
| 1000
|  class="confluenceTd" | <p align="center"></p>
+
| AGE0to4_S
|-
+
| (persons.AGEP > 0) & (persons.AGEP <= 4)
|  class="confluenceTd" | <p></p>
+
|-
|  class="confluenceTd" | <p align="center">7</p>
+
| person_age5to17
|  class="confluenceTd" | <p align="center">3 persons, 1 worker</p>
+
| SCOUNTY
|  class="confluenceTd" | <p align="center"></p>
+
| persons
|-
+
| 1000
|  class="confluenceTd" | <p></p>
+
| AGE5to17_S
|  class="confluenceTd" | <p align="center">8</p>
+
| (persons.AGEP >= 5) & (persons.AGEP <= 17)
|  class="confluenceTd" | <p align="center">3 persons, 2 workers</p>
+
|-
|  class="confluenceTd" | <p align="center"></p>
+
| person_age18to24
|-
+
| SCOUNTY
|  class="confluenceTd" | <p></p>
+
| persons
|  class="confluenceTd" | <p align="center">9</p>
+
| 1000
|  class="confluenceTd" | <p align="center">3 persons, 3 workers</p>
+
| AGE18to24_S
|  class="confluenceTd" | <p align="center"></p>
+
| (persons.AGEP >= 18) & (persons.AGEP <= 24)
|-
+
|-
|  class="confluenceTd" | <p></p>
+
| person_age25to54
|  class="confluenceTd" | <p align="center">10</p>
+
| SCOUNTY
|  class="confluenceTd" | <p align="center">4 or more persons, no worker</p>
+
| persons
|  class="confluenceTd" | <p align="center"></p>
+
| 1000
|-
+
| AGE25to54_S
|  class="confluenceTd" | <p></p>
+
| (persons.AGEP >= 25) & (persons.AGEP <= 54)
|  class="confluenceTd" | <p align="center">11</p>
+
|-
|  class="confluenceTd" | <p align="center">4 or more persons, 1 worker</p>
+
| person_age55m
|  class="confluenceTd" | <p align="center"></p>
+
| SCOUNTY
|-
+
| persons
|  class="confluenceTd" | <p></p>
+
| 1000
|  class="confluenceTd" | <p align="center">12</p>
+
| AGE55M_S
|  class="confluenceTd" | <p align="center">4 or more persons, 2 workers</p>
+
| persons.AGEP >= 55
|  class="confluenceTd" | <p align="center"></p>
+
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-SampleData"><em>Sample Data</em></h4><p class="BodyParagraph">During population synthesis, individual household and person records are drawn from a disaggregate sample of households to match the target distributions of the controlled attributes. Because it may not be possible to control for every desired attribute, “uncontrolled” attributes are carried into the synthetic population from the disaggregate sample data. It is therefore essential that the disaggregate sample be representative of the population of the entire region.</p><p class="BodyParagraph">In most cases, the primary source of disaggregate sample data is the Public Use Microdata Sample (PUMS), which is now part of the ACS and follows the same sampling framework, but provides disaggregate records for households and persons across numerous attributes. PUMS records are grouped by geographic units known as Public Use Microdata Areas (PUMAs). PUMAs are contiguous areas containing at least 100,000 people, including persons living in group quarters; for example, a metro area of 850,000 residents might be covered by about eight PUMAs. In general, ACS PUMS provides good, representative coverage of most regions and is rigorously tested and monitored, so it was used to create the sample data for this effort.</p><h4 id="id-3.5SYNTHETICPOPULATION-PopulationSim Run"><em>PopulationSim Run</em></h4><p class="BodyParagraph">The control totals and disaggregate sample data are input into a population synthesizer to generate a synthetic population. According to the PopulationSim wiki, 'the objective of a population synthesizer is to generate household weights which satisfies the marginal control distributions. This is achieved by use of a data fitting technique. The most common fitting technique used by various population synthesizers is the Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used to obtain joint distributions of demographic variables. 
Then, random sampling from PUMS generates the baseline synthetic population.</p><p class="BodyParagraph">One of the limitations of the simple IPF method is that it does not incorporate both household and person level attributes simultaneously. Some population synthesizers use a heuristic algorithm called the Iterative Proportional Updating Algorithm (IPU) to incorporate both person and household-level variables in the fitting procedure.</p><p class="BodyParagraph">Besides IPF, entropy maximization algorithms have been used as a fitting technique. In most of the entropy based methods, the relative entropy is used as the objective function. The relative entropy based optimization ensures that the least amount of new information is introduced in finding a feasible solution. The base entropy is defined by the initial weights in the seed sample. The weights generated by the entropy maximization algorithm preserves the distribution of initial weights while matching the marginal controls. This is an advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization based list balancing to match controls specified at various geographic levels.'</p>
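<p class="BodyParagraph">The quoted passage describes entropy-maximization list balancing only in words. The NumPy sketch below shows the idea. Household weights are rescaled multiplicatively, one control at a time, until every weighted control total is matched. The result stays as close as possible, in relative-entropy terms, to the initial seed weights. The function names <code>list_balance</code> and <code>_solve_factor</code>, the four-household seed, and the targets are all hypothetical. The <code>importance</code> column in the controls table suggests that PopulationSim can relax less important controls; this sketch ignores that, treats every control as a hard constraint, and is not PopulationSim's implementation.</p>

```python
import numpy as np


def _solve_factor(a, w, target, tol=1e-12, max_newton=100):
    """Return d such that sum(a * w * exp(d * a)) == target.

    Assumes a >= 0, w > 0 and target > 0. Uses Newton's method on
    h(d) = log(sum(a * w * exp(d * a))) - log(target), which is linear in d
    (so one step suffices) when every a is 0 or 1.
    """
    d = 0.0
    for _ in range(max_newton):
        scaled = a * w * np.exp(d * a)
        s = scaled.sum()
        h = np.log(s) - np.log(target)
        if abs(h) < tol:
            break
        d -= h / ((a @ scaled) / s)  # h'(d) = sum(a**2 * w * exp(d*a)) / s
    return d


def list_balance(initial_weights, incidence, targets, max_sweeps=5000, tol=1e-10):
    """Minimum relative-entropy list balancing (illustrative sketch).

    Cycles through the controls, each time rescaling household weights by
    exp(d * a_k) so that control k is matched exactly. The weights keep the
    form w0 * exp(incidence @ lam); when the controls are consistent and the
    loop converges, this is the solution closest to w0 in relative entropy.
    """
    w = np.asarray(initial_weights, dtype=float).copy()
    A = np.asarray(incidence, dtype=float)  # shape (n_households, n_controls)
    T = np.asarray(targets, dtype=float)
    for _ in range(max_sweeps):
        for k in range(A.shape[1]):
            d = _solve_factor(A[:, k], w, T[k])
            w *= np.exp(d * A[:, k])  # households with a_ik == 0 are unchanged
        if np.max(np.abs(A.T @ w - T) / T) < tol:
            break
    return w


# Hypothetical seed: initial weights for four households and their
# incidence on three controls: num_hh, hh_size_1, person_male.
w0 = np.array([20.0, 15.0, 30.0, 10.0])
incidence = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 2],
    [1, 0, 2],
])
targets = np.array([80.0, 25.0, 90.0])

w = list_balance(w0, incidence, targets)
print(w)                # close to [25, 20, 26.25, 8.75]
print(incidence.T @ w)  # close to [80, 25, 90]
```

<p class="BodyParagraph">Households 2 and 3 have identical incidence rows, so they always receive the same multiplicative adjustment and keep their initial 3:1 weight ratio. This is the sense in which the balanced weights preserve the distribution of the initial weights. When every incidence value is 0 or 1, each step reduces to the familiar IPF-style ratio adjustment: target divided by the current weighted total.</p>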
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">13</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons, 3 workers</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-6PERSONCONTROLDATAFORPERMANENTHOUSEHOLDS">TABLE 3-6 PERSON CONTROL DATA FOR PERMANENT HOUSEHOLDS</h4><div class="table-wrap">
 
{|  class="confluenceTable"
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Person Attribute</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Category Number</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Categories</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Data Source</strong></p>
 
|-
 
|  class="confluenceTd" | <p>Gender</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Male</p>
 
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">Female</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Age</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Under 5 years</p>
 
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">5 to 14 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">15 to 17 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">18 to 24 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">5</p>
 
|  class="confluenceTd" | <p align="center">25 to 39 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">6</p>
 
|  class="confluenceTd" | <p align="center">40 to 54 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">7</p>
 
|  class="confluenceTd" | <p align="center">55 to 64 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">8</p>
 
|  class="confluenceTd" | <p align="center">65 to 74 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">9</p>
 
|  class="confluenceTd" | <p align="center">75 years and over</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-7HOUSEHOLDCONTROLDATAFORSEASONALHOUSEHOLDS">TABLE 3-7 HOUSEHOLD CONTROL DATA FOR SEASONAL HOUSEHOLDS</h4><div class="table-wrap">
 
{|  class="confluenceTable"
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Household Attribute</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Category Number</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Categories</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Data Source</strong></p>
 
|-
 
|  class="confluenceTd" | <p>Householder unit type</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Single family dwelling</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">Multi family dwelling</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Presence of children</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Yes</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">No</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Householder age</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">15 to 24 years</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">25 to 54 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">55 to 64 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">65 to 74 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">5</p>
 
|  class="confluenceTd" | <p align="center">75 years and over</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Household income (annual)</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Less than $20,000</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">$20,000 to $39,999</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">$40,000 to $59,999</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">$60,000 to $99,999</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">5</p>
 
|  class="confluenceTd" | <p align="center">$100,000 or more</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Household size</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1 person</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">2 persons</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">3 persons</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Household size and workers joint</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">1 person, no worker</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">1 person, 1 worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">2 persons, no worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">2 persons, 1 worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">5</p>
 
|  class="confluenceTd" | <p align="center">2 persons, 2 workers</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">6</p>
 
|  class="confluenceTd" | <p align="center">3 persons, no worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">7</p>
 
|  class="confluenceTd" | <p align="center">3 persons, 1 worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">8</p>
 
|  class="confluenceTd" | <p align="center">3 persons, 2 workers</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">9</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons, no worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">10</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons, 1 worker</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">11</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons, 2 workers</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">12</p>
 
|  class="confluenceTd" | <p align="center">4 or more persons, 3 workers</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-8PERSONCONTROLDATAFORSEASONALHOUSEHOLDS">TABLE 3-8 PERSON CONTROL DATA FOR SEASONAL HOUSEHOLDS</h4><div class="table-wrap">
 
{|  class="confluenceTable"
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Person Attribute</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Category Number</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Categories</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Data Source</strong></p>
 
|-
 
|  class="confluenceTd" | <p>Gender</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Male</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">Female</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Age</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">0 to 17 years</p>
 
|  class="confluenceTd" | <p align="center">NHTS 2009 add-on survey for Florida</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">18 to 24 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">25 to 39 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">4</p>
 
|  class="confluenceTd" | <p align="center">40 to 54 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">5</p>
 
|  class="confluenceTd" | <p align="center">55 to 64 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">6</p>
 
|  class="confluenceTd" | <p align="center">65 to 74 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">7</p>
 
|  class="confluenceTd" | <p align="center">75 years and over</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-TABLE3-9CONTROLDATAFORGROUPQUARTERSRESIDENTS">TABLE 3-9 CONTROL DATA FOR GROUP QUARTERS RESIDENTS</h4><div class="table-wrap">
 
{|  class="confluenceTable"
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Person Attribute</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Category Number</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Categories</strong></p>
 
|  class="highlight-red confluenceTd" data-highlight-colour="red" | <p class="TableHeadingGray"><strong>Data Source</strong></p>
 
|-
 
|  class="confluenceTd" | <p>Gender</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Male</p>
 
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">Female</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p>Age</p>
 
|  class="confluenceTd" | <p align="center">1</p>
 
|  class="confluenceTd" | <p align="center">Under 18 years</p>
 
|  class="confluenceTd" | <p align="center">Census 2010 and ACS 2006-10</p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">2</p>
 
|  class="confluenceTd" | <p align="center">18 to 64 years</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|-
 
|  class="confluenceTd" | <p></p>
 
|  class="confluenceTd" | <p align="center">3</p>
 
|  class="confluenceTd" | <p align="center">65 years and over</p>
 
|  class="confluenceTd" | <p align="center"></p>
 
|}</div><h4 id="id-3.5SYNTHETICPOPULATION-SampleData"><em>Sample Data</em></h4><p class="BodyParagraph">During population synthesis, individual household and person records are drawn from a disaggregate sample of households to match target distributions of controlled attributes. It may not be possible to control for all desired attributes, so “uncontrolled” attributes are added to the synthetic population from the disaggregate sample data. It is essential that the disaggregate sample is representative of the population of the entire region.</p><p class="BodyParagraph">In most cases, the primary source of disaggregate sample data is Public Use Microdata Sample (PUMS) data, which is now part of the ACS and follows the same sampling framework but provides disaggregate records for households and persons across numerous attributes. PUMS data are sampled and grouped by geographic units known as Public Use Microdata Areas (PUMAs). PUMAs cover contiguous areas of roughly 100,000 people, including persons living in group quarters. For example, a metro area of 850,000 people might be covered by 8 or, more likely, 9 PUMAs. In general, ACS PUMS provides good representative coverage of most regions and is rigorously tested and monitored, so it was used to create the sample data for this effort.</p><h4 id="id-3.5SYNTHETICPOPULATION-PopGenRun"><em>PopGen Run</em></h4><p class="BodyParagraph">The control totals and disaggregate sample data are input into a population synthesizer to generate a synthetic population. The joint distribution of the control attributes from the disaggregate sample is fitted to the control totals. This fitting, or balancing, is generally done using the Iterative Proportional Fitting (IPF) algorithm or a variant of it, which is at the core of most population synthesizers. 
PopGen goes one step further by applying the Iterative Proportional Updating (IPU) algorithm, which matches household-level and person-level attribute controls simultaneously. The final step is to draw individual household and person synthetic records from the disaggregate sample to match the fitted distribution.</p><h3 id="id-3.5SYNTHETICPOPULATION-Microzone/ParcelAllocation"><span style="color: rgb(255,102,0);">Microzone / Parcel Allocation</span></h3><p class="BodyParagraph">As stated previously, DaySim operates at the parcel level. Since population synthesizers typically synthesize households at the TAZ level, the synthetic households must then be assigned to individual parcels or microzones. The process that does this need not be complex. Currently, the synthesized households of a particular TAZ are randomly assigned to parcels within the TAZ based on parcel capacities. The parcel capacity, or the number of housing units on a parcel, is a required field in the base parcel file. The allocation process involves the following steps:</p><ol><li>For each TAZ, parcel capacities are adjusted proportionately so that the sum of the capacities of all parcels within the TAZ equals the total number of households synthesized for that TAZ.</li><li>Each synthetic household within a TAZ is randomly located in one of the parcels within the TAZ.</li></ol><p>The process is the same for the group quarters population, which requires group quarters capacities for each parcel. If there are other sub-populations, such as permanent and seasonal households, they are combined before parcel allocation. Currently, an R script takes the synthetic population from PopGen as input and allocates households to individual parcels. 
The script also processes and recodes other demographic attributes required by DaySim from the sample data (generally Census/ACS PUMS) used in population synthesis. It then outputs a household file and a person file that DaySim can read directly as inputs.</p>
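The two allocation steps can be sketched in Python as follows. This is a minimal sketch, not the R script described above: the input dictionaries, the function names, and the largest-remainder rounding that turns scaled capacities into whole household slots are all assumptions made for illustration, and the parcel IDs are hypothetical.

```python
import math
import random
from collections import Counter


def scale_capacities(capacities, n_households):
    """Step 1: rescale parcel capacities so they sum to the TAZ's household count."""
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("TAZ has no parcel capacity for its synthetic households")
    return {parcel: cap * n_households / total for parcel, cap in capacities.items()}


def integerize(scaled):
    """Round scaled capacities to whole slots while preserving their total
    (largest-remainder rounding; an assumption of this sketch)."""
    counts = {parcel: math.floor(value) for parcel, value in scaled.items()}
    shortfall = round(sum(scaled.values())) - sum(counts.values())
    by_remainder = sorted(scaled, key=lambda p: scaled[p] - counts[p], reverse=True)
    for parcel in by_remainder[:shortfall]:
        counts[parcel] += 1
    return counts


def allocate_households(households_by_taz, capacities_by_taz, seed=None):
    """Step 2: place each synthetic household in a randomly chosen parcel slot of its TAZ."""
    rng = random.Random(seed)
    assignment = {}
    for taz, households in households_by_taz.items():
        counts = integerize(scale_capacities(capacities_by_taz[taz], len(households)))
        slots = [parcel for parcel, n in counts.items() for _ in range(n)]
        rng.shuffle(slots)  # random order, so each household lands on a random parcel slot
        for hh, parcel in zip(households, slots):
            assignment[hh] = parcel
    return assignment


# Hypothetical TAZ 227 with eight synthetic households and three parcels.
households_by_taz = {227: list(range(1, 9))}
capacities_by_taz = {227: {"P1": 10, "P2": 30, "P3": 0}}
result = allocate_households(households_by_taz, capacities_by_taz, seed=1)
print(Counter(result.values()))  # Counter({'P2': 6, 'P1': 2})
```

Because the slots are built from the rescaled capacities, each parcel receives its proportional share of the TAZ's households (up to rounding), and a parcel with zero capacity never receives one. Drawing each household independently with probability proportional to capacity would be a simpler reading of step 2, but it would honor the proportions only on average.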
 

Latest revision as of 14:29, 9 November 2020


Overview

In trip-based models, trip rates are applied to aggregate households grouped in Traffic Analysis Zones (TAZs) to generate trips. In contrast, in an activity-based model (ABM), choices involving activities and trips are simulated for each individual person in each household. Hence, it is necessary to first develop a “synthetic population” of the region’s residents. A synthetic population is a list of households and persons that is based on observed or forecasted distributions of socioeconomic attributes and created by sampling detailed Census microdata. This produces individual household and person agents that are the subjects of the simulation.

Prior to their use in the simulation, synthetic populations are represented in data tables, often in a relational database or an equivalently structured file system. Typically there are separate tables for household and person records. The household records file provides household-level socio-demographic attributes such as household income, size, and number of workers. Similarly, the person records file provides person-level attributes such as age, gender, and employment status. Person records are linked to household records through household ID numbers (the hhno field in Tables 3-2 and 3-3).

TABLE 3-2 SAMPLE HOUSEHOLD RECORDS FILE

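{| class="confluenceTable"
! hhno !! hhsize !! hhvehs !! hhwkrs !! hhftw !! hhptw !! hhret !! hhoad !! hhuni !! hhhsc !! hh515 !! hhcu5 !! hhincome !! hownrent !! hrestype !! hhparcel !! hhtaz !! hhexpfac !! samptype
|-
| 1 || 1 || -1 || 1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 75095 || -1 || 1 || 2699 || 227 || 1 || 11
|-
| 2 || 1 || -1 || 1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 58074 || -1 || 1 || 2699 || 227 || 1 || 11
|-
| 3 || 1 || -1 || 0 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 30288 || -1 || 3 || 23479 || 227 || 1 || 11
|-
| 4 || 1 || -1 || 0 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 1802 || -1 || 3 || 37105 || 227 || 1 || 11
|}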

TABLE 3-3 SAMPLE PERSON RECORDS FILE

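{| class="confluenceTable"
! hhno !! pno !! pptyp !! pagey !! pgend !! pwtyp !! pwpcl !! pwtaz !! pwautime !! pwaudist !! pstyp !! pspcl !! pstaz !! psautime !! psaudist !! puwmode !! puwarrp !! puwdepp !! ptpass !! ppaidprk !! pdiary !! pproxy !! psexpfac
|-
| 1 || 1 || 1 || 41 || 1 || 1 || -1 || -1 || -1 || -1 || 0 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 1
|-
| 2 || 1 || 1 || 34 || 1 || 1 || -1 || -1 || -1 || -1 || 0 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 1
|-
| 3 || 1 || 5 || 26 || 1 || 0 || -1 || -1 || -1 || -1 || 1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 1
|-
| 4 || 1 || 4 || 34 || 2 || 0 || -1 || -1 || -1 || -1 || 0 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || -1 || 1
|}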
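The link between the two files can be checked with a short script. The sketch below copies a subset of the columns from the sample rows in Tables 3-2 and 3-3, groups person records by hhno, and flags households whose hhsize or hhwkrs disagree with their person records. Treating any pwtyp value greater than 0 as a worker is an assumption about the DaySim coding, not something stated in this section, and the function name is hypothetical.

```python
from collections import defaultdict

# Subsets of the sample rows in Tables 3-2 and 3-3.
households = [
    {"hhno": 1, "hhsize": 1, "hhwkrs": 1, "hhtaz": 227},
    {"hhno": 2, "hhsize": 1, "hhwkrs": 1, "hhtaz": 227},
    {"hhno": 3, "hhsize": 1, "hhwkrs": 0, "hhtaz": 227},
    {"hhno": 4, "hhsize": 1, "hhwkrs": 0, "hhtaz": 227},
]
persons = [
    {"hhno": 1, "pno": 1, "pptyp": 1, "pagey": 41, "pwtyp": 1},
    {"hhno": 2, "pno": 1, "pptyp": 1, "pagey": 34, "pwtyp": 1},
    {"hhno": 3, "pno": 1, "pptyp": 5, "pagey": 26, "pwtyp": 0},
    {"hhno": 4, "pno": 1, "pptyp": 4, "pagey": 34, "pwtyp": 0},
]


def check_links(households, persons):
    """Return a list of inconsistencies between household and person records."""
    members = defaultdict(list)
    for person in persons:
        members[person["hhno"]].append(person)

    problems = []
    known = {hh["hhno"] for hh in households}
    for hhno in sorted(members.keys() - known):
        problems.append(f"person records point to missing household {hhno}")

    for hh in households:
        people = members.get(hh["hhno"], [])
        if len(people) != hh["hhsize"]:
            problems.append(f"household {hh['hhno']}: hhsize {hh['hhsize']} but {len(people)} persons")
        # Assumption: pwtyp 0 means "not a worker"; any positive code means a worker.
        workers = sum(1 for p in people if p["pwtyp"] > 0)
        if workers != hh["hhwkrs"]:
            problems.append(f"household {hh['hhno']}: hhwkrs {hh['hhwkrs']} but {workers} workers")
    return problems


print(check_links(households, persons))  # []
```

The sample records are mutually consistent, so the check returns an empty list; an orphaned person record or a mismatched hhsize would each produce one message.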

A population synthesized by any synthetic population generator may be used with DaySim as long as the required household and person socio-demographic attributes are provided in the appropriate format. PopulationSim is an open platform for population synthesis and survey weighting. It emerged from Oregon DOT’s desire to build a shared, open platform that could be easily adapted for statewide, regional, and urban transportation planning needs. It can control for both household-level and person-level demographic attributes simultaneously.

Preparing Synthetic Populations for DaySim

The design of the synthetic population should support the design of the activity-based model (DaySim in this case) and provide the variables it needs. In addition, the activity-based model should only rely on information that can be realistically provided in the synthetic population.

Population synthesis generally covers two sub-populations: those living in regular households and those living in non-institutionalized group quarters such as college dormitories. For this effort, an additional population segment consisting of seasonal households was synthesized. These segments were established to reflect the differences in travel patterns among these sub-populations and to support seasonal analyses. For example, the seasonal population is generally older than the permanent population, has lower workforce participation, and clusters in certain geographic areas. All of these attributes influence travel patterns and the demand for travel.

There are three major steps in creating a synthetic population:

  1. Specifying the inputs to the process: the control variables, the sample households, and the level of geographic resolution. Specifying the control variables is essential. There is often an additional step of specifying uncontrolled variables to be added to the synthetic population.
  2. Running a program that produces the synthetic households.
  3. Transforming the model-generated outputs into the population characteristics used throughout the rest of the model system. This could involve creating categorical variables from continuous variables, reformulating income, or allocating households from the zonal level to a finer level of geographic resolution, such as a parcel.
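As an illustration of the third step, the sketch below reformulates income and turns it into categories. It assumes, without confirmation from this document, that the HHINCADJ field used in Table 3-5 is the PUMS household income HINCP scaled by the PUMS adjustment factor ADJINC, which is published with six implied decimal places. The adjustment factor in the example is hypothetical, and the category bounds mirror the hh_inc controls in Table 3-5.

```python
import math

ADJINC_SCALE = 1_000_000  # ADJINC is published with six implied decimal places

# (control name, inclusive upper bound) pairs mirroring the hh_inc controls in Table 3-5.
INCOME_CATEGORIES = [
    ("hh_inc_0_25", 24_999),
    ("hh_inc_25_60", 59_999),
    ("hh_inc_60_100", 99_999),
    ("hh_inc_100_plus", math.inf),
]


def adjusted_income(hincp, adjinc):
    """Convert reported household income to constant dollars using ADJINC (assumed)."""
    return hincp * adjinc / ADJINC_SCALE


def income_category(income):
    """Return the first category whose inclusive upper bound is not exceeded."""
    for name, upper in INCOME_CATEGORIES:
        if income <= upper:
            return name
    raise ValueError(f"income {income} is not covered by any category")


income = adjusted_income(58_000, 1_050_000)  # hypothetical factor of 1.05
print(round(income, 2), income_category(income))  # 60900.0 hh_inc_60_100
```

The example shows why the order of operations matters: the reported income falls in the 25 to 60 thousand category, but the adjusted income crosses into the next one.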

DaySim Person Types

Although persons are modeled in disaggregate form in an ABM, it is often useful to create person type categories. DaySim uses eight such person types. Person type categories may be used for various purposes:

  1. As a basic segmentation for certain models, such as daily activity pattern models
  2. To summarize and compare observed versus estimated data and calibrate models
  3. As explanatory variables in models
  4. As constraints on the alternatives that are available; for example, work and school activities are only available to workers and students, and driving is restricted by age

TABLE 3-4 DAYSIM PERSON TYPES

{| class="confluenceTable"
! No. !! Person Type !! Age !! Work Status !! School Status
|-
| 1 || Full-time worker || 18 or more || Full-time || None/Part-time
|-
| 2 || Part-time worker || 18 or more || Part-time || None/Part-time
|-
| 3 || Retired person || 65 or more || Unemployed || 
|-
| 4 || Non-working adult || Less than 65 || Unemployed || None/Part-time
|-
| 5 || University student || 18 or more || Unemployed/Part-time || Full-time
|-
| 6 || High school student || 16 or more || Unemployed/Part-time || Full-time
|-
| 7 || Primary school child || 5-15 || Unemployed || Full-time
|-
| 8 || Preschool child || 0-4 || Unemployed || None
|}
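Table 3-4 can be expressed as a classification function. The sketch below is one possible reading of the table, not DaySim's own code. The table does not say how to separate full-time students aged 18 or more into high school and university students, how to treat a full-time worker who is also a full-time student, or how to classify a 16- or 17-year-old who is not in school, so the precedence rules marked as assumptions in the comments are illustrative choices. The status strings are hypothetical. The usage lines reproduce the pptyp values 1, 5 and 4 of the sample persons in Table 3-3, assuming their pwtyp and pstyp codes of 1 denote full-time status.

```python
FULL_TIME, PART_TIME, NONE = "full-time", "part-time", "none"


def daysim_person_type(age, work, school):
    """Map age, work status and school status to a Table 3-4 person type (1-8)."""
    if age <= 4:
        return 8  # preschool child
    if age <= 15:
        return 7  # primary school child
    if age <= 17:
        return 6  # assumption: all 16- and 17-year-olds are high school students
    if work == FULL_TIME:
        return 1  # full-time worker (assumed to take precedence over full-time school)
    if school == FULL_TIME:
        return 5  # university student: unemployed or part-time worker
    if work == PART_TIME:
        return 2  # part-time worker not in school full time
    if age >= 65:
        return 3  # retired person
    return 4  # non-working adult under 65


# The three distinct person types among the sample persons in Table 3-3.
print(daysim_person_type(41, FULL_TIME, NONE))  # 1
print(daysim_person_type(26, NONE, FULL_TIME))  # 5
print(daysim_person_type(34, NONE, NONE))       # 4
```

Checking the work and school conditions in a fixed order keeps each person in exactly one type, which matters when the types are used for segmentation and calibration summaries.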

Control Attributes and Target Distributions

Population synthesis requires three major inputs. The first step is to identify a set of control attributes and their levels. Next, target distributions of the control attributes and their levels are derived at appropriate geographic units. These target distributions are also known as marginal control totals because they represent the margins of a joint distribution of multiple attributes. Typically, the finest spatial resolution at which attributes can be feasibly and reliably controlled is used. If control attribute totals are not accurate at a particular spatial unit, they can be specified at a coarser resolution.
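To make the link between a joint distribution and its margins concrete, the sketch below fits a seed cross-tabulation of household size by number of workers to row and column control totals using Iterative Proportional Fitting (IPF), the classic fitting procedure discussed in the PopulationSim Run section. All numbers are hypothetical. Cells that are zero in the seed, such as one-person households with two or more workers, stay zero, which is one reason the sparsity of the seed sample matters.

```python
def ipf_2d(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Scale a 2-D seed table until its row and column sums match the targets."""
    if abs(sum(row_targets) - sum(col_targets)) > tol * sum(row_targets):
        raise ValueError("row and column targets must have the same total")
    table = [list(map(float, row)) for row in seed]
    for _ in range(max_iter):
        for i, target in enumerate(row_targets):  # match each row margin
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        for j, target in enumerate(col_targets):  # match each column margin
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly, so convergence only needs a row check.
        if all(abs(sum(row) - t) <= tol * max(t, 1.0) for row, t in zip(table, row_targets)):
            return table
    raise RuntimeError("IPF did not converge; check that the seed can reach the targets")


# Rows: 1, 2, 3+ person households; columns: 0, 1, 2+ workers (hypothetical sample counts).
seed = [[20, 5, 0],
        [10, 15, 5],
        [5, 10, 15]]
fitted = ipf_2d(seed, row_targets=[300, 400, 300], col_targets=[250, 450, 300])
for row in fitted:
    print([round(v, 1) for v in row])
```

The fitted table keeps the seed's internal correlation pattern as far as the margins allow, while its row and column sums reproduce the marginal control totals.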

The following considerations are usually important in choosing control variables:

  • The number of control variables is important. If there are too few, the synthetic population may not accurately reflect the true population. On the other hand, too many control attributes may cause sample sparsity issues: there may be no sample households with a given joint combination of the control variables, which could distort the synthetic population.
  • Control attributes may be single- or multi-dimensional. A multi-dimensional attribute can be treated as a single-dimensional attribute whose number of categories equals the product of the numbers of categories of the individual attributes. The primary advantage of multi-dimensional attributes is more precise regional control over the correlation between attributes. The disadvantage, again, is sample sparsity.
  • The best choices of variables are meaningful attributes that are somewhat “orthogonal” to each other, meaning that their variation in the population is largely independent. Conversely, if two attributes are highly correlated, controlling for both may not achieve much more than controlling for just one.
  • Finally, different sets of control attributes may be used for the base and forecast years if forecasting accuracy is limited. This is not necessarily desirable, though: the ability to forecast marginal control totals should be a consideration when specifying control attributes for the base year.
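The points about multi-dimensional attributes and sparsity can be illustrated together. The sketch below collapses household size (1, 2, 3, 4+) and number of workers (0, 1, 2, 3+), the levels used by the hh_size and hh_wrks controls in Table 3-5, into a single attribute with 4 × 4 = 16 categories, then lists the joint categories that have no sample households. The sample is hypothetical, and skipping cells with more workers than persons is an assumption about which combinations are impossible.

```python
from collections import Counter

SIZE_LEVELS = 4    # 1, 2, 3 and 4+ persons
WORKER_LEVELS = 4  # 0, 1, 2 and 3+ workers


def size_level(persons):
    return min(persons, 4) - 1  # 0..3


def worker_level(workers):
    return min(workers, 3)  # 0..3


def joint_category(persons, workers):
    """Single-dimensional index (0..15) for the combined size-by-workers attribute."""
    return size_level(persons) * WORKER_LEVELS + worker_level(workers)


def empty_joint_cells(sample):
    """Return (size level, worker level) cells with no sample households,
    ignoring cells that are impossible because workers would exceed persons."""
    counts = Counter(joint_category(p, w) for p, w in sample)
    empty = []
    for s in range(SIZE_LEVELS):
        for w in range(WORKER_LEVELS):
            impossible = w > s + 1  # e.g. a 1-person household with 2+ workers
            if not impossible and counts[s * WORKER_LEVELS + w] == 0:
                empty.append((s, w))
    return empty


# Hypothetical sample households as (persons, workers) pairs.
sample = [(1, 0), (1, 1), (2, 0), (2, 1), (2, 2), (3, 1), (4, 2), (5, 3)]
print(empty_joint_cells(sample))  # [(2, 0), (2, 2), (2, 3), (3, 0), (3, 1)]
```

Levels are zero-based, so (2, 0) means three-person households with no workers. Each empty but feasible cell is a combination that a joint control could ask for and the sample could not supply.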

Target distributions of control variables for the base year could be obtained from a variety of data sources including the following:

  • Decennial Census: ~100% sample
  • American Community Survey (ACS) summary files: roughly 3% annual sample; the rolling 5-year sample covers an estimated ~15% of the population
  • Bureau of Economic and Business Research (BEBR)
  • Census Transportation Planning Products (CTPP)
  • Other zonal data developed locally (TAZs)

For the forecast year, regional socio-economic forecasts or outputs from a land-use model are often used.

The following table provides the list of control attributes, their geographic and demographic levels along with the relative importance of each control.

TABLE 3-5 PopulationSim Controls

{| class="confluenceTable"
! target !! geography !! seed_table !! importance !! control_field !! expression
|-
| num_hh || MAZ || households || 1000000000 || HHS || (households.WGTP > 0) & (households.WGTP < np.inf)
|-
| hh_size_1 || TAZ || households || 5000 || HHSIZE1_S3 || households.NP == 1
|-
| hh_size_2 || TAZ || households || 5000 || HHSIZE2_S3 || households.NP == 2
|-
| hh_size_3 || TAZ || households || 5000 || HHSIZE3_S3 || households.NP == 3
|-
| hh_size_4 || TAZ || households || 5000 || HHSIZE4M_S3 || households.NP >= 4
|-
| hh_age_15_to_44 || TAZ || households || 5000 || HHAGE1_S3 || (households.AGEHOH > 14) & (households.AGEHOH <= 44)
|-
| hh_age_45_to_64 || TAZ || households || 5000 || HHAGE2_S3 || (households.AGEHOH > 44) & (households.AGEHOH <= 64)
|-
| hh_age_65_abv || TAZ || households || 5000 || HHAGE3_S3 || (households.AGEHOH > 64) & (households.AGEHOH <= np.inf)
|-
| hh_wrks_0 || TAZ || households || 5000 || HHWRK1_S3 || households.NWESR == 0
|-
| hh_wrks_1 || TAZ || households || 5000 || HHWRK2_S3 || households.NWESR == 1
|-
| hh_wrks_2 || TAZ || households || 5000 || HHWRK3_S3 || households.NWESR == 2
|-
| hh_wrks_3m || TAZ || households || 5000 || HHWRK4_S3 || households.NWESR >= 3
|-
| hh_inc_0_25 || TAZ || households || 5000 || HHINC1_S3 || (households.HHINCADJ > -999999999) & (households.HHINCADJ <= 24999)
|-
| hh_inc_25_60 || TAZ || households || 5000 || HHINC2_S3 || (households.HHINCADJ > 24999) & (households.HHINCADJ <= 59999)
|-
| hh_inc_60_100 || TAZ || households || 5000 || HHINC3_S3 || (households.HHINCADJ > 59999) & (households.HHINCADJ <= 99999)
|-
| hh_inc_100_plus || TAZ || households || 5000 || HHINC4_S3 || (households.HHINCADJ > 99999) & (households.HHINCADJ <= 999999999)
|-
| person_male || SCOUNTY || persons || 1000 || MALE_S || persons.SEX == 1
|-
| person_female || SCOUNTY || persons || 1000 || FEMALE_S || persons.SEX == 2
|-
| person_age0to4 || SCOUNTY || persons || 1000 || AGE0to4_S || (persons.AGEP > 0) & (persons.AGEP <= 4)
|-
| person_age5to17 || SCOUNTY || persons || 1000 || AGE5to17_S || (persons.AGEP >= 5) & (persons.AGEP <= 17)
|-
| person_age18to24 || SCOUNTY || persons || 1000 || AGE18to24_S || (persons.AGEP >= 18) & (persons.AGEP <= 24)
|-
| person_age25to54 || SCOUNTY || persons || 1000 || AGE25to54_S || (persons.AGEP >= 25) & (persons.AGEP <= 54)
|-
| person_age55m || SCOUNTY || persons || 1000 || AGE55M_S || persons.AGEP >= 55
|}
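Each row of Table 3-5 pairs a control total with an expression over the seed table that decides which seed records count toward that control. The sketch below mimics that idea in plain Python for several household-level rows, using predicates that mirror the table's expressions; it is not PopulationSim code, and the seed records and their WGTP, NP and NWESR values are hypothetical. Summing the incidence weighted by WGTP gives seed-based totals that can be compared with the controls. Note that, as written in the table, the person_age0to4 expression requires AGEP > 0, so persons with an AGEP of 0 fall under none of the person age controls.

```python
import math

# Hypothetical seed households using the PUMS field names that appear in Table 3-5.
seed_households = [
    {"WGTP": 12, "NP": 1, "NWESR": 1},
    {"WGTP": 20, "NP": 2, "NWESR": 2},
    {"WGTP": 8, "NP": 4, "NWESR": 1},
    {"WGTP": 15, "NP": 5, "NWESR": 3},
]

# Plain-Python counterparts of some household-level expressions in Table 3-5.
HOUSEHOLD_CONTROLS = {
    "num_hh": lambda h: 0 < h["WGTP"] < math.inf,
    "hh_size_1": lambda h: h["NP"] == 1,
    "hh_size_2": lambda h: h["NP"] == 2,
    "hh_size_3": lambda h: h["NP"] == 3,
    "hh_size_4": lambda h: h["NP"] >= 4,
    "hh_wrks_0": lambda h: h["NWESR"] == 0,
    "hh_wrks_1": lambda h: h["NWESR"] == 1,
    "hh_wrks_2": lambda h: h["NWESR"] == 2,
    "hh_wrks_3m": lambda h: h["NWESR"] >= 3,
}


def incidence_table(households, controls):
    """One row per seed household: 1 if it counts toward a control, else 0."""
    return [{name: int(test(h)) for name, test in controls.items()} for h in households]


def weighted_totals(households, incidence):
    """Seed-based estimate of each control: the WGTP-weighted sum of incidence."""
    totals = dict.fromkeys(incidence[0], 0)
    for h, row in zip(households, incidence):
        for name, flag in row.items():
            totals[name] += h["WGTP"] * flag
    return totals


incidence = incidence_table(seed_households, HOUSEHOLD_CONTROLS)
print(weighted_totals(seed_households, incidence))
```

Within each group of categories (sizes, workers) every household matches exactly one control, which is what keeps the grouped controls consistent with the num_hh total. In a household-based balancing like this, person-level controls would instead give each household the number of its persons that satisfy the expression.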

Sample Data

During population synthesis, individual household and person records are drawn from a disaggregate sample of households to match target distributions of controlled attributes. It may not be possible to control for all desired attributes, so “uncontrolled” attributes are added to the synthetic population from the disaggregate sample data. It is essential that the disaggregate sample is representative of the population of the entire region.

In most cases, the primary source of disaggregate sample data is Public Use Microdata Sample (PUMS) data, which is now part of the ACS and follows the same sampling framework but provides disaggregate records for households and persons across numerous attributes. PUMS data are sampled and grouped by geographic units known as Public Use Microdata Areas (PUMAs). PUMAs cover contiguous areas of roughly 100,000 people, including persons living in group quarters. For example, a metro area of 850,000 people might be covered by 8 or, more likely, 9 PUMAs. In general, ACS PUMS provides good representative coverage of most regions and is rigorously tested and monitored, so it was used to create the sample data for this effort.

PopulationSim Run

The control totals and disaggregate sample data are input into a population synthesizer to generate a synthetic population. According to the PopulationSim wiki, 'the objective of a population synthesizer is to generate household weights which satisfies the marginal control distributions. This is achieved by use of a data fitting technique. The most common fitting technique used by various population synthesizers is the Iterative Proportional Fitting (IPF) procedure. Generally, the IPF procedure is used to obtain joint distributions of demographic variables. Then, random sampling from PUMS generates the baseline synthetic population.

One of the limitations of the simple IPF method is that it does not incorporate both household and person level attributes simultaneously. Some population synthesizers use a heuristic algorithm called the Iterative Proportional Updating Algorithm (IPU) to incorporate both person and household-level variables in the fitting procedure.

Besides IPF, entropy maximization algorithms have been used as a fitting technique. In most of the entropy based methods, the relative entropy is used as the objective function. The relative entropy based optimization ensures that the least amount of new information is introduced in finding a feasible solution. The base entropy is defined by the initial weights in the seed sample. The weights generated by the entropy maximization algorithm preserves the distribution of initial weights while matching the marginal controls. This is an advantage of the entropy maximization based procedures over the IPF based procedures. PopulationSim uses the entropy maximization based list balancing to match controls specified at various geographic levels.'
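The entropy-based list balancing described in the quotation can be sketched as follows. This is not PopulationSim's implementation; it is a minimal coordinate-ascent sketch under stated assumptions. Each seed household i has an initial weight w0_i and an incidence a_ik ≥ 0 for each control k (0 or 1 for household controls, a person count for person controls). The balanced weights take the form w_i = w0_i · exp(Σ_k λ_k a_ik), which is the form of the solution that minimizes the relative entropy Σ_i [w_i ln(w_i / w0_i) − w_i + w0_i] subject to the control totals. Each pass solves for one control's multiplier by bisection so that control is matched exactly, then moves on to the next. The function names, incidences, and targets are hypothetical.

```python
import math


def _solve_multiplier(weights, column, target):
    """Find delta such that sum(w_i * a_i * exp(delta * a_i)) == target, by bisection."""
    def total(delta):
        return sum(w * a * math.exp(delta * a) for w, a in zip(weights, column) if a)

    if target <= 0 or total(0.0) == 0:
        raise ValueError("each control needs a positive target and at least one matching record")
    lo, hi = -1.0, 1.0
    while total(lo) > target:  # total() increases with delta, so widen the bracket
        lo *= 2
    while total(hi) < target:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def balance(initial_weights, incidence, targets, tol=1e-8, max_sweeps=1000):
    """Adjust weights multiplicatively until every control total is matched."""
    weights = [float(w) for w in initial_weights]
    columns = list(zip(*incidence))  # one tuple of incidence values per control
    for _ in range(max_sweeps):
        for column, target in zip(columns, targets):
            delta = _solve_multiplier(weights, column, target)
            weights = [w * math.exp(delta * a) for w, a in zip(weights, column)]
        errors = [abs(sum(w * a for w, a in zip(weights, column)) - t) / t
                  for column, t in zip(columns, targets)]
        if max(errors) < tol:
            return weights
    raise RuntimeError("controls not matched; they may be inconsistent with the seed")


# Hypothetical seed: incidence columns are [num_hh, hh_size_1, male persons].
incidence = [
    [1, 1, 1],  # one-person household, one male
    [1, 1, 0],  # one-person household, no males
    [1, 0, 1],  # two-person household, one male
    [1, 0, 2],  # three-person household, two males
]
targets = [100, 40, 90]
weights = balance([1, 1, 1, 1], incidence, targets)
print([round(w, 2) for w in weights])
```

If the initial weights already satisfy every control, each multiplier comes out as zero and the weights are returned unchanged, which illustrates the "least amount of new information" property described in the quotation.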