States and the International System
See also the resources on my companion page for the ISA Compendium's "Review of Available Data Sets" article, which may have been updated more recently than some of the resources on this page.
The other international data pages on this site may also contain relevant data.
COW Interstate System: The latest official list of all members of the Correlates of War Project's interstate system, including all major powers, the composition of the system, and all dyads in the system.
- Spatial-Temporal Domain: Entire world, 1816-2004
- Variables Included: States.csv: COW state abbreviation, number, name, and entry and exit years of statehood; Majors.csv: entry and exit dates of major power status; System.csv: annual composition of the interstate system (one entry per nation-year); Dyads.csv: annual dyadic composition of the interstate system (one entry per nondirectional dyad-year).
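For readers unfamiliar with the nation-year and nondirectional dyad-year units described above, the following is a minimal sketch of how a list of entry and exit years could be expanded into those two formats. It does not read the actual COW files; the records, column order, and function names are hypothetical, and the entry years shown are illustrative only. A state with several separate spells of system membership would need one record per spell.

```python
from itertools import combinations

# Hypothetical membership records: (state number, abbreviation, entry year, exit year).
# These values are illustrative and are not rows copied from States.csv.
STATES = [
    (2, "USA", 1816, 2004),
    (200, "UKG", 1816, 2004),
    (20, "CAN", 1920, 2004),
]

def nation_years(states, first_year, last_year):
    """Return sorted (year, state number) rows, one per nation-year."""
    rows = []
    for number, _abbrev, entry, exit_ in states:
        for year in range(max(entry, first_year), min(exit_, last_year) + 1):
            rows.append((year, number))
    return sorted(rows)

def dyad_years(system_rows):
    """Return (year, lower number, higher number) rows, one per nondirectional dyad-year."""
    members_by_year = {}
    for year, number in system_rows:
        members_by_year.setdefault(year, []).append(number)
    rows = []
    for year in sorted(members_by_year):
        # combinations() on a sorted list yields each unordered pair exactly once.
        for a, b in combinations(sorted(members_by_year[year]), 2):
            rows.append((year, a, b))
    return rows

system = nation_years(STATES, 1919, 1921)
dyads = dyad_years(system)
```

Because each pair appears only once per year, with the lower state number first, this corresponds to the nondirectional format; a directed format would list both (A, B) and (B, A).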
Gleditsch and Ward Interstate System: An alternative list of states in the system described in a 1999 International Interactions article, based on somewhat different coding rules from the COW system and with somewhat different dates for many states. The authors provide a list of qualifying states, a list of microstates, and documentation on each case that is included.
- Spatial-Temporal Domain: Entire world, 1816-2002
- Variables Included: State number, name, and abbreviation; dates of system entry and exit.
EUGene Software: Expected Utility Generation and data management program, written by D. Scott Bennett and Allan Stam; this is a Windows-only program that generates data sets for the study of international conflict, with a variety of commonly used variables. More data sets and variables are added frequently, so this list may not be complete -- please check the official EUGene web site for the latest list and any updates.
- Spatial-Temporal Domain: EUGene can generate data sets at the directed dyad-year, nondirected dyad-year, country-year, and directed dispute dyad units of analysis, 1816-1993 (or any subset of this time period or of these cases, such as contiguous, major power, or "politically relevant" dyads)
- Variables Included: EUGene can generate data sets with expected utility data and risk scores, tau-b scores, Polity democracy and democratization data, COW capabilities and alliance data, contiguity and distance between countries, and COW militarized dispute and/or war data (perhaps among others -- the EUGene web site indicates any additions to this list).
Russett/Oneal Triangulating Peace data: Replication data for Bruce Russett and John Oneal's 2001 book. These data sets have been widely used by other scholars as a starting point for their own analyses. The above link is to the STATA version of the data; they also provide an ASCII version.
- Spatial-Temporal Domain: Varies by data set; generally includes the entire international system for the 1950-1989 period.
- Variables Included: Varies by data set; generally includes data on international system membership, trade, alliances, international organization membership, civilizational identity, and various control variables.
ICOW Historical State Names data: From the Issue Correlates of War, or ICOW, Project. This file contains a PDF document with alternative names for nation-states in the COW interstate system, including traditional names, alternate spellings of common names, foreign spellings, and some colonial-era names. This data set is important for scholars attempting to code historical data, as many older source materials use country names that are no longer used or understood, meaning that coders might ignore or mis-code data.
- Spatial-Temporal Domain: entire world, 1816-2001 (approximately; this is intended to be a supplement to the latest version of the COW interstate system, which is available above)
- Variables Included: COW country code and name, alternative names for state
ICOW Colonial History data: From the ICOW Project. This file lists each nation-state's colonial rulers as well as dates and processes of independence. The data are provided as both an Excel spreadsheet and a CSV comma-delimited file, with documentation in an RTF word processing document; the folder containing these files is compressed in ZIP format.
- Spatial-Temporal Domain: entire world, 1816-2004 (approximately; this is intended to be a supplement to the latest version of the COW interstate system, which is available above)
- Variables Included: COW country code and name, dates of interstate system membership, name of colonial ruler, date and process of independence (including comparable data from the COW Territorial Change and Polity 2 data sets as well as ICOW).
List of Stamp-Issuing Entities: From Linn's Stamp Magazine; technically not an academic data resource, but nonetheless an informative analysis of the numerous entities -- including states, dependencies, quasi-states, and others -- that have issued stamps at various times in the last two centuries. Most entities' entries have useful information about histories, colonial rule, dates of independence, and similar topics. Linn's also provides a country name cross-index to help trace names that have changed over time.
- Spatial-Temporal Domain: unclear (apparently covering the time since stamps were first issued)
- Variables Included: none (textual description of each entity's history)
Independent States in the World and Dependencies and Areas of Special Sovereignty: Lists from the U.S. State Department.
- Spatial-Temporal Domain: current
- Variables Included: none (list of cases)
UNEP Islands Home Page and Island Directory Home Page: Lists of islands arranged by country and by island name.
- Spatial-Temporal Domain: current
- Variables Included: Island area, elevation, geological characteristics, environmental characteristics (such as native species), population density, human impact.
State Capabilities
This heading is meant to cover general compilations of data on states' capabilities. Data sets on specific dimensions of capabilities are listed elsewhere on this data site, such as economic data, environmental data, and social and demographic data.
COW National Material Capabilities data: From the Correlates of War project.
- Spatial-Temporal Domain: COW interstate system, 1816-2001
- Variables Included: Military capabilities: military personnel and military expenditures; Industrial capabilities: iron/steel production and energy consumption; demographic capabilities: total population and urban population.
EUGene Software: Expected Utility Generation and data management program, written by D. Scott Bennett and Allan Stam; this is a Windows-only program that generates data sets for the study of international conflict, with a variety of commonly used variables. More data sets and variables are added frequently, so this list may not be complete -- please check the official EUGene web site for the latest list and any updates.
- Spatial-Temporal Domain: EUGene can generate data sets at the directed dyad-year, nondirected dyad-year, country-year, and directed dispute dyad units of analysis, 1816-1993 (or any subset of this time period or of these cases, such as contiguous, major power, or "politically relevant" dyads)
- Variables Included: EUGene can generate data sets with expected utility data and risk scores, tau-b scores, Polity democracy and democratization data, COW capabilities and alliance data, contiguity and distance between countries, and COW militarized dispute and/or war data (perhaps among others -- the EUGene web site indicates any additions to this list).
Alliances, Treaties, and Organizations
My International Law and International Organizations pages and the economic regionalism section of my international economic data page may also contain relevant data.
Alliance Treaty Obligations and Provisions (ATOP) alliance data: A data set on international alliances that improves on past alliance data by coding the specific obligations and provisions involved in each alliance treaty. The data set was collected by Brett Ashley Leeds and a number of her colleagues, including Andrew G. Long, Michaela Mattes, Sara McLaughlin Mitchell, Jeffrey M. Ritter, and Burcu Savun. The web site now offers downloadable data, the codebook, the individual case codesheets, a search facility, and a list of publications using the data.
- Spatial-Temporal Domain: All formal alliance treaties, 1815-2003
- Variables Included: Varies by data set; the data includes six different units of analysis: the alliance, the alliance phase, the alliance member, the state-year, the dyad-year, and the directed dyad-year.
COW Formal Alliance data: From the Correlates of War project.
- Spatial-Temporal Domain: All international alliances, 1816-2000
- Variables Included: Alliance dates, members, and type; additional variables indicate whether the alliance began before 1816 or continued after 2000.
Doug Gibler's 1648-1815 Alliance Data: An extension of the basic COW alliance data set to the centuries before the Congress of Vienna, introduced in Gibler's 1999 International Interactions article "An Extension of the Correlates of War Formal Alliance Data Set." Because the standard COW interstate system list only covers the period since 1816, Gibler has also created a 1648-1815 system list, which will be needed for users of this data set. Note that the download for this data set has disappeared with Gibler's move from Kentucky to Alabama; interested users should contact him.
- Spatial-Temporal Domain: All international alliances, 1648-1815
- Variables Included: Alliance dates, members, type, reason for termination
Doug Gibler's Territorial Settlement Alliance Data: From Gibler's 1996 Conflict Management and Peace Science article "Alliances that Never Balance: The Territorial Settlement Treaty." This is an Excel spreadsheet with a list of all alliances from the larger COW alliance data set that contained territorial settlement provisions, with additional details about each one. Note that the download for this data set has disappeared with Gibler's move from Kentucky to Alabama; interested users should contact him.
- Spatial-Temporal Domain: All international alliances with territorial settlement provisions that were begun 1816-1977
- Variables Included: Alliance dates, members, Singer/Small alliance type, Gibler alliance type (including specific territory settled)
ICOW Multilateral Treaties of Pacific Settlement (MTOPS) data: The ICOW Project's data on signature/ratification of multilateral treaties and regional or global organizations that call for the pacific settlement of disputes among their members, and/or for respect of member states' territorial integrity; there are also separate files with the associated codebook, documentation of included cases, and notes about excluded cases in PDF format.
- Spatial-Temporal Domain: All members of COW interstate system, 1816-2005
- Variables Included: Country code and name, dates of system membership, dates of acceptance of each global or regional organization, total number of treaty obligations for pacific dispute settlement and for territorial integrity in each year; available as raw treaty-level data as well as compiled into state-year and dyad-year-level data sets for ease of merging with other data.
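As a rough illustration of what merging state-year data with dyad-year data involves, here is a minimal sketch that attaches each member's yearly value to a dyad-year row by matching on country code and year. The records, field names, and counts are all hypothetical and are not taken from the MTOPS files.

```python
# Hypothetical state-year records keyed by (country code, year); the counts are made up.
treaty_counts = {
    (2, 1950): 3,
    (20, 1950): 2,
    (2, 1951): 4,
}

# Hypothetical dyad-year rows: (year, code of state A, code of state B).
dyad_rows = [(1950, 2, 20), (1951, 2, 20)]

def merge_state_onto_dyads(dyads, state_values):
    """Attach both members' state-year values to each dyad-year row.

    A member with no state-year record gets None, so gaps stay visible
    instead of being silently treated as zero.
    """
    merged = []
    for year, a, b in dyads:
        merged.append({
            "year": year,
            "state_a": a,
            "state_b": b,
            "count_a": state_values.get((a, year)),
            "count_b": state_values.get((b, year)),
        })
    return merged

merged = merge_state_onto_dyads(dyad_rows, treaty_counts)
```

Leaving unmatched cases as missing rather than zero is a deliberate choice in this sketch, since a missing state-year may reflect a gap in coverage rather than the absence of any treaty obligations.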
Russett et al. Data Sets: A variety of data sets employed in publications by Bruce Russett and colleagues -- including John Oneal, Michaelene Cox, David Davis, and Harry Bliss -- that examine international conflict in the post-World War II period. These data sets have been widely used by other scholars as a starting point for their own analyses, and include data on both alliance and international organization membership since 1950 that is not currently available elsewhere.
- Spatial-Temporal Domain: Varies by data set; generally includes the entire international system for the 1950-1989 period.
- Variables Included: Varies by data set; generally includes data on international system membership, trade, alliances, international organization membership, civilizational identity, and various control variables.
International Geographic Data
My Maps and Interactive Geography Tools, Borders and Territory, Demographic Topics, Environmental Topics, and ICOW Project pages may also contain relevant information.
COW Direct Contiguity and Colonial/Dependency Contiguity data sets: From the Correlates of War project; hosted by Paul Hensel at the University of North Texas. This data set includes contiguity both by a direct land border and by up to 400 miles of sea.
- Spatial-Temporal Domain: all contiguous dyads, 1816-2000
- Variables Included: country codes, year, type of contiguity
Minimum Distance Data and Distance between Capital Cities Data: From Kristian Gleditsch and Mike Ward, as introduced in their 2001 Journal of Peace Research article. Two different measures of distance between each country in the international system, available in several different formats and for both the Gleditsch-Ward and COW international systems.
- Spatial-Temporal Domain: international system, 1875-1998
- Variables Included: country codes, country names, distance
Distance between Capital Cities: From economist Jon Haveman.
- Spatial-Temporal Domain: (unclear)
- Variables Included: country codes, country names, distance (in miles)
CEPII Geodesic Distance data: From the Centre d'Etudes Prospectives et d'Informations Internationales.
- Spatial-Temporal Domain: 225 countries, current/recent
- Variables Included: Country-level: geographical coordinates of capital city, languages spoken, geographic area, landlocked status, continent. Dyadic-level: several measures of distance between countries, contiguity, common language, common colonizer, prior colonial link.
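To show how a distance between two capitals can be derived from geographical coordinates, the following minimal sketch uses the standard haversine (great-circle) formula on a spherical Earth. This is not necessarily the formula used in any of the data sets listed here; the Earth radius of 6371 km, the function name, and the approximate city coordinates are my own assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in kilometers between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Approximate coordinates, for illustration only.
PARIS = (48.8566, 2.3522)
LONDON = (51.5074, -0.1278)

paris_london_km = great_circle_km(*PARIS, *LONDON)
```

A spherical approximation of this kind can differ slightly from ellipsoidal geodesic calculations, so small discrepancies with published distance figures are to be expected.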
Shatterbelt data: State-level data on the shatterbelt status of geographic regions, as used in Hensel and Diehl's 1994 Political Geography article -- now extended through 1992 by David Reilly. This data set is compressed in .ZIP format and is provided in .CSV comma-delimited format, along with a brief codebook in PDF format.
- Spatial-Temporal Domain: entire world, 1945-1992
- Variables Included: country code and name, region, shatterbelt status of region (includes beginning and ending of shatterbelt status)
ICOW Territorial Claims data: From the Issue Correlates of War, or ICOW, Project.
- Spatial-Temporal Domain: currently Western Hemisphere and Europe, 1816-2001
- Variables Included: list of territorial claims with name of claimed territory, names of participants, and beginning/ending dates of claim
COW Territorial Change Data Set: Collected by Paul Diehl, Gary Goertz, Jaroslav Tir, and Phil Schafer.
- Spatial-Temporal Domain: 1816-2000
- Variables Included: Territorial Change Data: (list forthcoming); Initial System Membership Data: (list forthcoming)
Gallup/Mellinger/Sachs geography data: Available through Harvard's Center for International Development geography data web page in STATA and ASCII formats.
- Spatial-Temporal Domain: recent snapshot (not available in any time series format)
- Variables Included: infectious disease (affected areas, population in affected areas, area and population affected by malaria); physical geography (latitude and longitude of each country's centroid, mean elevation, mean distance to nearest ice-free coastline, mean distance to nearest ice-free coastline or sea-navigable river, distance from a country's centroid to nearest coastline, distance from a country's centroid to nearest coastline or sea-navigable river, total population, percent of population within 100km of the coastline, percent of the population within 100km of the nearest coastline or sea-navigable river, percent of land area within 100km of the coastline, percent of land area within 100km of the nearest coastline or sea-navigable river, percent of population in the geographic tropics, percent land area in the geographic tropics, and the typical population density an average person experiences); climate zones; agriculture (FAO soil suitability (two types), percent of Matthews' cultivated land in each Köppen-Geiger climate zone, percent of Matthews' cultivated land -- using a revised classification scheme -- in each Köppen-Geiger climate zone, percent land area in each Köppen-Geiger climate zone weighted by Matthews' cultivated land, percent land area in each Köppen-Geiger climate zone weighted by a revised Matthews' cultivated land classification).
Getty Thesaurus of Geographic Names: A searchable interface allowing you to look up any geographic feature; the database will return details such as the latitude and longitude of the feature.
- Spatial-Temporal Domain: current
- Variables Included: place name, type, latitude and longitude, hierarchical location
Social Science Data Collections
These resources offer access to a wide variety of social science data sets, some of which are included in the other data pages on this site but many of which are not (usually because they fall outside the general categories used here).
- Center for International Development data archive (at Harvard University; access to a variety of data sets related to international development, with a particular emphasis on geographic factors in many of their data sets)
- The Center for International Earth Science Information Network (CIESIN) at Columbia University offers a variety of data sets; see especially their data subject list, downloadable data (much of it in GIS format), and world data center for human interactions in the environment
- The Correlates of War (COW) project has generated many of the most widely used data sets for the quantitative study of world politics; see especially their online data archive.
- Council of European Social Science Data Archives (an online interface to numerous European social science data sets provided by at least twenty different data archives)
- Internet Crossroads in Social Science Data and Online Data Archive Service (from the Data and Program Library Service at the University of Wisconsin)
- The Interuniversity Consortium for Political and Social Research (ICPSR) Web Site, Data Archive, and Publication-Related Archive offer access to a wide variety of social science data sets
- ISA Data Archive
- Political, Social, and Economic Data Archives (from Richard Kimber at the University of Keele (UK))
- SIPRI's Facts on International Relations and Security Trends (FIRST): a single interface offering access to information from a number of databases for each country in the world
- Social Science Data on the Net (from the UC-San Diego library)
- WebEc Links to Economics Data
http://www.paulhensel.org/dataintl.html
Last updated: 30 July 2018
This site © Copyright 1996-present,
Paul R. Hensel. All rights reserved.