Sociology Sequence Analysis
Brendan Halpin
  • LAST MODIFIED: 29 October 2013
  • DOI: 10.1093/obo/9780199756384-0077


Sequence analysis in sociology refers to a group of approaches to linear (predominantly longitudinal) data that focus on sequences (such as work-life histories or conversations) as wholes. Sequence analysis is often exploratory and descriptive in intention, typically oriented to generating data-driven typologies, and can be contrasted with conventional approaches to longitudinal data such as hazard-rate modeling (event history analysis), models of transition patterns, or latent growth curve models, which focus on modeling the processes generating the sequences. The development of sequence analysis in sociology has been informed by a perception that complex sequences (such as life-course trajectories or the rhetorical structure of a story) are likely to have structures that are not easily captured by models that focus on their evolution through time, but that may be apparent when the sequences are considered as wholes. Sequences are linear objects; that is, they have a unidimensional, ordered structure. In sociological research the dimension is almost always time, so sequences are longitudinal. Two main types of sequences exist: those representing longitudinal processes such as life histories, coded as states in successive time periods, and those representing structures that unfold through time, such as conversations (coded into types of utterances) or dances (coded into sequences of steps). Sequences are very detailed; for instance, a sequence ten units long in a four-element state space has more than a million possible forms (4^10 = 1,048,576). Hence, sequence data can be difficult to classify either a priori or by inspection. Sequence analysis often has as a goal a data-driven classification, typically achieved by defining a metric of similarity between sequences and using cluster analysis to group sequences on the basis of pairwise similarity.
Sequence analysis draws on computer-science techniques for pattern finding in strings of tokens (e.g., text) or other longitudinal data (e.g., recorded speech), some of which have proven very powerful, particularly in molecular biology. A recurrent theme in the sequence analysis literature is whether algorithms developed for, or appropriate to, nonsociological fields such as molecular genetics can map onto sociological data in a meaningful way. Sequence analysis in sociology is currently almost coterminous with the optimal matching algorithm; although alternatives are available, the bulk of research employs this measure. The idea is relatively simple: let the difference between two sequences identical except for one element be related to the difference in the elements, and the difference between two sequences identical except that one has one extra element be the cost of deleting the superfluous element. The difference between any two sequences, then, is the “cost” of the “cheapest” concatenation of substitutions (where the sequences differ in elements) and insertions or deletions that change one sequence into the other. For instance, AABC can be changed into ABBD by deleting the initial A, matching on the next A and the B (two zero-cost “substitutions”), inserting a second B, and substituting the C with a D. Depending on the relative costing of the operations, this may or may not be the cheapest set of operations; longer sequences will have many possible sets to be considered. The optimal matching algorithm (OMA) is “optimal” in identifying the cost of this cheapest set of operations efficiently. The net result is that OMA can identify similarity (identity or partial similarity, thanks to the substitution operation) at the same or different locations (thanks to insertions and deletions, which permit “alignment,” sliding one sequence along another).
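
The cost-minimizing idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any of the works cited here: the function names `om_distance` and `unit_sub`, the single constant insertion/deletion cost, and the particular cost values are hypothetical choices made for the example.

```python
from typing import Callable, Sequence


def om_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Cost of the cheapest set of substitutions, insertions, and
    deletions that turns sequence `a` into sequence `b`.

    Assumption: one constant cost for every insertion or deletion.
    """
    n, m = len(a), len(b)
    # d[i][j] holds the cheapest cost of turning a[:i] into b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost          # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost          # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                       # deletion
                d[i][j - 1] + indel_cost,                       # insertion
                d[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # match or substitution
            )
    return d[n][m]


def unit_sub(x: str, y: str) -> float:
    """Hypothetical substitution cost: 0 for identical states, 1 otherwise."""
    return 0.0 if x == y else 1.0


# The AABC -> ABBD example from the text, under two hypothetical costings.
print(om_distance("AABC", "ABBD", unit_sub))                               # 2.0
print(om_distance("AABC", "ABBD", lambda x, y: 0.0 if x == y else 2.0))    # 4.0
```

With unit costs, the cheapest route from AABC to ABBD is two substitutions (cost 2). When a substitution costs 2 and an insertion or deletion costs 1, the route given in the text (one deletion, one insertion, one substitution) also costs 4, so it ties with the two-substitution route. This shows how the relative costing decides which set of operations is cheapest.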

Andrew Abbott’s Contribution

The American sociologist Andrew Abbott has done more than anyone else to introduce sequence analysis to the sociological repertoire, through a long series of publications with colleagues and students advocating and demonstrating its utility. He has used an eclectic range of applications and has made strong arguments about the role of an event-focused “narrative” approach to sociology. At times, his advocacy of sequence analysis developed into a trenchant critique of the state of empirical sociology. Although the concern with sequence predates his work, his advocacy of the optimal matching algorithm has been very influential. A “first wave” of applications, summarized in a debate in the journal Sociological Methods and Research in 2000, can be attributed directly to his influence (see the sections on Arguing with Levine and Wu for the debate and “First Wave” Applications for the applications). In the years since, the use of sequence analysis has broadened and deepened, as summarized in “Second Wave” Applications and Future Directions.

General Evangelism and Applications

Abbott’s contribution to the popularization of sequence analysis in sociology has been most effective in presenting real applications in a variety of domains, usually in collaboration with colleagues and students. This process began in the mid-1980s and continued into the early 21st century with some vigor and an eclectic choice of applications. Although Abbott and Forrest 1986 represents the first appearance of optimal matching (OM) in sociology, the first paper with real impact was Abbott and Hrycak 1990 on the careers of German Baroque musicians. Abbott 1991 addresses trajectories of professionalization. Abbott and DeViney 1992 takes the welfare state as the unit of observation. Abbott 1995 surveys the empirical and conceptual use of sequence analysis in a broad sense in the social sciences up to that date. Abbott and Barman 1997 treats the rhetorical organization of journal articles as a sequential structure.

  • Abbott, A. 1991. The order of professionalization: An empirical analysis. Work and Occupations 18.4: 355–384.

    DOI: 10.1177/0730888491018004001

    An application rooted in Abbott’s original research interests, in which the professionalization of occupations is viewed as a sequence. Theoretically and historically rich. Available online for purchase or by subscription.

  • Abbott, A. 1995. Sequence analysis: New methods for old ideas. Annual Review of Sociology 21:93–113.

    DOI: 10.1146/

    A survey of sequence analysis in a broad sense, including other disciplines such as psychology and anthropology, and using sequence as more of a theoretical than a technical concept. Puts OM in sociology in context and serves as a good bridge to the papers included in Overturning Sociology. Available online for purchase or by subscription.

  • Abbott, A., and E. Barman. 1997. Sequence comparison via alignment and Gibbs sampling. Sociological Methodology 27.1: 47–87.

    DOI: 10.1111/1467-9531.271019

    Yet another domain, the rhetorical structure of journal articles considered as sequences. The paper also introduces Gibbs sampling as a solution to some technical problems. Available online for purchase or by subscription.

  • Abbott, A., and S. DeViney. 1992. The welfare state as transnational event: Evidence from sequences of policy adoption. Social Science History 16.2: 245–274.

    DOI: 10.2307/1171289

    An application in yet another domain, the developmental path of the welfare state, demonstrating Abbott’s broad notion of what constitutes sequence. Available online for purchase or by subscription.

  • Abbott, A., and J. Forrest. 1986. Optimal matching methods for historical sequences. Journal of Interdisciplinary History 16.3: 471–494.

    DOI: 10.2307/204500

    This is the earliest example of optimal matching in sociology. The sequences are notations of traditional English dances, and the research focuses on patterns of cultural diffusion and change across the different villages where the dances had been recorded. Available online for purchase or by subscription.

  • Abbott, A., and A. Hrycak. 1990. Measuring resemblance in sequence data: An optimal matching analysis of musicians’ careers. American Journal of Sociology 96.1: 144–185.

    DOI: 10.1086/229495

    The first optimal matching paper with real impact. Data on the careers of three hundred German Baroque musicians, looking for evidence of a vacancy-chain process. Coleman clearly erred in telling Abbott: “Nobody’s gonna pay any attention . . . as long as you write about dead German musicians” (Abbott 2001, p. 13, cited under Overturning Sociology). Available online for purchase or by subscription.

Overturning Sociology

Abbott’s contribution has two main facets: the promotion of innovative technical methods for dealing with sequence data (outlined in this section under General Evangelism and Applications) and a much broader argument about the role of time and sequence in sociological theory. The main elements of Abbott’s theoretical argument are that the role of time is not handled well in contemporary sociology and that sequence (in the developmental as well as the temporal sense) needs to be taken more seriously. In this Abbott draws on the work of Peter Abell and David Heise (Abell 1993; Heise 1989, inter alia) that uses sequence as a tool for theory construction. Abbott’s theoretical argument evolves from a general view of the importance of sequence in historical sociology (Abbott 1983 and Abbott 1984), takes in a trenchant critique of contemporary sociology as a prisoner of its powerful statistical techniques (Abbott 1988, as “general linear reality”), and makes strong claims that sociology needs to move “from units to context, from attributes to connections, from causes to events” (Abbott 1995, p. 93, cited under General Evangelism and Applications). This is consonant with the then-current critique of “variable-centered sociology,” which features in Abell’s work (Abell 1993), inter alia. At a technical/methodological level, this argument has resonances with the notion in Raftery 2001 of the emergence in the 1980s of a “third generation” of statistical methods in sociology, dealing with complex phenomena such as social networks and longitudinal and spatial data. A constant tension is found in Abbott’s work between the ambition of the theoretical argument (Abbott 1990, Abbott 2001, and Abbott 1992) and the practical success of the technical innovations that he introduced, the scope of which is much narrower.

Arguing with Levine and Wu

In 2000, the journal Sociological Methods and Research published a set of articles on sequence analysis, starting with Abbott and Tsay 2000, in which the authors reviewed the progress of optimal matching analysis in sociology. In response, Levine 2000 and Wu 2000 raised a series of objections; in particular, the authors found that Abbott’s larger claims about a method that transcends the limitations of sociology were overblown, that OMA could not prove its claims, that no mold-breaking application had been seen, and, critically, that the sociological meaning of OM distances was fundamentally unclear. Although some of the criticisms suggest an incomplete understanding of the algorithm, the critics’ repeated point, that findings cannot be relied on if the workings of the method used are opaque, is clear and points to a widely felt difficulty: how to parameterize OM and how to interpret the operation of the algorithm in sociological terms. In biology, similarity of DNA may be interpreted as a measure of distance to a common ancestor, in which the steps by which divergence arises are roughly comparable to OM’s “elementary operations” of insertion, deletion, and substitution; however, it is not clear a priori that such operations make sense for social science phenomena. At any rate, Levine and Wu made it clear that sequence analysts needed to do more work to relate distance measures to sociological theories. Abbott 2000 responds to these critiques.

Algorithmic Origins

The origin of the OMA method is in the Soviet Union, in the so-called Levenshtein distance (Levenshtein 1966). The intended application of Levenshtein’s measure was correction of error-prone transmission of information, and the distance itself is a simple count of the number of edits needed to change one sequence into another. “Information transmission” is a very broad concept, and the applications are many, including inexact text search as well as communications. As molecular biology became more computational over the years, techniques derived from the Levenshtein distance, such as OMA, came to be utilized for searching for patterns in macromolecules such as proteins and DNA. In 1970 Saul B. Needleman and Christian D. Wunsch generalized the Levenshtein distance to include the idea of weighted substitutions and published the algorithm that now bears their name, also known as the optimal matching algorithm, which can calculate the generalized distance efficiently (Needleman and Wunsch 1970). Abbott happened across the OM algorithm thanks to meeting Joseph Kruskal, the famous statistician (see the prologue to Abbott 2001 for an account, cited under Overturning Sociology), at a time when Kruskal and David Sankoff were editing a book on methods for the analysis of sequence data, building on the Needleman and Wunsch algorithm and some other techniques. This book (Sankoff and Kruskal 1983) contains much that has been influential, notably the chapters in Kruskal 1983 (introducing sequence comparison), Bradley and Bradley 1983 (comparing birdsong as sequences), and Kruskal and Liberman 1983 (exploring time-warping as a means of sequence comparison).

  • Bradley, D. W., and R. A. Bradley. 1983. Application of sequence comparison to the study of bird songs. In Time warps, string edits, and macromolecules. Edited by D. Sankoff and J. B. Kruskal, 189–209. Reading, MA: Addison-Wesley.

    Chapter 6 shows quite clearly that sequences are not just macromolecules, with birdsong as the application.

  • Kruskal, J. B. 1983. An overview of sequence comparison. In Time warps, string edits, and macromolecules. Edited by D. Sankoff and J. B. Kruskal, 1–54. Reading, MA: Addison-Wesley.

    Chapter 1 explains the concept of alignment, compares it with other methods for sequence comparison, and gives a broad range of applications.

  • Kruskal, J. B., and M. Liberman. 1983. The symmetric time-warping problem. In Time warps, string edits, and macromolecules. Edited by D. Sankoff and J. B. Kruskal, 125–162. Reading, MA: Addison-Wesley.

    Abbott uses the term “time-warping” in his papers in a loose sense, but here Kruskal and Liberman use it in the more formal sense of comparing sequences by locally compressing and expanding the time axis.

  • Levenshtein, V. I. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics-Doklady 10.8: 707–710.

    Levenshtein’s definition of the distance between two token strings in terms of the number of edits (insertions, deletions, and substitutions) required to change one string into the other is the starting point for a large body of string comparison algorithms, including optimal matching.

  • Needleman, S. B., and C. D. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48.3: 443–453.

    DOI: 10.1016/0022-2836(70)90057-4

    Adapts the Levenshtein distance to use weighted substitution costs. Describes an efficient algorithm to arrive at minimum-cost alignment between sequence pairs: the optimal matching algorithm. Their application was molecular biology, and this marks an early milestone in what is now an extremely large field of bioinformatic sequence analysis. Available online for purchase or by subscription.

  • Sankoff, D., and J. B. Kruskal, eds. 1983. Time warps, string edits, and macromolecules. Reading, MA: Addison-Wesley.

    This book (reissued in 2001 [Stanford, CA: CSLI]) stands as a major computer science reference for sequence analysis, describing a range of algorithms and applications and inspiring a substantial amount of further work, primarily in computer science and bioinformatics.

Didactic Pieces

A number of authors, from Abbott 1990 on, have written papers and books with explicitly instructional intention. Chan 1995 draws on the author’s analysis of Hong Kong data. Abbott returned to the topic in MacIndoe and Abbott 2004. Aisenbrey 2000, Brüderl and Scherer 2004, and Scherer and Brüderl 2010 brought it to a German audience, who also benefited from the software of Rohwer and Pötter 1999 and that of Brzinsky-Fay, et al. 2006 (cited under Alternative Approaches and Software: Software). Martin and Wiggins 2011 provides the most recent introduction.

“First Wave” Applications

More or less directly inspired by Abbott’s advocacy and example, there was a small but significant uptake of optimal matching in sociology during the 1990s. This was predominantly, but not exclusively, US-based and is well summarized in Abbott and Tsay 2000 (cited under Andrew Abbott’s Contribution: Arguing with Levine and Wu). The substance covered a range of topics, although with a strong bias to work-life career data. These included, for instance, work careers (Chan 1995, cited under Didactic Pieces; Scherer 2001), careers of clients of mental health services (Wuerker 1996), the transition in career systems at Lloyds Bank between 1890 and 1970 (Stovel, et al. 1996), careers of women in finance (Blair-Loy 1999), temporal patterns of retirement (Han and Moen 1999a), similarity of careers across couples (Han and Moen 1999b), class careers of British and Irish men (Halpin and Chan 1998), and the temporal pattern of lynching in the southern United States (Stovel 2001).

  • Blair-Loy, M. 1999. Career patterns of executive women in finance: An optimal matching analysis. American Journal of Sociology 104.5: 1346–1397.

    DOI: 10.1086/210177

    Studies women’s careers in the finance industry and identifies change across cohorts in opportunity and perspective. Available online for purchase or by subscription.

  • Halpin, B., and T. W. Chan. 1998. Class careers as sequences: An optimal matching analysis of work-life histories. European Sociological Review 14.2: 111–130.

    DOI: 10.1093/oxfordjournals.esr.a018230

    Analyzes class careers of British and Irish men to age 35 using retrospective data. Available online for purchase or by subscription.

  • Han, S.-K., and P. Moen. 1999a. Clocking out: Temporal patterning of retirement. American Journal of Sociology 105.1: 191–236.

    DOI: 10.1086/210271

    Looks at trajectories into retirement, noting the “multiplex nature of temporality associated with life course institutionalization and deinstitutionalization” (p. 196). Available online for purchase or by subscription.

  • Han, S.-K., and P. Moen. 1999b. Work and family over time: A life course approach. Annals of the American Academy of Political and Social Science 562:98–110.

    DOI: 10.1177/0002716299562001007

    Examines the extent to which life and work trajectories of couples are coordinated. A relatively rare alternative to cluster analysis of all pairwise distances, in that it uses OM to generate a measure of intracouple similarity. Available online for purchase or by subscription.

  • Scherer, S. 2001. Early career patterns: A comparison of Great Britain and West Germany. In Special issue: Central and Eastern Europe–Higher education and labour markets in transition. Edited by Irena Kogan. European Sociological Review 17.2: 119–144.

    DOI: 10.1093/esr/17.2.119

    The school-to-work transition motivates a high proportion of the literature. Scherer uses West German and British panel data to conduct one of the first OM analyses in a domain in which more conventional methods, such as event history analysis or analyses of outcome at a fixed age, have dominated. Available online for purchase or by subscription.

  • Stovel, K. 2001. Local sequential patterns: The structure of lynching in the Deep South, 1882–1930. Social Forces 79.3: 843–880.

    DOI: 10.1353/sof.2001.0026

    Another alternative to the life-course dominance of sociological sequence analysis, this paper looks at county-level histories of lynching in the southern United States, drawing strongly on arguments from Abbott and others about the necessity of taking a sequence perspective on historical explanations. Available online for purchase or by subscription.

  • Stovel, K., M. Savage, and P. Bearman. 1996. Ascription into achievement: Models of career systems at Lloyds Bank, 1890–1970. American Journal of Sociology 102.2: 358–399.

    DOI: 10.1086/230950

    A sequence-oriented analysis of career data from a British bank, showing a transition between a status-based and an achievement-based system from 1890 to 1970. Available online for purchase or by subscription.

  • Wuerker, A. K. 1996. The changing careers of patients with chronic mental illness: A study of sequential patterns in mental health service utilization. Journal of Mental Health Administration 23.4: 458–470.

    DOI: 10.1007/BF02521029

    Treats sequences of service interactions of mental health patients in Los Angeles. A small data set, but of interest because it uses a relatively uncommon form of trajectory. Available online for purchase or by subscription.

“Second Wave” Applications and Future Directions

Since 2000 vigorous development has occurred in the area of sequence analysis. Aisenbrey and Fasang 2010 gives a very good account of developments in the following decade, arguing that a “second wave” of sequence analysis has begun and that sequence analysis has moved on despite the problems pointed out in Levine 2000 and Wu 2000 (both cited under Andrew Abbott’s Contribution: Arguing with Levine and Wu). Much of this growth has been in the sociology of the life course, with emphasis on the transition to adulthood (see Life Course: School to Work and the Transition to Adulthood and Life Course: Other Labor Market Trajectories in this section). However, other sorts of sequence are also featured, such as daily time-use patterns (see Time Use in this section) and residential, activist, and organizational careers (see Less Common Trajectories in this section). Other methods have been proposed (see the section on Alternative Approaches and Software), as modifications of the optimal matching algorithm or as radical alternatives to it, such as Laurent Lesnard’s “dynamic Hamming” measure (with application to time-use data) and C. H. Elzinga’s combinatorial methods (which define similarity in terms of sequences going through the same states in the same order, if not consecutively). Simulations and other investigations allow researchers to know much more about the characteristics of optimal matching and how to parameterize it. In parallel, much better software and statistical and graphical tools have emerged, making sequence analysis accessible within widely used statistical packages. Although sequence analysis remains predominantly exploratory and descriptive, the hitherto impermeable boundaries between OM and conventional stochastic statistical techniques are being broken, and a large toolbox for dealing with longitudinal sociological data is being assembled. Researchers may not observe the revolution against “general linear reality” that Abbott advocated, but they have a much better capacity for dealing with the longitudinal in sociology.
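
The idea of similarity as "going through the same states in the same order, if not consecutively" can be illustrated with the length of the longest common subsequence of two sequences. The sketch below is a deliberately simple stand-in and should not be read as Elzinga's actual measures, which are defined in the works cited under Alternative Approaches and Software; the function name `lcs_length` and the example sequences are hypothetical.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest run of states that both sequences pass
    through in the same order, not necessarily consecutively.

    Illustration only: a longest-common-subsequence count, used here as
    an assumed, simplified proxy for order-based similarity.
    """
    # prev[j] = LCS length of the part of `a` processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)       # extend a shared subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# Both sequences pass through A then B, so the shared ordered core has length 2.
print(lcs_length("AABC", "ABBD"))   # 2
```

Unlike optimal matching, a measure of this kind needs no substitution or insertion/deletion costs, which is one reason order-based approaches have been proposed as alternatives.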

  • Aisenbrey, S., and A. E. Fasang. 2010. New life for old ideas: The “second wave” of sequence analysis bringing the “course” back into the life course. Sociological Methods and Research 38.3: 420–462.

    DOI: 10.1177/0049124109357532

    An important statement of developments in, and applications of, sequence analysis in sociology in the first decade of the 21st century. Available online for purchase or by subscription.

Life Course: School to Work and the Transition to Adulthood

Sequence analysis in life-course studies has substantially broadened, covering fertility and partnership formation, residence, and the labor market, with particularly strong representation in research on the school-to-work transition. Bynner, et al. 2001 examines the school-to-work transition for two cohorts in the United Kingdom. McVicar and Anyadike-Danes 2002 does similar work with a cohort in Northern Ireland. Brzinsky-Fay 2007 examines labor market entry for cohorts from the European Community Household Panel. Elzinga and Liefbroer 2007 addresses destandardization of family-life trajectories in nineteen countries. Martin, et al. 2008 develops ideal types of labor market entry trajectories. Bras, et al. 2010 uses historical Dutch data on the transition to adulthood. Bühlmann 2010 examines routes into professional and higher managerial work in the United Kingdom. Biemann, et al. 2011 examines destabilization of careers in Germany. Anyadike-Danes and McVicar 2005 identifies early correlates of male trajectories characterized by persistent unemployment, and Liefbroer and Elzinga 2012 compares parents’ and children’s life-course patterns.

  • Anyadike-Danes, M., and D. McVicar. 2005. You’ll never walk alone: Childhood influences and male career path clusters. Labour Economics 12.4: 511–530.

    DOI: 10.1016/j.labeco.2005.05.008

    Using data from the 1970 British Cohort Study, the paper identifies early correlates of male trajectories characterized by persistent unemployment. Available online for purchase or by subscription.

  • Biemann, T., A. E. Fasang, and D. Grunow. 2011. Do economic globalization and industry growth destabilize careers? An analysis of career complexity and career patterns over time. Organization Studies 32.12: 1639–1663.

    DOI: 10.1177/0170840611421246

    This paper focuses on the effect of globalization and economic change on early careers in Germany. Although change is evident, it does not seem to be driven by globalization, and career patterns have been relatively stable over time. Available online for purchase or by subscription.

  • Bras, H., A. C. Liefbroer, and C. H. Elzinga. 2010. Standardization of pathways to adulthood? An analysis of Dutch cohorts born between 1850 and 1900. Demography 47.4: 1013–1034.

    DOI: 10.1007/BF03213737

    Using combinatorial measures rather than OM (as in Elzinga and Liefbroer 2007), this paper tests hypotheses concerning standardization of pathways to adulthood, with Dutch cohort data. Available online for purchase or by subscription.

  • Brzinsky-Fay, C. 2007. Lost in transition? Labour market entry sequences of school leavers in Europe. European Sociological Review 23.4: 409–422.

    DOI: 10.1093/esr/jcm011

    Drawing on data from the European Community Household Panel, this paper addresses the school-to-work transition in ten EU countries, creating a classification that is compared with a theoretically driven classification. Available online for purchase or by subscription.

                                                    • Bühlmann, F. 2010. Routes into the British service class: Feeder logics according to gender and occupational groups. Sociology 44.2: 195–212.

                                                      DOI: 10.1177/0038038509357193Save Citation »Export Citation »E-mail Citation »

                                                      With data from the UK National Child Development Study birth cohort, this paper examines routes into the salariat, Goldthorpe’s “service class.” It finds two routes: one direct, and one long and “tortuous.” Available online for purchase or by subscription.

                                                      • Bynner, J., I. Schoon, Joshi, and R. D. Wiggins. 2001. Transitions from school to work in a changing social context. Young 9.1: 4–22.

                                                        DOI: 10.1177/110330880100900102

                                                        Working with data from the UK National Child Development Study and the British Cohort Study (cohorts born in 1958 and 1970, respectively), the authors apply OMA to the labor market trajectories of both cohorts, finding a great deal of commonality but a greater difficulty of integration for the younger cohort. Available online for purchase or by subscription.

                                                        • Elzinga, C. H., and A. C. Liefbroer. 2007. De-standardization of family-life trajectories of young adults: A cross-national comparison using sequence analysis. European Journal of Population/Revue européenne de Démographie 23.3–4: 225–250.

                                                          DOI: 10.1007/s10680-007-9133-7

                                                          Using Elzinga’s combinatorial approach to sequence comparison (Elzinga 2005, cited under Alternative Approaches and Software: Elzinga’s Combinatorial Approaches) rather than OMA, this paper examines the issue of destandardization of young adults’ careers in nineteen countries.

                                                          • Liefbroer, A. C., and C. H. Elzinga. 2012. Intergenerational transmission of behavioural patterns: How similar are parents’ and children’s demographic trajectories? Advances in Life Course Research 17.1: 1–10.

                                                            DOI: 10.1016/j.alcr.2012.01.002

                                                            Compares parents’ and children’s life-course patterns and shows that substantial transmission of patterns exists, despite a considerable change in conditions across generations. Available online for purchase or by subscription.

                                                            • Martin, P., I. Schoon, and A. Ross. 2008. Beyond transitions: Applying optimal matching analysis to life course research. International Journal of Social Research Methodology 11.3: 179–199.

                                                              DOI: 10.1080/13645570701622025

                                                              Written by part of the team behind Wiggins, et al. 2007 (cited under Life Course: Multiple Domains), which uses theoretically derived ideal types, this paper instead relates sequences to data-derived ideal types. The distance from each sequence to each ideal type is used to assess the coherence of the classification exercise. Available online for purchase or by subscription.

                                                              • McVicar, D., and M. Anyadike-Danes. 2002. Predicting successful and unsuccessful transitions from school to work using sequence methods. Journal of the Royal Statistical Society: Series A 165.2: 317–334.

                                                                Using data from Northern Ireland, the authors use sequence analysis to cluster school-to-work transition trajectories, with a view to identifying factors that predict problematic paths. Available online for purchase or by subscription.

                                                                Life Course: Other Labor Market Trajectories

                                                                Although the transition from school to work, or the transition to adulthood in general, is a very common focus in the sequence analysis literature, additional work examines other parts of the labor market career. Malo and Muñoz-Bullón 2003 and Levy, et al. 2006 examine labor market careers over a longer perspective, and Fasang 2012 focuses on the transition to retirement.

                                                                Life Course: Multiple Domains

                                                                In many ways the exploratory potential of sequence analysis is highest when the sequences are complex, and the interrelation of multiple domains (such as work, education, family, and fertility) is a special case of complexity. Quite a bit of work has been done using multiple domains, either by combining domains into a single complex state space or by using multichannel sequence analysis (MCSA) software (see Gauthier, et al. 2010). A focus on multiple domains is present early on; Dijkstra and Taris 1995 (cited under Alternative Approaches and Software: Early Alternatives) used multiple domains in its example, and Han and Moen 1999a (cited under “First Wave” Applications) also used multiple work-life domains. Aassve, et al. 2007 combines labor market, partnership formation, and fertility of British women. Pollock 2007 also uses British data on housing, marital status, and children. Wiggins, et al. 2007 looks at the effect of work, housing, and partnership histories on the elderly. Bühlmann 2008 combines occupation and industry classifications. Müller, et al. 2008 combines residence, partnership, and fertility for Swiss data. Gauthier, et al. 2010 makes a formal argument for multichannel sequence analysis and presents examples using specialized software.

                                                                • Aassve, A., F. Billari, and R. Piccarreta. 2007. Strings of adulthood: A sequence analysis of young British women’s work-family trajectories. European Journal of Population 23.3–4: 369–388.

                                                                  DOI: 10.1007/s10680-007-9134-6

                                                                  One of a small group of papers that pushes sequence analysis in the interesting direction of analyzing multiple life-course domains simultaneously: multichannel sequence analysis. Using females’ life histories from the British Household Panel Study, it unites the labor market, partnership formation, and childbearing to create a data-driven typology. Available online for purchase or by subscription.

                                                                  • Bühlmann, F. 2008. The corrosion of career? Occupational trajectories of business economists and engineers in Switzerland. In Special issue: Central and Eastern Europe—Higher education and labour markets in transition. Edited by Irena Kogan. European Sociological Review 24.5: 601–616.

                                                                    DOI: 10.1093/esr/jcn019

                                                                    An explicitly multichannel sequence analysis paper, taking account simultaneously of different dimensions of the work-career of engineers and economists in Switzerland. Available online for purchase or by subscription.

                                                                    • Gauthier, J.-A., E. D. Widmer, P. Bucher, and C. Notredame. 2010. Multichannel sequence analysis applied to social science data. Sociological Methodology 40.1: 1–38.

                                                                      DOI: 10.1111/j.1467-9531.2010.01227.x

                                                                      Formally argues for multichannel sequence analysis rather than parallel single-channel analysis because calculating distances based on all dimensions together produces more robust and informative results than does calculating distances on each dimension separately. Uses custom software. Available online for purchase or by subscription.

                                                                      • Müller, N. S., S. Lespinats, G. Ritschard, M. Studer, and A. Gabadinho. 2008. Visualisation et classification des parcours de vie. Revue des Nouvelles Technologies de l’Information 2:499–510.

                                                                        Uses retrospective data from the Swiss Household Panel to examine multidimensional life-course trajectories (residence, partnership, and fertility), arguing for the utility of OM for exploratory analysis of longitudinal data. Uses multidimensional scaling to reduce the complexity of the distance data and proposes useful visualization techniques for trajectory data.

                                                                        • Pollock, G. 2007. Holistic trajectories: A study of combined employment, housing and family careers by using multiple-sequence analysis. Journal of the Royal Statistical Society: Series A 170.1: 167–183.

                                                                          A nice example of a multichannel sequence analysis bringing together employment, housing tenure, marital status, and responsibility for children, using ten years of British Household Panel Study data. Available online for purchase or by subscription.

                                                                          • Wiggins, R. D., C. Erzberger, M. Hyde, P. Higgs, and D. Blane. 2007. Optimal matching analysis using ideal types to describe the lifecourse: An illustration of how histories of work, partnerships and housing relate to quality of life in early old age. International Journal of Social Research Methodology 10.4: 259–278.

                                                                            DOI: 10.1080/13645570701542025

                                                                            Predicting quality of life in old age from housing, partnership, and work life-histories, this paper uses optimal matching to assign trajectories to ideal types developed a priori, an effective but relatively unusual approach in the literature. See also Martin, et al. 2008 (cited in this section under Life Course: School to Work and the Transition to Adulthood). Available online for purchase or by subscription.

                                                                            Less Common Trajectories

                                                                            Although a great deal of sequence research focuses on labor market and family issues, a certain amount of research on other life-course issues is evident. Clark, et al. 2003 and Stovel and Bolan 2004 focus on different aspects of housing careers, Blanchard 2010 examines the careers of political activists, and Stark and Vedres 2006 uses OM to model the ownership trajectories of firms.

                                                                            Time Use

                                                                            Time-use research is a very distinct area within sequence analysis, with Laurent Lesnard’s “dynamic Hamming” distance measure as the dominant alternative to OMA. The dynamic Hamming measure compares sequences time-point by time-point (no “alignment”) but uses interstate distances dynamically calculated from the time-dependent pattern of rates of transition between the states. Thus at times when many transitions are occurring, states are judged to be more similar to each other than at times when the transition rate is low. Lesnard 2010 makes a strong argument that this is more appropriate than OM for time-use data, where the “clock” or calendar is important and the dislocation of time caused by alignment is inappropriate. De Saint Pol 2006 focuses on the distinctive timing of meals in France. Lesnard 2008 uses dynamic Hamming distance on time-use data for dual-earner couples. Lesnard and de Saint Pol 2009 uses dynamic Hamming to examine the scheduling of paid work.
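
The logic of a time-varying substitution cost can be illustrated with a minimal Python sketch. It assumes one commonly cited form of the cost, namely 4 minus the sum of the four transition rates between the two states into and out of the time-point; the handling of the first and last time-points, and the function names `transition_rates` and `dynamic_hamming`, are assumptions made for illustration rather than Lesnard's own implementation.

```python
from collections import Counter


def transition_rates(seqs):
    """rates[t][(a, b)]: share of sequences in state a at t that are in b at t+1.

    Assumes all sequences have the same length.
    """
    length = len(seqs[0])
    rates = []
    for t in range(length - 1):
        pairs = Counter((s[t], s[t + 1]) for s in seqs)
        origins = Counter(s[t] for s in seqs)
        rates.append({ab: n / origins[ab[0]] for ab, n in pairs.items()})
    return rates


def dynamic_hamming(x, y, rates):
    """Position-by-position distance with time-dependent substitution costs."""
    total = 0.0
    for t, (a, b) in enumerate(zip(x, y)):
        if a == b:
            continue
        cost = 4.0  # maximum cost when no transitions link a and b (assumed form)
        # Subtract the a<->b transition rates into t (from t-1) and out of t (to t+1).
        for u in (t - 1, t):
            if 0 <= u < len(rates):  # skip missing neighbors at the boundaries
                cost -= rates[u].get((a, b), 0.0) + rates[u].get((b, a), 0.0)
        total += cost
    return total


# Toy sample: half of those starting in A move to B after the first period,
# and nobody changes state after the second period.
sample = ["AAA", "ABB", "AAA", "ABB", "BBB"]
rates = transition_rates(sample)
print(dynamic_hamming("ABA", "AAA", rates))  # 3.5: mismatch where transitions are common
print(dynamic_hamming("AAB", "AAA", rates))  # 4.0: mismatch where transitions are absent
```

In the toy sample a mismatch in the middle period is cheaper than one in the final period, because A and B exchange members around the middle period but not afterward.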

                                                                            Alternative Approaches and Software

                                                                            The optimal matching algorithm is by some margin the dominant approach in sequence analysis in sociology. In this section some alternatives are presented. The two most prominent alternatives are dynamic Hamming (see the section on Time Use, under “Second Wave” Applications and Future Directions) and Elzinga’s combinatorial approaches (see Elzinga’s Combinatorial Approaches in this section). Also discussed in this section are sequence analysis prior to the availability of OM (under Early Alternatives), systematic attempts to assess OM (under Evaluating OM and Its Parameterization), and two attempts to improve on OM (under Directly Modifying the OM Algorithm). Other papers on alternatives to OM are covered under Competing and Complementary Approaches, and model-based approaches under Competing Approaches: Model-Based. Software issues are briefly discussed under Software.

                                                                            Early Alternatives

                                                                            Researchers were applying other methods to the problem of the holistic examination of complete sequences before OM became available. The simplest approach is to define similarity on the basis of element-by-element similarity (similarity at the same time), the so-called “Hamming distance.” Although this method cannot pick up similarity displaced in time in the way that OMA can, it has the virtue of clarity. Buchmann and Sacchi 1995 used Hamming distance in a relatively sophisticated manner, using collateral data on occupations to define distances in a very large state space (i.e., hundreds of occupational groups) and then classifying work-life careers of a Swiss birth-cohort. Such work makes clear that OMA was not a solution in search of a problem but served a need, already felt by researchers, for simplifying complex longitudinal data. Another approach to determining similarities between sequences was to divide them into periods (e.g., combine monthly data into six-month sections) and conduct a factor analysis on summaries (such as cumulated durations) of the periods (Degenne, et al. 1996). This allowed a certain amount of time-dislocation (within the blocks), while retaining a computationally simple period-by-period Hamming-style comparison between the blocks, using distances based on the factors. This is sometimes known as qualitative harmonic analysis (QHA). For a recent comparison of this method and OMA, see Robette and Thibauld 2008 (cited under Evaluating OM and Its Parameterization). Another competing approach was that of Dijkstra and Taris 1995, in which the authors proposed a method to define similarity or distance that involves dropping elements not shared by both sequences and counting the number of common subsequences (a focus on the same states in the same order). In a comment on that paper, Abbott demonstrated that OMA was a much more general measure (Abbott 1995). Dijkstra and Taris’s method (implemented in Macintosh software, see Dijkstra 1994, cited under Software) is no longer used, but some of its motivation has been taken up by Elzinga (see Elzinga’s Combinatorial Approaches) in his subsequence-oriented approaches.
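
The element-by-element Hamming comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `hamming` and the default unit cost for any mismatch are assumptions, and a state-specific cost function (such as the occupation-based distances of Buchmann and Sacchi 1995) could be passed in instead.

```python
def hamming(x, y, cost=None):
    """Sum of state-to-state costs at matching time-points (no alignment)."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs sequences of equal length")
    if cost is None:
        # Assumed default: every mismatch costs 1, every match costs 0.
        cost = lambda a, b: 0 if a == b else 1
    return sum(cost(a, b) for a, b in zip(x, y))


print(hamming("AABC", "ABBD"))  # 2: the sequences differ at two time-points
print(hamming("ABAB", "BABA"))  # 4: the same pattern shifted by one period counts as fully different
```

The second call shows the limitation noted above: two sequences with the same pattern displaced by one period receive the maximum distance, because nothing is compared across time-points.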

                                                                            • Abbott, A. 1995. A comment on “Measuring the agreement between sequences.” Sociological Methods and Research 24.2: 232–243.

                                                                              DOI: 10.1177/0049124195024002005

                                                                              Abbott’s rebuttal to Dijkstra and Taris 1995, in which he shows that the OM algorithm is more general. Available online for purchase or by subscription.

                                                                              • Buchmann, M., and S. Sacchi. 1995. Mehrdimensionale Klassifikation beruflicher Verlaufsdaten: Eine Anwendung auf Berufslaufbahnen zweier Schweizer Geburtskohorten. Kölner Zeitschrift für Soziologie und Sozialpsychologie 47.3: 413–442.

                                                                                This paper precedes optimal matching but works with a definition of sequence similarity (of work-life careers) in terms of distance (at the same time) between occupations using a sophisticated multidimensional approach to defining distance between occupations.

                                                                                • Degenne, A., M.-O. Lebeaux, and L. Mounier. 1996. Typologies d’itinéraires comme instrument d’analyse du marché du travail. In Typologie des marchés du travail, suivi et parcours: L’analyse longitudinale du marché du travail, Rennes, 23–24 May 1996. Edited by A. Degenne, M. Mansuy, G. Podevin, and P. Werquin. Documents, série séminaires 115. Marseille, France: Céreq.

                                                                                  A rare example of qualitative harmonic analysis (see also Robette and Thibauld 2008, cited under Evaluating OM and Its Parameterization), applied to French work-life data.

                                                                                  • Dijkstra, W., and T. Taris. 1995. Measuring the agreement between sequences. Sociological Methods and Research 24.2: 214–231.

                                                                                    DOI: 10.1177/0049124195024002004

                                                                                    The paper in which Dijkstra and Taris propose their definition of sequence similarity. Available online for purchase or by subscription.

                                                                                    Evaluating OM and Its Parameterization

                                                                                    A good deal of the critique of OM focuses on how to set the various parameters and what the consequences of the details of the algorithm are for intersequence distances (e.g., Wu 2000, cited under Andrew Abbott’s Contribution: Arguing with Levine and Wu). Some work that addresses these problems directly includes Wilson 2006, which uses simulation to explore how well sequence analysis can recover underlying structures; Robette and Thibauld 2008, which compares qualitative harmonic analysis (see also Degenne, et al. 1996, cited in this section under Early Alternatives) with OM; and Gauthier, et al. 2009, which addresses the parameterization of OM directly.

                                                                                    Directly Modifying the OM Algorithm

                                                                                    OM edits token strings with elementary operations that treat tokens out of context: substitution, for instance, takes account of only one token in each string, ignoring the tokens’ neighbors. This can be seen as sociologically unattractive (see, for instance, Wu 2000, cited under Andrew Abbott’s Contribution: Arguing with Levine and Wu). Hollister 2009 and Halpin 2010 independently propose distinct but similar adaptations of the OM algorithm to take account of context. Unfortunately, neither algorithm preserves the metric property of the OM distance.
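
To make concrete the sense in which standard OM treats tokens out of context, the following Python sketch implements the usual dynamic-programming recurrence. The function name `optimal_matching` and the default costs (1 for an insertion or deletion, 2 for a substitution between different states) are assumptions chosen for illustration, not values recommended in the literature cited here.

```python
def optimal_matching(x, y, indel=1.0, sub=None):
    """Cheapest total cost of substitutions, insertions and deletions turning x into y."""
    if sub is None:
        # Assumed default: substituting different states costs 2, identical states 0.
        sub = lambda a, b: 0.0 if a == b else 2.0
    n, m = len(x), len(y)
    # d[i][j] is the cost of matching the first i tokens of x with the first j of y.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,                           # delete x[i-1]
                d[i][j - 1] + indel,                           # insert y[j-1]
                d[i - 1][j - 1] + sub(x[i - 1], y[j - 1]),     # substitute or match
            )
    return d[n][m]


print(optimal_matching("AABC", "ABBD"))  # 4.0
print(optimal_matching("ABAB", "BABA"))  # 2.0: one deletion plus one insertion realigns the pattern
```

Each step of the recurrence looks only at the two tokens `x[i-1]` and `y[j-1]`; the costs do not depend on neighboring tokens or on the length of the spell a token belongs to. That is the property the adaptations by Hollister and Halpin set out to change.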

                                                                                    • Halpin, B. 2010. Optimal matching analysis and life-course data: The importance of duration. Sociological Methods and Research 38.3: 365–388.

                                                                                      DOI: 10.1177/0049124110363590

                                                                                      Halpin’s argument is that operations on tokens in long spells should cost less than those on tokens in short spells. His duration-adjusted OM adapts the original algorithm to weight operations inversely with the square root of the spell length. Available online for purchase or by subscription.

                                                                                      • Hollister, M. 2009. Is optimal matching suboptimal? Sociological Methods and Research 38.2: 235–264.

                                                                                        DOI: 10.1177/0049124109346164

                                                                                        Hollister proposes “localized OM,” in which the costs of insertion and deletion operations are modified according to the substitution cost between the inserted (deleted) element and its neighbors. Thus, changes that make less substantive difference are cheaper. Available online for purchase or by subscription.

                                                                                        Competing and Complementary Approaches

                                                                                        A number of papers offer competing or complementary approaches to the use of OM in sequence analysis. Billari 2001 presents a “monothetic divisive” algorithm for nonrepeating event data, which has a structure that OM will not exploit. Billari, et al. 2006 presents techniques drawing from the machine-learning literature. Piccarreta and Lior 2010 and Piccarreta 2012 focus on multidimensional scaling as an alternative to cluster analysis of intersequence distances. Studer, et al. 2011 presents the notion of “discrepancy,” which permits ANOVA-like decomposition of pairwise intersequence distance matrices.
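
The idea of an ANOVA-like decomposition of a distance matrix can be sketched as follows. This is a hedged illustration of the general distance-based sum-of-squares logic, in which the sum of pairwise distances within a set is divided by the size of the set; the exact estimators and tests in Studer, et al. 2011 may differ in detail, and the names `sum_of_squares` and `pseudo_r2` are hypothetical.

```python
def sum_of_squares(dist, members):
    """Distance-based 'sum of squares': pairwise distances among members / group size."""
    pair_total = sum(
        dist[i][j]
        for pos, i in enumerate(members)
        for j in members[pos + 1:]
    )
    return pair_total / len(members)


def pseudo_r2(dist, groups):
    """Share of total discrepancy accounted for by a grouping (assumes some nonzero distance)."""
    everyone = list(range(len(dist)))
    ss_total = sum_of_squares(dist, everyone)
    ss_within = sum(
        sum_of_squares(dist, [i for i in everyone if groups[i] == label])
        for label in set(groups)
    )
    return 1.0 - ss_within / ss_total


# Four sequences in two groups: identical within groups, distance 2 between groups.
dist = [
    [0, 0, 2, 2],
    [0, 0, 2, 2],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
print(pseudo_r2(dist, ["a", "a", "b", "b"]))  # 1.0: the grouping accounts for all discrepancy
```

The grouping variable could equally be an observed covariate or a cluster solution. Significance testing, which the paper handles by resampling, is left out of this sketch.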

                                                                                        • Billari, F. C. 2001. Sequence analysis in demographic research. Canadian Studies in Population 28.2: 439–458.

                                                                                          A primer for sequence analysis in general, this paper also presents a “monothetic divisive” algorithm, which classifies “unique-event” data (first job, first partnership, first birth, etc.) in an efficient and interpretable manner.

                                                                                          • Billari, F. C., J. Fürnkranz, and A. Prskawetz. 2006. Timing, sequencing, and quantum of life course events: A machine learning approach. European Journal of Population 22.1: 37–65.

                                                                                            DOI: 10.1007/s10680-005-5549-0

                                                                                            Another innovative approach from Billari and colleagues, looking at machine-learning techniques that exploit the timing, order, and amount of events. The application is to Italian and Austrian transitions to adulthood. Available online for purchase or by subscription.

                                                                                            • Billari, F. C., and R. Piccarreta. 2005. Analyzing demographic life courses through sequence analysis. Mathematical Population Studies 12.2: 81–106.

                                                                                              DOI: 10.1080/08898480590932287

                                                                                              Further development of the ideas introduced in Billari 2001. Available online for purchase or by subscription.

                                                                                              • Piccarreta, R. 2012. Graphical and smoothing techniques for sequence analysis. Sociological Methods and Research 41.2.

                                                                                                This paper extends the multidimensional scaling (MDS) approach of Piccarreta and Lior 2010 to dealing with very large data sets and proposes a means of smoothing the graphical representation, greatly facilitating exploration and description. A downside is that the first MDS dimension will often exclude a lot of interesting variation. Available online for purchase or by subscription.

                                                                                                • Piccarreta, R., and O. Lior. 2010. Exploring sequences: A graphical tool based on multi-dimensional scaling. Journal of the Royal Statistical Society: Series A 173.1: 165–184.

                                                                                                  Sequence analysts typically run cluster analysis on the pairwise distances, but alternatives are possible. In this paper Piccarreta and Lior argue for the utility of extracting dimensions from the distance matrix using multidimensional scaling. This imposes a simple element of order on the sequences, enabling exploratory research. Available online for purchase or by subscription.

                                                                                                  • Studer, M., G. Ritschard, A. Gabadinho, and N. S. Müller. 2011. Discrepancy analysis of state sequences. Sociological Methods and Research 40.3: 471–510.

                                                                                                    DOI: 10.1177/0049124111415372

                                                                                                    An alternative to cluster analysis for analysis of pairwise intersequence distances: the “discrepancy” measure of average distance to group center. Permits ANOVA-like calculations using observed categorical variables, or cluster solutions. Offers pseudo-F tests and a pseudo-R² measure, using bootstrapping. A very useful complement to cluster analysis or MDS of distances. Available online for purchase or by subscription.

                                                                                                    Competing Approaches: Model-Based

                                                                                                    Optimal matching provides an algorithmic distance measure (which may be used to generate a typology), but model-based distances and model-based classifications are also possible. Massoni, et al. 2009 uses enhancements of Markov models (characterizing sequences by a relatively parsimonious structure of time-weighted transition parameters), whereas the other papers use latent class models to group sequences in a stochastic manner. Vaughn, et al. 2009 focuses on criminal careers, and Barban and Billari 2012 directly compares sequence analysis with latent class analysis. As sequence analysis matures, it is important that links are being made between the exploratory and algorithmic techniques and more conventional statistical techniques.

                                                                                                    • Barban, N., and F. C. Billari. 2012. Classifying life course trajectories: A comparison of latent class and sequence analysis. Journal of the Royal Statistical Society: Series C 61.5: 756–784.

                                                                                                      A comparison of latent class analysis and OM, using simulation. Although the techniques give largely consistent results, the authors credit OM with more sensitivity with respect to timing and ordering of events. Available online for purchase or by subscription.

                                                                                                      • Massoni, S., M. Olteanu, and P. Rousset. 2009. Career-path analysis using optimal matching and self-organizing maps. In Advances in self-organizing maps: Proceedings of the 7th International Workshop, WSOM, St. Augustine, FL, 8–10 June. Vol. 5629, Lecture Notes in Computer Science. Edited by J. C. Principe and Risto Miikkulainen, 154–162. Berlin and Heidelberg, Germany: Springer.


                                                                                                        Defines distance between sequences according to the parameters of a “drifting Markov model.” Also uses Kohonen self-organizing maps as an alternative to conventional cluster analysis.


                                                                                                        • Vaughn, M. G., M. DeLisi, K. M. Beaver, and M. O. Howard. 2009. Multiple murder and criminal careers: A latent class analysis of multiple homicide offenders. Forensic Science International 183.1–3: 67–73.

DOI: 10.1016/j.forsciint.2008.10.014

                                                                                                          Uses latent class analysis to categorize criminal careers. Available online for purchase or by subscription.


                                                                                                          Elzinga’s Combinatorial Approaches

                                                                                                          In an approach that can be related conceptually to Dijkstra and Taris 1995 (cited under Early Alternatives) in that it focuses on similarity defined as the extent to which sequences experience the same states in the same order, Elzinga 2003, Elzinga 2005, and Elzinga 2010 propose a set of methods that count how often sequences share the same subsequences (i.e., order-preserving subsets of the sequences; AC is a subsequence of ABC). This produces a measure of longitudinal similarity that has a radically different justification compared with OM; its focus is on shared order rather than similarity with allowance for time-displacement. Elzinga, et al. 2008 links Elzinga’s developments in social sequence analysis with parallel developments in computer science and biology. Bras, et al. 2010 applies Elzinga’s method to Dutch data on the transition to adulthood.
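
The idea of counting shared subsequences can be illustrated with a minimal Python sketch. It is not Elzinga's algorithm or any package's implementation: it simply enumerates every distinct subsequence of two short sequences by brute force and counts those they have in common. The function names are hypothetical, and the normalization by the geometric mean of the two self-counts is an illustrative assumption rather than a measure taken from the works cited. Because the number of subsequences grows exponentially with sequence length, this is only practical for very short sequences.

```python
from itertools import combinations


def distinct_subsequences(seq):
    """Return the set of distinct non-empty subsequences of seq.

    A subsequence is an order-preserving subset of the elements,
    so ("A", "C") is a subsequence of "ABC".
    """
    n = len(seq)
    return {
        tuple(seq[i] for i in idx)          # indices are increasing, so order is kept
        for r in range(1, n + 1)            # every subsequence length from 1 to n
        for idx in combinations(range(n), r)
    }


def common_subsequence_count(x, y):
    """Count the distinct subsequences that x and y share."""
    return len(distinct_subsequences(x) & distinct_subsequences(y))


def subsequence_similarity(x, y):
    """Shared count scaled to [0, 1] (illustrative normalization, an assumption)."""
    norm = (len(distinct_subsequences(x)) * len(distinct_subsequences(y))) ** 0.5
    return common_subsequence_count(x, y) / norm if norm else 0.0
```

For example, `common_subsequence_count("ABC", "ACB")` counts A, B, C, AB, and AC as shared, whereas BC and CB are not shared because the two sequences order those states differently. This is the sense in which the measure rewards common order rather than positional agreement.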


Sankoff and Kruskal 1983 (cited under Algorithmic Origins) explained the OM algorithm with sufficient clarity that Abbott was able to implement it in BASIC. Later it was implemented in Götz Rohwer’s TDA computer program (Rohwer and Pötter 1999), and it is now available for the statistical programming language R in the TraMineR package (Gabadinho, et al. 2009; Gabadinho, et al. 2011), for Stata (see Brzinsky-Fay, et al. 2006), and in Halpin’s SADI package for Stata (Halpin 2011). Elzinga’s subsequence methods are implemented in his own software, CHESA, and some are also available in TraMineR and SADI. Lesnard’s dynamic Hamming distance is available in both TraMineR and SADI. Of historical interest is Dijkstra 1994, which implements the method of Dijkstra and Taris 1995 (cited in this section under Early Alternatives).
