Conservative pattern accumulation in foreign language learning

Text of a talk at EUROSLA 6, Nijmegen, May 31-June 2, 1996
Unrevised, as read

Robert Bley-Vroman
University of Hawaii
vroman@hawaii.edu

available: http://www.sls.hawaii.edu/bley-vroman/eurosla.txt

This talk, about the acquisition of sentence patterns, represents one part of a much more extensive paper, of three parts. One other part, which time prevents dealing with, involves how conservative pattern learning might account for certain constraints on extraction. There, the basic idea is that the learner thinks that extraction from, say, relative clauses is impossible because one has not encountered relative clauses with extraction holes in them. The other main part attempts to situate pattern theory in linguistics and applied linguistics. Although I will not talk about this area, I have included a lot of food for thought in the final pages of the handout. The basic idea there is that patterns are part of language broadly conceived, even "normal" native languages, though not part of the central triggers-and-deductions UG system.

The logical problem of foreign language learning is to account for a range of variability, a lack of guaranteed "success", but the possibility of coming close to native-like language behavior. The theoretical research program I outline ought to yield a system in which wide individual variability is expected, native grammars are never achieved, but in which the languages generated by the grammars of learners can become arbitrarily close to weak equivalence with the target. In this restricted sense, foreign languages are learnable.

My presentation here is more discursive than technical. I want to give a flavor of the rationale and of the goals of the program. The outline of this talk is page 2 of your handout.

I am concentrating on only one particular aspect of acquisition, the learning of syntactic patterns.
I do not deny that there is much more to the induction of a rule system than this process. I also accept that L2 linguistic knowledge is more than a list of patterns. I also accept that there is more that shapes second language behavior than the acquired syntactic pattern system. I also admit that the mechanisms proposed are much too simplistic. My point is to show in a general way how this kind of approach can shed light, at a macroscopic level, on general properties of the phenomenon of foreign language learning. At the highest level, I wish to suggest that much can be gained by considering foreign language learning as a primarily inductive process. This contrasts strongly with the triggers-and-deductions view favored by current UG theories.

* As a first approximation, a pattern is a sequence of categories/subcategories. Det N is a pattern, in particular an NP pattern. Equivalently, patterns are labelled trees or subtrees. I use the term "pattern" even though "rule" would do equally well. And a "grammar" resembles a traditional phrase-structure grammar such as one might have seen in the early days of generative grammar, in which language was conceived of as a system of rules and language-learning as the acquisition of those rules (rather than deducing consequences in an axiomatic triggering system). Phrases are categorized, with category understood as a complex object. In the theory of patterns proposed here, we will permit phrases to be subcategorized, in part based on internal features, including internal configurational structure. So, for example, we can say that in German an adverbial modifier can be combined with a following S, forming a larger S. Furthermore, we can specify that the S to which the adverbial is adjoined must be of a particular subtype, namely Inverted. Internal structure or morphology also produces classifications like: indicative clause versus subjunctive clause, singular NP versus plural NP, nominative NP versus accusative NP.
Patterns may be subcategorized both on the basis of internal structure and external function/distribution. A relative clause is a special sort of clause in that it combines externally with a nominal expression and functions to modify it. A relative clause also has special internal characteristics, consisting of a relative pronoun and a sentence with a missing element. Subcategorization of phrases based on external function also includes, for instance, the distinctions between subject NP and object NP, and between relative clauses and complement clauses.

* What range of concepts or features is available for building the categories and subcategories of the pattern system? I assume that the features available are those given by the native language structure or evidently present in the input (or derived from the learner's own conceptualization of the nature of language). The learner cannot, as far as one can tell, go beyond these sources. For example, the category relative clause may be available from the native language, and the internal structure may be derived from the L2 input. The restriction to these sources, rather than UG, is the major constraint on the system and what prevents the languages of learners from being completely "wild".

* A given element is simultaneously an instance of both a category and one or more subcategories. A sentence like "Gestern besuchte ich meinen Freund" is an instance of the pattern Adverb + Sinv, and the subsequence "besuchte ich meinen Freund" is an instance of the category Sinv. But "besuchte ich meinen Freund" is also an instance of S, of which Sinv (S inverted) is a subcategory. Therefore, "Gestern besuchte ich meinen Freund" is not simply an instance of the pattern Adverb + Sinv, it is also an instance of Adverb + S. "Besuchte ich meinen Freund" belongs to other subcategories as well--to the category finite clause, for example; and to the category preterite clause.
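The multiple categorization just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a proposed formalism: the PARENT table and the function names are hypothetical, introduced here for the sketch, and only the labels Adverb, S, and Sinv come from the example above.

```python
from itertools import product

# Hypothetical subcategory hierarchy (an assumption for illustration):
# each label maps to its supercategory, or to None if it has none.
PARENT = {"Sinv": "S", "S": None, "Adverb": None}


def generalizations(label):
    """Return the label followed by each of its supercategories."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def compatible_patterns(analysis):
    """List every pattern that an analysed input is an instance of,
    starting with the most specific one."""
    options = [generalizations(label) for label in analysis]
    return [tuple(choice) for choice in product(*options)]


# "Gestern besuchte ich meinen Freund", analysed at its most specific level.
patterns = compatible_patterns(("Adverb", "Sinv"))
print(patterns)
```

On this sketch the one input sentence counts as evidence for both Adverb + Sinv and Adverb + S, which is all that the notion of simultaneous category membership requires.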
The sentence "A man milks the cow" is an instance of the pattern NP + VP: it is also an instance of the pattern NP singular + VP singular. This is just like non-linguistic categorization. The dog which I saw running down our street yesterday was simultaneously a member of the categories dog, labrador retriever, four-legged animal, etc.

* Since two patterns can be true of the same input, two patterns can coexist in the learner's grammar, perhaps with varying strengths. This possibility, which follows directly from the notion of pattern, automatically predicts the phenomena of "optionality" and "stage seepage" as well as apparent "going beyond the input." Since an input of "Gestern besuchte ich meinen Freund" demonstrates both the pattern Adv S inverted and the pattern Adv S, the learner can generate both "Gestern besuchte ich meinen Freund" and "Gestern ich besuchte meinen Freund" consistent with the single pattern grammar. It is not inconsistent to say both "I saw a dog yesterday" and "I saw a labrador retriever yesterday."

* The category to which I can assign an object depends on what I notice about that object. If I hadn't noticed the breed of dog, I could not have said that a labrador had run down the street, but only that a dog had run down the street. If I don't attend to the fact that the sentence to which the adverbial is adjoined is inverted, then it counts at intake as an instance of the category S, not of Sinv; and the pattern is Adverbial + S. It should be clear that the basic task for the language learner is that of noticing the right things and determining the correct type and level of categorization for a given pattern. The learner may know, for example, that both inverted and non-inverted clause patterns exist, but may not realize that the invertedness of "besuchte der Mann seinen Freund" in "Gestern besuchte der Mann seinen Freund" is relevant to the pattern with fronted adverbials.
I believe this is the best way to make sense of Richard Schmidt's conjecture that what he calls "understanding" as well as noticing is essential for second language acquisition.

* Since noticing is important, there will be frequency and salience effects. Different learners will notice different things, and input will have a much greater effect than in child language development. Because of such effects, it is possible that the interlanguage may appear to make distinctions among subtypes which are not distinguished grammatically in the target language grammar. For example, in German, if the subtype of the inverted sentence containing pronoun subjects is more common or more salient than that with full NPs, then the learner's production may appear to make a distinction between pronominal and non-pronominal subjects in inversion, even though no such distinction is present in the native-speaker grammar. If you are students of the acquisition of German, you will see the relevance of this example. Again, such frequency-based distinctions in the knowledge system are to be expected under the pattern-learning conception though less expected under a UG-driven triggers-and-deductions system. (In fact, this particular example, of word order dependent on the pronoun vs. full NP difference, is attested in the learning of German as a foreign language. Schwartz and Sprouse, attempting to accommodate this asymmetry to UG theory, suggest that learners are treating abstract German agreement as if it were French, despite the plentiful triggering input, driven by their version of conservatism (the subset principle).)

* A large part of the acquisition of syntax revolves around categorization and noticing. Therefore, it interfaces with the cognitive operation of human categorization in general and can be expected to share characteristics, such as prototype and basic-level effects, with other categorization and induction processes.
Thus, one expects learners to fix on a particular subtype of a pattern as prototypical and "unmarked". For example, learners of German often seem to behave as if the SVX order is the basic order. This order can thus be expected to be extended to cases where the "correct" pattern has yet to be mastered or where the pattern in question does not specify a particular subtype. When foreign language learners of German first note the pattern with an initial adverb, but the necessity of having the inverted subtype of clause with it is not yet noted or is incompletely mastered, so that the operative pattern is Adv S, one can expect the prototype word order to be used, namely SVX. We also see this clearly in the extension of SVX to subordinate clauses by some learners of German. Note the clear contrast with UG-driven child learners. There, the deductive structure of UG "sets them up" to create verb-final subordinate clauses as soon as they begin producing them, despite the overall frequency and salience of SVX structures in the input. Foreign language learners, on the other hand, base their initial attempts on prototypicality rather than deduction.

* The theory of pattern acquisition assumes that there is no deductive structure within the pattern set. There is, for example, no theory-given deductive link between the placement of non-finite elements at the end of the sentence when they are combined with an auxiliary and the fact that only finites occur in post-subject V2 position or in the subject-inverted structure. UG-expected deductive connections often fail to show up in foreign language learning, and this is what pattern theory expects. For example, the occurrence of clause-final non-finite elements in German ought theoretically to motivate a head-final VP and thus prevent non-finites from occurring in V2.
But we know that learners can consistently put non-finite verbs in clause-final position with an auxiliary and still put apparent infinitives in second position after a subject. Meisel, and later Eubank, have pointed out the difficulty this situation poses for UG-based deductive acquisition theory. Moreover, the well-studied learner Jose has apparently noted by this stage that non-finites do not occur in the inverted V2 pattern. If the subject-inversion V2 pattern, the SVX pattern, and the split-VP -- S Aux X V -- pattern are separately learned, then there is no "predicament" posed for the theory by the separate learning of the verbal restrictions in each case. Likewise, as mentioned already, pattern acquisition cannot provide any deductive link between the major main-clause patterns in German and the verb-final pattern of German subordinate clauses. (Or between the acquisition of particular verbal morphology and the acquisition of a word-order pattern, beyond the obvious fact that a pattern which depends on a morphological feature cannot correctly be acquired unless VPs can be categorized using that feature, but this is not deductive in any interesting theoretical sense.)

* The basic-level and prototype effects which are expected by pattern theory also suggest that there may be asymmetrical structure within the pattern inventory. Certain patterns can be viewed as being derived from other patterns. One could do this formally with something like GPSG metarules or "metapatterns". In fact, I suspect that the "best" learner of German develops a grammar with two basic clause patterns, SVX and the "split VP", viz. S Aux X V. A single "inversion" metarule derives the V2 inverted-subject pattern: it can in principle apply both to the SVX pattern and the split VP pattern. A separate metarule creates the Verb-final structure. Thus two basic patterns and two metarules derive the six common individual patterns.
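The arithmetic of "two basic patterns and two metarules derive six patterns" can be made concrete with a small Python sketch. It is an illustration under stated assumptions only: the talk does not spell out what the metarules output, so the treatment of inversion as placing the finite element before the subject, of verb-final as placing it last, and of Aux as the finite element whenever it is present, are all assumptions made here, as are the function and variable names.

```python
# The two basic clause patterns named in the text, as category sequences.
BASIC = {
    "SVX": ("S", "V", "X"),
    "split VP": ("S", "Aux", "X", "V"),
}


def finite_index(pattern):
    """Position of the finite element (assumption: Aux if present, else V)."""
    return pattern.index("Aux") if "Aux" in pattern else pattern.index("V")


def invert(pattern):
    """Hypothetical inversion metarule: finite element precedes the subject."""
    i = finite_index(pattern)
    return (pattern[i],) + pattern[:i] + pattern[i + 1:]


def verb_final(pattern):
    """Hypothetical verb-final metarule: finite element goes to clause end."""
    i = finite_index(pattern)
    return pattern[:i] + pattern[i + 1:] + (pattern[i],)


METARULES = {"inversion": invert, "verb-final": verb_final}


def derive_inventory(basic, metarules):
    """Basic patterns plus every metarule applied to every basic pattern."""
    inventory = dict(basic)
    for base_name, pattern in basic.items():
        for rule_name, rule in metarules.items():
            inventory[f"{rule_name}({base_name})"] = rule(pattern)
    return inventory


inventory = derive_inventory(BASIC, METARULES)
for name, pattern in inventory.items():
    print(f"{name}: {' '.join(pattern)}")
```

Because each derived pattern is keyed by the pair of metarule and basic pattern, nothing in this representation forces a metarule to be available for both basic patterns at once; a learner's inventory could contain some of these pairs and lack others.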
An additional metarule would front adverbials and non-subject constituents, so that the full inverted V2 sentence pattern represents the application of two metarules to a basic prototype pattern--combining inversion and fronting.

An interesting possibility arises under this conception. If the learner treats SVX and S Aux X V as separate patterns and treats inversion as a metarule, a grammar can easily arise in which the inversion metarule applies to one basic pattern but not the other. In such a case, for instance, auxiliaries might show up inverted before the learner inverts with main verbs. Parallel differences could show up in the placement of negation if the formation of negatives is also by metarule. Indeed, the lack of parallelism in the elaboration of these two basic patterns (aux and non-aux) is relatively well-established. The UG theorist will describe this as a difference in raising to functional categories and try to deduce it from UG-given basic properties of thematic versus non-thematic verbs, even gloatingly suggesting that the interlanguage resembles English, evincing a split in raising behavior present in neither the native nor the target language. However, such a forced account is not required, since precisely this split is already present in the German pattern inventory. Auxiliary sentences are a different pattern from non-auxiliary sentences in the systems of pattern-accumulating learners (though not, of course, in the UG-derived systems of native speakers of German).

As an aside, it should be obvious that this conception of patterns and metarules connects quite well with developmental sequence work of the Pienemann type. While pattern theory itself is not a theory of processing complexity (and it is not clear to me that processing complexity is the right way to explain much acquisition order), certainly pattern acquisition theory provides entities to which processing complexity views might apply.
It is also related to the ideas of Clahsen and Muysken, although it does not attempt to describe learner German as a "rogue" version of some other theory. Rather, pattern theory attempts to describe learner language as a consistent, even elegant system in its own right.

* Pattern acquisition is in general conservative: unless one has actually encountered a pattern, it is not acquired. All patterns are derived either from experience with the target language or from concepts related to the native language. However, in many respects, as I have shown, the learner may seem to go beyond the input as viewed by the analyst or from the point of view of the L1 grammar--there may be "errors of commission", as Schwartz and Sprouse put it in their analysis of Cedvet. Or distinctions may be made which are not present in the native or target grammars. But we need not be driven to saying that the learner has gone beyond the learner's own experience with the data (gone beyond the "intake", if you will).

* In the spirit of conservatism, if two patterns are both compatible with the input, the more specific pattern receives the stronger reinforcement. In this way, progress can be made. In the German inversion example, the learner encounters instances of patterns with fronted elements, but since the conservative learner takes the more specific case as the more highly weighted one, progressive exposure (assuming noticing, as always) will eventually lead to the more specific pattern pushing out the less specific one. The system gradually tends towards "correctness" as experience increases.

CONCLUSION

Learner grammars are not messy, undisciplined, deficient versions of "real" grammars, afflicted with deductive mistakes, clustering failures, periods of unexpected optionality, massive stage seepage, and the like. They are elegant, disciplined systems in their own right, but governed by rather different principles from the core principles of UG.