Changes in Verbal and Nonverbal Conversational Behavior in Long-Term Interaction

Daniel Schulman, Timothy Bickmore
College of Computer and Information Science
Northeastern University

Good morning, and thank you for your time and attention. Today I'll be presenting "Changes in Verbal and Nonverbal Conversational Behavior in Long-Term Interaction".

Overview

We model changes in verbal and nonverbal behavior within conversation openings in a corpus of multiple-conversation discourse…

Week 1	Week 2	Week 3	Week 4	Week 5	Week 6

…with the goal of generating realistic behavior in conversational agents for long-term interaction.

I'll just start off with the broad-picture overview of what this paper is about. We're interested in modeling verbal and nonverbal behavior in long-term multi-conversation dyadic interaction, which we take to mean multiple conversations, spanning some significant period of time, between the same dyad and addressing one overall conversational task. Our goal is to produce conversational agents whose behavior is engaging and realistic over this kind of long-term interaction with users, and I'll approach things mainly from that angle, but I hope some of this work will also be relevant to those in the audience interested in areas like behavior recognition and interpretation.

This paper is primarily an empirical and observational study. We collected a corpus of this kind of multi-conversation interaction, and have done some analysis of verbal and nonverbal behavior in that corpus. For this study, we focus specifically on conversation openings: the beginning of each conversation. I'll justify that approach in a little while.

Long-Term Interaction

Karen agent from Bickmore et al — (Bickmore et al. '10)

Conversational agents are often intended to increase both immediate rapport and long-term engagement…

… and are increasingly used in applications that require long-term interaction and multiple conversations.

Counseling
Education
Social Companionship

Companion agent from Smith et al — (Smith et al. '10)

Conversational agents — and I'll focus on embodied conversational agents, which attempt to simulate face-to-face conversation — have become a widely-applied technology (at least in research projects) for a variety of applications; there are a few examples listed here. One reason for that is that we hope that the affordances of face-to-face conversation can be leveraged by an agent to promote engagement and rapport. To do that, we believe, or at least assume, that we need to produce sufficiently realistic behavior, that matches how a human analogue of the agent would promote engagement and rapport in conversation.

Many of these applications put the agents in roles where multiple interactions between the user and agent are required. It's common for a counseling intervention, conducted by a human counselor, to last months, years, or longer — indefinitely. Given this, we are motivated to produce conversational agents that have realistic behavior in the context of long-term interaction.

Research Question

Are there systematic changes in verbal and nonverbal behavior in conversation openings that occur across multiple face-to-face conversations?

Are differences in verbal and nonverbal behavior within conversation openings predicted by interaction history and the strength of interpersonal relationship?

Put simply, an agent's behavior in the tenth conversation with a user should be appropriate for a tenth conversation, in the context of all the previous interactions. If a tenth conversation looks different from a first conversation, an agent's behavior should reflect that, or risk looking increasingly unrealistic and losing engagement over time.

We also argue that behavior in all tenth conversations may not look the same: there are a lot of different aspects of a long-term relationship that may be changing over time and that could be associated with changes in behavior. I'll focus on two. Interaction history includes the number of previous conversations. Interpersonal relationship includes constructs such as trust and intimacy. These two are related but separate: we may see different behavior in a dyad with a long history and a close relationship compared to a dyad with a similarly long history and a distant or weak relationship.

As I mentioned before, we also look specifically at the beginnings of conversation. Some of our previous work in this area hints that where there are differences between conversations, those differences are particularly pronounced at the beginning of conversations. To the extent that differences in behavior are associated with participants' beliefs — or mutual beliefs — about their interpersonal relationship, the beginning of a conversation is where we expect those beliefs to be negotiated or communicated. So if you're looking for differences between conversations, and are going to pick a particular part of those conversations to examine, both of those suggest that openings are a good choice.

Background and Related Work

Behavior in Long-Term Interaction

There is evidence of differences in behavior predicted by interaction history and interpersonal relationship:

Observers can discriminate (audiotaped) conversations between friends and between strangers. (Planalp and Benson 1992)
Strangers use more explicit acknowledgements, with more head nods and mutual gaze. (Cassell et al. 2007)

Small changes in behavior (e.g. lexical variability) can have a significant effect on long-term engagement. (Bickmore et al. 2010)

There is some previous work that indicates differences in behavior predicted by prior history and by relationship. Much of the prior work is cross-sectional, comparing pairs of friends to pairs of strangers, for example. Cross-sectional studies are valuable, but have some limits in this area. Often, you can show that some change occurs, but it is more difficult to show a pattern of change over time. A trickier issue is separating change over time from differences between dyads: if, for example, we see a difference between friends and strangers, we could explain this as a change that occurs over time, but also as a difference that predicts whether people are more likely to become friends.

In the two items cited here, we see some differences between friends and strangers; both studies are cross-sectional. Cassell et al. found a decrease or minimization in acknowledgment behavior in friends compared to strangers.

Finally, I note that we've done some previous work showing that fairly small changes in behavior — things like adding or removing some lexical variability, or switching between first-person or third-person language — can have a significant effect on user engagement when someone is interacting with a conversational agent for months or years.

Related Work: Rapport over Time

The nonverbal behaviors associated with strong rapport may vary over time.

Tickle-Degnen model of nonverbal rapport

(Tickle-Degnen and Rosenthal 1990)

Tickle-Degnen and Rosenthal give a model of rapport and nonverbal behavior that I want to pull out as background work, because it specifically addresses the idea of change over time. They suggest that the nonverbal behaviors indicating strong rapport vary over time, with those indicating positivity being more important in early conversations, while coordination is more important later on. Mutual attention is always associated with strong rapport.

Previous Work: Posture

Posture shifts occur less often later in conversations and the rate decreases faster in later conversations.

(Schulman and Bickmore, 2011)

We also have published two previous studies looking at changes in verbal and nonverbal behavior over time, both actually using the same corpus as in this study. We found that posture shifts occur more frequently at the beginning of conversations than the end, and that this decrease in the rate of posture shifts is significantly greater in later conversations.

Previous Work: Articulation Rate

Articulation rate is faster over time, both within and across conversations, but only for single words (with silence before and after).

(Schulman and Bickmore, 2010)

We also looked at changes in articulation rate, defined here in terms of a normalized duration of a word, not counting any pauses or silence between words. We found that the duration of some words decreased — words that appeared as a pause group by themselves, with silence before and after — and that these words were mostly acknowledgments and discourse markers, so broadly speaking the words that mainly had a conversation management role got minimized over time.

The Exercise Counseling Corpus

1st conversation	3rd conversation	6th conversation

We collected a corpus we call the Exercise Counseling Corpus: a longitudinal videotaped corpus of conversations. These are videotapes of weekly conversations between an exercise trainer and clients, with the trainer acting as a counselor to try and change the clients' attitudes about physical activity. These are in a laboratory setting, with participants recruited specifically for a study, but besides that we tried to make it fairly naturalistic. There was a real, not role-played intervention occurring, and a meaningful task.

The Exercise Counseling Corpus

Summary: Contents of the Corpus

6 clients (one counselor).
Up to 6 weekly sessions each.
32 conversations.
500.25 minutes of video.
101493 words of spoken dialogue.

We recruited 6 different clients, and a single counselor conducted all sessions. All clients, when recruited, were asked to come in once a week for six weeks, and all but one did come in every week. We have a total of 32 conversations, and a fairly large amount of video.

Interpersonal Relationship

We assess therapeutic alliance using the short revised Working Alliance Inventory (WAI-SR) after every conversation for both counselor and client.

Since we're interested partly in looking at behavior relative to the perceived strength of interpersonal relationship, we used a self-report measure of therapeutic alliance, which is a conceptualization of interpersonal relationship developed specifically in the context of counseling interactions. Advantages of this construct are that it has good, validated measures, and that we know it's meaningful: a strong alliance has been shown to significantly predict positive outcomes in counseling.

Both the counselor and the client completed a therapeutic alliance survey after every conversation. We see a fairly strong alliance, as illustrated in this plot — this is on a 1 to 5 scale — and we see a clear pattern of increasing alliance over time. Generally, this corpus contains examples of successful counseling and fairly good development of interpersonal relationship. It's weaker on bad examples.

All coding for nonverbal behavior was done in ANVIL, and looked only at the first minute of each conversation. The paper has more detail about the coding and related matters like inter-rater reliability. I'll skip those here for time.

Behavior Coding

Within the first minute of each conversation, we coded:

The proportion of time speaking
The number of gaze-aways during speech
The proportion of time nodding when not speaking
The proportion of time smiling or frowning
The proportion of time performing self-adaptors when not speaking
The proportion of time performing gestures when speaking
The proportion of time with eyebrows raised or lowered during speech

We looked at a fairly large set of different behaviors. Based on preliminary analysis, and on our previous studies, we saw no evidence of major differences that were observable within a single minute of conversation (although a larger corpus may certainly have turned up some), so all of these variables are aggregates over a video clip.
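
As a rough illustration of how per-clip aggregates like these can be computed from time-aligned annotations, here is a minimal Python sketch. It is not the pipeline used in the study: the interval representation, the function names, and the rule that a gaze-away counts as "during speech" if it overlaps any speech interval are all assumptions made for the example.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) of one annotation
WINDOW = (0.0, 60.0)            # the first minute of the conversation


def clip(intervals: List[Interval], lo: float, hi: float) -> List[Interval]:
    """Clip intervals to [lo, hi], dropping any that fall outside it."""
    out = []
    for start, end in intervals:
        s, e = max(start, lo), min(end, hi)
        if e > s:
            out.append((s, e))
    return out


def total(intervals: List[Interval]) -> float:
    """Total covered time; overlapping intervals are not counted twice."""
    covered, cursor = 0.0, float("-inf")
    for s, e in sorted(intervals):
        if e > cursor:
            covered += e - max(s, cursor)
            cursor = e
    return covered


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """All non-empty pairwise intersections of two interval lists."""
    out = []
    for s1, e1 in a:
        for s2, e2 in b:
            s, e = max(s1, s2), min(e1, e2)
            if e > s:
                out.append((s, e))
    return out


def complement(intervals: List[Interval], lo: float, hi: float) -> List[Interval]:
    """The gaps in [lo, hi] not covered by the intervals (e.g. 'not speaking')."""
    out, cursor = [], lo
    for s, e in sorted(clip(intervals, lo, hi)):
        if s > cursor:
            out.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < hi:
        out.append((cursor, hi))
    return out


def proportion_of_time(behavior: List[Interval], context: List[Interval]) -> float:
    """Fraction of the context time (within the window) showing the behavior."""
    lo, hi = WINDOW
    ctx = clip(context, lo, hi)
    denom = total(ctx)
    if denom == 0.0:
        return 0.0
    return total(intersect(clip(behavior, lo, hi), ctx)) / denom


def count_events_during(events: List[Interval], context: List[Interval]) -> int:
    """Number of events in the window that overlap the context at all."""
    lo, hi = WINDOW
    ctx = clip(context, lo, hi)
    return sum(1 for ev in clip(events, lo, hi) if intersect([ev], ctx))
```

For example, "proportion of time nodding when not speaking" would be `proportion_of_time(nods, complement(speech, *WINDOW))`, and "number of gaze-aways during speech" would be `count_events_during(gaze_aways, speech)`.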

Our choice of outcome variables was based on a survey of the prior literature, trying to pick out those that might change across multiple conversations. For more detailed references, see the paper, but briefly: The proportion of time speaking is reported as a difference between friends and strangers, as is the use of head nods, specifically for acknowledgment. Smiling, the frequency and expressivity of gestures, and the use of eyebrows are all cited in the literature on immediacy. Gaze-aways are associated with immediacy as well, and at least one study reported an association between gaze-away and topic intimacy. Self-adaptors are associated with anxiety in conversation, so we might expect more in early conversations.

Predictors

Interaction History: # of previous sessions; final session
Interpersonal Relationship: therapeutic alliance (WAI-SR, lagged)
Interaction Role: counselor/client

And to look at how those behaviors change over time, we considered a number of possible predictors. First, looking at interaction history, we have the number of previous sessions. But we added a second predictor, which is whether this was the last session. This one didn't initially have strong justification from previous work or theory: primarily, we noticed while doing coding that the final sessions looked qualitatively different from earlier ones, and were easy to tell apart.

To look at interpersonal relationship, we have a participant's self-reported therapeutic alliance as a predictor of their behavior. Since we assessed this at the end of a conversation, this is lagged: it's the alliance reported in the previous conversation.

Finally, since the counselor and client have very different roles in this conversation, we looked at that as a predictor.
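
To make the lagging concrete, here is a small Python sketch of how the predictors for one participant could be assembled. The function and field names are hypothetical. Leaving the first session's alliance missing (`None`) is also an assumption, since there is no earlier report to lag from. "Final session" is taken here to mean the last session the dyad actually attended.

```python
from typing import Any, Dict, List


def build_predictors(alliance_by_session: List[float], role: str) -> List[Dict[str, Any]]:
    """One row of predictors per session for a single participant.

    alliance_by_session[i] is the alliance reported AFTER session i,
    so it can only predict behavior in session i + 1.
    """
    n = len(alliance_by_session)
    rows = []
    for i in range(n):
        rows.append({
            "role": role,                              # "counselor" or "client"
            "previous_sessions": i,                    # interaction history
            "final_session": 1 if i == n - 1 else 0,   # last attended session
            # Lagged alliance: the report from the previous conversation.
            "lagged_alliance": alliance_by_session[i - 1] if i > 0 else None,
        })
    return rows
```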

Models

For each behavior, we considered 4 sets of predictors:

A: History

# of previous sessions
final session

B: History and Relationship

as A, and also:

therapeutic alliance
alliance × sessions

C: History and Role

as A, but with each effect estimated separately for counselor and client

D: History, Relationship, and Role

as B, but with each effect estimated separately for counselor and client

Since we didn't want to start out assuming that any particular behavior would be associated with interaction history, or with the quality of the interpersonal relationship, we looked at four different models for each behavior, each picking a different subset of those predictors.

The first (showing A) just includes predictors related to interaction history: the number of previous sessions, and whether it's the last session. The next (showing B) adds the quality of interpersonal relationship: self-reported therapeutic alliance, and an interaction effect with the number of sessions. Finally, we look at variants (showing C and D) which add the participants' role in the interaction, allowing effects to vary separately for the counselor and clients.
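
Written out, the four predictor sets correspond to linear predictors of roughly the following form. This is a sketch for orientation only. The coefficient names, the intercept \(\beta_0\), the role index \(r\) (counselor or client), and the choice to let the intercept vary by role in C and D are notation assumed here, and the per-dyad terms are omitted. \(s\) is the number of previous sessions, \(f\) is 1 for a final session and 0 otherwise, and \(a\) is the lagged therapeutic alliance.

\[
\begin{aligned}
\text{A:}\quad \eta &= \beta_0 + \beta_1 s + \beta_2 f \\
\text{B:}\quad \eta &= \beta_0 + \beta_1 s + \beta_2 f + \beta_3 a + \beta_4\, s a \\
\text{C:}\quad \eta_r &= \beta_{0,r} + \beta_{1,r} s + \beta_{2,r} f \\
\text{D:}\quad \eta_r &= \beta_{0,r} + \beta_{1,r} s + \beta_{2,r} f + \beta_{3,r} a + \beta_{4,r}\, s a
\end{aligned}
\]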

Model Details

Mixed-effect regression, with per-dyad means normally distributed about the overall mean.
Counselor and client outcomes are correlated at the dyad and session level.
# of gaze-aways is modeled as a Poisson-distributed outcome with additive Gaussian overdispersion.
For other behaviors, the proportion of time the behavior is observed is modeled as Gaussian under an inverse logit transform.
Models are fit using Bayesian estimation with weak priors.
Deviance Information Criterion (Spiegelhalter et al. '02) used to select the best-fitting model.

These are the gory details of the statistics, or at least some of them. Again, in the interests of time I won't say too much about this, and I'll refer you to the paper for details, but please feel free to ask afterwards if you have any questions.
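
For anyone unfamiliar with the model selection step, the Deviance Information Criterion is the posterior mean deviance plus an effective number of parameters, and lower values indicate a better trade-off between fit and complexity. The Python sketch below shows that calculation under the standard definition from Spiegelhalter et al. The function names and the idea of passing in precomputed deviances are assumptions for illustration, not a description of the software used in the study.

```python
from statistics import mean
from typing import Dict, Sequence


def dic(deviance_samples: Sequence[float], deviance_at_posterior_mean: float) -> float:
    """DIC = D_bar + p_D, where p_D = D_bar - D(theta_bar).

    deviance_samples: deviance evaluated at each posterior draw.
    deviance_at_posterior_mean: deviance at the posterior mean of the parameters.
    """
    d_bar = mean(deviance_samples)              # average fit over the posterior
    p_d = d_bar - deviance_at_posterior_mean    # effective number of parameters
    return d_bar + p_d


def best_model(dic_by_model: Dict[str, float]) -> str:
    """Pick the predictor set (e.g. 'A'..'D') with the lowest DIC."""
    return min(dic_by_model, key=dic_by_model.get)
```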

Mouth Shape

Both counselor and clients smile and frown less over time, but increase in the last sessions.

For mouth positions, our best-fitting model looked only at interaction history, not therapeutic alliance. We see the same trend for both the counselor and clients. Participants spent significantly more time smiling and frowning in the early sessions and this decreased over time. However, the final sessions look different. We see significantly more smiling and frowning in final sessions relative to the trend: it looks a lot more like an initial session.

Gaze-Away while Speaking

Both counselor and clients gaze away while speaking more frequently in later sessions, but decrease in the last sessions.

For gaze-away, we came up with a similar model. Participants gazed away from their conversation partner while speaking significantly more in later conversations. But again, the final sessions between each dyad look different: we see significantly fewer gaze-aways in those sessions, and again it looks a lot more like an initial session.

Nodding when not Speaking

Both counselor and clients nod more in early sessions when reporting low therapeutic alliance.

However, looking at nodding when not speaking — which is primarily nodding while the conversation partner is speaking — we have a more complex model that includes both interaction history and therapeutic alliance: participants nodded more in sessions where they had previously reported low perceived alliance, but only in early sessions. The difference attenuates over time. There's a trend toward less nodding in the last session, but it's not significant.

Other Results

Speech: The clients spoke significantly more in later sessions, and the counselor spoke less, except for the last session.
Self-Adaptors: The counselor used more self-adaptors in later sessions.
Gestures: The counselor used fewer hand gestures in later sessions, except for the last session.
Eyebrow movement: No significant trends observed.

The previous slides covered three of the seven behavioral variables we looked at, and those were the cleaner of the results. I'll briefly cover here the remaining ones, which are a bit messier.

For amount of time speaking, we see the counselor and client having trends in opposite directions. I should mention they're not becoming more even: the counselor starts off talking less than the clients, and the difference gets larger, not smaller. For adaptors and other hand gestures, we see significant changes only for the counselor. For eyebrow movements, we don't see much of anything significant at all.

Summary and Discussion

There are systematic changes over time, with most trends reversing in the last sessions.

Mouth	Gaze	Nod

This partially conflicts with earlier work: friends are reported to nod less than strangers. (Cassell et al. 2007)

To summarize the main results, we see systematic changes across sessions in three behaviors for both the counselor and client. In all of these cases, the trends reverse in the last sessions, which look much more like an initial interaction.

I want to note that our results are only partially in agreement with prior work. Tickle-Degnen and Rosenthal give a model of nonverbal behaviors and rapport which predicts that high rapport is associated with the communication of "positivity" in early conversations, but not in late ones. We see something similar, where both smiles and frowns are more common early on. We took a quick qualitative look at our video, and it was apparent that most of the smiles in our corpus are not Duchenne smiles, which are generally thought to represent felt emotion, so we conjecture that much of this reflects changes in how much people intentionally communicate positive and negative affect. This is only a slight modification, I think, from the notion of "positivity" to being very explicit about communicating appropriate affective responses in early conversations.

Where we come closer to a conflict with previous work is in the results on head nodding. It's been reported that friends tend to use less nodding than strangers, for acknowledgment. We see something different here: strong alliance dyads have less nodding than weak alliance dyads, but only in early sessions. One possible conjecture is that we're picking up early differences that are predictive of whether people become friends, rather than differences that would appear over time: the earlier work is cross-sectional, and can't make that distinction.

Behavior Generation

We compute "longitudinally adjusted" generation probabilities where:

\(p\): The base generation probability
\(s\): The # of previous sessions
\(f\): 1 if this is a final session, 0 otherwise
\(a\): The therapeutic alliance

\(\newcommand{\logit}{\mathop{\mathrm{logit}}\nolimits}\)

Smile or Frown	\(p' = \logit^{-1}(\logit(p) - 0.16s + 0.97f)\)
Gaze-away	\(p' = 1 - (1-p)^{\exp(0.2s - 0.8f)}\)
Headnod	\(p' = \logit^{-1}(\logit(p) + 0.06s - 0.44f - 0.28a + 0.06sa)\)

The last topic I'll discuss is implementing these findings in a nonverbal behavior generation system for a conversational agent. Our basic approach is to implement these kinds of long-term change as adjustments to the probability of generating some nonverbal behavior event: a smile or frown, or a gaze-away, for example.

We've implemented this on top of a rule-based nonverbal behavior generation system, but it should be workable on top of something that is based more directly on a machine-learning model, as long as it can output generation probabilities for behaviors. The disadvantage is that we have to make some untested assumptions: the main one is that our results apply additively with other predictors of behaviors.

Here are the adjustments that we make for the probability of generating a smile or frown, a gaze-away, and a headnod, based on the number of previous sessions, whether it's the final session, and an estimate of the strength of therapeutic alliance. These come directly from the results of the observational study. There are also some adjustments, not shown on this slide, based on our earlier results on posture shift and on articulation rates.
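
The three adjustments on the slide translate fairly directly into code. The following Python sketch uses the coefficients exactly as shown. The function names are made up for the example, the base probability `p` is assumed to lie strictly between 0 and 1, and the slide does not say how the alliance value `a` is scaled or centered, so that is left to the caller.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def adjust_smile_or_frown(p: float, s: int, f: int) -> float:
    """p' = logit^-1(logit(p) - 0.16 s + 0.97 f)."""
    return inv_logit(logit(p) - 0.16 * s + 0.97 * f)


def adjust_gaze_away(p: float, s: int, f: int) -> float:
    """p' = 1 - (1 - p)^exp(0.2 s - 0.8 f)."""
    return 1.0 - (1.0 - p) ** math.exp(0.2 * s - 0.8 * f)


def adjust_headnod(p: float, s: int, f: int, a: float) -> float:
    """p' = logit^-1(logit(p) + 0.06 s - 0.44 f - 0.28 a + 0.06 s a)."""
    return inv_logit(logit(p) + 0.06 * s - 0.44 * f - 0.28 * a + 0.06 * s * a)


if __name__ == "__main__":
    base = 0.3  # hypothetical base probability from the underlying generator
    for s in range(6):
        f = 1 if s == 5 else 0  # treat the sixth session as the final one
        print(s, f,
              round(adjust_smile_or_frown(base, s, f), 3),
              round(adjust_gaze_away(base, s, f), 3))
```

With `s = 0` and `f = 0` each function returns the base probability unchanged, so a first, non-final session behaves exactly like the underlying generator.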

Behavior Generation: Sample

"Okay, great. And how is your exercise going?"

	1st conversation, low alliance
	5th conversation, high alliance

This is showing an example of different behavior generated for the same utterance, both at a couple minutes into a conversation, but otherwise in very different contexts. You can see that in an early conversation, with low therapeutic alliance, we've generated a nod and some smiling. In a later conversation, these drop out, but we see an added gaze-away during the utterance.

Current Work

This model has been implemented in "Rhythm", a rule-based nonverbal behavior generator, and we are running a 6-week longitudinal randomized controlled trial comparing:

Changing	Static	Exaggerated
Behaviors are generated according to the model given here.	Behavior generation does not change over time.	Behaviors change 3 times as much as in the models given here.

Main outcomes: self-reported engagement and perceived realism.

That wraps up the work reported in this paper. I'll just briefly give some conclusions, and mention our current work and some possible future work. Currently, we're running a longitudinal web-based evaluation study to test the effect of implementing this model in a conversational agent. Participants interact with the agent once a week for six weeks, seeing either the behavior model I discussed here, a model where nothing changes across sessions, or an exaggerated model which basically just multiplies all parameters in the behavior generation slide by 3.
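
One simple way to picture the three study conditions is as a single scale factor on the longitudinal terms: 1 for the model as given, 0 for no change across sessions, and 3 for the exaggerated version. The Python sketch below shows this for the smile-or-frown adjustment only. Treating the static condition as a scale of 0 is an assumption made for the example, as are the names; the actual implementation in Rhythm may be organized differently.

```python
import math

# Scale applied to the longitudinal coefficients in each study condition.
# "static" as 0.0 is an assumed encoding of "does not change over time".
CONDITION_SCALE = {"changing": 1.0, "static": 0.0, "exaggerated": 3.0}


def smile_or_frown_probability(p: float, s: int, f: int, condition: str) -> float:
    """Adjusted probability of a smile or frown under a study condition.

    p: base probability (strictly between 0 and 1)
    s: number of previous sessions
    f: 1 if this is the final session, 0 otherwise
    """
    k = CONDITION_SCALE[condition]
    log_odds = math.log(p / (1.0 - p)) + k * (-0.16 * s + 0.97 * f)
    return 1.0 / (1.0 + math.exp(-log_odds))
```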

Our main outcomes are self-reported measures of engagement and perceived behavioral realism. I can't yet give you results of that, since it's still ongoing, but it'll be finishing up shortly.

Conclusions

We conclude that:

There are systematic changes in behavior in long-term interaction.
Behaviors are a complex result of multiple aspects of interpersonal relationship.
The context of long-term interaction should be considered both for behavior generation and interpretation.

In conclusion, we find that there are indeed systematic changes in verbal and nonverbal behavior that occur from the first conversation to later conversations, and that these changes are a complex product of multiple aspects of interpersonal relationship, including the interaction history and the strength of the relationship. Given these changes, applications that deal with verbal and nonverbal behavior in long-term interaction — whether dealing with behavior generation or interpretation — should consider carefully the context of the behavior, and avoid assumptions that behavior is unchanging across multiple sessions.

Future Research Questions

Can these results be generalized and extended beyond this corpus and scenario?
Are there systematic changes associated with other aspects of interpersonal relationship?
Do these systematic changes improve long-term realism and engagement in a conversational agent?
Do the differences observed in final sessions also occur in the presence of other changes to relationships, tasks, or roles?

I'll wrap up by giving some future work, and future research questions raised here. First, most obviously, we're interested in validating these results in a larger population, and in generalizing them beyond the specific scenario — the specific conversational task, kind of interpersonal relationship, and fixed schedule of interaction — present in the Exercise Counseling Corpus.

Finally, we're interested in following up on the finding that final sessions do not follow the trend of the previous interactions, and tend to look a lot more like a first session. This wasn't something we were looking for — maybe we should've been — and we conjecture that it happened specifically because all participants knew that it was the last session. Like the first session, this represents a change in their interpersonal relationship, in their roles toward each other. As the use of conversational agents moves more toward long-term interaction, it's likely that such changes will become more common, so we're interested in whether there are characteristic patterns of conversational behavior, and whether that is what we are seeing here.

Thank You

Questions?