Computational Mafia

skitter30 · Post Post #75 (ISO) » Wed Dec 18, 2019 4:49 am

Ya there's a bunch of things i'd love to look at if i had a workable dataset of votes, lynches, and flips; i have some ideas regarding scum voting patterns that i'd like to test

But i dont entirely know how to ~put together~ a dataset like that

You seem to have a p solid game plan.

When you say 'game archives', what is that referring to?
Like the op of each game that (ideally?) contains that info?

Or is that all compiled in one giant spreadsheet somewhere?

Psyche · Post Post #76 (ISO) » Fri Dec 20, 2019 10:03 am

my fking votecounter requires 32-bit python
how did i let this happen

Psyche · Post Post #77 (ISO) » Fri Dec 20, 2019 10:05 am

oh and here's an example archive: viewtopic.php?f=53&t=29549
had to do quite a bit of cleaning to make this something that doesn't create trouble but it's better than collecting it all myself

popsofctown · Post Post #78 (ISO) » Fri Dec 20, 2019 12:19 pm

Wouldn't it be cool if there was a thingy that would produce links to a player's last five completed games?

Seems possible if like a script perused post history and looked for user's name in vcs to rule out post game commentary, etc

Psyche · Post Post #79 (ISO) » Fri Dec 20, 2019 12:33 pm

would be easy to code w a cleaned up (universal) archive
maybe one day

Psyche · Post Post #80 (ISO) » Sat Dec 21, 2019 8:20 am

In post 76, Psyche wrote:my fking votecounter requires 32-bit python
how did i let this happen

this is the absolute worst

Psyche · Post Post #81 (ISO) » Tue Jan 07, 2020 7:10 am

ok i've figured out a fix that isn't "start all over"
it'll still require a bit of work but hopefully it'll result in a better product anyway
and an opportunity to try out some new coding practices

yessiree · Post Post #82 (ISO) » Tue Jan 07, 2020 7:02 pm

In post 78, popsofctown wrote:Wouldn't it be cool if there was a thingy that would produce links to a player's last five completed games?

Seems possible if like a script perused post history and looked for user's name in vcs to rule out post game commentary, etc

it wouldn't be difficult at all to put together something like that, scrape the posts from player X of their last Y games
even better tho, it can be combined with Bob's deception classifier to analyze how much a player engages in deceptive speech as either alignment, and basically spit out a likelihood that X is scum when given Z new posts

Psyche · Post Post #83 (ISO) » Tue Jan 14, 2020 5:15 pm

found a replacement spellcheckerrr

chamber · Post Post #84 (ISO) » Tue Jan 14, 2020 5:30 pm

In post 78, popsofctown wrote:Wouldn't it be cool if there was a thingy that would produce links to a player's last five completed games?

Seems possible if like a script perused post history and looked for user's name in vcs to rule out post game commentary, etc

search.php?keywords=&terms=all&author=p ... mit=Search

Doesn't take much effort to already do this (approximately) with the search features that exist.

Psyche · Post Post #85 (ISO) » Mon Jan 20, 2020 1:25 pm

it's been almost a week have an update

i partly cleaned up my two main data sets - one the transition spreadsheet identifying where phase transitions happened throughout each game, and the other basically a computer readable and cleaned-up subsection of the mini normal archives already maintained on this site identifying every relevant player in the game (plus mods), their roles, and their fates
the cleaned up stuff covers only ~300 mini normals, and the cleanup mostly only handled missing information - not inaccurate information
furthermore, i still need to get my votecounter working again

As mentioned before, the strategy for identifying errors is to try to combine the three data sources to complete a task that's hard to complete unless they're all accurate:

With the votecounter and the archive indicating which players are still alive on a given Day, i'll try to infer (for every day phase in my data set) where a hammer has happened.
If the next mod post after the hammer isn't the transition point my transition spreadsheet indicates for that Day, then there's a good chance of an issue in my data or code. Similarly, if my player archive and my votecounter disagree on who the hammered player is, that's also a red flag.
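
A minimal sketch of that cross-check, assuming votes have already been parsed into (post number, voter, target) tuples and that a hammer means a strict majority of living players. The names `find_hammer` and `flag_day` and the data layout are hypothetical, not taken from the actual votecounter, and power roles that change vote weight are ignored:

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (post number, voter, target); target None means an unvote. Hypothetical layout.
Vote = Tuple[int, str, Optional[str]]


def find_hammer(votes: Iterable[Vote], living: Set[str]) -> Optional[Tuple[int, str]]:
    """Return (post number, player) for the first vote that reaches majority."""
    threshold = len(living) // 2 + 1  # strict majority of living players
    current: Dict[str, str] = {}  # voter -> current target
    for post, voter, target in sorted(votes, key=lambda v: v[0]):
        if voter not in living:
            continue  # dead players and non-players can't vote
        if target is None:
            current.pop(voter, None)  # unvote
            continue
        if target not in living:
            continue  # ignore votes on anyone who isn't alive
        current[voter] = target
        if sum(1 for t in current.values() if t == target) >= threshold:
            return post, target
    return None


def flag_day(votes: Iterable[Vote], living: Set[str], mod_posts: List[int],
             recorded_transition: int, recorded_lynch: str) -> List[str]:
    """Compare the inferred hammer against the transition sheet and the archive."""
    hammer = find_hammer(votes, living)
    if hammer is None:
        return ["no hammer found"]
    post, lynched = hammer
    flags = []
    # The first mod post after the hammer should be the recorded transition.
    next_mod = min((p for p in mod_posts if p > post), default=None)
    if next_mod != recorded_transition:
        flags.append("transition mismatch")
    if lynched != recorded_lynch:
        flags.append("lynch mismatch")
    return flags
```

An empty list means all three sources agree for that Day; anything else marks the phase for a manual look.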

If I can successfully infer the transition post and hammered player from voting data and player information for every phase of every game in my data set, I'll proceed to also try to infer which team won the game and compare that against what my archive says (though nothing in my codebase can infer who got killed in a night phase, or handle vote-manipulating PRs, so the code will get some help from the fate-identifying part of the archive).

There could still be issues in the data set or code after achieving all this, but I imagine that if I expand the data set far enough beyond the initial 300 games while trying to use the data set to do more ambitious things, I'll catch them eventually. I'll hire the Filipino lady again and help her provide for her children.

And then a side effect of all this testing will be a cleaned data set tracking every vote that happened across all 300 games. That's when we can start the Great Vote Count Analysis. And then we'll find absolutely no reliable pattern in any of these votes even after applying fancy NLP tools to take "context" into account and I'll finally leave mafiascum.net forever. Oh but at least we'll also have a functioning votecounter that works across even old games w/ loose mods. And no one will use it. Well at least I'll be able to move on.

Psyche · Post Post #86 (ISO) » Sun Jan 26, 2020 11:28 am

fixed the votecounter - it can determine the person lynched on D1 across all 300 mini normals in my sample
it's kinda slow though?
i guess im ok with slow as long as it always works

still need to...
extend test to determine transition posts from post# of hammer
extend test to Days beyond D1
extend test to more games
convert test results into cleaned data set
plan out and start analyses

maybe i'll try to solicit ideas a little further down the line

gobbledygook · Post Post #87 (ISO) » Sun Jan 26, 2020 1:47 pm

Ty Psyche!

Psyche · Post Post #88 (ISO) » Fri Feb 07, 2020 8:03 pm

god i rlly want to work on this but i should wait i should totally wait

Psyche · Post Post #89 (ISO) » Sat Feb 08, 2020 1:46 am

i did a little work
solved the speed bottleneck
seems the votecounter is no longer perfect now that im using the new spellchecker
i might just switch back to 32bit and return to pyenchant again

EDIT: no the new spellchecker is just as accurate but much faster. my problems are even scarier.

Psyche · Post Post #90 (ISO) » Tue Feb 11, 2020 10:36 pm

Down the Line: Read Extraction

So the Great VCA is of course on the horizon, but we all know that people's votes are only a small portion of the information people produce in a typical game, and of the basis for most people's reads. Understanding Mafia requires engaging with the information in people's *posts*, but it's hard (just engaging fully w/ the posts in one game is a big effort!). People have historically managed this challenge by either focusing their analyses on specific cues/situations where manual coding/interpretation of each case is feasible, or by emphasizing global textual features like comparative post count or word count while avoiding deep consideration of the content in players' posts. Projects that train machine learning classifiers over large corpora of gameplay text to discriminate alignment seem to be the most sophisticated examples imaginable of this latter category of work.

To take content seriously without constraining research scale, I want to try leveraging current state-of-the-art tools for extracting structured, machine-readable representations of the information in text. NLP folk call this Open Information Extraction (OpenIE): tools scan over arbitrary sentences (a simple example might be "Obama was born in Hawaii") and, using grammar rules, extract simple subject-relation-object propositions (like ['Obama', 'was born in', 'Hawaii']) that are much more amenable to automated analysis.
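
To make the target representation concrete, here's a toy sketch that turns a sentence into one (subject, relation, object) triple. It is not OpenIE: real tools work from grammatical parses, whereas this just looks for a hand-written relation phrase. The `RELATIONS` list and the `extract_triple` name are made up for illustration.

```python
from typing import Optional, Tuple

# Hypothetical relation phrases, longest first so "was born in" wins over "is".
RELATIONS = ("was born in", "is voting for", "seems", "is")


def extract_triple(sentence: str) -> Optional[Tuple[str, str, str]]:
    """Split a sentence on the first known relation phrase, if any."""
    text = sentence.strip().rstrip(".!?")
    for rel in RELATIONS:
        subject, sep, obj = text.partition(f" {rel} ")
        if sep and subject and obj:
            return (subject, rel, obj)
    return None  # no known relation phrase in this sentence
```

Anything this lookup can't match comes back as None, which is exactly the gap a grammar-based extractor is supposed to close.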

A natural extension of my VCA project that could leverage this kind of tool might be Read Extraction. Votes are themselves a good window into each player's beliefs and status at a given moment in a given Day, and in some contexts they might even be the best window. But they're a limited window - at their best they only represent a player's *biggest* scumread but more often they occur as part of a negotiation between perceived "viable" Day outcomes. Pairing our vote dataset with a reads dataset could enable analysis of how people [pretend to] form and act on their beliefs throughout mafia games that's far more extensive and robust than what might be achieved from studying votes alone.

Skeleton

How do we do that, though? Here's a broad skeleton for a potential Read Extraction tool:

Player Identification. We need to reliably discern when someone's talking about another player. My VoteCounter already does this to infer the targets of people's votes when the target's exact username isn't mentioned. It already successfully negotiates the ambiguity in abbreviations, acronyms, misspellings and other issues. We'll need more than this - for example, we'll need to tell when players are referenced w/ quotes or replies, and will also need to apply coreference resolution tools to infer who people are talking about when they use pronouns like "He".
Claim extraction. We'll leverage the best available OpenIE pipeline for extracting people's statements about identified players and converting them into simple, machine-readable representations.
Read inference. Finally, we'll have to infer from a person's claims about a player what their professed read about that player is. There are a lot of ways to go about doing this that each have their own precision/recall tradeoffs. Sentiment analysis won't be sufficient - people can have negative attitudes about townreads, and positive attitudes about scumreads. We do have the option to focus on explicit read announcements ("Psyche is scum!"), but we'd miss a lot of reads. It's the toughest part of all this.
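
For the player identification step, a rough sketch of resolving a loose mention to a username with Python's standard `difflib`. This is an assumption about one way to do it, not how the VoteCounter actually works; `resolve_player` and the 0.6 cutoff are hypothetical, and quotes, replies and pronouns are out of scope here.

```python
import difflib
from typing import Iterable, Optional


def resolve_player(mention: str, players: Iterable[str], cutoff: float = 0.6) -> Optional[str]:
    """Map a loose mention (abbreviation, misspelling) to one username, or None."""
    needle = mention.strip().lower()
    if not needle:
        return None
    by_lower = {p.lower(): p for p in players}
    if needle in by_lower:  # exact match, ignoring case
        return by_lower[needle]
    # Abbreviations: accept a prefix only if it points at exactly one player.
    prefixed = [p for low, p in by_lower.items() if low.startswith(needle)]
    if len(prefixed) == 1:
        return prefixed[0]
    # Misspellings: fall back to the closest username above the cutoff.
    close = difflib.get_close_matches(needle, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[close[0]] if close else None
```

Returning None for ambiguous or unknown mentions trades recall for precision, so unresolved mentions can be counted and inspected instead of silently guessed.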

No part of this pipeline is likely to achieve perfect performance: players can be identified w/ terms that have nothing to do with their usernames and that can't be inferred from surrounding text. Even the best OpenIE tool I'm familiar with has a substantial error rate. Even with perfect OpenIE, sentences expressing claims aren't the only way people can convey reads: for example, people very often just post ordered lists. And there are countless ways to call someone scummy, and many of these ways are indistinguishable from other kinds of negative opinions - or even jokes/compliments.

I think I would prospectively focus first on extracting and building a dataset of machine-readable representations of as-close-as-possible-to-*all* mentions of and/or claims about players by players throughout every game I can get good voting data for. That's a substantive problem in and of itself, and the dataset would be interesting in its own right too.

From there, we explore the challenge of classifying extracted claims, or maybe we choose a unique classification scheme to suit each of whichever research questions we decide are interesting.

Validation

The thing that's made this votecount dataset effort viable was finding a way to get quick feedback on design/implementation decisions. From there I could just incrementally improve my pipeline's success rate, making the project far more manageable. Is something like that possible in this domain?

This is where focusing on a close extension of my VCA project might be relevant. We expect votes to be a partial extension of a person's reads, so we can try to validate a read extraction pipeline against a player's votes. Using our existing vote dataset, we'll track when a vote seems to conflict or occur in accordance w/ the voter's detected attitude about the target. When the time comes to evaluate performance, we can produce discordance and concordance rates, or inspect particular examples of discordance to inform further development. We can potentially exclude votes that occur near the end of a game's Day to avoid the intrusion of social/viability concerns to sharpen the analysis.

There's basically zero possibility of a 100% concordance rate (people don't only vote their professed top scumreads!), and even one wouldn't 100% confirm the quality of the pipeline. But it's at least a strategy for highlighting potential issues in my pipeline and driving improvements.
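
A minimal sketch of the concordance check, under strong simplifying assumptions: each extracted read is a single static label ("scum" or "town") per (voter, target) pair, and late-Day votes are dropped with a plain post-number cutoff. `concordance_rates` and its data layout are hypothetical.

```python
from typing import Dict, Iterable, Optional, Tuple


def concordance_rates(
    votes: Iterable[Tuple[int, str, str]],   # (post number, voter, target)
    reads: Dict[Tuple[str, str], str],       # (voter, target) -> "scum" or "town"
    cutoff_post: Optional[int] = None,       # skip votes at or after this post
) -> Dict[str, Optional[float]]:
    """Count votes that agree or conflict with the voter's detected read."""
    counts = {"concordant": 0, "discordant": 0, "no_read": 0}
    for post, voter, target in votes:
        if cutoff_post is not None and post >= cutoff_post:
            continue  # late votes are more likely about viability than reads
        read = reads.get((voter, target))
        if read == "scum":
            counts["concordant"] += 1
        elif read == "town":
            counts["discordant"] += 1
        else:
            counts["no_read"] += 1
    judged = counts["concordant"] + counts["discordant"]
    rate = counts["concordant"] / judged if judged else None
    return {**counts, "concordance_rate": rate}
```

Votes with no detected read are counted separately rather than as discordant, since a missing read says more about extraction recall than about the voter.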

Final Note

The most substantive challenge I've seen to efforts like the Great VCA is that it's fruitless if one doesn't take the *context* of votes into account. Supplementing our votes dataset with a reads dataset like the one I'm proposing seems a serious way to start doing exactly that. There's definitely other information in people's posts beyond indications of their attitudes about everyone else, but we're starting somewhere.

Psyche · Post Post #91 (ISO) » Wed Feb 12, 2020 8:38 pm

ive never been so proud of a post guys :(

popsofctown · Post Post #92 (ISO) » Thu Feb 13, 2020 5:17 am

It's a good thing you're planning out.

I'm skeptical that the inaccuracies from the limitations of the IE will be systematic enough for the data to be anything besides just good good stuff.

Krazy · Post Post #93 (ISO) » Thu Feb 13, 2020 5:20 am

Might need to separate jester/bastard games?

Krazy · Post Post #94 (ISO) » Thu Feb 13, 2020 5:21 am

although maybe how often jesters claim jester would be a funny addition albeit much more limited pool

Psyche · Post Post #95 (ISO) » Sun Feb 16, 2020 12:05 pm

votecounter now at peak performance on D1
there are about 5 games out of my initial 300 game sample that it just gets wrong (ugh!), but that's still a >98% success rate.
i have to see if that kind of performance extends to other days and other games, too, though

what i'll ultimately do is filter out games i can't get good results on according to my validation scheme
still, fixing as many of the errors i catch as possible is important for minimizing the errors i don't catch

big remaining to-dos:
- take power role interactions into account when processing votes and predicting lynches (especially doublevoters and dayvigs and N0s)
- extend votecountertest to predict phase transitions and test the dataset beyond D1
- add more games beyond my original 300, probably focusing on newer games and everything bob collected. god i wonder where tamuz's newbie dataset is. i know it's somewhere...
- fix all the problems in my dataset or votecounter uncovered from the above expansions
- save and share the whole dataset
- first gamut of analyses

and i gotta do it all before month's end