The Great Vote Count Analysis (Pre-Discussion)

Psyche · Post Post #0 (isolation #0) » Mon Jan 27, 2020 7:47 am

Let's say hypothetically that I have my hands on a dataset tracking every vote anyone's ever made across hundreds or potentially even thousands of completed non-theme games. I've got each vote's post number, target, and maker, whatever. I even know which Day they happened. And for each game in the data set, I also hypothetically have all the stuff you usually see in a mod's OP - stuff like who won, each slot's role/alliance/fate, replacements, etcetera. I also of course have every one of each game's posts.

What are some useful or interesting questions that I can ask of this data? I know that people examine vote counts a lot to figure out a game - what are some ways we can ground these sorts of analyses in evidence?

An organized list of exploratory questions posed as of the linked post

Psyche · Post Post #6 (isolation #1) » Mon Jan 27, 2020 6:00 pm

can you elaborate on what that means

Psyche · Post Post #11 (isolation #2) » Tue Jan 28, 2020 8:24 am

well that's what the posts are for

Psyche · Post Post #16 (isolation #3) » Tue Jan 28, 2020 11:56 am

we really don't just have the votes, though
i've been learning a lot of text analysis techniques from my research; we can design analyses that are sensitive to context if we come up with the right questions

(also 50% is pretty good)

Psyche · Post Post #18 (isolation #4) » Tue Jan 28, 2020 10:17 pm

Relative to a base rate of like .28 for a 2/7 setup, for example, .5 would be great from just looking at votes.

But I don't really think collecting a bunch of statistics comparing town and scum rates of some behavior in a given situation is going to help anyone find scum in a real game. Even if one of these comparisons found some substantial factional difference on average, I dunno how that information would be applied in particular situations.

Think I will do these exploratory analyses, but am hoping I can try building and testing models of voting behavior that apply across situations. Maybe I'll start with a naive model that assumes every player randomly votes for whomever they don't know is in their faction, see where the data diverges from that, and go from there.
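A rough sketch of what that naive baseline could look like, just to make the idea concrete. Everything here is an assumption for illustration: the function names, the 13-player/3-scum setup, and the reading that town treat everyone but themselves as a possible target while scum only vote for non-partners.

```python
import random

def naive_vote(voter, players, scum):
    """Pick a uniformly random target the voter doesn't know to be a factionmate."""
    if voter in scum:
        # Assumption: scum know their partners, so they only ever vote town.
        candidates = [p for p in players if p not in scum]
    else:
        # Assumption: town know only their own alignment, so anyone else is fair game.
        candidates = [p for p in players if p != voter]
    return random.choice(candidates)

def expected_scum_vote_rate(n_players, n_scum, voter_is_scum):
    """Closed-form chance that a single naive vote lands on scum."""
    if voter_is_scum:
        return 0.0
    return n_scum / (n_players - 1)

def simulate_town_hit_rate(n_players, n_scum, trials=20000, seed=0):
    """Monte Carlo estimate of how often a naive town vote lands on scum."""
    random.seed(seed)
    players = list(range(n_players))
    hits = 0
    for _ in range(trials):
        scum = set(random.sample(players, n_scum))
        town = [p for p in players if p not in scum]
        voter = random.choice(town)
        hits += naive_vote(voter, players, scum) in scum
    return hits / trials
```

Comparing real per-faction rates against numbers like these would show where the data diverges from purely random voting.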

Psyche · Post Post #21 (isolation #5) » Wed Jan 29, 2020 12:44 am

wanna do an avatar bet about it? a one month avatar if in one month's time i dont publish a dataset of several hundred games, half the proposed analyses posted so far, and some basic modeling?

Psyche · Post Post #26 (isolation #6) » Thu Jan 30, 2020 6:06 am

ive been obsessed w the modeling question you have no idea how many shitty wallposts ive drafted about it
gotta get the data first though right

Psyche · Post Post #27 (isolation #7) » Thu Jan 30, 2020 7:18 am

Is there a possibility of some EV model at the resolution of individual votes? We compute for each post the current EV of voting each player given the gamestate and the posting player's knowledge, win condition? It seems to me that if you don't include some model of town scumhunting and scum trying to defeat their scumhunting (which either vastly complicates things or comes out to a wash or both), then it'd just end up w/ town voting mostly randomly and scum voting for the biggest existing wagon on town.

It'd be really convenient for town if this had any ground to it, as it asserts observable differences between factional voting patterns even before roles get flipped - scum almost always jumping to the largest wagon around while town don't go out of their way to sheep at all. If we find that this model doesn't track scum voting patterns well at all, then it becomes way more important to consider their strategies for appearing town, or alternatively to suppose that maybe they have some other voting strategy unconcerned w/ deceiving town. The Charisma model discussed below, for example, asserts that scum will prefer to support the most "charismatic" townie's mislynch instead of whichever mislynch seems closest.
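One way the wagon-jumping prediction could be checked against vote data, as a minimal sketch. The input shapes (an ordered list of (voter, target) pairs for one Day plus an alignment lookup) and the function names are hypothetical, unvotes are ignored, and the "largest wagon" here is the largest one on anybody, since that's what is observable before flips.

```python
from collections import Counter

def largest_wagon(standing_votes):
    """Return the most-voted player, or None if nobody is being voted."""
    counts = Counter(standing_votes.values())
    if not counts:
        return None
    top = max(counts.values())
    # Break ties by name so the result is deterministic.
    return min(player for player, n in counts.items() if n == top)

def wagon_join_rate(vote_log, alignment):
    """Share of each faction's votes that landed on the largest wagon at the time.

    vote_log: ordered (voter, target) pairs for a single Day.
    alignment: dict mapping each voter to "town" or "scum".
    """
    current = {}
    joined = Counter()
    total = Counter()
    for voter, target in vote_log:
        # Ignore the voter's own standing vote when sizing up the wagons.
        standing = {v: t for v, t in current.items() if v != voter}
        biggest = largest_wagon(standing)
        if biggest is not None:
            total[alignment[voter]] += 1
            if target == biggest:
                joined[alignment[voter]] += 1
        current[voter] = target
    return {faction: joined[faction] / total[faction] for faction in total}
```

If the simple model held, the scum rate would sit near 1 and the town rate near whatever random voting gives.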

As another approach, consider RC/MathDino's Charisma Model of Mafia.

My assumptions:
Gamestates result from predetermined charisma levels, where charisma is defined as "ability to avoid being lynched".
Town will always attempt to lynch the least charismatic (scummiest) player. Town will never no-lynch.
Scum will always attempt to kill the most charismatic (towniest) town player.
PRs will claim at L-1. If uncc'd, they become most charismatic.
Scum will fakeclaim confirmable roles at L-1 in order to out TPRs, and will be lynched if counterclaimed.
Scum are aware of the charisma list, and will counterclaim PRs if they are more charismatic and if doing so would not lose them the game.
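A stripped-down simulation of just the first three assumptions can be sketched like this. It leaves out PR claims, fakeclaims, and counterclaims entirely, assumes every player has a distinct charisma score, and uses the usual "scum win at parity" ending, none of which is spelled out in the list above. Names are made up for illustration.

```python
def winner_if_any(alive, scum):
    """Return "town" or "scum" if the game is decided, otherwise None."""
    scum_alive = len(alive & scum)
    town_alive = len(alive) - scum_alive
    if scum_alive == 0:
        return "town"
    if scum_alive >= town_alive:
        # Assumption: scum win once they reach parity with town.
        return "scum"
    return None

def play_charisma_game(charisma, scum):
    """Play out a game where charisma alone decides every lynch and kill.

    charisma: dict mapping player -> score (assumed distinct).
    scum: set of players who are scum.
    """
    alive = set(charisma)
    while True:
        # Day: the least charismatic living player is lynched.
        alive.remove(min(alive, key=charisma.get))
        result = winner_if_any(alive, scum)
        if result:
            return result
        # Night: scum kill the most charismatic living town player.
        town_alive = [p for p in alive if p not in scum]
        alive.remove(max(town_alive, key=charisma.get))
        result = winner_if_any(alive, scum)
        if result:
            return result
```

With this version the outcome is fully determined by where scum sit in the charisma ordering, which is the kind of prediction a dataset could be used to test.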

It makes predictions about how people vote over the long run, so our hypothetical dataset can conceivably evaluate this model's gains over more simplistic EV calculations (ones that assume random voting/lynching) when it comes to predicting how games really turn out. We have to come up with some way to quantify/infer ability (or perceived ability) to avoid being lynched - perhaps we look at each player's careerlong success, perhaps we try to discover what kind of posting is associated w/ success at lynch avoidance, or something else. Each metric is its own model, and we can evaluate each one.

I'm sure both these models are too simple to not be pretty shit, but they'll be a nice start maybe.

ok now it's out of my system for a while i hope pls lord

Psyche · Post Post #29 (isolation #8) » Thu Jan 30, 2020 11:01 pm

imagine ill just convert a link to #20 into an image or something

Psyche · Post Post #32 (isolation #9) » Sat Feb 01, 2020 2:36 pm

I'd definitely like to work out some performance metrics from all this.

(Should note it does require data for most completed games, though, rather than just a well-sized sample.)

Psyche · Post Post #33 (isolation #10) » Sat Feb 01, 2020 3:20 pm

https://wiki.mafiascum.net/index.php?title=Scumputer

we'll test this it'll be great

Psyche · Post Post #34 (isolation #11) » Sat Feb 08, 2020 4:06 pm

Found a way to port markdown over to bbcode, so a lot easier to produce simply formatted posts. Tried to organize and reflect on the proposed exploratory analyses so far.

Categories of Proposed Exploratory Analyses

Here, exploratory means there's not a particular model in the foreground driving the question. Instead, we're just producing statistics over the data set.

Doable With Initial Dataset

% of scum self-votes to town self-votes
% of scum first vote busses vs scum first vote town votes
a version of the above that excludes self-hammers
% of scum L-1s that are not hammered
% of scum hammers on town
% of scum hammers on scum
% of town hammers on town
% of town hammers on scum
how often scum bus (2x)
how often there's more than one scum on a wagon (2x)
how often scum vote right next to each other (2x)
average frequency of town posts
average frequency of scum posts
average frequency of town votes
average frequency of scum votes
how often do scum vote someone, vote somewhere else, then return to the original vote, and how often town do it in contrast
what % of chronological time during a day phase someone was voting no one?
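As an example of how simple most of these are once the votes are extracted, here's a minimal sketch of the four hammer percentages. It assumes hammers have already been identified and reduced to (hammerer alignment, lynched alignment) pairs; that input shape and the function name are hypothetical.

```python
from collections import Counter

def hammer_breakdown(hammers):
    """For each hammering faction, the share of its hammers landing on each faction.

    hammers: list of (hammerer_alignment, lynched_alignment) pairs.
    Returns a dict keyed by the same pairs, with values between 0 and 1.
    """
    counts = Counter(hammers)
    totals = Counter()
    for (hammerer, _), n in counts.items():
        totals[hammerer] += n
    # Normalize within each hammering faction so its shares sum to 1.
    return {pair: n / totals[pair[0]] for pair, n in counts.items()}
```

The self-vote, bussing, and frequency questions would follow the same pattern: tag each event with alignments, count, and normalize within faction.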

Potentially Sketchy Without Full Dataset

For each player X who appears anywhere in the set of games, find out...

What % of lynches with their vote on them as town flip scum
What % of lynches with their vote on them as town flip town
What % of lynches with their vote on them as scum flip scum
What % of lynches with their vote on them as scum flip town
"have i ever voted correctly"
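A sketch of the per-player tally, assuming one row per lynch that the player's vote was on, with the player's own alignment and the flip. The row format and function name are hypothetical.

```python
from collections import defaultdict

def lynch_vote_record(rows):
    """Per player and own alignment, the share of lynches they were on that flipped each way.

    rows: iterable of (player, own_alignment, flipped_alignment) tuples,
    one per lynch that had the player's vote on it.
    """
    counts = defaultdict(lambda: defaultdict(lambda: {"town": 0, "scum": 0}))
    for player, own, flip in rows:
        counts[player][own][flip] += 1
    result = {}
    for player, by_alignment in counts.items():
        result[player] = {}
        for own, flips in by_alignment.items():
            total = flips["town"] + flips["scum"]
            result[player][own] = {k: v / total for k, v in flips.items()}
    return result
```

With only a sample of games most players will have a handful of rows each, so the shares will be noisy, which is the sketchiness the heading warns about.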

Reflections

I'd had a category for analyses that might be technically challenging, but no one really proposed any like that just yet that wasn't just modeling. We may want to rework some of these questions to appreciate complexities in the data - for example, for some questions we might want to control for the influence of PRs or other sources of information besides player intuitions that might drive voting patterns.

The analyses above should be a great foundation for work on two kinds of substantive questions.

Most questions will help explore whether VCA in the systematic sense of the word is possible - whether, without even considering context much, interpretable rules can be applied to public voting patterns to help discriminate town from scum. Finding any alignment-discriminative results in these would be pretty surprising and a big deal, but even then just a first step toward the holy grail.
The output of RC's(/Schadd's) line of requests is useful for exploring whether mafia is more a game of skill or of chance. We discussed that prospect a bit in site chat; this NYTimes article by Daniel Kahneman is super relevant.

I think though that my small dataset of a few (or even several) hundred games won't be sufficient to satisfactorily address the latter question. So I'll probably focus on the other one for this month's deadline.

Psyche · Post Post #37 (isolation #12) » Sun Feb 09, 2020 8:03 pm

In post 35, DrDolittle wrote:the issue with studying these questions is that the system is fragile. I.e. if you find that with p = 0.01 scum votes right next to each other, the very nature of discovering this result will change how people play, and make the result irrelevant (or will quickly equilibriate to having no predictive power). That's also why (my hypothesis) that Elli's computer codes are kept hidden and he rarely if ever discusses findings. Either that or he throws everything into a huge Neural Network and doesn't ask why something happens but just gives an outcome as probability a player is mafia.

on one hand i dont really mind if discovering something changes how people play per se and in fact will be interested in studying if gameplay substantially changes when new stats are posted to this subforum.

and on the other im interested in bigger game than particular statistics

Psyche · Post Post #38 (isolation #13) » Sat Feb 22, 2020 5:54 am

have to admit
it's gonna be a tough final week

Psyche · Post Post #41 (isolation #14) » Sun Feb 23, 2020 9:38 am

professor just delayed an assignment deadline to after spring break
get fked n

In post 40, Ankamius wrote:I'm most interested in how scum lynch timings effect winrate statistically but that seems outside the scope of this(?)

As in how often town wins with the first scum lynch at full plist, 80%, etc.

if it can be done w the dataset, it's game
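For instance, a minimal sketch of that winrate-by-timing breakdown, assuming each game is summarized as a dict with the hypothetical keys `n_players`, `alive_at_first_scum_lynch` (None if scum were never lynched), and `town_won`:

```python
from collections import defaultdict

def winrate_by_first_scum_lynch(games, n_buckets=5):
    """Town winrate grouped by how full the playerlist was at the first scum lynch.

    Buckets are labeled by their upper edge, e.g. 1.0 for a full playerlist,
    0.8 for anything above 60% and up to 80%. Games with no scum lynch at
    all go in a separate "never" bucket.
    """
    tallies = defaultdict(lambda: [0, 0])  # bucket -> [town wins, games]
    for game in games:
        alive = game["alive_at_first_scum_lynch"]
        if alive is None:
            bucket = "never"
        else:
            # Integer ceiling division avoids float rounding at bucket edges.
            step = -(-alive * n_buckets // game["n_players"])
            bucket = step / n_buckets
        tallies[bucket][0] += game["town_won"]
        tallies[bucket][1] += 1
    return {bucket: wins / total for bucket, (wins, total) in tallies.items()}
```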

Psyche · Post Post #43 (isolation #15) » Tue Feb 25, 2020 7:58 am

In post 35, DrDolittle wrote:the issue with studying these questions is that the system is fragile. I.e. if you find that with p = 0.01 scum votes right next to each other, the very nature of discovering this result will change how people play, and make the result irrelevant (or will quickly equilibriate to having no predictive power). That's also why (my hypothesis) that Elli's computer codes are kept hidden and he rarely if ever discusses findings. Either that or he throws everything into a huge Neural Network and doesn't ask why something happens but just gives an outcome as probability a player is mafia.

https://en.wikipedia.org/wiki/Goodhart%27s_law

let's test it; let's test goodhart's law

Psyche · Post Post #44 (isolation #16) » Tue Feb 25, 2020 11:41 am

To win my avatar bet, the analyses I'm gonna do before month's end are:

% of scum self-votes to town self-votes
% of scum hammers on town
% of scum hammers on scum
% of town hammers on town
% of town hammers on scum
average frequency of town posts
average frequency of scum posts
average frequency of town votes
average frequency of scum votes
how often scum bus

i pick these because they're especially easy to implement given my codebase. more complete/interesting analyses will come later.

Psyche · Post Post #45 (isolation #17) » Fri Feb 28, 2020 6:39 pm

https://status.shadow.tech/

i cant get to my computer for the final leg :(

Psyche · Post Post #48 (isolation #18) » Sat Feb 29, 2020 10:45 pm

1 more day double or nothing?

Psyche · Post Post #51 (isolation #19) » Sun Mar 01, 2020 12:25 am

qq i was gonna use that pagetop

Psyche · Post Post #53 (isolation #20) » Sun Mar 01, 2020 10:17 am

just a few last bugs i want to fix in the votecounter before i release things
am hopeful that it's tonight

Psyche · Post Post #54 (isolation #21) » Sun Mar 01, 2020 4:34 pm

i found more to do while working on the things to do and love my life

Psyche · Post Post #56 (isolation #22) » Sun Mar 01, 2020 8:29 pm

anyway looks like it's still at least one or two weekends away sorry guys

Psyche · Post Post #58 (isolation #23) » Thu Mar 05, 2020 8:20 pm

yeah ok

Psyche · Post Post #60 (isolation #24) » Sun May 03, 2020 4:19 pm

there is no release except death

Psyche · Post Post #64 (isolation #25) » Mon May 04, 2020 12:59 am

the coronavirus stuff put a lot of new work on my plate and i resolved to get ahead on that again before returning to this

but there was a broader problem too and it was that i was starting to do a lot of manual coding/review for a project focused on automating an otherwise overwhelmingly tedious/time-consuming workflow. i think i'll dig in again when i find a way over that hump.

Psyche · Post Post #85 (isolation #26) » Fri May 22, 2020 4:45 am

hmm
the semester is over today
so maybe it's good timing

Psyche · Post Post #87 (isolation #27) » Fri May 22, 2020 6:11 am

yknow i asked that question before and i only pretended to understand the answer

Psyche · Post Post #94 (isolation #28) » Fri May 22, 2020 2:47 pm

I can't reboot this weekend but I'll be putting this back on my radar

Psyche · Post Post #100 (isolation #29) » Thu Jun 18, 2020 5:57 pm

oh off and on

Psyche · Post Post #102 (isolation #30) » Mon Jun 22, 2020 2:48 pm

i can't tell yall how much i wish i had time for this

Psyche · Post Post #110 (isolation #31) » Sat Aug 22, 2020 6:07 pm

hope i get the opportunity to replicate this stuff

Psyche · Post Post #112 (isolation #32) » Thu Aug 27, 2020 6:23 am

stalled when the pandemic put tons of new work on my shoulders but
i might be free again in a couple of days

Psyche · Post Post #113 (isolation #33) » Fri Aug 28, 2020 12:20 pm

Starting this back again tomorrow. Gonna go for a timeboxing approach instead of picking specific dates to hit milestones. First step is just reminding myself what I have and what needs to be done, maybe revising some approaches based on strategies/techniques learned over the past few months.

Psyche · Post Post #115 (isolation #34) » Sun Aug 30, 2020 3:15 am

So I've run through the code and reviewed my notes.

My codebase is clearly enough written that I can remember what each part of it was supposed to do, but it doesn't look like I was careful when documenting new problems/gaps in the actual implementation. They're listed in my issue tracker, but descriptions are pretty terse: a given issue might get a short summary sentence and a couple of example outputs proving something's wrong, and that's it. These notes are better than nothing, but the result is that I don't really know where in the 13 currently listed issues I should start, or remember what my approach would've been for addressing these issues. I'll probably have to retrace my steps through the project all over again before I can make changes confidently again. I'll come up with some better practices to avoid this work in the future.

I'll work on this again on Tuesday and then again on Friday or Saturday. Will maybe avoid updates until I, like, have actual new stuff to report.

Psyche · Post Post #117 (isolation #35) » Sat Oct 17, 2020 6:37 am

Okay, here's the plan for getting this project back into gear.

The idea is to follow yessiree's advice: scale down and focus my ambitions.

I've spent a lot of time trying to get my automatic votecounter to perform perfectly on every game in my development dataset. I'll accept that its performance now is probably close to the ceiling possible for my particular approach to the problem, and stop substantial efforts to improve the automatic votecounter.

Even though the votecounter doesn't perform perfectly, I do have a solid way to tell if the votecounter has done a good job: I can check if extracted votes accurately predict 1) who if anyone has been lynched in a given game Day and 2) the post number at which a given game Day has ended. Predicting both of these accurately doesn't guarantee that the votecounter has done its job perfectly, but it comes pretty close (I'll try to back this claim up with numbers at some point). So I can validate voting data collected for a particular game, even if I can't develop a votecounter that codes every game perfectly.
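The two-part check described above could be sketched roughly like this. The dict keys, function names, and the `post_tolerance` knob are all assumptions for illustration, not the actual votecounter code.

```python
def validate_day(predicted, actual, post_tolerance=0):
    """Check one Day: same lynch (or no lynch) and same ending post number.

    predicted/actual: dicts with "lynched" (player name or None) and
    "end_post" (int). post_tolerance is a hypothetical allowance for
    off-by-a-few differences in where the Day is considered to end.
    """
    same_lynch = predicted["lynched"] == actual["lynched"]
    same_end = abs(predicted["end_post"] - actual["end_post"]) <= post_tolerance
    return same_lynch and same_end

def validate_game(predicted_days, actual_days, post_tolerance=0):
    """A game's voting data counts as valid only if every Day checks out."""
    if len(predicted_days) != len(actual_days):
        return False
    return all(
        validate_day(p, a, post_tolerance)
        for p, a in zip(predicted_days, actual_days)
    )
```

Games that fail the check just get left out of the dataset until the votecounter can handle them.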

We never needed a perfect or near-perfect votecounter for this project. We just needed quality voting data collected over a large sample of games. So far I've been only using ~300 relatively old games to develop my votecounter. I'll collect/preprocess and run my votecounter over data associated with most or all completed games on MS instead. The votecounter will generate valid data for most of those. And if the pool of processed games is big enough, we'll have an adequately sized, validated dataset to support further analyses. And we'll always have the option to add data for further games if new games finish, or if we work out the issues with the votecounter's performance on the games where it has trouble.

I've always thought that the votecounting bit has been the hard part of this project. But this way we can make it a lot easier and get to the fun part a lot faster. And all I have to do is get over myself.