Browser Extension Vote-Counter (early demo)

Psyche · Post Post #1 (isolation #0) » Sun Sep 03, 2023 9:56 am

Cool. Do you think a page topper would be hard to add as an additional feature?

Psyche · Post Post #20 (isolation #1) » Sun Sep 17, 2023 10:14 am

How much dev or validation do you think the vote tabulator itself might need? Can you remind me of the basic strategy it implements?

Psyche · Post Post #26 (isolation #2) » Mon Sep 18, 2023 6:09 am

ok great!
i think i want to contribute along those lines if you're open, as i've done a lot of the legwork already in my various abandoned projects.
i'm willing to make sure it's in the right language and style too.
i think all i'd need to set things up is the initial implementation for the part of the code that parses votes
i'd implement a measure of parsing performance based on ability to predict lim outcomes across past games, setting up an efficient way to consider alternative parsers

Psyche · Post Post #29 (isolation #3) » Mon Sep 18, 2023 6:48 am

thanks. will be somewhat slow-going bc of what i'll have to learn but i'm legit happy for the excuse

Psyche · Post Post #46 (isolation #4) » Sun Sep 24, 2023 4:43 pm

The broad idea behind the validation framework is to bootstrap evaluation of votecounters using info recorded in the site's various game archives.
The game archives track, for a lot of games, who was eliminated each day, and eliminations (usually) depend on votes.
Therefore a good votecounter, given each thread mentioned in the archive and a post number initiating each day, should be able to reliably predict who was eliminated those days.
Not necessarily all of them -- mods can make weird calls or flat-out mistakes, PRs and other shenanigans can mess with hammer conditions, and aliases (like alt or irl names) might have nothing to do with a slot's username.
But still, most of them.

So what I did was take archive data, scrape applicable games, and set my votecounter to predict D1 outcomes across all of them.
Wherever there were errors, I checked them out. I ignored errors that could not be addressed with a fix to my votecounter, and tabulated -- and eventually fixed -- errors that potentially could.
Once errors were addressed enough for D1 outcomes, I similarly generated predictions for successive days.
Frequently, I needed to manually annotate D2+ post number start positions when long twilights messed with votecounting, but this also helped identify cases where the votecounter accurately predicted a lim, but thought hammer happened earlier or later than it really did.
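
To make that evaluation loop concrete, here is a minimal sketch in Python. Everything named here (`ArchivedDay`, `evaluate`, the `predict` callable, the thread IDs and player names) is a hypothetical stand-in rather than the actual code, and it leaves out the scraping and the manual day-start annotations mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ArchivedDay:
    """One day phase as recorded in a game archive (hypothetical schema)."""
    thread_id: str
    day: int
    start_post: int            # post number that opens the day
    eliminated: Optional[str]  # None when the archive records no elimination


# A predictor wraps a votecounter: given a day, it returns who it thinks was eliminated.
Predictor = Callable[[ArchivedDay], Optional[str]]
Miss = Tuple[ArchivedDay, Optional[str]]


def evaluate(predict: Predictor, days: List[ArchivedDay]) -> Tuple[float, List[Miss]]:
    """Return accuracy plus the (day, wrong guess) pairs that need a manual look."""
    misses: List[Miss] = []
    for day in days:
        guess = predict(day)
        if guess != day.eliminated:
            misses.append((day, guess))
    accuracy = 1 - len(misses) / len(days) if days else 0.0
    return accuracy, misses


# Toy archive and a canned "votecounter", purely for illustration.
archive = [
    ArchivedDay("game-a", 1, 0, "alice"),
    ArchivedDay("game-a", 2, 410, "bob"),
    ArchivedDay("game-b", 1, 0, "carol"),
    ArchivedDay("game-b", 2, 380, None),
]
canned = {("game-a", 1): "alice", ("game-a", 2): "bob", ("game-b", 1): "dave"}
accuracy, misses = evaluate(lambda d: canned.get((d.thread_id, d.day)), archive)
```

The `misses` list is the useful output: each entry is a candidate for review, to be sorted into "the votecounter could be fixed to get this right" versus "mod call, alias, or other shenanigans".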

So this method wasn't a perfect way of testing and identifying gaps in the votecounter, but it saved a lot of time compared to the hypothetical alternative of manually coding the target of every vote across games on the site and using those as test cases. It also remains usable for new votecounter implementations that might be introduced in the future.

The votecounter was a grab-bag of different techniques. The validation technique confirms it works well, but because of the feedback-driven process behind how I made it, it doesn't have some basic principle behind it.

Psyche · Post Post #59 (isolation #5) » Tue Oct 10, 2023 12:24 pm

I think I'll be able to work on validation stuff this weekend. It'll probably start by rebooting my "Great VCA" effort. I imagine I should re-scrape some things to check how much the site update broke my codebase.

Psyche · Post Post #61 (isolation #6) » Sat Oct 14, 2023 1:11 pm

aw what happened

Psyche · Post Post #63 (isolation #7) » Tue Nov 07, 2023 4:11 am

today's the day I actually get started lol

Psyche · Post Post #90 (isolation #8) » Sun Jan 07, 2024 6:25 pm

i can imagine the potential issues but feel like the ideal would be something close to what mods already include in their OPs - or, say, something suitable for said inclusion. maybe can use the archive threads as a reference if it’s worth the trouble; they include most of these details in a structured but plain languagy format except alias.

Psyche · Post Post #93 (isolation #9) » Mon Jan 15, 2024 5:39 pm

i've gotten pretty handy at javascript/typescript now, so that's one barrier to entry gone for me to actually do things
but yknow, i'm probably just blowing hot air

Psyche · Post Post #102 (isolation #10) » Mon Feb 12, 2024 9:15 pm

am sure to drift in and out but would like to fit in some time to start contributing and helping keep momentum. would also help me build my comfort w/ typescript and get ideas for my other projects.
first step is probably checking if i can set up the dev environment by following your instructions
in the meantime do you think there are any "good first issues" you could add to the repo that could get my feet wet w/o disrupting your other plans?

Psyche · Post Post #116 (isolation #11) » Wed Feb 21, 2024 11:48 pm

i am vry confused
maybe i stick to pythonland after all

Psyche · Post Post #125 (isolation #12) » Wed Apr 03, 2024 8:56 am

ooh ahh

Psyche · Post Post #130 (isolation #13) » Thu Apr 11, 2024 8:16 pm

i have a suggestion!
a vca tool

Psyche · Post Post #132 (isolation #14) » Thu Apr 11, 2024 9:50 pm

well i actually think it's a natural extension of the votecounter. mvp is presumably just generating more than one votecount and assigning font colors to names

Psyche · Post Post #135 (isolation #15) » Thu Apr 11, 2024 10:14 pm

yeah its a very interesting question

Psyche · Post Post #140 (isolation #16) » Sat Apr 13, 2024 10:47 am

feel like a tool that collects votes across a thread and puts them in one place is still the "showing players information about the current game state" space but yeah the frontier approaches

Psyche · Post Post #142 (isolation #17) » Sat Apr 13, 2024 5:06 pm

Another option is to try making the votecounter more powerful!

Have been gradually building a tight pool of test cases for votecounters based on examples from games I scraped years ago.

Wonder if your matcher can correctly match the vote in this post without manual aliasing:

Spoiler:

Here's code with the playerlist and HTML of the relevant fragment of the post:

Code:

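# VoteParser is the votecounter's parser class under test; import it from wherever it lives in your project.
def test_match_vote_by_abbreviation():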
    """
    Tests that a vote for a player by abbreviation is counted.

    Relevant phases:
    1098, D1: https://forum.mafiascum.net/viewtopic.php?p=2695031#p2695031
    """
    players = ['EmpTyger', 'Nul', 'Substrike22', 'Llamarble', 'Pinewolf', 'Amor', 'Scott Brosius', 'Internet Stranger', 'themanhimself', 'Guderian', 'RobCapone', 'Shattered Viewpoint', 'chkflip', 'WeirdRa', 'brokenscraps', 'Kingcheese']
    post_content = '<span class="bbvote" title="This is an official unvote.">UNVOTE: Amor</span><br><span class="bbvote" title="This is an official vote.">VOTE: TMHS</span>'
    votes = list(VoteParser(players).from_post(post_content))
    assert votes[-1] == 'themanhimself'

A simple distance metric seems to prefer to match TMHS to a different slot, such as Amor.
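
For a quick illustration of why, here is a small sketch using plain Levenshtein edit distance on lowercased names. The helper names (`levenshtein`, `closest_player`) are made up for this example, and this is only one possible "simple distance metric", not necessarily the one any particular votecounter uses.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def closest_player(vote_text: str, players: List[str]) -> str:
    """Pick the player whose lowercased name has the smallest edit distance to the vote."""
    return min(players, key=lambda p: levenshtein(vote_text.lower(), p.lower()))


players = ['EmpTyger', 'Nul', 'Substrike22', 'Llamarble', 'Pinewolf', 'Amor',
           'Scott Brosius', 'Internet Stranger', 'themanhimself', 'Guderian',
           'RobCapone', 'Shattered Viewpoint', 'chkflip', 'WeirdRa',
           'brokenscraps', 'Kingcheese']

print(closest_player('TMHS', players))          # Amor
print(levenshtein('tmhs', 'amor'))              # 3
print(levenshtein('tmhs', 'themanhimself'))     # 9
```

"TMHS" is only three substitutions away from "Amor", but nine insertions away from "themanhimself", so raw edit distance rewards short names over the name the abbreviation actually comes from.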

Psyche · Post Post #145 (isolation #18) » Sat Apr 13, 2024 5:51 pm

Right!

A key problem with this example is that TMHS's name doesn't have punctuation one could use as a shortcut to match the acronym. You need something that knows enough English to predict how people might carve a username like "themanhimself" into units for an acronym. I've found that there's just enough knowledge in a basic dictionary to pull this off most of the time.

I used a spell-checking library called enchant.
It initializes with a dictionary and returns a boolean based on whether your string appears in the dictionary.
A function I made called "english_divides" considered every possible substring of each player's username and checked whether the substring was an english word or not, and stored every possible segmentation of the player's username based on english-word boundaries.
This also allowed for nonsense strings to occur in a segmentation.
For example, a username like "pprtown" might have among its splits '["ppr", "town"]'.

I could use these segmentations to propose candidate abbreviations. For example, the above segmentation could suggest an abbreviation of "pt" or "ptown".

From there, I looked for exact or near matches among abbreviations generated from the most conservative segmentations (those with the fewest separate chunks), while also minding the possibility of ambiguous abbreviations. That could consistently assign a broad swath of votes like TMHS, and even reasonable typos I wasn't intending to cover.
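
A rough sketch of the segmentation-and-abbreviation idea follows. It is not the original implementation: a tiny hard-coded `WORDS` set stands in for the enchant dictionary so the example runs on its own, the bodies of `english_divides`, `abbreviations`, and `match_abbreviation` are guesses at one way to do it, and the "prefer the fewest chunks" ranking and near-match handling are left out. With pyenchant and an English dictionary installed, something like `enchant.Dict("en_US").check` could presumably be passed as `is_word` instead.

```python
from typing import Callable, List, Optional, Set

# Tiny stand-in vocabulary (assumption); a real run would use a dictionary lookup.
WORDS = {"the", "man", "him", "self", "himself", "town"}


def is_word(chunk: str) -> bool:
    return chunk in WORDS


def english_divides(name: str, is_word: Callable[[str], bool]) -> List[List[str]]:
    """All splits of `name` into dictionary words and leftover non-word runs.

    Two non-word runs are never placed side by side, which keeps the search small.
    """
    name = name.lower()
    results: List[List[str]] = []

    def walk(start: int, chunks: List[str], last_was_filler: bool) -> None:
        if start == len(name):
            results.append(chunks)
            return
        for end in range(start + 1, len(name) + 1):
            piece = name[start:end]
            if is_word(piece):
                walk(end, chunks + [piece], False)
            elif not last_was_filler:
                walk(end, chunks + [piece], True)

    walk(0, [], False)
    return results


def abbreviations(chunks: List[str]) -> Set[str]:
    """Initials of every chunk, plus a variant with the last chunk spelled out."""
    initials = "".join(c[0] for c in chunks)
    tail_spelled = "".join(c[0] for c in chunks[:-1]) + chunks[-1]
    return {initials, tail_spelled}


def match_abbreviation(vote: str, players: List[str],
                       is_word: Callable[[str], bool]) -> Optional[str]:
    """Return the single player whose name can abbreviate to `vote`, else None."""
    vote = vote.lower()
    hits = {
        player
        for player in players
        for chunks in english_divides(player, is_word)
        if len(chunks) > 1 and vote in abbreviations(chunks)
    }
    # Zero hits or several hits both mean "don't guess"; a caller could flag these.
    return next(iter(hits)) if len(hits) == 1 else None


players = ['EmpTyger', 'Nul', 'Substrike22', 'Llamarble', 'Pinewolf', 'Amor',
           'Scott Brosius', 'Internet Stranger', 'themanhimself', 'Guderian',
           'RobCapone', 'Shattered Viewpoint', 'chkflip', 'WeirdRa',
           'brokenscraps', 'Kingcheese']
print(match_abbreviation('TMHS', players, is_word))  # themanhimself
```

Returning `None` for ambiguous abbreviations is a deliberate choice in this sketch: it keeps the matcher from silently picking between two slots that could both shorten to the same string.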

All this seems just as doable in a typescript project -- if it's worth doing. Things do get messy pretty quickly when you try to account for edge cases.

Psyche · Post Post #171 (isolation #19) » Sun Apr 14, 2024 3:32 am

In post 148, yessiree wrote: For votes with ambiguous targets, I lean more towards flagging them to be checked manually rather than trying to parse the intended target using NLP. Reason one is that configuration (usernames, aliases and replacements) is needed for each game anyway, reason two is that if the string parsing algo is not 100% accurate, your dataset ends up with some incorrect data, and reason three - NLP is hard!

I imagine the process would be like
- configure basic settings (usernames, replacements, common aliases, etc) for games from which to collect data
- collect votes -> get a list of votes + a list of flagged votes
- go through the flagged votes one by one and add aliases as necessary
- repeat step 2 until you have no more flagged votes

a few counterpoints:

- I think this underestimates how much of a slog getting to a complete configuration and handling flagged votes really is -- especially over many games -- when common abbreviations like "NMHS" so easily leak out. IMO if the votecounter doesn't "just work", you're going to see / are seeing relatively slow adoption, particularly among more experienced moderators already happy with how they do things.

- The string parsing algorithm in JV's repo is already not 100% accurate. This is already the default mode we're in. You're probably not in favor of just removing it though, right?

- It's possible to have both flagging and a smart-but-inaccurate matching algorithm. For example, you can flag all votes that don't perfectly match defined aliases, but suggest a matching during user review that can be given a quick up/down rating and trigger stuff like an update to the alias pool. You can propose aliases during configuration to save users the trouble of brainstorming them themselves. And so on.

- NLP is hard, but also fun. And also it's not that hard. For example, checking strings against a dictionary is not hard. Applying a string distance function is not hard. (And frankly I think most of the hard work is already done anyway.)

- Solving this kind of problem opens a lot of downstream possibilities that from our current vantage point look infeasible. For example, most robust analyses of a game can't stop at looking at votes, and need to pay attention to all interactions between players. Right now, tackling that looks like a pipe dream! But mapping these interactions gets much more plausible with a robust solution to the username matching problem.

Once I'm done refactoring my current implementation and its tests I'll get more specific about how it could fit into a context like this where it's very important that no false matchings get into a generated votecount without an alert.
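
In the meantime, a minimal sketch of the "flag everything that isn't an exact alias match, but attach a suggestion" idea from the counterpoints above. The `Resolution` type, `resolve_vote`, and the alias table are all hypothetical, and `difflib.get_close_matches` from the Python standard library stands in for whatever smarter matcher would actually produce the suggestion.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Resolution:
    target: Optional[str]             # set only when the match is exact
    flagged: bool                     # True means a human should review this vote
    suggestion: Optional[str] = None  # best guess offered during review


def resolve_vote(text: str, aliases: Dict[str, str]) -> Resolution:
    """Accept exact alias matches; flag everything else, with a suggestion if one exists."""
    key = text.strip().lower()
    if key in aliases:
        return Resolution(target=aliases[key], flagged=False)
    close = difflib.get_close_matches(key, list(aliases), n=1, cutoff=0.6)
    suggestion = aliases[close[0]] if close else None
    return Resolution(target=None, flagged=True, suggestion=suggestion)


# Alias keys are lowercased; values are the slot's canonical username.
aliases = {"amor": "Amor", "themanhimself": "themanhimself", "tmhs": "themanhimself"}

exact = resolve_vote("TMHS", aliases)           # exact alias: counted without review
typo = resolve_vote("themanhimslef", aliases)   # not exact: flagged, with a suggestion
```

Nothing inexact ever reaches the votecount on its own here; the guess only shows up as something a reviewer can approve, at which point the approved string could be added to `aliases`.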

Psyche · Post Post #173 (isolation #20) » Wed Apr 17, 2024 7:13 am

Am thinking you make a good point. I think I'll still push for something powerful enough to pick up particularly common abbreviations like in the example I shared (mostly because I have something close and have a stronger need for something close), but otoh will avoid getting bogged down by rarer issues most easily handled by player-specific aliasing.

Even for research purposes that do require processing a very large pool of games, I agree we only really need a tool powerful enough to make it feasible to collect a good dataset just once, and to confirm to us that it is good data (i.e., flagging uncertainties).

Psyche · Post Post #174 (isolation #21) » Fri Apr 19, 2024 1:19 am

have found some edge cases that might defy even a flagging-focused method if flagging only highlights ambiguous names

maybe most salient is broken quote tags that allow a quoted vote to look like the poster's vote

broken vote tags are also a factor, but easier to refuse by policy

might suggest as a safety measure having the votecounter by default flag any post with any signs of broken tags
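
a crude sketch of what that default check could look like, assuming (and this is an assumption about how the forum renders posts) that BBCode tags which fail to parse are left behind as literal text like "[/quote]" in the post HTML. the `has_broken_tags` name and the tag list are made up for illustration.

```python
import re

# Raw BBCode-style tags still visible in the rendered post are treated as a
# sign that the markup did not parse (assumption, see above).
LEFTOVER_TAG = re.compile(r"\[/?(?:quote|b|vote|unvote)(?:=[^\]]*)?\]", re.IGNORECASE)


def has_broken_tags(post_html: str) -> bool:
    """True if the rendered post still contains raw [quote]/[b]/[vote]-style tags."""
    return LEFTOVER_TAG.search(post_html) is not None


clean = '<span class="bbvote" title="This is an official vote.">VOTE: TMHS</span>'
broken = '[/quote]<span class="bbvote" title="This is an official vote.">VOTE: Amor</span>'

print(has_broken_tags(clean))   # False
print(has_broken_tags(broken))  # True
```

a post that trips this check would be flagged wholesale rather than having its votes counted, since there's no telling whose vote the tag was wrapping.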

Psyche · Post Post #176 (isolation #22) » Fri Apr 19, 2024 1:30 am

Some other cases that might be tough to flag:
- Bold tags with a mix of natural language and potential votes (a rule requiring votes on their own line has been used to avoid this)
- Misspellings or stylizations of "vote" (or unvote) within bold tags
- particularly salient example of the ambiguity here:

Hammer: Psyche

Could require vote tags, but the more rules like this you add, the greater the possibility that someone misinterprets a moderator-ignored vote as a real vote and does something that violates game integrity. But there are ways to reduce that risk too -- like constant reminders of vote counting policies.

True solution is of course a complete separation of vote and post UI.
