Examining the NAF meta

News and announcements from the worldwide Blood Bowl players' association

Moderator: TFF Mods

Jip
Super Star
Posts: 963
Joined: Mon Jun 06, 2016 1:26 pm
Location: Costa del Swindon

Re: Examining the NAF meta

Post by Jip »

This thread very quickly spiralled into a good example of other elements of the NAF meta, ironically! :o

Image

Aspiring to improve on mid-table mediocrity, over in the SAWBBL.

Fancy an actual one-dayer? Check out The Coffee Cup.

Looking at attending your first tournament? Have a read of this.
Dionysian
Emerging Star
Posts: 360
Joined: Thu Jan 31, 2013 4:01 pm

Re: Examining the NAF meta

Post by Dionysian »

Like finding pearls in the muck, VM's posts almost always contribute positively to my existence.

sann0638
Kommissar Enthusiasmoff
Posts: 6609
Joined: Mon May 08, 2006 10:24 am
Location: Swindon, England

Re: Examining the NAF meta

Post by sann0638 »

Ideally Jip's picture would have been larger. Bit of fun analysis - are you making the raw data available anywhere?

(and yes, I appreciate the irony of me asking. We're working on a way of making the raw data quickly available directly from the NAF site)

NAF Ex-President
Founder of SAWBBL, Swindon and Wiltshire's BB League - find us on Facebook and Discord
NAF Data wrangler
Joemanji
Power Gamer
Posts: 9508
Joined: Sat Jul 05, 2003 3:08 pm
Location: ECBBL, London, England

Re: Examining the NAF meta

Post by Joemanji »

Image

*This post may have been made without the use of a hat.
dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: Examining the NAF meta

Post by dode74 »

Ok, a little (hopefully) constructive criticism. I'll take each of your steps one by one.

1. First off, you *do* know how to provide 95% confidence intervals (CI95) for each of the win%s by calculating the margin of error of a proportion. The simple calculation gives the margin at 50% (the worst case), and most of the values are pretty close to that mark, so the margins will be slightly large but on the safe side, assuming the sample is random. Thing is, we know it's not: the low number of games played potentially lends itself to a degree of bias due to coaching skill.
The second talks about correlation between win% and play rates. If your hypothesis is "observed win rates can be correlated positively with play rates" then it's a simple correlation which I think you *can* use just the mean for. That's because we're talking about observed, rather than underlying real, win rates having the correlation: if "people play teams which win more" is the hypothesis, then the result they see, the mean, is the relevant statistic. You could do that fairly simply in Excel: there's an Analysis ToolPak add-in (it provides the Data Analysis tools) you can activate by following these instructions. All you need to do then is interpret the output to tell whether the results are significant or not, because if they're not, then the whole thing is tilting at windmills.
If you do that then you will be able to tell if Orcs are more of an outlier than other teams or not, but you'd then need to test your other hypothesis regarding Orcs and new players before declaring that you know a source of error. If you can identify that as a source of error then you may be able to account for it in Orc numbers and run through the rest of step 1 again.

2. What does "I set up the excel sheet to simply reflect the total number of opposing teams (and the average win-percentage against each race) - rather than the games that have actually been played" actually mean? Is it normalisation for composition (because that's what it sounds like you are trying to say)? How did you do it, exactly?

3. I think your argument about mirror matches is flawed. While it's probably easier to do today, "back then" wasn't exactly the stone ages, and removal of mirror matches absolutely could be done by choice. I also think your choice to normalise (step 2, if that's what you're doing) and *then* remove the mirrors will create a multiplicative error: WE play WE a lot (they are their own 2nd most played opponent) so you start by scaling down the factor then removing it, rather than removing it then normalising the other factors.

4. What "feature"? I know excel pretty well and what you describe doesn't look like a simple button press to me.

5. This looks like a rerun of step 2 to me. You normalised for composition there, and you seem to be doing the same again here.
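Item 1's margin-of-error calculation can be sketched in Python. This is a minimal sketch assuming the normal-approximation formula and treating each game as an independent draw, which, as noted in item 1, the tournament data is not; the 10,000-game figure is purely illustrative.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a win proportion p observed over n games.

    z = 1.96 gives a 95% confidence interval (CI95).
    """
    return z * math.sqrt(p * (1.0 - p) / n)


def ci95(p: float, n: int) -> tuple[float, float]:
    """Lower and upper bounds of the 95% confidence interval around p."""
    m = margin_of_error(p, n)
    return p - m, p + m


# p = 0.5 maximises p * (1 - p), so the "margin at 50%" is the widest one
# for a given number of games: a safe upper bound for every race.
print(f"±{margin_of_error(0.5, 10_000):.4f}")  # ±0.0098 for 10,000 games
```

Using the 50% margin for every race trades a little precision for one simple number that is never too narrow.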

Basically, it looks like you've used what you know to be flawed premises to come to a series of conclusions based on undetermined methods. Your conclusion may or may not be correct (a stopped clock is right twice a day) but if it is correct it's not due to sound methodology. Perhaps you'd care to share the spreadsheet itself if it is even vaguely transparent?

mubo
Star Player
Posts: 749
Joined: Mon Dec 22, 2008 7:12 pm
Location: Oxford, UK

Re: Examining the NAF meta

Post by mubo »

Thanks for being constructive. As an aside, I think the tone of at least one other post is pretty uncool, and I would like to see a bit more hands-on moderation.
dode74 wrote: 3. I think your argument about mirror matches is flawed. While it's probably easier to do today, "back then" wasn't exactly the stone ages, and removal of mirror matches absolutely could be done by choice. I also think your choice to normalise (step 2, if that's what you're doing) and *then* remove the mirrors will create a multiplicative error: WE play WE a lot (they are their own 2nd most played opponent) so you start by scaling down the factor then removing it, rather than removing it then normalising the other factors.
I disagree with you here. But I think it may be a miscommunication... I think what Plasmoid has done is this:

1. Compute win ratios (column A)
2. Remove swiss pairings, and do step 1 again (column B, expressed as a difference relative to A)
3. Remove mirror matches, and do step 1 again (column C, expressed as a difference relative to A (or maybe B, not clear actually))
Which seems pretty reasonable to me.
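If per-game records were available (dode's reply questions whether the stated source has them), columns A and C above could be computed roughly as follows. This is a sketch under assumptions: the `(race, opponent, result)` record layout is hypothetical, draws count as half a win, and removing Swiss pairings would need an extra per-game flag that is not modelled here.

```python
from collections import defaultdict

# Hypothetical per-game record, one row per team per game: (race, opponent_race, result).
Game = tuple[str, str, str]

# Assumption: a draw counts as half a win.
POINTS = {"W": 1.0, "D": 0.5, "L": 0.0}


def win_pct(games: list[Game], drop_mirrors: bool = False) -> dict[str, float]:
    """Win percentage per race, optionally ignoring same-race (mirror) games."""
    points: dict[str, float] = defaultdict(float)
    played: dict[str, int] = defaultdict(int)
    for race, opponent, result in games:
        if drop_mirrors and race == opponent:
            continue
        points[race] += POINTS[result]
        played[race] += 1
    return {race: 100.0 * points[race] / played[race] for race in played}


def columns(games: list[Game]) -> dict[str, tuple[float, float]]:
    """Column A (all games) and column C expressed as a difference relative to A."""
    col_a = win_pct(games)
    col_c = win_pct(games, drop_mirrors=True)
    # A race that only ever played mirrors has no column C value, hence NaN.
    return {race: (a, col_c.get(race, float("nan")) - a) for race, a in col_a.items()}


games = [
    ("Wood Elf", "Orc", "W"), ("Orc", "Wood Elf", "L"),
    ("Wood Elf", "Wood Elf", "W"), ("Wood Elf", "Wood Elf", "L"),
]
print(columns(games))
```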

I think you absolutely have to remove mirror matches to get meaningful WRs, they will all necessarily be 50%, and will thus drag all results towards 50%.

Glicko guy.
Team England committee member
VoodooMike
Emerging Star
Posts: 434
Joined: Thu Oct 07, 2010 8:03 am

Re: Examining the NAF meta

Post by VoodooMike »

nightwing wrote: You need to take a deep breath. You're coming off as someone who has an axe to grind here. You didn't agree with his post? Fine. Drop the pathetic insults and make a reasoned counter argument about WHY the methodology is flawed, rather than spiteful bitching.
I always have an axe to grind with people who try to pass off their teenage bra-fumbling as statistics. As for needing a counter argument about the methodology... he flat-out states that he knows inferential statistics needs distributions, not descriptives... but then uses descriptives exclusively. Nobody needs to argue beyond that point - it's done wrong from step 1, so steps 2 through infinity are inherently wrong. There's really no methodology beyond that - no work is shown, just vaguely referenced, and done knowingly wrong.

What YOU are doing is demonstrating that you, like Wile E Coyote, can't tell the difference between substance and empty air.
Rolex wrote: Facts never convinced anyone.
Facts convince the people who are worth convincing; for everyone else you just need to distract them with something shiny.
Rolex wrote: While advocating for science he ignores the sciences of the mind.... how ironic.
What you're really talking about is peripheral route persuasion, which is the domain of propaganda and marketing. Intelligent people don't need to be wooed into caring what the truth is, and everyone else didn't really care to begin with, so they're easy to reroute.
sann0638 wrote: (and yes, I appreciate the irony of me asking. We're working on a way of making the raw data quickly available directly from the NAF site)
Hopefully more data than just roster and score, too. I'm not sure what useful information can really be gleaned from those fields alone.
mubo wrote: Which seems pretty reasonable to me.
Of course it does - because you don't know what you're doing either. Same way the vaccination/autism link seems totally reasonable to people who don't understand (or bother to read) the science. If, on any given topic, you don't really know what you're talking about, then you're not in a position to judge whether or not someone else does.
mubo wrote: I think you absolutely have to remove mirror matches to get meaningful WRs, they will all necessarily be 50%, and will thus drag all results towards 50%.
Given that the tier definitions set out by the BBRC - which is what the numbers were (improperly) being compared to - are based on the inclusion of mirror matches, the exclusion of them is either disingenuous or dishonest. Yes, mirror matches will push the value toward 50%, but that was true when the tiers were created, too. There are 24 rosters, and unless you're allowing compositional imbalances to heavily influence your numbers, that shouldn't have so much of an effect as to warrant diverging from the original choice by the BBRC to include them.

Unless, of course, the NAF is looking to break from the BBRC's definitions now?

dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: Examining the NAF meta

Post by dode74 »

mubo wrote: I disagree with you here. But I think it may be a miscommunication... I think what Plasmoid has done is this:

1. Compute win ratios (column A)
2. Remove swiss pairings, and do step 1 again (column B, expressed as a difference relative to A)
3. Remove mirror matches, and do step 1 again (column C, expressed as a difference relative to A (or maybe B, not clear actually))
Which seems pretty reasonable to me.
Given he doesn't have match-level data (to the best of my knowledge it's not at match level on his stated source) I don't think that's the case. Again, I'd like to see the spreadsheet.
I think you absolutely have to remove mirror matches to get meaningful WRs, they will all necessarily be 50%, and will thus drag all results towards 50%.
As mentioned above, if this is to have any meaning when comparing with BBRC tiering then mirror matches need to be included. Fag-packet maths suggests the difference is ~0.2% for a team which is running at 55% without mirror-matches.
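The back-of-envelope figure can be reproduced with a one-line blend: mirror games always average exactly 50%, so including them pulls a race's win% towards 50 in proportion to their share of its games. This sketch assumes a uniform field of 24 rosters (so about 1 game in 24 is a mirror); real compositions are not uniform, so it is an approximation. It also shows that the "~0.2%" is a difference in percentage points.

```python
def with_mirrors(no_mirror_pct: float, mirror_share: float) -> float:
    """Overall win% once mirror games (always exactly 50%) are blended back in.

    mirror_share is the fraction of a race's games that are mirrors.
    """
    return (1.0 - mirror_share) * no_mirror_pct + mirror_share * 50.0


# Assumption: 24 equally common rosters, so roughly 1 game in 24 is a mirror.
share = 1 / 24
drop = 55.0 - with_mirrors(55.0, share)
print(f"{drop:.2f} percentage points")  # 5/24, about 0.21 percentage points
```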

rakkzul
Experienced
Posts: 82
Joined: Mon Sep 12, 2016 12:19 pm

Re: Examining the NAF meta

Post by rakkzul »

Please, guys, keep it civil. As a complete outsider to this topic, I would like to read some constructive discussion and try to learn something from it.
:wink:

Vanguard
Super Star
Posts: 922
Joined: Sun Jun 08, 2008 8:27 am
Location: Glasgow

Re: Examining the NAF meta

Post by Vanguard »

VoodooMike wrote: Unless, of course, the NAF is looking to break from the BBRC's definitions now?
dode74 wrote: As mentioned above, if this is to have any meaning when comparing with BBRC tiering then mirror matches need to be included. Fag-packet maths suggests the difference is ~0.2% for a team which is running at 55% without mirror-matches.
I don't think Plasmoid mentioned the BBRC figures in his original post. He's looking at how much races/tiers need to be modified to push them all towards a 50% win ratio, to get closer to a level playing field in tournaments. The BBRC tiering criteria were for bringing races into suitable performance margins under league play; they're not really relevant here.

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: Examining the NAF meta

Post by plasmoid »

Hi Mike (and Dode, by all means :)),
I think I ought to clarify this:

Mike said:
Unless, of course, the NAF is looking to break from the BBRC's definitions now?
I'm not suggesting that the NAF should.
In fact I'm not saying that the NAF should do anything with this.
In fact, as stated, I did this because a) I was curious and b) I might arrange a tournament with some tiered bonuses based on this, just like tons of other tiered tournaments, like the EuroBowl.
Given that the tier definitions set out by the BBRC - which is what the numbers were (improperly) being compared to - are based on the inclusion of mirror matches, the exclusion of them is either disingenuous or dishonest. Yes, mirror matches will push the value toward 50%, but that was true when the tiers were created, too. There are 24 rosters, and unless you're allowing compositional imbalances to heavily influence your numbers, that shouldn't have so much of an effect as to warrant diverging from the original choice by the BBRC to include them.
I'll be taking this in little bits. So:
Given that the tier definitions set out by the BBRC - which is what the numbers were (improperly) being compared to
No. I'm not doing anything with the stats that has to do with the BBRC tiers. This is not NTBB. I'm not talking about which ranges each tier covers or ought to cover. I'm not doing anything in relation to the magic number 45 or 55. I'm trying to work out what would move all teams closer to 50% - not as a number generated by the BBRC, but as a number indicating that you win about as much as you lose. And I'm doing it from the perspective of a local tournament. Yes, I'm using the word Tier. And I'm using it in exactly the same way as a lot of other tiered-bonus tournaments, like the EuroBowl, referenced by Rolo on page one, which refers to Tier 2 teams as: Chaos, Chaos Pact, Human, Khemri, Slann, Necromantic, High Elf, Elf, Nurgle - i.e. not a claim that these teams are significantly below 45% wins.
are based on the inclusion of mirror matches, the exclusion of them is either disingenuous or dishonest.
I'd have thought that explicitly stating that I'm excluding mirror matches, and explicitly stating why would mean that no-one would accuse me of being secretly manipulative. And I'm not comparing my numbers to BBRC numbers. I'm arguing that this would be a better (non-BBRC) measure of a balanced performance.
Yes, mirror matches will push the value toward 50%, but that was true when the tiers were created, too.
In the past you have said yourself that including mirror matches in performance/balance would be crazy, and that if the BBRC did so knowingly then they were stupid (or some other derogatory term, I don't remember). Do you not still think so? Again, I'm not saying that the NAF or anyone should break with the BBRC. I'm saying that if I want to host a tournament with teams on a more equal footing, as many tournaments try to do using gut feeling, then results from mirror matches will not tell me anything.

Cheers
Martin

Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: Examining the NAF meta

Post by dode74 »

Ok, understood. But it seems to me you are doing different things here.

The first is the "more players play teams which they see as more powerful" hypothesis, which can be tested, but you have not done so beyond eyeballing a couple of numbers. As I said before, you can carry out an analysis with Excel. It's worth noting that there are plenty of other potential influences on a coach's race choice besides "is it a powerful tourney team?" Teams aren't cheap, either in cost of minis or time to paint them, so a coach is likely to be limited in his choices anyway, and, if he is a regular league player, he may be more influenced by league performance than tournament performance, for example.
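For anyone not using Excel, the same test of "observed win% correlates with play rate" can be sketched in Python: Pearson's r across races plus its t statistic with n - 2 degrees of freedom. The inputs (one win% and one share of games per race, in matching order) are assumptions about how the data would be laid out, and 2.093 is the standard two-tailed 5% critical value of Student's t for 19 degrees of freedom, i.e. 21 non-stunty races.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Two-tailed 5% critical value of Student's t with 19 degrees of freedom (21 races).
T_CRIT_19 = 2.093


def significant(win_pcts: list[float], play_shares: list[float]) -> bool:
    """True if the win%/play-rate correlation is significant at the 5% level for 21 races."""
    if not len(win_pcts) == len(play_shares) == 21:
        raise ValueError("this critical value assumes exactly 21 races")
    r = pearson_r(win_pcts, play_shares)
    return abs(t_statistic(r, len(win_pcts))) > T_CRIT_19
```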

The second is an attempt to tier the races, which is something which can be done with cluster analysis - something Wulfyn did some time ago via PM with me. While explaining the concept he carried out a k-means analysis of that same data (up to July 2015) and came up with the following:
I ran a set of random initialisation points, based on 5 clusters, and the best fit was the following (with centroids in brackets):

T1 (55.26) : Wood Elves, Undead, Dark Elves, Lizardmen
T2 (51.87) : Amazons, Elves, Norse, Dwarves, Chaos Dwarves, Necromantic, Skaven
T3 (48.93) : High Elves, Slann, Orc, Humans
T4 (45.54) : Khemri, Chaos Pact, Nurgle, Vampires, Underworld, Chaos
T5 (33.91) : Halflings, Goblins, Ogres

Notes:

All teams have the centroid within their CI95.

The 4 T1 teams came out in every initialisation point I did using 5 groups, so I'm very confident they are in the top tier.

High Elves have the T2 centroid within their CI95, but the T3 centroid is closer, so they are a T3 team. The T3 tier is also quite small at only 4 teams.

Chaos Pact and Khemri often moved between T3 and T4, but nearly always ended up in T4.
He tried 4 tiers (as per the norm) but 5 seemed to work better. (I don't know if he carried out any removal of mirror matches or normalisation for composition, but I don't think he did.)
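The quoted notes describe k-means on per-race win% with 5 clusters and several random initialisation points. A plain 1-D version of that technique (Lloyd's algorithm, keeping the start with the lowest within-cluster sum of squares) might look like the sketch below. It is not Wulfyn's actual code, which wasn't shared, and the input values used to exercise it are made up.

```python
import random


def kmeans_1d(values: list[float], k: int, n_init: int = 50,
              max_iter: int = 100, seed: int = 0) -> list[float]:
    """Lloyd's k-means on 1-D data; returns centroids sorted high to low.

    Runs n_init random initialisations and keeps the one with the lowest
    within-cluster sum of squares.
    """
    rng = random.Random(seed)
    best_sse, best = float("inf"), []
    for _ in range(n_init):
        centroids = rng.sample(values, k)  # k data points as starting centroids
        for _ in range(max_iter):
            groups: list[list[float]] = [[] for _ in range(k)]
            for v in values:
                nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
                groups[nearest].append(v)
            # Move each centroid to its group's mean; an empty group keeps its centroid.
            updated = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
            if updated == centroids:
                break
            centroids = updated
        sse = sum(min((v - c) ** 2 for c in centroids) for v in values)
        if sse < best_sse:
            best_sse, best = sse, sorted(centroids, reverse=True)
    return best


def assign_tiers(win_pcts: dict[str, float], centroids: list[float]) -> dict[str, int]:
    """Tier = 1 + index of the nearest centroid (centroids sorted high to low)."""
    return {team: 1 + min(range(len(centroids)), key=lambda i: abs(wp - centroids[i]))
            for team, wp in win_pcts.items()}
```

Assigning each race to its nearest centroid is also how the High Elves note above works: being inside the T2 centroid's CI95 does not matter if the T3 centroid is closer.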

When you compare this to the commonly-used tournament tiers:
Tier 1: Chaos Dwarf, Dwarf, Wood Elves, Skaven, Norse, Lizardmen, Orc, Undead, Amazon, Dark Elves
Tier 2: Chaos, Human, Khemri, Pact, Slann, High Elves, Nurgle, Necromantic, Pro Elves, Vampires, Underworld
Tier 3: Halflings, Goblins, Ogres (collectively known as the “stunty” teams, who often get awarded a special “stunty cup” for the highest placed in a tournament)
You can see TourneyT3 equates to ClusterT5. TourneyT1 largely equates to ClusterT1&2, and TourneyT2 largely equates to ClusterT3&4. With the exceptions of Orcs, Pro Elves and Necro: Orcs are ClusterT3 and TourneyT1, while Pro Elves and Necro are ClusterT2 and TourneyT2.
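The comparison above can be checked mechanically from the two lists. This sketch copies both tier lists, writes each race under one spelling (an assumption: for example, the cluster list's "Elves" is taken to mean Pro Elves and the tourney list's "Pact" to mean Chaos Pact), and flags any race whose cluster tier falls outside the tourney-to-cluster mapping described in the paragraph.

```python
# Tier lists copied from the post; race names normalised to one spelling.
CLUSTER_TIERS = {
    1: ["Wood Elves", "Undead", "Dark Elves", "Lizardmen"],
    2: ["Amazons", "Pro Elves", "Norse", "Dwarves", "Chaos Dwarves", "Necromantic", "Skaven"],
    3: ["High Elves", "Slann", "Orc", "Humans"],
    4: ["Khemri", "Chaos Pact", "Nurgle", "Vampires", "Underworld", "Chaos"],
    5: ["Halflings", "Goblins", "Ogres"],
}
TOURNEY_TIERS = {
    1: ["Chaos Dwarves", "Dwarves", "Wood Elves", "Skaven", "Norse", "Lizardmen",
        "Orc", "Undead", "Amazons", "Dark Elves"],
    2: ["Chaos", "Humans", "Khemri", "Chaos Pact", "Slann", "High Elves", "Nurgle",
        "Necromantic", "Pro Elves", "Vampires", "Underworld"],
    3: ["Halflings", "Goblins", "Ogres"],
}
# Tourney tier -> cluster tiers it "largely equates" to.
EXPECTED = {1: {1, 2}, 2: {3, 4}, 3: {5}}


def invert(tiers: dict[int, list[str]]) -> dict[str, int]:
    """Map each race to its tier number."""
    return {race: tier for tier, races in tiers.items() for race in races}


def exceptions() -> list[str]:
    """Races whose cluster tier does not match their tourney tier's expected range."""
    cluster, tourney = invert(CLUSTER_TIERS), invert(TOURNEY_TIERS)
    return sorted(race for race, t in tourney.items() if cluster[race] not in EXPECTED[t])


print(exceptions())  # ['Necromantic', 'Orc', 'Pro Elves']
```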

So it seems we can use that tournament data to give tiers (with a nod to all the errors you've already mentioned), and it seems the tourney tiers being used are largely there or thereabouts, with the potential for refinement. What that means in terms of how to equalize tournaments for those individual tiers is a different matter entirely, and whether it would actually have an effect on race representation in tournaments is yet another.

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: Examining the NAF meta

Post by plasmoid »

Hi Dode,
thanks for your constructive reply. If any of my replies come across as snippy or snotty, then it isn't intended - just remember that English isn't my first language.

Easy ones first:
Perhaps you'd care to share the spreadsheet itself if it is even vaguely transparent?
Sure thing. As this was my first venture into Excel, ever, I think it is set up fairly transparently, so as not to confuse myself any further. Let me know if you want to see it (or if my replies below make that irrelevant). Just PM me if you want it.
4. What "feature"? I know excel pretty well and what you describe doesn't look like a simple button press to me.
No no - it was a feature I built into my spreadsheet, allowing the performance numbers for each race individually to be modified by a specific number.

The rest is trickier. I hope I understand what you're asking.
1. First off, you *do* know how to provide the ranges for 95CI for each of the win%s by calculating a margin of error of a proportion.
I do. I didn't because I don't really use that first column for very much.
if "people play teams which win more" is the hypothesis, then the result they see, the mean, is the relevant statistic. You could do that fairly simply in Excel.
That would be my hypothesis, yes, but not one that I set out to investigate. I appreciate that you tell me how to do it, but it would take a lot longer for me to figure out and then do than I'm willing to invest. I'm sure someone could do the maths very quickly if they wanted to, and the result would no doubt be very interesting.

As for Orcs being outliers: if we remove tier 3 teams from our hypothesis (and I think we should, because anyone taking them is not influenced in his decision by a desire to win the tournament), then the remaining 21 teams share 92.1% of the games played, so they should on average each be playing 4.4% of the games. I'm not sure how to get the proper range for a random distribution here, in order to see if Orcs fall outside that. I'm sure you could, and very quickly. But I expect that Orcs fall significantly outside a normal distribution, at 8.8% (with almost 20,000 games played).

Whether this massive difference is due to newbies playing them more or to something else is not something that can be answered, is it? I'm only pointing out that there has long been a theory that "newbies drag down the numbers for Orcs", and there seems to be an unusually high number of Orc players.
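The "proper range for a random distribution" can be approached with a one-proportion z-test: compare the Orc share of games with the share expected if the 21 non-stunty races were equally popular. A minimal sketch, using the figures from the post and assuming roughly 223,000 team-games as the total (the figure Martin quotes in his step 5 answer). It treats every team-game as an independent draw, which overstates the precision because coaches keep playing the same race, so the z value should be read as indicative only.

```python
import math


def share_z(observed: float, expected: float, n: int) -> float:
    """One-proportion z statistic: standard errors between an observed and an expected share."""
    se = math.sqrt(expected * (1.0 - expected) / n)
    return (observed - expected) / se


# Figures from the post: 21 non-stunty races share 92.1% of games, Orcs take 8.8%.
expected_share = 0.921 / 21  # about 4.4% per race
z = share_z(0.088, expected_share, 223_000)
# Anything beyond about 1.96 in either direction is significant at the 5% level.
print(f"z = {z:.0f}")
```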
2. What does "I set up the excel sheet to simply reflect the total number of opposing teams (and the average win-percentage against each race) - rather than the games that have actually been played" actually mean? Is it normalisation for composition (because that's what it sounds like you are trying to say)? How did you do it, exactly?
What I'm trying to say is that I have the numbers for the games that were actually played. And that this is not random due to Swiss.
So I took the win percentages we have from those. Just the averages - like I said - I wouldn't know how to do the math with the ranges.
And then I multiplied by the number of opposing teams of that race.
So - (sloppy numbers here, not from the spreadsheet) - Wood Elves playing 4,000 imaginary gobbo opponents at 82% wins, 19,800 Orc opponents at 62% wins, etc., to get what the Wood Elf win percentage would have been if they had faced "everyone".
I wouldn't know how to do the ranges on that calculation and I don't even know if I should, because these aren't actual games but just imaginary games.
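The composition normalisation described above amounts to a weighted average of head-to-head win percentages, with the number of opposing teams of each race as weights. A minimal sketch, assuming the head-to-head figures are available as a per-opponent table; the numbers are the "sloppy" illustration from the post and cover only two opponent races, so the result is not a real Wood Elf figure.

```python
def normalised_win_pct(vs_pct: dict[str, float], opponents: dict[str, float]) -> float:
    """Head-to-head win% averaged over opponents, weighted by how many of each race exist.

    vs_pct:    win% against each opponent race
    opponents: number of opposing teams (or games) of each race
    """
    races = list(vs_pct)
    total = sum(opponents[r] for r in races)
    return sum(vs_pct[r] * opponents[r] for r in races) / total


# Illustrative numbers from the post: Wood Elves vs 4,000 Goblins at 82%, vs 19,800 Orcs at 62%.
we = normalised_win_pct({"Goblins": 82.0, "Orc": 62.0}, {"Goblins": 4000, "Orc": 19800})
print(f"{we:.2f}%")  # 65.36%
```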
3. I think your argument about mirror matches is flawed. While it's probably easier to do today, "back then" wasn't exactly the stone ages, and removal of mirror matches absolutely could be done by choice.
I didn't mean to make this about the BBRC anyway. But I've seen the data collected back when the BBRC decided on the tiers, and it didn't include information regarding opponents. It was just a string of W/D/L results for a ton of teams. So we didn't have the data, and we couldn't realistically have gotten it.
Be that as it may.
What I did was simply remove the (for example) Wood Elf vs Wood Elf stats from the calculation described above. As I just re-ran the whole thing in the spreadsheet, I don't think I got a "multiplicative error" - but by all means check the spreadsheet.
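Dropping the mirror term and then renormalising over the remaining opponents, as described above, can be sketched as follows. Because mirror games average exactly 50%, the with-mirror and without-mirror results are tied by a simple blend, so removing the mirror weight before normalising introduces no extra scaling. The table is hypothetical: the 12,000 Wood Elf mirror count is invented for illustration, and the other two rows reuse the post's sloppy numbers.

```python
def weighted(vs_pct: dict[str, float], opponents: dict[str, float], races: list[str]) -> float:
    """Win% averaged over the given opponent races, weighted by opponent counts."""
    total = sum(opponents[r] for r in races)
    return sum(vs_pct[r] * opponents[r] for r in races) / total


def without_mirror(race: str, vs_pct: dict[str, float], opponents: dict[str, float]) -> float:
    """Composition-weighted win% for `race`, dropping the same-race term and renormalising."""
    return weighted(vs_pct, opponents, [r for r in vs_pct if r != race])


# Hypothetical Wood Elf table; the mirror row is 50% by definition.
vs = {"Wood Elves": 50.0, "Orc": 62.0, "Goblins": 82.0}
counts = {"Wood Elves": 12000, "Orc": 19800, "Goblins": 4000}

with_m = weighted(vs, counts, list(vs))
no_m = without_mirror("Wood Elves", vs, counts)
m = counts["Wood Elves"] / sum(counts.values())  # mirror share of all opponents
# Check: the with-mirror figure is a blend of the no-mirror figure and 50%.
assert abs(with_m - ((1 - m) * no_m + m * 50.0)) < 1e-9
```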
5. This looks like a rerun of step 2 to me. You normalised for composition there, and you seem to be doing the same again here.
Not quite.
The first time around, I used the actual number of teams (well, games, actually) played by each race out of the 223,000 games played.
The second time around I figured: if people tend to play the strongest teams, then a more level playing field would mean that they would no longer gravitate towards any particular teams (at least not for any reason that we know of). So, once again disregarding tier 3 (as they are played for reasons other than taking home the trophy), I normalised(?) the meta, meaning that I did the calculation under steps 2+3 again, but this time with all non-tier 3 teams being represented equally in the meta.
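The "equal meta" rerun can be sketched as the same weighted average with flattened weights. How the tier 3 (stunty) opponents were weighted isn't spelled out, so this sketch assumes they keep their observed counts while every non-stunty opponent race gets the same weight (their average count), with mirrors dropped. All numbers are hypothetical.

```python
STUNTIES = {"Halflings", "Goblins", "Ogres"}


def equal_meta_win_pct(race: str, vs_pct: dict[str, float],
                       opponents: dict[str, float]) -> float:
    """Win% for `race` if all non-stunty opponent races were equally common.

    Assumptions: the mirror term is dropped, stunty opponents keep their observed
    counts, and each non-stunty opponent gets the average non-stunty count, so
    the total non-stunty weight is unchanged.
    """
    others = [r for r in vs_pct if r != race]
    core = [r for r in others if r not in STUNTIES]
    flat = sum(opponents[r] for r in core) / len(core)
    weights = {r: opponents[r] if r in STUNTIES else flat for r in others}
    total = sum(weights.values())
    return sum(vs_pct[r] * weights[r] for r in others) / total


vs = {"Wood Elves": 50.0, "Orc": 62.0, "Chaos": 70.0, "Goblins": 82.0}
counts = {"Wood Elves": 12000, "Orc": 19800, "Chaos": 4200, "Goblins": 4000}
print(f"{equal_meta_win_pct('Wood Elves', vs, counts):.2f}%")  # 68.29%
```

Keeping the total non-stunty weight fixed means only the mix among non-stunty opponents changes, which isolates the effect of flattening the meta.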

I hope that answers your questions.
If not, by all means ask again - here or on PM.
Cheers
Martin

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: Examining the NAF meta

Post by plasmoid »

Very interesting. Thanks Dode. Wulfyn is the man. And constructive on top of that.

Perhaps I should ask him if he'd be interested in doing the same work, but with mirror matches and Swiss pairings removed. Might be interesting, because both of those are confounds(?) to a team's actual performance/strength "in the wild". Not that they're the only confounds, but they're the only ones that can be easily isolated.
(I say "performance" here with reference to win percentage, not BBRC tiers).

Also, I know I haven't proven the hypothesis that power teams get played more. I think I fairly explicitly say that I wouldn't know how, so I'm not hiding that I'm just eyeballing. I do think it is fairly obvious that the bottom 12 teams get played a heck of a lot less than the top 12, and that Orcs are the whopping big exception.

Cheers
Martin

PS - as an aside: You say:
Teams aren't cheap, either in cost of minis or time to paint them, so a coach is likely to be limited in his choices anyway, and, if he is a regular league player, he may be more influenced by league performance than tournament performance for example.
While this cannot be quantified, and so surely resides in hell with all other non-math arguments, this is indeed one of the points in the theory that Orc numbers get dragged down. Because Orcs are in all the boxed sets, "you" or "someone you know" most likely has an Orc team, and most people have tried them out as part of learning the game, so most people feel like they know them a little. Another argument (while anecdotal) is just how many times tournament organisers have to help the new guy build an Orc roster for the tournament. It happens. A lot. IMO.

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: Examining the NAF meta

Post by plasmoid »

Oh, Dode, the starting point for my analysis (or whatever you want to call it) was, as stated, attending/examining the EuroBowl meta, where the rules have been the same for 2016 and 2017, and will remain almost unchanged for 2018.

Out of 272 teams, there have been roughly 30 each of Woodies, Darkies, Undead and Lizards. And then 26 Necro, who get a tier bonus. That got me thinking about what kind of tier bonuses might make for a more diverse meta in a competitive environment like the EuroBowl.

Cheers
Martin
