NTBB: Stats

Got some ideas for rules? Maybe a skill change or something completely different!!! Tell us here.

Moderator: TFF Mods

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

NTBB: Stats

Post by plasmoid »

Hi all,
Darkson has asked the discussion about stats to be moved. And rightly so.
I still intend to reply to both Shteve0 and VoodooMike, but progress is likely to be slow.

First: Hitonagashi
Thanks for understanding where I'm coming from :)

You said:
I mean, it bugs me the way Plasmoid tests the rosters (as, without equal strength players, the 'feedback' will be heavily skewed), but I can't see a better way.
Living in the real world, neither can I.
I did try to get enough 'equal strength' coaches to create an 'elite' closed playtest tournament, next to the open-to-all league that I have going.
But how do you measure 'equal skill' anyway?
I couldn't find quite enough elite coaches, so I had to settle for less.
I did set up the tournament with all the 13 tweaked teams playing against all the 13 untweaked teams once each - to see how the tweaked teams would fare against pure tier 1 opponents.
But doing that, I obviously won't be able to get the kind of data that would be statistically significant.

I don't know how many thousands of games that would require anyway. But it's obviously not an option. (And if it was, it wouldn't be 'equal skill' anyway).
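As a rough guide to the "how many thousands of games" question, here is a minimal sketch of the textbook sample-size formula for a single team's win rate. It is only an approximation under stated assumptions: every game is treated as an independent win/loss trial, draws and differences in coach skill are ignored, and the function name `games_needed` is made up for illustration.

```python
import math
from statistics import NormalDist


def games_needed(margin, confidence=0.95, p=0.5):
    """Games needed to pin a win rate down to +/- margin (normal approximation)."""
    # z is the two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    # p = 0.5 is the worst case: it maximises p * (1 - p).
    return math.ceil(z * z * p * (1 - p) / margin ** 2)


for m in (0.05, 0.02, 0.01):
    print(f"+/-{m:.0%}: about {games_needed(m)} games for one team")
```

So under these assumptions the answer is "a few thousand games per roster" for a margin of a couple of percentage points, which a small playtest league cannot realistically reach.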

So what I'm doing is very explicitly not just looking at the numbers. That would be nonsense.
At the end of 2013 I'll be gathering feedback from participating coaches. Heresy, I know, but that's my only option :)

Cheers
Martin

Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
spubbbba
Legend
Posts: 2267
Joined: Fri Feb 01, 2008 12:42 pm
Location: York

Re: NTBB: Stats

Post by spubbbba »

Well with such a small sample size you could largely ignore the stats and concentrate more on opinion.

If a team has a 48% win rate from hundreds of games but goes up to 65% from 20 games, then I don’t think that would be enough to say they were now too powerful. But if there was a lot of consensus from experienced coaches playing with and against the new roster that they were too good, then that would be more useful. However, if they have something ridiculous like an 85% win rate, then that would require more analysis. Of course the coaches and races involved will matter more so in smaller sample sizes.
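To put rough numbers on that intuition, here is a small sketch using the Wilson score interval, a standard way to get a plausible range for a proportion from a small sample. It assumes each game is an independent win/loss trial (draws are ignored), and the helper name `wilson_interval` is just for illustration.

```python
import math


def wilson_interval(win_rate, games, z=1.96):
    """Approximate 95% Wilson score interval for a win rate seen over `games` games."""
    denom = 1 + z * z / games
    centre = (win_rate + z * z / (2 * games)) / denom
    half = z * math.sqrt(
        win_rate * (1 - win_rate) / games + z * z / (4 * games ** 2)
    ) / denom
    return centre - half, centre + half


for rate in (0.65, 0.85):
    low, high = wilson_interval(rate, 20)
    print(f"{rate:.0%} over 20 games: plausible range {low:.1%} to {high:.1%}")
```

Under these assumptions the range for 65% over 20 games still contains 48%, while the range for 85% over 20 games does not, which matches the idea that only the extreme result calls for a closer look.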

For instance I like the NTBBL Khemri much more than the crp version. Break tackle on the guardians gives them a more interesting playstyle. I don’t believe they are overpowered but have lost to them the 2 times I faced them. Both games were close however and my teams did have some pretty bad armour.
Oh and if Setekh is getting thick skull then I think all skeleton stars should have it.

My past and current modelling projects showcased on Facebook, Instagram and Twitter.
VoodooMike
Emerging Star
Posts: 434
Joined: Thu Oct 07, 2010 8:03 am

Re: NTBB: Stats

Post by VoodooMike »

plasmoid wrote: Living in the real world neither can I.
Without this meaning to be as insulting as it will sound, the problem isn't that you live "in the real world", it's that you don't understand statistics well enough to use them for your stated design goals. While changes that are based solely on "how you like it" are going to be hard to predict a specific outcome for, given the existing analyses that have been done, the direction that various rosters need to be pushed in, in order to "narrow the tiers", is not something that needs to be done on feel. We know where the inequity lies, at various TV levels.

Likewise, a good chunk of the changes you're making have nothing at all to do with narrowing the tiers - they're just things you like better, pushed through with the claim that it's for a righteous cause. I'm very much in favour of eliminating the tiers, or at least dealing with the inequity among teams at various levels, because it makes BB a game more likely to have wider appeal... but the title of your changes reminds me of religious groups that push their doctrine as "family values".
plasmoid wrote: I did try to get enough 'equal strength' coaches to create an 'elite' closed playtest tournament, next to the open-to-all league that I have going.
But how do you measure 'equal skill' anyway.
This is a red herring - the problem is not that you can't find enough people to test your changes, it's that you can't justify the changes themselves in terms of actually working toward narrowing the tiers, because you admittedly did not use any of the widely collected data on win% among the rosters when you made those changes. It's monkeys on typewriters - you're saying you can't find enough people to check if they've written Shakespeare while the folks you mentioned above are saying "stop getting monkeys to write your plays".
plasmoid wrote: But doing that, I obviously won't be able to get the kind of data that would be statistically significant.
Of course it won't... which begs the question: why do it? There is, however, plenty of statistically significant data on the topic of roster win% available, and what baffles some of us is how you can say you're bringing the win%s closer to one another when you don't use the actual win% data.
plasmoid wrote: So what I'm doing is very explicitly not just looking at the numbers. That would be nonsense.
At the end of 2013 I'll be gathering feedback from participating coaches. Heresy, I know, but that's my only option
No, what you're very explicitly doing IS nonsense as far as your stated goal is concerned. If you stopped calling it NTBB and renamed it "plasmoid's preferences" then that'd be accurate, though we both know it'd be more routinely rejected by people who only listen because they think there's data and legitimate logic involved in the design of the changes.

It's also not your only option - there's no actual deadline, just one you've decided on yourself. It's a bit like when dode74 tells people he's just enforcing the rules on the Cyanide forums, despite the fact that he wrote those rules - the separation between your supposed obligation and your personal whim is an illusion meant to lend legitimacy to how you want to do things.

Re: NTBB: Stats

Post by plasmoid »

Hi Shteve0:
Shteve0 wrote:
Oh dear... I came on here to be conciliatory and balanced, and it probably doesn't read that way.
Never mind. I hope that buys me a little leeway with the tone also.
Shteve0 wrote:
Holding on to that assertion and then dismissing actual performance data, even selectively picking what feedback 'feels right' and what doesn't, is a nonsense that assumes that one person's gut feeling is better than the combined gut feelings of the design team. If the argument here was that the NTBB was designed because they didn't like one or other central tenet of the LRB design, that's one thing.
Right, so one of the accusations here is blasphemy? :wink:

As you ought to have known by now, the argument is that I disagree with one or other basic tenet of the LRB design. And that these are suggested house rules for those that do the same. I think the website makes a rather clear distinction between the CRP+10 list, and the remaining NTBB roster changes in this regard.

To wit:
*I disagree that the win-percentage that the tiers are based on should be a team's lifetime average. If a team starts very weak and finishes very strong, it may well appear balanced over a 30 game stretch, but if you play 60 games, you have 40 games of "very strong" rather than 10. In the same vein - but more importantly - teams that start very strong but have a weak finish will also be perfectly balanced in the CRP-design perception, but to my mind are quite unbalanced, because lots of leagues only play those first 10 games and then start over (and because tournaments are closely related to those first 10 games).

*I also disagree with the tenet that it is awesome that both tier 2 and tier 3 are so wide. I know this is in contrast to JJ, (most of) the design team, and a lot of BB players. NTBB is a set of house rules for those that agree with me on this.
Shteve0 wrote:
but yes, for what it's worth I'm concerned at your assertion that you're somehow "improv[ing] on CRP" through your roster changes
I think it quite obvious that by "improving CRP" I specifically mean changing CRP based on the 2 disagreements above, i.e. moving CRP towards the goal of NTBB. That's an improvement to me and anyone who agrees with the premise of the NTBB. And naturally not an improvement if you disagree with what NTBB is about.
Shteve0 wrote:
but it seems that certain statistical evidence was dismissed in favour of the stats not matching what the author(s) felt was correct, and other teams were nerfed or buffed based on personal preference, which we don't agree with.
Just to be clear - later changes to NTBB have often been in opposition to my personal preference.
But yes, I have listened to other people.

For example, you campaigned heavily that the G-mummies made undead too good. So I took that to the Daventry league, and heard something similar. I even made the grave mistake of not waiting for statistically significant data 8)

But back to your main point, which you make multiple times, all with a hint of accusation, such as here:
Shteve0 wrote:
Meanwhile, the stats he chose as a supporting argument for the tweaks indicate that the Necro and Pro Elves enjoy excellent win% in the range he's looking to normalise, but skipped the nerfs because he felt they weren't as good as the data suggests (!)
Actually, I thought I explained that rather thoroughly a few pages back, and it seems to me that you didn't try very hard to understand what I wrote.

So let's look at it again:
The stats I showed you were the playtest stats for LRB6 (with an infusion of LRB5 stats for teams that had not been changed between editions). It is very important to understand, that a few things playtested for LRB6 got turned down, and hence the stats for those teams are not valid for CRP.

Specifically, Necro had a spin with Golems at 100K, which didn't make it into CRP. So, removing those LRB6 stats, necro are at 44.84%. Not 60%. As I explained.
(Other teams that had such tainted data were halflings (where 50K Master Chef was tested and discarded) and Underworld (where 50K bribes were tested and discarded)).

That puts Necro roughly 0.2% short of "BBRC tier 1" and Elfs just 0.2% above it. So, I didn't (as you claim) "skip the nerfs because he felt they weren't as good as the data suggests" at all. As I also explained. I skipped acting on that because I think it would have been foolish to make a change to the rules on something as flimsy as 0.2%. Or to put it differently, back in the day I figured my "margin of error" would simply be standard math rules for rounding up and rounding down.

But let's look at the stats then (with halfling and necro changes revoked, as mentioned above. For Underworld, my data have no untainted stats, but if I do a straight average from FOL and Box, I get a ballpark number):
Wood 56.52
Unde 56.47
Dwar 56.16
----------
Elf 55.20
Liza 53.16
Nors 52.55
Chao 52.01
Skav 51.93
CDws 51.72
High 51.31
Dark 51.07
Amaz 49.35*
CPac 48.81
Orc 47.03*
Slan 47.01*
Khem 46.88*
Nurg 46.07
Humn 46.05*
Necr 44.84
----------
Vamp 42.75
UndW 42
Gobs 36.81
Ogre 32.42
Half 19.21

So, the controversial ones are the 5 teams marked with an asterisk.
Khemri and Humans I won't defend from a NTBB mindset. They are part of the CRP+10 list that have backing for further playtest by Ian, Galak and Babs. They are narrowing within tier 1, but not because of any urgency. That leaves 3 teams.

With Orcs I'll have to depart from the stats. Not due to any urge to cheat or lie - but because the stats logically don't apply, and ignoring that would not make sense. The reason being that CRP+10 contains a nerf to Claw and CPOMB not reflected in these stats - and orcs are arguably hit very hard by cpomb in CRP. I can't produce hard stats for that, naturally, but in trying to consider the bigger picture I can look at the data that Koadah had collected from the Box-LRB4 play, i.e. pre-CRPClaw:
http://www.cmanu.pwp.blueyonder.co.uk/b ... stats.html

While it does not have Orcs above the 55% mark, they are the number 1 team of all.
I admit that perhaps I should have left them alone, but in trying to pre-empt a ripple effect from the cpomb nerf, I did err on the side of caution.

So, potentially a problem for NTBB. If I ever had the chance (and I most likely won't) to gather tons of data for orcs in an NTBB setting, and they did horribly, I'd be happy to give up this change. I might even consider giving that up anyway. I'm honestly not sure. One thing making me hesitate is that I don't want to unleash an overpowered team on any NTBB leagues - and that I haven't met a single coach who found the 90K pricetag on orc blitzers to be unfair. And it's not like I haven't had complaints about most things in NTBB. (Except for your complaint, which, as I understand it is a lot more abstract, and has nothing to do with 90K orc blitzers specifically).

With Slann, this is perhaps something I should have left well alone. And I originally did. You got me there. The stats don't support it. Perhaps I ought to roll that back. Or put a disclaimer up on the website. But I resent the implication that this is something cooked up by an evil old-boys club. Or by me personally.
This was suggested to me by the Daventry league. I took it to my own group, I took it to the Daventry group, and I took it here. Just like with the orc change, no one thought it was a bad idea. Nobody. Even Garion liked it, and he has no love lost for NTBB for sure.

Finally, the Amazon.
Their lifetime stats certainly look fine.
But as mentioned (way) above, the NTBB rules are based on the premise that lifetime stats are not the best indication of balance.
And while this is not very tangible, AFAIK most experienced amazon coaches would describe them as strong starters, weak finishers. And that's what put them up as candidates. That, and - as mentioned - a lot of feedback that I had missed them.
This made them a candidate for further investigation - as in the lifetime stats aren't telling the whole truth here.
So here are some stats:
FOL has them at 57.10% (4484 games)
Box has them at 59.42% (8618 games)
No other team stands out as clearly as that in both sources. So I think the stats indicate that the lifetime stats did not give the full story and do support a fix.
Shteve0 wrote:
and the fix is not a fix, it's a rewrite. Plasmoid has admitted to intentionally changing their playstyle, and for my money has got it badly wrong (but that's a different argument based on how it impacts their progression).
We've had this discussion before - but I don't think that quote very accurately portrays my thoughts on the amazon, so I'll elaborate just a little bit:
You can find 30 suggested amazon rewrites online. All with the common denominator that few people but their respective authors like them.
I most certainly did not want to open that can of worms.

So I tried to stick to what Amazons are out of the box: They are a cookie-cutter team and unique in that regard. And they are the hard-to-knock-down must-POW-to-hurt team. I wanted to stay with that.
I find 'what they are' to be more tangible than 'what they can develop into'. Unlike you, I don't think that being able to develop POMBers is in any way particularly amazonian. But lots of blodging and 6337 is still very much in their future.
But I suppose this part of the discussion doesn't really lead anywhere we haven't gone before. I'll be talking to playtesting coaches at the end of 2013 to see how this has worked out.
Shteve0 wrote:
Let's address the elephant in the room: a lot of people come across this project and believe it to be semi official in remit and how it's seen in the broader community, and you allow that impression. Fine.
I don't know what to say. I allow it? WTF?
Maybe you're projecting your own experience onto everyone else here. Your previous commish (who wrote a stirring review after season 1) may have oversold it to the league - what do I know?

But the site uses the term House Rules in the first few lines. The site also very clearly explains the level of BBRC involvement in the CRP+10 rules, and draws a very clear distinction between CRP+10 and the rosters. The BBRCers, when asked explicitly, did not find the wording misleading. Furthermore I consistently describe what I do - several times in explicit opposition to the BBRC.
And to top that off I have in forum topics repeatedly called this house rules - and I have even replied to some threads where someone has chosen a wording that might be understood to indicate that NTBB is something more.

I trust that people are able to read.
But all this discussion has definitely made me aware that I should explain the stats I've worked with openly. Then people can make up their own minds, and decide for themselves whether my decisions were ultimately warranted by the stats or not. Give me some time and I'll add it to the site. At least now you've seen my explanations - and that I disagree with a lot of the things that you say in your final barrage. And you're more than welcome to pass my explanations on to your league mates.
Shteve0 wrote:
You've said yourself [snip] that you don't wish to dedicate any time to researching actual win percentages
Geeez. How long do you think it took the first time around?
Since then I landed a full time steady job and another kid.
If someone else - yourself(?) - could be bothered to do a similar collection of actual league play data from lots of different leagues (to drown out individual super teams and super coaches) like I did the first time around, I'd be happy to include them.

For now I'm content that even if, say, dwarfs are actually not over 55% but a bit below it, the nerfs I've introduced will not blast them out of tier 1. Heck, it won't drop them below 50%. So worst case scenario I'm still narrowing tier 1.

Cheers
Martin

Re: NTBB: Stats

Post by plasmoid »

Hi all,
@VoodooMike: I'll reply to you next. Could be a week though.

@Spubbbba:
spubbbba wrote:
Well with such a small sample size you could largely ignore the stats and concentrate more on opinion.
That's the plan.
As VoodooMike has already stated, it's not like the stats will have any statistical significance anyway.
spubbbba wrote:
Oh and if Setekh is getting thick skull then I think all skeleton stars should have it.
I tried to make the changes to the stars fairly mechanical.
Setekh is based off a BlitzRa, and NTBB gives those Thick Skull.
Any star based off a skeleton lineman is not given anything, as NTBB does not create a new situation for those players.

Cheers
Martin

dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: NTBB: Stats

Post by dode74 »

Hey Martin

My own issue is that the changes purport to be based at least partially on statistics (as explained in the NTBB2013 pdf) when, in fact, they are not supported by them. Even the lines you've drawn across the stats you've listed above are misleading (I'm not suggesting intentionally), because the sample sizes don't support the suggestion that those top 3 teams, for example, are definitely above 55%. They don't even support that WE are definitely better than Lizards.
Now I believe that you are trying to narrow the tiers overall, which is fine, but you've not defined what your goals are: what will the new Tier 1 be? If you do that then changing those top 3 teams will at least be internally consistent (although the stats-based justification of buffs to Humans and Khemri does beg the question as to why there is a lack of buff for Necros). Whether any of these changes will have the desired effect, and how we measure the effect, is a different matter entirely.
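The point about Wood Elves versus Lizardmen can be made concrete with a rough sketch. The sample sizes behind the playtest table are not given in this thread, so instead of guessing them the snippet below asks the reverse question: how many games would each team need before a gap like 56.52% vs 53.16% stands out from noise? It assumes independent win/loss games, equal sample sizes for both teams and a simple normal approximation; `games_to_separate` is a made-up helper name.

```python
import math


def games_to_separate(p1, p2, z=1.96):
    """Games per team before the gap p1 - p2 exceeds z standard errors of the difference."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z * z * variance_sum / (p1 - p2) ** 2)


# Win rates taken from the table earlier in the thread (Wood vs Liza).
print(games_to_separate(0.5652, 0.5316))
```

Under those assumptions it takes on the order of 1,700 games per team just to separate those two rosters, before even asking whether either one is truly above 55%.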

Hitonagashi
Star Player
Posts: 664
Joined: Mon Mar 07, 2011 5:11 pm

Re: NTBB: Stats

Post by Hitonagashi »

Plasmoid wrote:
I don't know what to say. I allow it? WTF?
Maybe you're projecting your own experience onto everyone else here. Your previous commish (who wrote a stirring review after season 1) may have oversold it to the league - what do I know?
Minor point here, on at least 4 occasions, I've been talking to new players in the FUMBBL chat who asked "when is the new CRP rules coming to FUMBBL?", or "When are we getting the CRP+?".

All of them were under the impression that your house-rules were the next edition of the CRP that was in beta testing.

koadah
Emerging Star
Posts: 335
Joined: Fri Mar 25, 2005 5:26 pm
Location: London, UK

Re: NTBB: Stats

Post by koadah »

Hitonagashi wrote:
Plasmoid wrote:
I don't know what to say. I allow it? WTF?
Maybe you're projecting your own experience onto everyone else here. Your previous commish (who wrote a stirring review after season 1) may have oversold it to the league - what do I know?
Minor point here, on at least 4 occasions, I've been talking to new players in the FUMBBL chat who asked "when is the new CRP rules coming to FUMBBL?", or "When are we getting the CRP+?".

All of them were under the impression that your house-rules were the next edition of the CRP that was in beta testing.
CRP+ isn't NTBB. Even Christer has mentioned CRP+. If you are going to test it anywhere, Fumbbl would be a good choice.

Wanting to try the rules is not the same thing as thinking that they are official.
Some, if not most of the CRP+ changes are already coded into the client even if we cannot use them yet.

Re: NTBB: Stats

Post by plasmoid »

Hi all,
@Dode - I'll try to get back to you today.
In the meantime, didn't you recently post a graph of individual team performances including their margin of error?
Was that for my (old) stats?
If it was, would you mind posting them again here?

@Koadah/Hito: Indeed, CRP+10 and full NTBB are not the same thing.
As house rules I don't expect NTBB to somehow take over the BB world, but I'm happy to see them adopted by some, and I'm also quite happy to serve as a means of increasing awareness of CRP+10.

I don't think wanting to try CRP+10 necessarily implies that they're in any way official or destined to become official.
But I suppose I could use even stronger language on the website. Thank goodness there's a vacation coming up - maybe I can work on the site then. My wife would just love that :oops:
That said, I won't preface the site with a "These Rules Suck" jpg, and I still think it fairly rude to imply that I'm trying to conceal the truth about the rules.

Cheers
Martin

Re: NTBB: Stats

Post by dode74 »

Hey Martin

That chart was for OCC.

Re: NTBB: Stats

Post by spubbbba »

plasmoid wrote:Specifically, Necro had a spin with Golems at 100K, which didn't make it into CRP. So, removing those LRB6 stats, necro are at 44.84%. Not 60%. As I explained.
To me that is a prime example of the dangers of using an inadequate number of games to determine changes. Decreasing the cost of Golems is really minor, they are just glorified zombies and certainly not a vital player on the Necro team, unlike wolves. For a price change to turn the team from borderline tier 2 to above tier 1 is just ridiculous.

The changes to the NTBBL Orc team are greater since you normally start with 4 Blitzers and they are the key players for Orcs. And as you’ve said I don’t think that changed them much at all. Going by the stats instead of increasing the cost of blitzers we should be reducing the cost of some players since Orcs are sooooo weak.

Re: NTBB: Stats

Post by plasmoid »

Hi Spubbbba,
I know and agree. And testing 2 different line-ups for necro back in the day meant that the numbers for each were further reduced, making matters worse.
The truth - I suspect - is somewhere in between those 2 numbers.

That said, with the right roster I do think a price change could make a big difference, though by no means as big as 15%. On a roster such as necro, starting with few stability skills, the loss of a reroll on the starting roster could hurt a lot. (And I think Golems are a lot better than their reputation.)

Cheers
Martin

Re: NTBB: Stats

Post by plasmoid »

Hi all - primarily Dode (though I suspect Shteve0 and VoodooMike may be reading along also):
dode74 wrote:
My own issue is that the changes purport to be based at least partially on statistics (as explained in the NTBB2013 pdf)
You know, I read the pdf and the website, and I feel a little better now. I didn't find any claims anywhere that this was based on solid or bulletproof statistics. Didn't find any mention of statistics in fact - except that the BBRC seem to have managed to fit the tier1 teams into their description of tier 1. The language is pretty neutral for the most part - which I guess could be construed to imply absolute truth - but I do mention myself, my thoughts and actions quite a bit, implying subjectivity. So while I will revisit the website and add a few 'I think's and 'in my opinion's, I think Shteve0 pushed the description of my deceptiveness very far.

All that said, I do know that the stats are shaky. I just didn't know how shaky.
I'll set up a link and a description of the rationale behind my work, so visitors can judge the stats and decisions for themselves.
And VoodooMike is right, my grasp of statistics is fairly rudimentary.

Coincidentally Dode, do you really think the stats say absolutely nothing?
I wonder if you could/would work out a graph with margin of error for my old stats?
I also wonder what would happen if the accepted confidence level was lowered from the scientific 95%?
Last question: is the margin of error to be understood as equally probable all the way across, or is it more of a bell-curve-like thing?
(Either way, you know that the thing I posted here was a crude txt version of the stats, presenting the actual stat found and not the margins of error).
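For what it's worth, the two questions above can be illustrated with a short sketch. Under the usual normal approximation the estimate does behave like a bell curve, so values near the measured win rate are more plausible than values near the edge of the interval, and lowering the confidence level simply narrows the interval. The team below (56% over 500 games) is purely hypothetical, and `margin_of_error` is an illustrative helper rather than anything from the actual stats.

```python
import math
from statistics import NormalDist


def margin_of_error(win_rate, games, confidence):
    """Half-width of the normal-approximation interval at a given confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(win_rate * (1 - win_rate) / games)


# Hypothetical team: 56% win rate over 500 games.
for level in (0.95, 0.80, 0.50):
    m = margin_of_error(0.56, 500, level)
    print(f"{level:.0%} confidence: 56% +/- {m:.1%}")
```

The trade-off is that a narrower interval at, say, 80% confidence is also wrong more often, so lowering the bar makes the numbers look firmer without adding any information.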

Ah, be that as it may perhaps.
In my defense I will say that when I gathered these stats and decided to start on NTBB, Cyanide was still a glimpse in the milkman's eye and FUMBBL hadn't switched to CRP, so this literally was the only collection of CRP-BB statistics.
In retrospect perhaps the nihilist approach would have been better, but I decided to act on the stats, flawed as they were.
In the end I guess I trusted the stats more because they lined up pretty well with what I expected/predicted prior to collecting them. And if nothing else, the NTBB will appeal to those who, based on their own experience, feel that the stats (and the ensuing discussion) have rightly identified the überteams. I assume we all agree that the tier 2 and tier 3 teams are not disputed(?). Right?
dode74 wrote:
you've not defined what your goals are: what will the new Tier 1 be? If you do that then changing those top 3 teams will at least be internally consistent
I actually have. The website states:

With such a simple rule in place, there's no reason not to make all rosters reasonably competitive.
The team lists presented here are my shot at doing just that, first by narrowing tier 1 by essentially weakening the strongest starters. Secondly by buffing the tier 2 teams to the cusp of tier 1, and buffing the tier 3 teams to move them into the current tier 2.

And a bit further down, under the header Tier 0, I say:
But a handful of teams start out stronger than this, then fall down into the tier 1 zone in prolonged league play. In tournament play and short league play these teams are at a notable advantage - so I've introduced some minor changes to lessen their short term power without weakening their long term performance.

Notice - BTW - that I say this is my shot at this.
Anyway, the aim is to fit the current tier 0, 1 and 2 teams into the 45-55% win bracket, with tier 2 teams near the bottom.
And tier 3 teams into their own tier (2), grouped as tightly as possible around the 40% mark, leaving a significant gap up to the bottom of tier 1, and a sizeable distance to the top of tier 1.


I've been reluctant to spell out the numbers, because in the end I will never be able to generate the kind of significant data to show if this was a total success or not. I'll be happy if tier 0 teams drop into 55-45 in both short term and prolonged play, if tier 2 teams make it close to the 45% mark, and the tier 3 teams get better without crossing into tier 1 (45%).
dode74 wrote:
(although the stats-based justification of buffs to Humans and Khemri does beg the question as to why there is a lack of buff for Necros).
Yeah, I can see that in the pdf I've put Khemri and Humans in tier 2.
The website is rather more accurate, with Khemri and Humans described as "somewhat underpowered for tier 1" - and not included in tier 2(!)
I'll amend the pdf.
Bottom line is this: both the Human change and the Khemri change are not stat driven as such. Galak, Ian and Babs agreed that humans are low tier 1, but that they ought to be higher based on both the fluff and on being a box team alongside orcs. And Galak has spoken openly about how the BBRC failed when they created the current Khemri roster. So those 2 changes are not purely NTBB like the other roster changes, but are part of the CRP+. As such - as I explained to Shteve0 above - the changes did not come about like the NTBB roster changes. Which is why necro were not buffed. The stats I had did not show them to be outside of tier 1.

Now, finally, you talked elsewhere about shoring up the stats.
And that would certainly be interesting.
But perhaps ultimately futile? VoodooMike, who understands statistics a lot better than I do, seems to think so(?): I mean, with factors such as TV-difference/inducements, race-vs-race differences, variation in coaching skill, and lady luck. Perhaps in theory these factors can be taken into account, but in reality we'll be hard pressed to find 2 coaches of equal skill willing to play 100,000 games against each other :wink:

I would be interested to see stats for low-TV performance. As pure as possible (as described in the other thread).
But if we used FOL for this then we'd only be able to use teams actually starting their career in FOL, rather than teams that have joined from elsewhere (as someone said that you could in FOL).
I don't think Box stats would be useful...
It's not that individual games are TV-matched. That would be fine, and would eliminate the impact of TV-gap/inducements.
It's that the meta-game is TV-matching.

I know that I've explained this elsewhere, but I can't recall where. Here's the problem.
Arguably the 2 biggest problems with CRP are CPOMB and MinMaxing.
In a real league, MinMaxing is much less of a problem, because TV-efficiency isn't the only measure of power. In TV-matched play you can go to where you are most TV-efficient - your "sweet spot" if you will - and never fear meeting the other kind of power, namely 'total power' in the form of a huge overdog. The overdog may be less TV-efficient, but his total power could still be higher than yours. In league play, the overdog advantage encourages you to grow. In TV-matched play you are only encouraged to MinMax.

However... Hmmm... If we were only looking at Box teams for their first 10 games (against other teams in their first 10), then neither team would have had time to develop into MinMaxers (sitting at 1300TV after 70 games :roll:)

I guess stats for teams both in their first 10 games, with no more than 4(?) games apart, could be rather interesting. If VoodooMike is reading along, he can probably contribute some excellent ideas on what to do.

Now, finally, here is how I see the teams:
[The stats presented here are the total of my stats + FOL + Box, all caution thrown to the wind. In the parentheses you'll find the percentages for my stats, FOL and Box.]
Khemri got changed for design reasons.
Humans are 48.27% (46.05, 47.94, 48.87) - i.e. bottom third of tier 1, and got boosted on fluff as described above.

I think it is completely uncontroversial that the Tier 3 teams would need a boost to get close to 40%:
Ogre 27.67% (32.42, 28.14, 26.6) - 6815 games
Gobs 29.74% (36.81, 30.57, 28.0) - 7236 games
½lin 30.13% (19.21, 28.56, 30.71) - 4021 games

With tier 2 (leaving out Slann, who come in at 45.85% and who, based on the stats, ought to have been left alone), the teams also seem consistently short of the 45%'ish mark:
Vamp 40.55% (42.75, 40.58, 40.36) - 7966 games
UndW 41.56% (--.--, 42.02, 41.48) - 5419 games

And then the controversial category - Tier 0:
Orcs 47.93% (47.03, 50.23, 46.05) - 37595 games
Dwar 52.15% (56.16, 54.25, 50.71) - 28467 games
Wood 52.89% (56.52, 51.47, 53.67) - 17927 games
Unde 56.11% (56.47, 58.29, 54.21) - 21521 games
Amaz 58.01% (49.35, 57.10, 59.41) - 13424 games

Undead and Amazon are getting numbers outside tier 1 obviously.

Wood Elves, Dwarfs and Orcs are getting numbers well within tier 1.
Maybe that means they're OK.
But for all 3 I'd be very interested to learn more about their stats in their first 8-10 games, as to my eye they are classic strong starters (heck, Undead and Amazon are too - shudder).
On top of that, Dwarfs and Orcs will, I think, be getting a boost from the NTBB nerf to CPOMB, meaning that they'll recover a lot of the long term power that they lost in CRP (and in the CPOMB-heavy Box, which does dominate these numbers).
So, based on the NTBB premise (i.e. that short term power can make a team overpowered), they could still be ripe for a nerfing.

Cheers
Martin

Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
koadah
Emerging Star
Posts: 335
Joined: Fri Mar 25, 2005 5:26 pm
Location: London, UK

Re: NTBB: Stats

Post by koadah »

There are loads of reasons why this link might be no use to you.

I'm just chucking it in because the season ended today. And I had to drop a long way down the list to find any undead.

VoodooMike
Emerging Star
Posts: 434
Joined: Thu Oct 07, 2010 8:03 am

Re: NTBB: Stats

Post by VoodooMike »

plasmoid wrote:I also wonder what would happen, if the accepted margin of error was lowered from the scientific 95%?
You can't adjust the alpha until you get what you want... it's simply about the level of confidence you want to have that you're seeing an actual effect. So what we're saying is that we're 95% certain the real value is between the high and the low value (the average plus and minus the so-called margin). If you adjust it, you're simply going to be saying something like "ok, now we're 90% certain..." or "we're 75% certain..." etc etc. For social sciences 95% is standard... there's no reason to use anything else, really.
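To illustrate the point about confidence levels, here is a minimal sketch of how the margin changes when you accept less certainty. It assumes a plain normal approximation for a proportion, which treats every game as a simple win/loss (if win% counts draws as half a win, the real spread would be somewhat smaller). The function name `margin_of_error` is made up for this example, and the Amazon figures (58.01% over 13424 games) are the pooled numbers quoted earlier in the thread.

```python
from statistics import NormalDist


def margin_of_error(win_rate, games, confidence=0.95):
    """Half-width of a normal-approximation interval for a win rate.

    Assumption: each game is treated as an independent win/loss trial.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (win_rate * (1 - win_rate) / games) ** 0.5


# Pooled Amazon numbers from the table earlier in the thread.
for confidence in (0.75, 0.90, 0.95):
    moe = margin_of_error(0.5801, 13424, confidence)
    print(f"{confidence:.0%} certain: 58.01% +/- {moe * 100:.2f} points")
```

Lowering the confidence only narrows the band; it does not make the underlying data any stronger.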
plasmoid wrote:Last question, is the margin of error to be understood as of completely linear probability all the way across, or is it more of a bell curve like thing?
See above.
plasmoid wrote:But perhaps ultimately futile? VoodooMike, who understands statistics a lot better than I do, seems to think so(?): I mean, with factors such as TV-difference/inducements, race-vs-race differences, variation in coaching skill, and lady luck. Perhaps in theory these are factors that can be taken into account, but in reality we'll be hard pressed to find 2 coaches of equal skill willing to play 100,000 games against each other
You don't need two players doing it all... in fact, that would lack external validity. It's not futile to look at past statistics - especially those related to Box and to a lesser extent FOL. You don't need to control for player skill, either, when you have a large number of different players involved in the dataset. Think of it this way... let's say you didn't know the exact math to calculate how likely a given dice-involved test was to succeed. You could try it a few times and try to figure it out, or you could try it thousands of times and the various dice values would average out (in the case of 1d6, to 3.5), so the variation is actually automatically controlled for. This is the case with most factors... in the case of coach skill, you have good and bad players, and they're like the dice rolls. Across a large enough dataset, they control for themselves... they average themselves out, leaving only the truly constant effects as being consistently visible.
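The dice analogy can be tried directly. The following is only a sketch: `mean_of_rolls` is a made-up helper, and the seed is arbitrary and only there to make runs repeatable.

```python
import random


def mean_of_rolls(n_rolls, rng):
    """Average of n_rolls simulated 1d6 rolls."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls


rng = random.Random(42)  # arbitrary seed, only for repeatability
for n in (10, 100, 10_000, 100_000):
    # Small samples wander; large samples settle near the true mean of 3.5.
    print(f"{n:>7} rolls: average {mean_of_rolls(n, rng):.3f}")
```

Individual rolls stay just as random in the long run as in the short run; it is only their average that becomes stable, which is the sense in which coach skill "controls for itself" across a large dataset.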

When I said stat usage was futile, it was in reference to your testing of your changes - you won't have enough of a dataset to be meaningful. Playtesting your changes will be unlikely to provide any usable information... not only because it'd be statistically meaningless, but because the opinions would be subject to the expectancy effect, and thus would be of little discerning value too. You're certain to find what you WANT to find... not because it's true, but because the error will be so high it'll cover any numbers you might want, and because people will report "it's good!" because they know what the test is about.
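For a rough sense of what "enough of a dataset" means, here is a sketch of the usual sample-size estimate for a proportion, n = z^2 * p * (1 - p) / m^2. It rests on the same normal-approximation assumption (each game treated as an independent win/loss trial), uses the worst case p = 0.5 by default, and `games_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def games_needed(margin, win_rate=0.5, confidence=0.95):
    """Games required so the interval is at most +/- margin (as a fraction).

    Assumption: normal approximation with independent win/loss games.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z ** 2 * win_rate * (1 - win_rate) / margin ** 2)


for points in (5.0, 2.5, 1.0):
    n = games_needed(points / 100)
    print(f"+/- {points} points at 95%: about {n} games per roster")
```

Because the required number of games grows with the inverse square of the margin, halving the margin roughly quadruples the games needed.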
plasmoid wrote:I would be interested to see stats for low-TV performance. As pure as possible (as described in the other thread).
But if we used FOL for this then we'd only be able to use teams actually starting their career in FOL, rather than teams that have joined from elsewhere (as someone said that you could in FOL).
I don't think Box stats would be useful...
It's not that individual games are TV-matched. That would be fine, and would eliminate the impact of TV-gap/inducements.
It's that the meta-game is TV-matching.
None of this matters. The data is absolutely fine for this - referring to the data you want as "pure" is ignorant and prejudicial... what you want is actually much muddier as far as statistical power goes, and you're much less likely to be able to justify any changes because you'll end up with wider margins. All that matters is how teams fare at certain TVs... Box and FOL data is especially GOOD for this because it enforces TV similarity, which is exactly what you want when dealing with tier placement, or adjustment of win%s.
plasmoid wrote:In a real league, MinMaxing is much less of a problem, because TV-efficiency isn't the only measure of power. In TV-matched play you can go to where you are most TV-efficient - your "sweet spot" if you will - and never fear meeting the other kind of power, namely 'total power' in the form of a huge overdog. The overdog may be less TV-efficient, but his total power could still be higher than yours. In league play, the overdog advantage encourages you to grow. In TV-matched play you are only encouraged to MinMax.
There's no evidence that MinMaxing affects win%s in any noticeable way... in fact, while it may feel like a problem when people face a minmax team, it happens very infrequently... it's like when people complain about double or triple skulls. It sticks in their head so they imagine it happens more often than it does. The "sweet spot" idea is more myth than reality... Chaos, for example, just gets better as its TV goes up... so its sweet spot would be infinity as far as we can see. No team's win% progresses toward infinity as TV approaches zero, which means any team that doesn't get better with TV without end will appear to have a peak on their graph. It's not very meaningful.
plasmoid wrote:I guess stats for teams both in their first 10 games, with no more than 4?, games apart could be rather interesting. If VoodooMike is reading along, he can probably contribute some excellent ideas on what to do.
You and dode74 are on your own for that. I'm not going to jump through hoops with meaningless subdivision of the data just because you can't understand why it's unnecessary. My excellent idea is to stop dicking around and use the massive amount of totally valid data that exists and is relevant to what you're supposedly working on.
