NTBB: Stats

Got some ideas for rules? Maybe a skill change or something completely different!!! Tell us here.

Moderator: TFF Mods

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: NTBB: Stats

Post by plasmoid »

Hi all,
right - finally found the time to reply. Might as well jump right in:
The changes made just can't be justified on a statistical basis, is all, so claiming that they are gives them a false provenance.
Both you and VoodooMike seem to be focused on my claim to be making changes justified on a statistical basis.
I've never wanted to make such a claim.
And I do wish that the statistically minded among you would focus a bit on what is actually said on the site, as opposed to what you feel is implied.

Please note that I never make reference to any data on the NTBB site, ever.
The data I brought up in this and the other TFF thread was only because I was specifically asked what data I had based NTBB on, and I explained that this data was my starting point and that:
The data isn't perfect by any means, so as always it has been combined with thought and discussion.
VoodooMike then scolded me for doing statistics all wrong.
But I know I'm not doing statistics. And not claiming to.

Dode also pointed out that:
This means that the statement "a handful of teams start out stronger than this" from the quote above isn't justified at all by the stats that were used.
Indeed. That data was the starting point for a lot of discussion.
I'm not crazy enough to claim that those data are statistical justification.
The stats are for lifetime performance. How could they ever reveal extreme short term performance?

I know that eyeballing data and then discussing BB does not yield statistically significant results.
All I've done is to prefer discussion + (vague) 'trends in data' over discussion only.
As mentioned in this thread, I've buffed slann solely based on feedback. And my decision to nerf orcs was based partly on the widespread (in discussion) view that orcs attract newbs (being in the boxed game), as well as the assumption that the CRP+ change to the metagame would catapult orcs back up in the power hierarchy to where they were pre-CRPclaw. Nothing secret about it - you're welcome to dig out the many forum discussions I've participated in concerning NTBB.

To wit, I chose the name Narrow Tier to state my intention/aim. Not my method.
As I would have if I had created Improved Inducements rules.
And I strongly disagree that anything else is implied by the name, except perhaps to someone in a statistics-mindset.

Or as Dode said about the BBRC:
They didn't claim anything other than aiming for the tiers they were aiming for, and whether they achieved it or not is a matter of debate.
Same as me then.

Dode said:
All I'm saying is that to call it "Narrow Tier" and to claim that it "pushes the T1 races into the 45-55% bracket" heavily implies a statistical basis for the changes (i.e. that some of the race are outside that bracket) which currently doesn't exist - it's misleading.
I don't know what sloppy wording I've typed out in this discussion :oops: . But did I "claim that it pushes the T1 teams into the 45-55% bracket"?
On the NTBB site, in the (only) section you like to quote, I say that 'the BBRC seem to have managed to get all the tier 1 teams into the 45-55% win zone'. I don't think that's claiming that I have statistical justification.

I also say that (same section):
I've introduced some minor changes to lessen their short term power without weakening their long term performance
I didn't even claim that it actually gets them into the 45-55% range, even if that's what I'm hoping for.

Now, allow me at the very least to repudiate that 'statistical basis' is 'heavily implied' - or that most readers will expect that.
1. First off, anyone who has been following the development of Blood Bowl through the LRB era will know that about 99% of the changes made to Blood Bowl have been made purely on the basis of feedback and discussion. No stats. Dirty Player wasn't changed based on stats. Diving Tackle wasn't. The kick-off table wasn't. Etc.
That last 1 percent represents the late team changes made by the BBRC based on the data I collected at the time - which is hardly bulletproof data anyway (the data may be good concerning margins of error, but is hardly extensive enough to be reliably representative of the whole 'population').
BB has been forged in discussion.
That's the tradition I'm continuing.
And that's what I think most readers will expect.

Coincidentally, unlike Mike, I don't think that the LRB process has left BB in a terrible state. I think CRP is the best that BB has ever been.
You can call it "suck it and see".
I think it's more like evolution (except these "mutations" aren't completely random, they have intent behind them): Some things don't work, replacements are tried out and are either discarded or accepted.

2. Secondly, NTBB has been a work in progress since the first forum discussions in late 2009 (AFAIK).
Back then, nobody expected there to be significant stats for anything, because there weren't any!
My data sample was the first shot at data collection, and still it was small and rather insignificant.
That situation has only changed fairly recently.
The data from the Cyanide game has for a long time been very unreliable, with the 8-race start, the mass cheating and the oracle. And FUMBBL didn't cross over until the first half of 2011(?)

So, the amount of data is a fairly new situation. And still I doubt anyone can easily produce data for tabletop league play in a quantity that will be significant for statistical studies. As for the other meta-games (Box-style & Res. Tournament), there are good sample sizes to narrow the margins of error, but as for the power of the sample to represent the full population, I doubt that we have enough data yet. Though admittedly that is beyond my expertise.
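As a rough illustration of the margin-of-error point above, here is a minimal Python sketch of how the 95% margin of error on an observed win rate shrinks as the number of games grows. It assumes a plain win/loss outcome (draws ignored) and the usual normal approximation; the function name `margin_of_error` and the game counts are made up for the example, not taken from any data set mentioned in this thread. Note that it only covers sampling error. It says nothing about whether the games sampled are representative of the whole population, which is the separate concern raised above.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% interval for a proportion.

    p: observed win rate as a fraction (assumed win/loss only, no draws)
    n: number of games in the sample
    z: normal quantile; 1.96 corresponds to roughly 95% coverage
    """
    return z * math.sqrt(p * (1.0 - p) / n)

# Hypothetical sample sizes, all with an observed 50% win rate.
for n in (100, 1000, 10000):
    print(n, "games: +/-", round(100 * margin_of_error(0.5, n), 1), "points")
```

With only a hundred games the uncertainty is about as wide as the whole 45-55% bracket, which is why small samples cannot place a team inside or outside it.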

Now, all that said, as the number of history-less newbies joining BB increases, while the amount of data slowly builds, I suppose that the assumption of 'statistical justification' will gradually become more frequent.
As a result, I've already stated that I'll rework the language on the site to make it abundantly clear that these house rules are based on discussion, on my perception forged in discussion, and that they are for anyone who shares that perception.
I've already reworked the pdf, and I'll update it when I also have the rewritten site finished.


So, what does that mean for the future of NTBB?
In itself, not a lot, I suspect.
But this whole discussion has made me more aware of the data that exists.
And it has also improved my ability to work with the data.
So based on this, I think there will be a few unexpected changes to NTBB2014.
And when that happens, you can be sure that I'll be explaining the statistical background, and what it does and does not mean.
Big fat sticker on the front. An actual claim. :orc:

More on that in a future forum post :D
Cheers
Martin

Narrow Tier BB? http://www.plasmoids.dk/bbowl/NTBB.htm
Or just visit http://www.plasmoids.dk instead
dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: NTBB: Stats

Post by dode74 »

I've never wanted to make such a claim.
And I do wish that the statistically minded among you would focus a bit on what is actually said on the site, as opposed to what you feel is implied.
The use of tiers, narrowing, your use of data earlier in the thread (page 1), and your talk of tier 0 heavily implies a statistical basis. Don't get me wrong: this isn't a criticism of NTBB as a whole; it's a criticism of the invalid use of statistics to "shore up" what is basically a set of house rules. That's where the phrase "lies, damned lies and statistics" comes from.
Please note that I never make reference to any data on the NTBB site, ever.
No, but you did quote tier ranges and state that some teams started outside them. That's a statistical claim.
Or as Dode said about the BBRC:
They didn't claim anything other than aiming for the tiers they were aiming for, and whether they achieved it or not is a matter of debate.
Same as me then.
Not really. As I said above, you used "some teams start outside tier 1" as your start point, whereas I believe they started from "what do we feel needs changing" probably based on feedback. The difference is qualitative: your claim is that "X can be seen to be wrong, therefore we can change X", whereas theirs is "we think X is wrong, we have the remit to change it, so we're going to". I'm not saying either method is better, but that they did what they said they did (rightly or wrongly), whereas your starting premise can be seen to have no factual basis. I've got no issues with you trying to continue a tradition of "suck it and see" evolution, but if there is to be a claim that the "sucking" is statistically based in any way then there needs to be a solid argument behind it.

We (you and I) have been over most of this ground before. The main aim of my last few posts was to respond to garion and koadah. I know you are actually doing some learning and making some changes here, and I look forward to seeing where it goes.

VoodooMike
Emerging Star
Posts: 434
Joined: Thu Oct 07, 2010 8:03 am

Re: NTBB: Stats

Post by VoodooMike »

plasmoid wrote:Both you and VoodooMike seem to be focused on my claim to be making changes justified on a statistical basis.
I've never wanted to make such a claim.
A: Excuse me, are you the owner of "Zeke's Fix Ur Car or Ur Money Back Emporium?"
Z: Yep, that's me.
A: My car isn't fixed, and I'd like my money back.
Z: What's this now?
A: You didn't fix my car, so I want my money back.
Z: I never said you'd get your money back if it didn't get fixed.
A: It's right there on the sign!
Z: That's just the name of the shop. I'm sorry you feel it was implied that you'd get your money back, but that's more your problem than mine.

The tiers are statistical constructs. Your project is called Narrow Tier BB, with the title "Narrowing the Tiers" right at the top, and two of the three design goals are about the tiers:
plasmoid wrote:2) The NTBB aims for a Narrowing of tier 1. The CRP+ already does this by buffing the underwhelming Human and Khemri team, and NTBB follows up by reining in the 5 most popular Tier 1 teams.
3) The final part of NTBB is slightly more controversial. - Narrowing the gap between the tiers. NTBB introduces buffs to the tier 2 and tier 3 teams, in order to push the tier 2 teams (Vampire, Underworld, Slann) to the cusp of tier 1, and to improve the tier 3 teams (Goblin, Halfling, Ogre) enough to make them more viable - without making them equal to proper tier 1 teams.
...
plasmoid wrote:And I do wish that the statistically minded among you would focus a bit on what is actually said on the site, as opposed to what you feel is implied.
The name of your project, and two of your three design goals are centered on performance statistics, yet you think it's a silly assumption on our part that you might possibly have looked at some statistics to decide the changes were needed (since, for all you know, everything is squarely at 50% already) or that you might use numbers to guide your actions. Seems to me we can read a whole lot better than you can think, thus far.
plasmoid wrote:Please note that I never make reference to any data on the NTBB site, ever.
The tiers are lifetime performance standards for teams - they're a stated range of win% based on the win% formula, that general team performance is supposed to fall into. To even start talking about the tiers, or any alteration to them, you have to first be implying that the teams don't already fall into the ranges you're seeking to create. Such an assertion is utterly nonsensical if you have no data to say they do not. This isn't rocket science, even for someone like you. At the bare minimum, you're using your own feelings as "data" since you have to, for some reason, believe that the tiers aren't ALREADY exactly the way you want them to be. It's a crappy, unreliable source of data, mind you, and one that is prone to being torn apart by ACTUAL data, but there's some sort of deterministic process involved somewhere.
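To make the "stated range of win%" idea concrete, here is a minimal Python sketch. The post does not spell out the win% formula, so the sketch assumes the common convention of counting a draw as half a win; the 45-55% bracket is the tier 1 range quoted in this thread. The function names and the sample record are hypothetical.

```python
def win_percent(wins, draws, losses):
    """Win% for a record, assuming a draw counts as half a win."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return 100.0 * (wins + 0.5 * draws) / games

def in_bracket(pct, low=45.0, high=55.0):
    """True if a win% falls inside the bracket (default: the 45-55% tier 1 range)."""
    return low <= pct <= high

# Hypothetical record: 10 wins, 4 draws, 6 losses.
pct = win_percent(10, 4, 6)
print(pct, in_bracket(pct))
```

This only classifies an observed figure. Whether that figure is a trustworthy estimate of a roster's lifetime performance depends on how many games stand behind it.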
plasmoid wrote:VoodooMike then scolded me for doing statistics all wrong.
But I know I'm not doing statistics. And not claiming to.
Because you do statistics wrong... and you keep trying to refer back to a set of numbers that have no meaning, because you seem incapable of wrapping your head around why they have no meaning. You ARE trying to use statistics, you just don't know how, and when you get frustrated with that fact you claim that the demand for numbers is baseless.
plasmoid wrote:On the NTBB site, in the (only) section you like to quote, I say that 'the BBRC seem to have managed to get all the tier 1 teams into the 45-55% win zone'. I don't think that's claiming that I have statistical justification.
Let's take a look at what NTBB says there:
NTBB wrote:The BBRC seem to have managed to get all the tier 1 teams into the 55-45% win zone that they wanted. But a handful of teams start out stronger than this, then fall down into the tier 1 zone in prolonged league play. In tournament play and short league play these teams are at a notable advantage - so I've introduced some minor changes to lessen their short term power without weakening their long term performance.
Saying they seem to have gotten them into that range implies you have data. Saying that some start out stronger than that implies you have data. Saying they drop down in longer term play to that range implies you have data. To say roster A needs a buff and roster B needs a nerf, to achieve your vision, implies you have data to that effect.
plasmoid wrote:Coincidentally, unlike Mike, I don't think that the LRB process has left BB in a terrible state. I think CRP is the best that BB has ever been.
But made better by you changing stuff at random, apparently. I'll point out that I've never said anything about CRP being worse than any previous version of the game, or that the game has been left in a terrible state - I did, however, say that their eyeballing process has led the game to be something people like you think needs your expert touch to make work properly, and their method was the same as your method.

Your argument seems to be "Well, none of my friends finished high school, so nobody ever assumed this sort of thing in the past..." - that's less an issue with people who have reasonable expectations of statistical justification and more an issue with you having been blowing smoke up the asses of people who don't know any better, for several years. Governments are pretty grumpy about the internets for the same reasons too... easy fact checking is a bitch when you're full of crap.

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: NTBB: Stats

Post by plasmoid »

Hi Mike,
I can't help but notice that you manage to fit personal insults into the majority of your posts to me, Garion, Darkson and Koadah.
I can't see any real rhetorical purpose to it, but it's rather annoying, and I'm assuming you're doing it just to masturbate your ego.
Oh dear, look what you made me do. I believe the adage says you'll now proceed to beat me at it.

Anyway, I'm glad you said this:
At the bare minimum, you're using your own feelings as "data" since you have to, for some reason, believe that the tiers aren't ALREADY exactly the way you want them to be. It's a crappy, unreliable source of data, mind you, and one that is prone to being torn apart by ACTUAL data, but there's some sort of deterministic process involved somewhere.
I'd say I'm using online discussion as my "data". Crappy as it may be.
My objection is not to your and Dode's description of that data.
My objection is to the accusation that I'm implying that I have a different kind of data than the one I have.
Dode says that any statistical statement requires data.
Fine. Discussion is my data. Alongside the bits of hard data of negligible statistical importance.

That's why I don't expect that the data will win anybody over. And that's why I don't use it as an argument on the site.
I expect anyone who is inclined to try NTBB will do so not because I convinced them with data, but because their own experience of the game matches mine. Terribly unfounded as their experience too may be.

Anyway, as I said I'll work on the language to make it clearer. And I'll start looking at hard data, because that's available now - and I'm interested to see how that can improve NTBB.
But made better by you changing stuff at random, apparently.

Weak as discussion may be as a data source, it isn't random.
For instance I'd be surprised to find that any of the tier 3 teams are already lounging close to 40%, where I'd like them to be.
Even if I'm just going on observation, discussion and negligible data.

Cheers
Martin

dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: NTBB: Stats

Post by dode74 »

My objection is to the accusation that I'm implying that I have a different kind of data than the one I have.
Tiers are, as Mike rightly says, a statistical construct: they are defined mathematically with reference to numerical data. You're implying that you have numbers which show some teams are performing differently to the tiers (see Tier 0 teams) and are justifying changes on that basis.
"Discussion" is not a data source with which you can make any reasonable assessment of or adjustment to "tiers". Discussion is ideas, tiers is numbers. No problem at all with this or any other set of rules which uses ideas as a basis: hell, the idea is to have fun, after all. It's when that same set of rules then claims a statistical basis for changes which isn't there which is the issue.

Are you now saying that there is no statistical basis for NTBB? How about a limited statistical basis? If it is a limited statistical basis then what are those limits and how do you justify them?

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: NTBB: Stats

Post by plasmoid »

Right, modified text on the site is up:
http://www.plasmoids.dk/bbowl/NTBB.htm
I hope this will hold up for the 2013 version.
Tiers are, as Mike rightly says, a statistical construct
Statistical construct makes them sound like abstract math. But they're not. They're quantified observations about reality.
And much as it is a weak source of data, observations and discussion can be made concerning this reality.
Weak comparisons can also be made about this reality - for example I feel very confident that gobbos, halflings and ogres are weaker than the "tier 1" teams, even without any hard data for it.

Say I glanced at a bowl with 20 apples. I could say that "more than half of those apples are red", and it would be a meaningful statement, even if I hadn't counted them. I might feel more confident if I had asked other people, who had also glanced at the bowl about it. But I/we could still be wrong.
Discussion is ideas, tiers is numbers.
Discussions aren't just ideas.
There has been lots of discussion about power in BB.
I know they're tainted with observational bias. But observational bias is not the end of all conversation.
For example, I know I can find a lot more threads about dwarfs being broken than about Nurgle being broken - and that's hardly just down to observational bias. That's not proof that dwarfs are broken. But I have no doubts that they're a better starting team than Nurgle.

Cheers
Martin

VoodooMike
Emerging Star
Posts: 434
Joined: Thu Oct 07, 2010 8:03 am

Re: NTBB: Stats

Post by VoodooMike »

Plasmoid wrote:Statistical construct makes them sound like abstract math. But they're not. They're quantified observations about reality.
Wrong as usual. The tiers are based on the intended win% for a given roster meaning they are a statistical estimation of future outcomes... incorrect estimations in many cases, but they're certainly not quantified observations about reality; that would imply that we were being given a sample mean rather than a population estimate based on samples. When you attempt to predict population means from a sample, you give the CIs specifically because you don't have population data, you're using sample data to estimate population data, and variation will put it somewhere in that range.
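To illustrate the distinction between a sample mean and a population estimate with CIs, here is a minimal Python sketch. It scores each game 1 / 0.5 / 0 (assuming a draw counts as half a win, which the thread does not spell out), takes the sample mean, and puts an approximate 95% interval around it using the standard error and a normal approximation. The function name `win_rate_ci` and the record used are hypothetical, not real league data.

```python
import math
import statistics

def win_rate_ci(wins, draws, losses, z=1.96):
    """Approximate 95% CI for a roster's long-run win rate, as a fraction.

    Each game is scored 1 (win), 0.5 (draw) or 0 (loss); the draw weighting
    is an assumed convention. Needs at least two games.
    """
    scores = [1.0] * wins + [0.5] * draws + [0.0] * losses
    n = len(scores)
    mean = statistics.mean(scores)                  # the sample mean
    sem = statistics.stdev(scores) / math.sqrt(n)   # standard error of that mean
    return mean - z * sem, mean + z * sem

# Hypothetical sample of 100 games: 45 wins, 20 draws, 35 losses.
low, high = win_rate_ci(wins=45, draws=20, losses=35)
print(f"sample mean 55%, population estimate {100 * low:.1f}% to {100 * high:.1f}%")
```

A roster can only be said to sit outside the 45-55% range with this level of confidence when the whole interval lies outside it; an interval that straddles a boundary leaves the question open.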
Plasmoid wrote:And much as it is a weak source of data, observations and discussion can be made concerning this reality.
It isn't a serious form of data - it's something that people who don't know any better use as data. Humans, as an instinctive part of their existence, perform very weak statistical analysis on their environment and experiences. That's swell and all, but it is ridiculously bad compared to formal forms of analysis, in the same way that you and a buddy can probably push a car, but a tow truck will do the job so much better that the two are almost not worth comparing. You're arguing from a very, very stupid standpoint.
Plasmoid wrote:Weak comparisons can also be made about this reality - for example I feel very confident that gobbos, halflings and ogres are weaker than the "tier 1" teams, even without any hard data for it.
I'm absolutely confident my broken clock will be right twice a day, and equally confident that that fact does not support my clock being a meaningful tool for telling the time.
Plasmoid wrote:Say I glanced at a bowl with 20 apples. I could say that "more than half of those apples are red", and it would be a meaningful statement, even if I hadn't counted them. I might feel more confident if I had asked other people, who had also glanced at the bowl about it. But I/we could still be wrong.
Meaningful to you, maybe. In reality your brain did do rough quantification based on how much of each colour it remembers seeing. It's really bad compared to actually counting the apples, but it worked ok for nomadic hunters and such. Since we don't live in caves anymore or fend off saber-tooth tigers, we can use data and calculators and such.
Plasmoid wrote:Discussions aren't just ideas.
There has been lots of discussion about power in BB.
I know they're tainted with observational bias. But observational bias is not the end of all conversation.
For example, I know I can find a lot more threads about dwarfs being broken than about Nurgle being broken - and that's hardly just down to observational bias. That's not proof that dwarfs are broken. But I have no doubts that they're a better starting team than Nurgle.
Well that was a whole lot of nothing. I can find you a lot of threads about the RNG being broken, too. Think that makes it more true? A person's guesstimation of a fact is not data on anything but people's guesstimations. Opinions are not objective data on anything but opinion. Full stop.

plasmoid
Legend
Posts: 5334
Joined: Sun May 05, 2002 8:55 am
Location: Copenhagen

Re: NTBB: Stats

Post by plasmoid »

Hi Mike,
you misunderstand me if you think that I'm claiming that discussion (based on observation) is comparable in quality to statistical analysis.
I was merely responding to Dode, who says that statements about a "statistical construct" must be based on hard data to not be devoid of meaning.

My point is that the statistical construct relates to tangible reality via observation, and that as such it is possible to make (ill-founded) comparative observations/statements about reality even without hard data. And that's what I did because:
a) nobody, (not just my college buddies, as you so glibly remarked), had applied proper statistics to BB in the past.
b) when NTBB started, there were only negligible quantities of data available.
(and c) had I wanted to, I wouldn't have known how).

So I'm not saying that statistics isn't a superior method.
Cheers
Martin

dode74
Ex-Cyanide/Focus toadie
Posts: 2565
Joined: Fri Jul 24, 2009 4:55 pm
Location: Near Reading, UK

Re: NTBB: Stats

Post by dode74 »

Dode, who says that statements about a "statistical construct" must be based on hard data to not be devoid of meaning.
No I didn't. I said that the tiers were a statistical construct. They are defined mathematically. Numerical data is used to assess them. I am saying that the statement "team X is outside tier 1" is unsubstantiated without numerical data to support it, no matter how many people, reputable or otherwise, think it is the case.
