Analysis of NAF statistics

Steam Ball · Post by **Steam Ball** » Wed Mar 08, 2017 9:15 pm

AndeeT wrote:(EDIT for table jpg. I can't format forum posts for toffee!)

Write the table using a monospace font in another editor, then copy-paste it and wrap it in code tags.

1 - 2 - 3
a . b . c
i   ii  iii

AndeeT · Post by **AndeeT** » Wed Mar 08, 2017 11:44 pm

Thanks for the discussion, everyone. I have learnt a lot here. And thank you, Steam Ball, for the lesson on formatting posts; I will give that a go later.

The points around how to measure success are extremely interesting. Conceptually, a draw is hard for me to define in this way. It kind of comes down to glass half full or half empty, no? Some will see a draw as a bad thing, others as a good one. Of course, this problem is not specific to Blood Bowl; it applies to any game where the outcome is trinomial. A few reasons I can think of why it's useful to consider 'success' to be (wins+(draws/2)):

1. It obviously has a long standing in American sports...(never came across it here in the U.K. until now though), so there is a precedent for it at least.
2. Blood Bowl can often end up in a draw
3. In terms of ranking teams, you want a system that discriminates between teams so that you don't end up with too many tied ranks. Taking (draws/2) into account should (I think) increase your discriminating power over straight wins.
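
As a minimal sketch of the (wins+(draws/2)) measure and of point 3, here is a small Python example. The team names and records are hypothetical, made up purely for illustration; they are not NAF data.

```python
def win_pct(wins: int, draws: int, losses: int) -> float:
    """Win% counting each draw as half a win: (wins + draws/2) / games."""
    games = wins + draws + losses
    if games == 0:
        raise ValueError("no games played")
    return 100.0 * (wins + draws / 2) / games


# Hypothetical (wins, draws, losses) records, for illustration only.
records = {
    "Team A": (5, 0, 5),
    "Team C": (5, 2, 3),
}

# Ranking on straight wins ties these two teams (5 wins each);
# counting draws as half a win separates them.
for name, (w, d, l) in records.items():
    print(f"{name}: wins={w}, win%={win_pct(w, d, l):.1f}")
```

The two teams tie on straight wins but not on win%, which is the extra discriminating power point 3 refers to.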

Dode - you mentioned on the first page of the thread that low TV bias could confound comparisons. I know that the NAF data can be split into three different TR categories (100, 110, 120), so I was considering standardising for TR. Has this been done before? Or, in the case of FUMBBL, standardising for TV?
If we are considering total games at all TRs, the TR structure per race could vary a lot, so it certainly seems reasonable to look into. Either that or produce a separate analysis for each TV range.

The NAF is already stratified into those 3 categories; what TV bands would you suggest for e.g. FUMBBL data?

Oh, and thanks for this from page 4 of the thread;

"I think, therefore, that the "points gained" proportion is (wins + draws/3)/n and the "points not gained" proportion is (losses + draws*2/3)/n, where n is games played."

The formulae check out. And it helps my understanding to dichotomise into 'gained' and 'not gained'; think that's where I was going wrong earlier with the whole 'w/l/d ≠ 1' thing.
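
A quick way to check the quoted formulae is to compute both proportions exactly and confirm they sum to 1. This sketch assumes the draws/3 term comes from a 3/1/0 points scheme (win/draw/loss), which is my reading of the quote rather than something stated in it; the record used is hypothetical.

```python
from fractions import Fraction


def points_split(wins: int, draws: int, losses: int):
    """Return (gained, not_gained) proportions of available points.

    gained     = (wins + draws/3) / n
    not_gained = (losses + draws*2/3) / n
    Fractions are used so the check is exact, not subject to rounding.
    """
    n = wins + draws + losses
    if n == 0:
        raise ValueError("no games played")
    gained = Fraction(3 * wins + draws, 3 * n)
    not_gained = Fraction(3 * losses + 2 * draws, 3 * n)
    return gained, not_gained


# Hypothetical record: 5 wins, 2 draws, 3 losses.
gained, not_gained = points_split(5, 2, 3)
print(gained, not_gained, gained + not_gained)
```

Because every draw is split 1/3 to 'gained' and 2/3 to 'not gained', the two proportions always add up to exactly 1, which a three-way w/l/d split does not give you directly.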

Best Wishes,

AndeeT

dode74 · Post by **dode74** » Thu Mar 09, 2017 7:23 am

The NAF data is stratified for bands in which most(?) tournaments take place. There is some variance in this: some 110 tournaments include skills in that 110, while others do not and add a skill pack on top: this will add a confound.

The problem with attempting to stratify by TV for MM etc. is that the number of games occurring at higher TVs is so low that the 95% CI ranges become very large. "Most" people want to know about high TV data (meaning 2k+), but sufficiently useful sample sizes aren't there for lots of the races.

AndeeT · Post by **AndeeT** » Fri Mar 10, 2017 1:23 am

Hi,

Thanks dode. I will have to think a bit harder about how to stratify the FUMBBL data by TV (currently looking at ranked data), as the low numbers will mess with standardisation. I was thinking of the following TV bands for standardisation; 900-1200, 1200-1500, 1500-1800, 1800-2100, 2100-2400. As you say, above 2000 numbers start dropping off rapidly, but with TV bands this wide, as long as I exclude Underworld, Goblin, Halfling from the analysis, small numbers shouldn't be an issue.

I did some work looking into standardising the Win% (wins+(draws/2)) NAF data by TR (TR100, TR110, TR120, as presented in the breakdown at http://naf.talkfantasyfootball.org/). The assumption was that the differing TR values may confound comparison between races if races had different ratios of games played at TR100:TR110:TR120. Standardising the data by TR allows you to compare races even if the relative numbers of games played at each TR level are different. Please see table below. As you can see, taking TR out of the equation did not appreciably change win%. This means that most teams have very similar ratios of games played at TR100:TR110:TR120 in the NAF, so it shouldn't be too much of a confounding factor when comparing races in NAF data. Of course, we can only standardise by what is in the data; as dode mentioned, skill pack upgrades could also be confounding, but I can't standardise for this as it is not recorded in the data.

         RACE,  WIN%,  STANDARDISED, DIFFERENCE
       UNDEAD,  56.3,          56.3,        0.0
     WOOD ELF,  55.5,          55.4,       -0.1
    LIZARDMEN,  53.8,          53.8,        0.0
       AMAZON,  53.6,          53.6,        0.0
   DARK ELVES,  53.1,          53.0,       -0.1
CHAOS DWARVES,  52.5,          52.5,        0.0
      DWARVES,  52.1,          52.1,        0.0
        NORSE,  51.9,          51.8,       -0.1
       SKAVEN,  51.3,          51.4,        0.1
  NECROMANTIC,  51.1,          51.2,        0.1
        ELVES,  50.5,          50.6,        0.1
          ORC,  48.1,          48.1,        0.0
   HIGH ELVES,  48.1,          48.0,       -0.1
       KHEMRI,  48.0,          48.1,        0.1
   CHAOS PACT,  47.6,          47.6,        0.0
        SLANN,  46.4,          46.2,       -0.2
        HUMAN,  46.0,          46.0,        0.0
   UNDERWORLD,  45.1,          45.1,        0.0
       NURGLE,  44.5,          44.4,       -0.1
        CHAOS,  43.7,          43.7,        0.0
     VAMPIRES,  43.3,          43.4,        0.1
    HALFLINGS,  34.5,          34.3,       -0.2
      GOBLINS,  32.4,          32.4,        0.0
        OGRES,  31.7,          31.4,       -0.3

With this in mind, I produced a 'dode graph' of the standardised data. Please see below;

[Image: TR.png]

For details on standardisation please see page 9 of the following PDF http://www.apho.org.uk/resource/view.aspx?RID=48457
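
For readers unfamiliar with the idea, here is a minimal sketch of direct standardisation in general: each race's TR-specific win% is re-weighted by a common 'standard' mix of games across TR bands, instead of by that race's own mix. Whether this matches the PDF's method step for step is an assumption on my part, and all the numbers below are hypothetical, not NAF figures.

```python
def weighted_rate(stratum_rates: dict, weights: dict) -> float:
    """Weighted average of per-stratum rates, using the given weights."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights sum to zero")
    return sum(stratum_rates[s] * weights[s] for s in weights) / total


# Hypothetical win proportions for one race at each TR band.
race_rates = {"TR100": 0.55, "TR110": 0.50, "TR120": 0.45}

# Hypothetical games this race played at each TR band (its own mix).
race_games = {"TR100": 100, "TR110": 200, "TR120": 100}

# Hypothetical games played by ALL races at each TR band (the standard mix).
standard_games = {"TR100": 2000, "TR110": 1000, "TR120": 1000}

crude = weighted_rate(race_rates, race_games)             # race's own mix
standardised = weighted_rate(race_rates, standard_games)  # common mix

print(f"crude={crude:.4f} standardised={standardised:.4f}")
```

If every race had the same TR mix as the standard, the crude and standardised figures would coincide, which is consistent with the near-zero DIFFERENCE column in the table above.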

I am also working on a similar thing looking at standardising between LRB4/5/6 of the NAF data.

Please let me know your thoughts.

EDIT - forgot to say, the lines on the graph represent the ± 95% confidence interval for that race's win% (calculated using the method on page 9 of the PDF linked above).
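
As an illustration of what such an interval looks like, here is a sketch of a 95% Wilson score interval for a proportion. This is one standard method and is not necessarily the one on page 9 of the PDF; treating (wins + draws/2) as a count of binomial 'successes' is also an approximation. The counts used are hypothetical.

```python
import math


def wilson_interval(successes: float, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical: the same 56.3% win rate observed over 1,000 vs 100 games.
wide = wilson_interval(56.3, 100)
narrow = wilson_interval(563, 1000)
print(f"n=100:  {wide[0]:.3f} to {wide[1]:.3f}")
print(f"n=1000: {narrow[0]:.3f} to {narrow[1]:.3f}")
```

The interval shrinks as the number of games grows, which is why races with few recorded games have much longer lines on the graph.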

Best Wishes,

AndeeT

CyberedElf · Post by **CyberedElf** » Fri Mar 10, 2017 6:01 pm

AndeeT wrote:First half of 95% confidence interval comparison table;

[Image: Confidence_Int_1.jpg]

There is a different formula for comparing two proportions. You can say with 95% confidence that two proportions are different even if their 95% CIs overlap.
https://www.cscu.cornell.edu/news/statnews/stnews73.pdf provides the formula.
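
To illustrate the point about overlapping intervals, here is a sketch using the simple normal-approximation (Wald) interval for the difference between two independent proportions. I am assuming this is the kind of formula meant; it has not been checked against the linked PDF, and the win rates and game counts are hypothetical.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96):
    """Normal-approximation CI for a single proportion."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def diff_ci(p1: float, n1: int, p2: float, n2: int, z: float = 1.96):
    """Normal-approximation CI for p1 - p2 (independent samples)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se


# Hypothetical races: 55% and 50% win rates over 1,000 games each.
ci_a = wald_ci(0.55, 1000)
ci_b = wald_ci(0.50, 1000)
ci_diff = diff_ci(0.55, 1000, 0.50, 1000)

overlap = ci_a[0] < ci_b[1]          # lower end of A is below upper end of B
diff_excludes_zero = ci_diff[0] > 0  # the difference is significant at 5%

print(ci_a, ci_b, ci_diff, overlap, diff_excludes_zero)
```

In this example the two individual intervals overlap, yet the interval for the difference excludes zero. That happens because the standard error of a difference is smaller than the sum of the two individual standard errors.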

AndeeT · Post by **AndeeT** » Sat Mar 11, 2017 2:50 am

Hi CyberedElf,

Thank you for the PDF and interesting that you mention T-tests and confidence intervals for differences between samples. I used to do T-tests all the time but have never looked into the CI for differences in proportions or means, so that was educational for me. I think these are great for when you set up an analysis specifically looking at a difference between two races. However, I don't know if I would condone this approach for comparing all the races we have.

It is very tempting to calculate these for each of the comparisons between races. However, that is a lot of comparisons! Considering only the overlapping pairs of confidence intervals from the graph/tables I posted earlier, there are 95 unique pairs that overlap. Each additional comparison you make increases the chance of finding a difference purely by chance (type 1 family-wise error).

Our alpha=0.05 for making those confidence intervals, and we are making 95 comparisons. Family-wise error for this =1-(1-0.05)^(95) = 0.99! i.e. it is almost 100% certain that we will make an error and find significant differences when there are none.
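
The arithmetic above can be reproduced with a few lines of Python. Note that the formula 1-(1-alpha)^m assumes the comparisons are independent, which pairwise comparisons sharing the same races are not, so treat the result as a rough guide. The Bonferroni threshold is shown only as one common correction that controls family-wise error; it is not something used in this analysis.

```python
from math import comb


def family_wise_error(alpha: float, m: int) -> float:
    """Chance of at least one false positive in m independent tests."""
    return 1 - (1 - alpha) ** m


alpha = 0.05
m = 95  # overlapping pairs counted from the graph/tables

fwer = family_wise_error(alpha, m)

# Bonferroni: test each comparison at alpha/m to keep the family-wise
# error at or below alpha.
bonferroni_alpha = alpha / m

# Number of unique pairs among the 24 races in the table.
all_pairs = comb(24, 2)

print(f"FWER={fwer:.3f} per-test alpha={bonferroni_alpha:.5f} pairs={all_pairs}")
```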

For the moment I am happy with the conservative method of just using 95% CIs: when they don't overlap, that indicates significance, and when they do, we don't really know. There are tests that control for family-wise error, and things like Analysis of Variance (though I've only ever seen that applied to around 6 comparisons, so it might still fall down when we get to 276 pairwise race comparisons??), but I need a few hours' sleep and a strong coffee before I start looking into those.

While I am here, though, I have a few questions regarding FUMBBL data (http://fumbbldata.azurewebsites.net/stats2.html):
1. How often is data refreshed/updated?
2. What does the 'mirror' button do? What are mirrors? (no silly answers please)
3. Is there a method to extract data for specific, non-overlapping TV bands? The bands I would like are 900-1199, 1200-1499, 1500-1799, 1800-2099, 2100-2399. However, I think at the moment I may be double counting some matches, as the filters on the FUMBBL website do not allow you to be explicit. Any tips?
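
If per-match TV values can be obtained somehow (an assumption; the site's filters may not expose them), assigning each match to exactly one band is straightforward. This sketch uses half-open intervals so that no match can be counted twice, and assumes the first band is meant to run to 1199. The match TVs are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Band edges: each band is [lower, next_lower), so bands cannot overlap.
EDGES = [900, 1200, 1500, 1800, 2100, 2400]


def tv_band(tv: int):
    """Return a label like '1200-1499', or None if TV is outside all bands."""
    if tv < EDGES[0] or tv >= EDGES[-1]:
        return None
    i = bisect_right(EDGES, tv) - 1
    return f"{EDGES[i]}-{EDGES[i + 1] - 1}"


# Hypothetical match TVs, including values sitting exactly on band edges.
match_tvs = [899, 900, 1199, 1200, 1499, 1500, 2399, 2400]

counts = Counter(tv_band(tv) for tv in match_tvs)
print(counts)
```

A TV of exactly 1200 lands only in 1200-1499, which is the behaviour needed to avoid double counting at the boundaries.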

Best Wishes

AndeeT

Darkson · Post by **Darkson** » Sat Mar 11, 2017 7:08 am

AndeeT wrote:2. What does the 'mirror' button do? what are mirrors? (no silly answers please)

Mirror matches are where the same race plays against itself (i.e. Chaos vs Chaos, WE vs WE etc.), and they tend to pull stats towards 50% (as any given mirror match will result in either a W/L or a D/D).
The button you mentioned removes mirror games so it's always race vs a different race.

Pretty certain you can't remove these games for the NAF stats, and the BBRC never excluded them anyway.
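
A small sketch of why mirror matches pull a race's figure towards 50%: every decisive mirror game adds one win and one loss to the race's record, and every drawn one adds two draws, so the mirror games on their own always come out at exactly 50%. The record below is hypothetical.

```python
def win_pct(wins: int, draws: int, losses: int) -> float:
    """Win% counting each draw as half a win."""
    return 100.0 * (wins + draws / 2) / (wins + draws + losses)


def add_mirrors(record, decisive: int, drawn: int):
    """Add mirror games to a race's (wins, draws, losses) record.

    Both teams are the same race, so each decisive game contributes
    one win AND one loss, and each drawn game contributes two draws.
    """
    w, d, l = record
    return (w + decisive, d + 2 * drawn, l + decisive)


# Hypothetical record against other races only.
non_mirror = (600, 100, 300)
with_mirror = add_mirrors(non_mirror, decisive=200, drawn=50)

print(f"without mirrors: {win_pct(*non_mirror):.1f}")
print(f"with mirrors:    {win_pct(*with_mirror):.1f}")
```

The more mirror games a race plays, the further its combined figure moves towards 50%, regardless of how strong the race actually is.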

plasmoid · Post by **plasmoid** » Sat Mar 11, 2017 8:07 am

Hi AndeeT,
I did some preliminary data work a few years ago using the same site as you.
Oh, and this nice NAF site: http://naf.talkfantasyfootball.org/ - which unfortunately does not double-sort, but you can at least sort by "LRB6".

1. How often is data refreshed/updated?

I believe updating is automated/continuous.

2. What does the 'mirror' button do? what are mirrors? (no silly answers please)

It removes matches between 2 teams of the same race, as this is empty data that pulls the data towards 50%.
Removing them is fine if you want to know how much a team wins, but contentious if you want to say anything about "balance" or "tiers" as the BBRC did not exclude mirror matches when they defined the tiers (as Darkson has already hinted).
It is perhaps worth noting that the original data that the BBRC worked with when they defined the tiers did not include the option to remove mirror matches, so perhaps this is why they chose to include data which is not only meaningless, but clouds the meaning. Or perhaps they wanted to include it in their definition, as these games are indeed played in the meta. Nobody has ever bothered to comment, AFAIK.

3. Is there a method to extract data for specific, non-overlapping TV bands. Bands I would like are - 900-1199, 1200-1499, 1500-1799, 1800-2099, 2100-2399.

AFAIK, unfortunately no.

Down near the bottom of this page (http://www.plasmoids.dk/bbowl/NTBB2014x.htm) you can see the work that I did.
This is data from FUMBBL's Black Box just before they changed the scheduler. The FUMBBL site used to have an option to see either "pre-" or "post-" scheduler change data, but I don't see that option anymore.

Anyway, the table has lots of little cells with numbers in them. If there is a single number in a cell, then I didn't bother to calculate a CI, because I was looking for places where teams went above or below the Tier bands, and in that particular place there was no risk of that. So just ignore cells with a single number in them.
However, longer bands with 2 numbers (xx.x - xx.x) do include CIs.
...the only reason I bring this up is that I found that for quite a few teams something happens around 1500TV, so that fits well with the divisions that you have chosen.

Good luck
Martin
