Subject: Digest for the period 5/23/2006 - 5/24/2006
Date: Wed, 24 May 2006 01:02:21 -0400

Table of contents
----------------------------------------------------------------------
1. Flight size, quality judging, et. al. (Bev D. Blackwood II)
2. Scoring, a statistical look (Spencer W. Thomas)
3. The goal of judging (Joel Plutchak)
4. RE: Accounting for points (Houseman, David L)
5. Time for judging. (Jon Tobey)
6. RE: Accounting for all 50 points (loribrown`at`att.net)
7. Judging (Peed, John)
----------------------------------------------------------------------
From: Bev D. Blackwood II
Date: Tue, 23 May 2006 01:04:50 -0500
Subject: Flight size, quality judging, et. al.

>Jamil is amazed by 15 beer flights...

:-) Actually, I aim for 8's myself when flighting, but I've judged
10-12 at a sitting. There's a threshold at which an organizer has to
weigh a quality panel for 13 or two lesser panels (and considerably
more time) for 7 & 6. I agree that 15 is way too big... Part of our
(Dixie Cup's) whole strategy for judging and medals is that once a
style reaches a certain density, we break into categories for medals
to allow smaller flights AND better focus on the beers.

>Jon remarks that we're trying to do two things: Produce a winner and
>provide feedback...

Are these two things mutually exclusive? No. Two-round (not one-bottle
mini-BOS) competitions let you first judge the beer and provide
feedback, then select winners from the best of the first round winners
using second bottles where you can focus on the beer itself without
requiring a scoresheet. I think that beers aren't given their due in
events where there have to be mini-BOS evaluations. I've done that
enough years in the AHA's now that I know it's a serious flaw. Beers
and meads that were spectacular when tasted fresh out of the bottle
often fall flat even after being re-capped and kept cool. It is very
frustrating to have to fight for a great beer when the opposing panel
is tasting something flat and lifeless.
"Yeah, I can taste that it probably WAS good..." ARRGH!

Regarding "scoring" every comment... I'm just not that style... I
score as I go and then add up what I get at the end. I am almost
universally "on target" with fellow judges. My philosophy of judging
is that 40's and up don't require a lot of feedback... just "tweaks"
to aim for perfection. (My personal "high score" is a 47.) The 19's
and below (and why the HELL won't people go below 19????) need a LOT
of problem assessment and advice. The worst thing a homebrew
competition could do (in my mind) is give a 19 and have the judges
basically offer NO feedback... just remarks like "undrinkable" etc.
We work hard to avoid low-comment scoresheets and will re-judge in the
second round if we don't feel the first round sheets are appropriate.

>Steve recommends sending sheets back for evaluation...

A good strategy and no doubt a good way for those Grand Master VIII's
to earn a few more points! ;-) (Seriously, who needs that much
recognition... it's just beer!) I think internal review at the event
(J.C. reviews every sheet to ensure panels are producing good work)
and BJCP training at the club level to produce good sheets help to
avoid the necessity of sending samples back to the BJCP for
evaluation. However, I think that if it could become a standard, where
the BJCP can request copies of entrants' scoresheets for a style at
random, it could go a long way to improving the overall quality of the
returned scoresheets.

-BDB2
Bev D. Blackwood II
Brewsletter Editor
The Foam Rangers
http://www.foamrangers.com
----------------------------------------------------------------------
From: Spencer W. Thomas
Date: Tue, 23 May 2006 09:15:02 -0400
Subject: Scoring, a statistical look

Suppose that beer scores follow a "normal" or bell-curve distribution.
(There is a flaw in this assumption: brewers ought not to submit beers
for competition that are really bad, and should prefer to submit beers
that are really good, so our bell curve will be distorted to the high
end.) Let's further suppose that a "world class" beer is "3 sigmas"
out on the bell curve. The normal distribution tells us that 99.73% of
the samples will be within 3 sigmas, so "3 sigma" beers appear only
about once in 400 samples, and at the high end about once per 800.
That doesn't seem too far out of line with my intuition and
experience, once you factor in that brewers tend not to submit really
bad beers.

I've got over 45 judging points. If each of those points corresponded
to judging 20 beers (and it's probably more than that, on average),
then I've judged 900 beers in competition. I've given a score over 45
maybe 5 times? Ok, so that's about one out of 200 beers, which fits
better with the equation "World Class" = 2.5 sigmas.

So, for the sake of argument, let's split up the point space linearly
as follows:

    0pts  = -3 sigma
    25pts =  0 sigma
    50pts =  3 sigma and above

If we do this, the score breakpoints identified on the scoresheet
correspond to sigma values as shown below:

    Outstanding (45 - 50):  2.4+ sigma
    Excellent   (38 - 44):  1.6 to 2.4 sigma
    Very Good   (30 - 37):  0.6 to 1.6 sigma
    Good        (21 - 29):  -0.6 to 0.6 sigma
    Fair        (14 - 20):  -1.3 to -0.6 sigma
    Problematic (0 - 13):   less than -1.3 sigma

And the expected distribution of beers, assuming no submission bias
(bad assumption, as noted above):

    Outstanding:   1%
    Excellent:     5%
    Very Good:    22%
    Good:         45%
    Fair:         17%
    Problematic:  10%

According to this rule of thumb, I should be giving about 1 beer in a
hundred a score of 45 or above, and 5 in a hundred a score from
38 - 44. Or, assuming flights of about 10 beers, I should be scoring
about 1 beer in every 10 flights as "outstanding" and 1 beer in every
2 flights "excellent". I think I do a little worse than that for
outstanding, and a little better than that for "excellent."
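
For anyone who wants to check the arithmetic, here is a minimal Python
sketch. It assumes the linear score-to-sigma mapping and the sigma cut
points quoted in the message, plus a standard normal distribution with
no submission bias. The function names are illustrative and are not
part of the original post.

```python
import math


def normal_cdf(z):
    """Cumulative probability of a standard normal variable at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def score_to_sigma(score):
    """Linear mapping from the message: 0 pts = -3 sigma, 25 = 0, 50 = +3."""
    return (score - 25.0) * 3.0 / 25.0


# Lower sigma edge of each band, top to bottom, as quoted in the message.
CUTS = [
    ("Outstanding", 2.4),
    ("Excellent", 1.6),
    ("Very Good", 0.6),
    ("Good", -0.6),
    ("Fair", -1.3),
    ("Problematic", -math.inf),
]


def band_shares(cuts=CUTS):
    """Fraction of an unbiased normal population falling in each band."""
    shares = {}
    upper = math.inf
    for name, lower in cuts:
        shares[name] = normal_cdf(upper) - normal_cdf(lower)
        upper = lower
    return shares


if __name__ == "__main__":
    for name, share in band_shares().items():
        print(f"{name:12s} {100.0 * share:5.1f}%")
```

Running it should give shares close to the percentages listed above;
small differences come from rounding the cut points and the results.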
(The rule also claims that 1 beer in each flight would be
"problematic". I'm happy to say that this is NOT generally the case,
supporting the idea that brewers tend not to submit really bad beers!)

I'd be interested to see statistical analysis of the results of some
very large competitions (Dixie Cup or NHC, for example) to see whether
they come close to this spread.

=Spencer in Ann Arbor
----------------------------------------------------------------------
From: Joel Plutchak
Date: Tue, 23 May 2006 13:30:36 +0000
Subject: The goal of judging

Jon Tobey writes:
>I ask this question annually, but nobody ever picks it up and answers
>it, so I'll post it again:
>
>As far as scoring a beer >in competition< I think that our goals are
>very unclear. Are we trying to pick the best beer, OR are we trying to
>pick the best beer AND help people make better beer.
>
>Just once I would like a distinct answer to this question. If we don't
>know the goal in evaluating the beers, no wonder we don't please our
>customers!

Directly from the BJCP web site:

    ...three primary goals for homebrew competitions:
    1. To give the entrants valuable feedback on the quality of their
       brew as perceived by the judges in order to enhance the quality
       of homebrewing.
    ...

This seems clear to me, and although that document is fairly new I
have seen nothing in my approximately ten years in the BJCP program to
suggest it was ever different: the primary goal of judging in homebrew
competitions is to get feedback to the brewer so s/he can brew better
beer.

--
Joel "Someday I *will* brew better beer" P.
----------------------------------------------------------------------
From: Houseman, David L
Date: Tue, 23 May 2006 11:17:51 -0400
Subject: RE: Accounting for points

Jon,

You say: "I ask this question annually, but nobody ever picks it up
and answers it, so I'll post it again: As far as scoring a beer >in
competition< I think that our goals are very unclear.
Are we trying to pick the best beer, OR are we trying to pick the best
beer AND help people make better beer."

Quite seriously, the answer is Yes. Both.

Dave Houseman
----------------------------------------------------------------------
From: Jon Tobey
Date: Tue, 23 May 2006 08:25:22 -0700 (PDT)
Subject: Time for judging.

I agree that 15 is too many for a flight, but 12 minutes is plenty of
time to judge a beer.

Jon Tobey
Ideastream
425-822-8351
"It's like one of those craziass Australian wooden Frisbees."
My Name is Earl
----------------------------------------------------------------------
From: loribrown`at`att.net
Date: Tue, 23 May 2006 15:27:45 +0000
Subject: RE: Accounting for all 50 points

The other part that factors into this discussion is clipping the 50
point scale. From my experience, many judges have a hard time awarding
more than 40 points - for whatever reason. Like Jon Tobey said, 40
points is only 80% (B minus, if you will). 45/50 is 90% (A minus, if
we keep on the school grading analogy.)

Organizers also clip the other end of the scale recommending no one
give out less than say 19 points, or 13 points (again just some
arbitrary number, so we don't discourage new or bad brewers with too
low of a score). Some day I would like to just cap a bottle of
carbonated water and see what my score sheet looks like ("lacks malt
flavor and aroma, hops are not detectable, very clear, good
carbonation, clean, no off flavors. Check recipe, overall score 19").
I am joking, but you see where I am going...

If we arbitrarily clip both ends of the scale, we are really left with
about 20 to 30 points to allocate. Is that really the intent of our 50
point scale?

I look forward to the ongoing discussion on this topic.
Lori
----------------------------------------------------------------------
From: Peed, John
Date: Tue, 23 May 2006 10:26:52 -0700
Subject: Judging

Bill Pierce, I understand the time constraints, but it doesn't take
any significant extra time to embed point deductions in your comments.
And yes, you can only say so much in the allotted time, but for most
beers, hitting the high points will be helpful. Put another way,
anything helpful is helpful - it beats the heck out of terse
descriptions and some apparently arbitrary number.

Jamil, it appears that Bill is talking about 3 flights of 5 beers
each. In effect, though, it's still 15 beers in 3 hours.

Jon Tobey, clearly we are trying to pick the best beer AND help people
make better beer. I don't speak in any official capacity, but it's
clear to me that the goal is to do both.

I think judging is too subjective now, and I believe that a lot of the
subjectivity can be described objectively. I believe that accounting
for your point deductions makes you do that. "Why am I giving this
beer a 7 for aroma? Well, because I think it's OK - not great, but not
bad. Why am I deducting those 5 points? Well, the malt aroma is a bit
weak (-1). The hop aroma is really pretty good, but somewhat less than
sensational ..