Saturday, February 19, 2011

The Quick & Dirty Index (QDI) of NYT Difficulty: A Work in Progress

Computing the Quick & Dirty Index (QDI)

Here’s how to get a semi-objective estimate of the difficulty of the New York Times Crossword Puzzle based on the first 50 online solvers (so this can be done within 10-90 minutes of the puzzle posting):

On the Applet, Select: “First” (Not Fastest)

At the 5 min mark, note Number of People who have completed the puzzle= N1

At the 10 min mark, note Number of People who have completed the puzzle= N2

At the 15 min mark, note Number of People who have completed the puzzle= N3

Note the Time it takes 50 people to complete the puzzle: T

QDI = T / (N1 + N2 + N3)
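If you log when each solver shows up on the "First" list, the four readings can be pulled out mechanically. The following is only a sketch: the function name readings_from_times and the sample times are hypothetical, and it assumes times are measured in minutes from the moment the puzzle posts and that a solver finishing exactly on a mark counts toward that mark.

```python
from bisect import bisect_right


def readings_from_times(minutes_after_posting, marks=(5, 10, 15), cutoff=50):
    """Return (N1, N2, N3, T) from completion times in minutes after posting.

    Assumptions (not from the post): a completion landing exactly on a mark
    is counted at that mark, and at least `cutoff` completions were logged.
    """
    times = sorted(minutes_after_posting)
    if len(times) < cutoff:
        raise ValueError(f"need at least {cutoff} completions to read T")
    # Cumulative number of completers at each mark (not the increment).
    counts = tuple(bisect_right(times, mark) for mark in marks)
    # T is the moment the 50th solver finished.
    t = times[cutoff - 1]
    return (*counts, t)


# Made-up log: 12 solvers by 5 min, 28 by 10 min, 40 by 15 min, 50th at 22 min.
sample = [4.0] * 12 + [8.0] * 16 + [13.0] * 12 + [20.0] * 9 + [22.0]
print(readings_from_times(sample))  # (12, 28, 40, 22.0)
```

The counts are cumulative on purpose, so each reading is the total number of completers visible at that mark.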

Example: A Tuesday Puzzle

At 5 min: 12 people have done it- N1= 12

At 10 min: 28 people have done it- N2=28*

At 15 min: 40 people have done it- N3=40*

It took 22 minutes for 50 people to complete the puzzle, so T= 22

QDI = 22 / (12 + 28 + 40) = 22/80

QDI= 0.275

*NB: N2 and N3 are the actual numbers of completers as you read them from the list at each point, NOT the incremental number since the last reading!
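For anyone who would rather not do the division by hand, here is a minimal sketch of the calculation, checked against the Tuesday example. The function name qdi and the guard against a zero denominator are my own additions for illustration.

```python
def qdi(n1, n2, n3, t):
    """Quick & Dirty Index: T divided by the sum of the three readings.

    n1, n2, n3 are the cumulative numbers of completers at the 5, 10 and
    15 minute marks; t is the time in minutes for 50 people to finish.
    """
    total = n1 + n2 + n3
    if total == 0:
        # Added guard (an assumption): nobody finished in the first 15 minutes.
        raise ValueError("no completions in the first 15 minutes; QDI is undefined")
    return t / total


print(qdi(12, 28, 40, 22))  # 0.275
```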

Predicting Difficulty with the Quick & Dirty Index (QDI)

E = Easy, EM = Easy-Medium, M = Medium, MC = Medium-Challenging, C = Challenging

Monday**: E: <0.15 EM: 0.15-0.2 M: 0.2-0.3 MC: 0.3-0.4 C: >0.4

Tuesday: E: <0.10 EM: 0.1-0.14 M: 0.15-0.2 MC: 0.2-0.3 C: >0.3

Wednesday: E: <0.20 EM: 0.2-0.25 M: 0.25-0.3 MC: 0.3-0.39 C: >0.4

Thursday: E: <0.40 EM: 0.4-0.6 M: 0.6-0.9 MC: 0.9-1.4 C: >1.4

Friday: E: <1.0 EM: 1.0-2.5 M: 2.5-3.5 MC: 3.5-7.0 C: >7.0

Saturday: E: <4.0 EM: 4.0-5.0 M: 5.0-7.0 MC: 7.0-11.0 C: >11.0

** Since the Monday puzzle posts at 6pm on Sundays, its scores are not comparable to the rest of the week (it looks like a Wednesday). Fewer people get online to solve it at dinnertime (or earlier) than later in the evening. Also, in truth, the best index for Monday is how many people finish in the first 5 minutes. A scale for this is in progress...
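The scale can also be turned into a small lookup. This is a rough sketch, not a definitive rule: the names LABELS, CUTS and rate are made up for illustration, and since the scale leaves small gaps (0.14-0.15 on Tuesday, 0.39-0.4 on Wednesday) and shared endpoints, I assume here that each band runs from its lower bound up to the next band's lower bound, with a value sitting exactly on a boundary going to the harder band.

```python
from bisect import bisect_right

LABELS = ("E", "EM", "M", "MC", "C")

# Lower bounds of EM, M, MC and C for each day, copied from the scale above.
CUTS = {
    "Monday":    (0.15, 0.2, 0.3, 0.4),
    "Tuesday":   (0.10, 0.15, 0.2, 0.3),
    "Wednesday": (0.20, 0.25, 0.3, 0.4),
    "Thursday":  (0.40, 0.6, 0.9, 1.4),
    "Friday":    (1.0, 2.5, 3.5, 7.0),
    "Saturday":  (4.0, 5.0, 7.0, 11.0),
}


def rate(day, qdi_value):
    """Map a QDI to E/EM/M/MC/C for the given day of the week.

    Assumption: boundary values fall into the harder band. Days missing
    from the scale (e.g. Sunday) raise KeyError.
    """
    return LABELS[bisect_right(CUTS[day], qdi_value)]


print(rate("Tuesday", 0.275))  # MC
```

On this reading the Tuesday example (QDI = 0.275) comes out as Medium-Challenging. Ratings are only meaningful within a day, so the lookup is keyed by day of the week.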

Validating the Quick & Dirty Index (QDI)

QDI is being validated against the information posted by sanfranman, who computes the median solving time for the entire population of solvers and for the 100 fastest solvers on any given day. The agreement rate is 80-90%, depending on how you squint at it.

Big CAVEAT re the Quick & Dirty Index (QDI)

If something major is going on in the world during the first 15 minutes to 1 hour after posting (earthquake, World Series, Oscars, etc.), then the QDI is not valid.

Similarly, the time of day that the puzzle posts is relevant. Since Mondays post at 6pm Eastern on a Sunday evening, fewer people solve during the first few minutes than on Tuesday at 10pm. So, comparisons to assess difficulty are run within a given day of the week, not between days.

Why have a Quick & Dirty Index (QDI)?

My original trigger was a comment someone made to sanfranman on Rex’s blog, saying that his data are not valid because anyone can cheat by solving the puzzle offline and then entering it online, artificially skewing the data. I responded by saying that while this is true (and some numbers that get posted seem unbelievable), the data appear generally consistent with subjective experience, day of the week, and Rex's ratings, suggesting that the percentage of people who cheat is minimal.

I also said that a good way to gauge difficulty while minimizing the impact of cheating is to look very early on after the puzzle has posted. If someone completes the puzzle 2.5 minutes after it posts, the odds of that person having cheated are essentially nonexistent. That time would be higher on Saturdays, but again, if someone posted at 8 minutes, it would be unlikely that they had cheated.

That led me to my efforts to monitor early postings. While looking at the number of completers at 5 minutes after 10 pm is informative on a Tuesday, it is useless on a Saturday, since hardly anyone ever finishes within 5 minutes. Hence the sampling at 5, 10, and 15 minutes, and then having a constant index of how long it takes the first 50 to complete. The latter number is quite informative, but it misses some of the dynamics: whether most of those completions came rapidly or slowly. The QDI is a poor man’s way of getting a sense of rate.

Mostly, it's a geeky way to keep amused until real data are in :)

r.alphbunker said...

I have been working on a program to analyze the solution of a puzzle after the puzzle is completed. Solving with it is exactly like solving the puzzle using Across Lite; it is completely non-invasive. After you are done, you can request a report of what happened. The program has matured to the point where my friend Dave and I meet in a coffee shop, solve the puzzle, and look at the analysis of the puzzle. A link to the generated analysis for today's puzzle is below. The left column is me, the right column is Dave.

http://www.mumde.net/CampusTest/xword/Feb1911.htm

Lower case letters indicate that the letter is a cross, i.e., one that was entered for the clue in the opposite direction.

The program also allows you to play back all the completed answers that you entered one at a time and produces a color-coded analysis where the dark areas indicate parts of the puzzle that caused the most difficulty.

It is great fun playing back our solutions to each other and hearing each other's comments. The conversations thus generated can go anywhere!

I started out wanting to find one number to describe the puzzle (I called it the Brain Effort Quantification or BEQ for short). But I am now more interested in visualization of the kind that Edward Tufte talks about.

BTW, I greatly enjoy your posts on the Rex Parker blog. I find it useful to temper my posts by thinking what would Foodie say!

foodie said...

@r.alphbunker- I think it would be a lot of fun to have/use such a program, especially to track progress over time. Very cool to try to analyze one's thought process!

Looking back over the last couple of years, I know that I have learned to take more chances, accept the possibility of more mistakes, but also make better guesses. It would have been fun to be able to look back and actually contrast an old and a new solution using your program. I also bet that it would be a good teaching tool, and possibly a research tool (I know, I've got it bad, this research bug :).

I'm looking forward to the time when you're ready to share it (somehow...). Is there intellectual property for a program like AcrossLite or the type of program you're creating?

And thank you for the kind words! I greatly enjoy your posts as well.

r.alphbunker said...

@foodie

I got the Across Lite format from http://code.google.com/p/puz/wiki/FileFormat

I am only reading the .puz file; I do not modify it. Instead, I create another file that contains the history of all keystrokes, and that is the file I use for the analysis of the puzzle.

I would be delighted if you were to use the program and make suggestions. Please email rbunker [at] lisco [dot] com.

Anonymous said...

Read your QDI writeup with much interest. I will now use it as a standard to compare against my own solving experience. Just a note that there's an apparent typo just below the example; NB for N1.