Exact Ratings for Everyone on Lichess

The manual says it is similar to Bayes ELO. I do not know what Bayes ELO is. But even if I knew, would that be enough to figure out whether a population distribution is assumed as a constraint on the code machinery, which that manual documents at a very low level?

Is there a paper on the statistical model itself, not its implementation? Or was it born out of a formulation already encoded for computers? Is there a manual for non-coders, so we can analyse and interpret it mathematically without being of that tribe?

Next I would go and search for Bayes ELO, and maybe end up with the same question about it. But then can I rely on the "similar"?
What is similar in the eyes of the manual author? The Bayesian label might actually be a hint that it uses updating information to keep building the population density, but the constraint on the population density or distribution could still be low-dimensional (few-parameter) information spreading to the individual model, meant to dismiss all the information that does not fit that assumed few-parameter population-level model.

Why do we need to keep hunting for the big picture all the time, from reference to reference? Out with it!
@justaz said in #90:
> Ordo is written in C by Miguel A. Ballicora. My new version I am working on is in Python, I sped it up by using Cython, and I'll be rewriting it in Zig. There is someone else writing it in C++ too.
Python is my favourite as well!
@dboing said in #92:
> The manual in GitHub has a mathematical section after the first few pages (I had to select "More Pages" after the first 5 or so with my pdf reader) which was very interesting to read.
@dboing said in #92:

Do you realise that all these people are hobbyists working for free? They do not owe you, or academia in general, anything.

And you keep asking questions that you could answer yourself by using... Google? For example, if you had cared to Google BayesElo, you would have found Rémi Coulom's web page on BayesElo, with all the pedantic mathematical references required.
@justaz I've gone over your code on GitHub. It looks correct. I think the error terms you calculate should be equal to the partial derivative of the log-likelihood* w.r.t. the log-rating (it takes some algebra to see), so what you've implemented is effectively gradient ascent on the log-likelihood, or equivalently gradient descent on the negative log-likelihood (though I'm not sure what total_inv is doing).
Because the negative log-likelihood is convex in the log-ratings, there are no spurious local minima, so I don't think you need large jumps.
Also calculating ratings_inv is unnecessary because:
1/(1 + rating_opp / rating_you) = rating_you / (rating_you + rating_opp)
which requires the same number of multiplications and divisions as your current version.
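To make the "error term equals gradient" remark concrete, here is a minimal Python sketch. It is not @justaz's code: the function names, the learning rate, and the three-player game list are all made up for illustration, and it assumes positive, multiplicative ratings with expected score rating_you / (rating_you + rating_opp). Each step moves every log-rating by the sum of (actual score - expected score) over that player's games.

```python
import math

def expected_score(rating_you, rating_opp):
    """Expected score for positive, multiplicative ratings (Bradley-Terry form)."""
    return rating_you / (rating_you + rating_opp)

def fit_log_ratings(games, n_players, lr=0.1, steps=5000):
    """Gradient ascent on log-ratings; games are (player_i, player_j, score_of_i)."""
    theta = [0.0] * n_players  # log-ratings, all players start equal
    for _ in range(steps):
        error = [0.0] * n_players  # per player: sum of (actual - expected)
        for i, j, score in games:
            e = expected_score(math.exp(theta[i]), math.exp(theta[j]))
            error[i] += score - e
            error[j] -= score - e  # j's error is (1 - score) - (1 - e)
        theta = [t + lr * err for t, err in zip(theta, error)]
        mean = sum(theta) / n_players
        theta = [t - mean for t in theta]  # only rating differences matter
    return theta

# Hypothetical results: 1 = win, 0.5 = draw, 0 = loss for the first player listed.
games = [(0, 1, 1.0), (0, 1, 0.0), (0, 1, 1.0), (0, 2, 1.0),
         (0, 2, 0.5), (1, 2, 0.5), (1, 2, 0.0), (1, 2, 1.0)]
theta = fit_log_ratings(games, n_players=3)
print([round(t, 3) for t in theta])
```

At the fitted point each player's total expected score matches their total actual score, which is the condition the error terms drive toward. A fixed small step is enough here because the objective has a single optimum.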

* There are some issues around draws, but you can make everything work. The issue is that we only have an expected score, not the probabilities of a win/draw/loss. The way I solved it is to assign to an expected score s the probabilities
P(W) = s^2, P(D) = 2*s*(1-s), P(L) = (1-s)^2
They sum to 1 and have the correct expected value for the score (s). If you then calculate the log-likelihood of the result w.r.t. these probabilities and take the gradient, you get 2 * the error you calculate.
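The footnote's claim can be checked numerically. The sketch below is illustrative only: the helper names and the two log-ratings are arbitrary choices, and the expected score is assumed to be the Bradley-Terry form written in log-ratings. It confirms that P(W) = s^2, P(D) = 2s(1-s), P(L) = (1-s)^2 sum to 1, have mean score s, and give a log-likelihood gradient of 2 * (result - s).

```python
import math

def outcome_probs(s):
    """Split an expected score s into win/draw/loss probabilities."""
    return {1.0: s * s, 0.5: 2 * s * (1 - s), 0.0: (1 - s) ** 2}

def expected_from_log_ratings(theta_you, theta_opp):
    """Bradley-Terry expected score written in terms of log-ratings."""
    return 1.0 / (1.0 + math.exp(theta_opp - theta_you))

def log_likelihood(result, theta_you, theta_opp):
    """Log-probability of one game result (1.0, 0.5 or 0.0) for 'you'."""
    s = expected_from_log_ratings(theta_you, theta_opp)
    return math.log(outcome_probs(s)[result])

def numeric_gradient(result, theta_you, theta_opp, h=1e-6):
    """Central finite difference of the log-likelihood w.r.t. your log-rating."""
    up = log_likelihood(result, theta_you + h, theta_opp)
    down = log_likelihood(result, theta_you - h, theta_opp)
    return (up - down) / (2 * h)

theta_you, theta_opp = 0.3, -0.2  # arbitrary illustrative log-ratings
s = expected_from_log_ratings(theta_you, theta_opp)
for result in (1.0, 0.5, 0.0):
    # The two printed numbers should agree up to finite-difference error.
    print(result, numeric_gradient(result, theta_you, theta_opp), 2 * (result - s))
```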
Here is an example of a compact, code-independent way of presenting the math, so that we have all the interpretable parameters in front of us:

en.wikipedia.org/wiki/Glicko_rating_system#Step_2:_Determine_new_rating

That is for Glicko-1. To really understand the logic, as users of such tools, it might not be sufficient, but it is necessary.
And if you read the talk section of the page, you will notice that those trying to implement it might need other kinds of detail.
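For readers who cannot open the link, the Glicko-1 update from that section can be written compactly as follows. The notation is Glickman's: r_0 and RD are the player's rating and rating deviation before the rating period, r_i and RD_i are those of opponent i, s_i is the result against that opponent (1, 0.5 or 0), and m is the number of games in the period.

```latex
\begin{align*}
q &= \frac{\ln 10}{400}, \qquad
g(RD_i) = \frac{1}{\sqrt{1 + 3 q^2 RD_i^2 / \pi^2}}, \\
E_i &= \frac{1}{1 + 10^{-g(RD_i)\,(r_0 - r_i)/400}}, \\
d^2 &= \left[ q^2 \sum_{i=1}^{m} g(RD_i)^2 \, E_i \,(1 - E_i) \right]^{-1}, \\
r &= r_0 + \frac{q}{1/RD^2 + 1/d^2} \sum_{i=1}^{m} g(RD_i)\,(s_i - E_i), \\
RD' &= \sqrt{\left( \frac{1}{RD^2} + \frac{1}{d^2} \right)^{-1}}.
\end{align*}
```

Read this way, the rating goes up when results s_i exceed expectations E_i, the step is larger when the player's own RD is large, and games against opponents with a large RD_i count for less through g(RD_i).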

Yet that form is, I think, the most communicable across specialities. At least it is a compact thing that could be discussed if we could easily show it here, and dissected as equals, without having to analyse code alongside code-fluent experts and rely on their vetting. Math exists for a reason: to cross that divide, where reliance on expertise prevents critical thinking by a larger audience. And chess users are likely to be well equipped with their own logic. Why not respect that, or tap into it?

It may well be only my own limitation in visualising sequential, procedural math, which this Wikipedia page keeps to a minimum. There, I think, we have all the interpretable quantities in symbolic form, with minimal magic numbers to track procedurally from the point where they were introduced. Readable equations sitting near each other at least let us discuss which notion increases or decreases the rating. They may be hard to read at a glance, but they are there to consult, not spread over many lines and pages that require an expert code reader who can keep an equivalent model in memory after losing sight of the information. My memory needs that stuff unloaded and visible, so that the hard thinking can go to the emerging questions of interpretation rather than to chasing the meaning of each new procedural line: seeing the whole, and using my brain for what it might mean to my world.

The only missing thing might be the initialization assumptions. Is there something equivalent to this page for Bayes ELO, and would it also apply to Ordo?

Sorry to be such an odd thing in the soup. I might be looking for something impossible: too much work for me to figure out, and too much work for the builders of these things. We are all volunteers. I volunteer my thinking, and ideas of what might help others too, like perhaps that poster who mentioned being trained in mathematical physics and feeling that something was missing in the presentation. I have tried in the past to spend lots of time hunting for such code-independent understanding in GitHub and other docs, and really, open source code has its own flaws when it comes to communicating with its user base and allowing critical thinking by people without the usual training of coders or developers. I might also have given up too soon.

I could dwell and spend energy on the last figure of the blog, for example. I tried, and left a link to Lichess's own population distribution. Maybe future work by the blogger will address the question of population distribution. Showing such distributions should come with the assumptions behind them.
Even the Wikipedia article is missing the overall information flow: the fact that the only distributional assumption is on the individual initial conditions in the pools, and that the emerging population distribution does not need to be a parametrized model itself, but exists mathematically as whatever is shown when computed. For me that is a basic story that is important to understand when using such a tool and presenting one population distribution as less desirable than the smoother one.

It might be key to putting the blog in the perspective of what we want from rating systems. Am I the only one interested in that kind of thing: the reasoning behind claims, and making sure we know the givens and, as a consequence, the limits on interpreting the output?

The smoothness of that last figure of the blog is what makes me lazy about hunting for Bayes ELO. If the Bayes part there negates what Glicko shows, and that is inherited by Ordo, I think we should all wonder where the smoothness comes from, as equal rational beings in possession of all the givens within the discussion.
@ant_artic said in #97:
>

Would you be able to answer the question of what assumptions about the population distribution underlie this gradient-descent-based optimisation? What is being optimized: the log-likelihood of what?
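For reference, one way to spell out the object in question, assuming the model described in #97 (expected score from a ratio of positive ratings, with draws handled by the win/draw/loss split in its footnote). This is a sketch of that reading, not a statement of what Ordo or BayesElo actually implement. The events are the individual game outcomes, the probabilities are those of a win, draw or loss given the two players' ratings, and y is the observed score (1, 0.5 or 0) of player i in a game against player j.

```latex
\begin{align*}
s_{ij} &= \frac{r_i}{r_i + r_j} = \frac{1}{1 + e^{-(\theta_i - \theta_j)}},
\qquad \theta_i = \ln r_i, \\
\ell(\theta) &= \sum_{\text{games } (i,j)}
\Big[ 2 y \ln s_{ij} + 2 (1 - y) \ln (1 - s_{ij}) \Big] + \text{const}, \\
\frac{\partial \ell}{\partial \theta_i} &= 2 \sum_{\text{games of } i} (y - s_{ij}).
\end{align*}
```

Written like this, the likelihood is of the observed results given the ratings, and no population distribution of ratings appears anywhere. Only the overall offset of the scale has to be fixed by convention. A Bayesian variant would add a prior term on the ratings to this sum, and that term is where any population-level assumption would enter.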

I understand forum ASCII is biased toward coding-language mathematics, but we can paste equations as images here.

You seem to be talking shop here. But would it not help the target audience (who is that?) to get the basic givens?
We can be pedagogical here; it won't hurt anyone. What probabilities? What are the events? A recap, in short.

I certainly trust the coding to be correct. My questions come from the point of view of why we would want to use such a rating system over that of Lichess. The coding abilities are not in question, nor is the algorithm implemented to minimize some statistical object, but we don't even know what the statistical objects are about. I might be ignorant of the BayesELO premises, but would it not be appropriate to mention them here in light of the Glicko setup, given the claim that we would prefer Ordo (or BayesELO)? Can someone help us connect the Glicko setup to Ordo (or BayesELO)? Log-likelihood is just an angle. It might imply a direction from givens to results, but why not spell that out for the benefit of all? Why restrict the discussion? It appears from your post that we should know the probabilities of "what"; that is maybe my point.