Scorings in the Mojeek Custom API Plan
Customers on the Custom Plan of the Mojeek API can access the scores, and some of the signals, that Mojeek uses in ranking. These are explained in detail below, along with the existing ‘score’ field and the ‘nph’ metric. The scores are included in the api.mojeek.com feed when the parameter 'fscr=1' is set.
Please note that all scores are subject to change without notice. They are closely tied to Mojeek's algorithms, which are under constant investigation, testing and improvement.
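As a rough illustration of setting 'fscr=1', here is a minimal sketch that only builds a request URL. The 'fscr' parameter and the api.mojeek.com host come from the text above; the '/search' path, the 'api_key', 'q' and 'fmt' parameter names, and the helper name build_search_url are assumptions made for illustration and should be checked against your own API documentation.

```python
from urllib.parse import urlencode


def build_search_url(query, api_key, base="https://api.mojeek.com/search"):
    """Build a search URL that asks for the extra scores.

    Only 'fscr=1' is taken from this page. The path and the other
    parameter names ('api_key', 'q', 'fmt') are assumptions.
    """
    params = {"api_key": api_key, "q": query, "fmt": "json", "fscr": 1}
    return f"{base}?{urlencode(params)}"


url = build_search_url("open web search", "YOUR_API_KEY")
```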
Responses
- score : Overall page match score
This score can be used to adjust rankings and/or combine Mojeek results with other data sources.
For example, if you make two Mojeek API calls and receive two sets of results, you can combine them into a single list by sorting all of the results on ‘score’.
You could also use this score to remove low-quality results. For instance, a cutoff of around 0.10 should remove pages that are definitely not relevant, at the cost of dismissing some borderline-relevant ones. However, ‘score’ is not consistent: it is affected by various factors, including language and region boosts. We therefore do not recommend using it to remove low-quality results; use ‘onscr’ instead, as explained next. A combination of ‘score’ and ‘onscr’ will probably be most useful, as explained later.
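The merge described above can be sketched as follows. This is only an illustration: it assumes each result has already been parsed into a dict with a 'url' key and a numeric 'score' key, and the de-duplication by URL (keeping the higher-scoring copy) is an added assumption, not something the API does for you.

```python
def merge_by_score(*result_sets):
    """Combine several result lists into one, ordered by 'score' descending.

    Assumes each result is a dict with 'url' and 'score' keys (hypothetical
    shape). If the same URL appears more than once, the copy with the
    highest score is kept.
    """
    best = {}
    for results in result_sets:
        for result in results:
            url = result["url"]
            if url not in best or float(result["score"]) > float(best[url]["score"]):
                best[url] = result
    return sorted(best.values(), key=lambda r: float(r["score"]), reverse=True)
```

Because ‘score’ is affected by language and region boosts, merging works best when both calls use the same language and region settings.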
- onscr : Page content keyword relevance score
This score represents the keyword match between the search query and content on the webpage.
This score is always between 0 and 1. It is not subject to any boosts (e.g. language and region), so it is more consistent than ‘score’. It may still change as and when algorithm adjustments are made.
We suggest 0.15 as a possible threshold, removing results whose ‘onscr’ falls below it. If this score is used on its own, a higher threshold may discard relevant pages. Results can also fall below the threshold when the query is not well formulated for Mojeek, or when the user has misspelled a term.
- sescr : Semantic content query matching score
This score represents the semantic match between the search query and content on the webpage.
This score is always between -1 and 1, and is close to a cosine similarity score. It is computed against the page content that has been embedded. Not all pages are embedded (currently only pages detected as English-language are), and for pages that are not, no ‘sescr’ is returned.
We suggest 0.5 as the default ‘sescr’ threshold, removing results that score below it. Using a value above 0.5 to remove results is very likely to discard relevant pages.
Combining keyword and semantic scoring
We recommend that, when semantic scoring is available, both the semantic threshold and the keyword threshold are used. Sometimes a relevant result has a low keyword score and a high semantic score, or vice versa; requiring both scores to be low before removal means such results are kept. Something like this:
if (sescr is defined) then
    if (onscr < 0.15 and sescr < 0.5) remove result
else
    if (onscr < 0.15) remove result
endif
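The combined check above could look like this in Python. It is a sketch under assumptions: each result is taken to be a dict in which 'onscr' is always present and 'sescr' is simply absent when the page was not embedded; the function name keep_result and that dict shape are illustrative, not part of the API. The default thresholds are the 0.15 and 0.5 values suggested above.

```python
ONSCR_MIN = 0.15  # suggested keyword threshold
SESCR_MIN = 0.5   # suggested semantic threshold


def keep_result(result, onscr_min=ONSCR_MIN, sescr_min=SESCR_MIN):
    """Return True if the result should be kept, False if it should be removed.

    Assumes 'result' is a dict with an 'onscr' value and an optional
    'sescr' value (hypothetical shape).
    """
    onscr = float(result["onscr"])
    sescr = result.get("sescr")
    if sescr is None:
        # No semantic score available: fall back to the keyword score alone.
        return onscr >= onscr_min
    # Remove only when both the keyword and the semantic score are low.
    return not (onscr < onscr_min and float(sescr) < sescr_min)


results = [{"onscr": 0.08, "sescr": 0.71}, {"onscr": 0.05, "sescr": 0.2}]
kept = [r for r in results if keep_result(r)]
```

Passing the thresholds as arguments makes it easy to tune them per application, since the scores may change as the algorithms are adjusted.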
- g : Gravity
This is a query-independent page authority score that always ranges between 0 and 100. Gravity, as it is known at Mojeek, can be thought of as similar to PageRank. On the order of 10^11 links are used in its calculation, which makes it a significant and valuable metric that can be used for a wide variety of purposes.
- nph : Number of phrases
This is the number of phrases in the query (head value) and the number of those phrases found on each page (results value).
For example, if there are two separate phrases in the query, the nph head value will be 2, and each result will have an nph value of 2, 1, or 0 (zero). Results with an nph of 2 are always ranked above those with 1, which are always ranked above those with 0. This makes it possible to show a "no more exact matches, these pages contained some of the phrases ..." message to the searcher.
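One way to find where that message belongs is to split the ranked results at the point where nph drops below the head value. The sketch below assumes each result is a dict with an 'nph' key and that the head value has already been read from the response; both the shape and the function name split_by_nph are illustrative.

```python
def split_by_nph(results, head_nph):
    """Split results into exact matches and partial matches.

    Assumes each result is a dict with an 'nph' key (hypothetical shape).
    'head_nph' is the number of phrases in the query. Input order is
    preserved within each group.
    """
    exact = [r for r in results if int(r["nph"]) == head_nph]
    partial = [r for r in results if int(r["nph"]) < head_nph]
    return exact, partial


ranked = [{"url": "a", "nph": 2}, {"url": "b", "nph": 1}, {"url": "c", "nph": 0}]
exact, partial = split_by_nph(ranked, head_nph=2)
# Show 'exact' first, then the "no more exact matches" message, then 'partial'.
```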
Further reading: XML Response Format, JSON Response Format