Scoring Methodology

How the Rubric Works

Each platform is scored on six criteria. Each criterion carries a defined percentage weight. Within each criterion, evaluators assess a set of sub-criteria and assign a raw score from 0 to 100. The criterion score is the average of its sub-criteria scores. The overall platform score is the weighted sum of all six criterion scores, yielding a final result on a 0-100 scale.
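As a minimal sketch of the averaging step described above, the function below computes one criterion score from its sub-criteria scores. It reads "average" as an unweighted mean, which is an assumption. The function name and the example values are hypothetical and are not taken from any ICAS evaluation.

```python
def criterion_score(sub_scores):
    """Return the unweighted mean of a criterion's sub-criteria scores (each 0-100)."""
    if not sub_scores:
        raise ValueError("at least one sub-criterion score is required")
    for score in sub_scores:
        if not 0 <= score <= 100:
            raise ValueError(f"sub-criterion score out of range: {score}")
    return sum(sub_scores) / len(sub_scores)


# Hypothetical raw scores for the five sub-criteria of a single criterion.
example = [85, 70, 90, 60, 75]
print(criterion_score(example))  # 76.0
```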

No score is permanent. Platforms are re-evaluated at least once every 18 months, and material changes - to pricing, privacy policies, clinical claims, or user feedback patterns - may trigger an off-cycle review. The date of the most recent evaluation is always displayed alongside a platform's score.

The rubric is not adjustable for platform type or market segment. A consumer-facing IQ quiz and a professional clinical screening tool are held to the same criteria, because users of both deserve the same baseline protections. A consumer product that lacks peer-reviewed psychometric data earns a low validity score; that is the intended outcome.

Criteria Summary

Detailed descriptions and sub-criteria for each dimension are provided in the sections below.

#	Criterion	Weight	Core Question
1	Test Validity & Scientific Rigor	25%	Is the test grounded in published psychometric science?
2	Scoring Transparency	20%	Does the platform explain how scores are derived and what they mean?
3	Pricing Clarity	15%	Are all costs disclosed before a user commits to pay?
4	Billing & Cancellation Fairness	15%	Can users cancel easily and access fair refund terms?
5	Data Privacy & Security	15%	What data is collected, how is it used, and with whom is it shared?
6	User-Reported Experience	10%	What do aggregated user reviews and complaint records indicate?

1. Test Validity & Scientific Rigor

25%

This is the highest-weighted criterion because no amount of billing transparency or good user reviews compensates for a test that does not measure what it claims to measure. ICAS examines the psychometric foundations on which a platform's assessments are built, the quality and recency of norming data, the reliability of the instrument across administrations, and whether the methodology has been subjected to independent peer review.

Platforms that license or adapt well-established, published instruments begin this evaluation with a stronger foundation. Platforms that use proprietary question banks with no disclosed validation data are scored in the lowest range (below 40). Marketing language that uses scientific-sounding terminology without citing underlying research is treated as an absence of evidence, not as evidence.

Sub-criteria evaluated:

Psychometric foundation: Does the platform identify the theoretical construct being measured (e.g., general cognitive ability, working memory, verbal reasoning)? Is there a published or disclosed measurement framework?
Norming data: Has the test been normed on a representative sample? Is the norming population described - size, age range, geographic distribution, year of collection? Is the norm group current (typically within the past 15-20 years for cognitive instruments)?
Reliability coefficients: Are internal consistency (e.g., Cronbach's alpha), test-retest, or split-half reliability data reported or available? Are the values within an acceptable range for the type of instrument?
Validity evidence: Is there evidence of construct validity, criterion validity, or convergent validity with established instruments? Are these claims supported by citations or disclosed study data?
Peer review: Has the instrument or its underlying methodology been published in or reviewed by a recognized scientific journal, or subjected to structured external review?
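For readers unfamiliar with the internal-consistency statistic named above, the following is a minimal sketch of how Cronbach's alpha is computed from item-level responses, using the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of total scores). It illustrates the statistic only and does not describe how ICAS or any platform computes it. The function name and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of per-respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("every item needs one score per respondent")
    # Total score per respondent, summed across all items.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: three items answered by five respondents.
items = [
    [3, 4, 2, 5, 4],
    [2, 4, 3, 5, 3],
    [3, 5, 2, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Sample variance is used for both the items and the totals; the ratio is the same if population variance is used consistently instead.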

Scoring guidance: A score of 80-100 requires documented norming data, reported reliability coefficients above accepted thresholds, and at least one peer-reviewed or externally validated source. A score of 40-60 reflects a platform that references scientific concepts but does not disclose supporting data. A score below 40 reflects proprietary instruments with no disclosed validation, or instruments that make clinical-grade claims without any scientific substantiation.

2. Scoring Transparency

20%

A score means something only if the recipient understands what it represents, how it was derived, and what its limitations are. This criterion evaluates whether a platform explains its scoring in plain terms, what scale or standard score format is used, and whether result reports contextualize scores accurately - without overstating precision or diagnostic certainty.

Sub-criteria evaluated:

Score scale disclosure: Is the scoring scale clearly identified (e.g., IQ standard score with mean 100 and SD 15, percentile, stanine)? Does the platform explain what the scale means in plain language?
Methodology disclosure: Does the platform explain how individual item responses are converted to a score? Is it stated whether adaptive scoring, weighted item scoring, or normative comparison is used?
Confidence and limitations: Does the result report acknowledge margin of error, confidence intervals, or the limitations of the assessment format (e.g., unsupervised online administration)? Are appropriate caveats against clinical interpretation included?
Sub-score transparency: If multiple cognitive dimensions are scored, is it clear what each sub-score measures and how they contribute to any composite score?
Score report legibility: Is the result report provided to users in a format they can understand without specialist knowledge? Does it avoid misleading presentation such as false precision or visual anchoring that exaggerates differences between score bands?
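To make the scale and margin-of-error sub-criteria above concrete, the sketch below converts a standard score on the mean-100, SD-15 scale to a percentile and derives a confidence interval from the standard error of measurement, SEM = SD × sqrt(1 - reliability). It assumes normally distributed scores and a classical test theory error model, and it centres the interval on the observed score, which is a simplification. The reliability value of 0.90 is a hypothetical figure, not one reported by any platform.

```python
from math import sqrt
from statistics import NormalDist

# IQ standard-score scale: mean 100, SD 15, assumed normal.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def percentile(standard_score):
    """Percentage of the norm group expected to score at or below this score."""
    return 100 * IQ_SCALE.cdf(standard_score)


def confidence_interval(standard_score, reliability, level=0.95):
    """Interval around an observed score, based on SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    sem = IQ_SCALE.stdev * sqrt(1 - reliability)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (standard_score - z * sem, standard_score + z * sem)


print(round(percentile(115), 1))  # 84.1
low, high = confidence_interval(100, reliability=0.90)  # hypothetical reliability
print(round(low, 1), round(high, 1))  # 90.7 109.3
```

Even with a reliability of 0.90, the 95% interval spans roughly nine points either side of the observed score, which is why a result report that presents a single number without caveats overstates precision.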

Scoring guidance: A score of 80-100 requires an explicit scale description, a plain-language explanation, and a meaningful caveat about the limits of online assessment. A score of 40-60 reflects platforms that report a number but do not explain what scale it is on or how it was derived. A score below 40 reflects platforms that present scores in ways that imply clinical diagnostic certainty without any stated methodology or limitation.

3. Pricing Clarity

15%

Opaque pricing is a common source of consumer frustration with online assessment platforms. This criterion evaluates whether the full cost of accessing results - including any required subscription, one-time fee, or upsell - is visible before a user begins the assessment or enters payment information.

Sub-criteria evaluated:

Pre-test cost disclosure: Is the price (or the fact that a payment is required) shown to users before they begin the assessment - not only after they have completed it and are committed to seeing their results?
Pricing page completeness: Does a dedicated pricing or payment page exist? Does it list all available plans, one-time fees, and what is included in each tier?
Absence of hidden fees: Are there additional charges for detailed reports, PDF downloads, or supplemental content that are not disclosed on the primary pricing page?
Subscription vs. one-time clarity: Is it clearly stated whether a charge is a one-time payment or the first billing period of a recurring subscription? Is the billing interval explicit?
Currency and tax display: Is pricing shown in the user's local currency where technically feasible? Are taxes or regional fees disclosed prior to checkout?

Scoring guidance: A score of 80-100 requires pricing to be fully disclosed before the test begins, with a complete pricing page and no post-completion payment surprises. A score of 40-60 reflects platforms where pricing is available but requires navigation away from the assessment flow to find. A score below 40 reflects platforms that withhold pricing until after test completion, or that present a low initial price while omitting required add-on charges.

4. Billing & Cancellation Fairness

15%

Subscription billing and cancellation practices are among the most frequently cited complaints across the online assessment sector. This criterion examines whether a platform's billing practices are straightforward and whether users can cancel recurring charges without unreasonable friction.

Sub-criteria evaluated:

Cancellation path: Can a subscription be cancelled through a self-service account interface without requiring a phone call, email to customer support, or multi-step verification? How many steps does cancellation require?
Refund policy: Is a refund policy published and accessible? Does it specify the conditions under which refunds are granted, the timeframe for processing, and any non-refundable components?
Dark patterns: Does the platform use known dark patterns that impede cancellation - such as hiding the cancellation option, requiring users to speak to a retention agent before cancelling, or presenting cancellation as unavailable?
Renewal notifications: Do subscribers receive advance notice of subscription renewals? Are the renewal date and amount communicated clearly before the charge is processed?
Charge dispute responsiveness: Based on publicly available complaint records, how consistently does the platform resolve billing disputes in a timely and fair manner?

Scoring guidance: A score of 80-100 requires self-service cancellation in three steps or fewer, a published refund policy, and no observed dark patterns in the cancellation flow. A score of 40-60 reflects platforms with a functional but friction-heavy cancellation path. A score below 40 reflects platforms with no self-service cancellation, no published refund policy, or a documented pattern of charge disputes reported by users.

5. Data Privacy & Security

15%

Cognitive assessment platforms collect sensitive personal data - including responses to psychological questions, demographic information, and payment details. This criterion evaluates whether platforms are transparent about what they collect, how long they retain it, and under what conditions they share it with third parties.

Sub-criteria evaluated:

Data collection disclosure: Does the privacy policy clearly enumerate the categories of personal data collected? Does it distinguish between data required to deliver the service and data collected for other purposes such as analytics or marketing?
Retention policy: Is a data retention period specified? Does the platform commit to deleting or anonymizing assessment data after a defined period, or is retention indefinite?
Third-party data sharing: Does the privacy policy identify the categories of third parties with whom data is shared? Does it distinguish between service providers (processors) and parties who receive data for their own purposes (controllers)?
GDPR and regional compliance: Is there evidence of compliance with applicable data protection regulations? For platforms accessible to European users, does the platform provide lawful basis disclosures and data subject rights mechanisms (access, erasure, portability)?
Security disclosures: Does the platform disclose the general security measures applied to stored assessment data? Is there a security contact or responsible disclosure policy?

Scoring guidance: A score of 80-100 requires a clearly written privacy policy with specific retention periods, identified third-party categories, and accessible data subject rights mechanisms. A score of 40-60 reflects a privacy policy that exists and addresses most areas but uses vague language on retention or third-party sharing. A score below 40 reflects an absent or clearly inadequate privacy policy, or documented evidence of undisclosed data sharing practices.

6. User-Reported Experience

10%

Structural evaluation captures what platforms disclose and how their systems are designed. User-reported experience captures what actually happens in practice - including whether customer service is responsive, whether technical issues are common and resolved, and whether users feel the product delivered what was promised. This criterion draws on aggregated, publicly available review and complaint data.

This criterion is weighted lowest (10%) because user sentiment data is inherently noisy, reviews can be manipulated in either direction, and dissatisfied users are systematically more likely to leave reviews than satisfied ones. The ICAS methodology applies adjustment factors to account for review volume and platform age when comparing platforms. However, persistent patterns in complaint data - particularly around billing and misleading results - are treated as meaningful signals even at this reduced weight.

Sub-criteria evaluated:

Aggregate review sentiment: What is the platform's average rating across major, publicly accessible review aggregators? Is the volume of reviews sufficient to draw meaningful inference?
Complaint pattern analysis: Are there recurring themes in negative reviews or formal complaints? Complaint patterns related to billing, data misuse, or misleading results are given greater weight than complaints about test difficulty or result interpretation.
Customer service responsiveness: Based on public response records, does the platform respond to user complaints in a timely and substantive manner? Are responses formulaic or do they address the specific issue raised?
Result satisfaction: Do users report that the test experience and results were consistent with what was described on the platform? Are there systematic reports of results being withheld pending additional payment or upsell?

Scoring guidance: A score of 80-100 reflects consistently positive aggregated sentiment with few recurring complaint themes and evidence of responsive customer service. A score of 40-60 reflects mixed sentiment with some recurring complaints that do not constitute a systematic pattern. A score below 40 reflects consistent negative patterns, particularly around billing disputes or results that do not match platform claims, with limited evidence of customer service resolution.

Overall Score Calculation

The overall platform score is a weighted average of the six criterion scores:

Overall Score = (C1 × 0.25) + (C2 × 0.20) + (C3 × 0.15) + (C4 × 0.15) + (C5 × 0.15) + (C6 × 0.10)

Each criterion score (C1 through C6) is itself the average of its sub-criteria scores, each rated 0-100. The resulting overall score falls on a 0-100 scale and is rounded to the nearest whole number for display purposes.
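The calculation above can be sketched as follows. The weights come from the rubric; the dictionary keys, the function name, and the example criterion scores are hypothetical. Weights are held as whole percentages to avoid floating-point drift. The rubric does not say how exact halves are rounded, so rounding half up is an assumption.

```python
from math import floor

# Criterion weights from the rubric, as whole percentages (keys are illustrative).
WEIGHTS_PCT = {
    "test_validity": 25,
    "scoring_transparency": 20,
    "pricing_clarity": 15,
    "billing_cancellation": 15,
    "data_privacy": 15,
    "user_experience": 10,
}
assert sum(WEIGHTS_PCT.values()) == 100


def overall_score(criterion_scores):
    """Weighted average of the six criterion scores, rounded to a whole number."""
    if set(criterion_scores) != set(WEIGHTS_PCT):
        raise ValueError("exactly the six rubric criteria are required")
    weighted = sum(
        criterion_scores[name] * weight for name, weight in WEIGHTS_PCT.items()
    ) / 100
    # Assumption: halves round up (Python's round() would round halves to even).
    return floor(weighted + 0.5)


# Hypothetical criterion scores C1 through C6.
example = {
    "test_validity": 80,
    "scoring_transparency": 70,
    "pricing_clarity": 90,
    "billing_cancellation": 60,
    "data_privacy": 75,
    "user_experience": 50,
}
print(overall_score(example))  # 73 (weighted average 72.75)
```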

ICAS uses the following interpretive bands for published scores:

75-100 - Meets Standards: The platform performs well across the rubric. Weaknesses, if any, are minor and non-systematic. Eligible for the ICAS Reviewed badge.
50-74 - Mixed: The platform meets some criteria adequately but has notable gaps in one or more areas. The review identifies specific areas for improvement.
0-49 - Does Not Meet Standards: The platform has significant deficiencies in one or more high-weight criteria. The review documents the specific failures.

Score bands are interpretive aids. The full criterion-level scores are always published alongside the overall score so that readers can assess which dimensions drove a result.
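The band thresholds above amount to a simple lookup, sketched here under the assumption that the input is the overall score after rounding to a whole number. The function name is hypothetical.

```python
def interpretive_band(overall_score):
    """Map a rounded overall score (0-100) to its published interpretive band."""
    if not 0 <= overall_score <= 100:
        raise ValueError(f"overall score out of range: {overall_score}")
    if overall_score >= 75:
        return "Meets Standards"
    if overall_score >= 50:
        return "Mixed"
    return "Does Not Meet Standards"


print(interpretive_band(75))  # Meets Standards
print(interpretive_band(74))  # Mixed
print(interpretive_band(49))  # Does Not Meet Standards
```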

Review Cadence

Standard evaluations are conducted on a rolling schedule. Each reviewed platform is re-evaluated at least once every 18 months. The date of the most recent evaluation is displayed prominently on every platform review page.
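As an illustration of the 18-month ceiling, the sketch below computes the latest date by which a platform would be due for re-evaluation. The function names are hypothetical, and the handling of month-end dates (clamping to the last day of the target month) is an assumption, since the cadence is stated only in months.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`, clamping the day."""
    total = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def re_evaluation_deadline(last_evaluated):
    """Latest date for the next standard evaluation under the 18-month rule."""
    return add_months(last_evaluated, 18)


print(re_evaluation_deadline(date(2024, 1, 15)))  # 2025-07-15
print(re_evaluation_deadline(date(2023, 8, 31)))  # 2025-02-28
```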

Off-cycle re-evaluations may be triggered by any of the following:

A material change to the platform's pricing structure, billing terms, or cancellation policy
A published update to the platform's privacy policy that alters data collection, retention, or sharing practices
A significant shift in user complaint volume or complaint category patterns
A published change to the underlying assessment instrument or scoring methodology
A formal request for re-evaluation submitted by the platform operator and accepted by ICAS

When a re-evaluation is in progress, the existing published score remains visible and is labelled as pending update. Scores are not removed from publication during re-evaluation unless the platform ceases operation.

Platforms may request re-evaluation at any time via the Contact page. Requests are acknowledged but do not guarantee expedited scheduling. Re-evaluation follows the same rubric and process as the initial evaluation.