Researchers validate intelligence assessment across diverse demographic groups

A new analysis confirms that a widely used non-verbal cognitive assessment measures intelligence consistently across different demographic groups, including Syrian refugees and Turkish students. The researchers found that the test provides fair and comparable scores regardless of a child’s gender, age, grade level, or ethnicity. These findings were published in the journal Intelligence.

Psychological assessments are often assumed to function the same way for everyone. However, this assumption requires statistical evidence to ensure fairness. Validity refers to the degree to which evidence supports the interpretation of test scores for their intended use.

A test intended to measure cognitive ability should ideally reflect that ability alone. It should not be influenced by a student’s cultural background, native language, or gender. When a test measures a construct in the same way across different groups, it demonstrates what psychologists call measurement invariance.

Without this invariance, comparing scores between groups becomes problematic. A specific score for one group might indicate a different level of ability than the same score for another group. This potential bias can lead to incorrect decisions about an individual’s educational needs. This is particularly concerning when tests are used to select students for gifted education programs.

The Bildiren Non-Verbal Cognitive Ability Test, known as the BNV, was developed to address the need for fair assessment in Türkiye. The test relies on geometric shapes, pattern completion, and abstract reasoning tasks. It does not require students to read, write, or speak to answer questions. This design aims to minimize the impact of language barriers and educational history. In recent years, the Turkish Ministry of National Education has used the BNV to screen hundreds of thousands of primary school students for gifted programs.

Ahmet Bildiren, a researcher at Aydin Adnan Menderes University in Türkiye, led the investigation into the BNV’s fairness. Along with co-author Derya Akbaş, he sought to verify that the test functioned equivalently for the diverse student population in the region. The researchers were particularly interested in whether the test was valid for Syrian students. Millions of Syrians have sought refuge in Türkiye due to the war in Syria. Ensuring that these students are assessed fairly alongside their Turkish peers is a matter of educational equity.

The researchers analyzed two distinct datasets to conduct their evaluation. The first sample included 7,745 Turkish students aged 4 to 13. This large group allowed the team to examine how the test functioned across gender, grade level, and age. The participants hailed from 11 different cities representing various regions of the country.

The second sample focused specifically on ethnicity. This group consisted of 1,719 students residing in Ankara, the capital city. It included both Turkish students and Syrian refugee students. The researchers kept these datasets separate to ensure accurate statistical comparisons. They avoided combining them because Turkish students vastly outnumber Syrian students in the broader population, and pooling the samples could have distorted the group comparisons.

To analyze the data, Bildiren and Akbaş employed a statistical method known as multigroup confirmatory factor analysis. This technique allows researchers to test whether a theoretical model fits real-world data across different groups. They began by testing a “one-factor model.” This model assumes that all questions on the test contribute to measuring a single underlying trait, which in this case is fluid intelligence. Fluid intelligence involves the ability to solve novel problems and identify patterns independent of prior knowledge.
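The structure a one-factor model implies can be illustrated with a short simulation. In such a model, every item score is driven by a single latent trait plus item-specific noise, so the covariance between any two items equals the product of their factor loadings. The loadings below are made-up illustrative values, not parameters from the study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical one-factor model: each item score reflects a single latent
# trait (here standing in for fluid intelligence) plus unique noise.
loadings = np.array([0.8, 0.7, 0.6, 0.5])     # illustrative, not from the study
ability = rng.normal(size=n)                   # standardized latent trait
noise = rng.normal(size=(n, 4)) * np.sqrt(1 - loadings**2)
items = ability[:, None] * loadings + noise    # simulated observed item scores

# Under a one-factor model, cov(item_i, item_j) = loading_i * loading_j.
cov = np.cov(items, rowvar=False)
implied = np.outer(loadings, loadings)
off_diag = ~np.eye(4, dtype=bool)
print(np.abs(cov[off_diag] - implied[off_diag]).max())  # small: data fit the model
```

Confirmatory factor analysis works in the reverse direction: it starts from real item scores and checks whether a covariance pattern like the one above fits the data.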

The initial analysis confirmed that the BNV is indeed unidimensional. The data supported the idea that the test items measure a single cognitive construct for both sample groups. Once the structure of the test was established, the researchers moved to the core of their study. They tested for measurement invariance in stages.

The first stage involved testing for “configural invariance.” This checks if the basic organization of the test is the same for all groups. It asks if the items cluster together in the same way for boys and girls, or for Turkish and Syrian students. The models fit the data well in this stage. This indicated that the test structure was consistent across all demographic categories.

The researchers then proceeded to a more rigorous test known as “scalar invariance.” This is a strict standard for fairness. Scalar invariance requires that the mathematical relationship between the test items and the underlying ability remains constant across groups. Specifically, it demands that both the item intercepts (the starting points) and the factor loadings (the units of measurement) are equivalent. If scalar invariance holds, a researcher can be confident that differences in test scores reflect actual differences in ability, not bias in the test itself.
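The arithmetic behind this idea is simple. Under the factor model, an item's expected score is intercept plus loading times latent ability; the parameter values below are invented for illustration and are not the BNV's actual estimates:

```python
# Illustrative sketch of scalar invariance (made-up parameters, not the BNV's):
# expected item score = intercept + loading * latent_ability.
def expected_score(ability, loading=0.5, intercept=2.0):
    return intercept + loading * ability

# Scalar invariance holds: both groups share the intercept and loading,
# so equal latent ability yields equal expected scores.
group_a = expected_score(1.0)
group_b = expected_score(1.0)
print(group_a, group_b)             # 2.5 2.5

# Scalar invariance violated: group B's intercept is shifted downward, so
# a child with the same ability gets a lower expected score -- test bias.
group_b_biased = expected_score(1.0, intercept=1.5)
print(group_a, group_b_biased)      # 2.5 2.0
```

When intercepts differ across groups, observed score gaps can appear even between children of identical ability, which is exactly what the invariance tests are designed to rule out.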

The study found that the BNV achieved scalar invariance across all comparison groups. The analysis showed that the test functioned equivalently for boys and girls. It also worked consistently for students at different grade levels. When the researchers examined age groups, ranging from 4-year-olds to 13-year-olds, the scalar invariance model again fit the data.

Most importantly for the context of diverse classrooms, the test demonstrated scalar invariance between Turkish and Syrian students. The factor loadings, which indicate how well an item correlates with intelligence, were equivalent for both ethnic groups. The thresholds, which indicate how difficult an item is, were also equivalent. These results suggest that the BNV allows for valid comparisons of latent means across these ethnic subgroups.

To supplement the primary analysis, the authors conducted a check for differential item functioning. This is a finer-grained analysis that looks at individual questions rather than the test as a whole. It seeks to identify specific items that might be unexpectedly easier or harder for one group compared to another, even when the students have the same overall ability.
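The article does not specify which DIF procedure the authors used; one standard approach is the Mantel-Haenszel method, sketched below with a hypothetical helper. It compares the odds of answering an item correctly for two groups after matching students on overall ability (total score), then converts the pooled odds ratio to the ETS delta scale:

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(responses):
    """Hypothetical illustration of Mantel-Haenszel DIF (not the study's code).

    responses: iterable of (group, total_score, item_correct) tuples, where
    group is 'ref' or 'focal'. Returns the ETS delta-DIF statistic; values
    with absolute magnitude above about 1.5 are conventionally flagged as
    large DIF, and negative values indicate the item disfavors the focal group.
    """
    # Stratify students by total score so groups are compared at equal ability.
    strata = defaultdict(lambda: {'ref': [0, 0], 'focal': [0, 0]})
    for group, score, correct in responses:
        strata[score][group][0 if correct else 1] += 1

    num = den = 0.0
    for cells in strata.values():
        a, b = cells['ref']      # reference group: correct, incorrect
        c, d = cells['focal']    # focal group: correct, incorrect
        t = a + b + c + d
        if t == 0:
            continue
        num += a * d / t
        den += b * c / t
    alpha = num / den            # common odds ratio across ability strata
    return -2.35 * math.log(alpha)

# Unbiased item: correct rates match within the ability stratum, so the
# common odds ratio is 1 and the delta statistic is 0.
fair = ([('ref', 1, True)] * 10 + [('ref', 1, False)] * 10
        + [('focal', 1, True)] * 5 + [('focal', 1, False)] * 5)
print(abs(mantel_haenszel_dif(fair)))    # 0.0

# Biased item: at the same ability level, the focal group succeeds far less
# often, producing a large negative delta that would flag the item.
biased = ([('ref', 1, True)] * 10 + [('ref', 1, False)] * 10
          + [('focal', 1, True)] * 2 + [('focal', 1, False)] * 8)
print(mantel_haenszel_dif(biased))       # below -1.5
```

Whatever the specific method, the logic is the same: an item is flagged only if group differences persist after students are matched on overall ability.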

The differential item functioning analysis revealed that the vast majority of the 47 test items showed negligible bias. For gender comparisons, no items showed concerning levels of difference. In the ethnicity comparison, a few items displayed moderate to large differences. However, these isolated instances were not pervasive enough to disrupt the overall scalar invariance of the test. Similarly, some items functioned differently for the youngest children compared to older children. This is relatively common in developmental assessments, as cognitive strategies change rapidly in early childhood.

The authors noted that while the statistical evidence for fairness is strong, context remains a vital consideration. Statistical invariance does not erase the lived reality of the test-takers. Syrian students may face unique challenges such as trauma, stress, or socioeconomic disadvantages that could influence their performance. These external factors exist independently of the test’s psychometric properties.

There are limitations to the study that warrant mention. The sample used to compare ethnic groups was drawn entirely from Ankara. This may restrict how well the findings apply to Syrian students living in other parts of Türkiye. Additionally, the number of Syrian students in the sample was smaller than the number of Turkish students. Future research would benefit from larger, more balanced samples collected from a wider geographic area.

The study was also cross-sectional, meaning it looked at students at a single point in time. It did not track how individual students’ scores might change as they grow older. Longitudinal research would be necessary to understand how the measurement properties of the test might evolve over time for the same individuals. Future studies could also investigate the combined effects of these variables, such as examining specific outcomes for Syrian girls versus Turkish boys.

Despite these caveats, the results provide solid evidence supporting the validity of the BNV. The establishment of measurement invariance suggests that the test is a fair tool for diagnostic and selection purposes in heterogeneous populations. It supports the Ministry’s continued use of the assessment for identifying gifted potential among diverse student bodies. By confirming that the test measures fluid intelligence consistently, the study helps ensure that educational opportunities are distributed based on ability rather than demographic bias.

The study, “Measurement invariance of Bildiren non-verbal cognitive ability (BNV) test across gender, grade level, age, and ethnicity,” was authored by Ahmet Bildiren and Derya Akbaş.
