This article on testing was reproduced from the educational archives of Yuri’s Sample Rate Converter website with his kind permission. All copyright belongs to said website, and the information below may be reproduced only with the express permission of Yuri Korzunov.
“Audiophile myths debunked” is a popular topic in forums and articles. As a rule, these “myths” are based on listening impressions rather than testing. They either lack a technical explanation or involve small measurable differences that may be audible only in theory.
To check these “myths” accurately, it is recommended to use an objective method that ensures reliable repeatability, that is, one that delivers the same values under similar conditions at any time.
An audio blind test is one way of “objectively” measuring “subjective” perception. We then discuss how to turn this type of test into safe and reliable evidence.
A Hi-Fi blind test is a technique in which the listener (participant) tries to recognize an unknown sample by sound alone.
A sample here is a recording, a piece of equipment, a piece of software, or a mode of an apparatus or software. The trial is an attempt to measure a feature of music that cannot be measured directly: perceived sound quality.
The objective aspect here is the methodology by which we can measure and repeat the result in equal conditions.
The subjective aspect here is detected by human feelings/hearing and this results in a decision made by the listener based on what they perceive.
As testers, we cannot directly access another person’s feelings, and we cannot listen to music the way another person does. We can only ask them, but they cannot describe their feelings exactly, especially in the subtlest details provided by modern musical equipment, software, or HiFi test records.
But we want to know: is it really audible or not?
This is the main aim of the blind trial. We try to eliminate the subjective part of perception (price, bias, habit, etc., i.e. perception that is probably imaginary). To reduce “subjectivity” even further, a double-blind test is used.
A Double-blind test is a trial where neither listener nor conductor knows what the sample actually is.
The ABX test may be used in this scenario. An ABX test is a trial in which samples A and B may be compared with sample X (which is either A or B). The listener must identify whether X is A or B.
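The ABX procedure can be sketched in a few lines of Python. This is a minimal illustration, not the method used by any particular ABX program: the names `run_abx_trials`, `listener`, and `guessing_p_value` are hypothetical, and scoring the result with a one-sided binomial calculation is an assumption (a common choice, but not something this article prescribes).

```python
import random
from math import comb


def run_abx_trials(listener, n_trials, seed=None):
    """Run n_trials ABX trials and return the number of correct answers.

    `listener` is a callable that stands in for the participant: it receives
    the hidden identity of X only so that a simulated listener can "hear" it.
    In a real test the participant hears X but is never told what it is.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        x = rng.choice("AB")      # X is secretly either A or B
        if listener(x) == x:      # the listener answers "A" or "B"
            correct += 1
    return correct


def guessing_p_value(correct, n_trials):
    """Chance of getting at least `correct` answers right by pure guessing."""
    favourable = sum(comb(n_trials, k) for k in range(correct, n_trials + 1))
    return favourable / 2 ** n_trials


# A simulated listener who cannot hear any difference and just guesses.
guesser = lambda x: random.choice("AB")
score = run_abx_trials(guesser, n_trials=16, seed=1)
print(score, guessing_p_value(score, 16))
```

A small value from `guessing_p_value` suggests the listener was not merely guessing; where to draw that line is a decision for the methodology.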
ABX audio test software
Foobar2000 with its ABX test plugin may be used to ABX-test audio files. However, the author doesn’t know what the plugin does to the audio files when it prepares them before comparison (the progress shown when the plugin starts).
For an ABX test on Mac OS, you can try the ABXer utility. For iOS, there is an ABX Tester application. A cross-platform tool, Lacinato ABX, is also available. The author has not tested these programs and so cannot comment further on their features or capability.
If somebody claims that the sound is different, we cannot measure its audibility directly, because we hear audio in the brain, via our ears.
Theoretically, measurement tools are more sensitive than human ears. However, without trials, we do not know where the ‘audibility edge’ of a difference lies.
An audibility edge is the maximal value of a feature, or of the difference between listening samples, that cannot be distinguished by a human.
Professional double-blind testing of music equipment is not home entertainment. It is hard, long, and expensive work. That does not necessarily mean that audiophile blind testing must be done in a laboratory. We just need an environment suited to the testing procedures, one that delivers safe and reliable trial results.
Proper blind Hi-Fi Test – The 7 Keystones
These features are necessary to provide safe trial results:
- A methodology
- A test protocol
- A big number of measurements
- A suitable listening place
- Testing equipment issues
- Measurement precision issue
- Careful condition control
Tests should ensure reliable repeatability under the same conditions, i.e., there should be no deviation from the agreed parameters or set precisions.
The trial begins with the design of a methodology. A methodology defines the trial:
- other things noted below
Methodology should be designed first
The test protocol is a written record of detailed data about the conducted trial. It includes:
- measurement tools
- participants (and their listening skill, occupation, education, etc.)
- trial conditions
- other stipulations as required or specific to the test
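If the protocol is kept electronically, the items listed above could be captured in a small record like the one below. This is only a sketch: the class names `Participant` and `ProtocolRecord`, all field names, and the sample values (including the serial number) are made up for illustration and are not part of the original article.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Participant:
    participant_id: str
    listening_skill: str
    occupation: str = ""
    education: str = ""


@dataclass
class ProtocolRecord:
    measurement_tools: list[str]          # e.g. tool name plus serial number
    participants: list[Participant]
    trial_conditions: dict[str, str]
    other_stipulations: list[str] = field(default_factory=list)


# Hypothetical example values, for illustration only.
record = ProtocolRecord(
    measurement_tools=["sound level meter, S/N 0001"],
    participants=[Participant("P01", "trained", "sound engineer")],
    trial_conditions={"room": "listening room 1", "seat": "single fixed seat"},
)

# Store the protocol as JSON so the conditions can be checked later.
print(json.dumps(asdict(record), indent=2))
```

Keeping the record in a structured form makes it easier to compare the conditions of two trials when their results disagree.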
The testing and measurement equipment may be registered by a unique identifier for each item (a serial number, for example). The main aim of the protocol is to make it possible to check the experiment conditions in case of doubt, or to better understand the reasons behind any results.
To save time, several participants may be accommodated in one room. Several rows of seats may be placed in the room.
When interpreting the trial results, it is necessary to take into account that:
- each seat’s position affects the sound,
- a seat affects the sound differently with and without a listener in it,
- the speaker sound is different at each seat because of its position.
Speakers have individual radiation patterns. The frequency response depends on the listener’s sitting position relative to the speakers and the walls of the listening room. Acoustical rays can interfere with each other, and so can rays bounced from surfaces other than walls.
An anechoic room (free from echo) is ideal for preventing acoustical reflections. However, speakers also have different frequency responses in different directions, so different seat placement can cause different frequency responses even in an anechoic room. Generally, only one fixed sitting place in the listening room is recommended for accurate and reliable testing.
Testing equipment issues
In case of doubt, tested equipment (the apparatus that will be tested in the trial) should be checked to ensure that it works properly under the conditions outlined in the methodology.
It is desirable for the testing equipment to have a unique identifier (serial number). This helps when we want to repeat the experiment or, in case of doubt, to find the reasons for a difference in results, because different instances of the same equipment may have different performance figures and thus sound different.
Very important thing!
LOUDNESS OF COMPARED SAMPLES
The human ‘perception edge’ for loudness differences is close to 1 to 2 dB. So normalizing the levels of the compared samples to within 0.1 to 0.2 dB is recommended.
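A level check of this kind could look like the sketch below. It assumes that the samples are available as lists of floating-point values between -1.0 and 1.0 and that RMS level is an acceptable stand-in for loudness; both are simplifying assumptions (RMS is not a perceptual loudness measure), and the function names are hypothetical.

```python
import math


def rms_db(samples):
    """RMS level of non-silent samples in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def matching_gain(reference, other):
    """Linear gain to apply to `other` so its RMS level equals the reference."""
    gain_db = rms_db(reference) - rms_db(other)
    return 10 ** (gain_db / 20)


def is_level_matched(a, b, tolerance_db=0.1):
    """True if the RMS levels of a and b differ by at most tolerance_db."""
    return abs(rms_db(a) - rms_db(b)) <= tolerance_db


# Example: a test tone and a copy of it at half the amplitude (about 6 dB lower).
reference = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
quieter = [0.5 * s for s in reference]

gain = matching_gain(reference, quieter)
matched = [gain * s for s in quieter]
print(is_level_matched(reference, quieter), is_level_matched(reference, matched))
```

The default `tolerance_db=0.1` corresponds to the stricter end of the 0.1 to 0.2 dB range mentioned above.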
Humans retain an auditory event in echoic memory only for a limited time, so each sample should not be listened to for too long. Immediate (real-time), easy switching between samples must be ensured.
When we compare headphones, the tested samples cannot technically be isolated from the participant. To solve this issue, synthesizing headphone features and comparing them via a single headphone unit has been suggested.
Measurement precision issue
In statistical calculations, measurement error values may differ. We don’t know the exact outcome in advance, but as a first approximation we should expect the errors to follow a normal distribution. Therefore:
To provide measurement precision X, it is recommended to measure value Y with a precision of X/3 or better.
As an example, suppose we want an ear trial with a measurement accuracy of within 1%.
1 / 1% = 1 / 0.01 = 100 attempts
So a number of attempts 3 times higher is recommended: 100 * 3 = 300.
Defined measurement precision should be provided
Big number of measurements
The ear test has many variables, due above all to the human factor. Variation may be compensated for by using a large number of repetitions in the test, a large group of participants, or a wide range of equipment items.
A big number of participants and test cycles is recommended
In the “Measurement precision issue” part we considered an example: providing 1% precision.
There we recommended 300 cycles. However, this doesn’t guarantee 100% accuracy. Participant skills, for example, can bias the results. To compensate for this with a precision of 1%, we should invite (1 / 1%) * 3 = (1 / 0.01) * 3 = 300 participants. Thus the total number of measurements is 300 attempts * 300 participants = 90000.
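The arithmetic of this example can be reproduced with a short sketch. It encodes only the article’s rule of thumb (1 / precision, multiplied by 3); it is not a general statistical sample-size formula, and the function name `count_for_precision` is hypothetical.

```python
def count_for_precision(precision, safety_factor=3):
    """Article's rule of thumb: (1 / precision) * safety_factor."""
    base = round(1 / precision)   # 1 / 0.01 = 100
    return base * safety_factor   # 100 * 3 = 300


attempts = count_for_precision(0.01)       # cycles per participant
participants = count_for_precision(0.01)   # number of participants
total_measurements = attempts * participants

print(attempts, participants, total_measurements)  # 300 300 90000
```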
We should take into account each feature that could cause mistakes. As an initial approximation, we should invite participants with different skills, and the groups of participants with the same skill should all have the same number of members.
In the trial conclusions, we should group results by participant skills.
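Grouping by skill could be done as in the sketch below. The record layout (skill label, correct answers, total answers) and the skill labels are assumptions made for illustration; the numbers are invented and are not results of any real trial.

```python
from collections import defaultdict


def results_by_skill(records):
    """Share of correct answers per skill group.

    `records` is an iterable of (skill, correct_answers, total_answers).
    """
    totals = defaultdict(lambda: [0, 0])
    for skill, correct, total in records:
        totals[skill][0] += correct
        totals[skill][1] += total
    return {skill: c / t for skill, (c, t) in totals.items()}


# Invented example data: two trained listeners and one untrained listener.
records = [
    ("trained", 14, 16),
    ("trained", 12, 16),
    ("untrained", 8, 16),
]
print(results_by_skill(records))  # {'trained': 0.8125, 'untrained': 0.5}
```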
The number of measurements is sufficient if adding several more participants, cycles, equipment items, or other factors changes the results only within an allowable error margin.
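This stopping rule can be sketched as a simple check. The sketch assumes the trial result is summarized as the share of correct answers, with each answer stored as 1 (correct) or 0 (wrong); the function names and the example numbers are hypothetical.

```python
def share_correct(outcomes):
    """Share of correct answers; each outcome is 1 (correct) or 0 (wrong)."""
    return sum(outcomes) / len(outcomes)


def enough_measurements(outcomes, extra_outcomes, margin):
    """True if adding extra_outcomes moves the result by no more than margin."""
    before = share_correct(outcomes)
    after = share_correct(list(outcomes) + list(extra_outcomes))
    return abs(after - before) <= margin


# Invented example: 100 answers so far, 60 of them correct.
so_far = [1] * 60 + [0] * 40

# Ten more answers in the same proportion leave the result unchanged.
print(enough_measurements(so_far, [1] * 6 + [0] * 4, margin=0.01))  # True

# Fifty wrong answers shift the result from 0.6 to 0.4, far outside the margin.
print(enough_measurements(so_far, [0] * 50, margin=0.01))           # False
```

The same check could be repeated after adding participants, cycles, or equipment items, each time with the error margin set in the methodology.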
Careful condition control
Conditions can define the experiment. Changing conditions can bias results significantly.
Careful control of conditions provides the test exactness
Suppose somebody wants to compare DACs and asks people who own the devices to check them with some public HiFi test CD.
These tests are, as a rule, performed at home without exact condition control. The experiments end up being run with different speakers, listening places, ambient noise, etc. Even experiments described in detail may have issues that look subtle at first glance but can affect the result.
As a rule, before experimenting we can’t know exactly what these details are or whether they matter in the trial. Only the result figures can show that the details are unimportant in our experiment: if they are, their impact on the results stays within an allowable error margin.
- Blind tests, double-blind tests, and ABX tests are intended to offer “objective” measurements of “subjective” (immeasurable) perceptions.
- Properly performed and recorded hi-fi blind tests may be able to present serious objective evidence based on subjective perception.
- The trial may be performed at home for personal purposes, but the result can’t be used as technical evidence.
- The trial should be designed and performed so that the results are, as far as possible, independent of the number of participants, the number of test cycles, or both. Result deviations within the set error margin are allowable.
- Result repeatability is achieved by a big number of participants, cycles, equipment items, etc.
- Even a correctly designed and performed HiFi test doesn’t guarantee absolute truth, because new knowledge may reveal new details that can be incorporated into new tests.