AI Search Engine Multilingual Evaluation Report (v1.2)
In the previous evaluation, we assessed the multilingual and cross-lingual search capabilities of Perplexity Basic, Perplexity Pro, Metaso, iAsk, and you.com. Among these, Perplexity Pro consistently performed best. This time we added a new AI search player, Felo AI, including both its Basic and Enhanced editions. Although Felo Enhanced requires a browser plugin to be installed, it remains free.
Evaluation Results
After rigorous evaluation, we have reached the following conclusions:
Felo Enhanced leads Perplexity Pro by a slight margin, achieving an accuracy of 86.67% compared to Perplexity Pro's 84.17%.
Felo Basic also surpasses Perplexity Basic with an accuracy of 81.67%, compared to 72.50%.
Metaso performs well in searches within the mainland China Simplified Chinese community but shows average performance in cross-lingual searches, with an accuracy of 66.67%.
iAsk and you.com demonstrate poor performance in searches outside of English-speaking communities, with accuracies of 52.50% and 33.33%, respectively.
Product Name | Average ACC | complex_search | business_consulting | local_search | products_search | real_time_news | technical_consulting
---|---|---|---|---|---|---|---
Felo Enhanced | 86.67% | 90.00% | 65.00% | 100.00% | 90.00% | 85.00% | 90.00%
Perplexity Pro | 84.17% | 90.00% | 70.00% | 95.00% | 90.00% | 80.00% | 80.00%
Felo Basic | 81.67% | 90.00% | 55.00% | 95.00% | 85.00% | 90.00% | 75.00%
Perplexity Basic | 72.50% | 90.00% | 50.00% | 80.00% | 85.00% | 60.00% | 70.00%
Metaso | 66.67% | 50.00% | 75.00% | 80.00% | 75.00% | 65.00% | 55.00%
iAsk | 52.50% | 40.00% | 45.00% | 35.00% | 65.00% | 65.00% | 65.00%
you.com | 33.33% | 40.00% | 20.00% | 15.00% | 40.00% | 55.00% | 30.00%
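As a sanity check, each product's Average ACC is simply the mean of its six per-category accuracies. The sketch below reproduces the averages for the two top rows of the table (the per-category values are taken directly from the table above):

```python
# Reproduce the "Average ACC" column by averaging the six per-category
# accuracies from the results table (values are percentages).
scores = {
    "Felo Enhanced": [90.00, 65.00, 100.00, 90.00, 85.00, 90.00],
    "Perplexity Pro": [90.00, 70.00, 95.00, 90.00, 80.00, 80.00],
}

for product, accs in scores.items():
    avg = round(sum(accs) / len(accs), 2)
    print(f"{product}: {avg}%")
# Felo Enhanced: 86.67%
# Perplexity Pro: 84.17%
```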
Evaluation Data
To comprehensively assess the performance of the above AI search engines in a multilingual environment, we conducted tests in six languages: Japanese, English, Traditional Chinese, Simplified Chinese, Russian, and Korean. We have open-sourced all test datasets and results, which can be found on GitHub at the following URL:
Dataset: https://github.com/sparticleinc/ASEED
Evaluation Methodology
In this round, we migrated our testing platform from Ragas to Promptfoo. Promptfoo offers multiple answer detection rules and achieves higher overall accuracy in evaluations compared to Ragas. Therefore, we will only update test cases in the Promptfoo format going forward. For this evaluation, we utilized the Promptfoo platform and conducted tests using the GPT-4o large language model. All test results underwent manual verification to ensure accuracy.
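For illustration, a Promptfoo test of the kind described above can be expressed as a small YAML configuration. The sketch below is a hypothetical example, not a case from the published dataset: the query, rubric text, and file name are assumptions, while the provider reflects the GPT-4o model mentioned above.

```yaml
# promptfooconfig.yaml — hypothetical sketch of one evaluation case.
# In the real dataset, queries span Japanese, English, Traditional Chinese,
# Simplified Chinese, Russian, and Korean.
prompts:
  - "{{query}}"

providers:
  - openai:gpt-4o   # model used for this round of evaluation

tests:
  - vars:
      query: "What major tech conferences are happening in Tokyo this month?"
    assert:
      # One of Promptfoo's answer-detection rules: grade the response
      # against a natural-language rubric.
      - type: llm-rubric
        value: "The answer names real, current events taking place in Tokyo."
```

Each assertion produces a pass/fail per test case, and the per-category accuracy is the pass rate over that category's cases; as noted above, all automated judgments were then manually verified.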