πŸ“Š AI Search Engine Multilingual Evaluation Report (v1.2)

In our previous evaluation, we assessed the multilingual and cross-lingual search capabilities of Perplexity Basic, Perplexity Pro, Metaso, iAsk, and you.com; among these, Perplexity Pro consistently performed best. This time we added a new AI search player, Felo AI, testing both its Basic and Enhanced editions. Although Felo Enhanced requires installing a browser plugin, it remains free to use.

Evaluation Results

After rigorous evaluation, we have reached the following conclusions:

  1. Felo Enhanced leads Perplexity Pro by a slight margin, achieving an accuracy of 86.67% compared to Perplexity Pro's 84.17%.

  2. Felo Basic also surpasses Perplexity Basic with an accuracy of 81.67%, compared to 72.50%.

  3. Metaso performs well in searches within the Simplified Chinese community in mainland China but shows only average performance in cross-lingual searches, with an overall accuracy of 66.67%.

  4. iAsk and you.com demonstrate poor performance in searches outside of English-speaking communities, with accuracies of 52.50% and 33.33%, respectively.

| Product Name | Average ACC | complex_search | business_consulting | local_search | products_search | real_time_news | technical_consulting |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Felo Enhanced | 86.67% | 90.00% | 65.00% | 100.00% | 90.00% | 85.00% | 90.00% |
| Perplexity Pro | 84.17% | 90.00% | 70.00% | 95.00% | 90.00% | 80.00% | 80.00% |
| Felo Basic | 81.67% | 90.00% | 55.00% | 95.00% | 85.00% | 90.00% | 75.00% |
| Perplexity Basic | 72.50% | 90.00% | 50.00% | 80.00% | 85.00% | 60.00% | 70.00% |
| Metaso | 66.67% | 50.00% | 75.00% | 80.00% | 75.00% | 65.00% | 55.00% |
| iAsk | 52.50% | 40.00% | 45.00% | 35.00% | 65.00% | 65.00% | 65.00% |
| you.com | 33.33% | 40.00% | 20.00% | 15.00% | 40.00% | 55.00% | 30.00% |
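The Average ACC column is the unweighted mean of the six category accuracies. A minimal Python sketch of that computation, with the scores transcribed from the table above (variable names are our own):

```python
# Average ACC as the unweighted mean of the six category scores, in the
# order: complex_search, business_consulting, local_search,
# products_search, real_time_news, technical_consulting.
category_scores = {
    "Felo Enhanced":    [90.00, 65.00, 100.00, 90.00, 85.00, 90.00],
    "Perplexity Pro":   [90.00, 70.00, 95.00, 90.00, 80.00, 80.00],
    "Felo Basic":       [90.00, 55.00, 95.00, 85.00, 90.00, 75.00],
    "Perplexity Basic": [90.00, 50.00, 80.00, 85.00, 60.00, 70.00],
    "Metaso":           [50.00, 75.00, 80.00, 75.00, 65.00, 55.00],
    "iAsk":             [40.00, 45.00, 35.00, 65.00, 65.00, 65.00],
    "you.com":          [40.00, 20.00, 15.00, 40.00, 55.00, 30.00],
}

# Mean over the six categories, rounded to two decimals.
average_acc = {
    product: round(sum(scores) / len(scores), 2)
    for product, scores in category_scores.items()
}

for product, acc in average_acc.items():
    print(f"{product}: {acc}%")
```

For example, Felo Enhanced's six category scores sum to 520, and 520 / 6 β‰ˆ 86.67%, matching the table.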

Evaluation Data

To comprehensively assess the performance of the above AI search engines in a multilingual environment, we conducted tests in six languages: Japanese, English, Traditional Chinese, Simplified Chinese, Russian, and Korean. We have open-sourced all test datasets and results, which can be found on GitHub at the following URL:

Dataset: https://github.com/sparticleinc/ASEED

Evaluation Methodology

In this round, we migrated our testing platform from Ragas to Promptfoo. Promptfoo offers multiple answer-detection rules and achieves higher overall evaluation accuracy than Ragas, so going forward we will maintain test cases only in the Promptfoo format. For this evaluation, we ran the tests on the Promptfoo platform using the GPT-4o large language model, and all test results underwent manual verification to ensure accuracy.
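As an illustration of this kind of setup (a sketch, not our actual configuration), a Promptfoo evaluation is driven by a YAML file that declares prompts, providers, and per-case assertions. The query and rubric below are hypothetical placeholders, not items from our dataset; the real test cases are in the ASEED repository linked above.

```yaml
# promptfooconfig.yaml — illustrative sketch only.
prompts:
  - "{{query}}"

providers:
  - openai:gpt-4o   # model used in this evaluation round

tests:
  # Hypothetical multilingual test case.
  - vars:
      query: "東京で今日開催されてγ„γ‚‹ζŠ€θ‘“γ‚«γƒ³γƒ•γ‚‘γƒ¬γƒ³γ‚Ήγ―οΌŸ"  # "What tech conferences are on in Tokyo today?"
    assert:
      - type: llm-rubric
        value: "The answer names a real tech conference taking place in Tokyo today."
```

Promptfoo supports several assertion types (e.g. string containment and model-graded rubrics), which is what the "multiple answer detection rules" above refers to.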
