⚗️AI Search Engine Multilingual Evaluation Report - Complex Query (v1.1)

Conclusion

In our last assessment, we observed that existing AI search engines fell short on complex questions. Because such questions come up frequently in everyday work and life, this evaluation focuses on how well AI search engines handle them.

During the evaluation, however, we found that the Basic versions of the tested products did not perform satisfactorily. We therefore added Perplexity Pro to the test scope to see how well the best products on the market can perform. After rigorous testing, we reached the following conclusions:

Perplexity Pro significantly outperformed the other products, achieving an accuracy rate of 80%, while the free versions of the other products were not up to par.
When the retrieved sources are insufficient, LLMs tend to fall back on their own knowledge to infer an answer, which leads to frequent hallucinations.
The LLMs that generate answers for Metaso and Perplexity (basic) performed poorly, often giving incorrect answers even when the relevant information was available.

Evaluation Data

In our evaluation, complex problems are defined as those involving multiple points of information, with answers that require consolidation or reasoning. To ensure the validity of our tests, we deliberately avoided questions that could be directly answered by a single webpage. Our test cases were primarily constructed around the following two scenarios:

Comparative Questions: These involve comparing entities from different time periods or groups.
Composite Questions: These contain two distinct sub-questions that require a comprehensive answer.

The multilingual distribution of our test cases remains consistent with the previous edition. Our complex problem test cases have been made open source:

Dataset: https://github.com/sparticleinc/ASEED/blob/main/datasets/complex_search.csv
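As a starting point for working with the dataset, the following minimal sketch inspects a CSV file without assuming anything about its columns. The function name `summarize_csv` and the local file name in the usage comment are assumptions made for illustration; the report does not describe the file's schema, so the sketch only reports whatever header it finds.

```python
import csv
from typing import Iterable, List, Tuple


def summarize_csv(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return the header columns and the number of data rows in CSV text.

    `lines` can be an open text file or any iterable of CSV lines.
    No column names are assumed; they are read from the header row.
    """
    reader = csv.DictReader(lines)
    row_count = sum(1 for _ in reader)
    return list(reader.fieldnames or []), row_count


# Hypothetical usage, assuming the dataset was saved locally as
# complex_search.csv:
#
# with open("complex_search.csv", newline="", encoding="utf-8") as f:
#     columns, row_count = summarize_csv(f)
#     print(columns, row_count)
```

Reading the header first is a cautious choice here: it shows which fields the published file actually contains before any per-language or per-category analysis is attempted.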

Case Analysis

Question: Which movie is longer, Hamlet or Gone with the Wind?

Ground truth: Hamlet is the longer movie. The key details are:
Hamlet (1996), directed by Kenneth Branagh, has a runtime of 242 minutes (4 hours 2 minutes).
Gone with the Wind (1939) has a runtime of 221 minutes (3 hours 41 minutes).
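The arithmetic behind this ground truth can be checked with a short sketch. It uses only the two runtimes stated in the ground truth; the helper name `format_runtime` and the dictionary layout are illustrative assumptions, not part of the evaluation tooling.

```python
def format_runtime(minutes: int) -> str:
    """Convert a runtime in minutes to an 'H hours M minutes' string."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hours {mins} minutes"


# Runtimes in minutes, taken from the ground truth.
runtimes = {
    "Hamlet (1996)": 242,
    "Gone with the Wind (1939)": 221,
}

# The longer film is the one with the larger runtime.
longer = max(runtimes, key=runtimes.get)
difference = max(runtimes.values()) - min(runtimes.values())

for title, minutes in runtimes.items():
    print(f"{title}: {minutes} minutes ({format_runtime(minutes)})")
print(f"Longer: {longer}, by {difference} minutes")
```

With the ground-truth figures, Hamlet is longer by 21 minutes. An engine can still reach the correct conclusion from a different Gone with the Wind runtime, as long as that runtime stays below 242 minutes.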

Comment: Perplexity Pro performed exceptionally well, providing accurate runtimes and the correct answer. iAsk and You.com also reached the correct conclusion. Perplexity (basic), however, contradicted itself: it opened by stating that Gone with the Wind is longer, then concluded from its own figures that Hamlet is longer, which points to room for improvement in the quality of its large language model (LLM). Metaso showed a similar problem: it retrieved runtimes that favor Hamlet (242 minutes versus at most 238 minutes) yet still gave the wrong answer.

Engine: Perplexity Pro
Answer: When comparing the length of the movies "Hamlet" and "Gone with the Wind," the runtime of each film plays a crucial role. "Hamlet," directed by Kenneth Branagh in 1996, has a runtime of 242 minutes, which is approximately 4 hours and 2 minutes. This adaptation of Shakespeare's play is known for its faithfulness to the original script, including every scene. On the other hand, "Gone with the Wind," released in 1939, has a slightly shorter runtime of 221 minutes, or about 3 hours and 41 minutes. Despite having a runtime that is slightly shorter than "Hamlet," "Gone with the Wind" is an epic historical drama that garnered critical acclaim and won 10 Oscars. In summary, "Hamlet" is longer in terms of total runtime compared to "Gone with the Wind." While both films are known for their epic storytelling and memorable characters, the dedication to including every scene in "Hamlet" contributes to its longer runtime.
Accuracy: True

Engine: Perplexity
Answer: Gone with the Wind is longer than Hamlet. According to the search results: Gone with the Wind has a runtime of 3 hours and 58 minutes (238 minutes). Hamlet, directed by Kenneth Branagh in 1996, has a runtime of 4 hours and 2 minutes (242 minutes). So while both are very long epic films, Hamlet at 242 minutes is slightly longer than Gone with the Wind at 238 minutes.
Accuracy: False

Engine: Metaso
Answer (translated from Chinese): The movie Gone with the Wind (《乱世佳人》) has several versions with different runtimes, including 238 minutes, 223 minutes (1969 Swedish version), and 224 minutes (1994 UK version). The movie Hamlet (《哈姆雷特》) has a runtime of 242 minutes. Therefore, it can be determined that Gone with the Wind is longer than Hamlet.
Accuracy: False

Engine: iAsk
Answer: Hamlet is longer than Gone with the Wind. Hamlet has a runtime of 4 hours and 2 minutes, while Gone with the Wind has a runtime of 3 hours and 58 minutes. These sources were consulted to gather accurate information on the runtimes of the movies "Hamlet" and "Gone with the Wind" for comparison.
Accuracy: True

Engine: You.com
Answer: Comparing the Length of Hamlet and Gone with the Wind: Kenneth Branagh's 1996 film adaptation of Hamlet is longer than Gone with the Wind. The 1996 film adaptation of Hamlet, directed by Kenneth Branagh, has a runtime of just over four hours, making it one of the longest adaptations of the play
Accuracy: True

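Question: 劉德華比周傳雄（小剛）大幾歲？ (How many years older is Andy Lau than Steve Chou, also known as Xiao Gang?)

Ground truth: Andy Lau was born on September 27, 1961, and Steve Chou (Xiao Gang) was born on June 7, 1969. Andy Lau is therefore about 8 years older than Steve Chou.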
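The ground truth says "about 8 years" rather than an exact figure, and a short date calculation shows why. The sketch below uses only the two birth dates stated in the ground truth for Andy Lau (劉德華) and Steve Chou (周傳雄); the helper `full_years_between` and the variable names are assumptions introduced for illustration.

```python
from datetime import date


def full_years_between(earlier: date, later: date) -> int:
    """Completed years from `earlier` to `later` (the usual age calculation)."""
    years = later.year - earlier.year
    # Subtract one if the anniversary has not yet been reached in the later year.
    if (later.month, later.day) < (earlier.month, earlier.day):
        years -= 1
    return years


# Birth dates as stated in the ground truth.
andy_lau_birth = date(1961, 9, 27)
steve_chou_birth = date(1969, 6, 7)

# Three ways to express the gap: by birth year, by completed years,
# and as a fractional number of years.
birth_year_gap = steve_chou_birth.year - andy_lau_birth.year
completed_years = full_years_between(andy_lau_birth, steve_chou_birth)
gap_in_years = (steve_chou_birth - andy_lau_birth).days / 365.25

print(birth_year_gap, completed_years, round(gap_in_years, 1))
```

The birth years differ by 8, while the exact gap is a little under 8 years (7 completed years). That is consistent with a ground truth of "about 8", and it means answers of 6 or 10 years fall outside any reasonable rounding.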

Comment: Perplexity Pro excelled, providing accurate birth dates and the correct answer. All other engines answered incorrectly, mainly because they failed to retrieve the relevant information. Metaso attempted to infer the answer from the data it had, but both of its deductions were incorrect.

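Engine: Perplexity Pro
Answer (translated from Chinese): Andy Lau was born on September 27, 1961, and Steve Chou (Xiao Gang) was born on June 7, 1969. Therefore, Andy Lau is 8 years older than Steve Chou.
Accuracy: True

Engine: Perplexity
Answer (translated from Chinese): Based on the search results, the exact age difference between Andy Lau and Steve Chou cannot be determined. The search results mention some background on Steve Chou, such as that at age 19 he took part in the "Taiwan Campus Singing Competition" and was selected as a candidate member of the Little Tigers (小虎隊), but ultimately gave up the chance to join the group. They also mention that Andy Lau performed the song "Gong Xi Fa Cai" (《恭喜發財》) at the 2023 Spring Festival Gala. However, neither person's birth year is given directly, so their age difference cannot be calculated accurately.
Accuracy: False

Engine: Metaso
Answer (translated from Chinese): Steve Chou was mentioned as being 40 years old on August 23, 2019, and on October 31, 2021, his age was updated to 52. This indicates that, as of 2021, Steve Chou's birth year should be 1969 (assuming the current year is 2023). Andy Lau's exact birth year is not directly mentioned in the material I found, but based on his fame and career, it is reasonable to guess that he was born in the early 1970s. If we take the assumption that Steve Chou was born in 1969, he would be 54 in 2023. If Andy Lau was also born in the early 1970s, for example in 1972 or 1973, he would be 51 or 50 in 2023. This means that Andy Lau is about 4 to 6 years younger than Steve Chou. Therefore, based on this information, we can infer that Andy Lau is about 4 to 6 years older than Steve Chou.
Accuracy: False

Engine: iAsk
Answer (partly translated from Chinese): Andy Lau is 6 years older than Steve Chou (Xiao Gang). The sources provided detailed information about the careers and backgrounds of both Andy Lau (劉德華) and the individual known as Xiao Gang (周傳雄), including their ages and accomplishments in the entertainment industry.
Accuracy: False

Engine: You.com
Answer (translated from Chinese): Age comparison: Andy Lau is 10 years older than Steve Chou (Xiao Gang).
Accuracy: False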
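The per-engine verdicts in the two cases above can be tallied with a small sketch. It covers only these two illustrative cases, not the full dataset, so its output is not the overall accuracy reported in the Conclusion. The case labels, the `CASE_RESULTS` list, and the function `accuracy_by_engine` are assumptions introduced for illustration.

```python
from collections import defaultdict

# (case, engine, judged correct), copied from the two case tables above.
CASE_RESULTS = [
    ("movie runtime", "Perplexity Pro", True),
    ("movie runtime", "Perplexity", False),
    ("movie runtime", "Metaso", False),
    ("movie runtime", "iAsk", True),
    ("movie runtime", "You.com", True),
    ("age difference", "Perplexity Pro", True),
    ("age difference", "Perplexity", False),
    ("age difference", "Metaso", False),
    ("age difference", "iAsk", False),
    ("age difference", "You.com", False),
]


def accuracy_by_engine(results):
    """Return the fraction of correctly answered cases for each engine."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for _case, engine, is_correct in results:
        total[engine] += 1
        correct[engine] += int(is_correct)
    return {engine: correct[engine] / total[engine] for engine in total}


for engine, accuracy in accuracy_by_engine(CASE_RESULTS).items():
    print(f"{engine}: {accuracy:.0%}")
```

On these two cases, Perplexity Pro is correct on both, iAsk and You.com on one each, and Perplexity (basic) and Metaso on neither.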


Last updated 1 year ago
