AIMultiple aims to help buyers identify the right writing assistant for their business.
AIMultiple’s first AI writer benchmark will aim to help marketing teams choose the writing assistant that best fits their business needs. The benchmark will assess these aspects:
- For the resulting articles:
  - Readability
  - Truthfulness
  - Correct use of English and grammar
  - Je ne sais quoi (i.e. how attractive / engaging the article is)
- Customer service
- Total cost of ownership
What will be the guiding principles?
AIMultiple’s benchmark methodology is designed for an objective and transparent assessment. It also explains participation requirements.
What will be benchmarked?
AIMultiple will submit prompts through the UI provided by each AI writing assistant and evaluate the resulting articles.
What is the benchmark dataset?
50 prompts will be created by the AIMultiple team: 25 will be B2C focused and 25 will be B2B focused. They will be a mix of top-of-the-funnel, middle-of-the-funnel, and bottom-of-the-funnel articles.
What is required from the AI writing assistant?
The complete article needs to be returned within 5 minutes of receiving the prompt.
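The time requirement can be sketched as a simple check on two timestamps. This is only an illustration: the function name `returned_in_time`, the use of manually recorded timestamps, and treating exactly 5 minutes as a pass are assumptions, not part of AIMultiple's published methodology.

```python
from datetime import datetime, timedelta

# The 5-minute limit comes from the requirement above.
TIME_LIMIT = timedelta(minutes=5)


def returned_in_time(prompt_sent_at: datetime, article_returned_at: datetime) -> bool:
    """Return True if the article arrived within the time limit.

    Assumption: an article returned at exactly 5 minutes still passes.
    """
    elapsed = article_returned_at - prompt_sent_at
    if elapsed < timedelta(0):
        raise ValueError("article cannot be returned before the prompt is sent")
    return elapsed <= TIME_LIMIT


# Hypothetical timestamps recorded by the person running the benchmark.
sent = datetime(2023, 1, 1, 12, 0, 0)
print(returned_in_time(sent, sent + timedelta(minutes=4, seconds=30)))
print(returned_in_time(sent, sent + timedelta(minutes=5, seconds=1)))
```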
How will AIMultiple perform the benchmark?
AIMultiple’s AI writing assistant benchmark aims to closely match the preferences of buyers. They want a solution that produces articles that are as close as possible to publishable quality. Therefore, AIMultiple will measure these metrics:
- For the resulting articles, industry analysts on AIMultiple’s team who have extensive online writing experience will score each article out of 10. Each evaluator must have produced online articles that receive thousands of visitors per month on competitive topics. Results will be the average of 5 evaluators’ assessments in these dimensions:
  - Je ne sais quoi (i.e. how attractive / engaging the article is)
- Correct use of English and grammar will be measured for each vendor by counting the number of mistakes. AIMultiple will share a ratio of grammar mistakes per 1,000 words for each solution.
- Customer service: Reviews on B2B review platforms will be analyzed to assess customer satisfaction.
- Speed: If there are significant differences in speed between the vendors, this will be highlighted.
- Other features
- Total cost of ownership: Public cost data published by the vendors will be used to calculate what running the benchmark costs with each vendor. Vendors’ cost models will also be shared to help buyers compare prices of different vendors.
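The two numeric metrics above (the average of the evaluators' scores and the grammar mistakes per 1,000 words) reduce to simple arithmetic. The sketch below shows one way to compute them. The function names and the sample numbers are hypothetical and chosen only for illustration; they are not benchmark results.

```python
from statistics import mean


def average_score(evaluator_scores):
    """Average the out-of-10 ratings given by the evaluators for one dimension."""
    if not evaluator_scores:
        raise ValueError("at least one evaluator score is required")
    return mean(evaluator_scores)


def mistakes_per_1000_words(mistake_count, word_count):
    """Normalize a raw grammar mistake count to a per-1,000-words ratio."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return mistake_count * 1000 / word_count


# Hypothetical inputs: five evaluators rating one dimension for one vendor,
# and 12 mistakes counted across 8,000 words of generated articles.
engagement = average_score([7, 8, 6, 9, 7])
grammar_ratio = mistakes_per_1000_words(12, 8000)
print(f"engagement: {engagement:.1f} / 10")
print(f"grammar mistakes per 1,000 words: {grammar_ratio:.2f}")
```

Normalizing by word count keeps the grammar metric comparable between assistants that return articles of different lengths.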
How will the results be published?
The results will be published on AIMultiple.com and will feature graphs that users can use to find the right vendor for their business. Different metrics (e.g. manual effort) will be presented separately to create transparency for buyers.
Each participant will receive their detailed results as well as the average results.
Challenges
Writers would normally use the AI assistant output as a starting point, not as the final product. This benchmark aims to measure the quality of this initial product. It would also be interesting to know how the AI assistant supports the writing process. However, measuring writers’ preferences during their writing process would introduce more subjectivity, so we will not consider that in this assessment.
Please note that AIMultiple is in the design phase of the benchmark and changes will be made as AIMultiple gets end user feedback and finalizes the benchmark.
Reach out to the AIMultiple team via [email protected] if you would like to participate in the AIMultiple AI writer benchmark.