Donald Hobern<p>Idea for a non-<a href="https://scicomm.xyz/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://scicomm.xyz/tags/research" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>research</span></a> tool (probably exists somewhere)...</p><p>I was asked to comment on some "reports" generated by "AI" tools (specifically perplexity.ai and scispace.com). The results looked superficially good, but were still just party-trick report-a-like documents. In particular, the alleged references didn't anchor the assertions, many sentences misaligned predicates with complex subjects, and overall the prose was turgid, turgid, turgid. The results would have been better if the tools had simply extracted statistically interesting and relevant sentences from each source without claiming to add further value.</p><p>This made me wonder whether a tool exists that follows the steps below.</p><p>1. Perform a trawl for papers relevant to the topic under consideration based on the user prompt, prioritising those with high impact factors and those that are most recent.</p><p>2. Build a citation graph for this set of papers and all the papers they reference.</p><p>3. Retrieve all these documents.</p><p>4. Create a word map for the set of retrieved documents to assess which words and multi-word terms appear more frequently in this corpus than in the wider research literature - these are the primary terms of interest to this community.</p><p>5. For each of the documents in the corpus, evaluate which words and terms appear more frequently in that paper than in either the wider research literature or the primary-term word map we created - these indicate the specific focus of each paper.</p><p>6. Find the sentences that look likely to be most semantically significant as the key thesis for each paper. 
This would be based on:</p><p>- prevalence of the identified primary-terms in each sentence</p><p>- focus on sentences with structures that seem to assert inter-relationships between these terms ("This paper explores squinkiness, pinkiness and flinkiness of blorbs" is less likely to be significant than "The flinkiness of a blorb is indirectly correlated with the product of its squinkiness and pinkiness" - a clue is in the fact that the second sentence uses key terms in different clauses)</p><p>- a particular weighting towards abstracts, results and discussions.</p><p>7. Generate a faceted view of all the most highly-ranked sentences explorable by article impact, key terms, publication date, etc.</p><p>It feels like it would be a much better research tool, based entirely on statistical approaches. I would not want this to be used on-demand (still way too much wasted compute), but wouldn't this be better than the <a href="https://scicomm.xyz/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> <a href="https://scicomm.xyz/tags/parrots" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>parrots</span></a>?</p>
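<p>As a rough illustration of steps 4 to 6 above, here is a minimal Python sketch. Everything named in it is hypothetical: <code>distinctive_terms</code>, <code>score_sentence</code>, the section weights and the list of relational cue words are my assumptions, not part of any existing tool. It uses a smoothed log-ratio of relative frequencies as one possible way to find over-represented terms, and a crude cue-word split as a stand-in for real clause detection. It handles single words only, not multi-word terms.</p>

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphabetic tokens (single words only in this sketch)."""
    return re.findall(r"[a-z]+", text.lower())


def distinctive_terms(focus_tokens, background_tokens, top_n=5, min_count=2):
    """Rank terms that are over-represented in the focus text.

    Uses an add-one smoothed log-ratio of relative frequencies. This is an
    assumed scoring choice; other keyness statistics would also fit.
    Step 4: focus = corpus, background = wider literature.
    Step 5: focus = one paper, background = corpus.
    """
    focus, background = Counter(focus_tokens), Counter(background_tokens)
    n_focus, n_back = sum(focus.values()), sum(background.values())
    vocab = len(set(focus) | set(background))
    scores = {}
    for term, count in focus.items():
        if count < min_count:
            continue  # ignore terms too rare to judge
        p_focus = (count + 1) / (n_focus + vocab)
        p_back = (background[term] + 1) / (n_back + vocab)
        scores[term] = math.log(p_focus / p_back)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Assumed weights favouring abstracts, results and discussions.
SECTION_WEIGHTS = {"abstract": 1.5, "results": 1.5, "discussion": 1.5}

# Assumed, very incomplete list of words that tend to separate the two
# sides of an asserted relationship.
RELATION_CUES = re.compile(
    r"\b(?:is|are|was|were|causes?|predicts?|increases?|decreases?)\b"
)


def score_sentence(sentence: str, key_terms: set[str], section: str = "body") -> float:
    """Step 6: count distinct key terms, doubled if they span clauses."""
    clauses = RELATION_CUES.split(sentence.lower())
    clause_hits = [set(tokenize(clause)) & key_terms for clause in clauses]
    terms_found = set().union(*clause_hits)
    clauses_with_terms = sum(1 for hits in clause_hits if hits)
    relational = 2.0 if clauses_with_terms >= 2 else 1.0
    return len(terms_found) * relational * SECTION_WEIGHTS.get(section, 1.0)


if __name__ == "__main__":
    terms = {"squinkiness", "pinkiness", "flinkiness"}
    listing = "This paper explores squinkiness, pinkiness and flinkiness of blorbs"
    claim = ("The flinkiness of a blorb is indirectly correlated with the "
             "product of its squinkiness and pinkiness")
    print(score_sentence(listing, terms))            # 3.0
    print(score_sentence(claim, terms))              # 6.0
    print(score_sentence(claim, terms, "abstract"))  # 9.0
```

<p>Both example sentences contain the same three key terms, but only the second places them on either side of a relational cue, so it scores twice as high. A real version would presumably need proper syntactic parsing rather than a word list, plus the faceted view from step 7.</p>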