13 Sep 2025

AI News Digest

🤖 AI-curated 8 stories

Today's Summary

Elon Musk’s xAI laid off about 500 data annotators as it shifts toward hiring domain-specific ‘AI tutors’, a move that shows how AI training and human oversight are changing as models mature. Replit raised $250M and launched Agent 3, an autonomous coding agent that helps both coders and non-coders automate software development tasks. Meanwhile, a preprint from University of Pennsylvania researchers introduces ResearchQA, a benchmark for evaluating scholarly question-answering across 75 fields.

Stories

xAI cuts ~500 data annotators as it pivots to specialist 'AI tutors'

Elon Musk’s xAI has laid off at least 500 members of its data‑annotation team, the company’s largest group, as it reorganizes to prioritize domain‑specific ‘specialist’ AI tutors, a team it plans to expand “10x”. The move, reported by Business Insider on Sept. 13, 2025, reflects a shift from broad, generalist human data labeling toward more targeted, higher‑skill human oversight for training and fine‑tuning its Grok chatbot. Industry impact: the change signals how AI firms are refining human‑in‑the‑loop workflows (and cost bases) as models mature, and it underscores continued volatility in staffing models for AI training and safety roles.

Replit raises $250M at $3B valuation and launches Agent 3 autonomous coding agent

Replit said it raised $250 million in new funding, valuing the company at about $3 billion, and used the raise to accelerate product development, including the launch of Agent 3, an autonomous tool that tests and fixes code and can build custom agents and workflows. According to Reuters (Sept. 11, 2025), the round was led by Prysm Capital with strategic participation from Google's AI Futures Fund and others, which highlights heavy investor interest in 'code‑gen' platforms. Industry impact: the funding and Agent 3 release underscore growing momentum for tools that let non‑engineers and engineering teams automate software development tasks, intensifying competition among code‑generation startups and incumbent developer tool vendors.

Publisher analysis finds widespread LLM‑generated text in manuscripts and peer reviews — and a tool to spot it

Nature reports that the American Association for Cancer Research (AACR) used a Pangram Labs detection tool to screen tens of thousands of submissions and peer‑review reports and found a sharp rise in likely LLM‑generated text since ChatGPT’s release. AACR’s analysis (presented Sept. 3 at the International Congress on Peer Review) flagged ~23% of abstracts and ~5% of peer‑review comments in 2024 as likely AI‑generated, while fewer than 25% of authors disclosed using LLMs. The story highlights a practical research‑integrity challenge (AI‑polished or AI‑written methods sections can introduce errors) and a growing publisher response: automated screening and new policies. Impact: publishers, journals and conferences may expand automated detection and disclosure rules, which affects how researchers prepare manuscripts and how peer review is conducted and audited.

ResearchQA: a large, survey‑mined benchmark to evaluate scholarly question‑answering across 75 fields

An arXiv preprint from researchers at the University of Pennsylvania introduces ResearchQA — a resource that mines survey articles to produce ~21K research‑level queries with ~160K rubric items across 75 fields. The authors show how survey‑derived rubrics enable scalable, domain‑aware evaluation of long‑form, evidence‑based answers and use the benchmark to evaluate 18 systems, finding no system exceeds ~70–75% coverage of rubric items; citation and limitation‑reporting remain particularly weak. Why it matters: ResearchQA provides a multi‑disciplinary, high‑granularity benchmark for assessing research‑assistant models and retrieval‑augmented systems, offering a way to measure and improve scholarly QA capabilities (and citation/attribution fidelity) across many academic domains.

Read more → arXiv (University of Pennsylvania authors)

Elon Musk’s xAI cuts ~500 data annotators as it pivots to specialist ‘AI tutors’

xAI laid off roughly 500 members of its data-annotation / AI‑tutoring team in a reorganization that shifts the company away from broad generalist annotators toward domain-specific ‘specialist AI tutors’ (STEM, finance, medicine, safety, etc.). The cuts hit xAI’s largest team and come as the company restructures how it trains Grok — a move that could lower short‑term costs but risks disrupting model‑training operations and morale while signaling a tactical shift in how smaller AI players try to scale labeling and fine‑tuning. The story is important for the industry because it underscores ongoing pressure on AI firms to optimize human-in-the-loop workflows, and it highlights how firms balance labor cost, quality, and domain expertise when training large models.

Senior Apple AI/search exec Robby Walker to depart amid Siri delays and AI talent shakeup

Bloomberg (reported via Reuters) says Robby Walker, a senior Apple executive who led Siri until earlier this year and later ran the Answers/Information & Knowledge team, is planning to leave Apple next month. The exit follows a string of departures from Apple’s AI ranks and comes as the company delays major generative‑AI upgrades to Siri and pushes a new AI search effort. The move raises fresh questions about Apple’s pace in the AI race and talent retention as competitors aggressively hire top AI engineers — an industry dynamic with implications for product timelines, recruiting, and Apple’s ability to compete on consumer AI features.

Read more → Reuters (reporting Bloomberg)

Oboe launches an AI-powered learning platform that builds custom courses from a prompt

Oboe, created by the co‑founders of Anchor, debuted an AI learning platform that generates multi-format micro‑courses from simple prompts, producing text lessons, narrated audio, slides, quizzes and mini‑games. Users get five free course creations, with subscription tiers for heavier use. Oboe emphasizes active learning (flashcards/quizzes/games) and uses internal agentic checks to reduce hallucinations. Why it matters: it shows the next wave of consumer education apps using generative AI to assemble structured learning experiences quickly. That is useful for self‑study, lesson prep, and rapid upskilling, but it raises familiar accuracy and depth tradeoffs.

Google expands Gemini and NotebookLM with audio uploads, flashcards, quizzes and new report formats

Google updated its Gemini ecosystem: the Gemini mobile app now accepts audio uploads (free users get limited minutes; Pro/Ultra users more), Google Search’s AI Mode added five languages, and NotebookLM gained active‑learning features — flashcards, quizzes, and new report/blog/study‑guide formats (customizable tone/structure). Why it matters: these additions turn research and personal notes into interactive study materials and broaden language/accessibility — a practical boost for students, educators and learners using AI to turn documents and media into study tools and assessments.