📊 Full opportunity report: Engineering Is Automated. Research Is the Residual. on ThorstenMeyerAI.com — validation score, market gap, and execution plan.
TL;DR
AI systems now automate the majority of engineering tasks in AI research, reaching near-saturation in key benchmarks. However, the role of AI in research itself remains limited, with human creativity still essential. This shift could reshape AI development timelines and institutional strategies.
Recent data and expert analyses confirm that AI systems are now capable of automating most core engineering tasks in AI research, reaching near-saturation on multiple benchmarks. While AI can replicate engineering processes efficiently, its capacity to automate the research itself—such as hypothesis generation and creative problem-solving—remains uncertain. This development could significantly impact the pace and nature of AI innovation.
Six benchmarks measuring AI capability in AI R&D skills show rapid progress, with all approaching or reaching saturation within 16 to 21 months. For example, CORE-Bench, which tests research reproduction, improved from 21.5% in September 2024 to 95.5% in December 2025, with some experts declaring it ‘solved.’ Similarly, MLE-Bench, which assesses Kaggle competition performance, advanced from 16.9% to 64.4% in roughly 16 months, indicating that AI can now perform at mid-tier human levels on complex tasks.
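To make the pace concrete, the two score pairs quoted above can be turned into a gain in percentage points and an average monthly rate. This is a minimal sketch, not part of Clark's analysis: the `BenchmarkRun` class and `summarize` function are hypothetical names, the 15-month span for CORE-Bench is simply counted from the stated September 2024 and December 2025 dates, and the 16-month span for MLE-Bench takes the "roughly 16 months" figure at face value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One benchmark's start and end score (in percent) over a span of months."""
    name: str
    start_score: float
    end_score: float
    months: int

    @property
    def gain_points(self) -> float:
        # Absolute improvement in percentage points.
        return self.end_score - self.start_score

    @property
    def points_per_month(self) -> float:
        # Average linear rate; real progress on a bounded score is not linear.
        return self.gain_points / self.months


# Figures as quoted in the article; month counts are assumptions (see lead-in).
RUNS = [
    BenchmarkRun("CORE-Bench", 21.5, 95.5, 15),  # Sep 2024 -> Dec 2025
    BenchmarkRun("MLE-Bench", 16.9, 64.4, 16),   # "roughly 16 months"
]


def summarize(runs):
    """Map benchmark name -> (gain in points, points per month), rounded."""
    return {
        r.name: (round(r.gain_points, 1), round(r.points_per_month, 2))
        for r in runs
    }


if __name__ == "__main__":
    for name, (gain, rate) in summarize(RUNS).items():
        print(f"{name}: +{gain} points, about {rate} points/month")
```

An average rate is only a rough summary: scores are capped at 100%, so a benchmark near saturation has little room left to move regardless of underlying capability.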
Work on kernel design and optimization further illustrates this trend: models now generate GPU kernels and convert code with minimal human input. These advances suggest that the engineering component of AI R&D is largely automated, which lowers the marginal cost and time of reproducing experiments and building infrastructure. Whether AI can independently conduct novel research, such as formulating hypotheses or solving problems creatively, is less clear; experts note that research may depend on inherently human insight.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
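The optimistic "shots on goal" reading can be stated as a simple probability model. The sketch below assumes each attempt is independent and succeeds with the same small probability `p`; both the model and the function names are illustrative assumptions, not something Clark proposes. Under the pessimistic reading, `p` is effectively zero for the relevant kind of insight, so more attempts change nothing.

```python
def prob_at_least_one(p: float, attempts: int) -> float:
    """Chance of at least one discovery in `attempts` independent tries."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every attempt fails".
    return 1.0 - (1.0 - p) ** attempts


def expected_discoveries(p: float, attempts: int) -> float:
    """Expected number of discoveries under the same independence assumption."""
    return p * attempts


if __name__ == "__main__":
    # Hypothetical per-attempt success rate, chosen only for illustration.
    p = 0.001
    for attempts in (10, 1_000, 100_000):
        print(attempts, round(prob_at_least_one(p, attempts), 4),
              expected_discoveries(p, attempts))
```

The two readings differ only in what they assume about `p`: any fixed positive value makes discoveries scale with volume, while a value of zero makes volume irrelevant.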

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
Dimensions compared: productivity multiplier years; recursive loop operational.

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
Audiences: industry, academia, policymakers, investors, everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
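The asymmetric cost-of-being-wrong argument can be sketched as a minimax-regret comparison. The cost figures below are hypothetical placeholders chosen only to show the shape of the argument (building capacity has a modest cost in either world, while waiting is free if the moat holds and expensive if it closes); none of them come from Clark or from this article, and the function names are likewise illustrative.

```python
# Hypothetical costs in arbitrary units: action -> state of the world -> cost.
COSTS = {
    "build_now": {"moat_holds": 1.0, "moat_closes": 1.0},
    "wait": {"moat_holds": 0.0, "moat_closes": 10.0},
}


def max_regret(costs):
    """For each action, the worst-case gap to the best action in each state."""
    states = next(iter(costs.values())).keys()
    # Cheapest achievable cost in each state of the world.
    best = {s: min(row[s] for row in costs.values()) for s in states}
    return {
        action: max(row[s] - best[s] for s in states)
        for action, row in costs.items()
    }


def minimax_regret_choice(costs):
    """Pick the action whose worst-case regret is smallest."""
    regrets = max_regret(costs)
    return min(regrets, key=regrets.get)


if __name__ == "__main__":
    print(max_regret(COSTS))
    print(minimax_regret_choice(COSTS))
```

The conclusion depends entirely on the assumed asymmetry: if waiting were only slightly more costly than building in the world where the moat closes, the same calculation could favor waiting.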
Implications of AI Automating Engineering Tasks in AI R&D
The near-complete automation of engineering tasks in AI research could accelerate development cycles, reduce costs, and shift institutional focus toward higher-level research activities. Organizations might need to reconsider their resource allocation, emphasizing creative and strategic aspects over routine engineering work. However, the residual role of human researchers in generating novel ideas and hypotheses remains a critical factor, and the transition could redefine the landscape of AI innovation.
Recent Progress in AI R&D Capabilities and Benchmark Saturation
Over the past 18 months, multiple independent benchmarks measuring core AI research skills have shown rapid progress. CORE-Bench, which assesses research reproduction, has approached saturation, and MLE-Bench, which evaluates Kaggle competition performance, has climbed to mid-tier human levels, indicating that AI systems can now handle complex, routine engineering tasks with increasing reliability. This pattern aligns with earlier predictions about a ‘cascade’ of AI capabilities reaching measurement limits, and it suggests that the engineering side of AI R&D is close to being automated.
Experts like Thorsten Meyer have analyzed these trends, emphasizing that while engineering automation is advancing swiftly, the creative and hypothesis-driven aspects of research are less mature in AI systems. The question remains whether AI can eventually automate the entire research process or if human insight will always be necessary for breakthrough discoveries.
“The pattern of benchmarks approaching saturation indicates that AI can now automate vast swaths of AI engineering, but the residual research—those parts involving creativity and hypothesis generation—remains less certain.”
— Thorsten Meyer
Unresolved Questions About AI’s Role in Autonomous Research
While engineering tasks in AI R&D are approaching full automation, it remains unclear whether AI can independently conduct innovative research, including hypothesis formulation, experimental design, and creative problem-solving. Experts acknowledge that some aspects of research may inherently require human insight, but the extent to which AI can bridge this gap is still under investigation. The timeline and feasibility of fully automated research are uncertain.
Next Steps in Monitoring AI Capabilities and Institutional Responses
Researchers and organizations will continue to track benchmark progress to determine the limits of AI automation in engineering. Simultaneously, discussions are expected around how to adapt institutional structures, funding, and talent development in response to increasingly automated engineering workflows. The key question is whether AI will eventually automate the entire research cycle or if human researchers will remain essential for breakthrough discoveries.
Key Questions
What are the main benchmarks indicating AI automation in engineering?
CORE-Bench for research reproduction, MLE-Bench for Kaggle competition performance, and various kernel design benchmarks are the primary indicators of AI’s rapid progress toward automating engineering tasks.
Does this mean AI can now do all research independently?
No. While engineering tasks are nearing full automation, it is still uncertain whether AI can handle the creative, hypothesis-driven aspects of research, which will likely require human insight for the foreseeable future.
What impact could this have on AI research organizations?
Organizations may shift resources toward higher-level research activities, reduce routine engineering efforts, and focus on strategic, innovative problem-solving as AI handles engineering tasks more efficiently.
How soon might AI fully automate the entire research process?
The timeline remains uncertain. Experts suggest that while progress is rapid, the complete automation of research, including creativity and insight, could still be years away, and some believe it may never fully occur.
Source: ThorstenMeyerAI.com