Scholar — large language models

arxiv.org 📅 2022 📰 arXiv 📄 PDF

A global analysis of metrics used for measuring performance in natural language processing

👤 Kathrin Blagec; Georg Dorffner; Milad Moradi; Simon Ott; Matthias Samwald

Measuring the performance of natural language processing models is challenging. Traditionally used metrics, such as BLEU and ROUGE, originally devised for machine translation and summarization, have been shown to suffer from low correlation with human judgment and a lack of transferability to other tasks and languages.…

cs.CL cs.AI

arxiv.org 📅 2023 📰 arXiv 📄 PDF

Can Large Language Models design a Robot?

👤 Francesco Stella; Cosimo Della Santina; Josie Hughes

Large Language Models can lead researchers in the design of robots.…

cs.RO

semanticscholar.org 📅 2025 📰 Nature 🔖 5,401 citations

DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning

👤 DeepSeek-AI; Daya Guo; Dejian Yang; Haowei Zhang; Jun-Mei Song; Ruoyu Zhang; R. Xu; Qihao Zhu; Shirong Ma; Peiyi Wang; Xiaoling Bi; Xiaokang Zhang; Xingkai Yu; Yu Wu; Z. F. Wu; Zhibin Gou; Zhihong Shao; Zhuoshu Li; Ziyi Gao; A. Liu; Bing Xue; Bing-Li Wang; Bochao Wu; B. Feng; Chengda Lu; Chenggang Zhao; C. Deng; Chenyu Zhang; C. Ruan; Damai Dai; Deli Chen; Dong-Li Ji; Erhang Li; Fangyun Lin; Fucong Dai; Fuli Luo; Guangbo Hao; Guanting Chen; Guowei Li; H. Zhang; Han Bao; Hanwei Xu; Haocheng Wang; Honghui Din

General reasoning represents a long-standing and formidable challenge in artificial intelligence (AI). Recent breakthroughs, exemplified by large language models (LLMs) [1,2] and chain-of-thought (CoT) prompting [3], have achieved considerable success on foundational reasoning tasks. However, this success is heavily continge…

DOI: 10.1038/s41586-025-09422-z

arxiv.org 📅 2025 📰 arXiv 📄 PDF

Liars' Bench: Evaluating Lie Detectors for Language Models

👤 Kieron Kretschmar; Walter Laurito; Sharan Maiya; Samuel Marks

Prior work has introduced techniques for detecting when large language models (LLMs) lie, that is, generate statements they believe are false. However, these techniques are typically validated in narrow settings that do not capture the diverse lies LLMs can generate. We introduce LIARS' BENCH, a testbed consisting of 7…

cs.CL cs.AI

arxiv.org 📅 2026 📰 arXiv 📄 PDF

Spark-LLM-Eval: A Distributed Framework for Statistically Rigorous Large Language Model Evaluation

👤 Subhadip Mitra

Evaluating large language models at scale remains a practical bottleneck for many organizations. While existing evaluation frameworks work well for thousands of examples, they struggle when datasets grow to hundreds of thousands or millions of samples. This scale is common when assessing model behavior across diverse d…

cs.DC cs.CL cs.LG

arxiv.org 📅 2025 📰 arXiv 📄 PDF

Sentra-Guard: A Real-Time Multilingual Defense Against Adversarial LLM Prompts

👤 Md. Mehedi Hasan; Sk Tanzir Mehedi; Ziaur Rahman; Rafid Mostafiz; Md. Abir Hossain

This paper presents a real-time modular defense system named Sentra-Guard. The system detects and mitigates jailbreak and prompt injection attacks targeting large language models (LLMs). The framework uses a hybrid architecture with FAISS-indexed SBERT embedding representations that capture the semantic meaning of prom…

cs.CR cs.AI

arxiv.org 📅 2019 📰 arXiv 📄 PDF

A Benchmark Study of Machine Learning Models for Online Fake News Detection

👤 Junaed Younus Khan; Md. Tawkat Islam Khondaker; Sadia Afroz; Gias Uddin; Anindya Iqbal

The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of those focused on a specific type of news (such as political) which leads us to t…

cs.CL cs.IR cs.LG stat.ML

DOI: 10.1016/j.mlwa.2021.100032

arxiv.org 📅 2025 📰 arXiv 📄 PDF

Noise-Driven Persona Formation in Reflexive Neural Language Generation

👤 Toshiyuki Shigemura

This paper introduces the Luca-Noise Reflex Protocol (LN-RP), a computational framework for analyzing noise-driven persona emergence in large language models. By injecting stochastic noise seeds into the initial generation state, we observe nonlinear transitions in linguistic behavior across 152 generation cycles. Our …

cs.CL

arxiv.org 📅 2024 📰 arXiv 📄 PDF

SSFF: Investigating LLM Predictive Capabilities for Startup Success through a Multi-Agent Framework with Enhanced Explainability and Performance

👤 Xisen Wang; Yigit Ihlamur; Fuat Alican

LLM based agents have recently demonstrated strong potential in automating complex tasks, yet accurately predicting startup success remains an open challenge with few benchmarks and tailored frameworks. To address these limitations, we propose the Startup Success Forecasting Framework, an autonomous system that emulate…

cs.AI

arxiv.org 📅 2025 📰 arXiv 📄 PDF

Tell Me: An LLM-powered Mental Well-being Assistant with RAG, Synthetic Dialogue Generation, and Agentic Planning

👤 Trishala Jayesh Ahalpara

We present Tell Me, a mental well-being system that leverages advances in large language models to provide accessible, context-aware support for users and researchers. The system integrates three components: (i) a retrieval-augmented generation (RAG) assistant for personalized, knowledge-grounded dialogue; (ii) a synth…

cs.CL cs.AI cs.HC cs.LG
