Falcon-H1R-FP8: Accelerating Inference with Quantized Precision
Introducing Falcon-H1R-7B-FP8, a fully quantized version of the Falcon-H1R-7B model that packs both weights and activations into FP8 format. Using NVIDIA Model Optimizer's post-training quantization (PTQ) workflow, the FP8 model preserves the original BF16 quality while delivering a 1.2×–1.5× throughput boost and halving the GPU memory footprint (a minimal PTQ sketch follows the evaluation results below).

Evaluations

The FP8 variant retains essentially the same accuracy as BF16 across all three major reasoning benchmarks: AIME25 drops only 0.8 points (83.1% → 82.3%), LCB-v6 falls by 1.0 point (68.6% → 67.6%), and GPQA-D shows a negligible 0.1-point difference (61.3% → 61.2%). These results confirm that FP8 PTQ preserves benchmark performance while delivering substantial memory and throughput gains. ...
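For readers who want to reproduce a similar quantization pass, here is a minimal sketch of FP8 PTQ using NVIDIA Model Optimizer's `modelopt.torch.quantization` API. The model ID, calibration prompts, and calibration-set size are illustrative assumptions, not the exact recipe used to produce Falcon-H1R-7B-FP8.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import modelopt.torch.quantization as mtq

# Hypothetical Hugging Face repo id for the BF16 base model.
MODEL_ID = "tiiuae/Falcon-H1R-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="cuda"
)

# A tiny illustrative calibration set; real PTQ typically uses
# a few hundred representative samples.
calib_prompts = [
    "Explain the advantages of FP8 inference in one sentence.",
    "Solve: what is the derivative of x**3 + 2*x?",
]

def forward_loop(m):
    # ModelOpt calls this to observe activations and fit FP8 scales.
    for prompt in calib_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(m.device)
        with torch.no_grad():
            m(**inputs)

# Quantize both weights and activations to FP8 with the default FP8 recipe.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
mtq.print_quant_summary(model)  # inspect which layers were quantized
```

Because PTQ only calibrates scaling factors rather than retraining weights, a pass like this runs in minutes on a single GPU, which is what makes the near-lossless accuracy numbers above practical to obtain.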
