Tag: deep learning

Research Papers

LLM Token: A Comprehensive Analysis of Large Language Model Tokenization

Research Team · May 27, 2024

This paper provides a comprehensive analysis of tokenization in large language models, exploring the fundamental mechanisms that enable LLMs to process and understand text. The research examines various tokenization strategies, their impact on model performance, and the implications for natural language processing tasks.
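One widely used subword tokenization strategy is byte-pair encoding (BPE). As a rough illustration only (not code from the paper), the toy sketch below learns BPE merge rules over a character-level corpus; the corpus string, merge count, and function names are assumptions made for this example.

```python
from collections import Counter

def get_pair_counts(tokens):
    """Count adjacent symbol pairs in the current token sequence."""
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair, new_symbol):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules, starting from individual characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(tokens)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]          # most frequent adjacent pair
        tokens = merge_pair(tokens, pair, pair[0] + pair[1])
        merges.append(pair)
    return merges, tokens

merges, tokens = train_bpe("low lower lowest", num_merges=5)
print(merges)   # learned merge rules, e.g. [('l', 'o'), ('lo', 'w'), ...]
print(tokens)   # the corpus re-segmented into learned subword units
```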

AlphaEvolve: A Novel Approach to Neural Architecture Evolution

Research Team · March 20, 2024

A groundbreaking approach to neural architecture evolution that combines evolutionary algorithms with deep learning principles to optimize neural network architectures. This paper introduces a novel methodology for automatically discovering efficient neural network architectures through an evolutionary process.
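The abstract does not give implementation details. Purely as an illustration of what an evolutionary search over architectures can look like in general, here is a hypothetical toy loop that mutates layer-width encodings and keeps the fittest candidates; the encoding, mutation operators, and fitness function are invented for this sketch and are not the paper's method.

```python
import random

# Toy architecture encoding: a list of hidden-layer widths.
def random_architecture(max_layers=4, widths=(32, 64, 128, 256)):
    return [random.choice(widths) for _ in range(random.randint(1, max_layers))]

def mutate(arch, widths=(32, 64, 128, 256)):
    """Randomly resize one layer, add a layer, or remove a layer."""
    arch = arch[:]
    op = random.choice(["resize", "add", "remove"])
    if op == "add":
        arch.insert(random.randrange(len(arch) + 1), random.choice(widths))
    elif op == "remove" and len(arch) > 1:
        arch.pop(random.randrange(len(arch)))
    else:
        arch[random.randrange(len(arch))] = random.choice(widths)
    return arch

def fitness(arch):
    """Placeholder for validation accuracy; here it simply rewards smaller models."""
    return 1.0 / (1 + sum(arch))

def evolve(generations=20, population_size=10):
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]          # keep the fittest half
        children = [mutate(random.choice(parents))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())   # best architecture found under the toy fitness
```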

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit · December 6, 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
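The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula for a single head, without masking or learned projections; the toy shapes are chosen just for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_queries, n_keys) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # weighted sum of value vectors

# Toy example: 3 query positions attending over 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)           # (3, 8)
```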

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · October 11, 2018

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
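BERT learns these bidirectional representations with a masked language modeling objective: a fraction of input tokens is hidden and the model must predict them from both the left and right context. The sketch below shows only the input-masking step in simplified form; the full procedure in the paper also uses random/keep replacements and special tokens, and the token list here is an invented example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """Randomly replace tokens with [MASK]; the model is trained to recover
    the original tokens at the masked positions."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)        # prediction target at this position
        else:
            inputs.append(tok)
            labels.append(None)       # no loss on unmasked positions
    return inputs, labels

random.seed(3)
inputs, labels = mask_tokens("the model reads context in both directions".split())
print(inputs)
print(labels)
```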

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · December 10, 2015

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
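The key idea is that each block learns a residual function F(x) and outputs F(x) + x through an identity shortcut. A simplified PyTorch sketch of such a basic block is shown below; it omits the strided and projection-shortcut variants used in the full networks, and the channel count is arbitrary.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x), where F is two 3x3 convs."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # F(x) + x

block = BasicResidualBlock(channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])
```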

Blog Posts