Tag: deep learning

Research Papers

LLM Token: A Comprehensive Analysis of Large Language Model Tokenization

Research Team · May 27, 2024

This paper provides a comprehensive analysis of tokenization in large language models, exploring the fundamental mechanisms that enable LLMs to process and understand text. The research examines various tokenization strategies, their impact on model performance, and the implications for natural language processing tasks.
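One widely used subword tokenization strategy is byte-pair encoding (BPE). As a rough illustration only (not code from the paper), the toy sketch below learns BPE merge rules over a character-level corpus; the corpus string, merge count, and function names are assumptions made for this example.

```python
from collections import Counter

def get_pair_counts(tokens):
    """Count adjacent symbol pairs in the current token sequence."""
    return Counter(zip(tokens, tokens[1:]))

def merge_pair(tokens, pair, new_symbol):
    """Replace every occurrence of `pair` with the merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(new_symbol)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

def train_bpe(text, num_merges):
    """Learn `num_merges` merge rules, starting from individual characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(tokens)
        if not counts:
            break
        pair = counts.most_common(1)[0][0]          # most frequent adjacent pair
        tokens = merge_pair(tokens, pair, pair[0] + pair[1])
        merges.append(pair)
    return merges, tokens

merges, tokens = train_bpe("low lower lowest", num_merges=5)
print(merges)   # learned merge rules, e.g. [('l', 'o'), ('lo', 'w'), ...]
print(tokens)   # the corpus re-segmented into learned subword units
```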

AlphaEvolve: A Novel Approach to Neural Architecture Evolution

Research Team · March 20, 2024

A groundbreaking approach to neural architecture evolution that combines evolutionary algorithms with deep learning principles to optimize neural network architectures. This paper introduces a novel methodology for automatically discovering efficient neural network architectures through an evolutionary process.
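The abstract does not give implementation details. Purely as an illustration of what an evolutionary search over architectures can look like in general, here is a hypothetical toy loop that mutates layer-width encodings and keeps the fittest candidates; the encoding, mutation operators, and fitness function are invented for this sketch and are not the paper's method.

```python
import random

# Toy architecture encoding: a list of hidden-layer widths.
def random_architecture(max_layers=4, widths=(32, 64, 128, 256)):
    return [random.choice(widths) for _ in range(random.randint(1, max_layers))]

def mutate(arch, widths=(32, 64, 128, 256)):
    """Randomly resize one layer, add a layer, or remove a layer."""
    arch = arch[:]
    op = random.choice(["resize", "add", "remove"])
    if op == "add":
        arch.insert(random.randrange(len(arch) + 1), random.choice(widths))
    elif op == "remove" and len(arch) > 1:
        arch.pop(random.randrange(len(arch)))
    else:
        arch[random.randrange(len(arch))] = random.choice(widths)
    return arch

def fitness(arch):
    """Placeholder for validation accuracy; here it simply rewards smaller models."""
    return 1.0 / (1 + sum(arch))

def evolve(generations=20, population_size=10):
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]          # keep the fittest half
        children = [mutate(random.choice(parents))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)

print(evolve())   # best architecture found under the toy fitness
```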

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit · December 6, 2017

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely.
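The Transformer's core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that formula for a single head, without masking or learned projections; the toy shapes are chosen just for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # (n_queries, n_keys) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                                       # weighted sum of value vectors

# Toy example: 3 query positions attending over 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)           # (3, 8)
```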

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova · October 11, 2018

We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers.
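BERT learns these bidirectional representations with a masked language modeling objective: a fraction of input tokens is hidden and the model must predict them from both the left and right context. The sketch below shows only the input-masking step in simplified form; the full procedure in the paper also uses random/keep replacements and special tokens, and the token list here is an invented example.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15):
    """Randomly replace tokens with [MASK]; the model is trained to recover
    the original tokens at the masked positions."""
    inputs, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)        # prediction target at this position
        else:
            inputs.append(tok)
            labels.append(None)       # no loss on unmasked positions
    return inputs, labels

random.seed(3)
inputs, labels = mask_tokens("the model reads context in both directions".split())
print(inputs)
print(labels)
```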

Deep Residual Learning for Image Recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun · December 10, 2015

Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.
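The key idea is that each block learns a residual function F(x) and outputs F(x) + x through an identity shortcut. A simplified PyTorch sketch of such a basic block is shown below; it omits the strided and projection-shortcut variants used in the full networks, and the channel count is arbitrary.

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Simplified residual block: output = relu(F(x) + x), where F is two 3x3 convs."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                          # shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)      # F(x) + x

block = BasicResidualBlock(channels=16)
print(block(torch.randn(1, 16, 32, 32)).shape)   # torch.Size([1, 16, 32, 32])
```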

Blog Posts