Following Hugging Face’s Zephyr recipe
Generated with DALL-E
Finding good training hyperparameters for new LLMs is always difficult and time-consuming. With Zephyr Gemma 7B, Hugging Face seems to have found a good recipe for fine-tuning Gemma. They used a combination of distilled supervised fine-tuning and DPO similar to what they did for their original Zephyr…
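As an illustration of the second stage of such a recipe, here is a minimal, hypothetical sketch of a DPO run with Hugging Face TRL. The checkpoint, dataset, and hyperparameters are assumptions chosen for illustration, and argument names vary across TRL versions; this is not Hugging Face's exact training script.

```python
# Hypothetical sketch of the DPO stage with TRL (argument names vary by version).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Assumed starting point: a Gemma checkpoint that already went through the distilled SFT step.
model_name = "google/gemma-7b"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A preference dataset with "prompt", "chosen", and "rejected" fields
# (e.g., a suitably formatted version of HuggingFaceH4/ultrafeedback_binarized).
train_dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")

args = DPOConfig(
    output_dir="zephyr-gemma-dpo",
    beta=0.05,                       # strength of the KL penalty; a hyperparameter choice
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,                     # a frozen reference model is created internally if none is passed
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,             # newer TRL versions name this argument processing_class
)
trainer.train()
```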
How to efficiently outperform GPT-3.5 and Llama 2 70B
Image by 8385 from Pixabay
Most of the recent large language models (LLMs) use very similar neural architectures. For instance, the Falcon, Mistral, and Llama 2 models use a similar combination of self-attention and MLP modules. In contrast, Mistral AI, which also created Mistral 7B, just…
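For reference, here is a minimal PyTorch sketch of the decoder block these models share: self-attention followed by an MLP, each wrapped in a residual connection. The dimensions, plain LayerNorm, and ungated MLP are simplifications for illustration, not the exact layers of any of these models (which typically use RMSNorm and a gated SwiGLU MLP).

```python
# Minimal sketch of a Falcon/Mistral/Llama-2-style decoder block:
# self-attention + MLP, each with a pre-norm and a residual connection.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model: int = 4096, n_heads: int = 32, d_ff: int = 11008):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)   # real models often use RMSNorm instead
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp_norm = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(                # Llama/Mistral use a gated (SwiGLU) MLP
            nn.Linear(d_model, d_ff),
            nn.SiLU(),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.attn_norm(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out                          # residual around self-attention
        x = x + self.mlp(self.mlp_norm(x))        # residual around the MLP
        return x
```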
Mistral 7B aligned with IPO
Photo by Rishabh Dharmani on Unsplash
To become chat models, pre-trained large language models (LLMs) are fine-tuned on large datasets of instructions/questions paired with expected answers. While this simple fine-tuning yields convincing chat models, their answers may still be incoherent, biased, unethical, and unsafe from a human perspective. This is…
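To make the IPO objective concrete, here is a from-scratch sketch of its loss, which regresses the gap between the chosen and rejected log-likelihood ratios toward a fixed margin instead of using DPO's logistic loss. The function name and the tau value are illustrative assumptions, not the exact implementation used to align Mistral 7B.

```python
# Sketch of the IPO objective (Azar et al., 2023): squared error of the preference
# margin toward 1 / (2 * tau), computed from policy and reference log-probabilities.
import torch

def ipo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, tau: float = 0.1):
    # Log-likelihood ratios of the policy against the frozen reference model
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    margin = chosen_ratio - rejected_ratio
    # IPO replaces DPO's -log(sigmoid(beta * margin)) with a squared regression target
    return ((margin - 1.0 / (2.0 * tau)) ** 2).mean()
```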
Quantization-aware fine-tuning
Illustration by the author — Made with images from Pixabay (1,2)
State-of-the-art large language models (LLMs) are pre-trained with billions of parameters. While pre-trained LLMs can perform many tasks, they can become much better once fine-tuned. Thanks to LoRA, fine-tuning costs can be dramatically reduced. LoRA adds low-rank tensors, i.e., a small number…
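To see what these low-rank tensors look like in practice, here is a minimal from-scratch sketch of a LoRA-wrapped linear layer in PyTorch. The class name and the rank/alpha values are illustrative, not the peft library's implementation.

```python
# Minimal sketch of the LoRA idea: freeze the pre-trained weight and learn a
# low-rank update B @ A, which has far fewer trainable parameters.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # the original weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the low-rank update: W x + (B A) x * scaling
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```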
There is also a chat version. The models are available on the Hugging Face Hub. Falcon 180B is completely free and state-of-the-art, but it is also a huge model. Can it run on your computer? Unless your computer is ready for very intensive computing, it can’t run Falcon 180B out of the box. You will need to upgrade…
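One common way to shrink the memory footprint is quantization at load time. The sketch below assumes the transformers and bitsandbytes libraries and the gated tiiuae/falcon-180B repository on the Hub; even in 4-bit, the weights alone take on the order of 100 GB, so this still requires a large multi-GPU machine rather than a laptop.

```python
# Hypothetical sketch: loading Falcon 180B with 4-bit quantization (bitsandbytes).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-180B"  # gated repository; access must be requested on the Hub

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",   # spread the layers across the available GPUs, offloading if needed
)
```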
Almost all large language models (LLMs) rely on the Transformer neural architecture. While this architecture is praised for its efficiency, it has some well-known computational bottlenecks. During decoding, one of these bottlenecks is the computation of attention over pairs of key-value tensors for each token of the input. All these tensors…
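To make this bottleneck concrete, here is a minimal sketch of a key-value cache during decoding: the key and value tensors of past tokens are stored and reused, so each new token only computes attention against the cache. The function and tensor shapes are illustrative, not the implementation of any particular library.

```python
# Sketch of one decoding step with a growing key/value cache.
import torch

def decode_step(new_q, new_k, new_v, kv_cache):
    """new_q/new_k/new_v: (n_heads, 1, head_dim) tensors for the current token.
    kv_cache: None, or a tuple (keys, values) of shape (n_heads, past_len, head_dim)."""
    if kv_cache is None:
        keys, values = new_k, new_v
    else:
        keys = torch.cat([kv_cache[0], new_k], dim=1)    # append this token's key
        values = torch.cat([kv_cache[1], new_v], dim=1)  # append this token's value

    # Attention of the single new query against all cached keys/values
    scores = new_q @ keys.transpose(-2, -1) / keys.shape[-1] ** 0.5  # (n_heads, 1, seq_len)
    attn_out = torch.softmax(scores, dim=-1) @ values                # (n_heads, 1, head_dim)
    return attn_out, (keys, values)
```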
And outperforms Google Translate for the translation of literary works
Image from Pixabay
According to previous studies, GPT models perform as well as standard machine translation systems, e.g., Google Translate. These studies mostly focused on sentence-level translation, the default approach in machine translation, which translates sentences one by one without any context. Translating paragraphs…
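As a rough illustration of the difference, the sketch below sends a whole paragraph to a chat model in a single request, so the translation of each sentence can draw on the surrounding context. The model name, prompt, and example text are assumptions for illustration, not the setup used in the studies.

```python
# Hypothetical sketch: paragraph-level translation with a chat model,
# instead of translating each sentence in isolation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

paragraph = (
    "Elle posa le livre. Il était tard. "
    "Dehors, la pluie n'avait pas cessé depuis le matin."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model; an illustrative choice, not the one evaluated
    messages=[
        {"role": "system", "content": "You are a literary translator. Translate French to English."},
        {"role": "user", "content": paragraph},
    ],
)
print(response.choices[0].message.content)
```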
Opinion
It’s 1960 all over again
Image from Pixabay
In a recent study, the University of Pennsylvania and OpenAI investigated the potential impact of large language models (LLMs), such as GPT models, on various jobs. GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models (Eloundou et al.,…
100+ new metrics since 2010
Image from Pixabay
An evaluation with automatic metrics has the advantage of being faster, more reproducible, and cheaper than an evaluation conducted by humans. This is especially true for the evaluation of machine translation. For a human evaluation, we would ideally need expert translators. For many language pairs, such…
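As a small example of such automatic metrics, the sketch below scores a couple of system outputs with BLEU and chrF using the sacrebleu library; the sentences and references are made up for illustration.

```python
# Sketch: scoring machine translation output with two automatic metrics (BLEU, chrF).
import sacrebleu

hypotheses = ["The cat sits on the mat.", "It rained all day."]
# One list per reference stream, aligned with the hypotheses.
references = [["The cat is sitting on the mat.", "It was raining all day long."]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.1f}  chrF: {chrf.score:.1f}")
```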
Clean, normalize, and tokenize
Image from Pixabay
Data preprocessing is a critical step for any machine learning task. The data must be correct, clean, and in the expected format. In this blog article, I explain all the steps required to preprocess the data used to train, validate, and evaluate machine translation systems…
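As a minimal illustration of these three steps, here is a sketch using the sacremoses library; the example sentence and the exact tokenized output are illustrative assumptions, not taken from the article.

```python
# Sketch of typical MT data preprocessing: clean, normalize punctuation, tokenize.
from sacremoses import MosesPunctNormalizer, MosesTokenizer

normalizer = MosesPunctNormalizer(lang="en")
tokenizer = MosesTokenizer(lang="en")

def preprocess(line: str) -> str:
    line = line.strip()                      # clean: drop surrounding whitespace
    line = normalizer.normalize(line)        # normalize: unify quotes, dashes, spacing
    return tokenizer.tokenize(line, return_str=True)  # tokenize: split punctuation from words

print(preprocess('  He said: "It’s fine." '))
# Roughly: He said : " It 's fine . "
```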