
Author page: Benjamin Marie

Mixtral-8x7B: Understanding and Running the Sparse Mixture of Experts

How to efficiently outperform GPT-3.5 and Llama 2 70B. Image by 8385 from Pixabay.

Most of the recent large language models (LLMs) use very similar neural architectures. For instance, the Falcon, Mistral, and Llama 2 models use a similar combination of self-attention and MLP modules. In contrast, Mistral AI, which also created Mistral 7B, just…

Read more
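The excerpt above contrasts dense self-attention plus MLP blocks with Mixtral's sparse mixture of experts. As a rough illustration of the routing idea only, not Mixtral's actual implementation (which the full article covers), here is a minimal top-2 MoE layer in PyTorch; the class and parameter names (MoELayer, num_experts, top_k) are mine.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-2 sparse mixture-of-experts MLP (illustrative sketch only)."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); each token is routed to its top-k experts only,
        # so most expert parameters are not used for any given token
        gate_logits = self.router(x)
        weights, indices = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Example: 16 tokens with hidden size 64; 8 experts, 2 active per token
layer = MoELayer(d_model=64, d_ff=128)
y = layer(torch.randn(16, 64))
print(y.shape)  # torch.Size([16, 64])
```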

Fine-tune Better Chat Models with Distilled Identity Preference Optimization (IPO)

Mistral 7B aligned with IPO. Photo by Rishabh Dharmani on Unsplash.

To become chat models, pre-trained large language models (LLMs) are fine-tuned on large datasets of instructions/questions paired with expected answers. While this simple fine-tuning yields convincing chat models, their answers may still be incoherent, biased, unethical, and unsafe from a human perspective. This is…

Read more
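For context on the excerpt above: IPO (Azar et al.) replaces DPO's sigmoid objective with a squared regression target on the gap between the policy and reference log-ratios of chosen vs. rejected answers. Below is a minimal sketch of that loss in PyTorch; the function and argument names are mine, and this is not the distilled setup the article describes. In practice, TRL's DPOTrainer exposes the same objective via loss_type="ipo".

```python
import torch

def ipo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """IPO objective: push the policy/reference log-ratio gap toward 1/(2*beta).

    Each argument is the summed log-probability of a response under the policy
    or the frozen reference model, shape (batch,).
    """
    policy_ratio = policy_chosen_logps - policy_rejected_logps  # prefers chosen?
    ref_ratio = ref_chosen_logps - ref_rejected_logps           # reference baseline
    h = policy_ratio - ref_ratio
    # Squared regression target instead of DPO's -log(sigmoid(beta * h))
    return ((h - 1.0 / (2.0 * beta)) ** 2).mean()
```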

QA-LoRA: Fine-Tune a Quantized Large Language Model on Your GPU

Quantization-aware fine-tuning. Illustration by the author, made with images from Pixabay (1, 2).

State-of-the-art large language models (LLMs) are pre-trained with billions of parameters. While pre-trained LLMs can perform many tasks, they can become much better once fine-tuned. Thanks to LoRA, fine-tuning costs can be dramatically reduced. LoRA adds low-rank tensors, i.e., a small number…

Read more
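Since the excerpt above hinges on LoRA adding low-rank tensors next to frozen pre-trained weights, a minimal sketch may help. Note this shows vanilla LoRA only; QA-LoRA, the subject of the article, additionally makes the adapter compatible with a quantized base model. LoRALinear and its parameters are illustrative names, not the library's API.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the pre-trained weights stay frozen
        # A starts small and B at zero, so training begins from the base model
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + scaling * (B A) x: only the two small matrices get gradients
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling

# Wrapping one 4096x4096 projection: with r=8, the low-rank update adds
# 2*8*4096 = 65,536 trainable parameters, about 0.4% of the frozen weight
layer = LoRALinear(nn.Linear(4096, 4096), r=8, alpha=16)
out = layer(torch.randn(2, 4096))
```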

GPT-3.5 Translates Paragraphs Better

And outperforms Google Translate for the translation of literary works. Image from Pixabay.

According to previous studies, GPT models perform as well as standard machine translation systems, e.g., Google Translate. These studies mostly focused on sentence-level translation: the default approach in machine translation, which translates sentences one by one without any context. Translating paragraphs…

Read more
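To make the sentence-level vs. paragraph-level distinction in the excerpt above concrete, here is a sketch of paragraph-level translation with the OpenAI Python SDK (v1 chat completions API). The prompt, function name, and language pair are my own illustrations, not the article's setup.

```python
from openai import OpenAI  # requires openai>=1.0

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def translate_paragraph(paragraph: str, src: str = "French", tgt: str = "English") -> str:
    """Translate a whole paragraph in one request so the model sees full context."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"You are a translator. Translate the user's {src} text into {tgt}."},
            {"role": "user", "content": paragraph},
        ],
        temperature=0,
    )
    return response.choices[0].message.content

# Sentence-level translation would instead call this once per sentence,
# discarding the cross-sentence context the article argues is important.
```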