{"id":22883,"date":"2023-10-27T18:16:06","date_gmt":"2023-10-27T16:16:06","guid":{"rendered":"https:\/\/golem.ai\/?p=22883"},"modified":"2026-02-17T16:46:00","modified_gmt":"2026-02-17T15:46:00","slug":"optimisation-llm-scaleway","status":"publish","type":"post","link":"https:\/\/miralia.ai\/fr\/blog\/optimisation-llm-scaleway","title":{"rendered":"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : une \u00e9tude technique"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Pourquoi <a href=\"http:\/\/Golem.ai\">Miralia<\/a> a d\u00e9cid\u00e9 d\u2019exp\u00e9rimenter les LLMs ?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Chez <a href=\"http:\/\/golem.ai\">Miralia<\/a>, nous croyons \u00e0 la compl\u00e9mentarit\u00e9 des approches Symbolique et G\u00e9n\u00e9rative de l\u2019IA.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Le <a href=\"https:\/\/golem.ai\/fr\/blog\/ia-generative-analytique-neurosymbolique\">premier \u00e9pisode<\/a> de cette s\u00e9rie d\u2019articles l\u2019explique.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Nous vous invitons \u00e0 y jeter un coup d\u2019oeil si vous ne l\u2019avez pas encore fait.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pourquoi choisir <strong>LlaMA-2 ?<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Meta, la soci\u00e9t\u00e9 m\u00e8re de Facebook, a fait sensation dans le secteur de l&rsquo;intelligence artificielle (IA) en juillet dernier avec le lancement de LLaMA 2, un mod\u00e8le de langage \u00e0 grande \u00e9chelle (LLM) open-source con\u00e7u pour d\u00e9fier les pratiques restrictives de ses principaux concurrents technologiques.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Contrairement aux syst\u00e8mes d&rsquo;IA lanc\u00e9s par Google, OpenAI et d&rsquo;autres (comme Apple avec Apple GPT ?), qui sont \u00e9troitement prot\u00e9g\u00e9s par des mod\u00e8les propri\u00e9taires, Meta publie gratuitement le code et les donn\u00e9es de LlaMA 2 pour permettre aux chercheurs du monde entier 
de construire et d&rsquo;am\u00e9liorer la technologie !<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Voici les cinq principales caract\u00e9ristiques de LlaMA 2 :<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>LlaMA 2 surpasse les autres LLM open-source dans les tests de raisonnement, de programmation et de connaissances.<\/li>\n\n\n\n<li>Le mod\u00e8le a \u00e9t\u00e9 entra\u00een\u00e9 sur pr\u00e8s de deux fois plus de donn\u00e9es que la version 1, soit un total de 2 000 milliards de tokens. En outre, l&rsquo;entra\u00eenement a inclus plus d&rsquo;un million de nouvelles annotations humaines et un r\u00e9glage fin (fine-tuning) pour la conversation.<\/li>\n\n\n\n<li>Le mod\u00e8le existe en trois tailles : 7, 13 et 70 milliards de param\u00e8tres.<\/li>\n\n\n\n<li>LlaMA 2 prend en charge des contextes plus longs, jusqu&rsquo;\u00e0 4096 tokens.<\/li>\n\n\n\n<li>La version 2 a une licence plus permissive que la version 1, autorisant une utilisation commerciale.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Premiers tests en mode \u201c<strong>Test and learn<\/strong>\u201d avec <a href=\"http:\/\/replicate.com\/\">Replicate.com<\/a><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Pour tester LlaMA-2, nous avons d&rsquo;abord opt\u00e9 pour le SaaS <a href=\"http:\/\/replicate.com\/\">Replicate.com<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Ce service permet de payer \u00e0 l&rsquo;usage, sans avoir besoin d&rsquo;installer de logiciel sur du mat\u00e9riel existant.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Une premi\u00e8re approche parfaite pour exp\u00e9rimenter !<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Cependant, pour des raisons de confidentialit\u00e9 et d&rsquo;intelligence \u00e9conomique, nous avons opt\u00e9 pour une deuxi\u00e8me approche, expliqu\u00e9e ci-dessous.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Pourquoi LlaMA-2 sur des GPUs internes apr\u00e8s le SaaS <a 
href=\"http:\/\/replicate.com\/\">Replicate.com<\/a> ?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Chez <a href=\"http:\/\/golem.ai\/\">Miralia<\/a>, l&rsquo;intelligence artificielle de confiance, la souverainet\u00e9 des donn\u00e9es, la s\u00e9curit\u00e9 et le contr\u00f4le de l&rsquo;ensemble de la cha\u00eene de valeur sont nos priorit\u00e9s.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pour cette raison, nous avons d\u00e9cid\u00e9 de r\u00e9aliser notre propre benchmark en utilisant les ressources mat\u00e9rielles de notre fournisseur de cloud fran\u00e7ais <a href=\"https:\/\/www.scaleway.com\/fr\">Scaleway<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Bien que le mod\u00e8le LlaMA-2 soit gratuit \u00e0 t\u00e9l\u00e9charger et \u00e0 utiliser, il convient de noter que l&rsquo;auto-h\u00e9bergement de ce mod\u00e8le n\u00e9cessite une puissance GPU importante pour un traitement optimal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">LlaMA 2 est disponible en trois tailles : 7 milliards, 13 milliards et 70 milliards de param\u00e8tres.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pour les besoins de cette d\u00e9monstration, nous utiliserons le mod\u00e8le 70B pour obtenir la meilleure pertinence !<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Mise en place de la solution GPU interne<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Entrons dans le vif du sujet \ud83d\ude08<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Vue d&rsquo;ensemble de l&rsquo;int\u00e9gration<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>L&rsquo;utilisateur fournit une entr\u00e9e : un prompt (c&rsquo;est-\u00e0-dire qu&rsquo;il pose une question).<\/li>\n\n\n\n<li>Un appel API est effectu\u00e9 vers le serveur Llama.cpp : le prompt y est soumis, puis la r\u00e9ponse g\u00e9n\u00e9r\u00e9e par Llama-2 est r\u00e9cup\u00e9r\u00e9e et affich\u00e9e \u00e0 l&rsquo;utilisateur.<\/li>\n<\/ol>\n\n\n\n<div style=\"height:24px\" 
aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"342\" src=\"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-1024x342.png\" alt=\"\" class=\"wp-image-22884\" srcset=\"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-300x100.png 300w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-1024x342.png 1024w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-768x257.png 768w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-1536x513.png 1536w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2-18x6.png 18w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27114657\/Untitled-2.png 1744w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Nous ex\u00e9cutons le mod\u00e8le LlaMA-2 70B \u00e0 l&rsquo;aide de Llama.cpp, avec le pilote NVIDIA CUDA 12.2 sur une Ubuntu 22.04.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\">Llama.cpp<\/a> est une biblioth\u00e8que C\/C++ pour l&rsquo;inf\u00e9rence des mod\u00e8les <a href=\"https:\/\/ai.meta.com\/llama\">LlaMA\/LlaMA-2<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Pour ce sc\u00e9nario, nous utiliserons le <a href=\"https:\/\/www.scaleway.com\/en\/h100-pcie-try-it-now\/\">H100<\/a>-1-80G, le mat\u00e9riel le plus puissant de leur gamme de GPU \u00e9tant H100-2-80G, de notre fournisseur 
de cloud fran\u00e7ais <a href=\"http:\/\/scaleway.com\">Scaleway<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">La gamme de GPU de Scaleway comprend quatre produits d\u00e9di\u00e9s \u00e0 des utilisations diff\u00e9rentes \ud83d\ude80<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th>Machine<\/th><th>GPU<\/th><th>M\u00e9moire GPU (VRAM)<\/th><th>Processeur<\/th><th>C\u0153urs (vCPU)<\/th><th>RAM<\/th><\/tr><\/thead><tbody><tr><td>Render-S<\/td><td>Dedicated NVIDIA Tesla P100 16GB PCIe<\/td><td>16 GB (CoWoS HBM2)<\/td><td>Intel Xeon Gold 6148<\/td><td>10<\/td><td>42 GB<\/td><\/tr><tr><td>H100-1-80G<\/td><td>H100 PCIe Tensor Core GPU<\/td><td>80 GB (HBM2e)<\/td><td>AMD EPYC\u2122 9334<\/td><td>24<\/td><td>240 GB<\/td><\/tr><tr><td>H100-2-80G<\/td><td>2x H100 PCIe Tensor Core GPU<\/td><td>2x 80 GB (HBM2e)<\/td><td>AMD EPYC\u2122 9334<\/td><td>48<\/td><td>480 GB<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">La m\u00e9thode de mise en \u0153uvre de la solution est d\u00e9taill\u00e9e ci-dessous.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Nous estimons qu&rsquo;il faut environ 30 minutes pour la mettre en place, sous r\u00e9serve que votre environnement r\u00e9ponde aux exigences syst\u00e8me, logicielles et mat\u00e9rielles d\u00e9crites ici et que vous ne rencontriez pas d&rsquo;erreurs \ud83d\ude42<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>A. Installation<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Deux voies possibles :<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">1. 
La m\u00e9thode officielle pour ex\u00e9cuter LlaMA-2 consiste \u00e0 passer par les d\u00e9p\u00f4ts de Meta.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avantages :\n<ul class=\"wp-block-list\">\n<li>M\u00e9thode officielle<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Inconv\u00e9nients :\n<ul class=\"wp-block-list\">\n<li>D\u00e9velopp\u00e9 en Python (lenteur d\u2019ex\u00e9cution et consommation excessive de RAM)<\/li>\n\n\n\n<li>Dysfonctionnement de l\u2019acc\u00e9l\u00e9ration GPU H100<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">2. Ex\u00e9cuter LlaMA-2 via l\u2019interface Llama.cpp<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Avantages :\n<ul class=\"wp-block-list\">\n<li>Cette impl\u00e9mentation en C\/C++ pur est plus rapide et plus efficace que son homologue officiel en Python. Elle prend en charge l&rsquo;acc\u00e9l\u00e9ration GPU via CUDA (dont le H100) ainsi que via Metal d&rsquo;Apple. Cela acc\u00e9l\u00e8re consid\u00e9rablement l&rsquo;inf\u00e9rence sur le CPU et rend l&rsquo;inf\u00e9rence sur le GPU H100 plus efficace.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li>Inconv\u00e9nients :\n<ul class=\"wp-block-list\">\n<li>M\u00e9thode communautaire (non officielle)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Nous avons choisi d\u2019utiliser Llama.cpp pour cette impl\u00e9mentation.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>B. 
Mod\u00e8les disponibles<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">V\u00e9rifier le type de mod\u00e8le :<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/www.hardware-corner.net\/llm-database\/Llama-2\/\">https:\/\/www.hardware-corner.net\/llm-database\/Llama-2\/<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong><mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">\/!\\<\/mark><\/strong> Llama.cpp ne supporte plus les mod\u00e8les GGML<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-Chat-GGML\">https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-Chat-GGML<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\u21d2 Remplac\u00e9 par des mod\u00e8les GGUF<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-chat-GGUF\">https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-chat-GGUF<\/a> (bas\u00e9e sur Llama-2-70b-chat-hf)<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>C. Processus d\u2019installation<\/strong><\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Installez le pilote NVIDIA CUDA (s&rsquo;il n&rsquo;est pas install\u00e9 sur votre machine GPU H100).<\/strong><\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Pour commencer, installons le pilote NVIDIA CUDA sur Ubuntu 22.04. 
Le guide pr\u00e9sent\u00e9 ici est le m\u00eame que celui de la page de <a href=\"https:\/\/developer.nvidia.com\/cuda-downloads\">t\u00e9l\u00e9chargement de CUDA Toolkit<\/a> fournie par NVIDIA.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ wget https:\/\/developer.download.nvidia.com\/compute\/cuda\/repos\/ubuntu2204\/x86_64\/cuda-keyring_1.1-1_all.deb\n$ sudo dpkg -i cuda-keyring_1.1-1_all.deb\n$ sudo apt-get update\n$ sudo apt-get -y install cuda-toolkit-12-3\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Apr\u00e8s l&rsquo;installation, le syst\u00e8me doit \u00eatre red\u00e9marr\u00e9. Cela permet de s&rsquo;assurer que les modules du noyau des pilotes NVIDIA sont correctement charg\u00e9s avec dkms. Ensuite, vous devriez \u00eatre en mesure de voir vos GPU H100 en utilisant nvidia-smi.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ sudo shutdown -r now\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code horiz-scroll\" style=\"font-size:14px\"><code>\nllm@h100-ftw:~$ nvidia-smi\nWed Oct  <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">4 08:44:54 2023<\/mark>       \n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">+---------------------------------------------------------------------------------------+<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">NVIDIA-SMI 535.104.12<\/mark>             Driver Version: <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">535.104.12   CUDA Version: 12.2  <\/mark>   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark 
style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|-----------------------------------------+----------------------+----------------------+<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>  Name                 Persistence-<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">M <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> Bus-Id        Disp.<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">A<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> Volatile Uncorr. <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">ECC<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> Fan  Temp   Perf          Pwr:Usage\/Cap <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>         Memory-Usage <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>-Util  Compute <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">M.<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>                                         <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>      
                <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>              <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\"> MIG M<\/mark>. <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|=========================================+======================+======================|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">0  NVIDIA H100<\/mark> PCIe               On  <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">00000000:01:00.0<\/mark> Off <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>                    0 <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">N\/A<\/mark>   42C    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">P0 <\/mark>             51W \/ 350W <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>      4MiB \/ 81559MiB <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>      <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">0% <\/mark>     Default <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" 
class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>                                         <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>                      <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>             Disabled <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">+-----------------------------------------+----------------------+----------------------+\n                                                                                         \n+---------------------------------------------------------------------------------------+<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark> Processes:                                                                            <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>  <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU   GI   CI        PID<\/mark>   Type   Process name                            GPU Memory <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">    ID   ID <\/mark>                                                            Usage      <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark 
style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|=======================================================================================|\n|<\/mark>  No running processes found                                                           <mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">|<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">+---------------------------------------------------------------------------------------+<\/mark><\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">2. <strong>Assurez-vous d&rsquo;avoir le binaire nvcc dans votre chemin d&rsquo;acc\u00e8s (path)<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>llm@h100-ftw:~$ nvcc --version\nnvcc: NVIDIA (R) Cuda compiler driver\nCopyright (c) 2005-2023 NVIDIA Corporation\nBuilt on Tue_Aug_15_22:02:13_PDT_2023\nCuda compilation tools, release 12.2, V12.2.140\nBuild cuda_12.2.r12.2\/compiler.33191640_0\n\n# Si la commande est introuvable, ajoutez le r\u00e9pertoire des binaires CUDA au PATH :\n<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">export PATH=\/usr\/local\/cuda\/bin:$PATH<\/mark><\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">3. <strong>Clonez et compilez Llama.cpp<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Apr\u00e8s l&rsquo;installation de NVIDIA CUDA, toutes les conditions pr\u00e9alables \u00e0 la compilation de Llama.cpp sont d\u00e9j\u00e0 remplies. 
Il suffit de cloner llama.cpp et de le compiler.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ git clone https:\/\/github.com\/ggerganov\/llama.cpp\n$ cd llama.cpp\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Pour que la compilation cible les diff\u00e9rentes architectures NVIDIA (CUDA arch et CUDA gencode) :<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Avant la compilation, modifiez le Makefile pour utiliser<mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\"> <code>NVCCFLAGS += -arch=all-major<\/code><\/mark> au lieu de <code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">NVCCFLAGS += -arch=native<\/mark><\/code>.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ make\n$ make clean &amp;&amp; <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">LLAMA_CUBLAS=1<\/mark> make -j\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">4. <strong>T\u00e9l\u00e9chargez et ex\u00e9cutez LLaMA-2 70B<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Nous utilisons le mod\u00e8le converti et quantifi\u00e9 par <a href=\"https:\/\/huggingface.co\/TheBloke\">TheBloke<\/a>, un excellent contributeur de la communaut\u00e9 HuggingFace. Les mod\u00e8les pr\u00e9-quantifi\u00e9s sont disponibles <a href=\"https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-chat-GGUF\">via ce lien<\/a>. 
Dans le nom du d\u00e9p\u00f4t de mod\u00e8les, GGUF fait r\u00e9f\u00e9rence \u00e0 un nouveau format de fichier de mod\u00e8le introduit en ao\u00fbt 2023 pour Llama.cpp.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Pour t\u00e9l\u00e9charger les fichiers du mod\u00e8le, nous commen\u00e7ons par installer et initialiser <a href=\"https:\/\/git-lfs.com\/\">git-lfs<\/a>.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ sudo apt install git-lfs\n$ git lfs install\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Vous devriez voir \u00ab\u00a0Git LFS initialized\u00a0\u00bb s&rsquo;afficher dans le terminal apr\u00e8s la derni\u00e8re commande. Ensuite, nous pouvons cloner le d\u00e9p\u00f4t en ne r\u00e9cup\u00e9rant que des pointeurs vers les fichiers volumineux, plut\u00f4t que de les t\u00e9l\u00e9charger tous.<\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ cd models\n$ GIT_LFS_SKIP_SMUDGE=1 git clone https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-chat-GGUF\n<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ cd Llama-2-70B-chat-GGUF\n$ git lfs pull --include llama-2-70b-chat.Q6_K.gguf-split-a\n$ git lfs pull --include llama-2-70b-chat.Q6_K.gguf-split-b\n$ cat llama-2-70b-chat.Q6_K.gguf-split-* &gt; llama-2-70b-chat.Q6_K.gguf &amp;&amp; rm llama-2-70b-chat.Q6_K.gguf-split-*\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Le seul fichier dont nous avons besoin est llama-2-70b-chat.Q6_K.gguf, qui est le mod\u00e8le LlaMA 2 70B trait\u00e9 \u00e0 l&rsquo;aide d&rsquo;une des m\u00e9thodes de quantification \u00e0 6 bits.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Ce mod\u00e8le n\u00e9cessite environ 60 Go de m\u00e9moire. 
Sur le H100, nous avons 80GB (HBM2e) de VRAM. Le traitement sera effectu\u00e9 enti\u00e8rement sur le GPU du H100 !<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"1024\" height=\"99\" src=\"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-1024x99.png\" alt=\"\" class=\"wp-image-22886\" srcset=\"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-300x29.png 300w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-1024x99.png 1024w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-768x74.png 768w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-1536x149.png 1536w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21-18x2.png 18w, https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27121724\/Capture-decran-2023-10-27-a-12.17.21.png 1857w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ .\/main -ngl 100 -t 1 -m llama-2-70b-chat.Q6_K.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p \"&#91;INST] &lt;&lt;SYS&gt;&gt;\\\\nYou are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. 
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.\\\\n&lt;&lt;\/SYS&gt;&gt;\\\\n{prompt}&#91;\/INST]\"\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">5. <strong>Servir Llama-2 70B<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">La commande make construit plusieurs programmes utiles pour Llama.cpp.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>main<\/code> sert \u00e0 g\u00e9n\u00e9rer du texte dans le terminal.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><code>perplexity<\/code> calcule la perplexit\u00e9 sur un jeu de donn\u00e9es, \u00e0 des fins d&rsquo;analyse comparative.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Dans cette partie, nous examinons le programme <code>server<\/code>, qui fournit un serveur API HTTP simple pour les mod\u00e8les compatibles avec Llama.cpp.<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\/blob\/master\/examples\/server\/README.md\">https:\/\/github.com\/ggerganov\/llama.cpp\/blob\/master\/examples\/server\/README.md<\/a><\/p>\n\n\n\n<pre class=\"wp-block-code\" style=\"font-size:14px\"><code>$ .\/server -m models\/Llama-2-70B-chat-GGUF\/llama-2-70b-chat.Q6_K.gguf \\\n    -c 4096 -ngl 100 -t 1 --host 0.0.0.0 --port 8080\n<\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Adaptez <code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">-t<\/mark><\/code> au nombre de c\u0153urs 
physiques du processeur. Par exemple, si le syst\u00e8me a 32 c\u0153urs \/ 64 threads, utilisez -t 32. Si vous d\u00e9chargez compl\u00e8tement le mod\u00e8le sur le GPU, utilisez -t 1 (comme sur le H100).<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Adaptez <code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">-ngl<\/mark><\/code> au nombre de couches que votre VRAM permet de d\u00e9charger sur le GPU. Utilisez <code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">-ngl 100<\/mark><\/code> pour d\u00e9charger toutes les couches sur la VRAM, si vous en avez suffisamment (comme sur le H100). Sinon, vous pouvez ne d\u00e9charger qu&rsquo;une partie des couches, selon la VRAM disponible, sur un ou plusieurs GPU.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Param\u00e8tres : <a href=\"https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-Chat-GGML#how-to-run-in-llamacpp\">https:\/\/huggingface.co\/TheBloke\/Llama-2-70B-Chat-GGML#how-to-run-in-llamacpp<\/a><\/p>\n\n\n\n<pre class=\"wp-block-code horiz-scroll\" style=\"font-size:14px\"><code><code><mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark>: ggml ctx size =    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">0.23 MB<\/mark>\n<code><mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark><span style=\"font-family: inherit; font-size: inherit; color: initial;\">: using <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">CUDA <\/mark><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-cyan-blue-color\">for<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\"> GPU 
<\/mark>acceleration<\/span><\/code>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark>: mem required  = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\"> 205.31 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark>: offloading <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">80<\/mark> repeating layers to <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark>: offloading non-repeating layers to <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors<\/mark>: offloaded <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">83\/83<\/mark> layers to <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llm_load_tensors: VRAM used: 53760.11 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#937043\" class=\"has-inline-color\">...................................................................................................\n<\/mark><mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: n_ctx      = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">4096\nllama_new_context_with_model<\/mark>: freq_base  = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">10000.0<\/mark>\n<mark 
style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: freq_scale = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">1<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_kv_cache_init<\/mark>: offloading v cache to <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_kv_cache_init<\/mark>: offloading k cache to <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">GPU\nllama_kv_cache_init<\/mark>: <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">VRAM<\/mark> kv self = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">1280.00 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: kv self size  = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">1280.00 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: compute buffer total size = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">573.88 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: VRAM scratch buffer: <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">568.00 MB<\/mark>\n<mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">llama_new_context_with_model<\/mark>: total <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">VRAM used: 
55608.11 MB <\/mark>(model: <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">53760.11 MB, context: 1848.00 MB<\/mark>)\n<\/code><\/code><\/pre>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Explanation of the Llama.cpp metrics:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When you run a prompt, Llama.cpp reports several timing metrics so that you can measure how the run performed.<\/p>\n\n\n\n<pre class=\"wp-block-code horiz-scroll\" style=\"font-size:14px\"><code>llama_print_timings:        load <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">time<\/mark> = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">59250.72 ms<\/mark>\nllama_print_timings:      sample <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">time<\/mark> =   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">611.28<\/mark> ms \/   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">180<\/mark> runs   (    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">3.40<\/mark> ms per token,   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">294.47 <\/mark>tokens per second)\nllama_print_timings: prompt <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">eval time<\/mark> =  <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">1597.63<\/mark> ms \/   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">508<\/mark> tokens (    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">3.14 <\/mark>ms per token, 
  <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">317.97 <\/mark>tokens per second)\nllama_print_timings:        <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">eval time<\/mark> = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">11703.38<\/mark> ms \/   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">179<\/mark> runs   (   <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">65.38<\/mark> ms per token,    <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">15.29<\/mark> tokens per second)\nllama_print_timings:       total <mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">time<\/mark> = <mark style=\"background-color:rgba(0, 0, 0, 0);color:#8c1a54\" class=\"has-inline-color\">13958.06<\/mark> ms\n<\/code><\/pre>\n\n\n\n<ul class=\"wp-block-list\">\n<li>load time: time taken to load the model file.<\/li>\n\n\n\n<li>sample time: time spent choosing the next likely token at each generation step.<\/li>\n\n\n\n<li>prompt eval time: time LLaMa needs to process the prompt\/file before generating new text.<\/li>\n\n\n\n<li>eval time: time needed to generate the output (until <code><mark style=\"background-color:rgba(0, 0, 0, 0)\" class=\"has-inline-color has-vivid-red-color\">[end of text]<\/mark><\/code> or the limit set by the user).<\/li>\n\n\n\n<li>total time: overall time of the run. In the log above it is close to the sum of the sample, prompt eval and eval times, and does not include the load time.<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Comparison between <a href=\"http:\/\/replicate.com\/\">Replicate.com<\/a> and the Nvidia H100 GPUs hosted by <a 
href=\"http:\/\/scaleway.com\">Scaleway<\/a><\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">After running about a hundred tests in total across <a href=\"http:\/\/replicate.com\/\">Replicate.com<\/a> and the Nvidia H100 hosted by <a href=\"https:\/\/www.scaleway.com\/en\/h100-pcie-try-it-now\/\">Scaleway<\/a>, we conclude that there is a 40% difference in execution in favor of the H100-1-80G GPUs provided by Scaleway.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The hallucination score that we assign at <a href=\"http:\/\/golem.ai\/\">Miralia<\/a> on a scale of 0 to 3, which reflects the relevance of the answer in each test, does not vary enough to indicate a notable difference between <a href=\"http:\/\/replicate.com\/\">Replicate.com<\/a> and <a href=\"http:\/\/scaleway.com\/\">Scaleway<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To learn more, we invite you to read the article on the <a href=\"https:\/\/www.notion.so\/b25c874f8c6a45caa0520d4fabc654f9?pvs=21\">Miralia LLM test protocol<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Conclusion and outlook<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The use cases go well beyond this first experiment. At Miralia, we believe there are many other ways to use LLMs with our technology, including tooling and support for our users.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is only the beginning of a long and exciting adventure.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">There are several frameworks for serving LLMs. 
Each has its own characteristics.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In this article, we experimented with <a href=\"https:\/\/github.com\/ggerganov\/llama.cpp\">Llama.cpp<\/a> by running the LLaMa-2 70b model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To learn more, please read the <a href=\"https:\/\/betterprogramming.pub\/frameworks-for-serving-llms-60b7f7b23407\">following article<\/a>, which deals specifically with this topic.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Dive into our detailed technical study on optimizing LLM performance with the Nvidia H100 GPUs from Scaleway. Discover our benchmarks and metrics.<\/p>\n","protected":false},"author":24,"featured_media":22912,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[68,74],"tags":[90,76,75,80],"class_list":["post-22883","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog","category-technologie","tag-exploitation","tag-ia","tag-intelligence-artificielle","tag-relation-client"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Optimisation de performance LLM avec GPUs de Scaleway : une \u00e9tude technique<\/title>\n<meta name=\"description\" content=\"Une \u00e9tude technique compl\u00e8te sur l&#039;optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. 
Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/miralia.ai\/fr\/blog\/optimisation-llm-scaleway\/\" \/>\n<meta property=\"og:locale\" content=\"fr_FR\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : le guide technique\" \/>\n<meta property=\"og:description\" content=\"Une \u00e9tude technique compl\u00e8te sur l&#039;optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/miralia.ai\/fr\/blog\/optimisation-llm-scaleway\/\" \/>\n<meta property=\"og:site_name\" content=\"Miralia.ai\" \/>\n<meta property=\"article:published_time\" content=\"2023-10-27T16:16:06+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-02-17T15:46:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/08111426\/PROFIL_01.png\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Kevin Baude\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : le guide technique\" \/>\n<meta name=\"twitter:description\" content=\"Une \u00e9tude technique compl\u00e8te sur l&#039;optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. 
Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.\" \/>\n<meta name=\"twitter:creator\" content=\"@miralia_ai\" \/>\n<meta name=\"twitter:site\" content=\"@miralia_ai\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway\"},\"author\":{\"name\":\"Kevin Baude\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#\\\/schema\\\/person\\\/db79ca16f539a1c864d3c693695d8b85\"},\"headline\":\"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : une \u00e9tude technique\",\"datePublished\":\"2023-10-27T16:16:06+00:00\",\"dateModified\":\"2026-02-17T15:46:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway\"},\"wordCount\":1848,\"publisher\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/27175645\\\/Blog-visuels-petite-banniere-570x200-16.png\",\"keywords\":[\"exploitation\",\"IA\",\"intelligence artificielle\",\"relation client\"],\"articleSection\":[\"Blog\",\"Technologie\"],\"inLanguage\":\"fr-FR\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway\",\"url\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway\",\"name\":\"Optimisation de performance LLM avec GPUs de Scaleway : une \u00e9tude 
technique\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/27175645\\\/Blog-visuels-petite-banniere-570x200-16.png\",\"datePublished\":\"2023-10-27T16:16:06+00:00\",\"dateModified\":\"2026-02-17T15:46:00+00:00\",\"description\":\"Une \u00e9tude technique compl\u00e8te sur l'optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.\",\"inLanguage\":\"fr-FR\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/blog\\\/optimisation-llm-scaleway#primaryimage\",\"url\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/08111426\\\/PROFIL_01.png\",\"contentUrl\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/08111426\\\/PROFIL_01.png\",\"width\":\"\",\"height\":\"\"},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#website\",\"url\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/\",\"name\":\"Miralia.ai\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"fr-FR\"},{\"@type\":\"Organization\",\"@id\":\"https:\
\\/\\\/miralia.ai\\\/fr\\\/#organization\",\"name\":\"Miralia\",\"url\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/07142128\\\/Logo-Miralia.png\",\"contentUrl\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2026\\\/01\\\/07142128\\\/Logo-Miralia.png\",\"width\":1061,\"height\":211,\"caption\":\"Miralia\"},\"image\":{\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/x.com\\\/miralia_ai\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/miralia\\\/\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/#\\\/schema\\\/person\\\/db79ca16f539a1c864d3c693695d8b85\",\"name\":\"Kevin Baude\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"fr-FR\",\"@id\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/27162415\\\/photo-kevin-baude-150x150.png\",\"url\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/27162415\\\/photo-kevin-baude-150x150.png\",\"contentUrl\":\"https:\\\/\\\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\\\/wp-content\\\/uploads\\\/2023\\\/10\\\/27162415\\\/photo-kevin-baude-150x150.png\",\"caption\":\"Kevin Baude\"},\"url\":\"https:\\\/\\\/miralia.ai\\\/fr\\\/auteur\\\/kevinb\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Optimisation de performance LLM avec GPUs de Scaleway : une \u00e9tude technique","description":"Une \u00e9tude technique compl\u00e8te sur l'optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. 
Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/miralia.ai\/fr\/blog\/optimisation-llm-scaleway\/","og_locale":"fr_FR","og_type":"article","og_title":"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : le guide technique","og_description":"Une \u00e9tude technique compl\u00e8te sur l'optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.","og_url":"https:\/\/miralia.ai\/fr\/blog\/optimisation-llm-scaleway\/","og_site_name":"Miralia.ai","article_published_time":"2023-10-27T16:16:06+00:00","article_modified_time":"2026-02-17T15:46:00+00:00","og_image":[{"url":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/08111426\/PROFIL_01.png","width":"","height":"","type":"image\/png"}],"author":"Kevin Baude","twitter_card":"summary_large_image","twitter_title":"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : le guide technique","twitter_description":"Une \u00e9tude technique compl\u00e8te sur l'optimisation des performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. 
Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.","twitter_creator":"@miralia_ai","twitter_site":"@miralia_ai","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway#article","isPartOf":{"@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway"},"author":{"name":"Kevin Baude","@id":"https:\/\/miralia.ai\/fr\/#\/schema\/person\/db79ca16f539a1c864d3c693695d8b85"},"headline":"Optimisation de performance LLM avec GPUs Nvidia de Scaleway : une \u00e9tude technique","datePublished":"2023-10-27T16:16:06+00:00","dateModified":"2026-02-17T15:46:00+00:00","mainEntityOfPage":{"@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway"},"wordCount":1848,"publisher":{"@id":"https:\/\/miralia.ai\/fr\/#organization"},"image":{"@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway#primaryimage"},"thumbnailUrl":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27175645\/Blog-visuels-petite-banniere-570x200-16.png","keywords":["exploitation","IA","intelligence artificielle","relation client"],"articleSection":["Blog","Technologie"],"inLanguage":"fr-FR"},{"@type":"WebPage","@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway","url":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway","name":"Optimisation de performance LLM avec GPUs de Scaleway : une \u00e9tude technique","isPartOf":{"@id":"https:\/\/miralia.ai\/fr\/#website"},"primaryImageOfPage":{"@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway#primaryimage"},"image":{"@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway#primaryimage"},"thumbnailUrl":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27175645\/Blog-visuels-petite-banniere-570x200-16.png","datePublished":"2023-10-27T16:16:06+00:00","dateModified":"2026-02-17T15:46:00+00:00","description":"Une \u00e9tude technique compl\u00e8te sur l'optimisation des 
performances des LLM en utilisant les GPUs Nvidia H100 de Scaleway. Benchmarks et m\u00e9triques d\u00e9taill\u00e9s.","inLanguage":"fr-FR","potentialAction":[{"@type":"ReadAction","target":["https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway"]}]},{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/miralia.ai\/blog\/optimisation-llm-scaleway#primaryimage","url":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/08111426\/PROFIL_01.png","contentUrl":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/08111426\/PROFIL_01.png","width":"","height":""},{"@type":"WebSite","@id":"https:\/\/miralia.ai\/fr\/#website","url":"https:\/\/miralia.ai\/fr\/","name":"Miralia.ai","description":"","publisher":{"@id":"https:\/\/miralia.ai\/fr\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/miralia.ai\/fr\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"fr-FR"},{"@type":"Organization","@id":"https:\/\/miralia.ai\/fr\/#organization","name":"Miralia","url":"https:\/\/miralia.ai\/fr\/","logo":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/miralia.ai\/fr\/#\/schema\/logo\/image\/","url":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/07142128\/Logo-Miralia.png","contentUrl":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2026\/01\/07142128\/Logo-Miralia.png","width":1061,"height":211,"caption":"Miralia"},"image":{"@id":"https:\/\/miralia.ai\/fr\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/miralia_ai","https:\/\/www.linkedin.com\/company\/miralia\/"]},{"@type":"Person","@id":"https:\/\/miralia.ai\/fr\/#\/schema\/person\/db79ca16f539a1c864d3c693695d8b85","name":"Kevin 
Baude","image":{"@type":"ImageObject","inLanguage":"fr-FR","@id":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27162415\/photo-kevin-baude-150x150.png","url":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27162415\/photo-kevin-baude-150x150.png","contentUrl":"https:\/\/golem-ai-website-wordpress-prod.s3.fr-par.scw.cloud\/wp-content\/uploads\/2023\/10\/27162415\/photo-kevin-baude-150x150.png","caption":"Kevin Baude"},"url":"https:\/\/miralia.ai\/fr\/auteur\/kevinb"}]}},"_links":{"self":[{"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/posts\/22883","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/users\/24"}],"replies":[{"embeddable":true,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/comments?post=22883"}],"version-history":[{"count":8,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/posts\/22883\/revisions"}],"predecessor-version":[{"id":38073,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/posts\/22883\/revisions\/38073"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/media\/22912"}],"wp:attachment":[{"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/media?parent=22883"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/categories?post=22883"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/miralia.ai\/fr\/wp-json\/wp\/v2\/tags?post=22883"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}