== Testing InstructLab Models Locally ==

=== What is InstructLab? ===

There's a large variety of https://huggingface.co/models[models] available from https://huggingface.co[HuggingFace], and https://huggingface.co/instructlab[InstructLab] is an open-source project that publishes a collection of LLMs based on Granite models, together with tools that let users both use and improve those models.
There are also OCI model images available on the https://catalog.redhat.com/search?gs&q=granite%208b[Red Hat Ecosystem Catalog] (the link covers just the Granite 8b family).
A https://developers.redhat.com/articles/2024/08/01/open-source-ai-coding-assistance-granite-models[Red Hat blog] by Cedric Clyburn shows in much more detail how you can use Ollama and InstructLab to run LLMs locally, so I'll keep this short and focused on Conda.

=== Setting Up the Environment ===

You can use one of the provided environment files, `env-ilab-25.yml`, to create a Conda environment with the `instructlab` package version `0.25.x`.
This gives you a basic environment that lets you start serving, and chatting with, various HuggingFace (and other) Transformer-based models.
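
For reference, such an environment file might look roughly like this. This is only a sketch; the authoritative contents live in the repository's `envs/` directory, and the pins there may differ.

[subs="+quotes"]
----
(base) $ *cat envs/env-ilab-25.yml*
name: ilab-25
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - instructlab~=0.25.0
----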

Just like with any other Conda environment, start by creating the desired configuration.

[subs="+quotes"]
----
$ *source conda-init.sh*
(base) $ *mamba env create -y -f envs/env-ilab-25.yml*
Channels:
 - conda-forge
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: done

Downloading and Extracting Packages:
...
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
...
----

[NOTE]
====
The installation uses `pip` to install `instructlab`, as there are no Conda Forge packages for it. Be patient, it takes quite some time.
====

Activate the environment and create a `bash` completion file.

[subs="+quotes"]
----
(base) $ *mamba activate ilab-25*
(ilab-25) $ *_ILAB_COMPLETE=bash_source ilab > ilab.completion*
(ilab-25) $ *source ilab.completion*
----
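
If you want the completion to load automatically every time the environment is activated, you can place it in the environment's `activate.d` hook directory. Conda sources any `*.sh` file found there on activation; the file name below is my own choice:

[subs="+quotes"]
----
(ilab-25) $ *mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"*
(ilab-25) $ *cp ilab.completion "$CONDA_PREFIX/etc/conda/activate.d/ilab-completion.sh"*
----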

Check the system information.

[subs="+quotes"]
----
(ilab-25) $ *ilab system info*
Platform:
  sys.version: 3.11.12 | packaged by conda-forge | (main, Apr 10 2025, 22:18:52) [Clang 18.1.8 ]
  sys.platform: darwin
  os.name: posix
  platform.release: 24.4.0
  platform.machine: arm64
  platform.node: foobar
  platform.python_version: 3.11.12
  platform.cpu_brand: Apple M1 Max
  memory.total: 64.00 GB
  memory.available: 25.36 GB
  memory.used: 14.97 GB

InstructLab:
  instructlab.version: 0.25.0
  ...

Torch:
  torch.version: 2.5.1
  ...
  __torch.backends.mps.is_built: True
  torch.backends.mps.is_available: True__

llama_cpp_python:
  llama_cpp_python.version: 0.3.6
  _llama_cpp_python.supports_gpu_offload: True_
----

The PyTorch `mps` and Llama `supports_gpu_offload` settings show that InstructLab is capable of using the M1 Max GPU for serving.
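
If you ever need to double-check GPU support outside of `ilab`, you can query PyTorch directly; `torch.backends.mps.is_available()` is the standard PyTorch API for this:

[subs="+quotes"]
----
(ilab-25) $ *python -c 'import torch; print(torch.backends.mps.is_available())'*
True
----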

=== Downloading Models ===

Visit the InstructLab page and choose a model to download (for this demo, I selected `granite-3.0-8b-lab-community`).
Use the `ilab model download` command to pull it.
By default, models are stored in `~/.cache/instructlab/models/`, unless you say otherwise with the `--model-dir` option to the `ilab model` command.
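
For example, to keep the downloads on a different volume, you could pass that option explicitly (the destination path below is purely hypothetical):

[subs="+quotes"]
----
(ilab-25) $ *ilab model download -rp instructlab/granite-3.0-8b-lab-community \*
*--model-dir /Volumes/models*
----

With the default location, the download looks like this: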

[subs="+quotes"]
----
(ilab-25) $ *ilab model download -rp instructlab/granite-3.0-8b-lab-community*
INFO 2025-04-14 13:29:59,724 instructlab.model.download:77: Downloading model from Hugging Face:
Model: instructlab/granite-3.0-8b-lab-community@main
Destination: /foo/bar/.cache/instructlab/models
...
INFO 2025-04-14 13:36:13,171 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ instructlab/granite-3.0-8b-lab-community model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-04-14 13:36:13,171 instructlab.model.download:302: Available models (`ilab model list`):
+-------------------------------------------+...+---------+------------------------+
| Model Name                                |...| Size    | Absolute path          |
+-------------------------------------------+...+---------+------------------------+
| instructlab/granite-3.0-8b-lab-community  |...| 15.2 GB | .../models/instructlab |
+-------------------------------------------+...+---------+------------------------+
----

[NOTE]
====
LLMs are usually quite large (as the name suggests), so be patient and set aside a sufficient amount of disk space. The above model is a 17 GiB download in total, so even on a fast link it takes a couple of minutes.
====

Note that the absolute path to the model is a directory; if you look inside it, there will be a subdirectory containing the actual download.
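
The layout looks roughly like this (an illustrative listing; the shard file names match those reported by the conversion step later on):

[subs="+quotes"]
----
(ilab-25) $ *ls ~/.cache/instructlab/models/instructlab/granite-3.0-8b-lab-community/*
config.json
model-00001-of-00004.safetensors
model-00002-of-00004.safetensors
model-00003-of-00004.safetensors
model-00004-of-00004.safetensors
model.safetensors.index.json
...
----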

The format of the model is HuggingFace _safetensors_, which requires the https://github.com/vllm-project/vllm.git[vLLM] serving backend and is not supported on macOS by default.
From here on, there are two options: either install vLLM manually, or use `llama.cpp` to convert the model to GGUF.
Personally, I prefer the second option, as it very often also results in a smaller model and does not require too much manual hacking about. You can even keep a separate Conda environment just for `llama.cpp`.

=== Installing vLLM on macOS ===

If you used the InstructLab env file provided in this repo, you should already have the `cmake`, `torch`, and `torchvision` modules in the environment. If not, ensure they are available.
During the compilation, `pip` in particular may complain about some incompatibilities; you can safely ignore those warnings.

First, clone Triton and install it.

[subs="+quotes"]
----
(ilab-25) $ *git clone https://github.com/triton-lang/triton.git*
Cloning into 'triton'...
...
(ilab-25) $ *cd triton/python*
(ilab-25) $ *pip install .*
Processing /foo/bar/baz/triton/python
...
Successfully built triton
Installing collected packages: triton
Successfully installed triton-3.3.0+git32b42821
(ilab-25) $ *cd ../..*
(ilab-25) $ *rm -rf ./triton/*
----

[NOTE]
====
Triton compilation takes quite a long time and it appears to be doing nothing. Don't worry. Also, because the source tree is removed afterwards, the install above is a regular one rather than an editable (`pip install -e`) one.
====
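
A quick import check confirms that the freshly built module is usable (the version string is the one from the install log above; yours will differ with the checkout):

[subs="+quotes"]
----
(ilab-25) $ *python -c 'import triton; print(triton.__version__)'*
3.3.0+git32b42821
----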

Clone vLLM and build it.

[subs="+quotes"]
----
(ilab-25) $ *git clone https://github.com/vllm-project/vllm.git*
Cloning into 'vllm'...
...
(ilab-25) $ *cd vllm*
(ilab-25) $ *sed -i '' 's/^triton==3.2/triton==3.3/' requirements/requirements-cpu.txt*
(ilab-25) $ *pip install .*
Processing /foo/bar/baz/vllm
...
Successfully built vllm
Installing collected packages: vllm
Successfully installed vllm-0.8.5.dev3+g7cbfc1094.d20250414
(ilab-25) $ *cd ..*
(ilab-25) $ *rm -rf ./vllm/*
----

[NOTE]
====
vLLM 0.8.5 pins Triton to a maximum version of 3.2.0, which is not actually necessary; the `sed` command above relaxes that pin to accept the 3.3 build we just installed.
====
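
As with Triton, an import check tells you whether the build is usable (again, the exact version string depends on your checkout):

[subs="+quotes"]
----
(ilab-25) $ *python -c 'import vllm; print(vllm.__version__)'*
0.8.5.dev3+g7cbfc1094.d20250414
----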

References:

* https://github.com/triton-lang/triton[Triton Development Repository]
* https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html?device=apple[Building vLLM for Apple Silicon]

=== Converting Models to GGUF ===

You can use https://github.com/ggerganov/llama.cpp.git[`llama.cpp`] to convert models from the HF, GGML, and LoRA model formats to GGUF, which InstructLab can serve even on a Mac.

Clone `llama.cpp` and install its Python requirements.

[subs="+quotes"]
----
(ilab-25) $ *git clone https://github.com/ggerganov/llama.cpp.git*
Cloning into 'llama.cpp'...
...
(ilab-25) $ *cd llama.cpp*
(ilab-25) $ *pip install --upgrade -r requirements.txt*
Looking in indexes: https://pypi.org/simple, ...
...
Successfully installed aiohttp-3.9.5 ...
----

You can now use the various `convert_*.py` scripts. In our case, we need the HF (HuggingFace) to GGUF conversion.
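
To see which converters your checkout provides (the exact set varies between revisions):

[subs="+quotes"]
----
(ilab-25) $ *ls | grep '^convert_'*
convert_hf_to_gguf.py
convert_hf_to_gguf_update.py
convert_llama_ggml_to_gguf.py
convert_lora_to_gguf.py
----

The conversion itself: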

[subs="+quotes"]
----
(ilab-25) $ *./convert_hf_to_gguf.py \*
*~/.cache/instructlab/models/instructlab/granite-3.0-8b-lab-community/ \*
*--outfile ~/.cache/instructlab/models/granite-3.0-8b-lab-community.gguf \*
*--outtype q8_0*
INFO:hf-to-gguf:Loading model: granite-3.0-8b-lab-community
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: loading model part 'model-00001-of-00004.safetensors'
...
INFO:hf-to-gguf:Model successfully exported to /foo/bar/.cache/instructlab/models/granite-3.0-8b-lab-community.gguf
(ilab-25) $ *ilab model list*
+-------------------------------------------+...+---------+---------------------------------------+
| Model Name                                |...| Size    | Absolute path                         |
+-------------------------------------------+...+---------+---------------------------------------+
| instructlab/granite-3.0-8b-lab-community  |...| 15.2 GB | .../instructlab                       |
| granite-3.0-8b-lab-community.gguf         |...| 8.1 GB  | .../granite-3.0-8b-lab-community.gguf |
+-------------------------------------------+...+---------+---------------------------------------+
----
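
If you prefer a higher-fidelity (but roughly twice as large) file, `convert_hf_to_gguf.py` also accepts other `--outtype` values, for example `f16`; the output file name below is my own choice:

[subs="+quotes"]
----
(ilab-25) $ *./convert_hf_to_gguf.py \*
*~/.cache/instructlab/models/instructlab/granite-3.0-8b-lab-community/ \*
*--outfile ~/.cache/instructlab/models/granite-3.0-8b-lab-community-f16.gguf \*
*--outtype f16*
----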

Reference: https://github.com/ggml-org/llama.cpp/discussions/2948[Tutorial: How to convert HuggingFace model to GGUF format] on GitHub.

=== Serving Models ===

Start the model server.

[subs="+quotes"]
----
(ilab-25) $ *ilab model serve \*
*--model-path /foo/bar/.cache/instructlab/models/granite-3.0-8b-lab-community.gguf*
INFO 2025-04-14 14:49:05,624 instructlab.model.serve_backend:79: Setting backend_type in the serve config to llama-cpp
INFO 2025-04-14 14:49:05,633 instructlab.model.serve_backend:85: Using model '/foo/bar/.cache/instructlab/models/granite-3.0-8b-lab-community.gguf' with -1 gpu-layers and 4096 max context size.
...
INFO 2025-04-14 14:49:12,050 instructlab.model.backends.llama_cpp:233: Starting server process, press CTRL+C to shutdown server...
INFO 2025-04-14 14:49:12,050 instructlab.model.backends.llama_cpp:234: After application startup complete see http://127.0.0.1:8000/docs for API.
----

In another terminal, start a chat.

[subs="+quotes"]
----
(ilab-25) $ *ilab model chat*
╭─────────────────────────────────────── system ────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GRANITE-3.0-8B-LAB-COMMUNITY.GGUF (type /h for help)    │
╰────────────────────────────────────────────────────────────────────────────────────────╯
>>> *what are your specialties?*
My specialties include providing assistance with general tasks such as setting up a new device, troubleshooting software issues, and answering basic questions about using technology.

I can also help with more specific tasks related to Linux, such as configuring network settings, managing users and groups, and installing software packages. I have experience working with various Linux distributions, including Red Hat Enterprise Linux, Fedora, Ubuntu, and Debian.

Additionally, I am familiar with a wide range of programming languages, tools, and frameworks, including Python, Java, C++, Ruby on Rails, AngularJS, React, and Node.js.

I hope this information is helpful! Let me know if you have any other questions.
----
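
Since the chat goes through an OpenAI-compatible REST API (the `/docs` URL from the server log describes it), you can also talk to the model without `ilab`. A minimal sketch using `curl`, assuming the standard `/v1/chat/completions` route exposed by the `llama-cpp-python` server (the response is abbreviated):

[subs="+quotes"]
----
(ilab-25) $ *curl -s http://127.0.0.1:8000/v1/chat/completions \*
*-H 'Content-Type: application/json' \*
*-d '{"messages": [{"role": "user", "content": "Say hello."}]}'*
{"id":"chatcmpl-...","object":"chat.completion",...}
----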

Congratulations!