
= Linux Metrics Exporter for OpenShift Nodes =
:author: Grega Bremec
:email: gregab-at-p0f-dot-net
:revnumber: 1.0
:revdate: 6th November 2022
:toc:
:toc-placement!:

toc::[]

ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
== Components ==

. Container Image for SAR
. Container Image for PSACCT
. Container Image for Exporter

== How It All Works ==
Very simple: two sidecar containers, `collector-sysstat` and
`collector-psacct`, produce data on a shared ephemeral volume, and the third
container, `metrics-exporter`, consumes the data and exposes it on the
`/q/metrics` endpoint where Prometheus can pick it up.

One specific point about the composition is that care has been taken,
especially with `psacct` (whose accounting files can grow excessively during
periods of high activity), to regularly truncate the accounting files or move
them out of the way in order to keep disk space utilisation as low as possible.
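For orientation, here is a minimal sketch of that layout as a plain pod (the
names and image references are illustrative; the actual resource definitions
live in `deployment/`):

------
apiVersion: v1
kind: Pod
metadata:
  name: node-metrics                  # illustrative; the real thing is in deployment/
spec:
  hostPID: true                       # needed by the psacct collector
  volumes:
  - name: metrics                     # the shared ephemeral volume
    emptyDir: {}
  containers:
  - name: collector-psacct
    image: quay.io/benko/collector-psacct:latest    # image names are assumptions
    securityContext:
      capabilities:
        add: ["SYS_PACCT"]
    volumeMounts:
    - name: metrics
      mountPath: /var/account
  - name: collector-sysstat
    image: quay.io/benko/collector-sysstat:latest
    volumeMounts:
    - name: metrics
      mountPath: /var/log/sa
  - name: metrics-exporter
    image: quay.io/benko/metrics-exporter:latest
    ports:
    - containerPort: 8080             # Prometheus scrapes /q/metrics here
    volumeMounts:
    - name: metrics
      mountPath: /metrics
------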
== OpenShift Deployment ==

The easiest? Just use `Kustomize` to deploy existing resource definitions from
the `exporter` manifest in `deployment/`:

[subs=+quotes]
------
$ *oc apply -k ./deployment/exporter/base/*
------

The above will create everything in the `exporter` project. If you need to
change that, or some other settings, feel free to have a look at the `custom`
kustomization next to `base`, then apply it instead of the base set of
resources.
[subs=+quotes]
------
$ *cat deployment/exporter/custom/use-custom-namespace.yml*
apiVersion: builtin
kind: NamespaceTransformer
metadata:
  namespace: *my-very-own-namespace*
setRoleBindingSubjects: allServiceAccounts
fieldSpecs:
- path: metadata/name
  kind: Namespace

$ *oc apply -k ./deployment/exporter/custom/*
------
You should have Prometheus deployed somewhere prior to that though, so you
might want to have a look at least at the kustomizations for the `integrate`
manifest in order to target the right places.

TBD

If you still need to deploy Prometheus, there is a sample manifest in there as
well. Two, actually. One to deploy the Prometheus and Grafana operators (you
won't believe it, it's called `operators`), and once those are running, you can
use the other one (called, very innovatively, `prometheus`) to deploy their
actual instances. That will also target the `prometheus` OpenShift project, so
kustomize away if that's not what you want.
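Assuming those kustomizations follow the same `base`/`custom` layout as the
`exporter` one (an assumption - check `deployment/` for the actual paths), the
two steps would look roughly like this:

[subs=+quotes]
------
$ *oc apply -k ./deployment/operators/base/*
$ *oc apply -k ./deployment/prometheus/base/*
------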
== Standalone Containers ==

Start the composition.

// TODO: podman pod

[subs=+quotes]
------
$ *podman volume create metrics*
metrics

$ *podman run -d --rm -v metrics:/var/account --cap-add SYS_PACCT --pid=host collector-psacct:latest*
dd9f4825d23614df2acefdcd70ec1e6c3ea18a58b86c9d17ddc4f91038487919

$ *podman run -d --rm -v metrics:/var/log/sa collector-sysstat*
ec3d0957525cc907023956a185b15123c20947460a48d37196d511ae42de2e27

$ *podman run --name exporter -d --rm -v metrics:/metrics -p 8080:8080 metrics-exporter*
d4840ad57bfffd4b069e7c2357721ff7aaa6b6ee77f90ad4866a76a1ceb6adb7
------
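At this point a quick sanity check of the exporter should already show some
metrics (exactly which ones depends on what the collectors have managed to
produce so far):

[subs=+quotes]
------
$ *curl -s http://localhost:8080/q/metrics | head*
------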
Configure Prometheus with a scrape target pointing at the `exporter` container.

[subs=+quotes]
------
$ *podman inspect -f '{{.NetworkSettings.IPAddress}}' exporter*
10.88.0.8

$ *tail -n15 tmp-test/prometheus.yml*
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  **- job_name: "exporter"
    metrics_path: "/q/metrics"
    scheme: "http"
    static_configs:
      - targets: ["10.88.0.8:8080"]
    scrape_interval: 10s
    scrape_timeout: 5s**
------
Add Prometheus and Grafana.

[subs=+quotes]
------
$ *podman run --name prometheus \*
    *-d --rm \*
    *-v ./test/prometheus.yml:/etc/prometheus/prometheus.yml \*
    *-v prometheus:/prometheus \*
    *-p 9090:9090 \*
    *registry.redhat.io/openshift4/ose-prometheus:v4.11*
6eae04677fcded65bbe1cb7f66aa887d94587977a0616f7ec838f9453702474c

$ *podman run --name grafana -d --rm -p 3000:3000 \*
    *-v ./test/grafana.ini:/etc/grafana/grafana.ini \*
    *registry.redhat.io/openshift4/ose-grafana:v4.11*
78d5bfa7977923b828c1818bb877fa87bdd96086cc8c875fbc46073489f6760e
------
Configure Grafana with Prometheus as the data source and dashboard away!
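If you would rather provision the data source from a file than click through
the Grafana UI, a sketch like the following should work when mounted into the
Grafana container under `/etc/grafana/provisioning/datasources/` (the URL is a
placeholder; point it at wherever your Prometheus container is reachable from
Grafana):

------
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # replace with the prometheus container's IP or the host address
    url: http://prometheus-host:9090
    isDefault: true
------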
.Process Accounting Graphs from a Single Host
image::pics/psacct-sample.png[scaledwidth="95%" width="95%"]

.Sysstat Scheduler Information, Single Host
image::pics/sysstat-sample-sched.png[scaledwidth="95%" width="95%"]

.Sysstat I/O Information, Single Host
image::pics/sysstat-sample-io.png[scaledwidth="95%" width="95%"]
== Container Images ==

This set of images requires a valid entitlement for RHEL (and consequently
either a RHEL system to build on or a RHEL system to create an entitlement
secret from).

IMPORTANT: You do not have to build the images; I have already built them (for
the `x86_64` architecture only) and made them available on `quay.io/benko/`.
=== SAR ===

The _system activity reporting_ image is based on `ubi-minimal` and includes
just the `sysstat` package.

It expects a volume to be attached at `/var/log/sa`.

The entrypoint takes care of initialising the `saXX` files.
// TODO: and rotating any old files out of the way.

It *requires* execution under the `root` UID (the container can be rootless,
but that may affect your data depending on host and container configuration).

It also *requires* access to the host's network namespace if you want to
measure global network statistics.
==== Parameters ====

`PERIOD`::
    Sampling period in seconds. Defaults to `10`. Increase this to something
    like `30` (or more) for hosts with many network interfaces, block devices,
    and/or CPUs.

`STARTUP_SCRATCH`::
    Whether to scratch existing `sa1` data at startup. Defaults to `0`
    (disabled); set it to `1`, `yes`, or `true` to activate it (any other value
    leaves it off).

`STARTUP_ROTATE`::
    Whether to mark data as rotated at startup. Basically just writes a marker
    in the previous `sadc` data file. Defaults to `0` (disabled); set it to `1`,
    `yes`, or `true` to activate it.
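Putting the above together, a run with tweaked parameters might look something
like this (assuming, as is usual for container images, that the parameters are
passed as environment variables, and adding `--net=host` for the global network
statistics mentioned above):

[subs=+quotes]
------
$ *podman run -d --rm --net=host \*
    *-e PERIOD=30 -e STARTUP_SCRATCH=1 \*
    *-v metrics:/var/log/sa collector-sysstat:latest*
------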
=== PSACCT ===

The _process accounting_ image is based on `ubi-minimal` and includes just the
`psacct` package.

It expects a volume to be attached at `/var/account`.

The entrypoint takes care of rotating any old `pacct` files out of the way.

In addition to *requiring* execution under a *real* `root` UID (i.e. *NOT* a
rootless container), it also *requires* the `CAP_SYS_PACCT` capability
(`--cap-add=SYS_PACCT`) and access to the host's PID namespace (`--pid=host`).
==== Parameters ====

`PERIOD`::
    Sampling period in seconds. Defaults to `10`. Increase this to something
    like `30` (or more) for hosts with many thousands of processes.

`CUMULATIVE`::
    Tells the collection process to never reset the `pacct` file and just keep
    it growing, thus reporting cumulative stats since container start. Beware
    that the `pacct` file will grow correspondingly large as time goes by.

`STARTUP_SCRATCH`::
    Whether to scratch existing `pacct` data at startup. Defaults to `0`
    (disabled); set it to `1`, `yes`, or `true` to activate it.
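Again assuming the parameters are passed as environment variables, extending
the earlier run might look like this:

[subs=+quotes]
------
$ *podman run -d --rm --cap-add SYS_PACCT --pid=host \*
    *-e PERIOD=30 -e CUMULATIVE=1 \*
    *-v metrics:/var/account collector-psacct:latest*
------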
=== Exporter ===

The brain of the group.

// TODO: Add support for hostname overrides in app.

// run a maven registry.access.redhat.com/ubi9/openjdk-17 container:
//
//   podman volume create maven
//
//   podman run -it \
//     --name exporter \
//     -v maven:/home/default/.m2/repository \
//     -v metrics:/metrics \
//     -v /Users/johndoe/Documents/workspaces/projects/p0f/linux-metrics-exporter/exporter:/exporter \
//     -p 8080:8080 \
//     registry.access.redhat.com/ubi9/openjdk-17 bash
//
//   $ cd /exporter
//   $ mvn quarkus:dev
==== Parameters ====

In `application.properties` or as Java system properties:

`exporter.data.path`::
    Override the location where the metrics files are expected to show up.
    Defaults to `/metrics`, but obviously can't be that for testing outside of
    a container.

You can also set the same settings https://quarkus.io/guides/config-reference[from environment variables].
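For example, when running the exporter locally in dev mode (the path here is
just an example), either of these should work:

[subs=+quotes]
------
$ *mvn quarkus:dev -Dexporter.data.path=/tmp/metrics*
$ *EXPORTER_DATA_PATH=/tmp/metrics mvn quarkus:dev*
------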
==== Debugging ====

There are a couple of logger categories that might help you see what's going on.

By default, the routes are fairly noisy: `TRACE` level logging apparently
doesn't work for some reason, so I had to bump everything up a level, which
means that at `INFO` you already see a note about every record that has been
processed - you will even see their unmarshaled bodies (completely shameless,
I know).

The following categories can be bumped up to `DEBUG` if you need more info (see
the example after the list):
`psacct-reader`::
    The route reading process accounting records from the `psacct-dump-all`
    file. Pretty much all the logic is here, but since there can be a large
    number of process records in the file, it is split and each record is
    processed asynchronously by the dispatch route.

`psacct-dispatch`::
    The route dispatching the records to the registration service.

`psacct-reset`::
    To be able to work with instantaneous data, rather than cumulative, all
    previously registered records are synchronously reset to zero upon the
    arrival of a new snapshot. This prevents metrics for previously registered
    processes from disappearing.

`sysstat-reader`::
    The route that reads the `sysstat-dump.json` file. All the logic is here.

`net.p0f.openshift.metrics`::
    Non-Camel stuff is all logged in this category.

`net.p0f.openshift.metrics.exporter`::
    Metric registration and a silly REST endpoint that reports the version.

`net.p0f.openshift.metrics.model`::
    `ProcessAccountingRecord` and `SysstatMeasurement` live here.

`net.p0f.openshift.metrics.processor`::
    Just a simple processor that transforms a `psacct` record into CSV.

`net.p0f.openshift.metrics.routes`::
    Camel routes. See the first four categories for this.
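For example, to raise a couple of those categories to `DEBUG` in
`application.properties` (the same can be done with the equivalent system
properties or environment variables):

------
quarkus.log.category."psacct-reader".level=DEBUG
quarkus.log.category."net.p0f.openshift.metrics".level=DEBUG
------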
=== Building with Podman ===

If building the images using `podman` on an entitled host, no extra steps need
to be performed, as host entitlements will automatically be imported into the
build container.

[NOTE]
========
When building for an architecture without the `ubi-minimal` image, or on a
host that can not be entitled (e.g. Fedora CoreOS), you can choose a different
base image with the `--from` option of `podman build`.

[subs=+quotes]
-------------------------------
$ *podman build --from=registry.fedoraproject.org/fedora-minimal:36 -f ./images/Containerfile-sysstat -t collector-sysstat:latest*
-------------------------------
========
You will have noticed there is no `Containerfile` for the exporter. That is
because `quarkus-maven-plugin` can do just fine
https://quarkus.io/guides/container-image[building an image on its own]. Just
add the `jib` extension and tell it to push the image somewhere.

[subs=+quotes]
-------------------------------
$ *mvn package -Dquarkus.container-image.build=true -Dquarkus.container-image.push=true -Dquarkus.container-image.registry=foo*
-------------------------------
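If the project doesn't already carry the extension, it can be added with the
Quarkus Maven plugin (the extension's short name is `container-image-jib`):

[subs=+quotes]
-------------------------------
$ *mvn quarkus:add-extension -Dextensions="container-image-jib"*
-------------------------------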
=== Building in OpenShift ===

==== Collector Images ====

If building the images in OpenShift Container Platform, you must make sure an
entitlement secret and the corresponding RHSM certificate secret are mounted
inside the build pod in order for packages to be found and installed.

NOTE: The entitled system's architecture needs to match the container host!

The process is as follows.

.Verify access to host entitlement data.
[subs=+quotes]
-------------------------------
$ **ls -l /etc/pki/entitlement/*.pem /etc/rhsm/ca/*.pem**
-rw-r--r--. 1 root root   3272 Oct 31 06:09 /etc/pki/entitlement/_6028779042203586857_-key.pem
-rw-r--r--. 1 root root 149007 Oct 31 06:09 /etc/pki/entitlement/_6028779042203586857_.pem
-rw-r--r--. 1 root root   2305 Sep  2  2021 /etc/rhsm/ca/redhat-entitlement-authority.pem
-rw-r--r--. 1 root root   7411 Sep  2  2021 /etc/rhsm/ca/redhat-uep.pem
-------------------------------
.Create corresponding secrets.
[subs=+quotes]
-------------------------------
$ *oc create secret generic etc-pki-entitlement \*
    *--from-file=/etc/pki/entitlement/_6028779042203586857_-key.pem \*
    *--from-file=/etc/pki/entitlement/_6028779042203586857_.pem*
secret/etc-pki-entitlement created

$ *oc create secret generic rhsm-ca \*
    *--from-file=/etc/rhsm/ca/redhat-entitlement-authority.pem \*
    *--from-file=/etc/rhsm/ca/redhat-uep.pem*
secret/rhsm-ca created
-------------------------------
.Make sure the BuildConfig mounts those secrets.
[subs=+quotes]
-------------------------------
apiVersion: build.openshift.io/v1
kind: BuildConfig
...
  strategy:
    type: Docker
    dockerStrategy:
      dockerfilePath: Containerfile-psacct
      from:
        kind: ImageStreamTag
        name: ubi-minimal:latest
      **volumes:
      - source:
          type: Secret
          secret:
            secretName: etc-pki-entitlement
        name: etc-pki-entitlement
        mounts:
        - destinationPath: /etc/pki/entitlement
      - source:
          type: Secret
          secret:
            secretName: rhsm-ca
        name: rhsm-ca
        mounts:
        - destinationPath: /etc/rhsm/ca**
-------------------------------
`Containerfile` instructions are written such that they should work without
modification regardless of whether the build is running in `podman` on an
entitled host or inside a correctly configured OpenShift builder pod.

NOTE: The key thing in the `Containerfile` steps is to remove `/etc/rhsm-host`
at some point unless `/etc/pki/entitlement-host` contains something (such as,
for example, valid entitlements). Both are symlinks to `/run/secrets`.
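As an illustration only (the `Containerfile`s in `images/` are authoritative),
the step the note describes boils down to something like this:

-------------------------------
# Sketch: drop the rhsm-host symlink when no host entitlements are mounted,
# so that package installation falls back to the secrets mounted by the build.
RUN if [ -z "$(ls -A /etc/pki/entitlement-host 2>/dev/null)" ]; then rm -f /etc/rhsm-host; fi \
 && microdnf -y install sysstat \
 && microdnf clean all
-------------------------------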
==== Exporter Image ====

===== Java Build =====

The Java build is relatively simple.

Figure out what OpenJDK image is available in the cluster and create a new build.

[subs=+quotes]
-------------------------------
$ *oc new-build openjdk-11-rhel8:1.0~https://github.com/benko/linux-metrics-exporter.git --context-dir=exporter*
-------------------------------

Wait for the build to complete (it's going to take quite some time to download all the deps) and that's it!

If you're experimenting with the code, don't forget to mark the build as incremental.

[subs=+quotes]
-------------------------------
$ *oc patch bc/linux-metrics-exporter -p '{"spec": {"strategy": {"sourceStrategy": {"incremental": true}}}}'*
-------------------------------
===== Native Build =====

TBD

// For the native build, you need a specific Mandrel image. Import it first.
//
//   $ oc import-image mandrel --from=registry.redhat.io/quarkus/mandrel-21-rhel8:latest --confirm
//   imagestream.image.openshift.io/mandrel imported
//   ...
===== Publishing the Image =====

Make sure the internal OpenShift image registry is exposed if you want to copy the image somewhere else.

[subs=+quotes]
-------------------------------
$ *oc patch config.imageregistry/cluster --type=merge -p '{"spec": {"defaultRoute": true}}'*
-------------------------------

Log in to both the source and target registries.

[subs=+quotes]
-------------------------------
$ *podman login quay.io*
Username: *youruser*
Password: *yourpassword*
Login Succeeded!

$ *oc whoami -t*
sha256~8tIizkcLNroDEcWXJgoPMsVYUriK1sGnJ6N94WSveEU

$ *podman login default-route-openshift-image-registry.apps.your.openshift.cluster*
Username: _this-is-irrelevant_
Password: *token-pasted-here*
Login Succeeded!
-------------------------------
Then simply copy the image using `skopeo`.

[subs=+quotes]
-------------------------------
$ *skopeo copy \*
    *docker://default-route-openshift-image-registry.apps.your.openshift.cluster/project/linux-metrics-exporter:latest \*
    *docker://quay.io/youruser/yourimage:latest*
-------------------------------

== Acknowledgements ==

Thanks to https://github.com/divinitus/[Piotr Baranowski] for the idea about running `sa1` in a DaemonSet.