zlacker

[parent] [thread] 10 comments
1. sophro+(OP)[view] [source] 2024-02-12 22:11:40
Hey there -

I'm a maintainer (and CEO) of Invoke.

It's something we're monitoring as well.

ROCm has been challenging to work with - we're actively talking with AMD to stay apprised of ways we can mitigate some of the more troublesome experiences users have getting Invoke running on AMD (and we're hoping to expand official support to AMD on Windows).

The problem is that a lot of the proposed solutions involve significant/unsustainable dev effort (e.g., supporting an entirely different inference paradigm) rather than dropping into the existing Torch/diffusers pipelines.

While I don't know enough about your setup to offer immediate solutions, if you join the Discord, I'm sure folks would be happy to walk through some manual troubleshooting/experimentation to get you up and running - discord.gg/invoke-ai

replies(2): >>latchk+y3 >>Cu3PO4+Hg
2. latchk+y3[view] [source] 2024-02-12 22:34:02
>>sophro+(OP)
Invoke is awesome. Let me know if you guys want some MI300x to develop/test on. =) We've also got some good contacts at AMD if you need help there as well.
3. Cu3PO4+Hg[view] [source] 2024-02-12 23:48:50
>>sophro+(OP)
Hi! I really appreciate you taking the time to reply.

I have since gotten Invoke to run and was already able to get some results I'm really quite happy with, so thank you for your time and commitment working on Invoke!

I understand that ROCm is still challenging, but it seems my problems were less related to ROCm or Invoke itself and more to Python dependency management. It really boiled down to getting the correct (ROCm) versions of packages installed. Installing Invoke from PyPI always removed my Torch and installed the CUDA-enabled Torch (along with cuBLAS, cuDNN, ...). Once I had the correct versions of the packages, everything just worked.
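To illustrate, the failure mode was roughly this (ROCm version from memory; adjust to your setup):

  pip install torch --index-url https://download.pytorch.org/whl/rocm5.6
  pip install invokeai  # re-resolves torch against PyPI and swaps the
                        # ROCm build back out for the CUDA wheels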

To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.

What ended up working for me was to use one of AMD's ROCm OCI base images, manually install all dependencies, forgo a virtual environment, clone your repo (and build the frontend), and then install from there.
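Sketched out, the tail end of that was something like the following (details from memory; the frontend build follows the repo docs):

  # inside a ROCm base container that already ships a ROCm-enabled torch:
  git clone https://github.com/invoke-ai/InvokeAI && cd InvokeAI
  # build the web frontend per the repo docs, then install from the checkout:
  pip install .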

The majority of my struggle would have been solved by a recent Docker image containing a working setup. (The one on Docker Hub is 9 months old.) Trying to build the Dockerfile from your repo, I also ended up with a CUDA-enabled Torch: it installed the correct one first, but a later step removed the ROCm-enabled Torch and swapped it for the CUDA-enabled one.

I hope you'll consider investing some resources into publishing newer, working builds of your Docker image.

replies(3): >>sophro+py >>doctor+jB >>westur+Az3
4. sophro+py[view] [source] [discussion] 2024-02-13 01:54:44
>>Cu3PO4+Hg
You bet - thanks for the feedback. Glad you're enjoying Invoke!

We do have Docker packages hosted on GH, but I'll be the first to admit that we haven't prioritized ROCm. Contributors with AMD hardware are a scant few, but maybe we'll find some help wrangling that problem now that we know there's an avenue to do so.

replies(2): >>Cu3PO4+s71 >>Cu3PO4+433
5. doctor+jB[view] [source] [discussion] 2024-02-13 02:24:01
>>Cu3PO4+Hg
> Installing Invoke from PyPI... To me, your pyproject.toml looks perfectly sane, so I wasn't sure how to go about fixing the problem.

You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters. Invoke would have to author a setup.py that deals with all the issues, in a way that is compatible with build isolation.
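Concretely: an environment marker can gate a requirement on the platform, but nothing in static metadata can pick the wheel index, which is where the CUDA/ROCm split actually lives (hypothetical requirement line):

  # a marker can express "only on Linux"...
  torch==2.2.0; sys_platform == "linux"
  # ...but not "resolve from download.pytorch.org/whl/rocm5.6";
  # the index choice only exists as an install-time pip flag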

> The majority of my struggle would have been solved by a recent working Docker image containing a working setup. (The one on Docker Hub is 9 months old.)

Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device? (https://rocm.docs.amd.com/projects/MIVisionX/en/latest/docke...)

There is a lot of ad-hoc reinvention of tooling in this space.

replies(1): >>Cu3PO4+c71
6. Cu3PO4+c71[view] [source] [discussion] 2024-02-13 07:50:48
>>doctor+jB
> You can't install the PyTorch that's best for the currently running platform using a pyproject.toml with a setuptools backend, for starters.

I see. I do know Python, but my knowledge of setuptools, pip, Poetry, and whatever else is limited. To get my working setup, I specified an --index-url for my Torch installation. Does that not work with their current setup?

> Why? Given the state of the ecosystem, what guarantee is there really that the documentation for Docker Desktop with AMD ROCm device binding is going to actually work for your device?

Well, they did work for me, though I think just passing /dev/{dri,kfd} and setting seccomp=unconfined was sufficient. So for my particular case, getting a working image was the only missing step.
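Spelled out as docker flags, that's roughly (image name illustrative):

  docker run -it --device=/dev/kfd --device=/dev/dri \
      --security-opt seccomp=unconfined rocm/pytorch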

From a more general POV: it might not make sense to invest in a ROCm OCI image from a short-term business perspective, but in the long term, and purely on principle, I do think the ecosystem should strive to be less reliant on CUDA and CUDA alone.

7. Cu3PO4+s71[view] [source] [discussion] 2024-02-13 07:54:29
>>sophro+py
I hate maintaining my own build instructions as much as the next guy, so I'll try to get your Dockerfile working for me and then send a PR.
8. Cu3PO4+433[view] [source] [discussion] 2024-02-13 21:17:54
>>sophro+py
As promised in my other comment, I did send a PR! https://github.com/invoke-ai/InvokeAI/pull/5714
9. westur+Az3[view] [source] [discussion] 2024-02-14 00:53:26
>>Cu3PO4+Hg
> AMD's ROCm OCI base images,

ROCm docs > "Install ROCm Docker containers" > Base Image: https://rocm.docs.amd.com/projects/install-on-linux/en/lates... links to ROCm/ROCm-docker: https://github.com/ROCm/ROCm-docker which is the source of docker.io/rocm/rocm-terminal: https://hub.docker.com/r/rocm/rocm-terminal :

  docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
ROCm docs > "Docker image support matrix": https://rocm.docs.amd.com/projects/install-on-linux/en/lates...

ROCm/ROCm-docker//dev/Dockerfile-centos-7-complete: https://github.com/ROCm/ROCm-docker/blob/master/dev/Dockerfi...

Bazzite is a ublue (Universal Blue) fork of the Fedora Kinoite (KDE) or Fedora Silverblue (GNOME) rpm-ostree Linux distributions; ublue-os/bazzite//Containerfile : https://github.com/ublue-os/bazzite/blob/main/Containerfile#... has, in addition to fan and power controls, automatic updates on desktop, supergfxctl, system76-scheduler, and an fsync kernel:

  rpm-ostree install rocm-hip \
        rocm-opencl \
        rocm-clinfo
But it's not `rpm-ostree install --apply-live`, because it's a Containerfile.

To install a ublue-os distro, you install any of the Fedora ostree distros: {Silverblue, Kinoite, Sway Atomic, or Budgie Atomic} from e.g. a USB stick and then `rpm-ostree rebase <OCI_host_image_url>`:

  rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite:stable
  rpm-ostree rebase ostree-unverified-registry:ghcr.io/ublue-os/bazzite-nvidia:stable
  rpm-ostree rebase ostree-image-signed:
ublue-os/config//build/ublue-os-just/40-nvidia.just defines the `ujust configure-nvidia` and `ujust toggle-nvk` commands: https://github.com/ublue-os/config/blob/main/build/ublue-os-...

There's a default `distrobox` with pytorch in ublue-os/config//build/ublue-os-just/etc-distrobox/apps.ini: https://github.com/ublue-os/config/blob/main/build/ublue-os-...

  [mlbox]
  image=nvcr.io/nvidia/pytorch:23.08-py3
  additional_packages="nano git htop"
  init_hooks="pip3 install huggingface_hub tokenizers transformers accelerate datasets wandb peft bitsandbytes fastcore fastprogress watermark torchmetrics deepspeed"
  pre-init-hooks="/init_script.sh"
  nvidia=true
  pull=true
  root=false
  replace=false
docker.io/rocm/pytorch: https://hub.docker.com/r/rocm/pytorch

pytorch/builder//manywheel/Dockerfile: https://github.com/pytorch/builder/blob/main/manywheel/Docke...

ROCm/pytorch//Dockerfile: https://github.com/ROCm/pytorch/blob/main/Dockerfile

The ublue-os (and so also Bazzite) OCI host image Containerfile has Sunshine installed, which is a 4K HDR 120fps remote desktop solution for gaming.

There's a `ujust remove-sunshine` command in system_files/desktop/shared/usr/share/ublue-os/just/80-bazzite.just : https://github.com/ublue-os/bazzite/blob/main/system_files/d... and also kernel args for AMD:

  pstate-force-enable:
    rpm-ostree kargs --append-if-missing=amd_pstate=active
ublue-os/config//Containerfile: https://github.com/ublue-os/config/blob/main/Containerfile

LizardByte/Sunshine: https://github.com/LizardByte/Sunshine

moonlight-stream https://github.com/moonlight-stream

Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files

conda-forge/pytorch-cpu-feedstock > "Add ROCm variant?": https://github.com/conda-forge/pytorch-cpu-feedstock/issues/...

And Fedora supports OCI containers as host images, and also podman container images with just systemd to respawn one container or a pod of them.
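For example, a minimal podman-plus-systemd respawn setup (image name and port are illustrative):

  podman create --name invokeai -p 9090:9090 ghcr.io/invoke-ai/invokeai
  podman generate systemd --new --name invokeai \
      > ~/.config/systemd/user/invokeai.service
  systemctl --user enable --now invokeai.service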

replies(1): >>Cu3PO4+Hj4
10. Cu3PO4+Hj4[view] [source] [discussion] 2024-02-14 08:43:50
>>westur+Az3
I actually used the rocm/pytorch image you also linked.

I'm not sure what you're pointing to with your reference to the Fedora-based images. I'm quite happy with my NixOS install and really don't want to switch to anything else. And as long as I have the correct kernel module, my host OS really shouldn't matter for running any of the images.

And I'm sure it can be made to work with many base images; my point was just that the dependency management around PyTorch is in a bad state, where it is extremely easy to break.

> Anyways, hopefully this PR fixes the immediate issue: https://github.com/invoke-ai/InvokeAI/pull/5714/files

It does! At least for me. It is my PR after all ;)

replies(1): >>westur+KS4
11. westur+KS4[view] [source] [discussion] 2024-02-14 14:17:09
>>Cu3PO4+Hj4
Unfortunately, NixOS (and Debian and Ubuntu) lack SELinux policies or other LSM implementations out of the box, and container-selinux covers more than just docker.

Is there a way to `restorecon --like / /nix/os/root72` to apply SELinux extended filesystem attribute labels just to NixOS prefixes?
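AFAIK there is no such restorecon flag; the closest existing mechanism seems to be an SELinux file-context equivalence rule (paths illustrative):

  # label everything under the Nix prefix as if it lived under /
  semanage fcontext -a -e / /nix
  restorecon -R /nix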

Some research is done on RPM-based distros, which have become quite advanced with rpm-ostree support.

FWICS Bazzite has NixOS support, too, in addition to distrobox containers.

Bazzite has a lot of other stuff installed that's not necessary when attempting to isolate sources of variance in the interest of reproducible research; but being for gaming, it has various optimizations.

InvokeAI might be faster to install, and to compute with, using conda-forge builds.
