
Add an installation guide for cloud server users. #1261

Open
20040714Mike opened this issue Aug 19, 2022 · 2 comments

Comments

@20040714Mike

Many developers, especially Mac users, don't have computers with an NVIDIA or AMD GPU, so we have to use cloud servers with GPUs.

But we (specifically, people like me) struggle to set up the dependencies on these servers, like installing CUDA and cuDNN.

Yes, I have read tons of installation guides, including the NVIDIA developer installation guide, but I am still struggling to get cuDNN installed.

@torzdf
Collaborator

torzdf commented Aug 19, 2022

I don't use cloud servers myself, and don't really have the time to set one up and keep the instructions updated. I welcome PRs in this area though.

You may find some useful information here. Equally, it may be out of date:
https://forum.faceswap.dev/viewforum.php?f=23

@20040714Mike
Author

20040714Mike commented Aug 20, 2022

Well, I guess I have to be the one to add some cloud tutorials here.

@torzdf, if you see this, please consider adding it to your tutorials; it can be really helpful for beginners without local GPUs.

If this is your first time setting up such an environment, I strongly recommend you follow the steps below, or you might waste a lot of time just getting a proper environment working.

My VM:

Ubuntu 20.04, GRID 11.1 driver, 4 vCPUs, 20 GB RAM, with 1/4 of an NVIDIA T4 GPU

1. What platform to choose:

I don't really recommend Google Colab unless you have Colab Pro.
Deepfaking is a time-consuming ML process, and any interruption in (non-Pro) Colab, whether from your internet connection or anything else, will terminate the entire process: you will literally lose all the model progress you have so far, and you might even have to re-upload your files, which is even worse.

So if you don't have Pro, better not to use Colab.

For buying VMs:

Please be aware that choosing an image with NVIDIA in its name does not necessarily mean that the VM has a GPU! For the image to work, you have to choose a VM with a GPU. The funny thing is, you can buy a VM without a GPU and install an NVIDIA image on it, which can be highly misleading to beginners.

So be careful about this. If you want to do exactly what I did to set up my cloud ML environment, I suggest you buy a VM with the specs I mentioned above.

I bought mine from Tencent Cloud in the HK zone, which is only about $0.50 per hour. Yes, Chinese services can be unreliable, but at the end of the day some of them are reliable, and my VM is one of them.

Tencent Cloud also has a GPU trial for new users, costing only around $0.20 for 15 days (only 15 days).

You can buy such VMs from Google, AWS, Azure, etc., but they are usually more expensive.

2. After you buy your VM:

The first thing you should do is:

$ nvidia-smi

This is to make sure your VM has an NVIDIA driver, or even a GPU at all. If the output tells you to ensure the driver is installed and running, there's a very good chance the VM doesn't even have a GPU, so save yourself some time and go find another one.
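That sanity check can be wrapped in a tiny sketch (the helper names `has_cmd` and `check_gpu` are my own, not part of any tool):

```shell
# has_cmd: true if a command exists on PATH (helper name is hypothetical).
has_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# check_gpu: succeed only if nvidia-smi exists AND can talk to a driver/GPU.
check_gpu() {
  if has_cmd nvidia-smi && nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA driver and GPU look OK"
  else
    echo "No working NVIDIA GPU/driver detected; consider another VM" >&2
    return 1
  fi
}
```

Run `check_gpu` right after your first login and walk away from the VM if it fails.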

3. If the VM has a GPU and a driver:

Now go to https://ngc.nvidia.com/ and register your NVIDIA developer account.

Fill in whatever it asks for to complete your registration; it doesn't really matter what you enter.

Then, on the NGC main page, hover over your avatar in the top-right corner and click Settings.

Click Set API Key.

Generate an API key and save it anywhere you like.

Go back to the previous page.

Click Install NGC CLI.

Choose Linux AMD64 and do exactly what it tells you to do.

If it asks for a key, that means the API key I mentioned earlier.

Go back to the page where you set your API key, and run the commands it tells you to.

Then install Docker on your VM (some VMs may already have it installed):

$ sudo apt update

$ sudo apt install docker-ce

These two commands should work for most VMs; if they don't, Google the error messages for a solution specific to your VM.
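One caveat worth knowing: `docker-ce` is published in Docker's own apt repository, not in stock Ubuntu's, so that install can fail out of the box. Ubuntu's own packaged engine is an easy fallback; a sketch (the function name is mine):

```shell
# install_docker_fallback: install Ubuntu's packaged Docker engine
# ("docker.io") when "apt install docker-ce" fails because Docker's
# own apt repo isn't configured on the VM.
install_docker_fallback() {
  sudo apt-get update
  sudo apt-get install -y docker.io
  # Start the daemon now and on every boot.
  sudo systemctl enable --now docker
}
```

Either engine works for running the NGC container below.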

Open a tmux session so that everything you do later survives disconnects:

$ tmux

Then set up the NGC container using this command (it is a single command; keep it on one line):

$ sudo docker run --gpus all -it --rm nvcr.io/nvidia/tensorflow:22.01-tf2-py3

This can be quite slow the first time while the image downloads.

Then install the gdrive command-line tool.
I use Google Drive because locating and downloading files directly inside the NGC container is a pain; I'd rather upload them to Google Drive first and download them from there.

If you figure out an easier way, do that and comment down below; this is the only part that feels weird to me.

Make sure you are in /workspace at this point; if not, cd /workspace first.

wget https://github.com/prasmussen/gdrive/releases/download/2.1.1/gdrive_2.1.1_linux_386.tar.gz
tar -xvf gdrive_2.1.1_linux_386.tar.gz
./gdrive about

At this point you will be asked to verify your google account, just do whatever it says.

To upload a file or a directory
(these particular commands only work if you installed gdrive in /workspace and you are currently in /workspace):

./gdrive upload {FILE_PATH}

OR

./gdrive upload -r {DIRECTORY_PATH}

You can also download files from Google Drive in a similar way.
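For the download direction, gdrive works by file ID rather than path. A sketch, assuming the same gdrive 2.1.1 binary in /workspace and that `download --recursive` mirrors the recursive upload (the wrapper name is mine):

```shell
# First find the ID of the file or folder you want, e.g.:
#   ./gdrive list

# gdrive_fetch: download a file (or a whole folder) by its Drive ID.
gdrive_fetch() {
  # $1: the Google Drive ID shown by "./gdrive list"
  ./gdrive download --recursive "$1"
}
```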

Then install some dependencies that you have to set up manually:

apt-get update
apt-get install -y ffmpeg libsm6 libxext6
apt-get install -y python3-tk

Even though tkinter is only needed for the GUI, CLI users still have to install it for some reason.

Now clone the faceswap repo into /workspace:

git clone https://github.com/deepfakes/faceswap.git

and

cd faceswap

Then install the requirements by:

pip install -r ./requirements/requirements_nvidia.txt

Then configure faceswap:

python setup.py

No for AMD
No for Docker (Yes might be OK as well; I haven't tried it)
Yes for CUDA

Ignore "CUDA / cuDNN not found".

Leave the TensorFlow question blank.

For your training images, I suggest you extract them locally, zip them up, upload the zip somewhere reachable by URL, then curl it into /workspace/faceswap:

curl -O [URL]

unzip [whatever you named it]

OR

upload your training data to Google Drive first, then download it to your VM using the ./gdrive tool I mentioned earlier.

This sounds a bit convoluted, but it makes testing different VMs to find the right one much easier for me, and in the long run this method does make things more convenient.
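The curl route can be glued into one hypothetical helper (the URL and helper name are placeholders; adjust to wherever you host the zip):

```shell
# fetch_training_data: download a zip of pre-extracted faces into the
# faceswap working directory and unpack it. (Helper name is hypothetical.)
fetch_training_data() {
  # $1: URL of the zipped training images, e.g. https://example.com/faces_a.zip
  cd /workspace/faceswap || return 1
  curl -L -O "$1"
  unzip "$(basename "$1")"
}
```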

Then you can start training your model:

python faceswap.py train -A [A-extracted_image_folder] -B [B-extracted_image_folder] -m ./my_model/

At this point the script should be running fine, and you will see output like:

[#130414] Saved model: A loss 0.0224 B loss 0.02614

The [#130414] means the process has finished 130,414 iterations.

For deepfakes, you should aim for at least 80,000 iterations for a decent result, assuming your training data is of good quality.
My definition of good quality is:
at least 250 high-res, clear photos with multiple angles, expressions, and lighting conditions for each of your targets.

If you decide to test out your model:

Press Enter and wait for the training to stop.

cd to /workspace:

cd /workspace

Upload your model folder to your Google Drive:

./gdrive upload -r ./faceswap/my_model/

(If you followed my commands exactly, the model folder is my_model; change it if you used something else.)

You can now restart the training to save yourself some time and money.

Go to Google Drive in your browser and download the model folder to your local machine.

You can then delete it from Google Drive to make the next upload easier.
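Those checkpoint steps can be collected into one hypothetical helper, run from /workspace after training has stopped (it assumes gdrive lives in /workspace and the model folder is ./faceswap/my_model):

```shell
# backup_model: push the current model folder to Google Drive.
# Stop the training before calling this.
backup_model() {
  # $1 (optional): model folder; defaults to the path used in this guide.
  model_dir="${1:-./faceswap/my_model}"
  ./gdrive upload -r "$model_dir"
}
```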

Convert your video using the downloaded model; for this, see USAGE.md in this repo.

If you didn't restart the training earlier, remember to turn it back on; you only have to stop the training for the upload.

I suggest converting locally because the model is much smaller than your training data files, so you would spend a lot more time moving files around if you converted on the VM.

Please don't delete your tmux session; it is named 0 by default.

To get back into the NGC container every time you log on to your VM:

tmux attach-session -t 0

If that fails:

tmux ls

to see whether you named it something else.

If it says "no tmux running": congratulations, you have to redo all the steps above, and you have also lost your trained model.

So don't delete that tmux session, and watch out for hotkeys like Ctrl-C and Ctrl-Z; there should be no reason for you to jump back to your default user directory, and you really don't need to.
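A tiny convenience wrapper for getting back in (the name is mine; it just tries session 0 and falls back to listing whatever sessions exist):

```shell
# reattach: attach to the default tmux session "0" (or one named in $1);
# if that fails, list existing sessions so you can pick the right name.
reattach() {
  tmux attach-session -t "${1:-0}" 2>/dev/null || tmux ls
}
```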

As for training two models at the same time on one VM: I don't recommend it, because the overall speed is basically the same as training them one after another.

Hope this can help.

If you come up with any improvements, please comment down below to help more people.
