pchalasani@MM-MAC-3358 ~/D/M/G/predevent> neptune --version
2.0.10 (API: 2.0.7)
Would you type
du -hs .[^.]* *
and paste the output here (unless it contains something confidential)?
(py3) MM-MAC-3358:predevent pchalasani$ du -hs .[^.]* *
4.0K    .cache
352K    .cnvrg
4.0K    .cnvrgignore
4.0K    .cnvrgignore.conflict
3.0M    .domino
4.0K    .dominoignore
4.0K    .dominoresults
4.0K    .floydexpt
4.0K    .floydignore
 23M    .git
4.0K    .gitignore
116K    .idea
4.0K    Dockerfile
4.0K    Makefile
4.0K    README.md
4.0K    README.md.conflict
4.0K    _config.yml
 40K    aws
 74M    data
 32K    docs
3.8M    domino.log
  0B    du
4.0K    floyd_requirements.txt
4.0K    floyd_requirements.txt.conflict
4.0K    google_credentials
4.0K    hyper.yaml
108K    models
924K    neptune.log
4.0K    neptune.yaml
1.5M    notebooks
4.0K    requirements.txt
4.0K    requirements.txt.conflict
390M    results
180K    seqdata
4.0K    setup
4.0K    setup.py
 19M    test
295M    utils
It looks like you are doing everything right but the problem is still on our side. My team is investigating this - I will keep you posted.
I am trying to reproduce your bug with exclude. It works for me, but we need to find out why it fails for you. As I understand it, the Neptune CLI wants to send the whole directory (you mentioned that it calculates 800MB, which is more or less the sum of the du output). Is that correct? You can check the size of the directory with
du -s -h .
If so, it would mean that the Neptune CLI ignores neptune.yaml.
Do you type only neptune send, or do you use some switches? Can you please paste the whole command here?
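For a rough local cross-check of what the snapshot size should be once the exclude list is honoured, here is a minimal Python sketch. The function name snapshot_size and the matching rule (an entry excludes exactly that relative path and everything below it) are assumptions for illustration, not how the Neptune CLI actually computes the size.

```python
import os


def snapshot_size(root, exclude):
    """Return total bytes under root, skipping excluded relative paths.

    An entry such as "test/data" excludes that path and everything below it.
    This matching rule is an assumption, not Neptune's documented behaviour.
    """
    excluded = {os.path.normpath(e) for e in exclude}
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        # Prune excluded directories in place so os.walk does not descend into them.
        dirnames[:] = [
            d for d in dirnames
            if os.path.normpath(os.path.join(rel_dir, d)) not in excluded
        ]
        for name in filenames:
            rel_file = os.path.normpath(os.path.join(rel_dir, name))
            if rel_file in excluded:
                continue
            path = os.path.join(dirpath, name)
            # Skip symlinks so broken links do not raise and targets are not double counted.
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling snapshot_size(".", ["data", ".git", "results"]) from the project root and dividing by 1e6 gives a figure in MB that can be compared with the "Calculated experiment snapshot size" line printed by neptune send.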
Thanks Piotr, and no worries!
We have just deployed a new version of Neptune which should resolve your issues. We fixed the bug with the list of files to exclude in neptune.yaml. Now your config should work.
You can also pass arguments to your Python script by wrapping the whole command in quotation marks. For example:
neptune send "main.py 10 --arg1 34 --arg2 12"
The drawback of the current version is that we also pass some other arguments at the end, which we use for technical purposes (like job-id etc.). We will remove them, but that needs more development, so it will be done later.
The last change was support for a Python requirements file. Neptune will install the additional libraries before your code starts. Please see how to use it here.
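Until those extra trailing arguments are removed, a script that parses its command line strictly may reject them. One way to tolerate them, sketched below, is argparse's parse_known_args, which returns unrecognised arguments separately instead of exiting. The parser mirrors the main.py example above; the --job-id flag used in the usage note is hypothetical, since the exact form of the extra arguments is not stated in this thread.

```python
import argparse


def parse_args(argv=None):
    """Parse the script's own arguments and return any unknown ones separately."""
    parser = argparse.ArgumentParser(description="Hypothetical main.py")
    parser.add_argument("iterations", type=int)
    parser.add_argument("--arg1", type=int, default=0)
    parser.add_argument("--arg2", type=int, default=0)
    # parse_known_args returns (namespace, leftovers) rather than
    # raising an error on arguments the parser does not recognise.
    args, extra = parser.parse_known_args(argv)
    return args, extra
```

For example, parse_args(["10", "--arg1", "34", "--arg2", "12", "--job-id", "abc"]) yields the three expected values in args and leaves the unrecognised pair in extra.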
Got latest, and ran a script that does not take any cmd line args:
pchalasani@MM-MAC-3358 ~/D/M/G/predevent> neptune send utils/run_experiment.py
Calculated experiment snapshot size: 6.24 MB
Sending sources to server: 100%|████████████████████| 6.24M/6.24M [00:02<00:00, 2.10MB/s]
>
> Job send, id: 0df2af2d-7c6b-4522-8c17-58f7091fee39
>
> To browse the job, follow:
> https://banach.neptune.ml/#dashboard/job/0df2af2d-7c6b-4522-8c17-58f7091fee39?getStartedState=folded
The good thing is that it now calculates a size of only 6.24 MB, but then the job failed within a few seconds, and:
- It wasn’t immediately clear where to look for the failure cause, since there is no console log output
- I clicked on Properties and saw this trace:
Run command: neptune run --positional-executable utils/run_experiment.py
Worker Type: gcp-small
Environment: base
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/executing/abstract_job_executor.py", line 34, in execute
    return self._execute(experiment, job, args)
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/executing/job_executor.py", line 134, in _execute
    install_pip_requirements(self._pip_requirements)
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/utils/pip_requirements_utils.py", line 57, in install_pip_requirements
    raise RuntimeError("pip was unable to install some of the requirements")
RuntimeError: pip was unable to install some of the requirements
And my yaml looks like this:
pip-requirements-file: requirements.txt
exclude:
  - data
  - .git
  - .eggs
  - floyd_requirements.txt
  - .domino
  - eggs
  - lib
  - lib64
  - parts
  - results
  - docs
  - my_runs
  - test/data
  - .cache
  - .idea
  - utils/my_runs
  - utils/data
  - utils/results
  - runs
I realized I did not specify an environment, so now I did this
neptune send --environment pytorch-0.1.12-gpu-py3 utils/run_experiment.py
and it has been queued for 7 minutes, which is not good.
I just looked at this job's status and it was aborted for some reason, after 36 minutes. It’s only a quick test job that should take 3-4 minutes.
Now I also specified the worker-type and re-ran it but it failed with the same error
neptune send --worker gcp-gpu-medium --environment pytorch-0.1.12-gpu-py3 utils/run_experiment.py
Run command: neptune run --worker gcp-gpu-medium --environment pytorch-0.1.12-gpu-py3 --positional-executable utils/run_experiment.py
Worker Type: gcp-gpu-medium
Environment: pytorch-0.1.12-gpu-py3
Traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/executing/abstract_job_executor.py", line 34, in execute
    return self._execute(experiment, job, args)
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/executing/job_executor.py", line 134, in _execute
    install_pip_requirements(self._pip_requirements)
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/utils/pip_requirements_utils.py", line 57, in install_pip_requirements
    raise RuntimeError("pip was unable to install some of the requirements")
RuntimeError: pip was unable to install some of the requirements
I tried looking for the output/stderr to see which pip install might be failing, but I was not able to find any stderr when I clicked on browse files and navigated from there
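Since the worker's pip stderr is not visible, one workaround is to narrow down the offending requirement locally by installing the entries of requirements.txt one at a time. The sketch below is only an assumption about how one might do that; the helper names are made up, and a local environment can differ from the Neptune worker image, so a clean local run does not prove the worker will succeed.

```python
import subprocess
import sys


def parse_requirements(lines):
    """Return requirement specifiers, dropping blank lines and comments."""
    reqs = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Drop a trailing comment introduced by whitespace followed by '#'.
        reqs.append(line.split(" #", 1)[0].strip())
    return reqs


def pip_install(requirement):
    """Install one requirement with the current interpreter's pip.

    Returns (ok, stderr_text).
    """
    proc = subprocess.run(
        [sys.executable, "-m", "pip", "install", requirement],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0, proc.stderr


def find_failing(requirements, install=pip_install):
    """Map each requirement that fails to install to its captured stderr."""
    failures = {}
    for req in requirements:
        ok, err = install(req)
        if not ok:
            failures[req] = err
    return failures
```

Reading the file with open("requirements.txt") and passing its lines through parse_requirements and find_failing prints nothing by itself; the returned dictionary holds the failing entries and pip's error text for each. It is best run inside a throwaway virtual environment, since it really installs packages.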
I made progress past the previous problem (the pip failure). Now it runs, but it fails to find some of the modules in my code. Digging into it, I just noticed something strange: one of my source directories is not being uploaded to Neptune, so it is not finding some of my modules.
I think this is a bug on your side.
I still need a way to browse source files via the browser, not download them, when I go to “browse files”
I have a hunch about your bug: I have an item “data” in my exclude list in the yaml file. Your code is probably excluding any directory whose name contains the string “data”, hence the seqdata directory is being excluded. Am I right?
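To make the hunch concrete, here is a small Python sketch contrasting the suspected substring rule with a rule that compares whole path components. Both functions are illustrative guesses written for this thread; neither is Neptune's actual exclude code.

```python
from pathlib import PurePosixPath


def excluded_by_substring(path, exclude):
    """Suspected buggy rule: an entry matches if it appears anywhere in the path."""
    return any(entry in path for entry in exclude)


def excluded_by_component(path, exclude):
    """Stricter rule: an entry must equal the leading path components."""
    parts = PurePosixPath(path).parts
    for entry in exclude:
        entry_parts = PurePosixPath(entry).parts
        if parts[:len(entry_parts)] == entry_parts:
            return True
    return False
```

With exclude set to ["data"], the substring rule drops seqdata/x.csv because "data" occurs inside "seqdata", while the component rule keeps it and still drops data/x.csv.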
I think my hunch was right. I moved the data directory to a different place and removed data from the exclude list, and now the seqdata directory is correctly uploaded. But now I get the error that we got last week, indicating that the Python environment/image is not quite right:
Traceback (most recent call last):
  File "/usr/lib/python3.5/tkinter/__init__.py", line 36, in <module>
    import _tkinter
ImportError: No module named '_tkinter'
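The trace above does not show which library triggers the tkinter import. A common cause on headless images is matplotlib defaulting to a Tk-based backend, but that is only a guess here. Under that assumption, a minimal workaround sketch is to detect the missing module at the very top of the script and request matplotlib's non-interactive Agg backend through the MPLBACKEND environment variable before matplotlib is imported.

```python
import os


def tkinter_available():
    """Return True if tkinter (and its C extension _tkinter) can be imported."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


# Assumption: the failing import comes from matplotlib choosing a Tk backend.
# MPLBACKEND must be set before matplotlib is first imported to take effect.
if not tkinter_available():
    os.environ.setdefault("MPLBACKEND", "Agg")
```

If something other than matplotlib needs tkinter, this will not help and the image itself would need the Tk bindings installed.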
Thanks for the very helpful feedback. We will take care of all the issues on Monday. I will let you know when it is done.
We need one more day for development. Your issues should be solved tomorrow. I will let you know when it is done.
Ok, thanks for the heads-up. I will actually be traveling to the west coast on Tuesday and won’t have a chance to look at this much until Wed.