Neptune send question


#21
pchalasani@MM-MAC-3358 ~/D/M/G/predevent> neptune --version
2.0.10 (API: 2.0.7)

#22

Would you run du -hs .[^.]* * and paste the output here (unless it contains anything confidential)?


#23
(py3) MM-MAC-3358:predevent pchalasani$ du -hs .[^.]* *
4.0K	.cache
352K	.cnvrg
4.0K	.cnvrgignore
4.0K	.cnvrgignore.conflict
3.0M	.domino
4.0K	.dominoignore
4.0K	.dominoresults
4.0K	.floydexpt
4.0K	.floydignore
 23M	.git
4.0K	.gitignore
116K	.idea
4.0K	Dockerfile
4.0K	Makefile
4.0K	README.md
4.0K	README.md.conflict
4.0K	_config.yml
 40K	aws
 74M	data
 32K	docs
3.8M	domino.log
  0B	du
4.0K	floyd_requirements.txt
4.0K	floyd_requirements.txt.conflict
4.0K	google_credentials
4.0K	hyper.yaml
108K	models
924K	neptune.log
4.0K	neptune.yaml
1.5M	notebooks
4.0K	requirements.txt
4.0K	requirements.txt.conflict
390M	results
180K	seqdata
4.0K	setup
4.0K	setup.py
 19M	test
295M	utils

#24

It looks like you are doing everything right but the problem is still on our side. My team is investigating this - I will keep you posted.


#25

Hi pchalasani,

I am trying to reproduce your bug with exclude, and for me it works :slight_smile: but we need to find out why you have problems. As I understand it, the neptune CLI wants to send the whole directory (you mentioned that it calculates 800MB, which is more or less the sum of the du output). Is that correct? (You can check the size of the directory with du -s -h .) That would mean that the neptune CLI ignores neptune.yaml.
Do you type only neptune send, or do you use some switches? Can you please paste the whole command here?

Best regards,
mariusz


#26

Ok - I have reproduced it. We will fix it tomorrow. @pchalasani I am sorry for that.


#27

Thanks Piotr, and no worries!

-Prasad


#28

Hi Prasad,

We have just deployed a new version of Neptune which should resolve your issues. We fixed the bug with the list of files to exclude in neptune.yaml. Now your config should work.
You can also pass arguments to your Python script using quotation marks. For example: neptune send "main.py 10 --arg1 34 --arg2 12". The drawback of the current version is that we also pass (at the end) some other arguments which we use for technical purposes (like job-id etc.). We will remove them, but that needs more development, so it will be done later.
The last change was support for a file with Python requirements. Neptune will install the additional libraries before your code starts. Please see how to use it here.
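
Until those extra arguments are removed, a script can be written to tolerate them. Here is a minimal sketch, assuming the script uses argparse: parse_known_args returns unrecognized arguments separately instead of exiting with an error. The argument names mirror the example command above; the trailing --job-id abc pair is only a hypothetical stand-in for whatever Neptune actually appends.

```python
import argparse


def parse_args(argv=None):
    """Parse the script's own arguments and collect any unrecognized extras."""
    parser = argparse.ArgumentParser()
    parser.add_argument("count", type=int)   # the positional "10" in the example
    parser.add_argument("--arg1", type=int)
    parser.add_argument("--arg2", type=int)
    # parse_known_args returns (namespace, leftovers) rather than failing
    # on options the parser does not know about.
    return parser.parse_known_args(argv)


# Hypothetical command line: the script's own arguments followed by an
# extra technical argument appended at the end.
args, extras = parse_args(["10", "--arg1", "34", "--arg2", "12", "--job-id", "abc"])
print(args.count, args.arg1, args.arg2)
print(extras)
```

With plain parse_args the unknown trailing option would make the script exit with a usage error, so this only matters while the extra arguments are still being passed.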

Best regards,
mariusz


#29

Got latest, and ran a script that does not take any cmd line args:

pchalasani@MM-MAC-3358 ~/D/M/G/predevent> neptune send utils/run_experiment.py
Calculated experiment snapshot size: 6.24 MB
Sending sources to server: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6.24M/6.24M [00:02<00:00, 2.10MB/s]
>
> Job send, id: 0df2af2d-7c6b-4522-8c17-58f7091fee39
>
> To browse the job, follow:
> https://banach.neptune.ml/#dashboard/job/0df2af2d-7c6b-4522-8c17-58f7091fee39?getStartedState=folded

The good thing is that it calculates a size of only 6.24 MB, but then it failed in a few seconds, and:

  1. It wasn’t immediately clear where to look for the failure cause, since there is no console log output.
  2. I clicked on Properties and saw this trace:
Run command
neptune run --positional-executable utils/run_experiment.py
Worker Type
gcp-small
Environment
base
Traceback
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/executing/abstract_job_executor.py", line 34, in execute
    return self._execute(experiment, job, args)
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/executing/job_executor.py", line 134, in _execute
    install_pip_requirements(self._pip_requirements)
  File "/usr/local/lib/python2.7/dist-packages/deepsense/neptune/cli/commands/utils/pip_requirements_utils.py", line 57, in install_pip_requirements
    raise RuntimeError("pip was unable to install some of the requirements")
RuntimeError: pip was unable to install some of the requirements

And my yaml looks like this:

pip-requirements-file: requirements.txt

exclude:
- data
- .git
- .eggs
- floyd_requirements.txt
- .domino
- eggs
- lib
- lib64
- parts
- results
- docs
- my_runs
- test/data
- .cache
- .idea
- utils/my_runs
- utils/data
- utils/results
- runs

#30

I realized I did not specify an environment, so now I did this

neptune send --environment pytorch-0.1.12-gpu-py3 utils/run_experiment.py

and it has been queued for 7 minutes, which is not good.


#31

I looked at this job's status just now and it was aborted for some reason, after 36 minutes. It’s only a quick test job that should take 3-4 minutes.

https://banach.neptune.ml/#dashboard/job/92a71e12-e01d-4308-9882-f511fa24c349


#32

Now I also specified the worker type and re-ran it, but it failed with the same error:

neptune send --worker gcp-gpu-medium --environment pytorch-0.1.12-gpu-py3 utils/run_experiment.py
Run command
neptune run --worker gcp-gpu-medium --environment pytorch-0.1.12-gpu-py3 --positional-executable utils/run_experiment.py
Worker Type
gcp-gpu-medium
Environment
pytorch-0.1.12-gpu-py3
Traceback
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/executing/abstract_job_executor.py", line 34, in execute
    return self._execute(experiment, job, args)
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/executing/job_executor.py", line 134, in _execute
    install_pip_requirements(self._pip_requirements)
  File "/usr/local/lib/python3.5/dist-packages/deepsense/neptune/cli/commands/utils/pip_requirements_utils.py", line 57, in install_pip_requirements
    raise RuntimeError("pip was unable to install some of the requirements")
RuntimeError: pip was unable to install some of the requirements

#33

I tried looking for the output/stderr to see which pip install might be failing, but I was not able to find any stderr when I clicked on browse files and navigated from there.
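
One way to narrow this down without the server-side stderr is to install the requirements one at a time locally and see which ones pip rejects. This is only a rough sketch: the helper names are hypothetical, it assumes requirements.txt contains plain one-per-line entries, and a local environment may behave differently from the image Neptune runs in.

```python
import shlex
import subprocess
import sys


def read_requirements(path):
    """Return requirement lines, skipping blank lines and full-line comments."""
    with open(path) as handle:
        lines = [line.strip() for line in handle]
    return [line for line in lines if line and not line.startswith("#")]


def find_failing_requirements(requirements, run=subprocess.call):
    """Install requirements one by one; return those where pip exits non-zero.

    `run` takes an argument list and returns an exit code. It defaults to
    subprocess.call and can be swapped out for a dry run.
    """
    failing = []
    for requirement in requirements:
        # shlex.split keeps entries such as "-e ." working as separate arguments
        command = [sys.executable, "-m", "pip", "install"] + shlex.split(requirement)
        if run(command) != 0:
            failing.append(requirement)
    return failing
```

Calling find_failing_requirements(read_requirements("requirements.txt")) in a Python environment close to the remote one would list the entries pip could not install there.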


#34

I made progress past the previous problem (the pip failure). Now it runs, but it fails to find some of the modules in my code. Digging into it, I just noticed something strange: one of my source directories (seqdata) is not being uploaded to Neptune, so its modules cannot be found.

I think this is a bug on your side.


#35

I still need a way to browse source files via the browser, not download them, when I go to “browse files”


#36

I have a hunch about your bug – I have an item “data” in my exclude list in the yaml file. Your code is probably excluding any directory that contains the string “data”, hence the seqdata directory is being excluded, am I right?
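
To illustrate the suspected failure mode, here is a small sketch. It is not Neptune's actual code (which I have not seen); excluded_by_substring and excluded_by_component are hypothetical helpers that contrast substring matching with matching on whole leading path components.

```python
from pathlib import PurePosixPath


def excluded_by_substring(path, patterns):
    """Suspected buggy rule: exclude any path that merely contains a pattern."""
    return any(pattern in path for pattern in patterns)


def excluded_by_component(path, patterns):
    """Stricter rule: exclude only if a pattern equals the path or one of its
    leading directories, compared component by component."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern_parts = PurePosixPath(pattern).parts
        if parts[:len(pattern_parts)] == pattern_parts:
            return True
    return False


patterns = ["data", "utils/data", ".git"]

for path in ["data/train.csv", "seqdata/loader.py",
             "utils/data/x.npy", "utils/run_experiment.py"]:
    print(path,
          excluded_by_substring(path, patterns),
          excluded_by_component(path, patterns))
```

Under the substring rule, seqdata/loader.py is dropped because "seqdata" contains "data"; under the component rule it is kept, while data/ and utils/data/ are still excluded.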


#37

I think my hunch was right. I moved the data directory to a different place and removed data from the exclude list, and now the seqdata directory is correctly uploaded. But now I get the same error that we got last week, indicating that the Python environment/image is not quite right:

https://banach.neptune.ml/#dashboard/job/8d1865e5-6a99-4618-93a0-7eef2a7bafd5

Traceback (most recent call last):
  File "/usr/lib/python3.5/tkinter/__init__.py", line 36, in <module>
    import _tkinter
ImportError: No module named '_tkinter'

#38

Hi Prasad,

Thanks for the very helpful feedback. We will take care of all the issues on Monday. I will let you know when it is done.


#39

Hi Prasad,

We need one more day for development. Your issues should be solved tomorrow. I will let you know when it is done.


#40

Ok, thanks for the heads-up. I will actually be traveling to the west coast on Tuesday and won’t have a chance to look at this much until Wed.