I have been making active use of Neptune for my Kaggle competitions. Would it be possible to support this dataset in the public repository as well? It is also a big dataset (>10 GB).
+1 from me. Please add this dataset!
It seems that uploading this dataset to Neptune public may violate the rules of this particular competition, so we chose not to upload it.
We will leave it to the competitors to upload the data this time.
We are, however, taking part in it, and as always we document our approach in a GitHub repository. Feel free to use it as a starting point.
Avito dataset (for newbies + Windows OS)
Just in case: https://docs.neptune.ml/cli/commands/data_upload/
@kamil.kaczmarek, thank you for the link. Apologies, these are newbie-type questions:
The process in the doc link would allow me to upload files from my local machine. Since these files are big (I guess ~50 GB in total?), is there a walkthrough of commands within a Neptune notebook that would eliminate the middle step of Kaggle --> local machine --> Neptune? The upload from local machine --> Neptune, I imagine, will be quite painful.
The uploaded files would be in the uploads directory, which is project-specific, right? Then any single experiment can access those files with the path ../uploads, visible as an input.
Your git command in the Installation section is wrong. It should be git clone https://github.com/minerva-ml/open-solution-avito-demand-prediction.git and not git clone https://github.com/neptune-ml/open-solution-avito-demand-prediction.git. Perhaps the neptune-ml repo is your private repo.
In the https://github.com/minerva-ml/open-solution-avito-demand-prediction documentation, in the Installation section, it should be:
git clone https://github.com/minerva-ml/open-solution-avito-demand-prediction.git
Fixed, thanks for spotting that.
I would like to pull the Kaggle data for Avito into Neptune. I don’t want to do it from my local machine because then I have to download from Kaggle and then upload to Neptune. I’d like to go directly from Kaggle to Neptune.
First, I run neptune data upload --project AV ~/.kaggle/kaggle.json from my local command line.
Then I do
!mkdir /root/.kaggle
!cp /input/uploads/kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json
This works (I can run !kaggle competitions list successfully).
Now I would like to do !kaggle competitions download -c ... -p /input/uploads, as this will make the competition files available to all future experiments in my AV project.
However, when I test a write to this directory I get OSError: [Errno 30] Read-only file system: '/input/uploads/test.csv'.
Can you advise on this?
Just use the neptune data upload command within the notebook. It has worked for me:
- Upload to current directory:
!neptune data upload yellow_tripdata_2016-01.csv
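On the earlier read-only /input question: since /input is mounted read-only inside experiments (hence the OSError above), a notebook cell can check writability before pointing the Kaggle download at a directory, and fall back to scratch space otherwise. A minimal stdlib sketch, where the function name and fallback path are my own placeholders:

```python
import os
import tempfile


def pick_download_dir(preferred="/input/uploads", fallback="/tmp/avito"):
    """Return `preferred` if files can be created in it, else create and return `fallback`."""
    try:
        # Creating (and auto-deleting) a throwaway file is a reliable
        # writability test; os.access() can be misleading on some mounts.
        with tempfile.NamedTemporaryFile(dir=preferred):
            return preferred
    except OSError:
        os.makedirs(fallback, exist_ok=True)
        return fallback
```

After downloading into the returned directory, a !neptune data upload of the downloaded files pushes them into the project's uploads area, so later experiments can read them under /input/uploads as discussed above.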