Dataset for Kaggle's Avito Competition


#1

Hi,
I have been making active use of Neptune for my Kaggle competitions. Would it be possible to support this dataset in the public repository as well? It is also a big dataset, >10 GB.


#2

+1 from me. Please add this dataset! :pray:


#3

Hi,
It seems that uploading this dataset to the Neptune public repository may violate the rules of this competition, and because of that we chose not to upload it.

We will leave it to the competitors to upload the data this time.
We are, however, taking part in it, and as always we document our approach in a GitHub repository. Feel free to use it as a starting point.
Good luck!


#4

Hi @setuc and @Jonathan,

Just in case: https://docs.neptune.ml/cli/commands/data_upload/
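For example, a minimal upload from a local machine could look like this (the project name and file path below are just placeholders):

# push a local file into the project's shared uploads storage
neptune data upload --project PROJECT_NAME path/to/local_file.csv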

Best,
Kamil


#5

@kamil.kaczmarek, thank you for the link. Apologies that these are newbie-type questions:

  1. The process in the doc link would allow me to upload files from my local machine. Since these files are big (I guess ~50GB in total?), is there a walkthrough of commands within a Neptune notebook that would eliminate the middle step of Kaggle --> local machine --> Neptune? I imagine the upload from local machine --> Neptune will be quite painful.

  2. The uploaded files would end up in the uploads directory, which is project-specific, right? And then any experiment can access those files via the path '../uploads/', right?


#6

Your git command in the Installation section is wrong. It should be git clone https://github.com/minerva-ml/open-solution-avito-demand-prediction and not git clone https://github.com/neptune-ml/open-solution-avito-demand-prediction.git. Perhaps the neptune-ml repo is your private repo.


#7

@kamil.kaczmarek @jakub_czakon
I confirm.
In the https://github.com/minerva-ml/open-solution-avito-demand-prediction documentation, in the Installation section, it should be

git clone https://github.com/minerva-ml/open-solution-avito-demand-prediction.git

minerva-ml, not neptune-ml.


#8

Fixed, thanks for spotting that.


#9

Hi @jakub_czakon, I have been able to run the Kaggle CLI from within a Neptune notebook, thanks to @anitaka’s help in another post.

I would like to pull the Kaggle data for Avito into Neptune. I don’t want to do it from my local machine because then I have to download from Kaggle and then upload to Neptune. I’d like to go directly from Kaggle to Neptune.

I’ve made the ../uploads path visible as an input.
I do neptune data upload --project AV ~/.kaggle/kaggle.json from my local command line.
Then I do

!mkdir /root/.kaggle
!cp /input/uploads/kaggle.json /root/.kaggle/
!chmod 600 /root/.kaggle/kaggle.json

This works (I can test !kaggle competitions list successfully).

Now I would like to do !kaggle competitions download -c ... -p /input/uploads, as this will make the competition files available to all future experiments in my AV project.

However, when I test a write to this directory I get OSError: [Errno 30] Read-only file system: '/input/uploads/test.csv'.

Can you advise on this?


#10

Just use the neptune command within the notebook. It has worked for me (a sketch adapted to the Kaggle case follows after the list):

  1. Download the file to the current directory:
    !wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-01.csv
  2. Use the neptune data upload command:
    !neptune data upload yellow_tripdata_2016-01.csv
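Adapting the same pattern to the Avito data from inside a Neptune notebook could look roughly like this. This is only a sketch: the avito-demand-prediction competition slug and the archive names are assumptions on my part, and it presumes kaggle.json has already been set up as described in #9.

# download the competition files with the Kaggle CLI into the working directory
!kaggle competitions download -c avito-demand-prediction -p .
# push each downloaded archive to the project's Neptune storage
# (the file names below are illustrative)
!neptune data upload train.csv.zip
!neptune data upload test.csv.zip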

#11

:exploding_head:

Thank you!