Kaggle, TalkingData AdTracking Fraud Detection Challenge


#1

Good day. Would it be possible to add the new data to the public directory? The data set is huge and very important for people trying to participate in the competition.


#2

Hi @Alexander,

thanks for your suggestion! We will talk about it on Monday and let you know.


#3

Thank you ) The data is ~10GB.


#4

Hi @Alexander,

Sure, we will upload that. I expect to have it done by tomorrow.
I was planning to do it in a week or so, along with the Neptune TalkingData starter repo, but if there is a need we will happily comply :)

Cheers.


#5

Thank you ) Great news ) Good luck in the competition ))
Jakub, sorry, what do you think about the series of Google landmark detection challenges? What is the approximate size of the data?


#6

I will research that and get back to you.
Thanks for the idea!


#7

Thank you ) The data size is a serious barrier for participants.


#8

@Alexander you can now access the data in /public/talking_data.
Good luck!


#9

Thank you )) Jakub, sorry, one more question: yesterday you shared your beautiful script for the Jigsaw challenge (.9864). How long does this script take to run on the Neptune infrastructure? Yesterday I tried several hardware configurations, but the total time was about a day.


#10

Please tell me, can we use GPU support to accelerate the training of trees? How should we change the code?


#11

Hi there. Yes, running the full grid search will take about that long.
You can narrow the search by exploring first and then guiding it toward the promising regions.
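
A minimal sketch of what "exploring first" could look like, assuming a single numeric hyperparameter and a hypothetical `objective` function that returns a validation loss (neither name comes from the original script). It evaluates a coarse grid, then re-grids around the best point:

```python
def linspace(low, high, n):
    """Return n evenly spaced values from low to high (inclusive)."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


def coarse_to_fine_search(objective, low, high, points=5, rounds=3):
    """Minimise `objective` on [low, high] with successively finer grids.

    Each round evaluates `points` values, keeps the best one, and shrinks
    the interval to one grid step on either side of it.
    """
    best_x = None
    for _ in range(rounds):
        grid = linspace(low, high, points)
        best_x = min(grid, key=objective)
        step = (high - low) / (points - 1)
        # Stay inside the current interval while zooming in.
        low, high = max(low, best_x - step), min(high, best_x + step)
    return best_x, objective(best_x)


if __name__ == "__main__":
    # Toy stand-in for a validation loss; a real run would train a model here.
    def toy_loss(x):
        return (x - 0.3) ** 2

    print(coarse_to_fine_search(toy_loss, 0.0, 1.0))
```

With the defaults this costs 15 evaluations for one parameter; the saving over a single fine grid grows quickly once several parameters are searched together.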

As for GPU support, XGBoost does support it:
http://dmlc.ml/2016/12/14/GPU-accelerated-xgboost.html
and an example here:


I haven’t tested it yet but will give it a go. If you try it, please let me know how it went.
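
For reference, a rough sketch of how the change might look, assuming a CUDA-capable GPU and an XGBoost build with GPU support. The helper names `xgb_gpu_params` and `train_on_gpu` are made up for illustration, and the objective and metric are placeholder choices, not taken from the original script. Older XGBoost releases select the GPU with `tree_method="gpu_hist"`, while 2.0 and later use `tree_method="hist"` together with `device="cuda"`:

```python
def xgb_gpu_params(xgboost_version, base_params=None):
    """Return a copy of base_params with GPU training switched on.

    xgboost_version is a version string such as "0.72" or "2.0.3".
    """
    params = dict(base_params or {})
    major = int(xgboost_version.split(".")[0])
    if major >= 2:
        # XGBoost 2.0+: the histogram method runs on the selected device.
        params.update({"tree_method": "hist", "device": "cuda"})
    else:
        # Earlier releases: a dedicated GPU histogram tree method.
        params["tree_method"] = "gpu_hist"
    return params


def train_on_gpu(X, y, num_boost_round=100):
    """Train a binary classifier on the GPU (requires xgboost and a GPU)."""
    import xgboost as xgb  # imported lazily so the helper above works without it

    params = xgb_gpu_params(
        xgb.__version__,
        {"objective": "binary:logistic", "eval_metric": "auc"},
    )
    dtrain = xgb.DMatrix(X, label=y)
    return xgb.train(params, dtrain, num_boost_round=num_boost_round)
```

Everything else in the training code can stay as it is; only the parameters change.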


#12

Thank you. I will try. If possible, could you create GPU examples for LightGBM, XGBoost and CatBoost in the future? Just simple models without sophisticated feature engineering or stacking: low performance, but high speed.


#13

Good day. Jakub, please help us. Can you upload the new data for the competition?


#14

@Alexander the new data has already been uploaded.
Good luck!


#15

Thank you )) Good luck


#16

Per this question, I would like to submit directly from the Neptune notebook. Downloading the submission file from Neptune and then uploading it to Kaggle is not efficient. Has anyone been able to do this?
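
One approach that might work, assuming the official `kaggle` command-line client can be installed in the notebook environment and an API token is configured there (I have not confirmed either on Neptune). The function names below are made up for illustration; the `kaggle competitions submit` command and its `-c`, `-f` and `-m` options come from the Kaggle client:

```python
import subprocess


def build_submit_command(competition, submission_path, message):
    """Build the argument list for a `kaggle competitions submit` call."""
    return [
        "kaggle", "competitions", "submit",
        "-c", competition,       # competition slug, as it appears in the competition URL
        "-f", submission_path,   # path to the submission CSV
        "-m", message,           # free-text description of the submission
    ]


def submit(competition, submission_path, message):
    """Run the submission; raises CalledProcessError if the client fails."""
    subprocess.run(
        build_submit_command(competition, submission_path, message),
        check=True,
    )
```

Calling `submit(slug, "submission.csv", "first try")` from a notebook cell would then replace the download-and-upload step, with `slug` set to the competition's URL slug.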