Activity
Ratings Progression
Challenge Categories
Challenges Entered
Play in a realistic insurance market, compete for profit!
Latest submissions
See Allgraded | 125317 | ||
graded | 123877 | ||
graded | 122529 |
Participant | Rating |
---|---|
lolatu2 | 0 |
simon_coulombe | 0 |
Participant | Rating |
---|
Insurance pricing game
Python Solution sharing
Almost 4 years agoThere are the workbooks and videos shared by the organisers,
The github repository of Simon Coulombe
The Kaggle examples from Floser
https://www.kaggle.com/floser/glm-neural-nets-and-xgboost-for-insurance-pricing
Failing all of that then do as I first did. Calculate a premium = 114 and submit that and build from that position.
There should be enough resources already shared for you to get going.
Weekly Leaderboard
Almost 4 years agoYes @michael_bordeleau I kept with the same claims model (which Iβm now convinced underfits to the data) and reduced my profit load by approximately 5%. My market share went from 0.7% to 14.8% a level Iβd be happy with but because my model doesnβt differentiate risk as well as others then I picked up too many claims, especially in the 5k to 10k bucket.
Overall I now think Iβve got a good handle on the right profit margin distribution. Now just have to tweak the underlying claims model.
Interesting to hear your comment on staying with your current model. I certainly wonβt be going with my current one. The final week is very different to all that have gone before and that gives rise to the ability to use a new feature that is likely to be very predictive but which up until now no-one has been able to exploitβ¦
Weekly Leaderboard
Almost 4 years agoSame for me, made a submission got weekly feedback but the submission is not appearing on the leaderboard.
Was week 9 a lucky no large claims week?
Almost 4 years agoIf I assume thatβs your feedback from week 9 submission then Iβd conclude that itβs not ideal.
In many ways itβs similar to my week 8 submission when I had a similar looking chart, a similar high market share and a low leaderboard position. The difference is my week 8 RMSE model was 500.1 ish and your week 9 is 499.46 ish.
Now youβd hope that a model with a much lower RMSE should be better at differentiating risk. If that was the case though youβd expect to write a smaller market share of the the policies with higher actual claims. Your chart though doesnβt show such behaviour. That may be because your good performance on the public RMSE leaderboard is not generalising to the unseen policies in the profit leaderboard. This could be a result of placing too much emphasis on RMSE leaderboard feedback and not enough on good local cross validation results. It can lead to unwittingly overfitting to the RMSE leaderboard at the expense of a good fit to unseen data.
Now I think I have a similar issue, ie a poor fitting model as my week 8 and week 9 charts are less that idealβ¦ (Iβd rather have a chart like @davidlkl but with greater market share). But the cause of my issue is that in seeking not to overfit too much I havenβt fit well enough relative to others.
Another point to investigate is your market share. A 30% market share, suggests your profit margin is set lower than competitors. In a pool of 10 people a 10% market share would be a a reasonable target to go for.
The only difference between my week 8 and week 9 submission was that I increased my rates by a fixed single digit percentage. Iβve learnt that I perhaps increased rates too much as my market share has fallen further than Iβd like.
So for my final submission Iβll be refining my model and making a minor tweak to my profit margin. (And no doubt subsequently regretting doing so as I see my profit leaderboard position plummet!)
Was week 9 a lucky no large claims week?
Almost 4 years agoOK so thatβs helpful to know nan% is a possible output. So as in week 9 I got 100% in the βnever x 10k+β claims heatmap cell it means there were some policies in the market that had claims in the 10k+ range.
I havenβt really changed my claim model for the last 5 weeks or so. All Iβve been doing is tweaking pricing strategy, gradually edging rates up. When there have been large 50k capped claims in the market historically Iβve been rather good at writing them and therefore ended up with a low leaderboard position. When, I suspect, there havenβt been many large losses (week 5 and 9 ?) I end up in a reasonable leaderboard position (1st and 3rd).
Iβd be really interested to know if anyone thinks there were any 50k capped losses in this round. Sometimes you can tell because your profit exhibit is centered around -50k or -100k and the claims heatmap exhibit will say 100% in the βoften x 10k+β cell.
Like my week 8 chartβ¦
If there were 50K claims and I avoided them then maybe Iβve found the right level of premium loading to claims. More likely though that my margin is too high now, given my small market share, and Iβve got lucky on the policies I did write.
Was week 9 a lucky no large claims week?
Almost 4 years agoSharing some of my leaderboard feedback hoping to get some insight on large claims in week 9β¦
So I didnβt change my underlying claim model between week 8 and week 9, I just increased my rates by a uniform single digit number and I go from 90 to 3 on the profit leaderboard.
If I look at the overall leaderboard I see that many more of us actually made a profit this time. Makes me wonder whether this was another lucky no large claims week? However because I didnβt write any this week I canβt tell if I avoided them by my skill at rating or by the luck of there being none to avoid.
Week 8
Week 9
R Starterpack (499.65 on RMSE leaderboard - 20th position) {recipes} + {tweedie xgboost}
Almost 4 years agoA few years ago a member of my team, who is an actuary and Kaggle master, used to delight in tormenting the rest of us when his models out-performed ours on leaderboards.
He would always insist his out-performance was due to his process of selecting lucky seeds.
We all knew full well thereβs no such thing, that generalises to the private leaderboard, but many a time I caught myself trying a few different seeds to see if I can get lucky and beat his model.
There is of course nothing wrong in running a few models with different seeds and taking the average result. Thatβs a recognised technique called bagging which will often improve a model at the cost of implementation complexity.
Values in new heatmap changed between sunday and today?
Almost 4 years agoGood spotβ¦ but I have to say I donβt like what the new one is telling me!
A potential silver lining is that⦠maybe in my model I have found a way of identifying all the policies with large claims⦠now all I need to do is stop writing them!
Weekly Feedback // Evaluation clarification?
Almost 4 years agoGood luck Michael with the strategy for gaining market share to improve profit.
I had the winning model for week 5. I kept it unchanged for week 6 and we can all see what happened I ended up in 63rd position. Given my current strategy isnβt giving consistent results Iβm going to change it so feel itβs OK to share my feedback from weeks 5 and 6.
In week 5 I wrote a market share of 13%, avoided any large claims (maybe there werenβt any) and managed the following.
Looking good I thought, so letβs leave it unchanged for week 6β¦ Ouchβ¦
So there are few things of interest.
First, I gained market share. That shouldnβt surprise usβ¦ as a market on average weβre doing a good job of losing money, so I could believe that market rates are rising as the weeks go by.
Second, gaining market share means an increased likelihood of picking up large claims. It looks like there were 3 large claims (given claims are now capped to 50k) in the week 6 market and lucky me managed to pick them up more often than not.
If you believe large claims are predictable and youβve set your rates accordingly then they shouldnβt worry you.
But if you think they are largely random then we are all playing a game of chance. In that case I suspect the lucky winner will be someone who sets their rates at an uncompetitive level, writes a small but profitable market share and avoids the large claims to take home the prize.
Time will soon tell, but in the meantime Iβm going to go back to the data once more to see what I can make of the large claims. An then I had a cunning plan to prevent me from repeatedly picking up the large claims in the leaderboard calc which @alfarzan has just scuppered by declaring that the leaderboard is now calculated deterministically rather than by simulation as was the case!
Ideas on increasing engagement rate
Almost 4 years agoI think my regulator would object to me sharing price sensitive informationβ¦ Iβll let you know next year when the reserving actuaries publish the results
Ideas on increasing engagement rate
Almost 4 years agoI think weekly profit leaderboard revision is enough. Thereβs a limit in what can be done with the data available and not everyone will have the time to revise their models even weekly.
One tip Iβd share is to encourage people to avoid getting trapped in whatβs known in Kaggle as a public leaderboard hill-climb. It can result in overfit models which fail to generalise to the unseen data used to determine final rankings.
To illustrate, my week 5 winning profit leaderboard submission is built from a model that scores 500.3 on the RMSE leaderboard ie about position 110.!
There are many sound reasons why we should be ignoring the RMSE leaderboard and putting our trust in a robust local cross-validation approach. Yet I know from bitter Kaggle experience that itβs all too easy to forget this and get carried away by the thrill of chasing leaderboard positions.
Ideas on increasing engagement rate
Almost 4 years agoIn preface to my feedback and suggestions below I must first say
- I believe the principle of the competition is great and engaging
- youβve been really proactive in engaging with competitors and fixing issues as they arise
However I imagine that many of those that have dropped away have done so because their first encounter was difficult and ended in a failed first, second or third submission. At which point they may have quiet reasonably decided to give up.
Iβm an experienced Kaggler and the leader of a Data Science team that developed and runs itβs own in-house data science competition platform; so I do appreciate how difficult it is to get this right. The profit leaderboard aspect of this competition, which adds to the fun and engagement, necessitates some complexity in the submission process and behind the scenes will introduce complexity in IT infrastrutre and scoring process.
Unfortunately for competitors they experience that complexity before they get the reward of leaderboard feedback. You canβt remove all of that but you can smooth the way a bit.
Hears my 5 suggestionsβ¦
- continue to make your platform more stable and responsive
- continue to make the website cleaner and easier to navigate
- continue to improve the submission process
- if large claims are causing a leaderboard lottery then implement a fix quickly (so people at the top of the leaderboard donβt give up because they know skill may not help them win)
- introduce an element of knowledge share (so people further down the leaderboard arenβt demotivated and drop out)
There will be a limit to what you can do in the time available and I recognise youβve been working on a number of these aspects already.
The competition and platform now is much better than it was when I was the first person to struggle to make a submission, for which you should be congratulated!
So my final recommendation would be to reach out to people that may have joined early and been put off to let them know you listened to user feedback and made improvementsβ¦ maybe theyβd be encouraged to try again?
Review of week 4
Almost 4 years agoConfirm I too saw a small group of exposure with a 150k+ average loss that went with it. I agree that a single large claim in the market would explain things⦠including the significant shift in positions some high ranking week 3 people saw.
Shame no-one is able to offer any Excess-of-Loss cover. A single large claim like this in the final round could make the final outcome a bit of a lottery!
Inference failed errors
Almost 4 years agoTurned out I then hit another error which may worth @alfarzan investigating and other R zip file submission people that have recently upgraded to xgboost 1.3 and are getting submission errors also trying.
I upgraded my local version of xgboost to 1.3 which for R users was released on CRAN in the last week. Anyhow after the upgrade Iβve not been successful in making submissions from locally trained zip file, even though they pass all the local tests as per the zip instructions (and Iβve been making successful zip based submissions in previous weeks with xgboost 1.2.)
Anyhow after I downgraded my local version of xgboost back to 1.2, retrained, re-ran tests which again passed and then reloaded my submission then worked.
Could it be that AICrowd is loading xgboost 1.2 for R users and the models trained with the new 1.3 version have incompatibilities causing submission errors? In which case guess I need to specify my version of xgboost in install packages.
Any tips for successfully submitting an h2o model?
Almost 4 years agoYes, itβs hardcoded in the R zip file routine load model which is part of model.R unless youβve already overidden?
load_model <- function(){
# Load a saved trained model from the file `trained_model.RData`.
# This is called by the server to evaluate your submission on hidden data.
# Only modify this *if* you modified save_model.
load('trained_model.RData')
return(model)
}
Inference failed errors
Almost 4 years agoNow submissions are working (thanks admins!!) I see Iβm not alone in getting inference failed error messages.
In my case itβs because Iβve got inconsistencies between the feature names of my locally trained model and the features names Iβve put in the prepare_data routine of the zip file submission.
Sharing in case it helps otherβs get their submission fixed before the extended deadline.
Servers look like they are struggling to process submissions
Almost 4 years agoLooks like the servers are struggling to keep up with submissions?
I see a large number of submissions waiting for scores⦠seems to have been like this since 17:45.
I really should learn not to leave my modelling to a Saturday afternoon. Lookβs like this will be the third week in a row that I fail to successfully get my Saturday profit leaderboard submission made before the cutoff!
RMSE evaluation error - "pod deleted during operation"
Almost 4 years agoI had a pod timed out error last week. Itβs an error message that reminds me of scaling issues with kubernetes clusters, ie issues with the tech behind the scenes of the competition site.
For me the issues appeared to resolve themselves β¦ or more likely went away when the admins did something behind the scenes.
Seeking clarification on model selection
About 4 years agoThanks @alfarzan thatβs enough for me to work on.
To summarise; the competitive profit leaderboard will use the predict_premium function from the last valid upload prior to the weekly Saturday 10pm cutoff.
So I must remember to keep a submission free and make sure I upload in plenty of time for the 10pm cutoff.
Ye olde library
Almost 4 years agoNot a paper as such but a pricing competition Iβll always remember was the Porto Seguro competition on Kaggle. It was special in that it was the first time I recall seeing xgboost not be the major part of the winning solution. Instead the winner Michael Jahrer used denoising auto encoders (a neural network approach).
Michael won the competition by a wide margin and how he won became a big discussion topic among the Kaggle community, many people trying to replicate his approach.
While weβve not seen his winning approach port to other competitions with similar success it does, for me, show that when data allows neural network based approaches will outperform the current ubiquitous GBM type approaches.