I've been looking into Betfair's historic data files, trying to work out the prices on a football fixture's "match odds" market just before it goes in play.
I thought I'd cracked it by using the record where IN_PLAY column = 'PE' and LATEST_TAKEN
I've been trying the last record with a timestamp just prior to the actual off time and an IN_PLAY value of 'PE' (pre-event), but this sometimes returns a record from a few hours before the off.
Looking at the data, it appears that 'PE' indicates the bet was placed before the event went off; it does not distinguish whether it was matched before or after the off.
Does anyone know how to extract the market prices just before the market goes in play?
Thanks in advance
The problem is that Betfair classes keep bets as PE, so they can be taken during the in-play period. I guess the best way is to determine the off time from the earliest IP time, then use the last PE bet prior to that.
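A rough Python sketch of that idea, for what it's worth. It assumes each row has already been parsed into a dict with `datetime` values, and that the columns are called IN_PLAY, FIRST_TAKEN, LATEST_TAKEN, SELECTION and ODDS (SELECTION and ODDS are assumed names; the sample rows are made up):

```python
from datetime import datetime

def last_pe_before_first_ip(rows):
    """Per selection, return the PE row last matched before the first IP match."""
    ip_times = [r["FIRST_TAKEN"] for r in rows if r["IN_PLAY"] == "IP"]
    if not ip_times:
        return {}  # no in-play rows, so no off time can be inferred
    off_time = min(ip_times)  # estimated off time: earliest in-play match
    best = {}
    for r in rows:
        # Skip IP rows, and PE rows whose price was still being matched after the off.
        if r["IN_PLAY"] != "PE" or r["LATEST_TAKEN"] >= off_time:
            continue
        current = best.get(r["SELECTION"])
        if current is None or r["LATEST_TAKEN"] > current["LATEST_TAKEN"]:
            best[r["SELECTION"]] = r
    return best

# Made-up rows for illustration only.
rows = [
    {"SELECTION": "Home", "ODDS": 1.50, "IN_PLAY": "PE",
     "FIRST_TAKEN": datetime(2010, 1, 1, 9, 0), "LATEST_TAKEN": datetime(2010, 1, 1, 14, 58)},
    {"SELECTION": "Home", "ODDS": 1.52, "IN_PLAY": "PE",
     "FIRST_TAKEN": datetime(2010, 1, 1, 14, 0), "LATEST_TAKEN": datetime(2010, 1, 1, 15, 20)},
    {"SELECTION": "Home", "ODDS": 1.55, "IN_PLAY": "IP",
     "FIRST_TAKEN": datetime(2010, 1, 1, 15, 1), "LATEST_TAKEN": datetime(2010, 1, 1, 15, 2)},
]
print(last_pe_before_first_ip(rows)["Home"]["ODDS"])
```

The second PE row is dropped because its LATEST_TAKEN falls after the estimated off, which is exactly the keep-bet case described above.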
Noooooo..... not quite. Getting the first IP time and then using the first PE prior to that occasionally returns a record from a few hours before the off, much the same as using the DT_ACTUAL_OFF field.
Using the first IP record itself produces under-round markets.
Bizarre, really. The data files contain about 10 MB (yes, that's 10,000,000, 10......MIIIIIIILLLLLLLLION..... BYTES) for each match, yet they couldn't spare 100 or so for such non-entities as the SP.
Hi Escapee,
The raw data doesn't look sorted to me (PE rows are intermingled with IP rows). I think you'll need to sort the data by event and then by market. How are you attempting to extract the data? Perl would be good for this, as you could read in only the events you're interested in, sort them in a hash (or a hash of arrays), and then extract the data in the format you require.
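The same "hash of arrays" idea can be sketched in Python rather than Perl. This is only an illustration: the key names FULL_DESCRIPTION and EVENT are assumptions about the column headers, and LATEST_TAKEN is assumed to be parsed (or at least in a format that sorts chronologically):

```python
from collections import defaultdict

def group_rows(rows, wanted, event_key="FULL_DESCRIPTION", market_key="EVENT"):
    """Keep rows whose event description contains `wanted`, grouped event -> market."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        if wanted not in row[event_key]:
            continue  # read in only the events of interest
        grouped[row[event_key]][row[market_key]].append(row)
    # PE and IP rows arrive intermingled, so order each market chronologically.
    for markets in grouped.values():
        for records in markets.values():
            records.sort(key=lambda r: r["LATEST_TAKEN"])
    return grouped
```

Each `grouped[event][market]` list can then be scanned in time order for the last pre-off record.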
No, that wouldn't work, because there could be a PE record with first and last taken both before the actual off, while a later PE record has a first taken before the aforementioned record's but a last taken well into play. The former record may then be the best one to use. However, if the latter record has the latest first taken among the PE records, you can maybe use it, provided the overround is not out by too far. I have my own techniques for this, which provide an excellent approximation in these cases.
The data, by the way, needs a mega amount of time spent on it, for numerous reasons. It has to be meticulously cleansed: the markets are so tight that if you leave it as it is, it can do more harm than good.
The start of this thread has caused me to look at this raw data for the first time. Eric is seemingly right about the data cleansing and also the various conditionals to pull out the correct information, but this looks well worth the effort. I think I'll have a go myself and start pulling out markets of interest for myself.
Thanks guys.
Looking at the data format some more, I think it was the introduction of 'keep bets' which caused the data to go a bit doo-lally.
If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.
Aren't they already doing that, Escapee?
• LATEST_TAKEN (when these odds were last matched on the selection)
• FIRST_TAKEN (when these odds were first matched on the selection)
Doesn't look like it to me, Feck, because if that were the case there would be zero 'PE' records with a LATEST_TAKEN after the kick-off time.
I.e. the IN_PLAY field states that the bet was placed 'PE' (pre-event, prior to kick-off), while the LATEST_TAKEN field shows the bet was matched in play.
So I'd speculate that the bet-placed date is used to derive the IN_PLAY field and the bet-matched date is used to derive the LATEST_TAKEN field.
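One way to see how common this is in a given market is to split the PE rows by whether their LATEST_TAKEN falls after the off. A small Python sketch, assuming rows are dicts with `datetime` values and that `off_time` has been worked out separately (for instance from the first IP row):

```python
def split_pe_rows(rows, off_time):
    """Separate PE rows into those fully matched pre-off and those last matched after it."""
    clean, straddling = [], []
    for r in rows:
        if r["IN_PLAY"] != "PE":
            continue
        if r["LATEST_TAKEN"] > off_time:
            straddling.append(r)  # placed pre-event, but still being matched in play
        else:
            clean.append(r)
    return clean, straddling
```

If the speculation above is right, `straddling` should be non-empty mainly in markets where keep bets were in use.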
If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.
This should be:
If they changed from using the bet placed date to the bet matched date as the source of the IN_PLAY column then it would all sort itself out again.
I see what you mean now, Escapee. All that's needed to resolve this is a list of event IDs and the time each was turned in play. Maybe not the sort of information they'd like to give out, given their record.
That information would render a lot of the data useless though (where the first taken was before the off and the last taken after the off). Masters of the universe my @rse.
I think the way the data's currently laid out means you'll never get an exact idea of the market at suspension, simply because a pre-event bet at the correct starting price could also be last taken well into the game and therefore wouldn't show at the off time.
Looking at the data again, the only way I can see of getting some approximate odds would be to use the average FIRST_TAKEN IP odds at the DT_ACTUAL_OFF time. That should hopefully be close to the actual start-time odds, as there would be keep bets in the market holding the prices, and using the average should ensure both sides of the spread are included, so it doesn't matter if, say, just the lay side was taken first.
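A rough Python sketch of that averaging, under a few assumptions: rows are dicts with `datetime` values, the odds column is called ODDS, and "at the off time" is taken to mean "first matched within a short window after it". The 60-second window is an arbitrary choice for illustration, not something taken from the data format:

```python
from datetime import datetime, timedelta
from statistics import mean

def average_opening_ip_odds(rows, off_time, window=timedelta(seconds=60)):
    """Per selection, average the odds of IP rows first matched just after the off."""
    by_selection = {}
    for r in rows:
        if r["IN_PLAY"] == "IP" and off_time <= r["FIRST_TAKEN"] <= off_time + window:
            by_selection.setdefault(r["SELECTION"], []).append(r["ODDS"])
    return {selection: mean(odds) for selection, odds in by_selection.items()}

# Made-up rows for illustration only.
off_time = datetime(2010, 1, 1, 15, 0)
rows = [
    {"SELECTION": "Home", "ODDS": 2.0, "IN_PLAY": "IP", "FIRST_TAKEN": off_time + timedelta(seconds=10)},
    {"SELECTION": "Home", "ODDS": 2.2, "IN_PLAY": "IP", "FIRST_TAKEN": off_time + timedelta(seconds=30)},
    {"SELECTION": "Home", "ODDS": 5.0, "IN_PLAY": "IP", "FIRST_TAKEN": off_time + timedelta(minutes=10)},
]
print(average_opening_ip_odds(rows, off_time))
```

A plain mean treats every price equally; weighting by volume matched would be another option if that column is available.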
I thought the problem with pre-off bets being listed after the event start was a football specific thing, and happened when Betfair would take the market out of 'in play' during half time? That's just a guess though as I haven't really looked into it. The horse racing data doesn't seem to have the same problem.
As has been said, you need to remove PE bets taken after ACTUAL_OFF, or after the first inplay bet has been struck. To get the SP, I wouldn't take the first inplay bets (they might vary wildly) but take the last pre-start bets.
Just thought I'd ttt this to see if anyone else has any other methods.
I've been experimenting with various criteria and approaches. Using the PE record with the largest NUMBER_BETS seems to be the best so far for creating a valid market.
I need to try it on more data files to confirm.
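In case it helps anyone trying the same thing, here is a minimal Python sketch of the largest-NUMBER_BETS selection. It assumes rows are dicts, that NUMBER_BETS has been converted to an integer, and that the selection column is called SELECTION (an assumed name):

```python
def most_bet_pe_row(rows):
    """Per selection, return the PE row with the largest NUMBER_BETS."""
    best = {}
    for r in rows:
        if r["IN_PLAY"] != "PE":
            continue
        current = best.get(r["SELECTION"])
        if current is None or r["NUMBER_BETS"] > current["NUMBER_BETS"]:
            best[r["SELECTION"]] = r
    return best
```

Ties are resolved in favour of whichever row comes first in the input.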
As has been said, you need to remove PE bets taken after ACTUAL_OFF, or after the first inplay bet has been struck. To get the SP, I wouldn't take the first inplay bets (they might vary wildly) but take the last pre-start bets.
The trouble with that, Mr Magoo, is that they only time PE bets by first and last taken. So say the off price was 6's: 6's may have been taken a day before on that selection and also midway through the match, so the record wouldn't appear anywhere near the actual off time.
I don't see your problem. It is easy to work out the start time of the event. So, you then disregard all pre-event bets in the data that are matched after this time. Find the bet(s) that were matched closest to the off time. You could either use FIRST_TAKEN or LATEST_TAKEN to pick them, you'll still end up with something close to the right time.
This should get a 'good enough' measure of what was the SP at the time. To double-check, just ensure that the SP overround is close to 100%.
Aiming to be totally exact on the last traded odds seems a waste of effort. Just before the off, punters will have a choice of two odds to bet on for each runner/team (i.e. the back or the lay price) and in a stable, liquid market it is (mostly) random which one will have been traded last. So I wouldn't worry too much about exact precision here.
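The overround double-check is just the sum of the implied probabilities (1 divided by each decimal price) across all selections in the market. A small Python sketch; the 1% default tolerance is an arbitrary illustration, not a recommended threshold:

```python
def overround_percent(odds):
    """Sum of implied probabilities (1 / decimal odds), as a percentage."""
    return 100.0 * sum(1.0 / o for o in odds)

def looks_like_valid_market(odds, tolerance=1.0):
    """True if the overround is within `tolerance` percentage points of 100%."""
    return abs(overround_percent(odds) - 100.0) <= tolerance

# Home / draw / away prices of 2.0, 4.0 and 4.0 imply 50% + 25% + 25%.
print(overround_percent([2.0, 4.0, 4.0]))
```

A figure well away from 100% suggests the prices picked for the different selections were not traded at around the same time.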
Mr Magoo, yeah, I tried that initially but found that some of the match odds markets had oscillated so much after kick-off that the latest PE record with a timestamp before the kick-off was from 8 hours before the match.
Using that method I was getting markets with a range 97%-110%.
Using the PE record with the largest number of bets the range is more like 99.5%-101% with only about 2 or 3 markets per season outside that range.
I tried a few different approaches, none were perfect, but the Number-Of-Bets approach does at least represent what the majority of punters got matched at and gave the narrowest range of over/underround.
Makes sense I guess - most of the betting will be close to the start of the match, and last minute football odds changes should be fairly rare.
Have you noticed whether there is a pattern in which matches/bets are marked PE but taken after the off? I don't get why some markets have this problem and others don't. I can't see why it should have anything to do with keep-bets, since they are going to be matched against in-play bets and so it would be easy to mark them as such.
Mr Magoo I can't see why it should have anything to do with keep-bets, since they are going to be matched against in-play bets and so it would be easy to mark them as such.
Going by the matches I looked at in detail, I reckon that records are marked as PE or IP according to the Bet-Placed-Date, not the Bet-Matched-Date. I can't think of any other reason why there would be PE records with a latest taken timestamp dated after Kick off time.
I'd also hazard a guess that the whole data archiving process was designed and written before the 'Keep Bets' feature was introduced, so using either the bet-placed or the bet-matched date would have produced the same (correct) results and there wouldn't have been an issue. But with the introduction of Keep Bets, it matters which date field is used to derive the IN_PLAY flag.
Anyone know how to raise a bug/change request ticket with Betfair?
I was aware of this when keep bets first appeared in the data and did ask Betfair if this could be changed, but to no avail. You can approximate these cases well enough to get an SP; however, everyone is going to do this differently. There are some important factors to consider when doing this, which again everyone will handle differently.
The Investor, if you know of, or have, an app that collects data from several markets in one sport simultaneously, with a market sample rate you can set that stays stable and isn't affected in practice by the number of markets currently being sampled and stored, I would be interested in paying for it. I attempted to get this once, but the developer, Stefan, let me down badly on here: the sample rate was all over the place whenever the number of markets being monitored changed, making it an expensive screensaver.
It took us a while to figure out, but here is the PHP code, as is, for getting the best possible overround. It's not perfect, but in the tests we did it got us within +/- 5% of a 100% overround, with most of them within +/- 1%. This isn't our import script, just a proof of concept: basically, finding the smallest difference between the LATEST_TAKEN and KICKOFF timestamps.
while ( $row = fgetcsv($fp, 2048) ) { if ( trim($row[3]) != "English Soccer/Barclays Premier League/Fixtures 01 November /Arsenal v Burnley" ) continue;
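The PHP above is cut off, so what follows is not that script, only a rough Python sketch of the stated idea (pick, per selection, the row whose LATEST_TAKEN is nearest the kick-off). It assumes rows are dicts with `datetime` values, that the selection column is called SELECTION, and that `kickoff` is supplied by the caller; whether the original also filtered on IN_PLAY is not known, so no such filter is applied here:

```python
from datetime import datetime

def closest_to_kickoff(rows, kickoff):
    """Per selection, return the row whose LATEST_TAKEN is nearest to kickoff."""
    best = {}
    for r in rows:
        gap = abs((r["LATEST_TAKEN"] - kickoff).total_seconds())
        current = best.get(r["SELECTION"])
        if current is None or gap < current[0]:
            best[r["SELECTION"]] = (gap, r)
    return {selection: row for selection, (gap, row) in best.items()}

# Made-up rows for illustration only.
kickoff = datetime(2010, 1, 1, 15, 0)
rows = [
    {"SELECTION": "Home", "ODDS": 2.5, "LATEST_TAKEN": datetime(2010, 1, 1, 11, 0)},
    {"SELECTION": "Home", "ODDS": 2.0, "LATEST_TAKEN": datetime(2010, 1, 1, 14, 59)},
    {"SELECTION": "Home", "ODDS": 1.8, "LATEST_TAKEN": datetime(2010, 1, 1, 15, 30)},
]
print(closest_to_kickoff(rows, kickoff)["Home"]["ODDS"])
```

Because the gap is an absolute difference, a row last matched shortly after kick-off can win over one matched well before it.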
Check the previous scores for both sides on soccerway, stand up and turn away from the laptop, have a good fart and the predicted score will be just as accurate as most of the shyte on this thread.
If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.
This is a pain all over (account statement, bet history etc.)
Who really gives a damn about placed date? Matched date is so much more important.
Agreed. All those Gigabytes of Betfair Data - what a waste. It could have been so useful, for no extra trouble.
Seems to me you shouldn't be using BF data in the first place. But if you are, there is now (but there wasn't in 2010) at least a way to gauge whether your preferred way of estimating football SP is reasonable: compare your estimate with BSP (in those markets which have a BSP).
I'm not saying BSP is a perfect match for what you're looking for, "Last price traded prior to kick-off". But it's pretty good. If you aren't getting close to BSP (where available), then chances are it's your estimate that's skewed, not BSP.
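A small Python sketch of that comparison, assuming you already have two dicts keyed by selection, one with your estimated prices and one with BSP where it exists. The 5% default tolerance is an arbitrary illustration:

```python
def far_from_bsp(estimates, bsp, tolerance_percent=5.0):
    """Return {selection: percentage gap} where the estimate strays too far from BSP."""
    flagged = {}
    for selection, estimate in estimates.items():
        if selection not in bsp:
            continue  # no BSP available for this selection
        gap = (estimate - bsp[selection]) / bsp[selection] * 100.0
        if abs(gap) > tolerance_percent:
            flagged[selection] = gap
    return flagged
```

If many selections get flagged across a season of markets, that points at the estimation method rather than at BSP.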