Escapee
30 Dec 10 21:54
I've been looking into Betfair's historic data files and trying to work out the prices on a football fixture's "match odds" market just before it goes in play.

I thought I'd cracked it by using the record where
IN_PLAY column = 'PE'
and LATEST_TAKEN
Any Betfair data experts about?
Escapee December 30, 2010 9:55 PM GMT
Well, it seems to have lost 75% of my post.
Escapee December 30, 2010 9:56 PM GMT
Does anyone know how to extract the last pre-off prices for match odds from Betfair's historical data files?
Escapee December 30, 2010 10:00 PM GMT
I've been trying by using the last record with a timestamp just prior to the actual off time and an IN_PLAY value of 'PE' (pre-event),
but this sometimes results in a record from a few hours before the off.

Looking at the data it appears that 'PE' indicates the bet was placed before the event went off and does not distinguish whether it was matched before or after the off.


Anyone know how to extract the market prices for Pre-Inplay ?


Thanks in advance
Ghetto Joe December 30, 2010 10:41 PM GMT
The problem is that Betfair classes keep bets as PE, so they can be taken during the in-play period. I guess the best way is to determine the off time using the lowest IP time, then use the last PE bet prior to that.
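A rough PHP sketch of that two-step idea, for anyone who wants to try it. It assumes the rows for one selection in one market have already been parsed into arrays keyed by column name, that the timestamps have been converted to Unix integers, and that the "lowest IP time" means the earliest FIRST_TAKEN on an IP row. The 'ODDS' key and the function name are made up for the example.

```php
<?php
// Sketch only: rows are assumed to be pre-parsed arrays for ONE selection in ONE market.
// 'IN_PLAY', 'FIRST_TAKEN' and 'LATEST_TAKEN' follow the column names in the thread;
// 'ODDS' and the integer Unix timestamps are assumptions of this example.
function last_pe_before_first_ip(array $rows): ?array
{
    // Step 1: estimate the off time as the earliest FIRST_TAKEN among the IP rows.
    $off = null;
    foreach ($rows as $row) {
        if ($row['IN_PLAY'] === 'IP' && ($off === null || $row['FIRST_TAKEN'] < $off)) {
            $off = $row['FIRST_TAKEN'];
        }
    }
    if ($off === null) {
        return null; // no IP rows, so no off time can be estimated
    }

    // Step 2: among PE rows last matched before that time, keep the latest one.
    $best = null;
    foreach ($rows as $row) {
        if ($row['IN_PLAY'] !== 'PE' || $row['LATEST_TAKEN'] >= $off) {
            continue;
        }
        if ($best === null || $row['LATEST_TAKEN'] > $best['LATEST_TAKEN']) {
            $best = $row;
        }
    }
    return $best;
}
```

The function returns null when there are no IP rows or no PE row was last matched before the estimated off.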
Escapee December 30, 2010 11:17 PM GMT
Thanks Joe, that seems like a bright idea.

I'll see if it works
Escapee December 31, 2010 12:24 AM GMT
Noooooo... not quite. Getting the first IP time and then using the first PE record prior to that occasionally returns a record from a few hours prior to the off, much the same as using the DT_ACTUAL_OFF field.

Using the first IP record itself produces under-round markets.


Bizarre that, when the data files contain about 10 MB (yes, that's 10,000,000... 10 MILLION... BYTES) for each match,
they couldn't spare 100 or so for something as small as the SP.
ebasson61 December 31, 2010 8:51 AM GMT
Hi Escapee

The raw data doesn't look sorted to me (PE rows are intermingled with IP rows). I think you'll need to sort the data by event and then by market. How are you attempting to extract the data? Perl would be good for this, as you could read in only those events relevant to your interest, store them in a hash (or a hash of arrays), sort them, and then extract the data in the format that you require.
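The "hash of arrays" idea works just as well in PHP as in Perl. A minimal sketch, assuming rows have already been parsed into arrays; the 'EVENT' and 'MARKET' keys are hypothetical names for whichever columns identify the event and the market, and LATEST_TAKEN is assumed to be a Unix timestamp:

```php
<?php
// Sketch only: 'EVENT', 'MARKET' and 'LATEST_TAKEN' are assumed keys on rows that have
// already been parsed; LATEST_TAKEN is assumed to be a Unix timestamp.
function group_and_sort(array $rows): array
{
    $grouped = [];
    foreach ($rows as $row) {
        // Hash of arrays: event => market => list of rows
        $grouped[$row['EVENT']][$row['MARKET']][] = $row;
    }

    foreach ($grouped as $event => $markets) {
        foreach ($markets as $market => $list) {
            // Order each market's rows by when the price was last matched
            usort($list, function ($a, $b) {
                return $a['LATEST_TAKEN'] <=> $b['LATEST_TAKEN'];
            });
            $grouped[$event][$market] = $list;
        }
    }
    return $grouped;
}
```

After this, `$grouped['some event']['Match Odds']` holds only that market's rows in time order, so the PE and IP rows are no longer intermingled with other events.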
ebasson61 December 31, 2010 9:03 AM GMT
Isn't it just a case of taking the record with the smallest LATEST_TAKEN value that is larger than DT_ACTUAL_OFF?
eric_morris December 31, 2010 9:16 AM GMT
No, that wouldn't work, because there could be one PE record with first and last taken both before the actual off, and a later PE record whose first taken is before the first record's but whose last taken is well into play. The former record may then be the best one to use. However, if the latter record has the latest first-taken time of the PE records, you can maybe use that, provided the overround is not out by too far. I have my own techniques for this, which provide an excellent approximation in these cases.
eric_morris December 31, 2010 9:20 AM GMT
The data, by the way, needs a mega amount of time spent on it, for numerous reasons. It has to be meticulously cleansed: the markets are so tight that if you leave it as it is, it can do more harm than good.
ebasson61 December 31, 2010 9:30 AM GMT
The start of this thread has caused me to look at this raw data for the first time. Eric is seemingly right about the data cleansing and also the various conditionals needed to pull out the correct information, but this looks well worth the effort. I think I'll have a go myself and start pulling out markets of interest.

Thanks guys.
Escapee December 31, 2010 11:34 AM GMT
Looking at the data format some more, I think it was the introduction of 'keep bets' which caused the data to go a bit doo-lally.

If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.
Feck N. Eejit December 31, 2010 11:49 AM GMT
Aren't they already doing that Escapee?

•LATEST_TAKEN (when these odds were last matched on the selection)
•FIRST_TAKEN (when these odds were first matched on the selection)
Escapee December 31, 2010 12:12 PM GMT
Doesn't look like it to me Feck,

because if that were the case then there would be zero 'PE' records with a LATEST_TAKEN after the kick off time.

i.e. the IN_PLAY field states that the bet was placed 'PE' pre-event (prior to kick-off), and the LATEST_TAKEN field states that the bet was matched 'IP' in-play.

So I'd speculate that the Bet-Placed-Date is used to derive the IN_PLAY field and the Bet-Matched-Date is used to derive the LATEST_TAKEN field.
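One way to see how often this happens in a given market is to list the PE records whose LATEST_TAKEN falls after kick-off. A minimal PHP sketch, assuming rows are already parsed into arrays keyed by column name and that LATEST_TAKEN and the kick-off time are Unix timestamps (the function name is made up):

```php
<?php
// Sketch only: returns the rows flagged PE whose LATEST_TAKEN is after kick-off, i.e. the
// records that look "placed pre-event, matched in-play". Timestamps are assumed to be
// Unix integers; that conversion is not shown here.
function pe_rows_matched_in_play(array $rows, int $kickoff): array
{
    return array_values(array_filter($rows, function ($row) use ($kickoff) {
        return $row['IN_PLAY'] === 'PE' && $row['LATEST_TAKEN'] > $kickoff;
    }));
}
```

If the IN_PLAY flag were derived from the matched date, this list would be empty for every market.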
Escapee December 31, 2010 12:30 PM GMT
If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.

this should be....

If they changed from using the bet placed date to the bet matched date as the source of the IN_PLAY column then it would all sort itself out again.
Feck N. Eejit December 31, 2010 2:13 PM GMT
I see what you mean now, Escapee. All that's needed to resolve this is a list of event IDs and the time each one was turned in play. Maybe not the sort of information they'd like to give out, given their record.
Feck N. Eejit December 31, 2010 2:15 PM GMT
That information would render a lot of the data useless though (where the first taken was before the off and the last taken after the off). Masters of the universe my @rse.
Ghetto Joe December 31, 2010 7:09 PM GMT
I think the way the data's currently laid out means you'll never get an exact idea of the market at suspension, because a pre-event bet at the correct starting price could also have been last taken well into the game, and so wouldn't show at the off time.

Looking at the data again, the only way I can see of getting some approximate odds would be to use the average FIRST_TAKEN IP odds at the DT_ACTUAL_OFF time. That should hopefully be close to the actual start-time odds, as there would be keep bets in the market holding the prices, and using the average should ensure both sides of the spread are included, allowing for the fact that maybe just the lay side was taken first, etc.
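A rough PHP sketch of that averaging idea. The post doesn't say how close to DT_ACTUAL_OFF a FIRST_TAKEN has to be, so the 60-second window here is purely an assumption, as are the 'ODDS' key, the function name and the Unix-integer timestamps. Rows are assumed to be for one selection only.

```php
<?php
// Sketch only: averages the odds of IP rows whose FIRST_TAKEN lies within $window seconds
// of the actual off. The window size, the 'ODDS' key and the integer timestamps are
// assumptions of this example, not something the data files define.
function average_first_ip_odds(array $rows, int $actualOff, int $window = 60): ?float
{
    $prices = [];
    foreach ($rows as $row) {
        if ($row['IN_PLAY'] === 'IP' && abs($row['FIRST_TAKEN'] - $actualOff) <= $window) {
            $prices[] = (float) $row['ODDS'];
        }
    }
    // No qualifying rows means no estimate
    return $prices ? array_sum($prices) / count($prices) : null;
}
```

A simple mean is used, so a back price and a lay price either side of the spread pull the estimate towards the middle.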
Mr Magoo January 1, 2011 10:03 AM GMT
I thought the problem with pre-off bets being listed after the event start was a football specific thing, and happened when Betfair would take the market out of 'in play' during half time? That's just a guess though as I haven't really looked into it. The horse racing data doesn't seem to have the same problem.

As has been said, you need to remove PE bets taken after ACTUAL_OFF, or after the first inplay bet has been struck. To get the SP, I wouldn't take the first inplay bets (they might vary wildly) but take the last pre-start bets.
Escapee January 3, 2011 12:12 PM GMT
Just thought I'd ttt this to see if anyone else has any other methods.

I've been experimenting with various criteria and approaches.
Using the PE record with the largest NUMBER_BETS seems to be the best so far for creating a valid market.

I need to try it on more data files to confirm.
Ghetto Joe January 3, 2011 4:33 PM GMT
As has been said, you need to remove PE bets taken after ACTUAL_OFF, or after the first inplay bet has been struck. To get the SP, I wouldn't take the first inplay bets (they might vary wildly) but take the last pre-start bets.

The trouble with that, Mr Magoo, is that they only time PE bets by first and last taken. So say the off price was 6's: 6's may have been taken a day before on that selection and also midway through the match, so it wouldn't appear anywhere near the actual off time.
Mr Magoo January 3, 2011 8:16 PM GMT
I don't see your problem. It is easy to work out the start time of the event. So, you then disregard all pre-event bets in the data that are matched after this time. Find the bet(s) that were matched closest to the off time. You could either use FIRST_TAKEN or LATEST_TAKEN to pick them, you'll still end up with something close to the right time.

This should get a 'good enough' measure of what was the SP at the time. To double-check, just ensure that the SP overround is close to 100%.

Aiming to be totally exact on the last traded odds seems a waste of effort. Just before the off, punters will have a choice of two odds to bet on for each runner/team (i.e. the back or the lay price) and in a stable, liquid market it is (mostly) random which one will have been traded last. So I wouldn't worry too much about exact precision here.
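As a rough illustration of the steps above in PHP: drop the PE rows last matched after the start, keep the row matched closest to the off for each selection, then check the book percentage. Rows are assumed to be pre-parsed arrays for a single market; the 'SELECTION' and 'ODDS' keys, the function names and the Unix-integer timestamps are assumptions of the sketch.

```php
<?php
// Sketch only. 'SELECTION' and 'ODDS' are hypothetical key names; LATEST_TAKEN and
// $kickoff are assumed to be Unix timestamps.
function estimate_sp(array $rows, int $kickoff): array
{
    $best = [];
    foreach ($rows as $row) {
        // Disregard in-play rows and PE rows that were last matched after the start.
        if ($row['IN_PLAY'] !== 'PE' || $row['LATEST_TAKEN'] > $kickoff) {
            continue;
        }
        $sel = $row['SELECTION'];
        // Keep, per selection, the row matched closest to the off.
        if (!isset($best[$sel]) || $row['LATEST_TAKEN'] > $best[$sel]['LATEST_TAKEN']) {
            $best[$sel] = $row;
        }
    }
    // Reduce each kept row to its price: selection => odds
    return array_map(function ($row) {
        return (float) $row['ODDS'];
    }, $best);
}

// Book percentage: the sum of implied probabilities, 100.0 for a perfectly round book.
function overround(array $odds): float
{
    $total = 0.0;
    foreach ($odds as $price) {
        $total += 1.0 / $price;
    }
    return 100.0 * $total;
}
```

For example, prices of 2.0, 4.0 and 4.0 give `overround()` of 100.0; anything far from that suggests the rows picked do not belong to the same moment in the market.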
Escapee January 3, 2011 9:29 PM GMT
Mr Magoo, yeah, I tried that initially, but found that some of the match odds markets had oscillated so much after kick-off that the latest PE record with a timestamp before the kick-off was from 8 hours prior to the match.

Using that method I was getting markets with a range 97%-110%.

Using the PE record with the largest number of bets the range is more like 99.5%-101% with only about 2 or 3 markets per season outside that range.

I tried a few different approaches, none were perfect, but the Number-Of-Bets approach does at least represent what the majority of punters got matched at and gave the narrowest range of over/underround.
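For anyone wanting to reproduce the Number-Of-Bets approach, a minimal PHP sketch follows. It assumes rows for a single market are already parsed into arrays keyed by column name; 'SELECTION' and 'ODDS' are hypothetical key names and the function name is made up.

```php
<?php
// Sketch only: for each selection, keep the PE row with the largest NUMBER_BETS.
// 'SELECTION' and 'ODDS' are hypothetical key names for the selection and price columns.
function most_bet_pe_record(array $rows): array
{
    $best = [];
    foreach ($rows as $row) {
        if ($row['IN_PLAY'] !== 'PE') {
            continue; // in-play rows are ignored entirely
        }
        $sel = $row['SELECTION'];
        if (!isset($best[$sel]) || $row['NUMBER_BETS'] > $best[$sel]['NUMBER_BETS']) {
            $best[$sel] = $row;
        }
    }
    return $best; // selection => row holding the most-matched pre-event price
}
```

No timestamps are involved, which is why this approach sidesteps the PE rows that were last matched after kick-off.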
Escapee January 3, 2011 9:30 PM GMT
*Using that method I was getting markets with a range of 97%-110%, with some as low as 89%.
Mr Magoo January 3, 2011 9:46 PM GMT
Makes sense I guess - most of the betting will be close to the start of the match, and last minute football odds changes should be fairly rare.

Have you noticed if there is a pattern in which matches/bets are marked PE but are taken after the off? I don't get why some markets have this problem and others don't. I can't see why it should have anything to do with keep-bets, since they are going to be matched against in-play bets and so it would be easy to mark them as such.
Escapee January 4, 2011 12:16 PM GMT

Mr Magoo

I can't see why it should have anything to do with keep-bets, since they are going to be matched against in-play bets and so it would be easy to mark them as such.


Going by the matches I looked at in detail, I reckon that records are marked as PE or IP according to the Bet-Placed-Date, not the Bet-Matched-Date.
I can't think of any other reason why there would be PE records with a latest taken timestamp dated after Kick off time.


I'd also hazard a guess that the whole data archiving process was designed/written before the 'Keep Bets' feature was introduced, so using either the Bet-Placed or the Bet-Matched date would have produced the same (correct) results and there wouldn't have been an issue.
But with the introduction of Keep Bets, it matters which date field is used to derive the IN_PLAY flag.



Anyone know how to raise a bug/change request ticket with Betfair?
Feck N. Eejit January 4, 2011 4:56 PM GMT
Don't waste your time Escapee. They're well aware of it but can't be @rsed.
The Investor January 4, 2011 6:17 PM GMT
I guess you'd be better off collecting this data yourself for future events.
eric_morris January 4, 2011 11:25 PM GMT
I was aware of this when keep bets first appeared in the data, and did ask Betfair if this could change, but to no avail. You can approximate these cases well enough to get an SP; however, everyone is going to do this differently. There are some important factors to consider when doing this, which again everyone will handle differently.
eric_morris January 4, 2011 11:30 PM GMT
The Investor, if you know of (or have) an app that will collect data from the markets in one sport simultaneously, with a configurable market sample rate that is stable and not affected in practice by the number of markets currently being sampled/stored, I would be interested in paying for it. I attempted to get this once, but the developer, Stefan, let me down badly on here: the sample rate was all over the place whenever the number of markets being monitored changed, making it an expensive screensaver.
The Investor January 5, 2011 1:46 PM GMT
eric, I don't have it yet.

I'm also interested in getting this set up in the near future though, probably as part of a custom-built API app.
Ghetto Joe January 5, 2011 3:53 PM GMT
I reckon it'll cost you a fortune in data charges come Saturday afternoon, Investor.
eric_morris January 5, 2011 7:23 PM GMT
The Investor ... if you are looking for someone to split the cost of the app just let me know.
TJC November 12, 2015 3:00 AM GMT
It took us a while to figure out, but here is the PHP code, as is, for getting the best possible overround. It's not perfect, but in the tests we did it got us within +/- 5% of a 100% overround, with most of them within +/- 1%. This isn't our import script, just a proof of concept: basically, finding the smallest difference between the LATEST_TAKEN and KICKOFF timestamps.




<?php
// Proof of concept: for each selection in one Match Odds market, keep the odds from the
// pre-event (PE) row whose LATEST_TAKEN is closest to the scheduled kick-off.

$fp = fopen("bfinf_other_141027to141102_141105122445.csv", "r");
if ($fp === false) {
    die("Could not open the data file\n");
}

$difference_selector = array();

while (($row = fgetcsv($fp, 2048)) !== false)
{
    // Skip blank or truncated lines
    if ( count($row) < 16 )
        continue;

    // Column 3: full event description
    if ( trim($row[3]) != "English Soccer/Barclays Premier League/Fixtures 01 November /Arsenal v Burnley" )
        continue;

    // Column 5: market name
    if ( trim($row[5]) != "Match Odds" )
        continue;

    // Column 15: in-play flag; rows flagged IP are ignored
    if ( trim($row[15]) == "IP" )
        continue;

    // ":00" is appended to each date before it is parsed
    $kickoff_ts      = get_timestamp(rearrange_date(trim($row[4]).":00"));
    $bet_ts          = get_timestamp(rearrange_date(trim($row[6]).":00")); // parsed but not used below
    $latest_taken_ts = get_timestamp(rearrange_date(trim($row[12]).":00"));

    $difference = abs($latest_taken_ts - $kickoff_ts);
    $selection = trim($row[8]);

    // Keep the row with the smallest gap per selection. isset() is used instead of a
    // truthiness test so that an exact match (difference of 0) is not overwritten later.
    if ( !isset($difference_selector[$selection])
        || $difference < $difference_selector[$selection]['difference'] )
    {
        $difference_selector[$selection]['difference'] = $difference;
        $difference_selector[$selection]['odds'] = trim($row[9]);
    }
}
fclose($fp);

print_r($difference_selector);


// Turn "dd-mm-yyyy hh:mm:ss" into "yyyy-mm-dd hh:mm:ss"
function rearrange_date($date)
{
    $pieces = explode(" ", $date);
    $date_pieces = explode("-", $pieces[0]);

    return $date_pieces[2]."-".$date_pieces[1]."-".$date_pieces[0]." ".$pieces[1];
}


function get_timestamp($date)
{
    // If the date is a number, assume it's a timestamp
    if ($date == strval(intval($date))) {
        return $date;
    }

    $pi = explode(" ", $date);
    if (count($pi) == 2) {
        // "yyyy-mm-dd hh:mm:ss"; note that mktime() uses the server's local time zone
        $date_p = explode("-", $pi[0]);
        $time_p = explode(":", $pi[1]);
        $timestamp = mktime(
            (int)$time_p[0],
            (int)$time_p[1],
            (int)$time_p[2],
            (int)$date_p[1],
            (int)$date_p[2],
            (int)$date_p[0]
        );
    } else {
        if (strlen($date) == 8) {
            // "dd/mm/yy"
            $date_p = explode("/", $date);
            $date_p[2] = "20" . $date_p[2];
            $timestamp = gmmktime(0, 0, 0, (int)$date_p[1], (int)$date_p[0], (int)$date_p[2]);
        } else {
            // "yyyy-mm-dd"
            $date_p = explode("-", $date);
            $timestamp = gmmktime(0, 0, 0, (int)$date_p[1], (int)$date_p[2], (int)$date_p[0]);
        }
    }

    return $timestamp;
}
Westender November 12, 2015 5:14 PM GMT
Check the previous scores for both sides on soccerway, stand up and turn away from the laptop, have a good fart and the predicted score will be just as accurate as most of the shyte on this thread.
PeteTheBloke November 12, 2015 8:37 PM GMT
Thanks for open-sourcing, TJC.
Trevh November 12, 2015 9:27 PM GMT
Wow haha, nearly 5 years later, that is certainly "a while", nice! :)
TheInvestor2 January 19, 2016 3:06 PM GMT
If they changed from using the bet placed date to the bet matched date as the source of the date columns then it would all sort itself out again.

This is a pain all over (account statement, bet history etc.)

Who really gives a damn about placed date?
Matched date is so much more important.
sun January 19, 2016 4:32 PM GMT
Agreed. All those Gigabytes of Betfair Data - what a waste. It could have been so useful, for no extra trouble.

Seems to me you shouldn't be using BF data in the first place. But if you are, there is now (but there wasn't in 2010) at least a way to gauge whether your preferred way of estimating football SP is reasonable: compare your estimate with BSP (in those markets which have a BSP).

I'm not saying BSP is a perfect match for what you're looking for, "Last price traded prior to kick-off". But it's pretty good. If you aren't getting close to BSP (where available), then chances are it's your estimate that's skewed, not BSP.
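A minimal PHP sketch of that sanity check. It assumes you already hold, per selection, your own estimate and the BSP; the input shape, the function name and the 5% default tolerance are all arbitrary assumptions of the example.

```php
<?php
// Sketch only: $pairs maps a selection name to [estimated SP, BSP]. The input shape and
// the 5% default tolerance are arbitrary assumptions of this example.
function selections_far_from_bsp(array $pairs, float $tolerance = 0.05): array
{
    $flagged = [];
    foreach ($pairs as $selection => [$estimate, $bsp]) {
        // Relative gap between the estimate and BSP
        if (abs($estimate - $bsp) / $bsp > $tolerance) {
            $flagged[] = $selection;
        }
    }
    return $flagged;
}
```

Run over a season's worth of markets that have a BSP, a long list of flagged selections would suggest the estimation method is skewed rather than BSP.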
Mr Magoo January 24, 2016 3:01 PM GMT
How long have Betfair been doing SPs on football? I hadn't noticed...
longbridge January 25, 2016 12:51 PM GMT
Since 2014 at least. Maybe longer.