"AllDataBase" - AllData_J1's Database
The long awaited release of AllData's database has finally arrived!
Following, are some basic instructions for "AllDataBase_v10" (although this version runs in all versions of Excel, we suggest you use pre-2007 Excel, due to the very slow processing time in Excel 2007). I will start you off slowly, going step by step in order to prevent your being confused or overwhelmed by all the elements of the program.
You can drop down below these instructions, to the download area and download the database, if you like, or you can read the instructions first.
If you download now, click on the ".zip" file, making sure you know where the file will be put on your computer. After the download is complete, extract the program using an "unzipping" application. Once "unzipped", move the ".xls" Excel workbook to your "C" drive, preferably in it's own directory so that the file path for it is simple, something like "C:\AllDataBase_v10" (your drive may be different letter, that's ok, just try to keep the path to it simple).
Once you have moved it to your drive, right click it, select "Send to", then "Desktop - Create shortcut" (your OS may not have this exact wording but it should be similar). The shortcut you create will appear on your desktop, allowing you to open the database by clicking that shortcut.
Whether you download it now or later, please read these instructions before trying to operate the database. Also, read the help documents on the sub-page, under the "AllDataBase" tab, "AllDataBase Help Document" for instructions on the latest updates.
You can drop down below these instructions, to the download area and download the database, if you like, or you can read the instructions first.
If you download now, click on the ".zip" file, making sure you know where the file will be put on your computer. After the download is complete, extract the program using an "unzipping" application. Once "unzipped", move the ".xls" Excel workbook to your "C" drive, preferably in it's own directory so that the file path for it is simple, something like "C:\AllDataBase_v10" (your drive may be different letter, that's ok, just try to keep the path to it simple).
Once you have moved it to your drive, right click it, select "Send to", then "Desktop - Create shortcut" (your OS may not have this exact wording but it should be similar). The shortcut you create will appear on your desktop, allowing you to open the database by clicking that shortcut.
Whether you download it now or later, please read these instructions before trying to operate the database. Also, read the help documents on the sub-page, under the "AllDataBase" tab, "AllDataBase Help Document" for instructions on the latest updates.
Instructions for AllDataBase_v10.xls
The "D" worksheet (think "Database")
The above screenshot shows the "D" worksheet, this is where almost all of your time will be spent. Note that there are several sections that have been "bordered". These bordered sections allow you to interact with your data files and/or view reports, etc..
Let's start in the top-left corner, cell range A1:J18. This area is where most of your filtering will be done. You'll note that there are rows of labels with 2 cells below each one. The label tells you which factor you are filtering. Some factors are rankings and some are dates, distances in yards, dollar values, etc. Most of the factors that are not rankings are highlighted yellow. The ones that aren't highlighted yellow are rankings.
The 2 cells below the labels contain a maximum rank or value (on top) and a minimum rank or value(on bottom). For instance, in cell B1 is the label "Distance", and below that label are the max and min value for filtering the distance of races you want to analyze. On top, in cell B2, you see "1350", and, in cell B3, you see "1300". This shows that the distance filter is set to return all races at distances above 1300 yards, but below 1350 yards. Look over to the right, in cells K2 and K3, and you will see these 2 yardages converted into furlongs ( 5.91f and 6.14f). This means that you only want to see 6f races, because that is the only race distance that falls between the maximum and minimum filter settings. If you want to see both 6f and 6.5f races then just click in cell B2 and type 1431 (6.5f x 220 yards = 1430 yards, so you add 1 to 1430, giving you 1431, so that you are including races as long as 1430 yards).
So, we see that we are creating ranges of values or rankings, and the range tells the database the values or rankings to be included in the filtering for that factor.
Let's look at a factor that is filtered on "rankings". Look at cells B10, B11, and B12. The factor is labeled "EPS", or "Earnings Per Start". The maximum rank is 11, and the minimum rank is -1. These 2 numbers mean that any horse, in each race, that ranks from 1 through 10 will be returned for analysis. Note that the rankings used in AllDataBase are reversed from what you are used to seeing, 10 is the highest/best rank and 1 is the lowest/worst ranking, that can be returned by the filter. It doesn't matter if there are 6 horses or 20 horses in a race, if your maximum rank is set to 11 and your minimum rank is set to -1, the filter will return the highest ranked horses in the race, up to 10 horses. If there are 12 horses in a race, then the 2 lowest ranking horses will not be returned, only the top 10 will be returned. So, if you want to return only the top ranked horse in each race, in "EPS", you would enter a max rank setting of 11, and a minimum rank setting of 9. This results in a filter value of 10, because 10 is the only rank between 9 and 11. This may sound confusing, but, just remember that the max setting and the min setting will NOT be returned, only the rankings between those 2 numbers will be included in the filter, for that factor.
Below these values and rankings filters is a row of filters starting with: "Surface", Race Type", etc.. Here you can filter by any of these labels, for instance, if you wanted to analyze all races except maiden races, under "Race Type" just type in "<>M*" (without the quotation marks) the wildcard characters <>* mean " does not equal (<>) any type of (*), and the "M" means maiden races". "C" would return only claiming races. Look on the "Notes" worksheet for a list of wildcard characters you can use.
Note that you may have to expand the column widths for some of the factors, in order to see the whole label, or, you can just click the label and look up in your formula bar to read the label. If you have questions regarding what some of the labels mean, just ask, in the Blog or by email or on any of the forums that have AllData sections (PaceAdvantage.com, Partsnutspage.freeforums.org, or, AllDataExcel.freeforums.org). Harry or I will answer all of your questions.
Let's start in the top-left corner, cell range A1:J18. This area is where most of your filtering will be done. You'll note that there are rows of labels with 2 cells below each one. The label tells you which factor you are filtering. Some factors are rankings and some are dates, distances in yards, dollar values, etc. Most of the factors that are not rankings are highlighted yellow. The ones that aren't highlighted yellow are rankings.
The 2 cells below the labels contain a maximum rank or value (on top) and a minimum rank or value(on bottom). For instance, in cell B1 is the label "Distance", and below that label are the max and min value for filtering the distance of races you want to analyze. On top, in cell B2, you see "1350", and, in cell B3, you see "1300". This shows that the distance filter is set to return all races at distances above 1300 yards, but below 1350 yards. Look over to the right, in cells K2 and K3, and you will see these 2 yardages converted into furlongs ( 5.91f and 6.14f). This means that you only want to see 6f races, because that is the only race distance that falls between the maximum and minimum filter settings. If you want to see both 6f and 6.5f races then just click in cell B2 and type 1431 (6.5f x 220 yards = 1430 yards, so you add 1 to 1430, giving you 1431, so that you are including races as long as 1430 yards).
So, we see that we are creating ranges of values or rankings, and the range tells the database the values or rankings to be included in the filtering for that factor.
Let's look at a factor that is filtered on "rankings". Look at cells B10, B11, and B12. The factor is labeled "EPS", or "Earnings Per Start". The maximum rank is 11, and the minimum rank is -1. These 2 numbers mean that any horse, in each race, that ranks from 1 through 10 will be returned for analysis. Note that the rankings used in AllDataBase are reversed from what you are used to seeing, 10 is the highest/best rank and 1 is the lowest/worst ranking, that can be returned by the filter. It doesn't matter if there are 6 horses or 20 horses in a race, if your maximum rank is set to 11 and your minimum rank is set to -1, the filter will return the highest ranked horses in the race, up to 10 horses. If there are 12 horses in a race, then the 2 lowest ranking horses will not be returned, only the top 10 will be returned. So, if you want to return only the top ranked horse in each race, in "EPS", you would enter a max rank setting of 11, and a minimum rank setting of 9. This results in a filter value of 10, because 10 is the only rank between 9 and 11. This may sound confusing, but, just remember that the max setting and the min setting will NOT be returned, only the rankings between those 2 numbers will be included in the filter, for that factor.
Below these values and rankings filters is a row of filters starting with: "Surface", Race Type", etc.. Here you can filter by any of these labels, for instance, if you wanted to analyze all races except maiden races, under "Race Type" just type in "<>M*" (without the quotation marks) the wildcard characters <>* mean " does not equal (<>) any type of (*), and the "M" means maiden races". "C" would return only claiming races. Look on the "Notes" worksheet for a list of wildcard characters you can use.
Note that you may have to expand the column widths for some of the factors, in order to see the whole label, or, you can just click the label and look up in your formula bar to read the label. If you have questions regarding what some of the labels mean, just ask, in the Blog or by email or on any of the forums that have AllData sections (PaceAdvantage.com, Partsnutspage.freeforums.org, or, AllDataExcel.freeforums.org). Harry or I will answer all of your questions.
In the above screenshot you will see the area just below the filters area. This is the summary report area. It's pretty much self explanatory. It gives you the option of entering a "Start Bank" amount, "Type Bet" you want to check, "Per%Bank"(percentage of your bank you want to wager on each play, and the "Down" amount (all these "Bank" settings apply to a "Kelly" type wagering method.
The cell below the "WNRS/DATBSE" label counts the total number of winning horses in the database, this one has 335 horses' data in it. Below that are the number of plays your filtering created and the amount bet, # of plays, amount bet, etc..
To the left of that are the average mutuel of all the winners, and the maximum mutuel (largest payout) for all the winners.
Under"TypeBet" you see "Total Win" amount, "Win%" (hit rate), and "ROI" %. To the right are the same things for Place and Show wagers.
The cell below the "WNRS/DATBSE" label counts the total number of winning horses in the database, this one has 335 horses' data in it. Below that are the number of plays your filtering created and the amount bet, # of plays, amount bet, etc..
To the left of that are the average mutuel of all the winners, and the maximum mutuel (largest payout) for all the winners.
Under"TypeBet" you see "Total Win" amount, "Win%" (hit rate), and "ROI" %. To the right are the same things for Place and Show wagers.
The above screenshot shows a graphic representation of your bankroll's growth or decline, based on your filter settings. A consistent upward trend, from left to right, is ideal. This one shows high growth at the beginning, but, a steady, dramatic decline to $0. This is typical of "backfitting", where you think you have found the "golden goose", only to discover that it quits working. Makes it easy to quickly discern the potential value of your filter settings.
The above screenshot shows the area where you can "segment" your database. Segmenting allows you to filter the database, with promising filter settings, then run a portion of those returned plays "historically", meaning your looking at what happened in the past, using the portion including the "Divide 1st Segment" button, then you "drill down", modifying your filter settings to find factors that positively affect the number of plays, hit rate and ROI, etc., of those races, and once you have found what you believe is a set of filters that have value, you can use the "Divide 2nd Segment" to run that set of filter settings against a "fresh" set of races, from the same database. This screenshot does not show the proper way to segment. In the 1st segment you see that your filters returned 66 total plays, then you would "Start" with the first play (1) and run through, say, the 40th play (40). Then for the "Divide 2nd Segment" you would start with 41 and end with 66. So, you use the first 40 plays to drill down to some potentially good factors, then test those potentially good factors against the remaining plays in the database.
The "Analyze" button, when clicked, analyzes all the filters against the 1st segment and shows them to the right of this area. Below is a screenshot of that area:
The "Analyze" button, when clicked, analyzes all the filters against the 1st segment and shows them to the right of this area. Below is a screenshot of that area:
The above "Analyze Utility" area is the real "nitty gritty" of "AllDataBase". Here you will see every "ranked" factor in your filter area, listed, starting in the cell just right of "RUNNERS" (number of runners in the current segment).
Under each labeled factor are 3 groupings of numbers:
In the top group, the first number is the average ranking, for this factor, of all the winners. The number below that is the average ranking of the place horses, the number below that is the average ranking for the show horse, etc..
In the middle group, is the hit rate, for the top ranked horses, below that the hit rate for the 2nd ranked horses, 3rd ranked horses, etc., and finally,
In the bottom group, the ROI for the top ranked horses, 2nd ranked horses, 3rd ranked, etc..
Let's look at an example.
Look at the column labeled "Average Win":
The winners, in all the races in this segment, averaged a ranking of 7.3 (the winners were ranked higher, in this factor, than the place horses (6.9), the show horses (6.5), etc.. A good sign, the winners are ranked higher, in this factor, than the horses they beat.
In the 2nd group, hit rate, the top ranked horses, in this factor, won at a 24.9% rate, 2nd ranked horses won at a 14.9% rate, 3rd ranked won at 16%, etc.. The hit rate for the top ranked horses is a little low, but, look at the bottom group.
In the bottom group, the ROI. The top ranked horses had an ROI of 10.7%, now look at the lesser ranked horses, all negative ROI. Another positive sign, a decent hit rate coupled with a decent ROI for the top ranked horses, in this factor. You could now go back to the filtering area and change the settings for "Average Win" to a max of 11 and a min of 9, which would filter for the top ranked horse, only, in this factor. Then rerun the 1st segment, "drill down" some more and repeat, until you've found the best combinations of factors and rankings. Then click "Save 1st Segment" and all the groups and factors in the reports will be copied down below this report area, so you can have them to compare against to see if further filter changes are positive or negative.
This is just an example of what one might do to analyze the factors' potential values. However, beware that even the very best set-ups you can find will very likely not work forever. You must continually re-evaluate your set-ups and adjust as necessary, for them to continue to produce positive results.
Under each labeled factor are 3 groupings of numbers:
In the top group, the first number is the average ranking, for this factor, of all the winners. The number below that is the average ranking of the place horses, the number below that is the average ranking for the show horse, etc..
In the middle group, is the hit rate, for the top ranked horses, below that the hit rate for the 2nd ranked horses, 3rd ranked horses, etc., and finally,
In the bottom group, the ROI for the top ranked horses, 2nd ranked horses, 3rd ranked, etc..
Let's look at an example.
Look at the column labeled "Average Win":
The winners, in all the races in this segment, averaged a ranking of 7.3 (the winners were ranked higher, in this factor, than the place horses (6.9), the show horses (6.5), etc.. A good sign, the winners are ranked higher, in this factor, than the horses they beat.
In the 2nd group, hit rate, the top ranked horses, in this factor, won at a 24.9% rate, 2nd ranked horses won at a 14.9% rate, 3rd ranked won at 16%, etc.. The hit rate for the top ranked horses is a little low, but, look at the bottom group.
In the bottom group, the ROI. The top ranked horses had an ROI of 10.7%, now look at the lesser ranked horses, all negative ROI. Another positive sign, a decent hit rate coupled with a decent ROI for the top ranked horses, in this factor. You could now go back to the filtering area and change the settings for "Average Win" to a max of 11 and a min of 9, which would filter for the top ranked horse, only, in this factor. Then rerun the 1st segment, "drill down" some more and repeat, until you've found the best combinations of factors and rankings. Then click "Save 1st Segment" and all the groups and factors in the reports will be copied down below this report area, so you can have them to compare against to see if further filter changes are positive or negative.
This is just an example of what one might do to analyze the factors' potential values. However, beware that even the very best set-ups you can find will very likely not work forever. You must continually re-evaluate your set-ups and adjust as necessary, for them to continue to produce positive results.
Ok, now that we've seen some of the things that can be done, internally, solely within AllDataBase, let's get to the "nuts and bolts". How to populate the database. We need race cards and results files in AllDataBase for it to be worth anything. For this we must go back to AllData_J1.
Remember the above screenshot? It is of the AllData_J1, "Main" worksheet. Here we "List Cards", "List Results", select "With Results", enter our "Starting Card", our "Ending Card #", and enter a "1" in the "Batch Process" box. Now we can "Run Data", go get a cup of coffee, chill for a few minutes, and when we see the Timer window, we know all the race cards and results files have been processed, and are waiting in the "Data" worksheet (cells BW31 through EV-whatever).
In order to get them into AllDataBase we must first move all that data from AllData_J1 to another location, so AlldataBase can find it and import the data. To move the data from AllData_J1 we use the "OUTPUT" button. When we click that button, it opens a new worksheet, copies all the data from the prior mentioned range in sheet "Data", and pastes it in the new worksheet. Then that new worksheet is saved as "ZZOUTPUT.xls" and put in our Results folder, whose path was previously entered in cell K3, in AllData_J1, "Main" worksheet.
Once the data is in ZZOUTPUT.xls, in the results folder, we can close AllData_J1, we're done with it for now.
Now open the "AllDataBase" workbook, and on the "D" worksheet (the first screenshot at the top of this page), click the "Import Database" button. This will copy the data from "ZZOUTPUT.xls" and paste it in worksheet "C", starting in cell A17, above any data that was already in the database.
If you want to clear old data out of AllDataBase, before importing new data, go to worksheet "C", and around cells C8 and C9 you will see a button named "ERASE". Click "ERASE" and all the old data, starting down in cell A17 will be erased. Then you can go back to sheet "D", and click the "Import Database" button. This will paste the new data from "ZZOUTPUT.xls" into sheet "C", starting in cell A17.
Now you can click "Reset", if you want to go back to the default filter settings, or if you want to keep the settings you have, click "Query Database". This will cause all the races you have imported to appear starting in cell A41. Check to make sure they are the files you ran in AllData_J1.
Once you're satisfied that the data is correct, you're ready to set filters, segment, analyze, drill down, etc., etc..
In order to get them into AllDataBase we must first move all that data from AllData_J1 to another location, so AlldataBase can find it and import the data. To move the data from AllData_J1 we use the "OUTPUT" button. When we click that button, it opens a new worksheet, copies all the data from the prior mentioned range in sheet "Data", and pastes it in the new worksheet. Then that new worksheet is saved as "ZZOUTPUT.xls" and put in our Results folder, whose path was previously entered in cell K3, in AllData_J1, "Main" worksheet.
Once the data is in ZZOUTPUT.xls, in the results folder, we can close AllData_J1, we're done with it for now.
Now open the "AllDataBase" workbook, and on the "D" worksheet (the first screenshot at the top of this page), click the "Import Database" button. This will copy the data from "ZZOUTPUT.xls" and paste it in worksheet "C", starting in cell A17, above any data that was already in the database.
If you want to clear old data out of AllDataBase, before importing new data, go to worksheet "C", and around cells C8 and C9 you will see a button named "ERASE". Click "ERASE" and all the old data, starting down in cell A17 will be erased. Then you can go back to sheet "D", and click the "Import Database" button. This will paste the new data from "ZZOUTPUT.xls" into sheet "C", starting in cell A17.
Now you can click "Reset", if you want to go back to the default filter settings, or if you want to keep the settings you have, click "Query Database". This will cause all the races you have imported to appear starting in cell A41. Check to make sure they are the files you ran in AllData_J1.
Once you're satisfied that the data is correct, you're ready to set filters, segment, analyze, drill down, etc., etc..
Just a couple more things, for now. In the above screenshot, from AllDataBase, sheet "D", there are several buttons. The "Query Database" button is the same as the "Query Database" button below the graph, in the "segmenting" area. This way you don't have to move your curser all over the place to query the database. The "Analyze" button does the same thing as the "Analyze" button down in the "segmenting" area, whatever data was returned by the "DIVIDE 1ST SEGMENT" macro will be analyzed and will appear in the reports area to the right of the "segmenting" area.
The "RESET" button, returns all the filter settings back to their default settings, except for the "Date" and "Distance" filter settings, which aren't affected by the "Reset" button, you must change these 2 filters manually.
The "Import Database" button imports the "ZZOUTPUT.xls" file data and pastes it in sheet "C".
We'll talk about the "IMPORT DAILY" button, the "APPLY TO DAILY" button, and the "SAVE SETUP" button later (I don't even know how they work yet, gotta get back with Harry to find out).
The "UNIQUE DIST" button lists all the different race distances in the database, along with the number of plays for each of those "unique" distances, down below the "segmenting" area, starting in cell H56.
Here's a screenshot of that "Unique Distances" area:
The "RESET" button, returns all the filter settings back to their default settings, except for the "Date" and "Distance" filter settings, which aren't affected by the "Reset" button, you must change these 2 filters manually.
The "Import Database" button imports the "ZZOUTPUT.xls" file data and pastes it in sheet "C".
We'll talk about the "IMPORT DAILY" button, the "APPLY TO DAILY" button, and the "SAVE SETUP" button later (I don't even know how they work yet, gotta get back with Harry to find out).
The "UNIQUE DIST" button lists all the different race distances in the database, along with the number of plays for each of those "unique" distances, down below the "segmenting" area, starting in cell H56.
Here's a screenshot of that "Unique Distances" area:
Well, I think that's enough for now. Do some experimenting and get familiar with the workbook. Don't forget to save the original download somewhere, so if you mess something up you will have the original one still on your computer.
Please read the help documents located on the sub-page, under the "AllDataBase" tab, named: "AllDataBase Help Document".
Download for AllDataBase_v10
New Upated Version: Here's the latest version of AllDataBase, with all known bugs fixed. This version runs in all versions of Excel, however, it runs much slower in Excel 2007 and we suggest that you use a pre-2007 version with both workbooks.
Anyway, click the download below to get the latest version. Please read the help documents located on the sub-page, under the "AllDataBase" tab, named: "AllDataBase Help Document".
Ray (Raybo)
Anyway, click the download below to get the latest version. Please read the help documents located on the sub-page, under the "AllDataBase" tab, named: "AllDataBase Help Document".
Ray (Raybo)
alldatabase_v10.zip | |
File Size: | 5104 kb |
File Type: | zip |