Author |
Message |
Tango
Joined: 18 Mar 2005 Posts: 24
|
Posted: Fri Mar 18, 2005 12:53 pm Post subject: Feeds for daily NBA standings, schedules, boxscores? |
|
|
Hi all:
I've been putting together graphs for win/loss trend analysis like the following...
I'm trying to automate the collection of the data so that it's easier to create these graphs. Currently it's a pretty manual process of copying and pasting standings & team schedules+outcomes to do this.
What sources of daily standings and schedules do you all recommend for doing this?
I did visit Doug Steele's site so I could simplify the process some by taking the daily scores and loading them into a flat file for crunching. I was looking for a little more sophistication to do this.
I would appreciate everyone's thoughts on the matter. Thanks! |
|
|
|
Ed Küpfer
Joined: 30 Dec 2004 Posts: 785 Location: Toronto
|
Posted: Fri Mar 18, 2005 2:43 pm Post subject: Re: Feeds for daily NBA standings, schedules, boxscores? |
|
|
Hey Tango.
Tango wrote: | I did visit Doug Steele's site so I could simplify the process some by taking the daily scores and loading them into a flat file for crunching. I was looking for a little more sophistication to do this. |
I have a spreadsheet that does all that. Since you're only looking for day-to-day wins and losses, why not use Doug Steele's Game Results page? _________________ ed |
|
|
|
wtbarron
Joined: 10 Feb 2005 Posts: 12
|
Posted: Fri Mar 18, 2005 3:38 pm Post subject: |
|
|
If you're a coder, you could do what I do and parse everything straight out of NBA.com. |
|
|
|
Tango
Joined: 18 Mar 2005 Posts: 24
|
Posted: Fri Mar 18, 2005 4:24 pm Post subject: |
|
|
Hi Ed:
Thanks for the suggestion. I was referring to Doug's Game Results page as well. I could certainly do a "save as" of his page into a txt file and then use Excel or Access to clean up & format the data as I need it. That saves a bit of work for sure! However, I was looking at doing something with a little more automation, something more along the lines of what wtbarron is suggesting.
How much work are you doing with your spreadsheet to make the data collection (e.g. macros etc.) more painless for you? I'd be curious to know to help me gauge what level of effort I want to put into the automation.
wtbarron:
I'm a novice at all this, so you'll have to bear with me! I'm tech savvy enough to attempt the coding needed to get the parsing to work, though. Can you give me some advice on how I would go about doing that? I'm comfortable coding in PHP, have a tad of experience with JS, and know how to manipulate MySQL DBs. What tools and techniques would you suggest for parsing data from websites? Thanks so much! |
|
|
|
wtbarron
Joined: 10 Feb 2005 Posts: 12
|
Posted: Fri Mar 18, 2005 6:28 pm Post subject: |
|
|
Tango wrote: | wtbarron:
I'm a novice at all this, so you'll have to bear with me! I'm tech savvy enough to attempt the coding needed to get the parsing to work, though. Can you give me some advice on how I would go about doing that? I'm comfortable coding in PHP, have a tad of experience with JS, and know how to manipulate MySQL DBs. What tools and techniques would you suggest for parsing data from websites? Thanks so much! |
The approach I use goes like this:
1. Download a team's schedule document (one of these).
2. Parse it for a home box score URL (if you do home and visiting, you'll end up doing every box score twice).
3. Download the box score document (one of these).
4. Parse it for stats. I assign each player's stats to an instance of a class that acts as a stat-holder object, which eventually gets serialized to persistence. You could just send it straight to MySQL.
5. Repeat from 2 until the end of the document.
6. Repeat from 1 until you've done every team.
The parsing, of course, is the hard part. Search the document string for the unique bits of HTML that mark the data you want. Then use your HTML "markers" as points of reference to extract substrings containing the data.
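For anyone following along in PHP, here is a minimal sketch of the marker approach just described, assuming PHP 7.1 or later. The helper name extract_between() and the HTML fragment are made up for illustration; real schedule pages will need their own markers.

```php
<?php
// Hypothetical helper: return the text between two marker strings,
// or null if either marker is missing. $offset lets a caller resume
// the search after an earlier match.
function extract_between(string $html, string $start, string $end, int $offset = 0): ?string
{
    $from = strpos($html, $start, $offset);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);           // skip past the start marker itself
    $to = strpos($html, $end, $from);  // first end marker after the start
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up fragment; the class name and score are placeholders.
$html = '<td class="result">W 102-97</td>';
$result = extract_between($html, '<td class="result">', '</td>');
echo $result, "\n";
```

Calling the helper in a loop and passing the position of the previous match as $offset is one way to walk through every game on a page.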
Actually, if all you need is what's on your charts, I think you can get by without even bothering with 3 and 4. The outcomes of the games are on the schedule documents themselves. This would also keep you from having to cope with the vagaries of their box scores (missing player names, changing formats, and even completely missing box scores).
I've never written any PHP, so I don't know what sort of string manipulation or HTTP socket functionality it has, if any. My JavaScript is rusty, but if I remember correctly, I think it can do all of the above. (Perl would be perfect for this. I used Java because I had already used it to write about half of the code I needed.) An HTML or XML editor would be helpful for choosing the right HTML markers. |
|
|
|
Tango
Joined: 18 Mar 2005 Posts: 24
|
Posted: Sat Mar 19, 2005 5:13 pm Post subject: |
|
|
wtbarron:
Thanks for the excellent tips!
Here's a tutorial for anyone interested in doing this in PHP. It takes wtbarron's steps and breaks them out in a bit more detail.
http://www.jjwdesign.com/data_mining_functions.html
I just tried the snoopy.php.class created by someone to do the fetching (step 1 listed by wtbarron) and it works like a charm. Now working on how to format the output for parsing. |
|
|
|
wtbarron
Joined: 10 Feb 2005 Posts: 12
|
Posted: Sat Mar 19, 2005 7:43 pm Post subject: |
|
|
My pleasure, Tango. Great idea, finding that PHP project. |
|
|
|
apparition312
Joined: 16 Feb 2005 Posts: 6 Location: Chicago, IL
|
Posted: Sun Mar 20, 2005 10:22 am Post subject: |
|
|
I've done this for a fantasy basketball site I'm working on. I use ESPN's box scores. First I set up a MySQL database with several tables: games, standings, playerstats, teamstats and players.
Next, I populated the player info by converting Patricia's rosters into a CSV file, and populated the games info from the daily schedule on NBA.com.
Then I wrote a script that takes each box score and matches it to a game in the games table. Then it goes through each stat line and adds the info to the playerstats table. It takes the totals for each team and adds them to the teamstats table.
To get the standings, I start with 1 entry for each team that includes all zeroes and, based on what happened in the game being parsed, increment the values accordingly.
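A rough sketch of that tallying step, assuming PHP 7 or later and using a plain array in place of the standings table. The function name record_game(), the team names, and the scores are all made up for illustration.

```php
<?php
// Hypothetical helper: apply one final score to a standings array
// keyed by team name. Teams not seen yet start from all zeroes.
function record_game(array $standings, string $home, int $homePts, string $away, int $awayPts): array
{
    foreach ([$home, $away] as $team) {
        if (!isset($standings[$team])) {
            $standings[$team] = ['wins' => 0, 'losses' => 0];
        }
    }
    // NBA games cannot end tied, so one side always wins.
    $winner = $homePts > $awayPts ? $home : $away;
    $loser  = $homePts > $awayPts ? $away : $home;
    $standings[$winner]['wins']++;
    $standings[$loser]['losses']++;
    return $standings;
}

// Placeholder results, not real games.
$standings = [];
$standings = record_game($standings, 'Team A', 98, 'Team B', 91);
$standings = record_game($standings, 'Team B', 105, 'Team A', 100);
print_r($standings);
```

The same increments could be issued as UPDATE statements against a standings table instead of an in-memory array.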
All I need to do is download each day's HTML boxscores and upload each one into my script. (I could probably make it so I can do all of them at once, but I like seeing the results of each game as I upload them.)
Once all the data is in the database, I can make scripts to do pretty much whatever I want - standings, box scores, player profiles, league leaders, stat queries, etc.
If you want more info about setting up the tables or parsing the stats, let me know. |
|
|
|
Tango
Joined: 18 Mar 2005 Posts: 24
|
Posted: Mon Mar 21, 2005 10:29 am Post subject: |
|
|
apparition312:
Thanks for the tips. I could use help with the code for parsing the data. My current understanding is to take the following approach:
(1) find the start and end "markers" I'm looking for in the html
(2) extract the text between the start & end markers into a string
(3) go through the string and break it down into the parts I'm looking for and store them into an array
(4) write the array into the DB
Does this sound about right? I'm now looking for the appropriate PHP functions to use for all this - especially for #1 and #2. |
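A rough sketch of steps (3) and (4), assuming PHP 7.1 or later with the PDO extension. The helper names row_to_cells() and save_line(), the row markup, and the column names are all placeholders rather than anything taken from a real box score.

```php
<?php
// Hypothetical helper for step (3): turn one table row into an array
// of clean cell values, keeping empty cells so columns stay aligned.
function row_to_cells(string $rowHtml): array
{
    preg_match_all('/<td[^>]*>(.*?)<\/td>/is', $rowHtml, $m);
    return array_map(function ($cell) {
        return trim(strip_tags($cell));
    }, $m[1]);
}

// Hypothetical helper for step (4): $cells must hold exactly three
// values here; the table and column names are assumptions.
function save_line(PDO $db, array $cells): void
{
    $stmt = $db->prepare('INSERT INTO playerstats (player, minutes, points) VALUES (?, ?, ?)');
    $stmt->execute($cells);
}

// Made-up row; a real box score row has many more columns.
$row = '<td>Player One</td><td>38</td><td>21</td>';
$cells = row_to_cells($row);
print_r($cells);
```

Using a prepared statement in save_line() keeps parsed text from being interpreted as SQL, which matters when the input is scraped from someone else's page.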
|
|
|
apparition312
Joined: 16 Feb 2005 Posts: 6 Location: Chicago, IL
|
Posted: Mon Mar 21, 2005 11:43 am Post subject: |
|
|
Tango, what I do is look through the HTML source of the boxscore to find unique identifiers for the info I'm looking for. I like ESPN's boxscores because their code is pretty easy to parse and they use first and last names in the boxscores, which makes matching stat lines to the player database easier.
Basically I parse the HTML for the game date and home and away teams to match to the games database (and get a corresponding game id). Then I go through each stat line and grab the stats. The method is pretty much the same for each.
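One possible way to do that matching, sketched with a plain array standing in for the games table and assuming PHP 7 or later. The function game_key(), the ids, the date, and the team names are placeholders.

```php
<?php
// Hypothetical lookup key built from the date and both team names,
// normalized so stray whitespace or capitalization does not break the match.
function game_key(string $date, string $away, string $home): string
{
    return $date . '|' . strtolower(trim($away)) . '|' . strtolower(trim($home));
}

// Stand-in for rows loaded from the games table (made-up values).
$games = [
    game_key('2005-03-20', 'Team A', 'Team B') => 101,
    game_key('2005-03-20', 'Team C', 'Team D') => 102,
];

// Values as they might come out of a parsed box score.
$key = game_key('2005-03-20', ' team a ', 'TEAM B');
$gameId = $games[$key] ?? null;   // null means no matching game was found
var_dump($gameId);
```

With a real database the same idea becomes a SELECT on the games table filtered by date and the two teams.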
I start with a pattern - the away team, for example - then use preg_match() to find the pattern within the HTML ($data) and store the value in an array called $away_info.
Code: |
// Capture the away team name; after the call, $away_info[1] holds the captured text.
$pattern = "/<div id=\"awayTeamName\" align=\"center\">([A-Za-z ]+).*<\/div>/";
preg_match($pattern,$data,$away_info);
|
When there are multiple matches of your pattern and your pattern contains multiple values, as in each player's stat line, the info will be stored in a multi-dimensional array. Use preg_match_all() for this, since preg_match() stops at the first match.
With the default ordering, $stats[0][0] will be the full text of the first player's stat line, $stats[1][0] will be the first value in that line, $stats[2][0] will be the second value, etc. You can use a for statement to go through each match and store each stat value in your database.
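To make the array layout concrete, here is a small sketch using preg_match_all() with its default ordering (PREG_PATTERN_ORDER). The markup and stat values are made up; a real box score pattern would be longer.

```php
<?php
// Made-up markup: two stat lines with name, minutes and points.
$data = '<tr><td>Player One</td><td>38</td><td>21</td></tr>'
      . '<tr><td>Player Two</td><td>25</td><td>8</td></tr>';

$pattern = '/<tr><td>([A-Za-z ]+)<\/td><td>(\d+)<\/td><td>(\d+)<\/td><\/tr>/';
$count = preg_match_all($pattern, $data, $stats);

// $stats[0] holds the full matches, $stats[1] the first captured value
// of every match, $stats[2] the second, and so on.
for ($i = 0; $i < $count; $i++) {
    echo $stats[1][$i], ': ', $stats[2][$i], ' min, ', $stats[3][$i], " pts\n";
}
```

Passing PREG_SET_ORDER as the fourth argument flips the layout so that $stats[0] is the whole first stat line with its values, which some people find easier to loop over.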
There's a lot more to it, but that's basically how I go about parsing the stats. |
|
|
|
|