APBRmetrics :: View topic - Feeds for daily NBA standings, schedules, boxscores?
APBRmetrics Forum Index APBRmetrics
The statistical revolution will not be televised.
 

Feeds for daily NBA standings, schedules, boxscores?

 
Tango



Joined: 18 Mar 2005
Posts: 24

Posted: Fri Mar 18, 2005 12:53 pm    Post subject: Feeds for daily NBA standings, schedules, boxscores?

Hi all:

I've been putting together graphs for win/loss trend analysis, like the following:

[graphs not preserved in this copy]

I'm trying to automate the collection of the data so that it's easier to create these graphs. Currently it's a fairly manual process of copying and pasting standings, team schedules, and outcomes.

What sources of daily standings and schedules do you all recommend for doing this?

I did visit Doug Steele's site so I could simplify the process somewhat by taking the daily scores and loading them into a flat file for crunching. I was looking for something a little more sophisticated, though.

I would appreciate everyone's thoughts on the matter. Thanks!
Ed Küpfer



Joined: 30 Dec 2004
Posts: 785
Location: Toronto

Posted: Fri Mar 18, 2005 2:43 pm    Post subject: Re: Feeds for daily NBA standings, schedules, boxscores?

Tango wrote:
Hi all:

Hey Tango.

Tango wrote:
I did visit Doug Steele's site so I could simplify the process somewhat by taking the daily scores and loading them into a flat file for crunching. I was looking for something a little more sophisticated, though.

I have a spreadsheet that does all that. Since you're only looking for day-to-day wins and losses, why not use Doug Steele's Game Results page?
_________________
ed
wtbarron



Joined: 10 Feb 2005
Posts: 12

Posted: Fri Mar 18, 2005 3:38 pm

If you're a coder, you could do what I do and parse everything straight out of NBA.com.
Tango



Joined: 18 Mar 2005
Posts: 24

Posted: Fri Mar 18, 2005 4:24 pm

Hi Ed:

Thanks for the suggestion. I was referring to Doug's Game Results page as well. I could certainly do a "save as" of his page into a txt file and then use Excel or Access to clean up and format the data as I need it. That saves a bit of work for sure! However, I was looking to do something with a little more automation, something more along the lines of what wtbarron is suggesting.

How much work do you do in your spreadsheet (macros, etc.) to make the data collection less painful? Knowing that would help me gauge how much effort I want to put into the automation.

wtbarron:

I'm a novice at all this, so you'll have to bear with me :) However, I'm tech savvy enough to try coding the parsing. Can you give me some advice on how to go about it? I'm comfortable coding in PHP, have a tad of experience with JS, and know how to manipulate MySQL databases. What tools and techniques would you suggest for parsing data from websites? Thanks so much!
wtbarron



Joined: 10 Feb 2005
Posts: 12

Posted: Fri Mar 18, 2005 6:28 pm

Tango wrote:
wtbarron:

I'm a novice at all this, so you'll have to bear with me :) However, I'm tech savvy enough to try coding the parsing. Can you give me some advice on how to go about it? I'm comfortable coding in PHP, have a tad of experience with JS, and know how to manipulate MySQL databases. What tools and techniques would you suggest for parsing data from websites? Thanks so much!

The approach I use goes like this:
    1. Download a team's schedule document (one of these).
    2. Parse it for a home box score URL (if you do home and visiting, you'll end up doing every box score twice).
    3. Download the box score document (one of these).
    4. Parse it for stats. I assign each player's stats to an instance of a class that acts as a stat-holder object, which eventually gets serialized to persistence. You could just send it straight to MySQL.
    5. Repeat from 2 until the end of the document.
    6. Repeat from 1 until you've done every team.
The parsing, of course, is the hard part. Search the document string for the unique bits of HTML that mark the data you want. Then use your HTML "markers" as points of reference to extract substrings containing the data.
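
For anyone working in PHP, the marker idea might be sketched roughly as below. This is only an illustration: `text_between()` is a hypothetical helper name, and the sample markup is made up rather than taken from NBA.com. It relies on the built-in `strpos()` and `substr()` functions.

```php
<?php
// Hypothetical helper (not from the thread): return the text found between
// two marker strings, or null if either marker is missing.
function text_between(string $html, string $start, string $end, int $offset = 0): ?string
{
    $from = strpos($html, $start, $offset);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);            // skip past the start marker itself
    $to = strpos($html, $end, $from);   // first end marker after the start
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Made-up sample markup; the real pages will use different markers.
$html = '<td class="score">DET 94, IND 81</td>';
$score = text_between($html, '<td class="score">', '</td>');
echo $score, "\n";
```

The `$offset` parameter lets a caller resume the search after the previous hit, which is one way to walk through repeated markers in a single document.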

Actually, if all you need is what's on your charts, I think you can get by without even bothering with 3 and 4. The outcomes of the games are on the schedule documents themselves. This would also keep you from having to cope with the vagaries of their box scores (missing player names, changing formats, and even completely missing box scores).

I've never written any PHP, so I don't know what sort of string manipulation or HTTP socket functionality it has, if any. My JavaScript is rusty, but if I remember correctly, I think it can do all of the above. (Perl would be perfect for this. I used Java because I had already used it to write about half of the code I needed.) An HTML or XML editor would be helpful for choosing the right HTML markers.
Tango



Joined: 18 Mar 2005
Posts: 24

Posted: Sat Mar 19, 2005 5:13 pm

wtbarron:

Thanks for the excellent tips!

Here's a tutorial for anyone interested in doing this in PHP. It breaks wtbarron's steps out in a bit more detail.
http://www.jjwdesign.com/data_mining_functions.html

I just tried Snoopy.class.php, a class someone created, to do the fetching (step 1 in wtbarron's list) and it works like a charm. I'm now working on how to format the output for parsing.
wtbarron



Joined: 10 Feb 2005
Posts: 12

Posted: Sat Mar 19, 2005 7:43 pm

My pleasure, Tango. Great idea, finding that PHP project. ;)
apparition312



Joined: 16 Feb 2005
Posts: 6
Location: Chicago, IL

Posted: Sun Mar 20, 2005 10:22 am

I've done this for a fantasy basketball site I'm working on. I use ESPN's box scores. First I set up a MySQL database with several tables: games, standings, playerstats, teamstats and players.

Next, I populated the player info by converting Patricia's rosters into a CSV file, and populated the games info from the daily schedule on NBA.com.

Then I wrote a script that takes each box score and matches it to a game in the games table. Then it goes through each stat line and adds the info to the playerstats table. It takes the totals for each team and adds them to the teamstats table.

To get the standings, I start with 1 entry for each team that includes all zeroes and, based on what happened in the game being parsed, increment the values accordingly.
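
A rough sketch of that standings tally, assuming each parsed game is available as an array with hypothetical `home`, `away`, `home_pts` and `away_pts` keys. The function name and the sample scores are made up for illustration, and a real version would update the standings table rather than a PHP array.

```php
<?php
// Made-up sample results; in practice these come from the parsed box scores.
$games = [
    ['home' => 'DET', 'away' => 'IND', 'home_pts' => 94,  'away_pts' => 81],
    ['home' => 'IND', 'away' => 'CHI', 'home_pts' => 100, 'away_pts' => 90],
    ['home' => 'CHI', 'away' => 'DET', 'home_pts' => 85,  'away_pts' => 88],
];

// Hypothetical helper: build win/loss records from a list of finished games.
function tally_standings(array $teams, array $games): array
{
    // Start with one all-zero entry per team.
    $standings = [];
    foreach ($teams as $team) {
        $standings[$team] = ['wins' => 0, 'losses' => 0];
    }
    // Increment the winner's and loser's records for each game.
    foreach ($games as $g) {
        $homeWon = $g['home_pts'] > $g['away_pts'];
        $winner  = $homeWon ? $g['home'] : $g['away'];
        $loser   = $homeWon ? $g['away'] : $g['home'];
        $standings[$winner]['wins']++;
        $standings[$loser]['losses']++;
    }
    return $standings;
}

$standings = tally_standings(['DET', 'IND', 'CHI'], $games);
foreach ($standings as $team => $rec) {
    printf("%s %d-%d\n", $team, $rec['wins'], $rec['losses']);
}
```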

All I need to do is download each day's HTML boxscores and upload each one into my script. (I could probably make it so I can do all of them at once, but I like seeing the results of each game as I upload them.)

Once all the data is in the database, I can make scripts to do pretty much whatever I want: standings, box scores, player profiles, league leaders, stat queries, etc.

If you want more info about setting up the tables or parsing the stats, let me know.
Tango



Joined: 18 Mar 2005
Posts: 24

Posted: Mon Mar 21, 2005 10:29 am

apparition312:

Thanks for the tips. I could use help with the code for parsing the data. My current understanding is to take the following approach:

(1) find the start and end "markers" I'm looking for in the html
(2) serialize the text between the start & end into a string
(3) go through the string and break it down into the parts I'm looking for and store them into an array
(4) write the array into the DB

Does this sound about right? I'm now looking for the appropriate PHP functions to use for all this - especially for #1 and #2.
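
One possible way to line those four steps up with built-in PHP functions is sketched below. The markup, the marker strings and the column names (`player`, `min`, `pts`, `reb`) are all invented for the example, and step 4 is left as a comment because the table layout isn't specified here.

```php
<?php
// Made-up fragment; real box score markup will differ.
$html = '<tr class="stat"><td>Player One</td><td>38</td><td>10</td><td>14</td></tr>';

// (1) Locate the start and end markers.
$startMarker = '<tr class="stat">';
$endMarker   = '</tr>';
$start = strpos($html, $startMarker);
$end   = ($start === false) ? false : strpos($html, $endMarker, $start);

$row = [];
if ($start !== false && $end !== false) {
    // (2) Copy the text between the markers into a string.
    $start += strlen($startMarker);
    $inner = substr($html, $start, $end - $start);

    // (3) Break the string into cells, drop the tags, and name the parts.
    $cells = array_map('strip_tags', explode('</td>', $inner));
    $cells = array_values(array_filter(array_map('trim', $cells), 'strlen'));
    $row = array_combine(['player', 'min', 'pts', 'reb'], $cells);
}

// (4) $row is now ready to be bound to an INSERT statement.
print_r($row);
```

The values come out as strings, so they would need casting (or the database's own type conversion) before any arithmetic.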
apparition312



Joined: 16 Feb 2005
Posts: 6
Location: Chicago, IL

Posted: Mon Mar 21, 2005 11:43 am

Tango, what I do is look through the HTML source of the boxscore to find unique identifiers for the info I'm looking for. I like ESPN's boxscores because their code is pretty easy to parse and they use first and last names in the boxscores, which makes matching stat lines to the player database easier.

Basically I parse the HTML for the game date and home and away teams to match to the games database (and get a corresponding game id). Then I go through each stat line and grab the stats. The method is pretty much the same for each.

I start with a pattern - the away team, for example - then use preg_match() to find the pattern within the HTML ($data) and store the value in an array called $away_info.

Code:

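// Capture the away team's name; the lazy .*? stops at the first closing </div>.
$pattern = '/<div id="awayTeamName" align="center">([A-Za-z ]+).*?<\/div>/';
preg_match($pattern, $data, $away_info); // $away_info[1] holds the captured name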


When there are multiple matches of your pattern and your pattern contains multiple values, as in each player's stat line, use preg_match_all() instead of preg_match(), which stops at the first match. The info will then be stored in a multi-dimensional array.

With preg_match_all() in its default ordering, $stats[0][0] will be the full text of the first player's matched stat line, $stats[1][0] will be the first captured value in that line, $stats[2][0] will be the second value, etc. You can use a for statement to go through each match and store each stat value in your database.
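
As a hedged illustration of looping over several stat lines, the sketch below assumes `preg_match_all()` is used to collect every match. The markup and the pattern are made up for the example and are not ESPN's actual HTML.

```php
<?php
// Made-up markup for two stat lines: name, minutes, points.
$data = '<tr><td>Player One</td><td>38</td><td>21</td></tr>'
      . '<tr><td>Player Two</td><td>25</td><td>9</td></tr>';

$pattern = '/<tr><td>([A-Za-z ]+)<\/td><td>(\d+)<\/td><td>(\d+)<\/td><\/tr>/';

// Default PREG_PATTERN_ORDER: $stats[0] holds the full matches, and
// $stats[N] holds every value captured by group N.
$count = preg_match_all($pattern, $data, $stats);

for ($i = 0; $i < $count; $i++) {
    printf("%s: %d min, %d pts\n", $stats[1][$i], $stats[2][$i], $stats[3][$i]);
}

// PREG_SET_ORDER groups by match instead, so $rows[0] is the whole first line.
preg_match_all($pattern, $data, $rows, PREG_SET_ORDER);
```

With `PREG_SET_ORDER`, each `$rows[$i]` is one player's line, which can be more convenient when each match becomes one database row.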

There's a lot more to it, but that's basically how I go about parsing the stats.
All times are GMT - 5 Hours

 