APBRmetrics :: View topic - Box Scores Project

Box Scores Project
APBRmetrics Forum Index -> General discussion
mateo82
Joined: 06 Aug 2005
Posts: 211
Posted: Mon May 21, 2007 11:14 pm

I tried it on my home computer using ocrad and gocr, and it was indeed too noisy. I have Adobe Professional at work, and if I recall correctly it has fairly good OCR; I'll try it there tomorrow.
HeatherA
Joined: 03 Aug 2006
Posts: 55
Posted: Tue May 22, 2007 10:51 am

PaulG and I would both be happy to contribute our time to this effort if you decide to launch it.
Ed Küpfer
Joined: 30 Dec 2004
Posts: 787
Location: Toronto
Posted: Tue May 22, 2007 11:32 am

More notes: the PoR website caps the number of PDF downloads a single user can make in a 24-hour period. This means you should locate the box score pages selectively rather than paging through an entire issue of The Sporting News. Unfortunately, TSN doesn't have a table of contents, so there's a lot of flipping through pages to find the boxes.

Here are the three steps for inputting the box scores:
1. Track down the issue/page number of the boxscores in TSN.
2. Download that particular page.
3. Input the data into some spreadsheet.

#1 is a pain. If someone can find a search term that narrows down the hits, that would help. Because the PDF sources are large (~500 KB each), hunting for the right pages is a challenge. What I propose is to split up the effort a little: I am in the process of downloading the pages with the box scores from each issue onto my hard drive. Later, I or someone else can input the data from these pages; I can email them to anyone who wants to contribute an hour or two.

But I think that, because of the download restrictions, anyone who downloads a TSN page with box scores should take care to save a copy to their hard drive (I've been naming each file by issue date and page number to keep track).
_________________
ed
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 11:47 am

Ed Küpfer wrote:
Unfortunately, TSN doesn't have a table of contents, so there's a lot of flipping through pages to find the boxes.


There's usually a table of contents somewhere in the first 4-5 pages of each issue (at least for the ones I have seen).

Ed Küpfer wrote:
#1 is a pain. If someone can find a search term that narrows down the hits, that would help. Because the PDF sources are large (~500 KB each), hunting for the right pages is a challenge. What I propose is to split up the effort a little: I am in the process of downloading the pages with the box scores from each issue onto my hard drive. Later, I or someone else can input the data from these pages; I can email them to anyone who wants to contribute an hour or two.


First, what season are you working on? I just gave that 1979-80 box score as an example, not intending for that to be our test season. I actually started entering the 1969-70 NBA season yesterday and I am about 20 games in. I chose this as a test season for two reasons: (1) fewer teams, which means fewer games to enter and (2) the ABA box scores are also there, so I'll have all the files I need when I want to enter those.

My process for finding the box scores has been to set the date to the issue date and leave the search field blank. The first result should be the first page of that issue. I then look through the first few pages for the table of contents, and that helps me find the box scores.

Ed Küpfer wrote:
But I think that, because of the download restrictions, anyone who downloads a TSN page with box scores should take care to save a copy to their hard drive (I've been naming each file by issue date and page number to keep track).


Yes, saving copies of the PDFs is a great idea. Ed, can you give me an example of a file name that you are using? I just want to understand exactly how you are naming the files.
_________________
Regards,
Justin Kubatko
Basketball-Reference.com
Ed Küpfer
Joined: 30 Dec 2004
Posts: 787
Location: Toronto
Posted: Tue May 22, 2007 11:59 am

jkubatko wrote:
First, what season are you working on? I just gave that 1979-80 box score as an example, not intending for that to be our test season. I actually started entering the 1969-70 NBA season yesterday and I am about 20 games in. I chose this as a test season for two reasons: (1) fewer teams, which means fewer games to enter and (2) the ABA box scores are also there, so I'll have all the files I need when I want to enter those.


I'm open. I spent the day yesterday exploring the possibilities of the PoR site. I can see that navigating it will be an obstacle for many people, which is why I thought that having a few people dedicated to finding and saving PDFs would let the people who just want to enter data do it with less fuss.


Quote:
Ed, can you give me an example of a file name that you are using? I just want to understand exactly how you are naming the files.


yyyy-mm-dd.PageNum.pdf
1979-10-27.57.pdf = Oct 27, 1979, page 57.
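A minimal Python sketch of the naming scheme, assuming files are named exactly as in the example above (ISO date, a dot, an unpadded page number, then .pdf). The helper names make_filename and parse_filename are hypothetical, not part of any existing tool.

```python
import re
from datetime import date

# Agreed scheme: yyyy-mm-dd.PageNum.pdf, e.g. 1979-10-27.57.pdf
_NAME_RE = re.compile(r"^(\d{4})-(\d{2})-(\d{2})\.(\d+)\.pdf$")

def make_filename(issue_date: date, page: int) -> str:
    """Build a file name such as 1979-10-27.57.pdf from an issue date and page."""
    return f"{issue_date.isoformat()}.{page}.pdf"

def parse_filename(name: str) -> tuple[date, int]:
    """Recover (issue date, page number) from a name that follows the scheme."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not in yyyy-mm-dd.PageNum.pdf form: {name!r}")
    year, month, day, page = (int(g) for g in m.groups())
    return date(year, month, day), page

print(make_filename(date(1979, 10, 27), 57))  # prints 1979-10-27.57.pdf
print(parse_filename("1979-10-27.57.pdf"))
```

One caveat: because page numbers are not zero-padded, a plain alphabetical listing puts 1979-10-27.57.pdf ahead of 1979-10-27.6.pdf; sorting with key=parse_filename restores issue-then-page order.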
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 12:02 pm

Right now we have Ed, Ryan, Heather, Paul, Mateo (yes?), and myself as people who are interested in trying this out on a test season. In the 1969-70 season there were 7*82 = 574 regular-season games (14 teams playing 82 games each, with two teams in every game), so that would mean about 100 games each. It takes me about 3 minutes to enter a game by hand, so that's about 5-6 hours of work. Here is the first game in my 1969-70 file:

Code:

"date","lgID","teamID","oppID","name","FG","FT","FTA","DQ"
10/14/1969,"NBA","SEA","NYK","Allen",3,0,0,
10/14/1969,"NBA","SEA","NYK","Boozer",3,4,4,
10/14/1969,"NBA","SEA","NYK","Clemens",2,2,2,
10/14/1969,"NBA","SEA","NYK","Harris",4,1,1,
10/14/1969,"NBA","SEA","NYK","Meschery",0,2,3,
10/14/1969,"NBA","SEA","NYK","Mueller",1,0,2,
10/14/1969,"NBA","SEA","NYK","Murrey",0,0,0,
10/14/1969,"NBA","SEA","NYK","Rule",11,5,7,
10/14/1969,"NBA","SEA","NYK","Thorn",1,1,2,
10/14/1969,"NBA","SEA","NYK","Tresvant",4,10,12,
10/14/1969,"NBA","SEA","NYK","Wilkens",3,4,7,
10/14/1969,"NBA","SEA","NYK","Winfield",3,2,2,
10/14/1969,"NBA","NYK","SEA","Barnett",10,2,4,
10/14/1969,"NBA","NYK","SEA","Bowman",0,0,0,
10/14/1969,"NBA","NYK","SEA","Bradley",5,2,2,
10/14/1969,"NBA","NYK","SEA","DeBusschere",6,1,3,
10/14/1969,"NBA","NYK","SEA","Frazier",5,6,9,
10/14/1969,"NBA","NYK","SEA","Hosket",0,0,0,
10/14/1969,"NBA","NYK","SEA","May",0,3,3,
10/14/1969,"NBA","NYK","SEA","Reed",14,0,0,
10/14/1969,"NBA","NYK","SEA","Riordan",5,2,2,
10/14/1969,"NBA","NYK","SEA","Russell",4,0,0,
10/14/1969,"NBA","NYK","SEA","Stallworth",4,0,0,
10/14/1969,"NBA","NYK","SEA","Warren",1,2,2,


No need to include fields for rebounds, etc., at the moment, because they are not listed in the box scores; at the end I'll just add them as null fields. After we get the season done, I'll put all of the files together and match things to my database. That way we can link the names to my player ID system. I'll also do various QC checks, like making sure that the players' points add up to the team's points for every game.
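The points check described above could look something like this Python sketch. It assumes the CSV layout shown above and scores each player as 2*FG + FT, which holds for the 1969-70 NBA (no three-point shot until 1979-80) but would need a three-point column for the ABA, which already used a three-point line. The function names are hypothetical, and the expected team scores are placeholders for whatever independent source is used.

```python
import csv
import io
from collections import defaultdict

def team_points(csv_text):
    """Sum player points per (date, teamID), scoring each player as 2*FG + FT.

    Assumes no three-point field goals (true for the 1969-70 NBA).
    Raises ValueError if a row has more free throws made than attempted.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        fg, ft, fta = int(row["FG"]), int(row["FT"]), int(row["FTA"])
        if ft > fta:
            raise ValueError(f"FT > FTA for {row['name']} on {row['date']}")
        totals[(row["date"], row["teamID"])] += 2 * fg + ft
    return dict(totals)

def mismatches(computed, expected):
    """Return {key: (summed player points, expected team points)} where they differ."""
    return {k: (computed.get(k), pts) for k, pts in expected.items() if computed.get(k) != pts}

# Three rows from the game above, same layout (the trailing comma leaves DQ empty).
SAMPLE = '''"date","lgID","teamID","oppID","name","FG","FT","FTA","DQ"
10/14/1969,"NBA","SEA","NYK","Rule",11,5,7,
10/14/1969,"NBA","SEA","NYK","Tresvant",4,10,12,
10/14/1969,"NBA","NYK","SEA","Reed",14,0,0,
'''

computed = team_points(SAMPLE)
# Placeholder team scores; in practice these come from an independent source.
print(mismatches(computed, {("10/14/1969", "SEA"): 45, ("10/14/1969", "NYK"): 28}))  # prints {}
```

Summed over all 24 rows of the game above, the formula gives 101 for SEA and 126 for NYK; those are the figures to compare against the team totals printed in the box score.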

Should we start to split up the issues so we can get to work, or are there other things we need to discuss?
ziller
Joined: 30 Jun 2005
Posts: 126
Location: Sac Metro
Posted: Tue May 22, 2007 12:05 pm

I'd like to help out with this, Justin/Ed. Just add me to the list.
_________________
SactownRoyalty.com
tziller@gmail.com
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 12:05 pm

Ed Küpfer wrote:
yyyy-mm-dd.PageNum.pdf
1979-10-27.57.pdf = Oct 27, 1979, page 57.


Perfect. Let's use that naming scheme.
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 12:11 pm

FYI, please don't start grabbing PDF files of the box scores yet, as we don't want two people doing the same work. That would just be a waste of time and of PoR's bandwidth.

So we'll do the 1969-70 season. Items to address:

1) Get volunteers. If you're not sure you can do it for at least one test season, then don't volunteer. Let's wait another day or two and see if anyone else is interested.

2) Divvy up the issues. The 1969-70 issues have both NBA and ABA box scores, so we might as well grab the pages that have the ABA box scores, even though we'll just start with the NBA season.
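One way to divvy up the issues is a greedy split that balances box scores rather than issue counts. This Python sketch hands each issue, largest first, to whoever currently has the fewest games (the longest-processing-time-first heuristic). The function name divvy_issues, the per-issue game counts, and the issue dates in the usage lines are all hypothetical inputs, to be replaced with figures from the actual TSN pages.

```python
import heapq

def divvy_issues(issue_games, volunteers):
    """Assign issues so each volunteer gets a similar number of box scores.

    issue_games: list of (issue_date, number_of_box_scores) pairs.
    volunteers:  list of names.
    Returns {volunteer: sorted list of issue dates}.
    """
    plan = {v: [] for v in volunteers}
    # Heap entries are (games assigned so far, position, name); the position
    # breaks ties so names are never compared.
    heap = [(0, i, v) for i, v in enumerate(volunteers)]
    heapq.heapify(heap)
    # Largest issues first, each to the currently least-loaded volunteer.
    for issue, games in sorted(issue_games, key=lambda p: p[1], reverse=True):
        load, i, v = heapq.heappop(heap)
        plan[v].append(issue)
        heapq.heappush(heap, (load + games, i, v))
    for issues in plan.values():
        issues.sort()  # yyyy-mm-dd strings sort chronologically
    return plan

# Hypothetical issue dates and game counts, for illustration only.
plan = divvy_issues(
    [("1969-11-01", 30), ("1969-11-08", 28), ("1969-11-15", 25), ("1969-11-22", 20)],
    ["Ed", "Justin"],
)
print(plan)
```

With these made-up numbers, the two volunteers end up with 50 and 53 games; a simple round-robin by issue would ignore how many games each issue holds.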
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 12:32 pm

Update: I just grabbed all of the NBA and ABA box score pages from the 1969 part of the 1969-70 season.
mateo82
Joined: 06 Aug 2005
Posts: 211
Posted: Tue May 22, 2007 2:14 pm

Yes, I'm in.

I'm not sure I understand how you want this formatted, though. I'm assuming B-R uses a MySQL database, right? So, do you want just one big file with each line representing one player's stat line for a particular night, or do you want one file for each day or one file for each game?
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Tue May 22, 2007 2:20 pm

mateo82 wrote:
Yes, I'm in.

I'm not sure I understand how you want this formatted, though. I'm assuming B-R uses a MySQL database, right? So, do you want just one big file with each line representing one player's stat line for a particular night, or do you want one file for each day or one file for each game?


At the end it will be one big file with one stat line per player, per game.
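Since everyone will submit separate files that end up as one big file, a small merge step helps. This Python sketch (merge_files is a hypothetical name) concatenates the volunteers' CSV files, keeps a single header row, and stops if any file's columns differ from the first file's. Python's csv writer quotes only where needed, so "NBA" comes out as NBA; the values themselves are unchanged.

```python
import csv

def merge_files(paths, out_path):
    """Concatenate per-volunteer CSV files into one file with one header row.

    Raises ValueError if a file's header differs from the first file's header.
    Returns the number of data rows written.
    """
    header = None
    rows_written = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:  # empty file: nothing to copy
                    continue
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path}: columns {file_header} differ from {header}")
                for row in reader:
                    if row:  # csv.reader yields [] for blank lines; skip them
                        writer.writerow(row)
                        rows_written += 1
    return rows_written

# Hypothetical usage: one file per volunteer, merged into a single season file.
# merge_files(["ed.csv", "justin.csv"], "1969-70.csv")
```

Checking the header on every file catches the most likely problem when several people type data independently: a missing or reordered column.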
Ryan J. Parker
Joined: 23 Mar 2007
Posts: 711
Location: Raleigh, NC
Posted: Tue May 22, 2007 8:56 pm

We definitely don't want to duplicate effort, so let's divide up the seasons so we know where to focus.

Oh, and the naming convention sounds good.
94by50
Joined: 01 Jan 2006
Posts: 499
Location: Phoenix
Posted: Wed May 23, 2007 4:08 am

I've done plenty of data entry in the past. I'd love to help.
jkubatko
Joined: 05 Jan 2005
Posts: 702
Location: Columbus, OH
Posted: Wed May 23, 2007 8:11 am

94by50 wrote:
I've done plenty of data entry in the past. I'd love to help.


Great. By my count that brings us up to 8:

Justin
Ed
Ryan
Heather
Paul
Mateo
Ziller
94by50

Let's wait until tomorrow to see if anyone else is interested in helping out. I have already entered the box scores from the first two issues of the 1969-70 season, and I have downloaded the pages with the box scores through January. Once again, so that we don't repeat the same work, you don't need to do anything until we have made assignments for the test season.
All times are GMT - 5 Hours
Page 2 of 7