Penn State's Schedule of Courses -- un-"dapped" potential
Daehee Park recently wrote about scraping Penn State's Schedule of Courses to create a free API of course data. His method of choice was to use Beautiful Soup for Python. As I have yet to learn Python, I will be taking a different approach towards the same end: Dapper. Dapper bills itself as a tool for creating an API for any website; hopefully this includes schedule.psu.edu.
To use Dapper, you must "train" it to scrape your desired data. It supposedly works best on templated pages. You simply supply the "Dapp factory" tool with several URLs of pages that use the same template. Then you highlight the desired data to scrape from each URL. This process is very hit-or-miss. Selecting one piece of data may cause other data to be sporadically selected. Sometimes this data is what you wanted (in which case you may find yourself praising its algorithm as genius) and sometimes the data isn't quite what you had in mind (in which case you'll be cursing the algorithm for wasting so much of your time). Ilan Flint of Dapper Support wrote:
Dapper's method of action is finding an algorithm that will "understand", in machine terms, what is the content that you wanted to choose.
The purpose and meaning of each element in a web page, something that is very intuitively understood by you, is completely incoherent to a computer, and that's the gap that Dapper bridges.
Once your "dapp" has been created, you can choose from a wide variety of output formats, including XML, RSS, iCal, Google Maps, and HTML, to name a few. There is even a feature that lets you "link" the output of one dapp to the input of another.
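To give a feel for what consuming a dapp's XML output might look like, here is a minimal Python sketch. The XML layout, the element names (`course`, `number`, `title`, `seats_open`), and the sample values are all assumptions invented for illustration; a real dapp's output is shaped by whatever fields you define while training it.

```python
import xml.etree.ElementTree as ET

# Hypothetical dapp output. The element names and values are made up
# for illustration and do not reflect Dapper's actual XML format.
SAMPLE_XML = """\
<courses>
  <course>
    <number>PL SC 001</number>
    <title>Example Course A</title>
    <seats_open>12</seats_open>
  </course>
  <course>
    <number>PL SC 003</number>
    <title>Example Course B</title>
    <seats_open>0</seats_open>
  </course>
</courses>
"""


def parse_courses(xml_text):
    """Turn the (assumed) XML layout into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    courses = []
    for node in root.findall("course"):
        courses.append({
            "number": node.findtext("number", default="").strip(),
            "title": node.findtext("title", default="").strip(),
            "seats_open": int(node.findtext("seats_open", default="0")),
        })
    return courses


if __name__ == "__main__":
    for course in parse_courses(SAMPLE_XML):
        print(course["number"], course["title"], course["seats_open"])
```

Once the data is in plain dictionaries like this, it no longer matters whether it came from Dapper, Beautiful Soup, or anything else.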
While Dapper has huge potential, it can be very unrefined at times. It has been slow or has crashed more than half the time I've been using it. I found that it works best in the Opera browser rather than the more mainstream Firefox and Safari. The folks over at Dapper are aware of these issues and are working on them. This is what Flint wrote in reply to my support email about the site's performance issues:
What you experienced was a temporary issue - one that hopefully will not repeat itself.
Dapper is aware of these issues popping-up every now and then, and is constantly working hard to improve the stability and reliability.
You've still gotta give the Dapper people credit. Their interface uses Ajax, which can be prone to crashing and is still full of compatibility issues. I think they will continue to face an uphill battle when it comes to ensuring a smooth user experience.
Nevertheless, I can see many neat possibilities using the Penn State Schedule of Courses and Dapper. Here are some that sprang to mind:
- an RSS feed for the number of seat openings left in a given course
- a Google Maps mashup of a student's course schedule, showing schedule possibilities that require the least amount of walking (or the most amount of walking if you'd like to stave off the freshman 15)
- an iCal or Google Calendar mashup in which a student's schedule can be easily imported into one of those formats
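
As a rough starting point for the first idea, here is a Python sketch that turns a list of course records into a bare-bones RSS 2.0 feed of seat openings. It does not talk to Dapper or to schedule.psu.edu at all: the function name `seats_to_rss`, the record fields, the placeholder feed link, and the sample values are all hypothetical, and the course data is assumed to have been scraped already by some other means.

```python
import xml.etree.ElementTree as ET


def seats_to_rss(courses, feed_title="Open seats",
                 feed_link="http://example.com/seats"):
    """Build a minimal RSS 2.0 document from course records.

    Each record is assumed to be a dict with "number", "title",
    and "seats_open" keys (a made-up layout for this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires a title, link, and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Seat openings by course"

    for course in courses:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = "%s: %d seats open" % (
            course["number"], course["seats_open"])
        ET.SubElement(item, "description").text = course["title"]

    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [
        {"number": "PL SC 001", "title": "Example Course A", "seats_open": 12},
    ]
    print(seats_to_rss(sample))
```

A real feed would also want a publication date and a unique identifier on each item so that feed readers can tell when the seat count changes.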
I'll be trying to bring these ideas to fruition, but feel free to beat me to it.
In the meantime, you can play with a Flash widget of a dapp I made. Simply type in a semester, campus, and department, and the widget will return a list of course numbers. Don't know where to begin? Try typing "fall" for semester, "up" for campus, and "pl_sc" for department. There may be some errors, and I make no claims about the accuracy of the widget.
18 July 2008