Subletting in Waterloo
I'm currently a student at the University of Waterloo, and this summer I was faced with the same problem that has, at one point or another, plagued every student here: finding a 4-month fall sublet. Because of the co-op program, nobody wants to commit to a full-year lease, and the influx of new students makes the leasing and subletting market even tighter, so finding a 4-month sublet is almost impossible. The sublet market, which exists almost entirely as posts in a Facebook group, is so competitive that if you see a post offering a place for 4 months in the fall and it's more than 15 minutes old, don't bother messaging. It's already taken. So how, besides sheer luck, was I going to find a place to stay for 4 months? I decided to apply a data analytics method I've been using for years: creating a Facebook shell app to collect and analyze the posts in a group or page.
Getting Data From Facebook
Facebook's Graph API is clunky. It's hard to find what you're looking for, especially when you're trying to use it for something it wasn't designed for. The unrestricted data that SRM companies sell to businesses like McDonald's, which includes every public post so they can find people talking about them, is NOT, under ANY circumstances, available to regular users. What regular users can get, in a way Facebook never really intended, is the posts made in groups and pages that you, as an individual user, have access to. Here's how to get it:
- Identify The Group ID: This is usually easy to find, since it's generally in the group's link. For example, in https://www.facebook.com/groups/110354088989367/ the group ID is 110354088989367. Otherwise, searching Google for a tool that looks up the group ID of a Facebook link works as well.
- Create a "Shell App": This isn't as hard as you might think; we only need the skeleton of an app. Go to https://developers.facebook.com/docs/apps/register and follow the directions. This shouldn't take more than 15 minutes.
- Get a Regular User Token: Go to the Graph API Explorer at https://developers.facebook.com/tools/explorer/ and get a User Access Token. Make sure to grant it every available permission. At this point you have created an app, made yourself its only user, and given it permission to read posts in the groups and pages you belong to. Also, hold on to this URL; it's a useful debugging tool throughout the process.
- Get an Extended User Token: Go to https://developers.facebook.com/tools/accesstoken/ and click "get extended token". The extended token should last about a month.
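Steps 1 and 4 can also be scripted. The sketch below is a minimal illustration, not the author's code: `group_id_from_url`, `token_exchange_url`, and `exchange_token` are hypothetical helper names, and it assumes the Graph API's `fb_exchange_token` grant on the `oauth/access_token` endpoint, which needs your shell app's App ID and App Secret (from the app dashboard) plus the short-lived token from step 3.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

# The API version the author used; Facebook retires old versions, so adjust as needed.
GRAPH_VERSION = "v2.10"


def group_id_from_url(url):
    """Return the numeric ID in a facebook.com/groups/<id>/ link, or None for a named (vanity) link."""
    match = re.search(r"facebook\.com/groups/(\d+)", url)
    return match.group(1) if match else None


def token_exchange_url(app_id, app_secret, short_lived_token, version=GRAPH_VERSION):
    """Build the URL that trades a short-lived user token for an extended one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": short_lived_token,
    }
    return f"https://graph.facebook.com/{version}/oauth/access_token?{urlencode(params)}"


def exchange_token(app_id, app_secret, short_lived_token):
    """Perform the exchange over the network and return the extended token string."""
    with urlopen(token_exchange_url(app_id, app_secret, short_lived_token)) as resp:
        return json.load(resp)["access_token"]
```

For example, `group_id_from_url("https://www.facebook.com/groups/110354088989367/")` returns `"110354088989367"`, while a link that uses a custom group name returns `None`, which is when a lookup tool comes in handy.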
After a while, your app gains the ability to extend the user token in perpetuity. Unfortunately, Facebook changes its API versions frequently, so watch out for deprecated functionality. I'm using version 2.10 as of this writing.
Getting the data is as easy as making a URL request. For a more complete list of the optional arguments, look at the Facebook Graph API documentation and try them out in the Graph API Explorer. If something isn't returning what it's supposed to, try using a different type of token; that was the most common issue I ran into.
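As a rough illustration, assuming the v2.10 `/{group-id}/feed` edge and a field list chosen here for the example (`id`, `message`, `from`, `created_time`, `permalink_url`), the request and its pagination could look like this. `feed_url` and `iter_posts` are hypothetical names; the Graph API returns JSON with the posts under `data` and, when more are available, a `paging.next` URL.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def feed_url(group_id, access_token, version="v2.10",
             fields="id,message,from,created_time,permalink_url", limit=25):
    """Build the Graph API URL for a group's feed with the requested fields."""
    params = {"fields": fields, "limit": limit, "access_token": access_token}
    return f"https://graph.facebook.com/{version}/{group_id}/feed?{urlencode(params)}"


def iter_posts(url, max_pages=1):
    """Yield post dicts, following the paging.next link for at most max_pages pages."""
    pages = 0
    while url and pages < max_pages:
        with urlopen(url) as resp:
            payload = json.load(resp)
        yield from payload.get("data", [])
        url = payload.get("paging", {}).get("next")
        pages += 1
```

For polling, the first page is typically enough; a larger `max_pages` walks further back, for example when collecting posts to label as training data.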
Processing The Data
Using the method above to get the data from the housing group, I implemented several techniques to guess what kind of post each one was.
- Name (of poster), Date, and Post Message: These come straight from the post itself, as returned by the query to the Facebook group.
- Term: The school term, which is easily calculated from the post date.
- Gender: Is this a female-only unit? I used RegExes to look for hook terms like "Female Only" and its variants.
- Period, Location, Price: The location of the unit, its price, and the period of the sublet (or rental), which is 4, 8, or 12 months. All of these are determined by running a series of RegExes and picking whichever result makes the most sense (all the code is on GitHub).
- Action: This is the hard one. I tried to determine whether the post was someone trying to sell their unit or someone trying to buy one. I had three methods to achieve this:
  - Using Python's NLTK and sklearn to break expressions into parts and train the computer to predict 'buying', 'selling', or 'neither' for each of them. This worked 40-57% of the time.
  - Running the message against a series of RegExes to build a matrix of 1s and 0s recording whether each expression was found in the message, then training a Random Forest (an ensemble of decision trees) to decide whether a new post is a 'buying', 'selling', or 'neither' post. This was accurate 55-60% of the time.
  - Using just the RegExes: each RegEx is tagged as a buying or a selling RegEx and matched against the message. If a message has more buying than selling matches, the post is labelled buying, and likewise for selling; if nothing matches, it is labelled neither. This worked 75-80% of the time.
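To make the fields above and the third action method concrete, here is a minimal sketch. The patterns are hypothetical stand-ins (the author's actual RegEx lists are on GitHub), the term mapping assumes Waterloo's Winter (January-April), Spring (May-August), and Fall (September-December) terms, location is left out, and a tie between buying and selling matches is labelled 'neither', a case the description above doesn't cover.

```python
import re
from datetime import datetime

# Hypothetical patterns; real posts need many more variants.
FEMALE_ONLY = re.compile(r"\b(female|girls?|women)[\s-]*only\b", re.I)
PERIOD = re.compile(r"\b(4|8|12|four|eight|twelve)[\s-]*months?\b", re.I)
PRICE = re.compile(r"\$\s?(\d[\d,]*)")
MONTH_WORDS = {"four": 4, "eight": 8, "twelve": 12}
SELLING = [re.compile(p, re.I) for p in (
    r"\bsublet(ting)? my\b", r"\bavailable\b", r"\blooking for (a )?subletters?\b")]
BUYING = [re.compile(p, re.I) for p in (
    r"\blooking for (a )?(place|room|sublet)\b", r"\bneed (a )?(place|room)\b", r"\biso\b")]


def term_for(date):
    """Map a date to a Waterloo term: Winter (Jan-Apr), Spring (May-Aug), Fall (Sep-Dec)."""
    season = ("Winter", "Spring", "Fall")[(date.month - 1) // 4]
    return f"{season} {date.year}"


def classify_action(message):
    """Count buying vs. selling pattern hits; the majority wins, ties and no hits give 'neither'."""
    buy = sum(1 for p in BUYING if p.search(message))
    sell = sum(1 for p in SELLING if p.search(message))
    if buy > sell:
        return "buying"
    if sell > buy:
        return "selling"
    return "neither"


def categorize(post):
    """Turn a Graph API post dict into the fields described above (location omitted)."""
    message = post.get("message", "")
    created = datetime.strptime(post["created_time"], "%Y-%m-%dT%H:%M:%S%z")
    period = PERIOD.search(message)
    price = PRICE.search(message)
    months = period.group(1).lower() if period else None
    return {
        "name": post.get("from", {}).get("name"),
        "date": created,
        "message": message,
        "term": term_for(created),
        "female_only": bool(FEMALE_ONLY.search(message)),
        "period": (MONTH_WORDS.get(months) or int(months)) if months else None,
        "price": int(price.group(1).replace(",", "")) if price else None,
        "action": classify_action(message),
    }
```

With these sample patterns, a message like "Subletting my room for 4 months this fall, $650/month, female only" comes out with `period=4`, `price=650`, `female_only=True`, and `action='selling'`.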
I ended up using the last of these action algorithms. The problem with the others wasn't the methodology; 900 training data points simply isn't enough to teach a computer to recognize language patterns this complex. I stored the data in a SQLite database, which is included on GitHub. Every 5-18 seconds, the code fetched the data, checked for new posts, and categorized them. I set the code to run for an hour at a time, then used a cron job on a Raspberry Pi to launch it every hour from 5 AM to 2 AM.
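The storage and polling loop might look roughly like the sketch below, assuming a single `posts` table keyed on the Graph API post ID (the author's actual schema on GitHub may differ). `open_db`, `store_new`, and `poll` are hypothetical names, and `fetch_posts`, `classify`, and `on_new` are hypothetical callables you would wire to the feed request, the action classifier, and whatever should happen with a new post.

```python
import random
import sqlite3
import time


def open_db(path="posts.db"):
    """Open (or create) the SQLite database with a minimal posts table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS posts ("
        "id TEXT PRIMARY KEY, created_time TEXT, message TEXT, action TEXT)"
    )
    return conn


def store_new(conn, posts, classify):
    """Save posts whose IDs aren't in the table yet and return just those new posts."""
    new = []
    for post in posts:
        if conn.execute("SELECT 1 FROM posts WHERE id = ?", (post["id"],)).fetchone():
            continue  # already seen on an earlier poll
        message = post.get("message", "")
        conn.execute(
            "INSERT INTO posts (id, created_time, message, action) VALUES (?, ?, ?, ?)",
            (post["id"], post.get("created_time"), message, classify(message)),
        )
        new.append(post)
    conn.commit()
    return new


def poll(conn, fetch_posts, classify, on_new, run_seconds=3600):
    """Check for new posts until run_seconds elapse, waiting a random 5-18 s between checks."""
    deadline = time.monotonic() + run_seconds
    while time.monotonic() < deadline:
        for post in store_new(conn, fetch_posts(), classify):
            on_new(post)
        time.sleep(random.uniform(5, 18))
```

With `run_seconds=3600`, a crontab entry on the Pi along the lines of `0 0,1,5-23 * * * python3 /path/to/monitor.py` (the script path is a placeholder) starts a one-hour run at the top of every hour from 5 AM through 1 AM, so monitoring stops around 2 AM.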
The problem with the Waterloo housing Facebook group is that only 2.7% of the posts are students selling their place for 4 months in the fall. However, using the data analysis method described above and constant monitoring of the group, I am able to identify approximately 77% of the 4-month sublets within ~5 seconds of their being posted.
Twilio is a service that lets you buy a phone number and use it programmatically from Python or any other language. I set up a free account (this whole part only took around 10 minutes) and hooked it up to my Python method for going through the Facebook data. After setting it up on my Raspberry Pi, for 77% of the posts I received a text message on my phone, with the link to the post and details like price and location, ~10 seconds after someone posted their place for 4 months. Occasionally, the algorithm misidentifies posts that are not 4-month sales as 4-month sales, but it gets them right more often than not, so this is a very minor inconvenience. I was very pleased with the outcome: I can see and respond to the most valuable offers before anyone else has even switched tabs, which gives me a huge advantage on the best places to live. This doesn't have to be just an application for UW students reading Facebook posts; the same approach could categorize good offers in any city, on any website, and send them straight to your phone as soon as they arrive.
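A minimal sketch of the alert step, assuming the official `twilio` Python package (`pip install twilio`) and its `Client(...).messages.create(body=..., from_=..., to=...)` call. The dict keys, the hypothetical `should_alert` rule, and its check for "fall" or "September" in the message are illustrative assumptions, not the author's exact criteria; the account SID, auth token, and phone numbers come from your Twilio console.

```python
import re

FALL = re.compile(r"\b(fall|sept(ember)?)\b", re.I)  # hypothetical "is this for the fall?" check


def should_alert(post):
    """Alert only on posts classified as 4-month 'selling' posts that mention the fall."""
    return (post.get("action") == "selling"
            and post.get("period") == 4
            and bool(FALL.search(post.get("message", ""))))


def format_alert(post):
    """Build a short SMS body with the price, location, and link when available."""
    parts = ["4-month sublet posted"]
    if post.get("price") is not None:
        parts.append(f"${post['price']}")
    if post.get("location"):
        parts.append(post["location"])
    parts.append(post["permalink_url"])
    return " | ".join(parts)


def send_alert(body, account_sid, auth_token, from_number, to_number):
    """Send the SMS through Twilio and return the message SID."""
    from twilio.rest import Client  # imported here so the helpers above work without twilio installed

    client = Client(account_sid, auth_token)
    return client.messages.create(body=body, from_=from_number, to=to_number).sid
```

Keeping `should_alert` separate from `send_alert` makes it easy to tighten or loosen the filter without touching the Twilio code.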