Steph's Adventure in the Woods 2: January 2014

Grad school while working full time

Before I went into grad school, I faced the big question: Is it possible to go through grad school while keeping one's full-time dream job and being a parent?

Now that I am nearing the end of my M.Sc., I can answer: Yup! It's possible!

The biggest challenge I have faced is that I did not have quite enough vacation days. I made it through the first 3 years no problem. I feel like I was able to complete those years (which were just course work) to the fullest of my potential. I worked hard, I got high marks, and I am proud of that.

However, by the time my fourth year rolled around, I had a clash between my classes and paper publication work. I used a LOT of vacation during that time because there was so much activity compressed into a short period, not to mention unexpected time off I needed because of a lack of child care. During my fourth year of grad school, I burnt the remainder of my vacation right into the ground. Gone! Each month I get a little more, but I burn it right away. It is stressful, because I worry that if something happens and I have to be away from work for any length of time, I'll get into trouble. I am very fortunate to have supervisors, both on the employer side and on the grad school side, who want me to complete my M.Sc. If either side tried to push me out, I'd be a goner.

So, what would I say to a person who is thinking about going into grad school but is facing worries similar to mine?

I would say, if research is your dream, then follow your dream. DO IT! If the choice is grad school or no grad school, then pick grad school!

I worry that people who have watched me from afar would look at all my vacation problems and child care problems and then advise other potential students not to follow the same path I did. But I would shout back that I am so, so, so glad I made the choice I did. Sure, I may have missed some opportunities, and I didn't go to all the conferences I could have, or submit all the papers I could have, or even learn all the things I could have in all my classes. But I did my best, and I STILL LEARNED A LOT. It was worth it.

I still have to write my thesis, mind you, and there is a lot of work left for me, and my vacation days continue to be in shortage. But I know it will be fine. I am doing it.

This is about the freedom to be able to experience grad school, to try it. If some external force says that a potential student shouldn't be allowed into the program because it is assumed that they will not have enough time.... then that would be truly sad. I am talking about social barriers, equality, and access to education.

I think that people should have the choice to try grad school, and if they fail, to fail on their own terms and not because some external force judged it for them, i.e. discrimination ("you'll never make it anyway, so why should we even let you try!"). On the other hand, research institutions only want to support people who will be able to be productive researchers. Is it the university's responsibility to "give people a chance", for the sake of equality and fairness? Or do universities have the right to deny an application from a student who doesn't appear to have enough free time to keep up with classes and research?

Maybe universities should have some say in only accepting the "best" students. But many people who have the potential to be among "the best" are prevented from getting there if basic needs are not provided for, such as access to daycare spots (AND ON WEEKENDS, TOO!). There are not enough social support structures in existence to meet demand right now: where I live, getting a daycare spot is extremely competitive because there are so few spots available. This could be fixed with more government funding, but I don't know if anyone cares. It is just assumed that one parent will quit their job and take care of the kids, so why bother putting tax money into creating daycare spaces? (Answer: for example, sometimes that parent wants to continue working outside the home!) This is an artifact of the 1950s and earlier, when the woman was expected to stay at home and only the man would work, and of the times before that, when only men were allowed to go to university and only men could vote, etc. But nowadays we are more realistic: sometimes the man stays at home, or sometimes the woman is married to a woman, or a man to a man, single-parent families exist, etc. Society should support all kinds of families, and this means providing enough daycare to meet demand.

Sigh, first world problems I guess. I am very, very lucky that I even have access to daycare and higher education at all!!

Reflections on developing a sequence recommender

Time for another research blog post! This is how I do open notebook science, right here. :)

(side note about Open Notebook Science: Okay, okay, this blog entry isn't Open Notebook Science in the truest sense, seeing as I'm not uploading my discarded datasets or source code. But I don't think the value anyone could get from my discarded stuff would be higher than what it would cost me to clean it up and post it. I think the most valuable things I can share are the stories of my experiences and the take-away lessons learned, hence this entry. That said, I do occasionally share milestone contributions, such as this Java applet from last year.)

Anyone who has followed my blog over the years will know that my research has been a quest to create a "thing", or a "guide", that can use all available resources on the WWW to create learning experiences: little adventures you can follow to learn a lesson or gain practice or experience in a given area. It would be like self-guided study, only with a little program that prepares fun experiences for you and pushes you further than you might have gone on your own. I'm not trying to create lessons or learning adventures (that's the work of an instructional designer), but I am trying to create the thing that creates the learning adventures. So even though I work in a different field, I always need to listen carefully to what educators and instructional designers say; otherwise my work would be terribly uninformed.

Anyway, my approach to this throughout my M.Sc. has been to use my supervisor's Ecological Approach. This means I'm treating the WWW as a bank of learning objects, and I can assume that each learner has an agent and that there is a whole bunch of metadata available to me in some structure (like an API). This EA metadata is like molding clay, or building blocks: it's what gets fed into and produced by the "things" or "guides" that I'm designing. I first wrote about this in 2006; at the time I was thinking of using RDF. Since then, I have discovered simulation, so I can use any format I want and I don't have to worry about the actual implementation right now. (But I will have to, eventually!)

Last year, I developed an approach that leveraged Apache Mahout's recommender libraries. Most recommender systems are used to recommend things like books, movies or some other kind of product. I twisted the system around so that the item to be recommended wasn't a book or a movie, but rather a sequence of learning objects. That way, I could use existing algorithms for a new purpose: collaborative filtering on sequences of things.
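Here is a minimal sketch of that twist, in Java since Mahout is a Java library. It assumes only that the recommender wants one numeric ID per item (Mahout's data models identify items by long IDs). The SequenceItemIndex class is hypothetical, not part of Mahout or of my actual simulation code: each distinct ordered sequence of learning-object IDs gets its own item ID, and a recommended ID can be mapped back to its sequence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: gives each distinct sequence of learning-object IDs its own numeric item ID. */
public class SequenceItemIndex {
    private final Map<List<Integer>, Long> idBySequence = new HashMap<>();
    private final List<List<Integer>> sequenceById = new ArrayList<>();

    /** Returns the existing ID for this sequence, or assigns the next free one. */
    public long idFor(List<Integer> sequence) {
        // immutable copy so that later changes to the caller's list can't corrupt the map key
        List<Integer> key = List.copyOf(sequence);
        Long existing = idBySequence.get(key);
        if (existing != null) {
            return existing;
        }
        long id = sequenceById.size();
        idBySequence.put(key, id);
        sequenceById.add(key);
        return id;
    }

    /** Maps a recommended item ID back to the learning-object sequence it stands for. */
    public List<Integer> sequenceFor(long id) {
        return sequenceById.get((int) id);
    }

    /** Number of distinct sequence "items" registered so far. */
    public int size() {
        return sequenceById.size();
    }

    public static void main(String[] args) {
        SequenceItemIndex index = new SequenceItemIndex();
        long a = index.idFor(List.of(3, 7, 12, 5));
        long b = index.idFor(List.of(5, 12, 7, 3)); // same objects, different order: a different item
        long c = index.idFor(List.of(3, 7, 12, 5)); // repeated sequence: same item as a
        System.out.println(a + " " + b + " " + c); // prints "0 1 0"
        System.out.println(index.sequenceFor(b));  // prints "[5, 12, 7, 3]"
    }
}
```

Because the map is keyed on the whole ordered list, reshuffling a sequence produces a new item, which is exactly why the item count grows so quickly.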

The biggest problem, predictably, is that when you create sequences of things, the number of "items" suddenly explodes. If you have just 40 learning objects, then even if you don't care about the ordering of the learning objects within a sequence, that's still 91 390 "items" if you take 4 at a time (nCr, n=40, r=4). It's even bigger if you are true to the definition of "sequence" and use nPr: 2 193 360 for the same n and r. Having a large number of items is problematic because it becomes time consuming for the recommender algorithm to churn through all the calculations.
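To see how fast the item count grows, here is a small Java sketch that computes nCr and nPr exactly with BigInteger, so large values don't overflow. The class and method names are illustrative only.

```java
import java.math.BigInteger;

/** Counts how many candidate "items" exist when an item is a selection of r objects out of n. */
public class SequenceCounts {
    /** nCr: unordered selections, i.e. ordering within the sequence is ignored. */
    static BigInteger combinations(int n, int r) {
        BigInteger result = BigInteger.ONE;
        for (int i = 0; i < r; i++) {
            // after this step result equals C(n, i + 1), so the division is always exact
            result = result.multiply(BigInteger.valueOf(n - i))
                           .divide(BigInteger.valueOf(i + 1));
        }
        return result;
    }

    /** nPr: ordered sequences without repeated objects, i.e. a true "sequence". */
    static BigInteger permutations(int n, int r) {
        BigInteger result = BigInteger.ONE;
        for (int i = 0; i < r; i++) {
            result = result.multiply(BigInteger.valueOf(n - i));
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 40; // number of learning objects
        for (int r = 2; r <= 6; r++) {
            System.out.printf("r=%d  nCr=%s  nPr=%s%n", r, combinations(n, r), permutations(n, r));
        }
    }
}
```

For n=40 and r=4 it reports nCr=91390 and nPr=2193360. Each additional position in the sequence multiplies nPr by (n - r), e.g. by 36 when going from length 4 to length 5.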

So that is why I am writing this blog entry -- I'm in the middle of trying numerous approaches to address this. I have so many approaches on the go that I've decided to write them down.

For instance, I am experimenting with ways to fine-tune the Mahout settings for optimal performance. I have to make sure I'm using Mahout to the best of its ability so I can get a taste of what the current limits are for non-sequential item recommendations. (taste, hehehe. little private joke.)

On top of this, I am also experimenting with various approaches to collapse the 90 000 "items" (or numbers astronomically higher than 90 000!). For example, I can chop up the plane (discretize the dimension? what language do people use here?) by varying the definition of "item"; basically, I create clusters. For example, I might allow two sequences of length 4 to be considered the SAME item if they have enough learning objects in common. I did this by creating a new "threshold" parameter in my simulation, where the threshold must be <= k, the sequence length.
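A minimal sketch of the threshold rule, under assumptions of my own for illustration: order is ignored when counting shared objects, no object repeats within a sequence, and both sequences have the same length k. The names ThresholdEquivalence, sharedObjects and sameItem are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of a threshold rule: two length-k sequences count as the SAME item
 * when they share at least `threshold` learning objects (order ignored).
 */
public class ThresholdEquivalence {
    /** Number of learning objects that appear in both sequences. */
    static int sharedObjects(List<Integer> a, List<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(new HashSet<>(b));
        return common.size();
    }

    static boolean sameItem(List<Integer> a, List<Integer> b, int threshold) {
        int k = a.size();
        if (b.size() != k || threshold < 0 || threshold > k) {
            throw new IllegalArgumentException("expects equal-length sequences and 0 <= threshold <= k");
        }
        return sharedObjects(a, b) >= threshold;
    }

    public static void main(String[] args) {
        List<Integer> s1 = List.of(1, 2, 3, 4);
        List<Integer> s2 = List.of(4, 3, 9, 8);
        System.out.println(sameItem(s1, s2, 2)); // true: objects 3 and 4 are shared
        System.out.println(sameItem(s1, s2, 3)); // false: only 2 objects are shared
    }
}
```

Setting the threshold to k only merges sequences that contain exactly the same objects in some order; lower thresholds merge more aggressively and shrink the item space further.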

But creating equivalence classes like this gets tricky, because a given sequence might fall into more than one overlapping cluster. There are a lot of delightful machine learning / clustering approaches I could/should try here, if only I had some spare time!
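The overlap problem shows up even in a tiny example. The sketch below uses a hypothetical greedy rule (not something from Mahout or from my simulation): each sequence joins the first existing cluster whose representative shares at least `threshold` objects with it, otherwise it starts a new cluster. Because "shares enough objects" is not transitive, the same sequence can land in different clusters depending on the order in which sequences arrive.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical greedy clustering of sequences by shared learning objects. */
public class GreedyThresholdClusters {
    static int shared(List<Integer> a, List<Integer> b) {
        Set<Integer> common = new HashSet<>(a);
        common.retainAll(new HashSet<>(b));
        return common.size();
    }

    /** Returns, for each input sequence, the representative of the cluster it joined. */
    static List<List<Integer>> assign(List<List<Integer>> sequences, int threshold) {
        List<List<Integer>> representatives = new ArrayList<>();
        List<List<Integer>> joined = new ArrayList<>();
        for (List<Integer> seq : sequences) {
            List<Integer> rep = null;
            for (List<Integer> candidate : representatives) {
                if (shared(seq, candidate) >= threshold) {
                    rep = candidate; // first match wins, even if later clusters also match
                    break;
                }
            }
            if (rep == null) {
                rep = seq; // no match: this sequence founds a new cluster
                representatives.add(rep);
            }
            joined.add(rep);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(1, 2, 5, 6);
        List<Integer> b = List.of(3, 4, 7, 8);
        List<Integer> x = List.of(1, 2, 3, 4); // shares 2 objects with BOTH a and b
        System.out.println(assign(List.of(a, b, x), 2).get(2)); // prints [1, 2, 5, 6]
        System.out.println(assign(List.of(b, a, x), 2).get(2)); // prints [3, 4, 7, 8]
    }
}
```

Here x joins whichever cluster happened to be created first, so a real clustering approach would need a principled tie-breaking rule (or soft membership) instead of arrival order.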

So far, my strategies to tackle the exploding items problem have been:
1) tuning the engine for performance (e.g. switching to flat files instead of a database)
2) changing the definition of "item" by creating the threshold parameter.
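For strategy 1), the file side of the switch can be as simple as writing one preference per line. This sketch assumes a file-based data model such as Mahout's FileDataModel, which reads comma-separated userID,itemID,preference lines. RatingsFileWriter and its Rating record are hypothetical names, and the code uses only the Java standard library (Java 16+ for the record).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Writes simulated ratings as "userID,itemID,preference" lines for a file-based data model. */
public class RatingsFileWriter {
    record Rating(long userId, long itemId, float preference) {}

    /** Formats each rating as one comma-separated line. */
    static List<String> toCsvLines(List<Rating> ratings) {
        List<String> lines = new ArrayList<>();
        for (Rating r : ratings) {
            lines.add(r.userId() + "," + r.itemId() + "," + r.preference());
        }
        return lines;
    }

    static void write(Path file, List<Rating> ratings) throws IOException {
        Files.write(file, toCsvLines(ratings), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<Rating> ratings = List.of(
                new Rating(1, 0, 4.0f), new Rating(1, 17, 2.5f), new Rating(2, 0, 5.0f));
        Path file = Files.createTempFile("ratings", ".csv");
        write(file, ratings);
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8)); // prints [1,0,4.0, 1,17,2.5, 2,0,5.0]
        Files.deleteIfExists(file); // clean up the temporary file
    }
}
```

Whether files actually beat a database depends on the data size and on how the data model caches preferences, so it is something to measure rather than assume.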

A third approach I have tried is 3) pre-loading the simulation with a strategically generated synthetic dataset of starter ratings. In theory, a recommender algorithm should be able to run on a huge dataset if it has enough starter data to create appropriate comparisons across users or items. This has been fun and is helping my understanding of the inner workings of the algorithms, because I need to understand an algorithm in order to create a helpful dataset to "kick start" it.
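Here is one possible shape for such a generator, as a hypothetical sketch rather than the dataset I actually used. It assumes the goal is coverage: every item should be rated by at least minPerItem synthetic users, and neighbouring items should share raters so that item-item similarities have something to compare. Users are assigned round-robin, and the random 1 to 5 ratings use a fixed seed so simulation runs stay reproducible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical "kick-start" generator of synthetic starter ratings. */
public class StarterRatings {
    record Rating(long userId, long itemId, float preference) {}

    static List<Rating> generate(int numItems, int numUsers, int minPerItem, long seed) {
        if (minPerItem > numUsers) {
            throw new IllegalArgumentException("cannot have more ratings per item than synthetic users");
        }
        Random random = new Random(seed); // fixed seed keeps simulation runs reproducible
        List<Rating> ratings = new ArrayList<>();
        for (int item = 0; item < numItems; item++) {
            // round-robin: item i is rated by users i .. i+minPerItem-1 (mod numUsers),
            // so adjacent items share minPerItem - 1 raters
            for (int j = 0; j < minPerItem; j++) {
                long user = (item + j) % numUsers;
                float preference = 1 + random.nextInt(5); // 1..5 stars
                ratings.add(new Rating(user, item, preference));
            }
        }
        return ratings;
    }

    public static void main(String[] args) {
        List<Rating> starter = generate(1000, 50, 3, 42L);
        System.out.println(starter.size() + " starter ratings"); // prints "3000 starter ratings"
    }
}
```

With 1000 items, 50 synthetic users and minPerItem = 3, each pair of adjacent items shares two raters. How realistic the synthetic preferences need to be is a separate question that depends on the algorithm being kick-started.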

Recently, my advisor suggested a fourth approach: 4) cutting Mahout out of the picture and inserting our own algorithm. I know this will have to be done eventually, but sometimes it seems too early to do it. I want to finish exploring 1), 2) and 3) before starting 4)! But the approach my advisor suggested is actually quite brilliant and could be far more efficient than Mahout's recommender libraries, because those weren't designed to do what I am trying to do.

So that is why I wrote this blog entry. I wanted to get all of this out of my head and commit it to "paper" so I can keep coming back to this as I push forward on 4) but will inevitably sidetrack on various pieces connected to 1), 2) and 3).

Wee!
Steph

I think society should have universal daycare

Good point!

http://blogs.reuters.com/reihan-salam/2014/01/03/universal-preschool-may-help-parents-more-than-children-and-thats-okay/

sweet Hibernate links

Earlier, I started a list of "sweet Hibernate links". Because I can no longer edit my old blog, I'm creating a new spot for it here.

http://hibernatedb.blogspot.ca/2009/05/automatic-reconnect-from-hibernate-to.html
http://www.hildeberto.com/2008/05/hibernate-and-jersey-conflict-on.html
InvalidStateException -- how to loop through error attributes to get more debugging info (thanks EE for finding this link!) http://stackoverflow.com/questions/4067920/what-would-cause-an-hibernate-invalidstateexception

Previous links

original post
links from original post, copied here:

http://docs.jboss.org/hibernate/orm/3.3/reference/en/html/tutorial.html#tutorial-firstapp
This one is JPA, not Hibernate, but related:
http://openjpa.apache.org/builds/1.2.3/apache-openjpa/docs/jpa_overview_persistence.html

MySQL does not install on Windows

MySQL does not install on Windows. Can you believe it? I would have thought this is a common and reasonable combination of tools. But NOPE!

I have tried Windows Server 2008, Windows Server 2012, MySQL 5.1 and MySQL 5.5, and all combinations fail.

Finally, a co-worker informed me that this has been a problem for many years. Apparently it's related to the "(x86)" that Windows adds to the end of the "Program Files" directory name for 32-bit programs, i.e. "C:\Program Files (x86)\". Apparently the parentheses in the directory name are responsible for mucking everything up. When the installer finishes and goes to start the service for you, it says:

"Windows could not start the MySQL service on Local Computer. Error: 1053: The service did not respond to the start or control request in a timely fashion."

Unfortunately, the installer did not give me the option of choosing a different directory. Further, on one attempt I did manage to get the installer to use "C:\Program Files" (without the (x86)), and it STILL failed. My colleague suggested it might be possible to change the path by editing the registry directly.

However, at this point I was running out of time. I gave up on Windows and got myself a Red Hat Linux server instead. It is working.

My sincerest sympathy to anyone who does not have this freedom and is forced to get MySQL installed on Windows. If I ever find a set of steps that explains how to fix this, I'll post the link here.

I just wish I had known about this "Known Issue" BEFORE I spent a day of my life trying to get it to work. :(
