Code

// I'm sure this isn't the most efficient way, but it works!

String url = "http://twitter.com/statuses/user_timeline/6490642.rss";
String[] strArray = loadStrings(url);
int count = 0; // counts how many Twitter posts there are
 
// Pass 1: count the lines that hold posts (every 7th line, starting at line 10)
for (int i=10; i< strArray.length; i+=7){
  count++;
}

String[] finalArray = new String[count];
count = 0; // reset count so it can index into finalArray

// Pass 2: extract the text of each post and store it
for (int i=10; i< strArray.length; i+=7){
  String myPost = strArray[i];
  
  // the post starts after the first colon (the end of the user-name)
  // and ends just before the closing XML tag
  int start = myPost.indexOf(':');
  int end = myPost.lastIndexOf('<');
  
  // extract the substring (skipping the colon and the space after it)
  // and trim, just to be sure
  myPost = myPost.substring(start+2, end);
  myPost = myPost.trim();
  
  // save
  finalArray[count] = myPost;
  count++;
}

println(finalArray);

0701 - Data Scraping (formerly Quiz 070)

Statement: There were enough small issues with your solutions to Quiz 070 that I decided to turn it into a homework assignment instead. The good news is that you get "more time" to work on it; the tradeoff is that you'll be expected to solve it thoroughly and correctly. Many of you will need to revise your Quiz solutions slightly. NOTE: Owing to Java security restrictions, your code will not work as an applet. You are only expected to upload code for this assignment (no .jar files).

The code here (and also below) provides the beginnings of an algorithm for scraping my Twitter feed. Take the strings that are returned and strip off the XML tags so that only the Twitter posts remain. This means you also have to strip out the user-name at the beginning of each posting (assume that Twitter user-names don't contain colons). Your solution should work with the RSS feed of any Twitter user; a couple of different feeds are provided below.

To complete this assignment correctly, you must use the String methods documented in the Java String reference! Here's the complete list of things to make sure you accomplish in your solution:

  • You'll want to use the String.substring() method to extract the posted text from the returned information. The substring() method needs a start index and an ending index; some hints about how to obtain these can be found below.
  • You'll probably want to use the String.length() method as a means for computing where the posted text ends.
  • Your solution should work with feeds from different users of Twitter. Their usernames might have different lengths! Instead of hard-coding a starting index for the post (which will change depending on the username), use the String.indexOf() method to find the colon that demarcates the start of the posted text.
  • Process the data in two passes (that is, two separate for-loops). First, count up how many lines in the String array are actual Twitter posts. In the second pass through the data, stash your extracted Twitter posts into a second String array, which is sized to have the right number of elements.
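
To make the substring() hints above concrete, here is a minimal sketch in plain Java of pulling the post text out of a single line. It assumes each post line looks like "<title>username: post text</title>"; the tag name, the class name PostExtractor, and the method extractPost are illustrative assumptions rather than part of the assignment, so check them against what the feed actually returns.

```java
// Hypothetical helper: the class and method names are illustrative only.
class PostExtractor {

    // Returns the text between the first colon and the closing XML tag.
    // Assumes the line looks like "<title>username: post text</title>".
    static String extractPost(String line) {
        String closingTag = "</title>";      // assumed tag name
        line = line.trim();                  // drop indentation and trailing spaces

        int colon = line.indexOf(':');       // the first colon ends the user-name
        if (colon < 0 || !line.endsWith(closingTag)) {
            return "";                       // not a line in the assumed format
        }

        int start = colon + 1;                         // first character after the colon
        int end = line.length() - closingTag.length(); // index where the closing tag begins
        return line.substring(start, end).trim();
    }

    public static void main(String[] args) {
        System.out.println(extractPost("<title>golan: Hello, world</title>"));
        System.out.println(extractPost("<title>wattenberg: Time: 5pm</title>"));
    }
}
```

Because indexOf() returns the position of the first colon, the same code copes with usernames of any length, and any later colons inside the post itself are left alone.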

// Make sure your code works with both URLs...
String url = "http://twitter.com/statuses/user_timeline/6490642.rss"; // user golan
String url2 = "http://twitter.com/statuses/user_timeline/15463062.rss";  // user wattenberg
String[] strArray = loadStrings(url);
 
for (int i=10; i < strArray.length; i+=7){
  String myPost = strArray[i];
  myPost = myPost.trim();
  println(myPost);
}
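
The two-pass approach asked for in the statement could be sketched roughly as follows, in plain Java rather than Processing. The sketch assumes, as the starter loop does, that posts sit on every 7th line beginning at index 10; the names TwoPassDemo and collectPosts are hypothetical, and a hard-coded array stands in for the result of loadStrings(url). Stripping the tags and the user-name is deliberately left out here.

```java
import java.util.Arrays;

// Hypothetical demo class: names and sample data are illustrative only.
class TwoPassDemo {

    // Assumed layout, taken from the starter loop above.
    static final int FIRST_POST = 10;
    static final int STRIDE = 7;

    static String[] collectPosts(String[] lines) {
        // Pass 1: count the post lines so the result array can be sized exactly.
        int count = 0;
        for (int i = FIRST_POST; i < lines.length; i += STRIDE) {
            count++;
        }

        // Pass 2: copy each (trimmed) post line into the right-sized array.
        String[] posts = new String[count];
        int next = 0;
        for (int i = FIRST_POST; i < lines.length; i += STRIDE) {
            posts[next] = lines[i].trim();
            next++;
        }
        return posts;
    }

    public static void main(String[] args) {
        // Stand-in for loadStrings(url): filler lines with posts at 10, 17 and 24.
        String[] lines = new String[25];
        Arrays.fill(lines, "<filler/>");
        lines[10] = "  <title>golan: first</title>";
        lines[17] = "  <title>golan: second</title>";
        lines[24] = "  <title>golan: third</title>";

        for (String post : collectPosts(lines)) {
            System.out.println(post);
        }
    }
}
```

Counting first means the second array never has empty slots at the end, which is the point of the two-pass requirement.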
