SOCRATES Documentation

Development Overview

Setting up SOCRATES locally

Install the dependencies:

Python (we are using version 2.7.3)
MongoDB (we are using version 2.4.9)
PRAW https://praw.readthedocs.org/en/latest/
TextBlob http://textblob.readthedocs.org/en/latest/install.html
pymongo http://api.mongodb.org/python/current/installation.html
Flask http://flask.pocoo.org/docs/installation/

Next, download the source from GitHub: https://github.com/kevinAlbs/Socrates

The code lives in two directories, front-end and back-end. Put the front-end directory somewhere in your web server's document root (e.g. htdocs or public_html). For example, my web root is /home/kevin/public_html, so I created a subdirectory there named socrates and moved the front-end directory into it. The back-end directory can live anywhere.

At this point you should have the following:

The dependencies listed above installed
The front-end directory located in your local web server's root

Before you navigate to the front-end, you must start the back-end Flask server so that it can receive requests from the front-end. The default back-end settings may be sufficient, but depending on your MongoDB installation you may need to change the connection code.
In back-end/wsgi.py you should see the line:

client = MongoClient()

Following the pymongo documentation, this call can be changed to point at a different host or port and to include any credentials your MongoDB installation requires.
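If your MongoDB installation needs credentials, one way to supply them is a mongodb:// connection URI. The sketch below assumes you want to build that URI in code; build_mongo_uri is a hypothetical helper, not part of SOCRATES, and the username and password in the comment are placeholders. It uses Python 3's urllib.parse.quote_plus (on Python 2 the same function is urllib.quote_plus), which the pymongo docs recommend for escaping special characters in credentials.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Build a mongodb:// URI, percent-escaping any credentials."""
    if username is None:
        # No credentials: same host/port as MongoClient()'s default.
        return "mongodb://%s:%d/" % (host, port)
    if password is None:
        raise ValueError("a password is required when a username is given")
    # Characters such as '@', ':' or '/' in credentials would break the URI,
    # so they are escaped with quote_plus.
    return "mongodb://%s:%s@%s:%d/" % (
        quote_plus(username), quote_plus(password), host, port)


# Hypothetical usage in back-end/wsgi.py (credentials are placeholders):
#   from pymongo import MongoClient
#   client = MongoClient(build_mongo_uri(username="socrates", password="p@ss"))
```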

Now try starting the back-end Flask server by running the following in the back-end directory:

python wsgi.py

Once it is running, try navigating to the front-end directory in your web browser.

NOTE: If you are using Chrome, try an Incognito window. Because of the way Chrome caches requests, AJAX requests from the front-end sometimes take 10-15 seconds outside Incognito mode.

Collection, Analysis, and Exploration

These are the three main components of SOCRATES. Collection and Analysis are handled entirely in the back-end, while Exploration (visualization) is handled in the front-end with D3.js.

Collection

Collection requests use social media APIs to fetch data. The fetched data is assumed to be a list (of posts, friends, etc.).
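Many social media APIs return results a page at a time, so producing "a list" usually means following pages until the source runs out. The sketch below assumes a hypothetical fetch_page(cursor) callable that returns (items, next_cursor), with next_cursor set to None on the last page; in SOCRATES this role would be played by a wrapper around a client such as PRAW, which is not shown here. fake_fetch is an invented stand-in source for demonstration.

```python
def collect(fetch_page, max_items=200):
    """Flatten a paginated API response into one list of collection entries."""
    items = []
    cursor = None
    while len(items) < max_items:
        page, cursor = fetch_page(cursor)
        items.extend(page)
        if cursor is None:  # the source has no more pages
            break
    return items[:max_items]


def fake_fetch(cursor):
    """Stand-in source: 120 posts served 50 per page; cursor = next start index."""
    start = cursor or 0
    end = min(start + 50, 120)
    page = [{"id": i, "text": "post %d" % i} for i in range(start, end)]
    return page, (end if end < 120 else None)


posts = collect(fake_fetch)
print(len(posts))  # 120
```

Whatever the source, the result is a flat list with one element per post, which is the shape the analysis step expects.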

Analysis

An analysis can be performed on collection data and other analysis data. An analysis can output two kinds of data: entry-data and aggregate-data.

Entry-Data

Entry-data is a list of the same length as the collection data, where each element corresponds to one entry in the collection data. For example, suppose our collection data is a list of tweets. A word count analysis on the content of the tweets returns one word count for each individual tweet.
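A minimal sketch of the word count example, assuming each collection entry is a dict whose text lives under a "text" key (the key name is an assumption). The output list has exactly one value per tweet, in the same order, which is what makes it entry-data.

```python
def word_count_analysis(entries, field="text"):
    """Entry-data analysis: one word count per collection entry, in entry order."""
    return [len(entry.get(field, "").split()) for entry in entries]


tweets = [
    {"text": "hello world"},
    {"text": "SOCRATES explores social media data"},
    {"text": ""},
]
counts = word_count_analysis(tweets)
print(counts)  # [2, 5, 0]
```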

Aggregate-Data

Aggregate-data consists of single values. For example, to get statistics on the follower counts in a collection of tweets, I would perform a statistics analysis on that collection. This returns aggregate-data describing the average, standard deviation, etc. Note that these are single values, as opposed to entry-data, which has one value per tweet.
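A sketch of such a statistics analysis, assuming each entry carries a numeric follower count under a hypothetical "followers" key. It uses Python 3's statistics module and reports the population standard deviation (pstdev), treating the collection as the whole data set; statistics.stdev would give the sample version instead. The function needs at least one entry.

```python
import statistics


def statistics_analysis(entries, field="followers"):
    """Aggregate-data analysis: single values summarizing one numeric field."""
    values = [entry[field] for entry in entries]  # needs at least one entry
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


tweets = [{"followers": 10}, {"followers": 20}, {"followers": 30}, {"followers": 40}]
print(statistics_analysis(tweets))
```

However many tweets go in, the result is one small dict of single values rather than a list aligned with the tweets.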

Back-End

The back-end is written in Python, and data is managed with MongoDB. The main tasks of the back-end are to collect data from social media sources (storing the results in MongoDB) and to perform analyses on that data. Because most APIs return JSON and MongoDB works well with JSON, all data sets are represented as JSON.

The back-end is responsible for:

Storing and retrieving working sets
Validation of parameters sent by the user
Processing requests for analysis functions
Processing requests for exploration (visualization) functions
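As an illustration of the validation responsibility, here is one possible approach, not necessarily how SOCRATES does it: check each incoming parameter against a per-request spec and coerce it to the expected type, since values from query strings arrive as strings. validate_params and TWITTER_COLLECTION_SPEC (with its username and count parameters) are hypothetical names invented for this sketch.

```python
def validate_params(params, spec):
    """Validate user-supplied parameters against a spec.

    spec maps each parameter name to (type, required, default). Values are
    coerced with the type because query-string values arrive as strings.
    """
    unknown = set(params) - set(spec)
    if unknown:
        raise ValueError("unknown parameters: %s" % ", ".join(sorted(unknown)))
    clean = {}
    for name, (expected_type, required, default) in spec.items():
        if name not in params:
            if required:
                raise ValueError("missing required parameter: %s" % name)
            clean[name] = default
            continue
        try:
            clean[name] = expected_type(params[name])
        except (TypeError, ValueError):
            raise ValueError("parameter %s must be of type %s"
                             % (name, expected_type.__name__))
    return clean


# Hypothetical spec for a collection request.
TWITTER_COLLECTION_SPEC = {
    "username": (str, True, None),
    "count": (int, False, 200),
}

print(validate_params({"username": "kevin", "count": "50"}, TWITTER_COLLECTION_SPEC))
# {'username': 'kevin', 'count': 50}
```

Rejecting unknown parameters up front keeps typos from being silently ignored.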

What is a Working Set?

The working set is the user's primary data storage. It is a single JSON object containing the data results for collection, analysis (possibly many), visualization (coming soon), metadata of each of those components, and additional non-essential data such as a reference to the database entry in MongoDB.

{
	data: [
		//collection data, this is an array of separate items from the collection source. 
		//E.g. this could be an array of 200 tweets from Twitter.
	],
	meta: {
		//meta contains information regarding the collection data
	},

	analysis: [
		//Notice, this is an array since you can do multiple analyses
		{
			//first analysis data
			//as described above, aggregate data is single valued
			aggregate_analysis: {}, 
			entry_analysis: {}, //entry data corresponds directly to each entry
			aggregate_meta: {}, //metadata describing data of aggregate
			entry_meta: {} //metadata describing data of entries
		},
		{
			//second analysis data
		}
		//and so on...
	]
}
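The structure above can be mirrored with plain Python dicts. This is a sketch under one stated assumption: the document leaves the inside of entry_analysis unspecified, so here it is taken to map a field name to a list holding one value per collection entry. The helpers new_working_set and add_analysis are hypothetical, and the MongoDB reference mentioned earlier is omitted.

```python
def new_working_set(data, meta):
    """Create a working set holding one collection data set and no analyses yet."""
    return {"data": list(data), "meta": dict(meta), "analysis": []}


def add_analysis(working_set, entry_analysis=None, entry_meta=None,
                 aggregate_analysis=None, aggregate_meta=None):
    """Append one analysis result and return its index in working_set["analysis"].

    entry_analysis is assumed to map a field name to a list with one value
    per collection entry, so each list must match the collection length.
    """
    entry_analysis = entry_analysis or {}
    n = len(working_set["data"])
    for name, values in entry_analysis.items():
        if len(values) != n:
            raise ValueError("entry field %r has %d values for %d entries"
                             % (name, len(values), n))
    working_set["analysis"].append({
        "aggregate_analysis": aggregate_analysis or {},
        "entry_analysis": entry_analysis,
        "aggregate_meta": aggregate_meta or {},
        "entry_meta": entry_meta or {},
    })
    return len(working_set["analysis"]) - 1


ws = new_working_set([{"text": "a b"}, {"text": "c"}], {"source": "twitter"})
index = add_analysis(ws, entry_analysis={"word_count": [2, 1]},
                     aggregate_analysis={"mean_word_count": 1.5})
print(index)  # 0
```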

Design Decisions for Working Sets

This section can be skipped without any loss of understanding.

A working set can contain data from only ONE collection source. This avoids the following situation. Say you collect data on the same user from both Twitter and YouTube. Twitter may return 100 tweets, while YouTube returns 200 entries of video data. Now suppose you want to analyze the difference between the follower counts in the tweets and the number of likes in the video data. Since there are 100 more YouTube entries than tweets, what happens to the 100 extra entries? An analysis across two separate collection data sets only makes sense if both have exactly the same number of entries. Therefore, for now, we limit a working set to one collection data set at a time.
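The length problem is easy to see in code. Python's zip() silently stops at the shorter list, so a naive entry-by-entry comparison would quietly drop the extra entries. The hypothetical helper below refuses mismatched lengths instead. The sample numbers are invented.

```python
def paired_difference(a_values, b_values):
    """Entry-by-entry difference of two value lists; refuses mismatched lengths."""
    if len(a_values) != len(b_values):
        # zip() would silently drop the extra entries, hiding the problem
        raise ValueError("cannot pair %d entries with %d entries"
                         % (len(a_values), len(b_values)))
    return [a - b for a, b in zip(a_values, b_values)]


followers = [120, 80, 300]   # e.g. one value per tweet
likes = [100, 90, 250]       # e.g. one value per video
print(paired_difference(followers, likes))  # [20, -10, 50]
```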

However, we do think that visualizing multiple data sets is useful (e.g. a scatterplot of likes vs. views across multiple YouTube data sets of differing lengths). Therefore, we plan to allow visualization functions to import data from separate working sets.
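Since this feature is only planned, the following is a speculative sketch of why differing lengths are not a problem for visualization: each working set becomes its own scatterplot series, so nothing has to be paired across sets. scatter_series is hypothetical, and the "likes", "views" and meta "name" keys are assumed field names.

```python
def scatter_series(working_sets, x_field, y_field):
    """Build one scatterplot series per working set; lengths may differ."""
    series = []
    for index, ws in enumerate(working_sets):
        points = [(entry[x_field], entry[y_field]) for entry in ws["data"]
                  if x_field in entry and y_field in entry]
        label = ws["meta"].get("name", "set %d" % index)
        series.append({"label": label, "points": points})
    return series


channel_a = {"data": [{"likes": 5, "views": 100}, {"likes": 8, "views": 150}],
             "meta": {"name": "channel A"}}
channel_b = {"data": [{"likes": 2, "views": 40}], "meta": {}}
for s in scatter_series([channel_a, channel_b], "likes", "views"):
    print(s["label"], s["points"])
```

The front-end could then draw each series in its own color with D3.js.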