How to install and run the analytics backend locally:
Several people have had trouble getting the analytics backend running locally, so this is a quick installation guide. If you run into an issue that is not covered here, please document it on this page. These instructions were tested on a clean install of Ubuntu 14.04.
Clone the analytics repositories
1. Navigate to the folder you want to install into and git clone these repositories:
- edx/edx-analytics-data-api
- edx/edx-analytics-pipeline
- edx/edx-analytics-data-api-client
- edx/edx-analytics-dashboard
cd {PATH_TO_EDX_FOLDER}/analytics
git clone https://github.com/edx/edx-analytics-pipeline.git
git clone https://github.com/edx/edx-analytics-data-api.git
git clone https://github.com/edx/edx-analytics-data-api-client.git
git clone https://github.com/edx/edx-analytics-dashboard.git
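The four clone commands above can also be written as a loop that skips repositories already on disk, which makes the step safe to rerun. This is only a sketch: the clone_all function and the DRY_RUN variable are illustrative names, not part of any edX tooling.

```shell
# Repositories named in the list above.
REPOS="edx-analytics-pipeline edx-analytics-data-api edx-analytics-data-api-client edx-analytics-dashboard"

# clone_all TARGET_DIR: clone each repository into TARGET_DIR unless it is already there.
# With DRY_RUN=1 the git commands are printed instead of executed (illustrative helper).
clone_all() {
    target_dir="$1"
    for repo in $REPOS; do
        if [ -d "$target_dir/$repo/.git" ]; then
            echo "skipping $repo (already cloned)"
        elif [ "${DRY_RUN:-0}" = "1" ]; then
            echo "git clone https://github.com/edx/$repo.git $target_dir/$repo"
        else
            git clone "https://github.com/edx/$repo.git" "$target_dir/$repo"
        fi
    done
}
```

Running it as DRY_RUN=1 clone_all {PATH_TO_EDX_FOLDER}/analytics first shows what would be cloned without touching the network.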
2. Create virtual environments in which to run the repositories
- It is best to create a separate virtual environment for each repository; otherwise, you may run into conflicts between their dependencies.
mkdir ~/.venvs
cd ~/.venvs
virtualenv edx-analytics-pipeline
virtualenv edx-analytics-data-api
virtualenv edx-analytics-data-api-client
virtualenv edx-analytics-dashboard
Install the dependencies
- You will need to activate and deactivate each virtualenv in turn
- Once you think the dependencies are installed, check them by running the repository's unit tests.
- If the unit tests complete successfully, you will see output of the form "Ran X tests in Ys OK"
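Because every repository uses a virtualenv of the same name under ~/.venvs, the activate/run/deactivate cycle can be wrapped in a small helper. The sketch below assumes that naming convention holds; run_in_venv and VENV_HOME are hypothetical names introduced here for illustration.

```shell
# run_in_venv REPO_DIR COMMAND...: run COMMAND inside REPO_DIR with the virtualenv
# of the same name activated. The subshell keeps the activation from leaking into
# the calling shell, so no explicit "deactivate" is needed afterwards.
# VENV_HOME is an illustrative override; it defaults to ~/.venvs as used above.
run_in_venv() {
    repo_dir="$1"
    shift
    activate="${VENV_HOME:-$HOME/.venvs}/$(basename "$repo_dir")/bin/activate"
    if [ ! -f "$activate" ]; then
        echo "no virtualenv found at $activate" >&2
        return 1
    fi
    (
        cd "$repo_dir" || exit 1
        . "$activate"
        "$@"
    )
}
```

For example, run_in_venv {PATH_TO_EDX_FOLDER}/analytics/edx-analytics-pipeline make test would run that repository's unit tests in its own environment.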
1. Installing edx-analytics-pipeline:
cd {PATH_TO_EDX_FOLDER}/analytics
cd edx-analytics-pipeline/
source ~/.venvs/edx-analytics-pipeline/bin/activate
make requirements
make test
If this raises a NoAuthHandlerFound error from boto, run:
export AWS_ACCESS_KEY_ID="TESTACCESSKEY"
export AWS_SECRET_ACCESS_KEY="TESTSECRET"
make test
To run this in production, we need to supply actual AWS credentials to boto, but the test suite does not care if they are valid.
deactivate
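The boto workaround from step 1 can be made safe to run unconditionally by setting the placeholder values only when no credentials are present. A minimal sketch, where ensure_aws_test_credentials is a hypothetical function name:

```shell
# ensure_aws_test_credentials: give boto placeholder credentials only when none
# are set, so real credentials already in the environment are left untouched.
# The placeholder values are the ones used in the workaround above.
ensure_aws_test_credentials() {
    : "${AWS_ACCESS_KEY_ID:=TESTACCESSKEY}"
    : "${AWS_SECRET_ACCESS_KEY:=TESTSECRET}"
    export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY
}
```

Calling ensure_aws_test_credentials before make test avoids overwriting real keys on a machine that already has them configured.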
2. Installing edx-analytics-data-api:
cd ../edx-analytics-data-api
source ~/.venvs/edx-analytics-data-api/bin/activate
make develop
./manage.py migrate --noinput
./manage.py migrate --noinput --database=analytics
./manage.py set_api_key edx edx
make validate
deactivate
3. Installing edx-analytics-data-api-client:
cd ../edx-analytics-data-api-client/
source ~/.venvs/edx-analytics-data-api-client/bin/activate
pip install -r requirements.txt
make test
deactivate
4. Installing edx-analytics-dashboard:
cd ../edx-analytics-dashboard/
source ~/.venvs/edx-analytics-dashboard/bin/activate
sudo apt-get update
sudo apt-get install gettext
sudo apt-get install npm
sudo apt-get install openjdk-7-jre
sudo apt-get install openjdk-7-jdk
sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev
make develop
make validate
If this raises an OfflineGenerationError for missing compression keys, run:
./manage.py compress --settings=analytics_dashboard.settings.test
make validate
deactivate
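Before running make develop for the dashboard, it can help to confirm that the system packages above actually put their tools on the PATH. The helper below is a sketch; missing_commands is a hypothetical name, and the mapping of packages to the binaries msgfmt, npm, java and javac is an assumption about a stock Ubuntu install.

```shell
# missing_commands CMD...: print each command that cannot be found on PATH.
# Prints nothing when every command is available.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}
```

For example, missing_commands msgfmt npm java javac should print nothing once gettext, npm and the OpenJDK packages are installed.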
Run pipeline task locally and verify its completion
1. Install MySQL locally and create a credentials file for the pipeline
sudo apt-get install mysql-server
mysql -u root -p
CREATE USER 'analytics'@'localhost' IDENTIFIED BY 'edx';
GRANT ALL PRIVILEGES ON *.* TO 'analytics'@'localhost';
FLUSH PRIVILEGES;
exit
cd {PATH_TO_EDX_FOLDER}/analytics
vi mysql_creds
***BEGIN mysql_creds FILE***
{
"host": "127.0.0.1",
"port": "3306",
"username": "analytics",
"password": "edx",
"database": "analytics"
}
***END mysql_creds FILE***
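If you would rather not create the file in an editor, the same content can be written with a here-document. This is a sketch that reproduces the file shown above; write_mysql_creds is a hypothetical helper name, and restricting the file to the owner is an added precaution rather than a requirement stated in this guide.

```shell
# write_mysql_creds PATH: write the credentials file shown above without an editor.
write_mysql_creds() {
    cat > "$1" <<'EOF'
{
    "host": "127.0.0.1",
    "port": "3306",
    "username": "analytics",
    "password": "edx",
    "database": "analytics"
}
EOF
    chmod 600 "$1"   # credentials file: readable and writable by the owner only
}
```

It would be called as write_mysql_creds {PATH_TO_EDX_FOLDER}/analytics/mysql_creds.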
cd edx-analytics-pipeline
vi override.cfg
***BEGIN override.cfg***
[database-export]
database = analytics
credentials = {PATH_TO_EDX_FOLDER}/analytics/mysql_creds
[database-import]
database = edxprod
destination = s3://<bucket for intermediate hadoop products>/intermediate/database-import
credentials = s3://<secrets bucket>/edxapp_prod_ro_mysql_creds
[event-logs]
expand_interval = 2 days
pattern = .*tracking.log-(?P<date>[0-9]+).*
source = s3://<bucket to where all tracking logs are synched>/tracking/
[hive]
warehouse_path = s3://<bucket for intermediate hadoop products>/warehouse/hive/
[manifest]
path = s3://<bucket for intermediate hadoop products>/user-activity-file-manifests/manifest
lib_jar = s3://<secrets bucket>/oddjob-1.0.1-standalone-modified.jar
input_format = oddjob.ManifestTextInputFormat
[enrollments]
blacklist_date = 2001-01-01
blacklist_path = /tmp/blacklist
[answer-distribution]
valid_response_types = customresponse,choiceresponse,optionresponse,multiplechoiceresponse,numericalresponse,stringresponse,formularesponse
***END override.cfg***
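A common mistake is for the credentials path in override.cfg to point at a file that does not exist. The sketch below reads a single value out of an INI-style file so that it can be checked; cfg_get is a hypothetical helper written for this guide, not a pipeline command, and it only handles simple "key = value" lines.

```shell
# cfg_get FILE SECTION KEY: print the value of KEY in [SECTION] of an INI-style file.
# Prints nothing if the section or key is absent.
cfg_get() {
    awk -v section="[$2]" -v key="$3" '
        /^\[/ { in_section = ($0 == section); next }   # track the current [section]
        in_section {
            eq = index($0, "=")
            if (eq == 0) next                           # not a key = value line
            name = substr($0, 1, eq - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)           # trim whitespace around the key
            if (name == key) {
                value = substr($0, eq + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", value)      # trim whitespace around the value
                print value
                exit
            }
        }
    ' "$1"
}
```

For example, creds="$(cfg_get override.cfg database-export credentials)" followed by [ -f "$creds" ] || echo "missing: $creds" reports a wrong path before the pipeline task is launched.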
2. Acquire a log file (or create a dummy one)
mkdir /tmp/log_files
cd /tmp/log_files
At this point, you can either acquire a log file from S3 or from another developer, or use the dummy file below (include the empty line at the end). Either way, place it in /tmp/log_files.
vi tracking.log-20150101-1234567890
*** BEGIN DUMMY LOG FILE ***
{"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-23T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_0", "choice_1", "choice_2"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
{"username": "test_user_alt", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555556, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
{"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
*** END DUMMY LOG FILE ***
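Whichever log file you use, two quick checks can catch a bad filename or a truncated paste before the pipeline runs. The helpers below are a sketch: log_date and count_events are hypothetical names, and the sed expression only approximates the date group of the [event-logs] pattern in override.cfg.

```shell
# log_date FILENAME: print the digits that follow "tracking.log-" in FILENAME,
# approximating the (?P<date>[0-9]+) group of the [event-logs] pattern.
# Prints nothing if the name does not match.
log_date() {
    printf '%s\n' "$1" | sed -n 's/.*tracking\.log-\([0-9][0-9]*\).*/\1/p'
}

# count_events FILE EVENT_TYPE: count the lines in FILE whose event_type is EVENT_TYPE.
count_events() {
    grep -c "\"event_type\": \"$2\"" "$1"
}
```

For the dummy file, log_date tracking.log-20150101-1234567890 prints 20150101, and count_events /tmp/log_files/tracking.log-20150101-1234567890 problem_check should print 3 if all three lines were pasted intact.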
3. Run the pipeline task, query for the results of its aggregation, and start the API locally
cd {PATH_TO_EDX_FOLDER}/analytics/edx-analytics-pipeline
source ~/.venvs/edx-analytics-pipeline/bin/activate
launch-task AnswerDistributionToMySQLTaskWorkflow --local-scheduler --remote-log-level DEBUG --include '*tracking.log*' --src /tmp/log_files --dest /tmp/answer_dist --mapreduce-engine local --name test_task
mysql -u root -p
USE analytics;
SELECT COUNT(*) FROM answer_distribution;
If the pipeline task ran successfully (and you used the dummy file above), this should be the output:
+----------+
| COUNT(*) |
+----------+
|        2 |
+----------+
1 row in set (0.00 sec)
exit
deactivate
cd ../edx-analytics-data-api
source ~/.venvs/edx-analytics-data-api/bin/activate
./manage.py runserver --settings=analyticsdataserver.settings.local_mysql
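The row-count check in this step can also be scripted instead of typed into the MySQL prompt. The sketch below connects as the analytics user created in step 1 rather than as root; answer_distribution_count, check_answer_distribution and the MYSQL_BIN override are hypothetical names introduced for illustration.

```shell
# answer_distribution_count: print the number of rows in analytics.answer_distribution.
# -N drops the column header and -B selects plain batch output, so only the number
# is printed. MYSQL_BIN is an illustrative override for the client binary.
answer_distribution_count() {
    "${MYSQL_BIN:-mysql}" -u analytics -pedx -N -B \
        -e 'SELECT COUNT(*) FROM analytics.answer_distribution;'
}

# check_answer_distribution EXPECTED: succeed only if the row count equals EXPECTED.
check_answer_distribution() {
    actual="$(answer_distribution_count)" || return 1
    if [ "$actual" = "$1" ]; then
        echo "ok: $actual rows"
    else
        echo "expected $1 rows, found $actual" >&2
        return 1
    fi
}
```

With the dummy log file, check_answer_distribution 2 corresponds to the expected output shown above.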
Verify that the data API can connect to the database
1. Navigate to 127.0.0.1:8000 in your web browser:
- If the page does not display and the logs show "ImproperlyConfigured: Error loading MySQLdb module", run 'pip install mysql-python' with the edx-analytics-data-api virtualenv active.
- If the page reports a 401 (unauthorized) error, rerun './manage.py set_api_key edx edx'.
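The same check can be made from a terminal with curl, which is handy on a machine without a browser. This is a sketch: api_status and explain_status are hypothetical helper names, and the messages simply restate the remedies listed above.

```shell
# api_status URL: print the HTTP status code returned for URL.
# curl reports 000 when no connection could be made.
api_status() {
    curl -s -o /dev/null -w '%{http_code}' "$1"
}

# explain_status CODE: map a status code to the remedies listed above.
explain_status() {
    case "$1" in
        200) echo "data API is up" ;;
        401) echo "unauthorized: rerun ./manage.py set_api_key edx edx" ;;
        000) echo "no response: is ./manage.py runserver still running?" ;;
        *)   echo "unexpected status $1: check the runserver logs" ;;
    esac
}
```

With the server running in another terminal, explain_status "$(api_status http://127.0.0.1:8000/)" prints a one-line diagnosis.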