  • Running Data Analytics

    How to install and run the analytics backend locally:

    We have had some trouble getting people up and running locally with the analytics backend, so I wrote up a quick installation guide. If you run into any undocumented issues, please document them here. These instructions were tested on a clean install of Ubuntu 14.04.

    Clone the analytics repositories

    1. Navigate to the folder you want to install into and git clone these repositories:

    • edx/edx-analytics-data-api
    • edx/edx-analytics-pipeline
    • edx/edx-analytics-data-api-client
    • edx/edx-analytics-dashboard
    cd {PATH_TO_EDX_FOLDER}/analytics
    
    git clone https://github.com/edx/edx-analytics-pipeline.git
    git clone https://github.com/edx/edx-analytics-data-api.git
    git clone https://github.com/edx/edx-analytics-data-api-client.git
    git clone https://github.com/edx/edx-analytics-dashboard.git
    

    2. Create virtual environments in which to run the repositories

    • It is best to create a separate virtual environment for each repository; otherwise, you may run into conflicts between their dependencies.
    mkdir ~/.venvs
    cd ~/.venvs
    virtualenv edx-analytics-pipeline
    virtualenv edx-analytics-data-api
    virtualenv edx-analytics-data-api-client
    virtualenv edx-analytics-dashboard
    

    Install the dependencies

    • You will need to activate and deactivate each virtualenv in turn
    • Once you think the dependencies are installed, check them by running the repository's unit tests.
    • If the unit tests complete successfully, you will see output of the form "Ran X tests in Ys OK"

    1. Installing edx-analytics-pipeline:

    cd {PATH_TO_EDX_FOLDER}/analytics
    cd edx-analytics-pipeline/
    source ~/.venvs/edx-analytics-pipeline/bin/activate
    make requirements
    make test
    

    If this raises a NoAuthHandlerFound error from boto, run:

    export AWS_ACCESS_KEY_ID="TESTACCESSKEY"
    export AWS_SECRET_ACCESS_KEY="TESTSECRET"
    make test
    

    To run this in production, we need to supply actual AWS credentials to boto, but the test suite does not care if they are valid.
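
    The same placeholder values can be set from Python, for instance in a local test helper. This is a minimal sketch, not part of the pipeline: the function name ensure_dummy_aws_credentials is hypothetical, and it only fills in values that are not already set, so real credentials are never overwritten.

```python
import os


def ensure_dummy_aws_credentials(environ=os.environ):
    """Set the placeholder AWS credentials from this guide if none are present.

    setdefault leaves any existing (possibly real) credentials untouched.
    """
    environ.setdefault("AWS_ACCESS_KEY_ID", "TESTACCESSKEY")
    environ.setdefault("AWS_SECRET_ACCESS_KEY", "TESTSECRET")
    return environ
```

    Call it before the code that imports boto runs. Passing a plain dict instead of os.environ makes the helper easy to check in isolation.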

    deactivate
    source ~/.venvs/edx-analytics-data-api/bin/activate
    

    2. Installing edx-analytics-data-api:

    cd ../edx-analytics-data-api
    make develop
    ./manage.py migrate --noinput
    ./manage.py migrate --noinput --database=analytics
    ./manage.py set_api_key edx edx
    make validate
    deactivate
    

    3. Installing edx-analytics-data-api-client:

    cd ../edx-analytics-data-api-client/
    source ~/.venvs/edx-analytics-data-api-client/bin/activate
    pip install -r requirements.txt 
    make test
    deactivate
    

    4. Installing edx-analytics-dashboard:

    cd ../edx-analytics-dashboard/
    source ~/.venvs/edx-analytics-dashboard/bin/activate
    sudo apt-get update
    sudo apt-get install gettext
    sudo apt-get install npm
    sudo apt-get install openjdk-7-jre
    sudo apt-get install openjdk-7-jdk
    sudo apt-get install libxml2-dev libxslt-dev python-dev zlib1g-dev
    make develop
    make validate
    

    If this raises an OfflineGenerationError for missing compression keys, run:

    ./manage.py compress --settings=analytics_dashboard.settings.test
    make validate
    deactivate
    

    Run pipeline task locally and verify its completion

    1. Install MySQL locally and create a credentials file for the pipeline

    sudo apt-get install mysql-server
    mysql -u root -p
    
    CREATE USER 'analytics'@'localhost' IDENTIFIED BY 'edx';
    GRANT ALL PRIVILEGES ON *.* TO 'analytics'@'localhost';
    FLUSH PRIVILEGES;
    
    cd {PATH_TO_EDX_FOLDER}/analytics
    vi mysql_creds
    
    ***BEGIN mysql_creds FILE***
    {
    "host": "127.0.0.1",
    "port": "3306",
    "username": "analytics",
    "password": "edx",
    "database": "analytics"
    }
    ***END mysql_creds FILE***
    
    cd edx-analytics-pipeline
    vi override.cfg
    
    ***BEGIN EXAMPLE override.cfg***
    [database-export]
    database = analytics
    credentials = {PATH_TO_EDX_FOLDER}/analytics/mysql_creds
    
    [database-import]
    database = edxprod
    destination = s3://<bucket for intermediate hadoop products>/intermediate/database-import
    credentials = s3://<secrets bucket>/edxapp_prod_ro_mysql_creds
    
    [event-logs]
    expand_interval = 2 days
    pattern = .*tracking.log-(?P<date>[0-9]+).*
    source = s3://<bucket to where all tracking logs are synched>/tracking/
    
    [hive]
    warehouse_path = s3://<bucket for intermediate hadoop products>/warehouse/hive/
    
    [manifest]
    path = s3://<bucket for intermediate hadoop products>/user-activity-file-manifests/manifest
    lib_jar = s3://<secrets bucket>/oddjob-1.0.1-standalone-modified.jar
    input_format = oddjob.ManifestTextInputFormat
    
    [enrollments]
    blacklist_date = 2001-01-01
    blacklist_path = /tmp/blacklist
    
    [answer-distribution]
    valid_response_types = customresponse,choiceresponse,optionresponse,multiplechoiceresponse,numericalresponse,stringresponse,formularesponse
    ***END EXAMPLE override.cfg***
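
    A typo in either file tends to surface only when the pipeline task runs, so a quick syntax check can save time. The sketch below is standalone Python 3 (run it outside the project virtualenvs). It assumes the key set of the example mysql_creds file above is what the pipeline expects, and that Python's standard configparser reads override.cfg closely enough for a sanity check. Both function names are hypothetical.

```python
import configparser
import json

# Keys taken from the example mysql_creds file in this guide (an assumption,
# not a schema published by the pipeline).
EXPECTED_CRED_KEYS = {"host", "port", "username", "password", "database"}


def missing_cred_keys(creds_text):
    """Parse the credentials JSON and return the expected keys that are absent, sorted."""
    creds = json.loads(creds_text)  # raises ValueError on malformed JSON
    return sorted(EXPECTED_CRED_KEYS - set(creds))


def export_settings(cfg_text):
    """Return the [database-export] options of an override.cfg as a dict."""
    # interpolation=None keeps literal characters such as "%" untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return dict(parser["database-export"])
```

    Read each file with open(path).read() and pass the text in. An empty list from missing_cred_keys and a dict containing both "database" and "credentials" from export_settings suggest the files are at least well formed.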
    

    2. Acquire a log file (or create a dummy one)

    mkdir /tmp/log_files
    cd /tmp/log_files
    

    At this point, you can either acquire a log file (from S3 or from another developer) or use the dummy file below (include the empty line at the end). Either way, place it in /tmp/log_files.

    vi tracking.log-20150101-1234567890
    
    *** BEGIN DUMMY LOG FILE ***
    {"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-23T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_0", "choice_1", "choice_2"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
    {"username": "test_user_alt", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555556, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
    {"username": "test_user", "host": "class.stanford.edu", "event_source": "server", "event_type": "problem_check", "context": {"course_id": "edX/DemoX/DemoCourse", "course_user_tags": {}, "user_id": 555555, "org_id": "Education", "module": {"display_name": "Quiz - Reasoning"}}, "time": "2014-06-22T16:17:16.856434+00:00", "ip": "0.0.0.0", "event": {"submission": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"input_type": "checkboxgroup", "question": "Choose as many as you like.", "response_type": "choiceresponse", "answer": ["Reasoning is the essence of what mathematics is", "Reasoning is useful for working in most jobs", "Reasoning allows people to connect ideas and make mathematical breakthroughs"], "variant": "", "correct": false}}, "success": "incorrect", "grade": 0, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "state": {"student_answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_2"]}, "seed": 1, "done": true, "correct_map": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {"hint": "", "hintmode": null, "correctness": "incorrect", "npoints": null, "msg": "", "queuestate": null}}, "input_state": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": {}}}, "answers": {"i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1": ["choice_4", "choice_5", "choice_6"]}, "attempts": 2, "max_grade": 1, "problem_id": "i4x://edX/DemoX-S/problem/a58470ee54cc49ecb2bb7c1b1c0ab43a"}, "agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0", "page": "x_module"}
    *** END DUMMY LOG FILE ***
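
    Before running the pipeline, it can help to confirm that the log file name matches the pattern in override.cfg and that every line is valid JSON. The sketch below is standalone Python 3. The regular expression is copied from the [event-logs] section of the example configuration; the function names are hypothetical, and the per-user count is only a sanity check, not the pipeline's own aggregation.

```python
import json
import re

# Pattern copied from the [event-logs] section of the example override.cfg.
LOG_NAME_PATTERN = re.compile(r".*tracking.log-(?P<date>[0-9]+).*")


def date_from_filename(name):
    """Return the date portion of a tracking log file name, or None if it does not match."""
    match = LOG_NAME_PATTERN.match(name)
    return match.group("date") if match else None


def summarize_events(lines):
    """Count problem_check events per (username, problem_id), skipping blank lines."""
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # raises ValueError if a line is not valid JSON
        payload = record.get("event")
        if record.get("event_type") != "problem_check" or not isinstance(payload, dict):
            continue
        key = (record["username"], payload["problem_id"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

    For the dummy file, date_from_filename("tracking.log-20150101-1234567890") yields "20150101", and summarize_events(open(path)) reports two problem_check events for test_user and one for test_user_alt.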
    

    3. Run the pipeline task, check its aggregated results in MySQL, then start the API locally

    cd {PATH_TO_EDX_FOLDER}/analytics/edx-analytics-pipeline
    source ~/.venvs/edx-analytics-pipeline/bin/activate
    launch-task AnswerDistributionToMySQLTaskWorkflow --local-scheduler --remote-log-level DEBUG --include '*tracking.log*' --src /tmp/log_files --dest /tmp/answer_dist --mapreduce-engine local --name test_task
    mysql -u root -p
    
    USE analytics;
    SELECT COUNT(*) FROM answer_distribution;
    

    If the pipeline task ran successfully (and you used the dummy file above), this should be the output:

    +----------+
    | COUNT(*) |
    +----------+
    |        2 |
    +----------+
    1 row in set (0.00 sec)
    
    exit
    deactivate
    cd ../edx-analytics-data-api
    source ~/.venvs/edx-analytics-data-api/bin/activate
    ./manage.py runserver --settings=analyticsdataserver.settings.local_mysql
    

    Verify that the data API can connect to the database

    1. Navigate to 127.0.0.1:8000 in your web browser:

    • If the page does not display and you see ImproperlyConfigured: Error loading MySQLdb module in the logs, run: 'pip install mysql-python'
    • If the page returns a 401 (unauthorized) error, rerun: './manage.py set_api_key edx edx'

    2. Click on the answer_distribution query modal and enter 'i4x-edX-DemoX-S-problem-a58470ee54cc49ecb2bb7c1b1c0ab43a_2_1' into the box (or a different module_id from your logs if you didn't use the dummy log file from above)

    3. Click to request the data from the API, and the results should match the log file from above (or whichever you used)
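
    The same query can be prepared from a script instead of the browser. The sketch below is standalone Python 3 and rests on two assumptions that are not confirmed by this guide: that the route is /api/v0/problems/<module_id>/answer_distribution, and that the key set with set_api_key is sent as an "Authorization: Token ..." header. Check both against the API documentation page served at 127.0.0.1:8000. The function name is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_answer_distribution_request(module_id, api_key="edx",
                                      base_url="http://127.0.0.1:8000"):
    """Build (but do not send) a GET request for one problem's answer distribution.

    The route and the token header format are assumptions; adjust them to
    match what the local API documentation page shows.
    """
    url = "{}/api/v0/problems/{}/answer_distribution".format(
        base_url, quote(module_id, safe=""))
    return Request(url, headers={"Authorization": "Token " + api_key})
```

    With the server from the previous step running, urllib.request.urlopen(request).read() sends the request and returns the raw response body.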

  • Original article: https://www.cnblogs.com/zhaojianwei/p/4666871.html