EZPZ is a collection of tools to support OCLC EZProxy log and report file analyses.
They are designed around the currently manual steps used to collate two kinds of files: the EZProxy Report files and the EZProxy Log files.
##Processing goals include:
###Process EZProxy Report files:
- Extract campus per user
- Anonymize userids
- Extract date and time into separate fields
- Extract session id
- Store as CSV
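
One possible way to handle the "Anonymize userids" goal is a keyed hash, so the same user maps to the same token in every file without exposing the original id. This is only a sketch: the function name `anonymize_userid`, the secret key, the 16-character token length, and the case-insensitive normalization are all assumptions, not part of the existing EZPZ scripts.

```python
import hashlib
import hmac


def anonymize_userid(userid: str, secret: bytes, length: int = 16) -> str:
    """Return a stable pseudonym for a userid (hypothetical helper).

    Uses HMAC-SHA256 with a private key so tokens cannot be rebuilt
    from a list of known userids without that key.
    """
    # Assumption: userids are case-insensitive and may carry stray whitespace.
    normalized = userid.strip().lower().encode("utf-8")
    digest = hmac.new(secret, normalized, hashlib.sha256).hexdigest()
    return digest[:length]
```

Using the same secret for every month keeps tokens comparable across monthly reports; changing it breaks that link.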
###Process EZProxy log files:
- Extract session id
- Extract URLs
- Break URLs into components: service, server, resource
- Store as CSV
- Other stuff, TBD
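
The URL goals above could be sketched with Python's standard library as follows. The mapping is an assumption: "service" is taken to mean the URL scheme, "server" the host name, and "resource" the path plus query string. The function names `split_url` and `urls_to_csv` are hypothetical and do not describe how findezpURLS works.

```python
import csv
from urllib.parse import urlsplit

FIELDS = ["service", "server", "resource"]


def split_url(url):
    """Split a URL into service, server, and resource (assumed mapping)."""
    parts = urlsplit(url)
    resource = parts.path
    if parts.query:
        resource += "?" + parts.query
    return {
        "service": parts.scheme,          # e.g. http or https
        "server": parts.hostname or "",   # lower-cased host, port removed
        "resource": resource,             # path plus query string, if any
    }


def urls_to_csv(urls, out):
    """Write one CSV row per URL to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for url in urls:
        writer.writerow(split_url(url))
```

For example, `split_url("http://www.jstor.org:80/stable/12345?seq=1")` gives the service `http`, the server `www.jstor.org`, and the resource `/stable/12345?seq=1`.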
##Summary of Analysis:
Combine the monthly reports and the daily log files into a set of monthly reports about users, resources, campuses, and time.
To do this, create monthly logs from the daily logs (1), then join them to the monthly report via the session id (2, 3).
During that process, create a campus id, add a user status (student, faculty, etc.), and anonymize userids. Then generate the reports.
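
The join on session id might look like the sketch below. It assumes both files have already been parsed into dictionaries; the field names `session`, `campus`, and `user`, and the function name `join_on_session`, are hypothetical.

```python
def join_on_session(report_rows, log_rows):
    """Attach campus and user from the report to each matching log entry."""
    # Index the report by session id. If a session id repeats, the last
    # row wins, which is only safe if session ids are unique per month.
    by_session = {row["session"]: row for row in report_rows}

    joined = []
    for entry in log_rows:
        info = by_session.get(entry["session"])
        if info is None:
            continue  # log line with no matching login in the report
        joined.append({**entry, "campus": info["campus"], "user": info["user"]})
    return joined
```

Building the dictionary once keeps the join to a single pass over the log, which matters because the monthly log is much larger than the report.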
##Processing Steps:
- Append the daily log files into a monthly file:

      ls ezp*.log > 201601.lst
      cat $(cat 201601.lst) > 201601.log

- Delete the daily files
- Cut the top of the report file into one or more new files
  - All campuses:

        head -n $(grep -n Login\ summary ccl201511_report.log |cut -d: -f1) ccl201511_report.log

  - Optional: Only CGU (for example):

        head -n $(grep -n Login\ summary ccl201512_report.log |cut -d: -f1) ccl201512_report.log |grep cgu > ezp201512CGU.txt

  - OR: use the bash script ezpPullCampusReport [monthly report] [campus report] [campus], e.g.

        ~/ezpPullCampusReport ccl201602_report.log ccl201602_report_PIT.log pit

    ...will read ccl201602_report.log and create ccl201602_report_PIT.log
- ezpCountUIDS.py counts the unique user ids per campus in the report.
- To create a campus log file, use xtrctCampusEzp.py with the above campus report and the monthly log file, e.g.

      ~/ezpz/xtrctCampusEzp.py 201604.log ezp201604KGI.log ccl201604_report_KGI.log

  ...will extract a campus's log lines into a new file like ezp201604KGI.log
- Extract URLs from the new file and split them into component parts for analysis:

      findezpURLS logfile output
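
As a starting point for porting the shell steps to Python, the first two steps (appending the daily logs and cutting the top of the report) could be sketched as below. The function names `append_daily_logs` and `report_head` are hypothetical, and the sketch assumes the daily files sort correctly by name and that the report header ends at the first line containing "Login summary".

```python
import glob
import shutil


def append_daily_logs(pattern, monthly_path):
    """Concatenate the files matching pattern, in name order, into monthly_path.

    Returns the list of daily files that were appended.
    """
    # Collect the inputs before creating the output so it is never read back in.
    daily = sorted(p for p in glob.glob(pattern) if p != monthly_path)
    with open(monthly_path, "wb") as out:
        for path in daily:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return daily


def report_head(report_path, marker="Login summary"):
    """Return the report lines up to and including the first marker line.

    If the marker never appears, the whole file is returned.
    """
    head = []
    with open(report_path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            head.append(line)
            if marker in line:
                break
    return head
```

For example, `append_daily_logs("ezp*.log", "201601.log")` mirrors the `ls` and `cat` pair above, and filtering the result of `report_head("ccl201512_report.log")` for lines containing `cgu` mirrors the CGU-only variant.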
####Todo:
- Generalize xtrctKGIezp.py to work for any campus, any month
- Speed improvements needed!
- Are session ids unique per month, year, forever?
- Web UI
- Port bash functions into python script(s)
- Combine scripts into one application