Merge pull request #9 from ARGOeu/devel
Version 0.2
themiszamani authored Apr 19, 2022
2 parents 8131b63 + 674e644 commit ff59fc6
Showing 15 changed files with 550 additions and 134 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -127,3 +127,6 @@ dmypy.json

# Pyre type checker
.pyre/

# sensitive data
data/
91 changes: 83 additions & 8 deletions README.md
@@ -1,17 +1,17 @@
# eosc-recommender-metrics
A framework for counting the recommender metrics

# Preprocessor v.1.0
# Preprocessor v.0.2
<p align="center">
<a href="https://github.com/nikosT/Gisola">
<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/Preprocessor.png" width="70%"/>
<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/Preprocessor.png">
<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/Preprocessor.png" width="70%"/>
</a>
</p>

# RS metrics v.1.0
# RS metrics v.0.2
<p align="center">
<a href="https://github.com/nikosT/Gisola">
<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/master/docs/RSmetrics.png" width="70%"/>
<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/RSmetrics.png">
<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/RSmetrics.png" width="70%"/>
</a>
</p>

@@ -20,13 +20,88 @@ A framework for counting the recommender metrics

# Dependencies
1. Install Conda from here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html. Tested on conda v 4.10.3.
2. Run from terminal: `conda env create -f rsmetrics_env.yml`
2. Run from terminal: `conda env create -f environment.yml`
3. Run from terminal: `conda activate rsmetrics`
4. Run from terminal: `chmod +x ./preprocessor.py ./rsmetrics.py`

# Usage
7. Run from terminal: `./preprocessor.py` in order to prepare the data for the RSmetrics
8. Run from terminal: `./rsmetrics.py` to run RSmetrics
```bash

_____
| __ \
| |__) | __ ___ _ __ _ __ ___ ___ ___ ___ ___ ___ _ __
| ___/ '__/ _ \ '_ \| '__/ _ \ / __/ _ \/ __/ __|/ _ \| '__|
| | | | | __/ |_) | | | (_) | (_| __/\__ \__ \ (_) | |
|_| |_| \___| .__/|_| \___/ \___\___||___/___/\___/|_|
| |
|_|

Version: 0.2
© 2022, National Infrastructures for Research and Technology (GRNET)

usage: preprocessor [-c [FILEPATH]] [-o [DIRPATH]] [-s [DATETIME]] [-e [DATETIME]] [-h]
[-v]

Prepare data for the EOSC Marketplace RS metrics calculation

optional arguments:
-c [FILEPATH], --config [FILEPATH]
override default configuration file (./config.yaml)
-o [DIRPATH], --output [DIRPATH]
override default output dir path (./data)
-s [DATETIME], --starttime [DATETIME]
process data starting from given datetime in ISO format (UTC)
e.g. YYYY-MM-DD
-e [DATETIME], --endtime [DATETIME]
process data ending to given datetime in ISO format (UTC) e.g.
YYYY-MM-DD
-h, --help show this help message and exit
-v, --version show program's version number and exit
```
8. Configure `./preprocessor.py` by editing `config.yaml` or by providing another configuration file with `-c`:
<p align="center">
<a href="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/preprocessor-config.png">
<img src="https://github.com/nikosT/eosc-recommender-metrics/blob/devel/docs/preprocessor-config.png" width="70%"/>
</a>
</p>
9. Run from terminal: `./rsmetrics.py` to run RSmetrics
```bash
_____ _____ _ _
| __ \ / ____| | | (_)
| |__) | (___ _ __ ___ ___| |_ _ __ _ ___ ___
| _ / \___ \| '_ ` _ \ / _ \ __| '__| |/ __/ __|
| | \ \ ____) | | | | | | __/ |_| | | | (__\__ \
|_| \_\_____/|_| |_| |_|\___|\__|_| |_|\___|___/
Version: 0.2
© 2022, National Infrastructures for Research and Technology (GRNET)
usage: rsmetrics [-i [FILEPATH]] [-s [DATETIME]] [-e [DATETIME]] [--users] [--services]
[-h] [-v]
Calculate metrics for the EOSC Marketplace RS
optional arguments:
-i [FILEPATH], --input [FILEPATH]
override default output dir (./data)
-s [DATETIME], --starttime [DATETIME]
calculate metrics starting from given datetime in ISO format
(UTC) e.g. YYYY-MM-DD
-e [DATETIME], --endtime [DATETIME]
calculate metrics ending to given datetime in ISO format (UTC)
e.g. YYYY-MM-DD
--users enable reading total users from users.csv, otherwise it will be
calculated according to the user actions
--services enable reading total services from services.csv, otherwise it
will be calculated according to the user actions
-h, --help show this help message and exit
-v, --version show program's version number and exit
```
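Both tools accept `-s`/`--starttime` and `-e`/`--endtime` as ISO-formatted datetimes in UTC. The sketch below shows one way such options could be parsed and compared with the standard library. It is an illustration only: the names `parse_utc`, `build_parser` and the program name `rsmetrics-sketch` are hypothetical, and the real scripts may handle these arguments differently.

```python
import argparse
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO date or datetime string into a timezone-aware UTC datetime."""
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: naive input is already expressed in UTC, as the help text suggests.
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def build_parser():
    # Hypothetical parser that mirrors only the two datetime options.
    parser = argparse.ArgumentParser(prog="rsmetrics-sketch")
    parser.add_argument("-s", "--starttime", metavar="DATETIME",
                        type=parse_utc, default=None)
    parser.add_argument("-e", "--endtime", metavar="DATETIME",
                        type=parse_utc, default=None)
    return parser


args = build_parser().parse_args(["-s", "2022-01-01", "-e", "2022-04-19"])

# A record timestamp falls inside the window when it lies between both bounds.
in_window = args.starttime <= parse_utc("2022-03-15") <= args.endtime
```

Converting every value to an aware UTC datetime up front avoids comparing naive and aware datetimes, which Python rejects with a `TypeError`.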


## Reporting
26 changes: 20 additions & 6 deletions config.yaml
@@ -5,16 +5,30 @@ Source:
port: 27017
db: recommender_dev

# Use the EOSC-Marketplace webpage
# to associate page_id and service_id
Marketplace:
User:
export: true
#from: 'user_actions'
#from: 'recommendations'
from: 'source'

Service:
# Use the EOSC-Marketplace webpage
# to associate page_id and service_id
download: true
path: ./page_map

#Reward:
# transition: ./transition_rewards.csv
export: true
#from: 'user_actions'
#from: 'recommendations'
from: 'source'
#from: 'page_map'

published: false # applies only on source option

User-actions:
merge: false # not implemented yet

# Calculate connector's metrics
# Calculate source's metrics
Metrics: true
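
The `from` keys above select where users and services are read from, and the comments list the accepted values for each section. A minimal sketch of checking those values after the file has been loaded into a dictionary (for example with `yaml.safe_load`) could look as follows. The nesting assumed here (top-level `User` and `Service` sections) is a guess, since the diff view does not preserve indentation, and `ALLOWED_SOURCES` and `check_sources` are hypothetical names that do not come from the repository.

```python
# Accepted values for "from", taken from the comments in config.yaml.
ALLOWED_SOURCES = {
    "User": {"user_actions", "recommendations", "source"},
    "Service": {"user_actions", "recommendations", "source", "page_map"},
}


def check_sources(config):
    """Return a list of human-readable problems; an empty list means the config passes."""
    problems = []
    for section, allowed in ALLOWED_SOURCES.items():
        value = config.get(section, {}).get("from")
        if value not in allowed:
            problems.append(
                "{}.from={!r} is not one of {}".format(section, value, sorted(allowed))
            )
    return problems


# Assumed shape of the parsed configuration.
example = {
    "User": {"export": True, "from": "source"},
    "Service": {
        "download": True,
        "path": "./page_map",
        "export": True,
        "from": "source",
        "published": False,
    },
}

problems = check_sources(example)
```

Failing early on an unknown `from` value would give a clearer message than an error raised later during export.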


Binary file modified docs/Preprocessor.png
Binary file added docs/Preprocessor.png.old
Binary file not shown.
Binary file modified docs/RSmetrics.png
Binary file added docs/RSmetrics.png.old
Binary file not shown.
Binary file added docs/preprocessor-config.png
20 changes: 16 additions & 4 deletions environment.yml
@@ -4,7 +4,7 @@ channels:
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=4.5=1_gnu
- ca-certificates=2022.3.29=h06a4308_0
- ca-certificates=2022.3.18=h06a4308_0
- certifi=2021.10.8=py39h06a4308_2
- ld_impl_linux-64=2.35.1=h7274673_9
- libffi=3.3=he6710b0_2
@@ -17,16 +17,28 @@ dependencies:
- python=3.9.11=h12debd9_2
- readline=8.1.2=h7f8727e_1
- setuptools=58.0.4=py39h06a4308_0
- sqlite=3.38.2=hc218d9a_0
- sqlite=3.38.0=hc218d9a_0
- tk=8.6.11=h1ccaba5_0
- tzdata=2022a=hda174b7_0
- tzdata=2021e=hda174b7_0
- wheel=0.37.1=pyhd3eb1b0_0
- xz=5.2.5=h7b6447c_0
- zlib=1.2.11=h7f8727e_4
- pip:
- beautifulsoup4==4.10.0
- charset-normalizer==2.0.12
- idna==3.3
- pymongo==4.0.2
- joblib==1.1.0
- natsort==8.1.0
- numpy==1.22.3
- pandas==1.4.2
- pymongo==4.1.0
- python-dateutil==2.8.2
- pytz==2022.1
- pyyaml==6.0
- requests==2.27.1
- scikit-surprise==1.1.1
- scipy==1.8.0
- six==1.16.0
- soupsieve==2.3.2
- surprise==0.1
- urllib3==1.26.9
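
Because the `pip` section pins exact versions, it can be useful to confirm that an activated environment matches them. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+) to compare a few pins against what is installed. The helper names `parse_pin` and `check_pins` are hypothetical, and the pin lines are copied by hand rather than read from `environment.yml`.

```python
from importlib import metadata


def parse_pin(line):
    """Split a '- name==version' line into (name, version)."""
    name, _, version = line.strip().lstrip("- ").partition("==")
    return name, version


def check_pins(pin_lines):
    """Map each package name to (pinned version, installed version or None)."""
    report = {}
    for line in pin_lines:
        name, wanted = parse_pin(line)
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed in this environment
        report[name] = (wanted, found)
    return report


pins = ["- pymongo==4.1.0", "- pandas==1.4.2", "- pyyaml==6.0"]
for package, (wanted, found) in check_pins(pins).items():
    status = "ok" if wanted == found else "mismatch"
    print("{}: pinned {}, installed {} ({})".format(package, wanted, found, status))
```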
83 changes: 61 additions & 22 deletions get_service_catalog.py
@@ -7,43 +7,82 @@



# Main logic
def main(args=None):
# call eosc marketplace with ample number of services per page: default = 1000
url = "https://marketplace.eosc-portal.eu/services?page=1&per_page={}".format(str(args.items))

print("Retrieving page: marketplace list of services... \nGrabbing url: {0}".format(url))
def get_eosc_marketplace_url(num_of_items=1000):
"""Constructs the EOSC Marketplace URL to grab the complete service catalog (list of available services) in one request.
Args:
num_of_items (int, optional): Number of items per page, passed as a URL parameter when contacting the EOSC Marketplace webpage so that all services are fetched in one request. Defaults to 1000.
Returns:
string: EOSC Marketplace URL with the necessary URL parameters to grab the list of all available services
"""
url = "https://marketplace.eosc-portal.eu/services?page=1&per_page={}".format(
str(num_of_items))
return url


# Contacts eosc marketplace page to retrieve the complete list of items in a single take
def get_service_catalog_page_content(url):
"""Returns the HTML Page content of EOSC Marketplace Service Page catalog
Args:
url (string): url to EOSC Marketplace Service list
Returns:
bytes: html content of the eosc marketplace service list page
"""
page = requests.get(url)
return page.content

print("Page retrieved!\nGenerating results...")
soup = BeautifulSoup(page.content, 'html.parser')
def get_service_catalog_items(content):
"""Parses EOSC Marketplace service list html page and extracts all active services.
Each service is described by a list of three items: [service_id, service_name, service_path]
# Find all h2 that contain the data-e2e attribute equal to service-id
results = soup.findAll("h2", {"data-e2e":"service-id"})
Args:
content (bytes): Html content of EOSC Marketplace page containing the complete list of available services
Returns:
list of lists: A list of service entries. Each service entry is a three-item list containing: [service_id, service_name, service_path]
"""
rows = []
# populate rows with each row = [service id, service name, service path]
soup = BeautifulSoup(content, 'html.parser')
results = soup.findAll("h2", {"data-e2e": "service-id"})
for item in results:
a = item.findChildren("a",recursive=False)[0]
row = [int(item.attrs["data-service-id"]),item.text.strip(),a['href']]
a = item.findChildren("a", recursive=False)[0]
row = [int(item.attrs["data-service-id"]),
item.text.strip(), a['href']]
rows.append(row)
# sort rows by id
# sort rows by id
rows = sorted(rows, key=lambda x: x[0])

# output to csv
with open(args.output, "w") as f:
return rows

def save_service_items_to_csv(items, output):
with open(output, "w") as f:
writer = csv.writer(f)
writer.writerows(rows)

writer.writerows(items)

# Main logic
def main(args=None):
# call eosc marketplace with ample number of services per page: default = 1000
url = get_eosc_marketplace_url(args.items)
print(
"Retrieving page: marketplace list of services... \nGrabbing url: {0}".format(url))
page_content = get_service_catalog_page_content(url)
print("Page retrieved!\nGenerating results...")
results = get_service_catalog_items(page_content)
# output to csv
save_service_items_to_csv(results, args.output)
print("File written to {}".format(args.output))


# Parse arguments and call main
# Parse arguments and call main
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Retrieve service catalog from eosc marketplace")
parser = argparse.ArgumentParser(
description="Retrieve service catalog from eosc marketplace")
parser.add_argument(
"-n", "--num-of-items", metavar="STRING", help="Number of items per page", required=False, dest="items", default="1000")
parser.add_argument(
"-o", "--output", metavar="STRING", help="Output csv file", required=False, dest="output", default="./service_catalog.csv")

# Parse the arguments
sys.exit(main(parser.parse_args()))
sys.exit(main(parser.parse_args()))
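
Splitting `main` into separate fetch and parse functions means the parsing step can be exercised on a static HTML string without contacting the Marketplace. The sketch below illustrates that idea with a stand-in parser built on the standard library's `html.parser` instead of BeautifulSoup, so it runs without extra dependencies. It is not the repository's implementation: `ServiceHeadingParser`, `parse_services` and the `SAMPLE` markup are hypothetical, and the sample only assumes the `data-e2e="service-id"` and `data-service-id` attributes that the committed code looks for.

```python
from html.parser import HTMLParser


class ServiceHeadingParser(HTMLParser):
    """Collects [service_id, service_name, service_path] from matching <h2> tags."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # entry being built while inside a matching <h2>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and attrs.get("data-e2e") == "service-id":
            self._current = {
                "id": int(attrs["data-service-id"]),
                "text": [],
                "href": None,
            }
        elif tag == "a" and self._current is not None and self._current["href"] is None:
            # Keep the first link found inside the heading.
            self._current["href"] = attrs.get("href")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._current is not None:
            entry = self._current
            self.rows.append(
                [entry["id"], "".join(entry["text"]).strip(), entry["href"]]
            )
            self._current = None


def parse_services(html):
    """Return service rows sorted by id, like the committed parsing function does."""
    parser = ServiceHeadingParser()
    parser.feed(html)
    parser.close()
    return sorted(parser.rows, key=lambda row: row[0])


# Hypothetical markup that imitates the attributes the scraper relies on.
SAMPLE = """
<div>
  <h2 data-e2e="service-id" data-service-id="20"><a href="/services/b">Service B</a></h2>
  <h2 data-e2e="service-id" data-service-id="3"><a href="/services/a"> Service A </a></h2>
  <h2>Unrelated heading</h2>
</div>
"""

rows = parse_services(SAMPLE)
```

The same static-input approach should work with `get_service_catalog_items` itself when `beautifulsoup4` is installed, by passing it a sample page instead of the result of `requests.get`.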