Modern Information Research and Processing Technologies. MFJ-101. Practical Task 2. Data Mining. Simple Clustering
Preparation for Data Mining:
1. Download the Orange Data Mining tool: https://orange.biolab.si/download/#windows
Use the portable version for easy access (no installation required): https://download.biolab.si/download/files/Orange3-3.27.0.zip
2. Prepare data for mining: collect up to 6 media texts and save each one as a separate .txt file in a single destination folder.
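Orange reads the folder itself, but a quick way to check what the Import Documents step will see is to load the folder in Python. The following is a minimal sketch, assuming the files are UTF-8 encoded; `load_corpus` and `MAX_DOCS` are hypothetical names, and using the file name without its extension as the document name is an assumption about how Import Documents labels documents.

```python
from pathlib import Path

MAX_DOCS = 6  # the task asks for up to 6 media texts


def load_corpus(folder):
    """Read every .txt file in `folder` and return a sorted list of (name, text) pairs.

    The name is the file name without its extension (assumed to resemble the
    "name" column that Import Documents shows). Files with other extensions are ignored.
    """
    paths = sorted(Path(folder).glob("*.txt"))
    if len(paths) > MAX_DOCS:
        raise ValueError(f"expected at most {MAX_DOCS} .txt files, found {len(paths)}")
    return [(p.stem, p.read_text(encoding="utf-8")) for p in paths]
```

For example, calling `load_corpus` on the destination folder returns one `(name, text)` pair per .txt file and fails loudly if more than 6 files were collected.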
Simple data mining clustering:
Legend: in the clustering scenario, the upper level is for checking the data and the lower level is the deployed pipeline.
Right-click on the canvas to open the context menu, type to search for a widget, and left-click to select it.
Deploy:
3. Open Import Documents and browse to the destination folder.
4. Link it to Corpus Viewer to see the content of the sources.
5. Link it to Preprocess Text and select Regexp tokenization.
6. Connect Preprocess Text to Bag of Words and set Regularization to L2 (Euclidean).
7. Connect Bag of Words to Distances and select the Cosine metric.
8. Link Hierarchical Clustering at the end of the branch.
9. Open Hierarchical Clustering, set Linkage to Ward and Annotation to name, then drag the cut-off marker (the vertical line) to change the clustering level and see how the sources group into clusters.
10. Save the final result as an image file.
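The deploy steps can be approximated outside Orange to see what each widget contributes. The following is a minimal, standard-library-only Python sketch under stated assumptions, not Orange's implementation: tokens come from the `\w+` pattern after lowercasing (assumed to resemble Preprocess Text with Regexp), Bag of Words uses raw counts with L2 normalization, Distances uses cosine distance, and clustering uses Ward linkage through the Lance-Williams update. All function names are hypothetical, and `threshold` plays the role of the vertical cut-off line.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"\w+")  # assumed to resemble Orange's Regexp tokenizer pattern


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


def bag_of_words(docs):
    """Return the shared vocabulary and one L2-normalized count vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        v = [float(c[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # avoid dividing by zero for empty docs
        vectors.append([x / norm for x in v])
    return vocab, vectors


def cosine_distance(u, v):
    """1 - cosine similarity; an all-zero vector is treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0
    return max(0.0, 1.0 - dot / (nu * nv))  # clamp tiny negative rounding errors


def pair(i, j):
    return (i, j) if i < j else (j, i)


def ward_clusters(dist, threshold):
    """Merge clusters with Ward linkage until the next merge height exceeds threshold.

    Ward's update never produces inversions, so stopping at the first merge above
    the threshold is the same as cutting the dendrogram at that height.
    """
    clusters = {i: [i] for i in range(len(dist))}
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(dist)
    while len(clusters) > 1:
        (a, b), h = min(d.items(), key=lambda kv: kv[1])
        if h > threshold:
            break
        na, nb = len(clusters[a]), len(clusters[b])
        merged = clusters.pop(a) + clusters.pop(b)
        new_d = {}
        for k, members in clusters.items():
            nk = len(members)
            dak, dbk = d[pair(a, k)], d[pair(b, k)]
            # Lance-Williams form of Ward's update, applied to squared distances
            value = ((na + nk) * dak ** 2 + (nb + nk) * dbk ** 2 - nk * h ** 2) / (na + nb + nk)
            new_d[(k, next_id)] = math.sqrt(max(value, 0.0))
        d = {key: val for key, val in d.items() if a not in key and b not in key}
        d.update(new_d)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())


def cluster_documents(named_docs, threshold):
    """Run tokenize -> bag of words -> cosine distances -> Ward, annotated by name."""
    names = [n for n, _ in named_docs]
    _, vectors = bag_of_words([t for _, t in named_docs])
    dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
    return [[names[i] for i in group] for group in ward_clusters(dist, threshold)]
```

For example, with two short texts about cats and two about stock markets, a threshold of 0.5 should put each pair in its own group, while a large threshold merges all four into one cluster, the same effect as dragging the line in the widget. Because cosine distance ignores vector length, the L2 normalization here only mirrors the Bag of Words setting and does not change the distances.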
Task results will be accepted until 13:00 on October 30, either as a direct comment here or via Telegram.