I have a document library that houses job reqs that our users search against. We want to provide them with a list of tech keywords they can use for searching purposes.
I want to build a list of technical terms based on word density from these documents. Obviously, I'm not interested in noise words like "the", "a", etc.
I don't have a predetermined list to start with, so I'd like to build a report that shows how many times each unique word appears across this document library. I want to use this report to create a technology term list that I can reuse later.
I'm imagining a report that looks something like this:
| Keyword | Count |
| --- | --- |
| SharePoint | 56 |
| C# | 100 |
| ... | ... |
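To make the idea concrete, here is a rough Python sketch of the kind of counting I have in mind. It assumes the text has already been extracted from each document into plain strings (how to do that depends on the library and file types, which I'm leaving out). `STOP_WORDS`, `TOKEN_RE`, and `keyword_counts` are names made up for illustration, and the stop-word list is only a tiny placeholder.

```python
import re
from collections import Counter

# Placeholder noise-word list (an assumption); a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "with",
              "for", "is", "are", "we", "need"}

# Tokenizer that tries to keep terms such as "C#", "C++", ".NET" and
# "Node.js" in one piece instead of splitting on punctuation.
TOKEN_RE = re.compile(r"\.?[A-Za-z][A-Za-z0-9]*(?:[+#]+|(?:\.[A-Za-z0-9]+)*)")


def keyword_counts(documents):
    """Return (keyword, count) pairs, most frequent first.

    `documents` is any iterable of plain-text strings. Counting is
    case-insensitive; the first spelling seen is used for display.
    """
    counts = Counter()
    display = {}
    for text in documents:
        for token in TOKEN_RE.findall(text):
            key = token.lower()
            if key in STOP_WORDS:
                continue
            display.setdefault(key, token)
            counts[key] += 1
    return [(display[key], n) for key, n in counts.most_common()]


if __name__ == "__main__":
    # Made-up sample text standing in for extracted job reqs.
    sample = [
        "We need a SharePoint and C# developer.",
        "C# with .NET and SharePoint; C# is required.",
    ]
    print("| Keyword | Count |")
    for keyword, count in keyword_counts(sample):
        print(f"| {keyword} | {count} |")
```

This still counts ordinary words like "developer", so I'd expect to review the output by hand and pick out the technology terms.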
Any ideas? Any suggestions would be helpful.
Thanks