How do I begin using ProQuest TDM Studio? And where do I go for help?
Answer
ProQuest TDM Studio is ProQuest’s text and data mining (TDM) platform, built for researchers who want to mine licensed ProQuest content. Researchers cannot bulk download articles from ProQuest databases, and they cannot scrape the database interfaces using scripts or other automated processes. TDM Studio consists of two components: the workbench and visualizations.
You can access ProQuest TDM Studio by going to their website and clicking the create account button in the upper-right corner. Use your George Mason University email address. Once you receive a confirmation email, you can begin using the platform.
Before you start, read the Terms of Use to understand what you can and cannot do with the content and the platform. In particular, you cannot feed the data or datasets you create within TDM Studio into artificial intelligence (AI) tools.
What is the difference between visualizations and the workbench?
Those new to text and data mining can start with the visualization component of the platform, which is a user-friendly way to engage with ProQuest content and does not require any knowledge of programming languages. Researchers can create datasets using licensed ProQuest content and analyze those datasets with any of three visualization options: sentiment analysis, geographic analysis, and topic modeling.
The workbench enables researchers to create datasets using licensed ProQuest content and analyze those datasets by running Python or R scripts in an accompanying Jupyter Notebook. Researchers can download and export code, subject to a weekly limit, but they cannot download the datasets that are created within the platform. Researchers should have experience with Python or R to use the workbench effectively. If you are working collaboratively in a team, up to five researchers can have access to a workbench at one time. Each researcher or research team will have access to that project’s dataset(s) and code in Jupyter Notebook.
Feature | Visualizations | Workbench |
Number of datasets | 10 datasets | 10 datasets |
Number of documents within datasets | 10,000 documents | 2,000,000 documents |
Available content | 61,00+ publication titles; 360+ databases | 79,000+ publication titles; 360+ databases |
Export options | Metadata used to create visualizations | Export up to 1,000,000 metadata records per week; export 15 MB of data from Jupyter Notebook |
Programming skills | None required | Prior experience with Python or R recommended |
Collaborative work | No | Yes, up to 5 authors |
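Because workbench exports are capped, it can help to check the size of your output files before you request an export. The following is a minimal Python sketch, not part of TDM Studio itself. It assumes the 15 MB figure in the table is the weekly allowance and that 15 MB means 15 × 1024 × 1024 bytes; the platform may count differently. The function names and the constant are hypothetical.

```python
from pathlib import Path

# Assumption: 15 MB is treated as 15 * 1024 * 1024 bytes.
# The platform may measure the limit differently.
WEEKLY_EXPORT_LIMIT_BYTES = 15 * 1024 * 1024


def total_size_bytes(paths):
    """Return the combined size in bytes of the listed files that exist."""
    return sum(Path(p).stat().st_size for p in paths if Path(p).is_file())


def fits_weekly_limit(paths, already_exported_bytes=0,
                      limit=WEEKLY_EXPORT_LIMIT_BYTES):
    """Return True if exporting these files would stay within the limit.

    already_exported_bytes is whatever you have exported so far this week;
    you have to track that number yourself.
    """
    return already_exported_bytes + total_size_bytes(paths) <= limit
```

For example, `fits_weekly_limit(["results.csv", "figures.zip"])` returns `False` if the two files together are larger than the assumed limit. The file names here are placeholders.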
Visualizations: visualization options available | Workbench: sample scripts in Jupyter Notebook |
Sentiment analysis | Convert to dataframe |
Geographic analysis | Display document counts |
Topic modeling | Document term matrix |
| GPT batch processing |
| GPT sentiment analysis |
| Keyword in context |
| KWIC over time |
| Named entity recognition using spaCy |
| Top 10 entity recognition |
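If you are deciding whether the workbench suits your project, it may help to see what two of the analyses named above involve. The sketch below is not one of ProQuest's sample scripts. It is a plain-Python illustration of a keyword in context (KWIC) listing and a document term matrix, using only the standard library. The function names and the two sample sentences are invented for illustration, and the tokenizer is deliberately simple.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_in_context(text, keyword, window=3):
    """Return (left, keyword, right) tuples for each occurrence of keyword.

    window is the number of tokens kept on each side of the keyword.
    """
    tokens = tokenize(text)
    keyword = keyword.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


def document_term_matrix(documents):
    """Return (vocabulary, rows); rows[d][t] counts term t in document d."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts)) if counts else []
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Made-up sample documents, not ProQuest content.
docs = [
    "The city council approved the budget.",
    "The budget debate divided the council.",
]

vocabulary, rows = document_term_matrix(docs)
hits = keyword_in_context(docs[1], "budget", window=2)

print(vocabulary)
print(rows)
print(hits)  # [('the', 'budget', 'debate divided')]
```

Each row of the matrix corresponds to one document and each column to one term in the sorted vocabulary, which is the structure that analyses such as topic modeling typically start from.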
What resources are available to help me get started with TDM Studio?
Begin with the TDM Studio guide, which describes how to use the platform, how to create datasets, and the kinds of analyses you can perform. There is a quick start guide if you are short on time, and there are FAQs. The search tips are useful when you are building a search string to find and refine content for your dataset. Within the platform, you can also find help under the help and learn tab in the top navigation menu.
I'm using the workbench and need assistance. What resources are available to help me with the workbench component?
There is a walkthrough video of the workbench dashboard. You can schedule an onboarding session with a TDM Studio expert. Note that the onboarding session is for researchers using the workbench, not the visualization component.
If you require further assistance, send an email to Alyssa Fahringer, the Digital Scholarship Consultant at Data and Digital Scholarship Services: afahring@gmu.edu. Describe your project and what you need help with, and Alyssa will get back to you.