How to Work with the Data: A User Guide
The PTSD-Repository contains data from over 300 randomized controlled trial (RCT) studies of the treatment of PTSD. To read more, see our story About the PTSD-Repository. For those who want to dig into the data and conduct their own analysis, we have created this guide to outline the tools available on the platform. While you cannot do calculations or manipulate the data on the platform, you can use it to create visualizations, or you can access the data in a variety of ways off the platform for these kinds of inquiries. This guide starts at a high level by discussing the data itself, then gradually goes into greater detail on ways to work with the data. The guide is broken out into four steps, and we encourage you to read the first two to learn about the data so that you can approach your analysis in an informed way and be a responsible consumer of the data. We understand that visitors to this site have varying levels of comfort in working with data, so whether you are a friend or family member, a member of the media, a clinician, or a researcher, we have attempted to build this resource for everyone. For an overview of this guide, please reference the table of contents below.
- Step 1: Sign up as a Community User
- Step 2: Learn about the PTSD-Repository
- Step 3: Learn about the Data
- Step 4: Analyze the Data
Step 1: Sign up as a Community User
Before you get started, we encourage you to sign up as a community user. You may create visualizations and other content using the data in the Repository without an account, but in order to save them you will need to sign up with the Socrata platform. It is a quick process, and within minutes you will have a username and password to sign in to the site.
Step 2: Learn about the PTSD-Repository
The National Center for PTSD's PTSD-Repository is a collection of datasets, visualizations, and stories based on data abstracted from over 300 randomized controlled trial (RCT) studies on PTSD. The data was originally abstracted into a single dataset, but has been reformatted into multiple datasets for consumption on this platform. To learn about the data, please review the stories below to familiarize yourself with the origin of the data and how it has been organized. Some of the most important things to know are:
- The data was formatted to be machine-readable, meaning that the structure of each data table includes well-defined column variables that contain one value per row. Each row is in turn defined by some of these variables in ways that may not be intuitive. For example, in some of the datasets, there will be more than one row for each RCT.
- The data was formatted in such a way that each table is intended to capture a core characteristic of the studies, whether it be treatments, demographics, outcomes, or references.
Step 3: Learn about the Data
Once you understand what is captured by this data and how it is organized, you can turn to the resources NCPTSD has put together to inform users about key components of the data, including who has been studied and which treatments have been studied. You can also start by looking at the data itself. The datasets are easily accessible through the Data Catalog (see button in top left of page).
Once you select a dataset, you will be taken to its Primer Page. Take some time to read through the information on the Primer Page of each dataset, including:
- Description: A description of the data found in the dataset and notes on "reading" the data.
- Metadata: Information on the data, including when it was last updated, any tags, and a button to contact the dataset owner if you have questions about the dataset.
- What's in this dataset: A high-level summary of the dataset properties, including number of rows and columns and, importantly, what each row represents.
- Column-level data: the column name, a description of the column data (e.g. Column Name: Author, Year; Description: Primary Author and Year of the study), and the type of data (e.g. text, number, date).
As mentioned above, since the data has been organized to be machine-readable, users should pay careful attention to this information to ensure they aren't "misreading" the data or drawing incorrect conclusions.
Examples of row level organization
A single RCT study may have multiple rows in some tables. For example, the table to the left includes one row for each treatment arm, or intervention arm, in the RCT. Some studies only have two arms, A and B. Others, like this example, have as many as four.
In this example, a single RCT reports PTSD outcomes in various ways. Data is reported at multiple assessment points, including at the end of treatment and at later follow-ups.
At each assessment point, there are two common methods for collecting and reporting data: Intention to Treat (ITT) or Completer. Many RCTs report distinct data for both of these Analysis Types.
And remember, each treatment arm may report data as well. That's why in the table to the left you see 12 rows for a single RCT study.
There are 6 rows for Intervention A and 6 rows for Intervention B: each of the 3 assessment points (0 weeks, 6 weeks, and 12 weeks) reports a set of data for each of the 2 analysis types (ITT, Completer).
The columns that tell you this information are:
- Intervention Group
- Follow-up Assessment in Weeks
- Analysis Type
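The row count in this example can be checked with a short Python sketch. It is only an illustration: the column names come from the list above, but the value labels ("A", "B", "ITT", "Completer") and the variable names are assumptions and may not match how values are stored in the actual dataset.

```python
from itertools import product

# Values taken from the example above; the exact labels used in the
# dataset columns are assumptions made for illustration.
intervention_groups = ["A", "B"]
follow_up_weeks = [0, 6, 12]
analysis_types = ["ITT", "Completer"]

# One row per combination of arm, assessment point, and analysis type.
rows = [
    {
        "Intervention Group": group,
        "Follow-up Assessment in Weeks": weeks,
        "Analysis Type": analysis,
    }
    for group, weeks, analysis in product(
        intervention_groups, follow_up_weeks, analysis_types
    )
]

# Count how many rows belong to each intervention arm.
rows_per_group = {
    group: sum(1 for row in rows if row["Intervention Group"] == group)
    for group in intervention_groups
}

print(len(rows))       # 12
print(rows_per_group)  # {'A': 6, 'B': 6}
```

When counting studies rather than rows, it helps to first decide which combination of these three columns you want, so that each RCT is not counted several times.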
Step 4: Analyze the Data
Once you have a good understanding of the data, you may want to do your own analysis. The platform offers several tools to enable users to analyze the data. Most of these tools are accessible via the Primer Page; from here you can "View the Dataset", "Visualize" the data, download it, or get the API documentation for the dataset. More on these below.
To view the dataset, select "View the Dataset" and you'll be taken to a table view of the data. From there you can view the data in more detail, embed it, filter it, and export it, as described below.
Platform Visualization Tools
By selecting the "Visualize" button on the Primer Page, users can visualize the dataset on the platform by creating a variety of charts to draw insights from the data. To learn how to start visualizing the data, read Getting Started on our Open Data Platform.
Filtering the Data
You may want to filter a dataset in order to learn more about a specific variable, such as PTSD trials that took place at military institutions (Column: Site Type) or in a specific clinical setting (Column: Clinical Setting). To get started, go to a dataset's Primer Page and select "View Data". You'll be taken to a page with a tabular view of the dataset. Note that you may need to scroll to the right to see the whole dataset: to keep the data readable, the page may not show every column at once if the dataset is too wide. Under the Filter menu on a dataset, you will find multiple filter conditions for a variety of filter use cases. To add a filter to a dataset, navigate to the Filter section under the Filter menu and click the "Add a New Filter Condition" button.
Upon creating a new filter condition, you'll have several operators to choose from:
- is not
- starts with
- does not contain
- is blank
You can also use the "Options" menu to access additional customization such as filtering conditions matching all or any conditions.
Using these different conditions, you can create a diverse set of filters. Filters are not case-sensitive.
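If you download the data and want to reproduce this kind of filtering off platform, the operators listed above can be approximated in a few lines of Python. This is a minimal sketch, not the platform's own implementation: the function names and the sample values ("Military", "Community", "Outpatient", "Inpatient") are hypothetical, and only the column names come from this guide. Text is lower-cased before comparison to mirror the case-insensitive behavior described above.

```python
def _norm(value):
    """Lower-case a cell value; treat a missing value as an empty string."""
    return "" if value is None else str(value).strip().lower()


# Rough equivalents of the operators listed above (case-insensitive).
OPERATORS = {
    "is not": lambda cell, target: _norm(cell) != _norm(target),
    "starts with": lambda cell, target: _norm(cell).startswith(_norm(target)),
    "does not contain": lambda cell, target: _norm(target) not in _norm(cell),
    "is blank": lambda cell, target=None: _norm(cell) == "",
}


def apply_filter(rows, column, operator, target=None):
    """Return the rows whose value in `column` satisfies the operator."""
    test = OPERATORS[operator]
    return [row for row in rows if test(row.get(column), target)]


# Hypothetical sample rows; real values in the dataset may differ.
sample_rows = [
    {"Site Type": "Military", "Clinical Setting": "Outpatient"},
    {"Site Type": "Community", "Clinical Setting": "Inpatient"},
    {"Site Type": "", "Clinical Setting": "Outpatient"},
]

not_military = apply_filter(sample_rows, "Site Type", "is not", "MILITARY")
starts_out = apply_filter(sample_rows, "Clinical Setting", "starts with", "out")
blank_site = apply_filter(sample_rows, "Site Type", "is blank")

print(len(not_military), len(starts_out), len(blank_site))  # 2 2 1
```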
Filter On "AND/OR" Match Conditions
The filter panel currently only allows AND matches.
For isolating rows on OR matches, you can use the conditional formatting tool to apply a color to isolate rows matching any of the match conditions you set.
Navigate to the 'Filter' tab, set 'When' to 'Any Condition', and then add the conditions of which you would like at least one to be met. The matching rows will be highlighted visually, but they cannot currently be aggregated on their own to form a separate view.
If you want to use an AND match condition in combination with an OR match condition, create a filtered view based off of your AND match condition and use conditional formatting on this filtered view.
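The same combination can be sketched in Python for data you have downloaded: rows are first filtered on every AND condition, and the OR conditions then only flag rows, much as conditional formatting highlights them without removing anything. The function names and sample values here are hypothetical and chosen only for illustration.

```python
def matches_all(row, conditions):
    """AND match: every condition must hold for the row."""
    return all(condition(row) for condition in conditions)


def matches_any(row, conditions):
    """OR match: at least one condition must hold for the row."""
    return any(condition(row) for condition in conditions)


def filter_then_flag(rows, and_conditions, or_conditions):
    """Keep rows meeting every AND condition, then flag (not remove) OR matches."""
    filtered = [row for row in rows if matches_all(row, and_conditions)]
    return [(row, matches_any(row, or_conditions)) for row in filtered]


# Hypothetical sample rows; real values in the dataset may differ.
sample_rows = [
    {"Site Type": "Military", "Clinical Setting": "Outpatient"},
    {"Site Type": "Military", "Clinical Setting": "Inpatient"},
    {"Site Type": "Community", "Clinical Setting": "Outpatient"},
    {"Site Type": "Military", "Clinical Setting": "Residential"},
]

and_conditions = [lambda row: row["Site Type"] == "Military"]
or_conditions = [
    lambda row: row["Clinical Setting"] == "Outpatient",
    lambda row: row["Clinical Setting"] == "Inpatient",
]

flagged = filter_then_flag(sample_rows, and_conditions, or_conditions)
flags = [is_highlighted for _, is_highlighted in flagged]

print(len(flagged))  # 3
print(flags)         # [True, True, False]
```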
Once you have filtered the dataset, you will need to save the view. This will create a Filtered view of the dataset. Filtered views can be found in the Data Catalog by clicking on the "Filtered view" category in the left-hand panel.
Download the Data
Once a dataset has been loaded into Socrata, it is available for download in a number of different formats:
- CSV (Comma Separated Values)
- CSV for Excel
- CSV for Excel (Europe)
- RSS (with GeoRSS information if there is a Location column in the dataset)
- TSV for Excel (Tab Separated Values)
All of the above formats are also available for data in filtered views and are available to developers via the Socrata Open Data Consumer API.
Downloading a dataset can be done through either the Primer page or the table view. On the Primer page, just select Download at the top of the page.
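Once you have a CSV download, you can work with it in any tool you like. As one possible starting point, the sketch below reads CSV content with Python's standard library and counts rows per intervention arm. The in-memory text stands in for a downloaded file, and its rows are made-up values for illustration; only the column names are taken from this guide.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded file. With a real download you would use
# open("your_download.csv", newline="", encoding="utf-8") instead.
# The rows below are made up for illustration.
sample_csv = io.StringIO(
    "Intervention Group,Follow-up Assessment in Weeks,Analysis Type\n"
    "A,0,ITT\n"
    "A,0,Completer\n"
    "B,0,ITT\n"
    "B,6,ITT\n"
)


def count_rows_by(csv_file, column):
    """Count how many rows share each value of the given column."""
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        counts[row[column]] += 1
    return dict(counts)


counts = count_rows_by(sample_csv, "Intervention Group")
print(counts)  # {'A': 2, 'B': 2}
```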
Accessing the Data through API
Each dataset on Socrata has a corresponding Application Programming Interface, or API, document that is hosted on dev.socrata.com. This page contains details on utilizing the API for the particular dataset.
The API Docs page can be reached from either the dataset's Primer page or its data table page.
API Access from the Primer Page
Select API from the top right menu bar, which will open a pop-up window. In this window, select API Docs to go to the docs for the particular dataset.
Read more about our API Docs here.
API Access from the Data Table
Click on the blue Export button in the top panel and select SODA API to open the API drop-down. From here, select API Docs.
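The API Docs page lists the exact endpoint and field names for each dataset. As a rough sketch of what a request looks like, the Python below builds a query URL using the common Socrata pattern of a /resource/ endpoint with $where and $limit parameters. The domain (data.example.gov), the dataset identifier (abcd-1234), and the field name (site_type) are placeholders, not real values; substitute the ones shown in the API Docs for your dataset.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholders: replace with the values shown on the dataset's API Docs page.
DOMAIN = "data.example.gov"   # hypothetical domain
DATASET_ID = "abcd-1234"      # hypothetical dataset identifier


def build_query_url(domain, dataset_id, where=None, limit=None):
    """Build a JSON query URL with optional $where and $limit parameters."""
    params = {}
    if where is not None:
        params["$where"] = where
    if limit is not None:
        params["$limit"] = limit
    url = f"https://{domain}/resource/{dataset_id}.json"
    if params:
        # Keep "$" readable in parameter names; values are percent-encoded.
        url += "?" + urlencode(params, safe="$")
    return url


# "site_type" is an assumed field name used only for illustration.
url = build_query_url(DOMAIN, DATASET_ID, where="site_type = 'Military'", limit=5)

# Decode the query string again to confirm what will be sent.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded)
```

The resulting URL can be opened in a browser or fetched with any HTTP client; the sketch deliberately stops short of making a network request.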
Access via OData
Use OData to open the dataset in tools like Excel or Tableau. This provides a direct connection to the data that can be refreshed on demand within the connected application.
Connect to either the OData V2 or V4 endpoint by choosing the ellipsis (...) on the dataset page.