Introduction

The Fitbit Data Parsers take JSON data provided by the study-hub and parse it into two formats.

A human readable format that allows for use in either Microsoft Excel for clinical trial staff or querying in other languages.
Structuring data so that it can be readily processed for future analyses.

Application Structure

The parser structure manages several functions so that it automatically parses data with minimal interaction on the user’s part.

A JSON file is first parsed by Parser.py.

findOmissions.py performs several functions. It structures the data into a tree and contains several methods to find omissions in data.

printOmissions.py allows for the output of data a user defines to the terminal.

Parser.py

The Parser module contains one class, Table(), initialized with the filepath of the JSON and contains two instance variables

Values	Data Type	Description
filepath	Str	File path of the JSON file
parsedTable	pd.DataFrame	The resulting DataFrame of the parsed JSON file

Table(str)
Table.filepath -> str # The filepath of the JSON file
Table.parsedTable -> pd.DataFrame # The resulting parsed table

To parse a JSON file:

from Parser import Table
fpath = ~/documents/V2/someJsonFile.json
table = Table(fpath)
table.parsedTable()

Table() contains multiple methods to achieve the above:

Function	Input Data Type	Return Data Type	Description
openJsonFile		file	Opens and reads the JSON file at Table.filepath.
parseJsonFile		pd.DataFrame	Parses a JSON file to a pandas DataFrame
createDataFrame	pd.Series	pd.DataFrame	Creates a Pandas DataFrame from a dictionary
parseDataColumn	pd.DataFrame	[pd.DataFrame]	Creates a DataFrame from dictionaries in a Pandas DataFrame
concatAndTransposeData		pd.DataFrame	Concatenates a list of dataframes
parseNameSpace	str, str	pd.DataFrame	takes a regex string and searches for matches within the namespace column
namespaceBruteSearch		list	formats the namespace column using a reformatted regex string and self.parseNameSpace
getAllFrames		pd.DataFrame	Creates a human readable DataFrame
addFromBruteSearch		pd.DataFrame	concatenates reformatted namespace column
renameCol		pd.DataFrame	renames the Columns

Table.createDataFrame(self, series):

Parameters	Data Type	Description
series	pd.Series	A regex string for pattern matching

Values	Data Type	Description
	pd.DataFrame	Creates a Pandas DataFrame from a dictionary

Table.parseDataColumn(self, dataToClean):

Parameters	Data Type	Description
dataToClean	pd.DataFrame	A regex string for pattern matching

Values	Data Type	Description
	pd.DataFrame	The resulting DataFrame from a list of dictionaries

Table.parseNameSpace(self, regex, testString):

Parameters	Data Type	Description
regex	str	A regex string for pattern matching
testString	str	The string to find matches in

Values	Data Type	Description
regex	str	A regex string for pattern matching
testString	str	The string to find matches in

findOmissions.py

findOmissions.py creates a tree of which no node is more than two steps away from the root node.

Two classes are contained in findOmissions MainData and SubData:

MainData

MainData has several instance variables containing information pertinent to the overview of the study:

Values	Data Type	Description
df	pd.DataFrame	The current DataFrame to be processed
surveyParticipants	list	A unique list of survey participants
arrayOfSubsetObjects	list	A list containing the SubData objects detailing each patient’s participation in the study
patientsToContact	list	deprecated

Methods:

Function	Input Data Type	Return Data Type	Description
_getJsonPath		file	Opens and reads the JSON file at Table.filepath.
_stringCleaning	string	string	removes parenthesis
_convertTime	string	DateTime	converts epoch to datetime
_newTable	pd.DataFrame	pd.DataFrame	removes parenthesis in table
_getStartDates	pd.DataFrame	pd.DataFrame	Returns time requested and time completed for each survey with ID as the pKey
_construct			sets self.df using the return value from _getStartDates and sets self.surveyIDs as unique IDs
_getCondition			deprecated
createTraversal		self.arrayOfSubsetObjects	Creates a SubData class for every participant ID

_stringCleaning(self, stringToClean)

Parameters	Data Type	Description
stringToClean	Str	String containing extraneous parentheses.

_convertTime(self, unixTime)

Parameters	Data Type	Description
unixTime	Str	Time expressed in the number of seconds past Thursday, 1 January 1970, minus the number of leap seconds that have taken place since then

_getStartDates(self, table) -> table

converts epochs in table.

Parameters	Data Type	Description
table	pd.DataFrame	A pandas DataFrame containing a parsed JSON file

Values	Data Type	Description
table	pd.DataFrame	A pandas DataFrame with UnixTime objects converted to DateTime

createTraversal(self)

traverses over DataFrame and creates SubData objects for each participant ID

To set structure survey data:

From findOmissions import MainData
structure =  MainData()
structure.createTraversal()

SubData

SubData structures surveys, scoring, dates to complete, and each individuals enrollment date.

SubData’s instance variables:

Values	Data Type	Description
participantID	Str	ID of the Participant
df	pd.DataFrame	Data respective of the individual participant
uniqueSurveys	pd.DataFrame	Surveys of the participant
enrollmentDate	[DateTime]	Date of Enrollment for the participant
datesToCompleteSurveys	[DateTime]	Last possible date to complete survey respective of self.uniqueSurveys
self.contactPatient	Bool	True if staff to contact patient

Methods:

getQuestionDate(self, question, requested = requested) -> [dateTime]

Retrieves a list of the dates for each task presented

Parameters	Data Type	Description
question	Str	Survey or Task (‘psqi’, ‘vas’, ‘sibdq’, ‘sleep’)
requested	Str	time requested, default = requested, (‘completed’, ‘requested’)

calculateCompletionDates(self) -> list

Returns the dates of when individuals should receive survey tasks

calculateDateRanges(self) -> (datesOfCompletion, addWindow)

Returns a tuple of DateTimes

Values	Data Type	Description
datesOfCompletion	DateTime	First possible date of completion of survey tasks
addWindow	DateTime	Last possible date of completion of survey tasks

findOmissions(self, question) -> [participantID, completionDates, completionAfterDates, nonCompletionDates]

Parameters	Data Type	Description
question	Str	Survey or Task (‘psqi’, ‘vas’, ‘sibdq’, ‘sleep’)
requested	Str	time requested, default = requested, (‘completed’, ‘requested’)

Values	Data Type	Description
participantID	Str	The ID of the Participant
completionDates	list	Dates of task completion
completionAfterDates	list	Dates of task completion after date range
nonCompletionDates	list	Last possible dates of completion for surveys not completed

Usage

As each individual’s raw surveys and enrollment date is accessible separately from others, surveys can be scored and accessed systematically.

Surveys and their dates are 0 indexed and are accessed separately. Should one want to correlate a survey with a date or tuples of dates, their indices are equivalent.

An example of calculating PSQIs

from findOmissions import MainData

data = MainData() # creating the MainData Object
data.createTraversal() # structuring the data
for subject in data.getSubsetData():
    subject.getUniqueSurveys()
    print subject.participantID #prints an ID for the current participant
    subject.getSurveySeries('PSQI')
    print map (subject.scorePSQI, subject.uniqueSurveys) # prints an array of the PSQI scores

The Data Dashboard

The dashboard pulls all longitudinal data according to two lines of code:

import results
result = Results('#questionsToQuery')
result.output(values = False) # gives dates, defaults to true to give values

questions to query: * PSQI * SIBDQ * WongBaker * SubjectGlobalAssesmentVAS * SleepVas

The results query also does a rudimentary search. If the query is a substring in the question above, it will return a result. Careful use of this function is warranted, and it should be advised against unless the two questions have distinctly different characters.

The instance variables in the Results Class:

Values	Data Type	Description
IDs	list	List of all IDs in the study
Dates	np.array	dates of participant tasks
Values	np.array	values of participatn tasks
survey	string	string of task to retrieve results for

_getData(self) -> self.Dates, self.Values Retrieves and structures data. Structures all data according to participant task date and value.
insertDateGaps(self, dates) -> dic(self.IDs,self.Dates) Iterates over the current dictionary of survey dates. If there are a pair of dates with greater than a datetime difference of 1 day, it inserts the integer value of the number of days in “SKIPPED” strings.
output(self, date = True, values = False) -> “survey.csv”

Performs all of the functions as above to format a CSV file as the output, either for date times or values.