Manipulation

Binning time

Sometimes you'll want to aggregate over a larger window of time to ensure you have consistent readings per time point. For example, the ethoscope can record several readings per second, but tracking of a fly can sometimes be lost for a short time. Binning the time into 60-second windows smooths over these gaps.

However, this is applied to just one variable at a time, so it's only useful for specific analyses. If you want the same window applied across all variables, remember to set it as your time window length in your loading functions, as sketched below.
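
For illustration, one way that might look at load time is sketched here. It assumes your pipeline uses ethoscopy's load_ethoscope with an analysing function that accepts a time_window_length argument; treat the function names and arguments as assumptions and check them against your own loading call.

import ethoscopy as etho
from functools import partial

# hypothetical loading call: apply a 60 second time window to all variables at load time
df = etho.load_ethoscope(metadata, reference_hour = 9.0,
                         FUN = partial(etho.sleep_annotation, time_window_length = 60))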

# Sort the data into time bins, summarising a single column per bin

# bin time into groups of 60 seconds, with 'moving' as the column to aggregate
# the default aggregating function is the mean
bin_df = df.bin_time('moving', 60)

output:
                                t_bin  moving_mean
id
2019-08-02_14-21-23_021d6b|01   86400          0.75
2019-08-02_14-21-23_021d6b|01   86460          0.5
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          0.75
2020-08-07_12-23-10_172d50|19  431880          0.5
2020-08-07_12-23-10_172d50|19  431940          0.25
2020-08-07_12-23-10_172d50|20  215760          1.0

# the column containing the time and the aggregating function can both be changed

bin_df = df.bin_time('moving', 60, t_column = 'time', function = 'max')

output:
                                time_bin  moving_max
id
2019-08-02_14-21-23_021d6b|01   86400          1.0
2019-08-02_14-21-23_021d6b|01   86460          1.0
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          1.0
2020-08-07_12-23-10_172d50|19  431880          1.0
2020-08-07_12-23-10_172d50|19  431940          1.0
2020-08-07_12-23-10_172d50|20  215760          1.0
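
To make the behaviour concrete, the default call above is roughly equivalent to the following pandas operations. This is an illustrative sketch of the logic, not ethoscopy's actual implementation, and it assumes 'id' is the index and 't' the time column, as in the output shown.

# floor each timestamp to the start of its 60 second bin, then aggregate per specimen
df['t_bin'] = (df['t'] // 60) * 60
bin_df = (
    df.groupby(['id', 't_bin'])
      .agg(moving_mean = ('moving', 'mean'))
      .reset_index(level = 't_bin')
)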

Wrap time

Time in the ethoscope data is measured in seconds, but these numbers can get very large and don't look great when plotting data or showing it to others. Use this method to change the time column values to a decimal of a given time period. The default is a normal day (24 hours), which changes the time to be hours from the reference hour or the experiment start.

# Change the time column to be a decimal of a given time period, e.g. 24 hours
# wrapping can be performed in place and will not return a new behavpy
df.wrap_time(24, inplace = True)
# however, if you want to create a new dataframe, leave inplace as False (the default)
new_df = df.wrap_time(24)
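
As a quick worked example of the idea (assuming the wrap converts seconds to hours modulo the given period; check your own output to confirm):

# hypothetical worked example for a single value
t_seconds = 90000                     # 25 hours into the experiment
t_wrapped = (t_seconds / 3600) % 24   # 1.0, i.e. 1 hour into the second day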

Remove specimens with low data points

Sometimes you'll run an experiment and have a few specimens that were tracked poorly or simply have fewer data points than the rest. This can really affect some analyses, so it's best to remove them.

Specify the minimum number of data points you want per specimen; any specimen with fewer will be removed from both the metadata and the data. Remember that the minimum number of points in a single day changes with the frequency of your measurements.

# removes specimens from both the metadata and data when they have fewer data points than the user-specified amount

# 1440 is 86400 / 60, i.e. the number of data points in one whole day when readings are taken every minute

new_df = df.curate(points = 1440)
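
The same arithmetic works for any sampling frequency. For example, with a hypothetical 10 second time window:

# data binned every 10 seconds -> points needed for one full day
points_per_day = 86400 // 10   # 8640
new_df = df.curate(points = points_per_day)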

Baseline

Not all experiments are run at the same time, and you'll often have a differing number of days before an interaction (such as sleep deprivation) occurs. To align all the data so that the interaction day is the same, include in your metadata csv file a column called baseline. In it, write the number of additional days that need to be added to each specimen to align it with the longest set of baseline days.

# add additional time to each specimen's time column to make specific interaction times line up when the baseline time is not consistent
# the metadata must contain a baseline column with an integer from 0 upwards

df = df.baseline(column = 'baseline')

# the operation can also be performed in place with the inplace parameter
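
For example, if most runs had three baseline days but one run had only two, give that run a baseline of 1 so its interaction day lines up. A hypothetical metadata snippet (column names other than baseline are illustrative):

# metadata.csv
# machine_name,region_id,baseline
# ETHOSCOPE_001,1,0   <- three baseline days already
# ETHOSCOPE_002,1,1   <- two baseline days, so shift forward by one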

Add day number and phase

Add new columns to the data: one called phase states whether it's light or dark given your reference hour and a normal circadian rhythm (12:12), and another gives the day of the experiment each row falls on. However, if you're working with different circadian hours you can specify the day length and the time it turns dark.

# Add a column with a number indicating which day of the experiment each row occurred on
# Also add a column with the phase of day (light, dark) to the data
# This method is performed in place and won't return anything; however, you can make it return a new dataframe with inplace = False
df.add_day_phase(t_column = 't') # the default parameter for t_column is 't'

# if you're running circadian experiments you can change the length of the experimental day as well as the time the lights turn off, see below

# Here the experiment ran with 30 hour days and the lights turning off at ZT 15. We also changed inplace to False to return a modified behavpy rather than modify it in place
df = df.add_day_phase(day_length = 30, lights_off = 15, inplace = False)
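
Once added, the new columns can be used for filtering like any other. A small sketch, assuming the columns are named 'day' and 'phase' as described above:

# e.g. keep only the rows from the dark phase of day 2
dark_day2 = df[(df['phase'] == 'dark') & (df['day'] == 2)]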
