Manipulation

Binning time

Sometimes you'll want to aggregate readings over a longer time window so that every time point has a consistent value. For example, the ethoscope can record several readings per second, but tracking of a fly is sometimes lost for a short time. Binning the time into 60-second windows smooths over these gaps.

However, binning is applied to one variable only, so it is useful only in specific analyses. If you want it applied across all variables, set it as the time window length in your loading functions instead.

# Sort the data into bins of time with a single column to summarise the bin

# bin time into groups of 60 seconds, with 'moving' as the column to aggregate
# default aggregating function is the mean
bin_df = df.bin_time('moving', 60)

output:
                                t_bin  moving_mean
id
2019-08-02_14-21-23_021d6b|01   86400          0.75
2019-08-02_14-21-23_021d6b|01   86460          0.5
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          0.75
2020-08-07_12-23-10_172d50|19  431880          0.5
2020-08-07_12-23-10_172d50|19  431940          0.25
2020-08-07_12-23-10_172d50|20  215760          1.0

# the column containing the time and the aggregating function can be changed

bin_df = df.bin_time('moving', 60, t_column = 'time', function = 'max')

output:
                                time_bin  moving_max
id
2019-08-02_14-21-23_021d6b|01   86400          1.0
2019-08-02_14-21-23_021d6b|01   86460          1.0
2019-08-02_14-21-23_021d6b|01   86520          0.0
2019-08-02_14-21-23_021d6b|01   86580          0.0
2019-08-02_14-21-23_021d6b|01   86640          0.0
...                               ...          ...
2020-08-07_12-23-10_172d50|19  431760          1.0
2020-08-07_12-23-10_172d50|19  431820          1.0
2020-08-07_12-23-10_172d50|19  431880          1.0
2020-08-07_12-23-10_172d50|19  431940          1.0
2020-08-07_12-23-10_172d50|20  215760          1.0
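
To make the idea behind binning concrete, here is a minimal sketch in plain pandas. It is not the library's implementation of bin_time, only an illustration of the same idea on made-up data: each timestamp is floored to the start of its 60-second bin, then the chosen column is averaged per specimen and bin. The toy values and the id 'fly_01' are assumptions for illustration.

```python
import pandas as pd

# Toy data (made up): one specimen with readings at irregular times in seconds.
df = pd.DataFrame(
    {"t": [0, 10, 35, 59, 60, 95, 130], "moving": [1, 0, 1, 1, 0, 0, 1]},
    index=pd.Index(["fly_01"] * 7, name="id"),
)

bin_secs = 60

# Floor each timestamp to the start of its 60-second bin.
df["t_bin"] = (df["t"] // bin_secs) * bin_secs

# Aggregate the chosen column per specimen and bin (mean, as in the default).
binned = (
    df.groupby(["id", "t_bin"])["moving"]
    .mean()
    .rename("moving_mean")
    .reset_index("t_bin")
)
print(binned)
```

Swapping .mean() for .max() mirrors the change of aggregating function shown in the second example above.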

Wrap time

Time in the ethoscope data is measured in seconds, but these numbers can get very large and don't look great when plotting data or showing it to others. Use this method to change the time column values to decimal values within a given time period. The default is a normal day (24 hours), which changes the time to hours from the reference hour or the experiment start.
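
As a rough illustration of what wrapping does, the sketch below applies the arithmetic in plain pandas. It is one reading of the description above, not the library's own code: time in seconds is taken modulo the period and converted to decimal hours. The column name 't_wrapped' and the sample values are hypothetical.

```python
import pandas as pd

# Made-up timestamps in seconds since the reference hour.
df = pd.DataFrame({"t": [3600, 86400, 90000, 180000]})

period_hours = 24                  # the default period described above
period_secs = period_hours * 3600

# Wrap seconds into one period, then convert to decimal hours.
# 't_wrapped' is a hypothetical column name used only in this sketch.
df["t_wrapped"] = (df["t"] % period_secs) / 3600
print(df)
```

A reading taken 25 hours after the reference hour (90000 seconds) ends up at hour 1.0, so every day overlays onto the same 0 to 24 axis.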

Remove specimens with low data points

Sometimes you'll run an experiment and have a few specimens that were tracked poorly or simply have fewer data points than the rest. This can really affect some analyses, so it's best to remove them.

Specify the minimum number of data points you want per specimen; any specimen with fewer will be removed from both the metadata and the data. Remember that the minimum number of points for a single day changes with the frequency of your measurements.
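
The filtering logic can be sketched in plain pandas as follows. This is an illustration under assumptions, not the library's method: the data and metadata are modelled as two DataFrames sharing an 'id' index, and the threshold min_points, the 'sex' column and the specimen ids are made up.

```python
import pandas as pd

# Toy data: fly_01 has 5 readings, fly_02 only 3.
data = pd.DataFrame(
    {"t": range(8), "moving": [1, 0, 1, 1, 0, 1, 0, 1]},
    index=pd.Index(["fly_01"] * 5 + ["fly_02"] * 3, name="id"),
)
meta = pd.DataFrame(
    {"sex": ["M", "F"]},
    index=pd.Index(["fly_01", "fly_02"], name="id"),
)

min_points = 4  # hypothetical threshold

# Count rows per specimen and keep only the ids that meet the threshold.
counts = data.groupby("id").size()
keep_ids = counts[counts >= min_points].index

# Drop the sparse specimens from both the data and the metadata.
data_kept = data[data.index.isin(keep_ids)]
meta_kept = meta[meta.index.isin(keep_ids)]
print(meta_kept)
```

When picking a threshold, note that a full day holds 86400 / 60 = 1440 points at one reading per 60 seconds, but 8640 points at one reading per 10 seconds.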

Baseline

Not all experiments are run at the same time, and you'll often have a differing number of days before an interaction (such as sleep deprivation) occurs. To align all the data so that the interaction day is the same, include a column called baseline in your metadata csv file. In it, write the number of additional days that need to be added to align each experiment with the longest set of baseline experiments.
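
One way to picture the alignment is the plain pandas sketch below. It assumes, as an interpretation of the text above rather than a description of the library's internals, that the shift amounts to adding baseline days (converted to seconds) to each specimen's time column. The ids, values and the column name 't_aligned' are hypothetical.

```python
import pandas as pd

# fly_01 has two baseline days (interaction at t = 172800 s);
# fly_02 has one (interaction at t = 86400 s), so it needs 1 extra day.
meta = pd.DataFrame(
    {"baseline": [0, 1]},
    index=pd.Index(["fly_01", "fly_02"], name="id"),
)
data = pd.DataFrame(
    {"t": [0, 86400, 172800, 0, 86400]},
    index=pd.Index(["fly_01"] * 3 + ["fly_02"] * 2, name="id"),
)

# Look up each row's baseline offset by specimen id, in seconds.
offset_secs = data.index.map(meta["baseline"]).to_numpy() * 86400

# 't_aligned' is a hypothetical column name used only in this sketch.
data["t_aligned"] = data["t"] + offset_secs
print(data)
```

After the shift, the interaction falls at 172800 seconds for both specimens.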

Add day number and phase

This adds new columns to the data for the day number and the phase. The phase column states whether it's light or dark given your reference hour and a normal circadian rhythm (12:12). If you're working with different circadian hours, you can specify the time it turns dark.
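
The sketch below shows how such columns could be derived in plain pandas. It is not the library's code, and it rests on several assumptions: t is in seconds from the reference hour, lights come on at hour 0 of each 24-hour day, and days are numbered from 1. The column name 'day' and the variable dark_onset_hours are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up timestamps: 0 h, 6 h, 13 h, 30 h and 47 h after the reference hour.
df = pd.DataFrame({"t": [0, 6 * 3600, 13 * 3600, 30 * 3600, 47 * 3600]})

day_secs = 24 * 3600
dark_onset_hours = 12  # 12:12 cycle; change for other light regimes

# Day number, counted from 1 (an assumption of this sketch).
df["day"] = df["t"] // day_secs + 1

# Hour within the day decides the phase.
hour_of_day = (df["t"] % day_secs) / 3600
df["phase"] = np.where(hour_of_day < dark_onset_hours, "light", "dark")
print(df)
```

For a 16:8 light regime, setting dark_onset_hours to 16 would label hours 0 to 16 as light in this sketch.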
