Joining
Concatenate
Concatenate allows you to join two or more behavpy data tables, combining both the data and the metadata of each table. The tables do not need to have identical columns; where there is a mismatch, the missing column values are filled with NaNs.
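The NaN filling for mismatched columns can be illustrated with plain pandas. This is only a sketch of the underlying pandas behaviour, not the behavpy implementation; the table contents and ids (`fly_01`, `fly_02`) are made up for illustration.

```python
import pandas as pd

# Two hypothetical data tables that share 't' but differ in their other column
data_a = pd.DataFrame(
    {"t": [0, 60], "moving": [True, False]},
    index=pd.Index(["fly_01", "fly_01"], name="id"),
)
data_b = pd.DataFrame(
    {"t": [0, 60], "interactions": [3, 1]},
    index=pd.Index(["fly_02", "fly_02"], name="id"),
)

# Stack the rows; the result holds the union of both column sets
combined = pd.concat([data_a, data_b])

# Cells for a column that a table did not have are NaN
print(combined)
```

In this sketch the rows for `fly_01` have NaN in `interactions`, and the rows for `fly_02` have NaN in `moving`.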
# An example of concatenate using .xmv() to create separate data tables
df1 = df.xmv('species', 'D.vir')
df2 = df.xmv('species', 'D.sec')
df3 = df.xmv('species', 'D.ere')
# a behavpy wrapper that expands the pandas concat function to also concatenate the metadata
new_df = df1.concat(df2)
# .concat() can process multiple data frames
new_df = df1.concat(df2, df3)
Pivot
Sometimes you want summary statistics of a single column per specimen. This is where .pivot() comes in. The method takes all the values in your chosen column for each fly and applies a summary statistic. You can choose from a basic selection, e.g. mean, median, or sum, or supply your own function (it must operate on array data and return a single value).
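To show what "works on array data and returns a single output" means for a custom function, here is a minimal sketch using a plain pandas groupby as a stand-in for the per-specimen aggregation. The function `value_range`, the sample data, and the output column name `interactions_range` are all hypothetical, and this is not necessarily how .pivot() is implemented internally.

```python
import numpy as np
import pandas as pd

# Hypothetical data table: three rows per specimen, indexed by 'id'
data = pd.DataFrame(
    {"interactions": [0, 2, 5, 1, 4, 4]},
    index=pd.Index(["fly_01"] * 3 + ["fly_02"] * 3, name="id"),
)

def value_range(values):
    """Hypothetical custom statistic: largest value minus smallest value."""
    return np.max(values) - np.min(values)

# Apply the function to each specimen's values, giving one row per id
summary = (
    data.groupby("id")["interactions"]
    .agg(value_range)
    .to_frame("interactions_range")  # column name chosen for this sketch
)
print(summary)
```

A function that returns more than one value per specimen (for example, the whole array) would not fit this one-row-per-specimen shape.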
# Pivot the data frame by 'id' to find summary statistics of a selected column
# Example summary statistics: 'mean', 'max', 'sum', 'median'...
pivot_df = df.pivot('interactions', 'sum')
output:
                               interactions_sum
id
2019-08-02_14-21-23_021d6b|01                 0
2019-08-02_14-21-23_021d6b|02                43
2019-08-02_14-21-23_021d6b|03                24
2019-08-02_14-21-23_021d6b|04                15
2020-08-07_12-23-10_172d50|18                45
2020-08-07_12-23-10_172d50|19                32
2020-08-07_12-23-10_172d50|20                43
# the output column will be a string combination of the column and summary statistic
# each row is a single specimen
Re-join
Sometimes you will create an output from the pivot table, or just have a column, that you want to add to the metadata for use with other methods. The column to be added must be a pandas Series of matching length to the metadata and with the same specimen ids.
# you can add these pivoted data frames or any data frames with one row per specimen to the metadata with .rejoin()
# the joining dataframe must have an index 'id' column that matches the metadata
df = df.rejoin(pivot_df)
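The requirement that the ids match can be checked before joining. The sketch below uses plain pandas to mimic adding a one-row-per-specimen table to a metadata table; the tables and ids are invented for illustration, and it is an assumption that an index-aligned join like this reflects what .rejoin() does.

```python
import pandas as pd

# Hypothetical metadata: one row per specimen, indexed by 'id'
meta = pd.DataFrame(
    {"species": ["D.vir", "D.sec"]},
    index=pd.Index(["fly_01", "fly_02"], name="id"),
)

# Hypothetical pivoted table with the same ids, in a different row order
pivot = pd.DataFrame(
    {"interactions_sum": [7, 9]},
    index=pd.Index(["fly_02", "fly_01"], name="id"),
)

# Confirm both tables describe exactly the same specimens
ids_match = set(pivot.index) == set(meta.index)

# Join on the shared 'id' index; row order in 'pivot' does not matter
joined = meta.join(pivot)
print(joined)
```

Because the join aligns on the index, each value ends up on the correct specimen's row even though the two tables list the ids in different orders.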