Modules¶
fife.base_modelers module¶
fife.lgb_modelers module¶
FIFE modelers based on LightGBM, which trains gradient-boosted trees.
-
class
fife.lgb_modelers.
GradientBoostedTreesModeler
(**kwargs)¶ Bases:
fife.lgb_modelers.LGBSurvivalModeler
Deprecated alias for LGBSurvivalModeler.
-
class
fife.lgb_modelers.
LGBExitModeler
(exit_col, **kwargs)¶ Bases:
fife.lgb_modelers.LGBModeler
,fife.base_modelers.ExitModeler
Use LightGBM to forecast the circumstance of exit conditional on exit.
-
class
fife.lgb_modelers.
LGBModeler
(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)¶ Bases:
fife.base_modelers.Modeler
Train a gradient-boosted tree model for each lead length using LightGBM.
-
config
¶ User-provided configuration parameters.
- Type
dict
-
data
¶ User-provided panel data.
- Type
pd.core.frame.DataFrame
-
categorical_features
¶ Column names of categorical features.
- Type
list
-
duration_col
¶ Name of the column representing the number of future periods observed for the given individual.
- Type
str
-
event_col
¶ Name of the column indicating whether the individual is observed to exit the dataset.
- Type
str
-
predict_col
¶ Name of the column indicating whether the observation will be used for prediction after training.
- Type
str
-
test_col
¶ Name of the column indicating whether the observation will be used for testing model performance after training.
- Type
str
-
validation_col
¶ Name of the column indicating whether the observation will be used for evaluating model performance during training.
- Type
str
-
period_col
¶ Name of the column representing the number of periods since the earliest period in the data.
- Type
str
-
max_lead_col
¶ Name of the column representing the number of observable future periods.
- Type
str
-
spell_col
¶ Name of the column representing the number of previous spells of consecutive observations of the same individual.
- Type
str
-
weight_col
¶ Name of the column representing observation weights.
- Type
str
-
reserved_cols
¶ Column names of non-features.
- Type
list
-
numeric_features
¶ Column names of numeric features.
- Type
list
-
n_intervals
¶ The largest number of periods ahead to forecast.
- Type
int
-
model
¶ A trained LightGBM model (lgb.basic.Booster) for each lead length.
- Type
list
-
objective
¶ The LightGBM model objective appropriate for the outcome type, which is “binary” for binary classification.
- Type
str
-
num_class
¶ The num_class LightGBM parameter, which is 1 for binary classification.
- Type
int
-
build_model
(n_intervals: Union[None, int] = None, params: dict = None, parallelize: bool = True) → None¶ Train and store a sequence of gradient-boosted tree models.
-
compute_shap_values
(subset: Union[None, pandas.core.series.Series] = None) → dict¶ Compute SHAP values by lead length, observation, and feature.
-
hyperoptimize
(n_trials: int = 64, rolling_validation: bool = True, subset: Union[None, pandas.core.series.Series] = None) → dict¶ Search for hyperparameters with greater out-of-sample performance.
- Parameters
n_trials – The number of hyperparameter sets to evaluate for each time horizon. Return None if non-positive.
rolling_validation – Whether or not to evaluate performance on the most recent possible periods instead of the validation set labeled by self.validation_col. Ignored for a given time horizon if there is only one possible period for training and evaluation.
subset – A Boolean Series that is True for observations on which to train and validate. If None, default to all observations not flagged by self.test_col or self.predict_col.
- Returns
A dictionary containing the best-performing parameter dictionary for each time horizon.
-
predict
(subset: Union[None, pandas.core.series.Series] = None, cumulative: bool = True) → numpy.ndarray¶ Use trained LightGBM models to predict the outcome for each observation and time horizon.
- Parameters
subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.
cumulative – If True, produce cumulative survival probabilities. If False, produce marginal survival probabilities (i.e., one minus the hazard rate).
- Returns
A numpy array of predictions by observation and lead length.
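The relationship between the two output modes can be illustrated with a minimal sketch. It assumes each marginal probability is the chance of surviving one period given survival through all earlier periods, so the cumulative probability at each lead length is the running product along the row. The function name to_cumulative is hypothetical and not part of FIFE, and plain lists stand in for numpy arrays.

```python
from itertools import accumulate
from operator import mul


def to_cumulative(marginal_rows):
    """Convert rows of marginal survival probabilities to cumulative ones.

    Each row holds one observation's probabilities ordered by lead length.
    """
    return [list(accumulate(row, mul)) for row in marginal_rows]


# Two observations, three lead lengths each.
marginal = [[1.0, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(to_cumulative(marginal))
```

Calling predict with cumulative=False would correspond to skipping this running product.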
-
save_model
(file_name: str = 'GBT_Model', path: str = '') → None¶ Save the horizon-specific LightGBM models that comprise the model to disk.
-
train
(params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True, parallelize: bool = True) → List[lightgbm.basic.Booster]¶ Train a LightGBM model for each lead length.
-
train_single_model
(time_horizon: int, params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True) → lightgbm.basic.Booster¶ Train a LightGBM model for a single lead length.
-
transform_features
() → pandas.DataFrame¶ Transform features to suit model training.
-
-
class
fife.lgb_modelers.
LGBStateModeler
(state_col, **kwargs)¶ Bases:
fife.lgb_modelers.LGBModeler
,fife.base_modelers.StateModeler
Use LightGBM to forecast the future value of a feature conditional on survival.
-
class
fife.lgb_modelers.
LGBSurvivalModeler
(**kwargs)¶ Bases:
fife.lgb_modelers.LGBModeler
,fife.base_modelers.SurvivalModeler
Use LightGBM to forecast probabilities of being observed in future periods.
fife.nnet_survival module¶
FIFE uses the nnet_survival module of Gensheimer, M.F., and Narasimhan, B., “A scalable discrete-time survival model for neural networks,” PeerJ 7 (2019): e6257. The nnet_survival version packaged with FIFE is GitHub commit d5a8f26 on Nov 18, 2018 posted to https://github.com/MGensheimer/nnet-survival/blob/master/nnet_survival.py. nnet_survival is licensed under the MIT License. The FIFE development team modified lines 12 and 13 of nnet_survival for compatibility with TensorFlow 2.0.
-
class
fife.nnet_survival.
PropHazards
(output_dim, **kwargs)¶ Bases:
tensorflow.keras.layers.Layer
-
build
(input_shape)¶
-
call
(x)¶
-
compute_output_shape
(input_shape)¶
-
get_config
()¶
-
-
fife.nnet_survival.
make_surv_array
(t, f, breaks)¶ Transforms censored survival data into vector format that can be used in Keras.
- Arguments
t: Array of failure/censoring times.
f: Censoring indicator. 1 if failed, 0 if censored.
breaks: Locations of breaks between time intervals for discrete-time survival model (always includes 0).
- Returns
Two-dimensional array of survival data with one row per individual and two columns per time interval (number of individuals x 2 * number of time intervals).
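The vector format can be sketched in plain Python. This is a simplified illustration, not the packaged implementation: the name surv_array_sketch is hypothetical, lists stand in for arrays, and the rule that censored individuals get credit for an interval once they pass its midpoint is an assumption drawn from the cited paper rather than from this page.

```python
def surv_array_sketch(t, f, breaks):
    """Build one row per individual with 2 * n_intervals entries.

    The first half flags intervals survived; the second half flags
    the interval in which failure occurred (all zeros if censored).
    """
    n_intervals = len(breaks) - 1
    rows = []
    for time, failed in zip(t, f):
        survived = [0.0] * n_intervals
        failure = [0.0] * n_intervals
        for j in range(n_intervals):
            lower, upper = breaks[j], breaks[j + 1]
            if failed:
                # Credit an interval only if failure came at or after its end.
                survived[j] = 1.0 if time >= upper else 0.0
            else:
                # Assumption: censored individuals get credit once they
                # pass the interval midpoint.
                survived[j] = 1.0 if time >= (lower + upper) / 2 else 0.0
        if failed:
            for j in range(n_intervals):
                if time < breaks[j + 1]:
                    failure[j] = 1.0  # first interval whose end exceeds time
                    break
        rows.append(survived + failure)
    return rows


# One failure at time 1.5 and one censoring at time 2.6, three unit intervals.
print(surv_array_sketch([1.5, 2.6], [1, 0], [0, 1, 2, 3]))
```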
-
fife.nnet_survival.
nnet_pred_surv
(y_pred, breaks, fu_time)¶
-
fife.nnet_survival.
surv_likelihood
(n_intervals)¶ Create custom Keras loss function for neural network survival model.
- Arguments
n_intervals: the number of survival time intervals
- Returns
Custom loss function that can be used with Keras
-
fife.nnet_survival.
surv_likelihood_rnn
(n_intervals)¶ Create custom Keras loss function for neural network survival model. Used for recurrent neural networks with time-distributed output. This function is very similar to surv_likelihood but deals with the extra dimension of y_true and y_pred that exists because of the time-distributed output.
fife.pd_modelers module¶
FIFE modeler based on Pandas, which tabulates interacted fixed effects.
-
class
fife.pd_modelers.
IFEExitModeler
(exit_col, **kwargs)¶ Bases:
fife.pd_modelers.IFEModeler
,fife.base_modelers.ExitModeler
Forecast the circumstance of exit conditional on exit using the mean of observations with the same values.
-
class
fife.pd_modelers.
IFEModeler
(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)¶ Bases:
fife.base_modelers.Modeler
Predict with mean of training observations with same values.
-
config
¶ User-provided configuration parameters.
- Type
dict
-
data
¶ User-provided panel data.
- Type
pd.core.frame.DataFrame
-
categorical_features
¶ Column names of categorical features.
- Type
list
-
reserved_cols
¶ Column names of non-features.
- Type
list
-
numeric_features
¶ Column names of numeric features.
- Type
list
-
n_intervals
¶ The largest number of one-period intervals any individual is observed to survive.
- Type
int
-
model
¶ Survival rates for each combination of categorical values in the training data.
- Type
pd.core.frame.DataFrame
-
hyperoptimize
(**kwargs) → dict¶ Return None for InteractedFixedEffectsModeler, which does not have hyperparameters.
-
predict
(subset: Union[None, pandas.core.series.Series] = None, cumulative: bool = True) → numpy.ndarray¶ Map observations to outcome means from their categorical values.
Map observations with a combination of categorical values not seen in the training data to the mean of the outcome in the training set.
- Parameters
subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.
cumulative – If True, will produce cumulative survival probabilities. If False, will produce marginal survival probabilities (i.e., one minus the hazard rate).
- Returns
A numpy array of outcome means by observation and lead length.
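The lookup with fallback described above can be sketched as follows. This is an illustration under stated assumptions, not FIFE's code: the function names are hypothetical, tuples of categorical values stand in for DataFrame rows, and a single outcome column stands in for the outcomes by lead length.

```python
from collections import defaultdict


def fit_group_means(rows, outcomes):
    """Return the mean outcome per combination of values and the overall mean."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, y in zip(rows, outcomes):
        sums[key] += y
        counts[key] += 1
    group_means = {key: sums[key] / counts[key] for key in sums}
    overall_mean = sum(outcomes) / len(outcomes)
    return group_means, overall_mean


def predict_group_means(rows, group_means, overall_mean):
    """Look up each combination, falling back to the overall mean if unseen."""
    return [group_means.get(key, overall_mean) for key in rows]


train_rows = [("A", 1), ("A", 1), ("B", 2), ("B", 2)]
train_outcomes = [1.0, 0.0, 1.0, 1.0]
means, overall = fit_group_means(train_rows, train_outcomes)
# ("C", 3) never appears in training, so it receives the overall mean.
print(predict_group_means([("A", 1), ("C", 3)], means, overall))
```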
-
save_model
(file_name: str = 'IFE_Model', path: str = '') → None¶ Save the pandas DataFrame model to disk.
-
train
() → pandas.core.frame.DataFrame¶ Compute the mean of the outcome for each combination of categorical values.
-
-
class
fife.pd_modelers.
IFEStateModeler
(state_col, **kwargs)¶ Bases:
fife.pd_modelers.IFEModeler
,fife.base_modelers.StateModeler
Forecast the future value of a feature conditional on survival using the mean of observations with the same values.
-
class
fife.pd_modelers.
IFESurvivalModeler
(**kwargs)¶ Bases:
fife.pd_modelers.IFEModeler
,fife.base_modelers.SurvivalModeler
Predict with survival rate of training observations with same values.
-
class
fife.pd_modelers.
InteractedFixedEffectsModeler
(**kwargs)¶ Bases:
fife.pd_modelers.IFESurvivalModeler
Deprecated alias for IFESurvivalModeler.
fife.processors module¶
Data processing functions and classes for FIFE.
-
class
fife.processors.
DataProcessor
(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None)¶ Bases:
object
Prepare data by identifying features as degenerate or categorical.
-
is_categorical
(col: str) → bool¶ Determine if the given feature should be processed as categorical, as opposed to numeric.
-
is_degenerate
(col: str) → bool¶ Determine if a feature is constant or has too many missing values.
-
-
class
fife.processors.
PanelDataProcessor
(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None)¶ Bases:
fife.processors.DataProcessor
Ready panel data for modelling.
-
config
¶ User-provided configuration parameters.
- Type
dict
-
data
¶ Processed panel data.
- Type
pd.core.frame.DataFrame
-
raw_subset
¶ An unprocessed sample from the final period of data. Useful for displaying meaningful values in SHAP plots.
- Type
pd.core.frame.DataFrame
-
categorical_maps
¶ Contains for each categorical feature a map from each unique value to a whole number.
- Type
dict
-
numeric_ranges
¶ Contains for each numeric feature the maximum and minimum value in the training set.
- Type
pd.core.frame.DataFrame
-
build_processed_data
(parallelize: bool = True) → None¶ Clean, augment, and store a panel dataset and related information.
Sort data by individual and time.
Drop degenerate features.
Label subsets for prediction, validation, and testing.
Compute survival duration and whether departure is observed.
Store a subset of the raw input data from the final period.
Map categorical features to unsigned integers.
Scale numeric features.
-
build_reserved_cols
()¶ Add data split and outcome-related columns to the data.
-
check_panel_consistency
() → None¶ Ensure observations have unique individual-period combinations.
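A check of this kind can be sketched with the standard library. The function name find_duplicate_keys is hypothetical, and two parallel lists stand in for the individual and period columns of the panel; how FIFE reports a violation is not stated here.

```python
from collections import Counter


def find_duplicate_keys(individuals, periods):
    """Return individual-period pairs that appear more than once."""
    counts = Counter(zip(individuals, periods))
    return sorted(key for key, n in counts.items() if n > 1)


# Individual "a" is recorded twice in period 1.
print(find_duplicate_keys(["a", "a", "b", "a"], [1, 2, 1, 1]))
```

An empty result means every individual-period combination is unique.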
-
flag_validation_individuals
() → pandas.core.series.Series¶ Flag observations from a random share of individuals.
-
process_all_columns
(parallelize: bool = True) → None¶ Split, process, and merge all data columns.
-
process_single_column
(colname: str) → Union[None, pandas.core.series.Series]¶ Apply data cleaning functions to an individual data column.
-
sort_panel_data
() → pandas.core.frame.DataFrame¶ Sort the data by individual, then by period.
-
-
fife.processors.
check_column_consistency
(data: pandas.core.frame.DataFrame, colname: str) → None¶ Assert column exists, has no missing values, and is not constant.
-
fife.processors.
deduplicate_column_values
(data: pandas.core.frame.DataFrame, reserved_cols: List[str] = [], max_obs: int = 65536) → pandas.core.frame.DataFrame¶ Delete columns with the same values as a later column.
- Parameters
data – A DataFrame.
reserved_cols – Names of columns to exclude from deduplication.
max_obs – The number of observations to sample if data has more than that many observations.
- Returns
A DataFrame containing only the last instance of each unique column.
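The deduplication rule can be sketched as follows. This is a minimal illustration, not FIFE's implementation: the name drop_duplicate_columns is hypothetical, a dict of equal-length lists stands in for a DataFrame, the sampling controlled by max_obs is omitted, and the choice that reserved columns are neither dropped nor compared against is an assumption.

```python
def drop_duplicate_columns(table, reserved_cols=()):
    """Drop any column whose values match a later column.

    `table` maps column names to equal-length lists.
    """
    names = list(table)
    kept = {}
    for i, name in enumerate(names):
        # Assumption: reserved columns are never used as comparison targets.
        later = [other for other in names[i + 1:] if other not in reserved_cols]
        is_duplicate = name not in reserved_cols and any(
            table[name] == table[other] for other in later
        )
        if not is_duplicate:
            kept[name] = table[name]
    return kept


table = {"id": [1, 2], "a": [1, 2], "b": [3, 4], "c": [1, 2]}
# "a" repeats the values of the later column "c", so only "a" is dropped.
print(list(drop_duplicate_columns(table, reserved_cols=("id",))))
```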
-
fife.processors.
factorize_categorical_feature
(col: pandas.core.series.Series, excluded_obs: Union[None, pandas.core.series.Series] = None) → Tuple[pandas.core.series.Series, dict]¶ Map categorical values to unsigned integers.
- Parameters
col – A Series.
excluded_obs – True for observations to exclude from creating the map.
- Returns
A pandas Series of unsigned integers.
A dictionary mapping each unique value among the included observations and np.nan to an integer and any other value not in the Series to 0.
-
fife.processors.
normalize_numeric_feature
(col: pandas.core.series.Series, excluded_obs: Union[None, pandas.core.series.Series] = None) → Tuple[pandas.core.series.Series, List[float]]¶ Scale numeric values to their empirical range.
- Parameters
col – A numeric Series.
excluded_obs – True for observations to exclude when computing min/max.
- Returns
A Series of floats.
A list containing the minimum and maximum values among the included observations.
-
fife.processors.
process_categorical_feature
(col: pandas.core.series.Series, cat_map: dict) → pandas.core.frame.DataFrame¶ Map categorical values to unsigned integers.
- Parameters
col – A pandas Series.
cat_map – A dict containing a key for each unique value in col.
- Returns
A pandas Series of unsigned integers.
-
fife.processors.
process_numeric_feature
(col: pandas.core.series.Series, minimum: float, maximum: float) → pandas.core.series.Series¶ Scale numeric values to a given range.
- Parameters
col – A numeric Series.
minimum – The value to map to -0.5.
maximum – The value to map to 0.5.
- Returns
A Series of floats.
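The scaling implied by these parameters is a linear map, sketched below. The function name is hypothetical and a list stands in for a Series. The sketch assumes maximum is greater than minimum and extrapolates values outside the range; whether FIFE clips such values is not stated here.

```python
def scale_to_centered_unit_range(values, minimum, maximum):
    """Linearly map `minimum` to -0.5 and `maximum` to 0.5."""
    span = maximum - minimum  # assumed to be positive
    return [(value - minimum) / span - 0.5 for value in values]


print(scale_to_centered_unit_range([0, 5, 10], minimum=0, maximum=10))
```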
-
fife.processors.
produce_categorical_map
(col: pandas.core.series.Series) → dict¶ Return a map from categorical values to unsigned integers.
Zero is reserved for values not seen in data used to create map.
- Parameters
col – A Series.
- Returns
A dictionary mapping each unique value in the Series and np.nan to an integer and any other value not in the Series to zero.
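A map of this shape can be sketched in plain Python. The function names are hypothetical, None stands in for np.nan, and assigning integers in order of first appearance is an assumption of the sketch; only the reservation of zero for unseen values comes from the description above.

```python
def build_category_map(values):
    """Map each distinct value, plus None for missing, to a positive integer.

    Zero is left unassigned so it can stand for values never seen here.
    """
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    # Make sure the missing-value marker always has its own code.
    mapping.setdefault(None, len(mapping) + 1)
    return mapping


def apply_category_map(values, mapping):
    """Encode values, sending anything outside the map to zero."""
    return [mapping.get(value, 0) for value in values]


category_map = build_category_map(["x", "y", "x"])
# "z" was not seen when the map was built, so it is encoded as 0.
print(apply_category_map(["y", "z", None], category_map))
```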
fife.tf_modelers module¶
FIFE modelers based on TensorFlow, which trains neural networks.
-
class
fife.tf_modelers.
CumulativeProduct
(*args: Any, **kwargs: Any)¶ Bases:
tensorflow.keras.layers.Layer
Transform an array into its row-wise cumulative products.
-
call
(inputs, **kwargs)¶ Multiply each value with all previous values in the same row.
-
-
fife.tf_modelers.
FeedForwardNeuralNetworkModeler
¶
-
class
fife.tf_modelers.
FeedforwardNeuralNetworkModeler
(**kwargs)¶ Bases:
fife.tf_modelers.TFSurvivalModeler
Deprecated alias for TFSurvivalModeler.
-
class
fife.tf_modelers.
ProportionalHazardsEncodingModeler
(**kwargs)¶ Bases:
fife.tf_modelers.TFSurvivalModeler
Train a proportional hazards model with binary-encoded categorical features using Keras.
-
build_model
(n_intervals: Union[None, int] = None) → None¶ Train and store a neural network with a proportional hazards restriction.
-
construct_network
() → tensorflow.keras.Model¶ Set all features to feed directly into a single node.
The single node feeds into a proportional hazards layer.
- Returns
An untrained Keras model.
-
format_input_data
(data: Union[None, pandas.core.frame.DataFrame] = None, subset: Union[None, pandas.core.series.Series] = None) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]¶ Keep only the features and observations desired for model input.
-
hyperoptimize
(**kwargs) → dict¶ Return None for ProportionalHazardsEncodingModeler, which does not have hyperparameters.
-
save_model
(file_name: str = 'PH_Encoded_Model', path: str = '') → None¶ Save the TensorFlow model to disk.
-
-
class
fife.tf_modelers.
ProportionalHazardsModeler
(**kwargs)¶ Bases:
fife.tf_modelers.TFSurvivalModeler
Train a proportional hazards model with embeddings for categorical features using Keras.
-
construct_embedding_network
() → tensorflow.keras.Model¶ Set embedding layers that feed into a single node.
Each categorical feature passes through its own embedding layer, which maps whole numbers to the real line.
The embedded values and numeric features feed directly into a single node. The single node feeds into a proportional hazards layer.
- Returns
An untrained Keras model.
-
hyperoptimize
(**kwargs) → dict¶ Return None for ProportionalHazardsModeler, which does not have hyperparameters.
-
save_model
(file_name: str = 'PH_Model', path: str = '') → None¶ Save the TensorFlow model to disk.
-
-
class
fife.tf_modelers.
TFModeler
(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)¶ Bases:
fife.base_modelers.Modeler
Train a neural network model using Keras with TensorFlow backend.
-
config
¶ User-provided configuration parameters.
- Type
dict
-
data
¶ User-provided panel data.
- Type
pd.core.frame.DataFrame
-
categorical_features
¶ Column names of categorical features.
- Type
list
-
reserved_cols
¶ Column names of non-features.
- Type
list
-
numeric_features
¶ Column names of numeric features.
- Type
list
-
n_intervals
¶ The largest number of one-period intervals any individual is observed to survive.
- Type
int
-
model
¶ A trained neural network.
- Type
keras.Model
-
build_model
(n_intervals: Union[None, int] = None, params: dict = None) → None¶ Train and store a neural network, freezing embeddings midway.
-
compute_model_uncertainty
(subset: Union[None, pandas.core.series.Series] = None, n_iterations: int = 200) → numpy.ndarray¶ Predict with dropout as proposed by Gal and Ghahramani (2015).
See https://arxiv.org/abs/1506.02142.
- Parameters
subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.
n_iterations – Number of random dropout specifications to obtain predictions from.
- Returns
A numpy array of predictions by observation, lead length, and iteration.
-
compute_shap_values
(subset: Union[None, pandas.core.series.Series] = None) → dict¶ Compute SHAP values by lead length, observation, and feature.
SHAP values for networks with embedding layers are not supported as of 9 Jun 2020.
Compute SHAP values for restricted mean survival time in addition to each lead length.
- Parameters
subset – A Boolean Series that is True for observations for which the shap values will be computed. If None, default to all observations.
- Returns
A dictionary of numpy arrays, each of which contains SHAP values for the outcome given by its key.
-
construct_embedding_network
(dense_layers: int = 2, nodes_per_dense_layer: int = 512, dropout_share: float = 0.25, embed_exponent: float = 0, embed_L2_reg: float = 2.0) → tensorflow.keras.Model¶ Set embedding layers followed by alternating dropout/dense layers.
Each categorical feature passes through its own embedding layer, which maps whole numbers to the real line.
Each dense layer has a sigmoid activation function. The output layer has one node for each lead length.
- Parameters
dense_layers – The number of dense layers in the neural network.
nodes_per_dense_layer – The number of nodes per dense layer in the neural network.
dropout_share – The probability of a densely connected node of the neural network being set to zero weight during training.
embed_exponent – The ratio of the natural logarithm of the number of embedded values to the natural logarithm of the number of unique categories for each categorical feature.
embed_L2_reg – The L2 regularization coefficient for each embedding layer.
- Returns
An untrained Keras model.
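The definition of embed_exponent can be turned into a small worked calculation. If the exponent is the ratio of log(embedded values) to log(unique categories), then the number of embedded values is the number of unique categories raised to that exponent. The function name embedding_size is hypothetical, and rounding up to a whole number is an assumption of this sketch.

```python
import math


def embedding_size(n_unique_categories, embed_exponent):
    """Solve log(size) / log(n_unique_categories) = embed_exponent for size.

    Rounding up to a whole number is an assumption of this sketch.
    """
    return math.ceil(n_unique_categories ** embed_exponent)


# With the default exponent of 0, every feature maps to a single value.
print(embedding_size(100, 0))
# An exponent of 0.5 gives the square root of the category count.
print(embedding_size(16, 0.5))
```

This is consistent with the description of each embedding layer mapping whole numbers to the real line under the default embed_exponent of 0.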
-
format_input_data
(data: Union[None, pandas.core.frame.DataFrame] = None, subset: Union[None, pandas.core.series.Series] = None) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]¶ List each categorical feature for input to own embedding layer.
-
hyperoptimize
(n_trials: int = 64, subset: Union[None, pandas.core.series.Series] = None, max_epochs: int = 128) → dict¶ Search for hyperparameters with greater out-of-sample performance.
- Parameters
n_trials – The number of hyperparameter sets to evaluate for each time horizon. Return None if non-positive.
subset – A Boolean Series that is True for observations on which to train and validate. If None, default to all observations not flagged by self.test_col or self.predict_col.
- Returns
A dictionary containing the best-performing parameters.
-
predict
(subset: Union[None, pandas.core.series.Series] = None, custom_data: Union[None, pandas.core.frame.DataFrame] = None, cumulative: bool = True) → numpy.ndarray¶ Use trained Keras model to predict observation survival rates.
- Parameters
subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.
custom_data – A DataFrame in the same format as the input data for which predictions will be produced. If None, default to the assigned input data.
cumulative – If True, produce cumulative survival probabilities. If False, produce marginal survival probabilities (i.e., one minus the hazard rate).
- Returns
A numpy array of survival probabilities by observation and lead length.
-
save_model
(file_name: str = 'FFNN_Model', path: str = '') → None¶ Save the TensorFlow model to disk.
-
train
(params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True) → tensorflow.keras.Model¶ Train with survival loss function of Gensheimer and Narasimhan (2019).
See https://peerj.com/articles/6257/.
Use the AMSGrad variant of the Adam optimizer. See Reddi, Kale, and Kumar (2018) at https://openreview.net/forum?id=ryQu7f-RZ.
Train until validation set performance does not improve for the given number of epochs or the given maximum number of epochs.
- Returns
A trained Keras model.
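The stopping rule described above can be sketched without a neural network. The function name and the parameters patience and max_epochs are hypothetical labels for "the given number of epochs" and "the given maximum number of epochs"; a list of validation losses stands in for actual training.

```python
def epochs_until_stop(validation_losses, patience, max_epochs):
    """Return the number of epochs run before training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # validation performance has stalled
    # No stall detected: training ran to the epoch limit.
    return min(len(validation_losses), max_epochs)


losses = [1.0, 0.8, 0.9, 0.85, 0.7]
print(epochs_until_stop(losses, patience=2, max_epochs=10))
```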
-
transform_features
() → pandas.DataFrame¶ Transform features to suit model training.
-
-
class
fife.tf_modelers.
TFSurvivalModeler
(**kwargs)¶ Bases:
fife.tf_modelers.TFModeler
,fife.base_modelers.SurvivalModeler
Use TensorFlow to forecast probabilities of being observed in future periods.
-
fife.tf_modelers.
binary_encode_feature
(col: pandas.core.series.Series) → pandas.core.frame.DataFrame¶ Map whole numbers to bits.
- Parameters
col – a pandas Series of whole numbers.
- Returns
A pandas DataFrame of Boolean values, each combination of values unique to each unique value in the given Series.
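Binary encoding of whole numbers can be sketched as follows. The function name binary_encode is hypothetical, nested lists stand in for a DataFrame, and ordering the columns from most to least significant bit is an assumption of the sketch.

```python
def binary_encode(values):
    """Encode non-negative integers as rows of Booleans, one column per bit.

    The column count is the bit length of the largest value (at least one).
    """
    n_bits = max(1, max(values).bit_length())
    return [
        # Assumption: columns run from most to least significant bit.
        [bool((value >> shift) & 1) for shift in reversed(range(n_bits))]
        for value in values
    ]


for row in binary_encode([0, 1, 2, 3, 5]):
    print(row)
```

Each distinct whole number yields a distinct row, so the encoding needs far fewer columns than one-hot encoding when a feature has many categories.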
-
fife.tf_modelers.
freeze_embedding_layers
(model: tensorflow.keras.Model) → tensorflow.keras.Model¶ Prevent embedding layers of the given neural network from training.
-
fife.tf_modelers.
make_predictions_cumulative
(model: tensorflow.keras.Model) → tensorflow.keras.Model¶ Append a cumulative product layer to a neural network.
-
fife.tf_modelers.
make_predictions_marginal
(model: tensorflow.keras.Model) → tensorflow.keras.Model¶ Remove the final layer of a neural network if it is a cumulative product layer.
-
fife.tf_modelers.
split_categorical_features
(data: pandas.core.frame.DataFrame, categorical_features: List[str], numeric_features: List[str]) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]¶ Split each categorical column in a DataFrame into its own list item.
Necessary to specify inputs to a neural network with embeddings.
- Parameters
data – A DataFrame.
categorical_features – A list of column names to split out.
numeric_features – A list of column names to keep in a single DataFrame.
- Returns
A list where each element but the last is a Series and the last element is a DataFrame of the given numeric features.
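The shape of the returned list can be sketched in plain Python. The function name split_features is hypothetical, and a dict of lists stands in for a DataFrame, with each list standing in for a Series.

```python
def split_features(table, categorical_features, numeric_features):
    """Return one list per categorical column, then a dict of numeric columns."""
    pieces = [table[name] for name in categorical_features]
    pieces.append({name: table[name] for name in numeric_features})
    return pieces


table = {"color": [1, 2], "shape": [3, 1], "height": [0.1, -0.2]}
inputs = split_features(table, ["color", "shape"], ["height"])
# Two categorical inputs followed by one block of numeric features.
print(len(inputs))
```

A network with one embedding layer per categorical feature would take each categorical element as its own input.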
fife.utils module¶
I/O, logging, calculation, and plotting functions for FIFE.
-
class
fife.utils.
FIFEArgParser
¶ Bases:
argparse.ArgumentParser
Argument parser for the FIFE command-line interface.
-
fife.utils.
compute_aggregation_uncertainty
(individual_probabilities: pandas.core.frame.DataFrame, percent_confidence: float = 0.95) → pandas.core.frame.DataFrame¶ Statistically bound number of events given each of their probabilities.
- Parameters
individual_probabilities – A DataFrame of probabilities where each row represents an individual and each column represents an event.
percent_confidence – The confidence level, expressed as a proportion in (0, 1), of the two-sided intervals defined by the computed bounds.
- Raises
ValueError – If percent_confidence is outside of the interval (0, 1).
- Returns
A DataFrame containing, for each column in individual_probabilities, the expected number of events and interval bounds on the number of events.
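The idea of bounding an event count from individual probabilities can be sketched for a single event column. This page does not say which bound FIFE computes, so the sketch substitutes a normal approximation to the sum of independent Bernoulli outcomes; that substitution, the independence assumption, and the function name are all assumptions of the sketch.

```python
from math import sqrt
from statistics import NormalDist


def expected_count_with_bounds(probabilities, confidence=0.95):
    """Return the expected event count and approximate two-sided bounds.

    Uses a normal approximation, which is a stand-in technique and
    assumes the individual events are independent.
    """
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in the interval (0, 1)")
    expected = sum(probabilities)
    variance = sum(p * (1 - p) for p in probabilities)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(variance)
    # Keep the bounds within the possible range of counts.
    lower = max(0.0, expected - half_width)
    upper = min(float(len(probabilities)), expected + half_width)
    return expected, lower, upper


print(expected_count_with_bounds([0.5] * 100))
```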
-
fife.utils.
create_example_data
(n_persons: int = 8192, n_periods: int = 12) → pandas.core.frame.DataFrame¶ Fabricate an unbalanced panel dataset suitable as FIFE input.
-
fife.utils.
ensure_folder_existence
(path: str = '') → None¶ Create a directory if it doesn’t already exist.
-
fife.utils.
import_data_file
(file_path: str = 'Input_Data') → pandas.core.frame.DataFrame¶ Return the data stored in given file in given folder.
-
fife.utils.
make_results_reproducible
(seed: int = 9999) → None¶ Ensure executing from a fresh state produces identical results.
-
fife.utils.
plot_binary_prediction_errors
(errors: dict, width: float = 8, height: float = 1, alpha: float = 0.00390625, color: str = 'black', center_tick_color: str = 'green', center_tick_height: float = 0.125, path: str = '') → None¶ Make a rug plot of binary prediction errors.
- Parameters
errors – A dictionary of numpy arrays, each of which contains error values for the outcome given by its key.
width – Width of the rug plot.
height – Height of the rug plot.
alpha – The opacity of plotted ticks, from 2^-8 (nearly transparent) to 1 (opaque).
color – The color of plotted ticks.
center_tick_color – The color of the ticks marking the center of the plot.
center_tick_height – The height of the ticks marking the center of the plot.
path – The path preceding the Output folder in which the plots will be saved.
-
fife.utils.
plot_shap_values
(shap_values: dict, raw_data: pandas.core.frame.DataFrame, processed_data: Union[None, pandas.core.frame.DataFrame] = None, no_summary_col: str = typing.Union[NoneType, str], alpha: float = 0.5, path: str = '') → None¶ Make plots of SHAP values.
SHAP values quantify feature contributions to predictions.
- Parameters
shap_values – A dictionary of numpy arrays, each of which contains SHAP values for the outcome given by its key.
raw_data – Feature values prior to processing into model input.
processed_data – Feature values used as model input.
no_summary_col – The name of a column to never use for summary plots.
alpha – The opacity of plotted points, from 2^-8 (nearly transparent) to 1 (opaque).
path – The path preceding the Output folder in which the plots will be saved.
-
fife.utils.
print_config
(config: dict) → None¶ Neatly print given dictionary of config parameters.
-
fife.utils.
print_copyright
() → None¶
-
fife.utils.
redirect_output_to_log
(path: str = '') → None¶ Send future output to a text file instead of console.
-
fife.utils.
save_intermediate_data
(data: pandas.core.frame.DataFrame, file_name: str, file_format: str = 'pickle', path: str = '') → None¶ Save given DataFrame in Intermediate folder in given format.
-
fife.utils.
save_maps
(obj: Union[pandas.core.series.Series, dict], file_name: str, path: str = '') → None¶ Save a map from values to other values to Intermediate folder.
-
fife.utils.
save_output_table
(data: pandas.core.frame.DataFrame, file_name: str, index: bool = True, path: str = '') → None¶ Save given DataFrame in the Output folder as a csv file.
-
fife.utils.
save_plot
(file_name: str, path: str = '') → None¶ Save the most recently plotted plot in high resolution.