Modules

fife.base_modelers module

fife.lgb_modelers module

FIFE modelers based on LightGBM, which trains gradient-boosted trees.

class fife.lgb_modelers.GradientBoostedTreesModeler(**kwargs)

Bases: fife.lgb_modelers.LGBSurvivalModeler

Deprecated alias for LGBSurvivalModeler

class fife.lgb_modelers.LGBExitModeler(exit_col, **kwargs)

Bases: fife.lgb_modelers.LGBModeler, fife.base_modelers.ExitModeler

Use LightGBM to forecast the circumstance of exit conditional on exit.

class fife.lgb_modelers.LGBModeler(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)

Bases: fife.base_modelers.Modeler

Train a gradient-boosted tree model for each lead length using LightGBM.

config

User-provided configuration parameters.

Type

dict

data

User-provided panel data.

Type

pd.core.frame.DataFrame

categorical_features

Column names of categorical features.

Type

list

duration_col

Name of the column representing the number of future periods observed for the given individual.

Type

str

event_col

Name of the column indicating whether the individual is observed to exit the dataset.

Type

str

predict_col

Name of the column indicating whether the observation will be used for prediction after training.

Type

str

test_col

Name of the column indicating whether the observation will be used for testing model performance after training.

Type

str

validation_col

Name of the column indicating whether the observation will be used for evaluating model performance during training.

Type

str

period_col

Name of the column representing the number of periods since the earliest period in the data.

Type

str

max_lead_col

Name of the column representing the number of observable future periods.

Type

str

spell_col

Name of the column representing the number of previous spells of consecutive observations of the same individual.

Type

str

weight_col

Name of the column representing observation weights.

Type

str

reserved_cols

Column names of non-features.

Type

list

numeric_features

Column names of numeric features.

Type

list

n_intervals

The largest number of periods ahead to forecast.

Type

int

model

A trained LightGBM model (lgb.basic.Booster) for each lead length.

Type

list

objective

The LightGBM model objective appropriate for the outcome type, which is “binary” for binary classification.

Type

str

num_class

The num_class LightGBM parameter, which is 1 for binary classification.

Type

int

build_model(n_intervals: Union[None, int] = None, params: dict = None, parallelize: bool = True) → None

Train and store a sequence of gradient-boosted tree models.

compute_shap_values(subset: Union[None, pandas.core.series.Series] = None) → dict

Compute SHAP values by lead length, observation, and feature.

hyperoptimize(n_trials: int = 64, rolling_validation: bool = True, subset: Union[None, pandas.core.series.Series] = None) → dict

Search for hyperparameters with greater out-of-sample performance.

Parameters
  • n_trials – The number of hyperparameter sets to evaluate for each time horizon. Return None if non-positive.

  • rolling_validation – Whether or not to evaluate performance on the most recent possible periods instead of the validation set labeled by self.validation_col. Ignored for a given time horizon if there is only one possible period for training and evaluation.

  • subset – A Boolean Series that is True for observations on which to train and validate. If None, default to all observations not flagged by self.test_col or self.predict_col.

Returns

A dictionary containing the best-performing parameter dictionary for each time horizon.

predict(subset: Union[None, pandas.core.series.Series] = None, cumulative: bool = True) → numpy.ndarray

Use trained LightGBM models to predict the outcome for each observation and time horizon.

Parameters
  • subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.

  • cumulative – If True, produce cumulative survival probabilities. If False, produce marginal survival probabilities (i.e., one minus the hazard rate).

Returns

A numpy array of predictions by observation and lead length.
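
The relationship between the two output modes can be illustrated without FIFE itself. The following is a minimal sketch, assuming the marginal survival probabilities for one observation are available as a plain list; the helper names to_cumulative and to_hazard are hypothetical and not part of FIFE.

```python
from itertools import accumulate
from operator import mul


def to_cumulative(marginal):
    """Running product of per-period (marginal) survival probabilities."""
    return list(accumulate(marginal, mul))


def to_hazard(marginal):
    """Hazard rate per period: one minus the marginal survival probability."""
    return [1.0 - p for p in marginal]


# Hypothetical marginal survival probabilities for lead lengths 1, 2, and 3.
marginal = [0.5, 0.5, 0.25]
cumulative = to_cumulative(marginal)  # [0.5, 0.25, 0.0625]
hazard = to_hazard(marginal)          # [0.5, 0.5, 0.75]
```

Each cumulative value is the probability of surviving every period up to that lead length, which is why it can never exceed the value before it.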

save_model(file_name: str = 'GBT_Model', path: str = '') → None

Save the horizon-specific LightGBM models that comprise the model to disk.

train(params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True, parallelize: bool = True) → List[lightgbm.basic.Booster]

Train a LightGBM model for each lead length.

train_single_model(time_horizon: int, params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True) → lightgbm.basic.Booster

Train a LightGBM model for a single lead length.

transform_features() → pandas.DataFrame

Transform features to suit model training.

class fife.lgb_modelers.LGBStateModeler(state_col, **kwargs)

Bases: fife.lgb_modelers.LGBModeler, fife.base_modelers.StateModeler

Use LightGBM to forecast the future value of a feature conditional on survival.

class fife.lgb_modelers.LGBSurvivalModeler(**kwargs)

Bases: fife.lgb_modelers.LGBModeler, fife.base_modelers.SurvivalModeler

Use LightGBM to forecast probabilities of being observed in future periods.

fife.nnet_survival module

FIFE uses the nnet_survival module of Gensheimer, M.F., and Narasimhan, B., “A scalable discrete-time survival model for neural networks,” PeerJ 7 (2019): e6257. The nnet_survival version packaged with FIFE is GitHub commit d5a8f26 on Nov 18, 2018 posted to https://github.com/MGensheimer/nnet-survival/blob/master/nnet_survival.py. nnet_survival is licensed under the MIT License. The FIFE development team modified lines 12 and 13 of nnet_survival for compatibility with TensorFlow 2.0.

class fife.nnet_survival.PropHazards(output_dim, **kwargs)

Bases: tensorflow.keras.layers.Layer

build(input_shape)

call(x)

compute_output_shape(input_shape)

get_config()

fife.nnet_survival.make_surv_array(t, f, breaks)

Transforms censored survival data into vector format that can be used in Keras.

Arguments

  • t – Array of failure/censoring times.

  • f – Censoring indicator: 1 if failed, 0 if censored.

  • breaks – Locations of breaks between time intervals for the discrete-time survival model (always includes 0).

Returns

Two-dimensional array of survival data with one row per individual and twice as many columns as there are time intervals.
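
To make the output layout concrete, the sketch below encodes a single individual in plain Python. It is a simplified reading of the published nnet_survival approach, not the packaged code: the first half of the row flags intervals survived and the second half flags the interval of failure. The function name make_surv_row is hypothetical.

```python
def make_surv_row(time, failed, breaks):
    """Encode one individual as survival flags followed by failure flags."""
    uppers = breaks[1:]
    n_intervals = len(uppers)
    midpoints = [(lo + hi) / 2 for lo, hi in zip(breaks[:-1], uppers)]
    failure_flags = [0.0] * n_intervals
    if failed:
        # Credit survival for every interval whose upper break was reached.
        survival_flags = [1.0 if time >= hi else 0.0 for hi in uppers]
        # Mark failure in the first interval whose upper break exceeds the time.
        for i, hi in enumerate(uppers):
            if time < hi:
                failure_flags[i] = 1.0
                break
    else:
        # Censored: credit survival only if at least half the interval was observed.
        survival_flags = [1.0 if time >= mid else 0.0 for mid in midpoints]
    return survival_flags + failure_flags


breaks = [0, 1, 2, 3]
failed_row = make_surv_row(2.5, True, breaks)     # failure in the third interval
censored_row = make_surv_row(1.6, False, breaks)  # censored in the second interval
```

With three intervals each row has six entries, matching the doubled column count described above.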

fife.nnet_survival.nnet_pred_surv(y_pred, breaks, fu_time)

fife.nnet_survival.surv_likelihood(n_intervals)

Create custom Keras loss function for neural network survival model.

Arguments

n_intervals – The number of survival time intervals.

Returns

Custom loss function that can be used with Keras
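
For intuition, a discrete-time survival loss of this kind can be sketched for a single individual in plain Python. This is an illustrative reading of the method in the cited paper, not the Keras implementation: y_true is assumed to hold survival flags followed by failure flags, y_pred is assumed to hold the predicted conditional probability of surviving each interval, and the eps clip value is a hypothetical choice.

```python
import math


def surv_negative_log_likelihood(y_true, y_pred, eps=1e-7):
    """Negative log-likelihood for one individual over all intervals."""
    n_intervals = len(y_pred)
    survived = y_true[:n_intervals]
    failed = y_true[n_intervals:]
    # An interval that was survived contributes its predicted survival probability.
    survived_terms = [1.0 + s * (p - 1.0) for s, p in zip(survived, y_pred)]
    # The interval of failure contributes one minus its predicted survival probability.
    failed_terms = [1.0 - f * p for f, p in zip(failed, y_pred)]
    # Intervals with neither flag set contribute a factor of one (zero loss).
    return -sum(math.log(max(term, eps)) for term in survived_terms + failed_terms)


# Survived two intervals, then failed in the third.
y_true = [1.0, 1.0, 0.0, 0.0, 0.0, 1.0]
y_pred = [0.9, 0.8, 0.7]
loss = surv_negative_log_likelihood(y_true, y_pred)
```

Intervals after censoring or failure carry no flags, so they drop out of the sum.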

fife.nnet_survival.surv_likelihood_rnn(n_intervals)

Create custom Keras loss function for neural network survival model. Used for recurrent neural networks with time-distributed output. This function is very similar to surv_likelihood but deals with the extra dimension of y_true and y_pred that exists because of the time-distributed output.

fife.pd_modelers module

FIFE modeler based on Pandas, which tabulates interacted fixed effects.

class fife.pd_modelers.IFEExitModeler(exit_col, **kwargs)

Bases: fife.pd_modelers.IFEModeler, fife.base_modelers.ExitModeler

Forecast the circumstance of exit conditional on exit using the mean of observations with the same values.

class fife.pd_modelers.IFEModeler(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)

Bases: fife.base_modelers.Modeler

Predict with mean of training observations with same values.

config

User-provided configuration parameters.

Type

dict

data

User-provided panel data.

Type

pd.core.frame.DataFrame

categorical_features

Column names of categorical features.

Type

list

reserved_cols

Column names of non-features.

Type

list

numeric_features

Column names of numeric features.

Type

list

n_intervals

The largest number of one-period intervals any individual is observed to survive.

Type

int

model

Survival rates for each combination of categorical values in the training data.

Type

pd.core.frame.DataFrame

hyperoptimize(**kwargs) → dict

Returns None because interacted fixed effects modelers do not have hyperparameters.

predict(subset: Union[None, pandas.core.series.Series] = None, cumulative: bool = True) → numpy.ndarray

Map observations to outcome means from their categorical values.

Map observations with a combination of categorical values not seen in the training data to the mean of the outcome in the training set.

Parameters
  • subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.

  • cumulative – If True, will produce cumulative survival probabilities. If False, will produce marginal survival probabilities (i.e., one minus the hazard rate).

Returns

A numpy array of outcome means by observation and lead length.
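
The lookup-with-fallback behavior described above can be sketched in plain Python. This is a minimal illustration under assumed data structures (tuples of categorical values and a list of outcomes), not FIFE's pandas implementation; both function names are hypothetical.

```python
from collections import defaultdict


def train_group_means(rows, outcomes):
    """Mean outcome per combination of categorical values, plus the overall mean."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, outcome in zip(rows, outcomes):
        totals[key][0] += outcome
        totals[key][1] += 1
    group_means = {key: total / count for key, (total, count) in totals.items()}
    overall_mean = sum(outcomes) / len(outcomes)
    return group_means, overall_mean


def predict_from_means(rows, group_means, overall_mean):
    """Use the group mean when the combination was seen, else the overall mean."""
    return [group_means.get(key, overall_mean) for key in rows]


train_rows = [("A", "x"), ("A", "x"), ("B", "y"), ("B", "y")]
train_outcomes = [1.0, 0.0, 1.0, 1.0]
group_means, overall_mean = train_group_means(train_rows, train_outcomes)
predictions = predict_from_means(
    [("A", "x"), ("B", "y"), ("C", "z")], group_means, overall_mean
)  # [0.5, 1.0, 0.75]
```

The combination ("C", "z") never appears in training, so it receives the training-set mean of 0.75.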

save_model(file_name: str = 'IFE_Model', path: str = '') → None

Save the pandas DataFrame model to disk.

train() → pandas.core.frame.DataFrame

Compute the mean of the outcome for each combination of categorical values.

class fife.pd_modelers.IFEStateModeler(state_col, **kwargs)

Bases: fife.pd_modelers.IFEModeler, fife.base_modelers.StateModeler

Forecast the future value of a feature conditional on survival using the mean of observations with the same values.

class fife.pd_modelers.IFESurvivalModeler(**kwargs)

Bases: fife.pd_modelers.IFEModeler, fife.base_modelers.SurvivalModeler

Predict with survival rate of training observations with same values.

class fife.pd_modelers.InteractedFixedEffectsModeler(**kwargs)

Bases: fife.pd_modelers.IFESurvivalModeler

Deprecated alias for IFESurvivalModeler

fife.processors module

Data processing functions and classes for FIFE.

class fife.processors.DataProcessor(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None)

Bases: object

Prepare data by identifying features as degenerate or categorical.

is_categorical(col: str) → bool

Determine if the given feature should be processed as categorical, as opposed to numeric.

is_degenerate(col: str) → bool

Determine if a feature is constant or has too many missing values.
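
As a rough illustration of the two conditions, the sketch below checks a list of values in plain Python. It is not the FIFE implementation: None stands in for a missing value, and both the function name and the max_missing_share threshold are hypothetical.

```python
def looks_degenerate(values, max_missing_share=0.99):
    """Return True if the values are constant or mostly missing."""
    # A column with at most one distinct value carries no information.
    if len(set(values)) <= 1:
        return True
    # Otherwise, compare the share of missing values against the threshold.
    missing_share = sum(v is None for v in values) / len(values)
    return missing_share > max_missing_share


constant = looks_degenerate([1, 1, 1])
varied = looks_degenerate([1, 2, None], max_missing_share=0.5)
mostly_missing = looks_degenerate([None, None, None, 1], max_missing_share=0.5)
```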

class fife.processors.PanelDataProcessor(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None)

Bases: fife.processors.DataProcessor

Ready panel data for modelling.

config

User-provided configuration parameters.

Type

dict

data

Processed panel data.

Type

pd.core.frame.DataFrame

raw_subset

An unprocessed sample from the final period of data. Useful for displaying meaningful values in SHAP plots.

Type

pd.core.frame.DataFrame

categorical_maps

Contains for each categorical feature a map from each unique value to a whole number.

Type

dict

numeric_ranges

Contains for each numeric feature the maximum and minimum value in the training set.

Type

pd.core.frame.DataFrame

build_processed_data(parallelize: bool = True) → None

Clean, augment, and store a panel dataset and related information.

  • Sort data by individual and time.

  • Drop degenerate features.

  • Label subsets for prediction, validation, and testing.

  • Compute survival duration and whether departure is observed.

  • Store a subset of the raw input data from the final period.

  • Map categorical features to unsigned integers.

  • Scale numeric features.

build_reserved_cols()

Add data split and outcome-related columns to the data.

check_panel_consistency() → None

Ensure observations have unique individual-period combinations.
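
One way to picture this check is to count each individual-period pair and report any pair that occurs more than once. The sketch below assumes observations are given as (individual, period) tuples; the function name is hypothetical and the approach is illustrative rather than FIFE's own.

```python
from collections import Counter


def find_duplicate_keys(observations):
    """Return the (individual, period) pairs that appear more than once."""
    counts = Counter(observations)
    return sorted(key for key, n in counts.items() if n > 1)


observations = [("ann", 1), ("ann", 2), ("bob", 1), ("ann", 2)]
duplicates = find_duplicate_keys(observations)  # [("ann", 2)]
```

An empty result means every individual is observed at most once per period.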

flag_validation_individuals() → pandas.core.series.Series

Flag observations from a random share of individuals.
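
Flagging by individual rather than by observation keeps all of a person's records on the same side of the split. The following is a minimal sketch of that idea in plain Python; the function name, the share argument, and the seed default are assumptions made for illustration.

```python
import random


def flag_random_individuals(individual_ids, share, seed=9999):
    """Flag every observation belonging to a random share of individuals."""
    unique_ids = sorted(set(individual_ids))
    n_flagged = round(share * len(unique_ids))
    # Sample individuals, not rows, so each individual is flagged entirely or not at all.
    flagged = set(random.Random(seed).sample(unique_ids, n_flagged))
    return [i in flagged for i in individual_ids]


ids = [1, 1, 2, 2, 3, 3, 4, 4]
flags = flag_random_individuals(ids, share=0.5)
```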

process_all_columns(parallelize: bool = True) → None

Split, process, and merge all data columns.

process_single_column(colname: str) → Union[None, pandas.core.series.Series]

Apply data cleaning functions to an individual data column.

sort_panel_data() → pandas.core.frame.DataFrame

Sort the data by individual, then by period.

fife.processors.check_column_consistency(data: pandas.core.frame.DataFrame, colname: str) → None

Assert column exists, has no missing values, and is not constant.

fife.processors.deduplicate_column_values(data: pandas.core.frame.DataFrame, reserved_cols: List[str] = [], max_obs: int = 65536) → pandas.core.frame.DataFrame

Delete columns with the same values as a later column.

Parameters
  • data – A DataFrame.

  • reserved_cols – Names of columns to exclude from deduplication.

  • max_obs – The number of observations to sample if data has more than that many observations.

Returns

A DataFrame containing only the last instance of each unique column.
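
The keep-the-last-instance rule can be sketched with a dictionary of columns in place of a DataFrame. This illustration omits the max_obs sampling step and assumes that reserved columns are neither dropped nor used as comparison targets; the function name is hypothetical.

```python
def drop_earlier_duplicates(columns, reserved_cols=()):
    """Keep a column unless a later, non-reserved column has identical values."""
    names = list(columns)
    kept = {}
    for position, name in enumerate(names):
        later = {
            tuple(columns[other])
            for other in names[position + 1:]
            if other not in reserved_cols
        }
        if name in reserved_cols or tuple(columns[name]) not in later:
            kept[name] = columns[name]
    return kept


table = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [1, 2, 3]}
deduplicated = drop_earlier_duplicates(table)
with_reserved = drop_earlier_duplicates(table, reserved_cols=("a",))
```

Column "a" duplicates the later column "c", so it is dropped unless it is reserved.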

fife.processors.factorize_categorical_feature(col: pandas.core.series.Series, excluded_obs: Union[None, pandas.core.series.Series] = None) → Tuple[pandas.core.series.Series, dict]

Map categorical values to unsigned integers.

Parameters
  • col – A Series.

  • excluded_obs – True for observations to exclude from creating the map.

Returns

  • A pandas Series of unsigned integers.

  • A dictionary mapping each unique value among the included observations and np.nan to an integer and any other value not in the Series to 0.

fife.processors.normalize_numeric_feature(col: pandas.core.series.Series, excluded_obs: Union[None, pandas.core.series.Series] = None) → Tuple[pandas.core.series.Series, List[float]]

Scale numeric values to their empirical range.

Parameters
  • col – A numeric Series.

  • excluded_obs – True for observations to exclude when computing min/max.

Returns

  • A Series of floats.

  • A list containing the minimum and maximum values among the included observations.

fife.processors.process_categorical_feature(col: pandas.core.series.Series, cat_map: dict) → pandas.core.frame.DataFrame

Map categorical values to unsigned integers.

Parameters
  • col – A pandas Series.

  • cat_map – A dict containing a key for each unique value in col.

Returns

A pandas Series of unsigned integers.

fife.processors.process_numeric_feature(col: pandas.core.series.Series, minimum: float, maximum: float) → pandas.core.series.Series

Scale numeric values to a given range.

Parameters
  • col – A numeric Series.

  • minimum – The value to map to -0.5.

  • maximum – The value to map to 0.5.

Returns

A Series of floats.
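
Given that the minimum maps to -0.5 and the maximum to 0.5, a linear rescaling consistent with that description looks like the sketch below. The linear form and the function name are assumptions, and the sketch requires maximum to be greater than minimum.

```python
def scale_to_half_range(values, minimum, maximum):
    """Linearly map minimum to -0.5 and maximum to 0.5."""
    span = maximum - minimum
    return [(v - minimum) / span - 0.5 for v in values]


scaled = scale_to_half_range([0, 5, 10], minimum=0, maximum=10)  # [-0.5, 0.0, 0.5]
```

Values outside the training range fall outside [-0.5, 0.5] under this mapping.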

fife.processors.produce_categorical_map(col: pandas.core.series.Series) → dict

Return a map from categorical values to unsigned integers.

Zero is reserved for values not seen in data used to create map.

Parameters

col – A Series.

Returns

A dictionary mapping each unique value in the Series and np.nan to an integer and any other value not in the Series to zero.
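
A plain-Python sketch of such a map is shown below. It is illustrative only: the order in which codes are assigned, the MISSING sentinel standing in for np.nan, and the function names are all assumptions rather than FIFE's behavior.

```python
import math

MISSING = "__missing__"  # hypothetical stand-in key for np.nan


def _key(value):
    """Collapse None and NaN into a single missing-value key."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return MISSING
    return value


def build_category_map(values):
    """Assign 1..n to observed values and n + 1 to missing; 0 stays reserved."""
    keys = sorted({_key(v) for v in values if _key(v) != MISSING}, key=str)
    mapping = {key: code for code, key in enumerate(keys, start=1)}
    mapping[MISSING] = len(keys) + 1
    return mapping


def encode(values, mapping):
    """Look up each value, sending unseen values to the reserved code 0."""
    return [mapping.get(_key(v), 0) for v in values]


mapping = build_category_map(["b", "a", "b", None])
codes = encode(["a", "b", "c", float("nan")], mapping)  # [1, 2, 0, 3]
```

The value "c" was not present when the map was built, so it receives the reserved code 0.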

fife.tf_modelers module

FIFE modelers based on TensorFlow, which trains neural networks.

class fife.tf_modelers.CumulativeProduct(*args: Any, **kwargs: Any)

Bases: tensorflow.keras.layers.Layer

Transform an array into its row-wise cumulative products.

call(inputs, **kwargs)

Multiply each value with all previous values in the same row.

fife.tf_modelers.FeedForwardNeuralNetworkModeler

alias of fife.tf_modelers.FeedforwardNeuralNetworkModeler

class fife.tf_modelers.FeedforwardNeuralNetworkModeler(**kwargs)

Bases: fife.tf_modelers.TFSurvivalModeler

Deprecated alias for TFSurvivalModeler

class fife.tf_modelers.ProportionalHazardsEncodingModeler(**kwargs)

Bases: fife.tf_modelers.TFSurvivalModeler

Train a proportional hazards model with binary-encoded categorical features using Keras.

build_model(n_intervals: Union[None, int] = None) → None

Train and store a neural network with a proportional hazards restriction.

construct_network() → tensorflow.keras.Model

Set all features to feed directly into a single node.

The single node feeds into a proportional hazards layer.

Returns

An untrained Keras model.

format_input_data(data: Union[None, pandas.core.frame.DataFrame] = None, subset: Union[None, pandas.core.series.Series] = None) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]

Keep only the features and observations desired for model input.

hyperoptimize(**kwargs) → dict

Returns None for ProportionalHazardsEncodingModeler, which does not have hyperparameters.

save_model(file_name: str = 'PH_Encoded_Model', path: str = '') → None

Save the TensorFlow model to disk.

class fife.tf_modelers.ProportionalHazardsModeler(**kwargs)

Bases: fife.tf_modelers.TFSurvivalModeler

Train a proportional hazards model with embeddings for categorical features using Keras.

construct_embedding_network() → tensorflow.keras.Model

Set embedding layers that feed into a single node.

Each categorical feature passes through its own embedding layer, which maps whole numbers to the real line.

The embedded values and numeric features feed directly into a single node. The single node feeds into a proportional hazards layer.

Returns

An untrained Keras model.

hyperoptimize(**kwargs) → dict

Returns None for ProportionalHazardsModeler, which does not have hyperparameters.

save_model(file_name: str = 'PH_Model', path: str = '') → None

Save the TensorFlow model to disk.

class fife.tf_modelers.TFModeler(config: Union[None, dict] = {}, data: Union[None, pandas.core.frame.DataFrame] = None, duration_col: str = '_duration', event_col: str = '_event_observed', predict_col: str = '_predict_obs', test_col: str = '_test', validation_col: str = '_validation', period_col: str = '_period', max_lead_col: str = '_maximum_lead', spell_col: str = '_spell', weight_col: Union[None, str] = None, allow_gaps: bool = False)

Bases: fife.base_modelers.Modeler

Train a neural network model using Keras with TensorFlow backend.

config

User-provided configuration parameters.

Type

dict

data

User-provided panel data.

Type

pd.core.frame.DataFrame

categorical_features

Column names of categorical features.

Type

list

reserved_cols

Column names of non-features.

Type

list

numeric_features

Column names of numeric features.

Type

list

n_intervals

The largest number of one-period intervals any individual is observed to survive.

Type

int

model

A trained neural network.

Type

keras.Model

build_model(n_intervals: Union[None, int] = None, params: dict = None) → None

Train and store a neural network, freezing embeddings midway.

compute_model_uncertainty(subset: Union[None, pandas.core.series.Series] = None, n_iterations: int = 200) → numpy.ndarray

Predict with dropout as proposed by Gal and Ghahramani (2015).

See https://arxiv.org/abs/1506.02142.

Parameters
  • subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.

  • n_iterations – Number of random dropout specifications to obtain predictions from.

Returns

A numpy array of predictions by observation, lead length, and iteration.

compute_shap_values(subset: Union[None, pandas.core.series.Series] = None) → dict

Compute SHAP values by lead length, observation, and feature.

SHAP values for networks with embedding layers are not supported as of 9 Jun 2020.

Compute SHAP values for restricted mean survival time in addition to each lead length.

Parameters

subset – A Boolean Series that is True for observations for which the SHAP values will be computed. If None, default to all observations.

Returns

A dictionary of numpy arrays, each of which contains SHAP values for the outcome given by its key.

construct_embedding_network(dense_layers: int = 2, nodes_per_dense_layer: int = 512, dropout_share: float = 0.25, embed_exponent: float = 0, embed_L2_reg: float = 2.0) → tensorflow.keras.Model

Set embedding layers followed by alternating dropout/dense layers.

Each categorical feature passes through its own embedding layer, which maps whole numbers to the real line.

Each dense layer has a sigmoid activation function. The output layer has one node for each lead length.

Parameters
  • dense_layers – The number of dense layers in the neural network.

  • nodes_per_dense_layer – The number of nodes per dense layer in the neural network.

  • dropout_share – The probability of a densely connected node of the neural network being set to zero weight during training.

  • embed_exponent – The ratio of the natural logarithm of the number of embedded values to the natural logarithm of the number of unique categories for each categorical feature.

  • embed_L2_reg – The L2 regularization coefficient for each embedding layer.

Returns

An untrained Keras model.
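
The definition of embed_exponent implies a simple relationship: if the exponent is ln(embedded values) / ln(unique categories), then the number of embedded values is the number of categories raised to that exponent. The sketch below works through the arithmetic; rounding and the floor of one are assumptions, and the function name is hypothetical.

```python
def embedding_size(n_categories, embed_exponent):
    """Number of embedded values implied by the exponent, at least one."""
    return max(1, round(n_categories ** embed_exponent))


default_size = embedding_size(64, 0)     # exponent 0 gives a single value
half_size = embedding_size(64, 0.5)      # square root of the category count
full_size = embedding_size(64, 1)        # as many values as categories
```

With the default exponent of 0, each category maps to a single number, consistent with the description of embeddings mapping whole numbers to the real line.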

format_input_data(data: Union[None, pandas.core.frame.DataFrame] = None, subset: Union[None, pandas.core.series.Series] = None) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]

List each categorical feature for input to own embedding layer.

hyperoptimize(n_trials: int = 64, subset: Union[None, pandas.core.series.Series] = None, max_epochs: int = 128) → dict

Search for hyperparameters with greater out-of-sample performance.

Parameters
  • n_trials – The number of hyperparameter sets to evaluate for each time horizon. Return None if non-positive.

  • subset – A Boolean Series that is True for observations on which to train and validate. If None, default to all observations not flagged by self.test_col or self.predict_col.

Returns

A dictionary containing the best-performing parameters.

predict(subset: Union[None, pandas.core.series.Series] = None, custom_data: Union[None, pandas.core.frame.DataFrame] = None, cumulative: bool = True) → numpy.ndarray

Use trained Keras model to predict observation survival rates.

Parameters
  • subset – A Boolean Series that is True for observations for which predictions will be produced. If None, default to all observations.

  • custom_data – A DataFrame in the same format as the input data for which predictions will be produced. If None, default to the assigned input data.

  • cumulative – If True, produce cumulative survival probabilities. If False, produce marginal survival probabilities (i.e., one minus the hazard rate).

Returns

A numpy array of survival probabilities by observation and lead length.

save_model(file_name: str = 'FFNN_Model', path: str = '') → None

Save the TensorFlow model to disk.

train(params: Union[None, dict] = None, subset: Union[None, pandas.core.series.Series] = None, validation_early_stopping: bool = True) → tensorflow.keras.Model

Train with the survival loss function of Gensheimer and Narasimhan (2019).

See https://peerj.com/articles/6257/.

Use the AMSGrad variant of the Adam optimizer. See Reddi, Kale, and Kumar (2018) at https://openreview.net/forum?id=ryQu7f-RZ.

Train until validation set performance has not improved for the given number of epochs or until the given maximum number of epochs is reached.

Returns

A trained Keras model.

transform_features() → pandas.DataFrame

Transform features to suit model training.

class fife.tf_modelers.TFSurvivalModeler(**kwargs)

Bases: fife.tf_modelers.TFModeler, fife.base_modelers.SurvivalModeler

Use TensorFlow to forecast probabilities of being observed in future periods.

fife.tf_modelers.binary_encode_feature(col: pandas.core.series.Series) → pandas.core.frame.DataFrame

Map whole numbers to bits.

Parameters

col – a pandas Series of whole numbers.

Returns

A pandas DataFrame of Boolean values in which each unique value in the given Series corresponds to a unique combination of column values.
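
Binary encoding can be pictured as writing each whole number in base two and giving each bit its own Boolean column. The sketch below uses nested lists in place of a DataFrame; the least-significant-bit-first column order and the function name are assumptions.

```python
def binary_encode(values):
    """Return one row of Boolean bits per whole number, least significant bit first."""
    n_bits = max(max(values).bit_length(), 1)
    return [[bool((v >> bit) & 1) for bit in range(n_bits)] for v in values]


rows = binary_encode([0, 1, 2, 3])
# [[False, False], [True, False], [False, True], [True, True]]
```

Four distinct values need only two columns, compared with four columns under one-hot encoding.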

fife.tf_modelers.freeze_embedding_layers(model: tensorflow.keras.Model) → tensorflow.keras.Model

Prevent embedding layers of the given neural network from training.

fife.tf_modelers.make_predictions_cumulative(model: tensorflow.keras.Model) → tensorflow.keras.Model

Append a cumulative product layer to a neural network.

fife.tf_modelers.make_predictions_marginal(model: tensorflow.keras.Model) → tensorflow.keras.Model

Remove final layer of a neural network if a cumulative product layer.

fife.tf_modelers.split_categorical_features(data: pandas.core.frame.DataFrame, categorical_features: List[str], numeric_features: List[str]) → List[Union[pandas.core.series.Series, pandas.core.frame.DataFrame]]

Split each categorical column in a DataFrame into its own list item.

Necessary to specify inputs to a neural network with embeddings.

Parameters
  • data – A DataFrame.

  • categorical_features – A list of column names to split out.

  • numeric_features – A list of column names to keep in a single DataFrame.

Returns

A list where each element but the last is a Series and the last element is a DataFrame of the given numeric features.

fife.utils module

I/O, logging, calculation, and plotting functions for FIFE.

class fife.utils.FIFEArgParser

Bases: argparse.ArgumentParser

Argument parser for the FIFE command-line interface.

fife.utils.compute_aggregation_uncertainty(individual_probabilities: pandas.core.frame.DataFrame, percent_confidence: float = 0.95) → pandas.core.frame.DataFrame

Statistically bound number of events given each of their probabilities.

Parameters
  • individual_probabilities – A DataFrame of probabilities where each row represents an individual and each column represents an event.

  • percent_confidence – The percent confidence of the two-sided intervals defined by the computed bounds.

Raises

ValueError – If percent_confidence is outside of the interval (0, 1).

Returns

A DataFrame containing, for each column in individual_probabilities, the expected number of events and interval bounds on the number of events.
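
To show what bounding a count of events can look like, the sketch below treats each event as an independent Bernoulli trial and applies Chebyshev's inequality. This is a swapped-in, conservative technique chosen for illustration; the method FIFE actually uses may differ, and the function name is hypothetical.

```python
import math


def bound_event_count(probabilities, confidence=0.95):
    """Expected count and a conservative two-sided bound on the number of events."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be in the interval (0, 1)")
    expected = sum(probabilities)
    # Variance of a sum of independent Bernoulli trials.
    variance = sum(p * (1 - p) for p in probabilities)
    # Chebyshev: P(|X - mean| >= k * sd) <= 1 / k**2.
    k = 1 / math.sqrt(1 - confidence)
    margin = k * math.sqrt(variance)
    lower = max(0.0, expected - margin)
    upper = min(float(len(probabilities)), expected + margin)
    return expected, lower, upper


expected, lower, upper = bound_event_count([0.5, 0.5, 0.5, 0.5], confidence=0.75)
```

Chebyshev bounds hold for any distribution with finite variance, so they are typically wider than bounds tailored to sums of Bernoulli trials.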

fife.utils.create_example_data(n_persons: int = 8192, n_periods: int = 12) → pandas.core.frame.DataFrame

Fabricate an unbalanced panel dataset suitable as FIFE input.

fife.utils.ensure_folder_existence(path: str = '') → None

Create a directory if it doesn’t already exist.

fife.utils.import_data_file(file_path: str = 'Input_Data') → pandas.core.frame.DataFrame

Return the data stored in given file in given folder.

fife.utils.make_results_reproducible(seed: int = 9999) → None

Ensure executing from a fresh state produces identical results.

fife.utils.plot_binary_prediction_errors(errors: dict, width: float = 8, height: float = 1, alpha: float = 0.00390625, color: str = 'black', center_tick_color: str = 'green', center_tick_height: float = 0.125, path: str = '') → None

Make a rug plot of binary prediction errors.

Parameters
  • errors – A dictionary of numpy arrays, each of which contains error values for the outcome given by its key.

  • width – Width of the rug plot.

  • height – Height of the rug plot.

  • alpha – The opacity of plotted ticks, from 2^-8 (nearly transparent) to 1 (opaque).

  • color – The color of plotted ticks.

  • center_tick_color – The color of the ticks marking the center of the plot.

  • center_tick_height – The height of the ticks marking the center of the plot.

  • path – The path preceding the Output folder in which the plots will be saved.

fife.utils.plot_shap_values(shap_values: dict, raw_data: pandas.core.frame.DataFrame, processed_data: Union[None, pandas.core.frame.DataFrame] = None, no_summary_col: str = typing.Union[NoneType, str], alpha: float = 0.5, path: str = '') → None

Make plots of SHAP values.

SHAP values quantify feature contributions to predictions.

Parameters
  • shap_values – A dictionary of numpy arrays, each of which contains SHAP values for the outcome given by its key.

  • raw_data – Feature values prior to processing into model input.

  • processed_data – Feature values used as model input.

  • no_summary_col – The name of a column to never use for summary plots.

  • alpha – The opacity of plotted points, from 2^-8 (nearly transparent) to 1 (opaque).

  • path – The path preceding the Output folder in which the plots will be saved.

fife.utils.print_config(config: dict) → None

Neatly print given dictionary of config parameters.

fife.utils.redirect_output_to_log(path: str = '') → None

Send future output to a text file instead of console.

fife.utils.save_intermediate_data(data: pandas.core.frame.DataFrame, file_name: str, file_format: str = 'pickle', path: str = '') → None

Save given DataFrame in Intermediate folder in given format.

fife.utils.save_maps(obj: Union[pandas.core.series.Series, dict], file_name: str, path: str = '') → None

Save a map from values to other values to Intermediate folder.

fife.utils.save_output_table(data: pandas.core.frame.DataFrame, file_name: str, index: bool = True, path: str = '') → None

Save given DataFrame in the Output folder as a csv file.

fife.utils.save_plot(file_name: str, path: str = '') → None

Save the most recently plotted plot in high resolution.