
A list of frequently Asked Keras Questions.

General questions
How can i train a keras model on multiple gpus (on a single machine), how can i distribute training across multiple machines, how can i train a keras model on tpu, where is the keras configuration file stored, how to do hyperparameter tuning with keras, how can i obtain reproducible results using keras during development, what are my options for saving models, how can i install hdf5 or h5py to save my models, how should i cite keras, training-related questions, what do "sample", "batch", and "epoch" mean, why is my training loss much higher than my testing loss, how can i use keras with datasets that don't fit in memory, how can i ensure my training run can recover from program interruptions, how can i interrupt training when the validation loss isn't decreasing anymore, how can i freeze layers and do fine-tuning, what's the difference between the training argument in call() and the trainable attribute, in fit() , how is the validation split computed, in fit() , is the data shuffled during training, what's the recommended way to monitor my metrics when training with fit() , what if i need to customize what fit() does, how can i train models in mixed precision, what's the difference between model methods predict() and __call__() , modeling-related questions, how can i obtain the output of an intermediate layer (feature extraction), how can i use pre-trained models in keras, how can i use stateful rnns.
There are two ways to run a single model on multiple GPUs: data parallelism and device parallelism . In most cases, what you need is most likely data parallelism.
1) Data parallelism
Data parallelism consists in replicating the target model once on each device, and using each replica to process a different fraction of the input data.
The best way to do data parallelism with Keras models is to use the tf.distribute API. Make sure to read our guide about using [ tf.distribute ](https://www.tensorflow.org/api_docs/python/tf/distribute) with Keras .
The gist of it is the following:
a) instantiate a "distribution strategy" object, e.g. MirroredStrategy (which replicates your model on each available device and keeps the state of each model in sync):
b) Create your model and compile it under the strategy's scope:
Note that it's important that all state variable creation should happen under the scope. So in case you create any additional variables, do that under the scope.
c) Call fit() with a tf.data.Dataset object as input. Distribution is broadly compatible with all callbacks, including custom callbacks. Note that this call does not need to be under the strategy scope, since it doesn't create new variables.
2) Model parallelism
Model parallelism consists in running different parts of a same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:
TensorFlow enables you to write code that is almost entirely agnostic to how you will distribute it: any code that can run locally can be distributed to multiple workers and accelerators by only adding to it a distribution strategy ( tf.distribute.Strategy ) corresponding to your hardware of choice, without any other code changes.
This also applies to any Keras model: just add a tf.distribute distribution strategy scope enclosing the model building and compiling code, and the training will be distributed according to the tf.distribute distribution strategy.
For distributed training across multiple machines (as opposed to training that only leverages multiple devices on a single machine), there are two distribution strategies you could use: MultiWorkerMirroredStrategy and ParameterServerStrategy :
- tf.distribute.MultiWorkerMirroredStrategy implements a synchronous CPU/GPU multi-worker solution to work with Keras-style model building and training loop, using synchronous reduction of gradients across the replicas.
- tf.distribute.experimental.ParameterServerStrategy implements an asynchronous CPU/GPU multi-worker solution, where the parameters are stored on parameter servers, and workers update the gradients to parameter servers asynchronously.
Distributed training is somewhat more involved than single-machine multi-device training. With ParameterServerStrategy , you will need to launch a remote cluster of machines consisting "worker" and "ps", each running a tf.distribute.Server , then run your python program on a "chief" machine that holds a TF_CONFIG environment variable that specifies how to communicate with the other machines in the cluster. With MultiWorkerMirroredStrategy , you will run the same program on each of the chief and workers, again with a TF_CONFIG environment variable that specifies how to communicate with the cluster. From there, the workflow is similar to using single-machine training, with the main difference being that you will use ParameterServerStrategy or MultiWorkerMirroredStrategy as your distribution strategy.
Importantly, you should:
- Make sure your dataset is so configured that all workers in the cluster are able to efficiently pull data from it (e.g. if your cluster is running on Google Cloud, it's a good idea to host your data on Google Cloud Storage).
- Make sure your training is fault-tolerant (e.g. by configuring a keras.callbacks.BackupAndRestore callback).
Below, we provide a couple of code snippets that cover the basic workflow. For more information about CPU/GPU multi-worker training, see Multi-GPU and distributed training ; for TPU training, see How can I train a Keras model on TPU? .
With ParameterServerStrategy :
With MultiWorkerMirroredStrategy :
TPUs are a fast & efficient hardware accelerator for deep learning that is publicly available on Google Cloud. You can use TPUs via Colab, AI Platform (ML Engine), and Deep Learning VMs (provided the TPU_NAME environment variable is set on the VM).
Make sure to read the TPU usage guide first. Here's a quick summary:
After connecting to a TPU runtime (e.g. by selecting the TPU runtime in Colab), you will need to detect your TPU using a TPUClusterResolver , which automatically detects a linked TPU on all supported platforms:
After the initial setup, the workflow is similar to using single-machine multi-GPU training, with the main difference being that you will use TPUStrategy as your distribution strategy.
- Make sure your dataset yields batches with a fixed static shape. A TPU graph can only process inputs with a constant shape.
- Make sure you are able to read your data fast enough to keep the TPU utilized. Using the TFRecord format to store your data may be a good idea.
- Consider running multiple steps of gradient descent per graph execution in order to keep the TPU utilized. You can do this via the experimental_steps_per_execution argument compile() . It will yield a significant speed up for small models.
The default directory where all Keras data is stored is:
$HOME/.keras/
For instance, for me, on a MacBook Pro, it's /Users/fchollet/.keras/ .
Note that Windows users should replace $HOME with %USERPROFILE% .
In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.
The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json . The default configuration file looks like this:
It contains the following fields:
- The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first ).
- The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
- The default float data type.
- The default backend. This is legacy; nowadays there is only TensorFlow.
Likewise, cached dataset files, such as those downloaded with get_file() , are stored by default in $HOME/.keras/datasets/ , and cached model weights files from Keras Applications are stored by default in $HOME/.keras/models/ .
We recommend using KerasTuner .
During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random seed.
First, you need to set the PYTHONHASHSEED environment variable to 0 before the program starts (not within the program itself). This is necessary in Python 3.2.3 onwards to have reproducible behavior for certain hash-based operations (e.g., the item order in a set or a dict, see Python's documentation or issue #2280 for further details). One way to set the environment variable is when starting python like this:
Moreover, when running on a GPU, some operations have non-deterministic outputs, in particular tf.reduce_sum() . This is due to the fact that GPUs run many operations in parallel, so the order of execution is not always guaranteed. Due to the limited precision of floats, even adding several numbers together may give slightly different results depending on the order in which you add them. You can try to avoid the non-deterministic operations, but some may be created automatically by TensorFlow to compute the gradients, so it is much simpler to just run the code on the CPU. For this, you can set the CUDA_VISIBLE_DEVICES environment variable to an empty string, for example:
The below snippet of code provides an example of how to obtain reproducible results:
Note that you don't have to set seeds for individual initializers in your code if you do the steps above, because their seeds are determined by the combination of the seeds set above.
Note: it is not recommended to use pickle or cPickle to save a Keras model.
1) Whole-model saving (configuration + weights)
Whole-model saving means creating a file that will contain:
- the architecture of the model, allowing you to re-create the model
- the weights of the model
- the training configuration (loss, optimizer)
- the state of the optimizer, allowing you to resume training exactly where you left off.
The default and recommended way to save a whole model is to just do: model.save(your_file_path.keras) .
Keras still supports its original HDF5-based saving format. To save a model in HDF5 format, use model.save(your_file_path, save_format='h5') . Note that this option is automatically used if your_file_path ends in .h5 . Please also see How can I install HDF5 or h5py to save my models? for instructions on how to install h5py .
After saving a model in either format, you can reinstantiate it via model = keras.models.load_model(your_file_path) .
2) Weights-only saving
If you need to save the weights of a model , you can do so in HDF5 with the code below:
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:
If you need to load the weights into a different architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load them by layer name :
Please also see How can I install HDF5 or h5py to save my models? for instructions on how to install h5py .
3) Configuration-only saving (serialization)
If you only need to save the architecture of a model , and not its weights or its training configuration, you can do:
The generated JSON file is human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
4) Handling custom layers (or other custom objects) in saved models
If the model you want to load includes custom layers or other custom classes or functions, you can pass them to the loading mechanism via the custom_objects argument:
Alternatively, you can use a custom object scope :
Custom objects handling works the same way for load_model & model_from_json :
In order to save your Keras models as HDF5 files, Keras uses the h5py Python package. It is a dependency of Keras and should be installed by default. On Debian-based distributions, you will have to additionally install libhdf5 :
If you are unsure if h5py is installed you can open a Python shell and load the module via
If it imports without error it is installed, otherwise you can find detailed installation instructions here .
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
@misc{chollet2015keras, title={Keras}, author={Chollet, Fran\c{c}ois and others}, year={2015}, howpublished={\url{https://keras.io}}, }
Below are some common definitions that are necessary to know and understand to correctly utilize Keras fit() :
- Sample : one element of a dataset. For instance, one image is a sample in a convolutional network. One audio snippet is a sample for a speech recognition model.
- Batch : a set of N samples. The samples in a batch are processed independently, in parallel. If training, a batch results in only one update to the model. A batch generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluation/prediction).
- Epoch : an arbitrary cutoff, generally defined as "one pass over the entire dataset", used to separate training into distinct phases, which is useful for logging and periodic evaluation. When using validation_data or validation_split with the fit method of Keras models, evaluation will be run at the end of every epoch . Within Keras, there is the ability to add callbacks specifically designed to be run at the end of an epoch . Examples of these are learning rate changes and model checkpointing (saving).
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time. They are reflected in the training time loss but not in the test time loss.
Besides, the training loss that Keras displays is the average of the losses for each batch of training data, over the current epoch . Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. This can bring the epoch-wise average down. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
You should use the tf.data API to create tf.data.Dataset objects -- an abstraction over a data pipeline that can pull data from local disk, from a distributed file system, from GCS, etc., as well as efficiently apply various data transformations.
For instance, the utility [ tf.keras.utils.image_dataset_from_directory ](/api/data_loading/image#imagedatasetfromdirectory-function) will create a dataset that reads image data from a local directory. Likewise, the utility [ tf.keras.utils.text_dataset_from_directory ](/api/data_loading/text#textdatasetfromdirectory-function) will create a dataset that reads text files from a local directory.
Dataset objects can be directly passed to fit() , or can be iterated over in a custom low-level training loop.
To ensure the ability to recover from an interrupted training run at any time (fault tolerance), you should use a tf.keras.callbacks.experimental.BackupAndRestore that regularly saves your training progress, including the epoch number and weights, to disk, and loads it the next time you call Model.fit() .
Find out more in the callbacks documentation .
You can use an EarlyStopping callback:
Setting the trainable attribute
All layers & models have a layer.trainable boolean attribute:
On all layers & models, the trainable attribute can be set (to True or False). When set to False , the layer.trainable_weights attribute is empty:
Setting the trainable attribute on a layer recursively sets it on all children layers (contents of self.layers ).
1) When training with fit() :
To do fine-tuning with fit() , you would:
- Instantiate a base model and load pre-trained weights
- Freeze that base model
- Add trainable layers on top
- Call compile() and fit()
You can follow a similar workflow with the Functional API or the model subclassing API. Make sure to call compile() after changing the value of trainable in order for your changes to be taken into account. Calling compile() will freeze the state of the training step of the model.
2) When using a custom training loop:
When writing a training loop, make sure to only update weights that are part of model.trainable_weights (and not all model.weights ).
Interaction between trainable and compile()
Calling compile() on a model is meant to "freeze" the behavior of that model. This implies that the trainable attribute values at the time the model is compiled should be preserved throughout the lifetime of that model, until compile is called again. Hence, if you change trainable , make sure to call compile() again on your model for your changes to be taken into account.
For instance, if two models A & B share some layers, and:
- Model A gets compiled
- The trainable attribute value on the shared layers is changed
- Model B is compiled
Then model A and B are using different trainable values for the shared layers. This mechanism is critical for most existing GAN implementations, which do:
training is a boolean argument in call that determines whether the call should be run in inference mode or training mode. For example, in training mode, a Dropout layer applies random dropout and rescales the output. In inference mode, the same layer does nothing. Example:
trainable is a boolean layer attribute that determines the trainable weights of the layer should be updated to minimize the loss during training. If layer.trainable is set to False , then layer.trainable_weights will always be an empty list. Example:
As you can see, "inference mode vs training mode" and "layer weight trainability" are two very different concepts.
You could imagine the following: a dropout layer where the scaling factor is learned during training, via backpropagation. Let's name it AutoScaleDropout . This layer would have simultaneously a trainable state, and a different behavior in inference and training. Because the trainable attribute and the training call argument are independent, you can do the following:
Special case of the BatchNormalization layer
Consider a BatchNormalization layer in the frozen part of a model that's used for fine-tuning.
It has long been debated whether the moving statistics of the BatchNormalization layer should stay frozen or adapt to the new data. Historically, bn.trainable = False would only stop backprop but would not prevent the training-time statistics update. After extensive testing, we have found that it is usually better to freeze the moving statistics in fine-tuning use cases. Starting in TensorFlow 2.0, setting bn.trainable = False will also force the layer to run in inference mode.
This behavior only applies for BatchNormalization . For every other layer, weight trainability and "inference vs training mode" remain independent.
If you set the validation_split argument in model.fit to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn't shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed.
The same validation set is used for all epochs (within the same call to fit ).
Note that the validation_split option is only available if your data is passed as Numpy arrays (not tf.data.Datasets , which are not indexable).
If you pass your data as NumPy arrays and if the shuffle argument in model.fit() is set to True (which is the default), the training data will be globally randomly shuffled at each epoch.
If you pass your data as a tf.data.Dataset object and if the shuffle argument in model.fit() is set to True , the dataset will be locally shuffled (buffered shuffling).
When using tf.data.Dataset objects, prefer shuffling your data beforehand (e.g. by calling dataset = dataset.shuffle(buffer_size) ) so as to be in control of the buffer size.
Validation data is never shuffled.
Loss values and metric values are reported via the default progress bar displayed by calls to fit() . However, staring at changing ascii numbers in a console is not an optimal metric-monitoring experience. We recommend the use of TensorBoard , which will display nice-looking graphs of your training and validation metrics, regularly updated during training, which you can access from your browser.
You can use TensorBoard with fit() via the TensorBoard callback .
You have two options:
1) Subclass the Model class and override the train_step (and test_step ) methods
This is a better option if you want to use custom update rules but still want to leverage the functionality provided by fit() , such as callbacks, efficient step fusing, etc.
Note that this pattern does not prevent you from building models with the Functional API, in which case you will use the class you created to instantiate the model with the inputs and outputs . Same goes for Sequential models, in which case you will subclass keras.Sequential and override its train_step instead of keras.Model .
The example below shows a Functional model with a custom train_step .
You can also easily add support for sample weighting:
Similarly, you can also customize evaluation by overriding test_step :
2) Write a low-level custom training loop
This is a good option if you want to be in control of every last little detail. But it can be somewhat verbose. Example:
This example does not include a lot of essential functionality like displaying a progress bar, calling callbacks, updating metrics, etc. You would have to do this yourself. It's not difficult at all, but it's a bit of work.
Keras has built-in support for mixed precision training on GPU and TPU. See this extensive guide .
Let's answer with an extract from Deep Learning with Python, Second Edition :
Both y = model.predict(x) and y = model(x) (where x is an array of input data) mean "run the model on x and retrieve the output y ." Yet they aren't exactly the same thing. predict() loops over the data in batches (in fact, you can specify the batch size via predict(x, batch_size=64) ), and it extracts the NumPy value of the outputs. It's schematically equivalent to this:
This means that predict() calls can scale to very large arrays. Meanwhile, model(x) happens in-memory and doesn't scale. On the other hand, predict() is not differentiable: you cannot retrieve its gradient if you call it in a GradientTape scope. You should use model(x) when you need to retrieve the gradients of the model call, and you should use predict() if you just need the output value. In other words, always use predict() unless you're in the middle of writing a low-level gradient descent loop (as we are now).
In the Functional API and Sequential API, if a layer has been called exactly once, you can retrieve its output via layer.output and its input via layer.input . This enables you do quickly instantiate feature-extraction models, like this one:
Naturally, this is not possible with models that are subclasses of Model that override call .
Here's another example: instantiating a Model that returns the output of a specific named layer:
You could leverage the models available in keras.applications , or the models available on TensorFlow Hub . TensorFlow Hub is well-integrated with Keras.
Making a RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If x1 and x2 are successive batches of samples, then x2[i] is the follow-up sequence to x1[i] , for every i .
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model. E.g. batch_size=32 for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set stateful=True in your RNN layer(s).
- specify shuffle=False when calling fit() .
To reset the states accumulated:
- use model.reset_states() to reset the states of all layers in the model
- use layer.reset_states() to reset the states of a specific stateful RNN layer
Note that the methods predict , fit , train_on_batch , etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
How to cite Keras
Python package
Keras is an open-source neural-network library written in Python.
Copy to clipboard: CTLR + C, then ENTER or click OK
Source code repository
Popular book.
You can cite source code repository directly
Citation in APA style
Citation in vancouver style, citation in harvard style, citation in bibtex format.
In case you do now want to cite source codes, you can cite popular book about Keras
Leave feedback
Do you find a mistake? Do you know a better reference? Do we miss some important information? Please let us know!
Your email address
Do you like it?
Unprofitable.
Please consider a donation – we do it in our free time. Do it easily Support .
Suggestions are always welcome! You can also help us with writing and became a part of the team.
Share science!
Just tell your friends about us.
- Docs »
- Getting started »
- Edit on GitHub
Keras FAQ: Frequently Asked Keras Questions
How should i cite keras, how can i run keras on gpu, how can i save a keras model, why is the training loss much higher than the testing loss, how can i visualize the output of an intermediate layer, how can i use keras with datasets that don't fit in memory, how can i interrupt training when the validation loss isn't decreasing anymore, how is the validation split computed, is the data shuffled during training, how can i record the training / validation loss / accuracy at each epoch, how can i use stateful rnns.
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
If you are running on the TensorFlow backend, your code will automatically run on GPU if any available GPU is detected. If you are running on the Theano backend, you can use one of the following methods:
Method 1: use Theano flags.
The name 'gpu' might have to be changed depending on your device's identifier (e.g. gpu0 , gpu1 , etc).
Method 2: set up your .theanorc : Instructions
Method 3: manually set theano.config.device , theano.config.floatX at the beginning of your code:
It is not recommended to use pickle or cPickle to save a Keras model.
If you only need to save the architecture of a model, and not its weights, you can do:
You can then build a fresh model from this data:
If you need to save the weights of a model, you can do so in HDF5 with the code below.
Note that you will first need to install HDF5 and the Python library h5py, which do not come bundled with Keras.
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:
This leads us to a way to save and reconstruct models from only serialized data:
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
You can build a Keras function that will return the output of a certain layer given a certain input, for example:
Similarly, you could build a Theano and TensorFlow function directly.
Note that if your model has a different behavior in training and testing phase (e.g. if it uses Dropout , BatchNormalization , etc.), you will need to pass the learning phase flag to your function:
Another more flexible way of getting output from intermediate layers is to use the functional API .
You can do batch training using model.train_on_batch(X, y) and model.test_on_batch(X, y) . See the models documentation .
Alternatively, you can write a generator that yields batches of training data and use the method model.fit_generator(data_generator, samples_per_epoch, nb_epoch) .
You can see batch training in action in our CIFAR10 example .
You can use an EarlyStopping callback:
Find out more in the callbacks documentation .
If you set the validation_split argument in model.fit to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc.
Yes, if the shuffle argument in model.fit is set to True (which is the default), the training data will be randomly shuffled at each epoch.
Validation data is never shuffled.
The model.fit method returns an History callback, which has a history attribute containing the lists of successive losses and other metrics.
Making a RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If X1 and X2 are successive batches of samples, then X2[i] is the follow-up sequence to X1[i] , for every i .
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a batch_input_shape argument to the first layer in your model. It should be a tuple of integers, e.g. (32, 10, 16) for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set stateful=True in your RNN layer(s).
To reset the states accumulated:
- use model.reset_states() to reset the states of all layers in the model
- use layer.reset_states() to reset the states of a specific stateful RNN layer
Notes that the methods predict , fit , train_on_batch , predict_classes , etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
Search code, repositories, users, issues, pull requests...
Provide feedback.
We read every piece of feedback, and take your input very seriously.
Saved searches
Use saved searches to filter your results more quickly.
To see all available qualifiers, see our documentation .
- Notifications
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement . We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Citing keras-tuner #264
vb690 commented Mar 28, 2020 • edited
- 👍 2 reactions
haifeng-jin commented Jun 2, 2020 • edited
- 👍 11 reactions
Sorry, something went wrong.
No branches or pull requests
Frequently Asked Questions
How should i cite keras.
Please cite Keras in your publications if it helps your research. Here is an example BibTeX entry:
How can I run Keras on a GPU?
Note that installation and configuration of the GPU-based backends can take considerably more time and effort. So if you are just getting started with Keras you may want to stick with the CPU version initially, then install the appropriate GPU version once your training becomes more computationally demanding.
Below are instructions for installing and enabling GPU support for the various supported backends.
If your system has an NVIDIA® GPU and you have the GPU version of TensorFlow installed then your Keras code will automatically run on the GPU.
Additional details on GPU installation can be found here: https://tensorflow.rstudio.com/installation_gpu.html .
If you are running on the Theano backend, you can set the THEANO_FLAGS environment variable to indicate you’d like to execute tensor operations on the GPU. For example:
The name ‘gpu’ might have to be changed depending on your device’s identifier (e.g. gpu0 , gpu1 , etc).
If you have the GPU version of CNTK installed then your Keras code will automatically run on the GPU.
Additional information on installing the GPU version of CNTK can be found here: https://learn.microsoft.com/en-us/cognitive-toolkit/setup-linux-python
How can I run a Keras model on multiple GPUs?
We recommend doing so using the TensorFlow backend. There are two ways to run a single model on multiple GPUs: data parallelism and device parallelism .
In most cases, what you need is most likely data parallelism.
Data parallelism
Data parallelism consists in replicating the target model once on each device, and using each replica to process a different fraction of the input data. Keras has a built-in utility, multi_gpu_model() , which can produce a data-parallel version of any model, and achieves quasi-linear speedup on up to 8 GPUs.
For more information, see the documentation for multi_gpu_model . Here is a quick example:
Device parallelism
Device parallelism consists in running different parts of a same model on different devices. It works best for models that have a parallel architecture, e.g. a model with two branches.
This can be achieved by using TensorFlow device scopes. Here is a quick example:
What does “sample”, “batch”, “epoch” mean?
Below are some common definitions that are necessary to know and understand to correctly utilize Keras:
- Example: one image is a sample in a convolutional network
- Example: one audio file is a sample for a speech recognition model
- A batch generally approximates the distribution of the input data better than a single input. The larger the batch, the better the approximation; however, it is also true that the batch will take longer to process and will still result in only one update. For inference (evaluate/predict), it is recommended to pick a batch size that is as large as you can afford without going out of memory (since larger batches will usually result in faster evaluating/prediction).
- When using evaluation_data or evaluation_split with the fit method of Keras models, evaluation will be run at the end of every epoch .
- Within Keras, there is the ability to add callbacks specifically designed to be run at the end of an epoch . Examples of these are learning rate changes and model checkpointing (saving).
Why are Keras objects modified in place?
Unlike most R objects, Keras objects are “mutable”. That means that when you modify an object you’re modifying it “in place”, and you don’t need to assign the updated object back to the original name. For example, to add layers to a Keras model you might use this code:
Rather than this code:
You need to be aware of this because it makes the Keras API a little different than most other pipelines you may have used, but it’s necessary to match the data structures and behavior of the underlying Keras library.
How can I save a Keras model?
Saving/loading whole models (architecture + weights + optimizer state).
You can use save_model_hdf5() to save a Keras model into a single HDF5 file which will contain:
- the architecture of the model, allowing to re-create the model
- the weights of the model
- the training configuration (loss, optimizer)
- the state of the optimizer, allowing to resume training exactly where you left off.
You can then use load_model_hdf5() to reinstantiate your model. load_model_hdf5() will also take care of compiling the model using the saved training configuration (unless the model was never compiled in the first place).
Saving/loading only a model’s architecture
If you only need to save the architecture of a model , and not its weights or its training configuration, you can do:
The generated JSON / YAML files are human-readable and can be manually edited if needed.
You can then build a fresh model from this data:
Saving/loading only a model’s weights
If you need to save the weights of a model , you can do so in HDF5 with the code below.
Assuming you have code for instantiating your model, you can then load the weights you saved into a model with the same architecture:
If you need to load weights into a different architecture (with some layers in common), for instance for fine-tuning or transfer-learning, you can load weights by layer name :
For example:
Why is the training loss much higher than the testing loss?
A Keras model has two modes: training and testing. Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing time.
Besides, the training loss is the average of the losses over each batch of training data. Because your model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
How can I obtain the output of an intermediate layer?
One simple way is to create a new Model that will output the layers that you are interested in:
How can I use Keras with datasets that don’t fit in memory?
Generator functions.
To provide training or evaluation data incrementally you can write an R generator function that yields batches of training data then pass the function to the fit_generator() function (or related functions evaluate_generator() and predict_generator() .
The output of generator functions must be a list of one of these forms:
- (inputs, targets)
- (inputs, targets, sample_weights)
All arrays should contain the same number of samples. The generator is expected to loop over its data indefinitely. For example, here’s simple generator function that yields randomly sampled batches of data:
The steps_per_epoch parameter indicates the number of steps (batches of samples) to yield from generator before declaring one epoch finished and starting the next epoch. It should typically be equal to the number of unique samples if your dataset divided by the batch size.
External Data Generators
The above example doesn’t however address the use case of datasets that don’t fit in memory. Typically to do that you’ll write a generator that reads from another source (e.g. a sparse matrix or file(s) on disk) and maintains an offset into that data as it’s called repeatedly. For example, imagine you have a set of text files in a directory you want to read from:
The above function is an example of a stateful generator—the function maintains information across calls to keep track of which data to provide next. This is accomplished by defining shared state outside the generator function body and using the <<- operator to assign to it from within the generator.

Image Generators
You can also use the flow_images_from_directory() and flow_images_from_data() functions along with fit_generator() for training on sets of images stored on disk (with optional image augmentation/normalization via image_data_generator() ).
Batch Functions
You can also do batch training using the train_on_batch() and test_on_batch() functions. These functions enable you to write a training loop that reads into memory only the data required for each batch.
How can I interrupt training when the validation loss isn’t decreasing anymore?
You can use an early stopping callback:
Find out more in the callbacks documentation .
How is the validation split computed?
If you set the validation_split argument in fit to e.g. 0.1, then the validation data used will be the last 10% of the data. If you set it to 0.25, it will be the last 25% of the data, etc. Note that the data isn’t shuffled before extracting the validation split, so the validation is literally just the last x% of samples in the input you passed.
The same validation set is used for all epochs (within a same call to fit ).
Is the data shuffled during training?
Yes, if the shuffle argument in fit is set to TRUE (which is the default), the training data will be randomly shuffled at each epoch.
Validation data is never shuffled.
How can I record the training / validation loss / accuracy at each epoch?
The model.fit method returns an History callback, which has a history attribute containing the lists of successive losses and other metrics.
How can I “freeze” Keras layers?
To “freeze” a layer means to exclude it from training, i.e. its weights will never be updated. This is useful in the context of fine-tuning a model, or using fixed embeddings for a text input.
You can pass a trainable argument (boolean) to a layer constructor to set a layer to be non-trainable:
Additionally, you can set the trainable property of a layer to TRUE or FALSE after instantiation. For this to take effect, you will need to call compile() on your model after modifying the trainable property. Here’s an example:
Finally, you can freeze or unfreeze the weights for an entire model (or a range of layers within the model) using the freeze_weights() and unfreeze_weights() functions. For example:
How can I use stateful RNNs?
Making a RNN stateful means that the states for the samples of each batch will be reused as initial states for the samples in the next batch.
When using stateful RNNs, it is therefore assumed that:
- all batches have the same number of samples
- If X1 and X2 are successive batches of samples, then X2[[i]] is the follow-up sequence to X1[[i] , for every i .
To use statefulness in RNNs, you need to:
- explicitly specify the batch size you are using, by passing a batch_size argument to the first layer in your model. E.g. batch_size=32 for a 32-samples batch of sequences of 10 timesteps with 16 features per timestep.
- set stateful=TRUE in your RNN layer(s).
- specify shuffle=FALSE when calling fit().
To reset the states accumulated in either a single layer or an entire model use the reset_states() function.
Notes that the methods predict() , fit() , train_on_batch() , predict_classes() , etc. will all update the states of the stateful layers in a model. This allows you to do not only stateful training, but also stateful prediction.
How can I remove a layer from a Sequential model?
You can remove the last added layer in a Sequential model by calling pop_layer() :
How can I use pre-trained models in Keras?
Code and pre-trained weights are available for the following image classification models:
- InceptionV3
- InceptionResNetV2
- MobileNetV2
For a few simple usage examples, see the documentation for the Applications module .
The VGG16 model is also the basis for the Deep dream Keras example script.
How can I use other Keras backends?
By default the Keras Python and R packages use the TensorFlow backend. Other available backends include Theano or CNTK. To learn more about using alternatate backends (e.g. Theano or CNTK) see the article on Keras backends.
How can I use the PlaidML backend?
PlaidML is an open source portable deep learning engine that runs on most existing PC hardware with OpenCL-capable GPUs from NVIDIA, AMD, or Intel. PlaidML includes a Keras backend which you can use as described below.
First, build and install PlaidML as described on the project website . You must be sure that PlaidML is correctly installed, setup, and working before proceeding further!
Then, to use Keras with the PlaidML backend you do the following:
This should automatically discover and use the Python environment where plaidml and plaidml-keras were installed. If this doesn’t work as expected you can also force the selection of a particular Python environment. For example, if you installed PlaidML in conda environment named “plaidml” you would do this:
How can I use Keras in another R package?
Testing on cran.
The main consideration in using Keras within another R package is to ensure that your package can be tested in an environment where Keras is not available (e.g. the CRAN test servers). To do this, arrange for your tests to be skipped when Keras isn’t available using the is_keras_available() function.
For example, here’s a testthat utility function that can be used to skip a test when Keras isn’t available:
You can pass the version argument to check for a specific version of Keras.
Keras Module
Another consideration is gaining access to the underlying Keras Python module. You might need to do this if you require lower level access to Keras than is provided for by the Keras R package.
Since the Keras R package can bind to multiple different implementations of Keras (either the original Keras or the TensorFlow implementation of Keras), you should use the keras::implementation() function to obtain access to the correct python module. You can use this function within the .onLoad function of a package to provide global access to the module within your package. For example:
Custom Layers
If you create custom layers in R or import other Python packages which include custom Keras layers, be sure to wrap them using the create_layer() function so that they are composable using the magrittr pipe operator. See the documentation on layer wrapper functions for additional details.
How can I obtain reproducible results using Keras during development?
During development of a model, sometimes it is useful to be able to obtain reproducible results from run to run in order to determine if a change in performance is due to an actual model or data modification, or merely a result of a new random sample.
The use_session_with_seed() function establishes a common random seed for R, Python, NumPy, and TensorFlow. It furthermore disables hash randomization, GPU computations, and CPU parallelization, which can be additional sources of non-reproducibility.
To use the function, call it immediately after you load the keras package:
This function takes all measures known to promote reproducible results from Keras sessions, however it’s possible that various individual features or libraries used by the backend escape its effects. If you encounter non-reproducible results please investigate the possible sources of the problem. The source code for use_session_with_seed() is here: https://github.com/rstudio/tensorflow/blob/main/R/seed.R . Contributions via pull request are very welcome!
Please note again that use_session_with_seed() disables GPU computations and CPU parallelization by default (as both can lead to non-deterministic computations) so should generally not be used when model training time is paramount. You can re-enable GPU computations and/or CPU parallelism using the disable_gpu and disable_parallel_cpu arguments. For example:
Where is the Keras configuration filed stored?
The default directory where all Keras data is stored is:
In case Keras cannot create the above directory (e.g. due to permission issues), /tmp/.keras/ is used as a backup.
The Keras configuration file is a JSON file stored at $HOME/.keras/keras.json . The default configuration file looks like this:
It contains the following fields:
- The image data format to be used as default by image processing layers and utilities (either channels_last or channels_first ).
- The epsilon numerical fuzz factor to be used to prevent division by zero in some operations.
- The default float data type.
- The default backend (this will always be “tensorflow” in the R interface to Keras)
Likewise, cached dataset files, such as those downloaded with get_file() , are stored by default in $HOME/.keras/datasets/ .
- Home
- Article citations
- Biomedical & Life Sci.
- Business & Economics
- Chemistry & Materials Sci.
- Computer Sci. & Commun.
- Earth & Environmental Sci.
- Engineering
- Medicine & Healthcare
- Physics & Mathematics
- Social Sci. & Humanities
Journals by Subject
- Biomedical & Life Sciences
- Chemistry & Materials Science
- Computer Science & Communications
- Earth & Environmental Sciences
- Social Sciences & Humanities
- Paper Submission
- Information for Authors
- Peer-Review Resources
- Open Special Issues
- Open Access Statement
- Frequently Asked Questions
Publish with us
Article citations more>>.
Chollet, F. (2015) keras, GitHub. https://github.com/fchollet/keras
has been cited by the following article:
TITLE: Double Sarsa and Double Expected Sarsa with Shallow and Deep Learning
KEYWORDS: Double Sarsa , Double Expected Sarsa , Reinforcement Learning , Deep Learning
JOURNAL NAME: Journal of Data Analysis and Information Processing , Vol.4 No.4 , October 17, 2016
ABSTRACT: Double Q-learning has been shown to be effective in reinforcement learning scenarios when the reward system is stochastic. We apply the idea of double learning that this algorithm uses to Sarsa and Expected Sarsa, producing two new algorithms called Double Sarsa and Double Expected Sarsa that are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add a significant amount of stability in the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the actionvalue, and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and can collect larger rewards than the single versions.
Related Articles:
- Open Access Articles A Comprehensive Parametric Study of Planar Inverted-F Antenna Hassan Tariq Chattha, Yi Huang, Muhammad Kamran Ishfaq, Stephen J. Boyes Wireless Engineering and Technology Vol.3 No.1 , January 12, 2012 DOI: 10.4236/wet.2012.31001
- Open Access Articles Kinetical Inflation and Quintessence by F-Harmonic Map Antonin Kanfon, Dominique Lambert Lambert Journal of Modern Physics Vol.3 No.11 , November 16, 2012 DOI: 10.4236/jmp.2012.311213
- Open Access Articles Biologically Relevant Universality of Move F in Wh-Questions Daoshan Ma Open Access Library Journal Vol.5 No.3 , March 15, 2018 DOI: 10.4236/oalib.1104442
- Open Access Articles Evaluation of the Notifiable Diseases Surveillance System in Beitbridge District, Zimbabwe 2015 Juru P. Tsitsi, Ncube Nomagugu, Notion T. Gombe, Mufuta Tshimanga, Bangure Donewell, More Mungati, Chikodzore Rudo Open Journal of Epidemiology Vol.5 No.3 , August 17, 2015 DOI: 10.4236/ojepi.2015.53024
- Open Access Articles On Chinese Government’s Stock Market Rescue Efforts in 2015 Fanhua Zeng, Wei-Chiao Huang, James Hueng Modern Economy Vol.7 No.4 , April 27, 2016 DOI: 10.4236/me.2016.74045
- Journals A-Z
About SCIRP
- Publication Fees
- For Authors
- Peer-Review Issues
- Special Issues
- Manuscript Tracking System
- Subscription
- Translation & Proofreading
- Volume & Issue
- Open Access
- Publication Ethics
- Preservation
- Privacy Policy

Citing Keras?
François Chollet
Hello, I'm using Keras for research in my master's thesis on deep neural networks. Do you have a paper I can cite, or any standard citations for attribution purposes?
-- You received this message because you are subscribed to the Google Groups "Keras-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected] . To view this discussion on the web, visit https://groups.google.com/d/msgid/keras-users/b8a7ed22-e4a2-481d-bd1f-68f46c70d064%40googlegroups.com . For more options, visit https://groups.google.com/d/optout .
[email protected]
- Español – América Latina
- Português – Brasil
- Tiếng Việt
Citing TensorFlow
TensorFlow publishes a DOI for the open-source code base using Zenodo.org: 10.5281/zenodo.4724125
TensorFlow's white papers are listed for citation below.
Large-Scale Machine Learning on Heterogeneous Distributed Systems
Access this white paper.
Abstract: TensorFlow is an interface for expressing machine learning algorithms and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.
In BibTeX format
If you use TensorFlow in your research and would like to cite the TensorFlow system, we suggest you cite this whitepaper.
Or in textual form:
TensorFlow: A System for Large-Scale Machine Learning
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives flexibility to the application developer: whereas in previous “parameter server” designs the management of shared state is built into the system, TensorFlow enables developers to experiment with novel optimizations and training algorithms. TensorFlow supports a variety of applications, with a focus on training and inference on deep neural networks. Several Google services use TensorFlow in production, we have released it as an open-source project, and it has become widely used for machine learning research. In this paper, we describe the TensorFlow dataflow model and demonstrate the compelling performance that TensorFlow achieves for several real-world applications.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License , and code samples are licensed under the Apache 2.0 License . For details, see the Google Developers Site Policies . Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-01-21 UTC.

IMAGES
VIDEO
COMMENTS
Museums should be cited similar to a Corporate Author. This includes the museum’s name and location in the necessary information. Write the name of the museum first, followed by a period.
To cite a PDF in MLA, identify what type of the work it is, and then cite accordingly. If the work cannot be cited by type, then it should be cited following the digital file guidelines.
Social referencing is term that refers to the tendency of a person particularly an infant, to analyze the facial expressions of a significant other in order to be able to determine what to do.
How can I install HDF5 or h5py to save my models? How should I cite Keras? Training-related questions. What do "sample", "batch", and "epoch"
Citation in Harvard style. Chollet, F. & others, 2015. Keras. Available at: https://github.com/fchollet/keras.
fchollet commented on Jun 18, 2015. In the future, we may release a paper for the purpose of citations, in the spirit of this one (which
Keras FAQ: Frequently Asked Keras Questions. How should I cite Keras? How can I run Keras on GPU? How can I save a Keras model? Why is the training loss much
As I talked to the authors of Keras Tuner, we agreed on the following bibtex for citations. @misc{omalley2019kerastuner
This exact question is addressed in their FAQ. As of the time of this answer, their suggestion is: Please cite Keras in your publications if
How should I cite Keras? · How can I run Keras on a GPU? · How can I run a Keras model on multiple GPUs? · What does “sample”, “batch”, “epoch” mean? · Why are
Article citationsMore>>. Chollet, F. (2015) keras, GitHub. https://github.com/fchollet/keras. has been cited by the following article: TITLE: Double Sarsa and
In the future, we may release a paper for the purpose of citations, in the spirit of this one: http://arxiv.org/pdf/1506.00619.pdf (which
Keras FAQ: Frequently Asked Keras Questions. How should I cite Keras? How can I run Keras on GPU? How can I run a Keras model on multiple GPUs? What does
Citing TensorFlow · On this page · Large-Scale Machine Learning on Heterogeneous Distributed Systems. In BibTeX format · TensorFlow: A System for