Training and test data often differ in the noise they contain, which limits the practical use of speech endpoint detection. We propose a generalized regularized online sequential extreme learning machine with forgetting factor (GR-OSELM-FF) for voice activity detection that adapts to the mismatch between training and test samples. Practical usability is our driving motivation: the proposed model should adapt easily to new conditions. When a new voice stream arrives during testing or in actual use, the model adjusts its output weights directly, without retraining from scratch. To reduce the sensitivity of the ELM to randomly chosen hidden-layer parameters, we initialize the model parameters with an extreme learning machine-based autoencoder (ELM-AE) instead of random values. Experimental results show that models pretrained with ELM-AE perform better, because ELM-AE can extract latent information from the data. The results also show that the proposed algorithm maintains high accuracy and low omission rates across noise environments with different SNRs and on real-world voice samples.
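To make the online adaptation idea concrete, the following is a minimal sketch of a generic regularized OS-ELM whose output weights are updated with a recursive-least-squares forgetting factor, with hidden weights taken from a simple ELM-AE. It is not the GR-OSELM-FF formulation itself: the class name `ForgettingOSELM`, the helper `elm_ae_weights`, the sigmoid activation, the zero hidden bias after pretraining, and the hyperparameters `reg` and `forgetting` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_ae_weights(X, n_hidden, reg=1e-2, seed=0):
    """Illustrative ELM-AE: learn weights that reconstruct X, reuse them as input weights."""
    rng = np.random.default_rng(seed)
    W0 = rng.standard_normal((X.shape[1], n_hidden))   # random encoder
    b0 = rng.standard_normal(n_hidden)
    H = sigmoid(X @ W0 + b0)
    # Ridge solution of H @ beta ~= X, with beta of shape (n_hidden, n_features)
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    # Assumption: transposed decoder weights become the input weights, with zero bias
    return beta.T, np.zeros(n_hidden)

class ForgettingOSELM:
    """Generic regularized OS-ELM with an RLS-style forgetting factor (a sketch)."""

    def __init__(self, W, b, reg=1e-2, forgetting=0.98):
        self.W, self.b = W, b      # hidden-layer parameters, fixed after initialization
        self.reg = reg             # ridge term (assumed hyperparameter)
        self.lam = forgetting      # 0 < lam <= 1; lam = 1 means no forgetting
        self.P = None              # inverse of the regularized correlation matrix
        self.beta = None           # output weights

    def hidden(self, X):
        return sigmoid(X @ self.W + self.b)

    def fit_initial(self, X, T):
        H = self.hidden(X)
        self.P = np.linalg.inv(H.T @ H + self.reg * np.eye(H.shape[1]))
        self.beta = self.P @ H.T @ T

    def partial_fit(self, X, T):
        """Update only the output weights from a new chunk of frames."""
        H = self.hidden(X)
        S = self.lam * np.eye(H.shape[0]) + H @ self.P @ H.T
        K = self.P @ H.T @ np.linalg.inv(S)              # gain matrix
        self.beta = self.beta + K @ (T - H @ self.beta)  # correct by prediction error
        self.P = (self.P - K @ H @ self.P) / self.lam    # discount older data

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

In this sketch, each call to `partial_fit` touches only `beta` and `P`, so a new chunk of labeled frames can be absorbed without revisiting earlier data. A `forgetting` value below 1 down-weights older chunks, which is one common way to track changing noise conditions.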