Hate speech is potentially harmful to individuals and to society, and social media remains a major channel for spreading it. Social media posts consist largely of non-standard and often poorly constructed language, which makes automatic detection of hate speech on these platforms difficult. In addition, the resources required to create large labeled corpora are costly. Deep neural networks (DNNs) can learn features efficiently from a text corpus, which makes them a promising approach to automatic hate speech detection. In this study, an ensemble DNN model composed of a stacked autoencoder (SAE) and a convolutional neural network (CNN) is designed to learn representations of X (formerly Twitter) comments in order to classify hate speech. The dataset used in the study was collected online from X. The autoencoder (AE) component compensates for the weak feature extraction capability of the CNN and reduces the dimensionality of the dataset. The output of the unsupervised AE, together with the extracted features, is fed into the supervised CNN for classification. The model was built and tested in Python, using the low-level libraries provided by TensorFlow and the high-level neural network interface of Keras. The results showed that the ensemble AE-CNN yielded a significant improvement on the binary classification task, achieving 96.0% accuracy and an F1-score of 94.8%.
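The two-stage AE-CNN pipeline described above can be sketched in Keras as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the text vectorisation, layer widths, or training settings, so `INPUT_DIM`, `CODE_DIM`, the layer sizes, and the helper names `build_stacked_autoencoder`, `build_cnn_classifier`, and `ae_cnn_pipeline` are all hypothetical. It also assumes that "the output of the AE and the extracted features" means concatenating the AE bottleneck code with the original feature vector, which is only one possible reading. Random data stands in for the X dataset.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes: the abstract does not state the vectorisation or layer widths.
INPUT_DIM = 500  # assumption: e.g. a 500-term TF-IDF vector per comment
CODE_DIM = 32    # assumption: width of the AE bottleneck


def build_stacked_autoencoder(input_dim, code_dim):
    """Stacked dense autoencoder; returns (autoencoder, encoder) sharing weights."""
    inputs = keras.Input(shape=(input_dim,))
    h = layers.Dense(128, activation="relu")(inputs)
    code = layers.Dense(code_dim, activation="relu", name="code")(h)
    h = layers.Dense(128, activation="relu")(code)
    recon = layers.Dense(input_dim, activation="sigmoid")(h)
    autoencoder = keras.Model(inputs, recon, name="sae")
    encoder = keras.Model(inputs, code, name="encoder")  # reuses the trained layers
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


def build_cnn_classifier(feature_dim):
    """1-D CNN binary classifier that treats a feature vector as a sequence."""
    inputs = keras.Input(shape=(feature_dim,))
    x = layers.Reshape((feature_dim, 1))(inputs)
    x = layers.Conv1D(16, kernel_size=3, activation="relu")(x)
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(16, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # estimated P(hate speech)
    model = keras.Model(inputs, outputs, name="cnn")
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def ae_cnn_pipeline(x_train, y_train, x_test, epochs=2):
    # Stage 1: unsupervised pre-training; the AE learns to reconstruct its input.
    sae, encoder = build_stacked_autoencoder(x_train.shape[1], CODE_DIM)
    sae.fit(x_train, x_train, epochs=epochs, batch_size=32, verbose=0)

    # Stage 2 (assumption): concatenate the AE code with the original features.
    def combine(x):
        return np.concatenate([encoder.predict(x, verbose=0), x], axis=1)

    train_feats, test_feats = combine(x_train), combine(x_test)
    cnn = build_cnn_classifier(train_feats.shape[1])
    cnn.fit(train_feats, y_train, epochs=epochs, batch_size=32, verbose=0)
    return cnn.predict(test_feats, verbose=0)


# Usage on random stand-in data (not the study's dataset).
rng = np.random.default_rng(0)
x = rng.random((200, INPUT_DIM)).astype("float32")
y = rng.integers(0, 2, size=(200,)).astype("float32")
probs = ae_cnn_pipeline(x[:160], y[:160], x[160:])
print(probs.shape)  # (40, 1)
```

Pre-training the autoencoder on reconstruction alone means the first stage needs no labels, which fits the abstract's point that labeled corpora are costly. Only the CNN stage consumes the binary labels.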