In this paper, we investigate the problem of wireless service provisioning through a rotary-wing UAV which can serve as an aerial base station (BS) to communicate with multiple ground terminals (GTs) in a boost demand area. Our objective is to optimize the UAV control for maximizing the UAV.s energy efficiency, in which both aerodynamic energy and communication energy are considered while ensuring the communication requirements for each GT and backhaul link between the UAV and the terrestrial BS. The mobility of the UAV and GTs lead to time-varying channel conditions that make the environment dynamic. We formulate a nonconvex optimization for controlling the UAV considering the practical angle-dependent Rician fading channels between the UAV and GTs, and between the UAV and the terrestrial BS. Traditional optimization approaches are not able to handle the dynamic environment and high complexity of the problem in real-time. We propose to use a deep reinforcement learning-based approach namely Deep Deterministic Policy Gradient (DDPG) to solve the formulated nonconvex problem of UAV control with continuous action space that takes into account the real-time of the environment including time-varying UAV-ground channel conditions, available onboard energy of the UAV, and the communication requirement of the GTs. However, the DDPG method may not achieve good performance in an unstable environment and will face a large number of hyperparameters. We extend our approach to use the Trust Region Policy Optimization (TRPO) method that can improve the performance of the UAV compared to the DDPG method in such a dynamic environment.