Unmanned aerial vehicle (UAV)-assisted communications offer several promising advantages, such as on-demand deployment, high flexibility in network reconfiguration, and a high probability of line-of-sight (LoS) communication links. In this paper, we optimize the UAV control to maximize the UAV’s energy efficiency, accounting for both aerodynamic energy and communication energy, while satisfying the communication requirements of each ground terminal (GT) and of the backhaul link between the UAV and the terrestrial base station (BS). The mobility of the UAV and the GTs leads to time-varying channel conditions that make the environment dynamic. We formulate a nonconvex optimization problem for controlling the UAV under practical angle-dependent Rician fading channels between the UAV and the GTs, and between the UAV and the terrestrial BS. Traditional optimization approaches cannot handle the dynamic environment and the high complexity of the problem in real time. We propose to use the Trust Region Policy Optimization (TRPO) method, which can improve the performance of the UAV compared to the Deep Deterministic Policy Gradient (DDPG) method in the dynamic environment considered in this paper.