On Hierarchical Encoding and Reasoning in Deep Transformer-based Generative Models