Egocentric Audio-Visual Understanding using Approaches for Self-Supervision and Counting