Metadata, such as tags, HTML meta tags, etc., is where you describe the content so that machines and automated processes can extract meaning from it.
2. These are all complex formats. If you want to ingest and process them, you already have to build all the hard parts. Getting the metadata out is dead simple compared to, for example, parsing, decoding, and then processing an image.
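
To give a rough sense of how little work the metadata side can be, here is a minimal sketch that pulls HTML meta tags out of a page using only Python's standard library. The names `MetaTagCollector`, `extract_meta`, and `SAMPLE` are made up for illustration, and the sample markup is invented; a real pipeline would likely need more care around duplicate keys and encodings.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects key/content pairs from <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Accept both name="..." and property="..." (e.g. Open Graph) keys.
        key = attr_map.get("name") or attr_map.get("property")
        content = attr_map.get("content")
        if key and content is not None:
            self.meta[key] = content


def extract_meta(html_text):
    """Return a dict of meta tag keys to their content values."""
    collector = MetaTagCollector()
    collector.feed(html_text)
    return collector.meta


# Invented sample document, purely for illustration.
SAMPLE = """<html><head>
<meta name="description" content="A photo of a lighthouse at dusk">
<meta property="og:type" content="article">
</head><body></body></html>"""

if __name__ == "__main__":
    for key, value in extract_meta(SAMPLE).items():
        print(f"{key}: {value}")
```

That is the whole extraction step for this one format; nothing comparable exists for decoding the pixels of an image, which is the point being made above.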