Rethinking the Content Inventory: image aspect ratios

Key Points:

Sometimes inventories of assets, such as images, are just as essential as the core content.
When approaching a content problem, we often don't even know the questions in advance.
Remember to approach each inventory with fresh eyes, rather than just applying standard tools or templates.

Report: Rethinking the Content Inventory

One of the key themes in my approach to content inventory is that there is no one-size-fits-all method or inventory template: content inventories need to be targeted to the content question at hand.

In this article I will describe an example. A client was about to roll out a new template that did not work well with tall images, and they had a large repository of articles containing tall images. They needed an inventory to help tackle this issue.

Question 1: How broad is this issue?

The first question was: how broad is this issue? Answering it required:

Scraping the HTML of the article pages (in this case, the images were not stored in an asset management system, so we had to look for the image tags themselves).
Capturing each image's dimensions (and then calculating its aspect ratio).
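
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code used on the project: it relies only on Python's standard library, the names `ImgCollector`, `image_sources`, `aspect_ratio`, and `is_tall` are hypothetical, and the cutoff for "tall" (height more than 1.5 times the width) is an assumption, since the right threshold depends on the template.

```python
from html.parser import HTMLParser


class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return the image URLs found in one page of HTML."""
    parser = ImgCollector()
    parser.feed(html)
    return parser.sources


def aspect_ratio(width, height):
    """Width divided by height; values below 1 mean taller than wide."""
    return width / height


def is_tall(width, height, max_height_ratio=1.5):
    # Assumed cutoff: an image counts as "tall" when its height
    # exceeds max_height_ratio times its width.
    return height > max_height_ratio * width
```

In practice the width and height would come from opening each image file, for example with PIL's `Image.open(path).size`, which returns a `(width, height)` tuple.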

After doing this, we had the following:

Although more reports had images that were not tall, a large number of reports (2,825) still contained tall images.

Bar chart showing those reports that have tall images and those that do not

Had there been only a few examples of tall images, we would probably have stopped there, but since there were so many we needed to dig deeper.

Question 2: Can we break down this issue to best tackle the problem?

Now that we had established that a lot of articles had this problem, my client asked the next question: how many might merit special treatment by hand, and for which might we get by with a more global solution (such as JavaScript that resizes tall images on the client side)? To answer this question, we looked at the following:

How much traffic are these articles with tall images getting? This was pulled from Google Analytics (using the total page views year to date).
What year were these articles published? This was pulled from the URL, which contained the year of the report in it.
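
A minimal sketch of these two lookups, assuming the year appears as a four-digit path segment in the URL and that the Google Analytics export has already been reduced to a mapping from URL path to page views. The URL pattern, the sample paths, and the names `year_from_url` and `enrich` are all hypothetical; on the actual project this step was done in Zoho Reports.

```python
import re

# Assumed pattern: a four-digit year (2000-2099) as its own path segment.
YEAR_SEGMENT = re.compile(r"/(20\d{2})(?:/|$)")


def year_from_url(url):
    """Return the publication year embedded in the URL, or None."""
    match = YEAR_SEGMENT.search(url)
    return int(match.group(1)) if match else None


def enrich(urls, pageviews):
    """Attach year and year-to-date page views to each article URL."""
    rows = []
    for url in urls:
        rows.append({
            "url": url,
            "year": year_from_url(url),
            # Articles missing from the analytics export get 0 views.
            "pageviews": pageviews.get(url, 0),
        })
    return rows
```

Defaulting missing articles to zero page views is a design choice worth revisiting: it keeps every article in the inventory, but it can also hide a failed match between the two data sources.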

After weaving this information into the scraped information on the reports and the aspect ratios, we generated this report of all the tall images (each vertical bar is a year):

Graph from image inventory listing just tall images, highlighting those that are within the most-read reports

Stacked histogram by year, showing proportion of pageviews

The images in rarely viewed articles could probably be dealt with in a blanket, automated way, so they are grayed out in the chart above. Those in articles that received over 20,000 page views almost certainly warrant treatment by hand.
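
The split shown in the chart can be approximated as below. The 20,000 page view cutoff comes from the analysis above; the lower cutoff for "rarely viewed" is not stated, so `LOW_TRAFFIC` is a placeholder assumption, as are the bucket names and the structure of `rows` (dictionaries with `year` and `pageviews` keys).

```python
from collections import Counter

HIGH_TRAFFIC = 20_000  # from the article: candidates for hand treatment
LOW_TRAFFIC = 1_000    # assumed cutoff for "rarely viewed"


def traffic_bucket(pageviews):
    """Assign an article to a treatment bucket based on its page views."""
    if pageviews > HIGH_TRAFFIC:
        return "hand treatment"
    if pageviews < LOW_TRAFFIC:
        return "automated"
    return "review"


def counts_by_year(rows):
    """Count articles per (year, bucket), the data behind a stacked chart."""
    return Counter(
        (row["year"], traffic_bucket(row["pageviews"])) for row in rows
    )
```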

How this example used principles from the Rethinking the Content Inventory series

Many of the principles from this series were used in this example:

Exploration. We didn't start off fully understanding the problem or what the questions were. We started with a basic question and proceeded to a more nuanced exploration from there as we better understood the problem.
Sources of data. Although at the beginning we started with just scraped data (and the list of article URLs from the CMS), we went on to combine it with Google Analytics data and URL patterns.
Quality. At one point I attempted to take a shortcut in the analysis, but when I generated the multi-color chart above I saw that there were no 2015 reports with significant traffic. This was obviously an error, and slicing the information in a new way helped expose it.
Layers of content. By clicking on the various years in the chart above, I was able to sample which years' articles used a different type of image treatment. We may dig into the data further to automatically extract exactly when this "layer" of content was in place.
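
The kind of check described under Quality can also be automated. A rough sketch, assuming each article is a dictionary with `year` and `pageviews` keys (a hypothetical structure, as is the function name): flag any year that has articles but none above a traffic threshold, which may point to a failed join between data sources rather than a real pattern.

```python
def years_without_high_traffic(rows, threshold=20_000):
    """Return years that have articles but none above the threshold."""
    all_years = {row["year"] for row in rows}
    busy_years = {row["year"] for row in rows if row["pageviews"] > threshold}
    return sorted(all_years - busy_years)
```

A flagged year is only a prompt to look at the raw data; an older year may genuinely have no heavily read articles.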

Note that this entire undertaking was very dynamic, including the charting and reporting: dynamic charting lets us dig around in the data more quickly (and have all charts update automatically when the data is updated).

A quick note on tools

Although tools are not really the point of an inventory, I thought it might be useful to list some of those used in the above analysis:

Zoho Reports: for the reporting (and ability to drill into charts) and much of the analysis including merging data sources, data cleanup, and some data extraction (like the year from the URL)
Excel: to view and verify the raw data from the data sources before uploading to Zoho Reports
Google Sheets, and Google Analytics Spreadsheet Add-on: to pull down more than the export limit from the Google Analytics web interface
The Consolidation Assistant tool of Add-ins.com (a suite of Excel macros): to merge the different GA data pulls from the above
RegExr: for some quick modifications of URLs to get them into a format that was easier to work with
Scrapy (Python): for scraping (standard inventory tools like Screaming Frog did not pull out the image information in a way that was useful for this analysis)
PIL (Python): for image manipulation

Please don't think you need all of these for your project: different inventory questions demand different tools (on other projects I have used almost completely different tools). But seeing the types of tools used in this example hopefully illustrates some of the ways an inventory exploration can happen.

First published 29 September 2015