ATOM/RSS feed for recently modified Dreamweaver pages
The problem
Where I work, there is a certain enormous website that we maintain. Traditionally, content has moved from subject matter experts to the editors and then to the web team. In the spirit of simplifying processes, we’ve recently put in place a system that gives subject matter experts more control over their content.
Initially, as with most CMSs, we expected this to dramatically reduce the amount of repetitive work imposed on the web professionals, e.g., “here’s some text; put it on the web”. While this proved to be true, we quickly discovered that non-web folks don’t always produce valid HTML. Kind of a no-brainer. This problem seems to be the classic trade-off between poor content and poor HTML.
The solution
To combat this, we’ve put in place a workflow that gives our technical team a non-intrusive way of monitoring this web content. We’ve built a script that generates an ATOM feed of recently modified pages on our website. This feed tells us who modified which page, when, and whether or not the page passes W3C markup validation. This way, we can do ‘technical editorial’ on only the pages that really need it.
We run this every 30 minutes via a cron job. The output is redirected to a recently_modified.xml file on the website; that file is the ATOM feed.
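The pass/fail flag in the feed comes from the title of the validator's result page, which the script below reads through is_valid?. As a rough, offline sketch of just that step: the helper name validation_status and the two sample titles are assumptions for illustration, not part of the original script or a guaranteed validator format.

```ruby
# Hypothetical helper: return the text inside the first [...] of a page title,
# or nil when the title has no bracketed status.
def validation_status(title)
  title[/\[([^\]]+)\]/, 1]
end

# Sample titles are assumptions about what the validator's result page returns.
valid_title   = "[Valid] Markup Validation of http://example.com/ - W3C Markup Validator"
invalid_title = "[Invalid] Markup Validation of http://example.com/about.html - W3C Markup Validator"

puts validation_status(valid_title) == "Valid"
puts validation_status(invalid_title) == "Valid"
```

Keeping the string parsing separate from the HTTP request makes it easy to check the logic without hitting the validator on every run.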
recently_modified.rb
require 'rubygems'
require 'xmlsimple'
require 'erb'
require 'hpricot'
require 'open-uri'
@base_path = File.expand_path($0).gsub(/lib\/recently_modified.rb/, '')
@base_url = "http://example.com/"
template = %q{<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Recently modified example.com pages</title>
<link href="http://example.com/lib/recently_modified.xml" rel="self"/>
<updated><%= Time.now.utc.strftime('%Y-%m-%dT%H:%M:%SZ') %></updated>
<author>
<name>Web Team</name>
<email>webteam@example.com</email>
</author>
<id>urn:uuid:60a76c80-d1e9-01d9-b91d-0003939e0af6</id>
<% @files.each do |file| %>
<%
# Make the path relative to the site root, then look up who last published it.
file = file.strip.sub(@base_path, '')
info = get_modified_info(file)
url = @base_url + file
%>
<entry>
<title>/<%= file %></title>
<link href="<%= url %>"/>
<id><%= url %></id>
<updated><%= Time.at(info['date'].to_i).utc.strftime('%Y-%m-%dT%H:%M:%SZ') %></updated>
<summary>Modified by <%= info['name']%> (<%= info['email'] %>)
at <%= Time.at(info['date'].to_i) %>. This file has <%= is_valid?(url) ? "passed" : "failed" %>
W3C markup validation: http://validator.w3.org/check?uri=<%= url %>.</summary>
</entry>
<% end %>
</feed>
}
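# Asks the W3C validator to check the page and returns true when the
# bracketed status in the result page's title is "Valid".
def is_valid?(url)
  doc = Hpricot(open("http://validator.w3.org/check?uri=#{url}"))
  title = (doc/"title").inner_html
  return title[/\[([^\]]*)\]/, 1] == "Valid"
end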
# Reads the _notes/<page>.mno file that sits beside a page and returns the
# last submitter's name, email, and publish date.
def get_modified_info(file)
  # File.dirname returns "." for pages in the site root, so one join covers both cases.
  info_file = File.join(@base_path, File.dirname(file), '_notes', File.basename(file).strip + '.mno')
  data = XmlSimple.xml_in(info_file)
  ret = Hash.new
  data['infoitem'].each do |info|
    ret['name'] = info['value'] if info['key'] == 'ccLastSubmitter'
    ret['email'] = info['value'] if info['key'] == 'ccLastSubmitterEmail'
    ret['date'] = info['value'] if info['key'] == 'ccLastPublishDate'
  end
  return ret
end
# Collect .html files modified in the last 60 minutes, one path per array element.
@files = `find #{@base_path} -name '*.html' -mmin -60`.split("\n")
puts ERB.new(template).result
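To try the metadata lookup without a live site, here is a minimal sketch that parses an inline sample instead of a real .mno file. The XML shape is inferred from the keys get_modified_info reads, the values are made up, and it uses REXML rather than xmlsimple; SAMPLE_MNO, KEYS, and modified_info_from are hypothetical names.

```ruby
require 'rexml/document'

# Assumed shape of a .mno notes file, inferred from the keys the script reads.
# The values are invented for illustration.
SAMPLE_MNO = <<~XML
  <?xml version="1.0" encoding="utf-8" ?>
  <info>
    <infoitem key="ccLastSubmitter" value="Jane Doe" />
    <infoitem key="ccLastSubmitterEmail" value="jane@example.com" />
    <infoitem key="ccLastPublishDate" value="1200000000" />
    <infoitem key="someOtherKey" value="ignored" />
  </info>
XML

# Map the three keys the feed needs to the short names used in the template.
KEYS = {
  'ccLastSubmitter'      => 'name',
  'ccLastSubmitterEmail' => 'email',
  'ccLastPublishDate'    => 'date'
}.freeze

# Hypothetical helper: return a hash with 'name', 'email', and 'date'.
def modified_info_from(xml)
  doc = REXML::Document.new(xml)
  info = {}
  doc.elements.each('info/infoitem') do |item|
    short = KEYS[item.attributes['key']]
    info[short] = item.attributes['value'] if short
  end
  info
end

info = modified_info_from(SAMPLE_MNO)
puts "#{info['name']} (#{info['email']}) at #{Time.at(info['date'].to_i).utc}"
```

Unrecognized keys are skipped, so extra entries in a notes file should not affect the feed.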