ATOM/RSS feed for recently modified Dreamweaver pages

The problem

Where I work, there is a certain enormous website that we maintain. Traditionally, content has moved from subject matter experts to the editors and then to the web team. In the spirit of simplifying processes, we’ve recently put in place a system that allows subject matter experts to be in more control of their content.

Initially, as with most CMSs, we expected this to dramatically reduce the amount of repetitive work imposed on the web professionals, e.g., “here’s some text; put it on the web”. While this proved to be true, we quickly discovered that non-web folks don’t always produce valid HTML. Kind of a no-brainer. This problem seems to be the classic trade-off between poor content and poor HTML.

The solution

To combat this, we’ve put in place a workflow that gives our technical team a non-intrusive way of monitoring this web content. We’ve built a script that generates an ATOM feed of recently modified pages on our websites. This feed tells us who modified which page, when, and whether or not the page passes W3C validation. This way, we can do ‘technical editorial’ on only the pages that really need it.
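The pass/fail part of the feed comes down to reading the bracketed verdict at the start of the validator's result-page title. Here is a minimal sketch of just that step, with no network access. The helper names verdict_from_title and valid_title? are made up for illustration, and the sample titles are assumptions about what the validator returns, not captured output.

```ruby
# Pull the bracketed verdict out of a validator result-page title.
# Returns nil when the title has no [...] part (for example, an error page).
def verdict_from_title(title)
  title[/\[(.*?)\]/m, 1]
end

# A page counts as passing only when the verdict is exactly "Valid".
def valid_title?(title)
  verdict_from_title(title) == 'Valid'
end

# Sample titles are illustrative only.
puts valid_title?('[Valid] Markup Validation of http://example.com/ - W3C Markup Validator')
puts valid_title?('[Invalid] Markup Validation of http://example.com/a.html - W3C Markup Validator')
puts valid_title?('Service unavailable')
```

Treating anything other than an explicit "Valid" as a failure means a validator outage shows up in the feed as a failed page rather than a false pass.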

We run this every 30 minutes via a cron job. The output is routed to a recently_modified.xml file in the website (this is the ATOM feed).
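If the cron job redirects the script's output straight into the live file (an assumption; the exact crontab line isn't shown here), feed readers could fetch a half-written file while the validator calls are still running. One way around that, sketched below, is to write to a temporary file and rename it into place. The helper name publish_feed is hypothetical.

```ruby
require 'tmpdir' # only needed for the demo at the bottom

# Hypothetical helper: write the feed beside its final location, then rename
# it into place. On POSIX filesystems a rename within one directory swaps the
# file in a single step, so readers get either the old feed or the new one.
def publish_feed(xml, target)
  tmp = "#{target}.tmp.#{Process.pid}"
  File.open(tmp, 'w') { |f| f.write(xml) }
  File.rename(tmp, target)
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  target = File.join(dir, 'recently_modified.xml')
  publish_feed('<feed>first run</feed>', target)
  publish_feed('<feed>second run</feed>', target)
  puts File.read(target)
end
```

In the real script this would replace the final puts, with the rendered template passed in as xml.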

recently_modified.rb

Download recently_modified.rb

require 'rubygems'
require 'xmlsimple'
require 'erb'
require 'hpricot'
require 'open-uri'

# Site root: this script lives in <site root>/lib/, so strip that suffix from its own path.
@base_path = File.expand_path($0).sub(/lib\/recently_modified\.rb\z/, '')
@base_url = "http://example.com/"

template = %q{<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
	<title>Recently modified example.com pages</title>
	<link href="http://example.com/lib/recently_modified.xml" rel="self"/>
	<updated><%= Time.now.utc.strftime('%Y-%m-%dT%H:%M:%SZ') %></updated>
	<author>
		<name>Web Team</name>
		<email>webteam@example.com</email>
	</author>
	<id>urn:uuid:60a76c80-d1e9-01d9-b91d-0003939e0af6</id>
	<% @files.each do |file| %>
	<%
	  # Turn the absolute path from find into a site-relative one
	  file = file.strip.sub(@base_path, '')
	  info = get_modified_info(file)
	  url = @base_url + file
	%>
	<entry>
		<title>/<%= file %></title>
		<id><%= url %></id>
		<link href="<%= url %>"/>
		<updated><%= Time.at(info['date'].to_i).utc.strftime('%Y-%m-%dT%H:%M:%SZ') %></updated>
		<summary>Modified by <%= info['name'] %> (<%= info['email'] %>)
		at <%= Time.at(info['date'].to_i) %>. This file has <%= is_valid?(url) ? "passed" : "failed" %>
		W3C markup validation: http://validator.w3.org/check?uri=<%= url %>.</summary>
	</entry>
	<% end %>
</feed>
}

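# Ask the W3C validator to check the page; the result page's title begins
# with a bracketed verdict, and only "[Valid]" counts as a pass.
def is_valid?(url)
  # URI.open needs Ruby 2.5+; on older Rubies open-uri patches plain open instead
  doc = Hpricot(URI.open("http://validator.w3.org/check?uri=#{ERB::Util.url_encode(url)}"))
  (doc/"title").inner_html[/\[(.*?)\]/m, 1] == "Valid"
end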

# Read the Dreamweaver design note stored beside the page
# (<dir>/_notes/<page>.mno) and pull out who last submitted it and when.
def get_modified_info(file)
  info_file = File.join(@base_path, File.dirname(file), '_notes',
                        File.basename(file).strip + '.mno')

  data = XmlSimple.xml_in(info_file)
  ret = {}

  data['infoitem'].each do |info|
    case info['key']
    when 'ccLastSubmitter'      then ret['name']  = info['value']
    when 'ccLastSubmitterEmail' then ret['email'] = info['value']
    when 'ccLastPublishDate'    then ret['date']  = info['value']
    end
  end

  ret
end

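# Collect the .html files changed in the last 60 minutes, one path per array element
@files = `find #{@base_path} -type f -name '*.html' -mmin -60`.split("\n")
puts ERB.new(template).result(binding)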
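The least obvious part of the script is how a page path maps to its Dreamweaver design note: the note sits in a _notes folder beside the page, with .mno appended to the page's file name. Here is that mapping on its own, as a rough sketch. The helper name note_path_for and the /var/www/ root are made up for the example.

```ruby
# Hypothetical helper: map a site-relative page path to the path of its
# design note, i.e. <page dir>/_notes/<page name>.mno under the site root.
def note_path_for(base_path, page)
  parts = [base_path]
  dir = File.dirname(page)
  parts << dir unless dir == '.' # pages in the site root have no sub-directory
  parts << '_notes' << File.basename(page) + '.mno'
  File.join(*parts)
end

# /var/www/ is an illustrative site root.
puts note_path_for('/var/www/', 'about/team.html')
puts note_path_for('/var/www/', 'index.html')
```

File.join collapses the doubled slash between the root and the rest, so it doesn't matter whether the base path ends in a slash.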
