This page is part of a searchable archive of the Code Style site log: technical implementation notes that shed light on when, why and how the site has evolved since 2000.
Took some time to investigate the problem found on 11th November, where a new RewriteEngine directive was causing an HTTP 403 Forbidden error with the site feedback script, Soupermail. The diagnosis turned out to be in the Apache error log: "...Options FollowSymLinks or SymLinksIfOwnerMatch is off which implies that RewriteRule directive is forbidden: /path/to/script". Adding the SymLinksIfOwnerMatch option to the Options directive in the script directory's .htaccess configuration solved the problem.
Options -Indexes +SymLinksIfOwnerMatch +ExecCGI
Also escaped the dot with a backslash in the regular expression for the ejupiter.com bot, so that it matches a literal dot rather than any character. The F flag on the rewrite rule issues an HTTP 403 error as intended.
RewriteCond %{HTTP_USER_AGENT} ejupiter\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NaverBot
RewriteRule .* - [F,PT]
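The effect of escaping the dot can be checked outside Apache. The following is a minimal Python sketch, not part of the site configuration: it assumes that an unanchored RewriteCond pattern behaves like a regular expression search and that the NC flag corresponds to case-insensitive matching. The function name and the sample user agent strings are made up for illustration.

```python
import re

# Unescaped: "." matches any character, so "ejupiterXcom" would match too.
LOOSE = re.compile(r"ejupiter.com", re.IGNORECASE)
# Escaped: "\." matches only a literal dot, as in the rewrite condition above.
STRICT = re.compile(r"ejupiter\.com", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Mimic the two OR-ed conditions (hypothetical helper).

    The ejupiter.com test ignores case, like [NC]; the NaverBot test is
    case-sensitive because that condition carries no NC flag.
    """
    return bool(STRICT.search(user_agent)) or "NaverBot" in user_agent


if __name__ == "__main__":
    for agent in ("ejupiter.com", "ejupiterXcom", "NaverBot-1.0", "naverbot"):
        print(agent, is_blocked(agent))
```

With the unescaped pattern, a hypothetical agent such as "ejupiterXcom" would also be refused, which is the over-matching the backslash avoids.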
Actions: Ask a question about this post, seek clarification or offer a correction.
Added several new questions to the Java language FAQ section and took the opportunity to regroup the questions there and in the Java servlets FAQ section. Reinstated a getRequestDispatcher() question that was dropped in recent changes.
Thread and Runnable types?
Runnable with inheritance?
Went back to process the Code Style server logs for 2005 through the RSS user agent analysis system and added several new aggregators. Divided the RSS user agent listing into separate pages for Web, browser and email based aggregators, desktop readers, readers for mobile devices and RSS tools.
New agents include Web based services Bommie, RSSfeed and Etamp, RSS tools MyHeadlines and RSS Reader Plugin, and FreeNews in the RSS readers for mobile devices category.
Made some final amendments to the RSS user agent database to de-duplicate some aggregator names and standardise the capitalisation and spacing on others. The final version adds three new categories to the classification: browser based readers, email based readers and readers for mobile devices. Updated the public RSS user agents page with the full listing, which lists dozens of new aggregators, including Internet Explorer 7.0 beta.
Completed the working draft of the RssAgentLogger class and refined the SQL post-processing scripts. Processed all Code Style log files for 2004 and added many new aggregators to the master reference table, with amendments to existing names and URLs. Also refined the HTML output from the RssAgentLogger class to serve as a drop-in replacement for the current RSS user agents list.
Refactored the draft RssAgentLogger class to create separate processAgents and printAgents methods. The first loads the agent identifiers into a temporary database table via an Analyser instance; the second prints HTML formatted details of each aggregator after the data have been enhanced and classified with a series of SQL scripts.
The first SQL script identifies RSS agents from their identifiers and adds an agent name field and other details. The second script matches the agents with known aggregator names and URLs and updates the client table.
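The two-pass approach can be sketched as follows. This is an illustration only, written in Python with an in-memory SQLite database rather than the Java class and SQL scripts described in this entry, whose schema is not shown. Every table, column and function name here is hypothetical, as is the rule of taking the text before the first slash as the agent name, and for brevity the second pass updates the temporary table rather than a separate client table.

```python
import sqlite3


def classify_agents(identifiers):
    """Load raw user agent identifiers, then run two illustrative SQL passes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        -- Hypothetical schema: temporary load table and reference table.
        CREATE TABLE agent_tmp (identifier TEXT, agent_name TEXT, url TEXT);
        CREATE TABLE known_agents (name TEXT, url TEXT);
        INSERT INTO known_agents VALUES ('ExampleReader', 'http://reader.example/');
    """)
    db.executemany(
        "INSERT INTO agent_tmp (identifier) VALUES (?)",
        [(identifier,) for identifier in identifiers],
    )
    # Pass 1: derive an agent name from the identifier, assumed here to be
    # the text before the first "/" (for example "ExampleReader/1.0").
    db.execute("""
        UPDATE agent_tmp
        SET agent_name = substr(identifier, 1, instr(identifier, '/') - 1)
        WHERE instr(identifier, '/') > 0
    """)
    # Pass 2: match derived names against known aggregators and copy the URL.
    db.execute("""
        UPDATE agent_tmp
        SET url = (SELECT k.url FROM known_agents k
                   WHERE k.name = agent_tmp.agent_name)
    """)
    rows = db.execute(
        "SELECT agent_name, url FROM agent_tmp ORDER BY identifier"
    ).fetchall()
    db.close()
    return rows


if __name__ == "__main__":
    for row in classify_agents(["ExampleReader/1.0", "Unknown/2.0", "noslash"]):
        print(row)
```

Keeping the classification in SQL means the matching rules can be refined and re-run against the loaded table without re-parsing the log files.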
Retrospectively loaded all aggregator data from the RSS user agents listing and then all user agent data from the Metacentric service logs for the past 4 months to test.
Brought the working draft Analyser package up to coding standards and discarded the static parseClients method.
Discovered the recent addition of Apache rewrite rules to the root level .htaccess configuration had been causing HTTP 403, access forbidden, errors on the Soupermail feedback script. Not immediately obvious why this would cause a conflict, so simply removed the less critical rewrite rules for now.
A number of indexing spiders have fallen into the trap set on 13th October. The path /badbot is prohibited by the robots.txt policy but Googlebot 2.1 fell in, with a number of other user agents:
ejupiter.com
Googlebot/2.1 (+http://www.google.com/bot.html)
OmniExplorer_Bot/4.32 (+http://www.omni-explorer.com) WorldIndexer
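For comparison, this is roughly how a well-behaved crawler would consult the policy before requesting the trap path. It is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content is an assumption, since the log only states that /badbot is prohibited, and the function name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: the actual robots.txt file is not shown in this log.
ROBOTS_TXT = """\
User-agent: *
Disallow: /badbot
"""


def may_fetch(user_agent: str, path: str) -> bool:
    """Return True if the assumed policy lets user_agent request path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(may_fetch("Googlebot/2.1", "/badbot"))
    print(may_fetch("Googlebot/2.1", "/index.html"))
```

Any agent that requests /badbot has either skipped this check or ignored its result, which is what makes the path useful as a trap.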
MKSearch is a free, open source search engine that indexes structured metadata in Web documents, not free text in the document body. The data acquisition system conforms to the Dublin Core metadata in HTML recommendations; supports other application profiles, such as the UK e-Government Metadata Standard; and indexes native RDF formats, including RSS 1.0.
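To illustrate what indexing structured metadata rather than body text means, here is a minimal Python sketch that collects Dublin Core style statements from meta and link elements. It is not MKSearch code, which is written in Java. The class name, function name and sample page are hypothetical, and the sketch only covers the common convention of a schema.PREFIX link declaration plus PREFIX.term meta names.

```python
from html.parser import HTMLParser

# Hypothetical sample page for illustration.
SAMPLE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Example page">
<meta name="keywords" content="ignored">
</head><body><p>Body text is not indexed.</p></body></html>"""


class DublinCoreExtractor(HTMLParser):
    """Collect schema declarations and prefixed meta statements."""

    def __init__(self):
        super().__init__()
        self.schemas = {}   # prefix -> namespace URI from <link rel="schema.X">
        self.metadata = []  # (prefix, term, content) from <meta name="X.term">

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel = attr.get("rel") or ""
        name = attr.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.schemas[rel[len("schema."):]] = attr.get("href")
        elif tag == "meta" and "." in name:
            prefix, term = name.split(".", 1)
            self.metadata.append((prefix, term, attr.get("content")))


def extract(html):
    """Return declared schemas and the statements that use a declared prefix."""
    parser = DublinCoreExtractor()
    parser.feed(html)
    parser.close()
    statements = [m for m in parser.metadata if m[0] in parser.schemas]
    return parser.schemas, statements


if __name__ == "__main__":
    print(extract(SAMPLE))
```

The plain keywords element and the paragraph in the body are skipped; only statements whose prefix has a declared schema are kept.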
The MKSearch system has five major components:
meta and link elements
The two main elements of the MKSearch system can be used independently. The data acquisition system can be used to gather large quantities of metadata from the Web and store it as RDF. The query system can be used to provide a typical search engine-style interface to existing RDF content.
The MKSearch beta 1 distribution includes sample configurations that crawl a Web site and create:
This distribution also includes a demonstration of the MKSearch query interface, in the form of a Web Application Archive (WAR) that can be deployed directly to an existing servlet container. The sample search content is from an index of the MKSearch project Web site on 2 November 2005. See the site documentation below:
MKSearch is written in the Java programming language and is designed to run on any platform that supports a Java environment equivalent to the Sun Java 2 language specification.
The system has specifically been designed, developed and tested to run on GNU/Linux systems using the GNU Compiler for Java (GCJ) and Apache Tomcat 5 servlet container, as available on Fedora Core 4. This provision means that MKSearch can be built and run on software systems that are entirely open source and free from proprietary licensing.
The system has been tested extensively using the Sun Java SDK 1.5 on Microsoft Windows 2000. JUnit test suites for the MKSearch code base cover 99% of all code branches.