J2EE integration module

Introduction
Based on the same architecture as the ASP.NET integration module and the PHP integration module, the J2EE module allows full integration of ClickTale with dynamic, session-based, behind-a-login and POST-processed pages.

The module consists of a filter, a servlet and a test page, and is added to the website like any other library.

Technical notes
If you are using Struts version 2 or above, you may need to edit struts.xml and add a line that excludes the ClickTale servlet from being processed by Struts (in Struts 2 this is typically done with the struts.action.excludePattern constant).

For older Struts versions:
 * Struts 1.3 - no changes to the Struts configuration are required.

Customers with JDK 1.5 should use the J2EE integration module version compiled with JDK 1.4 found below (tested and works).

This integration module is for J2EE sites using dynamic content which changes for each user independently. Shopping carts, for example, are often such pages. The module stores the exact version of the page that the user saw and allows our fetching bot to collect that version, so that it is available to you in the playback of visitors' visits and in other reports. The J2EE Integration Module also adds support for cases that were not previously covered, such as POST processing pages.

The module allows better integration of a J2EE site with ClickTale by enabling accurate caching of the HTML that is sent to the visitor in ClickTale recorded pageviews.

Installation Guide
 1. Remove the existing ClickTale code and/or disable other integration methods.
 2. Download the distribution of the latest version of the ClickTale J2EE Integration module that best suits your environment:
    * for JVM 1.6 and above
    * for JVM 1.6 and above with dependencies: all referenced libraries packaged into a single JAR for quick deployment
    * for older JVMs (1.4 and 1.5)
 3. Go to the folder C:\Program Files\Apache Software Foundation\Tomcat 6.0\webapps\ \WEB-INF\lib
 4. Place the J2EE IM JAR file in \WEB-INF\lib. The ClickTale IM JAR file naming convention is as follows:
    * For Java 1.4: clicktale-filter-java1.4-<ClickTale IM version number>.jar
    * For Java 1.6: clicktale-filter-<ClickTale IM version number>.jar (or clicktale-filter-with-deps-<ClickTale IM version number>.jar)
 5. Decompress the clicktale-sample.war file.
 6. Copy the following 2 configuration files from the \WEB-INF folder of the archive into your application's WEB-INF folder:
    * ClickTale.properties
    * ClickTaleScripts.xml
 7. Optional: install the ClickTale sample application by placing the clicktale-sample WAR archive (the clicktale-sample.war file included in the distribution package) into the Tomcat webapps folder. After installing the sample application you can navigate your browser to http://yoursite/clicktale-sample/clicktale/index.html to view some helpful information regarding the caching and other configurations. This page can also be used as a quick initial test for validating that the integration module operates properly.
 8. Validate that the ClickTale J2EE IM dependencies are present in the same folder and verify their version numbers as detailed below:
    (*) Optional: required when caching with Terracotta’s ehcache library
    (**) Optional: required if logging with the log4j library

    The official releases of the above mentioned libraries are available at the following locations:
    * For slf4j-api and slf4j-log4j12 visit http://www.slf4j.org/dist/
    * For log4j visit http://archive.apache.org/dist/logging/log4j/
    * For commons-codec go to http://archive.apache.org/dist/commons/codec/
    * For ehcache (first log in, then access) http://terracotta.org/downloads/open-source/destination?name=ehcache-core-2.4.4-distribution.tar.gz&bucket=tcdistributions&file=ehcache-core-2.2.0-distribution.tar.gz (replace 2.2.0 in the URL with the version number you need)
 9. Modify your WEB-INF\web.xml file to include the filter and servlet declarations as in the web.xml file that is included in the archive. Below is an example of the file containing both the ClickTale Module settings and the Cache Provider settings:

    Please note: If there are multiple filters declared in web.xml, it is recommended to put the ClickTale filter first in the order.

Replacement Rules Filter
The J2EE module includes the ability to modify or remove certain code from the pages saved to the cache. This prevents a situation in which sensitive data (PII or business-related) is sent over the internet to the ClickTale servers.
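
As a rough illustration of what such a replacement rule does, the sketch below removes marked blocks from an HTML string with a regular expression. The marker syntax (`<!-- ClickTaleExcludeBlock -->` and `<!-- EndClickTaleExcludeBlock -->`) and the class and method names are assumptions made for this example only; the actual rules and markers are whatever your FilterRules.xml defines.

```java
import java.util.regex.Pattern;

class ExcludeBlockSketch {
    // ASSUMPTION: the marker comments look like
    //   <!-- ClickTaleExcludeBlock --> ... <!-- EndClickTaleExcludeBlock -->
    // The real markers and rules are defined by the module's FilterRules.xml.
    private static final Pattern EXCLUDE_BLOCK = Pattern.compile(
            "<!--\\s*ClickTaleExcludeBlock\\s*-->.*?<!--\\s*EndClickTaleExcludeBlock\\s*-->",
            Pattern.DOTALL); // DOTALL lets a marked block span several lines

    /** Returns the HTML with every marked block (markers included) removed. */
    static String stripExcludedBlocks(String html) {
        return EXCLUDE_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Welcome back</p>"
                + "<!-- ClickTaleExcludeBlock --><p>Card ending 1234</p><!-- EndClickTaleExcludeBlock -->"
                + "<p>Your cart</p>";
        // prints <p>Welcome back</p><p>Your cart</p>
        System.out.println(stripExcludedBlocks(html));
    }
}
```

The reluctant quantifier (`.*?`) keeps two separate marked blocks from being merged into one large removal.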


Configuring the J2EE Module
 1. Generate the ClickTale tracking code you want to use in your pages (or locate your already generated code).
 2. Locate the WEB-INF/ClickTaleScripts.xml file that comes in the clicktale-sample.war file and open it in a text editor. Paste your ClickTale Tracking Code from the previous step into the appropriate CDATA fields in the ClickTaleScripts.xml file.
 3. Add the cache fetching redirection code, according to your tracking code type (Atlas or Balkan).

    PLEASE NOTE: This is how the code looks in a text editor. In a browser, "&lt;" will be converted to "<" and "&gt;" will become ">".

    PLEASE ALSO NOTE: This code sample relates to HTTP and HTTPS page tracking. Your account may or may not have the option to record HTTPS pages. Please check, and consult your account manager if you need assistance.

Optionally, disable or enable recording of your own visits. Apply this to everyone who works with you on the site, otherwise their visits will consume recording quota; alternatively, enable recording for yourself while testing to make sure everything works correctly.

Integration Module settings
The ClickTale J2EE Integration Module settings can be configured using the ClickTale.properties file located in the WEB-INF folder. The list below documents all the configuration options, including their default values. A default value is used when the corresponding option is not specified in the ClickTale.properties file.

 * ScriptsFile - relative path (local URL) to the XML file that contains the JavaScript to be injected into the served pages. Default value: ClickTaleScripts.xml
 * DoNotProcessCookieName - name of ClickTale's web recorder ID cookie. Default value: WRUID
 * DoNotProcessCookieValue - value of ClickTale's web recorder ID cookie that specifies that recording should not be performed. Default value: 0
 * Disable - provides an easy way to disable the ClickTale filter and the caching without the need to restart the application. Default value: false
 * AllowedAddresses - limits the IP addresses to which the cached pages are served.
 * IPAddressHeaderFieldName - changes the default behavior of the allowed addresses check when a reverse proxy is present. Normally, the IP address of the source computer is checked against the list, but behind a reverse proxy the X-Forwarded-For HTTP header is usually used instead. Default value: X-Forwarded-For
 * DeleteAfterPull - determines whether the cached page is removed from the cache immediately when the ClickTale fetcher bot requests it. Default value: true
 * IgnoreHttpStatusCode - by default, if the HTTP request processing chain sets the HTTP status code to anything other than 200, the filter neither caches the page nor performs the injection. If this setting is un-commented, the filter performs the injection regardless of the HTTP status code set by the HTTP request processing chain. Default value: true
 * CacheProvider - to use the ehcache cache, set this option to com.clicktale.cache.impl.EhCacheProviderImpl; to use the hash map cache, set it to com.clicktale.cache.impl.HashMapProviderImpl. Default value: none
 * EhCacheName - name of the cache to use in ehcache. Recommended value: clickTaleCache. Relevant only for the ehcache-based cache. Default value: none
 * MaxCachedPages - the maximum number of cached pages. Relevant only for the hash map-based cache. Default value: 100
 * FilterRulesFilePath - Optional. The path to the FilterRules.xml file, containing filter rules for removing unwanted data (PII, for example) from the cached content. Default value: /WEB-INF/FilterRules.xml. Two default rules are included in the filter, allowing you to remove any data from the cached HTML that is surrounded by "ClickTaleExcludeBlock" comment tags.
 * RunFilterRulesBeforeCache - Optional. Run the above filter rules before the page is cached. Default value: false
 * CacheBeforeInjection - Optional. Cache the page before injecting the code. Default value: true. NOTE: using CacheBeforeInjection = false in conjunction with RunFilterRulesBeforeCache = true carries a performance penalty, because the regular expressions are run twice.
 * UnauthorizedCacheAccessRedirectEnable - unauthorized ClickTale cache servlet requests may be configured to cause a redirect to a pre-configured URL. Default value: false
 * UnauthorizedCacheAccessRedirectUrl - unauthorized ClickTale cache servlet requests are redirected to this URL when UnauthorizedCacheAccessRedirectEnable is set to true. Default value: http://www.clicktale.com/
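
The fallback to default values described above can be pictured with plain `java.util.Properties`. This is only a sketch of the general idea, not the module's own loading code: the class and method names are hypothetical, only a subset of the documented defaults is listed, and the file contents are passed in as a string to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class SettingsDefaultsSketch {
    /** A subset of the default values documented in the list above. */
    static Properties documentedDefaults() {
        Properties d = new Properties();
        d.setProperty("ScriptsFile", "ClickTaleScripts.xml");
        d.setProperty("DoNotProcessCookieName", "WRUID");
        d.setProperty("DoNotProcessCookieValue", "0");
        d.setProperty("Disable", "false");
        d.setProperty("IPAddressHeaderFieldName", "X-Forwarded-For");
        d.setProperty("DeleteAfterPull", "true");
        d.setProperty("MaxCachedPages", "100");
        return d;
    }

    /** Parses properties-file text; keys missing from it fall back to the defaults. */
    static Properties load(String fileContents) {
        Properties settings = new Properties(documentedDefaults());
        try {
            settings.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return settings;
    }

    public static void main(String[] args) {
        // A line starting with '#' is a comment, so MaxCachedPages stays at its default.
        Properties s = load("DeleteAfterPull=false\n#MaxCachedPages=500\n");
        System.out.println(s.getProperty("DeleteAfterPull")); // prints false
        System.out.println(s.getProperty("MaxCachedPages"));  // prints 100
    }
}
```

This is also why the guide speaks of "un-commenting" a setting: a commented-out line is simply ignored and the default applies.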

Enabling Logging
After the files are copied you can navigate your browser to http://yoursite/clicktale/index.html to view some helpful information regarding the caching and other configurations. Changing info to debug in the log4j.properties file will output debug information to the console. This output can instead be sent directly to a log file by editing the log4j.properties file accordingly; note that the rootLogger value needs to be changed from info to debug.

Below is an example of defining a filter for log4j configuration, such that only specific entries are outputted:

Depending on how the root folder is defined for your server, you might need to set the path of the log file to ../logs/clicktale.log or use another (absolute) path to enable logging correctly.

Please ensure that production servers are set back to "info" when live.

"Invalid <url-pattern>" error message
You may receive this error message with older versions of Java (1.4) or with specific web servers (Tomcat 5.5, etc.). The error appears in the log files of your web server, and pages on the site related to the modified web.xml file will not load. To resolve the issue, edit the web.xml file and modify the value of the <url-pattern> element for the ClickTale module so that it is valid for your environment. We had good results replacing "*" with "/*".

Abnormal Status Code (tracking code partially injected or not injected at all)
If the module isn't injecting the tracking code into all or some of the pages, this might be the result of incorrect HTTP status codes in your application. This can be verified in the server logs, where you might see lines such as: com.clicktale.filter.ClickTaleScriptInjectFilter : abnormal status code XXX, where XXX is any status code apart from 200. To overcome this, un-comment the IgnoreHttpStatusCode setting in the module's configuration file.

Configuring the caching provider
The module caches the content of the pages so it can later provide the content to ClickTale servers for processing. This caching requires some storage, so different caching providers are supported. The ClickTale J2EE IM is capable of working with an arbitrary caching system. The ClickTale J2EE IM package comes with two cache provider implementations:

 * The HashMapProvider cache provider does not have any external dependencies and uses RAM to store the cached pages. It has the minimal functionality necessary to operate the cache. This is the default provider set.
 * The EhCacheProvider enables the ClickTale J2EE IM to store the cached pages using the Terracotta ehcache library, which is commonly used for caching in web applications. This library has many configuration options and can work with RAM and HDD separately and simultaneously. Also, ehcache can provide a centralized cache for a web server farm.

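You can enable the EhCache provider by editing the ClickTale.properties file: comment out the CacheProvider line that selects com.clicktale.cache.impl.HashMapProviderImpl and un-comment the CacheProvider line that selects com.clicktale.cache.impl.EhCacheProviderImpl.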

Below are two examples of caching providers which could be integrated with the module:
 1. EhCache - The following ehcache.xml files are the recommended base for the configuration of a de-centralized cache: the first example is a local cache and the second is a distributed one.
 2. Example configuration for a scenario with a Terracotta central cache server:

Caching provider settings

 * DeleteAfterPull = (true / false)
 * If true, the cached page is deleted after ClickTale fetches it from your site
 * MaxCachedPages = (integer)
 * The maximum number of cached pages. If the number of cached pages grows larger than that, stale entries are removed

Use With "Expires" Header And Enable Reuse Of Cached Pages
Some websites use "cache-control" and "expires" headers to cache pages on the client. This improves performance but may be problematic when used with our module. Pages are removed from the module's cache right after they are accessed (for security and performance reasons). So, if a visitor views a page more than once without refreshing the content from the server (usually a result of using the back button), the cache will be called several times with the same token, and every request beyond the first one will be a cache miss. To overcome this problem, set DeleteAfterPull=false in the configuration. This disables the removal of cached pages when the ClickTale cache is called. Cached data will then be removed only after MaxCachedPages new pages are cached, so you might want to increase that setting as well to allow sufficient traffic between the first pageview and the next (duplicate) pageview.
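
The interaction between DeleteAfterPull and MaxCachedPages can be sketched with a small bounded map. This is not the module's HashMapProvider; it is a hypothetical stand-in (class and method names are invented for the example) that assumes the oldest page is the one evicted once the limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PageCacheSketch {
    private final boolean deleteAfterPull;
    private final Map<String, String> pages;

    PageCacheSketch(final int maxCachedPages, boolean deleteAfterPull) {
        this.deleteAfterPull = deleteAfterPull;
        // Insertion-ordered map that drops its oldest entry once the limit is exceeded.
        this.pages = new LinkedHashMap<String, String>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxCachedPages;
            }
        };
    }

    /** Stores the HTML served to the visitor under its token. */
    synchronized void store(String token, String html) {
        pages.put(token, html);
    }

    /** Returns the cached HTML, or null on a cache miss. */
    synchronized String pull(String token) {
        return deleteAfterPull ? pages.remove(token) : pages.get(token);
    }

    public static void main(String[] args) {
        PageCacheSketch strict = new PageCacheSketch(100, true);
        strict.store("t1", "<html>cart</html>");
        System.out.println(strict.pull("t1")); // prints <html>cart</html>
        System.out.println(strict.pull("t1")); // prints null (second pull is a miss)

        PageCacheSketch reusable = new PageCacheSketch(100, false);
        reusable.store("t1", "<html>cart</html>");
        reusable.pull("t1");
        System.out.println(reusable.pull("t1")); // prints <html>cart</html>
    }
}
```

With deletion disabled, a page survives repeated pulls until enough newer pages push it out, which is why the two settings are usually tuned together.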

Q&A

 * Q: Are cached pages protected from access by third parties?
 * A: Yes, several layers of protection are in place. Only certain IPs are allowed to request the cached pages (IPs of ClickTale servers) and only processes which already have access to the page have the secret token required to request the cached content.
 * You can redirect any unauthorized access attempt to the cache (bots, 3rd-party tools) to a different URL, by using the UnauthorizedCacheAccessRedirect settings in the ClickTale.properties file.


 * Q: Is it possible to inject the script in other places rather than after/before body tags?
 * A: Yes. By default the top script is injected after the opening body tag and the bottom script before the closing body tag, but this can be changed by adding an InsertAfter attribute to the script[name="Top"] element or an InsertBefore attribute to the script[name="Bottom"] element. Both attributes are regular expressions.


 * Q: Some of my pages already have the ClickTale script. I do not want the script to appear twice.
 * A: You should either remove the script from those pages and let the module handle the insertion or you should use DoNotReplaceCondition. See the code step 2 for an example. The default script in Step 2 is already configured to prevent double inclusion.


 * Q: After installing the module, I tried to watch a recording but I got the following notice instead: "Request from an unauthorized IP." What should I do?
 * A: This could be a misconfiguration, a change in our IP addresses or a hacking attempt. Please contact us so we can investigate this further.


 * Q: I have installed the module but I don't see the tracking code in the source of the page. What is wrong?
 * A: The module will only inject the code for visitors who are classified as "to-record" (have WRUID cookie with non-zero value) or for those who are not classified (no WRUID cookie). Visitors who are classified as "not-to-record" will get no code.


 * Q: I'm using a proxy server for my website - what steps should I take to record properly?
 * A: Move the IP restriction rule to the proxy and set the IP in ClickTale.properties (located in the WEB-INF directory) to the IP of the proxy.


 * Q: Can I filter out specific pages from recording?
 * A: You may use the filter-mapping/url-pattern entry in the web.xml to set filters with some wildcard options.


 * Q: Can I make a programmatic decision whether a page should be cached and recorded?
 * A: Yes - since module version 1.5. You should set the request attribute 'clickTaleDoNotProcess' to 'true' on pages you do not want to be processed.
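
For the last answer above, a minimal sketch of such a programmatic decision follows. To stay free of servlet dependencies it takes the attribute setter as a function; in a servlet or filter you would typically pass `request::setAttribute`. The path rule, the class and method names, and the use of the string "true" as the attribute value are assumptions made for illustration; only the attribute name comes from this guide.

```java
import java.util.function.BiConsumer;

class DoNotProcessSketch {
    // Attribute name as given in the answer above.
    static final String ATTRIBUTE = "clickTaleDoNotProcess";

    /** Hypothetical rule: never cache or record admin or payment pages. */
    static boolean shouldSkip(String requestPath) {
        return requestPath.startsWith("/admin/") || requestPath.endsWith("/payment");
    }

    /**
     * Marks the request as not-to-be-processed when the rule matches.
     * ASSUMPTION: the attribute value is the string "true".
     */
    static void markIfNeeded(String requestPath, BiConsumer<String, Object> setAttribute) {
        if (shouldSkip(requestPath)) {
            setAttribute.accept(ATTRIBUTE, "true");
        }
    }

    public static void main(String[] args) {
        java.util.Map<String, Object> attributes = new java.util.HashMap<>();
        markIfNeeded("/admin/users", attributes::put);
        System.out.println(attributes); // prints {clickTaleDoNotProcess=true}
    }
}
```

Inside a servlet the call would look like `DoNotProcessSketch.markIfNeeded(request.getRequestURI(), request::setAttribute);`.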

Using a proxy server
If you have a proxy, requests from our servers for the cached pages will appear to come from the address of the proxy, which is not allowed by default. To fix this, set the AllowedAddresses value in the module's ClickTale.properties file to the IP address of the proxy, and set a rule on your proxy that allows only our servers' IP range to access the cache. This is best done by using the wildcard hostname *.clicktale.net in that rule.

If you need to use a static IP range (NOT recommended), our servers' IP ranges are 5.153.32.64/29 and 50.97.162.64/26 (that is, 5.153.32.64 - 5.153.32.71 and 50.97.162.64 - 50.97.162.127).
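
To make the CIDR notation concrete, the sketch below checks whether an IPv4 address falls inside the two ranges listed above. It only illustrates the arithmetic behind the /29 and /26 suffixes; the class and method names are hypothetical, and it says nothing about how the module or your proxy evaluates its own allow list.

```java
class CidrSketch {
    // The two ranges quoted in this section.
    static final String[] DOCUMENTED_RANGES = { "5.153.32.64/29", "50.97.162.64/26" };

    /** Converts a dotted IPv4 address to its 32-bit value held in a long. */
    static long toLong(String ipv4) {
        String[] parts = ipv4.trim().split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Octet out of range in: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True if the address shares the network prefix of the CIDR block. */
    static boolean inRange(String ipv4, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // Keep the top 'prefix' bits; a /0 block matches every address.
        long mask = prefix == 0 ? 0L : (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(ipv4) & mask) == (toLong(parts[0]) & mask);
    }

    static boolean inDocumentedRanges(String ipv4) {
        for (String range : DOCUMENTED_RANGES) {
            if (inRange(ipv4, range)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(inDocumentedRanges("5.153.32.71"));   // prints true
        System.out.println(inDocumentedRanges("5.153.32.72"));   // prints false
        System.out.println(inDocumentedRanges("50.97.162.127")); // prints true
    }
}
```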

Another way to address this issue is to leverage the proxy's feature of forwarding the original IP. The proxy will normally keep the original IP address of the client in an HTTP field (most usually X-Forwarded-For). You can configure the ClickTale J2EE IM to check the AllowedAddresses setting against that field instead of against the source IP address (which will always be the proxy IP address). So, to configure the exact name of the HTTP field that determines the original IP you should add the following line to the ClickTale.properties file:

IPAddressHeaderFieldName=X-Forwarded-For

NOTE: please replace the word X-Forwarded-For with the name of the IP address header field set by the proxy.
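
As a sketch of what checking against the forwarded address involves: an X-Forwarded-For value is conventionally a comma-separated list in which the first entry is the original client and later entries are intermediate proxies. The helper below extracts that first entry and falls back to the connection's source address when the header is absent. How the module itself parses the header is not documented here, so treat this as an assumption; the class and method names are hypothetical.

```java
class ForwardedForSketch {
    /**
     * Returns the first address listed in the forwarded header, or the
     * connection's source address when the header is missing or empty.
     */
    static String originalClientIp(String headerValue, String remoteAddr) {
        if (headerValue == null) {
            return remoteAddr;
        }
        int comma = headerValue.indexOf(',');
        String first = (comma >= 0 ? headerValue.substring(0, comma) : headerValue).trim();
        return first.isEmpty() ? remoteAddr : first;
    }

    public static void main(String[] args) {
        // Client 5.153.32.65 reached the site through two proxies.
        System.out.println(originalClientIp("5.153.32.65, 10.0.0.1, 10.0.0.2", "10.0.0.2")); // prints 5.153.32.65
        // No header: the check falls back to the source address.
        System.out.println(originalClientIp(null, "192.0.2.10")); // prints 192.0.2.10
    }
}
```

Because clients can send this header themselves, it is only trustworthy when the proxy sets or overwrites it and direct access to the application server is blocked.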

Reference
Integration module - Good read, for a general understanding of our integration modules.