Insert pre-defined template text into a text field using jQuery

I am currently developing a web application on top of a NoSQL database that will query hundreds of thousands of structured (XML) files. It is going to be an analytical tool that returns search results based on XPaths. One assumption is that non-technical teams may use it, and they are not necessarily comfortable with anything XML. To help them out, I hid all the “nastiness” behind HTML/jQuery.

For example, let’s have this as HTML:

jQuery (it needs to run on document load):

Some (very) simple CSS:

You can find this example on JSFiddle: http://jsfiddle.net/LmsB6/

Efficient XPath Search using MarkLogic

One of the problems of analysing structured (XML-like) data is the trade-off between efficiency and accuracy: do you want to be 100% accurate, or fast? This is where NoSQL databases can help, if used wisely with indexes.

I was given a problem where an application needed to be developed that would crawl through hundreds of thousands of XML files and bring back accurate XML results. This is a challenge in its own right, and the first questions that would pop into anyone’s mind are these:

  • what technology can I use?
  • how big is the data set, and how many files is it made of?
  • how complex are the XPaths?
  • how efficiently can I discard as much unnecessary data as possible?
  • are false positives an issue?

If you have asked yourself some of these questions, the code below may help you out.

First, some theory and algorithmic ideas, along with how this can be implemented on MarkLogic. Let’s create a hypothetical problem: we have hundreds of thousands of files that are never bigger than 2 MB in size. All of our files are well-formed XML. We may or may not know about existing schemas for these structured files, along with their namespaces. Let us also assume that we classify documents at a high level using collections. As for the underlying technology, we assume all our data is in the same database, with some of the built-in indexes enabled.

With these assumptions in place, we can start thinking about the best way to implement this. As is usually the case, the best search algorithms try to remove the biggest chunk of data first. This makes total sense: if you have classified your data well using collections, you can reuse MarkLogic’s collection index and so remove the data you are not interested in quickly and efficiently.

It is also worth noting that we do not expand documents (open the XML files to run an XPath) until the last possible moment: until then, we only filter data using indexes. Let us take a look at the diagram below:

[Figure: XPath Search Algorithm]

As you can see, it is quite simple: there are only two main phases, and they are very different. On the left-hand side, we have an efficient way of filtering data using indexes. On the other side, we have a slow but accurate run of XPaths on every single document in the database.

So, let’s start with the first step. This algorithm relies on the cts:uris() function, which returns the URIs of all documents that match a specific query. First comes the collection intersection. You may think of this in terms of Venn diagrams. Pretty simple, eh?

get-collection-intersections
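
The original MarkLogic code for this step is not reproduced here, so the following is only a rough Python sketch of the idea, not the XQuery implementation. It assumes a toy in-memory "collection index" (a dictionary from collection name to a set of URIs); the collection names, URIs and the function name are all made up for illustration.

```python
# Toy stand-in for a collection index: collection name -> set of document URIs.
# Every name and URI here is hypothetical; MarkLogic maintains its own index.
COLLECTION_INDEX = {
    "invoices": {"/docs/a.xml", "/docs/b.xml", "/docs/c.xml"},
    "2013": {"/docs/b.xml", "/docs/c.xml", "/docs/d.xml"},
    "europe": {"/docs/c.xml", "/docs/e.xml"},
}


def uris_in_all_collections(collections, index=COLLECTION_INDEX):
    """Return the URIs that belong to every one of the given collections."""
    uri_sets = [index.get(name, set()) for name in collections]
    if not uri_sets:
        return set()
    # Set intersection is the "Venn diagram" overlap of all collections.
    return set.intersection(*uri_sets)


print(sorted(uris_in_all_collections(["invoices", "2013"])))
# ['/docs/b.xml', '/docs/c.xml']
```

No document is opened at this point: only the index is consulted, which is what keeps this phase cheap.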

After that, the interesting part starts. What we try to do is bring back only those document URIs whose documents contain specific elements, in any order. Why? Because a document that lacks an element named in the XPath we are querying could never produce a successful result anyway, so we can drop it from the final result.

get-docs-having-specific-elements
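
Again as a hedged Python sketch rather than the original XQuery: assume a toy element index that maps an element name to the URIs of documents containing it, and a deliberately naive XPath splitter that only understands plain steps such as /order/item/price. The index contents and function names are hypothetical.

```python
import re

# Toy element index: element name -> URIs of documents containing that element.
# The contents are made up for illustration.
ELEMENT_INDEX = {
    "order": {"/docs/a.xml", "/docs/b.xml", "/docs/c.xml"},
    "item": {"/docs/b.xml", "/docs/c.xml"},
    "price": {"/docs/c.xml", "/docs/d.xml"},
}


def element_names(xpath):
    """Extract element names from a simple XPath such as /order/item/price.

    Only plain child and descendant steps are handled; predicates, attributes,
    axes, wildcards and namespace prefixes are out of scope for this sketch.
    """
    return [step for step in re.split(r"/+", xpath) if step and step != "."]


def docs_having_elements(xpath, candidates, index=ELEMENT_INDEX):
    """Keep only the candidate URIs whose documents contain every element."""
    result = set(candidates)
    for name in element_names(xpath):
        result &= index.get(name, set())
    return result


candidates = {"/docs/b.xml", "/docs/c.xml"}
print(docs_having_elements("/order/item/price", candidates))
# {'/docs/c.xml'}
```

Because the elements may appear in any order, this filter can still let through false positives: a document may contain all the elements without matching the XPath itself. That is why the accurate XPath run is still needed afterwards.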

Last but not least, we go through the list of URIs returned by cts:uris(), expand each document and run the XPath against it.
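
This final, expensive phase could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the MarkLogic code: documents live in a hypothetical in-memory dictionary, and Python's xml.etree.ElementTree supports only a subset of XPath, with paths evaluated relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document store: URI -> XML text.
DOCUMENTS = {
    "/docs/b.xml": "<order><item><name>pen</name></item></order>",
    "/docs/c.xml": "<order><item><price>2.50</price></item></order>",
}


def run_xpath(candidate_uris, path, documents=DOCUMENTS):
    """Parse each candidate document and collect the text of matching elements."""
    results = {}
    for uri in sorted(candidate_uris):
        root = ET.fromstring(documents[uri])  # the costly "expand" step
        matches = root.findall(path)
        if matches:
            results[uri] = [match.text for match in matches]
    return results


print(run_xpath({"/docs/b.xml", "/docs/c.xml"}, "./item/price"))
# {'/docs/c.xml': ['2.50']}
```

The fewer URIs survive the index-based filters, the fewer documents have to be parsed here.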

The algorithm is also available as a PDF from here.

 

Enable mcrypt PHP module on Ubuntu

I had a problem enabling the mcrypt PHP5 module on Unix. This helped:

The final result was this:

[Figure: Mcrypt PHP5 module]

Installing MarkLogic on Red Hat

Download the latest MarkLogic version for Red Hat from here: http://developer.marklogic.com/products

Install it using yum:

If it does not work and yum complains about package signing, use:

By default, MarkLogic is installed in the /opt/MarkLogic directory, whereas the forest (data) is stored in the /var/opt/MarkLogic directory. The latter was not what I wanted for my configuration: the data should have been stored on a different drive, so a symlink needed to be created.

Once done, start it up with:

When this is done, go to the admin console at http://localhost:8001 and finish the configuration.

UPDATE:
I also wrote a Bash script to do this automatically, so that it can be reused on Amazon Web Services.

 

Apache’s mod_rewrite

Sometimes it is necessary to rewrite the URLs that we pass to our web server. The result is sometimes called a “clean URL” and is more human-readable. Furthermore, some search engine crawlers prefer clean URLs to URLs with many different parameters.

In order to turn this feature on in Apache, install the “mod_rewrite” module. Once this is done, run:

That is it!

PHP Fatal error: Allowed memory size of {} bytes exhausted

I have been playing around with some PHP and MySQL modules on my Apache2 test server. Unfortunately, I kept getting random errors until I looked at the Apache error.log file. This is how it looked:

[Mon Aug 19 15:12:15 2013] [error] [client 10.0.235.4] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 78 bytes) in /path/to/your/system/system.install on line 1187,