Replicate Complete Documentation

Replicate is a set of classes and templates that let you add full Replication & Synchronisation (R&S) to your Clarion applications. R&S shares data between one database (or, more accurately, one site) and another. Previously this was only available if you used SQL as your backend. Now it's available for any file driver, including TPS files!

Basically, it works by logging all adds, changes and deletes done on a particular computer (say, a salesperson's laptop) and then using a 'transport' of your choice (a local area network (LAN), FTP or even email) to export the modifications to another site (say, the Head Office server), where they are absorbed into that site's data 'set'. No more heartaches with attempting live links, hand-coded imports, user dependency, etc. This all happens on the fly and behind the scenes. It is foolproof! It doesn't even matter what file driver you use: you can even mix them up if you want!

How Does Replicate Work?

In a nutshell, you add an extra field and a key to each data file in your dictionary whose data you want to be able to synchronize. The field, called the Global Unique Identifier (named 'GUID'), is used to uniquely identify every change a user makes to his or her data 'set' when it is written to a log file.
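
As a rough illustration of the idea (not Replicate's actual implementation or log format), the Python sketch below gives a record a GUID and appends every add, change and delete on that record to an in-memory log. The function names `new_guid` and `log_change`, the 16-character identifier and the layout of a log entry are all assumptions made for this example.

```python
import uuid

def new_guid():
    """Return a 16-character identifier (hypothetical format, for illustration only)."""
    return uuid.uuid4().hex[:16].upper()

def log_change(log, action, table, guid, fields=None):
    """Append one change entry to the log; action is 'add', 'change' or 'delete'."""
    if action not in ("add", "change", "delete"):
        raise ValueError("unknown action: " + action)
    entry = {"action": action, "table": table, "guid": guid, "fields": dict(fields or {})}
    log.append(entry)
    return entry

# One record's life on a single site: every entry carries the record's GUID.
log = []
customer_guid = new_guid()
log_change(log, "add", "Customer", customer_guid, {"Name": "JOHN WARD"})
log_change(log, "change", "Customer", customer_guid, {"Name": "JOHN A WARD"})
log_change(log, "delete", "Customer", customer_guid)
```

Because each entry names the record by its GUID rather than by its position in a file, another site can find the same record in its own data 'set' when the log arrives.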

There is even a little program (the Bulk Dictionary Editor) supplied that will add this field, its key and (optionally) a SiteID field to all the relevant files in your dictionary - automatically!

You will also need to create a Log Manager program. The easiest way to do this is to take one of the example APPs and recompile it with your dictionary, once you've added the necessary fields, etc.

How Do I Get It Working In My Application?

Decide how many 'sites' you have.

A 'site' is simply a set of data that one or more of your applications may use.

For example:
- There is a database at the Head Office. It's a sales database, with stock, orders, invoices and so on. There are four in-house sales people who sell over the phone and all four of them have access to the same database on the server. This is your first set of data or site.
- You also have a roving salesman with a complete database on his laptop. This is another set of data or site.
- Finally, you have a branch office with another two operators who use a database on their server. This is the third set of data or site.
So, in this example, we have three sites. (Although there may be seven operators running the program, there are only three sets of data - thus three sites.)

Label your sites appropriately.

For this, you use the Site Identifier which is a STRING(4).

The 'top' site, the site with no parent site, is called the Primary Site and you can only have one Primary Site for a particular application. The database at the Head Office will be the Primary Site and this should have a Site Identifier of B000. Why don't you label the Primary Site as A000? Because your client might expand in the future and you might need to add a new Primary Site that owns your existing one.

All sites below the Primary Site should be numbered so that it is easy to see who they belong to (e.g. B100, B200, B300 and so on). Think of the days when you had to number your BASIC programs. You always used line numbers in jumps of ten or a hundred, didn't you? This allowed the insertion of additional lines if necessary. Well, number your sites in that way too. It's good practice and could save tears later. Sub-sites or 'Children' of a site would continue this numbering, such as B110, B120 and so on. Stay with me, this will all become crystal clear in one moment.

Draw a Site Diagram.

Replicate requires a one-to-many parent-child site structure: each child site can have only one parent site (although a parent can, itself, be a child).

This is the Site Diagram of the example above.

Each box represents a site and the arrows represent the parent-child relationships, not the direction of replication. Replication is completely bi-directional. But a particular site can only synchronize with its 'parent' site. So both B100 and B200 can synchronize with B000 but not directly with each other in the field.

Here is the Site Diagram of a more complicated structure:

B000 is the Head Office and B100, B200 and B300 are regional offices. B110, B210, B220 and B310 are branches/stores/shops that report to those regional offices. The bottom layer are roving sales people with laptops. Branch B110 has three roving salesmen - B111, B112 and B113. Branch B220 has just one, B221, while branch B310 has two - B311 and B312.

The red crosses indicate illegal relationships. It is illegal for a site to have more than one parent. In reality, this means that salesmen B311 and B312, for example, cannot synchronize their data with each other while in the field; they have to do it through their branch, B310.
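
The parent-child rule can be sketched in a few lines of Python. This assumes the numbering convention suggested above (a letter followed by three digits, where zeroing the last non-zero digit gives the parent). The functions `parent_of` and `can_sync_directly` are hypothetical helpers for illustration and are not part of Replicate.

```python
def parent_of(site_id):
    """Return the parent implied by the numbering convention, or None for the Primary Site."""
    prefix, digits = site_id[0], list(site_id[1:])
    # Zero the right-most non-zero digit: B311 -> B310 -> B300 -> B000.
    for i in range(len(digits) - 1, -1, -1):
        if digits[i] != "0":
            digits[i] = "0"
            return prefix + "".join(digits)
    return None  # all digits are zero: this is the Primary Site

def can_sync_directly(a, b):
    """Two sites may synchronize directly only if one is the other's parent."""
    return parent_of(a) == b or parent_of(b) == a

legal = can_sync_directly("B311", "B310")    # salesman with his branch
illegal = can_sync_directly("B311", "B312")  # two salesmen in the field
```

Here `legal` is true and `illegal` is false, matching the red crosses in the diagram.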

Replicate can use the Site Identifier to distinguish between different sub-sets of data that are only relevant to certain 'branches' of the structure. For example, you might only want to distribute file changes that pertain to the B100 family (which in this case would include B110, B111, B112, B113) instead of the entire log file.
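
A minimal sketch of that kind of filtering, again assuming the letter-plus-three-digits numbering convention: a site belongs to a family if its identifier starts with the family head's identifier minus its trailing zeros. The `site` field on each log entry and the helper names are assumptions for this example, not Replicate's real log layout.

```python
def family_prefix(head):
    """B100 -> 'B1', B110 -> 'B11', B000 -> 'B' (the whole structure)."""
    return head[0] + head[1:].rstrip("0")

def in_family(site_id, head):
    """True if site_id is the head itself or one of its descendants."""
    return site_id.startswith(family_prefix(head))

def filter_log(entries, head):
    """Keep only the log entries that originated inside the given family."""
    return [e for e in entries if in_family(e["site"], head)]

entries = [
    {"site": "B100", "change": "price update"},
    {"site": "B113", "change": "new order"},
    {"site": "B200", "change": "new customer"},
    {"site": "B310", "change": "stock count"},
]
b100_family = filter_log(entries, "B100")
```

With these entries, `b100_family` holds only the changes from B100 and B113.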

Follow the instructions in the main Replicate Help file to modify your dictionary to enable Replicate to work in your application or, even better, use the supplied Bulk Dictionary Editor utility to do it for you.

Incidentally, if you are sending out the 'Replicate-enabled' version of your application to existing users, there is a code template called SetSiteIfNew which will enable you to set all new occurrences of the Site field to whatever value you choose. You will need to call a procedure that runs this the first time a site is 'Replicate-enabled', after FM3 has finished converting the files.

Tip: Don't consider going down this road at all unless you have FM3/FM.
Basically ALL your structures are going to need converting, probably more than once. If you have to keep the files updated the hard way then it's going to be very painful - and it might even get to be impossible if you're dealing with already existing sites!

Create your Log Manager program.

Implement the 'communication' path between your sites. Replicate has this already built in when on a LAN. If you want something more flexible, such as email, we recommend you use NetTalk.

That's it!

I Need To Understand This a Bit More

The most important feature that makes Replicate different from other Replication & Synchronisation systems is that it is totally AUTOMATIC and, therefore, 'invisible' to the end user.

Let's talk about the simplest example: an office computer and a laptop. Some of the time the laptop is in the office, connected to the office computer via a LAN, and sometimes it's out on the road. The laptop will ALWAYS use its own copy of the database. It will never see the office computer's data directly. When it's in the office, and connected to the network, any changes made by its user are made first to its local copy of the data, and then immediately sent to the office computer or server for merging into the main database. And vice-versa. Any changes made to the main database are immediately replicated in its database. We say immediately but you can configure what the synching interval is: daily, hourly, every couple of seconds: whatever suits the situation. The users won't even be aware of it because this process of synchronisation is entirely 'behind the scenes'. No-one actually chooses an option in the program called 'Synchronise'. Replicate ensures that the office computer and the laptop keep each other's data sets up to date, automatically.

Now, if the laptop user leaves the office (to go home or on a trip), the laptop is 'undocked' from the LAN physically. The transport detects that the connection has been broken and the Log Manager programs at both ends start 'logging' all database changes to a log file. Anytime he reconnects, either by redocking the laptop to the LAN or even dialing into the office through a modem, both sides are immediately updated with these changes.

The implementation of this is very easy. Firstly, you consider these as two separate 'sites' and you install your program onto both computers. During installation you set up some replication options such as pointing one system to the other. You will also set up some communication between the two: let's say for now, a LAN.

So How and When Does Synchronising Happen?

As soon as the two 'ends' can 'see' each other. As soon as a laptop is connected to the LAN, for example. If a log file exists on your machine, your Log Manager sends it off to your 'parent' database's Log Manager via the communication transport you select and receives any outstanding log file from the 'parent' database's Log Manager. The communication transport is then closed and the respective Log Managers then merge the data in the received log files with their data 'set'. It's as simple as that!
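
The exchange just described can be sketched as follows. This is a simplified two-site model under stated assumptions: data sets are Python dictionaries keyed by GUID, the 'transport' is a plain function call, and the names `apply_entry`, `record_change` and `synchronise` are hypothetical. It is not how the Log Manager is actually written.

```python
def apply_entry(data, entry):
    """Apply one logged change to a data set keyed by GUID."""
    if entry["action"] == "delete":
        data.pop(entry["guid"], None)
    else:  # 'add' or 'change'
        data.setdefault(entry["guid"], {}).update(entry["fields"])

def record_change(site, action, guid, fields=None):
    """Make a change locally and log it for the next synchronisation."""
    entry = {"action": action, "guid": guid, "fields": dict(fields or {})}
    apply_entry(site["data"], entry)
    site["log"].append(entry)

def synchronise(child, parent):
    """Swap outstanding logs, then let each side merge what it received."""
    to_parent, to_child = child["log"], parent["log"]
    child["log"], parent["log"] = [], []  # logs have been handed to the transport
    for entry in to_parent:
        apply_entry(parent["data"], entry)
    for entry in to_child:
        apply_entry(child["data"], entry)

laptop = {"data": {"G1": {"Name": "Ann"}}, "log": []}
office = {"data": {"G1": {"Name": "Ann"}}, "log": []}

record_change(laptop, "add", "G2", {"Name": "Bob"})           # made on the road
record_change(office, "change", "G1", {"Phone": "555-0100"})  # made at Head Office
synchronise(laptop, office)                                   # laptop is redocked
```

After the call to `synchronise`, both data sets contain the new record G2 and the amended record G1, and both logs are empty.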

How Are Conflicts Handled?

Firstly, we need to determine what a 'conflict' is.

R&S doesn't add any more conflict issues than you already have when you run your program on a LAN. If User A and User B are simultaneously changing the same field, Clarion's concurrency checking kicks in. But if User A sets a date to, say, 12/12/02 and fifteen minutes later User B changes it to 12/12/03, does your program kick up a fuss? Of course not. So why should an R&S system? Professional Investigator, a program from Acclaim Software (James Fortune), has always had six additional 'audit' fields for every file/record: Date Created, Time Created, User Created, Date Last Amended, Time Last Amended, User Last Amended. These ensure that User A knows that User B last amended the record, and when, so if there is a dispute he knows who to ask. This would adequately cover any automatic changes done during a synchronisation, wouldn't it?

All you have to remember with R&S is that there is no record locking, so you will almost certainly want to limit access for most users depending on their "responsibilities" and so on. It is OK for "everyone to change anything" - but if User A makes a change to a record, and User B also makes a change to the same record, then one of the changes will be lost. Any security template which supports limiting access based on the record being changed (such as SecWin) should do the trick.

Some R&S systems simply say that during synchronisation someone has to be in charge to determine which value is the right one if a field in a record has a different value at 'both ends'. They describe it like this: 'MyGreatProgram makes it easy to resolve conflicts by presenting you with a wizard that shows you the exact detail on each conflict between the master file and the replica.' Notice the word 'you' is used twice: who is 'you'? In the real world, this 'you' could be many different people and, indeed, in many cases this 'you' simply doesn't exist. The problem is that this person will be different depending on the data being changed. Who will be the 'you': the 'syncher'? The person in charge of one part of the data? Even if you make one person in the organisation responsible for all the synching, most of the information would be nonsense to them. And what if he or she is not available? No matter who the syncher is, for 99% of the things he is asked to decide he will have to go to someone else. Now let's say you're the person who gets a few of these every day. You don't know who was right (Bob? Pete? Mike?) so you have to phone at least two of them. Assuming they're in, the whole process takes you maybe five minutes. So you can do roughly ten of these an hour. This is completely unscalable - it's a pain with 10 users, a major problem with 100 users and completely unworkable at 1000! How long before 'you' start ignoring the changes completely? Which of your clients is going to thank you for the extra burden?

Finally, anything "manual" can be done wrong. Replicate does it all automatically!

So What About Unique Keys?

Modern programs should not try to 'molly-coddle' the user in the way that old programs did. You don't know if the user has four customers called JOHN WARD, do you?

We recommend that you remove the UNIQUE attribute from almost every key you have. If you really do need a key to be unique, case number or something, then you must tell your user to introduce rules where only one site can enter cases or what have you.

Secondly, to ensure there isn't a clash of your 'invisible, meaningless, primary key' (remember Dr Codd?), we suggest concatenating the SiteID in front of it:

CUS:ByCustomerID KEY

Containing Fields: SiteID STRING(4) ; CustomerID LONG
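
To see why the SiteID prefix prevents clashes, consider this small Python sketch. Each site hands out CustomerID values from its own counter, and the key is the pair (SiteID, CustomerID). The `next_key` helper and the in-memory counters are assumptions for illustration only.

```python
def next_key(site_id, counters):
    """Allocate the next CustomerID for a site and return the composite key."""
    counters[site_id] = counters.get(site_id, 0) + 1
    return (site_id, counters[site_id])

counters = {}
key_a = next_key("B100", counters)  # first customer added at site B100
key_b = next_key("B200", counters)  # first customer added at site B200

# Both sites used CustomerID 1, yet the composite keys differ,
# so the two records can be merged into one data set without a clash.
merged = {key_a: "JOHN WARD", key_b: "JOHN WARD"}
```

The two sites never need to coordinate their numbering, because the SiteID part of the key already keeps their records apart.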

Can Replicate Help Network Traffic?

Yes. You see, synchronisation is basically a movement of a 'log' file. A text file. An XML file actually.

You could set up each machine on a network as another site. Log 'sharing' is done using Replicate's built-in 'instant' transport system.

What are the advantages?

- Each user runs his own local copy of the program (fast) using his own local data. This makes for a very easy installation. You don't even need to ensure that every user has the same version of the program, as the EXE only 'sees' its own local data files.
- You can take the server down for normal maintenance and all the workstations still run (off their local data). The users don't even know that the network is down!
- You have lots of very up-to-date backups.
- Reports run quicker, since all reads (lookups, processes, reports, browses, etc.) are local.
- The programs around the network can be different versions. After all, you can't expect to upgrade all the machines at the same time - and during a beta cycle you wouldn't want to. (When you do want all the programs to be the same version, FM2's AutoNet comes into its own.)
- File drivers can be different. You could have a version of your program that runs TPS files on the laptops and SQL at Head Office, for example.

CapeSoft Replicate

REPLICATE FOR DUMMIES

By James Fortune

So What is Replicate?