REPLICATE FOR DUMMIES
By James Fortune
Note: This is purely the opinion of the author. While 
CapeSoft, it's employees  and owners (hereafter referred to as 'we') have done our best to proof read this 
document, we cannot be held responsible or liable for any damages, misunderstandings 
or losses whether implied or specific, direct or indirect because of the use of 
this document. By reading this document it is implied that you agree to these 
conditions.
Do your users drive you mad asking 'Can I copy my data onto my laptop'? Are you  
asked by your bigger clients to design systems so that their branch offices can  
update the Head Office with their 'figures' each evening? And what about roving  
sales people: all with different sets of data on their laptops? These are nightmare  
scenarios, aren't they?
 
Well, not any more! Replicate is here to do all this for you and more with just  
a few clicks of your mouse!
 
So What is Replicate?
It is a set of CLASSes and templates that enable you to offer full Replication  
& Synchronisation (R&S) to your Clarion applications. R&S enables  
the sharing of data from one database (or more accurately: one site) to another.  
This was previously only available if you used SQL as your backend. Now it's available  
for any file driver: including TPS files! 
 
Basically it works by logging all adds, changes and deletes done on a particular  
computer (say, a sales person's laptop) and then uses a 'transport' of your choice  
(a local area network (LAN), FTP or even email), to export the modifications to  
another site (say, the Head Office server), where these modifications are absorbed  
into that data 'set'. No more heartaches with attempting live links, hand-coded  
imports, user dependency, etc. This all happens on the fly and behind the scenes.  
It is foolproof! It doesn't even matter what file driver you use: you can even  
mix them up if you want!
 
How Does Replicate Work?
In a nutshell, you add an extra field and a key to each data file in your dictionary  
whose data you want to be able to synchronize. The field, called the Global Unique  
Identifier (named 'GUID'), is used to uniquely identify every change a user makes  
to his or her data 'set' when it is written to a log file. 
 
There is even a little program (the Bulk Dictionary Editor) supplied that will  
add this field, its key and (optionally) a SiteID field to all the relevant files  
in your dictionary - automatically!
 
You will also need to create a Log Manager program. The easiest way to do this  
is to take one of the example APPs and recompile it with your dictionary, once 
you've added the necessary fields, etc.  
How Do I Get It Working In My Application?
	- Decide how many 'sites' you have. 
 
 A 'site' is simply a set of data that one or more of your applications  
		may use.
 
 For example:
 
 
		 	- There  is a database at the Head Office. It's a sales database, with stock, orders,  
				invoices and so on. There are four in-house sales people who sell over the  
				phone and all four of them have access to the same database on the server.  
				This is your first set of data or site.
- You also have a roving salesman with a complete database on his laptop. This 
				is another set of data or site.
- Finally,  you have a branch office with another two operators who use a database on 
				their server. This is the third set of data or site.
 So, in this example, we have three sites. (Although  
		there may be seven operators running the program, there are only three sets  
		of data - thus three sites.)
-  Label your sites appropriately.
 
 For this, you use the Site Identifier which is a STRING(4).
 
 The 'top' site, the site with no parent site, is called the Primary Site  
		and you can only have one Primary Site for a particular application. The database  
		at the Head Office will be the Primary Site and this should have a Site Identifier  
		of B000. Why don't you label the Primary Site as A000? Because your client might  
		expand in the future and you might need to add a new Primary Site that owns  
		your existing one.
 
 All sites below the Primary Site should be numbered so that it is easy  
		to see who they belong to (e.g. B100, B200, B300 and so on). Think of the days  
		when you had to number your BASIC programs. You always used line numbers in  
		jumps of ten or a hundred, didn't you? This allowed the insertion of additional  
		lines if necessary. Well, number your sites in that way too. It's good practice  
		and could save tears later. Sub-sites or 'Children' of a site would continue  
		this numbering, such as B110, B120 and so on. Stay with me, this will all become  
		crystal clear in one moment.
-  Draw a Site Diagram Replicate requires a one-to-many parent-child site structure, each child  
		site can only have one parent site (although a parent can, itself, be a child).
 
 This is the Site Diagram of the example above.
 
   
 Each box represents a site and the arrows represent the parent-child relationships,  
		not the direction of replication. Replication is completely bi-directional.  
		But a particular site can only synchronize with its 'parent' site. So both B100  
		and B200 can synchronize with B000 but not directly with each other in the field.
 
 Here is the Site Diagram of a more complicated structure:
 
  
 
 B000 is the Head Office and B100, B200 and B300 are regional offices. B110,  
		  B210, B220 and B310 are branches/stores/shops that report to those regional  
		  offices. The bottom layer are roving sales people with laptops. Branch B110  
		  has three roving salesmen - B111, B112 and B113. Branch B220 has just one, B221,  
		  while branch B310 has two - B311 and B312.
 
 The red crosses indicate illegal relationships. It is illegal for a site  
		to have more than one parent. In reality, this means that salesmen B311 and  
		B312, for example, cannot synchronize their data with each other while in the  
		field, they have to do it through their branch B310.
 
 Replicate can use the Site Identifier to distinguish between different  
		sub-sets of data that are only relevant to certain 'branches' of the structure.  
		For example, you might only want to distribute file changes that pertain to  
		the B100 family (which in this case would include B110, B111, B112, B113) instead  
		of the entire log file.
-  Follow  the instructions  
		in the main Replicate Help file to modify your dictionary to enable Replicate  
		to work in your application or, even better, use the supplied Bulk  
		Dictionary Editor utility to do it for you.
 
 Incidentally, if you are sending out the 
		'Replicate-enabled' version of  
		your application to existing users, there is a code template called SetSiteIfNew  
		which will enable you to set all new occurrences of the Site field to whatever.  
		Obviously, you will need to call a procedure, after FM3 has done its  
		stuff, that runs this the first time a site is 'Replicate-enabled'.
 
 Tip  : Don't consider going down this road at all unless you have FM3/FM.
 Basically ALL your structures are going to need converting, probably  
		more than once. If you have to keep the files updated the hard way  
		then it's going to be very painful - and it might even get to be impossible  
		if you're dealing with already existing sites!
- Create your Log Manager program.
- Implement  the 'communication' path between your sites. Replicate has this already built  
		in when on a LAN. If you want something more flexible, such as email, we recommend  
		you use NetTalk.
- That's it!
 
I Need To Understand This a Bit More
The most important feature that makes Replicate different from other Replicate  
& Synchronisation systems is that it is totally AUTOMATIC and, therefore, 'invisible' to the end user.
 
Let's talk about the simplest example: an office computer and a laptop. Some  
of the time the laptop is in the office, connected to the office computer via  
a LAN, and sometimes it's out on the road. The laptop will ALWAYS use its own  
copy of the database. It will never see the office computer's data directly.  
When it's in the office, and connected to the network, any changes made by its  
user are made first to its local copy of the data, and then immediately  
sent to the office computer or server for merging into the main database. And  
vice-versa. Any changes made to the main database are immediately replicated  
in its database. We say immediately but you can configure what the synching interval  
is: daily, hourly, every couple of seconds: whatever suits the situation. The  
users won't even be aware of it because this process of synchronisation is entirely 
'behind the scenes'. No-one actually chooses an option in the program called 'Synchronise'.  
Replicate ensures that the office computer and the laptop keep each other's data  
sets up to date, automatically.
 
Now, if the laptop user leaves the office (to go home or on a trip), the laptop  
is 'undocked' from the LAN physically. The transport detects that the connection  
has been broken and the Log Manager programs at both ends start 'logging' all  
database changes to a log file. Anytime he reconnects, either by redocking the  
laptop to the LAN or even dialing into the office through a modem, both sides  
are immediately updated with these changes.
 
The implementation of this is very easy. Firstly, you consider these as two separate 
'sites' and you install your program onto both computers. During installation  
you set up some replication options such as pointing one system to the other.  
You will also set up some communication between the two: let's say for now, a  
LAN. 
So How and When Does Synchronising Happen?
As soon as the two 'ends' can 'see' each other. As soon as a laptop is connected  
to the LAN, for example. If a log file exists on your machine, your Log Manager  
sends it off to your 'parent' database's Log Manager via the communication transport  
you select and receives any outstanding log file from the 'parent' database's  
Log Manager. The communication transport is then closed and the respective Log  
Managers then merge the data in the received log files with their data 'set'. It's as simple as that!  
How Are Conflicts Handled?
Firstly, we need to determine what a 'conflict' is.
 
R&S doesn't add any more conflict issues than you already have when you run  
your program on a LAN. If User A & User B are simultaneously changing the  
same field, Clarion's Concurrency checking kicks in but if User A sets a date,  
say, to 12/12/02 and fifteen minutes later User B changes it to 12/12/03, does  
your program kick up a fuss? Of course not. So why should an R&S system? Acclaim  
Software (James Fortune)'s program Professional Investigator has always had an  
additional six 'audit' fields for every file/record: Date Created, Time Created,  
User Created, Date Last Amended, Time Last Amended, User Last Amended. These ensure  
that User A knows that User B last amended the record and when, so if there is  
a dispute, he knows who to ask. This would adequately cover any automatic changes  
done during a synchronisation, wouldn't it?
 
All you have to remember with R&S is that there is no record locking. So you  
will almost certainly want to limit access to most users depending on their "responsibilities"  
and so on. It is OK for "everyone to change anything" - but if  
User A makes a change to a record, and User B also makes a change to the record,  
then one of the changes will be lost. Any security template which
 
supports limiting access based on the record being changed (such as SecWin) should  
do the trick.
 
Some R&S systems simply say that during synchronisation someone has to be  
in charge to determine which value is the right one if a field in a record has  
a different value at 'both ends'. They describe it like this 'MyGreatProgram makes  
it easy to resolve conflicts by presenting you with a wizard that shows you the  
exact detail on each conflict between the master file and the replica.' Notice  
the word 'you' is used twice: who is 'you'? In the real world, this 'you' could  
be many different people and, indeed, in many cases this 'you' simply doesn't  
exist. The problem is that this person will be different depending on the data  
being changed. Who will be the 'you': the 'syncher'? The person in charge of  
one part of the data? Even if you make one person in the organisation responsible  
for all the synching then most of the information would be nonsense to  
them. And what if he or she is not available? No matter who the syncher is, 99%  
of the stuff it will ask him to decide about he will have to go to someone else.  
Now let's say you're the person who gets a few of these every day. You don't know  
who was right (Bob? Pete? Mike?) so you have to phone at least two of them. Assuming  
they're in. the whole process takes you maybe five minutes. So you can do roughly  
ten of these an hour. This is completely unscalable - it's a pain with 10 users,  
a major problem with a 100 users and completely unworkable at 1000! How long before 
'you' start ignoring the changes completely?  Which of your clients is going  
to thank you for the extra burden?
 
Finally, anything "manual" can be done wrong. Replicate does it all  
automatically! 
So What About Unique Keys?
Modern programs should not try to 'molly-coddle' the user in the way that old  
programs did. You don't know if the user has four customers called JOHN WARD,  
do you?
 
We recommend that you remove the UNIQUE attribute from almost every key you have. 
If you really do need a key to be unique, case number or something, then  
you must tell your user to introduce rules where only one site can enter  
cases or what have you.
 
Secondly, to ensure there isn't a clash of your 'invisible, meaningless, primary 
key' (remember Dr Codd?), we suggest concatenating the SiteID in front of it:
 
CUS:ByCustomerID KEY
 
Containing Fields: SiteID STRING(4) ; CustomerID LONG 
 
Can Replicate Help Network Traffic?
Yes. You see, synchronisation is basically a movement of a 'log' file. A text  
file. An XML file actually.
 
You could set up each machine on a network as another site. Log 'sharing' is done  
using Replicate's built-in 'instant' transport system. 
What are the advantages? 
- Each  user runs his own local copy of the program (fast) using his own local data.  
A very easy installation. You don't even need to ensure that every user has  
the same version of the program as the EXE only 'sees' its own local data  
files 
- You can take the server down for normal maintenance and all the workstations still 
run (off their local data). The users doesn't even know
 that the network is down.!
- You have lots of very up-to-date backups.
- Reports run quicker, since all reads (lookups, processes, reports,
 browses etc) are local.
- The  programs around the network can be different versions. After all,
 you can't expect to upgrade all the machines at the same time - and during  
a beta cycle you wouldn't want to. (When you do want all the programs to be  
the same version, FM2's AutoNet comes into its own).
- File-drivers can be different. You could have a version of your program that runs TPS files 
on the laptops and SQL at head-office, for example.
Communication Transports