diff --git a/unix/boot/vmcached/notes b/unix/boot/vmcached/notes
new file mode 100644
index 00000000..f5da300b
--- /dev/null
+++ b/unix/boot/vmcached/notes
@@ -0,0 +1,364 @@
+Virtual Memory Caching Scheme
+Mon Oct 25 1999 - Thu Jan 20 2000
+
+
+OVERVIEW [now somewhat dated]
+
+Most modern Unix systems implement ordinary file i/o by mapping files into
+host memory, faulting the file pages into memory, and copying data between
+process memory and the cached file pages. This has the effect of caching
+recently read file data in memory. This scheme replaces the old Unix
+buffer cache, with the advantage that there is no built-in limit on the
+size of the cache. The global file cache is shared by both data files and
+the file pages of executing programs, and will grow until all physical
+memory is in use.
+
+The advantage of the virtual memory file system (VMFS) is that it makes
+maximal use of system memory for caching file data. If a relatively static
+set of data is repeatedly accessed it will remain in the system file cache,
+speeding access and minimizing i/o and page faulting. The disadvantage
+is the same thing: VMFS makes maximal use of system memory for caching
+file data. Programs which do heavy file i/o, reading a large amount of
+data, fault in a great many file pages which may be accessed only once.
+Once the free list is exhausted the system pageout daemon runs to
+reclaim old file pages for reuse. The system pages heavily and becomes
+inefficient.
+
+The goal of the file caching scheme presented here is to continue to cache
+file data in the global system file cache, but control how data is cached to
+minimize use of the pageout daemon which runs when memory is exhausted. This
+scheme makes use of the ** existing operating system kernel facilities **
+to cache the file data and to use the cached data for general file access.
+The trick is to try to control how data is loaded into the cache, and when
+it is removed from the cache, so that cache space is reused efficiently
+without invoking the system pageout daemon. Since data is cached by the
+system the cache benefits all programs which access the cached file data,
+without requiring that the programs explicitly use any cache facilities
+such as a custom library.
+
+
+HOW IT WORKS
+
+
+INTERFACE
+
+
+ vm = vm_initcache (initstr)
+ vm_closecache (vm)
+
+ vm_cachefile (vm, fname, flags)
+ vm_cachefd (vm, fd, flags)
+ vm_uncachefile (vm, fname)
+ vm_uncachefd (vm, fd)
+
+ vm_cacheregion (vm, fd, offset, nbytes, flags)
+ vm_uncacheregion (vm, fd, offset, nbytes)
+ vm_reservespace (vm, nbytes)
+ vm_sync (vm, fd)
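+
+ For example, a typical client call sequence might look as follows
+ (a sketch only; the initstr syntax and flag values shown here are
+ assumptions, not defined in these notes):
+
+     #include "vmcache.h"                /* hypothetical client header */
+
+     VMcache *vm = vm_initcache ("cachesize=256m");
+     if (vm != NULL) {
+         vm_cachefile (vm, "image.fits", 0);    /* 0 = default flags */
+         /* ordinary read()/mmap() i/o now hits the cached pages */
+         vm_uncachefile (vm, "image.fits");
+         vm_closecache (vm);
+     }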
+
+
+vm_cacheregion (vm, fd, offset, nbytes, flags)
+
+ check whether the indicated region is mapped (vm descriptor)
+ if not, free space from the tail of the cache; map new region
+ request that mapped region be faulted into memory (madvise)
+ move referenced file to head of cache
+
+ redundant requests are harmless, but will reload any missing pages,
+ and cause the file to again be moved to the head of the cache list
+
+ may need to scan the cache periodically to make adjustments for
+ files that have changed in size, or been deleted, while still in
+ the cache
+
+ cached regions may optionally be locked into memory until freed
+
+ the cache controller may function either as a library within a process,
+ or as a cache controller server process shared by multiple processes
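+
+ a minimal sketch of the map-and-prefault step, using only the standard
+ mmap/madvise calls (error handling and the cache list bookkeeping are
+ omitted; offset must be page aligned):
+
+     #include <sys/mman.h>
+     #include <sys/types.h>
+
+     /* Map a file region and queue it to be faulted into memory. */
+     static void *
+     map_and_prefault (int fd, off_t offset, size_t nbytes)
+     {
+         void *addr = mmap (NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);
+         if (addr != MAP_FAILED)
+             (void) madvise (addr, nbytes, MADV_WILLNEED);
+         return (addr);
+     }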
+
+
+vm_uncacheregion (vm, fd, offset, nbytes)
+
+ check whether the indicated region is mapped
+ if so, unmap the pages
+ if no more pages remain mapped, remove file from cache list
+
+
+vm_reservespace (vm, nbytes)
+
+ unmap file segments from tail of list until the requested space
+ (plus some extra space) is available for reuse
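+
+ in code terms, roughly the following (the Cache and Segment types and
+ the unlink_segment helper are hypothetical):
+
+     /* Unmap segments from the tail of the cache list until the
+      * requested space, plus some slack, has been released.
+      */
+     static void
+     reserve_space (Cache *vm, size_t nbytes)
+     {
+         size_t freed = 0, goal = nbytes + nbytes / 4;    /* 25% slack */
+         Segment *sp;
+
+         while (freed < goal && (sp = vm->tail) != NULL) {
+             munmap (sp->addr, sp->nbytes);
+             freed += sp->nbytes;
+             unlink_segment (vm, sp);
+         }
+     }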
+
+
+data structures
+
+ caching mechanism is file-oriented
+ linked list of mapped regions (each from a file)
+ for each region keep track of file descriptor, offset, size
+ linked list of file descriptors
+ for each file keep track of file size, mtime,
+ type of mapping (full,region) and so on
+
+ some dynamic things such as the size of a file or whether pages are memory
+ resident can only be determined by querying the system at runtime
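+
+ a hypothetical rendering of the descriptors outlined above:
+
+     #include <sys/types.h>
+     #include <time.h>
+
+     struct segment {
+         struct segment *next, *prev;   /* cache list, allocation order */
+         struct segment *nextseg;       /* next segment of the same file */
+         void   *addr;                  /* mapped address */
+         off_t  offset;                 /* file offset of segment */
+         size_t nbytes;                 /* segment size */
+     };
+
+     struct vmfile {
+         int    fd;                     /* file descriptor */
+         off_t  size;                   /* file size at last access */
+         time_t mtime;                  /* modify time at last access */
+         int    flags;                  /* type of mapping (full,region) */
+         struct segment *segs;          /* mapped segments of this file */
+     };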
+
+
+
+Solaris VM Interface
+
+ madvise (addr, len, advice)
+ mmap (addr, len, prot, flags, fildes, off)
+ munmap (addr, len)
+ mlock (addr, len)
+ munlock (addr, len)
+ memcntl (addr, len, cmd, arg, attr, mask)
+ mctl (addr, len, function, arg)
+ mincore (addr, len, *vec)
+ msync (addr, len, flags)
+
+ Notes
+ Madvise can be used to request that a range of pages be faulted
+ into memory (MADV_WILLNEED), or freed from memory (MADV_DONTNEED)
+
+ Mctl can be used to invalidate page mappings in a region
+
+ Mincore can be used to determine if pages in a given address range
+ are resident in memory
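+
+ For example, the following sketch counts the resident pages in a
+ mapped region (Solaris declares the vec argument as char *; on Linux
+ it is unsigned char *):
+
+     #include <stdlib.h>
+     #include <unistd.h>
+     #include <sys/mman.h>
+
+     /* Count how many pages of a mapped region are resident in memory. */
+     static long
+     resident_pages (char *addr, size_t len)
+     {
+         long pagesize = sysconf (_SC_PAGESIZE);
+         long i, nres = 0, npages = (len + pagesize - 1) / pagesize;
+         char *vec = malloc (npages);
+
+         if (vec != NULL && mincore (addr, len, vec) == 0)
+             for (i=0;  i < npages;  i++)
+                 if (vec[i] & 1)
+                     nres++;
+         free (vec);
+         return (nres);
+     }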
+
+
+
+VMCACHED -- December 2001
+------------------------------
+
+Added VMcache daemon and IRAF interface to same
+Design notes follow
+
+
+Various Cache Control Algorithms
+
+ 1. No Cache
+
+ No VMcache daemon. Clients use their builtin default i/o mechanism,
+ e.g., either normal or direct i/o depending upon the file size.
+
+ 2. Manually or externally controlled cache
+
+ Files are cached only when directed. Clients connect to the cache
+ daemon to see if files are in the cache and if so use normal VM i/o
+ to access data in the cache. If the file is not cached the client
+ uses its default i/o mechanism, e.g., direct i/o.
+
+ 3. LRU Cache
+
+ A client file access causes the accessed file to be cached. Normal
+ VM i/o is used for file i/o. As new files are cached the space
+ used by the least recently used files is reclaimed. Accessing a
+ file moves it to the head of the cache, if it is still in the cache.
+ Otherwise it is reloaded.
+
+ 4. Adaptive Priority Cache
+
+ This is like the LRU cache, but the cache keeps statistics on files
+ whether or not they have aged out of the cache, and raises the
+ cache priority or lifetime of files that are more frequently
+ accessed. Files that are only accessed once tend to pass quickly
+ through the cache, or may not even be cached until the second
+ access. Files that are repeatedly accessed have a higher priority
+ and will tend to stay in the cache.
+
+The caching mechanism and algorithm used are independent of the client
+programs, hence can be easily tuned or replaced with a different algorithm.
+
+Factors determining if a file is cached:
+
+ user-assigned priority (0=nocache; 1-N=cache priority)
+ number of references
+ time since last access (degrades nref)
+ amount of available memory (cutoff point)
+
+Cache priority
+
+ priority = userpri * max (0,
+     (nref - refbase) - ((time - last_access) / tock))
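+
+ expressed as code (a literal transcription of the formula; the argument
+ types are assumptions):
+
+     #include <time.h>
+
+     /* Compute the cache priority of a file per the above formula. */
+     static int
+     cache_priority (int userpri, int nref, time_t last_access,
+         int refbase, int tock)
+     {
+         int p = nref - refbase - (int)((time(NULL) - last_access) / tock);
+         return (userpri * (p > 0 ? p : 0));
+     }
+
+ e.g., with refbase=1 and tock=60, a file referenced 5 times and last
+ accessed two minutes ago has priority userpri * max(0, 5-1-2), i.e.,
+ 2*userpri.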
+
+Tunable parameters
+
+ userpri User defined file priority. Files with a higher
+ priority stay in the cache longer. A zero priority
+ prevents a file from being cached.
+
+ refbase The number of file references has to exceed refbase
+ before the file will be cached. For example, if
+ refbase=0 the file will be cacheable on the first
+ reference. If refbase=1 a file will only become
+ cacheable if accessed two or more times. Refbase
+ can be used to exclude files from the cache that
+ are only referenced once and hence are not worth
+ caching.
+
+ tock While the number of accesses increases the cache
+ priority of a file, the time interval since the
+ last access conversely decreases the cache priority
+ of the file. A time interval of "tock" seconds
+ will cancel out one file reference. In effect,
+ tock=N means that a file reference increases the
+ cache priority of a file for N seconds. A
+ frequently referenced file will be relatively
+ unaffected by tock, but tock will cause
+ infrequently referenced files to age out of the
+ cache within a few tocks.
+
+Cache Management
+
+ Manual cache control
+
+ Explicitly caching or refreshing a file always maps the file into
+ memory and moves it to the head of the cache.
+
+ File access
+
+ Accessing a file (vm_accessfile) allows cache optimization to
+ occur. The file nref and access time are updated and the priority
+ of the current file and all files (to a certain depth in the cache
+ list) are recomputed. If a whole-file level access is being
+ performed the file size is examined to see if it has changed and
+ if the file has gotten larger a new segment is created. The
+ segment descriptor is then unlinked and relinked in the cache in
+ cache priority order. If the segment is above the VM cutoff it
+ is loaded into the cache: lower priority segments are freed as
+ necessary, and if the file is an existing file it is marked
+ WILL_NEED to queue the file data to be read into memory.
+
+ If the file is a new file it must already have been created
+ externally to be managed under VMcache. The file size at access
+ time will determine the size of the file entry in the cache. Some
+ systems (BSD, Sun) allow an mmap to extend beyond the end of a
+ file, but others (Linux) do not. To reserve space for a large
+ file where the ultimate size of the file is known in advance, one
+ can write a byte where the last byte of the file will be (as with
+ zfaloc in IRAF) before caching the file, and the entire memory
+ space will be reserved in advance. If a file is cached and later
+ extended, re-accessing the file will automatically cache the new
+ segment of the file (see above).
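+
+ the pre-extension trick is simple to sketch (this shows the general
+ technique only, not the actual zfaloc code):
+
+     #include <sys/types.h>
+     #include <unistd.h>
+
+     /* Extend a new file to its final size by writing its last byte,
+      * so the full region can be mapped and reserved up front.
+      */
+     static int
+     prealloc (int fd, off_t finalsize)
+     {
+         if (lseek (fd, finalsize - 1, SEEK_SET) < 0)
+             return (-1);
+         return (write (fd, "", 1) == 1 ? 0 : -1);
+     }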
+
+ Data structures
+
+ Segment descriptors
+ List of segments linked in memory allocation order
+ first N segments are cached (whatever will fit)
+ remainder are maintained in list, but are not cached
+ manually cached/refreshed segments go to head of list
+ accessed files are inserted in list based on priority
+ List of segments belonging to the same file
+ a file can be stored in the cache in multiple segments
+
+ File hash table
+ provides fast lookup of an individual file
+ hash dev+ino to segment
+ segment points to next segment if collision occurs
+ only initial/root file segment is indexed
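+
+ e.g., a minimal dev+ino hash (the hash function itself is an
+ assumption):
+
+     #include <sys/types.h>
+
+     #define VM_HASHSIZE 512            /* assumed table size */
+
+     /* Hash a file's (device,inode) pair to a bucket in the file table. */
+     static int
+     vm_hash (dev_t dev, ino_t ino)
+     {
+         return ((int) ((ino * 31 + dev) % VM_HASHSIZE));
+     }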
+
+ Cache management
+
+ Relinking of the main list occurs only in certain circumstances
+ when a segment is manually cached/uncached/refreshed
+ referenced segment moves to head of list
+ new segment is always cached
+ when a file or segment is accessed
+ priority of each element is computed and segment is
+ placed in priority order (only referenced segment is moved)
+ caching/uncaching may occur due to new VM cutoff
+ when a new segment is added
+ when an old segment is deleted
+ Residency in memory is determined by link order
+ priority normally determines memory residency
+ but manual caching will override (for a time)
+
+
+File Driver Issues
+
+ Image kernels
+
+ Currently only OIF uses the SF driver. FXF, STF, and QPF (FMIO)
+ all use the BF driver. Some or all could be changed to use SF
+ if it is made compatible with BF, otherwise the VM hooks need
+ to go into the BF driver. Since potentially any large file can
+ be cached, putting the VM support into BF is a reasonable option.
+
+ The FITS kernel is a problem currently as it violates device
+ block size restrictions, using a block size of 2880.
+
+ It is always a good idea to use falloc to pre-allocate storage for
+ a large imagefile when the size is known in advance. This permits
+ the VM system to reserve VM space for a new image before data is
+ written to the file.
+
+ Direct I/O
+
+ Direct i/o is possible only if transfers are aligned on device
+ blocks and are an integral number of blocks in length.
+
+ Direct i/o flushes any VM buffered data for the file. If a file
+ is mapped into memory this is not possible, hence direct i/o is
+ disabled for a file while it is mapped into memory.
+
+ This decision is made at read/write time, hence cannot be
+ determined reliably when a file is opened.
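+
+ in code terms the alignment constraint is just the following (dbsize,
+ the device block size, is assumed to come from the driver):
+
+     #include <sys/types.h>
+
+     /* A transfer qualifies for direct i/o only if both the file
+      * offset and the transfer length are integral multiples of the
+      * device block size.
+      */
+     static int
+     directio_ok (off_t offset, size_t nbytes, long dbsize)
+     {
+         return (offset % dbsize == 0 && nbytes % dbsize == 0);
+     }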
+
+ FITS Kernel
+
+ Until the block size issues can be addressed, direct i/o cannot
+ be used for FITS images. Some VM cache control is still possible
+ however. Options include:
+
+ o Always cache a .fits image: either set vmcached to cache a file
+ on the first access, or adjust the cache parameters based on
+ the file type. Use a higher priority for explicitly cached
+ files (e.g. Mosaic readouts), so that running a sequence of
+ normal i/o images through the cache does not flush the high
+ priority images.
+
+ o Writing to new files which have not been pre-allocated is
+ problematic as a large amount of data can be written, causing
+ paging. One way to deal with this is to use large transfers
+ (IMIO will already do this), and to issue a reservespace
+ directive on each file write at EOF, to free up VM space as
+ needed. The next access directive would cause the new
+ portion of the image to be mapped into the cache.
+
+ A possible problem with this is that the new file may initially
+ be too small to reach the cache threshold. Space could be
+ reserved in any case, waiting for the next access to cache
+ the file; the cache daemon could always cache new files of a
+ certain type; or the file could be cached when it reaches the
+ cache threshold.
+
+ Kernel File Driver
+
+ An environment variable will be used in the OS driver to define a
+ cache threshold or to disable use of VMcache entirely. We need
+ to be able to specify these two things separately. If a cache
+ threshold is set, files smaller than this size will not result in
+ a query to the cache daemon. If there is no cache threshold but
+ VMcache is enabled, the cache daemon will decide whether the file
+ is too small to be cached. It should also be possible to force
+ the use of direct i/o if the file is larger than a certain size.
+
+ Kernel file driver parameters:
+
+ enable Boolean; enables or disables the use of VMcache entirely.
+
+ vmcache Use vmcache only if the file size equals or exceeds
+ the specified threshold.
+
+ directio If the file size equals or exceeds the specified
+ threshold use direct i/o to access the file. If
+ direct i/o is enabled in this fashion then vmcache
+ is not used (otherwise vmcache decides whether to
+ use direct i/o for a file).
+
+ port Socket number to be used.
+
+ VMPORT=8797
+ VMCLIENT=enable,threshold=10m,directio=10m
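+
+ a sketch of how the driver might parse the size fields (the helper is
+ hypothetical; only the variable syntax shown above is from these notes):
+
+     #include <stdlib.h>
+
+     /* Parse a size spec such as "10m" (k/m suffix convention). */
+     static long
+     vm_getsize (const char *s)
+     {
+         char *ep;
+         long val = strtol (s, &ep, 10);
+
+         if (*ep == 'k')
+             val *= 1024;
+         else if (*ep == 'm')
+             val *= 1024 * 1024;
+         return (val);
+     }
+
+ the port would then be read with atoi(getenv("VMPORT")), and the
+ threshold and directio values with vm_getsize.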
+