diff --git a/unix/boot/vmcached/notes b/unix/boot/vmcached/notes
new file mode 100644
index 00000000..f5da300b
--- /dev/null
+++ b/unix/boot/vmcached/notes
@@ -0,0 +1,364 @@
+Virtual Memory Caching Scheme
+Mon Oct 25 1999 - Thu Jan 20 2000
+
+
+OVERVIEW [now somewhat dated]
+
+Most modern Unix systems implement ordinary file i/o by mapping files into
+host memory, faulting the file pages into memory, and copying data between
+process memory and the cached file pages. This has the effect of caching
+recently read file data in memory. This scheme replaces the old Unix
+buffer cache, with the advantage that there is no built-in limit on the
+size of the cache. The global file cache is shared by both data files and
+the file pages of executing programs, and will grow until all physical
+memory is in use.
+
+The advantage of the virtual memory file system (VMFS) is that it makes
+maximal use of system memory for caching file data. If a relatively static
+set of data is repeatedly accessed it will remain in the system file cache,
+speeding access and minimizing i/o and page faulting. The disadvantage
+is the same thing: VMFS makes maximal use of system memory for caching
+file data. Programs which do heavy file i/o, reading a large amount of
+data, fault in a great many file pages which may be accessed only once.
+Once the free list is exhausted the system pageout daemon runs to
+reclaim old file pages for reuse. The system pages heavily and becomes
+inefficient.
+
+The goal of the file caching scheme presented here is to continue to cache
+file data in the global system file cache, but control how data is cached to
+minimize use of the pageout daemon which runs when memory is exhausted. This
+scheme makes use of the ** existing operating system kernel facilities **
+to cache the file data and to use the cached data for general file access.
+The trick is to try to control how data is loaded into the cache, and when
+it is removed from the cache, so that cache space is reused efficiently
+without invoking the system pageout daemon. Since data is cached by the
+system the cache benefits all programs which access the cached file data,
+without requiring that the programs explicitly use any cache facilities
+such as a custom library.
+
+
+HOW IT WORKS
+
+
+INTERFACE
+
+
+ vm = vm_initcache (initstr)
+ vm_closecache (vm)
+
+ vm_cachefile (vm, fname, flags)
+ vm_cachefd (vm, fd, flags)
+ vm_uncachefile (vm, fname)
+ vm_uncachefd (vm, fd)
+
+ vm_cacheregion (vm, fd, offset, nbytes, flags)
+ vm_uncacheregion (vm, fd, offset, nbytes)
+ vm_reservespace (vm, nbytes)
+ vm_sync (vm, fd)
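+
+ For example, a typical client call sequence might look as follows
+ (a sketch only; the initstr syntax and flag values shown here are
+ assumptions, not defined in these notes):
+
+     #include "vmcache.h"                /* hypothetical client header */
+
+     VMcache *vm = vm_initcache ("cachesize=256m");
+     if (vm != NULL) {
+         vm_cachefile (vm, "image.fits", 0);    /* 0 = default flags */
+         /* ordinary read()/mmap() i/o now hits the cached pages */
+         vm_uncachefile (vm, "image.fits");
+         vm_closecache (vm);
+     }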
+
+
+vm_cacheregion (vm, fd, offset, nbytes, flags)
+
+ check whether the indicated region is mapped (vm descriptor)
+ if not, free space from the tail of the cache; map new region
+ request that mapped region be faulted into memory (madvise)
+ move referenced file to head of cache
+
+ redundant requests are harmless, but will reload any missing pages,
+ and cause the file to again be moved to the head of the cache list
+
+ may need to scan the cache periodically to make adjustments for
+ files that have changed in size, or been deleted, while still in
+ the cache
+
+ cached regions may optionally be locked into memory until freed
+
+ the cache controller may function either as a library within a process,
+ or as a cache controller server process shared by multiple processes
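+
+ a minimal sketch of the map-and-prefault step, using only the standard
+ mmap/madvise calls (error handling and the cache list bookkeeping are
+ omitted; offset must be page aligned):
+
+     #include <sys/mman.h>
+     #include <sys/types.h>
+
+     /* Map a file region and queue it to be faulted into memory. */
+     static void *
+     map_and_prefault (int fd, off_t offset, size_t nbytes)
+     {
+         void *addr = mmap (NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);
+         if (addr != MAP_FAILED)
+             (void) madvise (addr, nbytes, MADV_WILLNEED);
+         return (addr);
+     }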
+
+
+vm_uncacheregion (vm, fd, offset, nbytes)
+
+ check whether the indicated region is mapped
+ if so, unmap the pages
+ if no more pages remain mapped, remove file from cache list
+
+
+vm_reservespace (vm, nbytes)
+
+ unmap file segments from tail of list until the requested space
+ (plus some extra space) is available for reuse
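+
+ in code terms, roughly the following (the Cache and Segment types and
+ the unlink_segment helper are hypothetical):
+
+     /* Unmap segments from the tail of the cache list until the
+      * requested space, plus some slack, has been released.
+      */
+     static void
+     reserve_space (Cache *vm, size_t nbytes)
+     {
+         size_t freed = 0, goal = nbytes + nbytes / 4;    /* 25% slack */
+         Segment *sp;
+
+         while (freed < goal && (sp = vm->tail) != NULL) {
+             munmap (sp->addr, sp->nbytes);
+             freed += sp->nbytes;
+             unlink_segment (vm, sp);
+         }
+     }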
+
+
+data structures
+
+ caching mechanism is file-oriented
+ linked list of mapped regions (each from a file)
+ for each region keep track of file descriptor, offset, size
+ linked list of file descriptors
+ for each file keep track of file size, mtime,
+ type of mapping (full,region) and so on
+
+ some dynamic things such as the size of a file or whether pages are memory
+ resident can only be determined by querying the system at runtime
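+
+ a hypothetical rendering of the descriptors outlined above:
+
+     #include <sys/types.h>
+     #include <time.h>
+
+     struct segment {
+         struct segment *next, *prev;   /* cache list, allocation order */
+         struct segment *nextseg;       /* next segment of the same file */
+         void   *addr;                  /* mapped address */
+         off_t  offset;                 /* file offset of segment */
+         size_t nbytes;                 /* segment size */
+     };
+
+     struct vmfile {
+         int    fd;                     /* file descriptor */
+         off_t  size;                   /* file size at last access */
+         time_t mtime;                  /* modify time at last access */
+         int    flags;                  /* type of mapping (full,region) */
+         struct segment *segs;          /* mapped segments of this file */
+     };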
+
+
+
+Solaris VM Interface
+
+ madvise (addr, len, advice)
+ mmap (addr, len, prot, flags, fildes, off)
+ munmap (addr, len)
+ mlock (addr, len)
+ munlock (addr, len)
+ memcntl (addr, len, cmd, arg, attr, mask)
+ mctl (addr, len, function, arg)
+ mincore (addr, len, *vec)
+ msync (addr, len, flags)
+
+ Notes
+ Madvise can be used to request that a range of pages be faulted
+ into memory (MADV_WILLNEED), or freed from memory (MADV_DONTNEED)
+
+ Mctl can be used to invalidate page mappings in a region
+
+ Mincore can be used to determine if pages in a given address range
+ are resident in memory
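+
+ For example, the following sketch counts the resident pages in a
+ mapped region (Solaris declares the vec argument as char *; on Linux
+ it is unsigned char *):
+
+     #include <stdlib.h>
+     #include <unistd.h>
+     #include <sys/mman.h>
+
+     /* Count how many pages of a mapped region are resident in memory. */
+     static long
+     resident_pages (char *addr, size_t len)
+     {
+         long pagesize = sysconf (_SC_PAGESIZE);
+         long i, nres = 0, npages = (len + pagesize - 1) / pagesize;
+         char *vec = malloc (npages);
+
+         if (vec != NULL && mincore (addr, len, vec) == 0)
+             for (i=0;  i < npages;  i++)
+                 if (vec[i] & 1)
+                     nres++;
+         free (vec);
+         return (nres);
+     }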
+
+
+
+VMCACHED -- December 2001
+------------------------------
+
+Added VMcache daemon and IRAF interface to same
+Design notes follow
+
+
+Various Cache Control Algorithms
+
+ 1. No Cache
+
+ No VMcache daemon. Clients use their builtin default i/o mechanism,
+ e.g., either normal or direct i/o depending upon the file size.
+
+ 2. Manually or externally controlled cache
+
+ Files are cached only when directed. Clients connect to the cache
+ daemon to see if files are in the cache and if so use normal VM i/o
+ to access data in the cache. If the file is not cached the client
+ uses its default i/o mechanism, e.g., direct i/o.
+
+ 3. LRU Cache
+
+ A client file access causes the accessed file to be cached. Normal
+ VM i/o is used for file i/o. As new files are cached the space
+ used by the least recently used files is reclaimed. Accessing a
+ file moves it to the head of the cache, if it is still in the cache.
+ Otherwise it is reloaded.
+
+ 4. Adaptive Priority Cache
+
+ This is like the LRU cache, but the cache keeps statistics on files
+ whether or not they have aged out of the cache, and raises the
+ cache priority or lifetime of files that are more frequently
+ accessed. Files that are only accessed once tend to pass quickly
+ through the cache, or may not even be cached until the second
+ access. Files that are repeatedly accessed have a higher priority
+ and will tend to stay in the cache.
+
+The caching mechanism and algorithm used are independent of the client
+programs, hence can be easily tuned or replaced with a different algorithm.
+
+Factors determining if a file is cached:
+
+ user-assigned priority (0=nocache; 1-N=cache priority)
+ number of references
+ time since last access (degrades nref)
+ amount of available memory (cutoff point)
+
+Cache priority
+
+ priority = userpri * max (0,
+     (nref - refbase) - ((time - last_access) / tock))
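+
+ expressed as code (a literal transcription of the formula; the argument
+ types are assumptions):
+
+     #include <time.h>
+
+     /* Compute the cache priority of a file per the above formula. */
+     static int
+     cache_priority (int userpri, int nref, time_t last_access,
+         int refbase, int tock)
+     {
+         int p = nref - refbase - (int)((time(NULL) - last_access) / tock);
+         return (userpri * (p > 0 ? p : 0));
+     }
+
+ e.g., with refbase=1 and tock=60, a file referenced 5 times and last
+ accessed two minutes ago has priority userpri * max(0, 5-1-2), i.e.,
+ 2*userpri.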
+
+Tunable parameters
+
+ userpri User defined file priority. Files with a higher
+ priority stay in the cache longer. A zero priority
+ prevents a file from being cached.
+
+ refbase The number of file references has to exceed refbase
+ before the file will be cached. For example, if
+ refbase=0 the file will be cacheable on the first
+ reference. If refbase=1 a file will only become
+ cacheable if accessed two or more times. Refbase
+ can be used to exclude files from the cache that
+ are only referenced once and hence are not worth
+ caching.
+
+ tock While the number of accesses increases the cache
+ priority of a file, the time interval since the
+ last access conversely decreases the cache priority
+ of the file. A time interval of "tock" seconds
+ will cancel out one file reference. In effect,
+ tock=N means that a file reference increases the
+ cache priority of a file for N seconds. A
+ frequently referenced file will be relatively
+ unaffected by tock, but tock will cause
+ infrequently referenced files to age out of the
+ cache within a few tocks.
+
+Cache Management
+
+ Manual cache control
+
+ Explicitly caching or refreshing a file always maps the file into
+ memory and moves it to the head of the cache.
+
+ File access
+
+ Accessing a file (vm_accessfile) allows cache optimization to
+ occur. The file nref and access time are updated and the priority
+ of the current file and all files (to a certain depth in the cache
+ list) are recomputed. If a whole-file level access is being
+ performed the file size is examined to see if it has changed and
+ if the file has gotten larger a new segment is created. The
+ segment descriptor is then unlinked and relinked in the cache in
+ cache priority order. If the segment is above the VM cutoff it
+ is loaded into the cache: lower priority segments are freed as
+ necessary, and if the file is an existing file it is marked
+ WILL_NEED to queue the file data to be read into memory.
+
+ If the file is a new file it must already have been created
+ externally to be managed under VMcache. The file size at access
+ time will determine the size of the file entry in the cache. Some
+ systems (BSD, Sun) allow an mmap to extend beyond the end of a
+ file, but others (Linux) do not. To reserve space for a large
+ file where the ultimate size of the file is known in advance, one
+ can write a byte where the last byte of the file will be (as with
+ zfaloc in IRAF) before caching the file, and the entire memory
+ space will be reserved in advance. If a file is cached and later
+ extended, re-accessing the file will automatically cache the new
+ segment of the file (see above).
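+
+ the pre-extension trick is simple to sketch (this shows the general
+ technique only, not the actual zfaloc code):
+
+     #include <sys/types.h>
+     #include <unistd.h>
+
+     /* Extend a new file to its final size by writing its last byte,
+      * so the full region can be mapped and reserved up front.
+      */
+     static int
+     prealloc (int fd, off_t finalsize)
+     {
+         if (lseek (fd, finalsize - 1, SEEK_SET) < 0)
+             return (-1);
+         return (write (fd, "", 1) == 1 ? 0 : -1);
+     }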
+
+ Data structures
+
+ Segment descriptors
+ List of segments linked in memory allocation order
+ first N segments are cached (whatever will fit)
+ remainder are maintained in list, but are not cached
+ manually cached/refreshed segments go to head of list
+ accessed files are inserted in list based on priority
+ List of segments belonging to the same file
+ a file can be stored in the cache in multiple segments
+
+ File hash table
+ provides fast lookup of an individual file
+ hash dev+ino to segment
+ segment points to next segment if collision occurs
+ only initial/root file segment is indexed
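+
+ e.g., a minimal dev+ino hash (the hash function itself is an
+ assumption):
+
+     #include <sys/types.h>
+
+     #define VM_HASHSIZE 512            /* assumed table size */
+
+     /* Hash a file's (device,inode) pair to a bucket in the file table. */
+     static int
+     vm_hash (dev_t dev, ino_t ino)
+     {
+         return ((int) ((ino * 31 + dev) % VM_HASHSIZE));
+     }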
+
+ Cache management
+
+ Relinking of the main list occurs only in certain circumstances
+ when a segment is manually cached/uncached/refreshed
+ referenced segment moves to head of list
+ new segment is always cached
+ when a file or segment is accessed
+ priority of each element is computed and segment is
+ placed in priority order (only referenced segment is moved)
+ caching/uncaching may occur due to new VM cutoff
+ when a new segment is added
+ when an old segment is deleted
+ Residency in memory is determined by link order
+ priority normally determines memory residency
+ but manual caching will override (for a time)
+
+
+File Driver Issues
+
+ Image kernels
+
+ Currently only OIF uses the SF driver. FXF, STF, and QPF (FMIO)
+ all use the BF driver. Some or all could be changed to use SF
+ if it is made compatible with BF, otherwise the VM hooks need
+ to go into the BF driver. Since potentially any large file can
+ be cached, putting the VM support into BF is a reasonable option.
+
+ The FITS kernel is a problem currently as it violates device
+ block size restrictions, using a block size of 2880.
+
+ It is always a good idea to use falloc to pre-allocate storage for
+ a large imagefile when the size is known in advance. This permits
+ the VM system to reserve VM space for a new image before data is
+ written to the file.
+
+ Direct I/O
+
+ Direct i/o is possible only if transfers are aligned on device
+ blocks and are an integral number of blocks in length.
+
+ Direct i/o flushes any VM buffered data for the file. If a file
+ is mapped into memory this is not possible, hence direct i/o is
+ disabled for a file while it is mapped into memory.
+
+ This decision is made at read/write time, hence cannot be
+ determined reliably when a file is opened.
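+
+ in code terms the alignment constraint is just the following (dbsize,
+ the device block size, is assumed to come from the driver):
+
+     #include <sys/types.h>
+
+     /* A transfer qualifies for direct i/o only if both the file
+      * offset and the transfer length are integral multiples of the
+      * device block size.
+      */
+     static int
+     directio_ok (off_t offset, size_t nbytes, long dbsize)
+     {
+         return (offset % dbsize == 0 && nbytes % dbsize == 0);
+     }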
+
+ FITS Kernel
+
+ Until the block size issues can be addressed, direct i/o cannot
+ be used for FITS images. Some VM cache control is still possible
+ however. Options include:
+
+ o Always cache a .fits image: either set vmcached to cache a file
+ on the first access, or adjust the cache parameters based on
+ the file type. Use a higher priority for explicitly cached
+ files (e.g. Mosaic readouts), so that running a sequence of
+ normal i/o images through the cache does not flush the high
+ priority images.
+
+ o Writing to new files which have not been pre-allocated is
+ problematic as a large amount of data can be written, causing
+ paging. One way to deal with this is to use large transfers
+ (IMIO will already do this), and to issue a reservespace
+ directive on each file write at EOF, to free up VM space as
+ needed. The next access directive would cause the new
+ portion of the image to be mapped into the cache.
+
+ A possible problem with this is that the new file may initially
+ be too small to reach the cache threshold. Space could be
+ reserved in any case, waiting for the next access to cache
+ the file; the cache daemon could always cache new files of a
+ certain type; or the file could be cached when it reaches the
+ cache threshold.
+
+ Kernel File Driver
+
+ An environment variable will be used in the OS driver to define a
+ cache threshold or to disable use of VMcache entirely. We need
+ to be able to specify these two things separately. If a cache
+ threshold is set, files smaller than this size will not result in
+ a query to the cache daemon. If there is no cache threshold but
+ VMcache is enabled, the cache daemon will decide whether the file
+ is too small to be cached. It should also be possible to force
+ the use of direct i/o if the file is larger than a certain size.
+
+ Kernel file driver parameters:
+
+ enable Boolean; enables or disables the use of VMcache entirely.
+
+ vmcache Use vmcache only if the file size equals or exceeds
+ the specified threshold.
+
+ directio If the file size equals or exceeds the specified
+ threshold use direct i/o to access the file. If
+ direct i/o is enabled in this fashion then vmcache
+ is not used (otherwise vmcache decides whether to
+ use direct i/o for a file).
+
+ port Socket number to be used.
+
+ VMPORT=8797
+ VMCLIENT=enable,threshold=10m,directio=10m
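+
+ a sketch of how the driver might parse the size fields (the helper is
+ hypothetical; only the variable syntax shown above is from these notes):
+
+     #include <stdlib.h>
+
+     /* Parse a size spec such as "10m" (k/m suffix convention). */
+     static long
+     vm_getsize (const char *s)
+     {
+         char *ep;
+         long val = strtol (s, &ep, 10);
+
+         if (*ep == 'k')
+             val *= 1024;
+         else if (*ep == 'm')
+             val *= 1024 * 1024;
+         return (val);
+     }
+
+ the port would then be read with atoi(getenv("VMPORT")), and the
+ threshold and directio values with vm_getsize.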
+