Virtual Memory Caching Scheme
Mon Oct 25 1999 - Thu Jan 20 2000
OVERVIEW [now somewhat dated]
Most modern Unix systems implement ordinary file i/o by mapping files into
host memory, faulting the file pages into memory, and copying data between
process memory and the cached file pages. This has the effect of caching
recently read file data in memory. This scheme replaces the old Unix
buffer cache, with the advantage that there is no built-in limit on the
size of the cache. The global file cache is shared by both data files and
the file pages of executing programs, and will grow until all physical
memory is in use.
The advantage of the virtual memory file system (VMFS) is that it makes
maximal use of system memory for caching file data. If a relatively static
set of data is repeatedly accessed it will remain in the system file cache,
speeding access and minimizing i/o and page faulting. The disadvantage
is the same thing: VMFS makes maximal use of system memory for caching
file data. Programs which do heavy file i/o, reading a large amount of
data, fault in a great deal of file data pages which may only be accessed
once. Once the free list is exhausted the system page daemon runs to
reclaim old file pages for reuse. The system pages heavily and becomes
inefficient.
The goal of the file caching scheme presented here is to continue to cache
file data in the global system file cache, but control how data is cached to
minimize use of the pageout daemon which runs when memory is exhausted. This
scheme makes use of the *existing operating system kernel facilities*
to cache the file data and use the cached data for general file access.
The trick is to try to control how data is loaded into the cache, and when
it is removed from the cache, so that cache space is reused efficiently
without invoking the system pageout daemon. Since data is cached by the
system the cache benefits all programs which access the cached file data,
without requiring that the programs explicitly use any cache facilities
such as a custom library.
HOW IT WORKS
INTERFACE
vm = vm_initcache (initstr)
vm_closecache (vm)
vm_cachefile (vm, fname, flags)
vm_cachefd (vm, fd, flags)
vm_uncachefile (vm, fname)
vm_uncachefd (vm, fd)
vm_cacheregion (vm, fd, offset, nbytes, flags)
vm_uncacheregion (vm, fd, offset, nbytes)
vm_reservespace (vm, nbytes)
vm_sync (vm, fd)
vm_cacheregion (vm, fd, offset, nbytes, flags)
check whether the indicated region is mapped (vm descriptor)
if not, free space from the tail of the cache; map new region
request that mapped region be faulted into memory (madvise)
move referenced file to head of cache
redundant requests are harmless, but will reload any missing pages,
and cause the file to again be moved to the head of the cache list
may need to scan the cache periodically to make adjustments for
files that have changed in size, or been deleted, while still in
the cache
cached regions may optionally be locked into memory until freed
the cache controller may function either as a library within a process,
or as a cache controller server process shared by multiple processes
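For illustration, the map-and-fault step above can be done with just mmap
and madvise. The following is a minimal C sketch; the cache list
bookkeeping (freeing space from the tail, moving the file to the head of
the list) is omitted, and the offset is assumed to be page aligned:
    #include <sys/types.h>
    #include <sys/mman.h>

    /* Map a file region and queue its pages to be faulted in.
     * Returns the mapped address, or NULL on failure.
     */
    static void *
    cache_region (int fd, off_t offset, size_t nbytes)
    {
        void *addr = mmap (NULL, nbytes, PROT_READ, MAP_SHARED, fd, offset);
        if (addr == MAP_FAILED)
            return (NULL);

        /* Ask the kernel to read the pages in asynchronously. */
        (void) madvise (addr, nbytes, MADV_WILLNEED);
        return (addr);
    }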
vm_uncacheregion (vm, fd, offset, nbytes)
check whether the indicated region is mapped
if so, unmap the pages
if no more pages remain mapped, remove file from cache list
vm_reservespace (vm, nbytes)
unmap file segments from tail of list until the requested space
(plus some extra space) is available for reuse
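A sketch of the reclamation loop, assuming hypothetical cache and region
list types; only munmap is a real system call here:
    #include <stddef.h>
    #include <sys/mman.h>

    struct region {                     /* hypothetical LRU list node */
        void *addr;
        size_t nbytes;
        struct region *prev;            /* toward head (most recent) */
    };

    struct cache {                      /* hypothetical cache state */
        struct region *tail;            /* least recently used region */
        size_t cached, limit;           /* bytes cached, cache limit */
    };

    /* Unmap segments from the tail of the list until the requested
     * space, plus some slack, is available for reuse.
     */
    static void
    reservespace (struct cache *vm, size_t nbytes)
    {
        size_t want = nbytes + nbytes / 8;      /* some extra space */

        while (vm->tail && vm->cached + want > vm->limit) {
            struct region *r = vm->tail;
            (void) munmap (r->addr, r->nbytes);
            vm->cached -= r->nbytes;
            vm->tail = r->prev;                 /* unlink; free omitted */
        }
    }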
data structures
caching mechanism is file-oriented
linked list of mapped regions (each from a file)
for each region keep track of file descriptor, offset, size
linked list of file descriptors
for each file keep track of file size, mtime,
type of mapping (full,region) and so on
some dynamic things, such as the size of a file or whether pages are
memory resident, can only be determined by querying the system at runtime
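The descriptors might look something like the following C structures; the
field names are illustrative, not actual VMcache definitions:
    #include <sys/types.h>
    #include <time.h>

    /* One mapped region of a file. */
    struct vm_region {
        int fd;                     /* file descriptor */
        off_t offset;               /* byte offset of region in file */
        size_t nbytes;              /* size of mapped region */
        void *addr;                 /* address returned by mmap */
        struct vm_region *next;     /* next region in cache order */
    };

    /* Per-file state; a file may have several mapped regions. */
    struct vm_file {
        off_t size;                 /* file size when mapped */
        time_t mtime;               /* modify time when mapped */
        int maptype;                /* full-file or region mapping */
        struct vm_region *regions;  /* this file's mapped regions */
        struct vm_file *next;       /* next file in descriptor list */
    };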
Solaris VM Interface
madvise (addr, len, advice)
mmap (addr, len, prot, flags, fildes, off)
munmap (addr, len)
mlock (addr, len)
munlock (addr, len)
memcntl (addr, len, cmd, arg, attr, mask)
mctl (addr, len, function, arg)
mincore (addr, len, *vec)
msync (addr, len, flags)
Notes
Madvise can be used to request that a range of pages be faulted
into memory (MADV_WILLNEED), or freed from memory (MADV_DONTNEED)
Mctl can be used to invalidate page mappings in a region
Mincore can be used to determine if pages in a given address range
are resident in memory
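For example, residency of a mapped range can be checked with mincore, as
in this sketch (the declared type of the vec argument differs slightly
between systems, hence the cast):
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mman.h>

    /* Count how many pages of a mapped region are resident in memory.
     * addr must be page aligned.
     */
    static size_t
    resident_pages (void *addr, size_t len)
    {
        size_t pagesize = (size_t) sysconf (_SC_PAGESIZE);
        size_t i, n = 0, npages = (len + pagesize - 1) / pagesize;
        char *vec = malloc (npages);

        if (vec == NULL || mincore (addr, len, (void *) vec) < 0) {
            free (vec);
            return (0);
        }
        for (i = 0; i < npages; i++)
            if (vec[i] & 1)             /* low bit set: resident */
                n++;
        free (vec);
        return (n);
    }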
VMCACHED -- December 2001
------------------------------
Added VMcache daemon and IRAF interface to same
Design notes follow
Various Cache Control Algorithms
1. No Cache
No VMcache daemon. Clients use their builtin default i/o mechanism,
e.g., either normal or direct i/o depending upon the file size.
2. Manually or externally controlled cache
Files are cached only when directed. Clients connect to the cache
daemon to see if files are in the cache and if so use normal VM i/o
to access data in the cache. If the file is not cached the client
uses its default i/o mechanism, e.g., direct i/o.
3. LRU Cache
A client file access causes the accessed file to be cached. Normal
VM i/o is used for file i/o. As new files are cached the space
used by the least recently used files is reclaimed. Accessing a
file moves it to the head of the cache, if it is still in the cache.
Otherwise it is reloaded.
4. Adaptive Priority Cache
This is like the LRU cache, but the cache keeps statistics on files
whether or not they have aged out of the cache, and raises the
cache priority or lifetime of files that are more frequently
accessed. Files that are only accessed once tend to pass quickly
through the cache, or may not even be cached until the second
access. Files that are repeatedly accessed have a higher priority
and will tend to stay in the cache.
The caching mechanism and algorithm used are independent of the client
programs, hence can be easily tuned or replaced with a different algorithm.
Factors determining if a file is cached:
user-assigned priority (0=nocache; 1-N=cache priority)
number of references
time since last access (degrades nref)
amount of available memory (cutoff point)
Cache priority
priority = userpri * max (0, (nref - refbase) - (time - last_access) / tock)
Tunable parameters
userpri User defined file priority. Files with a higher
priority stay in the cache longer. A zero priority
prevents a file from being cached.
refbase The number of file references has to exceed refbase
before the file will be cached. For example, if
refbase=0 the file will be cacheable on the first
reference. If refbase=1 a file will only become
cacheable if accessed two or more times. Refbase
can be used to exclude files from the cache that
are only referenced once and hence are not worth
caching.
tock While the number of accesses increases the cache
priority of a file, the time interval since the
last access conversely decreases the cache priority
of the file. A time interval of "tock" seconds
will cancel out one file reference. In effect,
tock=N means that a file reference increases the
cache priority of a file for N seconds. A
frequently referenced file will be relatively
unaffected by tock, but tock will cause
infrequently referenced files to age out of the
cache within a few tocks.
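The formula translates directly to code. A sketch, with argument names
following the tunable parameters:
    #include <time.h>

    /* Compute the cache priority of a file per the formula above. */
    int
    cache_priority (int userpri, int nref, int refbase,
        time_t last_access, int tock)
    {
        time_t now = time (NULL);
        long p = (nref - refbase) - (long) (now - last_access) / tock;

        return (userpri * (p > 0 ? (int) p : 0));
    }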
Cache Management
Manual cache control
Explicitly caching or refreshing a file always maps the file into
memory and moves it to the head of the cache.
File access
Accessing a file (vm_accessfile) allows cache optimization to
occur. The file nref and access time are updated, and the priorities
of the current file and of all files (to a certain depth in the cache
list) are recomputed. If a whole-file level access is being
performed the file size is examined to see if it has changed and
if the file has gotten larger a new segment is created. The
segment descriptor is then unlinked and relinked in the cache in
cache priority order. If the segment is above the VM cutoff it
is loaded into the cache: lower priority segments are freed as
necessary, and if the file is an existing file it is marked
WILL_NEED to queue the file data to be read into memory.
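A sketch of the access-time bookkeeping, assuming a hypothetical segment
type and reusing the cache_priority() sketch above; the priority-order
relink and the grown-file check are omitted:
    #include <stddef.h>
    #include <time.h>
    #include <sys/mman.h>

    struct seg {                        /* hypothetical cache segment */
        void *addr;
        size_t nbytes;
        int userpri, nref, refbase, tock;
        time_t last_access;
    };

    extern int cache_priority (int userpri, int nref, int refbase,
        time_t last_access, int tock);

    /* Update reference statistics on access; if the segment is above
     * the VM cutoff, queue its data to be read into memory.
     */
    static void
    access_segment (struct seg *s, int vm_cutoff)
    {
        int pri;

        s->nref++;
        pri = cache_priority (s->userpri, s->nref, s->refbase,
            s->last_access, s->tock);
        s->last_access = time (NULL);

        if (pri >= vm_cutoff)
            (void) madvise (s->addr, s->nbytes, MADV_WILLNEED);
    }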
If the file is a new file it must already have been created
externally to be managed under VMcache. The file size at access
time will determine the size of the file entry in the cache. Some
systems (BSD, Sun) allow an mmap to extend beyond the end of a
file, but others (Linux) do not. To reserve space for a large
file where the ultimate size of the file is known in advance, one
can write a byte where the last byte of the file will be (as with
zfaloc in IRAF) before caching the file, and the entire memory
space will be reserved in advance. If a file is cached and later
extended, re-accessing the file will automatically cache the new
segment of the file (see above).
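The pre-allocation trick is simple in C; a sketch of writing the last
byte of a file whose final size is known in advance, as zfaloc does:
    #include <unistd.h>
    #include <sys/types.h>

    /* Reserve space for a file of known final size by writing one
     * byte at the last offset.
     */
    static int
    prealloc (int fd, off_t final_size)
    {
        char zero = 0;

        if (lseek (fd, final_size - 1, SEEK_SET) < 0)
            return (-1);
        return (write (fd, &zero, 1) == 1 ? 0 : -1);
    }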
Data structures
Segment descriptors
List of segments linked in memory allocation order
first N segments are cached (whatever will fit)
remainder are maintained in list, but are not cached
manually cached/refreshed segments go to head of list
accessed files are inserted in list based on priority
List of segments belonging to the same file
a file can be stored in the cache in multiple segments
File hash table
provides fast lookup of an individual file
hash dev+ino to segment
segment points to next segment if collision occurs
only initial/root file segment is indexed
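The lookup might be implemented as in this sketch; the bucket count and
segment fields are illustrative:
    #include <sys/types.h>

    #define NBUCKETS 512

    struct segment {
        dev_t dev;                      /* device of file */
        ino_t ino;                      /* inode of file */
        struct segment *hashnext;       /* collision chain */
    };

    static struct segment *bucket[NBUCKETS];

    /* Hash dev+ino to find the root segment of a file, if cached. */
    static struct segment *
    seg_lookup (dev_t dev, ino_t ino)
    {
        unsigned h = ((unsigned) dev * 31 + (unsigned) ino) % NBUCKETS;
        struct segment *s;

        for (s = bucket[h]; s != NULL; s = s->hashnext)
            if (s->dev == dev && s->ino == ino)
                return (s);
        return (NULL);
    }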
Cache management
Relinking of the main list occurs only in certain circumstances
when a segment is manually cached/uncached/refreshed
referenced segment moves to head of list
new segment is always cached
when a file or segment is accessed
priority of each element is computed and segment is
placed in priority order (only referenced segment is moved)
caching/uncaching may occur due to new VM cutoff
when a new segment is added
when an old segment is deleted
Residency in memory is determined by link order
priority normally determines memory residency
but manual caching will override (for a time)
File Driver Issues
Image kernels
Currently only OIF uses the SF driver. FXF, STF, and QPF (FMIO)
all use the BF driver. Some or all could be changed to use SF
if it is made compatible with BF, otherwise the VM hooks need
to go into the BF driver. Since potentially any large file can
be cached, putting the VM support into BF is a reasonable option.
The FITS kernel is currently a problem as it violates device
block size restrictions, using a block size of 2880 bytes.
It is always a good idea to use falloc to pre-allocate storage for
a large imagefile when the size is known in advance. This permits
the VM system to reserve VM space for a new image before data is
written to the file.
Direct I/O
Direct i/o is possible only if transfers are aligned on device
blocks and are an integral number of blocks in length.
Direct i/o flushes any VM buffered data for the file. If a file
is mapped into memory this is not possible, hence direct i/o is
disabled for a file while it is mapped into memory.
This decision is made at read/write time, hence cannot be
determined reliably when a file is opened.
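On Solaris the per-file switch is the directio(3C) advisory call; a
sketch follows, where is_mapped() is a hypothetical query (e.g. to the
cache daemon):
    #include <sys/types.h>
    #include <sys/fcntl.h>

    extern int is_mapped (int fd);      /* hypothetical mapped-file query */

    /* Direct i/o flushes VM buffered data, so disable it while the
     * file is mapped into memory; otherwise enable it.
     */
    static void
    set_io_mode (int fd)
    {
        (void) directio (fd, is_mapped (fd) ? DIRECTIO_OFF : DIRECTIO_ON);
    }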
FITS Kernel
Until the block size issues can be addressed, direct i/o cannot
be used for FITS images. Some VM cache control is still possible
however. Options include:
o Always cache a .fits image: either set vmcached to cache a file
on the first access, or adjust the cache parameters based on
the file type. Use a higher priority for explicitly cached
files (e.g. Mosaic readouts), so that running a sequence of
normal i/o images through the cache does not flush the high
priority images.
o Writing to new files which have not been pre-allocated is
problematic as a large amount of data can be written, causing
paging. One way to deal with this is to use large transfers
(IMIO will already do this), and to issue a reservespace
directive on each file write at EOF, to free up VM space as
needed. The next access directive would cause the new
portion of the image to be mapped into the cache.
A possible problem with this is that the new file may initially
be too small to reach the cache threshold. Space could be
reserved in any case, waiting for the next access to cache
the file; the cache daemon could always cache new files of a
certain type; or the file could be cached when it reaches the
cache threshold.
Kernel File Driver
An environment variable will be used in the OS driver to define a
cache threshold or to disable use of VMcache entirely. We need
to be able to specify these two things separately. If a cache
threshold is set, files smaller than this size will not result in
a query to the cache daemon. If there is no cache threshold but
VMcache is enabled, the cache daemon will decide whether the file
is too small to be cached. It should also be possible to force
the use of direct i/o if the file is larger than a certain size.
Kernel file driver parameters:
enable Boolean flag enabling or disabling use of VMcache.
vmcache Use vmcache only if the file size equals or exceeds
the specified threshold.
directio If the file size equals or exceeds the specified
threshold use direct i/o to access the file. If
direct i/o is enabled in this fashion then vmcache
is not used (otherwise vmcache decides whether to
use direct i/o for a file).
port Socket number to be used.
VMPORT=8797
VMCLIENT=enable,threshold=10m,directio=10m
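A sketch of how the driver might read these variables; the variable names
come from the example above, while the parsing details are assumptions:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main (void)
    {
        char *port = getenv ("VMPORT");
        char *client = getenv ("VMCLIENT");

        int vmport = port ? atoi (port) : 8797;
        int enabled = client != NULL && strstr (client, "enable") != NULL;

        printf ("vmcache %s, port %d\n",
            enabled ? "enabled" : "disabled", vmport);
        return (0);
    }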