Filesystem
The OSL provides a universal, portable and high-performance interface to file system functionality on any operating system. The interface has a few main goals:
Path specifications always have to be absolute. Any usage of relative path specifications is forbidden. The exceptions are osl_getSystemPathFromFileURL(), osl_getFileURLFromSystemPath() and osl_getAbsoluteFileURL(). Most operating systems provide a "Current Directory" per process, which is why relative path specifications can cause problems in multithreaded environments.
Proprietary notations of file paths are not supported. Every path notation must follow the file URL specification. File URLs must be encoded in UTF-8 and then escaped. Although the URL parameter is a Unicode string, it must contain only ASCII characters.
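The "encode as UTF-8, then escape" rule can be illustrated with a minimal sketch. The helper name toFileUrl is hypothetical and not part of the OSL API, and the set of characters left unescaped (the RFC 3986 unreserved characters plus the slash) is an assumption; the real implementation may escape a different set.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the OSL API: turns an absolute
// POSIX-style path that is already UTF-8 encoded into an ASCII-only file URL.
std::string toFileUrl(const std::string& utf8AbsolutePath)
{
    // Relative paths are rejected, mirroring the "absolute only" rule.
    assert(!utf8AbsolutePath.empty() && utf8AbsolutePath[0] == '/');

    static const char hex[] = "0123456789ABCDEF";
    std::string url = "file://";
    for (unsigned char c : utf8AbsolutePath)
    {
        // Assumption: only RFC 3986 unreserved characters and '/' stay as-is.
        const bool keep =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '.' || c == '_' || c == '~' || c == '/';
        if (keep)
        {
            url += static_cast<char>(c);
        }
        else
        {
            // Every other byte, including each byte of a multi-byte UTF-8
            // sequence, is escaped as %XX, so the result is pure ASCII.
            url += '%';
            url += hex[c >> 4];
            url += hex[c & 0x0F];
        }
    }
    return url;
}
```

With this sketch, a space becomes %20 and a two-byte UTF-8 character such as "é" becomes two escaped bytes (%C3%A9).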
The caller cannot find out whether a file system is case-sensitive, case-preserving or neither. The operating system implementation itself should determine if it can map case-insensitive paths. The case-correct notation of a filename or file path is part of the "File Info". This case-correct name can be used as a unique key if necessary.
Obtaining information about files or volumes is controlled by a bitmask which specifies which fields are of interest. For performance reasons it is not recommended to obtain information which is not needed. However, if the operating system provides more information anyway, the implementation can set more fields on output than were requested. It is the responsibility of the caller to decide whether to use this additional information, but they should do so to prevent further unnecessary calls if the information is already there.
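The request/response bitmask pattern might look roughly like the following sketch. All names here (FieldType, StatusSketch, queryStatus, hasField) and the flag values are hypothetical stand-ins, not the real osl_FileStatus_Mask_* constants or the oslFileStatus structure. The stand-in query assumes the operating system delivers type and size together, so both are marked valid even when only one was requested.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field flags; the real osl_FileStatus_Mask_* values differ.
enum : std::uint32_t
{
    FieldType       = 1u << 0,
    FieldSize       = 1u << 1,
    FieldModifyTime = 1u << 2
};

struct StatusSketch
{
    std::uint32_t validFields = 0; // which members below carry real data
    bool isDirectory = false;
    std::uint64_t size = 0;
    std::uint64_t modifyTime = 0;
};

// Stand-in for a status query with made-up data. It assumes type and size
// arrive in one system call, so asking for either one fills in both.
StatusSketch queryStatus(std::uint32_t requested)
{
    StatusSketch s;
    if (requested & (FieldType | FieldSize))
    {
        s.isDirectory = false;
        s.size = 4096;
        s.validFields |= FieldType | FieldSize;
    }
    if (requested & FieldModifyTime)
    {
        s.modifyTime = 1700000000;
        s.validFields |= FieldModifyTime;
    }
    return s;
}

// Caller-side check: use a field only if the output mask marks it as valid.
bool hasField(const StatusSketch& s, std::uint32_t field)
{
    assert(field != 0);
    return (s.validFields & field) == field;
}
```

A caller that asked only for FieldType can test hasField(status, FieldSize) before issuing a second query, which is the "use what is already there" advice from the paragraph above.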
The input bitmask supports a flag, osl_FileStatus_Mask_Validate, which can be used to force retrieval of uncached, validated information. Setting this flag when calling osl_getFileStatus() in combination with no other flag is a synonym for a "FileExists" check. This should only be done when processing a single file (i.e. before opening) and never during enumeration of directory contents at any step of information processing, as it would change the runtime behaviour from O(n) to O(n*n/2) on nearly every file system. On Windows NT-based operating systems, reading the contents of a directory with 7000 entries and getting full information about every file takes only 0.6 seconds. Specifying the flag osl_FileStatus_Mask_Validate for each entry increases the time to 180 seconds.
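One way to see where a quadratic term can come from is a simple cost model. The sketch below assumes that validating the i-th entry costs a scan over the first i directory entries; that assumption is purely illustrative and is not a statement about how any particular file system or the OSL implementation works. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Entries touched when a directory of n entries is enumerated once and the
// cached status is used for every entry.
std::uint64_t visitsWithCachedStatus(std::uint64_t n)
{
    return n;
}

// Entries touched when every entry is additionally validated, under the
// assumption that validating entry i costs a scan over the first i entries.
std::uint64_t visitsWithValidate(std::uint64_t n)
{
    assert(n < (1ULL << 32)); // keeps the sum within 64 bits
    std::uint64_t visits = 0;
    for (std::uint64_t i = 1; i <= n; ++i)
        visits += i;
    return visits; // equals n * (n + 1) / 2
}
```

Under this model a 7000-entry directory goes from 7000 entry visits to 24,503,500, which is the n*n/2 growth the text warns about.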
File URIs
The filesystem abstraction uses file URIs as a way of handling the different file system naming conventions in a cross-platform way. The format of a file URI is specified in RFC 8089 and looks like file://host/path.
The host part is the name of the system on which the file is located (and should be the FQDN), and the path is the directory name that specifies the location of the file in the filesystem. The host part is optional, so you can specify file:///path/to/file.txt.
There is an exception for DOS and Windows drive letters: the file URI includes the drive letter and a colon, followed by the absolute path, for example file:///c:/path/to/file.txt.
Absolute file URIs
The API does not refer to file URIs as Uniform Resource Identifiers, but as file URLs (Uniform Resource Locators), which is arguably a misnomer, as URLs are usually associated with web resources rather than files on local filesystems.
To get an absolute file URI, you must call osl_getAbsoluteFileURL(). The first parameter is the base directory of the relative path, and the second is the path relative to the base directory. Alternatively, if the base parameter is set to NULL or is empty, then the OSL expects the relative parameter to actually hold an absolute URI. This function returns an error code, and uses the third parameter as an output parameter to hold the absolute file URI it generates.
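The base-plus-relative resolution can be sketched on plain path strings. The helper resolveAgainstBase is hypothetical; it works on the path part only (no file: scheme, no escaping) and is not the OSL implementation. It does mirror the rule that an empty base means the second argument must already be absolute.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: resolves "relative" against the absolute directory
// "baseDir", removing "." segments and applying ".." segments.
std::string resolveAgainstBase(const std::string& baseDir,
                               const std::string& relative)
{
    // With an empty base, the second argument must already be absolute.
    const std::string joined =
        baseDir.empty() ? relative : baseDir + "/" + relative;
    assert(!joined.empty() && joined[0] == '/');

    std::vector<std::string> segments;
    std::string current;
    for (std::size_t i = 0; i <= joined.size(); ++i)
    {
        if (i == joined.size() || joined[i] == '/')
        {
            if (current == "..")
            {
                // Step up one level, but never above the root.
                if (!segments.empty())
                    segments.pop_back();
            }
            else if (!current.empty() && current != ".")
            {
                segments.push_back(current);
            }
            current.clear();
        }
        else
        {
            current += joined[i];
        }
    }

    std::string result;
    for (const std::string& segment : segments)
        result += "/" + segment;
    return result.empty() ? "/" : result;
}
```

For example, resolving "../pics/./a.png" against "/home/user/docs" yields "/home/user/pics/a.png".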
System paths
A system path is a filesystem location encoded in the format required by the underlying operating system. Both Unix and Windows have specific quirks that must be handled before LibreOffice can form a file URI. On Unix, osl_getFileURLFromSystemPath() first checks if the path starts with the ~ character (or ~user), and if so replaces it with the appropriate home directory. It then converts any occurrences of repeated slashes to a single slash.
Note: the POSIX standard actually states that any path starting with double slashes should be treated in an implementation-defined manner. This is reported as bug 107967.
Interestingly, we have a quandary, about which I have emailed the listed author of RFC 8089: when we convert from system paths to file URIs, the RFC handles everything except for system paths on POSIX systems that start with double slashes. POSIX leaves this up to the operating system to implement. However, I cannot see anywhere in the RFC where it describes how to handle initial double slashes in file URIs. I have no idea what we should be doing in this case.
There is also another issue whereby ~user does not expand to that user's home directory, which I believe is largely because we haven't implemented anything that lets us impersonate users via the logon functions in OslSecurity.
On Windows, the function checks to see if a UNC path is being used (i.e. of the form \\server\path\to\file.txt), in which case it converts it to the file URI form.
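The two Unix normalisation steps described in this section (tilde expansion and collapsing repeated slashes) can be sketched as follows. The function normalizeUnixPath is hypothetical and takes the home directory as a parameter instead of looking it up. It only expands a bare ~ prefix and leaves ~user untouched, which is an assumption chosen to match the limitation noted above.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: expands a leading "~" using the supplied home
// directory and collapses every run of slashes into a single slash.
std::string normalizeUnixPath(const std::string& path,
                              const std::string& homeDir)
{
    assert(!homeDir.empty() && homeDir[0] == '/');

    std::string expanded = path;
    // Only "~" and "~/..." are expanded; "~user" is left as it is.
    if (path == "~" || path.compare(0, 2, "~/") == 0)
        expanded = homeDir + path.substr(1);

    std::string result;
    for (char c : expanded)
    {
        // Skip a slash that directly follows another slash. Note that this
        // also collapses a leading "//", which POSIX treats as
        // implementation-defined.
        if (c == '/' && !result.empty() && result.back() == '/')
            continue;
        result += c;
    }
    return result;
}
```

The sketch deliberately reproduces the leading double-slash collapse so that the behaviour questioned in the note above is easy to see.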
File searches
Both Windows and Unix have a way of directing the command processor or shell to find files in the filesystem. Both use a PATH environment variable (%path% on Windows, $PATH on Unix) to influence searches, but each does this differently. On Windows, the %path% is searched after the current directory is searched for the executable. On Unix, only the paths in $PATH are searched. Thus, LibreOffice uses a search function to unify the two operating systems, osl_searchFileURL(), which searches for a specified filename in a listed search path, and thereafter searches each of the directories in the system's PATH. The delimiter is not unified, however, so on Windows you must use the semicolon (;) and on Unix you must use the colon (:). The API Doxygen comment for the function states that:
The value of an environment variable should be used (e.g. LD_LIBRARY_PATH) if the caller is not aware of the Operating System and so doesn't know which path list delimiter to use.
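The search order (caller-supplied list first, then the system PATH) and the platform-specific delimiter can be sketched like this. The names kPathListDelimiter, splitSearchPath and candidateLocations are hypothetical, the sketch only builds the ordered list of candidate locations without touching the filesystem, and joining with a forward slash is a simplification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Path list delimiter for the platform this sketch is compiled on.
#ifdef _WIN32
constexpr char kPathListDelimiter = ';';
#else
constexpr char kPathListDelimiter = ':';
#endif

// Splits a path list such as "/usr/bin:/bin" into its directories,
// skipping empty entries.
std::vector<std::string> splitSearchPath(const std::string& list,
                                         char delimiter)
{
    std::vector<std::string> dirs;
    std::string current;
    for (char c : list)
    {
        if (c == delimiter)
        {
            if (!current.empty())
                dirs.push_back(current);
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    if (!current.empty())
        dirs.push_back(current);
    return dirs;
}

// Builds the ordered candidate locations: directories from the caller's
// search path first, then those from the system PATH value.
std::vector<std::string> candidateLocations(const std::string& fileName,
                                            const std::string& searchPath,
                                            const std::string& systemPath,
                                            char delimiter)
{
    assert(!fileName.empty());
    std::vector<std::string> dirs = splitSearchPath(searchPath, delimiter);
    const std::vector<std::string> systemDirs =
        splitSearchPath(systemPath, delimiter);
    dirs.insert(dirs.end(), systemDirs.begin(), systemDirs.end());
    for (std::string& dir : dirs)
        dir += "/" + fileName;
    return dirs;
}
```

A real search would then probe each candidate in order and stop at the first one that exists.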
Temp files
To create a temp file, you must be careful to avoid a race condition whereby a temp file is created and another process then writes to or replaces the file.
There are two functions that can be called:
osl_getTempDirURL() - gets the location of temporary files
osl_createTempFile() - creates a secure temporary file
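The race described above is usually avoided by picking the name and creating the file in a single atomic step. The sketch below shows that idea with the POSIX mkstemp() call; it is not how osl_createTempFile() is implemented, it compiles on POSIX systems only, and the TempFile struct, the createTempFileIn name and the "osl-sketch-" prefix are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>   // mkstemp (POSIX)
#include <string>
#include <unistd.h>  // close, unlink (POSIX)

// Result of the sketch: an open descriptor plus the name that was chosen.
struct TempFile
{
    int fd;           // -1 if creation failed
    std::string path;
};

// Creates a uniquely named file inside "dir". mkstemp() replaces the
// trailing XXXXXX and creates and opens the file in one step, so no other
// process can slip a file in between choosing the name and opening it.
TempFile createTempFileIn(const std::string& dir)
{
    assert(!dir.empty());
    std::string pattern = dir + "/osl-sketch-XXXXXX";
    const int fd = mkstemp(&pattern[0]);
    return TempFile{fd, pattern};
}
```

The caller owns the descriptor and is expected to close() it and unlink() the path when finished.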
File status
In LibreOffice a file is described by its status, or list of attributes associated with the file. This is defined in oslFileStatus.
File types are:
TODO: document how to set file attributes (and file time)
File operations
As with any file system, you can perform a number of logical operations on the files that reside within it via the LibreOffice OSL API. The OSL API follows the Unix file system convention, which uses the following paradigm:
Open the file for usage by the process
The API function that performs this is osl_openFile().
The function is given a file URI, which it converts to a system path, and is provided a set of flags to tell it what mode to open the file in. A file handle that represents the file descriptor is passed back as an output parameter. This is used as a token to refer to the opened file when performing file operations.
Windows and Unix systems use the following flags:
osl_File_OpenFlag_Read
osl_File_OpenFlag_Write
osl_File_OpenFlag_Create
osl_File_OpenFlag_NoLock
Unix systems use the following flags:
osl_File_OpenFlag_Trunc
osl_File_OpenFlag_NoExcl
osl_File_OpenFlag_Private
Move the cursor (current position) to the location in the file where you will be performing an operation (often called seeking).
The API function that sets the position in the file is osl_setFilePos().
It takes a handle to a file and sets the file position based on an offset (uPos) from either the start of the file, the current cursor position, or the end of the file (uHow can be osl_Pos_Absolut, osl_Pos_Current or osl_Pos_End; if the latter, then the offset must be negative).
To get the cursor position in the file, you use osl_getFilePos().
To test if the end of the file is reached, call osl_isEndOfFile().
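The arithmetic behind the three seek origins can be sketched without any file at all. The SeekOrigin enum and both functions are hypothetical stand-ins for the uHow values and the OSL calls. Returning -1 for a position before the start of the file is an assumption of this sketch, not documented OSL behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the three uHow values.
enum class SeekOrigin { Start, Current, End };

// Computes the cursor position a seek would produce. Returns -1 if the
// target would fall before the start of the file (a sketch assumption).
std::int64_t resolvePosition(SeekOrigin how, std::int64_t offset,
                             std::int64_t current, std::int64_t size)
{
    assert(current >= 0 && size >= 0);
    std::int64_t base = 0;
    switch (how)
    {
        case SeekOrigin::Start:   base = 0;       break;
        case SeekOrigin::Current: base = current; break;
        case SeekOrigin::End:     base = size;    break;
    }
    const std::int64_t target = base + offset;
    return target < 0 ? -1 : target;
}

// The cursor is at the end of the file once it reaches the file size.
bool isEndOfFile(std::int64_t position, std::int64_t size)
{
    return position >= size;
}
```

Seeking from the end with a negative offset lands inside the file, for example an offset of -4 in a 100-byte file gives position 96.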
Read or write to the file at this cursor position, and if necessary move the cursor again; repeat as necessary.
The function to read the file is osl_readFile().
The function again takes a handle to an opened file, pBuffer is a pointer to a buffer which receives the data, and uBytesRequested specifies the number of bytes to be read. When the read finishes, the number of bytes read is returned in pBytesRead.
The function that writes to a file is osl_writeFile().
Similar to osl_readFile(), pBuffer is a pointer to the data to be written to the file, uBytesToWrite specifies how many bytes should be written, and pBytesWritten is how many bytes are actually written to the file after the function completes.
There are two variants that allow reads and writes from specific positions in the file: osl_readFileAt() and osl_writeFileAt().
When all file processing is finished, then indicate that the process is done with it by closing the file.
To close the file you call osl_closeFile().
Another function that is quite useful is osl_readLine(), which reads from a file descriptor until it hits either a carriage return (CR), a carriage return/line feed pair (CRLF), or just a line feed (LF).
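The three line terminators can be handled uniformly, as in the following sketch. It works on an in-memory string instead of a file handle, and splitLines is a hypothetical helper, not the osl_readLine() implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits "data" into lines, treating CR, CRLF and LF each as a single
// line terminator. A final line without a terminator is kept as well.
std::vector<std::string> splitLines(const std::string& data)
{
    std::vector<std::string> lines;
    std::string current;
    for (std::size_t i = 0; i < data.size(); ++i)
    {
        const char c = data[i];
        if (c == '\r' || c == '\n')
        {
            // A CR directly followed by LF counts as one terminator.
            if (c == '\r' && i + 1 < data.size() && data[i + 1] == '\n')
                ++i;
            lines.push_back(current);
            current.clear();
        }
        else
        {
            current += c;
        }
    }
    if (!current.empty())
        lines.push_back(current);
    return lines;
}
```

Mixed input such as "a\r\nb\rc\nd" therefore yields the four lines a, b, c and d.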
Shared file mapping is explained further in the IPC chapter, as it can be used as inter-process communication, as well as for other functions that map the file to memory.
Copy, move and delete files
To delete a file, call osl_removeFile(filename). This only works on regular files; if a directory is specified then it returns osl_File_E_ISDIR. To copy a file (not a directory), call osl_copyFile(sourcefile, destfile), and to move a file call osl_moveFile(sourcefile, destfile). When moving a file, file time and attributes are preserved, but no assumptions can be made about files that are copied.
Directory operations
Volume operations