
File handling

PERSEUS can store files in the database. Each file must be linked to an object in the database. This lets you, for example, add metadata to a file or create a logical link between a file and an object.

Additionally, every file can carry tags. You can use them to categorize files and to determine a file's intended purpose later, e.g. when you cannot rely on a consistent filename.

To add a file to the database, use the method DatabaseManager().add_associated_file() and pass:

  • the related object (this object must already exist in the database),
  • the file as a binary stream (BinaryIO),
  • the filename or identifier (optional),
  • a list of tags (optional).
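The file argument does not have to come from disk. The following is a minimal sketch, assuming that add_associated_file() accepts any binary stream, as the list above suggests: bytes you already hold in memory (for example, an upload received by a web service) can be wrapped in io.BytesIO. The helpers as_binary_stream() and is_binary_stream() are hypothetical and only illustrate the difference between binary and text streams.

```python
import io
from typing import BinaryIO


def as_binary_stream(data: bytes) -> BinaryIO:
    """Hypothetical helper: wrap raw bytes in an in-memory binary stream."""
    return io.BytesIO(data)


def is_binary_stream(obj: object) -> bool:
    """Return True for streams that yield bytes, e.g. open(path, "rb") or io.BytesIO."""
    return isinstance(obj, (io.RawIOBase, io.BufferedIOBase))


stream = as_binary_stream(b"%PDF-1.4 example content")
print(is_binary_stream(stream))            # True
print(is_binary_stream(io.StringIO("x")))  # False: text streams yield str, not bytes

# Assumption: the in-memory stream can then be passed like a file opened with "rb":
# db_manager.add_associated_file(project, stream, "publication.pdf", ["publication"])
```

A file opened with open(path, "r") is a text stream and fails the same check, which is why the examples in this section open files with "rb".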

The following example shows how you can link a publication as a PDF to a project.

```python
from perseus.datamanager import DatabaseManager, Project

# Placeholder: replace with the object id of an existing project
project_object_id = "67af092f16affd02f502c903a3"
db_manager = DatabaseManager()
project = db_manager.get_item(Project, oid=project_object_id)

# Important: open with "rb" instead of "r" so the file is read as BinaryIO.
# The with statement closes the file once it has been stored.
with open("uploaded_publication.pdf", "rb") as publication:
    db_manager.add_associated_file(
        project,            # the database item to attach the file to
        publication,        # accepts BinaryIO (buffered I/O)
        "publication.pdf",  # file identifier
        ["publication"],    # a list of tags
    )
```

To retrieve a file, use the method DatabaseManager().get_file() and pass the object id of the file you want to fetch.

To get the object id of a file, first retrieve the object the file is associated with. Then access the files attribute of this object. This attribute holds a dictionary that maps filenames (keys) to the corresponding object ids (values). The following example fetches the file proposal.pdf for a specific project:
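Because files is a plain dictionary, looking up a filename that is not attached to the object raises KeyError. The following is a minimal sketch of a defensive lookup; find_file_id() is a hypothetical helper, and a dict with made-up string values stands in for project.files (in PERSEUS the values are object ids).

```python
from typing import Mapping, Optional


def find_file_id(files: Mapping[str, object], filename: str) -> Optional[object]:
    """Hypothetical helper: return the object id stored for filename, or None."""
    return files.get(filename)


# Stand-in for project.files: filenames mapped to (made-up) object ids
files = {
    "proposal.pdf": "example-id-1",
    "publication.pdf": "example-id-2",
}

file_object_id = find_file_id(files, "proposal.pdf")
if file_object_id is None:
    print("proposal.pdf is not attached; available files:", sorted(files))
else:
    print(f"fetch it with db_manager.get_file({file_object_id!r})")
```

Returning None instead of raising lets the caller decide whether a missing file is an error, mirroring the `if project is not None` check in the example below.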

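```python
from perseus.datamanager import DatabaseManager, Project

db_manager = DatabaseManager()
project = db_manager.get_item(Project, search_filter={"abbreviation": "my-project-abbreviation"})
if project is not None:
    # project.files maps filenames to the object ids of the stored files
    file_object_id = project.files["proposal.pdf"]
    proposal_file = db_manager.get_file(file_object_id)
    ...
```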

DatabaseManager().get_file() returns an instance of gridfs.grid_file.GridOut, a read-only file-like object that yields bytes, so you can read it like a file opened in binary mode (for example with read()). For more information, see the PyMongo documentation for gridfs.grid_file.GridOut.
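Since GridOut supports read(size), a large file can be copied to disk in chunks instead of being loaded into memory with a single read() call. The following is a minimal sketch using shutil.copyfileobj(); save_stream() is a hypothetical helper, and io.BytesIO stands in for the GridOut object returned by get_file().

```python
import io
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


def save_stream(source: BinaryIO, destination: Path, chunk_size: int = 1024 * 1024) -> Path:
    """Hypothetical helper: copy a binary stream to disk chunk by chunk."""
    with open(destination, "wb") as target:  # "wb": write bytes, not text
        shutil.copyfileobj(source, target, chunk_size)
    return destination


# io.BytesIO stands in for the GridOut returned by db_manager.get_file()
source = io.BytesIO(b"example file content")
with tempfile.TemporaryDirectory() as tmp:
    path = save_stream(source, Path(tmp) / "proposal.pdf")
    print(path.read_bytes())  # b'example file content'
```

The chunk size bounds how much of the file is held in memory at once; 1 MiB is an arbitrary default chosen for this sketch.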

The following example is taken from the internal API router that implements the endpoint for downloading a file:

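```python
import pathlib

from bson import ObjectId
from perseus.datamanager import DatabaseManager

...
db_manager = DatabaseManager()
# file_id is the string form of the file's object id
file = db_manager.get_file(ObjectId(file_id))
path = (
    pathlib.Path(__file__).parent.resolve() / ".." / "temp" / f"{file_id}-{file.filename}"
)
with open(path, "wb") as f:  # important: use "wb" instead of "w" to write in binary mode
    f.write(file.read())
...
```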
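The router code builds the target path from the stored filename, which is chosen by whoever uploaded the file. The following is a sketch of a more defensive variant, assuming the same "<file_id>-<filename>" naming inside a temp directory: keep only the last component of the filename and verify that the resolved path stays inside the temp directory. temp_download_path() is a hypothetical helper, not part of PERSEUS.

```python
from pathlib import Path


def temp_download_path(temp_dir: Path, file_id: str, filename: str) -> Path:
    """Hypothetical helper: build "<file_id>-<filename>" inside temp_dir.

    Only the final component of filename is kept, so a stored name such as
    "../../secret.txt" cannot place the file outside temp_dir.
    """
    safe_name = Path(filename).name or "unnamed"
    target = (temp_dir / f"{file_id}-{safe_name}").resolve()
    # Extra check: file_id itself could contain path separators
    if target.parent != temp_dir.resolve():
        raise ValueError(f"refusing to write outside {temp_dir}")
    return target


temp_dir = Path("temp")
print(temp_download_path(temp_dir, "abc123", "../../secret.txt").name)  # abc123-secret.txt
```

In the original router, ObjectId(file_id) already rejects anything that is not a valid object id, so the extra check mainly matters if the helper is reused with other identifiers.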