File Entity Reference
Fields
- size (integer, positive, non-zero): Size of file in bytes. Eg: 1048576.
- md5 (string): MD5 hash in lower-case hex. Eg: "d41efcc592d1e40ac13905377399eb9b".
- sha1 (string): SHA-1 hash in lower-case hex. Not technically required, but the most-used of the hash fields and should always be included. Eg: "f013d66c7f6817d08b7eb2a93e6d0440c1f3e7f8".
- sha256: SHA-256 hash in lower-case hex. Eg: "a77e4c11a57f1d757fca5754a8f83b5d4ece49a2d28596889127c1a2f3f28832".
- urls: An array of "typed" URLs. Order is not meaningful, and may not be preserved.
    - url (string, required): Eg: "https://example.edu/~frau/prcding.pdf".
    - rel (string, required): Eg: "webarchive", see vocabulary below.
- mimetype (string): Format of the file. If XML, specific schema can be included after a "+". Example: "application/pdf"
- content_scope (string): for situations where the file does not simply contain the full representation of a work (eg, fulltext of an article, for an article-journal release), describes what that scope of coverage is. Eg, entire issue, corrupt file. See vocabulary below.
- release_ids (array of string identifiers): references to release entities that this file represents a manifestation of. Note that a single file can contain multiple release references (eg, a PDF containing a full issue with many articles), and that a release will often have multiple files (differing only by watermarks, or different digitizations of the same printed work, or variant MIME/media types of the same published work).
- extra (object with string keys): additional metadata about this file
    - path: filename, with optional path prefix. path must be "relative", not "absolute", and should use UNIX-style forward slashes, not Windows-style backward slashes
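The field constraints above can be illustrated with a small checker. This is only a sketch under stated assumptions, not the catalog's own validation code: the function name check_file_entity, the error messages, and the example path "papers/prcding.pdf" are hypothetical, and the hash lengths (32, 40, and 64 hex characters) follow from the MD5, SHA-1, and SHA-256 digest sizes.

```python
import re

# Hex digest lengths for each hash field named in the reference.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def check_file_entity(entity: dict) -> list:
    """Hypothetical helper: return a list of problems (empty means none found)."""
    problems = []

    # size: integer, positive, non-zero (bool is excluded explicitly).
    size = entity.get("size")
    if size is not None:
        if isinstance(size, bool) or not isinstance(size, int) or size <= 0:
            problems.append("size must be a positive, non-zero integer")

    # Hashes: lower-case hex of the expected length.
    for field, length in HASH_LENGTHS.items():
        value = entity.get(field)
        if value is not None:
            if not re.fullmatch(r"[0-9a-f]{%d}" % length, str(value)):
                problems.append(f"{field} must be {length} lower-case hex characters")

    # urls: every entry needs both a url and a rel.
    for item in entity.get("urls") or []:
        if not item.get("url") or not item.get("rel"):
            problems.append("each urls entry needs both url and rel")

    # extra.path: relative, with UNIX-style forward slashes.
    path = (entity.get("extra") or {}).get("path")
    if path is not None and (path.startswith("/") or "\\" in path):
        problems.append("extra.path must be relative and use forward slashes")

    return problems


# Example entity assembled from the sample values in the field list;
# the path value is made up for illustration.
example = {
    "size": 1048576,
    "md5": "d41efcc592d1e40ac13905377399eb9b",
    "sha1": "f013d66c7f6817d08b7eb2a93e6d0440c1f3e7f8",
    "sha256": "a77e4c11a57f1d757fca5754a8f83b5d4ece49a2d28596889127c1a2f3f28832",
    "urls": [{"url": "https://example.edu/~frau/prcding.pdf", "rel": "web"}],
    "mimetype": "application/pdf",
    "extra": {"path": "papers/prcding.pdf"},
}

print(check_file_entity(example))
```

Returning a list of problems rather than raising on the first one is a design choice for this sketch; it lets a caller report every issue with an entity at once.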
URL rel Vocabulary
- web: generic public web sites; for http/https URLs, this should be the default
- webarchive: full URL to a resource in a long-term web archive
- repository: direct URL to a resource stored in a repository (eg, an institutional or field-specific research data repository)
- academicsocial: academic social networks (such as academia.edu or ResearchGate)
- publisher: resources hosted on publisher's website
- aggregator: fulltext aggregator or search engine, like CORE or Semantic Scholar
- dweb: content hosted on distributed/decentralized web protocols, such as dat:// or ipfs:// URLs
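As a rough illustration of the vocabulary above, the sketch below picks a fallback rel value from the URL scheme alone: "web" for http/https, as the list recommends, and "dweb" for dat:// or ipfs:// URLs. The names REL_VOCABULARY and default_rel are hypothetical, and the more specific values (webarchive, repository, publisher, and so on) depend on knowledge of the host that a scheme check cannot provide.

```python
from typing import Optional
from urllib.parse import urlsplit

# rel values listed in the vocabulary above.
REL_VOCABULARY = {
    "web", "webarchive", "repository", "academicsocial",
    "publisher", "aggregator", "dweb",
}


def default_rel(url: str) -> Optional[str]:
    """Hypothetical helper: fallback rel based only on the URL scheme."""
    scheme = urlsplit(url).scheme.lower()
    if scheme in ("http", "https"):
        return "web"   # the stated default for http/https URLs
    if scheme in ("dat", "ipfs"):
        return "dweb"  # distributed/decentralized web protocols
    return None        # no default stated for other schemes


print(default_rel("https://example.edu/~frau/prcding.pdf"))
```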
content_scope Vocabulary
This same vocabulary is shared between file, fileset, and webcapture entities; not all the fields make sense for each entity type.
- if not set, assume that the artifact entity is valid and represents a complete copy of the release
- issue: artifact contains an entire issue of a serial publication (eg, issue of a journal), representing several releases in full
- abstract: contains only an abstract (short description) of the release, not the release itself (unless the release_type itself is abstract, in which case it is the entire release)
- index: index of a journal, or series of abstracts from a conference
- slides: slide deck (usually in "landscape" orientation)
- front-matter: non-article content from a journal, such as editorial policies
- supplement: usually a file entity which is a supplement or appendix, not the entire work
- component: a sub-component of a release, which may or may not be associated with a component release entity. For example, a single figure or table as part of an article
- poster: digital copy of a poster, eg as displayed at conference poster sessions
- sample: a partial sample of the entire work. eg, just the first page of an article. distinct from truncated
- truncated: the file has been truncated at a binary level, and may also be corrupt or invalid. distinct from sample
- corrupt: broken, mangled, or corrupt file (at the binary level)
- stub: any other out-of-scope artifact situations, where the artifact represents something which would not link to any possible in-scope release in the catalog (except a stub release)
- landing-page: for webcapture, the landing page of a work, as opposed to the work itself
- spam: content is spam. articles, webpages, or issues which include incidental advertisements within them are not counted as spam
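One way a consumer might act on this vocabulary is sketched below. The helper names are hypothetical and the entity is assumed to be a plain dict; the logic follows the rules stated above: an unset content_scope means a valid, complete copy of the release, while truncated and corrupt both describe damage at the binary level.

```python
# content_scope values listed in the vocabulary above.
CONTENT_SCOPE_VOCABULARY = {
    "issue", "abstract", "index", "slides", "front-matter", "supplement",
    "component", "poster", "sample", "truncated", "corrupt", "stub",
    "landing-page", "spam",
}

# Scopes described as damage at the binary level.
BINARY_DAMAGE_SCOPES = {"truncated", "corrupt"}


def is_complete_copy(entity: dict) -> bool:
    """Unset content_scope: assume a valid, complete copy of the release."""
    return entity.get("content_scope") is None


def is_binary_damaged(entity: dict) -> bool:
    """True when the scope marks the file as truncated or corrupt."""
    return entity.get("content_scope") in BINARY_DAMAGE_SCOPES


def has_known_scope(entity: dict) -> bool:
    """True when content_scope is unset or one of the listed values."""
    scope = entity.get("content_scope")
    return scope is None or scope in CONTENT_SCOPE_VOCABULARY


print(is_complete_copy({"size": 1048576}))
print(is_binary_damaged({"content_scope": "truncated"}))
```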