Code structure
Phasing out
This documentation is about the old PHP-based API, which wil be phased out in favor of (e.g. the new Python-based API (using FastAPI)). See the 'Server' tab for more information.
Important resources¶
REST API docs: www.openml.org/apis
Controller: https://github.com/openml/OpenML/blob/master/openml_OS/controllers/Api_new.php
Models: https://github.com/openml/OpenML/tree/master/openml_OS/models/api/v1
Templates: https://github.com/openml/OpenML/tree/master/openml_OS/views/pages/api_new/v1
Golden Rules for Development¶
- Code Maintainability before anything else. The code has to be understandable, and if not conflicting with that, short. Avoid code duplications as much as possible.
- The API controller is the only entity giving access to the API models. Therefore, the responsibility for API access can be handled by the controller
- Read-Only operations are of the type GET. Operations that make changes in the database are of type POST or DELETE. Important, because this is the way the controller determines to allow users with a given set of privileges to access functions.
- Try to avoid direct queries to the database. Instead, use the respective models functions: 'get()', 'getWhere()', 'getById()', insert(), etc (Please make yourself familiar with the basic model: read-only and write)
- No external program/script execution during API calls (with one exception: data split generation). This makes the API unnecessarily slow, hard to debug and vulnerable to crashes. If necessary, make a cronjob that executes the program / script
Backend code structure¶
The high-level architecture of the website, including the controllers for different parts of the website (REST API, html, ...) and connections to the database.
Code¶
The source code is available in the 'OpenML' repository: https://github.com/openml/OpenML
Important files and folders¶
In this section we go through all important files and folder of the system.
Root directory¶
The root directory of OpenML contains the following files and folders.
-
system: This folder contains all files provided by CodeIgniter 2.1.3. The contents of this folder is beyond the scope of this document, and not relevant for extending OpenML. All the files in this folder are in the same state as they were provided by Ellislabs, and none of these files should ever be changed.
-
sparks: Sparks is a package management system for Codeigniter that allows for instant installation of libraries into the application. This folder contains two libraries provided by third party software developers, oauth1 (based on version 1 the oauth protocol) and oauth2 (similarly, based on version 2 of the oauth protocol). The exact contents of this folder is beyond the scope of this document and not relevant for extending OpenML.
-
openml_OS: All files in this folder are written specifically for OpenML. When extending the functionality OpenML, usually one of the files in this folder needs to be adjusted. As a thorough understanding of the contents of this folder is vital for extending OpenML, we will discuss the contents of this folder in [[URL Mapping]] in more detail.
-
index.php: This is the “bootstrap” file of the system. Basically, every page request on OpenML goes through this file (with the css, images and javascript files as only exception). It then determines which CodeIgniter and OpenML files need to be included. This file should not be edited.
-
.htaccess: This file (which configures the Apache Rewrite Engine) makes sure that all URL requests will be directed to
index.php
. Without this file, we would need to includeindex.php
explicitly in every URL request. This file makes sure that all other URL requests withoutindex.php
embedded in it automatically will be transformed toindex.php
. Eg., http://www.openml.org/frontend/page/home will be rewritten to http://www.openml.org/index.php/frontend/page/home. This will be explained in detail in [[URL Mapping]]. -
css: A folder containing all stylesheets. These are important for the layout of OpenML.
-
data: A folder containing data files, e.g., datasets, implementation files, uploaded content. Please note that this folder does not necessarily needs to be present in the root directory. The OpenML Base Config file determines the exact location of this folder.
-
downloads: Another data folder, containing files like the most recent database snapshot.
-
img: A folder containing all static images shown on the webpage.
-
js: A folder containing all used Javascript files and libraries, including third party libraries like jQuery and datatables.
-
Various other files, like .gitignore, favicon.ico, etc.
openml_OS¶
This folder is (in CodeIgniter jargon) the “Application folder”, and contains all files relevant to OpenML. Within this folder, the following folders should be present: (And also some other folders, but these are not used by OpenML)
-
config: A folder containing all config files. Most notably, it contains the file BASE_CONFIG.php, in which all system specific variables are set; the config items within this file differs over various installations (e.g., on localhost,
openml.org
). Most other config files, like database.php, will receive their values from BASE_CONFIG.php. Other important config files are autoload.php, determining which CodeIgniter / OpenML files will be loaded on any request, openML.php, containing config items specific to OpenML, and routes.php, which will be explained in [[URL Mapping]]. -
controllers: In the Model/View/Controller design pattern, all user interaction goes through controllers. In a webapplication setting this means that every time a URL gets requested, exactly one controller gets invoked. The exact dynamics of this will be explained in [[URL Mapping]].
-
core: A folder that contains CodeIgniter specific files. These are not relevant for the understanding of OpenML.
-
helpers: This folder contains many convenience functions. Wikipedia states: “A convenience function is a non-essential subroutine in a programming library or framework which is intended to ease commonly performed tasks”. For example the file_upload_helper.php contains many functions that assist with uploading of files. Please note that a helper function must be explicitly loaded in either the autoload config or the files that uses its functions.
-
libraries: Similar to sparks, this folder contains libraries specifically written for CodeIgniter. For example, the library used for all user management routines is in this folder.
-
models: In the Model/View/Controller design pattern, models represent the state of the system. In a webapplication setting, you could say that a model is the link to the database. In OpenML, almost all tables of the database are represented by a model. Each model has general functionality applicable to all models (e.g., retrieve all records, retrieve record with constraints, insert record) and functionality specific to that model (e.g., retrieve a dataset that has certain data properties). Most models extend an (abstract) base class, located in the abstract folder. This way, all general functionality is programmed and maintained in one place.
-
third_party: Although the name might suggests differently, this folder contains all OpenML Java libraries.
-
views: In the Model/View/Controller design pattern, the views are the way information is presented on the screen. In a webapplication setting, a view usually is a block of (PHP generated) HTML code. The most notable view is frontend_main.php, which is the template file determining the main look and feel of OpenML. Every single page also has its own specific view (which is parsed within frontend_main.php). These pages can be found (categorized by controller and name) in the pages folder. More about this structure is explained in [[URL Mapping]].
Frontend code structure¶
Architecture and libraries involved in generating the frontend functions.
Code: https://github.com/openml/website/tree/master/openml_OS/views
High-level¶
All pages are generated by first loading frontend_main.php. This creates the 'shell' in which the content is loaded. It loads all css and javascript libraries, and contains the html for displaying headers and footers.
Create new page¶
The preferred method is creating a new folder into the folder
<root_directory>/openml_OS/views/pages/frontend
This page can be requested by
http://www.openml.org/frontend/page/<folder_name>
or just
http://www.openml.org/<folder_name>
This method is preferred for human readable webpages, where the internal
actions are simple, and the output is complex. We will describe the
files that can be in this folder.
-
pre.php: Mandatory file. Will be executed first. Do not make this file produce any output! Can be used to pre-render data, or set some variables that are used in other files.
-
body.php: Highly recommended file. Intended for displaying the main content of this file. Will be rendered at the right location within the template file (
frontend_main.php
). -
javascript.php: Non-mandatory file. Intended for javascript function on which
body.php
relies. Will be rendered within a javascript block in the header of the page. -
post.php: Non mandatory file. Will only be executed when a POST request is done (e.g., when a HTML form was send using the POST protocol). Will be executed after
pre.php
, but before the rendering process (and thus, beforebody.php
andjavascript.php
). Should handle the posted input, e.g., file uploads.
It is also recommended to add the newly created folder to the mapping in
the routes.php
config file. This way it can also be requested by the
shortened version of the URL. (Note that we deliberately avoided to
auto-load all pages into this file using a directory scan, as this makes
the webplatform slow.)
URL to Page Mapping¶
Most pages in OpenML are represented by a folder in
frontend_main.php
template, as described in [[backend]]. In
this section we explain the way an URL is mapped to a certain OpenML
page.
URL Anatomy¶
By default, CodeIgniter (and OpenML) accepts a URL in the following
form:
http://www.openml.org/index.php/<controller>/<function>/<p1>/<pN>/<free>
The various parts in the URL are divided by slashes. Every URL starts
with the protocol and server name (in the case of OpenML this is
http://www.openml.org/
). This is followed by the bootstrap file, which
is always the same, i.e., index.php
. The next part indicates the
controller that needs to be invoked; typically this is frontend
,
rest_api
or data
, but it can be any file from the openml_OS
folder
controllers
. Note that the suffix .php
should not be included in the
URL.
The next part indicates which function of the controller should be
invoked. This should be a existing, public function from the controller
that is indicated in the controller part. These functions might have one
or more parameters that need to be set. This is the following part of
the URL (indicated by p1
and pN
). The parameters can be followed by
anything in free format. Typically, this free format is used to pass on
additional parameters in name
- value
format, or just a way of
adding a human readable string to the URL for SEO purposes.
For example, the following URL
http://www.openml.org/index.php/frontend/page/home
invokes
the function page
from the frontend
controller and sets the only
parameter of this function, $indicator
, to value home
. The function
page
loads the content of the specified folder ($indicator
) into the
main template. In this sense, the function page
can be seen as some
sort of specialized page loader.
URL Shortening¶
Since it is good practice to have URL’s as short as possible, we have
introduced some logic that shortens the URL’s. Most importantly, the URL
part that invokes index.php
can be removed at no cost, since this file
is always invoked. For this, we use Apache’s rewrite engine. Rules
for rewriting URL’s can be found in the .htaccess
file, but is
suffices to say that any URL in the following format
http://www.openml.org/index.php/<controller>/<function>/<params>
can due to the rewrite engine also be requested with
http://www.openml.org/<controller>/<function>/<params>
Furthermore, since most of the pages are invoked by the function page
of the frontend
controller (hence, they come with the suffix
frontend/page/page_name
) we also created a mapping that maps URL’s in
the following form
http://www.openml.org/<page_name>
to
http://www.openml.org/frontend/page/<page_name>
Note that Apache’s rewrite engine will also add index.php
to this. The
exact mapping can be found in routes.php
config file.
Additional Mappings¶
Additionally, a mapping is created from the following type of URL:
http://www.openml.org/api/<any_query_string>
to
http://www.openml.org/rest_api/<any_query_string>
This was done for backwards compatibility. Many plugins make calls to
the not-existing api
controller, which are automatically redirected to
the rest_api
controller.
Exceptions¶
It is important to note that not all pages do have a specific page
folder. The page folders are a good way of structuring complex GUI’s
that need to be presented to the user, but in cases where the internal
state changes are more important than the GUI’s, it might be preferable
to make the controller function print the output directly. This happens
for example in the functions of rest_api.php
and free_query.php
(although the former still has some files in the views folder that it
refers to).
XSD Schemas¶
In order to ensure data integrity on the server, data that passed to upload functions is checked against XSD schema's. This ensures that the data that is uploaded is in the correct format, and does not contain any illegal characters. XSD schema's can be obtained through the API (exact links are provided in the API docs, but for example: https://www.openml.org/api/v1/xsd/openml.data.upload (where openml.data.upload can be replaced by any other schema's name). Also XML examples are provided, e.g., https://www.openml.org/api/v1/xml_example/data . The XSD schema's are exactly the same as used on the server. Whenever an upload fails and the server mentions an XML/XSD verification error, please run the uploaded xml against one of the provided XSD schema's, for example on this webtool: http://www.freeformatter.com/xml-validator-xsd.html
In order to maintain one XSD schema for both uploading and downloading stuff, the XSD sometimes contains more fields than seem necessary from the offset. Usually, the additional fields that are indicated as such in the comments (for example, in the upload dataset xsd this are the id, upload_date, etc fields). The XSD's maintain basically three consistencies functions:
- Ensure that the correct fields are uploaded
- Ensure that the fields contain the correct data types.
- Ensure that the fields do not contain to much characters for the database to upload.
For the latter two, it is important to note that the XSD seldom accept default string content (i.e., xs:string). Rather, we use self defined data types, that use regular expressions to ensure the right content. Examples of these are oml:system_string128, oml:casual_string128, oml:basic_latin128, where the oml prefix is used, the name indicates the level of restriction and the number indicates the maximum size of the field.
IMPORTANT: The maximum field sizes are (often) chosen with great care. Do not extend them without consulting other team members.
User authentication¶
Authentication towards the server goes by means of a so-called api_key (a hexa-decimal string which uniquely identifies a user). Upon interaction with the server, the client passes this api_key to the server, and the server checks the rights of the user. Currently this goes by means of a get or post variable, but in the future we might want to use a header field (because of security). It is recommended to refresh your api_key every month.
IMPORTANT: Most authentication operations are handled by the ION_Auth library (http://benedmunds.com/ion_auth/). DO NOT alter information directly in the user table, always use the ION_Auth API.
A user can be part of one or many groups. The following user groups exists:
- Admin Group: With great power comes great responsibility. Admin users can overrule all security checks on the server, i.e., delete a dataset or run that is not theirs, or even delete a flow that contains runs.
- Normal Group: Level that is required for read/write interaction with the server. Almost all users are part of this group.
- Read-only Group: Level that can be used for read interaction with the server. If a user is part of this group, but not part of 'Normal Group', he is allowed to download content, but can not upload or delete content.
- Backend Group: (Work in Progress) Level that has more privileges than 'Normal Group'. Can submit Data Qualities and Evaluations.
The ION_Auth functions in_group(), add_to_group(), remove_from_group() and get_users_groups() are key towards interaction with these tables.