![In-Sylva logo](logo.png)
# In-Sylva Information System

## Table of contents
- [In-Sylva Information System](#in-sylva-information-system)
  * [Table of contents](#table-of-contents)
  * [Short architecture description](#short-architecture-description)
  * [Requirements](#requirements)
  * [Getting source code](#getting-source-code)
  * [Build project](#build-project)
    + [For development](#for-development)
    + [For production](#for-production)
  * [SSL Certificates](#ssl-certificates)
    + [Production and pre-production](#production-and-pre-production)
    + [Local development](#local-development)
  * [Run project](#run-project)
  * [Keycloak configuration](#keycloak-configuration)
  * [Admin user creation](#admin-user-creation)
  * [Upload in-sylva standard](#upload-in-sylva-standard)
  * [Application access](#application-access)
    + [Portal tool](#portal-tool)
    + [Search tool](#search-tool)
  * [Data dump and restore](#data-dump-and-restore)
    + [S3 configuration file](#s3-configuration-file)
  * [Health check for elasticsearch](#health-check-for-elasticsearch)
  * [Attention](#attention)
## Short architecture description
* The In-Sylva Information System relies on Docker.
* It is built on a microservices architecture.
* Each microservice runs independently in its own Docker container.
* You will find information about each microservice in its repository's `README.md`.
* You will find schematics explaining the project's architecture and its databases in the `./documentation` folder.
* Access to the In-Sylva Information System is managed with Keycloak authentication.
* The In-Sylva Information System has been successfully tested on Debian (9 and 10) hosts.
## Requirements
* Docker >= 17.12.0
* docker-compose
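You can quickly check the installed versions with:
```sh
docker --version          # should report 17.12.0 or newer
docker-compose --version
```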

## Getting source code
To download this repository, you can use the following command:
```bash
git clone https://forgemia.inra.fr/in-sylva-development/in-sylva.information-system.git
```
You will find the other microservices' repositories in the [in-sylva development GitLab group](https://forgemia.inra.fr/in-sylva-development).
Use `git clone` to download each project's source code if you want to inspect or modify it.
## Build project
### For development
Execute this command to build docker images for development:
```sh
./build.sh -k id_rsa -e dev
```
### For production
Execute this command to build docker images for production:
```sh
./build.sh -k id_rsa -e prod -d <url> -ip <IP_address> -p <port>
```
where you set
* `<url>` as the access URL (e.g., `http://www.mydomain.world/insylva/`)
* `<IP_address>` as the IP address of the server on which the in-sylva applications are running
* `<port>` as the port of the server on which the in-sylva applications are running
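For example, a production build invocation might look like this (the URL, IP address, and port below are placeholders for your own values):
```sh
# Illustrative invocation only; replace every value with your own.
./build.sh -k id_rsa -e prod -d http://www.mydomain.world/insylva/ -ip 192.0.2.10 -p 8080
```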
## SSL Certificates

### Production and pre-production
To handle SSL certificates, the reverse-proxy (nginx) service uses the certbot tool to generate and renew certificates on production servers.

Certificates are stored in the `ssl_certificates/pem` directory.

The certificates are renewed on production and pre-production every day at 2 A.M. by a cron job, defined in the `crons/certificate_auto_renewer_installer` file.

On deployment, the `.ansible/playbook.yml` playbook executes the `crons/certificate_auto_renewer_installer` script to install the cron job.
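For reference, the renewal entry conceptually amounts to something like the following sketch (purely illustrative; the actual command is defined in `crons/certificate_auto_renewer_installer`):
```
# Hypothetical sketch of the daily 2 A.M. renewal job; see
# crons/certificate_auto_renewer_installer for the real entry.
0 2 * * * certbot renew --quiet
```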

### Local development
For local development, the reverse-proxy service uses self-signed certificates, which must be stored in the `ssl_certificates/pem` directory.

To generate them, you can use the following command:

```sh
docker compose -f docker-compose.certs.yml up install-certs-dev
```

This command generates the certificates and stores them in the `ssl_certificates/pem` directory.
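If you prefer to generate a self-signed certificate by hand, a plain openssl command can produce an equivalent key/certificate pair (the output file names below are assumptions; match whatever the reverse-proxy configuration expects):
```sh
# Generate a self-signed certificate valid for one year.
# The privkey.pem/fullchain.pem names are assumptions; adapt them as needed.
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -nodes \
  -keyout ssl_certificates/pem/privkey.pem \
  -out ssl_certificates/pem/fullchain.pem \
  -subj "/CN=localhost"
```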

## Run project
The first time `start-in-sylva.sh` is executed, a `.env` file is created.
The script then exits, inviting you to edit this file with your own values.
This step is mandatory, as the file contains the necessary configuration for each microservice.

The `.env` file contains an explanation for each value, so take the time to understand them; otherwise, the project will not work properly.
> ⚠️ The project needs to be rebuilt after editing environment variables.

So the first time you want to run this project, you should:
1) Execute `./start-in-sylva.sh` (if it is not executable, run `chmod +x start-in-sylva.sh` first)
2) Edit the `.env` configuration file
3) [Build](#build-project) the project
4) Follow the [instructions below](#keycloak-configuration)
After that, you will need to run `./start-in-sylva.sh` to start the project.
## Keycloak configuration
At this point, all microservices' containers should be running, but not yet fully functional.
* Go to [pgAdmin](http://localhost:5050/) and log in using the credentials from the `.env` file (`PGADMIN_DEFAULT_EMAIL` and `PGADMIN_DEFAULT_PASSWORD`)
* Create access to the postgres server:
  * Click on `Add New Server`
  * Add a name in the `Name` field (e.g., `insylva`)
  * In the `Connection` tab, add the postgres container's IP address in `Host name/address`. Two ways to find it:
    * Go to the [portainer containers' list](http://localhost:9000/#!/1/docker/containers), then find the `in-sylva.postgres` row and the `IP Address` column
    * Or run `docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' in-sylva.postgres` on the host
  * In the `Username` and `Password` fields, add the corresponding credentials from the `.env` file (`POSTGRES_USER` and `POSTGRES_PASSWORD`)
  * Click `Save`
* Then open a query tab on the keycloak database (public schema) and execute this SQL query (a `psql` alternative is sketched after this list):
```sql
update REALM set ssl_required = 'NONE' where id = 'master';
```
* Restart the keycloak container using [portainer](http://localhost:9000)
* Connect to [keycloak](http://localhost:7000/keycloak/auth/) using the credentials from the `.env` file (`KEYCLOAK_USER` and `KEYCLOAK_PASSWORD`)
* In the page's top-left corner, click on `Master`, select `Add Realm`, and import the `realm-export.json` file located in the `./keycloak/` subfolder.
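As an alternative to the pgAdmin steps, the same query can be run with `psql` inside the postgres container. This is a sketch: it assumes the Keycloak database is named `keycloak`, and `<POSTGRES_USER>` stands for the value from your `.env` file.
```sh
# Run the ssl_required update directly in the container;
# the database name `keycloak` is an assumption, check your setup.
docker exec -it in-sylva.postgres \
  psql -U <POSTGRES_USER> -d keycloak \
  -c "update REALM set ssl_required = 'NONE' where id = 'master';"
```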
## Admin user creation
Create an admin user for the system. This step is mandatory to access the portal.
* In a terminal, execute `curl --location --request POST 'http://localhost:4000/user/create-system-user'`
* Restart the login container using [portainer](http://localhost:9000)
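If you prefer the command line over Portainer, the Docker CLI can restart the container as well (the container name below is hypothetical; run `docker ps` to find the actual one):
```sh
# `in-sylva.login` is a hypothetical name; list containers with `docker ps`.
docker restart in-sylva.login
```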
## Upload in-sylva standard
* [Connect to the portal](http://localhost:3000/) using credentials given in `.env` (`IN_SYLVA_ADMIN_USERNAME` and `IN_SYLVA_ADMIN_PASSWORD`)
* In the `Fields` tab you can upload a standard in CSV format. Note: a version of this file can be found [here](https://data.inrae.fr/dataset.xhtml?persistentId=doi:10.15454/ELXRGY).
## Application access
### Portal tool
The portal tool is accessible:
* at `http://localhost:3000/portal` for the development environment
* at the URL set as build parameter for production (e.g., `http://www.mydomain.world/si/portal`)

This application allows you to:
* Access in-sylva microservices tools: Portainer, pgAdmin, Kibana, mongo-express, Elasticsearch, Keycloak
* Manage in-sylva administration (users' accounts, roles and groups, sources, policies)
* Upload metadata records to the system
### Search tool
The search tool is accessible:
* at `http://localhost:3001/search` for the development environment
* at the URL set as build parameter for production (e.g., `http://www.mydomain.world/si/search`)
This application allows you to:
* Search for metadata records in the catalog (basic and advanced search)
* Export metadata records after a specific search
## Data dump and restore
Scripts used to dump and restore data are provided in `dump_restore_tools` directory.
According to your own backup policy,
you can use insylva_bdds_dump_all.sh to dump all data from microservices of the SI
(postgres, mongodb, and elasticsearch).
The result of the dump procedure is an `archive.tar` file stored in the dump_restore_tools directory.
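Before automating anything, you may want to run the dump once by hand and inspect the resulting archive:
```sh
cd dump_restore_tools
./insylva_bdds_dump_all.sh
tar -tf archive.tar   # list the dump files contained in the archive
```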
On the hosting machine, you can install a cron job that runs the `insylva_bdds_dump_all.sh` script each day.

The crontab contains several lines, and you MUST adapt them with the full path to your in-sylva SI installation:

```
# monthly dump (first day of each month)
0 0 1 * * bash -c 'cd /path/to/in-sylva.information-system/dump_restore_tools && ./insylva_bdds_dump_all.sh'
# weekly dump, generated each Friday and replaced every week
10 0 * * 5 bash -c 'cd /path/to/in-sylva.information-system/dump_restore_tools && ./insylva_bdds_dump_all.sh'
# daily dump (Monday to Thursday)
30 0 * * 1-4 bash -c 'cd /path/to/in-sylva.information-system/dump_restore_tools && ./insylva_bdds_dump_all.sh'
# (optional) synchronise the dump storage to an S3 resource (see below for configuration)
30 1 * * 1-5 bash -c 'cd /path/to/in-sylva.information-system/dump_restore_tools && ./send_dumps_to_s3.sh'
```
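These entries can be installed with the standard crontab editor:
```sh
crontab -e   # paste the adapted lines, then save and exit
crontab -l   # verify that the entries were installed
```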
### S3 configuration file
For the last point (synchronising dump archives to an S3 storage), you have to create a file `s3config_file` in the `dump_restore_tools` directory.

This file should be generated with the command:
- ```s3cmd --configure -c dump_restore_tools/s3config_file```

If you activate this synchronisation, your S3 resource will contain exactly the same dump files as the `dump_restore_tools/DUMPS` directory.
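Once the configuration file exists, you can sanity-check it by listing the target bucket (the bucket name below is a placeholder):
```sh
# `s3://<bucket>/` is a placeholder for your own bucket.
s3cmd -c dump_restore_tools/s3config_file ls s3://<bucket>/
```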


The script `insylva_bdds_restore_all.sh` can be used to restore an archive.
To properly restore data, you have to start from a fresh installation.
For this, redo all the installation and setup procedures described above.
Then run the restore script and follow the instructions given at the end to restart the microservices' containers.
## Health check for elasticsearch
After a reboot, the search-api container needs to be restarted once the elasticsearch container has fully started.
This is done automatically by a script executed after reboot via crontab.
To set this up on a new host, add the following line to your crontab:
```
@reboot /usr/local/insylva/in-sylva.information-system/tools/restart_search_api.sh
```
If you encounter a problem with the search tool (e.g., results are empty), you can also manually run this script.
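For reference, here is a minimal sketch of what such a restart script might do, assuming Elasticsearch answers on `localhost:9200` and guessing the container name (both are assumptions; the real logic lives in `tools/restart_search_api.sh`):
```sh
#!/usr/bin/env bash
# Hypothetical sketch: wait until Elasticsearch reports a usable cluster
# status, then restart the search-api container. Names/ports are assumptions.
until curl -s http://localhost:9200/_cluster/health | grep -qE '"status":"(green|yellow)"'; do
  sleep 10   # Elasticsearch not ready yet, retry
done
docker restart in-sylva.search-api
```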
## Attention
Unless you have a specific need, you do not have to generate a Certificate Authority (a PEM file created with openssl) or edit `docker-compose.yml`.
If you want to change those files and settings, please read the instructions below carefully.
For production workloads, make sure the host setting `vm.max_map_count` is set to at least 262144.
On the Open Distro for Elasticsearch Docker image, this setting is the default.
To check it, start a Bash session in the container and run: `cat /proc/sys/vm/max_map_count`
To increase this value, you have to modify the host operating system.
On an RPM-based installation, you can add the following line at the end of the host machine's `/etc/sysctl.conf` file:
```
vm.max_map_count=262144
```
Then run `sudo sysctl -p` to reload.
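To confirm the live value without entering the container, you can also query it from the host (`<es-container>` is a placeholder):
```sh
sysctl vm.max_map_count                                      # current host value
docker exec <es-container> cat /proc/sys/vm/max_map_count    # placeholder container name
```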

This value is checked when you run the `build.sh` script.
A warning message is displayed if `vm.max_map_count` is incompatible with the Open Distro for Elasticsearch Docker image.
The `docker-compose.yml` file also contains several key settings:
`bootstrap.memory_lock=true`, `ES_JAVA_OPTS=-Xms512m -Xmx512m`, `nofile 65536`, and port 9600.
These settings respectively:
* Disable memory swapping (along with memlock)
* Set the size of the Java heap (we recommend half of system RAM)
* Set a limit of 65536 open files for the Elasticsearch user and allow you to access Performance Analyzer on port 9600