Archipelago-deployment-live 1.4.0 to 1.5.0: Upgrading Solr 9.2 to 9.8

What is this documentation for?

This documentation will help you upgrade your Solr index from the 9.2 to the 9.8 releases. It is meant to be a guide/helper. There is no simple way of saying this: because of the way a Solr index is built (sort of a binary tree), any larger change in the schema or field type definitions requires either a complete reindex or, most of the time, a wipe-and-start-fresh situation. There is no perfect way around this, although, depending on your own customizations, it might also NOT be required at all.

So we will have to check logs before deciding.

There are very complex ways of keeping an old server running and serving searches while you re-index a new one. But honestly, and assuming you are reading this guide because your primary job is not managing Solr, the how and the approach will depend on your existing knowledge of Solr and your skills (even memory!) to execute them, and documenting those hacks is beyond the scope of this documentation. What is proven is what we explain in this document.

Requirements

  • An archipelago-deployment-live instance (working, tested) deployed via Docker using the provided instructions, running either Solr 8.x or Solr <= 9.2.
  • Good knowledge, patience and instincts (+ courage and time) on how to run Terminal commands.
  • Patience (again, but also patience from your users, since search will be unavailable until you reindex). You can't skip steps here.
  • For shell commands documented here, please copy line by line, not the whole block.
  • You are already using version control and know how to git pull/push/merge.

Backing up and preparing for the upgrade

Backups are always going to be your best friends. Archipelago's code, database, and settings are mostly self-contained in your current archipelago-deployment-live repo folder, and backing up is simple because of that.

Step 1:

To make upgrading simpler we will clone archipelago-deployment-live (empty one) into a different folder. That way we can copy complete folders of configs and files instead of fetching them from github one by one.

Go to your home folder (for the sake of this documentation it will be /home/ec2-user, but you can also use the $HOME environment variable instead):

cd /home/ec2-user
git clone https://github.com/esmero/archipelago-deployment-live archipelago-deployment-live-1.5.0
cd archipelago-deployment-live-1.5.0
git switch 1.5.0

Now, on a terminal, cd into your running (again: not the previously cloned one, your actual running one) archipelago-deployment-live folder, then cd inside the deploy/ec2-docker subfolders and shut down your docker-compose ensemble by running the following:

docker-compose down

Step 2:

Verify that all containers are actually down. The following command should return an empty listing:

docker ps

If anything is still running, wait a little longer and run the command again.
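The wait-and-check loop can be scripted. This is a minimal sketch, assuming no unrelated containers run on the same host; wait_for_containers_down is a hypothetical helper name, and the attempt count and pause are illustrative defaults.

```shell
# Poll `docker ps -q` until it prints nothing (no running containers).
# wait_for_containers_down is a hypothetical helper, not part of Archipelago.
wait_for_containers_down() {
  attempts="${1:-30}"   # how many times to check (illustrative default)
  pause="${2:-2}"       # seconds to wait between checks (illustrative default)
  while [ "$attempts" -gt 0 ]; do
    # -q prints only container IDs; empty output means everything is down.
    if [ -z "$(docker ps -q)" ]; then
      echo "All containers are down."
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$pause"
  done
  echo "Containers are still running." >&2
  return 1
}
```

Running wait_for_containers_down with no arguments checks up to 30 times, two seconds apart, and returns a non-zero status if something is still up.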

Step 3:

Note: If you are coming from the more general Upgrade 1.5.0 guide you probably already did the backup a few minutes ago. Skip to Step 4 if you are certain you did.

Now let's tar.gz the whole ensemble with data and configs. We will exclude here the local source caches generated by Cantaloupe. Whether these exist or not will depend on how custom your deployment is.

As an example we will save this into your $HOME folder. As a good practice we append the current date (YEAR-MONTH-DAY) to the filename. Here we assume today is July 16th of 2025.
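The date suffix can be generated instead of typed by hand. A small sketch, assuming the standard date utility is available; BACKUP_FILE is a hypothetical variable name used only for illustration.

```shell
# Build the backup file name with today's date (YYYYMMDD) appended.
# BACKUP_FILE is a hypothetical variable name, used only for illustration.
BACKUP_FILE="$HOME/archipelago-deployment-D10-$(date +%Y%m%d).tar.gz"
echo "$BACKUP_FILE"
```

"$BACKUP_FILE" can then stand in for the literal file name in the tar commands of this step.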

We will cd back to the parent folder of your running archipelago-deployment-live folder, so three levels up, assuming you are right now inside archipelago-deployment-live/deploy/ec2-docker:

cd ../../..
sudo tar --exclude=archipelago-deployment-live/data_storage/iiifcache --exclude=archipelago-deployment-live/data_storage/iiiftmp -czvpf $HOME/archipelago-deployment-D10-20250716.tar.gz archipelago-deployment-live

The process may take a few minutes. Now let's verify that all is there and that the tar.gz is not corrupt.

tar -tvvf $HOME/archipelago-deployment-D10-20250716.tar.gz

You will see a listing of files, and at the end you will see something like this: Archive Format: POSIX pax interchange format, Compression: gzip. If corrupt (Do you have enough space? Did your ssh connection drop?) you will see the following:

tar: Unrecognized archive format
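If reading the listing by eye feels error prone, the check can rely on exit codes instead. A minimal sketch, assuming gzip and tar are installed; verify_backup is a hypothetical helper name.

```shell
# verify_backup is a hypothetical helper: it returns 0 only if the file exists,
# the gzip stream is intact, and tar can list every entry in the archive.
verify_backup() {
  archive="$1"
  [ -s "$archive" ] || { echo "Missing or empty: $archive" >&2; return 1; }
  # gzip -t tests the compressed stream without extracting anything.
  gzip -t "$archive" 2>/dev/null || { echo "Corrupt gzip stream: $archive" >&2; return 1; }
  # tar -tzf walks the whole archive; the listing itself is discarded.
  tar -tzf "$archive" >/dev/null 2>&1 || { echo "Unreadable tar archive: $archive" >&2; return 1; }
  echo "Archive looks intact: $archive"
}
```

For example: verify_backup "$HOME/archipelago-deployment-D10-20250716.tar.gz". A non-zero exit status means the backup should be redone before continuing.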

Step 4:

cd again into your running archipelago-deployment-live folder, then cd inside deploy/ec2-docker. Restart your docker-compose ensemble, and wait a little while for all to start.

docker-compose up -d

Step 5:

Export/backup all of your live Archipelago 1.4.0 (or 1.3.0?), Drupal 10 configurations (this allows you to compare/come back in case you lose something custom during the upgrade).

docker exec esmero-php mkdir -p config/backup
docker exec esmero-php drush cex --destination=/var/www/html/config/backup

Good. Now it's safe to begin the upgrade process.


Upgrading to Solr 9.8

Step 0: Get familiar with what changed.

Running a Production Server requires some informed decision making and thus, we believe, a good pre-step is reviewing what changed between releases. Specifically, focus on these two folders:

https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/config_storage/solrconfig/conf

and

https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/data_storage/solrlib

Step 1: Edit docker-compose.yml

You want to replace your current Solr Service (in its entirety; please make sure the indentation is 1:1) with this.

  solr:
    container_name: esmero-solr
    restart: always
    image: "solr:9.8.1"
    # If running Docker < 20.10.10 please uncomment the following lines
    # See https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#solr-9-2 
    #security_opt:
    #  - seccomp:unconfined
    tty: true
    environment:
      SOLR_HEAP: 1024m
      SOLR_OPTS: -Dsolr.jetty.request.header.size=65535 -Dsolr.install.dir=/opt/solr
      SOLR_MODULES: "extraction,langid,ltr,analysis-extras,scripting"
      SOLR_LOG_LEVEL: "WARN"
    ports:
      - "8983:8983"
    networks:
      - host-net
      - esmero-net
    volumes:
      - ${ARCHIPELAGO_ROOT}/data_storage/solrcore:/var/solr/data
      - ${ARCHIPELAGO_ROOT}/config_storage/solrconfig:/drupalconfig
      - ${ARCHIPELAGO_ROOT}/data_storage/solrlib:/var/solr/data/lib
    entrypoint:
      - docker-entrypoint.sh
      - solr-precreate
      - drupal
      - /drupalconfig

Please double check and use any of these as reference:

  • https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/deploy/ec2-docker/docker-compose-aws-s3-arm64.yml
  • https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/deploy/ec2-docker/docker-compose-aws-s3.yml

For this, if you have not already done so, navigate to deploy/ec2-docker and run:

docker-compose down

Then open your docker-compose.yml file, find the solr: key and replace it with the previous YAML snippet. If for some reason your volumes do not match our defaults, please adapt the snippet to your custom edits so they match where the Solr Core and Libraries are saved:

nano docker-compose.yml

Save your changes.
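A quick sanity check after saving can confirm the new image tag made it into the file. This is a sketch under the assumption that the image line looks like the one in the snippet above; check_solr_image is a hypothetical helper name.

```shell
# check_solr_image is a hypothetical helper: it succeeds only when the given
# compose file contains an image line for solr:9.8.1 (quoted or unquoted).
check_solr_image() {
  compose_file="${1:-docker-compose.yml}"
  if grep -Eq '^[[:space:]]*image:[[:space:]]*"?solr:9\.8\.1"?[[:space:]]*$' "$compose_file"; then
    echo "Solr image is solr:9.8.1"
  else
    echo "solr:9.8.1 not found in $compose_file" >&2
    return 1
  fi
}
```

Run check_solr_image from inside deploy/ec2-docker. It only looks for the image line; it does not validate indentation, so running docker-compose config -q afterwards is a reasonable extra check of the YAML as a whole.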

Step 2A: Try to upgrade without reindexing (hopefully).

If you are coming from a "I deployed 1.4.0 and have been running Solr without touching anything on the backend" situation, Step 2A might be for you!

To be sure: if, after comparing https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/config_storage/solrconfig/conf with your current live configuration found inside your archipelago-deployment-live at data_storage/solrcore/drupal/conf, you don't see MAJOR field definition changes, there is extra hope. Solr 9.8 does bring larger changes in the way Libraries (now named Modules) are used, so we will still need to update some of your live configurations and check logs afterwards.

What is a major change?

Look for schema_extra_types.xml in both 1.5.0 and your data_storage/solrcore/drupal/conf. Diff them. If, for example, your live/running schema_extra_types.xml has a field definition like

<fieldType name="text_ngramstring" class="solr.TextField" positionIncrementGap="100" termPositions="false" termOffsets="true" storeOffsetsWithPositions="false">

while 1.5.0 has (it has)

<fieldType name="text_ngramstring" class="solr.TextField" positionIncrementGap="100" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" termVectors="true">

Then "text_ngramstring" will differ in the way it stores its values in the physical index, and some changes might require a complete re-index. You could still try 2A, but if something fails (logs) you might end up having to jump to Step 2B.

If your live/running schema_extra_types.xml has entries with name="some_name" that are not present in 1.5.0, then you might need to manually edit your future live schema_extra_types.xml, after copying from 1.5.0, to ensure your custom types are preserved. If unsure, jump to Step 2B.
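Spotting custom field types by eye in a long diff is easy to get wrong. The following is a rough sketch that lists fieldType names present in one file but absent from another. It assumes each opening fieldType tag sits on a single line and is spelled fieldType (case sensitive); list_fieldtype_names and custom_fieldtypes are hypothetical helper names.

```shell
# list_fieldtype_names (hypothetical helper) prints the name attribute of every
# opening <fieldType ...> tag in a schema file, sorted and de-duplicated.
list_fieldtype_names() {
  grep -o '<fieldType[^>]*' "$1" \
    | grep -o '[[:space:]]name="[^"]*"' \
    | sed 's/.*name="\(.*\)"/\1/' \
    | LC_ALL=C sort -u
}

# custom_fieldtypes (hypothetical helper) prints names defined in the first
# (live) file that are missing from the second (1.5.0 reference) file.
custom_fieldtypes() {
  live_names="$(mktemp)"
  ref_names="$(mktemp)"
  list_fieldtype_names "$1" > "$live_names"
  list_fieldtype_names "$2" > "$ref_names"
  # comm -23 keeps lines that appear only in the first sorted list.
  LC_ALL=C comm -23 "$live_names" "$ref_names"
  rm -f "$live_names" "$ref_names"
}
```

For example: custom_fieldtypes data_storage/solrcore/drupal/conf/schema_extra_types.xml /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/schema_extra_types.xml. Empty output means no custom type names were found. This compares names only; attribute changes such as the termVectors example above still need a regular diff of the two files.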

If you are still here, then let's upgrade!

We need the new configurations for your Solr. Remember we downloaded a reference/empty Archipelago Deployment Live 1.5.0. For this guide let's assume it is located at /home/ec2-user/archipelago-deployment-live-1.5.0. We are going to use the files there to replace both your startup and your live configs. cd back to your live deployment, assuming here it is (still) /home/ec2-user/archipelago-deployment-live, and run (line by line):

cd /home/ec2-user/archipelago-deployment-live
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* config_storage/solrconfig/conf/.
sudo cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* data_storage/solrcore/drupal/conf/.
sudo chown -R 8983:8983 data_storage/solrcore

Now we need to remove the old OCR library and replace with the new one

rm /home/ec2-user/archipelago-deployment-live/data_storage/solrlib/*.jar
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/data_storage/solrlib/solr-ocrhighlighting-0.9.4-SNAPSHOT.jar data_storage/solrlib/.
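Before moving on, it may be worth confirming that the library folder now holds the new jar and nothing else. A minimal sketch; check_ocr_jar is a hypothetical helper name, and the jar file name is taken from the cp command above.

```shell
# check_ocr_jar is a hypothetical helper: it succeeds only when the library
# folder contains the new OCR highlighting jar and no other .jar files.
check_ocr_jar() {
  lib_dir="${1:-data_storage/solrlib}"
  expected="solr-ocrhighlighting-0.9.4-SNAPSHOT.jar"
  [ -f "$lib_dir/$expected" ] || { echo "Missing $expected in $lib_dir" >&2; return 1; }
  for jar in "$lib_dir"/*.jar; do
    # Any jar other than the expected one is a leftover from the old release.
    [ "$(basename "$jar")" = "$expected" ] || { echo "Unexpected jar: $jar" >&2; return 1; }
  done
  echo "OCR highlighting library is in place."
}
```

Run check_ocr_jar from inside your live archipelago-deployment-live folder, or pass the path to the solrlib folder as the first argument.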

Done. You can jump to Step 3!

Step 2B: Wipe clean.

You decided Step 2A was not right for you and you are 100% aware that going this route will require reindexing which, depending on the size of your repository, might take from several hours to days. Are you sure?

Get the new configs. Get the new OCR Highlight library

Wait! (Breathe.)

Repeating: This step is only required if you are moving from Solr 8.x to 9.x, or if inside 9.x you have Solr field type definition changes. If you had a stock 1.3.0 with Solr 9.1 or a stock 1.4.0 with Solr 9.2 and want to move to Solr 9.8, you can skip deleting everything and go back to Step 2A!

This step requires some nerve. Be sure you know where you are inside your terminal (always).

Inside your archipelago-deployment-live folder run:

cd data_storage/solrcore
pwd

You should see something like

/home/ec2-user/archipelago-deployment-live/data_storage/solrcore

Which means you are in the correct folder. Now time to clean your index (really think twice here ok? You have a backup. Never run any of these without a backup)

sudo rm -rf *
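Since a wipe in the wrong folder is hard to undo, the pwd check can be turned into a guard that must pass before the delete runs. A small sketch; in_solrcore_dir is a hypothetical helper name.

```shell
# in_solrcore_dir is a hypothetical guard: it succeeds only when the current
# directory path ends in data_storage/solrcore.
in_solrcore_dir() {
  case "$(pwd)" in
    */data_storage/solrcore) return 0 ;;
    *) echo "Refusing: $(pwd) is not data_storage/solrcore" >&2; return 1 ;;
  esac
}
```

Used as in_solrcore_dir && sudo rm -rf *, the delete only runs when the guard succeeds. It checks the end of the path only, so it does not replace having a verified backup.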

Now we need the new configurations for your Solr (so that the Docker container can re-create the index from scratch). Remember we downloaded a reference/empty Archipelago Deployment Live 1.5.0 at /home/ec2-user/archipelago-deployment-live-1.5.0. We are going to use the files there to replace your own configs. cd back to your live deployment, assuming here it is (still) /home/ec2-user/archipelago-deployment-live:

cd /home/ec2-user/archipelago-deployment-live
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* config_storage/solrconfig/conf/.

Now we need to remove the old OCR library and replace with the new one

rm /home/ec2-user/archipelago-deployment-live/data_storage/solrlib/*.jar
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/data_storage/solrlib/solr-ocrhighlighting-0.9.4-SNAPSHOT.jar data_storage/solrlib/.

Done.

Step 3: docker pull and check

Time to fetch the latest Solr:

Navigate to your deploy/ec2-docker and run:

docker compose pull
docker compose up -d

Give all a little time to start. Please be patient. To ensure all is well, run (more than once if necessary) the following:

docker ps

You should see something like this if you synced all containers to the latest (your versions and database might vary depending on your server's platform, hashes and time up too depending on when you ran the commands):

CONTAINER ID   IMAGE                                                  COMMAND                  CREATED          STATUS          PORTS                              NAMES
5b06ee366f58   jonasal/nginx-certbot                                  "/docker-entrypoint.…"   10 minutes ago   Up 10 minutes   0.0.0.0:8001->80/tcp               esmero-web
1409f41b6068   solr:9.8.1                                             "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   0.0.0.0:8983->8983/tcp             esmero-solr
e9361ed424ab   esmero/cantaloupe-s3:6.0.5-noturbojpeg-multiarch       "sh -c 'java -Dcanta…"   10 minutes ago   Up 10 minutes   0.0.0.0:8183->8182/tcp             esmero-cantaloupe
1dc524aeb6b4   mariadb:10.6.22-focal                                  "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   3306/tcp                           esmero-db
85bedadf9732   redis:6.2-alpine                                       "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes                                      esmero-redis
6a9e9d8647a9   minio/minio:RELEASE.2022-06-11T19-55-32Z               "/usr/bin/docker-ent…"   10 minutes ago   Up 10 minutes   0.0.0.0:9000-9001->9000-9001/tcp   esmero-minio
aa82d6b42ec6   esmero/php-8.3-fpm:1.5.0-multiarch                     "docker-php-entrypoi…"   10 minutes ago   Up 10 minutes   9000/tcp                           esmero-php
458e826199bd   esmero/esmero-nlp:1.4.2-multiarch                      "/usr/local/bin/entr…"   10 minutes ago   Up 10 minutes   0.0.0.0:6400->6400/tcp             esmero-nlp

Important here is the STATUS column. It should read "Up" followed by a duration that keeps increasing every time you run docker ps again (and again). If the duration keeps resetting, that container is restarting.

Check your Solr logs for failure messages.

docker logs -f esmero-solr -n 100
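To avoid scrolling through the whole log, the output can be filtered down to the lines that usually matter. A rough sketch; filter_solr_problems is a hypothetical helper name, and the three keywords are an assumption about what a failure looks like in your logs, not an exhaustive list.

```shell
# filter_solr_problems is a hypothetical helper: it reads log text on stdin and
# prints only lines that mention ERROR, SEVERE or an exception. When nothing
# matches, it prints a short notice instead.
filter_solr_problems() {
  grep -E 'ERROR|SEVERE|Exception' || echo "No ERROR/SEVERE/Exception lines found."
}
```

For example: docker logs esmero-solr -n 500 2>&1 | filter_solr_problems. The 2>&1 is there because docker logs replays the container's stderr on stderr. A clean result is a good sign but not proof; the searches described next remain the real test.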

Try searching in your repo (a general search first, then a full-text one involving OCR highlights, e.g. in IABookReader or Mirador).

Step 4 (Optional): New Drupal Solr Field configs

You might want to review/compare (now or later) your current Drupal Search API Index against our most current one. Specifically:

  • https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/drupal/config/sync/search_api.server.esmero_solr.yml
  • https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/drupal/config/sync/search_api.index.default_solr_index.yml

and anything else that starts with search_api.

Whether or not you synchronize Drupal's Search API Server, Index and fields is a personal decision (optional). We do recommend it, but your Solr Index might have many customizations already, so a better/pro approach would be to diff the .yml files and decide selectively.

If you made Drupal Search API changes, go to Step 5. If you executed Step 2B (wipe), go to Step 5. If neither, you are DONE! (Step 6)

Step 5: Re-index Drupal Search API

Only run this if you ran Step 2B and/or Step 4. SKIP IF NOT.

Run the following:

docker exec esmero-php drush search-api-reindex
docker exec esmero-php drush search-api-index

Check your Drupal logs, try some searches.

STEP 6: DONE!

Done! Hurrah!


Need help? Strange logs? Searching for happiness leads to no results? Missed a step? Need a hug or someone that listens to you in silence?

If you see any issues or errors or need help with a step, please let us know (ASAP!). You can either open an issue in this repository or use the Google Group. We are here to help.

Caring & Coding + Fixing + Testing

License

GPLv3