Archipelago-deployment-live 1.4.0 to 1.5.0: Upgrading Solr 9.2 to 9.8
What is this documentation for?
This documentation will help you upgrade your Solr index from release 9.2 to 9.8, and is meant as a guide/helper. There is no simple way of saying this: because of the way a Solr index is built (sort of a binary tree), any larger change in the schema or field type definitions requires either a complete reindex or, most of the time, a wipe-and-start-fresh situation. There is no perfect way around this. Sometimes, depending on your own customizations, neither will be required at all.
So we will have to check logs before deciding.
There are very complex ways of keeping an old server running and serving searches while you re-index a new one. But honestly, and assuming you are reading this guide because your primary job is not managing Solr, the how and the approach will depend on your existing knowledge of Solr and your skills (even memory!) to execute them, and documenting those hacks is beyond the scope of this documentation. What is proven is what we explain in this document.
Requirements
- An archipelago-deployment-live instance (working, tested) deployed via Docker using the provided instructions, running either Solr 8.x or Solr <= 9.2.
- Good knowledge, patience and instincts (+ courage and time) on how to run terminal commands.
- Patience (again, but also patience from your users, since search will be unavailable until you reindex). You can't skip steps here.
- For shell commands documented here, please copy line by line, not the whole block.
- You are already running version control and know how to git pull/push/merge.
Backing up and preparing for the upgrade
Backups are always going to be your best friends. Archipelago's code, database, and settings are mostly self-contained in your current archipelago-deployment-live
repo folder, and backing up is simple because of that.
Step 1:
To make upgrading simpler we will clone archipelago-deployment-live (empty one) into a different folder. That way we can copy complete folders of configs and files instead of fetching them from github one by one.
Go to your home folder (for the sake of this documentation it will be /home/ec2-user, but you can also use the $HOME environment variable instead):
cd /home/ec2-user
git clone https://github.com/esmero/archipelago-deployment-live archipelago-deployment-live-1.5.0
cd archipelago-deployment-live-1.5.0
git switch 1.5.0
Now, on a terminal, cd
into your running (again: not the previously cloned one, your actual running one) archipelago-deployment-live
folder, then cd
inside the deploy/ec2-docker
subfolders and shut down your docker-compose
ensemble by running the following:
docker-compose down
Step 2:
Verify that all containers are actually down. The following command should return an empty listing:
docker ps
If anything is still running, wait a little longer and run the command again.
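Rather than re-running the command by hand, the wait can be scripted. This is a minimal sketch, assuming the docker CLI is on your PATH; the function name wait_for_containers_down, the number of attempts and the delay are illustrative choices, not part of Archipelago.

```shell
# Hypothetical helper: poll `docker ps -q` until no containers are listed.
# Arguments: max attempts (default 30), seconds between attempts (default 2).
wait_for_containers_down() {
  attempts="${1:-30}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # `docker ps -q` prints one container ID per line, nothing when all are down.
    if [ -z "$(docker ps -q)" ]; then
      echo "All containers are down."
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Containers still running after $attempts checks:" >&2
  docker ps >&2
  return 1
}

# Usage: wait_for_containers_down 30 2
```

Note that this checks every container on the host, so it will never report success if you run unrelated containers on the same machine.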
Step 3:
Note: If you are coming from the more general Upgrade 1.5.0 guide, you probably already made this backup a few minutes ago. Skip to Step 4 if you are certain you did.
Now let's tar.gz the whole ensemble with data and configs. We will exclude the local source caches generated by Cantaloupe; whether these exist depends on how customized your deployment is.
As an example we will save this into your $HOME folder. As a good practice we append the current date (YEAR-MONTH-DAY) to the filename. Here we assume today is July 16th of 2025.
We will cd back to the parent folder of your running archipelago-deployment-live folder, so three levels up, assuming you are right now inside archipelago-deployment-live/deploy/ec2-docker:
cd ../../..
sudo tar --exclude=archipelago-deployment-live/data_storage/iiifcache --exclude=archipelago-deployment-live/data_storage/iiiftmp -czvpf $HOME/archipelago-deployment-D10-20250716.tar.gz archipelago-deployment-live
The process may take a few minutes. Now let's verify that all is there and that the tar.gz
is not corrupt.
tar -tvvf $HOME/archipelago-deployment-D10-20250716.tar.gz
You will see a listing of files, and at the end you will see something like this: Archive Format: POSIX pax interchange format, Compression: gzip. If the archive is corrupt (Do you have enough space? Did your ssh connection drop?) you will see the following:
tar: Unrecognized archive format
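The exact messages vary between tar implementations, so it can be safer to rely on exit codes than on reading the output. The following is a minimal sketch, assuming gzip and tar are installed; verify_backup is a hypothetical helper name, not something shipped with Archipelago.

```shell
# Hypothetical helper: returns 0 only if the archive passes every check.
verify_backup() {
  archive="$1"
  # The archive must exist and be non-empty.
  [ -s "$archive" ] || { echo "Missing or empty: $archive" >&2; return 1; }
  # gzip -t tests the integrity of the compressed stream.
  gzip -t "$archive" 2>/dev/null || { echo "gzip check failed: $archive" >&2; return 1; }
  # Listing every member forces tar to read the whole archive.
  tar -tzf "$archive" >/dev/null 2>&1 || { echo "tar listing failed: $archive" >&2; return 1; }
  echo "OK: $archive"
}

# Usage: verify_backup "$HOME/archipelago-deployment-D10-20250716.tar.gz"
```

A passing check only tells you the archive is readable end to end; it does not prove that every file you care about was included, so a quick look at the listing is still worthwhile.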
Step 4:
cd again into your running archipelago-deployment-live folder, then cd inside the deploy/ec2-docker subfolder. Restart your docker-compose ensemble, and wait a little while for all to start.
docker-compose up -d
Step 5:
Export/back up all of your live Archipelago 1.4.0 (or 1.3.0) Drupal 10 configurations (this allows you to compare or roll back in case you lose something custom during the upgrade).
docker exec esmero-php mkdir -p config/backup
docker exec esmero-php drush cex --destination=/var/www/html/config/backup
Good. Now it's safe to begin the upgrade process.
Upgrading to Solr 9.8
Step 0: Get familiar with what changed.
Running a production server requires some informed decision making and thus, we believe, a good pre-step is reviewing what changed between releases. Specifically, focus on these two folders:
https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/config_storage/solrconfig/conf
and
https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/data_storage/solrlib
Step 1: Edit docker-compose.yml
You want to replace your current Solr service (in its entirety; please make sure indentation is 1:1) with this:
  solr:
    container_name: esmero-solr
    restart: always
    image: "solr:9.8.1"
    # If running Docker < 20.10.10 please uncomment the following lines
    # See https://solr.apache.org/guide/solr/latest/upgrade-notes/major-changes-in-solr-9.html#solr-9-2
    #security_opt:
    #  - seccomp:unconfined
    tty: true
    environment:
      SOLR_HEAP: 1024m
      SOLR_OPTS: -Dsolr.jetty.request.header.size=65535 -Dsolr.install.dir=/opt/solr
      SOLR_MODULES: "extraction,langid,ltr,analysis-extras,scripting"
      SOLR_LOG_LEVEL: "WARN"
    ports:
      - "8983:8983"
    networks:
      - host-net
      - esmero-net
    volumes:
      - ${ARCHIPELAGO_ROOT}/data_storage/solrcore:/var/solr/data
      - ${ARCHIPELAGO_ROOT}/config_storage/solrconfig:/drupalconfig
      - ${ARCHIPELAGO_ROOT}/data_storage/solrlib:/var/solr/data/lib
    entrypoint:
      - docker-entrypoint.sh
      - solr-precreate
      - drupal
      - /drupalconfig
Please double check and use any of these as reference:
- https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/deploy/ec2-docker/docker-compose-aws-s3-arm64.yml
- https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/deploy/ec2-docker/docker-compose-aws-s3.yml
For this, if you have not already, navigate to deploy/ec2-docker
and run:
docker-compose down
Then open your docker-compose.yml file, find the solr: key and replace it with the previous YAML snippet. If for some reason your volumes do not match our defaults, please adapt the snippet so the volumes match where your Solr core and libraries are saved:
nano docker-compose.yml
Save your changes.
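Before moving on, it may help to confirm that the edit actually landed in the file. This is a minimal sketch based only on strings that appear in the Solr service definition for 1.5.0; check_solr_service is a hypothetical helper name, and a custom deployment may legitimately fail some of these checks.

```shell
# Hypothetical helper: confirm the edited file references the new Solr service.
check_solr_service() {
  file="${1:-docker-compose.yml}"
  status=0
  # -F matches the strings literally; -q suppresses output.
  grep -qF 'image: "solr:9.8.1"' "$file" || { echo "Solr image is not 9.8.1 in $file" >&2; status=1; }
  grep -qF 'SOLR_MODULES:' "$file" || { echo "SOLR_MODULES is missing in $file" >&2; status=1; }
  grep -qF ':/var/solr/data/lib' "$file" || { echo "solrlib volume is missing in $file" >&2; status=1; }
  [ "$status" -eq 0 ] && echo "Solr service looks updated in $file"
  return "$status"
}

# Usage (from deploy/ec2-docker): check_solr_service docker-compose.yml
```

These are plain text matches and say nothing about indentation. To catch YAML errors you can additionally run docker-compose config -q in the same folder, which parses the file and prints nothing when it is valid.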
Step 2A: Try (hopefully) to upgrade without reindexing.
If you are coming from an "I deployed 1.4.0 and have been running Solr without touching anything on the backend" situation, Step 2A might be for you!
To be sure: if, after comparing https://github.com/esmero/archipelago-deployment-live/tree/1.5.0/config_storage/solrconfig/conf with your current live configuration found inside your archipelago-deployment-live at data_storage/solrcore/drupal/conf, you don't see MAJOR field definition changes, there is extra hope. Solr 9.8 does bring larger changes in the way Libraries (now named Modules) are used, so we will still need to customize some of your live configurations and check logs afterwards.
What is a major change?
Look for schema_extra_types.xml in both 1.5.0 and your data_storage/solrcore/drupal/conf. Diff them. If, for example, your live/running schema_extra_types.xml has a field definition like
<fieldType name="text_ngramstring" class="solr.TextField" positionIncrementGap="100" termPositions="false" termOffsets="true" storeOffsetsWithPositions="false">
while 1.5.0 has (it has)
<fieldType name="text_ngramstring" class="solr.TextField" positionIncrementGap="100" termPositions="true" termOffsets="true" storeOffsetsWithPositions="true" termVectors="true">
then text_ngramstring will differ in the way it stores its values in the physical index, and some changes might require a complete re-index. You could still try 2A, but if something fails (check the logs) you might end up having to jump to Step 2B.
If your live/running schema_extra_types.xml has entries with name="some_name" that are not present in 1.5.0, then you might need to manually edit your future live schema_extra_types.xml, after copying from 1.5.0, to ensure your custom types are preserved. If unsure, jump to Step 2B.
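Spotting custom types by eye in a long XML file is error prone, so the comparison can be partly automated. This is a rough sketch, assuming each fieldType opening tag carries its name attribute on the same line as the tag; missing_field_types is a hypothetical helper name. It only reports names that exist in your live file but not in the 1.5.0 one, and it does not detect attribute changes inside a type that exists in both.

```shell
# Hypothetical helper: print fieldType names found in the first file
# but not in the second.
missing_field_types() {
  live="$1"
  reference="$2"
  names_live="$(mktemp)"
  names_ref="$(mktemp)"
  # Extract name="..." from each <fieldType ...> opening tag.
  grep -o '<fieldType[^>]*' "$live" | sed -n 's/.* name="\([^"]*\)".*/\1/p' | sort -u > "$names_live"
  grep -o '<fieldType[^>]*' "$reference" | sed -n 's/.* name="\([^"]*\)".*/\1/p' | sort -u > "$names_ref"
  # comm -23 keeps lines unique to the first (sorted) file.
  comm -23 "$names_live" "$names_ref"
  rm -f "$names_live" "$names_ref"
}

# Usage (paths as assumed elsewhere in this guide):
#   missing_field_types data_storage/solrcore/drupal/conf/schema_extra_types.xml \
#     /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/schema_extra_types.xml
```

An empty result means no live-only type names were found. For changed attributes such as the termVectors example, a plain diff -u of the two files is still the tool to use.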
If you are still here, then let's upgrade!
We need the new configurations for your Solr. Remember we downloaded a reference/empty Archipelago Deployment Live 1.5.0. For this guide let's assume it is located at /home/ec2-user/archipelago-deployment-live-1.5.0.
We are going to use the files there to replace your startup and also live configs. cd back to your live deployment, assuming here it is (still) /home/ec2-user/archipelago-deployment-live.
Run (line by line):
cd /home/ec2-user/archipelago-deployment-live
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* config_storage/solrconfig/conf/.
sudo cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* data_storage/solrcore/drupal/conf/.
sudo chown -R 8983:8983 data_storage/solrcore
Now we need to remove the old OCR library and replace with the new one
rm /home/ec2-user/archipelago-deployment-live/data_storage/solrlib/*.jar
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/data_storage/solrlib/solr-ocrhighlighting-0.9.4-SNAPSHOT.jar data_storage/solrlib/.
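A leftover older jar next to the new one is an easy mistake to make at this point. The following is a minimal sketch of a sanity check, assuming the new library is meant to be the only .jar file in data_storage/solrlib (which is what the rm and cp commands above produce); check_solrlib is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the expected jar is the single
# .jar file in the given folder.
check_solrlib() {
  dir="${1:-data_storage/solrlib}"
  expected="${2:-solr-ocrhighlighting-0.9.4-SNAPSHOT.jar}"
  [ -f "$dir/$expected" ] || { echo "Missing $dir/$expected" >&2; return 1; }
  # Count .jar files directly inside the folder.
  count="$(find "$dir" -maxdepth 1 -type f -name '*.jar' | wc -l | tr -d ' ')"
  [ "$count" -eq 1 ] || { echo "Expected 1 jar in $dir, found $count" >&2; return 1; }
  echo "solrlib OK: $expected"
}

# Usage (from your live archipelago-deployment-live folder): check_solrlib
```

If you deliberately keep additional custom libraries in that folder, the count check will fail and can simply be ignored.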
Done. You can jump to Step 3!
Step 2B: Wipe clean.
You decided Step 2A was not right for you and you are 100% aware that going this route will require reindexing which, depending on the size of your repository, might take from several hours to days. Are you sure?
Get the new configs. Get the new OCR Highlight library
Wait! (Breathe.)
Repeating: this step is only required if you are moving from Solr 8.x to 9.x, or if inside 9.x you have Solr field type definition changes. If you had a stock 1.3.0 with Solr 9.1 or a stock 1.4.0 with Solr 9.2 and want to move to Solr 9.8, you can skip deleting everything and go back to Step 2A!
This step requires some nerve. Be sure you know where you are inside your terminal (always)
Inside your archipelago-deployment-live folder run:
cd data_storage/solrcore
pwd
You should see something like
/home/ec2-user/archipelago-deployment-live/data_storage/solrcore
Which means you are in the correct folder. Now it is time to clean your index (really think twice here, OK? You have a backup. Never run any of these without a backup):
sudo rm -rf *
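Because a recursive delete in the wrong folder is unrecoverable, the pwd check and the delete can be tied together so the delete only runs when the check passes. This is a minimal sketch, assuming your deployment folder is literally named archipelago-deployment-live; is_solrcore_dir is a hypothetical helper name, and you would need to adapt the pattern if your folder is named differently.

```shell
# Hypothetical guard: succeed only when the given path is a Solr core
# data folder inside an archipelago-deployment-live checkout.
is_solrcore_dir() {
  case "$1" in
    */archipelago-deployment-live/data_storage/solrcore) return 0 ;;
    *) echo "Refusing: $1 is not .../archipelago-deployment-live/data_storage/solrcore" >&2
       return 1 ;;
  esac
}

# Usage, from inside the folder:
#   is_solrcore_dir "$(pwd)" && sudo rm -rf -- ./*
```

The guard itself deletes nothing; it only decides whether the rm after && gets to run.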
Now we need the new configurations for your Solr (so the Docker container can re-create the index from scratch). Remember we downloaded a reference/empty Archipelago Deployment Live 1.5.0 at /home/ec2-user/archipelago-deployment-live-1.5.0.
We are going to use the files there to replace your own configs. cd back to your live deployment, assuming here it is (still) /home/ec2-user/archipelago-deployment-live:
cd /home/ec2-user/archipelago-deployment-live
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/config_storage/solrconfig/conf/* config_storage/solrconfig/conf/.
Now we need to remove the old OCR library and replace with the new one
rm /home/ec2-user/archipelago-deployment-live/data_storage/solrlib/*.jar
cp -rpv /home/ec2-user/archipelago-deployment-live-1.5.0/data_storage/solrlib/solr-ocrhighlighting-0.9.4-SNAPSHOT.jar data_storage/solrlib/.
Done.
Step 3: docker pull and check
Time to fetch the latest Solr:
Navigate to your deploy/ec2-docker
and run:
docker compose pull
docker compose up -d
Give all a little time to start. Please be patient. To ensure all is well, run (more than once if necessary) the following:
docker ps
You should see something like this if you synced all containers to the latest (your versions and database might vary depending on your server's platform; hashes and uptimes will also differ depending on when you ran the commands):
CONTAINER ID   IMAGE                                              COMMAND                  CREATED          STATUS          PORTS                              NAMES
5b06ee366f58   jonasal/nginx-certbot                              "/docker-entrypoint.…"   10 minutes ago   Up 10 minutes   0.0.0.0:8001->80/tcp               esmero-web
1409f41b6068   solr:9.8.1                                         "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   0.0.0.0:8983->8983/tcp             esmero-solr
e9361ed424ab   esmero/cantaloupe-s3:6.0.5-noturbojpeg-multiarch   "sh -c 'java -Dcanta…"   10 minutes ago   Up 10 minutes   0.0.0.0:8183->8182/tcp             esmero-cantaloupe
1dc524aeb6b4   mariadb:10.6.22-focal                              "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   3306/tcp                           esmero-db
85bedadf9732   redis:6.2-alpine                                   "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes                                      esmero-redis
6a9e9d8647a9   minio/minio:RELEASE.2022-06-11T19-55-32Z           "/usr/bin/docker-ent…"   10 minutes ago   Up 10 minutes   0.0.0.0:9000-9001->9000-9001/tcp   esmero-minio
aa82d6b42ec6   esmero/php-8.3-fpm:1.5.0-multiarch                 "docker-php-entrypoi…"   10 minutes ago   Up 10 minutes   9000/tcp                           esmero-php
458e826199bd   esmero/esmero-nlp:1.4.2-multiarch                  "/usr/local/bin/entr…"   10 minutes ago   Up 10 minutes   0.0.0.0:6400->6400/tcp             esmero-nlp
Important here is the STATUS column. It should read Up followed by a time that keeps increasing every time you run docker ps again (and again). A container that keeps restarting will instead show a time that resets.
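Reading the STATUS column by eye works, but the same check can be done mechanically. This is a minimal sketch that flags any container whose status does not start with Up; not_up is a hypothetical helper name, and it relies on the --format option of docker ps to print only the name and status separated by a tab.

```shell
# Hypothetical helper: read "name<TAB>status" lines on stdin and print
# the names of containers whose status does not start with "Up".
# Returns non-zero if at least one such container was found.
not_up() {
  awk -F '\t' '$2 !~ /^Up / { print $1; bad = 1 } END { exit bad }'
}

# Usage:
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | not_up
```

The -a flag matters here: without it, containers that have already exited would not be listed at all. Running the check twice a minute or so apart helps to catch a container that is stuck in a restart loop.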
Check your Solr logs for failure messages.
docker logs -f esmero-solr -n 100
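Since the log level is set to WARN, the output can still be long, and filtering it for failure keywords makes problems easier to spot. This is a minimal sketch; scan_solr_log is a hypothetical helper name and the keyword list is an assumption about what a failing Solr start typically logs, not an exhaustive rule.

```shell
# Hypothetical helper: read log text on stdin, print lines that look
# like failures, and return non-zero if any were found.
scan_solr_log() {
  if grep -E 'ERROR|SEVERE|Exception|ClassNotFound'; then
    return 1
  fi
  echo "No error lines found."
}

# Usage:
#   docker logs -n 500 esmero-solr 2>&1 | scan_solr_log
```

A clean result here is encouraging but not proof that the index is healthy, which is why trying real searches remains part of the check.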
Try searching (general one first, then a full text involving OCR - highlights - e.g in IABookReader or Mirador) in your repo.
Step 4 (Optional): New Drupal Solr Field configs
You might want to review/compare (now or later) your current Drupal Search API Index against our most current one.
Specifically:
- https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/drupal/config/sync/search_api.server.esmero_solr.yml
- https://github.com/esmero/archipelago-deployment-live/blob/1.5.0/drupal/config/sync/search_api.index.default_solr_index.yml
and anything else that starts with search_api.
Whether or not you synchronize Drupal's Search API Server, Index and fields is a personal decision (optional). We do recommend it, but your Solr Index might have many customizations already, so a better/pro approach would be to diff the .yml files and decide selectively.
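To decide which files deserve a closer look, a first pass can simply classify each search_api configuration file. This is a minimal sketch; compare_search_api_configs is a hypothetical helper name, and the usage paths assume that the configuration export from the backup step is visible on the host as drupal/config/backup and that the 1.5.0 reference clone lives where this guide put it.

```shell
# Hypothetical helper: for every search_api.*.yml in the reference folder,
# report whether the live copy is identical, different, or missing.
compare_search_api_configs() {
  live_dir="$1"
  ref_dir="$2"
  for ref in "$ref_dir"/search_api.*.yml; do
    [ -e "$ref" ] || continue   # no matches: the glob stays literal
    name="$(basename "$ref")"
    if [ ! -f "$live_dir/$name" ]; then
      echo "MISSING $name"
    elif cmp -s "$live_dir/$name" "$ref"; then
      echo "SAME $name"
    else
      echo "DIFFERENT $name"
    fi
  done
}

# Usage (assumed paths):
#   compare_search_api_configs drupal/config/backup \
#     /home/ec2-user/archipelago-deployment-live-1.5.0/drupal/config/sync
```

For each file reported as DIFFERENT, running diff -u on the two copies shows exactly which fields or processors changed, which is the information you need to decide selectively.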
If you make Drupal Search API changes, go to Step 5. If you executed Step 2B (wipe), go to Step 5. If neither, you are DONE! (Step 6)
Step 5: Re-index Drupal Search API
Only run this if you ran Step 2B and/or Step 4. SKIP IF NOT.
Run the following:
docker exec esmero-php drush search-api-reindex
docker exec esmero-php drush search-api-index
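On a large repository the indexing command can run for a long time, so it can be reassuring to watch progress from a second terminal. This is a minimal sketch, assuming the Search API drush command search-api:status is available in your container (it ships with the Search API module); watch_index_status and its two arguments are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: print the Search API index status every N seconds.
# Arguments: delay in seconds (default 60), number of rounds (default 0,
# which means loop until interrupted with Ctrl+C).
watch_index_status() {
  delay="${1:-60}"
  rounds="${2:-0}"
  n=0
  while :; do
    docker exec esmero-php drush search-api:status
    n=$((n + 1))
    [ "$rounds" -gt 0 ] && [ "$n" -ge "$rounds" ] && break
    sleep "$delay"
  done
}

# Usage (second terminal): watch_index_status 60
```

The status output lists, per index, how many items are indexed out of the total, so you are done when the two numbers match.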
Check your Drupal logs, try some searches.
STEP 6: DONE!
Done! Hurrah!
Need help? Strange logs? Searching for happiness leads to no results? Missed a step? Need a hug or someone that listens to you in silence?
If you see any issues or errors or need help with a step, please let us know (ASAP!). You can either open an issue
in this repository or use the Google Group. We are here to help.