Tuesday, August 28, 2018

OSSEC postfix email using localhost doesn't work

OSSEC had issues sending me emails, logging the following error message in /var/ossec/logs/ossec.log:

ERROR: Error Sending email to localhost (smtp server)

OSSEC was configured to use Postfix as my SMTP host in /var/ossec/etc/ossec-server.conf:

  <global>  
   <email_notification>yes</email_notification>  
   <email_to>philip.wu@anu.edu.au</email_to>  
   <smtp_server>localhost</smtp_server>  
   <email_from>patient-lookup@130.56.244.180</email_from>  
  </global>  


Once I changed localhost to 127.0.0.1, postfix emails worked:

  <global>  
   <email_notification>yes</email_notification>  
   <email_to>philip.wu@anu.edu.au</email_to>  
   <smtp_server>127.0.0.1</smtp_server>  
   <email_from>patient-lookup@130.56.244.180</email_from>  
  </global>  
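
Why the hostname fails isn't obvious. One likely explanation is that localhost was resolving to the IPv6 loopback (::1) while Postfix was only listening on IPv4, though I haven't confirmed this. You can check the resolution, what Postfix is listening on, and the SMTP handshake yourself:

getent hosts localhost   # what does localhost resolve to?
ss -tlnp | grep :25      # which addresses is Postfix listening on?
telnet 127.0.0.1 25      # manual SMTP handshake against the IPv4 loopback

Remember to restart OSSEC after editing the configuration:

/var/ossec/bin/ossec-control restart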

Reference:

https://github.com/ossec/ossec-hids/issues/1122 

Tuesday, August 21, 2018

Encrypting postgres backups

Lately I've been dabbling in the world of security. While I'd be more interested in doing other things, like building features and tackling research problems, security should be part of everyday thinking when designing solutions. One area of security focuses on databases.

While I've made the effort to doubly encrypt the postgres data at rest (first at the table column level, where certain fields are encrypted, and second at the file system level, with postgres living on a separate encrypted attached volume), these efforts would be useless if the database backups were stored as plain text. True, the encrypted fields would remain encrypted, but for peace of mind, let's encrypt the backups themselves!

Here I'll be using GPG (GNU Privacy Guard) encryption on a CentOS 7 machine with a postgres database. While there is a lot of information about GPG on the web, I couldn't find a comprehensive article on how to do this. So here we go!

First, let's install GPG:


yum install gnupg2


Since I'm using the postgres user to perform the automated backups with ident authentication, we need to switch to the postgres user (assuming we are already the root user):


# become the postgres user
su postgres


When generating GPG keys, gpg asks for a passphrase via the TTY. Unfortunately, GPG doesn't work well when running in an 'su session' like the one we just started with the command above. To work around this, we issue the following command:

# workaround to generate gpg key in a su session as postgres
script /dev/null

Writing the typescript to /dev/null discards it, while script itself allocates a new pseudo-terminal owned by the postgres user, so gpg no longer hits the permission problem on the controlling terminal.

Now we can generate the GPG keys for the postgres user. You will be asked for a passphrase; keep it somewhere safe.

bash-4.2$ gpg2 --gen-key
gpg (GnuPG) 2.0.22; Copyright (C) 2013 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection?
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048)
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

GnuPG needs to construct a user ID to identify your key.

Real name: postgres
Email address:
Comment:
You selected this USER-ID:
    "postgres"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.

The important thing to take note of is the 'Real name' which I've specified as 'postgres'. We will use this 'Real name' later when we perform the encryption.

At this stage, it seemed to just hang, with no indication that it was doing anything at all. On my first attempt I let it sit for over an hour and still nothing. It turns out gathering entropy takes a long time if there's no system activity. So let's introduce 'random activity' in another terminal:


yum install rng-tools
rngd -r /dev/urandom



After running the rngd command, you'll notice almost immediately in the other terminal that the GPG key generation has completed. Now you can kill the rngd process that's still running in the background:


ps -aux | grep rngd
root     25652  0.0  0.0  13216   368 ?        Ss   14:37   0:00 rngd -r /dev/urandom
root     25665  0.0  0.0 112704   976 pts/0    S+   14:37   0:00 grep --color=auto rngd
kill -9 25652
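
To confirm the key was created for the postgres user, list the keys:

gpg2 --list-keys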


To troubleshoot entropy availability, you can monitor the available entropy with the command below; it should sit at around 1450 when idle, and much lower while entropy is being consumed:

watch cat /proc/sys/kernel/random/entropy_avail

Now that we have our GPG keys, we are ready to encrypt files. Here I've created a script to perform the postgres backup, compression and encryption all in one step:

pg_dump -U postgres db_name | gzip > /backups/db_backup_$(date +%Y-%m-%d).psql.gz
gpg -e -r postgres /backups/db_backup_$(date +%Y-%m-%d).psql.gz
rm -f /backups/db_backup_$(date +%Y-%m-%d).psql.gz
chmod -R 0600 /backups/*.gpg

The first line uses pg_dump to generate a compressed GZ backup file.
The second line takes the GZ file and encrypts it, creating a new GPG file. The -e argument tells GPG to encrypt, and the -r argument specifies the recipient, which in this case is the 'postgres' Real name we specified earlier when generating the GPG keys.
Since GPG creates a new file, the third line removes the original GZ file.
The fourth line then restricts the encrypted backups to read/write permissions for the postgres user only.

You can run the script on a cron job to routinely do your backups.
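
For example, a crontab entry for the postgres user might look like this (the script path here is hypothetical):

# crontab -e as the postgres user: run the backup script daily at 2am
0 2 * * * /var/lib/pgsql/backup_db.sh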

Of course, before you put this into production, you should check to ensure you can successfully decrypt the backups.

su postgres
script /dev/null
gpg postgres_backup.gpg
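
To also verify that the compressed dump inside is intact, you can decrypt to stdout and test the gzip stream in one pipe (a sketch, using the file naming from the backup script above):

gpg -d /backups/db_backup_$(date +%Y-%m-%d).psql.gz.gpg | gzip -t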



Sunday, July 22, 2018

SELinux and postgres troubles

OS version: CentOS 7

Upon enabling SELinux, I noticed that the postgres service hadn't started. When I checked the logs, I saw the following error message:


 [root@webserver data]# systemctl status postgresql.service  
 ● postgresql.service - PostgreSQL database server  
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)  
   Active: failed (Result: exit-code) since Sun 2018-07-22 23:18:52 UTC; 8s ago  
  Process: 2903 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=1/FAILURE)  
  Process: 2897 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)  
 Jul 22 23:18:51 webserver.novalocal systemd[1]: Starting PostgreSQL database server...  
 Jul 22 23:18:51 webserver.novalocal pg_ctl[2903]: postgres cannot access the server configuration file "/var/lib/pgsql/data/postgresql.conf": Permission denied  
 Jul 22 23:18:52 webserver.novalocal pg_ctl[2903]: pg_ctl: could not start server  
 Jul 22 23:18:52 webserver.novalocal systemd[1]: postgresql.service: control process exited, code=exited status=1  
 Jul 22 23:18:52 webserver.novalocal systemd[1]: Failed to start PostgreSQL database server.  
 Jul 22 23:18:52 webserver.novalocal systemd[1]: Unit postgresql.service entered failed state.  
 Jul 22 23:18:52 webserver.novalocal systemd[1]: postgresql.service failed.  

To view the SELinux security context:
 [root@webserver var]# ls -Z /var/lib/pgsql/data/  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 base  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 global  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_clog  
 -rw-------. postgres postgres system_u:object_r:unlabeled_t:s0 pg_hba.conf  
 -rw-------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_ident.conf  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_log_t:s0 pg_log  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_multixact  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_notify  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_serial  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_snapshots  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_stat_tmp  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_subtrans  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_tblspc  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_twophase  
 -rw-------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 PG_VERSION  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_xlog  
 -rw-------. postgres postgres system_u:object_r:default_t:s0  postgresql.conf  
 -rw-------. postgres postgres system_u:object_r:postgresql_db_t:s0 postmaster.opts  

We can see that the postgresql.conf file was incorrectly assigned a type of default_t.

I noticed several other files in the postgresql data folder had a similar problem. To fix the type for all files under the data folder, run the following command:


 chcon -R system_u:object_r:postgresql_db_t:s0 /var/lib/pgsql/data/  

Rechecking the SELinux contexts:

 [root@webserver var]# ls -Z /var/lib/pgsql/data/  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 base  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 global  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_clog  
 -rw-------. postgres postgres system_u:object_r:postgresql_db_t:s0 pg_hba.conf  
 -rw-------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_ident.conf  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_log_t:s0 pg_log  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_multixact  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_notify  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_serial  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_snapshots  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_stat_tmp  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_subtrans  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_tblspc  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_twophase  
 -rw-------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 PG_VERSION  
 drwx------. postgres postgres unconfined_u:object_r:postgresql_db_t:s0 pg_xlog  
 -rw-------. postgres postgres system_u:object_r:postgresql_db_t:s0 postgresql.conf  
 -rw-------. postgres postgres system_u:object_r:postgresql_db_t:s0 postmaster.opts  

Now that it's fixed, start postgresql:


 service postgresql start  
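
One caveat: contexts set with chcon don't survive a full filesystem relabel. Since the stock policy's default type for /var/lib/pgsql is postgresql_db_t, letting restorecon reapply the policy defaults should achieve the same fix while staying consistent with the policy:

 restorecon -Rv /var/lib/pgsql/data/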

References:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/security-enhanced_linux/sect-security-enhanced_linux-working_with_selinux-selinux_contexts_labeling_files

Thursday, May 24, 2018

Gradle OutOfMemoryError



OutOfMemoryError: it's possible that gradle is running in 32-bit mode when it should be running in 64-bit mode. To check which mode gradle is running in, dump out a few system properties in the build.gradle file:

println System.properties['os.arch']
println System.properties['sun.arch.data.model']

If sun.arch.data.model has a value of 32, then it’s running in 32-bit mode.

Double check that the JAVA_HOME environment variable is set to a path similar to
C:\Program Files\Java\jdk1.8.0_171
If the path points to C:\Program Files (x86), then Java is likely running in 32-bit mode. In this case, reinstall the 64-bit JDK.
Another symptom of running in 32-bit mode: if you try increasing the memory allocation above 1 GB, you may get the following error:

C:\Users\Philip\git\lims>gradle clean
Error occurred during initialization of VM
Could not reserve enough space for 2097152KB object heap
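
Once you're on a 64-bit JVM, Gradle's heap can be raised explicitly in gradle.properties (the 2 GB value below is just an example):

# gradle.properties in the project root (or in ~/.gradle/)
org.gradle.jvmargs=-Xmx2g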

Sunday, May 14, 2017

ElasticSearch: Issues Adding a new node

The Issue

The new node was visible in the cluster, but existing shards were not relocating to it.

The steps

I had a pre-existing elasticsearch cluster of 3 nodes and went about adding a new node. In a round-robin fashion, I updated the elasticsearch.yml configuration of the pre-existing nodes to include the new node, updating the list of hosts and the minimum number of master nodes:

elasticsearch.yml
 discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  
 discovery.zen.minimum_master_nodes: 3  

I then restarted each node and checked the health status as follows:


 [root@mongo-elastic-node-1 centos]# curl -XGET 10.0.0.1:9200/_cluster/health?pretty  
 {  
  "cluster_name" : "cpi",  
  "status" : "green",  
  "timed_out" : false,  
  "number_of_nodes" : 4,  
  "number_of_data_nodes" : 4,  
  "active_primary_shards" : 40,  
  "active_shards" : 71,  
  "relocating_shards" : 2,  
  "initializing_shards" : 0,  
  "unassigned_shards" : 0,  
  "delayed_unassigned_shards" : 0,  
  "number_of_pending_tasks" : 0,  
  "number_of_in_flight_fetch" : 0,  
  "task_max_waiting_in_queue_millis" : 0,  
  "active_shards_percent_as_number" : 100.0  
 }  

The important item to notice above is the bit about "relocating_shards": the cluster is relocating 2 shards. To find out which shards are going where, you can check with this command:

 [root@mongo-elastic-node-1 centos]# curl -XGET http://10.0.0.9:9200/_cat/shards | grep RELO  
  % Total  % Received % Xferd Average Speed  Time  Time   Time Current  
                  Dload Upload  Total  Spent  Left Speed  
 100 7881 100 7881  0   0  318k   0 --:--:-- --:--:-- --:--:-- 334k  
 cpi12         2 p RELOCATING 6953804  5.8gb 10.0.0.2 cpi2 -> 10.0.0.4 fBmdkD2gT6-jTJ6k_bEF0w cpi4  
 cpi12         0 r RELOCATING 6958611  5.5gb 10.0.0.3 cpi3 -> 10.0.0.4 fBmdkD2gT6-jTJ6k_bEF0w cpi4  

Here it's saying that the cluster is trying to send shards belonging to the index called cpi12 from nodes cpi2 and cpi3 to node cpi4. More specifically, it's RELOCATING shard #2 and shard #0 to cpi4. To monitor its progress, I logged into cpi4 to see if the diskspace usage was going up. And here is where I noticed my first problem:


 [root@elastic-node-4 elasticsearch]# df -h  
 Filesystem   Size Used Avail Use% Mounted on  
 /dev/vdb     69G  52M  66G  1% /mnt  

The mounted folder where I expected to find my elasticsearch data remained unchanged at 52 MB.

Debugging

I remained stumped on this one for a long time and did the following checks:

  • The elasticsearch.yml config file for every node, ensuring that discovery.zen.ping.unicast.hosts was correct.
  • Every node could ping the new node and vice versa.
  • Every node could access ports 9200 and 9300 on the new node and vice versa, using the telnet command.
  • Every node had sufficient diskspace for the shard relocation.
  • The new node had the right permissions to write to its elasticsearch folder.
  • The cluster settings (curl 'http://localhost:9200/_cluster/settings?pretty'), looking for cluster.routing settings.
  • Restarted elasticsearch on each node 3 times over.
However, none of the above solved the issue. Even worse, the repeated restarts of each node managed to get my cluster into an even worse state, where some of the shards became UNASSIGNED:

 [root@mongo-elastic-node-1 bin]# curl -XGET http://10.0.0.1:9200/_cat/shards | grep UNASS  
  % Total  % Received % Xferd Average Speed  Time  Time   Time Current  
                  Dload Upload  Total  Spent  Left Speed  
 100 5250 100 5250  0   0  143k   0 --:--:-- --:--:-- --:--:-- 146k  
 .marvel-es-2017.05.13 0 p UNASSIGNED  
 .marvel-es-2017.05.13 0 r UNASSIGNED  
 .marvel-es-2017.05.14 0 p UNASSIGNED  
 .marvel-es-2017.05.14 0 r UNASSIGNED  
 cpi14         1 p UNASSIGNED  
 cpi13         1 p UNASSIGNED  
 cpi13         4 p UNASSIGNED  

After much browsing on the web, I found one forum post mentioning that the state of the plugins on all nodes must be exactly the same: http://stackoverflow.com/questions/28473687/elasticsearch-cluster-no-known-master-node-scheduling-a-retry

The solution

The question about the plugins jogged my memory: I had previously installed the marvel plugin. To see what plugins are installed on each node, run the plugin command from the command-line:

 [root@elastic-node-3 elasticsearch]# cd /usr/share/elasticsearch/bin  
 [root@elastic-node-3 bin]# ./plugin list  
 Installed plugins in /usr/share/elasticsearch/plugins:  
   - license  
   - marvel-agent  

It turned out my pre-existing 3 nodes each had the license and marvel-agent plugins installed, whereas the fresh install of the 4th node had no plugins at all. Because of this, the nodes were able to acknowledge each other but refused to talk. To fix this, I manually removed the plugins from each node:

 [root@elastic-node-3 bin]# ./plugin remove license  
 -> Removing license...  
 Removed license  
 [root@elastic-node-3 bin]# ./plugin remove marvel-agent  
 -> Removing marvel-agent...  
 Removed marvel-agent  

Before I could see if shard relocation would work, I first had to assign the UNASSIGNED shards:

 [root@mongo-elastic-node-1 elasticsearch]# curl -XPOST -d '{ "commands" : [{ "allocate" : { "index": "cpi14", "shard":1, "node":"cpi4", "allow_primary":true } }]}' localhost:9200/_cluster/reroute?pretty  
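
Repeating this by hand for many shards gets tedious, so a small shell loop can issue the reroute for each one. This is a rough sketch; it assumes the ES 2.x _cat/shards output shown above (column 4 holds the state) and that every unassigned shard should be allocated to cpi4:

 # issue an allocate command for every UNASSIGNED shard
 curl -s localhost:9200/_cat/shards | awk '$4 == "UNASSIGNED" {print $1, $2}' | \
 while read index shard; do
   curl -XPOST localhost:9200/_cluster/reroute -d "{
     \"commands\" : [{ \"allocate\" : {
       \"index\" : \"$index\", \"shard\" : $shard,
       \"node\" : \"cpi4\", \"allow_primary\" : true } }] }"
 done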

I had to repeat this command (or use a loop like the one above) for every UNASSIGNED shard. Checking the cluster health, I could see that there were no more unassigned shards and that 2 shards were currently relocating:

 [root@elastic-node-4 elasticsearch]# curl -XGET localhost:9200/_cluster/health?pretty  
 {  
  "cluster_name" : "cpi",  
  "status" : "green",  
  "timed_out" : false,  
  "number_of_nodes" : 4,  
  "number_of_data_nodes" : 4,  
  "active_primary_shards" : 40,  
  "active_shards" : 71,  
  "relocating_shards" : 2,  
  "initializing_shards" : 0,  
  "unassigned_shards" : 0,  
  "delayed_unassigned_shards" : 0,  
  "number_of_pending_tasks" : 0,  
  "number_of_in_flight_fetch" : 0,  
  "task_max_waiting_in_queue_millis" : 0,  
  "active_shards_percent_as_number" : 100.0  
 }  

Checking the diskspace usage on the new node again showed that the shards were indeed relocating this time. Yay!

References

http://stackoverflow.com/questions/23656458/elasticsearch-what-to-do-with-unassigned-shards

http://stackoverflow.com/questions/28473687/elasticsearch-cluster-no-known-master-node-scheduling-a-retry

https://www.elastic.co/guide/en/elasticsearch/plugins/2.2/listing-removing.html

Wednesday, May 3, 2017

MongoDB: switching to the WiredTiger storage engine

We were already running in production with a mongodb cluster of 3 nodes (replicated) that was running out of diskspace, each node having a 750 GB drive at 77% usage. The obvious solution was to expand the diskspace, but at the same time I wanted to be more efficient with the disk space usage itself.

Previously we were using the storage engine called MMAPv1, which has no support for compression; I wanted to switch over to the WiredTiger storage engine, which does support compression options.

Here I describe the strategy I used:


Since my mongoDB cluster was replicated, I was able to take down one node at a time to perform the switch to WiredTiger. Once finished with one node, I could bring it back up and take down the next, and so on until all nodes were upgraded. Done this way, there was no downtime whatsoever from the user's perspective.


For each node I did the following:



  • Shut down the mongod service
  • Moved the mongo data folder (in my case /var/lib/mongo) to another attached storage volume, as a backup in case the procedure failed
  • Recreated the mongo data folder (/var/lib/mongo) and assigned the appropriate permissions: chown mongod:mongod /var/lib/mongo
  • Modified the /etc/mongod.conf configuration file to include the following: storageEngine=wiredTiger (see the note on config formats after this list)
  • Restarted the mongod service
  • Checked that wiredTiger was configured correctly from the mongo command-line:
 db.serverStatus().storageEngine  
 { "name" : "wiredTiger", "supportsCommittedReads" : true }  

Now that the node is back up and running, replication will happen in the background. If you head over to your primary mongo node and run rs.status(), you should see the new node in state STARTUP2.
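
A quick way to watch each member's state from the command line (a sketch using mongo's --eval; run it against the primary):

 mongo --quiet --eval 'rs.status().members.forEach(function(m) { print(m.name + "  " + m.stateStr); })'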
Once the node has replicated successfully, repeat the same procedure for the next node.

Reference:


https://docs.mongodb.com/v3.0/release-notes/3.0-upgrade/?_ga=1.86531032.1131483509.1428671022#change-replica-set-storage-engine-to-wiredtiger

https://askubuntu.com/questions/643252/how-to-migrate-mongodb-2-6-to-3-0-with-wiredtiger

Tuesday, March 14, 2017

mongodb won't start: Data directory /data/db not found

Our VM provider suffered a hardware failure and one of our mongo nodes failed to start up. Using the command:

 sudo mongod 

I got the following error message:
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] MongoDB starting : pid=1347 port=27017 dbpath=/data/db 64-bit host=mongodb-node-3.novalocal  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] db version v3.2.5  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] git version: 34e65e5383f7ea1726332cb175b73077ec4a1b02  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] allocator: tcmalloc  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] modules: none  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] build environment:  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten]   distmod: rhel70  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten]   distarch: x86_64  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten]   target_arch: x86_64  
 2017-03-15T09:05:27.963+1100 I CONTROL [initandlisten] options: {}  
 2017-03-15T09:05:27.985+1100 I STORAGE [initandlisten] exception in initAndListen: 29 Data directory /data/db not found., terminating  
 2017-03-15T09:05:27.985+1100 I CONTROL [initandlisten] dbexit: rc: 100  
The important error message is on the STORAGE line: "Data directory /data/db not found".

It appeared mongo was completely ignoring the dbpath configured in /etc/mongod.conf. That part is actually expected: run by hand with no arguments, mongod doesn't read /etc/mongod.conf at all (the init scripts pass it explicitly with -f /etc/mongod.conf), so it falls back to the default /data/db.

So what was going on?

Turns out that mongo's journal folder got corrupted when the server shut down abruptly, and removing the /var/lib/mongo/journal folder solved the problem.
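
Roughly, the cleanup looked like this (assuming the dbpath from /etc/mongod.conf is /var/lib/mongo; moving the journal aside is safer than deleting it outright):

 # stop the service in case it is partially running
 sudo systemctl stop mongod
 # move the corrupted journal out of the way instead of deleting it
 sudo mv /var/lib/mongo/journal /var/lib/mongo/journal.bak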


Then I restarted mongod and that got everything back up and running again!

Reference: http://stackoverflow.com/questions/20729155/mongod-shell-doesnt-start-data-db-doesnt-exsist