Dynamically List Git Branches in Jenkins Parameter

The Jenkins git parameter plugin doesn't list git branches if you work with Bitbucket, or at least it didn't work for me, so I had to find another solution.

To get my git branches dynamically in a parameter I'm using the Active Choices Plug-in with two scripts: a groovy script that returns the results to Jenkins, and a bash wrapper that the groovy script uses to get the list of git branches (because I don't really know groovy :))

Prerequisite

Before using this workaround you need to have git configured on your Jenkins server.

How To

  • create a get_git_branches.sh bash script that lists your git branches (a quick shell sanity check for it is shown at the end of this post)
vi /usr/local/bin/get_git_branches.sh
#!/bin/bash
# print every branch and tag ref of the repository passed as the first argument
GIT_URL=$1
git ls-remote --heads --tags ${GIT_URL} | awk '{print $NF}'
  • make sure the script is executable
chmod +x /usr/local/bin/get_git_branches.sh
  • In the Jenkins job configuration add an "Active Choices Reactive Parameter"
  • In the name field enter BRANCH (or whatever you want)
  • Click on "Groovy Script" and enter the following script:
// call the bash wrapper and collect one branch/tag ref per line
tags = []
text = "get_git_branches.sh https://user:[email protected]/project/repo_name.git".execute().text
text.eachLine { tags.push(it) }
return tags

[Screenshot: Active Choices Reactive Parameter configuration]

 

  • In the "Source Code Management" section, in "Branch to build", enter ${BRANCH}

[Screenshot: Source Code Management branch configuration]
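If the parameter comes back empty, a quick way to sanity-check the bash wrapper from the Jenkins server shell (using the same placeholder repository URL as above) is:

# expect one ref per line, e.g. refs/heads/master or refs/tags/v1.0
/usr/local/bin/get_git_branches.sh https://user:[email protected]/project/repo_name.git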

If you have better suggestions or a better groovy script, please write a comment.

 

Automatic Backup of AWS instances

There is no built-in option in AWS to back up instances automatically, so I created a ruby script that can run from crontab and create AMI images from EC2 instances automatically.

aws_ami_autobackup.rb works with EC2 tags: the script gets a tag and a value and creates an AMI from every instance that has this tag and value.

Here is how to install and use the script

Prerequisite

  • Install ruby (I use ruby 2.2) with the aws-sdk-resources gem.
  • Create an IAM account with privileges to create and remove EC2 snapshots and AMIs and save its access key and secret key. The quickest way is to use the AmazonEC2FullAccess policy.
  • Create a credentials file in ~/.aws/credentials for the user that will run the tool:
vi ~/.aws/credentials
[default]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY

 

How To Use aws_ami_autobackup.rb

Here are a few examples of how to use the tool:

  • To take an AMI of all instances that have the tag daily_backup with a value of true in the us-east-1 region and keep it for 7 days:
/usr/local/bin/aws_ami_autobackup.rb -t daily_backup -v true -r us-east-1 -x 7
  • To take an AMI of all instances that have the tag daily_backup with a value of true in the us-east-1 region from multiple profiles (AWS accounts):
for i in dev qa test; do /usr/local/bin/aws_ami_autobackup.rb -t daily_backup -v true -r us-east-1 -x 7 -p ${i}; done
  • Create a cronjob that takes an AMI every day at 00:00 and keeps the images for 30 days:
00 00 * * * /usr/local/bin/aws_ami_autobackup.rb -t daily_backup -v true -r us-east-1 -x 30

Now you just need to add the right tags to your instances and test it 🙂
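One quick way I check the result from the command line (assuming the default profile and the same region; this is not part of the tool itself):

# list your own AMIs with their creation dates; each tagged instance should get a fresh image after a run
aws ec2 describe-images --owners self --region us-east-1 --query 'Images[].[ImageId,Name,CreationDate]' --output table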

Stop Start AWS Instances Automatically

In order to save money in AWS you can stop dev instances at night and weekends and start them again in the morning.

I created a wrapper script for the AWS CLI tools (stop_start_aws_instances.sh) that, together with cronjobs, can help you automatically stop AWS instances when you don't use them.

The script is located here:
https://github.com/nachum234/scripts/blob/master/stop_start_aws_instances.sh

Prerequisite

In order to use the script you need to install and configure the AWS CLI tools.

Here is a quick guide to installing and configuring them:

  • Install aws cli tools
sudo pip install awscli
  • Create an IAM account with privileges to stop and start EC2 instances and save its access key and secret key. The quickest way is to use the AmazonEC2FullAccess policy.
  • Configure the AWS CLI tools. You need to enter the user's access key and secret key:
aws configure

or if you want to configure a specific profile

aws --profile dev configure

For more information see the AWS guide: http://docs.aws.amazon.com/cli/latest/userguide.

How To Use stop_start_aws_instances.sh

Here are a few examples of how to use the script:

  • To stop all instances that have the tag daily-stop with a value of true in the us-east-1 region:
stop_start_aws_instances.sh -p default -a stop-instances -f Name=tag:daily-stop,Values=true -r us-east-1
  • To test which instances the action will apply to, run the script with the describe-instances action:
stop_start_aws_instances.sh -p default -a describe-instances -f Name=tag:daily-stop,Values=true -r us-east-1
  • To stop all instances that have the tag daily-stop with a value of true in the us-east-1 region from multiple profiles:
for i in dev qa test; do stop_start_aws_instances.sh -p $i -a stop-instances -f Name=tag:daily-stop,Values=true -r us-east-1; done
  • Create cronjobs that start instances every working day at 9:00 and stop instances every day at 19:00:
00 09 * * 1-5 /usr/local/bin/stop_start_aws_instances.sh -p default -a start-instances -f Name=tag:daily-start,Values=true -r us-east-1
00 19 * * * /usr/local/bin/stop_start_aws_instances.sh -p default -a stop-instances -f Name=tag:daily-stop,Values=true -r us-east-1

Now you just need to add the right tags to your instances and test it 🙂
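If you prefer to add the tags from the command line instead of the console, something like this should work (the instance ID is a placeholder):

# tag an instance so the cronjobs above will pick it up
aws ec2 create-tags --region us-east-1 --resources i-0123456789abcdef0 --tags Key=daily-stop,Value=true Key=daily-start,Value=true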

Install docker swarm with consul, consul-template, registrator and haproxy

Tested On

OS: Ubuntu 14.04
Docker version: 1.10

About

Docker is a great platform to build, ship and run applications. Docker swarm is native clustering for docker.

Swarm needs a discovery service for managing docker nodes, and I chose consul because it is a simple discovery service and it also comes with consul-template, which can be used to build dynamic configuration files for haproxy and other web servers. Other good options that docker supports are etcd and zookeeper.

Consul can also be used as a key-value store and for monitoring, but here I am going to use it to manage docker nodes and, together with registrator, my app services.

Network Architecture:

[Diagram: Docker swarm network architecture]

Swarm discovery:

  1. (a) The swarm manager registers itself in the consul server that runs on the same host
  2. (b) Each swarm agent registers itself in its local consul client
  3. (c) The consul client forwards the registration to the consul server, which adds the swarm agent to the cluster
  • In production you should run at least 3 consul servers and 3 swarm managers for high availability

App discovery:

  1. registrator listens for new containers that start inside docker
  2. registrator registers the published ports of the new container in the local consul client
  3. the consul client forwards the published ports to the consul server
  4. consul-template runs as a daemon, generates a new haproxy configuration file from a template that includes all added/removed containers of the app, and reloads haproxy

Installation

I am going to use 3 servers:

  1. mgr
  2. docker-1
  3. docker-2
  • install docker (all servers)
apt-get update
apt-get upgrade
apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" > /etc/apt/sources.list.d/docker.list
apt-get update
apt-get purge lxc-docker
apt-get install linux-image-extra-$(uname -r) -y
apt-get install docker-engine -y
# use this host's own IP in --cluster-advertise (192.168.11.10 is the mgr server)
echo "DOCKER_OPTS=\"--cluster-advertise=192.168.11.10:2375 --cluster-store=consul://swarm-mgr:8500 -H tcp://0.0.0.0:2375 -H unix:///var/run/docker.sock\"" >> /etc/default/docker
service docker restart
  • start consul server (mgr server)
export PRIVATE_IP=192.168.11.10
docker run -d --name consul-srv-1 --restart=always -h consul-srv-1 -v /var/lib/consul:/data -p ${PRIVATE_IP}:8300:8300 -p ${PRIVATE_IP}:8301:8301 -p ${PRIVATE_IP}:8301:8301/udp -p ${PRIVATE_IP}:8302:8302 -p ${PRIVATE_IP}:8302:8302/udp -p ${PRIVATE_IP}:8400:8400 -p ${PRIVATE_IP}:8500:8500 -p ${PRIVATE_IP}:53:53/udp progrium/consul -server -advertise ${PRIVATE_IP} -bootstrap-expect 1
  • start consul client (docker-1 and docker-2)
export PRIVATE_IP=192.168.11.11   # set to this host's own address on each server
docker run -d --name consul-client --restart=always -h consul-client -p ${PRIVATE_IP}:8300:8300 -p ${PRIVATE_IP}:8301:8301 -p ${PRIVATE_IP}:8301:8301/udp -p ${PRIVATE_IP}:8302:8302 -p ${PRIVATE_IP}:8302:8302/udp -p ${PRIVATE_IP}:8400:8400 -p ${PRIVATE_IP}:8500:8500 -p ${PRIVATE_IP}:53:53/udp progrium/consul -advertise ${PRIVATE_IP} -join 192.168.11.10
  • start swarm manager (mgr server)
docker run -d --name swarm-mgr -p 3375:2375 --restart=always swarm manage -H tcp://0.0.0.0:2375 consul://192.168.11.10:8500/
  • start swarm agent (docker-1 and docker-2)
docker run -d --name swarm-agent --restart=always swarm join --advertise=192.168.11.11:2375 consul://192.168.11.10:8500/
  • run registrator (all servers)
docker run -d --name=registrator --restart=always --net=host --volume=/var/run/docker.sock:/tmp/docker.sock gliderlabs/registrator:latest consul://192.168.11.10:8500
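At this point you can check that the cluster is healthy. This is just my quick sanity check, assuming the ports used above:

# point the docker client at the swarm manager; docker-1 and docker-2 should show up as nodes
docker -H tcp://192.168.11.10:3375 info
# consul should already list the registered services
curl http://192.168.11.10:8500/v1/catalog/services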

Haproxy

For simplicity I will install haproxy and consul-template on the mgr server as regular daemons. You can also install them on a separate server or inside docker.

  • install haproxy
apt-get install haproxy
  • download and install consul-template
cd /usr/local/src
wget https://releases.hashicorp.com/consul-template/0.13.0/consul-template_0.13.0_linux_amd64.zip
unzip consul-template_0.13.0_linux_amd64.zip
mv consul-template /usr/local/bin/
  • create a template for consul-template
vi /etc/haproxy/haproxy.ctmpl
global
 log /dev/log local0
 chroot /var/lib/haproxy
 user haproxy
 group haproxy
 daemon

defaults
 log global
 mode http
 option httplog
 
frontend app
 bind *:80
 default_backend app

backend app
 balance roundrobin
 {{range service "app"}}
 server {{.Node}}-{{.Port}} {{.Address}}:{{.Port}} check fall 3 rise 5 inter 2000 weight 2 {{end}}
  • run consul-template
consul-template -consul 192.168.11.10:8500 -template "/etc/haproxy/haproxy.ctmpl:/etc/haproxy/haproxy.cfg:service haproxy reload"
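To test the whole chain end to end I start a throwaway container through the swarm manager. The nginx image and the SERVICE_NAME value are only my example; any container registered under the service name app will do:

# schedule a container somewhere in the swarm with a random published port
docker -H tcp://192.168.11.10:3375 run -d -P -e SERVICE_NAME=app nginx
# registrator registers it in consul, consul-template rewrites haproxy.cfg and reloads haproxy
grep -A 3 "backend app" /etc/haproxy/haproxy.cfg
curl -I http://192.168.11.10/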

Kafka OffsetOutOfRange error

Today I got the following error on our ruby on rails app server: Poseidon::Errors::OffsetOutOfRange

If you get an offset out of range error from a kafka client, you need to reset your kafka consumer group offset with the following commands:

  • find smallest offset
cd ${KAFKA_HOME}
./bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list kafka_server.local:9092 --topic topic_name --time -2

--time -2 is for getting the smallest offset; for the largest offset use -1

  • stop all consumers in consumer group
  • set kafka consumer group offset in zookeeper
cd ${KAFKA_HOME}
./bin/zookeeper-shell.sh zookeeper_server.local
...
Connecting to zookeeper_server.local
Welcome to ZooKeeper!
JLine support is disabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
...

set /consumers/consumer_group_name/offsets/topic_name/partition_number new_offset

Run the set command for each partition, using the offset that you got for it from the GetOffsetShell command above (a small loop that does this is sketched below).

  • check your new configured offset
./bin/kafka-consumer-offset-checker.sh --zookeeper=zookeeper_server.local:2181 --topic=topic_name --group=consumer_group_name
  • start your consumer
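If the topic has many partitions, a small shell loop saves some typing. This is only a sketch: the partition numbers and the offset are placeholders that you must replace with the real values from GetOffsetShell (offsets usually differ per partition):

cd ${KAFKA_HOME}
# replace 0 1 2 with your partition numbers and 12345 with the offset for each partition
for p in 0 1 2; do
  ./bin/zookeeper-shell.sh zookeeper_server.local:2181 set /consumers/consumer_group_name/offsets/topic_name/${p} 12345
done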

I got all the commands from this great blog post: https://metabroadcast.com/blog/resetting-kafka-offsets

Continuous deployment with jenkins and chef

Jenkins is a great open source CI/CD tool that we use to deploy our applications, and chef is a configuration management tool that helps you build your infrastructure automatically. I use both of them to create a CI/CD environment in our projects.

Why use CI/CD:

  1. check that your code compiles
  2. detect errors and fix them earlier
  3. commit smaller units, which makes it easier to revert
  4. automated tests help you find broken features
  5. release at any time
  6. avoid human errors
  7. deploy with zero downtime
  8. revert to a previous version with one click
  9. and more…

Here I am going to use chef recipes to deploy jenkins artifacts, but you can use any configuration management tool or scripts to do the same job.

Continuous deployment configuration

For continuous deployment I installed the jenkins pipeline plugin and built the following pipeline:

[Screenshot: Jenkins build pipeline]

  1. The first job (build-manual) is triggered manually by clicking the run icon and creates a new pipeline in our build pipeline
  2. The second job (build-deb) creates a deb package automatically after a successful or stable build of build-manual
  3. The third job (deploy-all) is a script that does the following (a rough sketch is shown after this list):
    1. get all app servers from chef
    2. for each server, run a chef recipe that downloads the created deb package and installs it
    3. restart the app service
    4. check that the server is ready by running a check command, and only when it is ready continue to the next server
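This is roughly what the deploy-all script looks like. It is only a sketch: the chef search query, the recipe name and the health-check URL are placeholders for whatever your own setup uses:

#!/bin/bash
# deploy-all sketch: rolling deployment, one app server at a time
set -e

# 1. get all app servers from chef (node names only)
NODES=$(knife search node 'role:app-server' -i)

for NODE in ${NODES}; do
  echo "deploying to ${NODE}"
  # 2+3. run the chef recipe that downloads the new deb, installs it and restarts the app service
  knife ssh "name:${NODE}" "sudo chef-client -o 'recipe[myapp::deploy]'"
  # 4. wait until the server answers its health check before moving on to the next one
  until curl -sf "http://${NODE}:8080/health" > /dev/null; do
    sleep 5
  done
done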

The deployment script lets us upgrade versions with zero downtime: because we upgrade one server at a time, we always have running servers behind our load balancer.

What we gain from this deployment:

  1. zero downtime
  2. automatic deployment (avoid human errors)
  3. ability to revert: we can trigger the deploy script at any time from an older pipeline and it will download the deb package from that pipeline and install it on the servers
  4. deployment board: we know who ran each build and when each version was deployed

We can still improve the pipeline by adding an integration tests job, a code coverage job, deploy-to-staging jobs and more.

Maybe I will get to it someday 🙂

Install pacemaker on ubuntu

Tested On

OS: Ubuntu 14.04
Pacemaker Version: 1.1.10
Corosync Version: 2.3.3

About

Pacemaker is a cluster resource manager for linux systems. It helps you create highly available services by automatically recovering or failing over resources between servers.

In this guide I will explain how I install pacemaker and corosync on ubuntu and configure a haproxy cluster on two servers.

Install and configure pacemaker and corosync

  • run the following steps on both servers
  • install packages using apt-get
apt-get install pacemaker corosync fence-agents
  • configure corosync (change ring0_addr to the right address):
vi /etc/corosync/corosync.conf
# Please read the corosync.conf.5 manual page
totem {
 version: 2
 secauth: off
 cluster_name: pacemaker1
 transport: udpu
}

nodelist { 
 node { 
 ring0_addr: haproxy-1
 nodeid: 101 
 } 
 node { 
 ring0_addr: haproxy-2
 nodeid: 102 
 } 
}

quorum { 
 provider: corosync_votequorum 
 two_node: 1 
 wait_for_all: 1 
 last_man_standing: 1 
 auto_tie_breaker: 0 
}

logging {
 fileline: off
 to_logfile: yes
 to_syslog: no
 logfile: /var/log/corosync/corosync.log
 debug: off
 timestamp: on
 logger_subsys {
 subsys: AMF
 debug: off
 }
}
  • configure corosync to start
vi /etc/default/corosync
# start corosync at boot [yes|no]
START=yes
  • start corosync

service corosync start
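Once corosync is running on both servers you can check the cluster membership (my own quick check):

# both haproxy-1 and haproxy-2 should be listed in the membership information
corosync-quorumtool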

  • download haproxy ocf resource
cd /usr/lib/ocf/resource.d/heartbeat
curl -O https://raw.githubusercontent.com/thisismitch/cluster-agents/master/haproxy
chmod +x haproxy

An OCF resource agent is a script that pacemaker uses to start, stop and monitor a resource (service).

  • install and configure haproxy
apt-get install software-properties-common
add-apt-repository ppa:vbernat/haproxy-1.6
apt-get update
apt-get install haproxy

vi /etc/haproxy/haproxy.cfg
global
 log /dev/log local0
 log /dev/log local1 notice
 user haproxy
 group haproxy
 daemon

defaults
 mode http
 option forwardfor
 option http-server-close

frontend test
 bind 192.168.10.10:80
 default_backend test

backend test
 balance roundrobin
 server server1 192.168.20.11:8080 weight 10 check fall 5
 server server2 192.168.20.12:8080 weight 10 check fall 5
  • configure the non-local bind kernel parameter so we can start haproxy on both servers even if the server doesn't own the VIP
vi /etc/sysctl.conf
...
net.ipv4.ip_nonlocal_bind=1
  • reload sysctl.conf file
sysctl -p

Configure pacemaker resources

  • run the following steps on one server
  • configure vip resource
crm configure primitive test-ip ocf:heartbeat:IPaddr2 params ip=192.168.10.10 cidr_netmask=24 op monitor interval=30s

Here we configure a VIP that will fail over to the other server in case of a problem with one server.

  • configure haproxy resource
crm configure primitive haproxy ocf:heartbeat:haproxy op monitor interval=15s

We configure pacemaker to start haproxy and monitor it every 15s, but we want haproxy running on both servers, so we will create a clone resource.

  • clone haproxy resource
crm configure clone haproxy-clone haproxy

We create a clone resource named haproxy-clone by cloning our haproxy resource. This configuration tells pacemaker to start haproxy on both servers at the same time.
Now we need to make sure that the VIP resource is running where haproxy is healthy/running.

  • create colocation resource
crm configure colocation test-ip-haproxy inf: test-ip haproxy-clone

This configuration tells pacemaker to run the test-ip resource where haproxy is running. So if there is a problem with haproxy on one server and pacemaker can't restart it automatically, pacemaker will make sure that test-ip runs on the server with the healthy haproxy by migrating the test-ip resource to the right server.
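To see that everything ended up where it should, you can print the cluster status (a quick check, not part of the original configuration):

# one-shot status: test-ip should run on one node and haproxy-clone should be started on both
crm_mon -1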

For more information about pacemaker, see the ClusterLabs documentation at clusterlabs.org.

Create mysql replication

Creating mysql replication is a simple procedure that can usually be done with the following steps:

  1. enable bin-log on your master
    /etc/my.cnf
    [mysqld]
    # Replication
    server-id = 1
    relay-log = mysql-relay-bin
    log-bin=mysql-bin
  2. create replication user
    mysql
    mysql> CREATE USER 'repl'@'%.mydomain.com' IDENTIFIED BY 'slavepass';
    mysql> GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%.mydomain.com';
    
  3. lock your database and write master position
    mysql> FLUSH TABLES WITH READ LOCK;
    mysql> SHOW MASTER STATUS;
  4. take mysql dump of the database
    mysqldump --all-databases --master-data > fulldb.dump
  5. unlock the database
    mysql> UNLOCK TABLES;
  6. prepare mysql slave server
    /etc/my.cnf
    [mysqld]
    server-id=2
    relay-log = mysql-relay-bin
    log-bin=mysql-bin
  7. restore mysql data
    mysql < fulldb.dump
  8. start replication on the slave server with the change master command
    mysql> CHANGE MASTER TO
        ->     MASTER_HOST='master_host_name',
        ->     MASTER_USER='replication_user_name',
        ->     MASTER_PASSWORD='replication_password',
        ->     MASTER_LOG_FILE='recorded_log_file_name',
        ->     MASTER_LOG_POS=recorded_log_position;
    
    mysql> START SLAVE;

But what if you have a very big database, let's say 1TB, and you can't accept downtime?

If you prepared your storage right or you are using cloud services, then you can lock the database for a few seconds, take a snapshot and then copy the data from the snapshot.

If you didn't prepare the mysql storage for snapshots, then you need to use the right flags in the mysqldump command.

These are the flags that I used (relevant for transactional DB like InnoDB):

mysqldump --all-databases --master-data=2 --single-transaction --quick | gzip > outputfile.sql.gz
--all-databases - Backs up all the databases on the mysql server
--master-data=2 - Writes the binary log file name and position as a comment to the dump file
--single-transaction - This is an important flag that sends START TRANSACTION to the mysql server and dumps the consistent state of the database at the time the transaction started. This flag lets you keep using the database while the dump is running. It is useful only for transactional tables like InnoDB.
--quick - Used for large tables; retrieves rows from a table one row at a time instead of retrieving the entire row set and buffering it in memory before writing it.

For me the dump took about a day, and then I restored it with the following command:

gunzip -c outputfile.sql.gz | mysql

The restore took much longer, about 4-5 days. If you have other methods to make the dump or restore faster, please let me know.

After the restore we need to run the CHANGE MASTER command, so we need to grab it from the dump file:

zcat all_db.sql.gz | head -n 200 | grep "CHANGE MASTER"

mysql
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.xxxx', MASTER_LOG_POS=1111133333;
mysql> start slave;

To check the slave status use the following command:

mysql> SHOW SLAVE STATUS\G

Check that Slave_IO_Running and Slave_SQL_Running are both Yes and wait for Seconds_Behind_Master to decrease to 0 (for me it took ~4 days).

On the new slave server that I created I installed LVM with enough free space for snapshots so next time I can do the following:

  1. lock mysql databases
  2. flush the tables
  3. get master binary log file and position
  4. create LVM snapshots
  5. unlock mysql databases
  6. rsync the data to another server

These steps should take much less time than mysqldump and restore.
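A rough sketch of that snapshot flow is below. The volume group, logical volume, mount point and slave host name are placeholders for my setup; the important detail is that the FLUSH TABLES WITH READ LOCK session must stay open until the snapshot is created:

# session 1: take the global read lock, note the master status, and keep this session open
#   mysql> FLUSH TABLES WITH READ LOCK;
#   mysql> SHOW MASTER STATUS;
#
# session 2: while the lock is held, snapshot the logical volume that holds /var/lib/mysql
lvcreate --size 50G --snapshot --name mysql_snap /dev/vg_data/lv_mysql
#
# back in session 1: release the lock as soon as the snapshot exists
#   mysql> UNLOCK TABLES;
#
# copy the snapshot contents to the new slave, then drop the snapshot
mount -o ro /dev/vg_data/mysql_snap /mnt/mysql_snap
rsync -a /mnt/mysql_snap/ slave_host:/var/lib/mysql/
umount /mnt/mysql_snap
lvremove -f /dev/vg_data/mysql_snap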

During this work I got help from the following links:

  1. mysql docs – http://dev.mysql.com/doc/refman/5.7/en/replication-howto.html
  2. mysql docs – https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_quick
  3. server fault – http://serverfault.com/questions/220322/how-to-setup-mysql-replication-with-minimal-downtime

 

sysctl network tuning template

Sometimes you need your linux server to work in a very high performance network environment, so I created a template to start from that contains sysctl variables that can be tuned.

I used this template in my couchbase cluster and I got these values from the following url:
http://www.couchbase.com/connect/agenda/tuning-couchbase-server-os-network-maximum-performance/

sysctl.conf:

# http://www.couchbase.com/connect/agenda/tuning-couchbase-server-os-network-maximum-performance/
#net.core.somaxconn=
#net.ipv4.tcp_max_syn_backlog=
#net.ipv4.tcp_fin_backlog=
net.core.rmem_max=134217728
net.core.wmem_max=134217728
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864
#net.ipv4.tcp_sack
#net.ipv4.tcp_fack
#net.ipv4.tcp_fin_timeout
#net.ipv4.tcp_tw_reuse
#net.ipv4.tcp_keepalive_intvl
#net.ipv4.tcp_moderate_rcvbuf
#net.ipv4.tcp_window_scaling

# Max listen queue backlog
# make sure to increase nginx backlog as well if changed
net.core.somaxconn = 16384
# Max number of packets that can be queued on interface input
# If kernel is receiving packets faster than can be processed
# this queue increases
net.core.netdev_max_backlog = 16384
# Only retry creating TCP connections twice
# Minimize the time it takes for a connection attempt to fail
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2
# Timeout closing of TCP connections after 7 seconds
# net.ipv4.tcp_fin_timeout = 7
# Avoid falling back to slow start after a connection goes idle
# keeps our cwnd large with the keep alive connections
net.ipv4.tcp_slow_start_after_idle = 0
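After editing /etc/sysctl.conf the values can be applied without a reboot and read back for verification:

# load the new values and print a couple of them back
sysctl -p
sysctl net.core.somaxconn net.core.netdev_max_backlog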

PFSense stable site to site configuration

I know it's not the most secure configuration, but it's stable and works great for my use case.

  1. Enable make-before-break in ipsec advanced settings
    VPN -> IPSec -> Advanced settings
    Check "Initiate IKEv2 reauthentication with a make-before-break"
  2. Phase 1 configuration:
    Mode: Main
    My Identifier: IP Address
    Encryption: 3DES
    Hash: SHA1
    DH Group: 1
    Lifetime: 86400
    Auth: PSK
  3. Phase 2
    Protocol: ESP
    Encryption: 3DES (others unchecked)
    Hash: SHA1 (MD5 unchecked)
    PFS: off
    Lifetime: 86400
  4. Create this configuration on both pfsense servers

I used the following topic to create this configuration:
https://forum.pfsense.org/index.php?topic=21515.5