Bootup Process in Ubuntu


Scope of this page:

  • give a general overview of the boot process (from the BIOS until the kernel is booted and root is mounted)
  • demystify some of the black magic in the dozens of installation howtos
  • give clues for some advanced booting schemes
  • link to more specific instructions and howtos

 

Boot up phases

There are 4 phases to starting up the system:

  1. BIOS
  2. Boot loader
  3. Kernel
  4. Upstart (which manages system tasks and services)

Some core boot tasks started by Upstart are:

  1. Plymouth – The graphical boot animation and logger
  2. mountall – Mounts all filesystems defined in /etc/fstab
  3. network* – Network related services
  4. Display Manager (GDM,KDM,XDM,…)

(Upstart tasks and services are configured in /etc/init.)

 

BIOS Phase

When the computer is powered on, it starts by executing the BIOS code. This code is also called the firmware, because it is normally stored in a permanent form of memory, such as ROM, on the computer’s motherboard. On a Macintosh computer this is the OpenFirmware.

This code must initialize the hardware other than the CPU, and obtain the code for the next step, the boot loader. Modern computers provide several possibilities for the boot loader and the choice is normally set on the BIOS startup screen.

 

Boot Loader Phase

There are several possible types of boot loaders and ways for the BIOS to obtain them.

A. A boot loader stored in the first sector of a hard disk, the Master Boot Record, or MBR. This may be GRUB or LILO or yaboot or others.

B. A boot loader stored on some other storage device, such as a CD-R or USB flash drive.

C. A boot loader which uses the network, such as the Preboot Execution Environment (PXE). This code is normally stored on a ROM on the networking card itself.

The need for the initial parts of the bootloader code on the first part of a storage medium explains why some hard drives are ‘bootable’ and others are not.

The job of the boot loader is to begin the next phase, loading the kernel and an initial ram disk filesystem.

 

Kernel Phase

The kernel is the core code of the operating system, providing access to hardware and other services. The bootloader starts the kernel running. To keep kernels to a reasonable size and permit separate modules for separate hardware, modern kernels also use a file system which is present in memory, called an ‘initrd’ for ‘initial ram disk’.

Both the kernel file to load and the initial ram disk are normally specified as options to the boot loader.

The kernel launches the init script inside the initrd file system, which loads hardware drivers and finds the root partition.

 

System startup

After the kernel is running, the remainder of the operating system is brought online.

First the root partition and filesystem are located, checked and mounted. Next the init process is started, which runs the initialization scripts. These scripts involve different /etc/rc scripts and Upstart events that eventually give you a ready-to-use computer with a login screen.

 

Booting components

 

MBR (IBM-compatible PCs)

The master boot record is the first sector on a disk and contains in general a partition table for the disk and a simple boot loader. This simple boot loader will in most cases just look for an active partition on the same disk and jump to the boot sector on that partition. The boot sector will contain the real boot loader.
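The layout described above can be made concrete with a short Python sketch. It assumes the classic PC MBR layout (boot code in bytes 0 to 445, four 16-byte partition entries starting at offset 446, and the 0x55AA signature in the last two bytes) and works on a synthetic sector built in memory, not on a real disk. The helper names are made up for this illustration.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # boot code occupies bytes 0..445
ENTRY_SIZE = 16
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR


def parse_mbr(sector: bytes):
    """Return the non-empty partition entries of a 512-byte MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")

    partitions = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        entry = sector[start:start + ENTRY_SIZE]
        # status (1 byte), CHS start (3, skipped), type (1), CHS end (3, skipped),
        # first LBA sector (4, little-endian), sector count (4, little-endian)
        status, ptype, first_lba, count = struct.unpack("<B3xB3xII", entry)
        if ptype == 0:          # type 0 marks an unused slot
            continue
        partitions.append({
            "index": index,
            "active": status == 0x80,   # 0x80 flags the active (bootable) partition
            "type": ptype,
            "first_lba": first_lba,
            "sectors": count,
        })
    return partitions


def make_demo_mbr():
    """Build a synthetic MBR with one active Linux (type 0x83) partition."""
    sector = bytearray(SECTOR_SIZE)
    entry = struct.pack("<B3xB3xII", 0x80, 0x83, 2048, 204800)
    sector[PARTITION_TABLE_OFFSET:PARTITION_TABLE_OFFSET + ENTRY_SIZE] = entry
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    for partition in parse_mbr(make_demo_mbr()):
        print(partition)
```

The simple boot loader in the MBR performs the same search in machine code: it walks the four entries, picks the one flagged active, and jumps to that partition's boot sector.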

 

GRUB Boot loader

Because the GRUB boot loader provides menus of choices and can handle many different forms of hardware, it is larger than the code which can fit in a single MBR. It has 3 stages: stage 1 in the MBR, stage 1.5 in the remainder of the first cylinder of the disk, and stage 2 in a file on the disk.

GRUB will find /boot/grub/menu.lst, which configures its interactive menu. The location of menu.lst, as well as of the stage1.5 and stage2 files, is hard-coded into GRUB when it is installed to the boot sector. GRUB locates and loads the kernel and the initrd, using BIOS calls and its built-in recognition of file systems (thanks to the different available stage1.5 parts), and finally boots the kernel.

In some cases, the operating system is split over several partitions (like /usr), and these partitions are mounted by the boot scripts as soon as they can be.

 

Conditions for success

  • First, the BIOS has to find the boot loader and this depends on your hardware’s capabilities.
  • Second, the boot loader has to find the kernel and initrd. It will likely use BIOS calls, so this again depends on your BIOS.
  • Finally, the kernel will boot and must, with the help of the initrd, find the root partition.

 

Finding the root partition

The root partition with the operating system can be somewhere completely different from the kernel, for instance on another drive or on a remote computer. In some cases the kernel may not find the root partition on the disk because the initrd is missing the modules needed to access the partition. If this is your case, rebuild your initrd to include the missing modules (see man mkinitramfs and man update-initramfs).
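The boot loader normally tells the kernel which root device to look for through the root= option on the kernel command line. The following Python sketch shows how that option can be read back. The sample command line is made up for illustration, and the parser is deliberately naive (it does not handle quoted values).

```python
def parse_kernel_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into a {name: value} dict.

    Flags without '=' (such as 'ro' or 'quiet') map to None.
    """
    params = {}
    for token in cmdline.split():
        # Split on the first '=' only, so 'root=UUID=...' keeps its value intact.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def root_device(cmdline: str):
    """Return the value of the root= option, or None if it is absent."""
    return parse_kernel_cmdline(cmdline).get("root")


if __name__ == "__main__":
    # Hypothetical command line; on a running Linux system the real one
    # can be read from /proc/cmdline.
    sample = "BOOT_IMAGE=/vmlinuz-3.2.0 root=UUID=1234-abcd ro quiet"
    print(root_device(sample))
```

If the value printed for your system names a device that the initrd has no driver for, that is the situation described above.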

Setting Up Networking in Ubuntu server using Command Line Interface.

Network Configuration

Ubuntu ships with a number of graphical utilities to configure your network devices. This document is geared toward server administrators and will focus on managing your network on the command line.

  • Ethernet Interfaces
  • IP Addressing
  • Name Resolution
  • Bridging
  • Resources

Ethernet Interfaces

Ethernet interfaces are identified by the system using the naming convention of ethX, where X represents a numeric value. The first Ethernet interface is typically identified as eth0, the second as eth1, and all others should move up in numerical order.

Identify Ethernet Interfaces

To quickly identify all available Ethernet interfaces, you can use the ifconfig command as shown below.

ifconfig -a | grep eth
eth0      Link encap:Ethernet  HWaddr 00:15:c5:4a:16:5a

Another application that can help identify all network interfaces available to your system is the lshw command. In the example below, lshw shows a single Ethernet interface with the logical name of eth0, along with bus information, driver details and all supported capabilities.

sudo lshw -class network
  *-network
       description: Ethernet interface
       product: BCM4401-B0 100Base-TX
       vendor: Broadcom Corporation
       physical id: 0
       bus info: pci@0000:03:00.0
       logical name: eth0
       version: 02
       serial: 00:15:c5:4a:16:5a
       size: 10MB/s
       capacity: 100MB/s
       width: 32 bits
       clock: 33MHz
       capabilities: (snipped for brevity)
       configuration: (snipped for brevity)
       resources: irq:17 memory:ef9fe000-ef9fffff

Ethernet Interface Logical Names

Interface logical names are configured in the file /etc/udev/rules.d/70-persistent-net.rules. If you would like to control which interface receives a particular logical name, find the line matching the interface’s physical MAC address and modify the value of NAME=ethX to the desired logical name. Reboot the system to commit your changes.

Ethernet Interface Settings

ethtool is a program that displays and changes Ethernet card settings such as auto-negotiation, port speed, duplex mode, and Wake-on-LAN. It is not installed by default, but is available for installation in the repositories.

sudo apt-get install ethtool

The following is an example of how to view supported features and configured settings of an Ethernet interface.

sudo ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full 
                                100baseT/Half 100baseT/Full 
                                1000baseT/Half 1000baseT/Full 
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: d
        Current message level: 0x000000ff (255)
        Link detected: yes

Changes made with the ethtool command are temporary and will be lost after a reboot. If you would like to retain settings, simply add the desired ethtool command to a pre-up statement in the interface configuration file /etc/network/interfaces.

The following is an example of how the interface identified as eth0 could be permanently configured with a port speed of 1000Mb/s running in full duplex mode.

auto eth0
iface eth0 inet static
pre-up /sbin/ethtool -s eth0 speed 1000 duplex full

Although the example above shows the interface configured to use the static method, it actually works with other methods as well, such as DHCP. The example is meant to demonstrate only proper placement of the pre-up statement in relation to the rest of the interface configuration.

IP Addressing

The following section describes the process of configuring your system’s IP address and default gateway needed for communicating on a local area network and the Internet.

Temporary IP Address Assignment

For temporary network configurations, you can use standard commands such as ip, ifconfig and route, which are also found on most other GNU/Linux operating systems. These commands allow you to configure settings which take effect immediately; however, they are not persistent and will be lost after a reboot.

To temporarily configure an IP address, you can use the ifconfig command in the following manner. Just modify the IP address and subnet mask to match your network requirements.

sudo ifconfig eth0 10.0.0.100 netmask 255.255.255.0

To verify the IP address configuration of eth0, you can use the ifconfig command in the following manner.

ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:15:c5:4a:16:5a  
          inet addr:10.0.0.100  Bcast:10.0.0.255  Mask:255.255.255.0
          inet6 addr: fe80::215:c5ff:fe4a:165a/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:466475604 errors:0 dropped:0 overruns:0 frame:0
          TX packets:403172654 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2574778386 (2.5 GB)  TX bytes:1618367329 (1.6 GB)
          Interrupt:16
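The Bcast and Mask values in the output above follow directly from the address and netmask you supplied. As a minimal sketch, Python's standard ipaddress module can reproduce the arithmetic. The function name is made up for this example, and the host count is only meaningful for ordinary subnets (not /31 or /32).

```python
import ipaddress


def describe_interface(address: str, netmask: str) -> dict:
    """Derive network, broadcast and host count from an IPv4 address and netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    return {
        "cidr": iface.with_prefixlen,              # address in prefix-length notation
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        # Subtract the network and broadcast addresses themselves.
        "usable_hosts": net.num_addresses - 2,
    }


if __name__ == "__main__":
    for key, value in describe_interface("10.0.0.100", "255.255.255.0").items():
        print(f"{key}: {value}")
```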

To configure a default gateway, you can use the route command in the following manner. Modify the default gateway address to match your network requirements.

sudo route add default gw 10.0.0.1 eth0

To verify your default gateway configuration, you can use the route command in the following manner.

route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.255.255.0   U     1      0        0 eth0
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0

If you require DNS for your temporary network configuration, you can add DNS server IP addresses in the file /etc/resolv.conf. The example below shows how to enter two DNS servers to /etc/resolv.conf, which should be changed to servers appropriate for your network. A more lengthy description of DNS client configuration is in a following section.

nameserver 8.8.8.8
nameserver 8.8.4.4

If you no longer need this configuration and wish to purge all IP configuration from an interface, you can use the ip command with the flush option as shown below.

ip addr flush eth0

Flushing the IP configuration using the ip command does not clear the contents of /etc/resolv.conf. You must remove or modify those entries manually.

Dynamic IP Address Assignment (DHCP Client)

To configure your server to use DHCP for dynamic address assignment, add the dhcp method to the inet address family statement for the appropriate interface in the file /etc/network/interfaces. The example below assumes you are configuring your first Ethernet interface identified as eth0.

auto eth0
iface eth0 inet dhcp

By adding an interface configuration as shown above, you can manually enable the interface through the ifup command which initiates the DHCP process via dhclient.

sudo ifup eth0

To manually disable the interface, you can use the ifdown command, which in turn will initiate the DHCP release process and shut down the interface.

sudo ifdown eth0

Static IP Address Assignment

To configure your system to use a static IP address assignment, add the static method to the inet address family statement for the appropriate interface in the file /etc/network/interfaces. The example below assumes you are configuring your first Ethernet interface identified as eth0. Change the address, netmask, and gateway values to meet the requirements of your network.

auto eth0
iface eth0 inet static
address 10.0.0.100
netmask 255.255.255.0
gateway 10.0.0.1

By adding an interface configuration as shown above, you can manually enable the interface through the ifup command.

sudo ifup eth0

To manually disable the interface, you can use the ifdown command.

sudo ifdown eth0
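If you manage several servers, you may prefer to generate the static stanza rather than type it by hand. The sketch below is one possible way to do that in Python. The function name is hypothetical, and the only check it performs is that the gateway lies inside the interface's subnet, which is a common source of unreachable default routes.

```python
import ipaddress


def render_static_stanza(iface, address, netmask, gateway):
    """Return an /etc/network/interfaces stanza for a static IPv4 setup."""
    network = ipaddress.ip_interface(f"{address}/{netmask}").network
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is not inside {network}")
    return "\n".join([
        f"auto {iface}",
        f"iface {iface} inet static",
        f"address {address}",
        f"netmask {netmask}",
        f"gateway {gateway}",
    ])


if __name__ == "__main__":
    print(render_static_stanza("eth0", "10.0.0.100", "255.255.255.0", "10.0.0.1"))
```

The output is only text; it still has to be placed in /etc/network/interfaces and activated with ifup as described above.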

Loopback Interface

The loopback interface is identified by the system as lo and has a default IP address of 127.0.0.1. It can be viewed using the ifconfig command.

ifconfig lo
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2718 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2718 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:183308 (183.3 KB)  TX bytes:183308 (183.3 KB)

By default, there should be two lines in /etc/network/interfaces responsible for automatically configuring your loopback interface. It is recommended that you keep the default settings unless you have a specific purpose for changing them. The two default lines are shown below.

auto lo
iface lo inet loopback

Name Resolution

Name resolution as it relates to IP networking is the process of mapping IP addresses to hostnames, making it easier to identify resources on a network. The following section will explain how to properly configure your system for name resolution using DNS and static hostname records.

DNS Client Configuration

Traditionally, the file /etc/resolv.conf was a static configuration file that rarely needed to be changed, or that was changed automatically via DHCP client hooks. Nowadays, a computer can switch from one network to another quite often, and the resolvconf framework is used to track these changes and update the resolver’s configuration automatically. It acts as an intermediary between programs that supply nameserver information and applications that need nameserver information. Resolvconf gets populated with information by a set of hook scripts related to network interface configuration. The most notable difference for the user is that any manual change to /etc/resolv.conf will be lost, as the file gets overwritten each time something triggers resolvconf. Instead, resolvconf uses DHCP client hooks and /etc/network/interfaces to generate a list of nameservers and domains to put in /etc/resolv.conf, which is now a symlink:

/etc/resolv.conf -> ../run/resolvconf/resolv.conf

To configure the resolver, add the IP addresses of the nameservers that are appropriate for your network in the file /etc/network/interfaces. You can also add an optional DNS suffix search list to match your network domain names. For each other valid resolv.conf configuration option, you can include, in the stanza, one line beginning with that option name with a dns- prefix. The resulting file might look like the following:

iface eth0 inet static
    address 192.168.3.3
    netmask 255.255.255.0
    gateway 192.168.3.1
    dns-search example.com
    dns-nameservers 192.168.3.45 192.168.8.10

The search option can also be used with multiple domain names so that DNS queries will be appended in the order in which they are entered. For example, your network may have multiple sub-domains to search: a parent domain of example.com, and two sub-domains, sales.example.com and dev.example.com.

If you have multiple domains you wish to search, your configuration might look like the following:

iface eth0 inet static
    address 192.168.3.3
    netmask 255.255.255.0
    gateway 192.168.3.1
    dns-search example.com sales.example.com dev.example.com
    dns-nameservers 192.168.3.45 192.168.8.10

If you try to ping a host with the name of server1, your system will automatically query DNS for its Fully Qualified Domain Name (FQDN) in the following order:

  1. server1.example.com
  2. server1.sales.example.com
  3. server1.dev.example.com

If no matches are found, the DNS server will provide a result of notfound and the DNS query will fail.
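The search order described above can be sketched in a few lines of Python. This is a simplified model only: it ignores the resolver's ndots option and the final attempt with the bare name, and the function name is made up for this example.

```python
def candidate_names(hostname, search_domains):
    """List the fully qualified names tried for a short hostname, in order."""
    if hostname.endswith("."):
        # A trailing dot marks the name as already fully qualified.
        return [hostname]
    return [f"{hostname}.{domain}" for domain in search_domains]


if __name__ == "__main__":
    domains = ["example.com", "sales.example.com", "dev.example.com"]
    for name in candidate_names("server1", domains):
        print(name)
```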

Static Hostnames

Static hostnames are locally defined hostname-to-IP mappings located in the file /etc/hosts. Entries in the hosts file will have precedence over DNS by default. This means that if your system tries to resolve a hostname and it matches an entry in /etc/hosts, it will not attempt to look up the record in DNS. In some configurations, especially when Internet access is not required, servers that communicate with a limited number of resources can be conveniently set to use static hostnames instead of DNS.

The following is an example of a hosts file where a number of local servers have been identified by simple hostnames, aliases and their equivalent Fully Qualified Domain Names (FQDNs).

127.0.0.1	localhost
127.0.1.1	ubuntu-server
10.0.0.11	server1.example.com server1 vpn
10.0.0.12	server2.example.com server2 mail
10.0.0.13	server3.example.com server3 www
10.0.0.14	server4.example.com server4 file

In the above example, notice that each of the servers has been given aliases in addition to its proper name and FQDN. Server1 has been mapped to the name vpn, server2 is referred to as mail, server3 as www, and server4 as file.
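To see how such a file turns into name-to-address mappings, here is a minimal Python sketch of a hosts-file parser. It is an illustration, not the system resolver: it assumes that the first entry for a name wins, and the sample text repeats part of the example above.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table.setdefault(name, address)     # keep the first entry for a name
    return table


SAMPLE_HOSTS = """\
127.0.0.1   localhost
10.0.0.11   server1.example.com server1 vpn
10.0.0.12   server2.example.com server2 mail
"""

if __name__ == "__main__":
    table = parse_hosts(SAMPLE_HOSTS)
    print(table["vpn"], table["mail"])
```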

Name Service Switch Configuration

The order in which your system selects a method of resolving hostnames to IP addresses is controlled by the Name Service Switch (NSS) configuration file /etc/nsswitch.conf. As mentioned in the previous section, static hostnames defined in the system’s /etc/hosts file typically have precedence over names resolved from DNS. The following is an example of the line responsible for this order of hostname lookups in the file /etc/nsswitch.conf.

hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4

  • files first tries to resolve static hostnames located in /etc/hosts.
  • mdns4_minimal attempts to resolve the name using Multicast DNS.
  • [NOTFOUND=return] means that any response of notfound by the preceding mdns4_minimal process should be treated as authoritative and that the system should not try to continue hunting for an answer.
  • dns represents a legacy unicast DNS query.
  • mdns4 represents a Multicast DNS query.

To modify the order of the above mentioned name resolution methods, you can simply change the hosts: string to the value of your choosing. For example, if you prefer to use legacy Unicast DNS versus Multicast DNS, you can change the string in /etc/nsswitch.conf as shown below.

hosts:          files dns [NOTFOUND=return] mdns4_minimal mdns4
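The effect of the source order and of [NOTFOUND=return] can be modelled with a short Python sketch. It is a simplification: every failed lookup is treated as notfound, sources with no backend are skipped as if unavailable, and the backends are plain dictionaries invented for this example rather than real NSS modules.

```python
import re


def parse_hosts_line(line):
    """Parse a 'hosts:' line into a list of (source, actions) pairs."""
    key, _, rest = line.partition(":")
    if key.strip() != "hosts":
        raise ValueError("not a hosts: line")
    entries = []
    for token in re.findall(r"\[[^\]]*\]|\S+", rest):
        if token.startswith("["):
            if not entries:
                raise ValueError("action without a preceding source")
            status, _, action = token[1:-1].partition("=")
            # An action applies to the source listed just before it.
            entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, {}))
    return entries


def resolve(name, entries, backends):
    """Walk the sources in order; backends maps source -> lookup function."""
    for source, actions in entries:
        lookup = backends.get(source)
        if lookup is None:
            continue                    # treated as unavailable in this sketch
        result = lookup(name)
        if result is not None:
            return result
        if actions.get("NOTFOUND") == "return":
            return None                 # stop: the miss is authoritative
    return None


if __name__ == "__main__":
    entries = parse_hosts_line("hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4")
    backends = {
        "files": {"server1": "10.0.0.11"}.get,
        "mdns4_minimal": {}.get,
        "dns": {"www.example.com": "192.0.2.80"}.get,
    }
    print(resolve("server1", entries, backends))
    print(resolve("www.example.com", entries, backends))
```

With the first ordering, the second lookup prints None in this model because the miss in mdns4_minimal ends the search before dns is consulted. Moving dns ahead of mdns4_minimal, as in the modified line above, lets the dns backend answer.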

Bridging

Bridging multiple interfaces is a more advanced configuration, but is very useful in multiple scenarios. One scenario is setting up a bridge with multiple network interfaces, then using a firewall to filter traffic between two network segments. Another scenario is using a bridge on a system with one interface to allow virtual machines direct access to the outside network. The following example covers the latter scenario.

Before configuring a bridge you will need to install the bridge-utils package. To install the package, in a terminal enter:

sudo apt-get install bridge-utils

Next, configure the bridge by editing /etc/network/interfaces:

auto lo
iface lo inet loopback

auto br0
iface br0 inet static
        address 192.168.0.10
        network 192.168.0.0
        netmask 255.255.255.0
        broadcast 192.168.0.255
        gateway 192.168.0.1
        bridge_ports eth0
        bridge_fd 9
        bridge_hello 2
        bridge_maxage 12
        bridge_stp off

Enter the appropriate values for your physical interface and network.

Now restart networking to enable the bridge interface:

sudo /etc/init.d/networking restart

The new bridge interface should now be up and running. The brctl command provides useful information about the state of the bridge, controls which interfaces are part of the bridge, etc. See man brctl for more information.

Debug Openstack code Local / Remote with Eclipse and PyDev Plug-In

This article is the result of my exhaustive search for a concrete way to debug OpenStack. After consulting several sources, I have come up with a manual of my own on how to set up an Eclipse environment to debug and understand the OpenStack code flow. It should be a good read if you have questions like the ones below in mind.

  • Is it possible to debug openstack code end-to-end?
  • Should I debug locally (everything configured inside Eclipse)?
  • How do I debug remotely running OpenStack services? Or use a combination of the two?
  • What developer tools/IDEs should I use for debugging (Eclipse + PyDev, pdb, winpdb, PyCharm)?
  • What is the best, easiest, or most sophisticated method to get everything set up quickly?

And there is a bunch of other questions, followed by multiple alternatives to choose from.

Here in this post, I have tried debugging using Eclipse with the PyDev plug-in.

Development Environment:

Linux distro: CentOS/Ubuntu (I used a VM workstation)
Install Eclipse for your OS type (32/64 bit) on one of the VMs: http://www.eclipse.org/
Configure the Python interpreter in Eclipse.
Install the Git plug-in (only for local debugging): add http://download.eclipse.org/egit/updates under Help -> Install New Software

How to Debug OpenStack Services Locally?

To begin with, you can try with keystone in Eclipse.

Also, set up the following environment variables under the debug configuration for the keystone service to pick up:
OS_USERNAME,
OS_PASSWORD,
OS_TENANT_NAME,
OS_REGION_NAME,
OS_AUTH_URL
Optionally, pass the keystone.conf file as an argument in the debug configuration dialog.
For example, to test the setup, put a breakpoint at:
File: keystone/identity/core.py
Method: def get_all_tenants(self, context, **kw):
Now, execute keystone-all (Debug As -> Python Run) from Eclipse.

Assuming you have already installed keystoneclient, execute from a terminal:

$ keystone tenant-list

(If tenant-list returns a 500 error, check that the database is running and that the iptables service is not blocking the port.)
This should hit the breakpoint in the keystone service running in Eclipse, and Eclipse will ask to switch to the Debug perspective.

Voila, you are now set up for local debugging.

Remote Debugging: 

Development Environment: 

In this case, I have used two VMs: one runs CentOS and the other Ubuntu 12.04.
Ubuntu VM: runs the Eclipse IDE with the PyDev plug-in.
CentOS VM: runs the OpenStack services.
Configure the Python interpreter in Eclipse.

Configure the PyDev debug server in Eclipse.

For remote debugging, the following link has most of the answers:

http://pydev.org/manual_adv_remote_debugger.html

Now, copy the pysrc directory from the Ubuntu VM to the CentOS VM.

The pysrc directory is found in the Eclipse installation under plugins/org.python.pydev_<version>/pysrc

On the CentOS VM (the remote machine), the preferred place to copy it is under the Python site-packages directory.

Ex: /usr/lib/python2.6/site-packages/pysrc/

Example-1: Remote debug keystone

Run the debug server in Eclipse and note the debug server port.
File: keystone/keystone/identity/core.py

Function: def get_all_tenants(self, context, **kw):  # gives tenant-list

Add the following line inside this function:

import pydevd;pydevd.settrace('<IP addr of eclipse vm>', port=8765, stdoutToServer=True, stderrToServer=True,suspend=True)

Next, File: /keystone/bin/keystone-all
To add pysrc to the Python path, add the following line after the “import sys” line:

sys.path.append('/usr/lib/python2.6/site-packages/pysrc/')

eventlet.patcher.monkey_patch(all=False, socket=True, time=True, thread=monkeypatch_thread)
Comment out the line above, and add the following line:

eventlet.patcher.monkey_patch(all=False, socket=True, time=True, thread=False)

This is the most important step for debugging; otherwise you will receive a “ThreadSuspended” error in Eclipse.

As the debug server listens to a single thread, the line above turns off green threading of the thread module.

Restart the keystone service:
$ service keystone restart

$ keystone tenant-list

In Eclipse, switch to the Debug perspective.

You should be able to hit the breakpoint in the core.py file and step through the execution.
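One drawback of pasting the settrace line directly into a service is that the service fails to start if pysrc is not on the Python path. As a sketch of an alternative (my own wrapper, not something from the PyDev manual), the call can be guarded so that a missing pydevd module is simply skipped. The function name attach_debugger and the address 192.0.2.10 are placeholders; port 8765 is the value used in this post.

```python
import importlib


def attach_debugger(host, port=8765, suspend=True):
    """Connect to a PyDev debug server if pydevd is importable.

    Returns True when settrace was called, False when pydevd is missing.
    """
    try:
        pydevd = importlib.import_module("pydevd")
    except ImportError:
        return False    # pysrc is not on sys.path; run without the debugger
    pydevd.settrace(host, port=port, stdoutToServer=True,
                    stderrToServer=True, suspend=suspend)
    return True


if __name__ == "__main__":
    # Placeholder address; use the IP of the VM running Eclipse.
    if not attach_debugger("192.0.2.10"):
        print("pydevd not found, continuing without remote debugging")
```

Inside the function being debugged you would call attach_debugger('<IP addr of eclipse vm>') in place of the one-line import and settrace statement.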


Example-2: Debugging keystone (get auth-token) + nova-api

$ nova flavor-list                # the CLI command debugged in this example

File: /keystone/keystone/service.py
Class: class TokenController
Method: def authenticate(self, context, auth=None):

Add the following line:
import pydevd;pydevd.settrace('<IP addr of eclipse vm>', port=8765, stdoutToServer=True, stderrToServer=True,suspend=True)

Next, File: /keystone/bin/keystone-all
To add pysrc to the Python path, add the following line after the “import sys” line:
sys.path.append('/usr/lib/python2.6/site-packages/pysrc/')
eventlet.patcher.monkey_patch(all=False, socket=True, time=True, thread=monkeypatch_thread)
Comment out the line above, and add the following line:

eventlet.patcher.monkey_patch(all=False, socket=True, time=True, thread=False)

This is the most important step for debugging; otherwise you will receive a “ThreadSuspended” error in Eclipse.

As the debug server listens to a single thread, the line above turns off green threading of the thread module.

Restart the keystone service.

Next, File: nova/nova/api/openstack/compute/flavors.py
Class: class Controller(wsgi.Controller):
Method:
@wsgi.serializers(xml=FlavorsTemplate)

def detail(self, req):

Add the following line inside this function:

import pydevd;pydevd.settrace('<IP addr of eclipse vm>', port=8765, stdoutToServer=True, stderrToServer=True,suspend=True)

File: nova/bin/nova-api
Add the following line after “import sys”:

sys.path.append('/usr/lib/python2.6/site-packages/pysrc/')

eventlet.monkey_patch(os=False)
Comment out the line above, and change it to:

eventlet.monkey_patch(all=False,socket=True,time=True,os=False,thread=False)

$ service keystone restart
$ service nova-api restart
$ nova flavor-list

In Eclipse, this should hit the breakpoint in service.py for keystone.

After the keystone token is generated, control moves to flavors.py.

Things observed:

Path resolution:

If the Python paths and/or the OpenStack code paths differ between the two VMs, Eclipse will not be able to locate the correct file to open and will show an open-file dialog. Just cancel the dialog, and the file from the remote machine is displayed. This file is stored in the pysrc/temporary_file directory.
To avoid this, on the server running the OpenStack services, go to the pysrc directory and modify the file pydevd_file_utils.py.

More info on this: http://pydev.org/manual_adv_remote_debugger.html
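The idea behind that modification is a list of path pairs that translates a file location on the OpenStack server into the matching location in the Eclipse workspace. The sketch below only illustrates the translation. As far as I know the list in pydevd_file_utils.py is called PATHS_FROM_ECLIPSE_TO_PYTHON, but treat that name as an assumption and check your copy of the file; the paths and the function name here are hypothetical.

```python
# (path on the Eclipse machine, path on the OpenStack machine) - hypothetical values
PATH_MAPPINGS = [
    ("/home/dev/workspace/keystone",
     "/usr/lib/python2.6/site-packages/keystone"),
]


def to_ide_path(server_path, mappings=PATH_MAPPINGS):
    """Translate a server-side file path to the matching workspace path."""
    for ide_prefix, server_prefix in mappings:
        # Match the prefix only at a directory boundary.
        if server_path == server_prefix or server_path.startswith(server_prefix + "/"):
            return ide_prefix + server_path[len(server_prefix):]
    return server_path      # no mapping: the debugger would have to ask for the file


if __name__ == "__main__":
    print(to_ide_path("/usr/lib/python2.6/site-packages/keystone/identity/core.py"))
```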

The whole idea of this blog post is to try out alternatives for debugging OpenStack code.
I have taken the simplest possible examples, in a very short time, to demonstrate that it works!

Git – Bare Minimum you need to know.

Git – A Quick Start

The other Post explains some of the basic concepts of Git and SCM. This post is primarily aimed at explaining to you some of the git commands that you would use most often.

Git Lingo

Repositories

A Git repository (or repo) is a complete bundle of code plus metadata (the history of commits and logs). The advantage of Git over other SCMs such as SVN is that it is a Distributed Version Control System. In simple terms, this means that the metadata resides with the code and is copied wherever you clone the code. When you clone a repository, the complete history/metadata is copied along with the code, so a clone that is in sync with the server holds the same metadata as the server it was cloned from, and every working copy has the complete information from commit 1 to the latest commit in the repo.

Branches

Branches allow parallel development of code for two different threads in the same repository. There are several use cases for this. For example, you could have a current stable-release branch on which you fix bugs and which you release to consumers, and a separate branch for future development. Commits on a particular branch affect the code on that branch only and leave all other branches untouched. By default, there is a master branch, which is created initially. You may fork a branch off any given branch, give it a name, continue your development, and finally merge it back into the branch from which you forked it.

One good tool for viewing all the branching in a repo graphically is a utility called gitk. gitk graphically represents the branch layout and points out your location in the tree.

Tags

Tags can be compared to specific milestones in the commit history of the code. Stable releases and release versions can be used as tags. Basically, a tag associates one specific commit ID with a logical tag name. Developers can identify any milestone release by listing the tags in the Git repository, and can get the exact version of the code by checking out that particular tag or commit ID.

Commit

Committing code is analogous to saving a file to your local system after making some changes to it. Each time you execute a git commit command, the changes since the previous commit are added to the Git history and a commit ID is generated in return. It is also good practice to put a message in each commit explaining the reasoning behind the changes you made to the file or files.
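The ID that Git hands back is a hash of the object's content. As a minimal sketch, assuming the classic SHA-1 object format, the following Python function computes the ID Git assigns to an object: the hash covers a short header (object type and size) followed by the content itself. It is easiest to try with a file (a "blob"); a commit object is hashed the same way, with the commit's text as the content.

```python
import hashlib


def git_object_id(kind: str, payload: bytes) -> str:
    """Return the SHA-1 object ID for an object of the given kind."""
    # Header format: "<type> <size in bytes>" followed by a NUL byte.
    header = f"{kind} {len(payload)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


if __name__ == "__main__":
    # The ID of an empty file, as also reported by: git hash-object /dev/null
    print(git_object_id("blob", b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, two repositories that contain the same object always agree on its ID.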

GitHub

GitHub is a web-based hosting service for software development projects that use the Git revision control system. GitHub offers both paid plans for private repositories, and free accounts for open source projects. As of May 2011, GitHub was the most popular open source code repository site. To avoid any confusion: GitHub is a site offering a service based on Git.

Basic Commands

===================

Git init

What does it do?

Creates an empty Git repository or reinitializes an existing one.

How does it do it?

This command creates an empty Git repository: basically a .git directory with subdirectories for objects, refs/heads, refs/tags, and template files. An initial HEAD file that references the HEAD of the master branch is also created.

       If the $GIT_DIR environment variable is set then it specifies a path to use instead of ./.git for the base of the repository.

       If the object storage directory is specified via the $GIT_OBJECT_DIRECTORY environment variable then the sha1 directories are created underneath – otherwise the default $GIT_DIR/objects directory is used.

        Running git init in an existing repository is safe. It will not overwrite things that are already there. The primary reason for rerunning git init is to pick up newly added templates (or to move the repository to another place if --separate-git-dir is given).
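If you want to check the layout described above on your own machine, the following Python sketch runs git init in a throwaway directory and lists what ends up inside .git. It assumes a git executable is on your PATH and returns None otherwise. The function name is made up for this example, and the exact set of template files depends on your Git version.

```python
import os
import shutil
import subprocess
import tempfile


def init_and_list():
    """Run 'git init' in a temporary directory and list the .git contents."""
    if shutil.which("git") is None:
        return None     # git is not installed; nothing to demonstrate
    # Drop the variables mentioned above so they cannot redirect the repository.
    env = {key: value for key, value in os.environ.items()
           if key not in ("GIT_DIR", "GIT_OBJECT_DIRECTORY")}
    with tempfile.TemporaryDirectory() as workdir:
        subprocess.run(["git", "init", "--quiet", workdir], check=True, env=env)
        return sorted(os.listdir(os.path.join(workdir, ".git")))


if __name__ == "__main__":
    print(init_and_list())
```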

How do I use it?

Start a new Git repository for an existing code base

$ cd /path/to/my/codebase
$ git init      <1>
$ git add .     <2>

The two steps are as follows:
  1. Prepare the /path/to/my/codebase/.git directory.
  2. Add all existing files to the index.

To learn about all the options in details, you can run

 $ man git init

Git Clone

What does it do?

Clones a repository into a new directory.

How does it do it?

Clones a repository into a newly created directory, creates remote-tracking branches for each branch in the cloned repository (visible using git branch -r), and creates and checks out an initial branch that is forked from the cloned repository’s currently active branch.

After the clone, a plain git fetch without arguments will update all the remote-tracking branches, and a git pull without arguments will in addition merge the remote master branch into the current master branch, if any (this is untrue when --single-branch is given; see git-clone(1)).

This default configuration is achieved by creating references to the remote branch heads under refs/remotes/origin and by initializing the remote.origin.url and remote.origin.fetch configuration variables.
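
To see that default configuration for yourself, here is a small sketch that clones a local repository and prints the two variables. Everything lives in a temporary directory; the repository names upstream and copy and the "Example" identity are assumptions made only for this illustration.

```shell
set -e
# Illustrative identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)             # scratch area; all names below are illustrative
cd "$work"

# A small "upstream" repository with one commit.
git init -q upstream
(cd upstream && echo hello > README && git add README && git commit -q -m "first commit")

# Clone it and inspect what git clone configured.
git clone -q upstream copy
cd copy
origin_url=$(git config remote.origin.url)
fetch_spec=$(git config remote.origin.fetch)
remote_branches=$(git branch -r)
echo "url:   $origin_url"
echo "fetch: $fetch_spec"
echo "remote-tracking branches:"
echo "$remote_branches"
```

The fetch refspec maps every branch under refs/heads/ on the remote to a remote-tracking branch under refs/remotes/origin/, which is why a plain git fetch updates all of them.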

How do I use it?

Clone from upstream:

$ git clone git://git.kernel.org/pub/scm/.../linux-2.6 my2.6
$ cd my2.6
$ make

Make a local clone that borrows from the current directory, without checking things out:

$ git clone -l -s -n . ../copy
$ cd ../copy
$ git show-branch

Clone from upstream while borrowing from an existing local directory:

$ git clone --reference my2.6 \
	git://git.kernel.org/pub/scm/.../linux-2.7 \
	my2.7
$ cd my2.7

Create a bare repository to publish your changes to the public:

$ git clone --bare -l /home/proj/.git /pub/scm/proj.git

Create a repository on the kernel.org machine that borrows from Linus:

$ git clone --bare -l -s /pub/scm/.../torvalds/linux-2.6.git \
    /pub/scm/.../me/subsys-2.6.git

To learn about all the options in detail, you can run

 $ man git-clone

Git Pull

What does it do?

Fetches from and merges with another repository or a local branch.

How does it do it?

Incorporates changes from a remote repository into the current branch. In its default mode, git pull is shorthand for git fetch followed by git merge FETCH_HEAD.

More precisely, git pull runs git fetch with the given parameters and calls git merge to merge the retrieved branch heads into the current branch. With --rebase, it runs git rebase instead of git merge.
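
The "fetch, then merge" description can be tried out by hand. The sketch below sets up two throwaway repositories in a temporary directory (the names upstream and copy and the "Example" identity are assumptions for illustration), adds a commit upstream after cloning, and then performs the two steps separately instead of calling git pull.

```shell
set -e
# Illustrative identity so the commits work on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)             # scratch area; repository names are illustrative
cd "$work"

git init -q upstream
(cd upstream && echo one > file.txt && git add file.txt && git commit -q -m "one")

git clone -q upstream copy

# New work appears upstream after the clone was made.
(cd upstream && echo two >> file.txt && git commit -q -am "two")

cd copy
git fetch -q origin           # step 1: download the new commits
git merge -q FETCH_HEAD       # step 2: merge what was just fetched
local_tip=$(git rev-parse HEAD)
upstream_tip=$(git -C ../upstream rev-parse HEAD)
echo "local:    $local_tip"
echo "upstream: $upstream_tip"
```

Because the clone had no commits of its own, the merge here is a simple fast-forward; with diverging histories the second step would create a merge commit instead.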

<repository> should be the name of a remote repository as passed to git-fetch(1). <refspec> can name an arbitrary remote ref (for example, the name of a tag) or even a collection of refs with corresponding remote-tracking branches (e.g., refs/heads/*:refs/remotes/origin/*), but usually it is the name of a branch in the remote repository.

Default values for <repository> and <branch> are read from the “remote” and “merge” configuration for the current branch as set by git-branch(1) --track.

Assume the following history exists and the current branch is “master”:

	  A---B---C master on origin
	 /
    D---E---F---G master

Then “git pull” will fetch and replay the changes from the remote master branch since it diverged from the local master (i.e., E) until its current commit (C) on top of master and record the result in a new commit along with the names of the two parent commits and a log message from the user describing the changes.

	  A---B---C remotes/origin/master
	 /         \
    D---E---F---G---H master

See git-merge(1) for details, including how conflicts are presented and handled.

In Git 1.7.0 or later, to cancel a conflicting merge, use git reset --merge.

Warning: In older versions of Git, running git pull with uncommitted changes is discouraged: while possible, it leaves you in a state that may be hard to back out of in the case of a conflict.

If any of the remote changes overlap with local uncommitted changes, the merge will be automatically cancelled and the work tree untouched. It is generally best to get any local changes in working order before pulling or stash them away with git-stash(1).

How do I use it?

  • Update the remote-tracking branches for the repository you cloned from, then merge one of them into your current branch:
    $ git pull
    $ git pull origin

    Normally the branch merged in is the HEAD of the remote repository, but the choice is determined by the branch.<name>.remote and branch.<name>.merge options; see git-config(1) for details. 

  • Merge into the current branch the remote branch next:
    $ git pull origin next 

    This leaves a copy of next temporarily in FETCH_HEAD, but does not update any remote-tracking branches. Using remote-tracking branches, the same can be done by invoking fetch and merge:

    $ git fetch origin
    $ git merge origin/next 

If you tried a pull which resulted in complex conflicts and would want to start over, you can recover with git reset.

To learn about all the options in detail, you can run

 $ man git-pull

Git Checkout

What does it do?

Checks out a branch or paths to the working tree.

How does it do it?

Updates files in the working tree to match the version in the index or the specified tree. If no paths are given, git checkout will also update HEAD to set the specified branch as the current branch.

git checkout <branch>
To prepare for working on <branch>, switch to it by updating the index and the files in the working tree, and by pointing HEAD at the branch. Local modifications to the files in the working tree are kept, so that they can be committed to the <branch>.

If <branch> is not found but there does exist a tracking branch in exactly one remote (call it <remote>) with a matching name, treat as equivalent to

$ git checkout -b <branch> --track <remote>/<branch>

You could omit <branch>, in which case the command degenerates to “check out the current branch”, which is a glorified no-op with rather expensive side-effects to show only the tracking information, if it exists, for the current branch.

git checkout -b|-B <new_branch> [<start point>]
Specifying -b causes a new branch to be created as if git-branch(1) were called and then checked out. In this case you can use the --track or --no-track options, which will be passed to git branch. As a convenience, --track without -b implies branch creation; see the description of --track below.

If -B is given, <new_branch> is created if it doesn’t exist; otherwise, it is reset. This is the transactional equivalent of

$ git branch -f <branch> [<start point>]
$ git checkout <branch>

that is to say, the branch is not reset/created unless “git checkout” is successful.

git checkout --detach [<branch>]
git checkout <commit>
Prepare to work on top of <commit>, by detaching HEAD at it (see “DETACHED HEAD” section), and updating the index and the files in the working tree. Local modifications to the files in the working tree are kept, so that the resulting working tree will be the state recorded in the commit plus the local modifications.

Passing --detach forces this behavior in the case of a <branch> (without the option, giving a branch name to the command would check out the branch, instead of detaching HEAD at it), or the current commit, if no <branch> is specified.
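
A quick way to observe the difference between an attached and a detached HEAD is sketched below. It uses a throwaway repository in a temporary directory; the file name notes.txt and the "Example" identity are made up for illustration. git symbolic-ref succeeds only while HEAD points at a branch, so it serves as the check.

```shell
set -e
# Illustrative identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)             # throwaway repository
cd "$repo"
git init -q
echo v1 > notes.txt
git add notes.txt
git commit -q -m "v1"

branch=$(git symbolic-ref --short HEAD)   # HEAD currently points at a branch
git checkout -q --detach                  # detach HEAD at the current commit

# symbolic-ref fails when HEAD is not a symbolic reference to a branch.
if git symbolic-ref -q HEAD >/dev/null; then detached=no; else detached=yes; fi

git checkout -q "$branch"                 # switch back, re-attaching HEAD
reattached=$(git symbolic-ref --short HEAD)
echo "detached: $detached, back on: $reattached"
```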

git checkout [-p|--patch] [<tree-ish>] [--] <pathspec>…
When <paths> or --patch are given, git checkout does not switch branches. It updates the named paths in the working tree from the index file or from a named <tree-ish> (most often a commit). In this case, the -b and --track options are meaningless and giving either of them results in an error. The <tree-ish> argument can be used to specify a specific tree-ish (i.e. commit, tag or tree) to update the index for the given paths before updating the working tree.

The index may contain unmerged entries because of a previous failed merge. By default, if you try to check out such an entry from the index, the checkout operation will fail and nothing will be checked out. Using -f will ignore these unmerged entries. The contents from a specific side of the merge can be checked out of the index by using --ours or --theirs. With -m, changes made to the working tree file can be discarded to re-create the original conflicted merge result.

How do I use it?
The following sequence checks out the master branch, reverts the Makefile to two revisions back, deletes hello.c by mistake, and gets it back from the index.

$ git checkout master             # switch branch
$ git checkout master~2 Makefile  # take a file out of another commit
$ rm -f hello.c
$ git checkout hello.c            # restore hello.c from the index

If you want to check out all C source files out of the index, you can say

$ git checkout -- '*.c'

Note the quotes around *.c. The file hello.c will also be checked out, even though it is no longer in the working tree, because the file globbing is used to match entries in the index (not in the working tree by the shell).

If you have an unfortunate branch that is named hello.c, this step would be confused as an instruction to switch to that branch. You should instead write:

$ git checkout -- hello.c

After working in the wrong branch, switching to the correct branch would be done using:

$ git checkout mytopic

However, your “wrong” branch and correct “mytopic” branch may differ in files that you have modified locally, in which case the above checkout would fail like this:

$ git checkout mytopic
error: You have local changes to 'frotz'; not switching branches.

You can give the -m flag to the command, which would try a three-way merge:

$ git checkout -m mytopic
Auto-merging frotz

After this three-way merge, the local modifications are not registered in your index file, so git diff would show you what changes you made since the tip of the new branch.

When a merge conflict happens during switching branches with the -m option, you would see something like this:

$ git checkout -m mytopic
Auto-merging frotz
ERROR: Merge conflict in frotz
fatal: merge program failed

At this point, git diff shows the changes cleanly merged as in the previous example, as well as the changes in the conflicted files. Edit and resolve the conflict and mark it resolved with git add as usual:

$ edit frotz
$ git add frotz

To learn about all the options in detail, you can run

$ man git-checkout

Git Add

What does it do?

Adds file contents to the index.

How does it do it?

This command updates the index using the current content found in the working tree, to prepare the content staged for the next commit. It typically adds the current content of existing paths as a whole, but with some options it can also be used to add content with only part of the changes made to the working tree files applied, or remove paths that do not exist in the working tree anymore.

The “index” holds a snapshot of the content of the working tree, and it is this snapshot that is taken as the contents of the next commit. Thus after making any changes to the working directory, and before running the commit command, you must use the add command to add any new or modified files to the index.

This command can be performed multiple times before a commit. It only adds the content of the specified file(s) at the time the add command is run; if you want subsequent changes included in the next commit, then you must run git add again to add the new content to the index.
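
The point that git add records the content as it is at that moment is easy to check. In this sketch (throwaway repository in a temporary directory, with a made-up file name notes.txt), the file is edited again after being added, and the index and the working tree are then compared.

```shell
set -e
repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

echo "first draft" > notes.txt
git add notes.txt                 # the index now holds "first draft"
echo "second draft" > notes.txt   # later edit, not yet added

staged=$(git show :notes.txt)     # content recorded in the index
working=$(cat notes.txt)          # content in the working tree
status=$(git status --short notes.txt)
echo "index:        $staged"
echo "working tree: $working"
echo "status:       $status"

git add notes.txt                 # add again to pick up the new content
staged_after=$(git show :notes.txt)
echo "index after second add: $staged_after"
```

The short status shows two letters for the file: the first column describes the index (added) and the second the working tree (modified since it was added).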

The git status command can be used to obtain a summary of which files have changes that are staged for the next commit.

The git add command will not add ignored files by default. If any ignored files were explicitly specified on the command line, git add will fail with a list of ignored files. Ignored files reached by directory recursion or filename globbing performed by Git (quote your globs before the shell) will be silently ignored. The git add command can be used to add ignored files with the -f (force) option.

Please see git-commit(1) for alternative ways to add content to a commit.

How do I use it?

  • Adds content from all *.txt files under Documentation directory and its subdirectories:
    $ git add Documentation/\*.txt 

    Note that the asterisk * is quoted from the shell in this example; this lets the command include the files from subdirectories of Documentation/ directory. 

  • Considers adding content from all git-*.sh scripts:
    $ git add git-*.sh 

    Because this example lets the shell expand the asterisk (i.e. you are listing the files explicitly), it does not consider subdir/git-foo.sh.

To learn about all the options in detail, you can run

 $ man git-add

Git Commit

What does it do?

Records changes to the repository.

How does it do it?

The content to be added can be specified in several ways:

  1. by using git add to incrementally “add” changes to the index before using the commit command (Note: even modified files must be “added”);
  2. by using git rm to remove files from the working tree and the index, again before using the commit command;
  3. by listing files as arguments to the commit command, in which case the commit will ignore changes staged in the index, and instead record the current content of the listed files (which must already be known to Git);
  4. by using the -a switch with the commit command to automatically “add” changes from all known files (i.e. all files that are already listed in the index) and to automatically “rm” files in the index that have been removed from the working tree, and then perform the actual commit;
  5. by using the --interactive or --patch switches with the commit command to decide one by one which files or hunks should be part of the commit, before finalizing the operation. See the “Interactive Mode” section of git-add(1) to learn how to operate these modes.

The --dry-run option can be used to obtain a summary of what is included by any of the above for the next commit by giving the same set of parameters (options and paths).

If you make a commit and then find a mistake immediately after that, you can recover from it with git reset.

How do I use it?

When recording your own work, the contents of modified files in your working tree are temporarily stored to a staging area called the “index” with git add. A file can be reverted back, only in the index but not in the working tree, to that of the last commit with git reset HEAD -- <file>, which effectively reverts git add and prevents the changes to this file from participating in the next commit. After building the state to be committed incrementally with these commands, git commit (without any pathname parameter) is used to record what has been staged so far. This is the most basic form of the command. An example:

$ edit hello.c
$ git rm goodbye.c
$ git add hello.c
$ git commit

Instead of staging files after each individual change, you can tell git commit to notice the changes to the files whose contents are tracked in your working tree and do corresponding git add and git rm for you. That is, this example does the same as the earlier example if there is no other change in your working tree:

$ edit hello.c
$ rm goodbye.c
$ git commit -a

The command git commit -a first looks at your working tree, notices that you have modified hello.c and removed goodbye.c, and performs necessary git add and git rm for you.

After staging changes to many files, you can alter the order the changes are recorded in, by giving pathnames to git commit. When pathnames are given, the command makes a commit that only records the changes made to the named paths:

$ edit hello.c hello.h
$ git add hello.c hello.h
$ edit Makefile
$ git commit Makefile

This makes a commit that records the modification to Makefile. The changes staged for hello.c and hello.h are not included in the resulting commit. However, their changes are not lost; they are still staged and merely held back. After the above sequence, if you do:

$ git commit

this second commit would record the changes to hello.c and hello.h as expected.
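
The "held back" behaviour can be verified with a short experiment. This is only a sketch in a throwaway repository (temporary directory, placeholder file contents, and an "Example" identity chosen for illustration): one change is staged, another file is committed by name, and the script then lists what went into the commit and what is still staged.

```shell
set -e
# Illustrative identity so the commits work on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q
echo "/* placeholder */" > hello.c
echo "all:" > Makefile
git add hello.c Makefile
git commit -q -m "initial"

echo "/* edited */" >> hello.c
git add hello.c                   # staged, but should be held back
echo "clean:" >> Makefile         # modified, never staged explicitly
git commit -q -m "Makefile only" Makefile

in_commit=$(git diff-tree --no-commit-id --name-only -r HEAD)
still_staged=$(git diff --cached --name-only)
echo "in last commit: $in_commit"
echo "still staged:   $still_staged"
```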

After a merge (initiated by git merge or git pull) stops because of conflicts, cleanly merged paths are already staged to be committed for you, and paths that conflicted are left in unmerged state. You would have to first check which paths are conflicting with git status and after fixing them manually in your working tree, you would stage the result as usual with git add:

$ git status | grep unmerged
unmerged: hello.c
$ edit hello.c
$ git add hello.c

After resolving conflicts and staging the result, git ls-files -u would stop mentioning the conflicted path. When you are done, run git commit to finally record the merge:

$ git commit

Git Push

What does it do?

Updates remote refs along with associated objects.

How does it do it?

Updates remote refs using local refs, while sending objects necessary to complete the given refs.

You can make interesting things happen to a repository every time you push into it, by setting up hooks there. See documentation for git-receive-pack(1).

When the command line does not specify where to push with the <repository> argument, the branch.*.remote configuration for the current branch is consulted to determine where to push. If the configuration is missing, it defaults to origin.

When the command line does not specify what to push with <refspec>... arguments or the --all, --mirror, --tags options, the command finds the default <refspec> by consulting the remote.*.push configuration, and if it is not found, honors the push.default configuration to decide what to push (see git-config(1) for the meaning of push.default).

How do I use it?

git push

Works like git push <remote>, where <remote> is the current branch’s remote (or origin, if no remote is configured for the current branch).

git push origin

Without additional configuration, works like git push origin :.

The default behavior of this command when no <refspec> is given can be configured by setting the push option of the remote, or the push.default configuration variable.

For example, to default to pushing only the current branch to origin use git config remote.origin.push HEAD. Any valid <refspec> (like the ones in the examples below) can be configured as the default for git push origin.

git push origin :

Push “matching” branches to origin. See the description of <refspec> in the OPTIONS section of git-push(1) for what “matching” branches means.

git push origin master

Find a ref that matches master in the source repository (most likely, it would find refs/heads/master), and update the same ref (e.g. refs/heads/master) in origin repository with it. If master did not exist remotely, it would be created.

git push origin HEAD

A handy way to push the current branch to the same name on the remote.

git push mothership master:satellite/master dev:satellite/dev

Use the source ref that matches master (e.g. refs/heads/master) to update the ref that matches satellite/master (most probably refs/remotes/satellite/master) in the mothership repository; do the same for dev and satellite/dev.

This is to emulate git fetch run on the mothership using git push that is run in the opposite direction in order to integrate the work done on satellite, and is often necessary when you can only make connection in one way (i.e. satellite can ssh into mothership but mothership cannot initiate connection to satellite because the latter is behind a firewall or does not run sshd).

After running this git push on the satellite machine, you would ssh into the mothership and run git merge there to complete the emulation of git pull that was run on mothership to pull changes made on satellite.

git push origin HEAD:master

Push the current branch to the remote ref matching master in the origin repository. This form is convenient to push the current branch without thinking about its local name.

git push origin master:refs/heads/experimental

Create the branch experimental in the origin repository by copying the current master branch. This form is only needed to create a new branch or tag in the remote repository when the local name and the remote name are different; otherwise, the ref name on its own will work.

git push origin :experimental

Find a ref that matches experimental in the origin repository (e.g. refs/heads/experimental), and delete it.
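
The last two forms, creating a remote branch and deleting it again, can be tried safely against a local bare repository. The sketch below assumes nothing beyond Git itself; the names origin.git, local and experimental, and the "Example" identity, are chosen only for illustration.

```shell
set -e
# Illustrative identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)                 # scratch area
cd "$work"
git init -q --bare origin.git     # stands in for the remote repository
git init -q local
cd local
echo hi > README
git add README
git commit -q -m "first"
git remote add origin ../origin.git
branch=$(git symbolic-ref --short HEAD)   # name of the current branch

# Create the branch "experimental" on the remote from the current branch.
git push -q origin "$branch:refs/heads/experimental"
created=$(git ls-remote --heads origin experimental)

# An empty source side of the refspec deletes the remote ref.
git push -q origin :experimental
remaining=$(git ls-remote --heads origin experimental)

echo "after create: $created"
echo "after delete: ${remaining:-<none>}"
```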

Source Code Management — Git (Basic Concepts)

Author : Rahul Krishna Upadhyaya

Topic : Source Code Management

Date : 16/04/2013

Overview : What is source control management? Why do we need to manage the source code? Git as an example of Source Control Management. Some basic commands to get started with working with Git.

Source Control Management.

Source Control Management, as the name suggests, is the means by which the source code for any project is managed. Some of the features that every SCM application should have:

  • It should maintain an incremental history of each file. Every time you tell the SCM tool that you want to save (commit) the code, it tracks the changes and saves those deltas in its history.
  • It should track the author of each change to every file being monitored.
  • It should let several developers collaborate and contribute, even when editing the same file at the same time, while preserving the integrity of previously saved code and flagging new code that conflicts with the previous version.

Where is the need for Source Control Management?

Q. I am the single author of the complete content. I added some code, everything stopped working, and I want to revert to a state where I know everything was working fine.

A. Every time you save (commit) your code, a commit-id is generated for that commit. You also add a commit message stating the reasoning behind the changes. This lets you go back through the history: choose any commit-id and see all the additions and deletions made in that commit, along with the reasoning recorded in its commit message. Using this, you can navigate to any earlier stage of the code, check what changes were made, or revert to any stable code base.

Q. Several authors are writing to the same project, and possibly to the same file too.

A. A source control management tool makes collaboration between a group of developers much easier. You write your changes and “push” the code to a remote central server. All the other developers can then “pull” the changes from this central server and synchronize their work-spaces with the changed code. If you are working on the same file and changing the same section, you will face “conflicts”. You either need to resolve them manually or set a policy that chooses “theirs” or “ours” when updating your work-space with the latest code.

Git – The Most Powerful SCM Tool.

Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. Initially designed and developed by Linus Torvalds for Linux kernel development, Git has since been adopted by many other projects.

Every Git working directory is a full-fledged repository with complete history and full revision tracking capabilities, not dependent on network access or a central server.

A Short History of Git

As with many great things in life, Git began with a bit of creative destruction and fiery controversy. The Linux kernel is an open source software project of fairly large scope. For most of the lifetime of the Linux kernel maintenance (1991–2002), changes to the software were passed around as patches and archived files. In 2002, the Linux kernel project began using a proprietary DVCS system called BitKeeper.

In 2005, the relationship between the community that developed the Linux kernel and the commercial company that developed BitKeeper broke down, and the tool’s free-of-charge status was revoked. This prompted the Linux development community (and in particular Linus Torvalds, the creator of Linux) to develop their own tool based on some of the lessons they learned while using BitKeeper. Some of the goals of the new system were as follows:

  • Speed
  • Simple design
  • Strong support for non-linear development (thousands of parallel branches)
  • Fully distributed
  • Able to handle large projects like the Linux kernel efficiently (speed and data size)

Since its birth in 2005, Git has evolved and matured to be easy to use and yet retain these initial qualities. It’s incredibly fast, it’s very efficient with large projects, and it has an incredible branching system for non-linear development.

The Three States in Local Operations

Now, pay attention. This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

This leads us to the three main sections of a Git project: the Git directory, the working directory, and the staging area.

Local Operations in Git


The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer.

The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify.

The staging area is a simple file, generally contained in your Git directory, that stores information about what will go into your next commit. It’s sometimes referred to as the index, but it’s becoming standard to refer to it as the staging area.

The basic Git workflow goes something like this:

  1. You modify files in your working directory.
  2. You stage the files, adding snapshots of them to your staging area.
  3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

If a particular version of a file is in the Git directory, it’s considered committed. If it’s modified but has been added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
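
You can watch a file move through the three states with git status. The sketch below uses a throwaway repository in a temporary directory; the file name report.txt and the "Example" identity are assumptions for illustration. It relies on the two-column --porcelain output, where the first column reflects the staging area and the second the working directory.

```shell
set -e
# Illustrative identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)                 # throwaway repository
cd "$repo"
git init -q

echo "draft" > report.txt
git add report.txt
git commit -q -m "add report"
state_committed=$(git status --porcelain report.txt)  # committed: nothing to report

echo "final version" > report.txt
state_modified=$(git status --porcelain report.txt)   # modified, not yet staged

git add report.txt
state_staged=$(git status --porcelain report.txt)     # staged for the next commit

echo "committed: [$state_committed]"
echo "modified:  [$state_modified]"
echo "staged:    [$state_staged]"
```

An empty status line means the file matches the last commit; an M in the second column means modified, and an M in the first column means staged.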

Git Interaction with Remote Servers

Once we are done with the changes and commits on our local system, as explained above, we want to add these changes to the remote Git server, so that the central repository containing the code gets updated and all our peers can synchronize their work-spaces with our changes.

We use the git push <some added params> command to push those changes to a remote repository. We should also know which remote repository we are pointing to: git remote -v lists the remote repositories that the current Git repository knows of. If there are none, you need to add one (with git remote add) before running git push.
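
Here is a rough sketch of that sequence: check the known remotes, add one, and push. A local bare repository stands in for the remote server; the names central.git and project and the "Example" identity are hypothetical, and on a real project the remote would be a URL on your server.

```shell
set -e
# Illustrative identity so the commit works on a machine without Git config.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

work=$(mktemp -d)                 # scratch area
cd "$work"
git init -q --bare central.git    # stands in for the remote Git server
git init -q project
cd project
echo "hello" > README
git add README
git commit -q -m "first commit"

before=$(git remote -v)           # no remotes are known yet
git remote add origin ../central.git
after=$(git remote -v)            # origin is now listed for fetch and push

branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"      # publish the current branch
pushed=$(git ls-remote --heads origin)

echo "remotes before: ${before:-<none>}"
echo "remotes after:"
echo "$after"
echo "branches on the remote:"
echo "$pushed"
```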

A good Step by Step Reference

A website you can follow for comprehensive, step-by-step learning of Git:

Git Immersion

Working on GitHub is largely the same as working with Git on a remote or local machine, with a few additional things that you need to know.

A good place to learn that would be TryGit Tutorial.

References :