Linux

Linux is the OS that runs most of the world's servers, phones, and IoT devices. Following is a detailed PDF that provides a great overview of Linux.

Below are Linux subjects that you should be familiar with, along with common questions and their answers.

The Basic Commands

You should have a basic understanding of how to use the following commands:

  • cd
  • man and info
  • ls
  • mv
  • mkdir and mkdir -p
  • cp
  • touch
  • ssh and scp

Linux Concepts

  • Everything is a file: This is one of the core concepts and defining features of Unix and Linux systems. The idea is that a wide variety of input/output resources (files, directories, disks, keyboards, virtual terminals, printers, modems, and inter-process and network communications) are all byte streams that are exposed through the file system namespace (see below for details). If it is not a “file” then it is a running process. The key point is that, since everything is a file, the same set of utilities and APIs can be used on all of the aforementioned resources.
  • Virtual Consoles: A console is the keyboard and monitor that is connected to your computer. Since it is possible to have multiple concurrent users on a machine at a time, and to be remotely connected, Linux provides virtual consoles. A virtual console, or virtual terminal (VT) is an application that simulates a physical connection to the machine. Following is a deeper explanation and history.
  • Shell: The shell is a program that takes the commands you type and passes them to the OS to execute. Bash is the most common shell.
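
As a small illustration of the “everything is a file” idea, the sketch below reads from a regular file, a device, and a /proc entry using the same ordinary utilities. The temporary file and the variable names are made up for the example, and it assumes /dev/urandom and /proc/self/status exist, as they do on typical Linux systems.

```shell
# A regular file created just for this demonstration.
demo_file=$(mktemp)
printf 'hello\n' > "$demo_file"

# The same tools (head, wc) read a regular file...
regular_bytes=$(head -c 6 "$demo_file" | wc -c | tr -d ' ')

# ...a character device exposed under /dev...
device_bytes=$(head -c 16 /dev/urandom | wc -c | tr -d ' ')

# ...and kernel state about the reading process exposed under /proc.
proc_name_line=$(head -n 1 /proc/self/status)

echo "regular file: $regular_bytes bytes, device: $device_bytes bytes"
echo "first line of /proc/self/status: $proc_name_line"

rm -f "$demo_file"
```

None of the tools needed to know whether it was reading a disk file, a device driver, or kernel bookkeeping; each source is opened and read as a byte stream.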

User Space vs Kernel Space

Primarily for security reasons, system memory is divided into two “buckets”: user space and kernel space. User space is where non-kernel processes run, and kernel space is where the kernel's code and data are stored and executed.

----------------------------------------
|          User Applications           |
|                                      |  <-- User Space
|               glibc                  |
------------------|---------------------
|                 v                    |
|          System Call Iface           |
|                                      |  <-- Kernel Space
|               Kernel                 |
|                                      |
|         Board Support Package        |
------------------|---------------------
                  v
----------------------------------------
|              Hardware                |
----------------------------------------

Board Support Package: architecture dependent kernel code

The Board Support Package (BSP) is architecture-dependent kernel code. It is the OS layer that contains the hardware-specific code and drivers that allow the operating system to interface with specific devices.

Kernel Overview

----------------------------------------
|     Sys Call Interface (SCI)         |
|                                      |
|   Process Mgmt       Virtual File    |
|                      System (VFS)    |
|                                      |
|    Mem Mgmt          Network Stack   |
|                                      |
|     Arch <----> Device Drivers       |
|                                      |
----------------------------------------

The major subsystems of the kernel are as follows:

  • System Call Interface: Allows calls from user space into kernel space. Its implementation can be architecture dependent.
  • Virtual File System (VFS): Common interface abstraction layer to specific file system implementations. VFS is a switching layer between the SCI and the file systems supported by the kernel
----------------------------------------
|                                      |
|    -----------------------------     |
|    |             VFS           |     |
|    -----------------------------     |
|      |            |          |       |
|     ext3  ....   zfs       /proc     |
|      |            |                  |
|    ------------------                |
|    | buffer cache   |                |
|    ------------------                |
|            |                         |
----------------------------------------
             |
             v
       Device Drivers
             |
             v
      Physical Devices
  • Process or thread management: Creates, schedules, and terminates processes and threads. The 2.6 kernel introduced the O(1) scheduler, which picks the next task in constant time regardless of the number of threads; since kernel 2.6.23 the default has been the Completely Fair Scheduler (CFS).
  • Memory Management: Memory is managed in pages (4 KB on most architectures). This subsystem also handles swapping pages out of RAM and onto disk.
  • Network Stack: Composed of the following three layers, each sitting on top of the one below it
    • Sockets: Invoked through the SCI. The standard API to the networking subsystem, providing a user interface to a variety of networking protocols.
    • TCP layer
    • IP layer
  • Architecture-dependent code: The elements of the kernel that must take into account the underlying architecture (basically the BSP).
  • Device Drivers: The vast majority of the kernel source code. Each driver provides the implementation for a given piece of hardware.

Cgroups and Namespaces

Cgroups and Namespaces are the underlying technologies that enable container frameworks. In summary, for a given process, cgroups limit the amount of resources that can be used, and namespaces limit the scope of what that process can see.

Cgroups

Cgroups provide the ability to limit a process's use of:

  • CPU
  • memory
  • disk I/O
  • network

A more detailed description can be found here.
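
To see which cgroup the current process belongs to, the kernel exposes the membership under /proc. The sketch below is only an inspection example; it assumes a Linux host, and the second half additionally assumes cgroup v2 is mounted at /sys/fs/cgroup, which is common but not universal.

```shell
# /proc/self/cgroup lists the cgroup(s) the reading process belongs to.
# On a cgroup v2 system this is typically a single line starting with "0::".
cgroup_membership=$(cat /proc/self/cgroup 2>/dev/null)
: "${cgroup_membership:=unavailable}"
echo "cgroup membership: $cgroup_membership"

# With cgroup v2 mounted at /sys/fs/cgroup (an assumption about this host),
# cgroup.controllers names the controllers available at the root of the hierarchy.
if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
    echo "available controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
fi
```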

Namespaces

Namespaces are a feature of Linux that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. There are eight types of namespaces:

Cgroup

Hides and abstracts the identity of the underlying cgroup for a given process. See man cgroup_namespaces for more details.

IPC

Provides a segregated set of inter-process communication resources that include pipes, shared memory, message queues, and semaphores.

Network

Provides an independent and complete network stack, including IP addresses, routing tables, sockets, firewalls, and other network resources.

Mount

Provides an independent set of mount points seen only by processes that are members of that namespace. This enables mounted filesystems to be “presented” only to some namespaces and allows filesystems to be mounted and unmounted in a namespace without affecting the underlying host.

PID

Provides processes with an independent set of process ids (PIDs). PIDs are isolated per namespace, so processes in different namespaces can have the same PID. PID namespaces provide part of the underlying infrastructure for container-based provisioning of processes in technologies like Docker and LXC.

Time

Virtualizes the values of two system clocks: CLOCK_MONOTONIC and CLOCK_BOOTTIME.

User

Enables a unique set of user and group ids per namespace. For example, this enables one namespace in which a given uid has root privileges only in that user namespace.

UTS

Isolates the hostname and the NIS domain name, so that processes in different UTS namespaces can see different values for each.
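
The namespaces a process belongs to can be inspected under /proc. The following is a minimal sketch, assuming a Linux host where /proc/self/ns is readable; the variable names are illustrative only.

```shell
# Each entry under /proc/<pid>/ns is a symbolic link whose target names
# the namespace type and an identifier, e.g. "uts:[4026531838]".
ls -l /proc/self/ns

# A child process inherits its parent's namespaces by default,
# so both report the same UTS namespace identifier.
own_uts_ns=$(readlink /proc/self/ns/uts)
child_uts_ns=$(sh -c 'readlink /proc/self/ns/uts')

echo "this shell: $own_uts_ns"
echo "child:      $child_uts_ns"
```

Running the same readlink inside a new namespace, for example under `unshare --uts` (which usually requires root or a user namespace), would be expected to show a different identifier.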

SELinux

  • What is it?
    • Security-Enhanced Linux: a Linux Security Module (LSM) built into the kernel that enforces mandatory access control policies. It is enabled by default on Red Hat based systems.
  • What are the three modes?
    • Enforcing: all policies are enforced strictly.
    • Permissive: policy violations are logged as warnings but not blocked.
    • Disabled: turned off.
  • What are the two types of security policies?
    • Targeted: Policies that apply to specific processes, typically those that accept network connections (httpd, ssh, named, etc.). These processes run in their own SELinux domain, called a confined domain. The policies restrict access to any resources outside the confined domain. If a process in the confined domain attempts to access anything outside of the domain, access is denied and the attempt is logged.
    • Multilevel Security Policy (MLS): A defined security schema that enforces the Bell–LaPadula Model (BLP), which was originally designed for the U.S. defense community to control access to data based on a hierarchy of increasingly restrictive security classifications.
  • What program do you use to change the SELinux security context of files?
    • chcon
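
A quick way to check the current mode is the getenforce command. The sketch below assumes nothing about the host: it falls back to "unavailable" where the SELinux userland tools are not installed. The path in the commented commands is a hypothetical example.

```shell
# getenforce prints the current mode: Enforcing, Permissive, or Disabled.
# It is only present where the SELinux userland tools are installed.
if command -v getenforce >/dev/null 2>&1; then
    selinux_mode=$(getenforce)
else
    selinux_mode="unavailable"
fi
echo "SELinux mode: $selinux_mode"

# On an SELinux-enabled host, ls -Z shows the security context of a file
# and chcon changes it (hypothetical path, needs appropriate privileges):
#   ls -Z /var/www/html/index.html
#   chcon -t httpd_sys_content_t /var/www/html/index.html
```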

Files and Permissions

  • What is the difference between 644 and 755 as the definition of a file permission?
    • 644: the owner can read and write; the group and others can only read (rw-r--r--).
    • 755: the owner can read, write, and execute; the group and others can read and execute (rwxr-xr-x).
  • If there are 4 digits in a numerical file permission definition what is the first digit for?
    • It sets the special bits: setuid, setgid, and sticky. The values can be added together.
    • 0: no special bits set
    • 4: sets the setuid bit
    • 2: sets the setgid bit
    • 1: sets the sticky bit
  • What are the three types of special permission settings that you can use on specific files and directories?
    • setuid, setgid, and sticky bits
  • Explain setuid and setgid
    • The two are short for set User ID and set Group ID, respectively. When set on an executable, the program runs with the privileges of the file's owner (setuid) or group (setgid) rather than those of the user who launched it. When setgid is set on a directory, new files created in it inherit the directory's group.
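
The octal modes above can be checked directly. This is a minimal sketch assuming GNU coreutils `stat` (the `-c` format option); the scratch directory and file names are placeholders for the demonstration.

```shell
# Scratch directory and file names are placeholders for this demonstration.
perm_dir=$(mktemp -d)
touch "$perm_dir/notes.txt" "$perm_dir/tool.sh"

chmod 644 "$perm_dir/notes.txt"   # owner rw-, group r--, others r--
chmod 755 "$perm_dir/tool.sh"     # owner rwx, group r-x, others r-x

# stat -c prints the octal mode (%a) and the symbolic form (%A).
notes_mode=$(stat -c '%a %A' "$perm_dir/notes.txt")
tool_mode=$(stat -c '%a %A' "$perm_dir/tool.sh")

# A leading 4 adds setuid; it shows up as "s" in the owner's execute slot.
chmod 4755 "$perm_dir/tool.sh"
setuid_mode=$(stat -c '%a %A' "$perm_dir/tool.sh")

echo "$notes_mode"
echo "$tool_mode"
echo "$setuid_mode"

rm -rf "$perm_dir"
```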

setuid example

Following is an example that illustrates exactly how this works. Setgid works basically the same way. The difference being that the file in question is executed as either the owner (setuid), or group (setgid) of the file. Nothing magical about it, but worth going through to get a hands-on understanding of how it works.

As the root user, create a directory and a text file that only root (and members of the root group) can read.

mkdir /var/tmp/test-setuid
echo "Success!  Content from /var/tmp/test-setuid/file.txt" > /var/tmp/test-setuid/file.txt 
chmod 0640 /var/tmp/test-setuid/file.txt 
ls -al /var/tmp/test-setuid/file.txt 

You should see output similar to the following (the timestamp will differ)

-rw-r----- 1 root root 53 Jan  2 12:38 /var/tmp/test-setuid/file.txt

Copy and paste the following C code into /var/tmp/test-setuid/read-file.c. If you do not have the “build essentials” set of packages (gcc and friends) installed on your machine, installing them is an exercise for the reader.

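#include <stdio.h>

int main(void) {
    const char *filepath = "/var/tmp/test-setuid/file.txt";
    FILE *f = fopen(filepath, "r");

    if (f == NULL) {
        // Report the failure on STDERR, with a space before the path and a trailing newline.
        fprintf(stderr, "Error: unable to open file %s\n", filepath);
        return 1;
    }

    // Read the file one character at a time and print to STDOUT.
    // fgetc returns an int so that EOF can be told apart from any valid byte.
    int c;
    while ((c = fgetc(f)) != EOF)
        putchar(c);

    fclose(f);
    return 0;
}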

As the root user, compile the program, then set the execute permission for others and the setuid bit.

gcc -o /var/tmp/test-setuid/read-file /var/tmp/test-setuid/read-file.c
chmod 4755 /var/tmp/test-setuid/read-file
ls -al /var/tmp/test-setuid/read-file

You should see the following output

-rwsr-xr-x 1 root root 16816 Jan  2 12:56 /var/tmp/test-setuid/read-file

As a non-root user attempt to cat the contents of the text file. You should get a “Permission denied” error because non-root users do not have read access to the text file itself.

cat /var/tmp/test-setuid/file.txt 

As a non-root user execute the read-file program and you will then be able to read the contents of the text file.

/var/tmp/test-setuid/read-file
  • What is a sticky bit?
    • A permission bit that, when set on a directory, allows a file in that directory to be deleted or renamed only by the file's owner, the directory's owner, or root. A good example is the /tmp dir, which is writable by everyone but where users cannot remove each other's files.
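
The sticky bit shows up as a trailing "t" in the symbolic mode. A minimal sketch, assuming GNU coreutils `stat`; the scratch directory stands in for a shared directory such as /tmp.

```shell
# A scratch directory that behaves like a shared, world-writable directory.
shared_dir=$(mktemp -d)

# The leading 1 sets the sticky bit; 777 makes it writable by everyone.
chmod 1777 "$shared_dir"
shared_mode=$(stat -c '%a %A' "$shared_dir")
echo "$shared_mode"

# For comparison, print the mode of the real /tmp on this machine.
stat -c '%a %A %n' /tmp

rmdir "$shared_dir"
```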

UMASK

  • What is it?
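    • A shell built-in (backed by the umask system call) that sets the file mode creation mask: the permission bits that are removed from the requested mode when a new file or directory is created. It is an octal number whose bits are cleared from the requested mode, typically 0666 for files and 0777 for directories. For example, a umask of 022 yields 644 for new files and 755 for new directories.
  • Where is it defined?
    • Typically /etc/profile on Red Hat based machines.
  • Where else can it be overridden/defined?
    • It can also be overridden in ~/.bashrc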
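
The effect of the mask can be observed by creating a file and a directory under a known umask. This is a small sketch assuming GNU coreutils `stat`; the mask value 027 and the file names are arbitrary choices for the example.

```shell
umask_dir=$(mktemp -d)

# Run in a subshell so the changed umask does not leak into the caller.
(
    umask 027
    touch "$umask_dir/report.txt"   # requested 666, mask 027 clears bits -> 640
    mkdir "$umask_dir/archive"      # requested 777, mask 027 clears bits -> 750
)

file_mode=$(stat -c '%a' "$umask_dir/report.txt")
dir_mode=$(stat -c '%a' "$umask_dir/archive")
echo "file: $file_mode, directory: $dir_mode"

rm -rf "$umask_dir"
```

Note that the mask clears bits rather than performing arithmetic subtraction: a umask of 027 applied to 666 gives 640, not 639.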

Pipes

anonymous pipes
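
An anonymous pipe is what the shell creates for the `|` operator: it connects the standard output of one process to the standard input of the next, has no name in the file system, and disappears when the processes exit. A minimal sketch with made-up input data:

```shell
# printf writes three lines into the first pipe, sort reads them and writes
# the sorted lines into a second pipe, and head keeps only the first line.
first_sorted=$(printf 'pear\napple\nmango\n' | sort | head -n 1)
echo "$first_sorted"

# wc -l counts the lines arriving on its standard input from the pipe.
line_count=$(printf 'pear\napple\nmango\n' | wc -l | tr -d ' ')
echo "$line_count"
```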

What are the different runlevels and what does each do?

Runlevel  Name                                Description
--------  ----------------------------------  -----------
0         Off                                 Turns off the device. Different distros have different specific commands that are executed, but the end result is the same.
1         Single-user mode                    For administrative tasks. Only root can log in.
2         Multi-user mode without networking  Does not start the network stack or automatically start daemon processes.
3         Multi-user mode with networking     Normal system operation without a display manager automatically started.
4         Undefined                           Undefined and/or user definable.
5         Graphical Interface (X11)           Everything in runlevel 3 plus the display manager.
6         Reboot                              Reboots the device.

What are the steps in a Linux boot process with System V?

  1. BIOS: The Basic Input/Output System initializes the hardware, then loads and executes the Master Boot Record (MBR), which contains the primary boot loader.
  2. MBR: The MBR code scans the partition table for an active partition. When it finds one, it checks that the remaining partitions are not active, then loads the boot record from the active partition and executes it.
  3. Kernel Loader (LILO or GRUB): Loads the Linux kernel.
  4. Kernel: The kernel executes, typically from a compressed kernel image.
  5. Init: The kernel starts the first user-space program, typically /sbin/init.
  6. Runlevel: Init runs the scripts for the target runlevel from /etc/rc.d/ etc.

What are the steps with Systemd?

  1. BIOS
  2. MBR
  3. GRUB/Kernel Loader
  4. Kernel
  5. Systemd

Systemd

Systemd is a Linux init and service management system that has replaced System V init on most distributions and provides a more comprehensive and integrated process management system.
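
Systemd uses targets in place of runlevels and ships compatibility aliases for the old numbers. The sketch below encodes that standard mapping in a helper function; `runlevel_to_target` is a hypothetical name invented for this example, not a systemd command. Note that systemd does not distinguish runlevels 2, 3, and 4.

```shell
# Map a System V runlevel number to the systemd target that stands in for it.
runlevel_to_target() {
    case "$1" in
        0)     echo "poweroff.target" ;;
        1)     echo "rescue.target" ;;
        2|3|4) echo "multi-user.target" ;;
        5)     echo "graphical.target" ;;
        6)     echo "reboot.target" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}

for level in 0 1 2 3 4 5 6; do
    echo "runlevel $level -> $(runlevel_to_target "$level")"
done

# On a systemd host, the target booted into by default can be read with:
#   systemctl get-default
```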

Shells and Bash

  • What is the shell?
    • A program that takes commands from the keyboard and gives them to the OS.
  • What are some common shells?
    • bash
    • tcsh
    • ksh
    • zsh
  • How can you see a list of the shells available on a system?
    • cat /etc/shells
  • What is an xterm, gnome-terminal, konsole, etc?
    • A ‘terminal emulator’ that renders a window in a GUI and allows you to interact with the shell.
  • What is the difference between .bashrc and .bash_profile?
    • .bashrc holds per-shell configuration. Bash reads it for each new interactive non-login shell, such as a new terminal window.
    • .bash_profile holds login configuration and is read once, by the login shell. Subsequent changes to it can be applied to an existing shell by sourcing the file, or to new shells by logging out and logging back in.
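
A common convention (a choice many setups make, not something Bash requires) is for .bash_profile to source .bashrc so that login shells pick up the same settings. The sketch below simulates this with a throwaway home directory; `DEMO_FROM_BASHRC` is a made-up variable, and plain `sh` stands in for the login shell.

```shell
demo_home=$(mktemp -d)

# Stand-in for ~/.bashrc: per-shell settings such as aliases and variables.
printf 'DEMO_FROM_BASHRC=yes\n' > "$demo_home/.bashrc"

# Stand-in for ~/.bash_profile: login-time setup that also pulls in .bashrc.
cat > "$demo_home/.bash_profile" <<'EOF'
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
EOF

# Simulate a login shell reading the profile, with HOME pointing at the demo directory.
profile_result=$(HOME="$demo_home" sh -c '. "$HOME/.bash_profile"; echo "$DEMO_FROM_BASHRC"')
echo "$profile_result"

rm -rf "$demo_home"
```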

Processes and Threads

  • What is a process?
    • An instance of a computer program that is currently running. It may contain one or many threads.
  • What is a PID, PPID?
    • The PID is a unique process id, and the PPID is the parent process id: the id of the process that started the process with that PID.
  • What is the difference between a process and a thread?
    • A process is the encapsulation of memory, code, and other OS resources. It is (mostly) separate from other processes running on the system. A thread is a code sequence running within the process. There may be multiple threads running in the same process; they enable multiple tasks to execute simultaneously while sharing the resources of the parent process.
  • How do you list user’s running processes (ps)
  • How do you list all running processes (ps -eaf)
  • How do you list a process matching some regex (pgrep regex, ps -eaf | grep regex)
  • Kill a process by PID (kill pid)
  • Kill a process by name (pkill -f name)
  • Kill a hung process (kill -9 pid)
  • List most busy processes (top)
  • List ports opened by processes (netstat -tulpn)
  • List all resources open by a process (lsof)
  • What does /proc contain?
    • It is a virtual file system that contains information about all of the running processes on the system, along with other kernel and system state.
  • Explain fork/exec
    • When a command is entered into a shell the shell does two things:
      • Fork: the shell creates a copy of itself. The child inherits almost all of the parent's attributes (env vars, open files, user id, etc.), with some exceptions: it gets a new PID, and it does not inherit the parent's memory locks, semaphore adjustments, or outstanding asynchronous I/O operations and contexts. This copy of the shell does NOT read any commands; it immediately performs an exec.
      • Exec: the kernel loads the new program over the top of the child shell and runs that program in its place.
    • The original shell then waits for the child to complete and collects its exit status (an int).
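
The fork, exec, and wait sequence can be observed from the shell itself. This is a minimal sketch; the exit value 7 and the variable names are arbitrary, and it assumes an external `echo` binary is on the PATH for the `exec` step.

```shell
# The shell forks a child, the child execs "sh", and the parent waits.
# $$ inside the child is the child's own PID, which differs from ours.
child_pid=$(sh -c 'echo $$')
echo "parent shell PID: $$, child PID: $child_pid"

# The child's exit value is what the waiting parent receives in $?.
child_status=0
sh -c 'exit 7' || child_status=$?
echo "child exit status: $child_status"

# "exec" without a new fork replaces the current process image: the subshell
# below becomes "echo" and never reaches the second command.
exec_output=$(exec echo "replaced"; echo "never printed")
echo "$exec_output"
```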

Virtual Memory

Virtual memory is a method of accessing more memory than is physically available on the system. It combines physical RAM with swap space and takes advantage of the fact that a program is not accessed all at once, so it does not all need to be in memory at the same time.

The primary concept is the page: an area of memory, typically 4 KB, that is the basic unit with which both the kernel and the CPU manage memory. Both can access individual bytes or bits, but a page is the size in which memory is usually managed.
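
The page size of the running system can be queried with getconf, and the number of pages a buffer occupies follows by rounding up. A small sketch; the 1 MiB buffer size is an arbitrary example value.

```shell
# getconf reports the page size in bytes (commonly 4096).
page_size=$(getconf PAGESIZE)
echo "page size: $page_size bytes"

# Pages needed for a 1 MiB buffer, rounding up to whole pages.
buffer_bytes=$((1024 * 1024))
pages_needed=$(( (buffer_bytes + page_size - 1) / page_size ))
echo "a ${buffer_bytes}-byte buffer spans $pages_needed pages"
```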

A process only needs to have in memory the pages that it is currently using. If the process needs a page that isn’t in RAM, the kernel will go and get it, usually from the disk. A good analogy is that of a notebook. The open notebook represents the memory that is currently being changed, while closed books with bookmarks represent the parts of the program that are not changing or not being used. It is the kernel’s job to ensure that a process has the data in RAM that it needs to operate, and to keep track of where data has been swapped out of RAM to disk.

New swap space is prepared with the mkswap command, and the system is told to use it with the swapon command.

Page tables are an integral part of virtual memory, as they map virtual addresses to the actual pages in physical memory. Only processes in the TASK_RUNNING state are eligible to have pages swapped back in, which prevents wasteful I/O for threads that are currently blocked or sleeping.

Demand Paging is the process of loading virtual pages into memory only as they are accessed/requested.

When a process attempts to access a virtual address whose page is currently not in memory, a page fault occurs and the OS must bring the appropriate page into memory from disk. In general, this means resolving the page fault by loading the page into physical RAM and mapping the virtual address to it, after which the program continues executing. File contents are mapped into a process's virtual memory via memory mapping.

If a process needs to bring a virtual page into physical memory and there are no free physical pages available, the OS must make room by evicting another page from physical memory. If the page to be evicted came from an image or file and has not been written to, it can simply be discarded and brought back later from the original file on disk. If it has been changed, it must be saved so that the process that was using it can access it later. Such a page is known as a ‘dirty page’, and its data is written to the swap file.
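
The kernel keeps per-process page fault counters that can be read from /proc. The sketch below assumes a Linux /proc file system: fields 10 and 12 of /proc/<pid>/stat are the minor fault count (resolved without disk I/O) and the major fault count (requiring a read from disk). Here awk reads its own entry through /proc/self.

```shell
# Fields 10 and 12 of /proc/<pid>/stat are the minor and major fault counts.
# awk opens /proc/self/stat, so the numbers describe the awk process itself.
fault_counts=$(awk '{print $10, $12}' /proc/self/stat)
minor_faults=${fault_counts% *}
major_faults=${fault_counts#* }

echo "minor faults: $minor_faults, major faults: $major_faults"
```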

A working set is the set of pages that a process is currently using.

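What do the numbers in parentheses after commands shown on man pages mean?

The number identifies the manual section that the page belongs to. For example, printf(1) is the shell command and printf(3) is the C library function. The sections are: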

  1. General commands
  2. System calls
  3. C library functions
  4. Special files (usually devices, those found in /dev) and drivers
  5. File formats and conventions
  6. Games and screensavers
  7. Miscellanea
  8. System administration commands and daemons