BioSec 2012: Consolidated Notes

Below is a summary of what was discussed in the 2012 run of Biological Approaches to Computer Security, organized by topic.

(Cheryl: I think a topical organization is easiest and clearest, but if you have other ideas feel free to go with them. I just made up some topics off the top of my head, please use ones that make sense based on the notes you find. Look at all of the pages linked to from the top-level biosec page: the notes for the first weeks, the misc notes pages, and the individual student pages. Please condense or drop stuff that is too detailed (particularly notes on evolution).)

Evolution

The Origin of Species by Charles Darwin is one of the best-known and most respected works of scientific literature, even though it is now more than a century and a half old. It changed the way humans see the world. Darwin's theory of evolution challenged the prevailing views of how species came about. These included not only scientific theories but also religious ones, in particular natural theology: the belief that species are made by a creator and that a species' adaptation to its environment is nothing but 'intelligent design'. Although the book remains as controversial in some circles as it was in Darwin's time, the theory of evolution has stood the test of time in the scientific realm and remains the most widely accepted scientific explanation for the origin of species.

Summary of The Origin of Species

Darwin's entire argument is based on variation. Variation within a species is present whether or not that species is domesticated. Each species is distinguishable from the others by the many adaptations and traits it possesses. Variations within a species appear in one or a combination of physical, chemical and biological traits. These traits are often inherited from one generation to the next and are rooted in the species adapting to a change in its environment at some point in its history. To support this hypothesis Darwin gives numerous examples of remarkable adaptations that allowed different species to survive, and in some cases thrive, in their environments. Two of these examples are the beak of the woodpecker, which allows it to better collect its prey (insects), and the wings of the bat, which allow it to fly. Furthermore, the small variations seen within a species correspond directly to the variations seen across different species. Thus Darwin's theory of evolution attempts to show, with evidence from observation, that variation is the root cause of the origin of species.

Key to Darwin's argument is the notion of natural selection, which explains how variation can eventually lead to the evolution of a species. To understand this concept, one first needs to understand the struggle for existence. In chapters three through seven of The Origin of Species, Darwin presents the struggle for existence as the reason why some species' characteristics survive and others go extinct. He notes that the huge amount of variation in species has allowed them to adapt very well to their environments: because of the unique characteristics that certain organisms have developed, they can thrive in their specific environment. Darwin also notes that it is the most advantageous characteristics (variations that let a species adapt to its environment better than other species) that are passed from one generation to the next. This is where natural selection comes into play. Natural selection allows the species that have best adapted to the environment to survive and prosper. At the same time, species that do not possess advantageous variations struggle for existence but do not succeed, and so become extinct.

In other words, natural selection is the mechanism that drives what we refer to as evolution. Living organisms pass genes from one generation to the next. These genes are not all the same: some carry variations and some do not. A variation can prove either advantageous or disadvantageous to future generations. Organisms with advantageous genes are more likely to survive and reproduce, so those genes are naturally selected and continue to be passed on. Eventually, accumulated variation causes a group of organisms to branch off from its original species and become a species of its own. The continual branching of species into new species suggests that all species can be traced back to a single parent species. This theory also provides a simple but profound explanation of why many species are so similar to each other: either one evolved from the other or they share a closely related common ancestor.

Another concept that goes hand in hand with natural selection is the limit on population increase. Nature can provide food and shelter for the species that inhabit it, but it can also be very destructive (natural disasters, predation, etc.). This causes species to struggle for their lives and prevents some organisms from surviving. The concept of limits on population increase (borrowed from Thomas Malthus) states that, left unchecked, each generation increases the population of a species exponentially, which would imply that the population of the entire world is constantly increasing. If births kept growing exponentially while the death rate stayed the same from one generation to the next, the world would run out of room for species to occupy. This is not what happens in reality, so nature must impose a limit on the total number of inhabitants. This limit gives rise to competition: species must compete with each other to survive, each threatening the survival of the others.
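The contrast between unchecked exponential growth and growth under a natural limit can be sketched numerically. The following is only an illustration: the discrete logistic model, the growth rate and the carrying capacity are assumptions chosen for the example and do not come from the course notes.

```python
def exponential_growth(p0, rate, generations):
    """Population sizes when every generation grows by a fixed rate, with no limit."""
    sizes = [p0]
    for _ in range(generations):
        sizes.append(sizes[-1] * (1 + rate))
    return sizes


def limited_growth(p0, rate, capacity, generations):
    """Discrete logistic growth: growth slows as the population nears capacity."""
    sizes = [p0]
    for _ in range(generations):
        p = sizes[-1]
        # The factor (1 - p / capacity) shrinks toward zero as room runs out.
        sizes.append(p + rate * p * (1 - p / capacity))
    return sizes


if __name__ == "__main__":
    unchecked = exponential_growth(10.0, 0.5, 30)
    limited = limited_growth(10.0, 0.5, 1000.0, 30)
    print(f"unchecked after 30 generations: {unchecked[-1]:.0f}")
    print(f"limited after 30 generations:   {limited[-1]:.0f}")
```

In the limited model the population levels off near the (hypothetical) capacity instead of growing without bound, which is the situation that forces competition.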

The rest of the book is devoted to Darwin defending his theory against possible criticisms from other scientists of his time, some of which are still raised today. For example, the fossil record does not appear to link every step in the chain of evolution back to one parent species. Darwin's answer is that the fossil record is imperfect: many fossils have not survived in their original condition. Darwin also argues that geographical isolation is a fundamental component of his theory. Because the theory suggests that all living organisms descend from one or a handful of 'original' parent species, species must have travelled and migrated to different areas of the world. This was easier said than done, since barriers such as water (oceans) and height (hills and mountains) severely restrict the ability of living organisms to migrate to another region. Thus the few that were able to leave their birthplace and migrate to another region shaped the rest of the species in that geographical area.

Computer security perspective

A lot can be said about applying Darwin's approach to computer security. To begin with, several questions need to be answered. Although evolution seems to be a foolproof way of doing things, it is really slow: it took biology billions of years to get to where it is now. How are we supposed to replicate billions of years' worth of work in a matter of days, months or even years? In addition, following Darwin's approach means accepting failure most of the time. Biology has gotten to where it is by accepting failure, in the form of species that became extinct because they did not acquire an advantageous variation. Lastly, in Darwin's theory a species' survival is highly dependent on its environment. Although this works within the realm of biology, we do not have the luxury of relying on it in computer security: computer security needs to work accurately irrespective of the environment it is in.

Diversity

Diversity can be defined as variation, and this variation can take many different forms. The previous chapter briefly mentioned diversity among different species. This chapter discusses the biological diversity and software diversity present today, and concludes with a brief section on how this relates to computer security.

Biological diversity

Biodiversity, also known as biological diversity, is the variety of living organisms and their processes found on Earth. This broad category is divided into three: genetic diversity, species diversity and ecosystem diversity. Genetic diversity is the total number of different genes found in the entire population of one species. Species diversity is the diversity of species within a particular region (i.e. the total number of different species that inhabit a particular area). Ecosystem diversity is the total number of different habitats encompassed within a region. There is an abundance of diversity in biology, mainly because of evolution. As explained in the previous chapter, the theory of evolution states that all organisms descended from one or a handful of common ancestors, which implies that they all share common features. At the same time, natural selection means species continue to evolve and change over time to adapt to their specific environments. Thus, as each species changes and evolves into another, it adds to the existing biodiversity: when a species evolves substantially, it becomes its own 'new species', which adds to species diversity and subsequently to genetic diversity and ecosystem diversity.

With biodiversity comes a lot of dependence: different species depend on each other for food, and so affect each other's populations. Parasitism and mutualism are two examples. Parasitism is a symbiotic relationship between two organisms in which one organism benefits at the expense of the other. Mutualism is also a symbiotic relationship between two organisms, but one in which both benefit. This interdependence means that if something were to happen, it would strain an entire species, or several species, depending on the gene it affects. However, the chance of one 'virus' killing all the living organisms in the world is very slim, unless it attacks one of the three things common to all living organisms: ATP, DNA or the plasma membrane. Even then, all living organisms have a general mechanism for dealing with unknown dangers, in addition to the more specific mechanisms that deal with known dangers. Thus diversity increases the probability that at some level in the hierarchy of organisms at least one member, if not many, will be able to withstand a major attack.

Software diversity

Since the days when computers first became affordable and appeared in almost every office and house, Microsoft has dominated the operating system market. Although other operating systems such as Linux and Mac OS X have increased in popularity, Microsoft still dominates the current market. Unlike in the past, security measures such as network access controls and anti-virus software have been improving at a remarkable rate. This improvement, however, is driven by the need to always be one step ahead of the community that attacks these systems (hackers). Nevertheless, the problem is still very much prevalent today, because of the lack of diversity. When hackers create a virus, unless they are attacking a specific system, they normally look for a target that meets the following criteria:

  1. An OS/system that is widely used, so as to attack the largest number of people at once
  2. An OS/system that has good documentation, making the virus easy to create
  3. An OS/system that is not secure, and hence extremely vulnerable, allowing the virus to propagate

Computer security perspective

With respect to computer security, diversity would imply measures such as continuous automatic scanning, which would make it extremely difficult for attacks to propagate through a given system. Furthermore, the fact that a single script can easily attack many computer systems indicates that those systems are in fact homogeneous. Thus, diversifying the internals of a computer system by adding 'junk' code/introns could provide an additional level of security against unknown attacks/threats. Currently, computer systems are protected against known attacks, and otherwise only by very general, low-level mechanisms that are easily bypassed.

However, there are disadvantages to building such a system. To begin with, there is always the brute force method in case some part of a system is randomized. Hence there needs to be some sort of limit on how many times a given piece of software can try to use certain areas of the computer system. Additionally, creating a number of computer systems diverse enough to offer real protection against attacks is not as straightforward as it seems: biological systems have a huge range of genetic variation, but the number of potential operating systems/applications is considerably smaller. In addition, all existing systems are vulnerable to more than one known attack. Thus, replacing a multitude of relatively homogeneous systems with equally vulnerable but diverse systems (where the diversity comes from modifying/randomizing a certain aspect of an 'original' system) does not in any way increase security. Moreover, even if the security requirement were satisfied, such diversity would not be practical when it comes to creating additional programs/applications.
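The brute force concern and the attempt limit can be illustrated with a small sketch. Everything here is hypothetical: `RandomizedSystem`, its `probe` method and the idea of a single secret 'layout' chosen from a fixed number of possibilities are assumptions made for the example, not a description of any real randomization scheme.

```python
import random
from fractions import Fraction


def success_probability(layouts, max_attempts):
    """Chance that distinct guesses hit the single correct layout."""
    if layouts <= 0:
        raise ValueError("layouts must be positive")
    tries = min(max(max_attempts, 0), layouts)
    return Fraction(tries, layouts)


class RandomizedSystem:
    """Hypothetical system whose secret layout is one of `layouts` choices."""

    def __init__(self, layouts, max_attempts, rng):
        self._secret = rng.randrange(layouts)
        self._remaining = max_attempts

    def probe(self, guess):
        """Return True on a correct guess; refuse once the limit is used up."""
        if self._remaining <= 0:
            raise PermissionError("attempt limit reached")
        self._remaining -= 1
        return guess == self._secret


def brute_force(system, layouts):
    """Try every layout in order until success or lockout."""
    for guess in range(layouts):
        try:
            if system.probe(guess):
                return True
        except PermissionError:
            return False
    return False


if __name__ == "__main__":
    print(success_probability(65536, 3))      # with a limit of 3 attempts
    print(success_probability(65536, 65536))  # with no effective limit
```

Without a limit, exhaustive guessing always succeeds eventually; with a limit of k attempts over N equally likely layouts, the attacker's chance is at most k/N.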

Therefore, further research must be done to evaluate the various ways of creating a diverse system. To start off, some parallels could be drawn from biodiversity. Take parasitism/predator-prey relationships: one can imagine the predators as the attackers/hackers. To simplify things, they could be treated as one whole group. A more complicated version of the model could separate the attackers/hackers into categories based on characteristics such as skill. This is very similar to many predators pursuing the same prey, some more harmful than others, but all coexisting. The prey could be viewed as the software itself. For simplicity this could start off being characterized by operating system, but it could become more complicated and be based on different programs/applications, etc. Other concepts such as competition and extinction would also come into play.
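One possible starting point for the simplest version of this model (one group of attackers, one kind of software) is the classic Lotka-Volterra predator-prey equations. The notes do not name a specific model, so this choice, the parameter values and the simple Euler integration are all assumptions made for illustration.

```python
def simulate(prey, predators, steps, dt=0.01,
             prey_growth=1.0, predation=0.1,
             predator_death=1.5, conversion=0.075):
    """Euler integration of the Lotka-Volterra equations.

    prey      ~ installed base of a software system (hypothetical reading)
    predators ~ population of attackers targeting it (hypothetical reading)
    """
    history = [(prey, predators)]
    for _ in range(steps):
        # Prey grow on their own and are lost through encounters with predators.
        d_prey = (prey_growth * prey - predation * prey * predators) * dt
        # Predators grow through encounters with prey and otherwise die off.
        d_pred = (conversion * prey * predators - predator_death * predators) * dt
        prey, predators = prey + d_prey, predators + d_pred
        history.append((prey, predators))
    return history


if __name__ == "__main__":
    for step, (x, y) in enumerate(simulate(10.0, 5.0, 2000)):
        if step % 500 == 0:
            print(f"t={step * 0.01:5.1f}  prey={x:8.3f}  predators={y:8.3f}")
```

Splitting the attackers by skill, or the software by operating system or application, would mean adding one equation per category, each with its own interaction terms.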

Malicious organisms and code

The immune system is one of the best (and most obvious) systems for computer security to imitate. It is a multilevel security system that protects the entire organism. In addition to protecting the organism, the immune system also cleans up after the battle/attack. The immune system is composed of many small cells and molecules. Antigens, proteins that are not native to the body, are recognized by the organism's antibodies (the immune system's 'detectors'). The key here is that these antibodies have receptors that are extremely specific to certain antigens. When an antigen binds to one of these antibodies, an alarm goes off. This 'alarm' then triggers a whole mechanism that ultimately causes the antigen that does not belong to be destroyed.

Malicious organisms

Among the best-studied immune cells are T cells. The production of these cells involves a negative selection process. Like antibodies, T cells have receptors that are highly specific. Anything present within the organism that binds to one of these receptors is treated as foreign, and so as potentially malicious. The receptors of the surviving cells are thus selected in such a way that they never identify the organism's own self as an antigen.

Negative selection is the process by which any cell that reacts with cells belonging to the organism itself is automatically filtered out. The production of T cells is an example. The cells are produced through a pseudo-random genetic process. Once this is done, any cell that recognizes self (the organism it is in) is eliminated through negative selection. The remaining cells are distributed throughout the organism, where they look for potentially malicious organisms. Another key aspect of the immune system is that it can react both to antigens it has never encountered and to antigens it has encountered before. Once it has come into contact with an antigen, it can remember that antigen for more than 70 years, i.e. potentially the rest of a human's life. Moreover, the immune system responds much more quickly to antigens it has seen before than to antigens it meets for the first time.
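The generate-then-filter idea behind negative selection translates fairly directly into code. The sketch below is a minimal illustration under stated assumptions: 'self' is a set of short bit strings, detectors are random bit strings, and a detector 'reacts' to a string when the two agree on at least `r` contiguous positions. The matching rule, the string length and all names are choices made for the example, not something specified in the notes.

```python
import random


def matches(detector, sample, r):
    """True if detector and sample agree on at least r contiguous positions."""
    run = 0
    for a, b in zip(detector, sample):
        run = run + 1 if a == b else 0
        if run >= r:
            return True
    return False


def generate_detectors(self_set, count, length, r, rng, max_tries=100_000):
    """Create random detectors, keeping only those that do not react to self."""
    detectors = []
    tries = 0
    while len(detectors) < count and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        # Negative selection: discard any candidate that recognizes self.
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors


def is_foreign(sample, detectors, r):
    """A sample is flagged as foreign if any surviving detector reacts to it."""
    return any(matches(d, sample, r) for d in detectors)


if __name__ == "__main__":
    rng = random.Random(0)
    self_set = ["00000000", "11110000"]
    detectors = generate_detectors(self_set, count=20, length=8, r=6, rng=rng)
    for sample in self_set + ["10101010", "01101001"]:
        print(sample, "foreign" if is_foreign(sample, detectors, 6) else "not flagged")
```

By construction no surviving detector reacts to a member of the self set, so self is never flagged. Whether a given foreign string is caught depends on how many detectors are generated, which mirrors the coverage trade-off in the biological system.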

Malicious code

Malicious code is basically the same thing as malicious organisms, but with respect to computer systems: it is code that is 'intruding' on the system and is not supposed to be there. A common example is malware (malicious software), which is software designed specifically to damage or disrupt a system. This disruption could potentially take over the entire system and kill it. A Trojan horse is one example of such software. The problem currently facing people who defend these systems is that current methods of virus detection do not detect polymorphic malware, that is, malware that uses a defence mechanism such as encryption to avoid detection. Even if an anti-virus could potentially detect the malware itself, the system does not have sufficient time to respond. In addition, some malware is next to impossible to detect until the damage is done; serendipitous seeding of malware is one example. Thus the key to accurately detecting these viruses in time is to recognize that they undergo evolution constantly.
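Why a fixed signature misses an encoded variant can be shown with a harmless stand-in. In this sketch the 'payload' is an ordinary byte string, the 'scanner' is a lookup of SHA-256 hashes, and the 'polymorphism' is a single-byte XOR; all three are simplifying assumptions for illustration and are far cruder than real anti-virus engines or real malware.

```python
import hashlib


def signature(data: bytes) -> str:
    """Fingerprint used by the toy scanner: a hash of the exact bytes."""
    return hashlib.sha256(data).hexdigest()


def xor_encode(data: bytes, key: int) -> bytes:
    """Re-encode the bytes with a one-byte key (0-255); applying it twice undoes it."""
    return bytes(b ^ key for b in data)


# Harmless placeholder standing in for a known malicious payload.
payload = b"harmless stand-in for a malicious payload"
known_signatures = {signature(payload)}


def scanner_flags(data: bytes) -> bool:
    """Flag data only if its exact fingerprint is already known."""
    return signature(data) in known_signatures


if __name__ == "__main__":
    print("original flagged:", scanner_flags(payload))
    for key in (0x21, 0x5A, 0xC3):
        variant = xor_encode(payload, key)
        print(f"variant with key {key:#04x} flagged:", scanner_flags(variant))
```

Each key yields different bytes and therefore a different fingerprint, so a scanner that knows only the original signature does not recognize any of the variants even though decoding restores the same payload.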

Computer security perspective

Security of any kind is commonly described as working in three stages: prevention, detection and response. Although prevention is the most cost-effective way of dealing with security, it could be very harmful to rely only on this method, because if prevention fails at any point, fixing the system could be very costly. Moreover, preventing intrusion on a highly complex machine can be a daunting task, although approaches such as a multilevel security system could make it a little easier. Detection, on the other hand, is a much easier goal to achieve than complete prevention. It is key to have an excellent detection and response procedure in order to defend any system from attackers both inside and outside it.

Homeostasis/feedback

Cell Communication

Hormone: messenger molecule/small chemical messages

  • creates localized state change
  • kind of an interface to the cell
  • hormones mediate reactions
  • used for regulating homeostasis
  • work with the nervous system to communicate throughout the body
  • hormones aren't surface bound, they go into the cell
  • they are global signals, and can have systemic effects
  • there are different hormone receptors and mechanisms
    • they induce change on the inside of cells instead of triggering reactions from the outside
  • seem to be an early evolutionary construct
    • sort of a blunt stick form of communication
    • govern emotions, fight or flight-type reactions, growth
    • they have systemic and far-reaching effects
  • hormones are sort of like datagrams
  • only about 50 hormones exist
    • they don't convey much information, or much interpretation
    • however, concentrations don't need to be high for them to have effects
  • one-to-many communication

Crosstalk:

  • different hormones interfere with each other
  • a given receptor can be activated by different molecules
  • a molecule can activate different receptors
  • the network begins as a fully connected graph, and then connections are pruned away
  • crosstalk is why drugs have complicated and unpredictable side effects

We could consider the "drug discovery problem" to be equivalent to the "computer security problem".

  • Engineering challenge
    • every input is connected to every output
    • through trial and error, select for the pathways that work
  • moral of the story: there needs to be more coupling in computer systems than we think
    • we need to allow for feedback loops, running parallel to the main operations
  • Metabolic diseases are really receptor diseases
    • the question is "what receptor does it target?"
    • this is why viruses only affect certain tissues: the tissues where the receptors are located are affected
  • some diseases (such as avian flu) can be caught by humans from animals, but not spread between humans

This chapter discussed the differences between cellular communication and the communication that takes place in computer programs. In cellular communication, the process seems to be top down: all links are established, then some are pared away. In computer programs, the process is bottom up: links are established on an as-needed basis. My first thought is that having more links could be a security problem: if you want information to stay where it's put, having few links seems to make sense. However, a system with more links could allow for more feedback and could potentially better support an evolutionary system.
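The two ways of building a communication network can be contrasted in a few lines. This is only a sketch: links are modelled as (input, output) pairs, and the `works` test that decides which links survive pruning is a hypothetical stand-in for the trial and error described in the notes.

```python
def fully_connected(inputs, outputs):
    """Top-down start: every input is linked to every output."""
    return {(i, o) for i in inputs for o in outputs}


def prune(links, works):
    """Keep only the links that pass the (hypothetical) trial-and-error test."""
    return {link for link in links if works(link)}


def build_as_needed(required):
    """Bottom-up construction: add a link only when something asks for it."""
    links = set()
    for link in required:
        links.add(link)
    return links


if __name__ == "__main__":
    inputs = ["hormone_a", "hormone_b", "hormone_c"]
    outputs = ["receptor_x", "receptor_y", "receptor_z"]
    useful = {("hormone_a", "receptor_x"), ("hormone_b", "receptor_x"),
              ("hormone_c", "receptor_z")}

    everything = fully_connected(inputs, outputs)
    top_down = prune(everything, lambda link: link in useful)
    bottom_up = build_as_needed(useful)

    print("links before pruning:", len(everything))
    print("links after pruning: ", len(top_down))
    print("same final network:  ", top_down == bottom_up)
```

Both routes can end at the same network. The difference is what exists along the way: the top-down route temporarily exposes every possible pathway, which is where both the extra feedback and the extra security risk come from.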