BioSec 2012: Elizabeth

Elizabeth's BioSec Notes

(Organized by class dates) Brain dumps, class notes, useful insights, points of confusion, it's all here.

Jan 25

Class readings:

Chapter 2: Origins of Life
Chapter 3: Selection, Biodiversity, and Biosphere

In-class notes:

Chemistry Review
- energy difference between reactants and products in a chemical reaction
- however, you need an input of energy to begin the reaction
a catalyst changes (lowers the energy needed to reach the intermediate state, making the reaction more likely to take place
- the catalyst is unchanged in the process
Biological catalysts are enzymes (proteins) that hold the reactants and situate them in such a way that the reaction can happen more easily
- enzymes move the reactants around
- enzymes can have crystalline structure
cell logic is built on pattern-matching
- enzyme is looking for the reactants that fit its receptors
ATP: Adenosine Triphosphate
- energy carrier/source for cells
- universal resource, used by all cells
ADP: Adenosine Diphosphate
- similar to ATP, but has one fewer phosphate group
- has lower energy than ATP
- the cell expends energy to turn it into ATP
- then the cell breaks up the ATP to use the stored energy
Eukaryotic cells vs. Prokaryotic cells
- in eukaryotic cells, the genetic sequence isn't simply copied from DNA to RNA. Instead, parts of different sequences are picked and chosen and edited into proteins.
- this means that a lot of the information in the DNA is there to control and regulate how parts are edited and assembled.
Because of how evolution works (building on what already worked), understanding how a system works is equivalent to understanding its history, and why it is the way it is.
- however, it can be hard to know where stuff came from, and what came first

Jan 27

Class readings:

Chapter 4: Energy and Enzymes
Chapter 5: Membranes and Transport

In-class notes:

ATP provides the energy to shake things up, get things moving so that reactions can go
Fluid Membrane Model: active transport
- steric: does it fit through the transport channel?
- charge: does it have the right charge?
- selectively open: channels can be open or closed

Chapter 6 (Cellular respiration) preview
respiration = the process of getting oxygen into the cell
glycolysis: ancient process to create ATP, doesn't involve oxygen
Figure 6.5 - important diagram
- glucose oxidation happens in steps so that more energy (ATP) can be harnessed and not lost as heat (wasted energy)
there isn't anything particularly special about the molecules used in cellular respiration (glucose), but it is noteworthy that all eukaryotic cells use the same molecules and seem to be evolutionarily related
look at the process/architecture of the Calvin and Krebs cycles

Feb 1

Class readings:

Chapter 6: Cellular Respiration
Chapter 7: Photosynthesis

Pre-class notes:

Both chapters address how cells make ATP and other byproducts.

not a lot of discussion about how the two processes fit together (I mean, photosynthesis is the more important because it creates the glucose for cellular respiration to use?)
much of the in-depth chemistry was confusing
In Ch 7, I didn't fully understand the last section about photorespiration and how plants avoid it
- what is the problem, really?
- I understand that the C4 cycle resolves it

Possible application-y thoughts

both photosynthesis and cellular respiration involve a lot of cyclical processes (like loops, I suppose) that transform one product into another
the cellular structure model seems like it could be applied to computers (and is similar to what exists), but maybe the metaphor could be extended to be larger?
what would ATP map to in the computer world? Information output?
It seems that the processes are finely tuned so that most of the by-products (except energy lost in heat) get used - is there a moral in that story?

In-class notes:

Photorespiration
- Carbon dioxide (CO2) is split up to get oxygen (O2) and carbon (C). The carbon is used to make glucose, and the oxygen is toxic to an enzyme
- is an example of a way in which the process isn't completely specific/specialized, and has included limitations, even through selection
- evolution isn't perfect (and has limitations), and sometimes, these get papered over and the cell goes on living with them
- in this case, the C4 cycle has developed to handle the limitation
Linear Electron Chain
- begins with photon from the sun (energy)
- the chain of molecules that take the energy from the sun and pass it along
  - each stage uses what it can of the energy and passes the rest along
  - most of the energy ends up going to proton pumps with create ATP
NADP is an electron carrier molecule

Q: How would this kind of system evolve? What kind of pressures must have existed?

Photosynthesis is a pretty efficient process
- however, it is less efficient than cellular respiration
- and considerably less efficient than any process designed by humans (such as the internal combustion engine)
Photosynthesis and cellular respiration aren't divorced processes
- the plant uses its glucose (from photosynthesis) in the mitochondria, to create more ATP (when the sun isn't shining)
- but animals get their glucose from what they eat, which is then used by the mitochondria
- plants effectively store energy in glucose (ex. maple tree sap)
Plants are net producers of oxygen, and net consumers of carbon dioxide
Humans are net producers of carbon dioxide, and net consumers of oxygen
- Earth's atmosphere is 78% nitrogen, 21% oxygen, and only 0.03% carbon dioxide

Computer security

plants provide energy to almost everyone else, but why? why haven't they evolved to protect themselves from animals?
animals prevent the plants from consuming everything
- by themselves, the plants are unsustainable
- they need someone to eat them
so, might computer security be in need of a predator?
- do we need to find some constant pressure to keep security on the path?
How can we work out a system where the pressures create better, stronger systems?
- i.e. one where evolution will take place
- predation addresses material imbalances
- what are the inputs and outputs of computer security?
  - on the internet, there seems to be a lot of information, but the challenge is in parsing it into wisdom

Feb 3

Class readings:

Chapter 8: Cell Communication

In-class notes:

Hormone: messenger molecule

creates localized state change
kind of an interface to the cell
hormones mediate reactions

Crosstalk:

different hormones interfere with each other
a given receptor can be activated by different molecules
a molecule can activate different receptors
the network begins as a fully connected graph, and then connections are pruned away
crosstalk is why drugs have complicated and unpredictable side effects

We could consider the "drug discovery problem" to be equivalent to the "computer security problem".

Engineering challenge
- every input is connected to every output
- through trial and error, select for the pathways that work
moral of the story: there needs to be more coupling than we think in computer
- we need to allow for feedback loops, running parallel to the main operations

Metabolic diseases are really receptor diseases
- the question is "what receptor does it target?"
- this is why viruses only affect certain tissues: the tissues where the receptors are located are affected
some diseases (such as avian flu) can be caught by humans from animals, but not spread between humans

Feb 8

Class readings:

None, discussion of the wiki, plans for moving forward

In-class notes:

DNA: Deoxyribonucleic acid

two strands, twisted around each other into a double helix
- form is very stable, sort of like a zipper
C-G, A-T pairs of nucleotides
different ends on each strand (3' and 5')
the chain forms redundant representations
- the duplication and the structure are in place in order to protect the information
- the duplication also provides a built-in way of replicating the information

DNA - polymerase

attaches to DNA, ratchets itself along the nucleotides
- proceeds from 3' to 5', only in one direction
copies the DNA as it goes
does error checking, but the duplication is still a moment of vulnerability for DNA corruption

DNA structure

the structure of DNA is rigid, and takes up a lot of space
to conserve room, DNA is wrapped around histones to form chromosomes
- ball of string metaphor
when packaged like this, it is effectively "off" and can't be used
- so different cells unpackage and use different parts of the DNA
- chromosomes are dynamic structures, only created in duplication processes

Hydrogen bonds

more like magnets than glue
when you pull them apart, the bonds disappear, but you can then put them back together

Technology

technology is fundamentally destructive
what assumptions do you have to make about putting it back together
- we assume we know how many pieces there are
- we assume the pieces are unique
- errors in recombining cause diseases (structural problems)

Telomeres

tails on chromosomes that tell how many times a cell can reproduce
linked to aging and premature aging diseases

What are the skeletons in computer science's closet?

emergence
- emergent behaviour == bugs
  - the products of interactions that you didn't know about
computability (?)
is computer science really a science?
- it doesn't seem to have the big questions of methodology that would make it a science
- basically, is math + engineering
- is there a larger discipline of which biology and computer science are subfields?

Feb 10

Class readings:

Chapter 9: Cell Cycles

In-class notes:

Missed class for me

Feb 15

Class readings:

Chapter 10: Genetic Recombination

In-class notes:

evolution = variability + selection

Bacterial reproduction looks like an API

but how do you upgrade a trillion machines that use it?

meiosis binds reproduction and variability
conjugation is a similar process to what happens in software
- plugins
- an updating process
meiosis is so complicated - so why do we have it?
- eukaryotic organisms have so many copies of their genomes that updates make no sense
- reinstallation is a better option
- begin with an initial cell, and rebuild the network
- the only upgrade path is mix & match
- the machinery of life is set up to make these upgrades go smoothly
- the reason for limited lifespans
  - the environment will kill you eventually, so it is better to reproduce and die on schedule

sexual reproduction is a gradual iterative process
- big changes over many generations
- wait, how do we get new species from this?
- genome is code for full working organism
so when you get to a certain level of complexity, you have to move to a sexual model of reproduction
- ubuntu cycles

Why does variation work?

because of the environment

Conjugation

bacteria have a single ring-shaped chromosome
- sometimes also have plasmids, which are other chromosomes
we happen to know about the process of conjugation because there is something to observe

PCR: Polymerase Chain Reaction

in a test tube, you can have DNA polymerase work on its own
this is the process used in sequencing DNA

Transposons

jumping DNA
protein folding
DNA moving around
biology is more about subtraction than addition
- start with everything and prune

Archaea

extremophiles
how do they manage reproduction in weird (and unfriendly) environments?
probably, we (and all other organisms) are the descendants of archaea

Feb 17

Class readings:

Chapter 11: Mendel, Genes, and Inheritance

In-class notes:

Genetics

genetics as code
why did Mendel choose peas?
- why did he choose the traits to examine?
  - chose discrete, binary characteristics
  - clean patterns
- there is some potential for deception if you choose your sample this way
  - however, it probably lets you draw some clearer conclusions from initial work and see some patterns you otherwise might not

this means the world is divided into Mendelian genetics and non-Mendelian genetics, but they aren't an even split
- non-Mendelian genetics is a much bigger than Mendelian genetics, because it is essentially everything else
Anil: the notion of a gene is a useless concept: genes are just code

Can we see Mendelian-type patterns in code reuse?

Mohammed: if it works, the probability of inheritance is 1
diploid: 2 copies of everything, backups of everything
men: XY
- the one X gene is used everywhere, so if there are problems on it, they show up
women: XX
- one X is used some places, the other in some places
- division is random

Hybrid vigor

inbreeding leads to weakened traits, as more copies of code are likely to have the same defects
selective breeding is trying to make both copies of the gene the same (so that breeding is true)

Annie: We need version control! Luc: How can we get a better understanding of what is going on in genes / chromosomes?

diploidy = having a backup copy polyploidy = having multiples

Do we want diploidy in our programs?

voting-based systems employ diploidy or polyploidy
- multiple systems generate the same thing, and vote on the answers
- n-version programming
unfortunately, humans tend to think in the same way, and generate the same erros
- so, human-generated diversity tends to be a bust (
- Elizabeth: I think this is SO COOL, how cool are people?

Making the program work != debugging

the idea of debugging implies that we're aiming for perfection
redundancy will protect against bugs
- think of engineering a bridge: there is no way to ensure that all bugs are out of it, so there are tolerances designed in to account for those bugs
- there is no expectation of perfection

Defensive programming

does checks to make sure that data (etc.) is in the proper format

In code, as in biology

there is lots of info that we don't understand the reasons for
the interactions aren't understandable
- so throwing away code that works is a bad idea
- instead, evolve it, change it, refactor it

Linux kernel

5 million lines of code (huge!)
has evolved over time to be cleaner, clearer
- it's because it has been worked on
- people have tried to fix the small things

Problems with legacy code

the culture is lost
dead code, the paradigm is gone away
code is part of a human system, but when you take away the humans, the code ceases to function
Luc: code depends on its humans, the people who use it, keep it alive, help it to evolve
code as part of an ecosystem of humans

Are there lessons here about why open-source code is long-term viable?

people stick with the process, keep the code alive

Games

live (while being developed), then die after being released because of lack of engagement
- post-release (and subsequent enthusiasm) there is a lack of community involvement
- this is similar to biology - things live, and then die (and quickly!)

Legacy code is really a statement about the humans behind the code.

Feb 22

Reading week

Feb 24

Reading week

Feb 29

Class readings:

Chapter 12: Genes, Chromosomes, and Human Genetics

In-class notes:

What might genetic recombinations mean in computer programs?

errors at reproduction
errors deriving from one parent (or both, in combination)
different types of error
- deletions, duplications, inversions

Genetic mapping using selective breeding

is an elegant technology, a nice way to determine a map of something that can't be seen
gives answers to questions about whether traits are linked, or partially linked
- are two traits on the same chromosome?
- if they are on the same chromosome, how close are they?
in a system with homologous chromosomes, gene crossover is a way to increase variability

How did they map the chromosomes?

a process similar to debugging (with actual bugs!)
trying to figure out the the root cause of various behaviour
- look at different variants to find which one has the code
- then reverse engineering the code

Genetic Inheritance

Cytoplasmic inheritance
- genetic material passed in mitochondria and chloroplasts
- i.e., from the mother, through the egg cell
Genetic counselling
- looks at the interaction of genes between parents
- analysis of family tree (ancestors, and their genetic traits, if known)
- now there are more tests, but not full genetic sequencing
- genetic counselling only covers diseases we already know about
  - this is really the tip of the iceberg
- provides some kind of probability of different genetic diseases
- is really the opposite of eugenics: is trying to increase genetic diversity, rather than optimize to one set of genetics

March 2

Class readings:

Chapter 15: Control of Gene Expression

In-class notes:

(Turned out to be pretty sick that day, didn't understand very well)

operon: group of enzymes

turning on/off switches depending on the environment
however, not a simple on/off switch (partial control)

lac operon

operon for the consumption of lactose
first operon to be discovered and understood
sort of like an if statement, but in reality, the logic isn't boolean
- if lactose: remove repressor
- else: leave repressor

March 7

Class readings:

Chapter 15: Control of Gene Expression

In-class notes:

This material made so much more sense when I wasn't running a fever!

Lac Operon

logic for implementing lactose use
uses biological logic, not boolean logic
sort of a genetic circuit
i/o
- inputs: glucose, lactose
what does it mean for the system to "sense" glucose and lactose?
- it means there is a certain concentration
- is a stochastic process
the repressor protein binds to the glucose (actually, a byproduct, allolactose), but as the level of glucose drops, it turns off
this process is really about how many copies are produced
- it is a self-regulatory process, as opposed to an external control structure (as in most computer systems)

Sigmoid functions

are continuous, and differentiable, but effectively divide things to one side or the other
these are similar to how transistors work, and the actual physical processes going on in computers, but aren't how we conceptualize these processes
biological systems sometimes work like this, but there are other systems that have more gradual curves, or more linear change
some systems will have fixed boolean logic, but others will be more linear

Do cells have memory?

yes
- methylation tags on DNA
- markers on proteins
  - which proteins are in the cells
inputs always change living systems, and this is the way they maintain state
homeostasis
- a forgetting (resetting?) mechanism
  - however, not everything is controlled this way
  - an active forgetting process (a wipe clean, sort of)
- what gets remembered?
  - changes to DNA (from radiation), etc.
cell habituation
- give a cell the same treatment over time, and at first, the cell will react strongly, and then less so over time
- an emergent property

In biological systems, state and logic are tied up together. So why can't computers habituate?

programmers never account for situations they haven't thought of
programs avoid path dependence (loops don't change each time you go through them in relation to what came before)

Microprocessors:

run code processors in parallel, then abandon branches that don't work out
- simulate serialism
- branch prediction
- can affect the way new code is written to run on them
in essence, processors 'habituate' to code
- the more you run the same code, the faster it goes
- however, it optimizes speed over functionality, making it kind of "stupid" habituation

What does it mean to program biological systems?

how can we make conditional statements that accommodate all possible events?
where do the emergent properties come from?
it seems like there should be a "better"/"new" way of factoring code to allow emergent behaviour to develop, and state memory (normally, we avoid state dependence)
- we need to give up control (in terms of the history of the system)
- history: how you do a computation should depend on how it was done in the past
- self-repression in cells - error handlers should self-perfect
- the system doesn't habituate, but the user does - what are the implications of this? does the user constitute part of the system in this situation?
unlike in computational systems, 'how' things happens matters in biological systems
- means and ends matter

March 9

Class readings:

Individual chapters

Chapter 45: Population Ecology

mathematical models of population

Population characteristics

geographic range, habitat
population size
population density
dispersion (clumped, random, uniform)
age structure
generation time
sex ratio
proportion reproducing

Demography: processes that change a population's size and density

immigration and emigration
age-specific mortality
age-specific fecundity
survivorship curves (type I, II, III)

Life histories

energy budget
- used for maintenance, growth, reproduction
passive/active parental care

Modelling population growth

exponential models
logistic models
intrinsic rate of increase

Population regulation

crowding, predation, parasitism, disease affect population density
density-independent factors
- ex. weather
population cycles

Human population growth

population diversity is needed to ensure survival

In-class notes:

Dan:
Chapter 22 - Viruses, Viroids & Prions: Infectious Biological Particles

prion: bad shaped protein

ex. mad cow disease

infected brains getting mixed with meat, but is very slow-moving, not a traditional disease
sort of equivalent to a buffer overflow with only 1 byte
how does the changed DNA get into a cell with a virus?
- normally, cytoplasm protects against this sort of thing, this is why viruses only affect certain cell types

Cheryl:
Chapter 35 - The Endocrine Systems

Hormones: small chemical messages

used for regulating homeostasis
work with the nervous system to communicate throughout the body
hormones aren't surface bound, they go into the cell
they are global signals, and can have systemic effects
there are different hormone receptors and mechanisms
- they induce change on the inside of cells instead of triggering reactions from the outside
seem to be an early evolutionary construct
- sort of a blunt stick form of communication
- govern emotions, fight or flight-type reactions, growth
- they have systemic and far-reaching effects
hormones are sort of like datagrams
only about 50 hormones exist
- they don't convey much information, or much interpretation
- however, concentrations don't need to be high for them to have effects
one-to-many communication

Mohamed:
Chapter 43 - Regulating the Internal Environment

Osmoregulation

water in and out of cells
cell will shrink or collapse (or alternatively explode, both are bad!)
cells pump salt ions in and out, and water follows to dissolve the salt

Osmotic pressure

membranes are a bit like cloth: water can move through it, but only at a certain rate, and if you push harder (perhaps with a hose), you would break the cloth

Osmoregulation

is a homeostatic mechanism to maintain temperature, pressure, and environment
organisms evolve to expect a stable environment
- the stability of the environment affects efficiency, performance
- ex. when you put all of your energy into sprinting, your body compensates by stopping digestion to save energy

implication: code buried inside a large project doesn't tend o be nearly as robust as libraries used all over need to be
if the environment is stable, you can make assumptions about the kinds of conditions you will have

March 14

Class readings:

Individual chapters Chapter 46: Population Interactions and Community Ecology

In-class notes:

Communication in cells:

there is lots of "noise" present when cells collect information, so it gets processed out at the point of entry
- sensors habituate to this noise
unlike in computers, information processing is performed everywhere, and at every step of the communication process
- biological systems are messy!

Chapter 46: Population Interactions and Community Ecology

code is very specialized and optimized, so it can only function in a very specific environment
code can evolve under pressure to take less specialized inputs
- the transition of functions to become libraries
- ex. GTK: the Gimp Tool Kit that now runs graphical things for all of linux
actually being ported seems to be the only way to create portable code (it appears that efforts to write generally in the first place never work?)
- ex. linux started out on the 386 processor, and was heavily optimized for it, but over time it has been ported different places and has evolved in this way
strong parallels to economic systems

Dan: Protists

unicellular organisms (eukaryotic)
have movement (unlike plants)
don't have nervous systems or limbs (unlike animals)
separate kingdom, kind of a catch-all category
photosynthesizing protists
- ex. algae
amoebas
parasites
- occupy very specific niches, and can be very specific in these domains
- ex. malaria, beaver fever

what happens when we combine biological computation with techological computation
- ex. some robot that moves in relation to information from a slime mould
Cheryl: the trouble with designing these kinds of systems is that we don't understand the biological systems as well as we need to
Anil: we need to rely on trial and error, and stop trying to understand

March 16

Class readings:

Chapter 44: Defences again Disease

In-class notes:

Why do we have systemic reaction to attacks?

the whole biological system is interconnected
some things affect certain receptors, so they primarily affect tissues with lots of those receptors
resource diversion, cross talk, information diversion

Scalps Theory:

the immune system will react to signs of damage (dead cell parts) not just actual direct evidence of damage

Self/Non-Self Distinctions:

differentiating between biological entities that are part of the body, and those that aren't
there are inherent difficulties in doing this
- for example, bacteria in the gut

How does the body figure out which attacks it needs to react to?

combinatorial pattern matching
delete the patterns that react to the self
- how to avoid reacting to the self?
  - clean up as best as you can
  - but typically some self-reactive receptors slip by

So, why doesn't the immune system kill your body?

the system doesn't activate 'just' because it sees a strange molecule
needs secondary signals:
- cell death (messy!)
- cell help signals
memory cells do react right away

Autoimmune disease:

what kind of conditions should lead to more autoimmunity?
- cold climates, without much disease, or input to the body
autoimmune disorders are related to the signal-to-noise ratio
- the body gets confused if there isn't a strong difference to react to
allergies
- problems with environmental triggers

T-cells have a kind of habituation process

they get habituated to being right
as they get turned on correctly more, they need fewer secondary signals
the adaptive immune system sits on top of the innate immune system and normal cell maintenance

Computer security:

probabilistic security is looked down on (an attitude that probably comes from crypto
but there should be possibilities there

March 21

Class readings:

Brainstorm metaphors linking biology to computer security Goal is to see both the matches and the failures in the metaphor

In-class notes:

Two (rough) ideas:

1) Ecology of security:

What are the populations in security?
- attackers?
- "good" guys?
how do the populations interact?
how do populations reproduce?
time to live, population grow of malware (or "good" security products?)
- interactions between the two
- pressures on the populations
Problems:
- computer security lacks the necessary diversity

2) Cell wall/membranes == authentication

biology has physical boundaries
- can be disassembled, changed, and still be part of the system
- membrane is a rule-based system build out of many many many rules
  - leads to emergent behaviour
computers have partitions based on rules
- virtual machines
- authentication
Problems:
- authentication isn't based on simple minimal rules
  - what better authentication could there be?
- why are computers so easily fooled?
  - emotional input?
We need a richer model: to get richness out, you need to put richness in
- What more information can we give computers to process?
- implicit authentication (bio) gives sophisticated emergent behaviour, adapted via evolution

March 23

In-class notes:

Design exercise: A biological approach to solving phishing

Ex. Banking scenario, attacker wants credentials and is using a phishing email to get them

Assume: the user will miss all of the cues, and the only thing between the user and fraud is the system

What will be weird?

illegitimate email
link to fake bank site
credentials entered in the wrong domain
bad/missing/suspect certificate
- certificate/credential combo is suspect
misappropriated language, email images

Ultimately, this is a trust decision.

Human algorithm (architecture for a potential sol'n):

is the domain the same for the one where we normally send credentials?
usually, behaviour doesn't start in response to email
certificate is the same

To mimic a biological system, we want a continuum of trust (factoring all these pieces of information in)

in practice, no attack will mimic all the pieces of information perfectly
we have to create a complex network that will habituate to past behaviour patterns

biology works around architecture instead of building it
biological approach gives partial solutions, likes to reuse solutions
aggregate behaviour: the system should respond to many little cues

March 28

In-class notes:

More of the Phishing Design Exercise

Dan posted a long list of sensors on the wiki, and the question becomes: For each sensor, is it strong enough to survive on its own?

1) image filename/content sensor

content similarity measure
look for images from other websites that are similar
caching
notice arbitrary similarities between downloaded content

2) Cascading style sheet sensor

visual themes
- fonts, layouts, colours
could be used to detect mirrored sites

3) Content depth sensor

how many links can you follow before being redirected off to another url?
general utility to protect against dodgy websites

4) Semantic text analysis

how could this be of utility to the system apart from security
trigger for secondary authentication procedure

5) Domain/ip address sensor

correlate ip addresses with what kind of data they send
protect against masqueraders

Information immune system: protect against the flood of information being thrust upon us

"little brother"-type system
system gets smart about the information you see
but what to do with this information?
- browsing suggestions?
- persuasive design?

More general case from phishing: how do we help people browse the web more safely?

email analysis (similar to website analysis)
- secretary-like system
deflect the wrong information, sort and process information for users
if it went well, it would be great, but if it went badly, it would be 'awful'
currently, we have many small pieces of the system, but they don't integrate well
- anti-virus
- priority inbox, etc.
interaction styles
- affective design
- context appropriate design
- trust
maybe what we need is more like a dog than a secretary?
- eventually, more intelligent behaviour would develop, but it would be more of an attitude sniffer
- affected by things like time, and behaviour over time
- wouldn't necessarily have to be used: if you knew you were searching dodgy websites, you could ignore the cues

March 30

Last class, wrap-up discussion.

Questions from Anil:

what did we get out of the course?
what did we think of the approach to the course?
thoughts on the wiki?
What do we want out of biological systems in general when thinking of computer security?

Applications to computer security

Most of Unit 2 (Chapters 4 to 8) was about the internal workings of cells, how they create energy, and how they communicate and work together. As we have discussed the applications of this kind of biology to computer security, we have been focussing on how to create computer security systems that evolve, so that they can deal with threats in changing ways. One idea we discussed in class is that security needs a predation model to drive its evolution. The predator and prey would exert pressure on each other so that neither was allowed to overrun the system.

If we were going to develop a metaphor based on cellular structure and interaction, it seems like the energy source is a key concept. But what would be the ATP of a computer system? If security and non-security were competing, what would they be competing for? What sustains security? (The internet? information?)

We usually view computer security as a sort of moral dilemma - a fight between good and evil where "good" means keeping systems running, without loss of data, and with access control, and evil refers to attacks that want to compromise information, and incapacitate systems. In this construction, it is clear who should win the fight: good should prevail over evil (as in all the best tales). If we reframe this task as an evolutionary struggle, the notion of right and wrong is dropped from the picture, and the question becomes one of survival and selection. However, do we think that this will necessarily lead to a good outcome for users of the systems we want to keep secure? The term "secure" seems to imply a certain perspective, or goal. In terms of computer security, does the idea of a predation model imply that some users will be put on the chopping block to help the security of the rest? How could a predation model be set up so that the "right" features were selected for?

In the chapter about cell communication, we discussed the differences between cellular communication and the communication that takes place in computer programs. In cellular communication, the process seems to be top down: all links are established, then some are pared away. In computer programs, the process is bottom up: links are established on an as-needed basis. My first thought is that having more links could be a security problem - if you want information to stay where it's put, not having many links seems to make sense. However, I can see that having a system with more links could allow for more feedback and could potentially better support an evolutionary system.

Potential Metaphor ideas

(Please note: is rough and sketchy)

Authentication :: Cell Membrane and Cell Transport

Authentication's purpose is to show that someone is who they say they are. In computer systems, authentication is usually based on a shared secret, but it can also be based on some feature of the user or something they possess. It is sort of a very primitive pattern-matching system: if you can't match the username to the password (or the fingerprint, etc.), you don't get access.

The mechanism that cells use to distinguish the inside of cells from the outside of cells is the plasma membrane: a lipid bilayer that separates the aqueous cell from the environment. This layer contains a number of specialized proteins that are used to facilitate transport into and out of the cell, send signals, act as enzymes, and recognize other proteins.

There are certainly obvious structural differences between the cell membrane and limited-access computer systems. Cells have a clear physical structure: the nature of the division between "inside" and "outside" is much more fixed than in a computer system, where inside and outside are mostly distinguished by the rules that surround access. In addition, the barrier between the cell and the environment is structured in a fluid model, where the lipids and proteins in the membrane move around within the confines of the membrane structure. It is hard to find an analogue to this structure in a computer security system, where the structure is typically brittle and non-physical.

We could map the proteins in the cell membrane to the username and password combinations that most systems use to grant access. Similarly to cell membranes, computer systems typically have many different specialized channels to admit different users (who may then get access to different parts of the system, or different sets of permissions, etc.) Both systems rely on a kind of pattern-matching: computer systems require the correct username/password to match to known entries in their access lists, and proteins depend on the physical shapes to recognize different molecules.

One major mismatch in this metaphor is that cellular membranes and the proteins contained therein perform much more complex tasks than authentication systems. The proteins in cell membranes provide different kinds of transport, have multiple functions (only one of which, recognition, is really analogous to authentication). However, perhaps we can interpret from the metaphor some functions that authentication systems 'should' have?

The Ecology of Computer Security

Ecology is the science of studying different populations and their abundance, distributions, interactions, and environment. Ecology commonly looks at the populations of different organisms and attempts to understand the pressures that influence them. It seems possible to assume that there might be some value is using some of the same techniques to study the security community.

Ecology studies a number of factors that influence populations. These include:

geographic range
habitat
population size
population density
dispersion
age structure
generation time

It seems that we could examine similar population characteristics to study different viruses or malware, and also patches or antivirus programs. Not all of the population characteristics listed above have clear metaphoric matches, but some of them translate directly. In place of physical boundaries, we could look at the kinds of operating systems or websites targeted by different attacks and defences. Looking at generation time and age structure might help us understand how fast we need to respond to certain attacks.

It would also be interesting to look at population interactions: how do different pieces of malware affect each other? How do different patches affect malware and how do they interact among themselves? Are there environmental factors that affect uptake and spread of different attacks and defences? (What constitutes the environment? the user, the system specifications?)

What would be the benefit of this kind of study? To some extent, this is in place, but it seems like a larger overview could be useful. An approach like this could factor in end users, and could lead to a better understanding of what type of attacks are most successful, and which fixes are most widely adopted, and why. Having a better awareness of trends in both attacks and defences might allow better strategic decision-making.