BioSec 2012: Elizabeth: Difference between revisions
Revision as of 18:03, 14 March 2012
Elizabeth's BioSec Notes
(Organized by class dates) Brain dumps, class notes, useful insights, points of confusion, it's all here.
Jan 25
Class readings:
Chapter 2: Origins of Life
Chapter 3: Selection, Biodiversity, and Biosphere
In-class notes:
- Chemistry Review
- energy difference between reactants and products in a chemical reaction
- however, you need an input of energy to begin the reaction
 
- a catalyst lowers the energy needed to reach the intermediate state (the activation energy), making the reaction more likely to take place
- the catalyst is unchanged in the process
 
- Biological catalysts are enzymes (proteins) that hold the reactants and situate them in such a way that the reaction can happen more easily
- enzymes move the reactants around
- enzymes can have crystalline structure
 
- cell logic is built on pattern-matching
- enzyme is looking for the reactants that fit its receptors
 
- ATP: Adenosine Triphosphate
- energy carrier/source for cells
- universal resource, used by all cells
 
- ADP: Adenosine Diphosphate
- similar to ATP, but has one fewer phosphate group
- has lower energy than ATP
- the cell expends energy to turn it into ATP
- then the cell breaks up the ATP to use the stored energy
 
- Eukaryotic cells vs. Prokaryotic cells
- in eukaryotic cells, the genetic sequence isn't simply copied from DNA to RNA. Instead, parts of the copied sequence are selected and spliced together, and the edited RNA is then translated into proteins.
- this means that a lot of the information in the DNA is there to control and regulate how parts are edited and assembled.
 
- Because of how evolution works (building on what already worked), understanding how a system works is equivalent to understanding its history, and why it is the way it is.
- however, it can be hard to know where stuff came from, and what came first
 
Jan 27
Class readings:
Chapter 4: Energy and Enzymes
Chapter 5: Membranes and Transport
In-class notes:
- ATP provides the energy to shake things up, get things moving so that reactions can go
- Fluid Membrane Model: active transport
- steric: does it fit through the transport channel?
- charge: does it have the right charge?
- selectively open: channels can be open or closed
 
- Chapter 6 (Cellular respiration) preview
- respiration = the process of using oxygen to release energy (as ATP) from glucose inside the cell
- glycolysis: ancient process to create ATP, doesn't involve oxygen
- Figure 6.5 - important diagram
- glucose oxidation happens in steps so that more energy (ATP) can be harnessed and not lost as heat (wasted energy)
 
- there isn't anything particularly special about the molecules used in cellular respiration (glucose), but it is noteworthy that all eukaryotic cells use the same molecules and seem to be evolutionarily related
- look at the process/architecture of the Calvin and Krebs cycles
Feb 1
Class readings:
Chapter 6: Cellular Respiration
Chapter 7: Photosynthesis
Pre-class notes:
Both chapters address how cells make ATP and other byproducts.
- not a lot of discussion about how the two processes fit together (I mean, photosynthesis is the more important because it creates the glucose for cellular respiration to use?)
- much of the in-depth chemistry was confusing
- In Ch 7, I didn't fully understand the last section about photorespiration and how plants avoid it
- what is the problem, really?
- I understand that the C4 cycle resolves it
 
Possible application-y thoughts
- both photosynthesis and cellular respiration involve a lot of cyclical processes (like loops, I suppose) that transform one product into another
- the cellular structure model seems like it could be applied to computers (and is similar to what exists), but maybe the metaphor could be extended to be larger?
- what would ATP map to in the computer world? Information output?
- It seems that the processes are finely tuned so that most of the by-products (except energy lost in heat) get used - is there a moral in that story?
In-class notes:
- Photorespiration
- Carbon dioxide (CO2) supplies the carbon used to make glucose, and oxygen (O2) is released. The oxygen interferes with an enzyme (rubisco), which can bind O2 in place of CO2
- is an example of a way in which the process isn't completely specific/specialized, and has included limitations, even through selection
- evolution isn't perfect (and has limitations), and sometimes, these get papered over and the cell goes on living with them
- in this case, the C4 cycle has developed to handle the limitation
 
- Linear Electron Chain
- begins with photon from the sun (energy)
- the chain of molecules that take the energy from the sun and pass it along
- each stage uses what it can of the energy and passes the rest along
- most of the energy ends up going to proton pumps, which are used to create ATP
 
 
- NADP is an electron carrier molecule
Q: How would this kind of system evolve? What kind of pressures must have existed?
- Photosynthesis is a pretty efficient process
- however, it is less efficient than cellular respiration
- and considerably less efficient than any process designed by humans (such as the internal combustion engine)
 
- Photosynthesis and cellular respiration aren't divorced processes
- the plant uses its glucose (from photosynthesis) in the mitochondria, to create more ATP (when the sun isn't shining)
- but animals get their glucose from what they eat, which is then used by the mitochondria
- plants effectively store energy in glucose (ex. maple tree sap)
 
- Plants are net producers of oxygen, and net consumers of carbon dioxide
- Humans are net producers of carbon dioxide, and net consumers of oxygen
- Earth's atmosphere is 78% nitrogen, 21% oxygen, and only 0.03% carbon dioxide
 
Computer security
- plants provide energy to almost everyone else, but why? why haven't they evolved to protect themselves from animals?
- animals prevent the plants from consuming everything
- by themselves, the plants are unsustainable
- they need someone to eat them
 
- so, might computer security be in need of a predator?
- do we need to find some constant pressure to keep security on the path?
 
- How can we work out a system where the pressures create better, stronger systems?
- i.e. one where evolution will take place
- predation addresses material imbalances
- what are the inputs and outputs of computer security?
- on the internet, there seems to be a lot of information, but the challenge is in parsing it into wisdom
 
 
Feb 3
Class readings:
Chapter 8: Cell Communication
In-class notes:
Hormone: messenger molecule
- creates localized state change
- kind of an interface to the cell
- hormones mediate reactions
Crosstalk:
- different hormones interfere with each other
- a given receptor can be activated by different molecules
- a molecule can activate different receptors
- the network begins as a fully connected graph, and then connections are pruned away
- crosstalk is why drugs have complicated and unpredictable side effects
We could consider the "drug discovery problem" to be equivalent to the "computer security problem".
- Engineering challenge
- every input is connected to every output
- through trial and error, select for the pathways that work
 
- moral of the story: there needs to be more coupling in computer systems than we think
- we need to allow for feedback loops, running parallel to the main operations
 
- Metabolic diseases are really receptor diseases
- the question is "what receptor does it target?"
- this is why viruses only affect certain tissues: the tissues where the receptors are located are affected
 
- some diseases (such as avian flu) can be caught by humans from animals, but not spread between humans
Feb 8
Class readings:
None, discussion of the wiki, plans for moving forward
In-class notes:
DNA: Deoxyribonucleic acid
- two strands, twisted around each other into a double helix
- form is very stable, sort of like a zipper
 
- C-G, A-T pairs of nucleotides
- different ends on each strand (3' and 5')
- the chain forms redundant representations
- the duplication and the structure are in place in order to protect the information
- the duplication also provides a built-in way of replicating the information
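The redundancy point above can be sketched in a few lines of Python. This is only an illustration of the A-T / C-G pairing rule from the notes; the function names are my own, and real replication involves far more machinery.

```python
# Pairing rules from the notes: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base partner of a DNA strand (hypothetical helper)."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Partner strand read in the opposite direction (the two strands run antiparallel)."""
    return complement(strand)[::-1]


original = "ATGCCGTA"
partner = complement(original)

# Either strand alone is enough to rebuild the other:
# this is the "redundant representation" in the notes.
recovered = complement(partner)
```

Because each strand determines the other, losing or damaging one strand does not lose the information, and copying amounts to separating the strands and rebuilding each partner.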
 
DNA polymerase
- attaches to DNA, ratchets itself along the nucleotides
- reads the template strand from 3' to 5' (building the new strand from 5' to 3'), only in one direction
 
- copies the DNA as it goes
- does error checking, but the duplication is still a moment of vulnerability for DNA corruption
DNA structure
- the structure of DNA is rigid, and takes up a lot of space
- to conserve room, DNA is wrapped around histones to form chromosomes
- ball of string metaphor
 
- when packaged like this, it is effectively "off" and can't be used
- so different cells unpackage and use different parts of the DNA
- chromosomes are dynamic structures, only created in duplication processes
 
Hydrogen bonds
- more like magnets than glue
- when you pull them apart, the bonds disappear, but you can then put them back together
Technology
- technology is fundamentally destructive
- what assumptions do you have to make about putting it back together?
- we assume we know how many pieces there are
- we assume the pieces are unique
- errors in recombining cause diseases (structural problems)
 
Telomeres
- tails on chromosomes that tell how many times a cell can reproduce
- linked to aging and premature aging diseases
What are the skeletons in computer science's closet?
- emergence
- emergent behaviour == bugs
- the products of interactions that you didn't know about
 
 
- computability (?)
- is computer science really a science?
- it doesn't seem to have the big questions of methodology that would make it a science
- basically, is math + engineering
- is there a larger discipline of which biology and computer science are subfields?
 
Feb 10
Class readings:
Chapter 9: Cell Cycles
In-class notes:
Missed class for me
Feb 15
Class readings:
Chapter 10: Genetic Recombination
In-class notes:
evolution = variability + selection
Bacterial reproduction looks like an API
- but how do you upgrade a trillion machines that use it?
- meiosis binds reproduction and variability
- conjugation is a similar process to what happens in software
- plugins
- an updating process
 
- meiosis is so complicated - so why do we have it?
- eukaryotic organisms have so many copies of their genomes that updates make no sense
- reinstallation is a better option
- begin with an initial cell, and rebuild the network
- the only upgrade path is mix & match
- the machinery of life is set up to make these upgrades go smoothly
- the reason for limited lifespans
- the environment will kill you eventually, so it is better to reproduce and die on schedule
 
 
- sexual reproduction is a gradual iterative process
- big changes over many generations
- wait, how do we get new species from this?
- genome is code for full working organism
 
- so when you get to a certain level of complexity, you have to move to a sexual model of reproduction
- ubuntu cycles
 
Why does variation work?
- because of the environment
Conjugation
- bacteria have a single ring-shaped chromosome
- sometimes also have plasmids, which are small, separate rings of DNA outside the main chromosome
 
- we happen to know about the process of conjugation because there is something to observe
PCR: Polymerase Chain Reaction
- in a test tube, you can have DNA polymerase work on its own
- this is the process used to make many copies of a piece of DNA (for example, as a step in sequencing it)
Transposons
- jumping DNA
- protein folding
- DNA moving around
- biology is more about subtraction than addition
- start with everything and prune
 
Archaea
- extremophiles
- how do they manage reproduction in weird (and unfriendly) environments?
- probably, we (and all other organisms) are the descendants of archaea
Feb 17
Class readings:
Chapter 11: Mendel, Genes, and Inheritance
In-class notes:
Genetics
- genetics as code
- why did Mendel choose peas?
- why did he choose the traits to examine?
- chose discrete, binary characteristics
- clean patterns
 
- there is some potential for deception if you choose your sample this way
- however, it probably lets you draw some clearer conclusions from initial work and see some patterns you otherwise might not
 
 
- this means the world is divided into Mendelian genetics and non-Mendelian genetics, but they aren't an even split
- non-Mendelian genetics is much bigger than Mendelian genetics, because it is essentially everything else
 
- Anil: the notion of a gene is a useless concept: genes are just code
Can we see Mendelian-type patterns in code reuse?
- Mohammed: if it works, the probability of inheritance is 1
- diploid: 2 copies of everything, backups of everything
- men: XY
- the one X chromosome is used everywhere, so if there are problems on it, they show up
 
- women: XX
- one X is used in some places, the other X in others
- division is random
 
Hybrid vigor
- inbreeding leads to weakened traits, as more copies of code are likely to have the same defects
- selective breeding is trying to make both copies of the gene the same (so that breeding is true)
Annie: We need version control!
Luc: How can we get a better understanding of what is going on in genes / chromosomes?
diploidy = having a backup copy
polyploidy = having multiples
Do we want diploidy in our programs?
- voting-based systems employ diploidy or polyploidy
- multiple systems generate the same thing, and vote on the answers
- n-version programming
 
- unfortunately, humans tend to think in the same way, and generate the same errors
- so, human-generated diversity tends to be a bust
- Elizabeth: I think this is SO COOL, how cool are people?
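A rough Python sketch of the voting idea behind n-version programming, to make the "polyploidy" analogy concrete. The three "versions" below are made-up toy functions, and the `vote` helper is my own name, not something from class.

```python
from collections import Counter
from typing import Callable, Sequence


def vote(versions: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every version on x and return the answer given by a strict majority."""
    answers = [version(x) for version in versions]
    winner, count = Counter(answers).most_common(1)[0]
    if count <= len(answers) // 2:
        raise ValueError(f"no majority among answers: {answers}")
    return winner


# Three hypothetical, independently written "versions" of squaring a number.
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_buggy(x: int) -> int:
    return x * 2  # deliberate bug: doubles instead of squaring


# Two correct copies outvote the single buggy one.
result = vote([square_a, square_b, square_buggy], 3)
```

The weakness mentioned above shows up directly: if two of the three versions share the same bug, the vote confidently returns the wrong answer, so the scheme only helps when the errors are independent.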
 
Making the program work != debugging
- the idea of debugging implies that we're aiming for perfection
- redundancy will protect against bugs
- think of engineering a bridge: there is no way to ensure that all bugs are out of it, so there are tolerances designed in to account for those bugs
- there is no expectation of perfection
 
Defensive programming
- does checks to make sure that data (etc.) is in the proper format
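As a small illustration of that kind of check, here is a Python sketch. The function name, the example input, and the upper limit of 150 are all assumptions chosen for the example.

```python
def parse_age(raw: str) -> int:
    """Convert text input to an age, rejecting anything malformed (hypothetical example)."""
    if not isinstance(raw, str):
        raise TypeError("age must be given as a string")
    text = raw.strip()
    # Accept only plain ASCII digits: no signs, decimals, or letters.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a whole number: {raw!r}")
    age = int(text)
    # The limit of 150 is an arbitrary sanity bound for this sketch.
    if age > 150:
        raise ValueError(f"age out of range: {age}")
    return age
```

The function refuses bad input at the boundary instead of letting it travel deeper into the program, which is a little like a receptor that only responds to molecules of the right shape.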
In code, as in biology
- there is lots of info that we don't understand the reasons for
- the interactions aren't understandable
- so throwing away code that works is a bad idea
- instead, evolve it, change it, refactor it
 
Linux kernel
- 5 million lines of code (huge!)
- has evolved over time to be cleaner, clearer
- it's because it has been worked on
- people have tried to fix the small things
 
Problems with legacy code
- the culture is lost
- dead code: the paradigm it was written in has gone away
- code is part of a human system, but when you take away the humans, the code ceases to function
- Luc: code depends on its humans, the people who use it, keep it alive, help it to evolve
- code as part of an ecosystem of humans
Are there lessons here about why open-source code is long-term viable?
- people stick with the process, keep the code alive
Games
- live (while being developed), then die after being released because of lack of engagement
- post-release (and subsequent enthusiasm) there is a lack of community involvement
- this is similar to biology - things live, and then die (and quickly!)
 
Legacy code is really a statement about the humans behind the code.
Feb 22
Reading week
Feb 24
Reading week
Feb 29
Class readings:
Chapter 12: Genes, Chromosomes, and Human Genetics
In-class notes:
What might genetic recombinations mean in computer programs?
- errors at reproduction
- errors deriving from one parent (or both, in combination)
- different types of error
- deletions, duplications, inversions
 
Genetic mapping using selective breeding
- is an elegant technology, a nice way to determine a map of something that can't be seen
- gives answers to questions about whether traits are linked, or partially linked
- are two traits on the same chromosome?
- if they are on the same chromosome, how close are they?
 
- in a system with homologous chromosomes, gene crossover is a way to increase variability
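The mapping questions above come down to counting offspring. A minimal Python sketch, assuming the usual convention that 1% recombination counts as one map unit; the counts are invented for illustration and are not data from class.

```python
def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of offspring showing a new (non-parental) combination of two traits."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return recombinant / total


# Made-up counts: 170 of 1000 offspring show a non-parental combination.
freq = recombination_frequency(recombinant=170, total=1000)

# By convention, 1% recombination is called 1 map unit.
map_units = freq * 100

# A frequency well below 50% suggests the two traits sit on the same chromosome;
# the lower the frequency, the closer together they are likely to be.
looks_linked = freq < 0.5
```

Repeating this for many pairs of traits gives relative distances, which is how a map can be drawn of something that can't be seen directly.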
How did they map the chromosomes?
- a process similar to debugging (with actual bugs!)
- trying to figure out the root cause of various behaviour
- look at different variants to find which one has the code
- then reverse engineering the code
 
Genetic Inheritance
- Cytoplasmic inheritance
- genetic material passed in mitochondria and chloroplasts
- i.e., from the mother, through the egg cell
 
- Genetic counselling
- looks at the interaction of genes between parents
- analysis of family tree (ancestors, and their genetic traits, if known)
- now there are more tests, but not full genetic sequencing
- genetic counselling only covers diseases we already know about
- this is really the tip of the iceberg
 
- provides some kind of probability of different genetic diseases
- is really the opposite of eugenics: is trying to increase genetic diversity, rather than optimize to one set of genetics
 
March 2
Class readings:
Chapter 15: Control of Gene Expression
In-class notes:
(Turned out to be pretty sick that day, didn't understand very well)
operon: group of genes (often coding for enzymes) that are switched on and off together
- turning on/off switches depending on the environment
- however, not a simple on/off switch (partial control)
lac operon
- operon for the consumption of lactose
- first operon to be discovered and understood
- sort of like an if statement, but in reality, the logic isn't boolean
- if lactose: remove repressor
- else: leave repressor
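To make the "if statement, but not boolean" point concrete, here is a Python sketch that puts the two views side by side. The graded version uses a Hill-type curve as a stand-in; the function names and the `half_max` and `steepness` values are assumptions for illustration, not measured properties of the lac operon.

```python
def lac_operon_boolean(lactose_present: bool) -> bool:
    """The if statement from the notes: lactose removes the repressor."""
    if lactose_present:
        return True   # repressor removed, operon on
    else:
        return False  # repressor left in place, operon off


def lac_operon_graded(lactose: float, half_max: float = 1.0,
                      steepness: float = 2.0) -> float:
    """Expression level between 0 and 1 that rises smoothly with lactose concentration."""
    if lactose < 0:
        raise ValueError("concentration cannot be negative")
    top = lactose ** steepness
    return top / (half_max ** steepness + top)


# A little lactose gives a little expression, more lactose gives more.
levels = [lac_operon_graded(c) for c in (0.0, 0.5, 1.0, 2.0)]
```

In the graded version there is no single point where the operon flips from off to on, only a concentration (`half_max`) at which expression is halfway.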
 
March 7
Class readings:
Chapter 15: Control of Gene Expression
In-class notes:
This material made so much more sense when I wasn't running a fever!
Lac Operon
- logic for implementing lactose use
- uses biological logic, not boolean logic
- sort of a genetic circuit
- i/o
- inputs: glucose, lactose
 
- what does it mean for the system to "sense" glucose and lactose?
- it means there is a certain concentration
- is a stochastic process
 
- the repressor protein binds to the lactose (actually, a byproduct, allolactose), which releases it from the DNA; as the level of lactose drops, the repressor rebinds and the operon turns off
- this process is really about how many copies are produced
- it is a self-regulatory process, as opposed to an external control structure (as in most computer systems)
 
Sigmoid functions (http://en.wikipedia.org/wiki/Sigmoid_function)
- are continuous and differentiable, but effectively divide things to one side or the other
- these are similar to how transistors work, and the actual physical processes going on in computers, but aren't how we conceptualize these processes
- biological systems sometimes work like this, but there are other systems that have more gradual curves, or more linear change
- some systems will have fixed boolean logic, but others will be more linear
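A quick Python sketch of how a sigmoid sits between gradual and boolean behaviour. It uses the standard logistic function; the `steepness` parameter is my own addition to show the transition.

```python
import math


def sigmoid(x: float, steepness: float = 1.0) -> float:
    """Logistic function: smooth, differentiable, and bounded between 0 and 1."""
    z = steepness * x
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def step(x: float) -> float:
    """The boolean idealization: off below zero, on at or above it."""
    return 1.0 if x >= 0 else 0.0


# Low steepness: a gradual response. High steepness: almost a switch.
gentle = sigmoid(0.2, steepness=1.0)
sharp = sigmoid(0.2, steepness=50.0)
```

With a small steepness the output changes gradually around zero; with a large one it is nearly indistinguishable from `step`, which is roughly the sense in which a transistor "is" a switch.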
Do cells have memory?
- yes
- methylation tags on DNA
- markers on proteins
- which proteins are in the cells
 
 
- inputs always change living systems, and this is the way they maintain state
- homeostasis
- a forgetting (resetting?) mechanism
- however, not everything is controlled this way
- an active forgetting process (a wipe clean, sort of)
 
- what gets remembered?
- changes to DNA (from radiation), etc.
 
 
- cell habituation
- give a cell the same treatment over time, and at first, the cell will react strongly, and then less so over time
- an emergent property
 
In biological systems, state and logic are tied up together. So why can't computers habituate?
- programmers never account for situations they haven't thought of
- programs avoid path dependence (loops don't change each time you go through them in relation to what came before)
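Nothing stops a program from being path dependent; we just usually choose not to write it that way. A toy Python sketch of habituation, where the response depends on what came before. The class name and the `decay` and `recovery` numbers are invented for illustration.

```python
class HabituatingResponder:
    """Responds less strongly to a stimulus the more often it has seen it."""

    def __init__(self, decay: float = 0.5, recovery: float = 0.1) -> None:
        self.decay = decay          # how much sensitivity drops per stimulus
        self.recovery = recovery    # how much sensitivity returns per rest
        self.sensitivity = 1.0      # the state that carries the history

    def stimulate(self, strength: float) -> float:
        """Return the response, then become less sensitive."""
        response = strength * self.sensitivity
        self.sensitivity *= self.decay
        return response

    def rest(self) -> None:
        """A crude 'forgetting' step: sensitivity drifts back toward 1."""
        self.sensitivity = min(1.0, self.sensitivity + self.recovery)


cell = HabituatingResponder()
# Same input four times in a row; each response is half the previous one.
responses = [cell.stimulate(1.0) for _ in range(4)]
```

Here state and logic are tied together in the way the notes describe: the same call with the same argument gives a different answer each time, and `rest` plays the role of the resetting mechanism.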
Microprocessors:
- run several possible code paths in parallel (speculatively), then abandon the branches that don't work out
- simulate serialism
- branch prediction
- can affect the way new code is written to run on them
 
- in essence, processors 'habituate' to code
- the more you run the same code, the faster it goes
- however, it optimizes speed over functionality, making it kind of "stupid" habituation
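One textbook way to picture this "habituation" is a two-bit saturating counter, sketched below in Python. This is a simplified classroom model with names of my own choosing, not a description of how any particular processor implements branch prediction.

```python
class TwoBitPredictor:
    """Counter from 0 to 3; values 2 and 3 mean 'predict the branch is taken'."""

    def __init__(self) -> None:
        self.counter = 1  # start at "weakly not taken"

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        """Nudge the counter toward what actually happened, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes: list) -> float:
    """Fraction of branch outcomes a fresh predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch: taken nine times, then not taken when the loop exits.
loop_branch = [True] * 9 + [False]
first_pass = accuracy(loop_branch)
```

The predictor only remembers recent outcomes and only uses them to guess faster; it never changes what the code computes, which matches the "stupid habituation" description above.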
 
What does it mean to program biological systems?
- how can we make conditional statements that accommodate all possible events?
- where do the emergent properties come from?
- it seems like there should be a "better"/"new" way of factoring code to allow emergent behaviour to develop, and state memory (normally, we avoid state dependence)
- we need to give up control (in terms of the history of the system)
- history: how you do a computation should depend on how it was done in the past
- self-repression in cells - error handlers should self-perfect
- the system doesn't habituate, but the user does - what are the implications of this? does the user constitute part of the system in this situation?
 
- unlike in computational systems, 'how' things happen matters in biological systems
- means and ends matter
 
March 9
Class readings:
Individual chapters Chapter 45: Population Ecology
In-class notes:
March 14
Class readings:
Individual chapters Chapter 46: Population Interactions and Community Ecology
In-class notes:
Applications to computer security
Most of Unit 2 (Chapters 4 to 8) was about the internal workings of cells, how they create energy, and how they communicate and work together. As we have discussed the applications of this kind of biology to computer security, we have been focussing on how to create computer security systems that evolve, so that they can deal with threats in changing ways. One idea we discussed in class is that security needs a predation model to drive its evolution. The predator and prey would exert pressure on each other so that neither was allowed to overrun the system.
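To get a feel for what mutual pressure between predator and prey looks like, here is a Python sketch of the classic Lotka-Volterra predator-prey equations, stepped forward with simple Euler integration. This is a standard population model that I am swapping in for illustration; it was not the model discussed in class, and every parameter value below is made up.

```python
def simulate(prey: float = 10.0, predators: float = 5.0,
             steps: int = 1000, dt: float = 0.01,
             birth: float = 1.0, predation: float = 0.1,
             efficiency: float = 0.075, death: float = 1.5) -> list:
    """Return a list of (prey, predators) pairs, one per time step."""
    history = [(prey, predators)]
    for _ in range(steps):
        # Prey grow on their own and are removed by encounters with predators.
        d_prey = (birth * prey - predation * prey * predators) * dt
        # Predators grow by eating prey and die off without them.
        d_predators = (efficiency * prey * predators - death * predators) * dt
        prey += d_prey
        predators += d_predators
        history.append((prey, predators))
    return history


history = simulate()
peak_prey = max(p for p, _ in history)
```

In this model each population keeps the other in check. Mapping that onto attackers and defenders would still require deciding what the shared resource is, which is the ATP question raised in the next paragraph.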
If we were going to develop a metaphor based on cellular structure and interaction, it seems like the energy source is a key concept. But what would be the ATP of a computer system? If security and non-security were competing, what would they be competing for? What sustains security? (The internet? information?)
We usually view computer security as a sort of moral dilemma - a fight between good and evil where "good" means keeping systems running, without loss of data, and with access control, and evil refers to attacks that want to compromise information, and incapacitate systems. In this construction, it is clear who should win the fight: good should prevail over evil (as in all the best tales). If we reframe this task as an evolutionary struggle, the notion of right and wrong is dropped from the picture, and the question becomes one of survival and selection. However, do we think that this will necessarily lead to a good outcome for users of the systems we want to keep secure? The term "secure" seems to imply a certain perspective, or goal. In terms of computer security, does the idea of a predation model imply that some users will be put on the chopping block to help the security of the rest? How could a predation model be set up so that the "right" features were selected for?
In the chapter about cell communication, we discussed the differences between cellular communication and the communication that takes place in computer programs. In cellular communication, the process seems to be top down: all links are established, then some are pared away. In computer programs, the process is bottom up: links are established on an as-needed basis. My first thought is that having more links could be a security problem - if you want information to stay where it's put, not having many links seems to make sense. However, I can see that having a system with more links could allow for more feedback and could potentially better support an evolutionary system.