Talk:COMP 3000 Essay 1 2010 Question 10

From Soma-notes
Revision as of 23:39, 14 October 2010 by Abujaki

Hey all,

I think we should write down our emails here so we can further discuss stuff without having to log in here. (***Note that discussions over email can't be counted towards your participation grade!***--Anil)


Geoff Smith (gsmith0413@gmail.com) - gsmith6


Andrew Bujáki (abujaki [at] Connect or Live.ca)

      • I'm usually on MSN (Live) for collaboration at night; just make sure to put in a little message about who you are when you're adding me. :)


I used Google Scholar and came to this page: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=812717&tag=1# which briefly touches on the issues of flash memory, specifically the inability to update in place and the limited number of write/erase cycles.

Inability to update in place refers to the way flash is erased: it can be programmed a bit (or a page) at a time, but only in one direction, while erasing happens block-by-block. A whole block would have to be erased and completely reprogrammed in order to flip one bit back after it's been set. http://en.wikipedia.org/wiki/Flash_memory#Block_erasure
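To make the block-erase constraint concrete, here is a minimal sketch in Python. The `FlashBlock` class and the 4-byte block size are made up for illustration (real erase blocks are far larger); the only behaviour it models is that programming can clear bits but only a whole-block erase can set them again.

```python
class FlashBlock:
    """Toy model of one erase block (hypothetical; real blocks are far larger)."""

    def __init__(self, size=4):
        self.cells = [0xFF] * size   # erased flash reads as all 1 bits
        self.erase_count = 0

    def program(self, offset, value):
        # Programming can only clear bits (1 -> 0); it never sets a 0 back to 1.
        self.cells[offset] &= value

    def erase(self):
        # Erase is the only way to get 1 bits back, and it hits the whole block.
        self.cells = [0xFF] * len(self.cells)
        self.erase_count += 1

    def update(self, offset, value):
        # "In-place" update: copy the block out, erase it, reprogram everything.
        saved = list(self.cells)
        saved[offset] = value
        self.erase()
        for i, byte in enumerate(saved):
            self.program(i, byte)


block = FlashBlock()
block.program(0, 0b10101010)
block.program(0, 0b11111111)   # tries to set bits back to 1: no effect
naive = block.cells[0]         # still 0b10101010
block.update(0, 0b11111111)    # works, but costs a full block erase
```

The point of the sketch is the cost: changing one byte back to all 1s bumps `erase_count` for the entire block, which is exactly what a flash-aware file system tries to avoid.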

Limited write/erase: Flash memory typically has a short lifespan if it's being written to a lot. Writing and erasing the memory (changing, updating, etc.) will wear it out. Each block can only take a finite number of write/erase cycles (varying by manufacturer, model, etc.), and once they've been used up, you'll get bad sectors, corrupt data, and generally be SOL. http://en.wikipedia.org/wiki/Flash_memory#Memory_wear


Filesystems would have to be changed to play nicely with these constraints: they must use blocks efficiently and minimize writing/erasing as much as possible.


I found a paper that talks about the performance, capabilities and limitations of NAND flash storage.

Abstract: "This presentation provides an in-depth examination of the fundamental theoretical performance, capabilities, and limitations of NAND Flash-based Solid State Storage (SSS). The tutorial will explore the raw performance capabilities of NAND Flash, and limitations to performance imposed by mitigation of reliability issues, interfaces, protocols, and technology types. Best practices for system integration of SSS will be discussed. Performance achievements will be reviewed for various products and applications. "

Link: http://www.flashmemorysummit.com/English/Collaterals/Proceedings/2009/20090812_T1B_Smith.pdf

There's no starting place like Wikipedia, even if you shouldn't cite it.

http://en.wikipedia.org/wiki/Flash_Memory

http://en.wikipedia.org/wiki/LogFS

http://en.wikipedia.org/wiki/Hard_disk

http://en.wikipedia.org/wiki/Wear_leveling

http://en.wikipedia.org/wiki/Hot_spot_%28computer_science%29

http://en.wikipedia.org/wiki/Solid-state_drive

Hey Guys,

We really don't have much time to get this done. Let's meet tomorrow after class and get our bearings to do this properly.

Fedor


A few of us have Networking immediately after class. I know personally I won't be able to make anything set on Tuesday. Additionally, he spoke briefly about hotspots on the disk for our question last week, where some places on the disk would be written to far more often than others. As well, for bibliographical citing, http://bibme.org is a wonderful resource for the popular formats (e.g. MLA), if it should come down to that. ~Andrew


links

Start Posting some stuff to source from:

http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1199079&tag=1 --"Introduction to flash memory"

http://portal.acm.org/citation.cfm?id=1244248 --"Wear Leveling" (it's about a proposed way of doing it, but explains a whole bunch of other things to do that)

http://portal.acm.org/citation.cfm?id=1731355 --"Online maintenance of very large random samples on flash storage" (ie dealing with the constraints of Flash Storage in a system that might actually be written to 100000 times)

http://vlsi.kaist.ac.kr/paper_list/2006_TC_CFFS.pdf --"An Efficient NAND Flash File System for Flash Memory Storage" discusses shortcomings of using hard-disk-based file systems and current flash-based file systems

http://maltiel-consulting.com/NAND_vs_NOR_Flash_Memory_Technology_Overview_Read_Write_Erase_speed_for_SLC_MLC_semiconductor_consulting_expert.pdf --"NAND vs NOR Flash Memory" (note: i didn't get this off of Google scholar but it seems to be written by someone from Toshiba. is that ok?)

Hi everybody,

So here is the latest news. Geoff, Andrew and I had a meeting after class today and came up with a plan for writing this thing.

We decided to have 3 parts:

1. What flash storage is, why it's good but also why it must have the problems that it does (the assumption is that it must have them, why would it otherwise?) [don't know much about this just now... basics include that there is NOR (reads slightly faster) and NAND (holds more, writes faster, erases much faster, lasts about ten times longer) flash, with NAND being especially popular for storage (what's NOR good for?). Here, we'd ideally want to talk about why flash was invented (supposedly as an alternative to slow ROM), why it was suitable for that, and how it works on a technical level. Then, we'd want to mention why this technical functionality was innovative and useful but also why it came with two serious setbacks: having a limited number of re-write cycles and needing to erase a block at a time.]

Either way, flash storage offers far faster fetch times than the traditional platter-based HDD, and, in a sense, more stable data. Since the data is, in a sense, programmed rather than simply stored, it is less likely to be erased easily. On that note, in order to flip a single bit, the entire block needs to be erased and then reprogrammed. When an 'old' HDD fails at the end of its life cycle, your data is gone (unless you're willing to shell out $200/hr to have it recovered; yes, I've seen companies in Ottawa that do this). When a flash drive reaches the end of its life, it merely becomes read-only. Bugger for databases, but useful for technical notes and archives, let's say. In today's gaming computers, flash memory is good for quick load times; however, with limited write cycles, it is better used for things that are not updated as frequently. E.g... well, I don't have a better example than a webserver hosting a company's CSS and scripts. ~Source: Years in the 'biz

Flash memory started out as a replacement for EPROMs. At the time, EPROMs needed exposure to UV light to be erased, while flash memory could be erased electrically. The first flash memory product came out in 1988, but it did not take off until the late 1990s because it could not be reliably produced. NOR and NAND memory are named after the arrangement of the cells in the memory array. NOR-based flash memory benefits from having very fast burst read times but has slower write times. Due to the structure of NOR memory, programs stored in it can be executed without being loaded into RAM first. NAND flash memory has a very large storage capacity and can read and write large files relatively fast. NAND is more suited for storage, while NOR memory is better suited for direct program execution, such as in firmware (BIOS) chips. source: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1199079&tag=1 , http://maltiel-consulting.com/NAND_vs_NOR_Flash_Memory_Technology_Overview_Read_Write_Erase_speed_for_SLC_MLC_semiconductor_consulting_expert.pdf

2. How a traditional disk-based file-system works and why the limitations of flash storage make the two a poor match [the obvious answer seems to be that traditional file-systems could just write to whatever memory was available, but if they did this with flash storage, certain chunks of memory would become unusable before others and the memory would be more difficult to work with. Also, disk-based file systems need to deal with seek times, which means that they want to organize their data in such a way as to reduce those (by putting related things together?) - with flash, this isn't really a problem and thus there is one less constraint to be concerned with.]

3. How a log based file-system works and why this method of operation is so well suited to working with flash memory especially in light of the latter's inherent limitations [...]

At this time, the plan is that Geoff will work on #3 today, Andrew will work on #1 tomorrow and I will work on #2 tomorrow. The three of us will make an effort to consult some somewhat more painfully technical literature in order to gain insight into our respective queries. Whatever insight we find will be posted here.

Then, we will meet again on Thursday after class to decide how to actually write the essay.

PS, if there is anybody in the group besides the three of us - let us know so you can find a way to contribute to this... as at least two of us are competent essayists, painfully technical research on one or more of the above topics would be a great way to contribute... especially if you could post it here prior to one of us going over the same thing.

Fedor

-- I'm not that great (but absolutely horrid) at essays and I'm alright at research, but if nothing else I have Thursday off and nothing (else) that needs doing by Friday so I can probably spend a bunch of time working on it just before it's due. -- Nick L

-- Hey, sorry I was unable to attend the meeting after class today. I am not too good at writing essays either, but I am pretty good at summarizing and researching. I am not too sure what you would like me to do. Right now I'll assume you need me to research/summarize articles for the 3 topics above. If you need me to do anything else, post it here. I'll be checking the discussion regularly until this is due. Once again, sorry for missing the meeting -- Paul Cox.

-- Hey i'm also supposed to be in on this. Sorry i couldn't contribute sooner because i was playing catchup in my other classes. Let me know what i can do and i'll be on it asap. - kirill (k.kashigin@gmail.com) update: i'm gonna be helping Fedor with #2

PS, this article http://docs.google.com/viewer?a=v&q=cache:E7-H_pv_18wJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.92.2279%26rep%3Drep1%26type%3Dpdf+flash+memory+and+disk-based+file+systems&hl=en&gl=ca&pid=bl&srcid=ADGEESgspy-jqIdLOpaLYlPPoM56kjLPwXcL3_eMbTTBRkI7PG0jQKl9vIieTAYHubPu0EdQ0V4ccaf_p0S_SnqKMirSIM0Qoq5E0NpLd0M7LAGaE51wkD0F55cRSkX8dnTqx_9Yx2E7&sig=AHIEtbS-yfGI9Y48DJ0WyEEhmsXInelRGw looks really useful for part 3.

---same article as above but shorter link: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.160.5142

PPS, and this article looks really great for understanding how log based file systems work: http://delivery.acm.org/10.1145/150000/146943/p26-rosenblum.pdf?key1=146943&key2=3656986821&coll=GUIDE&dl=GUIDE&CFID=108397378&CFTOKEN=72657973



Hey Luc (TA) here, Anandtech ran a series of articles on solid state drives that you guys might find useful. It mostly looked at hardware aspects but it gives some interesting insights on how to modify file systems to better support flash memory.

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403

http://www.anandtech.com/storage/showdoc.aspx?i=3531&p=1

http://anandtech.com/storage/showdoc.aspx?i=3631



--3maisons 19:44, 12 October 2010 (UTC)

Hey Paul&Kirill,

If one of you guys could help me out with #2, that would be really great. I was going to work on that tomorrow, but I also have another large assignment to deal with and not having to do this research would greatly ease my life. Moreover, I do intend to work on writing & polishing the essay on Thursday, as I have a lot of experience with that and like it far more than research. Let me know if either one of you can help me with this.

The other person could probably read over what Luc posted for us and see if it fits into our framework. Just be sure to state who is going to do what.

Nick,

Honestly, we really hope to have the research done by Thursday. If that is the only day that you are free and you're not a writer, I'm honestly not sure what you could do. Perhaps someone else can think of something.

- Fedor


I'm gonna have something for #2 up tonight. -kirill

So I found this article on Reddit, posted from Linux Weekly News on pretty much exactly what we are looking at. It's entitled "Solid-state storage devices and the block layer"

http://lwn.net/SubscriberLink/408428/68fa8465da45967a/ --Gsmith6 20:36, 13 October 2010 (UTC)

I wasn't exactly sure how much information I was supposed to present, but here's what I got for #2:

Most conventional file systems are designed to be implemented on hard disk drives. This does not mean they cannot be implemented on a solid state drive (file storage that uses flash memory instead of magnetic disks). It would, however, in many ways defeat the purpose of using flash memory. The most time-consuming process for an HDD is seeking data by relocating the read head and spinning the magnetic disk. A traditional file system optimizes the way it stores data by placing related blocks close together on the disk to minimize mechanical movement within the HDD. One of the great advantages of flash memory, which accounts for its fast read speed, is that there is no need to seek data physically, so there is no need to waste resources laying out the data in close proximity. A traditional HDD file system will also attempt to defragment itself, moving blocks of data around for closer proximity on the magnetic disk. This process, although beneficial for HDDs, is harmful and inefficient for flash-based storage.

A flash-optimal file system needs to reduce the number of erase operations, since flash memory has only a limited number of erase cycles as well as very slow erase speeds. When an HDD rewrites data to a physical location there is no need for it to erase the previous data first, so a traditional disk-based file system doesn't worry about erasing data from unused blocks. In contrast, flash memory needs to erase a block before it can modify any of its contents. Since the erase procedure is extremely slow, it's not practical to overwrite old data in place every time. It is also detrimental to the life span of flash memory. To maximize the potential of flash-based memory, the file system would have to write new data to empty blocks. This method would also call for some sort of garbage collection to erase unused blocks when the system is idle, which conventional file systems do not implement since they do not need it.
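A rough sketch of the "write new data to empty blocks, garbage-collect later" idea, in Python. Everything here (`ToyFlash`, `PAGES_PER_BLOCK`, the three page states) is hypothetical and only for illustration; it is not taken from any of the papers linked on this page. To keep it short, the collector only reclaims blocks whose pages are all stale; a real one would also copy live pages out of mostly-stale blocks.

```python
FREE, LIVE, INVALID = "free", "live", "invalid"
PAGES_PER_BLOCK = 4  # assumed; real devices have far more pages per block


class ToyFlash:
    def __init__(self, num_blocks=3):
        self.state = [[FREE] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.data = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}   # logical page -> (block, page)
        self.erases = 0

    def _find_free(self):
        for b, pages in enumerate(self.state):
            for p, s in enumerate(pages):
                if s == FREE:
                    return b, p
        raise RuntimeError("no free pages; run collect_garbage() first")

    def write(self, logical, value):
        # Never overwrite in place: mark the old copy invalid, write elsewhere.
        if logical in self.mapping:
            old_b, old_p = self.mapping[logical]
            self.state[old_b][old_p] = INVALID
        b, p = self._find_free()
        self.state[b][p] = LIVE
        self.data[b][p] = value
        self.mapping[logical] = (b, p)

    def read(self, logical):
        b, p = self.mapping[logical]
        return self.data[b][p]

    def collect_garbage(self):
        # Meant to run when idle: erase blocks that hold only stale pages.
        for b, pages in enumerate(self.state):
            if INVALID in pages and LIVE not in pages:
                self.state[b] = [FREE] * PAGES_PER_BLOCK
                self.data[b] = [None] * PAGES_PER_BLOCK
                self.erases += 1


flash = ToyFlash()
for version in range(5):
    flash.write(0, f"v{version}")   # five updates to the same logical page
erases_before_gc = flash.erases     # no erase was needed on the write path
flash.collect_garbage()
```

Five updates to the same logical page cost no erases while writing; the one erase happens later, in `collect_garbage()`, once the first block holds nothing but stale copies.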

--kirill


So Fedor and I were talking in the labs, and we came to the conclusion that we have been focusing on just the translation from a regular file system to a flash drive. We were under the impression that this was in fact the "Flash Optimized System", but pulling up some more articles, I'm finding that this is not necessarily the case.

This paper here http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.128.6156&rep=rep1&type=pdf shows an example from Axis Communications where they developed a file system specifically designed to be used on flash drives.

Now I haven't completely read it, so it might just be an optimized translational system, but it's at least a start.

At our meeting today, we've decided that it would be best if people could post a rough summary of their notes in the appropriate sections, and I will rewrite them into an essay, which Fedor will go through later tonight to edit and add some more information.

Paul: I missed your comment that you weren't that great at writing, and good at research. If you want some articles behind the pay-walls, I've saved a bunch of them and emailed them to myself. Just email me (at the address at the top of the page) and I'll be more than happy to send some your way.

PS. some more references

Design tradeoffs for SSD performance http://portal.acm.org/citation.cfm?id=1404014.1404019

A log buffer-based flash translation layer using fully-associative sector translation http://delivery.acm.org/10.1145/1280000/1275990/a18-lee.pdf?key1=1275990&key2=0709607821&coll=GUIDE&dl=GUIDE&CFID=105787273&CFTOKEN=74601780

--Gsmith6 15:03, 14 October 2010 (UTC)

I don't have any notes on this computer. >: I will be adding more to my section later on tonight. Sorry. ~Andrew

Hello dudes,

Just a quick note, try to include citations in your paragraphs - each time that you make a claim which came from evidence, put a little number [X, pp. page-number (if applicable)] into your text. Then, put the same [X] at the bottom of the page with the bibliographical information about the source. The prof hasn't yet gotten back to me about his preferred citation format, so just stick with this one for now:

Authors. Title. Web-page. Date of article. Web (the word). Date you accessed it.

Here's an example:

[1] Kawaguchi, Nishioka, Motoda. A Flash-Memory Based File System. http://docs.google.com/viewer?a=v&q=cache:E7-H_pv_18wJ:citeseerx.ist.psu.edu/viewdoc/download%3Fdoi%3D10.1.1.92.2279%26rep%3Drep1%26type%3Dpdf+flash+memory+and+disk-based+file+systems&hl=en&gl=ca&pid=bl&srcid=ADGEESgspy-jqIdLOpaLYlPPoM56kjLPwXcL3_eMbTTBRkI7PG0jQKl9vIieTAYHubPu0EdQ0V4ccaf_p0S_SnqKMirSIM0Qoq5E0NpLd0M7LAGaE51wkD0F55cRSkX8dnTqx_9Yx2E7&sig=AHIEtbS-yfGI9Y48DJ0WyEEhmsXInelRGw . 1995. Web. Oct. 14, 2010.

Fedor

PS, it's a good idea to check this fairly frequently between now and tomorrow morning - you never know when something will come up.


Phew... for a while there I was starting to think that I had nothing about the actual "Log-Based System", but it turns out that the "Translation Layer" is the same thing. It looks like some articles are calling it the log system, while others are calling it the translation layer. Pretty sure I'm going to have an expert's knowledge about flash drives after reading all these articles :P --Gsmith6 18:13, 14 October 2010 (UTC)



Hey Geoff, this is what I've got so far after reading a couple of the PDFs. The double-tabbed points are just my annotations on how they relate to the question.


Wear leveling: p1126-chang.pdf

  • Uneven wearing of flash memory due to storing data close together
  • Garbage collection prefers that no blocks have pages that have data that is constantly becoming invalid
  • Data that remains the same for long periods of time should be moved out of blocks that have not been erased much and into blocks that have been erased frequently.
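The wear-leveling notes above can be sketched roughly as follows. This is only an illustration of the general idea, not the algorithm from the Chang paper; the function names, the `threshold` value, and the sample erase counts are all made up.

```python
def pick_block_for_write(erase_counts, free_blocks):
    """Dynamic wear leveling: put new data on the least-worn free block."""
    return min(free_blocks, key=lambda b: erase_counts[b])


def pick_cold_swap(erase_counts, cold_blocks, free_blocks, threshold=100):
    """Static wear leveling: if a block holding unchanging (cold) data is much
    less worn than the most-worn free block, return (source, destination) so
    the cold data can be parked on the worn block. Returns None when wear is
    already even enough. The threshold is a hypothetical tuning knob."""
    if not cold_blocks or not free_blocks:
        return None
    src = min(cold_blocks, key=lambda b: erase_counts[b])
    dst = max(free_blocks, key=lambda b: erase_counts[b])
    if erase_counts[dst] - erase_counts[src] > threshold:
        return src, dst
    return None


# Sample state: block number -> erase count so far (invented numbers).
erase_counts = {0: 5, 1: 900, 2: 40, 3: 870}
target = pick_block_for_write(erase_counts, free_blocks=[1, 2, 3])
swap = pick_cold_swap(erase_counts, cold_blocks=[0], free_blocks=[1, 3])
```

With these numbers, new data goes to block 2 (the least-worn free block), and the cold data sitting on barely-used block 0 gets moved onto heavily-worn block 1, which frees block 0 to absorb future writes.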


Log file structure: 926-rosenblum.pdf

  • LFS is based on the assumption that frequently read files will be stored in cache and that hard disk traffic will be dominated by writes
  • Writes all new info to disk in a sequential structure called a log
  • Data is stored permanently in the log; no other structure is kept on the hard drive
  • Converts many small random synchronous writes to a large asynchronous sequential write
    • Good for flash because it cuts down on writing (prolongs drive life)
    • It is also good because it writes a bigger section than a page. This means it can fill a block at a time, so it doesn't fill up other blocks with random writes that would later need to be cleaned. Cuts down on cleaning.
  • Inodes are stored in the log on the disk, while an inode map maintained in memory points to each inode's location on the hard disk.
    • This is good for flash drives because reading does not hurt the drives life and it is fast.
    • This means the map will not have to be updated on the disk as frequently cutting down on the writes.
  • The log system's weakness is that it is susceptible to becoming fragmented due to the larger writes.
    • Since flash drives do not require defragmentation this is fine. Also, since flash drives have very fast random access, the system does not become bogged down when the logs are fragmented
  • The log system implements a cleaner that reads a segment in and checks how much live data is in it. If the percentage of invalid data exceeds what the cleaning policy allows, the segment will be cleaned: all the live data will be copied out and the segment will be erased.
    • This is a garbage collector, but it's built into the file system.
  • Segments contain a number of blocks. Segments can contain logs or parts of logs.
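A minimal sketch of the log-structured ideas in the notes above: small writes are buffered and flushed as one sequential segment, an in-memory inode map tracks the newest copy of each file, and segment utilization tells a cleaner what to reclaim. `ToyLog` and its fields are hypothetical and heavily simplified compared to the LFS described in the Rosenblum paper (no real inodes, no checkpointing, no actual cleaner).

```python
class ToyLog:
    def __init__(self, segment_size=4):
        self.segment_size = segment_size   # entries per segment (assumed)
        self.segments = []                 # the on-"disk" log of written segments
        self.buffer = []                   # pending writes held in memory
        self.inode_map = {}                # in memory: file id -> (segment, slot)

    def write(self, file_id, data):
        # Small writes just accumulate; nothing touches the disk yet.
        self.buffer.append((file_id, data))
        if len(self.buffer) == self.segment_size:
            self.flush()

    def flush(self):
        # One large sequential write of the whole buffer to the end of the log.
        if not self.buffer:
            return
        seg_no = len(self.segments)
        self.segments.append(list(self.buffer))
        for slot, (file_id, _) in enumerate(self.buffer):
            self.inode_map[file_id] = (seg_no, slot)   # newest copy wins
        self.buffer = []

    def read(self, file_id):
        for fid, data in reversed(self.buffer):        # unflushed data is newest
            if fid == file_id:
                return data
        seg_no, slot = self.inode_map[file_id]
        return self.segments[seg_no][slot][1]

    def utilization(self, seg_no):
        # Fraction of a segment that is still live; a cleaner would pick
        # low-utilization segments, copy the live entries out, and erase them.
        seg = self.segments[seg_no]
        live = sum(1 for slot, (fid, _) in enumerate(seg)
                   if self.inode_map.get(fid) == (seg_no, slot))
        return live / len(seg)


log = ToyLog()
for i in range(4):
    log.write("a", f"a{i}")   # four small writes become one segment write
log.write("b", "b0")
log.flush()
```

After this run the log holds two segments. The first is only 25% live (three of its four entries are stale versions of file "a"), which makes it the obvious candidate for cleaning.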


I'm still sifting through the other PDFs you sent me. Would you like me to post more info, or should I format this into a paragraph or two?

--Paul

I'm sorting through my notes and putting them into a digital format. It's difficult because it's hard to remember what I copied directly out of the sources for my own reference, so I'm trying to de-plagiarize. I'll upload what I have to the section by 20:00, but expect more updates. ~Andrew