Memory Management Talk by Rik van Riel


We are very pleased to present to you today Rik van Riel.

He is a kernel hacker working on memory management.

Currently he is working at Conectiva S.A. in Brazil. As all of you know,

it is a big Linux company from South America.

Apart from kernel hacking, he also runs the Linux-MM website and the

#kernelnewbies IRC channel on openprojects.net.

You can find more about him at:

www.surriel.com (there you can find, among

other things, the slides of this talk at:

http://www.surriel.com/lectures/mmtour.html)

He will talk here about memory management, but his other interests

are: high availability, filesystems and various other things ...

The talk will be here, in the #linux channel. Mr. Riel suggested that we make

another channel (#qc -> questions channel) to write questions during the talk.

Should you have any questions, comments, etc., just write them in #qc

and Mr. Riel will reply.

Thank you to Mr. Riel for coming here, and also to all of you.

The title of his talk is:

Too little, too slow; memory management

Mr. Riel ...

I guess it's time to begin ...

OK, welcome everybody.

Today I will be giving a talk about Linux memory management.

The slides are at

http://www.surriel.com/lectures/mmtour.html

We will begin with some of the slides introducing memory management and explaining why we need memory management.

If you have any questions about my talk, you can ask them in #qc. In #qc, you can also discuss with each other the things I talk about.

This channel (#linux) is meant to be completely silent ... except for me of course ;)

... (page 1) ...

Let me begin by telling a little bit about what I am doing at the moment.

Conectiva is paying me to work on improving the Linux kernel full-time.

This means that I am working for Linus Torvalds and Alan Cox, but Conectiva is paying me ;) [thanks Conectiva :)]

Now I'll move on to the real talk ... (page 2)

[for the new people ...

http://www.surriel.com/lectures/mmtour.html for the slides]

OK, I will begin by explaining about memory management.

Most of the introduction I will skip,

but I will tell a few things about the memory hierarchy and about page faults and page replacement.

Let's start with the picture on (page 3).

This picture represents the "memory hierarchy".

Every computer has several kinds of memory:

very fast and very small memory,

and very big but very slow memory.

Fast memory can be the registers or L1 CPU cache.

Slow memory can be L2 cache or RAM.

And then you have the hard disk, which is really, really extremely slow ;)

When you see this, you may ask yourself the question "But why doesn't my machine have only fast memory?"

or "Why don't we just run everything from fast memory?"

The reason for this is that it is impossible to make very big fast memory, and even if it was possible, it would simply be too expensive.

And you cannot run everything from fast memory because programs are simply too big.

Now we've talked about "fast" and "slow" memory ... (page 4) tells us about different kinds of speeds.

You have "latency" and "throughput".

OK, good to have everybody back.

If you look at (page 6) you can see how ridiculously slow some memory things are.

OK, let's go to (page 7).

"Latency" == "if I ask for something, how long do I have to wait until I get the answer"

"Throughput" == "how much data can I get per minute"

I think we do not have time to look at the L1 and L2 cache things,

so let's move on to the RAM management

on (page 14).

RAM is the slowest electronic memory in a computer.

It is often 100 times slower than the CPU core (in latency).

This is very, very slow.

But when you see how much slower than RAM the disk is (in latency), suddenly memory looks fast again ...

This enormous difference in speed makes it very important that you have the data you need in memory.

If you do not have the data you need in RAM, you need to wait very long (often more than 5 million CPU cycles) before your data is there and your program can continue.

On the other hand, everybody knows that you never have enough memory ;)

So the system has to choose which pages to keep in memory (or which pages to read from disk) and which pages to throw away (swap out).

OK, let's try this again ;)

The ping timeout probably lost my last 3 minutes of the talk.

So let's move on to the RAM management

on (page 14).

RAM is the slowest electronic memory in a computer.

It is often 100 times slower than the CPU core (in latency).

This is very, very slow.

But when you see how much slower than RAM the disk is (in latency), suddenly memory looks fast again ...

<- Sadie has quit (Ping timeout for Sadie [Orka.go2.pl])

This enormous difference in speed makes it very important that you have the data you need in memory.

But as we all know, no computer ever has enough memory ...

And the speed difference is really big ... this means that the system has to choose very carefully what data it keeps in RAM and what data it throws away (swaps out).

Let's move on to page 18.

OK, if a page of a process is not in memory (but the process wants it), the CPU will give an error and abort the program.

The operating system (OS) gets the job of fixing this error and letting the program continue. This trap is called a "page fault".

The OS fixes the job by getting a free page, putting the right data in that page and giving the page to the program.

After this the process continues just like nothing happened.

The only big problem is that such a page fault easily takes 5 _million_ CPU cycles.

So you want to make sure you have as few page faults as possible.
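
A rough sketch of why even rare page faults matter. The 5 million cycle figure is the one quoted in the talk; the 100-cycle RAM access is an assumption derived from the "100 times slower than the CPU core" remark, and the function name is made up for this illustration.

```python
# Illustrative numbers only: RAM_ACCESS_CYCLES is an assumption,
# PAGE_FAULT_CYCLES is the figure quoted in the talk.
RAM_ACCESS_CYCLES = 100
PAGE_FAULT_CYCLES = 5_000_000

def average_access_cycles(fault_rate):
    """Average cycles per memory access if `fault_rate` of all accesses page-fault."""
    return (1 - fault_rate) * RAM_ACCESS_CYCLES + fault_rate * PAGE_FAULT_CYCLES

for rate in (0.0, 1e-6, 1e-4):
    print(f"fault rate {rate:g}: {average_access_cycles(rate):.1f} cycles per access")
```

With these assumed numbers, one fault per 10,000 accesses already makes the average access about six times slower than plain RAM.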

The other problem is that you only have a "little bit" of memory in your machine

and you run out of free memory very fast.

At this point, the OS needs to choose which data it keeps in memory and which data it swaps out.

..... let's move to (page 19) of

http://www.surriel.com/lectures/mmtour.html ....

The "perfect" thing to do is to throw away (swap out) that data which will not be needed again for the longest time.

That way you have the longest time between page faults and the minimum number of page faults per minute ... so the best system performance.

The only problem with this method is that you need to look into the future,

and this isn't really possible ... ;)

So we have to come up with other ideas that approximate this idea.

......

One idea is LRU ... we swap out the page which has not been used for the longest time.

The idea is: "if a page has not been used for 30 minutes, I can be pretty sure I will not use it again in the next 5 seconds".

Which really makes a lot of sense in most situations. Unfortunately, there are a few (very common) cases where LRU does the exact wrong thing.

Take for example a system where somebody is burning a CD.

To burn a CD at 8-speed, you will be "streaming" your data at 1.2 MB per second.

At this speed, it will take just 30 seconds on your 64 MB workstation before your mail reader is "older" than the old data from the CD write program,

and your system will swap out the mail reader,

which is the exact wrong thing to do.

Because most likely you will use your mail reader again before you use the CD image data again.

LFU would avoid this situation.

LFU swaps out the page which has been used least often.

So it sees that the mail reader is being used all the time (pages used 400 times in the last 2 minutes), while the CD image has only been used one time (read from disk, burn to CD and forget about it),

and LFU would nicely throw out the CD image data that has already been used

and keep the mail reader in memory.

In this situation LFU is almost perfect.

... now we take another example ;)

If we look at GCC, you will see that it consists of 3 parts:

a preprocessor (cpp), a compiler (cc1) and an assembler (as).

Suppose you only have memory for one of these at a time.

cpp was running just fine and used its memory 400 times in the last minute.

Now it is cc1's turn to do work,

but cc1 does not fit in memory at the same time as cpp,

and LFU will swap out parts of cc1 because cpp used its memory a lot ... and does the exact wrong thing ... cpp _stopped doing work_ a second ago

and cc1 is now the important process to keep in memory.

... this means that both LRU and LFU are good for some situations, but really bad for other situations.
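
The two examples above can be replayed with a toy simulation. This is only a sketch: the page names, cache sizes and trace lengths are invented for the illustration, and the LFU variant here forgets a page's use count once the page is evicted, which is a simplifying assumption.

```python
from collections import OrderedDict

def count_faults(trace, capacity, policy):
    """Count page faults for `trace` with `capacity` page frames under 'lru' or 'lfu'."""
    resident = OrderedDict()   # page -> use count, least recently used first
    faults = 0
    for page in trace:
        if page in resident:
            resident[page] += 1
            resident.move_to_end(page)        # most recently used goes last
            continue
        faults += 1
        if len(resident) >= capacity:
            if policy == "lru":
                victim = next(iter(resident))             # least recently used
            else:
                victim = min(resident, key=resident.get)  # least frequently used
            del resident[victim]
        resident[page] = 1
    return faults

# Mail reader used a lot, then a one-pass CD image stream, then the mail reader again.
cd_burn = ["mail0", "mail1"] * 50 + [f"cd{i}" for i in range(20)] + ["mail0", "mail1"]
# cpp used its pages 400 times each and stopped; now cc1 needs the memory.
gcc_run = ["cpp0", "cpp1"] * 400 + ["cc1_0", "cc1_1"] * 100

for name, trace, capacity in (("CD burn", cd_burn, 4), ("gcc", gcc_run, 2)):
    for policy in ("lru", "lfu"):
        print(name, policy, count_faults(trace, capacity, policy))
```

In this toy setup LRU takes two extra faults in the CD case because the stream pushed the mail reader out, while in the gcc case LFU faults on every single cc1 access because it refuses to give up the stale cpp page.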

I got a question if LRU or LFU is better ... the answer is none of them ;)

What we really want is something that has the good parts of both LRU and LFU but not the bad parts.

Luckily we have such a solution ... page aging (on page 20).

Page aging is really simple.

The system scans over all of memory, and each page has an "age" (just points).

If the page has been used since we scanned the page last, we increase the page age.

RIEL: how do you measure when a page has last been "used"?

... umm yes ... I almost forgot about that part ;))

--- when a page is being used, the CPU sets a special bit, the "accessed bit", on the page (or in the page table)

--- and we only have to look at this bit to see if the page was used

--- and after we look at the bit, we clear it, so it will be set again if the page is used again after we scan

So back to page aging now.

If the page was used since we last scanned it, we make the page age bigger.

If the page was not used, we make the page age smaller,

and when the page age reaches zero, the page is a candidate for swapout ... we remove the data and use the memory for something else.

Now there are different ways of making the page age bigger and smaller. For making it bigger, we just add a magic number to the page age ...

For making it smaller, we can do multiple things.

If we subtract a magic number (page->age -= 1), we will be close to LFU.

If we divide the page age by 2 (page->age /= 2), we will be close to LRU.

To be honest, I have absolutely no idea which of the two would work best,

or if we want system administrators to select this themselves, depending on what the system is doing.
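
A minimal sketch of one aging scan, to make the two decay strategies concrete. The constants AGE_BOOST and AGE_MAX and the dictionary layout are hypothetical and chosen only for this illustration; this is not the kernel's code.

```python
AGE_BOOST = 3   # hypothetical "magic number" added when a page was used
AGE_MAX = 20    # hypothetical upper limit on the age

def scan(pages, decay):
    """One aging pass over `pages` (name -> {"age": int, "accessed": bool}).

    Returns the names of pages whose age reached zero (swapout candidates)."""
    candidates = []
    for name, page in pages.items():
        if page["accessed"]:
            page["age"] = min(page["age"] + AGE_BOOST, AGE_MAX)
            page["accessed"] = False     # clear the bit so the next scan sees new use only
        elif decay == "subtract":
            page["age"] = max(page["age"] - 1, 0)   # page->age -= 1, closer to LFU
        else:
            page["age"] //= 2                       # page->age /= 2, closer to LRU
        if page["age"] == 0:
            candidates.append(name)
    return candidates

def scans_until_evictable(start_age, decay):
    """How many scans an unused page survives before it becomes a candidate."""
    pages = {"p": {"age": start_age, "accessed": False}}
    scans = 0
    while True:
        scans += 1
        if scan(pages, decay):
            return scans

print(scans_until_evictable(16, "subtract"), scans_until_evictable(16, "halve"))
```

A page with age 16 that is no longer used survives 16 scans with subtraction but only 5 with halving, which is why halving forgets past use quickly (like LRU) and subtraction remembers it for a long time (like LFU).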

Page aging is used by Linux 2.0, FreeBSD and Linux 2.4.

Somebody thought it would be a good idea to remove page aging in Linux 2.2, but it turned out not to work very well ...

... so we put it back for Linux 2.4 ;))

And another question: RIEL: what is Linux using?

Horape: Linux 2.0 uses the "page->age -= 1" strategy

Horape: and Linux 2.4 uses the "page->age /= 2" strategy

Maybe the first strategy is better, maybe the second strategy is better.

If we have any volunteers who want to test this, talk to me after the lecture ;))

I will now go on to (page 21) and talk about drop-behind.

Most of you have probably heard about read-ahead,

where the system tries to read in data from a file *before* the program which uses the file needs it.

This sounds difficult, but if the program is just reading the file from beginning to end it is quite simple.

One problem is that this linear read will quickly fill up all memory if it is very fast, and you do not want that, because you also have other things you want to do with your memory.

The solution is to put all the pages _behind_ where the program has been reading on the list of pages we would swap out next (the inactive list).

So in front of where the program is now, you read in all the data the program needs next (very friendly for the program)

and in exchange for that, you remove the data the program will probably not need any more.

Of course you can make mistakes here, but if you get it right 90% of the time it is still good for performance ... you do not need to be perfect.
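
A small sketch of read-ahead combined with drop-behind for a purely linear read. The window size READAHEAD and the function name are hypothetical; the point is only that memory use stays bounded by the window instead of growing with the file.

```python
READAHEAD = 4   # hypothetical read-ahead window, in pages

def linear_read(file_pages):
    """Simulate a linear read with read-ahead and drop-behind.

    Returns (active, inactive, peak): the pages still active at the end, the
    inactive list in the order pages were dropped, and the largest number of
    active pages seen after any step."""
    active, inactive, peak = set(), [], 0
    for pos in range(file_pages):
        # Read-ahead: make sure the next few pages are already in memory.
        for ahead in range(pos, min(pos + READAHEAD, file_pages)):
            active.add(ahead)
        # Drop-behind: the page the reader just left goes to the inactive list,
        # where it is first in line to be reused for something else.
        if pos > 0:
            active.discard(pos - 1)
            inactive.append(pos - 1)
        peak = max(peak, len(active))
    return active, inactive, peak

active, inactive, peak = linear_read(100)
print(sorted(active), len(inactive), peak)
```

For a 100-page file the reader never holds more than READAHEAD pages in the active set, and everything it has already passed is waiting on the inactive list.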

... from the part about hard disks, I will skip almost everything

... only (page 23) I will discuss today.

As you probably know, hard disks are really strange devices.

They consist of a bunch of metal (or glass) plates with a magnetic coating on them, which spin around at ridiculously high speeds,

and there is a read-write arm which can seek across the disk at very low speeds.

The consequences of this design are that hard disks have a high throughput ... 20 MB/second is quite normal today.

This is fast enough to keep a modern CPU busy.

On the other hand, if you need some piece of data, your CPU will have to wait for 5 _million_ CPU cycles.

So hard disks are much too slow if you're not reading the disk from beginning to end.

This means that so-called "linear reads" are very fast,

while "random access" is extremely slow. You should not be surprised if 90% of the data is in linear reads, but hard disks spend 95% of their time doing random disk IO,

because the linear IO is so fast the disk can do it in almost no time ;)
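
A back-of-the-envelope check of that claim. Only the 20 MB/second throughput comes from the talk; the 10 ms cost per random access and the 4 KB request size are assumptions made for this sketch, so the exact percentage will differ on real hardware.

```python
THROUGHPUT_MB_S = 20.0    # from the talk: linear throughput
SEEK_SECONDS = 0.010      # assumption: roughly 10 ms per random access
RANDOM_REQUEST_KB = 4     # assumption: one small request per random access

def time_split(total_mb, linear_fraction):
    """Return (linear_seconds, random_seconds) for a mixed workload."""
    linear_mb = total_mb * linear_fraction
    random_mb = total_mb - linear_mb
    linear_seconds = linear_mb / THROUGHPUT_MB_S
    random_requests = random_mb * 1024 / RANDOM_REQUEST_KB
    random_seconds = random_requests * SEEK_SECONDS   # transfer time ignored
    return linear_seconds, random_seconds

linear_s, random_s = time_split(total_mb=100, linear_fraction=0.9)
share = random_s / (linear_s + random_s)
print(f"linear {linear_s:.1f} s, random {random_s:.1f} s, random share {share:.0%}")
```

Under these assumptions the 10% of the data that is read randomly takes roughly 85% of the disk time; a slower seek or smaller requests push that share higher.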

The normal optimisation for this is "IO clustering", where the OS reads (or writes) as much data in one place of the disk as possible.

The "as possible" cannot be too large, however ...

If you have "only" 64 MB RAM in your machine, you probably do not want to do readahead in 2 MB pieces,

because that will throw other useful data out of memory, which you will need to read in again later, etc ...

So it is good to read in a small part of data, but it is also good to read in very big parts of data ... and the OS will have to decide on some good value all by itself.

Linux has some auto-tuning readahead code for this situation (in mm/filemap.c::generic_file_readahead(), for the interested) but that code still needs some work to make it better.

And of course, another way to make "disk accesses" fast is to make sure you do not access the disk.

You can do this if the data you need is already (or still) in memory.

Linux uses all "extra" memory as a disk cache in the hope that it can avoid disk reads,

and most other good operating systems do the same (FreeBSD for example).

... now I will go on with Linux memory management

... on (page 28) and further

I will explain how memory management in Linux 2.2 chooses which pages to swap out, what is wrong with that and how we fix the situation in Linux 2.4, and also the things that are still wrong in Linux 2.4 and need to be fixed later ;)

OK, another question: RIEL: to keep data in memory, have you ever thought about compressed data in memory?

--- this is a good idea in some circumstances

--- research has shown that compressed cache means that some systems do less disk IO and are faster

--- on the other hand, for some other systems it makes the system slower because of the overhead of compression

--- it really depends on what you do with your system if the "compressed cache" trick is worth it or not

--- and it would be interesting to see as an option on Linux, since it is really useful for some special systems

--- for example, systems which do not have swap

... OK, let's move on to (page 31)

Linux 2.2 swapout code is really simple

(at least, that's the idea).

The main function is do_try_to_free_pages().

This function calls shrink_mmap(), swap_out() and a few other - less important - functions.

shrink_mmap() simply scans all of memory and will throw away (swap out) all cache pages which were not used since the last time we scanned,

and swap_out() scans the memory of all programs and swaps out every program page which was not used since the last time we scanned it.

... (page 32)

This is a really simple system which works well if the system load is not too high, but as soon as the load gets higher, it can completely break down for several reasons.

If, for example, the load on the system is very variable, we get problems.

If you have enough memory for 30 minutes and all memory has been used in those 30 minutes, then after 30 minutes _every_ page has been used since the last time we scanned (30 minutes ago),

and then something happens in the system (Netscape gets started).

But the OS has no idea which page to swap out, since all pages were used in the last 30 minutes, when we scanned last.

In that situation, the OS usually swaps out the 'wrong' pages,

and those wrong pages are needed again 5 milliseconds later,

which makes the OS swap out *other* wrong pages again, until everything settles down.

So every time the system load increases, you have a period where the system is really slow and has to adjust to the load ...

Another problem is that (in shrink_mmap) we scan and swap out pages from the same function.

This breaks down when we have a very high load on the system and a lot of the pages we want to swap out need to be written to disk first.

shrink_mmap() will scan every page in memory and start disk IO for the pages that need to be written to disk.

After that it will start scanning at the beginning again,

and no page it sees has been used since the last time we scanned it, since kswapd was the only thing running.

At this point the system -again- starts swapping out the wrong pages.

A question: is this the do_try_to_free_pages() printk we hear so much about on LKML?

--- this printk is called when do_try_to_free_pages() cannot find pages to swap out

--- not when do_try_to_free_pages() swaps out the wrong pages by accident

--- so these things are not the same

... let's move on to (page 33) and see how we fix these problems in Linux 2.4

The two big changes for Linux 2.4 are page aging and the separation of page aging and page writeback to disk.

Page aging means we are more precise in choosing which page we swap out, so we will have a better chance of having the pages we need in memory,

and the system will perform better when memory is getting full.

The separation of page aging and page flushing means that we will not swap out the wrong page just because the right page still needs to be written to disk and we cannot use it for something else yet.

... on (page 35) I will explain about the memory queues we have in Linux 2.4

We have 3 "types" of pages in Linux 2.4:

active pages, inactive_dirty pages and inactive_clean pages.

We do page aging on the active pages,

and the inactive_dirty and inactive_clean pages are simply sitting there waiting to be used for something else.

... now we go back to (page 34)

Having more inactive pages means that the system is better able to deal with big allocations and spikes in system load.

However, moving pages from the active to the inactive list and back is a lot of overhead,

so having fewer inactive pages is also good ... the solution Linux 2.4 takes is to see how much memory is being reclaimed for other purposes each second (averaged over a minute) and simply keep 1 second of allocations in the inactive queues.
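
One way to picture that "1 second of allocations" target is a simple one-minute window of per-second samples. This is only a sketch of the idea: the class name, the sliding-window average and the sample values are assumptions made for the illustration, not a description of the actual Linux 2.4 code.

```python
from collections import deque

class InactiveTarget:
    """Track pages reclaimed per second over a one-minute window (hypothetical helper)."""

    def __init__(self, window_seconds=60):
        self.samples = deque(maxlen=window_seconds)   # oldest samples fall off

    def record_second(self, pages_reclaimed):
        """Record how many pages were reclaimed for other purposes in the last second."""
        self.samples.append(pages_reclaimed)

    def target(self):
        """Pages to keep on the inactive queues: one second of average demand."""
        if not self.samples:
            return 0
        return sum(self.samples) // len(self.samples)

tracker = InactiveTarget()
for _ in range(30):
    tracker.record_second(100)    # quiet half minute
for _ in range(30):
    tracker.record_second(300)    # busier half minute
print(tracker.target())           # average demand over the last minute
```

Averaging over a minute keeps a short burst from suddenly inflating the inactive queues, while a sustained change in load still moves the target within a minute.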

I am pretty certain we can do this better for Linux 2.5, but nobody has had time yet to research this ...

What Linux 2.4 also does is some light background scanning.

Every minute or so all the cache memory is scanned,

and when the page->age of a page in the cache reaches 0, it will be moved to the inactive list.

So when the system gets a burst of activity again after 30 minutes, the system knows exactly which pages to keep in memory.

This fixes the biggest problems we have with the Linux 2.2 VM.

... because we have little time left, I will now go to the out of memory (OOM) killer on (page 43),

which will be the last part of the lecture; after this you can ask questions ;)

OK, the OOM killer.

When memory *and* swap are full, there is not much you can do.

In fact, you can either sit there and wait until a program goes away, or you can kill a program and hope the system goes on running.

In Linux, the system always kills a process.

Linux 2.2 kills the process which is currently doing an allocation, which is very bad if it happens to be syslog or init.

Linux 2.4 tries to be smart and select a "good" process to kill.

For this, it looks at the size of the process (so killing 1 process gets us all the memory back we need)

but also at whether it is a root process or whether the process has direct hardware access (it is very bad to kill these programs), and at the amount of time the process has been running and the CPU time it has used,

because it is better to kill a 5-second-old Netscape than to kill your mathematical calculation which has been running for 3 weeks,

even if the Netscape is smaller ...

Killing the big calculation will mean the computer loses a lot of work, which is bad.
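
A hedged sketch of how such a selection could be scored. The inputs (size, CPU time, run time, root, direct hardware access) are the ones named in the talk; the particular weights (square roots, divide by 4), the Proc type and the example processes are assumptions invented for this illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    size_pages: int
    cpu_seconds: float
    run_seconds: float
    is_root: bool = False
    has_raw_io: bool = False   # stands in for "direct hardware access"

def badness(p):
    """Higher score = better candidate to kill. Weights are illustrative assumptions."""
    score = float(p.size_pages)                               # big processes free more memory
    score /= max(1.0, math.sqrt(p.cpu_seconds))               # protect work already done
    score /= max(1.0, math.sqrt(math.sqrt(p.run_seconds)))    # protect long-running jobs
    if p.is_root:
        score /= 4                                            # root processes are usually important
    if p.has_raw_io:
        score /= 4                                            # killing these can hurt the hardware state
    return score

def select_victim(procs):
    return max(procs, key=badness)

THREE_WEEKS = 3 * 7 * 24 * 3600
procs = [
    Proc("init", 100, 10, THREE_WEEKS, is_root=True),
    Proc("calculation", 20000, 1_800_000, THREE_WEEKS),
    Proc("netscape", 8000, 2, 5),
]
print(select_victim(procs).name)
```

With these made-up numbers the freshly started Netscape is chosen even though the calculation is larger, matching the preference described above.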

For Linux 2.5, I guess some people will also want to have the OOM killer look at which _user_ is doing bad things to the system,

but that is something for the future.

... on (page 44) you can find some URLs with interesting information

... thank you for your time; if you have any questions, feel free to join the discussion on #qc

... this is the end of my talk, but I will be in #qc for a bit more time

Clap clap clap clap clap clap clap clap

BTW, for people interested in Linux kernel hacking we have a special IRC channel

on irc.openprojects.net #kernelnewbies

See

http://kernelnewbies.org/ for the #kernelnewbies website

Most of the time you can find me (and other kernel hackers) on that channel.

If you have some in-depth questions or find something interesting when reading my slides, you can always go there.

Well, my friends,

feel free to continue discussing at #qc.
