Installing and Using the R Software (Windows Computers)
Zhou Mai
Department of Statistics, University of Kentucky
R is free statistical analysis software (distributed under the GNU license, similar to Linux). It is almost a clone of S-PLUS, but it costs nothing. Almost everything written for R also works in S-PLUS, and vice versa. S-PLUS is a very high quality, commonly used statistical package. The US Food and Drug Administration has approved two statistical packages: S-PLUS is one of them, and the other is SAS. Versions of R run on Unix computers, Apple computers, and Windows (95, 98, ME, NT, and 2000). The latest version is R 1.3.1 (9/2001).
1. Installing R on a Windows computer
If you have an R disc, you can save the time and trouble of downloading. Place the disc in the optical drive and double-click setupr.exe to start the installation.
If you don't have a disc, you can download "setupr.exe" from the following address:
http://cran.us.r-project.org/bin/windows/base/
Note that this is a 15MB file, so the download may take a long time.
Once R is installed, you will see R under Start → Programs. Select it with the mouse to start R. To exit R, type q() or use the mouse. A dialog box then asks whether you want to save what you did in this R session. For now, answer No. (Once you have useful results, you should save the results and the history.)
R has many add-on packages. They are not loaded unless you ask for them, which saves memory. To load one, click Packages with the mouse to see whether it is already installed (a package must be installed before it can be loaded). If it is installed, you can select it with the mouse.
To install a package, start R and click in the R menu
Packages → Install package from local zip file
Then find the ZIP file you want to install in the window that opens and select it. (The file may be on the disc, for example, or downloaded in advance.)
From here on we assume that R has been installed successfully on your computer.
2. First lesson
Below is a demonstration of R. We omit most of the output and show only the input you type. Anything after # is a comment, which R does not execute. Note that R is case-sensitive (like UNIX, unlike DOS).
Note that the > character is the prompt, indicating that R is ready to accept your instruction.
> data1 <- c(21, 25, 25, 18, 44, 20, 25, 15, 19, 20, 30)   # creates a vector called data1
> ls()   # (that is el-s, not one-s) see what objects exist (do you see data1?)
> summary(data1)
> stem(data1)
> hist(data1)   # histogram
> data2 <- rnorm(100)   # generate 100 normal random numbers and store them in data2
> hist(data2)
> data()   # see which data sets come installed with R
> data(sunspots)   # load the data set sunspots
> sunspots   # display sunspots on the screen
> plot(sunspots)
You can put your name on the figure. Try:
> text(locator(1), "Your Name and ID")
At this point, click somewhere on the plot.
> rm(data1)   # erases data1
> demo(graphics)   # watch the graphics demo; you will need to return to the command window several times
> q()   # exits R
A common beginner's mistake is to create or define something under a name that already exists, for example c, t, etc. At first this may not produce an error, but it will cause a lot of confusion. So when you choose a name, avoid reusing existing names. If you want to use the name mydata for your data or function, first try:
> mydata
Error: Object "mydata" not found
This indicates that nothing called mydata exists yet, so you can use this name.
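A name can also be checked without provoking an error message. The following is a minimal sketch using exists(), which reports whether an object with a given name is currently visible. The name mydata is only an illustration.

```r
# exists() takes the name as a character string and returns TRUE or FALSE.
c_taken <- exists("c")   # TRUE: c is R's built-in function for combining values
t_taken <- exists("t")   # TRUE: t is R's built-in transpose function

# In a fresh R session nothing called mydata exists, so this prints FALSE.
exists("mydata")

mydata <- c(1, 2, 3)               # now the name is in use
mydata_taken <- exists("mydata")   # TRUE
```

If exists() returns TRUE for a name you had in mind, pick a different name.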
To repeat an instruction, use the arrow keys to recall previously used instructions, which you can then modify. This works like the UNIX K-shell.
2.1 Printing and saving graphics
R has a Print menu item that works for both the graphics window and the command window. Click File → Print, or click the printer icon directly.
You can also copy a graph into Excel or Word (convenient for editing together with other text). First click the R graphics window to make it active, then click File → Copy to clipboard → As Bitmap. Open (Microsoft) Word or Excel and click Edit → Paste there. Your graph is now in Word or Excel.
If you click File → Save as → PostScript, the graph is saved as a PostScript file. Other formats work the same way.
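A graph can also be saved from the command line instead of the menus. The sketch below uses the postscript() graphics device. The file name myplot.ps is a hypothetical example, and the file is written to R's temporary directory.

```r
# Choose where to write the file (hypothetical name, temporary directory).
ps_file <- file.path(tempdir(), "myplot.ps")

postscript(ps_file)   # open a PostScript graphics device writing to ps_file
plot(rnorm(100))      # the plot is drawn into the file, not on the screen
dev.off()             # close the device so that the file is completed

file.exists(ps_file)  # check that the file was created
```

Every plotting command issued between postscript() and dev.off() goes into the file.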
2.2 Memory
R has its own memory management system. You can use gc() to see how much memory is in use. However, starting with R 1.2.0 (the latest version is now 1.3.0), you do not have to worry about memory issues. Of course, if your PC does not have enough memory, R will run very slowly.
2.3 Colors and mathematical symbols in plots
R can produce color graphics. (You may have seen this in demo(graphics).) Use plot(x, col = "red") to get a plot with red points. For other colors, colors() lists the names of more than 600 colors.
You can also use points(x, col = "white") to erase the red points you just drew (assuming the background is white). Many other functions are available, including lines(), and so on.
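The list of color names is an ordinary character vector, so it can be searched like any other data. A small sketch, assuming you want the names that contain "blue":

```r
all_colors <- colors()    # character vector with every color name R knows
length(all_colors)        # how many names there are
head(all_colors)          # the first few names

# Names containing "blue"; show only the first five.
blues <- grep("blue", all_colors, value = TRUE)
head(blues, 5)

"red" %in% all_colors     # TRUE: "red" is a valid color name
```

Any of these names can be passed as the col argument of plot(), points() or lines().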
One advantage of R over S-PLUS is that R can put mathematical symbols and Greek letters in a figure (similar to the TeX language). The following is a simple example (see demo() for more):
> plot(rnorm(100), type = "n")
> text(20, 0, expression(theta), col = "blue")
> text(40, 0, expression(Theta^{"2 x"}), col = "blue")
3. Second lesson
R is learned by doing. Below are some commonly used functions. Try them.
View the demos: demo() or demo(graphics)
Delete x: rm(x)
See what objects you have: ls()
Randomly draw 9 integers from 20 to 40 (no repetition): sample(20:40, 9, replace = FALSE)
Randomly assign 18 items to 3 groups: sample(1:3, 18, replace = TRUE)
Randomly assign 18 items to 3 groups, 6 in each group: sample(rep(1:3, 6), replace = FALSE)
Look up the features and usage of the sample() function: ?sample
Calculate the sample mean of data1: mean(data1)
Calculate the sample standard deviation of data1: sd(data1)
Calculate the sample variance of data1: var(data1)
Calculate the median of data1: median(data1)
Find the smallest and largest values of data1: range(data1)
Draw a boxplot of data1: boxplot(data1)
Calculate 5 factorial: prod(1:5)
Calculate the number of different ways to choose 5 things from 20: choose(20, 5)
Generate 100 standard normal random numbers: rnorm(100)
Generate continuous random numbers and tabulate them into 8 intervals: table(cut(rnorm(100), 8))
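The difference between the two ways of grouping 18 items in the list above is easiest to see by tabulating the results. The following sketch shuffles a vector that already contains six 1s, six 2s and six 3s, which is one way to get exactly 6 items per group. It also checks the factorial and the counting function.

```r
# Balanced assignment: shuffle the labels 1,2,3 repeated 6 times each.
groups <- sample(rep(1:3, 6))
group_sizes <- table(groups)      # always 6, 6, 6

# Independent draws: each item picks a group at random,
# so the group sizes usually differ from run to run.
loose <- sample(1:3, 18, replace = TRUE)
table(loose)

five_factorial <- prod(1:5)       # 1 * 2 * 3 * 4 * 5 = 120
ways <- choose(20, 5)             # ways to choose 5 things from 20 = 15504
```

Use the balanced version when every group must have the same size, for example when assigning subjects to treatments.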
Another way to use R is to save a few, or a few dozen, instructions in an ASCII file (called, say, mycode) and then run the file in R:
> source("mycode")   # or with the mouse: File → Source R code
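To try this without leaving R, the sketch below first creates a small script file and then runs it with source(). The file name and the two instructions inside it are made up for illustration.

```r
# Create a small script file in R's temporary directory (hypothetical name).
code_file <- file.path(tempdir(), "mycode.R")
writeLines(c(
  "x <- c(2, 4, 6, 8)    # some data",
  "x_mean <- mean(x)     # its mean"
), code_file)

# Run every instruction in the file, as if typed at the prompt.
source(code_file)

x_mean   # the object created by the script is now available
```

Normally you would write the file with an editor such as Notepad and only call source() in R.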
Ready-made data need not be typed in again, because R can read it from a file. First save the data as an ASCII file (for example with WordPad), then run the following instruction in R. (It assumes your data are in test.dat and reads them into data3.)
> data3 <- read.table("c:/stat/test.dat", header = TRUE)
Other functions, such as scan(), can also be used to read data, and write() can be used to output data.
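The sketch below shows what such a file looks like by writing a small data set to an ASCII file and reading it back. The column names and values are invented for illustration, and the file is placed in R's temporary directory rather than c:/stat.

```r
# A made-up data set with two columns.
original <- data.frame(height = c(160, 172, 181),
                       weight = c(55, 68, 80))

# Write it as a plain text file whose first line holds the column names.
dat_file <- file.path(tempdir(), "test.dat")
write.table(original, dat_file, row.names = FALSE)

# header = TRUE tells R that the first line contains names, not data.
data3 <- read.table(dat_file, header = TRUE)
data3
```

Without header = TRUE, R would treat the line of column names as the first row of data.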
R can replace almost all statistical tables for looking up probabilities.
If Z is a binomial random variable with n = 25, p = 0.3, then P(Z ≤ 5) is
> pbinom(5, 25, 0.3)
P(Z = 5) is
> dbinom(5, 25, 0.3)
P(Z ≥ 5) is
> 1 - pbinom(4, 25, 0.3)   # please note that it is 4, not 5
Finally, P(5 ≤ Z ≤ 10) is
> 1 - (1 - pbinom(10, 25, 0.3)) - pbinom(4, 25, 0.3)   # or
> pbinom(10, 25, 0.3) - pbinom(4, 25, 0.3)   # or
> sum(dbinom(5:10, 25, 0.3))
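The three expressions for P(5 ≤ Z ≤ 10) can be compared directly in R. A minimal sketch follows; all.equal() is used rather than == because the results are floating-point numbers that may differ in the last digits.

```r
n <- 25
p <- 0.3

# Three ways to compute P(5 <= Z <= 10).
way1 <- 1 - (1 - pbinom(10, n, p)) - pbinom(4, n, p)
way2 <- pbinom(10, n, p) - pbinom(4, n, p)
way3 <- sum(dbinom(5:10, n, p))

all.equal(way1, way2)   # TRUE
all.equal(way2, way3)   # TRUE

# P(Z >= 5): the complement of P(Z <= 4), which is why 4 is used, not 5.
p_ge_5 <- 1 - pbinom(4, n, p)
p_ge_5_sum <- sum(dbinom(5:n, n, p))
all.equal(p_ge_5, p_ge_5_sum)   # TRUE
```

Because Z takes only integer values, "Z ≥ 5" is the same event as "not Z ≤ 4".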
You can also print a probability table for the binomial distribution (n = 25, p = 0.3):
> dbinom(0:25, 25, 0.3)
To make the table easier to read, you can try:
> print(dbinom(0:25, 25, 0.3), print.gap = 2)   # or
> print(cbind(0:25, dbinom(0:25, 25, 0.3)), print.gap = 3)
If Z is a standard normal random variable, P(Z < 1) is
> pnorm(1)
To calculate a non-standard normal probability, you must supply the mean and standard deviation. If Z is a normal random variable with mean -2 and standard deviation 3, then P(Z < 1) is:
> pnorm(1, mean = -2, sd = 3)
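Supplying the mean and standard deviation to pnorm() gives the same answer as standardizing by hand and looking up the standard normal table. A short sketch of that check, using the same mean and standard deviation as above:

```r
mu <- -2
sigma <- 3

# Direct calculation with the non-standard normal distribution.
direct <- pnorm(1, mean = mu, sd = sigma)

# By hand: standardize first, then use the standard normal distribution.
z <- (1 - mu) / sigma    # (1 - (-2)) / 3 = 1
by_table <- pnorm(z)

all.equal(direct, by_table)   # TRUE
```

Here the standardized value happens to be 1, so the answer equals pnorm(1) from the standard normal case.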
Similarly, for a normal Z with mean -2 and standard deviation 3, P(2 < Z < 3) is
> pnorm(3, mean = -2, sd = 3) - pnorm(2, mean = -2, sd = 3)
Probabilities for the hypergeometric distribution can be calculated with dhyper() or phyper().
-------------------
| F11  |      | 19
-------------------
|      |      | 11
-------------------
   14     16
Suppose the 2 × 2 table above is our concern (row totals 19 and 11, column totals 14 and 16) and we want to calculate the probability distribution of F11.
The probability of F11 = 6 is
> dhyper(6, 14, 16, 19)
If you use phyper instead, you get the probability of F11 ≤ 6.
For the chi-square distribution (central, with 1 degree of freedom), the probability of a value less than 3.84 is
> pchisq(3.84, df = 1, ncp = 0)
4. Some exercises
Question 1: Suppose the entire population (all people) is split half and half (in favor / opposed), and we survey it by random sampling. Use R to calculate the following probabilities:
(a) Random sample of 10 people, 6 or more in favor
(b) Random sample of 100 people, 60 or more in favor
(c) Random sample of 1000 people, 600 or more in favor
(d) Random sample of 2000 people, 1200 or more in favor
(e) Random sample of 1500 people, the number in favor between 300 and 600 (including 300 and 600)
Based on the above calculations, if you randomly sampled 2,000 people and 1200 were in favor, would you still believe that the population is split half and half? Give your reasons. All calculations can be completed with pbinom().
Print a small normal distribution probability table and compare it with the one in the book.
> pnorm(seq(-3.5, 3.5, 0.5))
Let Z be a normal random variable with mean 2 and standard deviation 4. Calculate the probability P(3.085 < Z < …).
Single-sample t test. First save the data into a vector called (for example) data6:
> data6 <- c(33.9, 52.4, 48.6, 53.5, 43.8)
To test H0: μ = 46.5 against HA: μ < 46.5 (in fact we only want the p-value):
> t.test(data6, alternative = "less", mu = 46.5)
The other two alternative hypotheses are "greater" and "two.sided". Besides the p-value, this function also gives a 95% confidence interval. t.test() can also do a two-sample t test.
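To see what t.test() computes for a single sample, the t statistic and its p-value can be worked out by hand from the sample mean, the sample standard deviation and the sample size. The sketch below does this for the data6 example and compares the result with t.test(). The helper names mu0, t_stat and p_value are introduced here only for illustration.

```r
data6 <- c(33.9, 52.4, 48.6, 53.5, 43.8)
mu0 <- 46.5                      # the mean under H0
n <- length(data6)

# t statistic: distance of the sample mean from mu0 in standard-error units.
t_stat <- (mean(data6) - mu0) / (sd(data6) / sqrt(n))

# For HA: mu < mu0 the p-value is the lower tail of the t distribution
# with n - 1 degrees of freedom.
p_value <- pt(t_stat, df = n - 1)

# The same numbers as reported by t.test().
result <- t.test(data6, alternative = "less", mu = mu0)
all.equal(unname(result$statistic), t_stat)   # TRUE
all.equal(result$p.value, p_value)            # TRUE
```

For alternative = "greater" the upper tail, 1 - pt(t_stat, df = n - 1), would be used instead.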
For a two-sample test, suppose there are two sets of data called xbefore and xafter, and assume that the data are not paired. To test H0: μ = 0 against HA: μ < 0, where μ is the difference between the two means:
> t.test(x = xbefore, y = xafter, alternative = "less", mu = 0, paired = FALSE)
If the data are paired, change paired = FALSE to paired = TRUE.
Test of a single-sample proportion. Suppose the data are: 1000 trials, 600 successes. To test whether the probability of success is 0.5, that is, H0: p = 0.5 against HA: p ≠ 0.5:
> prop.test(600, n = 1000, p = 0.5, alternative = "two.sided")
The other two alternative hypotheses are "less" and "greater". This function also gives a 95% confidence interval. For further information, type ?t.test to consult the online manual.
If you need an add-on package in R, you can first check whether it is installed.
> library()   # lists all installed add-on packages (suppose you have ctest)
> library(ctest)   # loads the package (it includes the function binom.test)
> library(help = ctest)   # see which functions are in the ctest package (binom.test is there)
> binom.test(600, n = 1000, p = 0.5, alternative = "two.sided")   # use binom.test to do the test
> ?binom.test
If the package you need is not listed by library(), you should install it first (see Section 1). The function binom.test is similar to prop.test, but prop.test uses an approximation while binom.test is an exact calculation. However, prop.test is more widely applicable.
R can also call external C and Fortran routines, but this is more complicated: you must compile the C / Fortran program into a DLL and then call it. You should try to complete your calculations within R. If you really need to call C / Fortran, please consult the English documentation.
Some features of R:
1. R (S-PLUS) is a vector language. Almost all calculations are best vectorized (this is much faster than a for loop). The operators +, -, *, /, ^, ..., etc. all work on vectors. To pick out part of a vector, use [] to give the subscript range.
2. R's operations are done with functions.
R has more than 3,000 functions: exp(x), log(x), sqrt(x), q(), c(x, y), ..., etc. In almost all functions, x can be a vector.
3. You can easily define new functions in R (see below).
4. R's graphics capabilities are very strong. You can work interactively until you are satisfied.
5. For very large data sets (> 1 gigabyte), R may not be appropriate.
6. Many add-on packages are exactly the same in R and S-PLUS (such as survival, bootstrap, etc.).
7. The random number generator in R can be freely chosen (if you are worried about the quality of the random numbers).
8. R costs nothing, so you can give a copy to every student, who can then do calculations at home or on a laptop.
If you want to define a new function, you can write it outside R with any editing software (as an ASCII or .txt file) and then read it into R. You can also write it directly in R. For example, to take R's function mean and modify it into your own function junk:
> junk <- edit(mean)
To modify your function later, you can use:
> fix(junk)
If the function you want to define is short, you can type it directly in R, for example
> junk <- function(x) { x / (x + 5) }
The following is another example of a user-defined function. (Given the sample size, sample mean and standard deviation, it produces fake data.)
fakedata <- function(size, xbar, sdd) {
    if (sdd <= 0) stop("sdd must be > 0")
    if (!is.numeric(xbar)) stop("xbar must be a real number")
    fake1 <- rnorm(size)                  # raw standard normal numbers
    fake2 <- fake1 - mean(fake1)          # center them at exactly 0
    fake2 * (sdd / sd(fake2)) + xbar      # rescale to sd = sdd, then shift to mean = xbar
}
This function is useful in cases like the following: exercises sometimes give only the sample size (= 50), the sample mean (= 11.8) and the standard deviation (= 0.6), but not the original data. If you want to run a test, you can proceed as follows:
> mydata <- fakedata(50, 11.8, 0.6)
> t.test(mydata, mu = 12, alternative = "less")
Further reading is available only in English. You can first look at "An Introduction to R". This book is also free. Click Help in R, then select this book. You can print it out.
Apart from numbers and symbols, R's instructions and output are all in English, so some English seems to be necessary.
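The fakedata function defined earlier can be checked by confirming that its output has exactly the requested sample size, mean and standard deviation. The sketch below repeats the function so that it runs on its own. It assumes that the final line of the function adds xbar (the plus sign appears to have been lost in the text above).

```r
# Produce fake data with a given size, sample mean and sample sd.
# Assumption: the last line adds xbar to the rescaled values.
fakedata <- function(size, xbar, sdd) {
  if (sdd <= 0) stop("sdd must be > 0")
  if (!is.numeric(xbar)) stop("xbar must be a real number")
  fake1 <- rnorm(size)                # raw standard normal numbers
  fake2 <- fake1 - mean(fake1)        # center at exactly 0
  fake2 * (sdd / sd(fake2)) + xbar    # rescale, then shift
}

mydata <- fakedata(50, 11.8, 0.6)

length(mydata)   # 50
mean(mydata)     # 11.8, up to rounding error
sd(mydata)       # 0.6, up to rounding error
```

The individual values differ from run to run because they come from rnorm(), but the three summary numbers are always the ones requested, so any test based only on them gives the same result.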