>> [MUSIC PLAYING] David J. Malan: All right. This is CS50, and this is the end of week four. And one of the topics today is that of digital forensics, the art of recovering information. And indeed, even though you're in the midst right now of p-set three and Breakout, next week, the focus will be on precisely this domain. >> So one of the coolest jobs I ever had was back in graduate school,
when I was working for the local Middlesex County District Attorney's Office, doing forensics work. So essentially, the Massachusetts State Police, on occasion, when working on cases, would bring in things like hard drives and floppy disks and memory cards and the like. And they would hand them to me and my mentor, and our goal was to find evidence, if there was any, on these media. Now, you might have seen glimpses of this world of forensics in the media, TV and movies.
But the job I had, and, I daresay, that world, is not quite like you would see it. Let's take a look at what you've probably seen. [VIDEO PLAYBACK] -OK. Now, let's get a good look at you. >> -Hold it. Run that back. >> -Wait a minute.
Go right. -There. Freeze that. -Full-screen. >> -OK. -Tighten up on that, will you? >> -Vector in on that guy by the back wheel. >> -Zoom in right here on this spot. >> -With the right equipment, the image can be enlarged and sharpened.
>> -What's that? >> -It's an enhancement program. >> -Can you clear that up any? >> -I don't know. Let's enhance it. >> -Enhance section A6. I enhanced the detail, and-- -I think there's enough to enhance. Release it to my screen.
>> -I enhanced the reflection in her eye. -Let's run this through video enhancement. >> -Edgar, can you enhance this? >> -Hang on. >> -I've been working on this reflection. >> -There's someone's reflection. >> -Reflection. -There's a reflection of the man's face. >> -The reflection!
-There's a reflection. -Zoom in on the mirror. You can see a reflection. >> -Can you enhance the image from here? -Can you enhance it? -Can we enhance this? -Hold on a second. I'll enhance. -Zoom in on the door.
-Times 10. -Zoom. -Move in. -More. -Wait, stop. -Stop. -Pause it. -Rotate us 75 degrees around the vertical, please. >> -Stop.
Go back to the part about the door again. >> -Got an image enhancer that can bitmap? >> -Maybe we can use the Pradeep Singh method to see into the windows. >> -The software is state of the art. >> -The eigenvalue is off. >> -With the right combination of algorithms-- >> -He's taken illumination algorithms to the next level, and I can use them to enhance this photograph. >> -Lock on and enlarge the z-axis.
>> -Enhance. Enhance. -Enhance. -Freeze and enhance. [END VIDEO PLAYBACK] David J. Malan: So those are all words, but they were not used in sentences correctly. And indeed, in the future, any time you hear someone say the word "enhance," please chuckle just a little bit.
Because when you try to enhance, for instance, this is what happens. >> So here's a gorgeous photo. This is CS50's own Daven. And suppose that we wanted to focus in on the twinkle in his eye, or the reflection of the bad guy that was clearly captured by the security camera. This is what happens when you zoom in on an image that has only a finite number of bits associated with it. >> That is what you would get.
And indeed, in Daven's eye are but four, maybe six pixels that compose exactly what was glimmering there. So problem set four will ultimately have you explore this world, particularly by way of something we call file I/O, where I/O is just a fancy way of saying input and output. >> So thus far, all of the interactions we've had with a computer have been largely with your keyboard and the screen, but not so much with the hard disk, or saving of files beyond the ones you yourself write.
Your programs thus far have not been creating, and saving, and updating their own files. >> Well, what's a file? Well, something like a JPEG. This is an image you might have or upload to Facebook, or see anywhere on the web. Indeed, that photo we just saw of Daven was a JPEG. And what's interesting about files like JPEGs is that they can be identified, typically, by certain patterns of bits.
>> In other words, what is it that distinguishes a JPEG from a GIF from a PNG from a Word document from an Excel file? Well, it's just different patterns of bits. And those different patterns are usually at the start of those files. >> So when your computer opens a Word doc, or when a computer opens a JPEG, it typically looks at the first several bits in the file. And if it recognizes a pattern, it says, oh, this is an image. Let me display it to the user as a graphic. Or, oh, this looks like a Word doc.
Let me show it to the user as an essay. >> So for instance, JPEGs, it turns out, are fairly sophisticated underneath the hood. But the first three bytes in almost every JPEG are these three numbers. So bytes zero, one, and two are, in almost every JPEG, 255, then 216, then 255. >> And what you'll be able to start doing next week is actually poking underneath the hood of files like JPEGs and bitmap files, and seeing what's always been there for as long
as you've been using a computer. >> But what's there is not typically written as decimal numbers like this. Computer scientists don't tend to speak in decimal. They don't really speak in binary, either. Typically, when we want to express numbers, we actually use hexadecimal, which you may recall from, say, problem set one, which challenged you to think about a different system. >> We, of course, are familiar with decimal, zero through nine.
We talked about binary. And we don't really have to use that much from here on out, because computers will use that. But programmers will very often, but not always, use hexadecimal, which just means you have 16 symbols in your alphabet, as opposed to two or 10. >> So how do you count higher than nine in hexadecimal? You go 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, just by convention. But what's key is that each of these is a single symbol.
There is no 10. There is no 11, per se, because each of your digits, just like in decimal and just like in binary, should be just a single character, by convention. >> So that, then, is the alphabet we have at our disposal for hexadecimal. So what does a JPEG look like if you were to write out those first three bytes not as decimal but, for instance, as hexadecimal? And why is hex even all that useful? >> Well, a quick look at an example. So if I write out the bits that represent these decimal numbers--
this might be a little rusty now from a few weeks back, but the left one and the right one are pretty easy. 255 was the biggest number we could represent with eight bits. It was all ones. So the only one that's mildly interesting is the middle one. And if you kind of do out the math, you will deduce that, indeed, that pattern of ones and zeros, 11011000, represents 216. So let's just stipulate for now that these are correct. But why is this interesting?
>> Well, a byte, of course, is eight bits. And it turns out that it helps to think of a byte as two chunks of four bits, like this. Let me just add some space. So before, after. I've just added some white space for visualization's sake here. How might we now represent in, say, hexadecimal each quad of bits, each set of four bits? >> So for instance, on the left now, we have 1111 in binary.
What is that number in decimal, if you do out the math? You have the ones place, the twos place, the fours place, and the eights place. >> Audience: 15. David J. Malan: It's 15. So if we do eight plus four plus two plus one, we get 15. So I could write down 15 below 1111, but the whole point here is hexadecimal, not decimal. So instead of writing down 15, 1-5, I'm going to write that in hex, which, if you think back, if you have zero through F, what is 15 going to be?
Audience: F. David J. Malan: So it turns out it's F. And you can work that out by saying, well, if A is 10, then OK, F is 15. So indeed, we could rewrite this same set of numbers as F F. And then if we do a bit of math, we'll deduce that 1101 is D and 1000 is 8. Eight is pretty easy, because we have a one in the eights place. And then, we have another F F. >> So what humans tend to do by convention when they use hexadecimal is they just write this a little more succinctly, and get rid of most of that white space.
And just to be super clear to readers that this is hexadecimal, the simple convention among humans is to write 0x, which has no meaning other than as a visual identifier of "here comes a hex number." >> And then, you put the two digits: FF in this case, then D8, then FF. So, long story short, hexadecimal just tends to be useful because each of its digits, zero through F, perfectly lines up with a pattern of four bits. >> So if you have two hexadecimal digits, zero through F, again and again,
that gives you exactly eight bits, or one byte. So that's why it tends to be conventionally useful. There's no intellectual content really beyond that, other than its actual utility. >> Now, JPEGs aren't the only file formats for graphics. You might recall that there are files like this in the world, at least from a few years back. >> So this was actually installed in Windows XP on millions of PCs around the world.
And this was a bitmap file, BMP. And a bitmap file, as you'll see next week, just means a pattern of dots, pixels as they're called, a map of bits, really. >> So what's interesting, though, about this file format, BMP, is that underneath the hood, it has more than just three bytes that compose its header, so to speak, the first few bytes. It actually looks a little complicated at first glance. And you'll see this in the p-set. And getting something particular out of this now
isn't so important as just the fact that at the beginning of every bitmap file, a graphical format, there's a whole bunch of numbers. >> Now, Microsoft, the author of this format, tends to call those things not ints and chars and floats but WORDs and DWORDs and LONGs and BYTEs. So they're just different data types. They're different names for the same kinds of things. But you'll see that in p-set four. >> But this is only to say that if a human double-clicks some .bmp file on his
or her hard drive, and a window opens up showing him or her that image, that happened because the operating system presumably noticed not only the .bmp file extension in the file name, but also the fact that there's some convention to the pattern of bits at the very beginning of that bitmap file. >> But let's now focus not on such a complicated file, but instead on something like this. Suppose here in gedit, I just have the beginnings of a program that's pretty simple.
I've got some includes up top. Now I've got #include "structs.h", but I'll come back to that in a moment. But this is useful for now. So this is a program that's going to implement something like the registrar's database. So a database of students, and every student in the world has a name and a house and probably some other stuff, but we'll keep it simple. Every student has a name and a house. >> So if I wanted to write a program whose purpose in life
was just to iterate from zero on up to three, if there are three students at Harvard University. And I just want to get, using GetString, each student's name and house, and then just print those out. >> This is sort of like week one, week two stuff now, where I just want a for loop or something like that. And I want to call GetString a few times, and then printf a few times. So how might I do this, though, when both a name and a house are involved for each student?
>> So my first instinct might be to do something like this. I might first say, well, give me, say, an array of strings called names. And I don't want to hardcode three here. What do I want to put there? So STUDENTS, because that's just a constant declared at the top, just so I don't have to hardcode three in multiple places. This way, I can change it in one place, and it effects a change everywhere. And then, I might do string houses[STUDENTS]. >> And now, I might do something like for (int i = 0; i < STUDENTS; i++).
So I'm typing fast, but this is probably familiar syntax now. >> And now, this was more recent. If I want to put in the i-th student's name, I think I do this: names[i] = GetString(). And then, not names but houses[i]: I do this, GetString, and let me go back and fix this line. Agree? Disagree? It's not very user-friendly. I haven't told the user what to do.
>> But now, if I also wanted later to, let's say, print these things out-- so TODO later. I'm going to do more with this-- this arguably is a correct implementation of getting names and houses, three of each in total, from a user. >> But this is not very good design, right? What if a student has not just a name and a house, but also an ID number, and a telephone number, and an email address, and maybe a home page, and maybe a Twitter handle,
and any number of other details associated with a student, or a person more generally? How would we begin to add functionality to this program? >> Well, I feel like the simplest way might be to do something like, let's say, int ids[STUDENTS]. So I can put all their IDs in there. And then, for something like phone numbers, I'm not sure how to represent that just yet. So let's go ahead and just call this string twitters[STUDENTS], which
is a little strange, but-- and a bunch more fields. >> I've started to effectively copy and paste here. And this is going to grow pretty unwieldy pretty quickly, right? Wouldn't it be nice if there were in the world a data structure known not as an int or a string, but something higher level, an abstraction, so to speak, known as a student? C did not come with built-in functionality for students, but what if I wanted to give it such? >> Well, it turns out, I'm going to open a file called structs.h here,
and you can do exactly that. And we're going to start doing this now. And underneath the hood of p-set three, you've already been using this. There is no such thing as a GRect or a GOval in the programming language C. >> Folks at Stanford implemented those data types by using this approach here, declaring their own new data types using a new keyword called struct and another one called typedef. And indeed, even though the syntax looks a little different from stuff we've seen before, in principle, it's super simple.
>> typedef just means "define a type." That type is going to be a structure, and a structure is just like a container for multiple things. And that structure is going to have a string called name and a string called house. And let's call, just for convenience, this whole data structure student. >> So the moment you get to the semicolon, you have now created your own data type called student that now stands alongside int, and float, and char, and string,
and GRect, and GOval, and any number of other things people have invented. >> So what's useful about this now is that if I go back to structs-0 and finish this implementation, which I wrote in advance here, notice that all of the inevitable messiness that was about to start happening as I added phone numbers and Twitter handles and all these other things to a student's definition is now succinctly wrapped up as just one array of students. >> And each of those students now has multiple things inside of it. So that just leaves one question.
How do you get at the name, and the house, and the ID, and whatever else is inside of the student? Super simple, as well. New syntax, but a simple idea. >> You simply index into the array, as we did last week and this. And what's clearly the new piece of syntax? Just the dot, ., which means "go inside the structure and get the field called name, get the field called house," and so on. >> So in p-set three, if you're still working on that,
and most folks still are, realize that as you start using things like GRects and GOvals and other things that don't seem to come from week zero, one, or two, that's because Stanford declared some new data types. >> And indeed, that's exactly what we'll do, as well, in p-set four, when we start to deal with things like images, bitmaps, and more. So that's just a teaser and a mental model for what is to come. Now, I procrastinated a bit this morning. I was kind of curious to see what the Microsoft wallpaper actually
looks like today. And it turns out someone in 2006 actually went to almost precisely the same spot to photograph what that looks like in reality these days. The field is now a little overgrown. >> So speaking now of images, let's bring back Daven here on the screen, and Nicholas, and just remind you that if you'd like to join us for lunch this Friday, head to our usual URL here. >> So where did we leave off on Monday?
We introduced this problem, right? This was seemingly a correct implementation of swap, whereby you take two ints, one called a, one called b, and swap them, just like Laura did here on stage with the milk and the water, by using a temporary variable, or an empty cup, so that we could put b in a and a in b without making a mess of things. We used a variable. It's called temp. >> But what was the fundamental problem with this code on Monday?
What was the problem here? Yeah. >> Audience: It takes up more space. >> David J. Malan: Takes up more space, because I'm using a variable. That is true, but I'm going to say that's OK. It's only 32 bits in the grand scheme of things, so not a big deal. Other thoughts? Audience: It only swaps the variables locally.
David J. Malan: Exactly. It only swaps the variables locally. Because any time you call a function-- when I had the trays from Annenberg last time, you have main on the bottom. As soon as you call a function called swap, swap does not get x and y, the original values. What does swap get, did we claim? Audience: Copies. David J. Malan: So copies of them.
So it gets one and two, if you recall the example from last time, but a copy of one and two, which are successfully swapped. But unfortunately, in the end, x and y in main are still the same. So we can see this with our new friend, hopefully, gdb, which you or the TFs and CAs have been guiding you toward, as follows. >> So noswap, recall, looks like-- let's open this up-- looks like this. We initialized x to one, y to two. Had a bunch of printfs. But then, the key call here was to swap, which
is exactly the code we just saw a moment ago. It's correct at first glance, but functionally, this program does not work, because it doesn't permanently swap x and y. >> So let's see this, a quick warmup here with gdb: gdb ./noswap. A bunch of overwhelming information that I'll get rid of with Ctrl-L for now. And now, I'm going to go ahead and run it. And unfortunately, that was not that useful. It ran the program inside of this program called gdb, a debugger, but it didn't let me poke around.
>> So how can I actually pause execution inside this program? So break. And I could break on any line number, one, 10, 15. But I can also break symbolically by saying break main. And that's going to set a breakpoint, apparently at line 16 in main. And where is line 16? Let's go up to the code and go up to noswap. And indeed, line 16 is the very first line of main. >> So now, if I go ahead and type run this time, Enter, it paused.
So let's poke around. print x-- why is x zero? And ignore the dollar sign. That's just for fancier usage of the program. Why is x zero at the moment? >> Audience: It paused right before line 16, not actually on line 16. >> David J. Malan: gdb, by default, has paused execution just before line 16. So line 16 hasn't executed yet, which means x is of some unknown value. And we got lucky that it's something clean like zero.
So now if I type next, now it's executed line 16. It's waiting for me to execute line 17. Let me go ahead and print x. It's one. Let me go ahead and print y. What should I see now? >> Audience: [INAUDIBLE] >> David J. Malan: A little louder. >> David J. Malan: Not quite a consensus.
So yes, we see some garbage value. Now, y is 134514064 there. Well, it's just some garbage value. My program uses RAM for different purposes. There are other functions, which other people wrote, inside my computer. So those bits have been used for other values, and what I'm seeing is the remnants of some prior use of that memory. >> So no big deal, because as soon as I type next and then print y,
it's initialized to the value that I want. So now, let's go ahead a little faster. n for next. Let's do it again. But I don't want to hit next here, because if I want to see what's going on inside of swap, what's the command? >> Audience: Step. >> David J. Malan: Step. So this steps me into a function, rather than over it.
And now, it's a little cryptic, honestly, but this is just telling me I'm in line 33 now. And let's do this again. print temp. Garbage value, negative this time, but that's still just a garbage value. So let's do next, print temp. It's initialized to 1, which was the value of x, aka a. >> Now, where are a and b coming from? Well, notice in main, we called these values x and y.
We then passed them to swap as follows: x came first, comma, y. And then, swap could have called them x and y. But for clarity, it's calling them a and b. But a and b are now going to be copies of x and y, respectively. >> So if I go back to gdb, temp is now one and a is now one. But if I do next and now do print a, a is now two: b's value has already been moved over. The milk has been poured into the former orange juice's glass, or vice versa. >> And if I do next again, and now if I print them out as a sanity check,
a is still two, but b is now one. Frankly, temp is still there, but I don't care what temp is. But as soon as I now type, let's say, continue to go back, now I'm at the end of the program. And unfortunately, x is still one and y is still two. >> So what was the utility of gdb there? It didn't help me fix the problem per se, but it hopefully helped me understand it by realizing
that yes, my logic is right, but my code is not ultimately having a permanent impact. So that's a problem we're going to solve today. >> But let's get there by way of this. string is a lie. It, too, is not a data type that exists in C. It's been a synonym all this time for something else, and we can reveal that as follows. >> Let me go ahead and open up a program called compare-0.
And rather than type this one out, we'll start to walk through the code I already wrote, but it's only a few lines. So this is compare-0. And the first thing I'm doing is getting a line of text. >> But notice what I'm doing for the first time. What is clearly different about line 21? Actually, wait a minute. This is copy two. That is not even the right program.
All right, spoiler alert. All right, so never mind that. That's the answer to a future question. >> Here is compare-0, and I'm about to get a line of text. The program's much simpler. So this is straightforward. This is like week one, week two stuff at the moment. string s = GetString(); Now, I say it again down here. string t = GetString();
And then, the last thing in this program, as its name suggests, is I'm going to try to compare them. >> So if s, the first string, == t, then I'm going to say "You typed the same thing." Else, I'm going to say "You typed different things." So let's compile and run this program. So make compare-0. Looks good. No compilation errors.
>> Let me go ahead now and type ./compare-0. Let me go ahead and say something: Daven, and something: Rob. And "You typed different things." So far, so good. Program seems to be correct. >> But let's run it again. Say something: Gabe. Say something: Gabe. Different things? All right. Maybe I hit the space bar or something funky.
So Zamyla. Zamyla. Different things. So what is going on? >> So we have these two lines of code, GetString being called twice. And then, I'm simply trying to compare s and t. But what really, then, is going on? Well, my handwriting's about to butcher this example somewhat. And let's actually throw this up over here, as well.
>> So we have a line like string s = GetString(). So that's simply the first interesting line from that program. But what has been going on underneath the hood all this time? Well, on the left-hand side is string, which is some type of variable, and it's called s. So I know that this is using memory, or RAM, in my computer somehow. So I'm going to abstractly draw that as a square. 32 bits, it turns out, but more on that in the future. And then, what's going on over here?
>> Well, GetString obviously gets a string from the user. And GetString got Zamyla or Gabe or Daven. So let's choose the first of those, which was Daven. So effectively, what GetString got me in that first case was D-a-v-e-n. And then, what else did it give me secretly? Audience: [INAUDIBLE] David J. Malan: Yeah, the \0, or null character. So it effectively gave me a string. But we already know from previous looks that a string is just an array
of characters, and it's terminated by this special sentinel character, \0. >> But if this is true and this is a square, this is clearly a much bigger rectangle. And indeed, this is, I claim, only 32 bits. And this is clearly more than 32 bits, because this is probably eight plus eight plus eight plus eight plus eight, plus eight more for the \0, just because each ASCII character takes one byte. How the heck are we going to fit Daven into this little box here? >> Well, what is GetString actually doing?
Well, this grid here represents my computer's memory, or RAM. So let's arbitrarily say that if each of these represents a byte, then we can think of each byte as having an address, like 33 Oxford Street, or 34 Oxford Street, or 35 Oxford Street. >> So just like homes have addresses and buildings have addresses, so do individual bytes of memory have addresses, or numbers that uniquely identify them. Now, this is arbitrary. But to keep it simple, I'm going to use hexadecimal, just by convention,
but the 0x means nothing other than "this is hexadecimal." And I'm going to claim that the "D" ends up at byte one in memory. >> I've got nothing else going on in memory, so Daven got the first spot, at byte one, 0x1. The "a," then, is going to be at 0x2, the "v" at 0x3, the "e" at 0x4, the "n" at 0x5, and the \0 at 0x6.
>> But once you start thinking about what the computer's doing underneath the hood, you can start to infer how you, some years ago, would have implemented C itself. What is GetString probably returning-- because it feels like it's not returning Daven, per se, because he's surely not going to fit in this little box-- so what is GetString probably returning? >> David J. Malan: The location of Daven. And it's been doing this ever since week one.
What GetString is really returning is not a string, per se. That's one of the little white lies. It's returning the address of the string in memory, the unique address. Daven lives at 33 Oxford Street. But more succinctly, Daven lives at 0x1, address number one. >> So what gets put in this little box, then, to be clear, is just the address of that string. So all this time, this has been going on. But what this hints at now is that if all s has
is a number inside of it, who's to stop you, the programmer, from putting any number in any variable and just jumping to that chunk of memory? And indeed, we'll see that's a threat next time. >> But for now, this feels insufficient. If I say, get me a string, you give me Daven. But you don't really give me Daven. All you give me is Daven's address. How do I then know for sure where Daven begins and ends--
the story's getting weird-- where Daven begins and ends, and then, where the next string in memory starts? >> Well, if you're handing me the beginning of Daven, essentially, how do I know where the end of his name is? That special null character, which is all the more important now if strings underneath the hood are simply identified uniquely by their location in memory. So all this time, that's what's been going on. >> So when we look now at the code here, explain
if you would the bug in line 26. Why are Zamyla and Zamyla different? Why are Gabe and Gabe different? Yeah, in back. >> Audience: They have different addresses. >> David J. Malan: Simply because they have different addresses. Because when you call GetString again, which I'll do quickly here, if this is the second line, string t, as I did in that program, equals another call to GetString.
The next time I call GetString, I'm going to get a different chunk of memory. >> GetString is allowed to ask the operating system for more and more memory. It's not going to reuse the same six bytes every single time. It's going to get a new chunk of memory, which means t is going to get some other value over here. >> So when I do s == t, you're not comparing D against this and a against this and v against this.
You're comparing this address against this address, which, frankly, is pretty useless, because who really cares where the strings are in memory? >> And indeed, we haven't. And we're not going to start particularly caring. Only to the extent that bugs can arise and security threats can arise will we actually start to care about this. So let's fix this problem. Turns out, you fix it super simply.
>> And let's actually, before I reveal that again, what would you do if you were in a CS50 class and had to implement a comparison of two strings? You clearly can't just use s == t. But just logically, how would you compare this string against this string using C code? >> Audience: Just do the for loop [INAUDIBLE] David J. Malan: Perfect. Yeah.
Just use a for loop or a while loop or whatever. But just apply the basic idea that if this is a chunk of memory, or an array, and this is too, you iterate over both at the same time. And just compare the letters. >> And you've got to be a little careful, because you don't want one finger to go past the other because one string is longer than the other. So you're going to want to check for this special value at the end, null. But it really is, in the end, as simple as that.
And frankly, we don't need to reinvent that wheel. Here is version two. And what I'm going to say here is that instead of comparing s == t, I'm instead going to say if (strcmp(s, t) == 0). Now, what is strcmp? >> It turns out, it's a function that comes with C, declared in string.h, whose purpose in life is to compare two strings. And strcmp, if we read its man page or documentation or CS50 Reference, will simply tell you that strcmp
returns either a negative number or a positive number or zero, where zero means they're equal. >> So just conjecture. What might it mean if strcmp returns a negative value or a positive value? Audience: Greater than or less than. David J. Malan: Yeah, greater than or less than. So if you wanted to sort a whole bunch of strings in a dictionary-- as we will eventually down the road-- this is the perfect function to use, potentially,
because it's going to do that comparison of strings for you, and tell you whether a comes before b or b comes before a, alphabetically (strictly speaking, by ASCII value). We can do exactly that. >> And notice I did one other thing in this example. What else has changed higher up in this main function? char*. And this is that other white lie. All this time, when you've been writing string, we have been secretly rewriting string as char* so that Clang actually
understands you. >> In other words, in cs50.h, as we'll eventually see, we made a synonym called string that's the same thing as char*. And for now, know only that the *, in this context at least, means an address. >> The address of what? Well, the fact that I said char*, and not int* or float*, means that char* is the address of a char. So this little box here, aka string, is really of type char*,
which is simply a fancy way of saying: in this box will go an address. And what does that address refer to? Apparently, a char. >> But we could absolutely have int* and other things. But for now, char* is really the most straightforward one, and the one of interest. This problem is going to arise again, though. >> Suppose I open up this program. Let's see if now we can predict what's wrong with this code. So in this program, copy-0, I'm going to go ahead and again call
GetString and store the value in s. >> And then, why am I doing this, just as a reminder from weeks past? We did say that GetString sometimes returns NULL. What does it mean if GetString returns NULL? Something went wrong. It probably means the string is too big, or the computer's out of memory. It happens super, super, super rarely, but it could happen. We want to check for it, and that's all we're doing. >> Because we'll see now, if you don't start checking habitually for things
like null, you might actually start to go to addresses in memory that are invalid. and you're going to start inducing more and more segmentation faults. or in a mac or a pc, just causing a computer to hang or a program to freeze, potentially. >> so now, i claim in copy-0.c, that i am going to copy these strings by way of line 28. and then, i'm going to claim at the bottom here that i'm going to change one of them.
>> so notice this. i'm calling our old friend strlen. and just explain in english what this line 34 is doing? what does t bracket 0 represent on the left? >> audience: first character of t? david j. malan: first character of t. that's it. first character of t, i want to assign the uppercase version of the first character in t.
so this is capitalizing the first letter. and then, the very last thing i do in this program is i claim here's the original, s, and here's the copy, t. >> but based on the story we just told about what strings really are, what is line 28 really doing, and what is the resulting bug going to be on the screen? >> so first, the first question, 28. what is string t = s really doing? if we have on the left-hand side here string t = s;
that gives me one box here and one box here. and suppose this address is 0x, let's say, 50 this time, arbitrarily. what does string t = s do underneath the hood? >> david j. malan: it stores the memory address there, so 0x50 goes there. so if now, i go to the first character in t and uppercase it, what am i effectively doing to s? i'm really doing the same thing, right? because if address 0x50-- and just, i don't have much room on the board here, but assume that this is 0x50 down here, somewhere in my computer's memory.
>> and i have, for instance, gabe in lowercase here, like this. and i have said t bracket 0 gets capitalized. well, t bracket 0 is the first letter in t. so little g is going to become big g. but the problem is, what does s also point to? >> audience: the same. >> david j. malan: the same exact thing. so a simple explanation perhaps, even if the syntax is a little weird. so let's do this.
make copy-0 and then ./copy-0. and unfortunately, both of them have now been capitalized, but for that underlying reason that we're simply now dealing with addresses. >> so how do we begin to address-- no pun intended-- how do we begin to address this particular problem? well, in copy-1.c, things are going to get a little more complicated. but i would claim a conceptually simple solution. >> so hard to get at first glance.
not going to be easy for the first time you type it out, perhaps, but if the problem is that simply doing t = s just copies the address, what, again if i can pick on you, is going to be the solution for actually copying a string? >> audience: we'll probably use a loop again. >> david j. malan: yeah. so we're going to need a loop again. because if we want to copy a string s into another string, we probably want to do it character by character.
but the problem is, if this is originally s, now we need to start explicitly allocating memory for t. >> in other words, let's redraw this one last time. if this is string s = getstring. and let's put this up here, as well. this is getstring. and then, the picture for something like that is going to be as before, g-a-b-e-\0. that looks a little something like this.
and s therefore, we call this 0x50, and that's going to be 51, 52. >> so this is 0x50. and then, i do string t. in memory, that's just going to give me a little square like this. so what's the key step now? if i want to copy s into t, what blank do we need to fill in here? or what do we need to do at a high level? yeah? someone?
>> audience: we need to [inaudible]. david j. malan: yeah, we need to fill in this blank. i can't copy and then capitalize gabe's name until i ask the operating system for another chunk of memory that's at least as big as the original. so that leaves us with a question. >> how do i ask the operating system not just for a simple little pointer-- as this is called, an address, a pointer-- not for a simple little box like this called a string?
how do i ask the operating system for a big chunk of memory? thus far, i've only gotten that back indirectly by calling getstring. so how is getstring even getting its memory? >> well, it turns out that there's this other function here that we'll now start to use. now, this looks way more cryptic than-- and i am the only one who can see it-- this line looks way more cryptic than it should at first glance. but let's tease it apart. >> on the left-hand side, i have char* t.
so in english, let's start to formulate proper sentences in technical jargon. so this is allocating a variable of type char* called t. now, what does that really mean? >> well, that means, what am i going to put in this variable called t? an address of a char. so that's just the simpler, more reasonable way of describing the left-hand side. so that creates this box here only. so the right-hand side, presumably, is going
to allocate that bigger chunk of memory how? so let's tease this apart. >> it's overwhelming at first glance, but what's going on inside here? first, there's malloc, which is apparently our new friend, "memory allocate." so this is the argument being passed into it, so it's a pretty big argument. >> strlen of s, of course, represents the-- audience: the number of characters. david j. malan: just the number of characters in s.
so the length of s, the original string. so g-a-b-e. so it's probably four in this case. why am i doing +1 after calling strlen of s? david j. malan: for that special null character. if you ask me what's the length of gabe's name, i am going to say four. underneath the hood, though, i need that fifth byte for the null character. so that's why i'm doing the +1. >> now just in case you are running this program on a computer other than, say,
the cs50 appliance, where the size of a char might be different from my own computer-- turns out that i can call this operator sizeof, just ask the computer, what is the size of a char on this computer? >> and by multiplying five in this example by the size of a char, which on most computers will just be one, malloc is going to allocate for me this big chunk of memory over here on the right. and it's going to return-- it is a function-- so it's going to return to me what?
audience: the address? david j. malan: the address of what? audience: of the memory it allocated? david j. malan: of the memory it allocated. so i have no idea, frankly, where this is going to end up. i'm going to propose that it's going to end up at 0x88. completely arbitrary, but somewhere other than 0x50, because the operating system, what windows and mac os do for me, is make sure that it's giving me different chunks of ram.
>> so this is the value where this chunk of memory might end up. so this is what ends up in here, 0x88. so now clearly, i can understand that this is not the same as this, because they're pointing at different chunks of memory. so if i now actually want to copy this in, let's do your proposed solution. >> let's just go, create a for loop, and do t bracket i gets s bracket i. because now i can use this array-like notation, because even though malloc very generically allocates me memory, memory is just contiguous bytes.
byte, byte, byte, back to back to back. >> i can surely as a programmer treat it as an array, which means i can use this finally familiar notation of just some square brackets. >> so let me pause there, because this is a lot all at once, even though the basic idea to recap is that string, all this time, is not a new data type per se. it's just a so-called pointer, an address of a character, which just means it's a number that by human convention we tend to write as 0x something.
>> but it's just a number, like 33 oxford street, which happens to be the cs building's address. any questions on these details? >> audience: why do we check for t equal to null? >> david j. malan: why do we check for t equal to null? if we read the documentation-- great question-- for malloc, it's going to say in fine print, sometimes malloc might return null, just like getstring. and indeed, getstring returns null if, in turn, malloc returns null,
because getstring uses malloc. >> and that might happen if the os, mac os, windows, whatever, is simply out of memory for you. so that's what happened there. >> and let me reveal one other thing that might just blow your mind or completely be too far over the line. but let me pull up the same for loop for copying, which a moment ago, recall, was this: t bracket i gets s bracket i. >> nice and user-friendly.
feels like week two again. but this version actually can be rewritten as this, which looks cryptic. it's a technique called pointer arithmetic, address arithmetic. but why does this work? >> now annoyingly, the authors of c decided to use the * symbol for different purposes. we've seen it used once already, char*, which means "give me a variable that's going to contain the address of a char." so char* in that context means "give me a variable."
>> unfortunately, if you use the * without a word in front of it, like char, it's now called the dereference operator. and we'll see more of this before long. but it just means "go there." it's like saying, if someone handed me on a piece of paper "33 oxford street," if i do "*33 oxford street," that means "go down the road to the cs building." >> so * just means go there if there's no word in front of it. so what is t, to be clear? t is the address of the chunk of memory that was given back to me.
s is the address of what, to be clear, in the example we've been discussing, of lowercase gabe? s is the address of-- audience: the string. david j. malan: of gabe's original name. so it's the address of this chunk of memory. so if i say t + i-- i, notice, is just our old friend. it's just an index variable that's iterating from zero on up to the length of the string s.
so it's going to be zero, then one, then two, then three, then four. so let's assemble these new scratch-like puzzle pieces, if you will, even though, again, the syntax is far more arcane than scratch. so t is an address, and t + i is going to give me a number, because these are all numbers that we've been drawing as hex. but they're just numbers. >> so if the address in t, we said, was 0x88, what's 0x88 plus zero? even if you're not comfortable with hex yet, take a guess. >> audience: the original.
>> david j. malan: still 0x88. so what does *0x88 mean? it means "go there," which means, effectively, "put your finger here." and now on the right-hand side of this expression, * and then in parens, s + i means s, which is the address up here of the little g. s + 0 is, of course, s, whatever s is. >> so now, it's *s, which just like *33 oxford street means go to the address s. so here's this finger, right hand.
so what am i going to copy into what? the thing on the right, which is gabe, little g here, into here. >> and so the effect of that first iteration of the loop, as you proposed, even though it looks crazy more complicated than anything we've seen before, is simply saying go here and copy that character here. it's giving you a map to both locations. >> and we'll see far more of this. but for now, the hope is just to introduce some of these basic ideas. and indeed, let's look at one final program here,
and then the promised claymation, which will make everything all right. so let me open up-- there we go. so let me-- we'll come back to this picture before long. let me open up this final example here. >> so here is a super, super program that accomplishes nothing in life that does the following. it first declares two variables, x and y, that are not numbers this time, per se. they're not integers, per se.
they are apparently int*. so just anyone, what does it mean if your data type, your variable, is of type int*? that's the address of an int. >> so i've no idea where it is yet. it just means "put, eventually, the address of an int here." 0x50, 0x88, wherever it is in memory, an address is going there. and that's what y is going to be, as well. >> if i now say x = malloc(sizeof(int)), this is a fancy way of saying,
hey operating system, via malloc, give me enough memory for the size of an int, which is probably going to be 32 bits or four bytes. >> so what does malloc return? malloc returns an address. so what's going to get stored in x? the address of the chunk of memory, the four bytes, that malloc just found for me by asking the operating system. >> now meanwhile, line four here, the *x = 42. just to be clear, what's going down there?
on the left-hand side, *x. that's like *33 oxford street. so *x means what? >> audience: go to. >> david j. malan: go to that address. wherever that chunk of memory is, go to it. and put what there, obviously? audience: 42. david j. malan: 42.
all right, *y, same idea. go to the address in y. put the number 13 there, but what is y at the moment? audience: there is no memory for y. david j. malan: there is no memory for y. so what does y probably contain, as we've been saying? >> audience: garbage. >> david j. malan: some garbage value. now, garbage value is still a number.
it can still be mistaken for an address. it's as though someone scribbled something down, and i misinterpreted it as meaning some building down the street. and if you just try to go into some building you don't own, or some chunk of memory you haven't been given, bad things might happen. computer might crash, or some other undefined behavior might happen. >> so the intro, then, to binky is this. i still remember, 20 some odd years later, where i was when i finally understood pointers.
>> which is to say, if you leave here in three minutes and think i don't understand pointers, realize i have remembered for 20 years for some crazy reason when and why it finally sunk in, sitting with my teaching fellow, nishat mehta, in the back of eliot dining hall. now, i've remembered this because this was one of the topics i, in particular, struggled with. and then, it finally clicked, like i dare say a lot of topics eventually will.
and now, to make that feel all the happier and all the more convincing, let's take a final look in our last three minutes here at binky, from our friend, nick parlante from stanford. >> [video playback] >> -hey, binky. wake up! it's time for pointer fun. -learn about pointers? oh, goody!
>> -well, to get started, i guess we're going to need a couple pointers. -this code allocates two pointers, which can point to integers. -well, i see the two pointers, but they don't seem to be pointing to anything. >> -that's right. initially, pointers don't point to anything. the things they point to are called pointees, and setting them up's a separate step. >> -oh, right, right. i knew that.
the pointees are separate. er, so how do you allocate a pointee? -well, this code allocates a new integer pointee, and this part sets x to point to it. >> -hey, that looks better. so make it do something. -i'll dereference the pointer x to store the number 42 into its pointee. for this trick, i'll need my magic wand of dereferencing. >> -your magic wand of dereferencing?
that-- that's great. >> -this is what the code looks like. i'll just set up the number, and [pop] >> -hey, look. there it goes. >> -so doing a dereference on x follows the arrow to access its pointee. in this case, to store 42 in there. hey, try using it to store the number 13 through the other pointer, y. -i'll just go over here to y, and get the number 13 set up.
and then, take the wand of dereferencing and just [buzz] >> -oh! >> -oh, hey! that didn't work. say, binky, i don't think dereferencing y is a good idea, because you know, setting up the pointee is a separate step. and i don't think we ever did it. >> -good point. -yeah.
we allocated the pointer y, but we never set it to point to a pointee. >> -very observant. -hey, you're looking good there, binky. can you fix it so that y points to the same pointee as x? >> -sure. i'll use my magic wand of pointer assignment. >> -is that going to be a problem like before? -no. this doesn't touch the pointees.
it just changes one pointer to point to the same thing as another. >> -oh, i see. now y points to the same place as x. so wait. now, y is fixed. it has a pointee. so you can try the wand of dereferencing again to send the 13 over. >> -uh, ok. here it goes. [pop]
>> -hey, look at that. now dereferencing works on y. and because the pointers are sharing that one pointee, they both see the 13. sharing, whatever. so are we going to switch places now? >> oh, look. we're out of time. >> -but-- >> -just remember the three pointer rules.
number one, the basic structure is that you have a pointer, and it points over to a pointee. but the pointer and pointee are separate, and the common error is to set up a pointer, but to forget to give it a pointee. >> number two, pointer dereferencing starts at the pointer and follows its arrow over to access its pointee. as we all know, this only works if there is a pointee, which kind of gets back to rule number one.
>> number three, pointer assignment takes one pointer and changes it to point to the same pointee as another pointer. so after the assignment, the two pointers will point to the same pointee. sometimes, that's called sharing. and that's all there is to it, really. bye-bye now. >> david j. malan: that's it for cs50. we will see you next week.