Course-v1:MIT+MIT-500+en/en/block-v1:MIT+MIT-500+en+type@video+block@c93eae50a2cf457781994e14a4fc8597: Difference between revisions

From MLEB Master
Edly-user
update_content

Revision as of 13:05, 4 August 2022

@metadata
sourceLanguage"en"
priorityLanguages
"fr"
allowOnlyPriorityLanguagestrue
display_name"Welcome to the Data Engineering Nanodegree Program"
subtitle-80-2399-1"we all use smartphones but have you ever"
subtitle-2399-4480-2"wondered how much data it generates in"
subtitle-4480-5839-3"the form of texts"
subtitle-5839-9120-4"phone calls emails photos videos"
subtitle-9120-12000-5"searches and music approximately 40"
subtitle-12000-14240-6"exabytes of data get generated every"
subtitle-14240-14880-7"month"
subtitle-14880-18160-8"by a single smartphone user now imagine"
subtitle-18160-20480-9"this number multiplied by 5 billion"
subtitle-20480-21920-10"smartphone users"
subtitle-21920-24160-11"that's a lot for our minds to even process"
subtitle-24160-24960-12"isn't it"
subtitle-24960-27199-13"in fact this amount of data is quite a"
subtitle-27199-29199-14"lot for traditional computing systems to"
subtitle-29199-30000-15"handle"
subtitle-30000-32000-16"and this massive amount of data is what"
subtitle-32000-33200-17"we term"
subtitle-33200-36079-18"big data let's have a look at the data"
subtitle-36079-38559-19"generated per minute on the internet"
subtitle-38559-41840-20"2.1 million snaps are shared on Snapchat"
subtitle-41840-44320-21"3.8 million search queries are made on"
subtitle-44320-45039-22"Google"
subtitle-45039-47680-23"one million people log on to Facebook"
subtitle-47680-50239-24"4.5 million videos are watched on"
subtitle-50239-51399-25"YouTube"
subtitle-51399-54160-26"188 million emails are sent"
subtitle-54160-57039-27"that's a lot of data so how do you"
subtitle-57039-58000-28"classify any"
subtitle-58000-60559-29"data as big data this is possible with"
subtitle-60559-61440-30"the concept of"
subtitle-61440-65119-31"five V's volume velocity"
subtitle-65119-69040-32"variety veracity and value"
subtitle-69040-70880-33"let us understand this with an example"
subtitle-70880-72720-34"from the health care industry"
subtitle-72720-75040-35"hospitals and clinics across the world"
subtitle-75040-76159-36"generate massive"
subtitle-76159-79600-37"volumes of data 2,314"
subtitle-79600-82080-38"exabytes of data are collected annually"
subtitle-82080-84080-39"in the form of patient records and test"
subtitle-84080-85040-40"results"
subtitle-85040-87119-41"all this data is generated at a very"
subtitle-87119-88880-42"high speed which accounts for the"
subtitle-88880-91119-43"velocity of big data"
subtitle-91119-94000-44"variety refers to the various data types"
subtitle-94000-95280-45"such as structured"
subtitle-95280-98079-46"semi-structured and unstructured data"
subtitle-98079-99200-47"examples include"
subtitle-99200-103200-48"Excel records log files and x-ray images"
subtitle-103200-105119-49"accuracy and trustworthiness of the"
subtitle-105119-107119-50"generated data is referred to as"
subtitle-107119-110000-51"veracity analyzing all this data will"
subtitle-110000-112240-52"benefit the medical sector by enabling"
subtitle-112240-114320-53"faster disease detection"
subtitle-114320-117439-54"better treatment and reduced cost"
subtitle-117439-120560-55"this is known as the value of big data"
subtitle-120560-123040-56"but how do we store and process this big"
subtitle-123040-123840-57"data"
subtitle-123840-125680-58"to do this job we have various"
subtitle-125680-127680-59"frameworks such as Cassandra"
subtitle-127680-130800-60"Hadoop and Spark let us take Hadoop as"
subtitle-130800-132000-61"an example"
subtitle-132000-134560-62"and see how Hadoop stores and processes"
subtitle-134560-136000-63"big data"
subtitle-136000-138720-64"Hadoop uses a distributed file system"
subtitle-138720-141599-65"known as the Hadoop Distributed File System (HDFS)"
subtitle-141599-143920-66"to store big data if you have a huge"
subtitle-143920-146080-67"file it will be broken down into"
subtitle-146080-147200-68"smaller chunks"
subtitle-147200-149840-69"and stored on various machines not only"
subtitle-149840-151519-70"that when you break the file"
subtitle-151519-153680-71"you also make copies of each chunk which go"
subtitle-153680-154959-72"into different nodes"
subtitle-154959-156879-73"this way you store your big data in a"
subtitle-156879-158239-74"distributed way"
subtitle-158239-160239-75"and make sure that even if one machine"
subtitle-160239-164400-76"fails your data is safe on another"
subtitle-164400-166800-77"the MapReduce technique is used to process"
subtitle-166800-167760-78"big data"
subtitle-167760-170560-79"a lengthy task A is broken into smaller"
subtitle-170560-171760-80"tasks"
subtitle-171760-176239-81"B C and D now instead of one machine"
subtitle-176239-178720-82"three machines each take up one task and"
subtitle-178720-180800-83"complete it in a parallel fashion"
subtitle-180800-182879-84"and assemble the results at the end"
subtitle-182879-185120-85"thanks to this the processing becomes"
subtitle-185120-188159-86"easy and fast this is known as parallel"
subtitle-188159-190640-87"processing"
subtitle-190640-192400-88"now that we have stored and processed"
subtitle-192400-193840-89"our big data we can"
subtitle-193840-195519-90"analyze this data for numerous"
subtitle-195519-196879-91"applications"
subtitle-196879-199920-92"in games like Halo 3 and Call of Duty"
subtitle-199920-202480-93"designers analyze user data to"
subtitle-202480-204480-94"understand at which stage most of the"
subtitle-204480-205760-95"users pause"
subtitle-205760-208720-96"restart or quit playing this insight can"
subtitle-208720-210560-97"help them rework the storyline of"
subtitle-210560-211200-98"the game"
subtitle-211200-213840-99"and improve the user experience which in"
subtitle-213840-216799-100"turn reduces the customer churn rate"
subtitle-216799-219120-101"similarly big data also helped with"
subtitle-219120-221120-102"disaster management during Hurricane"
subtitle-221120-222720-103"Sandy in 2012"
subtitle-222720-224000-104"it was used to gain a better"
subtitle-224000-225920-105"understanding of the storm's effect on"
subtitle-225920-227680-106"the East Coast of the U.S."
subtitle-227680-230000-107"and necessary measures were taken it"
subtitle-230000-231920-108"could predict the hurricane's landfall"
subtitle-231920-233200-109"five days in advance"
subtitle-233200-235680-110"which wasn't possible earlier these are"
subtitle-235680-237599-111"some of the clear indications of how"
subtitle-237599-239360-112"valuable big data can be"
subtitle-239360-241519-113"once it is accurately processed and"
subtitle-241519-242799-114"analyzed"
subtitle-242799-244879-115"so here's a question for you which of"
subtitle-244879-246799-116"the following statements is not correct"
subtitle-246799-248959-117"about the Hadoop Distributed File System"
subtitle-248959-252319-118"(HDFS) (a) HDFS"
subtitle-252319-255599-119"is the storage layer of Hadoop (b) data"
subtitle-255599-257759-120"gets stored in a distributed manner in"
subtitle-257759-259440-121"HDFS"
subtitle-259440-262800-122"(c) HDFS performs parallel processing of"
subtitle-262800-264000-123"data"
subtitle-264000-266720-124"(d) smaller chunks of data are stored on"
subtitle-266720-268080-125"multiple data nodes in"
subtitle-268080-270880-126"HDFS give it a thought and leave your"
subtitle-270880-272960-127"answers in the comment section below"
subtitle-272960-275280-128"three lucky winners will receive Amazon"
subtitle-275280-276560-129"gift vouchers"
subtitle-276560-278240-130"now that you have learned what big data"
subtitle-278240-280080-131"is what do you think will be the most"
subtitle-280080-280880-132"significant"
subtitle-280880-283360-133"impact of big data in the future let us"
subtitle-283360-285040-134"know in the comments below"
subtitle-285040-287040-135"if you enjoyed this video it would only"
subtitle-287040-289759-136"take a few seconds to like and share it"
subtitle-289759-291759-137"and to subscribe to our channel if you"
subtitle-291759-294080-138"haven't yet and hit the bell icon to get"
subtitle-294080-295759-139"instant notifications about our new"
subtitle-295759-296720-140"content"
subtitle-296720-311520-141"stay tuned and keep learning"
subtitle-311520-313600-142"you"
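The video explains how HDFS stores data: a huge file is broken into smaller chunks, each chunk is copied to several nodes, and the data survives the loss of one machine. The minimal, single-process Python sketch below illustrates that idea. It is a hypothetical model, not Hadoop's API. The functions `split_into_blocks`, `store` and `read`, the in-memory "nodes", the simple round-robin placement, and the 16-byte block size (chosen only to keep the output readable) are all assumptions. Real HDFS uses much larger blocks (128 MB by default in recent versions) and a default replication factor of 3. Its NameNode also places replicas using a rack-aware policy.

```python
# Toy model of HDFS-style storage (hypothetical, not Hadoop's API):
# split a file into fixed-size blocks and keep several copies of each block
# on different in-memory "nodes", using simple round-robin placement.


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Break the file contents into chunks of at most `block_size` bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: list[str], block_size: int,
          replication: int) -> tuple[dict[str, dict[int, bytes]], int]:
    """Write every block to `replication` distinct nodes.

    Returns the per-node storage map and the number of blocks in the file.
    """
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication must be between 1 and the number of nodes")
    storage: dict[str, dict[int, bytes]] = {node: {} for node in nodes}
    blocks = split_into_blocks(data, block_size)
    for block_id, block in enumerate(blocks):
        # Start each block on a different node, then put the copies on the
        # following nodes in the ring, so no node holds two copies of a block.
        first = block_id % len(nodes)
        for k in range(replication):
            storage[nodes[(first + k) % len(nodes)]][block_id] = block
    return storage, len(blocks)


def read(storage: dict[str, dict[int, bytes]], num_blocks: int,
         failed: set[str]) -> bytes:
    """Reassemble the file, reading each block from any node that is still up."""
    parts = []
    for block_id in range(num_blocks):
        for node, held in storage.items():
            if node not in failed and block_id in held:
                parts.append(held[block_id])
                break
        else:  # no live node holds a copy of this block
            raise IOError(f"block {block_id} lost: every replica is on a failed node")
    return b"".join(parts)


if __name__ == "__main__":
    data = b"big data is stored in blocks across many machines"
    nodes = ["node1", "node2", "node3", "node4"]
    storage, n = store(data, nodes, block_size=16, replication=3)
    print(n)                         # 4 blocks (16 + 16 + 16 + 1 bytes)
    print(sorted(storage["node1"]))  # [0, 2, 3]
    # One machine fails, but every block still has a live copy elsewhere.
    print(read(storage, n, failed={"node2"}) == data)  # True
```

Reading with `failed={"node2"}` still returns the full file because every block has two other copies. In this example, if three of the four nodes fail, some block loses all of its replicas and `read` raises an error. That is the limit a replication factor of 3 sets.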
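The video's MapReduce description fits the classic word-count example. A lengthy task A is split into smaller tasks B, C and D, three machines run them in parallel, and the results are assembled at the end. The sketch below is a hypothetical illustration, not Hadoop's Java API. `map_task`, `shuffle`, `reduce_task` and `word_count` are made-up names, and a thread pool stands in for the separate machines a Hadoop cluster would use.

```python
# Toy MapReduce word count (hypothetical names, not Hadoop's API):
# map tasks run concurrently on separate chunks, their (key, value) pairs are
# grouped by key (the "shuffle"), and reduce tasks combine each group.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for word in chunk.lower().split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    """Shuffle: gather the values emitted by all map tasks under their key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for word, count in pairs:
            groups[word].append(count)
    return groups


def reduce_task(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(chunks: list[str], workers: int = 3) -> dict[str, int]:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One map task per chunk, run concurrently (tasks B, C and D).
        mapped = list(pool.map(map_task, chunks))
        grouped = shuffle(mapped)
        # Reduce each key's group, then assemble the final result.
        return dict(pool.map(reduce_task, grouped.keys(), grouped.values()))


if __name__ == "__main__":
    chunks = ["big data big", "data needs hadoop", "hadoop and spark"]
    print(word_count(chunks))
    # {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'and': 1, 'spark': 1}
```

Each chunk plays the role of one stored block handed to its own map task. Because of CPython's global interpreter lock, the threads here show the structure of the computation rather than a real speed-up. A cluster gets its speed-up by running map and reduce tasks on different machines, close to where the blocks are stored.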