Course-v1:MIT+MIT-500+en/en/block-v1:MIT+MIT-500+en+type@video+block@c93eae50a2cf457781994e14a4fc8597: Difference between revisions
(One intermediate revision by the same user not shown)
Previous revision: {"@metadata": {"sourceLanguage": "en", "priorityLanguages": ["fr"], "allowOnlyPriorityLanguages": true}, "display_name": "Welcome to the Data Engineering Nanodegree Program"}
Latest revision: {"@metadata": {"sourceLanguage": "en", "priorityLanguages": ["fr", "ar"], "allowOnlyPriorityLanguages": true, "description": "video in Data Engineering - "}, "display_name": "Welcome to the Data Engineering Nanodegree Program"}, followed by the subtitle entries "subtitle-80-2399-1" through "subtitle-311520-313600-142" listed in the table below.
Latest revision as of 14:06, 17 May 2023
@metadata | {"sourceLanguage": "en", "priorityLanguages": ["fr", "ar"], "allowOnlyPriorityLanguages": true, "description": "video in Data Engineering - "}
---|---
display_name | "Welcome to the Data Engineering Nanodegree Program"
subtitle-80-2399-1 | "we all use smartphones but have you ever"
subtitle-2399-4480-2 | "wondered how much data it generates in"
subtitle-4480-5839-3 | "the form of texts"
subtitle-5839-9120-4 | "phone calls emails photos videos"
subtitle-9120-12000-5 | "searches and music approximately 40"
subtitle-12000-14240-6 | "exabytes of data gets generated every"
subtitle-14240-14880-7 | "month"
subtitle-14880-18160-8 | "by a single smartphone user now imagine"
subtitle-18160-20480-9 | "this number multiplied by 5 billion"
subtitle-20480-21920-10 | "smartphone users"
subtitle-21920-24160-11 | "that's a lot for our mind to even process"
subtitle-24160-24960-12 | "isn't it"
subtitle-24960-27199-13 | "in fact this amount of data is quite a"
subtitle-27199-29199-14 | "lot for traditional computing systems to"
subtitle-29199-30000-15 | "handle"
subtitle-30000-32000-16 | "and this massive amount of data is what"
subtitle-32000-33200-17 | "we term as"
subtitle-33200-36079-18 | "big data let's have a look at the data"
subtitle-36079-38559-19 | "generated per minute on the internet"
subtitle-38559-41840-20 | "2.1 million snaps are shared on snapchat"
subtitle-41840-44320-21 | "3.8 million search queries are made on"
subtitle-44320-45039-22 | "google"
subtitle-45039-47680-23 | "one million people log on to facebook"
subtitle-47680-50239-24 | "4.5 million videos are watched on"
subtitle-50239-51399-25 | "youtube"
subtitle-51399-54160-26 | "188 million emails are sent"
subtitle-54160-57039-27 | "that's a lot of data so how do you"
subtitle-57039-58000-28 | "classify any"
subtitle-58000-60559-29 | "data as big data this is possible with"
subtitle-60559-61440-30 | "the concept of"
subtitle-61440-65119-31 | "five v's volume velocity"
subtitle-65119-69040-32 | "variety veracity and value"
subtitle-69040-70880-33 | "let us understand this with an example"
subtitle-70880-72720-34 | "from the health care industry"
subtitle-72720-75040-35 | "hospitals and clinics across the world"
subtitle-75040-76159-36 | "generate massive"
subtitle-76159-79600-37 | "volumes of data 2,314"
subtitle-79600-82080-38 | "exabytes of data are collected annually"
subtitle-82080-84080-39 | "in the form of patient records and test"
subtitle-84080-85040-40 | "results"
subtitle-85040-87119-41 | "all this data is generated at a very"
subtitle-87119-88880-42 | "high speed which attributes to the"
subtitle-88880-91119-43 | "velocity of big data"
subtitle-91119-94000-44 | "variety refers to the various data types"
subtitle-94000-95280-45 | "such as structured"
subtitle-95280-98079-46 | "semi-structured and unstructured data"
subtitle-98079-99200-47 | "examples include"
subtitle-99200-103200-48 | "excel records log files and x-ray images"
subtitle-103200-105119-49 | "accuracy and trustworthiness of the"
subtitle-105119-107119-50 | "generated data is termed as"
subtitle-107119-110000-51 | "veracity analyzing all this data will"
subtitle-110000-112240-52 | "benefit the medical sector by enabling"
subtitle-112240-114320-53 | "faster disease detection"
subtitle-114320-117439-54 | "better treatment and reduced cost"
subtitle-117439-120560-55 | "this is known as the value of big data"
subtitle-120560-123040-56 | "but how do we store and process this big"
subtitle-123040-123840-57 | "data"
subtitle-123840-125680-58 | "to do this job we have various"
subtitle-125680-127680-59 | "frameworks such as cassandra"
subtitle-127680-130800-60 | "hadoop and spark let us take hadoop as"
subtitle-130800-132000-61 | "an example"
subtitle-132000-134560-62 | "and see how hadoop stores and processes"
subtitle-134560-136000-63 | "big data"
subtitle-136000-138720-64 | "hadoop uses a distributed file system"
subtitle-138720-141599-65 | "known as hadoop distributed file system"
subtitle-141599-143920-66 | "to store big data if you have a huge"
subtitle-143920-146080-67 | "file your file will be broken down into"
subtitle-146080-147200-68 | "smaller chunks"
subtitle-147200-149840-69 | "and stored in various machines not only"
subtitle-149840-151519-70 | "that when you break the file"
subtitle-151519-153680-71 | "you also make copies of it which goes"
subtitle-153680-154959-72 | "into different nodes"
subtitle-154959-156879-73 | "this way you store your big data in a"
subtitle-156879-158239-74 | "distributed way"
subtitle-158239-160239-75 | "and make sure that even if one machine"
subtitle-160239-164400-76 | "fails your data is safe on another"
subtitle-164400-166800-77 | "mapreduce technique is used to process"
subtitle-166800-167760-78 | "big data"
subtitle-167760-170560-79 | "a lengthy task a is broken into smaller"
subtitle-170560-171760-80 | "tasks"
subtitle-171760-176239-81 | "b c and d now instead of one machine"
subtitle-176239-178720-82 | "three machines take up each task and"
subtitle-178720-180800-83 | "complete it in a parallel fashion"
subtitle-180800-182879-84 | "and assemble the results at the end"
subtitle-182879-185120-85 | "thanks to this the processing becomes"
subtitle-185120-188159-86 | "easy and fast this is known as parallel"
subtitle-188159-190640-87 | "processing"
subtitle-190640-192400-88 | "now that we have stored and processed"
subtitle-192400-193840-89 | "our big data we can"
subtitle-193840-195519-90 | "analyze this data for numerous"
subtitle-195519-196879-91 | "applications"
subtitle-196879-199920-92 | "in games like halo 3 and call of duty"
subtitle-199920-202480-93 | "designers analyze user data to"
subtitle-202480-204480-94 | "understand at which stage most of the"
subtitle-204480-205760-95 | "users pause"
subtitle-205760-208720-96 | "restart or quit playing this insight can"
subtitle-208720-210560-97 | "help them rework on the story line of"
subtitle-210560-211200-98 | "the game"
subtitle-211200-213840-99 | "and improve the user experience which in"
subtitle-213840-216799-100 | "turn reduces the customer churn rate"
subtitle-216799-219120-101 | "similarly big data also helped with"
subtitle-219120-221120-102 | "disaster management during hurricane"
subtitle-221120-222720-103 | "sandy in 2012"
subtitle-222720-224000-104 | "it was used to gain a better"
subtitle-224000-225920-105 | "understanding of the storm's effect on"
subtitle-225920-227680-106 | "the east coast of the u.s"
subtitle-227680-230000-107 | "and necessary measures were taken it"
subtitle-230000-231920-108 | "could predict the hurricane's landfall"
subtitle-231920-233200-109 | "five days in advance"
subtitle-233200-235680-110 | "which wasn't possible earlier these are"
subtitle-235680-237599-111 | "some of the clear indications of how"
subtitle-237599-239360-112 | "valuable big data can be"
subtitle-239360-241519-113 | "once it is accurately processed and"
subtitle-241519-242799-114 | "analyzed"
subtitle-242799-244879-115 | "so here's a question for you which of"
subtitle-244879-246799-116 | "the following statements is not correct"
subtitle-246799-248959-117 | "about hadoop distributed file system"
subtitle-248959-252319-118 | "hdfs a hdfs"
subtitle-252319-255599-119 | "is the storage layer of hadoop b data"
subtitle-255599-257759-120 | "gets stored in a distributed manner in"
subtitle-257759-259440-121 | "hdfs"
subtitle-259440-262800-122 | "c hdfs performs parallel processing of"
subtitle-262800-264000-123 | "data"
subtitle-264000-266720-124 | "d smaller chunks of data are stored on"
subtitle-266720-268080-125 | "multiple data nodes in"
subtitle-268080-270880-126 | "hdfs give it a thought and leave your"
subtitle-270880-272960-127 | "answers in the comment section below"
subtitle-272960-275280-128 | "three lucky winners will receive amazon"
subtitle-275280-276560-129 | "gift vouchers"
subtitle-276560-278240-130 | "now that you have learned what big data"
subtitle-278240-280080-131 | "is what do you think will be the most"
subtitle-280080-280880-132 | "significant"
subtitle-280880-283360-133 | "impact of big data in the future let us"
subtitle-283360-285040-134 | "know in the comments below"
subtitle-285040-287040-135 | "if you enjoyed this video it would only"
subtitle-287040-289759-136 | "take a few seconds to like and share it"
subtitle-289759-291759-137 | "also to subscribe to our channel if you"
subtitle-291759-294080-138 | "haven't yet and hit the bell icon to get"
subtitle-294080-295759-139 | "instant notifications about our new"
subtitle-295759-296720-140 | "content"
subtitle-296720-311520-141 | "stay tuned and keep learning"
subtitle-311520-313600-142 | "you"
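
The transcript describes two ideas: a large file is broken into chunks that are copied onto several nodes, and a lengthy task is split into smaller tasks that run in parallel before the results are assembled. The following is a minimal Python sketch of both, not Hadoop's actual implementation. The function names, the node names, and the round-robin placement policy are assumptions made for illustration, and threads on one machine stand in for the separate machines mentioned in the video.

```python
from __future__ import annotations

from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(lines: list[str], chunk_size: int) -> list[list[str]]:
    """Break a large list of lines into chunks of at most `chunk_size` lines."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def place_replicas(num_chunks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign every chunk to `replication` distinct nodes, round-robin.

    Hypothetical placement policy for illustration only; it is not how
    HDFS chooses nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    return {
        chunk_id: [nodes[(chunk_id + r) % len(nodes)] for r in range(replication)]
        for chunk_id in range(num_chunks)
    }


def map_word_count(chunk: list[str]) -> Counter:
    """Map step: count the words of one chunk, independently of other chunks."""
    counts: Counter = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def map_reduce_word_count(chunks: list[list[str]], workers: int = 3) -> Counter:
    """Run the map step in parallel, then merge the partial counts (reduce step)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_word_count, chunks))
    return reduce(lambda left, right: left + right, partial_counts, Counter())


if __name__ == "__main__":
    lines = ["big data big", "data is big", "hadoop stores data", "spark processes data"]
    chunks = split_into_chunks(lines, chunk_size=2)
    # Each chunk lives on two nodes, so losing one node does not lose the chunk.
    print(place_replicas(len(chunks), ["node-a", "node-b", "node-c"], replication=2))
    print(map_reduce_word_count(chunks))
```

Because every chunk is placed on more than one node, a single node failure leaves a copy available elsewhere, which is the safety property the transcript describes. The word count is only a stand-in for the "lengthy task"; any job whose partial results can be merged would fit the same map-then-reduce shape.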