Editing Course-v1:MIT+MIT-500+en/en/block-v1:MIT+MIT-500+en+type@video+block@c93eae50a2cf457781994e14a4fc8597



Latest revision Your text
Line 1: Line 1:
{"@metadata": {"sourceLanguage": "en", "priorityLanguages": ["fr", "ar"], "allowOnlyPriorityLanguages": true, "description": "video in Data Engineering - "}, "display_name": "Welcome to the Data Engineering Nanodegree Program", "subtitle-80-2399-1": "we all use smartphones but have you ever", "subtitle-2399-4480-2": "wondered how much data it generates in", "subtitle-4480-5839-3": "the form of texts", "subtitle-5839-9120-4": "phone calls emails photos videos", "subtitle-9120-12000-5": "searches and music approximately 40", "subtitle-12000-14240-6": "exabytes of data gets generated every", "subtitle-14240-14880-7": "month", "subtitle-14880-18160-8": "by a single smartphone user now imagine", "subtitle-18160-20480-9": "this number multiplied by 5 billion", "subtitle-20480-21920-10": "smartphone users", "subtitle-21920-24160-11": "that's a lot for our mind even process", "subtitle-24160-24960-12": "isn't it", "subtitle-24960-27199-13": "in fact this amount of data is quite a", "subtitle-27199-29199-14": "lot for traditional computing systems to", "subtitle-29199-30000-15": "handle", "subtitle-30000-32000-16": "and this massive amount of data is what", "subtitle-32000-33200-17": "we term as", "subtitle-33200-36079-18": "big data let's have a look at the data", "subtitle-36079-38559-19": "generated per minute on the internet", "subtitle-38559-41840-20": "2.1 million snaps are shared on snapchat", "subtitle-41840-44320-21": "3.8 million search queries are made on", "subtitle-44320-45039-22": "google", "subtitle-45039-47680-23": "one million people log on to facebook", "subtitle-47680-50239-24": "4.5 million videos are watched on", "subtitle-50239-51399-25": "youtube", "subtitle-51399-54160-26": "188 million emails are sent", "subtitle-54160-57039-27": "that's a lot of data so how do you", "subtitle-57039-58000-28": "classify any", "subtitle-58000-60559-29": "data as big data this is possible with", "subtitle-60559-61440-30": "the concept of", "subtitle-61440-65119-31": 
"five v's volume velocity", "subtitle-65119-69040-32": "variety veracity and value", "subtitle-69040-70880-33": "let us understand this with an example", "subtitle-70880-72720-34": "from the health care industry", "subtitle-72720-75040-35": "hospitals and clinics across the world", "subtitle-75040-76159-36": "generate massive", "subtitle-76159-79600-37": "volumes of data 2 314", "subtitle-79600-82080-38": "exabytes of data are collected annually", "subtitle-82080-84080-39": "in the form of patient records and test", "subtitle-84080-85040-40": "results", "subtitle-85040-87119-41": "all this data is generated at a very", "subtitle-87119-88880-42": "high speed which attributes to the", "subtitle-88880-91119-43": "velocity of big data", "subtitle-91119-94000-44": "variety refers to the various data types", "subtitle-94000-95280-45": "such as structured", "subtitle-95280-98079-46": "semi-structured and unstructured data", "subtitle-98079-99200-47": "examples include", "subtitle-99200-103200-48": "excel records log files and x-ray images", "subtitle-103200-105119-49": "accuracy and trustworthiness of the", "subtitle-105119-107119-50": "generated data is termed as", "subtitle-107119-110000-51": "veracity analyzing all this data will", "subtitle-110000-112240-52": "benefit the medical sector by enabling", "subtitle-112240-114320-53": "faster disease detection", "subtitle-114320-117439-54": "better treatment and reduced cost", "subtitle-117439-120560-55": "this is known as the value of big data", "subtitle-120560-123040-56": "but how do we store and process this big", "subtitle-123040-123840-57": "data", "subtitle-123840-125680-58": "to do this job we have various", "subtitle-125680-127680-59": "frameworks such as cassandra", "subtitle-127680-130800-60": "hadoop and spark let us take hadoop as", "subtitle-130800-132000-61": "an example", "subtitle-132000-134560-62": "and see how hadoop stores and processes", "subtitle-134560-136000-63": "big data", 
"subtitle-136000-138720-64": "hadoop uses a distributed file system", "subtitle-138720-141599-65": "known as hadoop distributed file system", "subtitle-141599-143920-66": "to store big data if you have a huge", "subtitle-143920-146080-67": "file your file will be broken down into", "subtitle-146080-147200-68": "smaller chunks", "subtitle-147200-149840-69": "and stored in various machines not only", "subtitle-149840-151519-70": "that when you break the file", "subtitle-151519-153680-71": "you also make copies of it which goes", "subtitle-153680-154959-72": "into different nodes", "subtitle-154959-156879-73": "this way you store your big data in a", "subtitle-156879-158239-74": "distributed way", "subtitle-158239-160239-75": "and make sure that even if one machine", "subtitle-160239-164400-76": "fails your data is safe on another", "subtitle-164400-166800-77": "mapreduce technique is used to process", "subtitle-166800-167760-78": "big data", "subtitle-167760-170560-79": "a lengthy task a is broken into smaller", "subtitle-170560-171760-80": "tasks", "subtitle-171760-176239-81": "b c and d now instead of one machine", "subtitle-176239-178720-82": "three machines take up each task and", "subtitle-178720-180800-83": "complete it in a parallel fashion", "subtitle-180800-182879-84": "and assemble the results at the end", "subtitle-182879-185120-85": "thanks to this the processing becomes", "subtitle-185120-188159-86": "easy and fast this is known as parallel", "subtitle-188159-190640-87": "processing", "subtitle-190640-192400-88": "now that we have stored and processed", "subtitle-192400-193840-89": "our big data we can", "subtitle-193840-195519-90": "analyze this data for numerous", "subtitle-195519-196879-91": "applications", "subtitle-196879-199920-92": "in games like halo 3 and call of duty", "subtitle-199920-202480-93": "designers analyze user data to", "subtitle-202480-204480-94": "understand at which stage most of the", "subtitle-204480-205760-95": "users pause", 
"subtitle-205760-208720-96": "restart or quit playing this insight can", "subtitle-208720-210560-97": "help them rework on the story line of", "subtitle-210560-211200-98": "the game", "subtitle-211200-213840-99": "and improve the user experience which in", "subtitle-213840-216799-100": "turn reduces the customer churn rate", "subtitle-216799-219120-101": "similarly big data also helped with", "subtitle-219120-221120-102": "disaster management during hurricane", "subtitle-221120-222720-103": "sandy in 2012", "subtitle-222720-224000-104": "it was used to gain a better", "subtitle-224000-225920-105": "understanding of the storm's effect on", "subtitle-225920-227680-106": "the east coast of the u.s", "subtitle-227680-230000-107": "and necessary measures were taken it", "subtitle-230000-231920-108": "could predict the hurricane's landfall", "subtitle-231920-233200-109": "five days in advance", "subtitle-233200-235680-110": "which wasn't possible earlier these are", "subtitle-235680-237599-111": "some of the clear indications of how", "subtitle-237599-239360-112": "valuable big data can be", "subtitle-239360-241519-113": "once it is accurately processed and", "subtitle-241519-242799-114": "analyzed", "subtitle-242799-244879-115": "so here's a question for you which of", "subtitle-244879-246799-116": "the following statements is not correct", "subtitle-246799-248959-117": "about hadoop distributed file system", "subtitle-248959-252319-118": "hdfs a hdfs", "subtitle-252319-255599-119": "is the storage layer of hadoop b data", "subtitle-255599-257759-120": "gets stored in a distributed manner in", "subtitle-257759-259440-121": "hdfs", "subtitle-259440-262800-122": "c hdfs performs parallel processing of", "subtitle-262800-264000-123": "data", "subtitle-264000-266720-124": "d smaller chunks of data are stored on", "subtitle-266720-268080-125": "multiple data nodes in", "subtitle-268080-270880-126": "hdfs give it a thought and leave your", "subtitle-270880-272960-127": 
"answers in the comment section below", "subtitle-272960-275280-128": "three lucky winners will receive amazon", "subtitle-275280-276560-129": "gift vouchers", "subtitle-276560-278240-130": "now that you have learned what big data", "subtitle-278240-280080-131": "is what do you think will be the most", "subtitle-280080-280880-132": "significant", "subtitle-280880-283360-133": "impact of big data in the future let us", "subtitle-283360-285040-134": "know in the comments below", "subtitle-285040-287040-135": "if you enjoyed this video it would only", "subtitle-287040-289759-136": "take a few seconds to like and share it", "subtitle-289759-291759-137": "also to subscribe to our channel if you", "subtitle-291759-294080-138": "haven't yet and hit the bell icon to get", "subtitle-294080-295759-139": "instant notifications about our new", "subtitle-295759-296720-140": "content", "subtitle-296720-311520-141": "stay tuned and keep learning", "subtitle-311520-313600-142": "you"}
{"@metadata": {"sourceLanguage": "en", "priorityLanguages": ["fr"], "allowOnlyPriorityLanguages": true}, "display_name": "Welcome to the Data Engineering Nanodegree Program", "subtitle-80-2399-1": "we all use smartphones but have you ever", "subtitle-2399-4480-2": "wondered how much data it generates in", "subtitle-4480-5839-3": "the form of texts", "subtitle-5839-9120-4": "phone calls emails photos videos", "subtitle-9120-12000-5": "searches and music approximately 40", "subtitle-12000-14240-6": "exabytes of data gets generated every", "subtitle-14240-14880-7": "month", "subtitle-14880-18160-8": "by a single smartphone user now imagine", "subtitle-18160-20480-9": "this number multiplied by 5 billion", "subtitle-20480-21920-10": "smartphone users", "subtitle-21920-24160-11": "that's a lot for our mind even process", "subtitle-24160-24960-12": "isn't it", "subtitle-24960-27199-13": "in fact this amount of data is quite a", "subtitle-27199-29199-14": "lot for traditional computing systems to", "subtitle-29199-30000-15": "handle", "subtitle-30000-32000-16": "and this massive amount of data is what", "subtitle-32000-33200-17": "we term as", "subtitle-33200-36079-18": "big data let's have a look at the data", "subtitle-36079-38559-19": "generated per minute on the internet", "subtitle-38559-41840-20": "2.1 million snaps are shared on snapchat", "subtitle-41840-44320-21": "3.8 million search queries are made on", "subtitle-44320-45039-22": "google", "subtitle-45039-47680-23": "one million people log on to facebook", "subtitle-47680-50239-24": "4.5 million videos are watched on", "subtitle-50239-51399-25": "youtube", "subtitle-51399-54160-26": "188 million emails are sent", "subtitle-54160-57039-27": "that's a lot of data so how do you", "subtitle-57039-58000-28": "classify any", "subtitle-58000-60559-29": "data as big data this is possible with", "subtitle-60559-61440-30": "the concept of", "subtitle-61440-65119-31": "five v's volume velocity", "subtitle-65119-69040-32": 
"variety veracity and value", "subtitle-69040-70880-33": "let us understand this with an example", "subtitle-70880-72720-34": "from the health care industry", "subtitle-72720-75040-35": "hospitals and clinics across the world", "subtitle-75040-76159-36": "generate massive", "subtitle-76159-79600-37": "volumes of data 2 314", "subtitle-79600-82080-38": "exabytes of data are collected annually", "subtitle-82080-84080-39": "in the form of patient records and test", "subtitle-84080-85040-40": "results", "subtitle-85040-87119-41": "all this data is generated at a very", "subtitle-87119-88880-42": "high speed which attributes to the", "subtitle-88880-91119-43": "velocity of big data", "subtitle-91119-94000-44": "variety refers to the various data types", "subtitle-94000-95280-45": "such as structured", "subtitle-95280-98079-46": "semi-structured and unstructured data", "subtitle-98079-99200-47": "examples include", "subtitle-99200-103200-48": "excel records log files and x-ray images", "subtitle-103200-105119-49": "accuracy and trustworthiness of the", "subtitle-105119-107119-50": "generated data is termed as", "subtitle-107119-110000-51": "veracity analyzing all this data will", "subtitle-110000-112240-52": "benefit the medical sector by enabling", "subtitle-112240-114320-53": "faster disease detection", "subtitle-114320-117439-54": "better treatment and reduced cost", "subtitle-117439-120560-55": "this is known as the value of big data", "subtitle-120560-123040-56": "but how do we store and process this big", "subtitle-123040-123840-57": "data", "subtitle-123840-125680-58": "to do this job we have various", "subtitle-125680-127680-59": "frameworks such as cassandra", "subtitle-127680-130800-60": "hadoop and spark let us take hadoop as", "subtitle-130800-132000-61": "an example", "subtitle-132000-134560-62": "and see how hadoop stores and processes", "subtitle-134560-136000-63": "big data", "subtitle-136000-138720-64": "hadoop uses a distributed file system", 
"subtitle-138720-141599-65": "known as hadoop distributed file system", "subtitle-141599-143920-66": "to store big data if you have a huge", "subtitle-143920-146080-67": "file your file will be broken down into", "subtitle-146080-147200-68": "smaller chunks", "subtitle-147200-149840-69": "and stored in various machines not only", "subtitle-149840-151519-70": "that when you break the file", "subtitle-151519-153680-71": "you also make copies of it which goes", "subtitle-153680-154959-72": "into different nodes", "subtitle-154959-156879-73": "this way you store your big data in a", "subtitle-156879-158239-74": "distributed way", "subtitle-158239-160239-75": "and make sure that even if one machine", "subtitle-160239-164400-76": "fails your data is safe on another", "subtitle-164400-166800-77": "mapreduce technique is used to process", "subtitle-166800-167760-78": "big data", "subtitle-167760-170560-79": "a lengthy task a is broken into smaller", "subtitle-170560-171760-80": "tasks", "subtitle-171760-176239-81": "b c and d now instead of one machine", "subtitle-176239-178720-82": "three machines take up each task and", "subtitle-178720-180800-83": "complete it in a parallel fashion", "subtitle-180800-182879-84": "and assemble the results at the end", "subtitle-182879-185120-85": "thanks to this the processing becomes", "subtitle-185120-188159-86": "easy and fast this is known as parallel", "subtitle-188159-190640-87": "processing", "subtitle-190640-192400-88": "now that we have stored and processed", "subtitle-192400-193840-89": "our big data we can", "subtitle-193840-195519-90": "analyze this data for numerous", "subtitle-195519-196879-91": "applications", "subtitle-196879-199920-92": "in games like halo 3 and call of duty", "subtitle-199920-202480-93": "designers analyze user data to", "subtitle-202480-204480-94": "understand at which stage most of the", "subtitle-204480-205760-95": "users pause", "subtitle-205760-208720-96": "restart or quit playing this insight 
can", "subtitle-208720-210560-97": "help them rework on the story line of", "subtitle-210560-211200-98": "the game", "subtitle-211200-213840-99": "and improve the user experience which in", "subtitle-213840-216799-100": "turn reduces the customer churn rate", "subtitle-216799-219120-101": "similarly big data also helped with", "subtitle-219120-221120-102": "disaster management during hurricane", "subtitle-221120-222720-103": "sandy in 2012", "subtitle-222720-224000-104": "it was used to gain a better", "subtitle-224000-225920-105": "understanding of the storm's effect on", "subtitle-225920-227680-106": "the east coast of the u.s", "subtitle-227680-230000-107": "and necessary measures were taken it", "subtitle-230000-231920-108": "could predict the hurricane's landfall", "subtitle-231920-233200-109": "five days in advance", "subtitle-233200-235680-110": "which wasn't possible earlier these are", "subtitle-235680-237599-111": "some of the clear indications of how", "subtitle-237599-239360-112": "valuable big data can be", "subtitle-239360-241519-113": "once it is accurately processed and", "subtitle-241519-242799-114": "analyzed", "subtitle-242799-244879-115": "so here's a question for you which of", "subtitle-244879-246799-116": "the following statements is not correct", "subtitle-246799-248959-117": "about hadoop distributed file system", "subtitle-248959-252319-118": "hdfs a hdfs", "subtitle-252319-255599-119": "is the storage layer of hadoop b data", "subtitle-255599-257759-120": "gets stored in a distributed manner in", "subtitle-257759-259440-121": "hdfs", "subtitle-259440-262800-122": "c hdfs performs parallel processing of", "subtitle-262800-264000-123": "data", "subtitle-264000-266720-124": "d smaller chunks of data are stored on", "subtitle-266720-268080-125": "multiple data nodes in", "subtitle-268080-270880-126": "hdfs give it a thought and leave your", "subtitle-270880-272960-127": "answers in the comment section below", "subtitle-272960-275280-128": 
"three lucky winners will receive amazon", "subtitle-275280-276560-129": "gift vouchers", "subtitle-276560-278240-130": "now that you have learned what big data", "subtitle-278240-280080-131": "is what do you think will be the most", "subtitle-280080-280880-132": "significant", "subtitle-280880-283360-133": "impact of big data in the future let us", "subtitle-283360-285040-134": "know in the comments below", "subtitle-285040-287040-135": "if you enjoyed this video it would only", "subtitle-287040-289759-136": "take a few seconds to like and share it", "subtitle-289759-291759-137": "also to subscribe to our channel if you", "subtitle-291759-294080-138": "haven't yet and hit the bell icon to get", "subtitle-294080-295759-139": "instant notifications about our new", "subtitle-295759-296720-140": "content", "subtitle-296720-311520-141": "stay tuned and keep learning", "subtitle-311520-313600-142": "you"}