Course-v1:MIT+MIT-500+en/en/block-v1:MIT+MIT-500+en+type@video+block@c93eae50a2cf457781994e14a4fc8597

@metadata
sourceLanguage"en"
priorityLanguages
"fr"
"ar"
allowOnlyPriorityLanguagestrue
description"video in Data Engineering - "
display_name"Welcome to the Data Engineering Nanodegree Program"
subtitle-80-2399-1"we all use smartphones but have you ever"
subtitle-2399-4480-2"wondered how much data it generates in"
subtitle-4480-5839-3"the form of texts"
subtitle-5839-9120-4"phone calls emails photos videos"
subtitle-9120-12000-5"searches and music approximately 40"
subtitle-12000-14240-6"exabytes of data gets generated every"
subtitle-14240-14880-7"month"
subtitle-14880-18160-8"by a single smartphone user now imagine"
subtitle-18160-20480-9"this number multiplied by 5 billion"
subtitle-20480-21920-10"smartphone users"
subtitle-21920-24160-11"that's a lot for our mind to even process"
subtitle-24160-24960-12"isn't it"
subtitle-24960-27199-13"in fact this amount of data is quite a"
subtitle-27199-29199-14"lot for traditional computing systems to"
subtitle-29199-30000-15"handle"
subtitle-30000-32000-16"and this massive amount of data is what"
subtitle-32000-33200-17"we term as"
subtitle-33200-36079-18"big data let's have a look at the data"
subtitle-36079-38559-19"generated per minute on the internet"
subtitle-38559-41840-20"2.1 million snaps are shared on snapchat"
subtitle-41840-44320-21"3.8 million search queries are made on"
subtitle-44320-45039-22"google"
subtitle-45039-47680-23"one million people log on to facebook"
subtitle-47680-50239-24"4.5 million videos are watched on"
subtitle-50239-51399-25"youtube"
subtitle-51399-54160-26"188 million emails are sent"
subtitle-54160-57039-27"that's a lot of data so how do you"
subtitle-57039-58000-28"classify any"
subtitle-58000-60559-29"data as big data this is possible with"
subtitle-60559-61440-30"the concept of"
subtitle-61440-65119-31"five v's volume velocity"
subtitle-65119-69040-32"variety veracity and value"
subtitle-69040-70880-33"let us understand this with an example"
subtitle-70880-72720-34"from the health care industry"
subtitle-72720-75040-35"hospitals and clinics across the world"
subtitle-75040-76159-36"generate massive"
subtitle-76159-79600-37"volumes of data 2,314"
subtitle-79600-82080-38"exabytes of data are collected annually"
subtitle-82080-84080-39"in the form of patient records and test"
subtitle-84080-85040-40"results"
subtitle-85040-87119-41"all this data is generated at a very"
subtitle-87119-88880-42"high speed which contributes to the"
subtitle-88880-91119-43"velocity of big data"
subtitle-91119-94000-44"variety refers to the various data types"
subtitle-94000-95280-45"such as structured"
subtitle-95280-98079-46"semi-structured and unstructured data"
subtitle-98079-99200-47"examples include"
subtitle-99200-103200-48"excel records log files and x-ray images"
subtitle-103200-105119-49"accuracy and trustworthiness of the"
subtitle-105119-107119-50"generated data is termed as"
subtitle-107119-110000-51"veracity analyzing all this data will"
subtitle-110000-112240-52"benefit the medical sector by enabling"
subtitle-112240-114320-53"faster disease detection"
subtitle-114320-117439-54"better treatment and reduced cost"
subtitle-117439-120560-55"this is known as the value of big data"
subtitle-120560-123040-56"but how do we store and process this big"
subtitle-123040-123840-57"data"
subtitle-123840-125680-58"to do this job we have various"
subtitle-125680-127680-59"frameworks such as cassandra"
subtitle-127680-130800-60"hadoop and spark let us take hadoop as"
subtitle-130800-132000-61"an example"
subtitle-132000-134560-62"and see how hadoop stores and processes"
subtitle-134560-136000-63"big data"
subtitle-136000-138720-64"hadoop uses a distributed file system"
subtitle-138720-141599-65"known as hadoop distributed file system"
subtitle-141599-143920-66"to store big data if you have a huge"
subtitle-143920-146080-67"file your file will be broken down into"
subtitle-146080-147200-68"smaller chunks"
subtitle-147200-149840-69"and stored in various machines not only"
subtitle-149840-151519-70"that when you break the file"
subtitle-151519-153680-71"you also make copies of it which go"
subtitle-153680-154959-72"into different nodes"
subtitle-154959-156879-73"this way you store your big data in a"
subtitle-156879-158239-74"distributed way"
subtitle-158239-160239-75"and make sure that even if one machine"
subtitle-160239-164400-76"fails your data is safe on another"
subtitle-164400-166800-77"the mapreduce technique is used to process"
subtitle-166800-167760-78"big data"
subtitle-167760-170560-79"a lengthy task a is broken into smaller"
subtitle-170560-171760-80"tasks"
subtitle-171760-176239-81"b c and d now instead of one machine"
subtitle-176239-178720-82"three machines take up each task and"
subtitle-178720-180800-83"complete it in a parallel fashion"
subtitle-180800-182879-84"and assemble the results at the end"
subtitle-182879-185120-85"thanks to this the processing becomes"
subtitle-185120-188159-86"easy and fast this is known as parallel"
subtitle-188159-190640-87"processing"
subtitle-190640-192400-88"now that we have stored and processed"
subtitle-192400-193840-89"our big data we can"
subtitle-193840-195519-90"analyze this data for numerous"
subtitle-195519-196879-91"applications"
subtitle-196879-199920-92"in games like halo 3 and call of duty"
subtitle-199920-202480-93"designers analyze user data to"
subtitle-202480-204480-94"understand at which stage most of the"
subtitle-204480-205760-95"users pause"
subtitle-205760-208720-96"restart or quit playing this insight can"
subtitle-208720-210560-97"help them rework the storyline of"
subtitle-210560-211200-98"the game"
subtitle-211200-213840-99"and improve the user experience which in"
subtitle-213840-216799-100"turn reduces the customer churn rate"
subtitle-216799-219120-101"similarly big data also helped with"
subtitle-219120-221120-102"disaster management during hurricane"
subtitle-221120-222720-103"sandy in 2012"
subtitle-222720-224000-104"it was used to gain a better"
subtitle-224000-225920-105"understanding of the storm's effect on"
subtitle-225920-227680-106"the east coast of the u.s"
subtitle-227680-230000-107"and necessary measures were taken it"
subtitle-230000-231920-108"could predict the hurricane's landfall"
subtitle-231920-233200-109"five days in advance"
subtitle-233200-235680-110"which wasn't possible earlier these are"
subtitle-235680-237599-111"some of the clear indications of how"
subtitle-237599-239360-112"valuable big data can be"
subtitle-239360-241519-113"once it is accurately processed and"
subtitle-241519-242799-114"analyzed"
subtitle-242799-244879-115"so here's a question for you which of"
subtitle-244879-246799-116"the following statements is not correct"
subtitle-246799-248959-117"about hadoop distributed file system"
subtitle-248959-252319-118"hdfs a hdfs"
subtitle-252319-255599-119"is the storage layer of hadoop b data"
subtitle-255599-257759-120"gets stored in a distributed manner in"
subtitle-257759-259440-121"hdfs"
subtitle-259440-262800-122"c hdfs performs parallel processing of"
subtitle-262800-264000-123"data"
subtitle-264000-266720-124"d smaller chunks of data are stored on"
subtitle-266720-268080-125"multiple data nodes in"
subtitle-268080-270880-126"hdfs give it a thought and leave your"
subtitle-270880-272960-127"answers in the comment section below"
subtitle-272960-275280-128"three lucky winners will receive amazon"
subtitle-275280-276560-129"gift vouchers"
subtitle-276560-278240-130"now that you have learned what big data"
subtitle-278240-280080-131"is what do you think will be the most"
subtitle-280080-280880-132"significant"
subtitle-280880-283360-133"impact of big data in the future let us"
subtitle-283360-285040-134"know in the comments below"
subtitle-285040-287040-135"if you enjoyed this video it would only"
subtitle-287040-289759-136"take a few seconds to like and share it"
subtitle-289759-291759-137"also subscribe to our channel if you"
subtitle-291759-294080-138"haven't yet and hit the bell icon to get"
subtitle-294080-295759-139"instant notifications about our new"
subtitle-295759-296720-140"content"
subtitle-296720-311520-141"stay tuned and keep learning"