I once found myself trying to debug a job that took a full six hours to fail. It took more than a week -- a whole week! -- to find and fix the problem. Of course, I was doing other things at the same time, but the need to constantly check up on the status of the job was a huge drain on my energy and productivity. It was a Very Bad Week.
Painful experiences like this have taught me to follow a test-driven approach to Hadoop development. Whenever I'm working on a new Hadoop-based data pipe, my goal is to isolate six distinct kinds of problems that arise in Hadoop development:
1. Explore the data: The pipe must accept data in a given format, which might not be fully understood at the outset.
2. Test basic logic: The pipe must execute the intended data transformation for "normal" data.
3. Test edge cases: The pipe must deal gracefully with edge cases: missing or malformed fields, rare divide-by-zeroes, etc.
4. Test deployment parameters: The pipe must be deployable on Hadoop, with all the right filenames, code dependencies, and permissions.
5. Test cluster performance: For big enough jobs, the pipe must run efficiently. If not, we need to tune or scale up the cluster.
6. Test scheduling parameters: Once pipes are built, routine jobs must be scheduled and executed.
Steps 1 through 3 should be solved locally, using progressively larger data sets. Steps 4 and 5 must be run remotely, again using progressively larger data sets.
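As a rough illustration of step 1, here is a minimal sketch of profiling a small local sample before trusting a format description. It assumes tab-delimited text records; the function name `profile_fields`, the delimiter, and the sample rows are all hypothetical. Raising `limit` gives you the "progressively larger data sets" mentioned above.

```python
from collections import Counter
from itertools import islice


def profile_fields(lines, delimiter="\t", limit=1000):
    """Count how many fields each of the first `limit` lines contains."""
    counts = Counter()
    for line in islice(lines, limit):
        counts[len(line.rstrip("\n").split(delimiter))] += 1
    return counts


# Hypothetical sample: two well-formed rows and one row with a missing field.
sample = ["u1\t3\t12\n", "u2\t0\t7\n", "u3\t4\n"]
print(dict(profile_fields(sample)))
```

Any unexpected field count in the output is a hint that the format is not what you assumed, and it costs seconds to find locally rather than hours on the cluster.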
Step 6 depends on your scheduling system and has a very slow cycle time (i.e., you must wait a day to test whether your daily jobs run on the proper schedule). However, it's independent of Hadoop, so you can build, test, and deploy it separately. (There may be some crossover with step 4, but you can test this with small data sets.)
Going through six different rounds of testing may seem like overkill, but in my experience it's absolutely worth it. Very likely, you'll encounter at least one new bug, mistake, or unanticipated case at each stage. Progressive testing ensures that each bug is dealt with as quickly as possible, and prevents bugs from ganging up on you.
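To make steps 2 and 3 concrete, here is a minimal sketch of keeping the record-level logic in a plain function so it can be tested locally without a cluster. The record layout (user, clicks, views) and the name `parse_record` are assumptions for illustration, not part of any particular pipe.

```python
def parse_record(line):
    """Return (user, clicks / views), or None for a record we cannot use."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        return None  # missing or extra fields
    user, clicks, views = fields
    try:
        clicks, views = int(clicks), int(views)
    except ValueError:
        return None  # malformed numbers
    if views == 0:
        return None  # avoid the rare divide-by-zero
    return user, clicks / views


def test_basic_logic():
    # Step 2: "normal" data produces the intended transformation.
    assert parse_record("u1\t2\t8\n") == ("u1", 0.25)


def test_edge_cases():
    # Step 3: bad records are skipped instead of crashing the job.
    assert parse_record("u1\t2\n") is None
    assert parse_record("u1\ttwo\t8\n") is None
    assert parse_record("u1\t2\t0\n") is None


test_basic_logic()
test_edge_cases()
```

Each failure mode you discover later on real data can be added as one more assertion, so it never has to be rediscovered after a long cluster run.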
Other suggestions:
- Definitely use an abstraction layer that lets you run the same code locally and deploy it unchanged to your staging and production clusters. Cascalog and mrjob are good examples. Otherwise, you'll find yourself solving steps 2 and 3 all over again in deployment.
- Config files and object-oriented code can prevent a lot of headaches in step 4. Most of your deployment hooks can be written once and saved in a config file. If you have strong naming conventions, then most of your filenames can be constructed (and tested) programmatically. It's amazing how many hours you can waste debugging a simple typo in Hadoop. Good OOP will spare you many of these headaches.
- Part of the beauty of Hive and HBase is that they abstract away most of the potential pitfalls on the deployment side, especially in step 4. By the same token, tools like Azkaban and Oozie can take a lot of the pain out of step 6. (Be careful, though -- each of these scheduling tools has its limitations.)
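As a sketch of the config-file suggestion above, filenames can be built from a naming convention in one place and checked locally before anything is submitted. The `CONFIG` keys, the root path, and the directory layout below are hypothetical; substitute your own conventions.

```python
from datetime import date

# Hypothetical settings; in practice these would be loaded from a config file.
CONFIG = {"root": "hdfs:///data", "job": "clickrate"}


def job_paths(config, day):
    """Build the input and output paths for one daily run."""
    stamp = day.strftime("%Y/%m/%d")
    base = f"{config['root']}/{config['job']}"
    return {
        "input": f"{base}/input/{stamp}",
        "output": f"{base}/output/{stamp}",
    }


paths = job_paths(CONFIG, date(2014, 1, 31))
print(paths["input"])
print(paths["output"])
```

Because the paths come from a single function, a typo shows up in a quick local test rather than hours into a cluster run.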