Bioinformatica p1-perl-introduction


Perl Introduction

Transcript of Bioinformatica p1-perl-introduction

  • 1. FBW 01-10-2012 Wim Van Criekinge

2. Communicating practical matters: where and when the classes take place
  • Making course material available
  • Supplementary educational material (FAQ, web links)
  • Practical assignments and program code
Advantages
  • Use of web technology in assimilating the course
  • Many questions and answers can be of interest to several people; recurring questions are avoided
  • Ongoing discussion (during the year) among students, the professor, and also thesis and doctoral students

3. Practical sessions
  • Practical session arrangements?
  • 45-minute introduction to the editor, programming language and websites used
  • 15-minute explanation of the assignments
  • Normally in PC room D (check!)
Perl for Bioinformatics
  • Part 1: Beginning
  • Part 2: Mastering

4. Bioinformatics practical: introduction to Perl
  • Write your first Perl program!
  • Execute your

5. What is Perl?
  • Perl is a high-level scripting language
  • Larry Wall created Perl in 1987
  • Practical Extraction and Report Language (or Pathologically Eclectic Rubbish Lister)
  • Born from a system administration tool
  • Faster than sh or csh
  • Slower than C
  • No need for sed, awk, tr, wc, cut
  • Perl is open and free
urooscon/

6. What is Perl?
  • Perl is available for most computing platforms: all flavors of UNIX (Linux), MS-DOS/Win32, Macintosh, VMS, OS/2, Amiga, AS/400, Atari
  • Perl is a computer language that is:
      • Interpreted, compiles at run-time (need for perl.exe!)
      • Loosely typed
      • String/text oriented
      • Capable of using multiple syntax formats
  • In Perl, "there's more than one way to do it"

7. Why use Perl for bioinformatics?
  • Ease of use by novice programmers
  • Flexible language: fast software prototyping (quick and dirty creation of small analysis programs)
  • Expressiveness
  • Compact code, Perl Poetry: @{$_[$#_]||[]}
  • Glutility: read disparate files and parse the relevant data into a new format
  • Powerful pattern matching via regular expressions (Best Regular Expressions on Earth)
  • With the advent of the WWW, Perl has become the language of choice to create Common Gateway Interface (CGI) scripts that handle form submissions and to create compute servers on the WWW
  • Open source, free
  • Availability of Perl modules for bioinformatics and the Internet

8. Why NOT use Perl for bioinformatics?
  • Some tasks are still better done with other languages (heavy computations / graphics): C(++), C#, Fortran, Java (Pascal, Visual Basic)
  • With Perl you can write simple programs fast, and it is also suitable for large and complex programs (yet it is not adequate for very large projects)
  • Python
  • Larry Wall: "For programmers, laziness is a virtue"

9. What bioinformatics tasks are suited to Perl?
  • Sequence manipulation and analysis
  • Parsing results of sequence analysis programs (Blast, Genscan, Hmmer etc.)
  • Parsing database (e.g. Genbank) files
  • Obtaining multiple database entries over the internet

10. Example of problems we will be solving
  • Primary sequence analysis
  • Perform alignments
  • Simulation experiments to explain Blast statistics
  • Predicting protein topology
  • Predicting secondary structures
  • Real-life problems
  • Proteomics: given amino acid masses, find the protein in a database

11. Perl installation
Perl (on USB): Perl is available for various operating systems. To download Perl and install it on your computer, have a look at the following resources:
  • (O'Reilly): Downloading Perl Software
  • ActiveState: ActivePerl for Windows, as well as for Linux and Solaris; ActivePerl binary packages
  • CPAN

12. Check installation
Command-line flags for perl:
  • perl -v: gives the current version of Perl
  • perl -e: executes Perl statements from the command line
      perl -e "print 42;"
      perl -e "print \"Two\nlines\n\";"
  • perl -we: executes and prints warnings
      perl -we "print \"hello\";x++;"

13. How to enter your first program?
Use an editor:
  • DOS: EDIT
  • Windows: NOTEPAD
  • (Careful!) Word(Pad) -> TEXT FILE
  • Scite
  • Textpad
  • Others: VIM, Eclipse

14. Brief Introduction to Subdirectories: The Path
  • Path: route followed by the OS to locate, save, and/or retrieve a file

15. The absolute path problem
Problem
  • Either perl cannot be started
  • Or the script cannot be found
  • Or a file needed in the script cannot be found
Solution
  • Don't panic!
  • Use absolute path names: D:\Perl\bin\perl.exe
  • Note that inside your script you must escape the backslash: $filename = "d:\\Temp\\pdb.fasta"

16. Solutions (II)
  • Copy all the files into the same directory! If you start perl from D:\Perl\bin with perl ... you can refer to ..., but then the absolute reference must also be used for $filename, or else you must copy pdb.fasta to D:\Perl\Bin
  • Adjust the search path so that you can start perl from anywhere
      • path (shows the search path)
      • set path (changes the path; careful!)
      • Use the DOS environment variable %path% to add a directory: set path=%path%;d:\Perl\bin (afterwards you can check the change by running path)

17. Redirection
  • Keyboard: standard input device
  • Screen: standard output device
  • Redirection . . . changes output from the monitor to somewhere else (usually a file or printer)

18. Textpad
  • Minimal install: via Minerva, save to your folder. Create a system folder in the same location. In the system folder save plumb.exe (Minerva) and the Perl syntax files
  • Syntax Highlighting, Document Class, Launch Perl, Tools

19. Perl

20. General Remarks
  • Perl is mostly a free-format language: add spaces, tabs or new lines wherever you want
  • For clarity, it is recommended to write each statement on a separate line, and to use indentation in nested structures
  • Comments: anything from the # sign to the end of the line is a comment (there are no multi-line comments)
  • A Perl program consists of all of the Perl statements of the file taken collectively as one big routine to execute

21. What does a real Perl program look like?
    #!/usr/local/bin/perl          Mandatory first line (on UNIX)
    print "Hello everyone\n";
How to run it:
    1. Save the text of your code as a file -- program.pl
    2. Execute it: perl program.pl
       Hello everyone

22. Three Basic Data Types
  • Scalars - $
  • Arrays of scalars - @
  • Associative arrays of scalars, or hashes - %

23. 2+2 = ?
    $a = 2; $b = 2; $c = $a + $b;
  • $ - indicates a variable
  • ; - ends every command
  • = - assigns a value to a variable
    or $c = 2 + 2;
    or $c = 2 * 2;
    or $c = 2 / 2;
    or $c = 2 ** 4;      # 2 to the power 4 = 16 (in Perl, ^ is bitwise XOR, so 2 ^ 4 gives 6)
    or $c = 1.35 * 2 - 3 / (0.12 + 1);

24. Ok, $c is 4. How do we know it?
    $c = 4;
    print "$c";
print command:
  • " " - bracket the output expression
    print "Hello \n";
  • \n - prints an end-of-line character (equivalent to pressing Enter)
String concatenation:
    print "Hello everyone\n";
    print "Hello" . " everyone" . "\n";
Expressions and strings together:
    print "2 + 2 = " . (2+2) . "\n";      (2+2) is the expression
    2 + 2 = 4

25. Loops and cycles (for statement):
    # Output all the numbers from 1 to 100
    for ($n=1; $n