As of August 2013, GenBank has a collection of 167,295,840 sequences with 154,192,921,011 nucleotide bases!
Over the past few decades, scientists have developed techniques to sequence the genomes of a vast number of organisms, including humans. They then realized that they could identify and manipulate specific genes to achieve desired results, such as better health outcomes, an increased food supply, and natural pesticides and herbicides. Such genetic manipulation, however, requires knowledge of the DNA segment concerned, down to its specific sequence. So, what do they do? Sequence everything before they use it? No. They can look up the necessary sequence information in a repository called GenBank, a global nucleotide database.
GenBank is a database of genes, primers, and even entire genome sequences. It is curated and maintained by the National Center for Biotechnology Information (NCBI), a part of the National Institutes of Health (NIH), as a member of the International Nucleotide Sequence Database Collaboration (INSDC). GenBank exchanges data daily with the DNA DataBank of Japan (DDBJ) and the European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL-Bank) to ensure that comprehensive information is available worldwide.
Established in 1982 with a set of merely 606 sequences and 680,338 bases, GenBank has doubled in size roughly every 18 months, and it continues to grow. Today, GenBank contains all the publicly available DNA sequences, plus annotations that provide additional information about each sequence. So, for example, if you need to find the restriction sites present on a particular DNA sequence, you can simply go to the database, look up the organism, and retrieve the sequence and any related studies. If the sequence under study comes from a common, well-researched organism, there will be links to research studies that provide more in-depth information, and you may even find the exact information you need.
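To make the restriction-site example concrete, here is a minimal sketch of what you might do once a sequence has been retrieved. It scans for the EcoRI recognition site, GAATTC. The function name `find_sites` and the short `toy_sequence` are made up for illustration and are not taken from any GenBank record. Because GAATTC reads the same on both strands, scanning one strand is enough for this enzyme; a non-palindromic site would also need a reverse-complement scan.

```python
def find_sites(sequence: str, site: str) -> list[int]:
    """Return the 0-based start position of every occurrence of `site`.

    Overlapping occurrences are reported too, because the search
    resumes one base after each hit rather than after the whole site.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


ECORI = "GAATTC"  # EcoRI recognition sequence

# Hypothetical toy sequence, invented for this example only.
toy_sequence = "ttgaattcaggctagaattcgc"

print(find_sites(toy_sequence, ECORI))  # [2, 14]
```

Dedicated libraries handle many enzymes at once, but the underlying idea is the same plain substring search.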
Because the purpose of GenBank is to encourage scientific study, it is completely open and can be freely accessed by anyone. There are no usage restrictions on either the database or the information in it, and the database works with other genomic search engines and tools, like BLAST. You can search CoreNucleotide (the main collection), Expressed Sequence Tags, or Genome Survey Sequences. You can also download entire sequences using a utility found on the site.
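Downloads can also be scripted. The sketch below assumes NCBI's E-utilities `efetch` endpoint, which the text above does not mention by name, and only builds the request URL rather than contacting the server. The helper `efetch_url` is a hypothetical name, and the accession `U49845` is just an example identifier; `nuccore` is the database name that corresponds to CoreNucleotide.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities efetch.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an efetch URL for one nucleotide record.

    rettype="gb" asks for the GenBank flat file; rettype="fasta"
    asks for the bare sequence in FASTA format.
    """
    params = {
        "db": "nuccore",      # the CoreNucleotide database
        "id": accession,
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


url = efetch_url("U49845")
print(url)
```

Passing the resulting URL to `urllib.request.urlopen` would retrieve the record as plain text. NCBI asks automated clients to keep their request rate low, so a real script should pause between calls.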
When you click on the results of your search, you’ll find identifying information, like the name of the organism, its location in the database, and a short description of the entry. You’ll also find the sequence length. While the database is open to submissions of entire genomes, each record accepts only 350 kb, and complete genome sequences are far longer than that. So, when your results come in, there may be many entries for the organism you’re searching for; each one contains a different sequence, and you have to determine which one is of use to you. Each entry tells you what type of molecule was sequenced, so you know exactly what type of DNA or RNA you’re looking at. The 'definition' section will also tell you whether any part of the sequence is coding, and whether or not it contains the complete coding region.
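In the plain-text (flat file) version of a record, the name, length, and molecule type described above sit together on the first line, which begins with LOCUS. The following is a rough sketch of pulling those fields apart. The `Locus` type and `parse_locus` function are hypothetical names, the sample line is an assumption patterned on the layout of NCBI's documented sample record, and splitting on whitespace is a simplification that a real parser would replace with proper format handling.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str
    length_bp: int
    molecule: str
    division: str
    date: str


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line into its main fields (whitespace-based sketch)."""
    fields = line.split()
    if len(fields) < 7 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError(f"not a recognizable LOCUS line: {line!r}")
    # Optional tokens between the molecule type and the division
    # (such as "linear" or "circular") are skipped by indexing from the end.
    return Locus(
        name=fields[1],
        length_bp=int(fields[2]),
        molecule=fields[4],
        division=fields[-2],
        date=fields[-1],
    )


# Assumed sample line, used only to exercise the parser.
sample = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"
record = parse_locus(sample)
print(record.name, record.length_bp, record.molecule)
```

For real work, an established parser such as the one in Biopython is the safer choice, since it covers the many variations this sketch ignores.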
Your results will also tell you the names of the researchers who submitted the sequence. Submissions are open, and study and author information is displayed right there on the page. However, if the paper hasn't been published yet, this could compromise the researchers’ work if someone else uses their information to publish first. Hence, at the time of submission, GenBank provides the option to delay the posting of a sequence, but only for a limited period of time.
Also, in the name of privacy, the database does not publish any identifying information about the source of the DNA. For human DNA, for example, the record will tell you that it’s human, but the name of the individual donor will not be revealed. The database is open-access, but that openness only goes so far.
Collaboration is the key to scientific advancement, and just browsing the database is enough to make you proud of modern techniques and the speed at which we can decode the information hidden in molecules. It is through such databases and their associated tools that researchers who live continents apart and have never met end up working together toward a common goal. After all, scientific progress cannot be achieved without the exchange of knowledge.