Michal Krupa

So, You Want to Be a WebApp Developer?

December 14, 2013

Hey there. Yes, you. So, you want to become a modern webapp developer? Interested perhaps in fluid and responsive websites? Client-side and back-end MVC? Javascript-based full-featured sites? Kudos to you for taking some initiative. Mind you, there is a lot to learn. This post will only scratch the surface, as I touch on choosing a language, framework, and datastore, and give a brief explanation of deployment practices. I throw around the names of some useful tools and suggest some Javascript libraries and MVC frameworks to use. I go over the full stack, from the database to the application layer, on both the front end and the back end. I even throw some server and hosting info in there. You know what? I’m feeling generous, so I’ll even touch on using frameworks to write APIs.

Let’s begin with storage. Databases store the majority of the user-generated content. This allows your app to dynamically access things like posts, comments, data relationships, and all that jazz. Static pages will generally be stored as HTML content (or, if you are using a templating engine like Jade or HAML, as pseudo-HTML documents that are compiled into HTML when the application runs). Static pages can easily be cached for faster recall when users access your site. Templates themselves can be cached, but the dynamic data is loaded in when requested. Many engines support fragment caching for the templates that contain this dynamic content, and if you are using RESTful routes you can also cache the rendered output in memory so the full page is generated before a user asks for it. However, this takes more memory, as these pages must stay “alive” while the application is running. Some cache stores automatically purge older content when they run out of memory. Two common cache stores are Memcached and Redis. Caching is generally considered an advanced technique, but you should be aware of the concept so that you can better structure the apps you build.
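
To make the "purge when out of memory" idea concrete, here is a minimal sketch of an in-memory page cache that throws away the least recently used page once it is full. The names LRUCache and render_page are made up for this illustration, and real stores like Memcached and Redis have their own eviction policies and configuration, so treat this as a picture of the concept rather than how either of them works internally.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny in-memory cache that evicts the least recently used entry when full."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._store = OrderedDict()

    def get(self, key):
        if key not in self._store:
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return self._store[key]

    def set(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # drop the least recently used entry


def render_page(path, cache, render):
    """Return the cached HTML for `path`, calling `render` only on a cache miss."""
    html = cache.get(path)
    if html is None:
        html = render(path)  # the expensive part: templates plus database lookups
        cache.set(path, html)
    return html
```

The second request for the same path never reaches the render function, which is the whole point: the expensive work happens once and the result is reused until it gets evicted.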

Databases

Two types of databases are popular in web development: SQL-based (relational) databases and NoSQL databases. MySQL and PostgreSQL (SQL) and MongoDB and CouchDB (NoSQL) are usable across a broad range of languages. A language generally has libraries built to work with the database in a more language-specific, friendly manner. These libraries are known as ORMs, or Object-Relational Mapping adapters. Most of the major web languages are object-oriented (as far as I am aware), so these ORM adapters let you interact with database records in much the same way you interact with native objects in the language you are using.
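
To give a feel for what an ORM does under the hood, here is a hand-rolled sketch using Python's built-in sqlite3 module. The Post class, the PostMapper class, and the posts table are all invented for this example. A real ORM does far more (relationships, migrations, query building), so this only shows the core idea of turning rows into objects and back.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Post:
    id: int
    title: str
    body: str


class PostMapper:
    """Maps rows of a hypothetical `posts` table to Post objects."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS posts "
            "(id INTEGER PRIMARY KEY, title TEXT, body TEXT)"
        )

    def create(self, title, body):
        cur = self.conn.execute(
            "INSERT INTO posts (title, body) VALUES (?, ?)", (title, body)
        )
        return Post(cur.lastrowid, title, body)

    def find(self, post_id):
        row = self.conn.execute(
            "SELECT id, title, body FROM posts WHERE id = ?", (post_id,)
        ).fetchone()
        return Post(*row) if row else None


conn = sqlite3.connect(":memory:")  # throwaway database for the example
posts = PostMapper(conn)
saved = posts.create("Hello", "First post")
loaded = posts.find(saved.id)  # comes back as a Post object, not a raw row
```

The application code only ever touches Post objects. The SQL lives in one place, which is roughly the convenience an ORM is selling.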

The NoSQL vs SQL argument is mostly based on whether your approach to data storage is “normalized” or “denormalized”. Normalized databases express associations by “joining” tables through keys, which prevents redundancy within the database. The “cost” comes at lookup time. For example, say we have a table for customers, and each customer has a number of products. Selecting a customer’s products with an SQL query in a normalized setting means looking in a join table that stores a product id and a customer id, and selecting the product ids with a matching customer id. So you locate the record ids in the join table, then retrieve the records from the product table itself. In a denormalized setting, you would store the customer data in the product table, and could then look up products by the customer identifier in that table alone, without querying the customer table. This results in larger tables, but a faster lookup time. The lookup cost of the normalized approach can be reduced in multiple ways, such as database caching and building key indexes.
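
The customer/product example above can be sketched with Python's built-in sqlite3 module. The table and column names here are made up for illustration, and the data is tiny, so this shows the shape of the two queries rather than any real performance difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: customers and products live in their own tables,
    -- linked through a join table of ids.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer_products (customer_id INTEGER, product_id INTEGER);
    -- A key index keeps the join-table lookup cheap.
    CREATE INDEX idx_cp_customer ON customer_products (customer_id);

    -- Denormalized: the customer data is copied onto every product row.
    CREATE TABLE products_flat (
        id INTEGER PRIMARY KEY, name TEXT,
        customer_id INTEGER, customer_name TEXT
    );

    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO products VALUES (10, 'Keyboard'), (11, 'Monitor'), (12, 'Mouse');
    INSERT INTO customer_products VALUES (1, 10), (1, 11), (2, 12);
    INSERT INTO products_flat VALUES
        (10, 'Keyboard', 1, 'Ada'),
        (11, 'Monitor', 1, 'Ada'),
        (12, 'Mouse', 2, 'Grace');
""")

# Normalized lookup: go through the join table, then fetch from products.
normalized = [row[0] for row in conn.execute(
    """SELECT p.name
       FROM customer_products cp
       JOIN products p ON p.id = cp.product_id
       WHERE cp.customer_id = ?
       ORDER BY p.name""", (1,))]

# Denormalized lookup: one table, no join.
denormalized = [row[0] for row in conn.execute(
    "SELECT name FROM products_flat WHERE customer_id = ? ORDER BY name", (1,))]
```

Both queries return the same products. The trade-off is that the flat table repeats the customer name on every row, so renaming a customer means updating many rows instead of one.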

NoSQL solutions are mostly in the denormalized arena, and because many of them are essentially key-value or document stores, simple lookups are generally considered to be fast. They do have their limitations. They are best suited to document-style storage, where you want many properties stored with one document instead of spread across lots of join associations.

Personally, I would recommend learning SQL and using PostgreSQL as your primary engine. It is arguably the most advanced and feature-rich, and recent releases also let you use NoSQL-style storage within your database (for example, JSON columns and the hstore key-value extension) while keeping the tools and engine performance of the SQL side. You get the best of both worlds, and once you feel comfortable with the SQL side of things, you can modify your application to take advantage of this NoSQL functionality.

One caveat about ORMs: they will not necessarily generate the optimal native query for an SQL or NoSQL database. Developers often opt to write their own database queries, which still return objects to the application, because hand-written queries can be much more efficient.

Language of choice

For the application layer, some popular and well-documented languages are Javascript, Ruby, and Python. Ruby has two popular web frameworks, Rails and Sinatra, while Python is the basis of Django. Javascript, long considered the native language of the web browser, has been brought to the server with NodeJS. Node is wonderful because you use Javascript on both the front end and the back end to develop your app. However, the framework you choose should be based on the language you are most comfortable with, since that lets you harness language-specific advantages you already know.

If you are new to web development and have no programming experience, the majority of this post has probably made little to no sense to you, but you are also free to choose whichever language you like to begin working with. In my opinion, Ruby (on Rails) is a great jumping-off point, not only because of the large community and great guides on sites like Railscasts.com (made by Ryan Bates, an awesome RoR programmer), but because it is a widely accepted scripting language. Many companies hire Ruby engineers, and even large-scale sites such as Twitter and LinkedIn utilize/have utilized RoR.

Regardless of the language and web framework you choose to learn, you will need to learn Javascript. Become familiar with what the DOM is, how the browser builds it while parsing the HTML, how Javascript can manipulate it, and also the CSSOM. Pay attention to how scripts are loaded, because this is where a lot of the initial wait time for web sites comes from: by default a script tag makes the browser stop parsing the HTML until the script has been downloaded and run. If the Javascript is loaded asynchronously instead (for example with the async or defer attributes), the browser doesn’t need to wait for it before it can “paint” the page.

Also, although it is important to understand the core of Javascript, in practical application I do recommend using jQuery. Once you are comfortable using and understanding Javascript, you can read through the source of jQuery to understand what is happening.

If you are building a modern web application, Javascript is the way to go. You will want to use a client-side Javascript framework. This is MVC for the client side. Sending a fully rendered page across the network for every interaction is slow and wasteful. Using a front-end MVC framework allows you to easily manage lots of data on the front end and only make requests to your web server for bits of data. A good framework to utilize in this fashion is Backbone.js. You don’t need to use something like Backbone.js, but it is helpful when you have lots of variables and templates floating around. One of the most important parts of good programming is good organization. If you build a RESTful service with a framework like Django or Rails, and have it send back responses only in JSON or plain text rather than rendering HTML templates, you can have a very quick application. The nice thing is that the client-side framework can also keep compiled templates in the browser’s memory, again reducing load on the server. I am currently working on an application built in NodeJS, and most of what the app does happens in the front end. Only requests to fetch data or to post updates are made to the back end.
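
As a rough illustration of a back end that only speaks JSON, here is a minimal sketch written as a plain WSGI application, using nothing but the Python standard library. The POSTS data and the /posts routes are invented for this example. In a real project a framework like Django or Rails would handle routing and serialization for you.

```python
import json

# Hypothetical in-memory data standing in for a database.
POSTS = {1: {"id": 1, "title": "Hello"}, 2: {"id": 2, "title": "World"}}


def app(environ, start_response):
    """WSGI app: GET /posts lists all posts, GET /posts/<id> returns one, as JSON."""
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    status, payload = "404 Not Found", {"error": "not found"}

    if environ.get("REQUEST_METHOD") == "GET" and parts[:1] == ["posts"]:
        if len(parts) == 1:
            status, payload = "200 OK", list(POSTS.values())
        elif len(parts) == 2 and parts[1].isdigit() and int(parts[1]) in POSTS:
            status, payload = "200 OK", POSTS[int(parts[1])]

    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally you could serve it with wsgiref.simple_server.make_server("", 8000, app).serve_forever() and request /posts/1 from the browser. The server never renders HTML; the front-end framework fetches these small JSON responses and does the rendering itself.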

I used Django for this example, but I encourage you to look into various frameworks and languages before you commit yourself to a project. Focusing on Django, a great tool is Pipeline for asset compression and management. Rails has a built-in asset pipeline. On top of pipelines and the aforementioned templating engines, you can also utilize libraries like Handlebars.js or Mustache.js. These let you build Javascript templates. Essentially, you are using Django to build a web API, and using Javascript to handle that data. Based on my experiences with Ruby on Rails, this is the most effective way to utilize these types of frameworks in a “modern web”.

Keeping a Neat Development Environment

Git is the source control you want to use. Consider it the standard for any company that will be looking to hire you as a webdev. It is very popular and is what GitHub.com is built around, storing code repositories online for collaboration. However, some companies do use Subversion (SVN), and I have worked with developers who argue that it is better at things like merge conflict resolution (if multiple devs work on the same file and then want to combine their changes, merge conflicts can arise where the edited lines overlap).

Deployment & Maintenance

This is one of my specialties. For web application deployment, as long as your app is light and you are using this as a learning experience, I would recommend Heroku. They have great guides on getting set up in the Heroku Dev Center. This will give you a basic understanding of how an application is separated into an application layer and a database layer. They have lots of add-ons as well, so you can try different databases, learn about caching, and use services like transactional email. Most of these services come with a free tier, and hosting on Heroku is free. Heroku lets you deploy your code over Git, which is cool.

You can also sign up for Amazon Web Services for free. There is a free tier which provides a micro-instance server for a full year without paying. Beyond the free tier, you will be charged $30/mo for every micro-instance. The servers are managed through AWS EC2. Unfortunately, you cannot deploy code through Git onto an EC2 instance out of the box. You have to SSH into the instance and set everything up manually. However, EC2 provides a great way for you to learn how to provision a server environment and configure DNS hosting and application proxying yourself. The great thing is you learn about network policy control, and can set up restrictions on ports and stuff, and if you fuck anything up you can destroy the instance and launch a new one. After you know how to do this and you want to learn to scale, you can try AWS Elastic Beanstalk. It offers application scaling and, based on server load, will launch additional application instances for you (but those are beyond the free tier).

You also have free access to their Relational Database Service (RDS), which can run MySQL and, I believe, PostgreSQL. They have a content distribution network service as well, and lots of other tools you can explore within the free tier. I found them to be a huge resource in learning about networking and the like.

Now, in terms of “how websites are maintained and deployed”: most companies use custom-built scripts to deploy code versions from a version control system like SVN or Git, and go by a deployment schedule. Nowadays it’s usually a weekly release. Sometimes it’s continuous deployment. They propagate the changes one server at a time, and there are servers (load balancers) that direct the traffic to each of these application servers. This way, while a server is being restarted or if a server is down, the load balancer knows to direct your traffic to one of the “healthy” servers. Load balancers mask the actual server that you are connecting to. Because sessions are stored in client-side cookies, any of the application servers can use the cookie to verify a logged-in session. In this way you are not necessarily always getting data from the same server, as the load balancer determines which node you are hitting. Additionally, the load balancer helps to shield the application servers from DDoS attacks: the attack hits the load balancer first, and in the worst-case scenario takes the load balancer down, so the traffic does not reach the application at all.
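
Here is a small sketch of the two ideas in that paragraph: a round-robin load balancer that skips servers marked as down, and a signed session cookie that any application server can verify as long as they all share the same secret. Everything here (the class name, the server names, the SECRET value, the cookie format) is made up for illustration. Real load balancers and session libraries are far more involved, and a real secret would never be hard-coded.

```python
import hashlib
import hmac
import itertools

# Assumption for the example: every app server is configured with this same secret.
SECRET = b"shared-secret-known-to-every-app-server"


def sign_session(user_id):
    """Build a cookie value of the form '<user_id>.<signature>'."""
    sig = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    return f"{user_id}.{sig}"


def verify_session(cookie):
    """Return the user id if the signature checks out, otherwise None."""
    user_id, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, user_id.encode(), hashlib.sha256).hexdigest()
    if hmac.compare_digest(sig.encode(), expected.encode()):
        return user_id
    return None


class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._cycle = itertools.cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)  # e.g. while it restarts during a deploy

    def mark_up(self, server):
        self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

During a rolling deploy you would mark one server down, update and restart it, mark it up again, and move on to the next. Because the cookie carries its own signature, it does not matter which server the balancer picks for the next request.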

There is too much to learn. Your vast informational post has discouraged me.

Don’t feel discouraged. Just come into this knowing that you will learn about new tools every day. The web space is vast, and you will not believe how much new stuff goes up all the time. It’s a great space to be in and a great area for software developers. Learning to build web services and understanding how these can hook up local systems is very important as computing moves into the cloud.