Penetration testing course: 0x02.4.1 What is the best programming language for hacking?

I apologize for the click-bait (or repellent, depending on your opinion) title, but it is intentional. Not because I want low-quality traffic, but because the title reflects how this question usually gets asked, and it allows me to reach a bigger audience ready to be forged 🙂. I’d prefer not having to talk about this, but I believe it is relevant to an ethical hacking course because many n00bs ask this and not all n00bs are the same. Some have the skills and mindset to become pros, while others simply won’t fit in this category. This article is targeted at the first group.

How to choose a programming language over another for developing security tools

There isn’t a go-to programming language that is considered the best, so I’m not going to simply tell you one name: there’s no secret recipe for developing security tools. It is much wiser to compare the most used programming languages and be ready for some flexibility. Technologies change, so we need to adapt; besides, as you learn new things, you have a wider choice than you had before. Don’t be static, allow some dynamism, and be willing to learn and improve your skills every time you get a chance.

Programming languages differ in many aspects:

  • Compiled

    The source code of the program is converted to machine code before runtime. Compiled programs usually take longer to build, but are typically faster to run. A compiler is able to catch many errors before runtime (if the program contains such errors it simply won’t compile) and it can optimize the code for the architecture of the target machine. That’s why, although using prebuilt binaries is simpler, many people (mostly *nix people) prefer to compile from source in order to gain some speed. Besides, a Windows executable won’t run natively on a Linux system (unless you use something like Wine, but that’s cheating and has some limits) and a Linux executable won’t run natively on Windows (tools like Cygwin let you rebuild many *nix programs for Windows, but they don’t run Linux binaries directly).

  • Interpreted

    The source code of the program isn’t converted to a machine code binary ahead of time but is executed at runtime, statement by statement, and usually there is dynamic typing. A program written in an interpreted language requires an interpreter in order to run and is generally cross-platform (provided that an interpreter exists for each platform). There’s no compilation time, but on the other hand the performance at runtime may be slower than that of a compiled program. Interpreted languages are often easier to debug and very suitable for rapid prototyping.

JIT compiled

When I talked about compilers before, I was referring to the most common type, AOT (Ahead Of Time) compilers, but JIT (Just In Time) compilers exist as well. JIT compilers don’t convert all the source code into machine code before runtime; they convert code into machine code only when needed, at runtime, so they sit somewhere between normal compilers and interpreters. JIT compilation combines the speed of compiled code with the flexibility of interpretation, at the cost of the overhead of an interpreter plus the additional overhead of compiling. JIT compilation is a form of dynamic compilation and allows adaptive optimization such as dynamic recompilation, so in some cases it can be even faster than static compilation.

Several modern runtime environments rely on JIT compilation, for example most implementations of Java and the .NET Framework.

A common implementation of JIT compilation is to first perform AOT compilation to bytecode (virtual machine code), known as bytecode compilation, and then perform JIT compilation to machine code (dynamic compilation) rather than interpreting the bytecode. This improves runtime performance compared to interpretation at the cost of some lag due to compilation; however, caching the compiled code minimizes the lag on future executions of the same code during a given run.
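The first stage of that pipeline, compilation to bytecode, is easy to observe in Python. This is only a sketch of the bytecode step: the reference CPython interpreter has traditionally interpreted its bytecode rather than JIT-compiling it, and the function name `add_one` is made up for the illustration.

```python
import dis

# A tiny function whose bytecode we want to look at.
def add_one(n):
    return n + 1

# CPython has already compiled the function body to bytecode;
# the raw bytes live on the function's code object.
bytecode = add_one.__code__.co_code
print(type(bytecode).__name__, len(bytecode), "bytes of bytecode")

# dis prints a human-readable listing of the bytecode instructions.
dis.dis(add_one)

# compile() performs the source-to-bytecode step explicitly,
# and exec() then runs the resulting code object.
code_obj = compile("result = add_one(41)", "<demo>", "exec")
namespace = {"add_one": add_one}
exec(code_obj, namespace)
print(namespace["result"])
```

The exact instructions printed by `dis` differ between Python versions, which is why no listing is shown here.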

Security implications:

JIT compilation fundamentally uses executable data, and thus poses security challenges and possible exploits. Implementation of JIT compilation consists of compiling source code or bytecode to machine code and executing it. This is generally done directly in memory: the JIT compiler outputs the machine code directly into memory and immediately executes it, rather than outputting it to disk and then invoking the code as a separate program, as in usual ahead-of-time compilation. In modern architectures this runs into a problem due to executable space protection: arbitrary memory cannot be executed, as otherwise there is a potential security hole. Thus memory must be marked as executable; for security reasons this should be done after the code has been written to memory, and the memory should be marked read-only, as writable and executable memory is a security hole.

JIT spraying is a class of exploitation techniques that uses JIT compilation for heap spraying: the resulting memory is executable, which allows an exploit if execution can be redirected into the heap. This may make it possible to bypass protections such as DEP and ASLR on Windows or, more generally, hardware or software-emulated executable space protections like the NX (No-eXecute) bit, which is also used on other platforms such as Linux and Mac.

Low Level vs High Level

  • Low Level

    Low level programming languages are designed to operate directly on the hardware and the instruction set architecture of a computer, so they are closer to the machine. Examples of low level programming languages include machine code (made of binary data) and assembly, which is still low level but slightly more understandable by humans because it uses mnemonics for opcodes and symbolic names for registers and memory addresses rather than just 0s and 1s; it requires a program called an assembler to translate it into machine code that the computer can execute.

    C and C++ are also often considered low level languages in comparison to other languages, but they are definitely higher level than machine code and assembly.

  • High level

    High level programming languages are designed to be closer to humans and provide an abstraction from the hardware of the computer. They may use natural language elements, be easier to use, or may automate (or even hide entirely) significant areas of computing systems (for example memory management). C and C++ contain elements of low and high level programming principles. Rather than dealing with registers, memory addresses and call stacks, high-level languages deal with variables, arrays, objects, complex arithmetic or boolean expressions, subroutines and functions, loops, threads, locks, and other abstract computer science concepts, with a focus on usability over optimal program efficiency. Almost all the programming languages I didn’t mention before are considered to be high level.

Strongly typed vs Loosely typed

  • Strongly typed

    Strongly typed languages are languages that enforce type safety. They are usually statically typed, so types are checked at compile time. The type has to be explicit and bound to the variable, and ambiguity should be avoided.

  • Loosely typed

    Loosely or weakly typed languages are languages that may treat type declarations as optional for the developer. Since the type isn’t explicitly declared, it isn’t bound to the variable but to its value, which can be determined at runtime, so a loosely typed language is usually a dynamically typed language.

    A particular case is represented for example by Lua, which is a dynamically typed language but also strongly typed.
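Python behaves like Lua in this respect, so it can serve as a quick illustration. The sketch below assumes nothing beyond the standard interpreter; the function name `concat_mixed` is made up for the example.

```python
# Dynamic typing: the name "value" is not tied to a type,
# so it can be rebound to objects of different types.
value = 123
print(type(value).__name__)   # int
value = "abc"
print(type(value).__name__)   # str

# Strong typing: mixing incompatible types is an error,
# not a silent conversion.
def concat_mixed():
    try:
        return "abc" + 123
    except TypeError as exc:
        return f"TypeError: {exc}"

print(concat_mixed())
```

The last line prints the error message instead of a concatenated string, because Python refuses to guess how a string and an integer should be combined.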

Duck typing

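Some dynamically typed languages check an object’s suitability at runtime, based on what the object can do rather than on its declared type; this is called duck typing. Many languages also convert values from one type to another implicitly, which is known as type coercion or type juggling. For example check the following Java code: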

123 isn’t a string but it is automatically converted.

The code above is equivalent to this:

Automatic (implicit) conversion can be useful but may lead to unexpected behavior, so even with loosely typed languages it is better to use type casting when dealing with mixed types. Casting performs an explicit conversion, so we are sure that the compiler or interpreter converts the value the way we intend.

Let’s see a C code example:

In line 5 I forced the type of the number 7 to be a float; otherwise it would be treated as an integer and the printf() function would print the number 1 instead of the number 1.4 that we expect.
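The same integer-versus-float distinction can be sketched in Python. This assumes the example divides 7 by 5, which matches the 1 and 1.4 mentioned above; since Python 3’s `/` always produces a float, the `//` operator is used here to mimic division between two C integers.

```python
# Floor division keeps only the integer part,
# like 7 / 5 between two ints in C.
integer_result = 7 // 5
print(integer_result)        # 1

# Converting one operand to float first gives the fractional result,
# like (float)7 / 5 in C.
float_result = float(7) / 5
print(float_result)          # 1.4
```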

I want to show a last example in PHP that may lead to unexpected results if you don’t know what you are doing.

Consider this code:

The comment is “bad code” so there must be something wrong with it, right? 🙂 Let’s analyze the instructions. strpos() is a PHP function that returns the position of the first occurrence of a substring in a string, or false if the substring isn’t found. For example strpos(“ciao Fabio”, “ciao”) will return 0 because “ciao” is at the beginning of the string “ciao Fabio” and positions start at 0, not 1.

The code above basically says that if “xyz” isn’t contained in the string, the program has to do something.

Do you see the problem yet? If you don’t see it that’s perfectly fine, it isn’t an obvious thing, but let me tell you that PHP is naughty 😀, because with a loose comparison it considers the number 0 equal to the boolean value False and the number 1 equal to the boolean value True.

So what? Well, if the substring is at position 0 it means that the substring exists, so the code inside the curly braces shouldn’t be executed, but PHP thinks it’s being smart: for PHP, 0 is equal to False, so the code will execute.

How come? 0 is an integer number, False is a boolean value. PHP, you disappoint me. How can we prevent the problem? Simple, we need to check the type.

Wait, when I saw some PHP code, I remember seeing some strange symbols: =, ==, ===.

Aren’t they the same? 😀

If you did some programming before, you should know that “=” is used for assigning a value to a variable, while “==” is used to check whether a variable is equal to a specified value. “===” instead is used to check whether a variable is identical to a value, meaning that both value and type are checked by this operator.

In order to avoid unexpected behavior, we have to rewrite the code like this:
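In PHP the fix is the strict comparison, strpos(...) === false, which is true only when the function really returned the boolean false. As a rough sketch, the same pitfall and fix can be reproduced in Python, where 0 == False also holds; the strpos() helper below is hypothetical and only mimics PHP’s return values.

```python
def strpos(haystack, needle):
    """Hypothetical helper mimicking PHP's strpos():
    returns the index of needle, or False when it is absent."""
    index = haystack.find(needle)
    return False if index == -1 else index

text = "xyz is at the start"

# Bad: a value-only comparison. 0 == False is True in Python too,
# so a match at position 0 is mistaken for "not found".
loose_says_missing = (strpos(text, "xyz") == False)

# Good: also check the type, as PHP's === does. `is False` is only
# true for the boolean False itself, never for the integer 0.
strict_says_missing = (strpos(text, "xyz") is False)

print(loose_says_missing)    # True  (wrong: "xyz" is there)
print(strict_says_missing)   # False (correct)
```

In idiomatic Python you would simply write `"xyz" not in text`; the helper exists only to mirror the PHP behavior discussed here.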

Programming paradigms

Some of the most common programming paradigms are:

  • Imperative programming
  • Procedural programming
  • Declarative programming
  • Functional programming
  • Object-oriented programming (OOP)
  • Event-driven programming
  • Automata-based programming

We are going to take a look at the 2 most common paradigms.

Procedural programming

If you are just starting to learn programming, chances are you are using procedural programming. Procedural programming builds upon imperative programming, in which statements are executed one after another. If you only have a few lines of code (really a few) that do only 1 thing, then plain imperative programming is fine, but if you find yourself using conditionals and repeating code, then you need procedural programming in order to better manage the increasing complexity and keep your code DRY (Don’t Repeat Yourself). If you have blocks of code that perform some action, you can group them inside a function, which lets you reuse the code again and again just by calling the function whenever you need it.

Example: a program in Python that says hello and then asks me “how are you?” in Italian 5 times (in case I’m deaf).

Imperative paradigm:
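A minimal sketch of what this version might look like, reconstructed from the description above and from the greetings shown later in the article (the exact wording of the original listing is an assumption):

```python
# The same two greetings, written out by hand five times.
print("Ciao Fabio")
print("Come stai?")
print("Ciao Fabio")
print("Come stai?")
print("Ciao Fabio")
print("Come stai?")
print("Ciao Fabio")
print("Come stai?")
print("Ciao Fabio")
print("Come stai?")
```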

No, I didn’t type all those lines, I copied and pasted because I don’t want to waste time and I want my code to be efficient.

Procedural programming:

You may think: wait a sec, I’m always saying the same things. I don’t need to learn all the sentences by heart, I can just write them down on a piece of paper and hand it to Fabio, because I’m lazy and I don’t wanna learn Italian. That way I just have to remember what to do, not how to do it. Let me try to wrap the code in a function like this:
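As a sketch, with the function name `greet` chosen for illustration:

```python
def greet():
    """Say hello and ask "how are you?" in Italian."""
    print("Ciao Fabio")
    print("Come stai?")
```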


Awesome. Now let me call the function 5 times.
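Something along these lines, assuming a function named `greet` (repeated here so the snippet runs on its own):

```python
def greet():
    print("Ciao Fabio")
    print("Come stai?")

# Five greetings, one call each.
greet()
greet()
greet()
greet()
greet()
```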


That was better, but your fingers may still hurt a bit from typing that much, and if you were really speaking you might lose your voice. Let’s try to make the program a bit more efficient again by using a for loop.
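A sketch of the loop version, again assuming a function named `greet`:

```python
def greet():
    print("Ciao Fabio")
    print("Come stai?")

# The loop does the repeating, so the call is written only once.
for _ in range(5):
    greet()
```

Changing the number of greetings now means editing a single number instead of adding or removing lines.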


Object-oriented programming

Okay you greeted me, but I’m sure that if you bring some friends along they would like to do it too. I don’t know them, but if you introduce them to me, I’ll remember them. Who are all of you? People. Each of you is a person. Okay, so you belong to the category “Person”, a class. Every friend of yours is a beautiful example of a person, an instance of the class. They are able to perform some functions; all I want them to do is to greet me when they see me. Let’s see how all this turns into code.

I don’t know your name, so I’ll call you “you”.

You want to greet me?

Okay, first let’s make sure you inherit the features of a person.

Now you can greet me:

>>>Ciao Fabio

>>>Come stai?

I think also your friend Paul wants to greet me.

What about Alex? He wants to greet me as well.

See? They are all people so they know how to greet.
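The whole story above can be sketched in a few lines of Python. The class name `Person` and the method name `greet` follow the text; the `name` attribute and the variable names are assumptions made for the illustration.

```python
class Person:
    """Anyone who knows how to greet Fabio."""

    def __init__(self, name):
        self.name = name

    def greet(self):
        print("Ciao Fabio")
        print("Come stai?")

# Each friend is an instance of the Person class,
# so each one gets the greet() method for free.
you = Person("you")
paul = Person("Paul")
alex = Person("Alex")

for person in (you, paul, alex):
    person.greet()
```

The greeting is written once in the class, and every new friend only needs one line to be created.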

Now I’m not sure you learnt Italian, but you should have understood the concept of procedural and object oriented programming paradigms.

So which programming language should I choose?

You wanted a straightforward answer, didn’t you? I’m not going to tell you: learn programming language X. I’m giving you the knowledge required to make your choice.

There are hundreds of different programming languages and I encourage you to experiment with them; still, some programming languages are used far more than others, so it makes sense to focus on them.

Before you start creating a program you need to think carefully about what the program should do, divide the program into subprograms and work out whether you have the necessary knowledge to write functions that solve each problem. Is there anything that you aren’t able to do? In that case I suggest you search for online articles or courses and buy the books you need. It’s right to invest time and money in it if that’s important for you. If you think it’s too hard for you, you may develop the parts you are able to do and then ask for help online (on websites like Stack Overflow) or ask a friend. If you really can’t, then maybe the program you wanted to develop is too hard for you right now, and that’s acceptable. You can do something easier, follow a study plan and try again when you have a better understanding of the basics.

If performance is important for you, or if you need better control over your data and the underlying hardware of your computer, you may consider choosing C or C++. Java can also be a great choice: it is sometimes slower than C/C++, partly due to the JVM (Java Virtual Machine) overhead, but usually there isn’t a big difference for normal tasks. If you need to write firmware and you have memory constraints, then you may consider adding some assembly, but I’m assuming you are an experienced programmer if you are thinking about that; otherwise just skip this for now. If you are specifically targeting the Windows platform you may benefit from using the .NET Framework (see VB.NET, or C# for a C-like syntax), but I’m not telling you to do this, it’s entirely up to you. If you need the program to be cross-platform, C/C++ or Java may be a good choice. Remember that you need the JRE installed to run Java programs, whereas for C/C++, once you have built the binary for the target platform, it isn’t a problem if the machine has no C/C++ compiler.

Another possibility for making a cross-platform program is using scripting languages like Perl, Python and Ruby, or a language like Go. Perl was great and probably still is, but unless you already know it or you specifically need it, I would just learn one of the other 3. My personal preference is Python: I think it’s great and I like its syntax, but although these languages have some differences, they have many things in common. Many people like Ruby, no problem at all. Go is the newest of the three and less used, but that doesn’t mean it’s worse; on the contrary, I think that if you like it, it will be a good choice. Note that Go is a compiled language, so it doesn’t belong with the scripting languages in that respect. Remember that for interpreted languages you usually need an interpreter installed on the computer where you run the script. There are also ways to “freeze” the script into a binary that you can run without installing those dependencies. For example for Python you can use PyInstaller, cx_Freeze, py2exe (Windows only) and a few others. For packaging Ruby programs you have something like Ocra. Scripting languages were born to be used with an interpreter, and you may come across some problems when you try to package them into binaries (for example you may have problems importing some modules), but there’s usually a workaround, so don’t panic. Some people who can’t solve those problems even end up rewriting their program in Go, because its official toolchain is designed to produce standalone binaries; in my opinion, though, you should stick to the programming language you used and try to solve the problems that arise. Don’t worry, the tools I mentioned work most of the time, so just relax and have a try.

So we have seen that the choice of programming language depends on your knowledge, the need for performance and control, the ease of use of the language, and also the technologies you have to deal with. Sometimes you may also be influenced by the availability of useful libraries for one programming language that may not exist, or may be less developed, for another.

You may also have to do some web development, not only if you want to make a website, but also if you want to test one. Now, you need to be aware that the most used server-side language for web development is PHP (and PHP-based frameworks), so sooner or later you will probably need to learn it. But websites are built using many languages and technologies, so you’ll need to learn HTML, CSS and JavaScript as well.

There are web frameworks based on Python and Ruby as well, so in case you like those languages (or you have to deal with them anyway), go check them out. There are client-side languages and server-side languages. If you need to develop the frontend and backend of a website you’ll need a database as well (MySQL, MSSQL, PostgreSQL, MariaDB, MongoDB, Firebase, SQLite..), and if you can use the same language for both the frontend and the backend it’s better, so look at full-stack development. For example if you want to use JavaScript, you may use JavaScript on the frontend and Node.js on the server side. If you like Python, you can use Django; if you like Ruby, have a look at RoR (Ruby on Rails). The possibilities are endless. You can also mix languages, you just need to find out how to share data from one program to another. Data may be stored in a database, a text file, or a JSON, CSV or YAML file. If you want to expose the functionality of your web service to others, you may implement public APIs that other people can interact with. You may also benefit from using APIs other people developed. Some may be free to use while some may require a premium account. Whether you want to use them or not depends on your needs: whether they save you time and you are okay with relying on them, or whether you want to build a completely private infrastructure that you manage yourself.
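As a small illustration of sharing data between programs, here is a sketch that uses JSON as the common format. The `scan_result` dictionary and its field names are made up for the example; any structure made of dictionaries, lists, strings and numbers would work the same way.

```python
import json

# Hypothetical scan result that one tool wants to hand to another.
scan_result = {"host": "192.0.2.10", "open_ports": [22, 80, 443]}

# The producing program serializes the data to a JSON string
# (it could equally write it to a file or send it over a socket).
payload = json.dumps(scan_result)
print(payload)

# The consuming program, possibly written in another language,
# parses the same text back into its own data structures.
received = json.loads(payload)
print(received["open_ports"])
```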

All the things you read in this article apply to general programming, because at the end of the day, security tools development is just pure programming. You need to learn the basics before jumping into complex projects.


Author: Fabio Baroni   Date: 2016-09-05 11:12:04
