11 must-know data types and structures for a Python programmer

A reference list of the most useful and common data types and structures you should know for a career in the data world

Idowu Odesanmi
Nerd For Tech

Yellow files representing data in a yellow integrated circuit line
Istockphoto.com

Data types and structures are a big, big deal in tech!! 💻

You might have already heard your developer friends or a colleague at your new job talk about ‘Data Structures and Algorithms (DSA)’ as if it were an impossible peak 🗻 of computer science knowledge; I know I have.

But the aim of this post is not to discuss ‘Data Structures and Algorithms’.

In the short time since I began my journey in data science, I have come across numerous data types and data structures that have made me realise that keeping a catalogue of them would help me, and maybe you, understand them better.

First, data types are the most basic and most common classification of data, one that most developers (no, most people) interact with every day. In essence, a data type specifies the kind and nature of a piece of data.

It is easy to conflate data types with data structures. A data structure is a collection of values, possibly of different data types, organised in memory so that specific operations can be performed on them.

Without further ado, let’s get into it!! 💪

DATA TYPES

Strings

A string is a sequence of characters, visible or not, repeated or unique. Some languages distinguish primitive strings from string objects and have a separate character (char) type; in Python every string is a str object, and a single character is simply a string of length one. A string can contain any number of characters and is always enclosed in matching single, double or triple quotation marks.

>> "Hello World"  # A string
>> '12345' # Another string
>> "Start in 10 minutes" # A string

Basically, any keyboard input or text output surrounded by quotation marks is a string. We can just as well call a string a collection of characters.
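
To make the "collection of characters" idea concrete, here is a minimal sketch (the variable names are my own, purely for illustration). It shows that a Python string can be measured, indexed and sliced like a sequence, but cannot be changed in place:

```python
greeting = "Hello World"

# A string is a sequence, so it has a length and supports indexing and slicing.
length = len(greeting)        # counts every character, including the space
first_char = greeting[0]      # indexing starts at 0
last_word = greeting[6:]      # slice from index 6 to the end

# Python has no separate char type: one character is a str of length 1.
char_type = type(first_char)

# Strings are immutable: item assignment raises a TypeError.
try:
    greeting[0] = "J"
    changed = True
except TypeError:
    changed = False

print(length, first_char, last_word, char_type.__name__, changed)
```

Because strings are immutable, "changing" one really means building a new string, for example with replace() or concatenation.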

Numeric (Integer, Floating-point & Complex)

>> 1000 # an Integer
>> 1001.2 # a float
>> 100 + 2J # a complex number

As in basic maths, integers include zero and the positive and negative whole numbers (and can also be written in binary, octal or hexadecimal form); floating-point numbers are positive and negative real numbers with a decimal point preceding the fractional part; and complex numbers have real and imaginary parts, with the letter ‘j’ or ‘J’ marking the imaginary part.
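
As a quick sketch of the three numeric types (the variable names are illustrative only), the snippet below writes the same integer in four notations, asks Python for the type of each kind of number, and pulls a complex number apart:

```python
# Integer literals can be written in decimal, binary, octal or hexadecimal.
decimal_value = 255
binary_value = 0b11111111
octal_value = 0o377
hex_value = 0xFF
all_equal = decimal_value == binary_value == octal_value == hex_value

# type() reports the numeric type of a value.
int_type = type(1000).__name__
float_type = type(1001.2).__name__
complex_type = type(100 + 2j).__name__

# A complex number exposes its real and imaginary parts as floats.
z = 100 + 2j
real_part, imag_part = z.real, z.imag

print(all_equal, int_type, float_type, complex_type, real_part, imag_part)
```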

DateTime

The good ol’ DateTime 📅🕐! What can we do without you?
A datetime type is used for values that contain both date and time parts, and there are many different date and time formats out there.

>> yyyy*mm*dd HH*MM*SS --datetime
>> mmm*dd*yyyy --date
>> HH*MM --time

Unlike in relational databases and some languages, date and time in Python are not built-in data types of their own; they are represented and manipulated as objects from the standard library. To be specific, datetime is the name of the built-in Python module that supplies classes for manipulating dates and times.

date, time, datetime, timedelta, tzinfo and timezone are the six classes in the datetime module, each with methods for dealing with a wide range of scenarios.
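
Here is a small sketch of a few of those classes working together. The date and the format string are arbitrary choices of mine, not anything prescribed by the module:

```python
from datetime import date, datetime, timedelta, timezone

# Build a datetime from its parts: year, month, day, hour, minute, second.
start = datetime(2024, 1, 31, 9, 30, 0)

# strftime() renders it in a chosen format; strptime() parses text back.
text = start.strftime("%Y-%m-%d %H:%M:%S")
parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")

# timedelta represents a duration; adding it yields a new datetime.
later = start + timedelta(days=1, hours=2)

# date() and time() extract the separate date and time objects.
day_only = start.date()
clock_only = start.time()
is_date = isinstance(day_only, date)

# A timezone-aware datetime carries a tzinfo; timezone.utc is built in.
aware = start.replace(tzinfo=timezone.utc)

print(text, later, day_only, clock_only, is_date, aware.isoformat())
```

Note how adding one day to 31 January rolls over into February without any manual calendar arithmetic.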

Boolean

It’s another built-in data type in many programming languages and it takes just two values: True or False. The boolean data type is used to represent the truth values of expressions, like our day-to-day usage of mutually exclusive terms such as yes or no, right or wrong, and yin and yang.

>> 1 == 0 # comparison expression checking for equality
False
>> 25 > 18 # another comparison expression
True
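
Beyond comparisons, Python can convert any value to a boolean. The sketch below (names are illustrative) shows which common values count as false, and that bool is built on top of int:

```python
# Comparison expressions evaluate to bool values.
is_adult = 25 > 18

# bool() converts any value: 0, empty containers and None are falsy.
falsy_values = [bool(0), bool(""), bool([]), bool(None)]
truthy_values = [bool(1), bool("text"), bool([0])]

# bool is a subclass of int, so True behaves like 1 and False like 0.
bool_is_int = issubclass(bool, int)
true_count = sum([True, False, True])

print(is_adult, falsy_values, truthy_values, bool_is_int, true_count)
```

Summing a list of booleans, as in the last step, is a handy way to count how many conditions hold.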

DATA STRUCTURES I’VE COME ACROSS

Arrays

Now, bear with me for a minute here! If you’ve had any stint learning or working with the common programming languages (Python, JavaScript, Java, etc.), you’ve probably also come across arrays. Unambiguously, an array is a sequence of data items of the same type.

Beyond this definition, there is a whole host of technicalities associated with creating arrays and implementing them in different languages, about which, frankly speaking, I don’t know enough to give an opinion. What I do know is what an array means in Python, and this may apply to other common high-level languages.

In Python, an array is a data structure that stores a collection of items of one type contiguously in memory. Arrays are mutable and ordered, and their items are displayed in square brackets.

>> import array as arr
>> py_array_1 = arr.array("i", [3, 6, 9, 12]) # array from Python's built-in array module; "i" means signed int
>> print(py_array_1)
array('i', [3, 6, 9, 12])
>> print(type(py_array_1))
<class 'array.array'>

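For general use, the array module, as used above, gives easy access to the traditional homogeneous array structure in Python, while NumPy (a Python library) offers a more powerful array structure that supports multiple dimensions and comes with many special functions for processing and manipulating numerical data. NumPy arrays are homogeneous too: in the example below, mixing a string with integers makes NumPy convert every item to a string.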

>> import numpy as np
>> np_array_1 = np.array(["numbers", 3, 6, 9, 12])  # NumPy array declaration; mixed items are converted to strings
>> print(np_array_1)
['numbers' '3' '6' '9' '12']
>> print(type(np_array_1))
<class 'numpy.ndarray'>
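
To see what "same type" means in practice, here is a minimal sketch using only the built-in array module (the variable names are mine). An array accepts new items and indexing like a list, but rejects a value that does not match its type code:

```python
import array as arr

numbers = arr.array("i", [3, 6, 9, 12])

# Arrays are mutable and ordered: items can be appended and indexed.
numbers.append(15)
second_item = numbers[1]

# The type code "i" (signed int) is enforced: other types are rejected.
try:
    numbers.append(1.5)
    accepted_float = True
except TypeError:
    accepted_float = False

print(numbers, second_item, accepted_float)
```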

Lists

Yeah, not that kinda list!
Like an array, a list holds a collection of items that are bordered by square brackets. Lists differ from arrays in that the items don’t have to be homogeneous (the same type), and while NumPy arrays can have multiple axes, a list is essentially one-dimensional (although lists can be nested inside one another).

>> my_list = [3, 'friends', 9.5, 'blog', [0,1,1], 'binary'] # named my_list so the built-in list is not shadowed
>> print(my_list)
[3, 'friends', 9.5, 'blog', [0, 1, 1], 'binary']

Lists are mutable and ordered, their items do not need to be unique, and they can easily be nested.
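
Each of those properties can be checked in a few lines. This is only a sketch with made-up values:

```python
items = [3, "friends", 9.5, "blog", [0, 1, 1], "binary"]

# Mutable: elements can be replaced, appended and removed in place.
items[0] = 4
items.append("friends")       # duplicates are allowed
items.remove("blog")          # removes the first matching element

# Ordered: positions are stable, and negative indexes count from the end.
last_item = items[-1]

# Nested: index twice to reach an element of the inner list.
inner_value = items[3][2]

print(items, last_item, inner_value)
```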

Sets

Perhaps set theory from your high school days came to mind when you saw sets. Yes, it’s the same set. We generally define a set as a well-defined collection of distinct (or unique) members (or elements).

Furthermore, sets are unordered and mutable, but each member must itself be immutable (strictly speaking, hashable). This means mutable objects such as dictionaries and lists cannot be elements of a set.

In Python, sets are defined with curly brackets {} around their members or with the built-in set() function. Note that an empty pair of curly brackets creates an empty dictionary, so an empty set must be written as set().

>> set([1,2,2,4,5,0,0]) # defining a set; duplicates are dropped
{0, 1, 2, 4, 5}
>> x = 'floooowwww'
>> set(x) # the display order of the members may vary between runs
{'w', 'o', 'f', 'l'}
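
The high school operations carry over directly. Below is a minimal sketch (the set names and values are my own) of union, intersection and difference, plus the rule that members must be hashable:

```python
evens = {0, 2, 4, 6}
small = {0, 1, 2, 3}

# Classic set-theory operations.
union = evens | small
intersection = evens & small
difference = evens - small

# Mutable: add() inserts a member; adding a duplicate changes nothing.
evens.add(8)
evens.add(2)

# Members must be hashable, so a list is rejected but a tuple is fine.
try:
    bad = {[1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False
pairs = {(1, 2), (1, 2)}

# An empty pair of curly brackets is a dict, not a set.
empty_type = type({}).__name__

print(sorted(union), sorted(intersection), sorted(difference))
print(len(evens), list_allowed, len(pairs), empty_type)
```

The results are passed through sorted() before printing because a set makes no promise about the order of its members.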

Dictionary

You remember how you search for the meaning of words by looking them up in a dictionary, like a key to a doorway? A dictionary data structure is not so different. We store data in a dictionary using an immutable key that is mapped to a specific value.

Keys can be of any immutable (hashable) data type, but the values can be anything. It helps to think of a dictionary as a collection of key:value pairs because that is exactly how it is displayed as output.

Note that unlike sequences (such as arrays, lists and tuples), where indexing is done with integers, dictionaries are indexed using their keys.
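
Here is a minimal sketch of a dictionary in action. The keys and values are invented for illustration; the point is the lookup by key, the in-place updates, and the restriction on what can serve as a key:

```python
# Keys map to values; values can be of any type, including other containers.
profile = {"name": "reader", "posts": 11, "tags": ["python", "data"]}

# Look up by key rather than by integer position.
name = profile["name"]

# get() returns a default instead of raising KeyError for a missing key.
country = profile.get("country", "unknown")

# Mutable: assign to a key to update or add an entry.
profile["posts"] += 1
profile["country"] = "somewhere"

# Keys must be hashable: a tuple works, a list raises TypeError.
grid = {(0, 0): "origin"}
try:
    grid[[1, 1]] = "invalid"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(name, country, profile["posts"], list(profile), list_key_allowed)
```

Since Python 3.7, a dictionary also remembers the order in which its keys were inserted, which is why list(profile) ends with the newly added key.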

Series

Although it is not a built-in Python structure, the Series is as useful and popular as any, thanks to the pandas library. A series is more or less a table column with a label and an index. Technically, it is a one-dimensional ndarray with axis labels that can hold any data type, including time-series data.

>> import pandas as pd
>> d = {'idowu': 1, 'reader': 2, 'data': 3, 'partition': 'four', 'a': 5, 'b': 6, 'c': 7}
>> ser = pd.Series(data=d)
>> ser
idowu           1
reader          2
data            3
partition    four
a               5
b               6
c               7
dtype: object

Series elements can be easily accessed using integer-based or label-based indexing. All kinds of basic operations can be performed on a series depending on the data types, but most people rely on the robust pandas series methods to manipulate series data.
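
The two styles of access look like this in practice. This is a small sketch that assumes pandas is installed and uses a shortened, all-integer version of the data above:

```python
import pandas as pd

ser = pd.Series({"idowu": 1, "reader": 2, "data": 3})

# Label-based access uses the index labels.
by_label = ser.loc["reader"]

# Integer-based access uses positions, starting at 0.
by_position = ser.iloc[0]

# Basic operations are vectorised: they apply to every element at once.
doubled = ser * 2

# Series methods cover common summaries.
total = ser.sum()

print(by_label, by_position, doubled.tolist(), total)
```

Using .loc and .iloc explicitly keeps it obvious whether a lookup is by label or by position.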

DataFrames

In my opinion, this is probably the most thoughtful and intuitive data structure out there. A pandas dataframe is a table with rows and columns, like in Excel. Whenever I see a dataframe, I immediately think of an Excel table.

To be technical again, the official pandas documentation describes a DataFrame as ‘two-dimensional, size-mutable, potentially heterogeneous tabular data’ that ‘can be thought of as a dict-like container for Series objects’.

For more information on how to construct dataframes, perform operations on them or access their methods, kindly visit the documentation linked in the paragraph above.
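
To illustrate the "container for Series" idea, here is a minimal sketch that assumes pandas is installed. The column names and values are made up for the example:

```python
import pandas as pd

# Each column is a Series; the DataFrame aligns them on a shared index.
names = pd.Series(["a", "b", "c"])
scores = pd.Series([5, 6, 7])
df = pd.DataFrame({"name": names, "score": scores})

# Selecting a column by its key returns a Series, much like a dict lookup.
score_column = df["score"]
column_is_series = isinstance(score_column, pd.Series)

# Size-mutable: a new column can be added by assignment.
df["double"] = df["score"] * 2

# Two-dimensional: shape is (rows, columns).
shape = df.shape
total = int(df["double"].sum())

print(df)
print(column_is_series, shape, total)
```

The "name" column holds text while "score" and "double" hold integers, which is what "potentially heterogeneous" refers to: each column keeps its own data type.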

GeoDataFrames 🌐

There are data structures I refer to as “exotic data structures”. I call them exotic because they’re used in specific fields and niches, so most people might never come across them.

In this post, I’ll only mention geopandas GeoDataFrames, mainly because they build on the knowledge from pandas DataFrames.

Built to add support for geographical data to DataFrames, a geopandas.GeoDataFrame object is a pandas.DataFrame object that has a column of geometry data, usually together with a Coordinate Reference System (CRS) that says how the coordinates relate to places on Earth.

It is generally used to add and process spatial (geographical) information in python. Some satellite imagery 🌍 data, map data and so on are stored this way.

Honourable mentions include tuples, graphs, trees, maps and so on. For a more comprehensive understanding of data types and structures, I would suggest consulting an excellent Data Structures and Algorithms (DSA) text.

The data types and structures I have listed here are by no means exhaustive, but I can assure you that you can’t do without them in your journey to becoming a seasoned developer.

Which other popular data type or data structure do you think is missing from the list? Let me know in the comments!!

Thanks for reading!!!

Idowu Odesanmi
Nerd For Tech

Technical Writer and Developer— But sometimes, I dabble in territories unknown.