Do you remember how, in one of my recent posts, I talked about my interest in data analysis? Well, it is a topic I am particularly passionate about, because data, when used well, can become a real driver of continuous improvement.
One area where data analysis proves particularly useful is in corporate communications: understanding the context in which we operate and the behaviour of our audience allows us to optimise our texts, communication strategies and interactions with customers.
In other words, data not only tells us what is happening, but also shows us how to act accordingly.
When it comes to analysing data, there are many possibilities. Some rely on tools such as Excel or Google Analytics to gain useful insights. I chose Python, an incredibly versatile programming language that is suitable for performing analyses quickly and effectively, making it perfect for those who, like me, want to transform numbers and statistics into useful information and concrete actions.
With the help of Google Colab, I examined in detail the performance of the content we published on social media, also comparing the trend with that of our competitors. This allowed me to clearly understand what was working and what was not, helping me to refine our communication strategies in a much more targeted and informed way.
In addition to its simple and intuitive syntax, which makes it accessible even to those without an advanced technical background, Python also offers a large number of specialised libraries, such as Pandas, NumPy and Matplotlib. These make it incredibly flexible and adaptable to any type of analysis, allowing you to move easily from importing and manipulating data to visualising and modelling it.
Based on our experience of using it, let's take a closer look at how these libraries can be used to optimise communication and guide strategic decisions.
Libraries should be thought of as sets of ready-to-use tools developed by others to simplify certain coding tasks.
The analysis covered content performance and comparison with competitors. Two of the libraries mentioned above proved particularly useful for this type of work: Pandas and Matplotlib, which, although different in purpose, complement each other perfectly, allowing raw data to be transformed into useful information and strategic actions. The former makes it possible to manage and analyse data efficiently, facilitating operations such as filtering, aggregating and transforming information. The latter provides a wide range of options for creating customised visualisations: through its graphs and charts, results can be represented intuitively, making it easier to spot trends, anomalies and opportunities for improvement in content performance.
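To give a feel for how the two complement each other before getting into the real analysis, here is a minimal sketch with invented data (the channel names and figures below are purely illustrative): Pandas aggregates the raw records, and Matplotlib turns the result into a chart.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Invented post-level data, purely for illustration
df = pd.DataFrame({
    'channel': ['LinkedIn', 'LinkedIn', 'Instagram', 'Instagram'],
    'clicks': [120, 95, 210, 180],
    'reactions': [30, 22, 75, 60],
})

# Pandas: aggregate the raw records by channel
totals = df.groupby('channel')[['clicks', 'reactions']].sum()

# Matplotlib (via the DataFrame's plotting interface): turn the table into a chart
totals.plot(kind='bar')
plt.ylabel('Total')
plt.title('Engagement by channel')
plt.tight_layout()
plt.show()
```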
As with any technical analysis, before beginning to explore and analyse data, it is essential to have a clear understanding of the process you intend to follow to achieve your goal. This allows you to maintain focus and optimise time, avoiding unnecessary steps or distractions. For this reason, using a visual collaboration tool such as Miro, I have developed a model consisting of several steps that allows me to follow a precise and consistent guideline, ensuring that each stage of the process is well defined and functional:

1. The pipeline begins with the installation and import of the libraries needed for data analysis, which are essential for loading data quickly and optimising our analyses.
2. This is a strategic phase that serves to understand the context in which the analysis will take place. Before actually working with the data, it is therefore necessary to ask yourself: what are the objectives? What do I want to find out?
3. This phase involves the practical preparation of the data: before importing it, it is extremely important to assess the quality and structure of the available datasets, checking whether the data is complete and reliable (see the sketch just after this list).
4. It is essential to check whether the variables or columns are well defined; for this reason, we identify the relevant variables, deciding which characteristics or columns of the dataset are needed for the analysis and which are not. Sometimes this means excluding data considered irrelevant or superfluous for the specific objective: in our case, it was inevitable to exclude from the analysis some columns containing data that was useless for our purpose (see point 5).
5. At this stage, we import the Pandas library, which is essential for filtering and cleaning the data, preparing it for the next stage: the manipulation and analysis of well-structured data.
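As a minimal sketch of the checks described in steps 3 and 4 (the file name export.csv is a placeholder, not one of our real datasets), Pandas makes this kind of inspection straightforward:

```python
import pandas as pd

# 'export.csv' is a placeholder for one of the exported datasets
raw_df = pd.read_csv('export.csv')

# Structure: column names, dtypes and non-null counts
raw_df.info()

# Completeness and reliability: missing values and duplicated rows
print(raw_df.isnull().sum())
print(raw_df.duplicated().sum())

# A first look at the actual values
print(raw_df.head())
```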
```python
# Data manipulation
import pandas as pd
```

After that, the data is imported via Pandas and organised into a DataFrame, which allows it to be structured in a similar way to a relational database table, with rows and columns. This structure facilitates visualisation and prepares the data for the next stage, which involves representing it on graphs.
```python
df_competitor = pd.read_csv(
    'COMPETITOR_ANALYSIS.csv', header=1
)
print(df_competitor)
```

```python
df_content = pd.read_csv(
    'Content_Analysis.csv', header=1
)
print(df_content)
```

After thoroughly understanding the variables, for the second, content-focused analysis it was necessary to remove the columns considered superfluous, particularly those relating to sponsorships, in order to avoid redundant data.
```python
df_content = df_content.drop([
    'Impressioni (sponsorizzate)',
    'Clic (sponsorizzati)',
    'Reazioni (sponsorizzate)',
    'Commenti (sponsorizzati)',
    'Diffusioni post (sponsorizzate)',
    'Percentuale di interesse (sponsorizzato)'],
    axis=1
)
```

Once you have obtained the right data, and after manipulating and modelling it, it is time to visualise it in order to draw clear conclusions and accurate interpretations. This is where Matplotlib comes into play: an essential library for creating detailed, customised graphs that represent information visually and allow you to give free rein to your creativity in the presentation of data, whether in bar charts, pie charts, scatter plots, line graphs, etc.

```python
# Data visualisation
import matplotlib.pyplot as plt
```
```python
# Bar chart
plt.figure(figsize=(6, 4))
plt.bar(df_competitor['Competitor'], df_competitor['Total Interactions'], color='lightcoral')
plt.xlabel('Competitor')
plt.ylabel('Total Interactions')
plt.title('Total Interactions per Competitor')
plt.xticks(rotation=45)
plt.show()
```

In this specific case, to compare the total interactions of the various competitors in our dataset, we used a bar chart, which is ideal for clearly visualising differences between categories. The bars make it possible to see at a glance who obtained the most interactions and to make direct comparisons.
The same criterion was adopted for the other categories (such as Followers and Posts). Thanks to the use of distinct colours and well-defined labelling on the x-axis, it was easy to identify competitors and quantify interactions, as well as other categories of interest, allowing for immediate insights.
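By way of illustration, applying that same criterion to another category is just a matter of swapping the column (this assumes the competitor dataset includes a 'Followers' column, consistent with the categories mentioned above):

```python
# Same bar-chart pattern applied to another category;
# assumes the competitor dataset includes a 'Followers' column
plt.figure(figsize=(6, 4))
plt.bar(df_competitor['Competitor'], df_competitor['Followers'], color='skyblue')
plt.xlabel('Competitor')
plt.ylabel('Followers')
plt.title('Followers per Competitor')
plt.xticks(rotation=45)
plt.show()
```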
For content analysis, the “Date” column was converted to datetime format to efficiently manage dates in graphs. After selecting the columns of interest, previously filtered and renamed using Pandas, a multiple graph with subplots was created to display the different metrics on separate graphs, but arranged in a single figure. The size of the figure was set based on the number of rows, multiplying the height of each row by 5 units to ensure sufficient space for clear visualisation.
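The renaming step mentioned above does not appear in the original snippets; as a sketch, it could look like the following (the Italian header names in the mapping are assumptions, inferred from the sponsored variants shown earlier):

```python
# Hypothetical sketch of the renaming step: the Italian header names
# below are assumptions, inferred from the sponsored variants shown earlier
df_content = df_content.rename(columns={
    'Impressioni (organiche)': 'Organic Impressions',
    'Impressioni (totali)': 'Total Impressions',
    # ...and so on for the remaining columns
})
```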
```python
# Convert the 'Date' column to datetime
df_content['Date'] = pd.to_datetime(df_content['Date'])

# Select the columns of interest
selected_cols = [
    'Organic Impressions',
    'Total Impressions',
    'Unique Organic Impressions',
    'Organic Clicks',
    'Total Clicks',
    'Organic Reactions',
    'Total Reactions',
    'Organic Comments',
    'Total Comments',
    'Post Diffusions (organic)',
    'Post Diffusions (total)',
    'Percentage Of Interest (organic)',
    'Percentage Of Interest (total)'
]

# Define the number of subplot rows based on the number of selected columns
n_cols = 3
n_rows = (len(selected_cols) + n_cols - 1) // n_cols  # ceiling division

fig, axes = plt.subplots(n_rows, n_cols, figsize=(15, 5 * n_rows))
```

Next, a graph is created for each metric: the selected columns are enumerated and, through a loop, a graph is generated for each of them.
```python
for i, col in enumerate(selected_cols):
    ax = axes.flatten()[i]
    ax.plot(df_content['Date'], df_content[col], marker='o', linestyle='-')
    ax.set_title(col)
    ax.set_xlabel('Date')
    ax.set_ylabel('Value')
    ax.grid(True)
    ax.tick_params(axis='x', rotation=45)
```

Using the axes.flatten() method, the matrix of axes is flattened into a one-dimensional array, allowing easy access to each subplot via a single index.
For each subplot, the corresponding metric is plotted along the Date (x) axis and the selected metric (y), adding a circular marker for each point and a line connecting them.
Next, each graph is customised with a title corresponding to the name of the metric, with labels for dates and values, and a grid to improve readability. Finally, the date labels are rotated 45 degrees to avoid overlap.
To exclude empty graphs, subplots without data are removed, and the layout is optimised to automatically reduce the white space between graphs, ensuring that everything is clearly visible and legible before displaying the complete graph with plt.show():
```python
# Remove the unused subplots and tidy up the layout
for j in range(i + 1, n_rows * n_cols):
    fig.delaxes(axes.flatten()[j])

plt.tight_layout()
plt.show()
```

This made it possible to display each metric clearly on a separate graph, facilitating direct comparison between the different performances over time and keeping the analysis as orderly and understandable as possible.
Every company has its own method of analysis. In our experience, an excellent analysis, capable of extracting concrete values from data in an immediate and intuitive way, is based on these recommendations: