CETM50 ASSIGNMENT 2
Name: Istvan Franko
Programme: MSc Cybersecurity
Project Title: Analysis and Report
1. Discussion of Big Data and Cyber Security issues
a. Big Data
b. Cyber Security
Internal attacks and tasks
External threats and prevention
2. Discussion of Legal, GDPR, IRM and BI issues
a. Legal Regulations
b. General Data Protection Regulation (GDPR)
c. Information Risk Management
d. Business Intelligence
3. Summary
4. References
5. Appendix
Analysis of the Data
1. Who is the most frequent customer(s)
2. Which is the least profitable year in terms of money
3. What is the most profitable product(s) in terms of money in 2016
4. What is the best postcode(s) in terms of money in 2016
5. What is the least popular product in terms of frequency of sales in 2016
6. Use of ggplot2 to provide 2-3, graphs or charts etc
Customer advice – Report
Information technology (IT) has been the most spectacularly developing field of recent decades. Companies that realized the potential of data storage and IT networks have become world leaders, and today they dominate many areas of trade. Those who are unable or unwilling to follow these trends are at a great disadvantage compared to their competitors. That is why every trading company needs to keep abreast of developments and keep up with the latest technology. In this proposal, we review the services currently available, the underlying legal regulations and the security measures your company needs in order to compete effectively.
1. Discussion of Big Data and Cyber Security issues
In this first chapter, we review the available data storage options and their security implications.
a. Big Data
Big Data refers to data sets whose size or traffic exceeds the capabilities of a home computer. Looking at size alone, the memory and storage capacity of a home computer would in most cases be sufficient for the needs of a small or medium-sized business, and even commonly used servers do not have significantly larger hardware. However, managing the data and the traffic requires special software and deployment. For a company like yours, it is advisable to store the data in a structured way, because the data items are related to each other. Therefore, we strongly recommend that you use a server and develop a data management and storage strategy. But first, let us look at the advantages and disadvantages of Big Data (ResearchGate, 2014).

Pros of Real-Time Big Data
- In many ways, it is more economical than traditional manual data storage and management, and it significantly reduces spending.
- Makes smaller companies competitive with large companies.
- Makes data analysis much easier, so you can follow trends and local conditions.
- Data can be accessed from anywhere and used for online sales and marketing.
- Connects applications that serve different purposes.
Cons of Real-Time Big Data
- Traditional databases should be digitized and stored in new systems, which can be costly.
- Due to the large number of data formats and structures, some applications may be incompatible.
- Systems are exposed to malicious attacks.
- Data loss or damage can cause serious harm to a company.
- Due to their size and complexity, the data are not always easy to analyse.
- Rapid development requires constant monitoring of system updates.
It is clear from the above lists that each advantage comes with a corresponding disadvantage. However, the disadvantages can be minimized with careful planning, professional design and regular maintenance, while the benefits are essential for any trading company.
Data Management
As mentioned at the beginning, our data should be stored on servers. This sounds simple, but it is more complex than it first appears.
There are several options for anyone who needs a server. The first question to decide is where the server will be hosted. While a few years ago many companies ran their own physical servers in their own server rooms, the situation has changed significantly. Cloud storage has become so widespread and cost-effective that most companies are moving their data to such services. There are several reasons for this. First and foremost, owning a dedicated server requires considerable investment and ongoing maintenance. The large amount of heat it produces must be removed by cooling, which further increases costs. Energy is also significantly more expensive in European countries than in America or Asia. Cloud servers, by contrast, are not tied to a fixed local site: they are part of a large, distributed system that exploits economies of scale and can therefore provide a much cheaper service.
In the cloud, you can either rent your own physical server or request a virtual server, depending on the size and data traffic of the company. A virtual server is perfectly adequate for your company at present, and one can be rented for as little as $5 a month. This covers only the hardware, even if it is virtual; we also need software. Although Windows is widely used on desktop computers, it is not the usual or recommended choice for this kind of web server. UNIX-like operating systems are commonly used to run servers, and Linux is available both as free distributions and as commercial versions where the vendor guarantees the operation of the services. The free versions are fully functional and can be operated securely. Hosting providers also offer several free distributions, such as Debian, which can be installed pre-configured in a few seconds at the push of a button. However, such a bare operating system does not normally provide the network services a web shop needs on its own, so we also need a control panel. Several products are available. The most commonly used commercial software is cPanel, which contains all the necessary services. A good alternative might be the free ISPManager, which, while not fully featured, has everything you need to run an average server.
With these in place, you have a server with an operating system and a control panel. The next requirement for running an online business is a content management system (CMS), software that makes it easy to manage the content of our websites. Examples include Joomla and WordPress, both of which are free. These systems can also be extended with a built-in online store. VirtueMart, for example, is such a web shop for Joomla; it is free by default, but many add-ons, such as email mailing list management and sending software, require a subscription or a one-time fee.
In addition to the above, you also need a registered domain name to access your store. The typical annual fee is around $10-15, depending on the extension you choose. In this section, we have described the most cost-effective solution: the total cost of these services is less than $20 per month. You do need a specialist to install and configure them, but once they are running, no IT skills are needed to operate them. (The author of this proposal has been running such systems for over 15 years and is able to handle the online sales tasks of a company of similar size.)
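The monthly figure above can be checked with a short calculation, written here in R, the language used in the Appendix. This sketch adds up only the prices quoted in this section (a virtual server at about $5 per month and a domain at $10-15 per year) and counts the free software as costing nothing. The `addons_per_month` parameter is a hypothetical placeholder for paid extras such as a commercial control panel or VirtueMart add-ons.

```r
# Rough monthly running cost of the low-cost web shop setup.
# Assumptions: only the prices quoted in the text are included; the free
# software (Debian, ISPManager, Joomla, VirtueMart core) costs $0, and
# addons_per_month is a hypothetical allowance for paid extras.
monthly_cost <- function(server_per_month = 5,
                         domain_per_year = 15,
                         addons_per_month = 0) {
  # Spread the annual domain fee evenly over 12 months
  server_per_month + domain_per_year / 12 + addons_per_month
}

monthly_cost()                      # 5 + 15 / 12 = 6.25 (upper domain price)
monthly_cost(domain_per_year = 10)  # about 5.83 (lower domain price)
```

Even at the upper domain price, the base cost is about $6.25 a month, which leaves roughly $13.75 of the quoted $20 ceiling for paid add-ons.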

(Hosting UK, 2020)
A more costly solution is to rent a complete online trading system. There are countless offers to choose from. In our experience, however, these services always lack something that may be important to us, and so far we have not found one that contains everything we want to use. The biggest drawback of these ready-made systems is that it is difficult to find one that suits your individual needs, and the handling of any error or special request depends entirely on the quality of the provider's support. Overall, we do not recommend renting or using such a web shop.
The third option is to join a large online marketplace such as Amazon or eBay. These marketplaces offer a shop service that can be easily opened and operated. Their benefits are:
- High visitor traffic
- Easy operation
- Built-in analysis tools
Disadvantages:
- Per-transaction and listing fees
- Limited individual marketing opportunities
- Limited customer communication
- Generally over-regulated framework
(Ecommerce News Europe, 2020)
The fourth and most expensive solution is to develop custom software. This is only necessary if you have a special requirement that none of the three options above can meet. Since your company only needs what an average trading business needs, this solution is not recommended.
As this chapter shows, managing data is not only about the data itself. Although the data is the most valuable asset, it must also be stored and managed. The services described here are required to store our information securely and make it accessible to customers and employees in a manageable format.
b. Cyber Security
Cyber security is a very complex task that covers all aspects of IT systems. It is not enough to build a system once; it must be constantly monitored and updated. Our systems also need to be protected from both external and internal attacks. This chapter describes the possible attacks and the methods of defending against them.
Internal attacks and tasks
First of all, it must be clarified who is responsible for building and operating the cyber security system. In a company of this size, with several stores, it is essential to appoint a person responsible for the security of the IT systems. Building a complete system is not a one-person job, so it is advisable to entrust it to a certified specialist company. At the company's current size, operation can be handled by a permanent full-time employee with at least a mid-level specialist qualification, who can oversee the systems provided their duties are well defined and followed. However, in the event of a major attack or modification, it is advisable to engage a larger company again.
The first and most important step in preventing possible attacks is to build a good system. The most necessary tasks are:
- Protecting the entire IT network. A well-built, properly configured and encrypted network is very difficult to break into.
- Protecting all networked computing devices. Not only the server and client machines must be protected by secure operating systems, firewalls and virus protection, but every device: the target of an attack could be a barcode reader or any intelligent sensor with its own software.
- Only genuine software from an authorized distributor should be used, and it should be updated regularly. Software obtained from unauthorized sources and modified with malicious content is a common cause of cyber-attacks.
- Specify which tasks are required on each machine and allow only those. All unnecessary services should be disabled.
- Many attacks come not from outsiders but from internal employees, so you have to monitor who did what and whether they were entitled to do it.
- All employees should be trained in the safe operation of the system, and their knowledge should be checked periodically.
- You must also be prepared for device failure. Regular backups and synchronized storage of data in multiple locations are therefore required, so that the failure of one storage medium does not cause data loss.
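Monitoring who did what, and whether they were entitled to it, comes down to comparing an activity log with a list of permissions. The R sketch below assumes a hypothetical permission table and access log, each with `user` and `action` columns; a real system would export these from its own logging, and all names here are illustrative only.

```r
# Hypothetical permission table: which user may perform which action
permissions <- data.frame(
  user   = c("alice", "alice", "bob"),
  action = c("read_orders", "edit_prices", "read_orders")
)
# Hypothetical access log exported from the system
access_log <- data.frame(
  time   = as.POSIXct(c("2020-01-20 09:00:00", "2020-01-20 09:05:00",
                        "2020-01-20 09:10:00"), tz = "UTC"),
  user   = c("alice", "bob", "bob"),
  action = c("edit_prices", "read_orders", "edit_prices")
)

# Return the log entries that have no matching user/action permission.
unauthorized_actions <- function(log, perms) {
  key_log   <- paste(log$user, log$action, sep = "|")
  key_perms <- paste(perms$user, perms$action, sep = "|")
  log[!key_log %in% key_perms, , drop = FALSE]
}

unauthorized_actions(access_log, permissions)
```

Here only Bob's price change is flagged, because Bob is only allowed to read orders. In practice, such a check would run regularly on the exported logs.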
External threats and prevention
Many attacks, however, come from malicious third parties we do not know. Therefore, it is necessary to analyse what attacks can be expected and how we can prepare for them. The tasks involved are:
- Minimize the customer data collected. Only the data needed for actual transactions should be collected.
- All incoming digital data, such as emails or other documents, should be screened for malicious software.
- The system must be protected against automated software (bots), so that only real people can shop. This can be achieved, for example, by monitoring IP addresses, temporarily blocking access after failed logins, or using CAPTCHA.
- Our IT assets must be secured against physical access by outsiders and must be stored and used safely in restricted areas.
- All IT devices that leave our buildings must have software that encrypts the contents of all their data storage, to prevent unauthorized access to data in the event of loss or theft. These devices include mobile phones, laptops and portable storage devices such as USB drives.
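The blocking after bad logins mentioned above can be sketched as a simple rule: block an IP address once it has too many failed attempts within a recent time window. In the R sketch below, the threshold of 5 failures, the 15-minute window and the example log are hypothetical values chosen for illustration; the IP addresses come from the ranges reserved for documentation.

```r
# Decide whether an IP address should be temporarily blocked, based on the
# number of failed logins within a recent time window. The defaults
# (5 failures, 15 minutes) are hypothetical example values.
should_block <- function(attempts, ip, now,
                         max_failures = 5, window_mins = 15) {
  recent_failures <- attempts$ip == ip &
    !attempts$success &
    attempts$time > now - window_mins * 60  # POSIXct arithmetic in seconds
  sum(recent_failures) >= max_failures
}

# Hypothetical login log
now <- as.POSIXct("2020-01-20 12:00:00", tz = "UTC")
attempts <- data.frame(
  ip      = c(rep("203.0.113.7", 6), "198.51.100.2"),
  time    = now - c(60, 120, 180, 240, 300, 3600, 30),
  success = c(rep(FALSE, 6), TRUE)
)
should_block(attempts, "203.0.113.7", now)   # TRUE: 5 failures in 15 minutes
should_block(attempts, "198.51.100.2", now)  # FALSE: no failures
```

Because attackers can switch IP addresses, a rule like this works best alongside CAPTCHA rather than on its own.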

This chart illustrates the tasks required to prevent attacks and restore our data (Ray, 2018). The tasks listed in this chapter fit the diagram, but no list can be exhaustive in this constantly changing field.
2. Discussion of Legal, GDPR, IRM and BI issues
Data management is governed by a number of laws. Failure to comply with them can result in severe penalties, which can be fatal for a business. That is why everyone in the company needs to know them, not just the managers. Digital data also has to be understood in the areas of risk analysis and enterprise management software. We describe these in this chapter (IT Governance, 2019).
a. Legal Regulations
Data protection has long been important and regulated in the UK. The European Union Data Protection Directive 95/46/EC of 1995 was implemented in the UK by the Data Protection Act 1998. Under that regime, the highest penalty was £500,000. Due to the increasing penetration of IT, new regulations were needed. The EU adopted its latest regulation, the General Data Protection Regulation (GDPR), which the UK supplemented with the Data Protection Act 2018 (DPA). Under the new law, fines of up to €20,000,000 or up to 4% of the company's total worldwide annual turnover, whichever is higher, can be imposed.
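Under the GDPR, the two limits apply as whichever is higher, so the €20 million figure is the effective ceiling for small and medium-sized companies, and the 4% figure only takes over above €500 million of turnover. The R sketch below computes this statutory maximum for hypothetical turnover figures; it says nothing about the fine an authority would actually impose, which is decided case by case.

```r
# Statutory maximum GDPR fine for the most serious infringements:
# EUR 20 million or 4% of total worldwide annual turnover, whichever is higher.
gdpr_max_fine <- function(annual_turnover_eur) {
  pmax(20e6, 0.04 * annual_turnover_eur)
}

# Hypothetical turnovers: EUR 2 million, EUR 500 million, EUR 1 billion
gdpr_max_fine(c(2e6, 500e6, 1e9))  # 2e7, 2e7, 4e7
```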
The United Kingdom has additional privacy laws. The Privacy and Electronic Communications Regulations (PECR), commonly known as the "cookie law", came into force in 2003 and regulate electronic marketing, communications and the use of cookies. As these rules have become outdated, the EU is working on a replacement, the ePrivacy Regulation (ePR), but it has not yet entered into force, so PECR still applies.
The laws listed are not identical, but much of their content overlaps. If we know and comply with the provisions of the latest law (GDPR), we are unlikely to make a serious mistake, so we explain its main rules here.
b. General Data Protection Regulation (GDPR)
The GDPR is a European Union regulation (Regulation (EU) 2016/679) that governs the processing of the personal data of individuals. It protects individuals in the Union and has applied since 25 May 2018. The regulation clearly defines what organizations must do when handling personal data, including its collection, storage, retrieval, alteration and destruction.
Examples of personal information include:
- Names
- Titles
- Personal identification numbers, such as tax number, passport number
- Electronic identifiers such as email, IP address, customer number
- Bank details
as well as sensitive (special category) personal data:
- Sexual orientation
- Religion
- Political affiliation
- Health records
In short, any data that can be linked to an identifiable individual.
The most important rules that organizations must follow are:
- processing must be lawful, fair and transparent
- data may be processed for specified purposes only
- only data that is relevant and necessary for the task may be collected, and it may be kept only as long as absolutely necessary
- data must be kept secure and must not be passed on to third parties without a lawful basis
- individuals must be able to access, correct or request the deletion of their data
- personal data breaches must be reported to the supervisory authority (in the UK, the Information Commissioner's Office) without undue delay and, where feasible, within 72 hours of becoming aware of them
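Because the 72-hour limit runs from the moment the organization becomes aware of the breach, it helps to record that moment and compute the deadline straight away. The R sketch below does this with base R date-times; the example time is hypothetical, and the function counts elapsed clock hours, weekends included.

```r
# Latest time to notify the supervisory authority of a personal data breach:
# 72 elapsed hours after the organization became aware of it.
breach_deadline <- function(aware_at) {
  aware_at + 72 * 60 * 60  # POSIXct arithmetic is in seconds
}

# Hypothetical example: breach discovered on a Monday afternoon (UTC)
aware <- as.POSIXct("2020-01-20 14:30:00", tz = "UTC")
format(breach_deadline(aware), "%Y-%m-%d %H:%M")  # "2020-01-23 14:30"
```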
c. Information Risk Management
One of the key elements of data processing is risk management, which is a must for any company. You have to be prepared for all possible events, even those with a very low chance of occurring. The main tasks of risk management are:
- Identification
- Raising awareness
- Recognition
- Reaction
- Inspection
All stages of risk management should be documented. Identification can help to draw lessons from events that have already taken place, but you should also be prepared for events that have not yet occurred but may occur. Events can be grouped by likelihood of occurrence:
- Almost certain
- Likely
- Possible
- Unlikely
- Rare
Events can also be grouped by the severity of the damage:
- Catastrophic
- Major
- Moderate
- Minor
- Insignificant
What these levels mean in practice varies from company to company, so it is a good idea to define what each level represents for our company, for example as a range of financial loss.
From the above two groupings, the following table can be compiled (Latest Quality, 2019):

For each risk, the analysis must also decide how it will be treated. The options include:
- Transfer (move to a third party)
- Reduce
- Avoid
- Ignore
- Removal
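A risk heat map combines the likelihood and impact groupings above. A common convention, assumed here, is to score both scales from 1 to 5 and use their product as the risk score. The R sketch below builds that table and then maps scores to three of the treatment options listed; the cut-off values of 15 and 8 are hypothetical, and each company would set its own.

```r
# Likelihood and impact scales from the lists above, scored 1 (lowest) to 5.
likelihood <- c("Rare" = 1, "Unlikely" = 2, "Possible" = 3,
                "Likely" = 4, "Almost certain" = 5)
impact <- c("Insignificant" = 1, "Minor" = 2, "Moderate" = 3,
            "Major" = 4, "Catastrophic" = 5)

# Heat-map table: risk score = likelihood x impact for every combination
risk_matrix <- outer(likelihood, impact)

# Hypothetical treatment rule; the cut-offs 15 and 8 are assumptions.
suggest_treatment <- function(l, i) {
  score <- likelihood[[l]] * impact[[i]]
  if (score >= 15) {
    "Avoid"
  } else if (score >= 8) {
    "Reduce"
  } else {
    "Ignore"
  }
}

risk_matrix["Likely", "Major"]             # 16
suggest_treatment("Possible", "Moderate")  # "Reduce" (score 9)
suggest_treatment("Rare", "Minor")         # "Ignore" (score 2)
```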
There are frameworks for risk management. These include the standards of the International Organization for Standardization (ISO), such as ISO 31000 and ISO/IEC 27005, and COBIT 5 (Control Objectives for Information and Related Technologies). Using both can effectively support a well-developed risk management policy.
One of the best ways to manage risk in IT systems is to have a well-established security and access system. Many software products designed for this purpose can help. It is not possible to describe all of them, so we only define the most important tasks. A central access management system should be used for identification, and networks should be protected with firewalls and encryption. The security system can be supplemented with physical surveillance such as CCTV. For mobile devices, mobile device management (MDM) and enterprise mobility management (EMM) systems control which applications and content can be used. Basic guidance on these topics is published by the UK National Cyber Security Centre at https://www.ncsc.gov.uk/guidance, which is worth reading and sharing with your employees.
d. Business Intelligence
When analysing data like this, one should keep in mind the main benefit of digital storage: it makes the company easier to run. With a good database, your company's processes and market changes can be analysed easily, and administrative software can bring significant cost savings in work that would otherwise require a large number of employees.
The following figure shows where such enterprise management software can be used (ResearchGate, 2013).

This software can not only analyse our data but, with the right formulas, also make predictions. Many visual aids make the results easy to understand; some examples of grouping and presentation are given in the Appendix to this document. The first such programs, transaction processing systems (TPS), were launched in the 1950s, and the field has been growing ever since. It is not possible to list here all the achievements of the last 70 years and the companies involved, but we wanted to show how your data could further benefit your business once a functioning web store has been built.
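As a simple example of such a prediction, a straight-line trend can be fitted to yearly income and extended to the next year, in the same spirit as the red trend line in the Appendix chart. The R sketch below assumes a data frame with numeric `Year` and `Income` columns, like the yearly summary computed in the Appendix; the figures in the example are made up for illustration and are not the company's data.

```r
# Fit a linear trend (Income ~ Year) and predict income for future years.
# Assumes 'yearly' has numeric columns Year and Income.
forecast_income <- function(yearly, new_years) {
  fit <- lm(Income ~ Year, data = yearly)
  unname(predict(fit, newdata = data.frame(Year = new_years)))
}

# Made-up figures for illustration only
yearly <- data.frame(Year = 2013:2016, Income = c(100, 120, 140, 160))
forecast_income(yearly, 2017)  # 180, since income rises by 20 a year
```

With only a few years of data, such a trend is a rough guide at best.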
3. Summary
We hope that this proposal has aroused your interest and provided you with the most important information for making a well-founded decision. The development of this type of company is largely determined by the cost of investment. That is why we have presented the cheapest initial solution, starting with a one-time investment of a few thousand pounds and a monthly subscription fee of around £20. The revenue it generates can be used to expand the system continuously. Finally, we confirm that we can supply the equipment and software listed in this proposal and can commission and operate all of it. In the event of a positive evaluation, we will be happy to undertake further consultations or fulfil one-off or long-term assignments.
4. References
Ecommerce News Europe. 2020. Global marketplaces to own 39% of online retail market in 2020. [Online]. [22 January 2020]. Available from: https://ecommercenews.eu/global-marketplaces-to-own-39-of-online-retail-market-in-2020/
Hosting UK. 2020. SSD VPS Servers. [Online]. [19 January 2020]. Available from: https://hostinguk.net/cloud-servers
IT Governance. 2019. Data Protection. [Online]. [20 January 2020]. Available from: https://www.itgovernance.co.uk/data-protection
Latest Quality. 2019. How To Create A Risk Heat Map in Excel. [Online]. [18 January 2020]. Available from: https://www.latestquality.com/risk-heat-map/
Ray, J. 2018. iDefense Cyber Threat Intelligence Blog. [Online]. [15 January 2020]. Available from: https://www.accenture.com/us-en/blogs/blogs-cyber-intelligence
ResearchGate. 2014. A Data Analytic Framework for Unstructured Text. [Online]. [23 January 2020]. Available from: https://www.researchgate.net/publication/264129835_A_Data_Analytic_Framework_for_Unstructured_Text_A_Data_Analytic_Framework_for_Unstructured_Text
ResearchGate. 2013. Improving Decision Making with Information Systems Technology–A theoretical approach. [Online]. [20 January 2020]. Available from: https://www.researchgate.net/publication/289993997_Improving_Decision_Making_with_Information_Systems_Technology-A_theoretical_approach
5. Appendix
Analysis of the Data
General settings:
# Set working directory
setwd("D:/UNIVERSITY/CyberSecurity MSc/CETM50/Assignment 2")
# Load the saved workspace with the purchases, personal and products tables
load("D:/UNIVERSITY/CyberSecurity MSc/CETM50/Assignment 2/.RData")
# Load dplyr for data manipulation and ggplot2 for the charts
library("dplyr")
library("ggplot2")
1. Who is the most frequent customer(s)
# Join purchases and personal tables by customer id
orders <- left_join(purchases, personal, by = "id")
# Top 10 most frequent customers (ties may return more than 10 rows)
orders %>%
  count(Name = name, sort = TRUE, name = "Orders") %>%
  top_n(10, Orders)

2. Which is the least profitable year in terms of money
# Join purchases and products tables by product
sales <- left_join(purchases, products, by = "product")
# Total income per year, lowest first
sales %>%
  group_by(Year = year) %>%
  summarise(Income = sum(cost)) %>%
  arrange(Income)

3. What is the most profitable product(s) in terms of money in 2016
# Join purchases and products tables by product (same join as above)
sales <- left_join(purchases, products, by = "product")
# Top 10 products by income in 2016
sales %>%
  filter(year == 2016) %>%
  group_by(Product = product) %>%
  summarise(Total = sum(cost)) %>%
  arrange(desc(Total)) %>%
  top_n(10, Total)

4. What is the best postcode(s) in terms of money in 2016
# Add the postcode column from the personal table to the sales table by customer id
postcodes <- left_join(sales, select(personal, id, postcode), by = "id")
# Top 10 postcodes by income in 2016
postcodes %>%
  filter(year == 2016) %>%
  group_by(Postcode = postcode) %>%
  summarise(Total = sum(cost)) %>%
  arrange(desc(Total)) %>%
  top_n(10, Total)
5. What is the least popular product in terms of frequency of sales in 2016
# Join purchases and personal tables by customer id
orders <- left_join(purchases, personal, by = "id")
# 10 least popular products in 2016 (products with no 2016 sales are not listed)
orders %>%
  filter(year == 2016) %>%
  count(Product = product, name = "Sold") %>%
  arrange(Sold) %>%
  top_n(-10, Sold)
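The count above only sees products that were sold at least once in 2016, so a product with no sales at all, arguably the least popular, would be missing from the list. The base R sketch below counts sales against the full product list so that such products appear with a count of 0. It assumes, as the joins above suggest, that `purchases` has `product` and `year` columns and that `products` lists every product; the toy tables are hypothetical.

```r
# Count 2016 sales per product, giving 0 to products never sold that year,
# and order the result from least to most popular.
sales_2016_all <- function(purchases, products) {
  sold <- purchases$product[purchases$year == 2016]
  # Using the full product list as factor levels keeps zero counts
  counts <- table(factor(sold, levels = unique(as.character(products$product))))
  sort(setNames(as.integer(counts), names(counts)))
}

# Hypothetical toy tables
toy_purchases <- data.frame(product = c("A", "A", "B", "C"),
                            year = c(2016, 2016, 2016, 2015))
toy_products <- data.frame(product = c("A", "B", "C", "D"))
sales_2016_all(toy_purchases, toy_products)  # zero counts for C and D, then B = 1, A = 2
```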

6. Use of ggplot2 to provide 2-3, graphs or charts etc
# Pie chart of total income per year
ggplot(sales %>%
         group_by(Year = year) %>%
         summarise(Income = sum(cost)) %>%
         arrange(Income),
       aes(x = "", y = Income, fill = factor(Year))) +
  geom_bar(width = 1, stat = "identity") +
  coord_polar("y", start = 0) +
  geom_text(aes(label = Income),
            colour = "white",
            position = position_stack(vjust = 0.5)) +
  labs(x = NULL,
       y = NULL,
       fill = NULL,
       title = "Total income per year") +
  theme_classic() +
  theme(axis.line = element_blank(),
        axis.text = element_blank(),
        axis.ticks = element_blank(),
        plot.title = element_text(hjust = 0.5, color = "#666666"))

# Plot total income per year with a linear trend line
ggplot(sales %>%
         group_by(Year = year) %>%
         summarise(Income = sum(cost)) %>%
         arrange(Income),
       aes(x = Year, y = Income)) +
  geom_point(pch = 17, color = "blue", size = 2) +
  geom_smooth(method = "lm", color = "red", linetype = 2) +
  labs(subtitle = "Linear trend shown as red dashed line",
       title = "Company income per year",
       x = "Year",
       y = "Total Income")

# Scatterplot of yearly income per product with smoothed (loess) trends
ggplot(sales %>%
         group_by(Year = year, Product = product) %>%
         summarise(Total = sum(cost)) %>%
         arrange(desc(Total)),
       aes(x = Year, y = Total, colour = Product)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) +
  ylim(c(0, 20000)) +  # values outside 0-20,000 are dropped from the plot
  labs(subtitle = "Income per year",
       y = "Income",
       x = "Year",
       title = "Income of products",
       caption = "Source: sales")

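# Stacked bars: yearly income of the 10 products with the highest total income
# (without this step, top_n() on the year-grouped data picks 10 products per year)
top10 <- sales %>%
  group_by(product) %>%
  summarise(Total = sum(cost)) %>%
  top_n(10, Total) %>%
  pull(product)
ggplot(sales %>%
         filter(product %in% top10) %>%
         group_by(Year = year, Product = product) %>%
         summarise(Total = sum(cost))) +
  geom_col(aes(x = Year, y = Total, fill = Product)) +
  labs(subtitle = "Income per year",
       y = "Income",
       x = "Year",
       title = "Income from the TOP 10 products",
       caption = "Source: sales")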
