Zac Amos July 10, 2024
Collected at: https://datafloq.com/read/6-data-security-tips-for-using-ai-tools-in-higher-education/
As more postsecondary institutions adopt artificial intelligence, data security becomes a larger concern. With education cyberattacks on the rise and educators still adapting to this unfamiliar technology, the risk level is high. What should universities do?
1. Follow the 3-2-1 Backup Rule
Cybercrime isn’t the only threat facing postsecondary institutions – data loss due to corruption, power failures or hard drive defects happens often. The 3-2-1 rule states that organizations should keep three copies of their data on two different types of media, with one copy stored off-site so that human error, weather or physical damage can’t affect every copy.
Since machine learning and large language models are vulnerable to cyberattacks, university administrators should prioritize backing up their training datasets with the 3-2-1 rule. Notably, they should first ensure the information is clean and corruption-free before proceeding. Otherwise, they risk creating compromised backups.
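As a rough illustration, the sketch below backs up a training dataset file to a second local medium and to off-site object storage, verifying a checksum first so a corrupted file is never propagated. The file names, paths and bucket are hypothetical placeholders, and boto3 is only one of many ways to reach off-site storage.

```python
# Minimal 3-2-1 backup sketch (illustrative; paths, bucket name, and
# dataset file are hypothetical placeholders).
import hashlib
import shutil
import boto3  # third-party: pip install boto3

DATASET = "training_data.parquet"                        # copy 1: the working dataset
LOCAL_BACKUP = "/mnt/nas/backups/training_data.parquet"  # copy 2: a second medium
OFFSITE_BUCKET = "university-ai-backups"                 # copy 3: off-site storage

def sha256(path: str) -> str:
    """Hash the file so every copy can be verified against the original."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

original_digest = sha256(DATASET)

# Copy 2: a different physical medium (e.g., a NAS volume).
shutil.copy2(DATASET, LOCAL_BACKUP)
assert sha256(LOCAL_BACKUP) == original_digest, "local backup is corrupted"

# Copy 3: off-site storage so fire, weather or human error can't hit all copies.
boto3.client("s3").upload_file(DATASET, OFFSITE_BUCKET, DATASET)
```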
2. Inventory AI Information Assets
The volume of data created, copied, captured and consumed will reach approximately 181 zettabytes by 2025, up from just 2 zettabytes in 2010 – a 90-fold increase in under two decades. Many institutions make the mistake of considering this abundance of information an asset rather than a potential security issue.
The more data a university stores, the easier it is to overlook tampering, unauthorized access, theft and corruption. However, deleting student, financial or academic records for the sake of security isn’t an option. Inventorying information assets is an effective alternative because it helps the information technology (IT) team better understand scope, scale and risk.
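A lightweight way to start is a script that walks the directories holding AI-related data and records each file’s size, timestamp and hash in an inventory the IT team can review. The sketch below uses only Python’s standard library; the directory paths are hypothetical examples.

```python
# Minimal asset-inventory sketch using only the standard library.
# The scanned directories are hypothetical examples.
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

DATA_ROOTS = [Path("/data/ai/training"), Path("/data/ai/model_outputs")]

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

with open("ai_asset_inventory.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["path", "bytes", "last_modified_utc", "sha256"])
    for root in DATA_ROOTS:
        for file in root.rglob("*"):
            if file.is_file():
                stat = file.stat()
                writer.writerow([
                    str(file),
                    stat.st_size,
                    datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
                    sha256(file),  # lets later audits detect silent tampering
                ])
```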
3. Deploy User Account Protections
As of 2023, only 13% of the world has data protections in place. Universities should strongly consider countering this trend by deploying security measures for students’ accounts. Currently, many institutions treat passwords and CAPTCHAs as adequate safeguards. If a bad actor gets past those defenses – which a brute force attack can often accomplish – they could cause serious damage.
With techniques like prompt engineering, an attacker could coax an AI into revealing de-anonymized or personally identifiable information from its training data. When the only thing standing between them and valuable educational data is a flimsy password, they won’t hesitate. For better security, university administrators should consider layering on stronger authentication measures such as multi-factor authentication.
One-time passcodes and security questions keep attackers out even if they brute force a password or use stolen login credentials. According to one study, accounts with multi-factor authentication enabled had a median estimated compromise rate of 0.0079%, while those without had a rate of 1.0071% – meaning this tool results in a risk reduction of 99.22%.
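For illustration, the sketch below shows how a time-based one-time passcode (TOTP) could back up a password using the open-source pyotp library. The account and issuer names are placeholders, and a real deployment would store the secret securely and integrate with the institution’s identity provider.

```python
# Minimal time-based one-time passcode (TOTP) sketch using pyotp
# (pip install pyotp). Account and issuer names are hypothetical.
import pyotp

# Generated once per user at enrollment and stored server-side.
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

# This URI is what a student's authenticator app scans as a QR code.
print(totp.provisioning_uri(name="student@university.edu",
                            issuer_name="University AI Portal"))

# At login, a correct password alone is not enough; the six-digit code
# from the authenticator app must also match the current time window.
submitted_code = input("Enter the code from your authenticator app: ")
if totp.verify(submitted_code):
    print("Second factor accepted.")
else:
    print("Invalid or expired code - access denied.")
```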
4. Use the Data Minimization Principle
According to the data minimization principle, institutions should collect and store information only if it is immediately relevant to a specific use case. Following it can significantly reduce data breach risk by simplifying database management and minimizing the number of values a bad actor could compromise.
Institutions should apply this principle to their AI information assets. In addition to improving data security, it can optimize the insight generation process – feeding an AI an abundance of tangentially relevant details will often muddle its output rather than increase its accuracy or pertinence.
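In practice, minimization can be as simple as selecting only the fields a model genuinely needs before the data ever reaches an AI pipeline. The pandas sketch below is illustrative; the record layout and column names are hypothetical.

```python
# Data-minimization sketch with pandas (pip install pandas). The record
# layout and column names are hypothetical examples.
import pandas as pd

records = pd.DataFrame([
    {"student_id": "S1001", "name": "A. Student", "email": "a@uni.edu",
     "gpa": 3.4, "credits_completed": 92, "home_address": "123 Example St."},
])

# Only the fields the advising model actually needs are kept; direct
# identifiers and unrelated personal details never reach the AI pipeline.
FIELDS_FOR_MODEL = ["gpa", "credits_completed"]
minimized = records[FIELDS_FOR_MODEL]

print(minimized)
```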
5. Regularly Audit Training Data Sources
Institutions using models that pull information from the web should proceed with caution. Attackers can launch data poisoning attacks, injecting misinformation to cause unintended behavior. For uncurated datasets, research shows a poisoning rate as low as 0.001% can be effective at prompting misclassifications or creating a model backdoor.
This finding is concerning because, according to the study, attackers could poison at least 0.01% of the LAION-400M or COYO-700M datasets – popular large-scale, open-source options – for just $60, largely by buying up expired domains that once hosted portions of the data. PubFig, VGG Face and Facescrub are reportedly at similar risk.
Administrators should direct their IT teams to audit training sources regularly. Even if a model doesn’t pull from the web or update in real time, its datasets remain vulnerable to other injection or tampering attacks. Periodic reviews help identify and address suspicious data points or domains, minimizing the damage attackers can do.
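One way to make such reviews routine is to keep a manifest of vetted sources and checksums, then flag anything pulled from an unapproved domain or whose contents have changed since vetting. The sketch below is a minimal example; the manifest entry, allowlist and checksum are placeholders.

```python
# Training-source audit sketch using only the standard library. The
# manifest entry, allowlist, and checksum are hypothetical examples.
import hashlib
from pathlib import Path
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"datasets.university.edu", "huggingface.co"}

# Each manifest entry records where a file came from and the checksum
# captured when it was originally vetted.
MANIFEST = [
    {"path": "data/corpus_part1.txt",
     "source": "https://datasets.university.edu/corpus_part1.txt",
     "sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"},
]

def sha256(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

for entry in MANIFEST:
    domain = urlparse(entry["source"]).netloc
    if domain not in ALLOWED_DOMAINS:
        print(f"FLAG: {entry['path']} pulled from unapproved domain {domain}")
    if Path(entry["path"]).exists() and sha256(entry["path"]) != entry["sha256"]:
        print(f"FLAG: {entry['path']} no longer matches its vetted checksum")
```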
6. Use AI Tools From Reputable Vendors
Many universities have experienced third-party data breaches. Administrators seeking to avoid this outcome should prioritize selecting a reputable AI vendor. If they’re already using one, they should review their contractual agreement and conduct periodic audits to ensure security and privacy standards are maintained.
Whether a university uses an AI-as-a-service provider or has contracted a third-party developer to build a specific model, it should strongly consider reviewing its tools. Since 60% of educators use AI in the classroom, the market is large enough that numerous disreputable companies have entered it.
Data Security Should Be a Priority for AI Users
University administrators planning to use AI tools should prioritize data security to safeguard the privacy and safety of students and educators. Although the process takes time and effort, addressing potential issues early on can make implementation more manageable and prevent further problems from arising down the road.