China proposes stricter curbs on training data and models used to build generative AI services in bid to tighten security


China is planning stricter curbs on how generative artificial intelligence (AI) services are applied in the country, as authorities attempt to strike a balance between harnessing the benefits of the technology and mitigating its risks.

New draft guidance published on Wednesday by the National Information Security Standardisation Technical Committee, an agency that enacts standards for IT security, targets two key areas for improvement – the security of raw training data and the large language models (LLMs) used to build the generative AI services.

The draft stipulates that AI training materials should not infringe copyright or breach personal data security. It requires that training data first pass security checks and be processed by authorised data labellers and reviewers.


Secondly, when developers build their LLMs – the deep learning algorithms trained with massive data sets that power generative AI chatbots such as Baidu’s Ernie Bot – they should be based on foundational models filed with and licensed by authorities, according to the draft.

The draft proposes a blacklist system to block training data materials that contain more than 5 per cent of illegal content, together with information deemed harmful under the nation’s cybersecurity laws.

Illegal content in China is typically defined as material that incites violence and extremism, spreads rumours and misinformation, or promotes pornography and superstition. Beijing also censors sensitive political information, such as questions about Taiwan’s status.

The draft proposes that during the training process, developers should consider the security of the content generated as one of the major points of evaluation, and “in every dialogue [with generative AI services], information keyed in by users should go through a security check to ensure the AI models generate positive content”.

The draft is open for public feedback until October 25.

China in August introduced a general regulation targeting domestic generative AI services, making it one of the first countries to impose rules governing the emerging technology.

The Chinese government last month approved a batch of local generative AI services, including chatbots from search engine giant Baidu, state-backed iFlyTek, Zhipu AI, Sogou co-founder Wang Xiaochuan’s new venture Baichuan and SenseTime.

In tests performed by the Post, Chinese chatbots respond in a variety of ways when asked whether Taiwan is part of China. Some refuse to give a response and end the conversation abruptly, while others give a brief, affirmative response before also ending the interaction.

This article originally appeared in the South China Morning Post (SCMP), the most authoritative voice reporting on China and Asia for more than a century. For more SCMP stories, please explore the SCMP app or visit the SCMP’s Facebook and Twitter pages. Copyright © 2023 South China Morning Post Publishers Ltd. All rights reserved.

