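Extracts key information from structurally complex contract documents. Currently only contracts in PDF format are supported.
Send a POST request to the service endpoint (the request example further down uses https://open.langboat.com/?action=contractExtraction).
All API endpoints of the Langboat contract information extraction service communicate over HTTPS, which provides a secure communication channel.
Usage is counted by the number of successful calls.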
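Request header | Required | Description |
---|---|---|
Accept | Yes | Fixed value application/json. |
Authorization | Yes | Authentication information used to verify that the request is legitimate, in the form AccessKey:signature. |
Content-Type | Yes | Fixed value application/json. |
Content-MD5 | Yes | The 128-bit MD5 hash of the HTTP message body, Base64-encoded. |
Date | Yes | Request time in GMT format, for example: Wed, 20 Apr 2022 17:01:00 GMT. |
x-langboat-signature-nonce | Yes | A unique random number used to prevent network replay attacks. Use a different value for every request. |
x-langboat-signature-method | Yes | Signature method. Currently only HMAC-SHA256 is supported. |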
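The Date and x-langboat-signature-nonce values can be produced with the Python standard library. This is a minimal sketch, not the service's reference implementation: the helper names `make_date_header` and `make_nonce` are hypothetical, and the use of a purely numeric nonce is an assumption based on the numeric nonces shown in this page's examples.

```python
import secrets
from email.utils import formatdate
from typing import Optional


def make_date_header(timestamp: Optional[float] = None) -> str:
    """Return an RFC 1123 date in GMT, e.g. 'Wed, 20 Jul 2022 13:04:02 GMT'.

    formatdate() always emits English day and month names, so the result
    does not depend on the process locale.
    """
    return formatdate(timeval=timestamp, localtime=False, usegmt=True)


def make_nonce() -> str:
    """Return a random decimal string (assumption: numeric nonces are accepted)."""
    return str(secrets.randbelow(10 ** 9))


if __name__ == "__main__":
    print("Date:", make_date_header())
    print("x-langboat-signature-nonce:", make_nonce())
```

Using `secrets` rather than a small random integer range makes accidental nonce reuse across requests much less likely.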
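Signature calculation
Step 1: compute the Content-MD5 value, that is, the MD5 hash of the request body encoded as Base64.
Calculation example (request bodies):
    {"sourceText": "Where there is a will, there is a way."}
    {"pdfBase64": "<pdf base64>"}
Content-MD5 header: Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==
Note that 1B2M2Y8AsgTpgAmY7PhCfg== is the Content-MD5 of an empty body; always compute the value from the exact bytes of the body you send.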
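As a rough illustration of the Content-MD5 step, the sketch below hashes the request body bytes with MD5 and Base64-encodes the raw digest. The function name `content_md5` is hypothetical; the important point is that the hash must be computed over exactly the bytes that are sent as the body.

```python
import base64
import hashlib
import json


def content_md5(body: bytes) -> str:
    """Base64 encoding of the raw 128-bit MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


if __name__ == "__main__":
    # Serialize once and reuse these bytes for both hashing and sending.
    body = json.dumps({"pdfBase64": "<pdf base64>"}).encode("utf-8")
    print("Content-MD5:", content_md5(body))
    # An empty body hashes to the well-known value 1B2M2Y8AsgTpgAmY7PhCfg==
    print("Content-MD5 (empty body):", content_md5(b""))
```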
    headerStringToSign =
        HTTP-Verb + "\n" +                      // only POST is supported
        Accept + "\n" +                         // application/json
        Content-MD5 + "\n" +                    // the MD5 value computed in step 1
        Content-Type + "\n" +                   // application/json
        Date + "\n" +                           // GMT time
        x-langboat-signature-method + "\n" +    // only HMAC-SHA256 is supported
        x-langboat-signature-nonce + "\n";
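Example of the string to sign (the header fields, one per line, followed by the query string on the last line):
    POST
    application/json
    mZFLkyvTelC5g8XnyQrpOw==
    application/json
    Wed, 20 Jul 2022 13:04:02 GMT
    HMAC-SHA256
    10191
    action=contractExtraction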
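    stringToSign = headerStringToSign + queryToSign;
Here queryToSign is the list of query parameters written as key=value pairs, sorted by key and joined with "&". For this service it is action=contractExtraction.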
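The concatenation can be sketched as follows. This is an illustrative helper (the name `build_string_to_sign` is hypothetical) that joins the seven header fields with newlines, appends the trailing newline, and then appends the sorted query string.

```python
def build_string_to_sign(content_md5: str, date: str, nonce: str, query: dict) -> str:
    """Assemble the header part and the query part of the string to sign."""
    header_part = "\n".join([
        "POST",                # HTTP verb, only POST is supported
        "application/json",    # Accept
        content_md5,           # Content-MD5 of the body
        "application/json",    # Content-Type
        date,                  # Date header, GMT
        "HMAC-SHA256",         # x-langboat-signature-method
        nonce,                 # x-langboat-signature-nonce
    ]) + "\n"
    # key=value pairs sorted by key and joined with "&"
    query_part = "&".join(f"{k}={v}" for k, v in sorted(query.items()))
    return header_part + query_part


if __name__ == "__main__":
    print(build_string_to_sign(
        "mZFLkyvTelC5g8XnyQrpOw==",
        "Wed, 20 Jul 2022 13:04:02 GMT",
        "10191",
        {"action": "contractExtraction"},
    ))
```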
    Signature = Base64(HMAC-SHA256(AccessSecret, UTF-8-Encoding-Of(stringToSign)))
    Authorization = AccessKey + ":" + Signature
Signature example: po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=
Authorization example: Authorization: 7Bo9ByyiTWRC1Y8KJJQ9cWtNpZLmrgyb:po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=
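A minimal sketch of the signing step, using only the Python standard library. The function names `sign` and `authorization_header` are hypothetical, and the key and secret in the usage lines are placeholders rather than real credentials.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign: str, access_secret: str) -> str:
    """Base64(HMAC-SHA256(AccessSecret, UTF-8 bytes of stringToSign))."""
    digest = hmac.new(
        access_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def authorization_header(access_key: str, access_secret: str, string_to_sign: str) -> str:
    """Value of the Authorization header: AccessKey + ':' + Signature."""
    return f"{access_key}:{sign(string_to_sign, access_secret)}"


if __name__ == "__main__":
    demo = "POST\napplication/json\n...\naction=contractExtraction"  # placeholder string
    print(authorization_header("<Your Access Key>", "<Your Access Secret>", demo))
```

The signature is always 44 characters long, because a 32-byte SHA-256 digest encodes to 44 Base64 characters including one padding character.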
Query parameter | Required | Description |
---|---|---|
action | Yes | Fixed value: contractExtraction |
Body parameter | Required | Description |
---|---|---|
pdfBase64 | Yes | Base64-encoded content of the PDF file |
Request example:
    POST https://open.langboat.com/?action=contractExtraction
    Content-Type: application/json
    Content-MD5: 6XC6zMGHuc/2BI4Bx0lKRQ==
    Date: Sun, 17 Jul 2022 08:07:51 GMT
    Accept: application/json
    x-langboat-signature-method: HMAC-SHA256
    x-langboat-signature-nonce: 3588
    Authorization: aAM9NHSkWY40wkd7EUg6HFuFzJpJPG6E:9bu0dGv/1xGnUzin2HOe2TUa0Frf+5WE4FnNHJxeT6Q=

    {"pdfBase64": "<PDF base64>"}
The response body is a JSON object. The data field:
Example structure:
    {
      "code": 0,
      "message": "success",
      "requestId": "004700cf3e118d4a274c5564b4044860",
      "data": {
        "results": [
          {
            "key": "合同名称",
            "values": [
              {"start": 0, "end": 15, "text": "海关2021-2022年出入境预防接种疫苗供货合同", "pred": "合同名称", "page": 0}
            ]
          },
          {
            "key": "合同编号",
            "values": [
              {"start": 21, "end": 31, "text": "BJZX-HPV9-2022008", "pred": "合同编号", "page": 0}
            ]
          },
          {
            "key": "采购人名称",
            "values": [
              {"start": 36, "end": 50, "text": "重庆国际旅行卫生保健中心(重庆海关口岸门诊部)", "pred": "采购人名称", "page": 0}
            ]
          },
          {
            "key": "供应商名称",
            "values": [
              {"start": 55, "end": 61, "text": "重庆智飞生物制品股份有限公司", "pred": "供应商名称", "page": 0}
            ]
          },
          {
            "key": "主要标的名称",
            "values": [
              {"start": 192, "end": 197, "text": "九价人乳头", "pred": "主要标的名称", "page": 0},
              {"start": 213, "end": 216, "text": "瘤病毒疫苗", "pred": "主要标的名称", "page": 0}
            ]
          },
          {
            "key": "主要标的单价",
            "values": [
              {"start": 211, "end": 212, "text": "1298", "pred": "主要标的单价", "page": 0}
            ]
          },
          {
            "key": "主要标的数量",
            "values": [
              {"start": 219, "end": 220, "text": "192支", "pred": "主要标的数量", "page": 0}
            ]
          },
          {
            "key": "合同金额",
            "values": [
              {"start": 221, "end": 224, "text": "249216元", "pred": "合同金额", "page": 0},
              {"start": 231, "end": 249, "text": "人民币贰拾肆万玖仟贰佰壹拾陆元整249216元", "pred": "合同金额", "page": 0}
            ]
          }
        ],
        "status": 1
      }
    }
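Because one key can carry several values (in the example above, 主要标的名称 and 合同金额 each have two), a client will usually want to group the extracted text by key. The following is a small sketch under that assumption; the helper name `collect_fields` is hypothetical and the sample is a shortened copy of the example response.

```python
from collections import defaultdict


def collect_fields(response: dict) -> dict:
    """Map each result key to the list of extracted text snippets, in order."""
    fields = defaultdict(list)
    for item in response.get("data", {}).get("results", []):
        for value in item.get("values", []):
            fields[item["key"]].append(value["text"])
    return dict(fields)


SAMPLE = {
    "code": 0,
    "message": "success",
    "data": {
        "results": [
            {"key": "合同编号", "values": [
                {"start": 21, "end": 31, "text": "BJZX-HPV9-2022008", "pred": "合同编号", "page": 0}]},
            {"key": "主要标的名称", "values": [
                {"start": 192, "end": 197, "text": "九价人乳头", "pred": "主要标的名称", "page": 0},
                {"start": 213, "end": 216, "text": "瘤病毒疫苗", "pred": "主要标的名称", "page": 0}]},
        ],
        "status": 1,
    },
}

if __name__ == "__main__":
    for key, texts in collect_fields(SAMPLE).items():
        print(key, texts)
```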
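The possible HTTP response status codes and business (error) codes are listed below:
HTTP status code | Business code | Description |
---|---|---|
200 | 0 | Success |
400 | 10400 | Bad request |
401 | 10401 | Authentication failed; check that the AccessKey and AccessSecret are correct |
403 | 10403 | Insufficient permissions; check that the service has been activated, or whether the QPS, character, or call-count limit has been exceeded |
422 | 10422 | Invalid parameters; check the request parameters |
429 | 10429 | Request limit exceeded (QPS, character, or call-count limit) |
500 | 10500 | Service error |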
Error response examples
{"code": 10401,"message": "鉴权失败,核对AccessKey和AccessSecret 是否正确","requestId": "33f0057e-f421-fb94-766f-608d837969ca"}
{"code": 10422,"message": "参数错误,核对请求参数, 不支持的action : generateTemplate","requestId": "6cc3f2a9-5fd1-4872-9c59-d6f10ceb1e62"}
{"code": 10403,"message": "权限不足,查看是否开通服务","requestId": "6aa60868-d3f1-a7e5-8fbb-e2f96d65e7b5"}
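One way a client might turn these business codes into exceptions is sketched below. The class `LangboatAPIError`, the function `check_response`, and in particular the choice of which codes count as retryable (10429 and 10500) are assumptions for illustration, not behavior documented by the service.

```python
class LangboatAPIError(Exception):
    """Raised when the response carries a non-zero business code."""

    # Assumption: rate limiting and server errors are worth retrying later.
    RETRYABLE_CODES = {10429, 10500}

    def __init__(self, code: int, message: str, request_id: str):
        super().__init__(f"[{code}] {message} (requestId={request_id})")
        self.code = code
        self.message = message
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        return self.code in self.RETRYABLE_CODES


def check_response(payload: dict) -> dict:
    """Return the data field on success (code 0), otherwise raise."""
    code = payload.get("code")
    if code == 0:
        return payload.get("data", {})
    raise LangboatAPIError(code, payload.get("message", ""), payload.get("requestId", ""))


if __name__ == "__main__":
    try:
        check_response({"code": 10403, "message": "权限不足,查看是否开通服务",
                        "requestId": "6aa60868-d3f1-a7e5-8fbb-e2f96d65e7b5"})
    except LangboatAPIError as err:
        print(err, "retryable:", err.retryable)
```

Keeping the requestId on the exception makes it easier to quote it when reporting a problem.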
Java (JDK-11)
    import cn.hutool.core.util.URLUtil;
    import com.fasterxml.jackson.databind.ObjectMapper;

    import javax.crypto.Mac;
    import javax.crypto.spec.SecretKeySpec;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.security.MessageDigest;
    import java.text.SimpleDateFormat;
    import java.util.*;

    public class LangboatOpenClient {
        private final String accessKey;
        private final String accessSecret;
        private final String url;

        public LangboatOpenClient(String accessKey, String accessSecret) {
            this(accessKey, accessSecret, "https://open.langboat.com");
        }

        public LangboatOpenClient(String accessKey, String accessSecret, String url) {
            this.accessKey = accessKey;
            this.accessSecret = accessSecret;
            this.url = url;
        }

        /** MD5 of the UTF-8 bytes of s, Base64-encoded. */
        private String MD5Base64(String s) {
            if (s == null) {
                return null;
            }
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                byte[] md5Bytes = md.digest(s.getBytes(StandardCharsets.UTF_8));
                return Base64.getEncoder().encodeToString(md5Bytes);
            } catch (Exception e) {
                throw new Error("Failed to generate MD5 : " + e.getMessage());
            }
        }

        /** HMAC-SHA256 of data keyed with key, Base64-encoded. */
        private String HMACSha256Base64(String data, String key) {
            try {
                SecretKeySpec signingKey = new SecretKeySpec(key.getBytes(StandardCharsets.UTF_8), "HmacSHA256");
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(signingKey);
                byte[] rawHmac = mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
                return Base64.getEncoder().encodeToString(rawHmac);
            } catch (Exception e) {
                throw new Error("Failed to generate HMAC : " + e.getMessage());
            }
        }

        /** Format the time as a GMT date. Locale.US keeps day and month names in English. */
        private String toGMTString(Date date) {
            SimpleDateFormat df = new SimpleDateFormat("E, dd MMM yyyy HH:mm:ss z", Locale.US);
            df.setTimeZone(TimeZone.getTimeZone("GMT"));
            return df.format(date);
        }

        public String inference(Map<String, String> queries, Map<String, Object> data) {
            StringBuilder result = new StringBuilder();
            try {
                StringBuilder queriesStr = new StringBuilder();
                queries.forEach((k, v) -> queriesStr.append("&").append(k).append("=").append(URLUtil.encode(v)));
                queriesStr.setCharAt(0, '?');
                URL openUrl = new URL(this.url + queriesStr);
                String body = new ObjectMapper().writeValueAsString(data);
                String method = "POST";
                String accept = "application/json";
                String contentType = "application/json";
                String date = toGMTString(new Date());
                // 1. MD5 + Base64 of the body
                String bodyMd5 = MD5Base64(body);
                String nonce = "" + (int) (Math.random() * 65535);
                String headerToSign = method + "\n" + accept + "\n" + bodyMd5 + "\n"
                        + contentType + "\n" + date + "\n"
                        + "HMAC-SHA256\n"
                        + nonce + "\n";
                // 2. Compute queryToSign: key=value pairs, sorted and joined with "&"
                List<String> queriesList = new ArrayList<>();
                queries.forEach((k, v) -> queriesList.add(k + "=" + v));
                Collections.sort(queriesList);
                String queryToSign = String.join("&", queriesList);
                // 3. Compute stringToSign
                String stringToSign = headerToSign + queryToSign;
                // 4. Compute HMAC-SHA256 + Base64
                String signature = HMACSha256Base64(stringToSign, this.accessSecret);
                // 5. Build the Authorization header value
                String authorization = this.accessKey + ":" + signature;

                HttpURLConnection conn = (HttpURLConnection) openUrl.openConnection();
                conn.setRequestMethod(method);
                conn.setRequestProperty("Accept", accept);
                conn.setRequestProperty("Content-Type", contentType);
                conn.setRequestProperty("Content-MD5", bodyMd5);
                conn.setRequestProperty("Date", date);
                conn.setRequestProperty("Authorization", authorization);
                conn.setRequestProperty("x-langboat-signature-nonce", nonce);
                conn.setRequestProperty("x-langboat-signature-method", "HMAC-SHA256");
                conn.setDoOutput(true);
                // Send exactly the UTF-8 bytes that were hashed above
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(body.getBytes(StandardCharsets.UTF_8));
                }
                // Read the response body (the error stream for non-200 responses)
                InputStream is = conn.getResponseCode() == 200 ? conn.getInputStream() : conn.getErrorStream();
                if (is != null) {
                    try (BufferedReader in = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8))) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            result.append(line);
                        }
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
            return result.toString();
        }

        /**
         * Convert a PDF file to a Base64 string.
         * @param file the file to encode
         * @return the Base64 string
         */
        public String fileToBase64Str(File file) throws IOException {
            return Base64.getEncoder().encodeToString(Files.readAllBytes(file.toPath()));
        }

        public static void main(String[] args) throws IOException {
            LangboatOpenClient client = new LangboatOpenClient("<Your Access Key>", "<Your Access Secret>");
            // Contract information extraction
            File filePath = new File("/Users/admin/Downloads/test.pdf");
            String pdfBase64 = client.fileToBase64Str(filePath);
            Map<String, String> queries = Map.of("action", "contractExtraction");
            Map<String, Object> data = Map.of("pdfBase64", pdfBase64);
            String response = client.inference(queries, data);
            System.out.println(response);
        }
    }
Python (>=3.6)
    # -*- coding: utf-8 -*-
    import base64
    import datetime
    import hashlib
    import hmac
    import json
    import random

    import requests


    class LangboatOpenClient:
        """Langboat open platform client."""

        def __init__(self,
                     access_key: str,
                     access_secret: str,
                     url: str = "https://open.langboat.com"):
            self.access_key = access_key
            self.access_secret = access_secret
            self.url = url

        def _build_header(self, query: dict, body: bytes) -> dict:
            accept = "application/json"
            # 1. Base64-encoded MD5 of the exact body bytes that will be sent
            content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode()
            content_type = "application/json"
            gmt_format = '%a, %d %b %Y %H:%M:%S GMT'
            date = datetime.datetime.utcnow().strftime(gmt_format)
            signature_method = "HMAC-SHA256"
            signature_nonce = str(random.randint(0, 65535))
            header_string = f"POST\n{accept}\n{content_md5}\n{content_type}\n" \
                            f"{date}\n{signature_method}\n{signature_nonce}\n"
            # 2. Compute queryToSign: key=value pairs sorted by key
            queries_str = []
            for k, v in sorted(query.items(), key=lambda item: item[0]):
                if isinstance(v, list):
                    for i in v:
                        queries_str.append(f"{k}={i}")
                else:
                    queries_str.append(f"{k}={v}")
            queries_string = '&'.join(queries_str)
            # 3. Compute stringToSign
            sign_string = header_string + queries_string
            # 4. Compute the signature: HMAC-SHA256 + Base64
            secret_bytes = self.access_secret.encode("utf-8")
            signature = base64.b64encode(
                hmac.new(secret_bytes, sign_string.encode("utf-8"), hashlib.sha256).digest()
            ).decode()
            return {
                "Content-Type": content_type,
                "Content-MD5": content_md5,
                "Date": date,
                "Accept": accept,
                "X-Langboat-Signature-Method": signature_method,
                "X-Langboat-Signature-Nonce": signature_nonce,
                "Authorization": f"{self.access_key}:{signature}",
            }

        def inference(self, queries: dict, data: dict) -> tuple:
            """Call the service.

            :param queries: query parameters
            :param data: request body data
            :return: response status, response body parsed from JSON
            """
            # Serialize once so the hashed bytes and the sent bytes are identical
            body = json.dumps(data).encode("utf-8")
            headers = self._build_header(queries, body)
            response = requests.post(url=self.url, headers=headers, params=queries, data=body)
            return response.status_code, response.json()


    if __name__ == '__main__':
        _access_key = '<Your access_key>'
        _access_secret = '<Your access_secret>'
        client = LangboatOpenClient(access_key=_access_key, access_secret=_access_secret)
        # Contract information extraction
        _queries = {
            "action": "contractExtraction",
        }
        path = "/Users/admin/Downloads/test.pdf"
        with open(path, "rb") as pdf_file:
            pdf_base64 = base64.b64encode(pdf_file.read()).decode("ascii")
        _data = {
            "pdfBase64": pdf_base64,
        }
        status_code, result = client.inference(_queries, _data)
        print("response status:", status_code)
        print("response json:", json.dumps(result, ensure_ascii=False, indent=2))
Go (>=1.14)
    package main

    import (
        "crypto/hmac"
        "crypto/md5"
        "crypto/sha256"
        "encoding/base64"
        "encoding/json"
        "fmt"
        "io/ioutil"
        "log"
        "math/rand"
        "net/http"
        "net/url"
        "os"
        "sort"
        "strings"
        "time"
    )

    func main() {
        // Seed the generator so the nonce differs between runs
        rand.Seed(time.Now().UnixNano())
        client := OpenClient{
            baseURL:      "https://open.langboat.com",
            accessKey:    "Your_Access_Key",
            accessSecret: "Your_Access_Secret",
        }
        file, err := os.Open("/Users/admin/Downloads/test.pdf")
        if err != nil {
            log.Fatal("fail to open file")
        }
        defer file.Close()
        fileContent, err := ioutil.ReadAll(file)
        if err != nil {
            log.Fatal("fail to read file")
        }
        pdfBase64 := base64.StdEncoding.EncodeToString(fileContent)
        // Contract information extraction
        queries := map[string]string{
            "action": "contractExtraction",
        }
        data := map[string]interface{}{
            "pdfBase64": pdfBase64,
        }
        resp, err := client.Inference(queries, data)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            log.Fatal("fail to read response body")
        }
        log.Println(string(body))
    }

    type OpenClient struct {
        baseURL      string
        accessKey    string
        accessSecret string
    }

    // Inference calls the service. queries: query parameters; data: request body data.
    func (c *OpenClient) Inference(queries map[string]string, data map[string]interface{}) (*http.Response, error) {
        values := url.Values{}
        for k, v := range queries {
            values.Set(k, v)
        }
        rawQuery := values.Encode()
        dataJSON, err := json.Marshal(data)
        if err != nil {
            return nil, err
        }
        targetURL := c.baseURL + "?" + rawQuery
        client := &http.Client{
            Timeout: 15 * time.Second,
        }
        // Build the header values
        date := time.Now().UTC().Format(http.TimeFormat)
        nonce := fmt.Sprint(10000 + rand.Intn(89999))
        // Sign
        signParam := SignParam{
            Body:    string(dataJSON),
            Query:   rawQuery,
            DateGMT: date,
            Nonce:   nonce,
        }
        contentMD5, signature := GenSignature(signParam, c.accessSecret)
        // Set the headers
        headers := map[string]string{
            "Authorization":               c.accessKey + ":" + signature,
            "Content-Type":                "application/json",
            "Accept":                      "application/json",
            "Date":                        date,
            "Content-MD5":                 contentMD5,
            "x-langboat-signature-method": "HMAC-SHA256",
            "x-langboat-signature-nonce":  nonce,
        }
        req, err := http.NewRequest("POST", targetURL, strings.NewReader(string(dataJSON)))
        if err != nil {
            return nil, err
        }
        for k, v := range headers {
            req.Header.Add(k, v)
        }
        return client.Do(req)
    }

    // SignParam holds the values needed to generate the signature.
    type SignParam struct {
        Body    string // body data
        Query   string // raw query string
        DateGMT string // GMT time
        Nonce   string // random number
    }

    func getMD5(str string) []byte {
        h := md5.New()
        h.Write([]byte(str))
        return h.Sum(nil)
    }

    func hmacSha256(data string, secret string) []byte {
        h := hmac.New(sha256.New, []byte(secret))
        h.Write([]byte(data))
        return h.Sum(nil)
    }

    func resortQuery(src string) string {
        queries, _ := url.ParseQuery(src)
        keys := make([]string, 0)
        for k := range queries {
            keys = append(keys, k)
        }
        sort.Strings(keys)
        newQuery := url.Values{}
        for _, k := range keys {
            for _, value := range queries[k] {
                newQuery.Add(k, value)
            }
        }
        return newQuery.Encode()
    }

    // GenSignature generates the Content-MD5 value and the signature.
    func GenSignature(src SignParam, accessSecret string) (string, string) {
        // MD5 of the body
        md5str := getMD5(src.Body)
        // Base64-encode it to get contentMD5
        contentMD5 := base64.StdEncoding.EncodeToString(md5str)
        // Parse the query and re-sort it in lexicographic order
        query := resortQuery(src.Query)
        query, _ = url.QueryUnescape(query)
        // Layout of the string to sign: one field per line, query string last
        stringToSign := "POST\napplication/json\n%s\napplication/json\n%s\nHMAC-SHA256\n%s\n%s"
        stringToSign = fmt.Sprintf(stringToSign, contentMD5, src.DateGMT, src.Nonce, query)
        hmac256 := hmacSha256(stringToSign, accessSecret)
        signature := base64.StdEncoding.EncodeToString(hmac256)
        return contentMD5, signature
    }