Why String needs special handling in JNI

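In NDK development, C or C++ code cannot use a string passed down from the Java layer directly. It first has to read the jstring object through paired functions such as GetStringChars/ReleaseStringChars or GetStringUTFChars/ReleaseStringUTFChars. Likewise, to pass a string up to Java, C or C++ code has to build a jstring with NewString or NewStringUTF. NDK C code represents a string as const char *, the Java layer uses String, and jstring is the intermediate handle through which the two are converted. Why is this conversion needed? As the Dalvik source quoted in this article shows, it comes down to the two sides defining and encoding strings differently.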

1 The NewString function

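NewString converts an array of UTF-16 code units (const jchar *, as its signature shows) into a jstring; for a const char * string the counterpart is NewStringUTF. Its code is as follows. It calls dvmCreateStringFromUnicode to create a StringObject, passes that object to addLocalReference, and casts the result to jstring.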

static jstring NewString(JNIEnv* env, const jchar* unicodeChars, jsize len) {
    ScopedJniThreadState ts(env);
    StringObject* jstr = dvmCreateStringFromUnicode(unicodeChars, len);
    if (jstr == NULL) {
        return NULL;
    }
    dvmReleaseTrackedAlloc((Object*) jstr, NULL);
    return (jstring) addLocalReference(ts.self(), (Object*) jstr);
}

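Now look at what dvmCreateStringFromUnicode does. It calls makeStringObject with the length and the address of an ArrayObject pointer to create the StringObject, copies the characters into that array with memcpy, and then computes the hash code and stores it in the object.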

StringObject* dvmCreateStringFromUnicode(const u2* unichars, int len)
{
    /* We allow a NULL pointer if the length is zero. */
    assert(len == 0 || unichars != NULL);

    ArrayObject* chars;
    StringObject* newObj = makeStringObject(len, &chars);
    if (newObj == NULL) {
        return NULL;
    }

    if (len > 0) memcpy(chars->contents, unichars, len * sizeof(u2));

    u4 hashCode = computeUtf16Hash((u2*)(void*)chars->contents, len);
    dvmSetFieldInt((Object*)newObj, STRING_FIELDOFF_HASHCODE, hashCode);

    return newObj;
}
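The hash step can be sketched in isolation. The sketch below assumes that computeUtf16Hash follows the documented java.lang.String.hashCode formula (h = 31 * h + c over the UTF-16 code units); its body is not shown in this article, and the name utf16Hash is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for computeUtf16Hash. Assumption: it uses the
// java.lang.String.hashCode formula, h = 31 * h + c, evaluated over
// UTF-16 code units with unsigned 32-bit wraparound.
uint32_t utf16Hash(const uint16_t* chars, size_t len) {
    // Same precondition as dvmCreateStringFromUnicode: NULL only if len is 0.
    assert(len == 0 || chars != nullptr);
    uint32_t hash = 0;
    for (size_t i = 0; i < len; ++i) {
        hash = hash * 31 + chars[i];
    }
    return hash;
}
```

The hash is computed over 16-bit code units, not bytes, so it can only be calculated once the data is in the Java-side representation.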

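makeStringObject allocates two objects on the Dalvik heap: the String object itself (an Object) and a char array (an ArrayObject). It stores the length at field offset STRING_FIELDOFF_COUNT and the reference to the char array at field offset STRING_FIELDOFF_VALUE, then returns the object as a StringObject.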

static StringObject* makeStringObject(u4 charsLength, ArrayObject** pChars)
{
    /*
     * The String class should have already gotten found (but not
     * necessarily initialized) before making it here. We assert it
     * explicitly, since historically speaking, we have had bugs with
     * regard to when the class String gets set up. The assert helps
     * make any regressions easier to diagnose.
     */
    assert(gDvm.classJavaLangString != NULL);

    if (!dvmIsClassInitialized(gDvm.classJavaLangString)) {
        /* Perform first-time use initialization of the class. */
        if (!dvmInitClass(gDvm.classJavaLangString)) {
            ALOGE("FATAL: Could not initialize class String");
            dvmAbort();
        }
    }

    Object* result = dvmAllocObject(gDvm.classJavaLangString, ALLOC_DEFAULT);
    if (result == NULL) {
        return NULL;
    }

    ArrayObject* chars = dvmAllocPrimitiveArray('C', charsLength, ALLOC_DEFAULT);
    if (chars == NULL) {
        dvmReleaseTrackedAlloc(result, NULL);
        return NULL;
    }

    dvmSetFieldInt(result, STRING_FIELDOFF_COUNT, charsLength);
    dvmSetFieldObject(result, STRING_FIELDOFF_VALUE, (Object*) chars);
    dvmReleaseTrackedAlloc((Object*) chars, NULL);
    /* Leave offset and hashCode set to zero. */

    *pChars = chars;
    return (StringObject*) result;
}
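The resulting layout can be modeled with a small struct. This is only a sketch: ModelString and makeModelString are hypothetical names, and a std::vector stands in for the heap-allocated char array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the fields that makeStringObject fills in.
struct ModelString {
    std::vector<uint16_t> value;  // STRING_FIELDOFF_VALUE: backing char array
    int32_t offset = 0;           // STRING_FIELDOFF_OFFSET: left at zero
    int32_t count = 0;            // STRING_FIELDOFF_COUNT: length in code units
    uint32_t hashCode = 0;        // STRING_FIELDOFF_HASHCODE: left at zero here
};

// Mirrors makeStringObject followed by the memcpy in
// dvmCreateStringFromUnicode: allocate the array, record the length,
// copy the UTF-16 code units.
ModelString makeModelString(const uint16_t* chars, size_t len) {
    assert(len == 0 || chars != nullptr);
    ModelString s;
    s.value.assign(chars, chars + len);
    s.count = static_cast<int32_t>(len);
    return s;
}
```

Because the length is stored explicitly, the Java-side string is not NUL-terminated and may contain U+0000, which a plain C string cannot express.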

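So passing a string from C or C++ up to Java means calling a chain of functions that turns native character data (const jchar * in the case of NewString) into a jstring, which can then be handed straight to the Java layer. Going from jstring to char * is the reverse process.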

2 The GetStringUTFChars function

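GetStringUTFChars goes in the opposite direction from NewString: internally it calls dvmCreateCstrFromString to convert the Java string into a C string. It always sets *isCopy to JNI_TRUE, because the result is a newly allocated buffer rather than a pointer into the String object.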

static const char* GetStringUTFChars(JNIEnv* env, jstring jstr, jboolean* isCopy) {
  ScopedJniThreadState ts(env);
  if (jstr == NULL) {
      /* this shouldn't happen; throw NPE? */
      return NULL;
  }
  if (isCopy != NULL) {
      *isCopy = JNI_TRUE;
  }
  StringObject* strObj = (StringObject*) dvmDecodeIndirectRef(ts.self(), jstr);
  char* newStr = dvmCreateCstrFromString(strObj);
  if (newStr == NULL) {
      /* assume memory failure */
      dvmThrowOutOfMemoryError("native heap string alloc failed");
  }
  return newStr;
}
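The ownership rule that follows from this can be sketched without a VM. The model below is hypothetical: modelGetChars and modelReleaseChars are invented names, a std::string stands in for the managed string, and the release side is assumed to simply free the buffer, which the source quoted here does not show.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical model of the "get" side: it always returns a fresh,
// NUL-terminated malloc'ed copy and reports isCopy = true.
char* modelGetChars(const std::string& managed, bool* isCopy) {
    if (isCopy != nullptr) {
        *isCopy = true;
    }
    char* copy = static_cast<char*>(std::malloc(managed.size() + 1));
    if (copy == nullptr) {
        return nullptr;  // the real function also throws OutOfMemoryError
    }
    std::memcpy(copy, managed.data(), managed.size());
    copy[managed.size()] = '\0';
    return copy;
}

// Hypothetical model of the "release" side.
// Assumption: releasing just frees the native copy.
void modelReleaseChars(char* chars) {
    std::free(chars);
}
```

Since the caller receives its own copy, later changes on the managed side do not affect it, and every successful get has to be matched by a release or the native buffer leaks.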

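dvmCreateCstrFromString is the inverse of dvmCreateStringFromUnicode: it reads back the values that makeStringObject and dvmCreateStringFromUnicode stored at the various field offsets, then allocates a buffer on the native heap with malloc. The notable step is the conversion from UTF-16 to UTF-8 (JNI specifies modified UTF-8 for this), which shows that the Java layer stores strings as UTF-16.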

char* dvmCreateCstrFromString(const StringObject* jstr)
{
    assert(gDvm.classJavaLangString != NULL);
    if (jstr == NULL) {
        return NULL;
    }

    int len = dvmGetFieldInt(jstr, STRING_FIELDOFF_COUNT);
    int offset = dvmGetFieldInt(jstr, STRING_FIELDOFF_OFFSET);
    ArrayObject* chars =
            (ArrayObject*) dvmGetFieldObject(jstr, STRING_FIELDOFF_VALUE);
    const u2* data = (const u2*)(void*)chars->contents + offset;
    assert(offset + len <= (int) chars->length);

    int byteLen = utf16_utf8ByteLen(data, len);
    char* newStr = (char*) malloc(byteLen+1);
    if (newStr == NULL) {
        return NULL;
    }
    convertUtf16ToUtf8(newStr, data, len);

    return newStr;
}
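The two helpers called above, utf16_utf8ByteLen and convertUtf16ToUtf8, are not shown in this article. A minimal sketch of the same idea, assuming the modified UTF-8 rules from the JNI specification (U+0000 is written as the two bytes C0 80, and each UTF-16 code unit is encoded on its own, so a surrogate pair becomes two 3-byte sequences), might look like this. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to encode one UTF-16 code unit in modified UTF-8.
size_t modifiedUtf8UnitLen(uint16_t ch) {
    if (ch != 0 && ch <= 0x7f) return 1;  // plain ASCII except NUL
    if (ch <= 0x7ff) return 2;            // includes U+0000
    return 3;                             // everything else, surrogates too
}

// Total byte length, not counting the trailing NUL terminator.
size_t modifiedUtf8ByteLen(const uint16_t* data, size_t len) {
    assert(len == 0 || data != nullptr);
    size_t total = 0;
    for (size_t i = 0; i < len; ++i) {
        total += modifiedUtf8UnitLen(data[i]);
    }
    return total;
}

// Encodes UTF-16 code units as modified UTF-8 bytes.
std::string utf16ToModifiedUtf8(const uint16_t* data, size_t len) {
    assert(len == 0 || data != nullptr);
    std::string out;
    out.reserve(modifiedUtf8ByteLen(data, len));
    for (size_t i = 0; i < len; ++i) {
        const uint16_t ch = data[i];
        switch (modifiedUtf8UnitLen(ch)) {
        case 1:
            out.push_back(static_cast<char>(ch));
            break;
        case 2:
            out.push_back(static_cast<char>(0xc0 | (ch >> 6)));
            out.push_back(static_cast<char>(0x80 | (ch & 0x3f)));
            break;
        default:
            out.push_back(static_cast<char>(0xe0 | (ch >> 12)));
            out.push_back(static_cast<char>(0x80 | ((ch >> 6) & 0x3f)));
            out.push_back(static_cast<char>(0x80 | (ch & 0x3f)));
            break;
        }
    }
    return out;
}
```

Encoding U+0000 as two bytes keeps a zero byte out of the output, so the result can be NUL-terminated and handled as an ordinary C string even when the Java string contains U+0000.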

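This is why helper functions such as GetStringUTFChars and NewString exist: the root cause is that the two sides represent strings differently in memory and use different encodings.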