I've been at this all night. I'm trying to record speech on my iPhone in an Expo app using expo-av and upload the recording to OpenAI's transcriptions endpoint using the whisper-1 model.
The file is saved as mp4 and I convert it to a base64 string. I have confirmed that the base64 content is in fact mp4, using a base64-to-file converting tool and a file uploading and checking tool.
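For anyone who wants to repeat that check without the online tools, here is a minimal sketch, assuming Node and the usual MP4 layout in which bytes 4 to 7 of the file spell "ftyp". The function name `looksLikeMp4` is made up for illustration, and this is only a rough sanity check of the header, not a full validation of the file.

```javascript
const { Buffer } = require("buffer");

// Rough check (assumption: the file starts with an ISO base media "ftyp" box,
// which is the usual layout for MP4/M4A files).
// looksLikeMp4 is a hypothetical helper name, not part of any library.
function looksLikeMp4(base64) {
  const bytes = Buffer.from(base64, "base64");

  // A header needs at least 4 bytes of box size, 4 of box type, 4 of brand.
  if (bytes.length < 12) {
    return { isMp4: false, brand: null };
  }

  const boxType = bytes.toString("latin1", 4, 8); // expected to be "ftyp"
  const brand = bytes.toString("latin1", 8, 12); // e.g. "isom", "mp42", "M4A "

  const isMp4 = boxType === "ftyp";
  return { isMp4, brand: isMp4 ? brand : null };
}
```

Passing the `recordingBase64` string to `looksLikeMp4` gives a quick yes/no on the header, and the `brand` field shows which container brand the recorder wrote.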
Here are the recording options in my React Native code:
const recordingOptions = {
  android: {
    extension: ".mp4",
    outputFormat: Audio.AndroidOutputFormat.MPEG_4,
    audioEncoder: Audio.AndroidAudioEncoder.AAC,
    sampleRate: 44100,
    numberOfChannels: 2,
    bitRate: 128000,
  },
  ios: {
    extension: ".mp4",
    // outputFormat: Audio.IOSOutputFormat.MPEG4AAC,
    audioQuality: Audio.IOSAudioQuality.HIGH,
    sampleRate: 44100,
    numberOfChannels: 2,
    bitRate: 128000,
  },
  web: {
    mimeType: "audio/mp4",
    bitsPerSecond: 128000 * 8,
  },
};
And the actual implementation:
const recordingUri = recording.getURI();
const recordingBase64 = await ExpoFileSystem.readAsStringAsync(
  recordingUri,
  {
    encoding: ExpoFileSystem.EncodingType.Base64,
  }
);
const languageCode = "en"; // English
console.log(languageCode);
console.log(recordingBase64);

const buffer = Buffer.from(recordingBase64, "base64");
const blob = new Blob([buffer], { type: "audio/mp4" });
const file = new File([blob], "test.mp4", { type: "audio/mp4" });

const formData = new FormData();
formData.append("file", file);
formData.append("model", "whisper-1");

const apiUrl = "https://api.openai.com/v1/audio/transcriptions";
const requestOptions = {
  method: "POST",
  headers: {
    Authorization: `Bearer ${OPENAI_API_KEY}`,
  },
  body: formData,
};

fetch(apiUrl, requestOptions)
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((error) => console.log(error));
and every time the response is:
{"error": {"code": null, "message": "Invalid file format. Supported formats: ['m4a', 'mp3', 'webm', 'mp4', 'mpga', 'wav', 'mpeg']", "param": null, "type": "invalid_request_error"}}
Does anyone have any idea what I'm doing wrong?