Limited Results When Trying to Access Delta Shared Tables from C#
11-26-2024 11:59 AM
This C# code only seems to return about 8 rows when it should return 100:
// Per Databricks AI, you must run the following command if you don't
// want to get an error about delta.enableDeletionVectors from C#:
// ALTER TABLE [myCatalog].setest.billboard_hot_100 SET TBLPROPERTIES (delta.enableDeletionVectors = false);
//
// This code was mostly written by Databricks AI but some other AI tools
// were used.
using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class Program
{
    private static readonly string bearerToken = "[myBearerToken]";
    private static readonly string endpoint = "https://[myRegion].azuredatabricks.net/api/2.0/delta-sharing/metastores/[myGUID]/shares/seary_billboard_100_test_share/schemas/setest/tables/billboard_hot_100/query";

    static async Task Main(string[] args)
    {
        using (HttpClient client = new HttpClient())
        {
            // Authenticate with the bearer token and ask for a JSON response.
            client.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", bearerToken);
            client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));

            // POST the table query to the Delta Sharing endpoint.
            var content = new StringContent("{\"responseFormat\": \"delta\"}", System.Text.Encoding.UTF8, "application/json");
            HttpResponseMessage response = await client.PostAsync(endpoint, content);

            if (response.IsSuccessStatusCode)
            {
                // Print the raw response body and save a copy to disk.
                string responseData = await response.Content.ReadAsStringAsync();
                Console.WriteLine("Response Data:");
                Console.WriteLine(responseData);
                File.WriteAllText("c:\\Users\\seary\\Downloads\\billBoard100-deltaShareResults.json", responseData);
            }
            else
            {
                // Print the status code and whatever error body the server returned.
                Console.WriteLine($"Error: {response.StatusCode}");
                string errorData = await response.Content.ReadAsStringAsync();
                Console.WriteLine("Error Data:");
                Console.WriteLine(errorData);
            }
        }
    }
}
Also, the returned JSON seems slightly corrupt to me. How can I get the above code to return all 100 rows from
11-27-2024 06:12 PM
12-02-2024 02:22 PM
@gchandra you have a really good point.
If I remember correctly (IIRC), C# can get to the data I need using a service account and a Personal Access Token (PAT) against the SQL Statement Execution API. I don't think I've tried ODBC yet, though.
My team specifically wants me to use Delta Sharing (in this case) for two different reasons:
- Delta Sharing does not require us to spin a cluster up to get a result.
- Delta Sharing is easier to use with customers outside our organization.
It would be preferable if I could access Delta Sharing directly from C#, either through a library or through the REST API mentioned in my original post, but right now I don't think that works, so I've experimented with other means.
I managed to trigger Python code to call the Delta Sharing libraries using C#'s System.Diagnostics.Process. This works well for simple examples, but it's clunky and I'm concerned it might break in complex situations. On the other hand, I had a coworker suggest I also try calling the Delta Sharing libraries using Python.NET. Unfortunately, the latest version of Python.NET seems a little buggy to me and while it does appear to be getting back the correct data, there seem to be some "security issues" that prevent the processes it launches from terminating properly.
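The `System.Diagnostics.Process` approach described above can be sketched roughly as follows. This is only an illustration, not the poster's actual code: `PythonRunner` and `RunAsync` are hypothetical names, and the script being launched is assumed to print its result to standard output. Reading both output streams concurrently is one way to avoid a common hang where the child process blocks on a full pipe buffer.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class PythonRunner
{
    // Starts an external program (for example a Python interpreter) and
    // returns its exit code together with everything it wrote to stdout/stderr.
    public static async Task<(int ExitCode, string StdOut, string StdErr)> RunAsync(
        string fileName, params string[] arguments)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = fileName,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            UseShellExecute = false,   // required for stream redirection
            CreateNoWindow = true,
        };
        // ArgumentList handles quoting of each argument for us.
        foreach (var arg in arguments)
        {
            startInfo.ArgumentList.Add(arg);
        }

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException($"Could not start {fileName}");

        // Read both streams concurrently so a full pipe buffer cannot
        // block the child while we wait for it to exit.
        Task<string> stdOut = process.StandardOutput.ReadToEndAsync();
        Task<string> stdErr = process.StandardError.ReadToEndAsync();
        await process.WaitForExitAsync();

        return (process.ExitCode, await stdOut, await stdErr);
    }
}
```

A call might look like `await PythonRunner.RunAsync("python", "load_share.py")`, where `load_share.py` is a hypothetical script that uses the Python `delta_sharing` package and writes the rows as JSON or CSV to standard output. Checking the exit code and the captured stderr gives the C# side at least some signal when the Python side fails.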
Yes, ODBC (or the SQL Statement Execution API) might be how I would personally work around these problems, but I'm not sure my team agrees with me. From a team perspective, it would be useful if Databricks could support a native C# library for Delta Sharing. It's currently a lot of unnecessary work to use C# to pull data from Databricks.
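For comparison, the SQL Statement Execution API route mentioned above could look something like the sketch below. It assumes a workspace URL, a PAT, and the ID of a SQL warehouse, all of which are placeholders here; `StatementApi` and its methods are hypothetical names. Note that this path needs a SQL warehouse to run the statement, which is the compute cost the team wants to avoid with Delta Sharing.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class StatementApi
{
    // Builds the JSON body for POST /api/2.0/sql/statements.
    // wait_timeout asks the server to hold the request open for up to 30 seconds.
    public static string BuildRequestBody(string warehouseId, string statement) =>
        JsonSerializer.Serialize(new
        {
            warehouse_id = warehouseId,
            statement,
            wait_timeout = "30s",
        });

    // Sends the statement and returns the raw JSON response.
    // workspaceUrl, token and warehouseId are caller-supplied placeholders.
    public static async Task<string> ExecuteAsync(
        HttpClient client, string workspaceUrl, string token,
        string warehouseId, string statement)
    {
        using var request = new HttpRequestMessage(
            HttpMethod.Post, $"{workspaceUrl.TrimEnd('/')}/api/2.0/sql/statements");
        request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", token);
        request.Content = new StringContent(
            BuildRequestBody(warehouseId, statement), Encoding.UTF8, "application/json");

        using HttpResponseMessage response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

When the statement finishes within the wait timeout, the rows should appear under `result.data_array` in the response; otherwise the response carries a `statement_id` that has to be polled until the state reaches `SUCCEEDED`.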
12-04-2024 01:55 PM
This is very frustrating. The documentation for the Delta Sharing REST API is very bad. When I run a query, I get back a bunch of metadata describing the Parquet files instead of the actual rows of data. The Delta Sharing REST API is a huge hassle to use from C#. For this reason, many of my team members have resolved to simply use Python, which unfortunately is not as useful a language as C#, IMO.
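The metadata described above matches how the open Delta Sharing protocol answers a table query: the body is newline-delimited JSON (one object per line, which can look corrupt when read as a single JSON document), and the rows themselves live in Parquet files referenced by short-lived URLs. A minimal sketch of pulling those URLs out in C# follows. The class and method names are hypothetical, and the `deltaSingleAction` shape used for `responseFormat: delta` is an assumption based on the protocol description, so it should be checked against an actual response.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class DeltaSharingResponse
{
    // Returns the data file URLs found in a newline-delimited JSON query response.
    // Lines without a "file" object (such as "protocol" and "metaData") are skipped.
    public static List<string> ExtractFileUrls(string ndjson)
    {
        var urls = new List<string>();
        foreach (var line in ndjson.Split('\n', StringSplitOptions.RemoveEmptyEntries))
        {
            var trimmed = line.Trim();
            if (trimmed.Length == 0) continue;

            // Each line is its own JSON document.
            using JsonDocument doc = JsonDocument.Parse(trimmed);
            JsonElement root = doc.RootElement;
            if (root.ValueKind != JsonValueKind.Object) continue;
            if (!root.TryGetProperty("file", out JsonElement file)) continue;
            if (file.ValueKind != JsonValueKind.Object) continue;

            // Parquet response format: {"file": {"url": "...", ...}}
            if (file.TryGetProperty("url", out JsonElement url)
                && url.ValueKind == JsonValueKind.String)
            {
                urls.Add(url.GetString()!);
            }
            // Delta response format (assumed shape):
            // {"file": {"deltaSingleAction": {"add": {"path": "..."}}}}
            else if (file.TryGetProperty("deltaSingleAction", out JsonElement action)
                     && action.ValueKind == JsonValueKind.Object
                     && action.TryGetProperty("add", out JsonElement add)
                     && add.ValueKind == JsonValueKind.Object
                     && add.TryGetProperty("path", out JsonElement path)
                     && path.ValueKind == JsonValueKind.String)
            {
                urls.Add(path.GetString()!);
            }
        }
        return urls;
    }
}
```

Each returned URL can then be downloaded with `HttpClient` and read with a Parquet reader for .NET (the Parquet.Net package is one option); the table's rows are the union of the rows in those files. If the "8 rows" in the original post were 8 lines of this response, they were probably a protocol line, a metadata line, and a handful of file entries rather than table rows.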

