- 1495 Views
- 0 replies
- 0 kudos
Dear all,
The Spark JDBC driver (SparkJDBC42.jar) is unable to capture certain information from the below table structure:
1. table level comment
2. the TBLPROPERTIES key-value pair information
3. PARTITION BY information
However, it captures the co...
- 2489 Views
- 0 replies
- 0 kudos
I'm using the Databricks Auto Loader to incrementally load a series of CSV files on S3, which I update with an API. My typical work process is to update only the latest year's file each night. But there are occasions where previous years also get update...
- 6724 Views
- 5 replies
- 0 kudos
Following the instructions in Week 1 > The Databricks Environment, I am supposed to create a new cluster. However, the cluster does not start and I cannot attach a notebook to it, so I cannot continue with the tasks/assignments.
related documents
not...
Latest Reply
Hey there, creating a cluster in the Community Edition is showing the same problem as last week. This time I'm not getting any error, because the process of creating it is taking forever. Any suggestions, @databricks?
4 More Replies
by
vas610
• New Contributor III
- 5173 Views
- 5 replies
- 0 kudos
I'm getting the following error when I try to load an H2O model using MLflow for prediction
Error:
Job with key $03017f00000132d4ffffffff$_990da74b0db027b33cc49d1d90934149 failed with an exception: java.lang.IllegalArgumentException:...
Latest Reply
Dan_Z
Databricks Employee
I ran this in Databricks and it worked with no issues. I suggest you make sure your wget path is correct, because the one you posted downloads HTML, not the raw csv. That may cause the problem.
%sh
wget https://raw.githubusercontent.com/mlflow/mlflo...
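The difference pointed out in this reply comes down to the URL form: a github.com `/blob/` link returns an HTML page, while the raw.githubusercontent.com form returns the file itself. Below is a minimal sketch of that conversion, assuming the standard GitHub URL layout. The helper name `to_raw_github_url` and the example repository are hypothetical and do not come from the thread.

```python
from urllib.parse import urlparse


def to_raw_github_url(url: str) -> str:
    """Convert a github.com '/blob/' page URL to its raw-content URL.

    URLs that are not github.com blob links are returned unchanged.
    """
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<branch>/<path...>
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        return url
    owner, repo, _, *rest = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, *rest])


# Hypothetical repository, used only for illustration.
page_url = "https://github.com/example-org/example-repo/blob/main/data/sample.csv"
print(to_raw_github_url(page_url))
```

Passing the converted URL to `wget` should then fetch the CSV content rather than the HTML page around it.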
4 More Replies
- 1771 Views
- 0 replies
- 0 kudos
I'm trying to create a dashboard in Databricks SQL, parameterized by table name. We have a metadata table which contains the names of all the eligible tables, and we use it to populate a drop-down box for the dashboard. This is a simplified version ...
- 1483 Views
- 0 replies
- 0 kudos
Not sure whether it is better to ask this under an Azure or a Spark subject, but I thought I might get responses appropriate to our use cases here.
We have Azure Databricks set up and working, and have not had any problems following along with the tutorials, but I don't...
- 1180 Views
- 0 replies
- 0 kudos
Hey guys, I am looking to create a real-time analytics application and I am pretty new to data engineering. Any advice here would be appreciated. I have been looking into Spark Streaming for my transformation process, so the ove...
by
satya
• New Contributor
- 90313 Views
- 10 replies
- 1 kudos
like in pandas I usually do df['columnname'].unique()
Latest Reply
Hi, this worked for me.
distinct_ids = [x.id for x in data.select('id').distinct().collect()]
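For comparison with pandas' `unique()`, the pattern in this reply can be wrapped in a small helper. This is a sketch rather than an official API: `distinct_values` is a hypothetical name, and `df` is assumed to be a PySpark DataFrame. Note that `collect()` brings every distinct value to the driver, so this suits columns with a modest number of distinct values.

```python
def distinct_values(df, column):
    """Return the distinct values of `column` as a plain Python list.

    `df` is assumed to be a pyspark.sql.DataFrame. The result order is
    not guaranteed, unlike pandas' unique(), which keeps first-seen order.
    """
    rows = df.select(column).distinct().collect()
    # Each Row supports lookup by column name.
    return [row[column] for row in rows]
```

With the DataFrame from the reply, `distinct_values(data, 'id')` would be expected to return the same values as the one-liner.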
9 More Replies
- 1417 Views
- 1 replies
- 0 kudos
Hi guys, I'm new to Databricks and I have a challenge at my new job.
I need to access one of the databases (it is on DBFS), the result of some ETLs, through any service, whether ODBC or some API. I need to connect there because I...
Latest Reply
Dan_Z
Databricks Employee
Use the Simba ODBC connector: https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html
by
Skier
• New Contributor
- 4335 Views
- 1 replies
- 1 kudos
I have been trying to create a new cluster to use and multiple attempts have gotten stuck in pending: "Finding instances for new nodes, acquiring more instances if necessary" until they time out. Up to today I have had no problems creating clusters ...
Latest Reply
Dan_Z
Databricks Employee
This is typically a cloud provider issue. You can file a support ticket if the issue persists.
- 1993 Views
- 1 replies
- 0 kudos
I’m trying to use LSH approxSimilarityJoin on a dataset with ~25k 300-d vectors of floats. It gets stuck and eventually fails with ’Slave lost’ error. The size of cluster and memory are likely not a problem, the failure happens even with 16 nodes, 1...
Latest Reply
Dan_Z
Databricks Employee
Use a PandasUDF with Arrow enabled. They are improved in Spark 3, but you can use them in Spark 2.4.5.
- 3860 Views
- 1 replies
- 0 kudos
Hello people, I'm trying to build a facial recognition application, and I have a working API that takes in an image of a face and spits out a vector that encodes it. I need to run this on a million faces, store them in a db and when the system goes o...
Latest Reply
Dan_Z
Databricks Employee
You could do this with Spark storing in parquet/Delta. For each face you would write out a record with a column for metadata, a column for the encoded vector array, and other columns for hashing. You could use a PandasUDF to do the distributed dista...
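The per-record comparison in a setup like this reduces to a distance between two encoded vectors. Below is a minimal pure-Python sketch, assuming cosine distance (the thread does not say which metric is used). `cosine_distance` and `nearest` are hypothetical helper names, and in practice this logic would more likely run inside a pandas UDF over the stored vector column than in a driver-side loop.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, records):
    """Return the (key, vector) pair in `records` closest to `query`."""
    return min(records, key=lambda rec: cosine_distance(query, rec[1]))


# Toy 3-d vectors standing in for real face encodings.
stored = [("face_a", [1.0, 0.0, 0.0]), ("face_b", [0.0, 1.0, 0.0])]
print(nearest([0.9, 0.1, 0.0], stored)[0])
```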
- 1653 Views
- 1 replies
- 0 kudos
Can we use Databricks Delta Lake as a kind of data warehouse where business analysts can explore data according to their needs?
Delta Lake provides the following features, which I think support this idea:
- support for SQL syntax
- provide ACID guarante...
Latest Reply
Dan_Z
Databricks Employee
@austiamel47, Yes, you can certainly do this. Delta Lake is designed to be competitive with traditional data warehouses and with some tuning can power low-latency dashboards. https://databricks.com/glossary/data-lakehouse
- 3394 Views
- 1 replies
- 2 kudos
Latest Reply
Yes. A new workspace would need to be deployed, because Azure allows you to change the VNet CIDR only after you remove all the VNet resources first. This includes the Databricks deployment; therefore, this is an Azure restriction on how VNE...
- 1918 Views
- 0 replies
- 1 kudos
Hi, we need to create an interactive map with the ipyleaflet library, which uses a JupyterLab extension:
jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet
We managed to display the map with displayHTML, but we lose the widget events.