Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

davidmory38
by New Contributor
  • 1632 Views
  • 1 reply
  • 0 kudos

Best Database for facial recognition/ Fast comparisons of Euclidean distance

Hello people, I'm trying to build a facial recognition application, and I have a working API that takes in an image of a face and spits out a vector that encodes it. I need to run this on a million faces, store them in a db and when the system goes o...

Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

You could do this with Spark storing in parquet/Delta. For each face you would write out a record with a column for metadata, a column for the encoded vector array, and other columns for hashing. You could use a PandasUDF to do the distributed dista...
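As a rough sketch of the batch distance computation described above — plain NumPy, which is the kind of logic a pandas UDF would apply to each batch of rows; all names, sizes, and data here are illustrative stand-ins, not the poster's actual schema:

```python
import numpy as np

# Stand-in data: one 128-dim encoding per stored face, plus a query vector.
rng = np.random.default_rng(0)
encodings = rng.normal(size=(1000, 128))  # rows loaded from parquet/Delta
query = rng.normal(size=128)              # encoding of the incoming face

# Euclidean distance from the query to every stored vector, vectorized
# via broadcasting instead of a Python loop.
dists = np.linalg.norm(encodings - query, axis=1)

# Indices of the 5 nearest stored faces.
nearest = np.argsort(dists)[:5]
```

Wrapped in a pandas UDF, Spark would run this per partition, letting the million-row scan parallelize across the cluster.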

austiamel47
by New Contributor
  • 883 Views
  • 1 reply
  • 0 kudos

Databricks delta lake

Can we use Databricks Delta Lake as a data warehouse, where business analysts can explore data according to their needs? Delta Lake provides the following features, which I think support this idea: support for SQL syntax, ACID guarante...

Latest Reply
Dan_Z
Honored Contributor
  • 0 kudos

@austiamel47, Yes, you can certainly do this. Delta Lake is designed to be competitive with traditional data warehouses and, with some tuning, can power low-latency dashboards. See https://databricks.com/glossary/data-lakehouse

Ryan_Chynoweth
by Esteemed Contributor
  • 1618 Views
  • 1 reply
  • 2 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 2 kudos

Yes. A new workspace would need to be deployed: although Azure allows you to change a VNet's CIDR, it requires you to remove all resources from the VNet first, which includes the Databricks deployment. This is therefore an Azure restriction on how VNE...

wallystart
by New Contributor II
  • 1208 Views
  • 0 replies
  • 1 kudos

Is it possible to use Jupyter extensions in Databricks?

Hi, we need to create an interactive map with the ipyleaflet library, which uses a JupyterLab extension: jupyter labextension install @jupyter-widgets/jupyterlab-manager jupyter-leaflet. We managed to show the map with displayHTML, but we lose the widget events.

jholder
by New Contributor II
  • 1486 Views
  • 2 replies
  • 1 kudos

Cluster Pending

Hello, I'm relatively new to Databricks and I've been using the Community Edition for a little bit now. I've recently been having more and more issues with my clusters pending until they time out before ever starting up. I've seen a few other posts here...

Latest Reply
GustavoRocha
New Contributor III
  • 1 kudos

It seems that it's working now. At least for me...

1 More Reply
daniil_terentye
by New Contributor III
  • 865 Views
  • 1 reply
  • 0 kudos

New task orchestration tool

Hi everybody! Is it possible to use one job cluster for multiple tasks with the new task orchestration tool? If so, please tell me how. If it's impossible, then this new tool looks useless. Regards, Daniil.

Latest Reply
daniil_terentye
New Contributor III
  • 0 kudos

Databricks team, is there any news about this?
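For what it's worth, later revisions of the Jobs API (2.1) did add shared job clusters for multi-task jobs: a cluster is declared once under job_clusters and each task references it by job_cluster_key. A minimal sketch of such a payload — all names, paths, node types, and sizes here are hypothetical:

```json
{
  "name": "multi-task-job",
  "job_clusters": [
    {
      "job_cluster_key": "shared_cluster",
      "new_cluster": {
        "spark_version": "9.1.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      }
    }
  ],
  "tasks": [
    {
      "task_key": "ingest",
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Jobs/ingest" }
    },
    {
      "task_key": "transform",
      "depends_on": [ { "task_key": "ingest" } ],
      "job_cluster_key": "shared_cluster",
      "notebook_task": { "notebook_path": "/Jobs/transform" }
    }
  ]
}
```

Both tasks here run on the single cluster keyed "shared_cluster" rather than each spinning up its own.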

Roy
by New Contributor II
  • 50578 Views
  • 5 replies
  • 0 kudos

Resolved! dbutils.notebook.exit() executing from except in try/except block even if there is no error.

I am using Python notebooks as part of a concurrently running workflow with Databricks Runtime 6.1. Within the notebooks I am using try/except blocks to return an error message to the main concurrent notebook if a section of code fails. However I h...

Latest Reply
vivekvardhanSha
New Contributor II
  • 0 kudos

You can add an extra except for the notebook.exit inside the try block:

    try:
        notebook.run(somenotebook)
        try:
            notebook.exit()
        except Exception as e:
            print("Notebook exited")
    except:
        print("Main exception")
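The pattern can be exercised outside Databricks with stand-ins for the notebook calls. NotebookExit, exit_notebook, and run_section below are hypothetical substitutes for the way dbutils.notebook.exit unwinds the stack by raising internally, which is what this thread observes:

```python
# Toy reproduction of the nested try/except pattern.
class NotebookExit(Exception):
    pass

def exit_notebook(value):
    # Stand-in: dbutils.notebook.exit raises even on a successful exit,
    # which is why a bare outer `except` treats it as a failure.
    raise NotebookExit(value)

def run_section():
    return "all good"

events = []
try:
    result = run_section()
    try:
        exit_notebook(result)
    except NotebookExit:
        # Absorb the exit signal so the outer handler never fires.
        events.append("Notebook exited")
except Exception:
    events.append("Main exception")

print(events)
```

Only the inner handler runs, so the outer except no longer misreports a clean exit as an error.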

4 More Replies
MichaelBlahay
by New Contributor
  • 2464 Views
  • 1 reply
  • 1 kudos

Where is dbfs mounted with community edition?

In the regular version of Databricks, DBFS is mounted at /dbfs. This does not seem to be the case with Community Edition. I am seeking more details.

Latest Reply
AlexandrePetrul
New Contributor II
  • 1 kudos

If you are using DBR 7.x or newer, the /dbfs mount is disabled. You have to use dbutils.fs.cp commands as a workaround.

User16752239289
by Valued Contributor
  • 2691 Views
  • 1 reply
  • 2 kudos

Resolved! EC2 instances are not stopped after the cluster terminates

I found that Databricks did not stop and delete the EC2 instances of clusters. After a cluster terminates, its EC2 instances are still running.

Latest Reply
User16752239289
Valued Contributor
  • 2 kudos

Please make sure the Databricks IAM role has all the required permissions mentioned in https://docs.databricks.com/administration-guide/account-api/iam-role.html. Also make sure you did not change the EC2 tags, especially...

Vu_QuangNguyen
by New Contributor
  • 2644 Views
  • 0 replies
  • 0 kudos

Structured streaming from an overwrite delta path

Hi experts, I need to ingest data from an existing delta path to my own delta lake. The dataflow is as shown in the diagram: Data team reads full snapshot of a database table and overwrite to a delta path. This is done many times per day, but...

[diagram attachment]
RajuNagarajan
by New Contributor
  • 763 Views
  • 0 replies
  • 0 kudos

GroupBy in a multi node environment

I have a group of rows with information on nested product calls. Example: Trxn1-product1-caller1-local1, Trxn1-Product1-local1-local2, Trxn1-Product1-local2-local3. Here are the expected calls for a product: product1-caller1-local1, Product1-local1-loc...

User15787040559
by New Contributor III
  • 1207 Views
  • 1 reply
  • 1 kudos

What is the equivalent command for constructing the filepath in Databricks on AWS? filepath = f"{working_dir}/keras_checkpoint_weights.ckpt"

dbutils.fs.mkdirs("/foobar/"). See https://docs.databricks.com/data/databricks-file-system.html

Latest Reply
User16752239289
Valued Contributor
  • 1 kudos

To access DBFS via local file APIs, you can try /dbfs/<foobar>. See https://docs.databricks.com/data/databricks-file-system.html#local-file-apis
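For the question's f-string, the local-file-API form would simply root the path at /dbfs. A minimal sketch — the directory name is hypothetical, and it assumes the cluster exposes the /dbfs mount:

```python
# Hypothetical working directory under the DBFS local mount.
working_dir = "/dbfs/tmp/keras"

# Same f-string construction as in the question, but rooted at /dbfs
# so local file APIs (e.g. a Keras checkpoint callback) can write to it.
filepath = f"{working_dir}/keras_checkpoint_weights.ckpt"
print(filepath)
```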

