Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

andyh
by New Contributor
  • 1571 Views
  • 2 replies
  • 0 kudos

Resolved! Job queue for pool limit

I have a cluster pool with a max capacity limit, to make sure we're not burning too much extra silicon. We use this for some of our less critical workflows/jobs. They still spend a lot of time idle, but sometimes hit this max capacity limit. Is there a way...

Latest Reply
SSundaram
Contributor
  • 0 kudos

Try increasing your max capacity limit, and you might want to bring down the minimum number of nodes the job uses. At the job level, try configuring retries and the time interval between retries.

1 More Replies
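A minimal sketch of the suggested retry configuration, applied through the Jobs 2.1 API, is below. The host, token, job_id, and task_key are placeholders, and the queue block (which holds runs when capacity is exhausted instead of failing them) is an assumption to verify for your workspace.

```python
# Sketch only: enable run queueing and per-task retries on an existing job.
# All identifiers below are placeholders.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

new_settings = {
    "queue": {"enabled": True},  # queue runs instead of failing at pool capacity
    "tasks": [{
        "task_key": "main",
        "max_retries": 3,                     # retry the task up to 3 times
        "min_retry_interval_millis": 300000,  # wait 5 minutes between retries
        "retry_on_timeout": False,
    }],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 123, "new_settings": new_settings},
)
resp.raise_for_status()
```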
yutaro_ono1_558
by New Contributor II
  • 9388 Views
  • 5 replies
  • 1 kudos

Resolved! How to read data from S3 Access Point by pyspark?

I want to read data from an S3 Access Point. I successfully accessed the data through the S3 access point using the boto3 client:
s3 = boto3.resource('s3')
ap = s3.Bucket('arn:aws:s3:[region]:[aws account id]:accesspoint/[S3 Access Point name]')
for obj in ap.object...

Latest Reply
shrestha-rj
New Contributor II
  • 1 kudos

I'm reaching out to seek assistance as I navigate an issue. Currently, I'm trying to read JSON files from an S3 Multi-Region Access Point using a Databricks notebook. While reading directly from the S3 bucket presents no challenges, I encounter an "j...

4 More Replies
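For anyone landing here, a rough sketch of reading through an access point with the S3A connector is below. It assumes a runtime with hadoop-aws 3.3+, where a bucket alias can be mapped to an access point ARN; the alias, region, account id, and path are placeholders, and the property name should be checked against your hadoop-aws version.

```python
# Sketch only: route S3A reads through an S3 Access Point via a per-bucket ARN.
# Alias, region, account id, and path are placeholders.
ap_arn = "arn:aws:s3:us-east-1:123456789012:accesspoint/my-ap"

spark.sparkContext._jsc.hadoopConfiguration().set(
    "fs.s3a.bucket.my-ap.accesspoint.arn", ap_arn
)

# Read using the alias; S3A resolves 'my-ap' to the ARN configured above.
df = spark.read.json("s3a://my-ap/path/to/json/")
```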
victor-nj-miami
by New Contributor II
  • 8110 Views
  • 2 replies
  • 2 kudos

Resolved! Cannot create a metastore anymore

Hi Community,I am trying to create a metastore for the Unity Catalog, but I am getting an error saying that there is already a metastore in the region, which is not true, because I deleted all the metastores. I used to have one working properly, but ...

Latest Reply
karthik_p
Esteemed Contributor
  • 2 kudos

@ashu_sama I see your issue got resolved by clearing or purging the revision history. Can you mark this thread as resolved?

1 More Replies
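When the console insists a metastore still exists in the region, listing metastores at the account level is one way to check what the control plane actually sees. A sketch is below; the account id and credentials are placeholders, and the auth mechanism (basic auth for AWS account admins, OAuth elsewhere) depends on your setup.

```python
# Sketch only: list account-level metastores to confirm what exists per region.
# Account id and credentials are placeholders; adjust auth to your setup.
import requests

ACCOUNT_ID = "<databricks-account-id>"

resp = requests.get(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/metastores",
    auth=("<account-admin-email>", "<password-or-token>"),
)
resp.raise_for_status()
for m in resp.json().get("metastores", []):
    print(m.get("name"), m.get("region"), m.get("metastore_id"))
```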
Anonymous
by Not applicable
  • 30722 Views
  • 6 replies
  • 10 kudos

How to connect and extract data from sharepoint using Databricks (AWS) ?

We are using Databricks (on AWS). We need to connect to SharePoint and extract & load data into a Databricks Delta table. Is there any possible solution for this?

Latest Reply
yliu
New Contributor III
  • 10 kudos

Wondering the same... Can we use the SharePoint REST API to download the file, save it to DBFS or an external location, and read it?

5 More Replies
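The REST-download approach in the last reply is workable; a sketch using Microsoft Graph is below. It assumes an Azure AD app registration with Sites.Read.All and a bearer token obtained elsewhere; the site id, file path, and landing path are placeholders.

```python
# Sketch only: download a SharePoint file via Microsoft Graph, land it on DBFS,
# then read it with Spark. Token, site id, and paths are placeholders.
import requests

TOKEN = "<azure-ad-bearer-token>"
SITE_ID = "<sharepoint-site-id>"
url = (f"https://graph.microsoft.com/v1.0/sites/{SITE_ID}"
       "/drive/root:/Shared Documents/data.csv:/content")

resp = requests.get(url, headers={"Authorization": f"Bearer {TOKEN}"})
resp.raise_for_status()

with open("/dbfs/tmp/sharepoint/data.csv", "wb") as f:  # DBFS FUSE path
    f.write(resp.content)

df = spark.read.option("header", True).csv("dbfs:/tmp/sharepoint/data.csv")
```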
Baldur
by New Contributor II
  • 1946 Views
  • 3 replies
  • 1 kudos

Unable to follow H3 Quickstart

Hello, I'm following the H3 quickstart (Databricks SQL) tutorial because I want to do point-in-polygon queries on 21k polygons and 95B points. The volume is pushing me towards using H3. In the tutorial, they use geopandas. According to H3 geospatial functio...

Data Engineering
Geopandas
H3
Latest Reply
siddhathPanchal
New Contributor III
  • 1 kudos

Hi @Baldur, I hope the above answer solved your problem. If you have any follow-up questions, please let us know. If you like the solution, please don't forget to press the 'Accept as Solution' button.

2 More Replies
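For the point-in-polygon workload itself, a sketch of the usual H3 join pattern (no geopandas required) is below, using the built-in H3 expressions available in DBR 11.2+ and Databricks SQL. Table names, column names, and the resolution are placeholders.

```python
# Sketch only: approximate point-in-polygon by joining on H3 cell ids.
# Tables, columns, and resolution are placeholders.
from pyspark.sql import functions as F

RES = 7

# Cover each polygon (WKT) with H3 cells, one row per (polygon, cell).
polys = (spark.table("polygons")
         .withColumn("cell", F.explode(F.expr(f"h3_polyfillash3(wkt, {RES})"))))

# Index each point to its containing cell.
points = (spark.table("points")
          .withColumn("cell", F.expr(f"h3_longlatash3(lon, lat, {RES})")))

# Equi-join on the cell id; add an exact geometric test on the candidate
# pairs afterwards if boundary precision matters.
candidates = points.join(polys, "cell")
```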
NanthakumarYoga
by New Contributor II
  • 1042 Views
  • 3 replies
  • 1 kudos

Parallel Processing: Pool with 8 cores and Standard Instance with 28 GB

Hi Team, need your inputs on designing the pool for our parallel processing. We are processing around 4 to 5 GB files (the process adds a row number, removes the header/trailer, and adds an additional 8 columns calculated over all 104 columns per ...

Latest Reply
siddhathPanchal
New Contributor III
  • 1 kudos

Hi Nanthakumar, I also agree with the above solution. If this solution works for you, don't forget to press the 'Accept as Solution' button.

2 More Replies
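On the processing side, one sketch of the row-number plus header/trailer step is below (the pool sizing question is separate). It assumes file order matters, so zipWithIndex is used instead of monotonically_increasing_id; the path and partition count are placeholders.

```python
# Sketch only: exact file-order row numbers, then drop header and trailer.
# Path and partition count are placeholders.
from pyspark.sql import functions as F

raw = spark.read.text("dbfs:/mnt/landing/big_file.txt")

# zipWithIndex preserves file order; monotonically_increasing_id does not
# produce consecutive numbers across partitions.
indexed = (raw.rdd.zipWithIndex()
           .map(lambda pair: (pair[0][0], pair[1]))
           .toDF(["value", "row_num"]))

total = indexed.count()
body = indexed.filter((F.col("row_num") > 0) & (F.col("row_num") < total - 1))

# Repartition to a small multiple of the pool's total cores before the wide
# 104-column derivation.
body = body.repartition(32)
```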
Erik_L
by Contributor II
  • 1627 Views
  • 3 replies
  • 1 kudos

Microbatching incremental updates Delta Live Tables

I need to create a workflow that pulls recent data from a database every two minutes, then transforms that data in various ways, and appends the results to a final table. The problem is that some of these changes _might_ update existing rows in the f...

Latest Reply
Manisha_Jena
New Contributor III
  • 1 kudos

Hi @Erik_L, As my colleague mentioned, to ensure continuous operation of the Delta Live Tables pipeline compute during Workflow runs, choosing a prolonged Databricks Job over a triggered Databricks Workflow is a reliable strategy. This extended job w...

2 More Replies
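Since some rows in the final table get updated rather than appended, this is the shape of problem DLT's apply_changes targets. A sketch is below; the source table, key, and sequencing column are placeholders.

```python
# Sketch only: upsert micro-batches into a DLT streaming table so late changes
# overwrite earlier rows. Source, keys, and sequence column are placeholders.
import dlt

@dlt.view
def source_updates():
    # Incremental rows staged by the upstream two-minute pull.
    return spark.readStream.table("staging.recent_rows")

dlt.create_streaming_table("final_table")

dlt.apply_changes(
    target="final_table",
    source="source_updates",
    keys=["id"],               # rows sharing a key are upserted, not appended
    sequence_by="updated_at",  # the latest version per key wins
    stored_as_scd_type=1,
)
```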
dbx_deltaSharin
by New Contributor II
  • 1911 Views
  • 3 replies
  • 1 kudos

Resolved! Open sharing protocol in Databricks notebook

Hello, I utilize an Azure Databricks notebook to access Delta Sharing tables, employing the open sharing protocol. I've successfully uploaded the 'config.share' file to DBFS. Upon executing the commands:
client = delta_sharing.SharingClient(f"/dbfs/p...

Data Engineering
DELTA SHARING
Latest Reply
Manisha_Jena
New Contributor III
  • 1 kudos

Hi @dbx_deltaSharin, When querying the individual partitions, the files are being read by using an S3 access point location while it is using the actual S3 name when reading the table as a whole. This information is fetched from the table metadata it...

2 More Replies
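For reference, the open-protocol client calls from the question typically look like the sketch below; the profile path and share/schema/table names are placeholders.

```python
# Sketch only: list and load a Delta Sharing table over the open protocol.
# Profile path and table coordinates are placeholders.
import delta_sharing

profile = "/dbfs/path/config.share"
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Table URL format: <profile-path>#<share>.<schema>.<table>
url = f"{profile}#my_share.my_schema.my_table"
df = delta_sharing.load_as_spark(url)  # needs the delta-sharing Spark connector
```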
dowdark
by New Contributor
  • 1718 Views
  • 4 replies
  • 0 kudos

UPDATE or DELETE with pipeline that needs to reprocess in DLT

I'm currently trying to replicate an existing pipeline that uses a standard RDBMS; no experience in Databricks at all. I have about 4-5 tables (much like dimensions) with different event types, and I want my pipeline to output a streaming table as the final o...

Latest Reply
Manisha_Jena
New Contributor III
  • 0 kudos

Hi @dowdark, what is the error that you get when the pipeline tries to update the rows instead of performing an insert? That should give us more info about the problem. Please raise an SF case with us with this error and its complete stack trace.

3 More Replies
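Outside DLT, the standard way to get UPDATE/DELETE semantics from a streaming pipeline is a foreachBatch MERGE into a Delta table. A sketch is below; table names, the join key, and the checkpoint path are placeholders.

```python
# Sketch only: merge each micro-batch into a Delta target so updates hit
# existing rows. Names and paths are placeholders.
from delta.tables import DeltaTable

def upsert_batch(batch_df, batch_id):
    target = DeltaTable.forName(spark, "final_table")
    (target.alias("t")
     .merge(batch_df.alias("s"), "t.id = s.id")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())

(spark.readStream.table("events_source")
 .writeStream
 .foreachBatch(upsert_batch)
 .option("checkpointLocation", "dbfs:/chk/final_table")
 .start())
```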
nbakh
by New Contributor II
  • 9753 Views
  • 4 replies
  • 4 kudos

insert into a table with an identity column fails

I am trying to insert into a table with an identity column using a select query. However, whether I include the identity column or omit it in my insert, it throws errors. Is there a way to insert into select * from a table if the insert t...

Latest Reply
karan_singh
New Contributor II
  • 4 kudos

Hi, specify the insert columns as below:
%sql
INSERT INTO demo_test (product_type, sales)
SELECT product_type, sales FROM demo

3 More Replies
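The behavior depends on how the identity column was declared: GENERATED ALWAYS rejects explicit values (so the insert must list only the other columns, as in the reply), while GENERATED BY DEFAULT accepts them. A sketch with placeholder names:

```python
# Sketch only: GENERATED ALWAYS identity columns cannot be written explicitly,
# so name the remaining columns in the insert. Names are placeholders.
spark.sql("""
  CREATE TABLE IF NOT EXISTS demo_test (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    product_type STRING,
    sales DOUBLE
  ) USING DELTA
""")

# Works: the identity value is generated for each row.
spark.sql("""
  INSERT INTO demo_test (product_type, sales)
  SELECT product_type, sales FROM demo
""")
```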
shiv4050
by New Contributor
  • 2997 Views
  • 4 replies
  • 0 kudos

Execute a Databricks notebook from Python source code.

Hello, I'm trying to execute a Databricks notebook from Python source code but am getting an error. Source code below:
from databricks_api import DatabricksAPI
# Create a Databricks API client
api = DatabricksAPI(host='databrick_host', tok...

Latest Reply
sewl
New Contributor II
  • 0 kudos

The error you are encountering indicates that there is an issue with establishing a connection to the Databricks host specified in your code. Specifically, the error message "getaddrinfo failed" suggests that the hostname or IP address you provided f...

3 More Replies
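"getaddrinfo failed" usually means the host string is not a resolvable URL; the workspace URL must include the https:// scheme and full domain. A sketch that bypasses the wrapper and submits a one-off notebook run through the Jobs 2.1 API is below; the host, token, notebook path, and cluster spec are placeholders.

```python
# Sketch only: submit a one-off notebook run via the Jobs 2.1 API. Note the
# full https:// host; a bare hostname triggers "getaddrinfo failed".
# All identifiers are placeholders.
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "<personal-access-token>"

payload = {
    "run_name": "notebook-from-python",
    "tasks": [{
        "task_key": "run_nb",
        "notebook_task": {"notebook_path": "/Users/me@example.com/my_notebook"},
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 1,
        },
    }],
}

resp = requests.post(f"{HOST}/api/2.1/jobs/runs/submit",
                     headers={"Authorization": f"Bearer {TOKEN}"},
                     json=payload)
resp.raise_for_status()
print(resp.json()["run_id"])
```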
dataslicer
by Contributor
  • 7776 Views
  • 7 replies
  • 3 kudos

Resolved! Successfully installed Maven:Coordinates:com.crealytics:spark-excel_2.12:3.2.0_0.16.0 on Azure DBX 9.1 LTS runtime but getting error for missing dependency: org.apache.commons.io.IOUtils.byteArray(I)

I am using Azure DBX 9.1 LTS and successfully installed the following library on the cluster using Maven coordinates: com.crealytics:spark-excel_2.12:3.2.0_0.16.0. When I executed the following line:
excelSDF = spark.read.format("excel").option("dataAdd...

Latest Reply
RamRaju
New Contributor II
  • 3 kudos

Hi @dataslicer, were you able to solve this issue? I am using the 9.1 LTS Databricks runtime with Spark 3.1.2 and Scala 2.12. I have installed com.crealytics:spark-excel-2.12.17-3.1.2_2.12:3.1.2_0.18.1. It was working fine but now I am facing the same exception a...

6 More Replies
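The missing org.apache.commons.io.IOUtils.byteArray symbol usually indicates a commons-io version conflict between the runtime and the spark-excel build, so matching the library to the cluster's Spark/Scala versions (or pinning a compatible commons-io on the cluster) is the usual fix. Once the dependency resolves, the read itself looks like the sketch below; path and options are placeholders.

```python
# Sketch only: read an .xlsx with spark-excel after installing a build that
# matches the cluster's Spark/Scala versions. Path and options are placeholders.
excel_df = (spark.read.format("excel")
            .option("dataAddress", "'Sheet1'!A1")  # sheet and anchor cell
            .option("header", "true")
            .option("inferSchema", "true")
            .load("dbfs:/mnt/raw/report.xlsx"))
```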
