Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Greg
by New Contributor III
  • 1447 Views
  • 1 reply
  • 4 kudos

How to reduce storage space consumed by delta with many updates

I have one Delta table that I continuously append events into, and a second Delta table that I continuously merge into (streamed from the first table) that has unique IDs whose properties are updated from the events (an ID represents a unique thing that ge...
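For anyone landing here: a likely cause is that frequent MERGE operations rewrite data files, and Delta keeps the superseded files for time travel until they are vacuumed. A minimal sketch, assuming a Delta table named events_merged (hypothetical name) and a Databricks notebook where spark is predefined:

# Compact the small files produced by frequent merges.
spark.sql("OPTIMIZE events_merged")

# Remove files no longer referenced by the table and older than the
# retention window (here 168 hours = the 7-day default) to reclaim storage.
spark.sql("VACUUM events_merged RETAIN 168 HOURS")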

Latest Reply
Jb11
New Contributor II
  • 4 kudos

Did you already solve this problem?

vanessafvg
by New Contributor III
  • 1712 Views
  • 1 reply
  • 3 kudos

Extracting data from excel in datalake storage using openpyxl

I am trying to extract some data into Databricks but tripping all over openpyxl; newish user of Databricks.
from openpyxl import load_workbook
directory_id = "hidden"
scope = "hidden"
client_id = "hidden"
service_credential_key = "hidden"
container_name = "hidden"
s...
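Since openpyxl reads from the local filesystem, one common approach is to mount the container and go through the /dbfs FUSE path. A minimal sketch, assuming the container is already mounted at /mnt/excel-data and the workbook is sample.xlsx (both hypothetical):

from openpyxl import load_workbook

# openpyxl cannot read abfss:// URLs directly; the /dbfs FUSE mount
# exposes the mounted container as a local path.
wb = load_workbook("/dbfs/mnt/excel-data/sample.xlsx", read_only=True)
ws = wb.active  # first worksheet

# Iterate over rows as tuples of cell values.
for row in ws.iter_rows(values_only=True):
    print(row)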

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Hi @Vanessa Van Gelder, Great to meet you, and thanks for your question! Let's see if your peers in the community have an answer to your question. Thanks.

sintsan
by New Contributor II
  • 1184 Views
  • 1 reply
  • 1 kudos

Resolved! spark.sparkContext.setCheckpointDir - External Azure Storage

Is it possible to direct spark.sparkContext.setCheckpointDir to an external Azure Storage container location (instead of DBFS), and if so, how? There's very little documentation on that.

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Yes, the directory must be an HDFS path if running on a cluster. All you need to do is provide the correct path.
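A minimal sketch of what that looks like, assuming the cluster is already configured to authenticate to the storage account (account and container names are hypothetical):

# Point RDD/DataFrame checkpointing at an ADLS Gen2 container
# instead of DBFS.
spark.sparkContext.setCheckpointDir(
    "abfss://checkpoints@mystorageaccount.dfs.core.windows.net/spark-checkpoints"
)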

tototox
by New Contributor III
  • 2508 Views
  • 3 replies
  • 2 kudos

dbutils.fs.ls overlaps with managed storage error

I created a schema with that path as a managed location (abfss://~~@~~.dfs.core.windows.net/dejeong/). However, I dropped the schema with the cascade option, and also entered the Azure portal and deleted the path directly, and made it again (abfss://~~@~~....

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @jin park, Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your...

2 More Replies
Akanksha533
by New Contributor
  • 2701 Views
  • 4 replies
  • 3 kudos
Latest Reply
Kaniz_Fatma
Community Manager
  • 3 kudos

Hi @Akanksha Kumari, We haven't heard from you on the last response from @Mark Ferguson and @Hubert Dudek, and I was checking back to see if their suggestions helped you. Or else, if you have any solution, please do share that with the community ...

3 More Replies
William_Scardua
by Valued Contributor
  • 4696 Views
  • 4 replies
  • 4 kudos

How do you structure and store your medallion architecture?

Hi guys, any suggestions on how to create a medallion architecture? How many and which data lake zones, how to store the data, which databases to use for storage, anything. I'm thinking of these zones: 1. landing zone, file storage in /landing_zone - databricks database.bro...
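One common layout, sketched below with hypothetical container and path names (zone boundaries vary by team):

base = "abfss://lake@mystorageaccount.dfs.core.windows.net"

# Bronze: raw files ingested as-is from the landing zone.
raw = spark.read.json(f"{base}/landing_zone/events/")
raw.write.format("delta").mode("append").save(f"{base}/bronze/events")

# Silver: cleaned, deduplicated, typed.
bronze = spark.read.format("delta").load(f"{base}/bronze/events")
bronze.dropDuplicates(["event_id"]).write.format("delta") \
    .mode("overwrite").save(f"{base}/silver/events")

# Gold: business-level aggregates for consumption.
silver = spark.read.format("delta").load(f"{base}/silver/events")
silver.groupBy("event_type").count().write.format("delta") \
    .mode("overwrite").save(f"{base}/gold/event_counts")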

Latest Reply
Kaniz_Fatma
Community Manager
  • 4 kudos

Hi @William Scardua, We haven't heard from you since the last response from @Jose Gonzalez, and I was checking back to see if you have a resolution yet. If you have any solution, please share it with the community as it can be helpful to others....

3 More Replies
bchaubey
by Contributor II
  • 1480 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16764241763
Honored Contributor
  • 0 kudos

@Bhagwan Chaubey Maybe you can give this a try, if this is a Blob Storage account: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python?tabs=environment-variable-windows For Data Lake storage, please try below: https://do...
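A minimal sketch in the spirit of the linked quickstart, with hypothetical connection string, container, and blob names:

from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="mycontainer", blob="data/sample.csv")

# Download the blob's contents as bytes.
data = blob.download_blob().readall()
print(len(data), "bytes downloaded")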

al_joe
by Contributor
  • 3225 Views
  • 3 replies
  • 1 kudos

Resolved! Where / how does DBFS store files?

I tried to use %fs head to print the contents of a CSV file used in a training: %fs head "/mnt/path/file.csv" but got an error saying cannot head a directory!? Then I did %fs ls on the same CSV file and got a list of 4 files under a directory named as a ...
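What the poster is seeing is expected: a DataFrame saved as CSV is written as a directory of part files, which is why %fs head fails on the directory itself. A minimal sketch, reusing the hypothetical path from the post:

# List the directory that looks like a single CSV file.
files = dbutils.fs.ls("/mnt/path/file.csv")
for f in files:
    print(f.path)

# Head one of the actual part files inside the directory.
part = [f.path for f in files if f.name.startswith("part-")][0]
print(dbutils.fs.head(part))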

Latest Reply
User16753725182
Contributor III
  • 1 kudos

Hi @Al Jo, are you still seeing the error while printing the contents of the CSV file?

2 More Replies
Hubert-Dudek
by Esteemed Contributor III
  • 1277 Views
  • 1 reply
  • 15 kudos

Resolved! Write to Azure Delta Lake - optimization request

The Databricks/Delta team could optimize some commands that write to Azure Blob Storage, as Azure displays this message:

[screenshot of the Azure message]
Latest Reply
Anonymous
Not applicable
  • 15 kudos

Hey there. Thank you for your suggestion. I'll pass this up to the team.

User16857281869
by New Contributor II
  • 1946 Views
  • 1 reply
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my structured streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should be using the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10 ms to log the transaction data...
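A minimal sketch of point 1, with hypothetical source, sink, and checkpoint paths: an explicit trigger interval makes the stream commit once per interval instead of as fast as possible, which cuts the storage API calls:

(spark.readStream.format("delta").load("/mnt/source")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/job1")
    .trigger(processingTime="1 minute")
    .start("/mnt/sink"))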

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 1 kudos

  • Please mount cheaper storage (LRS) to a custom mount and set checkpoints there.
  • Please clear data regularly.
  • If you are using foreach/foreachBatch in a stream, it will save every dataframe on DBFS.
  • Please remember not to use display() in production.
  • If on th...

Greg_Galloway
by New Contributor III
  • 6285 Views
  • 5 replies
  • 3 kudos

Resolved! Use of private endpoints for storage in workspace with EnableNoPublicIP=Yes and VnetInjection=No

We know that Databricks with VNET injection (our own VNET) allows us to connect to ADLS Gen2 over private endpoints. This is what we typically do. We have a customer who created Databricks with EnableNoPublicIP=Yes (secure cluster connectivity) and Vn...

Latest Reply
User16871418122
Contributor III
  • 3 kudos

The managed VNET is locked and allows very limited config tuning; options like VNET peering are facilitated and need to be done from the Databricks UI. If they want more control over the VNET, they need to migrate to a VNET-injected workspace.

4 More Replies