Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

AndLuffman
by New Contributor II
  • 2431 Views
  • 2 replies
  • 1 kudos

QRY Results incorrect but Exported data is OK

I ran a query "Select * from fact_Orders". This presented a lot of garbage: the correct column headers, but the contents were extremely random, e.g. blanks in the key column, VAT rates of 12282384234E-45. When I export to CSV, it presents fi...

romangehrn
by New Contributor II
  • 968 Views
  • 0 replies
  • 0 kudos

Speed issue with DBR 13+ for R

I got a notebook running on DBR 12.2 with the following R code: install.packages("microbenchmark") install.packages("furrr") library(microbenchmark) library(tidyverse) # example tibble df_test <- tibble(id = 1:100000, street_raw = rep("Bahnhofs...

Data Engineering
DBR 13
performance slow
R
speed error
sparkrookie
by New Contributor II
  • 2085 Views
  • 1 reply
  • 0 kudos

Structured Streaming Delta Table - Reading and writing from same table

Hi, I have a structured streaming job that reads from a delta table "A" and pushes to another delta table "B". A schema: group_key, id, timestamp, value. B schema: group_key, watermark_timestamp, derived_value. One requirement is that I need to get the m...

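
As a rough illustration of the pattern in this post, here is a minimal PySpark sketch that streams from a Delta table "A" and appends derived rows to a Delta table "B". The aggregation, checkpoint path, and output table name are placeholders, since the post's exact requirement is truncated.

from pyspark.sql import functions as F

# Stream from source table A (group_key, id, timestamp, value).
stream_a = spark.readStream.format("delta").table("A")

# Placeholder derivation: max value per group per window.
derived = (
    stream_a
    .withWatermark("timestamp", "10 minutes")
    .groupBy("group_key", F.window("timestamp", "10 minutes"))
    .agg(F.max("value").alias("derived_value"))
    .select("group_key",
            F.col("window.end").alias("watermark_timestamp"),
            "derived_value")
)

# Append to sink table B (group_key, watermark_timestamp, derived_value).
(derived.writeStream
    .format("delta")
    .outputMode("append")                                     # rows emitted once the watermark closes a window
    .option("checkpointLocation", "/tmp/checkpoints/a_to_b")  # placeholder path
    .toTable("B"))
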
shraddharane
by New Contributor
  • 38721 Views
  • 1 reply
  • 1 kudos

Migrating a legacy SSAS cube to Databricks

We have a SQL database designed in a star schema. We are migrating data from SQL to Databricks. There are cubes designed using SSAS, and these cubes are used by end users in Excel for analysis purposes. We are now looking for a solution for: 1) Can...

Latest Reply
-werners-
Esteemed Contributor III
  • 1 kudos

Databricks itself does not deliver semantic models like SSAS cubes, so Databricks cannot migrate them because there is nothing to migrate to. However, there are some options: use PowerBI instead of SSAS (there might even be a migrate option?). W...

140015
by New Contributor III
  • 2857 Views
  • 3 replies
  • 1 kudos

Resolved! Using DLT pipeline with non-incremental data

Hi, I would like to know what you think about using Delta Live Tables when the source for this pipeline is not incremental. What I mean by that is: suppose the data provider creates a new folder with files for me each time it has an update to the...

Latest Reply
Joe_Suarez
New Contributor III
  • 1 kudos

When dealing with B2B data building, the process of updating and managing your data can present unique challenges. Since your data updates involve new folders with files and you need to process the entire new folder, the concept of incremental proces...

2 More Replies
GNarain
by New Contributor II
  • 7450 Views
  • 7 replies
  • 4 kudos

Resolved! Is there an API call to set the "Table access control" workspace config?

Is there an API call to set the "Table access control" workspace config?

Latest Reply
SvenPeeters
New Contributor III
  • 4 kudos

Facing the same issue; I tried to fetch the current value via /api/2.0/workspace-conf?keys=enableTableAccessControl. Unfortunately this returns a 400: { "error_code": "BAD_REQUEST", "message": "Invalid keys: [\"enableTableAccessControl\"]" }

6 More Replies
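
For reference, a hedged sketch of the workspace-conf call discussed in this thread; the host and token are placeholders, and, as the reply above shows, the enableTableAccessControl key may be rejected with a 400 on some workspaces.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
TOKEN = "<personal-access-token>"                         # placeholder
headers = {"Authorization": f"Bearer {TOKEN}"}

# Read the current value of the key.
resp = requests.get(f"{HOST}/api/2.0/workspace-conf",
                    headers=headers,
                    params={"keys": "enableTableAccessControl"})
print(resp.status_code, resp.text)

# Attempt to set it (PATCH takes a JSON map of key -> string value).
resp = requests.patch(f"{HOST}/api/2.0/workspace-conf",
                      headers=headers,
                      json={"enableTableAccessControl": "true"})
print(resp.status_code, resp.text)
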
Eldar_Dragomir
by New Contributor II
  • 2465 Views
  • 1 reply
  • 2 kudos

Resolved! Reprocessing the data with Auto Loader

Could you please give me an idea of how I can start reprocessing my data? Imagine I have a folder in ADLS Gen2, "/test", with binaryFiles. They have already been processed with the current pipeline. I want to reprocess the data and continue receiving new data. What t...

Latest Reply
Tharun-Kumar
Databricks Employee
  • 2 kudos

@Eldar_Dragomir In order to re-process the data, we have to change the checkpoint directory. This will start processing the files from the beginning. You can use cloudFiles.maxFilesPerTrigger to limit the number of files getting processed per micro-...

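
A minimal Auto Loader sketch of the approach in the reply: point the stream at a new checkpoint directory so files are picked up from the beginning, and cap files per micro-batch. The paths, target table, and option values are placeholders.

# New checkpoint directory => Auto Loader treats all files in /test as unseen.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .option("cloudFiles.maxFilesPerTrigger", 100)   # throttle the reprocessing
      .load("abfss://<container>@<account>.dfs.core.windows.net/test"))  # placeholder ADLS path

(df.writeStream
   .format("delta")
   .option("checkpointLocation", "/checkpoints/test_v2")  # fresh checkpoint, placeholder path
   .toTable("bronze_test_files"))                         # placeholder target table
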
anarad429
by New Contributor
  • 1797 Views
  • 1 reply
  • 1 kudos

Resolved! Unity Catalog + Reading variable from external notebook

I am trying to run a notebook which reads some of its variables from an external notebook (I used the %run command for that purpose), but it keeps giving me an error that these variables are not defined. These sequences of notebooks run perfectly fine on a...

Latest Reply
Atanu
Databricks Employee
  • 1 kudos

I think the issue here is the variable is not created until a value is assigned to it. So, you may need to assign a value to get_sql_schema

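
A small illustration of the %run pattern in question, assuming a variable named get_sql_schema (the name mentioned in the reply) and a placeholder notebook path: the variable must be assigned in the included notebook before the calling notebook can reference it.

# Included notebook, e.g. ./config_vars -- the variable has to be assigned here:
get_sql_schema = "SELECT column_name, data_type FROM information_schema.columns"  # placeholder value

# Calling notebook -- %run must be alone in its own cell, after which the name is defined:
# %run ./config_vars
print(get_sql_schema)
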
NathanLaw
by New Contributor III
  • 1202 Views
  • 1 reply
  • 0 kudos

CPU and GPU Elapsed Runtimes

I have 2 questions about elapsed job runtimes. The same Scoring notebook is run 3 times as 3 jobs. The jobs are identical, with the same PetaStorm code, CPU cluster config (not a Spot cluster), and data, but have varying elapsed runtimes. Elapsed runtimes...

Latest Reply
shyam_9
Databricks Employee
  • 0 kudos

Hi @NathanLaw, could you please confirm whether you have set any parameters for the best model? Does this stop after running some epochs if there is no improvement in the model performance?

Sanjay_AMP
by New Contributor II
  • 1297 Views
  • 1 reply
  • 1 kudos

Deployment-ready sample source-code for Delta Live Table & Autoloader

Hi all, we are planning to develop an Autoloader-based DLT pipeline that needs to be deployable via a CI/CD pipeline and observable. Can somebody please point me to source code that we can start with as a firm foundation, instead of falling into a newbie pattern ...

Latest Reply
Priyanka_Biswas
Databricks Employee
  • 1 kudos

Hi @Sanjay_AMP, Delta Live Tables and Auto Loader can be used together to incrementally ingest data from cloud object storage. Python code example: define a table called "customers" that reads data from a CSV file in cloud object storage; define a...

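
A hedged sketch of the kind of Python example the reply outlines: a Delta Live Tables table called "customers" ingesting CSV files from cloud object storage with Auto Loader. The landing path and options are placeholders, and the code is meant to run inside a DLT pipeline.

import dlt

@dlt.table(comment="Raw customers ingested incrementally from CSV files")
def customers():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .option("cloudFiles.inferColumnTypes", "true")
            .load("s3://<bucket>/landing/customers/"))   # placeholder landing path
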
wojciech_jakubo
by New Contributor III
  • 13934 Views
  • 7 replies
  • 3 kudos

Question about monitoring driver memory utilization

Hi Databricks/Spark experts! I have a piece of pandas-based 3rd-party code that I need to execute as part of a bigger Spark pipeline. By nature, pandas-based code is executed on the driver node. I ran into out-of-memory problems and started exploring th...

(Attached image: "Driver memory cycles_ Busy cluster")
Latest Reply
Tharun-Kumar
Databricks Employee
  • 3 kudos

Hi @wojciech_jakubo, 1. JVM memory will not be utilized for Python-related activities. 2. In the image we can only see the storage memory. We also have execution memory, which would be the same. Hence I came up with the executor memory to be of ...

6 More Replies
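
Since the thread is about watching driver memory while pandas code runs on the driver, here is a small sketch of checking the Python process's resident memory with psutil; this is separate from the JVM heap shown in the Spark UI, and psutil's availability on the cluster is an assumption.

import os
import psutil

proc = psutil.Process(os.getpid())
rss_gb = proc.memory_info().rss / 1024 ** 3           # resident memory of the driver's Python process
total_gb = psutil.virtual_memory().total / 1024 ** 3  # total memory on the driver node
print(f"Python driver process RSS: {rss_gb:.2f} GiB of {total_gb:.2f} GiB")
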
kg6ka
by New Contributor
  • 2751 Views
  • 1 reply
  • 1 kudos

Is it possible to do this without the GitHub token and integration?

Hey guys, I have a question. I have Databricks jobs in a workflow that are linked to my Databricks repo, which contains the necessary scripts for one job or another. That is, the job is linked to the Databricks repo. The main code is developed in gi...

Latest Reply
User16752239289
Databricks Employee
  • 1 kudos

Does the user the API token was generated from have the git credential configured for the git repo? If not, you can follow the steps here: https://docs.databricks.com/en/repos/get-access-tokens-from-git-provider.html

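
In addition to the UI steps in the linked doc, a git credential can be registered for the job owner's user through the Git Credentials REST API; a hedged sketch, with host, tokens, and provider name as placeholders.

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<databricks-personal-access-token>"             # placeholder

resp = requests.post(f"{HOST}/api/2.0/git-credentials",
                     headers={"Authorization": f"Bearer {TOKEN}"},
                     json={"git_provider": "gitHub",            # e.g. gitHub, gitLab, azureDevOpsServices
                           "git_username": "<git-username>",
                           "personal_access_token": "<git-pat>"})
print(resp.status_code, resp.text)
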
Thor
by New Contributor III
  • 6978 Views
  • 1 reply
  • 2 kudos

Resolved! Dynamically change spark.task.cpus

Hello, I'm facing a problem with big tarballs to decompress, and to fit them in memory I had to limit Spark from processing too many files at the same time, so I changed the following property on my 8-core VM cluster: spark.task.cpus 4. This setting is the thresh...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Thor, Spark does not offer the capability to dynamically modify configuration settings, such as spark.task.cpus, for individual stages or transformations while the application is running. Once a configuration property is set for a Spark applicati...

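
To make the reply concrete: spark.task.cpus is a static configuration, so it has to be set in the cluster's Spark config (for example "spark.task.cpus 4", as in the question) and the cluster restarted; it cannot be toggled per stage at runtime. A small sketch of inspecting the effective values in a running session, with placeholder defaults.

# spark.task.cpus is fixed for the lifetime of the application; you can read it
# at runtime, but changing it means editing the cluster Spark config and restarting.
task_cpus = int(spark.conf.get("spark.task.cpus", "1"))            # defaults to 1 if unset
executor_cores = int(spark.conf.get("spark.executor.cores", "8"))  # placeholder default for an 8-core VM
print(f"Concurrent task slots per executor: {executor_cores // task_cpus}")
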
